NIPS 2017
Mon Dec 4th through Sat the 9th, 2017 at Long Beach Convention Center
Paper ID: 3507 When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent

### Reviewer 1

This paper gives nice examples to show that cyclic coordinate descent can be asymptotically faster than randomized coordinate descent. Though different to those existing non-asymptotic iteration complexity analysis, this paper is still of certain interest for nonlinear optimization. I therefore recommend this paper to be accepted. My detailed comments are as follows. 1. The claim that people have the perception that CCD is always inferior is false. At least in the case that n is small, this can be not the case. The paper by Sun and Ye only claimed that it scales worse with respect to n. [6] also provided an example that random permutation CD (which usually behaves closer to RCD than to CCD) can be worse than CCD when n = 2. 2. In line 110: the result for the case l approaches infinity is called Gelfand's formula (as mentioned in [11]). 3. I would still hope to see the rate of RCD represented in E || x - x^* ||, as this value can be directly related to the expected objective value, which is the one that is really of interest. As noted in [20], the current analysis for RCD cannot be easily linked with the expected objective and therefore the result is weaker. It should not be too difficult to obtain such results by utilizing Nesterov's results in [14]. 4. I am somehow confused by the notations: in definition 4.3, is the inequality element-wise or is it for the eigenvalues? In the notation section it is claimed that the norm is always the Euclidean norm, but there are also norms applied on the matrices, which clearly cannot be the Euclidean norm. 5. [17] and [18] are referring to the same paper, and the correct papers to refer should be "Richtárik, Peter, and Martin Takáč. "Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function." Mathematical Programming 144.1-2 (2014): 1-38." and "Lu, Zhaosong, and Lin Xiao. "On the complexity analysis of randomized block-coordinate descent methods." Mathematical Programming 152.1-2 (2015): 615-642." 6. In the outline section, description of sec. 4 was ignored. 7. In the notation section, the part about the superscripts is mentioned twice. 8. The D matrix was used in line 51 without a definition, and later it's mentioned that the diagonal parts of A is denoted by I while in (4) D is used. Though under the assumption that all diagonal entries are 1, they are equivalent, it is still better to unify the notations. === After reading the authors' feedback === I thank the authors for the feedback and have no further comments.