Export Reviews, Discussions, Author Feedback and Meta-Reviews

Paper ID:	1705
Title:	Accelerated Mini-batch Randomized Block Coordinate Descent Method

Current Reviews

Submitted by Assigned_Reviewer_1

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)

This paper explores the integration of mini-batch, randomized block CCD, and variance reduction for regularized empirical risk minimization problem. The major two problems the author target is variance reduction and the incremental proximal point problems when dealing with empirical and expected risk function. This method aims to show advantage over a recently proposed method, SPVRG.

Quality:
The experiments seem not enough. There are a couple of recently proposed methods aiming the same problem the paper cited, such as semi-SG [6], SVRG [5] and (SAG) [14]. The reviewer would like to see the comparison results of these methods. The paper compared with BPG and BRBCD, which are batch methods and thus the comparison is not fair enough. Besides the lack of comparisons, the current experimental results are not so exciting, compared with the recently proposed SPVRG algorithm which also aiming at shooting the variance reduction problem, the proposed MRBCD-II shows visible yet minor improvement on two moderate-size domains. Experiments on larger domains, if applicable, is preferred.

Clarity:

Some details regarding the paper：

The paper is not quite well written, there are some typos which can be easily found, e.g.,
line 160 'shows how to reduce shows how to reduce'
line 217 'full take' should be 'fully take'

line 193-194, 'These adjustments can guarantee ... goes to zero'. Please give proof on this.

In experiment 1, from line 340-344, different stepsizes are chosen according to different approaches. The reviewer wants to know how they are chosen and if they are optimal stepsize w.r.t the method to make a fair comparison.

The paper goes without a conclusion part, which is fine, yet it is better to be with a conclusion part.

Originality:
The contribution seems incremental.

Significance:
The paper could be improved given more detailed analysis along with comprehensive comparison experimental study and result analysis.

Q2: Please summarize your review in 1-2 sentences

The idea is interesting, the theoretical analysis is sound to me, however, comparison studies in the experiment part is far from enough.

Submitted by Assigned_Reviewer_41

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)

Summary: The paper proposes a stochastic (mini-batch) version of the randomized block coordinate descent (RBCD) method called MRBCD-II. It employs a recently developed variance reduction technique for stochastic optimization to ensure that MRBCD-II still attains good convergence rate (linear convergence) as that of RBCD while reducing the overall computational complexity. They also develop a empirical variant MRBCD-III which uses active set strategy in MRBCD-II which achieves good practical performance.

Quality: The writing of the paper is well developed and organized. The algorithms are presented in a logical order which facilitates understanding. The assumptions are clearly stated. The algorithms are tested on both synthesized and real-world datasets. Overall, it has good quality.

Clarity: The paper is well motivated. The related work and techniques are clearly introduced. A typo at Line 160: “shows how to reduce” is repeated.

Originality: The paper proposes a mini-batch version of RBCD to address its scalability problem for large-scale dataset, which was not considered before (at least not mentioned by the paper). Their algorithm and analysis techniques extend from previous result.

Significance: The proposed mini-batch RBCD could potentially be useful for solving sparse learning problems with a massive dataset

Q2: Please summarize your review in 1-2 sentences

The paper combines the advantage of two previously developed algorithms (RBCD and stochastic proximal methods) through a variance reduction technique, which results in an algorithm suitable for large-scale sparse learning. The paper is of good quality and their algorithm has good potential for practical use.

Submitted by Assigned_Reviewer_42

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)

The paper proposes a mini-batch randomized block coordinate descent (MRBCD) method for empirical risk minimization problems with a nonsmooth, block separable regularization function. This can be viewed as a stochastic variant of existing RBCD method, and unlike existing RBCD methods, this approach eliminates the need for all component functions within each iteration. An accelerated variant of the method is obtained by applying variance reduction techniques, and the method is shown to converge linearly in expectation. A third variant of the method employs an active set strategy which is shown empirically to improve the performance. The numerical experiments are described in great detail, and they verify that the variance reduction technique and the active set approach work well and yield an improvement over existing methods.

The paper is well-written and technically sound, and its contributions are nontrivial and notable although the proposed methods are essentially obtained by combining a number of existing methods and techniques. The references to related work are adequate, but it should be mentioned that the work is very closely related to the following arXiv preprint that appeared after the submission deadline:
H. Wang and A. Banerjee, “Randomized Block Coordinate Descent for Online and Stochastic Optimization”, arXiv:1407.0107, July, 2014.

Remarks:
Line 160: repeated words: “shows how to reduce shows how to reduce”

Q2: Please summarize your review in 1-2 sentences

The paper proposes and analyzes new mini-batch randomized block coordinate descent methods that are shown to improve the iteration complexity for strongly convex functions when combined with variance reduction techniques. Numerical experiments verify that the methods can outperform existing methods.

Author Feedback

Q1:Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 6000 characters. Note however, that reviewers and area chairs are busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point.

We thank the reviewers for their helpful comments. We will carefully address all raised concerns in the final version. Below are some clarifications:

1. Xiao and Zhang 2014 have already compared SPVRG with SAG (Roux et al. 2012) in computational efficiency. Their results show that both methods attain similar performance. Therefore, SAG should perform worse than MRBCD-III. Nonetheless, we will add experimental results on SAG in the final version for a more comprehensive comparison.

2. Our experiments on MRBCD-II are only to validate our theoretical rates of convergence in practice. Among all three MRBCD methods, MRBCD-III is the most efficient version. From Figure 5.1b and Figure 6.1b, we can see that MRBCD-III is much faster than SPVRG. We only adopted a moderate size simulated dataset, since BPG and BRBCD are not scalable for large size dataset. In our real data example, we did not include BPG and BRBCD in the comparison. Thus we adopted a much larger dataset (n = 20,242 and d = 47,236).

3. In Section 3.1, we mention that the adjustments of the exact gradient guarantee that the variance introduced by stochastic sampling over component functions asymptotically goes to zero. We then rigorously prove that argument in Lemma 4.1. In particular, Lemma 4.1 shows that the variance of the gradient estimator is bounded by objective value gap towards the optimum. Since our method guarantees the decreasing of the objective value, it essentially also guarantees the decreasing of the variance.
4. As mentioned on Line 345, for both MRBCD and SPVRG, the step size and number of iterations m within each inner loop are tuned over a grid.

5.We will cite the paper by Wang and Banerjee as the reference in the final version.

We also thank all reviewers for pointing out the typos. We will improve the writing and fix these typos in the final version.