NIPS 2018
Sun Dec 2nd through Sat the 8th, 2018 at Palais des Congrès de Montréal

### Reviewer 1

This paper studies the multi-response regression model and, in particular, the alternating minimization algorithm for this problem. In contrast to prior work, this paper makes the following improvements:

1. It does not require the resampling assumptions that are pervasive in analyses of alternating procedures. With good initialization, the procedure without resampling achieves what is typically the minimax rate for such problems.
2. It shows that an arbitrarily initialized procedure still converges, albeit at a worse rate.
3. It allows the unknown parameter to lie in a possibly non-convex set, and the noise to be sub-Gaussian.

The major technique is generic chaining, which allows the authors to prove bounds that hold uniformly over all iterates. I liked the paper, and the result overall seems quite interesting, in particular points 1 and 2 above. It seems as though the techniques could also translate to other problems. While I did not read the proofs very carefully, they seem quite reasonable overall given the deterministic conditions that the authors posit. On the negative side, proofs for alternating minimization without resampling are already known to exist, so the paper would benefit from a comparison of the current techniques to those used in the past; if the techniques themselves are not so different, then analyzing AltMin in this particular setting without resampling is not particularly impactful or insightful.

Concerns:

1. The authors could de-emphasize point 3 above. It does not particularly impress me, since the authors require an exact solution to each iteration of the parameter estimation problem anyway, irrespective of non-convexity, and this can only be provided in some settings. Similarly, most Gaussian upper bounds translate to sub-Gaussian upper bounds, since the required behavior of the stochastic process is similar in the two cases.
2. It would be good to compare these results to other results on alternating minimization that do not require resampling (e.g., see the references below).

Minor: Line 131: The function could be misinterpreted. Consider stating it as d_1 (respectively, d_2) to signify the two clauses.

[1] I. Waldspurger, "Phase retrieval with random Gaussian sensing vectors by alternating projections," IEEE Trans. on Inf. Theory, 2018.
[2] M. Soltanolkotabi, "Algorithms and theory for clustering and nonconvex quadratic programming," PhD thesis, Stanford University, 2014.

Post-rebuttal: I have read the response and maintain my score. This is a strong paper with interesting results.
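For concreteness, the kind of non-resampled alternating scheme discussed in this review can be sketched as follows. This is a minimal, hypothetical instantiation (a rank-r multi-response regression `Y ≈ X U Vᵀ`, which may differ from the paper's exact model); note that the same full sample is reused in every iteration, i.e., no resampling.

```python
import numpy as np

def altmin_lowrank(X, Y, r, n_iters=300, seed=0):
    """Alternating least squares for a hypothetical rank-r multi-response
    regression Y ~ X @ U @ V.T.  The entire sample (X, Y) is reused at
    every iteration (no resampling), and the initialization is arbitrary."""
    rng = np.random.default_rng(seed)
    m = Y.shape[1]
    V = rng.standard_normal((m, r))  # arbitrary (random) initialization
    for _ in range(n_iters):
        # U-step: with V fixed, minimize ||Y - X U V^T||_F^2 in closed form:
        # U = (X^T X)^+ X^T Y V (V^T V)^{-1}
        U = np.linalg.lstsq(X, Y @ np.linalg.pinv(V.T), rcond=None)[0]
        # V-step: with U fixed, this is ordinary least squares on X @ U
        V = np.linalg.lstsq(X @ U, Y, rcond=None)[0].T
    return U, V
```

Each sub-step is an exact least-squares solve (cf. concern 1 above: the analysis assumes the iteration's subproblem is solved exactly), and the objective decreases monotonically across iterations.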

### Reviewer 2

This paper proposes a new approach to analyzing alternating minimization algorithms. The main idea is to obtain uniform control of the error of one set of iterates in terms of the error of the complementary set of iterates. The benefit of uniform control is that the analysis applies to non-resampled alternating minimization algorithms. It is hard to make recommendations beyond the presentation of the results without reading the proofs carefully, but the results are certainly first-rate.

Regarding the presentation of the results, it would be ideal if the authors could state a version of Lemma 2 (or refer to the analogous result in the appendix) in the discussion at the end of Section 3.2. Is the improvement of Theorem 2 over Theorem 1 the result of an improved Lemma 2 when restricting to well-initialized $\Sigma(\theta)$'s?