NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 6476 The Impact of Regularization on High-dimensional Logistic Regression

Reviewer 1

The submission is particularly well written. The prior work, notation and mathematical setup are crystal clear and constitute excellent motivation for the work presented. Proofs and results are extensions of Sur and Candès' work; they are not straightforward, though, and are often commented on and summarized. The submission puts so much effort into being as clear as possible that a few parts are disappointing in that regard:
- the equation system is central but is presented somewhat abruptly; some naive intuition or interpretation would have been appreciated;
- the uniqueness of the solution is assumed in Theorem 1. Are there simple settings where it always holds? Is this assumption reasonable with l1/l2 regularizers?
- the ending is sudden: what are the open questions and potential future work?
One last remark: the supplementary material is of the same quality as the main paper, which is rare enough to be noted.

Updated review: The authors have addressed most of our concerns, so I am in favor of accepting the paper.

Reviewer 2

Originality: This paper develops asymptotic theory for high-dimensional regularized logistic regression (LR). The paper meaningfully generalizes the work in [1] from unregularized LR to regularized LR (with any separable f). The main result of the paper (Theorem 1) is proved for any locally Lipschitz function \Psi, which in special cases provides asymptotics for common descriptive statistics such as correlation, variance and mean-squared error. Special-case results for L1- and L2-regularized LR are also derived, together with the quantities highlighted above. The paper also demonstrates that the numerical simulation results align with the theoretical relations.

Quality: The paper contains high-quality results and proofs; the notation and setup are well defined in Section 2 before the main results. The proofs are well organized except for the proof in Section 6.2, where (a) the paper appeals to [2] on multiple occasions, including an instance of flipping the order of the min-max; (b) the paper defines a series of new variables and corresponding Lagrange multipliers that are hard to keep track of and whose rationale is not clear. Both of these can be explained better.

Clarity: The paper is very well written, and the theorems, lemmas and corollaries are clearly stated and explained. However, the paper could be improved by providing intuition in a couple of places. For example, (a) when the 6 nonlinear equations are defined, it is not clear what to make of any of those quantities (\alpha, \sigma, \gamma, \theta, \tau, r) or where they come from; (b) the convergence of the numerical method, although mentioned in Remark 2, comes with no intuition on why it works or on the cases in which it does not. This relates to lines 508-509, where a discussion of whether a solution to (67) always exists would be useful. When can the solution be at the boundary, where \alpha, v, r, or \tau = 0?

Significance: The generalized result, as well as the special cases of L1/L2 described in the paper, are significant in extending the theory to regularized LR, which is widely used in practice without an understanding of the new theory for high dimensions, as pointed out in [1].

[1] https://arxiv.org/abs/1803.06964
[2] https://arxiv.org/abs/1601.06233

Reviewer 3

The paper is well structured and presents its results clearly. The results presented are novel. Some questions for clarification are below:
1. Is there an interpretation of the optimal lambda in terms of the model parameters and the chosen regularizer?
2. Any comments on the assumption of uniqueness of the solution to (6), which is used for Theorem 1?
3. Do the results extend to settings where the data X has correlated predictors?
Possible typo: Theorem 1, line 156 (should it have been \beta \in \Pi?).