Reviews: High Dimensional Structured Superposition Models

NIPS 2016
Mon Dec 5th through Sun the 11th, 2016 at Centre Convencions Internacional Barcelona

Paper ID:	1834
Title:	High Dimensional Structured Superposition Models

Reviewer 1

Summary

High dimensional superposition models have attracted much interest in parametric estimation. In these models the parameter of interest can be written as sum of multiple components, each with its own structure. The structure of component i can be captured by a norm Ri(.). A simple estimator has been proposed, which is essentially minimizing quadratic loss subject to constraints of the form "norm Ri of component i is less than some value". A condition called structural coherence (SC) has been proposed which considers interaction between different components (cone) structures. SC is intimately related to a generalization of restricted eigenvalue (RE) property to superposition models. Borrowing tools from empirical processes and generic chaining, the proposed reconstruction algorithm is analyzed and the l_2 error of components is bounded in terms of sample size, ambient dimension and maximum Gaussian width of structure cones.

Qualitative Assessment

The results are interesting and novel. The paper is well presented and clear. Two comments: 1) Explanation above (11) is a bit confusing. It is assumed that the error term \Delta is in the set sH. (Hence sum of L2 norms of component errors is s). Why is it the case? Especially that the upper bound in (11) is in term of s. It looks like an argument loop as stated in the current form. 2) Using the same tools, can you obtain upper bound on Lp norm of errors? Specifically, for Theorem 6 (Equation 25), can you derive bound for Lp norm instead of L2, say for p >= 1?

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)

Reviewer 2

Summary

Superposition models consider signals, which are sums of multiple components, each of which follows some structural model. This question has been studied in multiple works, specifically for just two such components. The paper at hand studies the setup of more than two components. The core result is that they establish a structural coherence parameter for the components to be separated show that this property meaningfully describes to which extent separation is possible in a subgaussian measurement model.

Qualitative Assessment

The structural coherence property established in the paper is new as far as I can see and I can see that it could be a useful contribution to the literature. However, there is no sufficient detail on how far the condition goes beyond what is known and whether these new scenarios are relevant. The proof techniques, mainly the Mendelson small ball property with some convex analysis are appropriate, but not novel. The examples presented are mainly with just two components, which matches what has been studied before. Also in these cases, there is no discussion, whether for the concrete examples discussed the existing results are directly recovered from the theorem of this paper, and if not, if the results are better or worse in terms of statistical complexity. Another unfortunate aspect that made the paper hard to read was that apparently it did not compile well, so the references do not at all match the numbers in the bibliography, which makes it impossible to judge whether the description of the previous work is complete and accurate.

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)

Reviewer 3

Summary

Qualitative Assessment

The recovery problem consists of parameters from multiple domains. This problem is a generalization of sparse recovery, low rank recovery, so on. The authors first introduce a geometric condition, structure coherence (SC) to certify the parameters associated with different norms. From this SC condition, they discuss the restricted eigenvalue (RE) condition. They also design an error bound and apply it to some practical problems. Appendix A presents an APG algorithm, however, the stepsize search for such complicated norm domains should be very challenging. Instead of APG, ADMM approach should be more systematic to handle multiple norm domains, the convergence and efficiency should be also better.

Confidence in this Review

1-Less confident (might not have understood significant parts)

Reviewer 4

Summary

This paper proposes a new convex estimatorfor superposition of different structures such as low rank matrices + sparse matrices.

Qualitative Assessment

This paper analyzes the estimation properties of a convex estimator. The analysis follows in the same steps as previous works by analyzing a restricted eigenvalue type ondition. The main technical work is in analyzing gaussian widths and lower bounding the restricted eigenvalue for different random models. These type of obunds are known in the literature, but the authors may have sharper analysis. I did not carefully compare it to known results.

Confidence in this Review

1-Less confident (might not have understood significant parts)

Reviewer 5

Summary

The authors study superposition models and their recovery from noisy linear samples. They analyze the scenario where we observe linear samples of linear superpositions of k structured parameters and obtain robust estimation conditions based on standard tools such as restricted strong convexity.

Qualitative Assessment

The authors study superposition models and their recovery from noisy linear samples. This topic is very well studied in recent years and there are multiple high impact papers characterizing the behavior of such models. In this sense, the technical contribution is not novel enough, and in particular, many of the results can be derived with already existing tools or are straightforward. Most importantly, though, the claims of this paper are misleading. Authors first state Eq 3 which would be a great result for the high-dimensional statistics literature. Here, we have $k$ structured parameters, and once we demix them, the total estimation error grows as the worst parameter \max_i omega(C_i). This is great because, what I would expect is " \sum_i omega(C_i)" which is k times worse. In reality, we can never achieve \max_i omega(C_i) for reasonable problems of interest in the demixing (or superposition) literature. In particular, the proposed SC condition (9) have a additional sqrt(k) constant on the left side for any reasonable demixing assumption. The reason is that as soon as we introduce a mild randomness to one of the parameters, the cones C_i are approximately orthogonal and l2 norm will not linearly add up. The only interesting problem for which SC condition holds for a constant is when all the parameters are same! I don't think that the authors are intentionally hiding the impact of k (maybe they didn't notice). However, as it stands, this paper claims to reduce a quantity that grows as k to log(k) which is misleading. I recommend them they emphasize the impact of k cleanly (e.g. at Eq 3 as well as when \rho is first introduced.

Confidence in this Review

3-Expert (read the paper in detail, know the area, quite certain of my opinion)

Reviewer 6

Summary

This paper studied the error bounds for least square method for superposition models. Restricted eigenvalue condition is introduced to characterize the condition on the design matrix and noise design interaction is used to describe the upper error bound. The relationship between restricted eigenvalue condition and structural coherence is discussed.

Qualitative Assessment

The paper used restricted eigenvalue condition to characterize the error bound of superposition models and discussed the relationship between the restricted eigenvalue condition and structural coherence. These results are sufficient novel for publication. The paper, however, also have two primary weak points. First, the setting of the problem seems impractical. The requirement of alpha_i=R_i(theta_i^*) is too strict because the true model theta_i^* is unknown. Second, The probabilistic bound in Theorem 6 does not imply convergence in probability because the "high probability" (in supplemental material) does not converges to 1 as the sample size n goes to infinity. It is not sufficient to address the theoretical guarantee of recovery. Minor comments: In Theorem 6, "estimator (3)?" Something should be wrong here because equation (3) does not define an "estimator". Since "High probability" usually refers to convergence in probability and the result of Theorem 6 does not imply convergence in probability, it is suggested to state the probability explicitly to avoid misleading.

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)