NIPS 2018
Sun Dec 2nd through Sat the 8th, 2018 at Palais des Congrès de Montréal
Paper ID: 5689 The Sample Complexity of Semi-Supervised Learning with Nonparametric Mixture Models

Reviewer 1

The paper presents a theoretical analysis of a particular class of semi-supervised algorithms. In this setting the unlabeled data is used to estimate a mixture model $F$ defined by it's mixing measure $\Lambda$; under the assumption that the number $K$ of mixture components is equal to the number of classes (which is a rather strong assumption), the labeled data data is then used to find a permutation $\pi$ of $\{ 1 \dots K \}$ which minimizes the loss of the classifier $g_{\Lambda,\pi} : \mathcal{X} \rightarrow \mathcal{Y}$. In particular the paper examines both the maximum likelihood and majority vote estimators. The presented work decouples (lines 118-122) the analysis of $\Lambda$ focusing on the second step of the algorithm  (lines 77 -81), namely the estimation of $\pi$ given $\Lambda$ and provide sample complexity bounds that relate $P(\hat{\pi} = \pi^*)$ to the number of classes, and mixture components, $K$ and the 'gap' $\Delta$ (equation (6),(7)).  The authors show how the problem of finding the optimal perturbation for a given $\Lambda$ can be, in the case of MLE, be formulated as a bipartite graph matching problem, and be solved using the Hungarian Algorithm, also presenting a greedy, approximate, algorithm for solving the problem (BP-matching). Finally an empirical analysis of the three algorithms is presented in section 6 using both synthetic data (Gaussian) and 'real' data (MNIST). The paper is generally well written, however I do not feel the authors situated there work well within the prior art. The present work seems to be mainly contrasted with the work of Castelli and Cover, and though some generally grouping of other works is presented in lines 110-122, there is no sense of where exactly the work lays in relations to these  (beyond some general positioning). Is the present work the only work that analyses $\pi$ and its relation $\Lambda$?  Of interest here would also be whether or how known SSL algorithms can be analyzed  within this framework. Partly in view of this I struggle to assess the significance of the presented results. It would seem to me that the authors are able to analyze a more general class of mixture models without underlying assumptions, exactly because they eschew analyzing the estimation of $\Lambda$ (and its relation to $\Lambda^*$) and focus on $\pi$ for a given $\Lambda$. Lines 158-160 would seem to support this argument.  But even given this, I am still unsure of the significance of the elicited conditions on $\Lambda$. The authors mention that the 'ease' of learning $\pi$ is related to the quantity $\Delta$ and mention that a 'crucial condition' (line 152) is $\Delta>0$.  Here I am not sure I understand. It is obvious that for $\Delta \leq 0$, the bound in Theorem 4.1 becomes meaningless, however it is not clear to me that this translates to it being a necessary condition. Perhaps I am not understanding the meaning of 'crucial' but I feel this is an important point, if there is some prior art on how 'learnability' relates to the gap $\Delta$ this would be helpful, particularly given the fact that the condition $\Delta>0$ is subsequently accepted as a 'weak' assumption. It is also not clear why this assumption is considered 'weak'.  Proposition 4.2 does not help back up this claim as it based on the assumption that $\Lambda = \Lambda^*$ which is a very strong, in fact unrealistic, assumption. I have similar concerns regarding the analysis in section 4.2. Regarding section 4.3, could the authors comment on the assumption $\Delta \in \Omega(1)$, the assumption that the gap is not related to $K$ the number of mixture components/classes does not make intuitive sense to me. Is there perhaps prior art here? Finally, as the main focus and strength of the paper is the theoretical analysis, I do not feel the empirical analysis, though interesting, provides any particular insight into the analysis. It could perhaps be better fitted in the appendix with more space in the main paper to analyze the theoretical results. # Post response In response to authors' feedback and their willingness to revise the paper as per reviewers' input, I am updating my overall score for this submission.