NIPS 2016
Mon Dec 5th through Sun Dec 11th, 2016, at Centre Convencions Internacional Barcelona
Paper ID 1536: On Robustness of Kernel Clustering

### Reviewer 1

#### Summary

This paper shows that in the context of kernel clustering, an SDP relaxation achieves strong consistency and better robustness to outliers than an SVD-based relaxation.

#### Qualitative Assessment

Although the main findings and analysis look promising and are interesting for the community, I am concerned about the following.

Assumptions:

* It is very unrealistic to assume that all clusters have the same size. What happens if this assumption is lifted?
* The number of clusters k is known. This is also unrealistic, since in most real-life problems k is not known.
* The centroids are well-separated. Again, this casts doubt on whether the analysis is valid in real-life problems.

Experiments:

* All experiments are performed on toy problems, and in some cases the parameters seem to be set arbitrarily.
* The same \sigma and the same number of clusters k are used throughout. These parameters are critical in kernel clustering, and the current state of the experiments fails to convince the reader.
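To make the critiqued setting concrete, here is a hypothetical sketch (not the paper's code, and with illustrative parameter values) of toy data satisfying all of the assumptions the reviewer lists: k known, equal-size clusters, well-separated centroids, and a single shared sigma.

```python
# Hypothetical toy setup matching the criticized assumptions:
# k known, equal-size clusters, well-separated centroids, one sigma.
import numpy as np

rng = np.random.default_rng(0)
k, n_per_cluster, d, sigma = 4, 100, 2, 0.5
centroids = rng.uniform(-10, 10, size=(k, d))   # well separated w.h.p.
X = np.vstack([c + sigma * rng.standard_normal((n_per_cluster, d))
               for c in centroids])
labels = np.repeat(np.arange(k), n_per_cluster)
```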

#### Confidence in this Review

2-Confident (read it all; understood it all reasonably well)

### Reviewer 2

#### Summary

This paper introduces a semidefinite programming relaxation for the kernel clustering problem, then proves that under a suitable model specification, both the Kernel-SVD and Kernel-SDP approaches are consistent in the limit, although the SDP is strongly consistent, i.e. it achieves exact recovery, whereas K-SVD is only weakly consistent. Experimental results on toy datasets are presented.
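For reference, a standard SDP relaxation of kernel k-means takes the following form; this is an assumed sketch based on common formulations (Peng and Wei style), and the paper's exact constraint set may differ:

```latex
% A standard SDP relaxation of kernel k-means (assumed form;
% the paper's exact constraints may differ).
\begin{aligned}
\max_{Z \in \mathbb{R}^{n \times n}} \quad
  & \langle K, Z \rangle = \operatorname{tr}(KZ) \\
\text{subject to} \quad
  & Z \succeq 0, \qquad Z \ge 0 \ \text{entrywise}, \\
  & Z \mathbf{1} = \mathbf{1}, \qquad \operatorname{tr}(Z) = k,
\end{aligned}
```

Here K is the kernel matrix and Z relaxes the normalized cluster-membership matrix; strong consistency (exact recovery) means the optimal Z identifies the true clustering exactly in the limit.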

#### Confidence in this Review

1-Less confident (might not have understood significant parts)

### Reviewer 5

#### Summary

The robustness of kernel clustering has not yet been sufficiently explored. The paper proposes two methods for kernel clustering, i.e. a semidefinite programming relaxation of the kernel clustering problem (SDP) and kernel singular value decomposition (K-SVD), and investigates the consistency and robustness of these two kernel-based clustering algorithms.

#### Qualitative Assessment

1. K-PCA is a technique for dimensionality reduction; it is ambiguous for the paper to treat it as a kernel clustering algorithm without any further explanation (line 101).
2. According to the author(s), the paper seems to apply this SDP relaxation to kernel clustering for the first time, yet it merely notes that the same relaxation appears in previous work (line 107), without explaining why doing so is reasonable and effective.
3. The paper does not analyze the robustness of other kernel clustering methods based on semidefinite relaxation (if such methods exist), and thus lacks a comparison showing why the proposed SDP relaxation for kernel clustering is essential.
4. The experimental results on a synthetic data set are not convincing; tests on real data sets are needed.
5. Some terms and notations may be ambiguous without explicit explanation, e.g. "population matrix" (line 136) and "[n+m]" (line 63).

#### Confidence in this Review

2-Confident (read it all; understood it all reasonably well)

### Reviewer 6

#### Summary

The authors provide a robustness analysis of two popular forms of kernel k-means clustering: a semi-definite programming relaxation and a kernel singular value decomposition. The goal is to understand behavior in the presence of (a possibly growing number of) outliers, in particular whether consistency is preserved. They adopt a problem set-up that is well-matched to k-means (e.g. equal-sized clusters) and partition the data into groups of inliers and outliers. The framework is also well suited for scenarios where the data lie on a lower-dimensional manifold that may be obscured. Their theoretical analysis and results are accompanied by a very nice summary, one that will serve audiences not as familiar with the technical details of the proofs. In short, SDP is the winner, showing strong consistency when there are no outliers and also better performance when the number of outliers grows slowly relative to the number of observations. The numerical analyses show general agreement with the theoretical results.

#### Qualitative Assessment

In general, I thought the paper was well-written and made some nice theoretical contributions to this area. What I valued most was the authors' ability to explain the theoretical results conceptually alongside the technical details. This is hard to do and not that common; the paper's audience and influence will be larger for it.

I would have liked to see some more discussion of the assumption that the noise is sub-gaussian and isotropic: something along the lines of how limiting those assumptions might be in practice, and whether some of the results might still apply (via some sort of approximation). In the experiments, the authors use an extra cluster for the outliers, but it seems odd to assume that the outliers would have a distribution or structure similar to the actual clusters. In mixture-model clustering, it is common to assume something like a uniformly distributed cluster to pick up the noise/outliers, or to use a contaminated distribution. Treating all the outliers like a group of the same size and distribution as the rest of the inliers sounds unrealistic.

Minor corrections:

* p.2, line 76: should be "where the data lie on a low-dimensional manifold"
* p.3, line 111: should be "One first does SVD on …, then does k-means on the first k eigenvectors"
* p.5, lines 182/183: inconsistency in algorithm notation
* p.5, line 185: "K-means" should be capitalized (it is on p.7)
* p.8, line 275: should be "K-SVD is doing almost as well"
* p.8, lines 285-286: should be "as the number of observations grows and the outliers are a small fraction of the inliers"
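For context, here is a minimal sketch of the K-SVD pipeline that the line-111 correction describes (kernel matrix, top-k spectral embedding, then k-means). The RBF kernel, the function name, and the use of scikit-learn's KMeans are illustrative assumptions, not the paper's code:

```python
# Minimal K-SVD sketch: build a kernel matrix, take its top-k
# eigenvectors, and run k-means on the rows of the embedding.
# Assumptions: an RBF kernel with bandwidth sigma, scikit-learn's
# KMeans; the paper's exact pipeline may differ.
import numpy as np
from sklearn.cluster import KMeans

def kernel_svd_cluster(X, k, sigma=1.0, seed=0):
    # RBF kernel: K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2.0 * sigma**2))

    # For a symmetric PSD matrix, SVD and eigendecomposition coincide;
    # eigh returns eigenvalues in ascending order, so take the last k.
    _, eigvecs = np.linalg.eigh(K)
    V = eigvecs[:, -k:]                # n x k spectral embedding

    # k-means on the rows of the embedding gives the cluster labels.
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(V)
```

For example, `kernel_svd_cluster(X, k=4, sigma=0.5)` returns one integer label per row of `X`.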

#### Confidence in this Review

2-Confident (read it all; understood it all reasonably well)