NIPS 2018
Sunday, December 2nd through Saturday, December 8th, 2018, at the Palais des Congrès de Montréal
Paper ID: 2068. Title: Bilevel Distance Metric Learning for Robust Image Recognition

### Reviewer 1

In this manuscript, the authors propose to learn low-level and high-level features simultaneously. By learning sparse low-level features, the authors claim the features become more robust for the metric learning process. My concerns are as follows.

1. The proposed method may be inefficient for real-world applications. Both the number of examples and the feature dimensionality in the experiments are small. Note that when updating $Z$, the authors have to enumerate the triplet constraints, whose number is cubic in the number of training examples. When projecting the updated metric back onto the PSD cone, the cost is cubic in the number of low-level features. This makes the algorithm hard to apply to real applications. In the test phase, the proposed algorithm also has to compute a sparse code for each example.
2. The authors didn't mention the size of the dictionary used in the experiments. The size of the metric is quadratic in the number of low-level features, so if the dictionary is too large, which is ubiquitous in image classification, the proposed method can be impractical. Besides, the authors should report more experimental settings, e.g., $k$ in k-NN, the number of iterations of LMNN, $k$ in LMNN, etc.
3. The problem in Eqn. 6 can be nonconvex, since both the metric $M$ and $Z$ are variables to be solved. So the analysis in Section 2.5 is suspect.

After the rebuttal:

1. The authors apply 1-NN to alleviate the large-scale problem, which is not convincing.
2. The setting in the experiments is not common for DML. Besides, a codebook of size 120 is too small for meaningful sparse coding.
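The cubic PSD-projection cost raised above comes from the eigendecomposition underlying the standard projection onto the PSD cone. A minimal sketch of that step (a generic illustration using NumPy, not the paper's implementation; `project_psd` is a name chosen here):

```python
import numpy as np

def project_psd(M):
    """Project a symmetric matrix onto the PSD cone.

    The eigendecomposition below costs O(d^3) for a d x d metric,
    which is the cubic-in-dictionary-size cost the review refers to.
    """
    M = (M + M.T) / 2.0          # symmetrize against numerical drift
    w, V = np.linalg.eigh(M)     # O(d^3) symmetric eigendecomposition
    w = np.clip(w, 0.0, None)    # zero out negative eigenvalues
    return V @ np.diag(w) @ V.T

# A small indefinite matrix becomes PSD after projection:
# the negative eigenvalue is clipped to zero.
M = np.array([[2.0, 0.0],
              [0.0, -1.0]])
P = project_psd(M)
```

Since every iteration of a projected-gradient-style metric update repeats this step, a large dictionary (hence a large metric) makes each iteration expensive on its own, independent of the cubic triplet enumeration.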

### Reviewer 2

This paper is not in my area of expertise.

### Reviewer 3

This paper proposes a bilevel distance metric learning method for robust image recognition. A set of experiments validates that the proposed method is competitive with state-of-the-art algorithms on the robust image recognition task.

The main idea of this paper is novel; to my knowledge, this is the first work on bilevel metric learning. In the proposed bilevel model, the lower level characterizes the intrinsic data structure using graph-regularized sparse coefficients, while the upper level forces samples from the same class to be close to each other and simultaneously pushes those from different classes far apart. Thus, the proposed model combines the advantages of metric learning and dictionary learning. The bilevel model is optimized via ADMM. During the optimization, the authors convert the lower-level constraint in Eq. (3) into an equivalent one in Eq. (5) using the KKT conditions, which is a good and interesting idea.

This paper is well organized. The motivations for the proposed model and algorithm are articulated clearly, and the derivations and analysis of the algorithm are correct. The experiments on image classification with occlusion/noise are interesting, because few existing papers on metric learning can effectively address these difficult cases; I think the main reason is that the proposed method unites the advantages of dictionary learning models. The experimental results show that the proposed model outperforms the other existing methods.

A few additional comments:

1. The experimental settings should be further clarified.
2. There are several parameters in the proposed model, so the authors need to discuss the influence of each parameter on the model's performance.

I have read the authors' rebuttal and the other reviews. I think this work is solid, so I will keep my score and vote for acceptance.
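The upper-level objective described in this review (pulling same-class samples together while pushing different-class samples apart) is, in spirit, a triplet hinge loss on a Mahalanobis distance. A minimal illustrative sketch under that assumption (not the authors' exact formulation; function names are chosen here):

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance: (x - y)^T M (x - y), with M PSD."""
    d = x - y
    return float(d @ M @ d)

def triplet_hinge(anchor, pos, neg, M, margin=1.0):
    """Hinge loss encouraging d(anchor, pos) + margin <= d(anchor, neg)."""
    return max(0.0, margin + mahalanobis_sq(anchor, pos, M)
                           - mahalanobis_sq(anchor, neg, M))

# With the identity metric, a nearby same-class positive and a distant
# different-class negative satisfy the margin, giving zero loss.
M = np.eye(2)
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # same-class neighbor
n = np.array([3.0, 0.0])   # different-class sample
loss = triplet_hinge(a, p, n, M)
```

In the bilevel setting described above, such a loss would act on the lower-level sparse codes rather than on the raw features, with $M$ constrained to the PSD cone during optimization.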