
Submitted by Assigned_Reviewer_37
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper proposes three schemes to speed up the AP-SVM algorithm originally proposed in [12]. All the schemes are very interesting, and they could be very useful for solving large-scale AP-SVM problems.
However, the results are not comprehensive.
1) It would be interesting to see the speedup obtained by combining all three tricks. 2) It would be interesting to see a demonstration of the algorithm on large-scale ranking tasks.
In Table 2, if the percentage (e.g., k=75%) is the portion of hard examples, wouldn't k=75% be the case closest to AP-SVM? But the results seem to indicate that k=25% is the closest to AP-SVM.
Q2: Please summarize your review in 1-2 sentences
An interesting paper, but the results are not comprehensive.

Submitted by Assigned_Reviewer_42
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper proposes three different approaches to speed up learning for AP-SVM (a structured SVM with AP loss). 1. The first approach exploits the unimodal behavior of the loss function, using binary search to speed up the loss computation. 2. The second approach notes that there are usually many more negative samples than positive ones, and that in computing the loss the relative rank of negative samples falling between two consecutive positives does not matter. It therefore speeds up inference by identifying all negatives between two consecutive positives and assigning them all the same rank. 3. Finally, it approximates training by using the 0-1 loss for easy examples (easy under a 0-1 loss SVM).
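The first trick can be illustrated with a generic binary search over a unimodal sequence (a sketch only, under the assumption that the loss as a function of a candidate position is unimodal; `argmax_unimodal` and the toy objective are hypothetical, not the authors' actual inference code):

```python
def argmax_unimodal(f, lo, hi):
    """Find the argmax of a unimodal sequence f over integers lo..hi.

    Compares f at adjacent points to decide which half contains the peak,
    so only O(log n) evaluations are needed instead of the O(n) of a
    linear scan.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if f(mid) < f(mid + 1):   # still ascending: the peak lies to the right
            lo = mid + 1
        else:                     # descending or at the peak: keep the left half
            hi = mid
    return lo

# Toy unimodal objective peaking at index 37 of 0..99.
best = argmax_unimodal(lambda i: -(i - 37) ** 2, 0, 99)
```

Here each call to `f` stands in for evaluating the loss-augmented objective at one candidate rank; the unimodality assumption is what licenses discarding half the range per step.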
In my view, all three tricks are interesting, clever, and effective in speeding up the training. They are described clearly.
When working with big data, speeding up training is an important problem, so the NIPS audience will be interested in this topic.
I think the evaluation could be more thorough. In particular, it would be important to try the algorithm on a different problem and with a larger number of training samples. For instance, as I understand it, the number of negatives in this task is not that large. There are other tasks, such as object detection, where every possible bounding box in the image is considered a negative sample, so the number of negative samples can be extremely large. I would suggest applying the algorithm to one of those tasks to show its real power.
In Table 1, I expected AP-SVM-SELECT to be faster than AP-SVM-SEARCH, but this is not the case. It might hold in another experiment with a larger number of examples.
Also, in Fig. 2 (right), I expected that with an increasing number of positive examples AP-SVM-SELECT would lose its advantage and become slower than AP-SVM-SEARCH, since the gaps between positives become small, but this is not the case in the figure.
In Fig. 2 (middle), I am surprised that the binary SVM is slower than AP-SVM-SEARCH. I thought the only difference is in calculating the loss, and calculating the 0-1 loss is trivial. Maybe I didn't understand it completely.
I think these should be discussed and explained if possible in the text.
Is it possible to combine the first two tricks, or even all three? What would we expect to see? Line 21 of the abstract says they are complementary, so it would be nice to show this.
Minor points: In line 431, "hard" should change to "easy" to be consistent with line 344. Am I right? In line 334, in "is less than 1", I think "1" should change to "-1".

Q2: Please summarize your review in 1-2 sentences
I like the main idea of this work. My only problem is with the evaluation and discussion, so the rebuttal may be helpful.

Submitted by Assigned_Reviewer_44
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
In this paper, the authors propose several computational optimizations of the existing AP-SVM algorithm [Yue et al.]. The major drawback of that algorithm is the computational cost of enforcing the constraints of the optimization problem. The authors propose two exact methods (i.e., methods that give the same result as AP-SVM): the first improves the search for the most violated constraint, and the second uses the fact that the negative samples only need to be ranked relative to the positive ones. The approximate method relies on the fact that some samples are easy to classify with a binary SVM, so AP-SVM can be optimized only on "difficult" samples; this leads to a two-step inference procedure but drastically reduces the computational cost of the optimization.
The reading of this paper suffers from the fact that the authors rely heavily on the paper by Yue et al. and do not fully introduce all the definitions and context. For instance, the notation \Delta(R^*, R) in eqs. (3) and (4) is not defined. The proposed computational optimizations are correctly described and justified. The fact that the first two methods are exact is correctly explained. The approximation made for the last method seems reasonable, and the experiments are convincing.
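For readers without [Yue et al.] at hand, \Delta(R^*, R) there denotes the AP loss, i.e. one minus the average precision of the predicted ranking R with respect to the ground-truth ranking R^*. A minimal sketch (`ap_loss` is our hypothetical helper, not code from the paper):

```python
def ap_loss(relevance_in_ranked_order):
    """AP loss = 1 - AP(R): one minus the average precision of a ranking.

    `relevance_in_ranked_order` lists binary labels (1 = positive) in the
    order the model ranks the samples, best first. AP averages the
    precision measured at the rank of each positive sample.
    """
    num_pos = sum(relevance_in_ranked_order)
    if num_pos == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevance_in_ranked_order, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank   # precision at this positive's rank
    return 1.0 - precision_sum / num_pos

# A perfect ranking (all positives above all negatives) incurs zero loss.
loss = ap_loss([1, 1, 0, 0])
```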
Overall, I am convinced that the computational improvements are significant while the error performance stays equal or comparable to the initial algorithm. However, I think that the scope of this paper is quite restricted, as methods like LambdaMART or RankNet are much more popular than ranking SVMs.
Q2: Please summarize your review in 1-2 sentences
The quality of the paper and of the results is good, but the scope of the contribution is restricted.

Q1: Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 6000 characters. Note however, that reviewers and area chairs are busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point.

We thank the reviewers for their insightful comments, which will help improve the quality of the paper.
Our paper proposes three novel ideas to speed up loss-augmented inference for training AP-SVM [12]. The first, SEARCH, is guaranteed to obtain an asymptotic decrease in computational complexity compared to the method of [12]. The second, SELECT, uses the fact that many negative samples can share the same interleaving rank and thus avoids the expensive sorting step of [12]. While its worst-case complexity is higher than that of [12], in practice this strategy results in a speedup. The third, APPX, approximates the AP loss with a combination of the 0-1 loss over easy samples and the AP loss over hard samples. Experiments on the VOC 2010 action classification dataset confirm that our ideas result in a significant speedup.
All the reviewers agree that the proposed ideas are interesting and effective, and that the paper describes them clearly. A common suggestion from the reviewers is to show more empirical evidence, which we provide below. We also answer the specific questions raised in the reviews.
Q: Results for combining all ideas? (AR37, AR42)
The timing results for all three ideas individually, and for their combination, on the VOC action classification dataset are as follows:
[12]: 0.5660 sec
SEARCH: 0.0671 sec
SELECT: 0.0404 sec
APPX (k=50%): 0.2341 sec
ALL: 0.0251 sec
Please note that the SELECT results have improved considerably. This is achieved by restricting the SELECT strategy to negative samples with interleaving rank = P (those ranked below all positives). In our experiments, we found that in later iterations of the optimization a large number of negative samples have interleaving rank = P, while the number of negatives that share other interleaving rank values decreases considerably.
As can be seen, the three ideas are complementary since the time for ALL is less than the time for the individual ideas.
Q: Results for a large-scale problem? (AR37, AR42)
As suggested by the reviewers, we evaluated our ideas on the task of object detection using latent AP-SVM on the VOC 2007 detection dataset. We used the features and hyperparameter values provided by the authors of the following publication: http://cvit.iiit.ac.in/projects/lapsvm/Research_files/lapsvm_journal.pdf Due to time constraints, we only tested the exact methods, SEARCH and SELECT, for which no further cross-validation was required. In the limited time, we could complete experiments for 15 of the 20 classes. The timing results, averaged over the 15 classes, are as follows:
[12]: 6.006 sec
SEARCH+SELECT: 0.3180 sec
We will include the new results in the paper to provide stronger support for our ideas.
Q: Other ranking algorithms? (AR44)
The following survey paper reports comparable performance for LambdaMART and ranking SVMs: Busa-Fekete, R., et al. "An apple-to-apple comparison of Learning-to-rank algorithms in terms of Normalized Discounted Cumulative Gain." ECAI 2012 - 20th European Conference on Artificial Intelligence. Vol. 242. 2012.
Furthermore, our speedups are not only applicable to AP-SVM; they can be used with any ranker that minimizes an upper bound on the AP loss. One such scenario is mentioned in the discussion section of our paper. Specifically, we can replace the final layer of a highly successful convolutional neural network so that it minimizes the AP loss instead of the softmax loss. In this case, backpropagation requires solving the loss-augmented inference problem to compute the subgradient, which can be sped up significantly using our ideas.
Q: Performance of SELECT vs. SEARCH (AR42)
The effectiveness of SELECT depends heavily on how many negative samples share the same interleaving rank. As mentioned above, we observed that the number of negative samples with interleaving rank = P increases substantially over the iterations, while it decreases for the other interleaving rank values. By applying SELECT only for the value P, we were able to substantially improve its performance.
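To illustrate why many negatives can share a rank (a generic sketch; `interleaving_ranks` is a hypothetical helper, not the authors' implementation): the interleaving rank of a negative sample is simply the number of positives scored above it, so all negatives falling between the same two consecutive positives share a value, and every negative scored below all P positives shares the rank P.

```python
import bisect

def interleaving_ranks(pos_scores, neg_scores):
    """For each negative sample, count how many positives outscore it.

    Negatives falling between the same two consecutive positives receive
    the same interleaving rank, so they never need to be sorted among
    themselves; negatives below all positives all get rank P.
    """
    pos_sorted = sorted(pos_scores)  # ascending
    P = len(pos_sorted)
    # rank = number of positives with a strictly higher score
    return [P - bisect.bisect_right(pos_sorted, s) for s in neg_scores]

# Two positives (P = 2): negatives scored below both share rank 2.
ranks = interleaving_ranks([3.0, 1.0], [2.0, 0.5, 2.5, -1.0])
```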
The results shown above indicate that the improved SELECT strategy is more effective than SEARCH for the action classification dataset. As suggested by the reviewer, we will include our observations about the SELECT strategy in the paper.
Q: Binary SVM slower than SEARCH? (AR42)
While each loss-augmented inference step for the binary SVM is significantly more efficient than for AP-SVM, in some cases we observed that the binary SVM required more cutting-plane iterations to converge. This is why the binary SVM is sometimes slower than training AP-SVM with our proposed speedups. We will report the number of iterations and the average time per iteration in the paper to clarify this point.
Q: 'k' in Table 2, line 431? (AR37, AR42)
The reviewers correctly point out that k is the percentage of easy samples.
Q: Mistake in line 334? (AR42)
The reviewer correctly points out that this should be "less than -1" instead of "less than 1". We will correct this and the above mistake in the new version of the paper.
 