Paper ID: 66
Title: Transfer Learning in a Transductive Setting
Reviews

Submitted by Assigned_Reviewer_6

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper presents an approach to exploit the local similarity structure for zero-shot or a few-shot problem. The idea is to not only use mid-level representation such as attributes which help in zero-shot problem but also ensure that the unlabeled data is labeled such that similar images receive similar label.

Overall, I like the direction of the paper. I think exploiting graph structure is an interesting idea which hasn't been looked into the zero-shot problem (as far as I know). While there might be multiple approaches to incorporate that, this paper uses the label propagation technique. The experiments in the paper are quite convincing (performed on three different datasets) and I believe authors have incorporated some reasonable baselines.

However, I think paper can be improved..Here are some suggestions and issues:
1. The biggest issue with the paper is qualitative results. Can the authors show some labeling they get without using graph structure and w/ graph structure so that they can show how this helps.
2. I think figure 1 is also not quite easy to understand. Until you read the details of what y and z are it is unclear what to get out of figure 1. I think it can easily be made more accessible with examples.
3. The paper should cite some recent work in SSL which also looks into graph structure based on attributes etc:
Abhinav Shrivastava, Saurabh Singh, Abhinav Gupta.Constrained Semi-Supervised Learning Using Attributes and Comparative Attributes. In ECCV 2012.
Another similar paper to cite is:
Adding Unlabeled Samples to Categories by Learned Attributes
Jonghyun Choi, Mohammad Rastegari, Ali Farhadi, Larry Davis, CVPR 2013
Q2: Please summarize your review in 1-2 sentences
Overall I like the paper. The direction seems intuitive and paper does reasonable set of experiments to convince about the idea.

Submitted by Assigned_Reviewer_7

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper describes how to attack the zero-, one-, or few-shot recognition problem, where we have a fair amount of training data for some classes, but none or very few for some other classes. It does this using three different techniques, all combined in a single framework: using semantically-meaningful mid-layer knowledge (attributes), building a graph on new classes to exploit the manifold structure, and finally by using an attribute-based representation for building the graph structure (rather than low-level features), which improves performance. The method is evaluated on 3 different datasets (Animals with Attributes, ImageNet, and MPII Cooking composites), and shows improved performance on all compared to the state-of-the-art (slightly).

On the plus side, the paper's methods are well-motivated and can fit into the same optimization framework, regardless of which kinds of side- or ground truth-annotations are available. The experiments appear to be conducted fairly and show slight improvements on the existing state-of-the-art (significantly for the MPII dataset). The paper is well-written.

The main negative is simply that it is somewhat surprising that performance is not improved more by doing all of this. I also have some minor suggestions to make the paper a bit easier to read.

- Exploiting the similarities of new classes w.r.t. existing classes seems very similar to the "simile classifiers" of Kumar et al. in their ICCV 2009 paper "Face verification using attribute and simile classifiers."

- line 74: rewrite to "...models do not have to be retrained..."

- Figure 3b: reorder legend to put red curves on top, then blue curves, then black

- Figure 4a: accuracy is misspelled

- supplementary material table 2 caption: underlying is misspelled
Q2: Please summarize your review in 1-2 sentences
Good paper bringing together multiple ideas to do few-shot recognition.

Submitted by Assigned_Reviewer_8

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper builds upon previous transfer learning (zero-shot learning) work [24,23] wherein the idea of "semantic relatedness" was introduced. It augments the previous work by introducing the idea of transductive learning from [33]. By using the visual similarities of unlabeled and labeled samples, a graph structure is built and used to propagate labels to novel classes. Unlike previous work [33], they improve the graph structure by replacing noisy raw input space with more informative attribute space.

Paper is well-written and easy to read. Proposed idea has been validated using relevant experiments. Results do show the benefit of introducing transductive learning into transfer learning framework.

Q2: Please summarize your review in 1-2 sentences
Paper nicely builds upon two previous works i.e., semantic-relatedness transfer learning [24] and label-propagation [33]. Idea introduced is interesting.
Author Feedback

Q1:Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 6000 characters. Note however that reviewers and area chairs are very busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point.
We would like to thank the reviewers for their constructive feedback. All reviewers acknowledge our relevant/convincing experiments which improve performance over state-of-the-art. The reviewers recognize that we present a novel, interesting idea [R6], and use a well-motivated method [R7] which is composed in a well written paper [R7,R8].



R6:
We agree that Figure 1 is suboptimal and that qualitative results would be beneficial. Following these suggestions we are happy to add qualitative results which will also help to illustrate Figure 1 more clearly. More extensive results will be provided as supplemental material.

Additionally R6 suggests citing Shrivastava et al. [ECCV 12] and the very recent work of Choi et al. [CVPR 13]. We will discuss both papers as part of the related work section in the final version.
Similar to us, both use attributes as an intermediate layer and incorporate unlabeled data to improve image classification with reduced training data. Both works are in a classical SSL setting similar to [4], while our setting is transfer learning. This means we transfer knowledge from categories where sufficient training data is available, which allows to recognize novel categories with fewer or no training samples.
More specifically Shrivastava et al. propose to bootstrap classifiers by adding unlabeled data. The bootstrapping is constrained by attributes shared across classes. In contrast, we use attributes for transfer and exploit the similarity between instances of the novel classes.
Choi et al. automatically discover a discriminative attribute representation, while incorporating unlabeled data. This notion of attributes is different to ours as we want to use semantic attributes to enable transfer from other classes.



R7:
R7 notes that “exploiting the similarities of new classes w.r.t. existing classes seems very similar to the simile classifiers” proposed by Kumar et al. [ICCV 09]. The “simile classifiers” are similar to the direct similarity approach from [24], Equation (2). The difference is that they focus on a localized part on the image, useful in face recognition. The “simile classifiers” could replace direct similarity or attribute classifiers in our approach if this information is available for a given dataset. Although this is not the focus of this work, we can discuss this work if the reviewers and area chairs wish.

We like to thank R7 for the detailed comments with respect to formulation and clarity. We will incorporate them in the final version.



R8:
Thanks to R8 for noting that the paper title is incorrect in CMT, we will correct this.