NIPS 2017
	Mon Dec 4th through Sat the 9th, 2017  at Long Beach Convention Center
	
	
	
		
		Reviewer 1
		
		Strengths:
- The paper presents a simple, elegant and intuitive approach to few-shot learning.
- The performance is strong compared to much more complex systems that have been proposed in the past.
- I happen to have tried prototypical networks on a low-shot learning setup of my own. I can attest that (a) it took me all of half an hour to implement, and (b) with little tuning it worked quite well, albeit a bit worse than a tuned matching networks implementation.
Weaknesses:
- I am curious how the performance varies quantitatively if the training "shot" is not the same as the test "shot". In realistic applications, knowing the "shot" beforehand is a fairly strong and impractical assumption.
- I find the zero-shot version and the connection to density estimation a bit distracting from the main point of the paper, which is that one can learn to produce good prototypes that are effective for few-shot learning. However, this is more an aesthetic argument than a technical one.
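One way to probe the train/test shot mismatch raised above is to make the shot an explicit parameter of the episode sampler, so that training and evaluation episodes can use different values. The following is a minimal sketch under that assumption; `sample_episode`, its parameters, and the toy label array are hypothetical and not taken from the paper.

```python
import numpy as np

def sample_episode(labels, n_way, n_shot, n_query, rng):
    """Return (support_idx, query_idx) index arrays for one episode.

    Picks n_way classes at random, then n_shot support and n_query
    query examples per class, without overlap between the two sets.
    """
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support_idx, query_idx = [], []
    for c in classes:
        # Shuffle the indices of class c, then split into support and query.
        idx = rng.permutation(np.flatnonzero(labels == c))
        support_idx.extend(idx[:n_shot])
        query_idx.extend(idx[n_shot:n_shot + n_query])
    return np.array(support_idx), np.array(query_idx)

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 20)  # toy data: 10 classes, 20 examples each

# Train with 5-shot episodes, evaluate with 1-shot episodes.
train_support, train_query = sample_episode(labels, n_way=5, n_shot=5, n_query=10, rng=rng)
test_support, test_query = sample_episode(labels, n_way=5, n_shot=1, n_query=10, rng=rng)
```

Sweeping `n_shot` separately for training and evaluation episodes would give the quantitative comparison asked for here.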
		
	
		
		Reviewer 2
		
		This paper proposes a simple extension of Matching networks for few-shot learning. For 1-shot learning, the proposed method and Matching networks coincide. An insightful interpretation as mixture density estimation is also presented.
For 5-shot learning, prototypical networks seem to work better while being more efficient. Also, they can deal with zero-shot classification. Overall, the paper is well written, and the material is well presented. Nothing earth-shattering, but an interesting contribution. 
		
	
		
		Reviewer 3
		
		Summary: This paper addresses the recently re-popularised few-shot classification task. The idea is to represent each class by its mean/prototype within a learned embedding space, and then to recognise new classes via a softmax over distances to the prototypes. The model is trained by randomly sampling classes and instances per episode. This is appealing due to its simplicity and speed compared to other influential few-shot methodologies [21,28]. Some insights are given about the connection to mixture density estimation in the case of Bregman divergence-based distances, linear models, and matching networks. The same framework extends relatively straightforwardly to zero-shot learning by making the class prototype the result of a learned mapping from meta-data, such as attributes, to the prototype vector. The model performs comparably to or better than contemporary few/zero-shot methods on Omniglot, miniImageNet, and CUB.
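A minimal sketch of the classification rule summarised above, assuming the embeddings have already been computed (the learned embedding network is omitted) and using squared Euclidean distance. The function names `prototypes` and `class_probabilities` and the toy data are illustrative, not the authors' code.

```python
import numpy as np

def prototypes(support_emb, support_labels, num_classes):
    """Mean embedding per class; support_emb has shape (n_support, dim)."""
    return np.stack([
        support_emb[support_labels == k].mean(axis=0)
        for k in range(num_classes)
    ])

def class_probabilities(query_emb, protos):
    """Softmax over negative squared Euclidean distances to each prototype."""
    # Broadcast to shape (n_query, n_classes, dim), then reduce over dim.
    diffs = query_emb[:, None, :] - protos[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=-1)
    logits = -sq_dists
    # Subtract the row-wise maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

# Toy 2-way, 2-shot episode in a 2-D "embedding" space (assumed values).
support = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]])
labels = np.array([0, 0, 1, 1])
protos = prototypes(support, labels, num_classes=2)

queries = np.array([[0.0, 1.5], [5.0, 3.0]])
probs = class_probabilities(queries, protos)
predictions = probs.argmax(axis=1)
```

During training, the negative log of the probability assigned to the true class would serve as the per-query loss for each sampled episode.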
Strengths/Weaknesses:
+ Intuitive and appealingly elegant method, that is simple and fast.
+ The authors provide several interpretations that draw connections to other methods and help the reader understand the approach.
+ Some design choices are well explained, e.g. Euclidean distance outperforms cosine for good reason.
+ Good results
- Some other design decisions (normalisation, number of training classes per episode, etc.) are less well explained. How much of the good results is due to the proposed method per se, and how much to tuning these choices?
- Why the zero-shot part specifically works so well should be better explained.
Details:
- Recent work also reports 54-56% on CUB (Changpinyo et al., CVPR’16; Zhang & Saligrama, ECCV’16).
- This may not necessarily reduce the novelty of this work, but using the mean of feature vectors from the same class has already been proposed to improve the training of general (not specifically few-shot) classification models [A Discriminative Feature Learning Approach for Deep Face Recognition, Wen et al., ECCV 2016]. The “center” in that paper corresponds to the “prototype” here. This connection should probably be cited.
- For the zero-shot results on the CUB dataset (Table 3, page 7), the meta-data used are attributes. This is good for a fair comparison. However, from the perspective of getting better performance, better meta-data embedding options are available; see Table 1 in “Learning Deep Representations of Fine-Grained Visual Descriptions”, Reed et al., CVPR 2016. It would be interesting to know the performance of the proposed method when it is equipped with better meta-data embeddings.
Update: Thanks to the authors for their response to the reviews. I think this paper is acceptable for NIPS.