NIPS 2018
Sun Dec 2nd through Sat the 8th, 2018 at Palais des Congrès de Montréal
Paper ID: 1825 Tight Bounds for Collaborative PAC Learning via Multiplicative Weights

### Reviewer 1

The problem considered -- the "collaborative PAC learning" setting, recently introduced by Blum et al. to capture settings where one wants to learn a target function with respect *simultaneously* to k different underlying distributions) is quite natural and interesting. This paper nearly settles (effectively settles, for a large range of parameters) an obvious question: " what is the overhead (as a function of k, the number of underlying distributions) in the sample complexity of collaborative PAC learning, compared to vanilla PAC learning?" An easy upper bound is a factor of k. Blum et al. established an upper bound of O(log^2 k) on the overhead factor (for k = O(d), where d is the VC dimension of the concept class to learn), and a lower bound of Omega(log k) (for the specific case k=d). The main contribution of this paper is to provide an O(log k) upper bound (for k=O(d), again; the general upper bound is slightly more complicated for general k) on that ratio; they also generalize the lower bound to hold for any k=d^{O(1)}. The lower bound is aobtained by generalized and "bootstarting" the lower bound construction of Blum et al., with some changes to handle the base case. The main contribution is, in my opinion, the upper bound: where the algorithm of Blum et al. worked in stages, refining the learning by choosing one of the k "players" uniformly at random, this paper uses a natural and more refined approached by cchoosing at each stage one of the k players according to a weighted distribution, where the weights are obtained by a multiplicative-weight-update scheme. My only complaint about the paper is the presentation: namely, the authors spend 2.5 pages on a suboptimal (and very slightly simpler) version of their algorithm, and then run out of space to prove the actual, optimal one. Given the fact that the simpler one is not used by the more complex one, and that the analysis does not really bring anything more (no new technique, and the algorithm itself is not simpler to implement), I'd suggest skipping it altogether and focus directly on the optimal one.