NIPS 2018
Sun Dec 2nd through Sat the 8th, 2018 at Palais des Congrès de Montréal
Paper ID: 3133 Distributed Learning without Distress: Privacy-Preserving Empirical Risk Minimization

### Reviewer 1

This paper proposes a multiparty learning algorithm with two different approaches. In the first approach, the authors combine local models from the different data owners and add noise to the global model before revealing it. In the second approach, the global model is trained jointly by the data owners, and in each iteration a sufficient amount of noise is added to the gradient.

The gradient perturbation method proposed in this paper is similar to the method proposed in “A Differentially Private Stochastic Gradient Descent Algorithm for Multiparty Classification” by Rajkumar and Agarwal. It would be nice to state the differences from that work, and I think it is also necessary to use that approach as a baseline in the experiments section.

One general problem is that the paper refers to the appendix a lot, which makes it harder to follow and understand. Perhaps some of the proofs can be included in the main part of the paper, with only the more complicated mathematical derivations left in the appendix. As it stands, this hurts the readability of the paper.

I also have some doubts about the “G” parameter. Is it possible to set the Lipschitz constant to 1 in this case? If so, how do you prove that? In addition, the number of iterations is important in differentially private settings and affects performance a lot. How did you choose “T”? In most cases MPC Grad P outperforms the other algorithms. How do you explain that it performs better than MPC Output P?

To sum up, this is an important problem, and this paper offers a solution to it.
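For readers unfamiliar with the two approaches summarized at the start of this review, the following is a minimal sketch of the difference between them. It is not the authors' implementation: the MPC aggregation is simulated in the clear, the loss is plain least squares (an assumption made for brevity), `noise_std` is a free parameter that is not calibrated to any privacy budget, and all function names are hypothetical.

```python
import random

def output_perturbation(local_models, noise_std, rng):
    """Average the parties' locally trained models, then add Gaussian noise once."""
    dim = len(local_models[0])
    m = len(local_models)
    avg = [sum(model[k] for model in local_models) / m for k in range(dim)]
    return [a + rng.gauss(0.0, noise_std) for a in avg]

def gradient_perturbation(party_data, dim, steps, lr, noise_std, rng):
    """Joint gradient descent on least squares; noise is added to the
    aggregated gradient in every iteration."""
    theta = [0.0] * dim
    n_total = sum(len(rows) for rows in party_data)
    for _ in range(steps):
        grad = [0.0] * dim
        for rows in party_data:  # in a real protocol this sum would be computed inside MPC
            for x, y in rows:
                err = sum(t * xi for t, xi in zip(theta, x)) - y
                for k in range(dim):
                    grad[k] += err * x[k]
        theta = [t - lr * (g / n_total + rng.gauss(0.0, noise_std))
                 for t, g in zip(theta, grad)]
    return theta

# Hypothetical toy data: three parties with noise-free linear labels.
rng = random.Random(0)
true_theta = [1.0, -2.0]
party_data = []
for n in (50, 80, 120):
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        rows.append((x, sum(t * xi for t, xi in zip(true_theta, x))))
    party_data.append(rows)

# Each party trains locally (here: noise-free gradient descent on its own data).
local_models = [gradient_perturbation([rows], 2, 300, 0.1, 0.0, rng)
                for rows in party_data]

theta_out = output_perturbation(local_models, noise_std=0.01, rng=rng)
theta_grad = gradient_perturbation(party_data, 2, 300, 0.1, noise_std=0.01, rng=rng)
```

In the first function noise enters once, on the final model; in the second it enters T times, once per iteration, which is why the choice of “T” raised above matters for the overall privacy budget.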

### Reviewer 2

The problem under investigation in this work is joint training of convex models over private data. The authors propose to combine MPC with differential privacy. By using MPC, the amount of noise required by the differential privacy procedure can be reduced, and therefore more accurate models can be built.

=======

After reading the authors' response, my evaluation of this work has increased. I do see that there is some novelty in this work, but given that several key references were missing, the paper requires a major revision. Moreover, the amount of novelty is limited compared to what is stated in the current version. While this paper may be accepted, it may have a larger impact if the authors take the time to give the paper a major facelift and add more novel content.

I have several concerns about this work:

1. The novelty of this work is not clear. For example, Chase et al., “Private Collaborative Neural Network Learning”, have already proposed using MPC to reduce the amount of noise required by differential privacy. They proposed a method that scales the noise by the size of the joint dataset, instead of the size of the smallest dataset $n_1$, by jointly computing the sum of the gradients and the number of records instead of having each party compute the average on its own data.
2. From the statements of the theorems it is not clear whether the approach described here provides the right level of privacy. For example, Theorem 3.4 states that $\Theta_T$ is $(\epsilon, \delta)$-differentially private. However, the participants in the joint learning process see $\Theta_1, \Theta_2, \ldots, \Theta_T$. What do these participants learn about the other participants?
3. When adding the noise in the MPC setting, it is important to define how the noise is generated, since the party that generates the noise should not learn more about the data than allowed by the privacy requirements.
4. In Theorem 3.5 the learning rate is tied to the smoothness of the loss function. Why does this make sense?
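To make the noise-scaling comparison in point 1 concrete, here is a small sketch under stated assumptions: every per-record gradient is clipped to L2 norm `clip_norm`, neighbouring datasets differ by replacing one record, and noise is calibrated with the classical Gaussian mechanism (valid for 0 < ε < 1). The party sizes and function names are hypothetical, and this calibration is not necessarily the one used in either paper.

```python
import math

def mean_sensitivity(clip_norm, n_records):
    """L2 sensitivity of the mean of n_records vectors, each with norm at most
    clip_norm, when a single record is replaced."""
    return 2.0 * clip_norm / n_records

def gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian mechanism calibration, valid for 0 < epsilon < 1."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

party_sizes = [500, 2000, 7500]   # hypothetical dataset sizes
m = len(party_sizes)
n_min = min(party_sizes)
n_total = sum(party_sizes)
clip_norm, epsilon, delta = 1.0, 0.5, 1e-5

# Each party averages its own gradients and the m averages are then averaged:
# the worst case is a changed record in the smallest dataset.
sens_local_avg = mean_sensitivity(clip_norm, n_min) / m

# The sum of all gradients and the record count are computed jointly,
# so the mean is taken over the whole joint dataset.
sens_joint = mean_sensitivity(clip_norm, n_total)

sigma_local_avg = gaussian_sigma(sens_local_avg, epsilon, delta)
sigma_joint = gaussian_sigma(sens_joint, epsilon, delta)
noise_ratio = sigma_local_avg / sigma_joint   # equals n_total / (m * n_min)
```

Because the joint dataset size is never smaller than $m \cdot n_1$, the jointly computed mean never needs more noise than the average of per-party means, and the gap grows as the party sizes become more unbalanced.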