
Submitted by
Assigned_Reviewer_1
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
This paper provides a generic way to perform parameter
tuning of a given training algorithm in a differentially private manner.
Given a set of examples (for training and validation), a set of model
parameters, a training algorithm, and a performance measure, the proposed
procedure outputs a differentially private hypothesis with respect to
prescribed privacy parameters. The basic idea behind the procedure is
the definition of (\beta_1, \beta_2, \delta)-stability, which describes
the stability of the performance with respect to change in the training
set and the validation set. The procedure basically follows the
exponential mechanism, and the utility bound is also provided.
At
each step, the gradient is added to a tree such that each node of the tree
represents the differentially private partial sum of the values held by its
descendant nodes. When executing the differentially private FTAL, the
learner issues a query to the tree to learn a differentially private
partial sum of the previous gradients. Since only O(log T)
accesses are needed to compute a partial sum, the variance of the noise
added to each value is O(log T).
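The tree-based counter described above can be sketched as follows. This is a minimal illustration of the standard binary-tree counter mechanism; the class name, noise calibration, and structure are illustrative simplifications, not the paper's implementation:

```python
import math
import random


class TreeCounter:
    """Sketch of the binary-tree mechanism for differentially private
    prefix sums. Each of the ~log T tree levels stores noisy sums of
    dyadic blocks; a prefix sum then combines O(log T) nodes."""

    def __init__(self, T, eps):
        self.eps = eps
        self.levels = max(1, math.ceil(math.log2(T)))
        self.noisy = {}   # (level, block_index) -> noisy block sum
        self.values = []  # raw stream (kept here only for the sketch)

    def _laplace(self, scale):
        # inverse-CDF sampling of a Laplace(0, scale) variate
        u = random.random() - 0.5
        return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

    def append(self, x):
        """Add the next stream value and finalize every dyadic block
        ending at the current position. Each value touches at most
        levels+1 nodes, so Laplace scale (levels+1)/eps suffices."""
        self.values.append(x)
        t = len(self.values)
        for lvl in range(self.levels + 1):
            size = 1 << lvl
            if t % size == 0:
                block_sum = sum(self.values[t - size:t])
                noise = self._laplace((self.levels + 1) / self.eps)
                self.noisy[(lvl, t // size)] = block_sum + noise

    def prefix_sum(self, t):
        """Noisy sum of the first t values from O(log t) dyadic blocks."""
        total, pos = 0.0, 0
        for lvl in reversed(range(self.levels + 1)):
            size = 1 << lvl
            if pos + size <= t:
                total += self.noisy[(lvl, (pos + size) // size)]
                pos += size
        return total
```

With a very large eps (i.e. almost no noise), the noisy prefix sums track the true sums, which is the sanity check that the dyadic decomposition is correct.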
The authors provide two
algorithms, one for the full information model and the other for the
bandit setting. In the full information setting, the proposed algorithm
improves the regret bounds. The regret bound in the full information model
with strongly convex and L-Lipschitz cost functions is significant; O(poly
log T) is achieved, as FTAL does. In the bandit setting, the authors
introduce a technique called "one-shot gradient" to evaluate the gradient
and show regret bounds in several settings; the regret bound is optimal
with respect to T when the cost function is strongly convex and
L-Lipschitz.
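For intuition, the one-shot (one-point) gradient idea can be sketched as follows. This is a generic Flaxman-et-al.-style single-evaluation estimator, not the paper's exact construction; the function name and parameters are illustrative:

```python
import math
import random


def one_point_gradient(f, w, delta):
    """One-point ("one-shot") gradient estimate for the bandit setting:
    a single evaluation f(w + delta*u), with u uniform on the unit
    sphere, yields an unbiased estimate of the gradient of a smoothed
    version of f. Illustrative sketch only."""
    d = len(w)
    # sample u uniformly from the unit sphere in R^d
    g = [random.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    u = [x / norm for x in g]
    value = f([wi + delta * ui for wi, ui in zip(w, u)])
    return [(d / delta) * value * ui for ui in u]
```

For a linear cost function the estimator is unbiased for the true gradient, since E[u u^T] = I/d for u uniform on the sphere; averaging many estimates recovers the gradient.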
The clarity can be improved. It is hard to follow
the problem settings. It is not clearly stated which private instances
are to be protected by differential privacy. The definitions of the oblivious
adversary and the adaptive adversary in the bandit setting are unclear,
too.
Line 107: What is "stronger setting"? Does this mean "with
stronger assumptions"? Lines 229-232: The sentence was hard to follow.
Line 305: This technique was referred to as one-point gradient, not as
one-shot gradient.
Q2: Please summarize your
review in 1-2 sentences
A significant improvement of the regret bounds for
differentially private online learning in the full information model is
shown.
Submitted by
Assigned_Reviewer_7
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
This work provides new algorithms for differentially
private online learning in basically all settings where results are known
in the non-private case. The results very closely parallel (with some
additional small factors and dependences) the best non-private bounds.
There has been significant recent interest in differentially private
learning, and this paper gives a much more comprehensive and compelling
suite of results than previous work.
The approach is actually
quite simple at its heart: do follow-the-approximate-leader using a
private (noisy) history that you maintain using standard algorithms for
maintaining a differentially private counter. The natural suitability of
follow-the-leader algorithms for privacy has been observed before, but the
treatment here and the actual bounds obtained are very nice.
I'd
prefer that you describe this work as giving a class of private learning
algorithms, or a technique for constructing such algorithms, rather than a
"general technique for making online algorithms differentially private"
(since it's not the case that you can take an arbitrary online algorithm
and make it private using this technique, right?).
The practice
of citing references without listing authors made this paper hard to
read: I was constantly flipping to the bibliography, since in many cases
knowing the reference was important to understanding the ideas (e.g. in
cases where the text did not make sufficiently clear whether the
referenced work was in the private or non-private case).
In Table
1, it's probably worth reminding the reader that the delta in the first
column comes from (eps, delta)-DP.
Why aren't there citations for
the non-private results in Table 2? Also, the caption says it's the
full-information setting, but the results are clearly for the bandit
setting. And it seems your results for the adaptive case should have a
T^3/4, not T^2/3, no?
Do you think the explicit dependences you
see on the dimensionality are necessary when preserving privacy? Do you
think there's a provable (small) gap between what is achievable in the
dependence on T between private and non-private algorithms (accounting for
your different polylog factors)?
The second-to-last paragraph on
p.6 has strange redundancy.
Q2: Please
summarize your review in 1-2 sentences
A simple and appealing approach and set of results on
private online learning.
Submitted by
Assigned_Reviewer_8
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
The paper presents a general method to design
differentially private online algorithms, for the full information and
bandit settings.
The paper is clear and well motivated. The
relation with the relevant literature is thoroughly discussed. I did
not check all the details in the appendix, but the results appear correct
to me.
I think the paper is probably not a breakthrough, but the
ideas presented are interesting, and it still deserves to be
published.
Minor things: please move the discussion to the
main text; I find that the open problems are actually interesting,
especially the comparison with the results in
[1].
Q2: Please summarize your review in 1-2
sentences
A general method to design differentially private
online algorithms, for the full information and bandit settings. Clear and
complete theoretical results are shown.
Q1: Author
rebuttal: Please respond to any concerns raised in the reviews. There are
no constraints on how you want to argue your case, except for the fact
that your text should be limited to a maximum of 6000 characters. Note
however that reviewers and area chairs are very busy and may not read long
vague rebuttals. It is in your own interest to be concise and to the
point.
Dear reviewers, Thank you for your kind and
detailed reviews. We will try to implement your suggestions as best as
possible. The following are our specific responses to questions and comments.
We
have copied the reviewers’ specific comments below, preceded by the email
quote character (>).
=============== Reviewer 1:
===============
1) > It is not clearly stated which private
instances are to be protected by differential privacy.
The
private data set is the sequence of cost functions f_1, …, f_T. Each f_t
can be thought of as private information belonging to an individual. The
output of the algorithm is the entire sequence of estimates $\hat
w_1,...,\hat w_T$. Our notion of differential privacy requires that, for
every two sequences of cost functions that differ in one cost function,
the corresponding (distributions on) vectors of outputs be
indistinguishable.
We have clarified this immediately before
Definition 2.
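Concretely, the notion described above can be written in the standard (\eps, \delta) form; the rendering below is consistent with the text but is a reconstruction, not a verbatim quote of Definition 2:

```latex
\Pr\bigl[\mathcal{A}(f_1,\dots,f_T) \in S\bigr]
  \;\le\; e^{\epsilon}\,\Pr\bigl[\mathcal{A}(f'_1,\dots,f'_T) \in S\bigr] + \delta
```

for every measurable set $S$ of output sequences $(\hat w_1,\dots,\hat w_T)$ and every pair of cost-function sequences $(f_1,\dots,f_T)$, $(f'_1,\dots,f'_T)$ that differ in a single cost function.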
2) > The definitions of the oblivious adversary
and the adaptive adversary in the bandit setting are unclear, too.
We clarified these on page two.
3) >line 107. What
is "stronger setting"? Does this mean "with stronger assumptions"?
The paper of Dwork et al. uses a weaker model for privacy than
ours. The phrase “stronger setting” refers to the more stringent notion of
privacy our algorithms provide.
Specifically, Dwork et al. provide
“single-entry”-level privacy, in the sense that a neighboring data set may
only differ in one entry of the cost vector for one round. In contrast, we
allow the entire cost vector to change at one round. Hiding that larger
set of possible changes is more difficult, so our algorithms satisfy the
weaker notion of Dwork et al. a fortiori.
We clarified the
difference in definitions in a footnote.
4) > Lines 229-232:
The sentence was hard to follow.
The definition of $\tilde w_{t}$ in
Equation (3) is equivalent to defining it as the minimizer of the sum of
the $\tilde f_i$; one obtains the expression in Equation (3) by subtracting
the constant $f_i(\hat w_i)$ from each of the functions $\tilde
f_i$.
We provided a clearer explanation.
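For concreteness, the equivalence being described can be written out as follows, assuming the standard FTAL quadratic surrogates; the exact form of $\tilde f_i$ here is a reconstruction, not copied from the paper:

```latex
\tilde f_i(w) = f_i(\hat w_i)
  + \langle \nabla f_i(\hat w_i),\, w - \hat w_i \rangle
  + \tfrac{H}{2}\,\|w - \hat w_i\|^2,
\qquad
\tilde w_{t+1} = \arg\min_{w} \sum_{i=1}^{t} \tilde f_i(w).
```

Since each $f_i(\hat w_i)$ is constant in $w$, dropping those terms leaves the minimizer unchanged, which yields the expression in Equation (3).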
5) > Line
305: This technique was referred to as one-point gradient, not as one-shot
gradient.
We have seen both “one-shot” and “one-point” used in the
literature (for example, one-shot is used in Shalev-Shwartz’s survey).
We addressed this issue by calling it the one-point gradient
throughout the paper.
=============== Reviewer 7:
=============== 1) > I'd prefer that you describe this work
as giving a class of private learning algorithms, or a technique for
constructing such algorithms, rather than a "general technique for making
online algorithms differentially private" (since it's not the case that
you can take an arbitrary online algorithm and make it private using this
technique, right?).
Yes, you are right that we provide a class of
private learning algorithms as opposed to a generic differentially private
transformation for any online learning algorithm.
We clarified
this in our presentation.
2) > The practice of citing
references without listing authors made this paper hard to read: I was
constantly flipping to the bibliography, since in many cases knowing the
reference was important to understanding the ideas (e.g. in cases where
the text did not make sufficiently clear whether the referenced work was
in the private or non-private case).
We fixed the bibliography
issue.
3) > In Table 1, it's probably worth reminding the
reader that the delta in the first column comes from (eps, delta)-DP.
We introduced the (\eps,\delta)-variant of differential
privacy to take care of this issue.
4) > Why aren't there
citations for the non-private results in Table 2? And also, the caption
says it's the full-information setting, but the results are clearly for
the bandit setting. And it seems your results for the adaptive case should
have a T^3/4, not T^2/3, no?
We addressed these issues.
5)
> Do you think the explicit dependences you see on the
dimensionality are necessary when preserving privacy? Do you think there's
a provable (small) gap between what is achievable in the dependence on T
between private and non-private algorithms (accounting for your different
polylog factors)?
This is a very good question. We are not sure whether
the explicit dependence on dimensionality is necessary. In fact, a
recent result (not mentioned in this paper) by one of us shows that
if the cost functions are linear, then an optimal dependence of log p can be
obtained. It is unclear whether that result can be generalized.
6)
> The second-to-last paragraph on p.6 has strange redundancy.
We tried addressing the issue.
===============
Reviewer 8: =============== 1) > Please move the
discussion to the main text; I find that the open problems are actually
interesting, especially the comparison with the results in [1].
We
moved the discussion section to the main body.
 