Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The authors have significantly improved their paper compared to a previous version. In particular, they have: 1. Made their adjacency definition explicit. 2. Discussed relation to other work. 3. Added experimental comparisons with other methods, even when those are not directly comparable and explained the difference.
This paper considers the problem of DP reinforcement learning under continuous state space setting. To be specific, this paper requires DP output of the value function approximator, given neighbouring reward functions. The previous algorithms fail to work since the authors are considering a continuous state space setting. This paper incurs Gaussian process mechanism to add functional noise iteratively in the training and give a theoretical analysis on the privacy guarantees. Utility analysis is also given under discrete state space setting since there exists no theoretical guarantees even for the non-private version of the algorithm under continuous setting. This pape is a nice application of Gaussian process mechanism. Although the algorithm is not complex, yet this paper makes contribution to the community of Differential Privacy, since reinforcement learning is a very important problem in DP. This paper is generally well written although some polishing is preferred. There are some typos and some places are confusing.
1. The definition of two neighboring reward functions is provided in Theorem 5. The authors did not explain the motivation of the guarantee of privacy for reward function clearly. It would be better if the authors could interpret the necessities of the privacy of reward function in some real application situations. 2. The description of Algorithm 1 is section 3 is difficult to follow. How about adding noise to the reward function r(.) directly? What is the reason for adding noise like line 19-20 of the Algorithm 1? The definitions of g^_k[B] in line 4 and g^_a[:] in line 15 are not given. The method to insert s to g^_a[:] such that the list remains monotonically increasing in line 15 is also not provided. In line 5 the definition of C(.), k is not given. Proposition 10 is a little confusing. It means with the increasing of the number of iteration rounds, the error also becomes large. The interpretation of proposition 10 should be offered in Section 3. 3. It would be much better if the formal proof of Theorem 5 in page 6 is removed to the appendix, then more words can be used to explain the algorithm thoroughly in the main body. 4. The technique of the paper is based on Gaussian process mechanism.