Sun, Dec 8th through Sat, Dec 14th, 2019, at the Vancouver Convention Center
This paper proposes a differentially private Q-learning algorithm for reinforcement learning with continuous observations. It is a nice application of the functional Gaussian noise mechanism, and the paper provides a rigorous privacy and utility analysis. When preparing the final version, the authors should fix the presentation issues raised in the reviews and make sure the paper is properly positioned with respect to prior work (e.g., BGP'16 used a stricter notion of the neighbouring relation between datasets, one that allows changes to the states and actions, not only the rewards).