NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:3504
Title:Oblivious Sampling Algorithms for Private Data Analysis

Reviewer 1

+ The proposed solution in the paper is important as it provides a practical way to implement the privacy amplification methods in the literature. + The proposed solution is simple and is reminiscent of the bag of little bootstrap paper . Q: Where else other than learning problems this framework can be used? And it would be great if the authors do an empirical study of that.

Reviewer 2

As mentioned above, the proposed oblivious sampling schemes are quite straightforward, and basically rely on (existing) oblivious data shuffling techniques along with some careful replication of data elements. The results appear to be novel, but not particularly original or technically deep. In addition, the schemes seem to work only for the case where k=n/m samples of size m each are needed from a data set of size n. Not clear if/how they might be extended to more general sampling scenarios. The paper is overall clearly written, modulo a few typos and bugs spread throughout the text (e.g., delta vs delta_f, eps vs eps' on page 4). Some parts need a more thorough discussion and explanation, e.g., the various symbols and formulas in the comparisons with the shuffling approach in Table 1. The proposed sampling algorithms are empirically shown to provide comparable utility with stronger DP guarantees for noisySGD over two test data sets. Given the limited scope of the experiments, it is not clear that these this can be extrapolated to other data sets and/or learning tasks.

Reviewer 3

Update: I thank the authors for their responses to my fellow reviewers. My opinion remains unchanged and I still recommend acceptance. Significance: The problem investigated is important to the development of secure and private differential privacy in practice Clarity: The writing contains some grammatical mistakes and typos but they are small. Overall it is clear. Originality: They consider the new problem of designing oblivious versions of sampling without replacement and poisson sampling. This seems like a unique and important problem, but I am unfamiliar with the area.

Reviewer 4

The main contributions of the work are to provide 2 algorithms for sampling in a Trusted Execution Environment (TEE). TEEs are helpful for protecting the privacy of the input data during a query computation, whereas sampling is advantageous in protecting the privacy of the input from seeing the output of a differentially private (DP) query. The authors also provide an evaluation of both the sampling techniques on MNIST and CIFAR with DP SGD. Originality: Regarding both the proposed algorithms, they heavily rely on a data oblivious shuffle. Given that the oblivious shuffle is from prior work, the novelty of the proposed algorithms seems incremental at best. Moreover, it is not clear why naively sampling multiple batches after a data oblivious shuffle within a TEE will be any worse than the proposed algorithms (as the shuffle will effectively hide which elements are present in the samples, even though memory accesses might be observable). The empirical evaluation needs to be broader and deeper to have a significant contribution. Another minor point is that the first 2 contributions stated in the contributions paragraph (117-122) are already known in the differentially privacy literature, and the authors have provided citations of many such works. Quality: The submission is technically sound. The claims are supported by theoretical analysis. Clarity: The quality of writing is not satisfactory, as even the only example they describe for explaining their 1st algorithm (lines 287-291) is poorly written, and I suspect it contains some typos. There are also other typos in the paper (e.g., line 185, 243). Significance: The significance of the proposed algorithms seems minor given prior work. ---------------------------------- Edit after author response: Update on Originality: After reading the author response and other reviews, I think that the contribution of the proposed algorithms over prior works is non-trivial. It is also clear to me why naive sampling after an oblivious shuffle will not share the privacy amplification advantage that the proposed algorithms have. Update on Significance: Impactful.