Reviews: Computational Separations between Sampling and Optimization

his paper considers two natural problems that arise in machine learning (i) maximizing a function f(x) and (ii) sampling approximately from the distribution e^(-f(x)). The goal is to show that under some situations, one of these problems is easy and the other is hard. To show that optimization can be harder than sampling, the construction hides the solution of an NP-hard problem as a small bump in a mostly flat function. Thus, approximate sampling is easy (the distribution is mostly uniform), but optimization would result in solving an NP-hard problem. To show that sampling can be harder than optimization, the construction amplifies the number of solutions of an NP-hard problem and plants an additional simple solution, and then encodes this into a function that is flat in many places, but has bumps at every possible solution of the NP-hard problem. Optimization is as easy as finding the planted simple solution, but, intuitively, sampling requires finding many of the hard solutions. In general, the proofs are elegant and intuitive, and the writing is very clear. However, there is an issue with the particular definition of approximate sampling (being close in total variation distance) used throughout the paper. Because these separations are computational, we have to be wary of defining problems that can only be solved by computing real numbers. In the case of total variation distance, the distance between a discrete and a continuous distribution is always 1, meaning that no computer can approximately sample (with respect to total variation distance) from a continuous distribution with a finite number of bits. One possible resolution to this problem is to use a different notion of approximation. Wasserstein distance, for example, allows for non-trivial distances between discrete and continuous distributions, and has been employed in reductions between optimization and sampling problems [TD19]. Based on the similarity of that paper to this one, it seems like it should not be too hard to show the exact same results in this paper but under a Wasserstein distance-type approximation. Thus, I am inclined to accept this paper, with the recommendation to the authors that they change the definition of approximate sampling to something that allows for non-trivial distances between discrete and continuous distributions. Typos: - Line 163: min(0, |y - x| - 2\epsilon) should be min(0, |\phi(y) - x| - 2\epsilon). References: [TD19] C. Tosh and S. Dasgupta. The relative complexity of maximum likelihood estimation, MAP estimation, and sampling. COLT 2019.

Summary --------------- This paper explores the computational complexity relation between sampling and optimization. More precisely given a function f(x) defined over a set X \subset of R^d, we can define the following problems: - Optimization: find x' such that f(x') <= f(x*) + epsilon where x* is the minimizer of x'. - Sampling: approximately sample y from X from the probability distribution with density p(y) \prop exp(-f(y)). When f(x) is a convex function of x then it is known that optimization is computationally equivalent with sampling and both can be solved in polynomial time. When f(x) is non-convex though too little is known for the relation of these two problems. A recent work by Ma et al shows an example where sampling is easy but optimization is hard in the oracle model of Nesterov. This paper enriches our understanding of sampling vs optimization in two ways: (1) it strengths the example provided by Ma et al. by showing that there are cases where the sampling is easy but optimization is NP-hard, (2) it shows that the opposite is true too, optimization can be easy but sampling NP-hard. Observe that NP-hardness is a very strong notion of hardness from computational complexity theory and in particular it implies hardness in the oracle model. Strengths -------------------- - The understanding of the relation between sampling and optimization in the non-convex case is a very fundamental and relevant question. - The results are the strongest possible in this direction and they rule out for example the possible use of second or higher order methods. - The proofs provide a clean but non-trivial use of complexity of combinatorial problems in a continuous space. I believe that the techniques provided in this paper can be of independent interest for showing more NP-hardness results for continuous non-convex problems. Weaknesses ------------------- Strictly technically speaking the proofs use heavy machinery from known results in complexity theory but are non very technical themselves. Summary of Recommendation ----------------------------- The problem explored in this paper is fundamental and relevant, the results are strong and the presentation very good so I strongly recommend acceptance.

Paper ID:	8572
Title:	Computational Separations between Sampling and Optimization

Reviewer 1

Reviewer 2

Reviewer 3