Paper ID: | 6073 |
---|---|

Title: | Variance Reduction for Matrix Games |

In this paper, the authors are interested in the problem of bilinear minimax games. By using tools from the optimization techniques of variance reduction, the authors show how to attain an eps-optimal (in additive error) solution to the problem in total time nnz(A) + sqrt{nnz(A)*n}/eps. Furthermore, their results hold for both l_1 - l_1 and l_1 - l_2 games. One of the key technical contributions is an approach called “sampling from the difference”, which leads to a desired variance bound. Various results in computational geometry, such as maximum inscribed ball and minimum enclosing ball, can also be recovered from these results. ====== Strengths ====== The authors present a variance reduced method for solving the bilinear saddle point problem in time nnz(A) + sqrt{nnz(A)*n}/eps, thus improving upon Nemirovski’s mirror-prox method by a factor of sqrt{nnz(A)/n}. The paper is well-written with a clear focus and detailed accounting of how the newly proposed method compares to previous work. Although the new method does not provide unilateral improvements, the authors carefully lay out the range of parameters in which their method is better, as well as when the other methods dominate, thus further illustrating the power of variance reduction for bilinear saddle point games. ====== Weaknesses ====== The authors are missing some recent related works concerning bilinear saddle point games. In particular, Sherman [3] has established rates for bilinear saddle point games that improve upon the extragradient methods of Nemirovski [1] and Nesterov [2], by combining a special type of regularizer with the dual extrapolation method in [2]. Concerning dependence on the “range” of the regularizers, it would be recommended that the authors provide additional discussion comparing these methods (and perhaps mention subsequent work by Sidford and Tian [4], which focuses on the special case of l_infty regression). Along the same lines of the “range” considerations, and though I may be mistaken, is there a missing diameter/distance term for the runtime of the extragradient methods in eq.(1)? While I know it is the case that the log(m) and log(n) factors are being hidden in \tilde{O} (which is fine for l_1 - l_1 games), for l_1 - l_2 games, shouldn’t it be necessary to account for the distance from a center point (w.r.t. the distance generating function), which may introduce an extra poly(m) term? At the very least, it would be helpful to clarify the dependence on the distance generating function(s) (including for mirror-prox), and their relation to the various min/max domains. ============ I am generally inclined to accept this paper, as the work provides an interesting insight into faster convergence rates for matrix games (in certain regimes). [1] Nemirovski, A. "Prox-method with rate of convergence O (1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems." SIAM Journal on Optimization 15, no. 1 (2004): 229-251. [2] Nesterov, Y. "Dual extrapolation and its applications to solving variational inequalities and related problems." Mathematical Programming 109, no. 2-3 (2007): 319-344. [3] Sherman, J. "Area-convexity, l∞ regularization, and undirected multicommodity flow." In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 452-460. ACM, 2017. [4] Sidford, A., and Tian, K. "Coordinate methods for accelerating ℓ∞ regression and faster approximate maximum flow." In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pp. 922-933. IEEE, 2018. ============================ After reading the authors' response, I appreciate the clarification and acknowledgement of the additional related work, and I have adjusted my score accordingly. ============================

######################################################## Having read the author's response, I maintain my comments and score. The added comparison with previous work will be a useful addition to the paper. ######################################################## This paper was a very nice read. The ideas were clearly presented, the proofs were communicated both at a high, intuitive level and in detail in the appendix, and the algorithms are a substantial improvement over previous work in certain regimes. The paper is, to my knowledge, original and the technical details are correct. In particular, the formalization in Definition 2 is quite interesting, and leads to interesting gradient estimators which, unlike in typical variance reduction methods, are constructed in a different way depending on the query point. Overall, this paper clearly presents an interesting and seemingly quite practical algorithm for solving an important class of optimization problems.

This is a very original revisitation of the classic mirror-prox framework of Nemirovski. It allows the application of variance-reduced sgd to a new setting, minmax optimization. The variance reduction technique of sampling from the difference may be of independent interest. The paper is well-written.