NIPS 2016
Mon Dec 5th through Sun the 11th, 2016 at Centre Convencions Internacional Barcelona
Paper ID: 444 Online Convex Optimization with Unconstrained Domains and Losses

### Reviewer 1

#### Confidence in this Review

2-Confident (read it all; understood it all reasonably well)

### Reviewer 5

#### Summary

This paper studies online linear optimization with unconstrained domains and losses, providing a lower bound that rules out the ideal regret obtained when a bound on the gradients is known, and also a doubling-trick type of algorithm that matches the lower bound in terms of the horizon T. Some experiments for both linear model and neural network are conducted to show the effectiveness of this parameter-free algorithm.

#### Qualitative Assessment

Parameter-free online learning algorithms are in general an interesting subject, and understanding the gap between learning with and without knowledge of the problem parameters would be very helpful. However, I have to say that the lower-bound proof in the paper is very obscure and hard to follow. Specifically, while the construction of the adversary is clear and simple, the parameters (such as those in Equations (1) and (2), and the specific choice of u in Case 1) seem to come out of nowhere. I fail to see why these choices are special and why one cannot obtain other bounds (such as the optimal one when L_max is known) by picking these parameters differently. I hope the authors can provide more explanation in the rebuttal and, more generally, when revising the paper. I would suggest leaving the parameters unspecified at first, and only arguing at the end what tradeoffs different choices yield.

The intuition behind the proposed algorithm is also missing. At the beginning, the authors argue that the doubling trick does not work for previous algorithms because a bound-violating gradient can ruin the algorithm entirely. So how exactly does the proposed algorithm avoid this? After all, the algorithm still falls into the standard FTRL framework. How is the special choice of regularizer different from previous work, and how does it help? Please explain.

Some detailed comments:
1. Line 44: should R_t(u) be R_T(u)?
2. Last line of Case 1 in the proof of Theorem 2.1: why is that an equality?
3. In Algorithm 1, it makes more sense to check the violation first instead of computing w_{t+1} first (which is wasted work if the violation actually happens).
4. In Theorem 3.2, the expression g_T \leq L_max \geq L seems weird and non-standard.
5. What exactly are the hyperparameters tuned for the other algorithms in the experiments?
6. What is the conclusion of the experiments in Sec. 4.1?
7. Last paragraph of Sec. 5: why does "its regret matches the lower-bound" hold?
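To make the doubling-trick concern concrete, here is a minimal sketch (not the paper's RESCALEDEXP; all names are illustrative) of 1-D online gradient descent that doubles its gradient-norm estimate L and restarts whenever a gradient violates the current bound. The point the reviewer raises is visible in the trace: the iterate on the violating round was already played against the stale bound, so the loss from one oversized gradient is suffered before the restart can react.

```python
import math

def doubling_trick_ogd(gradients, L0=1.0):
    """Hypothetical doubling-trick OGD sketch (illustrative, not RESCALEDEXP).

    Maintains an estimate L of the maximum gradient norm; when a gradient
    violates L, doubles L until it fits and restarts the step-size schedule.
    Returns the sequence of iterates actually played.
    """
    L = L0
    w = 0.0
    t_since_restart = 0
    plays = []
    for g in gradients:
        plays.append(w)              # w is committed BEFORE g is revealed
        while abs(g) > L:            # bound violated: double L and restart
            L *= 2.0
            w = 0.0
            t_since_restart = 0
        t_since_restart += 1
        eta = 1.0 / (L * math.sqrt(t_since_restart))  # step size tuned to L
        w -= eta * g
    return plays

plays = doubling_trick_ogd([1.0, 8.0, 1.0])
```

On the second round, the play (-1.0) was chosen under the stale bound L = 1, yet the adversary's gradient has norm 8; the restart only helps from the next round onward.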

#### Confidence in this Review

2-Confident (read it all; understood it all reasonably well)

### Reviewer 6

#### Summary

This paper studies the classic (static) online convex optimization setting, in which a learner aims to minimize its regret with respect to a fixed strategy. The authors first show that when the losses are convex, the regret can be bounded by the regret against linearized losses. Then, they provide a lower bound for this linearized regret. Finally, they propose RESCALEDEXP, an algorithm that attempts to match this lower bound. The main idea of the proposed method, a variant of follow-the-regularized-leader (FTRL), is to estimate the unknown maximum gradient norm of the loss functions. A regret analysis of the proposed method is also provided. Numerical experiments compare the performance of the proposed method with existing methods in the literature.

#### Qualitative Assessment

My main concern is the definition of the lower bound. The authors show that the regret is bounded above by the regret associated with a linearized loss. Then, they provide a lower bound for the regret corresponding to the linear loss. The point is that we cannot conclude that the lower bound provided in Theorem 2.1 is a lower bound for the original regret, so this lower bound is not meaningful.

Although I find the lower bound problematic, I think the regret analysis for the proposed method is interesting. The authors are able to show that the proposed method achieves a regret of order \|u\|log(\|u\|)L_max \sqrt{T} without prior knowledge of the parameter L_max. This result is novel and interesting.

Regarding the lower bound, I think the authors should either clarify that the lower bound in Theorem 2.1 does not hold for the original regret defined on page 1, or remove it. The first line of the expression in Theorem 2.1 is also confusing, since R_T(u) denotes the general regret on page 1, while in Theorem 2.1 it refers to the regret corresponding to linear losses.
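For reference, the linearization step the reviewer refers to is the standard one (writing g_t for a subgradient of f_t at w_t, in the notation of the review):

```latex
% Standard linearization: for convex f_t and g_t \in \partial f_t(w_t),
\begin{align*}
R_T(u) \;=\; \sum_{t=1}^{T} \big( f_t(w_t) - f_t(u) \big)
\;\le\; \sum_{t=1}^{T} \langle g_t,\, w_t - u \rangle \;=:\; \widetilde{R}_T(u).
\end{align*}
```

The inequality bounds the original regret R_T(u) from above by the linearized regret \widetilde{R}_T(u); a lower bound stated for \widetilde{R}_T(u) sits on the larger side of this inequality, which is the direction of the reviewer's objection.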

#### Confidence in this Review

2-Confident (read it all; understood it all reasonably well)