NIPS 2018
Sun Dec 2nd through Sat the 8th, 2018 at Palais des Congrès de Montréal
Paper ID: 1521 Adaptation to Easy Data in Prediction with Limited Advice

### Reviewer 2

The paper presents an novel online algorithm for stochastic losses and adversarial losses with a small range under the assumption that they can observe an additional loss per round. In the adversarial setting, they provide a lower bound that almost matched their upper bound. In the stochastic setting, their regret bound remarkably does not depend on the rounds T. Even though the setting described in the paper can be characterized by a sequence of feedback graphs, the authors do not cite or compare to the large body of literature of online learning with feedback graphs (i.e. "Online Learning with Feedback Graphs: Beyond Bandits" by Alon at al. 2015, "Online Learning with Feedback Graphs Without the Graphs" Cohen et al. 2016, "Leveraging Side Observations in Stochastic Bandits" by Caron et al. 2012). Leveraging Side Observations in Stochastic Bandits by Caron et al. 2012 prove a lower bound of log(T) for stochastic losses when the feedback graph is fixed. The feedback graph used in the current paper is more complex than a fixed graph so it is surprising that they can attain a bound that does not depend on log T. I am not sure how the lower bound of Caron et al. can be reconciled with the stochastic bound in the paper, but it might be because in the current paper, they are allowed to chose which second arm they want to observe. Nevertheless, a thorough discussion and analysis is warranted here. Another criticism is that they claim their assumptions are weak. Via their assumptions, they can bypass the impossibility result of Gerchinovitz and Lattimore [2016], but allowing an extra loss observation is essentially changing the setting and moreover they allow the algorithm to chose the second arm that they get to observe. Adding extra loss observations even if it's just one more at each round can be hugely beneficial as shown in the above feedback graphs papers. The paper is understandable, but not well written or polished. Due to the lack of comparison to on-line learning with feedback graphs and the fact that Caron et al. 2012 have a lower bound that depends on log(T), I cannot vote to accept unless these issues are addressed by the authors in the rebuttal. ----------------------------------------------------------------------------------------------- The authors have adequately addressed my concerns and hence I increased their overall score.

### Reviewer 3

This paper studies the problem of achieving a regret scaled with the effective range of the losses in the bandit setting. Although it has been shown that it is impossible in the standard bandit setting, this paper proposes the second order difference adjustments algorithm (SODA) which achieves the goal when one random extra observation is available for each round in the bandit setting. Such result improves the previous one by Cesa-Bianchi and Shamir which requires the availability of the smallest loss for each round. More desirable, SODA is also proved to achieve constant expected regret in the stochastic setting, thus it automatically adapts between two regimes. Other comments: 1. On line 210, equation numbers are missing. 2. In the proof of Theorem 1, line 192. The analysis about the 3rd term is missing: the expectation of this term is 0. 3. The ‘a’ In Equation (10) and ‘a’ in Equation (11) collide. Same on line 329. 4. There is still a gap between dependencies on K in the lower bound and in the upper bound. Any comment on it?