Paper ID: | 609 |
---|---|

Title: | Polynomial Cost of Adaptation for X-Armed Bandits |

The current paper is mathematically well written, but lacks explanations. The results are established using a compilation of techniques from several articles (Kleinberg [15], Audibert and Bubbeck [2], and Garivier et al [11]). However, even if it seems sound, and can be followed line by line, it sometimes lacks explanations about global intuitions. For example, I believe that the proof of Lemma 1 is not mandatory (and its statement is anyway very technical), and it would be better to focus on global explanations of the methodology. As an example, Algorithm 2 (MeDZO) is helpful to discretize the continuous interval: it would be good to explain why this algorithm? Why this selection of the parameters p, K_i and Delta T_i is done in that way works? The horizon T is assumed to be a prior knowledge. This should be stated and commented (if this is indeed the case). In the current form, the proofs are postponed to very technical statements at the end, but the article would really benefit about shorter sketch proofs, with key ideas and explanations. Why this algorithm, and not another one? The current audience of this article, in its current form is for a specialized audience, but not for a large audience corresponding to NIPS.

Originality: The results of this paper include some clever use of existing results and techniques along with some novel algorithmic elements. Quality: The technical results of the paper are stated precisely and are accompanied with rigorous proofs. The formal statements are also complemented by the descriptions of their underlying intuition, which helps in understanding the results. Clarity: The paper is very well written, and while the technical content is dense, it was reasonably easy to follow. Significance: I feel that the algorithmic ideas presented for achieving the optimal adaptive rates are important, and can have potential applications in related problems. ------------------------- AFTER AUTHOR RESPONSE: I have read the author response, and I thank the authors for their clarifications. I will retain my previous score for this submission.

This paper considers the regret of X-armed bandit problem. They propose a lower bound on the regret of this problem. Then they show their MeDZO algorithm, which has the advantage that need nothing to be the input, and can achieve good performances for any fixed parameters. They also show the theoratical regret upper bound of this algorithm. The idea of using the empirical distribution before is very interesting, which can reduce the number of arms we needed while keeps the regret in a low level. The regret upper bound of this policy matches with the lower bound provided in this paper, which means both bounds are tight. I think I have read some papers that use a similar idea, but this one is the first one I know about using this trick on X-armed bandit model. The proof seems to be correct, but I do not check thoses ones in supplementary file in detail. The writting is also clear for me to understand this paper. I found there are no experiments in this paper, and I am wondering how much the regret gap will be when comparing the MeDZO policy with other ones who takes \alpha as input and then make the optimization. After rebuttal: - I saw the experimental results, and I think add some experiments into the paper is useful for readers to understand the contribution of your papers. - The similar ideas: for example, in "Reducing Dueling Bandits to Cardinal Bandits", their Doubler and MultiSBM policy both regard an empirical distribution on arms as a generalized arm.