Export Reviews, Discussions, Author Feedback and Meta-Reviews

Paper ID:	1497
Title:	Revenue Optimization against Strategic Buyers

Current Reviews

Submitted by Assigned_Reviewer_1

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)

This paper considers the problem of a seller selling an item to a single strategic buyer over a period of T rounds. As in a classic bandit problem, the seller attempts to learn the optimal posted price for the item by running an explore/exploit procedure over the set of possible prices. However, the buyer, knowing that the seller is learning about his value distribution, may lie in the hope that the seller will eventually settle on lower prices. This problem has seen some interest recently, but the models tend to differ slightly. The three defining features of the model in this paper are:

1. Buyers realize a value from their value distribution at each time step t (as opposed to having a fixed value over all rounds).

2. They discount their utility over time (this is actually common to all models, and a necessary component to any low-regret scenario).

3. The buyer acts epsilon-strategically, meaning that if the truthful strategy (telling the truth for the remaining time period) guarantees a payoff within epsilon of the optimal strategy at any point, the buyer will use the truthful strategy.
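
One way to write down the condition in item 3, as a sketch only: the notation below is assumed here and is not taken from the paper. Let $U_t(\sigma)$ be the buyer's expected discounted utility from round $t$ onward under a continuation strategy $\sigma$, and let $\sigma^{\mathrm{truth}}$ be the truthful continuation strategy (read here as accepting exactly when the realized value is at least the offered price).

```latex
\[
  \sup_{\sigma} U_t(\sigma) \;-\; U_t\bigl(\sigma^{\mathrm{truth}}\bigr) \;\le\; \epsilon
  \quad\Longrightarrow\quad
  \text{the buyer plays } \sigma^{\mathrm{truth}} \text{ from round } t \text{ onward.}
\]
```

Under this reading the condition is a commitment rule for the remaining rounds, which is what separates it from an epsilon-best response evaluated one action at a time.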

Under this model, the authors present a modification of the standard UCB algorithm, denoted UCB_L, where L is a free parameter to be set later, and show that it achieves the best known regret within this class of models (although similar work uses a slightly different model, so the comparison is not direct). Furthermore, it achieves the optimal regret when the buyer acts truthfully.
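
For readers less familiar with the baseline, the following is a minimal sketch of standard UCB1 over a fixed grid of posted prices against a truthful buyer. It is not the paper's UCB_L and does not model strategic behavior. The function name `ucb1_posted_prices`, the price grid, and the `draw_value` callback are hypothetical names introduced only for this illustration, and prices are assumed to lie in [0, 1].

```python
import math
import random


def ucb1_posted_prices(prices, draw_value, horizon, seed=0):
    """Standard UCB1 over a finite price grid (illustrative, not UCB_L).

    prices     : candidate posted prices, assumed to lie in [0, 1]
    draw_value : hypothetical callback, rng -> buyer value for this round
    horizon    : number of rounds T
    Returns (times each price was offered, total revenue).
    """
    rng = random.Random(seed)
    counts = [0] * len(prices)          # times each price was offered
    revenue_sums = [0.0] * len(prices)  # revenue collected at each price
    total = 0.0

    for t in range(1, horizon + 1):
        if t <= len(prices):
            # Offer every price once before using confidence bounds.
            i = t - 1
        else:
            # Pick the price with the highest upper confidence bound.
            i = max(
                range(len(prices)),
                key=lambda k: revenue_sums[k] / counts[k]
                + math.sqrt(2.0 * math.log(t) / counts[k]),
            )

        value = draw_value(rng)
        # Truthful buyer (assumption): accepts iff value >= posted price.
        revenue = prices[i] if value >= prices[i] else 0.0

        counts[i] += 1
        revenue_sums[i] += revenue
        total += revenue

    return counts, total


if __name__ == "__main__":
    # Buyer values drawn uniformly from [0, 1] in every round.
    counts, total = ucb1_posted_prices(
        prices=[0.25, 0.5, 0.75],
        draw_value=lambda rng: rng.random(),
        horizon=5000,
    )
    print(counts, total)
```

A strategic buyer would break the assumption in the acceptance line by rejecting prices below the realized value in order to push the seller's estimates down, which is the difficulty the paper addresses.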

This is a nice paper that contributes to an interesting and compelling problem within the intersection of machine learning and game theory. The logic of the paper's main proof, which is a clever adaptation of the standard UCB proof, is well-written and easy to follow. The main result is not ground-breaking but it is a solid contribution to the study of online revenue optimization against strategic buyers. The result is perhaps a little incremental, in the sense that the main result is similar to other work but with a twist to the model, but I think the study of this problem is nascent enough to warrant such a contribution.

The one comment I have is that the authors' definition of epsilon-strategic buyers could be more explicit. I admit that I first only skimmed their definition because I assumed it was a simple adaptation of the notion of epsilon-best response (from a standard game theory model), but it is not. I could imagine other readers who are familiar with concepts like epsilon-Nash equilibria making the same mistake. I think the definition used in this paper is a fine one, but it might need to be spelled out a bit more.

A few remarks:

- The statement of Lemma 1 reads "... for delta >0, the strategic regret of a buyer can be bounded...". I assume this should be the strategic regret of the seller. If I misunderstood and it should in fact be the strategic regret of the buyer, please let me know.

- On page 7, in the set of inequalities that begins on line ~334: in the second inequality, the indicator variable reads "1_(v_i > p_2)". I assume the 2 in p_2 is a typo and it should be just p.