NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 5472 Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards

### Reviewer 1

This paper proposes Sibling Rivalry, a simple yet efficient method for learning goal-reaching tasks from distance-based shaped rewards. Sibling Rivalry uses pairs of rollouts to encourage exploration while destabilizing local optima, which allows it to tackle sparse reward problems efficiently. Extensive experiments demonstrate Sibling Rivalry's success on a variety of sparse reward problems.

The idea is very interesting and simple. The self-balancing shaped reward (Eq. 3) strikes a balance between exploiting available rewards and exploring diverse states. Sibling Rivalry samples pairs of rollouts, applies mutual relabeling based on the self-balancing rewards, and then decides which rollouts to use for policy gradient estimation. Furthermore, the inclusion hyper-parameter $\epsilon$ controls the trade-off between over-exploration and under-exploration. However, this parameter must be found by search and cannot be learned adaptively.

The experiments are carried out on a variety of continuous and discrete sparse reward tasks, such as maze navigation and 3D construction in Minecraft. It is clear that Sibling Rivalry combined with the on-policy learning method PPO achieves the best success rate compared to DDPG+HER and other baselines. However, it would be better to provide more results, such as cumulative reward curves w.r.t. the number of interaction steps, to show the sample complexity.

The paper is well organized and clearly written. Sibling Rivalry provides an efficient method for tackling sparse reward problems and will attract attention from the sparse reward research community. I have read the rebuttal, which addressed my concern, and the other reviews.
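As a rough illustration of the mechanism summarized in this review, the sketch below shows one way the pairing, mutual relabeling, and $\epsilon$-based inclusion could fit together. It is not the authors' implementation. The function names, the Euclidean distance, the success threshold `delta`, and the exact reward form (the negative distance to the goal plus the distance to the sibling's terminal state, clipped at zero) are all assumptions made for illustration and should be checked against Eq. 3 and the algorithm listing in the paper.

```python
import math


def distance(a, b):
    """Euclidean distance between two states (assumed metric)."""
    return math.dist(a, b)


def self_balancing_reward(terminal, goal, anti_goal, delta=0.1):
    """Assumed form of the self-balancing shaped reward.

    Rewards reaching the goal, otherwise penalizes distance to the goal
    while crediting distance from the sibling's terminal state (anti-goal).
    """
    d_goal = distance(terminal, goal)
    if d_goal <= delta:
        return 1.0
    return min(0.0, -d_goal + distance(terminal, anti_goal))


def sibling_rivalry_select(term_a, term_b, goal, epsilon=1.0, delta=0.1):
    """Relabel a pair of sibling rollouts and pick those used for the gradient.

    Returns a list of (terminal_state, reward) pairs. Assumption: the farther
    rollout is always kept; the closer one is kept only if the siblings ended
    within epsilon of each other or the closer one reached the goal.
    """
    closer, farther = sorted((term_a, term_b), key=lambda s: distance(s, goal))
    # Mutual relabeling: each rollout uses its sibling's terminal state as anti-goal.
    r_closer = self_balancing_reward(closer, goal, farther, delta)
    r_farther = self_balancing_reward(farther, goal, closer, delta)
    kept = [(farther, r_farther)]
    if distance(closer, farther) <= epsilon or distance(closer, goal) <= delta:
        kept.append((closer, r_closer))
    return kept
```

Under these assumptions, a small `epsilon` drops the closer rollout more often, which pushes toward exploration, while a large `epsilon` keeps both rollouts and behaves more like plain shaped-reward learning. This matches the over-exploration versus under-exploration trade-off noted above.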