Reviewers agreed that the paper addresses an important problem in current deep RL research and appreciated the effort the authors put into the rebuttal. New experiments using the 0/1 reward formulation and a comparison to fixed, hand-tuned hyperparameters addressed two of the main concerns raised by reviewers. In the end, all three reviewers recommended accepting the paper.