NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:6749
Title:Generalization of Reinforcement Learners with Working and Episodic Memory

Reviewer 1

tl;dr: This is a good paper. I recommend acceptance. The authors do a good job of motivating their work, and they contribute a nice experimental section with good results. The ablation study was thorough. Well done!

Many tasks that might be given to an RL agent are impossible without working memory. This paper presents a suite of tasks that require the use of that memory in order to succeed. These tasks are compiled from a variety of other sources, either directly or re-implemented for this suite. They're good tasks.

This paper also presents a neural architecture that uses both working memory and episodic memory. The working memory is implemented with an LSTM, not unlike IMPALA. The episodic memory, however, writes memories that are indexed in a high-dimensional vector space. The paper claims that this type of memory lasts longer than the LSTM memory.

The authors make a point of saying that none of the models, including the one presented in the paper, are able to do well on some of the tasks. They also show that none of the models perform well on extrapolated tasks (where the difficulty was increased after train time). I think they're doing this to show that their suite of tasks is challenging and worth trying to learn.

There appears to be a marked improvement between agents without episodic memory and agents with episodic memory on the held-out test sets. There is a similar improvement between feedforward and LSTM agents (working memory).

They did develop a novel architecture, though none of the pieces are particularly novel. However, their ablation tests successfully show that the agents with working memory and episodic memory perform better than similar agents without episodic memory or working memory at both training time and test time.

Pros:
- Generally easy to read
- The neural architecture seems sufficiently novel
- Need for both working and episodic memory seems well justified
- Thorough ablation tests

Cons:
- The formatting for Section 2 is *lousy*. Because figures, figure text, and main text are all over the page, it's hard to keep track of what refers to what. The intro to Section 2 says there are 13 tasks, but it's difficult to keep tally throughout the section. It would be especially helpful if the order of the figures matched the order in which the tasks are presented in the main text.

I think the direction is good, the experiments are good, and the overall quality is good. I wish they had another diagram that really showed their claims about generalization. For example, rather than showing all the data for the individual tasks together, it could be nicer to show a graph that combined the information across tasks, or some handpicked results that demonstrated successes and failures (in addition to the data they have given). I felt their results had less to do with generalization than with the need for memory in different types of tasks. Personally, I would have liked more discussion of the need for different types of memory and of how their results backed up the theory/intuition.
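
To make concrete the distinction this review draws between the two memory types, here is a minimal sketch of an episodic memory whose entries are indexed in a vector space and retrieved by similarity. It is not the authors' implementation: the class name `EpisodicMemory`, the use of cosine similarity, and the toy keys and values are all assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class EpisodicMemory:
    """Hypothetical slot-based memory: store (key, value) pairs, read by key similarity."""

    def __init__(self):
        self.keys = []    # one key vector per stored memory
        self.values = []  # payload associated with each key

    def write(self, key, value):
        # Append-only: entries are never overwritten, unlike a recurrent state
        # that is updated in place at every step.
        self.keys.append(list(key))
        self.values.append(value)

    def read(self, query, k=1):
        # Rank stored entries by similarity to the query and return the top k values.
        scored = sorted(
            zip(self.keys, self.values),
            key=lambda kv: cosine(query, kv[0]),
            reverse=True,
        )
        return [value for _, value in scored[:k]]


memory = EpisodicMemory()
memory.write([1.0, 0.0, 0.0], "saw red key")
memory.write([0.0, 1.0, 0.0], "saw blue door")
nearest = memory.read([0.9, 0.1, 0.0], k=1)
```

Because writes only append, an entry stored early in an episode can still be retrieved much later, which is one intuition for the claim that this kind of memory lasts longer than an LSTM state.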

Reviewer 2

# Originality

The problem of incorporating memory in model-free RL is not new; however, there is a general lack of qualitative analysis on the problem due to the lack of clear testbeds (since most current ones might have many confounding elements, or a different focus) and baselines. This paper attempts to provide both, and thus makes for a good and original contribution to the NeurIPS community. I also appreciated the focus on testing for generalisation across instances of the tasks, since that is an important metric that is often lacking in published papers in the area.

# Quality

The work presented is overall of high quality. The technical contribution is theoretically sound, as it is a relatively straightforward combination of existing methods. A satisfactory ablation study was provided, and the method was compared against a state-of-the-art distributed RL algorithm, IMPALA. The authors are mostly careful in their stated claims about the performance of their method, and they managed to mostly convince me of the quality of the presented testbeds.

# Clarity

The paper is well written, albeit at times a bit too reliant on the presence of supplementary materials. As this is a common (and not easily addressable) problem with work presenting testbed-baseline pairs, this didn't affect the score too heavily; however, the exposition would have gained from focusing strongly on either of the two main contributions.

# Significance

The problem of incorporating and utilising memory in model-free agents is a relatively strong focus of the RL community, and this work sets out to provide both testbeds and baselines to work towards tackling this important issue. The paper provides some insights on the usefulness of auxiliary reconstruction losses, which confirm and strengthen previous findings. Provided the code and the tasks are successfully released, this paper will make for an important baseline in the quest to solve this general problem.

Reviewer 3

EDIT: changed my overall score from 6 to 7 in light of the authors' feedback.

Positive/negative things +/-:
+ clearly written
+ not all tasks are solved
- "We plan to release the full task suite within six months of publication." weakens the article, as one of its main contributions is this task suite.

Overall a good submission, but I feel like the contribution of the task suite is bigger than the modeling contribution. The delayed release of the task suite is a big drawback.

Nitpicks: It is weird to describe IMPALA (Importance Weighted Actor-Learner Architecture) as an agent: "it would be almost identical to IMPALA" -> "it would be almost identical to the model in Espeholt et al. 2018." (page 4). I applaud trying to make it better with heatmap coloring, but Figure 5 is still a bit hard to read (I don't mean the font size).