Reviews: Deep Multi-State Dynamic Recurrent Neural Networks Operating on Wavelet Based Neural Features for Robust Brain Machine Interfaces

In this paper, the authors present a multi-state Dynamic Recurrent Neural Network architecture and training framework for Brain Machine Interfaces (BMIs), which incorporates scheduled sampling and is tested on diverse neural features as input. The authors robustly analyze this model in comparison to prior modeling frameworks on human posterior parietal cortex (PPC) activity. This paper is of impressive quality, containing rigorous and methodical analyses that show clear and significant improvements from their model. The authors compare to twelve baseline models and investigate many aspects of the modeling framework, including single-day vs multi-day performance, generalization of single-day training to other days, the reliance on the amount of training data, the optimal preprocessing of neural feature inputs, and generalization of the models over time with different styles of retraining. The paper is very well written, with most choices and details clearly explained. I also appreciated that the authors showed examples of the regression performance (Figure 4) instead of just reporting summary statistics.
However, I do not think it is stated explicitly enough which parts of the model are novel, which are novel only in the BMI setting, and which are drawn from prior BMI literature. The paper could benefit from an expansion of the background section of the Introduction and a citation to a “conventional DRNN” method. The authors did include a description of all baseline models in the Supplementary Material, but I still think a more concise description of the differences from the most similar previous model (presumably F-DRNN) in the main text would benefit readers. Also, what makes this method multi-state? Additionally, this paper applies these methods to a novel brain region, posterior parietal cortex. Other than briefly stating this, the authors do not further discuss PPC or include any analyses or citations about how the decoding performance differs from that of more traditional BMI brain regions such as motor cortex, which makes this contribution much less significant.
Some questions: Does watching the cursor move for 3 minutes constitute a trial? Why is there a disconnect between Figure 7 and the rightmost point of Figure 8? The DRNN and Deep-DRNN have significant differences in Figure 7, but in Figure 8, when trained similarly on 20 days, they perform similarly. Is there an intuitive explanation as to why the more complex model is doing worse (Deep-DRNN vs DRNN)? It doesn’t seem a clear case of overfitting based on the training days plot.
EDIT: I'd like to thank the authors for their thorough rebuttal. I am now even more confident of my high score.

The authors consider an important problem, that of decoding neural signals from the brain for control of external devices, such as a computer cursor. The authors present a deep recurrent neural network for extracting such intent from neural signals and compare it against many other decoders using offline, prerecorded data from the posterior parietal cortex of a human subject during trials in which the subject watched the cursor move from the center of a workspace to individual targets. The authors compare the performance of their decoder to a host of others, training and testing on the same day, and find that their decoder, operating on features extracted with wavelets, outperforms the other decoders. A major focus of this work is robustness: being able to use a BMI over multiple days without the need for retraining is noteworthy, as this is an important problem that must be addressed for the clinical translation of BMI systems. The authors extract many different types of features from voltage signals recorded from the electrodes of a Utah array and then train their decoder on data from one day and test on the remaining days, finding that wavelet-based features produce the most robust performance with their decoder. They also train on the first 20 days in their dataset and test on the remaining days, again finding superior performance with their decoder for wavelet-based features. They then assess the sensitivity of their decoder and others to the number of days used to initially train it and find uniformly higher performance for their decoding architecture compared to the others. Finally, they examine the performance of their decoder when retrained with various amounts of data from each day, finding that performance improves with retraining. An evaluation of the paper now follows.
Originality: I believe the main novelty of this paper lies in the problem it is considering. The problem of robust BCI decoding is still just starting to receive attention. The use of wavelet-based features for this is, to my knowledge, novel, and while the DRNN the authors propose does not appear to be highly novel, the application to the problem of robust BCI in PPC, to the best of my knowledge, is.
Quality: The authors do a good job of comparing against many different decoders and considering many different types of features which can be extracted from voltage data recorded from a Utah array. The duration of the dataset they test with is also noteworthy. The clarity of the writing and connection to some of the claims of the paper (see next) could be improved.
Clarity: The paper is relatively clear. However, additional description of the motivation for the decoder could help build intuition. For example, why does feeding back the decoder's output recurrently result in better performance and more robust decodes? Finally, I struggled with understanding one set of results. In section 4.1 (titled ‘Single-day performance’) the paragraph on lines 192-194 seems to describe training on one day and testing on all other days. If so, why is this under this section title?
Significance: The authors are considering an important problem. The finding of new features and decoding architectures which lead to more robust decoding is noteworthy. However, my main concern with the present paper is the difficulty of the task used to evaluate performance. In particular, the trials used for evaluation are those in which the cursor is under computer control. While I understand the motivation for using these trials for an offline analysis (as closed-loop effects would make a later offline analysis difficult), I am worried about the difficulty of decoding these types of trials with a recurrent decoder. In particular, for any given target, was the trajectory the cursor took to that target identical from trial to trial? If so, it seems an effective decoding scheme would be to simply classify which target was selected for a trial and then use the recurrent dynamics of the decoder (without regard for neural activity) to move the cursor in a stereotyped way to the target. In this way, the decoder would only need to classify the direction of initial cursor movement from neural activity at the start of a trial. This is important because it is notoriously difficult to predict the closed-loop performance of a decoder from offline results, and having an offline dataset to test with which is too simple may exacerbate this problem. While I fully appreciate that it is hard to get continuous-valued, ground-truth data in a study with a human participant, either (1) providing more details in the manuscript about the variability of trajectories to each target under computer control or (2) in the absence of such variability, showing the decoder's effectiveness in an offline decode of hand control (from a non-human primate experiment) would improve the strength of the results.
Update after author response: I thank the authors for their response. My main concern about task difficulty has been at least somewhat addressed, and I have updated my score to reflect this.
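The task-difficulty concern raised under Significance can be made concrete with a small simulation. The sketch below is purely illustrative and is not the authors' decoder or data: it assumes eight center-out targets, one fixed straight-line path per target that is identical on every trial, and a hypothetical classifier that guesses the target from early neural activity with a chosen accuracy. The function names, the number of targets, and the accuracy value are all assumptions introduced here.

```python
import numpy as np

def stereotyped_trajectories(n_targets=8, n_steps=50, radius=1.0):
    """One fixed center-out path per target (assumed identical across trials)."""
    angles = 2 * np.pi * np.arange(n_targets) / n_targets
    ramp = np.linspace(0.0, radius, n_steps)[:, None]                 # (n_steps, 1)
    directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # (n_targets, 2)
    return ramp[None, :, :] * directions[:, None, :]                  # (n_targets, n_steps, 2)

def r_squared(y_true, y_pred):
    """Pooled coefficient of determination over both cursor dimensions."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

def replay_decoder_r2(accuracy, n_trials=200, n_targets=8, seed=0):
    """Score a 'decoder' that classifies the target once, then replays a template.

    `accuracy` is a hypothetical target-classification accuracy; after the
    initial guess the decoder ignores neural activity entirely.
    """
    rng = np.random.default_rng(seed)
    templates = stereotyped_trajectories(n_targets=n_targets)
    targets = rng.integers(0, n_targets, size=n_trials)
    true_paths = templates[targets]

    # Misclassified trials are assigned a different target chosen at random.
    wrong = rng.random(n_trials) > accuracy
    guessed = targets.copy()
    offsets = rng.integers(1, n_targets, size=int(wrong.sum()))
    guessed[wrong] = (targets[wrong] + offsets) % n_targets
    decoded_paths = templates[guessed]

    return r_squared(true_paths.reshape(-1, 2), decoded_paths.reshape(-1, 2))

if __name__ == "__main__":
    for acc in (1.0, 0.9, 0.7):
        print(f"classification accuracy {acc:.1f}: R^2 = {replay_decoder_r2(acc):.3f}")
```

Under these assumptions, a decoder that never looks at neural activity after its initial guess scores perfectly whenever the classification is correct, which is why trial-to-trial trajectory variability matters for interpreting the offline results.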

Paper ID:	8215