Reviews: Unsupervised Discovery of Temporal Structure in Noisy Data with Dynamical Components Analysis

I appreciated the authors' responses, and I think the proposed refinements will strengthen the manuscript. As such, I have increased my score from 5 to 6. However, I remain lukewarm about the actual results shown in the paper. I found the comparisons limited (the authors still resist performance comparisons with common approaches such as GPFA or LDS in Fig. 4), and the performance quantifications not very elucidating (they focus solely on modest gains in predictive performance, whereas the strongest motivation for this method is interpretability). Given that what I find most exciting in this submission is the potential for interpretability, I am quite disappointed that no effort is made to explore this avenue in the results. To be clear, I agree that it is unreasonable to expect fully featured scientific results in a NeurIPS submission, but I would have liked to see this interpretability aspect at least briefly explored.

===============================================================

The authors propose a new linear dimensionality reduction method, termed Dynamical Components Analysis (DCA), which finds low-dimensional projections of observed data by taking into account its temporal structure, namely the dependence between successive time points in the inferred latent space. The authors then validate the proposed method by applying it to two neuronal datasets as well as two other real-world datasets, focusing on the ability of the extracted projections to explain external variables and future states.

Quality and Clarity

The authors tackle an interesting problem and motivate their approach well. I do think the description of PCA’s shortcomings in the Abstract and Introduction is a bit extreme. Indeed, in neuroscience applications, practitioners are often interested in shared variability across neurons, something PCA and Factor Analysis are adept at describing.
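To make the contrast between variance and temporal dependence concrete, here is a minimal sketch, assuming a stationary Gaussian process, of one way to score a linear projection: the mutual information between past and future windows of length `T` of the projected series. My understanding is that DCA optimizes a quantity of this kind, but this is not the authors' code; the helper names, the window length `T=3`, and the toy data are hypothetical. Note that the same projection `V` is applied at every time step.

```python
import numpy as np


def lagged_covariances(X, max_lag):
    """Empirical autocovariances C[k] ~ E[x_t x_{t+k}^T] of X (rows = time)."""
    X = X - X.mean(axis=0)
    n = X.shape[0]
    return [X[: n - k].T @ X[k:] / (n - k) for k in range(max_lag + 1)]


def window_covariance(C, T):
    """Covariance of a stacked window [x_t, ..., x_{t+T-1}], assuming stationarity."""
    d = C[0].shape[0]
    S = np.zeros((T * d, T * d))
    for i in range(T):
        for j in range(T):
            # E[x_{t+i} x_{t+j}^T] is C[j-i] for j >= i and C[i-j]^T otherwise
            block = C[j - i] if j >= i else C[i - j].T
            S[i * d:(i + 1) * d, j * d:(j + 1) * d] = block
    return S


def gaussian_predictive_info(X, V, T):
    """I(past T steps; next T steps) of the projected series X @ V, in nats,
    under a stationary Gaussian assumption: log|S_T| - 0.5 * log|S_2T|.
    The same V is used at every time step (shared predictive/predicted subspace)."""
    Cp = [V.T @ Ck @ V for Ck in lagged_covariances(X, 2 * T - 1)]
    _, logdet_T = np.linalg.slogdet(window_covariance(Cp, T))
    _, logdet_2T = np.linalg.slogdet(window_covariance(Cp, 2 * T))
    return logdet_T - 0.5 * logdet_2T


# Toy data: a low-variance oscillation plus a high-variance white-noise channel.
rng = np.random.default_rng(0)
t = np.arange(5000)
X = np.column_stack([np.sin(0.3 * t) + 0.1 * rng.standard_normal(t.size),
                     3.0 * rng.standard_normal(t.size)])

# PCA's leading axis follows variance, i.e. the noise channel.
evals, evecs = np.linalg.eigh(np.cov(X.T))
pca_axis = evecs[:, -1]

pi_slow = gaussian_predictive_info(X, np.array([[1.0], [0.0]]), T=3)
pi_noise = gaussian_predictive_info(X, np.array([[0.0], [1.0]]), T=3)
print(pca_axis, pi_slow, pi_noise)
```

On this toy example, PCA's leading axis follows the high-variance noise channel, whereas the predictive-information score is clearly higher for the low-variance oscillation. An actual DCA fit would optimize such a score over orthonormal `V`, which this sketch does not do.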
There is, of course, value in capturing temporal structure, but I find it a stretch to suggest that the method proposed here is strictly superior to PCA or FA; rather, they have different objectives. Furthermore, the proposed method is designed to capture temporal variability but does not provide a proper dynamical description: given activity at time t, it makes no prediction about future activity (beyond seeking to maximize dependency across time), nor does it offer any approximation to the dynamical rules at play. This contrasts with, for example, a Kalman filter/linear dynamical systems approach, which provides a linear approximation to the underlying dynamics. I think it would be worth discussing this limitation in the paper.

From a technical standpoint, the authors did a great job describing DCA in Section 2. Section 2.1 offers a fair comparison with PCA, and it is very motivating to see DCA do so well at capturing temporal structure in this example. Sections 2.2 and 2.3 were clear and explored the technical choices well. The comparisons with SFA and CCA in Section 3 are interesting, but I feel this section could benefit from a higher-level, more intuitive approach. As the authors state, one could try to capture dependencies across time using CCA between consecutive time points. This has the advantage of not assuming that the predictive and predictable subspaces are the same (V != U), an assumption DCA does make. This is indeed an important point, as V != U for large families of dynamical systems, and in fact CCA outperforms DCA at capturing mutual information across time in this section. At this point, a reader will wonder why one should use DCA over CCA. The fact that CCA returns two subspaces (predictive and predicted) does not seem like a big drawback. My interpretation is that DCA takes into account dependencies over the entire sequence (across all time), which this application of CCA does not.
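A minimal sketch of the lag-1 CCA between consecutive time points discussed above, using symmetric inverse-square-root whitening. `temporal_cca`, `lag1_corr`, and the toy system are hypothetical illustrations, not the authors' implementation, and here `U` weights the past while `V` weights the future (an assumed convention). The toy system is one where the predictive and predicted directions differ (x1 at time t drives x2 at time t+1), so CCA returns distinct weights, while a single shared projection captures much less lag-1 correlation.

```python
import numpy as np


def inv_sqrt(S, eps=1e-12):
    """Symmetric inverse square root of a covariance matrix."""
    w, E = np.linalg.eigh(S)
    return E @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ E.T


def temporal_cca(X, k):
    """Lag-1 CCA between x_t and x_{t+1}.

    Returns U (weights on the past), V (weights on the future), and the top-k
    canonical correlations. U and V are free to differ.
    """
    X = X - X.mean(axis=0)
    past, future = X[:-1], X[1:]
    n = past.shape[0]
    S_pp = past.T @ past / n
    S_ff = future.T @ future / n
    S_pf = past.T @ future / n
    W_p, W_f = inv_sqrt(S_pp), inv_sqrt(S_ff)
    # SVD of the whitened cross-covariance gives the canonical directions
    A, rho, Bt = np.linalg.svd(W_p @ S_pf @ W_f)
    return W_p @ A[:, :k], W_f @ Bt.T[:, :k], rho[:k]


def lag1_corr(X, w):
    """Lag-1 autocorrelation of a single shared projection y_t = w^T x_t."""
    y = X @ w
    return np.corrcoef(y[:-1], y[1:])[0, 1]


# Toy system: x1 is white noise; x2 at time t+1 copies x1 at time t plus noise.
rng = np.random.default_rng(1)
n = 5000
x1 = rng.standard_normal(n)
x2 = np.empty(n)
x2[0] = 0.1 * rng.standard_normal()
x2[1:] = x1[:-1] + 0.1 * rng.standard_normal(n - 1)
X = np.column_stack([x1, x2])

U, V, rho = temporal_cca(X, k=1)
u = U[:, 0] / np.linalg.norm(U[:, 0])
v = V[:, 0] / np.linalg.norm(V[:, 0])
shared = lag1_corr(X, np.array([1.0, 1.0]) / np.sqrt(2))
print(u, v, rho, shared)
```

For this system, no single shared direction can exceed a lag-1 correlation of roughly 0.5, while the canonical pair (u close to the x1 axis, v close to the x2 axis) reaches roughly 0.99. Lag-1 correlation is used only as a simple proxy here; it is not DCA's objective.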
It would be interesting to see this play out in practice. In particular, it was unclear to me why CCA was not used in Section 4. Also, while the authors mention GPFA and the Kalman filter as alternative approaches, these methods are absent from the comparisons in Section 4. I think it would have been extremely helpful to include them. The authors state: “learning and inference in generative models tend to be computationally expensive, particularly in models featuring dynamics, and there are often many model and optimization hyperparameters that need to be tuned.” This is certainly true for LFADS, but it is a bit of a stretch when referring to GPFA and the Kalman filter. If DCA is that much faster and easier to fit, it would be good to show this by direct comparison. Finally, it was not clear to me why, in Fig. 4, performance on the neuronal datasets was measured only with respect to external variables. I agree that this is an interesting quantification, but it should come in addition to the simpler and more direct predictability of future states. Furthermore, the results shown in Fig. 4 are a bit underwhelming: the relative improvement over SFA is quite small across datasets, and DCA seems to struggle to outperform PCA in most cases.

Originality

The model proposed here is, to the best of my knowledge, novel in this context.

Significance

While I think the proposed method is interesting and has the potential to be useful in practice, the experimental part of the manuscript needs to be improved for this work to have a significant impact. As it stands, it is unclear how this method compares to popular alternatives such as GPFA, the Kalman filter, or temporal CCA, and it is not obvious that it offers a significant improvement over PCA on the real-world datasets. Furthermore, while the authors tout DCA's ability to extract interpretable projections, no attempt is made to explore or interpret these.
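The more direct "predictability of future states" metric requested for Fig. 4 could be computed for any method's projection along the following lines. This is a minimal sketch: a least-squares VAR(1) fit stands in for a full Kalman filter/LDS (which would additionally model observation noise), and `fit_linear_dynamics`, `one_step_r2`, and the simulated data are hypothetical.

```python
import numpy as np


def fit_linear_dynamics(Z):
    """Least-squares A in z_{t+1} ~ A z_t (a VAR(1) fit; no latent noise model)."""
    B, *_ = np.linalg.lstsq(Z[:-1], Z[1:], rcond=None)  # solves Z[:-1] @ B ~ Z[1:]
    return B.T


def one_step_r2(X_train, X_test, V):
    """Project with V, fit linear dynamics on the training split, and report the
    held-out R^2 of one-step-ahead predictions of the projected state."""
    mu = X_train.mean(axis=0)
    Z_train, Z_test = (X_train - mu) @ V, (X_test - mu) @ V
    A = fit_linear_dynamics(Z_train)
    pred, target = Z_test[:-1] @ A.T, Z_test[1:]
    ss_res = np.sum((target - pred) ** 2)
    ss_tot = np.sum((target - target.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot


# Toy data: a noisy, damped 2-D rotation embedded with three high-variance noise channels.
rng = np.random.default_rng(2)
n, theta, a = 4000, 0.1, 0.95
R = a * np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])
z = np.zeros((n, 2))
for t in range(1, n):
    z[t] = R @ z[t - 1] + 0.3 * rng.standard_normal(2)
X = np.column_stack([z, 2.0 * rng.standard_normal((n, 3))])

X_train, X_test = X[: n // 2], X[n // 2:]
V_dynamic = np.eye(5)[:, :2]  # projection onto the rotating plane
V_noise = np.eye(5)[:, 3:]    # projection onto two noise channels
r2_dynamic = one_step_r2(X_train, X_test, V_dynamic)
r2_noise = one_step_r2(X_train, X_test, V_noise)
print(r2_dynamic, r2_noise)
```

Because every method compared (PCA, SFA, DCA, CCA) yields a projection `V`, the same held-out score could be reported for each, alongside the decoding of external variables.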

Paper ID:	8045

Reviewer 1

Reviewer 2

Reviewer 3