NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 4175
Title: Bipartite expander Hopfield networks as self-decoding high-capacity error correcting codes

Reviewer 1

Regular Hopfield nets with N nodes can only store O(N) patterns. These patterns act as local minima that denoise inputs by moving them towards the stored patterns. The novel idea in this paper is to link this picture to that of error-correcting codes (ECCs). In an ECC we have N bits, of which M carry the message and P are parity checksums, with N = M + P (for example, the classic Hamming code has M = 4 and P = 3, so N = 7). ECCs have a similar effect to Hopfield nets in that the 2^M possible messages each have unique checksums and act as attractors for noisy input versions. But unlike Hopfield nets, they store exponentially many (2^M) patterns rather than linearly many. This motivates two questions: how can ECCs have this larger capacity than Hopfield nets, and how can Hopfield nets be modified to be equivalent to ECCs?

The first attempt uses modified Hopfield nets with 4-factors replacing pairwise factors. This can then be converted into an equivalent RBM-like bipartite structure with pairwise weights. I would have liked the links to RBMs to be made more explicit -- mostly just as a cultural thing: most people in the community are familiar with RBMs thanks to the deep learning bandwagon, so they are a standard concept to hang the new ideas off (e.g. Fig 1c has an RBM structure). This first step is found not to work, because local minimisation doesn't always take us to the correct place. But another new idea, the use of expander network topologies, is introduced, which encourages the minimisation to find the correct solutions.

The paper is very well written and has clearly been through good internal review already. (Unusually, I can't find any typos to report.) It's an interesting and novel paper which makes a new link between two previously unconnected areas, and I think it could inspire more new thinking between them. Maybe it will do for Hopfield nets what turbo codes did for Bayes nets. Accept.
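
To make the capacity contrast concrete, here is a minimal, self-contained illustration (not taken from the paper) using the standard Hamming(7,4) code, i.e. M = 4 message bits and P = 3 parity bits: each of the 2^M = 16 codewords acts as an attractor for its single-bit-error neighbourhood, which is exactly the clean-up behaviour a Hopfield net provides, but with exponentially many stored patterns. All names and numbers below are for exposition only.

    import itertools
    import numpy as np

    # Systematic Hamming(7,4): N = 7 bits = 4 message bits + 3 parity bits.
    G = np.array([[1, 0, 0, 0, 1, 1, 0],
                  [0, 1, 0, 0, 1, 0, 1],
                  [0, 0, 1, 0, 0, 1, 1],
                  [0, 0, 0, 1, 1, 1, 1]])   # generator: codeword = m @ G mod 2
    H = np.array([[1, 1, 0, 1, 1, 0, 0],
                  [1, 0, 1, 1, 0, 1, 0],
                  [0, 1, 1, 1, 0, 0, 1]])   # parity check: H @ c = 0 for codewords

    def decode(r):
        """Correct a single-bit error: the syndrome equals the column of H at the flipped position."""
        s = H @ r % 2
        if s.any():
            i = next(j for j in range(7) if np.array_equal(H[:, j], s))
            r = r.copy()
            r[i] ^= 1
        return r

    # 2^4 = 16 codewords; each attracts every word within Hamming distance 1.
    rng = np.random.default_rng(0)
    for m in itertools.product([0, 1], repeat=4):
        c = np.array(m) @ G % 2
        noisy = c.copy()
        noisy[rng.integers(7)] ^= 1          # flip one random bit
        assert np.array_equal(decode(noisy), c)

In Hopfield terms, all 16 codewords are simultaneously stable states of 7 binary units, whereas a pairwise Hopfield net over the same 7 units could store only a handful of patterns.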

Reviewer 2

Originality: The theoretical contributions are largely based on well-known ideas (Sipser and Spielman, 1996). However, the adaptation of these ideas to Hopfield nets is interesting and original.

Significance: While associative memory networks are certainly interesting objects in their own right, as a submission in the neuroscience track I could not say that I took away concrete neuroscientific insight from reading the paper. Indeed, the paper spends hardly any time on this except for a brief paragraph in the discussion. I feel the paper would be much stronger with more discussion of its relevance to the neuroscientific community.

Quality: I did not check all the proofs in the supplementary material in detail, but the work appears to be technically sound and supported by simulations.

Clarity: I feel that too much of the actual results and concrete details has been relegated to the appendix. For example, the main paper does not even include equations or much detail about the energy function, dynamics, or learning of the proposed architecture. Oddly enough, the equations that are included are not those of the main model.

Minor: please include a legend in the panel in Figure 2c.

________

I've read the authors' feedback. Thanks for your response and expanded discussion of relevance to neuroscience.

Reviewer 3

[As a general note to the meta-reviewer, the other reviewers, and the authors: this review was a last-minute emergency review and so unfortunately does not consider the extensive information in the supplementary material.]

This work asks whether “it is possible to build simple, Hopfield-like neural networks that can represent exponentially many well-separated states using linearly many neurons, and perform error-correction or decoding on these states in response to errors on a finite fraction of all neurons.” As the authors note, capacity is often balanced against robustness: since code redundancy is needed to enable recovery from noise, capacity is necessarily reduced by that redundancy, making this an especially difficult problem. The authors rise to this challenge and claim to produce a network that exhibits “exponential capacity, robustness to large errors, and self decoding or clean-up of these errors”. Their network takes the structure of a restricted Boltzmann machine in which the hidden units form clusters of neurons that laterally inhibit each other, each with the same connectivity to the input units but with different weights. The authors motivate their network using inspirations and ideas from expander graphs, error-correcting codes, and Hopfield networks.

The proposed solution is straightforward and well motivated, and all the design and algorithm choices seem quite sensible. Unfortunately, as a non-expert in this area (and given the short time available for review), I could not properly assess the significance or novelty of the approach in the context of the broader literature.

On a general note, the manuscript is wonderfully written, and as a non-expert in this area I found it easy to grasp the significance of the problem as well as the proposed solution. The authors are commended for putting together a clear, crisp, and easy-to-understand piece of work. Of particular note are the intuitions and insights scattered throughout the text, which help the reader grasp the logic of the proposed solution.

I have the following questions that I hope the authors can comment on:

- While acknowledging that this work is mainly conceptual, I am left wondering about the approach’s sensitivity to the type or form of data. My hunch is that the approach should be relatively robust, in which case I wonder about the authors’ decision not to experiment with other types of data (and potentially compare results to any relevant baselines, if they exist).
- Building off the previous question, one constraint seems to be the requirement for binary inputs. Do the authors imagine a possible extension of this work to handle continuous-valued data, or is the approach limited in this regard?
- Can the authors comment further on the steep transition between error recovery and non-recovery as a function of noise (see Figure 2c)? Is there a reason for this fast transition, and/or would a less steep slope be desirable?
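
As a concrete (and purely illustrative) reading of the architecture described above, the sketch below builds a bipartite network whose hidden layer is split into clusters: the units in a cluster share the same sparse connectivity to the visible layer but carry different weights, and lateral inhibition is approximated by a hard winner-take-all within each cluster, with the winning unit feeding back to clean up its visible inputs. The connectivity pattern, weight choice, and update rule are assumptions made for exposition, not the authors' actual construction.

    import numpy as np

    rng = np.random.default_rng(0)
    N_VIS, N_CLUSTERS, UNITS_PER_CLUSTER, FAN_IN = 24, 8, 4, 6

    # Every unit in a cluster sees the same sparse subset of visible units...
    cluster_inputs = [rng.choice(N_VIS, size=FAN_IN, replace=False)
                      for _ in range(N_CLUSTERS)]
    # ...but each unit carries its own +/-1 weight vector (hypothetical choice).
    weights = rng.choice([-1, 1], size=(N_CLUSTERS, UNITS_PER_CLUSTER, FAN_IN))

    def step(v):
        """One sweep of hypothetical dynamics on a +/-1 visible state v."""
        v = v.copy()
        for idx, W in zip(cluster_inputs, weights):
            drive = W @ v[idx]         # each unit in the cluster scores its inputs
            winner = np.argmax(drive)  # winner-take-all stands in for lateral inhibition
            v[idx] = W[winner]         # winner pulls its inputs toward its preferred pattern
        return v

    v = rng.choice([-1, 1], size=N_VIS)   # noisy +/-1 visible state
    for _ in range(5):                    # iterate toward a fixed point
        v = step(v)

Whether such dynamics actually converge to an intended stored state depends on the structure of the sparse connectivity, which is the kind of expander-graph property the paper appeals to.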