NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:403
Title:Volumetric Correspondence Networks for Optical Flow

Reviewer 1

I vote for rejecting this submission from NeurIPS 2019. For one, it has severe presentation issues. Besides a fairly high number of typos and formatting issues, the explanations in the paper are at times lacking and counterintuitive, especially in the method section (Section 3). The overview of the network architecture in Figure 3 needs to be explained more thoroughly so that it stands on its own.
Regarding the method itself, the contribution over prior work is unfortunately not entirely clear to me. The main claim for novelty is the application of truly 4D cost volume processing, in contrast to state-of-the-art methods that "reshape the 4D cost volume as a multichannel 2D array with N = U × V channels" [9,...] (l. 101). However, FlowNet [9], which the authors refer to in this context, does nothing of the sort; instead, it extracts patch-wise features for both input images and correlates pairs of patches to get a notion of correspondence. The "offset invariance" (l. 104) the authors claim to have in comparison to prior work is therefore not novel but rather predominant in state-of-the-art methods in this field. Furthermore, to reduce the computational load, the authors separate 4D convolutions into two 2D operations (l. 123) acting independently on the two image domains. It seems to me that this allows for only very limited correlated interactions of the kind that would justify the authors' claim of using "true 4D operations". Moreover, given that the set of core operations (2D convolutions) used to fuse information is similar, it seems to me that overall this approach has a strong theoretical connection with the strategy in FlowNet or other state-of-the-art methods. This aspect needs to be explored and clarified by the authors; otherwise the contribution over prior work is dubious to me.
I find the evaluations in the paper lacking and not consistent with the authors' claims of "dramatically improving accuracy over the state-of-the-art". Qualitative comparisons are missing completely in the paper; there should be at least one depiction of how the method improves upon prior work on some challenging example. While the authors provide a couple of examples in the supplementary material, those comparisons are all done with respect to only one other method (PWC-Net).
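The concern about separating 4D convolutions into two 2D operations can be made concrete with a small numerical sketch. It is not taken from the paper or its code: the sizes `U`, `V`, `H`, `W`, `K`, the helper names, and the choice to filter the displacement axes first and the spatial axes second are all assumptions made for illustration. Under those assumptions, the sketch checks that the two chained 2D filters equal a single 4D filter whose kernel is the outer product of the two 2D kernels, which is the sense in which interactions across the two domains are restricted.

```python
from itertools import product
import random

U = V = 3   # displacement window size (hypothetical)
H = W = 4   # spatial size (hypothetical)
K = 3       # kernel width along every axis (hypothetical)
R = K // 2  # kernel radius; out-of-range entries are treated as zero


def all_indices():
    """Every (u, v, y, x) index of the 4D cost volume."""
    return product(range(U), range(V), range(H), range(W))


def conv_uv(cost, k_uv):
    """2D filter over the displacement axes (u, v) only."""
    return {
        (u, v, y, x): sum(
            k_uv[a][b] * cost.get((u + a - R, v + b - R, y, x), 0.0)
            for a, b in product(range(K), repeat=2)
        )
        for u, v, y, x in all_indices()
    }


def conv_xy(cost, k_xy):
    """2D filter over the spatial axes (y, x) only."""
    return {
        (u, v, y, x): sum(
            k_xy[c][d] * cost.get((u, v, y + c - R, x + d - R), 0.0)
            for c, d in product(range(K), repeat=2)
        )
        for u, v, y, x in all_indices()
    }


def conv_4d(cost, k4):
    """Full 4D filter with an arbitrary K x K x K x K kernel."""
    return {
        (u, v, y, x): sum(
            k4[a][b][c][d]
            * cost.get((u + a - R, v + b - R, y + c - R, x + d - R), 0.0)
            for a, b, c, d in product(range(K), repeat=4)
        )
        for u, v, y, x in all_indices()
    }


rng = random.Random(0)
cost = {idx: rng.uniform(-1, 1) for idx in all_indices()}
k_uv = [[rng.uniform(-1, 1) for _ in range(K)] for _ in range(K)]
k_xy = [[rng.uniform(-1, 1) for _ in range(K)] for _ in range(K)]

# Separable path: two 2D filters applied one after the other.
separable = conv_xy(conv_uv(cost, k_uv), k_xy)

# Equivalent full 4D kernel: the outer product of the two 2D kernels.
k_outer = [[[[k_uv[a][b] * k_xy[c][d] for d in range(K)]
             for c in range(K)] for b in range(K)] for a in range(K)]
full = conv_4d(cost, k_outer)

max_diff = max(abs(separable[i] - full[i]) for i in all_indices())
weights_full = K ** 4           # free weights in a general 4D kernel
weights_separable = 2 * K ** 2  # free weights in the two 2D kernels
print(max_diff, weights_full, weights_separable)
```

With `K = 3`, a general 4D kernel has 81 free weights while the separable pair has 18, so a single separable layer can only represent outer-product kernels. Stacking several such layers with nonlinearities in between may recover richer interactions, which is the kind of analysis the review asks the authors to provide.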

Reviewer 2

Originality: This paper consists of several engineering "tricks" that enable 4D cost volume filtering. While these modifications are relatively straightforward, they are well motivated, principled, and of great use to practitioners and to the deployment of optical flow models.
Quality: The technical content of the paper appears to be correct. There are multiple aspects I like about this paper:
+ The proposed modifications seem to speed up the training process (~10x) and improve its stability (wider range of learning rates).
+ The proposed model significantly reduces the number of required FLOPs (< 50% of PWC-Net, ~25% of LiteFlowNet) and memory while achieving state-of-the-art accuracy (among compact models).
+ One of the motivations of this paper is to reduce memorization and improve generalization; this is nicely demonstrated on two tasks (stereo -> flow and small -> large motion). It might be interesting to see whether it also helps domain adaptation (however, this is certainly beyond the scope of this paper).
+ The experiments in this paper are very clearly described and contain great discussion.
+ As this paper would be of great use to practitioners, it's also nice to see that the authors will release the code.
Clarity: The paper is very well written and contains nice figures that illustrate the key concepts well, and the authors will release the code. Perhaps I'd just modify lines 22-30, which somewhat repeat the abstract; certain design choices could also be better motivated, explained, or justified (e.g. why cosine similarity is used). The paper is well organized, the experiments section contains great discussion, and I enjoyed reading it!
Significance: Memory and compute requirements of modern optical flow estimation methods represent a real bottleneck; this paper might make deployment of optical flow much easier. The paper proposes several relatively straightforward modifications of standard dense optical flow matching models; however, I believe these would be of great use to practitioners!
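For readers unfamiliar with the cosine-similarity design choice questioned above, the following is a minimal sketch of a cosine-similarity cost volume between two feature maps. It is an illustration only, not the authors' implementation: the function names, the toy feature maps, and the `max_disp` parameter are hypothetical. It shows two general properties of cosine similarity, namely that matching scores are bounded in [-1, 1] and do not depend on feature magnitude; whether these motivated the authors is not stated in the reviews.

```python
import math
import random


def cosine(a, b, eps=1e-8):
    """Cosine similarity of two feature vectors; eps guards zero norms."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / max(norm, eps)


def cost_volume(f1, f2, max_disp):
    """Return cost[(u, v, y, x)]: similarity of f1 at (y, x) and f2 at (y + v, x + u).

    f1 and f2 are H x W grids of feature vectors (nested lists).
    Displacements that leave the image get a score of 0.0.
    """
    height, width = len(f1), len(f1[0])
    disps = range(-max_disp, max_disp + 1)
    cost = {}
    for y in range(height):
        for x in range(width):
            for v in disps:
                for u in disps:
                    y2, x2 = y + v, x + u
                    inside = 0 <= y2 < height and 0 <= x2 < width
                    cost[u, v, y, x] = (
                        cosine(f1[y][x], f2[y2][x2]) if inside else 0.0
                    )
    return cost


# Toy data (hypothetical): f2 is f1 shifted one pixel to the right.
rng = random.Random(0)
H, W, D = 4, 5, 4
f1 = [[[rng.gauss(0, 1) for _ in range(D)] for _ in range(W)] for _ in range(H)]
zero = [0.0] * D
f2 = [[f1[y][x - 1] if x >= 1 else zero for x in range(W)] for y in range(H)]

cost = cost_volume(f1, f2, max_disp=1)

# Best displacement (u, v) for an interior pixel.
y, x = 2, 2
best_disp = max(
    ((u, v) for u in (-1, 0, 1) for v in (-1, 0, 1)),
    key=lambda d: cost[d[0], d[1], y, x],
)

# Scaling a feature vector leaves the score unchanged.
scaled = cosine(f1[0][0], [10.0 * p for p in f1[0][0]])
print(best_disp, scaled)
```

In this toy setup the highest score at pixel (2, 2) falls on the displacement u = 1, v = 0, matching the one-pixel shift used to build `f2`.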

Reviewer 3

This submission presents a number of modifications to simplify volumetric layers in networks for optical flow estimation. These improve not only the memory and execution time requirements, but also accuracy when compared to previous work. It is also demonstrated that the new networks generalize more effectively and can be repurposed for other correspondence tasks.
Originality: This is a new method for the visual correspondence problem, and it addresses quite a few shortcomings or limitations of existing methods. Related work is adequately cited and used to contextualise the proposed work.
Quality: The authors take care to properly explain and motivate design choices, and back up claims with a sufficient amount of experimental evidence. They are also successful in evaluating the strengths of the method, but could perhaps have put more effort into analysing its weaknesses.
Clarity: The submission is written and organized well. I only found a few small typos, as listed under "improvements" below.
Significance: This work does seem to be significantly new, with important results that can advance state-of-the-art research and practice of optical flow estimation. Tests show quite conclusive improvements over previous work.