NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 4452
Title: Chirality Nets for Human Pose Regression

Reviewer 1

This paper presents the novel Chirality Nets, in which pose symmetry (chirality equivariance) is built directly into the network. The proposed method has fewer trainable parameters and lower computational complexity. Extensive experiments on three different tasks show its effectiveness. The idea of parameter sharing is not novel; however, this paper designs a series of novel variants of the standard building blocks. The idea of built-in chirality equivariance is well motivated and interesting. Chirality equivariance for human pose regression is of great importance and interest to the community. Extensive experiments on various tasks show the wide range of potential applications of the proposed method. The paper is reasonably well written and easy to follow. The authors also provide code to ensure reproducibility.
Questions:
(1) Will parameter sharing cause a loss of model representation power? In Tables 1, 2, and 3, the proposed Chirality Nets seem to (slightly) outperform the test-time augmentation baseline. Why? Can parameter sharing be viewed as a kind of model regularization that reduces overfitting?
(2) The reviewer is also curious about the 2D human pose estimation performance of the proposed method. For 2D human pose estimation, there exist large in-the-wild datasets (MSCOCO and MPII) where overfitting is not a significant problem. Will Chirality Nets achieve good results in such cases?
=================UPDATE====================
After reading the comments and the author responses, most of my concerns are addressed. Overall this is a good paper. I will raise my rating from 6 to 7.

Reviewer 2

A growing body of literature has shown that building symmetries into neural networks through equivariant layers is an effective means of improving results, especially in the face of limited data and even when data augmentation is used. This paper continues that trend by showing that equivariance to chirality transformations consistently improves results on pose regression tasks. The paper is well written and easy to follow. Related work is discussed in a mostly adequate and balanced manner. The work fits into existing theoretical frameworks, given that the group acts linearly (though not via permutations). Such networks are covered by the theory of Shawe-Taylor and colleagues (e.g. "Representation theory and invariant neural networks") as well as recent work by Kondor, Cohen, and others. This paper, however, focuses on the practical aspects of implementing chirality-equivariant layers rather than on mathematical theory, and as such makes a very useful contribution. It is shown that the equivariant layers reduce the number of parameters and FLOPs. A very solid experimental validation is performed, showing consistent improvements over recent state-of-the-art methods for this task. The improvements are not very large, but large gains are not to be expected from such a small (2-element) symmetry group. Overall, this is a nice paper with a simple, well-executed idea.
Typo on line 75: equvariant
>>>> Post rebuttal comments
I have read the other reviews and the rebuttal. The reviewers seem to agree that this paper makes a useful contribution and should be accepted. Since I did not raise any major concerns in my initial review, the rebuttal was mainly addressed to the other reviewers, and so did not change my judgement significantly.
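The equivariance property discussed in the reviews above can be made concrete with a small numerical sketch. This is not the authors' implementation: it uses plain group averaging over the 2-element group {identity, mirror} to project an unconstrained linear layer onto the equivariant subspace, and the 5-joint skeleton, the joint indices, and the names `chirality_matrix` and `symmetrize` are hypothetical choices made only for illustration. The mirror transform here swaps left/right joints and negates the x-coordinate, so it acts linearly but is not a pure permutation.

```python
import numpy as np


def chirality_matrix(num_joints, swap_pairs):
    """Mirror matrix T for a flattened 2D pose [x0, y0, x1, y1, ...].

    T swaps each (left, right) joint pair and negates every x-coordinate.
    The skeleton layout is an assumption made for this sketch.
    """
    perm = np.arange(num_joints)
    for left, right in swap_pairs:
        perm[left], perm[right] = right, left
    T = np.zeros((2 * num_joints, 2 * num_joints))
    for j in range(num_joints):
        T[2 * j, 2 * perm[j]] = -1.0          # x of joint j <- -x of its mirror joint
        T[2 * j + 1, 2 * perm[j] + 1] = 1.0   # y of joint j <-  y of its mirror joint
    return T


def symmetrize(W, T_in, T_out):
    """Average W over the group {I, T}; the result satisfies W_eq @ T_in == T_out @ W_eq."""
    return 0.5 * (W + T_out @ W @ T_in)


# Hypothetical skeleton: joint 0 is central, (1, 2) and (3, 4) are left/right pairs.
T = chirality_matrix(num_joints=5, swap_pairs=[(1, 2), (3, 4)])

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 10))   # unconstrained linear layer, pose in and pose out
W_eq = symmetrize(W, T, T)      # chirality-equivariant version of the same layer
x = rng.normal(size=10)         # a random flattened pose

# Equivariance: mirroring the input mirrors the output.
print(np.allclose(W_eq @ (T @ x), T @ (W_eq @ x)))

# For a purely linear layer, flip-based test-time augmentation gives the same output.
tta = 0.5 * (W @ x + T @ (W @ (T @ x)))
print(np.allclose(tta, W_eq @ x))
```

For a single linear layer the symmetrized weights and flip-based test-time augmentation coincide exactly; once nonlinearities and training are involved the two generally differ, which is the setting Reviewer 1's question about the test-time augmentation baseline concerns.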

Reviewer 3

This paper can be considered the first to build chirality into network structure design, and it demonstrates the effectiveness of this approach on several tasks related to the chirality transform. It may inspire a line of work exploiting this property, which potentially has a large impact. The empirical experiments mostly indicate the effectiveness of the new network structure. It would be better if the paper showed performance for the entire human pose estimation pipeline, i.e., 2D pose estimation and 2D-to-3D mapping. This could be achieved by modifying the state-of-the-art network, i.e., hourglass, into its chiral form. In addition, the network's runtime and memory consumption could be reported quantitatively. It would be better if the experiments were more complete. Overall, I think this is a good paper with strong potential to benefit readers. It is well organized and clearly explained, and it merits a clear accept.