NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:6501
Title:A Domain Agnostic Measure for Monitoring and Evaluating GANs

Reviewer 1

- Clear and well-presented paper.
- The experiment section is detailed and provides a lot of insight into the theoretical work on the duality gap.
- The results are significant and demonstrate the value of the DG as a good measure.

Reviewer 2

The idea of studying GANs from the game theory perspective is not new; however, using the duality gap as a performance metric (some sort of divergence between the generated data distribution and the real data distribution) is original to the best of my knowledge. The paper is written clearly. In terms of significance, while the idea of the duality gap is "natural" when considering the game theory perspective for GANs, it is not clear why this is a good metric for _any_ domain. The authors imply that it is a good idea to find a metric that does not depend on the domain of the data, but given all the parallels between GANs and the different divergences between probability distributions (JS, Wasserstein, etc.), I think the main problem is to find a metric that can be thought of as correctly modeling the distance between high-dimensional datasets such as those given by images. In that case, modeling this aspect (which is highly domain-dependent) is crucial for understanding what a GAN is capturing about the data distribution. Of course the duality gap can be a good performance measure for certain domains, but I would argue that it depends heavily on the domain. Since game theory is well founded in clear assumptions, it is possible to find scenarios for which the duality gap is a good metric, namely scenarios in which one can test that the assumptions hold to a certain extent. However, this is not true in general. For images, for instance, it is not clear why the duality gap is a good measure: What is the model that leads one to this conclusion? What are the assumptions? And to what extent are these assumptions correct? I agree that finding methods that can be domain agnostic is one goal of ML research; however, for the particular problem of assessing the quality of a state-of-the-art generative model, I believe that better understanding how the networks involved actually encode the data in their weights is more important than one more performance metric besides FID and IS.
Then again, the duality gap can be good for certain problems, but probably not for all, and at least for image generation it would only be one more metric, together with FID and IS. With all this said, I would argue that the significance of this work is good, but not great.

Rebuttal update: The authors pointed out that the DG can be used for unlabeled datasets, which is an important remark that I took into consideration when reviewing the paper, and my score considers this property. The comparison with FID and IS was in the sense that there are no clear guarantees (or a proper reasoning framework or model) for images specifically that this measure is somehow significant. The proposed guarantees come from game theory, but why is it a good framework for testing the quality of models of natural images? Still, I believe the paper is good when thinking about GANs in general (domain agnostic as the authors propose, for which the game theory framework makes sense). However, the fact is that GANs are used in very specific settings (mostly for modelling images). Therefore, the significance of this paper is good, but not great. My score thus remains unchanged.

Reviewer 3

The proposed method is a nice contribution that provides a framework for the evaluation of GANs using duality gap theory. Specifically, it considers the gap between various discriminator-generator pairs (worst generator, discriminator). This can provide a means for evaluating GANs in various settings. The method uses a good theoretical framework and is well evaluated experimentally. It solves an important problem: the evaluation of GANs.
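
For reference, the "gap" between worst-case pairs mentioned above is usually written as follows. This is the standard game-theoretic definition, not a formula quoted from the reviews; the symbols M (the GAN objective), u (generator parameters) and v (discriminator parameters) are notation assumed here for illustration.

```latex
\mathrm{DG}(u, v) \;=\; \max_{v'} M(u, v') \;-\; \min_{u'} M(u', v) \;\ge\; 0
```

The first term pairs the current generator with its worst-case discriminator, and the second pairs the current discriminator with its worst-case generator. The gap is non-negative because \max_{v'} M(u, v') \ge M(u, v) \ge \min_{u'} M(u', v), and it is zero exactly when (u, v) is a pure Nash equilibrium of the game.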

Reviewer 4

This submission aims to develop metrics for monitoring GAN training and assessing the quality of the resulting images. Such a metric is useful for inspecting GAN convergence and for detecting the mode collapse associated with GANs. This paper adopts the duality gap as the quality measure and provides a minmax approximation for it. Various experiments are performed that correlate the duality gap pattern with the convergence behavior.

Strong points:
- Metrics for monitoring GAN performance are novel and very practical.
- The experiments are extensive.

Weak points:
- The approximation for the duality gap is rather ad hoc; it is not clear how close the approximation is to the real statistical gap. This needs further experimental exploration and justification.
- Is the duality gap evaluated in parallel with the training?
- The main result in this paper is Theorem 1. However, the usefulness of the lower bound for the duality gap is not immediately clear. It would be interesting if one could develop upper bounds.
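
The concern about how close an approximate duality gap is to the true one can be illustrated on a toy problem. The sketch below is not the authors' implementation: the objective `payoff` and the function `estimate_duality_gap` are hypothetical, chosen so that the exact gap is known in closed form (u^2 + v^2). It approximates the worst-case players with a fixed number of gradient steps started from the current point, which is one plausible reading of a "minmax approximation".

```python
def payoff(u, v):
    # Hypothetical convex-concave toy objective (an assumption, not from the paper).
    # The "generator" u minimizes it, the "discriminator" v maximizes it.
    return 0.5 * u**2 + u * v - 0.5 * v**2


def estimate_duality_gap(u, v, steps=200, lr=0.1):
    """Approximate DG(u, v) = max_v' M(u, v') - min_u' M(u', v)
    with a fixed budget of gradient steps for each worst-case player."""
    # Worst-case discriminator for the fixed generator u: gradient ascent on v'.
    v_worst = v
    for _ in range(steps):
        grad_v = u - v_worst          # dM/dv' at (u, v_worst)
        v_worst += lr * grad_v

    # Worst-case generator for the fixed discriminator v: gradient descent on u'.
    u_worst = u
    for _ in range(steps):
        grad_u = u_worst + v          # dM/du' at (u_worst, v)
        u_worst -= lr * grad_u

    return payoff(u, v_worst) - payoff(u_worst, v)


if __name__ == "__main__":
    exact = 1.0**2 + 2.0**2  # closed-form gap for this toy objective at (1, 2)
    for k in (1, 5, 50, 200):
        print(k, estimate_duality_gap(1.0, 2.0, steps=k), exact)
```

In this toy setting, each inner loop can only under-shoot its optimum, so the estimate is a lower bound on the exact gap that tightens as the step budget grows. Whether the same behavior holds for real GAN objectives, which are not convex-concave, is exactly the open question raised in the weak points.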