NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 1295
Title: Conditional Independence Testing using Generative Adversarial Networks

Reviewer 1

Originality: This paper presents a new way to use GANs in hypothesis testing. It is very interesting to use GANs to construct a null distribution that adapts to the dataset without strong assumptions. The proposed method can be used for feature selection and explainable neural networks.

Quality: The authors support their framework with theoretical justification and empirical results. The quantitative experiments are limited to synthetic data but are comprehensive, and the expected behaviours of the GCIT are observed in these synthetic experiments. Only a single experimental result on real data is shown, and there it is hard to tell which method's result is more accurate and powerful. This issue will arise whenever the GCIT is applied to real-world problems.

Clarity: The method is clearly described and sufficient theoretical analysis is provided; exchangeability of the samples and statistics is checked. The paper is well written, so that any machine learning scientist can appreciate the main contribution and the authors' intuition.

Significance: My main concern with this work boils down to the robustness of the method. As the authors have shown, the method depends on hyperparameters (e.g., lambda, the GAN architecture) and on the quality of the trained network parameters. In some academic fields, obtaining a p-value below 0.05 is crucial for getting a paper published. In such fields, it is doubtful that the proposed method would be accepted, since training models with a larger lambda or a different GAN architecture would allow authors to manipulate their p-values. Protocols for avoiding GAN overfitting and for choosing hyperparameters and models should be carefully analyzed.

Reviewer 2

The authors tackle the problem of conditional independence testing by using a generative model to obtain the p-value.

Pros:
- Using a GAN to generate conditionally independent samples is a new design and a new way to tackle the problem.
- Good control of the type 1 error.
- Promising simulation results.

Cons:
- Is there any rationale for why increasing lambda causes an increase in the type 1 error?
- I feel that sample size and scalability could be an issue here. If data is limited, would the GAN be able to generate high-quality samples of X? Also, for a decent result, how many samples must be generated before the p-value becomes stable? It would be good to investigate these questions for large-scale implementations.

Thank the authors for the feedback addressing my questions. I am keeping my score at 6.
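For concreteness, the generate-samples-then-compare p-value computation that the reviews describe can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: `sample_x_given_z` stands in for the trained GAN generator (here it simply draws from the true conditional of the toy data), and the absolute-correlation statistic is an arbitrary choice of test statistic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: X and Y both depend on Z, and are conditionally independent
# given Z, so the null hypothesis X ⊥ Y | Z holds here.
n = 500
Z = rng.normal(size=n)
X = Z + 0.5 * rng.normal(size=n)
Y = Z + 0.5 * rng.normal(size=n)

def sample_x_given_z(Z, rng):
    """Stand-in for the trained generator: draws X~ ~ q(X | Z).
    In the GCIT this conditional is learned by a GAN; here we cheat
    and sample from the true conditional of the toy data."""
    return Z + 0.5 * rng.normal(size=Z.shape)

def stat(X, Y):
    """Test statistic: absolute Pearson correlation between X and Y
    (an arbitrary illustrative choice)."""
    return abs(np.corrcoef(X, Y)[0, 1])

# Statistic on the observed data.
rho = stat(X, Y)

# Statistics on M generated null samples X~ | Z.
M = 500
null_stats = np.array([stat(sample_x_given_z(Z, rng), Y) for _ in range(M)])

# Empirical p-value with the usual +1 correction, which keeps the
# test valid at finite M.
p_value = (1 + np.sum(null_stats >= rho)) / (1 + M)
print(p_value)
```

Since the toy data satisfies the null, the resulting p-value should behave roughly uniformly rather than concentrate near zero; Reviewer 2's stability question corresponds to how large M must be before `p_value` stops fluctuating.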

Reviewer 3

This paper follows the framework of conditional independence testing provided by [4]. Instead of assuming a parametric form for the distribution, it uses a GAN to simulate the distribution and then draws conclusions by comparing distributions. Some theoretical and empirical analysis of the method is provided in a logical manner. The experiments are well organized and easy to read. The framework is not new, but introducing a GAN may help extend the testing to high-dimensional data. The theoretical analysis does not completely cover all aspects that a hypothesis testing method needs.