NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 8728
Title: Machine Learning Estimation of Heterogeneous Treatment Effects with Instruments

Reviewer 1

I believe this paper should be accepted to NeurIPS. Originality: This paper combines a number of known ideas into a very nice framework to do heterogeneous treatment effect estimation in the IV setting. Quality: The paper is technically correct. The experiments are well motivated and well done. Clarity: The paper is very clear and easy to understand (my one comment is that I had to rewrite the equations for myself ignoring the intercept terms to really understand what was going on). Significance: This is a method with lots of potential applications in both academic and industrial practice.

Reviewer 2

The authors develop new algorithms for instrumental variables based on the orthogonal ML technique. The paper considers the IV problem in terms of moment conditions and derives conditions under which the target quantity of interest has a good rate of estimation. The evaluation is then done on a number of experimental settings with semi-synthetic and real data. The theoretical contributions could be better explained instead of purely deferring to the existing literature; please consider adding a full consistency proof for completeness. It would also serve the reader if a larger discussion of the original double ML work and Neyman orthogonality were included. The experimental results look promising, but I think it is premature to judge them without more benchmarks and baselines. Could the authors explain the large width of the confidence intervals? Is it a data artefact or something practitioners should keep in mind? My main issue is that the interaction between a few elements of the theorem, such as the functions g and delta, has not been explored in the semi-synthetic experiments. To add to the validity of the proposed methods, more functional relationships between the covariates and the outcome should be explored, and the improvement of the method over other flexible IV methods should be demonstrated. Overall, I think this work addresses an important gap in the literature on observational causal estimation where IVs are leveraged.

---- POST REBUTTAL ---- The authors addressed the issues raised. I now have more faith in the presented experimental results and the discussion around them. As the authors point out, weak IVs do present significant problems. They will add more experiments to provide fuller context for DMLIV's performance. Please consider adding a quick discussion of the orthogonality conditions to the appendix. Perhaps also discuss how they allow practitioners to trade off interpretability and performance by specifying certain parts of the model, as in the partially linear model, for example. I've raised the score accordingly.
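
For readers less familiar with the orthogonal ML idea discussed in this review, the following is a minimal sketch of a residual-on-residual IV estimate in a partially linear IV model with a constant treatment effect. It is not the authors' DMLIV method or code: the simulated data, the linear nuisance models, and the function names (`simulate`, `cross_fit_residuals`, `orthogonal_iv_effect`) are all assumptions made for illustration.

```python
import numpy as np


def simulate(n=20000, theta=2.0, seed=0):
    """Hypothetical data: instrument Z, treatment T, outcome Y, covariates X."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))
    U = rng.normal(size=n)  # unobserved confounder of T and Y
    Z = X[:, 0] + rng.normal(size=n)  # instrument, unrelated to U
    T = 1.5 * Z + X[:, 1] + U + rng.normal(size=n)
    Y = theta * T + X[:, 2] - 2.0 * U + rng.normal(size=n)
    return Y, T, Z, X


def cross_fit_residuals(target, X, folds):
    """Residualize `target` on X, predicting each fold from the other folds."""
    X1 = np.column_stack([np.ones(len(X)), X])
    resid = np.empty(len(target), dtype=float)
    for k in np.unique(folds):
        train, test = folds != k, folds == k
        # Linear least squares stands in for an arbitrary ML nuisance model.
        coef, *_ = np.linalg.lstsq(X1[train], target[train], rcond=None)
        resid[test] = target[test] - X1[test] @ coef
    return resid


def orthogonal_iv_effect(Y, T, Z, X, n_folds=2, seed=0):
    """Solve the residualized moment E[(rY - theta * rT) * rZ] = 0 for theta."""
    rng = np.random.default_rng(seed)
    folds = rng.integers(0, n_folds, size=len(Y))
    r_y = cross_fit_residuals(Y, X, folds)
    r_t = cross_fit_residuals(T, X, folds)
    r_z = cross_fit_residuals(Z, X, folds)
    return float(np.sum(r_z * r_y) / np.sum(r_z * r_t))


Y, T, Z, X = simulate()
estimate = orthogonal_iv_effect(Y, T, Z, X)
print(f"estimated effect: {estimate:.3f} (simulated truth: 2.0)")
```

Because all three variables are residualized on the covariates before the instrument is applied, small errors in the fitted nuisance functions enter the moment only at second order, which is the property the orthogonality conditions are meant to deliver. The heterogeneous-effect setting studied in the paper replaces the constant effect with a function of the covariates, which this sketch does not attempt.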

Reviewer 3

The paper is clear and seems correct as far as I can tell, but I found that it may not be very accessible to those who are not familiar with the double machine learning approach. The paper proposes an extension of the two-stage least squares method (DMLATEIV) that allows arbitrary models. The extension itself seems quite straightforward, and the significance of this contribution is limited. The authors show that DMLATEIV does not satisfy Neyman orthogonality, meaning that it is sensitive to errors in the first-stage estimation. To mitigate this weakness, the authors suggest modifying the estimator using the doubly robust approach, and they show that the loss for the modified estimator is Neyman orthogonal, so that the resulting estimator is robust to estimation errors in the nuisance estimators. This seems to be a very important and useful result. However, the application of the doubly robust approach to instrumental variable regression is not entirely new, and the resulting estimator is known to be Neyman orthogonal according to [V. Chernozhukov, D. Chetverikov, et al. 2017. Double/Debiased/Neyman Machine Learning of Treatment Effects. arXiv 1701.08687]: "Neyman-orthogonal scores are readily available for both the ATE and ATTE – one can employ the doubly robust/efficient scores of Robins and Rotnitzky (1995) and Hahn (1998), which are automatically Neyman orthogonal." Nevertheless, I could not find any previous work on this topic for the heterogeneous setting. The proposed method is applied to a real-world treatment effect problem and other datasets. The results look great, but there is no comparison with other methods, which is a weak point of the paper.

===== Update after the authors' response: The authors' response has addressed my major concern about the novelty and significance of the technical contributions. As the authors claim, the paper in fact presents a solution to an open question raised in [Xinkun Nie and Stefan Wager, 2017. Quasi-Oracle Estimation of Heterogeneous Treatment Effects.]. The authors are encouraged to report the results of the experimental comparison with DeepIV mentioned in the authors' feedback.
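
For reference, the Neyman orthogonality property mentioned in this review is commonly stated as follows. The notation here (score \psi, data W, target \theta_0, nuisance \eta_0) is the generic one from the double ML literature and is an assumption of this note, not necessarily the notation of the paper under review.

```latex
% Moment condition identifying the target \theta_0, given the true nuisance \eta_0:
\mathbb{E}\big[\psi(W;\, \theta_0,\, \eta_0)\big] = 0

% Neyman orthogonality: the moment is locally insensitive to perturbations
% of the nuisance around its true value, in every admissible direction \eta - \eta_0:
\frac{\partial}{\partial r}\,
\mathbb{E}\big[\psi\big(W;\, \theta_0,\, \eta_0 + r(\eta - \eta_0)\big)\big]\Big|_{r=0} = 0
```

When the second condition holds, first-stage estimation error in the nuisance affects the moment for the target only at second order, which is the robustness property the reviewer describes.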