Reviews: On the Fairness of Disentangled Representations

Reviewer 1

Response read. I appreciate the authors' commitment to making the paper clearer. The authors addressed my main concerns. I will upgrade to an 8.

----------------

**Summary**

The authors attempt to answer the question: do disentangled representations help with fairness? They do so by training 12,600 disentangling models and testing to what extent fair prediction can be done on top of these representations. They show experimentally that disentangling not only correlates with fairness, but does so even when they “adjust” for accuracy. This leads them to surmise that disentangling may be a common cause of accuracy and fairness. If it is the only common cause, then accuracy could be used as a proxy for fairness when dealing with disentangled models.

**Strengths**

* very well-written and clear for the most part
* nice concise summary of metrics and of disentangling as a whole
* all claims backed up with thorough large-scale experiments (12,600 models)
* experimental exploration of disentangling and fairness at this scale seems novel to me
* could be very significant to the fairness community if disentangling is the missing piece to ensure fairness in classifiers

**Weaknesses**

* The title is elegant, but it seems a bit general/all-encompassing. Maybe mention fairness in the title? Aren’t there other benefits of disentangled representations not covered here (sample efficiency, transfer/generalization, etc.)?
* Section 4.2 is quite clear and well-written for the most part, but there are a couple of confusing parts. In the adjusted score part, I found it hard to understand how subtracting the disentangling score of the nearest neighbors (in terms of classification accuracy) achieves the effect of “removing” the effect of accuracy.
* In the “How do we identify fair models?” section, I am a bit confused about how the chain of logic concludes that classification accuracy could be a proxy for fairness. Perhaps a causal graph over three nodes (disentangling score, GBT10000 accuracy, fairness) could be drawn and explained. I would be curious to see whether my understanding is correct: “Accuracy does not always lead to fairness (Theorem 1); disentangling correlates with accuracy (Locatello et al.); disentangling correlates with fairness (Section 4.1); disentangling correlates with fairness even if we adjust for accuracy (Section 4.2); accuracy correlates with fairness in disentangling models (Figure 4). So disentangling is maybe a ‘common cause’ of accuracy and fairness, i.e., fairness <- disentangling -> accuracy. If this is true (if disentangling is the only confounder), the model with the highest accuracy should be the most disentangled, which means the fairest?”

Overall, a nice, thoughtful paper with thorough experiments. Even though I don't know much about fairness, it seems it could have some significance for the fairness community.
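To make the question about the adjusted score concrete, here is a minimal sketch of one way such an adjustment could be computed. It assumes that each model's score is compared with the mean score of its `k` nearest neighbors in classification accuracy. The function name `adjusted_scores`, the value of `k`, and the tie-breaking order are illustrative assumptions, not details confirmed by the paper.

```python
from statistics import mean


def adjusted_scores(accuracy, disentanglement, k=5):
    """Subtract from each model's disentanglement score the mean score of
    its k nearest neighbours in accuracy (the model itself is excluded).

    Hypothetical helper: k and the tie-breaking order are assumptions.
    """
    if len(accuracy) != len(disentanglement):
        raise ValueError("inputs must have the same length")
    n = len(accuracy)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of models")

    adjusted = []
    for i in range(n):
        # Rank all other models by how close their accuracy is to model i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: abs(accuracy[j] - accuracy[i]),
        )
        # Baseline: what similarly accurate models score on average.
        baseline = mean(disentanglement[j] for j in others[:k])
        adjusted.append(disentanglement[i] - baseline)
    return adjusted


if __name__ == "__main__":
    acc = [0.90, 0.91, 0.70, 0.71]
    dis = [0.8, 0.6, 0.3, 0.1]
    print(adjusted_scores(acc, dis, k=1))
```

Under this reading, a positive adjusted score means a model is more disentangled than models of similar accuracy, so any remaining correlation with fairness cannot be explained by accuracy alone. In the toy example, the adjusted scores are approximately 0.2, -0.2, 0.2 and -0.2.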

Reviewer 2

**Originality**

The idea of improving the fairness of predictions via disentangling looks novel to me.

**Quality**

Most of the claims in this paper are supported by theorems and experiments, but there appear to be some issues, as I presented in Section 1.

**Clarity**

In general, this paper is well-organized and not difficult to follow. However, I was not able to understand the GBT10000 scores and the adj. metric. The authors might need to briefly introduce the "gradient boosted trees classifier" and the motivation for computing the adj. metric.

**Significance**

I believe this paper will make a reasonable contribution if the authors can fix the issues I have mentioned above.

Reviewer 3

- Predict a target variable based on representations.
- Theory suggests disentanglement doesn't guarantee fairness, but empirical results show a correlation between fairness and disentanglement.
- Fairness here is defined as having a prediction not depend on a sensitive factor s.
- The introduction could be a bit clearer about the definition of fairness that will be used, given the technical nature of the paper. More particularly, what justifies the demographic parity definition that is used throughout the rest of the paper?
- Figure 1 assumes a causal graph where the sensitive and target variables are independent, but the observations are generated from a complicated and unknown mixing function.
- Intuitively, I can think of a simple case where having entangled latent factors can "drag" along irrelevant factors which are correlated with the sensitive variable.
- Several VAE variants try to improve disentanglement.
- It may be useful to also be fair with respect to unobserved variables.
- Experiments show that disentanglement is generally correlated with fairness (Cars3D seems to be an exception).

This paper presents an interesting empirical analysis relating fairness to disentanglement of learned representations.
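For reference, the demographic parity notion raised above can be checked empirically. The following is a minimal sketch, assuming binary predictions and a discrete sensitive attribute. The function name `demographic_parity_gap` is hypothetical, and this gap is a common textbook formulation, not necessarily the unfairness score used in the paper.

```python
from collections import defaultdict


def demographic_parity_gap(predictions, sensitive):
    """Largest difference in positive-prediction rate between any two
    groups defined by the sensitive attribute.

    A gap of 0.0 means the rate of positive predictions is the same in
    every group. Assumes binary predictions encoded as 0 or 1.
    """
    if len(predictions) != len(sensitive):
        raise ValueError("inputs must have the same length")
    if not predictions:
        raise ValueError("inputs must be non-empty")

    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for y_hat, s in zip(predictions, sensitive):
        counts[s][0] += int(y_hat == 1)
        counts[s][1] += 1

    rates = [positives / total for positives, total in counts.values()]
    return max(rates) - min(rates)


if __name__ == "__main__":
    # Predictions that ignore the sensitive attribute: equal rates per group.
    print(demographic_parity_gap([1, 0, 1, 0], [0, 0, 1, 1]))
    # Predictions that copy the sensitive attribute: maximal gap.
    print(demographic_parity_gap([1, 1, 0, 0], [0, 0, 1, 1]))
```

A prediction that does not depend on s has a gap of zero, which matches the informal definition in the third bullet.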

Paper ID:	8270