Reviews: Noise-tolerant fair classification

At one level, I really like the very cute observation (Theorem 2) presented in the paper and acknowledge that it has potentially interesting implications. On the other hand, I see the following major issues with this work, which make me feel that it does not rise to the level of a NeurIPS paper.
(a) Very limited generalizability: The findings do not generalize to (i) notions of fairness other than demographic parity (equalized odds is nothing but demographic parity over positively labeled data points), (ii) scenarios where there are non-binary sensitive features, e.g., it is unclear how the observation generalizes to scenarios where there are more than 2 races in a population, (iii) scenarios where the noisy labels deviate from the mutual contamination model with its constraint ($\alpha + \beta < 1$).
(b) Notation and definitions are inconsistent in several places. E.g., equations (4) and (5) are inconsistent in the way they define the corrupted distribution. Equation (1) does not define $\Lambda$ and seems to redefine a loss function already defined before. The loss function with a bar on top in equation (2) is not defined.
(c) Something I did not quite get: In equation (5), $\alpha D_{0,\cdot}$ goes one way, while $(1 - \beta) D_{0,\cdot}$ goes the other way. But, given that $\alpha$ and $\beta$ don't sum to 1, what happens to the remaining data points?
(d) Technically, the paper does not have a lot to add. But the primary contribution of the paper is the cute observation and its application scenarios in practice. I wonder if this paper might be better suited to a conference focussed on the topic.
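For reference on point (c), a short sketch may help. It assumes the paper follows the standard mutual contamination form; the equations below are that standard form, not a transcription of the paper's equations (4) and (5):

$$
D_{1,\cdot}^{\mathrm{corr}} = (1 - \alpha)\, D_{1,\cdot} + \alpha\, D_{0,\cdot}, \qquad
D_{0,\cdot}^{\mathrm{corr}} = \beta\, D_{1,\cdot} + (1 - \beta)\, D_{0,\cdot}, \qquad
\alpha + \beta < 1.
$$

Under this reading, $\alpha$ and $\beta$ are mixing weights of two different corrupted distributions. Each mixture's weights already sum to 1 on their own, so $\alpha + \beta$ need not equal 1 and no data points would be left unaccounted for.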

Section 1 describes the set-up of the problem. In particular, the authors emphasize that there are two cases where features might have noise in them: 1) when noise is deliberately added by researchers for privacy purposes and 2) in the "positive and unlabeled" setting, where individual participants in the minority group might feel uncomfortable disclosing their membership, leading to unlabeled data for the sensitive feature in some cases. The case under consideration is binary classification on output $Y$ with a binary sensitive feature $A$. There are two main assumptions in this paper. The first is that the noise can be described by the "mutually contaminated learning" model. In this case, each element in the distribution of corrupted $A=0$ examples is drawn from the true $A=0$ distribution with probability $\alpha$ and from the $A=1$ distribution with probability $1-\alpha$. Similarly, each element in the distribution of corrupted $A=1$ examples is drawn from the true $A=1$ distribution with probability $\beta$ and from the $A=0$ distribution with probability $1-\beta$. The second assumption is that the fairness metric of interest falls under the category of "mean-difference scores", which means that it measures the difference in mean scores between the two subgroups. This paper uses two examples of such metrics (demographic parity and equality of opportunity) in its analysis. Section 4 contains one of the main results of the paper. In particular, Theorem 2 shows that, for any given classifier $f$, the mean-difference score on the corrupted data is related to the mean-difference score on uncorrupted data by a simple scaling factor related to the degree of noise. This insight allows the authors to construct an algorithm to produce fair classification by incorporating existing tools.
In particular, they rely on existing methods for estimating noise and for finding fair classification algorithms in the absence of noise to produce fair classification algorithms in the presence of noise. Additionally, this section connects the idea of noise to that of privacy, though it's worth noting that this type of noise would only obscure the sensitive attribute $A$ and not the other features. Section 5 contains experimental results for classification in two scenarios: one where noise is added to the sensitive attribute for privacy purposes, and one where it is assumed that some members of the minority group fail to self-identify on an individual level. They compare their method to three baselines: 1) running the classifier on uncorrupted data, 2) running it on corrupted data without accounting for the corruption, and 3) a denoising method, which the authors note undoes some of the privacy protections by inferring the sensitive label. In both cases, the authors show that their method achieves fairness and accuracy levels close to what would be obtained on uncorrupted data; the denoising method performs worse in terms of accuracy and has the added concern of violating privacy. While the denoising method returns lower fairness guarantees, it is debatable whether that is actually desirable, because the guarantees are lower than what was requested, whereas the method this paper proposes is more consistent in what it returns. The paper is well-written and effectively communicates the ideas involved. Overall, this paper gives a fairly good treatment of what it set out to do. The main contribution is not very technically deep, but it does provide a simple and perhaps useful conceptual message: accounting for noisy labels can be achieved by tightening the tolerance in proportion to the data quality. One minor point: the second footnote on page 3 states the assumption that the classifier $f$ doesn't use the sensitive attribute $A$.
It might be useful to state this in a more prominent portion of the text: without this caveat, the results seem counter-intuitive, which makes it harder for the reader to follow. EDIT: I have read the author response, and my vote to accept stands.
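The scaling relationship summarized above for Theorem 2, and the message about tightening the tolerance, can be checked with a small simulation. This is a minimal sketch, not the authors' code. It uses the parameterization given in this review (each corrupted-group example comes from its own true group with some probability), and the names `p_keep0`, `p_keep1`, `sample_corrupted_group`, and `tightened_tolerance` are hypothetical. Under that parameterization, a direct calculation gives a scaling factor of `p_keep0 + p_keep1 - 1` for the demographic-parity gap.

```python
import numpy as np


def sample_corrupted_group(rng, own, other, p_keep, n):
    """Draw n scores, each from `own` with probability p_keep, else from `other`."""
    from_own = rng.random(n) < p_keep
    return np.where(from_own, rng.choice(own, n), rng.choice(other, n))


def tightened_tolerance(tau, p_keep0, p_keep1):
    """Tolerance to enforce on corrupted data so the clean gap stays within tau."""
    return tau * (p_keep0 + p_keep1 - 1)


rng = np.random.default_rng(0)
n = 200_000

# Hypothetical binary classifier outputs f(x) on the true groups A=0 and A=1.
true0 = (rng.random(n) < 0.30).astype(float)
true1 = (rng.random(n) < 0.50).astype(float)

# Assumed noise levels: probability that a corrupted-group example
# really comes from that group (illustrative values only).
p_keep0, p_keep1 = 0.9, 0.8

corr0 = sample_corrupted_group(rng, true0, true1, p_keep0, n)
corr1 = sample_corrupted_group(rng, true1, true0, p_keep1, n)

# Demographic-parity style mean difference on clean and corrupted groups.
clean_gap = true1.mean() - true0.mean()
noisy_gap = corr1.mean() - corr0.mean()
scale = p_keep0 + p_keep1 - 1

print("clean gap:        ", clean_gap)
print("noisy gap:        ", noisy_gap)
print("scale * clean gap:", scale * clean_gap)
print("tolerance for tau=0.05:", tightened_tolerance(0.05, p_keep0, p_keep1))
```

The noisy gap should match the scaled clean gap up to sampling error, which is why a fairness constraint with tolerance `tau` on clean data corresponds to a smaller tolerance on corrupted data.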

Paper ID:	147
Title:	Noise-tolerant fair classification
