NeurIPS 2019
Sunday, December 8 through Saturday, December 14, 2019, at the Vancouver Convention Center
Paper ID: 8478
Title: Missing Not at Random in Matrix Completion: The Effectiveness of Estimating Missingness Probabilities Under a Low Nuclear Norm Assumption

Reviewer 1

I have to say I really enjoyed reading this paper. The motivation, problem, and insights are clear and well presented. The paper addresses the problem of de-biasing a non-uniform sampling pattern for matrix completion by using a clever existing technique (1-bit matrix completion) to estimate the matrix P of probabilities of observing each entry. My only concern is that this idea relies on P having low nuclear norm. Empirical evidence suggests that this holds for real datasets (the experiments in the paper show that the technique produces a reasonable completion). However, it would be interesting to theoretically characterize (or at least provide some insight into) which sampling patterns correspond to a P with low nuclear norm. In the absence of such a theoretical characterization, and since in general (as far as I know) P is unknown for most (if not all) real datasets, one way to obtain insights that justify the assumption would be to analyze the type of patterns produced by matrices P with low nuclear norm, and to test whether they resemble real data sampling patterns.
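To make the technique under discussion concrete, the following is a minimal sketch of estimating P by nuclear-norm-constrained maximum likelihood on the binary observation mask, in the spirit of 1-bit matrix completion. The solver (projected gradient ascent, projecting singular values onto an l1 ball) and all parameter names are illustrative assumptions, not the paper's exact 1BITMC implementation.

```python
import numpy as np

def project_simplex(v, tau):
    """Euclidean projection of a nonnegative vector v onto {x >= 0, sum(x) <= tau}."""
    if v.sum() <= tau:
        return v
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - tau) / (np.arange(len(u)) + 1) > 0)[0][-1]
    theta = (css[rho] - tau) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def estimate_propensities(mask, tau=5.0, lr=0.5, iters=300):
    """Fit P = sigmoid(A) by maximizing the Bernoulli log-likelihood of the
    0/1 observation mask over latent matrices A with ||A||_* <= tau.
    The nuclear-norm projection just truncates A's singular values."""
    A = np.zeros(mask.shape)
    for _ in range(iters):
        P = 1.0 / (1.0 + np.exp(-A))
        A = A + lr * (mask - P)  # gradient of the Bernoulli log-likelihood
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        A = (U * project_simplex(s, tau)) @ Vt
    return 1.0 / (1.0 + np.exp(-A))
```

Sampling patterns drawn from such a P could then be compared qualitatively against the observation masks of real datasets, as suggested above.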

Reviewer 2

Originality: According to lines 59-61, the proposed 1BITMC approach appears to be a special case of the approach of Davenport et al. 2014; the paper itself states that 1BITMC was originally proposed by Davenport et al. 2014 (lines 114-115). In this sense, the paper does not propose any novel approaches. The theoretical results for 1BITMC, which seem to be the main contributions of this paper, are mostly adapted from those of Davenport et al. 2014.

Quality: I am a bit concerned about the experimental setup in Section 4.2. The paper randomly splits the data into training, validation, and test sets, and uses the test set to evaluate all rating-prediction approaches in terms of MAE and three variants of MAE-IPS. This evaluation procedure may itself be biased, because the test ratings are not missing at random. Instead, existing studies (Schnabel et al. 2016; Wang et al. 2019) use missing-at-random ratings, collected by asking users to rate randomly selected items, as the test set.

Clarity: The paper is mostly well written, with a few exceptions. For example, it defines a noise matrix at line 77 but never models the rating noise or uses the noise matrix in the propensity estimation, so this definition seems unnecessary for the overall flow of the paper.

Significance: The paper demonstrates that the proposed 1BITMC approach is significantly better than the naive Bayes and logistic regression approaches on synthetic datasets. However, the experimental results on the real datasets in Section 4.2, for both the rating-prediction and classification tasks, do not seem significant to me.
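The bias concern raised under Quality is easiest to see from how an IPS-corrected metric reweights errors. Below is a minimal sketch of a self-normalized IPS estimate of MAE; the function name and the self-normalization choice are illustrative assumptions, not necessarily the paper's exact MAE-IPS variants.

```python
import numpy as np

def mae_ips(preds, ratings, propensities):
    """Self-normalized IPS estimate of MAE over observed test entries: each
    absolute error is weighted by 1 / p(observed), so ratings that were
    unlikely to be revealed count more, correcting for MNAR test selection.
    If the test propensities themselves are misestimated, this correction
    (and hence the evaluation) inherits that bias."""
    w = 1.0 / np.asarray(propensities, dtype=float)
    err = np.abs(np.asarray(preds, dtype=float) - np.asarray(ratings, dtype=float))
    return float((w * err).sum() / w.sum())
```

For example, `mae_ips([1, 2], [2, 2], [0.5, 1.0])` weights the first (rarely observed) entry's error twice as heavily as the second, yielding 2/3 rather than the unweighted MAE of 1/2.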

Reviewer 3

Since the algorithm for estimating the propensities was proposed by Davenport et al. 2014, the originality of the paper lies mainly in the derivation of the bounds and in the experiments. For the bounds on the bias and the overall completion error, there are no direct experiments bridging the proposed theory and practice. I would like more empirical evidence on the assumptions from real-world matrices, beyond the recommendation domain that COAT and MovieLens come from. The novelty of the paper is also less impressive because the motivation for adopting the nuclear norm is unclear.

The experiments only demonstrate that the proposed propensity estimator achieves results similar to previous classic methods (and can even be slightly worse when the data better fits naive Bayes or logistic regression). The performance gain of the newly proposed estimator on the MovieLens dataset (the largest dataset in the experiments) is not very significant compared with naive Bayes, suggesting that when m and n are large the bias and completion error are similar to those of naive Bayes. Granted that the new estimator does not require extra features or MAR data, I would still say that the knowledge established by this paper is not very significant in its current form. The paper could do better by considering whether user/item features and MAR data, when available, can be incorporated into the 1BITMC algorithm. Could that substantially improve the state of the art?

The paper is in general well written and easy to follow. To make it self-contained, it would be better to introduce some background on the nuclear norm. The authors are also encouraged to spend slightly less space on the background of IPS-related approaches, and to say a bit more about 1-bit matrix completion, since it is closely related work and its algorithm is the workhorse of the proposed estimator.
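On the requested nuclear norm background, the one-line definition is all that is needed; this is standard material, sketched here for completeness rather than taken from the paper.

```python
import numpy as np

# The nuclear norm ||A||_* is the sum of A's singular values. It is the
# standard convex surrogate for rank, which is why low-nuclear-norm
# constraints stand in for low-rank structure in matrix completion.
def nuclear_norm(A):
    return float(np.linalg.svd(A, compute_uv=False).sum())
```

For instance, a rank-1 matrix `np.outer(u, v)` has a single nonzero singular value, so its nuclear norm is `||u|| * ||v||`, whereas a full-rank matrix of the same size accumulates a contribution from every singular value.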