Submitted by
Assigned_Reviewer_3
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
The paper describes a method of completing an nbym
matrix M with rank r smaller than min(n,m). Instead of completing M
directly, the authors assume access to an nbyr1 matrix A and an mbyr2
matrix B such that one can write M=AZB'. The A, B matrices are considered
as side information given beforehand and the proposed method completes M
by estimating Z.
The authors prove that, when r1 is much smaller
than n and r2 is much smaller than m, one needs far fewer entries to
complete M by using the side information matrices. Moreover, in this case,
Z is of much smaller dimensions than M, and thus the estimation is very
efficient.
Matrix completion with side information has been
considered previously in (Goldberg et. al., 2010; Adams, Dahl, andMurray,
2010; Porteous, Asuncion, and Welling, 2010). The method in this paper
appears different from the previous ones, and the advantage over (Goldberg
et. al., 2010) is shown in the paper.
The sample complexity
analysis is nice and the experimental results are promising.
A
negative feeling about the paper is, the side information assumption is a
bit strong, and the reduced sample complexity and performance gain is
therefore not so surprising. Nevertheless, the paper showed an example,
where such an assumption is appropriate.
Q2: Please summarize your review in 12
sentences
The paper presents a theoretical result showing that
using side information matrices can significantly reduce sample complexity
in matrix completion. The use of side information leads to an efficient
matrixcompletion method, which achieves speedups by estimating a
lowdimensional matrix. Submitted by
Assigned_Reviewer_4
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
The paper studies the matrix completion problem in the
context of collaborative filtering or multilabel learning. The
contribution over existing work is that the authors account for side
information on the rows and columns of the matrix in their model and show
how exploiting side information can bring about improvements in sample
complexity over the standard matrix completion formulation. In particular
the authors study a model for matrix completion where the matrix of
interest M can be decomposed as M = A Z_0 B^T with A and B is the known
side information . When A and B are low dimensional this formulation has
fewer degrees of freedom since the unknown matrix Z_0 is much smaller than
M itself which intuitively explains why one should expect better sample
complexity guarantees.
The authors analyze a very natural
extension of the usual nuclear norm minimization program. They show that
the sample complexity required to guarantee unique minimizer is O(\mu^2 r
(r_a + r_b) \log n) for a n \times n matrix of rank r with Z_0 of
dimension r_a \times r_b and where \mu is related to the usual incoherence
parameter used in the literature. This implies that if the side
information is low dimensional (r_a, r_b small) the sample complexity is
sublinear in the matrix dimensions, which is an improvement over standard
results.
The authors complement the theoretical results with
simulations and experiments in the context of multilabel learning. The
simulations show that their algorithm is both computationally and
statistically more efficient that singular value thresholding, one of the
standard matrix completion algorithms. Their algorithm also outperforms a
number of baselines on several data sets in the multilabel learning
application.
The paper studies an interesting problem and
contributes to the body of work on matrix completion by capturing side
information. The results are interesting and the paper is well written.
The proof techniques are novel but not particularly innovative given the
work on matrix completion. In light of this, I think the contribution is
somewhat small. In particular, it would be nice to know the necessary
conditions in the noiseless case and the guarantees for the noisy version
of the matrix completion problem. The experiments are fairly convincing
but could be presented better. Graphs in lieu of tables would lead to
improved readability while also saving space.
Questions: 1. In
the main paper there is no discussion about the role of \Omega_0 and
\Omega_1 or any discussion about the condition \Omega_1 \ge q \Omega_0. It
would be nice to provide some intuition as to why such a condition
necessary and regimes under which it is satisfied. 2. Lemma 4 in the
appendix requires that \Omega_0 \le \Omega \le \Omega_1, but in the main
theorem there is no upper bound on the sample complexity. This seems like
an error, but I'm more concerned with why there is an upper bound on
\Omega in the first place. Is this condition necessary? What is the
intuition for it? 3. Why are there test instances held out of the
experiments? Why not just randomly sample the label matrix and test on the
unobserved entries of that matrix? 4. The paper title is misleading 
Speedup implies computationally, which was not the focus of the paper. I
read the paper thinking it would be much more of an algorithmic paper than
it was. Q2: Please summarize your review in 12
sentences
The paper makes an interesting if small contribution
to the matrix completion literature by studying one way to incorporate
side information into the model and showing how this can lead to improved
sample complexity guarantees. The proofs use what are now standard tools
in convex analysis so while the formulation is novel, the analysis does
not offer much technical innovation. Submitted by
Assigned_Reviewer_5
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
The paper develops new theories that use side
information to reduce sample complexity, and demonstrates the approach's
effectiveness in transductive multilabel learning.
For the proof,
it is assumed that both A and B are orthonormal matrices (top paragraph in
page 4). It would be better to clarify how the results will change when A
and B are not orthogonal. Also since U and V are linear combinations of
column vectors in A and B, the proof may be simplified.
The
unconstrained optimization problem (3) is very similar to the problem
solved in [1]. Actually, if we index the set of observation \Omega from 1
to k such that $\Omega_k = (i_k, j_k)$, and let $C_k = A_{i_k, .} B_{j_k,
.}^T$. The problem (3) becomes $min_Z \sum_k \M_k  tr( Z C_k)\^2+
\lambda Z_{tr}$ which is exactly the problem solved in [1]. Similarly,
[1] also provides consistency results and an efficient algorithm. It would
be useful to compare to the algorithm for multilabel learning experiment.
In the end of second paragraph, it is said that recent efforts try
to address the issue but at a price of losing performance guarantee. It's
not clear as paper [5] cited in the original paper gives convergence
guarantee. Actually there are several papers that provide efficient
algorithms that don't require updating full matrix and also have
convergence guarantees such as paper [2] below.
In proof of lemma
1 in supplementary material, there is a typo in line 063 that it should be
$(1  P_{T^{\perp})$ where the $\perp$ is missed.
[1] Francis
R. Bach, Consistency of Trace Norm Minimization, JMLR 2008. [2] M.
Jaggi, M. Sulovsky. A Simple Algorithm for Nuclear Norm Regularized
Problems, ICML 2010
Q2: Please summarize your review
in 12 sentences
The authors provide new theoretical and empirical
results of sample complexity for matrix completion. The results should be
compared to Bach(2008) that analyzed a very similar problem.
Q1:Author
rebuttal: Please respond to any concerns raised in the reviews. There are
no constraints on how you want to argue your case, except for the fact
that your text should be limited to a maximum of 6000 characters. Note
however that reviewers and area chairs are very busy and may not read long
vague rebuttals. It is in your own interest to be concise and to the
point.
