NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 4192
A Unifying Framework for Spectrum-Preserving Graph Sparsification and Coarsening

### Reviewer 1

**Originality**

The paper seems quite original, with new algorithms based on a novel principle of similarity between graphs.

**Quality**

In general, the developed framework is not sufficiently analyzed. The paper contains no error bounds on the similarity between the reduced and original pseudoinverse of the Laplacian, nor any results in this direction. There is also no formal reasoning as to why minimizing $\|\Delta L^{-1}\|_F$ should particularly result in preservation of the top eigenvectors. No sense is given of how sparse the graph can be made while keeping $\|\Delta L^{-1}\|_F$ small. Algorithms 1 and 2 have many hyperparameters, and their choice is discussed only loosely, as a purely empirical matter. The experiments are well fleshed out and strong. However, there is no empirical evaluation of simultaneous reduction of nodes and edges, a main advertisement of the paper.

**Clarity**

Although the writing flowed well, I found this paper relatively confusing. It is not made sufficiently clear or precise why preservation (in Frobenius norm) of the pseudoinverse of the Laplacian is desired. For instance:

[75-76] Spectral sparsification preserves the solution $x^*$ of $Lx = b$ in a certain sense (see e.g. the monograph "Lx = b"). Does preservation of $L^{-1}$ in the Frobenius norm imply this same property? Does it imply preservation in a different sense?

[81-82] Can the statement that the 2nd eigenvector will be "preferentially preserved" be elaborated on? This feels to me like an interesting statement, and it comes up again in the experiments, but it is discussed and justified in broad terms only.

The motivation for the paper is also somewhat unclear. It is not clear whether preservation of $L^{-1}$ is a natural framework in which to consider graph coarsening/sparsification, or merely a convenient one. E.g., why is this considered as opposed to the " 'restricted' Laplacian quadratic form" mentioned in [55-57]?
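One way to make the "preferentially preserved" question above concrete (my own sketch, not an argument from the paper): since $L^{\dagger} = \sum_{i \geq 2} \lambda_i^{-1} u_i u_i^T$ over the nonzero eigenvalues, mode $i$ contributes $\lambda_i^{-2}$ to $\|L^{\dagger}\|_F^2$, so the Fiedler pair dominates the Frobenius norm. The path graph below is an arbitrary illustrative choice:

```python
import numpy as np

# Laplacian of a path graph on n vertices (illustrative example graph).
n = 8
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, 0] = L[-1, -1] = 1.0

# Drop the zero eigenvalue; mode i contributes 1/lambda_i^2 to ||L^+||_F^2.
lam = np.linalg.eigvalsh(L)[1:]
share = (1 / lam**2) / np.sum(1 / lam**2)
print(share.round(3))  # the first entry (lambda_2, the Fiedler mode) dominates
```

This only says the Frobenius objective weights the low end of the spectrum heavily; it is not a substitute for the formal justification the review asks for.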
I am also confused about the desire for unbiasedness in Section 3, i.e. the requirement that $E(L_{\tilde{G}}^{-1}) = E(L_G^{-1})$. This seems a somewhat arbitrary desire, as opposed to data reduction and preservation of $\|\Delta L^{-1}\|_F$. Is it made purely to simplify the derivations (9)-(11), or is there a claim that it is truly a desirable property?

Additionally, I think Section 3.4 could be reworked. It is not clear that there is a strong takeaway, and in the absence of one, the section contains a lot of implicit mathematical manipulations without much message. It is also confusing: for instance, what is $B_{1d}$ (or $B_{1c}$, or $B_2$) when one is interested in reducing both the number of nodes and edges? Similarly, Section 3.5 seems unnecessary. It is also not stated explicitly how the multi-step scheme eliminates the "undesirable behavior" of the single-step algorithm.

**Significance**

The paper seems to tackle a relevant issue (graph compression) in a novel way, with strong empirical results. However, without more of a theoretical grounding, and more details on how to implement the given algorithm (for instance, how to choose a stopping criterion in Algorithm 2), it is difficult to see it being of high practical significance.
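For concreteness on the quantity this review keeps returning to: for a single edge reweighting $L' = L + \delta\, b b^T$ with $b = e_i - e_j$ (both graphs connected), $\|\Delta L^{\dagger}\|_F$ has a closed form via the rank-one pseudoinverse update, $|\delta| \, \|L^{\dagger} b\|^2 / |1 + \delta \Omega_e|$. A small numerical check of that identity (the toy graph is my own choice, not from the paper):

```python
import numpy as np

def laplacian(n, edges):
    """Weighted graph Laplacian from (i, j, w) triples."""
    L = np.zeros((n, n))
    for i, j, w in edges:
        L[i, i] += w; L[j, j] += w
        L[i, j] -= w; L[j, i] -= w
    return L

# Toy graph: a triangle {0,1,2} with a pendant vertex 3.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0), (2, 3, 1.0)]
L = laplacian(4, edges)
Lp = np.linalg.pinv(L)

# Reweight edge (0, 1) by delta; both graphs stay connected, so the
# Sherman-Morrison-style update of the pseudoinverse is valid.
i, j, delta = 0, 1, -0.5
b = np.zeros(4); b[i], b[j] = 1.0, -1.0
omega = b @ Lp @ b                       # effective resistance of (0, 1)

# ||Delta L^+||_F computed directly ...
Lp_new = np.linalg.pinv(L + delta * np.outer(b, b))
direct = np.linalg.norm(Lp_new - Lp, 'fro')
# ... versus the closed form |delta| * ||L^+ b||^2 / |1 + delta * omega|.
closed = abs(delta) * (b @ Lp @ Lp @ b) / abs(1 + delta * omega)
assert np.isclose(direct, closed)
```

Nothing here bounds how small $\|\Delta L^{\dagger}\|_F$ can stay under aggressive sparsification; it only shows the per-edge change is governed by the leverage-type quantity $\delta \Omega_e$, which is the kind of dependence an error analysis would need to make explicit.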

### Reviewer 2

- The authors present an algorithm (based on an analysis of the pseudoinverse of the graph Laplacian after randomly perturbing edges in the graph) that can simultaneously be used for graph sparsification and graph coarsening.
- Is it clear that $w_e\Omega_e$ will not equal 1, to ensure finiteness in (4)?
- I am having trouble following the claim made in Eq. (6). The $M_e$ are fixed matrices with $M_e = w_e (l_i-l_j)(l_i-l_j)^T$, where $l_i$ is the $i$-th column of $L^{\dagger}_G$ and $e = (i,j)$. Does the randomness lie with the $f$ distributions (which can be assumed uncorrelated), or are you saying it is reasonable to assume a generative process on $G$ that would lead to uncorrelated $M_e$?
- In Figure 1, $\beta_{1d} = \emptyset$ for nodes. How are we to interpret the split in the intermediate range in this case?
- Can you provide details (either a citation or a derivation in the appendix) for the claim that $d_h(L_G \vec x, L_{\tilde{G}}\vec x) \leq \log(1+\epsilon)$ implies that $\tilde{G}$ is an $\epsilon$-approximation of $G$? This is a key property of the proposed hyperbolic metric, as it ties the classical work to the present experimental comparisons.
- When you introduce the problem of graph coarsening, you list a bevy of citations from the literature [9, 10, 3, 11, 12, 13, 14], yet your experiments for graph coarsening only compare against [47] (a paper from 1998) and [24] (an unpublished arXiv preprint). It is hard to gauge whether these are sufficient; I understand that more comparisons are included in the appendix (to [24] again and to another unpublished arXiv preprint). Comparing against more algorithms would be helpful in gauging the effectiveness of your approach.
- Similarly, for graph sparsification you compare your approach against [20], an algorithm from 2011. Comparisons against more (and more recent) approaches would be helpful in gauging the effectiveness of your approach.
- Figure 3 in the Appendix (referenced as evidence that sparsification preserves the spectrum) shows that coarsening preserves the spectrum.
- It is not clear why Algorithm 2, as presented, avoids the pitfalls of Algorithm 1 (the graph becoming disconnected, contraction issues). Your appendix section A.2 seems to suggest that a non-uniform sampling of $s$ (an independent edge set) is sufficient to ensure no disconnectedness/pitfalls, yet in your main algorithm $s$ is selected uniformly at random?
- Small typos: Equation (11): I believe there is a "-" missing in the numerator of the third term.
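On the first bullet above: the product $w_e \Omega_e$ (edge weight times effective resistance) lies in $(0, 1]$ and equals 1 exactly when $e$ is a bridge, which is presumably the degenerate case the finiteness question about (4) is pointing at. A quick check on a hand-picked example graph (my own sketch, not code from the paper):

```python
import numpy as np

def laplacian(n, edges):
    """Weighted graph Laplacian from (i, j, w) triples."""
    L = np.zeros((n, n))
    for i, j, w in edges:
        L[i, i] += w; L[j, j] += w
        L[i, j] -= w; L[j, i] -= w
    return L

# Triangle {0,1,2} plus the bridge (2,3); the bridge is a cut edge.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0), (2, 3, 2.0)]
Lp = np.linalg.pinv(laplacian(4, edges))

scores = {}
for i, j, w in edges:
    b = np.zeros(4); b[i], b[j] = 1.0, -1.0
    scores[(i, j)] = w * (b @ Lp @ b)    # w_e * Omega_e for each edge
print(scores)
# Triangle edges give w_e*Omega_e = 2/3 < 1; the bridge gives exactly 1,
# so any term with 1/(1 - w_e*Omega_e) diverges on cut edges.
```

If the paper's formula (4) indeed has $1 - w_e\Omega_e$ in a denominator, the finiteness concern reduces to whether bridge edges can ever be sampled, which would be worth stating explicitly in a rebuttal.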