Submitted by Assigned_Reviewer_2
 
 Q1: Comments to author(s). 
      First provide a summary of the paper, and then address the following 
      criteria: Quality, clarity, originality and significance. (For detailed 
      reviewing guidelines, see 
      http://nips.cc/PaperInformation/ReviewerInstructions)
 
 This paper proposes a harmonic loss-based analysis 
      method to measure the agreement between a target function and the 
      underlying graph. The authors apply this analysis to many commonly used 
      graph-based learning methods with interesting insights. The paper is well 
      written and the analysis should be of interest to the general NIPS 
community.
 Graph-based SSL and random walks and their connections 
      to harmonic functions are well recognized in previous research, which 
      makes the results in the paper, although novel, somewhat incremental.
 
There are a few typos in the paper which I assume the authors will be able to find with a careful reading.
 
Q2: Please summarize your review in 1-2 sentences

This paper proposes a harmonic loss-based analysis method to measure the agreement between a target function and the underlying graph. The paper is well written and the analysis should be of interest to the general NIPS community.

Submitted by Assigned_Reviewer_5
 
 Q1: Comments to author(s). 
      First provide a summary of the paper, and then address the following 
      criteria: Quality, clarity, originality and significance. (For detailed 
      reviewing guidelines, see 
      http://nips.cc/PaperInformation/ReviewerInstructions)
 
 - A desirable property (*) for algorithms on graphs is 
      to get a solution which is almost constant in each cluster, but 
changes drastically between clusters.

This paper tries to establish a relation between ``near harmonic'' functions on graphs and their drop in dense/sparse areas. The authors then study different algorithms and their ``harmonic loss'', as a step toward showing that property (*) holds for their results.
- This paper contains several mini steps toward getting tools to check whether property (*) holds for a machine learning algorithm. However, the results are still not strong enough, and cannot be considered a proof.
- The paper is easy to read and follow, but there is some room for improvement. Some of the results can be significantly simplified (see detailed comments). Supplementary material is supposed to contain proofs and specific details, not complete new sections (Sections 4.2 and 4.3).
- Originality and significance: The paper introduces new tools to intuitively justify property (*) for graph algorithms. The machinery looks attractive to me, but I do not see it as a tool for finding rigorous proofs.
 
 Main comments:
- The so-called left/right-continuity of functions on graphs, as it is defined, has nothing to do with the mathematical concept of continuity. It does not guarantee any bound on the function's jump. If you removed all results about continuity, would the message of the paper change? My suggestion is to choose another keyword, one that does not suggest to the reader properties stronger than the actual meaning.
- Lines 192-199: All the statements are imprecise and hand-wavy. One can build a function f which is almost harmonic everywhere but does not drop at all on a cut with a very small w(). Consider this graph with function values defined on the vertices, where 000 denotes a big complete graph with f zero on it (or tiny random values if you like):

1--0.8--0.6--0.4--0.2--0--000--0--0--000

f is almost harmonic everywhere (except at two vertices); however, it does not drop at all on the edges between the two 000 clusters, where the conductance is relatively small.
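A quick numeric check of this counterexample (my own Python sketch; the clique size and the exact wiring are arbitrary choices, not from the paper):

import numpy as np

# Build the graph: a path 1--0.8--0.6--0.4--0.2--0, whose last vertex is
# attached to a complete graph "000" with f = 0, then a 0--0 bridge into
# a second "000" clique.
m = 20                                  # size of each "000" complete graph
path = [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
n = len(path) + m + 2 + m
W = np.zeros((n, n))
f = np.zeros(n)
f[:len(path)] = path

def connect(i, j):
    W[i, j] = W[j, i] = 1.0

for i in range(len(path) - 1):          # path edges
    connect(i, i + 1)
c1 = list(range(len(path), len(path) + m))    # first clique
c2 = list(range(len(path) + m + 2, n))        # second clique
for c in (c1, c2):
    for a in c:
        for b in c:
            if a < b:
                connect(a, b)
connect(len(path) - 1, c1[0])           # 0 -- 000
b1, b2 = len(path) + m, len(path) + m + 1
connect(c1[-1], b1)                     # 000 -- 0
connect(b1, b2)                         # 0 -- 0
connect(b2, c2[0])                      # 0 -- 000

L = np.diag(W.sum(axis=1)) - W          # unnormalized graph Laplacian
residual = L @ f                        # (L f)(i) = 0 iff f is harmonic at i
print(np.flatnonzero(np.abs(residual) > 1e-9))   # -> [0 5], only two vertices
# f is identically zero on both cliques, so it does not drop at all across
# the small cut (the 0--0 bridge) separating them.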
One can also build counterexamples for the other statements in this paragraph.
- In Corollary 3.1, we can explicitly characterize L_f(S_i), which is the inverse of the resistance distance between vertices 1 and n.
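For reference, the resistance distance itself is easy to compute via the pseudoinverse of the graph Laplacian; this is a standard identity, not specific to the paper. A minimal sketch:

import numpy as np

def resistance_distance(W, u, v):
    # Effective resistance between u and v via the Moore-Penrose
    # pseudoinverse of the unnormalized Laplacian (standard identity).
    L = np.diag(W.sum(axis=1)) - W
    Lp = np.linalg.pinv(L)
    e = np.zeros(len(W))
    e[u], e[v] = 1.0, -1.0
    return e @ Lp @ e

# Sanity check: on an unweighted path with n vertices, R(0, n-1) = n - 1.
n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
print(resistance_distance(W, 0, n - 1))   # -> 4.0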
- Line 247: it is claimed that f continuously changes on graphs. However, Definition 2.5 of a continuous f is much weaker than "continuously changing on graphs".
- Lines 264-310: We have f_1(i) + f_2(i) = 1, so we do not need to consider both f_1 and f_2, as one of them already contains all the information. Also, Lemma 3.3 boils down to deciding the class by comparing f'_1(i) with 1/n.
- I suggest the authors give a recipe for analyzing semi-supervised algorithms; this would increase the readability of their paper. A recipe would be something like the following (a minimal sketch of step 1 appears after the list):
1- Try to write the solution function f as a sum of a harmonic term and a non-harmonic term. In my own words, find the net incoming flow into each node. It is clear that we may not be able to write the solution in this form for all methods.
2- Use Theorems 2.7 and 2.8 to check whether they provide good bounds on the drop.
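To make step 1 concrete, here is a minimal sketch in my own notation (`interior` is whatever vertex set the method treats as unlabeled; whether a given method's solution admits such a split with a simple remainder is exactly the question):

import numpy as np

def harmonic_split(L, f, interior):
    # Split f = h + r, where h agrees with f outside `interior` and is
    # harmonic on `interior` (the standard harmonic extension). The net
    # incoming flow (L @ f) on `interior` then comes entirely from r.
    I = np.asarray(interior)
    B = np.setdiff1d(np.arange(len(f)), I)
    h = f.astype(float).copy()
    # (L h)_I = 0 with h_B = f_B  <=>  L_II h_I = -L_IB f_B
    h[I] = np.linalg.solve(L[np.ix_(I, I)], -L[np.ix_(I, B)] @ f[B])
    return h, f - h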
- There are many qualitative statements, which the authors should avoid; these make the work look hand-wavy and imprecise.
 
 
 
Tiny comments:
- In Definition 2.5, it would be good to mention that S_i depends on f.
- Line 170: right-hand side of the inequality, k \in \bar{S_i}.
 
 Pro:
 - Bound on the 
      change of function f depending on Harmonic Loss and conductance
 - 
      Presenting the solution of different graph algorithms in the harmonic form
 
Con:
- Upper/lower bounds on the change of the function f only indicate property (*) and do not give a rigorous proof.
- The normalizing step can be presented much more simply. In fact, I do not count it as a contribution of the paper.
- The supplementary material is misused by the authors.
 
Q2: Please summarize your review in 1-2 sentences

This paper presents new bounds on the change of a function f depending on its ``harmonic loss'' and conductance. Also, the solutions of different graph algorithms are presented in a harmonic form. The results are implications for property (*), but should not be considered a rigorous proof.

There are some novelties and interesting ideas hidden in the paper, but the authors did not explore their ideas well enough.
 Submitted by 
      Assigned_Reviewer_8
 
 Q1: Comments to author(s). 
      First provide a summary of the paper, and then address the following 
      criteria: Quality, clarity, originality and significance. (For detailed 
      reviewing guidelines, see 
      http://nips.cc/PaperInformation/ReviewerInstructions)
 
 The paper studied harmonic functions in several graph 
      Laplacian based semi-supervised learning algorithms. The key finding is 
      that “the associate target function can faithfully capture the graph 
      topology in that it drops little in dense area and sharply otherwise.” 
 As mentioned in the paper, graph-based semi-supervised learning 
has been studied from many aspects. Most of the methods are either random-walk-based or Laplacian-based. This paper tried to unify the different studies under one framework, which is a nice motivation.
 
However, I think the role of harmonic functions has been studied well enough in this area. Random walks on a regular lattice and diffusion equations are closely related to each other. In fact, under certain mild conditions, the continuous diffusion process is the limit of random walks as the lattice becomes denser and denser. Moreover, the diffusion process is described by the Laplace equation. All of this can be found in most basic random walk textbooks. Then, the next piece of the puzzle is the weighted graph and its limit, the manifold, together with the graph Laplacian and the Laplace-Beltrami operator, which was solved in the PhD dissertations of Dr. Mikhail Belkin and Dr. Stéphane Lafon. A recent new puzzle [11] was also solved by [20]. All of these works are already very clear about the role of harmonic functions in these algorithms.
 
I think research in this area should be well past the random-walk-intuition stage of analysis, which was great initially. Ultimately, it is a function estimation problem.
 
Supplementary material should contain content different from the main paper, not the whole paper.
 
 Overall, the paper is nicely written. However, the 
      content is not significant or new enough for publication compared to other 
      submissions.
 
 
Q2: Please summarize your review in 1-2 sentences

This paper tried to unify different random-walk-based and Laplacian-based semi-supervised learning algorithms under one framework, which is a nice motivation. However, most of the study in this paper is not new or significant enough for the conference compared to other submissions.

Submitted by Assigned_Reviewer_9
 
 Q1: Comments to author(s). 
      First provide a summary of the paper, and then address the following 
      criteria: Quality, clarity, originality and significance. (For detailed 
      reviewing guidelines, see 
      http://nips.cc/PaperInformation/ReviewerInstructions)
 
 The authors introduce a functional called the 
      "harmonic loss" and show that (a) it characterizes smoothness in the sense 
      that functions with small harmonic loss change little across large cuts 
      (to be precise, the cut has to be a level set separator) (b) several 
      algorithms for learning on graphs implicitly try to find functions that 
minimize the harmonic loss, subject to some constraints.
 The 
      "harmonic loss" they define is essentially the (signed) divergence \nabla 
      f of the function across the cut, so it's not surprising that it should be 
      closely related to smoothness. In classical vector calculus one would take 
      the inner product of this divergence with itself and use the identity
 
 < \nabla f, \nabla f > = < f, \nabla^2 f >
 
 to 
argue that functions with small variation, i.e., small | \nabla f |^2 almost everywhere, can be found by solving the Laplace equation. On graphs,
      modulo some tweaking with edge weights, essentially the same holds, 
      leading to minimizing the quadratic form f^\top L f, which is at the heart 
      of all spectral methods. So in this sense, I am not surprised.
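On graphs the identity is the familiar quadratic-form expansion f^\top L f = (1/2) \sum_{i,j} w_{ij} (f_i - f_j)^2, which a short numeric check confirms (my own sketch):

import numpy as np

rng = np.random.default_rng(0)
n = 8
A = rng.random((n, n))
W = (A + A.T) / 2                      # symmetric random weights
np.fill_diagonal(W, 0)
L = np.diag(W.sum(axis=1)) - W         # unnormalized graph Laplacian
f = rng.standard_normal(n)
lhs = f @ L @ f
rhs = 0.5 * np.sum(W * (f[:, None] - f[None, :]) ** 2)
print(np.isclose(lhs, rhs))            # -> True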
 
Alternatively, one can minimize the integral of | \nabla f |, which is the total variation; this leads to a different type of regularization (l1 rather than l2 is one way to put it). The "harmonic
      loss" introduced in this paper is essentially this total variation, except 
      there is no absolute value sign. Among all this fairly standard stuff, the 
      interesting thing about the paper is that for the purpose of analyzing 
      algorithms one can get away with only considering this divergence across 
      cuts that separate level sets of f, and in that case all the gradients 
      point in the same direction so one can drop the absolute value sign. This 
      is nice because the "harmonic loss" becomes linear and a bunch of things 
      about it are very easy to prove. At least this is my interpretation of 
      what the paper is about.
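To illustrate the level-set point numerically: across a sublevel-set cut S = {i : f_i <= t}, every crossing difference f_j - f_i (i in S, j outside) is positive, so the signed flow across the cut equals the total variation across the cut and the absolute value sign is free. A small check (my own sketch, not the paper's notation):

import numpy as np

rng = np.random.default_rng(1)
n = 8
A = rng.random((n, n))
W = (A + A.T) / 2                      # symmetric random weights
np.fill_diagonal(W, 0)
f = rng.standard_normal(n)
S = np.where(f <= np.median(f))[0]     # a sublevel-set separator
T = np.setdiff1d(np.arange(n), S)
D = f[None, :] - f[:, None]            # D[i, j] = f_j - f_i
signed = np.sum((W * D)[np.ix_(S, T)]) # linear in f: easy to analyze
tv = np.sum((W * np.abs(D))[np.ix_(S, T)])
print(np.isclose(signed, tv))          # -> True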
 
 I would have preferred to see an 
      exposition that explicitly makes this connection to divergence, total 
variation, etc. That would also have made it clear why the level sets are
      interesting.
 
 In general, seeing a unifying analysis of different 
      algorithms for learning on graphs is interesting and enlightening. I can 
      imagine some of the results from the paper making their way into textbooks 
      eventually. Although, again, if the connection to classic notions had been 
      made clear, the article would be even more "unifying". I am sure that 
      somebody somewhere has looked at notions of divergence and total variation 
      on graphs, just not necessarily in the context of machine learning 
      algorithms. Learning on graphs is an important topic, lots of different 
      algorithms have been proposed, and any paper that clears up the dust and 
      finds the common underlying mathematics is potentially useful.
 
 On 
      the other hand, this paper doesn't propose a new algorithm, and the 
      "corrections [to exisiting algorithms] obtained from the new insights" are 
      relatively minor. Correspondingly, the experiments are not so remarkable.
 
 
Q2: Please summarize your review in 1-2 sentences

I am in two minds about this paper. On the one hand, I
      think that the community should learn about these connections between the 
      different algorithms for learning on graphs. On the other hand the 
      "harmonic loss" that the authors introduce is not very surprising and 
      could be better related to standard concepts in math such as divergence, 
total variation, etc., which work in both the discrete and the continuous domains.

Q1: Author
      rebuttal: Please respond to any concerns raised in the reviews. There are 
      no constraints on how you want to argue your case, except for the fact 
      that your text should be limited to a maximum of 6000 characters. Note 
      however that reviewers and area chairs are very busy and may not read long 
      vague rebuttals. It is in your own interest to be concise and to the 
      point.
 We thank all the reviewers for their comments. 
      
 
 To Reviewer_5
 
 Q1. Upper/lower bound on the change of 
      function f does not give a rigorous proof of property (*).
 
 A1. 
      Note that we take a general approach without assuming any special 
      structure about graphs. This ensures the derived bounds are applicable to 
      any function on any graph, and makes it easy to analyze property (*) with 
      the bounds. If more assumptions are incorporated, we expect it would be 
      quite feasible to derive additional proofs for certain functions, which we 
consider an interesting line of future work.
 
 Q2. The 
      normalizing step can’t be counted as a contribution of the paper.
 
 A2. The normalization scheme is suggested by our analysis and most 
      importantly it leads to significant performance gains over prior works.
 
 Q3. Supplementary material is supposed to contain proofs and 
      specific details, and not complete new sections (Sec 4.2 and 4.3)
 
 A3. Thanks much for the reminder. We will do so in the final 
      version. But we’d like to mention that Sec 4.2 and 4.3 are only intended 
      to add more experimental evidence to confirm our analysis. They can be 
      dropped without degrading the integrity of the paper.
 
 Q4. Would 
      removing all results about continuity change the message of the paper?
 
 A4. Yes. The proposed notion of "continuity", though not as strong 
      as the usual one in Math, is a desired property for functions on graphs. 
For example, a “discontinuous” function that alternates among different clusters is considered bad.
 
 Q5. The function 
1--0.8--0.6--0.4--0.2--0--000--0--0--000 is almost harmonic everywhere
      (except two vertices), however it does not drop at all on the edges 
      between two 000 clusters, where conductance is relatively small.
 
A5. 0--000--0--0--000 is a constant function. Our statement doesn’t
      include the case of constant functions. Because their harmonic loss is 
      zero everywhere, by Theorems 2.7 & 2.8, they don’t drop at all.
 
 Q6. Lemma 3.3 boils down to deciding the class by comparing 
      f'_1(i) with 1/n.
 
 A6. True, but this only holds for binary 
      classification, while our proof of Lemma 3.3 generalizes to multi-class 
      classification, as remarked in lines 236-241 (paper).
 
 
 To 
      Reviewer_8
 
 Q7. The paper studied harmonic functions in several 
      graph Laplacian based semi-supervised learning (SSL) algorithms. The role 
      of harmonic functions has been studied well enough.
 
 A7. There may 
      be some misunderstanding here. This paper is not about harmonic functions, 
but about arbitrary functions on graphs. It is not on SSL alone, but on graph-based
      learning in general, including classification, retrieval, and clustering.
 
 First, the problem studied in this paper is fundamentally 
      different from those in Dr. Belkin and Dr. Lafon’s theses, where they 
      address the selection of the canonical basis on graphs and establish the 
      asymptotic convergence on manifolds. Here we examine how functions on 
      graphs deviate from being harmonic and develop bounds to analyze their 
      theoretical behavior, which to the best of our knowledge, has not been 
      explored before.
 
Second, our studies go well beyond SSL. In fact, 4 out of 5 models we investigate are unsupervised. For example, our results justify the role of the pseudo-inverse of the graph Laplacian as a similarity measure for recommendation, and explain the choice of eigenvectors for spectral clustering.
 
 Q8. A recent new puzzle [11] 
      was also solved by [20].
 
 A8. [11, 20] argue that first-order 
      Laplacian regularization is not sufficient for high-dimensional problems 
and suggest using high-order regularization, but how to choose the
      order is unknown. In contrast, we show that first-order Laplacian 
      regularization actually carries sufficient information for classification, 
      which can be extracted with a simple normalization (lines 229-235). Our 
      point is validated by extensive experiments with high-dimensional 
      datasets.
 
 Q9. Graph-based SSL should be way past the random walk 
      intuition analysis stage.
 
 A9. First, as mentioned in A7, this 
      paper is not limited to random walk methods or SSL applications. Second, 
      while various methods (random walk based or not) have been proposed for 
graph-based SSL, many of them are not well understood or even
      misused, as pointed out recently in [11,17,18]. Here is a quote from [17] 
      (Luxburg et al. 2010), "In general, we believe that one has to be 
      particularly careful when using random walk based methods for extracting 
      global properties of graphs in order to avoid getting lost and converging 
      to meaningless results."
 
 
 To Reviewer_9
 
 Thanks for 
      pointing out the potential connection between “harmonic loss” and standard 
      divergence and total variation, which seems to be an interesting 
direction. We expect such an investigation would help make the concepts
      clearer, or more importantly, lead to more systematic tools for analyzing 
      graph-based algorithms.