NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:910
Title:D-VAE: A Variational Autoencoder for Directed Acyclic Graphs

Reviewer 1

The paper is well written -- I have a pleasant read of the paper. A few comments: 1. Theorem 1 is not very useful. The input to A is either a set or an ordered set. Only when it is an ordered one, it needs global indices to decide the order, right? If the input is a set without order, there is no need to refer to global indices. 2. The paper embeds computation graphs, not graph structure or function. The paper can do a better job of relating and differentiating the three concepts (graph structure, computation graph, function). Graph structure => computation graph => function The arrow => is an onto relation: the same graph structure indicates the same computation graph and so on. But the reverse is not true. Computation graph is defined on generic operations while functions need instantiated operations. Then your discussion of equivalence of computation graph might be easier: if any operations make two derived functions not equivalent, then the computation graph is not equivalent. 3. The title should be VAE for "Computation Graphs" not for "Directed Acyclic Graphs" because the paper does not embed graph structures. 4. Figure 7: why does the right plot much darker than the left plot in general -- the distribution of the BIC scores (pixel colors) should be the same, right? 5. I don't know the work NAONet very well, but it seems to me that it also embeds computation graphs to vectors. Why not include a fair comparison of the two?

Reviewer 2

Originality: The main architectural pieces (VAE, message passing) have been proposed previously, but the specific focus on directed graphs using these components is new to my knowledge. Quality: The proposal was technically sound, but the proposal has undesirable properties which were not addressed by the authors, and lacks sufficient empirical evaluation: 1. Does the proposal easily (from a practical implementation perspective) allow for batching? 2. Decoder sequence length. Instead of N steps with an RNN, the proposal's decoder uses N*(1+2+...+N-1) steps. This may limit the proposal to small graphs (the authors have only evaluated on a fixed, small graph size). 3. Lacks simple baselines, such as an RNN/LSTM (since this is claimed as a special case of the proposed framework, it would be nice to see that performance improves over the special case). 4. Lacks a strong graph-NN baseline. GCN has strong assumptions compared to a general message passing neural network. A concrete suggestion would be comparing to [Li et al 2018], which generates a graph using a sequence of add-node and add-edge decisions using a message passing network to represent the partial graph. Since Li et al deals with general graphs, showing an improvement over their model may indicate that the proposed restriction to DAGs is beneficial. 5. Ablation studies. Crucially, we do not know whether the proposed asynchronous message passing actually helps, versus using standard message passing with the proposed architecture. 6. Simplified experimental settings: fixed-size, small graphs. Clarity: The paper is generally written clearly, with a few exceptions: - The term "asynchronous message passing" is a bit confusing, since each node has to wait for its predecessors' message to be computed (hence it seems synchronous); the authors might consider defining asynchronous and synchronous precisely in the context of their proposal. - (Minor): Line 29 - awkward phrasing ("the answer is yes"). Significance: Due to the issues above the significance is low-medium. ----- Edit after author feedback, raising from 5 to 6 ---- The authors provided a comparison with [Li et al], ablations of the message passing, and evaluation on a larger NAS setting and variable-length setting - thanks for the efforts in implementing and adding these! The proposed method does show improvements over [Li et al] and the message passing ablation strengthens the case for the scheme proposed here (though performance drops significantly with variable length sequences). Due to the new evaluations I'll raise the review by one point. Minor: LSTMs could be used via consistently linearizing the graphs, but with the [Li et al] baseline I don't think this is needed.

Reviewer 3

Originality: Generating DAGs is a very new and important problem. My main concern is about the novelty of the method. - Both the encoding and decoding procedures of the proposed work are highly related to Ref.1. However, no discussion or comparison is given. - The asynchronous message passing algorithm is a straightforward extension of the original algorithm. Quality: In general, the paper is of good technical quality. All claims are well supported by theoretical analysis and experimental results. But both the objective function and the training strategy are missing in the main manuscript and are over-simplified in the supplementary. Clarity: The paper is well organized and clearly written. Ref.1: Yujia Li, et. al. Learning deep generative models of graphs. ICML, 2019.