
Submitted by Assigned_Reviewer_1
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
I read the author feedback: "Theorem 1 (i) (our main result) does not require acyclicity of the summary time graph. Note that acyclicity is only one of two sufficient conditions."
I found that I had not reflected enough on the fact that the acyclicity assumption is only one of two sufficient conditions. Thanks; I have raised my score. I still don't understand why they tested their method only on data generated under the acyclicity assumption on the summary time graph. That seems to cover only the more limited cases.
Summary of the paper: This paper considers a class of structural equation models for time series data. The models allow nonlinear instantaneous effects and lagged effects. In contrast, Granger-causality-based methods do not allow instantaneous effects, and the linear non-Gaussian method TS-LiNGAM (Hyvarinen et al., ICML2008, JMLR2010) assumes linear effects.
They then give Theorem 1, which shows identifiability conditions for the model. The conditions are i) the functions that causally relate the variables come from a class of functions called IFMOC, considered in Peters et al. [2011b], ii) faithfulness, and iii) the summary time graph is acyclic. (They define the summary time graph as a graph that contains one node per component time series and an arrow from X^i to X^j, i ≠ j, if there is an arrow from X^i_{t-k} to X^j_t in the full time graph for SOME k; lines 53-55.) Theorem 1 is an extension of Peters et al. [2011b] to the time series case and a partial extension of Hyvarinen et al. (ICML2008, JMLR2010) to nonlinear cases. Hyvarinen et al. prove the identifiability of their model assuming linearity and non-Gaussian residuals (disturbances or external influences), in which case the summary graph need not be acyclic.
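To make the summary time graph and the acyclicity condition under discussion concrete, here is a hypothetical sketch (not code from the paper): given the edges of the full time graph as (source series, lag, target series) triples, the summary graph collapses all lags, and its acyclicity can be checked by depth-first search.

```python
# Hypothetical sketch: collapse a full time graph into the summary time graph
# and check the acyclicity condition of Theorem 1 (iii).

def summary_graph(full_edges, n_series):
    """full_edges: iterable of (i, lag, j), meaning X^i_{t-lag} -> X^j_t.
    Returns adjacency sets: an edge i -> j if ANY lag links the two series."""
    adj = {i: set() for i in range(n_series)}
    for i, lag, j in full_edges:
        if i != j:                       # self-loops X^i -> X^i are ignored here
            adj[i].add(j)
    return adj

def is_acyclic(adj):
    """Depth-first search with colouring to detect a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in adj}
    def visit(v):
        colour[v] = GREY
        for w in adj[v]:
            if colour[w] == GREY:        # back edge => cycle
                return False
            if colour[w] == WHITE and not visit(w):
                return False
        colour[v] = BLACK
        return True
    return all(visit(v) for v in adj if colour[v] == WHITE)

# X^0_{t-1} -> X^1_t and X^1_{t-1} -> X^0_t: a feedback loop in the summary graph
feedback = [(0, 1, 1), (1, 1, 0)]
print(is_acyclic(summary_graph(feedback, 2)))   # False: excluded by Theorem 1 (iii)

chain = [(0, 1, 1), (1, 0, 2)]                  # lagged X^0 -> X^1, instantaneous X^1 -> X^2
print(is_acyclic(summary_graph(chain, 3)))      # True
```

This illustrates the point raised below: mutual lagged influence between two series already makes the summary graph cyclic, even though the full time graph contains no cycle.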
They further present an estimation algorithm (Algorithm 1), which is a natural extension of Mooij et al. [2009] to the time series case. They use independence tests to detect possible violations of the model assumptions. Their algorithm can be applied (with small modifications) to time-shifted time series, which are often observed in fMRI data.
They conducted artificial-data experiments to test their algorithm under various settings, including latent confounders with time lags, but still assumed that the summary graph is acyclic. They compared their results with a Granger-causality-based linear method and the linear non-Gaussian method TS-LiNGAM (Hyvarinen et al., ICML2008). Their method performs better in some nonlinear cases.
Finally, they tested some real datasets. Their method worked better in some cases and was comparable to the linear non-Gaussian TS-LiNGAM in other cases.
Pros:
- It is interesting to extend TS-LiNGAM (Hyvarinen et al., ICML2008; JMLR2010) to nonlinear cases.
- It is practically important to consider shifted time series cases.

Cons:
- The assumption in Theorem 1 that the summary graph is acyclic seems too restrictive, unless I am missing something important. It means that it cannot happen that X^1_{t-1} causes X^2_t and X^2_{t-1} causes X^1_t, doesn't it?
- They claim that their method is a kind of safe method since it outputs "I don't know" if independence between explanatory variables and residuals is statistically rejected, which implies some violation of the model assumptions. The idea is not very new; it can be found in many papers, e.g., Entner and Hoyer (AMBN2010), Tashiro et al. (ICANN2012), Shimizu and Kano (2008, JSPI), and Hoyer et al. (NIPS2009).
Suggestions:
- Would it be possible to prove Theorem 1 without assuming that the summary graph is acyclic?
- They claim that their method is a kind of safe method since it outputs "I don't know" if independence between explanatory variables and residuals is rejected. The idea is not very new. They should at least prove that explanatory variables and residuals are not independent if latent confounders exist, to make their contribution clearer. Entner and Hoyer (AMBN2010) and Tashiro et al. (ICANN2012) proved this in linear non-Gaussian cases.
Comments:
- An extension of TS-LiNGAM to cases with latent confounders with time lags might be: Kawahara, S. Shimizu, and T. Washio. Analyzing relationships among ARMA processes based on non-Gaussianity of external influences. Neurocomputing, 74(12-13): 2212-2221, 2011. It considers an ARMA model instead of an AR model for the temporal dependency.
- Another extension of TS-LiNGAM to cases with instantaneous confounders would be: Gao and H. Yang. Identifying structural VAR model with latent variables using overcomplete ICA. Far East Journal of Theoretical Statistics, 40(1): 31-44, 2012.
- At last year's NIPS, TS-LiNGAM was extended to cases with variance-dependent external influences: Z. Chen, K. Zhang, and L. Chan. Causal discovery with scale-mixture model for spatiotemporal variance dependencies. In Advances in Neural Information Processing Systems 25 (NIPS2012), pp. xx-xx, 2012.
Taking these developments in linear non-Gaussian models into account, the proposed nonlinear model assuming the summary time graph is acyclic might be a small contribution, although extending to nonlinear models is certainly an interesting topic.
- In Experiment 11, they say that "TiMINo does not decide since the model leads to dependent residuals. We nevertheless analyzed ..." I didn't follow why they decided to continue analyzing the data even though the residuals were found to be dependent.
- They say that "we now refer to entries larger than 0.5 as causation". I didn't follow why they chose the threshold 0.5.

Q2: Please summarize your review in 1-2 sentences
It is interesting and important to extend the linear non-Gaussian time series causal model TS-LiNGAM (Hyvarinen et al., ICML2008; JMLR2010) to nonlinear cases. However, Theorem 1, which would be the main contribution of this paper, makes the too-restrictive assumption that the summary time graph is acyclic, and so seems not very useful. This seems to make the other parts less attractive.

Submitted by Assigned_Reviewer_2
Q1: Comments to author(s).
The authors discuss techniques to infer causal influence between time series. Causal inference for time series is an increasingly relevant topic in many domains. The presentation is clean, and they include a number of examples (simulations and real data). This work appears to be a natural progression of some earlier causal inference methods for i.i.d. variables.
The only major changes I would suggest are to compare this work with Eichler's work, theoretically and experimentally, and with Chu and Glymour's, experimentally. These points are elaborated below.
The setting the authors consider involves time series without feedback. For many applications this could be a significant restriction; there should be some discussion of this issue. It is noteworthy that the algorithm (adapted from Mooij) does not require knowing the ordering. However, there is an underlying assumption from Theorem 1 (ii) that the joint distribution is faithful w.r.t. the full time graph. This might limit the range of nonlinear relationships that the algorithm can identify.
In contrast, Eichler's work (a Granger-causality method) can handle feedback loops, though in handling instantaneous relationships it consequently does not choose a direction. Like this work, Eichler's work uses independence tests; those might perform better than the linear and nonlinear Granger-causality methods currently used in the experimental section. More should also be done to experimentally contrast the authors' work with that proposed by Chu and Glymour ("ANLTSM"). It is unclear what the differences in runtime or performance between these algorithms are.
In comparing this work with ANLTSM, the authors should include examples with nonlinear instantaneous effects or nonlinear confounders. In the paper, ANLTSM is described as more restrictive than the authors' work, as the former requires confounders and instantaneous influences to be linear. However, none of the simulations has nonlinear instantaneous effects or confounders.
It might be worthwhile to know of Yamada and Sugiyama's "Dependence Minimizing Regression with Model Selection for Non-Linear Causal Inference under Non-Gaussian Noise" as a possible alternative to the HSIC independence tests.
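As background on the HSIC tests discussed here: the (biased) empirical HSIC statistic itself is only a few lines. The sketch below uses Gaussian kernels with a fixed bandwidth and is a rough illustration, not the paper's implementation; a real test additionally needs a calibrated null distribution (permutation or gamma approximation).

```python
import numpy as np

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between two 1-d samples with Gaussian kernels:
    HSIC = trace(K H L H) / (n - 1)^2, where H centres the kernel matrices.
    Values near 0 suggest independence; an accept/reject decision would
    require comparing against a null distribution, which is omitted here."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n = len(x)
    K = np.exp(-(x - x.T) ** 2 / (2 * sigma ** 2))   # kernel matrix on x
    L = np.exp(-(y - y.T) ** 2 / (2 * sigma ** 2))   # kernel matrix on y
    H = np.eye(n) - np.ones((n, n)) / n              # centring matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
a = rng.normal(size=300)
b = rng.normal(size=300)
print(hsic(a, b))       # close to zero: independent samples
print(hsic(a, a ** 2))  # clearly larger: dependence that plain correlation misses
```

The second example is the reason such tests matter in this setting: a and a**2 are uncorrelated yet strongly dependent, which kernel-based measures detect.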
Some other missing references are "Estimating the Directed Information to Infer Causal Relationships in Ensemble Neural Spike Train Recordings" and "Equivalence between minimal generative model graphs and directed information graphs".

Q2: Please summarize your review in 1-2 sentences
The authors tackle an important problem and do a nice job presenting their results. The paper can be improved by including a comparison with Eichler's work, theoretically and experimentally, and with Chu and Glymour's, experimentally.

Submitted by Assigned_Reviewer_4
Q1: Comments to author(s).
This paper introduces a model and procedure for learning instantaneous and lagged causal relationships among variables in a time series, when each causal relationship is either identifiable in the sense of the additive noise model (Hoyer et al. 2009) or exhibits a time structure. The learning procedure finds a causal order by iteratively fitting VAR or GAM models in which each variable is a function of all other variables, and making the variable with the least dependence the lowest variable in the order. Excess parents are then pruned to produce the summary causal graph (where x -> y indicates either an instantaneous or lagged cause, up to the order of the VAR or GAM model that is fit). Experiments show that the method outperforms competing methods and returns no results (rather than wrong results) in cases where the model cannot be identified.
This paper is an interesting extension of the additive noise model approach to learning causal relationships, and the results look promising. However, I think the presentation/organization is at times unclear and leaves out important details.
I think the notation in (1) is unnecessarily confusing (including a subscript, a superscript, and an additional outer parenthesized subscript for each parent set). I wasn't certain what it actually meant until after reading on. I still don't see why the two separate subscripts are necessary to coherently define the model.
Rather than introducing the term IFMOC, which requires introducing a weak form of faithfulness (which many in the NIPS community may not be familiar with anyway) as well as referencing identifiability conditions discussed in another paper, and then mentioning it in Theorem 1, I think it would be clearer and certainly more rigorous to just state the explicit conditions for identifiability in Theorem 1.
For Algorithm 1, it is never discussed how extra parents are pruned from the graph (line 12). Is this done in the same way as in Mooij et al. 2009? Does the time structure introduce any problems (other than if there are feedback loops)?

Also, is there a proof that Algorithm 1 is correct? A proof, or at least further discussion (even just specifying why a proof from Mooij et al. 2009 should go through without issues in this case), would be helpful.
Q2: Please summarize your review in 1-2 sentences

The authors present an interesting approach to learning instantaneous and lagged causal relationships from time series data, with promising empirical results, but the presentation and organization of the paper could be improved, and some details needed to understand the procedure are omitted.

Submitted by Assigned_Reviewer_5
Q1: Comments to author(s).
The paper presents two sets of assumptions under which a dynamic NPSEM may be recovered from multivariate time series data. The authors relate their approach to existing approaches based on Granger causality.

There are two sets of assumptions provided. The method (and proof) based on the first set seems to build on the method of Peters et al., designed for i.i.d. data. The second set essentially uses the fact that the summary time graph is assumed to be acyclic, together with the fact that every variable X^i_t is supposed to have as one parent some earlier version of itself (X^i_{t-k}). I believe that assumptions (i) are probably of most interest, since the assumptions required for (ii) are quite stringent and would be hard to verify in practice.
Although the extension of the identification proof of Peters to time series may be fairly simple, I do not think this should count against the paper. The central idea of an identifiable functional model class (from the Peters paper) is both simple and profound, and this paper represents a new application of this deep idea (e.g., no one argues that the concept of autoregression is superfluous just because it is a simple extension of regression!).
The paper also considers the problem of identifying structure from time-shifted series. This is a valuable contribution, since such temporal shifting often arises in practical contexts due to time delays.
Q2: Please summarize your review in 1-2 sentences
An interesting, worthwhile paper that certainly improves upon existing time series methods based on Granger causality.
Q1: Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except that your text should be limited to a maximum of 6000 characters. Note, however, that reviewers and area chairs are very busy and may not read long, vague rebuttals. It is in your own interest to be concise and to the point.
We thank all reviewers for their valuable input and would like to address a few points:
- Theorem 1 (i) (our main result) does not require acyclicity of the summary time graph. Acyclicity is only one of two sufficient conditions.
- If the summary time graph is not required to be acyclic, it may happen that all nodes of the full time graph are directly connected. Then Theorem 1 (ii) does not hold.
- We agree that model checks have been used before. We will stress this point and include the references in the final version.
- We are currently working on a more formal statement of the correctness of the algorithm in Mooij et al. 2009, which is too long to fit into the eight pages. We will add a corresponding reference.
- Pruning parents is achieved by excluding those parent time series that are not required to obtain independent residuals.
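The pruning step described in the last point could look roughly like the following hypothetical sketch. It is not the authors' implementation: linear fits and a crude squared-correlation score stand in for the GAM fits and HSIC tests, and `prune_parents` and the threshold value are illustrative names, not from the paper. Each candidate parent is tentatively dropped and stays dropped only if the residuals still look independent of all candidate parents.

```python
import numpy as np

def residual_score(y, fit_cols, check_cols):
    """Fit y on `fit_cols` by least squares, then score how dependent the
    residuals look on ALL candidate parents via max |corr(resid^2, col^2)|,
    a crude stand-in for an HSIC-based independence check."""
    A = np.column_stack(fit_cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r2 = (y - A @ coef) ** 2
    r2 = r2 - r2.mean()
    best = 0.0
    for col in check_cols:
        c2 = col ** 2 - np.mean(col ** 2)
        d = np.linalg.norm(r2) * np.linalg.norm(c2)
        if d > 1e-12:
            best = max(best, abs(r2 @ c2) / d)
    return best

def prune_parents(y, candidates, threshold=0.2):
    """Greedy backward elimination: drop a candidate parent series only if
    the residuals still pass the independence check without it."""
    keep = list(candidates)
    check = list(candidates.values())
    for name in list(keep):
        trial = [k for k in keep if k != name]
        if not trial:
            break
        if residual_score(y, [candidates[k] for k in trial], check) <= threshold:
            keep = trial                 # parent not required; discard it
    return keep

rng = np.random.default_rng(0)
n = 1000
x1 = rng.exponential(1.0, n) - 1.0       # true parent (skewed, zero-mean)
x2 = rng.exponential(1.0, n) - 1.0       # irrelevant candidate
y = 2.0 * x1 + (rng.exponential(1.0, n) - 1.0)
print(prune_parents(y, {"x1": x1, "x2": x2}))   # expected: ['x1']
```

Dropping x2 leaves residuals that are plain noise, so x2 is pruned; dropping x1 leaves residuals that clearly depend on x1, so x1 is kept.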