NIPS 2018
Sun Dec 2nd through Sat the 8th, 2018 at Palais des Congrès de Montréal
Paper ID 3472: Bayesian Alignments of Warped Multi-Output Gaussian Processes

### Reviewer 1

This submission presents a "three-layer" Gaussian process for multiple time-series analysis: a layer for transforming the input, a layer for a convolutional GP, and a layer for warping the outputs. This is a different "twist" or "flavour" of the existing deep-GP model. Approximate inference is via the scalable version of variational inference using inducing points. The authors state that one main contribution is the "closed-form solution for the $\Phi$-statistics for the convolution kernel". Experiments on a real data set from two wind turbines demonstrate its effectiveness over three existing models in terms of test log-likelihoods.

[Quality] This is quality work, with a clear model, approximation, and experimental results. In addition, Figure 3 shows an illustrative comparison with existing models; results against these models are also given in Table 1. One shortcoming is that the authors have not considered how their approach is better (perhaps in terms of inference) than a more straightforward model where the alignment is placed directly on the input without convolution.

[Clarity] L41: I would describe the model as "nested" rather than "hierarchical", so as not to be confused with Bayesian hierarchical modelling. Section 2: I think this entire section should be rewritten purely in terms of time series, that is, one-dimensional GPs, with the bold face on $x$ and $z$ removed. This is because L69-L70 describe $a_{d}$ as a single-output function, which means $z$ must be one-dimensional, and hence $x$ can only be one-dimensional. If multi-dimensional inputs are desired, then the paper would perhaps have to use a multi-output function for $a_{d}$. Also, for equation 1, it has to be stated that the functions are applied point-wise. Since the authors cited both [14] and [19] for the "warped" part, I suggest stating clearly that the model follows the spirit of [14] rather than [19].

[Originality] I would dispute the claim on L145 that one "main contribution ... is the derivation of a closed-form solution for the $\Phi$-statistics". This is because the result holds only for the RBF kernel, and it is simply a mathematically tedious step to obtain the expressions, given the previous work of [8] and [25, Section B.1]. In addition, once the model is stated, the rest trivially falls into place based on existing work.

[Significance] I commend the authors for proposing this model, which I think will be very useful for time-series analysis and which gives yet another manner in which GPs can be "nested"/"deepened".

[25] M. K. Titsias and M. Lázaro-Gredilla. Variational Inference for Mahalanobis Distance Metrics in Gaussian Process Regression. NIPS 26, 2013.

[Comments on reply] The reply would be more convincing if it could in fact briefly address "how alignments can be generalized to higher dimensions".
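
For reference, the three-layer composition I refer to throughout, written in my own notation rather than the authors' (a sketch only; I assume each layer is applied point-wise, as equation 1 should state, and I write the middle layer in the standard convolved multi-output GP form):

```latex
% Sketch of the nested composition as I read it (my notation, not the paper's):
% a_d aligns the input, g_d is the convolution-process layer mixing shared
% latent processes u_r through smoothing kernels G_{d,r}, and w_d warps the output.
\[
  f_d(x) \;=\; w_d\!\big( g_d\big( a_d(x) \big) \big),
  \qquad
  g_d(t) \;=\; \sum_{r} \int G_{d,r}(t - \tau)\, u_r(\tau)\, \mathrm{d}\tau .
\]
```

The "more straightforward model" I mention under [Quality] would, in this notation, drop the shared convolution layer $g_d$ and place the alignment $a_d$ directly on the input of an independent GP per output.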