NIPS 2016
Mon Dec 5th through Sun Dec 11th, 2016, at the Centre Convencions Internacional Barcelona
Paper ID: 2327
Title: Bayesian Intermittent Demand Forecasting for Large Inventories

Reviewer 1

Summary

The paper proposes to use Bayesian inference, i.e. the Laplace approximation, in a generalised linear model with a multi-stage likelihood to perform demand forecasting for intermittent and bursty time-series data, based on temporal features such as holidays and seasonality. The authors suggest the "twice logistic" link function as a replacement for the exponential or logistic link function. Newton steps for mode finding are reduced to Kalman smoothing to make the method scalable. A set of experiments (partly on public data) is conducted to illustrate and benchmark the approach.
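As generic background for the Laplace-approximation step the review refers to (this is a standard textbook construction, not the paper's specific model): Newton iterations find the posterior mode of the latent variables, and the negative inverse Hessian at the mode gives the Gaussian approximation. A minimal sketch for a Poisson likelihood with a Gaussian prior, all names hypothetical:

```python
import numpy as np

def laplace_mode_poisson(y, K, iters=50, tol=1e-8):
    """Laplace approximation for y_t ~ Poisson(exp(z_t)), z ~ N(0, K).

    Newton's method finds the mode of log p(z | y); the curvature there
    defines the approximate posterior covariance.
    """
    n = len(y)
    z = np.zeros(n)
    K_inv = np.linalg.inv(K)
    for _ in range(iters):
        lam = np.exp(z)
        grad = (y - lam) - K_inv @ z       # gradient of log p(y, z)
        hess = -np.diag(lam) - K_inv       # Hessian (negative definite)
        z_new = z - np.linalg.solve(hess, grad)  # Newton step
        if np.max(np.abs(z_new - z)) < tol:
            z = z_new
            break
        z = z_new
    cov = np.linalg.inv(np.diag(np.exp(z)) + K_inv)  # Laplace covariance
    return z, cov
```

In the paper's setting the review summarizes, the prior over the latent states is Markovian, so this dense Newton step can reportedly be carried out by Kalman smoothing instead of an O(n^3) solve.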

Qualitative Assessment

I like that the paper describes many of the important aspects needed to make Bayesian inference useful in a production system; it is therefore a very practical contribution. Unfortunately, the exposition of the technical parts is very condensed, which makes the material hard to assess. A more pictorial description/illustration of the model, in particular the multi-stage and latent-state aspects, would make the paper much stronger. All items are dealt with in isolation; there is likely a lot of similarity among items of the same type, and this aspect is not covered by the model. There are unclear points in the experiments: a) How does the "LS-pure" method work without features? b) Why is the third row of Table 1, i.e. "LS-feats" for the "Parts" dataset, missing? c) How many features are used in the end? Can one interpret the w vectors?

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)


Reviewer 2

Summary

The paper proposes a Bayesian model combining generalized linear models with time-series smoothing, tailored to probabilistic forecasting of consumer-good demand in retail. It provides many details on the practical implementation and shows comparisons with state-of-the-art methods on real-world data sets.

Qualitative Assessment

The potential impact is high, since this has become an important application domain for machine learning methods. The technical quality appears solid; also on a positive note, the paper provides details on the implementation which may be of interest to practitioners in this domain. I see a couple of shortcomings:
- The originality is limited. The authors need almost an entire page to differentiate their work from Chapados (ICML 2014), which indicates that the improvements might be somewhat incremental.
- The authors emphasize in several places that their methodology is part of a production system running on Apache Spark. To make those claims relevant to a scientific publication, more information would be required, e.g. about the number of cores, running time, number of models, actual data volumes, etc.
- The experimental evaluation could be improved. Figure 2 suggests that the model might be prone to overfitting. Figure 3(a) suggests that the main benefit of the proposed model is that it better captures seasonal effects, which could be achieved using much simpler models (e.g., generalized additive models). I am confused that, on the other hand, for the weekly risk forecasts in Figures 3(b) and 3(c), the competitor methods do capture a seasonal effect. Maybe I am missing something here?
- A minor comment: in the outlook (lines 302-305), the authors mention the importance of modeling the dependency between different items in future work; however, it appears that some work in this direction has already been done by Chapados (ICML 2014).

Confidence in this Review

1-Less confident (might not have understood significant parts)


Reviewer 3

Summary

This paper deals with the temporal problem of demand forecasting at scale. The approach is to combine Gaussian smoothing (temporal) with generalized linear models (uncorrelated). A key part of this paper is the set of numerical methods employed to enable training and prediction at scale.

Qualitative Assessment

Nice paper:
- Clearly explained and well written.
- Deals with an important problem and uses real-world data.
- Generous description of the key details needed to get the approach to work robustly at scale (Section 3.2).
- Very nice and honest discussion of related work, especially lines 183-219.
- Solid empirical evaluation.

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)


Reviewer 4

Summary

The paper proposes a demand forecasting model that combines the merits of generalized linear models and exponential smoothing. It also considers the multi-stage likelihood case. The experiments are solid. My main concern is the novelty of the model.

Qualitative Assessment

Since generalized linear models do not include any temporal dependencies, the authors borrow ideas from exponential smoothing to include latent states. The linear dependencies among latent states then carry information from the past to the future. There are some technical problems for maximum likelihood estimation, for example the non-Gaussianity of p(z|y) and the overall optimization over theta. It seems that the authors properly solve these technical problems while maintaining the efficiency of the algorithm. The experiments show that when more features are available, LS-feats outperforms the other methods, while NegBin might be better when no features are available. The methods this paper proposes can scale to very large datasets. However, the contribution of the paper seems incremental, since it mostly combines several existing ideas without introducing new components to solve the associated problems. Considering the effectiveness of the algorithm shown in this paper, I would give a borderline score.
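The linear latent-state recursion the review describes is the classical exponential-smoothing idea; a minimal illustration (a generic local-level smoother, not the paper's model, with all names hypothetical) is:

```python
def exp_smoothing_level(y, alpha=0.3, l0=None):
    """Simple exponential smoothing with a single latent level l_t.

    The level carries information from the past via the linear recursion
        l_t = alpha * y_t + (1 - alpha) * l_{t-1},
    and the one-step-ahead forecast is yhat_{t+1} = l_t.
    Returns the forecast made before each observation and the final level.
    """
    level = y[0] if l0 is None else l0
    forecasts = []
    for obs in y:
        forecasts.append(level)                  # forecast before seeing obs
        level = alpha * obs + (1 - alpha) * level  # linear state update
    return forecasts, level
```

In a state-space reading, alpha plays the role of the gain with which new observations correct the latent state, which is the bridge to the Kalman-smoothing machinery used in the paper.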

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)


Reviewer 5

Summary

This paper describes a method for intermittent demand forecasting that combines GLMs with time-series smoothing. An approximate inference method is used to handle the non-Gaussian likelihood, and the Newton-Raphson method is used to improve convergence.

Qualitative Assessment

This paper builds upon previous work and combines GLMs with time-series smoothing for demand forecasting on short and long time scales. The authors claim that empirical evidence for the usefulness of the deterministic linear part is provided, which is a critical difference from [6]; however, I do not see a relevant analysis of the deterministic linear part in the experiments section. I found the paper very hard to follow: 1) a lot of footnotes are used, which are distracting and fail to provide clear explanations in many places; 2) many technical/mathematical terms appear without proper explanation, e.g., what is the intuition behind the multi-stage likelihood at line 68? The authors only give the mathematical formulations, which cannot be straightforwardly understood. What is (7X, 7Y) at line 268? dL/dS at line 267 is misleading. In Figure 2, the authors claim that demand becomes uncertain in the out-of-stock region, but even in the in-stock region the demand can be uncertain, such as the period between Mar 2015 and Sep 2015 in the right-most plot. The empirical results show that the proposed method brings only a small improvement: in Figure 3(a) the new method cannot predict the peak around the Christmas holiday, which is claimed to be a strength in the paper, and in Tables 1-2 the improvement does not look significant. The paper claims the proposed method is scalable; however, no empirical results on the time cost of the models are shown. The only related "result" is that the authors apply their models to some large datasets, but without results on time cost w.r.t. data size this is not enough to validate the scalability claim.

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)


Reviewer 6

Summary

The paper aims to forecast accurate probability distributions of demand, particularly when the data are count-valued or bursty, as in an e-commerce setting. To do this, state-space models are extended in the manner of GLMs, with a double logistic link function, and algorithms are proposed for maximum likelihood learning of the parameters.

Qualitative Assessment

The authors should familiarize themselves with the work on state-space models in statistics; a useful starting point would be Durbin, James, and Siem Jan Koopman. Time Series Analysis by State Space Methods. No. 38. Oxford University Press, 2012. A big section of the book is devoted to non-Gaussian models, and the authors would also find a discussion of adding regression effects, referred to as "modeling of deterministic part" in the paper, lines 27-28. In particular, closely related papers would be: Fahrmeir, Ludwig, and Stefan Wagenpfeil. "Penalized likelihood estimation and iterative Kalman smoothing for non-Gaussian dynamic regression models." Computational Statistics and Data Analysis 24.3 (1997): 295-320; and Durbin, James, and Siem Jan Koopman. "Time series analysis of non-Gaussian observations based on state space models from both classical and Bayesian perspectives." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 62.1 (2000): 3-56. The above papers discuss how finding the mode reduces to Kalman smoothing, and how the non-Gaussian likelihood can be simplified by linearising it around its mode. It is hard for me to judge the paper without any discussion of its relation to these very closely related papers; based on my understanding, a lot of the technical contributions of the paper are well known. More detailed comments follow below.
1.) How is the proposed method Bayesian?
2.) The abstract says you pay special attention to "intermittent and bursty" items, but at the end it is mentioned that improvements are for fast- and medium-moving items. Isn't this contradictory? How can an item be both intermittent and fast moving at the same time?
3.) Lines 12-13: "Classical forecasting methods produce Gaussian distributions only" is not true; see the references cited above.
4.) Lines 18-19: a very crucial point; it should have been emphasized and elaborated on more.
5.) Line 28: again, not a true representation of related work.
6.) The discussions in Sections 3.1 and 3.2 are very hard to follow. Since this is a core and crucial technical contribution of the paper, it is important that it be presented in a clear way.
7.) Eq. (3) is non-standard, and most people with a Math/Statistics background would find it puzzling, since the measurement error is completely skipped. I know this notation is from one of the references, but it would be better to follow the standard notation, which is more widely used.
8.) Section 5.1 is fairly obvious and can be skipped.
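For reference, the mode-finding-via-Kalman-smoothing reduction mentioned above rests on the classical forward-filter/backward-smoother recursions; a minimal sketch for a scalar local-level model (a textbook Rauch-Tung-Striebel smoother, not the paper's algorithm, with all names hypothetical) is:

```python
import numpy as np

def rts_smoother_1d(y, q=0.1, r=1.0, m0=0.0, p0=1.0):
    """RTS smoother for the scalar local-level model
        z_t = z_{t-1} + process noise (variance q)
        y_t = z_t + observation noise (variance r).
    Runs a forward Kalman filter, then a backward smoothing pass.
    """
    n = len(y)
    m_f = np.zeros(n); p_f = np.zeros(n)   # filtered means / variances
    m_p = np.zeros(n); p_p = np.zeros(n)   # predicted means / variances
    m, p = m0, p0
    for t in range(n):
        m_p[t], p_p[t] = m, p + q              # predict
        k = p_p[t] / (p_p[t] + r)              # Kalman gain
        m = m_p[t] + k * (y[t] - m_p[t])       # update with observation
        p = (1 - k) * p_p[t]
        m_f[t], p_f[t] = m, p
    m_s = m_f.copy(); p_s = p_f.copy()         # smoothed estimates
    for t in range(n - 2, -1, -1):
        g = p_f[t] / p_p[t + 1]                # smoother gain
        m_s[t] = m_f[t] + g * (m_s[t + 1] - m_p[t + 1])
        p_s[t] = p_f[t] + g ** 2 * (p_s[t + 1] - p_p[t + 1])
    return m_s, p_s
```

The non-Gaussian case handled by the cited papers repeatedly linearises the likelihood around the current mode estimate and reuses exactly this Gaussian recursion, which is the sense in which Newton mode finding "reduces to" Kalman smoothing.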

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)