NIPS 2018
Sun Dec 2nd through Sat the 8th, 2018 at Palais des Congrès de Montréal
Paper ID: 1253
Title: Model Agnostic Supervised Local Explanations

Reviewer 1

The authors propose an algorithm for model interpretability that sidesteps the typical accuracy-interpretability trade-off. It focuses on example-based and local explanations but is also able to detect global patterns and diagnose limitations in its local explanations. The authors provide a clear overview of the different types of explanation and related work, as well as of random forests and their use for explainability (SILO and DStump). SLIM (the authors' model) uses a local linear model, where the coefficients determine the estimated local effect of each feature. For features with a zero coefficient, the authors determine the efficacy of a local explanation to understand whether the feature is important for global patterns. They also propose a method to pick exemplar explanations using the local training distribution. The results are evaluated on multiple datasets and compared to state-of-the-art algorithms for both accuracy and explainability, both as a predictive model and as a black-box explainer. It would be interesting to discuss whether and how the method can generalize to models other than tree-based approaches.

Reviewer 2

Paper summary
This paper focuses on interpretable machine learning (IML) and proposes an approach (denoted SLIM) for achieving interpretability through example-based and local explanations. The approach combines ideas from local linear models and random forests. It builds on existing work on feature selection and supervised neighborhood selection for local linear modeling. The paper uses recent work in IML to define interpretability and shows via experiments on eight UCI datasets how to maintain high accuracy (through the use of random forests and similar methods) while being able to provide local explanations.

Summary of strengths and weaknesses
- Strengths: the paper is well-written and well-organized. It clearly positions the main idea and proposed approach related to existing work and experimentally demonstrates the effectiveness of the proposed approach in comparison with the state-of-the-art.
- Weaknesses: the research method is not very clearly described in the paper or in the abstract. The paper lacks a clear assessment of the validity of the experimental approach, the analysis, and the conclusions.

Quality
- Your definition of interpretable (human simulatable) focuses on to what extent a human can perform and describe the model calculations. This definition does not take into account our ability to make inferences or predictions about something as an indicator of our understanding of or our ability to interpret that something. Yet, regarding your approach, you state that you are “not trying to find causal structure in the data, but in the model’s response” and that “we can freely manipulate the input and observe how the model response changes”. Is your chosen definition of interpretability too narrow for the proposed approach?

Clarity
- Overall, the writing is well-organized, clear, and concise.
- The abstract does a good job explaining the proposed idea but lacks description of how the idea was evaluated and what was the outcome.

Minor language issues
p. 95: “from from” -> “from”
p. 110: “to to” -> “how to”
p. 126: “as way” -> “as a way”
p. 182: “can sorted” -> “can be sorted”
p. 197: “on directly on” -> “directly on”
p. 222: “where want” -> “where we want”
p. 245: “as accurate” -> “as accurate as”
Tab. 1: “square” -> “squared error”
p. 323: “this are features” -> “this is features”

Originality
- the paper builds on recent work in IML and combines two separate lines of existing work; the work by Bloniarz et al. (2016) on supervised neighborhood selection for local linear modeling (denoted SILO) and the work by Kazemitabar et al. (2017) on feature selection (denoted DStump). The framing of the problem, combination of existing work, and empirical evaluation and analysis appear to be original contributions.

Significance
- the proposed method is compared to a suitable state-of-the-art IML approach (LIME) and outperforms it on seven out of eight data sets.
- some concrete illustrations on how the proposed method makes explanations, from a user perspective, would likely make the paper more accessible for researchers and practitioners at the intersection between human-computer interaction and IML. You propose a “causal metric” and use it to demonstrate that your approach achieves “good local explanations” but from a user or human perspective it might be difficult to get convinced about the interpretability in this way only.
- the experiments conducted demonstrate that the proposed method is indeed effective with respect to both accuracy and interpretability, at least for a significant majority of the studied datasets.
- the paper points out two interesting directions for future work, which are likely to seed future research.

Reviewer 3

Thank you to the authors for the detailed response. I see your point about why you didn't compare SLIM to SILO on the causal metric: SILO would have many more features, which makes it more complex than SLIM after DStump feature selection. I will keep my recommendation the same, mainly because the experiments are limited to small datasets.

Contributions: This paper introduces the SLIM algorithm for local interpretability. SLIM can be used as both a black-box explainer and as a self-explainable model. The SLIM algorithm is a combination of two prior algorithms: SILO (for local linear modeling of decision trees), and DStump (for feature selection from decision trees). SLIM proceeds as follows:
1. Given a trained forest of decision trees and an example x, use SILO to get example weights for all other training examples.
2. Use DStump to select d features.
3. Use SILO’s example weights and DStump’s d selected features to solve a weighted linear regression problem, and make the final prediction for x using this linear model.

Strengths:
- SLIM provides a novel integration of the previously existing SILO and DStump methods to produce a locally interpretable model. SLIM can be used as both a black-box explainer and as a self-explainable model.
- SLIM provides local interpretability in the form of locally linear models, local example weighting, and feature selection.
- SLIM seems to produce at least as good accuracy as the most similar prior work (SILO alone).
- This paper also provides a way to estimate global patterns using the local training distributions produced by SILO (Figures 2-4). Note that the paper only showed an example of this explanation method on simulated data, and it would be great to see similar figures on real data.

Weaknesses:
- It seems like SLIM requires re-solving the weighted linear regression problem for each new example x, since each example x will produce a different set of training example weights from SILO. This seems inefficient.
- It would be easier to read if the authors provided clearer pseudocode of the SLIM algorithm.
- It’s not clear how much SLIM’s use of DStump feature selection helps improve the causal metric beyond SILO alone. Table 1 shows that SLIM’s addition of the DStump feature selection can help the overall RMSE in some cases. However, we don’t see a similar comparison for Table 3 or Table 4. Instead, in Table 4, SLIM is compared to LIME, which is not quite the same as SILO, since LIME fits a local sparse linear model to the given predictive model, while SILO (and SLIM) fit a tree model first and then use that tree to build a weighted sparse linear model. In Tables 3 and 4, I would be curious to see a comparison between SLIM and SILO both for black-box explanation and self-explanation. This should be possible since it should only require removing the DStump feature selection step in SLIM.
- Line 181 mentions that d is selected to get the best validation accuracy, but perhaps it can be selected to get the best causal metric, especially if SLIM is being used as a black-box explainer.
- The datasets used in experiments are all fairly small, the largest having only 2214 examples. I would like to see an experiment with a larger dataset, particularly to see how SLIM interacts with more complex black-box models with more training data.
- The paper does not address any theoretical guarantees around SLIM, instead providing experimental evidence of how to use SLIM for interpretability.

Recommendation: Overall, this paper presents a novel method for interpretability that seems to perform reasonably well on small datasets. Given some reservations in the experiments, my recommendation is a marginal accept.
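
To make Reviewer 3's three-step summary of SLIM concrete, here is a minimal Python sketch of it. It is not the authors' pseudocode, and several pieces are assumptions: the weight formula is the usual random-forest weighting (an example's weight in a tree is one over the number of training examples sharing the query's leaf, averaged over trees), the DStump feature ranking is taken as a precomputed input (`selected`), and all function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple


def silo_weights(train_leaves: Sequence[Sequence[int]],
                 query_leaves: Sequence[int]) -> List[float]:
    """Step 1 (assumed form): weight of each training example for one query.

    train_leaves[i][k] is the leaf id of training example i in tree k;
    query_leaves[k] is the leaf id of the query x in tree k.
    """
    n, n_trees = len(train_leaves), len(query_leaves)
    weights = [0.0] * n
    for k in range(n_trees):
        same = [i for i in range(n) if train_leaves[i][k] == query_leaves[k]]
        for i in same:
            weights[i] += 1.0 / (n_trees * len(same))
    return weights


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few distinct weighted points")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def slim_predict(X: Sequence[Sequence[float]], y: Sequence[float],
                 train_leaves: Sequence[Sequence[int]],
                 query_leaves: Sequence[int], x: Sequence[float],
                 selected: Sequence[int]) -> Tuple[float, List[float]]:
    """Steps 2-3: fit a weighted linear model on the selected features.

    `selected` stands in for the d feature indices chosen by DStump
    (assumed to be computed elsewhere). Returns the prediction for x and
    the coefficients [intercept, beta_1, ..., beta_d].
    """
    w = silo_weights(train_leaves, query_leaves)
    rows = [[1.0] + [X[i][j] for j in selected] for i in range(len(X))]
    p = len(selected) + 1
    # Weighted normal equations: (A^T W A) beta = A^T W y.
    ata = [[sum(w[i] * rows[i][a] * rows[i][b] for i in range(len(X)))
            for b in range(p)] for a in range(p)]
    aty = [sum(w[i] * rows[i][a] * y[i] for i in range(len(X)))
           for a in range(p)]
    beta = solve(ata, aty)
    x_row = [1.0] + [x[j] for j in selected]
    return sum(b * v for b, v in zip(beta, x_row)), beta
```

The leaf ids could come from any fitted forest, for example scikit-learn's `RandomForestRegressor.apply`. The sketch also shows the cost raised under Weaknesses: every new query x produces new weights, so a fresh (d+1)-by-(d+1) linear system is built and solved per prediction.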