Local explainers

LIME

Local Interpretable Model-agnostic Explanations (LIME) [ribeiro2016] is a saliency explanation method. LIME aims to explain a prediction \(p = (x, y)\) (an input-output pair) generated by a black-box model \(f : \mathbb{R}^d \rightarrow \mathbb{R}\). Such explanations take the form of a “saliency” \(w_i\) attached to each feature \(x_i\) of the prediction input \(x\). LIME generates a local explanation \(\xi(x)\) according to the following model:

\[\xi(x) = \arg\min_{g \in G}L(f, g, \pi_x) + \Omega(g)\]

where \(\pi_x\) is a proximity function, \(G\) is the family of interpretable models, \(\Omega(g)\) is a measure of the complexity of an explanation \(g \in G\), and \(L(f, g, \pi_x)\) is a measure of how unfaithful \(g\) is in approximating \(f\) in the locality defined by \(\pi_x\). In the original paper, \(G\) is the class of linear models and \(\pi_x\) is an exponential kernel on a distance function \(D\) (e.g. cosine distance). LIME converts samples \(x_i\) from the original domain into interpretable samples represented as binary vectors \(x^{\prime}_i \in \{0, 1\}^{d^{\prime}}\), where \(d^{\prime}\) is the number of interpretable components. An encoded dataset \(E\) is built by drawing non-zero elements of \(x^{\prime}_i\), recovering the original representation \(z \in \mathbb{R}^d\), and then computing \(f(z)\). A weighted linear model \(g\) (with weights provided via \(\pi_x\)) is then trained on the generated sparse dataset \(E\), and its coefficients \(w\) are used as the feature saliencies of the final explanation \(\xi(x)\).
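To make this procedure concrete, the sketch below implements the tabular variant of the loop in Python: sample binary vectors \(z^{\prime}\), map them back to the original domain, weight them with an exponential kernel, and fit a weighted linear surrogate. It is a minimal sketch, not the reference implementation: the function name `lime_explain`, the zeroing-out of excluded features, and the ridge surrogate are illustrative assumptions, and the complexity term \(\Omega(g)\) (realised in the original paper by limiting the number of non-zero weights) is omitted.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(f, x, num_samples=5000, kernel_width=0.75, seed=None):
    """Minimal LIME-style sketch for a single tabular instance x.

    `f` is assumed to map an (n, d) array to an (n,) array of outputs;
    `x` is a 1-D array of length d. Returns one saliency per feature.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]

    # Interpretable representation: binary vectors z' indicating which
    # features of x are kept (1) or excluded (0).
    z_prime = rng.integers(0, 2, size=(num_samples, d))
    z_prime[0] = 1  # keep the unperturbed instance in the sample

    # Recover original-domain points. Excluded features are zeroed here,
    # a simplifying assumption; real implementations typically sample
    # replacement values from the training distribution instead.
    z = z_prime * x

    # Proximity weights pi_x: exponential kernel on Euclidean distance.
    distances = np.linalg.norm(z - x, axis=1)
    weights = np.exp(-(distances ** 2) / (kernel_width ** 2))

    # Fit the weighted linear surrogate g on the encoded dataset E.
    g = Ridge(alpha=1.0)
    g.fit(z_prime, f(z), sample_weight=weights)

    # The surrogate's coefficients are the feature saliencies w_i.
    return g.coef_
```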

SHAP

SHAP, presented by Scott Lundberg and Su-In Lee in 2017 [lundberg2017], seeks to unify a number of common explanation methods, notably LIME [ribeiro2016] and DeepLIFT [shrikumar2017], under the common umbrella of additive feature attributions. These are explanation methods that explain how an input \(x = [x_1, x_2, \ldots, x_M]\) affects the output of some model \(f\) by transforming \(x \in \mathbb{R}^M\) into simplified inputs \(z^{\prime} \in \{0, 1\}^M\), such that \(z^{\prime}_i\) indicates the inclusion or exclusion of feature \(i\). These simplified inputs are then passed to an explanatory model \(g\) that takes the following form:

\[x = h_x(z^{\prime}), \qquad g(z^{\prime}) = \phi_0 + \sum_{i=1}^M \phi_i z_i^{\prime}, \qquad \textbf{s.t.}\quad g(z^{\prime}) \approx f(h_x(z^{\prime}))\]

In this form, each value \(\phi_i\) marks the contribution that feature \(i\) makes to the model output (called the attribution), and \(\phi_0\) marks the null output of the model: the output when every feature is excluded. This provides an easily interpretable explanation of the importance of each feature, together with a framework for permuting the input features to establish their collective contributions.
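As a concrete (and deliberately brute-force) illustration of this attribution framework, the sketch below computes exact Shapley values for a model with a handful of features by enumerating every coalition, replacing excluded features with a baseline value to realise \(h_x\). The function `exact_shapley`, the baseline-replacement mapping, and the toy scoring function at the end are assumptions for the example; practical SHAP implementations approximate this sum far more efficiently.

```python
import itertools
import math
import numpy as np

def exact_shapley(f, x, baseline):
    """Brute-force Shapley values phi_1..phi_M plus the null output phi_0.

    `x` and `baseline` are 1-D arrays of length M; h_x(z') keeps x_i where
    z'_i = 1 and substitutes baseline_i where z'_i = 0. Exponential in M,
    so only suitable for small M.
    """
    M = len(x)

    def value(subset):
        # Model output with only the features in `subset` included.
        mask = np.isin(np.arange(M), list(subset))
        return f(np.where(mask, x, baseline))

    phi = np.zeros(M)
    for i in range(M):
        others = [j for j in range(M) if j != i]
        for r in range(M):
            for S in itertools.combinations(others, r):
                w = math.factorial(r) * math.factorial(M - r - 1) / math.factorial(M)
                phi[i] += w * (value(S + (i,)) - value(S))

    phi_0 = value(())  # null output: every feature excluded
    return phi_0, phi

# Toy usage: a linear "loan score" over three features, explained at x = 1.
score = lambda z: 0.5 + 0.2 * z[0] - 0.1 * z[1] + 0.3 * z[2]
phi_0, phi = exact_shapley(score, np.ones(3), np.zeros(3))
assert np.isclose(phi_0 + phi.sum(), score(np.ones(3)))  # additive property
```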

The final result of the algorithm is the Shapley value of each feature, which together give an itemized "receipt" of all the factors contributing to the decision. For example, a SHAP explanation of a loan application might be as follows:

Feature          Shapley Value φ
Null Output      50%
Income           +10%
# Children       -15%
Age              +22%
Own Home?        -30%
Acceptance %     37%
Deny             63%

From this, the applicant can see that the biggest contributor to their denial was their home ownership status, which reduced their acceptance probability by 30 percentage points. Meanwhile, their age was of particular benefit, increasing their probability by 22 percentage points.
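Because the attribution is additive, the reported acceptance probability is simply the null output plus the individual contributions:

\[g(z^{\prime}) = 0.50 + 0.10 - 0.15 + 0.22 - 0.30 = 0.37\]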

References

  • [ribeiro2016] Ribeiro, M.T., Singh, S., Guestrin, C.: "Why should I trust you?": Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144 (2016)

  • [lundberg2017] Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems (2017)

  • [shrikumar2017] Shrikumar, A., Greenside, P., Kundaje, A.: Learning important features through propagating activation differences. CoRR abs/1704.02685 (2017)