Multi-dimensional explanations
Until now, we've only considered marginal distributions (see the "How It Works" page to understand what the algorithm does under the hood):
In [1]:
import ethik
import lightgbm as lgb
import numpy as np
import pandas as pd
from sklearn import metrics, model_selection
X, y = ethik.datasets.load_adult()
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, shuffle=True, random_state=42)
model = lgb.LGBMClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict_proba(X_test)[:, 1]
y_pred = pd.Series(y_pred, name=">$50k")
explainer = ethik.ClassificationExplainer(n_taus=21, memoize=True)
In [2]:
explainer.plot_influence(
    X_test=X_test["age"],
    y_pred=y_pred
).show()

# "education-num" doesn't change the result for "age"
explainer.plot_influence(
    X_test=X_test[["age", "education-num"]],
    y_pred=y_pred
).show()
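To give an intuition for what these curves mean, here is a minimal sketch of the reweighting idea: for each target mean, the sample is reweighted so that the feature's weighted mean hits the target, and the weighted average prediction is reported. This uses exponential tilting and synthetic stand-in data, which are assumptions for illustration, not ethik's actual implementation:

```python
import numpy as np
from scipy.optimize import brentq

def influence_curve(x, y_pred, targets):
    """Weighted-average prediction after shifting the mean of x to each target."""
    xc = x - x.mean()  # centre the feature to keep the exponents small
    curve = []
    for t in targets:
        def gap(lam):
            w = np.exp(lam * xc)
            w /= w.sum()
            return w @ x - t  # zero when the weighted mean of x equals t
        lam = brentq(gap, -0.5, 0.5)  # bracket wide enough for modest shifts
        w = np.exp(lam * xc)
        w /= w.sum()
        curve.append(w @ y_pred)
    return np.array(curve)

rng = np.random.default_rng(42)
age = rng.normal(40, 12, 1000)               # toy stand-in for "age"
y_pred = 1 / (1 + np.exp(-(age - 40) / 10))  # toy model output, rising with age
print(influence_curve(age, y_pred, np.linspace(30, 50, 5)))
```

Because the toy predictions increase with age, the sketched curve rises as the target mean of `age` rises, which is the shape `plot_influence` visualizes.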
But age and education level are correlated, so considering them independently loses some information. To remedy that, we can call plot_influence_2d() (and plot_performance_2d()):
In [3]:
explainer.plot_influence_2d(
    X_test=X_test[["age", "education-num"]],
    y_pred=y_pred
)
Now that we are able to reach multiple targets simultaneously, we can also look at a single feature while fixing the mean of one or more other features:
In [4]:
explainer.plot_influence(
    X_test=X_test[["age", "education-num"]],
    y_pred=y_pred,
    constraints={"education-num": 8}
)
Since the education level here is lower than the dataset mean (about 10), the average output is lower as well.
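The direction of that effect can be checked with the same reweighting sketch as above: pinning the mean of an education-like feature below its dataset mean lowers the weighted-average prediction when predictions increase with education. The data and prediction function here are synthetic assumptions, not the adult dataset or the trained model:

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
edu = rng.normal(10, 2.5, 2000)          # stand-in for "education-num" (mean ~10)
y_pred = 1 / (1 + np.exp(-(edu - 10)))   # toy prediction, rising with education

def mean_pred_under_constraint(x, y_pred, target):
    # Reweight so the weighted mean of x equals `target`, then average predictions.
    xc = x - x.mean()  # centring keeps the exponents small
    def gap(lam):
        w = np.exp(lam * xc)
        w /= w.sum()
        return w @ x - target
    lam = brentq(gap, -2, 2)
    w = np.exp(lam * xc)
    w /= w.sum()
    return w @ y_pred

print(y_pred.mean())                                 # unconstrained average
print(mean_pred_under_constraint(edu, y_pred, 8.0))  # lower, as in the plot above
```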
We can specify multiple constraints:
In [5]:
explainer.plot_influence(
    X_test=X_test[["age", "education-num", "hours-per-week"]],
    y_pred=y_pred,
    constraints={"education-num": 8, "hours-per-week": 25}
)
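Several constraints can be satisfied by one set of weights: with one tilting parameter per constrained feature, we solve for the parameter vector that makes all the weighted column means match their targets at once. This is a sketch under the same exponential-tilting assumption, with synthetic correlated features standing in for "education-num" and "hours-per-week":

```python
import numpy as np
from scipy.optimize import root

def tilt_weights(X, targets):
    # Find weights w proportional to exp(Xc @ lam) whose weighted column means
    # match `targets` simultaneously (one lambda per constrained feature).
    mu = X.mean(axis=0)
    Xc = X - mu  # centring keeps the exponents small
    def gaps(lam):
        w = np.exp(Xc @ lam)
        w /= w.sum()
        return w @ X - targets  # zero when every constraint is met
    lam = root(gaps, np.zeros(X.shape[1])).x
    w = np.exp(Xc @ lam)
    return w / w.sum()

rng = np.random.default_rng(1)
edu = rng.normal(10, 2.5, 2000)              # stand-in for "education-num"
hours = 0.5 * edu + rng.normal(35, 8, 2000)  # correlated stand-in for "hours-per-week"
X = np.column_stack([edu, hours])
w = tilt_weights(X, np.array([8.0, 25.0]))
print(w @ X)  # weighted means, approximately [8.0, 25.0]
```

Averaging predictions under these weights is then the same final step as in the single-constraint case.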