Multi-dimensional explanations
Until now, we've only considered marginal distributions (see the "How It Works" page to understand what the algorithm does under the hood):
In [1]:
import ethik
import lightgbm as lgb
import numpy as np
import pandas as pd
from sklearn import metrics, model_selection
X, y = ethik.datasets.load_adult()
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, shuffle=True, random_state=42)
model = lgb.LGBMClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict_proba(X_test)[:, 1]
y_pred = pd.Series(y_pred, name=">$50k")
explainer = ethik.ClassificationExplainer(n_taus=21, memoize=True)
In [2]:
explainer.plot_influence(
    X_test=X_test["age"],
    y_pred=y_pred
).show()

# "education-num" doesn't change the result for "age"
explainer.plot_influence(
    X_test=X_test[["age", "education-num"]],
    y_pred=y_pred
).show()
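To give an intuition for what these curves mean, here is a minimal sketch of the reweighting idea: for each target mean, the sample is reweighted so that the feature's weighted mean hits the target, and the weighted average prediction is reported. This uses exponential tilting and synthetic stand-in data, which are assumptions for illustration, not ethik's actual implementation:

```python
import numpy as np
from scipy.optimize import brentq

def influence_curve(x, y_pred, targets):
    """Weighted-average prediction after shifting the mean of x to each target."""
    xc = x - x.mean()  # centre the feature to keep the exponents small
    curve = []
    for t in targets:
        def gap(lam):
            w = np.exp(lam * xc)
            w /= w.sum()
            return w @ x - t  # zero when the weighted mean of x equals t
        lam = brentq(gap, -0.5, 0.5)  # bracket wide enough for modest shifts
        w = np.exp(lam * xc)
        w /= w.sum()
        curve.append(w @ y_pred)
    return np.array(curve)

rng = np.random.default_rng(42)
age = rng.normal(40, 12, 1000)               # toy stand-in for "age"
y_pred = 1 / (1 + np.exp(-(age - 40) / 10))  # toy model output, rising with age
print(influence_curve(age, y_pred, np.linspace(30, 50, 5)))
```

Because the toy predictions increase with age, the sketched curve rises as the target mean of `age` rises, which is the shape `plot_influence` visualizes.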
But age and education level are correlated, so considering them independently loses some information. To remedy that, we can call plot_influence_2d() (and plot_performance_2d()):
In [3]:
explainer.plot_influence_2d(
    X_test=X_test[["age", "education-num"]],
    y_pred=y_pred
)
Now that we are able to reach multiple targets simultaneously, we can also look at a single feature while fixing the mean of one or more other features:
In [4]:
explainer.plot_influence(
    X_test=X_test[["age", "education-num"]],
    y_pred=y_pred,
    constraints={"education-num": 8}
)
Since the education level here is lower than the dataset mean (about 10), the average output is lower as well.
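The direction of that effect can be checked with the same reweighting sketch as above: pinning the mean of an education-like feature below its dataset mean lowers the weighted-average prediction when predictions increase with education. The data and prediction function here are synthetic assumptions, not the adult dataset or the trained model:

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
edu = rng.normal(10, 2.5, 2000)          # stand-in for "education-num" (mean ~10)
y_pred = 1 / (1 + np.exp(-(edu - 10)))   # toy prediction, rising with education

def mean_pred_under_constraint(x, y_pred, target):
    # Reweight so the weighted mean of x equals `target`, then average predictions.
    xc = x - x.mean()  # centring keeps the exponents small
    def gap(lam):
        w = np.exp(lam * xc)
        w /= w.sum()
        return w @ x - target
    lam = brentq(gap, -2, 2)
    w = np.exp(lam * xc)
    w /= w.sum()
    return w @ y_pred

print(y_pred.mean())                                 # unconstrained average
print(mean_pred_under_constraint(edu, y_pred, 8.0))  # lower, as in the plot above
```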
We can specify multiple constraints:
In [5]:
explainer.plot_influence(
    X_test=X_test[["age", "education-num", "hours-per-week"]],
    y_pred=y_pred,
    constraints={"education-num": 8, "hours-per-week": 25}
)
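Several constraints can be satisfied by one set of weights: with one tilting parameter per constrained feature, we solve for the parameter vector that makes all the weighted column means match their targets at once. This is a sketch under the same exponential-tilting assumption, with synthetic correlated features standing in for "education-num" and "hours-per-week":

```python
import numpy as np
from scipy.optimize import root

def tilt_weights(X, targets):
    # Find weights w proportional to exp(Xc @ lam) whose weighted column means
    # match `targets` simultaneously (one lambda per constrained feature).
    mu = X.mean(axis=0)
    Xc = X - mu  # centring keeps the exponents small
    def gaps(lam):
        w = np.exp(Xc @ lam)
        w /= w.sum()
        return w @ X - targets  # zero when every constraint is met
    lam = root(gaps, np.zeros(X.shape[1])).x
    w = np.exp(Xc @ lam)
    return w / w.sum()

rng = np.random.default_rng(1)
edu = rng.normal(10, 2.5, 2000)              # stand-in for "education-num"
hours = 0.5 * edu + rng.normal(35, 8, 2000)  # correlated stand-in for "hours-per-week"
X = np.column_stack([edu, hours])
w = tilt_weights(X, np.array([8.0, 25.0]))
print(w @ X)  # weighted means, approximately [8.0, 25.0]
```

Averaging predictions under these weights is then the same final step as in the single-constraint case.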