Ethik AI

Iris species classification

In [1]:
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()

X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target).map(lambda x: iris.target_names[x])
In [2]:
X.head()
Out[2]:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2
In [3]:
y.head()
Out[3]:
0    setosa
1    setosa
2    setosa
3    setosa
4    setosa
dtype: object

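We'll train a nearest neighbors classifier. Because nearest neighbors relies on distances between samples, we standardize the features first by chaining a scaler and the classifier in a pipeline.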

In [4]:
from sklearn import metrics
from sklearn import model_selection
from sklearn import neighbors
from sklearn import pipeline
from sklearn import preprocessing

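# Hold out a test set; the fixed seed keeps the split reproducible
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, shuffle=True, random_state=42)

# Standardize the features, then classify with nearest neighbors
model = pipeline.make_pipeline(
    preprocessing.StandardScaler(),
    neighbors.KNeighborsClassifier()
)
model.fit(X_train, y_train)

# Predicted class probabilities, one column per class
y_pred = model.predict_proba(X_test)
y_pred = pd.DataFrame(y_pred, columns=model.classes_)

y_pred.head()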
Out[4]:
   setosa  versicolor  virginica
0     0.0         0.8        0.2
1     1.0         0.0        0.0
2     0.0         0.0        1.0
3     0.0         1.0        0.0
4     0.0         1.0        0.0
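
The values 0.8 and 0.2 in the first row follow from how a nearest neighbors classifier produces probabilities. With scikit-learn's defaults (5 neighbors, uniform weights), each probability is the share of the 5 nearest training points that belong to the class. The sketch below reproduces that vote counting in plain Python. The points, labels, and the helper name `knn_vote_fractions` are made up for illustration; they are not taken from the iris data or from scikit-learn.

```python
from collections import Counter
import math


def knn_vote_fractions(train_points, train_labels, query, k=5):
    """Return {label: share of the k nearest training points with that label}."""
    # Sort the training points by Euclidean distance to the query and keep the k closest
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return {label: count / k for label, count in votes.items()}


# Hypothetical, already-scaled 2D points (not taken from the iris data)
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (1.0, 1.0), (2.0, 2.0)]
labels = ["versicolor", "versicolor", "versicolor", "versicolor", "virginica", "virginica"]

fractions = knn_vote_fractions(points, labels, query=(0.2, 0.2))
print(fractions)  # {'versicolor': 0.8, 'virginica': 0.2}
```

Unlike `predict_proba`, this helper omits classes that receive no votes instead of reporting them as 0.0.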

As we are working on a classification task, we will use a ClassificationExplainer.

In [5]:
import ethik

explainer = ethik.ClassificationExplainer()

One variable, one class: the influence of sepal length on the predicted probability of setosa.

In [6]:
explainer.plot_influence(
    X_test=X_test['sepal length (cm)'],
    y_pred=y_pred['setosa'],
)
100%|██████████| 41/41 [00:00<00:00, 744.25it/s]
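
The notebook does not say how the influence is computed. One common way to answer "what would the model predict on average if the mean sepal length were different" is to reweight the test samples so that their weighted mean reaches a target value, and then take the weighted average of the predictions. The sketch below does this with exponential weights found by bisection. It is an illustration of that idea under stated assumptions, not ethik's actual code; the function name `tilted_weights` and all the data are hypothetical.

```python
import math


def tilted_weights(values, target_mean, tol=1e-9, max_iter=200):
    """Weights proportional to exp(xi * x) whose weighted mean of `values` equals target_mean."""
    if not min(values) < target_mean < max(values):
        raise ValueError("target_mean must lie strictly inside the observed range")

    center = sum(values) / len(values)

    def weights_for(xi):
        # Subtract the largest exponent before exponentiating, for numerical stability
        exponents = [xi * (v - center) for v in values]
        top = max(exponents)
        raw = [math.exp(e - top) for e in exponents]
        total = sum(raw)
        return [r / total for r in raw]

    def mean_for(xi):
        return sum(w * v for w, v in zip(weights_for(xi), values))

    # The weighted mean increases with xi, so widen a bracket and then bisect
    lo, hi = -1.0, 1.0
    while mean_for(lo) > target_mean:
        lo *= 2
    while mean_for(hi) < target_mean:
        hi *= 2
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return weights_for((lo + hi) / 2)


# Hypothetical sepal lengths and predicted setosa probabilities
sepal_length = [4.6, 5.0, 5.4, 6.1, 6.7, 7.2]
p_setosa = [1.0, 1.0, 0.8, 0.0, 0.0, 0.0]

weights = tilted_weights(sepal_length, target_mean=5.2)
shifted_mean = sum(w * x for w, x in zip(weights, sepal_length))
shifted_prediction = sum(w * p for w, p in zip(weights, p_setosa))
baseline_prediction = sum(p_setosa) / len(p_setosa)

print(round(shifted_mean, 3), shifted_prediction > baseline_prediction)
```

Repeating this for a range of target means gives one curve of average prediction against mean feature value, which is the kind of curve an influence plot displays.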

Two variables, one class: sepal length and sepal width against setosa.

In [7]:
explainer.plot_influence(
    X_test=X_test[['sepal length (cm)', 'sepal width (cm)']],
    y_pred=y_pred['setosa']
)
100%|██████████| 82/82 [00:00<00:00, 737.72it/s]

One variable, two classes: sepal length against versicolor and virginica.

In [8]:
explainer.plot_influence(
    X_test=X_test['sepal length (cm)'],
    y_pred=y_pred[['versicolor', 'virginica']]
)
100%|██████████| 41/41 [00:00<00:00, 741.01it/s]
100%|██████████| 41/41 [00:00<00:00, 733.78it/s]

Two variables, two classes: sepal length and sepal width against versicolor and virginica.

In [9]:
explainer.plot_influence(
    X_test=X_test[['sepal length (cm)', 'sepal width (cm)']],
    y_pred=y_pred[['versicolor', 'virginica']]
)
100%|██████████| 82/82 [00:00<00:00, 716.79it/s]
100%|██████████| 82/82 [00:00<00:00, 742.47it/s]

All the variables, all the classes.

In [10]:
explainer.plot_influence(
    X_test=X_test,
    y_pred=y_pred,
    size=(None, 1000)
)
100%|██████████| 164/164 [00:00<00:00, 725.91it/s]
100%|██████████| 164/164 [00:00<00:00, 734.99it/s]
100%|██████████| 164/164 [00:00<00:00, 734.03it/s]

Performance as a function of one variable, measured with the log loss.

In [11]:
explainer.plot_performance(
    X_test=X_test["sepal length (cm)"],
    y_test=y_test,
    y_pred=y_pred,
    metric=metrics.log_loss,
)
100%|██████████| 41/41 [00:00<00:00, 261.11it/s]
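
The metric passed above, `metrics.log_loss`, is the average negative log of the probability assigned to the true class. The sketch below computes it by hand in plain Python, with optional per-sample weights so that it can also score a reweighted test set. Whether ethik passes such weights to the metric is an assumption here, and the `eps` clipping value is a choice made for this sketch. Clipping matters with nearest neighbors, which often predicts a probability of exactly 0.0. The function name `weighted_log_loss` and the data are hypothetical.

```python
import math


def weighted_log_loss(true_labels, predicted_probas, weights=None, eps=1e-15):
    """Weighted average of -log(probability assigned to the true class)."""
    if weights is None:
        weights = [1.0] * len(true_labels)
    total_weight = sum(weights)
    loss = 0.0
    for label, probas, weight in zip(true_labels, predicted_probas, weights):
        # Clip away from 0 and 1 so the logarithm stays finite
        p = min(max(probas[label], eps), 1 - eps)
        loss -= weight * math.log(p)
    return loss / total_weight


# Hypothetical labels and predicted probabilities
y_true = ["setosa", "versicolor", "virginica"]
y_proba = [
    {"setosa": 1.0, "versicolor": 0.0, "virginica": 0.0},
    {"setosa": 0.0, "versicolor": 0.8, "virginica": 0.2},
    {"setosa": 0.0, "versicolor": 0.4, "virginica": 0.6},
]

plain_loss = weighted_log_loss(y_true, y_proba)
# Giving more weight to the least confident sample raises the loss
stressed_loss = weighted_log_loss(y_true, y_proba, weights=[1.0, 1.0, 2.0])
print(round(plain_loss, 4), round(stressed_loss, 4))
```

Lower values are better, and a single confident mistake (true-class probability near 0) dominates the average.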

Performance as a function of each variable.

In [12]:
explainer.plot_performance(
    X_test=X_test,
    y_test=y_test,
    y_pred=y_pred,
    metric=metrics.log_loss,
)
100%|██████████| 164/164 [00:00<00:00, 274.41it/s]

Joint influence of two variables on one class.

In [13]:
explainer.plot_influence_2d(
    X_test=X_test[['sepal length (cm)', 'sepal width (cm)']],
    y_pred=y_pred[['virginica']]
)
100%|██████████| 1681/1681 [00:04<00:00, 394.21it/s]
/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/ethik-0.0.4-py3.7.egg/ethik/cache_explainer.py:166: ConvergenceWarning:

328 groups didn't converge.
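
The progress bar counts in this notebook are consistent with 41 evaluation points per variable: 41 for one variable, 82 for two, 164 for four, and 1681 = 41 × 41 for the 2D plot, where every pair of points is evaluated. The sketch below builds such a grid in plain Python. The value 41 is inferred from the progress bars rather than from ethik's documentation, and the ranges and the `linspace` helper are hypothetical.

```python
from itertools import product


def linspace(start, stop, n):
    """Return n evenly spaced values from start to stop, both included."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


n_targets = 41  # assumption: inferred from the progress bars above

# Hypothetical target ranges for the two variables
sepal_length_targets = linspace(4.5, 7.5, n_targets)
sepal_width_targets = linspace(2.0, 4.0, n_targets)

# One evaluation per pair of targets
grid = list(product(sepal_length_targets, sepal_width_targets))
print(len(grid))  # 1681
```

Against that total, the warning's 328 groups amount to roughly one pair in five.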

Joint influence of two variables on two classes.

In [14]:
explainer.plot_influence_2d(
    X_test=X_test[['sepal length (cm)', 'sepal width (cm)']],
    y_pred=y_pred[['versicolor', 'virginica']]
)
100%|██████████| 1681/1681 [00:04<00:00, 394.86it/s]
/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/ethik-0.0.4-py3.7.egg/ethik/cache_explainer.py:166: ConvergenceWarning:

328 groups didn't converge.

100%|██████████| 1681/1681 [00:04<00:00, 390.37it/s]
/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/ethik-0.0.4-py3.7.egg/ethik/cache_explainer.py:166: ConvergenceWarning:

328 groups didn't converge.