Iris species classification
In [1]:
import pandas as pd
from sklearn import datasets
iris = datasets.load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target).map(lambda x: iris.target_names[x])
In [2]:
X.head()
Out[2]:
In [3]:
y.head()
Out[3]:
We'll train a k-nearest neighbors classifier. Because it relies on distances between samples, we standardize the features first by chaining a scaler and the classifier in a pipeline.
In [4]:
from sklearn import metrics
from sklearn import model_selection
from sklearn import neighbors
from sklearn import pipeline
from sklearn import preprocessing
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, shuffle=True, random_state=42
)
model = pipeline.make_pipeline(
    preprocessing.StandardScaler(),
    neighbors.KNeighborsClassifier()
)
model.fit(X_train, y_train)
y_pred = model.predict_proba(X_test)
y_pred = pd.DataFrame(y_pred, columns=model.classes_)
y_pred.head()
Out[4]:
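To make the two pipeline steps concrete, here is a minimal pure-Python sketch of what they compute: each feature is standardized using the training mean and standard deviation, and the predicted probability of a class is the fraction of the k nearest training points that carry that label (scikit-learn's default is k=5 with uniform weights, so the probabilities are multiples of 0.2). The helper names `fit_scaler`, `transform` and `knn_proba` and the six toy rows are made up for illustration; this is not how scikit-learn is implemented internally.

```python
import math
from collections import Counter


def fit_scaler(rows):
    """Return per-column means and population standard deviations."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    # A constant column would give a zero deviation; fall back to 1.0.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
        for col, m in zip(columns, means)
    ]
    return means, stds


def transform(rows, means, stds):
    """Center and scale every row with the statistics learned on the training set."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def knn_proba(train_rows, train_labels, query, k=5):
    """Fraction of the k nearest training points (Euclidean) per label."""
    by_distance = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return {label: votes.get(label, 0) / k for label in sorted(set(train_labels))}


# Made-up toy data: (sepal length, sepal width) for two species.
train = [[5.0, 3.4], [4.9, 3.1], [5.1, 3.5], [6.4, 3.2], [6.9, 3.1], [6.5, 2.8]]
labels = ["setosa"] * 3 + ["versicolor"] * 3

means, stds = fit_scaler(train)
scaled_train = transform(train, means, stds)
# The query is scaled with the *training* statistics, as the pipeline does.
query = transform([[5.0, 3.3]], means, stds)[0]

proba = knn_proba(scaled_train, labels, query, k=5)
print(proba)
```

With only six training points and k=5, the two farthest neighbors necessarily come from the other species, which is why the query does not get a probability of 1 for setosa even though it sits inside that cluster.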
As we are working on a classification task, we will use a ClassificationExplainer.
In [5]:
import ethik
explainer = ethik.ClassificationExplainer()
One variable, one class.
In [6]:
explainer.plot_influence(
    X_test=X_test['sepal length (cm)'],
    y_pred=y_pred['setosa'],
)
Two variables, one class.
In [7]:
explainer.plot_influence(
    X_test=X_test[['sepal length (cm)', 'sepal width (cm)']],
    y_pred=y_pred['setosa']
)
One variable, two classes.
In [8]:
explainer.plot_influence(
    X_test=X_test['sepal length (cm)'],
    y_pred=y_pred[['versicolor', 'virginica']]
)
Two variables, two classes.
In [9]:
explainer.plot_influence(
    X_test=X_test[['sepal length (cm)', 'sepal width (cm)']],
    y_pred=y_pred[['versicolor', 'virginica']]
)
All the variables, all the classes.
In [10]:
explainer.plot_influence(
    X_test=X_test,
    y_pred=y_pred,
    size=(None, 1000)
)
In [11]:
explainer.plot_performance(
    X_test=X_test["sepal length (cm)"],
    y_test=y_test,
    y_pred=y_pred,
    metric=metrics.log_loss,
)
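The metric passed to `plot_performance` here is log loss: the mean negative log-probability that the model assigns to the true class, so lower is better. The sketch below computes it by hand on two made-up predictions. The function name `mean_log_loss` and the clipping bound `eps` are assumptions for illustration; scikit-learn's `log_loss` also clips probabilities, which matters for a nearest neighbors model because it can predict a probability of exactly 0.

```python
import math


def mean_log_loss(y_true, proba_rows, eps=1e-15):
    """Mean negative log-probability of the true class.

    proba_rows holds one dict per sample, mapping label to probability.
    eps is a hypothetical clipping bound that keeps log() finite.
    """
    total = 0.0
    for label, row in zip(y_true, proba_rows):
        p = min(max(row[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


# Made-up predictions in the style of a 5-neighbor classifier.
y_true = ["setosa", "virginica"]
proba = [
    {"setosa": 0.8, "versicolor": 0.2, "virginica": 0.0},
    {"setosa": 0.0, "versicolor": 0.4, "virginica": 0.6},
]

loss = mean_log_loss(y_true, proba)
print(round(loss, 4))

# A confident mistake is penalized heavily: the true class got probability 0.
bad_loss = mean_log_loss(["virginica"], [proba[0]])
print(bad_loss > 30)
```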
In [12]:
explainer.plot_performance(
    X_test=X_test,
    y_test=y_test,
    y_pred=y_pred,
    metric=metrics.log_loss,
)
In [13]:
explainer.plot_influence_2d(
    X_test=X_test[['sepal length (cm)', 'sepal width (cm)']],
    y_pred=y_pred[['virginica']]
)
In [14]:
explainer.plot_influence_2d(
    X_test=X_test[['sepal length (cm)', 'sepal width (cm)']],
    y_pred=y_pred[['versicolor', 'virginica']]
)