How It Works¶
In this notebook, we explain the maths behind ethik and how it is implemented. Reading it is not required to start using the package; for that, please read the Getting started tutorial.
Let's assume you have a dataset and a model that generates predictions on this dataset. For the sake of the example, we'll be working with the Adult dataset.
import ethik
import numpy as np
import pandas as pd
import plotly.graph_objs as go
from sklearn import metrics, model_selection
X, y = ethik.datasets.load_adult()
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, shuffle=True, random_state=42)
X_test.head()
ethik is model-agnostic and works with the data only.
def get_black_box_model_predictions():
import lightgbm as lgb
model = lgb.LGBMClassifier(random_state=42).fit(X_train, y_train)
return model.predict_proba(X_test)[:, 1]
y_pred = get_black_box_model_predictions()
# We use a named pandas series to make plot labels more explicit
y_pred = pd.Series(y_pred, name=">$50k")
y_pred
ethik lets us explore how each feature impacts the model's behavior, either its predictions or its performance.
The Maths¶
Mathematically, we have a dataset of $n$ samples and $p$ features:
$$ \begin{aligned} X &= \{(x_i^1, x_i^2, \ldots, x_i^p) \}_{1 \leq i \leq n} \\ &= \{x_i\}_{1 \leq i \leq n} \end{aligned} $$
For data scientists, $X$ is X_test.
We also have the model's predictions:
$$ \begin{aligned} \hat{Y} &= \{f(x_i)\}_{1 \leq i \leq n} \\ &= \{\hat{y}_i\}_{1 \leq i \leq n} \end{aligned} $$
We potentially also have $Y$, the true output observed in the real world.
The dataset $(X, \hat{Y}, Y)$ can be considered as an empirical probability distribution $Q_n$ (the unobserved true distribution being $Q$). As explained in the paper, the key idea of ethik is to stress this distribution with respect to a property of $X$ in order to reach a target $t$ on this property.
For now, we only handle targets on the mean, i.e. we can stress $Q_n$ so that the mean of $X$ is $\mu_t = (\mu_t^1, \mu_t^2, \ldots, \mu_t^p)$. Then we obtain a new distribution $Q_t$.
But of course, many distributions have the mean $\mu_t$, so the target alone is not sufficient to determine the stress. Because the model was trained on $Q_n$ (we assume that the training set and the test set have the same distribution), we want $Q_t$ to be as close as possible to $Q_n$. We do this by minimizing the Kullback-Leibler divergence between $Q_n$ and $Q_t$.
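In other words, restating the previous paragraph with the notations introduced above, the stressed distribution is the solution of a constrained minimization problem:
$$ Q_t = \underset{Q \,:\, E_Q[X] = \mu_t}{\arg\min} \; KL(Q, Q_n) $$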
Let's take an example. First, let's build a toy dataset with two features $X_{age}$ and $X_{education-num}$:
n = len(X_test)
ds = pd.DataFrame(
[
[X_test["age"].iloc[i], X_test["education-num"].iloc[i], y_pred[i], int(y_test.iloc[i])]
for i in range(n)
],
columns=["age", "education-num", "ŷ", "y"]
)
ds
Let's plot the marginal distribution of $X_{age}$:
def plot_pdf(x, density):
width = x[1:] - x[:-1]
mean = sum(x * density) / sum(density)
return go.Figure().add_bar(
x=x,
y=density,
width=width,
).update_layout(
plot_bgcolor="#FFF",
xaxis=dict(
title="age",
linecolor="black",
),
yaxis=dict(
title="Density",
linecolor="black",
),
shapes=[
go.layout.Shape(
type="line",
x0=mean,
y0=0,
x1=mean,
y1=1,
yref="paper",
line=dict(
color="red",
width=1,
dash="dash",
)
)
],
annotations=[
go.layout.Annotation(
x=mean,
y=1,
xref="x",
yref="paper",
text=f"Mean = {mean:.2f}",
showarrow=False,
yanchor="middle",
bgcolor="#FFF",
bordercolor="red",
borderpad=4,
)
]
)
densities, edges = np.histogram(ds["age"], bins=40, density=True)
x_pdf = edges[:-1]
plot_pdf(x_pdf, densities)
Like with counterfactual explanations, we now ask the question:
What if the mean age were $\mu_t$?
For instance, we can stress $Q_n$ so that the mean age is approximately $\mu_t = 30$:
target = 30
weights = [2 if age <= target else 0.5 for age in ds["age"]]
stressed_densities, _ = np.histogram(ds["age"], bins=40, density=True, weights=weights)
plot_pdf(x_pdf, stressed_densities)
We could sample from this distribution and observe how the model behaves for a mean age of about 30. It would be similar to Partial Dependence Plots.
The problem here is that we only stressed the marginal distribution of $X_{age}$. But features may be correlated, and if we don't take that into account, we are likely to create unrealistic individuals (say, an 18-year-old with a PhD).
To address this issue, the key idea behind the paper is to stress the whole distribution $(X, \hat{Y}, Y)$ while minimizing the distance to the original distribution, measured by the Kullback–Leibler divergence.
Minimizing the Kullback–Leibler divergence between the stressed distribution and the original one gives us some confidence that we are not creating individuals that are too unrealistic. Of course, reaching a minimum doesn't mean that this minimum is actually small, and to certify the model we would need to define a threshold on the distance. That is outside the scope of this notebook, though.
As explained in the paper, we can efficiently compute a unique set of weights $\{\lambda_i^{(t)}\}_{1 \leq i \leq n}$ to stress the probability density function to reach a target on a property while minimizing the KL divergence:
$$ Q_t = \sum_{i=1}^n \lambda_i^{(t)} \delta_{x_i, \hat{y}_i, y_i} $$
with:
- $Q_t$ being the distribution that meets the target $t$ (e.g. $E[X_{age}] = \mu_t$) while minimizing $KL(Q_t, Q_n)$;
- $\delta_{x_i, \hat{y}_i, y_i}$ being the Dirac mass (point mass) at the $i$-th sample $(x_i, \hat{y}_i, y_i)$.
Remember: $Q_t$ includes the input $X$ but also the output $\hat{Y}$ (and potentially the true output $Y$). This means that we can compute what the output would be if $X$ met the target $t$, without having to run the model again. This is a huge performance gain!
Note that we do not change the individuals of the dataset, only their probability density. In other words, to increase the mean of a die, we do not replace the 6 with a 7; we load the die to increase the probability of rolling a 6.
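To make the die analogy concrete, here is a small sketch in plain NumPy/SciPy (this is illustration code, not part of ethik): we look for the exponential tilting of a fair die that moves its mean from 3.5 to 4 by reweighting the faces, never changing their values.
import numpy as np
from scipy import optimize

faces = np.arange(1, 7)  # a fair six-sided die
target_mean = 4.0        # we want the mean to be 4 instead of 3.5

def tilted_mean(xi):
    # Exponentially tilted weights, normalised to sum to 1 (the same form as the
    # lambda_i^(t) weights introduced above, with a uniform base distribution)
    weights = np.exp(xi * faces)
    weights /= weights.sum()
    return (weights * faces).sum()

# Find the xi that reaches the target mean
xi = optimize.brentq(lambda v: tilted_mean(v) - target_mean, -10, 10)
weights = np.exp(xi * faces)
weights /= weights.sum()
weights  # the 6 is now more likely and the 1 less likely, but the faces are unchanged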
The theorem that tells us how to compute these weights works for targets on the mean of $X$. By transforming $X$, we can also reach targets on higher-order moments (e.g. the variance). For instance, the mean of the feature $(X - E[X])^2$ is actually the variance of $X$. In the paper, the transformation of $X$ is called $\Phi(X)$ (e.g. $\Phi : X \mapsto (X - E[X])^2$).
So it enables us to answer questions like: What would the output distribution be if the mean of...
- this feature were $\mu_t$? (we only consider one feature, i.e. its marginal distribution)
- these features were $(\mu_t^1, \mu_t^2)$? (we consider multiple features and so their potential correlations)
- $(X^{j_0} - E[X^{j_0}])^2$ (i.e. the variance) were $\sigma_t^2$? (we consider the variance of a feature instead of its mean)
- ...
And all of this without having to run the model again.
Implementation¶
Since we have a finite number of data samples, we can consider our probability distribution as discrete. Stressing the distribution means applying weights to the density of every individual:
$$ Q_t = \sum_{i=1}^n \lambda_i^{(t)} \delta_{x_i, \hat{y}_i, y_i}. $$
Note that the $x_i$ may have been transformed by $\Phi$, as illustrated above.
As explained in the paper, these weights $\lambda_i^{(t)}$ are derived from a common parameter $\xi_t$:
$$ \begin{aligned} \lambda_i^{(t)} &= \frac{\exp(\langle \xi_t, x_i \rangle)}{\sum_{j=1}^n \exp(\langle \xi_t, x_j \rangle)} \\ &= \text{softmax}(\langle \xi_t, x_i \rangle) \end{aligned} $$
with $\langle \cdot, \cdot \rangle$ being the scalar product. Each $\lambda_i^{(t)}$ is therefore a scalar.
Then the mean of the stressed distribution is given by:
$$ \mu_t = \frac{1}{\sum_{i=1}^n \lambda_i^{(t)}} \sum_{i=1}^n \lambda_i^{(t)} \times x_i $$
To find $\xi_t$ algorithmically, we minimize the function (target_mean - current_mean) ** 2, with current_mean being computed with the previous formulas. The code is in ethik.base_explainer.compute_ksi().
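To make this concrete, here is a minimal single-feature sketch of that minimization in plain NumPy/SciPy. It mimics the idea behind ethik.base_explainer.compute_ksi() but is not the actual implementation (ethik handles several features at once and treats the numerical details differently); the centering and the bounds below are choices made for this example only.
import numpy as np
from scipy import optimize

def compute_weights(ksi, x):
    # lambda_i = softmax(ksi * x_i): exponential tilting of the empirical distribution.
    # Subtracting the mean inside the exponent does not change the weights (the
    # constant factor cancels in the normalisation) but avoids overflows.
    w = np.exp(ksi * (x - x.mean()))
    return w / w.sum()

def stressed_mean(ksi, x):
    # Mean of the feature under the stressed distribution
    return np.sum(compute_weights(ksi, x) * x)

def find_ksi(x, target_mean):
    # Minimize (target_mean - current_mean) ** 2 with respect to ksi;
    # the bounds are ad hoc but wide enough for this example
    loss = lambda ksi: (target_mean - stressed_mean(ksi, x)) ** 2
    return optimize.minimize_scalar(loss, bounds=(-1, 1), method="bounded").x

ages = X_test["age"].to_numpy()
ksi = find_ksi(ages, target_mean=30)
ksi, stressed_mean(ksi, ages)  # the stressed mean should be close to 30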
Explaining a black-box model with another black box wouldn't be very convincing, so let's dive into the machinery.
First, we can visualize the stressed distribution in ethik:
ethik.ClassificationExplainer().plot_distributions(
X_test["age"],
targets=[25, 45]
)
The distribution for a mean age of 25 doesn't look much like the original distribution, which means that some individuals get a lot more weight than in the data. We can visualize how many individuals capture X% of the weight (i.e. X% of the sum of the $\lambda_i^{(t)}$):
ethik.ClassificationExplainer().plot_cumulative_weights(
X_test["age"],
targets=[25, 45]
)
To plot this chart, we:
- Compute the weights $\lambda_i^{(t)}$;
- Sort them in decreasing order (highest first);
- Plot their cumulative sum.
A straight line would mean that the weight is uniformly distributed across the individuals. Here, we can see that for a mean of 25, 50% of the weight is captured by 14% of the individuals. Whether this is too small or not is outside the scope of this notebook.
Note that we only stress one feature here, so the potential correlations with other features are not taken into account. The algorithm just makes sure that the stressed (marginal) distribution is the one closest to the original one (in terms of KL divergence).
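We can reproduce the idea behind this chart by hand with the helper functions from the earlier sketch (again, illustration code, so the exact number may differ slightly from what ethik plots):
ages = X_test["age"].to_numpy()
lambdas = compute_weights(find_ksi(ages, target_mean=25), ages)

# Sort the weights in decreasing order and accumulate them
cumulated = np.cumsum(np.sort(lambdas)[::-1])

# Smallest fraction of individuals that captures at least 50% of the total weight
n_half = np.searchsorted(cumulated, 0.5) + 1
print(f"{n_half / len(ages):.0%} of the individuals capture 50% of the weight")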
Data structures¶
To explain how we manipulate the data, we'll use the internal class BaseExplainer:
base_explainer = ethik.base_explainer.BaseExplainer()
First, we need to build a query, that is, a pandas dataframe that contains at least four columns to answer the question: What would the output label be if this feature were equal to target?
- feature: It must match the name of a column in the dataset.
- target: This value must satisfy feature.min() < target < feature.max().
- label: This value is used for multi-class classification problems. For binary classification and regression, it must just match the name of the pandas series containing the predictions.
- group: A key to determine the targets that must be reached together. For instance, if the rows ("age", 30, "output") and ("education-num", 10, "output") have the same group, it means that we will consider the joint distribution $(X_{age}, X_{education-num})$ and try to reach the mean $(30, 10)$. This differs from considering the marginal distributions $X_{age}$ and $X_{education-num}$ independently.
For instance:
query = pd.DataFrame({
"group": [0, 1],
"feature": ["age", "age"],
"target": [30, 45],
"label": [y_pred.name, y_pred.name],
})
query
To reach multiple targets at the same time, we specify a common group. Of course, a group cannot contain more than one row per feature, and groups don't all need to have the same number of features.
pd.DataFrame({
"group": [0, 0, 1],
"feature": ["age", "education-num", "age"],
"target": [30, 10, 45],
"label": [y_pred.name, y_pred.name, y_pred.name],
})
To compute the weights used for the stress, the method _fill_ksis() is called internally:
base_explainer._fill_ksis(X_test, query)
_fill_ksis() alters its input:
query
In practice, we directly call an _explain_*() method. Let's initialize the query again so that we are in that situation:
query = pd.DataFrame({
"group": [0, 1],
"feature": ["age", "age"],
"target": [30, 45],
"label": [y_pred.name, y_pred.name],
})
query
base_explainer._explain_influence(
X_test=X_test,
y_pred=y_pred,
query=query
)
The original query has not been altered:
query
As seen before, ksi is the parameter used to compute the weights. influence is the mean of the output. We can read the dataframe above this way: when the mean age is 30, the average probability of earning more than $50k a year is about 15%.
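Concretely, the influence is the $\lambda_i^{(t)}$-weighted mean of the predictions. We can check this by hand with the helpers from the earlier sketch (the value may differ marginally from ethik's, which uses its own optimisation routine):
ages = X_test["age"].to_numpy()
lambdas = compute_weights(find_ksi(ages, target_mean=30), ages)

# Average predicted probability of earning more than $50k when the mean age is stressed to 30
np.average(y_pred, weights=lambdas)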
influence_low and influence_high are for the confidence interval. As we didn't specify n_samples when we created the explainer, their value is equal to the mean. Otherwise, we follow this procedure (sketched in code after the list):
- Sub-sample the dataset (e.g. take 80% of the samples)
- Add the smallest and the highest values of the dataset (so that we can reach extreme targets)
- Compute the feature influence
- Do that n times
- Get the mean of the results and the 5% and 95% quantiles (defined by explainer.conf_level) for the confidence interval
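Here is a rough sketch of that procedure for the target of 30, still using the helpers from the earlier sketch (the actual logic lives in ethik and may differ in its details, e.g. the sub-sampling fraction):
rng = np.random.RandomState(42)
ages = X_test["age"].to_numpy()
preds = y_pred.to_numpy()

influences = []
for _ in range(10):  # n_samples = 10
    # Sub-sample 80% of the individuals...
    idx = rng.choice(len(ages), size=int(0.8 * len(ages)), replace=False)
    # ...and always keep the extreme individuals so that extreme targets stay reachable
    idx = np.union1d(idx, [ages.argmin(), ages.argmax()])
    lambdas = compute_weights(find_ksi(ages[idx], target_mean=30), ages[idx])
    influences.append(np.average(preds[idx], weights=lambdas))

# Mean influence and the 5%-95% confidence interval
np.mean(influences), np.quantile(influences, [0.05, 0.95])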
ethik.base_explainer.BaseExplainer(n_samples=10)._explain_influence(
X_test=X_test,
y_pred=y_pred,
query=query
)
We can also ask the question: What is the average performance when the mean age is 30 or 45?
base_explainer._explain_performance(
X_test=X_test,
y_test=y_test,
y_pred=y_pred > 0.5,
metric=metrics.accuracy_score,
query=query
)
Query building¶
In the public API, we expose classes that inherit from CacheExplainer (which itself inherits from BaseExplainer). This class does three things:
- It builds the query automatically using ethik.query.Query();
- It calls the methods of BaseExplainer to explain the model;
- It stores the results in an attribute (we'll see that in detail later).
To build the query, it uses the quantiles:
ethik.query.Query.from_taus(
X_test=X_test[["age"]],
labels=[y_pred.name],
n_taus=7,
q=[0.05, 0.95],
)
It doesn't make sense to target a mean that is too close to the bounds of the feature, because it would lead to a distribution whose density is concentrated on the few smallest/largest individuals only, which is quite different from the original distribution the model was trained on.
To avoid that, we keep the targets between bounds that are controlled by the alpha parameter. By default, alpha = 0.05, which means that the targets are between the 5% and the 95% quantiles.
In the query above, tau == 0 represents the original mean, tau == -1 the 5% quantile and tau == 1 the 95% quantile. As the data may not be evenly distributed, a regular step for tau doesn't mean a regular step for target.
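A plausible reconstruction of how the taus could be mapped to targets, consistent with the description above (the actual mapping is implemented in ethik.query.Query.from_taus(), so treat this as a hypothesis, not as the library's code):
ages = X_test["age"]
mean, q_low, q_high = ages.mean(), ages.quantile(0.05), ages.quantile(0.95)

taus = np.linspace(-1, 1, 7)  # n_taus = 7
# tau == -1 maps to the 5% quantile, tau == 0 to the mean, tau == 1 to the 95%
# quantile; in between we interpolate linearly on each side of the mean
targets = [
    mean + tau * (q_high - mean) if tau >= 0 else mean + tau * (mean - q_low)
    for tau in taus
]
list(zip(taus, targets))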
The group id is built using a hash to easily retrieve an existing record (see below).
The query is built internally when we call explain_*():
cache_explainer = ethik.cache_explainer.CacheExplainer()
cache_explainer.explain_influence(
X_test=X_test["age"],
y_pred=y_pred,
).head()
Caching¶
CacheExplainer stores the results in its .info attribute:
cache_explainer.explain_influence(
X_test=X_test["age"],
y_pred=y_pred,
)
cache_explainer.info.head()
By default, this dataframe is reset at each call, as this is the least magical behaviour:
cache_explainer.explain_influence(
X_test=X_test["education-num"],
y_pred=y_pred,
)
cache_explainer.info.head()
But we can instantiate the explainer with memoize=True:
cache_explainer = ethik.cache_explainer.CacheExplainer(memoize=True)
cache_explainer.explain_influence(
X_test=X_test["age"],
y_pred=y_pred,
)
cache_explainer.info.head()
cache_explainer.explain_influence(
X_test=X_test["education-num"],
y_pred=y_pred,
)
cache_explainer.info
This way, we don't have to compute the ksis twice when we call explain_influence() and explain_performance():
cache_explainer.explain_performance(
X_test=X_test["age"],
y_test=y_test,
y_pred=y_pred > 0.5,
metric=metrics.accuracy_score,
)
cache_explainer.info
What characterizes $\xi$ is the list of (feature, target) pairs. We hash it to get the group id, so that we can rapidly identify the parts of the query that have already been computed during a previous call.
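The idea can be illustrated with a short, hypothetical sketch (this is not ethik's actual hashing code, just the principle):
import hashlib

def group_id(pairs):
    # Hash the sorted (feature, target) pairs into a stable key: two calls that
    # stress the same features towards the same targets share the same group id
    key = ",".join(f"{feature}={target}" for feature, target in sorted(pairs))
    return hashlib.sha256(key.encode()).hexdigest()[:16]

group_id([("age", 30)]), group_id([("age", 30), ("education-num", 10)])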