Ethik AI

How It Works

In this notebook, we explain the maths behind ethik and how it is implemented. Reading it is not required to start using the package; to do so, please read the Getting started tutorial.

Let's assume you have a dataset and a model that generates predictions on this dataset. For the sake of the example, we'll be working with the Adult dataset.

In [1]:
import ethik
import numpy as np
import pandas as pd
import plotly.graph_objs as go
from sklearn import metrics, model_selection

X, y = ethik.datasets.load_adult()
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, shuffle=True, random_state=42)

X_test.head()
Out[1]:
age workclass fnlwgt education education-num marital-status occupation relationship race gender capital-gain capital-loss hours-per-week native-country
14160 27 Private 160178 Some-college 10 Divorced Adm-clerical Not-in-family White Female 0 0 38 United-States
27048 45 State-gov 50567 HS-grad 9 Married-civ-spouse Exec-managerial Wife White Female 0 0 40 United-States
28868 29 Private 185908 Bachelors 13 Married-civ-spouse Exec-managerial Husband Black Male 0 0 55 United-States
5667 30 Private 190040 Bachelors 13 Never-married Machine-op-inspct Not-in-family White Female 0 0 40 United-States
7827 29 Self-emp-not-inc 189346 Some-college 10 Divorced Craft-repair Not-in-family White Male 2202 0 50 United-States

ethik is model-agnostic and works with the data only.

In [2]:
def get_black_box_model_predictions():
    import lightgbm as lgb
    model = lgb.LGBMClassifier(random_state=42).fit(X_train, y_train)
    return model.predict_proba(X_test)[:, 1]

y_pred = get_black_box_model_predictions()
# We use a named pandas series to make plot labels more explicit
y_pred = pd.Series(y_pred, name=">$50k")
y_pred
Out[2]:
0       0.009978
1       0.366950
2       0.749978
3       0.015823
4       0.015583
          ...   
8136    0.004573
8137    0.113812
8138    0.001476
8139    0.707632
8140    0.261709
Name: >$50k, Length: 8141, dtype: float64

ethik lets us explore how each feature impacts the model's behaviour, be it its predictions or its performance.

The Maths

Mathematically, we have a dataset of $n$ samples and $p$ features:

$$ \begin{aligned} X &= \{(x_i^1, x_i^2, \ldots, x_i^p) \}_{1 \leq i \leq n} \\ &= \{x_i\}_{1 \leq i \leq n} \end{aligned} $$

For data scientists, $X$ is X_test.

We also have the model's predictions:

$$ \begin{aligned} \hat{Y} &= \{f(x_i)\}_{1 \leq i \leq n} \\ &= \{\hat{y}_i\}_{1 \leq i \leq n} \end{aligned} $$

We potentially also have $Y$, the true output observed from the real world.

The dataset $(X, \hat{Y}, Y)$ can be considered as an empirical probability distribution $Q_n$ (the unobserved true distribution being $Q$). As explained in the paper, the key idea of ethik is to stress this distribution with respect to a property of $X$ to reach a target $t$ on this property.

For now, we only handle targets on the mean, i.e. we can stress $Q_n$ so that the mean of $X$ is $\mu_t = (\mu_t^1, \mu_t^2, \ldots, \mu_t^p)$. Then we obtain a new distribution $Q_t$.

But of course, many distributions have mean $\mu_t$, so the target alone is not sufficient to determine the stress. Because the model was trained on $Q_n$ (we assume that the training set and the test set follow the same distribution), we want $Q_t$ to stay as close as possible to $Q_n$. We achieve this by minimizing the Kullback-Leibler divergence between $Q_n$ and $Q_t$.
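
In other words, using the notation above, the stressed distribution solves a constrained minimization problem:

$$ Q_t = \underset{Q \,:\, E_Q[X] = \mu_t}{\arg\min} \; KL(Q, Q_n) $$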

Let's take an example. First, let's build a toy dataset with two features $X_{age}$ and $X_{education-num}$:

In [3]:
n = len(X_test) 
ds = pd.DataFrame(
    [
        [X_test["age"].iloc[i], X_test["education-num"].iloc[i], y_pred[i], int(y_test.iloc[i])]
        for i in range(n)
    ],
    columns=["age", "education-num", "ŷ", "y"]
)
ds
Out[3]:
age education-num ŷ y
0 27 10 0.009978 0
1 45 9 0.366950 0
2 29 13 0.749978 1
3 30 13 0.015823 0
4 29 10 0.015583 0
... ... ... ... ...
8136 35 9 0.004573 0
8137 60 3 0.113812 1
8138 25 9 0.001476 0
8139 50 13 0.707632 1
8140 24 4 0.261709 1

8141 rows × 4 columns

Let's plot the marginal distribution of $X_{age}$:

In [4]:
def plot_pdf(x, density):
    width = x[1:] - x[:-1]
    mean = sum(x * density) / sum(density)
    return go.Figure().add_bar(
        x=x,
        y=density,
        width=width,
    ).update_layout(
        plot_bgcolor="#FFF",
        xaxis=dict(
            title="age",
            linecolor="black",
        ),
        yaxis=dict(
            title="Density",
            linecolor="black",
        ),
        shapes=[
            go.layout.Shape(
                type="line",
                x0=mean,
                y0=0,
                x1=mean,
                y1=1,
                yref="paper",
                line=dict(
                    color="red",
                    width=1,
                    dash="dash",
                )
            )
        ],
        annotations=[
            go.layout.Annotation(
                x=mean,
                y=1,
                xref="x",
                yref="paper",
                text=f"Mean = {mean:.2f}",
                showarrow=False,
                yanchor="middle",
                bgcolor="#FFF",
                bordercolor="red",
                borderpad=4,
            )
        ]
    )
In [5]:
densities, edges = np.histogram(ds["age"], bins=40, density=True)
x_pdf = edges[:-1]
plot_pdf(x_pdf, densities)

Like with counterfactual explanations, we now ask the question:

What if the mean age was $\mu_t$?

For instance, we can stress $Q_n$ so that $\mu_t \approx 30$:

In [6]:
target = 30
weights = [2 if age <= target else 0.5 for age in ds["age"]]
stressed_densities, _ = np.histogram(ds["age"], bins=40, density=True, weights=weights)
plot_pdf(x_pdf, stressed_densities)

We could sample from this distribution and observe how the model behaves for this new mean age. This would be similar to Partial Dependence Plots.

The problem here is that we only stressed the marginal distribution of $X_{age}$. But features may be correlated, and if we don't take that into account, we are likely to create unrealistic individuals (e.g. an 18-year-old with a PhD).

To address this issue, the key idea behind the paper is to stress the whole distribution $(X, \hat{Y}, Y)$ while minimizing the distance to the original distribution. The Kullback–Leibler divergence is used.

Minimizing the Kullback–Leibler divergence between the stressed distribution and the original one gives us confidence that we are not creating overly unrealistic individuals. Of course, reaching a minimum doesn't mean that this minimum is actually small: to certify the model, we would need to define a threshold on the divergence. This is beyond the scope of this notebook though.

As explained in the paper, we can efficiently compute a unique set of weights $\{\lambda_i^{(t)}\}_{1 \leq i \leq n}$ to stress the probability density function to reach a target on a property while minimizing the KL divergence:

$$ Q_t = \sum_{i=1}^n \lambda_i^{(t)} \delta_{x_i, \hat{y}_i, y_i}. $$

with:

  • $Q_t$ being the distribution meeting the target $t$ (e.g. $E[X_{age}] = \mu_t$) and minimizing $KL(Q_t, Q_n)$ ;
  • $\delta_{x_i, \hat{y}_i, y_i}$ being the probability density of the $i$-th sample.

Remember: $Q_t$ includes the input $X$ but also the output $\hat{Y}$ (and potentially the true output $Y$). It means that we can compute what the output would be if $X$ met the target $t$ without having to run the model again. This is a huge gain in performance!
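
Indeed, since the predictions are part of the stressed distribution, the stressed mean prediction is just a weighted average of the predictions we already have:

$$ E_{Q_t}[\hat{Y}] = \sum_{i=1}^n \lambda_i^{(t)} \hat{y}_i $$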

Note that we do not change the individuals in the dataset, only their probability density. In other words, to increase the mean of a die, we do not replace the 6 with a 7: we load the die to make rolling a 6 more likely.
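
To make the die analogy concrete, here is a tiny standalone sketch (not part of ethik) that tilts the probabilities of a fair die exponentially, which is exactly the shape the weights $\lambda_i^{(t)}$ take below:

import numpy as np

faces = np.arange(1, 7)          # the faces stay 1, 2, ..., 6
fair = np.full(6, 1 / 6)         # original, uniform probabilities
print(np.sum(fair * faces))      # 3.5

# Tilt the probabilities exponentially and renormalise; a positive xi
# pushes the expected value upwards without ever changing the faces.
xi = 0.2
loaded = fair * np.exp(xi * faces)
loaded /= loaded.sum()
print(np.sum(loaded * faces))    # ≈ 4.07: a loaded die, same faces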

The theorem that tells us how to compute these weights works for targets on the mean of $X$. By transforming $X$, we can also reach targets on higher-order moments (e.g. the variance). For instance, the mean of the feature $(X - E[X])^2$ is actually the variance of $X$. In the paper, the transformation of $X$ is called $\Phi(X)$ (e.g. $\Phi : X \mapsto (X - E[X])^2$).
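
As a minimal illustration of such a transformation (not a call to ethik), here is what $\Phi$ looks like on our toy dataset when we want to target the variance of the age:

import numpy as np

# Phi maps the age to its squared deviation from its mean, so a target on
# the mean of phi_age is a target on the (population) variance of the age.
phi_age = (ds["age"] - ds["age"].mean()) ** 2
print(phi_age.mean(), ds["age"].var(ddof=0))  # the two values coincide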

So it enables us to answer questions like: What would the output distribution be if the mean of...

  • this feature were $\mu_t$? (we only consider one feature, i.e. its marginal distribution)
  • these features were $(\mu_t^1, \mu_t^2)$? (we consider multiple features and so their potential correlations)
  • $(X^{j_0} - E[X^{j_0}])^2$ (i.e. the variance) were $\sigma_t^2$? (we consider the variance of a feature instead of its mean)
  • ...

And all of this without having to run the model again.

Implementation

Since we have a finite number of data samples, we can consider our probability distribution as discrete. Stressing the distribution means applying weights to the density of every individual:

$$ Q_t = \sum_{i=1}^n \lambda_i^{(t)} \delta_{x_i, \hat{y}_i, y_i}. $$

Let's notice that the $x_i$ may have been transformed by $\Phi$, as illustrated above.

As explained in the paper, these weights $\lambda_i^{(t)}$ are derived from a common parameter $\xi_t$:

$$ \begin{aligned} \lambda_i^{(t)} &= \frac{\exp(\langle \xi_t, x_i \rangle)}{\sum_{j=1}^n \exp(\langle \xi_t, x_j \rangle)} \\ &= \text{softmax}(\langle \xi_t, x_i \rangle) \end{aligned} $$

with $\langle x, y \rangle$ being the scalar product. $\lambda_i^{(t)}$ is then a scalar.

Then the mean of the stressed distribution is given by:

$$ \mu_t = \frac{1}{\sum_{i=1}^n \lambda_i^{(t)}} \sum_{i=1}^n \lambda_i^{(t)} \times x_i $$

To find $\xi_t$ algorithmically, we minimize the function (target_mean - current_mean) ** 2, with current_mean being computed with the previous formulas. The code is in ethik.base_explainer.compute_ksi().
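
To make this concrete, here is a minimal one-dimensional sketch of that optimisation. The helpers compute_lambdas, stressed_mean and compute_ksi_1d are ours, not ethik's; the real compute_ksi() handles several features at once and additional implementation details, so the ksi values it reports won't necessarily match this sketch:

import numpy as np
from scipy import optimize

def compute_lambdas(x, ksi):
    """Softmax weights lambda_i for a single feature and a scalar ksi."""
    logits = ksi * x
    logits = logits - logits.max()          # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def stressed_mean(x, ksi):
    return np.sum(compute_lambdas(x, ksi) * x)

def compute_ksi_1d(x, target):
    """Find the ksi whose lambda-weighted mean of x reaches the target."""
    loss = lambda ksi: (stressed_mean(x, ksi) - target) ** 2
    return optimize.minimize_scalar(loss).x

x = X_test["age"].to_numpy(dtype=float)
ksi = compute_ksi_1d(x, target=30)
lambdas = compute_lambdas(x, ksi)

print(stressed_mean(x, ksi))                 # ≈ 30, the requested mean age
print(np.sum(lambdas * y_pred.to_numpy()))   # stressed mean prediction, no model call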

Explaining a black-box model with another black box wouldn't be very satisfying, so let's dive into the machinery:

First, we can visualize the stressed distribution in ethik:

In [7]:
ethik.ClassificationExplainer().plot_distributions(
    X_test["age"],
    targets=[25, 45]
)

The distribution for a mean age of 25 doesn't look much like the original distribution. It means that some individuals get a lot more weight than in the original data. We can visualize how many individuals capture X% of the total weight (i.e. of the sum of the $\lambda_i^{(t)}$):

In [8]:
ethik.ClassificationExplainer().plot_cumulative_weights(
    X_test["age"],
    targets=[25, 45]
)

To plot this chart, we:

  • Compute the weights $\lambda_i^{(t)}$ ;
  • Sort them decreasingly (the highest first) ;
  • Plot them cumulatively.

A straight line means that the weight is uniformly distributed across the individuals. Here, we can see that for a mean of 25, 50% of the weight is captured by 14% of the individuals. Whether this is too small or not is beyond the scope of this notebook.
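
For instance, reusing the hypothetical compute_ksi_1d and compute_lambdas helpers sketched earlier (not ethik's internals), we can estimate that figure directly:

import numpy as np

x = X_test["age"].to_numpy(dtype=float)
lambdas = compute_lambdas(x, compute_ksi_1d(x, target=25))

sorted_weights = np.sort(lambdas)[::-1]       # highest weights first
cumulated = np.cumsum(sorted_weights)         # cumulative share of the total weight
n_half = np.searchsorted(cumulated, 0.5) + 1  # individuals needed to reach 50%
print(f"{n_half / len(x):.0%} of the individuals capture 50% of the weight")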

Note that we only stress one feature here, so the potential correlations with other features are not considered. The algorithm just makes sure that the stressed (marginal) distribution is the closest possible to the original one (according to the KL divergence).

Data structures

To explain how we manipulate the data, we'll use the internal class BaseExplainer:

In [9]:
base_explainer = ethik.base_explainer.BaseExplainer()

First, we need to build a query, that is a pandas dataframe with at least four columns, to answer the question: What would the output be if this feature were equal to target?

  • feature: It must match the name of a column in the dataset.
  • target: This value must satisfy feature.min() < target < feature.max().
  • label: This value is used for multi-class classification problems. For binary classification and regression, it must just match the name of the pandas series containing the predictions.
  • group: A key to determine the targets that must be reached together. For instance, if the rows ("age", 30, "output") and ("education-num", 10, "output") have the same group, it means that we will consider the distribution $(X_{age}, X_{education-num})$ and try to reach the mean $(30, 10)$. It differs from considering the marginal distributions $X_{age}$ and $X_{education-num}$ independently.

For instance:

In [10]:
query = pd.DataFrame({
    "group": [0, 1],
    "feature": ["age", "age"],
    "target": [30, 45],
    "label": [y_pred.name, y_pred.name],
})
query
Out[10]:
group feature target label
0 0 age 30 >$50k
1 1 age 45 >$50k

To reach multiple targets at the same time, we specify a common group. Of course, a group cannot contain more than one row per feature. Groups don't all need to have the same number of features.

In [11]:
pd.DataFrame({
    "group": [0, 0, 1],
    "feature": ["age", "education-num", "age"],
    "target": [30, 10, 45],
    "label": [y_pred.name, y_pred.name, y_pred.name],
})
Out[11]:
group feature target label
0 0 age 30 >$50k
1 0 education-num 10 >$50k
2 1 age 45 >$50k

To compute the weights used for the stress, the method _fill_ksis() is called internally:

In [12]:
base_explainer._fill_ksis(X_test, query)
Out[12]:
group feature target label ksi converged
0 0 age 30 >$50k -0.802128 True
1 1 age 45 >$50k 0.421562 True

Note that _fill_ksis() modifies its input in place:

In [13]:
query
Out[13]:
group feature target label ksi converged
0 0 age 30 >$50k -0.802128 True
1 1 age 45 >$50k 0.421562 True

In practice, we call an _explain_*() method directly. Let's rebuild the query to start from a clean state:

In [14]:
query = pd.DataFrame({
    "group": [0, 1],
    "feature": ["age", "age"],
    "target": [30, 45],
    "label": [y_pred.name, y_pred.name],
})
query
Out[14]:
group feature target label
0 0 age 30 >$50k
1 1 age 45 >$50k
In [15]:
base_explainer._explain_influence(
    X_test=X_test,
    y_pred=y_pred,
    query=query
)
100%|██████████| 2/2 [00:00<00:00, 493.71it/s]
Out[15]:
group feature target label ksi converged influence influence_low influence_high
0 0 age 30 >$50k -0.802128 True 0.156249 0.156249 0.156249
1 1 age 45 >$50k 0.421562 True 0.272460 0.272460 0.272460

The original query has not been altered:

In [16]:
query
Out[16]:
group feature target label
0 0 age 30 >$50k
1 1 age 45 >$50k

As seen before, ksi is the parameter used to compute the weights. influence is the mean of the output under the stressed distribution. We can read the dataframe above this way: when the mean age is 30, the average predicted probability of earning more than $50k a year is about 15%.

influence_low and influence_high delimit the confidence interval. As we didn't specify n_samples when we created the explainer, they are simply equal to the influence. Otherwise, we follow this procedure:

  1. Sub-sample the dataset (e.g. take 80% of the samples)
  2. Add the smallest and the highest values of the dataset (so that we can reach extreme targets)
  3. Compute the feature influence
  4. Do that n times
  5. Get the mean of the results and the 5% and 95% quantiles (defined by explainer.conf_level) for the confidence interval
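
Below is a rough sketch of that procedure with our hypothetical 1-D helpers from earlier; the 80% subsample fraction and the 30 repetitions are illustrative choices, not necessarily ethik's defaults:

import numpy as np

rng = np.random.default_rng(42)
x_all = X_test["age"].to_numpy(dtype=float)
y_all = y_pred.to_numpy()
extremes = np.array([x_all.argmin(), x_all.argmax()])

influences = []
for _ in range(30):
    idx = rng.choice(len(x_all), size=int(0.8 * len(x_all)), replace=False)
    idx = np.union1d(idx, extremes)   # keep the extreme values reachable
    lam = compute_lambdas(x_all[idx], compute_ksi_1d(x_all[idx], target=30))
    influences.append(np.sum(lam * y_all[idx]))

low, high = np.quantile(influences, [0.05, 0.95])
print(np.mean(influences), low, high)  # influence and its confidence interval
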
In [17]:
ethik.base_explainer.BaseExplainer(n_samples=10)._explain_influence(
    X_test=X_test,
    y_pred=y_pred,
    query=query
)
100%|██████████| 20/20 [00:00<00:00, 583.16it/s]
Out[17]:
group feature target label ksi converged influence influence_low influence_high
0 0 age 30 >$50k -0.802128 True 0.156900 0.155986 0.157846
1 1 age 45 >$50k 0.421562 True 0.272785 0.270981 0.274631

We can also ask the question: What is the average performance when the mean age is 30 or 45?

In [18]:
base_explainer._explain_performance(
    X_test=X_test,
    y_test=y_test,
    y_pred=y_pred > 0.5,
    metric=metrics.accuracy_score,
    query=query
)
100%|██████████| 2/2 [00:00<00:00, 311.43it/s]
Out[18]:
group feature target label ksi converged accuracy_score accuracy_score_low accuracy_score_high
0 0 age 30 >$50k -0.802128 True 0.914410 0.914410 0.914410
1 1 age 45 >$50k 0.421562 True 0.865622 0.865622 0.865622

Query building

In the public API, we expose classes that inherit from CacheExplainer (which itself inherits from BaseExplainer). This class does three things:

  • It builds the query automatically using ethik.query.Query() ;
  • It calls the methods of BaseExplainer to explain the model ;
  • It stores the results in an attribute (we'll see that in detail later).

To build the query, it uses quantiles of the feature:

In [19]:
ethik.query.Query.from_taus(
    X_test=X_test[["age"]],
    labels=[y_pred.name],
    n_taus=7,
    q=[0.05, 0.95],
)
Out[19]:
group tau target feature label
0 e63f2712ec1d2eb591f5ba9d2a34529158a33d62933794... -1.000000e+00 19.000000 age >$50k
1 d42fc5d10ea54079f10475004ed37a3f314106c142ce12... -6.666667e-01 25.510748 age >$50k
2 3fcf77fd897e88b9d68b22be7e6a853b1f1685d9a6d2d7... -3.333333e-01 32.021496 age >$50k
3 62e417a8c5aab9625b453dbd2e64a570ff95960f3274c4... -1.000000e-16 38.532244 age >$50k
4 7bfbc5d74540bbbf35b94a8de88f3540fac9e2ef458ea8... 3.333333e-01 46.688163 age >$50k
5 cd93f2c34b36a40ed7df22adcf93136c7eed5309020ec4... 6.666667e-01 54.844081 age >$50k
6 7d12ffc8eecaa90ca5dc2820861acace26776f8281b312... 1.000000e+00 63.000000 age >$50k

It doesn't make sense to target a mean that is too close to the bounds of the feature: it would lead to a distribution whose density is concentrated on the few smallest/largest individuals only, which is quite different from the original distribution the model was trained on.

To avoid that, we keep the targets between bounds that are controlled by the alpha parameter. By default, alpha = 0.05, which means that the targets are between the 5% and the 95% quantiles.

In the query above, tau == 0 represents the original mean, tau == -1 the 5% quantile and tau == 1 the 95% quantile. As the data may not be evenly distributed, a regular step for tau doesn't mean a regular step for target.
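
Judging from the table above, the mapping from tau to target appears to interpolate linearly between the original mean and the corresponding quantile. Here is a sketch that reproduces the target column under that assumption:

import numpy as np

age = X_test["age"]
mean, q_low, q_high = age.mean(), age.quantile(0.05), age.quantile(0.95)

taus = np.linspace(-1, 1, 7)
targets = np.where(
    taus < 0,
    mean + taus * (mean - q_low),    # tau = -1 lands on the 5% quantile
    mean + taus * (q_high - mean),   # tau = +1 lands on the 95% quantile
)
print(targets)  # should match the "target" column above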

The group id is built using a hash to easily retrieve an existing record (see below).

The query is built internally when we call explain_*():

In [20]:
cache_explainer = ethik.cache_explainer.CacheExplainer()
cache_explainer.explain_influence(
    X_test=X_test["age"],
    y_pred=y_pred,
).head()
100%|██████████| 41/41 [00:00<00:00, 567.71it/s]
Out[20]:
group feature tau target ksi label converged influence influence_low influence_high
0 e63f2712ec1d2eb591f5ba9d2a34529158a33d62933794... age -1.00 19.000000 -6.615004 >$50k True 0.006516 0.006516 0.006516
1 c8488b853b24b27e293ff5b81e013c7329580c5834e163... age -0.95 19.976612 -4.691385 >$50k True 0.015121 0.015121 0.015121
2 11a01c7f5952e2a521d2f81e93453b59ee4920fc6fe1b6... age -0.90 20.953224 -3.642715 >$50k True 0.026264 0.026264 0.026264
3 8fdd3d71ff57584c6e5f8372de8126eab403d89c10a533... age -0.85 21.929837 -2.891141 >$50k True 0.040686 0.040686 0.040686
4 4624202d8cbbf4b54e59bf68e76edd1baf9cf310a3ca1c... age -0.80 22.906449 -2.413349 >$50k True 0.054703 0.054703 0.054703

Caching

CacheExplainer stores the results in its .info attribute:

In [21]:
cache_explainer.explain_influence(
    X_test=X_test["age"],
    y_pred=y_pred,
)
cache_explainer.info.head()
100%|██████████| 41/41 [00:00<00:00, 565.70it/s]
Out[21]:
group feature tau target ksi label converged influence influence_low influence_high
0 e63f2712ec1d2eb591f5ba9d2a34529158a33d62933794... age -1.00 19.000000 -6.615004 >$50k True 0.006516 0.006516 0.006516
1 c8488b853b24b27e293ff5b81e013c7329580c5834e163... age -0.95 19.976612 -4.691385 >$50k True 0.015121 0.015121 0.015121
2 11a01c7f5952e2a521d2f81e93453b59ee4920fc6fe1b6... age -0.90 20.953224 -3.642715 >$50k True 0.026264 0.026264 0.026264
3 8fdd3d71ff57584c6e5f8372de8126eab403d89c10a533... age -0.85 21.929837 -2.891141 >$50k True 0.040686 0.040686 0.040686
4 4624202d8cbbf4b54e59bf68e76edd1baf9cf310a3ca1c... age -0.80 22.906449 -2.413349 >$50k True 0.054703 0.054703 0.054703

By default, this dataframe is reset at each call, as this is the least magical behaviour:

In [22]:
cache_explainer.explain_influence(
    X_test=X_test["education-num"],
    y_pred=y_pred,
)
cache_explainer.info.head()
100%|██████████| 41/41 [00:00<00:00, 598.48it/s]
Out[22]:
group feature tau target ksi label converged influence influence_low influence_high
0 348f18bb67a8e8c11ad895a5ee5f7c6d61a10eda55a548... education-num -1.00 5.000000 -1.544333 >$50k True 0.094024 0.094024 0.094024
1 c5c37148475737b3f8e58a56a7dd1cae04f90076c6eb07... education-num -0.95 5.252709 -1.464212 >$50k True 0.097783 0.097783 0.097783
2 e19ad8813037d438218f8e4d785a08e73a899422bab9df... education-num -0.90 5.505417 -1.387641 >$50k True 0.101815 0.101815 0.101815
3 f860fd046939b7724611c2f1cc3daa024b9c6a55c80ca1... education-num -0.85 5.758126 -1.314047 >$50k True 0.106100 0.106100 0.106100
4 2e4f9142d4862ff90f4ee892dad59ea9dd95f55b32fb99... education-num -0.80 6.010834 -1.242817 >$50k True 0.110628 0.110628 0.110628

But we can instantiate the explainer with memoize=True:

In [23]:
cache_explainer = ethik.cache_explainer.CacheExplainer(memoize=True)
In [24]:
cache_explainer.explain_influence(
    X_test=X_test["age"],
    y_pred=y_pred,
)
cache_explainer.info.head()
100%|██████████| 41/41 [00:00<00:00, 592.71it/s]
Out[24]:
group feature tau target ksi label converged influence influence_low influence_high
0 e63f2712ec1d2eb591f5ba9d2a34529158a33d62933794... age -1.00 19.000000 -6.615004 >$50k True 0.006516 0.006516 0.006516
1 c8488b853b24b27e293ff5b81e013c7329580c5834e163... age -0.95 19.976612 -4.691385 >$50k True 0.015121 0.015121 0.015121
2 11a01c7f5952e2a521d2f81e93453b59ee4920fc6fe1b6... age -0.90 20.953224 -3.642715 >$50k True 0.026264 0.026264 0.026264
3 8fdd3d71ff57584c6e5f8372de8126eab403d89c10a533... age -0.85 21.929837 -2.891141 >$50k True 0.040686 0.040686 0.040686
4 4624202d8cbbf4b54e59bf68e76edd1baf9cf310a3ca1c... age -0.80 22.906449 -2.413349 >$50k True 0.054703 0.054703 0.054703
In [25]:
cache_explainer.explain_influence(
    X_test=X_test["education-num"],
    y_pred=y_pred,
)
cache_explainer.info
100%|██████████| 41/41 [00:00<00:00, 595.64it/s]
Out[25]:
group feature tau target ksi label converged influence influence_low influence_high
0 e63f2712ec1d2eb591f5ba9d2a34529158a33d62933794... age -1.00 19.000000 -6.615004 >$50k True 0.006516 0.006516 0.006516
1 c8488b853b24b27e293ff5b81e013c7329580c5834e163... age -0.95 19.976612 -4.691385 >$50k True 0.015121 0.015121 0.015121
2 11a01c7f5952e2a521d2f81e93453b59ee4920fc6fe1b6... age -0.90 20.953224 -3.642715 >$50k True 0.026264 0.026264 0.026264
3 8fdd3d71ff57584c6e5f8372de8126eab403d89c10a533... age -0.85 21.929837 -2.891141 >$50k True 0.040686 0.040686 0.040686
4 4624202d8cbbf4b54e59bf68e76edd1baf9cf310a3ca1c... age -0.80 22.906449 -2.413349 >$50k True 0.054703 0.054703 0.054703
... ... ... ... ... ... ... ... ... ... ...
77 43e60fb600fd7bb7fb5218f9ce0568774ccff77e0661ed... education-num 0.80 13.210834 1.477479 >$50k True 0.474599 0.474599 0.474599
78 e7867f750ed84b1832051ce099fb10c678a4e1c06a7f02... education-num 0.85 13.408126 1.620129 >$50k True 0.494980 0.494980 0.494980
79 ddd8b20e3454a2db711a0f348fcb9520fe8a1d4444ea00... education-num 0.90 13.605417 1.760868 >$50k True 0.513974 0.513974 0.513974
80 d76832b9f61caf612fa9d40bb553255dfd4bc8fb9fe8ab... education-num 0.95 13.802709 1.913423 >$50k True 0.533300 0.533300 0.533300
81 adaae106a1b4eac32a2e562c727160f8ccc2da0b1a655d... education-num 1.00 14.000000 2.079540 >$50k True 0.552879 0.552879 0.552879

82 rows × 10 columns

This way, we don't have to compute the ksis twice when we call explain_influence() and explain_performance():

In [26]:
cache_explainer.explain_performance(
    X_test=X_test["age"],
    y_test=y_test,
    y_pred=y_pred > 0.5,
    metric=metrics.accuracy_score,
)
cache_explainer.info
100%|██████████| 41/41 [00:00<00:00, 354.52it/s]
Out[26]:
group feature tau target ksi label converged influence influence_low influence_high accuracy_score accuracy_score_low accuracy_score_high
0 e63f2712ec1d2eb591f5ba9d2a34529158a33d62933794... age -1.00 19.000000 -6.615004 >$50k True 0.006516 0.006516 0.006516 0.995804 0.995804 0.995804
1 c8488b853b24b27e293ff5b81e013c7329580c5834e163... age -0.95 19.976612 -4.691385 >$50k True 0.015121 0.015121 0.015121 0.989507 0.989507 0.989507
2 11a01c7f5952e2a521d2f81e93453b59ee4920fc6fe1b6... age -0.90 20.953224 -3.642715 >$50k True 0.026264 0.026264 0.026264 0.982033 0.982033 0.982033
3 8fdd3d71ff57584c6e5f8372de8126eab403d89c10a533... age -0.85 21.929837 -2.891141 >$50k True 0.040686 0.040686 0.040686 0.973085 0.973085 0.973085
4 4624202d8cbbf4b54e59bf68e76edd1baf9cf310a3ca1c... age -0.80 22.906449 -2.413349 >$50k True 0.054703 0.054703 0.054703 0.964937 0.964937 0.964937
... ... ... ... ... ... ... ... ... ... ... ... ... ...
77 43e60fb600fd7bb7fb5218f9ce0568774ccff77e0661ed... education-num 0.80 13.210834 1.477479 >$50k True 0.474599 0.474599 0.474599 NaN NaN NaN
78 e7867f750ed84b1832051ce099fb10c678a4e1c06a7f02... education-num 0.85 13.408126 1.620129 >$50k True 0.494980 0.494980 0.494980 NaN NaN NaN
79 ddd8b20e3454a2db711a0f348fcb9520fe8a1d4444ea00... education-num 0.90 13.605417 1.760868 >$50k True 0.513974 0.513974 0.513974 NaN NaN NaN
80 d76832b9f61caf612fa9d40bb553255dfd4bc8fb9fe8ab... education-num 0.95 13.802709 1.913423 >$50k True 0.533300 0.533300 0.533300 NaN NaN NaN
81 adaae106a1b4eac32a2e562c727160f8ccc2da0b1a655d... education-num 1.00 14.000000 2.079540 >$50k True 0.552879 0.552879 0.552879 NaN NaN NaN

82 rows × 13 columns

What characterizes $\xi$ is the list of (feature, target) pairs. We hash it to get the group id, so that the parts of the query that were already computed during a previous call can be identified quickly.
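
For illustration, the group id could be derived along these lines (a hypothetical sketch, not necessarily ethik's exact scheme):

import hashlib

def group_id(pairs):
    """Hash a list of (feature, target) pairs into a stable group id."""
    key = "|".join(f"{feature}={target}" for feature, target in sorted(pairs))
    return hashlib.sha256(key.encode()).hexdigest()

group_id([("age", 30), ("education-num", 10)])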
