Module epiclass.utils.shap.shap_analysis

Module for more complex SHAP analysis functions.

The pylint unsubscriptable-object check is disabled in this module because it reports false positives on pandas objects. See: https://github.com/pylint-dev/pylint/issues/3637

Functions

def feature_overlap_stats(feature_lists: List[List[int]], percentile_list: list[int | float])

Calculate the statistics of feature overlap between multiple feature lists. The feature lists are assumed to be lists of feature indices for different samples.

This function takes a list of feature lists and computes feature frequency percentiles. It also computes the union and intersection of all features from the given feature lists.

Args

feature_lists : List[List[int]]
A list of feature lists, where each inner list contains feature indices.

percentile_list : List[int|float]
The percentile values for which the most frequent features will be returned.

Returns

Tuple[Set[int], Set[int], Dict[int|float, List], go.Figure]
A tuple containing 1) the intersection of all features, 2) the union of all features, 3) a dict containing the list of features present in each percentile, and 4) a plotly figure showing the histogram of feature frequency.
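The computation described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the name overlap_stats_sketch is hypothetical, the plotly figure is omitted, and it assumes that the features "present" in percentile p are those whose frequency (the number of lists they appear in) is at or above the p-th percentile of all feature frequencies.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple, Union

import numpy as np

Number = Union[int, float]


def overlap_stats_sketch(
    feature_lists: List[List[int]], percentile_list: List[Number]
) -> Tuple[Set[int], Set[int], Dict[Number, List[int]]]:
    """Hypothetical sketch: intersection, union and frequent features per percentile."""
    if not feature_lists:
        raise ValueError("feature_lists must contain at least one list")

    sets = [set(features) for features in feature_lists]
    intersection = set.intersection(*sets)
    union = set.union(*sets)

    # Number of lists (samples) in which each feature appears.
    counts = Counter(feature for s in sets for feature in s)
    frequencies = np.array(list(counts.values()))

    # Assumption: keep features whose frequency reaches the percentile threshold.
    frequent: Dict[Number, List[int]] = {}
    for percentile in percentile_list:
        threshold = np.percentile(frequencies, percentile)
        frequent[percentile] = sorted(
            feature for feature, count in counts.items() if count >= threshold
        )
    return intersection, union, frequent


inter, union, frequent = overlap_stats_sketch(
    [[1, 2, 3], [2, 3, 4], [3, 4, 5]], [50, 100]
)
```

With this interpretation, higher percentiles yield shorter lists, and the 100th percentile keeps only the features that reach the maximum observed frequency.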

def print_feature_overlap_stats(feature_stats: Sequence)

Prints the statistics of feature overlap.

This function receives the feature statistics, which include the intersection, the union and the frequent features in each quantile of features, and prints them for easy inspection.

Args

feature_stats : Sequence
Tuple containing the intersection, union and frequent features in each quantile of features.

def print_importance_info(feature_selection: List[int], shap_matrix: np.ndarray)

Prints the feature importance information.

This function prints the feature importance information: the average contribution expected from the selected features, and from a single feature, if importance were uniform across all features, followed by a statistical description of the actual contributions of the selected features.

Args

feature_selection : List[int]
The indices of the selected features.
shap_matrix : np.ndarray
The SHAP values matrix.
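
The uniform-importance baseline mentioned above can be illustrated with a short sketch. It is not the module's code: the name importance_info_sketch is hypothetical, and it assumes that shap_matrix has shape (samples, features) and that a feature's contribution is its share of the absolute SHAP values within each sample.

```python
from typing import List, Tuple

import numpy as np


def importance_info_sketch(
    feature_selection: List[int], shap_matrix: np.ndarray
) -> Tuple[float, float, np.ndarray]:
    """Hypothetical sketch: uniform baselines and observed share, in percent."""
    _, n_features = shap_matrix.shape
    n_selected = len(feature_selection)

    # If importance were uniform, each feature would carry 1/n_features of the total.
    one_feature_pct = 100 / n_features
    selected_uniform_pct = 100 * n_selected / n_features

    # Observed share per sample: |SHAP| of selected features over total |SHAP|.
    # Assumption: no row is entirely zero, otherwise this division is undefined.
    abs_shap = np.abs(shap_matrix)
    selected_share_pct = (
        100 * abs_shap[:, feature_selection].sum(axis=1) / abs_shap.sum(axis=1)
    )
    return one_feature_pct, selected_uniform_pct, selected_share_pct


shap_values = np.array([[1.0, -1.0, 2.0, 0.0], [0.5, 0.5, 0.5, 0.5]])
one_pct, uniform_pct, share_pct = importance_info_sketch([0, 2], shap_values)
print(f"One feature, uniform importance: {one_pct:.2f}%")
print(f"Selected features, uniform importance: {uniform_pct:.2f}%")
print(f"Observed share: mean={share_pct.mean():.2f}%, min={share_pct.min():.2f}%, max={share_pct.max():.2f}%")
```

Comparing the observed share with the uniform baseline indicates whether the selected features contribute more than their count alone would suggest.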