Module `epiclass.utils.shap.sample_ontology_shap_ranks`

Create graphs of SHAP value ranks for certain cell types vs. other cell types.

Examine:

- ranks of important features for an assay + cell type vs. other assay + cell type pairs
- ranks of important features for a cell type (one output class) vs. other cell types
- both of the above, restricted to features unique to the selection

Functions

def calculate_rank_stats(all_subset_ranks: Dict[Tuple[str, str], Dict[int, List[int]]] | Dict[str, Dict[int, List[int]]])

Calculate statistics for ranks associated with each feature in each subset of samples.

Args

all_subset_ranks (Dict[Tuple[str, str], Dict[int, List[int]]] OR
Dict[str, Dict[int, List[int]]]):
The ranks of each feature in each subset of samples. Can be either:
- A dictionary with (assay, cell_type) tuples as keys and feature ranks as values, or
- A dictionary with cell_type strings as keys and feature ranks as values.
Feature ranks should be a dictionary with feature indices as keys and lists of ranks as values.

Returns

Tuple[Dict[Tuple[str, str], Dict[int, Tuple[float, float]]], Dict[Tuple[str, str], Dict[int, Tuple[float, float]]]]:
A tuple with two dictionaries:
1. The average rank and standard deviation for each feature in each subset.
2. The median rank and interquartile range for each feature in each subset.
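
The input and output shapes described above can be illustrated with a minimal sketch. This is not the module's implementation: `rank_stats_sketch` is a hypothetical name, it uses the standard-library `statistics` module, and it assumes a population standard deviation and inclusive quartiles, neither of which is stated in this documentation.

```python
import statistics
from typing import Dict, Hashable, List, Tuple

Ranks = Dict[Hashable, Dict[int, List[int]]]
Stats = Dict[Hashable, Dict[int, Tuple[float, float]]]


def rank_stats_sketch(all_subset_ranks: Ranks) -> Tuple[Stats, Stats]:
    """Hypothetical stand-in for calculate_rank_stats (not the module's code)."""
    avg_std: Stats = {}
    med_iqr: Stats = {}
    for subset, feature_ranks in all_subset_ranks.items():
        avg_std[subset] = {}
        med_iqr[subset] = {}
        for feature, ranks in feature_ranks.items():
            # Assumption: population standard deviation.
            mean = statistics.fmean(ranks)
            std = statistics.pstdev(ranks)
            # Assumption: inclusive quartiles; a single rank has no spread.
            if len(ranks) > 1:
                q1, _, q3 = statistics.quantiles(ranks, n=4, method="inclusive")
                iqr = q3 - q1
            else:
                iqr = 0.0
            avg_std[subset][feature] = (mean, std)
            med_iqr[subset][feature] = (float(statistics.median(ranks)), float(iqr))
    return avg_std, med_iqr


# Keys are (assay, cell_type) tuples here; plain cell_type strings work the same way.
ranks = {("AssayA", "CellType1"): {0: [1, 2, 3, 4, 5], 1: [10, 30]}}
avg_std, med_iqr = rank_stats_sketch(ranks)
```

Both returned dictionaries keep the keys of the input, so they can be passed on to a writer that expects either key shape.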

def compute_iqr(data: List[int]) -> float

Calculate the interquartile range.
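
As an illustration, the interquartile range is the 75th percentile minus the 25th percentile. The sketch below uses linear interpolation between sorted values; which quantile convention `compute_iqr` actually uses is not stated here, so treat that choice and the name `iqr_sketch` as assumptions.

```python
from typing import List


def iqr_sketch(data: List[int]) -> float:
    """Hypothetical IQR helper: Q3 - Q1 with linear interpolation."""
    if not data:
        raise ValueError("data must contain at least one value")
    ordered = sorted(data)

    def percentile(fraction: float) -> float:
        # Fractional position of the percentile within the sorted values.
        position = fraction * (len(ordered) - 1)
        low = int(position)
        high = min(low + 1, len(ordered) - 1)
        # Interpolate between the two neighbouring values.
        return ordered[low] + (position - low) * (ordered[high] - ordered[low])

    return percentile(0.75) - percentile(0.25)


odd_iqr = iqr_sketch([5, 1, 4, 2, 3])
even_iqr = iqr_sketch([1, 2, 3, 4])
```

Other quartile conventions (for example exclusive quartiles) can give slightly different values on small samples.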

def main()

Main entry point.

def parse_arguments() -> argparse.Namespace

Define CLI argument parser.

def write_stats(output_file: Path, header: str, data: Dict[Tuple[str, str], Dict[int, Tuple[float, float]]] | Dict[str, Dict[int, Tuple[float, float]]], features_idx: Iterable[int], include_assay: bool = True)

Write statistical data to a TSV file.

This function can handle two types of data structures:
1. Data with assay and cell type: Dict[Tuple[str, str], Dict[int, Tuple[float, float]]]
2. Data with only cell type: Dict[str, Dict[int, Tuple[float, float]]]

Args

output_file (Path): The path to the output TSV file.
header (str): A format string for the header of each feature column. Should contain '{f}', which will be replaced by the feature index.
data (Dict[Tuple[str, str], Dict[int, Tuple[float, float]]] OR
Dict[str, Dict[int, Tuple[float, float]]]):
The statistical data to write. Can be either:
- A dictionary with (assay, cell_type) tuples as keys and feature stats as values, or
- A dictionary with cell_type strings as keys and feature stats as values.
Feature stats should be a dictionary with feature indices as keys and (stat1, stat2) tuples as values.
features_idx (Iterable[int]): Feature indices to include in the output.
include_assay (bool, optional): Whether to include the assay column in the output. Defaults to True.

Returns

None

Raises

ValueError: If the data structure doesn't match the include_assay parameter.
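
One way such a mismatch could be detected is by comparing the shape of each key with the flag. The sketch below is an assumption about how the check might look, not the code of `write_stats`; `check_keys_sketch` and its error messages are hypothetical.

```python
from typing import Any, Dict


def check_keys_sketch(data: Dict[Any, Any], include_assay: bool) -> None:
    """Hypothetical check: keys must match the include_assay flag."""
    for key in data:
        is_pair = isinstance(key, tuple) and len(key) == 2
        if include_assay and not is_pair:
            # An assay column was requested, so every key needs both parts.
            raise ValueError(f"Expected an (assay, cell_type) tuple, got {key!r}")
        if not include_assay and not isinstance(key, str):
            # No assay column: keys are expected to be plain cell type names.
            raise ValueError(f"Expected a cell_type string, got {key!r}")


with_assay = {("AssayA", "CellType1"): {0: (1.0, 0.1)}}
cell_type_only = {"CellType1": {0: (1.0, 0.1)}}

check_keys_sketch(with_assay, include_assay=True)       # passes silently
check_keys_sketch(cell_type_only, include_assay=False)  # passes silently
```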

Example

write_stats(
    Path("output.tsv"),
    "Feature_{f}Avg Feature_Std",
    {("AssayA", "CellType1"): {0: (1.0, 0.1), 1: (2.0, 0.2)}},
    [0, 1],
    include_assay=True,
)