Module epiclass.utils.shap.sample_ontology_shap_ranks
Create graphs of SHAP value ranks for certain cell types versus other cell types.
Examine
- ranks on important features for an assay + cell type VS other assay + cell types
- ranks on important features for a cell type (one output class) vs other cell types
- both above, but for features unique to the selection
Functions
def calculate_rank_stats(all_subset_ranks: Dict[Tuple[str, str], Dict[int, List[int]]] | Dict[str, Dict[int, List[int]]])
Calculate statistics for ranks associated with each feature in each subset of samples.
Args
all_subset_ranks (Dict[Tuple[str, str], Dict[int, List[int]]] OR Dict[str, Dict[int, List[int]]]): The ranks of each feature in each subset of samples. Can be either:
- A dictionary with (assay, cell_type) tuples as keys and feature ranks as values, or
- A dictionary with cell_type strings as keys and feature ranks as values.
Feature ranks should be a dictionary with feature indices as keys and lists of ranks as values.
Returns
Tuple[Dict[Tuple[str, str], Dict[int, Tuple[float, float]]], Dict[Tuple[str, str], Dict[int, Tuple[float, float]]]]: A tuple with two dictionaries:
1. The average rank and standard deviation for each feature in each subset.
2. The median rank and interquartile range for each feature in each subset.
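The input and output shapes described above can be illustrated with a minimal sketch. It is not the module's implementation: the name rank_stats_sketch is hypothetical, the population standard deviation and the "inclusive" quartile method are assumptions (the documentation does not say which variants are used), and the sample keys are made up.

```python
from statistics import mean, median, pstdev, quantiles
from typing import Dict, Hashable, List, Tuple

RankDict = Dict[int, List[int]]
StatDict = Dict[int, Tuple[float, float]]


def rank_stats_sketch(
    all_subset_ranks: Dict[Hashable, RankDict],
) -> Tuple[Dict[Hashable, StatDict], Dict[Hashable, StatDict]]:
    """Hypothetical stand-in: per-feature (mean, std) and (median, IQR) per subset."""
    avg_std: Dict[Hashable, StatDict] = {}
    med_iqr: Dict[Hashable, StatDict] = {}
    for subset, feature_ranks in all_subset_ranks.items():
        avg_std[subset] = {}
        med_iqr[subset] = {}
        for feature, ranks in feature_ranks.items():
            # Assumption: population standard deviation (pstdev).
            avg_std[subset][feature] = (mean(ranks), pstdev(ranks))
            # quantiles() needs at least two points on older Python versions,
            # so a single rank is given an IQR of 0.
            if len(ranks) > 1:
                q1, _, q3 = quantiles(ranks, n=4, method="inclusive")
                iqr = float(q3 - q1)
            else:
                iqr = 0.0
            med_iqr[subset][feature] = (median(ranks), iqr)
    return avg_std, med_iqr


# Keys may be (assay, cell_type) tuples or plain cell_type strings.
with_assay = {("AssayA", "CellType1"): {0: [1, 2, 3, 4]}}
cell_type_only = {"CellType1": {7: [5]}}
avg_std, med_iqr = rank_stats_sketch(with_assay)
avg_std_ct, med_iqr_ct = rank_stats_sketch(cell_type_only)
```

An empty rank list would raise statistics.StatisticsError in this sketch; how the real function treats that case is not documented here.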
def compute_iqr(data: List[int]) ‑> float
Calculate the interquartile range.
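As a worked illustration, the interquartile range is the distance between the third and first quartiles (Q3 - Q1). The sketch below uses only the standard library; iqr_sketch is a hypothetical name, and the "inclusive" quartile method is an assumption, since the documentation does not state which interpolation compute_iqr uses.

```python
from statistics import quantiles
from typing import List


def iqr_sketch(data: List[int]) -> float:
    """Hypothetical stand-in: Q3 - Q1 with linear interpolation between points."""
    if len(data) < 2:
        # A single value has no spread; this convention is an assumption.
        return 0.0
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return float(q3 - q1)


# For the ranks [1, 2, 3, 4], Q1 = 1.75 and Q3 = 3.25, so the IQR is 1.5.
example_iqr = iqr_sketch([1, 2, 3, 4])
```

Other quartile conventions (for example the "exclusive" method) give slightly different values on small samples, so results may not match the module exactly.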
def main()
Main
def parse_arguments() ‑> argparse.Namespace
Define CLI argument parser.
def write_stats(output_file: Path, header: str, data: Dict[Tuple[str, str], Dict[int, Tuple[float, float]]] | Dict[str, Dict[int, Tuple[float, float]]], features_idx: Iterable[int], include_assay: bool = True)
Write statistical data to a TSV file.
This function can handle two types of data structures:
1. Data with assay and cell type: Dict[Tuple[str, str], Dict[int, Tuple[float, float]]]
2. Data with only cell type: Dict[str, Dict[int, Tuple[float, float]]]
Args
output_file (Path): The path to the output TSV file.
header (str): A format string for the header of each feature column. Should contain '{f}', which will be replaced by the feature index.
data (Dict[Tuple[str, str], Dict[int, Tuple[float, float]]] OR Dict[str, Dict[int, Tuple[float, float]]]): The statistical data to write. Can be either:
- A dictionary with (assay, cell_type) tuples as keys and feature stats as values, or
- A dictionary with cell_type strings as keys and feature stats as values.
Feature stats should be a dictionary with feature indices as keys and (stat1, stat2) tuples as values.
features_idx (Iterable[int]): Feature indices to include in the output.
include_assay (bool, optional): Whether to include the assay column in the output. Defaults to True.
Returns
None
Raises
ValueError: If the data structure doesn't match the include_assay parameter.
Example
write_stats(
    Path("output.tsv"),
    "Feature_{f}Avg Feature_Std",
    {("AssayA", "CellType1"): {0: (1.0, 0.1), 1: (2.0, 0.2)}},
    [0, 1],
    include_assay=True
)
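To make the documented contract concrete, here is a rough sketch of a writer with the same parameters. It is not the module's code: write_stats_sketch is a hypothetical name, and the column layout (identifier columns named "assay" and "cell_type", a tab-separated header template, one row per subset) is an assumption.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, Tuple, Union

Stats = Dict[int, Tuple[float, float]]


def write_stats_sketch(
    output_file: Path,
    header: str,
    data: Union[Dict[Tuple[str, str], Stats], Dict[str, Stats]],
    features_idx: Iterable[int],
    include_assay: bool = True,
) -> None:
    """Hypothetical stand-in: one TSV row per subset, two columns per feature."""
    features = list(features_idx)
    # Validate before opening the file so a mismatch leaves no partial output.
    for key in data:
        if isinstance(key, tuple) != include_assay:
            raise ValueError("Data keys do not match the include_assay parameter.")

    # Assumption: the header template holds tab-separated column names.
    id_cols = ["assay", "cell_type"] if include_assay else ["cell_type"]
    feature_cols = [col for f in features for col in header.format(f=f).split("\t")]

    with open(output_file, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(id_cols + feature_cols)
        for key, stats in data.items():
            ids = list(key) if include_assay else [key]
            # Flatten the (stat1, stat2) tuple of each requested feature.
            values = [value for f in features for value in stats[f]]
            writer.writerow(ids + values)
```

With a template such as "f{f}_avg\tf{f}_std" and features [0, 1], this sketch would produce the columns assay, cell_type, f0_avg, f0_std, f1_avg, f1_std.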