Module epiclass.utils.notebooks.paper.metadata_bias_analysis
Workbook to quantify the bias present in the metadata. Question: can certain labels be identified using other metadata, e.g. can the cell type be found using project + assay + other categories?
Functions
def compute_all_max_bias(metadata_df: pd.DataFrame, target_categories: List[str], md5s_to_include: Dict[str, Set[str]], avg_observed_acc: Dict[str, float], verbose: bool = True) -> Dict[str, Any]
Compute the max metadata bias for all target categories.
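A minimal sketch of how the per-category loop could look, not the module's implementation. It assumes, hypothetically, that the metadata is a plain dict mapping md5sum to `{category: label}` instead of a DataFrame, and it swaps in a majority-label lookup table (scoring every 1- and 2-category input combination) for the models returned by `create_models`. `best_lookup_bias` and the `gap` field are illustrative names only; the scorer is repeated in compact form so the block runs on its own.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Any, Dict, List, Set, Tuple

# Hypothetical stand-in for the metadata DataFrame: md5sum -> {category: label}.
Metadata = Dict[str, Dict[str, str]]


def best_lookup_bias(metadata: Metadata, target: str) -> Tuple[Tuple[str, ...], float]:
    """Best majority-label lookup accuracy over 1- and 2-category input combinations."""
    categories = sorted({cat for labels in metadata.values() for cat in labels} - {target})
    best: Tuple[Tuple[str, ...], float] = ((), 0.0)
    for size in (1, 2):
        for inputs in combinations(categories, size):
            groups: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
            for labels in metadata.values():
                groups[tuple(labels.get(cat, "") for cat in inputs)][labels[target]] += 1
            # Each group predicts its most common target label for all its members.
            accuracy = sum(c.most_common(1)[0][1] for c in groups.values()) / len(metadata)
            if accuracy > best[1]:  # strict ">" keeps the earliest (smallest) combination on ties
                best = (inputs, accuracy)
    return best


def compute_all_max_bias(
    metadata: Metadata,
    target_categories: List[str],
    md5s_to_include: Dict[str, Set[str]],
    avg_observed_acc: Dict[str, float],
    verbose: bool = True,
) -> Dict[str, Any]:
    """For each target, compare the best metadata-only accuracy with the observed one."""
    results: Dict[str, Any] = {}
    for target in target_categories:
        # Keep only this target's training samples that carry a target label.
        subset = {
            md5: labels
            for md5, labels in metadata.items()
            if md5 in md5s_to_include.get(target, set()) and labels.get(target)
        }
        if not subset:
            continue
        inputs, bias_acc = best_lookup_bias(subset, target)
        observed = avg_observed_acc[target]
        results[target] = {
            "inputs": inputs,
            "bias_acc": bias_acc,
            "observed_acc": observed,
            "gap": observed - bias_acc,  # hypothetical summary field
        }
        if verbose:
            print(f"{target}: {inputs} reaches {bias_acc:.2f} vs observed {observed:.2f}")
    return results
```

Under these assumptions, a `gap` close to zero would suggest that the metadata alone nearly reaches the accuracy observed for the classifier on that category.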
def create_models() -> List[~T]
Create models for bias analysis.
def define_input_bias_categories(target_category: str) -> List[List[str]]
Define bias categories used for bias analysis.
Args
    target_category (str): Classification target category; it is excluded from the input lists.
Returns
    List[List[str]]: List of bias categories, where each inner list is one set of input metadata categories.
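A minimal sketch of how such input lists could be generated, assuming the bias categories are every combination of a candidate pool of metadata categories. `CANDIDATE_CATEGORIES` and `MAX_COMBINATION_SIZE` are hypothetical placeholders; the module's actual pool is not documented here.

```python
from itertools import combinations
from typing import List

# Hypothetical pool of metadata categories (placeholder values).
CANDIDATE_CATEGORIES = ["project", "assay", "biomaterial_type", "sex", "cell_type"]
# Hypothetical cap on how many categories are combined into one input set.
MAX_COMBINATION_SIZE = 3


def define_input_bias_categories(target_category: str) -> List[List[str]]:
    """Return all 1..MAX_COMBINATION_SIZE combinations of candidate categories,
    excluding the target category itself."""
    inputs = [cat for cat in CANDIDATE_CATEGORIES if cat != target_category]
    bias_categories: List[List[str]] = []
    for size in range(1, MAX_COMBINATION_SIZE + 1):
        # combinations() keeps the pool order, so singles come first, then pairs, etc.
        bias_categories.extend(list(combo) for combo in combinations(inputs, size))
    return bias_categories
```

With these placeholder values, `define_input_bias_categories("cell_type")` yields 14 lists (4 singles, 6 pairs, 4 triples), starting with `["project"]`.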
def filter_samples(metadata_df: pd.DataFrame, target_category: str, md5_set: Set[str], verbose: bool = True) -> pandas.core.frame.DataFrame
Filter samples based on the output category to match the original training set.
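A dependency-free sketch of this filter, assuming (hypothetically) that the metadata is a dict keyed by md5sum and that matching the training set means keeping samples whose md5sum is in `md5_set` and whose target label is non-empty.

```python
from typing import Dict, Set

# Hypothetical stand-in for the metadata DataFrame: md5sum -> {category: label}.
Metadata = Dict[str, Dict[str, str]]


def filter_samples(
    metadata: Metadata, target_category: str, md5_set: Set[str], verbose: bool = True
) -> Metadata:
    """Keep the samples listed in md5_set that have a non-empty target label."""
    filtered = {
        md5: labels
        for md5, labels in metadata.items()
        if md5 in md5_set and labels.get(target_category)
    }
    if verbose:
        print(f"{target_category}: kept {len(filtered)} of {len(metadata)} samples")
    return filtered
```

If the real DataFrame is indexed by md5sum (an assumption), the pandas equivalent would be roughly `metadata_df[metadata_df.index.isin(md5_set) & metadata_df[target_category].notna()]`.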
def find_max_bias(filtered_metadata_df: pd.DataFrame, target_category: str, verbose: bool = True) -> Dict[Tuple[str, ...], float]
Find the bias categories that provide the highest accuracy for the target category.
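A sketch of the search under two stated assumptions: the metadata is a dict keyed by md5sum, and each input combination is scored with a majority-label lookup table rather than a trained model. The lookup accuracy is measured on the same samples it is built from, so it is an optimistic, in-sample figure. `lookup_accuracy` and `MAX_COMBINATION_SIZE` are hypothetical names.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Tuple

# Hypothetical stand-in for the filtered metadata DataFrame: md5sum -> {category: label}.
Metadata = Dict[str, Dict[str, str]]

# Hypothetical cap on how many metadata categories are combined as inputs.
MAX_COMBINATION_SIZE = 2


def lookup_accuracy(metadata: Metadata, inputs: Tuple[str, ...], target_category: str) -> float:
    """Accuracy of predicting each sample's target label as the majority label of
    all samples sharing the same values for the input categories.
    Assumes every sample has a target label (as after filtering)."""
    groups: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for labels in metadata.values():
        key = tuple(labels.get(cat, "") for cat in inputs)
        groups[key][labels[target_category]] += 1
    # A group contributes the count of its most common target label as correct predictions.
    correct = sum(counter.most_common(1)[0][1] for counter in groups.values())
    return correct / len(metadata)


def find_max_bias(
    filtered_metadata: Metadata, target_category: str, verbose: bool = True
) -> Dict[Tuple[str, ...], float]:
    """Return the input combination with the highest lookup accuracy for the target."""
    categories = sorted(
        {cat for labels in filtered_metadata.values() for cat in labels} - {target_category}
    )
    scores: Dict[Tuple[str, ...], float] = {}
    # Smaller combinations are scored first, so ties go to the simpler explanation.
    for size in range(1, MAX_COMBINATION_SIZE + 1):
        for inputs in combinations(categories, size):
            scores[inputs] = lookup_accuracy(filtered_metadata, inputs, target_category)
    if not scores:
        return {}
    # max() returns the first key with the highest score.
    best = max(scores, key=lambda inputs: scores[inputs])
    if verbose:
        print(f"{target_category}: best inputs {best} -> accuracy {scores[best]:.2f}")
    return {best: scores[best]}
```

On four toy samples where `project` (P1/P2) tracks two cell types exactly but `assay` does not, `find_max_bias(metadata, "cell_type")` returns `{("project",): 1.0}`; because smaller combinations are enumerated first, the tie with `("assay", "project")` resolves to the single category.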
def main()
Main function.
def parse_arguments() -> argparse.Namespace
Argument parser for the command line.