Module epiclass.utils.notebooks.paper.metadata_bias_analysis

Workbook to quantify the bias present in metadata. Question: can certain labels be identified from other metadata, e.g. can the cell type be found using project + assay + other metadata fields?

Functions

def compute_all_max_bias(metadata_df: pd.DataFrame, target_categories: List[str], md5s_to_include: Dict[str, Set[str]], avg_observed_acc: Dict[str, float], verbose: bool = True) ‑> Dict[str, Any]

Compute the max metadata bias for all target categories.

def create_models() ‑> List[~T]

Create models for bias analysis.

def define_input_bias_categories(target_category: str) ‑> List[List[str]]

Define bias categories used for bias analysis.

Args

target_category : str
Classification target category. It is excluded from the input lists.

Returns

List[List[str]]
List of bias categories.

def filter_samples(metadata_df: pd.DataFrame, target_category: str, md5_set: Set[str], verbose: bool = True) ‑> pandas.core.frame.DataFrame

Filter samples based on the output category to match the original training set.
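
The filtering step can be illustrated with a minimal sketch. It is not the module's implementation: it uses plain dictionaries instead of a DataFrame so that it runs without pandas, and the helper name `filter_records`, the `md5sum` key, and the rule of dropping samples with an empty target label are all assumptions made for illustration.

```python
from typing import Dict, List, Set

Record = Dict[str, str]


def filter_records(
    records: List[Record],
    target_category: str,
    md5_set: Set[str],
    md5_key: str = "md5sum",  # assumed name of the identifier field
) -> List[Record]:
    """Keep records that were in the training set and carry a target label."""
    kept = []
    for record in records:
        # Drop samples that were not part of the original training set.
        if record.get(md5_key) not in md5_set:
            continue
        # Assumption: samples without a label for the target are unusable.
        if not record.get(target_category):
            continue
        kept.append(record)
    return kept


# Hypothetical toy metadata.
records = [
    {"md5sum": "a1", "assay": "rna_seq", "cell_type": "T cell"},
    {"md5sum": "b2", "assay": "h3k27ac", "cell_type": "B cell"},
    {"md5sum": "c3", "assay": "h3k27ac", "cell_type": ""},
]
filtered = filter_records(records, "cell_type", {"a1", "c3"})
```

Here `b2` is dropped because it is absent from the md5 set, and `c3` is dropped because it has no cell type label.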

def find_max_bias(filtered_metadata_df: pd.DataFrame, target_category: str, verbose: bool = True) ‑> Dict[Tuple[str, ...], float]

Find the bias categories that provide the highest accuracy for the target category.
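
One way to picture this search is the sketch below. It is an assumption-laden stand-in, not the module's method: instead of the models returned by `create_models()`, it scores each combination of metadata categories with a simple majority-vote lookup (predict the most frequent target label for each unique combination of input values). The helper names and the toy records are hypothetical, and the score is an in-sample accuracy, so it is optimistic.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def majority_vote_accuracy(
    records: List[Record], input_cols: Sequence[str], target_col: str
) -> float:
    """In-sample accuracy of predicting the most common target per input combo."""
    if not records:
        raise ValueError("records must not be empty")
    groups: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for record in records:
        key = tuple(record[col] for col in input_cols)
        groups[key][record[target_col]] += 1
    # Within each group, the majority label is right for its own count.
    correct = sum(max(counts.values()) for counts in groups.values())
    return correct / len(records)


def score_all_combinations(
    records: List[Record], candidate_cols: Sequence[str], target_col: str
) -> Dict[Tuple[str, ...], float]:
    """Score every non-empty combination of candidate metadata categories."""
    scores: Dict[Tuple[str, ...], float] = {}
    for size in range(1, len(candidate_cols) + 1):
        for combo in combinations(candidate_cols, size):
            scores[combo] = majority_vote_accuracy(records, combo, target_col)
    return scores


# Hypothetical toy metadata.
records = [
    {"project": "P1", "assay": "rna_seq", "cell_type": "T cell"},
    {"project": "P1", "assay": "h3k27ac", "cell_type": "T cell"},
    {"project": "P2", "assay": "rna_seq", "cell_type": "B cell"},
    {"project": "P2", "assay": "h3k27ac", "cell_type": "T cell"},
]
scores = score_all_combinations(records, ["project", "assay"], "cell_type")
best_combo = max(scores, key=scores.get)
```

In this toy data, neither `project` nor `assay` alone determines the cell type, but the pair does, so the pair is the combination with the highest score. A high score of this kind suggests that the target label can be recovered from metadata alone.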

def main()

Main function.

def parse_arguments() ‑> argparse.Namespace

Parse command-line arguments.