Module epiclass.utils.modify_metadata
Functions to perform more complex operations on the metadata.
Functions
def add_fake_epiatlas_metadata(metadata: Metadata) ‑> None
-
Add uuid and track_type info to non-EpiAtlas metadata. Each uuid is set to the dataset's md5sum, and each track_type is set to 'raw'.
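A minimal sketch of the idea, assuming the metadata can be viewed as a plain dict mapping each md5sum to a dataset dict. That layout and the name `add_fake_fields_sketch` are assumptions made for illustration; they are not the actual `Metadata` API.

```python
def add_fake_fields_sketch(datasets: dict) -> None:
    """Fill in EpiAtlas-style fields in place (hypothetical helper)."""
    for md5sum, dset in datasets.items():
        dset["uuid"] = md5sum        # uuid mirrors the md5sum key
        dset["track_type"] = "raw"   # every track is labelled as raw


# Hypothetical example data: one dataset keyed by its md5sum.
example = {"abc123": {"assay": "h3k27ac"}}
add_fake_fields_sketch(example)
```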
def add_formated_date(metadata: Metadata) ‑> None
-
Add an 'upload_date_2' category to datasets, with dates in YYYY-MM format.
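The conversion could look roughly like the sketch below. The source field name `upload_date`, its ISO 8601 input format, and the dict-of-dicts layout are all assumptions; the documentation only states the name and format of the new category.

```python
from datetime import datetime


def add_formatted_date_sketch(datasets: dict) -> None:
    """Add a YYYY-MM 'upload_date_2' field in place (hypothetical helper)."""
    for dset in datasets.values():
        raw_date = dset.get("upload_date")  # assumed source field name
        if raw_date is None:
            continue  # nothing to convert for this dataset
        # Assumes an ISO 8601 string such as '2019-11-05'.
        parsed = datetime.fromisoformat(raw_date)
        dset["upload_date_2"] = parsed.strftime("%Y-%m")
```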
def add_random_group(metadata: Metadata, seed=42, n_split=23) ‑> str
-
Add a 'random_seed{seed}_{n_split}splits' category that randomly assigns each uuid to one of n_split groups (different tracks of the same uuid get the same value).
Return the name of the new category.
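One way to get uuid-consistent random groups is sketched below, assuming a plain dict of dataset dicts that each carry a `uuid` field. The function name, the data layout, and the use of the standard `random` module are assumptions; the real implementation may draw its groups differently.

```python
import random


def add_random_group_sketch(datasets: dict, seed: int = 42, n_split: int = 23) -> str:
    """Assign one random group per uuid, in place (hypothetical helper)."""
    category = f"random_seed{seed}_{n_split}splits"
    rng = random.Random(seed)  # local generator, so the result is reproducible
    # Sort the uuids so the assignment does not depend on dict ordering.
    uuids = sorted({dset["uuid"] for dset in datasets.values()})
    group_of = {uuid: str(rng.randrange(n_split)) for uuid in uuids}
    for dset in datasets.values():
        # Every track of a given uuid receives the same group label.
        dset[category] = group_of[dset["uuid"]]
    return category
```

Drawing the group per uuid rather than per file keeps all tracks of one experiment in the same split.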
def filter_by_pairs(my_metadata: ~Meta, assay_cat: str = 'assay', cat2: str = 'cell_type', nb_pairs: int = 5, min_per_pair: int = 10, use_uuid: bool = True) ‑> ~Meta
-
Returns filtered metadata keeping only certain classes from cat2 based on pairs conditions with assay_cat.
1) Remove (assay_cat, cat2) pairs that have fewer than 'min_per_pair' signals.
2) Only keep cat2 classes that still have at least 'nb_pairs' different non-empty pairings.
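The two filtering steps can be sketched as follows, assuming a plain dict of dataset dicts instead of the real `Metadata` object. The function name is hypothetical, each dataset counts as one signal, and the `use_uuid` option is deliberately left out because its exact behaviour is not described here.

```python
from collections import Counter, defaultdict


def filter_by_pairs_sketch(
    datasets: dict,
    assay_cat: str = "assay",
    cat2: str = "cell_type",
    nb_pairs: int = 5,
    min_per_pair: int = 10,
) -> dict:
    """Return a filtered copy of the dataset mapping (hypothetical helper)."""
    # Step 1: count signals per (assay_cat, cat2) pair, drop the small pairs.
    pair_counts = Counter((d[assay_cat], d[cat2]) for d in datasets.values())
    valid_pairs = {pair for pair, n in pair_counts.items() if n >= min_per_pair}

    # Step 2: keep cat2 labels that still appear in enough distinct pairs.
    assays_per_label = defaultdict(set)
    for assay, label in valid_pairs:
        assays_per_label[label].add(assay)
    kept_labels = {
        label for label, assays in assays_per_label.items() if len(assays) >= nb_pairs
    }

    return {
        key: d
        for key, d in datasets.items()
        if (d[assay_cat], d[cat2]) in valid_pairs and d[cat2] in kept_labels
    }
```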
def five_cell_types_selection(my_metadata: Metadata)
-
Return a filtered metadata with 5 major cell_types and certain assays.
def fix_roadmap(metadata: Metadata)
-
Merge info from 'data_generating_centre' category.
Convert 'NIH Roadmap Epigenomics' to 'Roadmap'.
def keep_major_assays_2019(my_metadata)
-
Combine rna_seq and polr2a classes pairs in the assay category. Written for the 2019-11 release.
def keep_major_cell_types(my_metadata: Metadata)
-
Remove datasets that do not belong to a cell_type with at least 10 signals in two assays. Those assays must also each have at least two cell types.
def keep_major_cell_types_2019(my_metadata)
-
Select 20 cell types in the major assays signal subset. A cell type needs to have at least 10 signals in one assay. The selection choices were made outside of the code. Written for the 2019-11 release.
def keep_major_cell_types_alt(my_metadata: Metadata)
-
Return a filtered metadata with certain assays. Datasets which are not part of a cell_type which has at least 10 signals are removed.
def merge_pair_end_info(metadata: Metadata)
-
Merge info from 'paired' and 'pair_end_mode' categories.
Convert FALSE/TRUE to 'single_end' and 'paired_end', respectively.
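A rough sketch of such a merge, assuming a plain dict of dataset dicts. Which category receives the merged value ('pair_end_mode' here), which one takes precedence, and the function name are all assumptions; only the FALSE/TRUE conversion comes from the description above.

```python
def merge_pair_end_info_sketch(datasets: dict) -> None:
    """Merge 'paired' into 'pair_end_mode' in place (hypothetical helper)."""
    convert = {"FALSE": "single_end", "TRUE": "paired_end"}
    for dset in datasets.values():
        # Assumed precedence: an existing 'pair_end_mode' wins over 'paired'.
        value = dset.get("pair_end_mode") or dset.get("paired")
        if value is None:
            continue  # neither category is present
        # Values other than FALSE/TRUE are kept unchanged.
        dset["pair_end_mode"] = convert.get(str(value).upper(), value)
```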
def special_case(my_metadata)
-
Return a filtered metadata with only rna_seq examples, but also add 3 thyroid (for model construction).
Made to evaluate an already trained model, works with min_class_size=3 and oversample=False.
def special_case_2(my_metadata)
-
Return a filtered metadata without 2 examples from all assay/cell_type pairs, and all mrna_seq.