Module epiclass.utils.classification_merging_utils

Utility functions for merging classification results.

Functions

def clean_format(x: object) ‑> str

Format a value to string, removing decimal points for whole numbers.

Args

x
Any value that needs string formatting

Returns

Formatted string representation of the value
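
The behaviour described above can be sketched as follows. This is a minimal stand-in, not the library's implementation: the name `clean_format_sketch` is hypothetical, and the sketch assumes that "whole numbers" means floats with no fractional part.

```python
def clean_format_sketch(x: object) -> str:
    """Hypothetical stand-in for clean_format, not the library's code."""
    # Assumption: a float holding a whole number loses its trailing ".0".
    if isinstance(x, float) and x.is_integer():
        return str(int(x))
    # Everything else falls back to the plain string representation.
    return str(x)


print(clean_format_sketch(3.0))    # 3
print(clean_format_sketch(3.5))    # 3.5
print(clean_format_sketch("abc"))  # abc
```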

def merge_dataframes(df1: pd.DataFrame, df2: pd.DataFrame, on: str = 'md5sum', verbose: bool = False) ‑> pandas.core.frame.DataFrame

Merge two DataFrames by concatenating along the given column; if that is not possible, attempt to merge on md5sum or filename. The merge aligns common columns and appends non-common columns.

Columns with the same name are combined, with their values separated by ';'.

Parameters

df1 (pd.DataFrame)
The first DataFrame.
df2 (pd.DataFrame)
The second DataFrame.
on (str, optional)
The column to merge on. Defaults to "md5sum".
verbose (bool, optional)
Whether to print verbose output. Defaults to False.

Raises

ValueError
If no merge is possible.

Returns

pd.DataFrame
Merged DataFrame. The index name is preserved if it was the same in both inputs.
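
To illustrate how same-name columns might end up combined with ';', here is a rough approximation written with plain pandas. It is not the library's code: `merge_sketch` is a hypothetical name, the outer join and the `_1`/`_2` suffixes are assumptions, and the fallback to md5sum or filename is left out.

```python
import pandas as pd


def merge_sketch(df1: pd.DataFrame, df2: pd.DataFrame, on: str = "md5sum") -> pd.DataFrame:
    """Hypothetical approximation of the documented behaviour, not the library's code."""
    if on not in df1.columns or on not in df2.columns:
        raise ValueError(f"Cannot merge: column '{on}' is missing from an input.")
    # Assumption: rows from both inputs are kept (outer join).
    merged = df1.merge(df2, on=on, how="outer", suffixes=("_1", "_2"))
    # Columns present in both inputs come back suffixed; fold each pair into one column.
    shared = [c for c in df1.columns if c in df2.columns and c != on]
    for col in shared:
        left, right = merged.pop(f"{col}_1"), merged.pop(f"{col}_2")
        merged[col] = [
            ";".join(str(v) for v in pair if pd.notna(v))
            for pair in zip(left, right)
        ]
    return merged


# Made-up sample data: "label" exists in both inputs, the other columns in one only.
df1 = pd.DataFrame({"md5sum": ["a", "b"], "label": ["x", "y"], "score": [0.9, 0.8]})
df2 = pd.DataFrame({"md5sum": ["b", "c"], "label": ["z", "w"], "track": ["raw", "fc"]})
result = merge_sketch(df1, df2)
print(result)
```

In this sketch the row with md5sum "b" appears in both inputs, so its "label" value becomes "y;z", while rows found in only one input keep their single value.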

def merge_two_columns(df: pd.DataFrame, col1: str, col2: str) ‑> pandas.core.frame.DataFrame

Merge the values of col1 and col2 and return the DataFrame, which is updated IN-PLACE. The merge only happens if the two columns are complementary.
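
A possible reading of "complementary" is sketched below. The sketch assumes it means that no row has a value in both columns at once; it also assumes the merged values are written into col1 and that col2 is left as it was. `merge_two_columns_sketch` is a hypothetical name, not the library's function.

```python
import pandas as pd


def merge_two_columns_sketch(df: pd.DataFrame, col1: str, col2: str) -> pd.DataFrame:
    """Hypothetical sketch; assumes 'complementary' means no row has both values set."""
    both_set = df[col1].notna() & df[col2].notna()
    if both_set.any():
        # Not complementary under this assumption: leave the DataFrame untouched.
        return df
    # Fill the gaps of col1 with the values of col2, modifying df itself.
    df[col1] = df[col1].fillna(df[col2])
    return df


# Made-up sample data: each row has a value in exactly one of the two columns.
df = pd.DataFrame({"a": ["x", None, "z"], "b": [None, "y", None]})
out = merge_two_columns_sketch(df, "a", "b")
print(df["a"].tolist())  # ['x', 'y', 'z']

# Both columns are set in the first row, so nothing is merged here.
df_conflict = pd.DataFrame({"a": ["x", "q"], "b": ["y", None]})
merge_two_columns_sketch(df_conflict, "a", "b")
```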

def remove_pred_vector(df: pd.DataFrame, verbose: bool = True) ‑> pandas.core.frame.DataFrame

Remove the prediction vector from a result dataframe.

If the "files/epiRR" column does not exist, everything after the "1rst/2nd prob ratio" column is removed, including any metadata columns that follow it.
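
The fallback described above, dropping every column after "1rst/2nd prob ratio", could look roughly like this in pandas. The helper name `drop_after_column` is hypothetical, the sample column names other than the marker are made up, and the case where "files/epiRR" exists is not covered.

```python
import pandas as pd


def drop_after_column(df: pd.DataFrame, marker: str = "1rst/2nd prob ratio") -> pd.DataFrame:
    """Hypothetical sketch of the fallback: keep columns up to and including marker."""
    # get_loc returns an integer position when column names are unique.
    last = df.columns.get_loc(marker)
    return df.iloc[:, : last + 1]


# Made-up result table: two prediction-vector columns follow the marker column.
df = pd.DataFrame(
    {
        "md5sum": ["a"],
        "pred": ["x"],
        "1rst/2nd prob ratio": [2.5],
        "class_A": [0.7],
        "class_B": [0.3],
    }
)
trimmed = drop_after_column(df)
print(list(trimmed.columns))  # ['md5sum', 'pred', '1rst/2nd prob ratio']
```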

def sjoin(x)

Join column values, skipping any that are null.
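
A minimal sketch of such a join, using only the standard library. The name `sjoin_sketch` is hypothetical, and both the ';' separator and the treatment of None and NaN as null are assumptions rather than documented behaviour.

```python
import math


def sjoin_sketch(values, sep: str = ";") -> str:
    """Hypothetical stand-in for sjoin; the ';' separator is an assumption."""

    def is_null(v) -> bool:
        # Assumption: None and float NaN both count as null.
        return v is None or (isinstance(v, float) and math.isnan(v))

    return sep.join(str(v) for v in values if not is_null(v))


print(sjoin_sketch(["a", None, "b"]))     # a;b
print(sjoin_sketch([float("nan"), "c"]))  # c
```

With pandas, a helper of this shape would typically be applied row by row, for example through `DataFrame.apply` with `axis=1`.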