Module epiclass.utils.compute_bin_metrics

This module provides a command-line interface for computing per-bin signal metrics over a given set of genomic data.

The script reads genomic data from hdf5 files, which are provided as input through the command line. Additional inputs include chromosome sizes.

The command-line interface requires three arguments: 1) A file containing hdf5 filenames. 2) A file containing the sizes of chromosomes. 3) A directory for log outputs.

The hdf5 files are used to create a dict of signal data, from which the metrics for each signal bin are computed over all hdf5 files. The per-bin metrics are then written to an npz file in the log directory.

Metrics

  • Mean: The mean of each signal bin.
  • Standard deviation: The standard deviation of each signal bin.
  • Median: The median of each signal bin.
  • IQR: The interquartile range of each signal bin.

npz output: {metric: [list of values]}
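As a rough illustration of the npz layout described above, the sketch below writes a small metrics dict with `np.savez` and reads it back with `np.load`. The key names (`mean`, `std`, `median`, `iqr`), the values, and the file name `bin_metrics.npz` are assumptions made for the example; the keys and file name the module actually uses may differ.

```python
import os
import tempfile

import numpy as np

# Hypothetical metric names and values, one value per signal bin.
# The real keys written by the module may differ.
metrics = {
    "mean": np.array([1.0, 2.0, 3.0]),
    "std": np.array([0.1, 0.2, 0.3]),
    "median": np.array([1.0, 2.0, 3.0]),
    "iqr": np.array([0.5, 0.5, 0.5]),
}

with tempfile.TemporaryDirectory() as log_dir:
    # Hypothetical output file name inside the log directory.
    out_path = os.path.join(log_dir, "bin_metrics.npz")

    # Each dict entry becomes one named array in the archive.
    np.savez(out_path, **metrics)

    # Read the archive back into a plain dict of arrays.
    with np.load(out_path) as npz:
        loaded = {name: npz[name] for name in npz.files}
```

Each entry of `loaded` is an array with one value per bin, so `loaded["mean"][i]` would be the mean of bin `i` across all input files.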

Typical usage example: $ python compute_metrics.py hdf5s.list chrom.sizes logs/

Functions

def compute_metrics(hdf5s: Dict[str, numpy.ndarray]) ‑> Dict[str, numpy.ndarray]

Computes various metrics for signal bins from HDF5 signals.

Args

hdf5s : Dict[str, np.ndarray]
Dictionary of signal data.

Returns

Dict[str, np.ndarray]
Dictionary of computed metrics.
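
A minimal sketch of how such a function could work, not the module's actual implementation. It assumes every array in the input dict is one-dimensional with the same number of bins, and that metrics are taken across files for each bin. The function name `compute_metrics_sketch` and the output keys are hypothetical.

```python
from typing import Dict

import numpy as np


def compute_metrics_sketch(hdf5s: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
    """Compute per-bin metrics across all signals (illustrative only)."""
    # Stack the signals into a matrix of shape (n_files, n_bins).
    # Assumption: all arrays are 1D and have the same length.
    signals = np.stack(list(hdf5s.values()), axis=0)

    # First and third quartiles of each bin, taken across files.
    q1, q3 = np.percentile(signals, [25, 75], axis=0)

    return {
        "mean": signals.mean(axis=0),
        "std": signals.std(axis=0),  # population standard deviation (ddof=0)
        "median": np.median(signals, axis=0),
        "iqr": q3 - q1,
    }


# Three hypothetical signals with three bins each.
example = {
    "a": np.array([0.0, 2.0, 4.0]),
    "b": np.array([2.0, 2.0, 8.0]),
    "c": np.array([4.0, 2.0, 0.0]),
}
result = compute_metrics_sketch(example)
```

Stacking first keeps every metric a single vectorized call along `axis=0`. For a large number of files, holding the full matrix in memory may not be practical, and a streaming approach would be needed instead.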

def main()

Entry point called from the command line. Edit this function to change the script's behavior.

def parse_arguments() ‑> argparse.Namespace

Parses the command-line arguments and returns them as a namespace.
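
For illustration, a parser for the three positional arguments listed in the module description might look like the sketch below. The argument names (`hdf5_list`, `chromsize`, `logdir`), the help strings, and the optional `argv` parameter are assumptions; the real parser may name or validate its arguments differently.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def parse_arguments_sketch(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the three positional arguments (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Compute per-bin signal metrics over a set of hdf5 files."
    )
    # Hypothetical argument names; only their order and meaning follow the docs.
    parser.add_argument("hdf5_list", type=Path, help="File containing hdf5 filenames.")
    parser.add_argument("chromsize", type=Path, help="File containing chromosome sizes.")
    parser.add_argument("logdir", type=Path, help="Directory for log outputs.")
    # With argv=None, argparse falls back to sys.argv[1:].
    return parser.parse_args(argv)


# Mirrors the typical usage example from the module description.
args = parse_arguments_sketch(["hdf5s.list", "chrom.sizes", "logs/"])
```

Accepting an explicit `argv` list makes the parser easy to exercise in tests without touching `sys.argv`.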