Module epiclass.utils.bed_utils

This module provides utilities for manipulating and analyzing genomic data, such as mapping between genomic ranges and bins and writing genomic data to bedgraph or .bed files.

The module provides functions to:

- Compute the size of a concatenated genome based on the resolution of each chromosome.
- Verify if a given resolution is coherent with the input size of the network.
- Convert values to a bedgraph format.
- Write given bed ranges to a .bed file.
- Compute the cumulative bin positions at the start of each chromosome.
- Convert multiple global genome bins to chromosome ranges.
- Convert multiple chromosome ranges to global genome bins.
- Generate new random bed files.

Please note: The function values_to_bedgraph() is not yet implemented and will raise a NotImplementedError when invoked.

Functions

def assert_correct_resolution(chroms, resolution, signal_length)

Raise AssertionError if the given resolution is not coherent with the input size of the network.
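As an illustration of what such a coherence check might look like, the sketch below assumes that each chromosome is binned separately, that a partial last bin counts as one bin (ceiling division), and that the network input size must equal the total bin count. The rounding rule and the helper names (sketch_concat_size, sketch_assert_resolution) are assumptions made for this example, not the module's actual implementation.

```python
from typing import List, Tuple


def sketch_concat_size(chroms: List[Tuple[str, int]], resolution: int) -> int:
    """Total number of bins, assuming per-chromosome binning with ceiling division."""
    return sum(-(-length // resolution) for _, length in chroms)


def sketch_assert_resolution(
    chroms: List[Tuple[str, int]], resolution: int, signal_length: int
) -> None:
    """Raise AssertionError when the expected bin count and the signal length differ."""
    expected = sketch_concat_size(chroms, resolution)
    assert expected == signal_length, (
        f"Expected {expected} bins at resolution {resolution}, got {signal_length}."
    )


# Toy genome: the chromosome lengths are made up for the example.
toy_chroms = [("chr1", 250), ("chr2", 100)]
print(sketch_concat_size(toy_chroms, 100))  # 3 bins for chr1 + 1 bin for chr2 = 4
sketch_assert_resolution(toy_chroms, 100, 4)  # passes silently
```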

def bed_ranges_to_bins(ranges: List[Tuple[str, int, int]], chroms: List[Tuple[str, int]], resolution: int) ‑> List[int]

Convert multiple chromosome ranges to global genome bins.

Args

ranges : List[Tuple[str, int, int]]
List of tuples, each containing (chromosome name, start position, end position).
chroms : List[Tuple[str, int]]
List of tuples (ordered by chromosome order), where each tuple contains a chromosome name and its length in base pairs.
resolution : int
The size of each bin.

Returns

List[int]
List of bin indexes in the genome.

Raises

IndexError
If any range is not in any chromosome.

Note

The function assumes that chromosomes in chroms are ordered in alphanumerical order (chr1, chr10, …). The function assumes that the binning was done per chromosome and then joined. The ranges are half-open intervals [start, end). The returned bin indexes are zero-based and span the entire genome considering the resolution.
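The mapping described in the note can be sketched as follows. This is a minimal illustration, not the module's code: it assumes ceiling division for the number of bins per chromosome, and the names sketch_cumulative_bins and sketch_ranges_to_bins are hypothetical.

```python
from typing import List, Tuple


def sketch_cumulative_bins(chroms: List[Tuple[str, int]], resolution: int) -> List[int]:
    """Global index of the first bin of each chromosome (assumes ceiling division)."""
    offsets, total = [], 0
    for _, length in chroms:
        offsets.append(total)
        total += -(-length // resolution)
    return offsets


def sketch_ranges_to_bins(
    ranges: List[Tuple[str, int, int]], chroms: List[Tuple[str, int]], resolution: int
) -> List[int]:
    """Map half-open (chrom, start, end) ranges to zero-based global bin indexes."""
    names = [name for name, _ in chroms]
    offsets = dict(zip(names, sketch_cumulative_bins(chroms, resolution)))
    lengths = dict(chroms)
    bins = []
    for name, start, end in ranges:
        if name not in offsets or not 0 <= start < end <= lengths[name]:
            raise IndexError(f"Range {(name, start, end)} is not in any chromosome.")
        first = start // resolution
        last = (end - 1) // resolution  # end is exclusive, so use the last covered base
        bins.extend(offsets[name] + i for i in range(first, last + 1))
    return bins


toy_chroms = [("chr1", 250), ("chr2", 100)]  # made-up lengths
print(sketch_cumulative_bins(toy_chroms, 100))  # [0, 3]
print(sketch_ranges_to_bins([("chr1", 100, 250), ("chr2", 0, 50)], toy_chroms, 100))  # [1, 2, 3]
```

Because binning is done per chromosome, the first bin of chr2 starts at global index 3 even though the last chr1 bin is only partially filled.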

def bed_to_bins(bed_source: str | Path | IO[bytes], chroms: List[Tuple[str, int]], resolution: int)

Convert the content of a .bed file to global genome bins.

Chains the read_bed_to_ranges() and bed_ranges_to_bins() functions.

Args

bed_source : Union[str, Path, IO[bytes]]
The path to the .bed file or an open file-like object.
chroms : List[Tuple[str, int]]
List of tuples (ordered by chromosome order), where each tuple contains a chromosome name and its length in base pairs.
resolution : int
The size of each bin, in bp.

Returns

List[int]
List of bin indexes in the genome.

def bins_to_bed_ranges(bin_indexes: Iterable[int], chroms: List[Tuple[str, int]], resolution: int) ‑> List[Tuple[str, int, int]]

Convert multiple global genome bins to chromosome ranges.

Args

bin_indexes : Iterable[int]
List of bin indexes in the genome.
chroms : List[Tuple[str, int]]
List of tuples (ordered by chromosome order), where each tuple contains a chromosome name and its length in base pairs.
resolution : int
The size of each bin.

Returns

List[Tuple[str, int, int]]
List of tuples, each containing (chromosome name, start position, end position).

Raises

IndexError
If any bin index is not in any chromosome, i.e., it falls beyond the total number of bins in the genome.

Note

The function assumes that chromosomes in chroms are ordered as they appear in the genome. The function assumes that the binning was done per chromosome and then joined. The bin indexes are zero-based and span the entire genome considering the resolution. The returned ranges are half-open intervals [start, end).
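The reverse mapping can be sketched in the same spirit. The example below is an assumption-laden illustration rather than the module's implementation: it uses ceiling division for bins per chromosome, clips the last bin of a chromosome to the chromosome length, and the name sketch_bins_to_ranges is hypothetical.

```python
from bisect import bisect_right
from typing import Iterable, List, Tuple


def sketch_bins_to_ranges(
    bin_indexes: Iterable[int], chroms: List[Tuple[str, int]], resolution: int
) -> List[Tuple[str, int, int]]:
    """Map zero-based global bin indexes to half-open (chrom, start, end) ranges."""
    # Global index of the first bin of each chromosome, then the genome-wide total.
    offsets = [0]
    for _, length in chroms:
        offsets.append(offsets[-1] + (-(-length // resolution)))
    total_bins = offsets.pop()

    ranges = []
    for idx in bin_indexes:
        if not 0 <= idx < total_bins:
            raise IndexError(f"Bin {idx} is not in any chromosome.")
        chrom_i = bisect_right(offsets, idx) - 1  # last chromosome starting at or before idx
        name, length = chroms[chrom_i]
        start = (idx - offsets[chrom_i]) * resolution
        end = min(start + resolution, length)  # assumption: clip the final partial bin
        ranges.append((name, start, end))
    return ranges


toy_chroms = [("chr1", 250), ("chr2", 100)]  # made-up lengths
print(sketch_bins_to_ranges([0, 2, 3], toy_chroms, 100))
# [('chr1', 0, 100), ('chr1', 200, 250), ('chr2', 0, 100)]
```

A binary search over the cumulative offsets keeps the lookup cheap when many bins are converted at once; a linear scan over the chromosomes would work equally well for small inputs.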

def compute_cumulative_bins(chroms: List[Tuple[str, int]], resolution: int) ‑> List[int]

Compute the cumulative bin positions at the start of each chromosome.

Args

chroms : List[Tuple[str, int]]
List of tuples (ordered by bedsort chromosome order), where each tuple contains a chromosome name and its length in base pairs.
resolution : int
The size of each bin.

Returns

List[int]
List of cumulative bin positions.

def create_new_random_bed(hdf5_size: int, desired_size: int, resolution: int, n_bed: int = 1, output_dir: Path = PosixPath('/home/local/USHERBROOKE/rabj2301/Projects/sources/epiclass/docs'))

Create new random bed files.

Args

hdf5_size : int
The total size of the HDF5 file (unique to each resolution).
desired_size : int
The desired size of the random bed file.
resolution : int
The resolution of the bed/hdf5 bins.

Returns

None

def pairwise(iterable)

s -> (s0,s1), (s1,s2), (s2, s3), …
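The docstring above matches the classic itertools "pairwise" recipe. A minimal sketch of that recipe is shown here under the hypothetical name pairwise_sketch; whether the module uses exactly this body is an assumption. On Python 3.10 and later, itertools.pairwise provides the same behaviour.

```python
from itertools import tee


def pairwise_sketch(iterable):
    """Yield overlapping pairs: s -> (s0, s1), (s1, s2), (s2, s3), ..."""
    first, second = tee(iterable)
    next(second, None)  # advance the second iterator by one element
    return zip(first, second)


# Example: consecutive cumulative bin positions give per-chromosome bin spans.
print(list(pairwise_sketch([0, 3, 4])))  # [(0, 3), (3, 4)]
```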

def predict_concat_size(chroms, resolution)

Compute the size of a concatenated genome from the resolution of each chromosome.

def read_bed_to_ranges(bed_source: str | Path | IO[bytes])

Read a .bed file and return the ranges as a list of tuples.

Args

bed_source : Union[str, Path, IO[bytes]]
The path to the .bed file or an open file-like object.

Returns

List[Tuple[str, int, int]]
List of tuples, each containing (chromosome name, start position, end position).

def values_to_bedgraph(values, chroms, resolution, bedgraph_path)

Write a bedgraph from a full genome values iterable (e.g. importance). The chromosome coordinates are zero-based, half-open (from 0 to N-1).

def write_to_bed(bed_ranges: List[Tuple[str, int, int]], bed_path: str | Path, verbose: bool = False)

Writes the given bed ranges to a .bed file.

Args

bed_ranges : List[Tuple[str, int, int]]
List of tuples, each containing (chromosome name, start position, end position).
bed_path : Union[str, Path]
The path where the .bed file should be written.

Note

The function doesn't return anything. It writes directly to a file.
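To make the file layout concrete, the sketch below writes ranges as tab-separated chromosome, start, and end columns (the first three BED fields) and reads them back. The helper names sketch_write_bed and sketch_read_bed are hypothetical, and the exact formatting used by write_to_bed() and read_bed_to_ranges() is an assumption.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple, Union


def sketch_write_bed(
    bed_ranges: List[Tuple[str, int, int]], bed_path: Union[str, Path]
) -> None:
    """Write one tab-separated 'chrom start end' line per range."""
    with open(bed_path, "w", encoding="utf8") as bed_file:
        for chrom, start, end in bed_ranges:
            bed_file.write(f"{chrom}\t{start}\t{end}\n")


def sketch_read_bed(bed_path: Union[str, Path]) -> List[Tuple[str, int, int]]:
    """Read the first three columns of each non-empty line as (chrom, start, end)."""
    ranges = []
    with open(bed_path, "r", encoding="utf8") as bed_file:
        for line in bed_file:
            if not line.strip():
                continue  # skip blank lines
            chrom, start, end = line.split()[:3]
            ranges.append((chrom, int(start), int(end)))
    return ranges


toy_ranges = [("chr1", 0, 100), ("chr2", 200, 300)]  # made-up ranges
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = Path(tmp_dir) / "toy.bed"
    sketch_write_bed(toy_ranges, toy_path)
    round_trip = sketch_read_bed(toy_path)
print(round_trip == toy_ranges)  # True
```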