mygfa: A Basic GFA Data Model

This library parses, represents, and emits pangenomic variation graphs in the GFA format. Basic use looks like this:

import mygfa
import sys
graph = mygfa.Graph.parse(sys.stdin)
seg_depths = {name: 0 for name in graph.segments}
for path in graph.paths.values():
    for step in path.segments:
        seg_depths[step.name] += 1

The mygfa.Graph class represents an entire GFA file. You can work down the object hierarchy from there to see everything that the file contains.
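
To make the hierarchy concrete, the following sketch performs the same depth computation in plain Python, without mygfa. It assumes tab-separated GFA 1 lines where S lines carry a segment name and sequence and P lines carry a comma-separated list of oriented steps. The helper segment_depths is hypothetical and is not part of the library.

```python
from typing import Dict, Iterable, List


def segment_depths(lines: Iterable[str]) -> Dict[str, int]:
    """Count how many path steps visit each segment (hypothetical helper)."""
    depths: Dict[str, int] = {}
    steps: List[str] = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            # S lines: S <name> <sequence>
            depths.setdefault(fields[1], 0)
        elif fields[0] == "P":
            # P lines: P <name> <comma-separated steps> <overlaps>
            steps.extend(fields[2].split(","))
    for step in steps:
        # Each step is a segment name followed by its orientation, + or -.
        name = step[:-1]
        depths[name] = depths.get(name, 0) + 1
    return depths


gfa = [
    "H\tVN:Z:1.0",
    "S\t1\tAAAA",
    "S\t2\tGG",
    "S\t3\tTTT",
    "L\t1\t+\t2\t+\t0M",
    "P\tx\t1+,2+\t*",
    "P\ty\t1+,2-,1+\t*",
]
print(segment_depths(gfa))  # {'1': 3, '2': 2, '3': 0}
```

Segment 3 appears in no path, so its depth stays at zero. With mygfa, the parsed Graph object gives you the same information through graph.segments and graph.paths.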

mygfa is on PyPI, so you can install it with pip install mygfa.

API Reference

Simple GFA parsing, printing, and pre-processing in Python.

class mygfa.Graph(headers: List[Header], segments: Dict[str, Segment], links: List[Link], paths: Dict[str, Path])

An entire GFA file.

headers: List[Header]

Header (H) lines in the GFA file.

segments: Dict[str, Segment]

The sequence fragments that make up the graph.

links: List[Link]

The edges between (oriented) segments.

paths: Dict[str, Path]

Named walks through the graph’s edges.

classmethod parse(infile: TextIO) Graph

Parse a GFA file.

emit(outfile: TextIO, showlinks: bool = True) None

Emit a GFA file.

class mygfa.Segment(name: str, seq: Strand)

A GFA segment is a nucleotide sequence.

name: str

The segment’s name as declared in the GFA file.

seq: Strand

The nucleotide sequence for this segment.

classmethod parse_inner(name: str, seq: str) Segment

Parse a GFA segment, assuming that the name and sequence have already been extracted.

classmethod parse(fields: List[str]) Segment

Parse a GFA segment.

revcomp() Segment

Returns the reverse complement of this segment.

class mygfa.Link(from_: Handle, to_: Handle, overlap: Alignment)

A GFA link is an edge connecting two handles.

from_: Handle

The edge’s source vertex.

to_: Handle

The edge’s sink vertex.

overlap: Alignment

The CIGAR overlap between the two vertices.

classmethod parse_inner(from_: str, from_ori: str, to_: str, to_ori: str, overlap: str) Link

Parse a GFA link, assuming that the key elements have already been extracted.

classmethod parse(fields: List[str]) Link

Parse a GFA link.

rev() Link

Return the link representing the reverse of this link. For example, AAAA --> GGGG becomes TTTT <-- CCCC.
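
One way to read the rev() description: reversing a link swaps its two endpoints and flips the orientation of each. The sketch below models handles as plain (name, orientation) tuples. The names HandleT, LinkT, flip, and reverse_link are illustrative only, and the overlap is left out because the way the library treats it on reversal is not stated here.

```python
from typing import Tuple

HandleT = Tuple[str, bool]  # (segment name, True for +, False for -)
LinkT = Tuple[HandleT, HandleT]  # (source handle, sink handle)


def flip(handle: HandleT) -> HandleT:
    """Return the same segment in the opposite orientation."""
    name, ori = handle
    return (name, not ori)


def reverse_link(link: LinkT) -> LinkT:
    """Swap the endpoints and flip both orientations."""
    src, dst = link
    return (flip(dst), flip(src))


forward = (("a", True), ("b", True))  # a+ to b+
print(reverse_link(forward))  # (('b', False), ('a', False))
```

Reversing twice returns the original link, which is a useful sanity check for any implementation of this operation.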

class mygfa.Path(name: str, segments: List[Handle], olaps: List[Alignment] | None)

A GFA path is a walk through the graph.

name: str

The path’s name as declared in the GFA file.

segments: List[Handle]

The sequence of steps that make up the path.

olaps: List[Alignment] | None

The overlaps between steps in the path.

classmethod parse_inner(name: str, seq: str, overlaps: str) Path

Parse a GFA path, assuming that the name, sequence and overlaps have already been extracted.

classmethod parse(fields: List[str]) Path

Parse a GFA path.

Extract the name, seq, and overlaps, and dispatch to the parse_inner helper.

drop_overlaps() Path

Return a copy of this path without overlaps.
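
As an illustration of what Path.parse has to pull apart, the sketch below splits the fields of a GFA 1 path (P) line. It assumes that fields is the tab-split line including the leading P record type, and that an overlaps field of * means that no overlaps are given. The function parse_path_fields is hypothetical, and it returns plain tuples and strings rather than the library's Handle and Alignment objects.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, bool]  # (segment name, True for +, False for -)


def parse_path_fields(
    fields: List[str],
) -> Tuple[str, List[Step], Optional[List[str]]]:
    """Split a P line into its name, oriented steps, and overlaps."""
    _, name, seq, overlaps = fields[:4]
    # Each step is a segment name with a trailing + or - orientation.
    steps = [(step[:-1], step[-1] == "+") for step in seq.split(",")]
    # "*" stands for "no overlaps given".
    olaps = None if overlaps == "*" else overlaps.split(",")
    return name, steps, olaps


line = "P\tx\t1+,2-,3+\t*"
print(parse_path_fields(line.split("\t")))
# ('x', [('1', True), ('2', False), ('3', True)], None)
```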

class mygfa.Handle(name: str, ori: bool)

A specific orientation for a segment, referenced by name.

name: str

A segment’s name.

ori: bool

The orientation: True for forward (+), False for backward (-).

classmethod parse(seg: str, ori: str) Handle

Parse a Handle.

rev() Handle

Return the handle representing the complement of this handle.

linkstr() str

Return the string form of this handle that link lines use.
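
The sketch below shows one possible shape for a handle type with the same fields. SketchHandle is a stand-in written for this example, not the library's class; raising ValueError on an unknown orientation and the name-plus-sign string form are both assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchHandle:
    """Illustrative stand-in for an oriented segment reference."""

    name: str
    ori: bool  # True for forward (+), False for backward (-)

    @classmethod
    def parse(cls, seg: str, ori: str) -> "SketchHandle":
        # Assumption: anything other than + or - is rejected.
        if ori not in ("+", "-"):
            raise ValueError(f"unknown orientation: {ori!r}")
        return cls(seg, ori == "+")

    def rev(self) -> "SketchHandle":
        # Same segment, opposite orientation.
        return SketchHandle(self.name, not self.ori)

    def __str__(self) -> str:
        return f"{self.name}{'+' if self.ori else '-'}"


h = SketchHandle.parse("7", "-")
print(h, h.rev())  # 7- 7+
```

Making the dataclass frozen keeps handles hashable, so they can serve as dictionary keys or set members when indexing a graph.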

class mygfa.Strand

A strand is a string that contains only A, T, G, C, or N.

revcomp() Strand

Returns the reverse complement of this strand.
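
For readers new to the operation, a reverse complement swaps each base for its partner (A with T, G with C, N unchanged) and reverses the result. The following is a standalone sketch on plain strings, not the library's implementation; reverse_complement is a hypothetical name.

```python
# Translation table mapping each base to its complement; N stays N.
COMPLEMENT = str.maketrans("ATGCN", "TACGN")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("AACGN"))  # NCGTT
```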

chop(choplen: int) List[Strand]

Chop this strand into pieces of length choplen or less.
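
Chopping can be pictured as slicing the sequence at fixed intervals, with a shorter final piece when the length is not a multiple of choplen. This is a standalone sketch on plain strings; chop_sequence is a hypothetical name, and rejecting a non-positive choplen is an assumption.

```python
from typing import List


def chop_sequence(seq: str, choplen: int) -> List[str]:
    """Split seq into consecutive pieces of at most choplen characters."""
    if choplen <= 0:
        raise ValueError("choplen must be positive")
    return [seq[i:i + choplen] for i in range(0, len(seq), choplen)]


print(chop_sequence("AACGTTG", 3))  # ['AAC', 'GTT', 'G']
```

Concatenating the pieces in order reproduces the original sequence.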

classmethod parse(string: str) Strand

Parse a strand.

class mygfa.Alignment(ops: List[Tuple[int, AlignOp]])

CIGAR representation of a sequence alignment.

classmethod parse(cigar: str) Alignment

Parse a CIGAR string, which looks like 3M7N4M.
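
A CIGAR string is a run of length-and-operator pairs, which matches the List[Tuple[int, AlignOp]] shape of ops. The sketch below parses one into (length, operator letter) tuples using plain strings in place of AlignOp. The function parse_cigar and its strict validation are assumptions made for illustration, not the library's behavior.

```python
import re
from typing import List, Tuple

# Operator letters defined for CIGAR strings in the SAM specification.
CIGAR_OPS = "MIDNSHPX="


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Parse a CIGAR string into (length, operator) pairs."""
    # Reject anything that is not a sequence of <digits><operator> pairs.
    if not re.fullmatch(rf"(\d+[{CIGAR_OPS}])*", cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    pairs = re.findall(rf"(\d+)([{CIGAR_OPS}])", cigar)
    return [(int(count), op) for count, op in pairs]


print(parse_cigar("3M7N4M"))  # [(3, 'M'), (7, 'N'), (4, 'M')]
```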

class mygfa.AlignOp(value)

An operator in an Alignment.