Segmentation & event alignment benchmark


Illustration of Nanopore signal segmentation and event alignment.

After base calling, we can get a base-called sequence that will align with the reference sequence. The segmentation and event alignment task inputs the raw signal and reference sequence. It would segment the raw signal into small fragments and align it with the reference sequence. So one small fragment and its corresponding nucleotide would be called an "event".

For each fragment of the raw signal, we can get its mean and standard deviation, which are used to evaluate the task.

  Segmentation and event alignment benchmark models introduction

Tombo re-squiggle

The re-squiggle algorithm is the basis for the Tombo framework. The re-squiggle algorithm takes as input a read file (in FAST5 format) containing raw signal and associated base calls. The base calls are mapped to a genome or transcriptome reference and then the raw signal is assigned to the reference sequence based on an expected current level model. [1]

Full read adaptive banded shifted half-normal scores. (figure source: [1])
Nanopolish eventalign

The hidden Markov model Nanopolish designed for the consensus problem had 5-mers of a proposed consensus sequence as the backbone of the HMM, with additional states and transitions to handle the skipping/splitting artefacts. Nanopolish used this HMM to calculate a consensus sequence from a set of reads. If we make a reference genome the backbone of the HMM, we can use it to align events to the reference. [2]

SegPore eventalign

SegPore extends the model kmer table by estimating the distribution parameters for both the normal and modified states of each kmer.

Illustration of SegPore eventalign. (figure source: [3])
Reference

[1] https://nanoporetech.github.io/tombo/resquiggle.html

[2] https://simpsonlab.github.io/2015/04/08/eventalign

[3] https://www.biorxiv.org/content/10.1101/2024.01.11.575207