Base calling benchmark


Base Calling

Like automatic speech recognition (ASR), base calling is a sequence-to-sequence (seq2seq) task. In ASR, the input is a human speech signal and the output is a text sequence. In base calling, the input is the raw current signal from Oxford Nanopore sequencing and the output is a nucleotide sequence (A, C, G, and T for DNA, with U in place of T for RNA).


  Base Calling ground truth acquisition

We first perform base calling, then align the base-called sequences to the reference genome and run Nanopolish “eventalign”. Finally, we extract the matched raw signal segments and reference sequence fragments and use them as the ground truth.
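The final extraction step can be sketched roughly as follows. This is a minimal illustration, not the pipeline used to build the benchmark. It assumes the eventalign output was written with signal indices, so that each row carries `start_idx` and `end_idx` columns, and it reads only a reduced subset of columns. The function names `group_events_by_read` and `extract_pair`, the example table, and the placeholder signal are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def group_events_by_read(tsv_text):
    """Group eventalign-style rows by read_index.

    Assumes tab-separated text with at least the columns position,
    reference_kmer, read_index, start_idx and end_idx (an assumption
    about how eventalign was run, not a guarantee).
    """
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    per_read = defaultdict(list)
    for row in rows:
        per_read[row["read_index"]].append(
            (int(row["position"]), row["reference_kmer"],
             int(row["start_idx"]), int(row["end_idx"]))
        )
    return dict(per_read)


def extract_pair(events, raw_signal):
    """Return (signal_segment, reference_fragment) for one read.

    Only the first gap-free run of reference positions is used, and the
    k-mers are assumed to be reported on the forward reference strand.
    """
    events = sorted(events, key=lambda e: e[0])
    first_pos, fragment, lo, hi = events[0]
    prev_pos = first_pos
    for pos, kmer, start, end in events[1:]:
        if pos > prev_pos + 1:
            break  # stop at the first gap in reference positions
        if pos == prev_pos + 1:
            fragment += kmer[-1]  # each new position adds one base
            prev_pos = pos
        lo, hi = min(lo, start), max(hi, end)
    return raw_signal[lo:hi], fragment


# Hypothetical example: three reference positions, four events.
EXAMPLE_ROWS = [
    ["contig", "position", "reference_kmer", "read_index", "start_idx", "end_idx"],
    ["chr1", "100", "ACGTA", "0", "0", "5"],
    ["chr1", "100", "ACGTA", "0", "5", "9"],
    ["chr1", "101", "CGTAC", "0", "9", "14"],
    ["chr1", "102", "GTACC", "0", "14", "20"],
]
EXAMPLE_TSV = "\n".join("\t".join(r) for r in EXAMPLE_ROWS) + "\n"

if __name__ == "__main__":
    reads = group_events_by_read(EXAMPLE_TSV)
    raw_signal = list(range(30))  # placeholder for the raw current samples
    segment, fragment = extract_pair(reads["0"], raw_signal)
    print(len(segment), fragment)
```

Taking the minimum start index and maximum end index keeps the sketch independent of whether the signal indices increase or decrease along the reference. A real pipeline would also need to handle strand, gaps, and reading the raw signal from the sequencing files.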


  DNA base calling benchmark

Download the benchmark datasets and results (NA12878_benchmark.tar.gz, 70.6 GB).

  RNA base calling benchmark

Download the benchmark datasets and results (hek293t_benchmark.tar.gz, 81.9 GB).
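Because both archives are tens of gigabytes, it can help to look at the first few entries before unpacking everything. The following is a small sketch using only Python's standard `tarfile` module. The helper name `list_archive` is hypothetical, and the internal layout of the archives is not described here, so the output is whatever the archive happens to contain.

```python
import tarfile


def list_archive(path, limit=20):
    """Return up to `limit` member names from a .tar.gz without extracting."""
    names = []
    with tarfile.open(path, "r:gz") as archive:
        for member in archive:  # streams members one at a time
            names.append(member.name)
            if len(names) >= limit:
                break
    return names


if __name__ == "__main__":
    # Assumes the archive has already been downloaded to the working directory.
    for name in list_archive("hek293t_benchmark.tar.gz"):
        print(name)
```

Iterating over the archive rather than calling `getnames()` avoids scanning the whole file when only the first entries are of interest.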

  Base calling benchmark models overview