Base calling benchmark
Like automatic speech recognition (ASR), base calling is a sequence-to-sequence (seq2seq) task. In ASR, the input is a human speech signal and the output is a text sequence. In base calling, the input is the ionic current signal from Oxford Nanopore sequencing and the output is a nucleotide sequence (A, C, G, and T for DNA, with U in place of T for RNA).
Base Calling ground truth acquisition |
---|
We first base call the raw reads, then align the base-called sequences to the reference genome and run Nanopolish “eventalign”. Finally, we extract the matched raw signal segments and the corresponding reference sequence fragments as the ground truth.
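
The extraction step can be illustrated with a minimal sketch. It is not the pipeline used to build the benchmark. It assumes the eventalign table was written with the `--signal-index` option, so that each row carries `start_idx` and `end_idx` columns, and optionally with `--print-read-names`. The function names `load_eventalign` and `contiguous_fragments` are hypothetical, and strand handling (reverse-strand DNA reads, 3'-to-5' RNA signal order) is deliberately left out.

```python
import csv
import io
from collections import defaultdict


def load_eventalign(handle):
    """Group eventalign rows by read.

    Returns {read_id: [(position, reference_kmer, start_idx, end_idx), ...]}.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    reads = defaultdict(list)
    for row in reader:
        # read_name is present with --print-read-names, otherwise read_index.
        read_id = row.get("read_name") or row["read_index"]
        reads[read_id].append(
            (
                int(row["position"]),
                row["reference_kmer"],
                int(row["start_idx"]),
                int(row["end_idx"]),
            )
        )
    return reads


def contiguous_fragments(events):
    """Merge events into runs of consecutive reference positions.

    Each fragment pairs a reference sequence with the span of raw signal
    indices covered by its events. Whether end_idx is inclusive should be
    checked against the Nanopolish version in use.
    """
    # Several events can map to the same reference position; collapse them.
    by_pos = {}
    for pos, kmer, start, end in events:
        if pos in by_pos:
            _, s0, e0 = by_pos[pos]
            by_pos[pos] = (kmer, min(s0, start), max(e0, end))
        else:
            by_pos[pos] = (kmer, start, end)

    fragments = []
    current = None
    for pos in sorted(by_pos):
        kmer, start, end = by_pos[pos]
        if current is not None and pos == current["last_pos"] + 1:
            # Adjacent k-mers overlap by k-1 bases, so only one base is new.
            current["seq"] += kmer[-1]
            current["sig_start"] = min(current["sig_start"], start)
            current["sig_end"] = max(current["sig_end"], end)
            current["last_pos"] = pos
        else:
            if current is not None:
                fragments.append(current)
            current = {
                "ref_start": pos,
                "last_pos": pos,
                "seq": kmer,
                "sig_start": start,
                "sig_end": end,
            }
    if current is not None:
        fragments.append(current)
    return fragments


# Toy input with only the columns this sketch reads (values are made up).
example = io.StringIO(
    "position\treference_kmer\tread_name\tstart_idx\tend_idx\n"
    "100\tACGTAC\tread1\t0\t10\n"
    "100\tACGTAC\tread1\t10\t18\n"
    "101\tCGTACG\tread1\t18\t30\n"
    "105\tGGGTTT\tread1\t50\t60\n"
)
for frag in contiguous_fragments(load_eventalign(example)["read1"]):
    print(frag["ref_start"], frag["seq"], frag["sig_start"], frag["sig_end"])
```

Each resulting fragment gives a reference subsequence and the raw signal index range it covers, which can then be used to slice the read's raw signal array. Splitting at gaps in the reference positions is a design choice of this sketch, made so that every fragment is a gap-free signal/sequence pair.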
DNA base calling benchmark |
---|
Download the benchmark datasets and results (NA12878_benchmark.tar.gz, 70.6 GB).
RNA base calling benchmark |
---|
Download the benchmark datasets and results (hek293t_benchmark.tar.gz, 81.9 GB).
Base calling benchmark models overview |
---|