Nanopore signal processing


  Raw signal conversion

     

The illustration above depicts the structure of a single-fast5 file from Nanopore sequencing. In the left figure, the "Signal" dataset holds the current passing through the pore, stored as raw int16 values. Oxford Nanopore Technologies (ONT) employs different pores (proteins) in its various products. A single flow cell in the sequencing device can contain between 512 and 2675 pores, each referred to as a channel. As shown in the right figure, the fast5 file also includes attributes of the channel through which the read passed: channel_number (the channel from which the read was acquired), digitisation (the number of quantisation levels in the Analog to Digital Converter (ADC)), offset (the ADC offset error), range (the full-scale measurement range in picoamperes), and sampling_rate (the sampling frequency of the ADC).

The raw signal can be converted into picoampere (pA) values using the attributes in the channel_id group:

signal_in_pico_ampere = (raw_signal_value + offset) * range / digitisation

After the conversion, the pA current signal is used for downstream tasks. More details about all attributes can be found in [5].
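The conversion above can be sketched in Python as follows. This is a minimal illustration, not code from any ONT tool: the function name raw_to_picoamperes, the parameter name range_pa, and the attribute values in the example are assumptions chosen for readability. In practice the offset, range, and digitisation values would be read from the channel_id group of the fast5 file.

```python
import numpy as np

def raw_to_picoamperes(raw_signal, offset, range_pa, digitisation):
    """Convert raw ADC samples to picoamperes.

    Implements: pA = (raw + offset) * range / digitisation
    (function and parameter names are hypothetical).
    """
    # Cast to float first so the int16 samples cannot overflow
    # when the offset is added.
    raw = np.asarray(raw_signal, dtype=np.float64)
    return (raw + offset) * range_pa / digitisation

# Made-up attribute values, for illustration only.
raw = np.array([400, 500, 600], dtype=np.int16)
pa = raw_to_picoamperes(raw, offset=10.0, range_pa=1024.0, digitisation=8192.0)
print(pa)
```

With these example values the scale factor range / digitisation is 0.125 pA per ADC level, so each raw sample is shifted by the offset and then multiplied by 0.125.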

  Signal standardization or normalization

Oxford Nanopore Technologies provides "standard" parameters for each kmer [3], as illustrated in the figure below. To improve the accuracy of analyses, some tools further standardize or normalize the signal based on these parameters. For example, Tombo subtracts the median and divides by the median absolute deviation (MAD), which centers the current signal at zero with unit MAD rather than mapping it to a fixed range. Nanopolish uses scaling parameters to account for per-read variations, while SegPore uses the polyA tail to standardize the signal for each read.

Tombo
Tombo shifts the signal by its median and scales it by its MAD (median absolute deviation) [4]:

norm_signal = (signal - median) / MAD
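A minimal NumPy sketch of this median/MAD normalization is shown below. It is not Tombo's own code: the function name median_mad_normalize and the example values are assumptions, and the MAD is taken here as the plain median of absolute deviations from the median, without any additional constant factor.

```python
import numpy as np

def median_mad_normalize(signal):
    """Shift by the median and scale by the MAD (hypothetical helper)."""
    signal = np.asarray(signal, dtype=np.float64)
    shift = np.median(signal)
    # MAD: median of the absolute deviations from the median.
    scale = np.median(np.abs(signal - shift))
    if scale == 0:
        raise ValueError("MAD is zero; the signal cannot be scaled")
    return (signal - shift) / scale

# Made-up pA values, for illustration only.
pa = np.array([80.0, 90.0, 100.0, 110.0, 150.0])
norm = median_mad_normalize(pa)
print(norm)
```

Because both the shift and the scale are medians, a single outlier such as the 150 pA sample above has little influence on the normalization parameters.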

Nanopolish

For each read, Nanopolish estimates a scale parameter to standardize the signal. To enable this feature, add the --scale-events option to the nanopolish eventalign command. More details can be found in [6].


SegPore
SegPore first detects the polyA tail and calculates its mean (polyA_mu) and standard deviation (polyA_sigma). These values are then used to standardize the signal:

stand_signal = [(signal - polyA_mu) / polyA_sigma] * POLYA_STANDARD_SIGMA + POLYA_STANDARD_MU

Here, POLYA_STANDARD_MU and POLYA_STANDARD_SIGMA are the mean and standard deviation of the kmer "AAAAA" in ONT's standard kmer table.
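The standardization formula can be sketched as below. This is an illustration under stated assumptions rather than SegPore's implementation: the function name standardize_with_polya is hypothetical, the polyA samples are assumed to have been located already (polyA detection is not shown), and the two POLYA_STANDARD_* values are placeholders, not the actual "AAAAA" entries from ONT's kmer table.

```python
import numpy as np

# Placeholder values for illustration only; the real values come
# from the "AAAAA" row of ONT's standard kmer table.
POLYA_STANDARD_MU = 110.0
POLYA_STANDARD_SIGMA = 1.0

def standardize_with_polya(signal, polya_signal,
                           standard_mu=POLYA_STANDARD_MU,
                           standard_sigma=POLYA_STANDARD_SIGMA):
    """Rescale a read so its polyA segment matches the standard kmer model.

    `polya_signal` holds the pA samples of the already detected polyA tail
    (hypothetical helper; detection itself is out of scope here).
    """
    signal = np.asarray(signal, dtype=np.float64)
    polya = np.asarray(polya_signal, dtype=np.float64)
    polya_mu = polya.mean()
    polya_sigma = polya.std()  # population standard deviation
    if polya_sigma == 0:
        raise ValueError("polyA standard deviation is zero")
    return (signal - polya_mu) / polya_sigma * standard_sigma + standard_mu

# Made-up samples: the polyA segment has mean 100 and standard deviation 2.
polya = np.array([98.0, 102.0, 98.0, 102.0])
read = np.array([100.0, 104.0, 90.0])
stand_signal = standardize_with_polya(read, polya)
print(stand_signal)
```

After this transformation, the polyA segment of every read has the same mean and standard deviation as the reference "AAAAA" kmer, so signals from different reads become comparable on a common scale.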

References

[1] https://github.com/nanoporetech/taiyaki/blob/master/docs/FILE_FORMATS.md

[2] Gamaarachchi, H., Samarakoon, H., Jenner, S.P. et al. Fast nanopore sequencing data analysis with SLOW5. Nat Biotechnol 40, 1026–1029 (2022).

[3] https://github.com/nanoporetech/kmer_models

[4] https://nanoporetech.github.io/tombo/resquiggle.html

[5] https://static-content.springer.com/esm/art%3A10.1038%2Fs41587-021-01147-4/MediaObjects/41587_2021_1147_MOESM1_ESM.pdf

[6] https://static-content.springer.com/esm/art%3A10.1038%2Fnmeth.4184/MediaObjects/41592_2017_BFnmeth4184_MOESM258_ESM.pdf