Dataset overview, statistics, download, and structure
BC: Base calling;
PD: PolyA Detection;
SA: Segmentation and Event Alignment;
MD: Modification Detection.
| Dataset | Publish Date | Accession Number | Species | Type | Sample | Flowcell_type | Sequencing_kit | BC | PD | SA | MD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ont_ployA_standard | 2018-09 | PRJEB28423 | Synthetic | RNA | 10xpolyA | flo-min106 | sqk-rna001 | ✔ | ✔ | ✔ | ✘ |
| | | | | | 15xpolyA | flo-min106 | sqk-rna001 | ✔ | ✔ | ✔ | ✘ |
| | | | | | 30xpolyA | flo-min106 | sqk-rna001 | ✔ | ✔ | ✔ | ✘ |
| | | | | | 60xpolyA | flo-min106 | sqk-rna001 | ✔ | ✔ | ✔ | ✘ |
| | | | | | 80xpolyA | flo-min106 | sqk-rna001 | ✔ | ✔ | ✔ | ✘ |
| | | | | | 100xpolyA | flo-min106 | sqk-rna001 | ✔ | ✔ | ✔ | ✘ |
| eGFP_polyA_DNA | 2019-06 | PRJEB31806 | Synthetic | cDNA | dna_rep1_sqklsk108_flipflop | flo-min106 | sqk-lsk108 | ✔ | ✔ | ✔ | ✘ |
| | | | | | dna_rep2_sqklsk109_flipflop | flo-min106 | sqk-lsk109 | ✔ | ✔ | ✔ | ✘ |
| eGFP_polyA_RNA | 2019-06 | PRJEB31806 | Synthetic | RNA | rna_rep1_sqkrna001_plus_rt | flo-min106 | sqk-rna001 | ✔ | ✔ | ✔ | ✘ |
| | | | | | rna_rep2_sqkrna001_plus_rt | flo-min106 | sqk-rna001 | ✔ | ✔ | ✔ | ✘ |
| | | | | | rna_rep3_sqkrna002_minus_rt | flo-min106 | sqk-rna002 | ✔ | ✔ | ✔ | ✘ |
| lambda_phage | 2021-03 | PRJNA926802 | lambda phage | DNA | VER5940 | flo-flg001 | sqk-lsk109 | ✔ | ✘ | ✔ | ✘ |
| NA12878 | 2019-06 | PRJEB23027 | Homo sapiens | DNA | FAB42828 | flo-min106 | sqk-lsk108 | ✔ | ✘ | ✔ | ✘ |
| | | | | | FAF04090 | flo-min106 | sqk-lsk108 | ✔ | ✘ | ✔ | ✘ |
| | | | | | FAF09968 | flo-min106 | sqk-lsk108 | ✔ | ✘ | ✔ | ✘ |
| curlcake | 2019-07 | PRJNA511582 | Synthetic | RNA | m6A-mod-rep1 | flo-min106 | sqk-rna001 | ✔ | ✘ | ✔ | m6A |
| | | | | | m6A-mod-rep2 | flo-min106 | sqk-rna001 | ✔ | ✘ | ✔ | m6A |
| | | | | | non-mod-rep1 | flo-min106 | sqk-rna001 | ✔ | ✘ | ✔ | ✘ |
| | | | | | non-mod-rep2 | flo-min106 | sqk-rna001 | ✔ | ✘ | ✔ | ✘ |
| scBY4741_m5C | 2021-06 | PRJNA563591 | Synthetic | RNA | m5C_modified | flo-min106 | sqk-rna001 | ✔ | ✘ | ✔ | m5C |
| scBY4741_hm5C | 2021-06 | PRJNA548268 | Synthetic | RNA | hm5C_modified | flo-min106 | sqk-rna001 | ✔ | ✘ | ✔ | hm5C |
| scBY4741_pU | 2021-02 | PRJNA549001 | Synthetic | RNA | pU_modified | flo-min106 | sqk-rna001 | ✔ | ✘ | ✔ | Ψ |
| hct116 | 2021-04 | PRJEB44348 | Homo sapiens | RNA | HCT-WT-rep1 | flo-min106 | sqk-rna002 | ✔ | ✘ | ✔ | m6A |
| | | | | | HCT-WT-rep2 | flo-min106 | sqk-rna002 | ✔ | ✘ | ✔ | m6A |
| | | | | | HCT-WT-rep3 | flo-min106 | sqk-rna002 | ✔ | ✘ | ✔ | m6A |
| hek293t_wt | 2021-01 | PRJEB40872 | Homo sapiens | RNA | HEK293T-WT-rep1 | flo-min106 | sqk-rna001 | ✔ | ✘ | ✔ | m6A |
| | | | | | HEK293T-WT-rep2 | flo-min106 | sqk-rna002 | ✔ | ✘ | ✔ | m6A |
| | | | | | HEK293T-WT-rep3 | flo-min106 | sqk-rna002 | ✔ | ✘ | ✔ | m6A |
| hek293t_ko | 2021-01 | PRJEB40872 | Homo sapiens | RNA | HEK293T-Mettl3-KO-rep1 | flo-min106 | sqk-rna001 | ✔ | ✘ | ✔ | ✘ |
| | | | | | HEK293T-Mettl3-KO-rep2 | flo-min106 | sqk-rna002 | ✔ | ✘ | ✔ | ✘ |
| | | | | | HEK293T-Mettl3-KO-rep3 | flo-min106 | sqk-rna002 | ✔ | ✘ | ✔ | ✘ |
| mESCs_eligos | 2020-10 | PRJNA497103 | Mus musculus | RNA | mESCs_Mettl3_WT | flo-min106 | sqk-rna002 | ✔ | ✘ | ✔ | m6A |
| | | | | | mESCs_Mettl3_KO | flo-min106 | sqk-rna002 | ✔ | ✘ | ✔ | ✘ |
| ecoli_eligos | 2020-08 | PRJNA497103 | Escherichia coli | RNA | IVT_Inosine | flo-min106 | sqk-rna002 | ✔ | ✘ | ✔ | Inosine |
| | | | | | IVT_m5C | flo-min106 | sqk-rna002 | ✔ | ✘ | ✔ | m5C |
| | | | | | IVT_m6A | flo-min106 | sqk-rna002 | ✔ | ✘ | ✔ | m6A |
| | | | | | IVT_normalA | flo-min106 | sqk-rna002 | ✔ | ✘ | ✔ | ✘ |
| | | | | | IVT_normalC | flo-min106 | sqk-rna002 | ✔ | ✘ | ✔ | ✘ |
| dinopore_ivt | 2023-01 | SRP363295 | Synthetic | RNA | gBlock_pureI | flo-min106 | sqk-rna001 | ✔ | ✘ | ✔ | Inosine |
| | | | | | gBlock_G | flo-min106 | sqk-rna001 | ✔ | ✘ | ✔ | ✘ |
| dinopore_xenopus | 2022-04 | SRP363295 | Xenopus laevis | RNA | rep3_stage1_20200812 | flo-min106 | sqk-rna002 | ✔ | ✘ | ✔ | Inosine |
| | | | | | rep3_stage1_20201005 | flo-min106 | sqk-rna002 | ✔ | ✘ | ✔ | Inosine |
| | | | | | rep3_stage9_20200812 | flo-min106 | sqk-rna002 | ✔ | ✘ | ✔ | Inosine |
| | | | | | rep3_stage9_20201008 | flo-min106 | sqk-rna002 | ✔ | ✘ | ✔ | Inosine |
* Base-called sequences were produced with Guppy 6.0.1.
| Dataset | Type | Raw data size | Sample | # multi_fast5 | # reads | Avg. current signal length | Avg. base sequence length* |
|---|---|---|---|---|---|---|---|
| ont_ployA_standard | RNA | 81 GB | 10xpolyA | 24 | 92,428 | 59001.85 | 1207.22 |
| | | | 15xpolyA | 23 | 91,084 | 56518.49 | 1216.28 |
| | | | 30xpolyA | 16 | 63,886 | 54111.54 | 1192.65 |
| | | | 60xpolyA | 28 | 108,314 | 57397.07 | 1172.57 |
| | | | 80xpolyA | 103 | 409,634 | 47166.28 | 859.32 |
| | | | 100xpolyA | 70 | 279,895 | 61938.01 | 1173.39 |
| eGFP_polyA_DNA | cDNA | 43 GB | dna_rep1_sqklsk108_flipflop | 121 | 484,000 | 8956.69 | 763.46 |
| | | | dna_rep2_sqklsk109_flipflop | 71 | 280,428 | 21619.23 | 1667.14 |
| eGFP_polyA_RNA | RNA | 529 GB | rna_rep1_sqkrna001_plus_rt | 231 | 922,826 | 57068.67 | 1126.53 |
| | | | rna_rep2_sqkrna001_plus_rt | 364 | 1,452,042 | 50103.37 | 928.41 |
| | | | rna_rep3_sqkrna002_minus_rt | 149 | 592,571 | 30888.61 | 465.02 |
| lambda_phage | DNA | 19 GB | VER5940 | 114 | 113,514 | 116272.62 | 9561.99 |
| NA12878 | DNA | 68 GB | FAB42828 | 9 | 33,633 | 131148.91 | 6810.35 |
| | | | FAF04090 | 24 | 62,833 | 509826.89 | 17801.22 |
| | | | FAF09968 | 6 | 21,947 | 334920.97 | 53615.01 |
| curlcake | RNA | 584 GB | m6A-mod-rep1 | 34 | 134,374 | 69745.77 | 850.16 |
| | | | m6A-mod-rep2 | 160 | 638,860 | 58341.88 | 835.01 |
| | | | non-mod-rep1 | 17 | 66,736 | 57930.6 | 866.98 |
| | | | non-mod-rep2 | 212 | 846,595 | 61719.51 | 1066.53 |
| scBY4741_m5C | RNA | 37 GB | m5C_modified | 104 | 415,453 | 40792.42 | 539.89 |
| scBY4741_hm5C | RNA | 17 GB | hm5C_modified | 28 | 111,015 | 81528.2 | 1022.88 |
| scBY4741_pU | RNA | 4 GB | pU_modified | 11 | 42,386 | 46652.89 | 475.18 |
| hct116 | RNA | 346 GB | HCT-WT-rep1 | 247 | 987,488 | 66363.12 | 1217.43 |
| | | | HCT-WT-rep2 | 254 | 1,015,893 | 57524.51 | 1023.03 |
| | | | HCT-WT-rep3 | 419 | 1,673,394 | 65628.29 | 1153.23 |
| hek293t_wt | RNA | 224 GB | HEK293T-WT-rep1 | 261 | 1,040,661 | 60169.77 | 939.8 |
| | | | HEK293T-WT-rep2 | 349 | 1,396,000 | 54077.71 | 1077.61 |
| | | | HEK293T-WT-rep3 | 133 | 513,561 | 56785.55 | 1005.06 |
| hek293t_ko | RNA | 356 GB | HEK293T-Mettl3-KO-rep1 | 373 | 1,490,210 | 58140.7 | 952.63 |
| | | | HEK293T-Mettl3-KO-rep2 | 454 | 1,815,589 | 52569.78 | 993.85 |
| | | | HEK293T-Mettl3-KO-rep3 | 421 | 1,677,075 | 50185.96 | 970.32 |
| mESCs_eligos | RNA | 220 GB | mESCs_Mettl3_WT | 791 | 3,163,286 | 33202.35 | 526.23 |
| | | | mESCs_Mettl3_KO | 382 | 1,527,561 | 28350.74 | 437.70 |
| ecoli_eligos | RNA | 214 GB | IVT_Inosine | 203 | 811,953 | 32978.04 | 845.43 |
| | | | IVT_m5C | 144 | 573,674 | 45397.06 | 719.52 |
| | | | IVT_m6A | 371 | 1,482,437 | 41642.13 | 708.29 |
| | | | IVT_normalA | 96 | 383,209 | 33499.75 | 620.83 |
| | | | IVT_normalC | 114 | 452,806 | 44566.76 | 731.75 |
| dinopore_ivt | RNA | 15 GB | gBlock_pureI | 47 | 165,628 | 29869.74 | 450.32 |
| | | | gBlock_G | 43 | 150,405 | 32047.08 | 641.17 |
| dinopore_xenopus | RNA | 399 GB | rep3_stage1_20200812 | 363 | 1,451,289 | 46688.45 | 917.23 |
| | | | rep3_stage1_20201005 | 454 | 1,812,200 | 27213.72 | 532.63 |
| | | | rep3_stage9_20200812 | 391 | 1,560,032 | 44621.79 | 894.37 |
| | | | rep3_stage9_20201008 | 313 | 1,251,130 | 31185.45 | 448.15 |
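The two average-length columns can be related by a simple ratio. The sketch below divides the average current signal length by the average base sequence length for two samples copied from the statistics table, giving a rough number of signal samples per called base. The function name `samples_per_base` and the `STATS` dictionary are illustrative only. The result is a ratio of averages, not a per-read measurement, and the raw signal presumably also covers non-base-called portions such as adapters, so treat it as a coarse sanity check.

```python
# Values copied from the statistics table:
# sample -> (avg. current signal length, avg. base sequence length)
STATS = {
    "10xpolyA": (59001.85, 1207.22),   # RNA sample from ont_ployA_standard
    "VER5940": (116272.62, 9561.99),   # DNA sample from lambda_phage
}


def samples_per_base(avg_signal_len, avg_base_len):
    """Return the ratio of the two table averages (signal samples per called base)."""
    if avg_base_len <= 0:
        raise ValueError("average base sequence length must be positive")
    return avg_signal_len / avg_base_len


for sample, (signal_len, base_len) in STATS.items():
    print(f"{sample}: {samples_per_base(signal_len, base_len):.2f} samples per base")
```

With these inputs the RNA sample has a noticeably higher ratio than the DNA sample.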
All raw datasets can be downloaded from the Raw Data Download page, which provides the raw fast5 files directly.
Processed dataset download
All processed datasets can be downloaded from Zenodo.
The dataset is updated over time; the current version is 1.0.0. We continue to add processed results from additional software tools.
Processed dataset structure
The general structure of each dataset is shown in the following picture (the example dataset has two samples, sample_0 and sample_1). However, not all datasets contain all modules. For example, Tailfindr and SegPore cannot process DNA data, so DNA datasets usually lack the 3_tailfindr and 6_segpore modules. Some datasets contain only a few modules, and we are continuing to add the rest.
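Because module coverage varies between datasets, it can help to check which modules are present before running downstream analysis. The following is a minimal sketch, not part of the dataset tooling. It assumes that each module is a directory named after the module (for example 3_tailfindr or 6_segpore) located either directly under the dataset root or one level below it, such as inside a sample directory. That layout is an assumption, as is the helper name `missing_modules`; adjust the search depth to match the actual structure.

```python
from pathlib import Path
import tempfile


def missing_modules(dataset_dir, expected):
    """Return the expected module names with no matching directory.

    Assumption: module directories sit directly under dataset_dir or one
    level below it (e.g. inside a sample directory).
    """
    root = Path(dataset_dir)
    candidates = list(root.glob("*")) + list(root.glob("*/*"))
    present = {p.name for p in candidates if p.is_dir()}
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    # Demonstration on a throwaway directory that mimics a dataset
    # containing only the Tailfindr module for one sample.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "sample_0" / "3_tailfindr").mkdir(parents=True)
        print(missing_modules(tmp, ["3_tailfindr", "6_segpore"]))
```

For a DNA dataset, both names would be expected to come back as missing, which matches the note above about Tailfindr and SegPore.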