Title

自适应采样技术在纳米孔测序中的研究与应用 (Research and Application of Adaptive Sampling Technology in Nanopore Sequencing)

Alternative title (English)
RESEARCH AND APPLICATION OF ADAPTIVE SAMPLING TECHNOLOGY IN NANOPORE SEQUENCING
Name
李骏垚
Name (pinyin)
LI Junyao
Student ID
12132451
Degree type
Master's
Degree program
0856 Materials and Chemical Engineering
Discipline / professional degree category
0856 Materials and Chemical Engineering
Supervisor
李毅 (Li Yi)
Supervisor's affiliation
Shenzhen-Hong Kong School of Microelectronics (深港微电子学院)
Thesis defense date
2023-05-15
Thesis submission date
2023-06-29
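Degree-granting institution
Southern University of Science and Technology (SUSTech)
Place of degree conferral
Shenzhen
Abstract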

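This thesis focuses on adaptive sampling for Oxford Nanopore Technologies (ONT) nanopore sequencing. Adaptive sampling decides, in real time and dynamically, which DNA fragments to sequence, so that researchers can target specific genomic regions or specific target DNA and thereby improve the overall quality of the sequencing output. The technique has broad potential applications in genome assembly, pathogen detection, and precision medicine. However, existing adaptive sampling methods remain limited in accuracy and real-time performance, which restricts their effectiveness in practice.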
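The record does not describe how a keep-or-reject decision is made for each read. As a minimal sketch, assuming a classifier that outputs one raw score per barcode class for the first part of a read's signal, the real-time decision can be expressed as below. The names `decide`, `targets`, `threshold` and the three outcome strings are hypothetical and are not part of ONT's Read Until interface; in ONT's mechanism a rejected molecule is physically ejected from the pore by reversing the voltage.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw classifier scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, targets, threshold=0.9):
    """Return 'sequence', 'eject', or 'wait' for one read prefix.

    logits:    raw scores, one per barcode class (index = barcode id)
    targets:   set of barcode ids the experiment wants to keep
    threshold: minimum top-class probability before acting (hypothetical value)
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        # Not confident yet: let more signal accumulate and classify again.
        return "wait"
    return "sequence" if best in targets else "eject"
```

For example, `decide([10.0, 0.0, 0.0], {0})` keeps the read, while the same scores with `targets={1}` eject it. The "wait" branch reflects the accuracy/latency trade-off the abstract mentions: a higher threshold means fewer wrong ejections but later decisions.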
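To address this problem, the thesis studies the underlying theory and techniques in depth. Starting from the task of multi-class classification of time-series data, it analyzes in turn the principles of current-signal preprocessing, base-sequence alignment, and deep neural networks. Building on this analysis, it proposes an algorithmic model based on the Transformer neural network architecture and builds an end-to-end adaptive sampling pipeline. The pipeline covers the key stages from processing the raw sequencing data to the metrics used to evaluate the model, providing a practical solution for real-world use.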
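The record does not say how the thesis preprocesses the current signal. As a minimal sketch, assuming the widely used median/MAD (median absolute deviation) normalization followed by fixed-length windowing, a raw signal could be prepared for a sequence classifier such as a Transformer as follows. The function names and window parameters are hypothetical.

```python
from statistics import median


def mad_normalize(signal):
    """Center raw current samples on the median and scale by the MAD.

    Median and MAD are robust to the spikes that are common in raw
    nanopore current traces, unlike mean and standard deviation.
    """
    med = median(signal)
    mad = median(abs(x - med) for x in signal)
    if mad == 0:
        raise ValueError("signal has zero MAD; cannot normalize")
    return [(x - med) / mad for x in signal]


def chunk(signal, size, stride):
    """Split a signal into fixed-length windows, dropping an incomplete tail."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, stride)]
```

For example, `mad_normalize([1, 2, 3, 4, 5])` gives `[-2.0, -1.0, 0.0, 1.0, 2.0]`, and `chunk(list(range(10)), 4, 3)` gives three overlapping windows. In a Transformer, each window (or a learned projection of it) could then serve as one input token; that mapping is an assumption here, not a detail stated in the record.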
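For the experimental design, the thesis designs its own DNA barcode sequences for nucleic-acid tagging, providing a 96-class and a 384-class barcode set to meet the growing demand for nucleic-acid tagging applications. By combining nucleic-acid tagging with adaptive sampling, the thesis aims to improve the effectiveness of adaptive sampling in practical applications.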
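The record does not state the criteria used to design the 96- and 384-class barcode sets. A common approach, assumed here, is to require a minimum pairwise distance between barcodes together with balanced GC content. The sketch below greedily builds such a set; `greedy_barcodes`, the barcode length of 6, and the GC bounds are illustrative only (real ONT barcodes are considerably longer).

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def greedy_barcodes(length, n, min_dist, gc_range=(0.4, 0.6)):
    """Greedily pick up to n sequences of the given length whose pairwise
    Hamming distance is at least min_dist and whose GC fraction is in range."""
    lo, hi = gc_range
    chosen = []
    for letters in product("ACGT", repeat=length):
        cand = "".join(letters)
        if not lo <= gc_fraction(cand) <= hi:
            continue
        if all(hamming(cand, c) >= min_dist for c in chosen):
            chosen.append(cand)
            if len(chosen) == n:
                break
    return chosen


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance over all pairs in a barcode set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))
```

For example, `greedy_barcodes(6, 8, 3)` returns eight 6-mers with 50% GC that differ pairwise in at least three positions. Hamming distance only guards against substitutions; because nanopore reads also contain insertions and deletions, an edit-distance (Levenshtein) criterion may be a safer choice for nanopore barcodes.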
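To evaluate the proposed Transformer for barcode neural network model, the thesis compares it in detail with existing work. The model reaches classification accuracies of 97.5% on the 96-class barcode set and 93.5% on the 384-class barcode set, placing it among the best results reported in related work. Its real-time performance is comparable to the mainstream level of existing work. These results indicate that the proposed method improves classification accuracy for adaptive sampling while remaining competitive in real-time performance. Future work will continue to optimize and refine the adaptive sampling pipeline.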

Keywords
Language
Chinese
Training category
Independent training
Year of enrollment
2021
Year of degree conferral
2023-06

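Degree evaluation subcommittee
Materials and Chemical Engineering
Chinese Library Classification number
TP391
Source repository
Manual submission
Item type
Thesis
Item identifier
http://sustech.caswiz.com/handle/2SGJ60CL/544524
Collection
SUSTech-HKUST Shenzhen-Hong Kong School of Microelectronics Preparatory Office (南方科技大学-香港科技大学深港微电子学院筹建办公室)
Recommended citation
GB/T 7714
LI Junyao. Research and application of adaptive sampling technology in nanopore sequencing[D]. Shenzhen: Southern University of Science and Technology, 2023.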
Files in this item
12132451-李骏垚-南方科技大学 (4597 KB), restricted access
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.