Title

轻量级眼底图像多病种识别算法及系统

Alternative title
LIGHTWEIGHT MULTI-DISEASE RECOGNITION ALGORITHM AND SYSTEM FOR FUNDUS IMAGES
Name
熊绍奎
Name (pinyin)
XIONG Shaokui
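Student ID
12132585
Degree type
Master's
Degree program
0809 Electronic Science and Technology
Discipline category / professional degree category
08 Engineering
Advisor
CHEN Shifeng (陈世峰)
Advisor's institution
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Thesis defense date
2024-05-08
Thesis submission date
2024-07-01
Degree-granting institution
Southern University of Science and Technology
Place of degree conferral
Shenzhen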
Abstract

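Artificial intelligence is advancing rapidly in medical imaging and is increasingly used in clinical settings. Image-based computer-aided diagnosis of ophthalmic diseases has significant practical value and is an important part of smart healthcare. Ophthalmic disease recognition algorithms are the core of such computer-aided diagnosis.

Convolutional neural networks and Transformer models are currently the mainstream approaches to ophthalmic disease recognition. Both extract features from retinal images and have achieved good results on many specific recognition tasks. However, both depend on large amounts of labeled data, handle cross-modal data poorly, and transfer poorly to new tasks.

To address these problems, this thesis carries out the following research:

(1) A multi-disease recognition algorithm based on the Contrastive Language-Image Pre-training (CLIP) model. Vision-language models draw on the rich semantic knowledge in text and therefore produce visual features that are more descriptive than task-specific ones. This thesis introduces the vision-language paradigm to ophthalmic disease recognition and uses CLIP for multi-disease recognition. On a custom multi-disease fundus image dataset, the proposed algorithm outperforms the two traditional classes of algorithms by 4.8% and 3.2%, respectively. The thesis also introduces low-rank adaptation, which allows a large CLIP model to be trained efficiently on few-shot data and reduces the heavy resource requirements of such techniques.

(2) Construction of expert-knowledge text data with a generative artificial intelligence model. The CLIP-based approach makes full use of both the image and the text data associated with fundus images. However, fundus image datasets lack text labels, and having ophthalmologists write them is costly. This thesis brings large models into ophthalmic image recognition and uses generative AI to produce fine-grained expert-knowledge text for fundus images, which both resolves the lack of text labels and clearly improves recognition performance.

(3) CLIP model compression and system implementation. Because CLIP has a very large number of parameters, it is difficult to deploy on ophthalmic devices. To obtain a lighter CLIP model, this thesis compresses it with cross-modal knowledge distillation. The model is compressed to half its original size with almost no loss in performance, and the compressed model is used to build a fully functional, well-performing multi-disease ophthalmic recognition system.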
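
The low-rank adaptation mentioned in item (1) can be illustrated with a minimal sketch. This is not the thesis implementation: the layer sizes, the names `lora_forward`, `down`, `up`, and `alpha`, and the pure-Python matrix code are all assumptions made for illustration. It shows the usual LoRA formulation, in which a frozen weight matrix is left untouched and only a rank-r product is trained, with one factor initialized to zero so that training starts from the pretrained behavior.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, w_frozen, down, up, alpha):
    """Return x @ w_frozen + (alpha / rank) * (x @ down) @ up.

    w_frozen is never modified; only down and up would receive gradients.
    """
    rank = len(up)  # up has shape (rank, d_out)
    base = matmul(x, w_frozen)
    delta = matmul(matmul(x, down), up)
    scale = alpha / rank
    return [[b + scale * d for b, d in zip(b_row, d_row)]
            for b_row, d_row in zip(base, delta)]


def trainable_params(d_in, d_out, rank):
    """Number of entries in the two low-rank factors."""
    return rank * (d_in + d_out)


random.seed(0)
d_in, d_out, rank = 8, 6, 2  # toy sizes, chosen only for this demo
w_frozen = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]
down = [[random.gauss(0, 0.01) for _ in range(rank)] for _ in range(d_in)]
up = [[0.0] * d_out for _ in range(rank)]  # zero init: the adapter starts as a no-op
x = [[random.gauss(0, 1) for _ in range(d_in)]]

# With up == 0 the adapted layer reproduces the frozen layer exactly.
print(lora_forward(x, w_frozen, down, up, alpha=4.0) == matmul(x, w_frozen))
print(trainable_params(768, 768, 8), "vs", 768 * 768)
```

For a hypothetical 768 x 768 projection with rank 8, the two factors hold 12,288 trainable values against 589,824 in the full matrix, which is the kind of saving that makes small-sample fine-tuning of a large model practical.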

Alternative abstract

Artificial intelligence technology is developing rapidly in the field of medical imaging and is increasingly used in clinical scenarios. Medical-imaging-assisted diagnosis of ophthalmic diseases is of great significance and application value, and is an important component of intelligent healthcare. Ophthalmic disease recognition algorithms are the core of assisted diagnosis for ophthalmic diseases.

Currently, convolutional neural networks and Transformer models are the mainstream methods for ophthalmic disease recognition. Both types of methods extract features from retinal images and have achieved good results in many specific ophthalmic disease recognition tasks. However, both rely on a large amount of labeled data, have difficulty handling cross-modal data, and transfer poorly.

To address the above problems, this thesis carries out the following research:

(1) A multi-disease recognition algorithm based on the Contrastive Language-Image Pre-training (CLIP) model. Vision-language models utilize the rich semantic knowledge in text to produce visual features that are more descriptive than task-specific ones. This thesis introduces the new paradigm of vision-language modeling to the field of ophthalmic disease recognition and applies the CLIP model to multi-disease ophthalmic recognition. On a customized multi-disease fundus image dataset, the proposed algorithm outperforms the two types of traditional algorithms by 4.8% and 3.2%, respectively. This thesis also introduces a low-rank adaptation method that enables efficient training of a large CLIP model with small-sample data, which reduces the heavy resource dependency of related techniques.

(2) Construction of expert-knowledge text data based on a generative artificial intelligence model. The CLIP model is used to make full use of both the image and the text data of fundus images. However, fundus image datasets lack text labels, and it is costly for ophthalmologists to annotate them. This thesis introduces a large model into ophthalmic image recognition and uses generative artificial intelligence to generate fine-grained expert-knowledge text for fundus images, which not only solves the lack of text labels in fundus image datasets but also significantly improves recognition performance.

(3) CLIP model compression and system implementation. Because the CLIP model has a huge number of parameters, it is difficult to apply and deploy in ophthalmic devices. To obtain a more lightweight CLIP model, this thesis uses cross-modal knowledge distillation to compress the model. The model is compressed to half of its original size with almost no performance loss, and the compressed model is used to build a fully functional, well-performing multi-disease ophthalmic recognition system.
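
The abstract does not say how the cross-modal distillation in item (3) is formulated, so the following is only a sketch of one common choice, not the thesis method. It assumes the student is trained to match the teacher's image-to-text similarity distribution, measured with a KL divergence; the function names, the temperature parameter, and the toy similarity values are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one row of similarity scores."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def affinity_distillation_loss(teacher_sim, student_sim, temperature=1.0):
    """Mean KL(teacher || student) over the rows of an image-to-text similarity matrix.

    Each row holds one image's similarity to every text prompt.
    """
    losses = []
    for t_row, s_row in zip(teacher_sim, student_sim):
        p = softmax(t_row, temperature)
        q = softmax(s_row, temperature)
        losses.append(kl_divergence(p, q))
    return sum(losses) / len(losses)


# Toy similarities: 2 images (rows) against 3 text prompts (columns).
teacher_sim = [[5.0, 1.0, 0.5], [0.2, 4.0, 1.0]]
student_sim = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]  # an untrained, uninformative student

print(affinity_distillation_loss(teacher_sim, student_sim))
print(affinity_distillation_loss(teacher_sim, teacher_sim))
```

The loss is zero when the student reproduces the teacher's similarities and positive otherwise, so minimizing it pulls the smaller model's image-text alignment toward that of the larger one without requiring extra labels.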

Keywords
Alternative keywords
Language
Chinese
Training category
Independent training
Year of enrollment
2021
Year of degree conferral
2024-06

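Degree evaluation subcommittee
Electronic Science and Technology
Chinese Library Classification number
TP391;TP18;R77
Source
Manual submission
Output type
Degree thesis
Item identifier
http://sustech.caswiz.com/handle/2SGJ60CL/778766
Collection
Chinese Academy of Sciences Shenzhen University of Advanced Technology (in preparation), joint training
Recommended citation
GB/T 7714
XIONG Shaokui. 轻量级眼底图像多病种识别算法及系统 [Lightweight multi-disease recognition algorithm and system for fundus images][D]. Shenzhen: Southern University of Science and Technology, 2024.
Files in this item
File name/size: 12132585-熊绍奎-中国科学院深圳 (2851KB)
Access type: Restricted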