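Title

端到端图像压缩空间区域优化研究 (Research on Spatial Region Optimization for End-to-End Image Compression)

Alternative title (English)
SPATIAL REGION OPTIMIZATION FOR END-TO-END LEARNED IMAGE COMPRESSION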
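Name
海博文
Name (Pinyin)
HAI Bowen
Student ID
12232164
Degree type
Master's
Degree major
085410 Artificial Intelligence
Discipline / professional degree category
08 Engineering
Supervisor
何志海 (HE Zhihai)
Supervisor's department
Department of Electronic and Electrical Engineering
Thesis defense date
2024-05-08
Thesis submission date
2024-06-25
Degree-granting institution
南方科技大学 (Southern University of Science and Technology)
Place of degree conferral
Shenzhen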
Abstract

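As the information-technology era deepens and smart terminal devices spread, the number of visual interactive terminals keeps growing. As an important carrier of information, images present complex data intuitively and vividly, meeting people's need to acquire high-quality information and understand it quickly. Meanwhile, image data is being generated at an explosive rate, creating storage pressure and transmission difficulties.
After years of development, traditional image compression techniques have hit a bottleneck and increasingly fail to meet current image-transmission requirements for compression ratio and fidelity, especially when higher-quality compression is required. Deep learning has therefore been introduced into image compression and has become a new direction for next-generation image compression frameworks.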

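This thesis investigates four problems in existing end-to-end deep-learning image compression frameworks: weak feature extraction over spatial regions, an inability to balance local and non-local feature extraction, limited receptive fields, and poor generalization. It focuses in particular on the core transforms and coding methods of end-to-end learned image compression networks.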

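To address the inability of existing models to balance local and non-local feature extraction, this thesis proposes an end-to-end learned image compression framework built on a Retention-Convolution module. This core module combines a convolutional neural network with the retention attention mechanism, forming a unified architecture that extracts both local and non-local features and achieves better spatial feature extraction.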

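In addition, drawing on traditional image compression methods, the thesis introduces the wavelet transform into the learned compression framework and optimizes it with the two-dimensional Haar wavelet transform. This enlarges the model's effective receptive field and further improves its compression and reconstruction performance.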

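Tested on four test sets covering a variety of image sizes and styles, the proposed learned image compression framework achieves clear improvements in objective compression and reconstruction metrics, in subjective visual quality, and in receptive-field visualizations. It enlarges the model's effective receptive field and thereby improves the feature extraction capability of end-to-end learned image compression over spatial regions.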

Other abstract (English)

As the information-technology era deepens and intelligent terminal devices become ever more widespread, the number of visual interactive terminals keeps rising. Images, as a critical medium for conveying information, present complex data intuitively and vividly, meeting people's demand to acquire high-quality information and understand it quickly. Meanwhile, image data is being generated at an exponentially growing rate, creating storage pressure and transmission challenges.

After years of development, traditional image compression techniques have reached a bottleneck and increasingly fail to meet current image-transmission demands for compression ratio and fidelity, particularly when higher-quality compression is required. Deep learning has therefore been introduced into image compression and has emerged as a new direction for next-generation image compression frameworks.

This thesis investigates four shortcomings of existing end-to-end learned image compression frameworks: weak feature extraction over spatial regions, difficulty balancing local and non-local feature extraction, limited receptive fields, and poor generalization. It concentrates in particular on the core transforms and coding methods of end-to-end learned image compression networks.
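For readers new to the area, the "core transforms and coding methods" mentioned above sit inside the standard end-to-end transform-coding pipeline popularized by Ballé et al. The abstract does not give the thesis's exact transforms, entropy model, or loss, so the symbols below are generic: $g_a$ and $g_s$ are the learned analysis and synthesis transforms (the part a new feature-extraction module would replace or augment), $Q$ is quantization (usually approximated by additive uniform noise during training), $p_{\hat{\boldsymbol{y}}}$ is the learned entropy model that drives the arithmetic coder, $d$ is a distortion such as MSE, and $\lambda$ trades rate against distortion. Hyperprior models add a second rate term for their side information.

```latex
\begin{aligned}
\boldsymbol{y} &= g_a(\boldsymbol{x};\boldsymbol{\phi}), \qquad
\hat{\boldsymbol{y}} = Q(\boldsymbol{y}), \qquad
\hat{\boldsymbol{x}} = g_s(\hat{\boldsymbol{y}};\boldsymbol{\theta}),\\
\mathcal{L} &= R + \lambda D
= \mathbb{E}_{\boldsymbol{x}}\!\left[-\log_2 p_{\hat{\boldsymbol{y}}}(\hat{\boldsymbol{y}})\right]
+ \lambda\,\mathbb{E}_{\boldsymbol{x}}\!\left[d(\boldsymbol{x},\hat{\boldsymbol{x}})\right]
\end{aligned}
```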

To address the inability of current models to effectively handle both local and non-local feature extraction, this paper proposes an end-to-end learned image compression framework based on a novel Retention-Convolution module. The core module innovatively combines Convolutional Neural Networks (CNNs) with a retention mechanism, constructing a unified network architecture capable of extracting both local and non-local features, thereby achieving enhanced spatial feature extraction in images.
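The abstract does not say how the Retention-Convolution module is built, so the following is only a hypothetical sketch of how one block could pair a local branch with a non-local, retention-style branch. Retention (from RetNet) replaces softmax attention with $(QK^\top \odot D)V$, where $D$ is a decay matrix. Here the 1-D causal decay is swapped for a 2-D decay $\gamma^{|\Delta\text{row}|+|\Delta\text{col}|}$ over pixel positions; that decay, the row normalization, the residual sum, and every name (`spatial_decay`, `retention_2d`, `depthwise_conv3x3`, `retention_conv_block`, `params`) are assumptions for illustration, not the thesis's design.

```python
import numpy as np

def spatial_decay(h, w, gamma):
    """Hypothetical 2-D decay D[n, m] = gamma ** (|row_n - row_m| + |col_n - col_m|)
    over the h*w pixel tokens of a feature map (row-major token order)."""
    rows, cols = np.divmod(np.arange(h * w), w)
    dist = np.abs(rows[:, None] - rows[None, :]) + np.abs(cols[:, None] - cols[None, :])
    return gamma ** dist

def retention_2d(x, wq, wk, wv, gamma=0.9):
    """Non-local branch: (Q K^T * D) V on a (C, H, W) map, with no softmax (retention style)."""
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T                 # (N, C), N = H*W
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = (q @ k.T) / np.sqrt(q.shape[1]) * spatial_decay(h, w, gamma)
    # Keep each row bounded; a simple stabilizer chosen for this sketch.
    scores /= np.maximum(np.abs(scores.sum(axis=1, keepdims=True)), 1.0)
    return (scores @ v).T.reshape(-1, h, w)        # back to (C_out, H, W)

def depthwise_conv3x3(x, kernels):
    """Local branch: per-channel 3x3 cross-correlation (as in deep-learning conv layers)
    with zero padding; kernels has shape (C, 3, 3)."""
    c, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x, dtype=float)
    for di in range(3):
        for dj in range(3):
            out += kernels[:, di, dj, None, None] * padded[:, di:di + h, dj:dj + w]
    return out

def retention_conv_block(x, params, gamma=0.9):
    """Hypothetical fusion: residual sum of the local and non-local branches."""
    local = depthwise_conv3x3(x, params["dw"])
    non_local = retention_2d(x, params["wq"], params["wk"], params["wv"], gamma)
    return x + local + non_local

rng = np.random.default_rng(0)
C, H, W, dk = 4, 6, 5, 8
x = rng.standard_normal((C, H, W))
params = {
    "dw": 0.1 * rng.standard_normal((C, 3, 3)),
    "wq": 0.1 * rng.standard_normal((C, dk)),
    "wk": 0.1 * rng.standard_normal((C, dk)),
    "wv": 0.1 * rng.standard_normal((C, C)),
}
y = retention_conv_block(x, params)
print(y.shape)  # (4, 6, 5)
```

The decay parameter makes the trade-off explicit in this sketch: with `gamma=0` the retention branch only mixes each pixel with itself, while values closer to 1 let distant pixels contribute, which is the local/non-local balance the abstract describes.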

Moreover, drawing on traditional image compression methods, this thesis introduces the wavelet transform into the learned image compression framework, specifically the two-dimensional Haar wavelet transform. This enlarges the model's effective receptive field and further improves its compression and reconstruction performance.
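As a concrete reference for the 2-D Haar wavelet transform mentioned above, here is a minimal NumPy sketch of one orthonormal decomposition level and its exact inverse. How the thesis feeds the subbands into its network is not stated in the abstract; stacking them along the channel axis, and the function names `haar_dwt2` and `haar_idwt2`, are assumptions for illustration.

```python
import numpy as np

def haar_dwt2(x):
    """One level of the orthonormal 2-D Haar DWT over the last two axes.

    x: array of shape (..., H, W) with even H and W.
    Returns (ll, d_col, d_row, d_diag), each of shape (..., H/2, W/2).
    """
    a = x[..., 0::2, 0::2]  # top-left pixel of every 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0      # low-pass: scaled local average
    d_col = (a - b + c - d) / 2.0   # left minus right column: responds to vertical edges
    d_row = (a + b - c - d) / 2.0   # top minus bottom row: responds to horizontal edges
    d_diag = (a - b - c + d) / 2.0  # diagonal detail
    return ll, d_col, d_row, d_diag

def haar_idwt2(ll, d_col, d_row, d_diag):
    """Exact inverse of haar_dwt2."""
    h, w = ll.shape[-2:]
    x = np.empty(ll.shape[:-2] + (2 * h, 2 * w), dtype=np.result_type(ll, 1.0))
    x[..., 0::2, 0::2] = (ll + d_col + d_row + d_diag) / 2.0
    x[..., 0::2, 1::2] = (ll - d_col + d_row - d_diag) / 2.0
    x[..., 1::2, 0::2] = (ll + d_col - d_row - d_diag) / 2.0
    x[..., 1::2, 1::2] = (ll - d_col - d_row + d_diag) / 2.0
    return x

rng = np.random.default_rng(0)
img = rng.random((3, 8, 8))                   # toy (C, H, W) image
bands = haar_dwt2(img)
stacked = np.concatenate(bands, axis=-3)      # (4C, H/2, W/2): assumed CNN input layout
rec = haar_idwt2(*bands)
print(stacked.shape, np.allclose(rec, img))   # (12, 4, 4) True
```

Because every subband coefficient depends on exactly one non-overlapping 2×2 input block, a 3×3 convolution applied to the subbands spans a 6×6 patch of the original image, twice the span of the same kernel at full resolution, and the transform itself loses no information. This is one way to read the claim that the wavelet transform enlarges the receptive field; the thesis's effective-receptive-field improvement is an empirical measurement, not something this arithmetic proves.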

Tested on four test sets with diverse image sizes and styles, the proposed learned image compression framework shows significant improvements in objective compression and reconstruction metrics, in subjective visual quality, and in receptive-field visualizations. It enlarges the model's effective receptive field and improves the feature extraction capability of end-to-end learned image compression over spatial regions.

Language
Chinese
Training mode
Independent (non-joint) training
Enrollment year
2022
Year of degree conferral
2024-06

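Degree evaluation subcommittee
Electronic Information
Chinese Library Classification number
TP18
Source
Manual submission
Item type
Thesis
Item identifier
http://sustech.caswiz.com/handle/2SGJ60CL/766068
Collection
College of Engineering / Department of Electronic and Electrical Engineering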
Recommended citation
GB/T 7714
海博文. 端到端图像压缩空间区域优化研究[D]. 深圳: 南方科技大学, 2024.
Files in this item
12232164-海博文-电子与电气工程 (8320 KB), restricted access

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.