Title

Deep Learning-Based Robust Image and Video Watermarking

Alternative Title
DEEP LEARNING-BASED ROBUST DIGITAL IMAGE AND VIDEO WATERMARKING
Name
叶冠辉
Name (Pinyin)
YE Guanhui
Student ID
12132370
Degree type
Master
Degree discipline
0812 Computer Science and Technology
Subject category / professional degree category
08 Engineering
Supervisor
危学涛
Supervisor's department
Department of Computer Science and Engineering
Thesis defense date
2024-05-12
Thesis submission date
2024-06-25
Degree-granting institution
Southern University of Science and Technology
Degree conferral location
Shenzhen
Abstract

Copyright protection and content authentication for multimedia digital works are increasingly pressing problems, and digital watermarking has attracted wide attention as an effective solution. In recent years, new deep learning-based digital watermarking methods have emerged and demonstrated excellent performance. Targeting the robustness and invisibility challenges in image and video digital watermarking, this thesis proposes two novel deep learning-based robust watermarking algorithms.

The first algorithm is a robust digital image watermarking method based on an invertible neural network. The method embeds and extracts robust watermarks while preserving image quality, and its key innovation is the synergy between robust watermark feature generation and the invertible neural network. A message processor generates watermark features with redundancy and error-correction capability; combined with the strong information embedding and extraction capability of the invertible neural network, this yields higher robustness and invisibility. In addition, the watermark is embedded in the discrete wavelet transform domain, and an $LL$ sub-band loss function is designed so that more watermark information is placed in the high-frequency part of the image, further enhancing invisibility.
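A minimal sketch, in PyTorch, of a redundancy-expanding message processor of the kind described above; the class name, layer layout, and sizes are illustrative assumptions rather than the thesis's actual architecture.

```python
import torch
import torch.nn as nn

class MessageProcessor(nn.Module):
    """Hypothetical message processor: expands a short binary watermark
    message into a redundant spatial feature map for the embedding network."""
    def __init__(self, msg_len=30, feat_channels=16, feat_size=32):
        super().__init__()
        self.feat_channels = feat_channels
        self.feat_size = feat_size
        # The linear expansion adds redundancy: every message bit influences
        # many positions of the output feature map.
        self.expand = nn.Sequential(
            nn.Linear(msg_len, feat_channels * feat_size * feat_size),
            nn.LeakyReLU(0.2),
        )

    def forward(self, msg):                       # msg: (B, msg_len) with values in {0, 1}
        x = self.expand(2.0 * msg - 1.0)          # map bits to {-1, +1} before expansion
        return x.view(-1, self.feat_channels, self.feat_size, self.feat_size)

msg = torch.randint(0, 2, (1, 30)).float()
feature = MessageProcessor()(msg)                 # (1, 16, 32, 32) redundant watermark feature
```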

The second algorithm is a digital watermarking method for video. This thesis revisits how neural networks embed and extract watermarks in images and videos and finds that, for both, watermark embedding is essentially a task driven by the distribution of pixel data; the temporal dimension of video does not have to be modelled explicitly. Accordingly, the thesis proposes a method that efficiently extends deep learning-based image watermarking to video. Its key innovation is to merge the temporal and channel dimensions of a video so that a clip is processed as a multi-channel image. Unlike approaches that watermark each frame independently, the proposed method feeds the whole clip into the neural network and exploits the redundancy among consecutive frames. Furthermore, different spatiotemporal convolution modules are introduced into the watermarking network to study how spatial and temporal information affect the video watermarking task; the study finds that spatial information is the dominant factor in watermark embedding.
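A minimal sketch of how space-only and time-only convolution modules can be swapped into a video watermarking network to probe the two factors; the kernel shapes and channel counts are assumptions for illustration, not the thesis's configuration.

```python
import torch
import torch.nn as nn

# Interchangeable probe blocks: a space-only and a time-only 3D convolution.
# Swapping one for the other isolates the contribution of spatial versus
# temporal information to the watermarking task.
spatial_block = nn.Conv3d(8, 8, kernel_size=(1, 3, 3), padding=(0, 1, 1))   # convolves within each frame
temporal_block = nn.Conv3d(8, 8, kernel_size=(3, 1, 1), padding=(1, 0, 0))  # convolves across frames only

clip = torch.randn(1, 8, 4, 128, 128)             # (batch, channels, frames, height, width)
print(spatial_block(clip).shape)                  # torch.Size([1, 8, 4, 128, 128])
print(temporal_block(clip).shape)                 # torch.Size([1, 8, 4, 128, 128])
```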

In summary, the two deep learning-based robust digital watermarking algorithms proposed in this thesis address shortcomings of existing watermarking algorithms, improving robustness while maintaining good invisibility, and provide strong support for applications such as copyright protection and authentication of digital works.

Other Abstract

Copyright protection and content authentication for multimedia digital works have become increasingly critical issues, drawing significant attention to digital watermarking techniques as an effective solution. In recent years, novel deep learning-based digital watermarking methods have emerged, demonstrating superior performance. This thesis addresses the challenges of robustness and invisibility in image and video digital watermarking by proposing two innovative deep learning-based robust watermarking algorithms.

 

The first algorithm is a robust digital image watermarking method based on an invertible neural network. The method achieves robust embedding and extraction of watermarks while maintaining the quality of the original image. Its innovation lies in the synergy between the invertible neural network and the generation of robust watermark features. A message processor generates watermark features with redundancy and error-correction capability, which, combined with the powerful information embedding and extraction capabilities of the invertible neural network, achieve higher robustness and invisibility. In addition, the watermark information is embedded in the discrete wavelet transform domain, and a loss function on the $LL$ sub-band is designed to push more watermark information into the high-frequency part of the image, further enhancing the invisibility of the watermark.
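A minimal sketch of an $LL$ sub-band loss of the kind described above, assuming a one-level Haar DWT implemented directly in PyTorch; the exact transform and loss weighting used in the thesis may differ.

```python
import torch
import torch.nn.functional as F

def haar_ll(x):
    """One-level Haar DWT of x with shape (B, C, H, W); returns only the LL sub-band."""
    a = x[:, :, 0::2, 0::2]   # top-left pixel of each 2x2 block
    b = x[:, :, 0::2, 1::2]
    c = x[:, :, 1::2, 0::2]
    d = x[:, :, 1::2, 1::2]
    return (a + b + c + d) / 2.0

def ll_subband_loss(cover, watermarked):
    """Penalise watermark energy that lands in the low-frequency LL sub-band,
    pushing the network to hide the message in the high-frequency sub-bands."""
    return F.mse_loss(haar_ll(watermarked), haar_ll(cover))

cover = torch.rand(1, 3, 128, 128)
watermarked = cover + 0.01 * torch.randn_like(cover)   # stand-in for the encoder output
loss = ll_subband_loss(cover, watermarked)
```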

 

The second algorithm is a robust digital watermarking method for videos. This thesis revisits the image/video watermark embedding and extraction process through neural networks. The findings indicate that, for both images and videos, embedding a digital watermark fundamentally relies on the distribution of pixel data rather than necessarily on the temporal dimension of the video. Hence, this thesis proposes an approach that efficiently extends deep learning-based image watermarking techniques to video content. The key innovation lies in treating a video as a multi-channel image by merging its temporal and channel dimensions, in contrast to watermarking each frame separately. By feeding entire video clips into the neural network, the method exploits the redundancy among consecutive frames and exhibits strong robustness. Moreover, different spatiotemporal convolutional modules are incorporated into the network structure to investigate how spatial and temporal information affect the video watermarking task; the results show that spatial information plays the dominant role in effective watermark embedding.
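A minimal sketch of the time-channel merge that lets an image watermarking network process a whole clip at once; the tensor layout and the stand-in 2D encoder are assumptions for illustration, not the thesis's network.

```python
import torch
import torch.nn as nn

T, C = 4, 3                                        # frames per clip, colour channels
encoder2d = nn.Conv2d(T * C, T * C, kernel_size=3, padding=1)   # stand-in for an image watermark encoder

clip = torch.randn(2, T, C, 128, 128)              # (batch, time, channel, height, width)
as_image = clip.flatten(1, 2)                      # (batch, T*C, height, width): time merged into channels
residual = encoder2d(as_image)                     # one pass sees all frames jointly
watermarked = (as_image + residual).view(2, T, C, 128, 128)
```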


In summary, the two deep learning-based robust digital watermarking algorithms proposed in this thesis effectively address the limitations of existing watermarking algorithms and enhance watermark robustness while ensuring excellent invisibility, providing strong support for applications such as copyright protection and digital content authentication.

Keywords
Language
Chinese
Training category
Independent training
Year of enrollment
2021
Year of degree conferral
2024-07
Degree assessment subcommittee
Electronic Science and Technology
Chinese Library Classification number
TP18
Source repository
Manually submitted
Output type: Degree thesis
Item identifier: http://sustech.caswiz.com/handle/2SGJ60CL/766067
Collection: College of Engineering, Department of Computer Science and Engineering
Recommended citation
GB/T 7714
YE Guanhui. Deep Learning-Based Robust Image and Video Watermarking [D]. Shenzhen: Southern University of Science and Technology, 2024.
