Title

高保真人脸超分辨率重建研究

Alternative Title
RESEARCH ON HIGH-FIDELITY FACE SUPER-RESOLUTION
Name
黎泽林
Name (Pinyin)
LI Zelin
Student ID
12132338
Degree Type
Master's
Degree Discipline
0809 Electronic Science and Technology
Discipline Category / Professional Degree Category
08 Engineering
Supervisor
曾丹
Supervisor's Affiliation
Department of Computer Science and Engineering
Date of Thesis Defense
2024-05-12
Date of Thesis Submission
2024-06-25
Degree-Granting Institution
Southern University of Science and Technology
Place of Degree Conferral
Shenzhen
Abstract

With the rapid pace of urbanization, face algorithms are applied ever more widely, covering urban surveillance, mobile face-recognition payment, self-media content creation, and many other areas. These applications produce large volumes of face image data, but the images often suffer from varying degrees of quality degradation. Against this background, face super-resolution technology has emerged; it aims to repair damaged face images, remove the degradation, and provide high-quality face images in support of face-algorithm applications. The core task of face super-resolution is to generate high-fidelity restoration results, which includes raising image resolution and addressing noise, blur, and compression artifacts. To this end, the field has moved from exploiting simple facial shape priors and identity priors to exploiting priors from face generative models, and method performance has advanced from restorations of only 128 × 128 resolution to high-definition 512 × 512 output. This thesis describes the progress of this work on face super-resolution, which can be divided into three parts.
  
  This work first investigates the importance of facial priors for face super-resolution and proposes CSRNet, a super-resolution architecture that combines a facial shape prior with an identity prior; a cascaded structure allows both kinds of prior information to be fully exploited. Experiments verify the advantage of this method over methods that use a single prior, confirm its effectiveness in exploiting prior information, and demonstrate that the two priors are complementary.
  
  Next, this work examines the attribute-bias problem in face restoration methods, analyzes its causes experimentally, and proposes a solution, DebiasFR, matched to the application scenario. DebiasFR exploits the editability of generative models: it learns attribute representations in the latent space and uses them to regulate the attributes of the restored results, effectively mitigating attribute bias. The design of the attribute representations also lets the method support interactive adjustment of the restoration results.
  
  Finally, this work studies the restoration of large-pose faces, a problem that occurs frequently in real-world scenarios but has drawn little attention in academia, and analyzes why existing methods degrade on it. Solutions based on a 3D face prior and a natural-image generative prior are explored; the reasons the 3D face prior fails are analyzed, and the feasibility of an approach based on a natural-image generative prior and adversarial learning is demonstrated.
  

Abstract (English)

With the rapid advancement of urbanization, facial algorithms are used increasingly widely, in areas such as urban surveillance, mobile face-recognition payment, and content creation for social media. These applications generate large amounts of facial image data, but the images often suffer from varying degrees of quality degradation. In this context, face super-resolution (FSR) has emerged as a key facet of facial image processing. It aims to restore damaged facial images, mitigate degradation, and provide high-quality facial images that support facial-algorithm applications. The core task of FSR is to generate high-fidelity restoration results, which involves increasing image resolution and removing noise, blur, and compression artifacts. To achieve this, the field has evolved from simple facial shape and identity priors to priors derived from facial generative models, and the achievable output has progressed from 128 × 128 restorations to 512 × 512 high-definition results. This thesis presents the progress of this work on face super-resolution in three parts.
  
  First, this study explores the importance of facial priors in face super-resolution. It proposes CSRNet, a super-resolution architecture that combines facial shape priors and identity priors and fully exploits both kinds of prior information through a cascaded structure. Experiments verify the superiority of this method over approaches that use a single prior, validate its effectiveness in exploiting prior information, and demonstrate that the two types of prior are complementary.
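  To make the cascaded use of two priors concrete, the following is a minimal PyTorch-style sketch. It is not the CSRNet architecture from this thesis; the module names, channel widths, and the specific way the shape prior (landmark heatmaps) and identity prior (a face-recognition embedding) are injected are illustrative assumptions only.

```python
# Minimal sketch of a cascaded two-prior face SR pipeline (illustrative, not CSRNet).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PriorFusionStage(nn.Module):
    """One cascade stage: fuse image features with a prior map, then refine."""
    def __init__(self, feat_ch, prior_ch):
        super().__init__()
        self.fuse = nn.Conv2d(feat_ch + prior_ch, feat_ch, 3, padding=1)
        self.refine = nn.Sequential(
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, feat, prior):
        prior = F.interpolate(prior, size=feat.shape[-2:], mode="bilinear",
                              align_corners=False)
        return self.refine(self.fuse(torch.cat([feat, prior], dim=1)))

class CascadedPriorSR(nn.Module):
    """Stage 1 injects a shape prior (landmark heatmaps); stage 2 an identity prior."""
    def __init__(self, feat_ch=64, n_landmarks=68, id_dim=512):
        super().__init__()
        self.encode = nn.Conv2d(3, feat_ch, 3, padding=1)
        self.shape_stage = PriorFusionStage(feat_ch, n_landmarks)
        self.id_stage = PriorFusionStage(feat_ch, feat_ch)
        self.id_proj = nn.Linear(id_dim, feat_ch)   # broadcast identity vector to a map
        self.decode = nn.Conv2d(feat_ch, 3, 3, padding=1)

    def forward(self, lr, heatmaps, id_embed, scale=4):
        up = F.interpolate(lr, scale_factor=scale, mode="bilinear", align_corners=False)
        feat = self.encode(up)
        feat = self.shape_stage(feat, heatmaps)                     # shape prior first
        id_map = self.id_proj(id_embed)[:, :, None, None].expand_as(feat)
        feat = self.id_stage(feat, id_map)                          # identity prior next
        return up + self.decode(feat)                               # residual reconstruction

lr = torch.randn(1, 3, 32, 32)
hm = torch.randn(1, 68, 128, 128)   # e.g., heatmaps from a landmark network
idv = torch.randn(1, 512)           # e.g., embedding from a face-recognition network
print(CascadedPriorSR()(lr, hm, idv).shape)   # torch.Size([1, 3, 128, 128])
```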
  
  Next, this study examines the attribute-bias problem in face restoration methods, analyzes its causes experimentally, and proposes a corresponding solution, DebiasFR, tailored to the application scenario. DebiasFR exploits the editability of generative models: it learns attribute representations in the latent space and uses them to adjust the attributes of the restored results, effectively alleviating attribute bias. The design of the attribute representations also allows interactive adjustment of the restoration results.
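  The following minimal sketch illustrates the general idea of steering restoration attributes through a latent representation, as the paragraph above describes DebiasFR doing. The encoder, attribute heads, dimensions, and the way the attribute embedding is added to the latent code are hypothetical stand-ins rather than the thesis implementation; in the actual method the conditioned latent would drive a pretrained face generator.

```python
# Sketch of attribute-aware latent conditioning (illustrative, not DebiasFR itself).
import torch
import torch.nn as nn

class AttributeAwareRestorer(nn.Module):
    """Predict attribute logits (e.g., age group, gender) from the degraded input,
    embed them, and add the embedding to the latent code, so attributes can also
    be overridden interactively instead of being inferred."""
    def __init__(self, latent_dim=512, n_age_bins=7, n_gender=2):
        super().__init__()
        self.encoder = nn.Sequential(                 # degraded image -> latent code
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, latent_dim),
        )
        self.age_head = nn.Linear(latent_dim, n_age_bins)
        self.gender_head = nn.Linear(latent_dim, n_gender)
        self.attr_embed = nn.Linear(n_age_bins + n_gender, latent_dim)

    def forward(self, lr_img, age_override=None, gender_override=None):
        z = self.encoder(lr_img)
        age = self.age_head(z).softmax(dim=-1) if age_override is None else age_override
        gender = self.gender_head(z).softmax(dim=-1) if gender_override is None else gender_override
        z_attr = z + self.attr_embed(torch.cat([age, gender], dim=-1))
        return z_attr, age, gender   # z_attr would condition a pretrained face generator

model = AttributeAwareRestorer()
lr = torch.randn(2, 3, 64, 64)
z, age, gender = model(lr)                      # attributes inferred from the input
old = torch.zeros(2, 7); old[:, -1] = 1.0       # interactively force the oldest age bin
z_edit, _, _ = model(lr, age_override=old)
```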
  
  Finally, this study addresses large-pose face restoration, a problem that arises frequently in real-world scenarios but has received relatively little attention in academia, and analyzes why existing methods degrade on it. Solutions based on 3D facial priors and natural-image generative priors are explored; the reasons why the 3D facial prior fails are analyzed, and the feasibility of an approach based on natural-image generative priors and adversarial learning is demonstrated.
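  For readers unfamiliar with adversarial learning in restoration, the sketch below shows a generic training step combining a pixel-fidelity loss with a non-saturating GAN loss. The tiny generator and patch discriminator, the loss weight, and the random tensors standing in for degraded large-pose faces and clean targets are placeholders, not the networks or data used in this thesis.

```python
# Generic adversarial training step for a restoration network (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

G = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
                  nn.Conv2d(64, 3, 3, padding=1))                    # restorer (stand-in)
D = nn.Sequential(nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                  nn.Conv2d(64, 1, 4, stride=2, padding=1))          # patch discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

degraded = torch.randn(4, 3, 128, 128)   # stand-in for degraded large-pose faces
clean = torch.randn(4, 3, 128, 128)      # stand-in for high-quality targets

# Discriminator step: real targets vs. detached restored outputs.
fake = G(degraded).detach()
d_real, d_fake = D(clean), D(fake)
loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
          F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: pixel fidelity plus adversarial realism.
restored = G(degraded)
d_out = D(restored)
loss_g = F.l1_loss(restored, clean) + \
         0.01 * F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```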

Keywords
Other Keywords
Language
Chinese
Training Category
Independent training
Year of Enrollment
2021
Year of Degree Conferral
2024-07

Degree Assessment Subcommittee
Electronic Science and Technology
Chinese Library Classification (CLC) Number
TP391.41
Source Repository
Manual submission
Document Type: Thesis
Identifier: http://sustech.caswiz.com/handle/2SGJ60CL/766028
Collection: Southern University of Science and Technology
College of Engineering_Department of Computer Science and Engineering
Recommended Citation
GB/T 7714
黎泽林. 高保真人脸超分辨率重建研究[D]. 深圳: 南方科技大学, 2024.