SUSTech KC (南方科技大学知识苑): Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning

Title	Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning
Authors	Ma, Jian 1; Liang, Junhao 2; Chen, Chen 1; Lu, Haonan 1
Corresponding authors	Chen, Chen; Lu, Haonan
DOI	10.1145/3641519.3657469
Publication date	2024-07-13
Conference name	SIGGRAPH 2024 Conference Papers
ISBN	9798400705250
Proceedings title	Proceedings - SIGGRAPH 2024 Conference Papers
Conference dates	July 28, 2024 - August 1, 2024
Conference location	Denver, CO, United States
Proceedings editor / conference sponsor	ACM SIGGRAPH
Place of publication	1601 Broadway, 10th Floor, NEW YORK, NY, UNITED STATES
Publisher	Association for Computing Machinery, Inc
Abstract	Recent progress in personalized image generation using diffusion models has been significant. However, development in the area of open-domain and test-time fine-tuning-free personalized image generation is proceeding rather slowly. In this paper, we propose Subject-Diffusion, a novel open-domain personalized image generation model that, in addition to not requiring test-time fine-tuning, requires only a single reference image to support personalized generation of one or two subjects in any domain. Firstly, we construct an automatic data labeling tool and use the LAION-Aesthetics dataset to construct a large-scale dataset consisting of 76M images and their corresponding subject detection bounding boxes, segmentation masks, and text descriptions. Secondly, we design a new unified framework that combines text and image semantics by incorporating coarse location and fine-grained reference image control to maximize subject fidelity and generalization. Furthermore, we adopt an attention control mechanism to support two-subject generation. Extensive qualitative and quantitative results demonstrate that our method has certain advantages over other frameworks in single, multiple, and human-customized image generation. © 2024 ACM.
Keywords	Text-to-Image ; Personalization ; Open-Domain ; Diffusion
University attribution	Other
Language	English
Related links	[Source record]
Indexed in	CPCI-S ; EI
WOS research area	Computer Science
WOS categories	Computer Science, Artificial Intelligence ; Computer Science, Theory & Methods
WOS accession number	WOS:001282218200075
EI accession number	20243116789920
Source database	EV Compendex
Document type	Conference paper
Item identifier	http://sustech.caswiz.com/handle/2SGJ60CL/794446
Collection	Southern University of Science and Technology
Author affiliations	1. OPPO AI Center, Shenzhen, China; 2. Southern University of Science and Technology, Shenzhen, China
Recommended citation (GB/T 7714)	Ma, Jian,Liang, Junhao,Chen, Chen,et al. Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning[C]//ACM SIGGRAPH. 1601 Broadway, 10th Floor, NEW YORK, NY, UNITED STATES:Association for Computing Machinery, Inc,2024.