Title | Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning |
Authors | |
Corresponding authors | Chen, Chen; Lu, Haonan |
DOI | |
Publication date | 2024-07-13 |
Conference name | SIGGRAPH 2024 Conference Papers |
ISBN | 9798400705250 |
Proceedings title | |
Conference dates | July 28, 2024 - August 1, 2024 |
Conference location | Denver, CO, United States |
Proceedings editor / Conference organizer | ACM SIGGRAPH |
Place of publication | 1601 Broadway, 10th Floor, New York, NY, United States |
Publisher | Association for Computing Machinery, Inc |
Abstract | Recent progress in personalized image generation using diffusion models has been significant. However, development in the area of open-domain, test-time fine-tuning-free personalized image generation is proceeding rather slowly. In this paper, we propose Subject-Diffusion, a novel open-domain personalized image generation model that does not require test-time fine-tuning and needs only a single reference image to support personalized generation of one or two subjects in any domain. First, we construct an automatic data labeling tool and use the LAION-Aesthetics dataset to build a large-scale dataset of 76M images with corresponding subject detection bounding boxes, segmentation masks, and text descriptions. Second, we design a new unified framework that combines text and image semantics by incorporating coarse location control and fine-grained reference image control to maximize subject fidelity and generalization. Furthermore, we adopt an attention control mechanism to support two-subject generation. Extensive qualitative and quantitative results demonstrate that our method has certain advantages over other frameworks in single-subject, multi-subject, and human-customized image generation. © 2024 ACM. |
Keywords | |
University attribution | Other |
Language | English |
Indexed in | |
WOS research area | Computer Science |
WOS category | Computer Science, Artificial Intelligence; Computer Science, Theory & Methods |
WOS accession number | WOS:001282218200075 |
EI accession number | 20243116789920 |
Source database | EV Compendex |
Publication type | Conference paper |
Item identifier | http://sustech.caswiz.com/handle/2SGJ60CL/794446 |
Collection | Southern University of Science and Technology |
Author affiliations | 1. OPPO AI Center, Shenzhen, China; 2. Southern University of Science and Technology, Shenzhen, China |
Recommended citation (GB/T 7714) | Ma, Jian, Liang, Junhao, Chen, Chen, et al. Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning[C]//ACM SIGGRAPH. 1601 Broadway, 10th Floor, NEW YORK, NY, UNITED STATES: Association for Computing Machinery, Inc, 2024. |
Files in this item | No files are associated with this item. |