Title

Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning

Authors
Corresponding authors: Chen, Chen; Lu, Haonan
DOI
Publication date
2024-07-13
Conference name
SIGGRAPH 2024 Conference Papers
ISBN
9798400705250
Proceedings title
Conference dates
July 28, 2024 - August 1, 2024
Conference location
Denver, CO, United States
Proceedings editor / Conference organizer
ACM SIGGRAPH
Place of publication
1601 Broadway, 10th Floor, NEW YORK, NY, UNITED STATES
Publisher
Abstract
Recent progress in personalized image generation using diffusion models has been significant. However, development in the area of open-domain and test-time fine-tuning-free personalized image generation is proceeding rather slowly. In this paper, we propose Subject-Diffusion, a novel open-domain personalized image generation model that, in addition to not requiring test-time fine-tuning, requires only a single reference image to support personalized generation of one or two subjects in any domain. Firstly, we construct an automatic data labeling tool and use the LAION-Aesthetics dataset to construct a large-scale dataset consisting of 76M images and their corresponding subject detection bounding boxes, segmentation masks, and text descriptions. Secondly, we design a new unified framework that combines text and image semantics by incorporating coarse location and fine-grained reference image control to maximize subject fidelity and generalization. Furthermore, we adopt an attention control mechanism to support two-subject generation. Extensive qualitative and quantitative results demonstrate that our method has certain advantages over other frameworks in single, multiple, and human-customized image generation.
© 2024 ACM.
Keywords
Institutional attribution
Other
Language
English
Indexing category
WOS research area
Computer Science
WOS category
Computer Science, Artificial Intelligence ; Computer Science, Theory & Methods
WOS accession number
WOS:001282218200075
EI accession number
20243116789920
Source database
EV Compendex
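Document type: Conference paper
Item identifier: http://sustech.caswiz.com/handle/2SGJ60CL/794446
Collection: Southern University of Science and Technology
Author affiliations
1. OPPO AI Center, Shenzhen, China
2. Southern University of Science and Technology, Shenzhen, China
Recommended citation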
GB/T 7714
Ma, Jian,Liang, Junhao,Chen, Chen,et al. Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning[C]//ACM SIGGRAPH. 1601 Broadway, 10th Floor, NEW YORK, NY, UNITED STATES:Association for Computing Machinery, Inc,2024.