Title

Learning to Adapt CLIP for Few-Shot Monocular Depth Estimation

Authors
Xueting Hu; Ce Zhang; Yi Zhang; et al.
DOI
Publication Date
2024
ISSN
2472-6737
ISBN
979-8-3503-1893-7
Conference Proceedings
Conference Date
3-8 Jan. 2024
Conference Location
Waikoloa, HI, USA
Abstract
Pre-trained Vision-Language Models (VLMs) such as CLIP have shown strong performance across a range of tasks that integrate visual and linguistic modalities. When CLIP is applied to depth estimation, the patches divided from an input image can be compared against a set of semantic descriptions of depth to obtain similarity scores. A coarse depth estimate is then obtained by weighting and summing the depth values, called depth bins, associated with the predefined semantic descriptions. This zero-shot approach circumvents the computationally and time-intensive training of traditional fully supervised depth estimation methods. However, because it relies on fixed depth bins, it may generalize poorly: images from different scenes can exhibit distinct depth distributions. To address this challenge, we propose a few-shot method that learns to adapt VLMs for monocular depth estimation, balancing training cost against generalization. Specifically, it assigns different depth bins to different scenes, which the model selects among during inference. We further incorporate learnable prompts that preprocess the input text, converting human-readable descriptions into vectors the model can exploit more effectively, further improving performance. With only one image per scene for training, extensive experiments on the NYU V2 and KITTI datasets demonstrate that our method outperforms the previous state-of-the-art method by up to 10.6% in terms of MARE.
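The weighted-bin computation the abstract describes is easy to sketch. Below is a minimal, hypothetical PyTorch illustration of the zero-shot baseline, assuming 7 depth descriptions, a 24x24 patch grid, and random tensors standing in for CLIP's text and patch embeddings; the bin values and shapes are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

# Assumed setup: K depth descriptions (e.g. "This object is giant", ...,
# "This object is far away"), each tied to a depth-bin value in meters.
K, N, D = 7, 24 * 24, 512
depth_bins = torch.tensor([1.0, 1.5, 2.0, 2.25, 2.5, 2.75, 3.0])  # hypothetical bin centers
text_emb = F.normalize(torch.randn(K, D), dim=-1)   # stand-in for CLIP text embeddings
patch_emb = F.normalize(torch.randn(N, D), dim=-1)  # stand-in for per-patch image features

# Cosine similarity between every patch and every description, scaled by
# CLIP's logit temperature, then softened into per-patch weights over the bins.
logits = 100.0 * patch_emb @ text_emb.t()  # shape (N, K)
weights = logits.softmax(dim=-1)

# Coarse depth map: weighted sum of the bin values, reshaped to the patch grid.
depth = (weights * depth_bins).sum(dim=-1).reshape(24, 24)
```

The few-shot adaptation proposed in the paper would slot into this same computation: the single fixed depth_bins vector is replaced by scene-specific bin sets the model selects among at inference, and the hand-written descriptions are replaced by learnable prompt vectors.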
University Authorship
First
Related Links
[IEEE Record]
Indexed By
Citation Statistics
Document Type
Conference Paper
Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/789102
Collection
College of Engineering_Department of Electronic and Electrical Engineering
Author Affiliations
1.Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China
2.Pengcheng Laboratory
First Author Affiliation
Department of Electronic and Electrical Engineering
First Author's First Affiliation
Department of Electronic and Electrical Engineering
Recommended Citation
GB/T 7714
Xueting Hu, Ce Zhang, Yi Zhang, et al. Learning to Adapt CLIP for Few-Shot Monocular Depth Estimation[C], 2024.
Files in This Item
There are no files associated with this item.
