Title | Hierarchical Spatial-temporal Masked Contrast for skeleton action recognition |
Authors | |
Publication date | 2024 |
DOI | |
Journal | |
ISSN | 2691-4581 |
Volume | PP, Issue: 99 |
Abstract | In the field of 3D action recognition, self-supervised learning has shown promising results but remains a challenging task. Previous approaches to motion modeling often relied on selecting features solely from the temporal or spatial domain, which limited the extraction of higher-level semantic information. Additionally, traditional one-to-one approaches in multilevel comparative learning overlooked the relationships between different levels, hindering the learning representation of the model. To address these issues, we propose the Hierarchical Spatial-temporal Masked network (HSTM) for learning 3D action representations. HSTM introduces a novel masking method that operates simultaneously in both the temporal and spatial dimensions. This approach leverages semantic relevance to identify meaningful regions in time and space, guiding the masking process based on semantic richness. This guidance is crucial for learning useful feature representations effectively. Furthermore, to enhance the learning of potential features, we introduce cross-level distillation (CLD) to extend the comparative learning approach. By training the model with two types of losses simultaneously, each level of the multi-level comparative learning process can be guided by levels rich in semantic information. This allows for more effective supervision of comparative learning, leading to improved performance. Extensive experiments conducted on the NTU-60, NTU-120, and PKU-MMD datasets demonstrate the effectiveness of our proposed framework. The learned action representations exhibit strong transferability and achieve state-of-the-art results. |
Related links | [IEEE record] |
Institutional attribution | Other |
Citation statistics | |
Output type | Journal article |
Item identifier | http://sustech.caswiz.com/handle/2SGJ60CL/803257 |
Collection | College of Engineering_Department of Electronic and Electrical Engineering |
Author affiliations | 1. State Key Laboratory of Radio Frequency Heterogeneous Integration, Shenzhen University; 2. Department of Electronic and Electrical Engineering, Southern University of Science and Technology; 3. Mechanical Design Engineering, UTBM Université de Technologie de Belfort-Montbéliard |
Recommended citation GB/T 7714 |
Wenming Cao, Aoyu Zhang, Zhihai He, et al. Hierarchical Spatial-temporal Masked Contrast for skeleton action recognition[J]. IEEE Transactions on Artificial Intelligence, 2024, PP(99).
|
APA |
Wenming Cao, Aoyu Zhang, Zhihai He, Yicha Zhang, & Xinpeng Yin. (2024). Hierarchical Spatial-temporal Masked Contrast for skeleton action recognition. IEEE Transactions on Artificial Intelligence, PP(99).
|
MLA |
Wenming Cao, et al. "Hierarchical Spatial-temporal Masked Contrast for skeleton action recognition". IEEE Transactions on Artificial Intelligence PP.99 (2024).
|
Files in this item | There are no files associated with this item. |
|