SUSTech KC (Southern University of Science and Technology institutional repository): Hierarchical Spatial-temporal Masked Contrast for skeleton action recognition

Title	Hierarchical Spatial-temporal Masked Contrast for skeleton action recognition
Authors	Wenming Cao 1; Aoyu Zhang 1; Zhihai He 2; Yicha Zhang 3; Xinpeng Yin 1
Publication date	2024
DOI	10.1109/TAI.2024.3430260
Journal	IEEE Transactions on Artificial Intelligence
ISSN	2691-4581
Volume	PP; Issue: 99
Abstract	In the field of 3D action recognition, self-supervised learning has shown promising results but remains a challenging task. Previous approaches to motion modeling often relied on selecting features solely from the temporal or spatial domain, which limited the extraction of higher-level semantic information. Additionally, traditional one-to-one approaches in multilevel comparative learning overlooked the relationships between different levels, hindering the learning representation of the model. To address these issues, we propose the Hierarchical Spatial-temporal Masked network (HSTM) for learning 3D action representations. HSTM introduces a novel masking method that operates simultaneously in both the temporal and spatial dimensions. This approach leverages semantic relevance to identify meaningful regions in time and space, guiding the masking process based on semantic richness. This guidance is crucial for learning useful feature representations effectively. Furthermore, to enhance the learning of potential features, we introduce cross-level distillation (CLD) to extend the comparative learning approach. By training the model with two types of losses simultaneously, each level of the multi-level comparative learning process can be guided by levels rich in semantic information. This allows for more effective supervision of comparative learning, leading to improved performance. Extensive experiments conducted on the NTU-60, NTU-120, and PKU-MMD datasets demonstrate the effectiveness of our proposed framework. The learned action representations exhibit strong transferability and achieve state-of-the-art results.
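The abstract describes masking that operates jointly over time (frames) and space (joints), guided by how informative each region is. The following is a minimal sketch of that general idea. It is not the authors' HSTM method. The relevance score here is a hypothetical stand-in: per-(frame, joint) motion magnitude, which replaces the paper's semantic-relevance measure. The names `motion_relevance` and `spatiotemporal_mask` and the `ratio` parameter are illustrative assumptions.

```python
import math
from typing import List, Tuple

# Skeleton sequence as nested lists: [T frames][V joints][C coordinates]
Sequence = List[List[List[float]]]


def motion_relevance(seq: Sequence) -> List[List[float]]:
    """Hypothetical relevance proxy (assumption, not the paper's measure):
    displacement magnitude of each joint between consecutive frames."""
    T, V = len(seq), len(seq[0])
    scores = [[0.0] * V for _ in range(T)]
    for t in range(1, T):
        for v in range(V):
            scores[t][v] = math.dist(seq[t][v], seq[t - 1][v])
    # Frame 0 has no predecessor; reuse frame 1's scores so it can still be ranked
    if T > 1:
        scores[0] = list(scores[1])
    return scores


def spatiotemporal_mask(seq: Sequence, ratio: float) -> Tuple[Sequence, List[List[bool]]]:
    """Mask the top `ratio` fraction of (frame, joint) cells by relevance.

    Ranking is done over the joint T x V grid, so the mask spans the
    temporal and spatial dimensions at the same time."""
    T, V = len(seq), len(seq[0])
    scores = motion_relevance(seq)
    k = int(round(ratio * T * V))
    # Stable sort: ties keep frame-major order
    cells = sorted(
        ((t, v) for t in range(T) for v in range(V)),
        key=lambda tv: scores[tv[0]][tv[1]],
        reverse=True,
    )
    chosen = set(cells[:k])
    mask = [[(t, v) in chosen for v in range(V)] for t in range(T)]
    # Zero out masked cells; copy the rest so the input is left unchanged
    masked = [
        [[0.0] * len(seq[t][v]) if mask[t][v] else list(seq[t][v]) for v in range(V)]
        for t in range(T)
    ]
    return masked, mask


if __name__ == "__main__":
    # Joint 0 is static; joint 1 moves along x
    seq = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 0.0]], [[0.0, 0.0], [3.0, 0.0]]]
    masked, mask = spatiotemporal_mask(seq, 0.5)
    print(mask)
```

With a 50% ratio on this toy sequence, the three highest-scoring cells all belong to the moving joint, so the mask hides joint 1 in every frame and leaves the static joint visible. Choosing top-k deterministically keeps the sketch easy to check. A training pipeline might instead sample cells with probability proportional to relevance.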
Institutional attribution	Other
Output type	Journal article
Item identifier	http://sustech.caswiz.com/handle/2SGJ60CL/803257
Collection	College of Engineering / Department of Electronic and Electrical Engineering
Author affiliations	1. State Key Laboratory of Radio Frequency Heterogeneous Integration, Shenzhen University; 2. Department of Electronic and Electrical Engineering, Southern University of Science and Technology; 3. Mechanical Design Engineering, UTBM (Université de Technologie de Belfort-Montbéliard)
Recommended citation (GB/T 7714)	Wenming Cao, Aoyu Zhang, Zhihai He, et al. Hierarchical Spatial-temporal Masked Contrast for skeleton action recognition[J]. IEEE Transactions on Artificial Intelligence, 2024, PP(99).
APA	Cao, W., Zhang, A., He, Z., Zhang, Y., & Yin, X. (2024). Hierarchical spatial-temporal masked contrast for skeleton action recognition. IEEE Transactions on Artificial Intelligence, PP(99). https://doi.org/10.1109/TAI.2024.3430260
MLA	Cao, Wenming, et al. "Hierarchical Spatial-temporal Masked Contrast for Skeleton Action Recognition." IEEE Transactions on Artificial Intelligence, vol. PP, no. 99, 2024.