Title | ACT-Net: Anchor-Context Action Detection in Surgery Videos |
Authors | |
Corresponding authors | Hu, Yan; Duan, Jinming; Liu, Jiang |
Co-first authors | Hao, Luoying; Hu, Yan |
DOI | |
Publication date | 2023 |
Conference name | 26th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) |
ISSN | 0302-9743 |
EISSN | 1611-3349 |
ISBN | 978-3-031-43995-7 |
Proceedings title | |
Volume | 14228 |
Conference dates | Oct 08-12, 2023 |
Conference location | Vancouver, Canada |
Place of publication | Gewerbestrasse 11, Cham, CH-6330, Switzerland |
Publisher | |
Abstract | Recognition and localization of detailed surgical actions is an essential component of a context-aware decision support system. However, most existing detection algorithms fail to provide high-accuracy action classes even when their locations are given, because they do not consider the regularity of the surgical procedure across the whole video. This limitation hinders their application. Moreover, deploying predictions in clinical applications requires conveying model confidence in order to earn trust, which remains unexplored in surgical action prediction. In this paper, to accurately detect the fine-grained actions that happen at every moment, we propose an anchor-context action detection network (ACT-Net), comprising an anchor-context detection (ACD) module and a class conditional diffusion (CCD) module, to answer the following questions: 1) where the actions happen; 2) what the actions are; 3) how confident the predictions are. Specifically, the proposed ACD module spatially and temporally highlights the regions interacting with the extracted anchor in the surgery video, and outputs the action location and its class distribution based on anchor-context interactions. Considering the full distribution of action classes in videos, the CCD module adopts a denoising-diffusion-based generative model conditioned on our ACD estimator to further reconstruct accurate action predictions. Moreover, we exploit the stochastic nature of the diffusion model outputs to assess model confidence for each prediction. Our method achieves state-of-the-art performance, with an improvement of 4.0% mAP over the baseline on the surgical video dataset. |
Keywords | |
University authorship | Co-first; Corresponding |
Language | English |
Related links | [Source record] |
Indexing | |
Funding | General Program of National Natural Science Foundation of China ["82272086", "82102189"]; Guangdong Basic and Applied Basic Research Foundation [2021A1515012195]; Shenzhen Stable Support Plan Program ["20220815111736001", "20200925174052004"]; Agency for Science, Technology and Research (A*STAR) Advanced Manufacturing and Engineering (AME) Programmatic Fund [A20H4b0141] |
WOS research areas | Computer Science; Engineering |
WOS categories | Computer Science, Artificial Intelligence; Computer Science, Theory & Methods; Engineering, Biomedical |
WOS accession number | WOS:001109638800019 |
Source database | Web of Science |
Citation statistics | Times cited [WOS]: 1 |
Item type | Conference paper |
Item identifier | http://sustech.caswiz.com/handle/2SGJ60CL/673810 |
Collection | College of Engineering_Research Institute of Trustworthy Autonomous Systems; College of Engineering_Department of Computer Science and Engineering |
Author affiliations | 1. School of Computer Science, University of Birmingham, Birmingham, United Kingdom 2. Research Institute of Trustworthy Autonomous Systems and Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China 3. Department of Mechanical Engineering, National University of Singapore, Singapore, Singapore 4. Third Medical Center of Chinese PLAGH, Beijing, China 5. Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore |
First author affiliation | Research Institute of Trustworthy Autonomous Systems; Department of Computer Science and Engineering |
Corresponding author affiliation | Research Institute of Trustworthy Autonomous Systems; Department of Computer Science and Engineering |
Recommended citation (GB/T 7714) | Hao, Luoying, Hu, Yan, Lin, Wenjun, et al. ACT-Net: Anchor-Context Action Detection in Surgery Videos[C]. Gewerbestrasse 11, Cham, CH-6330, Switzerland: Springer International Publishing AG, 2023. |
Files in this item | No files are associated with this item. |