SUSTech KC (南方科技大学知识苑): Automatic Data Extraction and Visual Analysis and Deduction in Court Judgment Documents

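Title	法院裁判文书当中的数据自动抽取及其可视化分析推演
Alternative title	Automatic Data Extraction and Visual Analysis and Deduction in Court Judgment Documents
Author	Zhang Tao (张涛)
Student ID	11749178
Degree type	Master's
Degree discipline	Electronics and Communication Engineering
Supervisor	Yao Xin (姚新)
Thesis defense date	2019-05-30
Thesis submission date	2019-06-28
Degree-granting institution	Harbin Institute of Technology
Degree-granting location	Shenzhen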
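Abstract	Since the 18th National Congress of the Communist Party of China, China has taken a zero-tolerance stance toward duty crimes and made the fight against corruption an important part of national governance. Through the efforts of anti-corruption personnel, more and more duty-crime cases have been investigated and made public. The resulting problem is that the large volume of duty-crime cases demands enormous manpower and resources from discipline inspection and supervision departments. Against this background, the Ministry of Science and Technology, together with the Ministry of Public Security, established the major special project "Public Security Risk Prevention and Control and Emergency Technical Equipment", which aims to use computer technology to help case handlers work more efficiently and to assess the duty-crime situation, so as to strengthen prevention in key high-incidence areas and reduce the incidence of duty crimes. This research belongs to the topic "Multi-modal Anti-corruption Case Characteristics Discovery and Prediction of the Development Trend of Corruption Cases" under that "Public Security" major special project. The main research questions are how to automatically collect judgment documents published on the internet, how to use natural language processing to extract 27 categories of key information from them, and finally how to design and build a web system for situation analysis and assessment of duty-crime cases and for simulated deduction of individual cases. As of March 2019, more than 70 million judgment documents had been published on the internet, of which more than 70,000 concern duty crimes. We therefore first used web crawling to obtain public judgment documents as the initial document data. We then applied several natural language processing algorithms, including a manually constructed rule-based information extraction algorithm and named entity recognition, to extract the various categories of information accurately and store them in a database. On this basis, we designed and developed a web-based visualization system for judgment document data analysis and for situation assessment and deduction. Finally, system testing techniques were used to refine the system design and to verify that it meets the requirements of the project.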
Other abstract	Since the 18th National Congress of the Communist Party of China, China has adopted a zero-tolerance attitude towards duty crimes and made the fight against corruption an important part of national governance. With the efforts of the large number of anti-corruption staff, more and more crimes have been investigated. The problem is that the large number of criminal cases requires ever more resources. In this situation, the Ministry of Science and Technology and the Ministry of Public Security have set up a major research project, "Public Security Risk Prevention and Control and Emergency Technical Equipment". It aims to improve the efficiency of investigation through computer technology and to assess the situation of duty crimes, so as to strengthen control in key high-incidence areas and reduce duty crimes. My research belongs to the project "Multi-modal Anti-corruption Case Characteristics Discovery and Prediction of the Development Trend of Corruption Cases", a sub-project of "Public Security Risk Prevention and Control and Emergency Technical Equipment". The main research question is how to automatically collect judgment documents from the internet. I then use natural language processing technology to extract 27 kinds of key information from the judgment documents and design a web system with modules for situation analysis of duty crime cases, case simulation, and others. As of March 2019, more than 70,000,000 judgment documents had been published on the internet, so we use crawler technology to obtain the public judgment documents as the initial document data. After that, we use several natural language processing algorithms, such as rule-based information extraction and named entity recognition, and store the extracted multi-category information in a MySQL database. On this basis, we design a web-based visualization system with modules for judgment document data analysis and for situation assessment and deduction. Finally, we carry out system testing to confirm that the system meets the design requirements of the project.
Keywords	natural language processing; information extraction; judgment documents; visualization system
Other keywords	natural language processing; information extraction; judgment documents; visualization system
Language	Chinese
Training type	Joint training
Output type	Thesis
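Item identifier	http://sustech.caswiz.com/handle/2SGJ60CL/38888
Collection	School of Innovation and Entrepreneurship
Author affiliation	Southern University of Science and Technology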
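Recommended citation (GB/T 7714)	Zhang Tao. Automatic Data Extraction and Visual Analysis and Deduction in Court Judgment Documents [D]. Shenzhen: Harbin Institute of Technology, 2019.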

Files in this item
File name/size	Document type	Version type	Access type	License	Action
法院裁判文书当中的数据自动抽取及其可视化 (3677 KB)	--	--	Restricted access	--	Request full text