Title

TMIXT: A process flow for Transcribing MIXed handwritten and machine-printed Text

Authors
Corresponding author: Obara, Boguslaw
DOI
Publication date
2018
ISSN
2639-1589
ISBN
978-1-5386-5036-3
Proceedings title
Pages
2986-2994
Conference date
10-13 Dec. 2018
Conference location
Seattle, WA, United States
Place of publication
345 E 47TH ST, NEW YORK, NY 10017 USA
Publisher
Abstract
Handling large corpora of documents is of significant importance in many fields, nowhere more so than in crime investigation and defence, where an organisation may be presented with a large volume of scanned documents that must be processed in a finite time. The problem is exacerbated both by the volume of scanned documents and by the complexity of the pages to be processed, which often contain many different elements that each need to be processed and understood. Text recognition, a primary task in this process, usually depends on the type of text, which is either handwritten or machine-printed. Accordingly, recognition involves classifying the text category before deciding which recognition method to apply. This poses a more challenging task if a document contains both handwritten and machine-printed text. In this work, we present a generic process flow for text recognition in scanned documents containing mixed handwritten and machine-printed text without the need to classify text in advance. We realize the proposed process flow using several open-source image processing and text recognition packages. The evaluation is performed using a specially developed variant of the IAM handwriting database, presented in this work, on which we achieve an average transcription accuracy of nearly 80% for pages containing both printed and handwritten text.
Keywords
University attribution
Other
Language
English
Related links [source record]
Indexed in
Funding project
Istituto Superiore di Sanità[]
WOS research area
Computer Science
WOS categories
Computer Science, Artificial Intelligence ; Computer Science, Information Systems ; Computer Science, Theory & Methods
WOS accession number
WOS:000468499303009
EI accession number
20191106615762
EI subject terms
Big data ; Copying ; Image processing ; Optical character recognition ; Optical data processing ; Transcription
EI classification codes
Biology:461.9 ; Data Processing and Image Processing:723.2 ; Light/Optics:741.1 ; Reproduction, Copying:745.2 ; Information Sources and Analysis:903.1
Source database
Web of Science
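Full-text link: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8622136
Citation statistics
Times cited [WOS]: 3
Output type: Conference paper
Item identifier: http://sustech.caswiz.com/handle/2SGJ60CL/24651
Collection: College of Engineering_Department of Computer Science and Engineering
Author affiliations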
1.Univ Durham, Dept Comp Sci, Durham, England
2.Newcastle Univ, Sch Comp, Newcastle Upon Tyne, Tyne & Wear, England
3.Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen, Peoples R China
Recommended citation
GB/T 7714
Medhat, Fady,Mohammadi, Mahnaz,Jaf, Sardar,et al. TMIXT: A process flow for Transcribing MIXed handwritten and machine-printed Text[C]. 345 E 47TH ST, NEW YORK, NY 10017 USA:IEEE,2018:2986-2994.
Files in this item
There are no files associated with this item.

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.