Title | Cost-effective crowdsourced join queries for entity resolution without prior knowledge |
Authors | Yin, Bo; Zeng, Weilong; Wei, Xuetao |
Corresponding author | Yin, Bo |
Publication date | 2022-02-01 |
DOI | |
Journal | Future Generation Computer Systems-The International Journal of eScience |
ISSN | 0167-739X |
Volume | 127, pages 240-251 |
Abstract | The join query, which finds matching pairs from two object sets, is a fundamental operation in computer systems and helps solve many real-world problems, e.g., entity resolution. In this paper, we address the problem of join queries by leveraging crowdsourcing to obtain matching relationships. The goal is to minimize the monetary cost while maintaining high-quality query results. However, existing approaches focus on finding matching pairs within a single object set and assume the existence of prior knowledge, an assumption that does not hold in real applications. We propose a cost-effective crowdsourced join query framework that minimizes the overall monetary cost by reducing both the cost of labeling a single pair and the number of comparison pairs. Specifically, we first propose a novel two-level confidence-based labeling model that minimizes the cost of labeling a single pair with a confidence guarantee. This model crowdsources easy-to-judge pairs to ordinary workers and asks skilled workers, who may charge more than ordinary workers, to compare only hard-to-judge pairs. Statistical estimation is used to aggregate crowdsourcing results with 1−α confidence. Then, we propose a transitivity-based query scheme that minimizes the number of comparison pairs on the basis of transitive relations. Guided by the principle of eagerly identifying matching pairs, especially matching pairs from a single set, our scheme carefully designs the processing order of pairs to make full use of transitivity to infer new labels. The results of our extensive experiments demonstrate that the proposed framework saves considerably more monetary cost while ensuring the accuracy of results. |
Keywords | |
Related links | [Scopus record] |
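Indexed in | |
Language | English |
Institutional attribution | Other |
WOS accession number | WOS:000706478900004 |
EI accession number | 20213910954793 |
EI controlled terms | Cost effectiveness; Search engines |
EI classification codes | Computer Software, Data Handling and Applications:723; Industrial Economics:911.2 |
Scopus record ID | 2-s2.0-85115749173 |
Source database | Scopus |
Citation statistics | Times cited [WOS]: 1 |
Document type | Journal article |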
Item identifier | http://sustech.caswiz.com/handle/2SGJ60CL/253413 |
Collection | College of Engineering_Department of Computer Science and Engineering |
Author affiliations | 1. School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, 410114, China 2. Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China |
Recommended citation (GB/T 7714) | Yin, Bo, Zeng, Weilong, Wei, Xuetao. Cost-effective crowdsourced join queries for entity resolution without prior knowledge[J]. Future Generation Computer Systems-The International Journal of eScience, 2022, 127: 240-251. |
APA | Yin, Bo, Zeng, Weilong, & Wei, Xuetao. (2022). Cost-effective crowdsourced join queries for entity resolution without prior knowledge. Future Generation Computer Systems-The International Journal of eScience, 127, 240-251. |
MLA | Yin, Bo, et al. "Cost-effective crowdsourced join queries for entity resolution without prior knowledge". Future Generation Computer Systems-The International Journal of eScience 127 (2022): 240-251. |
Files in this item | There are no files associated with this item. |