Cleaning uncertain data with crowdsourcing a general model with diverse accuracy rates
2020 – 2021
#13/ 19, 1st Floor, Municipal Colony, Kangayanellore Road, Gandhi Nagar, Vellore – 6.
Off: 0416-2247353 Mo: +91 9500218218 / +91 8220150373
Website: www.shakastech.com, Email - id: shakastech@gmail.com, info@shakastech.com
Cleaning Uncertain Data with Crowdsourcing - a General Model with Diverse
Accuracy Rates
Abstract
Uncertain data has emerged as an important problem in database systems due to
the imprecise nature of many applications. To handle the uncertainty, probabilistic
databases can be used to store uncertain data, and querying facilities are provided to
yield answers with confidence. However, the uncertainty may propagate, so the
results from a query or mining process may not be useful. In this paper, we leverage the
power of crowdsourcing by designing a set of Human Intelligence Tasks (HITs) to ask a
crowd with diverse accuracy rates to improve the quality of uncertain data. Each HIT is
associated with a cost, so we need solutions that maximize data quality with a
minimal number of HITs. Two obstacles make this optimization non-trivial and lead to
very high computational cost for selecting the optimal set of HITs. First,
members of a crowd may return incorrect answers with different probabilities. Second,
the HITs decomposed from uncertain data are often correlated. We have addressed
these challenges in this paper by designing an effective approximation algorithm and an
efficient heuristic solution, even under diverse individual accuracy rates of the
crowdsourcing workers.
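
To make the setting concrete, the following is a minimal sketch of how a single worker answer with a known accuracy rate could sharpen the probability of one uncertain tuple, and how a budget of HITs might be spent greedily. It is not the approximation algorithm or heuristic from the paper: it assumes binary yes/no HITs, a symmetric accuracy rate, Shannon entropy as the quality measure, and independent tuples (the correlation between HITs that the abstract highlights is ignored here). All function names are hypothetical.

```python
import math


def entropy(p):
    """Binary Shannon entropy (in bits) of a tuple that is true with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def posterior(prior, answer_yes, accuracy):
    """Bayes update of P(tuple is true) after one worker answer.

    Assumption: `accuracy` is the probability that the worker answers
    correctly, and it is the same for true and false tuples.
    """
    like_true = accuracy if answer_yes else 1 - accuracy
    like_false = 1 - accuracy if answer_yes else accuracy
    evidence = prior * like_true + (1 - prior) * like_false
    return prior * like_true / evidence


def expected_entropy_reduction(prior, accuracy):
    """Expected drop in entropy from asking one HIT about this tuple."""
    p_yes = prior * accuracy + (1 - prior) * (1 - accuracy)
    h_after = 0.0
    for answer_yes, p_answer in ((True, p_yes), (False, 1 - p_yes)):
        if p_answer > 0:  # skip impossible answers to avoid dividing by zero
            h_after += p_answer * entropy(posterior(prior, answer_yes, accuracy))
    return entropy(prior) - h_after


def pick_hits(priors, accuracy, budget):
    """Greedy choice: the `budget` tuples with the largest expected gain.

    Assumption: tuples are independent, so gains can be ranked separately.
    """
    gains = {t: expected_entropy_reduction(p, accuracy) for t, p in priors.items()}
    return sorted(gains, key=gains.get, reverse=True)[:budget]


if __name__ == "__main__":
    priors = {"t1": 0.5, "t2": 0.9, "t3": 0.99}
    print(pick_hits(priors, accuracy=0.8, budget=2))
```

Under these assumptions, a worker who answers at random (accuracy 0.5) yields no expected gain, while a perfectly reliable worker removes all uncertainty about the tuple, which is why diverse accuracy rates change which HITs are worth paying for.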