SlideShare ist ein Scribd-Unternehmen logo
Apollo: Towards Factfinding in Participatory Sensing
H. Khac Le, J. Pasternack, H. Ahmadi, M. Gupta, Y. Sun, T. Abdelzaher, J. Han, D. Roth
University of Illinois at Urbana Champaign
Urbana, IL 61801
B. Szymanski, S. Adali
Rensselaer Polytechnic Institute
Troy, NY 12180
ABSTRACT
This demonstration presents Apollo, a new sensor informa-
tion processing tool for uncovering likely facts in noisy par-
ticipatory sensing data1
. Participatory sensing, where users
proactively document and share their observations, has re-
ceived significant attention in recent years as a paradigm for
crowd-sourcing observation tasks. However, it poses inter-
esting challenges in assessing confidence in the information
received. By borrowing clustering and ranking tools from
data mining literature, we show how to group data into sets
(or claims), corroborating specific events or observations,
then iteratively assess both claim and source credibility, ulti-
mately leading to a ranking of described claims by their like-
lihoold of occurrence. Apollo belongs to a category of tools
called fact-finders. It is the first fact-finder designed and im-
plemented specifically for participatory sensing. Apollo uses
Twitter as the underlying engine for sharing participatory
sensing data. Twitter is widely popular, can be interfaced to
cell-phones that share sensor data, and comes with a pow-
erful search API, as well as a publish-subscribe mechanism.
We evaluate it using a participatory sensing application that
collects and posts noisy vehicular traffic data on Twitter, as
well as a set of 60,000 (human) tweets collected during the
Haiti tsunami and a set of 500,000 tweets collected about
Cairo during its recent unrest. Viewers of the demonstra-
tion will interact with Apollo for various fact-finding tasks.
Categories and Subject Descriptors
D.2.5 [Software Engineering]:
General Terms
Design, Reliability, Experimentation
Keywords
Data fusion, Participatory sensing, Quality of information
1
Named after the Greek god of light and truth, among other
designations
IPSN 2011 Chicago, IL USA
1. INTRODUCTION
This demonstration illustrates a fact-finding tool designed
to uncover most likely truth in noisy participatory sensing
data. Participatory sensing [1], where participants proac-
tively report their observations, has become an increasingly
important data collection paradigm, where humans act as
the sensors, or employ devices they own (such as cell phones)
to perform sensing tasks. Its growing importance stems from
the large penetration of sensor-enabled devices with com-
munication capabilities in human populations. However, it
poses challenges in that data are collected from individu-
als and devices who may not always be accurate. Filtering
the deluge of data for correctness is an important challenge,
commonly known in machine learning and knowledge discov-
ery literature as fact-finding. Apollo is the first fact-finder
designed specifically for participatory sensing data.
A fact-finder [6] maintains the abstraction of sources and
claims. In general, starting with no a priori information,
it iteratively computes their credibility: The credibility of
claims depends on the credibility of sources that make them.
Similarly, the credibility of sources depends on the credibil-
ity of claims they make. Iterations continue until they con-
verge. Enhancements of this basic iterative scheme include
a more general notion of claim assertion, where weights de-
scribe how confidently a source asserts a claim [5], incorpo-
ration of prior knowledge in the analysis [4], and clustering
of claims by subject [3] since the credibility of a source may
depend on the subject matter. The impact of dependence
between sources on credibility (e.g., when one spreads claims
made by others) can also be accounted for [2].
Apollo is a general framework for fact-finding in participa-
tory sensing data. Data-type-dependent modules first con-
vert source data (a set of source, observation tuples) into a
common representation of sources and claims. Clustering is
performed on input tuples first, by similarity of their obser-
vations, to generate a smaller number of claims for scalabil-
ity. Their credibility is then assessed in an iterative fashion,
together with the credibility of each source. Figure 1 illus-
trates the architecture of Apollo.
We evaluate our fact-finding algorithm in three distinct sce-
narios. In the first, a controlled experiment is conducted
where a set of participants report (tweet) traffic speed data
from GPS sensors in vehicles. We intentially corrupt a frac-
tion of observations. We also allow some participants to
report traffic information they heard from others, as op-
Sensor
Data
Front-end
Text
Front-end
Image
Front-end
Distance
M etric, S
Distance
M etric, T
Distance
M etric, I
Claim Clusters
A Network
of Claims
and Sources
Claim
Credibility
Assessment
Source
Credibility
Assessment
Figure 1: The Architecture of Apollo
posed to reporting their own (i.e., spread rumors). We show
that computing average statistics from such noisy data is not
always accurate. However, when data tuples are clustered
and ranked by Apollo, the quality of reported observations
increases considerably. The second and third scenarios use
Twitter data sets collected during the Haiti Tsunami (60,000
reports) and the recent Cairo unrest (500,000 reports), rep-
sectively. In each case, the significant number of reports
describe a much smaller number of events, some real and
some rumored. We use our algorithm to identify the dis-
tinct events and rank them by credibility. The results are
then compared to media reports.
2. SYSTEM OVERVIEW
We consider a system composed of human or sensory data
sources that send their observations as Twitter feeds. The
sink has two modes of operation. A follower mode, where it
follows only explicitly named sources, and a crawler mode,
where it uses the Twitter search API to download all tweets
on the topic of a query (subject to API-specific volume con-
straints). A fact-finding algorithm is then applied to clean
up the data. The algorithm has the following reconfigurable
parts:
‱ The parser: It uses a configuration file (that describes
the format of the input data stream) to convert in-
put data into a standard JSON format. Conceptually,
the data stream is composed of source, observation tu-
ples, where source identifies the data source (e.g., the
Twitter user ID), and the observation content may be
structured, as in the case of phones sharing sensor val-
ues, or unstructured, as in the case of people sharing
text. The configuration file describes the observation
format.
‱ The distance metrics: A library of different distance
metrics is provided for clustering of observations. For
unstructured text, these metrics reflect text similar-
ity. For structured data, metrics compute differences
in data vectors. Data can be multidimensional. For ex-
ample, when cell-phones report sensor values at given
locations, both measurements and locations can be el-
ements of the data vector.
‱ The cluster credibility metrics: When observations are
clustered by the distance metric, one needs to rank
different clusters in terms of credibility. A library
of ranking functions are provided, inspired by current
fact-finding literature, ranging from simple ones (how
many tuples are in the cluster) to more complex ones
(that take into account the social network relation be-
tween sources). For example, the rank of a cluster may
be lower if observations in the cluster are reported by a
group of tweeters who are followers of the same source.
‱ Source credibility metrics: More credible sources are
those whose observations are found more often in more
credible clusters. This basic intuition leads to a library
of simple credibility metrics to rank sources.
With the above parameters, the rest of the algorithm simply
iterates on computing clusters, cluster credibility metrics,
source credibility metrics, until stable results are obtained.
3. THE DEMONSTRATION SCRIPT
During the demo, a laptop will be used that runs Apollo on
three different sets of Twitter data; a set obtained in a con-
trolled vehicular traffic speed measurement experiment, a set
collected from Haiti during its tsunami, and a set collected
from Cairo during its recent unrest. Users will be allowed to
reconfigure the experiments by changing the used distance
and credibility metrics, clustering algorithm parameters, as
well as the amount of data operated on, and observing their
effects on final results. This will help answer questions such
as: what is the impact (in the given scenario) of considering
social network relations between data sources on improv-
ing quality of information in participatory sensing systems?
What are the trade-offs between different distance metrics
among data vectors? How coarse-grained can observation
clustering be without impacting correctness of results? How
much data is needed to get reliable results? Results will
be displayed as lists of credible events and credible sources.
“Ground truth” data will also be shown (which is known for
our controlled experiment and estimated from other media
in the other two case studies).
4. REFERENCES
[1] J. Burke, D. Estrin, M. Hansen, A. Parker, N. Ramanathan,
S. Reddy, and M. B. Srivastava. Participatory sensing. In In:
Workshop on World-Sensor-Web (WSW): Mobile Device
Centric Sensor Networks and Applications, pages 117–134,
2006.
[2] X. L. Dong, L. Berti-Equille, and D. Srivastava. Integrating
conflicting data: the role of source dependence. Proc. VLDB
Endow., 2:550–561, August 2009.
[3] M. Gupta, Y. Sun, and J. Han. Trust analysis with
clustering (poster paper). In World Wide Web Conference
(WWW), Hyderabad, India, March 2011.
[4] J. Pasternack and D. Roth. Knowing what to believe (when
you already know something). In Proc. the International
Conference on Computational Linguistics (COLING),
Beijing, China, August 2010.
[5] J. Pasternack and D. Roth. Generalized fact-finding (poster
paper). In World Wide Web Conference (WWW),
Hyderabad, India, March 2011.
[6] X. Yin, J. Han, and P. S. Yu. Truth discovery with multiple
conflicting information providers on the web. IEEE Trans.
on Knowl. and Data Eng., 20:796–808, June 2008.

Weitere Àhnliche Inhalte

Ähnlich wie Apollo Towards Factfinding In Participatory Sensing

Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Editor IJAIEM
 
On nonmetric similarity search problems in complex domains
On nonmetric similarity search problems in complex domainsOn nonmetric similarity search problems in complex domains
On nonmetric similarity search problems in complex domains
unyil96
 
Nonmetric similarity search
Nonmetric similarity searchNonmetric similarity search
Nonmetric similarity search
unyil96
 
Information retrieval
Information retrievalInformation retrieval
Information retrieval
hplap
 
On Machine Learning and Data Mining
On Machine Learning and Data MiningOn Machine Learning and Data Mining
On Machine Learning and Data Mining
butest
 

Ähnlich wie Apollo Towards Factfinding In Participatory Sensing (20)

Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
 
Event detection and summarization based on social networks and semantic query...
Event detection and summarization based on social networks and semantic query...Event detection and summarization based on social networks and semantic query...
Event detection and summarization based on social networks and semantic query...
 
Data mining java titles adrit solutions
Data mining java titles adrit solutionsData mining java titles adrit solutions
Data mining java titles adrit solutions
 
Data repository for sensor network a data mining approach
Data repository for sensor network  a data mining approachData repository for sensor network  a data mining approach
Data repository for sensor network a data mining approach
 
Lspnew (1)
Lspnew (1)Lspnew (1)
Lspnew (1)
 
On nonmetric similarity search problems in complex domains
On nonmetric similarity search problems in complex domainsOn nonmetric similarity search problems in complex domains
On nonmetric similarity search problems in complex domains
 
Nonmetric similarity search
Nonmetric similarity searchNonmetric similarity search
Nonmetric similarity search
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...
 
Information retrieval
Information retrievalInformation retrieval
Information retrieval
 
On Machine Learning and Data Mining
On Machine Learning and Data MiningOn Machine Learning and Data Mining
On Machine Learning and Data Mining
 
Inspection of Certain RNN-ELM Algorithms for Societal Applications
Inspection of Certain RNN-ELM Algorithms for Societal ApplicationsInspection of Certain RNN-ELM Algorithms for Societal Applications
Inspection of Certain RNN-ELM Algorithms for Societal Applications
 
Ijetcas14 446
Ijetcas14 446Ijetcas14 446
Ijetcas14 446
 
Novel Methodology of Data Management in Ad Hoc Network Formulated using Nanos...
Novel Methodology of Data Management in Ad Hoc Network Formulated using Nanos...Novel Methodology of Data Management in Ad Hoc Network Formulated using Nanos...
Novel Methodology of Data Management in Ad Hoc Network Formulated using Nanos...
 
Outlier Detection using Reverse Neares Neighbor for Unsupervised Data
Outlier Detection using Reverse Neares Neighbor for Unsupervised DataOutlier Detection using Reverse Neares Neighbor for Unsupervised Data
Outlier Detection using Reverse Neares Neighbor for Unsupervised Data
 
A predictive model for mapping crime using big data analytics
A predictive model for mapping crime using big data analyticsA predictive model for mapping crime using big data analytics
A predictive model for mapping crime using big data analytics
 
Comparative Analysis of K-Means Data Mining and Outlier Detection Approach fo...
Comparative Analysis of K-Means Data Mining and Outlier Detection Approach fo...Comparative Analysis of K-Means Data Mining and Outlier Detection Approach fo...
Comparative Analysis of K-Means Data Mining and Outlier Detection Approach fo...
 
IRJET- Predicting Outcome of Judicial Cases and Analysis using Machine Le...
IRJET-  	  Predicting Outcome of Judicial Cases and Analysis using Machine Le...IRJET-  	  Predicting Outcome of Judicial Cases and Analysis using Machine Le...
IRJET- Predicting Outcome of Judicial Cases and Analysis using Machine Le...
 
Meliorating usable document density for online event detection
Meliorating usable document density for online event detectionMeliorating usable document density for online event detection
Meliorating usable document density for online event detection
 
Linked sensor data
Linked sensor dataLinked sensor data
Linked sensor data
 

Mehr von Carmen Pell

Mehr von Carmen Pell (20)

Abstract Paper Sample Format - 010 Essay Examp
Abstract Paper Sample Format - 010 Essay ExampAbstract Paper Sample Format - 010 Essay Examp
Abstract Paper Sample Format - 010 Essay Examp
 
Introducing A Topic Opinion Writing Writing Introductions
Introducing A Topic Opinion Writing  Writing IntroductionsIntroducing A Topic Opinion Writing  Writing Introductions
Introducing A Topic Opinion Writing Writing Introductions
 
Routemybook - Buy MLA Handbook For Writers Of R
Routemybook - Buy MLA Handbook For Writers Of RRoutemybook - Buy MLA Handbook For Writers Of R
Routemybook - Buy MLA Handbook For Writers Of R
 
The Fascinating Recommendation Report T
The Fascinating Recommendation Report TThe Fascinating Recommendation Report T
The Fascinating Recommendation Report T
 
Example Of Abstract In Thesis Writing. Writing An Abstract
Example Of Abstract In Thesis Writing. Writing An AbstractExample Of Abstract In Thesis Writing. Writing An Abstract
Example Of Abstract In Thesis Writing. Writing An Abstract
 
Policy Brief Format Template
Policy Brief Format TemplatePolicy Brief Format Template
Policy Brief Format Template
 
Sample Annotated Bibliography Free Download
Sample Annotated Bibliography Free DownloadSample Annotated Bibliography Free Download
Sample Annotated Bibliography Free Download
 
Components Of Thesis
Components Of ThesisComponents Of Thesis
Components Of Thesis
 
TEP025 Writing The Aims And Hypotheses Of Your Laboratory Report
TEP025 Writing The Aims And Hypotheses Of Your Laboratory ReportTEP025 Writing The Aims And Hypotheses Of Your Laboratory Report
TEP025 Writing The Aims And Hypotheses Of Your Laboratory Report
 
Animal nutrition 1 (a.pdf
Animal nutrition 1 (a.pdfAnimal nutrition 1 (a.pdf
Animal nutrition 1 (a.pdf
 
A Review of Sustainable Finance and Financial Performance Literature Mini-Re...
A Review of Sustainable Finance and Financial Performance Literature  Mini-Re...A Review of Sustainable Finance and Financial Performance Literature  Mini-Re...
A Review of Sustainable Finance and Financial Performance Literature Mini-Re...
 
Alimentacion De Camelidos Sudamericanos
Alimentacion De Camelidos SudamericanosAlimentacion De Camelidos Sudamericanos
Alimentacion De Camelidos Sudamericanos
 
Anisha- Brand Audit
Anisha- Brand AuditAnisha- Brand Audit
Anisha- Brand Audit
 
Author Holger Polch (Version 2.0 -11 2011) SAP Transportation Management 9.X...
Author  Holger Polch (Version 2.0 -11 2011) SAP Transportation Management 9.X...Author  Holger Polch (Version 2.0 -11 2011) SAP Transportation Management 9.X...
Author Holger Polch (Version 2.0 -11 2011) SAP Transportation Management 9.X...
 
After The Content Course An Expert-Novice Study Of Disciplinary Literacy Pra...
After The Content Course  An Expert-Novice Study Of Disciplinary Literacy Pra...After The Content Course  An Expert-Novice Study Of Disciplinary Literacy Pra...
After The Content Course An Expert-Novice Study Of Disciplinary Literacy Pra...
 
A Study Of Visible Light Communication With Li- Fi Technology
A Study Of Visible Light Communication With Li- Fi TechnologyA Study Of Visible Light Communication With Li- Fi Technology
A Study Of Visible Light Communication With Li- Fi Technology
 
Animal Amp Cat Anatomy Coloring Book
Animal  Amp  Cat Anatomy Coloring BookAnimal  Amp  Cat Anatomy Coloring Book
Animal Amp Cat Anatomy Coloring Book
 
Alan Turing Mathematical Mechanist
Alan Turing  Mathematical MechanistAlan Turing  Mathematical Mechanist
Alan Turing Mathematical Mechanist
 
Analysis Of GMO Food Products Companies Financial Risks And Opportunities In...
Analysis Of GMO Food Products Companies  Financial Risks And Opportunities In...Analysis Of GMO Food Products Companies  Financial Risks And Opportunities In...
Analysis Of GMO Food Products Companies Financial Risks And Opportunities In...
 
Adults And Children Share Literacy Practices The Case Of Homework
Adults And Children Share Literacy Practices  The Case Of HomeworkAdults And Children Share Literacy Practices  The Case Of Homework
Adults And Children Share Literacy Practices The Case Of Homework
 

KĂŒrzlich hochgeladen

KĂŒrzlich hochgeladen (20)

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
How to Manage Notification Preferences in the Odoo 17
How to Manage Notification Preferences in the Odoo 17How to Manage Notification Preferences in the Odoo 17
How to Manage Notification Preferences in the Odoo 17
 
INU_CAPSTONEDESIGN_ᄇᅔᄆᅔᆯᄇᅄᆫ호486_á„‹á…„á†žá„…á…©á„ƒá…łá„‹á…­á†Œ 발표자료.pdf
INU_CAPSTONEDESIGN_ᄇᅔᄆᅔᆯᄇᅄᆫ호486_á„‹á…„á†žá„…á…©á„ƒá…łá„‹á…­á†Œ 발표자료.pdfINU_CAPSTONEDESIGN_ᄇᅔᄆᅔᆯᄇᅄᆫ호486_á„‹á…„á†žá„…á…©á„ƒá…łá„‹á…­á†Œ 발표자료.pdf
INU_CAPSTONEDESIGN_ᄇᅔᄆᅔᆯᄇᅄᆫ호486_á„‹á…„á†žá„…á…©á„ƒá…łá„‹á…­á†Œ 발표자료.pdf
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
 
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxStudents, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
 
NCERT Solutions Power Sharing Class 10 Notes pdf
NCERT Solutions Power Sharing Class 10 Notes pdfNCERT Solutions Power Sharing Class 10 Notes pdf
NCERT Solutions Power Sharing Class 10 Notes pdf
 
The Last Leaf, a short story by O. Henry
The Last Leaf, a short story by O. HenryThe Last Leaf, a short story by O. Henry
The Last Leaf, a short story by O. Henry
 
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptx
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptxJose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptx
Jose-Rizal-and-Philippine-Nationalism-National-Symbol-2.pptx
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
The impact of social media on mental health and well-being has been a topic o...
The impact of social media on mental health and well-being has been a topic o...The impact of social media on mental health and well-being has been a topic o...
The impact of social media on mental health and well-being has been a topic o...
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
Salient features of Environment protection Act 1986.pptx
Salient features of Environment protection Act 1986.pptxSalient features of Environment protection Act 1986.pptx
Salient features of Environment protection Act 1986.pptx
 
UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...
UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...
UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...
 
Basic Civil Engineering Notes of Chapter-6, Topic- Ecosystem, Biodiversity G...
Basic Civil Engineering Notes of Chapter-6,  Topic- Ecosystem, Biodiversity G...Basic Civil Engineering Notes of Chapter-6,  Topic- Ecosystem, Biodiversity G...
Basic Civil Engineering Notes of Chapter-6, Topic- Ecosystem, Biodiversity G...
 
Pragya Champions Chalice 2024 Prelims & Finals Q/A set, General Quiz
Pragya Champions Chalice 2024 Prelims & Finals Q/A set, General QuizPragya Champions Chalice 2024 Prelims & Finals Q/A set, General Quiz
Pragya Champions Chalice 2024 Prelims & Finals Q/A set, General Quiz
 
Sectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfSectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdf
 
Basic_QTL_Marker-assisted_Selection_Sourabh.ppt
Basic_QTL_Marker-assisted_Selection_Sourabh.pptBasic_QTL_Marker-assisted_Selection_Sourabh.ppt
Basic_QTL_Marker-assisted_Selection_Sourabh.ppt
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
 

Apollo Towards Factfinding In Participatory Sensing

  • 1. Apollo: Towards Factfinding in Participatory Sensing H. Khac Le, J. Pasternack, H. Ahmadi, M. Gupta, Y. Sun, T. Abdelzaher, J. Han, D. Roth University of Illinois at Urbana Champaign Urbana, IL 61801 B. Szymanski, S. Adali Rensselaer Polytechnic Institute Troy, NY 12180 ABSTRACT This demonstration presents Apollo, a new sensor informa- tion processing tool for uncovering likely facts in noisy par- ticipatory sensing data1 . Participatory sensing, where users proactively document and share their observations, has re- ceived significant attention in recent years as a paradigm for crowd-sourcing observation tasks. However, it poses inter- esting challenges in assessing confidence in the information received. By borrowing clustering and ranking tools from data mining literature, we show how to group data into sets (or claims), corroborating specific events or observations, then iteratively assess both claim and source credibility, ulti- mately leading to a ranking of described claims by their like- lihoold of occurrence. Apollo belongs to a category of tools called fact-finders. It is the first fact-finder designed and im- plemented specifically for participatory sensing. Apollo uses Twitter as the underlying engine for sharing participatory sensing data. Twitter is widely popular, can be interfaced to cell-phones that share sensor data, and comes with a pow- erful search API, as well as a publish-subscribe mechanism. We evaluate it using a participatory sensing application that collects and posts noisy vehicular traffic data on Twitter, as well as a set of 60,000 (human) tweets collected during the Haiti tsunami and a set of 500,000 tweets collected about Cairo during its recent unrest. Viewers of the demonstra- tion will interact with Apollo for various fact-finding tasks. Categories and Subject Descriptors D.2.5 [Software Engineering]: General Terms Design, Reliability, Experimentation Keywords Data fusion, Participatory sensing, Quality of information 1 Named after the Greek god of light and truth, among other designations IPSN 2011 Chicago, IL USA 1. INTRODUCTION This demonstration illustrates a fact-finding tool designed to uncover most likely truth in noisy participatory sensing data. Participatory sensing [1], where participants proac- tively report their observations, has become an increasingly important data collection paradigm, where humans act as the sensors, or employ devices they own (such as cell phones) to perform sensing tasks. Its growing importance stems from the large penetration of sensor-enabled devices with com- munication capabilities in human populations. However, it poses challenges in that data are collected from individu- als and devices who may not always be accurate. Filtering the deluge of data for correctness is an important challenge, commonly known in machine learning and knowledge discov- ery literature as fact-finding. Apollo is the first fact-finder designed specifically for participatory sensing data. A fact-finder [6] maintains the abstraction of sources and claims. In general, starting with no a priori information, it iteratively computes their credibility: The credibility of claims depends on the credibility of sources that make them. Similarly, the credibility of sources depends on the credibil- ity of claims they make. Iterations continue until they con- verge. Enhancements of this basic iterative scheme include a more general notion of claim assertion, where weights de- scribe how confidently a source asserts a claim [5], incorpo- ration of prior knowledge in the analysis [4], and clustering of claims by subject [3] since the credibility of a source may depend on the subject matter. The impact of dependence between sources on credibility (e.g., when one spreads claims made by others) can also be accounted for [2]. Apollo is a general framework for fact-finding in participa- tory sensing data. Data-type-dependent modules first con- vert source data (a set of source, observation tuples) into a common representation of sources and claims. Clustering is performed on input tuples first, by similarity of their obser- vations, to generate a smaller number of claims for scalabil- ity. Their credibility is then assessed in an iterative fashion, together with the credibility of each source. Figure 1 illus- trates the architecture of Apollo. We evaluate our fact-finding algorithm in three distinct sce- narios. In the first, a controlled experiment is conducted where a set of participants report (tweet) traffic speed data from GPS sensors in vehicles. We intentially corrupt a frac- tion of observations. We also allow some participants to report traffic information they heard from others, as op-
  • 2. Sensor Data Front-end Text Front-end Image Front-end Distance M etric, S Distance M etric, T Distance M etric, I Claim Clusters A Network of Claims and Sources Claim Credibility Assessment Source Credibility Assessment Figure 1: The Architecture of Apollo posed to reporting their own (i.e., spread rumors). We show that computing average statistics from such noisy data is not always accurate. However, when data tuples are clustered and ranked by Apollo, the quality of reported observations increases considerably. The second and third scenarios use Twitter data sets collected during the Haiti Tsunami (60,000 reports) and the recent Cairo unrest (500,000 reports), rep- sectively. In each case, the significant number of reports describe a much smaller number of events, some real and some rumored. We use our algorithm to identify the dis- tinct events and rank them by credibility. The results are then compared to media reports. 2. SYSTEM OVERVIEW We consider a system composed of human or sensory data sources that send their observations as Twitter feeds. The sink has two modes of operation. A follower mode, where it follows only explicitly named sources, and a crawler mode, where it uses the Twitter search API to download all tweets on the topic of a query (subject to API-specific volume con- straints). A fact-finding algorithm is then applied to clean up the data. The algorithm has the following reconfigurable parts: ‱ The parser: It uses a configuration file (that describes the format of the input data stream) to convert in- put data into a standard JSON format. Conceptually, the data stream is composed of source, observation tu- ples, where source identifies the data source (e.g., the Twitter user ID), and the observation content may be structured, as in the case of phones sharing sensor val- ues, or unstructured, as in the case of people sharing text. The configuration file describes the observation format. ‱ The distance metrics: A library of different distance metrics is provided for clustering of observations. For unstructured text, these metrics reflect text similar- ity. For structured data, metrics compute differences in data vectors. Data can be multidimensional. For ex- ample, when cell-phones report sensor values at given locations, both measurements and locations can be el- ements of the data vector. ‱ The cluster credibility metrics: When observations are clustered by the distance metric, one needs to rank different clusters in terms of credibility. A library of ranking functions are provided, inspired by current fact-finding literature, ranging from simple ones (how many tuples are in the cluster) to more complex ones (that take into account the social network relation be- tween sources). For example, the rank of a cluster may be lower if observations in the cluster are reported by a group of tweeters who are followers of the same source. ‱ Source credibility metrics: More credible sources are those whose observations are found more often in more credible clusters. This basic intuition leads to a library of simple credibility metrics to rank sources. With the above parameters, the rest of the algorithm simply iterates on computing clusters, cluster credibility metrics, source credibility metrics, until stable results are obtained. 3. THE DEMONSTRATION SCRIPT During the demo, a laptop will be used that runs Apollo on three different sets of Twitter data; a set obtained in a con- trolled vehicular traffic speed measurement experiment, a set collected from Haiti during its tsunami, and a set collected from Cairo during its recent unrest. Users will be allowed to reconfigure the experiments by changing the used distance and credibility metrics, clustering algorithm parameters, as well as the amount of data operated on, and observing their effects on final results. This will help answer questions such as: what is the impact (in the given scenario) of considering social network relations between data sources on improv- ing quality of information in participatory sensing systems? What are the trade-offs between different distance metrics among data vectors? How coarse-grained can observation clustering be without impacting correctness of results? How much data is needed to get reliable results? Results will be displayed as lists of credible events and credible sources. “Ground truth” data will also be shown (which is known for our controlled experiment and estimated from other media in the other two case studies). 4. REFERENCES [1] J. Burke, D. Estrin, M. Hansen, A. Parker, N. Ramanathan, S. Reddy, and M. B. Srivastava. Participatory sensing. In In: Workshop on World-Sensor-Web (WSW): Mobile Device Centric Sensor Networks and Applications, pages 117–134, 2006. [2] X. L. Dong, L. Berti-Equille, and D. Srivastava. Integrating conflicting data: the role of source dependence. Proc. VLDB Endow., 2:550–561, August 2009. [3] M. Gupta, Y. Sun, and J. Han. Trust analysis with clustering (poster paper). In World Wide Web Conference (WWW), Hyderabad, India, March 2011. [4] J. Pasternack and D. Roth. Knowing what to believe (when you already know something). In Proc. the International Conference on Computational Linguistics (COLING), Beijing, China, August 2010. [5] J. Pasternack and D. Roth. Generalized fact-finding (poster paper). In World Wide Web Conference (WWW), Hyderabad, India, March 2011. [6] X. Yin, J. Han, and P. S. Yu. Truth discovery with multiple conflicting information providers on the web. IEEE Trans. on Knowl. and Data Eng., 20:796–808, June 2008.