Collaborative Data Management: How Crowdsourcing Can Help To Manage Data
Edward Curry
Enterprise Data World 2013
Digital Enterprise Research Institute www.deri.ie
Enabling networked knowledge
Overview
• Problems with Data
  ◦ Master Data Management
• Crowdsourcing
• Collaborative Data Management
• Setting up a CDM Process
• Future Directions
The Problems with Data
Knowledge workers need:
  ◦ Access to the right data
  ◦ Confidence in that data
25% of critical data in the world’s top companies is flawed
Data quality’s role in the recent financial crisis:
  ◦ “Assets are defined differently in different programs”
  ◦ “Numbers did not always add up”
  ◦ “Departments do not trust each other’s figures”
  ◦ “Figures … not worth the pixels they were made of”
Master Data Management
• Master Data Management (MDM) is a process that can improve data quality
• What is data quality?
  ◦ Desirable characteristics for an information resource
  ◦ Described as a series of quality dimensions
    – Discoverability, Accessibility, Timeliness, Completeness, Interpretation, Accuracy, Consistency, Provenance & Reputation
Data Quality
Figure: Master Data Management process. Steps: Profile Sources; Define Mappings; Cleanse/Enrich; De-duplicate; Define Rules; Master Data. Roles: Data Developer, Data Steward, Data Governance, Business Users. Applications (e.g. Product Data).
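The Profile Sources step can be illustrated with a minimal Python sketch that measures completeness (one of the quality dimensions listed earlier) and the number of distinct values per column. The rows, column names and the `profile` helper are hypothetical, loosely modelled on the product example that follows; real MDM tools profile much more (formats, ranges, patterns).

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def profile(rows: List[Row]) -> Dict[str, Dict[str, float]]:
    """Report, per column, completeness (share of non-empty values) and distinct-value count."""
    columns = sorted({col for row in rows for col in row})
    report: Dict[str, Dict[str, float]] = {}
    for col in columns:
        # Treat a missing key, None, or an empty string as "no value".
        present = [row.get(col) for row in rows if row.get(col) not in (None, "")]
        report[col] = {
            "completeness": len(present) / len(rows) if rows else 0.0,
            "distinct": len(set(present)),
        }
    return report


# Hypothetical product rows; the third row has no colour recorded.
rows = [
    {"ID": "APNR", "PNAME": "iPod Nano", "PCOLOR": "Red", "PRICE": 150},
    {"ID": "APNS", "PNAME": "iPod Nano", "PCOLOR": "Silver", "PRICE": 160},
    {"ID": "IPN890", "PNAME": "iPod Nano", "PCOLOR": None, "PRICE": 150},
]

for column, stats in profile(rows).items():
    print(column, stats)
```

A completeness below 1.0 (here for PCOLOR) is the kind of signal that feeds the later cleanse/enrich step.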
Data Quality

Source A:
ID    PNAME      PCOLOR  PRICE
APNR  iPod Nano  Red     150
APNS  iPod Nano  Silver  160

Source B:
<Product name="iPod Nano">
  <Items>
    <Item code="IPN890">
      <price>150</price>
      <generation>5</generation>
    </Item>
  </Items>
</Product>

Data Developer: Schema Difference?

Records after combining the sources:
APNR       iPod Nano  Red     150
APNR       iPod Nano  Silver  160
iPod Nano  IPN890     150     5

Data Steward: Value Conflicts? Entity Duplication?
Business Users: ?

Expertise required ranges from Technical (Data Developer) through (Technical) Domain (Data Steward) to Domain (Business Users).
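The three questions on this slide can be made concrete with a small Python sketch, assuming a hypothetical mapping of Source B onto Source A’s schema (Item code to ID, Product name to PNAME, price to PRICE). It mechanically flags candidate duplicates and conflicting prices; it cannot decide whether the Red and Silver rows are variants of one product or whether 150 versus 160 is an error, which is the judgment the Data Steward and Business Users are asked to make.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List

Row = Dict[str, Any]

# Source A: relational rows, as shown on the slide.
source_a: List[Row] = [
    {"ID": "APNR", "PNAME": "iPod Nano", "PCOLOR": "Red", "PRICE": 150},
    {"ID": "APNS", "PNAME": "iPod Nano", "PCOLOR": "Silver", "PRICE": 160},
]

# Source B: the XML fragment from the slide.
source_b_xml = """
<Product name="iPod Nano">
  <Items>
    <Item code="IPN890">
      <price>150</price>
      <generation>5</generation>
    </Item>
  </Items>
</Product>
"""


def map_source_b(xml_text: str) -> List[Row]:
    """Hypothetical schema mapping: Item@code -> ID, Product@name -> PNAME, price -> PRICE.

    Source B has no colour, so PCOLOR is None; 'generation' has no Source A
    counterpart and is carried along as an extra GENERATION field.
    """
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "ID": item.get("code"),
            "PNAME": root.get("name"),
            "PCOLOR": None,
            "PRICE": int(item.findtext("price")),
            "GENERATION": int(item.findtext("generation")),
        }
        for item in root.iter("Item")
    ]


def candidate_duplicates(rows: List[Row]) -> Dict[str, List[Row]]:
    """Naive blocking: rows sharing a normalised product name may be the same entity."""
    groups: Dict[str, List[Row]] = {}
    for row in rows:
        groups.setdefault(row["PNAME"].strip().lower(), []).append(row)
    return {key: group for key, group in groups.items() if len(group) > 1}


def price_values(group: List[Row]) -> List[int]:
    """Distinct prices within a group; more than one signals a possible value conflict."""
    return sorted({row["PRICE"] for row in group})


merged = source_a + map_source_b(source_b_xml)
for name, group in candidate_duplicates(merged).items():
    print(name, [row["ID"] for row in group], "prices:", price_values(group))
```

Grouping on the product name is a deliberately naive blocking rule chosen for brevity; on a real catalog it would over-merge, which is one more reason the final call needs a person.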
Master Data Management
• Pros
  ◦ Can create a single version of truth
  ◦ Standardized information creation and management
  ◦ Improves data quality
• Cons
  ◦ Significant upfront costs and effort
  ◦ Participation limited to a few (mostly) technical experts
  ◦ Difficult to scale for large data sources
    – Extended enterprise, e.g. partners, data vendors
  ◦ Small % of data under management (e.g. CRM, Product, …)
Enterprise Data Landscape
• The Managed (MDM): reference data managed through well-defined policies and a governance council
• The Known (Enterprise Data): data directly managed by the enterprise and its departments
• The Reality (Relevant External Data): all data relevant to the enterprise and its operations
CROWDSOURCING
Crowdsourcing Industry Landscape
Introduction to Crowdsourcing
• Coordinating a crowd (a large group of workers) to do micro-work (small tasks) that solves problems that computers or a single user cannot
• A collection of mechanisms and associated methodologies for scaling and directing crowd activities to achieve goals
• Related areas
  ◦ Collective Intelligence
  ◦ Social Computing
  ◦ Human Computation
  ◦ Data Mining
When Computers Were Human
• Maskelyne, 1760
  ◦ Used human computers to create an almanac of moon positions
    – Used for shipping/navigation
  ◦ Quality assurance
    – Do calculations twice
    – Compare with a third verifier
Human vs Machine Affordances
Human
  ✓ Visual perception
  ✓ Visuospatial thinking
  ✓ Audiolinguistic ability
  ✓ Sociocultural awareness
  ✓ Creativity
  ✓ Domain knowledge
Machine
  ✓ Large-scale data manipulation
  ✓ Collecting and storing large amounts of data
  ✓ Efficient data movement
  ✓ Bias-free analysis
When to Crowdsource?
• Computers cannot do the task
• A single person cannot do the task
• Work can be split into smaller tasks
Tag a Tune
Peekaboom
Foldit
reCAPTCHA
• OCR
  ◦ ~1% error rate
  ◦ 20%–30% for 18th and 19th century books
• 40 million reCAPTCHAs every day (2008)
  ◦ Fixing 40,000 books a day
Generic Architecture
Figure: Requestors and Workers interact through a Platform/Marketplace (Publish Task, Task Management); the interactions are numbered 1 to 4.
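To make the roles in this architecture concrete, here is a minimal in-memory sketch in Python. The class and method names are hypothetical, and because the slide’s numbered arrows did not survive extraction, the order publish, claim, submit, collect is an assumption rather than the slide’s exact numbering. Real platforms such as Amazon Mechanical Turk add payment, qualifications and expiry on top of this lifecycle.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Task:
    task_id: int
    requestor: str
    question: str
    worker: Optional[str] = None   # set when a worker claims the task
    answer: Optional[str] = None   # set when the worker submits a result


@dataclass
class Marketplace:
    """Hypothetical in-memory platform: publishes tasks and manages their lifecycle."""
    tasks: Dict[int, Task] = field(default_factory=dict)
    next_id: int = 1

    def publish(self, requestor: str, question: str) -> int:
        """Requestor publishes a task to the platform."""
        task = Task(self.next_id, requestor, question)
        self.tasks[task.task_id] = task
        self.next_id += 1
        return task.task_id

    def open_tasks(self) -> List[Task]:
        """Tasks that no worker has claimed yet."""
        return [t for t in self.tasks.values() if t.worker is None]

    def claim(self, task_id: int, worker: str) -> Task:
        """Worker takes a task; each task goes to at most one worker in this sketch."""
        task = self.tasks[task_id]
        if task.worker is not None:
            raise ValueError(f"task {task_id} already claimed by {task.worker}")
        task.worker = worker
        return task

    def submit(self, task_id: int, worker: str, answer: str) -> None:
        """Worker returns a result for a task they claimed."""
        task = self.tasks[task_id]
        if task.worker != worker:
            raise ValueError(f"{worker} has not claimed task {task_id}")
        task.answer = answer

    def results(self, requestor: str) -> Dict[int, Optional[str]]:
        """Requestor collects answers for its tasks (None if still pending)."""
        return {t.task_id: t.answer for t in self.tasks.values() if t.requestor == requestor}


market = Marketplace()
tid = market.publish("acme", "Is APNR the same product as IPN890?")
market.claim(tid, "worker-1")
market.submit(tid, "worker-1", "no")
print(market.results("acme"))
```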
Amazon Mechanical Turk
CrowdFlower
COLLABORATIVE DATA MANAGEMENT
• Collaborative knowledge base maintained by a community of web users
• Users create entity types and their metadata according to guidelines
• Requires administrative approval for schema changes by end users
Wikipedia
• Collaboratively built by a large community
  ◦ More than 19,000,000 articles, 270+ languages, 3,200,000+ articles in English
  ◦ More than 157,000 active contributors
• Accuracy and stylistic formality are equivalent to expert-based resources
  ◦ e.g. Columbia and Britannica encyclopedias
• MediaWiki
  ◦ Software behind Wikipedia
  ◦ Widely used inside organizations
  ◦ Intellipedia: 16 U.S. intelligence agencies
  ◦ WikiProteins: curated protein data for knowledge discovery
DBpedia Knowledge Base
• DBpedia provides direct access to data
  ◦ Indirectly uses the wiki as a data curation platform
  ◦ Inherits a massive volume of curated Wikipedia data
  ◦ 3.4 million entities and 1 billion RDF triples
  ◦ Comprehensive data infrastructure
    – Concept URIs
    – Definitions
    – Basic types
A Bottom-up Approach to MDM
Engage more human workers to collaboratively manage enterprise data
Figure: MDM uses top-down data control with 10s–100s of participants; Collaborative Enterprise Data Management uses bottom-up data control with 10,000s–100,000s of participants. Axes: Number of Participants, Data Control.
Emerging Enterprise Data Landscape
• The Managed (MDM): reference data managed through well-defined policies and a governance council
• The Known (Enterprise Data): data directly managed by the enterprise and its departments
• The Reality (Relevant External Data): all data relevant to the enterprise and its operations
• Collaboratively Managed
Clean Data: Algorithm + Crowd
Figure: Data Sources are cleaned by combining Data Quality Algorithms with Human Computation. Participants: Developers, Data Governance, the Internal Community and an External Crowd.
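One common way to split work between algorithm and crowd is to let the algorithm settle the confident cases and route only the uncertain ones to people. The Python sketch below assumes that design; it uses the standard library’s `difflib.SequenceMatcher` as a stand-in matching algorithm, and the 0.9 and 0.5 thresholds are hypothetical values one would tune on real data.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Pair = Tuple[str, str]


def similarity(a: str, b: str) -> float:
    """Stand-in matching algorithm: case-insensitive character-sequence similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def triage(pairs: List[Pair], accept_at: float = 0.9, reject_below: float = 0.5):
    """Let the algorithm decide confident cases; send the uncertain middle band to people."""
    auto_match: List[Pair] = []
    auto_distinct: List[Pair] = []
    for_crowd: List[Tuple[str, str, float]] = []
    for a, b in pairs:
        score = similarity(a, b)
        if score >= accept_at:
            auto_match.append((a, b))
        elif score < reject_below:
            auto_distinct.append((a, b))
        else:
            for_crowd.append((a, b, score))  # becomes an accept/reject vote task
    return auto_match, auto_distinct, for_crowd


pairs = [
    ("iPod Nano", "iPod Nano"),
    ("iPod Nano", "Galaxy Tab"),
    ("iPod Nano 5th Gen", "Apple iPod Nano"),
]
matched, distinct, crowd = triage(pairs)
print("auto-matched:", matched)
print("auto-distinct:", distinct)
print("for the crowd:", crowd)
```

Each pair left in `crowd` would become a simple accept/reject decision task, so human effort is spent only where the algorithm is unsure.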
Examples of CDM Tasks
• Understanding customer sentiment for the launch of a new product around the world
• Implemented a 24/7 sentiment analysis system with workers from around the world
• Categorize millions of products in eBay’s catalog with accurate and complete attributes
• Combine the crowd with machine learning to create an affordable and flexible catalog quality system
Examples of CDM Tasks
• Natural Language Processing
  ◦ Dialect identification, spelling correction, machine translation, word similarity
• Computer Vision
  ◦ Image similarity, image annotation/analysis
• Classification
  ◦ Data attributes, improving taxonomy, search results
• Verification
  ◦ Entity consolidation, de-duplication, cross-checking, validating data
• Enrichment
  ◦ Judgments, annotation
SETTING UP A CDM PROCESS
Core Design Questions of CDM
• What? Goal
• Why? Incentives
• Who? Workers
• How? Process
Malone, T. W., Laubacher, R., & Dellarocas, C. N. Harnessing crowds: Mapping the genome of collective intelligence. MIT Sloan Research Paper 4732-09 (2009).
Who is doing it? (Workers)
• Hierarchy (Assignment)
  ◦ Someone in authority assigns a particular person or group of people to perform the task
  ◦ Within the enterprise
• Crowd (Choice)
  ◦ Anyone in a large group who chooses to do so
  ◦ Internal or external crowds
Why are they doing it? (Incentives)
• Motivation
  ◦ Money ($$££)
  ◦ Glory (reputation/prestige)
  ◦ Love (altruism, socializing, enjoyment)
  ◦ Unintended by-product (e.g. reCAPTCHA, captured in workflow)
  ◦ Self-serving resources (e.g. Wikipedia, product/customer data)
• Determine pay and time for each task
  ◦ Marketplace: a delicate balance
    – Money does not improve quality but can increase participation
  ◦ Internal hierarchy: engineering opportunities for recognition
    – Performance reviews, prizes for top contributors, badges, leaderboards, etc.
Effect of Payment on Quality
• Cost does not affect quality [Mason and Watts, 2009; AdSafe]
• Similar results for bigger tasks [Ariely et al., 2009]
[Panos Ipeirotis, WWW2011 tutorial]
What is being done? (Goal)
• Creation tasks
  ◦ Create/Generate
  ◦ Find
  ◦ Improve/Edit/Fix
• Decision (vote) tasks
  ◦ Accept/Reject
  ◦ Thumbs up/Thumbs down
  ◦ Vote for best
How is it being done? (How)
• Tasks integrated in the normal workflow of those creating and managing data
  ◦ As simple as vetting or “rating” the results of an algorithm
• Task design
  ◦ Task interface
  ◦ Task assignment/routing
  ◦ Task quality assurance
Task Design
Figure: Input → Task Router (before computation) → Task Interface (during computation) → Output Aggregation (after computation) → Output
* Edith Law and Luis von Ahn, Human Computation: Core Research Questions and State of the Art
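The three stages of this figure can be expressed as a small Python pipeline. Everything here is a hypothetical sketch: workers are simulated by functions, routing simply takes the first k workers by name, and aggregation is a plurality vote; a real system would replace each stage with the routing and quality-assurance strategies covered in the following slides.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical: a "worker" is simulated as a function from a task item to an answer.
Worker = Callable[[str], str]


def route(item: str, workers: Dict[str, Worker], k: int = 3) -> List[str]:
    """Task Router (before computation): choose who sees the item; here simply the first k by name."""
    return sorted(workers)[:k]


def ask(worker: Worker, item: str) -> str:
    """Task Interface (during computation): present the item and record the worker's answer."""
    return worker(item)


def aggregate(answers: List[str]) -> str:
    """Output Aggregation (after computation): combine answers, here by plurality vote."""
    return Counter(answers).most_common(1)[0][0]


def run_task(item: str, workers: Dict[str, Worker]) -> str:
    chosen = route(item, workers)
    answers = [ask(workers[name], item) for name in chosen]
    return aggregate(answers)


workers: Dict[str, Worker] = {
    "ann": lambda item: "yes",
    "bob": lambda item: "yes",
    "cat": lambda item: "no",
    "dan": lambda item: "no",
}
print(run_task("task-1", workers))
```

Keeping the stages as separate functions mirrors the figure: each one can be swapped independently, for example push routing for `route` or a weighted vote for `aggregate`.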
Pull Routing
• Workers seek tasks and assign them to themselves
  ◦ Search and discovery of tasks supported by the platform
  ◦ Task recommendation
  ◦ Peer routing
Figure: Workers select tasks through a Search & Browse Interface, assisted by an algorithm, and return results.
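A minimal sketch of pull routing in Python, assuming tasks carry free-text titles and skill tags (both hypothetical). `search` models the search & browse interface and `recommend` models a simple task recommender that ranks open tasks by tag overlap with the worker’s skills; peer routing is not modelled.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Task:
    task_id: str
    title: str
    tags: FrozenSet[str]  # hypothetical skill tags attached by the requestor


def search(tasks: List[Task], query: str) -> List[Task]:
    """Search & browse: case-insensitive substring match on task titles."""
    q = query.lower()
    return [t for t in tasks if q in t.title.lower()]


def recommend(tasks: List[Task], skills: Set[str], limit: int = 2) -> List[Task]:
    """Task recommendation: rank tasks by how many tags overlap the worker's skills."""
    scored = [(len(t.tags & skills), t) for t in tasks]
    scored = [(s, t) for s, t in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1].task_id))
    return [t for _, t in scored[:limit]]


tasks = [
    Task("T1", "Verify product colour", frozenset({"product", "verification"})),
    Task("T2", "Translate product description", frozenset({"product", "translation"})),
    Task("T3", "Annotate street image", frozenset({"vision"})),
]

# The worker, not the system, decides which of these to take.
print([t.task_id for t in search(tasks, "product")])
print([t.task_id for t in recommend(tasks, {"product", "verification"})])
```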
Push Routing
• The system assigns tasks to workers based on:
  ◦ Past performance
  ◦ Expertise
  ◦ Cost
  ◦ Latency
Figure: An algorithm assigns tasks to workers, who complete them in a Task Interface and return results.
* www.mobileworks.com
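Push routing can be sketched as a scoring function over the four criteria on the slide. The linear score and its weights below are assumptions made for illustration (a real router might learn them, or optimise cost, quality and latency jointly); the example shows how raising the latency weight for an urgent task changes who gets assigned.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WorkerProfile:
    name: str
    accuracy: float                # past performance, 0..1 (e.g. share of gold questions correct)
    expertise: Dict[str, float] = field(default_factory=dict)  # topic -> 0..1
    cost: float = 0.0              # price per task, arbitrary currency units
    latency: float = 0.0           # expected minutes until an answer arrives


# Hypothetical weights; a real system would tune these per task type.
DEFAULT_WEIGHTS = {"accuracy": 1.0, "expertise": 1.0, "cost": 1.0, "latency": 0.01}


def score(w: WorkerProfile, topic: str, weights: Dict[str, float]) -> float:
    """Higher is better: reward performance and expertise, penalise cost and latency."""
    return (weights["accuracy"] * w.accuracy
            + weights["expertise"] * w.expertise.get(topic, 0.0)
            - weights["cost"] * w.cost
            - weights["latency"] * w.latency)


def assign(topic: str, workers: List[WorkerProfile], k: int = 1,
           weights: Optional[Dict[str, float]] = None) -> List[str]:
    """Push routing: the system ranks workers and assigns the task to the top k."""
    weights = weights or DEFAULT_WEIGHTS
    ranked = sorted(workers, key=lambda w: score(w, topic, weights), reverse=True)
    return [w.name for w in ranked[:k]]


workers = [
    WorkerProfile("A", accuracy=0.95, expertise={"product": 0.9}, cost=0.10, latency=30),
    WorkerProfile("B", accuracy=0.80, expertise={"product": 0.2}, cost=0.05, latency=5),
    WorkerProfile("C", accuracy=0.90, expertise={"finance": 0.9}, cost=0.08, latency=10),
]
print(assign("product", workers, k=2))
urgent = dict(DEFAULT_WEIGHTS, latency=0.05)  # penalise slow answers more heavily
print(assign("product", workers, k=1, weights=urgent))
```

With the default weights the slow but expert worker A is preferred; under the urgent weighting the fast worker B wins.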
Managing Task Quality Assurance
• Redundancy: quorum votes
  ◦ Replicate the task (e.g. 3 times)
  ◦ Use majority voting to determine the right value (% agreement)
  ◦ Weighted majority vote
• Gold data/honey pots
  ◦ Inject trap questions to test quality
  ◦ Worker fatigue check (habit of saying no all the time)
• Estimation of worker quality
  ◦ Redundancy plus gold data
• Qualification test
  ◦ Use test tasks to determine a user’s ability for such tasks
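The redundancy and gold-data techniques on this slide combine naturally: estimate each worker’s accuracy on gold questions, then use it to weight their votes on real tasks. The Python sketch below assumes a simple weighting by raw accuracy and a neutral 0.5 for workers with no gold answers; these are illustrative choices, and more principled estimators (for example the Dawid-Skene EM model) exist.

```python
from collections import Counter, defaultdict
from typing import Dict, Tuple

Answers = Dict[str, str]  # worker id -> answer given for one question


def majority_vote(answers: Answers) -> Tuple[str, float]:
    """Quorum vote: return the most common answer and its share of the votes (% agreement)."""
    label, votes = Counter(answers.values()).most_common(1)[0]
    return label, votes / len(answers)


def gold_accuracy(gold: Dict[str, str], responses: Dict[str, Answers]) -> Dict[str, float]:
    """Estimate worker quality from gold (honey-pot) questions whose answers are known.

    responses maps worker id -> {question id: answer}.
    """
    accuracy: Dict[str, float] = {}
    for worker, given in responses.items():
        graded = [q for q in given if q in gold]
        if graded:
            accuracy[worker] = sum(given[q] == gold[q] for q in graded) / len(graded)
        else:
            accuracy[worker] = 0.5  # assumption: neutral weight when there is no evidence
    return accuracy


def weighted_vote(answers: Answers, accuracy: Dict[str, float]) -> str:
    """Weighted majority vote: each worker's vote counts in proportion to estimated accuracy."""
    totals: Dict[str, float] = defaultdict(float)
    for worker, label in answers.items():
        totals[label] += accuracy.get(worker, 0.5)
    return max(totals, key=totals.get)


# Hypothetical gold questions and each worker's answers to them.
gold = {"g1": "no", "g2": "yes"}
responses = {
    "w1": {"g1": "no", "g2": "yes"},   # 2 of 2 correct
    "w2": {"g1": "yes", "g2": "no"},   # 0 of 2 correct
    "w3": {"g1": "no", "g2": "no"},    # always answers "no": 1 of 2 correct
}
accuracy = gold_accuracy(gold, responses)

# One real task, e.g. "Is APNR the same product as IPN890?", replicated 3 times.
task = {"w1": "yes", "w2": "no", "w3": "no"}
print(majority_vote(task))
print(weighted_vote(task, accuracy))
```

The plain majority says “no”, but the weighted vote says “yes”: both “no” votes come from workers who did poorly on the gold questions, one of whom shows the always-“no” fatigue pattern.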
Future Directions
• Task management
  ◦ Task assignment, payment, routing
    – Optimizing for cost, quality, completion time
• Human–computer interaction
  ◦ Payment/incentives
  ◦ User interface and interaction design
  ◦ Worker reputation, recruitment, retention
• Quality control
  ◦ Trust, reliability, spam detection, consensus
Summary
• Collaborative Data Management
  ◦ Emerging trend for data management in the enterprise
  ◦ Crowdsourcing + micro-tasks
  ◦ A number of emerging platforms to assist
Figure: Dirty Data → Data Quality Algorithms + Human Computation → Clean Data
About the Presenter
Edward is a research scientist at the Digital Enterprise Research Institute. His areas of research include green IT/IS, energy informatics, linked data, integrated reporting, and cloud computing.
He has worked extensively with industry and government, advising on the adoption patterns, practicalities and benefits of new technologies.
He has published in leading journals and books, and has spoken at international conferences including the MIT CIO Symposium.
URL: www.edwardcurry.org
Email: edcurry@acm.org
Twitter: @EdwardACurry
Slides: slideshare.net/edwardcurry
Selected References
• Big Data & Data Quality
  ◦ S. LaValle, E. Lesser, R. Shockley, M. S. Hopkins, and N. Kruschwitz, “Big Data, Analytics and the Path from Insights to Value,” MIT Sloan Management Review, vol. 52, no. 2, pp. 21–32, 2011.
  ◦ A. Haug and J. S. Arlbjørn, “Barriers to master data quality,” Journal of Enterprise Information Management, vol. 24, no. 3, pp. 288–303, 2011.
  ◦ R. Silvola, O. Jaaskelainen, H. Kropsu-Vehkapera, and H. Haapasalo, “Managing one master data – challenges and preconditions,” Industrial Management & Data Systems, vol. 111, no. 1, pp. 146–162, 2011.
  ◦ E. Curry, S. Hasan, and S. O’Riain, “Enterprise Energy Management using a Linked Dataspace for Energy Intelligence,” in Second IFIP Conference on Sustainable Internet and ICT for Sustainability, 2012.
  ◦ D. Loshin, Master Data Management. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2008.
  ◦ B. Otto and A. Reichert, “Organizing Master Data Management: Findings from an Expert Survey,” in Proceedings of the 2010 ACM Symposium on Applied Computing (SAC ’10), 2010, pp. 106–110.
• Collective Intelligence, Crowdsourcing & Human Computation
  ◦ A. Doan, R. Ramakrishnan, and A. Y. Halevy, “Crowdsourcing systems on the World-Wide Web,” Communications of the ACM, vol. 54, no. 4, p. 86, Apr. 2011.
  ◦ E. Law and L. von Ahn, “Human Computation,” Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 5, no. 3, pp. 1–121, Jun. 2011.
  ◦ M. J. Franklin, D. Kossmann, T. Kraska, S. Ramesh, and R. Xin, “CrowdDB: Answering Queries with Crowdsourcing,” in Proceedings of the 2011 International Conference on Management of Data (SIGMOD ’11), 2011, p. 61.
  ◦ P. Wichmann, A. Borek, R. Kern, P. Woodall, A. K. Parlikad, and G. Satzger, “Exploring the ‘Crowd’ as Enabler of Better Information Quality,” in Proceedings of the 16th International Conference on Information Quality, 2011, pp. 302–312.
  ◦ W. A. Mason and D. J. Watts, “Financial incentives and the ‘performance of crowds’,” SIGKDD Explorations, vol. 11, no. 2, pp. 100–108, 2009.
  ◦ P. Ipeirotis, “Managing Crowdsourced Human Computation,” WWW 2011 tutorial, 2011.
  ◦ O. Alonso and M. Lease, “Crowdsourcing 101: Putting the WSDM of Crowds to Work for You,” WSDM, Hong Kong, 2011.
  ◦ When Computers Were Human (video): http://www.youtube.com/watch?v=YwqltwvPnkw
• Collaborative Data Management
  ◦ E. Curry, A. Freitas, and S. O’Riain, “The Role of Community-Driven Data Curation for Enterprises,” in Linking Enterprise Data, D. Wood, Ed. Boston, MA: Springer US, 2010, pp. 25–47.
  ◦ U. ul Hassan, S. O’Riain, and E. Curry, “Towards Expertise Modelling for Routing Data Cleaning Tasks within a Community of Knowledge Workers,” in 17th International Conference on Information Quality (ICIQ 2012), Paris, France, 2012.
  ◦ U. ul Hassan, S. O’Riain, and E. Curry, “Effects of Expertise Assessment on the Quality of Task Routing in Human Computation,” in 2nd International Workshop on Social Media for Crowdsourcing and Human Computation, Paris, France, 2013.
  ◦ U. ul Hassan, S. O’Riain, and E. Curry, “Leveraging Matching Dependencies for Guided User Feedback in Linked Data Applications,” in 9th International Workshop on Information Integration on the Web (IIWeb 2012), Scottsdale, Arizona: ACM, 2012.
Collaborative Data Management: How Crowdsourcing Can Help To Manage Data
Collaborative Data Management: How Crowdsourcing Can Help To Manage Data
Collaborative Data Management: How Crowdsourcing Can Help To Manage Data

More Related Content

What's hot

Towards a BIG Data Public Private Partnership
Towards a BIG Data Public Private PartnershipTowards a BIG Data Public Private Partnership
Towards a BIG Data Public Private PartnershipEdward Curry
 
Data Curation at the New York Times
Data Curation at the New York TimesData Curation at the New York Times
Data Curation at the New York TimesEdward Curry
 
Transforming the European Data Economy: A Strategic Research and Innovation A...
Transforming the European Data Economy: A Strategic Research and Innovation A...Transforming the European Data Economy: A Strategic Research and Innovation A...
Transforming the European Data Economy: A Strategic Research and Innovation A...Edward Curry
 
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Crowdsourcing Approaches to Big Data Curation - Rio Big Data MeetupCrowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Crowdsourcing Approaches to Big Data Curation - Rio Big Data MeetupEdward Curry
 
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...Edward Curry
 
Querying Heterogeneous Datasets on the Linked Data Web
Querying Heterogeneous Datasets on the Linked Data WebQuerying Heterogeneous Datasets on the Linked Data Web
Querying Heterogeneous Datasets on the Linked Data WebEdward Curry
 
Interactive Water Services: The Waternomics Approach
Interactive Water Services: The Waternomics ApproachInteractive Water Services: The Waternomics Approach
Interactive Water Services: The Waternomics ApproachEdward Curry
 
Developing an Sustainable IT Capability: Lessons From Intel's Journey
Developing an Sustainable IT Capability: Lessons From Intel's JourneyDeveloping an Sustainable IT Capability: Lessons From Intel's Journey
Developing an Sustainable IT Capability: Lessons From Intel's JourneyEdward Curry
 
Big Data: Beyond the hype, Delivering value
Big Data: Beyond the hype, Delivering valueBig Data: Beyond the hype, Delivering value
Big Data: Beyond the hype, Delivering valueEdward Curry
 
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...Edward Curry
 
An Environmental Chargeback for Data Center and Cloud Computing Consumers
An Environmental Chargeback for Data Center and Cloud Computing ConsumersAn Environmental Chargeback for Data Center and Cloud Computing Consumers
An Environmental Chargeback for Data Center and Cloud Computing ConsumersEdward Curry
 
Challenges Ahead for Converging Financial Data
Challenges Ahead for Converging Financial DataChallenges Ahead for Converging Financial Data
Challenges Ahead for Converging Financial DataEdward Curry
 
Towards Unified and Native Enrichment in Event Processing Systems
Towards Unified and Native Enrichment in Event Processing SystemsTowards Unified and Native Enrichment in Event Processing Systems
Towards Unified and Native Enrichment in Event Processing SystemsEdward Curry
 
Crowdsourcing Approaches to Big Data Curation for Earth Sciences
Crowdsourcing Approaches to Big Data Curation for Earth SciencesCrowdsourcing Approaches to Big Data Curation for Earth Sciences
Crowdsourcing Approaches to Big Data Curation for Earth SciencesEdward Curry
 
Building Optimisation using Scenario Modeling and Linked Data
Building Optimisation using Scenario Modeling and Linked DataBuilding Optimisation using Scenario Modeling and Linked Data
Building Optimisation using Scenario Modeling and Linked DataEdward Curry
 
Citizen Actuation For Lightweight Energy Management
Citizen Actuation For Lightweight Energy ManagementCitizen Actuation For Lightweight Energy Management
Citizen Actuation For Lightweight Energy ManagementEdward Curry
 
Crowdsourcing Approaches for Smart City Open Data Management
Crowdsourcing Approaches for Smart City Open Data ManagementCrowdsourcing Approaches for Smart City Open Data Management
Crowdsourcing Approaches for Smart City Open Data ManagementEdward Curry
 
Using Linked Data and the Internet of Things for Energy Management
Using Linked Data and the Internet of Things for Energy ManagementUsing Linked Data and the Internet of Things for Energy Management
Using Linked Data and the Internet of Things for Energy ManagementEdward Curry
 
SLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing
SLUA: Towards Semantic Linking of Users with Actions in CrowdsourcingSLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing
SLUA: Towards Semantic Linking of Users with Actions in CrowdsourcingEdward Curry
 
Linked Water Data For Water Information Management
Linked Water Data For Water Information ManagementLinked Water Data For Water Information Management
Linked Water Data For Water Information ManagementEdward Curry
 

What's hot (20)

Towards a BIG Data Public Private Partnership
Towards a BIG Data Public Private PartnershipTowards a BIG Data Public Private Partnership
Towards a BIG Data Public Private Partnership
 
Data Curation at the New York Times
Data Curation at the New York TimesData Curation at the New York Times
Data Curation at the New York Times
 
Transforming the European Data Economy: A Strategic Research and Innovation A...
Transforming the European Data Economy: A Strategic Research and Innovation A...Transforming the European Data Economy: A Strategic Research and Innovation A...
Transforming the European Data Economy: A Strategic Research and Innovation A...
 
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Crowdsourcing Approaches to Big Data Curation - Rio Big Data MeetupCrowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
Crowdsourcing Approaches to Big Data Curation - Rio Big Data Meetup
 
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S...
 
Querying Heterogeneous Datasets on the Linked Data Web
Querying Heterogeneous Datasets on the Linked Data WebQuerying Heterogeneous Datasets on the Linked Data Web
Querying Heterogeneous Datasets on the Linked Data Web
 
Interactive Water Services: The Waternomics Approach
Interactive Water Services: The Waternomics ApproachInteractive Water Services: The Waternomics Approach
Interactive Water Services: The Waternomics Approach
 
Developing an Sustainable IT Capability: Lessons From Intel's Journey
Developing an Sustainable IT Capability: Lessons From Intel's JourneyDeveloping an Sustainable IT Capability: Lessons From Intel's Journey
Developing an Sustainable IT Capability: Lessons From Intel's Journey
 
Big Data: Beyond the hype, Delivering value
Big Data: Beyond the hype, Delivering valueBig Data: Beyond the hype, Delivering value
Big Data: Beyond the hype, Delivering value
 
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
 
An Environmental Chargeback for Data Center and Cloud Computing Consumers
An Environmental Chargeback for Data Center and Cloud Computing ConsumersAn Environmental Chargeback for Data Center and Cloud Computing Consumers
An Environmental Chargeback for Data Center and Cloud Computing Consumers
 
Challenges Ahead for Converging Financial Data
Challenges Ahead for Converging Financial DataChallenges Ahead for Converging Financial Data
Challenges Ahead for Converging Financial Data
 
Towards Unified and Native Enrichment in Event Processing Systems
Towards Unified and Native Enrichment in Event Processing SystemsTowards Unified and Native Enrichment in Event Processing Systems
Towards Unified and Native Enrichment in Event Processing Systems
 
Crowdsourcing Approaches to Big Data Curation for Earth Sciences
Crowdsourcing Approaches to Big Data Curation for Earth SciencesCrowdsourcing Approaches to Big Data Curation for Earth Sciences
Crowdsourcing Approaches to Big Data Curation for Earth Sciences
 
Building Optimisation using Scenario Modeling and Linked Data
Building Optimisation using Scenario Modeling and Linked DataBuilding Optimisation using Scenario Modeling and Linked Data
Building Optimisation using Scenario Modeling and Linked Data
 
Citizen Actuation For Lightweight Energy Management
Citizen Actuation For Lightweight Energy ManagementCitizen Actuation For Lightweight Energy Management
Citizen Actuation For Lightweight Energy Management
 
Crowdsourcing Approaches for Smart City Open Data Management
Crowdsourcing Approaches for Smart City Open Data ManagementCrowdsourcing Approaches for Smart City Open Data Management
Crowdsourcing Approaches for Smart City Open Data Management
 
Using Linked Data and the Internet of Things for Energy Management
Using Linked Data and the Internet of Things for Energy ManagementUsing Linked Data and the Internet of Things for Energy Management
Using Linked Data and the Internet of Things for Energy Management
 
SLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing
SLUA: Towards Semantic Linking of Users with Actions in CrowdsourcingSLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing
SLUA: Towards Semantic Linking of Users with Actions in Crowdsourcing
 
Linked Water Data For Water Information Management
Linked Water Data For Water Information ManagementLinked Water Data For Water Information Management
Linked Water Data For Water Information Management
 

Viewers also liked

Big Data Public Private Forum (BIG) @ European Data Forum 2013
Big Data Public Private Forum (BIG) @ European Data Forum 2013Big Data Public Private Forum (BIG) @ European Data Forum 2013
Big Data Public Private Forum (BIG) @ European Data Forum 2013Edward Curry
 
Improving Policy Coherence and Accessibility through Semantic Web Technologie...
Improving Policy Coherence and Accessibility through Semantic Web Technologie...Improving Policy Coherence and Accessibility through Semantic Web Technologie...
Improving Policy Coherence and Accessibility through Semantic Web Technologie...Edward Curry
 
Designing Next Generation Smart City Initiatives: Harnessing Findings And Les...
Designing Next Generation Smart City Initiatives:Harnessing Findings And Les...Designing Next Generation Smart City Initiatives:Harnessing Findings And Les...
Designing Next Generation Smart City Initiatives: Harnessing Findings And Les...Edward Curry
 
Open Data Innovation in Smart Cities: Challenges and Trends
Open Data Innovation in Smart Cities: Challenges and TrendsOpen Data Innovation in Smart Cities: Challenges and Trends
Open Data Innovation in Smart Cities: Challenges and TrendsEdward Curry
 
Sustainable IT for Energy Management: Approaches, Challenges, and Trends
Sustainable IT for Energy Management: Approaches, Challenges, and TrendsSustainable IT for Energy Management: Approaches, Challenges, and Trends
Sustainable IT for Energy Management: Approaches, Challenges, and TrendsEdward Curry
 
Enterprise Energy Management using a Linked Dataspace for Energy Intelligence
Enterprise Energy Management using a Linked Dataspace for Energy IntelligenceEnterprise Energy Management using a Linked Dataspace for Energy Intelligence

Collaborative Data Management: How Crowdsourcing Can Help To Manage Data

  • 1. Collaborative Data Management: How Crowdsourcing Can Help To Manage Data (Edward Curry, Enterprise Data World 2013)
  • 2. Overview (Digital Enterprise Research Institute www.deri.ie Enabling networked knowledge)
      – Problems with Data
          – Master Data Management
      – Crowdsourcing
      – Collaborative Data Management
      – Setting up a CDM Process
      – Future Directions
  • 3. The Problems with Data
      – Knowledge workers need:
          – access to the right data
          – confidence in that data
      – Flawed data affects 25% of critical data in the world's top companies.
      – The role of data quality in the recent financial crisis:
          – “Assets are defined differently in different programs”
          – “Numbers did not always add up”
          – “Departments do not trust each other’s figures”
          – “Figures … not worth the pixels they were made of”
  • 4. Master Data Management
      – Master Data Management (MDM) is a process that can improve data quality.
      – What is data quality?
          – The desirable characteristics of an information resource
          – Described as a series of quality dimensions: Discoverability, Accessibility, Timeliness, Completeness, Interpretation, Accuracy, Consistency, Provenance & Reputation
  • 5. Data Quality: Master Data Management
      [Figure: the MDM process (Profile Sources, Define Mappings, Cleanse, Enrich, De-duplicate, Define Rules) turns Product Data into Master Data; roles shown: Data Developer, Data Steward, Data Governance, Business Users, Applications]
  • 6. Data Quality
      Source A:
          ID     PNAME      PCOLOR   PRICE
          APNR   iPod Nano  Red      150
          APNS   iPod Nano  Silver   160
      Source B:
          <Product name="iPod Nano">
            <Items>
              <Item code="IPN890">
                <price>150</price>
                <generation>5</generation>
              </Item>
            </Items>
          </Product>
      – Schema difference? (Data Developer)
      – Value conflicts? Entity duplication? (Data Steward), visible once the sources are combined:
          APNR       iPod Nano  Red     150
          APNR       iPod Nano  Silver  160
          iPod Nano  IPN890     150     5
      – ? (Business Users)
      [Figure: the roles are placed along an axis running from Technical through (Technical) Domain to Domain]
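The slide's three questions can be made concrete with a small sketch. It maps both sources onto a hypothetical common target schema (code, name, color, price, generation), which resolves the schema difference. It then flags entity-duplication candidates and price conflicts. Grouping by product name is a deliberately crude heuristic, so the flagged groups are candidates for a data steward to review, not decisions.

```python
import xml.etree.ElementTree as ET

# Source A: relational rows (ID, PNAME, PCOLOR, PRICE), as on the slide.
SOURCE_A = [
    {"ID": "APNR", "PNAME": "iPod Nano", "PCOLOR": "Red", "PRICE": 150},
    {"ID": "APNS", "PNAME": "iPod Nano", "PCOLOR": "Silver", "PRICE": 160},
]

# Source B: the same product family described in XML with a different schema.
SOURCE_B = """
<Product name="iPod Nano">
  <Items>
    <Item code="IPN890">
      <price>150</price>
      <generation>5</generation>
    </Item>
  </Items>
</Product>
"""

def map_source_a(rows):
    """Map Source A columns onto the (hypothetical) common schema."""
    return [
        {"source": "A", "code": r["ID"], "name": r["PNAME"],
         "color": r["PCOLOR"], "price": r["PRICE"], "generation": None}
        for r in rows
    ]

def map_source_b(xml_text):
    """Map Source B attributes and elements onto the same schema."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for item in root.iter("Item"):
        gen = item.findtext("generation")
        records.append({
            "source": "B", "code": item.get("code"), "name": root.get("name"),
            "color": None, "price": int(item.findtext("price")),
            "generation": int(gen) if gen is not None else None,
        })
    return records

def candidate_duplicates(records):
    """Group records by product name; groups with more than one record may be one entity."""
    groups = {}
    for r in records:
        groups.setdefault(r["name"], []).append(r)
    return {name: rs for name, rs in groups.items() if len(rs) > 1}

def price_conflicts(group):
    """Distinct prices within a group of candidate duplicates (more than one = conflict)."""
    return sorted({r["price"] for r in group})
```

Combining `map_source_a(SOURCE_A) + map_source_b(SOURCE_B)` yields three records for "iPod Nano" with prices `[150, 160]`. Deciding whether these are one product, two colors, or a generation difference is exactly the domain judgment the slide assigns to people rather than code.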
  • 7. Master Data Management
      – Pros
          – Can create a single version of the truth
          – Standardized information creation and management
          – Improves data quality
      – Cons
          – Significant upfront cost and effort
          – Participation limited to a few (mostly technical) experts
          – Difficult to scale to large data sources
              – Extended enterprise, e.g. partners, data vendors
          – Only a small % of data is under management (e.g. CRM, Product, …)
  • 8. Enterprise Data Landscape
      – The Managed (MDM): reference data managed through well-defined policies and a governance council
      – The Known (Enterprise Data): data directly managed by the enterprise and its departments
      – The Reality (Relevant External Data): all data relevant to the enterprise and its operations
  • 9. CROWDSOURCING
  • 10. Crowdsourcing Industry Landscape
  • 11. Introduction to Crowdsourcing
      – Coordinating a crowd (a large group of workers) to do micro-work (small tasks) that solves problems (that computers or a single user can’t)
      – A collection of mechanisms and associated methodologies for scaling and directing crowd activities to achieve goals
      – Related areas: Collective Intelligence, Social Computing, Human Computation, Data Mining
  • 12. When Computers Were Human
      – Maskelyne, 1760
          – Used human computers to create an almanac of moon positions, used for shipping/navigation
          – Quality assurance:
              – do the calculations twice
              – compare with a third verifier
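The slide's quality-assurance scheme (calculate twice, then let a third person settle disagreements) is essentially redundancy with an arbiter. The sketch below illustrates it under stated assumptions: the `computer_a`, `computer_b` and `verifier` callables are hypothetical stand-ins for the human computers and the verifier.

```python
def double_compute(tasks, computer_a, computer_b, verifier):
    """Compute every task twice independently; a verifier settles disagreements.

    Returns (results, disputed), where disputed lists the tasks that needed the verifier.
    """
    results, disputed = [], []
    for task in tasks:
        a, b = computer_a(task), computer_b(task)
        if a == b:
            results.append(a)                    # both computers agree: accept the value
        else:
            disputed.append(task)                # disagreement: escalate to the verifier
            results.append(verifier(task, a, b))
    return results, disputed


def careful(x):
    return x * x


def sloppy(x):
    return x * x + (1 if x == 3 else 0)          # makes one arithmetic slip, on x == 3


def recompute(task, a, b):
    # Hypothetical verifier: works the value out again and keeps the matching answer.
    return a if a == careful(task) else b


results, disputed = double_compute([1, 2, 3, 4], careful, sloppy, recompute)
# results == [1, 4, 9, 16], disputed == [3]
```

Only the disputed task reaches the verifier. This is why the scheme is cheaper than having every value checked by a third person.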
  • 13. When Computers Were Human
  • 14. Human vs Machine Affordances
      – Human: visual perception; visuospatial thinking; audiolinguistic ability; sociocultural awareness; creativity; domain knowledge
      – Machine: large-scale data manipulation; collecting and storing large amounts of data; efficient data movement; bias-free analysis
  • 15. When to Crowdsource?
      – Computers cannot do the task
      – A single person cannot do the task
      – The work can be split into smaller tasks
  • 16. Tag a Tune
  • 17. Peekaboom
  • 18. Foldit
  • 19. reCAPTCHA
      – OCR
          – ~1% error rate
          – 20%–30% for 18th- and 19th-century books
      – 40 million reCAPTCHAs every day (2008)
          – Fixing 40,000 books a day
  • 20. Generic Architecture
      [Figure: Requestors and Workers interact through a Platform/Marketplace (Publish Task, Task Management), in numbered steps 1–4]
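The figure numbers its arrows but does not label them. The sketch below assumes one plausible reading: (1) a requestor publishes a task, (2) a worker fetches it, (3) the worker submits an answer, and (4) the requestor collects the results. The `Marketplace` class is a hypothetical in-memory stand-in, not the API of any real platform.

```python
import itertools
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Task:
    task_id: int
    requestor: str
    payload: str
    results: List[tuple] = field(default_factory=list)  # (worker, answer) pairs


class Marketplace:
    """Hypothetical platform: publishes tasks and manages their lifecycle."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._open = deque()                  # ids of tasks waiting for a worker
        self._tasks: Dict[int, Task] = {}

    def publish(self, requestor: str, payload: str) -> int:          # step 1 (assumed)
        task = Task(next(self._ids), requestor, payload)
        self._tasks[task.task_id] = task
        self._open.append(task.task_id)
        return task.task_id

    def next_task(self) -> Optional[Task]:                            # step 2 (assumed)
        return self._tasks[self._open.popleft()] if self._open else None

    def submit(self, task_id: int, worker: str, answer: str) -> None:  # step 3 (assumed)
        self._tasks[task_id].results.append((worker, answer))

    def collect(self, task_id: int) -> List[tuple]:                   # step 4 (assumed)
        return list(self._tasks[task_id].results)
```

For example, after `tid = mp.publish("data-steward", "Is APNR the red iPod Nano?")`, a worker calls `mp.next_task()`, answers with `mp.submit(tid, "worker-1", "yes")`, and the requestor reads `mp.collect(tid)`.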
  • 21. Amazon Mechanical Turk
  • 22. CrowdFlower
  • 23. COLLABORATIVE DATA MANAGEMENT
  • 24.
      – Collaborative knowledge base maintained by a community of web users
      – Users create entity types and their metadata according to guidelines
      – Schema changes by end users require administrative approval
  • 25.
  • 26.
  • 27.
  • 28. Wikipedia
      – Collaboratively built by a large community
          – More than 19,000,000 articles in 270+ languages; 3,200,000+ articles in English
          – More than 157,000 active contributors
      – Accuracy and stylistic formality are equivalent to expert-based resources
          – e.g. the Columbia and Britannica encyclopedias
      – MediaWiki
          – The software behind Wikipedia
          – Widely used inside organizations
          – Intellipedia: 16 U.S. intelligence agencies
          – WikiProteins: curated protein data for knowledge discovery
  • 29. DBpedia Knowledge Base
      – DBpedia provides direct access to data
          – Indirectly uses the wiki as a data curation platform
          – Inherits a massive volume of curated Wikipedia data
          – 3.4 million entities and 1 billion RDF triples
          – Comprehensive data infrastructure: concept URIs, definitions, basic types
  • 30.
  • 31. A Bottom-up Approach to MDM
      – Engage more human workers to collaboratively manage enterprise data
      [Figure: number of participants vs. data control. MDM: 10s–100s of participants, top-down. Collaborative Enterprise Data Management: 10,000s–100,000s of participants, bottom-up]
  • 32. Emerging Enterprise Data Landscape
      – The Managed (MDM): reference data managed through well-defined policies and a governance council
      – Collaboratively Managed
      – The Known (Enterprise Data): data directly managed by the enterprise and its departments
      – The Reality (Relevant External Data): all data relevant to the enterprise and its operations
  • 33. Clean Data: Algorithm + Crowd
      [Figure: Data Sources, Data Quality Algorithms, and Human Computation (Developers, Data Governance, Internal Community, External Crowd) combine to produce Clean Data]
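One common way to combine data quality algorithms with human computation is confidence-based routing. The algorithm's confident answers are accepted automatically, and only uncertain records go to people. The sketch below illustrates this pattern. The 0.9 threshold, the color normalizer and the `crowd_stub` function are hypothetical; a real deployment would replace `crowd_stub` with tasks sent to an internal community or external crowd.

```python
def clean(records, algorithm, ask_crowd, threshold=0.9):
    """algorithm(value) -> (label, confidence); ask_crowd(value) -> label.

    Confident algorithmic answers are kept; the rest are routed to humans.
    Returns (cleaned, routed), where routed lists the keys humans had to handle.
    """
    cleaned, routed = {}, []
    for key, value in records.items():
        label, confidence = algorithm(value)
        if confidence >= threshold:
            cleaned[key] = label              # machine handles the easy cases
        else:
            routed.append(key)                # human computation handles the rest
            cleaned[key] = ask_crowd(value)
    return cleaned, routed


KNOWN_COLORS = {"red": "Red", "silver": "Silver"}   # hypothetical reference list


def normalize_color(value):
    key = value.strip().lower()
    if key in KNOWN_COLORS:
        return KNOWN_COLORS[key], 0.99
    return value, 0.2                         # unknown spelling: low confidence


def crowd_stub(value):
    # Stands in for a human answer collected through a task platform.
    return {"slvr": "Silver"}.get(value.strip().lower(), value)


cleaned, routed = clean({"APNR": "red ", "APNS": "SLVR"}, normalize_color, crowd_stub)
# cleaned == {"APNR": "Red", "APNS": "Silver"}, routed == ["APNS"]
```

Raising the threshold sends more work to people and usually buys quality at the cost of time and money. Lowering it does the reverse.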
  • 34. Examples of CDM Tasks
      – Understanding customer sentiment for the launch of a new product around the world
          – Implemented a 24/7 sentiment analysis system with workers from around the world
      – Categorizing millions of products in eBay’s catalog with accurate and complete attributes
          – Combine the crowd with machine learning to create an affordable and flexible catalog quality system
  • 35. Examples of CDM Tasks
      – Natural Language Processing: dialect identification, spelling correction, machine translation, word similarity
      – Computer Vision: image similarity, image annotation/analysis
      – Classification: data attributes, improving taxonomies, search results
      – Verification: entity consolidation, de-duplication, cross-checking, data validation
      – Enrichment: judgments, annotation
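Verification tasks such as de-duplication are often generated by an algorithm and answered by people. The sketch below follows that pattern. It scores name pairs with Python's `difflib.SequenceMatcher` and merges near-identical pairs automatically. Borderline pairs become yes/no questions for the crowd, and dissimilar pairs are dropped. The thresholds 0.6 and 0.95 are hypothetical and would need tuning on real data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dedup_tasks(names, low=0.6, high=0.95):
    """Split candidate pairs into automatic merges and crowd verification questions."""
    auto_merge, questions = [], []
    for a, b in combinations(names, 2):
        s = similarity(a, b)
        if s >= high:
            auto_merge.append((a, b))         # confident: merge without asking
        elif s >= low:
            questions.append(f"Do '{a}' and '{b}' refer to the same product?")
        # below `low`: treated as different entities, no task created
    return auto_merge, questions


auto, questions = dedup_tasks(["iPod Nano", "ipod nano", "iPod Nano 5G", "Zune"])
# auto == [("iPod Nano", "ipod nano")]; two questions ask about "iPod Nano 5G"
```

Each question is a small, self-contained micro-task of the kind the slide calls verification.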
  • 36. SETTING UP A CDM PROCESS
  • 37. Core Design Questions of CDM
      – What is being done? (Goal)
      – Who is doing it? (Workers)
      – Why are they doing it? (Incentives)
      – How is it being done? (Process)
      – Source: Malone, T. W., Laubacher, R., & Dellarocas, C. N. (2009). Harnessing crowds: Mapping the genome of collective intelligence. MIT Sloan Research Paper 4732-09.
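The four design questions can be captured as a small record that a team fills in before launching a CDM task. The sketch below is a hypothetical data structure. Its enum values are taken from the later slides on workers, incentives, goals and routing. It validates only that a goal and at least one incentive are given.

```python
from dataclasses import dataclass
from enum import Enum


class Who(Enum):
    HIERARCHY = "hierarchy"          # someone in authority assigns the task
    CROWD = "crowd"                  # anyone in a large group who chooses to do it


class Why(Enum):
    MONEY = "money"
    GLORY = "glory"
    LOVE = "love"
    BY_PRODUCT = "unintended by-product"
    SELF_SERVING = "self-serving resource"


class TaskKind(Enum):                # "What is being done?"
    CREATE = "creation task"
    DECIDE = "decision (vote) task"


class Routing(Enum):                 # one part of "How is it being done?"
    PULL = "workers choose tasks"
    PUSH = "system assigns tasks"


@dataclass(frozen=True)
class CDMDesign:
    goal: str
    kind: TaskKind
    who: Who
    why: frozenset                   # one or more Why values
    routing: Routing

    def __post_init__(self):
        if not self.goal.strip():
            raise ValueError("a CDM design needs a goal")
        if not self.why or not all(isinstance(w, Why) for w in self.why):
            raise ValueError("why must be a non-empty set of Why incentives")


design = CDMDesign(
    goal="Confirm duplicate product records",
    kind=TaskKind.DECIDE,
    who=Who.CROWD,
    why=frozenset({Why.MONEY}),
    routing=Routing.PUSH,
)
```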
  • 38. Who is doing it? (Workers)
      – Hierarchy (assignment)
          – Someone in authority assigns a particular person or group of people to perform the task
          – Within the enterprise
      – Crowd (choice)
          – Anyone in a large group who chooses to do so
          – Internal or external crowds
  • 39. Why are they doing it? (Incentives)
      – Motivation
          – Money ($$££)
          – Glory (reputation/prestige)
          – Love (altruism, socializing, enjoyment)
          – Unintended by-product (e.g. reCAPTCHA, captured in workflow)
          – Self-serving resources (e.g. Wikipedia, product/customer data)
      – Determine the pay and time for each task
          – Marketplace: a delicate balance
              – Money does not improve quality but can increase participation
          – Internal hierarchy: engineer opportunities for recognition
              – Performance reviews, prizes for top contributors, badges, leaderboards, etc.
  • 40. Effect of Payment on Quality
      – Cost does not affect quality [Mason and Watts, 2009, AdSafe]
      – Similar results for bigger tasks [Ariely et al., 2009]
      – Source: [Panos Ipeirotis, WWW2011 tutorial]
  • 41. What is being done? (Goal)
      – Creation tasks
          – Create/generate
          – Find
          – Improve/edit/fix
      – Decision (vote) tasks
          – Accept/reject
          – Thumbs up/thumbs down
          – Vote for best
  • 42. How is it being done? (How)
      – Tasks integrated into the normal workflow of those creating and managing data
          – As simple as vetting or “rating” the results of an algorithm
      – Task design
          – Task interface
          – Task assignment/routing
          – Task quality assurance
  • 43. Task Design
      [Figure: Input → Task Router (before computation) → Task Interface (during computation) → Output Aggregation (after computation) → Output]
      – Source: Edith Law and Luis von Ahn, Human Computation: Core Research Questions and State of the Art
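The three stages in the figure can be read as three pluggable functions. A router picks workers before computation, an interface collects each worker's answer during computation, and an aggregator combines the answers afterwards. The sketch below illustrates this reading. The worker pool, the "first three workers" routing rule and the majority aggregator are hypothetical choices made for illustration only.

```python
from collections import Counter


def run_task(item, workers, route, interface, aggregate):
    """Input -> router (before) -> interface (during) -> aggregation (after) -> output."""
    assigned = route(item, workers)                             # before computation
    answers = [interface(worker, item) for worker in assigned]  # during computation
    return aggregate(answers)                                   # after computation


# Hypothetical workers, simulated as functions that return their answer.
WORKERS = {
    "w1": lambda item: "Red",
    "w2": lambda item: "Red",
    "w3": lambda item: "Silver",
    "w4": lambda item: "Red",
}


def route_first_three(item, workers):
    return sorted(workers)[:3]


def ask(worker, item):
    return WORKERS[worker](item)


def most_common(answers):
    return Counter(answers).most_common(1)[0][0]


output = run_task("Color of APNR?", WORKERS, route_first_three, ask, most_common)
# output == "Red" (w1 and w2 outvote w3)
```

Keeping the stages separate means the routing and aggregation strategies on the following slides can be swapped in without touching the others.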
  • 44. Pull Routing
      – Workers seek tasks and assign them to themselves
          – Search and discovery of tasks supported by the platform
          – Task recommendation
          – Peer routing
      [Figure: Workers select Tasks through a Search & Browse Interface and return Results; an Algorithm is also shown]
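In pull routing, the platform only helps workers find work, and workers claim tasks themselves. The sketch below shows this with a hypothetical `TaskBoard` offering keyword search and a simple recommender. The recommender ranks open tasks by how many tags they share with the worker's past tasks, which is an assumed heuristic rather than any specific platform's algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class OpenTask:
    task_id: str
    title: str
    tags: Set[str]


class TaskBoard:
    """Hypothetical pull-routing board: workers search, then claim tasks themselves."""

    def __init__(self, tasks):
        self.open: Dict[str, OpenTask] = {t.task_id: t for t in tasks}

    def search(self, keyword: str) -> List[str]:
        kw = keyword.lower()
        return sorted(t.task_id for t in self.open.values() if kw in t.title.lower())

    def recommend(self, history_tags: Set[str], k: int = 2) -> List[str]:
        """Rank open tasks by tag overlap with the worker's history (ties by id)."""
        ranked = sorted(self.open.values(),
                        key=lambda t: (-len(t.tags & history_tags), t.task_id))
        return [t.task_id for t in ranked[:k]]

    def claim(self, task_id: str) -> OpenTask:
        return self.open.pop(task_id)  # KeyError if another worker claimed it first


board = TaskBoard([
    OpenTask("t1", "Verify iPod Nano duplicates", {"dedup", "product"}),
    OpenTask("t2", "Tag product colors", {"product", "classification"}),
    OpenTask("t3", "Translate customer review", {"nlp"}),
])
# board.search("ipod") == ["t1"]
# board.recommend({"product", "classification"}) == ["t2", "t1"]
```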
  • 45. Push Routing
      – The system assigns tasks to workers based on:
          – past performance
          – expertise
          – cost
          – latency
      [Figure: an Algorithm assigns Tasks to Workers through a Task Interface; Workers return Results] (* www.mobileworks.com)
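Push routing needs some way to combine the four criteria on the slide into a single ranking. The sketch below uses a weighted sum of normalized scores, one simple option among many. The weights (0.4, 0.3, 0.15, 0.15), the normalization caps and the worker pool are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Worker:
    name: str
    accuracy: float      # past performance, 0..1
    skills: Set[str]     # expertise
    cost: float          # price per task
    latency: float       # typical response time in minutes


def score(w: Worker, required: Set[str], max_cost: float, max_latency: float,
          weights=(0.4, 0.3, 0.15, 0.15)) -> float:
    """Weighted sum of performance, expertise, cheapness and speed (each in 0..1)."""
    w_acc, w_exp, w_cost, w_lat = weights
    expertise = len(w.skills & required) / len(required) if required else 1.0
    cheapness = 1 - min(w.cost / max_cost, 1.0)
    speed = 1 - min(w.latency / max_latency, 1.0)
    return w_acc * w.accuracy + w_exp * expertise + w_cost * cheapness + w_lat * speed


def push_assign(workers: List[Worker], required: Set[str], k: int,
                max_cost: float, max_latency: float) -> List[str]:
    """The system, not the worker, picks the k best-scoring workers for a task."""
    ranked = sorted(workers, key=lambda w: score(w, required, max_cost, max_latency),
                    reverse=True)
    return [w.name for w in ranked[:k]]


POOL = [
    Worker("a", 0.95, {"product"}, 0.10, 30),
    Worker("b", 0.70, {"product", "electronics"}, 0.05, 5),
    Worker("c", 0.90, set(), 0.02, 60),
]
REQUIRED = {"product", "electronics"}
# push_assign(POOL, REQUIRED, k=2, max_cost=0.10, max_latency=60) == ["b", "a"]
```

With these weights, the fast, fully qualified worker "b" outranks the more accurate worker "a". Shifting weight toward past performance would reverse that order, which is the cost/quality trade-off the slide implies.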
  • 46. Managing Task Quality Assurance
      – Redundancy: quorum votes
          – Replicate the task (e.g. 3 times)
          – Use majority voting to determine the right value (% agreement)
          – Weighted majority vote
      – Gold data / honey pots
          – Inject trap questions to test quality
          – Worker fatigue check (e.g. a habit of always answering “no”)
      – Estimation of worker quality
          – Redundancy plus gold data
      – Qualification test
          – Use test tasks to determine users’ ability for such tasks
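The techniques on the slide fit in a few short functions, sketched below:

- **Majority vote:** returns the winning answer with its % agreement.
- **Gold-data accuracy:** estimates each worker's accuracy on trap questions whose answers are known.
- **Weighted majority vote:** counts each vote with that estimated accuracy.
- **Fatigue check:** flags workers who almost always answer "no".

Several details are hypothetical: the neutral weight of 0.5 for unknown workers, the 0.9 and 10-answer fatigue thresholds, and the example data. Combining redundancy with gold data more fully, for example with an EM-style estimator such as Dawid and Skene's, is beyond this sketch.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def majority_vote(answers: List[str]) -> Tuple[str, float]:
    """Most common answer and the fraction of workers who gave it."""
    label, count = Counter(answers).most_common(1)[0]
    return label, count / len(answers)


def gold_accuracy(answers_by_worker: Dict[str, Dict[str, str]],
                  gold: Dict[str, str]) -> Dict[str, float]:
    """Share of gold (honey-pot) questions each worker answered correctly."""
    acc = {}
    for worker, answers in answers_by_worker.items():
        graded = [answers[q] == truth for q, truth in gold.items() if q in answers]
        acc[worker] = sum(graded) / len(graded) if graded else 0.5  # no data: neutral
    return acc


def weighted_majority_vote(votes: Dict[str, str], weights: Dict[str, float]) -> str:
    """Each worker's vote counts with that worker's estimated accuracy."""
    totals = defaultdict(float)
    for worker, answer in votes.items():
        totals[answer] += weights.get(worker, 0.5)
    return max(totals, key=totals.get)


def always_no(answers_by_worker: Dict[str, List[str]],
              threshold: float = 0.9, min_answers: int = 10) -> List[str]:
    """Fatigue check: workers whose share of 'no' answers is suspiciously high."""
    return [w for w, answers in answers_by_worker.items()
            if len(answers) >= min_answers and answers.count("no") / len(answers) >= threshold]


GOLD = {"g1": "yes", "g2": "no"}
ON_GOLD = {
    "w1": {"g1": "yes", "g2": "no"},   # 2/2 correct
    "w2": {"g1": "yes", "g2": "yes"},  # 1/2 correct
    "w3": {"g1": "no", "g2": "yes"},   # 0/2 correct
}
weights = gold_accuracy(ON_GOLD, GOLD)
votes = {"w1": "yes", "w2": "no", "w3": "no"}
# majority_vote(list(votes.values())) -> ("no", 2/3)
# weighted_majority_vote(votes, weights) -> "yes" (1.0 vs 0.5 + 0.0)
```

The example shows why the slide lists weighted voting separately. Two unreliable workers can outvote one reliable worker in a plain quorum, but not once gold data has been used to weight them.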
  • 47. Future Directions
      – Task management
          – Task assignment, payment, routing
              – Optimizing for cost, quality, completion time
      – Human–computer interaction
          – Payment/incentives
          – User interface and interaction design
          – Worker reputation, recruitment, retention
      – Quality control
          – Trust, reliability, spam detection, consensus
  • 48. Summary
    - Collaborative Data Management
      - An emerging trend for data management in the enterprise
      - Crowdsourcing + micro tasks
      - A number of emerging platforms to assist
    [Diagram: Dirty Data → Data Quality Algorithms + Human Computation → Clean Data]
  • 49. About the Presenter
    Edward is a research scientist at the Digital Enterprise Research Institute. His areas of research include green IT/IS, energy informatics, linked data, integrated reporting, and cloud computing. He has worked extensively with industry and government, advising on the adoption patterns, practicalities, and benefits of new technologies. He has published in leading journals and books, and has spoken at international conferences including the MIT CIO Symposium.
    URL: www.edwardcurry.org
    Email: edcurry@acm.org
    Twitter: @EdwardACurry
    Slides: slideshare.net/edwardcurry
  • 50. Selected References
    Big Data & Data Quality
    - S. LaValle, E. Lesser, R. Shockley, M. S. Hopkins, and N. Kruschwitz, “Big Data, Analytics and the Path from Insights to Value,” MIT Sloan Management Review, vol. 52, no. 2, pp. 21–32, 2011.
    - A. Haug and J. S. Arlbjørn, “Barriers to master data quality,” Journal of Enterprise Information Management, vol. 24, no. 3, pp. 288–303, 2011.
    - R. Silvola, O. Jaaskelainen, H. Kropsu-Vehkapera, and H. Haapasalo, “Managing one master data – challenges and preconditions,” Industrial Management & Data Systems, vol. 111, no. 1, pp. 146–162, 2011.
    - E. Curry, S. Hasan, and S. O’Riain, “Enterprise Energy Management using a Linked Dataspace for Energy Intelligence,” in Second IFIP Conference on Sustainable Internet and ICT for Sustainability, 2012.
    - D. Loshin, Master Data Management. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2008.
    - B. Otto and A. Reichert, “Organizing Master Data Management: Findings from an Expert Survey,” in Proceedings of the 2010 ACM Symposium on Applied Computing (SAC ’10), 2010, pp. 106–110.
  • 51. Selected References
    Collective Intelligence, Crowdsourcing & Human Computation
    - A. Doan, R. Ramakrishnan, and A. Y. Halevy, “Crowdsourcing systems on the World-Wide Web,” Communications of the ACM, vol. 54, no. 4, p. 86, Apr. 2011.
    - E. Law and L. von Ahn, “Human Computation,” Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 5, no. 3, pp. 1–121, Jun. 2011.
    - M. J. Franklin, D. Kossmann, T. Kraska, S. Ramesh, and R. Xin, “CrowdDB: Answering Queries with Crowdsourcing,” in Proceedings of the 2011 International Conference on Management of Data (SIGMOD ’11), 2011, p. 61.
    - P. Wichmann, A. Borek, R. Kern, P. Woodall, A. K. Parlikad, and G. Satzger, “Exploring the ‘Crowd’ as Enabler of Better Information Quality,” in Proceedings of the 16th International Conference on Information Quality, 2011, pp. 302–312.
    - W. A. Mason and D. J. Watts, “Financial incentives and the ‘performance of crowds’,” SIGKDD Explorations, vol. 11, no. 2, pp. 100–108, 2009.
    - P. Ipeirotis, “Managing Crowdsourced Human Computation,” tutorial at WWW 2011, 2011.
    - O. Alonso and M. Lease, “Crowdsourcing 101: Putting the WSDM of Crowds to Work for You,” tutorial at WSDM 2011, Hong Kong, 2011.
    - “When Computers Were Human” (video), http://www.youtube.com/watch?v=YwqltwvPnkw
  • 52. Selected References
    Collaborative Data Management
    - E. Curry, A. Freitas, and S. O’Riain, “The Role of Community-Driven Data Curation for Enterprises,” in Linking Enterprise Data, D. Wood, Ed. Boston, MA: Springer US, 2010, pp. 25–47.
    - U. ul Hassan, S. O’Riain, and E. Curry, “Towards Expertise Modelling for Routing Data Cleaning Tasks within a Community of Knowledge Workers,” in 17th International Conference on Information Quality (ICIQ 2012), Paris, France, 2012.
    - U. ul Hassan, S. O’Riain, and E. Curry, “Effects of Expertise Assessment on the Quality of Task Routing in Human Computation,” in 2nd International Workshop on Social Media for Crowdsourcing and Human Computation, Paris, France, 2013.
    - U. ul Hassan, S. O’Riain, and E. Curry, “Leveraging Matching Dependencies for Guided User Feedback in Linked Data Applications,” in 9th International Workshop on Information Integration on the Web (IIWeb 2012), Scottsdale, Arizona: ACM, 2012.