Robert Leaman Benjamin Good
Zhiyong Lu Andrew Su
http://slideshare.net/andrewsu
 The aggregated decisions of a group
are often better than those of any
single member
 Requirements:
 Diversity
 Independence
 Decentralization
 Aggregation
2[Surowiecki, 2004]
Sir Francis Galton
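A minimal sketch of the aggregation requirement above, assuming independent numeric guesses are combined with a median. The guesses and the true value are made-up illustrative numbers, not Galton's data, and `aggregate_estimates` is a hypothetical helper name.

```python
from statistics import mean, median


def aggregate_estimates(estimates):
    """Combine independent numeric guesses into a median and a mean."""
    if not estimates:
        raise ValueError("need at least one estimate")
    return {"median": median(estimates), "mean": mean(estimates)}


# Hypothetical guesses from seven independent contributors (illustrative only).
guesses = [900, 1100, 1150, 1200, 1250, 1300, 1600]
true_value = 1200  # assumed ground truth for the illustration

summary = aggregate_estimates(guesses)
individual_errors = [abs(g - true_value) for g in guesses]
crowd_error = abs(summary["median"] - true_value)

print(summary["median"], crowd_error, min(individual_errors))
```

The median is used here because it is less sensitive to a few extreme guesses than the mean; either is a reasonable first choice.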
 An undefined group of people
 Typically ‘large’
 Diverse skills and abilities
 Typically no special skills assumed
3
[Estelles-Arolas, 2012]
 Computational power
 Distributed computing
 Content
 Web searches, social media
updates, blogs
 Observations
 Online surveys
 Personal data
4[Good & Su, 2013]
 Cognitive power
 Visual reasoning, language
processing
 Creative effort
 Resource creation, algorithm
development
 Funding: $$$
5[Good & Su, 2013]
 Crowd data
 Content
 Search logs
 Crowdsourcing
 Observations
 Cognitive power
 Creative effort
 Not a focus in this
tutorial
 Distributed
computing
 Crowdfunding
6
 Access
 To the data; to the crowd
▪ 1 in 5 people have a smartphone worldwide
 Engagement
 Getting contributors’ attention
 Incentive
 Quality control
7
 Information reflects health
 Disease status
 Disease associations
 Health related behaviors
 Information also drives health
 Knowledge and beliefs regarding prevention and
treatment
 Quality monitoring of health information
available to the public
8
“Infodemiology”
[Eysenbach, 2006]
 Key challenge: text
 Variability: tired, wiped, pooped → somnolence
 Ambiguity: numb → sensory or cognition?
 Two levels
 Keyword: locate specific terms + synonyms
 Concept: attempt to normalize mentions to
specific entities
 Measurement
 Disproportionality analysis
 Separating signal from noise
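The keyword and concept levels above can be sketched as a small lexicon lookup. This is only an illustration: the `LEXICON` entries and the `find_concepts` helper are hypothetical, and a real lexicon would be far larger and would still need separate handling for ambiguous terms such as "numb".

```python
import re

# Hypothetical mini-lexicon: surface form -> normalized concept.
LEXICON = {
    "tired": "somnolence",
    "wiped": "somnolence",
    "pooped": "somnolence",
    "weight gain": "weight gain",
    "gained weight": "weight gain",
}


def find_concepts(text, lexicon=LEXICON):
    """Return (surface form, concept) pairs in order of appearance."""
    lowered = text.lower()
    hits = []
    for term, concept in lexicon.items():
        # Keyword level: locate the term or synonym on word boundaries.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, lowered):
            # Concept level: map the matched mention to its normalized entity.
            hits.append((match.start(), term, concept))
    hits.sort()
    return [(term, concept) for _, term, concept in hits]


mentions = find_concepts("I was so wiped and tired, and I gained weight.")
print(mentions)
```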
9
 Objective: predict flu
outbreaks from internet
search trends
 Access to search data via
direct access to logs or via
ad clicks
 High correlation between
clicks one week and cases
the next
 Caveats!
 Many potential confounders
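A minimal sketch of the one-week-lag comparison described above, assuming weekly counts of ad clicks and reported cases. The numbers are invented for illustration and the helper names are hypothetical. A high lagged correlation does nothing to rule out the confounders noted in the caveats.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(signal, cases, lag=1):
    """Correlate signal[t] with cases[t + lag]."""
    if lag < 0 or lag >= len(signal):
        raise ValueError("lag out of range")
    if lag == 0:
        return pearson(signal, cases)
    return pearson(signal[:-lag], cases[lag:])


# Invented weekly counts: cases in week t+1 are half the clicks in week t.
clicks = [10, 20, 40, 30, 15, 5]
cases = [4, 5, 10, 20, 15, 7.5]

same_week = lagged_correlation(clicks, cases, lag=0)
next_week = lagged_correlation(clicks, cases, lag=1)
print(round(same_week, 3), round(next_week, 3))
```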
10
[Eysenbach, 2006]
[Eysenbach, 2009]
[Ginsberg et al., 2009]
[Figure: searches and cases over time, 2004 to 2007]
 Objective: Mine social media
forums for ADR reports
 Lexicon based on UMLS
Metathesaurus, SIDER,
MedEffect, and a set of
colloquial phrases (“zonked”,
misspellings)
 Demonstrated viability of
text mining (73.9% F-measure)
 Revealed known ADRs and
putatively novel ADRs
Olanzapine             Known incidence   Corpus frequency
Weight gain            65%               30.0%
Fatigue                26%               15.9%
Increased cholesterol  22%               -
Increased appetite     -                 4.9%
Depression             -                 3.1%
Tremor                 -                 2.7%
Diabetes               2%                2.6%
Anxiety                -                 1.4%
11
[Leaman et al., 2010]
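For readers unfamiliar with the F-measure quoted above, here is a short sketch of how it combines precision and recall. The precision and recall values are hypothetical and are not the figures from Leaman et al.; `f_measure` is an illustrative helper name.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must be in [0, 1]")
    if precision == 0.0 and recall == 0.0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical values for illustration only.
example_f1 = f_measure(precision=0.8, recall=0.6)
print(round(example_f1, 4))
```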
 Objective: identify DDI from
internet search logs
 DDI reports difficult to find
 Focused on a DDI unknown at
time data collected
▪ Paroxetine + pravastatin →
hyperglycemia
 Synonyms
 Web searches
 Disproportionality analysis
 Results
 Significant association
 Classifying 31 TP & 31 TN pairs
▪ AUC = 0.82
12
[White et al., 2013]
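Disproportionality analysis can be sketched with a 2x2 contingency table. The example below computes two common measures, the proportional reporting ratio and the reporting odds ratio. It is not necessarily the statistic used by White et al., and the counts are invented for illustration.

```python
def proportional_reporting_ratio(a, b, c, d):
    """PRR for a 2x2 table.

    a: exposed, outcome present     b: exposed, outcome absent
    c: unexposed, outcome present   d: unexposed, outcome absent
    """
    if a + b == 0 or c + d == 0:
        raise ValueError("each exposure group needs at least one observation")
    if c == 0:
        raise ValueError("PRR undefined when no unexposed observation has the outcome")
    return (a / (a + b)) / (c / (c + d))


def reporting_odds_ratio(a, b, c, d):
    """ROR for the same 2x2 table: (a * d) / (b * c)."""
    if b == 0 or c == 0:
        raise ValueError("ROR undefined when b or c is zero")
    return (a * d) / (b * c)


# Invented counts: users who searched for both drugs (exposed) vs. others,
# split by whether they also searched for hyperglycemia-related terms.
a, b, c, d = 30, 270, 50, 950
prr = proportional_reporting_ratio(a, b, c, d)
ror = reporting_odds_ratio(a, b, c, d)
print(round(prr, 3), round(ror, 3))
```

A value well above 1 suggests the outcome is searched for disproportionately often in the exposed group; in practice a confidence interval would be needed to separate signal from noise.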
 Outsourcing
 Tasks normally performed in-house
 To a large, diverse, external group
 Via an open call
13
[Estelles-Arolas, 2012]
EXPERT LABOR
 Must be found
 Expensive
 Often slow
 High quality
 Ambiguity OK
 Hard to use for
experiments
 Must be retained
CROWD LABOR
 Readily available
 Inexpensive
 Fast
 Quality variable
 Instructions must be clear
 Easy prototyping and
experimentation
 Retention less important
14
 Humans (even unskilled) simply better than
computers at some tasks
 Allows workflows to include an “HPU” (human processing unit)
 Highly scalable
 Rapid turn-around
 High throughput
 Diverse solutions
 Low risk
 Low cost
15
[Quinn & Bederson, 2011]
 Microtask: low difficulty, large in number
 Observations or data processing
 Surveying, text or image annotation
 Validation: redundancy and aggregation
 Megatask: high difficulty, low in number
 Problem solving, creative effort
 Validation: manually, with metrics or rubric
16
[Good & Su, 2013]
MICROTASK
 Microtask market
 Citizen science
 Workflow
sequestration
 Casual game
 Educational
MEGATASK
 Innovation contest
 Hard game
 Collaborative
content creation
17
[Good & Su, 2013]
18
[Diagram: Requester, Tasks, Amazon, Workers, Aggregation function]
http://www.thesheepmarket.com/
 Automatically tag all genes (NCBI’s gene tagger), all
mutations (UMBC’s EMU)
 Highlight candidate gene-mutation pairs in context
 Frame task as simple yes/no questions
Slide courtesy: L. Hirschman [Burger et al., 2012]
20
21
[Mea 2014]
Tagging cells for
breast cancer
based on stain
22
 Baseline: majority vote
 Can we do better?
 Separate annotator bias and error
 Model annotator quality
▪ Measure with labeled data or reputation
 Model difficulty of each task
 Sometimes disagreement is informative
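The baseline and one simple refinement from the list above can be sketched as follows. The weighted variant here just sums a per-worker accuracy estimate; it is a simplified stand-in and not the models of Ipeirotis et al. or Raykar et al. Worker ids, labels, and accuracies are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    if not labels:
        raise ValueError("need at least one label")
    return Counter(labels).most_common(1)[0][0]


def weighted_vote(votes, worker_accuracy, default_weight=0.5):
    """Pick the label with the largest summed worker weight.

    votes: list of (worker_id, label) pairs for one task.
    worker_accuracy: dict worker_id -> estimated accuracy in [0, 1],
    e.g. measured on tasks with known answers.
    """
    if not votes:
        raise ValueError("need at least one vote")
    scores = defaultdict(float)
    for worker, label in votes:
        scores[label] += worker_accuracy.get(worker, default_weight)
    return max(scores, key=scores.get)


# Hypothetical responses to one yes/no task.
votes = [("w1", "yes"), ("w2", "no"), ("w3", "no")]
accuracy = {"w1": 0.95, "w2": 0.40, "w3": 0.45}

baseline = majority_vote([label for _, label in votes])
weighted = weighted_vote(votes, accuracy)
print(baseline, weighted)
```

In this made-up case the two methods disagree: two low-accuracy workers outvote one high-accuracy worker under majority vote, but not under the weighted scheme.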
23
[Ipeirotis et al., 2010]
[Raykar et al., 2010]
[Aroyo & Welty, 2013]
 Volunteers label images of cell biopsies from
cancer patients
 Estimate presence and number of cancer cells
 Incentive
 Altruism, sense of mastery
 Quality
 Training, redundancy
 Analyzed 2.4 million images as of 11/2014
25
[cellslider.net]
EXAMPLE: RECAPTCHA
 Workflow:
logging into a
website
 Sequestration:
performing
optical
character
recognition
27
EXAMPLE: PROBLEM-TREATMENT KNOWLEDGE BASE CREATION
 Workflow: prescribing medication
 Sequestration: entering reason for prescription
into ordering system
28
[Mccoy 2012]
30
MalariaSpot: Luengo-Oroz 2012
MOLT: Mavandadi 2012
 Bioinformatics students simultaneously learn
and perform metagenome annotation
 Incentive:
educational
 Quality:
aggregation,
instructor
evaluation
32[Hingamp et al., 2008]
OPEN PROFESSIONAL PLATFORMS ($$$)
 Innocentive
 TopCoder
 Kaggle
ACADEMIC (PUBLICATIONS..)
 DREAM (see invited opening talk at crowdsourcing session)
 CASP
34
 Players manipulate proteins to find the 3D
shape with the lowest calculated free energy
 Competitive and collaborative
 Incentive
 Altruism, fun, community
 Quality
 Automated scoring
 High performance: players solved the
crystal structure of a retroviral protease
36
[Khatib, et al., 2011]
 Aims to provide a
Wikipedia page for
every notable human
gene
 Repository of
functional knowledge
 10K distinct genes
 50M views & 15K edits
per year
38
[Huss et al., 2008]
[Good et al., 2011]
 Crowdsourcing means many different things
 Fundamental points:
 Humans (even unskilled) simply better than
computers at some tasks
 There are a lot of humans available
 There are many approaches for accessing their
talents
39
INTRINSIC
 Altruism
 Fun
 Education
 Sense of mastery
 Resource creation
EXTRINSIC
 Money
 Recognition
 Community
40
 Define problem & goal
 Decide platform
 Decompose problem into tasks
 Separate: expert, crowdsourced & automatable
 Refine crowdsourced tasks
 Simple, clear, self-contained, engaging
 Design: instructions and user interface
41
[Hetmank, 2013]
[Alonso & Lease, 2011]
[Eickhoff & de Vries, 2011]
 Iterate
 Test internally
 Calibrate with small crowdsourced sample
 Verify understanding, timing, pricing & quality
 Incorporate feedback
 Run production
 Scale on data before workers
 Validate results
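The timing and pricing check in the calibration step above can be sketched as a small calculation: given completion times from a pilot batch, estimate the hourly rate a per-task reward implies, or the reward needed to reach a target rate. The function names, pilot timings, and dollar amounts are all hypothetical.

```python
from statistics import median


def effective_hourly_rate(reward_per_task, seconds_per_task):
    """Hourly earnings implied by a per-task reward and pilot timings."""
    typical_seconds = median(seconds_per_task)
    if typical_seconds <= 0:
        raise ValueError("completion times must be positive")
    return reward_per_task * 3600 / typical_seconds


def reward_for_target_rate(target_hourly_rate, seconds_per_task):
    """Per-task reward needed to reach a target hourly rate."""
    typical_seconds = median(seconds_per_task)
    if typical_seconds <= 0:
        raise ValueError("completion times must be positive")
    return target_hourly_rate * typical_seconds / 3600


# Hypothetical pilot: three workers took 80, 90 and 100 seconds per task.
pilot_seconds = [80, 90, 100]
implied_rate = effective_hourly_rate(0.05, pilot_seconds)
needed_reward = reward_for_target_rate(10.0, pilot_seconds)
print(round(implied_rate, 2), round(needed_reward, 2))
```

Running this kind of check on the small calibration sample, before the production run, makes the pay level an explicit decision rather than a side effect of the task design.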
42
[Hetmank, 2013]
[Alonso & Lease, 2011]
[Eickhoff & de Vries, 2011]
 Automatic evaluation
 If possible
 Direct quality assessment
 Expensive
▪ Microtask: Include tasks with known answers
▪ Megatask: Evaluate tasks after completion (rubric)
 Aggregate redundant responses
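Direct quality assessment for microtasks can be sketched as scoring each worker on the embedded tasks with known answers. The data layout, the threshold of 0.7, and the helper names are assumptions made for this illustration.

```python
from collections import defaultdict


def worker_accuracy_on_gold(responses, gold):
    """Fraction of known-answer tasks each worker labeled correctly.

    responses: iterable of (worker_id, task_id, label) tuples.
    gold: dict task_id -> known correct label.
    Workers who saw no gold task are omitted from the result.
    """
    seen = defaultdict(int)
    correct = defaultdict(int)
    for worker, task, label in responses:
        if task in gold:
            seen[worker] += 1
            if label == gold[task]:
                correct[worker] += 1
    return {worker: correct[worker] / seen[worker] for worker in seen}


def trusted_workers(accuracy, threshold=0.7):
    """Workers whose gold-task accuracy meets the (assumed) threshold."""
    return {worker for worker, acc in accuracy.items() if acc >= threshold}


# Hypothetical responses; g1 and g2 are gold tasks, t9 is a real task.
gold = {"g1": "yes", "g2": "no"}
responses = [
    ("w1", "g1", "yes"), ("w1", "g2", "no"), ("w1", "t9", "yes"),
    ("w2", "g1", "no"), ("w2", "g2", "no"), ("w2", "t9", "no"),
]

accuracy = worker_accuracy_on_gold(responses, gold)
print(accuracy, trusted_workers(accuracy))
```

The resulting accuracies could also feed a weighted aggregation of the redundant responses instead of a hard filter.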
43
PRO
 Reduced cost → more
data
 Fast turn-around time
 High throughput
 “Real world”
environment
 Public participation &
awareness
CON
 Potentially poor quality
 Spammers
 Potentially low
retention
 Privacy concerns for
sensitive data
 Lax protections for
workers
44
 Potentially poor quality: discussed previously
 Low retention
 Complicates quality estimation due to sparsity
 Do workers build task-specific expertise?
 Privacy
 Sensitive data requires trusted workers
45
 Protection for workers
 Low pay, no protections, benefits, or career path
 Potential to cause harm
▪ E.g. exposure to anti-vaccine information
 Is IRB approval needed?
 Can be addressed
 Responsibility of the researcher
▪ “[opportunity to] deliberately value ethics above cost
savings”
46
[Graber & Graber, 2013]
[Fort, Adda and Cohen, 2011]
[Fort, Adda and Cohen, 2011]
 Demographics:
 Shift from mostly US to US/India mix
 Average pay is <$2.00 / hour
 Over 30% rely on MTurk for basic income
 Workers not anonymous
 However:
 Tools can be used ethically or unethically
 Crowdsourcing ≠ AMT
47
[Ross et al., 2009]
[Lease et al., 2013]
 Improved predictability
 Pricing, quality, retention
 Improved infrastructure
 Data analysis, validation & aggregation
 Improved trust mechanisms
 Matching workers and tasks
 Relevant characteristics for matching each
 Increased mobility
48
 Crowdsourcing and learning from crowd data
offer distinct advantages
 Scalability
 Rapid turn-around
 Throughput
 Low cost
 Must be carefully planned and managed
49
 Wide variety of approaches and platforms
available
 Resources section lists several
 Many questions still open
 Science using crowdsourcing
 Science of crowdsourcing
50
 Thanks to the members of the crowd who make this
methodology possible
 Questions: robert.leaman@nih.gov,
bgood@scripps.edu, asu@scripps.edu
 Support:
 Robert Leaman & Zhiyong Lu:
▪ Intramural Research Program of National Library of Medicine, NIH
 Benjamin Good & Andrew Su:
▪ National Institute of General Medical Sciences, NIH: R01GM089820
and R01GM083924
▪ National Center for Advancing Translational Sciences, NIH:
UL1TR001114
51
 Distributed computing: BOINC
 Microtask markets: Amazon Mechanical Turk,
Clickworker, SamaSource, many others
 Meta services: Crowdflower, Crowdsource
 Educational: annotathon.org
 Innovation contest: Innocentive, TopCoder
 Crowdfunding: Rockethub, Petridish
52
 Adar E: Why I hate Mechanical Turk research (and workshops). In: CHI: 2011;
Vancouver, BC, Canada. Citeseer.
 Alonso O, Lease M: Crowdsourcing for Information Retrieval: Principles, Methods
and Applications. Tutorial at ACM-SIGIR 2011.
 Aroyo L, Welty C: CrowdTruth: Harnessing disagreement in crowdsourcing a
relation extraction gold standard. In: WebSci2013 ACM 2013. 2013.
 Burger J, Doughty E, Bayer S, Tresner-Kirsch D, Wellner B, Aberdeen J, Lee K,
Kann M, Hirschman L: Validating Candidate Gene-Mutation Relations in MEDLINE
Abstracts via Crowdsourcing. In: Data Integration in the Life Sciences. vol. 7348:
Springer Berlin Heidelberg; 2012: 83-91.
 Eickhoff C, de Vries A: How Crowdsourceable is your Task? In: WSDM 2011
Workshop on Crowdsourcing for Search and Data Mining; Hong Kong, China. 2011:
11-14.
 Estelles-Arolas E, Gonzalez-Ladron-de-Guevara F: Towards an integrated
crowdsourcing definition. Journal of Information Science 2012, 38(189).
 Fort K, Adda G, Cohen KB: Amazon Mechanical Turk: Gold Mine or Coal Mine?
Computational Linguistics 2011, 37(2).
 Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, Brilliant L:
Detecting influenza epidemics using search engine query data. Nature 2009,
457(7232):1012-1014.
53
 Good BM, Clarke EL, de Alfaro L, Su AI: Gene Wiki in 2011: community
intelligence applied to human gene annotation. Nucleic Acids Res 2011, 40:D1255-1261.
 Good BM, Su AI: Crowdsourcing for bioinformatics. Bioinformatics 2013,
29(16):1925-1933.
 Graber MA, Graber A: Internet-based crowdsourcing and research ethics: the
case for IRB review. Journal of Medical Ethics 2013, 39(2):115-118.
 Halevy A, Norvig P, Pereira F: The Unreasonable Effectiveness of Data. IEEE
Intelligent Systems 2009, 9:8-12.
 Harpaz R, Callahan A, Tamang S, Low Y, Odgers D, Finlayson S, Jung K, LePendu
P, Shah NH: Text Mining for Adverse Drug Events: the Promise, Challenges, and
State of the Art. Drug Safety 2014, 37(10):777-790.
 Hetmank L: Components and Functions of Crowdsourcing Systems - A
Systematic Literature Review. In: 11th International Conference on
Wirtschaftsinformatik; Leipzig, Germany. 2013.
 Hingamp P, Brochier C, Talla E, Gautheret D, Thieffry D, Herrmann C:
Metagenome annotation using a distributed grid of undergraduate students. PLoS
Biology 2008, 6(11):e296.
 Howe J: Crowdsourcing: Why the power of the crowd is driving the future of
business: Crown Business; 2009.
54
 Huss JW, Orozco D, Goodale J, Wu C, Batalov S, Vickers TJ, Valafar F, Su AI: A
Gene Wiki for Community Annotation of Gene Function. PLoS Biology 2008,
6(7):e175.
 Ipeirotis P: Managing Crowdsourced Human Computation. Tutorial at WWW2011.
 Ipeirotis PG, Provost F, Wang J: Quality Management on Amazon Mechanical
Turk. In: KDD-HCOMP; Washington DC, USA. 2010.
 Khatib F, DiMaio F, Foldit Contenders G, Foldit Void Crushers G, Cooper S,
Kazmierczyk M, Gilski M, Krzywda S, Zabranska H, Pichova I et al: Crystal structure
of a monomeric retroviral protease solved by protein folding game players. Nature
Structural & Molecular Biology 2011, 18(10):1175-1177.
 Leaman R, Wojtulewicz L, Sullivan R, Skariah A, Yang J, Gonzalez G: Towards
Internet-Age Pharmacovigilance: Extracting Adverse Drug Reactions from User
Posts to Health-Related Social Networks. In: BioNLP Workshop; 2010: 117-125.
 Lease M, Hullman J, Bingham JP, Bernstein M, Kim J, Lasecki WS, Bakhshi S,
Mitra T, Miller RC: Mechanical Turk is Not Anonymous. In: Social Science Research
Network; 2013.
 Nakatsu RT, Grossman EB, Iacovou CL: A taxonomy of crowdsourcing based on
task complexity. Journal of Information Science 2014.
 Nielsen J: Usability Engineering: Academic Press; 1993.
55
 Pustejovsky J, Stubbs A: Natural Language Annotation for Machine Learning:
O'Reilly Media; 2012.
 Quinn AJ, Bederson BB: Human Computation: A Survey and Taxonomy of a
Growing Field. In: CHI; Vancouver, BC, Canada. 2011.
 Ranard BL, Ha YP, Meisel ZF, Asch DA, Hill SS, Becker LB, Seymour AK, Merchant
RM: Crowdsourcing--harnessing the masses to advance health and medicine, a
systematic review. Journal of General Internal Medicine 2014, 29(1):187-203.
 Raykar VC, Yu S, Zhao LH, Valadez GH, Florin C, Bogoni L, Moy L: Learning from
Crowds. Journal of Machine Learning Research 2010, 11:1297-1332.
 Ross J, Zaldivar A, Irani L: Who are the Turkers? Worker demographics in Amazon
Mechanical Turk. In: Department of Informatics, UC Irvine USA; 2009.
 Surowiecki J: The Wisdom of Crowds: Doubleday; 2004.
 Vakharia D, Lease M: Beyond AMT: An Analysis of Crowd Work Platforms. arXiv;
2013.
 Von Ahn L: Games with a Purpose. Computer 2006, 39(6):92-94.
 White R, Tatonetti NP, Shah NH, Altman RB, Horvitz E: Web-scale
pharmacovigilance: listening to signals from the crowd. J Am Med Inform Assoc
2013, 20:404-408.
 Yuen M-C, King I, Leung K-S: A Survey of Crowdsourcing Systems. In: IEEE
International Conference on Privacy, Security, Risk and Trust. 2011.
56

A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
 
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.orgCrowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
 
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
 


Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)

  • 1. Robert Leaman, Benjamin Good, Zhiyong Lu, Andrew Su. http://slideshare.net/andrewsu
  • 2. The aggregated decisions of a group are often better than those of any single member. Requirements: diversity; independence; decentralization; aggregation. Sir Francis Galton. [Surowiecki, 2004]
  • 3. An undefined group of people; typically ‘large’; diverse skills and abilities; typically no special skills assumed. [Estelles-Arolas, 2012]
  • 4. Computational power: distributed computing. Content: web searches, social media updates, blogs. Observations: online surveys, personal data. [Good & Su, 2013]
  • 5. Cognitive power: visual reasoning, language processing. Creative effort: resource creation, algorithm development. Funding: $$$. [Good & Su, 2013]
  • 6. Crowd data: content; search logs. Crowdsourcing: observations; cognitive power; creative effort. Not a focus in this tutorial: distributed computing; crowdfunding.
  • 7. Access: to the data; to the crowd (1 in 5 people worldwide has a smartphone). Engagement: getting contributors’ attention. Incentive. Quality control.
  • 8. Information reflects health: disease status; disease associations; health-related behaviors. Information also drives health: knowledge and beliefs regarding prevention and treatment; quality monitoring of health information available to the public. “Infodemiology” [Eysenbach, 2006]
  • 9. Key challenge: text. Variability: tired, wiped, pooped → somnolence. Ambiguity: numb → sensory or cognition? Two levels. Keyword: locate specific terms + synonyms. Concept: attempt to normalize mentions to specific entities. Measurement: disproportionality analysis; separating signal from noise.
  • 10. Objective: predict flu outbreaks from internet search trends. Access to search data via direct access to logs or via ad clicks. High correlation between clicks one week and cases the next. Caveats! Many potential confounders. [Eysenbach, 2006] [Eysenbach, 2009] [Ginsberg et al., 2009] (Figure: searches and cases, 2004 to 2007.)
  • 11. Objective: mine social media forums for ADR reports. Lexicon based on UMLS Metathesaurus, SIDER, MedEffect, and a set of colloquial phrases (“zonked”, misspellings). Demonstrated viability of text mining (73.9% F-measure). Revealed known ADRs and putatively novel ADRs. [Leaman et al., 2010]
      Olanzapine              Known incidence   Corpus frequency
      Weight gain             65%               30.0%
      Fatigue                 26%               15.9%
      Increased cholesterol   22%               -
      Increased appetite      -                 4.9%
      Depression              -                 3.1%
      Tremor                  -                 2.7%
      Diabetes                2%                2.6%
      Anxiety                 -                 1.4%
  • 12. Objective: identify DDI from internet search logs. DDI reports are difficult to find. Focused on a DDI unknown at the time the data were collected: paroxetine + pravastatin → hyperglycemia. Synonyms; web searches; disproportionality analysis. Results: significant association; classifying 31 TP & 31 TN pairs, AUC = 0.82. [White et al., 2013]
  • 13. Outsourcing: tasks normally performed in-house; to a large, diverse, external group; via an open call. [Estelles-Arolas, 2012]
  • 14. EXPERT LABOR: must be found; expensive; often slow; high quality; ambiguity OK; hard to use for experiments; must be retained. CROWD LABOR: readily available; inexpensive; fast; quality variable; instructions must be clear; easy prototyping and experimentation; retention less important.
  • 15. Humans (even unskilled) are simply better than computers at some tasks. Allows workflows to include an “HPU”. Highly scalable; rapid turn-around; high throughput; diverse solutions; low risk; low cost. [Quinn & Bederson, 2011]
  • 16. Microtask: low difficulty, large in number. Observations or data processing; surveying, text or image annotation. Validation: redundancy and aggregation. Megatask: high difficulty, low in number. Problem solving, creative effort. Validation: manually, with metrics or rubric. [Good & Su, 2013]
  • 17. MICROTASK: microtask market; citizen science; workflow sequestration; casual game; educational. MEGATASK: innovation contest; hard game; collaborative content creation. [Good & Su, 2013]
  • 19. Automatically tag all genes (NCBI’s gene tagger) and all mutations (UMBC’s EMU). Highlight candidate gene-mutation pairs in context. Frame the task as simple yes/no questions. Slide courtesy: L. Hirschman. [Burger et al., 2012]
  • 20.
  • 21. Tagging cells for breast cancer based on stain. [Mea 2014]
  • 23. Baseline: majority vote. Can we do better? Separate annotator bias and error. Model annotator quality (measure with labeled data or reputation). Model difficulty of each task. Sometimes disagreement is informative. [Ipeirotis et al., 2010] [Raykar et al., 2010] [Aroyo & Welty, 2013]
  • 24. MICROTASK: microtask market; citizen science; workflow sequestration; casual game; educational. MEGATASK: innovation contest; hard game; collaborative content creation. [Good & Su, 2013]
  • 25. Volunteers label images of cell biopsies from cancer patients. Estimate presence and number of cancer cells. Incentive: altruism, sense of mastery. Quality: training, redundancy. Analyzed 2.4 million images as of 11/2014. [cellslider.net]
  • 26. MICROTASK: microtask market; citizen science; workflow sequestration; casual game; educational. MEGATASK: innovation contest; hard game; collaborative framework. [Good & Su, 2013]
  • 27. EXAMPLE: RECAPTCHA. Workflow: logging into a website. Sequestration: performing optical character recognition.
  • 28. EXAMPLE: PROBLEM-TREATMENT KNOWLEDGE BASE CREATION. Workflow: prescribing medication. Sequestration: entering reason for prescription into ordering system. [McCoy 2012]
  • 29. MICROTASK: microtask market; citizen science; workflow sequestration; casual game; educational. MEGATASK: innovation contest; hard game; collaborative content creation. [Good & Su, 2013]
  • 31. MICROTASK: microtask market; citizen science; workflow sequestration; casual game; educational. MEGATASK: innovation contest; hard game; collaborative content creation. [Good & Su, 2013]
  • 32. Bioinformatics students simultaneously learn and perform metagenome annotation. Incentive: educational. Quality: aggregation, instructor evaluation. [Hingamp et al., 2008]
  • 33. MICROTASK: microtask market; citizen science; workflow sequestration; casual game; educational. MEGATASK: innovation contest; hard game; collaborative content creation. [Good & Su, 2013]
  • 34. OPEN PROFESSIONAL PLATFORMS ($$$): Innocentive; TopCoder; Kaggle. ACADEMIC (PUBLICATIONS..): DREAM (see invited opening talk at crowdsourcing session); CASP.
  • 35. MICROTASK: microtask market; citizen science; workflow sequestration; casual game; educational. MEGATASK: innovation contest; hard game; collaborative content creation. [Good & Su, 2013]
  • 36. Players manipulate proteins to find the 3D shape with the lowest calculated free energy. Competitive and collaborative. Incentive: altruism, fun, community. Quality: automated scoring. High performance: found a difficult key retroviral structure. [Khatib et al., 2011]
  • 37. MICROTASK: microtask market; citizen science; workflow sequestration; casual game; educational. MEGATASK: innovation contest; hard game; collaborative content creation.
  • 38. Aims to provide a Wikipedia page for every notable human gene. Repository of functional knowledge. 10K distinct genes. 50M views & 15K edits per year. [Huss et al., 2008] [Good et al., 2011]
  • 39. Means many different things. Fundamental points: humans (even unskilled) are simply better than computers at some tasks; there are a lot of humans available; there are many approaches for accessing their talents.
  • 40. INTRINSIC: altruism; fun; education; sense of mastery; resource creation. EXTRINSIC: money; recognition; community.
  • 41. Define problem & goal. Decide platform. Decompose problem into tasks: separate expert, crowdsourced & automatable. Refine crowdsourced tasks: simple, clear, self-contained, engaging. Design: instructions and user interface. [Hetmank, 2013] [Alonso & Lease, 2011] [Eickhoff & de Vries, 2011]
  • 42. Iterate: test internally; calibrate with small crowdsourced sample; verify understanding, timing, pricing & quality; incorporate feedback. Run production: scale on data before workers; validate results. [Hetmank, 2013] [Alonso & Lease, 2011] [Eickhoff & de Vries, 2011]
  • 43. Automatic evaluation, if possible. Direct quality assessment (expensive). Microtask: include tasks with known answers. Megatask: evaluate tasks after completion (rubric). Aggregate redundant responses.
  • 44. PRO: reduced cost → more data; fast turn-around time; high throughput; “real world” environment; public participation & awareness. CON: potentially poor quality; spammers; potentially low retention; privacy concerns for sensitive data; lax protections for workers.
  • 45. Potentially poor quality: discussed previously. Low retention: complicates quality estimation due to sparsity; do workers build task-specific expertise? Privacy: sensitive data requires trusted workers.
  • 46. Protection for workers: low pay; no protections, benefits, or career path. Potential to cause harm (e.g., exposure to anti-vaccine information). Is IRB approval needed? Can be addressed: responsibility of the researcher; “[opportunity to] deliberately value ethics above cost savings”. [Graber & Graber, 2013] [Fort, Adda and Cohen, 2011]
  • 47. Demographics: shift from mostly US to US/India mix. Average pay is <$2.00 / hour. Over 30% rely on MTurk for basic income. Workers not anonymous. However: tools can be used ethically or unethically; crowdsourcing ≠ AMT. [Ross et al., 2009] [Lease et al., 2013]
  • 48. Improved predictability: pricing, quality, retention. Improved infrastructure: data analysis, validation & aggregation. Improved trust mechanisms. Matching workers and tasks: relevant characteristics for matching each. Increased mobility.
  • 49. Crowdsourcing and learning from crowd data offer distinct advantages: scalability; rapid turn-around; throughput; low cost. Must be carefully planned and managed.
  • 50. Wide variety of approaches and platforms available: Resources section lists several. Many questions still open: science using crowdsourcing; science of crowdsourcing.
  • 51. Thanks to the members of the crowd who make this methodology possible. Questions: robert.leaman@nih.gov, bgood@scripps.edu, asu@scripps.edu. Support: Robert Leaman & Zhiyong Lu: Intramural Research Program of National Library of Medicine, NIH. Benjamin Good & Andrew Su: National Institute of General Medical Sciences, NIH: R01GM089820 and R01GM083924; National Center for Advancing Translational Sciences, NIH: UL1TR001114.
  • 52. Distributed computing: BOINC. Microtask markets: Amazon Mechanical Turk, Clickworker, SamaSource, many others. Meta services: Crowdflower, Crowdsource. Educational: annotathon.org. Innovation contest: Innocentive, TopCoder. Crowdfunding: Rockethub, Petridish.
  • 53.
      ◦ Adar E: Why I hate Mechanical Turk research (and workshops). In: CHI: 2011; Vancouver, BC, Canada. Citeseer.
      ◦ Alonso O, Lease M: Crowdsourcing for Information Retrieval: Principles, Methods and Applications. Tutorial at ACM-SIGIR 2011.
      ◦ Aroyo L, Welty C: CrowdTruth: Harnessing disagreement in crowdsourcing a relation extraction gold standard. In: WebSci2013. ACM; 2013.
      ◦ Burger J, Doughty E, Bayer S, Tresner-Kirsch D, Wellner B, Aberdeen J, Lee K, Kann M, Hirschman L: Validating Candidate Gene-Mutation Relations in MEDLINE Abstracts via Crowdsourcing. In: Data Integration in the Life Sciences. vol. 7348: Springer Berlin Heidelberg; 2012: 83-91.
      ◦ Eickhoff C, de Vries A: How Crowdsourceable is your Task? In: WSDM 2011 Workshop on Crowdsourcing for Search and Data Mining; Hong Kong, China. 2011: 11-14.
      ◦ Estelles-Arolas E, Gonzalez-Ladron-de-Guevara F: Towards an integrated crowdsourcing definition. Journal of Information Science 2012, 38(189).
      ◦ Fort K, Adda G, Cohen KB: Amazon Mechanical Turk: Gold Mine or Coal Mine? Computational Linguistics 2011, 37(2).
      ◦ Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, Brilliant L: Detecting influenza epidemics using search engine query data. Nature 2009, 457(7232):1012-1014.
  • 54.
      ◦ Good BM, Clarke EL, de Alfaro L, Su AI: Gene Wiki in 2011: community intelligence applied to human gene annotation. Nucleic Acids Res 2011, 40:D1255-1261.
      ◦ Good BM, Su AI: Crowdsourcing for bioinformatics. Bioinformatics 2013, 29(16):1925-1933.
      ◦ Graber MA, Graber A: Internet-based crowdsourcing and research ethics: the case for IRB review. Journal of Medical Ethics 2013, 39(2):115-118.
      ◦ Halevy A, Norvig P, Pereira F: The Unreasonable Effectiveness of Data. IEEE Intelligent Systems 2009, 9:8-12.
      ◦ Harpaz R, Callahan A, Tamang S, Low Y, Odgers D, Finlayson S, Jung K, LePendu P, Shah NH: Text Mining for Adverse Drug Events: the Promise, Challenges, and State of the Art. Drug Safety 2014, 37(10):777-790.
      ◦ Hetmank L: Components and Functions of Crowdsourcing Systems - A Systematic Literature Review. In: 11th International Conference on Wirtschaftsinformatik; Leipzig, Germany. 2013.
      ◦ Hingamp P, Brochier C, Talla E, Gautheret D, Thieffry D, Herrmann C: Metagenome annotation using a distributed grid of undergraduate students. PLoS Biology 2008, 6(11):e296.
      ◦ Howe J: Crowdsourcing: Why the power of the crowd is driving the future of business: Crown Business; 2009.
  • 55.
      ◦ Huss JW, Orozco D, Goodale J, Wu C, Batalov S, Vickers TJ, Valafar F, Su AI: A Gene Wiki for Community Annotation of Gene Function. PLoS Biology 2008, 6(7):e175.
      ◦ Ipeirotis P: Managing Crowdsourced Human Computation. Tutorial at WWW2011.
      ◦ Ipeirotis PG, Provost F, Wang J: Quality Management on Amazon Mechanical Turk. In: KDD-HCOMP; Washington DC, USA. 2010.
      ◦ Khatib F, DiMaio F, Foldit Contenders G, Foldit Void Crushers G, Cooper S, Kazmierczyk M, Gilski M, Krzywda S, Zabranska H, Pichova I et al: Crystal structure of a monomeric retroviral protease solved by protein folding game players. Nature Structural & Molecular Biology 2011, 18(10):1175-1177.
      ◦ Leaman R, Wojtulewicz L, Sullivan R, Skariah A, Yang J, Gonzalez G: Towards Internet-Age Pharmacovigilance: Extracting Adverse Drug Reactions from User Posts to Health-Related Social Networks. In: BioNLP Workshop; 2010: 117-125.
      ◦ Lease M, Hullman J, Bingham JP, Bernstein M, Kim J, Lasecki WS, Bakhshi S, Mitra T, Miller RC: Mechanical Turk is Not Anonymous. In: Social Science Research Network; 2013.
      ◦ Nakatsu RT, Grossman EB, Iacovou CL: A taxonomy of crowdsourcing based on task complexity. Journal of Information Science 2014.
      ◦ Nielsen J: Usability Engineering: Academic Press; 1993.
  • 56.
      ◦ Pustejovsky J, Stubbs A: Natural Language Annotation for Machine Learning: O'Reilly Media; 2012.
      ◦ Quinn AJ, Bederson BB: Human Computation: A Survey and Taxonomy of a Growing Field. In: CHI; Vancouver, BC, Canada. 2011.
      ◦ Ranard BL, Ha YP, Meisel ZF, Asch DA, Hill SS, Becker LB, Seymour AK, Merchant RM: Crowdsourcing--harnessing the masses to advance health and medicine, a systematic review. Journal of General Internal Medicine 2014, 29(1):187-203.
      ◦ Raykar VC, Yu S, Zhao LH, Valadez GH, Florin C, Bogoni L, Moy L: Learning from Crowds. Journal of Machine Learning Research 2010, 11:1297-1332.
      ◦ Ross J, Zaldivar A, Irani L: Who are the Turkers? Worker demographics in Amazon Mechanical Turk. In: Department of Informatics, UC Irvine USA; 2009.
      ◦ Surowiecki J: The Wisdom of Crowds: Doubleday; 2004.
      ◦ Vakharia D, Lease M: Beyond AMT: An Analysis of Crowd Work Platforms. arXiv; 2013.
      ◦ Von Ahn L: Games with a Purpose. Computer 2006, 39(6):92-94.
      ◦ White R, Tatonetti NP, Shah NH, Altman RB, Horvitz E: Web-scale pharmacovigilance: listening to signals from the crowd. J Am Med Inform Assoc 2013, 20:404-408.
      ◦ Yuen M-C, King I, Leung K-S: A Survey of Crowdsourcing Systems. In: IEEE International Conference on Privacy, Security, Risk and Trust. 2011.
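Slides 9 and 12 name disproportionality analysis as the measurement step but do not say which statistic was used. As a rough sketch only, the block below computes the proportional reporting ratio (PRR), one common disproportionality measure, from a 2x2 table of counts. The choice of PRR, the function name, and the counts are all assumptions made for illustration; they are not taken from the cited studies.

```python
def proportional_reporting_ratio(a, b, c, d):
    """Proportional reporting ratio from a 2x2 contingency table.

    a: records mentioning the exposure (e.g. a drug) and the outcome
    b: records mentioning the exposure but not the outcome
    c: records mentioning the outcome but not the exposure
    d: records mentioning neither
    """
    exposed = a + b
    unexposed = c + d
    if exposed == 0 or unexposed == 0 or c == 0:
        raise ValueError("PRR is undefined for this table")
    # (a / exposed) / (c / unexposed), rearranged to a single division
    return (a * unexposed) / (c * exposed)


# Hypothetical counts, invented for this example.
prr = proportional_reporting_ratio(a=30, b=970, c=100, d=9900)
```

A PRR well above 1 suggests the outcome is mentioned disproportionately often alongside the exposure. It says nothing by itself about confounding, which is why the slides pair measurement with separating signal from noise.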
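Slide 23 gives majority vote as the aggregation baseline and suggests modeling annotator quality with labeled data; slide 43 suggests including tasks with known answers. The sketch below combines the two ideas in the simplest way: estimate each worker's accuracy on gold tasks, then weight votes by that accuracy. This is a deliberately simple stand-in, not the models of Ipeirotis et al. or Raykar et al., and every name and data value in it is hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Return the most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def worker_accuracy(responses, gold):
    """Estimate per-worker accuracy from tasks with known answers.

    responses: {task_id: {worker_id: label}}
    gold:      {task_id: true_label}
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for task, truth in gold.items():
        for worker, label in responses.get(task, {}).items():
            seen[worker] += 1
            correct[worker] += int(label == truth)
    return {worker: correct[worker] / seen[worker] for worker in seen}


def weighted_vote(task_responses, accuracy, default=0.5):
    """Pick the label with the largest sum of worker accuracies."""
    scores = defaultdict(float)
    for worker, label in task_responses.items():
        scores[label] += accuracy.get(worker, default)
    return max(scores, key=scores.get)


# Hypothetical responses: g1 and g2 are gold tasks, t1 is unlabeled.
responses = {
    "g1": {"w1": "yes", "w2": "no", "w3": "no"},
    "g2": {"w1": "no", "w2": "yes", "w3": "yes"},
    "t1": {"w1": "yes", "w2": "no", "w3": "no"},
}
gold = {"g1": "yes", "g2": "no"}

accuracy = worker_accuracy(responses, gold)
baseline = majority_vote(list(responses["t1"].values()))
weighted = weighted_vote(responses["t1"], accuracy)
```

In this toy data the two unreliable workers outvote the reliable one under majority vote, while the weighted vote follows the worker who answered the gold tasks correctly.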

Editor's notes

  1. Vox populi = “one vote, one value”. 787 votes on ox weight; the median value was <1% off, and the mean was even closer.
     Criteria | Description
     Diversity of opinion | Each person should have private information even if it's just an eccentric interpretation of the known facts.
     Independence | People's opinions aren't determined by the opinions of those around them.
     Decentralization | People are able to specialize and draw on local knowledge.
     Aggregation | Some mechanism exists for turning private judgments into a collective decision.
  2. Drawn examples from biomedical research – many examples in other fields from astronomy to botany to ornithology
  3. Some links for distributed computing and crowdfunding on resources page
  4. Access to data can be hard
  5. Blurs the line between demand crowd data and observational crowdsourcing. Example confounders: changes in search engine algorithm, seasonal searches, media reports, baseline search activity.
  6. Olanzapine is used to treat schizophrenia and bipolar depression. The most frequently mentioned ADR was always a known ADR. “We used the DailyStrength health-related social network as the source of user comments in this study. DailyStrength allows users to create profiles, maintain friends and join various disease-related support groups. It serves as a resource for patients to connect with others who have similar conditions, many of whom are friends solely online. As of 2007, DailyStrength had an average of 14,000 daily visitors, each spending 82 minutes on the site and viewing approximately 145 pages (comScore Media Metrix Canada, 2007).”
  7. DDI officially described in 2011, web search logs from 2010
  8. credit Aaron Koblin - integrate with previous
  9. Animate red box to emphasize Turkers don't see it
  10. Using NLP to tag diseases and conditions in drug labels, one disease at a time. Ask Turkers to answer yes/no questions about whether the highlighted disease is an indicated use of the highlighted drug.
  11. This is a jumping off point for the audience to consider.
  12. Note the differences between this and AMT. Incentives are different; tasks, training, and aggregation are the same; cost scales differently.
  13. CACAO Jim Hu.
  14. “Instrumental” ??
  15. Task-specific expertise is lost at end of experiment
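Slide 10 reports a high correlation between clicks one week and cases the next, and note 5 lists confounders that can distort such a signal. A minimal sketch of the lagged correlation itself follows, using a plain Pearson coefficient. The function names and both weekly series are invented for illustration and do not come from the cited studies.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(searches, cases, lag=1):
    """Correlate searches in week t with cases in week t + lag."""
    if lag < 0 or lag >= len(searches):
        raise ValueError("lag must be between 0 and len(searches) - 1")
    return pearson(searches[: len(searches) - lag], cases[lag:])


# Hypothetical weekly counts, invented so that cases trail searches by a week.
weekly_searches = [10, 20, 40, 30, 15, 10]
weekly_cases = [5, 12, 22, 41, 29, 16]

same_week = lagged_correlation(weekly_searches, weekly_cases, lag=0)
next_week = lagged_correlation(weekly_searches, weekly_cases, lag=1)
```

A higher correlation at lag 1 than at lag 0 is what a leading indicator would look like, but the confounders in note 5 can produce the same pattern.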
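Note 1 describes aggregating 787 independent weight guesses by their median and mean. The sketch below shows that aggregation step on a handful of made-up guesses. It does not reproduce Galton's data: in this toy set a single outlier pulls the mean away from the truth, whereas note 1 reports that in the original data the mean was closer than the median.

```python
import statistics


def aggregate_estimates(estimates):
    """Combine independent numeric guesses into median and mean summaries."""
    return {
        "median": statistics.median(estimates),
        "mean": statistics.fmean(estimates),
    }


def relative_error(estimate, truth):
    """Absolute error as a fraction of the true value."""
    return abs(estimate - truth) / truth


# Hypothetical guesses and true value, invented for this example.
guesses = [1100, 1150, 1190, 1200, 1210, 1250, 1900]
true_weight = 1200

summary = aggregate_estimates(guesses)
median_error = relative_error(summary["median"], true_weight)
mean_error = relative_error(summary["mean"], true_weight)
```

Which aggregate is better depends on the error distribution of the crowd, which is one reason slide 2 lists independence and diversity as requirements.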