Microtask crowdsourcing for
disease mention annotation
in PubMed abstracts
Benjamin Good, Max Nanis, Andrew Su
The Scripps Research Institute
@bgood
Long-term goal: improve information extraction from text
• Rapid growth of text: >100 publications/hour [figure: publications per year]
• Existing computational methods
- are not perfect
- need training data
Information Extraction
1. Find mentions of high level concepts in text
2. Map mentions to specific terms in ontologies
3. Identify relationships between concepts
Crowdsourcing
There is accumulating evidence that many
non-expert members of ‘the crowd’ can
read English well enough to help with many
information extraction tasks - even in
complex biomedical text
(Zhai 2013, Aroyo 2013, Burger 2014)
Microtask Crowdsourcing
• Distribute discrete units of work
(aka “human intelligence tasks” or
HITs) to many workers in parallel
who are paid to solve them.
Amazon Mechanical Turk (AMT): reported 500,000 registered workers in 2011 [1]
[1] Paritosh P, Ipeirotis P, Cooper M, Suri S: The computer is the new sewing
machine: benefits and perils of crowdsourcing. WWW '11 2011:325–326.
AMT, how it works
[Diagram: Requester, Tasks, Amazon, Workers]
Requester: for each task, specify:
• a qualification test
• how many workers per task
• how much we will pay per task
• in this case, a link to a website that we host where they can complete the task
Interact directly with Amazon system
Amazon manages:
• parallel execution of jobs
• worker access to tasks via qualification tests
• payments
• task advertising
How well can AMT workers, in aggregate,
reproduce a gold standard disease mention
corpus within the text of PubMed abstracts?
Corpus used for comparison
NCBI Disease corpus
• 793 PubMed abstracts (100 development, 593 training, 100 test)
• 12 expert annotators (2 annotate each abstract)
• 6,900 “disease” mentions
Doğan, Rezarta, and Zhiyong Lu. "An improved corpus of disease mentions in PubMed citations." Proceedings of the 2012
Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics.
Disease
A phrase is a disease IF:
• it can be mapped to a unique UMLS Metathesaurus concept in one of these semantic types [list of semantic types shown on the slide]
• and it contains information helpful to physicians
Disease mentions
• Specific Disease: “Diastrophic dysplasia”
• Disease Class: “Cancers”
• Composite Mention: “prostatic , skin , and lung cancer”
• Modifier: ..the “familial breast cancer” gene , BRCA2..
Instructions
• Task: You will be presented with text from the biomedical literature which we believe may help resolve some important medical questions. The task is to highlight words and phrases in that text which are diseases, disease groups, or symptoms of diseases. This work will help advance research in cancer and many other diseases!
• Highlight all diseases and disease abbreviations!
• “...are associated with Huntington disease ( HD )... HD patients received...”
• “The Wiskott-Aldrich syndrome ( WAS ) , an X-linked immunodeficiency…”
• Highlight the longest span of text specific to a disease!
• “... contains the insulin-dependent diabetes mellitus locus …”
• and not just ‘diabetes’.
• Highlight disease conjunctions as single, long spans.
• “... a significant fraction of familial breast and ovarian cancer , but undergoes…”
• Highlight symptoms - physical results of having a disease!
• “XFE progeroid syndrome can cause dwarfism, cachexia, and microcephaly. Patients often display learning disabilities, hearing loss, and visual impairment.”
Qualification task: Q1
Select all and only the terms that should be
highlighted for each text segment:
1. “Myotonic dystrophy ( DM ) is associated with a ( CTG ) n trinucleotide repeat expansion in
the 3-untranslated region of a protein kinase-encoding gene , DMPK , which maps to
chromosome 19q13 . 3 . ”
• Myotonic
• dystrophy
• Myotonic dystrophy
• DM
• CTG
• trinucleotide repeat expansion
• kinase-encoding gene
• DMPK
Qualification task: Q2
2. “Germline mutations in BRCA1 are responsible for most cases of inherited breast
and ovarian cancer . However , the function of the BRCA1 protein has remained
elusive . As a regulated secretory protein , BRCA1 appears to function by a
mechanism not previously described for tumour suppressor gene products.”
• Germline mutations
• BRCA1
• breast
• ovarian cancer
• inherited breast and ovarian cancer
• cancer
• tumour
• tumour suppressor
Qualification task: Q3
3. “We report about Dr . Kniest , who first described the condition in 1952 , and his patient ,
who , at the age of 50 years is severely handicapped with short stature , restricted joint
mobility , and blindness but is mentally alert and leads an active life . This is in accordance
with molecular findings in other patients with Kniest dysplasia and…”
• age of 50 years
• severely handicapped
• short
• short stature
• restricted joint mobility
• blindness
• mentally alert
• molecular findings
• Kniest dysplasia
• dysplasia
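The slides do not say how the qualification items were scored, so the following is only a minimal sketch of one plausible scheme for "select all and only" questions: count each option as correct if it was selected and belongs in the key, or was left unselected and does not. The function name `score_item` and the answer key `Q1_KEY` are hypothetical; the key is inferred from the instructions (longest disease span plus its abbreviation), not taken from the authors.

```python
def score_item(selected, options, answer_key):
    """Fraction of options handled correctly.

    An option is handled correctly when it is selected and in the key,
    or left unselected and not in the key.
    """
    correct = sum((option in selected) == (option in answer_key) for option in options)
    return correct / len(options)


# Options copied from qualification question Q1.
Q1_OPTIONS = [
    "Myotonic", "dystrophy", "Myotonic dystrophy", "DM", "CTG",
    "trinucleotide repeat expansion", "kinase-encoding gene", "DMPK",
]
# Hypothetical answer key, inferred from the instructions; not given in the slides.
Q1_KEY = {"Myotonic dystrophy", "DM"}

perfect = score_item({"Myotonic dystrophy", "DM"}, Q1_OPTIONS, Q1_KEY)
partial = score_item({"Myotonic", "DM", "DMPK"}, Q1_OPTIONS, Q1_KEY)
print(perfect, partial)
```

A worker's overall score could then be averaged over the three questions and compared with a passing threshold; the threshold value used in the study is not stated in the slides.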
Qualification task results
[Figure: workers and qualified workers, with the threshold for passing marked]
33/194 workers passed (17%)
Tagging interface
[Screenshot callouts: click to see instructions; highlight mentions]
Experiment
Identify the disease mentions in the 593
abstracts from the NCBI disease corpus
• 6 cents per HIT
• HIT = annotate one abstract from PubMed
• 5 workers annotate each abstract
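As a quick check on what this design costs in worker payments, the arithmetic can be sketched as below. The figures come from the slides; the function name is illustrative, and the calculation assumes every assignment is completed and paid. Amounts are kept in integer cents to avoid floating-point rounding.

```python
# Figures taken from the slides.
ABSTRACTS = 593
WORKERS_PER_ABSTRACT = 5
PAY_PER_HIT_CENTS = 6
REPORTED_TOTAL_CENTS = 19290  # $192.90, the total reported on the results slide


def worker_payments_cents(abstracts, workers_per_abstract, pay_per_hit_cents):
    """Total paid to workers, in cents, if every assignment is completed and paid."""
    return abstracts * workers_per_abstract * pay_per_hit_cents


assignments = ABSTRACTS * WORKERS_PER_ABSTRACT
payments = worker_payments_cents(ABSTRACTS, WORKERS_PER_ABSTRACT, PAY_PER_HIT_CENTS)
remainder = REPORTED_TOTAL_CENTS - payments

print(f"assignments: {assignments}")
print(f"worker payments: ${payments / 100:.2f}")
print(f"reported total minus worker payments: ${remainder / 100:.2f}")
```

Worker payments under these assumptions come to $177.90 for 2,965 assignments. The slides do not break down the remaining $15.00 of the reported total.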
AMT, how it really works
[Diagram: Requester, Tasks, Amazon, Aggregation function, Workers. Image: http://www.thesheepmarket.com/]
Increase precision with voting
1 or more votes (K=1), K=2, K=3, K=4
[The slide shows the same passage four times, once per threshold K, highlighting only the spans marked by at least K workers:]
This molecule inhibits the growth of a broad panel of cancer cell lines, and is particularly efficacious in leukemia cells, including orthotopic leukemia preclinical models as well as in ex vivo acute myeloid leukemia (AML) and chronic lymphocytic leukemia (CLL) patient tumor samples. Thus, inhibition of CDK9 may represent an interesting approach as a cancer therapeutic target especially in hematologic malignancies.
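The vote threshold can be sketched as a small aggregation function. This is a minimal illustration, not the authors' implementation: it assumes each worker's annotations are a set of (start, end) character offsets and that two workers agree only when their offsets match exactly. The function name and the sample data are hypothetical.

```python
from collections import Counter


def aggregate_by_votes(worker_spans, k):
    """Return the spans marked by at least k workers.

    worker_spans: one set of (start, end) character offsets per worker.
    Spans count as the same vote only if their offsets match exactly (an assumption).
    """
    votes = Counter(span for spans in worker_spans for span in set(spans))
    return {span for span, count in votes.items() if count >= k}


# Hypothetical annotations from five workers on one abstract.
workers = [
    {(10, 18), (40, 62)},
    {(10, 18), (40, 62), (70, 75)},
    {(10, 18)},
    {(10, 18), (40, 62)},
    {(40, 62), (90, 99)},
]

for k in range(1, 6):
    print(k, sorted(aggregate_by_votes(workers, k)))
```

Raising K drops spans that only one or two workers marked, which trades recall for precision.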
Results: 593 abstracts compared to gold standard
• 7 days
• $192.90
• 17 workers
F = 0.81, k = 2
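For readers who want to reproduce this kind of score, here is a minimal sketch of the comparison. It assumes F denotes the balanced F-measure (harmonic mean of precision and recall) and that a crowd span counts as correct only when it matches a gold span exactly; the slides do not state the matching criterion. The function name and sample spans are hypothetical.

```python
def precision_recall_f(predicted, gold):
    """Exact-match precision, recall, and balanced F-measure between two sets of spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical spans for one abstract.
gold = {(10, 18), (40, 62), (70, 75)}
crowd = {(10, 18), (40, 62), (90, 99), (120, 130)}

p, r, f = precision_recall_f(crowd, gold)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

In practice the score would be computed over all spans in all 593 abstracts for each vote threshold k, and the k with the best F reported.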
Inter-Annotator agreement among
experts, NCBI Disease corpus
Doğan, Rezarta, and Zhiyong Lu. "An improved corpus of disease mentions in PubMed citations." Proceedings of
the 2012 Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics, 2012.
[Figure: agreement values of 0.76 and 0.87; callout: average level of agreement between expert annotators (stage 1)]
In aggregate, our worker ensemble is faster, cheaper, and as accurate as a single expert annotator for this task
• Experts' consistency (F) with other experts = 0.76
• The Turker ensemble's consistency (F) with the finalized standard = 0.81
Summary
• Some members of the crowd can tag “disease”
mentions in PubMed abstracts with comparable
accuracy to experts
• This was nontrivial to set up
• We can now generate disease mention
annotations at a rate of about 500 abstracts and
$150 per week
• Next step: mentions to concepts…
The Future
• It looks like, if we want to, we can have access
to much larger sets of annotated corpora than
ever before
• The annotations are different
• New ways of using and evaluating IE algorithms
are needed [1].
[1] Aroyo, Lora, and Chris Welty. Harnessing disagreement in crowdsourcing a relation
extraction gold standard. Tech. Rep. RC25371 (WAT1304-058), IBM Research, 2013.
Thanks
Max Nanis, Andrew Su
Mechanical Turk Workers!
@bgood
bgood@scripps.edu
Try it yourself!
• GATE crowdsourcing plugin.
http://gate.ac.uk/wiki/crowdsourcing.html
• Or you can try our code at https://bitbucket.org/sulab/mark2cure/
• And present your findings at the crowdsourcing session at the Pacific Symposium on Biocomputing, January 2015, Big Island, Hawaii
Clarification…
• This is NOT a replacement for
professional annotators
• This IS a tool that could be used by
professional annotators
Related work
• [1] Zhai et al. 2013 used a similar protocol to tag medication names in clinical trial descriptions: F = 0.88 compared to gold standard.
• [2] Burger et al. 2014 used microtask workers to identify relationships between genes and mutations.
• [3] Aroyo & Welty 2013 used workers to identify relations between concepts in medical text.
[1] Zhai H. et al. (2013) "Web 2.0-Based Crowdsourcing for High-Quality Gold Standard Development in Clinical Natural Language Processing." J Med Internet Res.
[2] Burger, John, et al. (2014) "Hybrid curation of gene-mutation relations combining automated extraction and crowdsourcing." Mitre technical report.
[3] Aroyo, Lora, and Chris Welty (2013) "Harnessing disagreement in crowdsourcing a relation extraction gold standard." Tech. Rep. RC25371 (WAT1304-058), IBM Research.

FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceAlex Henderson
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxseri bangash
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedDelhi Call girls
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusNazaninKarimi6
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxMohamedFarag457087
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professormuralinath2
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptxryanrooker
 
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Young
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai YoungDubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Young
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Youngkajalvid75
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....muralinath2
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 

Kürzlich hochgeladen (20)

PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Young
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai YoungDubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Young
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Young
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 

Microtask crowdsourcing for disease mention annotation in PubMed abstracts

  • 1. Microtask crowdsourcing for disease mention annotation in PubMed abstracts. Benjamin Good, Max Nanis, Andrew Su (The Scripps Research Institute). @bgood
  • 2. Long-term goal: improve information extraction from text
      • Rapid growth of text [Figure: pubs/year; >100/hour]
      • Existing computational methods
          - are not perfect
          - need training data
  • 3. Information Extraction
      1. Find mentions of high-level concepts in text
      2. Map mentions to specific terms in ontologies
      3. Identify relationships between concepts
  • 4. Crowdsourcing: There is accumulating evidence that many non-expert members of ‘the crowd’ can read English well enough to help with many information extraction tasks, even in complex biomedical text (Zhai 2013, Aroyo 2013, Burger 2014).
  • 5. Microtask Crowdsourcing
      • Distribute discrete units of work (aka “human intelligence tasks” or HITs) to many workers in parallel who are paid to solve them.
      • Reported 500,000 registered workers in 2011 [1]
      [1] Paritosh P, Ipeirotis P, Cooper M, Suri S: The computer is the new sewing machine: benefits and perils of crowdsourcing. WWW '11 2011:325–326.
  • 6. AMT (Amazon Mechanical Turk), how it works [Diagram: Requester, Tasks, Amazon, Workers]
      • Requester: interacts directly with the Amazon system. For each task, specify:
          - a qualification test
          - how many workers per task
          - how much we will pay per task
          - in this case, a link to a website that we host where workers can complete the task
      • Amazon manages:
          - parallel execution of jobs
          - worker access to tasks via qualification tests
          - payments
          - task advertising
  • 7. How well can AMT workers, in aggregate, reproduce a gold standard disease mention corpus within the text of PubMed abstracts?
  • 8. Corpus used for comparison: NCBI Disease corpus
      • 793 PubMed abstracts (100 development, 593 training, 100 test)
      • 12 expert annotators (2 annotate each abstract)
      • 6,900 “disease” mentions
      Doğan, Rezarta, and Zhiyong Lu. "An improved corpus of disease mentions in PubMed citations." Proceedings of the 2012 Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics.
  • 9. Disease: a phrase is a disease IF
      • it can be mapped to a unique UMLS Metathesaurus concept in one of these semantic types [list of semantic types shown on slide]
      • and it contains information helpful to physicians
      Doğan, Rezarta, and Zhiyong Lu. "An improved corpus of disease mentions in PubMed citations." Proceedings of the 2012 Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics.
  • 10. Disease mentions
      • Specific Disease: “Diastrophic dysplasia”
      • Disease Class: “Cancers”
      • Composite Mention: “prostatic , skin , and lung cancer”
      • Modifier: ..the “familial breast cancer” gene , BRCA2..
  • 11. Instructions
      • Task: You will be presented with text from the biomedical literature which we believe may help resolve some important medical questions. The task is to highlight words and phrases in that text which are diseases, disease groups, or symptoms of diseases. This work will help advance research in cancer and many other diseases!
      • Highlight all diseases and disease abbreviations
          - “...are associated with Huntington disease ( HD )... HD patients received...”
          - “The Wiskott-Aldrich syndrome ( WAS ) , an X-linked immunodeficiency…”
      • Highlight the longest span of text specific to a disease
          - “... contains the insulin-dependent diabetes mellitus locus …” and not just ‘diabetes’.
      • Highlight disease conjunctions as single, long spans.
          - “... a significant fraction of familial breast and ovarian cancer , but undergoes…”
      • Highlight symptoms: physical results of having a disease
          - “XFE progeroid syndrome can cause dwarfism, cachexia, and microcephaly. Patients often display learning disabilities, hearing loss, and visual impairment.”
  • 12. Qualification task: Q1. Select all and only the terms that should be highlighted for each text segment:
      1. “Myotonic dystrophy ( DM ) is associated with a ( CTG ) n trinucleotide repeat expansion in the 3-untranslated region of a protein kinase-encoding gene , DMPK , which maps to chromosome 19q13 . 3 . ”
      Options:
          - Myotonic
          - dystrophy
          - Myotonic dystrophy
          - DM
          - CTG
          - trinucleotide repeat expansion
          - kinase-encoding gene
          - DMPK
  • 13. Qualification task: Q2
      2. “Germline mutations in BRCA1 are responsible for most cases of inherited breast and ovarian cancer . However , the function of the BRCA1 protein has remained elusive . As a regulated secretory protein , BRCA1 appears to function by a mechanism not previously described for tumour suppressor gene products.”
      Options:
          - Germline mutations
          - BRCA1
          - breast
          - ovarian cancer
          - inherited breast and ovarian cancer
          - cancer
          - tumour
          - tumour suppressor
  • 14. Qualification task: Q3
      3. “We report about Dr . Kniest , who first described the condition in 1952 , and his patient , who , at the age of 50 years is severely handicapped with short stature , restricted joint mobility , and blindness but is mentally alert and leads an active life . This is in accordance with molecular findings in other patients with Kniest dysplasia and…”
      Options:
          - age of 50 years
          - severely handicapped
          - short
          - short stature
          - restricted joint mobility
          - blindness
          - mentally alert
          - molecular findings
          - Kniest dysplasia
          - dysplasia
  • 15. Qualification task results: 33 of 194 workers (17%) passed. [Figure labels: threshold for passing, workers, qualified workers]
  • 16. Tagging interface [Screenshot; callouts: “Click to see instructions”, “Highlight mentions”]
  • 17. Experiment: identify the disease mentions in the 593 abstracts (the training set) from the NCBI Disease corpus
      • 6 cents per HIT
      • HIT = annotate one abstract from PubMed
      • 5 workers annotate each abstract
  • 18. AMT, how it really works [Diagram: Requester, Tasks, Amazon, Workers, Aggregation function; image credit: http://www.thesheepmarket.com/]
  • 19. Increase precision with voting (aggregation function)
      • A mention is kept only if at least K workers highlighted it. The slide shows the same passage four times, for K=1 (1 or more votes), K=2, K=3 and K=4. [Highlighting not reproduced]
      • Example passage: “This molecule inhibits the growth of a broad panel of cancer cell lines, and is particularly efficacious in leukemia cells, including orthotopic leukemia preclinical models as well as in ex vivo acute myeloid leukemia (AML) and chronic lymphocytic leukemia (CLL) patient tumor samples. Thus, inhibition of CDK9 may represent an interesting approach as a cancer therapeutic target especially in hematologic malignancies.”
  • 20. Results: 593 abstracts compared to gold standard
      • 7 days
      • $192.90
      • 17 workers
      • F = 0.81 at K = 2
  • 21. Inter-annotator agreement among experts, NCBI Disease corpus [Figure values: 0.76, 0.87; callout: average level of agreement between expert annotators (stage 1)]
      Doğan, Rezarta, and Zhiyong Lu. "An improved corpus of disease mentions in PubMed citations." Proceedings of the 2012 Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics, 2012.
  • 22. In aggregate, our worker ensemble is faster, cheaper and as accurate as a single expert annotator for this task
      • Experts had consistency (F) with other experts of 0.76.
      • The turker ensemble had consistency (F) with the finalized standard of 0.81.
  • 23. Summary
      • Some members of the crowd can tag “disease” mentions in PubMed abstracts with accuracy comparable to experts
      • This was nontrivial to set up
      • We can now generate disease mention annotations at a rate of about 500 abstracts and $150 per week
      • Next step: mentions to concepts…
  • 24. The Future
      • It looks like, if we want to, we can have access to much larger sets of annotated corpora than ever before
      • The annotations are different
      • New ways of using and evaluating IE algorithms are needed [1].
      [1] Aroyo, Lora, and Chris Welty. Harnessing disagreement in crowdsourcing a relation extraction gold standard. Tech. Rep. RC25371 (WAT1304-058), IBM Research, 2013.
  • 25. Thanks: Max Nanis, Andrew Su, Mechanical Turk Workers! @bgood bgood@scipps.edu
  • 26. Try it yourself!
      • GATE crowdsourcing plugin: http://gate.ac.uk/wiki/crowdsourcing.html
      • Or you can try our code at https://bitbucket.org/sulab/mark2cure/
      • And present your findings at the crowdsourcing session at the Pacific Symposium on Biocomputing, January 2015, Big Island, Hawaii
  • 27. Clarification…
      • This is NOT a replacement for professional annotators
      • This IS a tool that could be used by professional annotators
  • 28. Related work
      • Zhai et al. (2013) used a similar protocol to tag medication names in clinical trial descriptions; F = 0.88 compared to gold standard [1].
      • Burger et al. (2014) used microtask workers to identify relationships between genes and mutations [2].
      • Aroyo & Welty (2013) used workers to identify relations between concepts in medical text [3].
      [1] Zhai H. et al. (2013) "Web 2.0-Based Crowdsourcing for High-Quality Gold Standard Development in Clinical Natural Language Processing." J Med Internet Res.
      [2] Burger, John, et al. (2014) "Hybrid curation of gene-mutation relations combining automated extraction and crowdsourcing." Mitre technical report.
      [3] Aroyo, Lora, and Chris Welty. Harnessing disagreement in crowdsourcing a relation extraction gold standard. Tech. Rep. RC25371 (WAT1304-058), IBM Research, 2013.
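
Illustrative sketch (not from the slides): the vote-threshold aggregation on slide 19 and the F comparison on slides 20 and 22 can be expressed in a few lines of Python. The sketch assumes each annotation is a (start, end) character span and that a predicted span counts as correct only when it exactly matches a gold-standard span; the slides do not state the matching rule. The function names and the toy data are hypothetical.

```python
from collections import Counter


def aggregate_spans(worker_spans, k):
    """Keep every span highlighted by at least k workers.

    worker_spans: one list of (start, end) spans per worker.
    """
    votes = Counter()
    for spans in worker_spans:
        votes.update(set(spans))  # each worker votes at most once per span
    return {span for span, n in votes.items() if n >= k}


def precision_recall_f(predicted, gold):
    """Exact-match precision, recall and F for two sets of spans (assumed rule)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical toy data: five workers annotate one abstract.
workers = [
    [(0, 8), (20, 35)],
    [(0, 8), (20, 35), (50, 60)],
    [(0, 8)],
    [(0, 8), (20, 30)],
    [(20, 35)],
]
gold = {(0, 8), (20, 35)}

for k in range(1, 6):
    p, r, f = precision_recall_f(aggregate_spans(workers, k), gold)
    print(f"K={k}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

On this toy data, K=1 gives full recall but low precision, K=2 and K=3 recover the gold spans exactly, and K=4 or higher starts to lose recall, which is the precision/recall trade-off that voting is meant to control.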