Cultivating and mining the Gene Wiki for crowdsourced gene annotation. ISMB Bio-Ontologies SIG, July 14, 2011. Andrew Su, Ph.D.
Few genes are well annotated… [Figure: counts per gene for PubMed and Gene Ontology, genes sorted by decreasing counts, 23,278 protein-coding genes. Labeled genes: TP53, TNF, APOE, MTHFR, IL6, HLA-DRB1, VEGFA, EGFR, TGFB1, ACE. Percentages shown: 59% and 38%.] Data: NCBI gene2pubmed, August 2010.
… because the literature is sparsely curated?
… because the literature is sparsely curated? [Figure label: number of articles read by a typical scientist]
311,696 articles (1.5% of PubMed) have been cited by GO annotations.
Sooner or later, the research community will need to be involved in the annotation effort to scale up to the rate of data generation.
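The Long Tail is a prolific source of content. [Figure: content produced versus contributors (sorted), from the Short Head to the Long Tail.]
Publishing: newspapers (Short Head), blogs (Long Tail)
Video: TV/Hollywood (Short Head), YouTube (Long Tail)
Product reviews: Consumer Reports (Short Head), Amazon reviews (Long Tail)
Food reviews: food critics (Short Head), Yelp (Long Tail)
Judging: Olympics (Short Head), American Idol (Long Tail)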
Wikipedia is reasonably accurate.
Wikipedia has breadth and depth. [Figure: articles, words (millions), and words per article for Wikipedia versus Britannica Online. Source: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008]
We can harness the Long Tail of scientists to directly participate in the gene annotation process.
10,000 gene “stubs” within Wikipedia. Each stub shows: protein structure; gene summary; symbols and identifiers; Gene Ontology annotations; protein interactions; tissue expression pattern; linked references; links to structured databases.
Wiki success depends on a positive feedback loop. [Figure labels: Gene Wiki page utility; number of users; number of contributors. Values shown: 1, 100, 2, 200.]
Filtering, extracting, and summarizing PubMed. [Figure labels: documents, concepts.]
A review article for every gene is powerful. Example Wikipedia articles, each with references to the literature and hyperlinks to related concepts:
Reelin: 68 editors, 543 edits since July 2002
Heparin: 175 editors, 320 edits since June 2003
AMPK: 44 editors, 84 edits since March 2004
RNAi: 232 editors, 708 edits since October 2002
Gene Wiki has a diverse critical mass of readers. Total: 5.0 million views / month.
Rank 1-10 (general society): Insulin, Titin, Human chorionic gonadotropin, Vasopressin, ANKH, CLOCK, Catalase, Erythropoietin, Glucagon, Parathyroid hormone
Rank 101-110 (scientists): Tau protein, Interleukin 10, APC, C-Met, Factor V, Interleukin 8, CD44, Histamine H1 receptor, Kappa Opioid receptor, Dihydrofolate reductase
Rank 1001-1010 (specialists): CSDA, CNTNAP2, IGSF8, Adenosine A3 receptor, RYR1, ETV6, Small heterodimer partner, 5-HT1D receptor, TRPC6, Interleukin-6 receptor
Readership is poised to grow.
The Gene Wiki has a critical mass of editors. [Figure: editor count and edit count.] From January to June 2010, 7474 edits were made by 2109 unique users; the total increase in text was roughly equivalent to 20 PLoS Biology research articles.
Making the Gene Wiki more reliable. Example text: “The company name is derived from old Greek, and means "destroyer of birds". Novartis is a multinational pharmaceutical company based in Basel, Switzerland that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), …”
Making the Gene Wiki more reliable. [Figure: the same example text marked up by author trust. Labels: 36211 total edits; 36 total edits; high-trust author; low-trust author.] http://www.wikitrust.net/
Making the Gene Wiki more computable. [Figure labels: structured annotations; free text.]
Example text from 5-HT1A receptor. Snippet from the article on the 5-HT1A receptor: “…5-HT1A receptor agonists decrease blood pressure and heart rate or cause hypotension via a central mechanism, by inducing peripheral vasodilation, and by stimulating the vagus nerve…” Concepts shown: agonists, heart rate, receptor, blood pressure, vasodilation, hypotension, vagus nerve.
Example text from 5-HT1A receptor. [Figure labels: 5-HT1A receptor; agonists; heart rate; receptor; blood pressure; vasodilation; hypotension; vagus nerve.]
Re-discovering common knowledge. NCBI Entrez Gene: 3362; candidate assertion: GO:0004993. [Figure labels: Gene Wiki mapping; wikilink; GO exact synonym.]
Mining the most recent literature. NCBI Entrez Gene: 57620; candidate assertion: GO:0030154. [Figure labels: Gene Wiki mapping; wikilink; GO related concept.]
Filling the gaps in gene annotation. NCBI Entrez Gene: 334; candidate assertion: GO:0006897. [Figure labels: Gene Wiki mapping; wikilink; GO exact match.]
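The three examples above pair a gene with a GO term that is reached through a wikilink in the gene's article. A minimal sketch of that idea follows, assuming a plain name lookup. The lookup table, the function names, and the sample article text are hypothetical illustrations, not the pipeline used in the talk.

```python
import re

# Toy lookup from lowercase GO term names to GO identifiers (assumption:
# a real run would load names and synonyms from the full ontology).
GO_NAME_INDEX = {
    "endocytosis": "GO:0006897",
    "cell differentiation": "GO:0030154",
}

# Matches [[target]], [[target|label]] and [[target#section|label]].
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def extract_wikilinks(wikitext):
    """Return the target page title of every wikilink in the text."""
    return [m.group(1).strip() for m in WIKILINK.finditer(wikitext)]


def candidate_assertions(entrez_gene_id, wikitext):
    """Pair the gene with each GO term whose name equals a wikilink target."""
    assertions = []
    for target in extract_wikilinks(wikitext):
        go_id = GO_NAME_INDEX.get(target.lower())
        if go_id is not None:
            assertions.append((entrez_gene_id, target, go_id))
    return assertions


# Hypothetical article text, written only for this example.
text = "The protein takes part in [[endocytosis]] and binds [[Lipid|lipids]]."
print(candidate_assertions(334, text))
```

Matching on exact names keeps the sketch short; synonym and related-concept matches, as named on the slides, would need extra lookup tables.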
Disease associations mined from the Gene Wiki. [Pipeline labels: Gene Wiki articles (10,271); filter out seeded text; NCBO Annotator; compare to DO database; matched Disease Ontology terms (2983); 2147 candidate annotations.] Comparison results: 23% exact match, 5% match parent, 2% match child, 70% have no match.
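The comparison step sorts each mined term into exact match, match parent, match child, or no match. The sketch below shows one way such a comparison could work on a toy hierarchy. The term identifiers, the gene names, and the reading of "parent" and "child" (the mined term is more general or more specific than an existing annotation, following any number of is-a links) are assumptions made for illustration.

```python
from collections import Counter

# Toy is-a hierarchy (child -> parents); all terms are made up.
PARENTS = {
    "T:leukemia": {"T:cancer"},
    "T:cancer": {"T:disease"},
    "T:diabetes": {"T:disease"},
    "T:disease": set(),
}


def ancestors(term):
    """All terms reachable by following parent links upward from term."""
    seen, stack = set(), list(PARENTS.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen


def classify(mined_term, known_terms):
    """Compare one mined term with the terms already annotated to the gene."""
    if mined_term in known_terms:
        return "exact match"
    if any(mined_term in ancestors(known) for known in known_terms):
        return "match parent"  # mined term is more general than a known one
    if any(known in ancestors(mined_term) for known in known_terms):
        return "match child"  # mined term is more specific than a known one
    return "no match"


# Hypothetical existing annotations and mined (gene, term) pairs.
known = {"GENE1": {"T:cancer"}}
mined = [
    ("GENE1", "T:cancer"),
    ("GENE1", "T:disease"),
    ("GENE1", "T:leukemia"),
    ("GENE1", "T:diabetes"),
    ("GENE2", "T:cancer"),
]
summary = Counter(classify(term, known.get(gene, set())) for gene, term in mined)
print(summary)
```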
Disease associations mined from the Gene Wiki. Expert curation. [Figure: shares judged correct, maybe, and incorrect; percentages shown: 86%, 10%, 4%.] Overall specificity: 90-93%.
GO associations mined from the Gene Wiki. [Pipeline labels: Gene Wiki articles (10,271); filter out seeded text; NCBO Annotator; compare to GO database; matched Gene Ontology terms (11,022); 6319 candidate annotations.] Comparison results: 17% exact match, 26% match parent, 2% match child, 55% have no match.
GO associations mined from the Gene Wiki. Expert curation. [Figure: shares judged correct, maybe, and incorrect; percentages shown: 14%, 26%, 60%.] Overall specificity: 48-64%.
Common sources of error in GO associations
1) Incorrect concept recognition
OR2F1: “Olfactory receptors … are responsible for the recognition and G protein-mediated transduction of odorant signals.”
Transduction (GO:0009293): The transfer of genetic information to a bacterium from a bacteriophage or between bacterial or yeast cells mediated by a phage vector.
Signal transduction (GO:0007165): The cellular process in which a signal is conveyed to trigger a change in the activity or state of a cell. Signal transduction begins with reception of a signal, e.g. a ligand binding to a receptor or receptor activation by a stimulus such as light, and ends with regulation of a downstream cellular process…
Common sources of error in GO associations
2) Incorrect sentence context
MEF2C: “Several post translational modifications have been identified including phosphorylation on serine-59 …”
[Figure labels: MEF2C; neurogenesis; myelination; phosphorylation.]
Other terms listed: dephosphorylation, excretion, gene expression, glycosylation, localization, methylation, proteolysis, secretion, transport, transcription, translation.
Is 48-64% specificity useful? Enrichment analysis example for the GO term muscle contraction (GO:0006936). [Figure labels: concept recognition; PubMed abstracts, 5449 articles; gene list, 87 genes; Gene Wiki, 87 articles; linked genes by PubMed only, P = 1.0; linked genes by PubMed + Gene Wiki, P = 1.22E-09.]
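The slide does not say which statistical test produced these p-values. A common choice for gene-set enrichment is the one-sided hypergeometric test, sketched below as an assumption. The function name and the counts in the usage lines are hypothetical, except that 23,278 (protein-coding genes) and 87 (gene list size) are taken from the slides.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """P(X >= overlap) for a hypergeometric draw.

    population: total number of genes considered
    annotated:  genes in the population linked to the term
    sample:     size of the gene list being tested
    overlap:    genes in the list that are linked to the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: with no linked genes in the list the p-value is 1.0;
# adding links from another source can only lower it.
p_without_links = hypergeom_enrichment_p(23278, 200, 87, 0)
p_with_links = hypergeom_enrichment_p(23278, 200, 87, 10)
print(p_without_links, p_with_links)
```

With an overlap of zero the tail covers every possible outcome, so the p-value is exactly 1.0, which is the situation where text mining adds no linked genes for a term.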
GO associations improve enrichment analyses. [Figure: p-value (PubMed + Gene Wiki) versus p-value (PubMed only); labeled point: muscle contraction.]
“Like the image of the [mammoth] hairball, it is equally unhelpful in understanding the object’s properties. You can guess that the network is large and its connectivity is complex, but not more. At best, the visualization is merely decorative.” Martin Krzywinski, http://mkweb.bcgsc.ca/linnet/talks/linnet-informatics2010.pdf
[Figure: top 100 genes]
Mapping to many biomedical semantic groups
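From text mining to a Semantic Gene Wiki. [Table. Columns: community contributions; semantic representation; semantic querying. Rows: Home-grown wiki; Gene Wiki/Wikipedia; Semantic Gene Wiki. Cell marks in extracted order: ✗ ✓ ✓ (before Home-grown wiki), ✓ ✓ ✗ ? (before Gene Wiki/Wikipedia), ✓ ✓ – (before Semantic Gene Wiki).]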
Semantic Wiki Links. The Gene Wiki (based on MediaWiki) is mirrored and translated into the Semantic Gene Wiki (based on Semantic MediaWiki, SMW), which supports semantic queries, RDF, etc. Gene Wiki markup: [[apoptosis]] and {{SWL|target=apoptosis|type=promotes}}. Semantic Gene Wiki markup: [[repress::apoptosis]], [[promote::apoptosis]], [[modulate::apoptosis]]. Rendered text in each case: apoptosis.
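The "mirror and translate" step can be pictured as rewriting each SWL template into a Semantic MediaWiki property link of the form [[property::target]]. The sketch below is an illustration under assumptions, not the project's actual translator: the function names are hypothetical, and the `type` value is copied verbatim as the property name, whereas the slide shows `promote` for `type=promotes`, so any normalization of property names is left out.

```python
import re

# Matches {{SWL|key=value|key=value}} templates without nested braces.
SWL_TEMPLATE = re.compile(r"\{\{\s*SWL\s*\|([^{}]*)\}\}")


def parse_params(body):
    """Split 'target=apoptosis|type=promotes' into a dict."""
    params = {}
    for part in body.split("|"):
        key, sep, value = part.partition("=")
        if sep:
            params[key.strip()] = value.strip()
    return params


def swl_to_smw(wikitext):
    """Rewrite each SWL template as a Semantic MediaWiki property link."""

    def replace(match):
        params = parse_params(match.group(1))
        target = params.get("target")
        relation = params.get("type")
        if not target:
            return match.group(0)  # leave malformed templates untouched
        if not relation:
            return f"[[{target}]]"  # no relation given: plain wikilink
        return f"[[{relation}::{target}]]"

    return SWL_TEMPLATE.sub(replace, wikitext)


print(swl_to_smw("Linked to {{SWL|target=apoptosis|type=promotes}}."))
```

Ordinary wikilinks such as [[apoptosis]] pass through unchanged, so untyped links in the source text stay valid on the semantic mirror.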
For community-based science, data is king. Data without structure is valuable, but structure without data is not.
For community-based science, data is king. Data without structure is valuable, but structure without data is not. [Figure labels: Wikipedia; WP:MCB, Boghog; artists and illustrators; wiki links, infoboxes; DOI bot, CitationBot; WikiTrust; copy-editing; figures; structure; citations; provenance; domain expert; information scientist.]
The Gene Wiki successfully harnesses the Long Tail of scientists for community annotation of gene function.
Collaborators: Doug Howe, ZFIN; Salvatore Loguercio (*), TU Dresden; John Hogenesch, U Penn; Jon Huss, GNF; Angel Pizzaro, U Penn; Faramarz Valafar, SDSU; Pierre Lindenbaum, Fondation Jean Dausset; Michael Martone, Rush; Konrad Koehler, Karo Bio; Warren Kibbe, Simon Lim, Northwestern; many Wikipedia editors, WP:MCB Project
Group members: Erik Clarke; Ben Good (*); Ian Macleod; Chunlei Wu
(*) See talk on SNPedia mashup at 1:55 PM
WikiTrust (UCSC): Luca de Alfaro, Bo Adler, Ian Pye
Contact: http://sulab.org, asu@scripps.edu, @andrewsu, +Andrew Su
Funding and support: BioGPS: GM83924; Gene Wiki: GM089820; ISMB travel support


Cultivating and mining the Gene Wiki for crowdsourced gene annotation

  • 1. Cultivating and mining the Gene Wiki for crowdsourcedgene annotation ISMB Bio-Ontologies SIG July 14, 2011 Andrew Su, Ph.D.
  • 2. Few genes are well annotated… 2 Counts TP53 TNF APOE MTHFR IL6 HLA-DRB1 VEGFA EGFR TGFB1 ACE 59% PubMed 38% 23,278 protein-coding genes Gene ontology Genes, sorted by decreasing counts Data: NCBI gene2pubmed, August 2010
  • 3. … because the literature is sparsely curated? 3
  • 4. … because the literature is sparsely curated? 4 Number of articles read by typical scientist
  • 5. 5 311,696 articles (1.5% of PubMed) have been cited by GO annotations
  • 6. 6 0 Sooner or later, the research community will need to be involved in the annotation effort to scale up to the rate of data generation.
  • 7. The Long Tail is a prolific source of content 7 Short Head Content produced Long Tail Contributors (sorted) Publishing: Video: Product reviews: Food reviews: Judging: Newspapers TV/Hollywood Consumer reports Food critics Olympics Blogs YouTube Amazon reviews Yelp American Idol
  • 9. Wikipedia has breadth and depth 9 Articles Words (millions) Words/ article Wikipedia Britannica Online http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008
  • 10. 10 We can harness the Long Tail of scientists to directly participate in the gene annotation process.
  • 11. 10,000 gene “stubs” within Wikipedia 11 Protein structure Gene summary Symbols and identifiers Gene Ontology annotations Protein interactions Tissue expression pattern Linked references Links to structured databases
  • 12. Wiki success depends on a positive feedback 12 Gene wiki page utility 1 100 2 200 Number of users Number of contributors
  • 13. Filtering, extracting, and summarizing PubMed Documents Concepts
  • 14. A review article for every gene is powerful 14 Reelin: 68 editors, 543 edits since July 2002 Heparin: 175 editors, 320 edits since June 2003 AMPK: 44 editors, 84 edits since March 2004 RNAi: 232 editors, 708 edits since October 2002 References to the literature Hyperlinks to related concepts
  • 15. Gene Wiki has a diverse critical mass of readers 15 Utility Rank 101-110: Scientists Tau protein Interleukin 10 APC C-Met Factor V Interleukin 8 CD44 Histamine H1 receptor Kappa Opioid receptor Dihydrofolatereductase Rank 1001-1010: Specialists CSDA CNTNAP2 IGSF8 Adenosine A3 receptor RYR1 ETV6 Small heterodimer partner 5-HT1D receptor TRPC6 Interleukin-6 receptor Users Contributors Rank 1-10: General society Insulin Titin Human chorionic gonadotropin Vasopressin ANKH CLOCK Catalase Erythropoietin Glucagon Parathyroid hormone Total: 5.0 million views / month
  • 16. Readership is poised to grow 16 Utility Users Contributors
  • 17. The Gene Wiki has a critical mass of editors 17 Utility Users Contributors Editors Editor count Edit count Edits In Jan – Jun 2010 … … 7474 edits were made by 2109 unique users … total increase in text ≈ 20 PLoS Biology research articles
  • 18. Making the Gene Wiki more reliable 18 The company name is derived from old Greek, and means "destroyer of birds". Novartis is a multinational pharmaceutical company based in Basel, Switzerland that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), … 2 2
  • 19. Making the Gene Wiki more reliable 19 The company name is derived from old Greek, and means "destroyer of birds". Novartis is a multinational pharmaceutical company based in Basel, Switzerland that manufactures drugs such as clozapine (Clozaril), diclofenac (Voltaren), … 2 36211 total edits 36 total edits * * * * * * * * * * * * * * High-trust author Low-trust author http://www.wikitrust.net/
  • 20. Making the Gene Wiki more computable. [Diagram labels: structured annotations, free text]
  • 21. Example text from 5-HT1A receptor. Snippet from the article on the 5-HT1A receptor: “…5-HT1A receptor agonists decrease blood pressure and heart rate or cause hypotension via a central mechanism, by inducing peripheral vasodilation, and by stimulating the vagus nerve…” Concepts shown: agonists, receptor, blood pressure, heart rate, hypotension, vasodilation, vagus nerve.
  • 22. Example text from 5-HT1A receptor. [Diagram nodes: 5-HT1A receptor, agonists, receptor, heart rate, blood pressure, vasodilation, hypotension, vagus nerve]
  • 23. [No extractable text on this slide]
  • 24. Re-discovering common knowledge. Candidate assertion: NCBI Entrez Gene 3362 (Gene Wiki mapping), wikilink, GO:0004993 (GO exact synonym).
  • 25. Mining the most recent literature. Candidate assertion: NCBI Entrez Gene 57620 (Gene Wiki mapping), wikilink, GO:0030154 (GO related concept).
  • 26. Filling the gaps in gene annotation. Candidate assertion: NCBI Entrez Gene 334 (Gene Wiki mapping), wikilink, GO:0006897 (GO exact match).
  • 27. Disease associations mined from the Gene Wiki. Steps shown: filter out seeded text; NCBO Annotator; compare to DO database. Counts: Gene Wiki articles (10,271); matched Disease Ontology terms (2983); 2147 candidate annotations. Comparison results: 23% exact match, 5% match parent, 2% match child, 70% have no match.
  • 28. Disease associations mined from the Gene Wiki. Expert curation categories: correct, maybe, incorrect. Values shown: 86%, 10%, 4%. Overall specificity: 90-93%.
  • 29. GO associations mined from the Gene Wiki. Steps shown: filter out seeded text; NCBO Annotator; compare to GO database. Counts: Gene Wiki articles (10,271); matched Gene Ontology terms (11,022); 6319 candidate annotations. Comparison results: 17% exact match, 26% match parent, 2% match child, 55% have no match.
  • 30. GO associations mined from the Gene Wiki. Expert curation categories: correct, maybe, incorrect. Values shown: 14%, 26%, 60%. Overall specificity: 48-64%.
  • 31. Common sources of error in GO associations. 1) Incorrect concept recognition. OR2F1: “Olfactory receptors … are responsible for the recognition and G protein-mediated transduction of odorant signals.” Transduction (GO:0009293): The transfer of genetic information to a bacterium from a bacteriophage or between bacterial or yeast cells mediated by a phage vector. Signal transduction (GO:0007165): The cellular process in which a signal is conveyed to trigger a change in the activity or state of a cell. Signal transduction begins with reception of a signal, e.g. a ligand binding to a receptor or receptor activation by a stimulus such as light, and ends with regulation of a downstream cellular process…
  • 32. Common sources of error in GO associations. 2) Incorrect sentence context. MEF2C: “Several post translational modifications have been identified including phosphorylation on serine-59 …” Terms shown: Dephosphorylation, Excretion, Gene expression, Glycosylation, Localization, Methylation, Phosphorylation, Proteolysis, Secretion, Transport, Transcription, Translation. [Diagram nodes: MEF2C, Neurogenesis, Myelination]
  • 33. Is 48-64% specificity useful? [Diagram labels: enrichment analysis; GO term muscle contraction (GO:0006936); 5449 articles; concept recognition; PubMed abstracts; gene list, 87 genes; + Gene Wiki, 87 articles] Results: linked genes by PubMed only, P = 1.0; linked genes by PubMed + Gene Wiki, P = 1.22E-09.
  • 34. GO associations improve enrichment analyses. [Plot: p-value (PubMed + Gene Wiki) against p-value (PubMed only); labeled point: muscle contraction]
  • 35. “Like the image of the [mammoth] hairball, it is equally unhelpful in understanding the object’s properties. You can guess that the network is large and its connectivity is complex, but not more. At best, the visualization is merely decorative.” - Martin Krzywinski http://mkweb.bcgsc.ca/linnet/talks/linnet-informatics2010.pdf
  • 36. TOP 100 GENES
  • 37. Mapping to many biomedical semantic groups
  • 38. From text mining to a Semantic Gene Wiki. Comparison table; labels: semantic representation, community contributions, semantics, semantic querying. Marks in extraction order: ✗ ✓ ✓ Home-grown wiki; ✓ ✓ ✗ ? Gene Wiki / Wikipedia; ✓ ✓ – Semantic Gene Wiki.
  • 39. Semantic Wiki Links. The Gene Wiki (based on MediaWiki) is mirrored and translated into the Semantic Gene Wiki (based on Semantic MediaWiki, SMW), which supports semantic queries, RDF, etc. Wikitext shown: [[apoptosis]], {{SWL|target=apoptosis|type=promotes}}, [[repress::apoptosis]], [[promote::apoptosis]], [[modulate::apoptosis]]. Rendered text in each case: apoptosis.
  • 40. For community-based science, data is king. Data without structure is valuable, but structure without data is not.
  • 41. For community-based science, data is king. Data without structure is valuable, but structure without data is not. Contributions shown: copy-editing (Wikipedia WP:MCB, Boghog); figures (artists and illustrators); structure (wiki links, infoboxes); citations (DOI bot, CitationBot); provenance (WikiTrust). Roles shown: domain expert, information scientist.
  • 42. The Gene Wiki successfully harnesses the Long Tail of scientists for community annotation of gene function.
  • 43. Collaborators: Doug Howe, ZFIN; Salvatore Loguercio (*), TU Dresden; John Hogenesch, U Penn; Jon Huss, GNF; Angel Pizzaro, U Penn; Faramarz Valafar, SDSU; Pierre Lindenbaum, Fondation Jean Dausset; Michael Martone, Rush; Konrad Koehler, Karo Bio; Warren Kibbe, Simon Lim, Northwestern; many Wikipedia editors; WP:MCB Project. Group members: Erik Clarke, Ben Good (*), Ian Macleod, Chunlei Wu. (*) See talk on SNPedia mashup at 1:55 PM. WikiTrust (UCSC): Luca de Alfaro, Bo Adler, Ian Pye. Contact: http://sulab.org, asu@scripps.edu, @andrewsu, +Andrew Su. Funding and support: ISMB travel support (BioGPS: GM83924, Gene Wiki: GM089820).
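
Slide 39 shows an SWL template on the Gene Wiki and typed Semantic MediaWiki links on the Semantic Gene Wiki. A minimal sketch of that translation step follows. It is an illustration only, not the project's actual converter: the function name, the regular expression, and the fallback to a plain wikilink are assumptions, and only the pairing of `type=promotes` with `promote::` is taken from the slide (the `represses` and `modulates` keys are assumed by analogy).

```python
import re

# Hypothetical mapping from SWL "type" values to SMW property names.
# Only "promotes" -> "promote" appears on slide 39; the other two keys
# are assumed by analogy with the properties shown on that slide.
SWL_TYPE_TO_PROPERTY = {
    "promotes": "promote",
    "represses": "repress",
    "modulates": "modulate",
}

# Matches templates of the form {{SWL|target=X|type=T}}.
SWL_PATTERN = re.compile(
    r"\{\{SWL\|target=(?P<target>[^|}]+)\|type=(?P<type>[^|}]+)\}\}"
)


def translate_swl(wikitext):
    """Rewrite SWL templates as typed [[property::target]] links."""

    def replace(match):
        target = match.group("target").strip()
        swl_type = match.group("type").strip()
        prop = SWL_TYPE_TO_PROPERTY.get(swl_type)
        if prop is None:
            # Unknown type: fall back to a plain, untyped wikilink.
            return f"[[{target}]]"
        return f"[[{prop}::{target}]]"

    return SWL_PATTERN.sub(replace, wikitext)


print(translate_swl("{{SWL|target=apoptosis|type=promotes}}"))
```

Text without SWL templates, including ordinary `[[apoptosis]]` links, passes through unchanged, so the same function could be applied to a whole mirrored page.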

Editor's notes

  1. We are very early in our efforts to comprehensively annotate human gene function. Why is this important? Genome-scale surveys aren’t biased toward well-studied genes, so there is a huge opportunity for biomedical discovery. 59% have 5 or fewer references; 38% have one or no references.
  2. If you believe that more than 1.5% of articles are relevant to gene function, then this points to a bottleneck in our curation efforts. Numbers updated 7/15/2011.
  3. Relying on the entire community of scientists to digest the biomedical literature: identification, filtering, extraction, summarization
  4. Reverted four minutes later
  5. Reverted four minutes later
  6. Structured annotations enable pathway analysis, statistical analyses, cross-species comparisons
  7. 5-HT1A is a serotonin receptor. TODO: add real ontology identifiers
  8. 5-HT1A is a serotonin receptor
  9. TODO: update example?
  10. Transduction accounts for 70% of the concept recognition problems
  11. We extended this analysis to all 773 GO terms used in human gene annotations and found a consistent improvement in the enrichment scores
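
Slides 33-34 and note 11 compare enrichment p-values with and without Gene Wiki-derived annotations. The talk does not name the statistical test, so the sketch below assumes a one-sided hypergeometric test, a common choice for GO term enrichment. The function name and every count in the usage line are invented for illustration and are not taken from the talk.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Return P(X >= hits) for X ~ Hypergeometric(population, annotated, sample).

    population: number of genes in the background set
    annotated:  background genes linked to the GO term
    sample:     size of the gene list being tested
    hits:       genes in the list that are linked to the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass of every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Illustrative counts only (assumed, not from the slides).
print(hypergeom_enrichment_p(population=20000, annotated=150, sample=87, hits=12))
```

With `hits=0` the function returns 1.0, because the tail then covers every possible outcome. Adding annotations from a second source can only raise `hits` and `annotated`, which is how extra gene-term links change the p-value in this kind of test.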