GeneGames.org
  The Gene Wiki: Crowdsourcing
     human gene annotation
                Andrew Su, Ph.D.
                    @andrewsu
                  asu@scripps.edu
                   http://sulab.org

          Genome Informatics

          September 6, 2012
2
The Gene Wiki crib sheet
                                                   http://www.slideshare.net/andrewsu

   • Bulk creation of ~10k Wikipedia articles
     (http://dx.doi.org/10.1371/journal.pbio.0060175)
   • Monthly stats: > 4 million views, > 1000 edits
     (http://dx.doi.org/10.1093/nar/gkr925)
   • Text mining reveals novel Gene Ontology and Disease
     Ontology annotations (http://dx.doi.org/10.1186/1471-2164-12-603)
   • Mash-up with SNPedia for crowdsourced gene-disease
     database (http://www.jbiomedsem.com/content/3/S1/S6)
   • Merging Wikipedia with the Semantic Web
     (http://dx.doi.org/10.1093/database/bar060)
3



Seven million human hours




                            http://www.flickr.com/photos/archana3k1/4124330493/
4



Twenty million human hours




                             http://www.flickr.com/photos/ableman/2171326385/
5
    150 billion human hours per year




                              http://www.flickr.com/photos/rvp-cw/6243289302/
6
Using games to fold proteins



        Fold.it players have successfully:
        • Outperformed state-of-the-art protein
          folding algorithms (Cooper, Nature, 2010)
        • Solved a previously intractable crystal
          structure (Khatib, Nat Struct Mol Biol, 2011)
        • Designed an improved protein folding
          algorithm (Khatib, PNAS, 2011)
        • Improved the activity of a de novo
          designed enzyme (Eiben, Nat Biotechnol, 2011)

                         http://fold.it
7
Using games to fold RNAs




              http://eterna.cmu.edu/
8
Using games to align sequences




              http://phylo.cs.mcgill.ca
9
Using games to annotate genes?




              http://genegames.org
10
No good gene-disease annotation database
             Query: Apolipoprotein E




            Alzheimer's disease (AD)
            Lipoprotein glomerulopathy
            Sea-blue histiocyte disease
11
No good gene-disease annotation database
             Query: Apolipoprotein E




            Alzheimer's disease (AD)
            Lipoprotein glomerulopathy
            Sea-blue histiocyte disease
            Hyperlipoproteinemia, type III
            Macular degeneration, age-related
            Myocardial infarction susceptibility
12
No good gene-disease annotation database
              Query: Apolipoprotein E




           ? Alzheimer's disease (AD)
           ? Lipoprotein glomerulopathy
           ? Sea-blue histiocyte disease
             Hyperlipoproteinemia, type III
           ? Macular degeneration, age-related
           ? Myocardial infarction susceptibility
             HIV
             Psoriasis
             Vascular Diseases
13
No good gene-disease annotation database
             Query: Apolipoprotein E




            Alzheimer's disease (AD)
            Neuropsychological Tests
            Cognition Disorders
            Dementia
            Cognition
            Disease Progression
            Cardiovascular Diseases
            Coronary Disease
            Diabetes Mellitus, Type 2
            Memory Disorders
            Memory
            Coronary Artery Disease
            Hypertension
            Mental Status Schedule
            Psychiatric Status Rating Scales
            Hyperlipidemias
            Atrophy
            Dementia, Vascular
            Parkinson Disease
            Brain Injuries
            Myocardial Infarction
            …
            477 diseases!
14
Play Dizeez to annotate gene-disease links
   1. Read the clue (gene)
   2. Click the related disease (only one is “right”)
   3. If it's “right”, you get points
   4. Then on to the next question…
   5. Hurry!
   6. Play to win!
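The game loop above can be sketched in a few lines of Python. This is a minimal illustration, not Dizeez's actual code: the `Question` class, `make_question`, `score_guess`, the number of options, and the 100-point reward are all hypothetical. It also assumes that every click, right or wrong, is logged as a candidate (gene, disease) pair, which is what makes the game useful for collecting annotations.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Question:
    gene: str           # the clue shown to the player
    options: list[str]  # candidate diseases; only one counts as "right"
    answer: str         # the disease the game treats as correct for this gene


def make_question(gene, linked_disease, distractors, n_options=4, rng=random):
    """Build one round: the linked disease plus randomly drawn distractor diseases."""
    pool = [d for d in distractors if d != linked_disease]
    options = rng.sample(pool, n_options - 1) + [linked_disease]
    rng.shuffle(options)
    return Question(gene, options, linked_disease)


def score_guess(question, guess, points_per_hit=100):
    """Return (points, proposed link); wrong guesses score 0 but are still logged."""
    if guess not in question.options:
        raise ValueError(f"{guess!r} was not one of the options")
    points = points_per_hit if guess == question.answer else 0
    return points, (question.gene, guess)
```

Passing a seeded `random.Random` as `rng` makes rounds reproducible for testing.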
15
Dizeez players seem pretty smart…

  In total (since Dec 2011):
  • 207 unique gamers
  • 1045 games played
  • 8525 guesses

# Occurrences   Gene    Disease             PubMed   OMIM   PharmGKB   Gene Wiki

      7         GAST    gastrinoma
      7         RBP3    retinoblastoma
      7         SSX1    synovial sarcoma
      6         TG      Graves' disease
      6         CRYGC   Cataract
      6         SOX8    mental retardation
      6         WRN     Werner syndrome
      6         ABL1    leukemia
      6         MLL3    leukemia
      6         SNAI2   breast carcinoma
16
Dizeez players seem pretty smart…

  In total (since Dec 2011):
  • 207 unique gamers
  • 1045 games played
  • 8525 guesses

# Occurrences   Gene    Disease             PubMed   OMIM   PharmGKB   Gene Wiki

      5         MECOM   sarcoma
      4         ATF7    cancer
      3         ABCB5   acute myeloid leukemia
      3         SART1   glioblastoma
      3         NCK1    leukemia
      3         NEK1    cancer
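A table like the ones above can be produced by tallying logged guesses. The sketch assumes each guess is stored as a `(gene, disease)` pair and that "# Occurrences" counts how often a pair was chosen; the slides do not say exactly which guesses were counted, so `tally_links`, `min_count`, and the optional `known` filter (for setting aside pairs already present in a reference annotation set) are illustrative assumptions.

```python
from collections import Counter


def tally_links(guesses, min_count=3, known=frozenset()):
    """Count (gene, disease) guesses and return the frequent ones, most common first.

    guesses: iterable of (gene, disease) pairs, one per player click (assumed log format)
    known:   pairs to leave out, e.g. links already in a reference annotation set
    """
    counts = Counter((gene.strip(), disease.strip()) for gene, disease in guesses)
    return [
        (n, gene, disease)
        for (gene, disease), n in counts.most_common()
        if n >= min_count and (gene, disease) not in known
    ]
```

Each returned row has the same shape as the slide's first three columns: occurrence count, gene, disease.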
17
Using games to predict phenotype from genotype?




                                  The Cure




               http://genegames.org
18
Classification problems in genome biology

   Training data: cancer vs. normal samples (100s of samples, 100,000s of features)
   Find patterns: SVM, neural networks, Naïve Bayes, KNN, …
   Classify new samples: cancer or normal
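Any of the listed methods fits this samples-by-features setup. As a small illustration, here is k-nearest neighbors (KNN) in plain Python on a toy matrix; `knn_classify` and the default `k=3` are hypothetical choices, and real expression data with 100,000s of features would call for a library implementation and feature scaling.

```python
import math
from collections import Counter


def knn_classify(train_X, train_y, x, k=3):
    """Label sample x by majority vote among its k nearest training samples (Euclidean)."""
    # Pair every training sample's distance to x with its label, nearest first.
    dists = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

Choosing an odd `k` avoids ties in a two-class (cancer/normal) problem.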
19
Random forests

   Training data: cancer vs. normal samples (100s of samples, 100,000s of features)
   Sample a subset of cases and features, then train a decision tree
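The sample-then-train step can be sketched in plain Python. To stay short, each "tree" here is a depth-1 decision stump rather than a full decision tree, and features are sampled once per tree as the slide describes (Breiman's random forests instead resample candidate features at every split). `train_stump`, `train_forest`, `predict`, and their defaults are illustrative, not taken from the talk.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(X, y, features):
    """Fit a depth-1 tree: the (feature, threshold) split with the fewest training errors."""
    default = majority(y)
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            # Each side predicts its majority label; an empty side falls back to the overall majority.
            l_lab = majority(left) if left else default
            r_lab = majority(right) if right else default
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab


def train_forest(X, y, n_trees=25, n_features=2, rng=None):
    """Each tree sees a bootstrap sample of cases and a random subset of features."""
    rng = rng or random.Random()
    n_cols = len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]         # bootstrap the cases
        feats = rng.sample(range(n_cols), min(n_features, n_cols))  # random feature subset
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Classify a new sample by majority vote over all trees."""
    return majority([tree(row) for tree in forest])
```

Because every tree sees different cases and features, individual trees disagree; the majority vote in `predict` is what turns them into a single classifier for new samples.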
20
Random forests

   Training data: cancer vs. normal samples (100s of samples, 100,000s of features)
21
Random forests

   Training data: cancer vs. normal samples (100s of samples, 100,000s of features)
   Classify new samples: cancer or normal
   How can we incorporate biological knowledge?
22
Network-guided forests




                         Dutkowski & Ideker (2011). PLoS Computational Biology
23
Network-guided forests

   Training data: cancer vs. normal samples (100s of samples, 100,000s of features)
   Sample features by protein-protein interaction (PPI) network, then train a decision tree
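One simple way to "sample features by PPI network" is to grow a connected set of genes outward from a random seed gene, so that each tree sees genes whose products interact. This sketch assumes the network is a dict mapping each gene to the set of genes it interacts with; `sample_connected_features` is hypothetical and is not necessarily the procedure used by Dutkowski & Ideker (2011).

```python
import random


def sample_connected_features(ppi, n_features, rng=None):
    """Pick up to n_features genes forming a connected subgraph of the PPI network.

    ppi: dict mapping each gene to the set of genes it interacts with (assumed format).
    """
    rng = rng or random.Random()
    seed = rng.choice(sorted(ppi))  # sort so a fixed rng seed gives reproducible picks
    chosen = [seed]
    frontier = set(ppi[seed])       # copy, so the network itself is never mutated
    while len(chosen) < n_features and frontier:
        nxt = rng.choice(sorted(frontier))
        chosen.append(nxt)
        frontier |= ppi.get(nxt, set())
        frontier -= set(chosen)
    return chosen                   # shorter than n_features if the component runs out
```

The returned gene list would play the role of the random feature subset when training one tree.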
24
Human-guided forests

   Training data: cancer vs. normal samples (100s of samples, 100,000s of features)
   Sample features by human intelligence, then train a decision tree
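In the human-guided variant, the random feature draw is replaced by gene sets chosen by players. A minimal sketch of that substitution, assuming each player's picks arrive as a list of gene symbols and that a hypothetical `feature_index` maps symbols to columns of the expression matrix; how The Cure actually records and uses player selections is not described on these slides.

```python
def player_feature_subsets(player_picks, feature_index, min_size=2):
    """Turn each player's chosen genes into a feature subset (sorted column indices).

    player_picks:  list of gene-symbol lists, one per player (hypothetical input)
    feature_index: dict mapping gene symbol -> column in the expression matrix
    Genes missing from the matrix are dropped, duplicates are merged, and subsets
    smaller than min_size are skipped.
    """
    subsets = []
    for picks in player_picks:
        cols = sorted({feature_index[g] for g in picks if g in feature_index})
        if len(cols) >= min_size:
            subsets.append(cols)
    return subsets
```

Each returned subset would then stand in for a randomly sampled feature subset when training one tree.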
25-30
The Cure: Genomic predictors for disease
31
Human-guided forests

   Classify new samples: cancer or normal
32
“Critical Assessment”-style challenge




      Will this work? Check our blog after October 15.
33
Collaborators
   Doug Howe, ZFIN
   John Hogenesch, U Penn
   Jon Huss, GNF
   Luca de Alfaro, UCSC
   Angel Pizarro, U Penn
   Faramarz Valafar, SDSU
   Pierre Lindenbaum, Fondation Jean Dausset
   Michael Martone, Rush
   Konrad Koehler, Karo Bio
   Warren Kibbe, Simon Lim, Northwestern
   Many Wikipedia editors
      WP:MCB Project

Group members
   Ben Good
   Max Nanis
   Salvatore Loguercio
   Chunlei Wu
   Ian Macleod

Recruiting graduate students in quantitative biology! See http://education.scripps.edu/

Contact
   http://sulab.org
   asu@scripps.edu
   @andrewsu
   +Andrew Su
   @genegame

Funding and Support
   (BioGPS: GM83924, Gene Wiki: GM089820)

Speaker notes

  1. Empire State Building
  2. One of the seven wonders of the modern world
  3. Except for a bit of personal pleasure, that expended effort has no societal value. Over the last ~decade, “serious games” have attempted to harness this resource:
     • Training and education
     • Health and fitness
  4. Question: how can we incorporate biological knowledge into the feature selection process?