Using computational predictions to improve literature-based Gene Ontology annotations

Julie Park, Ph.D.
Saccharomyces Genome Database • http://www.yeastgenome.org/
Department of Genetics • Stanford University School of Medicine
Attaining curation nirvana…



•  Curation efficiency

•  Annotation consistency

•  Data accuracy
…is not easy!


Annotation errors
1.  Mistakes in capturing the annotation
2.  Outdated information
3.  Missing annotations

How can you find these errors?
Flavors of GO annotations
1.  Literature-based – “Manual”
    Individually assigned by biocurators based on the published literature

2.  Computationally predicted – “Computational”
    Automatically generated by in silico methods such as protein signatures or computational algorithms

Sources of computational predictions in SGD
•  InterPro [1]
•  Swiss-Prot Keywords (SPKW) [2]
•  YeastFunc [3]
•  BioPixie [4]

1.  Camon et al. (2003) Genome Res. 13:662-72
2.  http://www.ebi.ac.uk/GOA/Swiss-ProtKeyword2GO.html
3.  Tian et al. (2008) Genome Biol. 9 Suppl 1:S7
4.  Huttenhower and Troyanskaya (2008) Bioinformatics 24:i330-8
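The two flavors can be separated mechanically in a GO annotation file. A minimal Python sketch, assuming the tab-separated GAF layout used by files such as gene_association.sgd (gene ID in column 2, qualifier in column 4, GO ID in column 5, evidence code in column 7, aspect in column 9) and assuming that computational predictions are exactly the annotations carrying the IEA (Inferred from Electronic Annotation) evidence code. The function name `split_annotations` and the example rows are hypothetical.

```python
from collections import defaultdict

# Assumption: computational predictions are the IEA-coded annotations;
# every other evidence code is treated as literature-based ("manual").
COMPUTATIONAL_CODES = {"IEA"}


def split_annotations(lines):
    """Split GAF lines into manual and computational annotation sets.

    Returns two dicts, each mapping (gene_id, aspect) -> set of GO IDs.
    """
    manual = defaultdict(set)
    computational = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and GAF header comments
        cols = line.rstrip("\n").split("\t")
        gene_id, qualifier, go_id = cols[1], cols[3], cols[4]
        evidence, aspect = cols[6], cols[8]
        if "NOT" in qualifier.split("|"):
            continue  # this sketch ignores negated annotations
        target = computational if evidence in COMPUTATIONAL_CODES else manual
        target[(gene_id, aspect)].add(go_id)
    return manual, computational


# Two made-up rows for one gene (15 tab-separated columns each)
rows = [
    "! header comment",
    "SGD\tS000000001\tGENE1\t\tGO:0005634\tREF1\tIDA\t\tC\t\t\tgene\ttaxon:4932\t20091001\tSGD",
    "SGD\tS000000001\tGENE1\t\tGO:0005737\tREF2\tIEA\t\tC\t\t\tgene\ttaxon:4932\t20091001\tSGD",
]
manual, computational = split_annotations(rows)
print(dict(manual))         # {('S000000001', 'C'): {'GO:0005634'}}
print(dict(computational))  # {('S000000001', 'C'): {'GO:0005737'}}
```

Keying by gene and GO aspect keeps manual and computational terms comparable within the same branch of the ontology (process, function, or component).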
Is it possible to take advantage of the strengths of computational predictions and leverage these annotations to improve manual ones?
  
CvManGO:
Computational vs. Manual GO Annotations

[Figure: manual (M) and computational (C) GO annotations compared; ✓ marks agreement, ✗ marks a discrepancy]
[Figure: a manual (M) annotation and a computational (C) prediction that disagree (✗)]

Do discrepancies between a literature-based annotation and a computational prediction indicate that the manual annotation needs to be updated?
OK (✓)
•  M = C
•  C is a more general (parent) term of M

Discrepancies (✗)
•  M is a more general (parent) term of C
•  No parent-child relationship between terms
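The pair classification above can be expressed as an ancestor lookup in the GO graph. A minimal sketch, assuming the ontology is supplied as a hypothetical `parents` dict (term to set of direct parents, with all relationship types treated alike) and reading "parent-child relationship" as any ancestor/descendant path rather than only a direct edge. The labels `ok`, `shallow`, and `mismatch` follow the class names used later in the talk; the term names are invented.

```python
from collections import deque


def ancestors(term, parents):
    """Return every term reachable upward from `term`, excluding `term`."""
    seen = set()
    queue = deque(parents.get(term, ()))
    while queue:
        t = queue.popleft()
        if t not in seen:
            seen.add(t)
            queue.extend(parents.get(t, ()))
    return seen


def classify_pair(m, c, parents):
    """Classify a manual term `m` against a computational term `c`."""
    if m == c or c in ancestors(m, parents):
        return "ok"        # M = C, or C is more general than M
    if m in ancestors(c, parents):
        return "shallow"   # M is more general than C: possible refinement
    return "mismatch"      # no parent-child relationship between M and C


# Toy ontology with hypothetical term names: root -> mid -> leaf, root -> other
parents = {"mid": {"root"}, "leaf": {"mid"}, "other": {"root"}}
for m, c in [("leaf", "leaf"), ("leaf", "root"), ("mid", "leaf"), ("leaf", "other")]:
    print(m, c, classify_pair(m, c, parents))
# prints ok, ok, shallow, mismatch for the four pairs
```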
Counts from the October 2009 gene_association.sgd file (6353 total genes):
•  Genes with discrepant annotation pairs: 4379 genes
•  Genes reviewed: 336 genes
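Moving from term pairs to genes requires a rule for when a gene counts as flagged. The slide does not spell that rule out, so the sketch below assumes a gene is flagged when at least one of its manual/computational pairs is a discrepancy ("shallow" or "mismatch"), and that genes without any computational prediction are never flagged. `flag_genes` and its input format are hypothetical.

```python
DISCREPANT = {"shallow", "mismatch"}


def flag_genes(pair_labels):
    """Return the genes with at least one discrepant M/C pair.

    `pair_labels` maps a gene ID to the list of labels ("ok", "shallow",
    "mismatch") obtained by comparing its manual and computational terms.
    Genes with no computational prediction have an empty list.
    """
    return {gene for gene, labels in pair_labels.items()
            if any(label in DISCREPANT for label in labels)}


# Hypothetical input for four genes
labels = {
    "GENE1": ["ok", "shallow"],
    "GENE2": ["ok"],
    "GENE3": ["mismatch"],
    "GENE4": [],  # no computational prediction available
}
print(sorted(flag_genes(labels)))  # ['GENE1', 'GENE3']
```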
  
Discrepancies can identify genes that need updating

[Bar chart: percentage of total genes reviewed that were "updatable" or needed "no change", for control genes and for genes flagged with discrepant annotation pairs]
Extrapolating to the entire genome

[Pie chart, share of all genes:]
•  Flagged for review, resulting in potential improvement: 53%
•  Flagged for review, no change needed: 16%
•  Not flagged for review (no discrepancies with computational predictions): 16%
•  No computational prediction available: 15%

Still requires reviewing 4379/6353 genes. Can we narrow this down?
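The pie chart can be cross-checked against the gene counts from the October 2009 file: the two "flagged" slices should add up to roughly 4379/6353 of the genes. A short check using only numbers from the slides; the percentages are rounded, so any derived gene count is approximate.

```python
TOTAL_GENES = 6353    # genes in the October 2009 gene_association.sgd file
FLAGGED_GENES = 4379  # genes with discrepant annotation pairs

# Rounded pie-chart slices, as percentages of all genes
slices = {
    "flagged: potential improvement": 53,
    "flagged: no change needed": 16,
    "not flagged: no discrepancies": 16,
    "no computational prediction": 15,
}
assert sum(slices.values()) == 100

flagged_pct = 100 * FLAGGED_GENES / TOTAL_GENES
flagged_slices = (slices["flagged: potential improvement"]
                  + slices["flagged: no change needed"])
print(f"{flagged_pct:.1f}% flagged vs. {flagged_slices}% in the chart")
# 68.9% flagged vs. 69% in the chart

# Approximate number of genes in the "potential improvement" slice
improvable = round(TOTAL_GENES * slices["flagged: potential improvement"] / 100)
print(improvable)  # 3367
```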
  
Factoring in the type of update

[Pie chart, share of all genes:]
•  Flagged for review, resulting in potential improvement (add novel annotation): 20%
•  Flagged for review, resulting in potential improvement (refinement and removal only): 33%
•  Flagged for review, no change needed: 16%
•  Not flagged for review (no discrepancies with computational predictions): 16%
•  No computational prediction available: 15%
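Splitting the 53% "potential improvement" slice by type of update shows how much of the review load each type represents. Again this uses only the rounded percentages and counts from the slides, so the gene counts are approximate, not measured values.

```python
TOTAL_GENES = 6353
FLAGGED_GENES = 4379

# Rounded slices of the "potential improvement" group, in % of all genes
by_update_type = {"add novel annotation": 20, "refinement and removal only": 33}
assert sum(by_update_type.values()) == 53  # matches the 53% slice of the genome chart

approx_genes = {t: round(TOTAL_GENES * pct / 100) for t, pct in by_update_type.items()}
for update_type, genes in approx_genes.items():
    share_of_flagged = genes / FLAGGED_GENES
    print(f"{update_type}: ~{genes} genes ({share_of_flagged:.0%} of flagged genes)")
# add novel annotation: ~1271 genes (29% of flagged genes)
# refinement and removal only: ~2096 genes (48% of flagged genes)
```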
Attributes of flagged genes

What are factors that enrich for genes missing annotations?
•  Type of discrepancy
•  GO aspect
•  Amount of literature for a gene
•  Source of the computational prediction
•  Number of computational sources with discrepancies
Analysis by Class of Discrepancies

•  Shallow class (M is a more general term than C): 78.8% updatable
•  Mismatch class (no parent-child relationship between M and C): 59.2% updatable
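The "% updatable" figure for each class is the share of reviewed genes in that class whose annotations could be improved. A minimal sketch of that tally, assuming each review outcome is recorded as a hypothetical `(discrepancy_class, updatable)` pair; the records below are invented for illustration and are not the SGD review data.

```python
from collections import Counter


def percent_updatable(reviews):
    """Return {class: % of reviewed genes marked updatable}.

    `reviews` is an iterable of (discrepancy_class, updatable) tuples,
    one per reviewed gene.
    """
    total, updatable = Counter(), Counter()
    for cls, is_updatable in reviews:
        total[cls] += 1
        if is_updatable:
            updatable[cls] += 1
    return {cls: 100 * updatable[cls] / total[cls] for cls in total}


# Invented review outcomes, for illustration only
reviews = [("shallow", True), ("shallow", True), ("shallow", False),
           ("mismatch", True), ("mismatch", False)]
for cls, pct in sorted(percent_updatable(reviews).items()):
    print(f"{cls}: {pct:.1f}%")
# mismatch: 50.0%
# shallow: 66.7%
```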
Types of annotation updates by class

[Figure: for the Mismatch and Shallow classes, number of genes needing annotation refinement, annotation removal, and/or novel annotation addition]
Summary & Conclusions

•  Majority of S. cerevisiae literature-based GO annotations are good

•  Comparing manual annotations with computational predictions can identify genes
   whose annotations need updating

•  Additional work needs to be done to pinpoint these annotations
   and genes
It works, but there is still work to do!
Future plans
•  Identify predictive features of genes that need updating
 -  Are there specific GO terms used for manual curation more likely to be updated?
 -  Do specific computational predictions indicate a GO term should be updated?
 -  Examine node distance between GO terms used for computational and
    literature-based annotations
 -  Examine contribution of annotation date and new publications
 -  A combination of or all of the above?

•  Evaluate the accuracy of computational predictions for S. cerevisiae

•  Expand to evaluate annotations made based on orthology
 -  Annotations from GOC PAINT project


•  Develop a pipeline for curation prioritization at SGD

•  Extend to other annotation projects
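One of the planned features, node distance between the GO terms used for computational and literature-based annotations, could be measured in several ways. A minimal sketch of one option, assuming distance means the number of edges on the shortest path between the two terms when edge direction is ignored, over a hypothetical `parents` dict (term to set of direct parents). The function name and term names are invented.

```python
from collections import deque


def node_distance(a, b, parents):
    """Shortest number of edges between terms `a` and `b`, ignoring direction.

    Returns None if the two terms are not connected.
    """
    # Build an undirected adjacency list from the child -> parents map
    adj = {}
    for child, ps in parents.items():
        for p in ps:
            adj.setdefault(child, set()).add(p)
            adj.setdefault(p, set()).add(child)
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:  # breadth-first search finds the shortest path first
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Toy ontology: root -> mid -> leaf, root -> other
parents = {"mid": {"root"}, "leaf": {"mid"}, "other": {"root"}}
print(node_distance("leaf", "other", parents))  # 3
```

An undirected path can also run through a shared descendant; a distance measured only through the lowest common ancestor would be an equally reasonable choice.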
GO Consortium
http://www.geneontology.org/

Saccharomyces Genome Database staff
@yeastgenome    http://on.fb.me/ksgskb
sgd-helpdesk@lists.stanford.edu

Supported by NIH, National Human Genome Research Institute