Musite: Prediction of Protein Phosphorylation Sites

               Jianjiong Gao
          University of Missouri, Columbia, Missouri
           http://musite.sourceforge.net/
Background:
       Protein Phosphorylation
Protein phosphorylation is one of the most
important post-translational modifications.
  It has been estimated that up to 50% of proteins are
  phosphorylated in some cellular state
  Abnormal phosphorylation is a cause or
  consequence of many diseases
    Cancer
    Diabetes
    Parkinson’s
    Hepatitis B
    …
Background:
       Protein Phosphorylation
Phosphorylation-dephosphorylation is a
biochemical switch system regulating
various cellular processes.
Catalyzed by various specific protein
kinases.
[Diagram: Kinase (OFF -> ON), Phosphatase (ON -> OFF)]
Phosphorylation Site Prediction
         Problem Formulation



Phosphorylation site: a phosphorylated amino acid
in a protein (determined by protein sequence)
General phosphorylation site prediction: to predict
whether an amino acid can be phosphorylated
Kinase-specific phosphorylation site prediction: to
predict whether an amino acid can be
phosphorylated by a specific kinase
Based on protein sequence only
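Since the prediction is based on protein sequence only, the input to any such predictor can be pictured as a list of candidate residues with their local sequence context. The sketch below is only an illustration: the function name `candidate_sites`, the flank width, and the `-` padding character are assumptions, not taken from Musite.

```python
from typing import Iterator, Tuple


def candidate_sites(sequence: str, residues: str = "STY", flank: int = 6,
                    pad: str = "-") -> Iterator[Tuple[int, str, str]]:
    """Yield (1-based position, residue, local window) for each candidate residue.

    The window has width 2 * flank + 1 and is centered on the candidate.
    Positions beyond the protein termini are filled with `pad`.
    """
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    for i, aa in enumerate(seq):
        if aa in residues:
            # Residue i sits at index i + flank in the padded string,
            # so a window centered on it starts at index i.
            yield i + 1, aa, padded[i:i + 2 * flank + 1]


# Hypothetical toy sequence, for illustration only.
sites = list(candidate_sites("MKRSPTAY", flank=2))
for position, residue, window in sites:
    print(position, residue, window)
```

General prediction would score every window produced this way; kinase-specific prediction would score the same windows with a model trained on substrates of one kinase.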
Limitations of Current Methods

Current prediction tools have
limitations when applied to whole
proteomes
 Prediction accuracy could be improved
 Most were released as web servers and
 restrict the data that users can upload
 Training data were out of date
 Stringency adjustment was not fully
 supported
Our tool Musite is unique

Novel method with better accuracy
First open-source tool in the field that meets the
OSI Open Standards Requirement
Standalone program designed for proteome-scale
prediction
Supports both general and kinase-specific
phosphorylation site prediction
Supports customized model training
Supports continuous stringency adjustment
Phosphorylation Site Prediction
                 Flowchart
Data collection from high-quality sources,
such as UniProt/Swiss-Prot, Phospho.ELM,
PhosphoPep, and PhosPhAt
Non-redundant datasets built by BLASTClust
Phosphorylation sites (positive set) and non-phosphorylation sites (negative set)
Feature extraction
  KNN scores
  Disorder scores
  Amino acid frequencies
Training data: features from the positive set and features from the negative set
Bootstrap: bootstrap sample 1 ... bootstrap sample m
Training: classifier 1 ... classifier m
Aggregating: phosphorylation prediction model
Specificity estimation from control data
Making predictions on new data
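One way to read the "Specificity estimation" box together with continuous stringency adjustment: scores of control (non-phosphorylated) sites give a score cutoff for any requested specificity. The sketch below assumes exactly that and nothing more; the function names and the empirical-quantile rule are illustrative assumptions, not Musite's documented procedure.

```python
import math
from typing import Sequence


def threshold_for_specificity(control_scores: Sequence[float],
                              specificity: float) -> float:
    """Return a cutoff t such that at least `specificity` of the control
    scores are <= t. Sites scoring strictly above t are called positive."""
    if not control_scores:
        raise ValueError("control_scores must not be empty")
    if not 0.0 <= specificity <= 1.0:
        raise ValueError("specificity must be between 0 and 1")
    ordered = sorted(control_scores)
    # Number of control sites that must fall at or below the cutoff.
    # The small epsilon guards against floating-point round-up.
    k = math.ceil(specificity * len(ordered) - 1e-9)
    if k <= 0:
        return float("-inf")  # lowest stringency: every site is called positive
    return ordered[k - 1]


def estimated_specificity(control_scores: Sequence[float],
                          threshold: float) -> float:
    """Fraction of control sites that would be called negative at this cutoff."""
    return sum(s <= threshold for s in control_scores) / len(control_scores)


# Hypothetical control scores, for illustration only.
controls = [0.2, 0.9, 0.4, 0.1]
cutoff = threshold_for_specificity(controls, 0.75)
print(cutoff, estimated_specificity(controls, cutoff))
```

Because the cutoff is derived from a requested specificity rather than fixed in advance, the user can move it continuously between lenient and stringent settings.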
Phosphorylation Site Prediction
                 Data Extraction
Phosphorylation Site Prediction
                 Feature Extraction
KNN Features
        Motivation
Rationale for using KNN features: local
sequence clusters exist around
phosphorylation sites, since
  Each phosphorylation site is a substrate of a specific
  protein kinase
  Substrates of the same kinase or kinase family
  usually share similar patterns in their local sequences
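The idea behind a KNN feature can be sketched as follows: for a query window, find its k most similar labeled windows and report the fraction that are known phosphosites. The mismatch-fraction distance used here is a placeholder assumption chosen to keep the example self-contained; the slides do not state which sequence distance Musite uses.

```python
from typing import List, Tuple


def window_distance(a: str, b: str) -> float:
    """Fraction of mismatched positions between two equal-length windows.

    Placeholder metric (an assumption); a substitution-matrix-based
    distance could be swapped in here.
    """
    if len(a) != len(b):
        raise ValueError("windows must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def knn_score(query: str, labeled: List[Tuple[str, bool]], k: int) -> float:
    """Fraction of the k nearest labeled windows that are phosphosites."""
    if not 1 <= k <= len(labeled):
        raise ValueError("k must be between 1 and the number of labeled windows")
    ranked = sorted(labeled, key=lambda item: window_distance(query, item[0]))
    return sum(is_phospho for _, is_phospho in ranked[:k]) / k


# Hypothetical toy windows: True marks a known phosphosite window.
labeled = [("RRASP", True), ("RKASP", True), ("GGAGG", False), ("LLALL", False)]
print(knn_score("RRASA", labeled, k=2))
print(knn_score("GGAGA", labeled, k=1))
```

Computing the score for several values of k, for example as fixed percentages of the sample size, gives several KNN features per site.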
KNN Features
         Result
Overall, phosphosites
have larger KNN scores
than non-phosphosites
Average KNN scores
  0.7~0.8 for phosphosites
  ≈0.5 for non-phosphosites
[Figure (A): Boxplot of KNN features (Human Ser/Thr). x-axis: size of nearest neighbors (% of sample size): 0.25, 0.5, 1, 2, 4; y-axis: KNN score (0 to 1); groups: Phospho, Nonphospho]
Disorder Features
        Concept & Rationale

Disordered region (structure)
 Some parts of a protein have a rigid structure,
 such as α-helix and β-sheet.
 Other parts, disordered regions, do not have
 well-defined conformations
 The conformational flexibility of disordered
 regions may facilitate protein phosphorylation
 [Dunker, 2008]: protein phosphorylation sites
 are frequently located within disordered regions
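A disorder feature for a site can be pictured as the average predicted disorder around it. The sketch below assumes per-residue disorder scores in [0, 1] are already available from some external disorder predictor; the function name and the window half-widths are illustrative assumptions, not Musite's actual settings.

```python
from typing import List, Sequence


def windowed_disorder(scores: Sequence[float], site_index: int,
                      half_widths: Sequence[int] = (0, 2, 5)) -> List[float]:
    """Mean disorder score in windows of several half-widths centered on a site.

    `scores` holds one precomputed disorder score per residue (assumed input).
    Windows are truncated at the protein termini.
    """
    if not 0 <= site_index < len(scores):
        raise IndexError("site_index is outside the protein")
    features = []
    for h in half_widths:
        lo = max(0, site_index - h)
        hi = min(len(scores), site_index + h + 1)
        window = scores[lo:hi]
        features.append(sum(window) / len(window))
    return features


# Hypothetical per-residue disorder scores, for illustration only.
disorder = [0.0, 0.25, 1.0, 0.75, 0.5]
features = windowed_disorder(disorder, site_index=2, half_widths=(0, 1))
print(features)
```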
Disorder Features
             Result
For phosphosites
  Occurrence increases exponentially
  when disorder score increases
For non-phosphosites
  Significantly different distribution
Disorder score > 0.5
  Phosphosites: ~91%
  Non-phosphosites: ~55%
Phosphosites are significantly
over-represented in disordered
regions
[Figure: Histogram of disorder features (Human Ser/Thr). Panel (A): Phospho-S/T in H. sapiens; panel (B): Non-phospho-S/T in H. sapiens. x-axis: disorder score (0 to 1); y-axis: occurrence]
Amino Acid Frequencies
                              Result
[Figure: Log2(ratio of frequency) per amino acid, in the order P R D E S K G A Q N V T H L M I F Y W C. Series: H. sapiens (S/T), M. musculus (S/T), D. melanogaster (S/T), C. elegans (S/T), S. cerevisiae (S/T), A. thaliana (S/T)]

                   P, R, D, E, S, K, and G are enriched around
                   phosphosites
                   C, W, Y, F, I, M, L, H, T, and V are depleted
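Enrichment and depletion of this kind can be computed as a log2 ratio of amino acid frequencies in windows around phosphosites versus windows around non-phosphosites. The sketch below assumes that this is the ratio plotted; the pseudocount and the function names are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, Iterable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequencies(windows: Iterable[str], pseudocount: float = 1.0) -> Dict[str, float]:
    """Relative frequency of each amino acid across all windows.

    Characters outside the 20 standard amino acids (e.g. padding) are ignored.
    The pseudocount (an assumption) avoids division by zero for unseen residues.
    """
    counts = Counter(aa for w in windows for aa in w if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def log2_frequency_ratio(positive_windows: Iterable[str],
                         negative_windows: Iterable[str],
                         pseudocount: float = 1.0) -> Dict[str, float]:
    """log2(frequency around phosphosites / frequency around non-phosphosites)."""
    pos = frequencies(positive_windows, pseudocount)
    neg = frequencies(negative_windows, pseudocount)
    return {aa: math.log2(pos[aa] / neg[aa]) for aa in AMINO_ACIDS}


# Hypothetical toy windows, for illustration only.
ratios = log2_frequency_ratio(["PPRS"], ["LLCS"])
enriched = sorted(aa for aa, r in ratios.items() if r > 0)
print(enriched)
```

A positive value means the residue is enriched around phosphosites, a negative value means it is depleted.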
Phosphorylation Site Prediction
                 Classifier Training
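The bootstrap, train, and aggregate steps of the flowchart follow the general pattern of bootstrap aggregating. The sketch below shows that pattern only: the nearest-centroid base learner is a toy stand-in (the slides do not name the classifier), and resampling within each class is an assumption made so that every bootstrap sample contains both classes.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def train_centroid_scorer(X: List[Vector], y: List[int]) -> Scorer:
    """Toy base learner: higher score means closer to the positive centroid."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a: Vector, b: Vector) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    pos, neg = centroid(1), centroid(0)
    return lambda x: dist2(x, neg) - dist2(x, pos)


def bootstrap_sample(X: List[Vector], y: List[int],
                     rng: random.Random) -> Tuple[List[Vector], List[int]]:
    """Sample with replacement within each class (assumption: stratified)."""
    idx: List[int] = []
    for label in (0, 1):
        members = [i for i, t in enumerate(y) if t == label]
        idx += [rng.choice(members) for _ in members]
    return [X[i] for i in idx], [y[i] for i in idx]


def train_bagged_model(X: List[Vector], y: List[int],
                       m: int = 10, seed: int = 0) -> Scorer:
    """Train m classifiers on m bootstrap samples and average their scores."""
    rng = random.Random(seed)
    scorers = [train_centroid_scorer(*bootstrap_sample(X, y, rng)) for _ in range(m)]
    return lambda x: sum(s(x) for s in scorers) / len(scorers)


# Hypothetical feature vectors (e.g. a KNN score and a disorder score per site).
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.95], [0.2, 0.5], [0.3, 0.4], [0.1, 0.45]]
y = [1, 1, 1, 0, 0, 0]
model = train_bagged_model(X, y, m=5, seed=1)
print(model([0.85, 0.9]) > 0, model([0.2, 0.45]) > 0)
```

Averaging over classifiers trained on different resamples makes the final score less sensitive to any single training sample.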
Results
        Trained Models
General Prediction
  Human Ser/Thr
  Human Tyr
  Mouse Ser/Thr
  Mouse Tyr
  Fruit fly Ser/Thr
  Worm Ser/Thr
  Yeast Ser/Thr
  Arabidopsis Ser/Thr
Kinase-Specific Prediction
  ATM
  CDK/CDK1/CDK2
  CK1/CK2
  MAPK1/MAPK3
  PKA
  PKB
  PKC
  Src
Results
                       Cross validation
[Figure: ROC curves from cross-validation. x-axis: 1 - Specificity (0 to 1); y-axis: Sensitivity (0 to 1); inset zooms in on 1 - Specificity from 0 to 0.1. Curves: C. elegans (S/T), A. thaliana (S/T), H. sapiens (S/T), M. musculus (S/T), S. cerevisiae (S/T), D. melanogaster (S/T), M. musculus (Y), H. sapiens (Y), Random guess]
Results
             Comparison to other tools
[Figure: ROC curves comparing Musite, Scan-x, DISPHOS, and NetPhos. x-axis: 1 - Specificity (0 to 1); y-axis: Sensitivity (0 to 1); inset zooms in on 1 - Specificity from 0 to 0.1]
Phosphorylation Site Prediction
        Software Implementation-Musite

Open Source
  License: GNU General Public License (GPL)
  http://musite.sourceforge.net/
Stand-alone application
  Based on Java
  Supports Windows, Linux, and Mac OS X
A web server is also being developed
  http://musite.net/
Implementation
   User Interface
Implementation
      Customized Model Training

A unique utility for users to train
prediction models from their own data
  Take advantage of the latest data
  Train disease-specific models
  Train organ-specific models
  Integrate into experimental procedure in an
  iterative way
Summary

Musite predicts general and kinase-specific
phosphosites with better accuracy


Musite is an open-source standalone program
capable of performing proteome-wide
predictions
Acknowledgements

Dr. Dong Xu (University of Missouri)
Dr. Jay Thelen (University of Missouri)
Dr. Keith Dunker (Indiana University)
Curtis Bollinger (University of Missouri)

Funding
   NSF [# DBI-0604439]
   NIH [# R21/R33 GM078601]

Visit us at
   http://musite.sourceforge.net
   http://musite.net
   Poster R09 at ISMB

Weitere ähnliche Inhalte

Andere mochten auch

The Case For The Sustainable Workplace
The Case For The Sustainable WorkplaceThe Case For The Sustainable Workplace
The Case For The Sustainable WorkplaceStephanie Stine
 
IPad boot camp iste 2013 without videos
IPad boot camp iste 2013 without videosIPad boot camp iste 2013 without videos
IPad boot camp iste 2013 without videosKevin Amboe
 
Benjamín arditi (democracia postliberal participativa)
Benjamín arditi (democracia postliberal participativa)Benjamín arditi (democracia postliberal participativa)
Benjamín arditi (democracia postliberal participativa)Adolfo Orive
 
Оптимизация интерактивного тестирования с использованием метрики Покрытие кода
Оптимизация интерактивного тестирования с использованием метрики Покрытие кодаОптимизация интерактивного тестирования с использованием метрики Покрытие кода
Оптимизация интерактивного тестирования с использованием метрики Покрытие кодаSPB SQA Group
 
Article Fogo Glissement Caldeira
Article Fogo Glissement CaldeiraArticle Fogo Glissement Caldeira
Article Fogo Glissement Caldeiranastydette
 
RefWorks for DEPARTMENT OF FAMILY MEDICINE - Faculty Development
RefWorks for DEPARTMENT OF FAMILY MEDICINE - Faculty Development RefWorks for DEPARTMENT OF FAMILY MEDICINE - Faculty Development
RefWorks for DEPARTMENT OF FAMILY MEDICINE - Faculty Development Naz Torabi
 
Marketing Life Prospective 2012
Marketing Life Prospective 2012Marketing Life Prospective 2012
Marketing Life Prospective 2012Arif Mahmood
 
Edison.powerpoint.106.v2
Edison.powerpoint.106.v2Edison.powerpoint.106.v2
Edison.powerpoint.106.v2aedison
 
Advanced Nutrients thesystemmagalog
Advanced Nutrients thesystemmagalogAdvanced Nutrients thesystemmagalog
Advanced Nutrients thesystemmagalogJean Smith
 
Academic Honesty at Oxford College of Emory University: Fall 2011
Academic Honesty at Oxford College of Emory University: Fall 2011Academic Honesty at Oxford College of Emory University: Fall 2011
Academic Honesty at Oxford College of Emory University: Fall 2011oxfordcollegelibrary
 
Hoe schrijf je een brief?
Hoe schrijf je een brief?Hoe schrijf je een brief?
Hoe schrijf je een brief?CVO-SSH
 
mHealth Insights for Wireless Carrier
mHealth Insights for Wireless CarriermHealth Insights for Wireless Carrier
mHealth Insights for Wireless CarrierKarthik Ethirajan
 
Bibliotheken moeten naar buiten toe
Bibliotheken moeten naar buiten toeBibliotheken moeten naar buiten toe
Bibliotheken moeten naar buiten toeErna Winters
 
Drupal theming intro
Drupal theming introDrupal theming intro
Drupal theming introtlattimore
 

Andere mochten auch (20)

The Case For The Sustainable Workplace
The Case For The Sustainable WorkplaceThe Case For The Sustainable Workplace
The Case For The Sustainable Workplace
 
IPad boot camp iste 2013 without videos
IPad boot camp iste 2013 without videosIPad boot camp iste 2013 without videos
IPad boot camp iste 2013 without videos
 
Benjamín arditi (democracia postliberal participativa)
Benjamín arditi (democracia postliberal participativa)Benjamín arditi (democracia postliberal participativa)
Benjamín arditi (democracia postliberal participativa)
 
Nordic e commerce3
Nordic e commerce3Nordic e commerce3
Nordic e commerce3
 
Limecoconut
LimecoconutLimecoconut
Limecoconut
 
Оптимизация интерактивного тестирования с использованием метрики Покрытие кода
Оптимизация интерактивного тестирования с использованием метрики Покрытие кодаОптимизация интерактивного тестирования с использованием метрики Покрытие кода
Оптимизация интерактивного тестирования с использованием метрики Покрытие кода
 
Article Fogo Glissement Caldeira
Article Fogo Glissement CaldeiraArticle Fogo Glissement Caldeira
Article Fogo Glissement Caldeira
 
RefWorks for DEPARTMENT OF FAMILY MEDICINE - Faculty Development
RefWorks for DEPARTMENT OF FAMILY MEDICINE - Faculty Development RefWorks for DEPARTMENT OF FAMILY MEDICINE - Faculty Development
RefWorks for DEPARTMENT OF FAMILY MEDICINE - Faculty Development
 
Marketing Life Prospective 2012
Marketing Life Prospective 2012Marketing Life Prospective 2012
Marketing Life Prospective 2012
 
Edison.powerpoint.106.v2
Edison.powerpoint.106.v2Edison.powerpoint.106.v2
Edison.powerpoint.106.v2
 
Cheers
CheersCheers
Cheers
 
Advanced Nutrients thesystemmagalog
Advanced Nutrients thesystemmagalogAdvanced Nutrients thesystemmagalog
Advanced Nutrients thesystemmagalog
 
Manager Info Kit
Manager Info KitManager Info Kit
Manager Info Kit
 
Academic Honesty at Oxford College of Emory University: Fall 2011
Academic Honesty at Oxford College of Emory University: Fall 2011Academic Honesty at Oxford College of Emory University: Fall 2011
Academic Honesty at Oxford College of Emory University: Fall 2011
 
Hoe schrijf je een brief?
Hoe schrijf je een brief?Hoe schrijf je een brief?
Hoe schrijf je een brief?
 
mHealth Insights for Wireless Carrier
mHealth Insights for Wireless CarriermHealth Insights for Wireless Carrier
mHealth Insights for Wireless Carrier
 
Single Sign On Social Login
Single Sign On Social LoginSingle Sign On Social Login
Single Sign On Social Login
 
Bibliotheken moeten naar buiten toe
Bibliotheken moeten naar buiten toeBibliotheken moeten naar buiten toe
Bibliotheken moeten naar buiten toe
 
Drupal theming intro
Drupal theming introDrupal theming intro
Drupal theming intro
 
Cadets cat
Cadets catCadets cat
Cadets cat
 

Mehr von BOSC 2010

Mercer bosc2010 microsoft_framework
Mercer bosc2010 microsoft_frameworkMercer bosc2010 microsoft_framework
Mercer bosc2010 microsoft_frameworkBOSC 2010
 
Langmead bosc2010 cloud-genomics
Langmead bosc2010 cloud-genomicsLangmead bosc2010 cloud-genomics
Langmead bosc2010 cloud-genomicsBOSC 2010
 
Schultheiss bosc2010 persistance-web-services
Schultheiss bosc2010 persistance-web-servicesSchultheiss bosc2010 persistance-web-services
Schultheiss bosc2010 persistance-web-servicesBOSC 2010
 
Swertz bosc2010 molgenis
Swertz bosc2010 molgenisSwertz bosc2010 molgenis
Swertz bosc2010 molgenisBOSC 2010
 
Rice bosc2010 emboss
Rice bosc2010 embossRice bosc2010 emboss
Rice bosc2010 embossBOSC 2010
 
Morris bosc2010 evoker
Morris bosc2010 evokerMorris bosc2010 evoker
Morris bosc2010 evokerBOSC 2010
 
Kono bosc2010 pathway_projector
Kono bosc2010 pathway_projectorKono bosc2010 pathway_projector
Kono bosc2010 pathway_projectorBOSC 2010
 
Kanterakis bosc2010 molgenis
Kanterakis bosc2010 molgenisKanterakis bosc2010 molgenis
Kanterakis bosc2010 molgenisBOSC 2010
 
Gautier bosc2010 pythonbioconductor
Gautier bosc2010 pythonbioconductorGautier bosc2010 pythonbioconductor
Gautier bosc2010 pythonbioconductorBOSC 2010
 
Gardler bosc2010 community_developmentattheasf
Gardler bosc2010 community_developmentattheasfGardler bosc2010 community_developmentattheasf
Gardler bosc2010 community_developmentattheasfBOSC 2010
 
Friedberg bosc2010 iprstats
Friedberg bosc2010 iprstatsFriedberg bosc2010 iprstats
Friedberg bosc2010 iprstatsBOSC 2010
 
Fields bosc2010 bio_perl
Fields bosc2010 bio_perlFields bosc2010 bio_perl
Fields bosc2010 bio_perlBOSC 2010
 
Chapman bosc2010 biopython
Chapman bosc2010 biopythonChapman bosc2010 biopython
Chapman bosc2010 biopythonBOSC 2010
 
Bonnal bosc2010 bio_ruby
Bonnal bosc2010 bio_rubyBonnal bosc2010 bio_ruby
Bonnal bosc2010 bio_rubyBOSC 2010
 
Puton bosc2010 bio_python-modules-rna
Puton bosc2010 bio_python-modules-rnaPuton bosc2010 bio_python-modules-rna
Puton bosc2010 bio_python-modules-rnaBOSC 2010
 
Bader bosc2010 cytoweb
Bader bosc2010 cytowebBader bosc2010 cytoweb
Bader bosc2010 cytowebBOSC 2010
 
Talevich bosc2010 bio-phylo
Talevich bosc2010 bio-phyloTalevich bosc2010 bio-phylo
Talevich bosc2010 bio-phyloBOSC 2010
 
Zmasek bosc2010 aptx
Zmasek bosc2010 aptxZmasek bosc2010 aptx
Zmasek bosc2010 aptxBOSC 2010
 
Wilkinson bosc2010 moby-to-sadi
Wilkinson bosc2010 moby-to-sadiWilkinson bosc2010 moby-to-sadi
Wilkinson bosc2010 moby-to-sadiBOSC 2010
 
Taylor bosc2010
Taylor bosc2010Taylor bosc2010
Taylor bosc2010BOSC 2010
 

Mehr von BOSC 2010 (20)

Mercer bosc2010 microsoft_framework
Mercer bosc2010 microsoft_frameworkMercer bosc2010 microsoft_framework
Mercer bosc2010 microsoft_framework
 
Langmead bosc2010 cloud-genomics
Langmead bosc2010 cloud-genomicsLangmead bosc2010 cloud-genomics
Langmead bosc2010 cloud-genomics
 
Schultheiss bosc2010 persistance-web-services
Schultheiss bosc2010 persistance-web-servicesSchultheiss bosc2010 persistance-web-services
Schultheiss bosc2010 persistance-web-services
 
Swertz bosc2010 molgenis
Swertz bosc2010 molgenisSwertz bosc2010 molgenis
Swertz bosc2010 molgenis
 
Rice bosc2010 emboss
Rice bosc2010 embossRice bosc2010 emboss
Rice bosc2010 emboss
 
Morris bosc2010 evoker
Morris bosc2010 evokerMorris bosc2010 evoker
Morris bosc2010 evoker
 
Kono bosc2010 pathway_projector
Kono bosc2010 pathway_projectorKono bosc2010 pathway_projector
Kono bosc2010 pathway_projector
 
Kanterakis bosc2010 molgenis
Kanterakis bosc2010 molgenisKanterakis bosc2010 molgenis
Kanterakis bosc2010 molgenis
 
Gautier bosc2010 pythonbioconductor
Gautier bosc2010 pythonbioconductorGautier bosc2010 pythonbioconductor
Gautier bosc2010 pythonbioconductor
 
Gardler bosc2010 community_developmentattheasf
Gardler bosc2010 community_developmentattheasfGardler bosc2010 community_developmentattheasf
Gardler bosc2010 community_developmentattheasf
 
Friedberg bosc2010 iprstats
Friedberg bosc2010 iprstatsFriedberg bosc2010 iprstats
Friedberg bosc2010 iprstats
 
Fields bosc2010 bio_perl
Fields bosc2010 bio_perlFields bosc2010 bio_perl
Fields bosc2010 bio_perl
 
Chapman bosc2010 biopython
Chapman bosc2010 biopythonChapman bosc2010 biopython
Chapman bosc2010 biopython
 
Bonnal bosc2010 bio_ruby
Bonnal bosc2010 bio_rubyBonnal bosc2010 bio_ruby
Bonnal bosc2010 bio_ruby
 
Puton bosc2010 bio_python-modules-rna
Puton bosc2010 bio_python-modules-rnaPuton bosc2010 bio_python-modules-rna
Puton bosc2010 bio_python-modules-rna
 
Bader bosc2010 cytoweb
Bader bosc2010 cytowebBader bosc2010 cytoweb
Bader bosc2010 cytoweb
 
Talevich bosc2010 bio-phylo
Talevich bosc2010 bio-phyloTalevich bosc2010 bio-phylo
Talevich bosc2010 bio-phylo
 
Zmasek bosc2010 aptx
Zmasek bosc2010 aptxZmasek bosc2010 aptx
Zmasek bosc2010 aptx
 
Wilkinson bosc2010 moby-to-sadi
Wilkinson bosc2010 moby-to-sadiWilkinson bosc2010 moby-to-sadi
Wilkinson bosc2010 moby-to-sadi
 
Taylor bosc2010
Taylor bosc2010Taylor bosc2010
Taylor bosc2010
 

Kürzlich hochgeladen

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.

Gao bosc2010 musite

  • 1. Musite: Prediction of Protein Phosphorylation Sites. Jianjiong Gao, University of Missouri, Columbia, Missouri. http://musite.sourceforge.net/
  • 2. Background: Protein Phosphorylation. Protein phosphorylation is one of the most important post-translational modifications. It has been estimated that up to 50% of proteins are phosphorylated in some cellular state. Abnormal phosphorylation is a cause or consequence of many diseases: cancer, diabetes, Parkinson’s disease, hepatitis B, …
  • 3. Background: Protein Phosphorylation. Phosphorylation-dephosphorylation is a biochemical switch system regulating various cellular processes. It is catalyzed by various specific protein kinases. [Diagram: kinase and phosphatase toggle a protein between the OFF and ON states.]
  • 4. Phosphorylation Site Prediction: Problem Formulation. Phosphorylation site: a phosphorylated amino acid in a protein (determined by protein sequence). General phosphorylation site prediction: to predict whether an amino acid can be phosphorylated. Kinase-specific phosphorylation site prediction: to predict whether an amino acid can be phosphorylated by a specific kinase. Predictions are based on protein sequence only.
  • 5. Limitations of Current Methods. Current prediction tools have limitations when applied to whole proteomes: prediction accuracy could be improved; most were released as web servers and restrict the data users can upload; training data were out of date; stringency adjustment was not fully supported.
  • 6. Our tool Musite is unique. Novel method with better accuracy. First open-source tool in the field that meets the OSI Open Standards Requirement. Standalone program designed for proteome-scale prediction. Supports both general and kinase-specific phosphorylation site prediction. Supports customized model training. Supports continuous stringency adjustment.
  • 7. Phosphorylation Site Prediction: Flowchart. [Flowchart: Data collection from high-quality sources such as UniProt/Swiss-Prot, Phospho.ELM, PhosphoPep, and PhosPhAt; non-redundant datasets built by BLASTclust. Phosphorylation sites (positive set) and non-phosphorylation sites (negative set). Feature extraction from both sets: KNN scores, disorder scores, amino acid frequencies. Training: bootstrap samples 1 ... m are drawn from the training data, one classifier is trained per sample (Classifier 1 ... Classifier m), and the classifiers are aggregated into the phosphorylation prediction model. Specificity estimation on control data. Making predictions on new data.]
  • 8. Phosphorylation Site Prediction: Data Extraction. [Flowchart as on slide 7.]
  • 9. Phosphorylation Site Prediction: Feature Extraction. [Flowchart as on slide 7.]
  • 10. Phosphorylation Site Prediction: Feature Extraction. [Flowchart as on slide 7.]
  • 11. KNN Features: Motivation. Rationale for using KNN features: local sequence clusters exist around phosphorylation sites, since each phosphorylation site is a substrate of a specific protein kinase, and substrates of the same kinase or kinase family usually share similar patterns in their local sequences.
  • 12. KNN Features: Result. Overall, phosphosites have larger KNN scores than non-phosphosites. Average KNN scores: 0.7~0.8 for phosphosites, ≈0.5 for non-phosphosites. [Figure: boxplot of KNN features (human Ser/Thr), phospho vs. non-phospho; x-axis: size of nearest neighbors (% of sample size) at 0.25, 0.5, 1, 2, 4; y-axis: KNN score.]
  • 13. Disorder Features: Concept & Rationale. Disordered region (structure): some parts of a protein have a rigid structure, such as α-helix and β-sheet; other parts, the disordered regions, do not have well-defined conformations. The conformational flexibility of disordered regions may facilitate protein phosphorylation [Dunker, 2008]: protein phosphorylation sites are frequently located within disordered regions.
  • 14. Disorder Features: Result. For phosphosites, occurrence increases exponentially as the disorder score increases. Non-phosphosites show a significantly different distribution. Disorder score > 0.5: phosphosites ~91%, non-phosphosites ~55%. Phosphosites are significantly over-represented in disordered regions. [Figure: histograms of disorder features (human Ser/Thr): (A) phospho-S/T in H. sapiens; (B) non-phospho-S/T in H. sapiens; x-axis: disorder score; y-axis: occurrence.]
  • 15. Amino Acid Frequencies: Result. P, R, D, E, S, K, and G are enriched around phosphosites; C, W, Y, F, I, M, L, H, T, and V are depleted. [Figure: log2(ratio of frequency) per amino acid (P R D E S K G A Q N V T H L M I F Y W C) for H. sapiens, M. musculus, D. melanogaster, C. elegans, S. cerevisiae, and A. thaliana (all S/T).]
  • 16. Phosphorylation Site Prediction: Classifier Training. [Flowchart as on slide 7.]
  • 17. Results: Trained Models. General prediction: human Ser/Thr, human Tyr, mouse Ser/Thr, mouse Tyr, fruit fly Ser/Thr, worm Ser/Thr, yeast Ser/Thr, Arabidopsis Ser/Thr. Kinase-specific prediction: ATM, CDK/CDK1/CDK2, CK1/CK2, MAPK1/MAPK3, PKA, PKB, PKC, Src.
  • 18. Results: Cross-validation. [Figure: ROC curves (sensitivity vs. 1 - specificity) for C. elegans (S/T), A. thaliana (S/T), H. sapiens (S/T), M. musculus (S/T), S. cerevisiae (S/T), D. melanogaster (S/T), M. musculus (Y), and H. sapiens (Y), with a random-guess reference; one panel covers 1 - specificity from 0 to 1, the other from 0 to 0.1.]
  • 19. Results: Comparison to other tools. [Figure: ROC curves (sensitivity vs. 1 - specificity) for Musite, Scan-x, DISPHOS, and NetPhos; one panel covers 1 - specificity from 0 to 1, the other from 0 to 0.1.]
  • 20. Phosphorylation Site Prediction Software Implementation: Musite. Open source. License: GNU General Public License (GPL). http://musite.sourceforge.net/ Stand-alone application based on Java. Supports Windows, Linux, and Mac OS X. A web server is also being developed: http://musite.net/
  • 21. Implementation User Interface
  • 22. Implementation: Customized Model Training. A unique utility for users to train prediction models from their own data: take advantage of the latest data; train disease-specific models; train organ-specific models; integrate into experimental procedures in an iterative way.
  • 23. Summary. Musite predicts general and kinase-specific phosphosites with better accuracy. Musite is an open-source standalone program capable of performing proteome-wide predictions.
  • 24. Acknowledgements. Dr. Dong Xu (University of Missouri), Dr. Jay Thelen (University of Missouri), Dr. Keith Dunker (Indiana University), Curtis Bollinger (University of Missouri). Funding: NSF [# DBI-0604439], NIH [# R21/R33 GM078601]. Visit us at http://musite.sourceforge.net and http://musite.net. Poster R09 at ISMB.
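
The KNN feature from slides 11 and 12 can be illustrated with a small sketch. The slides do not give the exact definition, so this assumes the KNN score is the fraction of known phosphosites among the k training windows nearest to a query window. The mismatch-fraction distance, the function names, and the 7-residue toy windows are all hypothetical stand-ins, not Musite's actual distance measure or data.

```python
def window_distance(a: str, b: str) -> float:
    """Fraction of mismatching positions between two equal-length windows.

    Assumption: a simple stand-in for whatever sequence distance Musite uses.
    """
    if len(a) != len(b):
        raise ValueError("windows must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def knn_score(query: str, positives: list[str], negatives: list[str], k: int) -> float:
    """Fraction of positive (phosphorylated) windows among the k nearest neighbors."""
    labelled = [(window_distance(query, w), 1) for w in positives]
    labelled += [(window_distance(query, w), 0) for w in negatives]
    labelled.sort(key=lambda pair: pair[0])  # nearest first
    nearest = labelled[:k]
    return sum(label for _, label in nearest) / len(nearest)


# Toy windows centred on a serine; invented for illustration only.
positives = ["RRASVAG", "KRASLDE", "RRGSIEE"]
negatives = ["LLCSWFY", "MICSFWL", "VLYSMIC"]

print(knn_score("RRASLAE", positives, negatives, k=3))  # resembles the positives
print(knn_score("LICSWFL", positives, negatives, k=2))  # resembles the negatives
```

With equally sized positive and negative sets, a window with no resemblance to either set would score around 0.5, which is consistent with the ≈0.5 average reported for non-phosphosites on slide 12.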
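
The log2 frequency ratio plotted on slide 15 can be sketched as follows. The slide does not say which two frequencies are compared, so this assumes the ratio is the amino acid frequency in windows around phosphosites divided by the frequency in windows around non-phosphosites. The pseudocount, the function names, and the toy windows are hypothetical additions for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(windows: list[str], pseudocount: float = 1.0) -> dict[str, float]:
    """Relative frequency of each amino acid over all windows.

    The pseudocount (an assumption) avoids division by zero for unseen residues.
    """
    counts = Counter(ch for w in windows for ch in w if ch in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def log2_enrichment(pos_windows: list[str], neg_windows: list[str]) -> dict[str, float]:
    """log2(frequency near phosphosites / frequency near non-phosphosites)."""
    f_pos = aa_frequencies(pos_windows)
    f_neg = aa_frequencies(neg_windows)
    return {aa: math.log2(f_pos[aa] / f_neg[aa]) for aa in AMINO_ACIDS}


# Toy windows; invented for illustration only.
pos_windows = ["RRASVAG", "KRASLDE"]
neg_windows = ["LLCSWFY", "MICSFWL"]

enrichment = log2_enrichment(pos_windows, neg_windows)
for aa in sorted(AMINO_ACIDS, key=enrichment.get, reverse=True):
    print(f"{aa}: {enrichment[aa]:+.2f}")
```

Positive values mark residues enriched around phosphosites and negative values mark depleted ones, matching the sign convention of the plot.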
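
The training and stringency steps of the flowchart (slides 7 and 16) and the "continuous stringency adjustment" of slide 6 can be sketched together. This is a minimal illustration under stated assumptions, not Musite's implementation: the slides do not name the classifier, so a nearest-centroid scorer stands in for it; the aggregation is assumed to be a plain average of the m classifier scores; and the cutoff is assumed to be chosen from scores on control (non-phosphorylated) data so that a requested specificity is met. All names and feature values are hypothetical.

```python
import math
import random
from statistics import mean


def train_centroid_classifier(pos, neg):
    """Stand-in classifier: scores higher when x is nearer the positive centroid."""
    dim = len(pos[0])
    pos_c = [mean(v[i] for v in pos) for i in range(dim)]
    neg_c = [mean(v[i] for v in neg) for i in range(dim)]

    def score(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos_c))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg_c))
        return d_neg - d_pos

    return score


def train_bagged_model(pos, neg, m=10, seed=0):
    """Train m classifiers on bootstrap samples and average their scores."""
    rng = random.Random(seed)
    classifiers = []
    for _ in range(m):
        boot_pos = [rng.choice(pos) for _ in pos]  # sample with replacement
        boot_neg = [rng.choice(neg) for _ in neg]
        classifiers.append(train_centroid_classifier(boot_pos, boot_neg))
    return lambda x: mean(clf(x) for clf in classifiers)


def threshold_for_specificity(control_scores, specificity):
    """Cutoff such that at least `specificity` of control scores are <= cutoff."""
    ranked = sorted(control_scores)
    index = min(len(ranked) - 1, max(0, math.ceil(specificity * len(ranked)) - 1))
    return ranked[index]


# Toy feature vectors (KNN score, disorder score); invented for illustration.
pos = [(0.80, 0.90), (0.70, 0.95), (0.75, 0.80), (0.85, 0.70)]
neg = [(0.50, 0.40), (0.45, 0.60), (0.55, 0.30), (0.40, 0.50)]

model = train_bagged_model(pos, neg, m=5, seed=1)
control_scores = [model(x) for x in neg]
cutoff = threshold_for_specificity(control_scores, specificity=0.75)

# A site is predicted as phosphorylated when its score exceeds the cutoff.
print(model((0.80, 0.90)) > cutoff)
```

Raising the requested specificity moves the cutoff up and yields fewer, more confident predictions, which is one way a continuous stringency setting could be exposed to users.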