Making protein function and subcellular localization predictions: challenges and opportunities
Fiona Brinkman
Department of Molecular Biology and Biochemistry
(Associate, Faculty of Health Sciences and School of Computing Sciences)
Simon Fraser University
Greater Vancouver, BC, Canada
April 2014
•  Improving seq similarity/orthology-based predictions: a keystone of many predictors
•  Improving pathway/network-based analysis to identify protein functions
•  Future challenges and opportunities (using protein localization as an example of what is to come)
What we MUST do to move AFP forward…
One-to-one orthologs are, in particular, more functionally similar to each other, vs other orthologs and paralogs, when >80% seq identity
Functional similarity measured by GO annotation similarity (13 species)
Altenhoff AM et al. PLoS Comput Biol. 2012
If true ortholog is missing… (gene loss, or incomplete genome)
Reciprocal Best Blast Hit (RBBH): FAIL
[Figure: species tree and gene trees for Ingroup1, Ingroup2 and Outgroup, with the RBBH marked. Panels: usual divergence; one of the orthologous genes diverges faster…; RBBH to a paralog]
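The failure mode on this slide can be illustrated with a minimal sketch of reciprocal best hit pairing. It assumes similarity scores have already been computed (for example from BLAST); the function names, gene IDs and scores are hypothetical and only serve to show how RBBH still returns a pair when the true ortholog is absent.

```python
def best_hit(scores):
    """Return the subject with the highest score, or None if there are no hits."""
    return max(scores, key=scores.get) if scores else None


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pair genes that are each other's best hit between genome A and genome B.

    a_vs_b maps each gene in A to {gene in B: similarity score};
    b_vs_a holds the same information in the opposite direction.
    """
    pairs = []
    for gene_a, hits in a_vs_b.items():
        gene_b = best_hit(hits)
        if gene_b is not None and best_hit(b_vs_a.get(gene_b, {})) == gene_a:
            pairs.append((gene_a, gene_b))
    return sorted(pairs)


# Hypothetical scores: the true ortholog of a1 is missing from genome B,
# so a1 and the paralog b1_paralog become each other's best hit.
a_vs_b = {"a1": {"b1_paralog": 310.0}, "a2": {"b2": 540.0, "b1_paralog": 120.0}}
b_vs_a = {"b1_paralog": {"a1": 305.0, "a2": 118.0}, "b2": {"a2": 538.0}}
rbbh_pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this toy input the pair (a1, b1_paralog) is reported alongside the genuine pair (a2, b2), which is why an additional check on the reported pairs is useful.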
  
Ortholuge
Uses phyletic ratios to differentiate Supporting Species Divergence (SSD) orthologs vs proteins more divergent than expected (non-SSD)
Ratio1 = distance{ingroup1-ingroup2} / distance{ingroup1-outgroup}
Ratio2 = distance{ingroup1-ingroup2} / distance{ingroup2-outgroup}
[Figure: Ortholuge analysis comparing Burkholderia cepacia & B. cenocepacia (outgroup: B. pseudomallei); SSD and non-SSD labelled]
Whiteside et al 2013 PMID 23203876
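The two ratios above can be computed directly from three pairwise distances. The sketch below is illustrative only: the fixed cutoff of 1.0 and the distance values are assumptions made for the example, and Ortholuge's actual decision rule for SSD vs non-SSD is not given on the slide.

```python
def ortholuge_ratios(d_in1_in2, d_in1_out, d_in2_out):
    """Return (Ratio1, Ratio2) as defined on the slide.

    Ratio1 = distance{ingroup1-ingroup2} / distance{ingroup1-outgroup}
    Ratio2 = distance{ingroup1-ingroup2} / distance{ingroup2-outgroup}
    """
    if d_in1_out <= 0 or d_in2_out <= 0:
        raise ValueError("outgroup distances must be positive")
    return d_in1_in2 / d_in1_out, d_in1_in2 / d_in2_out


def classify(ratio1, ratio2, cutoff=1.0):
    """Label a predicted ortholog pair; the cutoff is a hypothetical value."""
    return "SSD" if ratio1 < cutoff and ratio2 < cutoff else "non-SSD"


# Hypothetical distances: ingroup genes closer to each other than to the outgroup
typical = classify(*ortholuge_ratios(0.05, 0.20, 0.22))
# Hypothetical distances: ingroup genes more divergent than the species tree suggests
divergent = classify(*ortholuge_ratios(0.30, 0.20, 0.22))
```

A pair whose ingroup distance exceeds its outgroup distances yields ratios above 1, which is the pattern expected when the "ortholog" is really a paralog or is diverging unusually fast.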
  
[Figure: Predicted orthologs in 600 pairs of bacterial species. Y-axis: proportion (0 to 1); categories: KEGG Orthology, Pfam domains, Tigrfam annotations, subcellular localizations; series: SSD ortholog, non-SSD; * p-value < 0.05]
[Figure: Proportion (0 to 0.7) with one or more homologs (based on BLAST hits); series: SSD orthologs, non-SSD; * p-value < 0.05]
Non-SSD "orthologs" more likely:
- Functionally dissimilar
- Have one or more homologs
A Database of Ortholuge Evaluations
OrtholugeDB (tinyurl.com/ortholugeDB)
•  Provides pre-computed ortholog predictions for >1400 bacteria and archaea (update coming next month!), with further Ortholuge assessments
•  Covers all genes in fully sequenced bacterial and archaeal genomes
•  Facilitates visualization and evaluation of ortholog predictions
Similar issue with initial metagenomics seq functional evaluation
1.  Simulated reads from Pseudomonas aeruginosa PAO1
2.  Created databases at different levels of clade exclusion
    •  E.g. for species clade exclusion removed all Pseudomonas aeruginosa genomes from the database
3.  Used RAPSearch2 and MEGAN5 to assign functional categories to the simulated reads
4.  Calculated proportion of reads assigned to each functional category relative to how many reads expected
    •  E.g.:

Category             Expected # assigned   Actual # assigned   Relative proportion
Membrane Transport   567                   583                 1.02822
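The relative proportion in step 4 is simply the actual count divided by the expected count. A minimal sketch follows, using the Membrane Transport row from the slide; the function names and the 1.5 flagging threshold are assumptions for illustration, since the slide only says "notably >1".

```python
def relative_proportion(actual, expected):
    """Ratio of reads actually assigned to a category vs. the number expected."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return actual / expected


def overpredicted(counts, threshold=1.5):
    """Return categories whose ratio exceeds a (hypothetical) threshold.

    counts maps category name -> (expected, actual).
    """
    return sorted(
        name
        for name, (expected, actual) in counts.items()
        if relative_proportion(actual, expected) > threshold
    )


# Row taken from the slide's example table
membrane_transport = round(relative_proportion(583, 567), 5)
flagged = overpredicted({"Membrane Transport": (567, 583)})
```

With the slide's numbers the ratio rounds to 1.02822, so this category is not flagged under the assumed threshold.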
  
Most functional categories are predicted well but some are overpredicted (ratio notably >1)
[Figure: Ratio of assigned relative to expected reads (0 to 2.5) per functional category, at each level of clade exclusion: none, species, family, class]
E.g. Endocrine system: 3 problematic orthology groups, all with high #'s of proteins (one has 3538 when median is 54!)
The relative proportions of functional categories stay relatively consistent as clade exclusion level increases
[Figure: Proportion of reads assigned (0% to 100%) at each clade exclusion level (none, species, family, class). Categories: Xenobiotics Biodegradation and Metabolism; Transcription; Signal Transduction; Replication and Repair; Infectious Diseases; Nucleotide Metabolism; Neurodegenerative Diseases; Metabolism of Other Amino Acids; Metabolism of Cofactors and Vitamins; Membrane Transport; …]
Improving pathway-based analysis
Issue: Biomolecular pathway classifications can bias analyses of pathways found to be upregulated or downregulated by transcriptome (or other omics-level) analysis
What you identify depends on how everything is classified…
Need better "signatures" of pathways…
Dealing with PART of the issue…
[Figure: Distribution of the number of associated pathways for human genes in KEGG. Segments: 1, 2, 3, 4, 5, 6, 7-45]
Membership of a gene in multiple pathways is the norm, not the exception…
Foroushani et al, 2014 PMCID: PMC3883547
Not all genes are equal…
[Figure: pathway membership of genes. Maroon: pathway member; white: no membership]
All genes are not equivalent signatures of a given pathway
Foroushani et al, 2014 PMCID: PMC3883547
Example: Treated vs Untreated Mouse Severe Inflammation - Gene Expression Dataset
Standard Over-Representation Analysis (ORA) and Gene Set Enrichment Analysis (GSEA) treat all genes in a given pathway as equal indicators that that pathway is significant.
→ Emphasizes generalist genes/pathways
Individual Gene ORA:
- Antigen processing and presentation
- Graft-versus-host disease
- Natural killer cell mediated cytotoxicity
- Viral myocarditis
- Allograft rejection
- Cell adhesion molecules (CAMs)
- Chemokine signaling pathway
- Type I diabetes mellitus
- Toll-like receptor signaling pathway
- Cytokine-cytokine receptor interaction
Foroushani et al, 2014 PMCID: PMC3883547
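To make the "all genes count equally" point concrete, here is a minimal sketch of a standard ORA p-value using a one-sided hypergeometric test. This is the generic textbook formulation, not code from the cited paper, and the parameter names are hypothetical.

```python
from math import comb


def ora_p_value(universe, pathway_size, list_size, overlap):
    """One-sided hypergeometric p-value: P(X >= overlap).

    universe     : number of genes considered in total
    pathway_size : number of those genes annotated to the pathway
    list_size    : number of genes in the list of interest
    overlap      : genes in the list that are also in the pathway
    """
    upper = min(pathway_size, list_size)
    total = comb(universe, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: every gene in the overlap contributes identically to the result,
# whether it belongs to one pathway or to dozens.
p_small = ora_p_value(universe=10, pathway_size=5, list_size=5, overlap=5)
```

Because only the overlap count enters the formula, a gene shared by many pathways raises the apparent significance of all of them, which is the bias toward generalist genes that the slide describes.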
Pathway Signatures using SIGORA: Identifying genes/gene pairs uniquely associated with a single pathway
SIGORA identifies statistically significant enrichment of
Pathway Signatures in a gene list of interest.
Foroushani et al, 2014 PMCID: PMC3883547
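The idea of a pathway signature can be sketched as follows. This is a simplified illustration, not SIGORA's implementation: it assumes a pathway is given as a set of gene IDs, treats a gene as a signature if it occurs in exactly one pathway, and treats a gene pair as a signature if the two genes co-occur in exactly one pathway. Skipping pairs that already contain a unique gene is a choice made for this sketch. All names and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pathway_signatures(pathways):
    """Return (unique_genes, unique_pairs), each mapping a signature to its pathway."""
    gene_owner = defaultdict(set)
    pair_owner = defaultdict(set)
    for name, genes in pathways.items():
        for gene in genes:
            gene_owner[gene].add(name)
        for pair in combinations(sorted(genes), 2):
            pair_owner[pair].add(name)

    unique_genes = {g: next(iter(p)) for g, p in gene_owner.items() if len(p) == 1}
    unique_pairs = {
        pair: next(iter(p))
        for pair, p in pair_owner.items()
        if len(p) == 1 and not any(g in unique_genes for g in pair)
    }
    return unique_genes, unique_pairs


# Toy pathways: g2, g3 and g4 each sit in two pathways, yet every pair of them
# co-occurs in only one.
toy = {"P1": {"g1", "g2", "g3"}, "P2": {"g2", "g4"}, "P3": {"g3", "g4"}}
unique_genes, unique_pairs = pathway_signatures(toy)
```

In the toy data only g1 is a single-gene signature, while the pairs recover pathway-specific evidence from genes that are individually ambiguous.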
Example: Treated vs Untreated Mouse Severe Inflammation - Gene Expression Dataset
SIGORA avoids many biologically less plausible results seen by other methods that over-emphasize generalist genes/pathways.
For example, 6/8 up-regulated genes in "Type I diabetes mellitus" pathway are also in the "Antigen processing and presentation" pathway.

Individual Gene ORA                       | SIGORA
Antigen processing and presentation       | Antigen processing and presentation
Graft-versus-host disease                 | Natural killer cell mediated cytotoxicity
Natural killer cell mediated cytotoxicity | Complement and coagulation cascades
Viral myocarditis                         | Toll-like receptor signaling pathway
Allograft rejection                       | Cytokine-cytokine receptor interaction
Cell adhesion molecules (CAMs)            | Leukocyte transendothelial migration
Chemokine signaling pathway               | Cell adhesion molecules (CAMs)
Type I diabetes mellitus                  | Cytosolic DNA-sensing pathway
Toll-like receptor signaling pathway      | Chemokine signaling pathway
Cytokine-cytokine receptor interaction    |
Future challenges and opportunities
(using bacterial protein localization as an example of what is to come)
(Gardy & Brinkman 2006 Nature Reviews Microbiology 4:741)
Bacterial protein subcellular localization prediction
•  Aids genome annotation and prediction of protein function
•  Used to identify cell surface/secreted targets for drugs and diagnostics, as well as potential vaccine components
•  Many pathogen-associated virulence factors predicted as secreted
(Gardy & Brinkman 2006 Nature Reviews Microbiology 4:741)
PSORTb: bacterial protein subcellular localization (SCL) prediction software
PSORTb: version 3
Signal peptides: Non-cytoplasmic
Amino acid composition/patterns: All localizations
 - Support Vector Machines trained with amino acid compositions or frequent subsequences
Transmembrane helices: Cytoplasmic membrane
 - HMMTOP
PROSITE motifs with 100% precision: All localizations
Outer membrane motifs: Outer membrane
 - Identified by association-rule mining
Homology to proteins of experimentally known localization: All loc.
 - "SCL-BLAST" against proteins of known localization
 - E=10e-10 and length restriction for precision
Integration with a Bayesian Network
Yu et al (2010) Bioinformatics 26:1608
PSORTb v3.0: high precision, improved sensitivity/recall and genome prediction coverage
Main localizations predicted
Bacteria and Archaea predictions
Sub-category localization predictions:
• Type III secretion apparatus
• Pili/fimbria
• Host-associated SCL
• Flagellum
• Spore
• Gas vesicle

                Software      Precision   Recall
Gram-negative   PSORTb v3.0   96.8        88.0
                PSORTb v2.0   95.7        81.5
Gram-positive   PSORTb v3.0   97.0        93.2
                PSORTb v2.0   96.7        89.3
Archaea         PSORTb v3.0   95.0        93.3
[Figure: PSORTb v.2 vs PSORTb v.3 (scale 0 to 100). Panels: five-fold cross validation; genome prediction coverage. Groups: Gram-negative, Gram-positive]
A computational predictor more accurate than related high-throughput lab methods
Challenge: Organismal diversity
Classic Gram positive bacteria, monoderms: Thick peptidoglycan, no outer membrane
Classic Gram negative bacteria, diderms: Thin peptidoglycan + outer membrane
…but can have Gram negatives with no outer membrane (e.g. Mycoplasma) or a different outer membrane (Synergistetes, Sphingomonas), or Gram positive (thick peptidoglycan) with a different outer membrane (Deinococcus: 6 layers in cell envelope!), or "acid fast" with asymmetric lipid-containing thick cell wall (Mycobacteria)
Plus bacterial organelles and other substructures (e.g. magnetosome of Magnetospirillum)...
Solution:
- For whole genome (deduced-proteome) analysis, detect key protein markers of a particular cell type (e.g. Omp85 essential for classic Gram negative membrane)
- For single protein analysis, learn from above analysis, plus literature curation, the most likely cell type for a given phylum
…then make predictions assuming that cell "type"
Reproduced under Fair Use
Challenge: Temporal, contextual diversity
Proteins can be associated with multiple subcellular localizations
e.g. Cell division proteins, Autotransporters, "protein A dependent on protein B"
Solution: Note all possible localizations, since temporal, contextual predictions are non-trivial: not enough knowledge for most
Kjærgaard K et al. J. Bacteriol. 2000;182:4789-4796
Challenge: Metagenomics
High demand for PSORTb to be able to analyze metagenomic sequences… under development
Need taxonomy data to aid predictions (then enable appropriate cell type analysis)
Through over a decade of curating for, making and evaluating predictors of protein localization, genomic islands, etc
What makes a great predictor?
(besides it being right) ☺
Bioinformatics Predictor's Code of Conduct
- Never force predictions - always have a prediction option/category of "unknown"
Inspired by the classic "Data Provider's Code of Conduct" in Stein (2002) Nature 417, 119-120
Example of forced predictions: PSORT I prediction method
Nakai & Kanehisa, Proteins: Structure, Function, Genetics (1991) Overall Accuracy = 69%
What's wrong here? No secreted/extracellular localization!
Bioinformatics Predictor's Code of Conduct
- Never force predictions - always have "unknown" option/category
- Ensure open source - enable viewing of prediction method details
- Predictor should easily be trainable with different datasets (if applicable; so others can robustly evaluate accuracy)
- Have ability to run locally or over web (with an API is preferred)
- Provide access to old versions (at minimum when transitioning to new version)
- Encourage continuing curation from the literature/lab experiments! Incorporate some curation efforts into predictor funding applications
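The first rule, never forcing a prediction, can be sketched as a simple decision step. This is a minimal illustration under stated assumptions: the per-localization scores, the function name and the 0.75 cutoff are all hypothetical, and no particular predictor's scoring scheme is implied.

```python
def predict_or_unknown(scores, min_score=0.75):
    """Return the top-scoring label, or "unknown" if no score is high enough.

    scores maps each candidate localization to a confidence in [0, 1].
    min_score is a hypothetical cutoff chosen for illustration.
    """
    if not scores:
        return "unknown"
    label = max(scores, key=scores.get)
    return label if scores[label] >= min_score else "unknown"


# Hypothetical outputs from some upstream scoring step
confident = predict_or_unknown({"cytoplasmic": 0.91, "periplasmic": 0.05})
ambiguous = predict_or_unknown({"cytoplasmic": 0.48, "periplasmic": 0.44})
```

Returning "unknown" for the ambiguous case trades recall for precision, which matches the spirit of the rule: users see fewer predictions but can trust the ones they get.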
  
Bioinformatics Predictor's Code of Conduct - evaluation
- Evaluate precision and recall (and accuracy measure combos thereof) with x-fold cross validation and/or new datasets (like CAFA!)
- ID errors, biases and provide guidance to users re issues to watch for
  - bias in training and/or testing datasets ("homology reduction", "clade exclusion" may help)
  - errors in "gold standard" lab-based measure
  - contextual/temporal changes in proteins, impacting prediction (e.g. function changes when another protein/compound present)
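A minimal sketch of the precision and recall evaluation mentioned above, for a predictor that is allowed to answer "unknown". It assumes one common convention (precision over predictions actually made, recall over all proteins); the slides do not specify the exact definitions used, and the labels below are hypothetical.

```python
def precision_recall(true_labels, predicted_labels, unknown="unknown"):
    """Compute (precision, recall) when "unknown" counts as making no prediction.

    precision = correct predictions / predictions made
    recall    = correct predictions / all proteins evaluated
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    made = [(t, p) for t, p in zip(true_labels, predicted_labels) if p != unknown]
    correct = sum(1 for t, p in made if t == p)
    precision = correct / len(made) if made else 0.0
    recall = correct / len(true_labels) if true_labels else 0.0
    return precision, recall


# Hypothetical test fold
truth = ["cytoplasmic", "outer membrane", "extracellular", "cytoplasmic"]
preds = ["cytoplasmic", "unknown", "extracellular", "periplasmic"]
precision, recall = precision_recall(truth, preds)
```

In a cross-validation setting the same function would be applied to each held-out fold and the results averaged; under clade exclusion, the held-out fold would additionally exclude close relatives of the training organisms.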
  	
  
	
  
	
  
	
  
What we MUST do:
Guide users to not just blindly use a predictor and its default output.
Curators, experimentalists, and automated function predictor developers must coordinate efforts more
•  Experimentalists working on what they think best…
•  Curators curating what they prioritize…
•  Function predictors optimizing prediction using existing data…
Function predictors/bioinformaticists need to get in the driver's seat more for research
Brinkman Lab Kayaking Trip, Summer 2013 (Next up, Archery Tag!)
Amir Foroushani
Matthew Laird
David Lynn
Raymond Lo
Mike Peabody
Thea Van Rossum
Matthew Whiteside
Nancy Yu

GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body Areesha Ahmad
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.Silpa
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxRenuJangid3
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.Silpa
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsOrtegaSyrineMay
 
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry Areesha Ahmad
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceAlex Henderson
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxseri bangash
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIADr. TATHAGAT KHOBRAGADE
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Silpa
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxSilpa
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Serviceshivanisharma5244
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfSumit Kumar yadav
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptxryanrooker
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY1301aanya
 

Kürzlich hochgeladen (20)

Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
Genome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptxGenome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptx
 
Role of AI in seed science Predictive modelling and Beyond.pptx
Role of AI in seed science  Predictive modelling and  Beyond.pptxRole of AI in seed science  Predictive modelling and  Beyond.pptx
Role of AI in seed science Predictive modelling and Beyond.pptx
 
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICEPATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptx
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 

Making Protein Function and Subcellular Localization Predictions: Challenges and Opportunities

  • 1. Making protein function and subcellular localization predictions: challenges and opportunities. Fiona Brinkman, Department of Molecular Biology and Biochemistry (Associate, Faculty of Health Sciences and School of Computing Sciences), Simon Fraser University, Greater Vancouver, BC, Canada. April 2014
  • 2. • Improving sequence similarity/orthology-based predictions, a keystone of many predictors • Improving pathway/network-based analysis to identify protein functions • Future challenges and opportunities (using protein localization as an example of what is to come). What we MUST do to move AFP (automated function prediction) forward…
  • 3. One-to-one orthologs are, in particular, more functionally similar to each other than other orthologs or paralogs are, when they share >80% sequence identity. Functional similarity measured by GO annotation similarity (13 species). Altenhoff AM et al. PLoS Comput Biol. 2012
  • 4. One-to-one orthologs are, in particular, more functionally similar to each other than other orthologs or paralogs are, when they share >80% sequence identity. Functional similarity measured by GO annotation similarity (13 species). Altenhoff AM et al. PLoS Comput Biol. 2012
  • 5.
  • 6. If the true ortholog is missing… (gene loss, or incomplete genome): Reciprocal Best BLAST Hit (RBBH) analysis FAILS, because a paralog is returned as the RBBH. One of the orthologous genes diverges faster… [Figure: species tree (Ingroup1, Ingroup2, Outgroup) shown alongside gene trees for the usual divergence case and for the cases above.]
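The RBBH failure mode on this slide can be illustrated with a minimal Python sketch. The input format (a dictionary of hits with bit scores) and the gene names are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: hits[query] = [(subject, bit_score), ...]
Hits = Dict[str, List[Tuple[str, float]]]


def best_hit(hits: Hits) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject."""
    return {
        query: max(subjects, key=lambda s: s[1])[0]
        for query, subjects in hits.items()
        if subjects
    }


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hit(a_vs_b)
    best_ba = best_hit(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy data: the true ortholog of A1 is missing from genome B (gene loss or
# incomplete genome), so A1's best hit is the paralog "B2p", and vice versa.
a_vs_b = {"A1": [("B2p", 180.0)], "A2": [("B2", 400.0), ("B2p", 150.0)]}
b_vs_a = {"B2": [("A2", 400.0)], "B2p": [("A1", 180.0), ("A2", 150.0)]}

rbbh_pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(rbbh_pairs)
```

In this toy case the pair (A1, B2p) satisfies the reciprocal criterion even though the two genes are paralogs, which is the kind of false ortholog call the slide warns about.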
  • 7. Ortholuge: uses phyletic ratios to differentiate Supporting Species Divergence (SSD) orthologs vs proteins more divergent than expected (non-SSD). Ratio1 = distance{ingroup1-ingroup2} / distance{ingroup1-outgroup}. Ratio2 = distance{ingroup1-ingroup2} / distance{ingroup2-outgroup}. [Figure: Ortholuge analysis comparing Burkholderia cepacia & B. cenocepacia (outgroup: B. pseudomallei), with SSD and non-SSD groups labelled.] Whiteside et al 2013, PMID 23203876
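As a rough sketch of the two ratios defined on this slide, the Python below computes Ratio1 and Ratio2 from three pairwise distances. The `cutoff` parameter and the simple "both ratios under the cutoff" rule are assumptions made for illustration; the slide does not state how Ortholuge sets its thresholds.

```python
from typing import Tuple


def ortholuge_ratios(d_in1_in2: float, d_in1_out: float, d_in2_out: float) -> Tuple[float, float]:
    """Ratio1 = d(in1,in2)/d(in1,out); Ratio2 = d(in1,in2)/d(in2,out)."""
    if d_in1_out <= 0 or d_in2_out <= 0:
        raise ValueError("outgroup distances must be positive")
    return d_in1_in2 / d_in1_out, d_in1_in2 / d_in2_out


def classify(ratio1: float, ratio2: float, cutoff: float) -> str:
    """Hypothetical rule: call SSD only if both ratios are at or below the cutoff."""
    return "SSD" if ratio1 <= cutoff and ratio2 <= cutoff else "non-SSD"


# Ingroup genes close to each other relative to the outgroup.
r1, r2 = ortholuge_ratios(0.1, 0.4, 0.5)
typical = classify(r1, r2, cutoff=0.5)

# Ingroup genes more divergent from each other than from the outgroup.
r3, r4 = ortholuge_ratios(0.6, 0.4, 0.5)
divergent = classify(r3, r4, cutoff=0.5)

print(typical, divergent)
```

Low ratios mean the ingroup pair diverged about as expected from the species tree; ratios near or above 1 suggest unusual divergence, for example a paralog picked up as the best hit.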
  • 8. Predicted Orthologs in 600 Pairs of Bacterial Species. [Figure: proportion (y-axis) for SSD orthologs vs non-SSD across KEGG Orthology, Pfam domains, TIGRFAM annotations and subcellular localizations; * p-value < 0.05.] [Figure: proportion of SSD orthologs vs non-SSD with one or more homologs (based on BLAST hits); * p-value < 0.05.] Non-SSD "orthologs" are more likely to be functionally dissimilar and to have one or more homologs.
  • 9. OrtholugeDB: A Database of Ortholuge Evaluations (tinyurl.com/ortholugeDB) • Provides pre-computed ortholog predictions for >1400 bacteria and archaea (update coming next month!), with further Ortholuge assessments • Covers all genes in fully sequenced bacterial and archaeal genomes • Facilitates visualization and evaluation of ortholog predictions
  • 10. Similar issue with initial metagenomics sequence functional evaluation. 1. Simulated reads from Pseudomonas aeruginosa PAO1. 2. Created databases at different levels of clade exclusion (e.g. for species clade exclusion, removed all Pseudomonas aeruginosa genomes from the database). 3. Used RAPSearch2 and MEGAN5 to assign functional categories to the simulated reads. 4. Calculated the proportion of reads assigned to each functional category relative to how many reads were expected. E.g.: Category: Membrane Transport | Expected # assigned: 567 | Actual # assigned: 583 | Relative proportion: 1.02822
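The relative proportion in step 4 is simply the actual count divided by the expected count. A minimal Python sketch, using the Membrane Transport numbers from the slide (the function name is illustrative):

```python
def relative_proportion(expected: int, actual: int) -> float:
    """Reads actually assigned to a category divided by reads expected."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return actual / expected


# Membrane Transport row from the slide: 567 expected, 583 actually assigned.
ratio = relative_proportion(expected=567, actual=583)
print(round(ratio, 5))
```

A value close to 1 means the category is recovered about as often as expected; values well above 1 indicate overprediction.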
  • 11. Most functional categories are predicted well, but some are overpredicted (ratio notably >1). [Figure: ratio of assigned relative to expected reads per functional category, by level of clade exclusion: None, Species, Family, Class.] E.g. Endocrine system: 3 problematic orthology groups, all with high numbers of proteins (one has 3538 when the median is 54!)
  • 12. The relative proportions of functional categories stay relatively consistent as the clade exclusion level increases. [Figure: proportion of reads assigned (0% to 100%) by clade exclusion level (None, Species, Family, Class). Legend: Xenobiotics Biodegradation and Metabolism; Transcription; Signal Transduction; Replication and Repair; Infectious Diseases; Nucleotide Metabolism; Neurodegenerative Diseases; Metabolism of Other Amino Acids; Metabolism of Cofactors and Vitamins; Membrane Transport; …]
  • 13. Improving pathway-based analysis. Issue: Biomolecular pathway classifications can bias analyses of pathways found to be upregulated or downregulated by transcriptome (or other omics-level) analysis. What you identify depends on how everything is classified… Need better "signatures" of pathways…
  • 14. Dealing with PART of the issue… [Figure: distribution of the number of associated pathways for human genes in KEGG; bins 1, 2, 3, 4, 5, 6, 7-45.] Membership of a gene in multiple pathways is the norm, not the exception… Foroushani et al, 2014 PMCID: PMC3883547
  • 15. Not all genes are equal… [Figure: pathway membership of genes. Maroon: pathway member; white: no membership.] Not all genes are equivalent signatures of a given pathway. Foroushani et al, 2014 PMCID: PMC3883547
  • 16. Example: Treated vs Untreated Mouse Severe Inflammation, Gene Expression Dataset. Individual Gene ORA results: Antigen processing and presentation; Graft-versus-host disease; Natural killer cell mediated cytotoxicity; Viral myocarditis; Allograft rejection; Cell adhesion molecules (CAMs); Chemokine signaling pathway; Type I diabetes mellitus; Toll-like receptor signaling pathway; Cytokine-cytokine receptor interaction. Standard Over-Representation Analysis (ORA) and Gene Set Enrichment Analysis (GSEA) treat all genes in a given pathway as equal indicators that the pathway is significant. → Emphasizes generalist genes/pathways. Foroushani et al, 2014 PMCID: PMC3883547
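For readers unfamiliar with ORA, the Python below is a minimal sketch of the usual calculation, assuming the common hypergeometric (one-sided Fisher) formulation; the slide does not say which test was used. Every gene in the pathway counts equally, which is the property the slide criticizes.

```python
from math import comb


def ora_p_value(universe: int, pathway_size: int, list_size: int, overlap: int) -> float:
    """P(X >= overlap) when list_size genes are drawn at random from universe.

    Every pathway gene is weighted equally, regardless of how many other
    pathways it also belongs to.
    """
    upper = min(pathway_size, list_size)
    total = comb(universe, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes in total, a 4-gene pathway, 3 genes of interest, 2 overlap.
p = ora_p_value(universe=10, pathway_size=4, list_size=3, overlap=2)
print(p)
```

Because a generalist gene contributes to the overlap of every pathway it sits in, several related pathways can all appear significant on the strength of the same few genes.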
  • 17. Pathway Signatures using SIGORA: Identifying genes/gene pairs uniquely associated with a single pathway. SIGORA identifies statistically significant enrichment of Pathway Signatures in a gene list of interest. Foroushani et al, 2014 PMCID: PMC3883547
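The idea of a pathway signature can be sketched in Python as below. This is a simplified illustration of "genes/gene pairs uniquely associated with a single pathway", not SIGORA's actual implementation; the pathway names and gene sets are made up.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple


def single_gene_signatures(pathways: Dict[str, Set[str]]) -> Dict[str, str]:
    """Genes that belong to exactly one pathway, mapped to that pathway."""
    membership = defaultdict(set)
    for pathway, genes in pathways.items():
        for gene in genes:
            membership[gene].add(pathway)
    return {g: next(iter(p)) for g, p in membership.items() if len(p) == 1}


def gene_pair_signatures(pathways: Dict[str, Set[str]]) -> Dict[Tuple[str, str], str]:
    """Gene pairs that co-occur in exactly one pathway, mapped to that pathway."""
    pair_membership = defaultdict(set)
    for pathway, genes in pathways.items():
        for pair in combinations(sorted(genes), 2):
            pair_membership[pair].add(pathway)
    return {pair: next(iter(p)) for pair, p in pair_membership.items() if len(p) == 1}


# Hypothetical toy pathways: genes a-d are each shared by two pathways.
toy_pathways = {
    "P1": {"a", "b", "c"},
    "P2": {"b", "c", "d"},
    "P3": {"a", "d", "e"},
}
singles = single_gene_signatures(toy_pathways)
pairs = gene_pair_signatures(toy_pathways)
print(singles)
print(pairs)
```

Here only gene e is a single-gene signature, yet pairs such as (a, b) still point to one pathway even though neither gene does on its own; the pair (b, c) is shared by P1 and P2 and is therefore not a signature.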
  • 18. Example: Treated vs Untreated Mouse Severe Inflammation, Gene Expression Dataset. SIGORA avoids many biologically less plausible results seen with other methods that over-emphasize generalist genes/pathways. For example, 6/8 up-regulated genes in the "Type I diabetes mellitus" pathway are also in the "Antigen processing and presentation" pathway. Individual Gene ORA: Antigen processing and presentation; Graft-versus-host disease; Natural killer cell mediated cytotoxicity; Viral myocarditis; Allograft rejection; Cell adhesion molecules (CAMs); Chemokine signaling pathway; Type I diabetes mellitus; Toll-like receptor signaling pathway; Cytokine-cytokine receptor interaction. SIGORA: Antigen processing and presentation; Natural killer cell mediated cytotoxicity; Complement and coagulation cascades; Toll-like receptor signaling pathway; Cytokine-cytokine receptor interaction; Leukocyte transendothelial migration; Cell adhesion molecules (CAMs); Cytosolic DNA-sensing pathway; Chemokine signaling pathway.
  • 19. Future challenges and opportunities (using bacterial protein localization as an example of what is to come). (Gardy & Brinkman 2006 Nature Reviews Microbiology 4:741)
  • 20. Bacterial protein subcellular localization prediction • Aids genome annotation and prediction of protein function • Used to identify cell surface/secreted targets for drugs and diagnostics, as well as potential vaccine components • Many pathogen-associated virulence factors are predicted as secreted (Gardy & Brinkman 2006 Nature Reviews Microbiology 4:741)
  • 21. PSORTb: bacterial protein subcellular localization (SCL) prediction software. Modules, integrated with a Bayesian network: • Signal peptides: non-cytoplasmic • Amino acid composition/patterns: all localizations (Support Vector Machines trained with amino acid compositions or frequent subsequences) • Transmembrane helices: cytoplasmic membrane (HMMTOP) • PROSITE motifs with 100% precision: all localizations • Outer membrane motifs: outer membrane (identified by association-rule mining) • Homology to proteins of experimentally known localization: all localizations ("SCL-BLAST" against proteins of known localization; E=10e-10 and length restriction for precision). Yu et al (2010) Bioinformatics 26:1608
  • 22. PSORTb: version 3. Bacteria and Archaea predictions. Main localizations predicted. Sub-category localization predictions: • Type III secretion apparatus • Pili/fimbria • Host-associated SCL • Flagellum • Spore • Gas vesicle
  • 23. PSORTb v3.0: high precision, improved sensitivity/recall and genome prediction coverage. Five-fold cross validation (Software: Precision, Recall). Gram-negative: PSORTb v3.0: 96.8, 88.0; PSORTb v2.0: 95.7, 81.5. Gram-positive: PSORTb v3.0: 97.0, 93.2; PSORTb v2.0: 96.7, 89.3. Archaea: PSORTb v3.0: 95.0, 93.3. [Figure: genome prediction coverage (0 to 100) for PSORTb v.2 vs PSORTb v.3, Gram-negative and Gram-positive.] A computational predictor more accurate than related high-throughput lab methods
  • 24. Challenge: Organismal diversity. Classic Gram-positive bacteria, monoderms: thick peptidoglycan, no outer membrane. Classic Gram-negative bacteria, diderms: thin peptidoglycan + outer membrane. …but there are Gram negatives with no outer membrane (e.g. Mycoplasma) or a different outer membrane (Synergistetes, Sphingomonas), Gram positives (thick peptidoglycan) with a different outer membrane (Deinococcus: 6 layers in the cell envelope!), and "acid fast" bacteria with an asymmetric lipid-containing thick cell wall (Mycobacteria). Plus bacterial organelles and other substructures (e.g. magnetosome of Magnetospirillum)... Solution: - For whole genome (deduced-proteome) analysis, detect key protein markers of a particular cell type (e.g. Omp85, essential for the classic Gram-negative membrane) - For single protein analysis, learn from the above analysis, plus literature curation, the most likely cell type for a given phylum …then make predictions assuming that cell "type". (Figure reproduced under Fair Use)
  • 25. Challenge: Temporal, contextual diversity. Proteins can be associated with multiple subcellular localizations, e.g. cell division proteins, autotransporters, "protein A dependent on protein B". Solution: Note all possible localizations, since temporal, contextual predictions are non-trivial (not enough knowledge for most). Kjærgaard K et al. J. Bacteriol. 2000;182:4789-4796
  • 26. Challenge: Metagenomics. High demand for PSORTb to be able to analyze metagenomic sequences… under development. Need taxonomy data to aid predictions (then enable appropriate cell type analysis)
  • 27. Through over a decade of curating for, making and evaluating predictors of protein localization, genomic islands, etc. What makes a great predictor?
  • 28. Through over a decade of curating for, making and evaluating predictors of protein localization, genomic islands, etc. What makes a great predictor? (besides it being right) ☺
  • 29. Bioinformatics Predictor's Code of Conduct - Never force predictions: always have a prediction option/category of "unknown". Inspired by the classic "Data Provider's Code of Conduct" in Stein (2002) Nature 417, 119-120
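A minimal Python sketch of the "never force predictions" rule: return the top-scoring category only when it clears a confidence threshold, otherwise report "Unknown". The score dictionary, labels and threshold value are hypothetical and are not PSORTb's actual parameters.

```python
from typing import Dict


def predict_with_unknown(scores: Dict[str, float], threshold: float) -> str:
    """Return the best-scoring label, or "Unknown" if no score reaches the threshold."""
    if not scores:
        return "Unknown"
    label, score = max(scores.items(), key=lambda item: item[1])
    return label if score >= threshold else "Unknown"


# Hypothetical per-localization scores for two proteins.
confident = predict_with_unknown({"Cytoplasmic": 0.9, "Outer membrane": 0.1}, threshold=0.75)
ambiguous = predict_with_unknown({"Cytoplasmic": 0.4, "Outer membrane": 0.35}, threshold=0.75)
print(confident, ambiguous)
```

Raising the threshold trades coverage for precision: more proteins fall into "Unknown", but the predictions that are made are more trustworthy.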
  • 30. Example of forced predictions: PSORT I prediction method. Nakai & Kanehisa, Proteins: Structure, Function, Genetics (1991). Overall Accuracy = 69%. What's wrong here?
  • 31. Example of forced predictions: PSORT I prediction method. Nakai & Kanehisa, Proteins: Structure, Function, Genetics (1991). Overall Accuracy = 69%. No secreted/extracellular localization!
  • 32. Bioinformatics Predictor's Code of Conduct (inspired by the classic "Data Provider's Code of Conduct" in Stein (2002) Nature 417, 119-120) - Never force predictions: always have an "unknown" option/category - Ensure open source: enable viewing of prediction method details - Predictor should be easily trainable with different datasets (if applicable; so others can robustly evaluate accuracy) - Have the ability to run locally or over the web (with an API is preferred) - Provide access to old versions (at minimum when transitioning to a new version) - Encourage continuing curation from the literature/lab experiments! Incorporate some curation efforts into predictor funding applications
  • 33. Bioinformatics Predictor's Code of Conduct: evaluation - Evaluate precision and recall (and accuracy measure combos thereof) with x-fold cross validation and/or new datasets (like CAFA!) - Identify errors and biases, and provide guidance to users on issues to watch for: bias in training and/or testing datasets ("homology reduction" and "clade exclusion" may help); errors in the "gold standard" lab-based measure; contextual/temporal changes in proteins that impact prediction (e.g. function changes when another protein/compound is present). What we MUST do: Guide users to not just blindly use a predictor and its default output.
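When a predictor is allowed to answer "Unknown", precision and recall diverge, which is why both should be reported. The Python sketch below uses one common convention, assumed here rather than taken from the slides: precision is the fraction of correct calls among the predictions actually made, and recall is the fraction of correct calls among all test proteins.

```python
from typing import Sequence, Tuple


def precision_recall(
    true_labels: Sequence[str],
    predicted_labels: Sequence[str],
    unknown: str = "Unknown",
) -> Tuple[float, float]:
    """Overall precision and recall when abstaining ("Unknown") is allowed."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    made = [(t, p) for t, p in zip(true_labels, predicted_labels) if p != unknown]
    correct = sum(1 for t, p in made if t == p)
    precision = correct / len(made) if made else 0.0
    recall = correct / len(true_labels) if true_labels else 0.0
    return precision, recall


# Hypothetical held-out fold: two abstentions and one wrong call.
truth = ["Cytoplasmic", "Cytoplasmic", "Outer membrane", "Extracellular", "Outer membrane"]
calls = ["Cytoplasmic", "Unknown", "Outer membrane", "Cytoplasmic", "Unknown"]
precision, recall = precision_recall(truth, calls)
print(precision, recall)
```

In an x-fold cross validation, the same function would be applied to each held-out fold and the results averaged or pooled.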
  • 34. Bioinformatics Predictor's Code of Conduct. What we MUST do: Guide users to not just blindly use a predictor and its default output. Curators, experimentalists, and automated function predictor developers must coordinate efforts more: • Experimentalists working on what they think best… • Curators curating what they prioritize… • Function predictors optimizing prediction using existing data… Function predictors/bioinformaticists need to get in the driver's seat more for research
  • 35. Brinkman Lab Kayaking Trip, Summer 2013 (Next up, Archery Tag!). Amir Foroushani, Matthew Laird, David Lynn, Raymond Lo, Mike Peabody, Thea Van Rossum, Matthew Whiteside, Nancy Yu