Cancer Systems Biology:
RNA-Seq and Differential Expression Analysis
Taking advantage of a Measurement Revolution
July 25, 2013
Anne Deslattes Mays
Wellstein/Riegel Laboratory
Mentor: Anton Wellstein, MD, PhD
7/25/13 Wellstein/Riegel Laboratory
  
Talk Outline
•  On the Shoulders of Giants
•  Sequencing Timeline
•  RNASeq for Everyone
•  RNA-Sequencing Details
•  Differential Expression Analysis
•  Causality
•  Cancer Therapeutics Example
•  Ask Bigger Questions – Sequencing Everything
  
Rosalind Franklin
“pioneered use of x-rays to create images of unorganized matter – such as large biological molecules – not just single crystals”
http://www.pbs.org/wgbh/aso/databank/entries/bofran.html
“Franklin made equipment adjustments to produce an extremely fine beam of x-rays. She extracted finer DNA fibers than ever before and arranged them in parallel bundles. Studied fibers’ reactions to humid conditions. … allowed her to discover crucial keys to DNA’s structure…. Wilkins shared this with Watson & Crick at Cambridge without her knowledge…”
  
Sequencing Timeline
Human Sequencing Timeline
Key Technical Advances: Celera Human Sequence done in one location on the largest supercomputer in private hands at that time
  
Cancer Systems Biology
Taking advantage of a measurement revolution
Declining sequencing costs, decreasing computing costs
How do you leverage all this data?
  
GEO May 25, 2012
GEO June 25, 2013
Here is an example RNA-Seq Workflow
Workflow diagram boxes: Experimental Design; Sample Collection; Quality Control; Read Trimming; Differential Analysis; Transcript Identification; Pathway Analysis; Feature Discovery; Sequencing
  
Figure source: http://rnaseq.uoregon.edu/index.html
  
Replicates: Type I and Type II errors
Detecting Signal vs. Noise
What is the goal of the sequencing experiment?
  
Before library construction, RNA quality must be assessed
Before Library Construction
1.  Most vendors and cores will assess the quality of the RNA before sequencing
2.  Important to determine before sequencing begins
Garbage in == Garbage out
RNA-seq
  
Three steps to get to a fresh sequence with the Illumina Genome Sequence Analyzer
•  Library generation
•  Cluster generation
•  Sequencing
  
Library Construction: Messenger RNAs are poly-A selected from total RNA, fragmented, and cDNA is synthesized
Before Library Construction
1.  Poly-A Selection (Total RNA -> mRNA)
2.  mRNA fragmentation
3.  First strand synthesis (here we stop if we want to maintain strand specificity)
4.  Second strand synthesis
Other techniques
1.  Ribozero
2.  Ribominus
  
Library Construction: End repair and adenylation result in adapter-ligation-ready constructs
cDNA (single or double stranded)
1.  cDNA is blunt end-repaired and phosphorylated (B.)
2.  A-base added to prepare for indexed adapter ligation (C.)
  
Library Construction: Adapter ligation results in cluster-generation-ready constructs
Index adapter ligation and product ready for amplification on cBot or the cluster station
1.  Strand specific tags are added to the A base – ligate index adapter (D)
2.  Denature and amplify for final product (E)
  
Cluster Generation: In the Illumina cBot system, single molecules are isothermally amplified in a flow cell to prepare them for sequencing
Single DNA molecules hybridize to the lawn of oligos grafted to the surface of the flow cell
1.  Oligo lawn
2.  Oligos hybridize to the adapters that had been ligated to the library fragments which flow through the cell
  
Cluster generation: Bound fragments are extended to make copies, and reverse strands are cleaved and washed away
Bridge amplifications resulting in 100s of millions of unique clusters
1.  Each fragment is clonally amplified through a series of extensions and isothermal bridge amplifications
2.  Reverse strands cleaved and washed away
3.  Ends are blocked
4.  Sequencing primer hybridized to the DNA template
5.  Libraries are ready for sequencing
  
Sequencing: 100s of millions of clusters sequenced simultaneously
4 fluorescently labeled reversibly terminated nucleotides
1.  Each base competes for addition
2.  Natural competition ensures highest accuracy
3.  After each round of synthesis, clusters are excited by a laser, emitting a color that identifies the newly added base
4.  Fluorescent label and blocking group are removed, allowing for addition of next nucleotide
5.  Proprietary (Illumina) chemistry reads a base in each cycle
6.  Allows for accurate sequencing through difficult regions such as homopolymers and repetitive sequence
  
There are other ways to Inquire about the Transcriptome
•  Array Based Technologies
   –  Affymetrix
   –  Agilent
   –  Known genes and hybridization protocols
•  Microarray
   –  20,000+ array experiments on a single platform
   –  Edge effects
   –  False positives / false negatives
•  Bead-based arrays
•  Tiling arrays
•  SAGE
  
What is unique about RNA-Seq?
•  Allows you to discover and profile the entire transcriptome of any organism
•  No probes or primers to design
•  Novel transcripts
•  Novel isoforms
•  Alternative splice sites
•  Rare transcripts
•  cSNPs – all of this in one experiment
  
After sequencing, we must then perform RNA-Seq Data Analysis
After sequencing…
1.  Quality control – trim your reads
2.  Count Reads
    •  Align to genome
    •  Align to transcriptome
3.  Interpret Data
    •  Statistical tests (differential expression analysis)
    •  Visualization (mapped reads)
    •  Pathway analysis
Not so simple – big data, big compute requirements
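The read-trimming step in item 1 can be illustrated with a small sketch. This is a minimal Python example of sliding-window quality trimming in the spirit of tools such as trimmomatic, not that tool's actual implementation. The function names, the 4-base window and the Phred 20 threshold are assumptions chosen for illustration, and qualities are assumed to be Phred+33 encoded.

```python
def phred33_scores(quality_string):
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_trim(sequence, quality_string, window=4, min_mean_quality=20):
    """Cut the read at the start of the first window whose mean quality
    falls below min_mean_quality. Reads shorter than the window are
    returned unchanged (a simplification made for this sketch)."""
    scores = phred33_scores(quality_string)
    for start in range(0, len(scores) - window + 1):
        window_scores = scores[start:start + window]
        if sum(window_scores) / window < min_mean_quality:
            return sequence[:start], quality_string[:start]
    return sequence, quality_string


# Hypothetical read: six high-quality bases ('I' = Q40) then four poor ones ('#' = Q2).
trimmed_seq, trimmed_qual = sliding_window_trim("ACGTACGTAC", "IIIIII####")
print(trimmed_seq, trimmed_qual)
```

A real pipeline would run a dedicated trimmer on the FASTQ files; the sketch only shows why low-quality read tails disappear before alignment.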
  
RNASeq flow chart – reference (steps 1-4): http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html
Step 1: align-reads
Align-reads: Gsnap is used to align reads to the genome sequence.
Processes (code to be run): FASTQC; trimmomatic; gsnap; samtools
Files: FASTQ PE* reads; Reference Genome Assembly (WGS); Existing Gene models (gtf files w/ tss ids)*; Gene models mapped to reference; trimmed PE* reads; Quality control consensus per read length graphs; Gsnap aligned Bam files; Gsnap.CoordSorted.bam
•  Tss ids = transcription start site ids, in a gtf file format
•  PE – paired end
•  The gene models that are built with the pasa pipeline can be input to tophat
Legend
Shadeless rectangle: An unshaded rectangle represents code to be run – a process
Shaded rectangle: A shaded rectangle is a file or a graphic which may be an input and/or an output
Dark rectangle: Dark rectangle represents a file that can be displayed as a track in crop-pedia
RNA Alternative Splicing: Why you need gapped aligners
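A short sketch may help show what a gapped (spliced) alignment looks like in practice. In the SAM format, the CIGAR operation N marks a region of the reference skipped by the read, which is how an intron appears when a read spans an exon junction. The Python below is a minimal illustration under that assumption; the function names and the example coordinates are hypothetical.

```python
import re

# One CIGAR element: a length followed by an operation code from the SAM specification.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def is_spliced(cigar):
    """True if the alignment skips reference sequence (an N operation)."""
    return any(op == "N" for _, op in CIGAR_ELEMENT.findall(cigar))


def reference_blocks(pos, cigar):
    """Return the reference intervals (1-based, inclusive) covered by a read.
    N operations split the read into separate blocks, e.g. adjacent exons."""
    blocks = []
    ref = pos
    block_start = None
    for length, op in CIGAR_ELEMENT.findall(cigar):
        length = int(length)
        if op in "MD=X":          # these operations consume reference bases
            if block_start is None:
                block_start = ref
            ref += length
        elif op == "N":           # skipped region: close the current block
            if block_start is not None:
                blocks.append((block_start, ref - 1))
                block_start = None
            ref += length
        # I, S, H and P do not consume reference bases
    if block_start is not None:
        blocks.append((block_start, ref - 1))
    return blocks


# Hypothetical 100-base read spanning a 1000-base intron.
print(reference_blocks(100, "50M1000N50M"))
```

An ungapped aligner has no way to represent the 1000-base jump, so such junction-spanning reads would go unmapped or be forced into poor alignments.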
  
RNASeq flow chart – reference (steps 1-4): http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html
Step 2: assemble-reads
assemble-reads: Trinity is used to assemble the RNA-Seq reads in each partition. This can be done in a massively parallel manner, typically requiring little RAM as compared to whole de novo RNA-Seq assemblies, and can be executed using standard hardware.
The first step (prep_rnaseq_alignments_for_genome_assisted_assembly.pl) partitions the reads according to covered regions.
Flow chart elements:
Gsnap.CoordSorted.bam
prep_rnaseq_alignments_for_genome_assisted_assembly.pl
find Dir_* -name "*reads" > read_files.list
read_files.list
GG_write_trinity_cmds.pl
Trinity_GG.cmds
ParaFly
find Dir_* -name "*inity.fasta" -exec cat {} \; | Inchworm_accession_incrementer.pl > Trinity_GG.fasta
Trinity_GG.fasta
RNASeq flow chart – reference (steps 1-4): http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html
Steps 3 and 4: align-transcripts and assemble-transcript alignments
Flow chart elements:
Trinity_GG.fasta
Launch_PASA_pipeline.pl
Pasa_databasename.pasa_assemblies.denovo_transcript_isoforms.gtf
Pasa_databasename.pasa_assemblies.denovo_transcript_isoforms.bed
Pasa_databasename.pasa_assemblies.denovo_transcript_isoforms.gff3
Pasa_databasename.pasa_assemblies.denovo_transcript_isoforms.fasta
RNASeq flow chart – Step 5 – Tuxedo Suite – using the output of the trinity-genome-guided assembly and the pasa and keygene annotation pipelines → call tuxedo suite (in parallel with then calling the abundance estimator RSEM)
Flow chart elements: Gff3 (gene model); Gff3togtf (convert to gtf format); Gtf (gene model); tophat (calls Bowtie2); Junctions.bed; Accepted.hits.sam
RNASeq Quantitation and Differential Analysis
Quantitation (matrix file with counts per isoform); Model building/Differential analysis
Flow chart elements: Trinity.fasta; Tuxedo suite; Trinity genome guided assembly; Abundance estimation RSEM; Transcripts .gtf/.gff*; trimmed PE* reads; RSEM.isoform.results; Limma Model Design/contrast matrix building; randomForest; pcAlg; Genie3.R; DREAM4; Accepted.hits.sam; cuffdiff2
•  Transcript annotation file produced by cufflinks, cuffcompare or other source
•  Counts and read group tracking files also created
Tracking and differential expression files: Isoforms.fpkm_tracking; Genes.fpkm.tracking; Cds.fpkm.tracking; Tss_groups.fpkm.tracking; Isoform_exp.diff; Gene_exp.diff; Tss_group_exp.diff; Cds_exp.diff
  
How much RNA-sequencing data, how much computation power and where do you go to compute?
How much RNA-sequencing data?
1.  20 million paired end reads ~ 2 GB of data
2.  100 million paired end reads ~ 10 GB of data
How much computation power?
1.  More memory, more processors, less time it takes to compute
2.  Outsource the analysis, still will need to store the results somewhere
Amazon web services
S3 storage
EC2 elastic cloud on-demand computational facility
Georgetown University High Performance Computer Core
matrix.georgetown.edu
UPENN Galaxy services
A growing number of tools enable RNA-Seq analysis
  
What percentage of reads are covered? What percentage of reads are mapped?
3’ Bias on transcript reads
1.  60-80% of reads are mapped
2.  Highest percentage of 3’ end of reads are mapped
3.  Reads need to be quality trimmed
Mapping tools bias exons to known genes
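The "percentage of reads mapped" figure can be estimated directly from an alignment file. The sketch below is a minimal Python illustration that reads SAM-format lines and uses the FLAG field from the SAM specification (0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary). The function name and the example records are hypothetical; in practice a tool such as samtools would normally report this number.

```python
def mapping_rate(sam_lines):
    """Fraction of primary alignment records that are mapped.
    Header lines (starting with '@') and secondary/supplementary
    records are ignored so each read end is counted once."""
    total = 0
    mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])      # FLAG is the second SAM column
        if flag & 0x900:           # 0x100 secondary or 0x800 supplementary
            continue
        total += 1
        if not flag & 0x4:         # 0x4 set means the read is unmapped
            mapped += 1
    return mapped / total if total else 0.0


# Hypothetical records: two mapped reads, one unmapped, one secondary hit.
example = [
    "@HD\tVN:1.4",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r3\t16\tchr1\t200\t60\t50M\t*\t0\t0\t*\t*",
    "r3\t256\tchr2\t300\t0\t50M\t*\t0\t0\t*\t*",
]
print(f"{mapping_rate(example):.1%} of reads mapped")
```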
  
Galaxy is a web-based tool committed to enabling a researcher (more than just for RNA-Seq)
How to visualize mapped results?
•  UCSC Genome Browser (Gbrowse)
•  Integrated Genome Browser (IGB)
•  Integrative Genomics Viewer (IGV)
Many shared formats, reading many of the outputs generated by the programs, ability to generate one's own tracks
  
[Figure: UCSC Genome Browser screenshot, hg19, around chr21:23,600,000-23,650,000 (50 kb scale bar). Custom tracks C7 Random and C7 Targeted show aligned sequencing reads (HWI-ST1129 read IDs) alongside ENCODE tracks (transcription factor ChIP-seq, H3K27Ac, DNaseI hypersensitivity clusters, ChIA-PET, DNA methylation, 5C), RefSeq Genes, UCSC Genes, human mRNAs and spliced ESTs, dbSNP 137, and vertebrate Multiz alignment and conservation.]
  
What do RNA-Seq reads look like for GAPDH?
Repeat masked allowing 1/2 mismatched bases blat’d reads viewed in IGB 6.7.2
  
RNA-Seq Differential Expression analysis
What does GAPDH look like in terms of quantitation?
         TOTAL BM                           HPP
         RPKM     3SEQ Counts  BLAT Reads   RPKM     3SEQ Counts  BLAT Reads
CD34     0.7      340          230          8        8            14
BST1     19.7     5374         -            31       31           -
CD133    0.2      173          176          16       16           33
THY1     0        7            -            4        4            -
A12      -        -            1            -        -            0
A5       -        -            0            -        -            0
ALK      0        9            24           0        0            3
B9       -        -            0            -        -            0
C1       -        -            0            -        -            0
C2       -        -            0            -        -            0
C7       -        -            0            -        -            0
E7       -        -            0            -        -            0
E9       -        -            2            -        -            0
F6       -        -            0            -        -            0
G12      -        -            0            -        -            0
GAPDH    3013.2   727831       356289       120.8    5559         2670
H3       -        -            0            -        -            0
(- = no value shown on the slide)
Blat read raw counts ratio == 3Seq counts ratio ~= 130 to 1
RPKM ratio ~= 24.3
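The ratios quoted for GAPDH can be recomputed from the table. The Python below is a small worked calculation using only the GAPDH row; the dictionary layout and function name are assumptions made for illustration.

```python
# GAPDH values copied from the table: Total BM versus HPP.
gapdh = {
    "total_bm": {"rpkm": 3013.2, "seq3_counts": 727831, "blat_reads": 356289},
    "hpp": {"rpkm": 120.8, "seq3_counts": 5559, "blat_reads": 2670},
}


def bm_to_hpp_ratio(metric):
    """Ratio of the Total BM value to the HPP value for one metric."""
    return gapdh["total_bm"][metric] / gapdh["hpp"][metric]


for metric in ("blat_reads", "seq3_counts", "rpkm"):
    print(f"{metric}: {bm_to_hpp_ratio(metric):.1f} to 1")
```

With these inputs both raw-count ratios land near 130 to 1 (about 133 and 131), while the RPKM ratio comes out near 24.9, close to the ~24.3 quoted above. The gap between raw-count and RPKM ratios reflects the per-library normalization built into RPKM.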
  
RNA-Seq Quantification Challenge: A problem that exists with RNA-Seq data that doesn’t exist with array data: Longer transcripts produce more reads than shorter transcripts
One solution to account for this is RPKM (FPKM used by Cufflinks)
RPKM = 10^9 x C / (N x L), which is simply C/N rescaled per kilobase of exon and per million mapped reads
C(gene) = the number of mappable reads that fall onto a gene's exons
N = total number of mappable reads in the experiment
L(gene) = the sum of the exons in base pairs
Wold (2008)
RPKM – reads per kilobase per million
CPM – counts per million
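The RPKM and CPM definitions above translate directly into code. The Python below is a minimal sketch; the function names and the example numbers (500 reads on a gene with 2,000 bp of exons, in a library of 20 million mapped reads) are hypothetical.

```python
def cpm(c, n):
    """Counts per million: reads on the gene per million mapped reads."""
    return 1e6 * c / n


def rpkm(c, n, l):
    """Reads per kilobase of exon per million mapped reads.
    c = reads on the gene's exons, n = total mapped reads,
    l = summed exon length in base pairs."""
    return 1e9 * c / (n * l)


# Hypothetical gene: 500 reads, 2,000 bp of exons, 20 million mapped reads.
print(cpm(500, 20_000_000))         # 25.0
print(rpkm(500, 20_000_000, 2000))  # 12.5
```

The two measures differ only by the length term: RPKM equals CPM divided by the exon length in kilobases, which is what corrects for longer transcripts producing more reads.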
  
RNA-Seq Quantification Challenge: DESeq Method uses the geometric mean of counts in all samples
DESeq Method:
Construct a "reference sample" by taking, for each gene, the geometric mean of the counts in all samples.
To get the sequencing depth of a sample relative to the reference, calculate for each gene the quotient of the counts in your sample divided by the counts of the reference sample.
Now you have, for each gene, an estimate of the depth ratio. Simply take the median of all the quotients to get the relative depth of the library.
The 'estimateSizeFactors' function of the DESeq package does this calculation.
DESeq: an R package that works with Raw Counts to determine genes differentially expressed across samples
•  Simon Anders
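The median-of-ratios procedure described above can be sketched in a few lines. This Python version follows the three steps on the slide and is only an illustration, not the DESeq package's own code. The function name and the toy count matrix are hypothetical, and genes with a zero count in any sample are skipped here because their geometric mean is zero.

```python
import math
from statistics import median


def estimate_size_factors(counts):
    """counts: list of genes, each a list of raw counts (one per sample).
    Returns one relative sequencing-depth factor per sample."""
    n_samples = len(counts[0])
    # Step 1: log of the per-gene geometric mean (the "reference sample").
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has non-zero counts in every sample")
    log_reference = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Step 2: per-gene quotient sample/reference, computed in log space.
        log_ratios = [math.log(row[j]) - ref for row, ref in zip(usable, log_reference)]
        # Step 3: the median quotient is the relative depth of the library.
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy matrix: sample 2 was sequenced exactly twice as deeply as sample 1.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
print(estimate_size_factors(toy_counts))
```

Dividing each sample's raw counts by its factor puts the libraries on a common scale. The median makes the estimate robust to a handful of strongly differentially expressed genes.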
  
Given a list of differentially expressed genes, enrichment analysis should now be performed
•  Enrichment analysis allows the researcher to leverage documented experiments which provide evidence for genes' roles in pathways and functions, enabling the researcher to determine the results and significance of their experiments
•  DAVID
   –  Gene ontology
   –  Functional ontology
•  REVIGO
   –  Output of DAVID may be placed in REVIGO for further interpretation and statistical exploration of significance of discovered sets of genes
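One common way to score enrichment is a hypergeometric (one-sided Fisher) test: how surprising is it to see this many pathway genes in the differentially expressed list? The Python below is a minimal sketch of that generic test and is not a description of how DAVID or REVIGO compute their statistics. The function name and the example numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `population` genes, of which `annotated` belong to the pathway."""
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical experiment: 20,000 genes, 200 in the pathway,
# 500 differentially expressed, 15 of those in the pathway.
print(enrichment_p_value(20_000, 200, 500, 15))
```

Because many gene sets are usually tested at once, the resulting p-values would normally be corrected for multiple testing before interpretation.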
  
Using differentially expressed genes, biological pathways should be explored
•  Differentially expressed genes are put into programs such as Pathway Studio or Ingenuity
•  Shortest path programs
•  Canonical pathway analysis
•  Enables a researcher to reverse engineer the pathways expressed in the course of a healthy response to a diseased response
•  Ideally a pathway reveals the observed phenotype – connecting the expressed gene expression program with the phenotype – genotype – gene expression program to phenotype
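The idea behind "shortest path programs" can be shown on a toy network. The Python below is a minimal breadth-first search over a directed interaction graph. It is a generic sketch, not the algorithm used by Pathway Studio or Ingenuity, and the node names are hypothetical placeholders rather than real interactions.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search: returns the path with the fewest edges from
    source to target as a list of nodes, or None if target is unreachable."""
    if source == target:
        return [source]
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == target:
                # Walk back through the predecessors to rebuild the path.
                path = [target]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical directed network: keys regulate the listed targets.
toy_network = {
    "GENE_A": ["GENE_B", "GENE_C"],
    "GENE_B": ["GENE_D"],
    "GENE_C": ["GENE_D"],
    "GENE_D": ["GENE_E"],
}
print(shortest_path(toy_network, "GENE_A", "GENE_E"))
```

Connecting two differentially expressed genes through the fewest known interactions is one simple way to propose candidate intermediates for follow-up.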
  
RNA-Sequencing: What is it good for?
•  Transcript Annotation
   –  Mutation identification
   –  Isoform determination
   –  Alternative Splice Variation
•  Differential Gene Expression
   –  Phenotypically segregating experiments
   –  Allows us to get at the How in looking at the response of an organism within a particular cell population to events
   –  Good and careful design will allow us to unfold the dynamics of this response and identify targets for altering disease responses to improve one's chances of surviving
http://bayes.cs.ucla.edu/home.htm
  
Acknowledgements
Dr. Anton Wellstein
Dr. Anna Riegel
Dr. Marcel Schmidt
Dr. Elena Tassi
The entire lab: Elena, Virginie, Ghada, Ivana, Eveline, Khalid, Eric, the entire Wellstein/Riegel laboratory
My Committee
Dr. Yuri Gusev
Dr. Anatoly Dritschilo
Dr. Michael Johnson
Dr. Christopher Loffredo
Dr. Habtom Ressom
Dr. Terry Ryan (external committee member)
High Performance Core Group, Steve Moore, especially Woonki Chung
Amazon Cloud Services
Dr. Ann Loraine, UNC, IGB Developer
Brian Haas, Author Trinity Suite
  
	
  
	
  
Some Resources
•  http://rnaseq.uoregon.edu/index.html
•  http://dx.doi.org/10.1038/npre.2010.4282.1 (DESeq)
•  http://galaxy.psu.edu/
•  http://seqanswers.com/
•  http://www.broadinstitute.org/igv/
•  http://bioviz.org/igb/index.html
•  http://www.illumina.com
•  http://www.otogenetics.com
•  http://www.dnanexus.com
•  http://bioconductor.org/packages/2.12/bioc/html/limma.html
•  http://trinityrnaseq.sourceforge.net/
•  http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html
•  http://cufflinks.cbcb.umd.edu/
•  http://brb.nci.nih.gov/BRB-ArrayTools.html
•  http://www.modernatx.com/
  
Systems Biology History (wikipedia)
•  Systems biology roots found in
   –  Quantitative modeling of enzyme kinetics
   –  Mathematical modeling of population growth
   –  Simulations to study neurophysiology
   –  Control theory and cybernetics
•  Theorists
   –  Ludwig von Bertalanffy – General Systems Theory
   –  Alan Lloyd Hodgkin and Andrew Fielding Huxley – constructed a mathematical model that explained potential propagating along the axon of a neuron cell
   –  Denis Noble – first computer model of the heart pacemaker
Scientific knowledge is limited (and advanced) by the limits (and advancements) of measurement
•  Ilya Shmulevich, Genomic Signal Processing: “Validity of the model involves observation and measurement, scientific knowledge is limited by the limits of measurement”
•  Erwin Schrödinger, Science Theory and Man: “It really is the ultimate purpose of all schemes and models to serve as scaffolding for any observations that are at all means observable”

2013 july 25 systems biology rna seq v2

  • 1. Cancer Systems Biology: RNA-Seq and Differential Expression Analysis
      Taking advantage of a Measurement Revolution
      July 25, 2013
      Anne Deslattes Mays, Wellstein/Riegel Laboratory
      Mentor: Anton Wellstein, MD, PhD
  • 2. Talk Outline
      •  On the Shoulders of Giants
      •  Sequencing Timeline
      •  RNASeq for Everyone
      •  RNA-Sequencing Details
      •  Differential Expression Analysis
      •  Causality
      •  Cancer Therapeutics Example
      •  Ask Bigger Questions – Sequencing Everything
  • 3. Rosalind Franklin
      “pioneered use of x-rays to create images of unorganized matter – such as large biological molecules – not just single crystals”
      http://www.pbs.org/wgbh/aso/databank/entries/bofran.html
      “Franklin made equipment adjustments to produce an extremely fine beam of x-rays. She extracted finer DNA fibers than ever before and arranged them in parallel bundles. Studied fibers’ reactions to humid conditions. … allowed her to discover crucial keys to DNA’s structure…. Wilkins shared this with Watson & Crick at Cambridge without her knowledge…”
  • 4. Sequencing Timeline
  • 5. Human Sequencing Timeline
      Key Technical Advances: Celera Human Sequence done in one location on the largest supercomputer in private hands at that time
  • 13. Cancer Systems Biology
      Taking advantage of measurement revolution
      Declining sequencing costs, decreasing computing costs
      How do you leverage all this data?
      (Figure panels: GEO, May 25, 2012; GEO, June 25, 2013)
  • 14. Here is an example RNA-Seq Workflow
      Experimental Design; Sample Collection; Quality Control; Read Trimming; Differential Analysis; Transcript Identification; Pathway Analysis; Feature Discovery; Sequencing
  • 15-21. Figures from http://rnaseq.uoregon.edu/index.html
  • 22. Replicates: Type I and Type II errors
  • 23. Detecting Signal vs. Noise
  • 25. What is the goal of the sequencing experiment?
  • 28. Before library construction, RNA quality must be assessed
      Before Library Construction
      1.  Most vendors and cores will assess the quality of the RNA before sequencing
      2.  Important to determine before sequencing begins
      Garbage in == Garbage out
  • 29. RNA-seq
  • 30. Three steps to get to a fresh sequence with the Illumina Genome Sequence Analyzer
      •  Library generation
      •  Cluster generation
      •  Sequencing
  • 31. Library Construction: Messenger RNAs are poly-A selected from total RNA, fragmented, and cDNA is synthesized
      Before Library Construction
      1.  Poly-A selection (total RNA -> mRNA)
      2.  mRNA fragmentation
      3.  First strand synthesis (here we stop if we want to maintain strand specificity)
      4.  Second strand synthesis
      Other techniques
      1.  Ribozero
      2.  Ribominus
  • 32. Library Construction: End repair and adenylation results in adapter-ligation-ready constructs
      cDNA (single or double stranded)
      1.  cDNA is blunt end-repaired and phosphorylated (B.)
      2.  A-base added to prepare for indexed adapter ligation (C.)
  • 33. Library Construction: Adapter ligation results in cluster-generation-ready constructs
      Index adapter ligation and product ready for amplification on cBot or the cluster station
      1.  Strand specific tags are added to the A base: ligate index adapter (D)
      2.  Denature and amplify for final product (E)
  • 34. Cluster Generation: In the Illumina cBot system, single molecules are isothermally amplified in a flow cell to prepare them for sequencing
      Single DNA molecules hybridize to the lawn of oligos grafted to the surface of the flow cell
      1.  Oligo lawn
      2.  Oligos hybridize to the adapters that had been ligated to the library fragments which flow through the cell
  • 35. Cluster Generation: Bound fragments are extended to make copies and reverse strands cleaved and washed away
      Bridge amplifications resulting in 100s of millions of unique clusters
      1.  Each fragment is clonally amplified through a series of extensions and isothermal bridge amplifications
      2.  Reverse strands cleaved and washed away
      3.  Ends are blocked
      4.  Sequencing primer hybridized to the DNA template
      5.  Libraries are ready for sequencing
  • 36. Sequencing: 100s of millions of clusters sequenced simultaneously
      4 fluorescently labeled reversibly terminated nucleotides
      1.  Each base competes for addition
      2.  Natural competition ensures highest accuracy
      3.  After each round of synthesis, clusters are excited by a laser, emitting a color that identifies the newly added base
      4.  Fluorescent label and blocking group are removed, allowing for addition of the next nucleotide
      5.  Proprietary (Illumina) chemistry reads a base in each cycle
      6.  Allows for accurate sequencing through difficult regions such as homopolymers and repetitive sequence
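The per-cycle readout described on this slide can be sketched in a few lines of Python. This is a toy illustration only: the channel order, the intensity values, and the function name `call_bases` are assumptions made for the example, and real base callers also correct for effects such as channel cross-talk and phasing that are ignored here.

```python
# Toy base calling: in each cycle, the brightest of four fluorescence
# channels is taken to identify the base added to the cluster.
# The channel order and the intensity values are made up for illustration.
CHANNELS = ("A", "C", "G", "T")


def call_bases(intensities_per_cycle):
    """Return the called read for one cluster, one base per cycle."""
    read = []
    for cycle in intensities_per_cycle:
        # Index of the channel with the highest signal in this cycle.
        brightest = max(range(len(CHANNELS)), key=lambda i: cycle[i])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Three sequencing cycles for a single hypothetical cluster.
cluster_signal = [
    (0.90, 0.10, 0.05, 0.10),
    (0.10, 0.10, 0.80, 0.20),
    (0.00, 0.10, 0.10, 0.95),
]
called_read = call_bases(cluster_signal)
print(called_read)
```

Because every cluster is read in the same cycle, the same loop would in principle run over hundreds of millions of clusters at once.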
  • 37. There are other ways to inquire about the Transcriptome
      •  Array Based Technologies
          –  Affymetrix
          –  Agilent
          –  Known genes and hybridization protocols
      •  Microarray
          –  20,000+ array experiments on a single platform
          –  Edge effects
          –  False positives / false negatives
      •  Bead-based arrays
      •  Tiling arrays
      •  SAGE
  • 38. What is unique about RNA-Seq?
      •  Allows you to discover and profile the entire transcriptome of any organism
      •  No probes or primers to design
      •  Novel transcripts
      •  Novel isoforms
      •  Alternative splice sites
      •  Rare transcripts
      •  cSNPs
      All of this in one experiment
  • 39. After sequencing, we must then perform RNA-Seq Data Analysis
      After sequencing…
      1.  Quality control: trim your reads
      2.  Count reads
          •  Align to genome
          •  Align to transcriptome
      3.  Interpret data
          •  Statistical tests (differential expression analysis)
          •  Visualization (mapped reads)
          •  Pathway analysis
      Not so simple: big data, big compute requirements
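To make the "count reads, then interpret" steps concrete, here is a minimal Python sketch that normalizes raw read counts to counts per million (CPM) and computes a log2 fold change between two samples. The gene names, counts, and pseudocount are invented for illustration. It is not a statistical test: a real differential expression analysis needs replicates and a proper statistical model.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts by library size so samples are comparable."""
    library_size = sum(counts.values())
    return {gene: 1e6 * n / library_size for gene, n in counts.items()}


def log2_fold_change(cpm_reference, cpm_condition, pseudocount=1.0):
    """log2(condition / reference) per gene; the pseudocount avoids log(0)."""
    return {
        gene: math.log2(
            (cpm_condition[gene] + pseudocount) / (cpm_reference[gene] + pseudocount)
        )
        for gene in cpm_reference
    }


# Hypothetical raw counts; the treated library was sequenced twice as deeply.
control = {"GENE_A": 100, "GENE_B": 100, "GENE_C": 800}
treated = {"GENE_A": 400, "GENE_B": 200, "GENE_C": 1400}

cpm_control = counts_per_million(control)
cpm_treated = counts_per_million(treated)
lfc = log2_fold_change(cpm_control, cpm_treated)

for gene in sorted(lfc):
    print(gene, round(lfc[gene], 3))
```

GENE_B doubles in raw counts but is unchanged after normalization, because the treated library is twice as large. This is why raw counts should not be compared directly.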
  • 42. RNASeq flow chart, reference (steps 1-4): http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html
      Step 1: align-reads
      Align-reads: Gsnap is used to align reads to the genome sequence.
      Flow chart elements: FASTQ PE* reads; trimmomatic; trimmed PE* reads; FASTQC; quality control consensus per read length graphs; Reference Genome Assembly WGS; existing gene models (gtf files w/ tss ids)*; gene models mapped to reference; gsnap; Gsnap aligned bam files; samtools; Gsnap.CoordSorted.bam
      •  Tss ids = transcription start site ids, in a gtf file format
      •  PE = paired end
      •  The gene models that are built with the pasa pipeline can be input to tophat
      Legend
      •  Unshaded rectangle: code to be run (a process)
      •  Shaded rectangle: a file or a graphic which may be an input and/or an output
      •  Dark rectangle: a file that can be displayed as a track in crop-pedia
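The read-trimming step before alignment can be illustrated with a short Python sketch. It decodes a FASTQ quality string (assuming the common Phred+33 encoding) and removes low-quality bases from the 3' end of a read. This is a simplified stand-in for the idea, not Trimmomatic's actual algorithm. The threshold, the function names, and the example read are assumptions.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_trailing_low_quality(sequence, quality_string, min_quality=20):
    """Drop bases from the 3' end until one meets the quality threshold."""
    scores = phred_scores(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


# Hypothetical read: eight high-quality bases followed by two poor ones.
read_sequence = "ACGTACGTAC"
read_quality = "IIIIIIII#!"

trimmed_sequence, trimmed_quality = trim_trailing_low_quality(
    read_sequence, read_quality
)
print(trimmed_sequence, trimmed_quality)
```

For paired-end data, both mates would be trimmed and the pair kept together. That bookkeeping is omitted here.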
  • 43. RNA Alternative Splicing: Why you need gapped aligners
  • 44. RNASeq flow chart – reference (steps 1-4): http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html
      Step 2: assemble-reads. Trinity is used to assemble the RNA-Seq reads in each partition. This can be done in a massively parallel manner, typically requires little RAM compared to whole de novo RNA-Seq assemblies, and can be executed on standard hardware. The first step (prep_rnaseq_alignments_for_genome_assisted_assembly.pl) partitions the reads according to covered regions.
      Flow, in slide order:
      1. Gsnap.CoordSorted.bam is the input to prep_rnaseq_alignments_for_genome_assisted_assembly.pl
      2. find Dir_* -name "*reads" > read_files.list
      3. GG_write_trinity_cmds.pl takes read_files.list and writes Trinity_GG.cmds
      4. ParaFly runs the commands in Trinity_GG.cmds
      5. find Dir_* -name "*inity.fasta" -exec cat {} \; | inchworm_accession_incrementer.pl > Trinity_GG.fasta
  • 45. RNASeq flow chart – reference (steps 1-4): http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html
      Steps 3 and 4: align-transcripts and assemble-transcript alignments
      Process: Launch_PASA_pipeline.pl
      Input: Trinity_GG.fasta
      Outputs:
      • Pasa_databasename.pasa_assemblies.denovo_transcript_isoforms.gtf
      • Pasa_databasename.pasa_assemblies.denovo_transcript_isoforms.bed
      • Pasa_databasename.pasa_assemblies.denovo_transcript_isoforms.gff3
      • Pasa_databasename.pasa_assemblies.denovo_transcript_isoforms.fasta
  • 46. RNASeq flow chart – Step 5 – Tuxedo suite: using the output of the Trinity genome-guided assembly and the pasa and keygene annotation pipelines → call the Tuxedo suite (in parallel with calling the abundance estimator RSEM)
      Diagram elements, in slide order: Gff3 (gene model); Gff3togtf (convert to gtf format); Gtf (gene model); tophat (calls Bowtie2); Junctions.bed; Accepted.hits.sam
  • 47. RNASeq Quantitation and Differential Analysis
      Quantitation (matrix file with counts per isoform):
      • Trinity genome guided assembly: Trinity.fasta; trimmed PE reads; abundance estimation with RSEM; RSEM.isoform.results
      • Tuxedo suite: Transcripts .gtf/.gff*; Accepted.hits.sam; cuffdiff2
        – *Transcript annotation file produced by cufflinks, cuffcompare or other source
        – Counts and read group tracking files also created
        – Tracking files: isoforms.fpkm_tracking, genes.fpkm_tracking, cds.fpkm_tracking, tss_groups.fpkm_tracking
        – Differential expression files: isoform_exp.diff, gene_exp.diff, tss_group_exp.diff, cds_exp.diff
      Model building/Differential analysis:
      • Limma model: design/contrast matrix building
      • randomForest
      • pcAlg
      • Genie3.R
      • DREAM4
  • 48. How much RNA-sequencing data, how much computation power and where do you go to compute?
      How much RNA-sequencing data?
      1. 20 million paired-end reads ~ 2 GB of data
      2. 100 million paired-end reads ~ 10 GB of data
      How much computation power?
      1. The more memory and processors, the less time it takes to compute
      2. If you outsource the analysis, you will still need to store the results somewhere
      Where to compute:
      • Amazon Web Services: S3 storage; EC2 (Elastic Compute Cloud), an on-demand computational facility
      • Georgetown University High Performance Computer Core: matrix.georgetown.edu
      • UPENN Galaxy services
  • 49. A growing number of tools enable RNA-Seq analysis
  • 50. What percentage of reads are covered? What percentage of reads are mapped?
      3' bias on transcript reads
      1. 60-80% of reads are mapped
      2. The highest percentage of mapped reads falls at the 3' end
      3. Reads need to be quality trimmed
      Mapping tools are biased toward exons of known genes
  • 51. Galaxy is a web-based tool committed to enabling researchers (for more than just RNA-Seq)
  • 53. How to visualize mapped results?
      • UCSC Genome Browser (Gbrowse)
      • Integrated Genome Browser (IGB)
      • Integrative Genomics Viewer (IGV)
      These share many formats, read many of the outputs generated by the programs, and can display one's own tracks
  • 54. [Figure: UCSC Genome Browser screenshot of chr21 (hg19), 50 kb scale, around positions 23,600,000 to 23,650,000. Custom tracks "C7 Random" and "C7 Targeted" show individual sequencing reads. Reference tracks shown: UCSC Genes, RefSeq Genes, human mRNAs from GenBank, spliced human ESTs, ENCODE tracks (transcription factor ChIP-seq, layered H3K27Ac, DNaseI hypersensitivity clusters, ChIA-PET, 5C chromatin interactions, DNA methylation), SwitchGear transcription start sites, ORegAnno regulatory elements, dbSNP 137, and vertebrate Multiz alignment and conservation (46 species).]
  • 57. What do RNA-Seq reads look like for GAPDH?
      Repeat-masked, BLAT'd reads (allowing 1/2 mismatched bases), viewed in IGB 6.7.2
  • 58. RNA-Seq Differential Expression analysis
  • 59. What does GAPDH look like in terms of quantitation?
      (- = no value shown on the slide)
                TOTAL BM                           HPP
      Gene      RPKM     3SEQ Counts  BLAT Reads   RPKM     3SEQ Counts  BLAT Reads
      CD34      0.7      340          230          8        8            14
      BST1      19.7     5374         -            31       31           -
      CD133     0.2      173          176          16       16           33
      THY1      0        7            -            4        4            -
      A12       -        -            1            -        -            0
      A5        -        -            0            -        -            0
      ALK       0        9            24           0        0            3
      B9        -        -            0            -        -            0
      C1        -        -            0            -        -            0
      C2        -        -            0            -        -            0
      C7        -        -            0            -        -            0
      E7        -        -            0            -        -            0
      E9        -        -            2            -        -            0
      F6        -        -            0            -        -            0
      G12       -        -            0            -        -            0
      GAPDH     3013.2   727831       356289       120.8    5559         2670
      H3        -        -            0            -        -            0
      BLAT read raw counts ratio == 3SEQ counts ratio ~= 130 to 1
      RPKM ratio ~= 24.3
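As a quick check on the GAPDH row, the sketch below recomputes the Total BM to HPP ratios directly from the six values in the table. The dictionary keys are hypothetical labels chosen for this example; the ratios computed this way may differ slightly from the rounded figures quoted on the slide.

```python
# GAPDH values copied from the table above (Total BM vs. HPP).
gapdh = {
    "total_bm": {"rpkm": 3013.2, "seq3_counts": 727831, "blat_reads": 356289},
    "hpp": {"rpkm": 120.8, "seq3_counts": 5559, "blat_reads": 2670},
}


def fold_ratios(numerator, denominator):
    """Return numerator/denominator for every measurement the two samples share."""
    return {key: numerator[key] / denominator[key] for key in numerator}


ratios = fold_ratios(gapdh["total_bm"], gapdh["hpp"])
for measurement, value in ratios.items():
    print(f"{measurement}: {value:.1f} to 1")
```

The two raw-count ratios agree closely with each other, while the RPKM ratio is much smaller because RPKM also divides by each sample's total number of mapped reads.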
  • 60. RNA-Seq Quantification Challenge: a problem that exists with RNA-Seq data that doesn't exist with array data: longer transcripts produce more reads than shorter transcripts
      One solution to account for this is RPKM (FPKM is used by Cufflinks)
      RPKM = 10^9 x C / (N x L), i.e. C/N scaled to reads per million and divided by the transcript length in kilobases
      C(gene) = the number of mappable reads that fall onto a gene's exons
      N = total number of mappable reads in the experiment
      L(gene) = the sum of the exon lengths in base pairs
      Wold (2008)
      RPKM – reads per kilobase per million
      CPM – counts per million
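The RPKM and CPM definitions above can be written out as a short sketch. The function names and the example numbers (500 reads on a 2,000 bp gene in a library of 20 million mapped reads) are hypothetical and serve only to make the arithmetic concrete.

```python
def rpkm(c, n, l):
    """Reads per kilobase of exon per million mapped reads.

    c -- mappable reads falling on the gene's exons
    n -- total mappable reads in the experiment
    l -- summed exon length of the gene in base pairs
    """
    return 1e9 * c / (n * l)


def cpm(c, n):
    """Counts per million mapped reads (no length correction)."""
    return 1e6 * c / n


# Hypothetical gene: 500 reads, 2,000 bp of exons, 20 million mapped reads.
print(rpkm(500, 20_000_000, 2_000))  # 12.5
print(cpm(500, 20_000_000))          # 25.0

# A transcript twice as long with twice the reads gets the same RPKM,
# which is the length correction the slide describes.
print(rpkm(1_000, 20_000_000, 4_000))  # 12.5
```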
  • 61. RNA-Seq Quantification Challenge: the DESeq method uses the geometric mean of counts in all samples
      DESeq method:
      Construct a "reference sample" by taking, for each gene, the geometric mean of the counts in all samples.
      To get the sequencing depth of a sample relative to the reference, calculate for each gene the quotient of the counts in your sample divided by the counts of the reference sample.
      Now you have, for each gene, an estimate of the depth ratio.
      Simply take the median of all the quotients to get the relative depth of the library.
      The 'estimateSizeFactors' function of the DESeq package does this calculation.
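The median-of-ratios procedure described above can be sketched in plain Python. This is an illustration of the idea rather than the DESeq implementation: the function name and the toy count matrix are hypothetical, and the choice to skip genes with a zero count in any sample (whose geometric mean would be zero) is an assumption made here to keep the ratios defined.

```python
import math
from statistics import median


def estimate_size_factors(counts):
    """Median-of-ratios size factors.

    counts -- list of genes, each a list of raw counts (one per sample)
    Returns one relative depth estimate per sample.
    """
    n_samples = len(counts[0])
    # Reference sample: per-gene geometric mean, computed in log space.
    # Assumption: genes with any zero count are skipped.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_reference = [sum(math.log(c) for c in row) / n_samples for row in usable]

    factors = []
    for j in range(n_samples):
        # Per-gene log quotient of this sample against the reference.
        log_ratios = [math.log(row[j]) - ref for row, ref in zip(usable, log_reference)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy matrix: sample 2 was sequenced twice as deeply as sample 1.
toy_counts = [[10, 20], [100, 200], [50, 100], [0, 3]]
size_factors = estimate_size_factors(toy_counts)
print(size_factors)
```

Dividing each sample's counts by its size factor puts the samples on a common scale; because the median is used, a handful of strongly differentially expressed genes has little influence on the estimate.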
  • 62. DESeq: an R package that works with raw counts to determine genes differentially expressed across samples
      • Simon Anders
  • 66. Given a list of differentially expressed genes, enrichment analysis should now be performed
      • Enrichment analysis allows the researcher to leverage documented experiments that provide evidence for genes' roles in pathways and functions, helping the researcher determine the results and significance of their experiments
      • DAVID
        – Gene ontology
        – Functional ontology
      • REVIGO
        – Output of DAVID may be placed in REVIGO for further interpretation and statistical exploration of the significance of discovered sets of genes
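The statistic behind this kind of enrichment analysis can be sketched with a one-sided hypergeometric test: how likely is it to see at least k pathway genes among n differentially expressed genes by chance? This is a generic illustration, not the exact statistic DAVID reports (DAVID uses a modified Fisher exact test); the function name and all numbers in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes, pathway_genes, de_genes, overlap):
    """P(X >= overlap) when de_genes are drawn without replacement from
    total_genes, of which pathway_genes belong to the gene set."""
    max_overlap = min(pathway_genes, de_genes)
    favourable = sum(
        comb(pathway_genes, i) * comb(total_genes - pathway_genes, de_genes - i)
        for i in range(overlap, max_overlap + 1)
    )
    return favourable / comb(total_genes, de_genes)


# Hypothetical example: 20,000 genes, a 100-gene pathway, 200 differentially
# expressed genes, 5 of which fall in the pathway (about 1 expected by chance).
p = enrichment_p_value(20_000, 100, 200, 5)
print(p)
```

In practice many gene sets are tested at once, so the resulting p-values need a multiple-testing correction before they are interpreted.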
  • 67. Using differentially expressed genes, biological pathways should be explored
      • Differentially expressed genes are put into programs such as Pathway Studio or Ingenuity
      • Shortest-path programs and canonical pathway analysis
      • Enables a researcher to reverse engineer the pathways expressed in the course of a healthy response to a diseased response
      • Ideally a pathway reveals the observed phenotype, connecting the gene expression program with the phenotype: genotype to gene expression program to phenotype
  • 68. RNA-Sequencing: What is it good for?
      • Transcript annotation
        – Mutation identification
        – Isoform determination
        – Alternative splice variation
      • Differential gene expression
        – Phenotypically segregating experiments
        – Allows us to get at the How in looking at the response of an organism within a particular cell population to events
        – Good and careful design will allow us to unfold the dynamics of this response and identify targets for altering disease responses to improve one's chances of surviving
  • 70. http://bayes.cs.ucla.edu/home.htm
  • 74. Acknowledgements
      Dr. Anton Wellstein
      Dr. Anna Riegel
      Dr. Marcel Schmidt
      Dr. Elena Tassi
      The entire Wellstein/Riegel laboratory: Elena, Virginie, Ghada, Ivana, Eveline, Khalid, Eric
      My Committee:
      Dr. Yuri Gusev
      Dr. Anatoly Dritschilo
      Dr. Michael Johnson
      Dr. Christopher Loffredo
      Dr. Habtom Ressom
      Dr. Terry Ryan (external committee member)
      High Performance Core Group, Steve Moore, especially Woonki Chung
      Amazon Cloud Services
      Dr. Ann Loraine, UNC, IGB Developer
      Brian Haas, Author, Trinity Suite
  • 75. Some Resources
      • http://rnaseq.uoregon.edu/index.html
      • http://dx.doi.org/10.1038/npre.2010.4282.1 (DESeq)
      • http://galaxy.psu.edu/
      • http://seqanswers.com/
      • http://www.broadinstitute.org/igv/
      • http://bioviz.org/igb/index.html
      • http://www.illumina.com
      • http://www.otogenetics.com
      • http://www.dnanexus.com
      • http://bioconductor.org/packages/2.12/bioc/html/limma.html
      • http://trinityrnaseq.sourceforge.net/
      • http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html
      • http://cufflinks.cbcb.umd.edu/
      • http://brb.nci.nih.gov/BRB-ArrayTools.html
      • http://www.modernatx.com/
  • 76. Systems Biology History (Wikipedia)
      • Systems biology roots found in
        – Quantitative modeling of enzyme kinetics
        – Mathematical modeling of population growth
        – Simulations to study neurophysiology
        – Control theory and cybernetics
      • Theorists
        – Ludwig von Bertalanffy – General Systems Theory
        – Alan Lloyd Hodgkin and Andrew Fielding Huxley – constructed a mathematical model that explained the action potential propagating along the axon of a neuronal cell
        – Denis Noble – first computer model of the heart pacemaker
  • 77. Scientific knowledge is limited (and advanced) by the limits (and advancements) of measurement
      • Ilya Shmulevich, Genomic Signal Processing: "Validity of the model involves observation and measurement, scientific knowledge is limited by the limits of measurement"
      • Erwin Schrödinger, Science Theory and Man: "It really is the ultimate purpose of all schemes and models to serve as scaffolding for any observations that are at all means observable"