Differential expression in the paper wasp Polistes dominula
Daniel S. Standage, Brendel Group Meeting, 17 Oct 2013


Experimental design
 • 6 queen samples
 • 6 worker samples
 • Queen x and worker x from same colony (x ∈ [1 .. 6])
 • Hypothesis: identify handful of critical caste-related genes/transcripts
Initial (naïve) analysis with RSEM/EBSeq
 • 209,675 transcripts (assembled by Trinity)
 • RSEM and EBSeq completed without warnings
 • 80-85% reads mapped
 • Many DE transcripts reported
    • 5,769 (FDR=.05)
    • 4,763 (FDR=.01)
    • 3,878 (FDR=.001)
Permutation testing
 • Randomly shuffle caste labels (queen or worker)
 • Re-run differential expression analysis
 • Repeat test
 • Compare number of transcripts reported as DE for each permutation
https://github.com/standage/dept
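
The label-shuffling step can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in the repository linked above: the function name `shuffle_caste_labels`, the seed, and the sample names are assumptions, and the differential expression run that would follow each shuffle is not shown.

```python
import random

def shuffle_caste_labels(labels, n_permutations, seed=0):
    """Return n_permutations shuffled copies of the caste labels.

    Hypothetical helper: every copy keeps the original number of
    'queen' and 'worker' labels and only reassigns them to samples.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    result = []
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        result.append(shuffled)
    return result

# Assumed sample naming: Q1..Q6 are queens, W1..W6 are workers.
samples = [f"Q{i}" for i in range(1, 7)] + [f"W{i}" for i in range(1, 7)]
labels = ["queen"] * 6 + ["worker"] * 6

permuted = shuffle_caste_labels(labels, n_permutations=3)
for number, perm in enumerate(permuted, start=1):
    # Each shuffled labeling would be passed to the DE pipeline in place
    # of the true labels.
    print(number, dict(zip(samples, perm)))
```

A simple shuffle like this ignores the queen/worker pairing within colonies; whether the pairing should be preserved is a design choice the slides do not state.
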
Permutation testing
Data              DE transcripts
Real data         4,763 (FDR=.01)
Permutation 1     5,112
Permutation 2     4,174
Permutation 3     4,474
Permutation 4     4,307
Permutation 5     4,718
Permutation 6     4,312
Permutation 7     4,171
Permutation 8     4,714
Permutation 9     3,828
Permutation 10    5,192
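
One way to read these counts is to ask how often a shuffled labeling reports at least as many DE transcripts as the real labeling. The sketch below does that arithmetic with the numbers from the slide. The empirical p-value with a +1 correction is a common convention added here for illustration; it is not something the slides compute.

```python
real_de = 4763
permuted_de = [5112, 4174, 4474, 4307, 4718, 4312, 4171, 4714, 3828, 5192]

# Average number of DE transcripts reported for shuffled labels.
mean_permuted = sum(permuted_de) / len(permuted_de)

# Permutations reporting at least as many DE transcripts as the real data.
n_at_least_real = sum(1 for n in permuted_de if n >= real_de)

# Empirical p-value with the usual +1 correction (an added convention).
empirical_p = (n_at_least_real + 1) / (len(permuted_de) + 1)

print(f"mean over permutations: {mean_permuted:.1f}")
print(f"permutations >= real:   {n_at_least_real} of {len(permuted_de)}")
print(f"empirical p-value:      {empirical_p:.3f}")
```

The shuffled labelings average about 4,500 DE transcripts, and 2 of the 10 exceed the real count, so the real result is not distinguishable from what random labels produce.
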
Some observations
 • Some expression levels very low
 • Some transcripts had very few reads mapped
 • Some transcripts had many reads mapped
 • Difficulty normalizing over large dynamic range?
Filter transcripts
 • Reads mapped
    • queen/worker reads mapped > 2,500
    • overall reads < 1,000,000
 • Samples
    • 4+ queen/worker samples with > 0 reads mapped
 • Distribution of reads mapped
    • mean(queen/worker reads mapped) * 0.9 > stdev(queen/worker reads mapped)
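
The filter criteria above could be expressed roughly as follows. The slide leaves several details open, so this sketch makes explicit assumptions: "queen/worker" is read as "applied to each caste separately", a transcript is kept only if both castes pass, and the standard deviation is the sample standard deviation. The function names are hypothetical.

```python
from statistics import mean, stdev

def caste_passes(reads, min_total=2500, min_nonzero=4, mean_factor=0.9):
    """Check the per-caste criteria for one transcript (assumed reading).

    reads: reads mapped to the transcript in each sample of one caste.
    """
    return (
        sum(reads) > min_total                          # enough reads mapped
        and sum(1 for r in reads if r > 0) >= min_nonzero  # 4+ samples with reads
        and mean(reads) * mean_factor > stdev(reads)    # not dominated by outliers
    )

def keep_transcript(queen_reads, worker_reads, max_overall=1_000_000):
    """Keep a transcript if the overall cap and both castes' criteria hold."""
    overall = sum(queen_reads) + sum(worker_reads)
    return (
        overall < max_overall
        and caste_passes(queen_reads)
        and caste_passes(worker_reads)
    )

# Reads mapped for PdomTSAr1.1-034114, taken from the table later in the deck.
queen = [5232, 10046, 9188, 7920, 27862, 2582]
worker = [5866, 2046, 2628, 4308, 7396, 9132]
print(keep_transcript(queen, worker))

# A made-up transcript whose queen reads all come from a single sample.
print(keep_transcript([0, 0, 0, 0, 0, 30000], worker))
```

Under these assumptions the first transcript is kept and the made-up one is dropped, since only one queen sample has any reads.
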
DE analysis on filtered transcripts
 • 40,498 transcripts
 • RSEM/EBSeq completed without warnings
 • 20-35% reads mapped
 • Still many DE transcripts reported
    • 1,680 (FDR=.05)
    • 1,328 (FDR=.01)
    • 1,037 (FDR=.001)
PdomTSAr1.1-034114 (FC=126)
Sample    Expression    Reads mapped    Reads (adjusted)
Q1              0.00            5232             5669.09
Q2              0.00           10046             5148.89
Q3             51.18            9188             6644.97
Q4            136.68            7920             6901.36
Q5            698.51           27862             6712.76
Q6              0.00            2582             5739.05
W1              0.00            5866             6920.72
W2              0.00            2046             5029.50
W3              0.00            2628             5879.19
W4              0.00            4308             5022.74
W5              0.00            7396             5983.82
W6              0.00            9132             6467.88
PdomTSAr1.1-007723 (FC=2)
Sample    Expression    Reads mapped    Reads (adjusted)
Q1            198.82             928             1005.53
Q2            445.48            1864              955.36
Q3            335.03            1330              961.89
Q4            267.42            1048              913.21
Q5            908.57            3988              960.82
Q6            114.54             458             1018.00
W1            125.65             714              842.38
W2              0.00             318              781.71
W3             78.41             426              953.02
W4            116.07             650              757.84
W5            161.56            1028              831.72
W6            147.01            1262              893.83
RSEM expected count
 • 'expected_count' is the sum of the posterior probability of each read comes from this transcript over all reads. Because 1) each read aligning to this transcript has a probability of being generated from background noise; 2) RSEM may filter some alignable low quality reads, the sum of expected counts for all transcript are generally less than the total number of reads aligned.
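
A toy calculation may make the definition concrete. The posterior probabilities below are invented for illustration and are not RSEM output; the transcript names `tA` and `tB` and the helper `expected_counts` are hypothetical. Whatever probability mass a read does not assign to a transcript is treated here as background noise.

```python
def expected_counts(read_posteriors):
    """Sum each transcript's posterior probability over all reads."""
    counts = {}
    for posterior in read_posteriors:
        for transcript, prob in posterior.items():
            counts[transcript] = counts.get(transcript, 0.0) + prob
    return counts

# One dict per aligned read: P(read came from transcript). Made-up values.
read_posteriors = [
    {"tA": 0.90, "tB": 0.05},  # remaining 0.05 attributed to noise
    {"tA": 0.50, "tB": 0.50},  # read shared equally by two transcripts
    {"tB": 0.98},              # remaining 0.02 attributed to noise
    {"tA": 0.30, "tB": 0.60},  # remaining 0.10 attributed to noise
]

counts = expected_counts(read_posteriors)
total_expected = sum(counts.values())

print(counts)
print(f"sum of expected counts: {total_expected:.2f}")
print(f"reads aligned:          {len(read_posteriors)}")
```

Because part of each read's probability can go to noise, the expected counts sum to less than the number of aligned reads, and a transcript's expected count can differ substantially from the raw number of reads that align to it.
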
Next (final) steps
 • Look more into “expected count”
 • Additional filtering?
 • Publish!
