Interpreting transcriptomics (ERS Berlin 2017)


Presented at the European Respiratory Society, Berlin, October 2017. A high-level talk for a mixed audience of clinicians and scientists on analyzing transcriptomic (gene expression) data.

Clusters, pathways, context
Interpreting transcriptomic data
Paul Agapow, Translational Bioinformatics, Data Science Institute
Syst. Medicine in Resp. Disease
Berlin October 2017

Expression data can tell us ...
• Which genes are transcribed more / less?
• What’s the difference between:
– Cell lines?
– Healthy & unhealthy tissue?
– Tissues?
– Patients with & without a SNP?
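
As a rough illustration of "transcribed more / less", the sketch below computes a log2 fold change between two groups. The gene names, the expression values and the pseudocount are all made up for this example; a real analysis would fit a statistical model (for instance with limma, mentioned later in the talk) rather than compare raw means.

```python
import math
from statistics import mean


def log2_fold_change(case, control, pseudocount=1.0):
    """Return log2 of the ratio of group means.

    The pseudocount (an assumption of this sketch) avoids log(0)
    when a gene is not detected in one group.
    """
    return math.log2((mean(case) + pseudocount) / (mean(control) + pseudocount))


# Hypothetical normalised expression values, invented for illustration.
expression = {
    "GENE_A": {"healthy": [7.0, 7.0, 7.0], "diseased": [31.0, 31.0, 31.0]},
    "GENE_B": {"healthy": [15.0, 15.0, 15.0], "diseased": [3.0, 3.0, 3.0]},
    "GENE_C": {"healthy": [9.0, 10.0, 11.0], "diseased": [10.0, 10.0, 10.0]},
}

# Positive values: higher in diseased tissue; negative: lower; near 0: unchanged.
fold_changes = {
    gene: log2_fold_change(groups["diseased"], groups["healthy"])
    for gene, groups in expression.items()
}

for gene, lfc in fold_changes.items():
    print(f"{gene}: log2FC = {lfc:+.2f}")
```

A fold change alone says nothing about variability or sample size, which is where the caveats on the next slide come in.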

Why study expression data?
• Dynamic
• Responsive
• Quantifiable
• More informative
But:
• (Processing)
• Comparative analysis
• Multiple technologies
• Cut-offs
• Batch effects
• Power
• Looking at the right place / time?
• Interpretation
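
The slide only names "cut-offs" as a concern. One widely used way to set a significance cut-off when thousands of genes are tested at once is the Benjamini-Hochberg adjustment, sketched below. This is an assumption about what such a cut-off might look like in practice, not a method taken from the talk, and the p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)

# Genes passing a 5% false discovery rate cut-off.
significant = [i for i, q in enumerate(adjusted) if q <= 0.05]
```

The choice of threshold (here 0.05) remains a judgment call, which is the point the slide makes.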

Platforms
• Microarrays:
– DNA anchored to a solid surface
– Assess RNA that binds to it
– “Old” (90s)
– Noisy
– Finds what’s on the chip
• RNA-seq:
– Deep-sequencing of RNA
– More accurate & reliable
– More expensive
– High throughput
– Finds everything

Tools: Bioconductor & limma
1. Bioconductor: a set of R software libraries for analysis of high-throughput data
– Inter-operable
– Documented
2. limma: a Bioconductor library for transcriptomic analysis

Interpretation: Clustering
Put similar things together:
• Gene expression patterns (co-regulation, modules)
• Patients (stratification)
But:
• What’s a cluster / similarity?
• Allow for noise
• Comparison
• Is it ontologically real?

How to cluster
Many methods but:
• K-Means / K-Medians clustering
– Simple
– Stochastic, define K
– Best with spherical data
• Hierarchical clustering
– Levels of granularity
– Produces dendrogram
– Computationally complex
But:
• Little comparative work
• No support / confidence
• Supervised vs unsupervised
• Poor reproducibility
– Bootstrap / Jackknife
• Comparing clusters
Clustering assumptions
• Incorrect number of clusters
• Non-spherical distribution
• Unequal variance
• Unequal group size

Comparing clusters
How do you compare clusters obtained from 2+ different experiments?
• Especially if clusters labelled differently
• If separation poor
• If clusters nest
• Adjusted mutual information (sklearn)
– No nesting
• Conditional entropy
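
A minimal sketch of the two measures named above, assuming scikit-learn is installed. `adjusted_mutual_info_score` is the sklearn function; `conditional_entropy` is a small helper written for this example, and the label vectors are invented. It shows that adjusted mutual information ignores how clusters are named, while conditional entropy can reveal that one clustering nests inside another.

```python
import math
from collections import Counter

from sklearn.metrics import adjusted_mutual_info_score


def conditional_entropy(a, b):
    """H(a | b) in bits: uncertainty left in labelling a once b is known."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum(
        (n_ab / n) * math.log2(n_ab / b_counts[label_b])
        for (_, label_b), n_ab in joint.items()
    )


run_1 = [0, 0, 0, 1, 1, 1, 2, 2, 2]
run_2 = ["c", "c", "c", "a", "a", "a", "b", "b", "b"]  # same groups, new names
coarse = [0, 0, 0, 0, 0, 0, 1, 1, 1]                   # merges two groups of run_1

# Identical groupings score 1 regardless of the labels used.
same_grouping = adjusted_mutual_info_score(run_1, run_2)

# Nesting shows up as an asymmetry in conditional entropy:
nested_h = conditional_entropy(coarse, run_1)   # 0 bits: run_1 determines coarse
reverse_h = conditional_entropy(run_1, coarse)  # above 0: coarse loses detail
```

Here `nested_h` is zero while `reverse_h` is not, which is the signature of `run_1` being a refinement of `coarse`.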

Interpretation: enrichment
• Match genes against lists
• Associate a gene with a compartment or pathway
• Examine enrichment / downregulation
But:
• What’s a pathway?
• Are they right?
• Statistical basis
• Many choices
• Post-transcriptional regulation?

Enrichment
• Popular tools:
– DAVID (not updated?)
– GSEA
– Ingenuity / Metacore
– Bioconductor
• Individual cases:
– Hypergeometric test
• Gives you support
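
The hypergeometric test mentioned above asks: if genes were picked at random, how likely is an overlap with the pathway at least this large? Below is a standard-library sketch; the function name and all the counts in the usage example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe, pathway_size, selected, overlap):
    """One-sided p-value P(X >= overlap).

    universe      total number of genes measured
    pathway_size  genes in the universe annotated to the pathway
    selected      genes in the list of interest (e.g. differentially expressed)
    overlap       selected genes that are also in the pathway
    """
    total = comb(universe, selected)
    upper = min(pathway_size, selected)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 200 selected genes out of 20,000 measured,
# 5 of which fall in a pathway of 100 genes.
p_value = hypergeom_enrichment_p(universe=20000, pathway_size=100,
                                 selected=200, overlap=5)
print(f"enrichment p-value: {p_value:.4g}")
```

The result depends heavily on what is used as the universe, and testing many pathways calls for multiple-testing correction, so the p-value is support rather than proof.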

Interpretation: contextualization
• Many knowledge bases are a pot-pourri of undifferentiated “facts”
– Incomplete
– Where / what / how?
• Use curated knowledge bases
• Traverse graphs

Graph databases for knowledge representation
• Use graph databases for
• Traverse graphs for “neighbours”
– Shortest paths connecting COL6A5, a protein implicated in airway remodelling, to asthma
• Stats / support?
• Hypothesis generation
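
To make "shortest paths" concrete, here is a breadth-first search over a toy in-memory graph. A graph database would run this kind of traversal as a query. The intermediate nodes (`entity_A`, `entity_B`, `entity_C`) and every edge are placeholders invented for this sketch; they do not represent real biological relationships.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal as a list of nodes, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue
            previous[neighbour] = node
            if neighbour == goal:
                # Walk back through the predecessors to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Toy undirected knowledge graph with placeholder intermediate nodes.
graph = {
    "COL6A5": ["entity_A", "entity_B"],
    "entity_A": ["COL6A5", "entity_C"],
    "entity_B": ["COL6A5", "asthma"],
    "entity_C": ["entity_A", "asthma"],
    "asthma": ["entity_B", "entity_C"],
}

path = shortest_path(graph, "COL6A5", "asthma")
print(" -> ".join(path))
```

A short path is only a candidate explanation. As the slide notes, it comes without statistical support and is best treated as a hypothesis to test.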

Conclusions?
• Science is hard
• Assumptions are important
• Obtaining support / confidence / validation is difficult
• ... but important
