1. Canceromatic III - Session I: Pan-Cancer analysis
- Changing landscape of data and tools available
for reproducible cancer genomics workflows: report
from the ICGC trenches.
Nov 14th 2016
B.F. Francis Ouellette francis@oicr.on.ca
• Senior Scientist & Associate Director,
Informatics and Biocomputing, Ontario Institute for
Cancer Research, Toronto, ON
• Associate Professor, Department of Cell and Systems Biology,
University of Toronto, Toronto, ON.
4. ONTARIO INSTITUTE FOR CANCER RESEARC
You are free to:
Copy, share, adapt, or re-mix;
Photograph, film, or broadcast;
Blog, live-blog, or post video of;
This presentation. Provided that:
You attribute the work to its author and respect the rights
and licenses associated with its components.
Slide Concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only ccZero.
Social Media Icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at;
http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites
6. ONTARIO INSTITUTE FOR CANCER RESEARC
Cancer-om-atics Jul 6-9 2009
Cancer-om-atics II Mar 28-30 2011
Canceromatics III Nov 13 -16 2016
7. ONTARIO INSTITUTE FOR CANCER RESEARC
Disclaimers
I do not (and will not) profit in any way, shape or form, from
any of the brands, products or companies I may mention.
I am a big proponent of Open Access, Open Source, Open
Data and Open Courseware.
I am on the SAB of many NIH funded projects (SGD, Galaxy,
GenomeSpace, H3ABionet, and HMP2), as well as Elixir and
Genome Canada’s SIAC, and the NRC’s KMAC.
This comes with a bias on how science should be done!
8. ONTARIO INSTITUTE FOR CANCER RESEARC
Outline
Introduction
ICGC
PCAWG
Closing remarks
9. ONTARIO INSTITUTE FOR CANCER RESEARC
adapted from https://goo.gl/fQJAz1
ICGC PCAWG
Docker
Testing
10. ONTARIO INSTITUTE FOR CANCER RESEARC
Cancer is a Disease
of the Genome
Challenge in Treating Cancer:
Every tumour is different
Every cancer patient is different
Adapted from Tom Hudson
https://www.cancer.gov/research/areas/genomics
11. ONTARIO INSTITUTE FOR CANCER RESEARC
Large-Scale Studies of Cancer Genomes
Johns Hopkins
> 18,000 genes analyzed for mutations
11 breast and 11 colon tumors
L.D. Wood et al, Science, Oct. 2007
Wellcome Trust Sanger Institute
518 genes analyzed for mutations
210 tumors of various types
C. Greenman et al, Nature, Mar. 2007
TCGA (NIH)
Multiple technologies
brain (glioblastoma multiforme), lung (squamous carcinoma),
and ovarian (serous cystadenocarcinoma).
F.S. Collins & A.D. Barker, Sci. Am, Mar. 2007
12. ONTARIO INSTITUTE FOR CANCER RESEARC
Lessons learned from early studies
Heterogeneity within and across tumor types
High rate of abnormalities (driver vs passenger)
Sample quality matters
Consent and controlled data access is complicated
MR Stratton et al. Nature 458, 719-724 (2009) doi:10.1038/nature07943
13. ONTARIO INSTITUTE FOR CANCER RESEARC
Analysis Data Types
Simple Somatic Mutations (SSM or SNV)
Copy Number Alterations (CNA or CNV)
Structural Variants (SV)
Germline variants (SNPs)
Gene Expression (micro-arrays and RNASeq)
miRNA Expression (RNASeq)
Epigenomics (Arrays and Methylation)
Splicing Variation (RNASeq)
Protein Expression (Arrays)
14. ONTARIO INSTITUTE FOR CANCER RESEARC
Rationale for the ICGC:
Scope is huge
Reduce duplication of effort
Standardization and uniform quality
measures
Merging of datasets
Spectrum of many cancers varies
across the world
Accelerate the dissemination of
genomic and analytical methods
15. ONTARIO INSTITUTE FOR CANCER RESEARC
International Cancer Genome Consortium
Collect ~500 tumour/normal pairs from each of 50 different
major cancer types; 25,000 T/N pairs!
Comprehensive genome analysis of each T/N pair:
Genome
Transcriptome
Methylome
Clinical data
Make the data available to the research community & public.
Identify genome changes
…GATTATTCCAGGTAT… …GATTATTGCAGGTAT… …GATTATTGCAGGTAT…
Adapted from Tom Hudson
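As a toy illustration of "identify genome changes", the snippet below compares two of the sequence fragments shown on this slide position by position. Which fragment is the normal and which is the tumour is not stated on the slide, so treating the first as the reference is an assumption, and the function name is made up for this sketch. Real somatic variant calling works on aligned reads with statistical models, not on direct string comparison.

```python
def point_differences(reference, sample):
    """Return (0-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must have equal length for a position-wise comparison")
    return [
        (i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Fragments copied from the slide; the reference/sample roles are assumed.
reference = "GATTATTCCAGGTAT"
sample = "GATTATTGCAGGTAT"

print(point_differences(reference, sample))  # [(7, 'C', 'G')]
```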
17. ONTARIO INSTITUTE FOR CANCER RESEARC
International Cancer Genome Consortium: http://icgc.org
18. ONTARIO INSTITUTE FOR CANCER RESEARC
[Figure: data flow] Data Submission → Validation (dictionary) → Validation (across fields) → Indexing → Happy Users
http://goo.gl/1EcyR
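The submission flow on this slide (dictionary validation, then validation across fields, then indexing) can be sketched roughly as below. This is a minimal illustration only: the field names, the allowed values and the cross-field rule are all hypothetical and are not taken from the actual ICGC data dictionary or submission software.

```python
# Hypothetical dictionary: field name -> (expected type, allowed values or None).
DICTIONARY = {
    "donor_id": (str, None),
    "donor_vital_status": (str, {"alive", "deceased"}),
    "donor_age_at_diagnosis": (int, None),
    "donor_age_at_last_followup": (int, None),
}


def validate_dictionary(record):
    """Stage 1: check each field on its own against the dictionary."""
    errors = []
    for field, (expected_type, allowed) in DICTIONARY.items():
        if field not in record:
            errors.append(f"{field}: missing")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
        elif allowed is not None and record[field] not in allowed:
            errors.append(f"{field}: {record[field]!r} is not an allowed value")
    return errors


def validate_across_fields(record):
    """Stage 2: check consistency between fields (hypothetical rule)."""
    errors = []
    if record["donor_age_at_last_followup"] < record["donor_age_at_diagnosis"]:
        errors.append("follow-up age is earlier than age at diagnosis")
    return errors


def validate(record):
    """Run stage 2 only when stage 1 passes, so stage 2 can trust field types."""
    errors = validate_dictionary(record)
    return errors if errors else validate_across_fields(record)


def index(records):
    """Stage 3: keep only valid records, keyed by donor_id."""
    return {r["donor_id"]: r for r in records if not validate(r)}


good = {"donor_id": "D1", "donor_vital_status": "alive",
        "donor_age_at_diagnosis": 60, "donor_age_at_last_followup": 62}
bad = dict(good, donor_id="D2", donor_age_at_last_followup=50)

print(validate(good))
print(validate(bad))
print(sorted(index([good, bad])))
```

Running the per-field checks first means the cross-field rules never see a missing or wrongly typed value, which keeps each rule short.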
19. ONTARIO INSTITUTE FOR CANCER RESEARC
ICGC needs to deal with different
kinds of users!
Biologists/Clinicians:
Web interface to processed data, providing:
Affected gene lists with consequences
Impact on pathways
Power users:
Application Programming Interface (API) to get to data
Availability and Integration with cloud resources
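For power users, programmatic access usually means building a filtered query against a web API. The sketch below only constructs such a query URL and makes no network call. The base URL is a placeholder, and the resource name and the shape of the `filters` object are assumptions made for illustration; consult the ICGC documentation for the real endpoints and parameters.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder base URL, NOT a real ICGC endpoint.
BASE_URL = "https://example.org/api/v1"


def build_query_url(resource, filters, size=10):
    """Build a GET URL carrying a JSON-encoded filter and a page size."""
    query = urlencode({
        "filters": json.dumps(filters, separators=(",", ":")),
        "size": size,
    })
    return f"{BASE_URL}/{resource}?{query}"


# Hypothetical filter: donors whose primary site is brain.
filters = {"donor": {"primarySite": {"is": ["Brain"]}}}
url = build_query_url("donors", filters, size=5)
print(url)

# Round-trip check: the filter can be recovered from the URL unchanged.
recovered = json.loads(parse_qs(urlsplit(url).query)["filters"][0])
print(recovered == filters)
```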
25. ONTARIO INSTITUTE FOR CANCER RESEARC
http://docs.icgc.org/
User and submitter documentation
26. ONTARIO INSTITUTE FOR CANCER RESEARC
Software development discussions
https://discuss.icgc.org/
27. ONTARIO INSTITUTE FOR CANCER RESEARC
Some challenges:
So, we have lots of data, is
it generated the same way?
28. ONTARIO INSTITUTE FOR CANCER RESEARC
Every country/group has basically been
submitting:
Simple Somatic Mutations (SSM or SNV)
Copy Number Alterations (CNA or CNV)
Structural Variants (SV)
Germline variants (SNPs)
Gene Expression (micro-arrays and RNASeq)
miRNA Expression (RNASeq)
Epigenomics (Arrays and Methylation)
Splicing Variation (RNASeq)
Protein Expression (Arrays)
32. ONTARIO INSTITUTE FOR CANCER RESEARC
Are we all using the same definition for
controlled access data?
No
33. ONTARIO INSTITUTE FOR CANCER RESEARC
[Figure] ICGC BAM/FASTQ · TCGA BAM/FASTQ · ICGC Open Data (includes TCGA Open Data)
34. ONTARIO INSTITUTE FOR CANCER RESEARC
• Detailed Phenotype and Outcome data
Region of residence
Risk factors
Examination
Surgery
Radiation
Sample
Slide
Specific histological features
Analyte
Aliquot
Donor notes
• Gene Expression (probe-level data)
• Raw genotype calls
• Gene-sample identifier links
• Genome sequence files
ICGC Controlled
Access Datasets
• Cancer Pathology
Histologic type or subtype
Histologic nuclear grade
• Patient/Person
Gender, Age range,
Vital status, Survival time
Relapse type, Status at follow-up
• Gene Expression (normalized)
• DNA methylation
• Computed Copy Number and
Loss of Heterozygosity
• Newly discovered somatic variants
ICGC OA
Datasets
http://goo.gl/w4mrV
36. ONTARIO INSTITUTE FOR CANCER RESEARC
ICGC
TCGA
Differences between ICGC & TCGA
• Different tumour types
• Different geographic rules
• Many countries vs one jurisdiction
• Different definitions of what is controlled
• Different data access rules
37. ONTARIO INSTITUTE FOR CANCER RESEARC
• Detailed Phenotype and Outcome data
• Gene Expression (probe-level data)
• Raw genotype calls
• Gene-sample identifier links
• Genome sequence files
• Germ line variants
ICGC Controlled
Access Datasets
• Cancer Pathology
Histologic type or subtype
Histologic nuclear grade
• Patient/Person
Gender, Age range,
Vital status, Survival time
Relapse type, Status at follow-up
• Gene Expression (normalized)
• DNA methylation
• Computed Copy Number and
Loss of Heterozygosity
• Somatic variants from Exome or WGS
ICGC Open
Access Datasets
http://goo.gl/w4mrV
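The open/controlled split listed on this slide can be written down as a simple lookup, which is roughly how a portal would decide whether a download needs approval. The data types below are copied from the slide; the function names, and the choice to treat unknown data types as controlled, are assumptions of this sketch and not a description of the ICGC software.

```python
OPEN = "open"
CONTROLLED = "controlled"

# Data types and tiers as listed on the slide (lower-cased for lookup).
ACCESS_TIER = {
    "cancer pathology": OPEN,
    "patient/person": OPEN,
    "gene expression (normalized)": OPEN,
    "dna methylation": OPEN,
    "computed copy number and loss of heterozygosity": OPEN,
    "somatic variants from exome or wgs": OPEN,
    "detailed phenotype and outcome data": CONTROLLED,
    "gene expression (probe-level data)": CONTROLLED,
    "raw genotype calls": CONTROLLED,
    "gene-sample identifier links": CONTROLLED,
    "genome sequence files": CONTROLLED,
    "germ line variants": CONTROLLED,
}


def access_tier(data_type):
    """Look up the tier; unknown types default to controlled (assumption)."""
    return ACCESS_TIER.get(data_type.strip().lower(), CONTROLLED)


def can_download(data_type, has_controlled_approval):
    """Open data is always available; controlled data needs approval."""
    return access_tier(data_type) == OPEN or has_controlled_approval


print(access_tier("DNA methylation"))
print(can_download("Raw genotype calls", has_controlled_approval=False))
print(can_download("Raw genotype calls", has_controlled_approval=True))
```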
38. ONTARIO INSTITUTE FOR CANCER RESEARC
• Primary sequence data
(BAM and FASTQ files)
• SNP6 array level 1 and level 2 data
• Exon array level 1 and level 2 data
• Somatic variants from whole
genome sequencing
• Certain information in MAFs
• A full list of controlled-access
data types can be found at:
http://goo.gl/K1h7zu
TCGA Controlled
Access Datasets
• De-identified clinical and
demographic data
• Gene expression data
• Copy number alterations in regions
of the genome
• Epigenetic data
• Summaries of data compiled across
individuals
• Anonymized single amplicon DNA
sequence data
• Somatic variants from scrubbed
exome sequencing
TCGA Open
Access Datasets
http://goo.gl/A1rMRB
40. ONTARIO INSTITUTE FOR CANCER RESEARC
From ICGC/TCGA
Each group was free to decide on its own whether to
sequence exomes or whole genomes.
A bit more than 10% of all genomes were done
with whole genome sequencing.
A steering committee was formed, and we decided to
analyze these whole genomes in a robust way, with the primary
question of figuring out what was hidden in the genomic
sequence of cancer patients!
42. ONTARIO INSTITUTE FOR CANCER RESEARC
Steering Committee of PCAWG
Peter Campbell, Sanger Inst.
Gady Getz, Broad
Jan Korbel, EMBL
Lincoln Stein, OICR
Josh Stuart, UCSC
43. ONTARIO INSTITUTE FOR CANCER RESEARC
PanCancer Analysis of Whole Genomes
(PCAWG)
> 2,800 T/N pairs with clinical data, from 20
tumour types, for whole genome analysis.
Aligned with one standard pipeline.
Genomic Variants determined with 3
pipelines
17 working groups
Start writing papers now
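The slide says variants were determined with three pipelines but does not say how the three call sets were reconciled. One simple possibility, shown here purely as an illustration and not as the method PCAWG used, is a majority vote that keeps a variant when at least two of the three pipelines report it. The variant tuples and pipeline outputs below are invented example data.

```python
from collections import Counter


def consensus_calls(call_sets, min_support=2):
    """Keep variants reported by at least `min_support` of the call sets."""
    # set(calls) ensures a pipeline that reports a variant twice counts once.
    counts = Counter(variant for calls in call_sets for variant in set(calls))
    return {variant for variant, n in counts.items() if n >= min_support}


# Invented variants as (chromosome, position, ref, alt).
pipeline_a = [("chr1", 100, "C", "G"), ("chr2", 200, "A", "T")]
pipeline_b = [("chr1", 100, "C", "G"), ("chr3", 300, "G", "A")]
pipeline_c = [("chr1", 100, "C", "G"), ("chr2", 200, "A", "T")]

print(sorted(consensus_calls([pipeline_a, pipeline_b, pipeline_c])))
```

Raising `min_support` to 3 keeps only variants that every pipeline agrees on, trading sensitivity for precision.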
44. ONTARIO INSTITUTE FOR CANCER RESEARC
Deliverable for PCAWG will include:
1st PANCANCER analysis on > 2,800 cancer tumours
from a WGS perspective
RNA, SSM, CNV, Methylation analysis & germline
Published (executable) pipelines
Docker / Dockstore
Multiple cloud access to data
Multiple portal access to data
46. ONTARIO INSTITUTE FOR CANCER RESEARC
Working Groups (1/2)
1 Novel somatic mutation calling methods
2 Analysis of mutations in regulatory regions
3 Integration of transcriptome and genome
4 Integration of epigenome and genome
5 Consequences of somatic mutations on pathway
and network activity
6 Patterns of structural variations, signatures, genomic
correlations, retrotransposons, mobile elements
7 Mutation signatures and processes
8 Germline cancer genome
47. ONTARIO INSTITUTE FOR CANCER RESEARC
Working Groups (2/2)
9 Inferring driver mutations and identifying cancer genes
and pathways
10 Translating cancer genomes to the clinic
11 Evolution and heterogeneity
12 Exploratory: portals, visualization and software
infrastructure
13 Molecular subtypes and classification
14 Analysis of mutations in non-coding RNA
15 Exploratory: mitochondrial
16 Exploratory: pathogens
Tech Technical working group
50. ONTARIO INSTITUTE FOR CANCER RESEARC
DOCKSTORE testing group
Andrew Duncan, OICR
Christina Yung, OICR
Denis Yuen, OICR
Zhibin Lu, OICR
Brian O’Connor, UCSC
Alex Buchanan, OHSU
Kyle Ellrott, OHSU
Francis Ouellette, OICR
Gordon Saksena, Broad
Junjun Zhang, OICR
Miguel Vazquez, CNIO
Oliver Hofmann, Australia
Solomon Shorser, OICR
Adam Strucka, OHSU
51. ONTARIO INSTITUTE FOR CANCER RESEARC
Challenges:
Too many conference calls!
Too many clouds
Even though we learned from what not to do with ICGC,
we had to learn what not to do in the clouds.
TCGA and ICGC have different authorization protocols
Not all data can exist everywhere
Dockstore testing is taking too long!
52. ONTARIO INSTITUTE FOR CANCER RESEARC
Other projects in planning
ICGC to finish in Spring of 2018
Planning for ICGCmed
ICGC 1: 25,000 tumours (DNA, RNA, Epigenome,
Clinical data)
ICGCmed: 200,000 Tumours (DNA, RNA,
Epigenome, Clinical trial)
ICGC1 was the picture, ICGCmed will be the movie
(before and after treatment).
Submission system with one place for data and
metadata
Tools/links directory portal
66. ONTARIO INSTITUTE FOR CANCER RESEARC
0-Toronto
1-Bethesda
2-Hinxton
3-Madrid
4-Queensland
5-Kyoto
6-Cannes
7-Heidelberg
8-Toronto
9-Beijing
10-Mumbai
11-Boston
12
72. ONTARIO INSTITUTE FOR CANCER RESEARC
Bioinformatics.ca workshops Content
http://bioinformatics-ca.github.io/
https://goo.gl/CGu13q
73. ONTARIO INSTITUTE FOR CANCER RESEARC
DCC Software
Developer
Vincent Ferretti
Dusan Andric
Phuong-My Do
Francois Gerthoffert
Terry Lin
Michael Moncada
Vitalii Slobodianyk
Bob Tiernay
Douglas Wong
Linda Xiang
Junjun Zhang
Acknowledgments
ICGC/OICR
Project leaders:
Tom Hudson
John McPherson
Lincoln Stein
Jared Simpson
Paul Boutros
Vincent Ferretti
Francis Ouellette
Jennifer Jennings
Christina Yung
Ouellette Lab
Alysha Moncrieffe
Ann Meyer
Zhibin Lu
Web Dev
Joseph Yamada
Kaman Wu
Kim Cullion
Koji Miyauchi
Miyuki Fukuma
ICGC DCC Biocuration
Hardeep Nahal
Marc Perry
http://oicr.on.ca http://icgc.org
… and all the patients and their
families that are putting
their hopes into our work!
Research
IT/Systems
David Sutton,
Bob Gibson
David Magda
Rob Naccarato
Brian Ott
Gino Yearwood
EGA
Jordi Rambla De Argila
Arcadi Navarro
Audald Iloret
Mauricio Moldes
74. ONTARIO INSTITUTE FOR CANCER RESEARC
http://icgc.org
http://dcc.icgc.org
http://docs.icgc.org
info@icgc.org
http://bioinformatics.ca
75. ONTARIO INSTITUTE FOR CANCER RESEARC
We are hiring:
• OICR Director
• Genome Technology Director
• Junior Faculty in Informatics
& Biocomputing
• PDFs
Interested? Ask Paul Boutros or me.