Mik Black bioinformatics symposium
1. How to make bioinformatics
accessible to normal people!
Mik Black
Department of Biochemistry
University of Otago
2. Some musings…
• Accessibility - two aspects:
1. Methodology development and distribution
(can you get it?)
2. Methodology uptake (can you use it?)
• “Normal people”:
1. Who are they?
2. What do they want? What do they
need? Is there a happy medium?
3. My background
• Statistical design and analysis of
microarray experiments:
– Methodology development
– Applying existing bioinformatics techniques
– Adapting “standard” statistical methods
– “Forensic” analysis
• Technologies:
– Microarrays (mRNA, SNPs, CNV/CGH)
– Second generation sequencing (SNP/CNV)
6. The joys of the command line…
• Large amounts of statistical genomics
methodology available via R
– Accessible?
– Uptake?
– Who are the end users?
• Can’t we just teach EVERYONE to use R?
7. http://www.broad.mit.edu/cancer/software/genepattern/
Reich et al. (2006) GenePattern 2.0. Nature Genetics, 38, 500-501.
• GenePattern provides a web-based
method for analysing microarray (and
other) data.
• Provides a simple interface to tools
developed in Java, R, Matlab and other
languages.
• Analysis performed on server:
– no compute resources required by users.
– Facilitates sharing of results.
8. Using GenePattern
• User friendly
– Third year bioinformatics course at Otago.
– Workshop for lab personnel at TGen.
• Guided analysis
– Facilitates use of standard analysis methods.
– Pipeline creation and “versioned” analysis.
• End users?
– At Otago: 3rd and 4th year Biochemistry students
– At TGen: lab techs, interns, bench scientists, PIs
9. Common analysis tasks
• Basic data analysis/exploration:
– Heatmap creation
– Hierarchical clustering
– Identifying differentially expressed genes
– Gene set analysis
– Survival analysis
• GenePattern provides these tools in a
modular format.
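As a rough illustration of the "identifying differentially expressed genes" task listed above, the sketch below ranks genes by Welch's t-statistic between two groups of samples. It is not GenePattern's implementation: the function names (`welch_t`, `rank_genes`), the gene labels and the expression values are all hypothetical, and a real analysis would also need p-values and multiple-testing correction.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two independent samples (unequal variances).

    Assumes each group has at least two values and that the groups are not
    both constant (otherwise the standard error is zero).
    """
    var_a, var_b = variance(group_a), variance(group_b)
    se = sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def rank_genes(expression, n_control):
    """Rank genes by |t|, treated vs. control.

    `expression` maps a gene name to a list of values in which the first
    `n_control` entries are control samples and the rest are treated samples.
    """
    scores = {
        gene: welch_t(values[n_control:], values[:n_control])
        for gene, values in expression.items()
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up log2 expression values: 3 control samples, then 3 treated samples.
toy_expression = {
    "geneA": [5.1, 5.0, 5.2, 8.0, 8.2, 7.9],  # up in treated
    "geneB": [6.0, 6.3, 5.9, 6.1, 6.2, 6.0],  # essentially unchanged
    "geneC": [9.0, 9.1, 8.8, 7.0, 7.2, 6.9],  # down in treated
}

ranked = rank_genes(toy_expression, n_control=3)
for gene, t_stat in ranked:
    print(f"{gene}\t{t_stat:.2f}")
```

A positive statistic here means higher expression in the treated samples, and a negative one means lower. Ranking by absolute value puts the strongest changes in either direction at the top.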
18. GenePattern on BeSTGRID
• Services:
– GenePattern server
– Development environment (server and SVN)
• GenePattern training (coming soon):
– Basic usage
– Module development (uptake path for
bioinformatics tool developers)
19. Next steps…
• Full GenePattern deployment:
– Transfer of development modules to public server
– Documentation and training
– Use of ROCKS cluster for job submission
• Modules for Second Gen Sequencing data:
– DNAseq, RNAseq, ChIPseq
– R/Bioconductor (e.g., ShortRead, Biostrings,
Rsamtools, GenomeGraphs…)
– Analysis, visualization and quality assurance
20. Community effort
• Some current examples:
– VISG/MapNet: statisticians and geneticists
– BeSTGRID: middleware development and
deployment through to end users
– CTCR: cancer researchers and clinicians
• Each group has the goal of placing
powerful (and useful, and usable) tools
into the hands of end users.
21. Bioinformatics community
• NZGL provides opportunity for
community-based effort.
– National infrastructure for genomics research
– Includes strong bioinformatics component
• Key issue: engagement with end users
– Methodology development and distribution
– Uptake, interaction and training
LET’S GO FIND US SOME “NORMAL” PEOPLE!
23. Acknowledgements
University of Otago
The University of Auckland
Marcus Davy
Nick Jones
Tim Molteno
Mark Gahegan
Thomas Allen
Yuriy Halytskyy
Sarah Song
Cristin Print
Chris Brown
Daniel Hurley
Anthony Reeve
Christoff Knapp
Tony Merriman
University of Canterbury
Vladimir Mencl