www.iita.org – A member of the CGIAR consortium
Bioinformatics @ IITA
Andreas Gisel
IITA – Bioscience & Bioinformatics
Bioinformatics @ IITA
Bioinformatics – definition and introduction
Bioinformatics @ IITA
Bioinformatics & IITA
Bioinformatics - definition
Bio – Biology, Life Sciences
Informatics – computational sciences
DATA INTERPRETATIONS
RESULTS
Bio informatics
Data Repositories
Knowledge
Bioinformatics is an interdisciplinary science that develops and
improves methods for analyzing, storing, retrieving, organizing,
and visualizing biological data.
Its purpose is to help solve biological problems and to uncover
the wealth of biological information hidden in biological data.
Descriptions
Pictures
Sequences
Protein
RNA
DNA
First fully sequenced biological sequence
amino acid sequence of insulin (51 aa), 1955
First fully sequenced nucleic acid
tRNA (75 nt), 1965
First fully sequenced DNA
Bacteriophage (5375 nt), 1977
DNA sequencing
Sanger sequencing technology (1975)
Pyrosequencing (Next Generation Sequencing, 2004)
Biological Data
Descriptions
Pictures
Sequences
Protein
RNA
DNA
Structures
Protein
RNA
Interactions
Expressions
Biological Data
Data Explosion
Descriptions
Pictures
Sequences
Protein
RNA
DNA
Structures
Protein
RNA
Interactions
Expressions
Microarray: up to 1 million data points per experiment
High-throughput sequencing (NGS, Next Generation Sequencing): up to 600’000’000’000 bases (600 Gb) per experiment
Data Analysis – DNA/RNA sequences
A sequence without knowledge connected to it
is meaningless!
What to do?
Sequence similarity
Finding genes and regulatory elements
Functional analysis of genes
Homology
Polymorphism
BIOINFORMATICS
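As an illustration of the "sequence similarity" item above, here is a minimal Python sketch that computes the percent identity between two sequences. It assumes the sequences are already aligned and of equal length; the function name `percent_identity` and the two example sequences are made up for this sketch and do not come from the slides. Real similarity searches use alignment tools rather than a position-by-position comparison.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of matching positions between two equal-length, pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length (align them first)")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Compare position by position, ignoring letter case.
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Hypothetical example sequences (not taken from the presentation).
reference = "ATGGCGTACGTTAG"
query = "ATGGCGTACCTTAG"

identity = percent_identity(reference, query)
print(f"{identity:.1f}% identity over {len(reference)} positions")
```

A high identity to a sequence of known function is one way to connect knowledge to an otherwise anonymous sequence.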
Data Analysis
So we need bioinformatics tools and reference data
Hardware – Computing infrastructure (CPU, RAM, Storage)
Tools – Programs that process your data
Reference data – Databases for existing data
Internet – connection to external databases
Bioinformatics @ IITA
Personnel
Livia Stavolone – molecular biologist
Deborah Adeyele – student (training in bioinformatics and non-coding RNA)
Toyin Abdulsalam – research fellow (bioinformatics and transcriptome analysis)
Andreas Gisel
Whole Bioscience Team
Bioinformatics @ IITA
Hardware – Computing infrastructure (CPU, RAM, Storage)
HP Blade system with:
3 blades, each with two 16-core processors (AMD
Opteron Processor 6272)
384 GB RAM
2 TB direct-attached storage (DAS)
8 TB network-attached storage (NAS)
The operating system is Ubuntu 14.04.1 LTS,
installed via Bio-Linux 8.
Bioinformatics @ IITA
Tools – Programs that process your data
Basic bioinformatics services mainly based on sequence analysis
Next Generation Sequencing data analysis pipelines including:
GBS (genotyping by sequencing) data analysis and SNP calling
Transcriptomics (RNA-seq) mapping, assembly and expression profiling
smallRNA data analysis: discovery and expression profiling
DNA methylation (BS-seq) data analysis
DNA (shotgun) assembly and variation calling
Genome annotation using different data pipelines and visualization
Customized approaches using Perl and shell scripting
GBS (genotyping by sequencing) data analysis and SNP calling
Raw sequencing data to SNP matrix:
Cassava: 1200 GB compressed sequence data (~5500 accessions), SNP matrix of 5500 x ~160’000 SNPs
Yam: 200 GB compressed sequence data (~800 accessions), SNP matrix of 800 x ~25’000 SNPs
SNP calling pipelines:
Cornell SNP calling (TASSEL)
Broad SNP calling (GATK)
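To give a feeling for the size of the SNP matrices quoted above, the following Python sketch multiplies accessions by SNPs and estimates the uncompressed size. The matrix dimensions are the approximate figures from the slide; storing one byte per genotype call (`GENOTYPE_BYTES`) is purely an assumption for illustration, and real file formats differ.

```python
# Assumption for illustration only: one byte per genotype call.
GENOTYPE_BYTES = 1

# Approximate matrix dimensions quoted on the slide: (accessions, SNPs).
datasets = {
    "Cassava": (5500, 160_000),
    "Yam": (800, 25_000),
}


def matrix_cells(accessions: int, snps: int) -> int:
    """Number of genotype calls in an accessions x SNPs matrix."""
    return accessions * snps


sizes = {crop: matrix_cells(*shape) for crop, shape in datasets.items()}

for crop, cells in sizes.items():
    megabytes = cells * GENOTYPE_BYTES / 1e6
    print(f"{crop}: {cells:,} genotype calls, about {megabytes:.0f} MB uncompressed")
```

Under that assumption the resulting matrices are far smaller than the raw sequence data they are derived from.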
Ismail Rabbi
SNP matrix
Cornell
SNP matrix
In-house
SNP matrix
[Figure: Histogram of IBS.num; x-axis: Distance (1 − PSA), 0.00 to 0.25; y-axis: Frequency, 0 to 4000]
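A histogram of pairwise distances like the one on this slide can be derived from a SNP matrix. The Python sketch below is one possible way to do it and is not the pipeline used at IITA: it assumes genotypes are coded as 0/1/2 copies of the alternative allele, with `None` for missing calls, and defines the distance as 1 minus the identity-by-state (IBS) similarity. The accession names and genotypes are invented toy data.

```python
from itertools import combinations


def ibs_distance(geno_a, geno_b):
    """1 - IBS between two genotype vectors coded 0/1/2 (None = missing call)."""
    diffs = [abs(a - b) for a, b in zip(geno_a, geno_b)
             if a is not None and b is not None]
    if not diffs:
        return None  # no SNP was called in both accessions
    # Two alleles per SNP, so the largest possible difference per SNP is 2.
    return sum(diffs) / (2 * len(diffs))


def histogram(values, n_bins=5, upper=1.0):
    """Count values in n_bins equal-width bins covering [0, upper]."""
    width = upper / n_bins
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v / width), n_bins - 1)] += 1
    return counts


# Hypothetical toy SNP matrix: accession -> genotype calls at four SNPs.
snp_matrix = {
    "acc1": [0, 1, 2, 0],
    "acc2": [0, 1, 2, 0],
    "acc3": [2, 1, 0, None],
}

pairwise = {
    (x, y): ibs_distance(snp_matrix[x], snp_matrix[y])
    for x, y in combinations(snp_matrix, 2)
}
counts = histogram([d for d in pairwise.values() if d is not None])
print(pairwise)
print(counts)
```

Pairs with a distance close to zero, such as `acc1` and `acc2` here, are candidates for duplicated accessions.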
SNP matrix
External data
In-house developed scripts
[Figure: Cassava Assembly & Annotation Version 6.1, chromosomes Chr1 to Chr18]
[Figure: Cassava Assembly & Annotation Version 6.1; tracks: Gene Distribution, SNP Distribution, GBS Coverage, Heterozygosity]
Transcriptomics (RNA-seq) mapping, assembly and expression profiling
What is RNA-seq?
Automated pipeline for reference-supported and de
novo transcriptome assembly and expression profiling
smallRNA data analysis: discovery and expression profiling
Small RNAs are short (21-200 nt) RNAs that do
not code for proteins but have gene
regulatory effects.
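The length range above suggests a simple first step in small RNA analysis: keep only reads of plausible length and look at the length distribution. The Python sketch below assumes adapter-trimmed reads are available as plain strings; the function name `length_profile` and the example reads are made up, and this is not the pipeline described in the slides.

```python
from collections import Counter

# Length range for small RNAs as quoted on the slide.
MIN_LEN, MAX_LEN = 21, 200


def length_profile(reads):
    """Count reads per length, keeping only reads within the small RNA length range."""
    return Counter(len(r) for r in reads if MIN_LEN <= len(r) <= MAX_LEN)


# Hypothetical adapter-trimmed reads (invented for this sketch).
reads = [
    "TGACAGAAGAGAGTGAGCAC",      # too short, filtered out
    "TGACAGAAGAGAGTGAGCACA",
    "TCGGACCAGGCTTCATTCCCC",
    "TTGACAGAAGATAGAGAGCACAGG",
]

profile = length_profile(reads)
for length in sorted(profile):
    print(f"{length} nt: {profile[length]} read(s)")
```

Peaks at particular lengths in such a profile are a common starting point for classifying small RNA populations.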
Automated pipeline for non-coding RNA classification
and expression profiling.
DNA methylation (BS-seq) data analysis
What is BS-seq?
DNA methylation is
another gene regulation
mechanism which can be
inherited.
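The slide does not spell out how BS-seq data are evaluated, so the following Python sketch relies on the general principle of bisulfite sequencing: bisulfite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines are still read as C. Under that assumption, the methylation level of a site can be estimated as the fraction of C reads among all C and T reads covering it. The site names and read counts are invented.

```python
from typing import Optional


def methylation_level(c_count: int, t_count: int) -> Optional[float]:
    """Fraction of reads supporting methylation (C) at one cytosine site."""
    total = c_count + t_count
    if total == 0:
        return None  # site not covered by any read
    return c_count / total


# Hypothetical per-site read counts: site -> (reads showing C, reads showing T).
site_counts = {
    "chr1:1042": (18, 2),
    "chr1:1077": (0, 25),
    "chr1:1103": (0, 0),
}

levels = {site: methylation_level(c, t) for site, (c, t) in site_counts.items()}
for site, level in levels.items():
    print(site, "not covered" if level is None else f"{level:.2f}")
```

Real analyses also have to account for incomplete bisulfite conversion and sequencing errors, which this sketch ignores.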
DNA (shotgun) assembly and variation calling
Genome annotation using different data pipelines and visualization
Bioinformatics @ IITA
Reference data – Databases for existing data
Genomic Reference Data
Cassava (sequence, annotation, function)
D. rotundata (sequence, working on annotation and function)
D. alata (waiting for sequence and annotation)
Maize (ready sequence and annotation)
Banana (ready sequence and annotation)
Archive
Cassava (GBS, WGS, RNA-seq)
D. rotundata (GBS, smallRNA)
Maize (GBS)
Internet – connection to external databases
Automated pipelines and strategies for big data
downloads
Bioinformatics & IITA
Development of Bioinformatics Capacity
IITA Projects
Involvement in planning of data production, analysis -
financing of data storage and analysis
Bioinformatics
Bioscience
Data analysis, Data repositories, Visualization
In projects with sequencing activities:
We need to identify the bioinformatics part
We need to take over at least part of the bioinformatics
activities
We need to have Bioscience involved in planning the data
production to optimize data analysis and knowledge building
Capacity building to strengthen the bioinformatics facility
Thank you!
Data from:
Ranjana Bhattacharjee
Livia Stavolone
Morag Ferguson
Ismail Rabbi