SlideShare ist ein Scribd-Unternehmen logo
1 von 45
Downloaden Sie, um offline zu lesen
IRRI Genotyping Service Laboratory
Galaxy: bioinformatics for rice scientists
Ramil P. Mauleon
Scientist – Bioinformatics Specialist
TT Chang Genetic Resources Center

International Rice Research Institute
ICG-8, Shenzhen, China
Presented in behalf of my co-authors
from IRRI
Lead Scientists
• Kenneth L. McNally – Genebank resequencing
• Nickolai Alexandrov – rice informatics consortium
• Michael Thomson – Genotyping Service Laboratory
• Hei Leung – Program Leader
Laboratory, software team
• Venice Margaret Juanillas
• Christine Jade Dilla-Ermita
Outline
• Introduction to IRRI & it’s research agenda
• Bioinformatics support to molecular rice
breeding at IRRI: IRRI GSL Galaxy
• Bioinformatics support to efforts for
harnessing Rice Genetic Diversity
• Future activity: International Rice
Informatics Consortium
INTERNATIONAL RICE RESEARCH INSTITUTE
Los Baños, Philippines

Mission:

Reduce poverty and
hunger,
Improve the health of rice
farmers and consumers,

Ensure environmental
sustainability
All done through research,
partnerships

Home of the Rice Green Revolution
Established 1960
www.irri.org

Aims to help rice farmers improve the yield and quality of their rice by developing..
•New rice varieties
•Rice crop management techniques
A single strategic work plan for global
rice research…
Global Rice Science Partnership : GRiSP
o
o
o

Core: 3 international research centers
Numerous research partners
NEED TO SHARE RESEARCH SOLUTIONS

IRRI

Many more…
First GRiSP Research Theme with heavy
bioinformatics …

Accelerating the development, delivery,
and adoption of improved rice
varieties
• 2.1. Breeding informatics, highthroughput marker applications, and
multi-environment testing
Allele mining for crop improvement
Diverse rice germplasm
“allele pool”
Gene and QTL mapping

Association mapping

Fine-mapping, candidate gene analysis, cloning
Genes controlling traits of interest
Beneficial alleles
Flanking and gene-based markers
for molecular breeding
Major QTLs/genes for breeding
• QTLs and major genes for
stress tolerance and disease
resistance are known
• Flanking SSRs and genebased STS markers have been
used to transfer these major
QTLs
• Move to SNP markers for
Marker Assisted Backcrossing
(MABC). Marker Assisted
Selection (MAS), Genomic
Selection (GS)
PXO71

PXO99

PXO363

PXO341

Xa genes for bacterial leaf blight
Challenges for IRRI scientists/breeders
• Not familiar with SNP-based genotyping
o

How do I score the alleles? (no gel image!!!)

o

Data does not fit my spreadsheet (run out of columns, rows)…

o

Cannot even view the data file using “ordinary” apps

o

Computer runs out of memory when I load the dataset…

o

Trusted analysis software crashes inexplicably…

• We need to
o enable field/bench researchers for bioinformatics
o Share solutions openly across GRiSP partners, with rice
research community as a whole
Galaxy has features that fit our needs
Open, web-based platform for accessible, reproducible, and
transparent computational biomedical research.
• Accessible: Users w/o programming experience can easily
specify parameters and run tools and workflows.
• Reproducible: Galaxy captures info so that any user can
repeat and understand a complete computational
analysis.
• Transparent: Users share and publish analyses via the
web and create interactive, web-based documents that
describe a complete analysis.
Integration of Galaxy to Genotyping Service Lab workflow
Illumina BeadXpress Genotyping

GenomeStudio with Alchemy
Plugin

Infinium Custom 6k
chip

Software tools on IRRI GSL-Galaxy
Fluidigm
EP1Genotyping

Data preparation
Genetic / association
analysis
Biioinformatic
analysis
Standard Galaxy release
IRRI GALAXY (current)

•Deployed in the cloud (Amazon Web
Services Large instance in Asia-Pacific
region)
•Streamlined to contain rice-specific
tools and genotyping data
•NO NGS assembly tools included
Rice genome browser installed as data source for
curated SNP, genome information

Comprehensive information on SNPs used in GSL
Data manipulation tools in GSL Galaxy

•Format conversion for
most commonly used
genetic analysis,
diversity study software
tools
Workflows for rice data analysis already
available

In place for Illumina BeadXpres, Infinium platforms, being tested on
Fluidigm system…
Software Tools for SNP analysis
•
•
•
•

SNP calling: Alchemy (Wright et al 2010)
SNP data exploration, visualization: Flapjack, TASSEL
Genetic linkage mapping: Mapmanager QTX, R/QTL
QTL analysis: R/QTL, Qgene, MPMap (for multi-parent
crosses)
• GWA analysis: TASSEL
• Population structure / diversity analysis : Powermarker,
Structure
Flapjack: visualize, manipulate SNP data

http://bioinf.scri.ac.uk/flapjack/index.shtml

•View genotypes graphically, with color code (nucleotide,
compared to selected line …)
•Select/deselect lines, markers from dataset,
•Filter lines by markers
•Basic statistics on dataset
http://statgen.ncsu.edu/powermarker/
TASSEL (Buckler Laboratory, Cornell University) : a software
package to evaluate traits associations, evolutionary
patterns, and linkage disequilibrium.
Three areas of strength:
1. Integrates with various diversity databases (Panzea,
Gramene, Sorghum , and GRIN )
2. Provides new and powerful statistical approaches to
association mapping eg. General Linear Model (GLM)
and Mixed Linear Model (MLM).

3. handles a wide range of indels (insertion & deletions)
which is the most common type of polymorphism in maize.

www.maizegenetics.net/tassel
Tassel stand-alone has lots of analysis tools…
TASSEL analysis tools are being
incorporated into Galaxy …
TASSEL pipeline
mode for
GWAS,
population
analysis
IRRI Galaxy Toolshed (“APPS STORE”) is
under development
Genotyping data management
IRRI GSL manages data of customers …
• Customer declares as private – retained in GSL Galaxy
account of customer
• Customer declares data as public – loaded into
Genotyping Data Management System; shared with
research community
Genotype data matrix …
Second GRiSP Research Theme with
heavy bioinformatics …
Harnessing genetic diversity to chart new
productivity, quality, and health horizons
1.2. Characterizing genetic diversity and creating
novel gene pools (SNP genotypes, whole genome
sequencing, phenotypes)
1.3. Genes and allelic diversity conferring stress
tolerance and enhanced nutrition (candidate
genes)
IRGC – the International Rice Genebank Collection @ IRRI
World’s largest collection of rice germplasm held in trust for the
world community and source countries (www.irri.org/GRC)

• Over 117,000 accessions from
117 countries
• Two cultivated species
Oryza sativa
Oryza glaberrima
• 22 wild species
• Relatively few accessions have
donated alleles to current, highyielding varieties
Rice exhibits deep population structure.

Phylogenetic tree for
200K SNPs on 3,000 lines
McNally et al., 2013
unpublished

Unpublished data removed
The Rice 3,000 Genomes Project: Sequencing for Crop Improvement
Kenneth McNally, Nickolai Alexandrov, Ramil Mauleon, Chengzhi Liang, Ruaraidh
Sackville Hamilton,
Zhikang Li, Ren Wang, Hongliang Chen, Gengyun Zhang, Hongsheng Liang,
Hei Leung, Achim Dobermann, Robert Zeigler

CAAS

+ Many Analysis Partners
.
.
.
Bioinformatics challenges of the project…

• Primary data analysis: SNP calls, reference
genome refinement, phylogenetic analysis,
genotype  phenotype association, etc…
• Efficient database system that allows the
integration of the genebank information with
phenotypic, breeding, genomic, and IPR data
• Development of toolkits/workbenches for use by
research scientists and rice breeders
• Make these databases, tools, & analyses results
publicly accessible (& constantly updated)
More analysis …
Can we assemble new references?
Find important SNPs (merging with current GWAS/QTL results)
- in CDSs
- in promoters and other regulatory motifs
Reconstruct large deletions/insertions/inversions in genome
Find correlated SNPs
Focus on known genes associated with traits
Find conserved genome regions selected by breeders

•Need for speed
•Need for collaborations
IRIC: International Rice Informatics
Consortium
PAG 2013 : first introduction of the initiative to the scientific community

IRIC Portal is a central
point of IRIC

NIAS
MIPS
CAS
Academia Sinica
EMBRAPA
CSHL

Cornell
Cirad
CAAS
MPI
AGI
Gramene

TGAC
IRD
KZI
Wageningen UR
Plant Onto
Uni Queensland
Initial Contact Organizations
GRiSP Centers (4)
IRRI
CIAT
IRD
Cirad
Breeding Companies (7)
Bayer CropSciences
Biogemma
Mahyco
Mars Food Global
Pioneer
RiceTec
Syngenta

Foundations (5)
Gates Foundation
NSF, U.S.A.
Sloan Foundation
USAID, U.S.A.
USDA-CREES

Universities (14)
Arizona Genomics Institute
Cornell University
Federal University of Pelotas, Brazil
Huazhong AU, China
Katsetsart University, Thailand
Kyung Hee University, Korea
Louisiana State University
Michigan State University
Oregon State University
Perpignan University, France
UC-Riverside
University of Delaware
University of Queensland, Australia
Wageningen UR, Netherlands

Others (5)
GigaScience Journal
GigaDB.org
iPlant Collaborative, U.S.A.
Gramene
Plant & Trait Ontology

Institutions (16)
Academia Sinica, Taiwan
CAAS, Beijing & Shenzhen
CAS, Beijing
Cold Spring Harbor Laboratory
EMBL-EBI, U.K.
EMBRAPA, Brazil
ICAR, India
INRA, France
Kunming Zoo Institute, China
MIPS, Germany
MPI-Tuebingen, Germany
NCGR-CAS, SIBS, Shanghai
NIAS, Japan
The Genome Analysis Centre, U.K.
USDA-Research
NCGR, Sante Fe , NM
Tech Companies (3)
BGI-Shenzhen
Affymetrix
Pacific Biosciences
Existing rice portals
MSU Rice Genome Annotation Project http://rice.plantbiology.msu.edu/
RAP-DB http://rapdb.dna.affrc.go.jp/

BGI-RIS Rice Information System
http://rice.genomics.org.cn/rice/index2.jsp
PlantGDB
http://www.plantgdb.org/OsGDB/

Gramene
http://www.gramene.org/
IRIC portal content
•

•
•
•
•
•
•
•
•
•
•

Sequences and analysis of 3,000 genomes*
o
SNPs
o
assemblies
o
phylogenetic trees
o
genes associated with traits
o
regulatory motifs
o
most significant variations
Other available rice genome sequences (~2,000 rice entries in SRA)
Sequences of rice microorganisms
Sequences of other grasses (e.g. for C4 project)
Genotyping results from GBS, 44K and 700K affy chips
Phenotypic data
Gene expression data
Gene functions and networks
Analysis tools
Linked to rice seeds database
Linked to other IRRI databases and portals

*Total amount of genotyping data: ~3K*20Mio = 60Bio
IRIC portal development team @ IRRI
Rolando Jay Santos
Victor Jun Ulat
Frances Nikki Borja
Venice Margarette Juanillas
Jeffrey Detras
Roven Rommel Fuentes
Ramil Mauleon
Kenneth McNally
Nickolai Alexandrov

Technological advices
Marco van den Berg

Visionary input
Achim Dobermann

Management team
Hei Leung
Ruaraidh Sackville Hamilton
Kenneth McNally
Ramil Mauleon
Nickolai Alexandrov
We need you!!!
• Now Hiring: Two (2) post-doctoral positions for
computational biology / bioinformatics based at IRRI
• http://irri.org

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Interoperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use CasesInteroperable Data for KnetMiner and DFW Use Cases
Interoperable Data for KnetMiner and DFW Use Cases
 
Bioinformatics databases: Current Trends and Future Perspectives
Bioinformatics databases: Current Trends and Future PerspectivesBioinformatics databases: Current Trends and Future Perspectives
Bioinformatics databases: Current Trends and Future Perspectives
 
The Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsThe Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of Bioinformatics
 
Building an informatics solution to sustain AI-guided cell profiling with hig...
Building an informatics solution to sustain AI-guided cell profiling with hig...Building an informatics solution to sustain AI-guided cell profiling with hig...
Building an informatics solution to sustain AI-guided cell profiling with hig...
 
Beyond the PDF 2, 2013
Beyond the PDF 2, 2013Beyond the PDF 2, 2013
Beyond the PDF 2, 2013
 
AI and Machine Learning for Secondary Metabolite Prediction
AI and Machine Learning for Secondary Metabolite PredictionAI and Machine Learning for Secondary Metabolite Prediction
AI and Machine Learning for Secondary Metabolite Prediction
 
Introducing the KnetMiner Knowledge Graph: things, not strings
Introducing the KnetMiner Knowledge Graph: things, not stringsIntroducing the KnetMiner Knowledge Graph: things, not strings
Introducing the KnetMiner Knowledge Graph: things, not strings
 
KnetMiner - EBI Workshop 2017
KnetMiner - EBI Workshop 2017KnetMiner - EBI Workshop 2017
KnetMiner - EBI Workshop 2017
 
OpenTox Europe 2013
OpenTox Europe 2013OpenTox Europe 2013
OpenTox Europe 2013
 
CSHALS 2013
CSHALS 2013CSHALS 2013
CSHALS 2013
 
AgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with BioschemasAgriSchemas: Sharing Agrifood data with Bioschemas
AgriSchemas: Sharing Agrifood data with Bioschemas
 
Data sharing - Data management - The SysMO-SEEK Story
Data sharing - Data management - The SysMO-SEEK StoryData sharing - Data management - The SysMO-SEEK Story
Data sharing - Data management - The SysMO-SEEK Story
 
BTIS
BTISBTIS
BTIS
 
DCC Keynote 2007
DCC Keynote 2007DCC Keynote 2007
DCC Keynote 2007
 
Experimenta
ExperimentaExperimenta
Experimenta
 
Bioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuBioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahu
 
Career oppurtunities in the field of Bioinformatics
Career oppurtunities in the field of BioinformaticsCareer oppurtunities in the field of Bioinformatics
Career oppurtunities in the field of Bioinformatics
 
2015 Summer - Araport Project Overview Leaflet
2015 Summer - Araport Project Overview Leaflet2015 Summer - Araport Project Overview Leaflet
2015 Summer - Araport Project Overview Leaflet
 
Towards automated phenotypic cell profiling with high-content imaging
Towards automated phenotypic cell profiling with high-content imagingTowards automated phenotypic cell profiling with high-content imaging
Towards automated phenotypic cell profiling with high-content imaging
 
Artificial Intelligence in Life Sciences and Agriculture.
Artificial Intelligence in Life Sciences and Agriculture.Artificial Intelligence in Life Sciences and Agriculture.
Artificial Intelligence in Life Sciences and Agriculture.
 

Ähnlich wie Ramil Mauleon: Galaxy: bioinformatics for rice scientists

Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Spark Summit
 
Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
Apollo and i5K: Collaborative Curation and Interactive Analysis of GenomesApollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
Monica Munoz-Torres
 
2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)
Michael Atkins
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Carole Goble
 
FedCentric_Presentation
FedCentric_PresentationFedCentric_Presentation
FedCentric_Presentation
Yatpang Cheung
 

Ähnlich wie Ramil Mauleon: Galaxy: bioinformatics for rice scientists (20)

Advanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchAdvanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven Research
 
Towards Automated AI-guided Drug Discovery Labs
Towards Automated AI-guided Drug Discovery LabsTowards Automated AI-guided Drug Discovery Labs
Towards Automated AI-guided Drug Discovery Labs
 
Pine education-platform
Pine education-platformPine education-platform
Pine education-platform
 
Ramil Mauleon: IRRI GALAXY: bioinformatics for rice scientists
Ramil Mauleon: IRRI GALAXY: bioinformatics for rice scientistsRamil Mauleon: IRRI GALAXY: bioinformatics for rice scientists
Ramil Mauleon: IRRI GALAXY: bioinformatics for rice scientists
 
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
 
Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
Apollo and i5K: Collaborative Curation and Interactive Analysis of GenomesApollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
 
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
 
Designing a community resource - Sandra Orchard
Designing a community resource - Sandra OrchardDesigning a community resource - Sandra Orchard
Designing a community resource - Sandra Orchard
 
EiTESAL eHealth Conference 14&15 May 2017
EiTESAL eHealth Conference 14&15 May 2017 EiTESAL eHealth Conference 14&15 May 2017
EiTESAL eHealth Conference 14&15 May 2017
 
Genome sharing projects around the world nijmegen oct 29 - 2015
Genome sharing projects around the world   nijmegen oct 29 - 2015Genome sharing projects around the world   nijmegen oct 29 - 2015
Genome sharing projects around the world nijmegen oct 29 - 2015
 
2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
 
CV_10/17
CV_10/17CV_10/17
CV_10/17
 
Cv long
Cv longCv long
Cv long
 
Supporting researchers in the molecular life sciences Jeff Christiansen
Supporting researchers in the molecular life sciences Jeff Christiansen Supporting researchers in the molecular life sciences Jeff Christiansen
Supporting researchers in the molecular life sciences Jeff Christiansen
 
Grand round whsiao_may2015
Grand round whsiao_may2015Grand round whsiao_may2015
Grand round whsiao_may2015
 
How Can We Make Genomic Epidemiology a Widespread Reality? - William Hsiao
How Can We Make Genomic Epidemiology a Widespread Reality?  - William HsiaoHow Can We Make Genomic Epidemiology a Widespread Reality?  - William Hsiao
How Can We Make Genomic Epidemiology a Widespread Reality? - William Hsiao
 
Mik Black bioinformatics symposium
Mik Black bioinformatics symposiumMik Black bioinformatics symposium
Mik Black bioinformatics symposium
 
Mik Black bioinformatics symposium
Mik Black bioinformatics symposiumMik Black bioinformatics symposium
Mik Black bioinformatics symposium
 
FedCentric_Presentation
FedCentric_PresentationFedCentric_Presentation
FedCentric_Presentation
 

Mehr von GigaScience, BGI Hong Kong

Mehr von GigaScience, BGI Hong Kong (20)

IDW2022: A decades experiences in transparent and interactive publication of ...
IDW2022: A decades experiences in transparent and interactive publication of ...IDW2022: A decades experiences in transparent and interactive publication of ...
IDW2022: A decades experiences in transparent and interactive publication of ...
 
Scott Edmunds: Preparing a data paper for GigaByte
Scott Edmunds: Preparing a data paper for GigaByteScott Edmunds: Preparing a data paper for GigaByte
Scott Edmunds: Preparing a data paper for GigaByte
 
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
 
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
 
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
 
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
 
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
 
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project: Production...
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project:  Production...PAGAsia19 - The Digitalization of Ruili Botanical Garden Project:  Production...
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project: Production...
 
Democratising biodiversity and genomics research: open and citizen science to...
Democratising biodiversity and genomics research: open and citizen science to...Democratising biodiversity and genomics research: open and citizen science to...
Democratising biodiversity and genomics research: open and citizen science to...
 
Hong Kong Open Access & GigaScience: CCHK@10
Hong Kong Open Access & GigaScience: CCHK@10Hong Kong Open Access & GigaScience: CCHK@10
Hong Kong Open Access & GigaScience: CCHK@10
 
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU GuixRicardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
 
Anil Thanki at #ICG13: Aequatus: An open-source homology browser
Anil Thanki at #ICG13: Aequatus: An open-source homology browserAnil Thanki at #ICG13: Aequatus: An open-source homology browser
Anil Thanki at #ICG13: Aequatus: An open-source homology browser
 
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
 
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant scienceVenice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
 
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
 
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
 
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global PerspectiveChris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
 
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
 
Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...
 
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
 

Kürzlich hochgeladen

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Kürzlich hochgeladen (20)

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

Ramil Mauleon: Galaxy: bioinformatics for rice scientists

  • 1. IRRI Genotyping Service Laboratory Galaxy: bioinformatics for rice scientists Ramil P. Mauleon Scientist – Bioinformatics Specialist TT Chang Genetic Resources Center International Rice Research Institute ICG-8, Shenzhen, China
  • 2. Presented in behalf of my co-authors from IRRI Lead Scientists • Kenneth L. McNally – Genebank resequencing • Nickolai Alexandrov – rice informatics consortium • Michael Thomson – Genotyping Service Laboratory • Hei Leung – Program Leader Laboratory, software team • Venice Margaret Juanillas • Christine Jade Dilla-Ermita
  • 3. Outline • Introduction to IRRI & it’s research agenda • Bioinformatics support to molecular rice breeding at IRRI: IRRI GSL Galaxy • Bioinformatics support to efforts for harnessing Rice Genetic Diversity • Future activity: International Rice Informatics Consortium
  • 4. INTERNATIONAL RICE RESEARCH INSTITUTE Los Baños, Philippines Mission: Reduce poverty and hunger, Improve the health of rice farmers and consumers, Ensure environmental sustainability All done through research, partnerships Home of the Rice Green Revolution Established 1960 www.irri.org Aims to help rice farmers improve the yield and quality of their rice by developing.. •New rice varieties •Rice crop management techniques
  • 5. A single strategic work plan for global rice research… Global Rice Science Partnership : GRiSP o o o Core: 3 international research centers Numerous research partners NEED TO SHARE RESEARCH SOLUTIONS IRRI Many more…
  • 6. First GRiSP Research Theme with heavy bioinformatics … Accelerating the development, delivery, and adoption of improved rice varieties • 2.1. Breeding informatics, highthroughput marker applications, and multi-environment testing
  • 7. Allele mining for crop improvement Diverse rice germplasm “allele pool” Gene and QTL mapping Association mapping Fine-mapping, candidate gene analysis, cloning Genes controlling traits of interest Beneficial alleles Flanking and gene-based markers for molecular breeding
  • 8. Major QTLs/genes for breeding • QTLs and major genes for stress tolerance and disease resistance are known • Flanking SSRs and genebased STS markers have been used to transfer these major QTLs • Move to SNP markers for Marker Assisted Backcrossing (MABC). Marker Assisted Selection (MAS), Genomic Selection (GS) PXO71 PXO99 PXO363 PXO341 Xa genes for bacterial leaf blight
  • 9. Challenges for IRRI scientists/breeders • Not familiar with SNP-based genotyping o How do I score the alleles? (no gel image!!!) o Data does not fit my spreadsheet (run out of columns, rows)… o Cannot even view the data file using “ordinary” apps o Computer runs out of memory when I load the dataset… o Trusted analysis software crashes inexplicably… • We need to o enable field/bench researchers for bioinformatics o Share solutions openly across GRiSP partners, with rice research community as a whole
  • 10. Galaxy has features that fit our needs Open, web-based platform for accessible, reproducible, and transparent computational biomedical research. • Accessible: Users w/o programming experience can easily specify parameters and run tools and workflows. • Reproducible: Galaxy captures info so that any user can repeat and understand a complete computational analysis. • Transparent: Users share and publish analyses via the web and create interactive, web-based documents that describe a complete analysis.
  • 11. Integration of Galaxy to Genotyping Service Lab workflow Illumina BeadXpress Genotyping GenomeStudio with Alchemy Plugin Infinium Custom 6k chip Software tools on IRRI GSL-Galaxy Fluidigm EP1Genotyping Data preparation Genetic / association analysis Biioinformatic analysis
  • 13. IRRI GALAXY (current) •Deployed in the cloud (Amazon Web Services Large instance in Asia-Pacific region) •Streamlined to contain rice-specific tools and genotyping data •NO NGS assembly tools included
  • 14. Rice genome browser installed as data source for curated SNP, genome information Comprehensive information on SNPs used in GSL
  • 15. Data manipulation tools in GSL Galaxy •Format conversion for most commonly used genetic analysis, diversity study software tools
  • 16. Workflows for rice data analysis already available In place for Illumina BeadXpres, Infinium platforms, being tested on Fluidigm system…
  • 17. Software Tools for SNP analysis • • • • SNP calling: Alchemy (Wright et al 2010) SNP data exploration, visualization: Flapjack, TASSEL Genetic linkage mapping: Mapmanager QTX, R/QTL QTL analysis: R/QTL, Qgene, MPMap (for multi-parent crosses) • GWA analysis: TASSEL • Population structure / diversity analysis : Powermarker, Structure
  • 18. Flapjack: visualize, manipulate SNP data http://bioinf.scri.ac.uk/flapjack/index.shtml •View genotypes graphically, with color code (nucleotide, compared to selected line …) •Select/deselect lines, markers from dataset, •Filter lines by markers •Basic statistics on dataset
  • 20. TASSEL (Buckler Laboratory, Cornell University) : a software package to evaluate traits associations, evolutionary patterns, and linkage disequilibrium. Three areas of strength: 1. Integrates with various diversity databases (Panzea, Gramene, Sorghum , and GRIN ) 2. Provides new and powerful statistical approaches to association mapping eg. General Linear Model (GLM) and Mixed Linear Model (MLM). 3. handles a wide range of indels (insertion & deletions) which is the most common type of polymorphism in maize. www.maizegenetics.net/tassel
  • 21. Tassel stand-alone has lots of analysis tools…
  • 22. TASSEL analysis tools are being incorporated into Galaxy … TASSEL pipeline mode for GWAS, population analysis
  • 23. IRRI Galaxy Toolshed (“APPS STORE”) is under development
  • 24. Genotyping data management IRRI GSL manages data of customers … • Customer declares as private – retained in GSL Galaxy account of customer • Customer declares data as public – loaded into Genotyping Data Management System; shared with research community
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 32.
  • 33.
  • 34. Second GRiSP Research Theme with heavy bioinformatics … Harnessing genetic diversity to chart new productivity, quality, and health horizons 1.2. Characterizing genetic diversity and creating novel gene pools (SNP genotypes, whole genome sequencing, phenotypes) 1.3. Genes and allelic diversity conferring stress tolerance and enhanced nutrition (candidate genes)
  • 35. IRGC – the International Rice Genebank Collection @ IRRI World’s largest collection of rice germplasm held in trust for the world community and source countries (www.irri.org/GRC) • Over 117,000 accessions from 117 countries • Two cultivated species Oryza sativa Oryza glaberrima • 22 wild species • Relatively few accessions have donated alleles to current, highyielding varieties
  • 36. Rice exhibits deep population structure. Phylogenetic tree for 200K SNPs on 3,000 lines McNally et al., 2013 unpublished Unpublished data removed
  • 37. The Rice 3,000 Genomes Project: Sequencing for Crop Improvement Kenneth McNally, Nickolai Alexandrov, Ramil Mauleon, Chengzhi Liang, Ruaraidh Sackville Hamilton, Zhikang Li, Ren Wang, Hongliang Chen, Gengyun Zhang, Hongsheng Liang, Hei Leung, Achim Dobermann, Robert Zeigler CAAS + Many Analysis Partners . . .
  • 38. Bioinformatics challenges of the project… • Primary data analysis: SNP calls, reference genome refinement, phylogenetic analysis, genotype  phenotype association, etc… • Efficient database system that allows the integration of the genebank information with phenotypic, breeding, genomic, and IPR data • Development of toolkits/workbenches for use by research scientists and rice breeders • Make these databases, tools, & analyses results publicly accessible (& constantly updated)
  • 39. More analysis … Can we assemble new references? Find important SNPs (merging with current GWAS/QTL results) - in CDSs - in promoters and other regulatory motifs Reconstruct large deletions/insertions/inversions in genome Find correlated SNPs Focus on known genes associated with traits Find conserved genome regions selected by breeders •Need for speed •Need for collaborations
  • 40. IRIC: International Rice Informatics Consortium PAG 2013 : first introduction of the initiative to the scientific community IRIC Portal is a central point of IRIC NIAS MIPS CAS Academia Sinica EMBRAPA CSHL Cornell Cirad CAAS MPI AGI Gramene TGAC IRD KZI Wageningen UR Plant Onto Uni Queensland
  • 41. Initial Contact Organizations GRiSP Centers (4) IRRI CIAT IRD Cirad Breeding Companies (7) Bayer CropSciences Biogemma Mahyco Mars Food Global Pioneer RiceTec Syngenta Foundations (5) Gates Foundation NSF, U.S.A. Sloan Foundation USAID, U.S.A. USDA-CREES Universities (14) Arizona Genomics Institute Cornell University Federal University of Pelotas, Brazil Huazhong AU, China Katsetsart University, Thailand Kyung Hee University, Korea Louisiana State University Michigan State University Oregon State University Perpignan University, France UC-Riverside University of Delaware University of Queensland, Australia Wageningen UR, Netherlands Others (5) GigaScience Journal GigaDB.org iPlant Collaborative, U.S.A. Gramene Plant & Trait Ontology Institutions (16) Academia Sinica, Taiwan CAAS, Beijing & Shenzhen CAS, Beijing Cold Spring Harbor Laboratory EMBL-EBI, U.K. EMBRAPA, Brazil ICAR, India INRA, France Kunming Zoo Institute, China MIPS, Germany MPI-Tuebingen, Germany NCGR-CAS, SIBS, Shanghai NIAS, Japan The Genome Analysis Centre, U.K. USDA-Research NCGR, Sante Fe , NM Tech Companies (3) BGI-Shenzhen Affymetrix Pacific Biosciences
  • 42. Existing rice portals MSU Rice Genome Annotation Project http://rice.plantbiology.msu.edu/ RAP-DB http://rapdb.dna.affrc.go.jp/ BGI-RIS Rice Information System http://rice.genomics.org.cn/rice/index2.jsp PlantGDB http://www.plantgdb.org/OsGDB/ Gramene http://www.gramene.org/
  • 43. IRIC portal content • • • • • • • • • • • Sequences and analysis of 3,000 genomes* o SNPs o assemblies o phylogenetic trees o genes associated with traits o regulatory motifs o most significant variations Other available rice genome sequences (~2,000 rice entries in SRA) Sequences of rice microorganisms Sequences of other grasses (e.g. for C4 project) Genotyping results from GBS, 44K and 700K affy chips Phenotypic data Gene expression data Gene functions and networks Analysis tools Linked to rice seeds database Linked to other IRRI databases and portals *Total amount of genotyping data: ~3K*20Mio = 60Bio
  • 44. IRIC portal development team @ IRRI Rolando Jay Santos Victor Jun Ulat Frances Nikki Borja Venice Margarette Juanillas Jeffrey Detras Roven Rommel Fuentes Ramil Mauleon Kenneth McNally Nickolai Alexandrov Technological advices Marco van den Berg Visionary input Achim Dobermann Management team Hei Leung Ruaraidh Sackville Hamilton Kenneth McNally Ramil Mauleon Nickolai Alexandrov
  • 45. We need you!!! • Now Hiring: Two (2) post-doctoral positions for computational biology / bioinformatics based at IRRI • http://irri.org