CBIIT GigaGalaxy – A Galaxy-based Platform
     for Large-scale Genomics Analysis
                    Tin-Lap, LEE
            School of Biomedical Sciences,
      CUHK-BGI Innovation Institute of Trans-omics,
         The Chinese University of Hong Kong,
                Hong Kong SAR, China.
CBIIT
        • Jointly established between
          The Chinese University of
          Hong Kong (CUHK) and BGI.

        • “We aim to provide a
          platform conducive to
          training of multi-disciplinary
          talents conversant with the
          knowledge and application
          of genomics, proteomics,
          genetics, computation
          biology and bioinformatics,
          by capitalizing on both
          institutions’ expertise and
          strengths in genomic
          science.”
Big Data Translates into Big
  Opportunities... and Big
     Responsibilities
The challenges for biomedical scientists
The challenges for biomedical scientists
http://galaxyproject.org/
CBIIT GigaGalaxy
Highlights:

• Provides enhanced functionality in addition to the original Galaxy
  functions

     – Specialized instances

     – Speed: local servers with SBS-UCSC genome database mirror in Hong
       Kong

     – Reproducibility: Seamless integration with Taverna/myExperiment
       workflows

     – Data exchange and publishing: GigaScience journal portal/GigaDB

     – Customized functions and more…
CBIIT GigaGalaxy

Benefits:

• Simplifies complicated bioinformatics tasks, accelerates data processing and
  allows flexible analysis.

• Significantly reduces software and hardware costs and encourages research
  collaboration.
Galaxy/CUHK-BGI




http://www.cuhk.edu.hk/cbiit/galaxy.html
CBIIT GigaGalaxy Structure

Tool Development | Biomedical and bioinformatics research | Publishing
What is SOAP?
•   SOAP: a tool package from BGI that provides a full solution for NGS data analysis.




                                                   http://soap.genomics.org.cn/
Why SOAP?
• Galaxy has been using SAMtools for consensus sequence calling, but the
  recent upgrade has left this part out, which is very limiting for some
  biologists.

• SOAPsnp is the only other method that can call full consensus sequences
  besides SAMtools.

• The main Galaxy site supports none of the SOAP tools, including SOAPsnp.
Galaxy Tool Shed
• Enables sharing of Galaxy tools across
  Galaxy servers around the world.

• SOAP package tools configured for use in
  Galaxy.
   – SOAPsnp/SOAPdenovo
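
As a rough illustration of what configuring a command-line tool for Galaxy can involve, the sketch below shows a small Python wrapper that turns menu selections into a command line and a loggable record of the run. The executable name `example_tool`, its flags and the option names are hypothetical placeholders, not actual SOAP options, and the real Tool Shed wrappers may be structured quite differently.

```python
import shlex

# Hypothetical menu choices mapped to command-line values. These are
# placeholders for illustration only, not real SOAPsnp/SOAPdenovo options.
MENU_CHOICES = {
    "output_format": {"text": "txt", "compressed": "gz"},
    "quality_scale": {"standard": "33", "legacy": "64"},
}


def build_command(executable, input_path, output_path, selections):
    """Translate drop-down selections into an argument list."""
    args = [executable, "--input", input_path, "--output", output_path]
    for name in sorted(selections):
        label = selections[name]
        if name not in MENU_CHOICES:
            raise ValueError(f"unknown parameter: {name}")
        if label not in MENU_CHOICES[name]:
            raise ValueError(f"invalid choice {label!r} for {name}")
        # e.g. "output_format" becomes the (hypothetical) flag "--output-format"
        args += ["--" + name.replace("_", "-"), MENU_CHOICES[name][label]]
    return args


def describe_run(args):
    """Return a shell-quoted string that can be stored with each run."""
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    cmd = build_command("example_tool", "reads.fq", "out.txt",
                        {"output_format": "text"})
    print(describe_run(cmd))
```

Restricting each parameter to a fixed set of choices is what lets a form present it as a menu, and keeping the quoted command string with each run gives a simple record of the settings used.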
NGS mapping: SOAP1
NGS mapping: SOAP2
SOAPsnp
SOAPpopindel
NGS De Novo Assembly: SOAPdenovo
NGS De Novo Assembly: SOAPdenovo2
CBIIT GigaGalaxy structure

Bioinformatics Development | Biomedical and bioinformatics research | Publishing
How does it work?

                              • myExperiment: a repository for workflows.

                               – Taverna workflows.

                               – New: Galaxy workflows.

                              • CBIIT GigaGalaxy integration
http://www.myexperiment.org
Taverna workflow




          http://www.taverna.org.uk/
Galaxy workflow
Import (1)
Import (2)
Export (1)
Export (2)
SOAPdenovo2 Galaxy workflow
CBIIT GigaGalaxy structure

Bioinformatics Development | Biomedical and bioinformatics research | Publishing
Now launched…




        Large-Scale Data
        Journal/Database
       In conjunction with:


Editor-in-Chief: Laurie Goodman, PhD
Editor: Scott Edmunds, PhD
Commissioning Editor: Nicole Nogoy, PhD

     www.gigasciencejournal.com
GigaScience is go…
Data Publishing




 www.gigaDB.org
40 Datasets with DOI®s

Key: Released pre-publication; Non-BGI; Paper in GigaScience

Invertebrate
• Ant
   – Florida carpenter ant
   – Jerdon’s jumping ant
   – Leaf-cutter ant
• Roundworm
• Schistosoma
• Silkworm

Vertebrates
• Giant panda
• Macaque
   – Chinese rhesus
   – Crab-eating
• Mini-Pig
• Naked mole rat
• Parrot
• Penguin
   – Emperor penguin
   – Adelie penguin
• Pigeon, domestic
• Polar bear
• Sheep
• Tibetan antelope

Plants
• Chinese cabbage
• Cucumber
• Foxtail millet
• Pigeonpea
• Potato
• Sorghum

Human
• Asian individual (YH) v1+v2
   – DNA Methylome
   – Genome Assembly
   – Transcriptome
• Cancer (14TB)
• Hep B infected exomes
• Single Cell Bladder Cancer
• Ancient DNA
   – Saqqaq Eskimo
   – Aboriginal Australian

Microbes
• E. coli O104:H4 TY-2482

Cell-Line
• Chinese Hamster Ovary
• Mouse Methylomes

Coming soon…
• Microbiome data
GigaDB v2 export to CBIIT GigaGalaxy
How are we supporting data reproducibility?

[Diagram: GigaScience paper; Data sets; Analyses; Community tools for data reproduction and reuse]
CBIIT GigaGalaxy

Data, Data, Data… Big data from the “Sequencing Coal Face”

[Diagram: Data Modeling; Pipeline design; Validation; Applications]

Tin-Lap Lee, CUHK
Acknowledgements
•   Lee Lab (CUHK)
     – Huayan Gao

•   GigaScience
     – Scott Edmunds
     – Peter Li
     – Tam Sneddon

•   BGI-Hong Kong
     – Dennis Chan
     – Edmond Leung

•   BGI-Shenzhen
     – Ruiqiang Li
     – Ruibang Luo
     – Haofu Wu
     – SOAP team members

•   Galaxy team
     – Nate Coraor

•   myExperiment
     – Finn Bacall
     – Dave De Roure

•   NBIC
     – Kostas Karasavvas
Thank you

Editor's notes

  1. The first section of this talk is about the implementation of a public instance using the Galaxy Tool Shed. We are currently implementing the first public SOAP instance on the platform.
  2. The SOAP package provides a set of tools for processing NGS data. There are different versions of SOAP for mapping short reads to reference sequences. There are also tools like SOAPdenovo, for constructing a new genome sequence, and SOAPsnp, which can assemble a consensus sequence and identify the SNPs present in it relative to a reference. Documentation in the BGI SOAP package is limited in scope, making the tools difficult to use. We will be working with the BGI developers to provide test data and Galaxy pipelines demonstrating the use of SOAP.
  3. Other than its popularity, another main reason to implement the SOAP tools is that …
  4. We transform the command-line-based SOAP tools into Galaxy tools through the Galaxy Tool Shed. The Tool Shed is useful for transforming any program through a Python wrapper. I should say the Galaxy team did a great job on this, and they were very helpful during the development process. By doing that… it allows
  5. You can see that all the parameters have been transformed into drop-down menus. We also added an explanation for each parameter, so that the user has a better understanding of each item.
  6. Similar to SOAPsnp, the complicated parameters and options have been transformed. The settings are recorded for each run, so that one can trace back easily.
  7. So much for the tool development; the second part of the talk will focus on workflow implementation using the workflows from myExperiment.
  8. What does semantic mean in the
  9. Introduction to GigaScience, a journal published by BGI and BioMed Central that focuses on the publication of papers involving the analysis of large-scale omics data (show first issue slide). In addition, the journal focuses on making the experimental data and results published in its papers reproducible for readers. Data produced from post-genomic experiments can be stored in GigaScience's GigaDB database. It currently holds 37 data sets of mainly NGS data (show slide). Each data set is allocated a DOI (Digital Object Identifier), which enables the data set to be uniquely identified and cited, providing a handle for tracking its usage.
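
To make the role of the DOI more concrete, here is a minimal Python sketch of how a data set DOI can be checked and turned into a resolvable link through the public doi.org resolver. The DOI shown is a made-up placeholder, not a real GigaDB identifier, and the pattern check is a loose approximation of DOI syntax rather than a full validator.

```python
import re
from urllib.parse import quote

# Loose shape check: "10.", a registrant code, "/", then a suffix.
# This is an approximation for illustration, not a complete DOI validator.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi):
    """Return the doi.org resolver URL for a DOI string."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep "/" readable; percent-encode anything unsafe in a URL path.
    return "https://doi.org/" + quote(doi, safe="/")


if __name__ == "__main__":
    # Placeholder DOI for illustration only.
    print(doi_to_url("10.1234/example-dataset"))
```

Because the link is derived only from the identifier, a citation that carries the DOI stays usable even if the hosting location of the data set changes.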