SlideShare ist ein Scribd-Unternehmen logo
1 von 77
Downloaden Sie, um offline zu lesen
Motivation     Solution             Implementation   Demonstration




          Developing distributed analysis
         pipelines with shared community
        resources using CloudBioLinux and
                    CloudMan

                     Brad Chapman
                   Bioinformatics Core
             Harvard School of Public Health


                          22 September 2011
Motivation     Solution    Implementation   Demonstration



Acknowledgements

     CloudBioLinux – Ntino Krampis, Tim
             Booth, Dawn Field, Pjotr Prins and
             CloudBioLinux community
     CloudMan – Enis Afgan, James Taylor
     Exome pipeline – HSPH, MGH, Win Hide,
             Oliver Hofmann
Motivation   Solution   Implementation   Demonstration



Follow along



     http://www.slideshare.net/chapmanb
Motivation   Solution   Implementation   Demonstration



Cue the “lots of data” slide


     ls -lh fastq/
     24G 1_110907_AD08A5ACXX_1_fastq.txt
     21G 1_110907_AD08A5ACXX_2_fastq.txt
     24G 2_110907_AD08A5ACXX_1_fastq.txt
     20G 2_110907_AD08A5ACXX_2_fastq.txt
Motivation   Solution   Implementation   Demonstration



Rapidly changing tools
Motivation   Solution       Implementation   Demonstration



Science – fundamental challenge



      75%     one-off experimental
      25%     reused code
Motivation       Solution      Implementation      Demonstration



Unfortunate result




     http://news.ycombinator.com/item?id=2735537
Motivation      Solution    Implementation   Demonstration



Hard choices

     Computation
     Demands flexible, well-architected, scalable
     code
     Science
     Requires rapid turn around and
     experimentation
Motivation         Solution    Implementation   Demonstration



2 solutions (at least)



         1   Improve your programming skills
         2   Utilize community resources
Motivation   Solution   Implementation   Demonstration



Become a better coder




     http://software-carpentry.org/
Motivation         Solution     Implementation     Demonstration



Community resources


             Share painful parts
             Base of well-written, scalable code

     Start each problem from a higher level of
     abstraction
Motivation         Solution    Implementation   Demonstration



Community components


             CloudBioLinux – install software
             CloudMan – manage cluster
             Exome analysis pipeline – do science
Motivation        Solution    Implementation    Demonstration



CloudBioLinux

             Amazon image with bioinformatics
             software and libraries
             Automated build framework
             Community effort to maintain and
             extend
     http://cloudbiolinux.org
Motivation        Solution    Implementation   Demonstration



CloudMan

             SGE cluster plus automation
             Web interface and monitoring
             Persistence and sharing
             Powers the Galaxy Cloud offering
     http://wiki.g2.bx.psu.edu/Admin/Cloud
Motivation         Solution        Implementation   Demonstration



Exome analysis pipeline

             Existing algorithms
                 Aligners – Bowtie, BWA
                 Variation – GATK
                 Quality assessment – FastQC, Picard
             Messaging system – AMQP
     https://github.com/chapmanb/bcbb/
     tree/master/nextgen
Motivation   Solution   Implementation   Demonstration



Fastq lane processing
Motivation   Solution   Implementation   Demonstration



Sample processing
Motivation   Solution   Implementation   Demonstration



Variant calling
Motivation   Solution   Implementation   Demonstration



Parallelization
Motivation   Solution   Implementation   Demonstration
Motivation         Solution     Implementation   Demonstration



Amazon


             Virtual machines
                  Share
                  Reproduce
                  Coordinate
             Accessibility
Motivation        Solution   Implementation   Demonstration



What are we going to do?


             Use AWS console to boot
             CloudBioLinux
             Setup CloudMan in AWS console
             Boot CloudMan instance with demo
             data
Motivation         Solution   Implementation   Demonstration



What are we going to do?
continued

             Manage cluster with CloudMan interface
             Setup messaging queue
             Run pipeline, examine results
             Share cluster
Motivation        Solution       Implementation   Demonstration



CloudBioLinux

             Select and launch CloudBioLinux AMI
             from AWS console
             Connect
                 FreeNX graphical client
                 ssh

     Full tutorial PDF: http://j.mp/nnh5TE
Motivation           Solution        Implementation       Demonstration



Prep work


             Signup for AWS account:
             http://aws.amazon.com/
             Create login key pair in AWS Console
             Install NX client:
             http://www.nomachine.com/select-package-client.php
https://console.aws.amazon.com/ec2/
Select CloudBioLinux image from Community AMIs
enter NX password in user-data (freenxpass: secret)
Launch CloudBioLinux server
Get external hostname from Instances page
Connect using NX client, with ubuntu user and secret password
Connect with ssh, using private ssh key-pair
Terminate the server when finished
Motivation         Solution    Implementation   Demonstration



Setup CloudMan in AWS console


             Create a custom security group

     Full tutorial:
     http://wiki.g2.bx.psu.edu/Admin/Cloud
Create security group rules following wiki instructions
Final security group specifications
Motivation        Solution   Implementation   Demonstration



Boot CloudMan instance with
demo data


             Start server
             Pass in CloudMan user data
             Load shared CloudMan image
Follow same procedure as CloudBioLinux
Create CloudMan user-data file


 cluster_name: cbldemo
 password: cbl
 access_key: your_access_key
 secret_key: your_long_AWS_secret_key
Provide user-data from file
Choose created security group
Login to instance with password from user-data
Motivation           Solution          Implementation          Demonstration



CloudMan share-an-instance


             Persist data in a CloudMan cluster
             Easily sharable
     For this demo
     cm-b53c6f1223f966914df347687f6fc818/shared/2011-10-07–14-00
Import shared instance with demo data
Motivation         Solution    Implementation   Demonstration



Manage cluster with CloudMan


             Web-based console
             Monitor running processes
             Add nodes to cluster as needed
CloudMan console to interact with cluster
Add node to cluster
Motivation        Solution    Implementation   Demonstration



Setup messaging communication


             Command line access to server
             Adjust RabbitMQ configuration
             Setup messaging queue
Motivation       Solution      Implementation    Demonstration



Command line access to server

     ssh -i ~/.ec2/id-kunkel.keypair
         ubuntu@ec2-67-202-14-208.compute-1.amazonaws.com


     Follow approach used to connect to
     CloudBioLinux cluster; can also connect via
     NX
Motivation       Solution   Implementation   Demonstration




     Edit /export/data/galaxy/universe_wsgi.ini
     configuration file to add internal host name.

         [galaxy_amqp]
         host = ip-10-125-10-182.ec2.internal
         port = 5672
         userid = biouser
         password = tester
Motivation         Solution        Implementation        Demonstration



Setup messaging queue

     $ sudo rabbitmqctl add_user biouser tester
      creating user ’biouser’ ...
      ...done.
     $ sudo rabbitmqctl add_vhost bionextgen
      creating vhost ’bionextgen’ ...
      ...done.
     $ sudo rabbitmqctl set_permissions -p bionextgen
            biouser ".*" ".*" ".*"
      setting permissions for user ’biouser’ in vhost ’bionextgen’ ..
      ...done.
Motivation         Solution    Implementation   Demonstration



Run pipeline, examine results


             Ready to run distributed pipeline
             Demo data – two paired end fastq lanes
             Variant calling workflow
Motivation       Solution   Implementation   Demonstration



Input sequence data


         $ ls -1 /export/data/exome_example/fastq/
         7_100326_FC6107FAAXX_1-chr22.fastq
         7_100326_FC6107FAAXX_2-chr22.fastq
         8_100326_FC6107FAAXX_1-chr22.fastq
         8_100326_FC6107FAAXX_2-chr22.fastq
Motivation         Solution        Implementation        Demonstration



Run level: YAML Configuration
     $ cat /export/data/exome_example/config/run_info.yaml
     ---
     fc_date: ’100326’
     fc_name: FC6107FAAXX
     details:
       - files: [7_100326_FC6107FAAXX_1-chr22.fastq,
                 7_100326_FC6107FAAXX_2-chr22.fastq]
         lane: 7
         description: Test replicate 1
         analysis: SNP calling
         genome_build: hg19
         algorithm:
           quality_format: Standard
           hybrid_bait: hybrid_selection/baits.bed
           hybrid_target: hybrid_selection/targets.bed
Motivation         Solution        Implementation   Demonstration



System level: YAML Configuration
     $ cat /export/data/galaxy/post_process.yaml
     ---
     program:
       bowtie: bowtie
       bwa: bwa
       ucsc_bigwig: wigToBigWig
       picard: /usr/share/java/picard
       gatk: /usr/share/java/gatk
       snpEff: /usr/share/java/snpeff
       fastqc: fastqc
     distributed:
       cluster_platform: sge
       platform_args: ’-q all.q’
       cores_per_host: 1
       rabbitmq_vhost: bionextgen
Motivation       Solution      Implementation    Demonstration



Run exome pipeline


     $ cd /export/data/work
     $ distributed_nextgen_pipeline.py
         /export/data/galaxy/post_process.yaml
         /export/data/exome_example/fastq
         /export/data/exome_example/config/run_info.yaml
Motivation   Solution   Implementation   Demonstration



What just happened?
Motivation         Solution        Implementation       Demonstration



Monitoring: SGE queues


     $ qstat
     ob-ID prior name state submit/start at    queue
     --------------------------------------------------------------
     1 0.55500 nextgen_an r 18:16:32 all.q@ip-10-125-10-182.ec2.int
     2 0.55500 nextgen_an r 18:16:32 all.q@ip-10-86-254-105.ec2.int
     3 0.55500 automated_ r 18:16:47 all.q@ip-10-125-10-182.ec2.int
Motivation         Solution        Implementation        Demonstration



Monitoring: Analysis directory


     $ cd /export/data/work
     $ ls -lh
     drwxr-xr-x 4.0 alignments
     -rw-r--r-- 2.0K automated_initial_analysis.py.o11
     drwxr-xr-x   33 log
     -rw-r--r-- 15K nextgen_analysis_server.py.o10
     -rw-r--r-- 15K nextgen_analysis_server.py.o9
     drwxr-xr-x 102 tmp
Motivation         Solution        Implementation        Demonstration



Monitoring: Log files

     $ less nextgen_analysis_server.py.o10
      INFO: nextgen_pipeline: Processing sample: Test replicate 2;
        lane 8; reference genome hg19; researcher ;
        analysis method SNP calling
      INFO: nextgen_pipeline:
        Aligning lane 8_100326_FC6107FAAXX with bwa aligner
      INFO: nextgen_pipeline:
        Combining and preparing wig file [u’’, u’Test replicate 2’]
      INFO: nextgen_pipeline:
        Recalibrating [u’’, u’Test replicate 2’] with GATK
Motivation         Solution        Implementation          Demonstration



Retrieve results: Copy files

     $ upload_to_galaxy.py
         /export/data/galaxy/post_process.yaml
         /export/data/exome_example/fastq
         /export/data/work
         /export/data/exome_example/config/run_info.yaml


     Final files copied into new directory; allows
     cleanup of analysis directory
Motivation         Solution        Implementation        Demonstration



Retrieve results: Output directory


     $ ls -lh /export/data/galaxy/storage/100326_FC6107FAAXX/7
     -rw-r--r-- 38M 7_100326_FC6107FAAXX.bam
     -rw-r--r-- 22M 7_100326_FC6107FAAXX-coverage.bigwig
     -rw-r--r-- 72M 7_100326_FC6107FAAXX-gatkrecal.bam
     -rw-r--r-- 109K 7_100326_FC6107FAAXX-snp-effects.tsv
     -rw-r--r-- 827K 7_100326_FC6107FAAXX-snp-filter.vcf
     -rw-r--r-- 1.6M 7_100326_FC6107FAAXX-summary.pdf
Motivation        Solution     Implementation       Demonstration



Share results


             Share-an-instance
             Uses CloudMan web interface
             Reproducible research
                 CloudBioLinux AMI – software
                 CloudMan – data and configuration
CloudMan console enables push button sharing
Can make public or available to specific collaborators
When finished, turn everything off through CloudMan
Motivation         Solution    Implementation   Demonstration



Summary

     CloudBioLinux

             Shared machine image of biological
             software
             Boot from AWS console
             Connect with NX graphical client and
             ssh
Motivation         Solution    Implementation   Demonstration



Summary

     CloudMan

             Cluster setup and management
             Boot from share-an-instance
             Manage cluster through web interface
             Share final results
Motivation         Solution     Implementation   Demonstration



Summary

     Exome pipeline

             Parallel framework for running analyses
             Run using automated scripts
             Extract alignments, variant calls and
             summary information
Motivation       Solution      Implementation       Demonstration



Future: interfaces make it easier




     https://bitbucket.org/hbc/galaxy-central-hbc
Motivation   Solution   Implementation   Demonstration



Future: Simplified file selection
Motivation   Solution   Implementation   Demonstration



Future: Top level parameters
Motivation   Solution   Implementation   Demonstration



Future: Galaxy data libraries
Motivation   Solution   Implementation   Demonstration



Future: Galaxy analysis
Motivation   Solution   Implementation   Demonstration



Future: External UCSC
visualization
Motivation       Solution   Implementation   Demonstration



Read more

             Step-by-step instructions
             http://j.mp/rp69nx
             Approaches to parallelism
             http://j.mp/nPQHcm
             Future work
             http://bcbio.wordpress.com

Weitere ähnliche Inhalte

Was ist angesagt?

DevNet Associate : Python introduction
DevNet Associate : Python introductionDevNet Associate : Python introduction
DevNet Associate : Python introductionJoel W. King
 
Let's go HTTPS-only! - More Than Buying a Certificate
Let's go HTTPS-only! - More Than Buying a CertificateLet's go HTTPS-only! - More Than Buying a Certificate
Let's go HTTPS-only! - More Than Buying a CertificateSteffen Gebert
 
Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorialemedin
 
Monitoring Akka with Kamon 1.0
Monitoring Akka with Kamon 1.0Monitoring Akka with Kamon 1.0
Monitoring Akka with Kamon 1.0Steffen Gebert
 
Virtual CD4PE Workshop
Virtual CD4PE WorkshopVirtual CD4PE Workshop
Virtual CD4PE WorkshopPuppet
 
Monitor your Java application with Prometheus Stack
Monitor your Java application with Prometheus StackMonitor your Java application with Prometheus Stack
Monitor your Java application with Prometheus StackWojciech Barczyński
 
Tribal Nova Docker feedback
Tribal Nova Docker feedbackTribal Nova Docker feedback
Tribal Nova Docker feedbackNicolas Degardin
 
Monitoring kubernetes with prometheus
Monitoring kubernetes with prometheusMonitoring kubernetes with prometheus
Monitoring kubernetes with prometheusBrice Fernandes
 
Spring Boot Revisited with KoFu and JaFu
Spring Boot Revisited with KoFu and JaFuSpring Boot Revisited with KoFu and JaFu
Spring Boot Revisited with KoFu and JaFuVMware Tanzu
 
CMake: Improving Software Quality and Process
CMake: Improving Software Quality and ProcessCMake: Improving Software Quality and Process
CMake: Improving Software Quality and ProcessMarcus Hanwell
 
Cleaning Up the Dirt of the Nineties - How New Protocols are Modernizing the Web
Cleaning Up the Dirt of the Nineties - How New Protocols are Modernizing the WebCleaning Up the Dirt of the Nineties - How New Protocols are Modernizing the Web
Cleaning Up the Dirt of the Nineties - How New Protocols are Modernizing the WebSteffen Gebert
 
CloudStack usage service
CloudStack usage serviceCloudStack usage service
CloudStack usage serviceShapeBlue
 
Ninja, Choose Your Weapon!
Ninja, Choose Your Weapon!Ninja, Choose Your Weapon!
Ninja, Choose Your Weapon!Anton Weiss
 
Kubernetes Cluster API - managing the infrastructure of multi clusters (k8s ...
Kubernetes Cluster API - managing the infrastructure of  multi clusters (k8s ...Kubernetes Cluster API - managing the infrastructure of  multi clusters (k8s ...
Kubernetes Cluster API - managing the infrastructure of multi clusters (k8s ...Tobias Schneck
 
How to add a new hypervisor to CloudStack - Lessons learned from Hyper-V effort
How to add a new hypervisor to CloudStack - Lessons learned from Hyper-V effortHow to add a new hypervisor to CloudStack - Lessons learned from Hyper-V effort
How to add a new hypervisor to CloudStack - Lessons learned from Hyper-V effortShapeBlue
 
Gradle 3.0: Unleash the Daemon!
Gradle 3.0: Unleash the Daemon!Gradle 3.0: Unleash the Daemon!
Gradle 3.0: Unleash the Daemon!Eric Wendelin
 

Was ist angesagt? (19)

DevNet Associate : Python introduction
DevNet Associate : Python introductionDevNet Associate : Python introduction
DevNet Associate : Python introduction
 
Let's go HTTPS-only! - More Than Buying a Certificate
Let's go HTTPS-only! - More Than Buying a CertificateLet's go HTTPS-only! - More Than Buying a Certificate
Let's go HTTPS-only! - More Than Buying a Certificate
 
Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorial
 
Monitoring Akka with Kamon 1.0
Monitoring Akka with Kamon 1.0Monitoring Akka with Kamon 1.0
Monitoring Akka with Kamon 1.0
 
vBACD July 2012 - Deploying Private PaaS with ActiveState Stackato
vBACD July 2012 - Deploying Private PaaS with ActiveState StackatovBACD July 2012 - Deploying Private PaaS with ActiveState Stackato
vBACD July 2012 - Deploying Private PaaS with ActiveState Stackato
 
Virtual CD4PE Workshop
Virtual CD4PE WorkshopVirtual CD4PE Workshop
Virtual CD4PE Workshop
 
Monitor your Java application with Prometheus Stack
Monitor your Java application with Prometheus StackMonitor your Java application with Prometheus Stack
Monitor your Java application with Prometheus Stack
 
vBACD- July 2012 - Crash Course in Open Source Cloud Computing
vBACD- July 2012 - Crash Course in Open Source Cloud ComputingvBACD- July 2012 - Crash Course in Open Source Cloud Computing
vBACD- July 2012 - Crash Course in Open Source Cloud Computing
 
Tribal Nova Docker feedback
Tribal Nova Docker feedbackTribal Nova Docker feedback
Tribal Nova Docker feedback
 
Monitoring kubernetes with prometheus
Monitoring kubernetes with prometheusMonitoring kubernetes with prometheus
Monitoring kubernetes with prometheus
 
Spring Boot Revisited with KoFu and JaFu
Spring Boot Revisited with KoFu and JaFuSpring Boot Revisited with KoFu and JaFu
Spring Boot Revisited with KoFu and JaFu
 
CMake: Improving Software Quality and Process
CMake: Improving Software Quality and ProcessCMake: Improving Software Quality and Process
CMake: Improving Software Quality and Process
 
Cleaning Up the Dirt of the Nineties - How New Protocols are Modernizing the Web
Cleaning Up the Dirt of the Nineties - How New Protocols are Modernizing the WebCleaning Up the Dirt of the Nineties - How New Protocols are Modernizing the Web
Cleaning Up the Dirt of the Nineties - How New Protocols are Modernizing the Web
 
CloudStack usage service
CloudStack usage serviceCloudStack usage service
CloudStack usage service
 
Ninja, Choose Your Weapon!
Ninja, Choose Your Weapon!Ninja, Choose Your Weapon!
Ninja, Choose Your Weapon!
 
Kubernetes Cluster API - managing the infrastructure of multi clusters (k8s ...
Kubernetes Cluster API - managing the infrastructure of  multi clusters (k8s ...Kubernetes Cluster API - managing the infrastructure of  multi clusters (k8s ...
Kubernetes Cluster API - managing the infrastructure of multi clusters (k8s ...
 
Ddev workshop t3dd18
Ddev workshop t3dd18Ddev workshop t3dd18
Ddev workshop t3dd18
 
How to add a new hypervisor to CloudStack - Lessons learned from Hyper-V effort
How to add a new hypervisor to CloudStack - Lessons learned from Hyper-V effortHow to add a new hypervisor to CloudStack - Lessons learned from Hyper-V effort
How to add a new hypervisor to CloudStack - Lessons learned from Hyper-V effort
 
Gradle 3.0: Unleash the Daemon!
Gradle 3.0: Unleash the Daemon!Gradle 3.0: Unleash the Daemon!
Gradle 3.0: Unleash the Daemon!
 

Andere mochten auch

Andere mochten auch (20)

Vol 02 chapter 8 2012
Vol 02 chapter 8 2012Vol 02 chapter 8 2012
Vol 02 chapter 8 2012
 
007 014 belcaro corrige
007 014 belcaro corrige007 014 belcaro corrige
007 014 belcaro corrige
 
Phenomenal Oct 22, 2009
Phenomenal Oct 22, 2009Phenomenal Oct 22, 2009
Phenomenal Oct 22, 2009
 
201101 affective learning
201101 affective learning201101 affective learning
201101 affective learning
 
Urban Cottage + IceMilk Aprons
Urban Cottage + IceMilk ApronsUrban Cottage + IceMilk Aprons
Urban Cottage + IceMilk Aprons
 
Eeuwigblijvenleren
EeuwigblijvenlerenEeuwigblijvenleren
Eeuwigblijvenleren
 
201004 - brain computer interaction
201004 - brain computer interaction201004 - brain computer interaction
201004 - brain computer interaction
 
Slides boekpresentatie 'Sociale Media en Journalistiek'
Slides boekpresentatie 'Sociale Media en Journalistiek'Slides boekpresentatie 'Sociale Media en Journalistiek'
Slides boekpresentatie 'Sociale Media en Journalistiek'
 
Valvuloplastie
ValvuloplastieValvuloplastie
Valvuloplastie
 
201506 CSE340 Lecture 23
201506 CSE340 Lecture 23201506 CSE340 Lecture 23
201506 CSE340 Lecture 23
 
Uip Romain
Uip RomainUip Romain
Uip Romain
 
Eddie Slide Show
Eddie Slide ShowEddie Slide Show
Eddie Slide Show
 
New Venture Presentatie
New Venture PresentatieNew Venture Presentatie
New Venture Presentatie
 
Week11 Presentation Group-C
Week11 Presentation Group-CWeek11 Presentation Group-C
Week11 Presentation Group-C
 
Jay Cross Vivo Versao Final Corrigida
Jay Cross Vivo Versao Final CorrigidaJay Cross Vivo Versao Final Corrigida
Jay Cross Vivo Versao Final Corrigida
 
lectura
lecturalectura
lectura
 
Mobile Social Media, Sept. 2010, Do You Want To Be Visible?, Marketing Club K...
Mobile Social Media, Sept. 2010, Do You Want To Be Visible?, Marketing Club K...Mobile Social Media, Sept. 2010, Do You Want To Be Visible?, Marketing Club K...
Mobile Social Media, Sept. 2010, Do You Want To Be Visible?, Marketing Club K...
 
201506 CSE340 Lecture 09
201506 CSE340 Lecture 09201506 CSE340 Lecture 09
201506 CSE340 Lecture 09
 
Chapter 11
Chapter 11Chapter 11
Chapter 11
 
Barya Perception
Barya PerceptionBarya Perception
Barya Perception
 

Ähnlich wie Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan

Amazon resource for bioinformatics
Amazon resource for bioinformaticsAmazon resource for bioinformatics
Amazon resource for bioinformaticsBrad Chapman
 
Automated Abstraction of Flow of Control in a System of Distributed Software...
Automated Abstraction of Flow of Control in a System of Distributed  Software...Automated Abstraction of Flow of Control in a System of Distributed  Software...
Automated Abstraction of Flow of Control in a System of Distributed Software...nimak
 
Mihai Criveti - PyCon Ireland - Automate Everything
Mihai Criveti - PyCon Ireland - Automate EverythingMihai Criveti - PyCon Ireland - Automate Everything
Mihai Criveti - PyCon Ireland - Automate EverythingMihai Criveti
 
Introduction to LAVA Workload Scheduler
Introduction to LAVA Workload SchedulerIntroduction to LAVA Workload Scheduler
Introduction to LAVA Workload SchedulerNopparat Nopkuat
 
Kostiantyn Bokhan, N-iX. CD4ML based on Azure and Kubeflow
Kostiantyn Bokhan, N-iX. CD4ML based on Azure and KubeflowKostiantyn Bokhan, N-iX. CD4ML based on Azure and Kubeflow
Kostiantyn Bokhan, N-iX. CD4ML based on Azure and KubeflowIT Arena
 
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTOClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTOAltinity Ltd
 
Event-driven automation, DevOps way ~IoT時代の自動化、そのリアリティとは?~
Event-driven automation, DevOps way ~IoT時代の自動化、そのリアリティとは?~Event-driven automation, DevOps way ~IoT時代の自動化、そのリアリティとは?~
Event-driven automation, DevOps way ~IoT時代の自動化、そのリアリティとは?~Brocade
 
What's New in Docker - February 2017
What's New in Docker - February 2017What's New in Docker - February 2017
What's New in Docker - February 2017Patrick Chanezon
 
CI/CD on AWS: Deploy Everything All the Time | AWS Public Sector Summit 2016
CI/CD on AWS: Deploy Everything All the Time | AWS Public Sector Summit 2016CI/CD on AWS: Deploy Everything All the Time | AWS Public Sector Summit 2016
CI/CD on AWS: Deploy Everything All the Time | AWS Public Sector Summit 2016Amazon Web Services
 
Enhance your Agility with DevOps
Enhance your Agility with DevOpsEnhance your Agility with DevOps
Enhance your Agility with DevOpsEdureka!
 
Big Data Analytics using Mahout
Big Data Analytics using MahoutBig Data Analytics using Mahout
Big Data Analytics using MahoutIMC Institute
 
Managing Multi-hypervisor OpenStack Cloud with Single Virtual Network
Managing Multi-hypervisor OpenStack Cloud with Single Virtual NetworkManaging Multi-hypervisor OpenStack Cloud with Single Virtual Network
Managing Multi-hypervisor OpenStack Cloud with Single Virtual NetworkPLUMgrid
 
Altinity Cluster Manager: ClickHouse Management for Kubernetes and Cloud
Altinity Cluster Manager: ClickHouse Management for Kubernetes and CloudAltinity Cluster Manager: ClickHouse Management for Kubernetes and Cloud
Altinity Cluster Manager: ClickHouse Management for Kubernetes and CloudAltinity Ltd
 
Priming Your Teams For Microservice Deployment to the Cloud
Priming Your Teams For Microservice Deployment to the CloudPriming Your Teams For Microservice Deployment to the Cloud
Priming Your Teams For Microservice Deployment to the CloudMatt Callanan
 
Lessons Learned during IBM SmartCloud Orchestrator Deployment at a Large Tel...
Lessons Learned during IBM SmartCloud Orchestrator Deployment at a Large Tel...Lessons Learned during IBM SmartCloud Orchestrator Deployment at a Large Tel...
Lessons Learned during IBM SmartCloud Orchestrator Deployment at a Large Tel...Eduardo Patrocinio
 
Europe Cloud Summit - Security hardening of public cloud services
Europe Cloud Summit - Security hardening of public cloud servicesEurope Cloud Summit - Security hardening of public cloud services
Europe Cloud Summit - Security hardening of public cloud servicesRuncy Oommen
 

Ähnlich wie Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan (20)

Amazon resource for bioinformatics
Amazon resource for bioinformaticsAmazon resource for bioinformatics
Amazon resource for bioinformatics
 
Internship presentation
Internship presentationInternship presentation
Internship presentation
 
Automated Abstraction of Flow of Control in a System of Distributed Software...
Automated Abstraction of Flow of Control in a System of Distributed  Software...Automated Abstraction of Flow of Control in a System of Distributed  Software...
Automated Abstraction of Flow of Control in a System of Distributed Software...
 
Mihai Criveti - PyCon Ireland - Automate Everything
Mihai Criveti - PyCon Ireland - Automate EverythingMihai Criveti - PyCon Ireland - Automate Everything
Mihai Criveti - PyCon Ireland - Automate Everything
 
Mcollective introduction
Mcollective introductionMcollective introduction
Mcollective introduction
 
Introduction to LAVA Workload Scheduler
Introduction to LAVA Workload SchedulerIntroduction to LAVA Workload Scheduler
Introduction to LAVA Workload Scheduler
 
Kostiantyn Bokhan, N-iX. CD4ML based on Azure and Kubeflow
Kostiantyn Bokhan, N-iX. CD4ML based on Azure and KubeflowKostiantyn Bokhan, N-iX. CD4ML based on Azure and Kubeflow
Kostiantyn Bokhan, N-iX. CD4ML based on Azure and Kubeflow
 
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTOClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
 
Event-driven automation, DevOps way ~IoT時代の自動化、そのリアリティとは?~
Event-driven automation, DevOps way ~IoT時代の自動化、そのリアリティとは?~Event-driven automation, DevOps way ~IoT時代の自動化、そのリアリティとは?~
Event-driven automation, DevOps way ~IoT時代の自動化、そのリアリティとは?~
 
What's New in Docker - February 2017
What's New in Docker - February 2017What's New in Docker - February 2017
What's New in Docker - February 2017
 
CI/CD on AWS: Deploy Everything All the Time | AWS Public Sector Summit 2016
CI/CD on AWS: Deploy Everything All the Time | AWS Public Sector Summit 2016CI/CD on AWS: Deploy Everything All the Time | AWS Public Sector Summit 2016
CI/CD on AWS: Deploy Everything All the Time | AWS Public Sector Summit 2016
 
Enhance your Agility with DevOps
Enhance your Agility with DevOpsEnhance your Agility with DevOps
Enhance your Agility with DevOps
 
Big Data Analytics using Mahout
Big Data Analytics using MahoutBig Data Analytics using Mahout
Big Data Analytics using Mahout
 
Managing Multi-hypervisor OpenStack Cloud with Single Virtual Network
Managing Multi-hypervisor OpenStack Cloud with Single Virtual NetworkManaging Multi-hypervisor OpenStack Cloud with Single Virtual Network
Managing Multi-hypervisor OpenStack Cloud with Single Virtual Network
 
Altinity Cluster Manager: ClickHouse Management for Kubernetes and Cloud
Altinity Cluster Manager: ClickHouse Management for Kubernetes and CloudAltinity Cluster Manager: ClickHouse Management for Kubernetes and Cloud
Altinity Cluster Manager: ClickHouse Management for Kubernetes and Cloud
 
Priming Your Teams For Microservice Deployment to the Cloud
Priming Your Teams For Microservice Deployment to the CloudPriming Your Teams For Microservice Deployment to the Cloud
Priming Your Teams For Microservice Deployment to the Cloud
 
Lessons Learned during IBM SmartCloud Orchestrator Deployment at a Large Tel...
Lessons Learned during IBM SmartCloud Orchestrator Deployment at a Large Tel...Lessons Learned during IBM SmartCloud Orchestrator Deployment at a Large Tel...
Lessons Learned during IBM SmartCloud Orchestrator Deployment at a Large Tel...
 
Europe Cloud Summit - Security hardening of public cloud services
Europe Cloud Summit - Security hardening of public cloud servicesEurope Cloud Summit - Security hardening of public cloud services
Europe Cloud Summit - Security hardening of public cloud services
 
AWS Code Services
AWS Code ServicesAWS Code Services
AWS Code Services
 
TIAD : Automate everything with Google Cloud
TIAD : Automate everything with Google CloudTIAD : Automate everything with Google Cloud
TIAD : Automate everything with Google Cloud
 

Mehr von Brad Chapman

Biopython at BOSC 2010
Biopython at BOSC 2010Biopython at BOSC 2010
Biopython at BOSC 2010Brad Chapman
 
Developing an open source community for cloud bioinformatics
Developing an open source community for cloud bioinformaticsDeveloping an open source community for cloud bioinformatics
Developing an open source community for cloud bioinformaticsBrad Chapman
 
GATK recalibration plot
GATK recalibration plotGATK recalibration plot
GATK recalibration plotBrad Chapman
 
Next-generation sequencing request management system in Galaxy
Next-generation sequencing request management system in GalaxyNext-generation sequencing request management system in Galaxy
Next-generation sequencing request management system in GalaxyBrad Chapman
 
BioHackathon 2010 Intro
BioHackathon 2010 IntroBioHackathon 2010 Intro
BioHackathon 2010 IntroBrad Chapman
 
Lowering barriers to publishing biological data on the web
Lowering barriers to publishing biological data on the webLowering barriers to publishing biological data on the web
Lowering barriers to publishing biological data on the webBrad Chapman
 

Mehr von Brad Chapman (6)

Biopython at BOSC 2010
Biopython at BOSC 2010Biopython at BOSC 2010
Biopython at BOSC 2010
 
Developing an open source community for cloud bioinformatics
Developing an open source community for cloud bioinformaticsDeveloping an open source community for cloud bioinformatics
Developing an open source community for cloud bioinformatics
 
GATK recalibration plot
GATK recalibration plotGATK recalibration plot
GATK recalibration plot
 
Next-generation sequencing request management system in Galaxy
Next-generation sequencing request management system in GalaxyNext-generation sequencing request management system in Galaxy
Next-generation sequencing request management system in Galaxy
 
BioHackathon 2010 Intro
BioHackathon 2010 IntroBioHackathon 2010 Intro
BioHackathon 2010 Intro
 
Lowering barriers to publishing biological data on the web
Lowering barriers to publishing biological data on the webLowering barriers to publishing biological data on the web
Lowering barriers to publishing biological data on the web
 

Kürzlich hochgeladen

Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Kürzlich hochgeladen (20)

Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan

  • 1. Motivation Solution Implementation Demonstration Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan Brad Chapman Bioinformatics Core Harvard School of Public Health 22 September 2011
  • 2. Motivation Solution Implementation Demonstration Acknowledgements CloudBioLinux – Ntino Krampis, Tim Booth, Dawn Field, Pjotr Prins and CloudBioLinux community CloudMan – Enis Afgan, James Taylor Exome pipeline – HSPH, MGH, Win Hide, Oliver Hofmann
  • 3. Motivation Solution Implementation Demonstration Follow along http://www.slideshare.net/chapmanb
  • 4. Motivation Solution Implementation Demonstration Cue the “lots of data” slide ls -lh fastq/ 24G 1_110907_AD08A5ACXX_1_fastq.txt 21G 1_110907_AD08A5ACXX_2_fastq.txt 24G 2_110907_AD08A5ACXX_1_fastq.txt 20G 2_110907_AD08A5ACXX_2_fastq.txt
  • 5. Motivation Solution Implementation Demonstration Rapidly changing tools
  • 6. Motivation Solution Implementation Demonstration Science – fundamental challenge 75% one-off experimental 25% reused code
  • 7. Motivation Solution Implementation Demonstration Unfortunate result http://news.ycombinator.com/item?id=2735537
  • 8. Motivation Solution Implementation Demonstration Hard choices Computation Demands flexible, well-architected, scalable code Science Requires rapid turn around and experimentation
  • 9. Motivation Solution Implementation Demonstration 2 solutions (at least) 1 Improve your programming skills 2 Utilize community resources
  • 10. Motivation Solution Implementation Demonstration Become a better coder http://software-carpentry.org/
  • 11. Motivation Solution Implementation Demonstration Community resources Share painful parts Base of well-written, scalable code Start each problem from a higher level of abstraction
  • 12. Motivation Solution Implementation Demonstration Community components CloudBioLinux – install software CloudMan – manage cluster Exome analysis pipeline – do science
  • 13. Motivation Solution Implementation Demonstration CloudBioLinux Amazon image with bioinformatics software and libraries Automated build framework Community effort to maintain and extend http://cloudbiolinux.org
  • 14. Motivation Solution Implementation Demonstration CloudMan SGE cluster plus automation Web interface and monitoring Persistence and sharing Powers the Galaxy Cloud offering http://wiki.g2.bx.psu.edu/Admin/Cloud
  • 15. Motivation Solution Implementation Demonstration Exome analysis pipeline Existing algorithms Aligners – Bowtie, BWA Variation – GATK Quality assessment – FastQC, Picard Messaging system – AMQP https://github.com/chapmanb/bcbb/ tree/master/nextgen
  • 16. Motivation Solution Implementation Demonstration Fastq lane processing
  • 17. Motivation Solution Implementation Demonstration Sample processing
  • 18. Motivation Solution Implementation Demonstration Variant calling
  • 19. Motivation Solution Implementation Demonstration Parallelization
  • 20. Motivation Solution Implementation Demonstration
  • 21. Motivation Solution Implementation Demonstration Amazon Virtual machines Share Reproduce Coordinate Accessibility
  • 22. Motivation Solution Implementation Demonstration What are we going to do? Use AWS console to boot CloudBioLinux Setup CloudMan in AWS console Boot CloudMan instance with demo data
  • 23. Motivation Solution Implementation Demonstration What are we going to do? continued Manage cluster with CloudMan interface Setup messaging queue Run pipeline, examine results Share cluster
  • 24. Motivation Solution Implementation Demonstration CloudBioLinux Select and launch CloudBioLinux AMI from AWS console Connect FreeNX graphical client ssh Full tutorial PDF: http://j.mp/nnh5TE
  • 25. Motivation Solution Implementation Demonstration Prep work Signup for AWS account: http://aws.amazon.com/ Create login key pair in AWS Console Install NX client: http://www.nomachine.com/select-package-client.php
  • 27. Select CloudBioLinux image from Community AMIs
  • 28. enter NX password in user-data (freenxpass: secret)
  • 30. Get external hostname from Instances page
  • 31. Connect using NX client, with ubuntu user and secret password
  • 32.
  • 33. Connect with ssh, using private ssh key-pair
  • 34. Terminate the server when finished
  • 35. Motivation Solution Implementation Demonstration Setup CloudMan in AWS console Create a custom security group Full tutorial: http://wiki.g2.bx.psu.edu/Admin/Cloud
  • 36. Create security group rules following wiki instructions
  • 37. Final security group specifications
  • 38. Motivation Solution Implementation Demonstration Boot CloudMan instance with demo data Start server Pass in CloudMan user data Load shared CloudMan image
  • 39. Follow same procedure as CloudBioLinux
  • 40. Create CloudMan user-data file cluster_name: cbldemo password: cbl access_key: your_access_key secret_key: your_long_AWS_secret_key
  • 43. Login to instance with password from user-data
  • 44. Motivation Solution Implementation Demonstration CloudMan share-an-instance Persist data in a CloudMan cluster Easily sharable For this demo cm-b53c6f1223f966914df347687f6fc818/shared/2011-10-07–14-00
  • 45. Import shared instance with demo data
  • 46. Motivation Solution Implementation Demonstration Manage cluster with CloudMan Web-based console Monitor running processes Add nodes to cluster as needed
  • 47. CloudMan console to interact with cluster
  • 48. Add node to cluster
  • 49. Motivation Solution Implementation Demonstration Setup messaging communication Command line access to server Adjust RabbitMQ configuration Setup messaging queue
  • 50. Motivation Solution Implementation Demonstration Command line access to server ssh -i ~/.ec2/id-kunkel.keypair ubuntu@ec2-67-202-14-208.compute-1.amazonaws.com Follow approach used to connect to CloudBioLinux cluster; can also connect via NX
  • 51. Motivation Solution Implementation Demonstration Edit /export/data/galaxy/universe_wsgi.ini configuration file to add internal host name. [galaxy_amqp] host = ip-10-125-10-182.ec2.internal port = 5672 userid = biouser password = tester
  • 52. Motivation Solution Implementation Demonstration Setup messaging queue $ sudo rabbitmqctl add_user biouser tester creating user ’biouser’ ... ...done. $ sudo rabbitmqctl add_vhost bionextgen creating vhost ’bionextgen’ ... ...done. $ sudo rabbitmqctl set_permissions -p bionextgen biouser ".*" ".*" ".*" setting permissions for user ’biouser’ in vhost ’bionextgen’ .. ...done.
  • 53. Motivation Solution Implementation Demonstration Run pipeline, examine results Ready to run distributed pipeline Demo data – two paired end fastq lanes Variant calling workflow
  • 54. Motivation Solution Implementation Demonstration Input sequence data $ ls -1 /export/data/exome_example/fastq/ 7_100326_FC6107FAAXX_1-chr22.fastq 7_100326_FC6107FAAXX_2-chr22.fastq 8_100326_FC6107FAAXX_1-chr22.fastq 8_100326_FC6107FAAXX_2-chr22.fastq
  • 55. Motivation Solution Implementation Demonstration Run level: YAML Configuration $ cat /export/data/exome_example/config/run_info.yaml --- fc_date: ’100326’ fc_name: FC6107FAAXX details: - files: [7_100326_FC6107FAAXX_1-chr22.fastq, 7_100326_FC6107FAAXX_2-chr22.fastq] lane: 7 description: Test replicate 1 analysis: SNP calling genome_build: hg19 algorithm: quality_format: Standard hybrid_bait: hybrid_selection/baits.bed hybrid_target: hybrid_selection/targets.bed
  • 56. Motivation Solution Implementation Demonstration System level: YAML Configuration $ cat /export/data/galaxy/post_process.yaml --- program: bowtie: bowtie bwa: bwa ucsc_bigwig: wigToBigWig picard: /usr/share/java/picard gatk: /usr/share/java/gatk snpEff: /usr/share/java/snpeff fastqc: fastqc distributed: cluster_platform: sge platform_args: ’-q all.q’ cores_per_host: 1 rabbitmq_vhost: bionextgen
  • 57. Motivation Solution Implementation Demonstration Run exome pipeline $ cd /export/data/work $ distributed_nextgen_pipeline.py /export/data/galaxy/post_process.yaml /export/data/exome_example/fastq /export/data/exome_example/config/run_info.yaml
  • 58. Motivation Solution Implementation Demonstration What just happened?
  • 59. Motivation Solution Implementation Demonstration Monitoring: SGE queues $ qstat ob-ID prior name state submit/start at queue -------------------------------------------------------------- 1 0.55500 nextgen_an r 18:16:32 all.q@ip-10-125-10-182.ec2.int 2 0.55500 nextgen_an r 18:16:32 all.q@ip-10-86-254-105.ec2.int 3 0.55500 automated_ r 18:16:47 all.q@ip-10-125-10-182.ec2.int
  • 60. Motivation Solution Implementation Demonstration Monitoring: Analysis directory $ cd /export/data/work $ ls -lh drwxr-xr-x 4.0 alignments -rw-r--r-- 2.0K automated_initial_analysis.py.o11 drwxr-xr-x 33 log -rw-r--r-- 15K nextgen_analysis_server.py.o10 -rw-r--r-- 15K nextgen_analysis_server.py.o9 drwxr-xr-x 102 tmp
  • 61. Motivation Solution Implementation Demonstration Monitoring: Log files $ less nextgen_analysis_server.py.o10 INFO: nextgen_pipeline: Processing sample: Test replicate 2; lane 8; reference genome hg19; researcher ; analysis method SNP calling INFO: nextgen_pipeline: Aligning lane 8_100326_FC6107FAAXX with bwa aligner INFO: nextgen_pipeline: Combining and preparing wig file [u’’, u’Test replicate 2’] INFO: nextgen_pipeline: Recalibrating [u’’, u’Test replicate 2’] with GATK
  • 62. Motivation Solution Implementation Demonstration Retrieve results: Copy files $ upload_to_galaxy.py /export/data/galaxy/post_process.yaml /export/data/exome_example/fastq /export/data/work /export/data/exome_example/config/run_info.yaml Final files copied into new directory; allows cleanup of analysis directory
  • 63. Motivation Solution Implementation Demonstration Retrieve results: Output directory $ ls -lh /export/data/galaxy/storage/100326_FC6107FAAXX/7 -rw-r--r-- 38M 7_100326_FC6107FAAXX.bam -rw-r--r-- 22M 7_100326_FC6107FAAXX-coverage.bigwig -rw-r--r-- 72M 7_100326_FC6107FAAXX-gatkrecal.bam -rw-r--r-- 109K 7_100326_FC6107FAAXX-snp-effects.tsv -rw-r--r-- 827K 7_100326_FC6107FAAXX-snp-filter.vcf -rw-r--r-- 1.6M 7_100326_FC6107FAAXX-summary.pdf
  • 64. Motivation Solution Implementation Demonstration Share results Share-an-instance Uses CloudMan web interface Reproducible research CloudBioLinux AMI – software CloudMan – data and configuration
  • 65. CloudMan console enables push button sharing
  • 66. Can make public or available to specific collaborators
  • 67. When finished, turn everything off through CloudMan
  • 68. Motivation Solution Implementation Demonstration Summary CloudBioLinux Shared machine image of biological software Boot from AWS console Connect with NX graphical client and ssh
  • 69. Motivation Solution Implementation Demonstration Summary CloudMan Cluster setup and management Boot from share-an-instance Manage cluster through web interface Share final results
  • 70. Motivation Solution Implementation Demonstration Summary Exome pipeline Parallel framework for running analyses Run using automated scripts Extract alignments, variant calls and summary information
  • 71. Motivation Solution Implementation Demonstration Future: interfaces make it easier https://bitbucket.org/hbc/galaxy-central-hbc
  • 72. Motivation Solution Implementation Demonstration Future: Simplified file selection
  • 73. Motivation Solution Implementation Demonstration Future: Top level parameters
  • 74. Motivation Solution Implementation Demonstration Future: Galaxy data libraries
  • 75. Motivation Solution Implementation Demonstration Future: Galaxy analysis
  • 76. Motivation Solution Implementation Demonstration Future: External UCSC visualization
  • 77. Motivation Solution Implementation Demonstration Read more Step-by-step instructions http://j.mp/rp69nx Approaches to parallelism http://j.mp/nPQHcm Future work http://bcbio.wordpress.com