NGS Analysis using Galaxy
2013 Korea Genome Organization (한국유전체학회) Winter Symposium: Bioinformatics Analysis Training Workshop



김형용, 이규열, 이성찬 _ 2013. 02. 05 ~ 2013.02.06

R&D Center, Insilicogen, Inc.
Index

                            01   Galaxy   introduction
                            02   Galaxy   examples 1,2
                            03   Galaxy   installation
                            04   Galaxy   function details
                            05   Galaxy   examples 3,4
                            06   Galaxy   tools
                            07   Galaxy   on Grid
                            08   Galaxy   on Cloud
Agenda

  Part 1: Introduction and Application

    Time            Session                                       Notes
    15:00 ~ 15:20   Introduction to Galaxy                        Presenter: 김형용
    15:20 ~ 15:50   Galaxy analysis examples (demo)               1. Finding the human exons with the most SNPs
                                                                  2. NGS QC and assembly example
    16:00 ~ 16:20   Galaxy installation                           Presenter: 이성찬
    16:20 ~ 17:10   Galaxy installation and examples (hands-on)   1. Galaxy installation
                                                                  2. Finding the human exons with the most SNPs
                                                                  3. NGS QC and assembly example
    17:20 ~ 17:50   Galaxy functions in detail                    Presenter: 김형용

  Part 2: Custom operation

    Time            Session                                       Notes
    09:00 ~ 09:20   Galaxy analysis examples (demo)               Presenter: 김형용
                                                                  1. RNA-seq analysis example
                                                                  2. NGS analysis example 2
    09:20 ~ 09:50   Galaxy analysis examples (hands-on)           1. RNA-seq analysis example
                                                                  2. NGS analysis example 2
    10:00 ~ 10:20   Understanding Galaxy tools                    Presenter: 김형용
    10:20 ~ 11:00   Writing a Galaxy tool (hands-on)              1. Primer design
    11:10 ~ 11:30   Galaxy on Grid                                Presenter: 이규열
                                                                  1. Understanding the grid
                                                                  2. Distributed job demo
    11:30 ~ 11:50   Galaxy on Cloud                               Presenter: 김형용
                                                                  1. Understanding the cloud
                                                                  2. Galaxy on Amazon EC2
                                                                 Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved.   3
NGS Technologies
Sequencer Comparison

  Platform   Model                 Read length     Gb/day    Yield
  Illumina   HiSeq 2000            2X100 bp        55        600Gb
  Illumina   HiSeq 1000            2X100 bp        35        300Gb
  Illumina   HiScan SQ             2X100 bp        17.5      150Gb
  Illumina   GAIIx                 2X150 bp        6.5       95Gb
  454        GS FLX                400 bp          10h       35Mb
  SOLiD      5500 (microbeads)     (see below)     10-15     90Gb
  SOLiD      5500xl (microbeads)   (see below)     20-30     180Gb
  SOLiD      5500xl (nanobeads)    (see below)     30-45     300Gb

  SOLiD read length: Mate pair 60 bp X 60 bp; Paired-end 75 bp X 35 bp; Fragment 75 bp

  Required input (Illumina): 50 ng with Nextera; 100 ng – 1 μg with TruSeq

  Accuracy: Illumina 85% (2X50 bp, >Q30) and 80% (2X100 bp, >Q30); 454 99% (>Q20); SOLiD 99.99%

  Notes:
    Illumina Gb/day figures are from 2X100 bp runs.
    Illumina read length: 1X35, 2X50, 2X100
    GA: 1X35, 2X50, 2X100, 2X150
Applications

       Application of NGS Technique




      Personal Genomics                                                  Environmentology

        Microbiology                                                           Toxicology

      Personal Genomics                                                  Chemical Biology




                                       Mutation Detection

                                        Structure Variation

                                      Transcriptional Control

                                  Interaction of DNA and Protein



Issue of New Genomic Era.
Many researchers, having invested in next generation sequencing instruments, now face a computational bottleneck in their research workflow.

                                                                 BGI
Most Significant Improvement to Your Next Generation Sequencing
Workflow




(Source: The Global Outlook for Next Generation Sequencing: Usage, Platform Drivers & Workflow, October 31, 2011. BioInformatics, LLC)
Issue of New Genomic Era.

  Sequencing workflow stages:

  1. Library construction
       • DNA shearing
       • Insert into high and/or low copy number vectors
  2. Template purification
       • PCR amplicons
       • BACs
       • Cosmids / Fosmids
  3. Sequence delineation
       • Big Dye
       • ABI 3730
       • Data compilation
  4. Finishing & Assembly
       • Primer walking
       • Transposon insertion methods
       • Proprietary & commercial assembly
  5. Sequence annotation
       • Gene prediction
       • BLAST search
  6. Secondary annotation
       • SNP
       • Comparative genomics
       • Expression analysis
  7. Data delivery
       • FTP
       • Web browser
       • Commercial software

  (Figure labels: Bioinformatics; axes: Cost, Process.)
Application of Next Genomic Data




Practical Software Platforms
   for NGS data analysis
What kind of?



•   Biological Features
•   Framework (Enterprise/Informatics) Features
•   Service
•   Price
List of NGS Frameworks




HugeSeq: a pipeline specialized for extracting genetic variants
CLC Genomics Server: a user-friendly GUI environment

  1. CLC Genomics Server
       - Enterprise solution with a three-tier architecture for analyzing, sharing, and managing data
  2. CLC Bioinformatics Database
       - Database for centralized storage, sharing, and management of data
  3. CLC Assembly Cell
       - Ultra-fast assembly solution for NGS data (command-line based)
  4. CLC Genomics Workbench
       - Solution for a wide range of bioinformatics analyses of NGS data (GUI based)
  5. CLC Developer Kit
       - Solution for customizing bioinformatics tools and workflows to the user's needs
30x human genome, 1 sample (150G): 5 million KRW (1 year of storage)
Funded by Google; integrated with the NCBI SRA service

Analysis can start online right away, without running an experiment
GALAXY
What is Galaxy



Galaxy, a web-based genome analysis
platform http://usegalaxy.org

• An open-source framework for integrating various computational tools and
databases into a cohesive workspace

• A web-based service we provide, integrating many popular tools and
resources for comparative genomics

• A completely self-contained application for building your own Galaxy style
sites




Galaxy Usage



• One of the fastest growing open source bioinformatics projects,
a highly successful high throughput data analysis platform for
Life Sciences with over 15,000 users worldwide

• Annual Galaxy Community Conference




Galaxy visualization



External Genome Browser

  • UCSC
  • Ensembl
  • GBrowse

Trackster

  • Track/data viewer in web browser
  • HTML5 Canvas, jQuery
  • Renders in browser, not on server
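Galaxy components

  Main components of Galaxy

  • Datasources: specify the input data. Data from a separate local system or from external websites can be registered.
  • Tool: the smallest unit of analysis. On a local installation you can build and add the tools you want.
  • History: the list of intermediate results produced as the input data passes through a combination of Tools.
  • Workflow: a History yields new results when only the input data and parameters are changed, so it can be registered separately as a process.
  • Visualization: connects analysis results to visualization tools.
  • Page: a reporting feature that brings the elements above together.

  (Figure: example in which an eprimer3 tool was built separately and registered.)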
What a Galaxy tool is

        Input format   →   Tool   →   Output format

  A tool takes input data (in the expected format), processes it, and produces output data (in the expected format).

  Combining tools gives a Workflow.
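The idea on this slide (a tool maps formatted input to formatted output, and chained tools form a workflow) can be sketched in a few lines of Python. This is only an illustration of the concept, not Galaxy code: `to_upper`, `drop_headers`, and `make_workflow` are hypothetical names, and plain strings stand in for datasets.

```python
from functools import reduce
from typing import Callable, List

# A "tool" here is just a function from input text to output text
# (a hypothetical stand-in for a Galaxy tool reading and writing datasets).
Tool = Callable[[str], str]


def to_upper(fasta: str) -> str:
    """Toy tool: FASTA in, FASTA out, with sequence lines upper-cased."""
    return "\n".join(
        line if line.startswith(">") else line.upper()
        for line in fasta.splitlines()
    )


def drop_headers(fasta: str) -> str:
    """Toy tool: FASTA in, plain sequence lines out."""
    return "\n".join(
        line for line in fasta.splitlines() if not line.startswith(">")
    )


def make_workflow(tools: List[Tool]) -> Tool:
    """Chain tools so that each tool's output becomes the next tool's input."""
    return lambda data: reduce(lambda acc, tool: tool(acc), tools, data)


workflow = make_workflow([to_upper, drop_headers])
result = workflow(">seq1\nacgt\nttga")
print(result)
```

The chain only works because the output format of one step matches the input format of the next, which is the role the format labels play on the slide.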
Galaxy formats

  Auto-detect                   Automatically detects the format of the data.
  Ab1                           A binary sequence file in 'ab1' format with a '.ab1' file extension. You must manually select this 'File Format' when uploading the file.
  Axt                           blastz pairwise alignment format. Each alignment block in an axt file contains three lines: a summary line and 2 sequence lines. Blocks are separated from one another by blank lines. The summary line contains chromosomal position and size information about the alignment. It consists of 9 required fields.
  Bam                           A binary file compressed in the BGZF format with a '.bam' file extension.
  Bed                           Tab delimited format (tabular). Does not require a header line.
  Fasta                         A sequence in FASTA format consists of a single-line description, followed by lines of sequence data. The first character of the description line is a greater-than (">") symbol in the first column. All lines should be shorter than 80 characters.
  FastqSolexa                   Illumina (Solexa) variant of the Fastq format, which stores sequences and quality scores in a single file.
  Gff                           GFF lines have nine required fields that must be tab-separated.
  Gff3                          The GFF3 format addresses the most common extensions to GFF, while preserving backward compatibility with previous formats.
  Interval (Genomic Intervals)  Tab delimited format (tabular).
  Lav                           Lav is the primary output format for BLASTZ. The first line of a .lav file begins with #:lav.
  MAF                           TBA and multiz multiple alignment format. The first line of a .maf file begins with ##maf. This word is followed by white-space-separated "variable=value" pairs. There should be no white space surrounding the "=".
  Scf                           A binary sequence file in 'scf' format with a '.scf' file extension. You must manually select this 'File Format' when uploading the file.
  Sff                           A binary file in 'Standard Flowgram Format' with a '.sff' file extension.
  Tabular (tab delimited)       Any data in tab delimited format (tabular).
  Wig                           The wiggle format is line-oriented. Wiggle data is preceded by a track definition line, which adds a number of options for controlling the default display of this track.
  Other text type               Any text file.
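Several of the descriptions in this table give a first-line rule (">" for Fasta, "##maf" for MAF, "#:lav" for Lav). A minimal sketch of how auto-detection could apply such rules is shown below. It is an assumption-laden illustration, not Galaxy's own sniffing code: `sniff_format` is a hypothetical helper, and it covers only the rules stated in the table plus a simple tab check.

```python
def sniff_format(text: str) -> str:
    """Guess a format name from the first non-blank line of a text dataset."""
    first = next((line for line in text.splitlines() if line.strip()), "")
    if first.startswith("##maf"):
        return "maf"      # MAF: first line begins with ##maf
    if first.startswith("#:lav"):
        return "lav"      # Lav: first line begins with #:lav
    if first.startswith(">"):
        return "fasta"    # Fasta: description line begins with ">"
    if "\t" in first:
        return "tabular"  # any tab-delimited data
    return "txt"          # fallback: any text file


print(sniff_format(">chr19 test\nACGT"))
print(sniff_format("chr19\t100\t200"))
```

Binary formats such as Ab1 and Scf cannot be recognized this way, which is consistent with the table's note that they must be selected manually on upload.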
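Galaxy features, once more

  Recent trends in Galaxy usage

  • NGS analysis features built in
  • Galaxy URLs provided in papers
  • Use of the Amazon Cloud
  • Transparent analysis

  (Figure labels: Biologist, Bioinformatician.)

  Galaxy features, once more

  • Written in Python, but extensions do not have to be written in Python.
  • You can build, share, and extend "transparent" analysis flows.
  • Nearly every bioinformatics analysis can be done in Galaxy.
  • "We would hire someone just for being good with Galaxy" (NCBI)
  …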
GALAXY Examples 1
Example 1.


             Finding Human Exons with the highest number of SNPs

    1. Download all Human Exons from NCBI, Ensembl BioMart, or UCSC
       TableBrowser
    2. Download all Human SNPs from …
    3. Scripting
         • Join 1 and 2 according to position
         • Group by Exon id
         • Sort by SNP count
         • Filter exons that have more than 10 SNPs

    → You have to program! (Python, Perl, …)
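For comparison with the Galaxy version, the scripting step (join by position, group by exon id, sort by SNP count, filter) might look roughly like the sketch below. The tuple layouts, the half-open coordinate convention, and the name `exons_by_snp_count` are assumptions made for the example; real exon and SNP downloads would first have to be parsed into this shape.

```python
from collections import Counter
from typing import List, Tuple

Exon = Tuple[str, int, int, str]  # chrom, start, end, exon_id; half-open [start, end)
Snp = Tuple[str, int]             # chrom, position


def exons_by_snp_count(
    exons: List[Exon], snps: List[Snp], min_snps: int = 10
) -> List[Tuple[str, int]]:
    """Join SNPs to exons by position, group by exon id, sort, then filter."""
    counts: Counter = Counter()
    # Join + group: count every SNP that falls inside an exon.
    # A nested loop is fine for a sketch; genome-scale data needs an interval index.
    for chrom, pos in snps:
        for e_chrom, start, end, exon_id in exons:
            if chrom == e_chrom and start <= pos < end:
                counts[exon_id] += 1
    # Sort by SNP count (descending), ties broken by exon id.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # Filter: keep exons with more than min_snps SNPs.
    return [(exon_id, n) for exon_id, n in ranked if n > min_snps]


exons = [("chr1", 100, 200, "exonA"), ("chr1", 300, 400, "exonB")]
snps = [("chr1", 150), ("chr1", 160), ("chr1", 199), ("chr1", 350), ("chr2", 150)]
print(exons_by_snp_count(exons, snps, min_snps=1))
```

The toy call lowers the threshold to 1 so that the small example produces output; the slide's threshold of 10 is the default.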
On Galaxy




            http://usegalaxy.org




On Galaxy

  1. Get Data → UCSC Main: fetch the exon data
  2. Get Data → UCSC Main: fetch the SNP data
  3. Operate on Genomic Intervals → Join: extract the exons whose regions overlap SNPs
  4. Join, Subtract and Group → Group: group by exon name and count the SNPs
  5. Filter and Sort → Sort: sort the exons by SNP count
  6. Text Manipulation → Select first: take the top 5 exons by SNP count
  7. Join, Subtract and Group → Compare two Datasets: recover the exon information that was lost along the way
GALAXY Examples 2
Example 2.


             Human NGS data QC and assembly



                                  1.   NGS Quality Control
                                  2.   NGS Single End Mapping
                                  3.   SNP Calling
                                  4.   Compare with dbSNP

                                  → Has to be done in Unix and needs
                                    programming! (Python, Perl, …)
On Galaxy

  • NGS analysis requires additional programs to be installed
    ( http://wiki.galaxyproject.org/Admin/NGS%20Local%20Setup )

              Program             Used for                      Installation
              Fastx-toolkit       NGS QC                        Ubuntu apt-get
              Gnuplot             NGS QC boxplot                Ubuntu apt-get
              Bowtie2             Reference assembly            Copy, then set PATH
              SAMTools            SNP calling                   Ubuntu apt-get
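Before starting Galaxy it can help to confirm that these programs are actually on the PATH. The sketch below uses only the Python standard library. The executable names in `REQUIRED` are assumptions about how each package is typically installed (for example, `fastx_quality_stats` as one representative FASTX-Toolkit binary), so adjust them to the local setup.

```python
import shutil
from typing import Dict, List

# Program name from the table -> one executable expected on PATH.
# The executable names are assumptions, not taken from the slides.
REQUIRED: Dict[str, str] = {
    "Fastx-toolkit": "fastx_quality_stats",
    "Gnuplot": "gnuplot",
    "Bowtie2": "bowtie2",
    "SAMTools": "samtools",
}


def missing_programs(required: Dict[str, str]) -> List[str]:
    """Return the program names whose executable cannot be found on PATH."""
    return [name for name, exe in required.items() if shutil.which(exe) is None]


missing = missing_programs(REQUIRED)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required programs were found.")
```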
On Galaxy

  1. Get Data → Upload File: upload the human Illumina FASTQ file
  2. NGS: QC and manipulation → FASTQ Groomer: convert to fastqsanger format
  3. NGS: QC and manipulation → Compute quality statistics: view the FASTQ quality statistics
  4. NGS: QC and manipulation → Draw quality score boxplot: draw a boxplot from the FASTQ quality statistics
  5. NGS: QC and manipulation → FASTQ Trimmer, Quality Trimmer, Masker: trim or mask the uninformative regions
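To make the quality-statistics step less of a black box, here is a rough sketch of computing the mean quality per read position. It assumes Sanger-style FASTQ (Phred+33 quality encoding, the convention behind the fastqsanger name) and strict 4-line records; the function names are hypothetical and this is not how the Galaxy tool is implemented.

```python
from typing import List


def phred33(quality_line: str) -> List[int]:
    """Decode a Sanger (Phred+33) quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_line]


def mean_quality_per_position(fastq_text: str) -> List[float]:
    """Mean Phred score at each read position, assuming 4-line FASTQ records."""
    lines = [line for line in fastq_text.splitlines() if line]
    totals: List[int] = []
    counts: List[int] = []
    # The quality string is the 4th line of every record.
    for i in range(3, len(lines), 4):
        for pos, score in enumerate(phred33(lines[i])):
            if pos == len(totals):  # first read that reaches this position
                totals.append(0)
                counts.append(0)
            totals[pos] += score
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]


fastq = "@r1\nACGT\n+\nIIII\n@r2\nAC\n+\n!!\n"
print(mean_quality_per_position(fastq))
```

Positions where the mean drops sharply are the ones a trimming or masking step would target.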
On Galaxy

  1. Get Data → Upload File: upload the reference sequence for the reference assembly
  2. NGS: Mapping → Bowtie2: assembly with Bowtie2
  3. NGS: SAM Tools → MPileup: extract SNP and indel information from the BAM file
  4. NGS: SAM Tools → Filter pileup: keep the high-scoring SNPs and indels
  5. NGS: SAM Tools → Pileup-to-interval: convert to Genomic interval format
  6. Get Data → UCSC Main: fetch the dbSNP data
  7. Operate on Genomic Intervals → Join: extract the SNPs whose regions overlap
Galaxy Installation
Install Virtualbox - Ubuntu

  1. Copy the Virtualbox and Galaxy folders from the USB drive.

  2. Install Virtualbox.

  3. Launch Virtualbox and import the Galaxy image.

  4. In the settings, change the network to Bridge.

  5. After booting Ubuntu, delete the network settings file.
     sudo rm /etc/udev/rules.d/70-persistent-net.rules

  6. Restart Linux (Ubuntu): shut it down, then boot the VM again.
     sudo shutdown -h now
Creating your own Galaxy




Running Galaxy in a production environment

  By default, Galaxy uses

    • SQLite database
    • Built-in HTTP server for all tasks
    • Local job runner
    • Single process
    • Simplest error-proof configuration

  Change configuration for service

    • Disable the developer settings → use_interactive = False, debug = False
    • Get a real database → PostgreSQL
    • Offload the menial tasks: Proxy → nginx, Apache
    • Let your tools free: Cluster → move intensive processing to other hosts (TORQUE, GRID, DRMAA)
    • Other advanced settings
Galaxy on Cluster

  Move intensive processes to other hosts

    • TORQUE
    • GRID
    • DRMAA

  Working with Galaxy on the Cloud
Virtualization
Virtualization

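          Virtualization

 • A term for the abstraction of computing resources
 • Creates virtual versions of physical resources

 • A technology that logically divides one physical hardware resource into several, or
 • logically combines several hardware resources into one

 • Recently in the spotlight as a way to solve problems such as hardware management and system recovery after a disaster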
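Virtualization

       Advantages of virtualization

 • Cost reduction
     • One server can be partitioned into several servers
     • Lower server purchase, electricity, floor-space, and server management costs
 • Efficient use of resources
     • Idle server resources are used to create virtual machines, so resources are used efficiently
 • Stable operation
     • Servers are backed up as images and are easy to relocate, so failures can be handled quickly
 • Continued operation of software
     • When server hardware reaches the end of its life cycle, OS vendors stop supporting its device drivers
       -> migration problems arise
     • Because the existing system is moved onto a virtual machine, device driver problems do not occur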
Used as a foundation of cloud services
Public Galaxy environment




Example of Cloud




                   Source: iSC 2012 Amazon HPC session
Running Galaxy Web server

  1. Check the IP address of your computer.
     ifconfig

  2. Move to the Galaxy folder.
     cd galaxy-dist

  3. Start the Galaxy web server.
     sh run.sh

  4. On your host OS (Windows), enter the following in the address bar of a web browser.
     IP Address:8080 (e.g., 172.20.8.162:8080)
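If the browser shows nothing, a small script run from the host can tell whether the server answers at all. This is a minimal sketch using only the standard library: `galaxy_url` and `is_reachable` are hypothetical helper names, and the IP address is the example value from the slide, so substitute the one reported by ifconfig.

```python
import urllib.error
import urllib.request


def galaxy_url(ip: str, port: int = 8080) -> str:
    """Build the address typed into the browser, e.g. http://172.20.8.162:8080/."""
    return f"http://{ip}:{port}/"


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP request to url gets a successful response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, unreachable host, or an HTTP error status.
        return False


url = galaxy_url("172.20.8.162")  # example IP from the slide
print(url)
# Uncomment once run.sh has finished starting the server:
# print("Galaxy is up" if is_reachable(url) else "No response from " + url)
```

A False result with the server running usually points at the VM network mode or a firewall rather than at Galaxy itself.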
Galaxy Detail functions

Get Data

Get Data / Send Data

Text Manipulation

Convert Format

FASTA manipulation

Filter and Sort

Join, Subtract and Group

Operate on Genomic Intervals

NGS Toolbox
Galaxy Examples 3
Example 3.


                             Human RNA-seq



     1. RNA-seq result: adrenal_1,2.fastq, brain_1,2.fastq
     2. Reference: iGenome UCSC hg19, chr19 gene annotation (GTF format)

     → Has to be done in Unix and needs
       programming! (Python, Perl, …)
On Galaxy

  • RNA-seq analysis requires additional programs to be installed
    ( http://wiki.galaxyproject.org/Admin/NGS%20Local%20Setup )

              Program     Used for             Installation
              java        FastQC               Ubuntu apt-get install openjdk-7-jre
              FastQC      NGS QC               Copy to tool-data/shared/jars/
              Tophat      RNA-seq mapping      (see the next slide)
              Cufflinks   RNA-seq assembly     Ubuntu apt-get install cufflinks
Tophat install in Ubuntu

         # Build samtools first; Tophat needs its library and headers
         $   cp samtools-0.1.18.tar.bz2 tophat-1.4.1.tar.gz ~/work
         $   cd ~/work
         $   bzip2 -d samtools-0.1.18.tar.bz2
         $   tar xvf samtools-0.1.18.tar
         $   cd samtools-0.1.18
         $   make
         $   cd ..

         # Build and install Tophat against the samtools library
         $   tar zxvf tophat-1.4.1.tar.gz
         $   cd tophat-1.4.1
         $   sudo apt-get install libboost libbam libboost-thread-dev
         $   sudo cp ../samtools-0.1.18/libbam.a /usr/local/lib
         $   sudo mkdir /usr/local/include/bam
         $   sudo cp ../samtools-0.1.18/*.h /usr/local/include/bam
         $   ./configure
         $   make
         $   sudo make install
On Galaxy

  1. Get Data → Upload File: upload the fastq, chr19.fa, and gtf files
  2. NGS: QC and manipulation → FASTQ Groomer: convert to fastqsanger format
  3. NGS: QC and manipulation → FastQC:Read QC: view the FASTQ quality statistics
  4. NGS: RNA Analysis → Tophat for Illumina: find splice junctions in the RNA-seq FASTQ data, using chr19.fa as the reference
  5. NGS: RNA Analysis → Cufflinks: transcript assembly and FPKM estimation
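FPKM (fragments per kilobase of transcript per million mapped fragments) normalizes a raw fragment count by transcript length and sequencing depth. The sketch below shows only the basic definition; Cufflinks itself estimates abundances with a statistical model, so its reported values need not match this simple ratio. The function name and arguments are illustrative.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    # fragments / (length in kb) / (total fragments in millions)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# 100 fragments on a 1 kb transcript, out of 1 million mapped fragments
print(fpkm(100, 1000, 1_000_000))
```

Doubling the transcript length or the sequencing depth at the same raw count halves the FPKM, which is what makes values comparable across genes and samples.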
On Galaxy

  1. NGS: RNA Analysis → Cuffmerge: merge the brain and adrenal data against the reference
  2. NGS: RNA Analysis → Cuffdiff: find significant expression changes
Galaxy Tools
Primer design tool




Primer3

 Primer3
   • Primer design program
   • http://primer3.sourceforge.net/releases.php
   • Download from
      http://sourceforge.net/projects/primer3/files/primer3/1.1.4/primer3-1.1.4.tar.gz
   • make & copy to PATH

 eprimer3
   • Wrapper for Primer3, part of the EMBOSS package
   • Easy command line interface
   • http://emboss.sourceforge.net/apps/release/6.4/emboss/apps/eprimer3.html
   • apt-get install emboss
erimer3



                                   # EPRIMER3 RESULTS FOR GL020027.1
 $ eprimer3                        #               Start Len     Tm     GC%     Sequence
   –sequence INPUT_FASTA_FILE          1 PRODUCT SIZE: 199
   –outfile PRIMER_DESIGN_RESULT        FORWARD PRIMER 571071         20 60.06 45.00 CTTGCCAATAGCGAATGGAT

   -osize OSIZE                         REVERSE PRIMER 571250     20 59.99 55.00 GACGGCGTAGATCTTCAAGC

   -gcclamp GCCLAMP                    2 PRODUCT SIZE: 199
   …                                    FORWARD PRIMER 55074          20 60.05 55.00 TAACACCACTGCTCCTGCTG

                                        REVERSE PRIMER   55253   20 59.97 50.00 CATTGCATGGTCAGAACCAC


                                       3 PRODUCT SIZE: 200
                                        FORWARD PRIMER 71990          20 60.03 45.00 GGGGTTGATTTTCATTGTGG
   이 결과 형식을 수정하여
                                        REVERSE PRIMER   72170   20 59.88 45.00 GTTTGCACCAACCTGGTTTT
   다른 Galaxy tool의 입력
   으로 쓰고 싶다.                           4 PRODUCT SIZE: 200
                                        FORWARD PRIMER 427182         20 59.83 50.00 CTGATGTGCTCTGTGGGAAA

                                        REVERSE PRIMER 427362     20 60.01 55.00 CCGTGTATGTAGCCCGAGTT


                                       5 PRODUCT SIZE: 197
   직접 Primer design                     FORWARD PRIMER 427185         20 59.97 50.00 ATGTGCTCTGTGGGAAAACC

   Galaxy tool 만들기                      REVERSE PRIMER 427362     20 60.01 55.00 CCGTGTATGTAGCCCGAGTT




eprimer3.xml




eprimer3.py
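
The script on this slide is not reproduced in the text, so the following is only a minimal sketch of what a wrapper around the eprimer3 command could look like, not the workshop's own eprimer3.py. It uses only the options listed on the eprimer3 slide (-sequence, -outfile, -osize, -gcclamp); the function names are hypothetical.

```python
import shutil
import subprocess


def build_eprimer3_command(fasta_path, out_path, osize=None, gcclamp=None):
    """Return the eprimer3 argument list for the given input and output files."""
    cmd = ["eprimer3", "-sequence", fasta_path, "-outfile", out_path]
    if osize is not None:
        cmd += ["-osize", str(osize)]
    if gcclamp is not None:
        cmd += ["-gcclamp", str(gcclamp)]
    return cmd


def run_eprimer3(fasta_path, out_path, **options):
    """Run eprimer3, failing early if EMBOSS is not installed."""
    if shutil.which("eprimer3") is None:
        raise RuntimeError("eprimer3 not found on PATH; install EMBOSS first")
    cmd = build_eprimer3_command(fasta_path, out_path, **options)
    # stdin is closed so the program cannot block waiting for interactive input.
    subprocess.run(cmd, check=True, stdin=subprocess.DEVNULL)


if __name__ == "__main__":
    # Only prints the command line; nothing is executed here.
    print(" ".join(build_eprimer3_command("input.fasta", "primers.txt", osize=20)))
```

Keeping the command construction in its own function makes it easy to check the argument list without EMBOSS being installed.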




tool_conf.xml




 …
  <section name="VCF Tools" id="vcf_tools">
    <tool file="vcf_tools/intersect.xml" />
    <tool file="vcf_tools/annotate.xml" />
    <tool file="vcf_tools/filter.xml" />
    <tool file="vcf_tools/extract.xml" />
  </section>
  <section name="MyTools" id="mytools">
    <tool file="mytools/eprimer3.xml" />
  </section>
 </toolbox>
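
The same edit can also be made programmatically. The sketch below uses Python's standard xml.etree.ElementTree to append a section to a toolbox document; the helper name add_tool_section is hypothetical, and the short XML string in the demo is a stand-in for a real tool_conf.xml.

```python
import xml.etree.ElementTree as ET


def add_tool_section(toolbox_xml, section_id, section_name, tool_files):
    """Return toolbox XML with the given tools registered under one section.

    The section is created if it does not exist yet; tools that are already
    listed are not added a second time.
    """
    root = ET.fromstring(toolbox_xml)
    if root.tag != "toolbox":
        raise ValueError("expected a <toolbox> root element")
    section = root.find("./section[@id='%s']" % section_id)
    if section is None:
        section = ET.SubElement(root, "section",
                                {"name": section_name, "id": section_id})
    existing = {tool.get("file") for tool in section.findall("tool")}
    for tool_file in tool_files:
        if tool_file not in existing:
            ET.SubElement(section, "tool", {"file": tool_file})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    demo = ('<toolbox><section name="VCF Tools" id="vcf_tools">'
            '<tool file="vcf_tools/filter.xml" /></section></toolbox>')
    print(add_tool_section(demo, "mytools", "MyTools", ["mytools/eprimer3.xml"]))
```

Parsing the file also catches malformed markup, such as typographic quotes around attribute values, before Galaxy is restarted.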




EMBOSS eprimer3 tool added




Hands-on exercise




            Install Primer3              : compile with make, then add primer3_core to PATH


              Install EMBOSS             : sudo apt-get install emboss


             Install Biopython           : sudo apt-get install python-biopython


     Copy eprimer3.py, eprimer3.xml to
        galaxy-dist/tools/mytools/       : create the mytools directory yourself



             Edit tool_conf.xml          : register mytools/eprimer3.xml
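
As a rough aid for the exercise, the prerequisite check and the copy step could be scripted as below. This is a sketch only: the helper names are hypothetical, and the demo runs against a temporary directory rather than a real galaxy-dist checkout.

```python
import os
import shutil
import tempfile

# Executables the exercise expects to find on PATH.
REQUIRED_COMMANDS = ("primer3_core", "eprimer3")


def missing_commands(commands=REQUIRED_COMMANDS):
    """Return the required executables that are not on PATH."""
    return [name for name in commands if shutil.which(name) is None]


def install_tool_files(galaxy_root, files, subdir="mytools"):
    """Create <galaxy_root>/tools/<subdir> if needed and copy the files into it."""
    target = os.path.join(galaxy_root, "tools", subdir)
    os.makedirs(target, exist_ok=True)
    return [shutil.copy(path, target) for path in files]


if __name__ == "__main__":
    print("missing:", missing_commands())
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "eprimer3.xml")
        with open(source, "w") as handle:
            handle.write("<tool />\n")
        print(install_tool_files(os.path.join(workdir, "galaxy-dist"), [source]))
```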




Galaxy on Grid
Grid vs Cluster


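  Common        Both split computation over large data sets into smaller
                tasks and distribute them across many small computers.

  Differences   A grid connects heterogeneous machines over a WAN.
                A grid links diverse platforms together.
                A grid has no limit on the number of connected machines.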




Grid




Globus Toolkit

 A representative middleware for computational grids
 Open source toolkit for building computing grids,
  developed and provided by the Globus Alliance
 Standards implementation
   • Open Grid Services Architecture (OGSA)
   • Open Grid Services Infrastructure (OGSI)
   • Web Services Resource Framework (WSRF)
   • Job Submission Description Language (JSDL)
   • Distributed Resource Management Application API (DRMAA)
   • SOAP
   • WSDL
   • Grid Security Infrastructure (GSI)




DRMAA: a high-level Open Grid Forum API specification for submitting and
controlling jobs in a Distributed Resource Management (DRM, job scheduler)
system, such as a cluster or grid computing infrastructure
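
To make the submit-and-control cycle concrete, here is a small sketch of the calls involved. The method names (createJobTemplate, runJob, wait, deleteJobTemplate) and the remoteCommand and args attributes follow the drmaa-python binding as I understand it; InMemorySession is a hypothetical stand-in so the example runs without a scheduler, and its wait() returns a plain dict rather than the binding's job info object.

```python
def submit_and_wait(session, command, args=(), timeout=-1):
    """Submit one job through a DRMAA-style session and block until it ends.

    timeout=-1 mirrors the DRMAA "wait forever" constant.
    """
    template = session.createJobTemplate()
    try:
        template.remoteCommand = command
        template.args = list(args)
        job_id = session.runJob(template)
        info = session.wait(job_id, timeout)
    finally:
        # Templates are released explicitly once the job has been handled.
        session.deleteJobTemplate(template)
    return job_id, info


class _Template:
    """Bare attribute holder used by the stand-in session."""
    remoteCommand = None
    args = ()


class InMemorySession:
    """Hypothetical stand-in for a real DRMAA session (no scheduler needed)."""

    def __init__(self):
        self.submitted = []
        self.deleted = 0

    def createJobTemplate(self):
        return _Template()

    def runJob(self, template):
        self.submitted.append((template.remoteCommand, list(template.args)))
        return str(len(self.submitted))

    def wait(self, job_id, timeout):
        return {"jobId": job_id, "hasExited": True, "exitStatus": 0}

    def deleteJobTemplate(self, template):
        self.deleted += 1


if __name__ == "__main__":
    print(submit_and_wait(InMemorySession(), "/bin/sh", ["a.sh"]))
```

With a real scheduler, a session object from the drmaa package would be passed in place of InMemorySession, assuming the package and the scheduler's DRMAA library are installed.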




PBS (Portable Batch System)

 Computer software that performs job scheduling in a Unix cluster environment
 Supported as a job scheduler by GRAM, a component of the Globus Toolkit
 Originally developed for NASA
 Available versions
    • OpenPBS
    • TORQUE: a fork of OpenPBS
    • PBS Professional (PBS Pro): commercial




TORQUE

 Distributed resource manager providing control
  over batch jobs and distributed compute nodes
 The name stands for Terascale Open-source Resource
  and QUEue Manager
 Holds configuration information for each slave node
  (number of CPUs, number of cores, RAM size, temporary
  storage, and so on) and allocates cluster resources
  when the scheduler requests them

                       Slave 1


      Master
                       Slave 2

       NFS
                       Slave 3
     > qsub a.sh
      The a.sh job is passed to a slave node chosen by the scheduler
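
The idea of matching a job against per-node resource information can be illustrated with a toy model. This is not TORQUE's actual scheduling algorithm; the Node and Job classes and the first-fit rule are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Resources the master knows about for one slave node."""
    name: str
    free_cores: int
    free_ram_gb: int
    jobs: list = field(default_factory=list)


@dataclass
class Job:
    """A submitted script together with the resources it asks for."""
    script: str
    cores: int = 1
    ram_gb: int = 1


def dispatch(job, nodes):
    """First-fit placement: give the job to the first node with enough room.

    Returns the chosen node name, or None if the job has to stay queued.
    """
    for node in nodes:
        if node.free_cores >= job.cores and node.free_ram_gb >= job.ram_gb:
            node.free_cores -= job.cores
            node.free_ram_gb -= job.ram_gb
            node.jobs.append(job.script)
            return node.name
    return None


if __name__ == "__main__":
    cluster = [Node("slave1", 2, 4), Node("slave2", 8, 32), Node("slave3", 8, 32)]
    for job in (Job("a.sh", cores=4, ram_gb=8), Job("b.sh"), Job("c.sh", cores=16)):
        print(job.script, "->", dispatch(job, cluster))
```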


Virtualized Galaxy (Test-bed)




Galaxy on Cloud
Cloud computing

 Delivery of computing and
  storage capacity as a service to
  a heterogeneous community of
  end-recipients.




VPS (Virtual Private Server)

 A term used by Internet hosting services to refer to a virtual machine in a cloud




Amazon EC2 (Amazon Elastic Compute Cloud)




                              Virtualization + Grid(Cluster)
                              computing in a Cloud




Amazon S3 (Amazon Simple Storage Service)




Galaxy on Cloud

  Using Amazon EC2 + S3

  Select an AMI from the Community AMIs




Galaxy on Insilicogen

  Galaxy localization on cluster

  Tool development

  Workflow development




www.insilicogen.com E-mail codes@insilicogen.com Tel 031-278-0061 Fax 031-278-0062
www.insilicogen.com E-mail km@insilicogen.com Tel 031-548-1008,1009 Fax 031-278-0062

Kogo 2013-ngs galaxy

  • 1. NGS Analysis using Galaxy 2013 한국유전체학회 동계심포지엄 생물정보분석교육 워크샵 김형용, 이규열, 이성찬 _ 2013. 02. 05 ~ 2013.02.06 R&D Center, Insilicogen, Inc.
  • 2. Index 목차 있을 시 간지 01 Galaxy introduction NGS Analysis using Galaxy 02 Galaxy examples 1,2 03 Galaxy installation 04 Galaxy function details 05 Galaxy examples 3,4 06 Galaxy tools 07 Galaxy on Grid 08 Galaxy on Cloud
  • 3. Agenda 구분 시간 강의내용 비고 15:00 ~ 15:20 Galaxy 소개 진행 김형용 15:20 ~ 15:50 Galaxy 분석예제 시연 1. Human exon 가운데 가장 SNP 많은 ex on 찾기 1부: 2. NGS QC and assembly 예제 Introduction 16:00 ~ 16:20 Galaxy 설치 진행 이성찬 and 16:20 ~ 17:10 Galaxy 설치 및 분석예제 실습 1. Galaxy 설치 실습 Application 2. Human exon 가운데 가장 SNP가 많은 exon 찾기 실습 3. NGS QC and assembly 예제 실습 17:20 ~ 17:50 Galaxy 세부 기능에 대한 설명 진행 김형용 09:00 ~ 09:20 Galaxy 분석예제 시연 진행 김형용 1. RNA-seq 분석 예제 2. NGS 분석예제 2 19:20 ~ 09:50 Galaxy 분석예제 실습 1. RNA-seq 분석 예제 2. NGS 분석예제 2 2부: 10:00 ~ 10:20 Galaxy tool의 이해 진행 김형용 Custom 10:20 ~ 11:00 Galaxy tool 작성 실습 1. Primer design operation 11:10 ~ 11:30 Galaxy on Grid 진행 이규열 1. 그리드의 이해 2. 분산작업 시연 11:30 ~ 11:50 Galaxy on Cloud 진행 김형용 1. 클라우드의 이해 2. Galaxy on Amazon EC2 Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 3
  • 5. Sequencer Comparison Illumina 454 SOLiD 5500 5500xl 5500xl HiSeq 2000 HiSeq 1000 HiScan SQ GAIIx GS FLX microbeads microbeads nanobeads Mate pair : 60 bp X60 bp Read 2X100 bp 2X150 bp 400 bp Paired-end : 75 bp X35 bp length Fragment : 75 bp Gb/day 55 35 17.5 6.5 10h 10-15 20-30 30-45 Yield 600Gb 300Gb 150Gb 95Gb 35Mb 90Gb 180Gb 300Gb Required 50 ng with Nextera input 100 ng – 1 μg with TruSeq 85% (2X50 bp, >Q30) Accuracy 99% (>Q20) 99.99% 80% (2X100 bp, >Q30) Illumina의 Gb/day는 2X100 bp run 결과 Illumina read length : 1X35, 2X50, 2X100 GA : 1X35, 2X50, 2X100, 2X150 Copyrightⓒ Insilicogen, Inc. 2011. All rights reserved. 5
  • 6. Applications Application of NGS Technique Personal Genomics Environmentology Microbiology Toxicology Personal Genomics Chemical Biology Mutation Detection Structure Variation Transcriptional Control Interaction of DNA and Protein Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 6
  • 7. Issue of New Genomic Era. many researchers, having invested in next generation sequencing instruments, now face a computational bottleneck in their research work-flow. BGI Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 7
  • 8. Most Significant Improvement to Your Next Generation Sequencing Workflow (출처: The Global Outlook for Next Generation Sequencing: Usage, Platform Drivers & Workflow, October 31, 2011. BioInformatics, LLC) Copyrightⓒ Insilicogen, Inc. 2010. All rights reserved. 8
  • 9. Issue of New Genomic Era. Bioinformatics •DNA shearing •Insert into high and • Big Dye • FTP /or low copy • ABI 3730 • Gene prediction • Web browser number vectors • Data compliation • BLAST search • Commercial software Library Sequence Sequence Data delivery construction delineation annotation Template Finishing & Secondary purification Assembly annotation • PCR Amplicons • Primer walking • SNP • BACs • Transposon insertion methods • Comparative genomics • Cosmids/ Fosmids • Proprietary & commercial assembly • Expression analysis Cost Process Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 9
  • 10. Application of Next Genomic Data Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 10
  • 11. Practical Software Platforms for NGS data analysis
  • 12. What kind of? • Biological Features • Framework (Enterprise/Informatics) Features • Service • Price
  • 13. List of NGS Frameworks Copyrightⓒ Insilicogen,Inc. 2012. All rights reserved. 13
  • 14. 유전변이 추출 전문 파이프라인 HugeSeq Copyrightⓒ Insilicogen,Inc. 2012. All rights reserved. 14
  • 15. 사용자 친화적 GUI환경을 제공하는 CLC Genomics Server CLC Genomics Server 1 - 3계층 시스템 구조의 데이터 분석 및 공유, 관리를 위한 엔터프라이즈 솔루션 ② ⑤ CLC Bioinformatics Database 2 - 데이터의 중앙 집중 방식의 저장 및 공유 관리를 위한 데이터베이스 CLC Assembly Cell 3 - NGS 데이터의 초고속 assembly 분석 솔루션 (커맨드라인 기반) ① CLC Genomics Workbench 4 - NGS 데이터의 다양한 생물정보 분석 솔루션 (GUI 기반) ③ ④ CLC Developer Kit 5 - 사용자가 원하는 생물정보 분석 툴과 워크플로우 커스터마이징 솔루션 Copyrightⓒ Insilicogen,Inc. 2012. All rights reserved. 15
  • 16. Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 16
  • 17. 30x Human genome 1 sample (150G) 500만원 (1년저장) Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 17
  • 18. 구글로부터 투자받아 NCBI SRA 서비스 연동 온라인에서 실험없이 곧 바로 분석 가능 Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 18
  • 20. Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 20
  • 21. Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 21
  • 22. What is Galaxy Galaxy, a web-based genome analysis platform http://usegalaxy.org • An open-source framework for integrating various computational tools and databases into a cohesive workspace • A web-based service we provide, integrating many popular tools and resources for comparative genomics • A completely self-contained application for building your own Galaxy style sites Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 22
  • 23. Galaxy Usage • One of the fastest growing open source bioinformatics projects, a highly successful high throughput data analysis platform for Life Sciences with over 15,000 users worldwide • Annual Galaxy Community Conference Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 23
  • 24. Galaxy visualization External Genome Browser  UCSC  Ensembl  GBrowse Trackster  Track/data viewer in web browser  HTML5 Canvas, jQuery  Renders in browser, not on server Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 24
  • 25. Galaxy visualization Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 25
  • 26. Trackster Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 26
  • 27. Trackster Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 27
  • 28. Trackster Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 28
  • 29. Galaxy 구성요소 Galaxy 주요구성 요소  Datasources : 입력 데이터 지정. 별도의 지역 시스템이나, 외부 웹사이트의 데이터를 등록 가능  Tool : 기본적인 분석의 최소 단위, 지역설치시 원하는 툴을 만들어 넣을 수 있음  History : 입력데이터가 Tool의 조합을 거쳐 얻어진 중간 결과물 목록  Workflow : History 는 입력데이터 및 파라메터만 바꾸면 새로운 데이터 결과를 얻을 수 있다. 이를 별도로 프로세스 등록  Visualization : 분석결과를 가시화 도구와 연결  Page : 위 요소들을 종합한 보고서 작성 기능 Eprimer3 tool 을 별도로 만들어 등록한 예제 Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 29
  • 30. Galaxy tool 은 입력 출력 Tool 포맷 포맷 입력 데이터를 (포맷에 맞게) 작업하여 (포맷에 맞게) 출력 데이터를 만드는 역할 조합하면 Workflow가 된다 Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 30
  • 31. Galaxy formats Auto-detect 데이터가 어떤 형식인지 자동으로 인식 A binary sequence file in 'ab1' format with a '.ab1' file extension. You must manually select this 'File Format' when uploadi Ab1 ng the file. blastz pairwise alignment format. Each alignment block in an axt file contains three lines: a summary line and 2 sequence li Axt nes. Blocks are separated from one another by blank lines. The summary line contains chromosomal position and size infor mation about the alignment. It consists of 9 required fields. Bam A binary file compressed in the BGZF format with a '.bam' file extension. Bed Tab delimited format (tabular). Does not require header line A sequence in FASTA format consists of a single-line description, followed by lines of sequence data. The first character of Fasta the description line is a greater-than (">") symbol in the first column. All lines should be shorter than 80 characters FastqSolexa Illumina (Solexa) variant of the Fastq format, which stores sequences and quality scores in a single file Gff GFF lines have nine required fields that must be tab-separated. The GFF3 format addresses the most common extensions to GFF, while preserving backward compatibility with previous fo Gff3 rmats. Interval (Genomic Tab delimited format (tabular) Intervals) Lav Lav is the primary output format for BLASTZ. The first line of a .lav file begins with #:lav.. TBA and multiz multiple alignment format. The first line of a .maf file begins with ##maf. This word is followed by white-sp MAF ace-separated "variable=value pairs". There should be no white space surrounding the "=". A binary sequence file in 'scf' format with a '.scf' file extension. You must manually select this 'File Format' when uploading Scf the file. Sff A binary file in 'Standard Flowgram Format' with a '.sff' file extension. Tabular (tab delimi Any data in tab delimited format (tabular) ted) The wiggle format is line-oriented. 
Wiggle data is preceded by a track definition line, which adds a number of options for Wig controlling the default display of this track. Other text type Any text file Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 31
  • 32. Galaxy 특징 한번 더 최근 Galaxy 사용 추세 Biologist NGS 관련 분석기능 탑재 논문에 Galaxy URL 제공 Amazon Cloud 이용 Transparent analysis Bioinformatician Galaxy 특징 한번 더  파이썬으로 만들어져 있으나, 확장시 파이썬이 아니어도 됨  “투명한” 분석 플로우를 만들고 공유하고 확장할 수 있다.  거의 모든 생물정보 분석을 Galaxy 로 할 수 있다.  Galaxy만 잘 써도 뽑겠다 (NCBI) … Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 32
  • 34. Example 1. Finding Human Exons with the highest number of SNPs 1. Download all Human Exons from NCBI or Ensembl BioMart or UCSC TableBrowser 2. Download all Human SNPs from … 3. Scripting  Join 1, 2 according to position  Group by Exon id  Sort by SNP count  Filter Exon which has more than 10 SNPs Have to do programming! (Python, Perl, …) Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 34
  • 35. On Galaxy http://usegalaxy.org Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 35
  • 36. On Galaxy Get data  UCSC main : Exon 데이터 가져오기 Get data  UCSC main : SNP 데이터 가져오기 Operate on Genomic Interval  Join : 영역이 겹치는 Exon 추출하기 Join, Substract and Group  Group : Exon 이름으로 그룹핑하고 SNP 세기 Filter and Sort  Sort : SNP 개수로 Exon 정렬하기 Text Manipulation  Select first : SNP 개수가 많은 top 5 exon 추출하기 Join, Substract and Group  Compare two Datasets : 잃어버린 exon 정보 회복하기 Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 36
  • 38. Example 2. Human NGS data QC and assembly 1. NGS Quality Control 2. NGS Single End Mapping 3. SNP Calling 4. Compare with dbSNP Have to do in Unix and need programming! (Python, Perl, …) Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 38
  • 39. On Galaxy http://usegalaxy.org Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 39
  • 40. On Galaxy NGS 분석을 위해서는 프로그램 추가 설치해야 함 ( http:// http://wiki.galaxyproject.org/Admin/NGS%20Local%20Setup ) 프로그램 사용되는 곳 설치방법 Fastx-toolkit NGS QC Ubuntu apt-get Gnuplot NGS QC boxplot Ubuntu apt-get Bowtie2 Reference assembly 복사 후 PATH 설정 SAMTools SNP calling Ubuntu apt-get Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 40
  • 41. On Galaxy Get data  Upload File : human illumina fastq 파일 업로드 NGS: QC and minipulation  : fastsanger 포맷을 변경 FASTQ Groomer NGS: QC and minipulation  : fastq quality 통계정보 보기 Compute quality statistics NGS: QC and minipulation  Draw quality score boxplot : fastq quality 통계정보로 boxplot 그리기 NGS: QC and minipulation  : 의미없는 부분 잘라내기, 가리기 FASTQ Trimmer, Quality Trimer, Masker Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 41
  • 42. On Galaxy Get data  Upload File : Reference assembly를 위한 레퍼런스 서열 입력 NGS: Mapping  Bowtie2 : Bowtie2를 이용한 assembly NGS: SAM Tools  MPileup : BAM 파일에서 SNP, indel 정보 추출하기 NGS: SAM Tools  Filter pileup : 추출된 SNP, indel 가운데 높은 점수 추출하기 NGS: SAM Tools  Pileup-to-interval : Genomic interval 형식으로 변경 Get data  UCSC Main : dbSNP 정보 가져오기 Operate on Genomic Interval  Join : 영역이 겹치는 SNP 추출하기 Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 42
  • 44. Install Virtualbox - Ubuntu 1. USB에서 Virtualbox와 Galaxy 폴더를 복사합니다. 2. Virtualbox를 설치합니다. 3. Virtualbox를 실행한 후, Galaxy 이미지를 Import합니다. 4. 설정에서 네트워크를 브릿지(Bridge)로 변경합니다. 5. Ubuntu 실행 후, Network 설정 파일을 삭제합니다.  rm /etc/udev/rules.d/70-persistent-net.rules 6. Linux(ubuntu) 를 재 시작합니다.  sudo shutdown –h now Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 44
  • 45. Creating your own Galaxy Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 45
  • 46. Running Galaxy in an production environment By default, Galaxy uses  SQLite database  Built-in HTTP server for all tasks  Local job runnser  Single process  Simplest error-proof configuration Change configuration for service  Disable the developer settings  use_interactive = False, use_debug = False  Get a real database  PostgresSQL  Offload the menial tasks: Proxy  Nginix, Apache  Let your tools free: Cluster  Move intensive processing to other host, TORQUE, GRID, DRMAA  Other advanced settings Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 46
  • 47. Galaxy on Cluster Intensive processes to other hosts  TORQUE  GRID  DRMAA Working with Galaxy on the Cloud Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 47
  • 49. Virtualization 가상화 • 컴퓨터 자원의 추상화를 일컫는 말 • 가상의 물리적 리소스를 만들어 냄. •물리적인 1대의 하드웨어 자원을 논리적으로 여러 개로 나누어 사용하거나, •여러대의 하드웨어 자원을 논리적으로 통합하여 이용하는 기술 • 하드웨어 관리, 재난에 대한 시스템 복구 등 여러 문제를 해결할 수 있는 방법으로 최근 각광 받고 있음
  • 50. Virtualization 가상화의 장점!! • 비용절감  서버 한 대를 분할하여 여러 대의 서버를 구성할 수 있음  서버 구입비용 절감, 전기, 상면비용, 서버관리비용이 절감 • 자원의 효율적인 사용  서버의 비 활용되는 자원을 이용하여 가상머신을 만듬으로써 효율적인 자원사용이 가능 • 안정적인 운영  서버를 이미지로 백업, 손쉬운 서버 이전으로 장애에 대한 신속한 대처 가능 • SW의 지속적인 운영  서버 HW의 수명 주기가 끝나면 OS 벤더는 장치 드라이버 지원이 중단됨 -> 마이그레이션 문제가 발생  가상머신에 기존의 시스템을 가상머신에 올리기 때문에 장치 드라이버에 대한 문제 가 발생하지 않음 Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 50
  • 51. 클라우드 서비스에 기본적으로 활용 Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 51
  • 52. Public Galaxy environment Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 52
  • 53. Example of Cloud 출처 : iSC 2012 Amazon HPC session Copyrightⓒ Insilicogen,Inc. 2012. All rights reserved. 53
  • 54. Running Galaxy Web server 1. 자신의 컴퓨터의 IP Address를 확인합니다.  ifconfig 2. Galaxy 폴더로 이동합니다.  cd galaxy-dist 3. Galaxy web server를 실행합니다.  sh run.sh 4. 자신의 호스트 OS (windows) 에서 웹브라우저에서 주소창에 다음을 입력합니다.  IP Address:8080 (예, 172.20.8.162:8080) Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 54
  • 56. Get Data Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 56
  • 57. Get Data / Send Data Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 57
  • 58. Text Manipulation Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 58
  • 59. Convert Format Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 59
  • 60. FASTA manipulation Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 60
  • 61. Filter and Sort Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 61
  • 62. Join, Subtract and Group Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 62
  • 63. Operate on Genomic Intervals Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 63
  • 64. NGS Toolbox Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 64
  • 66. Example 3. Human RNA-seq 1. RNA-seq result: adrenal_1,2.fastq, brain_1,2.fastq 2. Reference: iGenome UCSC hg19, chr19 gene notation (GTF format) Have to do in Unix and need programming! (Python, Perl, …) Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 66
  • 67. On Galaxy http://usegalaxy.org Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 67
  • 68. On Galaxy RNA-seq 분석을 위해서는 프로그램 추가 설치해야 함 ( http://wiki.galaxyproject.org/Admin/NGS%20Local%20Setup ) 프로그램 사용되는 곳 설치방법 java FastQC Ubuntu apt-get install openjdk-7-jre FastQC NGS QC tool-data/shared/jars/ 로 복사 Tophat RNA-seq mapping (다음페이지 참고) Cufflinks RNA-seq assembly Ubuntu apt-get install cufflinks Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 68
  • 69. Tophat install in Ubuntu $ cp samtools-0.1.18.tar.gz2 ~/work $ bzip2 –d samtools-0.1.18.tar.gz2 $ tar xvf samtools-0.1.18.tar $ cd samtools-0.1.18 $ make $ cd .. $ cp tophat-1.4.1.tar.gz ~/work $ tar zxvf tophat-1.4.1.tar.gz $ cd tophat-1.4.1 $ apt-get install libboost libbam libboost-thread-dev $ cp ../samtools-0.1.18/libbam.a /usr/local/lib $ sudo mkdir /usr/local/include/bam $ cp ../samtools-0.1.18/*.h /usr/local/include/bam $ configure $ make $ make install Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 69
  • 70. On Galaxy Get data  Upload File : fastq, chr19.fa, gtf 파일 업로드 NGS: QC and minipulation  : fastqsanger 포맷으로 변경 FASTQ Groomer NGS: QC and minipulation  : fastq quality 통계정보 보기 FastQC:Read QC NGS: RNA Analysis  : RNA-seq fastq 데이터에서 splice junction 찾기 Tophat for Illumina 레퍼런스로 chr19.fa 이용 NGS: RNA Analysis  : Transcript assembly, FPKM 추정 Cufflinks Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 70
  • 71. On Galaxy NGS: RNA Analysis  Cuffmerge : brain, adrenal 데이터를 reference에 맞게 합치기 NGS: RNA Analysis  Cuffdiff : 유의한 발현변화 찾기 Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 71
  • 73. Galaxy tool 은 입력 출력 Tool 포맷 포맷 입력 데이터를 (포맷에 맞게) 작업하여 (포맷에 맞게) 출력 데이터를 만드는 역할 조합하면 Workflow가 된다 Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 73
  • 74. Galaxy formats Auto-detect 데이터가 어떤 형식인지 자동으로 인식 A binary sequence file in 'ab1' format with a '.ab1' file extension. You must manually select this 'File Format' when uploadi Ab1 ng the file. blastz pairwise alignment format. Each alignment block in an axt file contains three lines: a summary line and 2 sequence li Axt nes. Blocks are separated from one another by blank lines. The summary line contains chromosomal position and size infor mation about the alignment. It consists of 9 required fields. Bam A binary file compressed in the BGZF format with a '.bam' file extension. Bed Tab delimited format (tabular). Does not require header line A sequence in FASTA format consists of a single-line description, followed by lines of sequence data. The first character of Fasta the description line is a greater-than (">") symbol in the first column. All lines should be shorter than 80 characters FastqSolexa Illumina (Solexa) variant of the Fastq format, which stores sequences and quality scores in a single file Gff GFF lines have nine required fields that must be tab-separated. The GFF3 format addresses the most common extensions to GFF, while preserving backward compatibility with previous fo Gff3 rmats. Interval (Genomic Tab delimited format (tabular) Intervals) Lav Lav is the primary output format for BLASTZ. The first line of a .lav file begins with #:lav.. TBA and multiz multiple alignment format. The first line of a .maf file begins with ##maf. This word is followed by white-sp MAF ace-separated "variable=value pairs". There should be no white space surrounding the "=". A binary sequence file in 'scf' format with a '.scf' file extension. You must manually select this 'File Format' when uploading Scf the file. Sff A binary file in 'Standard Flowgram Format' with a '.sff' file extension. Tabular (tab delimi Any data in tab delimited format (tabular) ted) The wiggle format is line-oriented. 
Wiggle data is preceded by a track definition line, which adds a number of options for Wig controlling the default display of this track. Other text type Any text file Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 74
  • 75. Creating your own Galaxy Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 75
  • 76. Primer design tool Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 76
  • 77. Primer3  Primer3 • Primer design program • http://primer3.sourceforge.net/releases.php • Download from http://sourceforge.net/projects/primer3/files/primer3/1.1.4/prim er3-1.1.4.tar.gz • make & copy to PATH  eprimer3 • Wrapper for Primer3, it’s used in EMBOSS package • Easy command line interface • http://emboss.sourceforge.net/apps/release/6.4/emboss/apps/ eprimer3.html • apt-get install emboss Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 77
  • 78. erimer3 # EPRIMER3 RESULTS FOR GL020027.1 $ eprimer3 # Start Len Tm GC% Sequence –sequence INPUT_FASTA_FILE 1 PRODUCT SIZE: 199 –outfile PRIMER_DESIGN_RESULT FORWARD PRIMER 571071 20 60.06 45.00 CTTGCCAATAGCGAATGGAT -osize OSIZE REVERSE PRIMER 571250 20 59.99 55.00 GACGGCGTAGATCTTCAAGC -gcclamp GCCLAMP 2 PRODUCT SIZE: 199 … FORWARD PRIMER 55074 20 60.05 55.00 TAACACCACTGCTCCTGCTG REVERSE PRIMER 55253 20 59.97 50.00 CATTGCATGGTCAGAACCAC 3 PRODUCT SIZE: 200 FORWARD PRIMER 71990 20 60.03 45.00 GGGGTTGATTTTCATTGTGG 이 결과 형식을 수정하여 REVERSE PRIMER 72170 20 59.88 45.00 GTTTGCACCAACCTGGTTTT 다른 Galaxy tool의 입력 으로 쓰고 싶다. 4 PRODUCT SIZE: 200 FORWARD PRIMER 427182 20 59.83 50.00 CTGATGTGCTCTGTGGGAAA REVERSE PRIMER 427362 20 60.01 55.00 CCGTGTATGTAGCCCGAGTT 5 PRODUCT SIZE: 197 직접 Primer design FORWARD PRIMER 427185 20 59.97 50.00 ATGTGCTCTGTGGGAAAACC Galaxy tool 만들기 REVERSE PRIMER 427362 20 60.01 55.00 CCGTGTATGTAGCCCGAGTT Copyrightⓒ Insilicogen,Inc. 2011. All rights reserved. 78
  • 79. eprimer3.xml
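The slide showing eprimer3.xml survives only as a title, so its real content is not available here. As a rough, hypothetical illustration of what a Galaxy tool definition of that era generally looks like, the sketch below holds a small tool XML as a Python string and checks that it is well-formed. The tool id, labels, parameter names and command line are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical tool definition; NOT the authors' eprimer3.xml.
TOOL_XML = """<tool id="eprimer3" name="EMBOSS eprimer3" version="0.1">
  <description>Design PCR primers with eprimer3</description>
  <command interpreter="python">eprimer3.py $input $output</command>
  <inputs>
    <param name="input" type="data" format="fasta" label="Input FASTA file"/>
  </inputs>
  <outputs>
    <data name="output" format="tabular"/>
  </outputs>
  <help>Runs EMBOSS eprimer3 and reports primer pairs as a tab-delimited table.</help>
</tool>
"""

# Parsing fails loudly if the definition is not well-formed XML.
tool = ET.fromstring(TOOL_XML)

if __name__ == "__main__":
    print("tool id:", tool.get("id"))
    print("inputs :", [p.get("name") for p in tool.iter("param")])
    print("outputs:", [d.get("name") for d in tool.iter("data")])
```

The `$input` and `$output` placeholders are filled in by Galaxy from the `name` attributes declared under `inputs` and `outputs`.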
  • 80. eprimer3.py
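The eprimer3.py slide is likewise only a title in this transcript. A wrapper script of this kind typically builds the eprimer3 command line and runs it; the sketch below shows that step under stated assumptions. The function names are hypothetical, the `-sequence`, `-outfile`, `-osize` and `-gcclamp` options come from the earlier slide, and `-auto` is the general EMBOSS qualifier that turns off interactive prompting.

```python
import subprocess


def build_eprimer3_command(fasta_path, report_path, osize=None, gcclamp=None):
    """Build the eprimer3 argument list without running anything."""
    cmd = [
        "eprimer3",
        "-sequence", fasta_path,
        "-outfile", report_path,
        "-auto",  # EMBOSS qualifier: do not prompt for missing values
    ]
    if osize is not None:
        cmd += ["-osize", str(osize)]
    if gcclamp is not None:
        cmd += ["-gcclamp", str(gcclamp)]
    return cmd


def run_eprimer3(fasta_path, report_path, **options):
    """Run eprimer3; raises CalledProcessError if it exits with an error."""
    subprocess.check_call(build_eprimer3_command(fasta_path, report_path, **options))


if __name__ == "__main__":
    # Only prints the command, so it works even where EMBOSS is not installed.
    print(" ".join(build_eprimer3_command("input.fasta", "primers.txt", osize=20)))
```

Passing the arguments as a list avoids shell quoting problems with the dataset paths that Galaxy hands to the script.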
  • 81. tool_conf.xml
      …
        <section name="VCF Tools" id="vcf_tools">
          <tool file="vcf_tools/intersect.xml" />
          <tool file="vcf_tools/annotate.xml" />
          <tool file="vcf_tools/filter.xml" />
          <tool file="vcf_tools/extract.xml" />
        </section>
        <section name="MyTools" id="mytools">
          <tool file="mytools/eprimer3.xml" />
        </section>
      </toolbox>
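The tool_conf.xml edit can also be scripted rather than typed by hand. This is a minimal sketch using Python's standard ElementTree; the helper name `add_tool_section` is hypothetical, and the small inline toolbox stands in for a real tool_conf.xml.

```python
import xml.etree.ElementTree as ET


def add_tool_section(toolbox, section_id, section_name, tool_file):
    """Add a <section> with one <tool> entry, unless it is already present."""
    section = toolbox.find("section[@id='%s']" % section_id)
    if section is None:
        section = ET.SubElement(
            toolbox, "section", {"name": section_name, "id": section_id}
        )
    if not any(t.get("file") == tool_file for t in section.findall("tool")):
        ET.SubElement(section, "tool", {"file": tool_file})
    return toolbox


# Stand-in for the content of tool_conf.xml.
toolbox = ET.fromstring(
    '<toolbox><section name="VCF Tools" id="vcf_tools">'
    '<tool file="vcf_tools/filter.xml" /></section></toolbox>'
)
add_tool_section(toolbox, "mytools", "MyTools", "mytools/eprimer3.xml")

if __name__ == "__main__":
    print(ET.tostring(toolbox, encoding="unicode"))
```

For a real file, `ET.parse(path)` followed by `tree.write(path)` would replace the inline string. Galaxy reads tool_conf.xml at startup, so the server normally needs a restart before the new section appears.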
  • 82. EMBOSS eprimer3 tool added
  • 83. Hands-on practice
      • Install Primer3: compile with make, then add primer3_core to PATH
      • Install EMBOSS: sudo apt-get install emboss
      • Install Biopython: sudo apt-get install python-biopython
      • Copy eprimer3.py and eprimer3.xml to galaxy-dist/tools/mytools/ (create the mytools directory yourself)
      • Edit tool_conf.xml: register mytools/eprimer3.xml
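Before copying the tool files into Galaxy, it can help to confirm that the three prerequisites from this slide are actually visible. The check below is a small sketch using only the standard library; the function name `check_prerequisites` is hypothetical.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report whether each prerequisite from the hands-on list is available."""
    return {
        # Executables must be found on PATH.
        "primer3_core": shutil.which("primer3_core") is not None,
        "eprimer3": shutil.which("eprimer3") is not None,
        # Biopython is imported as the "Bio" package.
        "Biopython": importlib.util.find_spec("Bio") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print("%-12s %s" % (name, "OK" if found else "MISSING"))
```

The check should be run as the same user, and with the same PATH, that the Galaxy server uses, because that is the environment the tool will run in.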
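  • 85. Grid vs Cluster
      • In common: both split a computation over a large data set into many small computations and distribute them across multiple small computers.
      • Differences: a grid connects machines of different types over a WAN, links diverse platforms, and places no limit on the number of connected machines.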
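The idea that grids and clusters share, splitting one large computation into small pieces and spreading them over several workers, can be shown on a single machine. In this sketch the threads merely stand in for compute nodes, and the GC-counting task and all function names are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, n_chunks):
    """Split a list into at most n_chunks pieces of nearly equal size."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def gc_count(sequences):
    """The 'small computation': count G and C bases in a batch of sequences."""
    return sum(s.count("G") + s.count("C") for s in sequences)


def distributed_gc_count(sequences, n_workers=3):
    """Split the input, process each chunk on its own worker, merge the results."""
    chunks = split(sequences, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(gc_count, chunks))


if __name__ == "__main__":
    reads = ["GGCC", "ATAT", "GCAT"] * 10
    print(distributed_gc_count(reads), gc_count(reads))
```

On a real cluster or grid, each chunk would be submitted as a separate job to the scheduler instead of being handed to a thread.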
  • 86. Grid
  • 87. Globus Toolkit
      • A representative middleware for computational grids
      • Open source toolkit for building computing grids, developed and provided by the Globus Alliance
      • Standards implemented
          • Open Grid Service Architecture (OGSA)
          • Open Grid Service Infrastructure (OGSI)
          • Web Services Resource Framework (WSRF)
          • Job Submission Description Language (JSDL)
          • Distributed Resource Management Application API (DRMAA)
          • SOAP
          • WSDL
          • Grid Security Infrastructure
  • 88. DRMAA: a high-level Open Grid Forum API specification for the submission and control of jobs to a Distributed Resource Management (DRM, job scheduler) system, such as a cluster or grid computing infrastructure
  • 89. PBS (Portable Batch System)
      • Software that performs job scheduling in a Unix cluster environment
      • Supported as a job scheduler by GRAM, a component of the Globus Toolkit
      • Originally developed for NASA
      • Available versions
          • OpenPBS
          • TORQUE: a fork of OpenPBS
          • PBS Professional (PBS Pro): commercial
  • 90. TORQUE
      • Distributed resource manager providing control over batch jobs and distributed compute nodes
      • The name stands for Terascale Open-source Resource and QUEue Manager
      • Keeps the configuration of each slave node (number of CPUs, number of cores, RAM size, temporary storage, and so on) and allocates cluster resources when the scheduler requests them
      • Diagram: a master node and three slave nodes (Slave 1, Slave 2, Slave 3) sharing storage over NFS. Running "> qsub a.sh" on the master hands a.sh to a slave chosen by the scheduler.
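The a.sh file passed to qsub is an ordinary shell script whose `#PBS` comment lines tell TORQUE which resources the job needs. The sketch below generates such a script; the function name, the job name and the example command are hypothetical, while `-N`, `-l nodes=...:ppn=...`, `-l walltime=...` and `$PBS_O_WORKDIR` are standard PBS/TORQUE conventions.

```python
def make_pbs_script(job_name, command, ppn=1, walltime="01:00:00"):
    """Return the text of a job script suitable for `qsub`."""
    lines = [
        "#!/bin/bash",
        "#PBS -N %s" % job_name,            # job name shown by qstat
        "#PBS -l nodes=1:ppn=%d" % ppn,     # one node, ppn cores on it
        "#PBS -l walltime=%s" % walltime,   # maximum run time (HH:MM:SS)
        "cd $PBS_O_WORKDIR",                # directory where qsub was invoked
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: run eprimer3 on four cores' worth of allocation.
    print(make_pbs_script("primer_job", "eprimer3 -sequence in.fa -outfile out.txt -auto", ppn=4))
```

Saving the returned text as a.sh and running `qsub a.sh` on the master node would queue the job. The scheduler then picks a slave with enough free cores to satisfy the `ppn` request.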
  • 91. Virtualized Galaxy (Test-bed)
  • 93. Cloud computing: delivery of computing and storage capacity as a service to a heterogeneous community of end-recipients.
  • 95. VPS (Virtual Private Server): a term used by Internet hosting services to refer to a virtual machine in a cloud
  • 96. Amazon EC2 (Amazon Elastic Compute Cloud): virtualization + grid (cluster) computing in a cloud
  • 97. Amazon EC2 (Amazon Elastic Compute Cloud)
  • 98. Amazon EC2 (Amazon Elastic Compute Cloud)
  • 99. Amazon EC2 (Amazon Elastic Compute Cloud)
  • 100. Amazon S3 (Amazon Simple Storage Service)
  • 101. Galaxy on Cloud
      • Using Amazon EC2 + S3
      • Select an AMI under Community AMIs
  • 102. Galaxy on Cloud
  • 103. Galaxy on Cloud
  • 104. Galaxy on Cloud
  • 105. Galaxy on Cloud
  • 106. Galaxy on Cloud
  • 107. Galaxy on Insilicogen
      • Galaxy localization on cluster
      • Tool development
      • Workflow development
  • 108. www.insilicogen.com E-mail codes@insilicogen.com Tel 031-278-0061 Fax 031-278-0062
  • 109. www.insilicogen.com E-mail km@insilicogen.com Tel 031-548-1008,1009 Fax 031-278-0062