Detecting Somatic Mutations in Impure Cancer Sample - Ensemble
Approach
CB Hong*, KJ Kim
4-5 February 2015
Contents
1 TCGA Benchmark 4 Data Set
1.1 Downloading TCGA data with GeneTorrent
1.2 Preparing the Sample Data Set
1.3 Checking the extracted regions
1.4 Checking the data for the exercises
1.5 Summary
2 Somatic Mutation Prediction
2.1 Running SomaticSniper and applying the preliminary filter (164 s)
2.2 Running VarScan2 and applying the preliminary filter (10 min)
2.3 Running MuTect and applying the preliminary filter (18 min)
2.4 Summary
3 Obtaining Full Consensus / Partial Consensus sSNVs
3.1 Extracting bi-allelic SNPs only
3.2 Obtaining Full Consensus / Partial Consensus
3.3 Counting Full Consensus / Partial Consensus calls
3.4 Summary
4 Applying an Additional Filter
4.1 Calling normal and tumor variants with UnifiedGenotyper (8 min)
4.2 Filtering SNVs - full consensus (can be skipped)
4.3 Filtering SNVs - partial consensus (SomaticSniper/MuTect)
4.4 Counting Full Consensus / Partial Consensus calls after the GATK filter
4.5 Summary
5 Validation
5.1 Preparing the COSMIC and CCLE data
5.2 Running the validation - consensus / partial consensus
5.3 Summary
6 Other Somatic Mutation Callers - Strelka, Virmid
6.1 Strelka (1 min 38 s)
6.2 Virmid (33 min)
7 Linux for Genome Research
7.1 Linux server for the exercises
7.2 Connecting to the exercise Linux server - Windows users
7.3 Connecting to the exercise Linux server - Mac or Linux users
7.4 Inspecting Linux system information
7.5 The Linux file system
7.6 Adding a hard disk in Linux
7.7 File-related commands
7.8 Linux network information
7.9 Understanding compression in Linux
7.10 Installing software on Linux
7.10.1 Installing software with APT
7.10.2 Installing software by compiling source code
* KT GenomeCloud, hongiiv@gmail.com
1 TCGA Benchmark 4 Data Set
In this exercise we use the TCGA mutation calling benchmark 4 datasets to learn how somatic mutations are detected. The genome sequencing benchmark dataset was generated artificially by mixing a fixed proportion (5%-95%) of normal sample into the tumor sample. Of these we use n40t60 (mixed with 60% of the tumor and 40% of the normal) together with the corresponding normal sample. The data are available in BAM format from the TCGA Benchmark web page.
1.1 Downloading TCGA data with GeneTorrent
• Steps: install the download software, download the key and UUID files, then download the samples.
• Example: download the public key for the TCGA Benchmark Data Set.
• https://cghub.ucsc.edu/datasets/benchmark_download.html
$ cd
$ wget https://cghub.ucsc.edu/software/downloads/cghub_public.key
• Fetch the UUID (universally unique identifier) file, which holds the download information for a specific file.
• TCGA Benchmark cell line: HCC1143 tumor 50x
$ curl "https://cghub.ucsc.edu/cghub/metadata/analysisAttributes?analysis_id=ad3d4757-f358-40a3-9d92-742463a95e88" \
    -o uuid.txt
$ more uuid.txt
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<center_name>UCSC</center_name>
<study>TCGA_MUT_BENCHMARK_4</study>
<files>
<file>
<filename>G15511.HCC1143.1.bam</filename>
<filesize>255795959440</filesize>
</file>
• Download the data with gtdownload.
$ cd
$ gtdownload -c cghub_public.key -vv -d uuid.txt
1.2 Preparing the Sample Data Set
• Extract part of a BAM file, then sort and index it.
Extraction by chromosome (-b: output in BAM format)
$ cd
$ samtools view -b in.bam 1 > chr1.bam
$ samtools sort chr1.bam chr1_sorted
$ samtools index chr1_sorted.bam
• Extracting specific regions (using the region list in chr17.bed)
$ cd
$ cat chr17.bed
17:5967-6207
17:11197-11389
17:11806-12018
17:13897-14017
17:22307-22427
17:30843-30963
17:31151-31279
17:63618-63738
17:65398-65638
17:69410-69530
17:96838-97108
17:131511-131661
17:169155-169395
17:170984-171254
17:177205-177355
17:260100-260308
17:262897-263257
17:263317-263947
$ cat chr17.bed | xargs samtools view -b in.bam \
    > exome.bam
$ samtools sort exome.bam exome_sorted
$ samtools index exome_sorted.bam
1.3 Checking the extracted regions
• Write the position of each read in BED format. The file can simply be added as a custom track in the UCSC Genome Browser to inspect the aligned reads.
$ cd
$ bamToBed -i exome_sorted.bam > cov_1.bed
• Write the coverage of the BAM file to a BED file; the read-depth information can be used to draw a histogram.
$ cd
$ samtools view -b exome_sorted.bam | \
    genomeCoverageBed -ibam stdin > cov_2.bed
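The coverage histogram can also be summarised as a single number before plotting. The following is a minimal sketch, assuming the default histogram layout written by genomeCoverageBed (chromosome or "genome", depth, number of bases at that depth, chromosome size, fraction). The function name mean_depth and the demo file are hypothetical and not part of the original workflow.

```shell
# Hypothetical helper: mean depth from a genomeCoverageBed histogram.
# Assumed columns: 1=chrom (or "genome"), 2=depth, 3=bases at that depth,
#                  4=chrom size, 5=fraction of bases at that depth.
mean_depth() {
    awk -F'\t' '$1 == "genome" { sum += $2 * $3; n += $3 }
                END { if (n > 0) printf "%.2f\n", sum / n; else print "NA" }' "$1"
}

# Tiny made-up histogram, only to show the expected shape of the input.
demo_cov=$(mktemp)
printf 'genome\t0\t50\t100\t0.5\ngenome\t10\t30\t100\t0.3\ngenome\t20\t20\t100\t0.2\n' > "$demo_cov"
mean_depth "$demo_cov"
```

On real data the call would be mean_depth cov_2.bed. Only the genome-wide summary rows are used, so per-chromosome rows in the same file are ignored.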
1.4 Checking the data for the exercises
• List of samples, programs, and reference data
$ cd /somatic_bench
$ pwd
/somatic_bench
$ ls -al
total 176
drwxr-xr-x 7 root root 4096 Jan 21 15:25 .
drwxr-xr-x 25 root root 4096 Jan 20 08:53 ..
drwxr-xr-x 9 root root 4096 Jan 21 08:15 app
drwxr-xr-x 2 root root 4096 Jan 21 14:38 bam
drwxr-xr-x 2 root root 4096 Jan 19 11:43 reference
drwxr-xr-x 2 root root 4096 Jan 21 15:24 script
drwxr-xr-x 2 root root 151552 Jan 21 12:59 tmp
$ more /somatic_bench/script/somatic_call_bench.sh
input_bam1="/somatic_bench/bam/hcc1143.ccle.n40t60.sorted.bam"
input_bam2="/somatic_bench/bam/hcc1143.ccle.b.sorted.bam"
gatk_b37="/somatic_bench/reference/human_g1k_v37_decoy.fasta"
temp_dir="/somatic_bench/tmp/"
$ cd
$ ln -s /somatic_bench/bam/hcc1143.ccle.n40t60.sorted.bam tumor.bam
$ ln -s /somatic_bench/bam/hcc1143.ccle.b.sorted.bam normal.bam
1.5 Summary
• Programs: wget, curl, gtdownload, samtools, bedtools (bamToBed, genomeCoverageBed)
• Outputs: a .bam containing only the regions of interest, and a .bed showing the coverage of that .bam
2 Somatic Mutation Prediction
Using SomaticSniper, VarScan2, and MuTect, we look for somatic mutations in the sample data set (tumor and matched normal BAM files).
• Analysis commands: https://gist.github.com/hongiiv/06611f189f4c8158edb0
• SAMtools: v0.1.19
• GATK: v2.8.1
• MuTect: v1.1.4
• SomaticSniper: v1.0.4
• Strelka: v1.0.14
• Virmid: v1.1.1
2.1 Running SomaticSniper and applying the preliminary filter (164 s)
SomaticSniper was developed in 2011 by Li Ding at Washington University, who also made VarScan2. It uses Bayesian probability and posterior filtering. Its main strength is high computational efficiency.
• -J: joint genotyping mode with default prior probability of a somatic mutation (0.01)
• -n, -t: normal/tumor sample id (for VCF header)
• -F: output format (classic, vcf, bed)
• -f: path to the ref.fasta file
$ cd
$ bam-somaticsniper \
    -J \
    -F vcf \
    -n HCC1143_Normal \
    -t HCC1143_Tumor \
    -f /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    tumor.bam normal.bam \
    HCC1143_somaticsniper.vcf
• (Filter options) Reads with a mapping quality of 0 were filtered prior to somatic mutation identification. Predictions with a 'somatic score' of 40 or greater were considered for subsequent downstream validation and analysis steps.
• GATK's SelectVariants can be used to extract only the variants of interest.
• The SSC (somatic score) and MQ (mapping quality) values in the FORMAT field of the VCF file are used.
$ cd
$ ln -s /somatic_bench/app/GenomeAnalysisTK-2.8-1/GenomeAnalysisTK.jar ./
$ update-alternatives --config java
There are 2 choices for the alternative java (providing /usr/bin/java).
Selection Path Priority
------------------------------------------------------------
0 /usr/lib/jvm/java-7-oracle/jre/bin/java 2
1 /usr/lib/jvm/java-6-oracle/jre/bin/java 1
* 2 /usr/lib/jvm/java-7-oracle/jre/bin/java 2
Press enter to keep the current choice [*], or type selection number: 2
update-alternatives: using /usr/lib/jvm/java-7-oracle/jre/bin/java
$ java -version
java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)
$ java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_somaticsniper.vcf \
    -o HCC1143_somaticsniper_filter.vcf \
    -sn HCC1143_Tumor -sn HCC1143_Normal \
    -select 'vc.getGenotype("HCC1143_Tumor").getExtendedAttribute("SSC") >= 40 && (vc.getGenotype("HCC1143_Tumor").getExtendedAttribute("MQ") > 0 || vc.getGenotype("HCC1143_Normal").getExtendedAttribute("MQ") > 0)'
• Compare the number of mutations before and after filtering.
$ cd
$ grep -v "#" HCC1143_somaticsniper.vcf | wc -l
583
$ grep -v "#" HCC1143_somaticsniper_filter.vcf | wc -l
161
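The counting idiom above drops every line that contains a "#" anywhere, not only header lines. A slightly stricter variant, sketched below under the assumption that VCF header lines are exactly those starting with "#", anchors the pattern and lets grep do the counting. The function name count_vcf_records and the demo file are hypothetical.

```shell
# Hypothetical helper: count VCF records, treating only lines that START with '#' as header.
count_vcf_records() {
    # -v inverts the match, -c prints the number of selected lines.
    grep -vc '^#' "$1"
}

# Made-up two-record VCF whose second record carries a '#' in its ID column.
demo_vcf=$(mktemp)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\n17\t100\t.\tA\tG\n17\t200\tid#2\tC\tT\n' > "$demo_vcf"
count_vcf_records "$demo_vcf"        # counts both records
grep -v "#" "$demo_vcf" | wc -l      # misses the record whose ID contains '#'
```

For the VCF files in this exercise both forms are expected to agree unless a data line happens to contain a "#".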
2.2 Running VarScan2 and applying the preliminary filter (10 min)
VarScan2 was developed by Li Ding at Washington University in 2012, one year after SomaticSniper. Unlike the other tools, it uses Fisher's exact test together with filtering and FDR correction. Its main strength is sensitive detection of high-quality sSNVs. Also unlike the other tools, it takes pileup or mpileup files as input rather than .bam files.
• Use samtools mpileup to generate pileup/mpileup output for the normal and tumor samples.
• At the mpileup step, filter with the options -q 1 (skip alignments with mapQ smaller than INT) and -B (disable BAQ computation).
• When VarScan is given input in mpileup format [1], pass the '--mpileup 1' option.
$ cd
$ samtools mpileup \
    -f /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    -q 1 -B normal.bam > HCC1143_n.pileup
$ samtools mpileup \
    -f /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    -q 1 -B tumor.bam > HCC1143_t.pileup
$ ln -s /somatic_bench/app/VarScan/VarScan.v2.3.3.jar ./
$ java -jar VarScan.v2.3.3.jar \
    somatic HCC1143_n.pileup HCC1143_t.pileup \
    HCC1143_varscan \
    --output-vcf 1
14617150 positions in tumor
14616970 positions shared in normal
13721478 had sufficient coverage for comparison
13700958 were called Reference
0 were mixed SNP-indel calls and filtered
18427 were called Germline
1562 were called LOH
450 were called Somatic
81 were called Unknown
0 were called Variant
[1] Older documentation describes the use of samtools pileup, but with later samtools updates pileup was removed and replaced by mpileup. mpileup can still produce a pileup for a single sample. VarScan also supports an mpileup file that contains both the normal and the tumor sample.
• VarScan2 writes its INDEL and SNP results as VCF files, as shown below (HCC1143_varscan.indel.vcf, HCC1143_varscan.snp.vcf).
drwxr-xr-x 2 root root 4096 Jan 30 09:52 ./
drwxr-xr-x 5 root root 8192 Jan 30 09:35 ../
-rw-r--r-- 1 root root 402354 Jan 30 09:47 HCC1143_varscan.indel.vcf
-rw-r--r-- 1 root root 2691462 Jan 30 09:47 HCC1143_varscan.snp.vcf
• Of the VarScan2 results, HCC1143_varscan.snp.vcf is filtered with processSomatic and somaticFilter.
• processSomatic: separates high-confidence [2] and low-confidence somatic mutations.
• somaticFilter: lets you apply filter options of your choice, such as --min-coverage, --p-value, and --indel-file.
$ cd
$ java -jar VarScan.v2.3.3.jar processSomatic -help
USAGE: java -jar VarScan.jar process [status-file] OPTIONS
status-file - The VarScan output file for SNPs or Indels
OPTIONS
--min-tumor-freq - Minimum variant allele frequency in tumor [0.10]
--max-normal-freq - Maximum variant allele frequency in normal [0.05]
--p-value - P-value for high-confidence calling [0.07]
$ java -jar VarScan.v2.3.3.jar processSomatic HCC1143_varscan.snp.vcf
Reading input from HCC1143_varscan.snp.vcf
Opening output files:
17914 VarScan calls processed
382 were Somatic (102 high confidence)
16048 were Germline (15431 high confidence)
1451 were LOH (1447 high confidence)
• processSomatic writes result files for the Germline, LOH, and Somatic calls, covering both the high-confidence and the low-confidence lists.
$ ls
-rw-r--r-- 1 2413169 Jan 30 09:52 HCC1143_varscan.snp.vcf.Germline
-rw-r--r-- 1 2320566 Jan 30 09:52 HCC1143_varscan.snp.vcf.Germline.hc
-rw-r--r-- 1 216574 Jan 30 09:52 HCC1143_varscan.snp.vcf.LOH
-rw-r--r-- 1 215997 Jan 30 09:52 HCC1143_varscan.snp.vcf.LOH.hc
-rw-r--r-- 1 59990 Jan 30 09:52 HCC1143_varscan.snp.vcf.Somatic
-rw-r--r-- 1 17055 Jan 30 09:52 HCC1143_varscan.snp.vcf.Somatic.hc
• In the VCF produced by VarScan2, multiple ALT alleles are written in the form 'G/T', which causes errors in later analysis steps. They are therefore rewritten in the standard 'G,T' form.
[2] Minimum variant allele frequency of 0.1 in the tumor; maximum variant allele frequency of 0.05 in the normal.
$ cd
$ perl -pe 's/\tA\//\tA,/' HCC1143_varscan.snp.vcf.Somatic.hc | \
    perl -pe 's/\tT\//\tT,/' | \
    perl -pe 's/\tG\//\tG,/' | \
    perl -pe 's/\tC\//\tC,/' > HCC1143_varscan_filter.vcf
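The same rewrite can be restricted to the ALT column so that no other field is touched. This is a minimal sketch, assuming a tab-separated VCF with ALT in column 5. The function name fix_alt_separator and the demo line are hypothetical, and it is an alternative to the perl pipeline above rather than the original method.

```shell
# Hypothetical helper: replace '/' with ',' in the ALT column (column 5) only.
fix_alt_separator() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { print; next }             # header lines pass through unchanged
         { gsub("/", ",", $5); print }' "$1"
}

# Made-up record with a VarScan-style 'G/T' ALT value and a '/' elsewhere.
demo_in=$(mktemp)
printf '##source=a/b\n17\t100\t.\tA\tG/T\t.\tPASS\tX=1/2\n' > "$demo_in"
fix_alt_separator "$demo_in"
```

On the real data this would be called as fix_alt_separator HCC1143_varscan.snp.vcf.Somatic.hc > HCC1143_varscan_filter.vcf. Unlike the perl version it also handles ALT values with more than two alleles.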
• Number of mutations after filtering
$ cd
$ grep -v "#" HCC1143_varscan_filter.vcf | wc -l
102
2.3 Running MuTect and applying the preliminary filter (18 min)
MuTect is a tool developed at the Broad Institute that applies Bayesian probability with pre- and post-filtering; in particular, it performs sensitive detection of sSNVs at low allelic fractions.
• MuTect runs only on Java 1.6, so check the current Java version and, if necessary, switch versions with update-alternatives.
$ cd
$ ln -s /somatic_bench/app/mutect/muTect-1.1.4.jar ./
$ samtools index normal.bam
$ samtools index tumor.bam
$ cp /somatic_bench/reference/ccle.gatk.bed ./
$ update-alternatives --config java
There are 2 choices for the alternative java (providing /usr/bin/java).
Selection Path Priority
------------------------------------------------------------
0 /usr/lib/jvm/java-7-oracle/jre/bin/java 2
1 /usr/lib/jvm/java-6-oracle/jre/bin/java 1
* 2 /usr/lib/jvm/java-7-oracle/jre/bin/java 2
Press enter to keep the current choice [*], or type selection number: 1
update-alternatives: using /usr/lib/jvm/java-6-oracle/jre/bin/java
$ java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)
$ java -jar muTect-1.1.4.jar --analysis_type MuTect \
    --reference_sequence /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --cosmic /somatic_bench/reference/b37_cosmic_v54_120711.vcf \
    --dbsnp /somatic_bench/reference/dbsnp_132_b37.leftAligned.vcf \
    --input_file:normal normal.bam \
    --input_file:tumor tumor.bam \
    --out HCC1143_mutect.out \
    --vcf HCC1143_mutect.vcf \
    --coverage_file HCC1143.mutect.cov.wig.txt \
    --normal_sample_name HCC1143_Normal \
    --tumor_sample_name HCC1143_Tumor \
    -L ccle.gatk.bed
• (Filter options) Predictions not labeled as 'REJECT' were accepted as confident somatic mutation predictions and carried into the subsequent downstream validation and analysis steps.
• The GATK used for filtering requires Java 1.7, so switch the Java version with update-alternatives.
• Use GATK's SelectVariants to keep only the variants whose FILTER field in the VCF is PASS (that is, excluding REJECT).
$ cd
$ update-alternatives --config java
There are 2 choices for the alternative java (providing /usr/bin/java).
Selection Path Priority
------------------------------------------------------------
0 /usr/lib/jvm/java-7-oracle/jre/bin/java 2
1 /usr/lib/jvm/java-6-oracle/jre/bin/java 1
* 2 /usr/lib/jvm/java-7-oracle/jre/bin/java 2
Press enter to keep the current choice [*], or type selection number: 2
update-alternatives: using /usr/lib/jvm/java-7-oracle/jre/bin/java
$ java -version
java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)
$ java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_mutect.vcf \
    -o HCC1143_mutect_filter.vcf \
    -sn HCC1143_Tumor -sn HCC1143_Normal \
    -select 'vc.isNotFiltered()'
• Alternatively, the same PASS-only selection (excluding REJECT) can be made with the --excludeFiltered option of SelectVariants.
$ cd
$ java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_mutect.vcf \
    -o HCC1143_mutect_filter.vcf \
    -sn HCC1143_Tumor -sn HCC1143_Normal \
    --excludeFiltered
• Number of mutations after filtering
$ cd
$ grep -v "#" HCC1143_mutect_filter.vcf | wc -l
109
2.4 Summary
• Programs: VarScan2, SomaticSniper, MuTect, GATK
• Outputs: the somatic mutations remaining after each tool's filter (161, 102, 112)
3 Obtaining Full Consensus / Partial Consensus sSNVs
Find the full consensus calls of the three SNV detection tools SomaticSniper, VarScan2, and MuTect. First, remove multi-allelic sites and indels.
3.1 Extracting bi-allelic SNPs only
• From the pre-filtered results, remove multi-allelic sites and extract only SNPs.
• With GATK's SelectVariants, set -selectType to SNP (possible values: INDEL, SNP, MIXED, MNP, SYMBOLIC, NO_VARIATION) and -restrictAllelesTo to BIALLELIC (MULTIALLELIC or BIALLELIC).
$ cd
$ java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_mutect_filter.vcf \
    -o HCC1143_mutect_1.vcf \
    -selectType SNP \
    -restrictAllelesTo BIALLELIC
$ java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_somaticsniper_filter.vcf \
    -o HCC1143_somaticsniper_1.vcf \
    -selectType SNP \
    -restrictAllelesTo BIALLELIC
$ java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_varscan_filter.vcf \
    -o HCC1143_varscan_1.vcf \
    -selectType SNP \
    -restrictAllelesTo BIALLELIC
3.2 Obtaining Full Consensus / Partial Consensus
• Compute the partial consensus sets (SomaticSniper/MuTect, MuTect/VarScan2, VarScan2/SomaticSniper) and the full consensus across all three somatic callers.
$ cd
$ java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_somaticsniper_1.vcf \
    --concordance HCC1143_mutect_1.vcf \
    -o HCC1143_SM.vcf
$ java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_mutect_1.vcf \
    --concordance HCC1143_varscan_1.vcf \
    -o HCC1143_MV.vcf
$ java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_varscan_1.vcf \
    --concordance HCC1143_somaticsniper_1.vcf \
    -o HCC1143_VS.vcf
$ java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_SM.vcf \
    --concordance HCC1143_varscan_1.vcf \
    -o HCC1143_SMV.vcf
3.3 Counting Full Consensus / Partial Consensus calls
• Count the full consensus and partial consensus calls.
$ cd
$ grep -v "#" HCC1143_SM.vcf |wc -l
45
$ grep -v "#" HCC1143_MV.vcf |wc -l
38
$ grep -v "#" HCC1143_VS.vcf |wc -l
42
$ grep -v "#" HCC1143_SMV.vcf |wc -l
32
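As a rough cross-check of these counts outside GATK, two call sets can be intersected on a simple site key. The sketch below assumes that two calls are "the same" when CHROM, POS, REF, and ALT all match, which is a simplification and not a re-implementation of what --concordance does. The function names vcf_keys and count_shared are hypothetical.

```shell
# Hypothetical helper: one CHROM:POS:REF:ALT key per record, sorted and de-duplicated.
vcf_keys() {
    awk -F'\t' '!/^#/ { print $1 ":" $2 ":" $4 ":" $5 }' "$1" | LC_ALL=C sort -u
}

# Hypothetical helper: number of keys present in both VCF files.
count_shared() (
    a=$(mktemp) && b=$(mktemp) || exit 1
    vcf_keys "$1" > "$a"
    vcf_keys "$2" > "$b"
    # comm -12 prints only the lines common to both sorted inputs.
    LC_ALL=C comm -12 "$a" "$b" | wc -l | tr -d ' '
    rm -f "$a" "$b"
)

# Made-up call sets sharing two of three sites.
x=$(mktemp); y=$(mktemp)
printf '#h\n17\t100\t.\tA\tG\n17\t200\t.\tC\tT\n17\t300\t.\tG\tA\n' > "$x"
printf '#h\n17\t200\t.\tC\tT\n17\t300\t.\tG\tC\n17\t100\t.\tA\tG\n' > "$y"
count_shared "$x" "$y"
```

With the files from this section the call would be, for example, count_shared HCC1143_somaticsniper_1.vcf HCC1143_mutect_1.vcf. A result that differs from the GATK count would point to sites that agree in position but not in allele, or the reverse.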
3.4 Summary
• Programs: GATK
• Outputs: consensus / partial consensus data
4 Applying an Additional Filter
GATK UnifiedGenotyper can be used to increase specificity.
4.1 Calling normal and tumor variants with UnifiedGenotyper (8 min)
• Call SNPs in the normal and tumor samples with GATK UnifiedGenotyper.
$ cd
$ java -jar GenomeAnalysisTK.jar \
    -T UnifiedGenotyper \
    -o HCC1143_gatk.tumor.vcf \
    -I tumor.bam \
    --genotype_likelihoods_model SNP \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    -L ccle.gatk.bed
$ java -jar GenomeAnalysisTK.jar \
    -T UnifiedGenotyper \
    -o HCC1143_gatk.normal.vcf \
    -I normal.bam \
    --genotype_likelihoods_model SNP \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    -L ccle.gatk.bed
4.2 Filtering SNVs - full consensus (can be skipped)
• Using the normal and tumor variants called by GATK UnifiedGenotyper, keep only SNVs predicted in the tumor but not in the germline.
$ cd
$ java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_SMV.vcf \
    --discordance HCC1143_gatk.normal.vcf \
    -o HCC1143_SMV_discordance_normal.vcf
$ java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_SMV_discordance_normal.vcf \
    --concordance HCC1143_gatk.tumor.vcf \
    -o HCC1143_final_filter_concordance.vcf
4.3 Filtering SNVs - partial consensus (SomaticSniper/MuTect)
$ cd
$ java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_SM.vcf \
    --discordance HCC1143_gatk.normal.vcf \
    -o HCC1143_SM_discordance_normal.vcf
$ java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_SM_discordance_normal.vcf \
    --concordance HCC1143_gatk.tumor.vcf \
    -o HCC1143_SM_final_filter_concordance.vcf
4.4 Counting Full Consensus / Partial Consensus calls after the GATK filter
• Count the consensus and partial consensus calls that remain after the GATK filter.
$ cd
$ grep -v "#" HCC1143_final_filter_concordance.vcf | wc -l
32
$ grep -v "#" HCC1143_SM_final_filter_concordance.vcf | wc -l
45
4.5 Summary
• Programs: GATK
• Outputs: consensus / partial consensus data after the GATK filter
5 Validation
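Using the list of variants reported for the HCC1143 sample in COSMIC and CCLE, we check how well our calls agree with it. Use the validation.list file stored on the server, or download it (https://gist.github.com/hongiiv/42194181ce6402d8b629).
5.1 Preparing the COSMIC and CCLE data
• Copy the list of variants for the HCC1143 sample from COSMIC and CCLE (103 in total).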
$ cd
$ cp /somatic_bench/reference/validation.list ./
$ cat validation.list | wc -l
103
5.2 Running the validation - consensus / partial consensus
• Check how many of the final filtered consensus and partial consensus (SomaticSniper/MuTect) calls match the validation list.
$ cd
$ java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_final_filter_concordance.vcf \
    -o all.val.filter.vcf \
    -L validation.list
$ java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_SM_final_filter_concordance.vcf \
    -o sm.val.filter.vcf \
    -L validation.list
$ grep -v "#" all.val.filter.vcf | wc -l
6
$ grep -v "#" sm.val.filter.vcf | wc -l
9
• In addition, check how many of the consensus variants matched before the GATK filter was applied.
$ cd
$ java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_SMV.vcf \
    -o all.val.vcf \
    -L validation.list
$ java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_SM.vcf \
    -o sm.val.vcf \
    -L validation.list
$ grep -v "#" all.val.vcf | wc -l
6
$ grep -v "#" sm.val.vcf | wc -l
9
• consensus (calls / matches in validation.list): before GATK filter (32/6), after GATK filter (32/6)
• partial consensus-SM (calls / matches in validation.list): before GATK filter (45/9), after GATK filter (45/9)
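The pairs above can be turned into percentages with a one-line helper. This is a small worked example, assuming each pair reads as (number of calls / number of calls found in validation.list) and that validation.list holds 103 variants as counted earlier. The function name percent is hypothetical.

```shell
# Hypothetical helper: percentage of $1 relative to $2, two decimals.
percent() {
    awk -v a="$1" -v b="$2" 'BEGIN { if (b > 0) printf "%.2f\n", 100 * a / b; else print "NA" }'
}

percent 6 32     # share of full consensus calls found in validation.list
percent 9 45     # share of SomaticSniper/MuTect consensus calls found in validation.list
percent 6 103    # share of the 103 listed variants recovered by the full consensus
percent 9 103    # share of the 103 listed variants recovered by the partial consensus
```

The first two values describe how many of the calls are confirmed by the list; the last two describe how much of the list is recovered. Both readings depend on how completely validation.list covers the targeted regions, which the exercise does not state.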
5.3 Summary
• Programs: GATK
• Outputs: the number of final consensus / partial consensus calls that match COSMIC and CCLE
6 Other Somatic Mutation Callers - Strelka, Virmid
6.1 Strelka (1 min 38 s)
Strelka is a somatic mutation caller that uses Bayesian probability with posterior filtering, created by Illumina in 2012. It supports not only Illumina's own aligners, Isaac and ELAND, but also BWA. It is run a little differently from typical programs (Isaac, also developed under Illumina's lead, is run in a similar way), because it relies on make, a tool driven by a file called a Makefile, to manage a job efficiently and process it in parallel.
• To use Strelka you need a file that holds its options; default option files are provided for three aligners: bwa, eland, and isaac.
• For exome or targeted sequencing, change the default option to isSkipDepthFilters = 1.
$ ll /somatic_bench/app/strelka-1.0.14/etc/
total 20
drwxrwxr-x 2 viz viz 4096 Jul 10 2014 ./
drwxr-xr-x 7 root root 4096 Jan 30 11:06 ../
-rw-rw-r-- 1 viz viz 3658 Jul 10 2014 strelka_config_bwa_default.ini
-rw-rw-r-- 1 viz viz 3683 Jul 10 2014 strelka_config_eland_default.ini
-rw-rw-r-- 1 viz viz 3821 Jul 10 2014 strelka_config_isaac_default.ini
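• Set variables for the directory where Strelka is installed and the directory where the analysis results are stored.
• Copy the default option file and generate the analysis commands with configureStrelkaWorkflow.pl.
• Run the generated analysis commands with make, using the -j option to set the number of threads (CPUs) used for the analysis.
• Results are written in VCF format separately for INDELs and SNVs; four result files are produced, the passed calls and the raw somatic calls for each.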
$ STRELKA_INSTALL_DIR=/somatic_bench/app/strelka-1.0.14/
$ echo $STRELKA_INSTALL_DIR
/somatic_bench/app/strelka-1.0.14/
$ WORK_DIR=/root/myWork
$ cp $STRELKA_INSTALL_DIR/etc/strelka_config_isaac_default.ini config.ini
$ $STRELKA_INSTALL_DIR/bin/configureStrelkaWorkflow.pl \
    --normal=/root/normal.bam \
    --tumor=/root/tumor.bam \
    --ref=/somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --config=config.ini --output-dir=./myAnalysis
$ cd ./myAnalysis
$ make -j 8
$ ll myAnalysis/results/
total 88
drwxr-xr-x 2 root root 4096 Jan 30 11:39 ./
drwxr-xr-x 5 root root 4096 Jan 30 11:37 ../
-rw-r--r-- 1 root root 13452 Jan 30 11:37 all.somatic.indels.vcf
-rw-r--r-- 1 root root 36736 Jan 30 11:37 all.somatic.snvs.vcf
-rw-r--r-- 1 root root 7098 Jan 30 11:37 passed.somatic.indels.vcf
-rw-r--r-- 1 root root 16070 Jan 30 11:37 passed.somatic.snvs.vcf
• Check the final number of passed somatic SNVs.
$ cd myAnalysis/results/
$ grep -v "#" passed.somatic.snvs.vcf|wc -l
62
6.2 Virmid (33 min)
Virmid is software created in 2013 by Professor Sangwoo Kim of Yonsei University. Through sampling, it estimates and applies the proportion of normal sample within the tumor sample (α).
• Check the final number of passed somatic SNVs.
$ java -jar /somatic_bench/app/Virmid-1.1.1/Virmid.jar \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    -D /root/tumor.bam \
    -N /root/normal.bam \
    -t 8 \
    -w /root/virmid
$ cd /root/virmid
$ ls -al
total 98024
drwxr-xr-x 2 root 4096 Jan 30 16:00 ./
drwxr-xr-x 8 root 8192 Jan 30 15:32 ../
-rw-r--r-- 1 root 1252161 Jan 30 16:03 tumor.bam.virmid.germ.all.vcf
-rw-r--r-- 1 root 955213 Jan 30 16:03 tumor.bam.virmid.germ.passed.vcf
-rw-r--r-- 1 root 262 Jan 30 16:00 tumor.bam.virmid.gm
-rw-r--r-- 1 root 36564 Jan 30 16:03 tumor.bam.virmid.loh.all.vcf
-rw-r--r-- 1 root 2233 Jan 30 16:01 tumor.bam.virmid.loh.passed.vcf
-rw-r--r-- 1 root 992 Jan 30 16:03 tumor.bam.virmid.report
-rw-r--r-- 1 root 1364144 Jan 30 15:29 tumor.bam.virmid.sample.control.bai
-rw-r--r-- 1 root 53107377 Jan 30 15:29 tumor.bam.virmid.sample.control.bam
-rw-r--r-- 1 root 1364104 Jan 30 15:29 tumor.bam.virmid.sample.disease.bai
-rw-r--r-- 1 root 41746178 Jan 30 15:29 tumor.bam.virmid.sample.disease.bam
-rw-r--r-- 1 root 84053 Jan 30 16:03 tumor.bam.virmid.som.all.vcf
-rw-r--r-- 1 root 6883 Jan 30 16:03 tumor.bam.virmid.som.passed.vcf
$ grep -v "#" tumor.bam.virmid.som.passed.vcf|wc -l
78
7 Linux for Genome Research
7.1 Linux server for the exercises
• Server address: xxx.xxx.xxx.xxx
• User ID: edu01, edu02
• Password: kogo2015
• Web access: http://xxx.xxx.xxx.xxx:8787
7.2 Connecting to the exercise Linux server - Windows users
• Go to http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
• Download putty.exe for Intel x86.
• Host Name: xxx.xxx.xxx.xxx / Port: xx
• If a Security Alert window appears, choose 'Yes (Y)'.
• Login ID: use the ID and password assigned to you.
7.3 Connecting to the exercise Linux server - Mac or Linux users
• On a Mac (OS X), launch 'Applications, Utilities, Terminal app'. On Linux, launch a terminal from the menu or from your distribution's program menu.
$ ssh user_id@host_name
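$ ssh root@127.0.0.1
• Connect to the exercise Linux server with the ssh command. On the first connection, answer yes; a password prompt then appears, where you enter the password you were given.
7.4 Inspecting Linux system information
This document is based on 'Ubuntu', one of the Linux distributions [3]. Unless noted otherwise, everything applies to all kinds of Linux. Linux is an operating system that runs on a wide range of distributions and hardware. You need to know which environment your Linux runs in so that, when installing software, you can choose the build that suits your system.
• This is how to identify the Linux distribution you are currently using. Ubuntu is a freely distributed Linux operating system; the latest version at the time of writing is 14.04 LTS (Long Term Support) [4].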
$ cat /etc/issue.net
Ubuntu 12.04.1 LTS
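• Linux runs on many kinds of hardware, and software that supports Linux usually ships separate executables for each hardware platform. Once you know what hardware you are using, you can download the build that matches it. The hardware type of a Linux server can be identified with the '-m' (machine) option of uname. 'x86' denotes an Intel-based CPU, and '64' denotes 64-bit hardware [5].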
$ uname -m
x86_64
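[3] Linux distributions fall broadly into the Red Hat family and the Debian family, and each family contains a variety of distributions.
[4] Its code name is Trusty Tahr.
[5] This is often abbreviated further to x64.
• The kernel is the core of the Linux operating system; it carries out the user's commands on the actual hardware. Each distribution uses its own kernel version. The latest Linux kernel at the time of writing is 3.14.3, released on 6 May 2014. Linux distributions are built on kernels released in this way. The kernel version can be identified as follows.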
$ uname -r
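3.2.0-32-virtual
• The shell is the environment that reads Linux commands and runs them. 'PATH' is one of the environment variables, values that affect how processes behave, and export is the command that sets such a variable. When you type a command, the shell first searches the directories listed in PATH to see whether the command exists there and then runs it. So when you install your own software and run it on Linux, you must add its directory to PATH to be able to run it from anywhere; otherwise it can only be run from inside the directory where it was installed. The shell's environment variables can be inspected with the 'env' command, and PATH is set with 'export'.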
$ env | grep PATH
MANPATH=/usr/local/texlive/2013/texmf/doc/man:
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
INFOPATH=/usr/local/texlive/2013/texmf/doc/info:
$ export PATH=/BIO/app/bwa-0.7.5a/:$PATH
$ env | grep PATH
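The effect of PATH on command lookup can be seen with a throwaway tool. The sketch below is only an illustration: the directory, the script name demo_tool, and its output are made up, and it assumes that the temporary directory allows executing files.

```shell
# Create a made-up tool in a temporary directory.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from demo_tool"\n' > "$demo_dir/demo_tool"
chmod +x "$demo_dir/demo_tool"

# Before the directory is on PATH, the shell cannot find the command by name.
command -v demo_tool || echo "demo_tool: not found on PATH yet"

# After prepending the directory, the same name resolves to the new script.
export PATH="$demo_dir:$PATH"
command -v demo_tool
demo_tool
```

The export only lasts for the current shell session. To keep a directory on PATH across logins it is usually added to a startup file such as ~/.bashrc.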
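7.5 The Linux file system
A Linux partition divides one physical disk logically into several regions that are managed separately; a file system is created on a partition so that files and directories can be managed on it.
• A Linux system is shared by many users, and each user has a home directory as their own private area. Inside the home directory you can create and delete files. The 'cd' command moves to the home directory, and the 'pwd' command shows the path of the current directory.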
$ cd
$ pwd
/home/hongiiv
• Create a directory and move into it
$ cd
$ mkdir sample_data
$ ls -la
total 2203488
drwxr-xr-x 16 hongiiv hongiiv 4096 May 29 10:34 .
drwxr-xr-x 3 root root 4096 May 7 13:14 ..
-rw------- 1 hongiiv hongiiv 1908 May 10 11:59 .bash_history
-rw-r--r-- 1 hongiiv hongiiv 220 May 7 13:14 .bash_logout
-rw-r--r-- 1 hongiiv hongiiv 3763 May 10 17:06 .bashrc
drwxr-xr-x 2 root root 4096 May 29 10:34 sample_data
$ cd sample_data
$ pwd
/home/hongiiv/sample_data
• Deleting directories and files
$ cd
$ rm -rf sample_data
$ ls -la
total 2203488
drwxr-xr-x 16 hongiiv hongiiv 4096 May 29 10:34 .
drwxr-xr-x 3 root root 4096 May 7 13:14 ..
-rw------- 1 hongiiv hongiiv 1908 May 10 11:59 .bash_history
-rw-r--r-- 1 hongiiv hongiiv 220 May 7 13:14 .bash_logout
-rw-r--r-- 1 hongiiv hongiiv 3763 May 10 17:06 .bashrc
$
• Viewing the Linux file systems
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 19G 14G 4.8G 74% /
udev 3.9G 4.0K 3.9G 1% /dev
tmpfs 1.6G 188K 1.6G 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 3.9G 0 3.9G 0% /run/shm
/dev/xvdb1 79G 38G 38G 50% /home/hongiiv/test
• Viewing physical hard disk partition information: the 21.5 GB physical disk /dev/xvda consists of two partitions, xvda1 and xvda2, whose types are Linux and Linux swap.
$ fdisk -l
Disk /dev/xvda: 21.5 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders, total 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00034212
Device Boot Start End Blocks Id System
/dev/xvda1 2048 40038399 20018176 83 Linux
/dev/xvda2 40038400 41940991 951296 82 Linux swap / Solaris
Disk /dev/xvdb: 300.6 GB, 300647710720 bytes
171 heads, 35 sectors/track, 98112 cylinders, total 587202560 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x3459a991
Device Boot Start End Blocks Id System
/dev/xvdb1 2048 587202559 293600256 8e Linux LVM
• Checking file system mount information
$ cat /etc/fstab
proc /proc proc nodev,noexec,nosuid 0 0
/dev/xvda1 / ext3 errors=remount-ro 0 1
/dev/xvda2 none swap sw 0 0
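7.6 Adding a Hard Disk in Linux
• After checking the newly added hard disk with fdisk, the disk is put to use in three steps: partitioning, creating a file system, and mounting. To make Linux recognize a USB storage device, only the mount step is needed.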
$ fdisk /dev/xvdb
$ mkfs.ext3 /dev/xvdb1
$ mkdir /new_hdd
$ mount /dev/xvdb1 /new_hdd
$ cd /new_hdd
$ df -h
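7.7 File-Related Commands
• touch - creates an empty file of size 0, or changes the timestamp of an existing file. It is used often when installing or testing bio-related software, so it is worth learning well.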
$ touch a
$ ls -al
-rw-r--r-- 1 root root 0 Jun 18 10:04 a
$ date
Wed Jun 18 10:05:10 KST 2014
$ touch -c a
$ ls -al
-rw-r--r-- 1 root root 0 Jun 18 10:05 a
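The two uses of touch described above can be tried safely in a scratch directory. The following is a minimal sketch, assuming GNU coreutils; the temporary directory and the file names a and b are illustrative only.

```shell
# Work in a throwaway directory so nothing in the home directory is changed.
workdir=$(mktemp -d)
cd "$workdir"

touch a                 # a does not exist yet, so an empty (0-byte) file is created
size_a=$(wc -c < a)     # size of a in bytes

touch -c b              # -c: only update timestamps; do not create b if it is missing
if [ -e b ]; then b_exists=yes; else b_exists=no; fi

echo "size of a: $size_a"
echo "b exists: $b_exists"
```

Because b was never created, 'touch -c' leaves the directory unchanged, which is why the session above only shows the time of the existing file a moving forward.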
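• cat - used to view the contents of a file or to write a simple script. The 'cat > test' command creates a file named test and writes whatever is typed into it. When the text is complete, press 'ctrl+D' to finish and return to the prompt.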
$ cat > test
hi there
my name is hong
$ cat test
hi there
my name is hong
$ ls -al
-rw-r--r-- 1 root root 25 Jun 18 10:09 test
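Inside a script nobody is there to press ctrl+D, so the same file is usually written with a here-document instead. This is a small sketch of that variant, not part of the original exercise; the temporary directory is an assumption, and the file name test is taken from the session above.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Everything between the two EOF markers is written to the file named test.
cat > test <<'EOF'
hi there
my name is hong
EOF

cat test                  # print the file back to the screen
bytes=$(wc -c < test)     # 9 + 16 bytes, counting the newline at the end of each line
echo "bytes: $bytes"
```

The byte count agrees with the size of 25 reported by 'ls -al' in the interactive session.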
• Counting the files in a specific directory
$ ls -l . | grep ^- | wc -l
50
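The pipeline above counts the lines of 'ls -l' that start with '-', which means regular files whose names do not begin with a dot. As a hedged comparison, the sketch below builds a small scratch directory (all names are made up) and contrasts that count with find, which also sees hidden files.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch one.txt two.txt .hidden
mkdir subdir

# Same idea as the pipeline in the text; grep -c counts matching lines directly.
# Plain "ls -l" does not list names that begin with a dot.
visible=$(ls -l . | grep -c '^-')

# find counts every regular file directly inside the directory, hidden ones included.
all_files=$(find . -maxdepth 1 -type f | wc -l)

echo "visible regular files: $visible"
echo "all regular files: $all_files"
```

Which number is wanted depends on whether dot files such as .bashrc should be part of the count.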
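• Printing a file while excluding the lines that begin with a particular character. In a file such as VCF, where the lines that begin with '#' are comments, this prints the actual list of variants without the comments. Conversely, only the comment lines can be printed.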
$ cd /BIO/data/gatk
$ grep -v "#" dbsnp_138.hg19.vcf | wc -l
8087914
$ grep -F "#" dbsnp_138.hg19.vcf | wc -l
165
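One detail worth knowing: the pattern "#" matches a '#' anywhere in a line, not only at the start. The sketch below uses a tiny invented VCF fragment (sample.vcf and its records are hypothetical) to show how anchoring the pattern with '^' restricts the match to lines that begin with '#'.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Two header lines and two records; the last record has a "#" inside its INFO column.
printf '##fileformat=VCFv4.1\n'                           >  sample.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'  >> sample.vcf
printf 'chrM\t64\trs1\tC\tT\t.\t.\tNOTE=ok\n'             >> sample.vcf
printf 'chrM\t72\trs2\tT\tC\t.\t.\tNOTE=see#2\n'          >> sample.vcf

loose=$(grep -v '#' sample.vcf | wc -l)       # drops every line that contains "#"
anchored=$(grep -v '^#' sample.vcf | wc -l)   # drops only lines that start with "#"
headers=$(grep -c '^#' sample.vcf)            # counts the header lines

echo "records (unanchored): $loose"
echo "records (anchored):   $anchored"
echo "header lines:         $headers"
```

For files where '#' never appears inside a record the two patterns give the same counts, so the anchored form is simply the safer habit.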
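• Printing only a specific column. The values in that column can be sorted in dictionary order with the '-d' option of sort, and the number of occurrences of each value can be counted with the '-c' option of uniq.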
$ grep -v "#" dbsnp_138.hg19.vcf | awk '{print $1}' | more
chrM
chrM
chrM
chrM
chrM
chrM
chrM
chrM
chrM
$ grep -v "#" dbsnp_138.hg19.vcf | awk '{print $1}' | sort -d
chr1
chr2
$ grep -v "#" dbsnp_138.hg19.vcf | awk '{print $1}' | uniq -c
475 chrM
4723878 chr1
3363561 chr2
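uniq only merges lines that are next to each other. That works for the dbSNP file above because its records are grouped by chromosome, but an unordered column needs a sort first. A minimal sketch with a made-up chromosome column:

```shell
# Made-up column in which chr1 appears in two separate runs.
chroms=$(printf 'chr1\nchr2\nchr1\nchr1\nchrM\n')

# Without sorting, uniq -c reports chr1 twice because the two runs are not adjacent.
unsorted_groups=$(printf '%s\n' "$chroms" | uniq -c | wc -l)

# Sorting first brings equal values together, giving one count per chromosome.
sorted_counts=$(printf '%s\n' "$chroms" | sort | uniq -c \
  | awk '{print $2 "=" $1}' | paste -sd, -)

echo "groups without sort: $unsorted_groups"
echo "counts with sort: $sorted_counts"
```

The awk and paste steps only reshape the output into a single line for display; 'sort | uniq -c' is the part that matters.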
$ grep -v "#" dbsnp_138.hg19.vcf | \
awk '{if ($1 == "chrM") printf "chrM is: %s\n", $2}'
chrM is: 16390
chrM is: 16391
chrM is: 16429
chrM is: 16445
chrM is: 16499
• Saving output that would be printed to the screen into a file
$ grep -v "#" dbsnp_138.hg19.vcf | \
awk '{if ($1 == "chrM") printf "chrM is: %s\n", $2}' > ~/chr_pos.txt
$ grep -v "#" dbsnp_138.hg19.vcf | \
awk '{if ($1 == "chr1") printf "chr1 is: %s\n", $2}' >> ~/chr_pos.txt
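The two commands above differ in their redirection operator: '>' starts the file from scratch, while '>>' adds to what is already there. A small sketch in a scratch directory (out.txt is an illustrative name) makes the difference visible:

```shell
workdir=$(mktemp -d)
cd "$workdir"

echo "first"  >  out.txt    # > creates the file, or empties it first if it already exists
echo "second" >> out.txt    # >> appends to the end of the file
lines_after_append=$(wc -l < out.txt)

echo "third"  >  out.txt    # > again: the two earlier lines are discarded
lines_after_overwrite=$(wc -l < out.txt)

echo "after append: $lines_after_append lines"
echo "after overwrite: $lines_after_overwrite lines"
```

Using '>' by mistake for the second command would silently replace the chrM positions with the chr1 positions.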
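7.8 Linux Network Information
• Information about the network interfaces: the inet addr of eth0 is the address at which the current Linux server can be reached from outside (see footnote 6).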
$ ifconfig
eth0      Link encap:Ethernet HWaddr 02:00:5b:73:00:33
          inet addr:172.27.252.234 Bcast:172.27.255.255
          inet6 addr: fe80::5bff:fe73:33/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:501386 errors:0 dropped:0 overruns:0 frame:0
          TX packets:346879 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:19357734604 (1 GB) TX bytes:2720265191 (2 GB)
          Interrupt:68
lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:4337 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4337 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2203478 (2.2 MB) TX bytes:2203478 (2.2 MB)
Footnote 6: The address of the Linux server, 172.27.252.234, is shown differently depending on each person's practice environment.
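When only the address is needed, it can be pulled out of this kind of output with awk. The sketch below parses a saved sample in the older 'inet addr:' layout shown above rather than a live interface; newer ifconfig versions print a different layout, so the pattern is an assumption tied to this format.

```shell
# Sample text in the layout shown above (not read from a live interface).
sample='eth0 Link encap:Ethernet HWaddr 02:00:5b:73:00:33
 inet addr:172.27.252.234 Bcast:172.27.255.255
 inet6 addr: fe80::5bff:fe73:33/64 Scope:Link
lo Link encap:Local Loopback
 inet addr:127.0.0.1 Mask:255.0.0.0'

# Lines that start without a space name an interface; remember the latest one.
# For eth0, take the "inet" line and strip the "addr:" prefix from its second field.
eth0_addr=$(printf '%s\n' "$sample" | awk '
  /^[^ ]/ { iface = $1 }
  iface == "eth0" && $1 == "inet" { sub(/^addr:/, "", $2); print $2 }
')

echo "eth0 address: $eth0_addr"
```

On a real machine the sample variable would be replaced by the output of ifconfig itself.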
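7.9 Understanding Compression in Linux
Linux supports a variety of compression formats, and software and data are usually distributed as compressed files.
• These are the ways to decompress the various compression formats used on Linux. The decompressed files contain a puzzle, and a prize will be given to the first person who solves it.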
$ cd
$ cp -R /BIO/data/compress ./compress
$ cd compress
$ gzip -d compress01.gz
$ tar xvfz compress02.tar.gz
$ unzip compress03.zip
$ bzip2 -d compress04.bz2
$ tar xvfz compress05.tar.gz
$ tar xvf compress06.tar.bz2
• gzip: Recommended for fast network connections
• bzip2: Recommended for slower network connections (smaller size but takes longer to compress)
• zip: Not recommended but is provided as an option for those who cannot open the above formats
• A way to preview the contents of large compressed bio data files without decompressing them first. It is useful for inspecting FASTQ files and similar data.
$ gzip -dc CEUTrio.HiSeq.WGS.b37.bestPractices.hg19.vcf.gz | more
$ gzip -dc CEUTrio.HiSeq.WGS.b37.bestPractices.hg19.tar.gz | tar -tvf -
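The same preview can be rehearsed on a small file. The sketch below compresses a made-up FASTQ record (reads.fastq is a hypothetical name), streams it back with 'gzip -dc', and confirms that no decompressed copy is written to disk.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Build a one-record FASTQ file and compress it.
printf '@read1\nACGT\n+\nIIII\n' > reads.fastq
gzip reads.fastq                        # leaves only reads.fastq.gz on disk

# -d decompresses and -c writes to standard output, so the .gz file is left untouched.
first_two=$(gzip -dc reads.fastq.gz | head -n 2)
echo "$first_two"

# Check that no decompressed copy appeared next to the archive.
if [ -e reads.fastq ]; then unpacked=yes; else unpacked=no; fi
echo "decompressed copy on disk: $unpacked"
```

For multi-gigabyte files this avoids both the wait and the disk space that a full decompression would need.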
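7.10 Installing Software on Linux
In general, there are three ways to install software on Linux. The first applies when a binary (executable) file is already provided in compressed form: simply decompress it and it can be used right away. The second uses the packages provided by the Linux distribution; on Ubuntu this is done with the package management program APT. The third is to install from source files.
7.10.1 Installing Software with APT
• Updating packages with APT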
$ apt-get update
$ apt-get install bwa
Reading package lists... Done
Building dependency tree
Reading state information... Done
Use 'apt-get autoremove' to remove them.
Suggested packages:
  samtools
The following NEW packages will be installed:
  bwa
0 upgraded, 1 newly installed, 0 to remove and 153 not upgraded.
Need to get 135 kB of archives.
After this operation, 286 kB of additional disk space will be used.
Fetched 135 kB in 3s (40.1 kB/s)
Selecting previously unselected package bwa.
(Reading database ...17 files and directories currently installed.)
Unpacking bwa (from .../archives/bwa_0.6.1-1_amd64.deb) ...
Processing triggers for man-db ...
Setting up bwa (0.6.1-1) ...
$ bwa
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.6.1-r104
Contact: Heng Li <lh3@sanger.ac.uk>
Usage: bwa <command> [options]
Command: index      index sequences in the FASTA format
         aln        gapped/ungapped alignment
         samse      generate alignment (single ended)
         sampe      generate alignment (paired ended)
         bwasw      BWA-SW for long queries
         fastmap    identify super-maximal exact matches
         fa2pac     convert FASTA to PAC format
         pac2bwt    generate BWT from PAC
         pac2bwtgen alternative algorithm for generating BWT
         bwtupdate  update .bwt to the new format
         bwt2sa     generate SA from BWT and Occ
         pac2cspac  convert PAC to color-space PAC
         stdsw      standard SW/NW alignment
• The list of packages that should be installed beforehand in order to install NGS-related software.
$ apt-get update -y
$ apt-get install gcc -y
$ apt-get install make -y
$ apt-get install zlib1g-dev -y
$ apt-get install libncurses5-dev -y
$ apt-get install g++ -y
$ apt-get install tcl tk -y
$ apt-get install tcl-dev -y
$ apt-get install unzip -y
$ apt-get install curl -y
$ apt-get install screen -y
$ apt-get install python-dev -y
$ apt-get install python-software-properties -y
$ add-apt-repository ppa:webupd8team/java
$ apt-get update -y
$ apt-get install oracle-java7-installer -y
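apt-get accepts several package names in one call, so the individual install lines can be collapsed into a single command. The sketch below only assembles and prints that command so it can be reviewed first; the package names are copied from the list above and may not all exist on newer Ubuntu releases.

```shell
# Package names taken from the list above.
packages="gcc make zlib1g-dev libncurses5-dev g++ tcl tk tcl-dev unzip curl screen python-dev python-software-properties"

# Build the command as text so it can be inspected before anything is installed.
install_cmd="apt-get install -y $packages"
echo "$install_cmd"

# To really install (needs root and network access), uncomment the next line:
# apt-get update -y && $install_cmd
```

The Java installer is left out here because it depends on the extra repository added with add-apt-repository.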
7.10.2 Installing Software by Compiling Source Code
• Installing from source
$ cd
$ cp /BIO/app/bwa-0.7.4.tar.bz2 ./
$ tar xvf bwa-0.7.4.tar.bz2
$ cd bwa-0.7.4
$ make
$ ./bwa
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.4-r385
Contact: Heng Li <lh3@sanger.ac.uk>
Usage: bwa <command> [options]
Command: index      index sequences in the FASTA format
         mem        BWA-MEM algorithm
         fastmap    identify super-maximal exact matches
         pemerge    merge overlapping paired ends (EXPERIMENTAL)
         aln        gapped/ungapped alignment
         samse      generate alignment (single ended)
         sampe      generate alignment (paired ended)
         bwasw      BWA-SW for long queries
         fa2pac     convert FASTA to PAC format
         pac2bwt    generate BWT from PAC
         pac2bwtgen alternative algorithm for generating BWT
         bwtupdate  update .bwt to the new format
         bwt2sa     generate SA from BWT and Occ
$ bwa
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.6.2-r126
Contact: Heng Li <lh3@sanger.ac.uk>
Usage: bwa <command> [options]
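The two version numbers differ because './bwa' runs the freshly compiled file in the current directory, while plain 'bwa' is looked up through the PATH variable and finds the copy installed system-wide. The sketch below reproduces this with two stand-in scripts; the name mytool and the directories are hypothetical, chosen so that no real installation is touched.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/system_bin" "$workdir/build"

# Two stand-in scripts with the same name that only print a version string.
printf '#!/bin/sh\necho "old version"\n' > "$workdir/system_bin/mytool"
printf '#!/bin/sh\necho "new version"\n' > "$workdir/build/mytool"
chmod +x "$workdir/system_bin/mytool" "$workdir/build/mytool"

cd "$workdir/build"
PATH="$workdir/system_bin:$PATH"

by_name=$(mytool)        # looked up through PATH: finds system_bin/mytool
by_path=$(./mytool)      # explicit path: runs the file in the current directory

# Putting the build directory first in PATH makes the plain name pick the new build.
PATH="$workdir/build:$PATH"
hash -r                  # make the shell forget the location it remembered earlier
by_name_after=$(mytool)

echo "by name: $by_name"
echo "by path: $by_path"
echo "by name after PATH change: $by_name_after"
```

For the compiled bwa the same effect could be had by adding the bwa-0.7.4 directory to the front of PATH, or by always calling it with its full path.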

The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 

Kürzlich hochgeladen (20)

%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
ManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide DeckManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide Deck
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 

Detecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach

  • 1. Detecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach
    CB Hong*, KJ Kim
    4-5 February 2015
    * KT GenomeCloud, hongiiv@gmail.com

    Contents
    1 Proof of data: TCGA Benchmark 4 Data Set  3
      1.1 Downloading TCGA Data with GeneTorrent  3
      1.2 Preparing the Sample Data Set  3
      1.3 Checking the Extracted Regions  4
      1.4 Checking the Data Used in the Practical  4
      1.5 Summary  5
    2 Somatic Mutation Prediction  6
      2.1 Running SomaticSniper and Applying the Pre-filter (164 s)  6
      2.2 Running VarScan2 and Applying the Pre-filter (10 min)  7
      2.3 Running MuTect and Applying the Pre-filter (18 min)  9
      2.4 Summary  10
    3 Obtaining Full Consensus / Partial Consensus sSNVs  11
      3.1 Extracting Only Bi-allelic SNPs  11
      3.2 Obtaining the Full Consensus / Partial Consensus  11
      3.3 Counting Full Consensus / Partial Consensus Calls  12
      3.4 Summary  12
    4 Applying Additional Filters  13
      4.1 Calling Normal and Tumor Variants with UnifiedGenotyper (8 min)  13
      4.2 Filtering SNVs - full consensus (optional)  13
      4.3 Filtering SNVs - partial consensus (SomaticSniper/MuTect)  13
      4.4 Counting Full Consensus / Partial Consensus Calls after the GATK Filter  14
      4.5 Summary  14
    5 Validation  15
      5.1 Preparing COSMIC and CCLE Data  15
      5.2 Running Validation - consensus / partial consensus  15
      5.3 Summary  16
    6 Other Somatic Mutation Callers - Strelka, Virmid  17
      6.1 Strelka (1 min 38 s)  17
      6.2 Virmid (33 min)  18
  • 2.
    7 Linux for Genome Research  19
      7.1 The Linux Server for the Practical  19
      7.2 Connecting to the Practical Server - Windows Users  19
      7.3 Connecting to the Practical Server - Mac or Linux Users  19
      7.4 Finding Linux System Information  19
      7.5 The Linux File System  20
      7.6 Adding a Hard Disk in Linux  22
      7.7 File-related Commands  22
      7.8 Linux Network Information  23
      7.9 Understanding Compression in Linux  24
      7.10 Installing Software on Linux  24
        7.10.1 Installing Software with APT  24
        7.10.2 Installing Software by Compiling Source Code  26
  • 3.
    1 TCGA Benchmark 4 Data Set
    This practical uses the TCGA mutation calling benchmark 4 datasets to learn how to find somatic mutations. The genome sequencing benchmark datasets were generated by artificially mixing a normal sample into a tumor sample at fixed ratios (5%-95%). Of these, we use n40t60 (60% tumor mixed with 40% normal) together with the corresponding normal sample. The data are in BAM format and can be downloaded from the TCGA Benchmark page.

    1.1 Downloading TCGA Data with GeneTorrent
    • Steps: install the download software, download the key and UUID files, download the samples.
    • First, download the public key for the TCGA Benchmark Data Set.
    • https://cghub.ucsc.edu/datasets/benchmark_download.html
        $ cd
        $ wget https://cghub.ucsc.edu/software/downloads/cghub_public.key
    • Fetch the UUID (universally unique identifier) file, which holds the download information for a specific file.
    • TCGA Benchmark cell line: HCC1143 tumor 50x
        $ curl 'https://cghub.ucsc.edu/cghub/metadata/analysisAttributes?analysis_id=ad3d4757-f358-40a3-9d92-742463a95e88' -o uuid.txt
        $ more uuid.txt
        <?xml version="1.0" encoding="utf-8" standalone="yes"?>
        <center_name>UCSC</center_name>
        <study>TCGA_MUT_BENCHMARK_4</study>
        <files>
          <file>
            <filename>G15511.HCC1143.1.bam</filename>
            <filesize>255795959440</filesize>
          </file>
    • Download the data with gtdownload.
        $ cd
        $ gtdownload -c cghub_public.key -vv -d uuid.txt

    1.2 Preparing the Sample Data Set
    • Extract part of the BAM, then sort and index it.
    • Extract by chromosome (-b: write the output in BAM format).
        $ cd
        $ samtools view -b in.bam 1 > chr1.bam
        $ samtools sort chr1.bam chr1_sorted
        $ samtools index chr1_sorted.bam
    • Extract specific regions (using a BED file).
  • 4.
        $ cd
        $ cat chr17.bed
        17:5967-6207
        17:11197-11389
        17:11806-12018
        17:13897-14017
        17:22307-22427
        17:30843-30963
        17:31151-31279
        17:63618-63738
        17:65398-65638
        17:69410-69530
        17:96838-97108
        17:131511-131661
        17:169155-169395
        17:170984-171254
        17:177205-177355
        17:260100-260308
        17:262897-263257
        17:263317-263947
        $ cat chr17.bed | xargs samtools view -b in.bam > exome.bam
        $ samtools sort exome.bam exome_sorted
        $ samtools index exome_sorted.bam

    1.3 Checking the Extracted Regions
    • Write the position of each read in BED format. The file can be added as a custom track in the UCSC Genome Browser to inspect the aligned reads.
        $ cd
        $ bamToBed -i exome_sorted.bam > cov_1.bed
    • Write the coverage of the BAM file to a BED file. The read-depth information can be used to draw a histogram.
        $ cd
        $ samtools view -b exome_sorted.bam | genomeCoverageBed -ibam stdin > cov_2.bed

    1.4 Checking the Data Used in the Practical
    • Samples, programs, and reference data.
        $ cd /somatic_bench
        $ pwd
        /somatic_bench
        $ ls -al
        total 176
        drwxr-xr-x  7 root root   4096 Jan 21 15:25 .
        drwxr-xr-x 25 root root   4096 Jan 20 08:53 ..
        drwxr-xr-x  9 root root   4096 Jan 21 08:15 app
        drwxr-xr-x  2 root root   4096 Jan 21 14:38 bam
        drwxr-xr-x  2 root root   4096 Jan 19 11:43 reference
        drwxr-xr-x  2 root root   4096 Jan 21 15:24 script
        drwxr-xr-x  2 root root 151552 Jan 21 12:59 tmp
  • 5.
        $ more /somatic_bench/script/somatic_call_bench.sh
        input_bam1="/somatic_bench/bam/hcc1143.ccle.n40t60.sorted.bam"
        input_bam2="/somatic_bench/bam/hcc1143.ccle.b.sorted.bam"
        gatk_b37="/somatic_bench/reference/human_g1k_v37_decoy.fasta"
        temp_dir="/somatic_bench/tmp/"
        $ cd
        $ ln -s /somatic_bench/bam/hcc1143.ccle.n40t60.sorted.bam tumor.bam
        $ ln -s /somatic_bench/bam/hcc1143.ccle.b.sorted.bam normal.bam

    1.5 Summary
    • Programs used: wget, curl, gtdownload, samtools, bedtools (bamToBed, genomeCoverageBed)
    • Outputs: a .bam file containing only the regions of interest, and a .bed file showing the coverage of that .bam file
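The chr17.bed file in section 1.2 holds samtools-style region strings (`chr:start-end`) rather than tab-separated BED columns. If the targets start out as a standard BED file, they need converting first, because BED starts are 0-based while samtools regions are 1-based and inclusive. The following is a minimal sketch of that conversion; `bed_to_regions` and `targets.bed` are hypothetical names introduced here, not part of samtools or bedtools, and it is an assumption that the original region list was derived this way.

```shell
# Convert BED intervals (0-based start, end-exclusive) into samtools
# region strings (1-based, inclusive), one per line.
# bed_to_regions is a hypothetical helper name used only for illustration.
bed_to_regions() {
    awk 'BEGIN { FS = "\t" }
         # skip comment, track and browser lines; require chrom, start, end
         !/^(#|track|browser)/ && NF >= 3 { printf "%s:%d-%d\n", $1, $2 + 1, $3 }' "$@"
}

# Example (commented out because it needs samtools and an indexed in.bam):
# bed_to_regions targets.bed | xargs samtools view -b in.bam > exome.bam
```

The helper reads from the files given as arguments, or from standard input when none are given, so it can sit directly in front of the `xargs samtools view` pipeline used above.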
  • 6.
    2 Somatic Mutation Prediction
    SomaticSniper, VarScan2, and MuTect are used to find somatic mutations in the sample data set (tumor and matched normal BAM).
    • Analysis commands: https://gist.github.com/hongiiv/06611f189f4c8158edb0
    • SAMtools: v0.1.19
    • GATK: v2.8.1
    • MuTect: v1.1.4
    • SomaticSniper: v1.0.4
    • Strelka: v1.0.14
    • Virmid: v1.1.1

    2.1 Running SomaticSniper and Applying the Pre-filter (164 s)
    SomaticSniper was developed in 2011 by Li Ding of Washington University, who also created VarScan2. It uses Bayesian probability with posterior filtering, and its main characteristic is high computational efficiency.
    • -J: joint genotyping mode with the default prior probability of a somatic mutation (0.01)
    • -n, -t: normal/tumor sample id (for the VCF header)
    • -F: output format (classic, vcf, bed)
    • -f: path to the ref.fasta file
        $ cd
        $ bam-somaticsniper -J -F vcf -n HCC1143_Normal -t HCC1143_Tumor \
            -f /somatic_bench/reference/human_g1k_v37_decoy.fasta \
            tumor.bam normal.bam HCC1143_somaticsniper.vcf
    • (Filter options) Reads with a mapping quality of 0 were filtered prior to somatic mutation identification. Predictions with a 'somatic score' of 40 or greater were considered for subsequent downstream validation and analysis steps.
    • GATK's SelectVariants can extract only the variants of interest.
    • The filter uses the SSC (somatic score) and MQ (mapping quality) values in the FORMAT field of the VCF file.
        $ cd
        $ ln -s /somatic_bench/app/GenomeAnalysisTK-2.8-1/GenomeAnalysisTK.jar ./
        $ update-alternatives --config java
        There are 2 choices for the alternative java (providing /usr/bin/java).
          Selection    Path                                     Priority
        ------------------------------------------------------------
          0            /usr/lib/jvm/java-7-oracle/jre/bin/java   2
          1            /usr/lib/jvm/java-6-oracle/jre/bin/java   1
        * 2            /usr/lib/jvm/java-7-oracle/jre/bin/java   2
        Press enter to keep the current choice [*], or type selection number: 2
        update-alternatives: using /usr/lib/jvm/java-6-oracle/jre/bin/java
  • 7.
        $ java -version
        java version "1.7.0_72"
        Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
        Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)
        $ java -jar GenomeAnalysisTK.jar -T SelectVariants \
            -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
            --variant HCC1143_somaticsniper.vcf \
            -o HCC1143_somaticsniper_filter.vcf \
            -sn HCC1143_Tumor -sn HCC1143_Normal \
            -select 'vc.getGenotype("HCC1143_Tumor").getExtendedAttribute("SSC") >= 40 && (vc.getGenotype("HCC1143_Tumor").getExtendedAttribute("MQ") > 0 || vc.getGenotype("HCC1143_Normal").getExtendedAttribute("MQ") > 0)'
    • Compare the number of mutations before and after filtering.
        $ cd
        $ grep -v "#" HCC1143_somaticsniper.vcf | wc -l
        583
        $ grep -v "#" HCC1143_somaticsniper_filter.vcf | wc -l
        161

    2.2 Running VarScan2 and Applying the Pre-filter (10 min)
    VarScan2 was developed by Li Ding of Washington University in 2012, one year after SomaticSniper. Unlike the other tools, it uses Fisher's exact test with filtering and FDR correction. Its main characteristic is sensitive detection of high-quality sSNVs. Also unlike the other tools, its input is not a .bam file but a pileup or mpileup file.
    • Use samtools mpileup to generate pileup/mpileup files for the normal and tumor samples.
    • In the mpileup step, filtering is applied through the options -q 1 (skip alignments with mapQ smaller than INT) and -B (disable BAQ computation).
    • When a combined mpileup file [1] is used as VarScan input, give the '--mpileup 1' option.
        $ cd
        $ samtools mpileup -f /somatic_bench/reference/human_g1k_v37_decoy.fasta -q 1 -B normal.bam > HCC1143_n.pileup
        $ samtools mpileup -f /somatic_bench/reference/human_g1k_v37_decoy.fasta -q 1 -B tumor.bam > HCC1143_t.pileup
        $ ln -s /somatic_bench/app/VarScan/VarScan.v2.3.3.jar ./
        $ java -jar VarScan.v2.3.3.jar somatic HCC1143_n.pileup HCC1143_t.pileup HCC1143_varscan --output-vcf 1
        14617150 positions in tumor
        14616970 positions shared in normal
        13721478 had sufficient coverage for comparison
        13700958 were called Reference
        0 were mixed SNP-indel calls and filtered
        18427 were called Germline
        1562 were called LOH
        450 were called Somatic
        81 were called Unknown
        0 were called Variant
  • 8.
    • As shown below, VarScan2 writes its results as separate VCF files for INDELs and SNPs (HCC1143_varscan.indel.vcf, HCC1143_varscan.snp.vcf).
        drwxr-xr-x 2 root root    4096 Jan 30 09:52 ./
        drwxr-xr-x 5 root root    8192 Jan 30 09:35 ../
        -rw-r--r-- 1 root root  402354 Jan 30 09:47 HCC1143_varscan.indel.vcf
        -rw-r--r-- 1 root root 2691462 Jan 30 09:47 HCC1143_varscan.snp.vcf
    • Of the VarScan2 results, HCC1143_varscan.snp.vcf is filtered with processSomatic and somaticFilter.
    • processSomatic: separates high-confidence [2] from low-confidence somatic mutations.
    • somaticFilter: applies user-chosen filter options such as --min-coverage, --p-value, and --indel-file.
        $ cd
        $ java -jar VarScan.v2.3.3.jar processSomatic -help
        USAGE: java -jar VarScan.jar process [status-file] OPTIONS
            status-file - The VarScan output file for SNPs or Indels
            OPTIONS
            --min-tumor-freq - Minimum variant allele frequency in tumor [0.10]
            --max-normal-freq - Maximum variant allele frequency in normal [0.05]
            --p-value - P-value for high-confidence calling [0.07]
        $ java -jar VarScan.v2.3.3.jar processSomatic HCC1143_varscan.snp.vcf
        Reading input from HCC1143_varscan.snp.vcf
        Opening output files:
        17914 VarScan calls processed
        382 were Somatic (102 high confidence)
        16048 were Germline (15431 high confidence)
        1451 were LOH (1447 high confidence)
    • processSomatic writes one result file each for Germline, LOH, and Somatic calls, plus a high-confidence (.hc) subset of each.
        $ ls
        -rw-r--r-- 1 2413169 Jan 30 09:52 HCC1143_varscan.snp.vcf.Germline
        -rw-r--r-- 1 2320566 Jan 30 09:52 HCC1143_varscan.snp.vcf.Germline.hc
        -rw-r--r-- 1  216574 Jan 30 09:52 HCC1143_varscan.snp.vcf.LOH
        -rw-r--r-- 1  215997 Jan 30 09:52 HCC1143_varscan.snp.vcf.LOH.hc
        -rw-r--r-- 1   59990 Jan 30 09:52 HCC1143_varscan.snp.vcf.Somatic
        -rw-r--r-- 1   17055 Jan 30 09:52 HCC1143_varscan.snp.vcf.Somatic.hc
    • In the VCF written by VarScan2, multiple ALT alleles are written as 'G/T', which causes errors in later analysis steps. They are therefore changed to the standard 'G,T' form.

    [1] Older documentation describes using samtools pileup, but pileup was dropped in later samtools releases and replaced by mpileup. mpileup can still produce a pileup for a single sample, and VarScan also accepts one mpileup file that contains both normal and tumor.
    [2] Minimum variant allele frequency of 0.1 in the tumor, and maximum variant allele frequency of 0.05 in the normal.
  • 9.
        $ cd
        $ perl -pe 's/\tA\//\tA,/' HCC1143_varscan.snp.vcf.Somatic.hc | perl -pe 's/\tT\//\tT,/' | perl -pe 's/\tG\//\tG,/' | perl -pe 's/\tC\//\tC,/' > HCC1143_varscan_filter.vcf
    • Number of mutations after filtering.
        $ cd
        $ grep -v "#" HCC1143_varscan_filter.vcf | wc -l
        102

    2.3 Running MuTect and Applying the Pre-filter (18 min)
    MuTect is a tool developed at the Broad Institute. It uses Bayesian probability with pre- and post-filtering, and in particular provides sensitive detection of sSNVs at low allelic fractions.
    • MuTect runs only on Java 1.6, so check the current Java version and, if necessary, switch versions with update-alternatives.
        $ cd
        $ ln -s /somatic_bench/app/mutect/muTect-1.1.4.jar ./
        $ samtools index normal.bam
        $ samtools index tumor.bam
        $ cp /somatic_bench/reference/ccle.gatk.bed ./
        $ update-alternatives --config java
        There are 2 choices for the alternative java (providing /usr/bin/java).
          Selection    Path                                     Priority
        ------------------------------------------------------------
          0            /usr/lib/jvm/java-7-oracle/jre/bin/java   2
          1            /usr/lib/jvm/java-6-oracle/jre/bin/java   1
        * 2            /usr/lib/jvm/java-7-oracle/jre/bin/java   2
        Press enter to keep the current choice [*], or type selection number: 1
        update-alternatives: using /usr/lib/jvm/java-6-oracle/jre/bin/java
        $ java -version
        java version "1.6.0_45"
        Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
        Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)
        $ java -jar muTect-1.1.4.jar --analysis_type MuTect \
            --reference_sequence /somatic_bench/reference/human_g1k_v37_decoy.fasta \
            --cosmic /somatic_bench/reference/b37_cosmic_v54_120711.vcf \
            --dbsnp /somatic_bench/reference/dbsnp_132_b37.leftAligned.vcf \
            --input_file:normal normal.bam \
            --input_file:tumor tumor.bam \
            --out HCC1143_mutect.out \
            --vcf HCC1143_mutect.vcf \
            --coverage_file HCC1143.mutect.cov.wig.txt \
            --normal_sample_name HCC1143_Normal \
            --tumor_sample_name HCC1143_Tumor \
            -L ccle.gatk.bed
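The chained perl substitutions in section 2.2 rewrite the first tab-base-slash pattern they find on each line. A variant that touches only the ALT column (column 5 of a VCF record) can be sketched with awk as below. This is an illustrative alternative under the assumption that the input is tab-separated VCF, not the command used in the practical, and `fix_alt_separator` is a hypothetical name.

```shell
# Replace "/" with "," in the ALT column (column 5) of VCF records only.
# Header lines (starting with "#") are passed through unchanged.
# fix_alt_separator is a hypothetical helper name used for illustration.
fix_alt_separator() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { print; next }
         { gsub("/", ",", $5); print }' "$@"
}

# Example (commented out because it needs the VarScan output file):
# fix_alt_separator HCC1143_varscan.snp.vcf.Somatic.hc > HCC1143_varscan_filter.vcf
```

Restricting the edit to one column leaves genotype strings such as `0/1` in the sample columns untouched.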
  • 10.
    • (Filter options) Predictions not labeled as 'REJECT' were accepted as confident somatic mutation predictions and passed on to subsequent downstream validation and analysis steps.
    • GATK, which is used for the filtering, needs Java 1.7, so switch the Java version back with update-alternatives.
    • Use GATK's SelectVariants to keep only the variants whose FILTER field in the VCF is PASS (that is, excluding REJECT).
        $ cd
        $ update-alternatives --config java
        There are 2 choices for the alternative java (providing /usr/bin/java).
          Selection    Path                                     Priority
        ------------------------------------------------------------
          0            /usr/lib/jvm/java-7-oracle/jre/bin/java   2
          1            /usr/lib/jvm/java-6-oracle/jre/bin/java   1
        * 2            /usr/lib/jvm/java-7-oracle/jre/bin/java   2
        Press enter to keep the current choice [*], or type selection number: 2
        update-alternatives: using /usr/lib/jvm/java-6-oracle/jre/bin/java
        $ java -version
        java version "1.7.0_72"
        Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
        Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)
        $ java -jar GenomeAnalysisTK.jar -T SelectVariants \
            -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
            --variant HCC1143_mutect.vcf \
            -o HCC1143_mutect_filter.vcf \
            -sn HCC1143_Tumor -sn HCC1143_Normal \
            -select 'vc.isNotFiltered()'
    • The same PASS-only selection (excluding REJECT) can also be written with the --excludeFiltered option of SelectVariants.
        $ cd
        $ java -jar GenomeAnalysisTK.jar -T SelectVariants \
            -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
            --variant HCC1143_mutect.vcf \
            -o HCC1143_mutect_filter.vcf \
            -sn HCC1143_Tumor -sn HCC1143_Normal \
            --excludeFiltered
    • Number of mutations after filtering.
        $ cd
        $ grep -v "#" HCC1143_mutect_filter.vcf | wc -l
        109

    2.4 Summary
    • Programs used: VarScan2, SomaticSniper, MuTect, GATK
    • Outputs: pre-filtered somatic mutations (161, 102, 112)
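The counts in this section come from `grep -v "#" file.vcf | wc -l`, which drops every line that contains a `#` anywhere, not only header lines. That is normally harmless, but a record whose INFO or ID field happened to contain `#` would be missed. A slightly stricter count, anchored to the start of the line, might look like the sketch below; `count_records` is a hypothetical helper name.

```shell
# Count the data records of a VCF file, i.e. the lines that do not START
# with "#". count_records is a hypothetical helper name.
count_records() {
    # grep -c prints the count; it exits non-zero when the count is 0,
    # so "|| true" keeps the function usable under "set -e".
    grep -vc '^#' "$1" || true
}

# Example (commented out because it needs the files from the practical):
# count_records HCC1143_mutect_filter.vcf
```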
  • 11.
    3 Obtaining Full Consensus / Partial Consensus sSNVs
    Find the full consensus calls of the three SNV detection tools SomaticSniper, VarScan2, and MuTect. Multi-allelic sites and indels are removed first.

    3.1 Extracting Only Bi-allelic SNPs
    • From the pre-filtered results, remove multi-allelic sites and keep only SNPs.
    • With GATK's SelectVariants, set -selectType to SNP (choices: INDEL, SNP, MIXED, MNP, SYMBOLIC, NO_VARIATION) and -restrictAllelesTo to BIALLELIC (choices: MULTIALLELIC or BIALLELIC).
        $ cd
        $ java -jar GenomeAnalysisTK.jar -T SelectVariants \
            -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
            --variant HCC1143_mutect_filter.vcf \
            -o HCC1143_mutect_1.vcf \
            -selectType SNP -restrictAllelesTo BIALLELIC
        $ java -jar GenomeAnalysisTK.jar -T SelectVariants \
            -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
            --variant HCC1143_somaticsniper_filter.vcf \
            -o HCC1143_somaticsniper_1.vcf \
            -selectType SNP -restrictAllelesTo BIALLELIC
        $ java -jar GenomeAnalysisTK.jar -T SelectVariants \
            -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
            --variant HCC1143_varscan_filter.vcf \
            -o HCC1143_varscan_1.vcf \
            -selectType SNP -restrictAllelesTo BIALLELIC

    3.2 Obtaining the Full Consensus / Partial Consensus
    • Compute the partial consensus for each pair of callers (SomaticSniper/MuTect, MuTect/VarScan2, VarScan2/SomaticSniper) and the full consensus across all three somatic callers.
        $ cd
        $ java -jar GenomeAnalysisTK.jar -T SelectVariants \
            -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
            --variant HCC1143_somaticsniper_1.vcf \
            --concordance HCC1143_mutect_1.vcf \
            -o HCC1143_SM.vcf
        $ java -jar GenomeAnalysisTK.jar -T SelectVariants \
            -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
            --variant HCC1143_mutect_1.vcf \
            --concordance HCC1143_varscan_1.vcf \
            -o HCC1143_MV.vcf
  • 12.
        $ java -jar GenomeAnalysisTK.jar -T SelectVariants \
            -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
            --variant HCC1143_varscan_1.vcf \
            --concordance HCC1143_somaticsniper_1.vcf \
            -o HCC1143_VS.vcf
        $ java -jar GenomeAnalysisTK.jar -T SelectVariants \
            -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
            --variant HCC1143_SM.vcf \
            --concordance HCC1143_varscan_1.vcf \
            -o HCC1143_SMV.vcf

    3.3 Counting Full Consensus / Partial Consensus Calls
    • Count the full consensus and partial consensus calls.
        $ cd
        $ grep -v "#" HCC1143_SM.vcf | wc -l
        45
        $ grep -v "#" HCC1143_MV.vcf | wc -l
        38
        $ grep -v "#" HCC1143_VS.vcf | wc -l
        42
        $ grep -v "#" HCC1143_SMV.vcf | wc -l
        32

    3.4 Summary
    • Programs used: GATK
    • Outputs: consensus / partial consensus data
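To make the idea behind the consensus step concrete, the sketch below intersects call sets by a CHROM:POS:REF:ALT key using only awk, sort, and uniq. It is a simplified illustration, not a reimplementation of SelectVariants: it is an assumption here that matching on that key is enough, and GATK's --concordance may apply additional checks. `vcf_site_keys` and `consensus_sites` are hypothetical names.

```shell
# Print a CHROM:POS:REF:ALT key for every record of a VCF file,
# sorted and de-duplicated. Hypothetical helper for illustration.
vcf_site_keys() {
    awk 'BEGIN { FS = "\t" } !/^#/ { print $1 ":" $2 ":" $4 ":" $5 }' "$1" | sort -u
}

# Print the keys that occur in every VCF file given as an argument.
# Two files give a partial (pairwise) consensus, three the full consensus.
consensus_sites() {
    n=$#
    for f in "$@"; do
        vcf_site_keys "$f"
    done | sort | uniq -c | awk -v n="$n" '$1 == n { print $2 }'
}

# Example (commented out because it needs the files from section 3.1):
# consensus_sites HCC1143_somaticsniper_1.vcf HCC1143_mutect_1.vcf HCC1143_varscan_1.vcf | wc -l
```

Because each file contributes a key at most once, a key counted `n` times must be present in all `n` inputs.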
  • 13.
    4 Applying Additional Filters
    Specificity can be increased by adding a filter based on GATK UnifiedGenotyper calls.

    4.1 Calling Normal and Tumor Variants with UnifiedGenotyper (8 min)
    • Call SNPs in the normal and tumor samples with GATK UnifiedGenotyper.
        $ cd
        $ java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper \
            -o HCC1143_gatk.tumor.vcf \
            -I tumor.bam \
            --genotype_likelihoods_model SNP \
            -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
            -L ccle.gatk.bed
        $ java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper \
            -o HCC1143_gatk.normal.vcf \
            -I normal.bam \
            --genotype_likelihoods_model SNP \
            -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
            -L ccle.gatk.bed

    4.2 Filtering SNVs - full consensus (optional)
    • Using the normal and tumor variants produced by UnifiedGenotyper, keep only the SNVs predicted in the tumor but not in the germline.
        $ cd
        $ java -jar GenomeAnalysisTK.jar -T SelectVariants \
            -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
            --variant HCC1143_SMV.vcf \
            --discordance HCC1143_gatk.normal.vcf \
            -o HCC1143_SMV_discordance_normal.vcf
        $ java -jar GenomeAnalysisTK.jar -T SelectVariants \
            -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
            --variant HCC1143_SMV_discordance_normal.vcf \
            --concordance HCC1143_gatk.tumor.vcf \
            -o HCC1143_final_filter_concordance.vcf

    4.3 Filtering SNVs - partial consensus (SomaticSniper/MuTect)
        $ cd
        $ java -jar GenomeAnalysisTK.jar -T SelectVariants \
            -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
            --variant HCC1143_SM.vcf \
            --discordance HCC1143_gatk.normal.vcf \
            -o HCC1143_SM_discordance_normal.vcf
  • 14.
        $ java -jar GenomeAnalysisTK.jar -T SelectVariants \
            -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
            --variant HCC1143_SM_discordance_normal.vcf \
            --concordance HCC1143_gatk.tumor.vcf \
            -o HCC1143_SM_final_filter_concordance.vcf

    4.4 Counting Full Consensus / Partial Consensus Calls after the GATK Filter
    • Count the consensus and partial consensus calls that pass the GATK filter.
        $ cd
        $ grep -v "#" HCC1143_final_filter_concordance.vcf | wc -l
        32
        $ grep -v "#" HCC1143_SM_final_filter_concordance.vcf | wc -l
        45

    4.5 Summary
    • Programs used: GATK
    • Outputs: consensus / partial consensus data with the GATK filter applied
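The two-step filter in this section (discordant with the normal calls, then concordant with the tumor calls) amounts to a set difference followed by a set intersection. The sketch below shows that logic at the level of CHROM:POS sites. It is a simplification under stated assumptions: matching on position only is an assumption made for brevity, and GATK's --discordance and --concordance may compare more than the position. `tumor_only_filter` is a hypothetical name.

```shell
# Keep candidate records whose CHROM:POS is absent from the normal call set
# and present in the tumor call set. Header lines of the candidate VCF are
# kept. tumor_only_filter is a hypothetical helper used for illustration.
# Usage: tumor_only_filter candidates.vcf normal_calls.vcf tumor_calls.vcf
tumor_only_filter() {
    candidates=$1; normal=$2; tumor=$3
    awk 'BEGIN { FS = "\t" }
         # first file: remember sites called in the normal sample
         FILENAME == ARGV[1] { if (!/^#/) in_normal[$1 ":" $2] = 1; next }
         # second file: remember sites called in the tumor sample
         FILENAME == ARGV[2] { if (!/^#/) in_tumor[$1 ":" $2] = 1; next }
         # third file: the candidate somatic calls
         /^#/ { print; next }
         !(($1 ":" $2) in in_normal) && (($1 ":" $2) in in_tumor) { print }' \
        "$normal" "$tumor" "$candidates"
}

# Example (commented out because it needs the files from sections 3 and 4.1):
# tumor_only_filter HCC1143_SMV.vcf HCC1143_gatk.normal.vcf HCC1143_gatk.tumor.vcf
```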
  • 15.
    5 Validation
    Using the list of mutations reported for the HCC1143 sample in COSMIC and CCLE, check how many of the calls agree with it. The validation.list file is either taken from the copy stored on the server or downloaded from https://gist.github.com/hongiiv/42194181ce6402d8b629.

    5.1 Preparing COSMIC and CCLE Data
    • Copy the list of mutations (103 in total) reported for the HCC1143 sample in COSMIC and CCLE.
        $ cd
        $ cp /somatic_bench/reference/validation.list ./
        $ cat validation.list | wc -l
        103

    5.2 Running Validation - consensus / partial consensus
    • Check how many of the final filtered consensus and partial consensus (SomaticSniper/MuTect) calls match the list.
        $ cd
        $ java -jar GenomeAnalysisTK.jar -T SelectVariants \
            -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
            --variant HCC1143_final_filter_concordance.vcf \
            -o all.val.filter.vcf \
            -L validation.list
        $ java -jar GenomeAnalysisTK.jar -T SelectVariants \
            -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
            --variant HCC1143_SM_final_filter_concordance.vcf \
            -o sm.val.filter.vcf \
            -L validation.list
        $ grep -v "#" all.val.filter.vcf | wc -l
        6
        $ grep -v "#" sm.val.filter.vcf | wc -l
        9
    • Check how many of the consensus calls match before the additional GATK filter is applied.
        $ cd
        $ java -jar GenomeAnalysisTK.jar -T SelectVariants \
            -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
            --variant HCC1143_SMV.vcf \
            -o all.val.vcf \
            -L validation.list
        $ java -jar GenomeAnalysisTK.jar -T SelectVariants \
            -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
            --variant HCC1143_SM.vcf \
            -o sm.val.vcf \
            -L validation.list
        $ grep -v "#" all.val.vcf | wc -l
        6
  • 16.
        $ grep -v "#" sm.val.vcf | wc -l
        9
    • consensus: 32 calls, 6 validated before the GATK filter; 32 calls, 6 validated after the GATK filter
    • partial consensus (SomaticSniper/MuTect): 45 calls, 9 validated before the GATK filter; 45 calls, 9 validated after the GATK filter

    5.3 Summary
    • Programs used: GATK
    • Outputs: the number of final consensus / partial consensus calls that agree with COSMIC and CCLE
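The validation counts can be turned into rates with a little arithmetic. The sketch below computes the share of calls found in validation.list (6 of 32, 9 of 45) and the share of the 103 listed mutations that were recovered. Reading these as precision-like and recall-like figures assumes that validation.list is a complete truth set for the targeted regions, which the source does not state. `percent` is a hypothetical helper name.

```shell
# percent NUM DEN prints NUM/DEN as a percentage with two decimals,
# or "NA" when DEN is zero. Hypothetical helper for illustration.
percent() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        if (b == 0) { print "NA" } else { printf "%.2f\n", 100 * a / b }
    }'
}

percent 6 32     # full consensus calls that appear in validation.list
percent 9 45     # SomaticSniper/MuTect consensus calls in validation.list
percent 6 103    # listed mutations recovered by the full consensus
percent 9 103    # listed mutations recovered by the partial consensus
```

Under that reading, the SomaticSniper/MuTect partial consensus recovers more of the listed mutations (9 versus 6) at a similar share of validated calls (20.00% versus 18.75%).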
  • 17. Detecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach 17 6 0¿ Somatic Mutation Callers - Strelka, Virmid 6.1 Strelka (1Ñ38 ) Bayesian probability with posterior filtering| t© somatic mutation caller 2012D |˯ò Ç ⌅¯®t ‰. |˯òX alignerx issactò eland –à D»| bwaƒ ¿–‰.‰â)ït |⇠ ⌅¯®‰¸î }⌅ ‰x )›D t©Xîp tî |˯ò ¸ ⌧ issac ⇣ D∑ ‰â)ïD ¨©Xp, tî XòX ⌅ ∏| ®( < ¨X‡ | 1àå ¨X0 ⌅XÏ Makefile t|î ›D ¨©Xî make |î ¯¨| t© X0 L8t‰. • Strelka| ¨©X0 ⌅t⌧î StrelkaX 5Xt •⌧ |t DîXp, 0¯ < bwa, eland, isaac 3⌧X aligner| ⌅ 0¯ 5XD ⌧ı‰. • 0¯ 5X–⌧ exometò target sequencingX Ω∞ isSkipDepthFilters = 1 ¿ ‰. $ ll /somatic_bench/app/strelka -1.0.14/ etc/ total 20 drwxrwxr -x 2 viz viz 4096 Jul 10 2014 ./ drwxr -xr -x 7 root root 4096 Jan 30 11:06 ../ -rw -rw -r-- 1 viz viz 3658 Jul 10 2014 strelka_config_bwa_default .ini -rw -rw -r-- 1 viz viz 3683 Jul 10 2014 strelka_config_eland_default .ini -rw -rw -r-- 1 viz viz 3821 Jul 10 2014 strelka_config_isaac_default .ini • Strelka $X⌧  †¨@ Ñ ∞¸ •  †¨– t⌧ ¿⇠ $ D ‰. • 0¯ 5X |D ı¨X‡ configureStrelkaWorkflow.pl Ö9< Ñ Ö9¥| ›1‰. • É¥ƒ Ñ Ö9D make| µt ‰âXp tL -j 5XD µt Ñ – ¨©` thread (cpu) /⇠| ¿ ‰. • INDEL¸ SNP ƒƒX VCF Ϙ< ›1⇠p, pass ⌧ ɸ raw somatic 4⌧X ∞¸ |t ›1⌧‰. 
$ STRELKA_INSTALL_DIR=/somatic_bench/app/strelka-1.0.14/
$ echo $STRELKA_INSTALL_DIR
/somatic_bench/app/strelka-1.0.14/
$ WORK_DIR=/root/myWork
$ cp $STRELKA_INSTALL_DIR/etc/strelka_config_isaac_default.ini config.ini
$ $STRELKA_INSTALL_DIR/bin/configureStrelkaWorkflow.pl --normal=/root/normal.bam --tumor=/root/tumor.bam --ref=/somatic_bench/reference/human_g1k_v37_decoy.fasta --config=config.ini --output-dir=./myAnalysis
$ cd ./myAnalysis
$ make -j 8
$ ll results/
total 88
drwxr-xr-x 2 root root  4096 Jan 30 11:39 ./
drwxr-xr-x 5 root root  4096 Jan 30 11:37 ../
-rw-r--r-- 1 root root 13452 Jan 30 11:37 all.somatic.indels.vcf
-rw-r--r-- 1 root root 36736 Jan 30 11:37 all.somatic.snvs.vcf
-rw-r--r-- 1 root root  7098 Jan 30 11:37 passed.somatic.indels.vcf
-rw-r--r-- 1 root root 16070 Jan 30 11:37 passed.somatic.snvs.vcf

• Check the number of final passed somatic SNPs.

$ cd results/
$ grep -v "#" passed.somatic.snvs.vcf | wc -l
62
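For exome or targeted sequencing, this section says to set isSkipDepthFilters = 1 in the copied option file. The edit can be made in any text editor. As a rough sketch, it could also be scripted with sed as below. The function name `set_skip_depth_filters` is hypothetical, and the demo file is a made-up stand-in rather than a real Strelka option file. The sketch writes to a new file instead of editing in place, so it does not depend on any particular sed's in-place option.

```shell
#!/bin/sh
# set_skip_depth_filters IN OUT: copy an ini-style option file from IN to OUT,
# rewriting the isSkipDepthFilters line so that its value is 1.
# Hypothetical helper; it is not shipped with Strelka.
set_skip_depth_filters() {
    sed 's/^isSkipDepthFilters[[:space:]]*=.*/isSkipDepthFilters = 1/' "$1" > "$2"
}

# Made-up stand-in for a config file, used only for demonstration.
demo_in=$(mktemp)
demo_out=$(mktemp)
printf '[user]\nisSkipDepthFilters = 0\nmaxInputDepth = 10000\n' > "$demo_in"
set_skip_depth_filters "$demo_in" "$demo_out"
grep '^isSkipDepthFilters' "$demo_out"    # prints isSkipDepthFilters = 1
rm -f "$demo_in" "$demo_out"
```

In the workflow above, the input would be the default .ini file and the output the config.ini passed to configureStrelkaWorkflow.pl.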
6.2 Virmid (33 min)

Virmid is software released in 2013 by Professor Sangwoo Kim of Yonsei University. It estimates the proportion of normal sample contained in the tumor sample (α).

• Check the number of final passed somatic SNPs.

$ java -jar /somatic_bench/app/Virmid-1.1.1/Virmid.jar -R /somatic_bench/reference/human_g1k_v37_decoy.fasta -D /root/tumor.bam -N /root/normal.bam -t 8 -w /root/virmid
$ cd /root/virmid
$ ls -al
total 98024
drwxr-xr-x 2 root     4096 Jan 30 16:00 ./
drwxr-xr-x 8 root     8192 Jan 30 15:32 ../
-rw-r--r-- 1 root  1252161 Jan 30 16:03 tumor.bam.virmid.germ.all.vcf
-rw-r--r-- 1 root   955213 Jan 30 16:03 tumor.bam.virmid.germ.passed.vcf
-rw-r--r-- 1 root      262 Jan 30 16:00 tumor.bam.virmid.gm
-rw-r--r-- 1 root    36564 Jan 30 16:03 tumor.bam.virmid.loh.all.vcf
-rw-r--r-- 1 root     2233 Jan 30 16:01 tumor.bam.virmid.loh.passed.vcf
-rw-r--r-- 1 root      992 Jan 30 16:03 tumor.bam.virmid.report
-rw-r--r-- 1 root  1364144 Jan 30 15:29 tumor.bam.virmid.sample.control.bai
-rw-r--r-- 1 root 53107377 Jan 30 15:29 tumor.bam.virmid.sample.control.bam
-rw-r--r-- 1 root  1364104 Jan 30 15:29 tumor.bam.virmid.sample.disease.bai
-rw-r--r-- 1 root 41746178 Jan 30 15:29 tumor.bam.virmid.sample.disease.bam
-rw-r--r-- 1 root    84053 Jan 30 16:03 tumor.bam.virmid.som.all.vcf
-rw-r--r-- 1 root     6883 Jan 30 16:03 tumor.bam.virmid.som.passed.vcf
$ grep -v "#" tumor.bam.virmid.som.passed.vcf | wc -l
78
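With passed call sets from more than one caller, a quick look at how many sites they share can be useful. The sketch below is one possible way to do that with standard shell tools. It is not the method used in this document, which builds its consensus sets with GATK. The function names `site_keys` and `shared_sites` and the demo data are made up, and the comparison is by chromosome and position only, ignoring alleles.

```shell
#!/bin/sh
# site_keys FILE: print the sorted, unique CHROM:POS keys of a VCF's records.
site_keys() {
    grep -v '^#' "$1" | awk '{print $1 ":" $2}' | LC_ALL=C sort -u
}

# shared_sites A B: print the CHROM:POS keys that occur in both VCF files.
shared_sites() {
    keys_a=$(mktemp)
    keys_b=$(mktemp)
    site_keys "$1" > "$keys_a"
    site_keys "$2" > "$keys_b"
    # comm -12 keeps only the lines common to both sorted inputs.
    LC_ALL=C comm -12 "$keys_a" "$keys_b"
    rm -f "$keys_a" "$keys_b"
}

# Two made-up call sets, used only for demonstration.
demo_a=$(mktemp)
demo_b=$(mktemp)
printf '#CHROM\tPOS\n1\t100\n1\t200\n' > "$demo_a"
printf '#CHROM\tPOS\n1\t100\n2\t300\n' > "$demo_b"
shared_sites "$demo_a" "$demo_b"    # prints 1:100
rm -f "$demo_a" "$demo_b"
```

Piping the output through `wc -l` gives the number of shared sites.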
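7 Linux for Genome Research

7.1 Practice Linux server

• Server address: xxx.xxx.xxx.xxx
• ID: edu01, edu02
• Password: kogo2015
• Web access: http://xxx.xxx.xxx.xxx:8787

7.2 Connecting to the practice Linux server - Windows users

• Go to http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
• Download putty.exe for Intel x86.
• Host Name: xxx.xxx.xxx.xxx / Port: xx
• When the Security Alert window appears, choose 'Yes (Y)'.
• Login ID: use the ID and password assigned to you.

7.3 Connecting to the practice Linux server - Mac or Linux users

• On a Mac (OS X), run 'Applications, Utilities, Terminal app'. On Linux, run a terminal from the menu or from your distribution's program menu.

$ ssh user_id@host_name
$ ssh root@127.0.0.1

• Connect to the practice Linux server with the ssh command. On the first connection, answer yes and a password prompt appears; enter the password you were given to connect.

7.4 Finding out Linux system information

This document is written on the basis of 'Ubuntu', one of the Linux distributions. (Linux distributions fall broadly into the Red Hat family and the Debian family, and each family contains a variety of distributions.) Unless noted otherwise, everything here applies to any kind of Linux. Linux is an operating system that runs on many distributions and on many kinds of hardware. You need to know what environment your Linux runs in so that, when installing software, you can pick the build that suits it.

• How to identify the name of the Linux distribution you are using. Ubuntu is a freely distributed Linux operating system; the latest version at the time of writing is 14.04 LTS (Long Term Support), code-named Trusty Tahr.

$ cat /etc/issue.net
Ubuntu 12.04.1 LTS

• Linux runs on many hardware environments, and software that supports Linux usually provides a separate executable for each kind of hardware. Knowing your hardware therefore lets you download and use the right software. The hardware of the Linux server can be identified with the '-m' (machine) option of uname. 'x86' means an Intel-family CPU and '64' means 64-bit hardware. (This is also written simply as x64.)

$ uname -m
x86_64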
• The kernel is the core of the Linux operating system; it carries out the user's commands on the actual hardware. The kernel version differs depending on the distribution in use. The latest Linux kernel at the time of writing is 3.14.3, released on May 6, 2014. Linux distributions are built on top of kernels released in this way. Let's look up the kernel information of this Linux system.

$ uname -r
3.2.0-32-virtual

• The shell is the environment that takes Linux commands as input and runs them. 'PATH' is one of the environment variables, which are values that affect how processes behave. export is the command that sets the value of such an environment variable. When you enter a command, the shell first searches the directories listed in PATH to check whether the command exists there, and then runs it. So when you install your own software and want to run it from anywhere, its directory must be added to PATH; otherwise it can only be run from inside the directory where it was installed. The shell's environment variables can be inspected with the 'env' command, and PATH is set with 'export'.

$ env | grep PATH
MANPATH=/usr/local/texlive/2013/texmf/doc/man:
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
INFOPATH=/usr/local/texlive/2013/texmf/doc/info:
$ export PATH=/BIO/app/bwa-0.7.5a/:$PATH
$ env | grep PATH

7.5 Linux file system

Linux manages a single physical disk by dividing it logically into several partitions. A file system is created on each partition so that files and directories can be managed on it.

• Linux is a multi-user system, and each user has an area of their own, the home directory. Inside the home directory you can create and delete files. The command for moving to the home directory is 'cd', and the current directory path can be checked with 'pwd'.

$ cd
$ pwd
/home/hongiiv

• Creating a directory and moving into it

$ cd
$ mkdir sample_data
$ ls -la
total 2203488
drwxr-xr-x 16 hongiiv hongiiv 4096 May 29 10:34 .
drwxr-xr-x  3 root    root    4096 May  7 13:14 ..
-rw-------  1 hongiiv hongiiv 1908 May 10 11:59 .bash_history
-rw-r--r--  1 hongiiv hongiiv  220 May  7 13:14 .bash_logout
-rw-r--r--  1 hongiiv hongiiv 3763 May 10 17:06 .bashrc
drwxr-xr-x  2 root    root    4096 May 29 10:34 sample_data
$ cd sample_data
$ pwd
/home/hongiiv/sample_data

• Deleting directories and files
$ cd
$ rm -rf sample_data
$ ls -la
total 2203488
drwxr-xr-x 16 hongiiv hongiiv 4096 May 29 10:34 .
drwxr-xr-x  3 root    root    4096 May  7 13:14 ..
-rw-------  1 hongiiv hongiiv 1908 May 10 11:59 .bash_history
-rw-r--r--  1 hongiiv hongiiv  220 May  7 13:14 .bash_logout
-rw-r--r--  1 hongiiv hongiiv 3763 May 10 17:06 .bashrc
$

• Viewing the Linux file systems

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       19G   14G  4.8G  74% /
udev            3.9G  4.0K  3.9G   1% /dev
tmpfs           1.6G  188K  1.6G   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            3.9G     0  3.9G   0% /run/shm
/dev/xvdb1       79G   38G   38G  50% /home/hongiiv/test

• Viewing the partition information of the physical hard disks. The 21.5 GB physical disk /dev/xvda consists of two partitions, xvda1 and xvda2, whose types are Linux and Linux swap.

$ fdisk -l
Disk /dev/xvda: 21.5 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders, total 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00034212

    Device Boot      Start        End     Blocks  Id  System
/dev/xvda1            2048   40038399   20018176  83  Linux
/dev/xvda2        40038400   41940991     951296  82  Linux swap / Solaris

Disk /dev/xvdb: 300.6 GB, 300647710720 bytes
171 heads, 35 sectors/track, 98112 cylinders, total 587202560 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x3459a991

    Device Boot      Start        End     Blocks  Id  System
/dev/xvdb1            2048  587202559  293600256  8e  Linux LVM

• Checking the file system mount information

$ cat /etc/fstab
proc        /proc  proc  nodev,noexec,nosuid  0 0
/dev/xvda1  /      ext3  errors=remount-ro    0 1
/dev/xvda2  none   swap  sw                   0 0
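The Use% column of df is often what matters before writing large BAM or VCF files. As a small sketch, the value for one directory can be pulled out with df and awk as below. The function name `use_percent` is made up, and the sketch assumes the file system name contains no spaces. The `-P` option asks df for the portable one-line-per-file-system layout, which keeps the column positions stable.

```shell
#!/bin/sh
# use_percent DIR: print the Use% of the file system that holds DIR,
# as a bare number without the percent sign.
# Hypothetical helper built on the POSIX output format of df -P.
use_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

use_percent /
```

A script could compare the result against a threshold, for example `[ "$(use_percent /)" -lt 90 ]`, before starting a long analysis.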
7.6 Adding a hard disk in Linux

• After identifying the newly added hard disk with fdisk, the disk is brought into use in three steps: partitioning, creating a file system, and mounting. To make Linux recognize a USB storage device, only the mount step is needed.

$ fdisk /dev/xvdb
$ mkfs.ext3 /dev/xvdb1
$ mkdir /new_hdd
$ mount /dev/xvdb1 /new_hdd
$ cd /new_hdd
$ df -h

7.7 File-related commands

• touch - creates an empty file of size 0, or changes the timestamp of an existing file. It is occasionally needed when installing genome-related software or during training, so it is worth knowing.

$ touch a
$ ls -al
-rw-r--r-- 1 root root 0 Jun 18 10:04 a
$ date
Wed Jun 18 10:05:10 KST 2014
$ touch -c a
$ ls -al
-rw-r--r-- 1 root root 0 Jun 18 10:05 a

• cat - shows the contents of a file, and is also used to write simple scripts. The command 'cat > test' creates a file named test and lets you type its contents. When you have finished typing, press 'ctrl+D' to exit.

$ cat > test
hi there my name is hong
$ cat test
hi there my name is hong
$ ls -al
-rw-r--r-- 1 root root 25 Jun 18 10:09 test

• Counting the files in a particular directory

$ ls -l . | grep ^- | wc -l
50

• Printing a file without the lines that begin with a particular character. In a file such as a VCF, lines that begin with '#' are comments, so this prints the actual variant list without the comments. Conversely, it is also possible to print only the comment lines.

$ cd /BIO/data/gatk
$ grep -v "#" dbsnp_138.hg19.vcf | wc -l
8087914
$ grep -F "#" dbsnp_138.hg19.vcf | wc -l
165
• Printing only a particular column. The column can then be sorted in dictionary order with 'sort -d' ('sort -n' sorts numerically), and 'uniq -c' counts how many times each value occurs in a row, so its input should already be grouped.

$ grep -v "#" dbsnp_138.hg19.vcf | awk '{print $1}' | more
chrM
chrM
chrM
chrM
chrM
chrM
chrM
chrM
chrM
$ grep -v "#" dbsnp_138.hg19.vcf | awk '{print $1}' | sort -d
chr1
chr2
$ grep -v "#" dbsnp_138.hg19.vcf | awk '{print $1}' | uniq -c
    475 chrM
4723878 chr1
3363561 chr2
$ grep -v "#" dbsnp_138.hg19.vcf | awk '{if ($1 == "chrM") printf "chrM is: %s\n", $2}'
chrM is: 16390
chrM is: 16391
chrM is: 16429
chrM is: 16445
chrM is: 16499

• Saving the printed output to a file

$ grep -v "#" dbsnp_138.hg19.vcf | awk '{if ($1 == "chrM") printf "chrM is: %s\n", $2}' > ~/chr_pos.txt
$ grep -v "#" dbsnp_138.hg19.vcf | awk '{if ($1 == "chr1") printf "chr1 is: %s\n", $2}' >> ~/chr_pos.txt

7.8 Linux network information

• Network interface information. The inet addr of eth0 is the address through which this Linux machine can be reached from outside. (The server address 172.27.252.234 shown here will differ depending on your own practice environment.)

$ ifconfig
eth0      Link encap:Ethernet  HWaddr 02:00:5b:73:00:33
          inet addr:172.27.252.234  Bcast:172.27.255.255
          inet6 addr: fe80::5bff:fe73:33/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:501386 errors:0 dropped:0 overruns:0 frame:0
          TX packets:346879 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:19357734604 (1 GB)  TX bytes:2720265191 (2 GB)
          Interrupt:68
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:4337 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4337 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2203478 (2.2 MB)  TX bytes:2203478 (2.2 MB)

7.9 Understanding compression in Linux

Linux supports a variety of compression formats, and software and data are usually distributed as compressed files.

• How to decompress the various compression formats used in Linux. A problem is hidden in the decompression examples below; there is a prize for the first person to solve it.

$ cd
$ cp -R /BIO/data/compress ./compress
$ cd compress
$ gzip -d compress01.gz
$ tar xvfz compress02.tar.gz
$ unzip compress03.zip
$ bzip2 -d comress04.bz2
$ tar xvfz compress05.tar.gz
$ tar xvf compress06.tar.bz2

• gzip: Recommended for fast network connections
• bzip2: Recommended for slower network connections (smaller size but takes longer to compress)
• zip: Not recommended but is provided as an option for those who cannot open the above formats

• How to view the contents of large compressed genome data files without decompressing them to disk. This is useful for inspecting FASTQ files and similar data.

$ gzip -dc CEUTrio.HiSeq.WGS.b37.bestPractices.hg19.vcf.gz | more
$ gzip -dc CEUTrio.HiSeq.WGS.b37.bestPractices.hg19.tar.gz | tar -tvf -

7.10 Installing software on Linux

In general there are three ways to install software on Linux. In the first, the software is distributed as a compressed binary (executable) file and can be used simply by decompressing it. The second uses the packages provided by the Linux distribution; on Ubuntu this is done with the package manager APT. The third is to install from source files.

7.10.1 Installing software with APT

• Updating the package list and installing a package with APT

$ apt-get update
$ apt-get install bwa
Reading package lists... Done
Building dependency tree
Reading state information... Done
Use 'apt-get autoremove' to remove them.
Suggested packages:
  samtools
The following NEW packages will be installed:
  bwa
0 upgraded, 1 newly installed, 0 to remove and 153 not upgraded.
Need to get 135 kB of archives.
After this operation, 286 kB of additional disk space will be used.
Fetched 135 kB in 3s (40.1 kB/s)
Selecting previously unselected package bwa.
(Reading database ...17 files and directories currently installed.)
Unpacking bwa (from .../archives/bwa_0.6.1-1_amd64.deb) ...
Processing triggers for man-db ...
Setting up bwa (0.6.1-1) ...
$ bwa
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.6.1-r104
Contact: Heng Li <lh3@sanger.ac.uk>

Usage:   bwa <command> [options]

Command: index         index sequences in the FASTA format
         aln           gapped/ungapped alignment
         samse         generate alignment (single ended)
         sampe         generate alignment (paired ended)
         bwasw         BWA-SW for long queries
         fastmap       identify super-maximal exact matches
         fa2pac        convert FASTA to PAC format
         pac2bwt       generate BWT from PAC
         pac2bwtgen    alternative algorithm for generating BWT
         bwtupdate     update .bwt to the new format
         bwt2sa        generate SA from BWT and Occ
         pac2cspac     convert PAC to color-space PAC
         stdsw         standard SW/NW alignment

• The following packages should be installed in advance as a baseline for installing NGS-related software.

$ apt-get update -y
$ apt-get install gcc -y
$ apt-get install make -y
$ apt-get install zlib1g-dev -y
$ apt-get install libncurses5-dev -y
$ apt-get install g++ -y
$ apt-get install tcl tk -y
$ apt-get install tcl-dev -y
$ apt-get install unzip -y
$ apt-get install curl -y
$ apt-get install screen -y
$ apt-get install python-dev -y
$ apt-get install python-software-properties -y
$ add-apt-repository ppa:webupd8team/java
$ apt-get update -y
$ apt-get install oracle-java7-installer -y
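After installing a list of packages like the one above, it can help to confirm that the tools are actually reachable from the shell. The following is a minimal sketch under that assumption. The function name `missing_tools` is made up, and the tool names in the example call are only the ones whose command name matches a package name in the list.

```shell
#!/bin/sh
# missing_tools NAME...: print each NAME that the shell cannot find
# as a command. Prints nothing when every tool is available.
# Hypothetical helper; it only checks lookup, not package versions.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing_tools gcc make unzip curl screen
```

An empty result means every listed command was found on PATH; any name that is printed still needs to be installed.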
7.10.2 Installing software by compiling source code

• Installing from source

$ cd
$ cp /BIO/app/bwa-0.7.4.tar.bz2 ./
$ tar xvf bwa-0.7.4.tar.bz2
$ cd bwa-0.7.4
$ make
$ ./bwa
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.4-r385
Contact: Heng Li <lh3@sanger.ac.uk>

Usage:   bwa <command> [options]

Command: index         index sequences in the FASTA format
         mem           BWA-MEM algorithm
         fastmap       identify super-maximal exact matches
         pemerge       merge overlapping paired ends (EXPERIMENTAL)
         aln           gapped/ungapped alignment
         samse         generate alignment (single ended)
         sampe         generate alignment (paired ended)
         bwasw         BWA-SW for long queries
         fa2pac        convert FASTA to PAC format
         pac2bwt       generate BWT from PAC
         pac2bwtgen    alternative algorithm for generating BWT
         bwtupdate     update .bwt to the new format
         bwt2sa        generate SA from BWT and Occ

$ bwa
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.6.2-r126
Contact: Heng Li <lh3@sanger.ac.uk>

Usage:   bwa <command> [options]
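The last two commands show why PATH matters after a source build: `./bwa` runs the freshly compiled 0.7.4 binary in the current directory, while plain `bwa` still resolves to an older copy found through PATH. The sketch below illustrates that lookup order with a dummy command. `resolve_tool` and `demo_tool` are made-up names and no real bwa installation is involved.

```shell
#!/bin/sh
# resolve_tool NAME PATH_VALUE: print the file the shell would run for NAME
# if PATH were set to PATH_VALUE. The subshell keeps the caller's PATH intact.
resolve_tool() {
    ( PATH="$2"; command -v "$1" )
}

# Two dummy copies of the same command in separate directories.
demo_root=$(mktemp -d)
mkdir "$demo_root/old" "$demo_root/new"
for d in old new; do
    printf '#!/bin/sh\necho "demo_tool from %s"\n' "$d" > "$demo_root/$d/demo_tool"
    chmod +x "$demo_root/$d/demo_tool"
done

# The directory listed first in PATH wins.
resolve_tool demo_tool "$demo_root/old:$demo_root/new"    # path ends in old/demo_tool
resolve_tool demo_tool "$demo_root/new:$demo_root/old"    # path ends in new/demo_tool
rm -rf "$demo_root"
```

By the same rule, prepending the build directory with `export PATH=...:$PATH`, as shown in section 7.4, would make the newly built version the one that plain `bwa` runs.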