Highly Sensitive Cloud-Based Read Mapping with CloudBurst

CloudBurst
• CloudBurst: Highly Sensitive Short Read
Mapping with MapReduce

• New parallel read-mapping algorithm
optimized for mapping NGS data to the
human genome and other reference
genomes

• Applications: SNP discovery, genotyping, and
personal genomics

CloudBurst
• It is modeled after the short read mapping
program RMAP

• Reports either all alignments or the unambiguous
best alignment for each read with any number of
mismatches or differences

• This level of sensitivity could be prohibitively time
consuming, but CloudBurst uses the open-source
Hadoop implementation of MapReduce to
parallelize execution using multiple compute
nodes.

CloudBurst
• Running time
– scales linearly with the number of reads mapped
– with near linear speedup as the number of
processors increases.

• CloudBurst reduces the running time from
hours to mere minutes for typical jobs
involving mapping of millions of short reads to
the human genome.

Algorithm Overview
• CloudBurst uses seed-and-extend algorithms to
map reads to a reference genome.

• Seed
– For a read of length r aligned with at most k differences,
the alignment must contain a region of length s = r/(k+1),
called a seed, that exactly matches the reference.

• Extend
– CloudBurst attempts to extend the alignment into an
end-to-end alignment with at most k mismatches or
differences
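The seed rule above can be sketched in a few lines of Python. This is an illustration only, not CloudBurst's own code: the function names `seed_length` and `read_seeds` are hypothetical, and rounding s down to an integer is an assumption for reads whose length is not a multiple of k+1.

```python
def seed_length(read_len: int, k: int) -> int:
    """Seed length s = floor(r / (k + 1)) for read length r and at most k differences."""
    if read_len <= 0 or k < 0:
        raise ValueError("read_len must be positive and k non-negative")
    s = read_len // (k + 1)
    if s == 0:
        raise ValueError("k is too large for this read length")
    return s


def read_seeds(read: str, k: int) -> list[tuple[int, str]]:
    """Split a read into non-overlapping length-s seeds, returned as (offset, seed)."""
    s = seed_length(len(read), k)
    return [(i, read[i:i + s]) for i in range(0, len(read) - s + 1, s)]


if __name__ == "__main__":
    # A 36 bp read with up to 3 differences is cut into 4 seeds of 9 bp each.
    for offset, seed in read_seeds("ACGT" * 9, 3):
        print(offset, seed)
```

Because a read yields at least k+1 non-overlapping seeds and k differences can touch at most k of them, at least one seed must match the reference exactly. That surviving seed is what the extend step starts from.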

Algorithm Overview
• CloudBurst uses the Hadoop implementation of
MapReduce to catalog and extend the seeds

• Map phase emits
– all length-s k-mers from the reference sequences
– all non-overlapping length-s k-mers from the reads

• Shuffle phase
– read and reference k-mers that share the same sequence are brought together

• Reduce phase
– the seeds are extended into end-to-end alignments
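The three phases above can be imitated in a single Python process to show how the data flows. This is a minimal sketch under stated assumptions, not the Hadoop implementation: all names (`map_reference`, `map_read`, `shuffle`, `reduce_seed`, `map_reads`) are hypothetical, all reads are assumed to have the same length, only the forward strand is searched, and the extension counts mismatches only (no insertions or deletions).

```python
from collections import defaultdict


def map_reference(name, seq, s):
    """Map: emit every length-s k-mer of a reference sequence, keyed by the k-mer."""
    for pos in range(len(seq) - s + 1):
        yield seq[pos:pos + s], ("ref", name, pos)


def map_read(name, seq, s):
    """Map: emit the non-overlapping length-s k-mers of a read, keyed by the k-mer."""
    for off in range(0, len(seq) - s + 1, s):
        yield seq[off:off + s], ("read", name, off)


def shuffle(pairs):
    """Shuffle: group all emitted values by their k-mer key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_seed(values, reference, reads, k):
    """Reduce: extend each shared seed into an end-to-end alignment (mismatches only)."""
    ref_hits = [v for v in values if v[0] == "ref"]
    read_hits = [v for v in values if v[0] == "read"]
    for _, ref_name, ref_pos in ref_hits:
        for _, read_name, off in read_hits:
            read, ref = reads[read_name], reference[ref_name]
            start = ref_pos - off  # where the whole read would start on the reference
            if start < 0 or start + len(read) > len(ref):
                continue
            window = ref[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= k:
                yield read_name, ref_name, start, mismatches


def map_reads(reference, reads, k):
    """Run map, shuffle, and reduce in memory; return sorted, de-duplicated alignments."""
    read_len = len(next(iter(reads.values())))  # assumption: equal-length reads
    s = read_len // (k + 1)
    pairs = []
    for name, seq in reference.items():
        pairs.extend(map_reference(name, seq, s))
    for name, seq in reads.items():
        pairs.extend(map_read(name, seq, s))
    alignments = set()
    for values in shuffle(pairs).values():
        alignments.update(reduce_seed(values, reference, reads, k))
    return sorted(alignments)


if __name__ == "__main__":
    reference = {"chr": "TTTTACGTACGTACGGGG"}
    reads = {"r1": "ACGTACGTAC", "r2": "ACGTACGTAG"}
    for hit in map_reads(reference, reads, k=1):
        print(hit)  # (read name, reference name, start position, mismatches)
```

The same alignment can be found through more than one seed, which is why the sketch collects results in a set before sorting them. In a real MapReduce job each k-mer group would be handled by a separate reduce call, possibly on a different node.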

Demo

See Getting Started.docx

Related Tools
• Bowtie: Ultrafast short read alignment
• SoapSNP: Accurate SNP/consensus calling
• Tophat: RNA-Seq splice junction mapper
• Cufflinks: Isoform assembly, quantitation
• Hadoop: Open Source MapReduce
• CloudBurst: Sensitive MapReduce alignment
• Crossbow: Read Mapping and SNP calling in the clouds
• Jnomics: Cloud-Scale Sequence Analysis
• Contrail: Cloud-based de novo assembly
• Myrna: Cloud-Scale differential expression of RNAseq

Figure 1: A MapReduce approach for detecting genetic variants from high-throughput genome sequencing.

Source: http://www.nature.com/nbt/journal/v30/n3/fig_tab/nbt.2134_F1.html
