2. Numbers you should know
The Human Genome Project
• 1984: Idea of a Human Genome (HG) project discussed at the Alta Summit as “DNA available on the Internet”
• 1990: HG project launched in the US, planned for 15 years (3 billion USD funding)
• 2000: Rough draft of the HG announced
• 2003: Complete genome sequenced
• 2006: Sequence of chromosome 1, the last and longest, completed
• As of today, we know:
  ◦ HG consists of 3.2 Bbp (billion base pairs, ~3.2 GB),
  ◦ 23 chromosomes,
  ◦ 20k-25k distinct genes
Real-time Analysis of Genome Data, M. Schapranow, July 12, 2012
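As a rough check of the "3.2 Bbp (~3.2 GB)" figure, the following sketch estimates the storage needed for one genome under two encodings. The one-byte-per-base and two-bits-per-base encodings are assumptions made here for illustration, as are the function and constant names; the slides do not state which encoding is meant.

```python
# Back-of-the-envelope storage estimate for one human genome.
# Assumptions (not from the slides): decimal units (1 GB = 10**9 bytes)
# and two illustrative encodings for the four bases A, C, G, T.

GENOME_BASE_PAIRS = 3.2e9  # 3.2 Bbp, as stated on the slide


def genome_size_gb(base_pairs: float, bits_per_base: int) -> float:
    """Return the storage size in GB for the given encoding."""
    total_bytes = base_pairs * bits_per_base / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # One ASCII character (8 bits) per base, e.g. plain-text sequence data.
    print(f"8 bits/base: {genome_size_gb(GENOME_BASE_PAIRS, 8):.1f} GB")
    # Two bits are enough to distinguish four bases.
    print(f"2 bits/base: {genome_size_gb(GENOME_BASE_PAIRS, 2):.1f} GB")
```

Under the one-byte-per-base assumption the estimate matches the ~3.2 GB quoted above; a packed two-bit encoding would need about a quarter of that.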
3. Numbers you should know
Comparison of Costs
[Figure: Comparison of Costs for Main Memory and Genome Analysis. Series: costs per megabyte of RAM and costs per megabase of sequencing. Y-axis: costs in USD, logarithmic scale from 0.01 to 10,000. X-axis: January 2001 to January 2012.]
4. Numbers you should know
Hardware Characteristics
• 1,000 core cluster, 25 TB main memory
• Consists of 25 identical nodes:
  ◦ 80 cores
  ◦ 1 TB main memory
  ◦ Intel® Xeon® E7-4870
  ◦ 2.40 GHz
  ◦ 30 MB cache
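To relate these hardware figures to the genome size quoted earlier (3.2 Bbp, ~3.2 GB), the sketch below estimates how many uncompressed genomes would fit into main memory. It assumes decimal units, one byte per base, and that all memory is available for sequence data, which ignores operating system, database, and index overhead. All names are illustrative.

```python
# Rough capacity estimate: uncompressed genomes per node and per cluster.
# Assumptions (not from the slides): 1 TB = 10**12 bytes, one byte per base,
# and all main memory usable for sequence data (no overhead).

NODES = 25                          # identical nodes, from the slide
MEMORY_PER_NODE_BYTES = 10**12      # 1 TB main memory per node
GENOME_BYTES = 3_200_000_000        # ~3.2 GB per genome


def genomes_that_fit(memory_bytes: int, genome_bytes: int = GENOME_BYTES) -> int:
    """Return how many whole genomes fit into the given amount of memory."""
    return memory_bytes // genome_bytes


if __name__ == "__main__":
    per_node = genomes_that_fit(MEMORY_PER_NODE_BYTES)
    cluster = genomes_that_fit(NODES * MEMORY_PER_NODE_BYTES)
    print(f"Genomes per node:    {per_node}")
    print(f"Genomes per cluster: {cluster}")
```

These are upper bounds under the stated assumptions; real workloads also hold reads, annotations, and intermediate results in memory.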
5. Aims of the Bachelor's Project
• Gather interdisciplinary knowledge to work in teams with biological and medical experts
• Explore data from gene, protein, drug, and pathway databases to gain new insights
• Implement algorithms optimized for in-memory technology, e.g. clustering algorithms for quantifying the similarity of samples, or the detection of single nucleotide polymorphisms
• Prove the applicability of in-memory technology for real-time analysis of genome data
• Areas of interest: life sciences, crop sciences, biology, crime investigation, etc.
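To make the single nucleotide polymorphism example concrete, the following is a minimal sketch of a naive detection step: it compares a sample sequence with a reference of equal length and reports candidate positions where the bases differ. It assumes the sequences are already aligned and ignores insertions, deletions, and sequencing errors. The function name and the toy sequences are hypothetical and not taken from the project.

```python
from typing import List, Tuple


def find_snps(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each mismatch.

    Positions are 0-based. Both sequences must be pre-aligned and of
    equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have equal length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


if __name__ == "__main__":
    reference = "ACGTACGTAC"  # toy data, for illustration only
    sample = "ACGTTCGTAA"
    for pos, ref_base, alt_base in find_snps(reference, sample):
        print(f"position {pos}: {ref_base} -> {alt_base}")
```

In practice, variant detection works on many aligned reads with quality scores, so this only illustrates the core comparison.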
6. Your profile
• What we expect
  ◦ Flexibility to work across disciplines
  ◦ At least one passed database lecture
  ◦ Knowledge of one or more of: Python, C++, Bash, SQL
• We provide you with
  ◦ Introduction to in-memory technology and genomics basics
  ◦ Technology introduction to one or more of: SQL, SQLScript, L, R, BFL
7. Do not hesitate to contact us!
Matthieu-P. Schapranow, M.Sc.
schapranow@hpi.uni-potsdam.de
http://j.mp/schapranow
Hasso Plattner Institute
Enterprise Platform & Integration Concepts
Matthieu-P. Schapranow
August-Bebel-Str. 88
14482 Potsdam, Germany