Bachelor Project:
Real-time Analysis of Genome Data
                       July 12, 2012




               Matthieu-P. Schapranow
               Hasso Plattner Institute
           Chair of Prof. Hasso Plattner
Numbers you should know
    The Human Genome Project


      ■  1984: Human Genome (HG) project idea
         discussed at Alta Summit as “DNA
         available on the Internet”
      ■  1990: 15-year HG project launched in
         the US (3 billion USD funding)
      ■  2000: Rough draft of the HG announced
      ■  2003: Complete genome sequenced
      ■  2006: Last and longest chr1 sequenced
      ■  As of today, we know:
        □  HG consists of 3.2 Bbp (~3.2 GB),
        □  23 chromosome pairs,
        □  20k-25k distinct genes
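
The "3.2 Bbp (~3.2 GB)" figure follows from storing one base per byte. A minimal sketch of that arithmetic, assuming decimal gigabytes; the 2-bit packing of the four bases A, C, G, T is an extra illustration that is not on the slide:

```python
# Back-of-the-envelope storage estimate for one human genome.
# Assumptions (not from the slide): decimal units (1 GB = 10**9 bytes)
# and a hypothetical 2-bit packing of the four bases A, C, G, T.

BASE_PAIRS = 3_200_000_000  # 3.2 Bbp, as stated on the slide


def genome_size_gb(base_pairs: int, bits_per_base: int) -> float:
    """Return the storage needed for `base_pairs` bases, in gigabytes."""
    total_bytes = base_pairs * bits_per_base / 8
    return total_bytes / 10**9


one_byte_per_base = genome_size_gb(BASE_PAIRS, bits_per_base=8)
two_bits_per_base = genome_size_gb(BASE_PAIRS, bits_per_base=2)

print(f"8 bits per base: {one_byte_per_base:.1f} GB")
print(f"2 bits per base: {two_bits_per_base:.1f} GB")
```

Either way, a single reference-sized genome is small compared with the main memory of one server, which is what makes in-memory processing plausible here.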

    Real-time Analysis of Genome Data, M. Schapranow, July 12, 2012
Numbers you should know
    Comparison of Costs for Main Memory and Genome Analysis

      [Figure: costs per megabyte of RAM versus costs per megabase of sequencing, in USD on a logarithmic axis from 0.01 to 10,000, plotted from January 2001 to January 2012.]

Numbers you should know
    Hardware Characteristics


      ■  1,000 core cluster,
         25 TB main memory
      ■  Consists of 25 identical nodes:
            □  80 cores (hardware threads; 40 physical cores per node)
            □  1 TB main memory
            □  Intel® Xeon® E7-4870
            □  2.40GHz
            □  30 MB Cache
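
To put these numbers in relation to the genome size given earlier, here is a rough capacity estimate. It is only a sketch under stated assumptions: decimal units, one byte per base, and no overhead for indexes, quality scores, or the database itself.

```python
# How many reference-sized genomes fit into the cluster's main memory?
# Figures from the slides: 25 nodes with 1 TB each; ~3.2 GB per genome.
# Assumptions (not from the slides): decimal units (1 TB = 1000 GB),
# one byte per base, and no overhead for indexes or metadata.

NODES = 25
MEMORY_PER_NODE_TB = 1
GENOME_SIZE_GB = 3.2

total_memory_tb = NODES * MEMORY_PER_NODE_TB

# Whole genomes that fit on a single node without being split across nodes.
genomes_per_node = int(MEMORY_PER_NODE_TB * 1000 / GENOME_SIZE_GB)
genomes_in_cluster = genomes_per_node * NODES

print(f"Total main memory:  {total_memory_tb} TB")
print(f"Genomes per node:   {genomes_per_node}")
print(f"Genomes in cluster: {genomes_in_cluster}")
```

Raw sequencing output for one sample is typically much larger than the assembled genome, so the practical number of samples held in memory would be lower.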




Aims of the Bachelor’s Project


      ■  Gather interdisciplinary knowledge to work in
         teams with biological and medical experts
      ■  Explore data from gene, protein, drug, and
         pathway databases to gain new insights
      ■  Implement algorithms optimized for in-memory
         technology, e.g. clustering algorithms for quantifying
         the similarity of samples or for detecting single
         nucleotide polymorphisms
      ■  Prove the applicability of in-memory technology for
         real-time analysis of genome data
      ■  Areas of interest: life sciences, crop sciences,
         biology, crime investigation, etc.
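
As an illustration of the kind of algorithm named in the aims above, the following is a deliberately naive sketch of single nucleotide polymorphism (SNP) candidate detection and a simple similarity measure. The function names and the toy sequences are hypothetical. Real pipelines work on aligned reads with quality scores and statistical variant calling, not on plain string comparison.

```python
# Naive SNP candidate detection between two already-aligned sequences,
# plus a simple similarity score. Hypothetical helpers for illustration
# only; they are not taken from the project itself.

from typing import List, Tuple


def find_snp_candidates(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (0-based position, reference base, sample base) per mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


def similarity(a: str, b: str) -> float:
    """Fraction of positions at which two aligned sequences agree."""
    if not a and not b:
        return 1.0
    return 1 - len(find_snp_candidates(a, b)) / len(a)


reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"

candidates = find_snp_candidates(reference, sample)
score = similarity(reference, sample)

print(candidates)
print(f"similarity: {score:.2f}")
```

A pairwise score of this kind could serve as the distance input to a clustering algorithm that groups similar samples.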


Your profile


      ■  What we expect
            □  Flexibility in interdisciplinary work
            □  At least one completed database course
            □  Working knowledge of one or more of: Python, C++, Bash, SQL




      ■  We provide you with
            □  Introduction to in-memory technology and genomics basics
            □  Introductions to one or more of: SQL, SQLScript, L, R,
               BFL



Do not hesitate to contact us!




                                                                    Matthieu-P. Schapranow, M.Sc.
                                                                  schapranow@hpi.uni-potsdam.de
                                                                           http://j.mp/schapranow




                                                                    Hasso Plattner Institute
                                                Enterprise Platform & Integration Concepts
                                                                    Matthieu-P. Schapranow
                                                                      August-Bebel-Str. 88
                                                                  14482 Potsdam, Germany

