Mapping Informatics To the Cloud


                    2012 AIRI Petabyte Challenge
                                  Chris Dagdigian
                               chris@bioteam.net
I’m Chris.

I’m an infrastructure
geek.

I work for the
BioTeam.
The “C” Word.
When I say “cloud”
 I’m talking IaaS.
Amazon AWS
      Is the IaaS cloud.
         Most others are fooling themselves.
(Has-beens, also-rans & delusional marketing zombies)
A message for the
  pretenders…
No APIs?
Not a cloud.
No self-service?
 Not a cloud.
I have to email a human?
       Not a cloud.
~50% failure rate when
provisioning new servers?
    Stupid cloud.
Block storage
and virtual servers only?
    (Barely) a cloud.
Private Clouds: My $.02
Private Clouds in 2012:

• Hype vs. Reality ratio still wacky
• Sensible only for certain shops
 •   Have you seen what you have to do to your networks & gear?

• There are easier ways
Private Clouds: My Advice for ’12

• Remain cynical (test vendor claims)
• Due Diligence still essential
• I personally would not deploy/buy
  anything that does not explicitly provide
  Amazon API compatibility
Private Clouds: My Advice for ’12

• Most people are better off:
  • Adding VM platforms to existing HPC
    clusters & environments
  • Extending enterprise VM platforms to
    allow user self-service & server
    catalogs
Enough Bloviating. Advice time.
Tip #1
HPC & Clouds: Whole New World
• We have spent decades learning
  to tune research HPC systems
  for shared access & many users.

• The cloud upends this model
• Far more common to see …
 • Dedicated cloud resources
   spun up for each app or use case
 • Each system gets individually
   tuned & optimized
Tip #2
Hybrid Clouds & Cloud Bursting
• Lots of aggressive marketing
• Lots of carefully constructed “case
  studies” and prototypes
• The truth?
  • Less usable than you’ve been told
  • Possible? Heck yeah.
  • Practical? Only sometimes.
• Advice
  • Be cynical
  • Demand proof
  • Test carefully
• Still want to do it?
  • Buy it, don’t build it
   •   Cycle Computing
   •   Univa
   •   BrightComputing
   •   …
• Follow the crowd
• In the real world we see:
 • Separation between local
   and cloud HPC resources
 • Send your work to the
   system most suitable
Tip #3
You can’t rewrite EVERYTHING.
• Salesfolk will just glibly tell
  you to rewrite your apps so
  you can use whatever big
  data analysis framework
  they happen to be selling
  today
• They have no clue.
• In life science informatics
  we have hundreds of codes
  that will never be rewritten.
• We’ll be needing them for
  years to come.
• Advice:
 • MapReduceish methods are
   the future for big-data
   informatics
 • It will take years to get there
 • We still have to deal with
   legacy algorithms and codes
• You will need:
 • A process for figuring out
    when it’s worthwhile to
   rewrite/re-architect
 • Tested cloud strategies for
   handling three use cases
You need 3 cloud
architectures:

 1. Legacy HPC
 2. “Cloudy” HPC
 3. Big Data HPC (Hadoop)
Legacy HPC on the cloud

•       MIT StarCluster
•       http://web.mit.edu/star/cluster/
• This is your baseline
    •     Extend as needed
“Cloudy” HPC

•   Use this method when …
•   It makes sense to rewrite or
    rearchitect an HPC workflow to
    better leverage modern cloud
    capabilities
“Cloudy” HPC, continued

•       Ditch the legacy compute farm model
•       Leverage elastic scale-out tools (***)
    •     Spot Instances for elastic & cheap compute
    •     SimpleDB for job statekeeping
    •     SQS for job queues & workflow “glue”
    •     SNS for message passing & monitoring
    •     S3 for input & output data
    •     Etc.
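
The shape of that pattern, sketched very loosely: workers pull jobs off a queue, record state somewhere durable, and write results out. Everything below is a local stand-in and an assumption for illustration: `queue.Queue` plays the role of SQS, a plain dict plays SimpleDB, another dict plays S3, and `run_workers` is a hypothetical name. No AWS calls are made.

```python
import queue


def run_workers(jobs, handler):
    """Drain a job queue, recording per-job state as each job finishes."""
    job_queue = queue.Queue()   # stand-in for an SQS queue
    job_state = {}              # stand-in for a SimpleDB domain
    results = {}                # stand-in for output objects in S3

    for job_id, payload in jobs.items():
        job_queue.put((job_id, payload))
        job_state[job_id] = "queued"

    while not job_queue.empty():
        job_id, payload = job_queue.get()
        job_state[job_id] = "running"
        try:
            results[job_id] = handler(payload)
            job_state[job_id] = "done"
        except Exception:
            # A monitor could re-queue anything left in the "failed" state.
            job_state[job_id] = "failed"
        finally:
            job_queue.task_done()
    return job_state, results


# Made-up jobs: square a number.
state, results = run_workers({"a": 2, "b": 3}, lambda n: n * n)
```

Because state lives outside the worker, a node that vanishes mid-job (think Spot Instances) loses only the job it was holding.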
Big Data HPC

•   It’s gonna be a MapReduce world
•   Little need to roll your own
•   Ecosystem already healthy
•   Multiple providers today
•   Often a slam-dunk cloud use case
Tip #4
The Cloud was not designed for
            “us”
• HPC is an edge case for the
  hyperscale IaaS clouds
• We need to deal with this
  and engineer around it.
• Many examples
  • Eventual consistency
  • Networking & subnets
  • Latency
  • Node placement
• Advice
  • Manage expectations
  • Benchmark & test
  • Evangelize
   • (pester the cloud sales reps …)
Tip #5
Data Movement Is Still Hard
• Consistently getting easier
  • Amazon is not a
    bottleneck
  • AWS Import/Export
  • AWS Direct Connect
  • Aspera has some amazing
    stuff out right now
• Advice
 • AWS Import/Export works well
 • Size of pipe is not everything
 • Sweat the small stuff
   • Tracking, checksums, disk speed
   • Dedicated workstations
   • Secure media storage
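
One way to sweat the checksum part, as a minimal sketch: build a manifest before the data leaves, verify against it after it lands. The function names and the demo file are hypothetical, and SHA-256 is just one reasonable choice of hash, not something the slides prescribe.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so terabyte-scale files never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's relative path to its checksum, before shipping."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(manifest, root):
    """Return relative paths that are missing or changed after the move."""
    root = Path(root)
    return [
        rel for rel, expected in manifest.items()
        if not (root / rel).is_file() or sha256_of(root / rel) != expected
    ]


# Tiny demo on a throwaway directory (file name and contents are made up).
with tempfile.TemporaryDirectory() as scratch:
    sample = Path(scratch) / "reads.fastq"
    sample.write_bytes(b"ACGT" * 1000)
    manifest = build_manifest(scratch)
    clean = verify(manifest, scratch)        # nothing wrong yet
    sample.write_bytes(b"ACGA" * 1000)       # simulate corruption in transit
    corrupted = verify(manifest, scratch)
```

Ship the manifest alongside the data; an empty list from `verify` on the far side is the tracking record that the transfer arrived intact.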
Dedicated data movement station
‘naked’ Terabyte-scale data movement
Don’t overlook media storage …
• Advice for 2012
 • BioTeam is dialing down our
    advocacy of physical data
    ingestion into the cloud
 • Why?
   • Operationally hard, expensive
      and no longer strictly needed
Real world cross-country
internet-based data movement
[Figure: March 2012]
700Mb/sec into Amazon, stress-free & zero tuning
[Figure: March 2012]
• People trying to move data via
  physical media quickly realize the
  operational difficulties
• Bandwidth is cheaper than hiring
  another body to manage physical
  data ingestion & movement
• In 2012 we strongly recommend
  network-based data movement
  when at all possible
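
Back-of-envelope arithmetic for network transfers, as a sketch. It assumes the 700Mb/sec figure means megabits per second, that the rate is sustained, and that terabytes are decimal; `transfer_hours` is a hypothetical helper name.

```python
def transfer_hours(size_tb, rate_mbps):
    """Hours to move size_tb terabytes at a sustained rate_mbps megabits/sec."""
    bits = size_tb * 1e12 * 8            # decimal terabytes -> bits
    seconds = bits / (rate_mbps * 1e6)   # megabits/sec -> bits/sec
    return seconds / 3600


one_tb_hours = transfer_hours(1, 700)          # a bit over 3 hours
one_pb_days = transfer_hours(1000, 700) / 24   # roughly four months
```

Under those assumptions a terabyte moves in an afternoon, while a full petabyte over the same pipe is still a months-long project.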
u r doing it wrong
cool data movement, bro!
Tips #6 & 7
Cloud storage. Still slow.
Big shared storage. Still hard.
• Not much we can do except
  engineer around it
• AWS compute cluster
  instances are a huge step
  forward
• AWS competitors take note
• We are not database nerds
• We care about more than
   just random IO performance
• We need it all
  • Random I/O
  • Long sequential read/write
• Faster Storage Options
  • Software RAID on EBS
  • Various GlusterFS options
• Even if you optimize
   everything, the virtual NICs
   are still a bottleneck
• Big Shared Storage
  • 10GbE nodes and NFS
  • Software RAID sets
  • GlusterFS or similar
  • 2012: pNFS finally?
Tip #8
Things fail differently in the cloud.
• Stuff breaks
• It breaks in weird ways
• Transient/temporary issues
  more common than what we
  see “at home”
• Advice
  • Pessimism is good
  • Design for failure
  • Think hard about
    • How will you detect?
    • How will you respond?
• Advice
  • Remove humans from
    loop
  • Automate recovery
  • Automate your backups
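
A minimal sketch of taking the human out of the loop for transient failures: retry with exponential backoff, and only raise once the retries run out. The helper name, the defaults, and the flaky demo operation are all assumptions for illustration.

```python
import time


def call_with_retries(operation, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run operation(), retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # Real code should catch only errors known to be transient.
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to monitoring
            sleep(base_delay * 2 ** attempt)


# Demo: a made-up operation that fails twice, then succeeds.
calls = {"count": 0}
delays = []


def flaky_upload():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient network hiccup")
    return "ok"


# Record the backoff delays instead of actually sleeping.
result = call_with_retries(flaky_upload, sleep=delays.append)
```

Passing `sleep` in as a parameter keeps the demo instant and makes the backoff schedule easy to check.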
Tip #9
Serial/batch computing at-scale
• Loosely coupled workflows
  are ideal
• Break the pipeline into
  discrete components
• Components should be able
  to scale up|down
  independently
• Component = Opportunity to:
 • … Make a scaling decision
   •   (# nodes in use)
 • … Make a sizing decision
   •   (instance type in use)
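
What a per-component scaling decision might look like, as a rough sketch: size the node count from the queue depth and a deadline, capped by a budget. The function name, parameters, and numbers are hypothetical.

```python
import math


def nodes_needed(queued_jobs, jobs_per_node_hour, target_hours, max_nodes):
    """Nodes required to drain the queue within target_hours, capped at max_nodes."""
    if queued_jobs <= 0:
        return 0  # nothing queued: scale this component down to zero
    wanted = math.ceil(queued_jobs / (jobs_per_node_hour * target_hours))
    return min(wanted, max_nodes)


relaxed = nodes_needed(1000, 10, 4, 50)   # four-hour deadline
urgent = nodes_needed(1000, 10, 1, 50)    # one-hour deadline hits the cap
idle = nodes_needed(0, 10, 4, 50)
```

Each pipeline component can run its own copy of this decision with its own throughput figure, which is what lets components scale up or down independently.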
Nirvana is …
… independent loosely
connected components that
can self-scale and
communicate asynchronously
Advice:
• Many people already doing
  this
• Best practices are well known
• Steal from the best:
  • RightScale, Opscode &
    Cycle Computing
Phew. Think I’m done now.
Questions?
       Slides available at
http://slideshare.net/chrisdag/
End;
Backup Slides
Private Clouds: Pick Your Poison

• OpenStack - http://openstack.org
  • Pro: Super smart developers;
    significant mindshare; True
    Open Source
  • Con: Commitment to AWS API
    compatibility (?) & stability
Private Clouds: Pick Your Poison

• CloudStack - http://cloudstack.org
  • Pro: Explicit AWS API support;
    very recent move away from
    “open-core” model; usability
  • Con: Developer mindshare?
    Sudden switch to Apache
Private Clouds: Pick Your Poison

• Eucalyptus - http://eucalyptus.com
  • Pro: Direct AWS API
    compatibility; lots of hypervisor
    support
  • Con: Open-core model;
    mindshare; Recent resurrection

Mapping Life Science Informatics to the Cloud

  • 1. Mapping Informatics To the Cloud 2012 AIRI Petabyte Challenge Chris Dagdigian chris@bioteam.net
  • 2. I'm Chris. I'm an infrastructure geek. I work for the BioTeam.
  • 4. When I say “cloud” I'm talking IaaS.
  • 5. Amazon AWS Is the IaaS cloud. Most others are fooling themselves. (Has-beens, also-rans & delusional marketing zombies)
  • 6. A message for the pretenders…
  • 7. No APIs? Not a cloud.
  • 9. I have to email a human? Not a cloud.
  • 10. ~50% failure rate when provisioning new servers? Stupid cloud.
  • 11. Block storage and virtual servers only? (barely) a cloud;
  • 13. Private Clouds in 2012: • Hype vs. Reality ratio still wacky • Sensible only for certain shops • Have you seen what you have to do to your networks & gear? • There are easier ways
  • 14. Private Clouds: My Advice for '12 • Remain cynical (test vendor claims) • Due Diligence still essential • I personally would not deploy/buy anything that does not explicitly provide Amazon API compatibility
  • 15. Private Clouds: My Advice for '12 • Most people are better off: • Adding VM platforms to existing HPC clusters & environments • Extending enterprise VM platforms to allow user self-service & server catalogs
  • 18. HPC & Clouds: Whole New World
  • 19. • We have spent decades learning to tune research HPC systems for shared access & many users. • The cloud upends this model
  • 20. • Far more common to see … • Dedicated cloud resources spun up for each app or use case • Each system gets individually tuned & optimized
  • 22. Hybrid Clouds & Cloud Bursting
  • 23. • Lots of aggressive marketing • Lots of carefully constructed “case studies” and prototypes • The truth? • Less usable than you've been told • Possible? Heck yeah. • Practical? Only sometimes.
  • 24. • Advice • Be cynical • Demand proof • Test carefully
  • 25. • Still want to do it? • Buy it, don't build it • Cycle Computing • Univa • BrightComputing • …
  • 26. • Follow the crowd • In the real world we see: • Separation between local and cloud HPC resources • Send your work to the system most suitable
  • 28. You can't rewrite EVERYTHING.
  • 29. • Salesfolk will just glibly tell you to rewrite your apps so you can use whatever big data analysis framework they happen to be selling today
  • 30. • They have no clue.
  • 31. • In life science informatics we have hundreds of codes that will never be rewritten. • We'll be needing them for years to come.
  • 32. • Advice: • MapReduceish methods are the future for big-data informatics • It will take years to get there • We still have to deal with legacy algorithms and codes
  • 33. • You will need: • A process for figuring out when it's worthwhile to rewrite/re-architect • Tested cloud strategies for handling three use cases
  • 34. You need 3 cloud architectures: 1. Legacy HPC 2. “Cloudy” HPC 3. Big Data HPC (Hadoop)
  • 35. Legacy HPC on the cloud • MIT StarCluster • http://web.mit.edu/star/cluster/ • This is your baseline • Extend as needed
  • 36. “Cloudy” HPC • Use this method when … • It makes sense to rewrite or rearchitect an HPC workflow to better leverage modern cloud capabilities
  • 37. “Cloudy” HPC, continued • Ditch the legacy compute farm model • Leverage elastic scale-out tools (***) • Spot Instances for elastic & cheap compute • SimpleDB for job statekeeping • SQS for job queues & workflow “glue” • SNS for message passing & monitoring • S3 for input & output data • Etc.
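To make the "SQS for job queues, S3 for input & output data" idea concrete, here is a minimal worker sketch. It is an illustration, not anything from the talk: the job message format (`id`, `input_key`), the `results/` key prefix, and the `run_tool` callable are all hypothetical. The `sqs` and `s3` arguments are assumed to be boto3-style clients; only the standard `receive_message`, `delete_message`, `get_object` and `put_object` calls are used.

```python
import json


def process_one_job(sqs, s3, queue_url, bucket, run_tool):
    """Pull one job from the queue, run it, store the result.

    Returns the job id, or None when the queue had nothing to hand out.
    `sqs` and `s3` are assumed to be boto3-style clients, e.g.
    boto3.client("sqs") and boto3.client("s3").
    """
    resp = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=20
    )
    messages = resp.get("Messages", [])
    if not messages:
        return None  # nothing queued; a caller could use this to shut the node down

    msg = messages[0]
    # Hypothetical job format: {"id": "...", "input_key": "..."}
    job = json.loads(msg["Body"])

    data = s3.get_object(Bucket=bucket, Key=job["input_key"])["Body"].read()
    result = run_tool(data)  # the legacy code or new algorithm being wrapped
    s3.put_object(Bucket=bucket, Key=f"results/{job['id']}", Body=result)

    # Delete only after the result is safely stored. If this node dies
    # mid-job (a reclaimed Spot Instance, say), the message becomes visible
    # again after its visibility timeout and another worker picks it up.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
    return job["id"]
```

Because the queue holds the work and S3 holds the data, any number of these workers can come and go without a central scheduler, which is what makes cheap, interruptible compute usable.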
  • 38. Big Data HPC • It's gonna be a MapReduce world • Little need to roll your own • Ecosystem already healthy • Multiple providers today • Often a slam-dunk cloud use case
  • 40. The Cloud was not designed for “us”
  • 41. • HPC is an edge case for the hyperscale IaaS clouds • We need to deal with this and engineer around it.
  • 42. • Many examples • Eventual consistency • Networking & subnets • Latency • Node placement
  • 43. • Advice • Manage expectations • Benchmark & test • Evangelize • (pester the cloud sales reps …)
  • 45. Data Movement Is Still Hard
  • 46. • Consistently getting easier • Amazon is not a bottleneck • AWS Import/Export • AWS Direct Connect • Aspera has some amazing stuff out right now
  • 47. • Advice • AWS Import/Export works well • Size of pipe is not everything • Sweat the small stuff • Tracking, checksums, disk speed • Dedicated workstations • Secure media storage
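One way to "sweat the small stuff" on tracking and checksums is to build a checksum manifest before the data leaves and verify it on arrival. The sketch below is an assumption about how that could be done, not a description of BioTeam's process; the function names and the choice of SHA-256 are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in 1 MiB chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return the relative paths that are missing or whose checksum differs."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad
```

Run `build_manifest` at the source, ship the manifest alongside the data, and run `verify` at the destination; an empty list means every file arrived intact.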
  • 50. Don't overlook media storage …
  • 51. • Advice for 2012 • BioTeam is dialing down our advocacy of physical data ingestion into the cloud • Why? • Operationally hard, expensive and no longer strictly needed
  • 52. Real world cross-country internet-based data movement March 2012
  • 53. 700Mb/sec into Amazon, stress-free & zero tuning March 2012
  • 54. • People trying to move data via physical media quickly realize the operational difficulties • Bandwidth is cheaper than hiring another body to manage physical data ingestion & movement • In 2012 we strongly recommend network-based data movement when at all possible
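A quick back-of-the-envelope calculation shows why network transfer is now practical. The sketch below assumes the "700Mb/sec" figure means megabits per second, that the rate is sustained for the whole transfer, and that sizes use decimal units (1 TB = 10^12 bytes); none of these assumptions are stated on the slides.

```python
def transfer_hours(size_tb, rate_mbps):
    """Hours needed to move size_tb terabytes at rate_mbps megabits per second."""
    if size_tb < 0 or rate_mbps <= 0:
        raise ValueError("size must be non-negative and rate must be positive")
    bits = size_tb * 1e12 * 8          # decimal terabytes to bits
    seconds = bits / (rate_mbps * 1e6)  # megabits per second to bits per second
    return seconds / 3600


if __name__ == "__main__":
    for size in (1, 10, 100):
        print(f"{size:>4} TB at 700 Mb/s: {transfer_hours(size, 700):.1f} hours")
```

Under those assumptions 1 TB takes a little over three hours, so tens of terabytes move in days rather than weeks. That is the range where shipping disks stops paying for its operational overhead.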
  • 55. u r doing it wrong
  • 59. Big shared storage. Still hard.
  • 60. • Not much we can do except engineer around it • AWS compute cluster instances are a huge step forward • AWS competitors take note
  • 61. • We are not database nerds • We care about more than just random IO performance • We need it all • Random I/O • Long sequential read/write
  • 62. • Faster Storage Options • Software RAID on EBS • Various GlusterFS options • Even if you optimize everything, the virtual NICs are still a bottleneck
  • 63. • Big Shared Storage • 10GbE nodes and NFS • Software RAID sets • GlusterFS or similar • 2012: pNFS finally?
  • 65. Things fail differently in the cloud.
  • 66. • Stuff breaks • It breaks in weird ways • Transient/temporary issues more common than what we see “at home”
  • 67. • Advice • Pessimism is good • Design for failure • Think hard about • How will you detect? • How will you respond?
  • 68. • Advice • Remove humans from loop • Automate recovery • Automate your backups
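For the transient failures described above, a common way to take humans out of the loop is to retry automatically with exponential backoff and random jitter. This is a generic sketch of that pattern, not code from the talk; the function name, defaults and parameters are illustrative.

```python
import random
import time


def call_with_retries(operation, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call operation(); on failure wait and retry, doubling the wait ceiling each time.

    The last failure is re-raised so that a persistent problem still surfaces
    to monitoring instead of being silently swallowed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter keeps a whole fleet of nodes from retrying in lockstep.
            sleep(random.uniform(0, ceiling))
```

Wrap any call that talks to a remote service, for example `call_with_retries(lambda: upload(path))` where `upload` is whatever hypothetical function does the work, and narrow `retry_on` to the exception types that are actually transient.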
  • 71. • Loosely coupled workflows are ideal • Break the pipeline into discrete components • Components should be able to scale up|down independently
  • 72. • Component = Opportunity to: • … Make a scaling decision • (# nodes in use) • … Make sizing decision • (instance type in use)
  • 74. … independent loosely connected components that can self-scale and communicate asynchronously
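The "scaling decision" each component gets to make can be as simple as sizing the node count from the depth of its own work queue. The rule below is one hypothetical policy (finish the backlog within a target time, clamped to a floor and ceiling); the parameter names and defaults are assumptions for illustration.

```python
import math


def desired_nodes(queued_jobs, jobs_per_node_hour, target_hours,
                  min_nodes=0, max_nodes=100):
    """Nodes needed to drain the queue within target_hours, clamped to limits."""
    if jobs_per_node_hour <= 0 or target_hours <= 0:
        raise ValueError("throughput and target time must be positive")
    if queued_jobs <= 0:
        return min_nodes  # nothing to do: scale down to the floor
    needed = math.ceil(queued_jobs / (jobs_per_node_hour * target_hours))
    return max(min_nodes, min(max_nodes, needed))
```

Each pipeline stage can run this against its own queue with its own throughput figure, so an alignment stage and a post-processing stage grow and shrink independently of each other.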
  • 75. Advice: • Many people already doing this • Best practices are well known • Steal from the best: • RightScale, Opscode & Cycle Computing
  • 76. Phew. Think I'm done now.
  • 77. Questions? Slides available at http://slideshare.net/chrisdag/
  • 78. End;
  • 80. Private Clouds: Pick Your Poison • OpenStack - http://openstack.org • Pro: Super smart developers; significant mindshare; True Open Source • Con: Commitment to AWS API compatibility (?) & stability
  • 81. Private Clouds: Pick Your Poison • CloudStack - http://cloudstack.org • Pro: Explicit AWS API support; very recent move away from “open-core” model; usability • Con: Developer mindshare? Sudden switch to Apache
  • 82. Private Clouds: Pick Your Poison • Eucalyptus - http://eucalyptus.com • Pro: Direct AWS API compatibility; lots of hypervisor support • Con: Open-core model; mindshare; Recent resurrection