Bio-IT For Core Facility Leaders
  Tips, Tricks & Trends


                            2012 NERLCSD Meeting - www.nerlscd.org




                                                                     1
Wednesday, October 31, 12
Intro                           1
  Meta-Issues (The Big Picture)   2
  Infrastructure Tour             3
  Compute & HPC                   4
  Storage                         5
  Cloud & Big Data                6
                                      2
I’m Chris.

 I’m an infrastructure geek.

 I work for the BioTeam.

                               @chris_dag   3
BioTeam
  Who, what & why


     ‣ Independent consulting shop
     ‣ Staffed by scientists forced to
       learn IT, SW & HPC to get our
       own research done
     ‣ 12+ years bridging the “gap”
       between science, IT & high
       performance computing
     ‣ www.bioteam.net

                                         4
Listen to me at your own risk
  Seriously.

       ‣ Clever people find multiple
         solutions to common issues
       ‣ I’m fairly blunt, burnt-out and
         cynical in my advanced age
       ‣ Significant portion of my work
         has been done in demanding
         production Biotech & Pharma
         environments
       ‣ Filter my words accordingly
                                           5
Intro                           1
  Meta-Issues (The Big Picture)   2
  Infrastructure Tour             3
  Compute & HPC                   4
  Storage                         5
  Cloud & Big Data                6
                                      6
Meta-Issues
                       Why you need to track this stuff ...


                                                              7
Big Picture
  Why this stuff matters ...



      ‣ HUGE revolution in the rate at which lab instruments are
        being redesigned, improved & refreshed
            •      Example: CCD sensor upgrade on that confocal
                   microscopy rig just doubled your storage requirements
            •      Example: That 2D ultrasound imager is now a 3D imager
            •      Example: Illumina HiSeq upgrade just doubled the rate at
                   which you can acquire genomes. Massive downstream
                   increase in storage, compute & data movement needs

                                                                              8
The Central Problem Is ...


      ‣ Instrumentation & protocols are changing FAR FASTER
        than we can refresh our Research-IT & Scientific
        Computing infrastructure
            •      The science is changing month-to-month ...
            •      ... while our IT infrastructure only gets refreshed every 2-7
                   years
      ‣ We have to design systems TODAY that can support
        unknown research requirements & workflows over many
        years (gulp ...)
                                                                                   9
The Central Problem Is ...


      ‣ The easy period is over
      ‣ 5 years ago you could toss inexpensive storage and
        servers at the problem; even in a nearby closet or under
        a lab bench if necessary
      ‣ That does not work any more; IT needs are too extreme
      ‣ 1000 CPU Linux clusters and petascale storage are the
        new normal; try fitting THAT in a closet!

                                                                   10
The Take Home Lesson
  What core facility leadership needs to understand


      ‣ The incredible rate of cost decreases & capability gains
        seen in the lab instrumentation space is not mirrored
        everywhere
      ‣ As gear gets cheaper/faster, scientists will simply do
        more work and ask more questions. Nobody simply
        banks the financial savings when an instrument gets
        50% cheaper -- they just buy two of them!
      ‣ IT technology is not improving at the same rate; we also
        can’t change our IT infrastructures all that rapidly
                                                                   11
If you get it wrong ...




      ‣ Lost opportunity
      ‣ Frustrated & very vocal researchers
      ‣ Problems in recruiting
      ‣ Publication problems



                                              12
Intro                           1
  Meta-Issues (The Big Picture)   2
  Infrastructure Tour             3
  Compute & HPC                   4
  Storage                         5
  Cloud & Big Data                6
                                      13
Infrastructure Tour
                            What does this stuff look like?




                                                              14
Self-contained single-instrument infrastructure


                                                                              15
Illumina GA
                                         16
Instrument Control Workstation


                                                             17
SOLiD Sequencer ...
                                                  18
sits on top of a 24U server rack...

                                                                  19
Another lab-local HPC cluster + storage

                                                        20
More lab-local servers & storage

                                                               21
Small core w/ multiple instrument support

                                                     22
Small cluster; large storage

                                                           23
Mid-sized core facility

                                                      24
Large Core Facility

                                                  25
Large Core Facility

                                                  26
Large Core Facility
                                                  27
Colocation Cages
                                               28
Inside a colo cage
                                                 29
Linux Cluster + In-row chillers (front)
                                                            30
Linux Cluster + In-row chillers (rear)
                                                            31
1U “Pizza Box” Style Server Chassis
                                                          32
Pile of “pizza boxes”
                                                    33
4U Rackmount Servers
                                                   34
“Blade” Servers & Enclosure
                                                          35
Hybrid Modular Server
                                                    36
Integrated: Blades + Hypervisor + Storage
                                                      37
Petabyte-scale Storage
                                                     38
Real world screenshot from earlier this month




                     16 monster compute nodes + 22 GPU nodes
                    Cost? 30 bucks an hour via AWS Spot Market

                               Yep. This counts.
                                                                  39
Physical data movement station

                                                             40
Physical data movement station

                                                             41
“Naked” Data Movement

                                                    42
“Naked” Data Archive

                                                   43
The cliche image

                                               44
Backblaze Pod: 100 terabytes for $12,000

                                                      45
Intro                           1
  Meta-Issues (The Big Picture)   2
  Infrastructure Tour             3
  Compute & HPC                   4
  Storage                         5
  Cloud & Big Data                6
                                      46
Compute
                       Actually the easy bit ...


                                                   47
Compute Power
  Not a big deal in 2012 ...




      ‣ Compute power is largely a solved problem
      ‣ It’s just a commodity
      ‣ Cheap, simple & very easy to acquire
      ‣ Let’s talk about what you need to know ...



                                                    48
Compute Trends
  Things you should be tracking ...




      ‣ Facility Issues
      ‣ “Fat Nodes” replacing Linux Clusters
      ‣ Increasing presence of serious “lab-local” IT



                                                        49
Facility Stuff


       ‣ Compute & storage
         requirements are getting
         larger and larger
       ‣ We are packing more “stuff”
         into smaller spaces
       ‣ This increases (radically)
         electrical and cooling
         requirements
                                       50
Facility Stuff - Core issue

       ‣ Facility & power issues can
         take many months or years to
         address
       ‣ Sometimes it may be
         impossible to address (new
         building required ...)
       ‣ If research IT footprint is
         growing fast; you must be well
         versed in your facility
         planning/upgrade process
                                          51
Facility Stuff - One more thing

       ‣ Sometimes central IT will begin
         facility upgrade efforts without
         consulting with research users
            •       This was the reason behind one of
                    our more ‘interesting’ projects in
                    2012
       ‣ ... a client was weeks away from
         signing off on a $MM datacenter
         which would not have had enough
         electricity to support current
         research & faculty recruiting
         commitments
                                                         52
“Fat” Nodes Replacing Clusters
                                                      53
Fat Nodes - 1 box replacing a cluster

       ‣ This server has 64 CPU Cores
       ‣ ... and up to 1TB of RAM
       ‣ Fantastic Genomics/Chemistry
         system
             •       A 256GB RAM version only
                     costs $13,000
       ‣ These single systems are
         replacing small clusters in
         some environments
                                                54
Fat Nodes - Clever Scale-out Packaging

       ‣ This 2U chassis contains 4
         individual servers
       ‣ Systems like this get near
         “blade” density without
         the price premium seen
         with proprietary blade
         packaging
       ‣ These “shrink” clusters in
         a major way or replace
         small ones
                                           55
The other trend ...
                                                  56
“Serious” IT now in your wet lab ...

       ‣ Instruments used to ship with a
         Windows PC “instrument
         control workstation”
       ‣ As instruments get more
         powerful the “companion”
         hardware is starting to scale-up
       ‣ End result: very significant stuff
         that used to live in your
         datacenter is now being rolled
         into lab environments
                                             57
“Serious” IT now in your wet lab ...

       ‣ You may be surprised what
         you find in your labs in ’12
       ‣ ... can be problematic for a
         few reasons ...
       1. IT support & backup
       2. Power & cooling
       3. Noise
       4. Security
                                         58
Networking
                       Also not particularly worrisome ...


                                                             59
Networking



      ‣ Networking is also not super complicated
      ‣ It’s also fairly cheap & commoditized in ’12
      ‣ There are three core uses for networks:
            1. Communication between servers & services
            2. Message passing within a single application
            3. Sharing files and data between many clients

                                                             60
Networking 1 - Servers & Services



      ‣ Ethernet. Period. Enough said.
      ‣ Your only decision is between 10-Gig and 1-Gig ethernet
      ‣ 1-Gig Ethernet is pervasive and dirt cheap
      ‣ 10-Gig Ethernet is getting cheaper and on its way to
        becoming pervasive
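
To make the 1-Gig versus 10-Gig decision concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the link runs at full line rate with no protocol overhead and no disk bottleneck, which real transfers will not achieve; the 1 TB dataset size and the function name `transfer_hours` are illustrative assumptions, not figures from the talk.

```python
def transfer_hours(size_bytes: float, link_gbits_per_sec: float) -> float:
    """Ideal time in hours to move size_bytes over a link of the given speed.

    Assumes the full line rate is usable (no protocol overhead, no disk
    bottleneck), so real transfers will take longer.
    """
    bits = size_bytes * 8
    seconds = bits / (link_gbits_per_sec * 1e9)
    return seconds / 3600


ONE_TB = 1e12  # decimal terabyte, an illustrative dataset size

for speed in (1, 10):
    print(f"{speed:>2}-Gig Ethernet: {transfer_hours(ONE_TB, speed):.2f} hours per TB")
```

Under these idealized assumptions a terabyte takes a little over two hours on 1-Gig and roughly 13 minutes on 10-Gig.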


                                                                  61
Networking 1 - Ethernet



      ‣ Everything speaks ethernet
      ‣ 1-Gig is still the common interconnect for most things
      ‣ 10-Gig is the standard now for the “core”
      ‣ 10-Gig is the standard for top-of-rack and “aggregation”
      ‣ 10-Gig connections to “special” servers are the norm


                                                                   62
Networking 2 - Message Passing

      ‣ Parallel applications can span many servers at once
      ‣ Communicate/coordinate via “message passing”
      ‣ Ethernet is fine for this but has a somewhat high latency
        between message packets
      ‣ Many apps can tolerate Ethernet-level latency; some
        applications clearly benefit from a message passing
        network with lower latency
      ‣ There used to be many competing alternatives
      ‣ Clear 2012 winner is “Infiniband”
                                                                    63
Networking 2 - Message Passing



      ‣ The only things you need to know ...
      ‣ Infiniband is an expensive networking alternative that
        offers much lower latency than Ethernet
      ‣ You would only pay for and deploy an IB fabric if you had
        an application or use case that requires it.
      ‣ No big deal. It’s just “another” network.

                                                                    64
Networking 3 - File Sharing



      ‣ For ‘Omics this is the primary focus area
      ‣ Overwhelming need for shared read/write access to files
        and data between instruments, HPC environment and
        researcher desktops
      ‣ In HPC environments you will often have a separate
        network just for file sharing traffic


                                                                 65
Networking 3 - File Sharing

      ‣ Generic file sharing uses familiar NFS or Windows fileshare
        protocols. No big deal
      ‣ Always implemented over Ethernet although often a mixture
        of 10-Gig and 1-Gig connections
            •      10-Gig connections to the file servers, storage and edge switches;
                   1-gig connections to cluster nodes and user desktops
      ‣ Infiniband also has a presence here
            •      Many “parallel” or “cluster” filesystems may talk to the clients
                   via NFS-over-ethernet but internally the distributed components
                   may use a private Infiniband network for metadata and
                   coordination.
                                                                                       66
Storage.
                            (the hard bit ...)


                                                 67
Storage
  Setting the stage ...

      ‣ Life science is generating torrents of data
      ‣ Size and volume often dwarf all other research areas -
        particularly with Bioinformatics & Genomics work
      ‣ Big/Fast storage is not cheap and is not commodity
      ‣ There are many vendors and many ways to spectacularly
        waste tons of money
      ‣ And we still have an overwhelming need for storage that
        can be shared concurrently between many different
        users, systems and clients
                                                                  68
Life Science “Data Deluge”


      ‣ Scare stories and shocking graphs getting tiresome
      ‣ We’ve been dealing with terabyte-scale lab instruments
        & data movement issues since 2004
            •      And somehow we’ve managed to survive ...
      ‣ Next few slides
            •      Try to explain why storage does not stress me out all that
                   much in 2012 ...


                                                                                69
The sky is not falling.
  1. You are not the Broad Institute or Sanger Center

     ‣ Overwhelming majority of us do not operate at Broad/
       Sanger levels
           •       These folks add 200+ TB a week in primary storage
     ‣ We still face challenges but the scale/scope is well
       within the bounds of what traditional IT technologies can
       handle
     ‣ We’ve been doing this for years
           •       Many vendors, best practices, “war stories”, proven methods
                   and just plain “people to talk to…”
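
For a sense of scale, the 200 TB/week figure can be converted into a sustained ingest rate. This is a minimal sketch, assuming decimal terabytes and data arriving perfectly evenly over the week; both are simplifications added here, not statements from the talk.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600


def sustained_rate(tb_per_week: float) -> tuple[float, float]:
    """Return (megabytes per second, gigabits per second) for a weekly volume."""
    bytes_per_sec = tb_per_week * 1e12 / SECONDS_PER_WEEK
    return bytes_per_sec / 1e6, bytes_per_sec * 8 / 1e9


mb_s, gbit_s = sustained_rate(200)
print(f"200 TB/week is about {mb_s:.0f} MB/s, or {gbit_s:.2f} Gbit/s, around the clock")
```

That works out to roughly 331 MB/s (about 2.65 Gbit/s) sustained, every second of every day.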
                                                                                 70
The sky is not falling.
  2. Instrument Sanity Beckons




     ‣ Yesteryear: Terascale .TIFF Tsunami
     ‣ Yesterday: RTA, in-instrument data reduction
     ‣ Today: Basecalls, BAMs & Outsourcing
     ‣ Tomorrow: Write directly to the cloud



                                                      71
The sky is not falling.
  3. Peta-scale storage is not really exotic or unusual any more.

    ‣ Peta-scale storage has not been a risky exotic technology
      gamble for years now
          •       A few years ago you’d be betting your career
    ‣ Today it’s just an engineering & budget exercise
          •       Multiple vendors don’t find petascale requirements particularly
                  troublesome and can deliver proven systems within weeks
          •       $1M (or less in ’12) will get you 1PB from several top vendors
    ‣ However, still HARD to do BIG, FAST & SAFE
          •       Hard but solvable; many resources & solutions out there
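
As a quick illustration of the budget side, the sketch below compares raw cost per terabyte using two prices quoted in this deck: $1M for 1 PB from a top vendor, and the Backblaze Pod at 100 terabytes for $12,000. It treats 1 PB as 1000 TB and looks only at acquisition price; performance, redundancy, support and operating costs are deliberately ignored, so it should not be read as a like-for-like comparison.

```python
def dollars_per_tb(total_cost: float, capacity_tb: float) -> float:
    """Raw acquisition cost per terabyte."""
    return total_cost / capacity_tb


# Figures quoted on the slides; 1 PB is taken as 1000 TB.
options = {
    "Top-vendor petabyte": (1_000_000, 1000),
    "Backblaze Pod": (12_000, 100),
}

for name, (cost, tb) in options.items():
    print(f"{name}: ${dollars_per_tb(cost, tb):,.0f} per TB")
```

The gap ($1,000 versus $120 per TB) is the price of doing BIG, FAST & SAFE rather than just BIG.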
                                                                                   72
On the other hand ...




                                                    73
OMG! The Sky Is Falling!
                       Maybe a little panic is appropriate ...




                                                                 74
The sky IS falling!
  1. Those @!*#&^@ Scientists ...

     ‣ As instrument output declines …
     ‣ Downstream storage consumption by
       end-user researchers is increasing
       rapidly
     ‣ Each new genome generates new
       data mashups, experiments, data
       interchange conversions, etc.
     ‣ MUCH harder to do capacity planning
       against human beings vs.
       instruments
                                             75
The sky IS falling!
  2. @!*#&^@ Scientific Leadership ...


     ‣ Sequencing is already a
       commodity
     ‣ NOBODY simply banks the
       savings
     ‣ EVERYBODY buys or does
       more




                                        76
The sky IS falling!
    Gigabases vs. Moore’s Law

       [Figure: “BIG SCARY GRAPH”, annotated “OMG!!”; x-axis 2007 to 2012]

                                                                     77
The sky IS falling!
  3. Uncomfortable truths


    ‣ Cost of acquiring data (genomes)
      falling faster than rate at which
      industry is increasing drive capacity
    ‣ Human researchers downstream of
      these datasets are also consuming
      more storage (and less predictably)
    ‣ High-scale labs must react or
      potentially have catastrophic issues
      in 2012-2013

                                              78
The sky IS falling!
  5. Something will have to break ...

     ‣ This is not sustainable
           •      Downstream consumption
                  exceeding instrument data
                  reduction
           •      Commoditization yielding
                  more platforms
           •      Chemistry moving faster
                  than IT infrastructure
           •      What the heck are we
                  doing with all this
                  sequence?
                                              79
CRAM it.



                                       80
The sky IS falling!
  CRAM it in 2012 ...

     ‣ Minor improvements are useless; order-of-magnitude needed
     ‣ Some people are talking about radical new methods –
       compressing against reference sequences and only storing the
       diffs
           •      With a variable compression “quality budget” to spend on
                  lossless techniques in the areas you care about
     ‣ http://biote.am/5v - Ewan Birney on “Compressing DNA”
     ‣ http://biote.am/5w - The actual CRAM paper
     ‣ If CRAM takes off, storage landscape will change
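
The core idea of compressing against a reference and storing only the diffs can be sketched in a few lines of Python. This is a toy illustration of reference-based encoding, not the CRAM format itself: it handles only same-length substitutions (no insertions, deletions or quality scores), and the names `encode_read` and `decode_read` are hypothetical.

```python
def encode_read(reference: str, read: str,
                position: int) -> tuple[int, int, list[tuple[int, str]]]:
    """Store a read as (position, length, substitutions vs. the reference)."""
    window = reference[position:position + len(read)]
    if len(window) != len(read):
        raise ValueError("read runs past the end of the reference")
    diffs = [(i, base)
             for i, (ref_base, base) in enumerate(zip(window, read))
             if ref_base != base]
    return position, len(read), diffs


def decode_read(reference: str,
                encoded: tuple[int, int, list[tuple[int, str]]]) -> str:
    """Rebuild the original read from the reference plus the stored diffs."""
    position, length, diffs = encoded
    bases = list(reference[position:position + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


# Made-up sequences, for illustration only.
reference = "ACGTACGTTAGCCGATAGGC"
read = "ACGTTAGACGAT"  # aligns at position 4 with one mismatch
encoded = encode_read(reference, read, 4)
assert decode_read(reference, encoded) == read
print(encoded)
```

A read that matches the reference exactly shrinks to a position and a length; only the mismatches cost extra storage.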
                                                                             81
What comes next?
                              Next 18 months will be really fun...
                                                                     82
What comes next.
  The same rules apply for 2012 and beyond ...

     ‣ Accept that science changes faster than IT infrastructure
     ‣ Be glad you are not Broad/Sanger
     ‣ Flexibility, scalability and agility become the key
       requirements of research informatics platforms
           •       Tiered storage is in your future ...
     ‣ Shared/concurrent access is still the overwhelming
       storage use case
           •       We’ll still continue to use clustered, parallel and scale-out
                   NAS solutions
                                                                                   83
What comes next.
  In the following year ...


     ‣ Many peta-scale capable systems deployed
           •       Most will operate in the hundreds-of-TBs range
     ‣ Far more aggressive “data triage”
           •       “.BAM only!”
     ‣ Genome compression via CRAM
     ‣ Even more data will sit untouched & unloved
     ‣ Growing need for tiers, HSM & even tape
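
One way to picture tiering for data that sits “untouched & unloved” is a policy that assigns files to tiers by how long they have been idle. The sketch below is a hypothetical illustration: the tier names and the 90-day and 365-day thresholds are assumptions made for the example, not recommendations from the talk, and a real HSM product would apply far richer policies.

```python
import os
import time
from typing import Optional

# Hypothetical thresholds: (minimum idle days, tier name), checked oldest first.
TIER_POLICY = [(365, "tape/archive"), (90, "nearline"), (0, "primary")]


def tier_for_age(idle_days: float) -> str:
    """Pick the first tier whose idle-day threshold the file has reached."""
    for min_days, tier in TIER_POLICY:
        if idle_days >= min_days:
            return tier
    return "primary"


def plan_tiers(root: str, now: Optional[float] = None) -> dict[str, list[str]]:
    """Walk a directory tree and group file paths by suggested tier.

    Uses each file's last-modified time, since access times are often
    disabled (noatime) on busy filesystems.
    """
    now = time.time() if now is None else now
    plan: dict[str, list[str]] = {tier: [] for _, tier in TIER_POLICY}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            idle_days = (now - os.stat(path).st_mtime) / 86400
            plan[tier_for_age(idle_days)].append(path)
    return plan
```

Calling `plan_tiers("/some/data/root")` on a path of your own returns a dictionary of candidate files per tier; nothing is moved, so it is safe to run as a report before committing to any policy.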

                                                                    84
What comes next.
  In the following year ...

     ‣ Broad, Sanger and others will pave the way with respect
       to metadata-aware & policy driven storage frameworks
           •       And we’ll shamelessly copy a year or two later
     ‣ I’m still on my cloud storage kick
           •       Economics are inescapable; Will be built into storage
                   platforms, gateways & VMs
           •       Amazon S3 is only an HTTP RESTful call away
           •       Cloud will become “just another tier”
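
To illustrate how thin that interface is, the sketch below builds (but does not send) a plain HTTP GET for an S3 object using only the Python standard library. The bucket and key names are hypothetical, and this only covers publicly readable objects; private objects need signed requests, which are omitted here.

```python
import urllib.parse
import urllib.request


def s3_object_request(bucket: str, key: str) -> urllib.request.Request:
    """Build a GET request for an object using S3's virtual-hosted-style URL."""
    url = f"https://{bucket}.s3.amazonaws.com/{urllib.parse.quote(key)}"
    return urllib.request.Request(url, method="GET")


# Hypothetical bucket and key, for illustration only.
req = s3_object_request("example-lab-archive", "run42/sample 1.bam")
print(req.get_method(), req.full_url)

# Sending it would be one more call:
#   with urllib.request.urlopen(req) as resp:
#       data = resp.read()
```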

                                                                           85
What comes next.
  Expect your storage to be smarter & more capable ...


     ‣ What do DDN, Panasas, Isilon,
       BlueArc, etc. have in common?
           •       Under the hood they all run
                   Unix or Unix-like OSes on
                   x86_64 architectures
     ‣ Some storage arrays can
       already run applications natively
           •       More will follow
           •       Likely a big trend for 2012
                                                         86
But what about today?




                                                    87
Still trying to avoid this.
                  (100TB scientific data, no RAID, unsecured on lab benchtops)




                                                                                 88
Flops, Failures & Freakouts
                                    Common storage mistakes ...
                                                                  89
Flops, Failures & Freakouts
  #1 - Unchecked Enterprise Storage Architects
     ‣ Scientist: “My work is priceless,
       I must be able to access it at all times”
     ‣ Corporate/Enterprise Storage Guru:
       “Hmmm …you want high availability, huh?”
     ‣ System delivered:
           •       40TB Enterprise SAN
           •       Asynchronous replication to remote site
           •       Can’t scale, can’t do NFS easily
           •       ~$500K per year in operational & maintenance costs
                                                                        90
Flops, Failures & Freakouts
  #2 - Unchecked User Requirements

     ‣ Scientist:
       “I do bioinformatics, I am rate limited by the speed of file
       IO operations. Faster disk means faster science.”


     ‣ System delivered:
           •       Budget blown on top tier fastest-possible ‘Cadillac’ system


     ‣ Outcome:
           •       System fills to capacity in 9 months; zero budget left.
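
This failure mode can usually be spotted ahead of time with a simple projection. A minimal sketch, assuming constant linear growth; the capacity, usage and growth numbers are hypothetical and chosen only to illustrate the arithmetic.

```python
import math


def months_until_full(capacity_tb: float, used_tb: float,
                      growth_tb_per_month: float) -> float:
    """Months until a system fills, assuming constant linear growth."""
    if growth_tb_per_month <= 0:
        return math.inf
    return max(capacity_tb - used_tb, 0) / growth_tb_per_month


# Hypothetical numbers, not taken from the talk.
print(months_until_full(capacity_tb=100, used_tb=10, growth_tb_per_month=10))
```

Real growth is rarely linear, so a projection like this is best treated as an optimistic upper bound on how long the purchase will last.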
                                                                                 91
Flops, Failures & Freakouts
  #3 - D.I.Y. Cluster & Parallel Filesystems

     ‣         Common source of storage unhappiness
     ‣         Root cause:
           •       Not enough pre-sales time spent on design and engineering
           •       Choosing Open Source over Common Sense

     ‣         System as built:
           •       Not enough metadata controllers
           •       Issues with interconnect fabric
           •       Poor selection & configuration of key components
     ‣         End result:
           •       Poor performance or availability
           •       High administrative/operational burden
                                                                               92
Hard Lessons Learned
                                    What these tales tell us ...
                                                                   93
Flops, Failures & Freakouts
  Hard Lessons Learned


     ‣ End-users are not precise with storage terms
           •       “Extremely reliable” means no data loss;
                   Not millions spent on 99.99999% high availability
     ‣ When true costs are explained:
           •        Many research users will trade a small amount of uptime or
                   availability for more capacity or capabilities
           •       … will also often trade some level of performance in
                   exchange for a huge win in capacity or capability
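
When explaining true costs, it helps to translate availability percentages into downtime that users can picture. A minimal sketch, assuming a 365-day year; the 99% and 99.9% comparison points are added here for contrast and are not from the talk.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def downtime_per_year(availability_percent: float) -> float:
    """Seconds of allowed downtime per year for a given availability target."""
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


for target in (99.0, 99.9, 99.99999):
    secs = downtime_per_year(target)
    print(f"{target}% available -> {secs:,.1f} seconds of downtime per year")
```

Seven nines allows roughly three seconds of downtime a year, while 99% allows about three and a half days; most research users asking for “extremely reliable” storage are nowhere near needing the former.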

                                                                                 94
Flops, Failures & Freakouts
  Hard Lessons Learned



     ‣ End-users demand the world but are willing to
       compromise
           •       Necessary for IT staff to really talk to them and understand
                   work, needs and priorities
           •       Also essential to explain true costs involved
     ‣ People demanding the “fastest” storage often don’t have
       actual metrics to back their assertions

                                                                                  95
Flops, Failures & Freakouts
  Hard Lessons Learned



     ‣ Software-based parallel or clustered file systems are
       non-trivial to correctly implement
           •       Essential to involve experts in the initial design phase
           •       Even if using an ‘open source’ version …
     ‣ Commercial support is essential
           •       And I say this as an open source zealot …


                                                                              96
The road ahead
                                        My $.02 for 2012...
                                                              97
The Road Ahead
                            Storage Trends & Tips for 2012


     ‣ Peta-capable platforms required
     ‣ Scale-out NAS still the best fit
     ‣ Customers will no longer build one
       big scale-out NAS tier
     ‣ My ‘hack’ of using nearline spec
       storage as primary science tier is
       probably obsolete in ’12
     ‣ Not everything is worth backing up
     ‣ Expect disruptive stuff
                                                             98
The Road Ahead
                                   Trends & Tips for 2012

     ‣ Monolithic tiers no longer cut it
           •      Changing science & instrument
                  output patterns are to blame
           •      We can’t get away with biasing
                  towards capacity over
                  performance any more
     ‣ pNFS should go mainstream in ’12
           •      { fantastic news }
     ‣ Tiered storage IS in your future
           •      Multiple vendors & types
                                                            99
The Road Ahead
                                 Trends & Tips for 2012


     ‣ Your storage will be able to run apps
           •      Dedupe, cloud gateways &
                  replication
           •      ‘CRAM’ or similar compression
           •      Storage Resource Brokers
                  (iRODS) & metadata servers
           •      HDFS/Hadoop hooks?
           •      Lab, Data management & LIMS
                  applications

     (Image caption: Drobo Appliance running BioTeam MiniLIMS internally...)


                                                                                           100
The Road Ahead
                                   Trends & Tips for 2012

    ‣ Hadoop / MapReduce / BigData
          •      Just like GRID and CLOUD back
                 in the day you’ll need a gas mask
                 to survive the smog of hype and
                 vendor press releases.
          •      You still need to think about it
          •      ... and have a roadmap for doing it
          •      Deep, deep ties to your storage
          •      Your users want/need it
          •      My $.02? Fantastic cloud use case
                                                            101
Disruptive Technology Example

                                                            102
Backblaze Pod For Biotech

                                                        103
Backblaze: 100TB for $12,000

                                                           104
Intro                           1
  Meta-Issues (The Big Picture)   2
  Infrastructure Tour             3
  Compute & HPC                   4
  Storage                         5
  Cloud & Big Data                6
                                      105
The ‘C’ word
                            Does a Bio-IT talk exist if it does not mention “the cloud”?
                                                                                           106
Defining the “C-word”



      ‣ Just like “Grid Computing” the “cloud” word has been
        diluted to almost uselessness thanks to hype, vendor
        FUD and lunatic marketing minions
      ‣ Helpful to define terms before talking seriously
      ‣ There are three types of cloud
      ‣ “IAAS”, “SAAS” & “PAAS”

                                                               107
Cloud Stuff


      ‣ Before I get nasty ...
      ‣ I am not an Amazon shill
      ‣ I am a jaded, cynical, zero-loyalty consumer of IT
        services and products that let me get #%$^ done
      ‣ Because I only get paid when my #%$^ works, I am
        picky about what tools I keep in my toolkit
      ‣ Amazon AWS is an infinitely cool tool
                                                             108
Cloud Stuff - SAAS




      ‣ SAAS = “Software as a Service”
      ‣ Think:
      ‣ gmail.com



                                         109
Cloud Stuff - PAAS



      ‣ PAAS = “Platform as a Service”
      ‣ Think:
      ‣ https://basespace.illumina.com/
      ‣ salesforce.com
      ‣ MS office365.com, Apple iCloud, etc.


                                              110
Cloud Stuff - IAAS




      ‣ IAAS = “Infrastructure as a Service”
      ‣ Think:
      ‣ Amazon Web Services
      ‣ Microsoft Azure



                                               111
Cloud Stuff - IAAS




      ‣ When I talk “cloud” I mean IAAS
      ‣ And right now in 2012 Amazon IS the IAAS cloud
      ‣ ... everyone else is a pretender



                                                         112
Cloud Stuff - Why IAAS

      ‣ IAAS clouds are the focal point for life science
        informatics
            •      Although some vendors are now offering PAAS and SAAS
                   options ...
      ‣ The “infrastructure” clouds give us the “building blocks”
        we can assemble into useful stuff
      ‣ Right now Amazon has the best & most powerful
        collection of “building blocks”
      ‣ The competition is years behind ...
                                                                          113
A message for the
                            cloud pretenders…


No APIs?
                            Not a cloud.


No self-service?
                              Not a cloud.


Installing VMWare
                       & excreting a press release?
                             Not a cloud.


I have to email a human?
                     Not a cloud.


~50% failure rate when launching
                 new servers?

                            Stupid cloud.

Block storage
               and virtual servers only?

                            (barely) a cloud.

Private Clouds
                                        My $.02
                                                        121
Private Clouds in 2012:


      ‣ I’m no longer dismissing them as “utter crap”
      ‣ Usable & useful in certain situations
      ‣ Hype vs. Reality ratio still wacky
      ‣ Sensible only for certain shops
            •      Have you seen what you have to do
                   to your networks & gear?
      ‣ There are easier ways



Private Clouds: My Advice for ’12



      ‣ Remain cynical (test vendor claims)
      ‣ Due Diligence still essential
      ‣ I personally would not deploy/buy anything that does not
        explicitly provide Amazon API compatibility




Private Clouds: My Advice for ’12



         Most people are better off:
            1. Adding VM platforms to existing HPC clusters &
               environments
            2. Extending enterprise VM platforms to allow user self-
               service & server catalogs




Cloud Advice
                                           My $.02
                                                           125
Cloud Advice
  Don’t get left behind




      ‣ Research IT Organizations need a cloud strategy today
      ‣ Those that don’t will be bypassed by frustrated users
      ‣ IaaS cloud services are only a departmental credit card
        away ... and some senior scientists are too big to be fired
        for violating IT policy :)


                                                                     126
Cloud Advice
  Design Patterns




      ‣ You actually need three tested cloud design patterns:


      ‣ (1) To handle ‘legacy’ scientific apps & workflows
      ‣ (2) The special stuff that is worth re-architecting
      ‣ (3) Hadoop & big data analytics


                                                                127
Cloud Advice
  Legacy HPC on the Cloud




      ‣ MIT StarCluster
            •      http://web.mit.edu/star/cluster/
      ‣ This is your baseline
      ‣ Extend as needed



                                                      128
Cloud Advice
  “Cloudy” HPC



      ‣ Some of our research workflows are important enough to
        be rewritten for “the cloud” and the advantages that a
        truly elastic & API-driven infrastructure can deliver
      ‣ This is where you have the most freedom
      ‣ Many published best practices you can borrow
      ‣ Amazon Simple Workflow Service (SWF) looks sweet
      ‣ Good commercial options: Cycle Computing, etc.
                                                             129
Hadoop & “Big Data”



      ‣ Hadoop and “big data” need to be on your radar
      ‣ Be careful though, you’ll need a gas mask to avoid the
        smog of marketing and vapid hype
      ‣ The utility is real and this does represent the “future
        path” for analysis of large data sets


                                                                  130
Cloud Advice - Hadoop & Big Data
  Big Data HPC




      ‣ It’s gonna be a MapReduce world, get used to it
      ‣ Little need to roll your own Hadoop in 2012
      ‣ ISV & commercial ecosystem already healthy
      ‣ Multiple providers today; both onsite & cloud-based
      ‣ Often a slam-dunk cloud use case


                                                              131
Hadoop & “Big Data”
  What you need to know




      ‣ “Hadoop” and “Big Data” are now general terms
      ‣ You need to drill down to find out what people actually
        mean
      ‣ We are still in the period where senior mgmt. may
        demand “hadoop” or “big data” capability without any
        actual business or scientific need


                                                                 132
Hadoop & “Big Data”
  What you need to know
      ‣ In broad terms you can break “Big Data” down into two very
        basic use cases:
      1. Compute: Hadoop can be used as a very powerful platform for
         the analysis of very large data sets. The google search term
         here is “map reduce”
      2. Data Stores: Hadoop is driving the development of very
         sophisticated “no-SQL” “non-Relational” databases and data
         query engines. The google search terms include “nosql”,
         “couchdb”, “hive”, “pig” & “mongodb”, etc.
      ‣ Your job is to figure out which type applies for the groups
        requesting “hadoop” or “big data” capability
                                                                        133
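A minimal sketch of the “map reduce” idea behind the compute use case, written in plain Python rather than against Hadoop itself. The k-mer counting task, the function names and the sample reads are illustrative assumptions and do not come from the talk; a real Hadoop job would spread the map and reduce phases across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(read, k=3):
    """Map: emit a (kmer, 1) pair for every k-length substring of one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle: group the emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def kmer_counts(reads, k=3):
    """Run map, shuffle and reduce in sequence on a list of reads."""
    mapped = chain.from_iterable(map_phase(read, k) for read in reads)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, vals) for key, vals in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample reads, purely for illustration.
    print(kmer_counts(["ACGTAC", "GTACGT"]))
```

The point of the split is that each map call only sees one read and each reduce call only sees one key, so both phases can run in parallel on separate machines.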
High Throughput Science
  Hadoop vs traditional Linux Clusters



      ‣ Hadoop is a very complex beast
      ‣ It’s also the way of the future so you can’t ignore it
      ‣ Very tight dependency on moving the ‘compute’ as close
        as possible to the ‘data’
      ‣ Hadoop clusters are just different enough that they do
        not integrate cleanly with traditional Linux HPC systems
      ‣ Often treated as separate silo or punted to the cloud
                                                                  134
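A toy sketch of what “moving the compute close to the data” means for a scheduler. This is not Hadoop's actual scheduling code; the function name, the greedy strategy and the node and block names are assumptions made for illustration only.

```python
def assign_tasks(block_locations, free_slots):
    """Place each block's task on a node that already stores the block.

    block_locations: dict of block id -> list of nodes holding a replica
    free_slots: dict of node -> number of task slots still available
    Returns a dict of block id -> (node, is_local).
    """
    slots = dict(free_slots)  # copy so the caller's dict is untouched
    placement = {}
    for block, replicas in block_locations.items():
        local = [node for node in replicas if slots.get(node, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # No replica holder is free: fall back to any free node,
            # which means the block has to cross the network.
            candidates = [n for n, free in slots.items() if free > 0]
            if not candidates:
                raise RuntimeError("no free task slots")
            node, is_local = candidates[0], False
        slots[node] -= 1
        placement[block] = (node, is_local)
    return placement


if __name__ == "__main__":
    # Hypothetical three-node cluster with one task slot per node.
    blocks = {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n1"]}
    print(assign_tasks(blocks, {"n1": 1, "n2": 1, "n3": 1}))
```

A traditional HPC scheduler places jobs by CPU and memory and assumes shared storage, which is one reason the two kinds of cluster do not merge cleanly.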
Hadoop & “Big Data”
  What you need to know




      ‣ Hadoop is being driven by a small group of academics
        writing and releasing open source life science Hadoop
        applications
      ‣ Your people will want to run these codes
      ‣ In some academic environments you may find people
        wanting to develop on this platform


                                                                135
Cloud Data Movement
                                        My $.02
                                                        136
Cloud Data Movement




      ‣ We’ve slung a ton of data in and out of the cloud
      ‣ We used to be big fans of physical media movement
      ‣ Remember these pictures?
      ‣ ...



                                                            137
Physical data movement station 1

                                                           138
Physical data movement station 2

                                                           139
“Naked” Data Movement

                                                    140
“Naked” Data Archive

                                                   141
Cloud Data Movement




      ‣ We’ve got a new story for 2012
      ‣ And the next image shows why ...




                                           142
March 2012
                                         143
Cloud Data Movement
  Wow!

      ‣ With a 1GbE internet connection ...
      ‣ and using Aspera software ...
      ‣ We sustained 700 Mb/sec for more than 7 hours
        freighting genomes into Amazon Web Services
      ‣ This is fast enough for many use cases, including
        genome sequencing core facilities*
      ‣ Chris Dwan’s webinar on this topic:
        http://biote.am/7e

                                                            144
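A back-of-the-envelope check on how much data that transfer represents. It assumes the rate is 700 megabits per second (a 1GbE link tops out at 1,000 megabits per second, so 700 megabytes per second would not fit) and uses decimal units; the function name is made up for this sketch.

```python
def data_moved_tb(rate_megabits_per_sec, hours):
    """Total data moved, in decimal terabytes, at a sustained rate."""
    megabytes_per_sec = rate_megabits_per_sec / 8   # 8 bits per byte
    seconds = hours * 3600
    return megabytes_per_sec * seconds / 1_000_000  # MB -> TB


if __name__ == "__main__":
    print(f"{data_moved_tb(700, 7):.1f} TB moved")
```

Under those assumptions the run works out to roughly 2.2 TB in 7 hours, or about 7.5 TB per day if the rate could be held around the clock.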
Cloud Data Movement
  Wow!




      ‣ Results like this mean we now favor network-based data
        movement over physical media movement
      ‣ Large-scale physical data movement carries a high
        operational burden and consumes non-trivial staff time &
        resources




                                                                 145
Cloud Data Movement
  There are three ways to do network data movement ...


      ‣ Buy software from Aspera and be done with it
      ‣ Attend the annual SuperComputing conference & see
        which student group wins the bandwidth challenge
        contest; use their code
      ‣ Get GridFTP from the Globus folks
            •      Trend: At every single “data movement” talk I’ve been to in
                   2011 it seemed that any speaker who was NOT using Aspera
                   was a very happy user of GridFTP. #notCoincidence


                                                                             146
Putting it all together
                                                      147
Wrapping up


             IT may just be a means to an end but you need to get
             your head wrapped around it
      ‣ (1) So you use/buy/request the correct ‘stuff’
      ‣ (2) So you don’t get cheated by a vendor
      ‣ (3) Because you need to understand your tools
      ‣ (4) Because trends in automation and orchestration
        are blurring the line between scientist & sysadmin

                                                                    148
Wrapping up - Compute & Servers


  ‣ Servers and compute power are pretty straightforward
  ‣ You just need to know roughly what your preferred
    compute building blocks look like
  ‣ ... and what special purpose resources you require (GPUs,
    Large Memory, High Core Count, etc.)
  ‣ Some of you may also have to deal with sizing, cost and
    facility (power, cooling, space) issues as well

                                                                149
Wrapping up - Networking

  ‣ Networking is also not a hugely painful thing
  ‣ Ethernet rules the land; you might have to pick and choose
    between 1-Gig and 10-Gig Ethernet
  ‣ Understand that special networking technologies like
    Infiniband offer advantages but they are expensive and need
    to be applied carefully (if at all)
  ‣ Knowing if your MPI apps are latency sensitive will help
  ‣ And remember that networking is used for multiple things
    (server communication, application message passing & file
    and data sharing)
                                                                 150
Wrapping up - Storage

  ‣ If you are going to focus on one IT area, this is it
  ‣ It’s incredibly important for genomics and also incredibly
    complicated. Many ways to waste money or buy the ‘wrong’ stuff
  ‣ You may only have one chance to get it correct and may have to
    live with your decision for years
  ‣ Budget is finite. You have to balance “speed” vs “size” vs
    “expansion capacity” vs “high availability” and more ...
  ‣ “Petabyte-capable Scale-out NAS” is usually the best starting
    point. You deviate away from NAS when scientific or technical
    requirements demand “something else”.
                                                                     151
Wrapping up - Hadoop / Big Data

  ‣ Probably the way of the future for big-data analytics. It’s
    worth spending time to study; especially if you intend to
    develop software in the future
  ‣ Popular target for current and emerging high-scale
    genomics tools. If you want to use those tools you need to
    deploy Hadoop
  ‣ It’s complicated and still changing rapidly. It can be
    difficult to integrate into existing setups
  ‣ Be cynical about hype & test vendor claims
                                                                  152
Wrapping up - Cloud


  ‣ Cloud is the future. The economics are inescapable and the
    advantages are compelling.
  ‣ The main obstacle holding back genomics is terabyte
    scale data movement. The cloud is horrible if you have to
    move 2TB of data before you can run 2Hrs of compute!
  ‣ Your future core facility may involve a comp bio lab
    without a datacenter at all. Some organizations are
    already 100% virtual and 100% cloud-based

                                                                153
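A rough sketch of the “2TB before 2Hrs of compute” problem as arithmetic. The two link rates (100 and 700 megabits per second) and the function names are assumptions picked for illustration; decimal terabytes are used throughout.

```python
def transfer_hours(data_tb, rate_megabits_per_sec):
    """Hours needed to move data_tb decimal terabytes at a sustained rate."""
    megabits = data_tb * 1_000_000 * 8  # TB -> MB -> megabits
    return megabits / rate_megabits_per_sec / 3600


def movement_overhead(data_tb, rate_megabits_per_sec, compute_hours):
    """Ratio of time spent moving data to time spent computing."""
    return transfer_hours(data_tb, rate_megabits_per_sec) / compute_hours


if __name__ == "__main__":
    for rate in (100, 700):  # assumed sustained rates in megabits/sec
        hours = transfer_hours(2, rate)
        ratio = movement_overhead(2, rate, 2)
        print(f"{rate} Mb/s: {hours:.1f} h transfer, "
              f"{ratio:.1f}x the compute time")
```

Even at the faster assumed rate the upload takes several times longer than the two hours of compute, which is why data that already lives in the cloud changes the picture.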
The NGS cloud clincher.




                            700 Mb/sec sustained for ~7 hours
                              West Coast to East Coast USA
                                                                154
Wrapping up - Cloud, continued

  ‣ Understand that for the foreseeable future there are THREE distinct
    cloud architectures and design patterns.
  ‣ Vendors who push “100% hadoop” or “legacy free” solutions are
    idiots and should be shoved out the door. We will be running legacy
    codes and workflows for many years to come
  ‣ Your three design patterns on the cloud:
            •      Legacy HPC systems
                   (replicate traditional clusters in the cloud)
            •      Hadoop
            •      Cloudy
                   (when you rewrite something to fully leverage cloud capability)
                                                                                     155
Thanks!
                            Slides online at: http://slideshare.net/chrisdag/

                                                                                156

The Rules of Scalable database
 
Xtreme Deployment
Xtreme DeploymentXtreme Deployment
Xtreme Deployment
 
MySQL Cluster no PayPal
MySQL Cluster no PayPalMySQL Cluster no PayPal
MySQL Cluster no PayPal
 
Bio-IT Asia 2013: Informatics & Cloud - Best Practices & Lessons Learned
Bio-IT Asia 2013: Informatics & Cloud - Best Practices & Lessons LearnedBio-IT Asia 2013: Informatics & Cloud - Best Practices & Lessons Learned
Bio-IT Asia 2013: Informatics & Cloud - Best Practices & Lessons Learned
 
iOS Development. Some practices.
iOS Development. Some practices.iOS Development. Some practices.
iOS Development. Some practices.
 
Thematic Mapping and Drupal
Thematic Mapping and DrupalThematic Mapping and Drupal
Thematic Mapping and Drupal
 
Cloud Tech III: Actionable Metrics
Cloud Tech III: Actionable MetricsCloud Tech III: Actionable Metrics
Cloud Tech III: Actionable Metrics
 
On Storing Big Data
On Storing Big DataOn Storing Big Data
On Storing Big Data
 
Softlayer Bluemix User Summit 2015 Keynote
Softlayer Bluemix User Summit 2015 KeynoteSoftlayer Bluemix User Summit 2015 Keynote
Softlayer Bluemix User Summit 2015 Keynote
 
Monitoring and observability
Monitoring and observabilityMonitoring and observability
Monitoring and observability
 
Virtue desk atomic-db vs relational vs everything
Virtue desk atomic-db vs relational vs everythingVirtue desk atomic-db vs relational vs everything
Virtue desk atomic-db vs relational vs everything
 
NoSql _ MongoDB - Italian Market copy
NoSql _ MongoDB - Italian Market copyNoSql _ MongoDB - Italian Market copy
NoSql _ MongoDB - Italian Market copy
 
Performance Schema in MySQL (Danil Zburivsky)
Performance Schema in MySQL (Danil Zburivsky)Performance Schema in MySQL (Danil Zburivsky)
Performance Schema in MySQL (Danil Zburivsky)
 
Can we hack open source cloud platforms to help reduce emissions? cloudstack ...
Can we hack open source cloud platforms to help reduce emissions? cloudstack ...Can we hack open source cloud platforms to help reduce emissions? cloudstack ...
Can we hack open source cloud platforms to help reduce emissions? cloudstack ...
 
OpenStack-Design-Summit-HA-Pairs-Are-Not-The-Only-Answer copy.pdf
OpenStack-Design-Summit-HA-Pairs-Are-Not-The-Only-Answer copy.pdfOpenStack-Design-Summit-HA-Pairs-Are-Not-The-Only-Answer copy.pdf
OpenStack-Design-Summit-HA-Pairs-Are-Not-The-Only-Answer copy.pdf
 
OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"
OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"
OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"
 
Interaction Design - why making skills matter
Interaction Design - why making skills matterInteraction Design - why making skills matter
Interaction Design - why making skills matter
 
Arnaud Porterie - The Truth About C++
Arnaud Porterie - The Truth About C++Arnaud Porterie - The Truth About C++
Arnaud Porterie - The Truth About C++
 
Talk at Ken Goldberg's Berkeley Lab - June 12th
Talk at Ken Goldberg's Berkeley Lab - June 12thTalk at Ken Goldberg's Berkeley Lab - June 12th
Talk at Ken Goldberg's Berkeley Lab - June 12th
 

Mehr von Chris Dagdigian

2021 Trends from the Trenches
2021 Trends from the Trenches2021 Trends from the Trenches
2021 Trends from the TrenchesChris Dagdigian
 
Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)Chris Dagdigian
 
Practical Petabyte Pushing
Practical Petabyte PushingPractical Petabyte Pushing
Practical Petabyte PushingChris Dagdigian
 
Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)Chris Dagdigian
 
Multi-Tenant Pharma HPC Clusters
Multi-Tenant Pharma HPC ClustersMulti-Tenant Pharma HPC Clusters
Multi-Tenant Pharma HPC ClustersChris Dagdigian
 
AWS re:Invent - Accelerating Research
AWS re:Invent - Accelerating ResearchAWS re:Invent - Accelerating Research
AWS re:Invent - Accelerating ResearchChris Dagdigian
 
Trends from the Trenches (Singapore Edition)
Trends from the Trenches (Singapore Edition)Trends from the Trenches (Singapore Edition)
Trends from the Trenches (Singapore Edition)Chris Dagdigian
 
2012: Trends from the Trenches
2012: Trends from the Trenches2012: Trends from the Trenches
2012: Trends from the TrenchesChris Dagdigian
 
Practical Cloud & Workflow Orchestration
Practical Cloud & Workflow OrchestrationPractical Cloud & Workflow Orchestration
Practical Cloud & Workflow OrchestrationChris Dagdigian
 
Mapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudMapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudChris Dagdigian
 

Mehr von Chris Dagdigian (10)

2021 Trends from the Trenches
2021 Trends from the Trenches2021 Trends from the Trenches
2021 Trends from the Trenches
 
Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)
 
Practical Petabyte Pushing
Practical Petabyte PushingPractical Petabyte Pushing
Practical Petabyte Pushing
 
Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)
 
Multi-Tenant Pharma HPC Clusters
Multi-Tenant Pharma HPC ClustersMulti-Tenant Pharma HPC Clusters
Multi-Tenant Pharma HPC Clusters
 
AWS re:Invent - Accelerating Research
AWS re:Invent - Accelerating ResearchAWS re:Invent - Accelerating Research
AWS re:Invent - Accelerating Research
 
Trends from the Trenches (Singapore Edition)
Trends from the Trenches (Singapore Edition)Trends from the Trenches (Singapore Edition)
Trends from the Trenches (Singapore Edition)
 
2012: Trends from the Trenches
2012: Trends from the Trenches2012: Trends from the Trenches
2012: Trends from the Trenches
 
Practical Cloud & Workflow Orchestration
Practical Cloud & Workflow OrchestrationPractical Cloud & Workflow Orchestration
Practical Cloud & Workflow Orchestration
 
Mapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudMapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the Cloud
 

Kürzlich hochgeladen

Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 

Kürzlich hochgeladen (20)

Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

Bio-IT for Core Facility Managers

  • 1. Bio-IT For Core Facility Leaders Tips, Tricks & Trends 2012 NERLCSD Meeting - www.nerlscd.org 1 Wednesday, October 31, 12
  • 2. Outline: 1 Intro, 2 Meta-Issues (The Big Picture), 3 Infrastructure Tour, 4 Compute & HPC, 5 Storage, 6 Cloud & Big Data
  • 3. I’m Chris. I’m an infrastructure geek. I work for the BioTeam. @chris_dag
  • 4. BioTeam: Who, what & why ‣ Independent consulting shop ‣ Staffed by scientists forced to learn IT, SW & HPC to get our own research done ‣ 12+ years bridging the “gap” between science, IT & high performance computing ‣ www.bioteam.net
  • 5. Listen to me at your own risk. Seriously. ‣ Clever people find multiple solutions to common issues ‣ I’m fairly blunt, burnt-out and cynical in my advanced age ‣ Significant portion of my work has been done in demanding production Biotech & Pharma environments ‣ Filter my words accordingly
  • 6. Outline: 1 Intro, 2 Meta-Issues (The Big Picture), 3 Infrastructure Tour, 4 Compute & HPC, 5 Storage, 6 Cloud & Big Data
  • 7. Meta-Issues: Why you need to track this stuff ...
  • 8. Big Picture: Why this stuff matters ... ‣ HUGE revolution in the rate at which lab instruments are being redesigned, improved & refreshed • Example: CCD sensor upgrade on that confocal microscopy rig just doubled your storage requirements • Example: That 2D ultrasound imager is now a 3D imager • Example: Illumina HiSeq upgrade just doubled the rate at which you can acquire genomes. Massive downstream increase in storage, compute & data movement needs
  • 9. The Central Problem Is ... ‣ Instrumentation & protocols are changing FAR FASTER than we can refresh our Research-IT & Scientific Computing infrastructure • The science is changing month-to-month ... • ... while our IT infrastructure only gets refreshed every 2-7 years ‣ We have to design systems TODAY that can support unknown research requirements & workflows over many years (gulp ...)
  • 10. The Central Problem Is ... ‣ The easy period is over ‣ 5 years ago you could toss inexpensive storage and servers at the problem; even in a nearby closet or under a lab bench if necessary ‣ That does not work any more; IT needs are too extreme ‣ 1000-CPU Linux clusters and petascale storage are the new normal; try fitting THAT in a closet!
  • 11. The Take Home Lesson: What core facility leadership needs to understand ‣ The incredible rate of cost decreases & capability gains seen in the lab instrumentation space is not mirrored everywhere ‣ As gear gets cheaper/faster, scientists will simply do more work and ask more questions. Nobody simply banks the financial savings when an instrument gets 50% cheaper -- they just buy two of them! ‣ IT technology is not improving at the same rate; we also can’t change our IT infrastructures all that rapidly
  • 12. If you get it wrong ... ‣ Lost opportunity ‣ Frustrated & very vocal researchers ‣ Problems in recruiting ‣ Publication problems
  • 13. Outline: 1 Intro, 2 Meta-Issues (The Big Picture), 3 Infrastructure Tour, 4 Compute & HPC, 5 Storage, 6 Cloud & Big Data
  • 14. Infrastructure Tour: What does this stuff look like?
  • 15. Self-contained single-instrument infrastructure
  • 16. Illumina GA
  • 17. Instrument Control Workstation
  • 18. SOLiD Sequencer ...
  • 19. ... sits on top of a 24U server rack
  • 20. Another lab-local HPC cluster + storage
  • 21. More lab-local servers & storage
  • 22. Small core w/ multiple instrument support
  • 23. Small cluster; large storage
  • 24. Mid-sized core facility
  • 25. Large Core Facility
  • 26. Large Core Facility
  • 27. Large Core Facility
  • 28. Colocation Cages
  • 29. Inside a colo cage
  • 30. Linux Cluster + In-row chillers (front)
  • 31. Linux Cluster + In-row chillers (rear)
  • 32. 1U “Pizza Box” Style Server Chassis
  • 33. Pile of “pizza boxes”
  • 34. 4U Rackmount Servers
  • 35. “Blade” Servers & Enclosure
  • 36. Hybrid Modular Server
  • 37. Integrated: Blades + Hypervisor + Storage
  • 38. Petabyte-scale Storage
  • 39. Real world screenshot from earlier this month: 16 monster compute nodes + 22 GPU nodes. Cost? 30 bucks an hour via AWS Spot Market. Yep. This counts.
  • 40. Physical data movement station
  • 41. Physical data movement station
  • 42. “Naked” Data Movement
  • 43. “Naked” Data Archive
  • 44. The cliche image
  • 45. Backblaze Pod: 100 terabytes for $12,000
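
The Backblaze figure on slide 45 reduces to a cost-per-terabyte number, which is a convenient way to compare storage quotes. A minimal sketch in Python using only the $12,000 / 100 TB figure quoted on the slide; the function name `cost_per_tb` is hypothetical, and the result covers raw hardware only (no staffing, power, cooling, or redundancy).

```python
def cost_per_tb(total_cost_usd: float, capacity_tb: float) -> float:
    """Return the cost in US dollars per terabyte of raw capacity."""
    if capacity_tb <= 0:
        raise ValueError("capacity_tb must be positive")
    return total_cost_usd / capacity_tb


# Figure quoted on the slide: a 100 TB Backblaze Pod for $12,000.
backblaze_pod = cost_per_tb(12_000, 100)
print(f"Backblaze Pod: ${backblaze_pod:,.0f} per raw TB")
```

The same helper can be pointed at any vendor quote, which makes it easier to see how much of a price gap comes from performance, support, and data protection rather than raw disk.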
  • 46. Outline: 1 Intro, 2 Meta-Issues (The Big Picture), 3 Infrastructure Tour, 4 Compute & HPC, 5 Storage, 6 Cloud & Big Data
  • 47. Compute: Actually the easy bit ...
  • 48. Compute Power: Not a big deal in 2012 ... ‣ Compute power is largely a solved problem ‣ It’s just a commodity ‣ Cheap, simple & very easy to acquire ‣ Let’s talk about what you need to know ...
  • 49. Compute Trends: Things you should be tracking ... ‣ Facility Issues ‣ “Fat Nodes” replacing Linux Clusters ‣ Increasing presence of serious “lab-local” IT
  • 50. Facility Stuff ‣ Compute & storage requirements are getting larger and larger ‣ We are packing more “stuff” into smaller spaces ‣ This increases (radically) electrical and cooling requirements
  • 51. Facility Stuff - Core issue ‣ Facility & power issues can take many months or years to address ‣ Sometimes it may be impossible to address (new building required ...) ‣ If your research IT footprint is growing fast, you must be well versed in your facility planning/upgrade process
  • 52. Facility Stuff - One more thing ‣ Sometimes central IT will begin facility upgrade efforts without consulting with research users • This was the reason behind one of our more ‘interesting’ projects in 2012 ‣ ... a client was weeks away from signing off on a $MM datacenter which would not have had enough electricity to support current research & faculty recruiting commitments
  • 53. “Fat” Nodes Replacing Clusters
  • 54. Fat Nodes - 1 box replacing a cluster ‣ This server has 64 CPU Cores ‣ ... and up to 1TB of RAM ‣ Fantastic Genomics/Chemistry system • A 256GB RAM version only costs $13,000 ‣ These single systems are replacing small clusters in some environments
  • 55. Fat Nodes - Clever Scale-out Packaging ‣ This 2U chassis contains 4 individual servers ‣ Systems like this get near “blade” density without the price premium seen with proprietary blade packaging ‣ These “shrink” clusters in a major way or replace small ones
  • 56. The other trend ...
  • 57. “Serious” IT now in your wet lab ... ‣ Instruments used to ship with a Windows PC “instrument control workstation” ‣ As instruments get more powerful the “companion” hardware is starting to scale up ‣ End result: very significant stuff that used to live in your datacenter is now being rolled into lab environments
  • 58. “Serious” IT now in your wet lab ... ‣ You may be surprised what you find in your labs in ’12 ‣ ... can be problematic for a few reasons ... 1. IT support & backup 2. Power & cooling 3. Noise 4. Security
  • 59. Networking: Also not particularly worrisome ...
  • 60. Networking ‣ Networking is also not super complicated ‣ It’s also fairly cheap & commoditized in ’12 ‣ There are three core uses for networks: 1. Communication between servers & services 2. Message passing within a single application 3. Sharing files and data between many clients
  • 61. Networking 1 - Servers & Services ‣ Ethernet. Period. Enough said. ‣ Your only decision is between 10-Gig and 1-Gig Ethernet ‣ 1-Gig Ethernet is pervasive and dirt cheap ‣ 10-Gig Ethernet is getting cheaper and on its way to becoming pervasive
  • 62. Networking 1 - Ethernet ‣ Everything speaks Ethernet ‣ 1-Gig is still the common interconnect for most things ‣ 10-Gig is the standard now for the “core” ‣ 10-Gig is the standard for top-of-rack and “aggregation” ‣ 10-Gig connections to “special” servers are the norm
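
To make the 1-Gig versus 10-Gig decision concrete, it helps to estimate how long a dataset takes to cross each link. The sketch below is a back-of-envelope calculation in Python, not a benchmark: the 1 TB dataset size and the `efficiency` parameter are assumptions for illustration, and real throughput is lower once protocol overhead, disks, and contention are involved.

```python
def transfer_hours(dataset_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move dataset_bytes over a link of link_gbps Gbit/s.

    efficiency is the assumed fraction of line rate actually achieved (0 < e <= 1).
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be positive and efficiency in (0, 1]")
    bits = dataset_bytes * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


ONE_TB = 1e12  # decimal terabyte; an illustrative dataset size, not from the slides

for gbps in (1, 10):
    print(f"{gbps:>2}-Gig Ethernet: {transfer_hours(ONE_TB, gbps):.2f} hours per TB at line rate")
```

At ideal line rate a terabyte needs a bit over two hours on 1-Gig and roughly a tenth of that on 10-Gig, which is one way to see why the faster links end up on file servers and aggregation points first.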
  • 63. Networking 2 - Message Passing ‣ Parallel applications can span many servers at once ‣ Communicate/coordinate via “message passing” ‣ Ethernet is fine for this but has a somewhat high latency between message packets ‣ Many apps can tolerate Ethernet-level latency; some applications clearly benefit from a message passing network with lower latency ‣ There used to be many competing alternatives ‣ Clear 2012 winner is “Infiniband”
  • 64. Networking 2 - Message Passing ‣ The only things you need to know ... ‣ Infiniband is an expensive networking alternative that offers much lower latency than Ethernet ‣ You would only pay for and deploy an IB fabric if you had an application or use case that requires it. ‣ No big deal. It’s just “another” network.
  • 65. Networking 3 - File Sharing ‣ For ‘Omics this is the primary focus area ‣ Overwhelming need for shared read/write access to files and data between instruments, HPC environment and researcher desktops ‣ In HPC environments you will often have a separate network just for file sharing traffic
  • 66. Networking 3 - File Sharing ‣ Generic file sharing uses familiar NFS or Windows fileshare protocols. No big deal ‣ Always implemented over Ethernet although often a mixture of 10-Gig and 1-Gig connections • 10-Gig connections to the file servers, storage and edge switches; 1-Gig connections to cluster nodes and user desktops ‣ Infiniband also has a presence here • Many “parallel” or “cluster” filesystems may talk to the clients via NFS-over-Ethernet but internally the distributed components may use a private Infiniband network for metadata and coordination.
  • 67. Storage. (the hard bit ...)
  • 68. Storage: Setting the stage ... ‣ Life science is generating torrents of data ‣ Size and volume often dwarf all other research areas - particularly with Bioinformatics & Genomics work ‣ Big/Fast storage is not cheap and is not commodity ‣ There are many vendors and many ways to spectacularly waste tons of money ‣ And we still have an overwhelming need for storage that can be shared concurrently between many different users, systems and clients
  • 69. Life Science “Data Deluge” ‣ Scare stories and shocking graphs are getting tiresome ‣ We’ve been dealing with terabyte-scale lab instruments & data movement issues since 2004 • And somehow we’ve managed to survive ... ‣ Next few slides • Try to explain why storage does not stress me out all that much in 2012 ...
  • 70. The sky is not falling. 1. You are not the Broad Institute or Sanger Center ‣ Overwhelming majority of us do not operate at Broad/Sanger levels • These folks add 200+ TB a week in primary storage ‣ We still face challenges but the scale/scope is well within the bounds of what traditional IT technologies can handle ‣ We’ve been doing this for years • Many vendors, best practices, “war stories”, proven methods and just plain “people to talk to…”
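
The "200+ TB a week" figure on slide 70 is easier to reason about as a sustained ingest rate. A rough Python sketch, assuming decimal terabytes and perfectly even ingest across the week (real ingest is bursty, so peak rates would be higher); the function name `sustained_gbps` is hypothetical.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600


def sustained_gbps(tb_per_week: float) -> float:
    """Average Gbit/s needed to ingest tb_per_week decimal terabytes every week."""
    bits_per_week = tb_per_week * 1e12 * 8
    return bits_per_week / SECONDS_PER_WEEK / 1e9


rate = sustained_gbps(200)
print(f"200 TB/week is roughly {rate:.2f} Gbit/s sustained")
```

Running the same function with a smaller weekly volume shows how far most core facilities sit below that level, which is the point the slide is making.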
  • 71. The sky is not falling. 2. Instrument Sanity Beckons ‣ Yesteryear: Terascale .TIFF Tsunami ‣ Yesterday: RTA, in-instrument data reduction ‣ Today: Basecalls, BAMs & Outsourcing ‣ Tomorrow: Write directly to the cloud 71 Wednesday, October 31, 12
  • 72. The sky is not falling. 3. Peta-scale storage is not really exotic or unusual any more. ‣ Peta-scale storage has not been a risky exotic technology gamble for years now • A few years ago you’d be betting your career ‣ Today it’s just an engineering & budget exercise • Multiple vendors don’t find petascale requirements particularly troublesome and can deliver proven systems within weeks • $1M (or less in ’12) will get you 1PB from several top vendors ‣ However, still HARD to do BIG, FAST & SAFE • Hard but solvable; many resources & solutions out there 72 Wednesday, October 31, 12
  • 73. On the other hand ... 73 Wednesday, October 31, 12
  • 74. OMG! The Sky Is Falling! Maybe a little panic is appropriate ... 74 Wednesday, October 31, 12
  • 75. The sky IS falling! 1. Those @!*#&^@ Scientists ... ‣ As instrument output declines … ‣ Downstream storage consumption by end-user researchers is increasing rapidly ‣ Each new genome generates new data mashups, experiments, data interchange conversions, etc. ‣ MUCH harder to do capacity planning against human beings vs. instruments 75 Wednesday, October 31, 12
  • 76. The sky IS falling! 2. @!*#&^@ Scientific Leadership ... ‣ Sequencing is already a commodity ‣ NOBODY simply banks the savings ‣ EVERYBODY buys or does more 76 Wednesday, October 31, 12
  • 77. The sky IS falling! Gigabases vs. Moore’s Law [Graph: “OMG!! BIG SCARY GRAPH”, x-axis 2007 to 2012] 77 Wednesday, October 31, 12
  • 78. The sky IS falling! 3. Uncomfortable truths ‣ Cost of acquiring data (genomes) falling faster than rate at which industry is increasing drive capacity ‣ Human researchers downstream of these datasets are also consuming more storage (and less predictably) ‣ High-scale labs must react or potentially have catastrophic issues in 2012-2013 78 Wednesday, October 31, 12
  • 79. The sky IS falling! 4. Something will have to break ... ‣ This is not sustainable • Downstream consumption exceeding instrument data reduction • Commoditization yielding more platforms • Chemistry moving faster than IT infrastructure • What the heck are we doing with all this sequence? 79 Wednesday, October 31, 12
  • 80. CRAM it. 80 Wednesday, October 31, 12
  • 81. The sky IS falling! CRAM it in 2012 ... ‣ Minor improvements are useless; order-of-magnitude needed ‣ Some people are talking about radical new methods – compressing against reference sequences and only storing the diffs • With a variable compression “quality budget” to spend on lossless techniques in the areas you care about ‣ http://biote.am/5v - Ewan Birney on “Compressing DNA” ‣ http://biote.am/5w - The actual CRAM paper ‣ If CRAM takes off, storage landscape will change 81 Wednesday, October 31, 12
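The idea of "compressing against reference sequences and only storing the diffs" can be sketched in a few lines. This is a toy illustration only, not the actual CRAM format: the function names and the record layout are hypothetical, and it assumes a read that aligns to the reference with substitutions only (no insertions, deletions or quality scores).

```python
def encode_against_reference(reference, read, position):
    """Store a read as (position, length, diffs) instead of its full sequence.

    diffs is a list of (offset, base) pairs for bases that differ from the
    reference. Hypothetical record layout, for illustration only.
    """
    window = reference[position:position + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    diffs = [(i, base) for i, (ref_base, base) in enumerate(zip(window, read))
             if ref_base != base]
    return position, len(read), diffs


def decode_against_reference(reference, record):
    """Rebuild the original read from the reference plus the stored diffs."""
    position, length, diffs = record
    bases = list(reference[position:position + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTACGTACGT"
read = "ACGTTCGT"  # matches the reference at position 4 except for one base
record = encode_against_reference(reference, read, 4)
restored = decode_against_reference(reference, record)
```

A read that matches the reference exactly costs only a position and a length, which is where the order-of-magnitude savings would come from.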
  • 82. What comes next? Next 18 months will be really fun... 82 Wednesday, October 31, 12
  • 83. What comes next. The same rules apply for 2012 and beyond ... ‣ Accept that science changes faster than IT infrastructure ‣ Be glad you are not Broad/Sanger ‣ Flexibility, scalability and agility become the key requirements of research informatics platforms • Tiered storage is in your future ... ‣ Shared/concurrent access is still the overwhelming storage use case • We’ll still continue to use clustered, parallel and scale-out NAS solutions 83 Wednesday, October 31, 12
  • 84. What comes next. In the following year ... ‣ Many peta-scale capable systems deployed • Most will operate in the hundreds-of-TBs range ‣ Far more aggressive “data triage” • “.BAM only!” ‣ Genome compression via CRAM ‣ Even more data will sit untouched & unloved ‣ Growing need for tiers, HSM & even tape 84 Wednesday, October 31, 12
  • 85. What comes next. In the following year ... ‣ Broad, Sanger and others will pave the way with respect to metadata-aware & policy driven storage frameworks • And we’ll shamelessly copy a year or two later ‣ I’m still on my cloud storage kick • Economics are inescapable; Will be built into storage platforms, gateways & VMs • Amazon S3 is only an HTTP RESTful call away • Cloud will become “just another tier” 85 Wednesday, October 31, 12
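To make "policy driven storage" and cloud as "just another tier" concrete, here is a minimal sketch of an age-based tiering rule. The tier names, the `FileInfo` fields and the 30/180 day thresholds are all assumptions chosen for illustration; a real framework would also weigh metadata, file type and cost.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    size_bytes: int
    days_since_access: int


def choose_tier(info, warm_after_days=30, cold_after_days=180):
    """Pick a storage tier from last-access age (hypothetical thresholds)."""
    if info.days_since_access >= cold_after_days:
        return "cloud"      # untouched & unloved data goes to the cheapest tier
    if info.days_since_access >= warm_after_days:
        return "nearline"   # occasionally read data
    return "primary"        # active working set stays on fast shared storage


files = [
    FileInfo("run42/sample.bam", 50_000_000_000, 2),
    FileInfo("run17/sample.bam", 48_000_000_000, 45),
    FileInfo("run03/sample.bam", 51_000_000_000, 400),
]
placement = {f.path: choose_tier(f) for f in files}
```

The point of the sketch is that the rule is data, not hardware: the same function can target a new tier without touching the files' owners.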
  • 86. What comes next. Expect your storage to be smarter & more capable ... ‣ What do DDN, Panasas, Isilon, BlueArc, etc. have in common? • Under the hood they all run Unix or Unix-like OS’s on x86_64 architectures ‣ Some storage arrays can already run applications natively • More will follow • Likely a big trend for 2012 86 Wednesday, October 31, 12
  • 87. But what about today? 87 Wednesday, October 31, 12
  • 88. Still trying to avoid this. (100TB scientific data, no RAID, unsecured on lab benchtops) 88 Wednesday, October 31, 12
  • 89. Flops, Failures & Freakouts Common storage mistakes ... 89 Wednesday, October 31, 12
  • 90. Flops, Failures & Freakouts #1 - Unchecked Enterprise Storage Architects ‣ Scientist: “My work is priceless, I must be able to access it at all times” ‣ Corporate/Enterprise Storage Guru: “Hmmm …you want high availability, huh?” ‣ System delivered: • 40TB Enterprise SAN • Asynchronous replication to remote site • Can’t scale, can’t do NFS easily • ~$500K per year in operational & maintenance costs 90 Wednesday, October 31, 12
  • 91. Flops, Failures & Freakouts #2 - Unchecked User Requirements ‣ Scientist: “I do bioinformatics, I am rate limited by the speed of file IO operations. Faster disk means faster science.” ‣ System delivered: • Budget blown on top tier fastest-possible ‘Cadillac’ system ‣ Outcome: • System fills to capacity in 9 months; zero budget left. 91 Wednesday, October 31, 12
  • 92. Flops, Failures & Freakouts #3 - D.I.Y Cluster & Parallel Filesystems ‣ Common source of storage unhappiness ‣ Root cause: • Not enough pre-sales time spent on design and engineering • Choosing Open Source over Common Sense ‣ System as built: • Not enough metadata controllers • Issues with interconnect fabric • Poor selection & configuration of key components ‣ End result: • Poor performance or availability • High administrative/operational burden 92 Wednesday, October 31, 12
  • 93. Hard Lessons Learned What these tales tell us ... 93 Wednesday, October 31, 12
  • 94. Flops, Failures & Freakouts Hard Lessons Learned ‣ End-users are not precise with storage terms • “Extremely reliable” means no data loss; Not millions spent on 99.99999% high availability ‣ When true costs are explained: • Many research users will trade a small amount of uptime or availability for more capacity or capabilities • … will also often trade some level of performance in exchange for a huge win in capacity or capability 94 Wednesday, October 31, 12
  • 95. Flops, Failures & Freakouts Hard Lessons Learned ‣ End-users demand the world but are willing to compromise • Necessary for IT staff to really talk to them and understand work, needs and priorities • Also essential to explain true costs involved ‣ People demanding the “fastest” storage often don’t have actual metrics to back their assertions 95 Wednesday, October 31, 12
  • 96. Flops, Failures & Freakouts Hard Lessons Learned ‣ Software-based parallel or clustered file systems are non-trivial to correctly implement • Essential to involve experts in the initial design phase • Even if using ‘open source’ version … ‣ Commercial support is essential • And I say this as an open source zealot … 96 Wednesday, October 31, 12
  • 97. The road ahead My $.02 for 2012... 97 Wednesday, October 31, 12
  • 98. The Road Ahead Storage Trends & Tips for 2012 ‣ Peta-capable platforms required ‣ Scale-out NAS still the best fit ‣ Customers will no longer build one big scale-out NAS tier ‣ My ‘hack’ of using nearline spec storage as primary science tier is probably obsolete in ’12 ‣ Not everything is worth backing up ‣ Expect disruptive stuff 98 Wednesday, October 31, 12
  • 99. The Road Ahead Trends & Tips for 2012 ‣ Monolithic tiers no longer cut it • Changing science & instrument output patterns are to blame • We can’t get away with biasing towards capacity over performance any more ‣ pNFS should go mainstream in ’12 • { fantastic news } ‣ Tiered storage IS in your future • Multiple vendors & types 99 Wednesday, October 31, 12
  • 100. The Road Ahead Trends & Tips for 2012 ‣ Your storage will be able to run apps • Dedupe, cloud gateways & replication • ‘CRAM’ or similar compression • Storage Resource Brokers (iRODS) & metadata servers • HDFS/Hadoop hooks? • Lab, Data management & LIMS applications Drobo Appliance running BioTeam MiniLIMS internally... 100 Wednesday, October 31, 12
  • 101. The Road Ahead Trends & Tips for 2012 ‣ Hadoop / MapReduce / BigData • Just like GRID and CLOUD back in the day you’ll need a gas mask to survive the smog of hype and vendor press releases. • You still need to think about it • ... and have a roadmap for doing it • Deep, deep ties to your storage • Your users want/need it • My $.02? Fantastic cloud use case 101 Wednesday, October 31, 12
  • 102. Disruptive Technology Example 102 Wednesday, October 31, 12
  • 103. Backblaze Pod For Biotech 103 Wednesday, October 31, 12
  • 104. Backblaze: 100TB for $12,000 104 Wednesday, October 31, 12
  • 105. Intro 1 Meta-Issues (The Big Picture) 2 Infrastructure Tour 3 Compute & HPC 4 Storage 5 Cloud & Big Data 6 105 Wednesday, October 31, 12
  • 106. The ‘C’ word Does a Bio-IT talk exist if it does not mention “the cloud”? 106 Wednesday, October 31, 12
  • 107. Defining the “C-word” ‣ Just like “Grid Computing” the “cloud” word has been diluted to almost uselessness thanks to hype, vendor FUD and lunatic marketing minions ‣ Helpful to define terms before talking seriously ‣ There are three types of cloud ‣ “IAAS”, “SAAS” & “PAAS” 107 Wednesday, October 31, 12
  • 108. Cloud Stuff ‣ Before I get nasty ... ‣ I am not an Amazon shill ‣ I am a jaded, cynical, zero-loyalty consumer of IT services and products that let me get #%$^ done ‣ Because I only get paid when my #%$^ works, I am picky about what tools I keep in my toolkit ‣ Amazon AWS is an infinitely cool tool 108 Wednesday, October 31, 12
  • 109. Cloud Stuff - SAAS ‣ SAAS = “Software as a Service” ‣ Think: ‣ gmail.com 109 Wednesday, October 31, 12
  • 110. Cloud Stuff - PAAS ‣ PAAS = “Platform as a Service” ‣ Think: ‣ https://basespace.illumina.com/ ‣ salesforce.com ‣ MS office365.com, Apple iCloud, etc. 110 Wednesday, October 31, 12
  • 111. Cloud Stuff - IAAS ‣ IAAS = “Infrastructure as a Service” ‣ Think: ‣ Amazon Web Services ‣ Microsoft Azure 111 Wednesday, October 31, 12
  • 112. Cloud Stuff - IAAS ‣ When I talk “cloud” I mean IAAS ‣ And right now in 2012 Amazon IS the IAAS cloud ‣ ... everyone else is a pretender 112 Wednesday, October 31, 12
  • 113. Cloud Stuff - Why IAAS ‣ IAAS clouds are the focal point for life science informatics • Although some vendors are now offering PAAS and SAAS options ... ‣ The “infrastructure” clouds give us the “building blocks” we can assemble into useful stuff ‣ Right now Amazon has the best & most powerful collection of “building blocks” ‣ The competition is years behind ... 113 Wednesday, October 31, 12
  • 114. A message for the cloud pretenders… Wednesday, October 31, 12
  • 115. No APIs? Not a cloud. Wednesday, October 31, 12
  • 116. No self-service? Not a cloud. Wednesday, October 31, 12
  • 117. Installing VMWare & excreting a press release? Not a cloud. Wednesday, October 31, 12
  • 118. I have to email a human? Not a cloud. Wednesday, October 31, 12
  • 119. ~50% failure rate when launching new servers? Stupid cloud. Wednesday, October 31, 12
  • 120. Block storage and virtual servers only? (barely) a cloud. Wednesday, October 31, 12
  • 121. Private Clouds My $.02 cents 121 Wednesday, October 31, 12
  • 122. Private Clouds in 2012: ‣ I’m no longer dismissing them as “utter crap” ‣ Usable & useful in certain situations ‣ Hype vs. Reality ratio still wacky ‣ Sensible only for certain shops • Have you seen what you have to do to your networks & gear? ‣ There are easier ways Wednesday, October 31, 12
  • 123. Private Clouds: My Advice for ‘12 ‣ Remain cynical (test vendor claims) ‣ Due Diligence still essential ‣ I personally would not deploy/buy anything that does not explicitly provide Amazon API compatibility Wednesday, October 31, 12
  • 124. Private Clouds: My Advice for ‘12 Most people are better off: 1. Adding VM platforms to existing HPC clusters & environments 2. Extending enterprise VM platforms to allow user self- service & server catalogs Wednesday, October 31, 12
  • 125. Cloud Advice My $.02 cents 125 Wednesday, October 31, 12
  • 126. Cloud Advice Don’t get left behind ‣ Research IT Organizations need a cloud strategy today ‣ Those that don’t will be bypassed by frustrated users ‣ IaaS cloud services are only a departmental credit card away ... and some senior scientists are too big to be fired for violating IT policy :) 126 Wednesday, October 31, 12
  • 127. Cloud Advice Design Patterns ‣ You actually need three tested cloud design patterns: ‣ (1) To handle ‘legacy’ scientific apps & workflows ‣ (2) The special stuff that is worth re-architecting ‣ (3) Hadoop & big data analytics 127 Wednesday, October 31, 12
  • 128. Cloud Advice Legacy HPC on the Cloud ‣ MIT StarCluster • http://web.mit.edu/star/cluster/ ‣ This is your baseline ‣ Extend as needed 128 Wednesday, October 31, 12
  • 129. Cloud Advice “Cloudy” HPC ‣ Some of our research workflows are important enough to be rewritten for “the cloud” and the advantages that a truly elastic & API-driven infrastructure can deliver ‣ This is where you have the most freedom ‣ Many published best practices you can borrow ‣ Amazon Simple Workflow Service (SWF) looks sweet ‣ Good commercial options: Cycle Computing, etc. 129 Wednesday, October 31, 12
  • 130. Hadoop & “Big Data” ‣ Hadoop and “big data” need to be on your radar ‣ Be careful though, you’ll need a gas mask to avoid the smog of marketing and vapid hype ‣ The utility is real and this does represent the “future path” for analysis of large data sets 130 Wednesday, October 31, 12
  • 131. Cloud Advice - Hadoop & Big Data Big Data HPC ‣ It’s gonna be a MapReduce world, get used to it ‣ Little need to roll your own Hadoop in 2012 ‣ ISV & commercial ecosystem already healthy ‣ Multiple providers today; both onsite & cloud-based ‣ Often a slam-dunk cloud use case 131 Wednesday, October 31, 12
  • 132. Hadoop & “Big Data” What you need to know ‣ “Hadoop” and “Big Data” are now general terms ‣ You need to drill down to find out what people actually mean ‣ We are still in the period where senior mgmt. may demand “hadoop” or “big data” capability without any actual business or scientific need 132 Wednesday, October 31, 12
  • 133. Hadoop & “Big Data” What you need to know ‣ In broad terms you can break “Big Data” down into two very basic use cases: 1. Compute: Hadoop can be used as a very powerful platform for the analysis of very large data sets. The google search term here is “map reduce” 2. Data Stores: Hadoop is driving the development of very sophisticated “no-SQL” “non-Relational” databases and data query engines. The google search terms include “nosql”, “couchdb”, “hive”, “pig” & “mongodb”, etc. ‣ Your job is to figure out which type applies for the groups requesting “hadoop” or “big data” capability 133 Wednesday, October 31, 12
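For groups whose request falls under the "compute" use case, the map reduce pattern itself is small enough to sketch without Hadoop. The example below counts k-mers in plain Python; the function names are hypothetical and everything runs in one process, whereas Hadoop would distribute the map and reduce steps across nodes sitting next to the data.

```python
from collections import defaultdict


def map_kmers(sequence, k):
    """Map step: emit a (kmer, 1) pair for every k-mer in one sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k], 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def count_kmers(sequences, k):
    pairs = (pair for seq in sequences for pair in map_kmers(seq, k))
    return reduce_counts(shuffle(pairs))


counts = count_kmers(["ACGT", "CGTA"], 2)
```

Because each map call only sees one sequence and each reduce call only sees one key, both steps can be spread over many machines, which is the property the "compute" use case relies on.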
  • 134. High Throughput Science Hadoop vs traditional Linux Clusters ‣ Hadoop is a very complex beast ‣ It’s also the way of the future so you can’t ignore it ‣ Very tight dependency on moving the ‘compute’ as close as possible to the ‘data’ ‣ Hadoop clusters are just different enough that they do not integrate cleanly with traditional Linux HPC systems ‣ Often treated as separate silo or punted to the cloud 134 Wednesday, October 31, 12
  • 135. Hadoop & “Big Data” What you need to know ‣ Hadoop is being driven by a small group of academics writing and releasing open source life science hadoop applications ‣ Your people will want to run these codes ‣ In some academic environments you may find people wanting to develop on this platform 135 Wednesday, October 31, 12
  • 136. Cloud Data Movement My $.02 cents 136 Wednesday, October 31, 12
  • 137. Cloud Data Movement ‣ We’ve slung a ton of data in and out of the cloud ‣ We used to be big fans of physical media movement ‣ Remember these pictures? ‣ ... 137 Wednesday, October 31, 12
  • 138. Physical data movement station 1 138 Wednesday, October 31, 12
  • 139. Physical data movement station 2 139 Wednesday, October 31, 12
  • 140. “Naked” Data Movement 140 Wednesday, October 31, 12
  • 141. “Naked” Data Archive 141 Wednesday, October 31, 12
  • 142. Cloud Data Movement ‣ We’ve got a new story for 2012 ‣ And the next image shows why ... 142 Wednesday, October 31, 12
  • 143. March 2012 143 Wednesday, October 31, 12
  • 144. Cloud Data Movement Wow! ‣ With a 1GbE internet connection ... ‣ and using Aspera software ... ‣ We sustained 700 Mb/sec for more than 7 hours freighting genomes into Amazon Web Services ‣ This is fast enough for many use cases, including genome sequencing core facilities* ‣ Chris Dwan’s webinar on this topic: http://biote.am/7e 144 Wednesday, October 31, 12
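A quick back-of-the-envelope check puts this result in perspective. The sketch below assumes the 700 figure is megabits per second (the most a 1GbE link could plausibly sustain, since 1 Gb/sec is about 125 megabytes per second) and uses decimal units (1 TB = 10^12 bytes); the function names are hypothetical.

```python
def terabytes_moved(megabits_per_sec, hours):
    """Data volume in TB moved at a sustained rate over a number of hours."""
    bytes_per_sec = megabits_per_sec * 1_000_000 / 8
    return bytes_per_sec * hours * 3600 / 1e12


def hours_to_move(terabytes, megabits_per_sec):
    """Hours needed to move a data volume at a sustained rate."""
    bytes_per_sec = megabits_per_sec * 1_000_000 / 8
    return terabytes * 1e12 / bytes_per_sec / 3600


moved_tb = terabytes_moved(700, 7)      # the sustained 7-hour run
hours_for_2tb = hours_to_move(2, 700)   # a 2 TB dataset at the same rate
```

Under these assumptions the 7-hour run moves roughly 2.2 TB, so a 2 TB dataset needs a bit over 6 hours on the wire at the same rate.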
  • 145. Cloud Data Movement Wow! ‣ Results like this mean we now favor network-based data movement over physical media movement ‣ Large-scale physical data movement carries a high operational burden and consumes non-trivial staff time & resources 145 Wednesday, October 31, 12
  • 146. Cloud Data Movement There are three ways to do network data movement ... ‣ Buy software from Aspera and be done with it ‣ Attend the annual SuperComputing conference & see which student group wins the bandwidth challenge contest; use their code ‣ Get GridFTP from the Globus folks • Trend: At every single “data movement” talk I’ve been to in 2011 it seemed that any speaker who was NOT using Aspera was a very happy user of GridFTP. #notCoincidence 146 Wednesday, October 31, 12
  • 147. Putting it all together 147 Wednesday, October 31, 12
  • 148. Wrapping up IT may just be a means to an end but you need to get your head wrapped around it ‣ (1) So you use/buy/request the correct ‘stuff’ ‣ (2) So you don’t get cheated by a vendor ‣ (3) Because you need to understand your tools ‣ (4) Because trends in automation and orchestration are blurring the line between scientist & sysadmin 148 Wednesday, October 31, 12
  • 149. Wrapping up - Compute & Servers ‣ Servers and compute power are pretty straightforward ‣ You just need to know roughly what your preferred compute building blocks look like ‣ ... and what special purpose resources you require (GPUs, Large Memory, High Core Count, etc.) ‣ Some of you may also have to deal with sizing, cost and facility (power, cooling, space) issues as well 149 Wednesday, October 31, 12
  • 150. Wrapping up - Networking ‣ Networking is also not a hugely painful thing ‣ Ethernet rules the land; you might have to pick and choose between 1-Gig and 10-Gig Ethernet ‣ Understand that special networking technologies like Infiniband offer advantages but they are expensive and need to be applied carefully (if at all) ‣ Knowing if your MPI apps are latency sensitive will help ‣ And remember that networking is used for multiple things (server communication, application message passing & file and data sharing) 150 Wednesday, October 31, 12
  • 151. Wrapping up - Storage ‣ If you are going to focus on one IT area, this is it ‣ It’s incredibly important for genomics and also incredibly complicated. Many ways to waste money or buy the ‘wrong’ stuff ‣ You may only have one chance to get it correct and may have to live with your decision for years ‣ Budget is finite. You have to balance “speed” vs “size” vs “expansion capacity” vs “high availability” and more ... ‣ “Petabyte-capable Scale-out NAS” is usually the best starting point. You deviate away from NAS when scientific or technical requirements demand “something else”. 151 Wednesday, October 31, 12
  • 152. Wrapping up - Hadoop / Big Data ‣ Probably the way of the future for big-data analytics. It’s worth spending time to study; especially if you intend to develop software in the future ‣ Popular target for current and emerging high-scale genomics tools. If you want to use those tools you need to deploy Hadoop ‣ It’s complicated and still changing rapidly. It can be difficult to integrate into existing setups ‣ Be cynical about hype & test vendor claims 152 Wednesday, October 31, 12
  • 153. Wrapping up - Cloud ‣ Cloud is the future. The economics are inescapable and the advantages are compelling. ‣ The main obstacle holding back genomics is terabyte scale data movement. The cloud is horrible if you have to move 2TB of data before you can run 2Hrs of compute! ‣ Your future core facility may involve a comp bio lab without a datacenter at all. Some organizations are already 100% virtual and 100% cloud-based 153 Wednesday, October 31, 12
  • 154. The NGS cloud clincher. 700 Mb/sec sustained for ~7 hours West Coast to East Coast USA 154 Wednesday, October 31, 12
  • 155. Wrapping up - Cloud, continued ‣ Understand that for the foreseeable future there are THREE distinct cloud architectures and design patterns. ‣ Vendors who push “100% hadoop” or “legacy free” solutions are idiots and should be shoved out the door. We will be running legacy codes and workflows for many years to come ‣ Your three design patterns on the cloud: • Legacy HPC systems (replicate traditional clusters in the cloud) • Hadoop • Cloudy (when you rewrite something to fully leverage cloud capability) 155 Wednesday, October 31, 12
  • 156. Thanks! Slides online at: http://slideshare.net/chrisdag/ 156 Wednesday, October 31, 12