13. Private Clouds in 2012:
• Hype vs. Reality ratio still wacky
• Sensible only for certain shops
• Have you seen what you have to do to your networks & gear?
• There are easier ways
14. Private Clouds: My Advice for '12
• Remain cynical (test vendor claims)
• Due Diligence still essential
• I personally would not deploy/buy anything that does not explicitly provide Amazon API compatibility
15. Private Clouds: My Advice for '12
• Most people are better off:
• Adding VM platforms to existing HPC clusters & environments
• Extending enterprise VM platforms to allow user self-service & server catalogs
23. • Lots of aggressive marketing
• Lots of carefully constructed “case studies” and prototypes
• The truth?
• Less usable than you've been told
• Possible? Heck yeah.
• Practical? Only sometimes.
24. • Advice
• Be cynical
• Demand proof
• Test carefully
25. • Still want to do it?
• Buy it, don't build it
• Cycle Computing
• Univa
• Bright Computing
• …
26. • Follow the crowd
• In the real world we see:
• Separation between local and cloud HPC resources
• Send your work to the system most suitable
31. • In life science informatics we have hundreds of codes that will never be rewritten.
• We'll be needing them for years to come.
32. • Advice:
• MapReduce-ish methods are the future for big-data informatics
• It will take years to get there
• We still have to deal with legacy algorithms and codes
33. • You will need:
• A process for figuring out when it's worthwhile to rewrite/re-architect
• Tested cloud strategies for handling three use cases
34. You need 3 cloud architectures:
1. Legacy HPC
2. “Cloudy” HPC
3. Big Data HPC (Hadoop)
35. Legacy HPC on the cloud
• MIT StarCluster
• http://web.mit.edu/star/cluster/
• This is your baseline
• Extend as needed
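As a starting point, StarCluster is driven by a single config file plus a one-line launch command. The sketch below is illustrative only; the section names follow StarCluster's documented config format, but the key paths, AMI ID, and cluster name are placeholder values you would replace with your own:

```ini
; ~/.starcluster/config -- minimal illustrative cluster definition
[aws info]
AWS_ACCESS_KEY_ID = <your-access-key>
AWS_SECRET_ACCESS_KEY = <your-secret-key>

[key mykey]
KEY_LOCATION = ~/.ssh/mykey.rsa

[cluster smallcluster]
KEYNAME = mykey
CLUSTER_SIZE = 2
NODE_IMAGE_ID = <starcluster-ami-id>
NODE_INSTANCE_TYPE = m1.small
```

With that in place, `starcluster start smallcluster` brings up the cluster; extend the template (more nodes, EBS volumes, plugins) as needed.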
36. “Cloudy” HPC
• Use this method when …
• It makes sense to rewrite or rearchitect an HPC workflow to better leverage modern cloud capabilities
37. “Cloudy” HPC, continued
• Ditch the legacy compute farm model
• Leverage elastic scale-out tools (***)
• Spot Instances for elastic & cheap compute
• SimpleDB for job statekeeping
• SQS for job queues & workflow “glue”
• SNS for message passing & monitoring
• S3 for input & output data
• Etc.
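The division of labor above can be sketched in plain Python using local stand-ins for the AWS pieces (a queue in place of SQS, a dict in place of SimpleDB); all names here are hypothetical, and the point is only the pattern, not the real boto calls:

```python
import queue

# Local stand-ins for the AWS services named above (illustrative only):
job_queue = queue.Queue()   # plays the role of SQS: job queue & workflow "glue"
job_state = {}              # plays the role of SimpleDB: per-job statekeeping

def submit_job(job_id, payload):
    """Producer side: record initial state, then enqueue the work."""
    job_state[job_id] = "queued"
    job_queue.put((job_id, payload))

def worker():
    """Consumer side: an elastic (e.g. spot-instance) worker polls the queue."""
    while not job_queue.empty():
        job_id, payload = job_queue.get()
        job_state[job_id] = "running"
        result = payload.upper()   # stand-in for the real compute step
        # In the real pattern, results land in S3 and a completion
        # notification goes out via SNS.
        job_state[job_id] = "done"
        yield job_id, result

submit_job("job-1", "acgt")
results = dict(worker())
print(job_state)   # {'job-1': 'done'}
print(results)     # {'job-1': 'ACGT'}
```

Because state lives in a shared store rather than on any one node, workers can be added or removed freely, which is what makes the elastic scale-out model work.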
38. Big Data HPC
• It's gonna be a MapReduce world
• Little need to roll your own
• Ecosystem already healthy
• Multiple providers today
• Often a slam-dunk cloud use case
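For readers new to the model, a minimal word-count sketch shows the two phases Hadoop and its relatives industrialize; this is a toy illustration of MapReduce semantics, not Hadoop code:

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in an input record.
    return [(word, 1) for word in line.split()]

def reduce_phase(pairs):
    # Shuffle/reduce: group pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["gene variant gene", "variant calling"]
word_counts = reduce_phase(chain.from_iterable(map_phase(l) for l in lines))
print(word_counts)  # {'gene': 2, 'variant': 2, 'calling': 1}
```

A framework like Hadoop runs the map calls in parallel across a cluster and handles the shuffle between phases, which is why there is little need to roll your own.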
46. • Consistently getting easier
• Amazon is not a bottleneck
• AWS Import/Export
• AWS Direct Connect
• Aspera has some amazing stuff out right now
47. • Advice
• AWS Import/Export works well
• Size of pipe is not everything
• Sweat the small stuff
• Tracking, checksums, disk speed
• Dedicated workstations
• Secure media storage
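"Sweat the small stuff" mostly means tracking and verification. One minimal sketch of the idea, using Python's standard hashlib (the function and manifest names are our own, not from any transfer tool):

```python
import hashlib
import os

def file_checksum(path, algo="md5", chunk_size=1 << 20):
    """Stream a file through a hash in chunks so large files don't fill RAM."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def manifest(paths):
    """Simple tracking manifest: path -> (size in bytes, checksum)."""
    return {p: (os.path.getsize(p), file_checksum(p)) for p in paths}
```

Build a manifest before the data leaves, rebuild it after arrival, and compare; any mismatch flags a file to re-send.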
51. • Advice for 2012
• BioTeam is dialing down our advocacy of physical data ingestion into the cloud
• Why?
• Operationally hard, expensive, and no longer strictly needed
54. • People trying to move data via physical media quickly realize the operational difficulties
• Bandwidth is cheaper than hiring another body to manage physical data ingestion & movement
• In 2012 we strongly recommend network-based data movement when at all possible
60. • Not much we can do except engineer around it
• AWS compute cluster instances are a huge step forward
• AWS competitors take note
61. • We are not database nerds
• We care about more than just random I/O performance
• We need it all
• Random I/O
• Long sequential read/write
62. • Faster Storage Options
• Software RAID on EBS
• Various GlusterFS options
• Even if you optimize everything, the virtual NICs are still a bottleneck
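The software-RAID-on-EBS option boils down to striping several attached volumes into one device with mdadm. A rough how-to fragment, where the device names and volume count are examples only and will differ on your instances:

```shell
# Illustrative only -- stripe four attached EBS volumes into RAID 0
mdadm --create /dev/md0 --level=0 --raid-devices=4 \
      /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
mkfs.ext4 /dev/md0
mkdir -p /data && mount /dev/md0 /data
```

RAID 0 here buys aggregate throughput, not durability; and as noted above, even a well-tuned stripe set is ultimately capped by the virtual NIC.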
63. • Big Shared Storage
• 10GbE nodes and NFS
• Software RAID sets
• GlusterFS or similar
• 2012: pNFS finally?