Best Practices & Lessons Learned
Life Science Informatics & The Cloud
Tuesday, May 28, 13
I’m Chris.
I’m an infrastructure geek.
I work for the BioTeam.
Twitter: @chris_dag
Who, what & why
BioTeam
‣ Independent consulting shop
‣ Staffed by scientists forced to
learn IT, SW & HPC to get our
own research done
‣ 12+ years bridging the “gap”
between science, IT & high
performance computing
‣ www.bioteam.net
Seriously.
Listen to me at your own risk
‣ Clever people find multiple
solutions to common issues
‣ I’m fairly blunt, burnt-out and
cynical in my advanced age
‣ Significant portion of my work
has been done in demanding
production Biotech & Pharma
environments
‣ Filter my words accordingly
Other 2013 Presentations ...
Bio-IT World Boston
Bio-IT World Boston: “Multi-Tenant Research Clusters”
http://slideshare.net/chrisdag/
Bio-IT World Boston: “HPC Trends from the trenches.”
http://slideshare.net/chrisdag/
1. Intro
2. Meta: Why Cloud?
3. What the sales & marketing folks won’t tell you
4. Getting Practical
5. HPC Case Study
The big picture
Why we need IaaS clouds ...
Why life science needs infrastructure clouds
Big Picture
‣ HUGE revolution in the rate at which lab platforms are
being redesigned, improved & refreshed
• Example: CCD sensor upgrade on that confocal
microscopy rig just doubled your storage requirements
• Example: That 2D ultrasound imager is now a 3D imager
• Example: Illumina HiSeq upgrade just doubled the rate at
which you can acquire genomes. Massive downstream
increase in storage, compute & data movement needs
The Central Problem Is ...
‣ Instrumentation & protocols are changing FAR FASTER
than we can refresh our Research-IT & Scientific
Computing infrastructure
• The science is changing month-to-month ...
• ... while our IT infrastructure only gets refreshed every 2-7
years
‣ We have to design systems TODAY that can support
unknown research requirements & workflows over many
years (gulp ...)
The Central Problem Is ...
‣ The easy period is over
‣ 5 years ago you could toss inexpensive storage and
servers at the problem; even in a nearby closet or under
a lab bench if necessary
‣ That does not work any more; real solutions required
And a related problem ...
‣ It has never been easier to acquire vast amounts of data
cheaply and easily
‣ Growth rate of data creation/ingest exceeds rate at
which the storage industry is improving disk capacity
‣ Not just a storage lifecycle problem. This data *moves*
and often needs to be shared among multiple entities
and providers
• ... ideally without punching holes in your firewall or
consuming all available internet bandwidth
If you get it wrong ...
‣ Lost opportunity
‣ Missing capability
‣ Frustrated & very vocal scientific staff
‣ Problems in recruiting, retention,
publication & product development
IaaS to the Rescue
IaaS solves the current critical “Research IT” dilemma
Why Cloud?
‣ IaaS clouds let us react and
respond to scientific
requirements that change far
faster than we can refresh
local datacenters and
enterprise IT platforms
Image: shanelin via Flickr
Beyond capability and agility gains ...
Why Cloud?
‣ The economic benefits are real, inescapable and
trending in the proper direction
‣ Internet-scale providers with millions of cores and
exabytes of spinning disk spanning the globe
leverage operational efficiencies you will never come
close to matching internally
‣ ... be suspicious of people who claim otherwise
Also ...
Why Cloud?
‣ Clouds becoming a natural
place for data exchange &
access
‣ “scriptable everything”
enables entirely new
capabilities not possible
internally*
‣ Finance people love converting
CapEx to OpEx
1. Intro
2. Meta: Why Cloud?
3. What the sales & marketing folks won’t tell you
4. Getting Practical
5. HPC Case Study
What the salesfolk won’t tell you ...
‣ There is no one-size-fits-all research
design pattern ...
‣ You are not going to toss everything
and replace it with “Big Data”
‣ Very few of us have a single pipeline or
workflow that we can devote endless
engineering effort to
‣ We are not going to toss out hundreds
of legacy codes and rewrite everything
for GPUs or MapReduce
‣ For research HPC it’s all about the
building blocks { and how we can
effectively use/deploy them }
What the salesfolk won’t tell you
‣ Your organization actually needs THREE tested cloud
design patterns:
‣ (1) To handle ‘legacy’ scientific apps & workflows
‣ (2) The special stuff that is worth re-architecting
‣ (3) Hadoop & big data analytics
Legacy HPC on the Cloud
Design Pattern #1 - Legacy
‣ There are many hundreds of
existing algorithms and
applications in the life science
informatics space
‣ We’ll be running/using these
codes for years to come
‣ Many can’t or will never be
refactored or rewritten
‣ I call this the “legacy” design
pattern
One Easy Solution.
StarCluster
Design Pattern #1 - Legacy
‣ MIT StarCluster
• http://web.mit.edu/star/cluster/
‣ Infinite Awesomeness. Worth a talk by itself.
‣ This is your baseline
‣ Extend as needed
Design Pattern #2 - “Cloudy”
‣ Some of our research workflows are important enough to
be rewritten for “the cloud” and the advantages that a
truly elastic & API-driven infrastructure can deliver
‣ This is where you have the most freedom
‣ Many published best practices you can borrow
‣ Warning: Cloud vendor lock-in potential is strongest here
Design Pattern #3 - Hadoop/BigData
‣ Hadoop and “big data” need to be on your radar
‣ Be careful though, you’ll need a gas mask to avoid the
smog of marketing and vapid hype
‣ The utility is real and this does represent one “future
path” for analysis of large data sets
Design Pattern #3 - Hadoop/BigData
‣ It’s going to be a MapReduce world, get used to it
‣ Little need to roll your own Hadoop in 2013
‣ ISV & commercial ecosystem already healthy
‣ Multiple providers today; both onsite & cloud-based
‣ Often a slam-dunk cloud use case
What you need to know
Design Pattern #3 - Hadoop/BigData
‣ “Hadoop” and “Big Data” are now general terms
‣ You need to drill down to find out what people actually
mean
‣ We are still in the period where senior leadership may
demand “Hadoop” or “BigData” capability without any
actual business or scientific need
What you need to know
Hadoop & “Big Data”
‣ In broad terms you can break “Big Data” down into two very
basic use cases:
1. Compute: Hadoop can be used as a very powerful platform for
the analysis of very large data sets. The Google search term
here is “map reduce”
2. Data Stores: Hadoop is driving the development of very
sophisticated “no-SQL” “non-Relational” databases and data
query engines. The Google search terms include “nosql”,
“couchdb”, “hive”, “pig” & “mongodb”, etc.
‣ Your job is to figure out which type applies for the groups
requesting “Hadoop” or “BigData” capability
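For readers who have not met the “map reduce” idea behind the compute use case, here is a minimal single-process sketch. It is not Hadoop code and does not use any Hadoop API. The function names and the toy reads are made up for illustration, and a real cluster would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k=3):
    """Map step: emit a (k-mer, 1) pair for every k-length window in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(reads, k=3):
    """Run map, shuffle and reduce in sequence over a list of reads."""
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "GTACGT"]  # hypothetical example data
    print(count_kmers(toy_reads))
```

The point of the pattern is that each map call only sees one read and each reduce call only sees one key, which is what lets a framework spread the work across machines that sit close to the data.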
Hadoop vs traditional Linux Clusters
High Throughput Science
‣ Hadoop is a very complex beast
‣ It’s also the way of the future so you can’t ignore it
‣ Very tight dependency on moving the ‘compute’ as close
as possible to the ‘data’
‣ Hadoop clusters are just different enough that they do
not integrate cleanly with traditional Linux HPC systems
‣ Often treated as separate silo or punted to the cloud
What you need to know
Hadoop & “Big Data”
‣ Hadoop is being driven by a small group of academics
writing and releasing open source life science Hadoop
applications
‣ Your people will want to run these codes
‣ In some academic environments you may find people
wanting to develop on this platform
1. Intro
2. Meta: Why Cloud?
3. What the sales & marketing folks won’t tell you
4. Getting Practical
5. HPC Case Study
Strategy
Practical Advice
‣ Research-oriented IT organizations need a cloud strategy
today, or risk being bypassed by employees
Design Patterns
Practical Advice
‣ Remember the three design patterns on the cloud:
• Legacy HPC systems
(replicate traditional clusters in the cloud)
• Hadoop
• Cloudy
(when you rewrite something to fully leverage cloud
capability)
Policies and Procedures
Practical Advice
‣ Cloud technology bits are easy. Cloud Process and Policy
discussions take forever
‣ Start these conversations sooner rather than later!
Core services that take time and advance planning
Practical Advice
‣ A few key foundational cloud services take time and
advance planning to deploy properly:
‣ VPNs & subnet schemes
‣ Identity Management & Access Control
‣ Data Movement
Data Movement
Practical Advice
‣ A few words & pictures on data movement ...
Physical data movement station 1
Physical data movement station 2
“Naked” Data Movement
“Naked” Data Archive
Cloud Data Movement
‣ Things changed pretty definitively in 2012
‣ And the next image shows why ...
March 2012
Network vs. Physical
Cloud Data Movement
‣ With a 1GbE internet connection ...
‣ and using Aspera software ....
‣ We sustained 700 Mb/sec for more than 7 hours
freighting genomes into Amazon Web Services
‣ This is fast enough for many use cases, including
genome sequencing core facilities*
‣ Chris Dwan’s webinar on this topic:
http://biote.am/7e
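A quick back-of-envelope check of the numbers above, as a sketch rather than a measurement. It assumes the sustained rate is 700 megabits per second, since a 1GbE link cannot carry more than 1000 Mb/s (125 MB/s), and it uses decimal units throughout. The function names are made up for illustration.

```python
def link_ceiling_mb_per_s(link_mbit_per_s: float) -> float:
    """Theoretical ceiling of a link in megabytes per second (8 bits per byte)."""
    return link_mbit_per_s / 8


def terabytes_moved(rate_mbit_per_s: float, hours: float) -> float:
    """Decimal terabytes transferred at a sustained rate over a number of hours."""
    bytes_per_second = rate_mbit_per_s * 1_000_000 / 8
    return bytes_per_second * hours * 3600 / 1e12


if __name__ == "__main__":
    print(link_ceiling_mb_per_s(1000))  # ceiling of a 1GbE link, in MB/s
    print(terabytes_moved(700, 7))      # data moved in 7 hours at 700 Mb/s, in TB
```

Under these assumptions the link was running at about 70% of its theoretical ceiling, and 7 hours at that rate moves a little over 2 TB. Since the run lasted more than 7 hours, that is a lower bound.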
Network vs. Physical
Cloud Data Movement
‣ Results like this mean we now favor network-based data
movement over physical media movement
‣ Large-scale physical data movement carries a high
operational burden and consumes non-trivial staff time &
resources
There are three ways to do network data movement ...
Cloud Data Movement
‣ Buy software from Aspera and be done with it
‣ Attend the annual SuperComputing conference & see
which student group wins the bandwidth challenge
contest; use their code
‣ Get GridFTP from the Globus folks
SysAdmin vs Programmer
Practical Advice
‣ Recognize the blurring line between
IT / Informatics / SW Engineer
‣ ... and how it may mix up your org chart
Very blurry lines in 2013 for all of these roles
Scientist/SysAdmin/Programmer
‣ Radical change in last ~2 years
for how IT is provisioned,
delivered, managed & supported
‣ Root cause (Technology)
Virtualization & Cloud
‣ Root Cause (Operations)
Configuration Mgmt, Systems
Orchestration & Infrastructure
Automation
‣ SysAdmins & IT staff need to re-skill
and retrain to stay relevant
Very blurry lines in 2013 for all of these roles
Scientist/SysAdmin/Programmer
‣ When everything has an API ..
‣ .. anything can be
‘orchestrated’ or ‘automated’
remotely
‣ And by the way ...
‣ The APIs (‘knobs & buttons’)
are accessible to all
Very blurry lines in 2013 for all of these roles
Scientist/SysAdmin/Programmer
‣ IT jobs, roles and
responsibilities are
undergoing rapid
upheaval
‣ SysAdmins must learn to
program in order to
harness automation tools
‣ Programmers & Scientists
can now self-provision
and control sophisticated
IT resources
Very blurry lines in 2013 for all of these roles
Scientist/SysAdmin/Programmer
‣ My take on the future ...
‣ Far more control is going into the
hands of the research end user
‣ IT support roles will radically
change -- no longer owners or
gatekeepers
‣ IT will handle policies,
procedures, reference patterns,
security & best practices
‣ Researchers will control the
“what”, “when” and “how big”
Thanks! Email: chris@bioteam.net
http://slideshare.net/chrisdag/
Cloud HPC Case Study
Time Permitting ...
Next Generation Nuclear Magnetic Resonance
NMR Probehead Simulation on AWS
‣ CAE Simulation Project
‣ via www.hpcexperiment.com
‣ Software: CST Studio 2012
‣ My role: Volunteer HPC Mentor
Simulating next-generation NMR probeheads
Why this was an interesting project
‣ Frontend interface is graphics
heavy and requires Windows
‣ Studio ‘solvers’ run Linux or
Windows; support GPUs and MPI
task distribution
‣ Simultaneous use of local and
cloud-based solvers actually works
‣ FLEXlm license server involved
‣ Non-trivial security and
geo-location requirements
When we ran at modest scale ...
16 large compute nodes + 22 GPU nodes
$30/hour on AWS Spot Market.
HPC on the cloud is real.
Design Attempt #1
‣ Hybrid Linux/Windows cloud running in AWS EU Region
‣ Failure:
• No GPU nodes in EU at the time
• No cc2.4xlarge at the time
Design Attempt #2
‣ Move Hybrid Linux/Windows system to US-EAST
‣ ... with synthetic test data
‣ Best-practices VPC isolation & VPN access
‣ It looked like this ...
Architecture #2
Design Attempt #2
‣ Attempt #2 Failed:
‣ CST FrontEnd Controller running at end-user site could
not tolerate NAT translation used by solvers
‣ No GPU nodes available within VPC at that time
Design Attempt #3
‣ Design #3 Finally works
‣ VPC shrunk to single license server running in US-EAST
‣ All Windows/Linux/GPU solver nodes running in EU
‣ NO NAT, NO VPC For Solvers
‣ Extensive use of AWS spot instance servers
At experiment end it looked like this ...
Non Trivial HPC on the Cloud
16 large compute nodes + 22 GPU nodes
$30/hour on AWS Spot Market.
Why this work was ‘easy’ on Amazon AWS ...
Nightmare on any other cloud
‣ Let’s discuss why this simulation workload would be
much, much harder to do on some other cloud
platform ...
Why this work was ‘easy’ on Amazon AWS ...
Nightmare on any other cloud
‘Brand X’ Cloud:
1. Virtual Servers
2. Block Storage
3. Object Storage
4. ... and maybe some other stuff if I’m lucky
AWS:
‣ EC2, S3, EBS, RDS, SNS, SQS, SWF, GPUs, SSDs,
CloudFormation, VPC, ENIs, SecurityGroups, 10GbE
DirectConnect, Reserved Instances, ImportExport,
Spot Market
‣ And ~25 other products and service features with more
added monthly
Easy on AWS; much harder elsewhere
One very specific example
‣ The widely used FLEXlm
license server uses NIC
MAC addresses when
generating license keys
‣ Different MAC? Science
stops. Screwed.
‣ VPC ENIs allow separation
of MAC address from
Network Interface.
Badass.
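The idea on this slide can be modelled in a few lines. This is a toy sketch only: it makes no AWS calls, and the classes, the MAC address and the license check are all hypothetical stand-ins, not the real FLEXlm format or the EC2 API. It assumes the license is tied to a single MAC address and that the network interface can be detached from one server and attached to another while keeping its MAC.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NetworkInterface:
    """Stand-in for an elastic network interface: the MAC belongs to it, not to a server."""
    mac: str


@dataclass
class Server:
    """Stand-in for a virtual server that may or may not have the interface attached."""
    name: str
    nic: Optional[NetworkInterface] = None


@dataclass
class LicenseFile:
    """Toy license keyed to one MAC address (hypothetical, not the real FLEXlm format)."""
    licensed_mac: str

    def is_valid_on(self, server: Server) -> bool:
        """The license is honoured only where the licensed MAC is present."""
        return server.nic is not None and server.nic.mac == self.licensed_mac


def move_interface(source: Server, target: Server) -> None:
    """Detach the interface from one server and attach it to another."""
    target.nic, source.nic = source.nic, None


if __name__ == "__main__":
    eni = NetworkInterface(mac="02:00:00:aa:bb:cc")  # made-up MAC address
    old_server = Server("license-old", nic=eni)
    new_server = Server("license-new")
    license_file = LicenseFile(licensed_mac=eni.mac)

    move_interface(old_server, new_server)
    print(license_file.is_valid_on(new_server))
```

Because the MAC travels with the interface, the license server can be rebuilt or replaced without regenerating license keys, which is the property the slide is praising.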
Why this work was ‘easy’ on Amazon AWS ...
A few other examples ...
‣ VPC: Incredibly powerful. Actually useful.
Approachable even if you are not an IPsec or BGP
routing god.
‣ Spot Market: Compelling economics. Once you start you’ll likely
never run anywhere else.
The competition can’t compete.
‣ cc* & cg* EC2 instance types: Fat nodes with bidirectional
10GbE bandwidth. And don’t get me started on SSD or
Provisioned-performance EBS volumes.
Thanks!
Email: chris@Bioteam.net

Practical Petabyte PushingPractical Petabyte Pushing
Practical Petabyte Pushing
 
Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)
 
Cloud Security for Life Science R&D
Cloud Security for Life Science R&DCloud Security for Life Science R&D
Cloud Security for Life Science R&D
 
AWS re:Invent - Accelerating Research
AWS re:Invent - Accelerating ResearchAWS re:Invent - Accelerating Research
AWS re:Invent - Accelerating Research
 
Bio-IT for Core Facility Managers
Bio-IT for Core Facility ManagersBio-IT for Core Facility Managers
Bio-IT for Core Facility Managers
 
Trends from the Trenches (Singapore Edition)
Trends from the Trenches (Singapore Edition)Trends from the Trenches (Singapore Edition)
Trends from the Trenches (Singapore Edition)
 
2012: Trends from the Trenches
2012: Trends from the Trenches2012: Trends from the Trenches
2012: Trends from the Trenches
 
Practical Cloud & Workflow Orchestration
Practical Cloud & Workflow OrchestrationPractical Cloud & Workflow Orchestration
Practical Cloud & Workflow Orchestration
 

Kürzlich hochgeladen

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Bio-IT Asia 2013: Informatics & Cloud - Best Practices & Lessons Learned

  • 1. Best Practices & Lessons Learned: Life Science Informatics & The Cloud. Tuesday, May 28, 2013
  • 2. I’m Chris. I’m an infrastructure geek. I work for the BioTeam. Twitter: @chris_dag
  • 3. Who, what & why: BioTeam ‣ Independent consulting shop ‣ Staffed by scientists forced to learn IT, SW & HPC to get our own research done ‣ 12+ years bridging the “gap” between science, IT & high performance computing ‣ www.bioteam.net
  • 4. Seriously. Listen to me at your own risk ‣ Clever people find multiple solutions to common issues ‣ I’m fairly blunt, burnt-out and cynical in my advanced age ‣ Significant portion of my work has been done in demanding production Biotech & Pharma environments ‣ Filter my words accordingly
  • 5. Other 2013 Presentations ... Bio-IT World Boston
  • 6. Bio-IT World Boston: “Multi-Tenant Research Clusters” http://slideshare.net/chrisdag/
  • 7. Bio-IT World Boston: “HPC Trends from the trenches.” http://slideshare.net/chrisdag/
  • 8. Agenda: (1) Intro (2) Meta: Why Cloud? (3) What the sales & marketing folks won’t tell you (4) Getting Practical (5) HPC Case Study
  • 9. The big picture: Why we need IaaS clouds ...
  • 10. Why life science needs infrastructure clouds. Big Picture ‣ HUGE revolution in the rate at which lab platforms are being redesigned, improved & refreshed • Example: CCD sensor upgrade on that confocal microscopy rig just doubled your storage requirements • Example: That 2D ultrasound imager is now a 3D imager • Example: Illumina HiSeq upgrade just doubled the rate at which you can acquire genomes. Massive downstream increase in storage, compute & data movement needs
  • 11. The Central Problem Is ... ‣ Instrumentation & protocols are changing FAR FASTER than we can refresh our Research-IT & Scientific Computing infrastructure • The science is changing month-to-month ... • ... while our IT infrastructure only gets refreshed every 2-7 years ‣ We have to design systems TODAY that can support unknown research requirements & workflows over many years (gulp ...)
  • 12. The Central Problem Is ... ‣ The easy period is over ‣ 5 years ago you could toss inexpensive storage and servers at the problem; even in a nearby closet or under a lab bench if necessary ‣ That does not work any more; real solutions required
  • 13. And a related problem ... ‣ It has never been easier to acquire vast amounts of data cheaply and easily ‣ Growth rate of data creation/ingest exceeds rate at which the storage industry is improving disk capacity ‣ Not just a storage lifecycle problem. This data *moves* and often needs to be shared among multiple entities and providers • ... ideally without punching holes in your firewall or consuming all available internet bandwidth
  • 14. If you get it wrong ... ‣ Lost opportunity ‣ Missing capability ‣ Frustrated & very vocal scientific staff ‣ Problems in recruiting, retention, publication & product development
  • 15. IaaS to the Rescue
  • 16. Why Cloud? IaaS solves the current critical “Research IT” dilemma ‣ IaaS clouds let us react and respond to scientific requirements that change far faster than we can refresh local datacenters and enterprise IT platforms (Image: shanelin via Flickr)
  • 17. Why Cloud? Beyond capability and agility gains ... ‣ The economic benefits are real, inescapable and trending in the proper direction ‣ Internet-scale providers with millions of cores and exabytes of spinning disk spanning the globe leverage operational efficiencies you will never come close to matching internally ‣ ... be suspicious of people who claim otherwise
  • 18. Why Cloud? Also ... ‣ Clouds becoming a natural place for data exchange & access ‣ “scriptable everything” enables entirely new capabilities not possible internally* ‣ Finance people love converting CapEx to OpEx
  • 19. Agenda: (1) Intro (2) Meta: Why Cloud? (3) What the sales & marketing folks won’t tell you (4) Getting Practical (5) HPC Case Study
  • 20. What the salesfolk won’t tell you ... ‣ There is no one-size-fits-all research design pattern ... ‣ You are not going to toss everything and replace it with “Big Data” ‣ Very few of us have a single pipeline or workflow that we can devote endless engineering effort to ‣ We are not going to toss out hundreds of legacy codes and rewrite everything for GPUs or MapReduce ‣ For research HPC it’s all about the building blocks { and how we can effectively use/deploy them }
  • 21. What the salesfolk won’t tell you ‣ Your organization actually needs THREE tested cloud design patterns: ‣ (1) To handle ‘legacy’ scientific apps & workflows ‣ (2) The special stuff that is worth re-architecting ‣ (3) Hadoop & big data analytics
  • 22. Design Pattern #1 - Legacy: Legacy HPC on the Cloud ‣ There are many hundreds of existing algorithms and applications in the life science informatics space ‣ We’ll be running/using these codes for years to come ‣ Many can’t or will never be refactored or rewritten ‣ I call this the “legacy” design pattern
  • 24. Design Pattern #1 - Legacy: StarCluster ‣ MIT StarCluster • http://web.mit.edu/star/cluster/ ‣ Infinite Awesomeness. Worth a talk by itself. ‣ This is your baseline ‣ Extend as needed
  • 25. Design Pattern #2 - “Cloudy” ‣ Some of our research workflows are important enough to be rewritten for “the cloud” and the advantages that a truly elastic & API-driven infrastructure can deliver ‣ This is where you have the most freedom ‣ Many published best practices you can borrow ‣ Warning: Cloud vendor lock-in potential is strongest here
  • 26. Design Pattern #3 - Hadoop/BigData ‣ Hadoop and “big data” need to be on your radar ‣ Be careful though, you’ll need a gas mask to avoid the smog of marketing and vapid hype ‣ The utility is real and this does represent one “future path” for analysis of large data sets
  • 27. Design Pattern #3 - Hadoop/BigData ‣ It’s going to be a MapReduce world, get used to it ‣ Little need to roll your own Hadoop in 2013 ‣ ISV & commercial ecosystem already healthy ‣ Multiple providers today; both onsite & cloud-based ‣ Often a slam-dunk cloud use case
  • 28. Design Pattern #3 - Hadoop/BigData: What you need to know ‣ “Hadoop” and “Big Data” are now general terms ‣ You need to drill down to find out what people actually mean ‣ We are still in the period where senior leadership may demand “Hadoop” or “BigData” capability without any actual business or scientific need
  • 29. Hadoop & “Big Data”: What you need to know ‣ In broad terms you can break “Big Data” down into two very basic use cases: (1) Compute: Hadoop can be used as a very powerful platform for the analysis of very large data sets. The Google search term here is “map reduce”. (2) Data Stores: Hadoop is driving the development of very sophisticated “NoSQL” (non-relational) databases and data query engines. The Google search terms include “nosql”, “couchdb”, “hive”, “pig”, “mongodb”, etc. ‣ Your job is to figure out which type applies to the groups requesting “Hadoop” or “BigData” capability
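For readers who have not met the “map reduce” idea behind the Compute use case, here is a minimal single-machine sketch in plain Python. It is not Hadoop code and does not come from the talk: the k-mer counting task and the function names (`map_kmers`, `shuffle`, `reduce_counts`, `count_kmers`) are illustrative assumptions, chosen only to show the map, shuffle and reduce phases on a life science flavoured example.

```python
from collections import defaultdict


def map_kmers(read, k):
    """Map phase: emit a (kmer, 1) pair for every k-length window in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce phase: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(reads, k):
    """Run map, shuffle and reduce in sequence on one machine (hypothetical helper)."""
    pairs = [pair for read in reads for pair in map_kmers(read, k)]
    return reduce_counts(shuffle(pairs))


if __name__ == "__main__":
    print(count_kmers(["ACGT", "CGTA"], 2))
```

On a real Hadoop cluster the map and reduce functions run on many nodes next to the data, and the framework handles the shuffle; the shape of the computation is the same.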
  • 30. High Throughput Science: Hadoop vs traditional Linux Clusters ‣ Hadoop is a very complex beast ‣ It’s also the way of the future so you can’t ignore it ‣ Very tight dependency on moving the ‘compute’ as close as possible to the ‘data’ ‣ Hadoop clusters are just different enough that they do not integrate cleanly with traditional Linux HPC systems ‣ Often treated as a separate silo or punted to the cloud
  • 31. Hadoop & “Big Data”: What you need to know ‣ Hadoop is being driven by a small group of academics writing and releasing open source life science Hadoop applications ‣ Your people will want to run these codes ‣ In some academic environments you may find people wanting to develop on this platform
  • 32. Agenda: (1) Intro (2) Meta: Why Cloud? (3) What the sales & marketing folks won’t tell you (4) Getting Practical (5) HPC Case Study
  • 33. Practical Advice: Strategy ‣ Research oriented IT organizations need a cloud strategy today; or risk being bypassed by employees
  • 34. Practical Advice: Design Patterns ‣ Remember the three design patterns on the cloud: • Legacy HPC systems (replicate traditional clusters in the cloud) • Hadoop • Cloudy (when you rewrite something to fully leverage cloud capability)
  • 35. Practical Advice: Policies and Procedures ‣ Cloud technology bits are easy. Cloud Process and Policy discussions take forever ‣ Start these conversations sooner rather than later!
  • 36. Practical Advice: Core services that take time and advance planning ‣ A few key foundational cloud services take time and advance planning to deploy properly: ‣ VPNs & subnet schemes ‣ Identity Management & Access Control ‣ Data Movement
  • 37. Practical Advice: Data Movement ‣ A few words & pictures on data movement ...
  • 38. Physical data movement station 1
  • 39. Physical data movement station 2
  • 42. Cloud Data Movement ‣ Things changed pretty definitively in 2012 ‣ And the next image shows why ...
  • 44. Cloud Data Movement: Network vs. Physical ‣ With a 1GbE internet connection ... ‣ and using Aspera software ... ‣ We sustained 700 Mb/sec for more than 7 hours freighting genomes into Amazon Web Services ‣ This is fast enough for many use cases, including genome sequencing core facilities* ‣ Chris Dwan’s webinar on this topic: http://biote.am/7e
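A quick back-of-the-envelope check of what that transfer amounts to. A 1GbE link carries at most 1000 megabits (125 megabytes) per second, so the sketch below assumes the sustained rate on this slide means roughly 700 megabits per second. The function names are hypothetical, and decimal units (1 TB = 10^12 bytes) are an assumption.

```python
def terabytes_moved(rate_megabits_per_s, hours):
    """Data volume in decimal terabytes for a sustained rate held over a duration."""
    bytes_per_second = rate_megabits_per_s * 1_000_000 / 8  # megabits -> bytes
    return bytes_per_second * hours * 3600 / 1e12


def link_utilization(rate_megabits_per_s, link_megabits_per_s=1000.0):
    """Fraction of the link's nominal capacity that the sustained rate uses."""
    return rate_megabits_per_s / link_megabits_per_s


if __name__ == "__main__":
    # Assumed reading of the slide: 700 Mb/s sustained for 7 hours over 1GbE.
    print(f"{terabytes_moved(700, 7):.3f} TB moved")
    print(f"{link_utilization(700):.0%} of a 1GbE link")
```

Under those assumptions the link runs at about 70% of nominal capacity and moves a little over 2 TB in the 7 hours.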
  • 45. Cloud Data Movement: Network vs. Physical ‣ Results like this mean we now favor network-based data movement over physical media movement ‣ Large-scale physical data movement carries a high operational burden and consumes non-trivial staff time & resources
  • 46. Cloud Data Movement: There are three ways to do network data movement ... ‣ Buy software from Aspera and be done with it ‣ Attend the annual SuperComputing conference & see which student group wins the bandwidth challenge contest; use their code ‣ Get GridFTP from the Globus folks
  • 47. Practical Advice: SysAdmin vs Programmer ‣ Recognize the blurring line between IT / Informatics / SW Engineer ‣ ... and how it may mix up your org chart
  • 48. Scientist/SysAdmin/Programmer: Very blurry lines in 2013 for all of these roles ‣ Radical change in last ~2 years for how IT is provisioned, delivered, managed & supported ‣ Root cause (Technology): Virtualization & Cloud ‣ Root cause (Operations): Configuration Mgmt, Systems Orchestration & Infrastructure Automation ‣ SysAdmins & IT staff need to re-skill and retrain to stay relevant
  • 49. Scientist/SysAdmin/Programmer: Very blurry lines in 2013 for all of these roles ‣ When everything has an API ... ‣ ... anything can be ‘orchestrated’ or ‘automated’ remotely ‣ And by the way ... ‣ The APIs (‘knobs & buttons’) are accessible to all
  • 50. Scientist/SysAdmin/Programmer: Very blurry lines in 2013 for all of these roles ‣ IT jobs, roles and responsibilities are undergoing rapid upheaval ‣ SysAdmins must learn to program in order to harness automation tools ‣ Programmers & Scientists can now self-provision and control sophisticated IT resources
  • 51. Scientist/SysAdmin/Programmer: Very blurry lines in 2013 for all of these roles ‣ My take on the future ... ‣ Far more control is going into the hands of the research end user ‣ IT support roles will radically change -- no longer owners or gatekeepers ‣ IT will handle policies, procedures, reference patterns, security & best practices ‣ Researchers will control the “what”, “when” and “how big”
  • 53. Cloud HPC Case Study (Time Permitting ...)
  • 54. NMR Probehead Simulation on AWS: Next Generation Nuclear Magnetic Resonance ‣ CAE Simulation Project ‣ via www.hpcexperiment.com ‣ Software: CST Studio 2012 ‣ My role: Volunteer HPC Mentor
  • 55. Why this was an interesting project: Simulating next-generation NMR probeheads ‣ Frontend interface is graphics heavy and requires Windows ‣ Studio ‘solvers’ run Linux or Windows; support GPUs and MPI task distribution ‣ Simultaneous use of local and cloud-based solvers actually works ‣ FLEXlm license server involved ‣ Non-trivial security and geo-location requirements
  • 56. When we ran at modest scale ... 16 large compute nodes + 22 GPU nodes. $30/hour on AWS Spot Market. HPC on the cloud is real.
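To put that hourly figure in per-node terms, here is a small arithmetic sketch. It uses only the numbers on the slide (16 + 22 nodes, $30 per hour in total); the function names and the 24-hour example run are illustrative assumptions. Spot prices fluctuate, so treat this as a snapshot, not a quote.

```python
def cost_per_node_hour(total_dollars_per_hour, node_count):
    """Average hourly cost of one node, given the whole cluster's hourly cost."""
    return total_dollars_per_hour / node_count


def run_cost(total_dollars_per_hour, hours):
    """Total cost of keeping the whole cluster up for a number of hours."""
    return total_dollars_per_hour * hours


if __name__ == "__main__":
    nodes = 16 + 22  # large compute nodes + GPU nodes, from the slide
    print(f"${cost_per_node_hour(30.0, nodes):.2f} per node-hour on average")
    # Hypothetical 24-hour run, assuming the spot price holds steady.
    print(f"${run_cost(30.0, 24):.2f} for a 24-hour run")
```

The per-node average blends compute and GPU nodes together; the slide does not break out the price of each type.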
  • 57. Design Attempt #1 ‣ Hybrid Linux/Windows cloud running in AWS EU Region ‣ Failure: • No GPU nodes in EU at the time • No cc2.4xlarge at the time
  • 58. Design Attempt #2 ‣ Move Hybrid Linux/Windows system to US-EAST ‣ ... with synthetic test data ‣ Best-practices VPC isolation & VPN access ‣ It looked like this ...
  • 60. Design Attempt #2 ‣ Attempt #2 Failed: ‣ CST FrontEnd Controller running at end-user site could not tolerate the NAT used by solvers ‣ No GPU nodes available within VPC at that time
  • 61. Design Attempt #3 ‣ Design #3 finally works ‣ VPC shrunk to single license server running in US-EAST ‣ All Windows/Linux/GPU solver nodes running in EU ‣ NO NAT, NO VPC for solvers ‣ Extensive use of AWS spot instance servers
  • 62. At experiment end it looked like this ...
  • 63. Non Trivial HPC on the Cloud: 16 large compute nodes + 22 GPU nodes. $30/hour on AWS Spot Market.
  • 64. Why this work was ‘easy’ on Amazon AWS ... Nightmare on any other cloud ‣ Let’s discuss why this simulation workload would be much, much harder to do on some other cloud platform ...
  • 65. Why this work was ‘easy’ on Amazon AWS ... Nightmare on any other cloud ‣ ‘Brand X’ Cloud: (1) Virtual Servers (2) Block Storage (3) Object Storage (4) ... and maybe some other stuff if I’m lucky ‣ AWS: EC2, S3, EBS, RDS, SNS, SQS, SWF, GPUs, SSDs, CloudFormation, VPC, ENIs, SecurityGroups, 10GbE DirectConnect, Reserved Instances, ImportExport, Spot Market ‣ And ~25 other products and service features with more added monthly
  • 66. Easy on AWS; much harder elsewhere: One very specific example ‣ The widely used FLEXlm license server uses NIC MAC addresses when generating license keys ‣ Different MAC? Science stops. Screwed. ‣ VPC ENIs (Elastic Network Interfaces) decouple the MAC address from the server instance: the MAC stays with the ENI, which can be moved to a replacement server. Badass.
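A toy model of why that matters, in plain Python. This is not the AWS API and not FLEXlm's real license check: the classes, the `license_valid` helper and the MAC value are all made up for illustration. It only shows the idea that a license bound to a MAC address survives a server replacement when the MAC belongs to a detachable interface and not to the server.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NetworkInterface:
    """Stands in for an ENI: the MAC address belongs to the interface itself."""
    mac: str


@dataclass
class Instance:
    """Stands in for a virtual server that may or may not have an interface attached."""
    name: str
    nic: Optional[NetworkInterface] = None


def license_valid(licensed_mac, server):
    """Hypothetical check: the license works only on a server presenting the licensed MAC."""
    return server.nic is not None and server.nic.mac == licensed_mac


def move_interface(source, target):
    """Detach the interface from one server and attach it to another; the MAC travels with it."""
    target.nic, source.nic = source.nic, None


if __name__ == "__main__":
    licensed_mac = "02:00:00:aa:bb:cc"  # made-up example value
    old_server = Instance("license-server-old", NetworkInterface(licensed_mac))
    new_server = Instance("license-server-new")

    print(license_valid(licensed_mac, old_server))  # license works on the original server
    move_interface(old_server, new_server)
    print(license_valid(licensed_mac, new_server))  # and still works after the move
```

Without a movable interface, the replacement server would come up with a different MAC and the license key would have to be reissued.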
  • 67. Why this work was ‘easy’ on Amazon AWS ... A few other examples ... ‣ VPC: Incredibly powerful. Actually useful. Approachable even if you are not an IPSEC or BGP routing god. ‣ Spot Market: Compelling economics. Once you start you’ll likely never run anywhere else. ‣ cc* & cg* EC2 instance types: The competition can’t compete. Fat nodes with bidirectional 10GbE bandwidth. And don’t get me started on SSD or Provisioned-performance EBS volumes.