This document provides an overview of trends from the 2015 Bio IT World Expo presented by Chris Dag of BioTeam. It discusses trends around DevOps, automation, converged infrastructure, compute, storage, cloud computing, and specific projects. Key points include the need for sysadmins to learn scripting and automation, the growing role of APIs, and why object storage is the future for managing scientific data and metadata. Specific examples highlight high-performance computing configurations, small-file ingest solutions, low-cost storage approaches, and a petascale storage system built using Intel Lustre, Linux, and ZFS on commodity hardware.
8. 8
BioTeam.
Independent Consulting Shop
Run by scientists forced to learn IT to “get science done”
Virtual company with nationwide staff
15+ years “bridging the gap” between hardcore science, HPC & IT
Honest. Objective. Vendor & Technology Agnostic.
11. 11
Disclaimer
I speak mainly from my own personal experiences
Not an expert. Not a pundit. “Thought Leader?” Hell no!
… and what I learn via osmosis from coworkers, clients & colleagues
Be cynical. Filter my words through your own insight & experience.
12. 12
image: shanelin via flickr
Tick …
Tick …
Tick …
Insufferable, huh? Let’s talk trends …
And steal/recycle some bits from last year …
20. 20
Years ago we could stage cheap kit in the lab or a nearby telco closet
The easy period is over.
This approach has not been viable for years now… real solutions are required
27. Homework Assignment #1
2015 Bio IT World Expo
‣ 11:10am Tomorrow - Track1
‣ “Infrastructure, Architecture, and Organization:
Data Engineering at Scale at the Broad”
• Chris Dwan - Assistant Director, Research Computing
and Data Services, Broad Institute of MIT and Harvard
• … and the DDN/Qumulo stuff preceding Dwan should
be interesting as well
40. 40
Chef, Puppet, Ansible, SaltStack …
Pick what works for you (and that all can agree on)
… and commit to learning, using and evangelizing it
… ideally across the enterprise (not just Research)
41. 41
Hey Network Engineers …
Same API-driven automation trends are steamrolling your way
You’ll need more than a Cisco certification in the future
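A minimal sketch of what that API-driven shift looks like in practice, assuming a switch that exposes a REST management API; the endpoint, credentials and port names here are hypothetical, and real vendor APIs (Arista eAPI, Cisco NX-API, etc.) differ in detail:
```python
# Sketch: pushing a config change to a switch over HTTP instead of a CLI session.
# The URL, credentials, resource paths and port names are hypothetical.
import requests

SWITCH_API = "https://switch01.example.org/api/v1"   # hypothetical endpoint
AUTH = ("automation", "s3cret")                      # hypothetical credentials

def set_port_mtu(port: str, mtu: int) -> None:
    """Push an MTU change to one switch port via its HTTP API."""
    resp = requests.patch(
        f"{SWITCH_API}/interfaces/{port}",
        json={"mtu": mtu},
        auth=AUTH,
        timeout=10,
    )
    resp.raise_for_status()

if __name__ == "__main__":
    # Enable jumbo frames on a handful of HPC-facing ports.
    for p in ["Ethernet1", "Ethernet2", "Ethernet3"]:
        set_port_mtu(p, 9214)
```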
44. Homework Assignment #2
2015 Bio IT World Expo
‣ 2:55pm Today - Track1
‣ “Accelerating Biomedical Research Discovery:
The 100G Internet2 Network – Built and
Engineered for the Most Demanding Big Data
Science Collaborations”
• Christian Todorov, Director, Network Services
Management, Internet2
‣ Why? Internet2 has some of the most interesting non-
cloud SDN stuff in production today
47. 47
Unchanged in 2015:
Even our lab instruments know how to submit jobs to the
common HPC resource allocation & scheduling tools
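For illustration, a minimal sketch of an instrument-side hook that hands a finished run to the scheduler; it assumes a Slurm sbatch on the PATH, and the pipeline wrapper name is hypothetical (Grid Engine or LSF submission looks much the same):
```python
# Sketch: a post-run hook on a lab instrument handing analysis off to the
# cluster scheduler. Assumes Slurm's `sbatch` is on the PATH; the
# analyse_run.sh wrapper is a hypothetical pipeline entry point.
import subprocess
from pathlib import Path

def submit_analysis(run_dir: Path) -> str:
    """Submit an analysis job for a finished run and return the scheduler's reply."""
    script = f"""#!/bin/bash
#SBATCH --job-name=ingest-{run_dir.name}
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
analyse_run.sh {run_dir}
"""
    # sbatch accepts the batch script on stdin when no file argument is given.
    result = subprocess.run(
        ["sbatch"], input=script, text=True,
        capture_output=True, check=True,
    )
    return result.stdout.strip()   # e.g. "Submitted batch job 12345"

if __name__ == "__main__":
    print(submit_analysis(Path("/instruments/hiseq01/run_0042")))
```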
48. 48
Unchanged in 2015:
Still feels like a solved problem. Compute is a commodity.
Most of the interesting action is with ‘outlier’ projects
Ok. Designing a 60,000 CPU core HPC environment is still hard :)
Compute power is rarely challenging these days.
49. 49
Compute: NICs & Disks
At 10Gig and higher speeds, server, NIC and disk configurations can
easily become the new bottleneck. Pay careful attention to NIC selection
and consider host/kernel tuning when playing at 10Gig and above.
Latest Intel Haswell stuff is driving 40Gig NICs today
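As a rough illustration of the host/kernel tuning mentioned above, a small audit script over common Linux TCP buffer sysctls; the target values are illustrative starting points drawn from public 10/40GbE tuning guidance, not a recommendation for any specific site:
```python
# Sketch: audit (and optionally raise) Linux TCP buffer sysctls for 10Gig+ hosts.
# Values are illustrative starting points; verify against your NIC vendor's
# and your network team's recommendations before applying.
from pathlib import Path

RECOMMENDED = {
    "net/core/rmem_max": "67108864",             # 64 MB max receive buffer
    "net/core/wmem_max": "67108864",             # 64 MB max send buffer
    "net/ipv4/tcp_rmem": "4096 87380 67108864",  # min/default/max receive
    "net/ipv4/tcp_wmem": "4096 65536 67108864",  # min/default/max send
}

def audit(apply: bool = False) -> None:
    for key, want in RECOMMENDED.items():
        node = Path("/proc/sys") / key
        have = node.read_text().split()
        if have != want.split():
            print(f"{key}: have {' '.join(have)!r}, want {want!r}")
            if apply:                  # writing requires root
                node.write_text(want + "\n")

if __name__ == "__main__":
    audit(apply=False)   # report only; run with apply=True as root to change
```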
50. 50
Compute: Diversity Trend Still Strong
HPC Life Science Informatics via modular “Building Blocks”
Thin Nodes (fastest CPUs)
Fat Nodes (large SMP)
Large Memory Block
Very Large Memory Block
GPU Compute Block
GPU Visualization Block
Phi Coprocessor Block
Flash/SSD Analytic Block
Hadoop/HDFS Block
Dev/Application Block
Homogeneous Research IT is fading away in favor of “capability-based” computing via standard HPC building blocks
54. 54
Compute: FPGAs & Phis
Exotic hardware has its place but will not rule the world
2015: Thoughts on hardware acceleration largely unchanged
Why not?
“... the activation energy required for a scientist to use
this stuff is generally quite high ...”
Best deployed as point-solution for pain point or as component
of large scale / high-value analytical workflow
55. Homework Assignment #3
2015 Bio IT World Expo
‣ 1:10pm Today - Lunch Presentation II
‣ “Optimizing Genomic Sequence Searches to
Next-Generation Intel Architectures”
• Bhanu Rekepalli, Ph.D., Senior Scientific Consultant &
Principal Investigator, BioTeam Inc.
‣ Interested in Phi? Bhanu has deep experience and will
be talking at lunch about Intel Phi used for massively
scalable BLAST searches
56. 56
Topic / Trend: Converged Infrastructure
Yep.
100Gig line card in the wild
57. 57
HyperConvergence in Research IT
Not yet a widespread trend. Something to watch though
Warning/Caveat:
We see it mostly in very large greenfield deployments
Feels like scale-out petabyte+ NAS a few years ago — This may be
an area that most of us “watch” to see how the big orgs approach it
… and then we copy them 2-3 years later
58. 58
HyperConvergence in Research IT
“ISP Model” seeing use within large campus network upgrade projects
Small Examples
Ultra-converged Virtualization blocks packed with CPU/Disk/Flash/NICs
Avere kit front-ending on-prem NAS/Object + Google & Amazon Object Stores
DDN Disk Arrays running native applications via onboard hypervisors
iRODS + Object Store efforts (including Cleversafe/BioTeam work …)
59. 59
HyperConvergence in Research IT
One Big Example (and topic to watch …)
Infiniband EDR. Particularly the Mellanox stuff
Mellanox ConnectX-4 VPI and EDR Infiniband gets you:
648 ports of 100Gig performance in one director-class switch
Split-personality host adaptors supporting IB, Ethernet or both
Infiniband: 56Gb/s FDR or 100Gb/s EDR
Ethernet: 1GbE, 10GbE, 40GbE, 56GbE, 100GbE
60. 60
Infiniband Convergence
EDR in the Core & FDR @ the Edge enable large non-blocking HPC designs
This is enabling some cool stuff in large greenfield projects:
Infiniband for parallel filesystem access AND low-latency MPI apps
10/40/100 Gig Ethernet wherever you need it
7-figure CAPEX cost savings at very large scale (**)
Compute, storage & message passing on one managed fabric
63. 63
2015 Converged IT Summit
1st ever BioTeam conference series (w/ CHI of course)
2-day meeting of the minds / Total focus on life science topics
Brochures @ BioTeam Booth #357
September 9-10, 2015
Intercontinental Hotel, San Francisco USA
http://convergeditsummit.com
65. 65
Cloud-based Science:
Still real. Still useful. Still growing strong.
One quick recap and then a few slides on
some BioTeam 2015 “firsts”
Still polluted by marketeers and thick layers of BS
66. 66
Cloud-based Science:
Cost and economics ARE NOT the primary drivers
Neutral meeting ground for collaboration w/ competitors
Primary cloud/science driver is CAPABILITY:
Lab instruments are now capable of “write to cloud” operation
Ease of data ingest/exchange
IaaS environments like AWS/Google offer capabilities that we
can’t easily match on-premise
And many more …
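As a rough sketch of the “write to cloud” ingest capability above, a minimal uploader that pushes a finished run folder to object storage; it assumes boto3 with working AWS credentials, and the bucket and paths are hypothetical:
```python
# Sketch: push a finished instrument run straight to object storage.
# Bucket name and run paths are hypothetical.
import boto3
from pathlib import Path

s3 = boto3.client("s3")
BUCKET = "example-lab-ingest"

def upload_run(run_dir: Path, prefix: str) -> None:
    """Upload every file under run_dir, preserving relative paths as object keys."""
    for f in run_dir.rglob("*"):
        if f.is_file():
            key = f"{prefix}/{f.relative_to(run_dir)}"
            s3.upload_file(str(f), BUCKET, key)

if __name__ == "__main__":
    upload_run(Path("/instruments/hiseq01/run_0042"), "hiseq01/run_0042")
```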
67. 67
Cloud-based Science:
BioTeam IaaS Cloud Milestones (2014-2015)
Our first use of a 10Gig Direct Connect circuit to Amazon
Our first use of Internet2 as layer2/3 transit for AWS Direct Connect
Our first large-scale production project on Google Compute
Our first large-scale use of Google Genomics API
69. Homework Assignment #4
2015 Bio IT World Expo
‣ 10:40am Thursday - Track 3
‣ “Next-Generation Sequencing and Cloud Scale:
A Journey to Large-Scale Flexible
Infrastructures in AWS”
• Jason Tetrault, Associate Director, Business and
Information Architect, R&D IT, Biogen
74. 74
Storage:
Lots of attention to this area in the 2014 talk
Check out the 2014 Trends slides online at http://slideshare.net/chrisdag
No time to recap all of the stuff that has only modestly changed
Today: Quick recap followed by new stuff/thoughts/trends
75. 75
Storage: Quick Recap
Still a huge pain point
Still amazing ways to waste large amounts of money
Petabyte-class storage has not been scary for years now
Single Tier of Scale-out NAS or Parallel FS insufficient in 2015
Multiple Tiers are a requirement; probably multi-vendor
76. 76
Storage: Reasonable Tier Example
5-40TB SSD/Flash tier for ingest & IOPS-sensitive workflows
50-400TB tier (SATA,SAS,SSD mix) for active processing
Petabyte-capable (Cloud/Object/SATA) nearline tier
100TB - 1PB “Trash” Tier (optional)
100TB - 500TB Fast Scratch (optional)
77. 77
Storage: Object Is the Future
Not a trend. Yet. I’m still a believer though …
Object storage is the future of
scientific data at rest.
Expect a lot more on this in 2016 talk …
78. 78
Storage: Object Is the Future
Don’t believe me?
Check out how many object vendors
are on the show floor this week!
Amplidata, Avere*, Cleversafe, DDN, SwiftStack, etc. etc.
79. 79
Storage: Object Is the Future
This is what my metadata looks like on a POSIX filesystem:
Owner
Group membership
Read/write/execute permissions based on Owner or Group
File size
Creation/Modification/Last-access Timestamps
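That short list is essentially everything a POSIX stat() call can tell you; a quick illustration (the filename is hypothetical):
```python
# Everything POSIX knows about a file, via stat(): owner, group, mode,
# size and timestamps. No room for scientific context.
import os, pwd, grp, stat

st = os.stat("example.fastq.gz")             # any file path
print("owner:", pwd.getpwuid(st.st_uid).pw_name)
print("group:", grp.getgrgid(st.st_gid).gr_name)
print("mode: ", stat.filemode(st.st_mode))   # e.g. -rw-r--r--
print("size: ", st.st_size, "bytes")
print("ctime/mtime/atime:", st.st_ctime, st.st_mtime, st.st_atime)
```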
80. 80
Storage: Object Is the Future
This is what I WOULD LIKE TO TRACK on a PER-FILE basis:
What instrument produced this data?
What funding source paid to produce this data?
What revision was the instrument/flowcell at?
Who is the primary PI or owner of this data? Secondary?
What protocol was used to prepare the sample?
Where did the sample come from?
Where is the consent information?
Can this data be used to identify an individual?
What is the data retention classification for this file?
What is the security classification for this file?
Can this file be moved offsite?
etc. etc. etc.
…
81. 81
Storage: Object Is the Future
Historically metadata has been tracked via several methods:
DIY LIMS and Relational Database Systems
iRODS or other “metadata aware” systems
… all at significant human, development & operational cost
82. 82
Storage: Object Is the Future
My gut feeling for the future:
Economics and tech benefits like erasure coding will draw interest
But most adoption will be motivated by the ease with which arbitrary
metadata can be stored with each file or object
… and later searched / sorted / retrieved based on queries against
the stored metadata
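For a concrete feel of “arbitrary metadata stored with each object”, a hedged sketch using the generic S3 API; other object stores (Cleversafe, Swift, etc.) expose similar but not identical mechanisms, the keys and bucket are hypothetical, and search over this metadata still typically requires an external index:
```python
# Sketch: attach user-defined key/value metadata to an object at write time.
# Keys, values and bucket are hypothetical; most stores still need an external
# index (database, iRODS, search engine, ...) to make this metadata queryable.
import boto3

s3 = boto3.client("s3")

with open("sample_0042.bam", "rb") as fh:
    s3.put_object(
        Bucket="example-lab-archive",
        Key="projects/p123/sample_0042.bam",
        Body=fh,
        Metadata={                       # stored alongside the object itself
            "instrument": "hiseq01",
            "funding-source": "grant-ABC123",
            "pi": "jdoe",
            "consent-ref": "IRB-2015-077",
            "retention-class": "7y",
            "offsite-ok": "false",
        },
    )
```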
83. 83
Storage: Object Is the Future
Advice for the audience:
It will take years for our field to get here. You've got time!
When evaluating Object Storage Solutions:
… consider scoring or evaluating them on how well they
handle metadata search and retrieval operations
84. 84
Storage: Object Is the Future
Would you like object storage with your iRODS?
Go talk to the Cleversafe people on the show floor
We did some neat stuff with them related to using
iRODS with a Cleversafe object store backend
86. 86
Storage: Where the Action Is
Primary storage is still challenging, but …
The really interesting work lies at the edges:
Small & Cheap Storage
Ludicrously Large Storage
Some examples …
87. 87
Storage War Story 1: Small Ingest
Pharmaceutical company with an ingest issue
Funky lab instrument puts 30,000 tiny files in one directory
Copying each experiment across a 1Gb link to the SAN took HOURS
Root cause: SAN system choking on tiny file metadata
Trivial from a size viewpoint — ~6GB per experiment
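Not the fix this project ultimately chose (see the next slide), but one common mitigation worth noting: bundle the tiny files into a single archive so the SAN sees one large sequential write instead of roughly 30,000 metadata-heavy creates. A minimal sketch, with hypothetical paths:
```python
# Sketch of a common tiny-file mitigation (not the solution this project used):
# tar the experiment directory before it hits the SAN so the destination sees
# one ~6GB sequential write rather than tens of thousands of file creates.
import tarfile
from pathlib import Path

def bundle_experiment(src_dir: Path, dest: Path) -> Path:
    out = dest / f"{src_dir.name}.tar"
    with tarfile.open(out, "w") as tar:    # uncompressed: CPU-cheap at ~6GB
        tar.add(src_dir, arcname=src_dir.name)
    return out

if __name__ == "__main__":
    print(bundle_experiment(Path("/instrument/exp_20150421"),
                            Path("/mnt/san/ingest")))
```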
88. 88
Storage War Story 1: Small Ingest
This was a VERY interesting project
There are MANY large/expensive systems that can handle
small file ingest. Don’t need BIG and can’t afford EXPENSIVE
Incredibly difficult to find small 5-10TB usable solution with
the right mix of hardware to handle small-file ingest
Winner: Nexsan, due to their FASTier smart SSD caching
95. 95
Storage War Story 2: Backblaze
Backblaze ‘pod’ style still lives on
Now with much better fault tolerance
No more custom wiring harnesses!
What can YOU do with 45x 6TB drives at
rock bottom price?
99. 99
Storage War Story 2: Backblaze
Hope to blog about this Summer 2015
… including updated ‘real world’ cost
… and their 30-drive totally silent chassis (!)
100. 100
One last storage war story …
From a lab with long history of
innovative storage projects
101. 101
Storage War Story 3: Petascale Disruption
Very cool 2014 project
BioTeam + Pineda Lab @ Johns Hopkins
Intel Lustre + Linux + ZFS + Commodity HW
102. 102
Storage War Story 3:
Petascale Disruption
2 Petabytes (raw) / 1.4PB usable for $165,000
PUE of 1.5 = $10,000/year in electrical savings
Performance close to much more $$$ options
Expect to see more details released in 2015
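Back-of-envelope arithmetic on the figures above (usable capacity and CAPEX only, ignoring power and admin costs):
```python
# Cost per usable terabyte, derived from the slide's own numbers.
capex_usd = 165_000
usable_tb = 1.4 * 1000          # 1.4 PB usable

print(f"${capex_usd / usable_tb:.0f} per usable TB")   # roughly $118/TB
```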
104. 104
Last section of this talk is going to
discuss what keeps me up at night.
105. 105
This is what I expect my hardest projects
will involve 2015-2016 and beyond …
106. 106
Tipping Point #1
Effort/cost of generating or acquiring vast piles of data
in 2015 is far less than the real-world cost of storing and
managing that data through a realistic lifecycle.
107. 107
Tipping Point #2
Scientists still believe storage is cheap & near-infinite.
Data triage is no longer sufficient. Scientists are rarely asked
to articulate a scientific/business case for storage.
108. 108
Tipping Point #3
Centralized infrastructure models are not sufficient and
must be modified. Data & compute WILL span sites and
locations with or without active IT involvement.
“Data Spread” is real. We need to start preparing now.
110. 110
“Center Of Gravity” Problem
Current methods involving centralized storage and bringing
“users” and “compute” very close “… to the data” are
going to face significant problems in 2015 and beyond.
111. 111
“Center Of Gravity” Pain #1
Terabyte class instruments. Everywhere. Gulp.
We cannot stop this trend - large-scale data generation will span labs,
buildings, campus sites & WANs
112. 112
“Center Of Gravity” Pain #2
Collaborations & Peta-scale Open Access Data
The future of large scale genomics|informatics increasingly involves
multi-party / multi-site collaboration. Also: Petabytes of free data (!!)
113. 113
“Center Of Gravity” Pain #3
Object Storage Less Effective @ Single Site
Object storage is the future of scientific data at rest. Some major side
benefits (erasure coding, etc.) can only be realized when 3 or more
sites are involved
114. 114
“Center Of Gravity” Summarized
Data spread is unavoidable. Effectively Unstoppable.
We have a WAN-scale data movement/access problem.
There are ~2 viable approaches going forward ...
115. 115
Option 1 - “Stay Centralized”
Still totally viable but much faster connectivity to instruments & collaborators will be essential
Nutshell: Significant investment in edge/WAN connectivity required, likely at bandwidths exceeding 10Gbps
116. 116
Option 2 - “Go With The Flow”
Embrace the distributed & “cloudy” future where compute & storage span multiple zones
Nutshell: Still requires massive bandwidth upgrades to support metadata-aware or location-aware access & compute
118. 118
Terabyte-scale data movement is
going to be an informatics “grand
challenge” for the next 2-3+ years
And far harder/scarier than previous compute & storage challenges
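A quick worked example of why: idealized wall-clock time to move one terabyte at common link speeds, assuming the full line rate is actually achieved, which real TCP/WAN transfers rarely do:
```python
# Idealized time to move 1 TB at common link speeds (best case, no protocol
# overhead, no contention). Real WAN transfers are usually well below this.
TB_BITS = 1e12 * 8    # one terabyte in bits

for gbps in (1, 10, 40, 100):
    seconds = TB_BITS / (gbps * 1e9)
    print(f"{gbps:>3} Gbps: {seconds / 3600:5.2f} hours per TB")
# Roughly: 1 Gbps ~ 2.2 h, 10 Gbps ~ 13 min, 100 Gbps ~ 80 s per terabyte.
```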
120. Long history of engagement & cooperation
Research IT vs. Enterprise IT
‣ Historically our infrastructure requirements often
surpassed what the Enterprise uses to sustain
day to day operation
‣ We’ve spent ~20 years working closely with
Enterprise IT to enable “data intensive science”
‣ Relatively easy to align informatics IT
infrastructure with established vendor, product,
technology and architecture standards
‣ This held true until this year …
123. 123
Issue #1
Current LAN/WAN stacks bad for emerging use case
Existing technology we’ve used for decades has been architected to
support many small network flows, not a small number of very large data flows
124. 124
Issue #2
Ratio of LAN:WAN bandwidth is out of whack
We will need faster links to “outside” than most organizations have
anticipated or accounted for in long-term technology planning
125. 125
Issue #3
Core, Campus, Edge and “Top of Rack” bandwidth
Enterprise networking types can be *smug* about 10Gbps at the
network core. Boy are they in for a bad surprise.
126. 126
Issue #4
Bigger blast radius when stuff goes wrong
Compute & storage can be logically or physically contained to
minimize disruption/risk when Research does stupid things.
Networks, however, touch EVERYTHING EVERYWHERE. Major risk.
127. 127
What We Need:
- Ludicrous bandwidth @ network core
- Very fast (10-40Gbps) ToR, Edge, Campus links
- 1Gbps - 10Gbps connections to “outside”
- Switches/Routers/Firewalls that can support
small #s of very large data flows
129. 129
Issue #5
Social, trust & cultural issues
We lack the multi-year relationship and track record we’ve built with
facility, compute & storage teams. We are “strangers” to many WAN
and SecurityOps types
130. 130
Issue #6
Our “deep bench” of internal expertise is lacking
Research IT usually has very good “shadow IT” skills but we don’t
have homegrown experts in BGP, Firewalls, Dark Fiber, Routing etc.
132. 132
Issue #7
Cisco. Cisco. Cisco.
The elephant in the room. Cisco rarely 1st choice for greenfield efforts
in this space but Cisco shops often refuse to entertain any
alternatives. Massive existing install base & on-premise expertise
must be balanced, recognized & carefully handled.
133. 133
Issue #8
Firewalls, SecOps & Incumbent Vendors
Legacy security products supporting 10Gbps can cost $150,000+ and
still utterly fail to perform without heroic tuning & deep config magic.
Alternatives exist but massive institutional inertia to overcome.
Deeply Challenging Issue.
135. 135
‣ Peta-scale becoming the norm, not exception
‣ Compute is a commodity; Storage getting there
‣ Historically it has been pretty easy to integrate
“Research Computing” with “Enterprise”
facilities and operational standards
‣ We can no longer assume the majority of our
infrastructure will reside in a single datacenter
136. 136
‣ We need a massive increase in end-to-end
network connectivity & bandwidth
‣ … and kit that can handle large data flows
‣ Current state of “Enterprise” LAN/WAN
networking is not aligned with emerging needs:
• Cost, Capability, Performance, Security …
137. 137
‣ New hardware, reference architectures, best
practices and methods will be required
‣ There is no easy path forward …
139. 139
‣ Science DMZ
• Only viable reference architecture &
collection of operational practices /
philosophy BioTeam has seen to date
• In-use today. Real world. No BS.
• High level visibility & support within US.GOV,
grant funding agencies and supporters of
data intensive science and R&E networks
140. 140
‣ BioTeam has three current ScienceDMZ
projects going on right now. Speeds ranging
from 10Gig to 100Gig
‣ This is likely just the beginning of a long and
difficult transformation in our world
‣ We are going to try to collect useful public info
at http://sciencedmz.org starting this summer
143. 143
‣ Science DMZ Overview Webinar
• May 18, 2-4pm EDT
• http://bioteam-events.webex.com
• No BS; No Hype; No Marketing
- 60 min of content from the inventors of
Science DMZ (ESnet, of course!)
- 60 min for questions/discussion
144. 144
‣ Announcing BioTeam 100Gig ConvergedIT Lab
• Hosted at Texas Advanced Computing Center (“TACC”)
• Compute/storage/networking/security kit all available for use/experimentation
• Access to TACC 100Gig Internet2 circuit
• Access to STAMPEDE and other TACC Supercomputers
• Support from Intel, Juniper and many other vendors (Hint, hint!)
• Goal #1: Showcase and test ScienceDMZ reference architectures for LifeSci
• Goal #2: Have a killer demo for SuperComputing 2016 :)