This document provides an overview of trends from the 2015 Bio IT World Expo presented by Chris Dag of BioTeam. It discusses trends around DevOps, automation, converged infrastructure, compute, storage, cloud computing, and specific projects. Key points include the need for sysadmins to learn scripting and automation, the growing role of APIs, and why object storage is the future for managing scientific data and metadata. Specific examples highlight high-performance computing configurations, small-file ingest solutions, low-cost storage approaches, and a petascale storage system built using Intel Lustre, Linux, and ZFS on commodity hardware.
8. 8
BioTeam.
Independent Consulting Shop
Run by scientists forced to learn IT to “get science done”
Virtual company with nationwide staff
15+ years “bridging the gap” between hardcore science, HPC & IT
Honest. Objective. Vendor & Technology Agnostic.
11. 11
Disclaimer
I speak mainly from my own personal experiences
Not an expert. Not a pundit. “Thought Leader?” Hell no!
… and what I learn via osmosis from coworkers, clients & colleagues
Be cynical. Filter my words through your own insight & experience.
12. 12
image: shanelin via flickr
Tick …
Tick …
Tick …
Insufferable, huh? Let’s talk trends …
And steal/recycle some bits from last year …
20. 20
Years ago we could stage cheap kit in the lab or a nearby telco closet
The easy period is over.
This approach has not been viable for years now… real solutions are required
27. Homework Assignment #1
2015 Bio IT World Expo
‣ 11:10am Tomorrow - Track1
‣ “Infrastructure, Architecture, and Organization:
Data Engineering at Scale at the Broad”
• Chris Dwan - Assistant Director, Research Computing
and Data Services, Broad Institute of MIT and Harvard
• … and the DDN/Qumulo stuff preceding Dwan should
be interesting as well
40. 40
Chef, Puppet, Ansible, SaltStack …
Pick what works for you (and that all can agree on)
… and commit to learning, using and evangelizing it
… ideally across the enterprise (not just Research)
41. 41
Hey Network Engineers …
Same API-driven automation trends are steamrolling your way
You’ll need more than a Cisco certification in the future
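A minimal sketch of what that API-driven shift looks like in practice, assuming a switch that exposes a REST management API; the endpoint, credentials and port names here are hypothetical, and real vendor APIs (Arista eAPI, Cisco NX-API, etc.) differ in detail:
```python
# Sketch: pushing a config change to a switch over HTTP instead of a CLI session.
# The URL, credentials, resource paths and port names are hypothetical.
import requests

SWITCH_API = "https://switch01.example.org/api/v1"   # hypothetical endpoint
AUTH = ("automation", "s3cret")                      # hypothetical credentials

def set_port_mtu(port: str, mtu: int) -> None:
    """Push an MTU change to one switch port via its HTTP API."""
    resp = requests.patch(
        f"{SWITCH_API}/interfaces/{port}",
        json={"mtu": mtu},
        auth=AUTH,
        timeout=10,
    )
    resp.raise_for_status()

if __name__ == "__main__":
    # Enable jumbo frames on a handful of HPC-facing ports.
    for p in ["Ethernet1", "Ethernet2", "Ethernet3"]:
        set_port_mtu(p, 9214)
```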
44. Homework Assignment #2
2015 Bio IT World Expo
‣ 2:55pm Today - Track1
‣ “Accelerating Biomedical Research Discovery:
The 100G Internet2 Network – Built and
Engineered for the Most Demanding Big Data
Science Collaborations”
• Christian Todorov, Director, Network Services
Management, Internet2
‣ Why? Internet2 has some of the most interesting non-
cloud SDN stuff in production today
47. 47
Unchanged in 2015:
Even our lab instruments know how to submit jobs to the
common HPC resource allocation & scheduling tools
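For illustration, a minimal sketch of an instrument-side hook that hands a finished run to the scheduler; it assumes a Slurm sbatch on the PATH, and the pipeline wrapper name is hypothetical (Grid Engine or LSF submission looks much the same):
```python
# Sketch: a post-run hook on a lab instrument handing analysis off to the
# cluster scheduler. Assumes Slurm's `sbatch` is on the PATH; the
# analyse_run.sh wrapper is a hypothetical pipeline entry point.
import subprocess
from pathlib import Path

def submit_analysis(run_dir: Path) -> str:
    """Submit an analysis job for a finished run and return the scheduler's reply."""
    script = f"""#!/bin/bash
#SBATCH --job-name=ingest-{run_dir.name}
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
analyse_run.sh {run_dir}
"""
    # sbatch accepts the batch script on stdin when no file argument is given.
    result = subprocess.run(
        ["sbatch"], input=script, text=True,
        capture_output=True, check=True,
    )
    return result.stdout.strip()   # e.g. "Submitted batch job 12345"

if __name__ == "__main__":
    print(submit_analysis(Path("/instruments/hiseq01/run_0042")))
```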
48. 48
Unchanged in 2015:
Still feels like a solved problem. Compute is a commodity.
Most of the interesting action is with ‘outlier’ projects
Ok. Designing a 60,000 CPU core HPC environment is still hard :)
Compute power is rarely challenging these days.
49. 49
Compute: NICs & Disks
At 10Gig and higher speeds, server, NIC and disk configurations can
easily become the new bottleneck. Pay careful attention to NIC selection
and consider host/kernel tuning when playing at 10Gig and above.
Latest Intel Haswell stuff is driving 40Gig NICs today
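As a rough illustration of the host/kernel tuning mentioned above, a small audit script over common Linux TCP buffer sysctls; the target values are illustrative starting points drawn from public 10/40GbE tuning guidance, not a recommendation for any specific site:
```python
# Sketch: audit (and optionally raise) Linux TCP buffer sysctls for 10Gig+ hosts.
# Values are illustrative starting points; verify against your NIC vendor's
# and your network team's recommendations before applying.
from pathlib import Path

RECOMMENDED = {
    "net/core/rmem_max": "67108864",             # 64 MB max receive buffer
    "net/core/wmem_max": "67108864",             # 64 MB max send buffer
    "net/ipv4/tcp_rmem": "4096 87380 67108864",  # min/default/max receive
    "net/ipv4/tcp_wmem": "4096 65536 67108864",  # min/default/max send
}

def audit(apply: bool = False) -> None:
    for key, want in RECOMMENDED.items():
        node = Path("/proc/sys") / key
        have = node.read_text().split()
        if have != want.split():
            print(f"{key}: have {' '.join(have)!r}, want {want!r}")
            if apply:                  # writing requires root
                node.write_text(want + "\n")

if __name__ == "__main__":
    audit(apply=False)   # report only; run with apply=True as root to change
```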
50. 50
Compute: Diversity Trend Still Strong
HPC Life Science Informatics via modular “Building Blocks”
Thin Nodes (fastest CPUs)
Fat Nodes (large SMP)
Large Memory Block
Very Large Memory Block
GPU Compute Block
GPU Visualization Block
Phi Coprocessor Block
Flash/SSD Analytic Block
Hadoop/HDFS Block
Dev/Application Block
Homogeneous Research IT is fading away in favor of “capability-based” computing via standard HPC building blocks
54. 54
Compute: FPGAs & Phis
Exotic hardware has its place but will not rule the world
2015: Thoughts on hardware acceleration largely unchanged
Why not?
“... the activation energy required for a scientist to use
this stuff is generally quite high ...”
Best deployed as point-solution for pain point or as component
of large scale / high-value analytical workflow
55. Homework Assignment #3
2015 Bio IT World Expo
‣ 1:10pm Today - Lunch Presentation II
‣ “Optimizing Genomic Sequence Searches to
Next-Generation Intel Architectures”
• Bhanu Rekepalli, Ph.D., Senior Scientific Consultant &
Principal Investigator, BioTeam Inc.
‣ Interested in Phi? Bhanu has deep experience and will
be talking at lunch about Intel Phi used for massively
scalable BLAST searches
56. 56
Topic / Trend: Converged Infrastructure
Yep.
100Gig line card in the wild
57. 57
HyperConvergence in Research IT
Not yet a widespread trend. Something to watch though
Warning/Caveat:
We see it mostly in very large greenfield deployments
Feels like scale-out petabyte+ NAS a few years ago — This may be
an area that most of us “watch” to see how the big orgs approach it
… and then we copy them 2-3 years later
58. 58
HyperConvergence in Research IT
“ISP Model” seeing use within large campus network upgrade projects
Small Examples
Ultra-converged Virtualization blocks packed with CPU/Disk/Flash/NICs
Avere kit front-ending on-prem NAS/Object + Google & Amazon Object Stores
DDN Disk Arrays running native applications via onboard hypervisors
iRODS + Object Store efforts (including Cleversafe/BioTeam work …)
59. 59
HyperConvergence in Research IT
One Big Example (and topic to watch …)
Infiniband EDR. Particularly the Mellanox stuff
Mellanox ConnectX-4 VPI and EDR Infiniband gets you:
648 ports of 100Gig performance in one director-class switch
Split-personality host adaptors supporting IB, Ethernet or both
Infiniband: 56Gb/s FDR or 100Gb/s EDR
Ethernet: 1GbE, 10GbE, 40GbE, 56GbE, 100GbE
60. 60
Infiniband Convergence
EDR in the Core & FDR @ the Edge enable large non-blocking HPC designs
This is enabling some cool stuff in large greenfield projects:
Infiniband for parallel filesystem access AND low-latency MPI apps
10/40/100 Gig Ethernet wherever you need it
7-figure CAPEX cost savings at very large scale (**)
Compute, storage & message passing on one managed fabric
63. 63
2015 Converged IT Summit
1st ever BioTeam conference series (w/ CHI of course)
2-day meeting of the minds / Total focus on life science topics
Brochures @ BioTeam Booth #357
September 9-10, 2015
Intercontinental Hotel, San Francisco USA
http://convergeditsummit.com
65. 65
Cloud-based Science:
Still real. Still useful. Still growing strong.
One quick recap and then a few slides on
some BioTeam 2015 “firsts”
Still polluted by marketeers and thick layers of BS
66. 66
Cloud-based Science:
Cost and economics ARE NOT the primary drivers
Neutral meeting ground for collaboration w/ competitors
Primary cloud/science driver is CAPABILITY:
Lab instruments are now capable of “write to cloud” operation
Ease of data ingest/exchange
IaaS environments like AWS/Google offer capabilities that we
can’t easily match on-premise
And many more …
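As a rough sketch of the “write to cloud” ingest capability above, a minimal uploader that pushes a finished run folder to object storage; it assumes boto3 with working AWS credentials, and the bucket and paths are hypothetical:
```python
# Sketch: push a finished instrument run straight to object storage.
# Bucket name and run paths are hypothetical.
import boto3
from pathlib import Path

s3 = boto3.client("s3")
BUCKET = "example-lab-ingest"

def upload_run(run_dir: Path, prefix: str) -> None:
    """Upload every file under run_dir, preserving relative paths as object keys."""
    for f in run_dir.rglob("*"):
        if f.is_file():
            key = f"{prefix}/{f.relative_to(run_dir)}"
            s3.upload_file(str(f), BUCKET, key)

if __name__ == "__main__":
    upload_run(Path("/instruments/hiseq01/run_0042"), "hiseq01/run_0042")
```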
67. 67
Cloud-based Science:
BioTeam IaaS Cloud Milestones (2014-2015)
Our first use of a 10Gig Direct Connect circuit to Amazon
Our first use of Internet2 as layer2/3 transit for AWS Direct Connect
Our first large-scale production project on Google Compute
Our first large-scale use of Google Genomics API
69. Homework Assignment #4
2015 Bio IT World Expo
‣ 10:40am Thursday - Track 3
‣ “Next-Generation Sequencing and Cloud Scale:
A Journey to Large-Scale Flexible
Infrastructures in AWS”
• Jason Tetrault, Associate Director, Business and
Information Architect, R&D IT, Biogen
74. 74
Storage:
Lots of attention to this area in the 2014 talk
Check out the 2014 Trends slides online at http://slideshare.net/chrisdag
No time to recap all of the stuff that has only modestly changed
Today: Quick recap followed by new stuff/thoughts/trends
75. 75
Storage: Quick Recap
Still a huge pain point
Still amazing ways to waste large amounts of money
Petabyte-class storage has not been scary for years now
Single Tier of Scale-out NAS or Parallel FS insufficient in 2015
Multiple Tiers are a requirement; probably multi-vendor
76. 76
Storage: Reasonable Tier Example
5-40TB SSD/Flash tier for ingest & IOPS-sensitive workflows
50-400TB tier (SATA,SAS,SSD mix) for active processing
Petabyte-capable (Cloud/Object/SATA) nearline tier
100TB - 1PB “Trash” Tier (optional)
100TB - 500TB Fast Scratch (optional)
77. 77
Storage: Object Is the Future
Not a trend. Yet. I’m still a believer though …
Object storage is the future of
scientific data at rest.
Expect a lot more on this in 2016 talk …
78. 78
Storage: Object Is the Future
Don’t believe me?
Check out how many object vendors
are on the show floor this week!
Amplidata, Avere*, Cleversafe, DDN, SwiftStack, etc. etc.
79. 79
Storage: Object Is the Future
This is what my metadata looks like on a POSIX filesystem:
Owner
Group membership
Read/write/execute permissions based on Owner or Group
File size
Creation/Modification/Last-access Timestamps
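That short list is essentially everything a POSIX stat() call can tell you; a quick illustration (the filename is hypothetical):
```python
# Everything POSIX knows about a file, via stat(): owner, group, mode,
# size and timestamps. No room for scientific context.
import os, pwd, grp, stat

st = os.stat("example.fastq.gz")             # any file path
print("owner:", pwd.getpwuid(st.st_uid).pw_name)
print("group:", grp.getgrgid(st.st_gid).gr_name)
print("mode: ", stat.filemode(st.st_mode))   # e.g. -rw-r--r--
print("size: ", st.st_size, "bytes")
print("ctime/mtime/atime:", st.st_ctime, st.st_mtime, st.st_atime)
```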
80. 80
Storage: Object Is the Future
This is what I WOULD LIKE TO TRACK on a PER-FILE basis:
What instrument produced this data?
What funding source paid to produce this data?
What revision was the instrument/flowcell at?
Who is the primary PI or owner of this data? Secondary?
What protocol was used to prepare the sample?
Where did the sample come from?
Where is the consent information?
Can this data be used to identify an individual?
What is the data retention classification for this file?
What is the security classification for this file?
Can this file be moved offsite?
etc. etc. etc.
…
81. 81
Storage: Object Is the Future
Historically metadata has been tracked via several methods:
DIY LIMS and Relational Database Systems
iRODS or other “metadata aware” systems
… all at significant human, development & operational cost
82. 82
Storage: Object Is the Future
My gut feeling for the future:
Economics and tech benefits like erasure coding will draw interest
But most adoption will be motivated by the ease with which arbitrary
metadata can be stored with each file or object
… and later searched / sorted / retrieved based on queries against
the stored metadata
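For a concrete feel of “arbitrary metadata stored with each object”, a hedged sketch using the generic S3 API; other object stores (Cleversafe, Swift, etc.) expose similar but not identical mechanisms, the keys and bucket are hypothetical, and search over this metadata still typically requires an external index:
```python
# Sketch: attach user-defined key/value metadata to an object at write time.
# Keys, values and bucket are hypothetical; most stores still need an external
# index (database, iRODS, search engine, ...) to make this metadata queryable.
import boto3

s3 = boto3.client("s3")

with open("sample_0042.bam", "rb") as fh:
    s3.put_object(
        Bucket="example-lab-archive",
        Key="projects/p123/sample_0042.bam",
        Body=fh,
        Metadata={                       # stored alongside the object itself
            "instrument": "hiseq01",
            "funding-source": "grant-ABC123",
            "pi": "jdoe",
            "consent-ref": "IRB-2015-077",
            "retention-class": "7y",
            "offsite-ok": "false",
        },
    )
```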
83. 83
Storage: Object Is the Future
Advice for the audience:
It will take years for our field to get here. You've got time!
When evaluating Object Storage Solutions:
… consider scoring or evaluating them on how well they
handle metadata search and retrieval operations
84. 84
Storage: Object Is the Future
Would you like object storage with your iRODS?
Go talk to the Cleversafe people on the show floor
We did some neat stuff with them related to using
iRODS with a Cleversafe object store backend
86. 86
Storage: Where the Action Is
Primary storage is still challenging, but …
The really interesting work lies at the edges:
Small & Cheap Storage
Ludicrously Large Storage
Some examples …
87. 87
Storage War Story 1: Small Ingest
Pharmaceutical company with an ingest issue
Funky lab instrument puts 30,000 tiny files in one directory
Copying each experiment across a 1Gb link to the SAN took HOURS
Root cause: SAN system choking on tiny file metadata
Trivial from a size viewpoint — ~6GB per experiment
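Not the fix this project ultimately chose (see the next slide), but one common mitigation worth noting: bundle the tiny files into a single archive so the SAN sees one large sequential write instead of roughly 30,000 metadata-heavy creates. A minimal sketch, with hypothetical paths:
```python
# Sketch of a common tiny-file mitigation (not the solution this project used):
# tar the experiment directory before it hits the SAN so the destination sees
# one ~6GB sequential write rather than tens of thousands of file creates.
import tarfile
from pathlib import Path

def bundle_experiment(src_dir: Path, dest: Path) -> Path:
    out = dest / f"{src_dir.name}.tar"
    with tarfile.open(out, "w") as tar:    # uncompressed: CPU-cheap at ~6GB
        tar.add(src_dir, arcname=src_dir.name)
    return out

if __name__ == "__main__":
    print(bundle_experiment(Path("/instrument/exp_20150421"),
                            Path("/mnt/san/ingest")))
```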
88. 88
Storage War Story 1: Small Ingest
This was a VERY interesting project
There are MANY large/expensive systems that can handle
small file ingest. Don’t need BIG and can’t afford EXPENSIVE
Incredibly difficult to find small 5-10TB usable solution with
the right mix of hardware to handle small-file ingest
Winner: Nexsan, due to their FASTier smart SSD caching
95. 95
Storage War Story 2: Backblaze
Backblaze ‘pod’ style still lives on
Now with much better fault tolerance
No more custom wiring harnesses!
What can YOU do with 45x 6TB drives at
rock bottom price?
99. 99
Storage War Story 2: Backblaze
Hope to blog about this Summer 2015
… including updated ‘real world’ cost
… and their 30-drive totally silent chassis (!)
100. 100
One last storage war story …
From a lab with long history of
innovative storage projects
101. 101
Storage War Story 3: Petascale Disruption
Very cool 2014 project
BioTeam + Pineda Lab @ Johns Hopkins
Intel Lustre + Linux + ZFS + Commodity HW
102. 102
Storage War Story 3:
Petascale Disruption
2 Petabytes (raw) / 1.4PB usable for $165,000
PUE of 1.5 = $10,000/year in electrical savings
Performance close to much more $$$ options
Expect to see more details released in 2015
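Back-of-envelope arithmetic on the figures above (usable capacity and CAPEX only, ignoring power and admin costs):
```python
# Cost per usable terabyte, derived from the slide's own numbers.
capex_usd = 165_000
usable_tb = 1.4 * 1000          # 1.4 PB usable

print(f"${capex_usd / usable_tb:.0f} per usable TB")   # roughly $118/TB
```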
104. 104
Last section of this talk is going to
discuss what keeps me up at night.
105. 105
This is what I expect my hardest projects
will involve 2015-2016 and beyond …
106. 106
Tipping Point #1
Effort/cost of generating or acquiring vast piles of data
in 2015 is far less than the real-world cost of storing and
managing that data through a realistic lifecycle.
107. 107
Tipping Point #2
Scientists still believe storage is cheap & near-infinite.
Data triage is no longer sufficient. Scientists are rarely asked
to articulate a scientific/business case for storage.
108. 108
Tipping Point #3
Centralized infrastructure models are not sufficient and
must be modified. Data & compute WILL span sites and
locations with or without active IT involvement.
“Data Spread” is real. We need to start preparing now.
110. 110
“Center Of Gravity” Problem
Current methods involving centralized storage and bringing
“users” and “compute” very close “… to the data” are
going to face significant problems in 2015 and beyond.
111. 111
“Center Of Gravity” Pain #1
Terabyte class instruments. Everywhere. Gulp.
We cannot stop this trend - large-scale data generation will span labs,
buildings, campus sites & WANs
112. 112
“Center Of Gravity” Pain #2
Collaborations & Peta-scale Open Access Data
The future of large scale genomics|informatics increasingly involves
multi-party / multi-site collaboration. Also: Petabytes of free data (!!)
113. 113
“Center Of Gravity” Pain #3
Object Storage Less Effective @ Single Site
Object storage is the future of scientific data at rest. Some major side
benefits (erasure coding, etc.) can only be realized when 3 or more
sites are involved
114. 114
“Center Of Gravity” Summarized
Data spread is unavoidable. Effectively Unstoppable.
We have a WAN-scale data movement/access problem.
There are ~2 viable approaches going forward ...
115. 115
Option 1 - “Stay Centralized”
Still totally viable but much faster connectivity to instruments & collaborators will be essential
Nutshell: Significant investment in edge/WAN connectivity required, likely at bandwidths exceeding 10Gbps
116. 116
Option 2 - “Go With The Flow”
Embrace the distributed & “cloudy” future where compute & storage span multiple zones
Nutshell: Still requires massive bandwidth upgrades to support metadata-aware or location-aware access & compute
118. 118
Terabyte-scale data movement is
going to be an informatics “grand
challenge” for the next 2-3+ years
And far harder/scarier than previous compute & storage challenges
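A quick worked example of why: idealized wall-clock time to move one terabyte at common link speeds, assuming the full line rate is actually achieved, which real TCP/WAN transfers rarely do:
```python
# Idealized time to move 1 TB at common link speeds (best case, no protocol
# overhead, no contention). Real WAN transfers are usually well below this.
TB_BITS = 1e12 * 8    # one terabyte in bits

for gbps in (1, 10, 40, 100):
    seconds = TB_BITS / (gbps * 1e9)
    print(f"{gbps:>3} Gbps: {seconds / 3600:5.2f} hours per TB")
# Roughly: 1 Gbps ~ 2.2 h, 10 Gbps ~ 13 min, 100 Gbps ~ 80 s per terabyte.
```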
120. Long history of engagement & cooperation
Research IT vs. Enterprise IT
‣ Historically our infrastructure requirements often
surpassed what the Enterprise uses to sustain
day to day operation
‣ We’ve spent ~20 years working closely with
Enterprise IT to enable “data intensive science”
‣ Relatively easy to align informatics IT
infrastructure with established vendor, product,
technology and architecture standards
‣ This held true until this year …
123. 123
Issue #1
Current LAN/WAN stacks bad for emerging use case
Existing technology we’ve used for decades has been architected to
support many small network flows, not a small number of very large data flows
124. 124
Issue #2
Ratio of LAN:WAN bandwidth is out of whack
We will need faster links to “outside” than most organizations have
anticipated or accounted for in long-term technology planning
125. 125
Issue #3
Core, Campus, Edge and “Top of Rack” bandwidth
Enterprise networking types can be *smug* about 10Gbps at the
network core. Boy are they in for a bad surprise.
126. 126
Issue #4
Bigger blast radius when stuff goes wrong
Compute & storage can be logically or physically contained to
minimize disruption/risk when Research does stupid things.
Networks, however, touch EVERYTHING EVERYWHERE. Major risk.
127. 127
What We Need:
- Ludicrous bandwidth @ network core
- Very fast (10-40Gbps) ToR, Edge, Campus links
- 1Gbps - 10Gbps connections to “outside”
- Switches/Routers/Firewalls that can support
small #s of very large data flows
129. 129
Issue #5
Social, trust & cultural issues
We lack the multi-year relationship and track record we’ve built with
facility, compute & storage teams. We are “strangers” to many WAN
and SecurityOps types
130. 130
Issue #6
Our “deep bench” of internal expertise is lacking
Research IT usually has very good “shadow IT” skills but we don’t
have homegrown experts in BGP, Firewalls, Dark Fiber, Routing etc.
132. 132
Issue #7
Cisco. Cisco. Cisco.
The elephant in the room. Cisco rarely 1st choice for greenfield efforts
in this space but Cisco shops often refuse to entertain any
alternatives. Massive existing install base & on-premise expertise
must be balanced, recognized & carefully handled.
133. 133
Issue #8
Firewalls, SecOps & Incumbent Vendors
Legacy security products supporting 10Gbps can cost $150,000+ and
still utterly fail to perform without heroic tuning & deep config magic.
Alternatives exist but massive institutional inertia to overcome.
Deeply Challenging Issue.
135. 135
‣ Peta-scale becoming the norm, not exception
‣ Compute is a commodity; Storage getting there
‣ Historically it has been pretty easy to integrate
“Research Computing” with “Enterprise”
facilities and operational standards
‣ We can no longer assume the majority of our
infrastructure will reside in a single datacenter
136. 136
‣ We need a massive increase in end-to-end
network connectivity & bandwidth
‣ … and kit that can handle large data flows
‣ Current state of “Enterprise” LAN/WAN
networking is not aligned with emerging needs:
• Cost, Capability, Performance, Security …
137. 137
‣ New hardware, reference architectures, best
practices and methods will be required
‣ There is no easy path forward …
139. 139
‣ Science DMZ
• Only viable reference architecture &
collection of operational practices /
philosophy BioTeam has seen to date
• In-use today. Real world. No BS.
• High level visibility & support within US.GOV,
grant funding agencies and supporters of
data intensive science and R&E networks
140. 140
‣ BioTeam has three current ScienceDMZ
projects going on right now. Speeds ranging
from 10Gig to 100Gig
‣ This is likely just the beginning of a long and
difficult transformation in our world
‣ We are going to try to collect useful public info
at http://sciencedmz.org starting this summer
143. 143
‣ Science DMZ Overview Webinar
• May 18, 2-4pm EDT
• http://bioteam-events.webex.com
• No BS; No Hype; No Marketing
- 60 min of content from the inventors of
Science DMZ (ESnet, of course!)
- 60 min for questions/discussion
144. 144
‣ Announcing BioTeam 100Gig ConvergedIT Lab
• Hosted at Texas Advanced Computing Center (“TACC”)
• Compute/storage/networking/security kit all available for use/experimentation
• Access to TACC 100Gig Internet2 circuit
• Access to STAMPEDE and other TACC Supercomputers
• Support from Intel, Juniper and many other vendors (Hint, hint!)
• Goal #1: Showcase and test ScienceDMZ reference architectures for LifeSci
• Goal #2: Have a killer demo for SuperComputing 2016 :)