The University of Alabama at Birmingham gives scientists and researchers a massive, on-demand virtual storage cloud using OpenStack and Ceph for less than $0.41 per gigabyte. This OpenStack Summit session, given by Kamesh Pemmaraju of Dell and John-Paul Robinson of UAB, details how the university IT staff deployed a private storage cloud infrastructure using the Dell OpenStack cloud solution with Dell servers, storage, and networking, plus OpenStack and Inktank Ceph. After assessing a number of traditional storage scenarios, the university partnered with Dell and Inktank to architect a centralized cloud storage platform that could scale seamlessly and rapidly, was cost-effective, and could leverage a single hardware infrastructure for both the OpenStack compute and storage environments.
OpenStack and Ceph case study at the University of Alabama at Birmingham
1. Case Study: The University of
Alabama at Birmingham
OpenStack, Ceph, Dell
Kamesh Pemmaraju, Dell
John-Paul Robinson, UAB
OpenStack Summit 2014
Atlanta, GA
2. An overview
• Dell – UAB backgrounder
• What we were doing before
• How the implementation went
• What we’ve been doing since
• Where we’re headed
3. Dell – UAB background
• 900 researchers working on cancer and genomics
projects.
• Their growing data sets challenged available resources
– Research data distributed across laptops, USB drives, local
servers, HPC clusters
– Transferring datasets to HPC clusters took too much time
and clogged shared networks
– Distributed data management reduced researcher
productivity and put data at risk
• They therefore needed a centralized data repository for
researchers, in order to ensure compliance with
data-retention requirements.
• They also wanted a cost-effective, scale-out solution with
hardware that could be repurposed for both compute &
storage
4. Dell – UAB background (cont'd)
• Potential solutions investigated
– Traditional SAN
– Public cloud storage
– Hadoop
UAB chose Dell/Inktank to architect a platform that
would scale easily, deliver a low cost per GB,
and offer the best of all worlds by providing compute
and storage on the same hardware.
5. A little background…
• We didn’t get here overnight
• 2000s-era High Performance Computing
• ROCKS-based compute cluster
• The Grid and proto-clouds
• GridWay Meta-scheduler
• OpenNebula, an early entrant that connected
grids with this thing called the cloud
• Virtualization through-and-through
• DevOps is US
6. Challenges and Drivers
• Technology
• Many hypervisors
• Many clouds
• We have the technology…can we rebuild it here?
• Applications
• Researchers started shouting “Data!”
NextGen Sequencing
Research Data Repositories
Hadoop
• Researchers kept on shouting “Compute!”
7. Data Intensive Scientific Computing
• We knew we needed storage and computing
• We knew we wanted to tie it together with an
HPC commodity scale-out philosophy
• So in August 2012 we bought 10 Dell R720xd servers
• 16-core
• 96GB RAM
• 36TB Disk
• A 192-core, ~1TB RAM, 360TB expansion to our
HPC fabric
• Now to integrate it…
9. December 2012
• Bob said:
Hearing good things about OpenStack and Ceph this week at Dell World.
Simon Anderson, CEO of DreamHost, spoke highly of
Dell, OpenStack, and Ceph today.
He is also chair of the company that supports Ceph.
He also spoke highly of Dell's Crowbar deployment tool.
• I said:
Good to hear.
I've been thinking a lot about Dell in this picture
too.
We have the building blocks in place. Might be a good
way to speed the construction.
11. The 2013 Implementation
• The Timeline
• In January we started our discussions with Dell and
Inktank
• By March we had committed to the fabric
• A week in April and we had our own cloud in place
• The Experience
• Vendors committed to their product
• Direct engagement through open communities
• Bright people who share your development ethic
12. Next Step…Build Adoption
• Defined a new storage product based on the
commodity scale-out fabric
• Able to focus on Ceph's strength in aggregating storage
across servers
• Provision images of any size to provide flexible block
storage (see the sketch after this list)
• Promote cloud adoption within IT and across
the research community
• Demonstrate utility with applications
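As a rough illustration of what "flexible block storage" can look like in practice, the sketch below provisions a volume of arbitrary size through OpenStack's block storage API using the openstacksdk Python bindings. The cloud name, volume name, and size are assumptions made for illustration, not values from the UAB deployment.

```python
# Hypothetical sketch: provision a block storage volume of arbitrary size
# via OpenStack's block storage service (Cinder), which in a deployment
# like this is backed by the Ceph RBD pool. Cloud name, volume name, and
# size below are illustrative assumptions.
import openstack

# Credentials are read from clouds.yaml or OS_* environment variables.
conn = openstack.connect(cloud="research-cloud")  # assumed cloud name

# Create a 500 GB block device; Cinder carves it out of the Ceph pool.
volume = conn.block_storage.create_volume(
    name="research-project-scratch",  # assumed name
    size=500,                         # size in GB; any size the pool can hold
)
print(f"Created volume {volume.id} ({volume.size} GB)")
```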
13. Applications
• CrashPlan backup in the cloud
• A couple of hours to provision the VM resources
• An easy half-day deployment with the vendor because we controlled our own
resources, i.e. the firewall
• Add storage containers on the fly as we grow…10TB in a few clicks
• GitLab hosting
• Start a VM spec'd according to the project site
• Works with the Omnibus install. Hey, it uses Chef!
• Research Storage
• 1TB storage containers for cluster users
• Uses Ceph RBD images and NFS (see the sketch after this list)
• The storage infrastructure part was easy
• Scaled provisioning: 100+ user containers (100TB) created in about 5
minutes.
• Add storage servers as existing ones fill
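A minimal sketch of the scaled per-user provisioning described above, assuming the Ceph Python bindings (rados/rbd); the pool name, user list, and image sizes are assumptions, and the NFS export of the resulting images on the gateway host is omitted.

```python
# Hypothetical sketch of bulk-provisioning per-user storage containers as
# Ceph RBD images, in the spirit of the "100+ user containers in ~5 minutes"
# workflow. Pool name, user list, and sizes are assumptions.
import rados
import rbd

TB = 1024 ** 4
users = ["alice", "bob", "carol"]  # in practice, pulled from the cluster's user directory

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("research")  # assumed pool name
    rbd_inst = rbd.RBD()
    for user in users:
        # One 1 TB thin-provisioned RBD image per cluster user.
        rbd_inst.create(ioctx, f"user-{user}", 1 * TB)
    ioctx.close()
finally:
    cluster.shutdown()
```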
16. Lesson 2:
Use it! That’s what it’s for!
The sooner you start using the cloud
the sooner you start thinking like the cloud.
17. How PoC Decisions Age Over Time
• Pick the environment you want when you are in
operation…you’ll be there before you know it
• Simple networking is good
• But don’t go basic unless you are able to reinstall the fabric
• Class B ranges to match the campus fabric
• We chose a split admin range to coordinate with our HPC admin range
• We chose a collapsed admin/storage network due to a single
switch…probably would have been better to keep separate and allow
growth
• It’s OK to add non-provisioned interfacing nodes…know your net
• Avoid painting yourself into a corner
• Don’t let the Paranoid Folk box-in your deployment
• An inaccessible fabric is an unusable fabric
• Fixed IP range mismatch with “fake” reservations (a sanity-check sketch follows)
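A small sanity-check sketch for the "know your net" and fixed-IP points above, using only the Python standard library's ipaddress module; every range and address shown is made up for illustration, not UAB's actual campus ranges.

```python
# Hypothetical sanity check: verify that a planned fixed-IP pool sits inside
# the campus range and does not collide with addresses already reserved
# (or "fake"-reserved) elsewhere. All networks below are illustrative.
import ipaddress

campus_range = ipaddress.ip_network("172.16.0.0/16")   # campus Class B fabric (assumed)
fixed_pool = ipaddress.ip_network("172.16.40.0/22")    # proposed instance pool (assumed)
reservations = [                                       # existing static hosts (assumed)
    ipaddress.ip_address("172.16.40.10"),
    ipaddress.ip_address("172.16.41.254"),
]

assert fixed_pool.subnet_of(campus_range), "pool must live inside the campus range"

conflicts = [ip for ip in reservations if ip in fixed_pool]
if conflicts:
    print("Fixed-IP pool overlaps reservations:", ", ".join(map(str, conflicts)))
else:
    print("Fixed-IP pool is clear of known reservations")
```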
19. Problems will Arise
• The release version of the ixgbe driver in the Ubuntu
12.04.1 kernel didn't perform well with our 10Gbit
cards
• Open source has an upstream
• Use it as part of your debug network
• Upgrading the drivers was a simple fix (a version-check sketch follows this list)
• Sometimes when you fix something you break
something else
• There are still a lot of moving parts but each has a
strong open source community
• Work methodically
• You will learn as you go
• Recognize the stack is integrated and respect tool boundaries
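A hedged debugging sketch for the ixgbe situation above: query a node's NIC driver and firmware versions with `ethtool -i` so you can confirm the upgraded driver is actually loaded on each node; the interface name is an assumption.

```python
# Hypothetical helper: report the driver and firmware versions of a 10GbE
# interface so you can verify every node runs the upgraded ixgbe driver.
# `ethtool -i` is the standard Linux way to query driver details; the
# interface name "eth2" is an assumption.
import subprocess

def driver_info(interface: str = "eth2") -> dict:
    """Parse `ethtool -i <interface>` output into a {field: value} dict."""
    out = subprocess.run(
        ["ethtool", "-i", interface],
        capture_output=True, text=True, check=True,
    ).stdout
    info = {}
    for line in out.splitlines():
        key, _, value = line.partition(":")
        info[key.strip()] = value.strip()
    return info

if __name__ == "__main__":
    info = driver_info("eth2")
    print(f"driver={info.get('driver')} version={info.get('version')} "
          f"firmware={info.get('firmware-version')}")
```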
23. Where we are today
• OpenStack plus Ceph are here to stay for our
Research Computing System
• They give us the flexibility we need for an ever-expanding
research applications portfolio
• Move our UAB Galaxy NextGen Sequencing platform to
our Cloud
• Add Object Storage services
• Put the cloud in the hands of researchers
• The big question…
24. …how far can we take it?
• The goal of process automation is scale
• Incompatible, non-repeatable, manual processes
are a cost
• Success is in dual-use
• Satisfy your needs and customer demand
• Automating process implies documenting process…great for
compliance and repeatability
• Recognize the latent talent in your staff: today's system
admins are tomorrow's systems developers
• Traditional infrastructure models are ripe for
replacement
26. Want to learn more about Dell +
OpenStack + Ceph?
Join the Session, 2:00 pm, Tuesday, Room #313
Software Defined Storage, Big Data and Ceph -
What Is all the Fuss About?
Neil Levine, Inktank &
Kamesh Pemmaraju, Dell