Slides from our introduction to Ceph and OpenStack webinar. You can also watch the webinar on demand at http://www.inktank.com/news-events/webinars/.
2. Agenda
• Software and Companies
• Cloud Storage Considerations
• Ceph architecture, unique features, and benefits
• Ceph and OpenStack best practices
• Resources
• Next steps
3. Inktank and Ceph
Inktank:
• Company that provides professional services and support for Ceph
• Founded in 2011
• Funded by DreamHost
• Mark Shuttleworth invested $1M
• Sage Weil, CTO and creator of Ceph
Ceph:
• Distributed unified object, block and file storage platform
• Created by storage experts
• Open source
• In the Linux kernel
• Integrated into cloud platforms
4. What is OpenStack?
OpenStack is open-standards, open-source software with a mission to provide a scalable and elastic cloud operating system for both private and public clouds, large and small. It has:
• The largest and most active community
• High-profile contributors (AT&T, Dell, HP, IBM, Rackspace, etc.)
• Regular release cycles (6-month cadence)
• Governance by an independent foundation
• A broad suite of services
• A sound architecture built from the ground up as a scale-out, loosely coupled, asynchronous, message-based, distributed software system
• An ideal fit for massively scalable applications
5. What is in OpenStack?
OpenStack is a collection of projects that are developed and maintained collaboratively by a large and active community:
– Compute (Nova)
– Object Storage (Swift)
– Image Service (Glance)
– Dashboard (Horizon)
– Identity (Keystone)
– Network Service (Quantum)
OpenStack's basic requirement: "Clouds must be simple to implement and massively scalable."
6. OpenStack Value Proposition
• Limits costly software licenses
• Limits lock-in by proprietary vendors and by cloud providers (e.g., Amazon)
• Allows for massive scalability
• Open source hypervisor support
– KVM, Xen, LXC
– ESX and Hyper-V support is lagging
• Offers standard APIs, enabling a growing cloud ecosystem
7. Dell Positioned Well in OpenStack
Market Offerings:
• FIRST hardware solutions vendor to back OpenStack
• FIRST OpenStack integrated solution in market
• FIRST OpenStack deployment solution: Crowbar
• Industry recognition of Dell leadership
• Regular releases, webinars, white papers, case studies, press coverage, etc.
Community Advocacy:
• Sponsor of each OpenStack Conference to date
• Leads monthly Austin & Boston OpenStack meetups
• Numerous cloud / community events worldwide
OpenStack Governance:
• Gold-level sponsor of the new OpenStack Foundation
• 2 seats on the Foundation Board of Directors
• Active participation in developing community bylaws, code of conduct, etc.
Crowbar:
• Winning, differentiated Dell IP
• Owner / leader of the Crowbar OSS community
• 470 followers; 31,000+ Crowbar site hits in 90 days
• 2,000+ downloads in 6 months
• 2nd most active Dell listserv
• Multiple "Hack Days" with hundreds of worldwide participants
• Corporate contributors include Intel, SUSE, Rackspace, and others
8. What is Crowbar?
An open source software management framework:
• Innovative approach to bare-metal deployment of complex environments such as cloud (OpenStack) and Big Data (Hadoop)
• Extendable architecture built around Barclamps
– Independent modules that add functionality, such as support for network switches or new software components such as Ceph
• Intel, SecureWorks, VMware (Cloud Foundry), DreamHost (Ceph), SUSE (operating system), and Rackspace have all contributed to the community
• Enables an "operational model" that takes bare-metal reference architectures to fully operational environments (OpenStack and Hadoop) in hours
• Designed for hyperscale environments
– Anticipates extensions via "Barclamps"
– Plans for scale using "Proposals"
• Focuses on the shift to DevOps management capabilities and continuous integration
9. Inktank and Dell Partnership
• Inktank is a strategic partner for Dell in Emerging Solutions
• The Emerging Solutions Ecosystem Partner Program is designed to deliver complementary cloud components
• As part of this program, Dell and Inktank provide:
– Ceph Storage Software
› Adds scalable cloud storage to the Dell OpenStack-powered cloud
› Uses Crowbar to provision and configure a Ceph cluster
– Professional Services, Support, and Training
› Collaborative support for Dell hardware customers
– Joint Solution
› Validated against Dell Reference Architectures via the Technology Partner program
11. Cloud Storage Requirements
• Scalable to Many Petabytes
• Flexible Configuration
• Rapid Provisioning
• Delegated Management
• Commodity Hardware
• No Single Point of Failure
• Self-Managing with Fail-in-Place Maintenance
• Dynamic Data Placement
• Proportional Data Movement
• Support for VM Migration
• Unified Storage Capabilities
12. Addressing the OpenStack Gap in Block Storage
• Swift has been a good object storage starting point
– Mature and solid foundation for objects and images
– Can be a headache across failures and hardware reconfiguration
– Object only: no block or file capabilities
– Not for use cases that require:
› High-performance storage
› Database operations
› Existing applications that need file system or block storage access
• Nova-Volume using LVM and iSCSI
– Many layers between storage and VM instances
› LVM, iSCSI target, iSCSI initiator, Linux disk, hypervisor
– Node is a single point of failure
– RAID reconstruction takes a long time and degrades performance
13. Ceph Closes the Block Gap
• Integrated with Crowbar – provides a barclamp to provision and configure a Ceph cluster, and simplifies growing deployments
• Tested with OpenStack – extensive testing of Ceph with OpenStack by the Ceph team over several years of both products' evolution
• Lowers cost per gigabyte – delivers the capabilities of expensive traditional block storage solutions at utility hardware prices
• Scalable to many petabytes – a single cluster can hold hundreds of nodes and many petabytes of managed data
• No single point of failure – infrastructure-aware data placement distributes replicas across fault zones to ensure NoSPoF
• Self-managing and fail-in-place – autonomous operation allows deployment in remote datacenters without worrying about frequent visits
• Dynamic data placement – data is distributed evenly across the cluster, and moved only in proportion to how much of the cluster changes
• Unified storage – match the access mechanism to application needs by using Ceph's block, object, or file access modes
14. Key Differentiators
• CRUSH data placement algorithm
– Metadata is primarily computed rather than stored
– CRUSH computation is distributed to the most scalable components
– The algorithm is infrastructure-aware and quickly adjusts to failures (see the toy placement sketch at the end of this slide)
• Advanced virtual block device
– Enterprise storage capabilities from utility server hardware
– Thin provisioning, allocate-on-write snapshots, LUN cloning
– In the Linux kernel and integrated with OpenStack components
• Open source solution
– Maximize value by leveraging free software
– Control your own destiny with access to the source code
– Customize and modify the code to provide differentiated value
• Unified storage platform (Object + Block + File)
– Multiple use cases satisfied in a single storage cluster
– Manage a single, autonomous system integrated with OpenStack
– Economies of scale from sharing a large set of disks with many apps
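The computed-placement idea can be illustrated with a toy Python sketch. This is not the CRUSH algorithm (CRUSH is pseudo-random, weighted, and walks an infrastructure hierarchy); it only shows how any client can derive an object's locations from a hash plus the shared cluster map, with no central metadata lookup. The OSD and object names are hypothetical.

import hashlib

# Toy stand-in for CRUSH-style computed placement (NOT the real
# algorithm): every client hashes the object name against the same
# shared OSD list, so placement needs no metadata server lookup.
def place(object_name, osds, replicas=3):
    h = int(hashlib.md5(object_name.encode()).hexdigest(), 16)
    start = h % len(osds)
    return [osds[(start + i) % len(osds)] for i in range(replicas)]

osds = ['osd.0', 'osd.1', 'osd.2', 'osd.3', 'osd.4']  # hypothetical cluster map
print(place('rbd_data.1a2b3c.0000000000000042', osds))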
15. Ceph Provides Advanced Block Storage Capabilities for OpenStack Clouds
Easy spin-up, back-up, and cloning of VMs:
• Persistence – volumes are persistent by default; VMs behave more like traditional servers and do not disappear when you reboot them
• Host independence – enables VM migration; compute and storage resources can be scaled independently; compute hosts can be diskless
• Easy snapshots – snapshots of instances are easier and act more like backups
• Easy cloning – a two-step process:
1. Use the API to create a volume
2. Populate it with the contents of an image from Glance
You are then ready to boot from the new volume
• Thin provisioning – fast instance creation with copy-on-write cloning if you store both your Glance images and your volumes as Ceph block devices (see the sketch below)
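As a concrete illustration of that copy-on-write path, here is a minimal sketch using the Ceph Python bindings (rados/rbd). It assumes a cluster reachable via /etc/ceph/ceph.conf; the pool names ('images', 'volumes') mirror a common Glance/Cinder layout, and the image and volume names are hypothetical.

import rados
import rbd

# Minimal copy-on-write cloning sketch with the Ceph Python bindings;
# pool and image names are illustrative assumptions.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
images = cluster.open_ioctx('images')    # where Glance images live
volumes = cluster.open_ioctx('volumes')  # where volumes live

golden = rbd.Image(images, 'ubuntu-12.04')
golden.create_snap('base')
golden.protect_snap('base')  # clones require a protected snapshot
golden.close()

# The clone shares unwritten blocks with its parent snapshot, so it is
# created near-instantly and consumes space only as the VM writes.
rbd.RBD().clone(images, 'ubuntu-12.04', 'base',
                volumes, 'vm-boot-disk',
                features=rbd.RBD_FEATURE_LAYERING)

volumes.close()
images.close()
cluster.shutdown()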
17. Ceph
[Architecture diagram – access methods for APP / HOST/VM / CLIENT:]
• Ceph Object Library (LIBRADOS) – a library allowing applications to directly access Ceph Object Storage
• Ceph Object Gateway (RADOS Gateway) – a RESTful gateway for object storage
• Ceph Block (RBD) – a reliable and fully distributed block device
• Ceph Distributed File System (CephFS) – a POSIX-compliant distributed file system
All layered on Ceph Object Storage (RADOS) – a reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
18. RADOS Components
Monitors (M):
• Maintain the cluster map
• Provide consensus for distributed decision-making
• Must have an odd number
• Do not serve stored objects to clients
RADOS storage nodes containing Object Storage Daemons (OSDs):
• One OSD per disk (recommended)
• At least three nodes in a cluster
• Serve stored objects to clients
• Intelligently peer to perform replication tasks
• Support object classes
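A minimal librados sketch of talking to these components from Python: connecting contacts the monitors for the cluster map, after which reads and writes go directly to the OSDs. It assumes a local /etc/ceph/ceph.conf with keyring access and a pool named 'data' (the pool name is an assumption).

import rados

# Connecting consults the monitors for the current cluster map.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

stats = cluster.get_cluster_stats()
print('kB used: %(kb_used)s, objects: %(num_objects)s' % stats)

# Object I/O goes straight to the OSDs, not through the monitors.
ioctx = cluster.open_ioctx('data')  # 'data' pool assumed to exist
ioctx.write_full('hello', b'stored and replicated by the OSDs')
print(ioctx.read('hello'))
ioctx.close()
cluster.shutdown()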
19. RADOS Cluster Makeup
[Diagram: a RADOS node runs one OSD per disk, each OSD on top of a local file system (btrfs, xfs, or ext4) on its own disk; a RADOS cluster is made up of many such nodes plus monitors (M).]
20. Distributed Block Access
[Same architecture diagram as slide 17, highlighting Ceph Block (RBD) – a reliable and fully distributed block device – on top of Ceph Object Storage (RADOS).]
21. RADOS Block Device
• Store virtual disks in RADOS
• Allows decoupling of VMs and nodes
– Live migration
• Images are striped across the cluster for better throughput
• Boot support in QEMU, KVM, and OpenStack Nova
• Mount support in the Linux kernel for VM applications
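A short sketch of creating such a virtual disk with the rbd Python binding; the default 'rbd' pool and the image name are assumptions, and striping across objects happens transparently.

import rados
import rbd

# Create a 10 GiB virtual disk image; RADOS stripes it across the
# cluster as many small objects (pool and image name are assumptions).
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')

rbd.RBD().create(ioctx, 'vm-disk-1', 10 * 1024 ** 3)

image = rbd.Image(ioctx, 'vm-disk-1')
print('size in bytes:', image.size())
image.close()

ioctx.close()
cluster.shutdown()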
22. VM
[Diagram: a VM uses RBD for its boot device and application storage; the virtualization container (e.g., QEMU/KVM) links LIBRBD, which uses LIBRADOS to reach the monitors and OSDs of the RADOS cluster.]
25. RADOS Gateway
Web services access (REST) for applications to object storage
[Same architecture diagram as slide 17, highlighting the Ceph Object Gateway (RADOS Gateway) – a RESTful gateway for object storage – on top of Ceph Object Storage (RADOS).]
26. RADOS Gateway
Web services access (REST) for applications to object storage
• REST-based interface to RADOS
• Supports buckets and accounting
• Compatible with S3 and Swift applications
[Diagram: applications speak REST to RADOSGW instances, which use LIBRADOS and the native protocol to reach the monitors and OSDs.]
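Because the gateway speaks the S3 API, a stock S3 client works against it. Here is a minimal sketch with the boto library; the endpoint host and the credentials (created with radosgw-admin) are assumptions for illustration.

import boto
import boto.s3.connection

# Point a plain S3 client at the RADOS Gateway endpoint; host and
# credentials are placeholders for values from your own deployment.
conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
    host='rgw.example.com',
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

bucket = conn.create_bucket('demo-bucket')
key = bucket.new_key('hello.txt')
key.set_contents_from_string('stored in RADOS via the gateway')

for obj in bucket.list():
    print(obj.name, obj.size)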
27. VOTE
Using the voting buttons at the top of the presentation panel, please take 30 seconds to answer the following questions to help us better understand you:
1. Are you currently exploring OpenStack for a project?
2. Are you looking to implement Ceph for OpenStack storage within the next 6 months?
3. Do you anticipate wanting help to deploy OpenStack and Ceph for your project?
29. Configure Networking within the Rack
[Diagram: a high-speed distribution (leaf) switch (e.g., Dell Force10 S4810) connects via 40GbE links to the spine routers and via 10GbE links to the front (client) and back (OSD) sides of the nodes in the rack; a low-speed access switch (e.g., Dell Force10 S55) handles 1GbE IPMI management links.]
• Plan for low latency and high bandwidth
• Management tasks on the low-speed access switch
• The leaf switch carries both front (client) and back (OSD) traffic
• 2x 40GbE uplink trunks, created by aggregating 4x ports per trunk
30. Configure Networking within the Pod
[Diagram: two high-speed end-of-row (spine) switches per pod, uplinked to other rows (pods); each high-speed top-of-rack (leaf) switch connects via 10GbE to the nodes in its rack and via 40GbE to both spine switches.]
• Each pod (e.g., a row of racks) contains two spine switches
• Each leaf switch is redundantly uplinked to each spine switch
• Spine switches are redundantly linked to each other with 2x 40GbE
• Each spine switch has three uplinks to other pods with 3x 40GbE
31. Configure Networking between Pods
[Diagram: spine routers arranged in counter-rotating rings with cut-through links across them.]
• Each pod is placed on two counter-rotating rings with cut-through links
• Paths are maintained via routing tables
• Traffic between pods can take:
– The clockwise path
– The counterclockwise path
– The cut-through path
• Ensures that the maximum number of hops is N/4 for N pods (the bidirectional ring alone gives a worst case of N/2 hops; the cut-through links halve that to N/4)
32. Object Storage Daemons (OSDs)
• Allocate enough CPU cycles and memory per OSD
– 2GB of memory and 1GHz of Xeon CPU cycles per OSD
– Usually only a fraction is needed
› In error cases, all of it can be consumed
– Trade off over-provisioning against risk tolerance (a rough sizing sketch follows this list)
• Use SSDs as journal devices to improve latency
– Some workloads benefit from a separate journal on SSD
• Consider different tiers of storage for mixed application clusters
– Can mix OSDs on different types of disks to create tiers
– For each pool, choose the tier and level of replication
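A rough sizing sketch of the rule of thumb above (2GB of RAM and 1GHz of CPU per OSD, one OSD per disk); the example node specs are hypothetical.

# Rule-of-thumb OSD node check: ~2 GB RAM and ~1 GHz of CPU per OSD
# (one OSD per disk); the node specs below are hypothetical.
def check_osd_node(disks, cores, ghz_per_core, ram_gb):
    need_ram = 2 * disks          # GB
    need_ghz = 1.0 * disks        # GHz
    have_ghz = cores * ghz_per_core
    print('%d OSDs: need %d GB RAM (have %d), need %.0f GHz (have %.0f)'
          % (disks, need_ram, ram_gb, need_ghz, have_ghz))

check_osd_node(disks=12, cores=8, ghz_per_core=2.0, ram_gb=32)
# -> 12 OSDs: need 24 GB RAM (have 32), need 12 GHz (have 16)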
33. Ceph Cluster Monitors
• Best practice is to deploy the monitor role on dedicated hardware
– Not resource-intensive, but critical
– Using separate hardware ensures no contention for resources
• Make sure monitor processes are never starved for resources
– If running a monitor process on shared hardware, fence off resources
• Deploy an odd number of monitors (3 or 5)
– An odd number of monitors is needed for quorum voting
– Clusters of fewer than 200 nodes work well with 3 monitors
– Larger clusters may benefit from 5
– The main reason to go to 7 is to have redundancy across fault zones
• Add redundancy to monitor nodes as appropriate
– Make sure the monitor nodes are distributed across fault zones
– Consider refactoring fault zones if more than 7 monitors seem necessary
34. Mixed-Use Deployments
• For simplicity, dedicate hardware to a specific role
– That may not always be practical (e.g., small clusters)
– If needed, multiple functions can be combined on the same hardware
• Multiple Ceph roles (e.g., OSD+RGW, OSD+MDS, Mon+RGW)
– Balance IO-intensive roles with CPU/memory-intensive roles
– If both roles are relatively light (e.g., Mon and RGW), they can be combined
• Multiple applications (e.g., OSD+Compute, Mon+Horizon)
– In an OpenStack environment, components may need to be mixed
– Follow the same logic of balancing IO-intensive with CPU-intensive
35. Deploying Ceph Roles and Objects
• Ceph depends on replication for reliability
– Data (objects) and roles (hardware/software) exist in multiple places
– RAID can be used, but generally just adds to the expense
• Ceph incorporates the notion of multiple nested fault zones
– Ceph understands nested fault zones
– Each OSD is tagged with its location in multiple zones
– CRUSH uses this info to distribute data as protection from zone failures (see the toy sketch after this list)
• Roles need to be replicated in the cluster
– Ensure there are multiple nodes in the cluster with the needed roles
› Monitors (3 minimum for production)
› OSD nodes (2 minimum for production)
• Ceph needs sufficient nodes per fault zone to replicate objects
– Ensure enough of each role exists in each fault zone
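The fault-zone requirement can be made concrete with a toy sketch (again, not actual CRUSH): with three-way replication you need at least three fault zones holding OSDs, so that no single zone failure can take out every copy. Zone and OSD names are hypothetical.

# Toy zone-aware replica selection (not CRUSH): at most one replica
# per fault zone, so losing a whole zone cannot destroy all copies.
def pick_replicas(osds_by_zone, replicas=3):
    zones = sorted(osds_by_zone)
    if replicas > len(zones):
        raise ValueError('need at least one fault zone per replica')
    return [osds_by_zone[zone][0] for zone in zones[:replicas]]

racks = {  # hypothetical layout
    'rack-a': ['osd.0', 'osd.1'],
    'rack-b': ['osd.2', 'osd.3'],
    'rack-c': ['osd.4', 'osd.5'],
}
print(pick_replicas(racks))  # -> ['osd.0', 'osd.2', 'osd.4']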
36. Leverage Dell Ceph Expert Support
• Dell and Inktank are your partners for complex deployments
– Solution design and proof of concept
– Solution customization
– Capacity planning
– Performance optimization
• Having access to expert support is a production best practice
– Troubleshooting
– Debugging
40. Dell and Inktank’s Professional Services
Consulting Services:
• Technical Overview
• Infrastructure Assessment
• Proof of Concept
• Implementation Support
• Performance Tuning
Support Subscriptions:
• Pre-Production Support
• Production Support
A full description of our services can be found at the following:
Consulting Services: http://www.inktank.com/consulting-services/
Support Subscriptions: http://www.inktank.com/support-services/
41. Check out our upcoming and on-demand webinars
Upcoming:
1. DreamHost Case Study: DreamObjects with Ceph
February 7, 2013
10:00AM PT, 12:00PM CT, 1:00PM ET
http://www.inktank.com/news-events/webinars/
2. Advanced Features of Ceph Distributed Storage
(delivered by Sage Weil, creator of Ceph)
February 12, 2013
10:00AM PT, 12:00PM CT, 1:00PM ET
http://www.inktank.com/news-events/webinars/
On Demand:
Getting Started with Ceph
http://www.inktank.com/news-events/webinars/
42. Contact Us
Inktank:
Info@inktank.com and 1-855-INKTANK
Don’t forget to follow us on:
Twitter: https://twitter.com/inktank
Facebook: http://www.facebook.com/inktank
YouTube: http://www.youtube.com/inktankstorage
Dell:
OpenStack@Dell.com
http://www.dell.com/OpenStack
http://www.dell.com/CrowBar
https://github.com/dellcloudedge/