
Ceph Overview for Distributed Computing Denver Meetup


Ceph Overview slides for Distributed Computing Denver Meetup on Apr 23, 2015. http://www.meetup.com/Distributed-Computing-Denver/events/220642902/



  1. CEPH: A MASSIVELY SCALABLE DISTRIBUTED STORAGE SYSTEM. Ken Dreyer, Software Engineer. Apr 23, 2015
  2. Hitler finds out about software-defined storage. If you are a proprietary storage vendor...
  3. THE FUTURE OF STORAGE. Traditional storage: complex, proprietary silos, each with a custom GUI, proprietary software, and proprietary hardware between user and admin. Open software-defined storage: standardized, unified, open platforms built from open source software (the Ceph control plane, with API and GUI) on commodity hardware: standard computers and disks.
  4. THE JOURNEY. Open software-defined storage is a fundamental reimagining of how storage infrastructure works. It provides substantial economic and operational advantages, and it is quickly proving well suited to a growing number of use cases. Today: cloud infrastructure. Emerging: cloud-native apps, analytics, hyper-convergence, containers. Future: ???
  5. HISTORICAL TIMELINE (10 years in the making): 2004, project starts at UCSC; 2006, open source; 2010, mainline Linux kernel; 2011, OpenStack integration; MAY 2012, launch of Inktank; SEPT 2012, production-ready Ceph; 2012, CloudStack integration; 2013, Xen integration; OCT 2013, Inktank Ceph Enterprise launch; FEB 2014, RHEL-OSP certification; APR 2014, Inktank acquired by Red Hat.
  6. ARCHITECTURE
  7. ARCHITECTURAL COMPONENTS.
      RADOS: a software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors.
      LIBRADOS: a library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP).
      RGW: a web services gateway for object storage, compatible with S3 and Swift (used by an APP).
      RBD: a reliable, fully distributed block device with cloud platform integration (used by a HOST/VM).
      CEPHFS: a distributed file system with POSIX semantics and scale-out metadata management (used by a CLIENT).
  8. OBJECT STORAGE DAEMONS. [Diagram: each OSD daemon sits on a local filesystem (btrfs, xfs, ext4, zfs?) on its own disk; monitors (M) run alongside.]
  9. RADOS CLUSTER. [Diagram: an application talking directly to a RADOS cluster of OSDs and monitors (M).]
  10. RADOS COMPONENTS.
      OSDs:  10s to 10,000s in a cluster  one per disk (or one per SSD, RAID group...)  serve stored objects to clients  intelligently peer for replication and recovery.
      Monitors (M):  maintain cluster membership and state  provide consensus for distributed decision-making  run as a small, odd number  do not serve stored objects to clients.
  11. WHERE DO OBJECTS LIVE? [Diagram: an application holds an object; which node in the cluster should store it?]
  12. A METADATA SERVER? [Diagram: one option is a lookup service: the application (1) asks a metadata server where the object lives, then (2) goes to that location.]
  13. CALCULATED PLACEMENT. [Diagram: a better option: the application computes the location from the object name itself, e.g. by name ranges A-G, H-N, O-T, U-Z.]
  14. EVEN BETTER: CRUSH! [Diagram: objects are hashed into placement groups (PGs), and PGs are mapped onto the cluster.]
  15. CRUSH IS A QUICK CALCULATION. [Diagram: a client computes, rather than looks up, where an object lives in the RADOS cluster.]
  16. CRUSH: DYNAMIC DATA PLACEMENT. CRUSH is:  a pseudo-random placement algorithm: fast calculation, no lookup; repeatable and deterministic  a statistically uniform distribution  a stable mapping, with limited data migration on change  rule-based in its configuration: infrastructure topology aware, with adjustable replication and weighting.
  17. [Diagram: two steps: hash(object name) % num_pg selects a placement group, then CRUSH(pg, cluster state, rule set) maps that PG onto OSDs.] A toy sketch of this mapping follows.
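     A minimal Python sketch of the two-step placement, under loudly stated assumptions: Ceph actually uses its own rjenkins-based hash, and the real CRUSH walks a weighted hierarchy of the cluster map; the stand-in below only mimics its deterministic, lookup-free character.

        import hashlib

        def object_to_pg(object_name, num_pgs):
            """Step 1: hash the object name onto a placement group (PG)."""
            # md5 here is only for illustration; Ceph uses rjenkins.
            digest = hashlib.md5(object_name.encode()).hexdigest()
            return int(digest, 16) % num_pgs

        def toy_crush(pg, osds, replicas=3):
            """Step 2 (stand-in): deterministically pick OSDs for a PG."""
            start = pg % len(osds)
            return [osds[(start + i) % len(osds)] for i in range(replicas)]

        osds = ['osd.0', 'osd.1', 'osd.2', 'osd.3', 'osd.4']
        pg = object_to_pg('my-object', num_pgs=128)
        print(pg, toy_crush(pg, osds))  # same inputs always yield the same placement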
  18.-25. [Diagram-only slides: a step-by-step animation of the CRUSH example, showing an object hashing into a placement group and clients locating objects directly; no further text to transcribe.]
  26. ARCHITECTURAL COMPONENTS (the slide 7 overview, repeated).
  27. ACCESSING A RADOS CLUSTER. [Diagram: an application links LIBRADOS and talks to the RADOS cluster's OSDs and monitors directly over a socket to read and write objects.]
  28. LIBRADOS: RADOS ACCESS FOR APPS. LIBRADOS provides:  direct access to RADOS for applications  C, C++, Python, PHP, Java, and Erlang bindings  direct access to storage nodes, with no HTTP overhead. A usage sketch follows.
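     A minimal write/read round trip with the Python librados binding. The conffile path and the pool name 'data' are assumptions: the cluster must be reachable and the pool must already exist.

        import rados

        # Assumed: a reachable cluster described by /etc/ceph/ceph.conf
        # and an existing pool named 'data'.
        cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
        cluster.connect()
        ioctx = cluster.open_ioctx('data')   # I/O context for one pool
        try:
            ioctx.write_full('greeting', b'Hello, RADOS')  # store an object
            print(ioctx.read('greeting'))                  # read it back
        finally:
            ioctx.close()
            cluster.shutdown()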
  29. ARCHITECTURAL COMPONENTS (the slide 7 overview, repeated).
  30. THE RADOS GATEWAY. [Diagram: applications speak REST to RADOSGW instances, each of which uses LIBRADOS over a socket to reach the RADOS cluster.]
  31. RADOSGW MAKES RADOS WEBBY. RADOSGW:  is a REST-based object storage proxy  uses RADOS to store objects  offers an API that supports buckets and accounts  provides usage accounting for billing  is compatible with S3 and Swift applications. An S3 client sketch follows.
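     Because the API is S3-compatible, stock S3 clients work against RADOSGW. A sketch using the boto library; the endpoint host, port, and keys are placeholders for values you would create with radosgw-admin.

        import boto
        import boto.s3.connection

        # Placeholder endpoint and credentials (assumptions, not real values).
        conn = boto.connect_s3(
            aws_access_key_id='ACCESS_KEY',
            aws_secret_access_key='SECRET_KEY',
            host='rgw.example.com',
            port=7480,
            is_secure=False,
            calling_format=boto.s3.connection.OrdinaryCallingFormat(),
        )
        bucket = conn.create_bucket('demo-bucket')
        key = bucket.new_key('hello.txt')
        key.set_contents_from_string('stored via the S3 API, kept in RADOS')
        print([b.name for b in conn.get_all_buckets()])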
  32. ARCHITECTURAL COMPONENTS (the slide 7 overview, repeated).
  33. STORING VIRTUAL DISKS. [Diagram: a hypervisor uses LIBRBD to store a VM's virtual disk in the RADOS cluster.]
  34. SEPARATE COMPUTE FROM STORAGE. [Diagram: because the disk image lives in the RADOS cluster, the VM can run on any hypervisor with LIBRBD.]
  35. KRBD: KERNEL MODULE. [Diagram: a Linux host accesses RBD images through the in-kernel KRBD client, with no hypervisor required.]
  36. RBD STORES VIRTUAL DISKS. RADOS BLOCK DEVICE:  stores disk images in RADOS  decouples VMs from hosts  stripes images across the cluster (pool)  supports snapshots and copy-on-write clones  is supported in the mainline Linux kernel (2.6.39+) and RHEL 7, in Qemu/KVM (native Xen coming soon), and in OpenStack, CloudStack, Nebula, and Proxmox. A sketch of the Python binding follows.
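     A sketch of creating and writing an image with the Python rbd binding. The pool name 'rbd' and the image name 'vm-disk' are assumptions; the pool must already exist.

        import rados
        import rbd

        cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
        cluster.connect()
        ioctx = cluster.open_ioctx('rbd')        # assumed pool name
        try:
            rbd.RBD().create(ioctx, 'vm-disk', 4 * 1024**3)  # 4 GiB image
            image = rbd.Image(ioctx, 'vm-disk')
            try:
                image.write(b'first bytes of a guest disk', 0)  # write at offset 0
                print(image.size())
            finally:
                image.close()
        finally:
            ioctx.close()
            cluster.shutdown()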
  37. RBD SNAPSHOTS.  Export snapshots to geographically dispersed data centers to institute disaster recovery.  Export incremental snapshots to minimize network bandwidth by sending only the changes.
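     Snapshots can be taken from the same Python binding (incremental export itself is done with the rbd export-diff / import-diff commands). A sketch, reusing the hypothetical 'vm-disk' image from above:

        import rados
        import rbd

        cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
        cluster.connect()
        ioctx = cluster.open_ioctx('rbd')
        image = rbd.Image(ioctx, 'vm-disk')      # assumed existing image
        try:
            image.create_snap('before-upgrade')  # cheap point-in-time snapshot
            print([s['name'] for s in image.list_snaps()])
        finally:
            image.close()
            ioctx.close()
            cluster.shutdown()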
  38. ARCHITECTURAL COMPONENTS (the slide 7 overview, repeated).
  39. SEPARATE METADATA SERVER. [Diagram: a Linux host's kernel module sends metadata operations to the metadata servers and file data directly to the RADOS cluster.]
  40. SCALABLE METADATA SERVERS. The METADATA SERVER:  manages metadata for a POSIX-compliant shared filesystem, including the directory hierarchy and file metadata (owner, timestamps, mode, etc.)  stores that metadata in RADOS  does not serve file data to clients  is required only for the shared filesystem.
  41. CALAMARI
  42. CALAMARI ARCHITECTURE. [Diagram: a master on the Calamari admin node communicates with minion agents running on every node of the Ceph storage cluster, including the monitors.]
  43. USE CASES
  44. WEB APPLICATION STORAGE. [Diagram: a web application's app servers talk S3/Swift to Ceph Object Gateways (RGW), backed by a Ceph storage cluster (RADOS).]
  45. MULTI-SITE OBJECT STORAGE. [Diagram: two web applications, each with its own app server, Ceph Object Gateway (RGW), and Ceph storage cluster, one in US-EAST and one in EU-WEST.]
  46. ARCHIVE / COLD STORAGE. [Diagram: an application writes through a replicated cache pool into an erasure-coded backing pool within one Ceph storage cluster.]
  47. ERASURE CODING. Replicated pool: full copies of stored objects; very high durability; quicker recovery. Erasure-coded pool: one copy plus parity; cost-effective durability; expensive recovery.
  48. ERASURE CODING: HOW DOES IT WORK? [Diagram: an object is split into data chunks (1-4) plus parity chunks (X, Y), each stored on a different OSD in the erasure-coded pool.] A toy illustration follows.
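     The parity idea in miniature: with k=2 data chunks and m=1 parity chunk, any single lost chunk can be rebuilt. Ceph's erasure-coded pools use real codes (Reed-Solomon via the jerasure plugin by default); plain XOR below is only the simplest possible stand-in.

        # k=2 data chunks, m=1 parity chunk (toy XOR parity).
        chunk1 = b'CEPH'
        chunk2 = b'DATA'
        parity = bytes(a ^ b for a, b in zip(chunk1, chunk2))

        # Suppose the OSD holding chunk1 fails: rebuild it from the survivors.
        recovered = bytes(a ^ b for a, b in zip(chunk2, parity))
        assert recovered == chunk1
        print(recovered)  # b'CEPH'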
  49. CACHE TIERING. [Diagram: a Ceph client reads and writes against a cache pool in writeback mode, which sits in front of a replicated backing pool in the same Ceph storage cluster.]
  50. WEBSCALE APPLICATIONS. [Diagram: app servers use the native protocol (librados) to talk directly to the Ceph storage cluster (RADOS).]
  51. ARCHIVE / COLD STORAGE (the slide 46 diagram, repeated).
  52. CEPH BLOCK DEVICE (RBD) DATABASES. [Diagram: MySQL / MariaDB servers on Linux use the kernel RBD client to store data in the Ceph storage cluster (RADOS) over the native protocol.]
  53. Future Ceph Roadmap
  54. CEPH ROADMAP. Hammer (the current release), then Infernalis, then the J-release. Planned or proposed items across these releases: NewStore, object versioning, object expiration, an alternative web server for RGW, a stable CephFS (?), performance improvements in every release, and more to be determined (???).
  55. NEXT STEPS
  56. NEXT STEPS. WHAT NOW?
      Getting Started with Ceph:
      • Read about the latest version of Ceph: http://ceph.com/docs
      • Deploy a test cluster using ceph-deploy: http://ceph.com/qsg
      • Deploy a test cluster on the AWS free tier using Juju: http://ceph.com/juju
      • Ansible playbooks for Ceph: https://www.github.com/alfredodeza/ceph-ansible
      Getting Involved with Ceph:
      • Most discussion happens on the ceph-devel and ceph-users mailing lists. Join or view archives at http://ceph.com/list
      • IRC is a great place to get help (or help others!): #ceph and #ceph-devel. Details and logs at http://ceph.com/irc
      • Download the code: http://www.github.com/ceph
      • The tracker manages bugs and feature requests. Register and start looking around at http://tracker.ceph.com
      • Doc updates and suggestions are always welcome. Learn how to contribute docs at http://ceph.com/docwriting
  57. Thank You
  58. Extras
      ● metrics.ceph.com
      ● http://yahooeng.tumblr.com/post/116391291701/yahoo-cloud-object-store-object-storage-at
