
XSKY - ceph luminous update


Speaker:
Haomai Wang (CTO of XSKY (星辰天合(北京)数据科技有限公司) and a distinguished Ceph core developer)

Abstract:
XSKY is a startup focused on software-defined infrastructure and storage that has risen quickly in China's IT industry in recent years. Its CTO, Haomai Wang, has worked in open-source storage for many years; he joined the Ceph community in 2013 and was the first core developer in the world to be awarded the official Ceph distinguished contributor title. We are fortunate that Mr. Wang has travelled to Ceph Day Taiwan to share the latest progress of the Ceph Luminous release introduced earlier this year. Please make the most of this rare opportunity for exchange!



  1. CEPH LUMINOUS UPDATE
     XSKY, Haomai Wang, 2017.06.06
  2. Who Am I
     • Haomai Wang, active Ceph contributor
     • Maintainer of multiple components
     • CTO of XSKY, a China-based storage startup
     • haomaiwang@gmail.com / haomai@xsky.com
  3. Releases
     • Hammer v0.94.x (LTS) – March '15
     • Infernalis v9.2.x – November '15
     • Jewel v10.2.x (LTS) – April '16
     • Kraken v11.2.x – December '16
     • Luminous v12.2.x (LTS) – September '17 (delayed)
  4. Ceph Ecosystem
  5. RADOS – BlueStore
     • BlueStore = Block + NewStore
     • Key/value database (RocksDB) for metadata
     • All data written directly to raw block device(s)
     • Fast on both HDDs (~2x) and SSDs (~1.5x)
       – similar to FileStore on NVMe, where the device is not the bottleneck
     • Full data checksums (crc32c, xxhash, etc.)
     • Inline compression (zlib, snappy, zstd)
     • Stable and default
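
The compression feature above is surfaced as per-pool settings. A minimal sketch with the python-rados binding, assuming a reachable cluster, the default /etc/ceph/ceph.conf, a pool named 'rbd', and the Luminous-era pool options compression_mode/compression_algorithm:

```python
import json
import rados

# Connect using the default admin config/keyring (paths are assumptions).
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# Turn on BlueStore inline compression for one pool; snappy is one of the
# algorithms listed on the slide (zlib and zstd are the others).
for var, val in (('compression_mode', 'aggressive'),
                 ('compression_algorithm', 'snappy')):
    cmd = {'prefix': 'osd pool set', 'pool': 'rbd', 'var': var, 'val': val}
    ret, _, errs = cluster.mon_command(json.dumps(cmd), b'')
    print(var, '->', ret, errs)

cluster.shutdown()
```
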
  6. RADOS – RBD Over Erasure Code
     • Requires BlueStore to perform reasonably
     • Significant improvement in efficiency over 3x replication
       – 2+2 → 2x, 4+2 → 1.5x
     • Small writes slower than replication
       – early testing showed 4+2 is about half as fast as 3x replication
     • Large writes faster than replication
       – less IO to the device
     • Implementation still does the "simple" thing
       – all writes update a full stripe
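
RBD on erasure-coded pools splits metadata from data: the image header stays in a replicated pool while data objects go to the EC pool, which must allow overwrites. A sketch with python-rados/python-rbd, assuming pools named 'rbd' (replicated) and 'ecpool' (erasure-coded) already exist and that this python-rbd build exposes the data_pool keyword:

```python
import json
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# Luminous adds overwrite support for EC pools, which RBD requires.
cmd = {'prefix': 'osd pool set', 'pool': 'ecpool',
       'var': 'allow_ec_overwrites', 'val': 'true'}
cluster.mon_command(json.dumps(cmd), b'')

# Image header/metadata live in the replicated pool; data objects land in the
# erasure-coded pool named by data_pool.
with cluster.open_ioctx('rbd') as ioctx:
    rbd.RBD().create(ioctx, 'ec-backed-image', 10 * 1024 ** 3, data_pool='ecpool')

cluster.shutdown()
```
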
  7. CEPH-MGR
     • ceph-mgr
       – new management daemon to supplement ceph-mon (the monitor)
       – easier integration point for Python management logic
       – integrated metrics
     • Make ceph-mon scalable again
       – offload pg stats from mon to mgr
       – push to 10K OSDs (planned "big bang 3" @ CERN)
     • New REST API
       – pecan
       – based on the previous Calamari API
     • Built-in web dashboard
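
Since ceph-mgr hosts the Python management logic and the web dashboard, a simple way to exercise it is enabling one of its bundled modules. A sketch, assuming the Luminous command name 'mgr module enable' and its 'module' argument:

```python
import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# Ask the monitors to enable the built-in web dashboard module; it is then
# served by whichever ceph-mgr instance is currently active.
cmd = {'prefix': 'mgr module enable', 'module': 'dashboard'}
ret, _, errs = cluster.mon_command(json.dumps(cmd), b'')
print('enable dashboard ->', ret, errs)

cluster.shutdown()
```
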
  8. AsyncMessenger
     • Core library included by all components
     • Kernel TCP/IP driver
     • Epoll/kqueue event driven
     • Maintains connection lifecycle and session
     • Replaces the aging SimpleMessenger
     • Fixed-size thread pool (vs 2 threads per socket)
     • Scales better to larger clusters
     • Healthier relationship with tcmalloc
     • Now the default!
  9. DPDK Support
     • Built for high performance
       – DPDK
       – SPDK
       – full userspace IO path
       – shared-nothing TCP/IP stack (modelled on Seastar)
  10. RDMA Support
      • RDMA backend
        – inherits NetworkStack and implements RDMAStack
        – uses user-space verbs directly
        – TCP as the control path
        – exchanges messages using RDMA SEND
        – uses a shared receive queue
        – multiple connection QPs in a many-to-many topology
        – built into Ceph master; all features fully available there
      • Support:
        – RHEL/CentOS
        – InfiniBand and Ethernet
        – RoCE v2 for cross-subnet traffic
        – front-end TCP and back-end RDMA
  11. Messenger Plugins
      Plugin                  | Default | Hardware Requirement | Performance | Compatibility           | OSD Storage Engine Requirement | OSD Disk Backend Requirement
      Posix (kernel)          | Yes     | None                 | Middle      | TCP/IP compatible       | None                           | None
      DPDK + userspace TCP/IP | No      | DPDK-supported NIC   | High        | TCP/IP compatible       | BlueStore                      | Must be NVMe SSD
      RDMA                    | No      | RDMA-supported NIC   | High        | RDMA-supported network  | None                           | None
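
Choosing one of these messenger backends is a per-daemon ceph.conf setting applied at daemon start. A small sketch that generates such a fragment; the option names are the Luminous-era ones, while the RDMA device name and the ms_cluster_type line are illustrative assumptions:

```python
import configparser
import sys

# Build a ceph.conf fragment selecting a messenger backend via ms_type.
conf = configparser.ConfigParser()
conf['global'] = {
    'ms_type': 'async+posix',               # kernel TCP/IP, the default
    # 'ms_type': 'async+dpdk',              # userspace TCP/IP: needs a DPDK NIC
    #                                       # and BlueStore on NVMe (see table)
    # 'ms_type': 'async+rdma',              # RDMA verbs: needs an RDMA network
    'ms_async_rdma_device_name': 'mlx5_0',  # only read by the RDMA backend
    # 'ms_cluster_type': 'async+rdma',      # (assumption) back-end RDMA with
    #                                       # front-end TCP, as on slide 10
}
conf.write(sys.stdout)
```
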
  12. Recovery Improvements
  13. RBD – iSCSI
      • tcmu-runner + librbd
      • LIO + kernel RBD
  14. RBD Mirror HA
  15. RGW Metadata Search
  16. RGW Misc
      • NFS gateway
        – NFSv4 and v3
        – full object access (not general purpose!)
      • Dynamic bucket index sharding
        – automatic (finally!)
      • Inline compression
      • Encryption
        – follows the S3 encryption APIs
      • S3 and Swift API odds and ends
      [Diagram: NFS clients reach an nfs-ganesha (NFSv4) server built on librgw-file, while applications use the S3 and Swift APIs against RadosGW; both paths go through the rados API down to RADOS]
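
Because RGW's new encryption support follows the S3 encryption APIs, any stock S3 client can use it. A sketch with boto3 and SSE-C against a RadosGW endpoint; the endpoint URL, credentials, and bucket name are placeholders:

```python
import os
import boto3

# Plain S3 client pointed at a RadosGW endpoint (placeholder URL/credentials).
s3 = boto3.client('s3',
                  endpoint_url='http://rgw.example.com:7480',
                  aws_access_key_id='ACCESS_KEY',
                  aws_secret_access_key='SECRET_KEY')

# SSE-C: the client supplies an AES-256 key and RGW encrypts the object with it.
key = os.urandom(32)
s3.put_object(Bucket='demo-bucket', Key='hello.txt', Body=b'hello luminous',
              SSECustomerAlgorithm='AES256', SSECustomerKey=key)

# Reads must present the same key.
obj = s3.get_object(Bucket='demo-bucket', Key='hello.txt',
                    SSECustomerAlgorithm='AES256', SSECustomerKey=key)
print(obj['Body'].read())
```
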
  17. CephFS
      • Multiple active MDS daemons (finally!)
      • Subtree pinning to a specific daemon
      • Directory fragmentation on by default
      • (Snapshots still off by default)
      • So many tests
      • So many bugs fixed
      • Kernel client improvements
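
Multi-MDS and subtree pinning are both driven from the admin/client side. A sketch, assuming a filesystem named 'cephfs', a kernel mount at /mnt/cephfs with a 'projects' directory, and the Luminous 'fs set' command plus the ceph.dir.pin virtual xattr:

```python
import json
import os
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# Allow and raise the number of active MDS daemons for the filesystem
# (the allow_multimds flag may still be required in this release).
for var, val in (('allow_multimds', 'true'), ('max_mds', '2')):
    cmd = {'prefix': 'fs set', 'fs_name': 'cephfs', 'var': var, 'val': val}
    ret, _, errs = cluster.mon_command(json.dumps(cmd), b'')
    print('fs set', var, '->', ret, errs)
cluster.shutdown()

# Pin a directory subtree to MDS rank 1 through the ceph.dir.pin virtual
# xattr on a kernel-mounted CephFS path (Linux-only os.setxattr).
os.setxattr('/mnt/cephfs/projects', 'ceph.dir.pin', b'1')
```
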
  18. CephFS – MultiMDS
  19. Container
  20. Future
      • RADOS
        – IO path refactor
        – BlueStore performance
      • QoS
        – dmclock
      • Dedup
        – based on tiering
      • Tiering
  21. Growing Developer Community
  22. How To Help
  23. Thank you
