SlideShare ist ein Scribd-Unternehmen logo
1 von 22
Downloaden Sie, um offline zu lesen
Swift at Scale:

The IBM SoftLayer Story
Brian Cline, Object Storage Development Lead
OpenStack Summit • Ocata series
2016.10.25 Barcelona, Spain
twitter/irc: @briancline
Our History with Public Object Storage
• 2012 — First three clusters go live (DAL, AMS, SNG)
• 2014 — Dedicated development team
• 2014 — Launch 11 clusters in new datacenters
• 2015 — Launch 5 clusters in new DCs
• 2015 — Product integrations with IBM Bluemix
• 2016 — Launch 3 clusters in new DCs

(and expand an existing cluster into multiple DCs)
2012: When things were [mostly] simpler…
• 7-10 nodes in each cluster
• Two node types
• Proxy
• Data - account, container, object services
• Load balancer
• FreeBSD with ZFS ⚠ Do not attempt.
• No centralized logs
• No log analysis tools
2016: Adjusted for scale (blood, sweat, tears, dreams, starlight…)
• Up to hundreds of nodes per cluster
• Three node types
• Proxy
• Meta - account and container services
• Data - object services
• Load balancer cluster
• Debian Linux
• Centralized and searchable logs
• Analytics via Spark and Hadoop
Our Scale
22 Swift Clusters
24 Datacenters
16 Countries
Tens of Billions of Objects
7 Million Containers
Hundreds of thousands of Swift Accounts
90 PB of Capacity
Thousands of Nodes
40,000+ Disks
Tens of thousands requests per second
GET HEAD PUT DELETE
(with notable variability between clusters)
Hardware
Hardware we like
• Supermicro 36-disk chassis
• 12-16 physical cores

(24-32 HT cores)
• 128GB RAM for proxies
• 256GB RAM for data nodes
• 10Gbps NICs (separate API vs.
storage/replication networks)
• 3 - 4 TB disks
• Controller card
• 2 disks for OS (RAID1)
• 1 disk for OS hotswap
• 4 disks for SSD caching
• 29 disks for data storage
• Usually expand by ½-row or a

full row at a time
Software
Our Stack — Software
OS Debian
Base Swift (duh) — sometimes with backports
Authentication
Swauth — some internal patches and enhancements
Keystone (APIv3) — starting with Bluemix accounts
Metadata Search Elasticsearch
Monitoring &
Logging
collectd, Nagios, Capacity Dashboard
Logstash, Kibana, Graphite, Grafana
slogging
Automation Chef, Jenkins, Fabric
Our Stack — Custom Middlewares
• CDN operations (purge, load, CNAMEs, TTL, compression, etc.)
• CDN origin pull
• Search indexer (on successful PUT/POST/DELETE)
• Search query operations
• Checkpoint (account enable/disable/etc. abilities for resellers)
• Internal management (sysmeta read/write, proxy-level recon)
Lessons Learned
Lessons Learned: Automation
• Make automation a must-have, day-one deliverable
• Never launch something new without test/deploy automation
• Must work across all environments (dev, QA, UAT/staging, prod)
• Automation needs tests and metrics, too — it is code!
• Functional testing should be an automated part of every deploy
• Remember your orchestration (knowledge of Swift zones)
Lessons Learned: Monitoring
• Scale test any monitoring/logging infrastructure you put into place
• Very obvious stuff:
• Space and IOPS, errors from SMART/XFS/kernel/controller, etc.
• HTTP response code aggregates, latency aggregates by verb, etc.
• Swift metrics:
• If nothing else, async pendings
• Replicator failures and partitions/sec rates
• Replicator last completion timestamp vs. ring push timestamp
Lessons Learned: Monitoring
• Any middleware you create needs to emit ops metrics
• New features benefit from emitting usage metrics
• Don’t forget debug-level log messages
• Automatic checks for precipitating conditions that lead to failures

(not just for the error log lines that result from them afterwards)
Lessons Learned: Rebalancing
• Keep tabs on your rebalance times

(and keep them small when possible)
• Coordinate rebalances around node/cluster maintenance
• Don’t let IOPS levels grow too high before expanding capacity
• Customer IOPS vs. Replicator & Auditor IOPS — know your limits
Lessons Learned: Swift
• Use 256 byte inode sizes (or the smallest you can get away with)
• Using swauth? Use an SSD storage policy for AUTH_.auth containers
• Namespace any custom API additions (and be consistent)
• When possible, ask community about new middleware thoughts
• Upstream is important! Stay involved and give back when possible
Thank you!
Questions?

Weitere ähnliche Inhalte

Was ist angesagt?

Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...
Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...
Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...ScyllaDB
 
MySQL HA Percona cluster @ MySQL meetup Mumbai
MySQL HA Percona cluster @ MySQL meetup MumbaiMySQL HA Percona cluster @ MySQL meetup Mumbai
MySQL HA Percona cluster @ MySQL meetup MumbaiRemote MySQL DBA
 
FireEye & Scylla: Intel Threat Analysis Using a Graph Database
FireEye & Scylla: Intel Threat Analysis Using a Graph DatabaseFireEye & Scylla: Intel Threat Analysis Using a Graph Database
FireEye & Scylla: Intel Threat Analysis Using a Graph DatabaseScyllaDB
 
Scylla Summit 2022: Stream Processing with ScyllaDB
Scylla Summit 2022: Stream Processing with ScyllaDBScylla Summit 2022: Stream Processing with ScyllaDB
Scylla Summit 2022: Stream Processing with ScyllaDBScyllaDB
 
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in Go
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in GoScylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in Go
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in GoScyllaDB
 
High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016Eric Sammer
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterScyllaDB
 
PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...
PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...
PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...Lucidworks
 
Real-time Data Streaming from Oracle to Apache Kafka
Real-time Data Streaming from Oracle to Apache Kafka Real-time Data Streaming from Oracle to Apache Kafka
Real-time Data Streaming from Oracle to Apache Kafka confluent
 
Scaling ELK Stack - DevOpsDays Singapore
Scaling ELK Stack - DevOpsDays SingaporeScaling ELK Stack - DevOpsDays Singapore
Scaling ELK Stack - DevOpsDays SingaporeAngad Singh
 
Scylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDSScylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDSScyllaDB
 
Scylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent DatabasesScylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent DatabasesScyllaDB
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...ScyllaDB
 
Eliminating Volatile Latencies Inside Rakuten’s NoSQL Migration
Eliminating  Volatile Latencies Inside Rakuten’s NoSQL MigrationEliminating  Volatile Latencies Inside Rakuten’s NoSQL Migration
Eliminating Volatile Latencies Inside Rakuten’s NoSQL MigrationScyllaDB
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Fwdays
 
Cassandra Day SV 2014: Scaling Hulu’s Video Progress Tracking Service with Ap...
Cassandra Day SV 2014: Scaling Hulu’s Video Progress Tracking Service with Ap...Cassandra Day SV 2014: Scaling Hulu’s Video Progress Tracking Service with Ap...
Cassandra Day SV 2014: Scaling Hulu’s Video Progress Tracking Service with Ap...DataStax Academy
 
Introducing DataStax Enterprise 4.7
Introducing DataStax Enterprise 4.7Introducing DataStax Enterprise 4.7
Introducing DataStax Enterprise 4.7DataStax
 
Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...
Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...
Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...Lucidworks
 
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...ScyllaDB
 
WSO2Con ASIA 2016: An Introduction to the WSO2 Analytics Platform
WSO2Con ASIA 2016: An Introduction to the WSO2 Analytics PlatformWSO2Con ASIA 2016: An Introduction to the WSO2 Analytics Platform
WSO2Con ASIA 2016: An Introduction to the WSO2 Analytics PlatformWSO2
 

Was ist angesagt? (20)

Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...
Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...
Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...
 
MySQL HA Percona cluster @ MySQL meetup Mumbai
MySQL HA Percona cluster @ MySQL meetup MumbaiMySQL HA Percona cluster @ MySQL meetup Mumbai
MySQL HA Percona cluster @ MySQL meetup Mumbai
 
FireEye & Scylla: Intel Threat Analysis Using a Graph Database
FireEye & Scylla: Intel Threat Analysis Using a Graph DatabaseFireEye & Scylla: Intel Threat Analysis Using a Graph Database
FireEye & Scylla: Intel Threat Analysis Using a Graph Database
 
Scylla Summit 2022: Stream Processing with ScyllaDB
Scylla Summit 2022: Stream Processing with ScyllaDBScylla Summit 2022: Stream Processing with ScyllaDB
Scylla Summit 2022: Stream Processing with ScyllaDB
 
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in Go
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in GoScylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in Go
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in Go
 
High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla Cluster
 
PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...
PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...
PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...
 
Real-time Data Streaming from Oracle to Apache Kafka
Real-time Data Streaming from Oracle to Apache Kafka Real-time Data Streaming from Oracle to Apache Kafka
Real-time Data Streaming from Oracle to Apache Kafka
 
Scaling ELK Stack - DevOpsDays Singapore
Scaling ELK Stack - DevOpsDays SingaporeScaling ELK Stack - DevOpsDays Singapore
Scaling ELK Stack - DevOpsDays Singapore
 
Scylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDSScylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDS
 
Scylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent DatabasesScylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent Databases
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
 
Eliminating Volatile Latencies Inside Rakuten’s NoSQL Migration
Eliminating  Volatile Latencies Inside Rakuten’s NoSQL MigrationEliminating  Volatile Latencies Inside Rakuten’s NoSQL Migration
Eliminating Volatile Latencies Inside Rakuten’s NoSQL Migration
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
 
Cassandra Day SV 2014: Scaling Hulu’s Video Progress Tracking Service with Ap...
Cassandra Day SV 2014: Scaling Hulu’s Video Progress Tracking Service with Ap...Cassandra Day SV 2014: Scaling Hulu’s Video Progress Tracking Service with Ap...
Cassandra Day SV 2014: Scaling Hulu’s Video Progress Tracking Service with Ap...
 
Introducing DataStax Enterprise 4.7
Introducing DataStax Enterprise 4.7Introducing DataStax Enterprise 4.7
Introducing DataStax Enterprise 4.7
 
Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...
Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...
Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...
 
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
 
WSO2Con ASIA 2016: An Introduction to the WSO2 Analytics Platform
WSO2Con ASIA 2016: An Introduction to the WSO2 Analytics PlatformWSO2Con ASIA 2016: An Introduction to the WSO2 Analytics Platform
WSO2Con ASIA 2016: An Introduction to the WSO2 Analytics Platform
 

Andere mochten auch

Dockerize the World - presentation from Hradec Kralove
Dockerize the World - presentation from Hradec KraloveDockerize the World - presentation from Hradec Kralove
Dockerize the World - presentation from Hradec Kralovedamovsky
 
Supporting Debian machines for friends and family
Supporting Debian machines for friends and familySupporting Debian machines for friends and family
Supporting Debian machines for friends and familyFrancois Marier
 
Disksim with SSD_extension
Disksim with SSD_extensionDisksim with SSD_extension
Disksim with SSD_extensioncucufrog
 
How to build Debian packages
How to build Debian packages How to build Debian packages
How to build Debian packages Priyank Kapadia
 
Debian Cloud - building the Debian AMIs
Debian Cloud - building the Debian AMIsDebian Cloud - building the Debian AMIs
Debian Cloud - building the Debian AMIsJames Bromberger
 
Debian 套件打包教學指南 v0.19 - 繁體中文翻譯
Debian 套件打包教學指南 v0.19 - 繁體中文翻譯Debian 套件打包教學指南 v0.19 - 繁體中文翻譯
Debian 套件打包教學指南 v0.19 - 繁體中文翻譯SZ Lin
 
SR-IOV+KVM on Debian/Stable
SR-IOV+KVM on Debian/StableSR-IOV+KVM on Debian/Stable
SR-IOV+KVM on Debian/Stablejuet-y
 
Debian Packaging tutorial
Debian Packaging tutorialDebian Packaging tutorial
Debian Packaging tutorialnussbauml
 
Deep Dive: Maximizing EC2 and EBS Performance
Deep Dive: Maximizing EC2 and EBS PerformanceDeep Dive: Maximizing EC2 and EBS Performance
Deep Dive: Maximizing EC2 and EBS PerformanceAmazon Web Services
 
Debian Linux on Zynq (Xilinx ARM-SoC FPGA) Setup Flow (Vivado 2015.4)
Debian Linux on Zynq (Xilinx ARM-SoC FPGA) Setup Flow (Vivado 2015.4)Debian Linux on Zynq (Xilinx ARM-SoC FPGA) Setup Flow (Vivado 2015.4)
Debian Linux on Zynq (Xilinx ARM-SoC FPGA) Setup Flow (Vivado 2015.4)Shinya Takamaeda-Y
 
Embedded Linux/ Debian with ARM64 Platform
Embedded Linux/ Debian with ARM64 PlatformEmbedded Linux/ Debian with ARM64 Platform
Embedded Linux/ Debian with ARM64 PlatformSZ Lin
 
Optimizing Oracle databases with SSD - April 2014
Optimizing Oracle databases with SSD - April 2014Optimizing Oracle databases with SSD - April 2014
Optimizing Oracle databases with SSD - April 2014Guy Harrison
 
SSD Deployment Strategies for MySQL
SSD Deployment Strategies for MySQLSSD Deployment Strategies for MySQL
SSD Deployment Strategies for MySQLYoshinori Matsunobu
 

Andere mochten auch (17)

Dockerize the World - presentation from Hradec Kralove
Dockerize the World - presentation from Hradec KraloveDockerize the World - presentation from Hradec Kralove
Dockerize the World - presentation from Hradec Kralove
 
Supporting Debian machines for friends and family
Supporting Debian machines for friends and familySupporting Debian machines for friends and family
Supporting Debian machines for friends and family
 
Disksim with SSD_extension
Disksim with SSD_extensionDisksim with SSD_extension
Disksim with SSD_extension
 
MySQL and SSD
MySQL and SSDMySQL and SSD
MySQL and SSD
 
How to build Debian packages
How to build Debian packages How to build Debian packages
How to build Debian packages
 
Debian Cloud - building the Debian AMIs
Debian Cloud - building the Debian AMIsDebian Cloud - building the Debian AMIs
Debian Cloud - building the Debian AMIs
 
Debian 套件打包教學指南 v0.19 - 繁體中文翻譯
Debian 套件打包教學指南 v0.19 - 繁體中文翻譯Debian 套件打包教學指南 v0.19 - 繁體中文翻譯
Debian 套件打包教學指南 v0.19 - 繁體中文翻譯
 
SR-IOV+KVM on Debian/Stable
SR-IOV+KVM on Debian/StableSR-IOV+KVM on Debian/Stable
SR-IOV+KVM on Debian/Stable
 
Debian Packaging tutorial
Debian Packaging tutorialDebian Packaging tutorial
Debian Packaging tutorial
 
Deep Dive: Maximizing EC2 and EBS Performance
Deep Dive: Maximizing EC2 and EBS PerformanceDeep Dive: Maximizing EC2 and EBS Performance
Deep Dive: Maximizing EC2 and EBS Performance
 
Debian Linux on Zynq (Xilinx ARM-SoC FPGA) Setup Flow (Vivado 2015.4)
Debian Linux on Zynq (Xilinx ARM-SoC FPGA) Setup Flow (Vivado 2015.4)Debian Linux on Zynq (Xilinx ARM-SoC FPGA) Setup Flow (Vivado 2015.4)
Debian Linux on Zynq (Xilinx ARM-SoC FPGA) Setup Flow (Vivado 2015.4)
 
Embedded Linux/ Debian with ARM64 Platform
Embedded Linux/ Debian with ARM64 PlatformEmbedded Linux/ Debian with ARM64 Platform
Embedded Linux/ Debian with ARM64 Platform
 
Solid state drives
Solid state drivesSolid state drives
Solid state drives
 
Optimizing Oracle databases with SSD - April 2014
Optimizing Oracle databases with SSD - April 2014Optimizing Oracle databases with SSD - April 2014
Optimizing Oracle databases with SSD - April 2014
 
Linux introduction
Linux introductionLinux introduction
Linux introduction
 
SSD Deployment Strategies for MySQL
SSD Deployment Strategies for MySQLSSD Deployment Strategies for MySQL
SSD Deployment Strategies for MySQL
 
SSD: Single Shot MultiBox Detector (UPC Reading Group)
SSD: Single Shot MultiBox Detector (UPC Reading Group)SSD: Single Shot MultiBox Detector (UPC Reading Group)
SSD: Single Shot MultiBox Detector (UPC Reading Group)
 

Ähnlich wie Swift at Scale: The IBM SoftLayer Story

Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchJoe Alex
 
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...DataStax Academy
 
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsLeveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsJulien Anguenot
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scalethelabdude
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyCeph Community
 
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Codemotion
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaAmazon Web Services
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackRich Lee
 
Architectures, Frameworks and Infrastructure
Architectures, Frameworks and InfrastructureArchitectures, Frameworks and Infrastructure
Architectures, Frameworks and Infrastructureharendra_pathak
 
OpenStack Cinder, Implementation Today and New Trends for Tomorrow
OpenStack Cinder, Implementation Today and New Trends for TomorrowOpenStack Cinder, Implementation Today and New Trends for Tomorrow
OpenStack Cinder, Implementation Today and New Trends for TomorrowEd Balduf
 
Introducing Cloudian HyperStore 6.0
Introducing Cloudian HyperStore 6.0Introducing Cloudian HyperStore 6.0
Introducing Cloudian HyperStore 6.0Cloudian
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudyJohn Adams
 
What's new in JBoss ON 3.2
What's new in JBoss ON 3.2What's new in JBoss ON 3.2
What's new in JBoss ON 3.2Thomas Segismont
 
Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015
Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015
Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015Belmiro Moreira
 
Ippevent : openshift Introduction
Ippevent : openshift IntroductionIppevent : openshift Introduction
Ippevent : openshift Introductionkanedafromparis
 
Redis Streams - Fiverr Tech5 meetup
Redis Streams - Fiverr Tech5 meetupRedis Streams - Fiverr Tech5 meetup
Redis Streams - Fiverr Tech5 meetupItamar Haber
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWSMigrating enterprise workloads to AWS
Migrating enterprise workloads to AWSTom Laszewski
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache ApexApache Apex
 

Ähnlich wie Swift at Scale: The IBM SoftLayer Story (20)

Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using Elasticsearch
 
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
 
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analyticsLeveraging Cassandra for real-time multi-datacenter public cloud analytics
Leveraging Cassandra for real-time multi-datacenter public cloud analytics
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case Study
 
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
 
L6.sp17.pptx
L6.sp17.pptxL6.sp17.pptx
L6.sp17.pptx
 
Centralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stackCentralized log-management-with-elastic-stack
Centralized log-management-with-elastic-stack
 
Architectures, Frameworks and Infrastructure
Architectures, Frameworks and InfrastructureArchitectures, Frameworks and Infrastructure
Architectures, Frameworks and Infrastructure
 
OpenStack Cinder, Implementation Today and New Trends for Tomorrow
OpenStack Cinder, Implementation Today and New Trends for TomorrowOpenStack Cinder, Implementation Today and New Trends for Tomorrow
OpenStack Cinder, Implementation Today and New Trends for Tomorrow
 
Introducing Cloudian HyperStore 6.0
Introducing Cloudian HyperStore 6.0Introducing Cloudian HyperStore 6.0
Introducing Cloudian HyperStore 6.0
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
What's new in JBoss ON 3.2
What's new in JBoss ON 3.2What's new in JBoss ON 3.2
What's new in JBoss ON 3.2
 
Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015
Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015
Tips Tricks and Tactics with Cells and Scaling OpenStack - May, 2015
 
Ippevent : openshift Introduction
Ippevent : openshift IntroductionIppevent : openshift Introduction
Ippevent : openshift Introduction
 
Snowflake Datawarehouse Architecturing
Snowflake Datawarehouse ArchitecturingSnowflake Datawarehouse Architecturing
Snowflake Datawarehouse Architecturing
 
Redis Streams - Fiverr Tech5 meetup
Redis Streams - Fiverr Tech5 meetupRedis Streams - Fiverr Tech5 meetup
Redis Streams - Fiverr Tech5 meetup
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWSMigrating enterprise workloads to AWS
Migrating enterprise workloads to AWS
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache Apex
 

Kürzlich hochgeladen

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 

Kürzlich hochgeladen (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 

Swift at Scale: The IBM SoftLayer Story

  • 1. Swift at Scale:
 The IBM SoftLayer Story Brian Cline, Object Storage Development Lead OpenStack Summit • Ocata series 2016.10.25 Barcelona, Spain twitter/irc: @briancline
  • 2. Our History with Public Object Storage • 2012 — First three clusters go live (DAL, AMS, SNG) • 2014 — Dedicated development team • 2014 — Launch 11 clusters in new datacenters • 2015 — Launch 5 clusters in new DCs • 2015 — Product integrations with IBM Bluemix • 2016 — Launch 3 clusters in new DCs
 (and expand an existing cluster into multiple DCs)
  • 3. 2012: When things were [mostly] simpler… • 7-10 nodes in each cluster • Two node types • Proxy • Data - account, container, object services • Load balancer • FreeBSD with ZFS ⚠ Do not attempt. • No centralized logs • No log analysis tools
  • 4. 2016: Adjusted for scale (blood, sweat, tears, dreams, starlight…) • Up to hundreds of nodes per cluster • Three node types • Proxy • Meta - account and container services • Data - object services • Load balancer cluster • Debian Linux • Centralized and searchable logs • Analytics via Spark and Hadoop
  • 6.
  • 7. 22 Swift Clusters 24 Datacenters 16 Countries
  • 8. Tens of Billions of Objects 7 Million Containers Hundreds of thousands of Swift Accounts
  • 9. 90 PB of Capacity Thousands of Nodes 40,000+ Disks
  • 10. Tens of thousands requests per second GET HEAD PUT DELETE (with notable variability between clusters)
  • 12. Hardware we like • Supermicro 36-disk chassis • 12-16 physical cores
 (24-32 HT cores) • 128GB RAM for proxies • 256GB RAM for data nodes • 10Gbps NICs (separate API vs. storage/replication networks) • 3 - 4 TB disks • Controller card • 2 disks for OS (RAID1) • 1 disk for OS hotswap • 4 disks for SSD caching • 29 disks for data storage • Usually expand by ½-row or a
 full row at a time
  • 14. Our Stack — Software OS Debian Base Swift (duh) — sometimes with backports Authentication Swauth — some internal patches and enhancements Keystone (APIv3) — starting with Bluemix accounts Metadata Search Elasticsearch Monitoring & Logging collectd, Nagios, Capacity Dashboard Logstash, Kibana, Graphite, Grafana slogging Automation Chef, Jenkins, Fabric
  • 15. Our Stack — Custom Middlewares • CDN operations (purge, load, CNAMEs, TTL, compression, etc.) • CDN origin pull • Search indexer (on successful PUT/POST/DELETE) • Search query operations • Checkpoint (account enable/disable/etc. abilities for resellers) • Internal management (sysmeta read/write, proxy-level recon)
  • 17. Lessons Learned: Automation • Make automation a must-have, day-one deliverable • Never launch something new without test/deploy automation • Must work across all environments (dev, QA, UAT/staging, prod) • Automation needs tests and metrics, too — it is code! • Functional testing should be an automated part of every deploy • Remember your orchestration (knowledge of Swift zones)
  • 18. Lessons Learned: Monitoring • Scale test any monitoring/logging infrastructure you put into place • Very obvious stuff: • Space and IOPS, errors from SMART/XFS/kernel/controller, etc. • HTTP response code aggregates, latency aggregates by verb, etc. • Swift metrics: • If nothing else, async pendings • Replicator failures and partitions/sec rates • Replicator last completion timestamp vs. ring push timestamp
  • 19. Lessons Learned: Monitoring • Any middleware you create needs to emit ops metrics • New features benefit from emitting usage metrics • Don’t forget debug-level log messages • Automatic checks for precipitating conditions that lead to failures
 (not just for the error log lines that result from them afterwards)
  • 20. Lessons Learned: Rebalancing • Keep tabs on your rebalance times
 (and keep them small when possible) • Coordinate rebalances around node/cluster maintenance • Don’t let IOPS levels grow too high before expanding capacity • Customer IOPS vs. Replicator & Auditor IOPS — know your limits
  • 21. Lessons Learned: Swift • Use 256 byte inode sizes (or the smallest you can get away with) • Using swauth? Use an SSD storage policy for AUTH_.auth containers • Namespace any custom API additions (and be consistent) • When possible, ask community about new middleware thoughts • Upstream is important! Stay involved and give back when possible