Raft in Scylla
Konstantin Osipov, ScyllaDB

Konstantin Osipov
Team Lead, ScyllaDB
Kostja is one of the developers behind Scylla lightweight transactions. His current focus is Raft log replication and its applications to schema changes, topology changes, and tablets.
Recap: Scylla Summit 2019
▪ LWT: the first strongly consistent feature
▪ Available in 4.0
▪ Pay per use
2020-12-25: Tested with Jepsen!

UPDATE employees
SET join_date = '2018-05-19'
WHERE firstname = 'John' AND lastname = 'Doe'
IF join_date != null;

[applied]
False
LWT use of Paxos
▪ 3 network round trips per write
▪ Must read the old value before write

[Diagram: replicas R1, R2, and R3 exchange messages in four phases ("Can I propose a value?", check condition, accept new value, learn decision) before the decision is made.]
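The four phases in the diagram can be sketched as code. This is a minimal, illustrative model of a Paxos-backed conditional write, not Scylla's implementation: the `Replica` class and `lwt_update` function are invented names, and real LWT also handles ballot collisions, persistence, and failure recovery.

```python
# Illustrative model of LWT over Paxos: three round trips per write,
# and the old value must be read before the condition can be checked.

class Replica:
    def __init__(self):
        self.promised = 0        # highest ballot promised
        self.accepted = None     # (ballot, value) last accepted
        self.row = {"join_date": None}

    def prepare(self, ballot):
        """Round 1: 'Can I propose a value?' Promise to ignore older ballots."""
        if ballot > self.promised:
            self.promised = ballot
            # Piggyback the current row so the coordinator can check the condition.
            return True, self.row.copy()
        return False, None

    def accept(self, ballot, value):
        """Round 2: accept the new value unless a higher ballot was promised."""
        if ballot >= self.promised:
            self.accepted = (ballot, value)
            return True
        return False

    def learn(self, value):
        """Round 3: apply the decided value."""
        self.row.update(value)


def lwt_update(replicas, ballot, condition, new_value):
    # Round trip 1: prepare, and read the old row.
    votes = [r.prepare(ballot) for r in replicas]
    if sum(ok for ok, _ in votes) <= len(replicas) // 2:
        return False
    current = next(row for ok, row in votes if ok)
    if not condition(current):        # e.g. IF join_date != null
        return False                  # reported to the client as [applied] = False
    # Round trip 2: get the new value accepted by a majority.
    if sum(r.accept(ballot, new_value) for r in replicas) <= len(replicas) // 2:
        return False
    # Round trip 3: tell everyone to learn the decision.
    for r in replicas:
        r.learn(new_value)
    return True
```

Counting the message exchanges in `lwt_update` gives exactly the three round trips the slide mentions, one of them carrying the mandatory read.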
What is Raft anyway?
▪ Raft provides strong consistency efficiently
▪ Only the leader can accept writes

[Diagram: the leader sends "append entries" to two followers and applies the entry once the decision is made.]

… 1 network round trip per write on average
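The leader-driven flow above can be sketched as follows. This is a deliberately minimal model with invented `Node` and `Leader` classes: real Raft also tracks terms, rejects stale leaders, and resolves log conflicts.

```python
# Minimal sketch of Raft's single round trip: the leader appends an entry,
# replicates it, and commits once a majority (leader included) has stored it.

class Node:
    def __init__(self):
        self.log = []            # replicated log entries
        self.commit_index = -1   # index of the last committed entry

    def append_entries(self, entry):
        self.log.append(entry)
        return True              # ack back to the leader

class Leader(Node):
    def __init__(self, followers):
        super().__init__()
        self.followers = followers

    def replicate(self, entry):
        self.log.append(entry)                       # leader stores first
        acks = 1 + sum(f.append_entries(entry) for f in self.followers)
        if acks > (1 + len(self.followers)) // 2:    # majority, incl. leader
            self.commit_index = len(self.log) - 1    # decision made
            for f in self.followers:                 # followers apply too; in real
                f.commit_index = len(f.log) - 1      # Raft this rides the next message
            return True
        return False
```

One `replicate` call is one network round trip to the followers, which is why the average cost per write drops to a single round trip compared with Paxos's three.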
Raft log replication
▪ Each node has a copy of the Raft log
Scylla plans to use Raft for:
▪ Topology changes
▪ Schema changes
▪ Tablets
[Diagram: a table partitioned into three tablets: Tablet 1 (tokens 0-99) on Node A, Tablet 2 (100-199) on Node B, Tablet 3 (200-299) on Node C.]
Topology changes on Raft
Topology changes in Scylla
▪ Safe when one change is done at a time
▪ Rely on 30+ second timeouts for consistency
▪ Allowed on a significantly degraded cluster (split brain)
Topology changes using Raft
▪ Durable and linearizable
▪ Permit adding multiple nodes
▪ Permit background data rebalancing
▪ Require a majority of replicas alive to succeed
Schema changes using Raft
Schema changes in Scylla
▪ Each node owns a copy of the schema
▪ Schema change is first made locally
▪ Then eventually pushed through the cluster
▪ Last-timestamp-wins rule is used for reconciliation
Node A:
> CREATE TABLE e (a int);
OK (hash: a81e, ts: 1609420790)
Node B:
> CREATE TABLE e (a int, b int);
OK (hash: 2fa3, ts: 1609420792)
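The last-timestamp-wins rule from the example can be sketched as a tiny merge function. The dict layout is illustrative, not Scylla's actual schema representation.

```python
# Sketch of last-timestamp-wins schema reconciliation: when two nodes hold
# divergent definitions of the same table, the one with the newer timestamp
# wins cluster-wide. The (hash, ts) pairs mirror the example above.

def reconcile(local, remote):
    """Each argument is a schema version like
    {"hash": ..., "ts": ..., "columns": [...]}."""
    return remote if remote["ts"] > local["ts"] else local

node_a = {"hash": "a81e", "ts": 1609420790, "columns": ["a"]}
node_b = {"hash": "2fa3", "ts": 1609420792, "columns": ["a", "b"]}

winner = reconcile(node_a, node_b)
# Node B's definition carries the later timestamp, so both nodes
# converge on hash 2fa3, i.e. table e (a int, b int).
```

Note the rule is symmetric: whichever side initiates the merge, the newer definition survives, which is how eventual convergence is reached without coordination.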
INSERT/UPDATE when schemas differ
▪ Each data request carries a schema version
▪ Missing versions can be pulled from peers
Node A (a81e):
> INSERT INTO e (a) VALUES (1);
hash: 2fa3, row: (1, null)
Node B (2fa3):
> INSERT INTO e (a) VALUES (1);
hash: 2fa3, row: (1, null)
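The version pull described above can be sketched like this. `SchemaRegistry` and its `resolve` method are hypothetical names, not Scylla internals.

```python
# Sketch of schema-versioned data requests: each write is stamped with a
# schema version hash; a replica that doesn't know that version pulls the
# definition from the peer before applying the write.

class SchemaRegistry:
    def __init__(self, versions):
        self.versions = dict(versions)   # version hash -> column list

    def resolve(self, version_hash, peer):
        if version_hash not in self.versions:
            # Missing version: fetch the definition from the sending peer.
            self.versions[version_hash] = peer.versions[version_hash]
        return self.versions[version_hash]

node_a = SchemaRegistry({"a81e": ["a"]})
node_b = SchemaRegistry({"2fa3": ["a", "b"]})

# Node A receives a write stamped with version 2fa3 and pulls it from B,
# so it can store the row under the newer schema (hence row (1, null)).
columns = node_a.resolve("2fa3", peer=node_b)
```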
Schema changes using Raft
▪ Each node continues to store a copy of the schema
▪ A change is first persisted in a global Raft log
▪ On success, it’s applied on replicas
▪ Schema changes are now linearizable and consistent
▪ Nodes catch up with schema history during boot
Tablets
Token-based partitioning
▪ Partition key is hashed to an integer (token)
▪ Nodes own ranges of tokens
▪ Provides even distribution of data and traffic
▪ Hotspots if partitions have many clustering rows
[Diagram: partition keys (pk) a, b, c, …, u laid out by token along the ring, each with a different clustering-key (ck) footprint (1, 2, 3, …, 21, 3, 11, 1 rows); uneven footprints create hotspots.]
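As a rough sketch of the scheme above, using MD5 over a toy 300-token space purely for illustration (Scylla actually uses Murmur3 over a 64-bit token range):

```python
# Token-based partitioning: hash the partition key to a token, then map the
# token to the node owning that range. Hash choice and ranges are illustrative.

import hashlib

def token(partition_key, space=300):
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % space

def owner(tok, ranges):
    """ranges: list of (start, end_inclusive, node) tuples."""
    for start, end, node in ranges:
        if start <= tok <= end:
            return node
    raise ValueError("token outside all ranges")

ranges = [(0, 99, "Node A"), (100, 199, "Node B"), (200, 299, "Node C")]
node = owner(token("john_doe"), ranges)   # always the same node for this key
```

Because the hash spreads keys uniformly, data and traffic distribute evenly across nodes; but the scheme says nothing about how many clustering rows live under one partition key, which is where the hotspots come from.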
Tablet partitioning
▪ Tablet is a new kind of partition
▪ It stores a primary key range, not a single partition key
▪ Tablet ranges are subject to dynamic load balancing
▪ Size of each tablet is configurable (e.g. 64MB)
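The range-based lookup a tablet map needs can be sketched with a binary search over sorted range boundaries. `TabletMap` is a hypothetical structure, not Scylla's.

```python
# Tablet partitioning sketch: each tablet owns a contiguous token range, so a
# sorted boundary list makes routing a binary search. Splitting a tablet when
# it exceeds a size target (e.g. 64MB) is what enables dynamic rebalancing.

import bisect

class TabletMap:
    def __init__(self, boundaries, tablets):
        # boundaries[i] is the first token of tablets[i+1];
        # e.g. [100, 200] splits tokens into [0,100), [100,200), [200,...).
        self.boundaries = boundaries
        self.tablets = tablets

    def tablet_for(self, token):
        return self.tablets[bisect.bisect_right(self.boundaries, token)]

tmap = TabletMap([100, 200],
                 ["Tablet 1 @ Node A", "Tablet 2 @ Node B", "Tablet 3 @ Node C"])
```

Rebalancing then amounts to moving a boundary or reassigning one tablet's replicas, without rehashing every key the way a change to the token ring would.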
Raft for Tablets
▪ Manageable number of Raft groups (~100,000)
▪ No client-side timestamps
▪ Provides isolation for ALL queries
▪ Writes do not require a read
▪ No need to repair
▪ Strong consistency of materialized views
Strong consistency by default
Raft in Scylla: summary
▪ Raft extended to efficiently support many groups
▪ Raft and Tablet partitioning = fast strong consistency
▪ Linearizable, more powerful schema and topology changes
▪ High Availability and partition tolerance of Cassandra are
mostly unaffected
Thank You
Konstantin Osipov
@kostja_osipov
kostja@scylladb.com
Experience Scylla for Yourself
Download Scylla Open Source: scylladb.com/download
Talk to an expert: scylladb.com/consultation
Take a test drive: scylladb.com/test-drive
Eventually, Scylla Chooses Consistency
Editor's notes
  1. Hi, This talk is about Raft in Scylla - our effort to improve a lot of existing Cassandra functionality and add new strongly consistent features.
  2. I’m Konstantin Osipov, I live in Moscow and work on open source databases. In Scylla I’ve been involved with implementation of lightweight transactions.
  3. Before discussing Raft, let’s recap the items we delivered recently. Back at Scylla Summit 2019 we announced support for Cassandra lightweight transactions. Lightweight transactions allow all clients to agree on the state of the database before making a change to it. Prior to that, Scylla lacked any strongly consistent features. We made a considerable effort testing LWT, and just recently completed industry-standard Jepsen testing for it.
  4. In Scylla, LWT is based on the Paxos consensus algorithm. Paxos is a leaderless protocol in which each participant stores little state; this was an advantage because, to be compatible with Cassandra, Scylla needed each partition to be independently available. Paxos runs 3 rounds of network messages to commit each transaction. This is 1 round trip less than Cassandra, but still more than necessary in the optimal case. An important property of LWT is that it works over existing tables and alongside eventually consistent operations. If LWT is not used, the overhead on the rest of the operations is zero. This is the payoff for a fairly high implementation cost. We mentioned at the 2019 Summit that Scylla is committed to providing an optimized implementation of strongly consistent reads and writes based on Raft. In this talk I will discuss our progress with Raft and what else we’re going to improve using it.
  5. So what is Raft anyway? It is a leader-based log replication protocol. A very crude explanation of what Raft does: it elects a leader once, and then the leader is responsible for making all the decisions about the state of the database. This helps avoid extra communication between replicas during individual reads and writes. Each node keeps track of who the current leader is and forwards requests to the leader. Scylla clients are unaffected, except that the leader now does more work than the replicas, so the load distribution may be less even. This means Scylla will need to run multiple Raft instances side by side.
  6. Raft is built around the notion of a replicated log. When the leader receives a request, it first stores an entry for it in its own log. Then it pushes the entry to the replicas’ copies of the log. Once a majority of replicas store the entry, the leader applies it and instructs the replicas to do the same. In the event of leader failure, the replica with the most up-to-date log becomes the new leader.
  7. Raft defines not only how a group makes a decision, but also the protocol for adding new members to the group and removing existing ones. This lays a solid foundation for Scylla topology changes: assuming there is a Raft group spanning all of the nodes, they translate naturally to Raft configuration changes and no longer need a proprietary protocol.
  8. Schema changes translate to simply storing a command in the global Raft log and then applying the change on each node that has a copy of the log.
  9. Because of the additional state (the current leader) stored at each peer, it’s not as straightforward to apply Raft to Scylla data manipulation statements. Maintaining a separate leader for each partition would be too much overhead, considering that individual partition updates may be rare. This is why Scylla, alongside Raft, is working on a new partitioner that reduces the total number of partitions (while still keeping it high enough to guarantee even distribution of data and work) and allows balancing data between partitions more flexibly. For each such partition, called a tablet, Scylla will run its own instance of the Raft algorithm. In the rest of the talk I will discuss these 3 applications of Raft in more detail.
  10. Let’s begin with the subject of topology changes and discuss how Raft could be used to improve it.
  11. Presently, topology changes in Scylla are eventually consistent. Let’s use node addition as an example. A node wishing to join the cluster advertises itself to the rest of the members through Gossip. For those of you not familiar with the way Gossip works, it’s a great protocol for distributing infrequently changing information at low cost. It’s very commonly used for failure detection: healthy clusters enjoy the low network overhead induced by a failure detector, and the state of a faulty node spreads across the cluster reasonably quickly; a few to several seconds would be a typical interval. Knowing that Gossip is not too fast, the joining node waits (by default) 30s to let the news spread. Nodes begin forwarding relevant updates to the new node once they are aware of it. With updates coming in, the node can start data rebalancing. Node removal or decommission works similarly, except the node spreading the rumour (aka the change coordinator) is not necessarily the same node the rumour is about (just as we are used to in real life). This poses some challenges. The actions performed by the change coordinator are unilateral, and assume the operator avoids starting a conflicting change concurrently. The joining node will proceed after the 30s interval even if one of the nodes in the cluster is down and did not get the news about the new member. Such nodes, once back online, will continue serving queries using the old topology until Gossip messages reach them. A repair will then be necessary to restore the configured data replication factor. If a joining node dies mid-way, the data ranges it added will remain in the cluster topology, and the operator will need to clean them up manually before proceeding with the next change. Since the procedure relies on a fairly slow vehicle to spread the information, it’s hard to split into multiple steps.
When we at Scylla discuss how to add multiple nodes concurrently, we consider breaking a single topology change action into smaller, persistent and resumable steps, such as first adding an empty node, then assigning it some data ranges, then actually moving these ranges. Having to wait 30s for each step to settle in through Gossip is not very practical.
  12. Raft handles these challenges by making topology changes (called configuration changes there) part of the protocol core. This part of the Raft protocol is also widely adopted and has gone under extensive scrutiny, so it should naturally be preferred to Scylla’s proprietary solution inherited from Cassandra. Raft treats a topology change much like a standard strongly consistent read or write: the change is done by appending two records to the distributed Raft log. The first record introduces the new topology to the cluster. From the moment it is appended to the leader’s log, and until it is replicated to a majority of nodes, the cluster takes the new topology (e.g. a new node) into account for all reads and writes, but doesn’t abandon the old topology yet - it is consulted as well. Once a majority of replicas have received the new topology, the leader appends the second record, which tells replicas it is now safe to discard the old topology and fully switch to the new one. This two-step procedure ensures that no two parts of the cluster ever operate in two disjoint configurations: in the worst case, some nodes may still be using the joint topology and the old one, or the joint topology and the new one, both of which are safe - but never the old one alone against the new one alone. With Raft, a Scylla topology change can be split into multiple steps: first, add the new node to the global Raft group configuration, using the procedure just described; then, commit a record with the new node’s tokens to token_metadata, linearized with all other topology updates; then, stream ranges to the added node, updating the state of each range as it is streamed. Since all the steps are linearized through the Raft log, it becomes possible to permit concurrent topology changes, as long as they don’t conflict.
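The quorum rule behind the two-record (joint consensus) procedure can be shown in a few lines. This is a minimal sketch of the idea, not Scylla’s Raft library - the class and method names are made up for illustration:

```python
# Sketch of Raft joint consensus: a configuration change commits two log
# records - first the joint configuration (old + new voter sets), then the
# new configuration alone. During the joint phase a decision needs a
# majority in BOTH sets, so the old and new configs can never decide apart.

class RaftConfig:
    def __init__(self, voters):
        self.old = None              # previous voter set, during a joint phase
        self.voters = set(voters)

    def enter_joint(self, new_voters):
        """Record 1: start requiring quorum in both configurations."""
        self.old = set(self.voters)
        self.voters = set(new_voters)

    def leave_joint(self):
        """Record 2: the joint entry is committed, drop the old config."""
        self.old = None

    def has_quorum(self, alive):
        def majority(s):
            return len(s & alive) * 2 > len(s)
        if self.old is not None:
            return majority(self.old) and majority(self.voters)
        return majority(self.voters)

cfg = RaftConfig({"A", "B", "C"})
cfg.enter_joint({"A", "B", "C", "D"})   # add node D
assert cfg.has_quorum({"A", "B", "C"})  # majority of both {A,B,C} and {A,B,C,D}
assert not cfg.has_quorum({"A", "B"})   # enough for the old config alone, not joint
cfg.leave_joint()
assert cfg.has_quorum({"A", "C", "D"})  # 3 of 4 in the new configuration
```

The key property is visible in the asserts: while the change is in flight, no subset of nodes can reach a decision using only the old or only the new membership.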
The only conceivable downside is that if a majority of the cluster nodes are down, it may not be possible to perform topology changes at all. Scylla will need to provide an emergency-brake instrument to recover clusters that are this significantly degraded. One possible solution is directly editing topology information on the remaining nodes, to let them continue in the state that remains.
  13. Schema changes are operations such as creating and dropping keyspaces, tables, user-defined types or functions. If implemented on top of Raft, they also benefit from linearizability.
  14. Currently, schema changes in Scylla are eventually consistent. Each Scylla node has its own copy of the schema. A request to change the schema is validated against the local copy and then applied locally; a new schema object can be used on that node immediately, before any other cluster node knows about it. There is no coordination between changes at different nodes, and any node is free to propose a change. The change is eventually propagated to the rest of the cluster, and a last-timestamp-wins rule is used to resolve conflicts if two changes to the same object happened concurrently.
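The last-timestamp-wins reconciliation can be sketched as a per-object merge. This is a toy model of the rule, not Scylla’s schema-merge code - the schema map shape `{object: (timestamp, definition)}` is invented for the example:

```python
# Illustrative last-timestamp-wins merge of two schema copies.
# For each schema object, the definition with the higher timestamp wins,
# so any two nodes converge regardless of the order they exchange updates.

def merge_schema(local, remote):
    merged = dict(local)
    for obj, (ts, defn) in remote.items():
        if obj not in merged or ts > merged[obj][0]:
            merged[obj] = (ts, defn)
    return merged

node_a = {"e": (1609420790, "CREATE TABLE e (a int)")}
node_b = {"e": (1609420795, "CREATE TABLE e (a int, b int)")}

# Merge is symmetric: both nodes end up with the newer definition.
assert merge_schema(node_a, node_b) == merge_schema(node_b, node_a)
assert merge_schema(node_a, node_b)["e"][1] == "CREATE TABLE e (a int, b int)"
```

Note what the rule does *not* give you: it picks a winner, but it cannot tell whether the losing change was semantically compatible with the winner - which is exactly the gap the next slides address.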
  15. Data manipulation is aware of possible schema inconsistency: each request carries a schema version with it. Scylla is able to execute requests against a divergent schema history, fetching the particular schema version needed to execute a request. This guarantees that schema changes are fully available in the presence of network failures. It has some downsides as well: it is possible to submit changes that conflict, e.g. define a table based on a UDT while concurrently dropping that UDT, and new features, such as triggers, stored functions and UDFs, aggravate the consistency problem.
  16. After switching schema changes to Raft, any node will still be able to propose a change. However, the change will now be forwarded to the Raft leader, where it will be validated against the latest version of the schema. Then the leader will persist it in a global Raft log, replicated to all nodes of the cluster. Once a majority of replicas confirm persisting their copy of the log, the change will be applied on all replicas. With this approach, all schema changes will form a linear history and divergent or conflicting changes will be impossible. It should open the way to complex but safe dependencies between schema objects, i.e. triggers, constraints or functional indexes. A replica which was down while the cluster has been performing schema changes will catch up with them on boot, by streaming the missed history of changes from the leader. There is also a downside: it will no longer be possible to perform a schema change if the majority of the cluster is unreachable or down. It is still possible that a node gets a request for a schema version it did not see yet, and will need to fetch it. For older schemas we will maintain a version history; for newer schemas, we will need to make sure that the history can be fetched from any node, not just the leader. https://docs.google.com/presentation/d/1ZazssA802_bUHcJKy7yPUbiVby8acFxbebf-VbmXRDk/edit#slide=id.ga3bc8bcbea_0_131
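A sketch of why funneling changes through one leader makes conflicts impossible: validation always runs against the latest committed schema, so a change that contradicts the current state is rejected before it ever enters the log. The `SchemaLeader` class below is invented for illustration (the real log is replicated; here it is just a list):

```python
# Hedged sketch: schema changes validated by a single leader against the
# latest schema, then appended to a log. The log (here a plain list) stands
# in for the replicated Raft log; a record would only count as applied once
# a majority of replicas persisted it.

class SchemaLeader:
    def __init__(self):
        self.log = []        # ordered, linear history of schema changes
        self.tables = set()  # current schema state

    def propose(self, op, table):
        # Validation against the *latest* schema: divergent histories
        # (e.g. two concurrent CREATEs of the same table) cannot happen.
        if op == "create" and table in self.tables:
            raise ValueError(f"table {table} already exists")
        if op == "drop" and table not in self.tables:
            raise ValueError(f"no such table {table}")
        self.log.append((op, table))
        if op == "create":
            self.tables.add(table)
        else:
            self.tables.discard(table)

leader = SchemaLeader()
leader.propose("create", "e")
try:
    leader.propose("create", "e")   # conflicting change is rejected up front
except ValueError:
    pass
leader.propose("drop", "e")
assert leader.log == [("create", "e"), ("drop", "e")]
```

Contrast this with the last-timestamp-wins world, where both conflicting CREATEs would be accepted locally and reconciled only later.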
  17. Finally, the ultimate feature enabled by Raft is fast and efficient yet strongly consistent tables. “Tablet” is a term for a database unit of data distribution and load balancing, first introduced in Google’s Bigtable paper from 2006. Let’s see how they work.
  18. Today, Scylla’s partitioning strategy is not pluggable. Compare this with replication strategy: you can change how many replicas a keyspace has and where these replicas are located, and you can use QUORUM/LOCAL_QUORUM and SERIAL/LOCAL_SERIAL to work efficiently in a cross-DC setup. The partitioner is not like that: all you can choose is what makes up the partition key. The key is always hashed to a token, and the token is mapped to a replica set and shard. Thanks to hashing and the use of vnodes (token ranges), data is evenly distributed across the cluster: most write and read scenarios produce even load on all nodes, and hotspots, while possible, are unlikely. Unfortunately, one size still cannot fit all. Using the same partitioner for all tables can be rather a hindrance if there are a lot of small tables which are frequently scanned; frequent range scans also require an extra step of merging streams produced by multiple nodes; and certain partitions tend to get hot no matter how good the choice of the partition key is. https://docs.google.com/document/d/1flYRliD-VXNlrdPR2IT_rswXRW_55CySlXnEcw7qqtY/edit#heading=h.ly4c9p67vgne https://docs.google.com/presentation/d/1Pm1hIGza4RcSEzlV_bRSYv9AmUyAGRv6cuNmVuEmt9g/edit#slide=id.g51b14e1223_0_432
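The key-to-token-to-replica path can be modeled in a few lines. This is a simplified ring model, not Scylla’s actual Murmur3 partitioner - MD5 stands in as an arbitrary uniform hash, and the ring layout is invented for the demo:

```python
# Toy model of token-based partitioning: hash a partition key to a token,
# then find the owning vnode by walking the sorted token ring clockwise.

import bisect
import hashlib

def token(key):
    # Any uniform hash works for the illustration (Scylla uses Murmur3).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 2**64

class Ring:
    def __init__(self, vnodes):
        # vnodes: (token, node) pairs; each node typically owns many vnodes.
        self.vnodes = sorted(vnodes)
        self.tokens = [t for t, _ in self.vnodes]

    def replica(self, key):
        # A key belongs to the first vnode at or after its token, wrapping.
        i = bisect.bisect_left(self.tokens, token(key)) % len(self.vnodes)
        return self.vnodes[i][1]

ring = Ring([(2**62, "A"), (2**63, "B"), (3 * 2**62, "C")])
# Hashing spreads even sequential keys over all nodes of the cluster.
owners = {ring.replica(f"user-{i}") for i in range(1000)}
assert owners == {"A", "B", "C"}
```

This even spread is exactly what makes hash partitioning a poor fit for small, frequently range-scanned tables: a scan must touch every node and merge the streams.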
  19. So in Scylla, we would like to make the partitioning strategy a user choice, like the replication factor is today. If a user chooses tablet partitioning, Scylla will store small tables using just a few tablets. Large tablets will be automatically split, and small tablets coalesced, if necessary. Other databases that support range-based partitioners include MongoDB, Couchbase, Cockroach…
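A range-partitioned table with automatic splitting can be sketched as follows. This illustrates the general tablet idea under invented names and a toy split threshold, not Scylla’s planned implementation:

```python
# Hypothetical sketch of range-based tablets: each tablet owns a contiguous
# key range, a small table fits in one tablet, and a tablet that grows past
# a threshold is split at its median key.

class Tablet:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi    # half-open key range [lo, hi)
        self.rows = {}

class TabletTable:
    MAX_ROWS = 2                     # tiny split threshold for the demo

    def __init__(self):
        self.tablets = [Tablet(0, 10**9)]   # a small table: one tablet

    def find(self, key):
        for t in self.tablets:
            if t.lo <= key < t.hi:
                return t

    def insert(self, key, value):
        t = self.find(key)
        t.rows[key] = value
        if len(t.rows) > self.MAX_ROWS:
            self.split(t)

    def split(self, t):
        mid = sorted(t.rows)[len(t.rows) // 2]
        left, right = Tablet(t.lo, mid), Tablet(mid, t.hi)
        for k, v in t.rows.items():
            (left if k < mid else right).rows[k] = v
        i = self.tablets.index(t)
        self.tablets[i:i + 1] = [left, right]

table = TabletTable()
for k in (5, 100, 42, 7):
    table.insert(k, "v")
assert len(table.tablets) == 2       # the overgrown tablet was split once
```

Because a tablet is a contiguous range, a range scan over a small table touches one tablet on one node - no cross-node stream merging.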
  20. Tables partitioned using tablets will work efficiently with Raft. When Raft is used, a change is stored in the log before it’s applied to the table, so no repair in the Cassandra sense is needed - we may still want to “repair” (i.e. sync up) the logs between replicas, but the base tables will stay consistent at all times. This addresses the problem of consistency of derived data, which has been open in Cassandra for a long time (many of you who track Cassandra development are familiar with the materialized view consistency issues).
  21. Original Raft knows nothing about partitions, tokens or shards. It is an abstract algorithm describing replication of an abstract state machine. In Scylla, we have more than one state machine (schema information, topology information, and then each tablet and its replica set is an independent Raft instance), so we want to run many copies of the Raft algorithm simultaneously. This poses new challenges: how do we spawn new copies consistently? How much state will each instance take? Can we share the overhead of the algorithm, such as the cost of distributed failure detection, between Raft instances? Where do we store the Raft replication log, and can we avoid the overhead of double logging (Raft log plus commit log)? Can we make these decisions configurable, depending on the desired balance of performance and ease of use? We have already addressed many of these issues in Scylla Raft - a reusable library which supports joint-consensus configuration changes, and pluggable state machines, logging and failure detection. We’re working on rebuilding Scylla schema changes on top of it. The first user-visible impact of the effort is expected in the upcoming year. Stay tuned.
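One of the sharing opportunities mentioned above - a single failure detector serving many Raft groups - can be sketched briefly. All names here are hypothetical; the point is only that per-group heartbeats are replaced by one cluster-wide liveness source:

```python
# Sketch of "many Raft groups, one failure detector": each group consults a
# shared, cluster-wide detector instead of running its own heartbeat stream,
# so the heartbeat cost does not multiply with the number of tablets.

class FailureDetector:
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def is_alive(self, node, now):
        return now - self.last_seen.get(node, float("-inf")) < self.timeout

class RaftGroup:
    def __init__(self, name, members, detector):
        self.name, self.members = name, set(members)
        self.detector = detector     # shared across all groups

    def alive_members(self, now):
        return {n for n in self.members if self.detector.is_alive(n, now)}

fd = FailureDetector(timeout=1.0)
for node in ("A", "B", "C"):
    fd.heartbeat(node, now=0.0)      # one heartbeat stream per node

schema = RaftGroup("schema", {"A", "B", "C"}, fd)
tablet1 = RaftGroup("tablet-1", {"A", "B"}, fd)
assert schema.alive_members(now=0.5) == {"A", "B", "C"}
assert tablet1.alive_members(now=2.0) == set()   # timed out for every group at once
```

A thousand tablet groups on the same three nodes would all read liveness from `fd` rather than exchanging a thousand sets of heartbeats.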