SlideShare ist ein Scribd-Unternehmen logo
1 von 30
Brought to you by
Implementing Highly
Performant Distributed
Aggregates
Michał Jadwiszczak
Junior Software Engineer at ScyllaDB
The Problem
Why distribute aggregates?
How are aggregate queries executed currently?
■ Coordinator downloads all required data from other nodes
■ Then locally aggregates the whole table
Reasons to do the distribution
■ This is slow and very inefficient
■ The workload is accumulated on a single machine
Sequential aggregate
SELECT COUNT(*) FROM keyspace.table
p0..p1
p2..p3
p4..p5
p6..p7
Sequential aggregate
SELECT COUNT(*) FROM keyspace.table
Get data
p0..p1
p2..p3
p4..p5
p6..p7
Sequential aggregate
SELECT COUNT(*) FROM keyspace.table
p0..p1
p2..p3
p4..p5
p6..p7
Return(p2..p3)
Return(p4..p5)
Return(p6..p7)
COUNT( )
Sequential aggregate
SELECT COUNT(*) FROM keyspace.table
p0..p1
p2..p3
p4..p5
p6..p7
Return(p2..p3)
Return(p4..p5)
Return(p6..p7)
The Concept
Distributed aggregate
SELECT COUNT(*) FROM keyspace.table
p0..p1
p2..p3
p4..p5
p6..p7
Distributed aggregate
SELECT COUNT(*) FROM keyspace.table
SELECT
COUNT(p2..p3)
p0..p1
p4..p5
p6..p7
SELECT
COUNT(p4..p5)
SELECT
COUNT(p6..p7)
p2..p3
Distributed aggregate
SELECT COUNT(*) FROM keyspace.table
Return c1
Return c2
Return c3
COUNT( )
p2..p3
COUNT( )
p4..p5
COUNT( )
p6..p7
COUNT( )
p0..p1
REDUCE( )
Distributed aggregate
SELECT COUNT(*) FROM keyspace.table
c0
c1
c2
c3
Distributing the aggregates
The parallelization of aggregates works in a pretty straightforward way
■ Coordinator which receives the query turns into super-coordinator
■ Super-coordinator splits the query into subqueries for other nodes
● Each node receive query only for data it holds
● Each partition range exists in only one subquery to avoid data duplication
■ Each participating node executes its subquery and returns a partial result
● Partial result is a state of an aggregate without finalizing it
■ Super-coordinator receives the results, reduces them and finalizes
The Implementation
Distributing aggregates
Scylla not only distributed the work at the cluster level. The nodes also distribute the query to
their shards. Each shard again gets a query to only its data, so it doesn’t have to
communicate with other ones.
All native aggregates have their distributed implementation.
Scylla uses them by default without user need to think about it.
Native aggregates:
■ COUNT
■ SUM
■ AVG
■ MAX
■ MIN
User-Defined Aggregates
UDA consists of:
■ Required fields
● State function
● State type
■ Optional fields
● Final function
● Initial condition
CREATE AGGREGATE my_aggregate(int)
SFUNC state_f
STYPE tuple<int, bigint>
FINALFUNC final_f
INITCOND (0, 0);
Distributed User-Defined Aggregates
UDA consists of:
■ Required fields
● State function
● State type
■ Optional fields
● Reduce function
● Final function
● Initial condition
CREATE AGGREGATE my_aggregate(int)
SFUNC state_f
STYPE tuple<int, bigint>
REDUCEFUNC reduce_f
FINALFUNC final_f
INITCOND (0, 0);
Example of UDA
How many people follows Elon
Musk on Twitter?
Operations:
■ full scan table
■ iterate over each list
CREATE TABLE twitter_users (
id bigint,
name text,
follows list<bigint>,
PRIMARY KEY(id)
)
Example of UDA
CREATE FUNCTION count_if_follows_elon(acc bigint, follows list<bigint>)
RETURNS NULL ON NULL INPUT
RETURNS bigint
LANGUAGE lua
AS $$
for _, f in ipairs(follows) do
if f == 100 then
return acc + 1
end
end
return acc
$$
CREATE FUNCTION reduce_followers_count(acc1 bigint, acc2 bigint)
RETURNS NULL ON NULL INPUT
RETURNS bigint
LANGUAGE lua
AS $$
return acc1 + acc2
$$
Example of UDA
CREATE AGGREGATE count_elon_followers(list<bigint>)
SFUNC count_if_follows_elon
STYPE bigint
REDUCEFUNC reduce_followers_count
INITCOND 0
Reducing the results
■ Each coordinator reduces results from its shards and sends the result to
super-coordinator
■ Super-coordinator is responsible for reducing partial results from other
coordinators
■ Reduction is done be exciting proper function
■ All of the partial results are stored in memory (for now!)
What if the subquery fails?
Parallelization service provides retrying system in case of subquery failure.
■ When the subquery fails
● Super-coordinator acts as a its coordinator …
● and executes it
This way, the query is more likely to succeed, for instance by selecting another,
available replica.
Limitations
Main limitations are only queries
without WHERE and GROUP BY
clauses can be distributed.
Because select statement can’t be
serialized, we are representing it as
forward_request.
struct forward_request {
struct aggregation_info {
db::functions::function_name name;
std::vector<sstring> column_names;
};
query::read_command cmd;
dht::partition_range_vector pr;
db::consistency_level cl;
lowres_clock::time_point timeout;
std::vector<aggregation_info> aggregation_infos;
};
Benchmarks
Benchmarks
Specification:
■ 3 nodes cluster on AWS
■ Instances: i3.4xlarge (16 vCPUs, memory 112 GiB)
■ Each result is average of 10 queries
■ Queries executed one by one with “BYPASS CACHE” option
Data:
■ 50 millions rows
■ “Twitter accounts” table
Benchmarks - Native aggregates
Aggregation COUNT(*) SUM(id) MIN(id)
Sequential 408s 411s 407s
Distributed 23s 24s 23s
Benchmarks - User-Defined Aggregates
Aggregation count_elon_followers(follows)
Sequential 1450s
Distributed 78s
Benchmarks - CPU Usage
Sequential aggregate Distributed aggregate
Benchmarks - Transmitted packets
Sequential aggregate Distributed aggregate
Brought to you by
Michał Jadwiszczak
michal.jadwiszczak@scylladb.com

Weitere ähnliche Inhalte

Was ist angesagt?

MySQL_MariaDB-성능개선-202201.pptx
MySQL_MariaDB-성능개선-202201.pptxMySQL_MariaDB-성능개선-202201.pptx
MySQL_MariaDB-성능개선-202201.pptxNeoClova
 
Introduction to Cassandra Basics
Introduction to Cassandra BasicsIntroduction to Cassandra Basics
Introduction to Cassandra Basicsnickmbailey
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code GenerationDatabricks
 
Solr's missing plugin ecosystem
Solr's missing plugin ecosystemSolr's missing plugin ecosystem
Solr's missing plugin ecosystemCominvent AS
 
Percona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimizationPercona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimizationmysqlops
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Databricks
 
Scylla Summit 2022: ScyllaDB Embraces Wasm
Scylla Summit 2022: ScyllaDB Embraces WasmScylla Summit 2022: ScyllaDB Embraces Wasm
Scylla Summit 2022: ScyllaDB Embraces WasmScyllaDB
 
Modeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQLModeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQLScyllaDB
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performancePostgreSQL-Consulting
 
Blazing Performance with Flame Graphs
Blazing Performance with Flame GraphsBlazing Performance with Flame Graphs
Blazing Performance with Flame GraphsBrendan Gregg
 
ksqlDB로 실시간 데이터 변환 및 스트림 처리
ksqlDB로 실시간 데이터 변환 및 스트림 처리ksqlDB로 실시간 데이터 변환 및 스트림 처리
ksqlDB로 실시간 데이터 변환 및 스트림 처리confluent
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkEvan Chan
 
Lightweight Transactions in Scylla versus Apache Cassandra
Lightweight Transactions in Scylla versus Apache CassandraLightweight Transactions in Scylla versus Apache Cassandra
Lightweight Transactions in Scylla versus Apache CassandraScyllaDB
 
PostgreSQL and JDBC: striving for high performance
PostgreSQL and JDBC: striving for high performancePostgreSQL and JDBC: striving for high performance
PostgreSQL and JDBC: striving for high performanceVladimir Sitnikov
 
MongoDB WiredTiger Internals
MongoDB WiredTiger InternalsMongoDB WiredTiger Internals
MongoDB WiredTiger InternalsNorberto Leite
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisDvir Volk
 
Under the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureUnder the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureScyllaDB
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesDataStax Academy
 
Use ScyllaDB Alternator to Use Amazon DynamoDB API, Everywhere, Better, More ...
Use ScyllaDB Alternator to Use Amazon DynamoDB API, Everywhere, Better, More ...Use ScyllaDB Alternator to Use Amazon DynamoDB API, Everywhere, Better, More ...
Use ScyllaDB Alternator to Use Amazon DynamoDB API, Everywhere, Better, More ...ScyllaDB
 

Was ist angesagt? (20)

MySQL_MariaDB-성능개선-202201.pptx
MySQL_MariaDB-성능개선-202201.pptxMySQL_MariaDB-성능개선-202201.pptx
MySQL_MariaDB-성능개선-202201.pptx
 
Introduction to Cassandra Basics
Introduction to Cassandra BasicsIntroduction to Cassandra Basics
Introduction to Cassandra Basics
 
How to Design Indexes, Really
How to Design Indexes, ReallyHow to Design Indexes, Really
How to Design Indexes, Really
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code Generation
 
Solr's missing plugin ecosystem
Solr's missing plugin ecosystemSolr's missing plugin ecosystem
Solr's missing plugin ecosystem
 
Percona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimizationPercona Live 2012PPT: MySQL Query optimization
Percona Live 2012PPT: MySQL Query optimization
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
 
Scylla Summit 2022: ScyllaDB Embraces Wasm
Scylla Summit 2022: ScyllaDB Embraces WasmScylla Summit 2022: ScyllaDB Embraces Wasm
Scylla Summit 2022: ScyllaDB Embraces Wasm
 
Modeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQLModeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQL
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
 
Blazing Performance with Flame Graphs
Blazing Performance with Flame GraphsBlazing Performance with Flame Graphs
Blazing Performance with Flame Graphs
 
ksqlDB로 실시간 데이터 변환 및 스트림 처리
ksqlDB로 실시간 데이터 변환 및 스트림 처리ksqlDB로 실시간 데이터 변환 및 스트림 처리
ksqlDB로 실시간 데이터 변환 및 스트림 처리
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and Spark
 
Lightweight Transactions in Scylla versus Apache Cassandra
Lightweight Transactions in Scylla versus Apache CassandraLightweight Transactions in Scylla versus Apache Cassandra
Lightweight Transactions in Scylla versus Apache Cassandra
 
PostgreSQL and JDBC: striving for high performance
PostgreSQL and JDBC: striving for high performancePostgreSQL and JDBC: striving for high performance
PostgreSQL and JDBC: striving for high performance
 
MongoDB WiredTiger Internals
MongoDB WiredTiger InternalsMongoDB WiredTiger Internals
MongoDB WiredTiger Internals
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Under the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureUnder the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database Architecture
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Use ScyllaDB Alternator to Use Amazon DynamoDB API, Everywhere, Better, More ...
Use ScyllaDB Alternator to Use Amazon DynamoDB API, Everywhere, Better, More ...Use ScyllaDB Alternator to Use Amazon DynamoDB API, Everywhere, Better, More ...
Use ScyllaDB Alternator to Use Amazon DynamoDB API, Everywhere, Better, More ...
 

Ähnlich wie Implementing Highly Performant Distributed Aggregates

Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...Databricks
 
Troubleshooting MySQL from a MySQL Developer Perspective
Troubleshooting MySQL from a MySQL Developer PerspectiveTroubleshooting MySQL from a MySQL Developer Perspective
Troubleshooting MySQL from a MySQL Developer PerspectiveMarcelo Altmann
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLCommand Prompt., Inc
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLMark Wong
 
Big Data processing with Apache Spark
Big Data processing with Apache SparkBig Data processing with Apache Spark
Big Data processing with Apache SparkLucian Neghina
 
Java 8 - functional features
Java 8 - functional featuresJava 8 - functional features
Java 8 - functional featuresRafal Rybacki
 
Kapacitor - Real Time Data Processing Engine
Kapacitor - Real Time Data Processing EngineKapacitor - Real Time Data Processing Engine
Kapacitor - Real Time Data Processing EnginePrashant Vats
 
Save Coding Time with Proc SQL.ppt
Save Coding Time with Proc SQL.pptSave Coding Time with Proc SQL.ppt
Save Coding Time with Proc SQL.pptssuser660bb1
 
Building Efficient Multi-Threaded Filters for Faster SQL Queries
Building Efficient Multi-Threaded Filters for Faster SQL QueriesBuilding Efficient Multi-Threaded Filters for Faster SQL Queries
Building Efficient Multi-Threaded Filters for Faster SQL QueriesScyllaDB
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLJim Mlodgenski
 
Oracle postgre sql-mirgration-top-10-mistakes
Oracle postgre sql-mirgration-top-10-mistakesOracle postgre sql-mirgration-top-10-mistakes
Oracle postgre sql-mirgration-top-10-mistakesJim Mlodgenski
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLMark Wong
 
DAGScheduler - The Internals of Apache Spark.pdf
DAGScheduler - The Internals of Apache Spark.pdfDAGScheduler - The Internals of Apache Spark.pdf
DAGScheduler - The Internals of Apache Spark.pdfJoeKibangu
 
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowEmanuel Di Nardo
 
Precog & MongoDB User Group: Skyrocket Your Analytics
Precog & MongoDB User Group: Skyrocket Your Analytics Precog & MongoDB User Group: Skyrocket Your Analytics
Precog & MongoDB User Group: Skyrocket Your Analytics MongoDB
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Databricks
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLCommand Prompt., Inc
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLMark Wong
 

Ähnlich wie Implementing Highly Performant Distributed Aggregates (20)

Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
 
MySQL performance tuning
MySQL performance tuningMySQL performance tuning
MySQL performance tuning
 
Troubleshooting MySQL from a MySQL Developer Perspective
Troubleshooting MySQL from a MySQL Developer PerspectiveTroubleshooting MySQL from a MySQL Developer Perspective
Troubleshooting MySQL from a MySQL Developer Perspective
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQL
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQL
 
Big Data processing with Apache Spark
Big Data processing with Apache SparkBig Data processing with Apache Spark
Big Data processing with Apache Spark
 
Esd module2
Esd module2Esd module2
Esd module2
 
Java 8 - functional features
Java 8 - functional featuresJava 8 - functional features
Java 8 - functional features
 
Kapacitor - Real Time Data Processing Engine
Kapacitor - Real Time Data Processing EngineKapacitor - Real Time Data Processing Engine
Kapacitor - Real Time Data Processing Engine
 
Save Coding Time with Proc SQL.ppt
Save Coding Time with Proc SQL.pptSave Coding Time with Proc SQL.ppt
Save Coding Time with Proc SQL.ppt
 
Building Efficient Multi-Threaded Filters for Faster SQL Queries
Building Efficient Multi-Threaded Filters for Faster SQL QueriesBuilding Efficient Multi-Threaded Filters for Faster SQL Queries
Building Efficient Multi-Threaded Filters for Faster SQL Queries
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
 
Oracle postgre sql-mirgration-top-10-mistakes
Oracle postgre sql-mirgration-top-10-mistakesOracle postgre sql-mirgration-top-10-mistakes
Oracle postgre sql-mirgration-top-10-mistakes
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQL
 
DAGScheduler - The Internals of Apache Spark.pdf
DAGScheduler - The Internals of Apache Spark.pdfDAGScheduler - The Internals of Apache Spark.pdf
DAGScheduler - The Internals of Apache Spark.pdf
 
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflow
 
Precog & MongoDB User Group: Skyrocket Your Analytics
Precog & MongoDB User Group: Skyrocket Your Analytics Precog & MongoDB User Group: Skyrocket Your Analytics
Precog & MongoDB User Group: Skyrocket Your Analytics
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQL
 
pg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQLpg_proctab: Accessing System Stats in PostgreSQL
pg_proctab: Accessing System Stats in PostgreSQL
 

Mehr von ScyllaDB

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLScyllaDB
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...ScyllaDB
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityScyllaDB
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptxScyllaDB
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDBScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationScyllaDB
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsScyllaDB
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesScyllaDB
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsScyllaDB
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101ScyllaDB
 

Mehr von ScyllaDB (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual Workshop
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & Tradeoffs
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101
 

Kürzlich hochgeladen

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 

Kürzlich hochgeladen (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 

Implementing Highly Performant Distributed Aggregates

  • 1. Brought to you by Implementing Highly Performant Distributed Aggregates Michał Jadwiszczak Junior Software Engineer at ScyllaDB
  • 3. Why distribute aggregates? How are aggregate queries executed currently? ■ Coordinator downloads all required data from other nodes ■ Then locally aggregates the whole table Reasons to do the distribution ■ This is slow and very inefficient ■ The workload is accumulated on a single machine
  • 4. Sequential aggregate SELECT COUNT(*) FROM keyspace.table p0..p1 p2..p3 p4..p5 p6..p7
  • 5. Sequential aggregate SELECT COUNT(*) FROM keyspace.table Get data p0..p1 p2..p3 p4..p5 p6..p7
  • 6. Sequential aggregate SELECT COUNT(*) FROM keyspace.table p0..p1 p2..p3 p4..p5 p6..p7 Return(p2..p3) Return(p4..p5) Return(p6..p7)
  • 7. COUNT( ) Sequential aggregate SELECT COUNT(*) FROM keyspace.table p0..p1 p2..p3 p4..p5 p6..p7 Return(p2..p3) Return(p4..p5) Return(p6..p7)
  • 9. Distributed aggregate SELECT COUNT(*) FROM keyspace.table p0..p1 p2..p3 p4..p5 p6..p7
  • 10. Distributed aggregate SELECT COUNT(*) FROM keyspace.table SELECT COUNT(p2..p3) p0..p1 p4..p5 p6..p7 SELECT COUNT(p4..p5) SELECT COUNT(p6..p7) p2..p3
  • 11. Distributed aggregate SELECT COUNT(*) FROM keyspace.table Return c1 Return c2 Return c3 COUNT( ) p2..p3 COUNT( ) p4..p5 COUNT( ) p6..p7 COUNT( ) p0..p1
  • 12. REDUCE( ) Distributed aggregate SELECT COUNT(*) FROM keyspace.table c0 c1 c2 c3
  • 13. Distributing the aggregates The parallelization of aggregates works in a pretty straightforward way ■ Coordinator which receives the query turns into super-coordinator ■ Super-coordinator splits the query into subqueries for other nodes ● Each node receive query only for data it holds ● Each partition range exists in only one subquery to avoid data duplication ■ Each participating node executes its subquery and returns a partial result ● Partial result is a state of an aggregate without finalizing it ■ Super-coordinator receives the results, reduces them and finalizes
  • 15. Distributing aggregates Scylla not only distributed the work at the cluster level. The nodes also distribute the query to their shards. Each shard again gets a query to only its data, so it doesn’t have to communicate with other ones. All native aggregates have their distributed implementation. Scylla uses them by default without user need to think about it. Native aggregates: ■ COUNT ■ SUM ■ AVG ■ MAX ■ MIN
  • 16. User-Defined Aggregates UDA consists of: ■ Required fields ● State function ● State type ■ Optional fields ● Final function ● Initial condition CREATE AGGREGATE my_aggregate(int) SFUNC state_f STYPE tuple<int, bigint> FINALFUNC final_f INITCOND (0, 0);
  • 17. Distributed User-Defined Aggregates UDA consists of: ■ Required fields ● State function ● State type ■ Optional fields ● Reduce function ● Final function ● Initial condition CREATE AGGREGATE my_aggregate(int) SFUNC state_f STYPE tuple<int, bigint> REDUCEFUNC reduce_f FINALFUNC final_f INITCOND (0, 0);
  • 18. Example of UDA How many people follows Elon Musk on Twitter? Operations: ■ full scan table ■ iterate over each list CREATE TABLE twitter_users ( id bigint, name text, follows list<bigint>, PRIMARY KEY(id) )
  • 19. Example of UDA CREATE FUNCTION count_if_follows_elon(acc bigint, follows list<bigint>) RETURNS NULL ON NULL INPUT RETURNS bigint LANGUAGE lua AS $$ for _, f in ipairs(follows) do if f == 100 then return acc + 1 end end return acc $$ CREATE FUNCTION reduce_followers_count(acc1 bigint, acc2 bigint) RETURNS NULL ON NULL INPUT RETURNS bigint LANGUAGE lua AS $$ return acc1 + acc2 $$
  • 20. Example of UDA CREATE AGGREGATE count_elon_followers(list<bigint>) SFUNC count_if_follows_elon STYPE bigint REDUCEFUNC reduce_followers_count INITCOND 0
  • 21. Reducing the results ■ Each coordinator reduces results from its shards and sends the result to super-coordinator ■ Super-coordinator is responsible for reducing partial results from other coordinators ■ Reduction is done be exciting proper function ■ All of the partial results are stored in memory (for now!)
  • 22. What if the subquery fails? Parallelization service provides retrying system in case of subquery failure. ■ When the subquery fails ● Super-coordinator acts as a its coordinator … ● and executes it This way, the query is more likely to succeed, for instance by selecting another, available replica.
  • 23. Limitations Main limitations are only queries without WHERE and GROUP BY clauses can be distributed. Because select statement can’t be serialized, we are representing it as forward_request. struct forward_request { struct aggregation_info { db::functions::function_name name; std::vector<sstring> column_names; }; query::read_command cmd; dht::partition_range_vector pr; db::consistency_level cl; lowres_clock::time_point timeout; std::vector<aggregation_info> aggregation_infos; };
  • 25. Benchmarks Specification: ■ 3 nodes cluster on AWS ■ Instances: i3.4xlarge (16 vCPUs, memory 112 GiB) ■ Each result is average of 10 queries ■ Queries executed one by one with “BYPASS CACHE” option Data: ■ 50 millions rows ■ “Twitter accounts” table
  • 26. Benchmarks - Native aggregates Aggregation COUNT(*) SUM(id) MIN(id) Sequential 408s 411s 407s Distributed 23s 24s 23s
  • 27. Benchmarks - User-Defined Aggregates Aggregation count_elon_followers(follows) Sequential 1450s Distributed 78s
  • 28. Benchmarks - CPU Usage Sequential aggregate Distributed aggregate
  • 29. Benchmarks - Transmitted packets Sequential aggregate Distributed aggregate
  • 30. Brought to you by Michał Jadwiszczak michal.jadwiszczak@scylladb.com