Cassandra 2.1 boot camp, exercise

Joshua McKenzie

Cassandra Summit Boot Camp, 2014 Coding Exercise

Technology

Assignment
Build a compaction strategy that compacts the most overlapping sstables together

0. Setting up your IDE
https://wiki.apache.org/cassandra/RunningCassandraInIDEA
http://wiki.apache.org/cassandra/RunningCassandraInEclipse

1. Implement a no-op compaction strategy
● class Xyz extends AbstractCompactionStrategy {..}
● Implement the abstract methods
○ getNextBackgroundTask
■ Return a CompactionTask containing the sstables you want to compact, or null if there are none
○ getMaximalTask
■ "Major compaction": should compact all sstables
○ ...
● ALTER TABLE foo WITH compaction = { 'class': 'Xyz' };
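
As a rough model of step 1, the sketch below shows what "no-op" means for the two methods named above. It does not compile against Cassandra: Task, Strategy and NoOp are hypothetical stand-ins, and the method signatures are simplified assumptions, not the real AbstractCompactionStrategy API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class NoOpStrategySketch {
    /** Hypothetical stand-in for a compaction task: the names of the sstables to compact. */
    static final class Task {
        final List<String> sstables;

        Task(List<String> sstables) {
            this.sstables = Collections.unmodifiableList(new ArrayList<>(sstables));
        }
    }

    /** Simplified model of the two methods named in the exercise (not Cassandra's real signatures). */
    interface Strategy {
        Task getNextBackgroundTask(List<String> liveSSTables);

        Task getMaximalTask(List<String> liveSSTables);
    }

    /** No-op strategy: never schedules background work, but still supports a major compaction. */
    static final class NoOp implements Strategy {
        @Override
        public Task getNextBackgroundTask(List<String> liveSSTables) {
            return null; // null means "nothing to compact right now"
        }

        @Override
        public Task getMaximalTask(List<String> liveSSTables) {
            // A major compaction covers every live sstable; null if there are none.
            return liveSSTables.isEmpty() ? null : new Task(liveSSTables);
        }
    }

    public static void main(String[] args) {
        Strategy strategy = new NoOp();
        List<String> live = Arrays.asList("a", "b", "c");
        System.out.println(strategy.getNextBackgroundTask(live) == null);
        System.out.println(strategy.getMaximalTask(live).sstables);
    }
}
```

In the real exercise the class would extend AbstractCompactionStrategy and implement whatever abstract methods that class declares in your checkout.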

2. Make it compact the most overlapping sstables
● We should reduce disk usage the most if we compact the most overlapping sstables together
● CompactionMetadata has an ICardinality
○ HyperLogLog: estimates the number of unique items in a stream
○ Currently used to estimate how large a bloom filter we need to allocate during compaction
○ https://github.com/addthis/stream-lib
○ SSTableReader#getApproximateKeyCount
○ ICardinality#merge: merges several of these estimators to find the key count of the union of the sstables
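
The union count gives the overlap by inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|. The sketch below illustrates this with exact java.util.Set values standing in for the cardinality estimators (an assumption made so the example is self-contained; a HyperLogLog merge would only give approximate counts). The class and method names are hypothetical, and the pair search is deliberately the naive all-pairs version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class OverlapSketch {
    /** Number of keys shared by a and b, computed as |A| + |B| - |A ∪ B|. */
    static long overlap(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a); // stand-in for merging two cardinality estimators
        union.addAll(b);
        return (long) a.size() + b.size() - union.size();
    }

    /** Returns indices {i, j} of the pair sharing the most keys, or null if there are fewer than two sstables. */
    static int[] mostOverlappingPair(List<Set<String>> sstables) {
        int[] best = null;
        long bestOverlap = -1;
        for (int i = 0; i < sstables.size(); i++) {
            for (int j = i + 1; j < sstables.size(); j++) {
                long shared = overlap(sstables.get(i), sstables.get(j));
                if (shared > bestOverlap) {
                    bestOverlap = shared;
                    best = new int[] {i, j};
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Set<String>> sstables = new ArrayList<>();
        sstables.add(new HashSet<>(Arrays.asList("k1", "k2", "k3")));
        sstables.add(new HashSet<>(Arrays.asList("k2", "k3", "k4")));
        sstables.add(new HashSet<>(Arrays.asList("k9")));
        System.out.println(Arrays.toString(mostOverlappingPair(sstables)));
    }
}
```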

3. Add support for worthDroppingTombstones
● Single-sstable compaction to drop tombstones
● Tries to figure out how much the sstables overlap, then estimates how many tombstones lie outside that overlap
● Currently we check for range overlap
● Could probably be improved if we used ICardinality
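
One way to picture the cardinality-based variant: the keys that only the candidate sstable holds number |candidate ∪ others| - |others|, and only tombstones on those keys can be dropped without consulting other sstables. The sketch below is an illustration under assumptions, not Cassandra's implementation: exact sets stand in for the estimators, and droppableRatio, threshold and both method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class TombstoneSketch {
    /** Fraction of the candidate's keys that appear in no other sstable. */
    static double fractionOutsideOverlap(Set<String> candidate, Set<String> others) {
        if (candidate.isEmpty()) {
            return 0.0;
        }
        Set<String> union = new HashSet<>(others); // stand-in for merging cardinality estimators
        union.addAll(candidate);
        long unique = union.size() - others.size(); // keys only the candidate holds
        return (double) unique / candidate.size();
    }

    /**
     * Scales the sstable's droppable-tombstone ratio by the non-overlapping fraction and
     * compares the result with a caller-supplied threshold (both values are assumptions here).
     */
    static boolean worthDroppingTombstones(double droppableRatio, Set<String> candidate,
                                           Set<String> others, double threshold) {
        return droppableRatio * fractionOutsideOverlap(candidate, others) > threshold;
    }

    public static void main(String[] args) {
        Set<String> candidate = new HashSet<>(Arrays.asList("a", "b", "c", "d"));
        Set<String> others = new HashSet<>(Arrays.asList("c", "d", "e"));
        System.out.println(fractionOutsideOverlap(candidate, others));
        System.out.println(worthDroppingTombstones(0.6, candidate, others, 0.2));
    }
}
```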

4. Add heuristics to avoid n² CompactionMetadata comparisons
Algorithms!
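
The slides leave the choice of heuristic open. One possibility, shown here only as an illustration, is to prune by key range first: sort the sstables by their first key and compare cardinalities only for pairs whose ranges intersect. Range and candidatePairs are hypothetical names, and long values stand in for tokens. The worst case is still quadratic when every range overlaps every other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class CandidatePairsSketch {
    /** Hypothetical stand-in for an sstable's key range, inclusive on both ends. */
    static final class Range {
        final int id;
        final long first;
        final long last;

        Range(int id, long first, long last) {
            this.id = id;
            this.first = first;
            this.last = last;
        }
    }

    /** Returns id pairs whose ranges intersect, found by a sweep over ranges sorted by start. */
    static List<int[]> candidatePairs(List<Range> ranges) {
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong((Range r) -> r.first));
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            Range a = sorted.get(i);
            // Later ranges start at or after a.first, so stop once one starts past a.last.
            for (int j = i + 1; j < sorted.size() && sorted.get(j).first <= a.last; j++) {
                pairs.add(new int[] {a.id, sorted.get(j).id});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Range> ranges = Arrays.asList(
                new Range(0, 0, 10), new Range(1, 5, 15),
                new Range(2, 20, 30), new Range(3, 25, 26));
        for (int[] pair : candidatePairs(ranges)) {
            System.out.println(Arrays.toString(pair));
        }
    }
}
```

Only the pairs this returns would then need the more expensive cardinality comparison.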

Summary
1. Implement a no-op compaction strategy
2. Make it compact the most overlapping sstables
3. Add support for worthDroppingTombstones
4. Add heuristics to avoid n² comparisons
Slides: bit.ly/1pd9Bws

