Assignment
Build a compaction strategy that compacts the most overlapping sstables together
0. Setting up your IDE 
https://wiki.apache.org/cassandra/RunningCassandraInIDEA 
http://wiki.apache.org/cassandra/RunningCassandraInEclipse
1. Implement a no-op compaction strategy
● class Xyz extends AbstractCompactionStrategy {..}
● Implement the abstract methods
○ getNextBackgroundTask
■ Return a CompactionTask containing the sstables you want to compact, or null if there are none
○ getMaximalTask
■ 'Major compaction' - should compact all sstables
○ ...
● ALTER TABLE foo WITH compaction = { 'class': 'Xyz' };
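As a rough illustration of the contract described on this slide, here is a self-contained Java model of a no-op strategy. The types `SSTable`, `Task` and `StrategyModel` are hypothetical stand-ins invented for this sketch, not Cassandra's classes; the real `AbstractCompactionStrategy` has more abstract methods and different signatures, so treat this only as a picture of the intended behaviour.

```java
import java.util.ArrayList;
import java.util.List;

final class NoOpStrategySketch {
    // Hypothetical stand-in for an sstable; not Cassandra's SSTableReader.
    static final class SSTable {
        final String name;
        SSTable(String name) { this.name = name; }
    }

    // Hypothetical stand-in for a compaction task: just the inputs to compact.
    static final class Task {
        final List<SSTable> inputs;
        Task(List<SSTable> inputs) { this.inputs = new ArrayList<>(inputs); }
    }

    // Hypothetical model of the two methods named on the slide.
    abstract static class StrategyModel {
        protected final List<SSTable> sstables = new ArrayList<>();

        void addSSTable(SSTable sstable) { sstables.add(sstable); }

        /** Next background compaction, or null when there is nothing to do. */
        abstract Task getNextBackgroundTask();

        /** 'Major compaction': a task over every sstable, or null if there are none. */
        abstract Task getMaximalTask();
    }

    // No-op strategy: never compacts in the background, but still honours
    // an explicit request for a major compaction.
    static final class NoOp extends StrategyModel {
        @Override
        Task getNextBackgroundTask() { return null; }

        @Override
        Task getMaximalTask() {
            return sstables.isEmpty() ? null : new Task(sstables);
        }
    }
}
```

Returning null from the background method is what makes the strategy a no-op; step 2 would replace that null with a task built from the chosen sstables.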
2. Make it compact the most overlapping sstables
● Compacting the sstables that overlap the most should reduce disk usage the most
● CompactionMetadata has an ICardinality
○ HyperLogLog - counts unique items in a stream
○ Currently used to estimate how large a bloom filter to allocate during compaction
○ https://github.com/addthis/stream-lib
○ SSTableReader#getApproximateKeyCount
○ ICardinality#merge - merges several of these estimators to estimate the number of keys in the union of the sstables
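One way to turn union counts into an overlap score is inclusion-exclusion: for two sstables A and B, the keys that disappear when they are merged number |A| + |B| - |A ∪ B|. The sketch below is only an illustration of that arithmetic. It uses exact `HashSet`s as a stand-in for HyperLogLog estimators (set union plays the role of `ICardinality#merge`, `size()` the role of the cardinality estimate), and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class OverlapSketch {
    /** Exact stand-in for merging two cardinality estimators: size of the union. */
    static long unionCount(Set<Long> a, Set<Long> b) {
        Set<Long> union = new HashSet<>(a);
        union.addAll(b);
        return union.size();
    }

    /** Share of input keys that merging would de-duplicate: 1 - |A ∪ B| / (|A| + |B|). */
    static double overlapRatio(Set<Long> a, Set<Long> b) {
        long sum = (long) a.size() + b.size();
        if (sum == 0) {
            return 0.0;
        }
        return 1.0 - (double) unionCount(a, b) / sum;
    }

    /**
     * Indices of the pair with the highest overlap ratio, or null when there
     * are fewer than two key sets. Compares every pair, so it is O(n^2).
     */
    static int[] mostOverlappingPair(List<Set<Long>> keySets) {
        int[] best = null;
        double bestRatio = -1.0;
        for (int i = 0; i < keySets.size(); i++) {
            for (int j = i + 1; j < keySets.size(); j++) {
                double ratio = overlapRatio(keySets.get(i), keySets.get(j));
                if (ratio > bestRatio) {
                    bestRatio = ratio;
                    best = new int[] {i, j};
                }
            }
        }
        return best;
    }
}
```

For a pair, the ratio ranges from 0 (disjoint key sets) to 0.5 (identical key sets). With real estimators the counts are approximate, so small differences between ratios should not be treated as significant.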
3. Add support for worthDroppingTombstones
● Single-sstable compaction to drop tombstones
● Tries to figure out how much the sstable overlaps with other sstables, then estimates how many tombstones lie outside that overlap
● Currently we check for range overlap
● Could probably be improved by using ICardinality
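A cardinality-based version of this check might look like the sketch below. It is an assumption-laden illustration, not Cassandra's implementation: it supposes tombstones are spread evenly across keys, and it estimates the keys found only in this sstable as |S ∪ O| - |O|, where O is the merged estimate for all other sstables. All names and parameters are hypothetical, and the threshold is simply passed in by the caller.

```java
final class TombstoneSketch {
    /** Keys present only in this sstable: |S ∪ O| - |O|, clamped at zero for noisy estimates. */
    static long keysOutsideOverlap(long unionCount, long othersCount) {
        return Math.max(0L, unionCount - othersCount);
    }

    /**
     * Decide whether a single-sstable compaction is likely to pay off.
     *
     * droppableRatio  estimated fraction of the sstable that is droppable tombstones
     * ownKeys         estimated key count of this sstable
     * unionCount      estimated key count of this sstable merged with all others
     * othersCount     estimated key count of all other sstables merged
     * threshold       minimum droppable fraction that justifies the compaction
     */
    static boolean worthDroppingTombstones(double droppableRatio, long ownKeys,
                                           long unionCount, long othersCount,
                                           double threshold) {
        if (droppableRatio <= threshold || ownKeys <= 0) {
            return false;
        }
        // Fraction of this sstable's keys that no other sstable contains.
        double outside = Math.min(1.0,
                (double) keysOutsideOverlap(unionCount, othersCount) / ownKeys);
        // Assumes tombstones are uniformly distributed over keys.
        return outside * droppableRatio > threshold;
    }
}
```

For example, with a droppable ratio of 0.5, 100 own keys, a union estimate of 180 and 100 keys in the other sstables, 80% of the keys are outside the overlap, so about 40% of the sstable is expected to be droppable.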
4. Add heuristics to avoid n² CompactionMetadata comparisons
Algorithms!
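The slide leaves the choice of heuristic open. One possible approach, sketched here under the assumption that each sstable exposes a first and last token, is to prune by token range first: sstables whose ranges do not intersect cannot share keys, so only intersecting pairs need their cardinality estimators compared. The `Range` type and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class CandidatePairsSketch {
    // Hypothetical view of an sstable: an id plus its inclusive token range.
    static final class Range {
        final int id;
        final long first;
        final long last;

        Range(int id, long first, long last) {
            this.id = id;
            this.first = first;
            this.last = last;
        }
    }

    /**
     * Sweep over ranges sorted by first token and emit only the pairs whose
     * token ranges intersect. Each pair is {earlier id, later id} in sweep order.
     */
    static List<int[]> candidatePairs(List<Range> ranges) {
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong((Range r) -> r.first));

        List<int[]> pairs = new ArrayList<>();
        List<Range> active = new ArrayList<>();
        for (Range current : sorted) {
            // Ranges that end before the current one starts can no longer overlap anything.
            active.removeIf(r -> r.last < current.first);
            for (Range r : active) {
                pairs.add(new int[] {r.id, current.id});
            }
            active.add(current);
        }
        return pairs;
    }
}
```

The worst case is still quadratic when every range intersects every other, but sstables with disjoint token ranges are never compared.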
Summary
1. Implement a no-op compaction strategy
2. Make it compact the most overlapping sstables
3. Add support for worthDroppingTombstones
4. Add heuristics to avoid n² comparisons
Slides: bit.ly/1pd9Bws

Cassandra 2.1 boot camp, exercise
