Cassandra - Deep Dive ...

A presentation of the internal architecture and features of Cassandra, based on version 1.2.

Cassandra
A Decentralized Structured Storage System
By Sameera Nelson

Outline
• Introduction
• Data Model
• System Architecture
• Failure Detection & Recovery
• Local Persistence
• Performance
• Statistics

What is Cassandra?
• Distributed storage system
• Manages structured data
• Highly available, no single point of failure (SPoF)
• Not a relational data model
• Handles high write throughput
◦ Without sacrificing read efficiency

Motivation
• Operational requirements at Facebook
◦ Performance
◦ Reliability and dealing with failures
◦ Efficiency
◦ Continuous growth
• Application
◦ Facebook's Inbox Search problem

Similar Work
• Google File System
◦ Distributed file system, single master/slave
• Ficus/Coda
◦ Distributed file systems
• Farsite
◦ Distributed file system, no centralized server
• Bayou
◦ Distributed relational database system
• Dynamo
◦ Distributed storage system

Data Model
Figure from Eben Hewitt’s slides.

Supported Operations
• insert(table, key, rowMutation)
• get(table, key, columnName)
• delete(table, key, columnName)

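The three operations above can be sketched with a toy in-memory store. This is only an illustration: the class name `ToyStore` and its dict-of-dicts layout are assumptions made for the example, not Cassandra's implementation or client API, and the delete here removes data immediately instead of writing a tombstone.

```python
class ToyStore:
    """Toy model of insert/get/delete over table -> row key -> columns."""

    def __init__(self):
        # table name -> {row key -> {column name -> value}}
        self.tables = {}

    def insert(self, table, key, row_mutation):
        # Apply a mutation (column name -> value) to a single row.
        row = self.tables.setdefault(table, {}).setdefault(key, {})
        row.update(row_mutation)

    def get(self, table, key, column_name):
        # Return the column value, or None when it is absent.
        return self.tables.get(table, {}).get(key, {}).get(column_name)

    def delete(self, table, key, column_name):
        # Remove one column from a row, if present.
        self.tables.get(table, {}).get(key, {}).pop(column_name, None)


store = ToyStore()
store.insert("users", 1745, {"fname": "john", "lname": "smith"})
store.delete("users", 1745, "fname")
```
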
Query Language
CREATE TABLE users (
    user_id int PRIMARY KEY,
    fname text,
    lname text
);
INSERT INTO users (user_id, fname, lname)
VALUES (1745, 'john', 'smith');
SELECT * FROM users;

Data Structure
• Log-structured merge tree (LSM tree)

Fully Distributed
• No single point of failure

Cassandra Architecture
• Partitioning
◦ Data distribution across nodes
• Replication
◦ Data duplication across nodes
• Cluster membership
◦ Node management in the cluster
◦ Adding and removing nodes

Partitioning
• Partitions data using consistent hashing

Partitioning
• Each key is assigned to the relevant partition

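A minimal sketch of consistent hashing on a token ring, under stated assumptions: the names `token_for` and `TokenRing` are hypothetical, MD5 is used only as a convenient stable hash, and each node holds a single token (no vnodes). A key belongs to the first node whose token is greater than or equal to the key's token, wrapping around the ring.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    # Hash a key onto the ring (128-bit integer).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    def __init__(self, node_tokens):
        # node_tokens: {node name -> token}; sorting by token forms the ring.
        self.ring = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.ring]

    def owner(self, key: str) -> str:
        # First node with token >= the key's token, wrapping to the start.
        idx = bisect.bisect_left(self.tokens, token_for(key)) % len(self.ring)
        return self.ring[idx][1]


SPACE = 2 ** 128
ring3 = TokenRing({"A": 0, "B": SPACE // 3, "C": 2 * SPACE // 3})
ring4 = TokenRing({"A": 0, "B": SPACE // 3, "C": 2 * SPACE // 3, "D": SPACE // 2})
keys = [f"user{i}" for i in range(1000)]
moved = [k for k in keys if ring3.owner(k) != ring4.owner(k)]
```

The point of the design shows in `moved`: adding node D only reassigns keys that now fall in D's range, while every other key keeps its owner.
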
Replication
• Based on the configured replication factor

Replication
• Different replication policies
◦ Rack unaware
◦ Rack aware
◦ Data center aware
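The difference between the rack-unaware and rack-aware policies can be sketched as follows. This is a simplified illustration, not Cassandra's actual replication strategy classes: the function `pick_replicas` and its parameters are hypothetical. Rack unaware takes the next nodes on the ring; rack aware first prefers nodes on racks that have not been used yet.

```python
def pick_replicas(ring, start, rf, rack_of=None):
    """ring: node names in token order; start: index of the primary owner.

    rack_of: optional {node -> rack}. When given, nodes on racks not yet
    used are preferred; remaining slots are filled in ring order.
    """
    order = [ring[(start + i) % len(ring)] for i in range(len(ring))]
    if rack_of is None:
        return order[:rf]  # rack unaware: the next rf nodes on the ring
    chosen, seen_racks = [], set()
    for node in order:  # first pass: one node per unseen rack
        if len(chosen) < rf and rack_of[node] not in seen_racks:
            chosen.append(node)
            seen_racks.add(rack_of[node])
    for node in order:  # second pass: fill any remaining slots
        if len(chosen) < rf and node not in chosen:
            chosen.append(node)
    return chosen


ring = ["A", "B", "C", "D"]
racks = {"A": "r1", "B": "r1", "C": "r2", "D": "r2"}
```
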

Cluster Membership
• Based on Scuttlebutt
• Efficient gossip-based mechanism
• Inspired by real-life rumor spreading
• Anti-entropy protocol
◦ Repairs replicated data by comparing and reconciling differences
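A rough sketch of gossip-style state exchange, assuming each node keeps a view of `endpoint -> (version, state)` and reconciles by keeping the higher version. The functions `merge` and `gossip_round` are hypothetical names, and full views are exchanged here for brevity.

```python
import random


def merge(local, remote):
    # Keep, for every endpoint, the entry with the highest version number.
    for endpoint, (version, state) in remote.items():
        if endpoint not in local or local[endpoint][0] < version:
            local[endpoint] = (version, state)


def gossip_round(views, rng):
    # Each node exchanges its view with one randomly chosen peer (push-pull).
    nodes = list(views)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merge(views[peer], views[node])
        merge(views[node], views[peer])


# Every node starts out knowing only about itself.
views = {n: {n: (1, "UP")} for n in "ABCDE"}
rng = random.Random(0)
rounds = 0
while rounds < 100 and any(v != views["A"] for v in views.values()):
    gossip_round(views, rng)
    rounds += 1
```

After the loop, every node holds the same view of all five endpoints, which is the rumor-spreading effect the slide describes.
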

Failure Detection
• Tracks node state
◦ Directly and indirectly
• Accrual failure detection mechanism
• Permanent node changes
◦ An administrator must explicitly add or remove nodes
• Hints
◦ Writes to be replayed to a replica that was unavailable
◦ Saved in the system.hints table
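The hint mechanism can be sketched like this, as a toy model only: the `Coordinator` class and its fields are hypothetical, hints live in a Python list instead of the system.hints table, and timestamps are ignored.

```python
class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas   # node name -> dict acting as its storage
        self.up = set(replicas)    # nodes currently believed to be alive
        self.hints = []            # (target node, key, value) awaiting replay

    def write(self, key, value):
        for node, storage in self.replicas.items():
            if node in self.up:
                storage[key] = value
            else:
                # Replica is down: keep a hint instead of failing the write.
                self.hints.append((node, key, value))

    def node_recovered(self, node):
        self.up.add(node)
        remaining = []
        for target, key, value in self.hints:
            if target == node:
                self.replicas[node][key] = value  # replay the missed write
            else:
                remaining.append((target, key, value))
        self.hints = remaining


coordinator = Coordinator({"A": {}, "B": {}, "C": {}})
coordinator.up.discard("B")        # B is marked down
coordinator.write("k", "v")        # A and C get the write, B gets a hint
hints_while_down = len(coordinator.hints)
b_had_key_while_down = "k" in coordinator.replicas["B"]
coordinator.node_recovered("B")    # the hint is replayed to B
```
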

Accrual Failure Detector
• Node is faulty: the suspicion level increases monotonically.
• Φ(t) ≥ k
• k: threshold variable
• Node is correct
• Φ(t) = 0
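A minimal sketch of how a suspicion level Φ(t) can be computed, assuming heartbeat gaps follow an exponential distribution, so that Φ is the negative base-10 logarithm of the probability that a heartbeat is still to come. The class name `PhiAccrualDetector` and the threshold of 8 are choices made for this illustration.

```python
import math
from collections import deque


class PhiAccrualDetector:
    def __init__(self, threshold=8.0, window=1000):
        self.threshold = threshold             # k in the slide
        self.intervals = deque(maxlen=window)  # recent gaps between heartbeats
        self.last_heartbeat = None

    def heartbeat(self, now):
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        # P(gap > elapsed) = exp(-elapsed / mean); phi = -log10 of that value.
        return elapsed / (mean * math.log(10))

    def is_suspect(self, now):
        return self.phi(now) >= self.threshold


detector = PhiAccrualDetector()
for t in (0.0, 1.0, 2.0, 3.0):   # heartbeats arrive once per second
    detector.heartbeat(t)
```

With one-second heartbeats, Φ is 0 right after a heartbeat and grows linearly with silence, crossing the threshold of 8 after roughly 18.4 seconds without a heartbeat.
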

Write Operation
• Data is logged in the commit log and written to the memtable
• Data is flushed from the memtable
◦ A flush is triggered when a threshold is reached
• Data is stored on disk in SSTables
• Deleted data is marked with a tombstone
• Compaction
◦ Removes deleted data, sorts and merges SSTables, and consolidates data

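The write path above can be sketched with a toy log-structured store. It is a simplification under stated assumptions: the class `ToyLSM` is hypothetical, everything is kept in memory, the flush threshold counts keys, and compaction drops tombstones right away with no grace period.

```python
TOMBSTONE = object()  # marker written in place of a deleted value


class ToyLSM:
    def __init__(self, flush_threshold=2):
        self.commit_log = []   # append-only record of unflushed writes
        self.memtable = {}     # in-memory writes, latest value per key
        self.sstables = []     # immutable sorted runs, oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write
        self.memtable[key] = value            # 2. update the memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def delete(self, key):
        self.write(key, TOMBSTONE)  # a delete is a write of a tombstone

    def flush(self):
        # 3. write the memtable out as a sorted, immutable SSTable
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []  # flushed data no longer needs the log

    def compact(self):
        # 4. merge all SSTables; newer values win, tombstoned keys are dropped
        merged = {}
        for table in self.sstables:
            merged.update(table)
        live = sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
        self.sstables = [live]

    def read(self, key):
        sources = [self.memtable] + [dict(t) for t in reversed(self.sstables)]
        for source in sources:  # newest data first
            if key in source:
                value = source[key]
                return None if value is TOMBSTONE else value
        return None


db = ToyLSM(flush_threshold=2)
db.write("a", 1)
db.write("b", 2)   # threshold reached: first SSTable is flushed
db.delete("a")
db.write("c", 3)   # threshold reached: second SSTable holds the tombstone
```
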
Read Request
• Two request types: direct and background (read repair)

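Read repair can be sketched as follows, assuming each replica stores `key -> (timestamp, value)` and the highest timestamp wins. The function `read_with_repair` is a hypothetical name; in this toy version every replica is contacted and repaired synchronously, which is a simplification of the background behavior named on the slide.

```python
def read_with_repair(replicas, key):
    """replicas: list of dicts mapping key -> (timestamp, value)."""
    responses = [(replica.get(key), replica) for replica in replicas]
    present = [resp for resp, _ in responses if resp is not None]
    if not present:
        return None
    newest = max(present, key=lambda tv: tv[0])  # highest timestamp wins
    for resp, replica in responses:
        if resp != newest:
            replica[key] = newest  # repair the stale or missing copy
    return newest[1]


r1 = {"k": (1, "old")}
r2 = {"k": (2, "new")}
r3 = {}
result = read_with_repair([r1, r2, r3], "k")
```
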
Delete Operation
• Data is not removed immediately
• Only a tombstone is written
• Data is deleted during the compaction process

Additional Features
• Compression
◦ Snappy compression
• Secondary index support
• SSL support
◦ Client to node
◦ Node to node
• Rolling commit logs
• SSTable data file merging

Performance
• High throughput and low latency
◦ Eliminates on-disk data modification
◦ Eliminates erase-block cycles
◦ No locking for concurrency control
◦ Maintaining integrity is not required
• High availability
• Linear scalability
• Fault tolerance

Remaining slides (titles only)
11. System Architecture
12. Architecture
15. Partitioning: The Token Ring
18. Partitioning, Vnodes
22. Cluster Membership: Gossip Based
23. Failure Detection & Recovery
26. Local Persistence
27. Write Request
28. Write Operation
30. Write Operation: Compaction
32. Read Operation
37. Statistics
38. Stats from Netflix: Linear scalability
39. Stats from Netflix
40. Some users
41. Thank you
42. Read Detailed Structure

Editor's notes

Run the repair tool
Regular node repair must be run on every node in the cluster (at least once every gc_grace_seconds, which defaults to 10 days)
Snappy does not aim for maximum compression; it offers very high speeds and reasonable compression
Cassandra updates the bytes and rewrites the entire sector back out instead of modifying the data on disk