CASSANDRA - Next to RDBMS


Apache Cassandra is a non-relational (NoSQL) database maintained by the Apache Software Foundation. Cassandra was originally developed at Facebook, which released it as open source in 2008; it is now developed as an Apache project. Relational databases store data in rows, whereas Cassandra stores data in a column-oriented (wide-column) format, as column name/value pairs grouped under a row key. Due to this column-based storage, it can deliver higher performance than relational databases. Cassandra can handle many terabytes of data if needed and can easily handle millions of rows, even on a small cluster. Cassandra can reach around 20K inserts per second. Read performance depends mostly on the hardware, the configuration and the number of nodes in the cluster, and keeping it high is achievable in Cassandra without much trouble.


CASSANDRA – An Open Source
Data Storage system
Presented By :
Vipul Kumar
Cr No. - 11/269
UNIVERSITY COLLEGE OF ENGINEERING, KOTA
RAJASTHAN TECHNICAL UNIVERSITY
Presented To :
Mr R K Banyal Sir
CSE Department
COMPUTER SCIENCE AND ENGINEERING DEPARTMENT

Contents
• What is Cassandra?
• History
• Data Model
• System architecture
• Key features and benefits
• Who is using Cassandra?
• Conclusion and future scope

Definition of Cassandra
Apache Cassandra™ is a free, open-source NoSQL database that is:
• Distributed
• High performance
• Extremely scalable
• Fault tolerant (i.e. no single point of failure)

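The history of Cassandra
Cassandra combines ideas from two earlier systems: Google's Bigtable (its column-family data model) and Amazon's Dynamo (its distributed, decentralized design).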

Data Model
• A table is a multidimensional map indexed by a key (the row key).
• Columns are grouped into column families.
• There are two types of column families:
– Simple
– Super (nested column families)
• Each column has a:
– Name
– Value
– Timestamp
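To make the nested-map description concrete, here is a minimal Python sketch of a simple column family: row key, then column family, then column name, then (name, value, timestamp). The class names `Column` and `Table` are hypothetical, super column families are left out, and the "newer timestamp wins" rule is a simplified illustration of what the per-column timestamp is used for, not Cassandra's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # used to decide which of two conflicting writes wins


class Table:
    """Toy model: row key -> column family -> column name -> Column."""

    def __init__(self) -> None:
        self.rows: dict[str, dict[str, dict[str, Column]]] = {}

    def insert(self, row_key: str, family: str, name: str, value: str, timestamp: int) -> None:
        columns = self.rows.setdefault(row_key, {}).setdefault(family, {})
        current = columns.get(name)
        # Newer timestamp wins: an older write never overwrites a newer one.
        # (On a tie, this sketch keeps the existing column.)
        if current is None or timestamp > current.timestamp:
            columns[name] = Column(name, value, timestamp)

    def get(self, row_key: str, family: str, name: str) -> str | None:
        column = self.rows.get(row_key, {}).get(family, {}).get(name)
        return None if column is None else column.value
```

Because each row is just a map, two rows in the same column family can hold entirely different column names, e.g. one row with an `email` column and another with only a `city` column.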

System Architecture
• Partitioning: how data is partitioned across nodes
• Replication: how data is duplicated across nodes

Partitioning
• Nodes are logically arranged in a ring topology.
• The hash of a data item's key gives its position on the ring; the item is assigned to the first node found walking clockwise from that position.
• The hash range wraps around after its maximum value, which is what gives the ring its shape.
• Lightly loaded nodes move their position on the ring to relieve heavily loaded nodes.
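The ring lookup described above can be sketched in a few lines of Python. This is a minimal illustration, assuming MD5 as the hash and placing each node at the hash of its own name; both choices are hypothetical stand-ins for how a real cluster assigns tokens.

```python
from __future__ import annotations

import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key to a position on the ring (a 128-bit integer from MD5)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    def __init__(self, nodes: list[str]) -> None:
        # Hypothetical token assignment: each node sits at the hash of its own name.
        self.tokens = sorted((token(node), node) for node in nodes)
        self.positions = [t for t, _ in self.tokens]

    def owner(self, key: str) -> str:
        """Return the first node at or clockwise after the key's position."""
        i = bisect.bisect_left(self.positions, token(key))
        # Past the largest token, wrap around to the start of the ring.
        return self.tokens[i % len(self.tokens)][1]
```

For example, `Ring(["node-a", "node-b", "node-c"]).owner("user:42")` always returns the same node for the same key, so any node holding the same ring view can locate the data.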

Replication
• Each data item is replicated on N nodes, where N is the replication factor.
• Different replication policies:
– Rack Unaware: replicate the data on the N-1 nodes that follow its coordinator on the ring
– Rack Aware: uses ZooKeeper to elect a leader, which tells nodes the ranges they are replicas for
– Datacenter Aware: similar to Rack Aware, but the leader is chosen at the datacenter level instead of the rack level
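The Rack Unaware policy can be sketched as "coordinator plus the next N-1 nodes clockwise". This Python version is a hypothetical illustration that reuses an MD5-based ring with nodes placed at the hash of their names; it does not model racks, datacenters or ZooKeeper.

```python
from __future__ import annotations

import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key to a position on the ring (MD5, as a 128-bit integer)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def replicas(nodes: list[str], key: str, n: int) -> list[str]:
    """Rack Unaware placement: the key's coordinator plus the next N-1 nodes clockwise."""
    ring = sorted((token(node), node) for node in nodes)
    positions = [t for t, _ in ring]
    # The coordinator is the first node at or after the key's position (wrapping around).
    start = bisect.bisect_left(positions, token(key)) % len(ring)
    # A key cannot have more distinct replicas than there are nodes.
    count = min(n, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]
```

With five nodes and N = 3, `replicas(nodes, "user:42", 3)` returns three distinct, ring-adjacent nodes, the first being the coordinator.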

Gossip Protocol
• A network communication protocol inspired by the way rumors spread in real life.
• Periodic, pairwise, inter-node communication.
• Low-frequency communication keeps the cost low.
• Peers are selected at random.
• Example: node A wants to search for a pattern in the data.
– Round 1: node A searches locally and then gossips with node B.
– Round 2: nodes A and B gossip with C and D.
– Round 3: nodes A, B, C and D gossip with 4 other nodes, and so on.
• Because the number of participating nodes doubles each round, information spreads quickly and the protocol is very robust.
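The example above assumes every contacted peer is a new one. The following hypothetical Python simulation relaxes that: each round, every informed node contacts one randomly chosen peer. Since peers can repeat, the informed set at most doubles per round, so reaching all nodes takes at least log2(cluster size) rounds. This is a sketch of push-style gossip under those assumptions, not Cassandra's gossiper.

```python
import random


def simulate_gossip(num_nodes: int, seed: int = 0) -> list:
    """Push gossip: each round, every informed node contacts one random peer.

    Returns the number of informed nodes after each round (index 0 = start).
    Assumes num_nodes >= 1.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    informed = {0}  # node 0 plays "node A", the origin of the rumor
    history = [len(informed)]
    while len(informed) < num_nodes:
        contacted = set()
        for node in informed:
            # Pick any peer other than this node.
            peer = rng.randrange(num_nodes - 1)
            if peer >= node:
                peer += 1
            contacted.add(peer)
        informed |= contacted
        history.append(len(informed))
    return history
```

For a 64-node cluster, `simulate_gossip(64)` ends at 64 after at least six rounds; repeated contacts are why real runs take somewhat longer than the ideal doubling.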

Key features & benefits
• Gigabyte to Petabyte scalability
• Big data scalability
• No single point of failure
• Easy Replication / Data distribution
• No need for caching software
• Flexible Schema

Big Data Scalability
• Capable of comfortably scaling to petabytes
• Adding new nodes increases performance linearly
• New nodes can be added while the cluster stays online
[Figure: a two-node cluster (nodes 1 and 2) grown to four nodes (nodes 1 to 4), doubling throughput capacity]
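One reason nodes can be added online without reshuffling the whole dataset is the ring used for partitioning: a joining node only takes over keys from part of one neighbor's range. The Python sketch below checks that property on a hypothetical MD5-based ring (node names and the hash are assumptions). It illustrates data movement only; it does not demonstrate the linear-throughput claim.

```python
from __future__ import annotations

import bisect
import hashlib


def token(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def owner(nodes: list[str], key: str) -> str:
    """First node clockwise from the key's position (rebuilds the ring each call for brevity)."""
    ring = sorted((token(node), node) for node in nodes)
    positions = [t for t, _ in ring]
    return ring[bisect.bisect_left(positions, token(key)) % len(ring)][1]


def moved_keys(old_nodes: list[str], new_node: str, keys: list[str]) -> dict[str, str]:
    """Map each key whose owner changes when new_node joins to its new owner."""
    new_nodes = [*old_nodes, new_node]
    moved = {}
    for key in keys:
        after = owner(new_nodes, key)
        if owner(old_nodes, key) != after:
            moved[key] = after
    return moved
```

Every moved key ends up on the new node, and all other keys stay where they were, so existing nodes keep serving their remaining data while the newcomer joins.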

No single point of failure
• All nodes are identical (there is no master node)
• Reads and writes can go to any node
• Data can be replicated across racks in different physical data centers
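Because every node shares the same view of the ring, any live node can coordinate a request by forwarding it to the key's replicas. The Python sketch below is a hypothetical toy cluster (class names, the MD5 ring and the "write to every live replica" rule are all assumptions) showing that with a replication factor of 3, reads and writes still succeed through any remaining node after one replica goes down.

```python
from __future__ import annotations

import bisect
import hashlib


def token(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.data: dict[str, str] = {}


class Cluster:
    def __init__(self, names: list[str], replication_factor: int = 3) -> None:
        self.nodes = {n: Node(n) for n in names}
        self.ring = sorted((token(n), n) for n in names)
        self.rf = min(replication_factor, len(names))

    def replicas(self, key: str) -> list[str]:
        """Coordinator for the key plus the next rf-1 nodes clockwise."""
        positions = [t for t, _ in self.ring]
        start = bisect.bisect_left(positions, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

    def write(self, via: str, key: str, value: str) -> int:
        """Any live node ('via') accepts the write and forwards it to every live replica."""
        if not self.nodes[via].up:
            raise ConnectionError(f"{via} is down")
        stored = 0
        for name in self.replicas(key):
            if self.nodes[name].up:
                self.nodes[name].data[key] = value
                stored += 1
        if stored == 0:
            raise RuntimeError("no live replica for this key")
        return stored

    def read(self, via: str, key: str) -> str | None:
        """Any live node answers by asking the key's replicas in ring order."""
        if not self.nodes[via].up:
            raise ConnectionError(f"{via} is down")
        for name in self.replicas(key):
            node = self.nodes[name]
            if node.up and key in node.data:
                return node.data[key]
        return None
```

This sketch leaves out how a replica that was down catches up on the writes it missed.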

Easy Replication
• Replication is handled transparently by Cassandra
• Multi-data-center capable
• Takes full advantage of cloud computing

No need for caching layer
• The peer-to-peer design removes the need for a separate caching layer and the code to manage it.
• The database uses the memory of all participating nodes to cache the data assigned to them.

Flexible Schema
• A dynamic schema design allows more flexible data storage than a rigid RDBMS schema
• Handles structured, semi-structured and unstructured data
• No downtime for schema changes

Conclusion & future scope
• Cassandra is an open source storage system providing
scalability, high performance, and wide applicability.
• Cassandra can support a very high update throughput
while delivering low latency.
• Future work includes adding compression, support for atomicity across keys, and secondary index support.

