Introduction to Cassandra
Presented on 26th Feb 2014
Scope
• Introduction to Cassandra and NoSQL
• Understanding the Cassandra data model
• Configuration, reading and writing data in Cassandra
• CQL

What is Cassandra
• A Database
• Uses the fully distributed design of Amazon’s Dynamo
• Uses the column-family-based data model of Google’s Bigtable
• Developed at Facebook (the team was led by Jeff Hammerbacher,
with Avinash Lakshman, Karthik Ranganathan, and Prashant Malik
of the Search Team)
• Open-sourced in 2008

Problems with RDBMS
• Horizontal scaling: as the data in an RDBMS grows, joins become
slow, so retrieval becomes slow.
• Vertical scaling: adding more hardware, such as memory, faster processors, or
more disk space. Adding machines creates problems such as data
replication, consistency, and failover mechanisms.
• Caching layer in large systems: for example Memcached, Ehcache, or Oracle
Coherence. Keeping the cache and the database in sync on every update
becomes harder across a cluster.

Cassandra
• “Apache Cassandra is an open source, distributed, decentralized,
elastically scalable, highly available, fault-tolerant, tuneably
consistent, column-oriented database that bases its distribution
design on Amazon’s Dynamo and its data model on Google’s
Bigtable. Created at Facebook.”

Why Cassandra
• Fault tolerant
• Decentralized
• Eventually consistent
• Rich data model
• Elastic
• Highly Available
• No SPOF (single point of failure)

CAP theorem
• Eric Brewer of the University of California at Berkeley presented his CAP theorem in 2000.
• The theorem states that within a large-scale distributed data system, there are three
requirements that have a relationship of sliding dependency.
• Consistency: All database clients will read the same value for the same query, even given
concurrent updates.
• Availability: All database clients will always be able to read and write data.
• Partition Tolerance: The database can be split across multiple machines; it can continue functioning
in the face of network segmentation breaks.

CAP theorem (cont.)
• According to the theorem, only two of the three can be strongly supported in a distributed data
system.

• CA: the system will block when the network partitions, so such a system is often
limited to a single data centre to mitigate this.
• CP: data is sharded in order to scale. The data will be consistent, but some data may
become unavailable whenever a node goes down.
• AP: the system may return inaccurate data, but it will always be available, even in the
face of network partitioning. DNS is perhaps the most popular example of a system that is
massively scalable, highly available, and partition-tolerant.

Fault Tolerant
• Data is automatically replicated to multiple nodes based on the
replication factor.
• Replication across multiple data centers is supported.
• Failed nodes can be replaced with no downtime.
• Uses an accrual failure detector for fault detection.
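
The accrual idea in the last bullet can be sketched in a few lines. Instead of a yes/no verdict per heartbeat, the detector reports a suspicion level (often called phi) that grows the longer a node stays silent. This is a minimal sketch and not Cassandra's implementation: it assumes heartbeat gaps follow an exponential distribution, and the class name, method names, and the threshold of 8 are illustrative choices.

```python
import math


class AccrualFailureDetector:
    """Toy accrual failure detector (hypothetical class, for illustration only)."""

    def __init__(self, threshold=8.0):
        self.threshold = threshold   # suspicion level at which a node is treated as down
        self.intervals = []          # observed gaps between heartbeats, in seconds
        self.last_heartbeat = None

    def heartbeat(self, now):
        """Record a heartbeat that arrived at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        """Suspicion level: -log10 of the chance that the heartbeat is merely late."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        # Assumed exponential model: P(gap > elapsed) = exp(-elapsed / mean),
        # so -log10 of that probability is elapsed / (mean * ln 10).
        return elapsed / (mean * math.log(10))

    def is_alive(self, now):
        return self.phi(now) < self.threshold


detector = AccrualFailureDetector()
for t in (0.0, 1.0, 2.0, 3.0):       # heartbeats arrive once per second
    detector.heartbeat(t)

print(detector.is_alive(4.0))         # shortly after the last heartbeat
print(detector.is_alive(60.0))        # after a long silence
```

Because the output is a level rather than a boolean, the threshold can be tuned to trade detection speed against false alarms on a slow network.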

Decentralization
• Every node in the cluster is identical (no master/slave architecture).
• There is no single point of failure.

Eventual consistency
• Uses BASE (Basically Available, Soft state, Eventual consistency).
• As the data is replicated, the latest version of a value sits on
at least one node in the cluster, but older versions may still be on other
nodes.
• Eventually all nodes will see the latest version.

Eventual consistency (Cont.)
• Tuneable consistency: the replication factor is set to the number of nodes in
the cluster you want updates to propagate to.
• Consistency level is a setting that clients must specify on every
operation. It lets you decide how many replicas in the
cluster must acknowledge a write operation or respond to a read
operation for it to be considered successful. In this way
Cassandra pushes the decision about consistency out to
the client, so strict consistency can be achieved by assigning the same value
to the replication factor and the consistency level.
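
The relationship between replication factor and consistency level can be made concrete with a small check. A commonly cited rule is that a read is guaranteed to overlap the latest successful write when the number of replicas read plus the number of replicas written exceeds the replication factor. The function names and parameters below are illustrative and not part of any Cassandra API.

```python
def is_strongly_consistent(replication_factor, write_replicas, read_replicas):
    """Return True when every read set must overlap every write set.

    write_replicas and read_replicas are the numbers of replicas that must
    acknowledge a write or answer a read (the consistency level as a count).
    """
    return read_replicas + write_replicas > replication_factor


def quorum(replication_factor):
    """Majority of the replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


rf = 3
print(is_strongly_consistent(rf, write_replicas=1, read_replicas=1))   # ONE / ONE -> False
print(is_strongly_consistent(rf, quorum(rf), quorum(rf)))              # QUORUM / QUORUM -> True
print(is_strongly_consistent(rf, write_replicas=rf, read_replicas=1))  # ALL / ONE -> True
```

Setting the write level equal to the replication factor, as the slide suggests, is the third case: every replica has the write, so reading any single replica returns it, at the cost of writes failing when one replica is down.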

Rich Data Model
• Keyspace
• Column family
• Rows
• Column
• Super column

Column family
"ToyStore" : {
  "Toys" : {
    "GumDrop" : {
      "Price" : "0.25",
      "Section" : "Candy"
    },
    "Transformer" : {
      "Price" : "29.99",
      "Section" : "Action Figures"
    },
    "MatchboxCar" : {
      "Price" : "1.49",
      "Section" : "Vehicles"
    }
  }
},
"Keyspace1" : null,
"system" : null

Super Column family

"ToyCorporation" : {
  "ToyStores" : {
    "Ohio Store" : {
      "Transformer" : {"Price" : "29.99", "Section" : "Action Figures"},
      "GumDrop" : {"Price" : "0.25", "Section" : "Candy"},
      "MatchboxCar" : {"Price" : "1.49", "Section" : "Vehicles"}
    },
    "New York Store" : {
      "JawBreaker" : {"Price" : "4.25", "Section" : "Candy"},
      "MatchboxCar" : {"Price" : "8.79", "Section" : "Vehicles"}
    }
  }
}

Keyspace
A keyspace is similar to a schema in an RDBMS: it has a name and a set
of attributes that define keyspace-wide behaviour.
The attributes are:
1. Replication factor: if it is set to 3, then 3 nodes will hold
a copy of each row.
2. Replica placement strategy: SimpleStrategy
(formerly RackUnawareStrategy), OldNetworkTopologyStrategy (formerly RackAwareStrategy), or NetworkTopologyStrategy (formerly DatacenterShardStrategy).
3. Column families: discussed on the following slides.
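
As an illustration of how these attributes appear in practice, the sketch below builds a CQL CREATE KEYSPACE statement as a string. The helper name `create_keyspace_cql` and the keyspace name `toystore` are made up for this example, and the code only formats text; it does not talk to a cluster.

```python
def create_keyspace_cql(name, strategy="SimpleStrategy", **options):
    """Format a CREATE KEYSPACE statement (hypothetical helper, string building only)."""
    parts = ["'class': '%s'" % strategy]
    # Each remaining option becomes one entry of the replication map.
    parts += ["'%s': %s" % (key, value) for key, value in options.items()]
    return "CREATE KEYSPACE %s WITH replication = {%s};" % (name, ", ".join(parts))


statement = create_keyspace_cql("toystore", replication_factor=3)
print(statement)
# CREATE KEYSPACE toystore WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
```

With NetworkTopologyStrategy the options would instead be one replica count per data center name.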
Column family
• A column family is a container for columns, analogous to a table in
a relational system.
• A column family holds an ordered list of columns, each of which is
referred to by its column name.

• [Keyspace][ColumnFamily][Key][Column]
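
The four-part address above can be pictured with nested dictionaries. This is only a mental model using the ToyStore data from the earlier slide; the `get_column` helper and the extra "Flavor" column are hypothetical additions for illustration.

```python
# Nested dictionaries standing in for keyspace -> column family -> row key -> column.
cluster = {
    "ToyStore": {
        "Toys": {
            "GumDrop": {"Price": "0.25", "Section": "Candy"},
            "Transformer": {"Price": "29.99", "Section": "Action Figures"},
            "MatchboxCar": {"Price": "1.49", "Section": "Vehicles"},
        }
    }
}


def get_column(data, keyspace, column_family, key, column):
    """Resolve [Keyspace][ColumnFamily][Key][Column] to a value."""
    return data[keyspace][column_family][key][column]


price = get_column(cluster, "ToyStore", "Toys", "Transformer", "Price")
print(price)  # 29.99

# Rows in the same column family do not need identical columns.
cluster["ToyStore"]["Toys"]["GumDrop"]["Flavor"] = "Cherry"
print(sorted(cluster["ToyStore"]["Toys"]["GumDrop"]))  # ['Flavor', 'Price', 'Section']
```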

Column family (cont.)
A column family has two attributes: a name and a comparator.
The comparator determines the order in which columns are returned by a
query. It can be one of the following types: AsciiType, BytesType,
LexicalUUIDType, IntegerType, LongType, TimeUUIDType, UTF8Type,
or a custom type (plug in your own class, which must extend
org.apache.cassandra.db.marshal.AbstractType).
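
The effect of the comparator is easiest to see with column names that look like numbers. The sketch below imitates a text comparator and a numeric comparator with Python's `sorted`; it is an analogy for UTF8Type and LongType ordering, not Cassandra code, and the column names are invented.

```python
column_names = ["10", "9", "100", "2"]

# Text ordering, in the spirit of UTF8Type: names compare character by character.
as_text = sorted(column_names)

# Numeric ordering, in the spirit of LongType: names compare as integers.
as_long = sorted(column_names, key=int)

print(as_text)  # ['10', '100', '2', '9']
print(as_long)  # ['2', '9', '10', '100']
```

The same set of columns comes back in a different order depending on the comparator, which is why it has to be chosen to match the queries the column family will serve.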

Column family (cont.)
Hotel {
• key: AZC_043 { name: Cambria Suites Hayden, phone: 480-444-4444,
address: 400 N. Hayden Rd., city: Scottsdale, state: AZ, zip: 85255}
• key: AZS_011 { name: Clarion Scottsdale Peak, phone: 480-333-3333,
address: 3000 N. Scottsdale Rd, city: Scottsdale, state: AZ, zip: 85255}
• key: CAS_021 { name: W Hotel, phone: 415-222-2222,
address: 181 3rd Street, city: San Francisco, state: CA, zip: 94103}
• key: NYN_042 { name: Waldorf Hotel, phone: 212-555-5555,
address: 301 Park Ave, city: New York, state: NY, zip: 10019}
}

Rows
• Cassandra is a column-oriented database. Rows do not have to
have the same number of columns (as they do in a relational database). Each row
has a unique key, which makes its data accessible.
• Each column family is stored in separate files on disk.

Columns
• A column is a name/value pair (plus a client-supplied
timestamp of when it was last updated). A column family
is a container for rows that have similar, but not identical, column
sets. The timestamp records
when the column was last updated. Rows do not have
timestamps.
• Columns are name/value pairs; a regular column stores a byte
array value.
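
A column can be modelled as a small record of name, value, and timestamp. The sketch below also shows why the timestamp matters: when two versions of the same column meet, the one with the higher timestamp survives. The `Column` class, the `reconcile` function, and the second phone number are illustrative; the tie-breaking rule for equal timestamps is simplified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: bytes      # a regular column stores a byte array
    timestamp: int    # supplied by the client, e.g. microseconds since the epoch


def reconcile(a, b):
    """Pick the surviving version of one column: the higher timestamp wins."""
    if a.name != b.name:
        raise ValueError("can only reconcile versions of the same column")
    # Simplification: on equal timestamps this sketch keeps the first argument.
    return a if a.timestamp >= b.timestamp else b


old = Column("phone", b"480-444-4444", timestamp=1000)
new = Column("phone", b"480-555-0000", timestamp=2000)

print(reconcile(old, new).value)  # b'480-555-0000'
print(reconcile(new, old).value)  # b'480-555-0000'
```

Because the timestamp comes from the client, clocks on client machines need to be reasonably in sync for "latest" to mean what you expect.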

Super column
• The value of a super column is a map of subcolumns (which store
byte array values).
• It’s important to keep columns that you are likely to query together in
the same column family, and a super column can be helpful for this.
• Super columns are not indexed.
• Cassandra looks like a four-dimensional hash table. But for super
columns, it becomes more like a five-dimensional hash:
[Keyspace][ColumnFamily][Key][SuperColumn][SubColumn]
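
The five-part address can be pictured the same way as the four-part one, with one more level of nesting. The sketch below reuses the ToyCorporation data from the earlier slide as plain Python dictionaries; the `get_subcolumn` helper is a hypothetical name for illustration.

```python
# keyspace -> super column family -> row key -> super column -> subcolumn
data = {
    "ToyCorporation": {
        "ToyStores": {
            "Ohio Store": {
                "Transformer": {"Price": "29.99", "Section": "Action Figures"},
                "GumDrop": {"Price": "0.25", "Section": "Candy"},
                "MatchboxCar": {"Price": "1.49", "Section": "Vehicles"},
            },
            "New York Store": {
                "JawBreaker": {"Price": "4.25", "Section": "Candy"},
                "MatchboxCar": {"Price": "8.79", "Section": "Vehicles"},
            },
        }
    }
}


def get_subcolumn(data, keyspace, column_family, key, super_column, sub_column):
    """Resolve [Keyspace][ColumnFamily][Key][SuperColumn][SubColumn] to a value."""
    return data[keyspace][column_family][key][super_column][sub_column]


ny_price = get_subcolumn(
    data, "ToyCorporation", "ToyStores", "New York Store", "MatchboxCar", "Price"
)
print(ny_price)  # 8.79

# The same super column name can hold different values in different rows.
ohio_price = get_subcolumn(
    data, "ToyCorporation", "ToyStores", "Ohio Store", "MatchboxCar", "Price"
)
print(ohio_price)  # 1.49
```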

Some points
• You cannot perform joins in Cassandra. If you have designed a data
model and find that you need a join, you’ll have to either do the work
on the client side, or create a denormalized second column family
that represents the join results for you.
• It is not possible to sort by value. Cassandra sorts only by column name, so that it
can fetch individual columns from a row without pulling the entire
row into memory.
• Column sorting is controllable, but key sorting isn’t: row keys always
sort in byte order.
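
The first point, doing a join on the client side, can be sketched as follows. Two column families are modelled as dictionaries keyed by row key; the user and city data and the function name are invented for this example.

```python
# Two column families, modelled as dicts keyed by row key (hypothetical data).
users = {
    "u1": {"name": "Alice", "city_id": "c1"},
    "u2": {"name": "Bob", "city_id": "c2"},
}
cities = {
    "c1": {"name": "Scottsdale"},
    "c2": {"name": "New York"},
}


def join_users_with_cities(users, cities):
    """Client-side join: one extra lookup in `cities` per user row."""
    return {
        key: {"name": row["name"], "city": cities[row["city_id"]]["name"]}
        for key, row in users.items()
    }


users_with_city = join_users_with_cities(users, cities)
print(users_with_city["u1"])  # {'name': 'Alice', 'city': 'Scottsdale'}
```

The result has the same shape as the denormalized second column family mentioned in the slide: writing it once at insert time avoids repeating the lookups on every read.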

Elastic/Highly Available
• Read and write throughput both increase linearly as new machines are
added.
• No downtime or interruption to application.

Sharding basic strategies
• Feature-based or functional segmentation: data is sharded by feature,
with features that have nothing in common placed in different shards. For example, user details and items for sale go in
different shards, as do movie ratings and comments.
• Key-based sharding: choose a key in the data that will distribute it evenly across
shards. Instead of naively assigning one letter of the alphabet to
each server, you use a
one-way hash on a key data element and distribute data across
machines according to the hash.
• Lookup table: a table that contains information about the location
of the actual data.
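
Key-based sharding can be sketched in a few lines. The example hashes each key and takes the result modulo the number of shards. This is a simple illustration of the idea, not Cassandra's token ring; the choice of SHA-256, the function name, and the sample keys are assumptions made for the example.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard number using a one-way hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % shard_count


keys = ["alice", "bob", "carol", "dave"]
placement = {key: shard_for(key, 4) for key in keys}
print(placement)
```

The same key always lands on the same shard, and keys that are close alphabetically are not forced onto the same machine. A drawback of plain modulo placement is that changing the shard count moves most keys.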
Design Pattern
1. Materialized view (one table per query): create a second column family to
represent the additional query. “Materialized” means storing a full
copy of the original data so that everything you need to answer a
query is right there, without forcing you to look up the original data.
If you have to perform a second query because you are only storing
the column names that you use, like foreign keys, in the second column
family, that is a secondary index instead.
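
A materialized view in this sense can be sketched with the hotel data from the earlier slide. The original column family is keyed by hotel id; the view is a second structure keyed by city that stores full copies of the rows, so a "hotels in a city" query needs no second lookup. The function and variable names are illustrative, and only a few columns are reproduced.

```python
hotels = {
    "AZC_043": {"name": "Cambria Suites Hayden", "city": "Scottsdale", "state": "AZ"},
    "AZS_011": {"name": "Clarion Scottsdale Peak", "city": "Scottsdale", "state": "AZ"},
    "CAS_021": {"name": "W Hotel", "city": "San Francisco", "state": "CA"},
}


def build_hotels_by_city(hotels):
    """Build the view: city -> hotel key -> full copy of the hotel's columns."""
    view = {}
    for key, columns in hotels.items():
        # Copy the whole row, not just its key, so the view answers the query alone.
        view.setdefault(columns["city"], {})[key] = dict(columns)
    return view


hotels_by_city = build_hotels_by_city(hotels)
print(sorted(hotels_by_city["Scottsdale"]))                # ['AZC_043', 'AZS_011']
print(hotels_by_city["San Francisco"]["CAS_021"]["name"])  # W Hotel
```

If the view stored only the hotel keys, each result would need a second read against `hotels`, which is the secondary-index case the slide describes.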

Design Pattern (Cont.)
2. Valueless column: store the data in the column name and leave the column value empty. For example, in a
user/usercity model the city name can be the row key and the users of that city the
column names.
3. Aggregate key: row keys must be unique, so you can join two
column values with a separator to create an aggregate key.
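
Both patterns are small enough to show together. In the sketch below, the city is the row key and each user is a column name with an empty value; the aggregate key joins two values with a separator. The separator character, the function names, and the sample users are assumptions for illustration.

```python
SEPARATOR = ":"


def aggregate_key(*parts):
    """Join key parts with a separator, refusing parts that contain it."""
    for part in parts:
        if SEPARATOR in part:
            raise ValueError("key part %r contains the separator" % part)
    return SEPARATOR.join(parts)


# Valueless columns: row key = city, column name = user, column value = empty.
user_city = {}


def add_user(city, user):
    user_city.setdefault(city, {})[user] = b""


add_user("Scottsdale", "alice")
add_user("Scottsdale", "bob")
add_user("New York", "carol")

print(sorted(user_city["Scottsdale"]))    # ['alice', 'bob']
print(aggregate_key("AZ", "Scottsdale"))  # AZ:Scottsdale
```

Checking for the separator inside each part keeps two different pairs of values from collapsing into the same key.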

Reference
• Assembled from various resources on the internet.
Thank You

Introduction to cassandra

  • 2. Scope • Introduction to Cassandra and NoSql • Understanding Cassandra data model • Configuration, read and writing data in Cassandra • CQL 2
  • 3. What is Cassandra • A Database • Uses Amazon’s Dyanamo’s fully distribution design • Uses Google’s BigTable’s column family based data model • Developed by Facebook (The team was led by Jeff Hammerbacher, with Avinash Lakshman, Karthik Ranganathan, and Prashant Malik (Search Team)) • Open source in 2008 3
  • 4. Problems with RDBMS • Horizontal scaling: In RDBMS as the size grows the joins become slows so the retrieval become slow. • Vertical scaling: adding more hardware, memory, faster processor or upgrading disk space. Adding hardware creates problem like data replication, consistency, fail over mechanism. • Caching layer in large system: like memcache, EHCache, Oracle Coherence. Updation in the cache and data base is exacerbated over a cluster. 4
  • 5. Cassandra • Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tuneable consistent, column-oriented database that bases its distribution design on Amazon’s Dynamo and its data model on Google’s Bigtable. Created at Facebook.” 5
  • 6. Why Cassandra • Fault tolerant • Decentralized • Eventually consistent • Rich data model • Elastic • Highly Available • No SPF (Single point failure) 6
  • 7. Cap theorem • University of California at Berkeley, Eric Brewer posted his CAP theorem in 2000. • The theorem states that within a large-scale distributed data system, there are three requirements that have a relationship of sliding dependency. • Consistency: All database clients will read the same value for the same query, even given concurrent updates. • Availability: All database clients will always be able to read and write data. • Partition Tolerance: The database can be split into multiple machines; it can continue functioning in the face of network segmentation breaks. 7
  • 8. Cap theorem (cont.) • According to theorem only two of the three can be strongly supported distributed data system • CA: it means system will block when the system will partitions. so in this the system is been limited to a single data centre to mitigate this. • CP: it allow data sharding in order to data scaling. The data will be consistent but data may loss whenever a node goes down. • AP: system may return inaccurate data, but the system will always be available, even in the face of network partitioning. DNS is perhaps the most popular example of a system that is massively scalable, highly available, and partition-tolerant. 8
  • 10. Fault Tolerant • Data is automatically replicated to multiple nodes based on the replication factor. • Data can be replicated across multiple data centers. • Failed nodes can be replaced with no downtime. • Uses an accrual failure detector for fault detection.
  • 11. Decentralization • Every node in the cluster is identical (there is no client-server or master-slave architecture among the nodes). • There is no single point of failure.
  • 12. Eventual consistency • Uses BASE (Basically Available, Soft state, Eventually consistent) consistency. • Because the data is replicated, the latest version of a value sits on at least one node in the cluster, while older versions may still be on other nodes. • Eventually all nodes will see the latest version.
  • 13. Eventual consistency (cont.) • Tuneable consistency: the replication factor sets the number of nodes in the cluster that you want updates to propagate to. • The consistency level is a setting that clients must specify on every operation. It lets you decide how many replicas in the cluster must acknowledge a write operation, or respond to a read operation, for the operation to be considered successful. This is how Cassandra pushes the decision about consistency out to the client. • Strict consistency can therefore be achieved by setting the consistency level to the same value as the replication factor.
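The relationship between replication factor and consistency level can be sketched in a few lines of Python. This is only an illustration, not Cassandra code: the function names are hypothetical, and the overlap rule used here (read replicas + write replicas > replication factor) is the general quorum argument rather than something stated on the slide. Setting the consistency level equal to the replication factor, as the slide suggests, is one case that satisfies it.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas, e.g. 2 of 3 or 3 of 5."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int,
                           write_level: int,
                           read_level: int) -> bool:
    """Return True when every set of read replicas must overlap
    every set of write replicas (hypothetical helper)."""
    for level in (write_level, read_level):
        if not 1 <= level <= replication_factor:
            raise ValueError("level must be between 1 and the replication factor")
    return write_level + read_level > replication_factor


rf = 3
# Consistency level equal to the replication factor (the slide's case).
print(reads_see_latest_write(rf, write_level=rf, read_level=1))
# Quorum writes and quorum reads also overlap.
print(reads_see_latest_write(rf, quorum(rf), quorum(rf)))
# One replica for both reads and writes gives only eventual consistency.
print(reads_see_latest_write(rf, 1, 1))
```

The trade-off is latency and availability: the more replicas that must answer, the slower the operation and the fewer node failures it can tolerate.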
  • 14. Rich Data Model • Keyspace • Column family • Rows • Column • Super column
  • 15. Column family "ToyStore" : { "Toys" : { "GumDrop" : { "Price" : "0.25", "Section" : "Candy" }, "Transformer" : { "Price" : "29.99", "Section" : "Action Figures" }, "MatchboxCar" : { "Price" : "1.49", "Section" : "Vehicles" } } }, "Keyspace1" : null, "system" : null
  • 17. "ToyCorporation" : { "ToyStores" : { "Ohio Store" : { "Transformer" : {"Price" : "29.99", "Section" : "Action Figures"}, "GumDrop" : {"Price" : "0.25", "Section" : "Candy"}, "MatchboxCar" : {"Price" : "1.49", "Section" : "Vehicles"} }, "New York Store" : { "JawBreaker" : {"Price" : "4.25", "Section" : "Candy"}, "MatchboxCar" : {"Price" : "8.79", "Section" : "Vehicles"} } } }
  • 18. Keyspace • A keyspace is similar to a schema in an RDBMS: it has a name and a set of attributes that define keyspace-wide behaviour. The attributes are: 1. Replication factor: if it is set to 3, then 3 nodes hold a copy of each row. 2. Replica placement strategy: SimpleStrategy (formerly RackUnawareStrategy), OldNetworkTopologyStrategy (formerly RackAwareStrategy), or NetworkTopologyStrategy (formerly DatacenterShardStrategy). 3. Column families: discussed on the following slides.
  • 19. Column family • A column family is a container for columns, analogous to a table in a relational system. • A column family holds an ordered list of columns, each of which is referenced by its column name. • A value is addressed as [Keyspace][ColumnFamily][Key][Column].
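The [Keyspace][ColumnFamily][Key][Column] addressing can be pictured as a lookup in nested maps. The sketch below uses plain Python dictionaries and the ToyStore data from the earlier slide; `get_column` is a hypothetical helper for illustration, not part of any Cassandra client API.

```python
# Keyspace -> column family -> row key -> column name -> value
keyspaces = {
    "ToyStore": {
        "Toys": {
            "GumDrop": {"Price": "0.25", "Section": "Candy"},
            "Transformer": {"Price": "29.99", "Section": "Action Figures"},
            "MatchboxCar": {"Price": "1.49", "Section": "Vehicles"},
        }
    }
}


def get_column(store: dict, keyspace: str, column_family: str,
               key: str, column: str) -> str:
    """Walk the four dimensions in order and return the column value."""
    return store[keyspace][column_family][key][column]


print(get_column(keyspaces, "ToyStore", "Toys", "GumDrop", "Price"))
```

A real cluster distributes rows across nodes and keeps columns sorted on disk, which a dictionary does not capture; the sketch only shows the shape of the address.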
  • 20. Column family (cont.) • A column family has two attributes: a name and a comparator. • The comparator determines the order in which columns are sorted when they are returned by a query. • The comparator can be one of the following types: AsciiType, BytesType, LexicalUUIDType, IntegerType, LongType, TimeUUIDType, UTF8Type, or a custom type (plug in your own class, which must extend org.apache.cassandra.db.marshal.AbstractType).
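To see why the choice of comparator matters, the following sketch sorts the same column names in two ways. It is an analogy only: Cassandra compares the stored bytes of each column name, whereas here Python strings and integers stand in for UTF8Type and LongType, and `sort_column_names` is a hypothetical helper.

```python
def sort_column_names(names: list, comparator: str) -> list:
    """Order column names the way a text or a numeric comparator would."""
    key_functions = {
        "UTF8Type": lambda name: name,       # compare as text
        "LongType": lambda name: int(name),  # compare as numbers
    }
    if comparator not in key_functions:
        raise ValueError("unsupported comparator in this sketch: " + comparator)
    return sorted(names, key=key_functions[comparator])


names = ["10", "9", "100"]
print(sort_column_names(names, "UTF8Type"))  # ['10', '100', '9']
print(sort_column_names(names, "LongType"))  # ['9', '10', '100']
```

Because columns are returned in comparator order, picking the comparator is effectively picking the sort order of every query on that column family.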
  • 21. Column family (cont.) Hotel { • key: AZC_043 { name: Cambria Suites Hayden, phone: 480-444-4444, address: 400 N. Hayden Rd., city: Scottsdale, state: AZ, zip: 85255} • key: AZS_011 { name: Clarion Scottsdale Peak, phone: 480-333-3333, address: 3000 N. Scottsdale Rd, city: Scottsdale, state: AZ, zip: 85255} • key: CAS_021 { name: W Hotel, phone: 415-222-2222, address: 181 3rd Street, city: San Francisco, state: CA, zip: 94103} • key: NYN_042 { name: Waldorf Hotel, phone: 212-555-5555, address: 301 Park Ave, city: New York, state: NY, zip: 10019} }
  • 22. Rows • Cassandra is a column-oriented database. Rows do not all need to have the same number of columns (unlike in a relational database). Each row has a unique key, which makes its data accessible. • Each column family is stored in a separate file.
  • 23. Columns • A column is a name/value pair plus a client-supplied timestamp recording when it was last updated. A column family is a container for rows that have similar, but not identical, column sets. • Each column carries its own timestamp, which records when that column was last updated; rows do not have timestamps. • Columns are name/value pairs, and a regular column stores a byte array as its value.
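A column and its timestamp can be modelled as a small record. The sketch below also shows one use of the timestamp: when two replicas hold different versions of the same column, the version with the newer timestamp is kept (last write wins). The `Column` type and `reconcile` function are hypothetical names for illustration, and ties between equal timestamps are not modelled.

```python
from typing import NamedTuple


class Column(NamedTuple):
    name: bytes
    value: bytes
    timestamp: int  # client-supplied, e.g. microseconds since the epoch


def reconcile(a: Column, b: Column) -> Column:
    """Pick the more recently written of two versions of one column."""
    if a.name != b.name:
        raise ValueError("can only reconcile versions of the same column")
    return a if a.timestamp >= b.timestamp else b


old = Column(b"Price", b"0.25", timestamp=1000)
new = Column(b"Price", b"0.30", timestamp=2000)
print(reconcile(old, new).value)  # b'0.30'
```

Since the client supplies the timestamp, clients with badly skewed clocks can make an older write appear newer, which is worth keeping in mind when reasoning about eventual consistency.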
  • 24. Super column • The value of a super column is a map of subcolumns (which store byte array values). • It’s important to keep columns that you are likely to query together in the same column family, and a super column can be helpful for this. • Super columns are not indexed. • With regular columns, Cassandra looks like a four-dimensional hash table. With super columns, it becomes more like a five-dimensional hash: [Keyspace][ColumnFamily][Key][SuperColumn][SubColumn]
  • 25. Some points • You cannot perform joins in Cassandra. If you have designed a data model and find that you need a join, you’ll have to either do the work on the client side, or create a denormalized second column family that represents the join results for you. • It is not possible to sort by value. Columns can only be sorted by column name, which lets Cassandra fetch individual columns from a row without pulling the entire row into memory. • Column sorting is controllable, but key sorting isn’t: row keys always sort in byte order.
  • 26. Elastic/Highly Available • Read and write throughput both increase linearly as new machines are added. • No downtime or interruption to applications.
  • 27. Basic sharding strategies • Feature-based (functional) segmentation: data is split by feature, so that features with nothing in common live in different shards. For example, user details and items for sale go in different shards, as do movie ratings and comments. • Key-based sharding: choose a key in the data that will distribute it evenly across shards. Instead of naively assigning one letter of the alphabet to each server, you apply a one-way hash to a key data element and distribute data across machines according to the hash. • Lookup table: a table that contains information about the location of the actual data.
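Key-based sharding can be sketched with a one-way hash from the Python standard library. This is a minimal illustration under stated assumptions: `shard_for` is a hypothetical helper, MD5 is used only because it spreads keys evenly (not for security), and the simple modulo placement is not how Cassandra assigns rows to nodes.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a one-way hash of the key."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_shards


# Keys that all start with the same letter still spread across shards,
# unlike a scheme that assigns one letter of the alphabet per server.
keys = ["user-%d" % i for i in range(1000)]
distribution = Counter(shard_for(k, 4) for k in keys)
print(sorted(distribution.items()))
```

One drawback of modulo placement is that changing the number of shards moves most keys to a different shard, which is why production systems often prefer consistent hashing.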
  • 28. Design Pattern 1. Materialized view (one table per query): write the data to a second column family created specifically to answer an additional query. “Materialized” means storing a full copy of the original data, so that everything you need to answer the query is right there, without forcing you to look up the original data. If you have to perform a second query because the second column family stores only the column names you use, like foreign keys, that is a secondary index instead.
  • 29. Design Pattern (cont.) 2. Valueless column: store the data in the column name and leave the column value empty. For example, in a user/user-city model, the city name can be the row key and the users in that city the column names. 3. Aggregate key: keys must be unique, so two column values can be joined with a separator to create an aggregate key.
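Both patterns can be sketched with plain Python dictionaries standing in for a column family. All names here (`add_user`, `users_in`, `aggregate_key`, the sample users and cities) are hypothetical and chosen only for illustration; the separator check is an added assumption to keep aggregate keys unambiguous.

```python
def add_user(index: dict, city: str, user: str) -> None:
    """Valueless column: the user is the column name, the value stays empty."""
    index.setdefault(city, {})[user] = b""


def users_in(index: dict, city: str) -> list:
    """Return the column names for a city row, in sorted order."""
    return sorted(index.get(city, {}))


def aggregate_key(*parts: str, sep: str = ":") -> str:
    """Join several values into one row key, refusing ambiguous parts."""
    if any(sep in part for part in parts):
        raise ValueError("a key part must not contain the separator")
    return sep.join(parts)


user_city = {}
add_user(user_city, "Scottsdale", "bob")
add_user(user_city, "Scottsdale", "alice")
add_user(user_city, "New York", "carol")
print(users_in(user_city, "Scottsdale"))   # ['alice', 'bob']
print(aggregate_key("AZ", "Scottsdale"))   # AZ:Scottsdale
```

The valueless column works because column names are kept sorted and can be fetched without a value lookup, so the names themselves answer the query "which users are in this city".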
  • 30. Reference • Assembled from various resources on the internet.