Cassandra Tutorial
Apache Cassandra is a free, open-source,
distributed database management
system. It is highly scalable and designed
to manage very large amounts of
structured data. It provides high
availability with no single point of failure.
NoSQL Database
• A NoSQL database (sometimes called "Not Only SQL") is a
database that provides a mechanism to store and retrieve data other
than the tabular relations used in relational databases. These
databases are schema-free, support easy replication, have a simple
API, are eventually consistent, and can handle huge amounts of data.
• The primary objective of a NoSQL database is to have
• simplicity of design,
• horizontal scaling
• finer control over availability.
• NoSQL databases use different data structures from
relational databases, which makes some operations faster in NoSQL. The
suitability of a given NoSQL database depends on the problem it
must solve.
• Apache Cassandra is an open source distributed database
system that is designed for storing and managing large
amounts of data across commodity servers. Cassandra can
serve as both a real-time operational data store for online
transactional applications and a read-intensive database for
large-scale business intelligence systems.
• Originally created for Facebook, Cassandra is designed to have
peer-to-peer symmetric nodes, instead of master or named
nodes, to ensure there can never be a single point of failure.
Cassandra automatically partitions data across all the nodes
in the database cluster, but the administrator has the power to
determine what data will be replicated and how many copies
of the data will be created.
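The partitioning and replication described above can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation: the `Ring` class, the `token` helper, the MD5 hash, and the node names are all assumptions made for illustration. It shows only how a row key can be mapped to a set of equal peers, with the number of copies set by an administrator-chosen replication factor.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a string onto the ring (MD5 is used here purely for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy peer-to-peer ring: every node is equal, there is no master."""

    def __init__(self, nodes, replication_factor):
        self.replication_factor = replication_factor
        # Each node sits at the ring position given by the hash of its name.
        self.tokens = sorted((token(n), n) for n in nodes)

    def replicas(self, row_key):
        """Return the nodes that hold copies of row_key."""
        positions = [t for t, _ in self.tokens]
        # First node at or after the key's position, wrapping around the ring.
        start = bisect.bisect_left(positions, token(row_key)) % len(self.tokens)
        count = min(self.replication_factor, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


# Hypothetical four-node cluster keeping three copies of every row.
ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
```

Because the mapping depends only on the key and the node list, any node can compute `owners` for a request, which is why no master node is needed in this model.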
Features of Cassandra
• Cassandra Features:
• Elastic scalability - Cassandra is highly scalable; it lets you add more hardware to
accommodate more customers and more data as required.
• Always on architecture - Cassandra has no single point of failure and it is continuously
available for business-critical applications that cannot afford a failure.
• Fast linear-scale performance - Cassandra is linearly scalable, i.e., it increases your
throughput as you increase the number of nodes in the cluster. Therefore it maintains a
quick response time.
• Flexible data storage - Cassandra accommodates structured, semi-structured, and
unstructured data. It can dynamically accommodate changes to
your data structures as your needs change.
• Easy data distribution - Cassandra provides the flexibility to distribute data where you
need by replicating data across multiple data centers.
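• Transaction support - Cassandra does not offer full ACID (Atomicity, Consistency,
Isolation, and Durability) transactions in the relational sense. Writes are atomic, isolated,
and durable at the level of a single row, and consistency is tunable per operation.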
• Fast writes - Cassandra was designed to run on cheap commodity hardware. It performs
blazingly fast writes and can store hundreds of terabytes of data, without sacrificing the
read efficiency.
Components of Cassandra
• Cassandra uses the Gossip Protocol in the background to allow the nodes
to communicate with each other and detect any faulty nodes in the cluster.
• The key components of Cassandra are as follows −
• Node − It is the place where data is stored.
• Data center − It is a collection of related nodes.
• Cluster − A cluster is a component that contains one or more data centers.
• Commit log − The commit log is a crash-recovery mechanism in
Cassandra. Every write operation is written to the commit log.
• Mem-table − A mem-table is a memory-resident data structure. After the write is
recorded in the commit log, the data is written to the mem-table. Sometimes a
single column family has multiple mem-tables.
• SSTable − It is a disk file to which the data is flushed from the mem-table
when its contents reach a threshold value.
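• Bloom filter − A Bloom filter is a quick, probabilistic data structure
for testing whether an element is a member of a set. It can report false
positives but never false negatives, so it acts as a special kind of
cache. Cassandra consults Bloom filters on reads to skip SSTables that
cannot contain the requested key.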
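The Bloom filter entry above can be illustrated with a minimal sketch. The class name, the bit-array size, the number of hash functions, and the use of SHA-256 are all assumptions chosen for readability; none of them reflect Cassandra's actual parameters. The point is only the behaviour: a key that was added is always reported as possibly present, while a key that was never added is usually, but not always, reported as absent.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array plus several hash positions per item."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, item):
        # Derive num_hashes positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical set of row keys stored in one SSTable.
sstable_keys = BloomFilter()
for key in ("alice", "bob", "carol"):
    sstable_keys.add(key)
```

When `might_contain` returns False for a key, a reader could skip that disk file entirely, which is the saving such a filter is meant to provide.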
Apache Cassandra data types
• Apache Cassandra supports the most
common data types, including ascii, bigint, blob,
boolean, counter, decimal, double, float, int, text,
timestamp, uuid, varchar, and varint.
• Cassandra's data model offers the convenience of
column indexes with the performance of
log-structured updates, strong support for
denormalization and materialized views, and
built-in caching.
• Data access is performed using Cassandra Query
Language (CQL), which resembles SQL.
Cassandra Query Language
• Users can access Cassandra through its nodes using
Cassandra Query Language (CQL). CQL treats the
database (keyspace) as a container of tables.
Programmers work with CQL either through cqlsh, an
interactive prompt, or through application language drivers.
• Clients can contact any of the nodes for their read and write
operations. That node (the coordinator) acts as a proxy
between the client and the nodes holding the data.
• Data storage in Cassandra is row-oriented, meaning that
all contents of a row are serialized together on disk.
Every row of columns has its own unique key. Each row can
hold up to 2 billion columns. Furthermore, each row
must fit onto a single server, because data is partitioned
solely by row key.
• To understand why databases like Cassandra, HBase and
BigTable (I’ll call them DSS, Distributed Storage
Services, from now on) were designed the way they are,
we’ll first have to understand what they were built to be
used for.
• DSS were designed to handle enormous
amounts of data, stored in billions of rows on large clusters.
Relational databases incorporate many features that make it hard to
distribute them efficiently over multiple machines. DSS simply
remove some or all of these ties. Operations that require scanning
extensive parts of the dataset are not allowed, meaning no JOINs
or rich queries.
• Cassandra is a NoSQL column-family implementation that supports
the Bigtable data model and uses architectural ideas introduced
by Amazon Dynamo.
Column family
• Cassandra consists of many storage nodes and stores each row
within a single storage node. Within each row, Cassandra
always stores columns sorted by their column names. Using
this sort order, Cassandra supports slice queries where given a
row, users can retrieve a subset of its columns falling within a
given column name range. For example, a slice query with
range tag0 to tag9999 will get all the columns whose names
fall between tag0 and tag9999.
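The slice query described above can be sketched as follows. The `Row` class and its methods are hypothetical names used for illustration, and the sketch assumes column names are compared as plain strings. It shows why keeping columns sorted makes a range lookup cheap: the two ends of the range are found by binary search and everything between them is returned in order.

```python
import bisect


class Row:
    """Toy row that keeps its columns sorted by column name."""

    def __init__(self):
        self._names = []   # column names, always kept in sorted order
        self._values = {}  # column name -> value

    def insert(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start, end):
        """Return (name, value) pairs with start <= name <= end, in sorted order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


row = Row()
for name in ("tag0", "tag15", "tag9999", "title", "author"):
    row.insert(name, f"value-of-{name}")

# Columns whose names fall between tag0 and tag9999 (string comparison).
tags = row.slice("tag0", "tag9999")
```

Note that with string comparison a name such as tag50000 would also fall inside this range, since "5" sorts before "9"; the range is defined by the sort order, not by the numeric suffix.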
• Keyspace – a group of many column families together. It is
only a logical grouping of column families and provides an
isolated scope for names.
• Finally, a super column resides within a column family and
groups several columns under one key.
• Cassandra provides very fast writes, which are actually
faster than its reads; it can transfer about 80-360 MB/sec
of data per node. It achieves this using two
techniques. Cassandra keeps most of the data in memory
at the responsible node, and any updates are done in
memory and written to persistent storage (the file system) in a
lazy fashion. To avoid losing data, however, Cassandra writes
all transactions to a commit log on disk. Unlike updating
data items on disk, writes to the commit log are append-only
and, therefore, avoid rotational delay while writing to the
disk. For more information on disk-drive performance
characteristics, see Resources.
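The write path described above can be sketched as a small in-memory model. Everything here is an assumption made for illustration: the `StorageNode` class, the tiny flush threshold, and the use of Python lists and dicts in place of real files. Clearing the log after a flush is also a simplification. The sketch shows the order of operations: append to the commit log first, update memory second, and write to persistent storage lazily once enough data has accumulated.

```python
class StorageNode:
    """Toy model of one node's write path."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.commit_log = []  # stands in for an append-only file on disk
        self.memtable = {}    # in-memory structure holding recent writes
        self.sstables = []    # immutable, sorted "files" flushed from memory

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. sequential append for durability
        self.memtable[key] = value            # 2. update in memory
        if len(self.memtable) >= self.flush_threshold:
            self._flush()                     # 3. lazy write to persistent storage

    def _flush(self):
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # flushed entries are no longer needed for recovery

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest flushed data first
            for k, v in table:
                if k == key:
                    return v
        return None

    def recover(self):
        """Rebuild the memtable from the commit log after a simulated crash."""
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value


node = StorageNode(flush_threshold=3)
for i in range(4):
    node.write(f"k{i}", i)
```

After these four writes the first three keys have been flushed to one sorted table, while the fourth lives only in memory and in the commit log, so it can be rebuilt with `recover()` if the in-memory state is lost.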
• Unless a write requests full consistency, Cassandra writes the data to enough
nodes to satisfy the request without resolving any inconsistencies among replicas;
it resolves them only at the first read. This process is called "read repair."
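Read repair can be sketched as follows. The `Replica` class and the `write` and `read_with_repair` functions are hypothetical, and the sketch assumes that each value carries a timestamp and that the newest timestamp wins. It shows a write that reaches only some replicas, followed by a read that notices the stale copy and fixes it.

```python
class Replica:
    """Toy replica storing key -> (timestamp, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def put(self, key, timestamp, value):
        current = self.data.get(key)
        # Keep the incoming value only if it is newer than what is stored.
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)


def write(replicas, key, timestamp, value, required_acks):
    """Write to only the first required_acks replicas; the rest stay stale."""
    for replica in replicas[:required_acks]:
        replica.put(key, timestamp, value)


def read_with_repair(replicas, key):
    """Return the newest value and push it to any replica that lags behind."""
    versions = [r.data[key] for r in replicas if key in r.data]
    if not versions:
        return None
    newest = max(versions, key=lambda version: version[0])
    for replica in replicas:
        replica.put(key, newest[0], newest[1])  # the repair step
    return newest[1]


replicas = [Replica("a"), Replica("b"), Replica("c")]
write(replicas, "color", 1, "red", required_acks=3)
write(replicas, "color", 2, "blue", required_acks=2)  # replica "c" misses this

stale_before = replicas[2].data["color"][1]
value = read_with_repair(replicas, "color")
```

Before the read, replica "c" still holds the old value; after it, all three replicas agree on the newest one.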
• Healing from failure is manual
• If a node in a Cassandra cluster has failed, the cluster will continue to work if
you have replicas. Full recovery, which redistributes data and compensates
for missing replicas, is a manual operation performed through a command-line tool
called nodetool. Also, while the manual operation runs, the system will be
unavailable.
• It remembers deletes
• Cassandra is designed to continue working without a problem even if a
node goes down (or gets disconnected) and comes back later. One consequence is
that data deletions become more complicated. For example, assume a node is down and,
while it is down, a data item is deleted on the other replicas. When the unavailable node
comes back, it will reintroduce the deleted data item during the syncing process
unless Cassandra remembers that the item has been deleted. Cassandra therefore keeps
a deletion marker, called a tombstone, in place of the deleted item.
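The deletion problem above can be sketched with two toy replicas. The `Replica` class, the `TOMBSTONE` sentinel, and the `sync` function are hypothetical names, and the sketch assumes timestamped entries where the newer timestamp wins. It shows that recording a delete as a marker, rather than simply removing the entry, stops a returning node from reintroducing the deleted item.

```python
TOMBSTONE = object()  # sentinel meaning "this key was deleted"


class Replica:
    """Toy replica storing key -> (timestamp, value or TOMBSTONE)."""

    def __init__(self):
        self.data = {}

    def apply(self, key, timestamp, value):
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def delete(self, key, timestamp):
        # A delete is just a newer write whose value is the marker.
        self.apply(key, timestamp, TOMBSTONE)

    def get(self, key):
        entry = self.data.get(key)
        if entry is None or entry[1] is TOMBSTONE:
            return None
        return entry[1]


def sync(a, b):
    """Exchange entries in both directions; the newer timestamp wins."""
    for src, dst in ((a, b), (b, a)):
        for key, (timestamp, value) in list(src.data.items()):
            dst.apply(key, timestamp, value)


up, down = Replica(), Replica()
for replica in (up, down):
    replica.apply("cart:7", 1, "3 items")

# "down" goes offline, so the delete reaches only "up".
up.delete("cart:7", 2)

# "down" comes back and the two replicas sync.
sync(up, down)
```

If `delete` had instead removed the key from `up.data`, the sync would have copied the old entry from `down` back to `up`, which is exactly the resurrection the text describes.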

Cassandra tutorial

  • 1. Cassandra Tutorial Apache Cassandra is a free open source and distributed database management system.It is highly scalable and designed to manage very large amounts of structured data. It provides high availability with no single point of failure.
  • 2. NoSQLDatabase • A NoSQL database (sometimes called as Not Only SQL) is a database that provides a mechanism to store and retrieve data other than the tabular relations used in relational databases. These databases are schema-free, support easy replication, have simple API, eventually consistent, and can handle huge amounts of data. • The primary objective of a NoSQL database is to have • simplicity of design, • horizontal scaling • finer control over availability. • NoSql databases use different data structures compared to relational databases. It makes some operations faster in NoSQL. The suitability of a given NoSQL database depends on the problem it must solve.
  • 3. • Apache Cassandra is an open source distributed database system that is designed for storing and managing large amounts of data across commodity servers. Cassandra can serve as both a real-time operational data store for online transactional applications and a read-intensive database for large-scale business intelligence systems. • Originally created for facebook, Cassandra is designed to have peer to peer symmetric nodes, instead of master or named nodes, to ensure there can never be a single point of failure Cassandra automatically partitions data across all the nodes in the database cluster, but the administrator has the power to determine what data will be replicated and how many copies of the data will be created.
  • 4. Features of Cassandra • Cassandra Features: • Elastic scalability - Cassandra is highly scalable; it allows to add more hardware to accommodate more customers and more data as per requirement. • Always on architecture - Cassandra has no single point of failure and it is continuously available for business-critical applications that cannot afford a failure. • Fast linear-scale performance - Cassandra is linearly scalable, i.e., it increases your throughput as you increase the number of nodes in the cluster. Therefore it maintains a quick response time. • Flexible data storage - Cassandra accommodates all possible data formats including: structured, semi-structured, and unstructured. It can dynamically accommodate changes to your data structures according to your need. • Easy data distribution - Cassandra provides the flexibility to distribute data where you need by replicating data across multiple data centers. • Transaction support - Cassandra supports properties like Atomicity, Consistency, Isolation, and Durability (ACID). • Fast writes - Cassandra was designed to run on cheap commodity hardware. It performs blazingly fast writes and can store hundreds of terabytes of data, without sacrificing the read efficiency.
  • 5. Components of Cassandra • Cassandra uses the Gossip Protocol in the background to allow the nodes to communicate with each other and detect any faulty nodes in the cluster. • The key components of Cassandra are as follows − • Node − The place where data is stored. • Data center − A collection of related nodes. • Cluster − A component that contains one or more data centers. • Commit log − The crash-recovery mechanism in Cassandra. Every write operation is written to the commit log. • Mem-table − A memory-resident data structure. After the commit log, the data is written to the mem-table. Sometimes there will be multiple mem-tables for a single column family. • SSTable − A disk file to which data is flushed from the mem-table when its contents reach a threshold value. • Bloom filter − A quick, probabilistic data structure for testing whether an element is a member of a set; it can report false positives but never false negatives. It acts as a special kind of cache. Bloom filters are consulted on reads to check whether an SSTable may contain the requested data.
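The Bloom filter listed among the components above can be sketched in a few lines. This is a minimal illustration of the general technique, not Cassandra's implementation: the `BloomFilter` class, its default sizes, and the SHA-256-based hashing are assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple (a real filter packs bits).
        self._bits = bytearray(num_bits)

    def _positions(self, item: str):
        # Derive num_hashes positions by salting the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self._bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self._bits[pos] for pos in self._positions(item))
```

A read path could call `might_contain(row_key)` for each SSTable and skip the disk lookup whenever the answer is `False`.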
  • 6. Apache Cassandra data types • The Apache Cassandra NoSQL DBMS supports the most common data types, including ascii, bigint, blob, boolean, counter, decimal, double, float, int, text, timestamp, uuid, varchar and varint. • Cassandra's data model offers the convenience of column indexes with the performance of log-structured updates, strong support for denormalization and materialized views, and built-in caching. • Data access is performed using Cassandra Query Language (CQL), which resembles SQL.
  • 7. Cassandra Query Language • Users can access Cassandra through its nodes using Cassandra Query Language (CQL). CQL treats the database (keyspace) as a container of tables. Programmers work with CQL either through cqlsh, an interactive prompt, or through separate application-language drivers. • Clients can contact any of the nodes for their read and write operations. That node (the coordinator) acts as a proxy between the client and the nodes holding the data.
  • 8. • Data storage in Cassandra is row-oriented, meaning that all contents of a row are serialized together on disk. Every row has a unique key. Each row can hold up to 2 billion columns. Furthermore, each row must fit onto a single server, because data is partitioned solely by row key. • To understand why databases like Cassandra, HBase and Bigtable (I'll call them DSS, Distributed Storage Services, from now on) were designed the way they are, we'll first have to understand what they were built to be used for.
  • 9. • DSS were designed to handle enormous amounts of data, stored in billions of rows on large clusters. Relational databases incorporate a lot of things that make it hard to distribute them efficiently over multiple machines. DSS simply remove some or all of these ties. Operations that require scanning extensive parts of the dataset are not allowed, meaning no JOINs or rich queries. • Cassandra is a NoSQL column-family implementation supporting the Bigtable data model using the architectural aspects introduced by Amazon Dynamo.
  • 10. Column family • Cassandra consists of many storage nodes and stores each row within a single storage node. Within each row, Cassandra always stores columns sorted by their column names. Using this sort order, Cassandra supports slice queries: given a row, users can retrieve the subset of its columns whose names fall within a given range. For example, a slice query with range tag0 to tag9999 will get all the columns whose names fall between tag0 and tag9999. • Keyspace – a group of column families. It is only a logical grouping of column families and provides an isolated scope for names. • Finally, super columns reside within a column family and group several columns under one key.
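The slice query described on this slide follows directly from keeping column names sorted. The sketch below is an assumption-laden illustration: the `Row` class and its `put` and `slice` methods are hypothetical names, and column names are compared as plain strings, so the ordering is lexical rather than numeric.

```python
import bisect


class Row:
    """Toy row that keeps its column names sorted, as the slide describes."""

    def __init__(self):
        self._names = []    # column names, always kept in sorted order
        self._values = {}   # column name -> value

    def put(self, name: str, value) -> None:
        if name not in self._values:
            bisect.insort(self._names, name)  # insert while preserving order
        self._values[name] = value

    def slice(self, start: str, end: str):
        """Return (name, value) pairs whose names fall in [start, end]."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        return [(name, self._values[name]) for name in self._names[lo:hi]]
```

Because the comparison is lexical, a slice from `tag0` to `tag9999` also returns a column named `tag10` (it sorts between `tag0` and `tag5`), while a column named `tagz` falls outside the range.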
  • 11. • Cassandra provides very fast writes, which are actually faster than reads; it can transfer about 80-360 MB/sec per node. It achieves this using two techniques. First, Cassandra keeps most of the data in memory at the responsible node; updates are applied in memory and written to persistent storage (the file system) lazily. Second, to avoid losing data, Cassandra writes all transactions to a commit log on disk. Unlike in-place updates of data items on disk, writes to the commit log are append-only and therefore avoid rotational delay while writing to the disk. For more information on disk-drive performance characteristics, see Resources.
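The write path described above (append to a commit log, update memory, flush to disk lazily) can be sketched as follows. This is a simplified model under stated assumptions: the `Node` class, the `flush_threshold` parameter, and the in-memory lists standing in for on-disk files are all hypothetical, and the sketch ignores compaction and replication.

```python
class Node:
    """Toy single-node write path: commit log, then mem-table, then SSTable."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only; stands in for the on-disk log
        self.memtable = {}     # memory-resident, mutable
        self.sstables = []     # immutable flushed tables, newest last
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. sequential append for durability
        self.memtable[key] = value            # 2. update in memory
        if len(self.memtable) >= self.flush_threshold:
            self._flush()                     # 3. lazy write to persistent storage

    def _flush(self):
        # Write the mem-table out as a sorted, immutable table.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed writes no longer need to be replayed

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest flushed data wins
            if key in sstable:
                return sstable[key]
        return None

    def recover(self):
        """Rebuild the mem-table by replaying the commit log after a crash."""
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value
```

Reads in this model may have to check the mem-table and several SSTables, while a write is one append plus one in-memory update, which is one way to see why writes can be cheaper than reads.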
  • 12. • Unless a write requests full consistency, Cassandra writes the data to enough nodes to satisfy the request without resolving inconsistencies between replicas; it resolves them only at the first read. This process is called "read repair." • Healing from failure is manual • If a node in a Cassandra cluster fails, the cluster continues to work as long as replicas exist. Full recovery, which means redistributing data and compensating for missing replicas, is a manual operation performed with a command-line tool called nodetool. Also, while the manual operation happens, the system will be unavailable. • It remembers deletes • Cassandra is designed to keep working without a problem even if a node goes down (or gets disconnected) and comes back later. A consequence is that data deletions become more complicated. For example, assume a node is down and, while it is down, a data item is deleted on the other replicas. When the unavailable node comes back, it will reintroduce the deleted data item during syncing unless Cassandra remembers that the item has been deleted.
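Read repair and remembered deletes, as described on this slide, can both be modeled with timestamped values and a deletion marker. The following is only a sketch: the `Replica` class, the `read_with_repair` function, and the integer timestamps are assumptions for illustration, and the marker plays the role of what Cassandra calls a tombstone.

```python
TOMBSTONE = object()  # marker meaning "this key was deleted"


class Replica:
    """Toy replica storing key -> (timestamp, value or TOMBSTONE)."""

    def __init__(self):
        self.data = {}

    def apply(self, key, timestamp, value):
        # Last write wins: keep whichever version has the newer timestamp.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)


def read_with_repair(replicas, key):
    """Return the newest value and push it to any replica that is behind."""
    versions = [r.data[key] for r in replicas if key in r.data]
    if not versions:
        return None
    newest = max(versions, key=lambda version: version[0])
    for replica in replicas:
        replica.apply(key, *newest)  # repair stale or missing copies
    return None if newest[1] is TOMBSTONE else newest[1]
```

If two replicas record a delete at timestamp 2 while a third was down and still holds the value written at timestamp 1, a later `read_with_repair` returns `None` and overwrites the stale copy with the deletion marker, so the deleted item is not reintroduced.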