Cassandra
Robert Koletka
What is Cassandra
●   Basically a key-value store
    ●   with a richer data model on top (column families, sorted columns, secondary indexes).


●   It is a NoSQL database that is
    ●   Decentralized: no single point of failure
    ●   Elastic: scales linearly as nodes are added
    ●   Fault tolerant: data is replicated across nodes
    ●   Optimized for writes, though reads also perform
        well.
What is Cassandra
●   Based on two papers
    ●   Bigtable: Google
    ●   Dynamo: Amazon
●   Dynamo partitioning and replication
●   Bigtable data model
●   CAP theorem
    ●   Consistent: not strictly; eventually consistent, with consistency tunable per request
    ●   Available: yes
    ●   Partition tolerant: yes
What is Cassandra
●   Uses consistent hashing to distribute rows across the nodes of a ring
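A minimal sketch of consistent hashing, assuming one token per node and MD5 as the hash (the idea behind RandomPartitioner; this is not Cassandra's code). Each key belongs to the first node clockwise from the key's token. The `Ring` class and node names are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Map a key onto the ring by hashing it (MD5 here, as an illustrative choice).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hashing ring in which each node owns exactly one token."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("ring needs at least one node")
        # (token, node) pairs kept sorted by token around the ring.
        self._entries = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._entries]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first node whose token is >= the key's token,
        # wrapping around to the start of the ring if necessary.
        i = bisect.bisect_left(self._tokens, token(key)) % len(self._tokens)
        return self._entries[i][1]
```

The useful property: adding a node moves only the keys in the arc that the new node takes over. Every other key keeps its previous owner.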
Data Model
●   Cluster
●   Keyspace : like a DB
●   Column Families : like a Table
●   Super Columns (optional)
●   Columns
●   Values
Data Model
●   Keyspace groups column families together
●   Column Family groups data together
●   Example :
    ●   User Keyspace has
        –   UserProfiles Column Family
        –   Friends Column Family
Data Model
●   Cassandra doesn't require schemas like
    traditional databases
●   UserProfiles Example
    ●   Rk = {Name:Robert, Surname:Koletka,
        Gender:Male}
    ●   Js = {Name:John, Surname:Smith, Location:WC}
●   Rk & Js are both valid rows in the UserProfiles
    column family even though they have different columns.
Data Model
●   Design around QUERIES, not around normalizing data.
●   Use case: “I want to get my friends' names and
    surnames for a given UserID”
●   Name & Surname need to be stored (denormalized) in the
    Friends column family.
●   Js = {Ac:{Name:Alice,Surname:Cook}, Bb:
    {Name:Betty,Surname:Blah}} (Super)
●   Js = {Ac:"Alice Cook", Bb:"Betty Blah"} (standard columns)
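To illustrate why denormalizing helps, here is a sketch that uses plain Python dicts as a stand-in for the Friends column family (row key -> column name -> value). The function names are hypothetical. Both layouts answer the use case with a single row lookup and no join against UserProfiles. Iterating over sorted column names mimics the sorted column order within a row.

```python
# Super-column layout: friend id -> {sub-column name -> value}.
friends_super = {
    "Js": {
        "Ac": {"Name": "Alice", "Surname": "Cook"},
        "Bb": {"Name": "Betty", "Surname": "Blah"},
    }
}

# Standard-column layout: friend id -> "Name Surname".
friends_standard = {
    "Js": {"Ac": "Alice Cook", "Bb": "Betty Blah"},
}


def friend_names_super(user_id):
    # One row read returns every friend's name and surname.
    row = friends_super.get(user_id, {})
    return [(row[c]["Name"], row[c]["Surname"]) for c in sorted(row)]


def friend_names_standard(user_id):
    # Same query against the flat layout; split the stored full name.
    row = friends_standard.get(user_id, {})
    return [tuple(row[c].split(" ", 1)) for c in sorted(row)]
```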
Data Model
●   Column
    ●   Rowkey = {ColumnName:Value, CN:V, CN:V}
●   Super Column
    ●   Rowkey = {SuperColumnName:{CN:V, CN:V},
               SCN:{CN:V, CN:V}}
●   Super Columns group columns together
●   Sub-columns cannot be indexed.
Define Keyspace
●   create keyspace <keyspace> with <att1>=<value1> and
    <att2>=<value2> ...;
●   create keyspace UserKeyspace with placement_strategy
    = 'org.apache.cassandra.locator.SimpleStrategy' and
    strategy_options = {replication_factor:2};
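●   SimpleStrategy – places the first replica on the node that owns
    the key and further replicas on the next nodes around the ring
●   NetworkTopologyStrategy – for multiple data centers; replica
    counts are set per data center
●   OldNetworkTopologyStrategy – spreads replicas across different
    data centers and different racks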
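A sketch of SimpleStrategy-style placement, assuming one token per node and that a node owns every key token up to and including its own. This illustrates the idea and is not Cassandra's implementation. The function name and the small integer tokens are hypothetical.

```python
import bisect


def simple_strategy_replicas(ring_tokens, key_token, replication_factor):
    """Return the nodes holding replicas of a key (illustrative sketch).

    ring_tokens: list of (token, node) pairs, one token per node.
    The first replica goes to the node owning key_token; the remaining
    replicas go to the following nodes clockwise around the ring.
    """
    entries = sorted(ring_tokens)
    tokens = [t for t, _ in entries]
    n = len(entries)
    if replication_factor > n:
        raise ValueError("replication factor exceeds the number of nodes")
    start = bisect.bisect_left(tokens, key_token) % n
    return [entries[(start + i) % n][1] for i in range(replication_factor)]
```

With the `replication_factor:2` from the keyspace example above, each key lands on its owner and the next node clockwise.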
Define Column Family
●   create column family <name> with
    <att1>=<value1> and <att2>=<value2>...;
●   create column family UserProfiles with
    comparator=UTF8Type and
    default_validation_class=UTF8Type and
    key_validation_class=UTF8Type and
    column_metadata=[{column_name:Location,
    validation_class:UTF8Type,
    index_type:KEYS}];
Define Column Family
●   comparator = validates column names and determines
    the order in which columns are sorted
●   default_validation_class = validates values of
    columns that are not listed in
    column_metadata
●   key_validation_class = validates row keys
●   The default for each of these is BytesType
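Because the comparator decides sort order, the same column names can come back in different orders. The sketch below approximates UTF8Type-style and LongType-style ordering with Python sort keys. These are stand-ins and not Cassandra's comparator classes, and the names 9, 10, 100, 2 are hypothetical.

```python
def utf8_order(names):
    # UTF8Type-style: names compared as text, character by character.
    return sorted(names, key=str)


def long_order(names):
    # LongType-style: names compared as integers.
    return sorted(names, key=int)
```

Compared as text, "10" and "100" sort before "2". That is why numeric column names such as timestamps usually call for a numeric comparator.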
Define Column Family
●   Other Available Types
    ●   AsciiType
    ●   BytesType
    ●   CounterColumnType (distributed counter column; a CF
        contains either only counters or none at all)
    ●   Int32Type
    ●   IntegerType (a generic variable-length integer type)
    ●   LexicalUUIDType
    ●   LongType
    ●   UTF8Type
Define Column Family
●   Many more options
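    ●   bloom_filter_fp_chance : target false-positive rate of the
        SSTable bloom filters
    ●   gc_grace : how long (in seconds) deleted data is kept before
        it can be garbage-collected
    ●   keys_cached : how many row keys to cache
    ●   row_cache_save_period : how often (in seconds) the row cache
        is saved to disk
    ●   max_compaction_threshold : maximum number of SSTables
        compacted at once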
    ●   ...
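To show what `bloom_filter_fp_chance` trades off, here is a minimal Bloom filter sketch. It uses salted SHA-256 as the hash family, which is an arbitrary choice and not Cassandra's implementation. It also gives the standard false-positive approximation (1 - e^(-kn/m))^k for m bits, k hashes, and n keys. A Bloom filter can say "maybe present" for a missing key but never "absent" for a present one, so a lower false-positive target costs more bits.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter sketch: m bits, k hash functions."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def expected_fp_rate(m: int, k: int, n: int) -> float:
    # Standard approximation of the false-positive probability.
    return (1 - math.exp(-k * n / m)) ** k
```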
Reads and Writes
●   Cassandra is optimized for writes
    ●   Each write goes first to the commit log
    ●   Then to an in-memory table (memtable)
    ●   Memtables are periodically flushed to disk as SSTables
●   Reads
    ●   Merge data from the memtable and the SSTables
    ●   Bloom filters skip SSTables that cannot contain the row
●   Compaction
    ●   Cassandra periodically merges SSTables
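The write path, read path, and compaction above can be sketched as a toy single-node store. This is an illustration under simplifying assumptions and not Cassandra's code. Plain lists and dicts stand in for the commit log, memtable, and SSTables. A membership test stands in for the bloom filter and index checks. Deletes and tombstones are omitted. `ToyStore` and `flush_threshold` are hypothetical names.

```python
class ToyStore:
    """Toy single-node write/read path; an illustrative sketch only."""

    def __init__(self, flush_threshold=2):
        self.commit_log = []   # append-only log, replayed after a crash
        self.memtable = {}     # key -> (timestamp, value), in memory
        self.sstables = []     # flushed tables: immutable dicts sorted by key
        self.flush_threshold = flush_threshold

    def write(self, key, value, timestamp):
        # 1. Append to the commit log for durability.
        self.commit_log.append((key, value, timestamp))
        # 2. Update the memtable; the newer timestamp wins.
        current = self.memtable.get(key)
        if current is None or timestamp >= current[0]:
            self.memtable[key] = (timestamp, value)
        # 3. Flush to an SSTable once the memtable is "full".
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # flushed data no longer needs replaying

    def read(self, key):
        # Gather every version of the key; "key in table" stands in for the
        # bloom filter and index checks a real SSTable read would do.
        versions = [table[key] for table in self.sstables if key in table]
        if key in self.memtable:
            versions.append(self.memtable[key])
        if not versions:
            return None
        return max(versions, key=lambda tv: tv[0])[1]

    def compact(self):
        # Merge all SSTables into one, keeping the newest version per key.
        merged = {}
        for table in self.sstables:
            for key, (ts, value) in table.items():
                if key not in merged or ts >= merged[key][0]:
                    merged[key] = (ts, value)
        self.sstables = [dict(sorted(merged.items()))] if merged else []
```

Writes only ever append or update in memory, which is why the write path is cheap. A read, by contrast, may have to consult several SSTables until compaction merges them.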
Indexes
●   Row Keys
    ●   Cassandra keeps an index of its row keys
●   Column Indexes
    ●   Known as secondary indexes: an index on
        column values
    ●   Existing data is indexed in the background
    ●   Queries need an equality predicate on an indexed column
        –   Additional filters can then be applied
Indexes
●   Get userprofiles where location = 'WC'
●   Get userprofiles where location = 'WC' and age
    > 18
●   NOT allowed (no equality predicate on an indexed column)
    ●   Get userprofiles where age > 18
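A toy model of a KEYS-style secondary index, to show why a query needs an equality predicate. The index maps each value of the indexed column to a set of row keys, and that equality lookup picks the candidate rows. Any other predicate (such as `age > 18`) can only filter those candidates. The class and method names are hypothetical. The `Age` column is an assumed example, since the original UserProfiles rows do not define it.

```python
from collections import defaultdict


class IndexedColumnFamily:
    """Toy column family with one KEYS-style secondary index (illustrative)."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                 # row key -> {column: value}
        self.index = defaultdict(set)  # indexed value -> set of row keys

    def insert(self, row_key, columns):
        old = self.rows.get(row_key, {})
        # Drop the row from its old index entry before updating.
        if self.indexed_column in old:
            self.index[old[self.indexed_column]].discard(row_key)
        merged = {**old, **columns}
        self.rows[row_key] = merged
        if self.indexed_column in merged:
            self.index[merged[self.indexed_column]].add(row_key)

    def query(self, equals, extra_filter=None):
        # The mandatory equality predicate selects candidates via the index...
        candidates = sorted(self.index.get(equals, set()))
        # ...and any further predicate only filters those candidates.
        if extra_filter is not None:
            candidates = [k for k in candidates if extra_filter(self.rows[k])]
        return candidates
```

The signature has no way to express "age > 18" on its own, which mirrors the restriction above.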
Consistency
●   Consistency is configurable per request
    ●   Read
        –   One, Quorum (floor(Replication Factor / 2) + 1), Local/Each_Quorum
            (data centers), All.
    ●   Write
        –   Any, One, Quorum, Local/Each_Quorum (data centers), All.
●   Any means a write succeeds even if all replicas are down: the
    co-ordinator keeps a hint and replays it when the replicas come back up.
●   Quorum gives reasonable consistency while tolerating
    some failures.
●   All means every replica must be up.
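The quorum arithmetic can be written out directly. The overlap rule R + W > RF is the standard argument for why quorum reads see quorum writes. It is added here as background and is not stated on the slide. The function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    # QUORUM = floor(RF / 2) + 1
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    # If R + W > RF, every read set overlaps every write set, so at least
    # one replica consulted by the read has the latest acknowledged write.
    return read_replicas + write_replicas > replication_factor


def tolerated_failures(replication_factor: int, replicas_needed: int) -> int:
    # How many replicas can be down while the request still succeeds.
    return replication_factor - replicas_needed
```

With the `replication_factor:2` used in the keyspace example, a quorum is 2, so Quorum tolerates no down replicas. With a replication factor of 3, it tolerates one.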
Consistency
●   Read
    ●   At least one replica must be up to read from.
    ●   Reads from as many replicas as the consistency level
        requires and returns the data with the latest timestamp.
    ●   Read repair keeps replicas consistent by updating
        out-of-date replicas with the latest data. It runs in the
        background.
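A simplified sketch of timestamp-based read resolution followed by read repair. Each replica is modelled as a dict of key -> (timestamp, value). For simplicity the repair here runs synchronously and only on the replicas contacted for the read, whereas Cassandra performs it in the background. `read_with_repair` and the replica names are hypothetical.

```python
def read_with_repair(replicas, key, contacted):
    """Return the newest value of key and repair stale contacted replicas.

    replicas: replica name -> {key: (timestamp, value)}
    contacted: names of the replicas queried for this read (how many
    depends on the consistency level).
    """
    responses = [replicas[name][key] for name in contacted if key in replicas[name]]
    if not responses:
        return None
    newest = max(responses, key=lambda tv: tv[0])
    # Read repair: overwrite missing or out-of-date copies with the newest version.
    for name in contacted:
        current = replicas[name].get(key)
        if current is None or current[0] < newest[0]:
            replicas[name][key] = newest
    return newest[1]
```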
Cassandra Query Language
●   Allows for
    ●   Select
        –   SELECT [FIRST N] [REVERSED] <SELECT EXPR> FROM <COLUMN FAMILY> [USING
            <CONSISTENCY>] [WHERE <CLAUSE>] [LIMIT N];
        –   SELECT [FIRST N] [REVERSED] name1..nameN FROM
        –   Unlike SQL, there is no guarantee that the requested columns will be returned
        –   SELECT ... WHERE KEY >= startkey and KEY <= endkey AND name1 = value1
    ●   Insert
    ●   Delete
    ●   Update
    ●   Batch
    ●   Truncate
    ●   Create Keyspace
    ●   Create Column Family
    ●   Create Index
    ●   Drop
Other Stuff
●   Cassandra stores columns in sorted order
    ●   Lets you get the first or last X columns of a row
    ●   Useful for storing historical user data
●   A single column value cannot hold more than 2 GB
●   The maximum number of columns per row is 2 billion
●   Row keys and column names must be smaller than 64 KB
●   Most languages have client libraries (Python,
    Java, Scala, Node.js, PHP, C++...)
●   Avoid using the raw Thrift API directly.
Last Example
●   User Statuses
●   Columns are stored in sorted order, so use a timestamp as the column
    name
●   Rk = {1:'Good morning all',2:'lunch was good',3:'time to get
    drunk',4:'so many regrets from last night'}
●   create column family UserStatuses with comparator =
    LongType and key_validation_class=UTF8Type and
    default_validation_class=UTF8Type;
●   Get the last X columns (newest statuses) or the first X
    columns (oldest statuses)
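A toy model of one UserStatuses row. Column names (timestamps) are kept sorted on insert, similar in spirit to a LongType comparator, so the first and last X columns are simple slices. The class and method names are hypothetical and not a client-library API.

```python
import bisect


class SortedRow:
    """Toy row whose columns stay sorted by integer name (illustrative)."""

    def __init__(self):
        self._names = []   # sorted column names (timestamps)
        self._values = {}  # column name -> value

    def insert(self, name: int, value: str) -> None:
        # Keep names sorted; overwriting an existing column keeps its position.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def first(self, n: int):
        # Oldest n statuses.
        return [(c, self._values[c]) for c in self._names[:n]]

    def last(self, n: int):
        # Newest n statuses, newest first (like a reversed slice).
        if n <= 0:
            return []
        return [(c, self._values[c]) for c in reversed(self._names[-n:])]
```

Columns can arrive in any order and still come back sorted. The `n <= 0` guard is needed because a `[-0:]` slice would return the whole row.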
