SlideShare ist ein Scribd-Unternehmen logo
1 von 50
Downloaden Sie, um offline zu lesen
Cassandra Explained


      Disruptive Code
    September 22, 2010

            Eric Evans
    eevans@rackspace.com
           @jericevans
    http://blog.sym-link.com
Outline
●   Background
●   Description
●   API(s)
●   Code Samples
Background
The Digital Universe
Consolidation
Old Guard
Vertical Scaling Sucks
Influential Papers
●   BigTable
    ● Strong consistency
    ● Sparse map data model


    ● GFS, Chubby, et al


●   Dynamo
    ●   O(1) distributed hash table (DHT)
    ●   BASE (aka eventual consistency)
    ●   Client tunable consistency/availability
NoSQL
●   HBase          ●   Hypertable
●   MongoDB        ●   HyperGraphDB
●   Riak           ●   Memcached
●   Voldemort      ●   Tokyo Cabinet
●   Neo4J          ●   Redis
●   Cassandra      ●   CouchDB
NoSQL Big data
●   HBase           ●   Hypertable
●   MongoDB         ●   HyperGraphDB
●   Riak            ●   Memcached
●   Voldemort       ●   Tokyo Cabinet
●   Neo4J           ●   Redis
●   Cassandra       ●   CouchDB
Bigtable / Dynamo
        Bigtable              Dynamo
●   HBase          ●   Riak
●   Hypertable     ●   Voldemort



            Cassandra ??
Dynamo-Bigtable Lovechild
CAP Theorem “Pick Two”
●   CP               ●   AP
    ●   Bigtable         ●   Dynamo
    ●   Hypertable       ●   Voldemort
    ●   HBase            ●   Cassandra
CAP Theorem “Pick Two”



   ●   Consistency
   ●   Availability
   ●   Partition Tolerance
Description
Properties
●   Symmetric
    ● No single point of failure
    ● Linearly scalable


    ● Ease of administration


●   Flexible partitioning, replica placement
●   Automated provisioning
●   High availability (eventual consistency)
P2P Routing
P2P Routing
Partitioning
●   Random
    ●   128bit namespace, (MD5)
    ●   Good distribution
●   Order Preserving
    ●   Tokens determine namespace
    ●   Natural order (lexicographical)
    ●   Range / cover queries
●   Yours ??
Replica Placement
●   SimpleSnitch
    ●   Default
    ●   N-1 successive nodes
●   RackInferringSnitch
    ●   Infers DC/rack from IP
●   PropertyFileSnitch
    ●   Configured w/ a properties file
Bootstrap
Bootstrap
Bootstrap
Choosing Consistency

         Write                      Read
Level     Description      Level     Description
ZERO      Hail Mary        ZERO      N/A
ANY       1 replica (HH)   ANY       N/A
ONE       1 replica        ONE       1 replica
QUORUM    (N / 2) +1       QUORUM    (N / 2) +1
ALL       All replicas     ALL       All replicas

                       R+W>N
Quorum ((N/2) + 1)
Quorum ((N/2) + 1)
Data Model
Overview
●   Keyspace
    ●   Uppermost namespace
    ●   Typically one per application
●   ColumnFamily
    ●   Associates records of a similar kind
    ●   Record-level Atomicity
    ●   Indexed
●   Column
    ●   Basic unit of storage
Sparse Table
Column
●   name
    ●   byte[]
    ●   Queried against (predicates)
    ●   Determines sort order
●   value
    ●   byte[]
    ●   Opaque to Cassandra
●   timestamp
    ●   long
    ●   Conflict resolution (Last Write Wins)
Column Comparators
●    Bytes
●    UTF8
●    TimeUUID
●    Long
●    LexicalUUID
●    Composite (third-party)


    http://github.com/edanuff/CassandraCompositeType
API
RPC




THRIFT
RPC




THRIFT
RPC




AVRO
RPC




AVRO
Idiomatic Client Libraries
●    Pelops, Hector (Java)
●    Pycassa (Python)
●    Cassandra (Ruby)
●    Others …




    http://wiki.apache.org/cassandra/ClientOptions
Code Samples
Pycassa – Python Client API
# creating a connection
from pycassa import connect
hosts = ['host1:9160', 'host2:9160']
client = connect('Keyspace1', hosts)

# creating a column family instance
from pycassa import ColumnFamily
cf = ColumnFamily(client, “Standard1”)

# reading/writing a column
cf.insert('key1', {'name': 'value'})
print cf.get('key1')['name']

   1. http://github.com/vomjom/pycassa
Address Book – Setup
# conf/cassandra.yaml
keyspaces:
  - name: AddressBook
    column_families:
      - name: Addresses
        compare_with: BytesType
        rows_cached: 10000
        keys_cached: 50
        comment: 'No comment'
Adding an entry
key = uuid()

columns = {
    'first':   'Eric',
    'last':    'Evans',
    'email':   'eevans@rackspace.com',
    'city':    'San Antonio',
    'zip':     78250
}

addresses.insert(key, columns)
Fetching a record
# fetching the record by key
record = addresses.get(key)

# accessing columns by name
zipcode = record['zip']
city = record['city']
Indexing (manual)
# conf/cassandra.yaml
keyspaces:
  - name: AddressBook
    column_families:
      - name: Addresses
        compare_with: BytesType
        rows_cached: 10000
        keys_cached: 50
        comment: 'No comment'
      - name: ByCity
        compare_with: UTF8Type
Updating the index
key = uuid()

columns = {
    'first':   'Eric',
    'last':    'Evans',
    'email':   'eevans@rackspace.com',
    'city':    'San Antonio',
    'zip':     78250
}

addresses.insert(key, columns)
byCity.insert('San Antonio', {key: ''})
Indexing (auto)
# conf/cassandra.yaml
keyspaces:
  - name: AddressBook
    column_families:
      - name: Addresses
        compare_with: BytesType
        rows_cached: 10000
        keys_cached: 50
        comment: 'No comment'
        column_metadata:
           - name: city
             index_type: KEYS
Querying the Index
from pycassa.index import create_index_expression
from pycassa.index import create_index_clause

e = create_index_expression('city', 'San Antonio')
clause = create_index_clause([e])

results = address.get_indexed_slices(clause)

for (key, columns) in results.items():
    print “%(first)s %(last)s” % columns
Timeseries
# conf/cassandra.yaml
keyspaces:
  - name: Sites
    column_families:
      - name: Stats
        compare_with: LongType
Logging values
# time as long (milliseconds since epoch)
tstmp = long(time() * 1e6)

stats.insert('org.apache', {tstmp: value})
Slicing
begin = long(start * 1e6)

stats.get_range('org.apache',
                column_start=begin)

end = long((start + 86400) * 1e6)

stats.get_range(start='org.apache',
                finish='org.debian',
                column_start=begin,
                column_finish=end)
Questions?

Weitere ähnliche Inhalte

Was ist angesagt?

Scaling Twitter with Cassandra
Scaling Twitter with CassandraScaling Twitter with Cassandra
Scaling Twitter with Cassandra
Ryan King
 
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
DataStax
 
Cassandra background-and-architecture
Cassandra background-and-architectureCassandra background-and-architecture
Cassandra background-and-architecture
Markus Klems
 

Was ist angesagt? (20)

Bulk Loading Data into Cassandra
Bulk Loading Data into CassandraBulk Loading Data into Cassandra
Bulk Loading Data into Cassandra
 
Cassandra for Sysadmins
Cassandra for SysadminsCassandra for Sysadmins
Cassandra for Sysadmins
 
Apache Cassandra Data Modeling with Travis Price
Apache Cassandra Data Modeling with Travis PriceApache Cassandra Data Modeling with Travis Price
Apache Cassandra Data Modeling with Travis Price
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
 
Scaling Twitter with Cassandra
Scaling Twitter with CassandraScaling Twitter with Cassandra
Scaling Twitter with Cassandra
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Cassandra + Spark + Elk
Cassandra + Spark + ElkCassandra + Spark + Elk
Cassandra + Spark + Elk
 
Cassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + DynamoCassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + Dynamo
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache Cassandra
 
Introduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and HadoopIntroduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and Hadoop
 
Heuritech: Apache Spark REX
Heuritech: Apache Spark REXHeuritech: Apache Spark REX
Heuritech: Apache Spark REX
 
Bulk Loading into Cassandra
Bulk Loading into CassandraBulk Loading into Cassandra
Bulk Loading into Cassandra
 
9b. Document-Oriented Databases lab
9b. Document-Oriented Databases lab9b. Document-Oriented Databases lab
9b. Document-Oriented Databases lab
 
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
memcached Distributed Cache
memcached Distributed Cachememcached Distributed Cache
memcached Distributed Cache
 
Cassandra background-and-architecture
Cassandra background-and-architectureCassandra background-and-architecture
Cassandra background-and-architecture
 
Data analysis scala_spark
Data analysis scala_sparkData analysis scala_spark
Data analysis scala_spark
 
Introduction to Cassandra (June 2010)
Introduction to Cassandra (June 2010)Introduction to Cassandra (June 2010)
Introduction to Cassandra (June 2010)
 
Using Spark to Load Oracle Data into Cassandra
Using Spark to Load Oracle Data into CassandraUsing Spark to Load Oracle Data into Cassandra
Using Spark to Load Oracle Data into Cassandra
 

Andere mochten auch

Andere mochten auch (16)

Cassandra NoSQL Tutorial
Cassandra NoSQL TutorialCassandra NoSQL Tutorial
Cassandra NoSQL Tutorial
 
Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3
 
Apache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide DeckApache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide Deck
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Google Dremel
Google DremelGoogle Dremel
Google Dremel
 
NoSQL Essentials: Cassandra
NoSQL Essentials: CassandraNoSQL Essentials: Cassandra
NoSQL Essentials: Cassandra
 
Raft
RaftRaft
Raft
 
Software Productivity and Serverless
Software Productivity and ServerlessSoftware Productivity and Serverless
Software Productivity and Serverless
 
[serverlessconf2017]FaaSで簡単に実現する数十万RPSスパイク負荷試験
[serverlessconf2017]FaaSで簡単に実現する数十万RPSスパイク負荷試験[serverlessconf2017]FaaSで簡単に実現する数十万RPSスパイク負荷試験
[serverlessconf2017]FaaSで簡単に実現する数十万RPSスパイク負荷試験
 
Serverlessconf Tokyo 2017 Biz serverless お客様のビジネスを支える サーバーレスアーキテクチャーと開発としてのビジ...
Serverlessconf Tokyo 2017 Biz serverless お客様のビジネスを支えるサーバーレスアーキテクチャーと開発としてのビジ...Serverlessconf Tokyo 2017 Biz serverless お客様のビジネスを支えるサーバーレスアーキテクチャーと開発としてのビジ...
Serverlessconf Tokyo 2017 Biz serverless お客様のビジネスを支える サーバーレスアーキテクチャーと開発としてのビジ...
 
now
nownow
now
 
BluetoothメッシュによるIoTシステムを支えるサーバーレス技術 #serverlesstokyo
BluetoothメッシュによるIoTシステムを支えるサーバーレス技術 #serverlesstokyoBluetoothメッシュによるIoTシステムを支えるサーバーレス技術 #serverlesstokyo
BluetoothメッシュによるIoTシステムを支えるサーバーレス技術 #serverlesstokyo
 
Step functionsとaws batchでオーケストレートするイベントドリブンな機械学習基盤
Step functionsとaws batchでオーケストレートするイベントドリブンな機械学習基盤Step functionsとaws batchでオーケストレートするイベントドリブンな機械学習基盤
Step functionsとaws batchでオーケストレートするイベントドリブンな機械学習基盤
 
分散システムについて語らせてくれ
分散システムについて語らせてくれ分散システムについて語らせてくれ
分散システムについて語らせてくれ
 
Growing up serverless
Growing up serverlessGrowing up serverless
Growing up serverless
 
「サーバレスの薄い本」からの1年 #serverlesstokyo
「サーバレスの薄い本」からの1年 #serverlesstokyo「サーバレスの薄い本」からの1年 #serverlesstokyo
「サーバレスの薄い本」からの1年 #serverlesstokyo
 

Ähnlich wie Cassandra Explained

Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentation
Murat Çakal
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
shimi_k
 
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Boris Yen
 
NOSQL and Cassandra
NOSQL and CassandraNOSQL and Cassandra
NOSQL and Cassandra
rantav
 

Ähnlich wie Cassandra Explained (20)

Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra Explained
 
NoSQL, no Limits, lots of Fun!
NoSQL, no Limits, lots of Fun!NoSQL, no Limits, lots of Fun!
NoSQL, no Limits, lots of Fun!
 
Cassandra Talk: Austin JUG
Cassandra Talk: Austin JUGCassandra Talk: Austin JUG
Cassandra Talk: Austin JUG
 
TDC2017 | Florianopolis - Trilha DevOps How we figured out we had a SRE team ...
TDC2017 | Florianopolis - Trilha DevOps How we figured out we had a SRE team ...TDC2017 | Florianopolis - Trilha DevOps How we figured out we had a SRE team ...
TDC2017 | Florianopolis - Trilha DevOps How we figured out we had a SRE team ...
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentation
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
 
Online Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and CassandraOnline Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and Cassandra
 
Cassandra Java APIs Old and New – A Comparison
Cassandra Java APIs Old and New – A ComparisonCassandra Java APIs Old and New – A Comparison
Cassandra Java APIs Old and New – A Comparison
 
Sorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifySorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at Spotify
 
Clojure ♥ cassandra
Clojure ♥ cassandra Clojure ♥ cassandra
Clojure ♥ cassandra
 
Cassandra
CassandraCassandra
Cassandra
 
Your Database Cannot Do this (well)
Your Database Cannot Do this (well)Your Database Cannot Do this (well)
Your Database Cannot Do this (well)
 
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
 
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive
 
NOSQL and Cassandra
NOSQL and CassandraNOSQL and Cassandra
NOSQL and Cassandra
 
Apache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelApache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data model
 
C# as a System Language
C# as a System LanguageC# as a System Language
C# as a System Language
 
About "Apache Cassandra"
About "Apache Cassandra"About "Apache Cassandra"
About "Apache Cassandra"
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_data
 

Mehr von Eric Evans

Time Series Data with Apache Cassandra (ApacheCon EU 2014)
Time Series Data with Apache Cassandra (ApacheCon EU 2014)Time Series Data with Apache Cassandra (ApacheCon EU 2014)
Time Series Data with Apache Cassandra (ApacheCon EU 2014)
Eric Evans
 
Time Series Data with Apache Cassandra
Time Series Data with Apache CassandraTime Series Data with Apache Cassandra
Time Series Data with Apache Cassandra
Eric Evans
 
Cassandra by Example: Data Modelling with CQL3
Cassandra by Example:  Data Modelling with CQL3Cassandra by Example:  Data Modelling with CQL3
Cassandra by Example: Data Modelling with CQL3
Eric Evans
 
Rethinking Topology In Cassandra (ApacheCon NA)
Rethinking Topology In Cassandra (ApacheCon NA)Rethinking Topology In Cassandra (ApacheCon NA)
Rethinking Topology In Cassandra (ApacheCon NA)
Eric Evans
 
Virtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in CassandraVirtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in Cassandra
Eric Evans
 
Cassandra: Not Just NoSQL, It's MoSQL
Cassandra: Not Just NoSQL, It's MoSQLCassandra: Not Just NoSQL, It's MoSQL
Cassandra: Not Just NoSQL, It's MoSQL
Eric Evans
 
NoSQL Yes, But YesCQL, No?
NoSQL Yes, But YesCQL, No?NoSQL Yes, But YesCQL, No?
NoSQL Yes, But YesCQL, No?
Eric Evans
 

Mehr von Eric Evans (20)

Wikimedia Content API (Strangeloop)
Wikimedia Content API (Strangeloop)Wikimedia Content API (Strangeloop)
Wikimedia Content API (Strangeloop)
 
Wikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-caseWikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-case
 
Wikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-caseWikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-case
 
Time Series Data with Apache Cassandra (ApacheCon EU 2014)
Time Series Data with Apache Cassandra (ApacheCon EU 2014)Time Series Data with Apache Cassandra (ApacheCon EU 2014)
Time Series Data with Apache Cassandra (ApacheCon EU 2014)
 
Time Series Data with Apache Cassandra
Time Series Data with Apache CassandraTime Series Data with Apache Cassandra
Time Series Data with Apache Cassandra
 
Time Series Data with Apache Cassandra
Time Series Data with Apache CassandraTime Series Data with Apache Cassandra
Time Series Data with Apache Cassandra
 
It's not you, it's me: Ending a 15 year relationship with RRD
It's not you, it's me: Ending a 15 year relationship with RRDIt's not you, it's me: Ending a 15 year relationship with RRD
It's not you, it's me: Ending a 15 year relationship with RRD
 
Time series storage in Cassandra
Time series storage in CassandraTime series storage in Cassandra
Time series storage in Cassandra
 
Virtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in CassandraVirtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in Cassandra
 
Cassandra by Example: Data Modelling with CQL3
Cassandra by Example:  Data Modelling with CQL3Cassandra by Example:  Data Modelling with CQL3
Cassandra by Example: Data Modelling with CQL3
 
Rethinking Topology In Cassandra (ApacheCon NA)
Rethinking Topology In Cassandra (ApacheCon NA)Rethinking Topology In Cassandra (ApacheCon NA)
Rethinking Topology In Cassandra (ApacheCon NA)
 
Virtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in CassandraVirtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in Cassandra
 
Castle enhanced Cassandra
Castle enhanced CassandraCastle enhanced Cassandra
Castle enhanced Cassandra
 
CQL: SQL In Cassandra
CQL: SQL In CassandraCQL: SQL In Cassandra
CQL: SQL In Cassandra
 
CQL In Cassandra 1.0 (and beyond)
CQL In Cassandra 1.0 (and beyond)CQL In Cassandra 1.0 (and beyond)
CQL In Cassandra 1.0 (and beyond)
 
Cassandra: Not Just NoSQL, It's MoSQL
Cassandra: Not Just NoSQL, It's MoSQLCassandra: Not Just NoSQL, It's MoSQL
Cassandra: Not Just NoSQL, It's MoSQL
 
NoSQL Yes, But YesCQL, No?
NoSQL Yes, But YesCQL, No?NoSQL Yes, But YesCQL, No?
NoSQL Yes, But YesCQL, No?
 
Outside The Box With Apache Cassnadra
Outside The Box With Apache CassnadraOutside The Box With Apache Cassnadra
Outside The Box With Apache Cassnadra
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed Database
 
An Introduction To Cassandra
An Introduction To CassandraAn Introduction To Cassandra
An Introduction To Cassandra
 

Kürzlich hochgeladen

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Kürzlich hochgeladen (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

Cassandra Explained

  • 1. Cassandra Explained Disruptive Code September 22, 2010 Eric Evans eevans@rackspace.com @jericevans http://blog.sym-link.com
  • 2. Outline ● Background ● Description ● API(s) ● Code Samples
  • 8. Influential Papers ● BigTable ● Strong consistency ● Sparse map data model ● GFS, Chubby, et al ● Dynamo ● O(1) distributed hash table (DHT) ● BASE (aka eventual consistency) ● Client tunable consistency/availability
  • 9. NoSQL ● HBase ● Hypertable ● MongoDB ● HyperGraphDB ● Riak ● Memcached ● Voldemort ● Tokyo Cabinet ● Neo4J ● Redis ● Cassandra ● CouchDB
  • 10. NoSQL Big data ● HBase ● Hypertable ● MongoDB ● HyperGraphDB ● Riak ● Memcached ● Voldemort ● Tokyo Cabinet ● Neo4J ● Redis ● Cassandra ● CouchDB
  • 11. Bigtable / Dynamo Bigtable Dynamo ● HBase ● Riak ● Hypertable ● Voldemort Cassandra ??
  • 13. CAP Theorem “Pick Two” ● CP ● AP ● Bigtable ● Dynamo ● Hypertable ● Voldemort ● HBase ● Cassandra
  • 14. CAP Theorem “Pick Two” ● Consistency ● Availability ● Partition Tolerance
  • 16. Properties ● Symmetric ● No single point of failure ● Linearly scalable ● Ease of administration ● Flexible partitioning, replica placement ● Automated provisioning ● High availability (eventual consistency)
  • 19. Partitioning ● Random ● 128bit namespace, (MD5) ● Good distribution ● Order Preserving ● Tokens determine namespace ● Natural order (lexicographical) ● Range / cover queries ● Yours ??
  • 20. Replica Placement ● SimpleSnitch ● Default ● N-1 successive nodes ● RackInferringSnitch ● Infers DC/rack from IP ● PropertyFileSnitch ● Configured w/ a properties file
  • 24. Choosing Consistency Write Read Level Description Level Description ZERO Hail Mary ZERO N/A ANY 1 replica (HH) ANY N/A ONE 1 replica ONE 1 replica QUORUM (N / 2) +1 QUORUM (N / 2) +1 ALL All replicas ALL All replicas R+W>N
  • 28. Overview ● Keyspace ● Uppermost namespace ● Typically one per application ● ColumnFamily ● Associates records of a similar kind ● Record-level Atomicity ● Indexed ● Column ● Basic unit of storage
  • 30. Column ● name ● byte[] ● Queried against (predicates) ● Determines sort order ● value ● byte[] ● Opaque to Cassandra ● timestamp ● long ● Conflict resolution (Last Write Wins)
  • 31. Column Comparators ● Bytes ● UTF8 ● TimeUUID ● Long ● LexicalUUID ● Composite (third-party) http://github.com/edanuff/CassandraCompositeType
  • 32. API
  • 37. Idiomatic Client Libraries ● Pelops, Hector (Java) ● Pycassa (Python) ● Cassandra (Ruby) ● Others … http://wiki.apache.org/cassandra/ClientOptions
  • 39. Pycassa – Python Client API # creating a connection from pycassa import connect hosts = ['host1:9160', 'host2:9160'] client = connect('Keyspace1', hosts) # creating a column family instance from pycassa import ColumnFamily cf = ColumnFamily(client, “Standard1”) # reading/writing a column cf.insert('key1', {'name': 'value'}) print cf.get('key1')['name'] 1. http://github.com/vomjom/pycassa
  • 40. Address Book – Setup # conf/cassandra.yaml keyspaces: - name: AddressBook column_families: - name: Addresses compare_with: BytesType rows_cached: 10000 keys_cached: 50 comment: 'No comment'
  • 41. Adding an entry key = uuid() columns = { 'first': 'Eric', 'last': 'Evans', 'email': 'eevans@rackspace.com', 'city': 'San Antonio', 'zip': 78250 } addresses.insert(key, columns)
  • 42. Fetching a record # fetching the record by key record = addresses.get(key) # accessing columns by name zipcode = record['zip'] city = record['city']
  • 43. Indexing (manual) # conf/cassandra.yaml keyspaces: - name: AddressBook column_families: - name: Addresses compare_with: BytesType rows_cached: 10000 keys_cached: 50 comment: 'No comment' - name: ByCity compare_with: UTF8Type
  • 44. Updating the index key = uuid() columns = { 'first': 'Eric', 'last': 'Evans', 'email': 'eevans@rackspace.com', 'city': 'San Antonio', 'zip': 78250 } addresses.insert(key, columns) byCity.insert('San Antonio', {key: ''})
  • 45. Indexing (auto) # conf/cassandra.yaml keyspaces: - name: AddressBook column_families: - name: Addresses compare_with: BytesType rows_cached: 10000 keys_cached: 50 comment: 'No comment' column_metadata: - name: city index_type: KEYS
  • 46. Querying the Index from pycassa.index import create_index_expression from pycassa.index import create_index_clause e = create_index_expression('city', 'San Antonio') clause = create_index_clause([e]) results = address.get_indexed_slices(clause) for (key, columns) in results.items(): print “%(first)s %(last)s” % columns
  • 47. Timeseries # conf/cassandra.yaml keyspaces: - name: Sites column_families: - name: Stats compare_with: LongType
  • 48. Logging values # time as long (milliseconds since epoch) tstmp = long(time() * 1e6) stats.insert('org.apache', {tstmp: value})
  • 49. Slicing begin = long(start * 1e6) stats.get_range('org.apache', column_start=begin) end = long((start + 86400) * 1e6) stats.get_range(start='org.apache', finish='org.debian', column_start=begin, column_finish=end)