SlideShare ist ein Scribd-Unternehmen logo
1 von 61
Downloaden Sie, um offline zu lesen
Inside the Cassandra Database




                                 Jonathan Ellis
                            jbellis@riptano.com / @spyced
Saturday, August 21, 2010
Saturday, August 21, 2010
Why NoSQL?



    ✤    Relational databases don’t scale

    ✤    The relational model maps poorly to some problems

    ✤    Relational databases are slow




Saturday, August 21, 2010
Myth 1



    ✤    “NoSQL is for people who don’t understand {SQL,
         denormalization, query tuning, ...}”
          ✤    Similarly: “Only users of [database X] are turning to NoSQL
               databases, because X sucks.”




Saturday, August 21, 2010
eBay: NoSQL pioneer

    ✤    “BASE is diametrically opposed to ACID. Where ACID is
         pessimistic and forces consistency at the end of every
         operation, BASE is optimistic and accepts that the
         database consistency will be in a state of flux. Although
         this sounds impossible to cope with, in reality it is quite
         manageable and leads to levels of scalability that cannot
         be obtained with ACID.”
          ✤    --BASE: An Acid Alternative, Dan Pritchett, eBay

          ✤    See also: The eBay Architecture, Randy Shoup and Dan Pritchett

Saturday, August 21, 2010
(“The eBay Architecture,” Randy Shoup and Dan Pritchett)

Saturday, August 21, 2010
Myth 2




    ✤    “NoSQL is nothing new because we had key/value
         databases like bdb years ago.”




Saturday, August 21, 2010
Myth 3




    ✤    “Only huge web 2.0 sites like Facebook and Twitter care
         about scalability.”




Saturday, August 21, 2010
Cassandra



    ✤    Relational databases don’t scale

    ✤    The relational model maps poorly to some problems

    ✤    Relational databases are slow




Saturday, August 21, 2010
Saturday, August 21, 2010
Saturday, August 21, 2010
Saturday, August 21, 2010
Saturday, August 21, 2010
Saturday, August 21, 2010
Saturday, August 21, 2010
Use cases

    ✤    Digital Reasoning: NLP + entity analytics

    ✤    OpenX: largest publisher-side ad network in the world

    ✤    Cloudkick: performance data & aggregation

    ✤    SimpleGEO: location-as-API

    ✤    Ooyala: video analytics and business intelligence

    ✤    ngmoco: massively multiplayer game worlds

Saturday, August 21, 2010
Cassandra



    ✤    Relational databases don’t scale

    ✤    The relational model maps poorly to some problems

    ✤    Relational databases are slow




Saturday, August 21, 2010
Saturday, August 21, 2010
Cassandra memory-aware index


                            <key 127>    <row data 0>
                            <key 255>    <row data 1>
                               ...            ...
                                        <row data 127>
                                              ...
                                        <row data 255>
                                              ...




Saturday, August 21, 2010
Reader
           Writer           Memtable




               Commitlog




The Log-Structured Merge-Tree,
Bigtable: A Distributed Storage System
for Structured Data


Saturday, August 21, 2010
Saturday, August 21, 2010
Durable



    ✤    Write to commitlog
          ✤    fsync is cheap since it’s append-only

    ✤    Write to memtable

    ✤    [amortized] flush memtable to sstable



Saturday, August 21, 2010
Scaling




Saturday, August 21, 2010
W   A




                            T

                                    L




Saturday, August 21, 2010
W   A




                                        F



                            T

                                    L




Saturday, August 21, 2010
W           A




                                                F
                                    (A-L]


                            T

                                            L




Saturday, August 21, 2010
W           A




                                    (A-F]       F



                            T
                                    (F-L]
                                            L




Saturday, August 21, 2010
Key “C”

                                W   A




                                        F



                            T

                                    L




Saturday, August 21, 2010
Reliability



    ✤    No single points of failure, clean design

    ✤    Multiple datacenters

    ✤    Monitorable




Saturday, August 21, 2010
Saturday, August 21, 2010
Some headlines


    ✤    “Resyncing Broken MySQL Replication”

    ✤    “How To Repair MySQL Replication”

    ✤    “Fixing Broken MySQL Database Replication”

    ✤    “Replication on Linux broken after db restore”

    ✤    “MySQL :: Repairing broken replication”


Saturday, August 21, 2010
The opposite of heroes




    ✤    “If your software wakes someone up at 4 AM to fix it,
         you’re doing it wrong.”




Saturday, August 21, 2010
Good architecture solves multiple
    problems at once


    ✤    Availability in single datacenter

    ✤    Availablility in multiple datacenters

    ✤    Built-in repair rather than an afterthought
          ✤    or worse, having to restart replication entirely




Saturday, August 21, 2010
Y
                                                    Key “C”
                                        A
                                W




                            U
                                                F




                                T
                                            L

                                    P



Saturday, August 21, 2010
Y
                                            A
                                W




                            U
                                                    F




                                T
                                                L
                                        P



Saturday, August 21, 2010
Y
                                        A
                                W




                            U
                                                F




                                T
                                            L

                                    P



Saturday, August 21, 2010
Y
                                                Key “C”
                                        A
                                W




                            U
                                            F



                                T
                                        L
                                    P



Saturday, August 21, 2010
Y
                                                    Key “C”
                                        A
                                W




                            U
                                                F




                                T
                                            L

                                    P



Saturday, August 21, 2010
Y
                                                    Key “C”
                                        A
                                W




                            U
                                                F




                                T
                                            L

                                    P



Saturday, August 21, 2010
Y
                                                    Key “C”
                                        A
                                W




                            U
                                                F




                                T
                                            L

                                    P



Saturday, August 21, 2010
Y
                                                    Key “C”
                                        A
                                W




                            U
                                                F




                                T
                                            L

                                    P



Saturday, August 21, 2010
Y
                                                    Key “C”
                                        A
                                W




                            U
                                                F




                                        X
                                T
                                            L

                                    P



Saturday, August 21, 2010
Y
                                                       Key “C”
                                           A
                                W




                            U
                                                   F




                                           X
                                T   hint
                                               L

                                    P



Saturday, August 21, 2010
Y
                                                       Key “C”
                                           A
                                W




                            U
                                                   F




                                           X
                                T   hint
                                               L

                                    P



Saturday, August 21, 2010
Y
                                        A
                                W




                            U
                                                F




                                T
                                            L

                                    P



Saturday, August 21, 2010
Y
                                        A
                                W




                            U
                                                F




                                T
                                            L

                                    P



Saturday, August 21, 2010
Y
                                        A
                                W




                            U
                                                F




                                T
                                            L

                                    P



Saturday, August 21, 2010
Saturday, August 21, 2010
Eventual Tuneable consistency



    ✤    ONE, QUORUM, ALL

    ✤    R+W>N

    ✤    Choose availability vs consistency (and latency)




Saturday, August 21, 2010
Monitorable




Saturday, August 21, 2010
Events




Saturday, August 21, 2010
1500
                                                                                       mails sent

          1125



            750



            375



                 0
                            Jan     Feb             Apr            May       Jun         Jul
                            (0.5)   (0.5.1)   Mar   (0.6, 0.6.1)   (0.6.2)   (0.6.3)     (0.6.4)




Saturday, August 21, 2010
Data model




    ✤    Twitter: “Fifteen months ago, it took two weeks to
         perform ALTER TABLE on the statuses [tweets] table.”




Saturday, August 21, 2010
ColumnFamilies
                            Columns




Saturday, August 21, 2010
Saturday, August 21, 2010
Cassandra “indexes” are
    materialized views
 SELECT * FROM tweets
 WHERE user_id IN (SELECT follower FROM followers WHERE user_id = ?)
 ORDER BY time_tweeted DESC
 LIMIT 40




             followers                                           timeline

                                                      ?
                                     SORT




   ?                                                      uuid:tweet



                            tweets

Saturday, August 21, 2010
Analytics in Cassandra




    ✤    @afex: “Cassandra + Pig (Hadoop) is very exciting. A 7
         line script to analyze data from my entire cluster
         transparently, with no ETL? Yes, please”




Saturday, August 21, 2010
TaskTracker




                            JobTracker




Saturday, August 21, 2010
Coming in 0.7 [beta1 released 13/8]

    ✤    Secondary indexes and expressions

    ✤    Large (> 2GB) rows

    ✤    More control over replica placement

    ✤    Online schema changes

    ✤    Flow control

    ✤    Dynamic routing around slow nodes

Saturday, August 21, 2010
More




    ✤    http://wiki.apache.org/cassandra/
         ArticlesAndPresentations

    ✤    http://wiki.apache.org/cassandra/ArchitectureInternals




Saturday, August 21, 2010
Questions




Saturday, August 21, 2010

Weitere ähnliche Inhalte

Andere mochten auch

Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...DataStax
 
PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupFrens Jan Rumph
 
NoSQL Database- cassandra column Base DB
NoSQL Database- cassandra column Base DBNoSQL Database- cassandra column Base DB
NoSQL Database- cassandra column Base DBsadegh salehi
 
DAT202 Optimizing your Cassandra Database on AWS - AWS re: Invent 2012
DAT202 Optimizing your Cassandra Database on AWS - AWS re: Invent 2012DAT202 Optimizing your Cassandra Database on AWS - AWS re: Invent 2012
DAT202 Optimizing your Cassandra Database on AWS - AWS re: Invent 2012Amazon Web Services
 
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012Amazon Web Services
 
NOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraFolio3 Software
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed DatabaseEric Evans
 
Evaluating Apache Cassandra as a Cloud Database
Evaluating Apache Cassandra as a Cloud DatabaseEvaluating Apache Cassandra as a Cloud Database
Evaluating Apache Cassandra as a Cloud DatabaseDataStax
 
Best Presentation About Infosys
Best Presentation About InfosysBest Presentation About Infosys
Best Presentation About InfosysDurgadatta Dash
 

Andere mochten auch (10)

Cassandra
CassandraCassandra
Cassandra
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
 
PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark Meetup
 
NoSQL Database- cassandra column Base DB
NoSQL Database- cassandra column Base DBNoSQL Database- cassandra column Base DB
NoSQL Database- cassandra column Base DB
 
DAT202 Optimizing your Cassandra Database on AWS - AWS re: Invent 2012
DAT202 Optimizing your Cassandra Database on AWS - AWS re: Invent 2012DAT202 Optimizing your Cassandra Database on AWS - AWS re: Invent 2012
DAT202 Optimizing your Cassandra Database on AWS - AWS re: Invent 2012
 
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012
MED202 Netflix’s Transcoding Transformation - AWS re: Invent 2012
 
NOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache Cassandra
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed Database
 
Evaluating Apache Cassandra as a Cloud Database
Evaluating Apache Cassandra as a Cloud DatabaseEvaluating Apache Cassandra as a Cloud Database
Evaluating Apache Cassandra as a Cloud Database
 
Best Presentation About Infosys
Best Presentation About InfosysBest Presentation About Infosys
Best Presentation About Infosys
 

Mehr von jbellis

Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databasesjbellis
 
Data day texas: Cassandra and the Cloud
Data day texas: Cassandra and the CloudData day texas: Cassandra and the Cloud
Data day texas: Cassandra and the Cloudjbellis
 
Cassandra Summit 2015
Cassandra Summit 2015Cassandra Summit 2015
Cassandra Summit 2015jbellis
 
Cassandra summit keynote 2014
Cassandra summit keynote 2014Cassandra summit keynote 2014
Cassandra summit keynote 2014jbellis
 
Cassandra 2.1
Cassandra 2.1Cassandra 2.1
Cassandra 2.1jbellis
 
Tokyo cassandra conference 2014
Tokyo cassandra conference 2014Tokyo cassandra conference 2014
Tokyo cassandra conference 2014jbellis
 
Cassandra Summit EU 2013
Cassandra Summit EU 2013Cassandra Summit EU 2013
Cassandra Summit EU 2013jbellis
 
London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0jbellis
 
Cassandra Summit 2013 Keynote
Cassandra Summit 2013 KeynoteCassandra Summit 2013 Keynote
Cassandra Summit 2013 Keynotejbellis
 
Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012jbellis
 
Top five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solutionTop five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solutionjbellis
 
State of Cassandra 2012
State of Cassandra 2012State of Cassandra 2012
State of Cassandra 2012jbellis
 
Massively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache CassandraMassively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache Cassandrajbellis
 
Cassandra 1.1
Cassandra 1.1Cassandra 1.1
Cassandra 1.1jbellis
 
Pycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from JavaPycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from Javajbellis
 
Apache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterpriseApache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterprisejbellis
 
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)jbellis
 
Cassandra at High Performance Transaction Systems 2011
Cassandra at High Performance Transaction Systems 2011Cassandra at High Performance Transaction Systems 2011
Cassandra at High Performance Transaction Systems 2011jbellis
 
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)jbellis
 
What python can learn from java
What python can learn from javaWhat python can learn from java
What python can learn from javajbellis
 

Mehr von jbellis (20)

Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databases
 
Data day texas: Cassandra and the Cloud
Data day texas: Cassandra and the CloudData day texas: Cassandra and the Cloud
Data day texas: Cassandra and the Cloud
 
Cassandra Summit 2015
Cassandra Summit 2015Cassandra Summit 2015
Cassandra Summit 2015
 
Cassandra summit keynote 2014
Cassandra summit keynote 2014Cassandra summit keynote 2014
Cassandra summit keynote 2014
 
Cassandra 2.1
Cassandra 2.1Cassandra 2.1
Cassandra 2.1
 
Tokyo cassandra conference 2014
Tokyo cassandra conference 2014Tokyo cassandra conference 2014
Tokyo cassandra conference 2014
 
Cassandra Summit EU 2013
Cassandra Summit EU 2013Cassandra Summit EU 2013
Cassandra Summit EU 2013
 
London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0
 
Cassandra Summit 2013 Keynote
Cassandra Summit 2013 KeynoteCassandra Summit 2013 Keynote
Cassandra Summit 2013 Keynote
 
Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012
 
Top five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solutionTop five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solution
 
State of Cassandra 2012
State of Cassandra 2012State of Cassandra 2012
State of Cassandra 2012
 
Massively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache CassandraMassively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache Cassandra
 
Cassandra 1.1
Cassandra 1.1Cassandra 1.1
Cassandra 1.1
 
Pycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from JavaPycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from Java
 
Apache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterpriseApache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterprise
 
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
 
Cassandra at High Performance Transaction Systems 2011
Cassandra at High Performance Transaction Systems 2011Cassandra at High Performance Transaction Systems 2011
Cassandra at High Performance Transaction Systems 2011
 
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
 
What python can learn from java
What python can learn from javaWhat python can learn from java
What python can learn from java
 

Kürzlich hochgeladen

ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 

Cassandra FrOSCon 10

  • 1. Inside the Cassandra Database Jonathan Ellis jbellis@riptano.com / @spyced Saturday, August 21, 2010
  • 3. Why NoSQL? ✤ Relational databases don’t scale ✤ The relational model maps poorly to some problems ✤ Relational databases are slow Saturday, August 21, 2010
  • 4. Myth 1 ✤ “NoSQL is for people who don’t understand {SQL, denormalization, query tuning, ...}” ✤ Similarly: “Only users of [database X] are turning to NoSQL databases, because X sucks.” Saturday, August 21, 2010
  • 5. eBay: NoSQL pioneer ✤ “BASE is diametrically opposed to ACID. Where ACID is pessimistic and forces consistency at the end of every operation, BASE is optimistic and accepts that the database consistency will be in a state of flux. Although this sounds impossible to cope with, in reality it is quite manageable and leads to levels of scalability that cannot be obtained with ACID.” ✤ --BASE: An Acid Alternative, Dan Pritchett, eBay ✤ See also: The eBay Architecture, Randy Shoup and Dan Pritchett Saturday, August 21, 2010
  • 6. (“The eBay Architecture,” Randy Shoup and Dan Pritchett) Saturday, August 21, 2010
  • 7. Myth 2 ✤ “NoSQL is nothing new because we had key/value databases like bdb years ago.” Saturday, August 21, 2010
  • 8. Myth 3 ✤ “Only huge web 2.0 sites like Facebook and Twitter care about scalability.” Saturday, August 21, 2010
  • 9. Cassandra ✤ Relational databases don’t scale ✤ The relational model maps poorly to some problems ✤ Relational databases are slow Saturday, August 21, 2010
  • 16. Use cases ✤ Digital Reasoning: NLP + entity analytics ✤ OpenX: largest publisher-side ad network in the world ✤ Cloudkick: performance data & aggregation ✤ SimpleGEO: location-as-API ✤ Ooyala: video analytics and business intelligence ✤ ngmoco: massively multiplayer game worlds Saturday, August 21, 2010
  • 17. Cassandra ✤ Relational databases don’t scale ✤ The relational model maps poorly to some problems ✤ Relational databases are slow Saturday, August 21, 2010
  • 19. Cassandra memory-aware index <key 127> <row data 0> <key 255> <row data 1> ... ... <row data 127> ... <row data 255> ... Saturday, August 21, 2010
  • 20. Reader Writer Memtable Commitlog The Log-Structured Merge-Tree, Bigtable: A Distributed Storage System for Structured Data Saturday, August 21, 2010
  • 22. Durable ✤ Write to commitlog ✤ fsync is cheap since it’s append-only ✤ Write to memtable ✤ [amortized] flush memtable to sstable Saturday, August 21, 2010
  • 24. W A T L Saturday, August 21, 2010
  • 25. W A F T L Saturday, August 21, 2010
  • 26. W A F (A-L] T L Saturday, August 21, 2010
  • 27. W A (A-F] F T (F-L] L Saturday, August 21, 2010
  • 28. Key “C” W A F T L Saturday, August 21, 2010
  • 29. Reliability ✤ No single points of failure, clean design ✤ Multiple datacenters ✤ Monitorable Saturday, August 21, 2010
  • 31. Some headlines ✤ “Resyncing Broken MySQL Replication” ✤ “How To Repair MySQL Replication” ✤ “Fixing Broken MySQL Database Replication” ✤ “Replication on Linux broken after db restore” ✤ “MySQL :: Repairing broken replication” Saturday, August 21, 2010
  • 32. The opposite of heroes ✤ “If your software wakes someone up at 4 AM to fix it, you’re doing it wrong.” Saturday, August 21, 2010
  • 33. Good architecture solves multiple problems at once ✤ Availability in single datacenter ✤ Availablility in multiple datacenters ✤ Built-in repair rather than an afterthought ✤ or worse, having to restart replication entirely Saturday, August 21, 2010
  • 34. Y Key “C” A W U F T L P Saturday, August 21, 2010
  • 35. Y A W U F T L P Saturday, August 21, 2010
  • 36. Y A W U F T L P Saturday, August 21, 2010
  • 37. Y Key “C” A W U F T L P Saturday, August 21, 2010
  • 38. Y Key “C” A W U F T L P Saturday, August 21, 2010
  • 39. Y Key “C” A W U F T L P Saturday, August 21, 2010
  • 40. Y Key “C” A W U F T L P Saturday, August 21, 2010
  • 41. Y Key “C” A W U F T L P Saturday, August 21, 2010
  • 42. Y Key “C” A W U F X T L P Saturday, August 21, 2010
  • 43. Y Key “C” A W U F X T hint L P Saturday, August 21, 2010
  • 44. Y Key “C” A W U F X T hint L P Saturday, August 21, 2010
  • 45. Y A W U F T L P Saturday, August 21, 2010
  • 46. Y A W U F T L P Saturday, August 21, 2010
  • 47. Y A W U F T L P Saturday, August 21, 2010
  • 49. Eventual Tuneable consistency ✤ ONE, QUORUM, ALL ✤ R+W>N ✤ Choose availability vs consistency (and latency) Saturday, August 21, 2010
  • 52. 1500 mails sent 1125 750 375 0 Jan Feb Apr May Jun Jul (0.5) (0.5.1) Mar (0.6, 0.6.1) (0.6.2) (0.6.3) (0.6.4) Saturday, August 21, 2010
  • 53. Data model ✤ Twitter: “Fifteen months ago, it took two weeks to perform ALTER TABLE on the statuses [tweets] table.” Saturday, August 21, 2010
  • 54. ColumnFamilies Columns Saturday, August 21, 2010
  • 56. Cassandra “indexes” are materialized views SELECT * FROM tweets WHERE user_id IN (SELECT follower FROM followers WHERE user_id = ?) ORDER BY time_tweeted DESC LIMIT 40 followers timeline ? SORT ? uuid:tweet tweets Saturday, August 21, 2010
  • 57. Analytics in Cassandra ✤ @afex: “Cassandra + Pig (Hadoop) is very exciting. A 7 line script to analyze data from my entire cluster transparently, with no ETL? Yes, please” Saturday, August 21, 2010
  • 58. TaskTracker JobTracker Saturday, August 21, 2010
  • 59. Coming in 0.7 [beta1 released 13/8] ✤ Secondary indexes and expressions ✤ Large (> 2GB) rows ✤ More control over replica placement ✤ Online schema changes ✤ Flow control ✤ Dynamic routing around slow nodes Saturday, August 21, 2010
  • 60. More ✤ http://wiki.apache.org/cassandra/ ArticlesAndPresentations ✤ http://wiki.apache.org/cassandra/ArchitectureInternals Saturday, August 21, 2010