SlideShare ist ein Scribd-Unternehmen logo
1 von 34
Downloaden Sie, um offline zu lesen
Cassandra 2012:
What's New & Upcoming

    Sam Tunnicliffe
●   sam@datastax.com
●   DSE : integrated Big Data platform
    –   Built on Cassandra
    –   Analytics using Hadoop (Hive/Pig/Mahout)
    –   Enterprise Search with Solr
Cassandra in 2012
●   Cassandra 1.1 – April 2012
●   Cassandra 1.2 – November 2012
●   What's on the roadmap
Release Schedule
Version                Release Date
0.5                    24 Jan 2010
0.6                    13 April 2010
0.7                    9 January 2011
0.8                    2 June 2011
1.0                    18 October 2011
1.1                    24 April 2012
1.2                    End November 2012 ?

●   Currently working to six month release cycle
●   Minor releases as and when necessary
Global Row & Key Caches
●   Prior to 1.1 separate row & key caches per
    column family
●   Since 1.1 single row cache & single key
    cache shared across all column families
●   Simpler configuration
    –   Globally:   {key,row}_cache_size_in_mb

    –   Per CF:     ALL|NONE|KEYS_ONLY|ROWS_ONLY
More Granular Storage Config
●   Pre 1.1 one storage location per keyspace
    –   /var/lib/cassandra/ks/cf-hc-1-Data.db

●   In 1.1 one location per column family
    –   /var/lib/cassandra/ks/cf/ks-cf-hc-1-Data.db
●   Allows you to pin certain data to particular
    storage system
Why?




 http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
Why?




 http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives
Row Level Isolation
●   Batched writes within row have always
    been atomic
●   Batched writes within row are now also
    isolated
Row Level Isolation
●   So Writing
      UPDATE foo
      SET column_x='a' AND column_y='1'
      WHERE id='bar';


●   Followed by
      UPDATE foo
      SET column_x='b' AND column_y='2'
      WHERE id='bar';


●   No reader ever sees either
      {column_a:'a', column_b:'2'}
      {column_x:'b', column_y:'1'}
Misc Other Stuff
●   Windows off heap cache
●   write survey mode
●   commitlog segment pre-allocation/recycling
●   abortable compactions
●   multi threaded streaming
●   Hadoop improvements
    –   utilise secondary indexes from Hadoop
    –   better wide row support
CQL 3.0
●   Beta in 1.1, finalized in 1.2
●   Motivations
    –   Better support for wide rows
    –   Native syntax for composites
CQL 3.0
●   Wide Rows
●   Composites
●   Transposition
CQL 3.0
CREATE TABLE comments (
   id text,
   posted_at timestamp,
   author text,
   content text,
   karma int,
   PRIMARY KEY( id, posted_at )
);



INSERT INTO comments (id, posted_at, author, content, karma)
VALUES ('article_x', '2012-10-31T08:00:00', 'freddie', 'awesome', 100);

INSERT INTO comments (id, posted_at, author, content, karma)
VALUES ('article_x', '2012-10-31T09:59:59', 'rageguy', 'F7U12', 0);
CQL 3.0 Transposition
{ article_x : { 2012-10-31T08:00:00 + author => freddie }
              { 2012-10-31T08:00:00 + content => awesome }
              { 2012-10-31T08:00:00 + karma   => 100 }

              { 2012-10-31T09:59:59 + author => rageguy }
              { 2012-10-31T09:59:59 + content => F7U12 }
              { 2012-10-31T09:59:59 + karma   => 1 }
}
{ article_y : { 2012-10-28T15:30:12 + author...




cqlsh:hn> select * from comments where id = 'article_x';


 id        | posted_at           | author | content | karma
-----------+---------------------+---------+---------+-------
 article_x | 2012-10-31 08:00:00 | freddie | awesome |   100
 article_x | 2012-10-31 09:59:59 | rageguy |   F7U12 |     1
CQL 3.0 Transposition
{ article_x : { 2012-10-31T08:00:00 + author => freddie }
              { 2012-10-31T08:00:00 + content => awesome }
              { 2012-10-31T08:00:00 + karma   => 100 }

              { 2012-10-31T09:59:59 + author => rageguy }
              { 2012-10-31T09:59:59 + content => F7U12 }
              { 2012-10-31T09:59:59 + karma   => 1 }
}
{ article_y : { 2012-10-28T15:30:12 + author...




cqlsh:hn> select * from comments where id = 'article_x'
          and posted_at <= '2012-10-31T08:00:00' ;

 id        | posted_at           | author | content | karma
-----------+---------------------+---------+---------+-------
 article_x | 2012-10-31 08:00:00 | freddie | awesome |   100
CQL 3.0 Collections
●   Typed collections – Set, Map, List
●   Good fit for denormalizing small collections
●   Less ugly, more efficient than previous
    approach
●   Composites under the hood
●   Each element stored as one column
    –   So each can have an individual TTL
CQL 3.0 Sets
●   Collection of typed elements
●   No duplicate elements
●   Actually a sorted set
    –   Ordering determined by natural order elements
CQL 3.0 Sets
CREATE TABLE users (
           user_id text PRIMARY KEY,
           first_name text,
           last_name text,
           emails set<text>
       );


●   Set entire value with set literal
    INSERT INTO users (user_id, first_name, last_name, emails)
    VALUES('sam', 'Sam', 'Tunnicliffe',
           {'sam@beobal.com', 'sam@datastax.com'});
CQL 3.0 Sets
●   Insert and remove
    UPDATE users
    SET emails = emails + {'sam@cohodo.net'}
    WHERE user_id = 'sam';

    UPDATE users
    SET emails = emails – {'sam@cohodo.net'}
    WHERE user_id = 'sam'


●   No distinction between empty and null set
    UPDATE users
    SET emails = {}
    WHERE user_id = 'sam';

    DELETE emails
    FROM users
    WHERE user_id = 'sam';
CQL 3.0 Lists
●   Ordering determined by user, not by
    natural ordering of elements
●   Allows multiple occurrences of same value
CQL 3.0 Lists
ALTER TABLE users ADD todo list<text>;

●   Set, Prepend & Append
    UPDATE users
    SET todo = [ 'get to Sinsheim', 'give presentation' ]
    WHERE user_id = 'sam';

    UPDATE users
    SET todo = [ 'finish slides' ] + todo
    WHERE user_id = 'sam';

    UPDATE users
    SET todo = todo + [ 'drink beer' ]
    WHERE user_id = 'sam';
CQL 3.0 Lists
●   Access elements by index
    –   Less performant, requires read of entire list
    UPDATE users
    SET todo[2] = 'slow down'
    WHERE user_id = 'sam';

    DELETE todo[3]
    FROM users
    WHERE user_id = 'sam';


●   Remove by value
    UPDATE users
    SET todo = todo – ['finish slides']
    WHERE user_id = 'sam';
CQL 3.0 Maps
●   Collection of typed key/value pairs
●   Like sets, maps are actually ordered
    –   Ordering determined by the type of the keys
●   Operate on entire collection or individual
    element by key
CQL 3.0 Maps
ALTER TABLE users ADD accounts map<text, text>;
●   Set entire value with map literal
    UPDATE users
    SET accounts = { 'twitter' : 'beobal',
                     'irc'     : 'sam___' }
    WHERE user_id = 'sam';


●   Access elements by key
    UPDATE users
    SET accounts['irc'] = 'beobal@freenode.net'
    WHERE user_id = 'sam';

    DELETE accounts['twitter']
    FROM users
    WHERE user_id = 'sam';
CQL 3.0 Collections
●   Can only retrieve a collection in its entirety
    –   Collections are not intended to be very large
    –   Not a replacement for a good data model
●   Typed but cannot currently be nested
    –   Can't define a list<list<int>>
●   No support for secondary indexes
    –   On the roadmap but not yet implemented
CQL 3.0 Binary Protocol
●   Motivations
    –   Ability to optimize for specific use-cases
    –   Reduce client dependencies
    –   Lay groundwork for future enhancements such
        as streaming/paging
CQL 3.0 Binary Protocol
●   Framed protocol, designed natively for
    CQL 3.0
●   Built on Netty
●   Java, Python & C drivers in development
●   Switched off by default in 1.2
●   Thrift API will remain, no breaking API
    changes
Virtual Nodes
●   Evolution of Cassandra's distribution model
●   Nodes own multiple token ranges
●   Big implications for operations
    –   Replacing failed nodes
    –   Growing / shrinking cluster
●   Stay put for Eric
Concurrent Schema Changes
●   Online schema changes introduced in 0.7
●   Initially schema changes needed serializing
    –   Update schema
    –   Wait for propagation before making next
        change
●   Mostly fixed in 1.1
●   Completed in 1.2
    –   Concurrent ColumnFamily creation
Better JBOD Support
●   Growing data volumes and compaction
    requirements means moar disk
●   Pre 1.2 best practice is to run RAID10
●   Improvements to handling disk failures
    –   disk_failure_policy : stop | best_effort | ignore

    –   best_effort implications   for writes / reads
●   Better write balancing for less contention
Misc Other Stuff
●   Query Tracing
●   Speedup slicing wide rows from disk
●   Atomic Batches
Future Plans
●   Improved caching for wide rows
●   Remove 2-phase compaction
●   Remove SuperColumns internally
Thank you

Weitere ähnliche Inhalte

Was ist angesagt?

MariaDB Cassandra Interoperability
MariaDB Cassandra InteroperabilityMariaDB Cassandra Interoperability
MariaDB Cassandra Interoperability
Colin Charles
 

Was ist angesagt? (20)

Cassandra 3.0 advanced preview
Cassandra 3.0 advanced previewCassandra 3.0 advanced preview
Cassandra 3.0 advanced preview
 
CQL3 in depth
CQL3 in depthCQL3 in depth
CQL3 in depth
 
PostgreSQL Database Slides
PostgreSQL Database SlidesPostgreSQL Database Slides
PostgreSQL Database Slides
 
Python SQite3 database Tutorial | SQlite Database
Python SQite3 database Tutorial | SQlite DatabasePython SQite3 database Tutorial | SQlite Database
Python SQite3 database Tutorial | SQlite Database
 
Modern query optimisation features in MySQL 8.
Modern query optimisation features in MySQL 8.Modern query optimisation features in MySQL 8.
Modern query optimisation features in MySQL 8.
 
Cassandra Day SV 2014: Netflix’s Astyanax Java Client Driver for Apache Cassa...
Cassandra Day SV 2014: Netflix’s Astyanax Java Client Driver for Apache Cassa...Cassandra Day SV 2014: Netflix’s Astyanax Java Client Driver for Apache Cassa...
Cassandra Day SV 2014: Netflix’s Astyanax Java Client Driver for Apache Cassa...
 
How to Avoid Pitfalls in Schema Upgrade with Galera
How to Avoid Pitfalls in Schema Upgrade with GaleraHow to Avoid Pitfalls in Schema Upgrade with Galera
How to Avoid Pitfalls in Schema Upgrade with Galera
 
Cassandra and Spark
Cassandra and Spark Cassandra and Spark
Cassandra and Spark
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache Cassandra
 
Cassandra and Rails at LA NoSQL Meetup
Cassandra and Rails at LA NoSQL MeetupCassandra and Rails at LA NoSQL Meetup
Cassandra and Rails at LA NoSQL Meetup
 
MySQL database replication
MySQL database replicationMySQL database replication
MySQL database replication
 
A brief introduction to PostgreSQL
A brief introduction to PostgreSQLA brief introduction to PostgreSQL
A brief introduction to PostgreSQL
 
MariaDB Cassandra Interoperability
MariaDB Cassandra InteroperabilityMariaDB Cassandra Interoperability
MariaDB Cassandra Interoperability
 
Cassandra 2.1 boot camp, Read/Write path
Cassandra 2.1 boot camp, Read/Write pathCassandra 2.1 boot camp, Read/Write path
Cassandra 2.1 boot camp, Read/Write path
 
Mongodb replication
Mongodb replicationMongodb replication
Mongodb replication
 
MariaDB for developers
MariaDB for developersMariaDB for developers
MariaDB for developers
 
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
 
Cassandra summit keynote 2014
Cassandra summit keynote 2014Cassandra summit keynote 2014
Cassandra summit keynote 2014
 
Cassandra Summit 2015
Cassandra Summit 2015Cassandra Summit 2015
Cassandra Summit 2015
 
How to understand Galera Cluster - 2013
How to understand Galera Cluster - 2013How to understand Galera Cluster - 2013
How to understand Galera Cluster - 2013
 

Andere mochten auch

Tmjh English
Tmjh EnglishTmjh English
Tmjh English
tmjhpri
 

Andere mochten auch (7)

Open Source In Further Education
Open Source In Further EducationOpen Source In Further Education
Open Source In Further Education
 
Open Source Basics
Open Source BasicsOpen Source Basics
Open Source Basics
 
Tmjh English
Tmjh EnglishTmjh English
Tmjh English
 
Browsing History for Mozilla
Browsing History for MozillaBrowsing History for Mozilla
Browsing History for Mozilla
 
Sustainable Podcasting
Sustainable PodcastingSustainable Podcasting
Sustainable Podcasting
 
94light
94light94light
94light
 
A forge is just a tool, but is it the right tool?
A forge is just a tool, but is it the right tool?A forge is just a tool, but is it the right tool?
A forge is just a tool, but is it the right tool?
 

Ähnlich wie Cassandra 2012

Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
Roland Bouman
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
Roland Bouman
 
NoCOUG_201411_Patel_Managing_a_Large_OLTP_Database
NoCOUG_201411_Patel_Managing_a_Large_OLTP_DatabaseNoCOUG_201411_Patel_Managing_a_Large_OLTP_Database
NoCOUG_201411_Patel_Managing_a_Large_OLTP_Database
Paresh Patel
 

Ähnlich wie Cassandra 2012 (20)

PostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major FeaturesPostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major Features
 
Introduction to PostgreSQL
Introduction to PostgreSQLIntroduction to PostgreSQL
Introduction to PostgreSQL
 
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
 
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
 
Recent MariaDB features to learn for a happy life
Recent MariaDB features to learn for a happy lifeRecent MariaDB features to learn for a happy life
Recent MariaDB features to learn for a happy life
 
MariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAsMariaDB 10.11 key features overview for DBAs
MariaDB 10.11 key features overview for DBAs
 
Avoiding Pitfalls for Cassandra.pdf
Avoiding Pitfalls for Cassandra.pdfAvoiding Pitfalls for Cassandra.pdf
Avoiding Pitfalls for Cassandra.pdf
 
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0
 
The Apache Cassandra ecosystem
The Apache Cassandra ecosystemThe Apache Cassandra ecosystem
The Apache Cassandra ecosystem
 
What’s New In PostgreSQL 9.3
What’s New In PostgreSQL 9.3What’s New In PostgreSQL 9.3
What’s New In PostgreSQL 9.3
 
10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL
 
Cassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingCassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series Modeling
 
dbs class 7.ppt
dbs class 7.pptdbs class 7.ppt
dbs class 7.ppt
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
 
Apache Cassandra at Macys
Apache Cassandra at MacysApache Cassandra at Macys
Apache Cassandra at Macys
 
NoCOUG_201411_Patel_Managing_a_Large_OLTP_Database
NoCOUG_201411_Patel_Managing_a_Large_OLTP_DatabaseNoCOUG_201411_Patel_Managing_a_Large_OLTP_Database
NoCOUG_201411_Patel_Managing_a_Large_OLTP_Database
 
BITS: Introduction to MySQL - Introduction and Installation
BITS: Introduction to MySQL - Introduction and InstallationBITS: Introduction to MySQL - Introduction and Installation
BITS: Introduction to MySQL - Introduction and Installation
 
Introduction to Postrges-XC
Introduction to Postrges-XCIntroduction to Postrges-XC
Introduction to Postrges-XC
 
Think Exa!
Think Exa!Think Exa!
Think Exa!
 

Kürzlich hochgeladen

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Kürzlich hochgeladen (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 

Cassandra 2012

  • 1. Cassandra 2012: What's New & Upcoming Sam Tunnicliffe
  • 2. sam@datastax.com ● DSE : integrated Big Data platform – Built on Cassandra – Analytics using Hadoop (Hive/Pig/Mahout) – Enterprise Search with Solr
  • 3. Cassandra in 2012 ● Cassandra 1.1 – April 2012 ● Cassandra 1.2 – November 2012 ● What's on the roadmap
  • 4. Release Schedule Version Release Date 0.5 24 Jan 2010 0.6 13 April 2010 0.7 9 January 2011 0.8 2 June 2011 1.0 18 October 2011 1.1 24 April 2012 1.2 End November 2012 ? ● Currently working to six month release cycle ● Minor releases as and when necessary
  • 5. Global Row & Key Caches ● Prior to 1.1 separate row & key caches per column family ● Since 1.1 single row cache & single key cache shared across all column families ● Simpler configuration – Globally: {key,row}_cache_size_in_mb – Per CF: ALL|NONE|KEYS_ONLY|ROWS_ONLY
  • 6. More Granular Storage Config ● Pre 1.1 one storage location per keyspace – /var/lib/cassandra/ks/cf-hc-1-Data.db ● In 1.1 one location per column family – /var/lib/cassandra/ks/cf/ks-cf-hc-1-Data.db ● Allows you to pin certain data to particular storage system
  • 9. Row Level Isolation ● Batched writes within row have always been atomic ● Batched writes within row are now also isolated
  • 10. Row Level Isolation ● So Writing UPDATE foo SET column_x='a' AND column_y='1' WHERE id='bar'; ● Followed by UPDATE foo SET column_x='b' AND column_y='2' WHERE id='bar'; ● No reader ever sees either {column_a:'a', column_b:'2'} {column_x:'b', column_y:'1'}
  • 11. Misc Other Stuff ● Windows off heap cache ● write survey mode ● commitlog segment pre-allocation/recycling ● abortable compactions ● multi threaded streaming ● Hadoop improvements – utilise secondary indexes from Hadoop – better wide row support
  • 12. CQL 3.0 ● Beta in 1.1, finalized in 1.2 ● Motivations – Better support for wide rows – Native syntax for composites
  • 13. CQL 3.0 ● Wide Rows ● Composites ● Transposition
  • 14. CQL 3.0 CREATE TABLE comments ( id text, posted_at timestamp, author text, content text, karma int, PRIMARY KEY( id, posted_at ) ); INSERT INTO comments (id, posted_at, author, content, karma) VALUES ('article_x', '2012-10-31T08:00:00', 'freddie', 'awesome', 100); INSERT INTO comments (id, posted_at, author, content, karma) VALUES ('article_x', '2012-10-31T09:59:59', 'rageguy', 'F7U12', 0);
  • 15. CQL 3.0 Transposition { article_x : { 2012-10-31T08:00:00 + author => freddie } { 2012-10-31T08:00:00 + content => awesome } { 2012-10-31T08:00:00 + karma => 100 } { 2012-10-31T09:59:59 + author => rageguy } { 2012-10-31T09:59:59 + content => F7U12 } { 2012-10-31T09:59:59 + karma => 1 } } { article_y : { 2012-10-28T15:30:12 + author... cqlsh:hn> select * from comments where id = 'article_x'; id | posted_at | author | content | karma -----------+---------------------+---------+---------+------- article_x | 2012-10-31 08:00:00 | freddie | awesome | 100 article_x | 2012-10-31 09:59:59 | rageguy | F7U12 | 1
  • 16. CQL 3.0 Transposition { article_x : { 2012-10-31T08:00:00 + author => freddie } { 2012-10-31T08:00:00 + content => awesome } { 2012-10-31T08:00:00 + karma => 100 } { 2012-10-31T09:59:59 + author => rageguy } { 2012-10-31T09:59:59 + content => F7U12 } { 2012-10-31T09:59:59 + karma => 1 } } { article_y : { 2012-10-28T15:30:12 + author... cqlsh:hn> select * from comments where id = 'article_x' and posted_at <= '2012-10-31T08:00:00' ; id | posted_at | author | content | karma -----------+---------------------+---------+---------+------- article_x | 2012-10-31 08:00:00 | freddie | awesome | 100
  • 17. CQL 3.0 Collections ● Typed collections – Set, Map, List ● Good fit for denormalizing small collections ● Less ugly, more efficient than previous approach ● Composites under the hood ● Each element stored as one column – So each can have an individual TTL
  • 18. CQL 3.0 Sets ● Collection of typed elements ● No duplicate elements ● Actually a sorted set – Ordering determined by natural order elements
  • 19. CQL 3.0 Sets CREATE TABLE users ( user_id text PRIMARY KEY, first_name text, last_name text, emails set<text> ); ● Set entire value with set literal INSERT INTO users (user_id, first_name, last_name, emails) VALUES('sam', 'Sam', 'Tunnicliffe', {'sam@beobal.com', 'sam@datastax.com'});
  • 20. CQL 3.0 Sets ● Insert and remove UPDATE users SET emails = emails + {'sam@cohodo.net'} WHERE user_id = 'sam'; UPDATE users SET emails = emails – {'sam@cohodo.net'} WHERE user_id = 'sam' ● No distinction between empty and null set UPDATE users SET emails = {} WHERE user_id = 'sam'; DELETE emails FROM users WHERE user_id = 'sam';
  • 21. CQL 3.0 Lists ● Ordering determined by user, not by natural ordering of elements ● Allows multiple occurrences of same value
  • 22. CQL 3.0 Lists ALTER TABLE users ADD todo list<text>; ● Set, Prepend & Append UPDATE users SET todo = [ 'get to Sinsheim', 'give presentation' ] WHERE user_id = 'sam'; UPDATE users SET todo = [ 'finish slides' ] + todo WHERE user_id = 'sam'; UPDATE users SET todo = todo + [ 'drink beer' ] WHERE user_id = 'sam';
  • 23. CQL 3.0 Lists ● Access elements by index – Less performant, requires read of entire list UPDATE users SET todo[2] = 'slow down' WHERE user_id = 'sam'; DELETE todo[3] FROM users WHERE user_id = 'sam'; ● Remove by value UPDATE users SET todo = todo – ['finish slides'] WHERE user_id = 'sam';
  • 24. CQL 3.0 Maps ● Collection of typed key/value pairs ● Like sets, maps are actually ordered – Ordering determined by the type of the keys ● Operate on entire collection or individual element by key
  • 25. CQL 3.0 Maps ALTER TABLE users ADD accounts map<text, text>; ● Set entire value with map literal UPDATE users SET accounts = { 'twitter' : 'beobal', 'irc' : 'sam___' } WHERE user_id = 'sam'; ● Access elements by key UPDATE users SET accounts['irc'] = 'beobal@freenode.net' WHERE user_id = 'sam'; DELETE accounts['twitter'] FROM users WHERE user_id = 'sam';
  • 26. CQL 3.0 Collections ● Can only retrieve a collection in its entirety – Collections are not intended to be very large – Not a replacement for a good data model ● Typed but cannot currently be nested – Can't define a list<list<int>> ● No support for secondary indexes – On the roadmap but not yet implemented
  • 27. CQL 3.0 Binary Protocol ● Motivations – Ability to optimize for specific use-cases – Reduce client dependencies – Lay groundwork for future enhancements such as streaming/paging
  • 28. CQL 3.0 Binary Protocol ● Framed protocol, designed natively for CQL 3.0 ● Built on Netty ● Java, Python & C drivers in development ● Switched off by default in 1.2 ● Thrift API will remain, no breaking API changes
  • 29. Virtual Nodes ● Evolution of Cassandra's distribution model ● Nodes own multiple token ranges ● Big implications for operations – Replacing failed nodes – Growing / shrinking cluster ● Stay put for Eric
  • 30. Concurrent Schema Changes ● Online schema changes introduced in 0.7 ● Initially schema changes needed serializing – Update schema – Wait for propagation before making next change ● Mostly fixed in 1.1 ● Completed in 1.2 – Concurrent ColumnFamily creation
  • 31. Better JBOD Support ● Growing data volumes and compaction requirements means moar disk ● Pre 1.2 best practice is to run RAID10 ● Improvements to handling disk failures – disk_failure_policy : stop | best_effort | ignore – best_effort implications for writes / reads ● Better write balancing for less contention
  • 32. Misc Other Stuff ● Query Tracing ● Speedup slicing wide rows from disk ● Atomic Batches
  • 33. Future Plans ● Improved caching for wide rows ● Remove 2-phase compaction ● Remove SuperColumns internally