Brian O'Neill, Lead Architect, Health Market Science




                                                bone@alumni.brown.edu
                                                @boneill42
 Background
 Setup
 Data Model / Schema
 Naughty List (Astyanax)
 Toy List (CQL)
Our Problem




 Good, bad doctors? Dead doctors?
 Prescriber eligibility and remediation.
The World-Wide Globally Scalable Naughty List!
   How about a Naughty and
    Nice list for Santa?

   1.9 billion children
     That will fit in a single row!


   Queries to support:
     Children can log in and check
      their standing.
     Santa can find nice children
      by country, state or zip.
Installation
   As easy as…
     Download
     http://cassandra.apache.org/download/

     Uncompress
     tar -xvzf apache-cassandra-1.2.0-beta3-bin.tar.gz

     Run
     bin/cassandra -f
      (-f puts it in foreground)
Configuration
   conf/cassandra.yaml
start_native_transport: true   # CHANGE THIS TO TRUE
commitlog_directory: /var/lib/cassandra/commitlog



   conf/log4j-server.properties
log4j.appender.R.File=/var/log/cassandra/system.log
Data Model
 Schema (a.k.a. Keyspace)
 Table (a.k.a. Column Family)
 Row
     Have an arbitrary number of columns
     Validator for keys (e.g. UTF8Type)
   Column
     Validator for values and keys
     Comparator for keys (e.g. DateType or BYOC)

    (http://www.youtube.com/watch?v=bKfND4woylw)
Distributed Architecture
   Nodes form a token ring.

   Nodes partition the ring by initial token
     initial_token: (in cassandra.yaml)


   Partitioners map row keys to tokens.
     Usually randomly, to evenly distribute the data


   All columns for a row are stored together on disk
    in sorted order.
Visually
Token/Hash Range: 0-99

Row     Hash
Alice   50
Bob     3
Eve     15

[Ring diagram with a segment labeled (1-33)]
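
A rough sketch of how a row's token could be mapped to a node on the ring, reusing the hash values from the slide (Alice 50, Bob 3, Eve 15). The node names and the tokens 33, 66 and 99 are assumptions made up for illustration; a real partitioner computes the token from the row key rather than taking it as a given.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a toy token ring over the 0-99 range from the slide.
// Node names and token assignments are assumptions, not Cassandra internals.
class TokenRingSketch {
    // Maps each node's token to its name; a node owns the tokens
    // after the previous node's token, up to and including its own.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(String name, int token) {
        ring.put(token, name);
    }

    // Find the first node whose token is >= the row's token, wrapping
    // around to the lowest token if the row falls past the last node.
    String ownerOf(int rowToken) {
        Map.Entry<Integer, String> entry = ring.ceilingEntry(rowToken);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        ring.addNode("node1", 33);
        ring.addNode("node2", 66);
        ring.addNode("node3", 99);
        System.out.println("Alice -> " + ring.ownerOf(50)); // node2
        System.out.println("Bob   -> " + ring.ownerOf(3));  // node1
        System.out.println("Eve   -> " + ring.ownerOf(15)); // node1
    }
}
```

With these made-up tokens, Bob and Eve land on the same node, which is the kind of imbalance that hashing row keys evenly is meant to reduce at scale.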
Java Interpretation
 Each table is a Distributed HashMap
 Each row is a SortedMap.

Cassandra provides a massively scalable version of:
HashMap<rowKey, SortedMap<columnKey, columnValue>>


   Implications:
     Direct row fetch is fast.
     Searching a range of rows can be costly.
     Searching a range of columns is cheap.
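
The analogy above can be tried out in plain Java. This is only an in-memory sketch of the mental model, not how Cassandra stores data; the class name, method names and the sample attribute values are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// In-memory stand-in for one table: HashMap<rowKey, SortedMap<columnKey, columnValue>>.
class TableSketch {
    private final Map<String, SortedMap<String, String>> rows = new HashMap<>();

    void put(String rowKey, String columnKey, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnKey, value);
    }

    // Direct row fetch: a single hash lookup.
    SortedMap<String, String> getRow(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }

    // Column range within one row: a view over the sorted map
    // (from is inclusive, to is exclusive).
    SortedMap<String, String> sliceColumns(String rowKey, String from, String to) {
        return getRow(rowKey).subMap(from, to);
    }

    public static void main(String[] args) {
        TableSketch children = new TableSketch();
        // Sample values are hypothetical.
        children.put("owen.oneill", "country", "IRL");
        children.put("owen.oneill", "firstName", "Owen");
        children.put("owen.oneill", "lastName", "O'Neill");
        children.put("owen.oneill", "state", "D");
        children.put("owen.oneill", "zip", "EI33");
        // Columns from "f" up to (not including) "m": firstName and lastName.
        System.out.println(children.sliceColumns("owen.oneill", "f", "m").keySet());
    }
}
```

A range of rows has no such shortcut here: the outer HashMap keeps row keys in no useful order, so a range search would have to visit every row.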
Two Tables
 Children     Table
     Store all the children in the world.
     One row per child.
     One column per attribute.


   NaughtyOrNice Table
     Supports the queries we anticipate
     Wide-Row Strategy
Details of the NaughtyOrNice List
   One row per standing:country
     Ensures all children in a country are grouped together
      on disk.

   One column per child using a compound key
     Ensures the columns are sorted to support our search
      at varying levels of granularity
      ○ e.g. All nice children in the US.
      ○ e.g. All naughty children in PA.
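
To make the "varying levels of granularity" point concrete, here is a small sketch of one wide row whose column names are state:zip:childId strings kept in sorted order. Joining the components with ":" is a simplification assumed for this example; Cassandra's composite comparator compares component by component. The class and method names are invented.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// One wide row (e.g. "naughty:USA") with composite column names.
class WideRowSketch {
    // Composite column name "state:zip:childId" -> empty value.
    private final TreeMap<String, String> columns = new TreeMap<>();

    void addChild(String state, String zip, String childId) {
        columns.put(state + ":" + zip + ":" + childId, "");
    }

    // All columns whose name starts with the prefix, found purely
    // through the sort order (no scan over unrelated columns).
    SortedMap<String, String> slice(String prefix) {
        return columns.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        WideRowSketch naughtyUsa = new WideRowSketch();
        naughtyUsa.addChild("CA", "94111", "bart.simpson");
        naughtyUsa.addChild("CA", "94222", "dennis.menace");
        naughtyUsa.addChild("PA", "18964", "michael.myers");
        System.out.println(naughtyUsa.slice("PA:").keySet());       // by state
        System.out.println(naughtyUsa.slice("CA:94111:").keySet()); // by state and zip
    }
}
```

The same slice call serves a state-level or a zip-level search; only the prefix length changes.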
Visually
(1) Go to the row.
(2) Get the column slice.

   Node 1   Nice:USA
            CA:94333:johny.b.good
            CA:94333:richie.rich

   Node 2   Nice:IRL
            D:EI33:collin.oneill
            D:EI33:owen.oneill

   Node 3   Naughty:USA
            CA:94111:bart.simpson
            CA:94222:dennis.menace
            PA:18964:michael.myers

Watch out for:
• Hot spotting
• Unbalanced Clusters
Our Schema
   bin/cqlsh -3
       CREATE KEYSPACE northpole WITH replication = {'class':'SimpleStrategy',
        'replication_factor':1};

       create table children ( childId varchar, firstName varchar, lastName varchar, timezone varchar,
        country varchar, state varchar, zip varchar, primary key (childId ) ) WITH COMPACT STORAGE;

       create table naughtyOrNiceList ( standingByCountry varchar, state varchar, zip varchar,
        childId varchar, primary key (standingByCountry, state, zip, childId) );




   bin/cassandra-cli
     (the “old school” interface)
The CQL->Data Model Rules
   The first component of the primary key becomes the row key.

   Subsequent components of the primary key
    form a composite column name.

   One column is then written for each
    non-primary-key column.
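
The rules above can be sketched as a small mapping function. This is a simplified illustration, not Cassandra's storage code: joining components with ":" mirrors how cassandra-cli prints composite names, and the empty-named marker cell is an assumption about how a row with no non-key columns still appears on disk. All names are invented.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CqlRowMappingSketch {
    // Rule 1: the first primary key component is the row key.
    static String rowKey(List<String> primaryKey) {
        return primaryKey.get(0);
    }

    // Rules 2 and 3: the remaining components prefix every column name,
    // and one column is written per non-primary-key column.
    static Map<String, String> cells(List<String> primaryKey, Map<String, String> nonKeyColumns) {
        List<String> clustering = primaryKey.subList(1, primaryKey.size());
        String prefix = clustering.isEmpty() ? "" : String.join(":", clustering) + ":";
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put(prefix, ""); // assumed marker cell with an empty last component
        for (Map.Entry<String, String> column : nonKeyColumns.entrySet()) {
            cells.put(prefix + column.getKey(), column.getValue());
        }
        return cells;
    }

    public static void main(String[] args) {
        List<String> key = Arrays.asList("naughty:USA", "CA", "94111", "bart.simpson");
        System.out.println("RowKey: " + rowKey(key));
        System.out.println(cells(key, Collections.emptyMap()).keySet());
    }
}
```

For a table with only primary key columns, the single cell name ends in a trailing colon because its last component is empty.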
CQL View
cqlsh:northpole> select * from naughtyornicelist ;

 standingbycountry | state | zip   | childid
-------------------+-------+-------+---------------
 naughty:USA       | CA    | 94111 | bart.simpson
 naughty:USA       | CA    | 94222 | dennis.menace
 nice:IRL          | D     | EI33  | collin.oneill
 nice:IRL          | D     | EI33  | owen.oneill
 nice:USA          | CA    | 94333 | johny.b.good
 nice:USA          | CA    | 94333 | richie.rich
CLI View
[default@northpole] list naughtyornicelist;
Using default limit of 100
Using default column limit of 100
-------------------
RowKey: naughty:USA
=> (column=CA:94111:bart.simpson:, value=, timestamp=1355168971612000)
=> (column=CA:94222:dennis.menace:, value=, timestamp=1355168971614000)
-------------------
RowKey: nice:IRL
=> (column=D:EI33:collin.oneill:, value=, timestamp=1355168971604000)
=> (column=D:EI33:owen.oneill:, value=, timestamp=1355168971601000)
-------------------
RowKey: nice:USA
=> (column=CA:94333:johny.b.good:, value=, timestamp=1355168971610000)
=> (column=CA:94333:richie.rich:, value=, timestamp=1355168971606000)
Data Model Implications
select * from children where childid='owen.oneill';

select * from naughtyornicelist where childid='owen.oneill';
Bad Request:

select * from naughtyornicelist where
standingbycountry='nice:IRL' and state='D' and zip='EI33'
and childid='owen.oneill';
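
One way to read the three queries above: a lookup works when it names the row key and then walks the composite column name from left to right without skipping a component. The sketch below encodes that rule of thumb for equality filters only; it is an assumption-laden simplification (no secondary indexes, no range predicates), and the class and method names are invented.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class WhereClauseSketch {
    // True if the restricted columns include the row key and cover a
    // gap-free prefix of the composite column name components.
    static boolean canServe(String rowKeyColumn, List<String> compositeColumns, Set<String> restricted) {
        if (!restricted.contains(rowKeyColumn)) {
            return false; // no row to go to
        }
        boolean gapSeen = false;
        for (String column : compositeColumns) {
            if (restricted.contains(column)) {
                if (gapSeen) {
                    return false; // skipped an earlier component
                }
            } else {
                gapSeen = true;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> composite = Arrays.asList("state", "zip", "childid");
        // where childid = ...  -> rejected, the row key is missing
        System.out.println(canServe("standingbycountry", composite,
                new HashSet<>(Arrays.asList("childid"))));
        // where standingbycountry = ... and state = ... and zip = ... and childid = ...
        System.out.println(canServe("standingbycountry", composite,
                new HashSet<>(Arrays.asList("standingbycountry", "state", "zip", "childid"))));
    }
}
```

The children table passes with childid alone because childid is its row key and it has no composite components to satisfy.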
No, seriously. Let's code!
   What API should we use?

                      Production-Readiness   Potential   Momentum
    Thrift                    10                -1          -1
    Hector                    10                 8           8
    Astyanax                   8                 9          10
    Kundera (JPA)              6                 9           9
    Pelops                     7                 6           7
    Firebrand                  8                10           8
    PlayORM                    5                 8           7
    GORA                       6                 9           7
    CQL Driver                 ?                 ?           ?

                    Astyanax FTW!
Connect
this.astyanaxContext = new AstyanaxContext.Builder()
         .forCluster("ClusterName")
         .forKeyspace(keyspace)
         .withAstyanaxConfiguration(…)
         .withConnectionPoolConfiguration(…)
         .buildKeyspace(ThriftFamilyFactory.getInstance());


   Specify:
       Cluster Name (arbitrary identifier)
       Keyspace
       Node Discovery Method
       Connection Pool Information


Write/Update
MutationBatch mutation = keyspace.prepareMutationBatch();
columnFamily = new ColumnFamily<String, String>(columnFamilyName,
          StringSerializer.get(), StringSerializer.get());
mutation.withRow(columnFamily, rowKey)
         .putColumn(entry.getKey(), entry.getValue(), null);
mutation.execute();


   Process:
     Create a mutation
     Specify the Column Family with Serializers
     Put your columns.
     Execute
Composite Types
   Composite (a.k.a. Compound)

public class ListEntry {
  @Component(ordinal = 0)
  public String state;
  @Component(ordinal = 1)
  public String zip;
  @Component(ordinal = 2)
  public String childId;
}
Range Builders
range = entitySerializer.buildRange()
.withPrefix(state)
.greaterThanEquals("")
.lessThanEquals("99999");

Then...

.withColumnRange(range).execute();
CQL Collections!
http://www.datastax.com/dev/blog/cql3_collections

   Set
     UPDATE users SET emails = emails + {'fb@friendsofmordor.org'} WHERE
      user_id = 'frodo';

   List
     UPDATE users SET top_places = [ 'the shire' ] + top_places WHERE
      user_id = 'frodo';

   Maps
     UPDATE users SET todo['2012-10-2 12:10'] = 'die' WHERE user_id =
      'frodo';
CQL vs. Thrift
http://www.datastax.com/dev/blog/thrift-to-cql3

   Thrift is the legacy API on which all of the Java
    APIs are built.

   CQL comes with the new native protocol and driver.
Let's get back to cranking…
   Recreate the schema (to be CQL friendly)
   UPDATE children SET toys = toys + [ 'legos' ] WHERE childId = 'owen.oneill';



   Crank out a Dao layer to use CQL collections
    operations.
Shameless Shoutout(s)
 Virgil
 https://github.com/boneill42/virgil
     REST interface for Cassandra


   https://github.com/boneill42/storm-cassandra
     Distributed Processing on Cassandra
     (Webinar in January)
C*ollege Credit: Creating Your First App in Java with Cassandra
