#Cassandra13
Real World, Real-Time
Data Modeling
(for analytics apps)
Tim Moreton
Founder and CTO, Acunu

WE C*
• Virtual nodes
• CQL support

What folk use C* for

Real-time analytics
• e.g. click stream, telemetry, logs
• 100x more writes than reads
• Almost all reads are to results
• Almost no writes are ‘updates’
• Really not going to fit in RAM

Session storage
• e.g. user profiles
• Create, Read, Update, Delete
• Probably mostly reads
• Probably wants atomicity
• Probably fits in RAM

What folk use C* for
Real-time analytics: S WP HA ACID
Session storage: S WP HA ACID

Example use case
{
time: 13:50:11,
latitude: 12.5,
longitude: -43.4,
duration: 24,
device_type: ..
}
Call detail records
tens of thousands/sec
Real-time dashboards

C* Data Modeling 101
• Denormalise: writes (and disk) are cheap, reads are expensive, so insert data in every arrangement in which you need to read it
• Items you’ll access together, and want sorted: put in the same row
• Sets of items you’re likely to access separately: keep in separate rows
• Atomic counters are the building block of Cassandra real-time analytics apps
Diagram labels: row1, row2, row3; one event → update; one query → read
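A minimal sketch of the denormalisation rule, assuming the diagram means that one event fans out into counter updates on several rows while one query reads a single row. It uses an in-memory dictionary as a stand-in for a counter column family; it is not Cassandra driver code, and the row arrangements `by_hour` and `by_device` are hypothetical examples.

```python
from collections import defaultdict

# In-memory stand-in for a counter column family: row key -> column -> count.
# Assumption: a real deployment would use Cassandra counter columns instead.
counters = defaultdict(lambda: defaultdict(int))

def record_event(event):
    """Fan one event out to every row arrangement we expect to read later."""
    hour, minute, _second = event["time"].split(":")
    # Arrangement 1 (hypothetical): per-hour row, one column per minute.
    counters[("by_hour", hour)][minute] += 1
    # Arrangement 2 (hypothetical): per-device row, one column per hour.
    counters[("by_device", event["device_type"])][hour] += 1

def read_row(row_key):
    """One query reads a single row; columns come back in sorted order."""
    return sorted(counters[row_key].items())

record_event({"time": "13:50:11", "device_type": "xx"})
record_event({"time": "13:50:42", "device_type": "yy"})
record_event({"time": "13:51:03", "device_type": "xx"})
```

Each write costs two counter increments here, but each read touches exactly one row, which is the trade the first bullet argues for.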

#1: Hierarchies
Counting occurrences by day, hour, min, sec
• One row for each value at each level in the hierarchy
• Columns encode sub-components for each level
13:00 ... :01→45 :02→62 :03→87
<day> ... :12→2930 :13→3520 :14→3034
13:01 ... :10→3 :11→4 :12→2
14:00
13:02
......
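The hierarchy layout can be sketched as follows, under the assumption that each event increments one counter per level: the day row gets a column per hour, the hour row a column per minute, and the minute row a column per second. The tuple row keys and the example date are hypothetical, and the dictionary stands in for Cassandra counter columns.

```python
from collections import defaultdict

# Row key -> column -> count; an in-memory stand-in for counter columns.
counters = defaultdict(lambda: defaultdict(int))

def hierarchy_updates(day, time):
    """Return the (row key, column) pairs touched by one event."""
    hour, minute, second = time.split(":")
    return [
        ((day,), hour),                 # day row: one column per hour
        ((day, hour), minute),          # hour row: one column per minute
        ((day, hour, minute), second),  # minute row: one column per second
    ]

def count_event(day, time):
    """Increment one counter at every level of the hierarchy."""
    for row_key, column in hierarchy_updates(day, time):
        counters[row_key][column] += 1

# The date is a made-up example value.
count_event("2013-06-11", "13:02:11")
count_event("2013-06-11", "13:02:40")
count_event("2013-06-11", "13:45:00")
```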

#1: Hierarchies
Counting occurrences by day, hour, min, sec
{
time: 13:02:11,
....
}
11:59 -> 13:02
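Assuming the "11:59 -> 13:02" annotation denotes a query time range, one way to answer it from the hierarchy is to read whole hours from the day row and only the ragged edges from the hour rows. The sketch below plans those reads; the half-open range convention, the row labels and the function name `plan_reads` are assumptions made for illustration.

```python
def plan_reads(start, end):
    """Split the half-open minute range [start, end) into hierarchy reads.

    Times are (hour, minute) tuples within one day. Returns a list of
    (row, first_column, last_column) with inclusive column bounds.
    """
    (sh, sm), (eh, em) = start, end
    if (sh, sm) >= (eh, em):
        return []
    if sh == eh:
        # Range stays inside one hour: a single slice of that hour row.
        return [(f"hour {sh:02d}", sm, em - 1)]
    reads = []
    first_full_hour = sh
    if sm > 0:
        # Partial leading hour: read minute columns from its hour row.
        reads.append((f"hour {sh:02d}", sm, 59))
        first_full_hour = sh + 1
    if first_full_hour < eh:
        # Whole hours: read hour columns from the day row.
        reads.append(("day", first_full_hour, eh - 1))
    if em > 0:
        # Partial trailing hour: read minute columns from its hour row.
        reads.append((f"hour {eh:02d}", 0, em - 1))
    return reads
```

For example, `plan_reads((11, 59), (13, 2))` yields three slices: minute 59 of the hour-11 row, hour 12 of the day row, and minutes 0 to 1 of the hour-13 row.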

#2: Filtering
Adding ‘WHERE’s
To filter on a field, make sure it is in the partition key
{
time: 13:50:11,
device_type: xx,
}
Rows (each time bucket split by device_type value, xx or yy):
13:00 (xx, yy) ... :01→45 :02→62 :03→87
<day> (xx, yy)
13:01 (xx, yy)
14:00
13:02 (xx, yy)
......
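A rough sketch of the filtering rule: the filtered field becomes part of the row (partition) key, so a 'WHERE device_type = xx' query is a single-row read. The extra unfiltered row keyed with `None` is an assumption added here so that queries without a filter still work; the dictionary stands in for Cassandra counter columns.

```python
from collections import defaultdict

# Row key (hour, device_type) -> minute column -> count.
counters = defaultdict(lambda: defaultdict(int))

def count_filtered_event(event):
    hour, minute, _second = event["time"].split(":")
    # The filtered field is part of the row (partition) key.
    counters[(hour, event["device_type"])][minute] += 1
    # Assumption: also keep an unfiltered row for queries with no WHERE.
    counters[(hour, None)][minute] += 1

def count_where(hour, device_type=None):
    """Per-minute counts for one hour, optionally filtered by device_type."""
    return dict(counters[(hour, device_type)])

count_filtered_event({"time": "13:50:11", "device_type": "xx"})
count_filtered_event({"time": "13:50:30", "device_type": "yy"})
count_filtered_event({"time": "13:51:00", "device_type": "xx"})
```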

#3: Grouping
Adding ‘GROUP BY’
{
time: 13:50:11,
device_type: xx,
}
13:00 ... :01, xx→45 :01, yy→3 :02, xx→7
<day> ... :12, xx→1012 :12, yy→542 :13, xx→228
14:00
......
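The grouping layout can be sketched with composite column names: the grouped field moves into the column key, next to the time component, so one row read returns every group for a time slice. This is an in-memory illustration with hypothetical function names, not Cassandra driver code.

```python
from collections import defaultdict

# Row key (hour) -> composite column (minute, device_type) -> count.
counters = defaultdict(lambda: defaultdict(int))

def count_grouped_event(event):
    hour, minute, _second = event["time"].split(":")
    counters[hour][(minute, event["device_type"])] += 1

def group_by_device(hour, first_minute, last_minute):
    """Sum counts per device_type over an inclusive minute range."""
    totals = defaultdict(int)
    for (minute, device_type), n in counters[hour].items():
        # Zero-padded minute strings compare correctly as text.
        if first_minute <= minute <= last_minute:
            totals[device_type] += n
    return dict(totals)

count_grouped_event({"time": "13:50:11", "device_type": "xx"})
count_grouped_event({"time": "13:50:30", "device_type": "yy"})
count_grouped_event({"time": "13:51:00", "device_type": "xx"})
```

Compared with the filtering layout, the group values share a row, so a single slice returns all of them at the cost of wider rows.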

#4: Drilldown
Going from counts to the constituent events
Use an identifier in the column key and store the event in a different ColumnFamily
13:00 ... :01, e3→- :01, e4→- :02, e5→-
<day> ... :12, e1→- :12, e2→- :13, e3→-
14:00
......
{
_id: e3,
time: 13:01:11,
device_type: xx,
}
e3 time → 13:01:11 device_type → xx ...
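A minimal sketch of the drilldown pattern, assuming two stores: an index whose column keys carry the time component plus the event identifier (with no value), and a separate event store keyed by identifier. Both are plain dictionaries standing in for two column families; the function names are hypothetical.

```python
# Index "column family": row key (hour) -> set of (minute, event_id) column keys.
index = {}
# Event "column family": event_id -> full event.
events = {}

def store_event(event):
    hour, minute, _second = event["time"].split(":")
    events[event["_id"]] = event
    # The column key carries the identifier; there is no column value.
    index.setdefault(hour, set()).add((minute, event["_id"]))

def drilldown(hour, minute):
    """Resolve the identifiers indexed under one minute to full events."""
    ids = sorted(eid for m, eid in index.get(hour, ()) if m == minute)
    return [events[eid] for eid in ids]

store_event({"_id": "e3", "time": "13:01:11", "device_type": "xx"})
store_event({"_id": "e4", "time": "13:01:40", "device_type": "yy"})
store_event({"_id": "e5", "time": "13:02:05", "device_type": "xx"})
```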

Put it together...
Source: http://paintcutpaste.com/pollock-splatter-painting/

Schema agility
Source: http://thoughtstream-distantechoes.blogspot.com/2011/06/13062011_13.html

Components: API, event stream, event store, roll-up cubes, Ingest, Processing, dashboard, queries, programmatic interface
• Cassandra stores raw events, aggregates, data model definition
• Acunu Analytics maps events and SQL-like queries into C* ops
• PROCESSING AT INGEST
• JSON, CSV, log ingest via RESTful HTTP API, Flume, Storm, AMQP
• Acunu Dashboards provides rich, real-time, embeddable visualizations
• AQL, Alerting, Cubes:
SELECT AVG(r)
FROM metrics
GROUP BY host;
• MILLISECOND QUERIES
• API for rich queries, threshold alerting
• Backfill historic results for new cubes to enable agile schema changes
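A rough sketch of how a roll-up cube and backfill might fit together, using the slide's `SELECT AVG(r) FROM metrics GROUP BY host` query as the target. The `AvgCube` class and `backfill` function are hypothetical illustrations, not Acunu Analytics APIs: the cube keeps a running sum and count per host at ingest, and a newly defined cube is populated by replaying the raw event store.

```python
from collections import defaultdict

class AvgCube:
    """Roll-up that keeps a running [sum, count] per group key."""

    def __init__(self, group_field, value_field):
        self.group_field = group_field
        self.value_field = value_field
        self.totals = defaultdict(lambda: [0.0, 0])

    def ingest(self, event):
        """Update the aggregate when an event arrives."""
        cell = self.totals[event[self.group_field]]
        cell[0] += event[self.value_field]
        cell[1] += 1

    def averages(self):
        """Answer AVG(value) GROUP BY group without scanning raw events."""
        return {key: s / n for key, (s, n) in self.totals.items()}

def backfill(cube, event_store):
    """Populate a newly defined cube by replaying stored raw events."""
    for event in event_store:
        cube.ingest(event)
    return cube

# Made-up raw events standing in for the event store.
event_store = [
    {"host": "a", "r": 10.0},
    {"host": "a", "r": 20.0},
    {"host": "b", "r": 5.0},
]
cube = backfill(AvgCube("host", "r"), event_store)
```

Because the query is answered from per-host aggregates rather than raw events, adding a new cube only requires one replay of the event store.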

Apache, Apache Cassandra, Cassandra, Flume, and the eye logos are trademarks of the Apache Software Foundation.
@timmoreton
@acunu
Thanks!

C* Summit 2013: Real World, Real Time Data Modeling by Tim Moreton
