Take a deep dive into best practices for Cassandra data modeling, with a review of a time series data modeling example. Partition key selection, data duplication, in-place aggregation, and the effective use of TTLs and DateTieredCompactionStrategy will all be covered.
4. Parking Sensors?
• Wireless / Battery Powered
• On Street Parking
• Cellular Network Gateways
• Real Time Acquisition / Delivery
5. Use Cases
• Real Time Parking Availability
• Utilization Analysis and Planning
• Predictive Analytics
• Directed Enforcement
• Operations, Performance and PM
12. Guidelines (time series)
• Level of Granularity
• Clustering Columns
• Enable range queries
• Managing Aggregations
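To make the clustering-column guideline concrete, here is a minimal sketch of a range query against the sensor_message table listed on the Results slide. The sensor id 'S-0001' and the 'YYYY-MM' format for year_month are illustrative assumptions; the deck does not show sample values.

```cql
-- Assumes the sensor_message table from the Results slide.
-- 'S-0001' and the 'YYYY-MM' bucket format are hypothetical.
SELECT event_time, event, battery
FROM sensor_message
WHERE sensor_id = 'S-0001'            -- partition key, part 1
  AND year_month = '2015-06'          -- partition key, part 2 (granularity)
  AND event_time >= '2015-06-01 00:00:00+0000'   -- clustering column slice
  AND event_time <  '2015-06-08 00:00:00+0000';
```

The two partition key columns pick one partition (one sensor, one month), and the clustering column event_time is what allows the inequality slice inside it.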
13. Sample Models
1 Sensor Message Archive
2 Parking Sessions
3 Current Space Status
14. Sensor Message Archive
• Every sensor message received
• Newer messages more important
• Show list of recent events (20)
• Validate message processing
• Drive sensor health reporting
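The "recent events (20)" requirement could be served roughly as follows, assuming the sensor_message table from the Results slide with its descending clustering order. The literal values are hypothetical.

```cql
-- Newest rows come first because the table clusters by event_time DESC,
-- so LIMIT 20 returns the 20 most recent events in this monthly bucket.
SELECT event_time, arr_time, event, battery, msg_count, gateway
FROM sensor_message
WHERE sensor_id = 'S-0001'
  AND year_month = '2015-06'
LIMIT 20;
```

Early in a month the current bucket may hold fewer than 20 rows, in which case the application would issue a second query against the previous year_month value.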
20. Options
Set a default TTL
AND default_time_to_live = 157680000
(5 years)
Use DateTieredCompactionStrategy
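The TTL value on this slide is five years expressed in seconds (5 × 365 × 86,400 = 157,680,000). A minimal sketch of applying it, and of overriding it for one write, is below. It assumes the sensor_message table from the Results slide; the row values and the 30-day override are illustrative only.

```cql
-- Apply the five-year default TTL to an existing table.
ALTER TABLE sensor_message WITH default_time_to_live = 157680000;

-- Override the default for a single write (30 days = 2592000 seconds).
INSERT INTO sensor_message (sensor_id, year_month, event_time, event)
VALUES ('S-0001', '2015-06', '2015-06-03 14:05:00+0000', 'ARRIVAL')
USING TTL 2592000;

-- Inspect the remaining TTL (in seconds) on a non-key column.
SELECT event, TTL(event)
FROM sensor_message
WHERE sensor_id = 'S-0001' AND year_month = '2015-06'
LIMIT 1;
```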
21. DateTieredCompactionStrategy (DTCS)
• Frequently higher performing for time series data
• Manages compaction by age of data in each SSTABLE
• Can significantly reduce compaction overhead
Best Practices
• Data is written in time order
• Data is immutable
• Old data is infrequently accessed
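As a sketch of how DTCS might be switched on for the archive table, the statement below sets the compaction class together with two of its tuning options. The option values are illustrative rather than recommendations from the deck, and the table is assumed to be the sensor_message table from the Results slide.

```cql
-- Option values are placeholders; tune them against real workload data.
ALTER TABLE sensor_message
WITH compaction = {
  'class': 'DateTieredCompactionStrategy',
  'base_time_seconds': '3600',      -- size of the first time window
  'max_sstable_age_days': '365'     -- stop compacting SSTables older than this
};
```

Later Cassandra releases deprecate DTCS in favor of TimeWindowCompactionStrategy, so it is worth checking which strategies the target version supports before adopting this setting.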
22. Sample Models
1 Sensor Message Archive
2 Parking Sessions
3 Current Space Status
24. Parking Session
• Start / End of each space being occupied
• Newer messages more important
• Show last session for a space
• Calculate occupancy / vacancy ratio
• Calculate space turnover (number of sessions in time period)
25. Parking Session
VALIDATE
• Duplication
• Partition Size:
– 25 * 365 * 3 = 27,375
• Hot Spotting
• Overwriting
space K
session_start C(D)
session_end
inferred
session
26. Parking Session
space K
session_start C(D)
session_end
inferred
session
create table session(
space text,
session_start timestamp,
session_end timestamp,
inferred text,
PRIMARY KEY (space, session_start)
)
WITH CLUSTERING ORDER BY (session_start DESC);
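The statements below sketch how the session table might be written and read to satisfy the Parking Session requirements. The space id 'SP-1001' and the 'N' value for inferred are hypothetical; the deck does not show sample data.

```cql
-- Session opens: write the start and leave session_end unset.
INSERT INTO session (space, session_start, inferred)
VALUES ('SP-1001', '2015-06-03 14:05:00+0000', 'N');

-- Session closes: same primary key, so the existing row is updated in place.
UPDATE session
SET session_end = '2015-06-03 15:20:00+0000'
WHERE space = 'SP-1001'
  AND session_start = '2015-06-03 14:05:00+0000';

-- Last session for a space: DESC clustering puts the newest row first.
SELECT session_start, session_end, inferred
FROM session
WHERE space = 'SP-1001'
LIMIT 1;

-- Space turnover: number of sessions that started in June 2015.
SELECT COUNT(*)
FROM session
WHERE space = 'SP-1001'
  AND session_start >= '2015-06-01 00:00:00+0000'
  AND session_start <  '2015-07-01 00:00:00+0000';
```

Because the close event reuses the (space, session_start) key, it overwrites rather than duplicates, which is the behavior the "Overwriting" validation check is looking for.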
27. Sample Models
1 Sensor Message Archive
2 Parking Sessions
3 Current Space Status
29. Current Space Status
• Last status change for every space
• Only one status per space
• High read / low write / almost no deletes
• Feed public way finding applications
30. Current Space Status
space K
customer
district
block
status
last_change
current_status
QUERIES
• Current Status
– by space
– by block
– by district
– by customer
31. Current Space Status
customer K
district C
block C
space
status
last_change
current_status
QUERIES
• Current Status
– by space
– by block
– by district
– by customer
32. Current Space Status
VALIDATE
• Duplication
• Partition Size:
– 10,000
• Hot Spotting
• Overwriting
customer K
district C
block C
space C
status
last_change
current_status
33. Current Space Status
VALIDATE
• Duplication
• Partition Size:
– 10,000
• Hot Spotting
• Overwriting
customer K
district K
block C
space C
status
last_change
current_status
34. Current Space Status
customer K
district K
block C
space C
status
last_change
current_status
create table current_status(
customer text,
district text,
block text,
street text,
space text,
status text,
last_change timestamp,
PRIMARY KEY ((customer, district), block, space)
);
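With (customer, district) as the partition key and block, space as clustering columns, the four lookups from the QUERIES list could be written roughly as below. All identifier values are hypothetical.

```cql
-- By district: reads one whole partition.
SELECT block, space, status, last_change
FROM current_status
WHERE customer = 'acme' AND district = 'downtown';

-- By block: restrict the first clustering column.
SELECT space, status, last_change
FROM current_status
WHERE customer = 'acme' AND district = 'downtown' AND block = 'B-12';

-- By space: full primary key, returns a single row.
SELECT status, last_change
FROM current_status
WHERE customer = 'acme' AND district = 'downtown'
  AND block = 'B-12' AND space = 'SP-1001';

-- By customer: the districts must be known up front, because district
-- is now part of the partition key. IN fans out to several partitions.
SELECT district, block, space, status
FROM current_status
WHERE customer = 'acme' AND district IN ('downtown', 'marina');
```

The by-customer query is the price of moving district into the partition key: it spreads load across more partitions, but a customer-wide read now touches one partition per district.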
35. Options
customer K
district K
block C
space C
status
last_change
current_status
Use In Memory Table?
• Relatively small foot print of data set
• Very few deletes
• Significant mutation in place
• Very high read / low latency requirements
36. Current Space Status
customer K
district K
block C
space C
status
last_change
current_status
create table current_status(
customer text,
district text,
block text,
space text,
status text,
last_change timestamp,
PRIMARY KEY ((customer, district), block, space)
) WITH compaction= { 'class': 'MemoryOnlyStrategy',
'size_limit_in_mb': 25 } AND caching = 'NONE';
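The "mutation in place" pattern that motivates the in-memory option can be sketched as a single-row update followed by a read. The identifiers and the 'OCCUPIED' status value are assumptions for illustration.

```cql
-- Each status change overwrites the one row for that space,
-- so the table never grows beyond one row per space.
UPDATE current_status
SET status = 'OCCUPIED',
    last_change = '2015-06-03 14:05:00+0000'
WHERE customer = 'acme' AND district = 'downtown'
  AND block = 'B-12' AND space = 'SP-1001';

-- Low-latency read path used by way finding applications.
SELECT space, status, last_change
FROM current_status
WHERE customer = 'acme' AND district = 'downtown' AND block = 'B-12';
```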
37. Results
create table current_status(
customer text,
district text,
block text,
space text,
status text,
last_change timestamp,
PRIMARY KEY ((customer, district), block, space)
) WITH compaction= { 'class': 'MemoryOnlyStrategy',
'size_limit_in_mb': 25 } AND caching = 'NONE';
create table sensor_message(
sensor_id text,
year_month text,
event_time timestamp,
arr_time timestamp,
space text,
event text,
battery text,
msg_count int,
gateway text,
PRIMARY KEY ((sensor_id, year_month), event_time )
) WITH CLUSTERING ORDER BY (event_time DESC);
create table last_sensor_message(
sensor_id text,
event_time timestamp,
arr_time timestamp,
space text,
event text,
battery text,
msg_count int,
gateway text,
PRIMARY KEY (sensor_id)
);
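The sensor_message and last_sensor_message tables hold the same message twice, which is the data duplication the deck refers to. One way the dual write might look is a logged batch, sketched below with hypothetical values.

```cql
-- A logged batch ensures both writes are eventually applied together.
BEGIN BATCH
  INSERT INTO sensor_message
    (sensor_id, year_month, event_time, arr_time, space, event,
     battery, msg_count, gateway)
  VALUES
    ('S-0001', '2015-06', '2015-06-03 14:05:00+0000',
     '2015-06-03 14:05:02+0000', 'SP-1001', 'ARRIVAL', '3.1V', 42, 'GW-7');

  INSERT INTO last_sensor_message
    (sensor_id, event_time, arr_time, space, event,
     battery, msg_count, gateway)
  VALUES
    ('S-0001', '2015-06-03 14:05:00+0000',
     '2015-06-03 14:05:02+0000', 'SP-1001', 'ARRIVAL', '3.1V', 42, 'GW-7');
APPLY BATCH;

-- Latest message for a sensor: a single-row lookup, no bucket needed.
SELECT event_time, event, battery
FROM last_sensor_message
WHERE sensor_id = 'S-0001';
```

A logged batch spanning two partitions adds coordination overhead, so two independent writes may be preferable where strict pairing of the inserts is not required.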
create table session(
space text,
session_start timestamp,
session_end timestamp,
inferred text,
PRIMARY KEY (space, session_start)
)
WITH CLUSTERING ORDER BY (session_start DESC);