Diese Präsentation wurde erfolgreich gemeldet.

Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, DataStax) | C* Summit 2016

2

Teilen

Alex Petrov
Scalable time-series applications with Cassandra
Scalable Time Series
with Cassandra
@ifesdjeen
Cyanite
past, present and future

YouTube-Videos werden auf SlideShare nicht mehr unterstützt.

Original auf YouTube ansehen

1 von 52
1 von 52

Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, DataStax) | C* Summit 2016

2

Teilen

Herunterladen, um offline zu lesen

Beschreibung

Cassandra is awesome for many things. One of the things it's awesome for is Time Series. Combining the power of Cassandra with APIs of existing Time Series tools, such as Graphite can yield interesting results.

Cyanite is a Time Series aggregator and store built on top of Cassandra. It's fully compatible with Graphite, can serve as a plug-in replacement for Graphite and Graphite web.

Cyanite is using SASI indexes to make glob metric path queries, can query and aggregate, store, display and analyse metrics from hundreds and thousands of servers.

Which data modelling practices work best for Time Series, which new awesome Cassandra features you can use to make your Time Series analysis better.

About the Speaker
Alex Petrov Software Engineer, DataStax

Polyglot programmer. Interested in algorithms, distributed systems, algebra and high performance solutions.

Transkript

  1. 1. Alex Petrov Scalable time-series applications with Cassandra
  2. 2. Scalable Time Series with Cassandra @ifesdjeen
  3. 3. Cyanite past, present and future
  4. 4. Requirements
  5. 5. Throughput Incoming data Aggregates Read queries
  6. 6. Scalability Paths / metrics Historical data Readers / writers
  7. 7. vs raw Aggregated
  8. 8. Predefined list of reports to aggregate Reducing amount of data points Faster queries Precision loss due to aggregation
  9. 9. “servers.*.workers.busyWorkers”: “10s:7d,1m:21d,15m:5y”
  10. 10. Graphite
  11. 11. Store numeric time-series data Render graphs of this data on demand https://graphiteapp.org/
  12. 12. Ecosystem
  13. 13. carbon: listens for time-series data whisper: DB for storing TS data graphite-web: UI & API for graphs
  14. 14. Downsides Single-host solution Plenty of disk seeks Optimised for space Sharding / replication is “manual”
  15. 15. Scaling Cluster topology stored on every node Manual replication Changing cluster topology non-trivial
  16. 16. Scaling Stateless service Automatic shard assignment Replication Easy management Easy topology changes
  17. 17. reminds you of anything?
  18. 18. Cassandra perfect match
  19. 19. Cyanite Stateless Async I/O Custom scheduler No Whisper files Distributed Horizontally scalable
  20. 20. Cyanite responsibilities Carbon-compatible listener Aggregate data in-memory Flush aggregates to Cassandra Path storage Retrieve paths for query Aggregate the query results
  21. 21. Cassandra features Metric expiry, TTL User Defined Types SASI Indexes, LIKE queries Aggregate Functions Offline data loading Clustering range queries IN partition key locking
  22. 22. Globbing index
  23. 23. app.cluster.server.subsystem.metric
  24. 24. Built with SASI indexes Fast, scalable queries Optimised for glob
  25. 25. CREATE TABLE segment ( parent text, segment text, pos int, length int, leaf boolean, PRIMARY KEY (parent, segment))
  26. 26. app.cluster.server.subsystem.metric
  27. 27. SELECT * FROM segments WHERE parent = 'root' AND pos = 1 Wildcard: *
  28. 28. SELECT * FROM segments WHERE pos = 4 Postfix: *.*.*.*.metric
  29. 29. SELECT * FROM segments WHERE parent = 'app' AND pos = 2 ALLOW FILTERING Prefix: app.*
  30. 30. SELECT * FROM SEGMENTS WHERE pos = 3 AND segment LIKE 'abc.%' ALLOW FILTERING Suffix: abc.*.metric
  31. 31. engine Query
  32. 32. 2-Step transformations: Inner Outer Cross metric engineQuery
  33. 33. a.b.c1 a.b.c2 scale(a.b.*, 10.0) {"a.b.c1" [1 2 3] "a.b.c2" [5 6 7]} {"a.b.c1" [10 20 30] "a.b.c2" [50 60 70]}
  34. 34. a.b.c derivative(a.b.c) {"a.b.c" [1 3 6]} {“derivative(a.b.c)" [nil 2 3]}
  35. 35. a.b.c1 a.b.c2 sumSeries(a.b.c1,a.b.c2) {"a.b.c1" [1 1 1] "a.b.c2" [2 2 2]} {“sumSeries(a.b.c,a.b.d)" [3 3 3]}
  36. 36. model Data
  37. 37. CREATE TYPE IF NOT EXISTS metric_resolution ( precision int, period int ); CREATE TYPE IF NOT EXISTS metric_id ( path text, resolution frozen<metric_resolution> );
  38. 38. CREATE TYPE IF NOT EXISTS metric_point ( max double, mean double, min double, sum double );
  39. 39. Cassandra Aggregates
  40. 40. Cassandra aggregates Locked partition Flexible query definition Local aggregation Reduced raw data chatter
  41. 41. CREATE OR REPLACE FUNCTION sumState (state map<bigint, double>, ts bigint, val metric_point) CALLED ON NULL INPUT RETURNS map<bigint, double> LANGUAGE java AS $$ if (state.containsKey(ts)) { state.put(ts, state.get(ts) + val.getDouble("mean")); } else { state.put(ts, val.getDouble("mean")); } return state; $$;
  42. 42. CREATE OR REPLACE FUNCTION sumFinal (state map<bigint, double>) CALLED ON NULL INPUT RETURNS map<bigint, double> LANGUAGE java AS 'return state;';
  43. 43. SELECT sum(time, point) FROM metric WHERE id IN ({path: 'a', resolution: {precision: 1, period: 3600}}, {path: 'b', resolution: {precision: 1, period: 3600}}) AND time > 0 AND time < 5;
  44. 44. Scaling
  45. 45. Scaling cyanite Use DTSC Cyanite is stateless For high loads, split readers and writers Load-balance with HAProxy Colocate Cyanite & Cassandra
  46. 46. soon Coming
  47. 47. P-Square histograms T-Digest quantiles Gorilla Compression Cassandra aggregates Custom glob indexes Scheduler soonComing
  48. 48. Kafka ingester Statsd protocol support Standalone cyanite Dynamic thresholds soonComing

Beschreibung

Cassandra is awesome for many things. One of the things it's awesome for is Time Series. Combining the power of Cassandra with APIs of existing Time Series tools, such as Graphite can yield interesting results.

Cyanite is a Time Series aggregator and store built on top of Cassandra. It's fully compatible with Graphite, can serve as a plug-in replacement for Graphite and Graphite web.

Cyanite is using SASI indexes to make glob metric path queries, can query and aggregate, store, display and analyse metrics from hundreds and thousands of servers.

Which data modelling practices work best for Time Series, which new awesome Cassandra features you can use to make your Time Series analysis better.

About the Speaker
Alex Petrov Software Engineer, DataStax

Polyglot programmer. Interested in algorithms, distributed systems, algebra and high performance solutions.

Transkript

  1. 1. Alex Petrov Scalable time-series applications with Cassandra
  2. 2. Scalable Time Series with Cassandra @ifesdjeen
  3. 3. Cyanite past, present and future
  4. 4. Requirements
  5. 5. Throughput Incoming data Aggregates Read queries
  6. 6. Scalability Paths / metrics Historical data Readers / writers
  7. 7. vs raw Aggregated
  8. 8. Predefined list of reports to aggregate Reducing amount of data points Faster queries Precision loss due to aggregation
  9. 9. “servers.*.workers.busyWorkers”: “10s:7d,1m:21d,15m:5y”
  10. 10. Graphite
  11. 11. Store numeric time-series data Render graphs of this data on demand https://graphiteapp.org/
  12. 12. Ecosystem
  13. 13. carbon: listens for time-series data whisper: DB for storing TS data graphite-web: UI & API for graphs
  14. 14. Downsides Single-host solution Plenty of disk seeks Optimised for space Sharding / replication is “manual”
  15. 15. Scaling Cluster topology stored on every node Manual replication Changing cluster topology non-trivial
  16. 16. Scaling Stateless service Automatic shard assignment Replication Easy management Easy topology changes
  17. 17. reminds you of anything?
  18. 18. Cassandra perfect match
  19. 19. Cyanite Stateless Async I/O Custom scheduler No Whisper files Distributed Horizontally scalable
  20. 20. Cyanite responsibilities Carbon-compatible listener Aggregate data in-memory Flush aggregates to Cassandra Path storage Retrieve paths for query Aggregate the query results
  21. 21. Cassandra features Metric expiry, TTL User Defined Types SASI Indexes, LIKE queries Aggregate Functions Offline data loading Clustering range queries IN partition key locking
  22. 22. Globbing index
  23. 23. app.cluster.server.subsystem.metric
  24. 24. Built with SASI indexes Fast, scalable queries Optimised for glob
  25. 25. CREATE TABLE segment ( parent text, segment text, pos int, length int, leaf boolean, PRIMARY KEY (parent, segment))
  26. 26. app.cluster.server.subsystem.metric
  27. 27. SELECT * FROM segments WHERE parent = 'root' AND pos = 1 Wildcard: *
  28. 28. SELECT * FROM segments WHERE pos = 4 Postfix: *.*.*.*.metric
  29. 29. SELECT * FROM segments WHERE parent = 'app' AND pos = 2 ALLOW FILTERING Prefix: app.*
  30. 30. SELECT * FROM SEGMENTS WHERE pos = 3 AND segment LIKE 'abc.%' ALLOW FILTERING Suffix: abc.*.metric
  31. 31. engine Query
  32. 32. 2-Step transformations: Inner Outer Cross metric engineQuery
  33. 33. a.b.c1 a.b.c2 scale(a.b.*, 10.0) {"a.b.c1" [1 2 3] "a.b.c2" [5 6 7]} {"a.b.c1" [10 20 30] "a.b.c2" [50 60 70]}
  34. 34. a.b.c derivative(a.b.c) {"a.b.c" [1 3 6]} {“derivative(a.b.c)" [nil 2 3]}
  35. 35. a.b.c1 a.b.c2 sumSeries(a.b.c1,a.b.c2) {"a.b.c1" [1 1 1] "a.b.c2" [2 2 2]} {“sumSeries(a.b.c,a.b.d)" [3 3 3]}
  36. 36. model Data
  37. 37. CREATE TYPE IF NOT EXISTS metric_resolution ( precision int, period int ); CREATE TYPE IF NOT EXISTS metric_id ( path text, resolution frozen<metric_resolution> );
  38. 38. CREATE TYPE IF NOT EXISTS metric_point ( max double, mean double, min double, sum double );
  39. 39. Cassandra Aggregates
  40. 40. Cassandra aggregates Locked partition Flexible query definition Local aggregation Reduced raw data chatter
  41. 41. CREATE OR REPLACE FUNCTION sumState (state map<bigint, double>, ts bigint, val metric_point) CALLED ON NULL INPUT RETURNS map<bigint, double> LANGUAGE java AS $$ if (state.containsKey(ts)) { state.put(ts, state.get(ts) + val.getDouble("mean")); } else { state.put(ts, val.getDouble("mean")); } return state; $$;
  42. 42. CREATE OR REPLACE FUNCTION sumFinal (state map<bigint, double>) CALLED ON NULL INPUT RETURNS map<bigint, double> LANGUAGE java AS 'return state;';
  43. 43. SELECT sum(time, point) FROM metric WHERE id IN ({path: 'a', resolution: {precision: 1, period: 3600}}, {path: 'b', resolution: {precision: 1, period: 3600}}) AND time > 0 AND time < 5;
  44. 44. Scaling
  45. 45. Scaling cyanite Use DTSC Cyanite is stateless For high loads, split readers and writers Load-balance with HAProxy Colocate Cyanite & Cassandra
  46. 46. soon Coming
  47. 47. P-Square histograms T-Digest quantiles Gorilla Compression Cassandra aggregates Custom glob indexes Scheduler soonComing
  48. 48. Kafka ingester Statsd protocol support Standalone cyanite Dynamic thresholds soonComing

Weitere Verwandte Inhalte

Weitere von DataStax

Ähnliche Bücher

Kostenlos mit einer 30-tägigen Testversion von Scribd

Alle anzeigen

Ähnliche Hörbücher

Kostenlos mit einer 30-tägigen Testversion von Scribd

Alle anzeigen

×