Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset

Streaming data systems have been growing rapidly in importance to the modern data stack. Kafka’s kSQL provides an interface for analytic tools that speak SQL. Apache Superset, the most popular modern open-source visualization and analytics solution, plugs into nearly any data source that speaks SQL, including Kafka. Here, we review and compare methods for connecting Kafka to Superset to enable streaming analytics use cases including anomaly detection, operational monitoring, and online data integration.

  1. Streaming Data Analytics with ksqlDB and Superset w/ Robert Stolz. Email: robert@preset.io; GitHub: garden-of-delete. Find me on the Superset Slack!
  2. Who am I?
     ● Data Engineer and Developer Advocate @ Preset
     ● Background in scientific research, computational biology, mathematics, open-source software
     ● Data architecture and best practices nerd
     ● New(ish) to Kafka
  3. Agenda
     • The history and anatomy of Apache Superset
     • What Superset offers a streaming data architecture
     • Streaming analytics w/ Kafka: paths and challenges
     Feel free to ask questions as they come up. Keep an eye out for this series on the Preset Blog!
  4. Apache Superset
  5. Apache Superset [timeline figure: 2015, 2019, 2021; labels: Version 1.0, ASF incubator graduation]
  6. Apache Superset
     ● Dynamic Dashboards: Dashboard filters and Jinja templating enable end-users to drill deeper into data
     ● No Code Exploration: Create beautiful, complex charts from your data without having to write any code
     ● SQL Lab: State-of-the-art SQL IDE with a rich metadata browser for deeper analysis
     ● Rich Visualizations: Beautiful array of interactive visualizations, including geospatial
     ● Granular Permissions: Row-level security, configurable data policies
     ● Semantic Layer: Support for virtual columns, virtual tables, view creation, and more
     ● Caching: Reduce load on the database; faster queries, faster results
     ● Modern Datastack Support: Connect to any SQL-speaking database, including popular cloud data warehouses and SQL engines
     ● Alerts & Reports: Get notified via Slack or email when dips or spikes happen in your data
     ● Custom Viz Plugins: Build your own custom visualization plug-in or connect to popular third-party plug-ins
  7. Superset speaks SQL via SQLAlchemy
  8. Who uses Apache Superset? (and hundreds more...)
  9. Value proposition of open-source BI
     ● Extensibility: custom analytics, embedding, piecemeal
     ● Control: avoid vendor lock-in
     ● Cost: free to use and modify, but can be expensive to maintain an enterprise deployment
     ● Quality: open-source is a better process for making software
  10. Superset’s lightweight semantic layer [diagram: SQL-speaking datasources; Python back-end + semantic layer; React front-end]
  11. Explore
  12. Explore: in-chart analytics
  13. SQL Lab
  14. Dashboard
  15. Dashboard: Native Filters
  16. Dashboard: Drag and Drop Editing
  17. Dashboard: Alerts and Reports
  18. Why connect streaming data to the BI layer?
     ● BI is one of the primary sensory organs of modern organizations
     ● Faster, well-informed decision-making is generally desirable
     ● Many more specific business use cases require fast response to external events
       ○ Anomaly detection
       ○ Location- and time-sensitive services
       ○ Extreme event monitoring
       ○ Visualizing and analyzing a real-world process that is constantly evolving
  19. The Question
     Want to understand: what paths exist for getting streaming data from Kafka into Superset (and, more generally, into the BI/analytics layer)?
     Distinct from wanting to analyze metadata from a Kafka deployment
  20. Best practice: Intermediate datastore [diagram: intermediate datastore shown as "?"]
     Want to understand: what paths exist for getting streaming data from Kafka into Superset (and, more generally, into the BI/analytics layer)?
     Distinct from wanting to analyze metadata from a Kafka deployment
  21. Direct connection
     - Connect Kafka directly to Superset
     - The most naive approach
  22. Direct connection
     - Superset would need to consume data from Kafka topics directly
     - Undesirable to have data live in the BI/Analytics layer
  23. Streaming Analytics w/ Superset + ksqlDB
     - ksqlDB provides a SQL-speaking interface for data in Kafka topics
     - Powered by Kafka’s stream processing framework
  24. Streaming Analytics w/ Superset + ksqlDB
     - No SQLAlchemy dialect for ksqlDB (as of today)
     - Probably undesirable to have historical data, complex aggregates, etc. accessible only through Kafka’s stream-processing framework
  25. Best practice: Intermediate datastore [diagram: intermediate datastore shown as "?"]
     - Desirable properties: high write volume, robust support for event data, low read-after-write latency, integrated Kafka consumer
  26. Best practice: Intermediate datastore
     - Desirable properties: high write volume, robust support for time-series data, low read-after-write latency, integrated Kafka consumer
     - Druid, ClickHouse, Rockset, Pinot, Cassandra, etc.
  27. How to choose the right datastore?
  28. Path 1: Integrated consumer
     - Integrated consumers ingest event data directly from Kafka topics
     - Transformation can be handled by the datastore or by Kafka Streams
     - Best performance, limited flexibility in choice of datastore
  29. Path 2: ksqlDB connection
     - Some transformation tasks are handled by ksqlDB (Kafka Streams)
     - Expands the list of possible intermediate datastores
  30. Path 3: Ad-hoc consumers
     - Maximum flexibility around choice of datastore
     - Comes at the expense of performance
     - Can be harder to maintain
  31. Superset fits into batch and streaming data architectures
     Src: Designing Cloud Data Platforms by Danil Zburivsky and Lynda Partner
  32. Three ways to run Superset
     Manual Setup
       • Complex set-up
       • Maximum control over configuration
       • Good for enterprise deployments
       • Advanced features require additional set-up (Async Queries, Query Caching, Prophet integration, Dashboard thumbnails, Alerts and Reports)
     Docker-compose
       • Easiest set-up
       • Great for trying out Superset and local development
       • Some features are part of the stack by default (caching) and some aren’t (Alerts and Reports, Prophet integration)
     Preset Cloud
       • No set-up
       • Good for individual evaluation all the way up to enterprise needs
       • All advanced Superset features available
       • Still FREE for small teams!
  33. Streaming Data Analytics with ksqlDB and Superset w/ Robert Stolz. Email: robert@preset.io; GitHub: garden-of-delete. Find me on the Superset Slack!
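
The "Path 3: Ad-hoc consumers" slide describes custom consumers that read events from Kafka and write them into an intermediate datastore, which Superset can then query over SQL. The following is a minimal sketch of that pattern, not the speaker's implementation. It rests on several assumptions that are not in the slides: an in-memory SQLite database stands in for the real datastore, the hypothetical `fake_topic` generator stands in for a Kafka consumer loop, and the `readings` table, its fields, and the sample events are invented for illustration.

```python
import json
import sqlite3
from typing import Iterable, Iterator, List, Tuple


def fake_topic() -> Iterator[bytes]:
    """Hypothetical stand-in for a Kafka consumer loop (no broker needed)."""
    events = [
        {"sensor": "a", "ts": 1, "value": 20.5},
        {"sensor": "b", "ts": 2, "value": 19.0},
        {"sensor": "a", "ts": 3, "value": 21.5},
    ]
    for event in events:
        # Kafka message values arrive as bytes; here they are JSON-encoded.
        yield json.dumps(event).encode("utf-8")


def load_events(
    messages: Iterable[bytes],
    conn: sqlite3.Connection,
    batch_size: int = 2,
) -> int:
    """Write decoded events to the datastore in small batches.

    Returns the number of rows written.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(sensor TEXT, ts INTEGER, value REAL)"
    )
    batch: List[Tuple[str, int, float]] = []
    written = 0

    def flush() -> None:
        nonlocal written
        if batch:
            conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", batch)
            conn.commit()
            written += len(batch)
            batch.clear()

    for raw in messages:
        event = json.loads(raw)
        batch.append((event["sensor"], event["ts"], event["value"]))
        if len(batch) >= batch_size:
            flush()
    flush()  # write any remaining partial batch
    return written


# SQLite is only an assumption here; any SQL-speaking store would do.
conn = sqlite3.connect(":memory:")
written = load_events(fake_topic(), conn)

# The kind of aggregate a BI tool would issue against the datastore.
rows = conn.execute(
    "SELECT sensor, AVG(value) FROM readings GROUP BY sensor ORDER BY sensor"
).fetchall()
print(written, rows)
```

In a real deployment the generator would be replaced by a Kafka client's polling loop and the SQLite connection by one of the datastores named on the "Intermediate datastore" slide. Batching the writes is one simple way to reduce the performance cost that the slide attributes to this path.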
