Streaming Visualization

Most data visualisation solutions today still work on data sources that are stored persistently in a data store, following the so-called "data at rest" paradigm. More and more data sources, however, provide a constant stream of data, from IoT devices to social media streams. These streams publish data at high velocity, and messages often have to be processed as quickly as possible. So-called stream processing solutions are available for processing and analysing the data, but they provide only minimal visualisation capabilities, or none at all. One way to visualise such data is to first persist it in a data store and then use a traditional data visualisation solution to present it.
If latency is not an issue, such a solution might be good enough. Another question is which data store is needed to keep up with the high read and write load. If it is not an RDBMS but a NoSQL database, not all traditional visualisation tools will already integrate with that specific data store. Another option is to use a streaming visualisation solution. These are built specifically for streaming data and often do not support batch data. A much better solution would be one tool capable of handling both batch and streaming data. This talk presents different architecture blueprints for integrating data visualisation into a fast data solution and highlights some of the products available to implement these blueprints.
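
The two approaches contrasted above (persist first and let the visualisation poll, versus pushing each event straight to the visualisation) can be illustrated with a minimal in-memory sketch. The class names `PolledStore` and `PushChannel` and the sample events are hypothetical and only serve the illustration; a real solution would use an actual data store or event hub.

```python
from typing import Callable, Dict, List

Event = Dict[str, float]


class PolledStore:
    """Approach 1 (hypothetical): events are persisted first, the dashboard polls."""

    def __init__(self) -> None:
        self._rows: List[Event] = []

    def write(self, event: Event) -> None:
        self._rows.append(event)

    def read_since(self, position: int) -> List[Event]:
        # Return every row written at or after the caller's last known position.
        return self._rows[position:]


class PushChannel:
    """Approach 2 (hypothetical): events are pushed to subscribers on arrival."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, event: Event) -> None:
        for callback in self._subscribers:
            callback(event)


events = [{"sensor": 1.0, "value": 20.5}, {"sensor": 1.0, "value": 21.0}]

# Polling: nothing reaches the dashboard until it asks for new rows.
store = PolledStore()
for e in events:
    store.write(e)
polled = store.read_since(0)

# Push: each event reaches the dashboard at publish time.
pushed: List[Event] = []
channel = PushChannel()
channel.subscribe(pushed.append)
for e in events:
    channel.publish(e)
```

Both variants deliver the same events; the difference is who initiates the transfer. With polling, the added latency is bounded by the polling interval, which is why the first approach is only adequate when latency is not critical.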

  1. Streaming Visualization. Guido Schmutz, DOAG Big Data 2018, 20.9.2018. @gschmutz, guidoschmutz.wordpress.com. Locations listed on the slide: BASEL, BERN, BRUGG, DÜSSELDORF, FRANKFURT A.M., FREIBURG I.BR., GENF, HAMBURG, KOPENHAGEN, LAUSANNE, MÜNCHEN, STUTTGART, WIEN, ZÜRICH
  2. Guido Schmutz: Working at Trivadis for more than 21 years • Oracle ACE Director for Fusion Middleware and SOA • Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data • Head of Trivadis Architecture Board • Technology Manager @ Trivadis • More than 30 years of software development experience • Contact: guido.schmutz@trivadis.com • Blog: http://guidoschmutz.wordpress.com • Slideshare: http://www.slideshare.net/gschmutz • Twitter: gschmutz
  3. Agenda: 1. Visualization in Big Data Reference Architecture; 2. How to implement "Data-in-Motion"?; 3. Blueprints for Streaming Visualization; 4. Blueprints for Stream Visualization - Implementation
  4. Visualization in Big Data Reference Architecture
  5. Data Value Chain: Milliseconds (Place Trace, Serve ad, Enrich Stream, Approve Trans) • Hundredths of Seconds (Calculate Risk, Leaderboard, Aggregate, Count) • Second(s) (Retrieve Click Stream, Show orders) • Minutes (Backtest algo, BI, Daily Reports) • Hours (Algo discovery, Log analysis, Fraud pattern match). Footer: Architekturen von Big Data Anwendungen (Architectures of Big Data Applications)
  6. Traditional BI Infrastructures [diagram]. Labels: Bulk Source (DB, File, Extract), ETL / Stored Procedures, Enterprise Data Warehouse, BI Tools, Search / Explore, Enterprise Apps (Logic, API), high latency
  7. Big Data solves Volume and Variety, not Velocity [diagram]. Labels: Bulk Source (DB, File, Extract), File Import / SQL Import, Big Data Platform, Hadoop Cluster, Parallel Processing, Storage (Raw, Refined, Results), SQL, Enterprise Data Warehouse, BI Tools, Search / Explore, Enterprise Apps (Logic, API), high latency. Footer: Introduction to Stream Processing
  8. Big Data solves Volume and Variety, not Velocity [diagram]. Labels as on slide 7, plus: Event Source (Location, Telemetry, IoT Data, Mobile Apps, Social), Event Stream
  9. Big Data solves Volume and Variety, not Velocity [diagram]. Labels as on slide 8, plus: Event Hub • Machine Learning • Graph Algorithms • Natural Language Processing
  10. "Data at Rest" vs. "Data in Motion" [diagram]. Each paradigm is shown as a sequence of the steps Store, Analyze and Act
  11. Stream Processing Architecture solves Velocity [diagram]. Labels: Bulk Source (DB, File, Extract), Event Source (Location, Telemetry, IoT Data, Mobile Apps, Social), Event Stream, Event Hub, Stream Analytics Platform, Hadoop Cluster, Stream Analytics, Reference / Models, Dashboard, Search, Results, Enterprise Data Warehouse, BI Tools, Search / Explore, Enterprise Apps (Logic, API). Low(est) latency, no history
  12. Big Data for all historical data analysis [diagram]. Labels as on slide 11, plus: Data Flow, File Import / SQL Import, Big Data Platform, Parallel Processing, Storage (Raw, Refined, Results)
  13. Integrate existing systems through CDC: Capture changes directly on database • Change Data Capture (CDC) => think like a global database trigger • Transform existing systems to event producer. [diagram] Labels: Traditional Silo-based System (User Interface, Logic, State, Data Store), CDC, CDC Connector, Data, Event Hub, Event Stream, Integration, Consuming Systems
  14. Integrate existing systems with lower latency through CDC [diagram]. Labels as on slide 12
  15. Unified Architecture for Modern Data Analytics Solutions [diagram]. Labels: Bulk Source (DB, File, Extract), Event Source (Location, Telemetry, IoT Data, Mobile Apps, Social), Event Stream, Event Hub, Edge Node, Rules, Storage, File Import / SQL Import, Big Data, Hadoop Cluster, Parallel Processing, Storage (Raw, Refined, Results), SQL, Search, Stream Analytics, Stream Processor (State, API), Microservices, Microservice (State, API), Service, Enterprise Data Warehouse, BI Tools, Search / Explore, Enterprise Apps (Logic, API)
  16. Two Types of Stream Processing (from Gartner). Stream Data Integration: primarily focuses on the ingestion and processing of data sources, targeting real-time extract-transform-load (ETL) and data integration use cases • filter and enrich the data • optionally calculate time-windowed aggregations before storing the results in a database or file system. Stream Analytics: targets analytics use cases • calculating aggregates and detecting patterns to generate higher-level, more relevant summary information (complex events) • complex events may signify threats or opportunities that require a response from the business through real-time dashboards, alerts or decision automation
  17. How to implement "Data-in-Motion"?
  18. "Data-in-Motion" Ecosystem [diagram]. Categories: Stream Analytics, Event Hub, Stream Data Integration; each split into Open Source and Closed Source. Source: adapted from Tibco Edge
  19. Apache Kafka: A Streaming Platform. High-Level Architecture • Distributed Log at the Core • Scale-Out Architecture • Logs do not (necessarily) forget
  20. Blueprints for Stream Visualization
  21. 1) Direct Streaming to the Consumer [diagram]. Labels: Data Sources, Data Flow, "Data in Motion", Integration, Event Hub, Stream Analytics, Channel, Streaming Visualization, Consumer
  22. 2) Use a fast data store and poll it regularly from the consumer [diagram]. Labels: Data Sources, Data Flow, "Data in Motion", Integration, Event Hub, Stream Analytics, Data Store, API, Streaming Visualization, Consumer
  23. 3) Use stateful Stream Analytics and query its store directly [diagram]. Labels: Data Sources, "Data in Motion", Integration, Event Hub, Stream Analytics, API, Streaming Visualization, Consumer
  24. Blueprints for Stream Visualization - Implementation
  25. Visualization: many, many options! But do they support Streaming Data?
  26. Oracle Stream Analytics [diagram]. Labels as on slide 21
  27. Oracle Stream Analytics: Stream Analytics and Visualization in one • offers real-time actionable business insight on streaming data • automates action to drive today's agile businesses (business user) • runs on top of Spark Streaming • Cloud and on-premises • Data Sources: Kafka, JMS, GoldenGate, File
  28. WebSockets / SSE / Custom JavaScript Application [diagram]. Labels: Data Flow, "Data in Motion", Integration, Event Hub, Stream Analytics, Channel, Server-Sent Events (SSE), Streaming Visualization, Consumer
  29. Slack / WhatsApp / Twitter / … [diagram]. Labels: Data Flow, "Data in Motion", Integration, Event Hub, Stream Analytics, Channel, Streaming Visualization, Consumer
  30. WebSockets vs. Server-Sent Events (SSE). WebSockets: provide a richer protocol for bi-directional, full-duplex communication • require full-duplex connections and new WebSocket servers to handle the protocol • a two-way channel is more attractive for things like games, messaging apps, and cases where you need near real-time updates in both directions. SSE: sent over traditional HTTP • do not require a special protocol or server implementation to get working • designed from the ground up to be efficient when only one direction (server to client) is necessary
  31. KSQL / REST API / Custom App [diagram]. Labels as on slide 23
  32. KSQL & Arcadia Data [diagram]. Labels as on slide 23
  33. Arcadia Data: Combines Batch and Streaming Visualization in one • Streaming Visualizations based on Confluent KSQL (Kafka) • Arcadia Instant and Arcadia Enterprise
  34. Druid & Superset / Imply [diagram]. Labels as on slide 22
  35. What is Druid? Open Source Time Series DB by Metamarkets • Apache Incubating • Column-Oriented Storage • Streaming and Batch Ingest • Time-optimized partitioning • SQL Support • Deep Storage can be HDFS / S3
  36. Imply: Commercial offering of Druid • Built around Apache Druid • Analytics, search and intelligence for event-driven data
  37. Superset: Open source data visualization tool by Airbnb • Apache incubator • supports 30 types of visualizations • easy-to-use interface for exploring and visualizing data • create and share dashboards • deep integration with Druid • integration with most SQL-speaking RDBMS through SQLAlchemy
  38. Elasticsearch / Kibana [diagram]. Labels as on slide 22
  39. Elasticsearch / Kibana. Elasticsearch: NoSQL store • a distributed, RESTful search and analytics engine • centrally stores your data so you can discover the expected and uncover the unexpected • lets you perform and combine many types of searches: structured, unstructured, geo, metric • aggregations let you zoom out to explore trends and patterns in your data. Kibana: window into Elasticsearch • enables visual exploration and analysis of data stored in Elasticsearch
  40. InfluxDB / Grafana or Chronograf [diagram]. Labels as on slide 22
  41. InfluxDB: Popular Time Series Database • Open source as well as commercial offering. Chronograf (heading only on the slide)
  42. Grafana: allows you to query, visualize, alert on and understand metrics independent of where they are stored. Supports various data sources: Elasticsearch • InfluxDB • Prometheus • OpenTSDB • MySQL • …
  43. Technology on its own won't help you. You need to know how to use it properly.
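
Slide 16 describes stream data integration as filtering and enriching events and optionally calculating time-windowed aggregations. The following is a minimal sketch of that idea using a tumbling window, not code from the talk. The event shape, the `DEVICE_SITE` lookup table and the function name `windowed_average` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (timestamp in seconds, device id, temperature).
Reading = Tuple[float, str, float]

# Hypothetical reference data used for enrichment.
DEVICE_SITE = {"dev-1": "Basel", "dev-2": "Bern"}


def windowed_average(
    readings: Iterable[Reading], window_s: int = 10
) -> Dict[Tuple[int, str], float]:
    """Filter, enrich and average readings per (window start, site)."""
    sums: Dict[Tuple[int, str], float] = defaultdict(float)
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, device, temp in readings:
        if device not in DEVICE_SITE:  # filter: drop unknown devices
            continue
        site = DEVICE_SITE[device]  # enrich: attach the site name
        window_start = int(ts // window_s) * window_s  # tumbling window
        sums[(window_start, site)] += temp
        counts[(window_start, site)] += 1
    return {key: sums[key] / counts[key] for key in sums}


sample = [
    (1.0, "dev-1", 20.0),
    (4.0, "dev-1", 22.0),
    (12.0, "dev-2", 18.0),
    (13.0, "dev-9", 99.0),  # unknown device, filtered out
]
result = windowed_average(sample)
```

A real stream processor would emit each window's result as soon as the window closes rather than after reading the whole input; the batch-style loop here only keeps the sketch short.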
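
Slide 19 names a log as the core of Kafka and notes that logs do not (necessarily) forget. The toy class below sketches what that means for consumers: records are only appended, and each consumer tracks its own read position. This is an illustrative single-process model with hypothetical names (`AppendOnlyLog`, `poll`), not the Kafka client API.

```python
from typing import Dict, List


class AppendOnlyLog:
    """Toy single-partition log; an illustration, not the Kafka API."""

    def __init__(self) -> None:
        self._records: List[str] = []
        self._offsets: Dict[str, int] = {}  # next offset to read, per consumer

    def append(self, record: str) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def poll(self, consumer: str) -> List[str]:
        # Return everything this consumer has not seen yet, then advance it.
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:]
        self._offsets[consumer] = len(self._records)
        return batch


log = AppendOnlyLog()
log.append("a")
log.append("b")
first = log.poll("dashboard")
log.append("c")
second = log.poll("dashboard")
late = log.poll("archiver")  # a consumer that joins late still sees all records
```

Because reading does not remove records, a visualisation that connects late can replay earlier events, which is the property the slide points at.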
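
Slide 30 states that Server-Sent Events travel over traditional HTTP without a special protocol. On the wire, an SSE response is plain text (served as `text/event-stream`) in which each event is a group of `field: value` lines terminated by a blank line. The helper below sketches that framing; the function name `format_sse` and the sample payload are assumptions for illustration.

```python
import json
from typing import Optional


def format_sse(
    data: str, event: Optional[str] = None, event_id: Optional[str] = None
) -> str:
    """Frame one message in the text/event-stream format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # Multi-line payloads need one "data:" line per payload line.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line marks the end of the event.
    return "\n".join(lines) + "\n\n"


message = format_sse(json.dumps({"speed": 42}), event="telemetry", event_id="7")
```

A server would write such strings to a long-lived HTTP response, and a browser can consume them with the standard `EventSource` API. Since the stream only flows from server to client, this fits dashboards that display data but do not send any back.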