1. © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Streaming Analytics Manager - SAM
Sriharsha Chintalapani, Arun Mahadevan
2.
History of Streaming at Hortonworks
Introduced Storm as the stream processing engine in HDP-2.1 (late 2013)
First to ship Apache Kafka as an enterprise messaging queue (early 2014)
Added several improvements and features to Apache Storm; Yahoo! runs 2,400 nodes of
Storm
Added security and critical features/improvements to Apache Kafka
Many lessons learned from shipping Storm and Kafka over the past 3 years
The vision and implementation of Registry and Streaming Analytics Manager are based on those
lessons.
3.
Why Schema Registry
Streaming applications are usually fronted by a queue such as Kafka, Kinesis or EventHub.
Data in messaging queues is a byte payload with no schema associated with it.
Streaming application developers usually look at the data flowing through and define their
processing around what they see
Any schema change to this data means developers have to update their code to
process the new format
Support both programmatic schema creation and managed schemas
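The idea can be sketched as follows. This is a minimal illustration, not the Schema Registry API: `InMemorySchemaRegistry`, `encode` and `decode` are hypothetical names, and the 4-byte schema-id prefix with a JSON body is an assumed wire format chosen only to keep the example short.

```python
import json
import struct


class InMemorySchemaRegistry:
    """Hypothetical stand-in for a schema registry service."""

    def __init__(self):
        self._schemas = {}  # schema id -> list of (field name, type name)
        self._next_id = 1

    def register(self, fields):
        schema_id = self._next_id
        self._schemas[schema_id] = list(fields)
        self._next_id += 1
        return schema_id

    def lookup(self, schema_id):
        return self._schemas[schema_id]


def encode(schema_id, record):
    """Producer side (assumed format): 4-byte big-endian schema id + JSON body."""
    return struct.pack(">I", schema_id) + json.dumps(record).encode("utf-8")


def decode(registry, payload):
    """Consumer side: read the id, fetch the schema, keep the declared fields."""
    (schema_id,) = struct.unpack(">I", payload[:4])
    fields = registry.lookup(schema_id)
    body = json.loads(payload[4:].decode("utf-8"))
    return {name: body.get(name) for name, _type in fields}


registry = InMemorySchemaRegistry()
v1 = registry.register([("a", "int"), ("b", "string")])
v2 = registry.register([("a", "int"), ("b", "string"), ("c", "double")])

# The same consumer code decodes both schema versions without modification.
old_event = decode(registry, encode(v1, {"a": 1, "b": "x"}))
new_event = decode(registry, encode(v2, {"a": 2, "b": "y", "c": 0.5}))
```

Because the payload carries a schema id rather than an implicit format, adding field `c` does not require a change to the consumer code in this sketch.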
4.
Why Schema Registry
[Diagram: Producer -> bytes payload -> queue (Kafka, Kinesis, EventHub) -> bytes payload -> Consumer (Storm, Spark Streaming, Others…)]
5.
Streaming Analytics Manager
What is it?
• A platform to design, develop, deploy and manage streaming analytics applications in minutes
using a drag-and-drop visual paradigm
• Allows you to do event correlation, context enrichment, complex pattern matching, analytical
aggregations and alerts/notifications when insights are discovered.
• Agnostic to the underlying streaming engine and can support multiple streaming engines (e.g.,
Storm, Spark Streaming, Flink)
• Extensibility is a first-class citizen (add sinks, processors, sources as needed)
Guiding Principle
– Build complex streaming applications easily with minimum code
6.
Complexities in building streaming applications
New streaming engines and APIs
Implementing windowing, joins, and state management is hard
Interaction with external services such as HBase, Hive, HDFS etc.
Deploying with all the necessary configuration files
Operations around the streaming application, including monitoring and metrics
Debugging streaming applications
Securing a streaming application cluster with the right configurations is a pain
7.
Key challenges that SAM is trying to solve
Building streaming applications requires specialized skill sets that most enterprise
organizations don’t have today
Streaming applications require a considerable amount of programming, testing and tuning
before they can be deployed to production, which takes a significant amount of time
Key streaming primitives such as joining/splitting streams, aggregations over a window of
time and pattern matching are difficult to implement
People prefer not to write code to build complex streaming applications
No true open source project today solves all of the above challenges
People care less about which streaming engine powers their streaming applications, as long as
the challenges above are addressed and they are not forced into vendor lock-in.
8.
Streaming Analytics Manager Components and User Personas
[Diagram: Distributed Streaming Computation Engine (different streaming engines that power the higher-level services used to build streaming applications)]
User personas: App Developer, Business Analyst, Operations
9.
Streaming Analytics powered by Druid and Superset
What is Stream Insight?
Provides a tool for business analysts to do descriptive analytics of streaming data and
insights using a sophisticated UI provided by Superset
Tooling to create time-series and real-time analytics dashboards, charts and graphs, and
rich, customizable visualizations of data
13.
SAM Architecture
[Architecture diagram; the components shown are:]
SAM UI and REST API on a web server (Jetty)
Storage Manager and DB
Topology actions service, Topology DAG Builder, Topology Lifecycle Manager
Runners (translate SAM DAG to streaming engine topology): Storm (Flux), Flink, Spark
Streaming computation engines (Storm), to which the DAG is deployed
Service Pools and Environ Service, with Ambari (cluster manager)
Schema Registry and SR Client
14.
Topology lifecycle
[State diagram]
States: Initial, DAG Constructed, Extra artifacts set up, Deployed, Suspended, Deployment Failed
Actions: Deploy, Suspend, Resume, Kill, Re-deploy
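A lifecycle like this can be modeled as a small state machine. The sketch below uses the state and action names from the slide, but the transition table itself is an assumption, since the arrows of the diagram are not recoverable from the text. `TopologyLifecycle` and `step_succeeds` are hypothetical names.

```python
from enum import Enum


class State(Enum):
    INITIAL = "Initial"
    DAG_CONSTRUCTED = "DAG Constructed"
    EXTRA_ARTIFACTS_SET_UP = "Extra artifacts set up"
    DEPLOYED = "Deployed"
    SUSPENDED = "Suspended"
    DEPLOYMENT_FAILED = "Deployment Failed"


# ASSUMED transition table: (current state, action) -> next state.
TRANSITIONS = {
    (State.DEPLOYED, "suspend"): State.SUSPENDED,
    (State.SUSPENDED, "resume"): State.DEPLOYED,
    (State.DEPLOYED, "kill"): State.INITIAL,
    (State.SUSPENDED, "kill"): State.INITIAL,
}

# ASSUMED: a deploy passes through these states in order.
DEPLOY_STEPS = [State.DAG_CONSTRUCTED, State.EXTRA_ARTIFACTS_SET_UP, State.DEPLOYED]


class TopologyLifecycle:
    def __init__(self):
        self.state = State.INITIAL
        self.history = [State.INITIAL]

    def _move(self, new_state):
        self.state = new_state
        self.history.append(new_state)

    def deploy(self, step_succeeds=lambda step: True):
        """Deploy from Initial, or re-deploy after a failed deployment."""
        if self.state not in (State.INITIAL, State.DEPLOYMENT_FAILED):
            raise ValueError(f"cannot deploy from {self.state.value}")
        for step in DEPLOY_STEPS:
            if not step_succeeds(step):
                self._move(State.DEPLOYMENT_FAILED)
                return
            self._move(step)

    def apply(self, action):
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"'{action}' is not allowed in state {self.state.value}")
        self._move(TRANSITIONS[key])


topology = TopologyLifecycle()
topology.deploy()
topology.apply("suspend")
topology.apply("resume")
topology.apply("kill")
```

Keeping the allowed transitions in one table makes invalid requests (for example, resuming a topology that was never deployed) fail early instead of reaching the streaming engine.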
15.
Topology DAG
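[Diagram: a DAG of components (Source, Processor 1, Processor 2, Sink 1, Sink 2) connected by edges; each edge carries a named stream (Stream 1, Stream 2), and each stream declares its fields:]
Fields: [
"a": Int,
"b": String
…
]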
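A topology of this shape can be represented as components plus edges that each carry a stream with declared fields. The following is a minimal sketch with hypothetical class names (`Stream`, `Edge`, `TopologyDag`); the wiring between components is assumed for illustration, since the exact arrows of the diagram are not recoverable here.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    name: str
    fields: tuple  # (field name, type name) pairs


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    stream: Stream


class TopologyDag:
    def __init__(self):
        self.components = []
        self.edges = []

    def add_component(self, name):
        self.components.append(name)

    def add_edge(self, source, target, stream):
        self.edges.append(Edge(source, target, stream))

    def topological_order(self):
        """Kahn's algorithm: a component is emitted after all its upstreams."""
        indegree = {name: 0 for name in self.components}
        downstream = defaultdict(list)
        for edge in self.edges:
            indegree[edge.target] += 1
            downstream[edge.source].append(edge.target)
        ready = deque(name for name in self.components if indegree[name] == 0)
        order = []
        while ready:
            name = ready.popleft()
            order.append(name)
            for target in downstream[name]:
                indegree[target] -= 1
                if indegree[target] == 0:
                    ready.append(target)
        if len(order) != len(self.components):
            raise ValueError("topology contains a cycle")
        return order


stream1 = Stream("Stream 1", (("a", "Int"), ("b", "String")))
stream2 = Stream("Stream 2", (("a", "Int"),))

dag = TopologyDag()
for name in ["Source", "Processor 1", "Processor 2", "Sink 1", "Sink 2"]:
    dag.add_component(name)
# ASSUMED wiring, for illustration only.
dag.add_edge("Source", "Processor 1", stream1)
dag.add_edge("Processor 1", "Processor 2", stream1)
dag.add_edge("Processor 1", "Sink 1", stream2)
dag.add_edge("Processor 2", "Sink 2", stream1)

order = dag.topological_order()
```

The topological order doubles as a validity check: a cycle makes the graph unusable as a topology DAG and raises an error.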
17.
Each Runner implements TopologyDAGVisitor
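The visitor pattern behind this can be sketched as follows. Only the name TopologyDAGVisitor comes from the slide; the Python classes, the `visit_*` method names and the dict-based output are hypothetical and do not reproduce SAM's interface. Mapping sources to spouts and processors/sinks to bolts follows Storm's terminology and is an assumption about how a Storm runner might group components.

```python
class Source:
    def __init__(self, name):
        self.name = name

    def accept(self, visitor):
        visitor.visit_source(self)


class Processor:
    def __init__(self, name):
        self.name = name

    def accept(self, visitor):
        visitor.visit_processor(self)


class Sink:
    def __init__(self, name):
        self.name = name

    def accept(self, visitor):
        visitor.visit_sink(self)


class Edge:
    def __init__(self, source, target, stream):
        self.source, self.target, self.stream = source, target, stream

    def accept(self, visitor):
        visitor.visit_edge(self)


class TopologyDagVisitor:
    """Hypothetical Python analogue of a DAG visitor interface."""

    def visit_source(self, source):
        raise NotImplementedError

    def visit_processor(self, processor):
        raise NotImplementedError

    def visit_sink(self, sink):
        raise NotImplementedError

    def visit_edge(self, edge):
        raise NotImplementedError


class ToyStormRunner(TopologyDagVisitor):
    """Toy runner: builds a plain dict instead of a real engine topology."""

    def __init__(self):
        self.topology = {"spouts": [], "bolts": [], "links": []}

    def visit_source(self, source):
        self.topology["spouts"].append(source.name)

    def visit_processor(self, processor):
        self.topology["bolts"].append(processor.name)

    def visit_sink(self, sink):
        self.topology["bolts"].append(sink.name)

    def visit_edge(self, edge):
        self.topology["links"].append(
            (edge.source.name, edge.target.name, edge.stream))


def translate(components, edges, runner):
    """Walk the engine-neutral DAG and let the runner build its own form."""
    for component in components:
        component.accept(runner)
    for edge in edges:
        edge.accept(runner)
    return runner.topology


src, proc, sink = Source("Source"), Processor("Processor 1"), Sink("Sink 1")
edges = [Edge(src, proc, "Stream 1"), Edge(proc, sink, "Stream 1")]
topology = translate([src, proc, sink], edges, ToyStormRunner())
```

Supporting another engine in this sketch means writing another visitor; the DAG classes stay unchanged.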
20.
Extensibility with SAM SDK
Custom Processor - allows users to write their own business logic
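As an illustration of what such a processor might look like, here is a minimal sketch. `CustomProcessor`, its `initialize`/`process`/`cleanup` lifecycle and the `run` driver are hypothetical and written in Python for brevity; they are not the SAM SDK interface.

```python
class CustomProcessor:
    """Hypothetical base class; not the SAM SDK interface."""

    def initialize(self, config):
        pass

    def process(self, event):
        """Return a list of zero or more output events for one input event."""
        raise NotImplementedError

    def cleanup(self):
        pass


class ThresholdFlagger(CustomProcessor):
    """Example business logic: flag events whose field exceeds a threshold."""

    def initialize(self, config):
        self.field = config["field"]
        self.threshold = config["threshold"]

    def process(self, event):
        if self.field not in event:
            return []  # drop events that lack the configured field
        enriched = dict(event)
        enriched["exceeded"] = event[self.field] > self.threshold
        return [enriched]


def run(processor, config, events):
    """Toy driver standing in for the streaming engine."""
    processor.initialize(config)
    try:
        output = []
        for event in events:
            output.extend(processor.process(event))
        return output
    finally:
        processor.cleanup()


results = run(
    ThresholdFlagger(),
    {"field": "speed", "threshold": 80},
    [{"speed": 95}, {"speed": 60}, {"driver": "x"}],
)
```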
21.
Extensibility with SAM SDK
Multi-lang support (upcoming)
22.
Extensibility with SAM SDK
UDAFs (user-defined aggregate functions) - compute aggregates within a window
Built in functions
STDDEV
STDDEVP
VARIANCE
VARIANCEP
MEAN
MIN
MAX
SUM
COUNT
UPPER
LOWER
INITCAP
SUBSTRING
CHAR_LENGTH
CONCAT
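An aggregate function over a window can be sketched as an init/add/result contract. The `Udaf` class and its method names are hypothetical, not the SAM SDK interface. The sketch also assumes that the "P" suffix in STDDEVP denotes the population form (divide by n), and it uses count-based tumbling windows for simplicity where a real application would more likely use time-based windows.

```python
import math


class Udaf:
    """Hypothetical aggregate-function contract: init, add, result."""

    def init(self):
        raise NotImplementedError

    def add(self, state, value):
        raise NotImplementedError

    def result(self, state):
        raise NotImplementedError


class Mean(Udaf):
    def init(self):
        return (0, 0.0)  # (count, running sum)

    def add(self, state, value):
        count, total = state
        return (count + 1, total + value)

    def result(self, state):
        count, total = state
        return total / count if count else None


class PopulationStddev(Udaf):
    """Welford's online algorithm; the result divides by n (population form)."""

    def init(self):
        return (0, 0.0, 0.0)  # (count, mean, sum of squared deviations)

    def add(self, state, value):
        count, mean, m2 = state
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)
        return (count, mean, m2)

    def result(self, state):
        count, _mean, m2 = state
        return math.sqrt(m2 / count) if count else None


def tumbling_windows(values, size):
    """Split values into consecutive, non-overlapping windows of `size`."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def aggregate(udaf, values):
    state = udaf.init()
    for value in values:
        state = udaf.add(state, value)
    return udaf.result(state)


readings = [2, 4, 4, 4, 5, 5, 7, 9, 10, 20]
means = [aggregate(Mean(), w) for w in tumbling_windows(readings, 8)]
stddevs = [aggregate(PopulationStddev(), w) for w in tumbling_windows(readings, 8)]
```

Keeping the state explicit means the engine only needs to hold one small tuple per window rather than every event in it.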
23.
Extensibility with SAM SDK
UDFs (user-defined functions) - perform simple transformations
Built in functions
STDDEV
STDDEVP
VARIANCE
VARIANCEP
MEAN
MIN
MAX
SUM
COUNT
UPPER
LOWER
INITCAP
SUBSTRING
CHAR_LENGTH
CONCAT
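Unlike an aggregate, a UDF of this kind maps the values of a single event to a new value. The sketch below gives Python stand-ins for three of the string functions listed; their exact semantics in SAM are assumed (for instance, that INITCAP capitalizes the first letter of each word), and `apply_projection` is a hypothetical helper.

```python
def upper(value):
    return value.upper()


def initcap(value):
    """ASSUMED semantics: capitalize each space-separated word, lower the rest."""
    return " ".join(word[:1].upper() + word[1:].lower() for word in value.split(" "))


def concat(*values):
    return "".join(str(value) for value in values)


def apply_projection(event, projection):
    """Evaluate each output expression against one event (no window state)."""
    return {name: expression(event) for name, expression in projection.items()}


event = {"city": "new YORK", "state": "ny"}
projection = {
    "location": lambda e: concat(initcap(e["city"]), ", ", upper(e["state"])),
}
row = apply_projection(event, projection)
```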
24.
Extensibility with SAM SDK
Notifier - sends notifications such as email or SMS, or more complex ones that
invoke external APIs
Built in notifiers
Email
More in future…
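A notifier extension point can be sketched as a single-method interface with interchangeable implementations. Everything here is hypothetical (`Notifier`, `InMemoryEmailNotifier`, `CallbackNotifier`, `alert_on`, and the example addresses); the email notifier only records messages in memory and does not send anything.

```python
class Notifier:
    """Hypothetical notifier contract; not the SAM SDK interface."""

    def notify(self, notification):
        raise NotImplementedError


class InMemoryEmailNotifier(Notifier):
    """Records emails in an outbox instead of sending them."""

    def __init__(self, sender):
        self.sender = sender
        self.outbox = []

    def notify(self, notification):
        self.outbox.append({
            "from": self.sender,
            "to": notification["to"],
            "subject": notification["subject"],
            "body": notification["body"],
        })


class CallbackNotifier(Notifier):
    """Hands the notification to any callable, e.g. a client for an external API."""

    def __init__(self, callback):
        self.callback = callback

    def notify(self, notification):
        self.callback(notification)


def alert_on(events, predicate, notifiers, to):
    """Notify every notifier for each event matching the predicate."""
    sent = 0
    for event in events:
        if not predicate(event):
            continue
        notification = {"to": to, "subject": "Alert", "body": repr(event)}
        for notifier in notifiers:
            notifier.notify(notification)
        sent += 1
    return sent


email = InMemoryEmailNotifier(sender="sam@example.com")
api_calls = []
webhook = CallbackNotifier(api_calls.append)
events = [{"speed": 95}, {"speed": 60}, {"speed": 110}]
sent = alert_on(events, lambda e: e["speed"] > 80, [email, webhook], to="ops@example.com")
```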
25.
The current release – 0.5
Manual service pool registration not requiring Ambari
Test mode to easily test out the streaming app
Kerberos and delegation-token-based authentication
Authorization support with RBAC + permissions
New sources, processors and sinks
Upcoming…
Extending token-based authentication to other components
Support for state management in SAM
Support for other streaming engines – Flink, Spark Streaming
26.
Try it out!
It's open source under the Apache License
https://github.com/hortonworks/streamline
Apache incubation soon
SAM 0.5 is out!
https://groups.google.com/forum/#!forum/streamline-users
Contributions are welcome!