1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Streaming Analytics Manager (SAM)
& Registry
Agenda
Registry
Streaming Analytics Manager (SAM)
Demo
Questions
History of Streaming at Hortonworks
▪ Introduced Storm as the stream processing engine in HDP 2.1 (late 2013)
▪ First to ship Apache Kafka as an enterprise messaging queue (early 2014)
▪ Added several improvements and features to Apache Storm
▪ Added security and other critical features/improvements to Apache Kafka
▪ Many lessons learned from shipping Storm and Kafka over the past 3 years
▪ The vision and implementation of Registry and Streaming Analytics Manager are based on those lessons
Registry
▪ Foundational service that enables multiple use cases, including streaming, machine learning, service discovery and application templates
▪ Offers base frameworks for developing registries such as Schema Registry and ML Registry
▪ Registry modules like Schema Registry and ML Registry build their own entities on top of a common versioned entity
▪ Modular approach to running registry services: users can choose which registry services they want to enable
▪ Modules available today: Schema Registry and ML Registry
What is Schema Registry? What Value Does It Provide?
▪ What is Schema Registry?
• A shared repository of schemas that allows applications to interact with each other flexibly
▪ What value does Schema Registry provide?
– Central metadata repository
• Provides reusable schemas
• Defines relationships between schemas
• Enables generic format conversion and generic routing
– Operational efficiency
• Avoids attaching the schema to every piece of data
• Lets producers and consumers evolve at different rates
▪ Example use
– Register schemas for Kafka topics so that consumers of those topics (e.g. NiFi, StreamLine) can use them
Schema Registry Concepts
• Schema Group
A logical grouping/container for schemas of a similar type, or based on any criteria the customer has for managing the schemas
• Schema Metadata
Metadata associated with a named schema
• Schema Version
The actual versioned schema associated with a schema metadata definition
[Figure: a Schema Group (Group Name) contains Schema Metadata 1 (Schema Name, Schema Type, Description, Compatibility Policy, Serializers, Deserializers), which has Schema Versions 1, 2 and 3, each with a version, schema text and fingerprint]
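To make the three concepts concrete, here is a minimal in-memory sketch of the data model. The class and field names are hypothetical and are not the registry's actual API; the fingerprint is assumed to be a SHA-256 hash of the schema text, used to detect re-registration of an identical schema.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Compatibility policies named on the "Schema Compatibility Policies" slide.
enum CompatibilityPolicy { BACKWARD, FORWARD, BOTH, NONE }

// One immutable version of a schema: version number, schema text and its fingerprint.
final class SchemaVersion {
    final int version;
    final String text;
    final String fingerprint;

    SchemaVersion(int version, String text, String fingerprint) {
        this.version = version;
        this.text = text;
        this.fingerprint = fingerprint;
    }
}

// Hypothetical metadata for a named schema inside a schema group, plus its ordered versions.
final class SchemaMetadata {
    final String group;        // schema group name, e.g. "kafka"
    final String name;         // schema name, e.g. "truck_events"
    final String type;         // schema type, e.g. "avro"
    final String description;
    final CompatibilityPolicy compatibility;
    private final List<SchemaVersion> versions = new ArrayList<>();

    SchemaMetadata(String group, String name, String type, String description,
                   CompatibilityPolicy compatibility) {
        this.group = group;
        this.name = name;
        this.type = type;
        this.description = description;
        this.compatibility = compatibility;
    }

    // Registers schema text. Text whose fingerprint is already known returns the
    // existing version instead of creating a duplicate.
    synchronized SchemaVersion addVersion(String text) {
        String fp = fingerprint(text);
        for (SchemaVersion v : versions) {
            if (v.fingerprint.equals(fp)) {
                return v;
            }
        }
        SchemaVersion created = new SchemaVersion(versions.size() + 1, text, fp);
        versions.add(created);
        return created;
    }

    synchronized List<SchemaVersion> versions() {
        return Collections.unmodifiableList(new ArrayList<>(versions));
    }

    // SHA-256 of the raw text, hex-encoded. A real registry would likely normalize the
    // schema first so that whitespace differences do not change the fingerprint.
    static String fingerprint(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Keeping versions append-only means a consumer holding an old version number can always resolve it later.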
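Sender/Receiver flow
[Figure: on the sending side, the Producer's Serializer uses a Schema Registry Client with a local schema/serdes cache and writes each message to the Message Store as schema version + payload; on the receiving side, the Consumer's Deserializer uses its own Schema Registry Client and local schema/serdes cache to resolve that version and decode the payload. Both clients talk to SchemaRegistry instances backed by Schema Storage and SerDes Storage.]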
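A minimal sketch of this flow, under two assumptions: messages are framed as a 4-byte schema version id followed by the payload (the registry's actual wire format may differ), and the remote lookup is stubbed with a function instead of a real Schema Registry Client call. The class names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

// Hypothetical framing: [4-byte schema version id][serialized payload].
final class VersionedFraming {
    static byte[] frame(int schemaVersionId, byte[] payload) {
        return ByteBuffer.allocate(Integer.BYTES + payload.length)
                .putInt(schemaVersionId)
                .put(payload)
                .array();
    }

    static int versionId(byte[] message) {
        return ByteBuffer.wrap(message).getInt();   // big-endian by default
    }

    static byte[] payload(byte[] message) {
        return Arrays.copyOfRange(message, Integer.BYTES, message.length);
    }
}

// Local cache in front of a registry lookup: only the first message carrying a given
// version id triggers a (stubbed) remote call; later messages are served locally.
final class CachingSchemaLookup {
    private final IntFunction<String> remoteLookup;   // stand-in for a REST call to the registry
    private final Map<Integer, String> cache = new ConcurrentHashMap<>();
    private final AtomicInteger remoteCalls = new AtomicInteger();

    CachingSchemaLookup(IntFunction<String> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    String schemaFor(int versionId) {
        return cache.computeIfAbsent(versionId, id -> {
            remoteCalls.incrementAndGet();
            return remoteLookup.apply(id);
        });
    }

    int remoteCalls() {
        return remoteCalls.get();
    }
}
```

Sending a small version id instead of the full schema is what makes the "avoid attaching schema to every piece of data" benefit possible; the cache keeps the registry off the per-message hot path.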
Schema Registry Component Architecture
[Figure: the SR Web Server hosts the Schema Registry web app and REST API, accessed through the Java Schema Registry Client; integrations include NiFi processors, Kafka ser/des and StreamLine; pluggable storage for schemas (MySQL, PostgreSQL, in-memory) and for serializer/deserializer jars (local file system, HDFS)]
Schema Compatibility Policies
▪ What is a compatibility policy?
– Defines the rules for how a schema can evolve
– Every new version must honor the compatibility policy the schema was created with
▪ Policies supported
– Backward
– Forward
– Both
– None
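A toy checker makes the four policies concrete. It assumes Avro-style resolution semantics, where Backward means the new schema can read data written with the old one and Forward means the old schema can read data written with the new one, and it reduces a schema to field names plus a has-default flag. The names are hypothetical; the registry's real checks operate on full schemas, including types.

```java
import java.util.Map;

enum Compatibility { BACKWARD, FORWARD, BOTH, NONE }

// Simplified model: a schema is a map from field name to "this field declares a default".
// Only field presence is checked; type changes are ignored in this sketch.
final class CompatibilityChecker {

    // Can a reader using `reader` decode data written with `writer`?
    // Fields only the writer has are skipped; fields the reader expects but the
    // writer lacks must be filled from a default declared in the reader.
    static boolean canRead(Map<String, Boolean> reader, Map<String, Boolean> writer) {
        for (Map.Entry<String, Boolean> field : reader.entrySet()) {
            if (!writer.containsKey(field.getKey()) && !field.getValue()) {
                return false;
            }
        }
        return true;
    }

    static boolean isCompatible(Compatibility policy,
                                Map<String, Boolean> oldSchema,
                                Map<String, Boolean> newSchema) {
        switch (policy) {
            case BACKWARD: return canRead(newSchema, oldSchema);  // new readers, old data
            case FORWARD:  return canRead(oldSchema, newSchema);  // old readers, new data
            case BOTH:     return canRead(newSchema, oldSchema) && canRead(oldSchema, newSchema);
            case NONE:     return true;
            default:       throw new IllegalArgumentException("Unknown policy: " + policy);
        }
    }
}
```

For example, adding a field with a default keeps both directions working, while adding a required field without a default breaks Backward compatibility but not Forward.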
Schema evolution
[Figure: producers on schema versions v1, v2 and v4 and consumers on v2, v5 and v7 coexisting as the schema evolves]
Serializers/Deserializers
▪ Snapshot-based serializer/deserializer
– Serializes the complete payload
– Deserializes the payload into the respective type
▪ Pull-based serializer/deserializer
– Serializes only the required elements and ignores the others
– Pulls out only the elements required to build the desired object
▪ Push-based deserializer
– Emits callbacks carrying parsing events for the respective fields in the schema
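The push-based style can be sketched as a deserializer that walks the payload and fires one callback per schema field, leaving the caller to keep only what it needs. The payload format (comma-separated values in schema field order) and the FieldListener and PushDeserializer names are assumptions made for brevity; real serdes work on binary formats such as Avro.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback receiving one parsing event per schema field, in schema order.
interface FieldListener {
    void onField(String name, String value);

    default void onEnd() {
        // optional hook, called once all fields have been reported
    }
}

final class PushDeserializer {
    private final List<String> schemaFields;   // field order taken from the schema

    PushDeserializer(List<String> schemaFields) {
        this.schemaFields = new ArrayList<>(schemaFields);
    }

    // Assumed payload format: comma-separated values in schema field order.
    void deserialize(String payload, FieldListener listener) {
        String[] values = payload.split(",", -1);   // -1 keeps trailing empty values
        if (values.length != schemaFields.size()) {
            throw new IllegalArgumentException(
                    "Expected " + schemaFields.size() + " values but got " + values.length);
        }
        for (int i = 0; i < values.length; i++) {
            listener.onField(schemaFields.get(i), values[i]);
        }
        listener.onEnd();
    }
}
```

A listener that only cares about one field can ignore the rest without any intermediate object being built, which is the main appeal of the push style.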
Schema Registry client
▪ REST-based client
▪ Caching
– Metadata
– Schema versions
– Ser/des libraries and class loaders
▪ URL selectors
– Round robin
– Failover
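The two URL selection strategies can be sketched as follows. The UrlSelector interface and its method names are hypothetical, not the client's actual API; the sketch only shows the selection logic, with no HTTP calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical selector contract: which registry URL should the next request use?
interface UrlSelector {
    String selectUrl();
    void reportFailure(String url);
}

// Spreads requests over all URLs in turn.
final class RoundRobinUrlSelector implements UrlSelector {
    private final List<String> urls;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinUrlSelector(List<String> urls) {
        this.urls = new ArrayList<>(urls);
    }

    @Override
    public String selectUrl() {
        // floorMod keeps the index valid even after the counter overflows
        return urls.get(Math.floorMod(counter.getAndIncrement(), urls.size()));
    }

    @Override
    public void reportFailure(String url) {
        // nothing to do: the next call moves on to another URL anyway
    }
}

// Sticks to one URL until a request against it fails, then moves to the next one.
final class FailoverUrlSelector implements UrlSelector {
    private final List<String> urls;
    private int current = 0;

    FailoverUrlSelector(List<String> urls) {
        this.urls = new ArrayList<>(urls);
    }

    @Override
    public synchronized String selectUrl() {
        return urls.get(current);
    }

    @Override
    public synchronized void reportFailure(String url) {
        // ignore stale reports about a URL we have already moved away from
        if (urls.get(current).equals(url)) {
            current = (current + 1) % urls.size();
        }
    }
}
```

Round robin spreads read load across instances; failover keeps a client pinned to one instance, which makes its behavior easier to reason about until that instance goes down.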
HA
▪ Storage provider
– Relies on the transactional support of the underlying SQL stores
– Spin up as many Schema Registry instances as required
▪ Supports HA at the Schema Registry level
– Uses ZooKeeper/Curator
– Automatic failover of the master
– The master receives all writes
– Slaves receive only reads
[Figure: several SchemaRegistry instances sharing a common storage]
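The read/write split can be sketched as a small router. This is a hypothetical illustration, not the registry's code: the first instance stands in for the master that ZooKeeper/Curator leader election would choose, and failure detection is simulated by calling instanceLost.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical router: writes go to the master, reads rotate over the slaves.
final class HaRouter {
    private final List<String> live = new ArrayList<>();
    private String master;
    private int readCursor = 0;

    HaRouter(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("need at least one instance");
        }
        live.addAll(instances);
        master = live.get(0);   // stand-in for leader election via ZooKeeper/Curator
    }

    // All writes go to the current master.
    synchronized String routeWrite() {
        return master;
    }

    // Reads rotate over the slaves; with no slaves left, the master serves reads too.
    synchronized String routeRead() {
        List<String> slaves = new ArrayList<>(live);
        slaves.remove(master);
        List<String> targets = slaves.isEmpty() ? live : slaves;
        return targets.get(Math.floorMod(readCursor++, targets.size()));
    }

    // Simulated automatic failover: when the master is lost, the next live instance takes over.
    synchronized void instanceLost(String instance) {
        live.remove(instance);
        if (live.isEmpty()) {
            throw new IllegalStateException("no live instances left");
        }
        if (instance.equals(master)) {
            master = live.get(0);
        }
    }
}
```

Funneling writes through one master keeps version numbering consistent, while the shared SQL storage lets any instance answer reads.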
Integration of Schema Registry
▪ Kafka
– Serializer/deserializer plugged in through the producer/consumer API
▪ NiFi processors for Schema Registry
– Fetch schema
– Serialize/deserialize with schema
▪ StreamLine processors for Schema Registry
– Look up the schema of a Kafka, Kinesis or EventHubs topic
– Look up the schema of an HDFS directory
Schema Registry UI
WIP/Future enhancements
▪ Security
– Kerberos support
– Default authorizers and Apache Ranger support
▪ Audit of schemas and clients
▪ Rich types in schema definitions
▪ Pluggable listeners
▪ Schema policies
▪ Notifications
– New versions
– Archiving
▪ Converters
Try it out!
▪ It’s open source under the Apache License
▪ https://github.com/hortonworks/registry
▪ Apache incubation soon
▪ Registry 0.2 release on April 25th, 0.3 release on May 31st
▪ https://groups.google.com/forum/#!forum/registry
▪ We are seeing outside contributions
▪ Contributions are welcome!
Streaming Analytics Manager
▪ What is it?
• A platform for designing, developing, deploying and managing streaming analytics applications in minutes, using a drag-and-drop visual paradigm
• Supports event correlation, context enrichment, complex pattern matching, analytical aggregations, and alerts/notifications when insights are discovered
• Agnostic to the underlying streaming engine; it can support multiple streaming substrates (e.g. Storm, Spark Streaming, Flink)
• Extensibility is a first-class citizen (add sinks, processors and sources as needed)
▪ Guiding principle
– Build streaming applications easily while focusing on business logic
Complexities in building streaming applications
▪ New streaming engines and APIs
▪ Implementing windows, joins and state management is hard
▪ Adding the user’s business logic to the application
▪ Interacting with external services such as HBase, Hive, HDFS, etc.
▪ Deploying with all the necessary configuration files
▪ Operating the streaming application, including monitoring and metrics
▪ Debugging the streaming application
Key challenges that SAM is trying to solve
▪ Building streaming applications requires specialized skill sets that most enterprise organizations don’t have today
▪ Streaming applications require a considerable amount of programming, testing and tuning before they can be deployed to production, which takes significant time
▪ Key streaming primitives such as joining/splitting streams, aggregations over a window of time and pattern matching are difficult to implement
▪ People prefer not to write code to build complex streaming applications
▪ No truly open source project today solves all of the above challenges
▪ People care less about which streaming engine powers their applications than about having the above challenges addressed without being forced into vendor lock-in
Streaming Analytics Manager Components and User Personas
[Figure: App Developer, Business Analyst and Operations personas on top of a distributed streaming computation engine (different streaming engines that power higher-level services for building stream applications)]
SAM’s Value Proposition
▪ A platform using a graphical programming paradigm that lets users focus on business logic and easily build and deploy complex streaming applications
▪ Makes it easier for users to import the configurations of other services and use them in streaming applications
▪ Provides an abstraction over the streaming engine, making it possible to plug in open source streaming engines (Storm, Spark, Flink, etc.)
▪ Decouples schemas from the streaming application via integration with Schema Registry
▪ Provides operational metrics for monitoring streaming applications via pluggable metrics storage, e.g. Ambari, OpenTSDB
▪ Streaming Insights: visualize the data being processed by the streaming application
SAM’s Key Capabilities
▪ Building streaming apps using the following primitives
– Connecting to Streams
– Joining Streams
– Forking Streams
– Aggregations over Windows
– Stream Analytics: Descriptive, Predictive, Prescriptive
– Rules Engine
– Transformations
– Filtering and Routing
– Notifications / Alerts
▪ Deploying streaming apps
– Deploying the streaming app on a supported streaming engine
– Monitoring the streaming app with metrics
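To show what the "Aggregations over Windows" primitive computes, here is a minimal batch sketch of a tumbling-window average. It is not SAM's implementation: the class names are hypothetical, and a real engine would evaluate windows incrementally over an unbounded stream and deal with late events.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// A timestamped measurement, e.g. a truck's speed at a point in time.
final class TimedValue {
    final long timestampMillis;
    final double value;

    TimedValue(long timestampMillis, double value) {
        this.timestampMillis = timestampMillis;
        this.value = value;
    }
}

final class TumblingWindow {
    // Assigns each value to the fixed, non-overlapping window containing its timestamp
    // and returns window start -> average of the values in that window.
    static Map<Long, Double> averageByWindow(List<TimedValue> values, long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        Map<Long, double[]> sumAndCount = new TreeMap<>();
        for (TimedValue v : values) {
            long windowStart = Math.floorDiv(v.timestampMillis, windowMillis) * windowMillis;
            double[] acc = sumAndCount.computeIfAbsent(windowStart, k -> new double[2]);
            acc[0] += v.value;   // running sum
            acc[1] += 1;         // running count
        }
        Map<Long, Double> averages = new TreeMap<>();
        for (Map.Entry<Long, double[]> window : sumAndCount.entrySet()) {
            averages.put(window.getKey(), window.getValue()[0] / window.getValue()[1]);
        }
        return averages;
    }
}
```

Keeping only a sum and a count per window is enough for an average; MIN, MAX and COUNT can be tracked the same way, whereas STDDEV needs an extra running sum of squares.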
Typical Streaming Application Workflow
[Figure: Kafka → P1 → W1 → HBase]
SAM’s Service Pools and Environments
• Service Pool
– A pool of services that can be used to create different environments
• Environment
– Consists of a set of services you choose from one or more service pools
• Stream App
– The environment is then associated with a stream application, which uses the services in that environment for various configurations
[Figure: Stream App 1 and Stream App 2]
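The relationship between the three concepts can be sketched as a tiny data model. All names and endpoint strings are hypothetical; in SAM the pools are populated from real service configurations (e.g. imported from Ambari), which this sketch does not attempt.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// A pool of services (e.g. those of one cluster), keyed by service name.
final class ServicePool {
    final String name;
    private final Map<String, String> services = new HashMap<>();

    ServicePool(String name) {
        this.name = name;
    }

    ServicePool register(String service, String endpoint) {
        services.put(service, endpoint);
        return this;
    }

    Optional<String> endpoint(String service) {
        return Optional.ofNullable(services.get(service));
    }
}

// An environment picks individual services from one or more pools.
final class Environment {
    final String name;
    private final Map<String, String> chosen = new HashMap<>();

    Environment(String name) {
        this.name = name;
    }

    Environment use(ServicePool pool, String service) {
        String endpoint = pool.endpoint(service).orElseThrow(() ->
                new IllegalArgumentException(service + " is not in pool " + pool.name));
        chosen.put(service, endpoint);
        return this;
    }

    Optional<String> endpoint(String service) {
        return Optional.ofNullable(chosen.get(service));
    }
}

// A stream app resolves the services it needs through its environment.
final class StreamApp {
    final String name;
    final Environment environment;

    StreamApp(String name, Environment environment) {
        this.name = name;
        this.environment = environment;
    }

    String endpointFor(String service) {
        return environment.endpoint(service).orElseThrow(() ->
                new IllegalStateException(name + " has no " + service + " in " + environment.name));
    }
}
```

Because the app only knows its environment, moving it from a test setup to production means pointing it at a different environment rather than editing the app.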
SAM’s Components
▪ Builder Components
– Sources (all integrated with Schema Registry)
• Kafka Source
• Event Hub
• HDFS
– Processors
• Join
• Window/Aggregate
• Rule
• Normalization/Projection
• Branch
• PMML
• Custom
– Sinks
• Notification/Alerts (email support)
• HDFS
• HBase
• Hive
• JDBC
• Druid
• Cassandra
• Kafka
• OpenTSDB
• Solr
Streaming Analytics powered by Druid and Superset
▪ What is Stream Insight?
– Gives business analysts a tool for descriptive analytics and perishable insights on streaming data, using a sophisticated UI provided by Superset
– Tooling to create time-series and real-time analytics dashboards, charts and graphs, and rich, customizable visualizations of the data
Demo
Extensibility with SAM SDK
▪ Custom Processor
– Allows users to write their own business logic
import java.util.List;
import java.util.Map;

/**
 * Interface for processors to implement for processing messages at runtime
 */
public interface ProcessorRuntime {
    /**
     * Process the {@link StreamlineEvent} and throw a {@link ProcessingException} if an
     * error arises during processing
     * @param event to be processed
     * @return the results of processing the event
     * @throws ProcessingException if an error arises during processing
     */
    List<Result> process(StreamlineEvent event) throws ProcessingException;

    /**
     * Initialize any necessary resources needed for the implementation
     * @param config configuration for this processor
     */
    void initialize(Map<String, Object> config);

    /**
     * Clean up any necessary resources needed for the implementation
     */
    void cleanup();
}
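To illustrate how a custom processor plugs in, here is a hypothetical implementation of the ProcessorRuntime interface above. To keep the example self-contained, StreamlineEvent, Result and ProcessingException are replaced by minimal stand-ins; the real SDK classes are richer, and the config keys and field names are made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// ---- Minimal stand-ins for SAM SDK types (hypothetical, for a self-contained example) ----
class StreamlineEvent extends HashMap<String, Object> {
    StreamlineEvent(Map<String, Object> fields) {
        super(fields);
    }
}

class Result {
    final String outputStream;
    final List<StreamlineEvent> events;

    Result(String outputStream, List<StreamlineEvent> events) {
        this.outputStream = outputStream;
        this.events = events;
    }
}

class ProcessingException extends Exception {
    ProcessingException(String message) {
        super(message);
    }
}

interface ProcessorRuntime {
    List<Result> process(StreamlineEvent event) throws ProcessingException;
    void initialize(Map<String, Object> config);
    void cleanup();
}

// ---- Example custom processor: adds a km/h field derived from an mph field ----
class SpeedConversionProcessor implements ProcessorRuntime {
    private static final double KMH_PER_MPH = 1.609344;
    private String inputField;
    private String outputStream;

    @Override
    public void initialize(Map<String, Object> config) {
        // illustrative config keys, not defined by SAM
        inputField = (String) config.getOrDefault("inputField", "speedMph");
        outputStream = (String) config.getOrDefault("outputStream", "default");
    }

    @Override
    public List<Result> process(StreamlineEvent event) throws ProcessingException {
        Object raw = event.get(inputField);
        if (!(raw instanceof Number)) {
            throw new ProcessingException("Field '" + inputField + "' is missing or not numeric");
        }
        Map<String, Object> fields = new HashMap<>(event);   // leave the input event untouched
        fields.put("speedKmh", ((Number) raw).doubleValue() * KMH_PER_MPH);
        StreamlineEvent out = new StreamlineEvent(fields);
        return Collections.singletonList(new Result(outputStream, Collections.singletonList(out)));
    }

    @Override
    public void cleanup() {
        // nothing to release in this example
    }
}
```

Copying the event before enriching it keeps the processor free of side effects on its input, which matters if the same event is also routed to other branches.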
Extensibility with SAM SDK
▪ Window UDF
– Custom UDFs to process window data
/**
 * This is an interface for implementing user defined
 * functions for a single argument.
 *
 * @param <O> type of the result
 * @param <I> type of the input argument
 */
public interface UDF<O, I> {
    O evaluate(I i);
}
Built-in functions
▪ STDDEV
▪ STDDEVP
▪ VARIANCE
▪ VARIANCEP
▪ MEAN
▪ MIN
▪ MAX
▪ SUM
▪ COUNT
▪ UPPER
▪ LOWER
▪ INITCAP
▪ SUBSTRING
▪ CHAR_LENGTH
▪ CONCAT
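A custom UDF only needs to implement evaluate. The sketch below repeats the interface so it compiles on its own and adds a hypothetical Fahrenheit-to-Celsius conversion; how SAM packages, discovers and registers UDF jars is not covered here.

```java
// Copy of the single-argument UDF interface from the slide, so the example compiles on its own.
interface UDF<O, I> {
    O evaluate(I i);
}

// Hypothetical example UDF: converts a Fahrenheit reading to Celsius.
class FahrenheitToCelsius implements UDF<Double, Double> {
    @Override
    public Double evaluate(Double fahrenheit) {
        if (fahrenheit == null) {
            return null;   // propagate missing values instead of throwing
        }
        return (fahrenheit - 32.0) * 5.0 / 9.0;
    }
}
```

Returning null for a null input lets the function be applied to sparse fields without a separate filter step; whether that is the desired behavior depends on the application.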
Extensibility with SAM SDK
▪ Notification Sink
– Interface for sending notifications such as email and SMS, or for more complex actions such as invoking external APIs
public interface Notifier {
    void open(NotificationContext ctx);
    void notify(Notification notification);
    void close();
    boolean isPull();
    List<String> getFields();
    NotificationContext getContext();
}

public interface Notification {
    enum Status {
        NEW, DELIVERED, FAILED
    }
    String getId();
    List<String> getEventIds();
    List<String> getDataSourceIds();
    String getRuleId();
    Status getStatus();
    Map<String, Object> getFieldsAndValues();
    String getNotifierName();
    long getTs();
}
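A notifier implementation only has to fill in these methods. The sketch below is hypothetical: NotificationContext is reduced to a single getConfig method (an assumption, since the slide does not show its members), isPull is assumed to distinguish pull-style from push-style notifiers, and instead of sending email the example records formatted lines in memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical minimal context; the real SAM NotificationContext is not shown on the slide.
interface NotificationContext {
    Map<String, Object> getConfig();
}

// The two interfaces from the slide, repeated so the example compiles on its own.
interface Notifier {
    void open(NotificationContext ctx);
    void notify(Notification notification);
    void close();
    boolean isPull();
    List<String> getFields();
    NotificationContext getContext();
}

interface Notification {
    enum Status { NEW, DELIVERED, FAILED }
    String getId();
    List<String> getEventIds();
    List<String> getDataSourceIds();
    String getRuleId();
    Status getStatus();
    Map<String, Object> getFieldsAndValues();
    String getNotifierName();
    long getTs();
}

// Simple value object used to drive the example notifier.
final class SimpleNotification implements Notification {
    private final String id;
    private final String ruleId;
    private final Map<String, Object> fields;

    SimpleNotification(String id, String ruleId, Map<String, Object> fields) {
        this.id = id;
        this.ruleId = ruleId;
        this.fields = fields;
    }

    public String getId() { return id; }
    public List<String> getEventIds() { return Collections.emptyList(); }
    public List<String> getDataSourceIds() { return Collections.emptyList(); }
    public String getRuleId() { return ruleId; }
    public Status getStatus() { return Status.NEW; }
    public Map<String, Object> getFieldsAndValues() { return fields; }
    public String getNotifierName() { return "in-memory-log"; }
    public long getTs() { return System.currentTimeMillis(); }
}

// Example notifier: formats each notification as one line and keeps it in memory.
// A real implementation would send email/SMS or call an external API here.
final class InMemoryLogNotifier implements Notifier {
    private NotificationContext ctx;
    private final List<String> delivered = new ArrayList<>();

    public void open(NotificationContext ctx) { this.ctx = ctx; }

    public void notify(Notification notification) {
        delivered.add(notification.getRuleId() + " -> " + notification.getFieldsAndValues());
    }

    public void close() { /* nothing to release */ }

    public boolean isPull() { return false; }   // assumption: this notifier is push-style

    public List<String> getFields() { return Collections.emptyList(); }

    public NotificationContext getContext() { return ctx; }

    List<String> delivered() { return Collections.unmodifiableList(delivered); }
}
```

Note that notify(Notification) overloads, rather than overrides, Object.notify(), so the two do not clash.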
What’s Next?
▪ Manual service pool registration that does not require Ambari
▪ Test sources and sinks for easily testing the functionality of a streaming app
▪ Authentication and authorization
▪ Other components (sources such as Kinesis, processors and sinks)
Try it out!
▪ It’s open source under the Apache License
▪ https://github.com/hortonworks/streamline
▪ Apache incubation soon
▪ SAM 0.4 is out!
▪ https://groups.google.com/forum/#!forum/streamline-users
▪ Contributions are welcome!
Follow-up questions
▪ JP Player, Principal Solutions Engineer
jplayer@hortonworks.com
650.773.3313
▪ Sam Hjelmfelt, Resident Architect
shjelmfelt@hortonworks.com
605.393.7244
▪ Kristine Hannigan, Enterprise Account Manager
khannigan@hortonworks.com
415.323.8819


Schema Registry & Stream Analytics Manager

  • 1. 1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Streaming Analytics Manager (SAM) & Registry
  • 2. 2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Agenda Registry Streaming Analytics Manager (SAM) Demo Questions
  • 3. 3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved History of Streaming at Hortonworks  Introduced Storm as Stream Processing Engine in HDP-2.1 (Late 2013)  First to ship Apache Kafka as Enterprise Messaging Queue ( Early 2014)  Added several improvements & features into Apache Storm.  Added Security and critical features/improvements to Apache Kafka  Lot of learnings from shipping Storm & Kafka for past 3 years  Vision & Implementation of Registry & Streaming Analytics Manager based on our learnings from shipping Storm & Kafka for past 3 years.
  • 4. Page4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Registry
  • 5. Page5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Registry  Foundational service to enable multiple use-cases including Streaming, Machine Learning, Service discovery, Application templates  Offers base frameworks to develop Schema Registry, ML Registry etc..  Registry modules like Schema Registry, ML Registry build their own entities on top of versioned entity  Modular approach to running registry services.  Users will have flexibility to choose what registry services they would like to enable.  We have Schema Registry and ML Registry
  • 6. Page6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved What is Schema Registry? What Value Does it Provide?  What is Schema Registry? • A shared repository of schemas that allows applications to flexibly interact with each other  What Value does Schema Registry Provide? – Central Metadata Repository • Provide reusable schema • Define relationship between schemas • Enable generic format conversion, and generic routing – Operational Efficiency • To avoid attaching schema to every piece of data • Producers and consumers can evolve at different rates  Example Use – Register Schemas for Kafka Topics to be used by consumers of Kafka Topic (e.g: Nifi, StreamLine)
  • 7. Page7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Schema Registry Concepts • Schema Group A logical grouping/container for similar type of schemas or based any criteria that the customer has from managing the schemas • Schema Metadata Metadata associated with a named schema. • Schema Version The actual versioned schema associated a schema meta definition Schema Metadata 1 Schema Name Schema Type Description Compatibility Policy Serializers Deserializers Schema Group Group Name SchemaVersion 3 SchemaVersion 2 Schema Version 1 version text Fingerprint
  • 8. Page8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Sender/Receiver flow Local schema/serdes cache Serializer Producer Schema Registry Client Message Store Local schema/serdes cache Deserializer Schema Registry Client version payloa d version payloa d Schema Storage SerDes Storage Consumer SchemaRegist ry SchemaRegist ry SchemaRegist ry
  • 9. Schema Registry Component Architecture
    - SR Web Server: Schema Registry Web App, REST API
    - Schema Registry Client: Java Client
    - Integrations: NiFi Processors, Kafka Ser/Des, StreamLine
    - Pluggable Storage
        – Schema Storage: MySQL, PostgreSQL, In-Memory
        – Serializer/Deserializer Jar Storage: Local File System, HDFS
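One way to read "Pluggable Storage" is as a storage interface that the server codes against, with one implementation per backend. The following sketch is hypothetical (the interface name and its methods are invented for illustration) and shows only an in-memory backend; MySQL or PostgreSQL backends would implement the same interface.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical storage SPI; the registry's real storage interfaces differ.
interface SchemaStorage {
    void put(String schemaName, int version, String schemaText);
    Optional<String> get(String schemaName, int version);
}

// In-memory backend, useful for tests and demos.
final class InMemorySchemaStorage implements SchemaStorage {
    private final Map<String, String> rows = new ConcurrentHashMap<>();

    private static String key(String schemaName, int version) {
        return schemaName + "#" + version;
    }

    @Override
    public void put(String schemaName, int version, String schemaText) {
        rows.put(key(schemaName, version), schemaText);
    }

    @Override
    public Optional<String> get(String schemaName, int version) {
        return Optional.ofNullable(rows.get(key(schemaName, version)));
    }
}
```

Because the web app depends only on the interface, switching backends becomes a configuration choice rather than a code change.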
  • 10. Schema Compatibility Policies
    - What is a Compatibility Policy?
        – Defines the rules for how a schema can evolve
        – Subsequent versions have to honor the compatibility policy set for the schema
    - Policies supported
        – Backward
        – Forward
        – Both
        – None
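As a rough illustration of what the policies mean, the sketch below models a schema as a map from field name to "has a default value" and applies Avro-style resolution: a reader can decode data written with another schema as long as every field the reader expects either exists in the writer's schema or has a default. This is a deliberate simplification (field types, renames, and nested records are ignored) and is not the registry's actual compatibility checker.

```java
import java.util.Map;

enum CompatibilityPolicy { BACKWARD, FORWARD, BOTH, NONE }

final class CompatibilityChecker {
    // Simplified schema model (hypothetical): field name -> "has a default value".

    // Can a reader using `reader` decode data written with `writer`?
    static boolean canRead(Map<String, Boolean> reader, Map<String, Boolean> writer) {
        for (Map.Entry<String, Boolean> field : reader.entrySet()) {
            boolean missingInWriter = !writer.containsKey(field.getKey());
            if (missingInWriter && !field.getValue()) {
                return false; // reader needs a value the data lacks and has no default
            }
        }
        return true; // fields only the writer has are simply skipped
    }

    static boolean isAllowed(CompatibilityPolicy policy,
                             Map<String, Boolean> oldSchema,
                             Map<String, Boolean> newSchema) {
        switch (policy) {
            case BACKWARD: return canRead(newSchema, oldSchema); // new readers, old data
            case FORWARD:  return canRead(oldSchema, newSchema); // old readers, new data
            case BOTH:     return canRead(newSchema, oldSchema) && canRead(oldSchema, newSchema);
            case NONE:     return true;
            default:       throw new IllegalArgumentException("Unknown policy: " + policy);
        }
    }
}
```

For example, adding a field with a default passes all policies, while adding a required field without a default breaks BACKWARD compatibility, because new consumers could not read older messages.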
  • 11. Schema evolution
    [Diagram: producers and consumers running different schema versions side by side (Producer v1, v2, v4; Consumer v2, v5, v7).]
  • 12. Serializers/Deserializers
    - Snapshot-based serializer/deserializer
        – Serializes the complete payload
        – Deserializes the payload into the respective type
    - Pull-based serializer/deserializer
        – Serializes only the required elements and ignores the others
        – Pulls out only the elements required to build the desired object
    - Push-based deserializer
        – Gives callbacks that receive parsing events for the respective fields in the schema
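The difference between the three deserialization styles can be shown on a toy encoding. The sketch below assumes a made-up text format, `name=value;name=value`, rather than Avro; only the access patterns matter: snapshot materializes every field, pull extracts just the requested fields, and push hands each field to a caller-supplied callback.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;

// Toy deserializers over a hypothetical "name=value;name=value" encoding.
final class ToyDeserializers {
    // Snapshot: materialize the complete record.
    static Map<String, String> snapshot(String payload) {
        Map<String, String> record = new HashMap<>();
        push(payload, record::put);
        return record;
    }

    // Pull: extract only the requested fields and stop once all are found.
    static Map<String, String> pull(String payload, Set<String> wanted) {
        Map<String, String> out = new HashMap<>();
        for (String pair : payload.split(";")) {
            if (out.size() == wanted.size()) {
                break; // everything requested has been found
            }
            int eq = pair.indexOf('=');
            String name = pair.substring(0, eq);
            if (wanted.contains(name)) {
                out.put(name, pair.substring(eq + 1));
            }
        }
        return out;
    }

    // Push: emit one parsing event per field to the callback.
    static void push(String payload, BiConsumer<String, String> onField) {
        for (String pair : payload.split(";")) {
            int eq = pair.indexOf('=');
            onField.accept(pair.substring(0, eq), pair.substring(eq + 1));
        }
    }
}
```

Pull and push avoid building objects for fields nobody reads, which matters for wide records where a consumer only needs a few columns.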
  • 13. Schema registry client
    - REST-based client
    - Caching
        – Metadata
        – Schema versions
        – Ser/des libraries and class loaders
    - URL selectors
        – Round robin
        – Failover
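A minimal sketch of the two URL selection strategies, assuming the client is configured with a list of registry URLs. The `UrlSelector` interface and its method names are hypothetical, not the registry client's API.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical selector interface.
interface UrlSelector {
    String select();                 // URL to use for the next request
    void reportFailure(String url);  // called when a request to `url` failed
}

// Round robin: spread requests evenly across all URLs.
final class RoundRobinUrlSelector implements UrlSelector {
    private final List<String> urls;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinUrlSelector(List<String> urls) {
        this.urls = List.copyOf(urls);
    }

    @Override
    public String select() {
        // floorMod keeps the index valid even after the counter overflows
        return urls.get(Math.floorMod(next.getAndIncrement(), urls.size()));
    }

    @Override
    public void reportFailure(String url) {
        // nothing to do: the next call moves on to another URL anyway
    }
}

// Failover: stick to the current URL until it fails, then move to the next one.
final class FailoverUrlSelector implements UrlSelector {
    private final List<String> urls;
    private int current = 0;

    FailoverUrlSelector(List<String> urls) {
        this.urls = List.copyOf(urls);
    }

    @Override
    public synchronized String select() {
        return urls.get(current);
    }

    @Override
    public synchronized void reportFailure(String url) {
        if (urls.get(current).equals(url)) {
            current = (current + 1) % urls.size();
        }
    }
}
```

Round robin suits read-heavy clients talking to equivalent instances, while failover keeps traffic on one instance (for example the master) and only switches when it becomes unreachable.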
  • 14. HA
    - Storage provider
        – Depends on the transactional support of the underlying SQL store
        – Spin up as many schema registry instances as required
    - Supports HA at the SchemaRegistry level
        – Using ZooKeeper/Curator
        – Automatic failover of the master
        – The master receives all writes
        – Slaves receive only reads
    [Diagrams: several SchemaRegistry instances in front of shared storage.]
  • 15. Integration of Schema Registry
    - Kafka
        – Serializer/deserializer used through the producer/consumer API
    - NiFi processors for Schema Registry
        – Fetch schema
        – Serialize/deserialize with schema
    - StreamLine processors for Schema Registry
        – Look up the schema of a Kafka, Kinesis, or Event Hubs topic
        – Look up the schema of an HDFS directory
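For the Kafka integration, the registry-aware serializer is configured through ordinary Kafka producer properties. The sketch below only assembles a `Properties` object. The serializer class name and the `schema.registry.url` key are assumptions based on the open source registry project and should be verified against the release in use; the host names and port are placeholders.

```java
import java.util.Properties;

final class RegistryProducerConfig {
    static Properties build(String bootstrapServers, String registryUrl) {
        Properties props = new Properties();
        // Standard Kafka producer settings.
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Assumption: class name of the registry's Avro serializer for Kafka; verify for your release.
        props.put("value.serializer",
                "com.hortonworks.registries.schemaregistry.serdes.avro.kafka.KafkaAvroSerializer");
        // Assumption: property the registry serdes read to locate the registry.
        props.put("schema.registry.url", registryUrl);
        return props;
    }
}
```

The resulting properties would then be passed to `new KafkaProducer<>(props)`; consumers would configure the matching deserializer in the same way.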
  • 16. Schema Registry UI
  • 17. WIP/Future enhancements
    - Security
        – Kerberos support
        – Default authorizers and Apache Ranger support
    - Audit of schemas and clients
    - Rich types in schema definitions
    - Pluggable listeners
    - Schema policies
    - Notifications
        – New versions
        – Archiving
    - Converters
  • 18. Try it out!
    - It's open source under the Apache License
    - https://github.com/hortonworks/registry
    - Apache incubation soon
    - Registry 0.2 release on April 25th, 0.3 release on May 31st
    - https://groups.google.com/forum/#!forum/registry
    - We are seeing outside contributions
    - Contributions are welcome!
  • 19. Streaming Analytics Manager
  • 20. Streaming Analytics Manager
    - What is it?
        • A platform to design, develop, deploy, and manage streaming analytics applications in minutes using a drag-and-drop visual paradigm
        • Supports event correlation, context enrichment, complex pattern matching, analytical aggregations, and alerts/notifications when insights are discovered
        • Agnostic to the underlying streaming engine; it can support multiple streaming substrates (e.g., Storm, Spark Streaming, Flink)
        • Extensibility is a first-class citizen (add sources, processors, and sinks as needed)
    - Guiding principle
        – Build streaming applications easily while focusing on business logic
  • 21. Complexities in building streaming applications
    - New streaming engines and APIs
    - Implementing windows, joins, and state management is hard
    - Adding the user's business logic to the application
    - Interacting with external services such as HBase, Hive, HDFS, etc.
    - Deploying with all the necessary configuration files
    - Operating the streaming application, including monitoring and metrics
    - Debugging streaming applications
  • 22. Key challenges that SAM is trying to solve
    - Building streaming applications requires specialized skill sets that most enterprise organizations don’t have today
    - Streaming applications require a considerable amount of programming, testing, and tuning before deployment to production, which takes significant time
    - Key streaming primitives such as joining/splitting streams, aggregations over a window of time, and pattern matching are difficult to implement
    - People prefer not to write code to build complex streaming applications
    - No open source project today truly solves all of the above challenges
    - People care less about which streaming engine powers their applications than about having the challenges above addressed without being forced into vendor lock-in
  • 23. Streaming Analytics Manager Components and User Personas
    [Diagram: user personas (App Developer, Business Analyst, Operations) on top of a distributed streaming computation engine, i.e., the different streaming engines that power the higher-level services used to build stream applications.]
  • 24. SAM’s Value Proposition
    - A platform with a graphical programming paradigm that lets users focus on business logic and easily build and deploy complex streaming applications
    - Makes it easier for users to import configurations of other services and use them in streaming applications
    - Provides an abstraction over the streaming engine, making it possible to plug in open source streaming engines (Storm, Spark, Flink, etc.)
    - Decouples schemas from the streaming application via integration with Schema Registry
    - Provides operational metrics to monitor streaming applications via pluggable metrics storage (e.g., Ambari, OpenTSDB)
    - Streaming Insights: visualize the data being processed by the streaming application
  • 25. SAM’s Key Capabilities
    - Building streaming apps using the following primitives
        – Connecting to streams
        – Joining streams
        – Forking streams
        – Aggregations over windows
        – Stream analytics: descriptive, predictive, prescriptive
        – Rules engine
        – Transformations
        – Filtering and routing
        – Notifications/alerts
    - Deploying streaming apps
        – Deploying the streaming app on a supported streaming engine
        – Monitoring the streaming app with metrics
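To show what "aggregations over windows" involves when coded by hand, here is a minimal event-time tumbling window that sums values per fixed-size window. It is an illustrative sketch, not SAM's window processor; a production engine would also need to handle late events, per-key state, and eviction of closed windows.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Minimal event-time tumbling window that sums values per window.
final class TumblingWindowSum {
    private final long windowMillis;
    // window start time -> running sum, ordered by time
    private final TreeMap<Long, Double> sums = new TreeMap<>();

    TumblingWindowSum(long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("window size must be positive");
        }
        this.windowMillis = windowMillis;
    }

    void add(long timestampMillis, double value) {
        // Align the timestamp down to the start of its window.
        long windowStart = timestampMillis - Math.floorMod(timestampMillis, windowMillis);
        sums.merge(windowStart, value, Double::sum);
    }

    Map<Long, Double> results() {
        return Collections.unmodifiableMap(sums);
    }
}
```

With a 1000 ms window, events at 100 ms and 999 ms land in the window starting at 0, while an event at 1000 ms opens the next window.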
  • 26. Typical Streaming Application Workflow
    [Diagram: Kafka → P1 → W1 → HBase]
  • 27. SAM’s Service Pools and Environments
    • Service Pool: a pool of services that can be used to create different environments
    • Environment: a set of services you choose from one or more service pools
    • Stream App: the environment is then associated with a stream application, which uses the services in that environment for its configuration
  • 30. SAM’s Components
    - Builder components
        – Sources (all integrated with Schema Registry): Kafka Source, Event Hub, HDFS
        – Processors: Join, Window/Aggregate, Rule, Normalization/Projection, Branch, PMML, Custom
        – Sinks: Notification/Alerts (email support), HDFS, HBase, Hive, JDBC, Druid, Cassandra, Kafka, OpenTSDB, Solr
  • 34. Streaming Analytics powered by Druid and Superset
    - What is Stream Insight?
        – Gives business analysts a tool for descriptive analytics of streaming data and perishable insights, using the sophisticated UI provided by Superset
        – Tooling to create time-series and real-time analytics dashboards, charts, and graphs, as well as rich, customizable data visualizations
  • 36. Demo
  • 37. Extensibility with SAM SDK
    - Custom Processor
        – Allows users to write their own business logic

```java
import java.util.List;
import java.util.Map;

// StreamlineEvent, Result and ProcessingException are SAM SDK types.

/**
 * Interface for processors to implement for processing messages at runtime
 */
public interface ProcessorRuntime {
    /**
     * Process the {@link StreamlineEvent} and throw a {@link ProcessingException} if an error arises during processing
     *
     * @param event the event to be processed
     * @return the results produced for this event
     * @throws ProcessingException if an error arises during processing
     */
    List<Result> process(StreamlineEvent event) throws ProcessingException;

    /**
     * Initialize any necessary resources needed for the implementation
     *
     * @param config configuration for this processor
     */
    void initialize(Map<String, Object> config);

    /**
     * Clean up any necessary resources needed for the implementation
     */
    void cleanup();
}
```
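A minimal custom processor might look like the following. To keep the block self-contained, it re-declares the slide's `ProcessorRuntime` interface and adds simplified stand-ins for `StreamlineEvent`, `Result`, and `ProcessingException`. These stand-ins, the `field` configuration key, and the `default` output stream name are hypothetical; the real SDK types differ.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// --- Simplified stand-ins for SDK types (hypothetical) ---
class StreamlineEvent {
    final Map<String, Object> fields;
    StreamlineEvent(Map<String, Object> fields) {
        this.fields = Collections.unmodifiableMap(new HashMap<>(fields));
    }
}

class Result {
    final String outputStream;
    final List<StreamlineEvent> events;
    Result(String outputStream, List<StreamlineEvent> events) {
        this.outputStream = outputStream;
        this.events = events;
    }
}

class ProcessingException extends Exception {
    ProcessingException(String message) { super(message); }
}

interface ProcessorRuntime {
    List<Result> process(StreamlineEvent event) throws ProcessingException;
    void initialize(Map<String, Object> config);
    void cleanup();
}

// Example custom processor: upper-cases one configured string field.
class UpperCaseFieldProcessor implements ProcessorRuntime {
    private String fieldName;

    @Override
    public void initialize(Map<String, Object> config) {
        // "field" is a hypothetical config key chosen for this example
        fieldName = String.valueOf(config.getOrDefault("field", "name"));
    }

    @Override
    public List<Result> process(StreamlineEvent event) throws ProcessingException {
        Object value = event.fields.get(fieldName);
        if (!(value instanceof String)) {
            throw new ProcessingException("Field '" + fieldName + "' is missing or not a string");
        }
        Map<String, Object> updated = new HashMap<>(event.fields);
        updated.put(fieldName, ((String) value).toUpperCase(Locale.ROOT));
        return List.of(new Result("default", List.of(new StreamlineEvent(updated))));
    }

    @Override
    public void cleanup() {
        // nothing to release in this example
    }
}
```

Events are treated as immutable here, so the processor emits a modified copy rather than mutating its input.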
  • 38. Extensibility with SAM SDK
    - Window UDF
        – Custom UDFs to process window data

```java
/**
 * This is an interface for implementing user defined functions for a single argument.
 *
 * @param <O> type of the result
 * @param <I> type of the input argument
 */
public interface UDF<O, I> {
    // Applies the function to a single input value.
    O evaluate(I i);
}
```

    - Built-in functions: STDDEV, STDDEVP, VARIANCE, VARIANCEP, MEAN, MIN, MAX, SUM, COUNT, UPPER, LOWER, INITCAP, SUBSTRING, CHAR_LENGTH, CONCAT
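A custom UDF only has to implement `evaluate`. The example below is hypothetical (a Celsius-to-Fahrenheit conversion is not among the built-ins listed above); it re-declares the slide's `UDF` interface so that it compiles on its own.

```java
// UDF interface as shown on the slide.
interface UDF<O, I> {
    O evaluate(I i);
}

// Hypothetical custom UDF: converts a Celsius reading to Fahrenheit.
final class CelsiusToFahrenheit implements UDF<Double, Double> {
    @Override
    public Double evaluate(Double celsius) {
        if (celsius == null) {
            return null; // propagate missing readings instead of failing
        }
        return celsius * 9.0 / 5.0 + 32.0;
    }
}
```

Returning `null` for a missing input is a design choice for this sketch; an implementation could equally reject such events.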
  • 39. Extensibility with SAM SDK
    - Notification Sink
        – Interface for sending notifications such as email and SMS, or for more complex integrations that invoke external APIs

```java
import java.util.List;
import java.util.Map;

// NotificationContext is a SAM SDK type (not shown on the slide).
// Each public interface lives in its own source file.
public interface Notifier {
    void open(NotificationContext ctx);
    void notify(Notification notification);
    void close();
    boolean isPull();
    List<String> getFields();
    NotificationContext getContext();
}

public interface Notification {
    enum Status { NEW, DELIVERED, FAILED }

    String getId();
    List<String> getEventIds();
    List<String> getDataSourceIds();
    String getRuleId();
    Status getStatus();
    Map<String, Object> getFieldsAndValues();
    String getNotifierName();
    long getTs();
}
```
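As a sketch of a custom notification sink, the notifier below renders each notification as a line of text instead of sending an email or SMS. `NotificationContext` is not shown on the slide, so the single-method version here, its `prefix` configuration key, and the behavior assumed for `isPull()` and `getFields()` are hypothetical. The block re-declares the slide's interfaces so it compiles on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: the slide does not show NotificationContext's methods.
interface NotificationContext {
    Map<String, String> getConfig();
}

// Notification and Notifier as shown on the slide.
interface Notification {
    enum Status { NEW, DELIVERED, FAILED }
    String getId();
    List<String> getEventIds();
    List<String> getDataSourceIds();
    String getRuleId();
    Status getStatus();
    Map<String, Object> getFieldsAndValues();
    String getNotifierName();
    long getTs();
}

interface Notifier {
    void open(NotificationContext ctx);
    void notify(Notification notification);
    void close();
    boolean isPull();
    List<String> getFields();
    NotificationContext getContext();
}

// Minimal Notification used to drive the example.
final class SimpleNotification implements Notification {
    private final String id;
    private final String ruleId;
    private final Map<String, Object> fields;
    private final long ts = System.currentTimeMillis();

    SimpleNotification(String id, String ruleId, Map<String, Object> fields) {
        this.id = id;
        this.ruleId = ruleId;
        this.fields = fields;
    }

    public String getId() { return id; }
    public List<String> getEventIds() { return List.of(); }
    public List<String> getDataSourceIds() { return List.of(); }
    public String getRuleId() { return ruleId; }
    public Status getStatus() { return Status.NEW; }
    public Map<String, Object> getFieldsAndValues() { return fields; }
    public String getNotifierName() { return "text-line"; }
    public long getTs() { return ts; }
}

// Hypothetical notifier: renders each notification as one line of text,
// standing in for an email or SMS gateway.
final class TextLineNotifier implements Notifier {
    private NotificationContext ctx;
    private final List<String> sent = new ArrayList<>();

    @Override public void open(NotificationContext ctx) { this.ctx = ctx; }

    @Override public void notify(Notification n) {
        // "prefix" is a made-up config key for this sketch
        String prefix = ctx.getConfig().getOrDefault("prefix", "ALERT");
        sent.add(prefix + " rule=" + n.getRuleId() + " fields=" + n.getFieldsAndValues());
    }

    @Override public void close() { /* a real notifier would release connections here */ }
    @Override public boolean isPull() { return false; } // assumption: this sketch has no pull mode
    @Override public List<String> getFields() { return List.of(); } // assumption: no required fields
    @Override public NotificationContext getContext() { return ctx; }

    List<String> sentMessages() { return Collections.unmodifiableList(sent); }
}
```

Swapping the `sent.add(...)` call for an SMTP or HTTP client call is where a real email, SMS, or external-API notifier would differ.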
  • 40. What’s Next?
    - Manual service pool registration that does not require Ambari
    - Test sources and sinks to easily test the functionality of a streaming app
    - Authentication and authorization
    - Other components (sources such as Kinesis, processors, and sinks)
  • 41. Try it out!
    - It's open source under the Apache License
    - https://github.com/hortonworks/streamline
    - Apache incubation soon
    - SAM 0.4 is out!
    - https://groups.google.com/forum/#!forum/streamline-users
    - Contributions are welcome!
  • 42. Follow-up questions
    - JP Player, Principal Solutions Engineer, jplayer@hortonworks.com, 650.773.3313
    - Sam Hjelmfelt, Resident Architect, shjelmfelt@hortonworks.com, 605.393.7244
    - Kristine Hannigan, Enterprise Account Manager, khannigan@hortonworks.com, 415.323.8819