
Schema Registry & Stream Analytics Manager


Registry is a central metadata repository that allows users to collaboratively use Schema definitions for stream processing.

Stream Analytics Manager provides a framework for building streaming applications faster and more easily.



  1. Streaming Analytics Manager (SAM) & Registry (© Hortonworks Inc. 2011 – 2016. All Rights Reserved)
  2. Agenda
     • Registry
     • Streaming Analytics Manager (SAM)
     • Demo
     • Questions
  3. History of Streaming at Hortonworks
     • Introduced Storm as the stream processing engine in HDP-2.1 (late 2013)
     • First to ship Apache Kafka as an enterprise messaging queue (early 2014)
     • Added several improvements and features to Apache Storm
     • Added security and other critical features and improvements to Apache Kafka
     • Learned a great deal from shipping Storm and Kafka over the past 3 years
     • The vision and implementation of Registry and Streaming Analytics Manager are based on what was learned from shipping Storm and Kafka
  4. Registry
  5. Registry
     • Foundational service that enables multiple use cases, including streaming, machine learning, service discovery, and application templates
     • Offers base frameworks for developing Schema Registry, ML Registry, etc.
     • Registry modules such as Schema Registry and ML Registry build their own entities on top of a versioned entity
     • Modular approach to running registry services: users can choose which registry services to enable
     • Schema Registry and ML Registry are the modules available so far
  6. What is Schema Registry? What Value Does it Provide?
     • What is Schema Registry?
       – A shared repository of schemas that allows applications to interact with each other flexibly
     • What value does Schema Registry provide?
       – Central metadata repository
         · Provides reusable schemas
         · Defines relationships between schemas
         · Enables generic format conversion and generic routing
       – Operational efficiency
         · Avoids attaching the schema to every piece of data
         · Producers and consumers can evolve at different rates
     • Example use
       – Register schemas for Kafka topics, to be used by the consumers of those topics (e.g. NiFi, StreamLine)
  7. Schema Registry Concepts
     • Schema Group: a logical grouping or container for schemas of a similar type, or for schemas grouped by any other criteria the customer uses to manage them
     • Schema Metadata: the metadata associated with a named schema
     • Schema Version: the actual versioned schema associated with a schema metadata definition
     [Diagram: a Schema Group (group name) contains Schema Metadata (schema name, schema type, description, compatibility policy, serializers, deserializers); each Schema Metadata has Schema Versions 1, 2, 3 (version, text, fingerprint)]
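
The three concepts on this slide can be pictured as a small data model. The following is a minimal Java sketch, not the Registry's actual classes: the class names, the field types, and the use of an MD5 hash of the schema text as the fingerprint are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

public class SchemaModelSketch {

    /** One immutable schema version: version number, schema text, fingerprint. */
    static final class SchemaVersion {
        final int version;
        final String text;
        final String fingerprint;

        SchemaVersion(int version, String text) {
            this.version = version;
            this.text = text;
            this.fingerprint = fingerprintOf(text);
        }
    }

    /** Metadata of a named schema, plus its versions in registration order. */
    static final class SchemaMetadata {
        final String name;
        final String type;          // e.g. "avro"
        final String group;         // name of the schema group
        final String compatibility; // e.g. "BACKWARD"
        final List<SchemaVersion> versions = new ArrayList<>();

        SchemaMetadata(String name, String type, String group, String compatibility) {
            this.name = name;
            this.type = type;
            this.group = group;
            this.compatibility = compatibility;
        }

        /** Adds schema text as a new version; identical text returns the existing version. */
        SchemaVersion addVersion(String text) {
            String fp = fingerprintOf(text);
            for (SchemaVersion v : versions) {
                if (v.fingerprint.equals(fp)) {
                    return v;
                }
            }
            SchemaVersion created = new SchemaVersion(versions.size() + 1, text);
            versions.add(created);
            return created;
        }
    }

    /** Hex-encoded MD5 of the schema text (the hash choice is an assumption). */
    static String fingerprintOf(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(text.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SchemaMetadata meta = new SchemaMetadata("truck-events", "avro", "kafka", "BACKWARD");
        SchemaVersion first = meta.addVersion("{\"type\":\"string\"}");
        SchemaVersion second = meta.addVersion("{\"type\":\"long\"}");
        System.out.println(first.version + " " + second.version + " " + second.fingerprint);
    }
}
```

Using the fingerprint to detect already-registered schema text is one plausible reason to store it per version; the slide itself only lists the field.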
  8. Sender/Receiver flow
     [Diagram: on the sending side, a Producer uses a Serializer and a Schema Registry Client with a local schema/serdes cache; on the receiving side, a Consumer uses a Deserializer and a Schema Registry Client with its own local schema/serdes cache. Messages in the Message Store carry a version and a payload. The clients talk to the SchemaRegistry instances, which are backed by Schema Storage and SerDes Storage.]
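
The "version + payload" message layout in this flow can be sketched in a few lines of Java. This is an illustration under stated assumptions, not the Registry's wire format: the 4-byte version prefix, the in-memory map standing in for the registry, and all names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class VersionedPayloadSketch {
    /** Stand-in for the remote registry: schema version -> schema text. */
    final Map<Integer, String> registry = new HashMap<>();
    /** Local cache on the receiving side, so each version is fetched only once. */
    final Map<Integer, String> localCache = new HashMap<>();
    int registryLookups = 0;

    /** Sender side: prefix the payload with the schema version it was written with. */
    static byte[] serialize(int schemaVersion, String payload) {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + body.length).putInt(schemaVersion).put(body).array();
    }

    /** Receiver side: read the version, resolve the schema, then decode the payload. */
    String[] deserialize(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        int version = buf.getInt();
        byte[] body = new byte[buf.remaining()];
        buf.get(body);
        String schema = localCache.computeIfAbsent(version, this::fetchFromRegistry);
        return new String[] { schema, new String(body, StandardCharsets.UTF_8) };
    }

    private String fetchFromRegistry(Integer version) {
        registryLookups++;
        String schema = registry.get(version);
        if (schema == null) {
            throw new IllegalArgumentException("unknown schema version " + version);
        }
        return schema;
    }

    public static void main(String[] args) {
        VersionedPayloadSketch receiver = new VersionedPayloadSketch();
        receiver.registry.put(1, "{\"type\":\"string\"}");
        byte[] message = serialize(1, "hello");
        String[] decoded = receiver.deserialize(message);
        System.out.println(decoded[0] + " -> " + decoded[1]);
    }
}
```

Because the message carries only a version identifier, the schema itself never travels with the data, which is the operational-efficiency point made earlier in the deck.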
  9. Schema Registry Component Architecture
     [Diagram:
     • SR Web Server: Schema Registry Web App, REST API
     • Schema Registry Client: Java client
     • Integrations: NiFi processors, Kafka Ser/Des, StreamLine
     • Schema Storage (pluggable storage): MySQL, Postgres, In-Memory
     • Serializer/Deserializer Jar Storage: Local File System, HDFS]
  10. Schema Compatibility Policies
     • What is a compatibility policy?
       – Defines the rules for how schemas can evolve
       – Subsequent version updates have to honor the schema's original compatibility policy
     • Policies supported
       – Backward
       – Forward
       – Both
       – None
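
To make the four policies concrete, here is a deliberately simplified Java sketch. It assumes a schema is just a map from field name to "has a default value" and ignores field types entirely; the real compatibility checks for a format such as Avro are richer. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class CompatibilitySketch {
    enum Policy { BACKWARD, FORWARD, BOTH, NONE }

    /**
     * True if a reader using readerSchema can read data written with writerSchema:
     * every reader field must either exist in the writer schema or have a default.
     */
    static boolean canRead(Map<String, Boolean> readerSchema, Map<String, Boolean> writerSchema) {
        for (Map.Entry<String, Boolean> field : readerSchema.entrySet()) {
            boolean missingInWriter = !writerSchema.containsKey(field.getKey());
            boolean hasDefault = field.getValue();
            if (missingInWriter && !hasDefault) {
                return false;
            }
        }
        return true;
    }

    /** Checks whether newSchema may follow oldSchema under the given policy. */
    static boolean isCompatible(Policy policy, Map<String, Boolean> oldSchema, Map<String, Boolean> newSchema) {
        switch (policy) {
            case BACKWARD: // new readers must still read old data
                return canRead(newSchema, oldSchema);
            case FORWARD:  // old readers must be able to read new data
                return canRead(oldSchema, newSchema);
            case BOTH:
                return canRead(newSchema, oldSchema) && canRead(oldSchema, newSchema);
            default:       // NONE: any change is accepted
                return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Boolean> v1 = new HashMap<>();
        v1.put("id", false);
        Map<String, Boolean> v2 = new HashMap<>(v1);
        v2.put("speed", false); // new required field without a default
        System.out.println("BACKWARD ok? " + isCompatible(Policy.BACKWARD, v1, v2));
        System.out.println("FORWARD ok?  " + isCompatible(Policy.FORWARD, v1, v2));
    }
}
```

In this model, adding a field without a default breaks backward compatibility, while removing such a field breaks forward compatibility.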
  11. Schema evolution
     [Diagram: producers and consumers running different schema versions side by side: Producer v2, Consumer v2, Producer v1, Producer v4, Consumer v5, Producer v1, Consumer v7]
  12. Serializers/Deserializers
     • Snapshot-based serializer/deserializer
       – Serializes the complete payload
       – Deserializes the payload to the respective type
     • Pull-based serializer/deserializer
       – Serializes only the elements that are required and ignores the others
       – Pulls out only the elements required to build the desired object
     • Push-based deserializer
       – Provides callbacks to receive parsing events for the respective fields in the schema
  13. Schema registry client
     • REST-based client
     • Caching
       – Metadata
       – Schema versions
       – Ser/des libs and class loaders
     • URL selectors
       – Round robin
       – Failover
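
The two URL selection strategies named on this slide can be sketched as follows. This is a minimal illustration, not the client's actual API: the `UrlSelector` interface shape, the `reportFailure` method, and the example URLs are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class UrlSelectorSketch {
    /** Hypothetical selector contract: pick a registry URL, and learn about failures. */
    interface UrlSelector {
        String select();
        void reportFailure(String url);
    }

    /** Round robin: every call returns the next URL in the list. */
    static final class RoundRobinSelector implements UrlSelector {
        private final List<String> urls;
        private final AtomicInteger next = new AtomicInteger();

        RoundRobinSelector(List<String> urls) { this.urls = urls; }

        @Override public String select() {
            return urls.get(Math.floorMod(next.getAndIncrement(), urls.size()));
        }

        @Override public void reportFailure(String url) {
            // Nothing to do: the next call moves on to another URL anyway.
        }
    }

    /** Failover: keep using the current URL until a failure is reported for it. */
    static final class FailoverSelector implements UrlSelector {
        private final List<String> urls;
        private final AtomicInteger current = new AtomicInteger();

        FailoverSelector(List<String> urls) { this.urls = urls; }

        @Override public String select() {
            return urls.get(Math.floorMod(current.get(), urls.size()));
        }

        @Override public void reportFailure(String url) {
            int index = current.get();
            // Advance only if the failing URL is the one currently in use.
            if (urls.get(Math.floorMod(index, urls.size())).equals(url)) {
                current.compareAndSet(index, index + 1);
            }
        }
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("http://sr1.example:9090", "http://sr2.example:9090");
        UrlSelector failover = new FailoverSelector(urls);
        System.out.println(failover.select());
        failover.reportFailure(urls.get(0));
        System.out.println(failover.select());
    }
}
```

Round robin spreads load across instances, while failover keeps requests on one instance and switches only when that instance is reported as failing.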
  14. HA
     • Storage provider
       – Depends on the transactional support of the underlying SQL stores
       – Spin up as many Schema Registry instances as required
     • Supports HA at the SchemaRegistry level
       – Uses ZK/Curator
       – Automatic failover of the master
       – The master gets all writes
       – Slaves receive only reads
     [Diagrams: multiple SchemaRegistry instances on top of a shared storage]
  15. Integration of Schema Registry
     • Kafka
       – Uses the producer/consumer API for the serializer/deserializer
     • NiFi processors for Schema Registry
       – Fetch schema
       – Serialize/deserialize with schema
     • StreamLine processors for Schema Registry
       – Look up the schema of a Kafka, Kinesis, or EventHubs topic
       – Look up the schema of an HDFS directory
  16. Schema Registry UI
  17. WIP/Future enhancements
     • Security
       – Kerberos support
       – Default authorizers and Apache Ranger support
     • Audit of schemas and clients
     • Rich types in schema definitions
     • Pluggable listeners
     • Schema policies
     • Notifications
       – New versions
       – Archiving
     • Converters
  18. Try it out!
     • It's open source under the Apache License
     • https://github.com/hortonworks/registry
     • Apache incubation soon
     • Registry 0.2 release on April 25th, 0.3 release on May 31st
     • https://groups.google.com/forum/#!forum/registry
     • We are seeing outside contributions
     • Contributions are welcome!
  19. Streaming Analytics Manager
  20. Streaming Analytics Manager
     • What is it?
       – A platform for designing, developing, deploying, and managing streaming analytics applications in minutes, using a drag-and-drop visual paradigm
       – Supports event correlation, context enrichment, complex pattern matching, analytical aggregations, and alerts/notifications when insights are discovered
       – Agnostic to the underlying streaming engine; can support multiple streaming substrates (e.g. Storm, Spark Streaming, Flink)
       – Extensibility is a first-class citizen (add sinks, processors, and sources as needed)
     • Guiding principle
       – Build streaming applications easily while focusing on business logic
  21. Complexities in building streaming applications
     • New streaming engines and APIs
     • Implementing windows, joins, and state management is hard
     • Adding the user's business logic to the application
     • Interaction with external services such as HBase, Hive, HDFS, etc.
     • Deploying with all the necessary configuration files
     • Operations around the streaming application, including monitoring and metrics
     • Debugging the streaming application
  22. Key challenges that SAM is trying to solve
     • Building streaming applications requires specialized skill sets that most enterprise organizations don't have today
     • Streaming applications require a considerable amount of programming, testing, and tuning before they can be deployed to production, which takes a significant amount of time
     • Key streaming primitives such as joining/splitting streams, aggregations over a window of time, and pattern matching are difficult to implement
     • People prefer not to write code to build complex streaming applications
     • No true open source project today solves all of the above challenges
     • People care less about which streaming engine powers their streaming applications, as long as the challenges above are addressed and they are not forced into vendor lock-in
  23. Streaming Analytics Manager Components and User Personas
     • User personas: App Developer, Business Analyst, Operations
     • Distributed streaming computation engine: the different streaming engines that power the higher-level services used to build streaming applications
  24. SAM's Value Proposition
     • A platform with a graphical programming paradigm that lets users focus on business logic and easily build and deploy complex streaming applications
     • Makes it easier for users to import other service configurations and use them in streaming applications
     • Provides an abstraction over the streaming engine used, which makes it possible to plug in open source streaming engines (Storm, Spark, Flink, etc.)
     • Decouples the schema from the streaming application via integration with Schema Registry
     • Provides operational metrics for monitoring streaming applications via pluggable metrics storage, e.g. Ambari, OpenTSDB
     • Streaming Insights: visualize the data being processed by the streaming application
  25. SAM's Key Capabilities
     • Building streaming apps using the following primitives
       – Connecting to streams
       – Joining streams
       – Forking streams
       – Aggregations over windows
       – Stream analytics: descriptive, predictive, prescriptive
       – Rules engine
       – Transformations
       – Filtering and routing
       – Notifications/alerts
     • Deploying streaming apps
       – Deploying the streaming app on a supported streaming engine
       – Monitoring the streaming app with metrics
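
As an illustration of one primitive from this list, "aggregations over windows", here is a minimal Java sketch of a tumbling-window average. It is not SAM code: it runs over an in-memory array rather than a stream, assumes non-negative millisecond timestamps, and all names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

public class TumblingWindowSketch {

    /**
     * Groups events into fixed, non-overlapping windows of windowMs milliseconds
     * and returns the average value per window, keyed by window start time.
     */
    static Map<Long, Double> averagePerWindow(long[] timestampsMs, double[] values, long windowMs) {
        // windowStart -> {sum, count}
        Map<Long, double[]> accumulators = new TreeMap<>();
        for (int i = 0; i < timestampsMs.length; i++) {
            long windowStart = timestampsMs[i] - (timestampsMs[i] % windowMs);
            double[] sumAndCount = accumulators.computeIfAbsent(windowStart, k -> new double[2]);
            sumAndCount[0] += values[i];
            sumAndCount[1] += 1;
        }
        Map<Long, Double> averages = new TreeMap<>();
        for (Map.Entry<Long, double[]> entry : accumulators.entrySet()) {
            averages.put(entry.getKey(), entry.getValue()[0] / entry.getValue()[1]);
        }
        return averages;
    }

    public static void main(String[] args) {
        long[] timestamps = { 0, 500, 1000, 1999, 2000 };
        double[] speeds = { 10, 20, 30, 50, 7 };
        System.out.println(averagePerWindow(timestamps, speeds, 1000));
    }
}
```

A streaming engine additionally has to handle late events, state persistence, and window eviction, which is part of what makes this primitive hard to implement by hand.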
  26. Typical Streaming Application Workflow
     [Diagram: Kafka, P1, W1, HBase]
  27. SAM's Service Pools and Environments
     • Service Pool
       – A pool of services that can be used to create different environments
     • Environment
       – Consists of a set of services chosen from one or more service pools
     • Stream App
       – The environment is associated with a Stream Application, which then uses the services in that environment for its configuration
     [Diagram: Stream App 1, Stream App 2]
  30. SAM's Components
     • Builder components
       – Sources (all integrated with Schema Registry)
         · Kafka
         · Event Hub
         · HDFS
       – Processors
         · Join
         · Window/Aggregate
         · Rule
         · Normalization/Projection
         · Branch
         · PMML
         · Custom
       – Sinks
         · Notification/Alerts (email support)
         · HDFS
         · HBase
         · Hive
         · JDBC
         · Druid
         · Cassandra
         · Kafka
         · OpenTSDB
         · Solr
  34. Streaming Analytics powered by Druid and Superset
     • What is Stream Insight?
       – A tool that lets business analysts run descriptive analytics on streaming data and surface perishable insights, using a sophisticated UI provided by Superset
       – Tooling for creating time-series and real-time analytics dashboards, charts, and graphs, and for building rich, customizable visualizations of the data
  36. Demo
  37. Extensibility with SAM SDK
     • Custom Processor
       – Allows users to write their own business logic

         /**
          * Interface for processors to implement for processing messages at runtime
          */
         public interface ProcessorRuntime {
             /**
              * Process the {@link StreamlineEvent} and throw a {@link ProcessingException}
              * if an error arises during processing
              * @param event to be processed
              * @return
              * @throws ProcessingException
              */
             List<Result> process(StreamlineEvent event) throws ProcessingException;

             /**
              * Initialize any necessary resources needed for the implementation
              * @param config
              */
             void initialize(Map<String, Object> config);

             /**
              * Clean up any necessary resources needed for the implementation
              */
             void cleanup();
         }
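
A custom processor built against this interface might look like the sketch below. To keep it self-contained, the interface is redeclared locally and `StreamlineEvent`, `Result`, and `ProcessingException` are replaced by minimal hypothetical stand-ins; the real SDK types are richer and are not reproduced here. The field names `speedMph` and `speedKmh` and the `factor` config key are made up for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CustomProcessorSketch {
    // Hypothetical stand-ins for the SDK types, reduced to what this sketch needs.
    static final class StreamlineEvent {
        final Map<String, Object> fields;
        StreamlineEvent(Map<String, Object> fields) { this.fields = fields; }
    }
    static final class Result {
        final StreamlineEvent event;
        Result(StreamlineEvent event) { this.event = event; }
    }
    static class ProcessingException extends Exception {
        ProcessingException(String message) { super(message); }
    }

    // Interface as shown on the slide, redeclared locally.
    interface ProcessorRuntime {
        List<Result> process(StreamlineEvent event) throws ProcessingException;
        void initialize(Map<String, Object> config);
        void cleanup();
    }

    /** Business logic: emit a copy of the event with an added "speedKmh" field. */
    static final class MphToKmhProcessor implements ProcessorRuntime {
        private double factor = 1.609344;

        @Override public void initialize(Map<String, Object> config) {
            Object configured = config.get("factor");
            if (configured instanceof Number) {
                factor = ((Number) configured).doubleValue();
            }
        }

        @Override public List<Result> process(StreamlineEvent event) throws ProcessingException {
            Object mph = event.fields.get("speedMph");
            if (!(mph instanceof Number)) {
                throw new ProcessingException("speedMph is missing or not numeric");
            }
            Map<String, Object> out = new HashMap<>(event.fields);
            out.put("speedKmh", ((Number) mph).doubleValue() * factor);
            return Collections.singletonList(new Result(new StreamlineEvent(out)));
        }

        @Override public void cleanup() {
            // No resources to release in this sketch.
        }
    }

    /** Runs the processor once through its lifecycle: initialize, process, cleanup. */
    static Map<String, Object> runOnce(Map<String, Object> config, Map<String, Object> fields) {
        ProcessorRuntime processor = new MphToKmhProcessor();
        processor.initialize(config);
        try {
            return processor.process(new StreamlineEvent(fields)).get(0).event.fields;
        } catch (ProcessingException e) {
            throw new IllegalStateException(e);
        } finally {
            processor.cleanup();
        }
    }

    public static void main(String[] args) {
        Map<String, Object> fields = new HashMap<>();
        fields.put("speedMph", 60);
        System.out.println(runOnce(new HashMap<>(), fields));
    }
}
```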
  38. Extensibility with SAM SDK
     • Window UDF
       – Custom UDFs to process window data

         /**
          * This is an interface for implementing user defined functions for a single argument.
          *
          * @param <O> type of the result
          * @param <I> type of the input argument
          */
         public interface UDF<O, I> {
             O evaluate(I i);
         }

     • Built-in functions: STDDEV, STDDEVP, VARIANCE, VARIANCEP, MEAN, MIN, MAX, SUM, COUNT, UPPER, LOWER, INITCAP, SUBSTRING, CHAR_LENGTH, CONCAT
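
A user-defined function for this interface can be very small. The sketch below redeclares the `UDF<O, I>` interface locally so that it compiles on its own; the `ToFahrenheit` function is a made-up example and is not one of SAM's built-ins.

```java
public class UdfSketch {
    /** Single-argument UDF contract as shown on the slide, redeclared locally. */
    interface UDF<O, I> {
        O evaluate(I i);
    }

    /** Hypothetical UDF: converts a Celsius reading to Fahrenheit. */
    static final class ToFahrenheit implements UDF<Double, Number> {
        @Override public Double evaluate(Number celsius) {
            return celsius.doubleValue() * 9.0 / 5.0 + 32.0;
        }
    }

    public static void main(String[] args) {
        UDF<Double, Number> toFahrenheit = new ToFahrenheit();
        System.out.println(toFahrenheit.evaluate(100));
    }
}
```

Note the order of the type parameters: the result type `O` comes first and the input type `I` second.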
  39. Extensibility with SAM SDK
     • Notification Sink
       – Interface for sending notifications such as email and SMS, or for more complex cases that invoke external APIs

         public interface Notifier {
             void open(NotificationContext ctx);
             void notify(Notification notification);
             void close();
             boolean isPull();
             List<String> getFields();
             NotificationContext getContext();
         }

         public interface Notification {
             enum Status { NEW, DELIVERED, FAILED }
             String getId();
             List<String> getEventIds();
             List<String> getDataSourceIds();
             String getRuleId();
             Status getStatus();
             Map<String, Object> getFieldsAndValues();
             String getNotifierName();
             long getTs();
         }
  40. What's Next?
     • Manual service pool registration that does not require Ambari
     • Test sources and sinks for easily testing the functionality of a streaming app
     • Authentication and authorization
     • Other components: sources (Kinesis), processors, and sinks
  41. Try it out!
     • It's open source under the Apache License
     • https://github.com/hortonworks/streamline
     • Apache incubation soon
     • SAM 0.4 is out!
     • https://groups.google.com/forum/#!forum/streamline-users
     • Contributions are welcome!
  42. Follow-up questions
     • JP Player, Principal Solutions Engineer, jplayer@hortonworks.com, 650.773.3313
     • Sam Hjelmfelt, Resident Architect, shjelmfelt@hortonworks.com, 605.393.7244
     • Kristine Hannigan, Enterprise Account Manager, khannigan@hortonworks.com, 415.323.8819
