The Never Landing Stream with HTAP and Streaming

Timothy Spann, Developer Advocate at StreamNative
1
The Never Landing Stream with HTAP and Streaming
Timothy Spann
Principal Developer Advocate
Introduction
The Never Landing Stream with HTAP and Streaming
4
FLaNK Stack
Tim Spann
@PaasDev // Blog: www.datainmotion.dev
Principal Developer Advocate.
Princeton Future of Data Meetup.
ex-Pivotal, ex-Hortonworks, ex-StreamNative, ex-PwC
https://medium.com/@tspann
https://github.com/tspannhw
Apache NiFi x Apache Kafka x Apache Flink
5
Future of Data - Princeton + Virtual
@PaasDev
https://www.meetup.com/futureofdata-princeton/
From Big Data to AI to Streaming to Containers to
Cloud to Analytics to Cloud Storage to Fast Data to
Machine Learning to Microservices to ...
6
CDP IS THE ONLY HYBRID DATA PLATFORM
Hybrid. Open. Portable. Secure.
[Diagram labels: CLOUDERA DATA PLATFORM; OPEN DATA LAKEHOUSE; storage: S3, GCS, OZONE, ADLS]
7
Apache NiFi
8
CLOUDERA FLOW MANAGEMENT - POWERED BY APACHE NiFi
Ingest and manage data from edge-to-cloud using a no-code interface
● #1 data ingestion/movement engine
● Strong community
● Product maturity over 11 years
● Deploy on-premises or in the cloud
● 400+ pre-built processors
● Built-in data provenance
● Guaranteed delivery
● Throttling and Back pressure
9
Cloudera Flow Management
Ingest and manage data from edge-to-cloud using a no-code interface
ACQUIRE PROCESS DELIVER
• Over 300 pre-built processors
• Easy to build your own processors
• Parse, enrich & apply schema
• Filter, Split, Merge & Route
• Throttle & Backpressure
• Guaranteed delivery
• Full data provenance
• Eco-system integration
Advanced tooling to industrialize flow development (Flow Development Life Cycle)
[Diagram labels]
Sources and destinations: FTP, SFTP, HL7, UDP, XML, HTTP, EMAIL, HTML, IMAGE, SYSLOG
Processing operations: HASH, MERGE, EXTRACT, DUPLICATE, SPLIT, ROUTE TEXT, ROUTE CONTENT, ROUTE CONTEXT, CONTROL RATE, DISTRIBUTE LOAD, GEOENRICH, SCAN, REPLACE, TRANSLATE, CONVERT, ENCRYPT, TALL, EVALUATE, EXECUTE
10
SQL-BASED ROUTING WITH NiFi’s QueryRecord Processor
● QueryRecord processor: executes a SQL statement against records and writes the results to the flow file content.
● CSVReader: looks up the schema from the Schema Registry (SR) and converts CSV records into process records.
● SQL execution via Apache Calcite: executes the configured SQL against the process records for routing.
● CSVRecordSetWriter: converts the query results from process records back into CSV for the flow file content.
Route data (for example, geo and speed streams) using standard SQL instead of complex regular expressions.
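The routing idea on this slide can be sketched outside NiFi. The example below is only an analogy, not NiFi code: it loads CSV records into an in-memory SQLite table named FLOWFILE (the table name QueryRecord queries) and runs one SQL statement per route. SQLite stands in for Apache Calcite, and the column names, sample rows, and route names are assumptions made for this illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample input; the columns are assumptions for illustration.
CSV_INPUT = """driver_id,speed,latitude,longitude
11,62,40.35,-74.66
12,91,40.71,-74.01
13,45,,
"""

# One SQL statement per route, similar in spirit to QueryRecord, where each
# user-defined property holds a query and names an outgoing relationship.
ROUTES = {
    "speeding": "SELECT * FROM FLOWFILE WHERE speed > 80",
    "geo": "SELECT * FROM FLOWFILE WHERE latitude IS NOT NULL",
}


def route_records(csv_text, routes):
    """Load CSV rows into a table named FLOWFILE and run each route's SQL."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(
        "CREATE TABLE FLOWFILE (driver_id INTEGER, speed INTEGER, "
        "latitude REAL, longitude REAL)"
    )
    for row in rows:
        conn.execute(
            "INSERT INTO FLOWFILE VALUES (?, ?, ?, ?)",
            (
                int(row["driver_id"]),
                int(row["speed"]),
                # Empty CSV fields become SQL NULL.
                float(row["latitude"]) if row["latitude"] else None,
                float(row["longitude"]) if row["longitude"] else None,
            ),
        )
    results = {}
    for name, sql in routes.items():
        results[name] = [dict(r) for r in conn.execute(sql).fetchall()]
    conn.close()
    return results


if __name__ == "__main__":
    for route, records in route_records(CSV_INPUT, ROUTES).items():
        print(route, sorted(r["driver_id"] for r in records))
```

Each route is a plain WHERE clause, which is the point of the slide: the routing rule stays readable SQL rather than a regular expression over raw text.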
11
Key Differentiators
Comprehensive streaming platform – Only vendor to offer an open and comprehensive streaming platform for real-time data ingestion and processing to produce prescriptive and predictive analytics
Stream to Cloud – Extend the same on-premises streaming capabilities to the cloud with full support for multi-cloud and hybrid cloud models
400+ pre-built processors – Only product to offer such comprehensive connectivity to a wide range of data sources from edge to cloud
Enterprise-Grade Security & Governance – Deploy your streaming applications with confidence and trust, with Cloudera SDX offering unified security and governance across the entire platform
Democratize access to real-time data – Enable data analysts and other personas to quickly build streaming applications with just SQL
12
Development & Runtime of DataFlow Functions
Step 1. Develop functions on a local workstation or in CDP Public Cloud using the no-code UI designer
Step 2. Run functions on serverless compute services in AWS, Azure & GCP (AWS Lambda, Azure Functions, Google Cloud Functions)
13
DataFlow Functions Use Cases
Trigger-Based, Batch, Scheduled and Microservice Use Cases
Serverless Trigger-Based File Processing Pipeline
Develop & run data processing pipelines when files are created or updated in any of the cloud object stores.
Example: When a photo is uploaded to object storage, a data flow is triggered which runs image resizing code and delivers the resized image to different locations.
Serverless Workflows / Orchestration
Chain different low-code functions to build complex workflows.
Example: Automate the handling of support tickets in a call center or orchestrate data movement across different cloud services.
Serverless Scheduled Tasks
Develop and run scheduled tasks without any code at pre-defined time intervals.
Example: Offload an external database running on-premises into the cloud once a day, every morning at 4:00 a.m.
Serverless Microservices
Build and deploy independent serverless modules that power your application's microservices architecture.
Example: Event-driven functions for easy communication between thousands of decoupled services that power a ride-sharing application.
Serverless Web APIs
Easily build endpoints for your web applications with HTTP APIs without any code, using DFF and any of the cloud providers' function triggers.
Example: Build high-performance, scalable web applications across multiple data centers.
Serverless Customized Triggers
With the DFF State feature, build flows to create customized triggers allowing access to on-premises or external services.
Example: Near-real-time offloading of files from a remote SFTP server.
14
Flow Catalog
• Central repository for flow definitions
• Import existing NiFi flows
• Manage flow definitions
• Initiate flow deployments
15
ReadyFlows
• Cloudera-provided flow definitions
• Cover most common data flow use cases
• Can be deployed and adjusted as needed
• Made available through docs during Tech Preview
16
Deployment Wizard
• Turns flow definitions into flow deployments
• Guides users through providing required configuration
• Pick from pre-defined NiFi node sizes
• Define KPIs for the deployment
[Screenshots: Start Deployment Wizard; Provide Parameters; Configure Sizing & Scaling; Define KPIs]
17
Key Performance Indicators
• Visibility into flow deployments
• Track high-level flow performance
• Track in-depth NiFi component metrics
• Defined in Deployment Wizard
• Monitoring & Alerts in Deployment Details
[Screenshots: KPI Definition in Deployment Wizard; KPI Monitoring]
18
Dashboard
• Central Monitoring View
• Monitors flow deployments across CDP environments
• Monitors flow deployment health & performance
• Drill into flow deployment to monitor system metrics and deployment events
19
Data Flow Design for Everyone
• Cloud-native data flow development
• Developers get their own sandbox
• Start developing flows without installing NiFi
• Redesigned visual canvas
• Optimized interaction patterns
• Integration into CDF-PC Catalog for versioning
20
Data Distribution and Sharing with TiDB
https://docs.pingcap.com/tidb/dev/mysql-compatibility
21
NiFi Ingesting REST API
● NiFi consumes streams (CDC, REST, sensors)
● Distributes in real time to Kafka and MySQL at the same time
● Flink SQL consumes from Kafka
● TiDB CDC -> Kafka
https://ossinsight.io/docs/api
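The "Kafka and MySQL at the same time" bullet describes a fan-out: each incoming record is delivered to every destination. A minimal sketch of that pattern follows. It deliberately uses in-memory stand-ins rather than real clients: `ListTopic` is a hypothetical placeholder for a Kafka topic, `SqlTable` uses SQLite in place of a MySQL-compatible database, and the event fields (`id`, `type`, `repo`) are assumptions for illustration, not the actual API payload.

```python
import json
import sqlite3


class ListTopic:
    """In-memory stand-in for a Kafka topic (not a real Kafka client)."""

    def __init__(self):
        self.messages = []

    def send(self, record):
        # Messages are stored as JSON bytes, as they often are on a topic.
        self.messages.append(json.dumps(record).encode("utf-8"))


class SqlTable:
    """SQLite stand-in for a table in a MySQL-compatible database."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE events (id INTEGER PRIMARY KEY, type TEXT, repo TEXT)"
        )

    def send(self, record):
        # Upsert by primary key so replays do not create duplicate rows.
        self.conn.execute(
            "INSERT OR REPLACE INTO events VALUES (?, ?, ?)",
            (record["id"], record["type"], record["repo"]),
        )
        self.conn.commit()

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]


def distribute(records, sinks):
    """Deliver every record to every sink; return the number of records seen."""
    delivered = 0
    for record in records:
        for sink in sinks:
            sink.send(record)
        delivered += 1
    return delivered


if __name__ == "__main__":
    topic, table = ListTopic(), SqlTable()
    events = [
        {"id": 1, "type": "PushEvent", "repo": "example/one"},
        {"id": 2, "type": "WatchEvent", "repo": "example/two"},
    ]
    print(distribute(events, [topic, table]), len(topic.messages), table.count())
```

In the deck this distribution is done by NiFi itself; the sketch only shows why a single ingest point can feed the streaming side and the SQL side without the two drifting apart.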
22
Apache Kafka
23
Data Distribution with Apache Kafka
24
Apache Kudu
25
Why Kudu?
A simultaneous combination of sequential and random reads and writes
Time Series Data
Can you insert time series data in real time? How long does it take to prepare it for analysis? Can you get results and act fast enough to change outcomes?
Machine Data Analytics
Can you handle large volumes of machine-generated data? Do you have the tools to identify problems or threats? Can your system do machine learning?
Online Reporting
How fast can you add data to your data store? Are you trading off the ability to do broad analytics for the ability to make updates? Are you retaining only part of your data?
26
HTAP Options - Apache Kudu
Online Transactional Processing (OLTP) and Online Analytical Processing (OLAP)
https://hpi.de/fileadmin/user_upload/hpi/navigation/10_forschung/20_future_soc_lab/Poster/2019-1/Tozun_FSOC-Poster_20191_150443.pdf
27
HTAP Options - TiDB
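To make the OLTP/OLAP distinction behind the HTAP slides concrete, the sketch below runs both query shapes against one table: single-row writes keyed by primary key (transactional) and a grouped aggregate over all rows (analytical). SQLite is used purely as a convenient stand-in and is not an HTAP engine; the `orders` table and its columns are assumptions for illustration.

```python
import sqlite3


def open_store():
    """Create an in-memory table that serves both workloads in this sketch."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, status TEXT)"
    )
    return conn


def place_order(conn, order_id, customer, amount):
    """OLTP-style write: insert one row in its own transaction."""
    with conn:
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?, 'NEW')",
            (order_id, customer, amount),
        )


def ship_order(conn, order_id):
    """OLTP-style update: touch exactly one row by primary key."""
    with conn:
        conn.execute(
            "UPDATE orders SET status = 'SHIPPED' WHERE order_id = ?",
            (order_id,),
        )


def revenue_by_customer(conn):
    """OLAP-style read: scan and aggregate across all rows."""
    cur = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = open_store()
    place_order(conn, 1, "acme", 10.0)
    place_order(conn, 2, "globex", 7.0)
    place_order(conn, 3, "acme", 5.5)
    ship_order(conn, 1)
    print(revenue_by_customer(conn))
```

An HTAP store aims to serve both kinds of statement from the same fresh data, so the aggregate reflects writes without a separate load step.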
28
Apache Flink SQL
29
SQL STREAM BUILDER (CLOUDERA SSB)
SQL Stream Builder allows developers, analysts, and data scientists to write streaming applications with industry-standard SQL.
No Java or Scala code development required.
Simplifies access to data in Kafka & Flink. Connectors to batch data in HDFS, Kudu, Hive, S3, JDBC, CDC and more.
Enrich streaming data with batch data in a single tool.
Democratize access to real-time data with just SQL
30
SSB MATERIALIZED VIEWS
Key takeaway: materialized views (MVs) let data scientists, analysts, and developers consume data from the firehose.
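One way to picture a materialized view over a stream is a keyed snapshot that is updated as each event arrives and can be queried at any time without replaying the stream. The sketch below is a conceptual illustration only, not SSB's implementation; the class name `LatestByKeyView` and the sensor fields are hypothetical.

```python
class LatestByKeyView:
    """Keeps the most recent event per key, updated one event at a time."""

    def __init__(self, key_field):
        self.key_field = key_field
        self._rows = {}

    def apply(self, event):
        # A newer event for the same key replaces the older row.
        self._rows[event[self.key_field]] = event

    def query(self, predicate=lambda row: True):
        # Consumers read the current snapshot instead of the raw stream.
        return [row for row in self._rows.values() if predicate(row)]


if __name__ == "__main__":
    view = LatestByKeyView("sensor_id")
    stream = [
        {"sensor_id": "a", "temp": 20.5},
        {"sensor_id": "b", "temp": 31.0},
        {"sensor_id": "a", "temp": 22.0},
    ]
    for event in stream:
        view.apply(event)
    print(view.query(lambda row: row["temp"] > 21))
```

The consumer never touches the firehose directly; it asks the view a question and gets the current answer.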
31
Infer Tables from Kafka Topics with JSON or Avro
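Inferring a table from a JSON topic amounts to sampling a message and mapping each field to a column type. The following sketch shows that idea under simple assumptions: it inspects one sample message, handles only flat fields, and uses a hand-picked mapping to common SQL type names. It is not the inference logic of SSB or Flink, and the topic name and sample fields are hypothetical.

```python
import json


def sql_type(value):
    """Map a decoded JSON value to a SQL type name (simplified mapping)."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    # Strings, nulls, and nested values all fall back to STRING here.
    return "STRING"


def infer_columns(sample_message):
    """Return (column, type) pairs in the order fields appear in the sample."""
    record = json.loads(sample_message)
    return [(name, sql_type(value)) for name, value in record.items()]


def create_table_ddl(table, columns):
    """Render a CREATE TABLE statement without any connector options."""
    cols = ",\n".join(f"  `{name}` {ctype}" for name, ctype in columns)
    return f"CREATE TABLE `{table}` (\n{cols}\n)"


if __name__ == "__main__":
    sample = '{"id": 7, "city": "Princeton", "temp": 21.5, "alert": false}'
    print(create_table_ddl("weather", infer_columns(sample)))
```

A single sample can mislead (a field that is sometimes null or sometimes a float), so a real inference step would look at more than one message.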
32
Demos
33
HTAP: INGEST OF ALL DATA
[Diagram labels: Data Sources; Cloudera Data Flow; Cloudera Streaming Analytics; Cloudera Streams Processing; Kafka; Lake House]
34
LLM USE CASE
Data in Motion on Cloudera Data Platform (CDP): capture, process & distribute any data, anywhere
[Diagram labels]
Inputs: Unstructured file types, Other enterprise data, Structured Sources, Applications/APIs, Streams
Outputs: Vector DB, AI Model, Open Data Lakehouse, Materialized Views
35
Live Q&A
[Architecture diagram; recoverable labels]
Sources: Travel Advisories, Weather Reports, Documents, Social Media, Internal Data, Github Data, REST API
Stages: Collect, Distribute, Store, Enrich, Report, Visualize, Interact, Automate, Predict (AI-based enhancements), on hybrid cloud
Components: Data Flow, Messaging Broker, SQL Stream Builder, Data Warehouse, Data Visualization, Machine Learning, Vector Database, LLM
Data labels: Input Sentences, Generated Text, Timestamps, Enrichments, Aggregations, Real-time alerting
36
RUN AT HOME
37
CSP Community Edition
● Kafka, KConnect, SMM, SR, Flink, and SSB in Docker
● Runs in Docker
● Try new features quickly
● Develop applications locally
● Docker compose file of CSP to run from the command line without any dependencies, including Flink, SQL Stream Builder, Kafka, Kafka Connect, Streams Messaging Manager and Schema Registry
○ $> docker compose up
● Licensed under the Cloudera Community License
● Unsupported
● Community Group Hub for CSP
● Find it on docs.cloudera.com under Applications
38
Open Source Edition
● Apache NiFi in Docker
● Runs in Docker
● Try new features quickly
● Develop applications locally
● Docker NiFi
○ docker run --name nifi -p 8443:8443 -d \
    -e SINGLE_USER_CREDENTIALS_USERNAME=admin \
    -e SINGLE_USER_CREDENTIALS_PASSWORD=ctsBtRBKHRAx69EqUghvvgEvjnaLjFEB \
    apache/nifi:latest
● Licensed under the Apache License 2.0
● Unsupported
https://hub.docker.com/r/apache/nifi
39
Thank You
The Never Landing Stream with HTAP and Streaming

  • 1. The Never Landing Stream with HTAP and Streaming. Timothy Spann, Principal Developer Advocate
  • 4. FLaNK Stack. Tim Spann, @PaasDev // Blog: www.datainmotion.dev. Principal Developer Advocate. Princeton Future of Data Meetup. ex-Pivotal, ex-Hortonworks, ex-StreamNative, ex-PwC. https://medium.com/@tspann https://github.com/tspannhw Apache NiFi x Apache Kafka x Apache Flink
  • 5. Future of Data - Princeton + Virtual. @PaasDev https://www.meetup.com/futureofdata-princeton/ From Big Data to AI to Streaming to Containers to Cloud to Analytics to Cloud Storage to Fast Data to Machine Learning to Microservices to ...
  • 6. CDP IS THE ONLY HYBRID DATA PLATFORM. Hybrid. Open. Portable. Secure. [Diagram: Cloudera Data Platform and the Open Data Lakehouse on S3, GCS, ADLS, and Ozone storage]
  • 8. CLOUDERA FLOW MANAGEMENT - POWERED BY APACHE NiFi: Ingest and manage data from edge-to-cloud using a no-code interface
    ● #1 data ingestion/movement engine
    ● Strong community
    ● Product maturity over 11 years
    ● Deploy on-premises or in the cloud
    ● 400+ pre-built processors
    ● Built-in data provenance
    ● Guaranteed delivery
    ● Throttling and back pressure
  • 9. Cloudera Flow Management: Ingest and manage data from edge-to-cloud using a no-code interface
    • Over 300 pre-built processors
    • Easy to build your own processors
    • Parse, enrich & apply schema
    • Filter, Split, Merge & Route
    • Throttle & Backpressure
    • Guaranteed delivery
    • Full data provenance
    • Eco-system integration
    Advanced tooling to industrialize flow development (Flow Development Life Cycle)
    ACQUIRE and DELIVER: FTP, SFTP, HL7, UDP, XML, HTTP, EMAIL, HTML, IMAGE, SYSLOG
    PROCESS: HASH, MERGE, EXTRACT, DUPLICATE, SPLIT, ROUTE TEXT, ROUTE CONTENT, ROUTE CONTEXT, CONTROL RATE, DISTRIBUTE LOAD, GEOENRICH, SCAN, REPLACE, TRANSLATE, CONVERT, ENCRYPT, TALL, EVALUATE, EXECUTE
  • 10. SQL BASED ROUTING WITH NiFi’s QueryRecord Processor
    ● QueryRecord Processor: Executes a SQL statement against records and writes the results to the flow file content.
    ● CSVReader: Looks up the schema from Schema Registry (SR) and converts CSV records into ProcessRecords.
    ● SQL execution via Apache Calcite: Executes the configured SQL against the ProcessRecords for routing.
    ● CSVRecordSetWriter: Converts the query result from ProcessRecords back into CSV for the flow file content.
    Route the geo and speed streams using standard SQL instead of complex regular expressions.
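The read, query, write sequence on slide 10 can be imitated outside NiFi to show why SQL routing is simpler than regular expressions. The sketch below is only an analogy: it uses Python's built-in sqlite3 rather than NiFi or Apache Calcite, and the sample records, column names, route names, and the 65 threshold are all hypothetical. The table name FLOWFILE mirrors the name QueryRecord queries use.

```python
import csv
import io
import sqlite3

# Hypothetical sample records; column names are illustrative assumptions.
CSV_INPUT = """truck_id,event_type,speed,latitude,longitude
1,speed,72,,
2,geo,,40.35,-74.66
3,speed,55,,
"""

# One SQL statement per named route, similar in spirit to how QueryRecord
# pairs an output relationship with a query. Route names are hypothetical.
ROUTES = {
    "geo": "SELECT truck_id, latitude, longitude FROM FLOWFILE "
           "WHERE event_type = 'geo'",
    "speeding": "SELECT truck_id, speed FROM FLOWFILE "
                "WHERE event_type = 'speed' AND speed > 65",
}


def route_records(csv_text, routes):
    """Parse CSV, run each route's SQL, and return CSV text per route."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))  # "reader" step

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE FLOWFILE (truck_id INTEGER, event_type TEXT, "
        "speed REAL, latitude REAL, longitude REAL)"
    )
    conn.executemany(
        "INSERT INTO FLOWFILE VALUES (?, ?, ?, ?, ?)",
        [
            (
                r["truck_id"],
                r["event_type"],
                r["speed"] or None,      # empty CSV cells become NULL
                r["latitude"] or None,
                r["longitude"] or None,
            )
            for r in rows
        ],
    )

    results = {}
    for name, sql in routes.items():          # "query" step
        cursor = conn.execute(sql)
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")  # "writer" step
        writer.writerow([col[0] for col in cursor.description])
        writer.writerows(cursor.fetchall())
        results[name] = out.getvalue()
    conn.close()
    return results


if __name__ == "__main__":
    for route, content in route_records(CSV_INPUT, ROUTES).items():
        print(f"--- {route} ---")
        print(content)
```

Each route is a plain WHERE clause, so adding a new stream means adding one more query rather than maintaining another pattern.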
  • 11. Key Differentiators
    ● Comprehensive streaming platform: Only vendor to offer an open and comprehensive streaming platform for real-time data ingestion and processing to produce prescriptive and predictive analytics
    ● Stream to Cloud: Extend the same on-premises streaming capabilities to the cloud with full support for multi-cloud and hybrid cloud models
    ● 400+ pre-built processors: Only product to offer such comprehensive connectivity to a wide range of data sources from edge to cloud
    ● Enterprise-Grade Security & Governance: Deploy your streaming applications with confidence and trust with Cloudera SDX offering unified security and governance across the entire platform
    ● Democratize access to real-time data: Enable data analysts and other personas to quickly build streaming applications with just SQL
  • 12. Development & Runtime of DataFlow Functions
    Step 1. Develop functions on a local workstation or in CDP Public Cloud using the no-code UI designer
    Step 2. Run functions on serverless compute services in AWS, Azure & GCP: AWS Lambda, Azure Functions, Google Cloud Functions
  • 13. DataFlow Functions Use Cases: Trigger Based, Batch, Scheduled and Microservice Use Cases
    ● Serverless Trigger-Based File Processing Pipeline: Develop & run data processing pipelines when files are created or updated in any of the cloud object stores. Example: When a photo is uploaded to object storage, a data flow is triggered which runs image resizing code and delivers the resized image to different locations.
    ● Serverless Workflows / Orchestration: Chain different low-code functions to build complex workflows. Example: Automate the handling of support tickets in a call center or orchestrate data movement across different cloud services.
    ● Serverless Scheduled Tasks: Develop and run scheduled tasks without any code on pre-defined timed intervals. Example: Offload an external database running on-premises into the cloud once a day every morning at 4:00 a.m.
    ● Serverless Microservices: Build and deploy serverless independent modules that power your application's microservices architecture. Example: Event-driven functions for easy communication between thousands of decoupled services that power a ride-sharing application.
    ● Serverless Web APIs: Easily build endpoints for your web applications with HTTP APIs without any code using DFF and any of the cloud providers' function triggers. Example: Build high-performance, scalable web applications across multiple data centers.
    ● Serverless Customized Triggers: With the DFF State feature, build flows to create customized triggers allowing access to on-premises or external services. Example: Near real-time offloading of files from a remote SFTP server.
  • 14. Flow Catalog
    • Central repository for flow definitions
    • Import existing NiFi flows
    • Manage flow definitions
    • Initiate flow deployments
  • 15. ReadyFlows
    • Cloudera provided flow definitions
    • Cover most common data flow use cases
    • Can be deployed and adjusted as needed
    • Made available through docs during Tech Preview
  • 16. Deployment Wizard
    • Turns flow definitions into flow deployments
    • Guides users through providing required configuration
    • Pick from pre-defined NiFi node sizes
    • Define KPIs for the deployment
    Steps: Start Deployment Wizard, Provide Parameters, Configure Sizing & Scaling, Define KPIs
  • 17. Key Performance Indicators
    • Visibility into flow deployments
    • Track high level flow performance
    • Track in-depth NiFi component metrics
    • Defined in Deployment Wizard
    • Monitoring & Alerts in Deployment Details
    [Screenshots: KPI Definition in Deployment Wizard; KPI Monitoring]
  • 18. Dashboard
    • Central Monitoring View
    • Monitors flow deployments across CDP environments
    • Monitors flow deployment health & performance
    • Drill into flow deployment to monitor system metrics and deployment events
  • 19. Data Flow Design for Everyone
    • Cloud-native data flow development
    • Developers get their own sandbox
    • Start developing flows without installing NiFi
    • Redesigned visual canvas
    • Optimized interaction patterns
    • Integration into CDF-PC Catalog for versioning
  • 21. NiFi Ingesting REST API
    ● NiFi consumes the stream (CDC, REST, sensors)
    ● Distributes in real time to Kafka and MySQL at the same time
    ● Flink SQL consumes from Kafka
    ● TiDB CDC -> Kafka
    https://ossinsight.io/docs/api
  • 25. Why Kudu? A simultaneous combination of sequential and random reads and writes
    ● Time Series Data: Can you insert time series data in real time? How long does it take to prepare it for analysis? Can you get results and act fast enough to change outcomes?
    ● Machine Data Analytics: Can you handle large volumes of machine-generated data? Do you have the tools to identify problems or threats? Can your system do machine learning?
    ● Online Reporting: How fast can you add data to your data store? Are you trading off the ability to do broad analytics for the ability to make updates? Are you retaining only part of your data?
  • 26. HTAP Options - Apache Kudu: Online Transactional Processing (OLTP) and Online Analytical Processing (OLAP) https://hpi.de/fileadmin/user_upload/hpi/navigation/10_forschung/20_future_soc_lab/Poster/2019-1/Tozun_FSOC-Poster_20191_150443.pdf
  • 29. SQL STREAM BUILDER (CLOUDERA SSB)
    ● SQL Stream Builder allows developers, analysts, and data scientists to write streaming applications with industry standard SQL.
    ● No Java or Scala code development required.
    ● Simplifies access to data in Kafka & Flink.
    ● Connectors to batch data in HDFS, Kudu, Hive, S3, JDBC, CDC and more
    ● Enrich streaming data with batch data in a single tool
    ● Democratize access to real-time data with just SQL
  • 30. SSB MATERIALIZED VIEWS. Key takeaway: MVs allow data scientists, analysts, and developers to consume data from the firehose.
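The idea behind a materialized view over a stream can be shown with a small conceptual sketch: state is updated as each event arrives, so consumers read a compact, current result instead of replaying the firehose. This is not SSB's API; the class, the event fields (sensor_id, temperature), and the average aggregate are hypothetical stand-ins.

```python
from collections import defaultdict


class MaterializedView:
    """Keeps a running aggregate per key so readers never scan the raw stream."""

    def __init__(self):
        # Per-key running state: number of events and sum of temperatures.
        self._state = defaultdict(lambda: {"count": 0, "total": 0.0})

    def apply(self, event):
        """Fold one stream event into the view (called once per event)."""
        row = self._state[event["sensor_id"]]
        row["count"] += 1
        row["total"] += event["temperature"]

    def query(self, sensor_id):
        """Return the current aggregate for a key, or None if never seen."""
        row = self._state.get(sensor_id)
        if row is None:
            return None
        return {
            "sensor_id": sensor_id,
            "count": row["count"],
            "avg_temperature": row["total"] / row["count"],
        }


if __name__ == "__main__":
    view = MaterializedView()
    # Hypothetical events standing in for messages read from a topic.
    for event in [
        {"sensor_id": "a", "temperature": 20.0},
        {"sensor_id": "a", "temperature": 22.0},
        {"sensor_id": "b", "temperature": 30.0},
    ]:
        view.apply(event)
    print(view.query("a"))
```

A reader asking for one key gets an answer from the maintained state immediately, which is the property the slide's takeaway points at.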
  • 31. Infer Tables from Kafka Topics with JSON or Avro
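One way to picture table inference from JSON messages is to sample a few messages and derive a column type per field. The sketch below is a simplified assumption about how such inference could work, not a description of SSB's implementation; the function names and the widening rules (integer plus float becomes DOUBLE, any other conflict becomes STRING) are hypothetical.

```python
import json


def _type_name(value):
    """Map a decoded JSON value to a SQL-style type name."""
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    return "STRING"               # strings, nulls, and nested values fall back


def infer_schema(messages):
    """Infer {field: type} from an iterable of JSON-encoded message strings."""
    schema = {}
    for raw in messages:
        record = json.loads(raw)
        for field, value in record.items():
            observed = _type_name(value)
            previous = schema.get(field)
            if previous is None or previous == observed:
                schema[field] = observed
            elif {previous, observed} == {"BIGINT", "DOUBLE"}:
                schema[field] = "DOUBLE"   # widen integers to floating point
            else:
                schema[field] = "STRING"   # conflicting types: safest fallback
    return schema


if __name__ == "__main__":
    # Hypothetical sample messages standing in for a Kafka topic.
    sample = [
        '{"id": 1, "temp": 20}',
        '{"id": 2, "temp": 20.5, "ok": true}',
    ]
    print(infer_schema(sample))
```

Sampling more messages makes the result more reliable, since a field that appears only occasionally, such as "ok" here, is otherwise missed.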
  • 33. HTAP INGEST OF ALL DATA. [Diagram: Data Sources, Cloudera Data Flow, Cloudera Streaming Analytics, Cloudera Streams Processing, Kafka, Lake House]
  • 34. LLM USE CASE. Data in Motion on Cloudera Data Platform (CDP): Capture, process & distribute any data, anywhere. [Diagram: Unstructured file types, Structured Sources, Applications/APIs, Streams, Other enterprise data, Open Data Lakehouse, Materialized Views, Vector DB, AI Model]
  • 35. [Architecture diagram, hybrid cloud. Inputs: Live Q&A, Travel Advisories, Weather Reports, Documents, Social Media, Internal Data, Github Data, REST API. Stages: Interact, Collect, Store, Enrich, Report, Distribute, Visualize, Automate, Predict. Components: Data Flow, Messaging Broker, SQL Stream Builder, Data Warehouse, Data Visualization, Machine Learning, Vector Database, LLM (AI based enhancements). Data labels: Input Sentences, Generated Text, Timestamps, Enrichments, Aggregations, Real-time alerting]
  • 37. CSP Community Edition
    ● Kafka, KConnect, SMM, SR, Flink, and SSB in Docker
    ● Runs in Docker
    ● Try new features quickly
    ● Develop applications locally
    ● Docker compose file of CSP to run from the command line without any dependencies, including Flink, SQL Stream Builder, Kafka, Kafka Connect, Streams Messaging Manager and Schema Registry
      ○ $> docker compose up
    ● Licensed under the Cloudera Community License
    ● Unsupported
    ● Community Group Hub for CSP
    ● Find it on docs.cloudera.com under Applications
  • 38. Open Source Edition
    ● Apache NiFi in Docker
    ● Runs in Docker
    ● Try new features quickly
    ● Develop applications locally
    ● Docker NiFi
      ○ docker run --name nifi -p 8443:8443 -d -e SINGLE_USER_CREDENTIALS_USERNAME=admin -e SINGLE_USER_CREDENTIALS_PASSWORD=ctsBtRBKHRAx69EqUghvvgEvjnaLjFEB apache/nifi:latest
    ● Licensed under the ASF License
    ● Unsupported
    https://hub.docker.com/r/apache/nifi