SlideShare ist ein Scribd-Unternehmen logo
1 von 52
Downloaden Sie, um offline zu lesen
Complex Event Processing platform handling
millions of users
Krzysztof Zarzycki - CTO @ Getindata
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
About us
Founded in 2014 by
ex-Spotify engineers.
Focus only on Big Data and
Cloud (from day 1)
Community builders (Big
Data Tech Warsaw
organizers)
50+ Big Data engineers
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
What is it?
The application logic, analytics, and queries exist continuously, and data flows through them continuously.
Stream Processing
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Why is it important for business?
Stream Processing
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Actionable insights
Stream Processing
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Stream Processing
Why is it important for engineering?
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
It’s NOT only about real-time
It’s just natural - data comes continuously.
Stream Processing
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Stream Processing
User sessions spanning minutes, hours, or days
Batch boundaries are often artificial.
♪ ♪
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
[9:00 - 10:00) [10:00-11:00)
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Complex Event Processing
● Analyze patterns,relations, cause-and-effect
○ If A & B then C
● Infer business-relevant events from raw technical stream
● .. and cascade extraction of even higher-level events
● Alerts, triggers, workflow automation
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Complex Event Processing
● behavioral marketing
● product analytics
● business activity monitoring
● technical monitoring and anomaly detection
● IoT
● fraud detection
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
ESP vs CEP
● Difference is blurry and diminishing
● Traditional CEP
○ Complex proc, low latency, single-machine
○ high-level language like SQL
● Traditional ESP
○ straightforward, high-throughput, distributed
○ Broader, more generic and low-level
● NOW: Best of both!
○ Often called “Streaming Analytics”
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
The need
● Streaming model and real-time
● On par with batch
○ Enrichment, Joins
○ Aggregation
○ Reprocessing of historical data
○ Machine Learning scoring, inference
○ Complex Event Processing
● large scale, high-throughput
● correctness and fault tolerance
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
The solution
Apache Flink
open-source stateful processor over massive data streams
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Who uses Flink
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
We use Flink!
Banks Telcos Automotive Adtech
Commiters to Flink
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Late & Out-of-sequence
events
Breaks correctness.
Often handled with very tedious user code.
Or solved in batch by “waiting enough” and “processing twice”.
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Late & Out-of-sequence
events
Handled by Flink in the framework
Based on watermarks heuristics, that marks the progress of event time in the stream.
Asserts that all earlier events have probably arrived.
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Local State
Operational state obligatory for analytics
Used for accumulators, windows, source offsets, tracking patterns, ...
6
sum
1
3
2
4
1
1
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Local State
Persistent operational state local to computation
Maximize performance with millions of updates per second & core.
Enable out-of-core (more than RAM) processing, with RocksDb
State
Task 1
Logic State
Task N
Logic
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Fault tolerance
(Checkpointing)
State survives abrupt crashes or just maintenance
Checkpointed regularly to resilient external storage.
Accurate - keeps stream offsets, accumulators or windows in perfect sync, consistency.
Efficient - almost no impact on the processing.
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Flink Cluster
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
APIs
Java/Scala
high-level Table API
mid-level dataflow
low-level advanced for tricky cases
Developer Data Scientist Analyst
SQL
Incl. analytical functions
MATCH_RECOGNIZE
and UDF extensibility
Python
Based on Table API
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Big picture
Assisting Millions of User in Real-Time
Kcell case
About Kcell
Kcell has a strong software
development team and lots
of experience in building
services and products
We like innovations
> 10 000 000 subscribers
Largest GSM operator
in Kazakhstan
4G (40%), 3G (73%), 2G
(96%) population
Great network
coverage
There is the ongoing
process of company digital
transformation
Not only telco
Business needs
Assisting Millions of Users in Real-Time
SMS events
Voice usage events
Data usage events
Roaming events
Location events
Input Process Actions
Use Cases
Use case scenarios. Just few of many.
Case
If subscriber top-ups her balance too often in
short period of time. We can offer her a less
expensive tariff or auto-payment services.
Balance Top Up Case
Trigger UI
Roaming
Fraud
Trigger to Marketing Platform if subscriber
visited X country OR/AND registered in Y
visited mobile network and his device's type
is Z
Roaming case
Send an email to the anti-fraud unit if
subscriber registered in roaming but his
balance at the moment is equal to 0.
This situation is impossible in standard case.
Fraud case in roaming
Personalized Notifications
Business Automation
Regulatory
Future Work
We have already done a lot. But more great things are coming.
2020 Q3 2020 Q4 2021 Q1 Bright Future
More Data Sources
More Triggers
Geolocation data
Equipment logs
Commoditize Machine Learning
Extract value from ML company-wide!
Enable easy ML training and productization
of models in real-time
Real-time BI
Intraday view on business and
operations
Monetize valuable insights from
our combined rich data sources.
Data Monetization
Predictive maintenance
Network Optimization
To lower operational costs
And make better investments
And many more...
Create behavioral profile of the
customers for better
personalised serving
Customer 360 view
Old System
Why did we start to look for the new solution?
External Vendor
Solution
Blackbox Solution
Scalability issues
Not reliable
1
2
3
Kcell Developers can’t fix, tweak or optimize it
Limited to ~2000 events / sec
Can’t support all needed data sources
Multiple accidents which took too much time to resolve
Scale
Required system throughput
500K
Events / second
10M
Subscribers
40
TB / month
New Solution
Real-time Stream Processing
ingestion outgestion
events
hub
events
processing
HTTP
push/pull
FTP
NFS
MQ
HTTP
push/pull
FTP
MQ
New Solution
Real-time Stream Processing
flink
ingestion outgestion
events
hub
events
processing
HTTP
push/pull
FTP
NFS
MQ
HTTP
push/pull
FTP
MQ
flink flink
New Solution (Operations)
Web UI, Monitoring, Security
flink
ingestion outgestion
events
hub
events
processing
HTTP
push/pull
FTP
NFS
MQ
HTTP
push/pull
FTP
MQ
Admin UI
(Triggers workbench)
Monitoring
Loki - logs
Prometheus/Grafana -
metrics
Security
FreeIPA
Kerberos
LDAP/AD
API (kafka based)
flink flink
New Solution (Data Lake)
Data Lake and Sub-second OLAP Analytics
flink
ingestion outgestion
events
hub
events
processing
HTTP
push/pull
FTP
NFS
MQ
HTTP
push/pull
FTP
MQ
Data Lake
Historical Storage (HDFS)
Batch (Spark) SQL (Hive)
Keep history, Report, Explore
Column-oriented
Data store
OLAP (Druid)
Interactive BI
flink flink
Processing Flow
Real-time Stream Processing
raw call events
data usage events
transform
transformed events
transform
transformed events
local state
RocksDB
control topic
Admin UI
HTTP
calls
notification
events
outgestion
ingestion
ingestion
submit/stop
triggers
Dynamic Rules
Design
Some treats for Squirrels
Dynamic Rules Design
Key Points
● We want to run 100s of triggers/business rules
● A typical approach: job per rule
● Won’t work in our case:
○ Run 100s of topologies/jobs = multiplied resources cost
○ Pull data from Kafka 100s of times
○ State (user features) replicated 100s times
○ Starting rule requires deployment of the job
Dynamic Rules Design
Key Points
● Our approach: One job to run all triggers/rules
○ And to consume all the sources
● Trigger “templates” still coded with java
● adding/removing rules without restarting application
● 100s of rules running efficiently
Dynamic Rules Design
The Overview
billing events
roaming
Sort by time
control topic
notification
events
Deduplicate Router
Late events
Trigger 1
Trigger 2
State
Updater
Apply Triggers (CoProcess Function)
Keyed by User
Dynamic Rules Design
Pros and Cons
Shared resources and costs
● CPU, RAM, state, shuffle
● Pulling data from Kafka
One bad rule affects whole system
● Watermarks are shared
● Failures are shared
No job restart on start of new rules
● Rules started by business, no IT
involved
Still need to code rule template in
the job
● No way to use SQL, Table API, CEP
Sharing of state
● Build customer features, that can
be seen by all rules
Can be tricky to debug
● Code is shared
● Code paths enabled externally
Dynamic Rules Design
Issue: lagging sources slow down all rules
Source A:
highly unordered, late
Source B:
Ordered, low latency
Late notifications
Low latency
notifications
Triggers
Triggers
Group 1
Triggers
Group 2
Source A:
highly unordered, late
Source B:
Ordered, low latency
Triggers
Group
Late notifications
Problem Solution
Flink Changes Wishlist
What could be even better?
attach new branch to existing topology
that receives the same data
Dynamic Topologies
Cheaper topologies
● Graph of topologies that pass
data locally in Flink
● Other words: Local
Proxying/fan-out of Kafka traffic
Share inputs between topologies
Dynamic SQL
SQL
{ }
Decisions made
Some decisions our team made before or during project implementation
Streaming-first approach
Apache Kafka for event hub
Apache Flink
Powerful Real-Time Analytics
Apache Avro
Keep state local to the process
Ingest reference data for local joins and
enrichment
● No need to query external systems
while processing
● Data time correlation correctness
Performance
transformed
events
transformed
events
Subscriber profile data
(events)
Local State
Not at >100K
events / sec
Nifi for data ingestion (no coding)
● but not for CEP
Web UI for configuring triggers
Ease of Use
Flink on YARN, with HDFS
HA for redundancy and running ~24/7
Prometheus & Grafana for monitoring &
alerting
Loki for logs collection and aggregation
Reliability and battle-tested techniques
Kerberos and AD thanks to FreeIPA
Apache Ranger for authorization
Security
One platform for the whole Enterprise
Batch (adhoc) queries too
● Spark, Hive/Presto
Online analytics
● OLAP
Extensiveness
HDP
Open-source technologies
HDP as a licence-free distribution
Just start with a bunch of servers
Cost-Efficiency
Testing
def "should notify when user's balance drops below threshold"() {
given:
BalanceDropTrigger trigger =balanceDropTrigger()
.threshold(
50.0)
.outgestionSystem(
'campaignSystem')
.build()
admin.createsTrigger(trigger)
and:
user.withBalance(
60.0)
when:
user.makesCall(
phoneCall().amountSpent(
20.0))
then:
wait(allowedEventLateness)
and:
List<Notification> actualNotifications =
campaignSystem.getNotifications(
user, trigger)
and:
actualNotifications.size() ==1
assertThat(actualNotifications.first())
.hasMsisdn(
user.msisdn)
.hasBalanceAfter(
40.0)
cleanup:
admin.deletesTrigger(trigger)
}
flink
ingestion outgestion
events
hub
events
processing
Fake
Campaign
System
HTTP
push/pull
FTP
NFS
MQ
Test event generators
Preprod environment
Our Collaboration
Two heads are better than one
Joint development team
Not a vendor solution
Development as one team
Code quality
Code review and
automated tools for
code quality control
Agile Practices
Distant geographic
locations, but
everyday standups
Go live quickly!
<4 months to first
production case
running 24/7!
Deliver
DevOps/Automation
Knowledge sharing
Constant knowledge
exchange in areas of
expertise
Testing
Separate testing
environment
Automated Unit/E2E tests
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Q&A

Weitere ähnliche Inhalte

Was ist angesagt?

Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...
Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...
Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...Databricks
 
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)Albert Wong
 
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq AbdullahLeveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq AbdullahDatabricks
 
Advanced data science algorithms applied to scalable stream processing by Dav...
Advanced data science algorithms applied to scalable stream processing by Dav...Advanced data science algorithms applied to scalable stream processing by Dav...
Advanced data science algorithms applied to scalable stream processing by Dav...Big Data Spain
 
Data Science and Enterprise Engineering with Michael Finger and Chris Robison
Data Science and Enterprise Engineering with Michael Finger and Chris RobisonData Science and Enterprise Engineering with Michael Finger and Chris Robison
Data Science and Enterprise Engineering with Michael Finger and Chris RobisonDatabricks
 
Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataTrieu Nguyen
 
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...Big Data Spain
 
Finding the needle in the haystack: how Nestle is leveraging big data to defe...
Finding the needle in the haystack: how Nestle is leveraging big data to defe...Finding the needle in the haystack: how Nestle is leveraging big data to defe...
Finding the needle in the haystack: how Nestle is leveraging big data to defe...Big Data Spain
 
Security Breakout Session
Security Breakout Session Security Breakout Session
Security Breakout Session Splunk
 
Moving data to the cloud BY CESAR ROJAS from Pivotal
Moving data to the cloud BY CESAR ROJAS from PivotalMoving data to the cloud BY CESAR ROJAS from Pivotal
Moving data to the cloud BY CESAR ROJAS from PivotalVMware Tanzu Korea
 
Lambda Architecture 2.0 for Reactive AB Testing
Lambda Architecture 2.0 for Reactive AB TestingLambda Architecture 2.0 for Reactive AB Testing
Lambda Architecture 2.0 for Reactive AB TestingTrieu Nguyen
 
Real Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from PivotalReal Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from PivotalVMware Tanzu Korea
 
Building a Streaming Microservices Architecture - Data + AI Summit EU 2020
Building a Streaming Microservices Architecture - Data + AI Summit EU 2020Building a Streaming Microservices Architecture - Data + AI Summit EU 2020
Building a Streaming Microservices Architecture - Data + AI Summit EU 2020Databricks
 
How to Streamline Incident Response with InfluxDB, PagerDuty and Rundeck
How to Streamline Incident Response with InfluxDB, PagerDuty and RundeckHow to Streamline Incident Response with InfluxDB, PagerDuty and Rundeck
How to Streamline Incident Response with InfluxDB, PagerDuty and RundeckInfluxData
 
Life is but a Stream
Life is but a StreamLife is but a Stream
Life is but a StreamDatabricks
 
Using H2O for Mobile Transaction Forecasting & Anomaly Detection - Capital One
Using H2O for Mobile Transaction Forecasting & Anomaly Detection - Capital OneUsing H2O for Mobile Transaction Forecasting & Anomaly Detection - Capital One
Using H2O for Mobile Transaction Forecasting & Anomaly Detection - Capital OneSri Ambati
 
T-Mobile and Elastic
T-Mobile and ElasticT-Mobile and Elastic
T-Mobile and ElasticElasticsearch
 
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...Impetus Technologies
 
How KeyBank Used Elastic to Build an Enterprise Monitoring Solution
How KeyBank Used Elastic to Build an Enterprise Monitoring SolutionHow KeyBank Used Elastic to Build an Enterprise Monitoring Solution
How KeyBank Used Elastic to Build an Enterprise Monitoring SolutionElasticsearch
 
Scaling Your Skillset with Your Data with Jarrett Garcia (Nielsen)
Scaling Your Skillset with Your Data with Jarrett Garcia (Nielsen)Scaling Your Skillset with Your Data with Jarrett Garcia (Nielsen)
Scaling Your Skillset with Your Data with Jarrett Garcia (Nielsen)Spark Summit
 

Was ist angesagt? (20)

Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...
Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...
Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...
 
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
 
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq AbdullahLeveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah
 
Advanced data science algorithms applied to scalable stream processing by Dav...
Advanced data science algorithms applied to scalable stream processing by Dav...Advanced data science algorithms applied to scalable stream processing by Dav...
Advanced data science algorithms applied to scalable stream processing by Dav...
 
Data Science and Enterprise Engineering with Michael Finger and Chris Robison
Data Science and Enterprise Engineering with Michael Finger and Chris RobisonData Science and Enterprise Engineering with Michael Finger and Chris Robison
Data Science and Enterprise Engineering with Michael Finger and Chris Robison
 
Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big data
 
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
 
Finding the needle in the haystack: how Nestle is leveraging big data to defe...
Finding the needle in the haystack: how Nestle is leveraging big data to defe...Finding the needle in the haystack: how Nestle is leveraging big data to defe...
Finding the needle in the haystack: how Nestle is leveraging big data to defe...
 
Security Breakout Session
Security Breakout Session Security Breakout Session
Security Breakout Session
 
Moving data to the cloud BY CESAR ROJAS from Pivotal
Moving data to the cloud BY CESAR ROJAS from PivotalMoving data to the cloud BY CESAR ROJAS from Pivotal
Moving data to the cloud BY CESAR ROJAS from Pivotal
 
Lambda Architecture 2.0 for Reactive AB Testing
Lambda Architecture 2.0 for Reactive AB TestingLambda Architecture 2.0 for Reactive AB Testing
Lambda Architecture 2.0 for Reactive AB Testing
 
Real Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from PivotalReal Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from Pivotal
 
Building a Streaming Microservices Architecture - Data + AI Summit EU 2020
Building a Streaming Microservices Architecture - Data + AI Summit EU 2020Building a Streaming Microservices Architecture - Data + AI Summit EU 2020
Building a Streaming Microservices Architecture - Data + AI Summit EU 2020
 
How to Streamline Incident Response with InfluxDB, PagerDuty and Rundeck
How to Streamline Incident Response with InfluxDB, PagerDuty and RundeckHow to Streamline Incident Response with InfluxDB, PagerDuty and Rundeck
How to Streamline Incident Response with InfluxDB, PagerDuty and Rundeck
 
Life is but a Stream
Life is but a StreamLife is but a Stream
Life is but a Stream
 
Using H2O for Mobile Transaction Forecasting & Anomaly Detection - Capital One
Using H2O for Mobile Transaction Forecasting & Anomaly Detection - Capital OneUsing H2O for Mobile Transaction Forecasting & Anomaly Detection - Capital One
Using H2O for Mobile Transaction Forecasting & Anomaly Detection - Capital One
 
T-Mobile and Elastic
T-Mobile and ElasticT-Mobile and Elastic
T-Mobile and Elastic
 
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
 
How KeyBank Used Elastic to Build an Enterprise Monitoring Solution
How KeyBank Used Elastic to Build an Enterprise Monitoring SolutionHow KeyBank Used Elastic to Build an Enterprise Monitoring Solution
How KeyBank Used Elastic to Build an Enterprise Monitoring Solution
 
Scaling Your Skillset with Your Data with Jarrett Garcia (Nielsen)
Scaling Your Skillset with Your Data with Jarrett Garcia (Nielsen)Scaling Your Skillset with Your Data with Jarrett Garcia (Nielsen)
Scaling Your Skillset with Your Data with Jarrett Garcia (Nielsen)
 

Ähnlich wie Complex event processing platform handling millions of users - Krzysztof Zarzycki, GetInData

Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastDatabricks
 
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...Flink Forward
 
Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...GetInData
 
Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...GetInData
 
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...GetInData
 
Next generation business automation with the red hat decision manager and red...
Next generation business automation with the red hat decision manager and red...Next generation business automation with the red hat decision manager and red...
Next generation business automation with the red hat decision manager and red...Masahiko Umeno
 
Real-time processing of large amounts of data
Real-time processing of large amounts of dataReal-time processing of large amounts of data
Real-time processing of large amounts of dataconfluent
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQLWSO2
 
Apache Kafka® + Machine Learning for Supply Chain 
Apache Kafka® + Machine Learning for Supply Chain Apache Kafka® + Machine Learning for Supply Chain 
Apache Kafka® + Machine Learning for Supply Chain confluent
 
IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...
IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...
IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...Kai Wähner
 
Audax Group: CIO Perspectives - Managing The Copy Data Explosion
Audax Group: CIO Perspectives - Managing The Copy Data ExplosionAudax Group: CIO Perspectives - Managing The Copy Data Explosion
Audax Group: CIO Perspectives - Managing The Copy Data Explosionactifio
 
Approaches to Network Automation
Approaches to Network AutomationApproaches to Network Automation
Approaches to Network AutomationAPNIC
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...confluent
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futuremarkgrover
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesKarthik Murugesan
 
Deliver agile flow presentation (1)
Deliver agile   flow presentation (1)Deliver agile   flow presentation (1)
Deliver agile flow presentation (1)James Urquhart
 
Cloud Service Management: Why Machine Learning is Now Essential
Cloud Service Management: Why Machine Learning is Now EssentialCloud Service Management: Why Machine Learning is Now Essential
Cloud Service Management: Why Machine Learning is Now EssentialDevOps.com
 
Device to Intelligence, IOT and Big Data in Oracle
Device to Intelligence, IOT and Big Data in OracleDevice to Intelligence, IOT and Big Data in Oracle
Device to Intelligence, IOT and Big Data in OracleJunSeok Seo
 
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)KafkaZone
 

Ähnlich wie Complex event processing platform handling millions of users - Krzysztof Zarzycki, GetInData (20)

Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and Fast
 
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...
Flink Forward Berlin 2018: Krzysztof Zarzycki & Alexey Brodovshuk - "Assistin...
 
Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...
 
Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...
 
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
 
Next generation business automation with the red hat decision manager and red...
Next generation business automation with the red hat decision manager and red...Next generation business automation with the red hat decision manager and red...
Next generation business automation with the red hat decision manager and red...
 
Real-time processing of large amounts of data
Real-time processing of large amounts of dataReal-time processing of large amounts of data
Real-time processing of large amounts of data
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
 
Apache Kafka® + Machine Learning for Supply Chain 
Apache Kafka® + Machine Learning for Supply Chain Apache Kafka® + Machine Learning for Supply Chain 
Apache Kafka® + Machine Learning for Supply Chain 
 
IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...
IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...
IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...
 
Audax Group: CIO Perspectives - Managing The Copy Data Explosion
Audax Group: CIO Perspectives - Managing The Copy Data ExplosionAudax Group: CIO Perspectives - Managing The Copy Data Explosion
Audax Group: CIO Perspectives - Managing The Copy Data Explosion
 
Approaches to Network Automation
Approaches to Network AutomationApproaches to Network Automation
Approaches to Network Automation
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the future
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slides
 
Deliver agile flow presentation (1)
Deliver agile   flow presentation (1)Deliver agile   flow presentation (1)
Deliver agile flow presentation (1)
 
Cloud Service Management: Why Machine Learning is Now Essential
Cloud Service Management: Why Machine Learning is Now EssentialCloud Service Management: Why Machine Learning is Now Essential
Cloud Service Management: Why Machine Learning is Now Essential
 
Device to Intelligence, IOT and Big Data in Oracle
Device to Intelligence, IOT and Big Data in OracleDevice to Intelligence, IOT and Big Data in Oracle
Device to Intelligence, IOT and Big Data in Oracle
 
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
Abstractions for managed stream processing platform (Arya Ketan - Flipkart)
 

Mehr von GetInData

How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...GetInData
 
Data-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
Data-Driven Fast Track: Introduction to data-drivenness with Piotr MenclewiczData-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
Data-Driven Fast Track: Introduction to data-drivenness with Piotr MenclewiczGetInData
 
How NOT to win a Kaggle competition
How NOT to win a Kaggle competitionHow NOT to win a Kaggle competition
How NOT to win a Kaggle competitionGetInData
 
How to become good Developer in Scrum Team?
How to become good Developer in Scrum Team? How to become good Developer in Scrum Team?
How to become good Developer in Scrum Team? GetInData
 
OpenLineage & Airflow - data lineage has never been easier
OpenLineage & Airflow - data lineage has never been easierOpenLineage & Airflow - data lineage has never been easier
OpenLineage & Airflow - data lineage has never been easierGetInData
 
Benefits of a Homemade ML Platform
Benefits of a Homemade ML PlatformBenefits of a Homemade ML Platform
Benefits of a Homemade ML PlatformGetInData
 
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataModel serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataGetInData
 
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...GetInData
 
MLOps implemented - how we combine the cloud & open-source to boost data scie...
MLOps implemented - how we combine the cloud & open-source to boost data scie...MLOps implemented - how we combine the cloud & open-source to boost data scie...
MLOps implemented - how we combine the cloud & open-source to boost data scie...GetInData
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...GetInData
 
Feast + Amundsen Integration - Mariusz Strzelecki, GetInData
Feast + Amundsen Integration - Mariusz Strzelecki, GetInDataFeast + Amundsen Integration - Mariusz Strzelecki, GetInData
Feast + Amundsen Integration - Mariusz Strzelecki, GetInDataGetInData
 
Big data trends - Krzysztof Zarzycki, GetInData
Big data trends - Krzysztof Zarzycki, GetInDataBig data trends - Krzysztof Zarzycki, GetInData
Big data trends - Krzysztof Zarzycki, GetInDataGetInData
 
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...GetInData
 
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...GetInData
 
Monitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInDataMonitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInDataGetInData
 
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInDataStrategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInDataGetInData
 
Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...GetInData
 
Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...
Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...
Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...GetInData
 
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...Real time analytics that controls 50% of mobile network in Poland - Maciej Br...
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...GetInData
 
How to maximize profit from IoT by using data platform - Albert Lewandowski, ...
How to maximize profit from IoT by using data platform - Albert Lewandowski, ...How to maximize profit from IoT by using data platform - Albert Lewandowski, ...
How to maximize profit from IoT by using data platform - Albert Lewandowski, ...GetInData
 

Mehr von GetInData (20)

How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...
 
Data-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
Data-Driven Fast Track: Introduction to data-drivenness with Piotr MenclewiczData-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
Data-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
 
How NOT to win a Kaggle competition
How NOT to win a Kaggle competitionHow NOT to win a Kaggle competition
How NOT to win a Kaggle competition
 
How to become good Developer in Scrum Team?
How to become good Developer in Scrum Team? How to become good Developer in Scrum Team?
How to become good Developer in Scrum Team?
 
OpenLineage & Airflow - data lineage has never been easier
OpenLineage & Airflow - data lineage has never been easierOpenLineage & Airflow - data lineage has never been easier
OpenLineage & Airflow - data lineage has never been easier
 
Benefits of a Homemade ML Platform
Benefits of a Homemade ML PlatformBenefits of a Homemade ML Platform
Benefits of a Homemade ML Platform
 
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataModel serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
 
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
 
MLOps implemented - how we combine the cloud & open-source to boost data scie...
MLOps implemented - how we combine the cloud & open-source to boost data scie...MLOps implemented - how we combine the cloud & open-source to boost data scie...
MLOps implemented - how we combine the cloud & open-source to boost data scie...
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
 
Feast + Amundsen Integration - Mariusz Strzelecki, GetInData
Feast + Amundsen Integration - Mariusz Strzelecki, GetInDataFeast + Amundsen Integration - Mariusz Strzelecki, GetInData
Feast + Amundsen Integration - Mariusz Strzelecki, GetInData
 
Big data trends - Krzysztof Zarzycki, GetInData
Big data trends - Krzysztof Zarzycki, GetInDataBig data trends - Krzysztof Zarzycki, GetInData
Big data trends - Krzysztof Zarzycki, GetInData
 
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
 
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
 
Monitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInDataMonitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInData
 
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInDataStrategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
 
Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...
 
Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...
Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...
Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...
 
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...Real time analytics that controls 50% of mobile network in Poland - Maciej Br...
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...
 
How to maximize profit from IoT by using data platform - Albert Lewandowski, ...
How to maximize profit from IoT by using data platform - Albert Lewandowski, ...How to maximize profit from IoT by using data platform - Albert Lewandowski, ...
How to maximize profit from IoT by using data platform - Albert Lewandowski, ...
 

Kürzlich hochgeladen

Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Kürzlich hochgeladen (20)

Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

Complex event processing platform handling millions of users - Krzysztof Zarzycki, GetInData

  • 1. Complex Event Processing platform handling millions of users Krzysztof Zarzycki - CTO @ Getindata
  • 2. © Copyright. All rights reserved. Not to be reproduced without prior written consent. About us Founded in 2014 by ex-Spotify engineers. Focus only on Big Data and Cloud (from day 1) Community builders (Big Data Tech Warsaw organizers) 50+ Big Data engineers
  • 3. © Copyright. All rights reserved. Not to be reproduced without prior written consent. What is it? The application logic, analytics, and queries exist continuously, and data flows through them continuously. Stream Processing
  • 4. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Why is it important for business? Stream Processing
  • 5. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Actionable insights Stream Processing
  • 6. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Stream Processing Why is it important for engineering?
  • 7. © Copyright. All rights reserved. Not to be reproduced without prior written consent. It’s NOT only about real-time It’s just natural - data comes continuously. Stream Processing
  • 8. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Stream Processing User sessions spanning minutes, hours, or days Batch boundaries are often artificial. ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ [9:00 - 10:00) [10:00-11:00)
  • 9. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Complex Event Processing ● Analyze patterns,relations, cause-and-effect ○ If A & B then C ● Infer business-relevant events from raw technical stream ● .. and cascade extraction of even higher-level events ● Alerts, triggers, workflow automation
  • 10. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Complex Event Processing ● behavioral marketing ● product analytics ● business activity monitoring ● technical monitoring and anomaly detection ● IoT ● fraud detection
  • 11. © Copyright. All rights reserved. Not to be reproduced without prior written consent. ESP vs CEP ● Difference is blurry and diminishing ● Traditional CEP ○ Complex proc, low latency, single-machine ○ high-level language like SQL ● Traditional ESP ○ straightforward, high-throughput, distributed ○ Broader, more generic and low-level ● NOW: Best of both! ○ Often called “Streaming Analytics”
  • 12. © Copyright. All rights reserved. Not to be reproduced without prior written consent. The need ● Streaming model and real-time ● On par with batch ○ Enrichment, Joins ○ Aggregation ○ Reprocessing of historical data ○ Machine Learning scoring, inference ○ Complex Event Processing ● large scale, high-throughput ● correctness and fault tolerance
  • 13. © Copyright. All rights reserved. Not to be reproduced without prior written consent. The solution Apache Flink open-source stateful processor over massive data streams
  • 14. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Who uses Flink
  • 15. © Copyright. All rights reserved. Not to be reproduced without prior written consent. We use Flink! Banks Telcos Automotive Adtech Commiters to Flink
  • 16. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Late & Out-of-sequence events Breaks correctness. Often handled with very tedious user code. Or solved in batch by “waiting enough” and “processing twice”.
  • 17. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Late & Out-of-sequence events Handled by Flink in the framework Based on watermarks heuristics, that marks the progress of event time in the stream. Asserts that all earlier events have probably arrived.
  • 18. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Local State Operational state obligatory for analytics Used for accumulators, windows, source offsets, tracking patterns, ... 6 sum 1 3 2 4 1 1
  • 19. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Local State Persistent operational state local to computation Maximize performance with millions of updates per second & core. Enable out-of-core (more than RAM) processing, with RocksDb State Task 1 Logic State Task N Logic
  • 20. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Fault tolerance (Checkpointing) State survives abrupt crashes or just maintenance Checkpointed regularly to resilient external storage. Accurate - keeps stream offsets, accumulators or windows in perfect sync, consistency. Efficient - almost no impact on the processing.
  • 21. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Flink Cluster
  • 22. © Copyright. All rights reserved. Not to be reproduced without prior written consent. APIs Java/Scala high-level Table API mid-level dataflow low-level advanced for tricky cases Developer Data Scientist Analyst SQL Incl. analytical functions MATCH_RECOGNIZE and UDF extensibility Python Based on Table API
  • 23. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Big picture
  • 24. Assisting Millions of User in Real-Time Kcell case
  • 25. About Kcell Kcell has a strong software development team and lots of experience in building services and products We like innovations > 10 000 000 subscribers Largest GSM operator in Kazakhstan 4G (40%), 3G (73%), 2G (96%) population Great network coverage There is the ongoing process of company digital transformation Not only telco
  • 26. Business needs Assisting Millions of Users in Real-Time SMS events Voice usage events Data usage events Roaming events Location events Input Process Actions
  • 27. Use Cases Use case scenarios. Just few of many. Case If subscriber top-ups her balance too often in short period of time. We can offer her a less expensive tariff or auto-payment services. Balance Top Up Case Trigger UI
  • 28. Roaming Fraud Trigger to Marketing Platform if subscriber visited X country OR/AND registered in Y visited mobile network and his device's type is Z Roaming case Send an email to the anti-fraud unit if subscriber registered in roaming but his balance at the moment is equal to 0. This situation is impossible in standard case. Fraud case in roaming
  • 30. Future Work We have already done a lot. But more great things are coming. 2020 Q3 2020 Q4 2021 Q1 Bright Future More Data Sources More Triggers Geolocation data Equipment logs Commoditize Machine Learning Extract value from ML company-wide! Enable easy ML training and productization of models in real-time Real-time BI Intraday view on business and operations Monetize valuable insights from our combined rich data sources. Data Monetization Predictive maintenance Network Optimization To lower operational costs And make better investments And many more... Create behavioral profile of the customers for better personalised serving Customer 360 view
  • 31. Old System Why did we start to look for the new solution? External Vendor Solution Blackbox Solution Scalability issues Not reliable 1 2 3 Kcell Developers can’t fix, tweak or optimize it Limited to ~2000 events / sec Can’t support all needed data sources Multiple accidents which took too much time to resolve
  • 32. Scale Required system throughput 500K Events / second 10M Subscribers 40 TB / month
  • 33. New Solution Real-time Stream Processing ingestion outgestion events hub events processing HTTP push/pull FTP NFS MQ HTTP push/pull FTP MQ
  • 34. New Solution Real-time Stream Processing flink ingestion outgestion events hub events processing HTTP push/pull FTP NFS MQ HTTP push/pull FTP MQ flink flink
  • 35. New Solution (Operations) Web UI, Monitoring, Security flink ingestion outgestion events hub events processing HTTP push/pull FTP NFS MQ HTTP push/pull FTP MQ Admin UI (Triggers workbench) Monitoring Loki - logs Prometheus/Grafana - metrics Security FreeIPA Kerberos LDAP/AD API (kafka based) flink flink
  • 36. New Solution (Data Lake) Data Lake and Sub-second OLAP Analytics flink ingestion outgestion events hub events processing HTTP push/pull FTP NFS MQ HTTP push/pull FTP MQ Data Lake Historical Storage (HDFS) Batch (Spark) SQL (Hive) Keep history, Report, Explore Column-oriented Data store OLAP (Druid) Interactive BI flink flink
  • 37. Processing Flow Real-time Stream Processing raw call events data usage events transform transformed events transform transformed events local state RocksDB control topic Admin UI HTTP calls notification events outgestion ingestion ingestion submit/stop triggers
  • 39. Dynamic Rules Design Key Points ● We want to run 100s of triggers/business rules ● A typical approach: job per rule ● Won’t work in our case: ○ Run 100s of topologies/jobs = multiplied resources cost ○ Pull data from Kafka 100s of times ○ State (user features) replicated 100s times ○ Starting rule requires deployment of the job
  • 40. Dynamic Rules Design Key Points ● Our approach: One job to run all triggers/rules ○ And to consume all the sources ● Trigger “templates” still coded with java ● adding/removing rules without restarting application ● 100s of rules running efficiently
  • 41. Dynamic Rules Design The Overview billing events roaming Sort by time control topic notification events Deduplicate Router Late events Trigger 1 Trigger 2 State Updater Apply Triggers (CoProcess Function) Keyed by User
  • 42. Dynamic Rules Design Pros and Cons Shared resources and costs ● CPU, RAM, state, shuffle ● Pulling data from Kafka One bad rule affects whole system ● Watermarks are shared ● Failures are shared No job restart on start of new rules ● Rules started by business, no IT involved Still need to code rule template in the job ● No way to use SQL, Table API, CEP Sharing of state ● Build customer features, that can be seen by all rules Can be tricky to debug ● Code is shared ● Code paths enabled externally
  • 43. Dynamic Rules Design Issue: lagging sources slow down all rules Source A: highly unordered, late Source B: Ordered, low latency Late notifications Low latency notifications Triggers Triggers Group 1 Triggers Group 2 Source A: highly unordered, late Source B: Ordered, low latency Triggers Group Late notifications Problem Solution
  • 44. Flink Changes Wishlist What could be even better? attach new branch to existing topology that receives the same data Dynamic Topologies Cheaper topologies ● Graph of topologies that pass data locally in Flink ● Other words: Local Proxying/fan-out of Kafka traffic Share inputs between topologies Dynamic SQL SQL { }
  • 45. Decisions made Some decisions our team made before or during project implementation Streaming-first approach Apache Kafka for event hub Apache Flink Powerful Real-Time Analytics
  • 46. Apache Avro Keep state local to the process Ingest reference data for local joins and enrichment ● No need to query external systems while processing ● Data time correlation correctness Performance transformed events transformed events Subscriber profile data (events) Local State Not at >100K events / sec
  • 47. Nifi for data ingestion (no coding) ● but not for CEP Web UI for configuring triggers Ease of Use
  • 48. Flink on YARN, with HDFS HA for redundancy and running ~24/7 Prometheus & Grafana for monitoring & alerting Loki for logs collection and aggregation Reliability and battle-tested techniques Kerberos and AD thanks to FreeIPA Apache Ranger for authorization Security
  • 49. One platform for the whole Enterprise Batch (adhoc) queries too ● Spark, Hive/Presto Online analytics ● OLAP Extensiveness HDP Open-source technologies HDP as a licence-free distribution Just start with a bunch of servers Cost-Efficiency
  • 50. Testing def "should notify when user's balance drops below threshold"() { given: BalanceDropTrigger trigger =balanceDropTrigger() .threshold( 50.0) .outgestionSystem( 'campaignSystem') .build() admin.createsTrigger(trigger) and: user.withBalance( 60.0) when: user.makesCall( phoneCall().amountSpent( 20.0)) then: wait(allowedEventLateness) and: List<Notification> actualNotifications = campaignSystem.getNotifications( user, trigger) and: actualNotifications.size() ==1 assertThat(actualNotifications.first()) .hasMsisdn( user.msisdn) .hasBalanceAfter( 40.0) cleanup: admin.deletesTrigger(trigger) } flink ingestion outgestion events hub events processing Fake Campaign System HTTP push/pull FTP NFS MQ Test event generators Preprod environment
  • 51. Our Collaboration Two heads are better than one Joint development team Not a vendor solution Development as one team Code quality Code review and automated tools for code quality control Agile Practices Distant geographic locations, but everyday standups Go live quickly! <4 months to first production case running 24/7! Deliver DevOps/Automation Knowledge sharing Constant knowledge exchange in areas of expertise Testing Separate testing environment Automated Unit/E2E tests
  • 52. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Q&A