SlideShare ist ein Scribd-Unternehmen logo
1 von 48
Downloaden Sie, um offline zu lesen
1
Architecting Microservices
Applications with Instant Analytics
Tim Berglund, Sr. Director, Developer Experience, Confluent
Rachel Pedreschi, Sr.l Director, Global Field Engineering, Imply Data
2
What the heck is Apache
Kafka and Why Should I Care?
K
V
K
V
K
V
K
V
K
V
K
V
…
…
…
partition 0
partition 1
partition 2
Partitioned Topic
producer
consumer A
…
…
…
partition 0
partition 1
partition 2
Partitioned Topic
consumer A
consumer B
…
…
…
partition 0
partition 1
partition 2
Partitioned Topic
consumer A
consumer B
…
…
…
partition 0
partition 1
partition 2
Partitioned Topic
consumer A
consumer B
…
…
…
partition 0
partition 1
partition 2
Partitioned Topic
consumer A
consumer A
consumer B
…
…
…
partition 0
partition 1
partition 2
Partitioned Topic
consumer A
consumer A
Stream
Processingapps
rdbms
nosql
dwh/
hadoop
Stream
Processingapps
rdbms
nosql
dwh/
hadoop
consumer A
consumer B
…
…
…
partition 0
partition 1
partition 2
Partitioned Topic
consumer A
consumer A
consumer A
consumer A
consumer A
Streams
Application
Streams
Application
Streams
Application
public static void main(String args[]) {
Properties streamsConfiguration = getProperties(SCHEMA_REGISTRY_URL);
final Map<String, String> serdeConfig =
Collections.singletonMap(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG,
SCHEMA_REGISTRY_URL);
final SpecificAvroSerde<Movie> movieSerde = getMovieAvroSerde(serdeConfig);
final SpecificAvroSerde<Rating> ratingSerde = getRatingAvroSerde(serdeConfig);
final SpecificAvroSerde<RatedMovie> ratedMovieSerde = new SpecificAvroSerde<>();
ratingSerde.configure(serdeConfig, false);
StreamsBuilder builder = new StreamsBuilder();
KTable<Long, Double> ratingAverage = getRatingAverageTable(builder);
getRatedMoviesTable(builder, ratingAverage, movieSerde);
Topology topology = builder.build();
KafkaStreams streams = new KafkaStreams(topology, streamsConfiguration);
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
streams.start();
}
private static SpecificAvroSerde<Rating> getRatingAvroSerde(Map<String, String> serdeConfig) {
final SpecificAvroSerde<Rating> ratingSerde = new SpecificAvroSerde<>();
ratingSerde.configure(serdeConfig, false);
return ratingSerde;
}
public static SpecificAvroSerde<Movie> getMovieAvroSerde(Map<String, String> serdeConfig) {
}
public static SpecificAvroSerde<Movie> getMovieAvroSerde(Map<String, String> serdeConfig) {
final SpecificAvroSerde<Movie> movieSerde = new SpecificAvroSerde<>();
movieSerde.configure(serdeConfig, false);
return movieSerde;
}
public static KTable<Long, String> getRatedMoviesTable(StreamsBuilder builder,
KTable<Long, Double> ratingAverage,
SpecificAvroSerde<Movie> movieSerde) {
builder.stream("raw-movies", Consumed.with(Serdes.Long(), Serdes.String()))
.mapValues(Parser::parseMovie)
.map((key, movie) -> new KeyValue<>(movie.getMovieId(), movie))
.to("movies", Produced.with(Serdes.Long(), movieSerde));
KTable<Long, Movie> movies = builder.table("movies",
Materialized
.<Long, Movie, KeyValueStore<Bytes, byte[]>>as(
"movies-store")
.withValueSerde(movieSerde)
.withKeySerde(Serdes.Long())
);
KTable<Long, String> ratedMovies = ratingAverage
.join(movies, (avg, movie) -> movie.getTitle() + "=" + avg);
ratedMovies.toStream().to("rated-movies", Produced.with(Serdes.Long(), Serdes.String()));
return ratedMovies;
}
public static KTable<Long, Double> getRatingAverageTable(StreamsBuilder builder) {
KStream<Long, String> rawRatings = builder.stream("raw-ratings",
Consumed.with(Serdes.Long(),
Serdes.String()));
KStream<Long, Rating> ratings = rawRatings.mapValues(Parser::parseRating)
return ratedMovies;
}
public static KTable<Long, Double> getRatingAverageTable(StreamsBuilder builder) {
KStream<Long, String> rawRatings = builder.stream("raw-ratings",
Consumed.with(Serdes.Long(),
Serdes.String()));
KStream<Long, Rating> ratings = rawRatings.mapValues(Parser::parseRating)
.map((key, rating) -> new KeyValue<>(rating.getMovieId(), rating));
KStream<Long, Double> numericRatings = ratings.mapValues(Rating::getRating);
KGroupedStream<Long, Double> ratingsById = numericRatings.groupByKey();
KTable<Long, Long> ratingCounts = ratingsById.count();
KTable<Long, Double> ratingSums = ratingsById.reduce((v1, v2) -> v1 + v2);
KTable<Long, Double> ratingAverage = ratingSums.join(ratingCounts,
(sum, count) -> sum / count.doubleValue(),
Materialized.as("average-ratings"));
ratingAverage.toStream()
/*.peek((key, value) -> { // debug only
System.out.println("key = " + key + ", value = " + value);
})*/
.to("average-ratings");
return ratingAverage;
}
CREATE TABLE movie_ratings AS
SELECT title,
SUM(rating)/COUNT(rating) AS avg_rating,
COUNT(rating) AS num_ratings
FROM ratings
LEFT OUTER JOIN movies
ON ratings.movie_id = movies.movie_id
GROUP BY title;
producer
consumer
KSQL Cluster
KSQL
Server
KSQL
Server
orders shipping users
warehouse
order web
UI
users web
UI
orders shipping users
warehouse
order web
UI
users web
UI
druid
product
analytics
27
What the heck is Apache
Druid and Why Should I Care?
28
TimRachel
29
30
31
32
Data
Data
Data
Data Sources
ETL Data
Warehouse
Some Code Usually an RDBMS
Analytics
Reporting
Data mining
Querying
33
34
35
Data
Data
Data
Map/reduce
Reporting and Analytics
ELT
Data
Warehouse
ML/AI Engine
Search system
Data
Lake
HDFSRDBMS / NoSQL
36
37
Data
Data
Data
Data Sources
Message bus
Data
Lake
Streaming OLAP
38
39
40
Typical
Big Data++
Challenges
● Scale: when data is large, we need a lot of servers
● Speed: aiming for sub-second response time
● Complexity: too much fine grain to precompute
● High dimensionality: 10s or 100s of dimensions
● Concurrency: many users and tenants
● Freshness: load from streams
41
Search
platform
OLAP
! Real-time ingestion
! Flexible schema
! Full text search
! Batch ingestion
! Efficient storage
! Fast analytic queries
Timeseries
database
! Optimized storage for
time-based datasets
! Time-based functions
42
! Batch ingestion
! Efficient storage
! Fast analytic queries
Search
platform
OLAP
! Real-time ingestion
! Flexible schema
! Full text search
Timeseries
database
! Optimized storage for
time-based datasets
! Time-based functions
high performance
analytics database for
event-driven data
43
Druid Use Cases
in the Wild
1. Digital Advertising - Publishers, Advertisers,
Exchanges
2. User Event Analytics- Clickstream, QoS, Usage
3. Network Telemetry
4. Lots and Lots of Data- IoT, Product Analytics,
Fraud
44
Gratuitous
Customer Quote “The performance is great ... some of the tables
that we have internally in Druid have billions and
billions of events in them, and we’re scanning
them in under a second.”
Source: https://www.infoworld.com/article/2949168/hadoop/yahoo-struts-
its-hadoop-stuff.html
orders shipping users
warehouse
order web
UI
users web
UI
druid
product
analytics
46
47
You can Druid
too! Druid community site: https://druid.apache.org/
Imply distribution: https://imply.io/get-started
Contribute: https://github.com/apache/druid
48

Weitere ähnliche Inhalte

Was ist angesagt?

The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
confluent
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
confluent
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
Guido Schmutz
 

Was ist angesagt? (20)

Building event-driven (Micro)Services with Apache Kafka
Building event-driven (Micro)Services with Apache Kafka Building event-driven (Micro)Services with Apache Kafka
Building event-driven (Micro)Services with Apache Kafka
 
Bridge to Cloud: Using Apache Kafka to Migrate to AWS
Bridge to Cloud: Using Apache Kafka to Migrate to AWSBridge to Cloud: Using Apache Kafka to Migrate to AWS
Bridge to Cloud: Using Apache Kafka to Migrate to AWS
 
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
The Art of The Event Streaming Application: Streams, Stream Processors and Sc...
 
Building Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaBuilding Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache Kafka
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud ServicesBuild a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services
 
APAC ksqlDB Workshop
APAC ksqlDB WorkshopAPAC ksqlDB Workshop
APAC ksqlDB Workshop
 
Building Event-Driven (Micro) Services with Apache Kafka
Building Event-Driven (Micro) Services with Apache KafkaBuilding Event-Driven (Micro) Services with Apache Kafka
Building Event-Driven (Micro) Services with Apache Kafka
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
KSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache KafkaKSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache Kafka
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
 
Building Event-Driven Applications with Apache Kafka & Confluent Platform
Building Event-Driven Applications with Apache Kafka & Confluent PlatformBuilding Event-Driven Applications with Apache Kafka & Confluent Platform
Building Event-Driven Applications with Apache Kafka & Confluent Platform
 
Ingesting streaming data into Graph Database
Ingesting streaming data into Graph DatabaseIngesting streaming data into Graph Database
Ingesting streaming data into Graph Database
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Time series-analysis-using-an-event-streaming-platform -_v3_final
Time series-analysis-using-an-event-streaming-platform -_v3_finalTime series-analysis-using-an-event-streaming-platform -_v3_final
Time series-analysis-using-an-event-streaming-platform -_v3_final
 
Stream me to the Cloud (and back) with Confluent & MongoDB
Stream me to the Cloud (and back) with Confluent & MongoDBStream me to the Cloud (and back) with Confluent & MongoDB
Stream me to the Cloud (and back) with Confluent & MongoDB
 
Top use cases for 2022 with Data in Motion and Apache Kafka
Top use cases for 2022 with Data in Motion and Apache KafkaTop use cases for 2022 with Data in Motion and Apache Kafka
Top use cases for 2022 with Data in Motion and Apache Kafka
 
A guide through the Azure Messaging services - Update Conference
A guide through the Azure Messaging services - Update ConferenceA guide through the Azure Messaging services - Update Conference
A guide through the Azure Messaging services - Update Conference
 
Event Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data ArchitectureEvent Broker (Kafka) in a Modern Data Architecture
Event Broker (Kafka) in a Modern Data Architecture
 

Ähnlich wie Architecting Microservices Applications with Instant Analytics

Modularity and Domain Driven Design; a killer Combination? - Tom de Wolf & St...
Modularity and Domain Driven Design; a killer Combination? - Tom de Wolf & St...Modularity and Domain Driven Design; a killer Combination? - Tom de Wolf & St...
Modularity and Domain Driven Design; a killer Combination? - Tom de Wolf & St...
NLJUG
 

Ähnlich wie Architecting Microservices Applications with Instant Analytics (20)

Kubernetes Controller for Pull Request Based Environment
Kubernetes Controller for Pull Request Based EnvironmentKubernetes Controller for Pull Request Based Environment
Kubernetes Controller for Pull Request Based Environment
 
Lessons from running AppSync in prod
Lessons from running AppSync in prodLessons from running AppSync in prod
Lessons from running AppSync in prod
 
Tech Webinar: Angular 2, Introduction to a new framework
Tech Webinar: Angular 2, Introduction to a new frameworkTech Webinar: Angular 2, Introduction to a new framework
Tech Webinar: Angular 2, Introduction to a new framework
 
Modularity and Domain Driven Design; a killer Combination? - Tom de Wolf & St...
Modularity and Domain Driven Design; a killer Combination? - Tom de Wolf & St...Modularity and Domain Driven Design; a killer Combination? - Tom de Wolf & St...
Modularity and Domain Driven Design; a killer Combination? - Tom de Wolf & St...
 
ql.io at NodePDX
ql.io at NodePDXql.io at NodePDX
ql.io at NodePDX
 
The Very Very Latest in Database Development - Oracle Open World 2012
The Very Very Latest in Database Development - Oracle Open World 2012The Very Very Latest in Database Development - Oracle Open World 2012
The Very Very Latest in Database Development - Oracle Open World 2012
 
The Very Very Latest In Database Development - Lucas Jellema - Oracle OpenWor...
The Very Very Latest In Database Development - Lucas Jellema - Oracle OpenWor...The Very Very Latest In Database Development - Lucas Jellema - Oracle OpenWor...
The Very Very Latest In Database Development - Lucas Jellema - Oracle OpenWor...
 
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
Dissolving the Problem (Making an ACID-Compliant Database Out of Apache Kafka®)
 
Make streaming processing towards ANSI SQL
Make streaming processing towards ANSI SQLMake streaming processing towards ANSI SQL
Make streaming processing towards ANSI SQL
 
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...
 
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...
 
Monitoring as an entry point for collaboration
Monitoring as an entry point for collaborationMonitoring as an entry point for collaboration
Monitoring as an entry point for collaboration
 
Debugging Microservices - QCON 2017
Debugging Microservices - QCON 2017Debugging Microservices - QCON 2017
Debugging Microservices - QCON 2017
 
Android RenderScript on LLVM
Android RenderScript on LLVMAndroid RenderScript on LLVM
Android RenderScript on LLVM
 
Big Data Tools in AWS
Big Data Tools in AWSBig Data Tools in AWS
Big Data Tools in AWS
 
Introduction to Microservices by Jim Tran, Principal Solutions Architect, AWS
Introduction to Microservices by Jim Tran, Principal Solutions Architect, AWSIntroduction to Microservices by Jim Tran, Principal Solutions Architect, AWS
Introduction to Microservices by Jim Tran, Principal Solutions Architect, AWS
 
Big datadc skyfall_preso_v2
Big datadc skyfall_preso_v2Big datadc skyfall_preso_v2
Big datadc skyfall_preso_v2
 
My past-3 yeas-developer-journey-at-linkedin-by-iantsai
My past-3 yeas-developer-journey-at-linkedin-by-iantsaiMy past-3 yeas-developer-journey-at-linkedin-by-iantsai
My past-3 yeas-developer-journey-at-linkedin-by-iantsai
 
Tackle Containerization Advisor (TCA) for Legacy Applications
Tackle Containerization Advisor (TCA) for Legacy ApplicationsTackle Containerization Advisor (TCA) for Legacy Applications
Tackle Containerization Advisor (TCA) for Legacy Applications
 
Microservices on AWS: Divide & Conquer for Agility and Scalability
Microservices on AWS: Divide & Conquer for Agility and ScalabilityMicroservices on AWS: Divide & Conquer for Agility and Scalability
Microservices on AWS: Divide & Conquer for Agility and Scalability
 

Mehr von confluent

Mehr von confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Kürzlich hochgeladen

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Kürzlich hochgeladen (20)

Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Architecting Microservices Applications with Instant Analytics

  • 1. 1 Architecting Microservices Applications with Instant Analytics Tim Berglund, Sr. Director, Developer Experience, Confluent Rachel Pedreschi, Sr.l Director, Global Field Engineering, Imply Data
  • 2. 2 What the heck is Apache Kafka and Why Should I Care?
  • 3. K V
  • 4. K V
  • 5. K V
  • 6. K V
  • 7. K V
  • 8. K V
  • 9. … … … partition 0 partition 1 partition 2 Partitioned Topic producer
  • 10. consumer A … … … partition 0 partition 1 partition 2 Partitioned Topic
  • 11. consumer A consumer B … … … partition 0 partition 1 partition 2 Partitioned Topic
  • 12. consumer A consumer B … … … partition 0 partition 1 partition 2 Partitioned Topic
  • 13. consumer A consumer B … … … partition 0 partition 1 partition 2 Partitioned Topic consumer A
  • 14. consumer A consumer B … … … partition 0 partition 1 partition 2 Partitioned Topic consumer A consumer A
  • 17. consumer A consumer B … … … partition 0 partition 1 partition 2 Partitioned Topic consumer A consumer A
  • 20. public static void main(String args[]) { Properties streamsConfiguration = getProperties(SCHEMA_REGISTRY_URL); final Map<String, String> serdeConfig = Collections.singletonMap(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, SCHEMA_REGISTRY_URL); final SpecificAvroSerde<Movie> movieSerde = getMovieAvroSerde(serdeConfig); final SpecificAvroSerde<Rating> ratingSerde = getRatingAvroSerde(serdeConfig); final SpecificAvroSerde<RatedMovie> ratedMovieSerde = new SpecificAvroSerde<>(); ratingSerde.configure(serdeConfig, false); StreamsBuilder builder = new StreamsBuilder(); KTable<Long, Double> ratingAverage = getRatingAverageTable(builder); getRatedMoviesTable(builder, ratingAverage, movieSerde); Topology topology = builder.build(); KafkaStreams streams = new KafkaStreams(topology, streamsConfiguration); Runtime.getRuntime().addShutdownHook(new Thread(streams::close)); streams.start(); } private static SpecificAvroSerde<Rating> getRatingAvroSerde(Map<String, String> serdeConfig) { final SpecificAvroSerde<Rating> ratingSerde = new SpecificAvroSerde<>(); ratingSerde.configure(serdeConfig, false); return ratingSerde; } public static SpecificAvroSerde<Movie> getMovieAvroSerde(Map<String, String> serdeConfig) {
  • 21. } public static SpecificAvroSerde<Movie> getMovieAvroSerde(Map<String, String> serdeConfig) { final SpecificAvroSerde<Movie> movieSerde = new SpecificAvroSerde<>(); movieSerde.configure(serdeConfig, false); return movieSerde; } public static KTable<Long, String> getRatedMoviesTable(StreamsBuilder builder, KTable<Long, Double> ratingAverage, SpecificAvroSerde<Movie> movieSerde) { builder.stream("raw-movies", Consumed.with(Serdes.Long(), Serdes.String())) .mapValues(Parser::parseMovie) .map((key, movie) -> new KeyValue<>(movie.getMovieId(), movie)) .to("movies", Produced.with(Serdes.Long(), movieSerde)); KTable<Long, Movie> movies = builder.table("movies", Materialized .<Long, Movie, KeyValueStore<Bytes, byte[]>>as( "movies-store") .withValueSerde(movieSerde) .withKeySerde(Serdes.Long()) ); KTable<Long, String> ratedMovies = ratingAverage .join(movies, (avg, movie) -> movie.getTitle() + "=" + avg); ratedMovies.toStream().to("rated-movies", Produced.with(Serdes.Long(), Serdes.String())); return ratedMovies; } public static KTable<Long, Double> getRatingAverageTable(StreamsBuilder builder) { KStream<Long, String> rawRatings = builder.stream("raw-ratings", Consumed.with(Serdes.Long(), Serdes.String())); KStream<Long, Rating> ratings = rawRatings.mapValues(Parser::parseRating)
  • 22. return ratedMovies; } public static KTable<Long, Double> getRatingAverageTable(StreamsBuilder builder) { KStream<Long, String> rawRatings = builder.stream("raw-ratings", Consumed.with(Serdes.Long(), Serdes.String())); KStream<Long, Rating> ratings = rawRatings.mapValues(Parser::parseRating) .map((key, rating) -> new KeyValue<>(rating.getMovieId(), rating)); KStream<Long, Double> numericRatings = ratings.mapValues(Rating::getRating); KGroupedStream<Long, Double> ratingsById = numericRatings.groupByKey(); KTable<Long, Long> ratingCounts = ratingsById.count(); KTable<Long, Double> ratingSums = ratingsById.reduce((v1, v2) -> v1 + v2); KTable<Long, Double> ratingAverage = ratingSums.join(ratingCounts, (sum, count) -> sum / count.doubleValue(), Materialized.as("average-ratings")); ratingAverage.toStream() /*.peek((key, value) -> { // debug only System.out.println("key = " + key + ", value = " + value); })*/ .to("average-ratings"); return ratingAverage; }
  • 23. CREATE TABLE movie_ratings AS SELECT title, SUM(rating)/COUNT(rating) AS avg_rating, COUNT(rating) AS num_ratings FROM ratings LEFT OUTER JOIN movies ON ratings.movie_id = movies.movie_id GROUP BY title;
  • 26. orders shipping users warehouse order web UI users web UI druid product analytics
  • 27. 27 What the heck is Apache Druid and Why Should I Care?
  • 29. 29
  • 30. 30
  • 31. 31
  • 32. 32 Data Data Data Data Sources ETL Data Warehouse Some Code Usually an RDBMS Analytics Reporting Data mining Querying
  • 33. 33
  • 34. 34
  • 35. 35 Data Data Data Map/reduce Reporting and Analytics ELT Data Warehouse ML/AI Engine Search system Data Lake HDFSRDBMS / NoSQL
  • 36. 36
  • 38. 38
  • 39. 39
  • 40. 40 Typical Big Data++ Challenges ● Scale: when data is large, we need a lot of servers ● Speed: aiming for sub-second response time ● Complexity: too much fine grain to precompute ● High dimensionality: 10s or 100s of dimensions ● Concurrency: many users and tenants ● Freshness: load from streams
  • 41. 41 Search platform OLAP ! Real-time ingestion ! Flexible schema ! Full text search ! Batch ingestion ! Efficient storage ! Fast analytic queries Timeseries database ! Optimized storage for time-based datasets ! Time-based functions
  • 42. 42 ! Batch ingestion ! Efficient storage ! Fast analytic queries Search platform OLAP ! Real-time ingestion ! Flexible schema ! Full text search Timeseries database ! Optimized storage for time-based datasets ! Time-based functions high performance analytics database for event-driven data
  • 43. 43 Druid Use Cases in the Wild 1. Digital Advertising - Publishers, Advertisers, Exchanges 2. User Event Analytics- Clickstream, QoS, Usage 3. Network Telemetry 4. Lots and Lots of Data- IoT, Product Analytics, Fraud
  • 44. 44 Gratuitous Customer Quote “The performance is great ... some of the tables that we have internally in Druid have billions and billions of events in them, and we’re scanning them in under a second.” Source: https://www.infoworld.com/article/2949168/hadoop/yahoo-struts- its-hadoop-stuff.html
  • 45. orders shipping users warehouse order web UI users web UI druid product analytics
  • 46. 46
  • 47. 47 You can Druid too! Druid community site: https://druid.apache.org/ Imply distribution: https://imply.io/get-started Contribute: https://github.com/apache/druid
  • 48. 48