SlideShare ist ein Scribd-Unternehmen logo
1 von 35
Hunter Kelly
@retnuh
Real-Time Domain Rankings
with Kafka Streams
What are we doing?
● Discovering relevant domains in the Fashion Web
● Use modified HITS - a.k.a. Hubs & Authorities
Why use Kafka Streams?
● Why not Spark, Hadoop, Whatevs?
● Advantages of Kafka Streams
Show me the nitty gritty!
● Okay!
The Fashion Web
Curated list of Fashion related sites
● Initial lists from domain experts
● Support multiple languages
● What are the next “best” domains?
cosmopolitan
stylecaster
instyle
racked
harpersbazaar
elle
vogue
hypebeast
http://www.cs.cornell.edu/home/kleinber/auth.pdf
http://en.wikipedia.org/wiki/HITS_algorithm
* Hyperlink-Induced Topic Search, but no-one seems to call it that
Need Data-driven method
● Kleinberg’s HITS* algorithm
● Allows us to discover Hubs and Authorities
● Flattened to domain level
HITS in a nutshell
HITS in a nutshell
Adjacency Matrix
Image courtesy of
http://faculty.ycp.edu/~dbabcock/PastCourses/cs360/lectures/lecture15.html
HITS in a nutshell
● Why not just counts?
● If you point to good sites, you’re a better
Hub
● If good sites point to you, you’re a better
Authority
● Each iteration, the weights change, until
they reach convergence
Common Questions
● What about non-fashion domains?
● Why not PageRank?
Kafka Streams
Please tell me you aren’t using Kafka
to do Matrix multiplication!
Please tell me you aren’t using Kafka
to do Matrix multiplication!
No, I am not crazy enough* to use Kafka to do
matrix multiplication!
Please tell me you aren’t using Kafka
to do Matrix multiplication!
No, I am not crazy enough* to use Kafka to do
matrix multiplication!
*Although, I probably did spend too much
time thinking about it on the bus!
Why not use Map-Reduce?
Basically what it was invented for,
right?
You could use Hadoop...
You could use Hadoop...
You could use Spark…
● High infrastructure overhead if not using it
for anything else
● Bad initial experience w/ Spark Streaming
snapshotting and recovery
● Already using Kafka!
Why use Kafka Streams?
● Has primitives necessary for Map-Reduce
○ Map step groupBy groupByKey
○ Reduce step reduce aggregate
● Focus is on your data
not distributed computing machinery
● Streaming allows us to have
(near)
real-time, up-to-date data
The Nitty Gritty
Kafka Streams 101
● No explicit consumers/producers -
plumbing handled for you
● Topics still a fundamental communication
piece
● Think of individual datum flowing through
KStreams & KTables
Kafka Streams 101 - KSTreams
● Focus is on specific functional
transformations - map, filter, flatMap
● Also supports various flavours of joins with
other KStreams
● Usually created from one or more topics or
a transformation on another KStream
Kafka Streams 101 - KTable
● Still offers functional transforms, but on a
primary-keyed table
● Offers persistent storage
● Created from aggregations on a KStream
or transforms on other KTable
● The bridge between KStream and KTable
● Created by doing a groupBy or
groupByKey on a KStream
● Create KTables by doing reduce,
aggregate, count
Kafka Streams 101 - KGroupedStream
Output Topic
(log compacted)
KTable ops:
groupByKey,
reduce,
toStream
KStream
Input Topic
KStream ops:
flatMapDomain Link Extractor
Domain Reducer
HITS Calculator & API
Data Flow
Output Topic
(log compacted)
KTable ops:
groupByKey,
reduce,
toStream
KStream
Input Topic
KStream ops:
flatMapDomain Link Extractor
Domain Reducer
HITS Calculator & API
Output Topic
(log compacted)
KTable ops:
groupByKey,
reduce,
toStream
KStream
Input Topic
KStream ops:
flatMapDomain Link Extractor
Domain Reducer
HITS Calculator & API
Output Topic
(log compacted)
KTable ops:
groupByKey,
reduce,
toStream
KStream
Input Topic
KStream ops:
flatMapDomain Link Extractor
Domain Reducer
HITS Calculator & API
Current
Domain Link Extractor
Input Topic
Fetch Content
External Links
Resolve Links
Extract Domains
KStream
KStream ops:
flatMap
Extracted Link
Resolver
Domain Link Extractor
Input Topic
Fetch Content
External Links
Resolve Links
Extract Domains
Output
Topic
KStream ops:
flatMap
KStream ops:
flatMap
KStream ops:
flatMap
Domain Link Extractor
Input Topic
Fetch Content
External Links
Resolve Links
Extract Domains
KStream
Other Function(s)
KStream
KStream ops:
flatMap
KStream Topic Whatev’s
With Fork
So what’s
the point?
TL;DR
● Kafka Streams can help solve problems in
your application domain
● Focus on your data!
● Naturally decompose the problem into
flexible microservices
Hunter Kelly
@retnuh
https://github.com/retnuh

Weitere ähnliche Inhalte

Was ist angesagt?

Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
confluent
 

Was ist angesagt? (20)

Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
 
Follow the (Kafka) Streams
Follow the (Kafka) StreamsFollow the (Kafka) Streams
Follow the (Kafka) Streams
 
A Tour of Apache Kafka
A Tour of Apache KafkaA Tour of Apache Kafka
A Tour of Apache Kafka
 
Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Stores
 
Tradeoffs in Distributed Systems Design: Is Kafka The Best? (Ben Stopford and...
Tradeoffs in Distributed Systems Design: Is Kafka The Best? (Ben Stopford and...Tradeoffs in Distributed Systems Design: Is Kafka The Best? (Ben Stopford and...
Tradeoffs in Distributed Systems Design: Is Kafka The Best? (Ben Stopford and...
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
 
Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®
 
Exactly-once Data Processing with Kafka Streams - July 27, 2017
Exactly-once Data Processing with Kafka Streams - July 27, 2017Exactly-once Data Processing with Kafka Streams - July 27, 2017
Exactly-once Data Processing with Kafka Streams - July 27, 2017
 
Kafka Summit NYC 2017 - Apache Kafka in the Enterprise: What if it Fails?
Kafka Summit NYC 2017 - Apache Kafka in the Enterprise: What if it Fails? Kafka Summit NYC 2017 - Apache Kafka in the Enterprise: What if it Fails?
Kafka Summit NYC 2017 - Apache Kafka in the Enterprise: What if it Fails?
 
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
 
Cross the streams thanks to Kafka and Flink (Christophe Philemotte, Digazu) K...
Cross the streams thanks to Kafka and Flink (Christophe Philemotte, Digazu) K...Cross the streams thanks to Kafka and Flink (Christophe Philemotte, Digazu) K...
Cross the streams thanks to Kafka and Flink (Christophe Philemotte, Digazu) K...
 
Real-world Streaming Architectures
Real-world Streaming ArchitecturesReal-world Streaming Architectures
Real-world Streaming Architectures
 
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
 
Deploying Confluent Platform for Production
Deploying Confluent Platform for ProductionDeploying Confluent Platform for Production
Deploying Confluent Platform for Production
 
Deploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and KubernetesDeploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and Kubernetes
 
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
 
Spring Kafka beyond the basics - Lessons learned on our Kafka journey (Tim va...
Spring Kafka beyond the basics - Lessons learned on our Kafka journey (Tim va...Spring Kafka beyond the basics - Lessons learned on our Kafka journey (Tim va...
Spring Kafka beyond the basics - Lessons learned on our Kafka journey (Tim va...
 
Architecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructureArchitecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructure
 

Ähnlich wie Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams

Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...
Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...
Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...
Nitin Kumar
 

Ähnlich wie Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams (20)

Learn from HomeAway Hadoop Development and Operations Best Practices
Learn from HomeAway Hadoop Development and Operations Best PracticesLearn from HomeAway Hadoop Development and Operations Best Practices
Learn from HomeAway Hadoop Development and Operations Best Practices
 
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Go Big or Go Home: Approaching Kafka Replication at Scale
Go Big or Go Home: Approaching Kafka Replication at ScaleGo Big or Go Home: Approaching Kafka Replication at Scale
Go Big or Go Home: Approaching Kafka Replication at Scale
 
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Hadoop made fast - Why Virtual Reality Needed Stream Processing to SurviveHadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
 
Real time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and CouchbaseReal time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and Couchbase
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processing
 
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUpStrimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
 
Flink and Hive integration - unifying enterprise data processing systems
Flink and Hive integration - unifying enterprise data processing systemsFlink and Hive integration - unifying enterprise data processing systems
Flink and Hive integration - unifying enterprise data processing systems
 
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...Unify Enterprise Data Processing System Platform Level Integration of Flink a...
Unify Enterprise Data Processing System Platform Level Integration of Flink a...
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streams
 
Real time fraud detection at 1+M scale on hadoop stack
Real time fraud detection at 1+M scale on hadoop stackReal time fraud detection at 1+M scale on hadoop stack
Real time fraud detection at 1+M scale on hadoop stack
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
 
Integration for real-time Kafka SQL
Integration for real-time Kafka SQLIntegration for real-time Kafka SQL
Integration for real-time Kafka SQL
 
Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...
Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...
Kafka meetup seattle 2019 mirus reliable, high performance replication for ap...
 
Westpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache KafkaWestpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache Kafka
 

Mehr von confluent

Mehr von confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Kürzlich hochgeladen

Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
dollysharma2066
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Kürzlich hochgeladen (20)

Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 

Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams

  • 1. Hunter Kelly @retnuh Real-Time Domain Rankings with Kafka Streams
  • 2. What are we doing? ● Discovering relevant domains in the Fashion Web ● Use modified HITS - a.k.a. Hubs & Authorities Why use Kafka Streams? ● Why not Spark, Hadoop, Whatevs? ● Advantages of Kafka Streams Show me the nitty gritty! ● Okay!
  • 4. Curated list of Fashion related sites ● Initial lists from domain experts ● Support multiple languages ● What are the next “best” domains?
  • 6. http://www.cs.cornell.edu/home/kleinber/auth.pdf http://en.wikipedia.org/wiki/HITS_algorithm * Hyperlink-Induced Topic Search, but no-one seems to call it that Need Data-driven method ● Kleinberg’s HITS* algorithm ● Allows us to discover Hubs and Authorities ● Flattened to domain level
  • 7. HITS in a nutshell
  • 8. HITS in a nutshell Adjacency Matrix Image courtesy of http://faculty.ycp.edu/~dbabcock/PastCourses/cs360/lectures/lecture15.html
  • 9. HITS in a nutshell ● Why not just counts? ● If you point to good sites, you’re a better Hub ● If good sites point to you, you’re a better Authority ● Each iteration, the weights change, until they reach convergence
  • 10. Common Questions ● What about non-fashion domains? ● Why not PageRank?
  • 12. Please tell me you aren’t using Kafka to do Matrix multiplication!
  • 13. Please tell me you aren’t using Kafka to do Matrix multiplication! No, I am not crazy enough* to use Kafka to do matrix multiplication!
  • 14. Please tell me you aren’t using Kafka to do Matrix multiplication! No, I am not crazy enough* to use Kafka to do matrix multiplication! *Although, I probably did spend too much time thinking about it on the bus!
  • 15. Why not use Map-Reduce? Basically what it was invented for, right?
  • 16. You could use Hadoop...
  • 17. You could use Hadoop...
  • 18. You could use Spark… ● High infrastructure overhead if not using it for anything else ● Bad initial experience w/ Spark Streaming snapshotting and recovery ● Already using Kafka!
  • 19. Why use Kafka Streams? ● Has primitives necessary for Map-Reduce ○ Map step groupBy groupByKey ○ Reduce step reduce aggregate ● Focus is on your data not distributed computing machinery ● Streaming allows us to have (near) real-time, up-to-date data
  • 21. Kafka Streams 101 ● No explicit consumers/producers - plumbing handled for you ● Topics still a fundamental communication piece ● Think of individual datum flowing through KStreams & KTables
  • 22. Kafka Streams 101 - KSTreams ● Focus is on specific functional transformations - map, filter, flatMap ● Also supports various flavours of joins with other KStreams ● Usually created from one or more topics or a transformation on another KStream
  • 23. Kafka Streams 101 - KTable ● Still offers functional transforms, but on a primary-keyed table ● Offers persistent storage ● Created from aggregations on a KStream or transforms on other KTable
  • 24. ● The bridge between KStream and KTable ● Created by doing a groupBy or groupByKey on a KStream ● Create KTables by doing reduce, aggregate, count Kafka Streams 101 - KGroupedStream
  • 25. Output Topic (log compacted) KTable ops: groupByKey, reduce, toStream KStream Input Topic KStream ops: flatMapDomain Link Extractor Domain Reducer HITS Calculator & API Data Flow
  • 26. Output Topic (log compacted) KTable ops: groupByKey, reduce, toStream KStream Input Topic KStream ops: flatMapDomain Link Extractor Domain Reducer HITS Calculator & API
  • 27. Output Topic (log compacted) KTable ops: groupByKey, reduce, toStream KStream Input Topic KStream ops: flatMapDomain Link Extractor Domain Reducer HITS Calculator & API
  • 28. Output Topic (log compacted) KTable ops: groupByKey, reduce, toStream KStream Input Topic KStream ops: flatMapDomain Link Extractor Domain Reducer HITS Calculator & API
  • 29. Current Domain Link Extractor Input Topic Fetch Content External Links Resolve Links Extract Domains KStream KStream ops: flatMap
  • 30. Extracted Link Resolver Domain Link Extractor Input Topic Fetch Content External Links Resolve Links Extract Domains Output Topic KStream ops: flatMap KStream ops: flatMap KStream ops: flatMap
  • 31. Domain Link Extractor Input Topic Fetch Content External Links Resolve Links Extract Domains KStream Other Function(s) KStream KStream ops: flatMap KStream Topic Whatev’s With Fork
  • 33. TL;DR ● Kafka Streams can help solve problems in your application domain ● Focus on your data! ● Naturally decompose the problem into flexible microservices
  • 34.