SlideShare ist ein Scribd-Unternehmen logo
1 von 33
DATA
LivePerson Case Study:
Real Time Data Streaming
March 20th 2014
Ran Silberman
About me
● Technical Leader of Data Platform in LivePerson
● Bird watcher and amateur bird photographer
Agenda
● Why we chose Kafka + Storm
● How implementation was done
● Measures of success
● Two examples of use
● Tips from our experience
Data in LivePerson
Visitor in Site
Chat Window
Agent console
LivePerson SaaS Server
LoginMonitor
Rules,
Intelligence,
Decision
Chat
Chat
Invite
DATA
DATA DATA
BIG
DATA
Legacy Data flow in LivePerson
BI DWH
(Oracle)
RealTime
servers
ETL
Sessionize
Modeling
Schema
View
Real-Time data
Historical data
Why Kafka + Storm?
● Need to scale out and plan for future scale
○ Limit for scale should not be technology
○ Let the limit be cost of (commodity) hardware
● What Data platforms can be implemented quickly?
○ Open source - fast evolving and community
○ Micro-services - do only what you ought to do!
● Are there risks in this choice?
○ Yes! technology is not mature enough
○ But, there is no other mature technology that can
address our needs!
Legacy Data flow in LivePerson
BI DWH
(Oracle)
RealTime
servers
Customers
ETL
Sessionize
Modeling
Schema
View
1st phase - move to Hadoop
ETL
Sessionize
Modeling
Schema
View
RealTime
servers
BI DWH
(Vertica)HDFS
Hadoop
MR Job transfers
data to BI DWH
Customers
BI DWH
(Oracle)
2. move to Kafka
6
RealTime
servers
HDFS
BI DWH
(Vertica)
Hadoop
MR Job transfers
data to BI DWH
Kafka
Topic-1
Customers
ETL
Modeling
Schema
View
Sessionize
3. Integrate with new producers
6
RealTime
servers
HDFS
BI DWH
(Vertica)
Hadoop
MR Job transfers
data to BI DWH
Kafka
Topic-1 Topic-2
New
RealTime
servers
Customers
4. Add Real-time BI
6
Customers
RealTime
servers
HDFS
BI DWH
(Vertica)
Hadoop
MR Job transfers
data to BI DWH
Kafka
Topic-1 Topic-2
New
RealTime
servers
Storm
Topology
Analytics
DB
Architecture
Real-time
servers
Kafka
Storm
Cassandra/
CouchBase
Real Time Processing
Flow rate
into Kafka:
33 MB/Sec
Flow rate
from Kafka:
20 MB/Sec
Total daily data
in Kafka:
17 Billion events
Some Numbers: Cyber Monday 2013
Dashboards
4 topologies
reading all
events
Two use cases
1. Visitor list
2. Agent State
1st Strom Use Case: “Visitors List”
Use case:
● Show list of visitors in the “Agent Console”
● Collect data about visitor in real time
● Visitor stickiness in streaming process
Visitors List Topology
Selected Analytics DB - Couchbase
1st Strom Use Case: “Visitors List”
● Document Store - for complex documents
● Searchable - possible to search by different
attributes.
● High throughput - Read & Write
First Storm Topology – Visitor Feed
Storm Topology
Kafka Spout Analyze relevant
events
Write event to
Visitor document
emit emit
Kafka events stream
Add/
Update
Couchbase
“Visitor List” Topology:
Analytics DB: Couchbase - Document store
Parse Avro into
tuple
emit
Visitors List - Storm considerations
● Complex calculations before sending to DB
○ Ignore delayed events
○ Reorder events before storing
● Document cached in memory
● Fields Grouping to bolt that writes to CouchBase
● High parallelism in bolt that writes to CouchBase
Visitors List Topology
2nd Storm Use Case: “Agent State”
Use case:
● Show Agent activity on “Agent Console”
● Count Agent statistics
● Display graphs
Agent Status Topology
Selected Analytics DB - Cassandra
2nd Storm Use Case: “Agent State”
● Wide Column Store DB
● Highly Available w/o Single point of failure
● High throughput
● Optimized for counters
First Storm Topology – Visitor Feed
Storm Topology
Kafka Spout Analyze relevant
events
Send events
emit emit
Kafka events stream
Add
“Agent Status” Topology:
Analytics DB: Cassandra - Document store
Parse Avro into
tuple
emit
Data
visualization
using Highcharts
Agent Status - Storm considerations
● Counters stored by topology
● Calculations done after reading from DB
● Delayed events should not be ignored
● Order of events does not matter
● Using Highcharts for data visualization
Challenges:
● High network traffic
● Writing to Kafka is faster than reading
● All topologies read all events
● How to avoid resource starvation in Storm
Optimizations of Kafka
● Increase Kafka consuming rate by adding partitions
● Run on physical machines with RAID
● Set retention to the proper need
● Monitor data flow!
Optimizations of Storm
● #of Kafka-Spouts = number of total partitions
● Set “Isolation mode” for important topologies
● Validate Network cards can carry network traffic
● Set Storm cluster on high CPU machines
● Monitor servers CPU & Memory (Graphite)
● Assess min. #Cores that topology needs
○ Use “top” -> “load” to find server load
Demo
● Agent Console - https://z1.le.liveperson.net/
71394613 / rans@liveperson.com
● My Site - http://birds-of-israel.weebly.com/
Questions?
Thank you!
ran.silberman@gmail.com

Weitere ähnliche Inhalte

Andere mochten auch

Kelly Introduction
Kelly IntroductionKelly Introduction
Kelly Introduction
HenryHe
 
Mehanicko Merenje Na Agli
Mehanicko Merenje Na AgliMehanicko Merenje Na Agli
Mehanicko Merenje Na Agli
donemkd
 
FGSQUARED Community Widgets
FGSQUARED Community WidgetsFGSQUARED Community Widgets
FGSQUARED Community Widgets
FGSQUARED
 
『シーエー・モバイルのスマートフォンへの取組』シーエー・モバイル山口氏
『シーエー・モバイルのスマートフォンへの取組』シーエー・モバイル山口氏『シーエー・モバイルのスマートフォンへの取組』シーエー・モバイル山口氏
『シーエー・モバイルのスマートフォンへの取組』シーエー・モバイル山口氏
Developers Summit
 

Andere mochten auch (14)

Graph QL Introduction
Graph QL IntroductionGraph QL Introduction
Graph QL Introduction
 
デブサミ2013【15-D-4】Opsから挑むDevOps
デブサミ2013【15-D-4】Opsから挑むDevOpsデブサミ2013【15-D-4】Opsから挑むDevOps
デブサミ2013【15-D-4】Opsから挑むDevOps
 
Kelly Introduction
Kelly IntroductionKelly Introduction
Kelly Introduction
 
Mehanicko Merenje Na Agli
Mehanicko Merenje Na AgliMehanicko Merenje Na Agli
Mehanicko Merenje Na Agli
 
Opensat
OpensatOpensat
Opensat
 
FGSQUARED Community Widgets
FGSQUARED Community WidgetsFGSQUARED Community Widgets
FGSQUARED Community Widgets
 
10 Of The Most Rough and Tough Warriors Throughout History
10 Of The Most Rough and Tough Warriors Throughout History10 Of The Most Rough and Tough Warriors Throughout History
10 Of The Most Rough and Tough Warriors Throughout History
 
Excalibur
ExcaliburExcalibur
Excalibur
 
Secret seo-tienpv
Secret seo-tienpvSecret seo-tienpv
Secret seo-tienpv
 
Copilaria
Copilaria Copilaria
Copilaria
 
Megasat
MegasatMegasat
Megasat
 
『シーエー・モバイルのスマートフォンへの取組』シーエー・モバイル山口氏
『シーエー・モバイルのスマートフォンへの取組』シーエー・モバイル山口氏『シーエー・モバイルのスマートフォンへの取組』シーエー・モバイル山口氏
『シーエー・モバイルのスマートフォンへの取組』シーエー・モバイル山口氏
 
Het belang van innovatie
Het belang van innovatie Het belang van innovatie
Het belang van innovatie
 
Arion
ArionArion
Arion
 

Mehr von LivePerson

Mehr von LivePerson (17)

Microservices on top of kafka
Microservices on top of kafkaMicroservices on top of kafka
Microservices on top of kafka
 
System Revolution- How We Did It
System Revolution- How We Did It System Revolution- How We Did It
System Revolution- How We Did It
 
Liveperson DLD 2015
Liveperson DLD 2015 Liveperson DLD 2015
Liveperson DLD 2015
 
Http 2: Should I care?
Http 2: Should I care?Http 2: Should I care?
Http 2: Should I care?
 
Mobile app real-time content modifications using websockets
Mobile app real-time content modifications using websocketsMobile app real-time content modifications using websockets
Mobile app real-time content modifications using websockets
 
Mobile SDK: Considerations & Best Practices
Mobile SDK: Considerations & Best Practices Mobile SDK: Considerations & Best Practices
Mobile SDK: Considerations & Best Practices
 
Apache Avro in LivePerson [Hebrew]
Apache Avro in LivePerson [Hebrew]Apache Avro in LivePerson [Hebrew]
Apache Avro in LivePerson [Hebrew]
 
Apache Avro and Messaging at Scale in LivePerson
Apache Avro and Messaging at Scale in LivePersonApache Avro and Messaging at Scale in LivePerson
Apache Avro and Messaging at Scale in LivePerson
 
Data compression in Modern Application
Data compression in Modern ApplicationData compression in Modern Application
Data compression in Modern Application
 
Support Office Hour Webinar - LivePerson API
Support Office Hour Webinar - LivePerson API Support Office Hour Webinar - LivePerson API
Support Office Hour Webinar - LivePerson API
 
SIP - Introduction to SIP Protocol
SIP - Introduction to SIP ProtocolSIP - Introduction to SIP Protocol
SIP - Introduction to SIP Protocol
 
Scalding: Reaching Efficient MapReduce
Scalding: Reaching Efficient MapReduceScalding: Reaching Efficient MapReduce
Scalding: Reaching Efficient MapReduce
 
Building Enterprise Level End-To-End Monitor System with Open Source Solution...
Building Enterprise Level End-To-End Monitor System with Open Source Solution...Building Enterprise Level End-To-End Monitor System with Open Source Solution...
Building Enterprise Level End-To-End Monitor System with Open Source Solution...
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
From a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonFrom a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePerson
 
How can A/B testing go wrong?
How can A/B testing go wrong?How can A/B testing go wrong?
How can A/B testing go wrong?
 
Introduction to Vertica (Architecture & More)
Introduction to Vertica (Architecture & More)Introduction to Vertica (Architecture & More)
Introduction to Vertica (Architecture & More)
 

Kürzlich hochgeladen

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Kürzlich hochgeladen (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

LivePerson Case Study: Real Time Data Streaming using Storm & Kafka

  • 1. DATA LivePerson Case Study: Real Time Data Streaming March 20th 2014 Ran Silberman
  • 2. About me ● Technical Leader of Data Platform in LivePerson ● Bird watcher and amateur bird photographer
  • 3. Agenda ● Why we chose Kafka + Storm ● How implementation was done ● Measures of success ● Two examples of use ● Tips from our experience
  • 4. Data in LivePerson Visitor in Site Chat Window Agent console LivePerson SaaS Server LoginMonitor Rules, Intelligence, Decision Chat Chat Invite DATA DATA DATA BIG DATA
  • 5. Legacy Data flow in LivePerson BI DWH (Oracle) RealTime servers ETL Sessionize Modeling Schema View Real-Time data Historical data
  • 6. Why Kafka + Storm? ● Need to scale out and plan for future scale ○ Limit for scale should not be technology ○ Let the limit be cost of (commodity) hardware ● What Data platforms can be implemented quickly? ○ Open source - fast evolving and community ○ Micro-services - do only what you ought to do! ● Are there risks in this choice? ○ Yes! technology is not mature enough ○ But, there is no other mature technology that can address our needs!
  • 7.
  • 8. Legacy Data flow in LivePerson BI DWH (Oracle) RealTime servers Customers ETL Sessionize Modeling Schema View
  • 9. 1st phase - move to Hadoop ETL Sessionize Modeling Schema View RealTime servers BI DWH (Vertica)HDFS Hadoop MR Job transfers data to BI DWH Customers BI DWH (Oracle)
  • 10. 2. move to Kafka 6 RealTime servers HDFS BI DWH (Vertica) Hadoop MR Job transfers data to BI DWH Kafka Topic-1 Customers ETL Modeling Schema View Sessionize
  • 11. 3. Integrate with new producers 6 RealTime servers HDFS BI DWH (Vertica) Hadoop MR Job transfers data to BI DWH Kafka Topic-1 Topic-2 New RealTime servers Customers
  • 12. 4. Add Real-time BI 6 Customers RealTime servers HDFS BI DWH (Vertica) Hadoop MR Job transfers data to BI DWH Kafka Topic-1 Topic-2 New RealTime servers Storm Topology Analytics DB
  • 13. Architecture Real-time servers Kafka Storm Cassandra/ CouchBase Real Time Processing Flow rate into Kafka: 33 MB/Sec Flow rate from Kafka: 20 MB/Sec Total daily data in Kafka: 17 Billion events Some Numbers: Cyber Monday 2013 Dashboards 4 topologies reading all events
  • 14.
  • 15. Two use cases 1. Visitor list 2. Agent State
  • 16. 1st Strom Use Case: “Visitors List” Use case: ● Show list of visitors in the “Agent Console” ● Collect data about visitor in real time ● Visitor stickiness in streaming process
  • 18. Selected Analytics DB - Couchbase 1st Strom Use Case: “Visitors List” ● Document Store - for complex documents ● Searchable - possible to search by different attributes. ● High throughput - Read & Write
  • 19. First Storm Topology – Visitor Feed Storm Topology Kafka Spout Analyze relevant events Write event to Visitor document emit emit Kafka events stream Add/ Update Couchbase “Visitor List” Topology: Analytics DB: Couchbase - Document store Parse Avro into tuple emit
  • 20. Visitors List - Storm considerations ● Complex calculations before sending to DB ○ Ignore delayed events ○ Reorder events before storing ● Document cached in memory ● Fields Grouping to bolt that writes to CouchBase ● High parallelism in bolt that writes to CouchBase
  • 22.
  • 23. 2nd Storm Use Case: “Agent State” Use case: ● Show Agent activity on “Agent Console” ● Count Agent statistics ● Display graphs
  • 25. Selected Analytics DB - Cassandra 2nd Storm Use Case: “Agent State” ● Wide Column Store DB ● Highly Available w/o Single point of failure ● High throughput ● Optimized for counters
  • 26. First Storm Topology – Visitor Feed Storm Topology Kafka Spout Analyze relevant events Send events emit emit Kafka events stream Add “Agent Status” Topology: Analytics DB: Cassandra - Document store Parse Avro into tuple emit Data visualization using Highcharts
  • 27. Agent Status - Storm considerations ● Counters stored by topology ● Calculations done after reading from DB ● Delayed events should not be ignored ● Order of events does not matter ● Using Highcharts for data visualization
  • 28. Challenges: ● High network traffic ● Writing to Kafka is faster than reading ● All topologies read all events ● How to avoid resource starvation in Storm
  • 29. Optimizations of Kafka ● Increase Kafka consuming rate by adding partitions ● Run on physical machines with RAID ● Set retention to the proper need ● Monitor data flow!
  • 30. Optimizations of Storm ● #of Kafka-Spouts = number of total partitions ● Set “Isolation mode” for important topologies ● Validate Network cards can carry network traffic ● Set Storm cluster on high CPU machines ● Monitor servers CPU & Memory (Graphite) ● Assess min. #Cores that topology needs ○ Use “top” -> “load” to find server load
  • 31. Demo ● Agent Console - https://z1.le.liveperson.net/ 71394613 / rans@liveperson.com ● My Site - http://birds-of-israel.weebly.com/