Real-time serverless
analytics at Shedd
Overview and hands-on workshop
Dobo Radichkov
OLX Data Summit, March 2018
2
What to expect…
▪ Goal is to give you a sweeping view of the Shedd serverless real-time analytics stack
▪ We will cover a lot of new tools and tech building blocks, though we will steer clear of the nitty-gritty details
▪ Expect technical content and hands-on exercises – for the non-technical folk in the audience, try to focus on the high-level understanding of the concepts
▪ We hope the presentation gives you inspiration and smooths the learning curve in case you decide to pursue a similar approach
3
Contents
▪ Introduction
▪ Enabling technology
▪ Putting it all together
▪ Future direction
▪ Q&A
4
Why real-time analytics?
Offline VS Real-time
Enables products that adapt and respond to
changing user behaviour instantly and continuously
6
Example: Consider this insight regarding first-time Shedd users
Segment | Browser | Viewer | Buyer
Day 1 activity | Does not view any ads | Views 1 or more ads | Makes 1 or more replies
Days 2-30 activity | 2.9 ad views, 0.02 replies, 1.3 active days | 150 ad views, 0.4 replies, 4.7 active days | 670 ad views, 6.7 replies, 11.2 active days
How can real-time analytics help?
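The three day-1 segments can be expressed as a small rule. The sketch below is only an illustration: the `DayOneActivity` type and the precedence (a user with at least one reply counts as a Buyer regardless of ad views) are assumptions, not something the slide specifies.

```python
from dataclasses import dataclass


@dataclass
class DayOneActivity:
    """Hypothetical summary of a first-time user's first day."""
    ad_views: int
    replies: int


def segment(activity: DayOneActivity) -> str:
    """Map day-1 activity to the slide's Browser / Viewer / Buyer groups."""
    if activity.replies >= 1:
        return "Buyer"    # makes 1 or more replies
    if activity.ad_views >= 1:
        return "Viewer"   # views 1 or more ads
    return "Browser"      # does not view any ads
```

With real-time data this rule could run while the user is still in their first session, rather than in a batch job the next day.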
9
Real-time analytics unlocks a number of capabilities
▪ Segmentation: Segment user behaviour and build real-time single customer view
▪ Personalisation: Instantly personalise product experience based on up-to-date user preferences and behaviour
▪ Targeting: Target users with push notifications, in-app messaging and custom product flows based on real-time triggers and rules
▪ Reporting: Build mission-critical reports for real-time decision-making (e.g. during large live marketing campaigns or new product releases)
▪ A/B testing: Continuously optimise live A/B tests based on real-time results
▪ Data-driven products: Enable integration of data analytics & models within our products
10
Real-time analytics enables us to unlock the full value of data
The diminishing value of data
▪ Recent data is highly valuable, if you act on it in time
▪ Perishable Insights (M. Gualtieri, F…)
▪ Old + Recent data is more valuable, if you have the means to combine them
11
Today we will take a peek at Shedd’s real-time data stack
[Diagram labels]
BATCH DATA STACK
▪ Operational data layer (listings, replies, users, orders, etc.)
▪ Raw data layer (data lake)
▪ Tracking (Ninja / Hydra), Platform DB (Mongo), Adjust / Facebook / Google, …
▪ DATAWAREHOUSE
▪ BI, Segmentation, Performance marketing, CLM, Batch recommender, …
REAL-TIME DATA STACK
▪ Raw data streams: Tracking (Ninja / Hydra), Platform DB (Mongo), …
▪ Real-time data processing (λ functions)
▪ Real-time database (Online customer view)
▪ API gateway
▪ Real-time recommender, Real-time segmentation, Other real-time applications
13
We leverage 3 AWS building blocks for real-time data analytics
▪ KINESIS: Stream data
▪ LAMBDA: Process data
▪ ELASTICACHE: Store data
15
Kinesis includes 3 flavours
▪ Amazon Kinesis Data Streams (Stream → Process): Build custom applications that process and analyze streaming data
▪ Amazon Kinesis Data Analytics (Stream → Analyse): Easily process and analyze streaming data with standard SQL
▪ Amazon Kinesis Data Firehose (Stream → Ingest): Easily load streaming data into AWS
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
16
Kinesis Data Stream architecture
▪ 1 MB / sec data input per shard
▪ 2 MB / sec data output per shard
▪ 1000 records / sec written per shard
▪ 24 hours data retention (default)
▪ $0.015 / shard / hour ($10.80 / shard / month)
▪ $0.014 / 1M records ($14 / 1B records)
[Diagram: a stream is made up of shards. Producers write events / data records (e.g. JSON objects) to a stream shard; consumers read events from a stream shard.]
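As a rough sizing aid, the write limits and prices quoted on this slide can be turned into a small calculator. This is a sketch under stated assumptions: the constants are the 2018 slide figures (current AWS pricing may differ), a month is taken as 30 days, and the function names are hypothetical.

```python
import math

# Per-shard write limits and prices as quoted on the slide (2018 figures).
SHARD_MB_PER_SEC_IN = 1.0
SHARD_RECORDS_PER_SEC = 1000
PRICE_PER_SHARD_HOUR = 0.015
PRICE_PER_MILLION_RECORDS = 0.014
HOURS_PER_MONTH = 24 * 30  # assumes a 30-day month


def shards_needed(records_per_sec: float, avg_record_kb: float) -> int:
    """Smallest shard count that satisfies both write limits."""
    mb_per_sec = records_per_sec * avg_record_kb / 1024
    by_bytes = math.ceil(mb_per_sec / SHARD_MB_PER_SEC_IN)
    by_count = math.ceil(records_per_sec / SHARD_RECORDS_PER_SEC)
    return max(1, by_bytes, by_count)


def monthly_cost(records_per_sec: float, avg_record_kb: float) -> float:
    """Estimated monthly cost in USD: shard hours plus per-record charge."""
    shards = shards_needed(records_per_sec, avg_record_kb)
    shard_cost = shards * PRICE_PER_SHARD_HOUR * HOURS_PER_MONTH
    records_per_month = records_per_sec * 3600 * HOURS_PER_MONTH
    record_cost = records_per_month / 1_000_000 * PRICE_PER_MILLION_RECORDS
    return round(shard_cost + record_cost, 2)
```

For example, 2,500 small records per second is bound by the record-count limit rather than by bandwidth, so it needs 3 shards.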
17
Exercise: Create stream and feed with sample data
1. Create Kinesis data stream: https://us-west-2.console.aws.amazon.com/kinesis/home?region=us-west-2#/streams/create
2. Feed sample real-time data: https://awslabs.github.io/amazon-kinesis-data-generator/
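The exercise uses the console and the Kinesis Data Generator, but the same feed can be sketched in code. In the sketch below, the event fields, the stream name and the helper names are all hypothetical; `put_record` with `StreamName`, `Data` and `PartitionKey` is the boto3 Kinesis client call, and the client is passed in so the functions can be tried without AWS access.

```python
import json
import random
import time
import uuid


def make_event(user_id: str) -> dict:
    """Build one hypothetical tracking event; field names are illustrative."""
    return {
        "event_id": str(uuid.uuid4()),
        "user_id": user_id,
        "event_type": random.choice(["ad_view", "reply"]),
        "timestamp": int(time.time()),
    }


def send_events(client, stream_name: str, events: list) -> int:
    """Write events one by one with put_record and return how many were sent.

    `client` is expected to behave like boto3.client("kinesis").
    The partition key determines the target shard, so keying by user_id
    routes one user's events to the same shard while the shard layout
    stays unchanged.
    """
    for event in events:
        client.put_record(
            StreamName=stream_name,
            Data=json.dumps(event).encode("utf-8"),
            PartitionKey=event["user_id"],
        )
    return len(events)


# Against a real stream (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("kinesis", region_name="us-west-2")
#   send_events(client, "workshop-stream", [make_event("user-1")])
```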
18
Kinesis Analytics enables real-time data analysis,
transformation, enrichment and visualisation
19
Exercise: Create Kinesis Analytics application and run some
real-time SQL analysis
1. Create Kinesis Analytics app 2. Run real-time SQL analysis
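Kinesis Analytics expresses this kind of analysis in SQL over the stream. As a plain-Python analogue of a typical windowed aggregate (not the Kinesis Analytics API), the sketch below counts events per fixed time window; the event fields `timestamp` and `event_type` are assumptions.

```python
from collections import Counter


def tumbling_counts(events, window_seconds=60):
    """Count events per (window start, event type).

    Each event is a dict with 'timestamp' (epoch seconds) and 'event_type'.
    Windows are fixed, non-overlapping intervals of `window_seconds`.
    """
    counts = Counter()
    for event in events:
        window_start = event["timestamp"] - event["timestamp"] % window_seconds
        counts[(window_start, event["event_type"])] += 1
    return dict(counts)
```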
21
Evolution of computing models
▪ ON-PREMISE: Physical servers
▪ SERVER as a service: Virtual server in the cloud (Amazon EC2)
▪ APP as a service: Virtual app container (Amazon ECS)
▪ FUNCTION as a service: Serverless computing (AWS Lambda)
22
Lambda is Amazon’s serverless event-driven compute service
▪ Write code in Python, Node.js, Java, and others and upload to Lambda
▪ Trigger code from other AWS services, HTTP endpoints or in-app activity
▪ Scale seamlessly and elastically with number of events, only using required compute resource
▪ Only pay for the compute time used (per 100ms execution time)
Forget about infrastructure, administration and scaling – focus 100% on your app logic
23
Exercise: Let’s create 2 simple Lambda functions
1. Create Hello World 2. Create stream processor
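The two functions in this exercise could look roughly like the sketch below. The handler names and the return value of the stream processor are illustrative choices; the one fixed detail is that Lambda delivers Kinesis payloads base64-encoded under `event["Records"][i]["kinesis"]["data"]`.

```python
import base64
import json


def hello_handler(event, context):
    """Simplest possible Lambda: return a greeting."""
    return "Hello World"


def stream_handler(event, context):
    """Decode the batch of Kinesis records that Lambda passes in."""
    decoded = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        decoded.append(json.loads(payload))
    # A real processor would update a datastore here; this sketch only
    # reports what it received.
    return {"processed": len(decoded), "events": decoded}
```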
24
Combining Lambda with API gateway empowers the data
professional to create serverless APIs
25
The Serverless Framework streamlines and automates deployment
26
Exercise: Create APIs with serverless + API gateway + Lambda
1. Create Hello World endpoint 2. Create mock API endpoint
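A Hello World endpoint behind API Gateway might be sketched as follows, assuming the Lambda proxy integration, where the handler returns a dict with `statusCode`, `headers` and a string `body`. The `name` query parameter and the handler name are hypothetical.

```python
import json


def hello_endpoint(event, context):
    """Handler for an API Gateway Lambda proxy integration."""
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "World")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello {name}"}),
    }
```

A mock API endpoint follows the same shape: return canned JSON in `body` until the real data source is wired in.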
28
ElastiCache is Amazon’s managed service for Redis:
an INSANELY fast in-memory key-value database
▪ In-memory
▪ Low latency
▪ Ridiculously fast
▪ NoSQL → key-value store
▪ Open source
29
Redis + Redshift =
Redshift
▪ Run few queries infrequently
▪ Process billions of records per query
▪ Standard SQL
▪ Batch
Redis
▪ Run millions of commands continuously
▪ Process few records per command
▪ 200 Redis commands + Lua scripting
▪ Real-time
30
Redis is a key-value store supporting 5 basic data types
Key => { Data Structures }
▪ String: Strings/Blobs/Bitmaps, e.g. "I'm a Plain Text String!"
▪ Hash: Hash Tables (objects!), e.g. Key1 => Val1, Key2 => Val2
▪ List: Linked Lists, e.g. C B B A C
▪ Set: Sets, e.g. A B C D
▪ Sorted set: Sorted Sets, e.g. A: 0.1, B: 0.3, C: 500, D: 500
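To make the five types concrete without a running Redis server, the sketch below models each one with a plain Python structure and names the matching Redis commands in comments. It is an analogy only: the key names are made up, and a real client would send the commands to Redis instead of using a local dict.

```python
# Python stand-ins for the five Redis types.
store = {}

# String: SET greeting "..." / GET greeting
store["greeting"] = "I'm a Plain Text String!"

# Hash: HSET user:1 Key1 Val1 Key2 Val2 / HGET user:1 Key1
store["user:1"] = {"Key1": "Val1", "Key2": "Val2"}

# List: RPUSH recent C B B A C / LRANGE recent 0 -1 (ordered, duplicates kept)
store["recent"] = ["C", "B", "B", "A", "C"]

# Set: SADD tags A B C D / SMEMBERS tags (unordered, unique members)
store["tags"] = {"A", "B", "C", "D"}

# Sorted set: ZADD scores 0.1 A 0.3 B 500 C 500 D / ZRANGE scores 0 -1
store["scores"] = {"A": 0.1, "B": 0.3, "C": 500, "D": 500}


def zrange(sorted_set: dict) -> list:
    """Members ordered by score, ties broken by member name, as ZRANGE does."""
    ordered = sorted(sorted_set.items(), key=lambda kv: (kv[1], kv[0]))
    return [member for member, _score in ordered]
```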
31
Exercise: Let’s have a look at Redis in action
1. Play with Redis commands 2. Test Redis speed
32
Recap: We covered the 3 AWS building blocks for real-time data
▪ KINESIS: Stream data
▪ LAMBDA: Process data
▪ ELASTICACHE: Store data
34
Real-time vs offline data stacks
 | Offline stack | Real-time stack
Raw data | Files on S3 | Kinesis streams
Database | Redshift | Redis
Volume | High – processing millions / billions of records at the same time | Low – processing single records at a time
Velocity | Low – running few queries at a time | High – running thousands / millions of queries at the same time
Query language | SQL | Python + Redis commands
End-user | Humans, BI tools | Lambda, APIs, products
35
Shedd end-to-end data stack architecture
[Diagram labels]
BATCH DATA STACK
▪ Operational data layer (listings, replies, users, orders, etc.)
▪ Raw data layer (data lake)
▪ Tracking (Ninja / Hydra), Platform DB (Mongo), Adjust / Facebook / Google, …
▪ DATAWAREHOUSE
▪ BI, Segmentation, Performance marketing, CLM, Batch recommender, …
REAL-TIME DATA STACK
▪ Raw data streams: Tracking (Ninja / Hydra), Platform DB (Mongo), …
▪ Real-time data processing (λ functions)
▪ Real-time database (Online customer view)
▪ API gateway
▪ Real-time recommender, Real-time segmentation, Other real-time applications
36
Example: Shedd real-time recommendations
FRONTEND
▪ Shedd app (Android / iOS SDK)
TRACKING
▪ Shedd app (Ninja)
▪ Hydra tracker (EC2)
▪ Platform DB (Mongo)
BACKEND
▪ API: Endpoint(s) (API gateway)
▪ Recommendation service orchestrator (Lambda)
▪ Event stream (Kinesis)
▪ Event processor (Lambda)
▪ Online customer view (ElastiCache (Redis))
▪ Segmentation API (Lambda), labelled Kingsman service
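The flow in this diagram (event processor updates the online customer view, the orchestrator reads it to serve recommendations) can be sketched end to end. Everything below is hypothetical: an in-memory dict stands in for ElastiCache (Redis), and the "most recently viewed category" rule is a placeholder, not Shedd's actual recommender logic.

```python
from collections import defaultdict, deque

# In-memory stand-in for the online customer view held in ElastiCache (Redis):
# per user, the 20 most recently viewed ad categories, newest first.
customer_view = defaultdict(lambda: deque(maxlen=20))


def process_event(event: dict) -> None:
    """Event processor step: record the category of each ad view."""
    if event["event_type"] == "ad_view":
        customer_view[event["user_id"]].appendleft(event["category"])


def recommend(user_id: str, catalogue: list, limit: int = 3) -> list:
    """Orchestrator step: suggest ads from the most recently viewed category."""
    recent = customer_view.get(user_id)
    if not recent:
        return catalogue[:limit]  # fallback for users with no history
    favourite = recent[0]
    return [ad for ad in catalogue if ad["category"] == favourite][:limit]
```

Because the view is updated per event, a recommendation request made seconds after an ad view already reflects it.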
38
Example: Shedd analytics APIs
FRONTEND
▪ Shedd app (Android / iOS SDK)
TRACKING
▪ Shedd app (Ninja)
▪ Hydra tracker (EC2)
▪ Platform DB (Mongo)
BACKEND
▪ API: Endpoint(s) (API gateway)
▪ Analytics API handler (Lambda)
▪ Data warehouse (Redshift)
▪ Redis bulk loader (Lambda)
▪ Online customer view (ElastiCache (Redis))
Thank you
Questions? Feedback?
Dobo Radichkov
Analytics summit, Jan 2018

A Year of the Servo Reboot: Where Are We Now?
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Real-time serverless analytics at Shedd – OLX data summit, Mar 2018, Barcelona

  • 1. Real-time serverless analytics at Shedd: Overview and hands-on workshop. Dobo Radichkov, OLX Data Summit, March 2018
  • 2. What to expect… ▪ Goal is to give you a sweeping view of the Shedd serverless real-time analytics stack ▪ We will cover a lot of new tools and tech building blocks, though we will steer clear of the nitty-gritty details ▪ Expect technical content and hands-on exercises – for the non-technical folk in the audience, try to focus on the high-level understanding of the concepts ▪ We hope the presentation gives you inspiration and smoothens the learning curve in case you decide to pursue a similar approach
  • 3. Contents ▪ Introduction ▪ Enabling technology ▪ Putting it all together ▪ Future direction ▪ Q&A
  • 5. Why real-time analytics? Offline vs real-time: real-time analytics enables products that adapt and respond to changing user behaviour instantly and continuously
  • 6. Example: Consider this insight regarding first-time Shedd users. Day 1 activity defines three segments: ▪ Browser: does not view any ads ▪ Viewer: views 1 or more ads ▪ Buyer: makes 1 or more replies
  • 7. Example: Consider this insight regarding first-time Shedd users. Days 2-30 activity by day 1 segment: ▪ Browser (does not view any ads): 2.9 ad views, 0.02 replies, 1.3 active days ▪ Viewer (views 1 or more ads): 150 ad views, 0.4 replies, 4.7 active days ▪ Buyer (makes 1 or more replies): 670 ad views, 6.7 replies, 11.2 active days
  • 8. Example: Consider this insight regarding first-time Shedd users. Days 2-30 activity by day 1 segment: ▪ Browser (does not view any ads): 2.9 ad views, 0.02 replies, 1.3 active days ▪ Viewer (views 1 or more ads): 150 ad views, 0.4 replies, 4.7 active days ▪ Buyer (makes 1 or more replies): 670 ad views, 6.7 replies, 11.2 active days. How can real-time analytics help?
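
The day 1 segmentation in this example can be written as a small rule. The following is a minimal sketch, not the Shedd implementation: the function name `day1_segment` and its parameters are hypothetical, and it assumes a reply takes precedence over ad views when a user has both.

```python
def day1_segment(ad_views: int, replies: int) -> str:
    """Classify a first-time user by day 1 activity.

    Segment names come from the slide; the precedence rule
    (replies checked before ad views) is an assumption.
    """
    if replies >= 1:
        return "Buyer"    # makes 1 or more replies
    if ad_views >= 1:
        return "Viewer"   # views 1 or more ads
    return "Browser"      # does not view any ads


if __name__ == "__main__":
    for views, replies in [(0, 0), (3, 0), (12, 2)]:
        print(views, replies, day1_segment(views, replies))
```

Evaluating such a rule on live events rather than in a nightly batch is what would let a product react to a user while they are still in their first session.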
  • 9. Real-time analytics unlocks a number of capabilities ▪ Segmentation: segment user behaviour and build real-time single customer view ▪ Personalisation: instantly personalise product experience based on up-to-date user preferences and behaviour ▪ Targeting: target users with push notifications, in-app messaging and custom product flows based on real-time triggers and rules ▪ Reporting: build mission-critical reports for real-time decision-making (e.g. during large live marketing campaigns or new product releases) ▪ A/B testing: continuously optimise live A/B tests based on real-time results ▪ Data-driven products: enable integration of data analytics & models within our products
  • 10. Real-time analytics enables us to unlock the full value of data. The diminishing value of data ▪ Recent data is highly valuable, if you act on it in time ▪ Old + recent data is more valuable, if you have the means to combine t… ▪ Source: Perishable Insights (M. Gualtieri, F…
  • 11. Today we will take a peek at Shedd’s real-time data stack ▪ BATCH DATA STACK: Tracking (Ninja / Hydra), Platform DB (Mongo), Adjust / Facebook / Google, … ▪ Raw data layer (data lake) ▪ Operational data layer (listings, replies, users, orders, etc.) ▪ DATAWAREHOUSE ▪ BI, Segmentation, Performance marketing, CLM, Batch recommender, … ▪ REAL-TIME DATA STACK: Tracking (Ninja / Hydra), Platform DB (Mongo), … ▪ Raw data streams ▪ Real-time data processing (λ functions) ▪ Real-time database (Online customer view) ▪ API gateway ▪ Real-time recommender, Real-time segmentation, Other real-time applications
  • 12. Contents ▪ Introduction ▪ Enabling technology ▪ Putting it all together ▪ Future direction ▪ Q&A
  • 13. We leverage 3 AWS building blocks for real-time data analytics ▪ KINESIS: stream data ▪ LAMBDA: process data ▪ ELASTICACHE: store data
  • 14. We leverage 3 AWS building blocks for real-time data analytics ▪ KINESIS: stream data ▪ LAMBDA: process data ▪ ELASTICACHE: store data
  • 15. Kinesis includes 3 flavours ▪ Amazon Kinesis Data Streams (Stream → Process): build custom applications that process and analyze streaming data ▪ Amazon Kinesis Data Analytics (Stream → Analyse): easily process and analyze streaming data with standard SQL ▪ Amazon Kinesis Data Firehose (Stream → Ingest): easily load streaming data into AWS. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 16. Kinesis Data Stream architecture: a stream consists of shards; events / data records (e.g. JSON objects) are written to a stream shard and read from a stream shard ▪ 1 MB / sec data input ▪ 1 MB / sec data output ▪ 1000 records / sec ▪ 24 hours data retention ▪ $0.015 / shard / hour ($10.80 / shard / month) ▪ $0.014 / 1M records ($14 / 1B records)
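
To make the capacity and pricing figures on this slide concrete, here is a rough back-of-the-envelope sketch. It uses only the numbers quoted on the slide and assumes they apply per shard; the function names are hypothetical, a month is taken as 30 days (which reproduces the $10.80 figure), and each record is assumed to be billed as one unit. Actual AWS pricing varies by region and over time.

```python
import math

# Figures taken from the slide (assumed to apply per shard).
MB_IN_PER_SHARD = 1.0            # MB / sec data input
RECORDS_PER_SHARD = 1000         # records / sec
USD_PER_SHARD_HOUR = 0.015
USD_PER_MILLION_RECORDS = 0.014


def shards_needed(records_per_sec: float, avg_record_kb: float) -> int:
    """Smallest shard count that satisfies both the MB/sec and records/sec input limits."""
    mb_per_sec = records_per_sec * avg_record_kb / 1024
    by_volume = math.ceil(mb_per_sec / MB_IN_PER_SHARD)
    by_count = math.ceil(records_per_sec / RECORDS_PER_SHARD)
    return max(1, by_volume, by_count)


def monthly_cost_usd(records_per_sec: float, avg_record_kb: float, days: int = 30) -> float:
    """Shard-hour cost plus per-record cost for a steady event rate."""
    shards = shards_needed(records_per_sec, avg_record_kb)
    shard_cost = shards * USD_PER_SHARD_HOUR * 24 * days
    records = records_per_sec * 86_400 * days
    record_cost = records / 1_000_000 * USD_PER_MILLION_RECORDS
    return shard_cost + record_cost


if __name__ == "__main__":
    print(shards_needed(2500, 0.5))
    print(round(monthly_cost_usd(1000, 1), 2))
```

Under these assumptions the record charge dominates the shard charge once a shard is kept reasonably busy.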
  • 17. Exercise: Create stream and feed with sample data ▪ 1. Create Kinesis data stream: https://us-west-2.console.aws.amazon.com/kinesis/home?region=us-west-2#/streams/create ▪ 2. Feed sample real-time data: https://awslabs.github.io/amazon-kinesis-data-generator/
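
For readers without console access, the sketch below imitates the exercise locally: it generates sample JSON events and shows how a partition key selects a shard. Kinesis maps partition keys to shards through an MD5 hash into a 128-bit key space; the event fields (`user_id`, `event`, `ad_id`) are invented for illustration, and the sketch assumes the shards split the hash key space evenly.

```python
import hashlib
import json
import random


def sample_event(rng: random.Random) -> dict:
    """Build one fake tracking event (field names are hypothetical)."""
    return {
        "user_id": f"user-{rng.randint(1, 50)}",
        "event": rng.choice(["ad_view", "reply"]),
        "ad_id": rng.randint(1000, 1999),
    }


def shard_for(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index via its MD5 hash.

    Assumes the 128-bit hash key space is divided evenly between shards.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    range_size = 2**128 // shard_count
    return min(hash_key // range_size, shard_count - 1)


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(5):
        event = sample_event(rng)
        print(shard_for(event["user_id"], 4), json.dumps(event))
```

Using the user identifier as the partition key keeps all events of one user on the same shard, so they are read in order.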
  • 18. Kinesis Analytics enables real-time data analysis, transformation, enrichment and visualisation
  • 19. Exercise: Create Kinesis Analytics application and run some real-time SQL analysis ▪ 1. Create Kinesis Analytics app ▪ 2. Run real-time SQL analysis
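
The slides do not show the SQL used in the exercise. As an illustration of the kind of analysis a streaming SQL query typically performs, the sketch below counts events per type in fixed (tumbling) time windows, written in plain Python rather than SQL. The function name, the `(timestamp, event_type)` input shape and the 10-second window are assumptions.

```python
from collections import Counter, defaultdict


def tumbling_counts(events, window_sec=10):
    """Count events per type in fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_seconds, event_type) pairs.
    Returns {window_start: {event_type: count}} ordered by window start.
    """
    windows = defaultdict(Counter)
    for ts, event_type in events:
        window_start = int(ts // window_sec) * window_sec
        windows[window_start][event_type] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


if __name__ == "__main__":
    stream = [(0.5, "ad_view"), (3.0, "reply"), (9.9, "ad_view"), (10.0, "ad_view")]
    print(tumbling_counts(stream))
```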
  • 20. We leverage 3 AWS building blocks for real-time data analytics ▪ KINESIS: stream data ▪ LAMBDA: process data ▪ ELASTICACHE: store data
  • 21. Evolution of computing models ▪ ON-PREMISE: physical servers ▪ SERVER as a service: virtual server in the cloud (Amazon EC2) ▪ APP as a service: virtual app container (Amazon ECS) ▪ FUNCTION as a service: serverless computing (AWS Lambda)
  • 22. Lambda is Amazon’s serverless event-driven compute service ▪ Write code in Python, Node.js, Java, and others and upload to Lambda ▪ Trigger code from other AWS services, HTTP endpoints or in-app activity ▪ Scale seamlessly and elastically with number of events, only using required compute resource ▪ Only pay for the compute time used (per 100ms execution time) ▪ Forget about infrastructure, administration and scaling – focus 100% on your app logic
  • 23. Exercise: Let’s create 2 simple Lambda functions ▪ 1. Create Hello World ▪ 2. Create stream processor
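
The workshop code for these two functions is not part of the transcript. A minimal sketch of what they might look like in Python follows. The handler names are hypothetical; the event shape used by the stream processor (a `Records` list whose items carry base64-encoded data under `kinesis.data`) is the format Lambda receives from a Kinesis trigger, while the `event` field inside each payload is an assumption about the sample data.

```python
import base64
import json


def hello_handler(event, context):
    """Simplest possible Lambda function."""
    return "Hello World"


def stream_handler(event, context):
    """Decode Kinesis records and count them by event name."""
    counts = {}
    for record in event.get("Records", []):
        # Kinesis delivers each payload base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        data = json.loads(payload)
        name = data.get("event", "unknown")  # "event" is an assumed field
        counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    def encode(obj):
        return base64.b64encode(json.dumps(obj).encode("utf-8")).decode("ascii")

    fake_event = {"Records": [{"kinesis": {"data": encode({"event": "ad_view"})}}]}
    print(hello_handler({}, None))
    print(stream_handler(fake_event, None))
```

Both handlers can be invoked locally with a hand-built event, as in the `__main__` block, before being uploaded.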
  • 24. Combining Lambda with API gateway empowers the data professional to create serverless APIs
  • 25. The serverless framework streamlines and automates deployment
  • 26. Exercise: Create APIs with serverless + API gateway + Lambda ▪ 1. Create Hello World endpoint ▪ 2. Create mock API endpoint
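
A mock API endpoint of the kind built in this exercise could look like the sketch below. It assumes API Gateway's Lambda proxy integration, where the handler receives the query string under `queryStringParameters` and returns a dict with `statusCode`, `headers` and a string `body`. The handler name, the `user_id` parameter and the hard-coded response are hypothetical.

```python
import json

JSON_HEADERS = {"Content-Type": "application/json"}


def mock_api_handler(event, context):
    """Return a canned JSON response for GET ?user_id=..."""
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    user_id = params.get("user_id")
    if not user_id:
        return {
            "statusCode": 400,
            "headers": JSON_HEADERS,
            "body": json.dumps({"error": "user_id is required"}),
        }
    # Mock payload: a real endpoint would look the user up in a data store.
    body = {"user_id": user_id, "segment": "Viewer"}
    return {"statusCode": 200, "headers": JSON_HEADERS, "body": json.dumps(body)}


if __name__ == "__main__":
    print(mock_api_handler({"queryStringParameters": {"user_id": "42"}}, None))
```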
  • 27. We leverage 3 AWS building blocks for real-time data analytics ▪ KINESIS: stream data ▪ LAMBDA: process data ▪ ELASTICACHE: store data
  • 28. ElastiCache is Amazon’s managed service for Redis: an INSANELY fast in-memory key-value database ▪ In-memory ▪ Low latency ▪ Ridiculously fast ▪ NoSQL → key-value store ▪ Open source
  • 29. Redis + Redshift ▪ Redshift: run few queries infrequently; process billions of records per query; standard SQL; batch ▪ Redis: run millions of commands continuously; process few records per command; 200 Redis commands + Lua scripting; real-time
  • 30. Redis is a key-value store supporting 5 basic data types (Key => { Data Structures }) ▪ String (strings / blobs / bitmaps), e.g. "I'm a Plain Text String!" ▪ Hash (hash tables, i.e. objects), e.g. Key1 → Val1, Key2 → Val2 ▪ List (linked lists), e.g. C, B, B, A, C ▪ Set, e.g. A, B, C, D ▪ Sorted set, e.g. A: 0.1, B: 0.3, C: 500, D: 500
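
To get a feel for the five data types without a running Redis server, the sketch below mirrors each one with a Python built-in. This is an in-process analogy, not a Redis client: the `store` dict and the `zrange` helper are made up for illustration, and the Redis command named in each comment is the rough counterpart of the Python operation next to it.

```python
from collections import deque

store = {}  # stands in for the Redis keyspace: key => data structure

# String: SET greeting "..." / GET greeting
store["greeting"] = "I'm a Plain Text String!"

# Hash: HSET user:1 Key1 Val1 Key2 Val2 / HGET user:1 Key1
store["user:1"] = {"Key1": "Val1", "Key2": "Val2"}

# List: LPUSH recent A, then B, then C (newest item ends up first)
store["recent"] = deque()
for item in ["A", "B", "C"]:
    store["recent"].appendleft(item)

# Set: SADD tags A B C D (duplicates are ignored)
store["tags"] = {"A", "B", "C", "D", "A"}

# Sorted set: ZADD scores 0.1 A 0.3 B 500 C 500 D
store["scores"] = {"A": 0.1, "B": 0.3, "C": 500, "D": 500}


def zrange(scores: dict) -> list:
    """Members ordered by score, ties broken by member name (as ZRANGE does)."""
    return [m for m, _ in sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))]


if __name__ == "__main__":
    print(list(store["recent"]))
    print(sorted(store["tags"]))
    print(zrange(store["scores"]))
```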
  • 31. Exercise: Let’s have a look at Redis in action ▪ 1. Play with Redis commands ▪ 2. Test Redis speed
  • 32. Recap: We covered the 3 AWS building blocks for real-time data ▪ KINESIS: stream data ▪ LAMBDA: process data ▪ ELASTICACHE: store data
  • 33. Contents ▪ Introduction ▪ Enabling technology ▪ Putting it all together ▪ Future direction ▪ Q&A
  • 34. Real-time vs offline data stacks (offline stack vs real-time stack)
      ▪ Raw data: files on S3 vs Kinesis streams
      ▪ Database: Redshift vs Redis
      ▪ Volume: high (processing millions / billions of records at the same time) vs low (processing single records at a time)
      ▪ Velocity: low (running few queries at a time) vs high (running thousands / millions of queries at the same time)
      ▪ Query language: SQL vs Python + Redis commands
      ▪ End-user: humans, BI tools vs Lambda, APIs, products
  • 35. Shedd end-to-end data stack architecture ▪ BATCH DATA STACK: Tracking (Ninja / Hydra), Platform DB (Mongo), Adjust / Facebook / Google, … ▪ Raw data layer (data lake) ▪ Operational data layer (listings, replies, users, orders, etc.) ▪ DATAWAREHOUSE ▪ BI, Segmentation, Performance marketing, CLM, Batch recommender, … ▪ REAL-TIME DATA STACK: Tracking (Ninja / Hydra), Platform DB (Mongo), … ▪ Raw data streams ▪ Real-time data processing (λ functions) ▪ Real-time database (Online customer view) ▪ API gateway ▪ Real-time recommender, Real-time segmentation, Other real-time applications
  • 36. Example: Shedd real-time recommendations ▪ FRONTEND: Shedd app (Android / iOS SDK) ▪ API: Endpoint(s) (API gateway), Recommendation service orchestrator (Lambda) ▪ BACKEND: Event stream (Kinesis), Event processor (Lambda), Online customer view (ElastiCache (Redis)) ▪ TRACKING: Shedd app (Ninja), Hydra tracker (EC2), Platform DB (Mongo)
  • 37. Example: Shedd real-time recommendations ▪ FRONTEND: Shedd app (Android / iOS SDK) ▪ API: Endpoint(s) (API gateway), Recommendation service orchestrator (Lambda) ▪ BACKEND: Event stream (Kinesis), Event processor (Lambda), Online customer view (ElastiCache (Redis)) ▪ TRACKING: Shedd app (Ninja), Hydra tracker (EC2), Platform DB (Mongo) ▪ Also shown: Segmentation API (Lambda), Kingsman service
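
The recommendation example can be read as a two-step flow: an event processor keeps the online customer view up to date, and an orchestrator behind the API reads that view on request. The sketch below reproduces the flow in memory. Everything in it is a stand-in: a Python dict replaces Redis, the event fields (`user_id`, `event`, `category`) are invented, and "recommend the most-viewed categories" is a placeholder rule rather than the actual Shedd recommender logic.

```python
from collections import Counter

# Stand-in for the online customer view held in ElastiCache (Redis).
customer_view = {}


def process_event(event: dict) -> None:
    """Event processor: update per-user category counters on each ad view."""
    if event.get("event") != "ad_view":
        return
    counters = customer_view.setdefault(event["user_id"], Counter())
    counters[event["category"]] += 1


def recommend(user_id: str, top_n: int = 2) -> list:
    """Orchestrator: return the user's most-viewed categories (placeholder rule)."""
    counters = customer_view.get(user_id, Counter())
    return [category for category, _ in counters.most_common(top_n)]


if __name__ == "__main__":
    for category in ["cars", "phones", "cars", "cars", "phones", "sofas"]:
        process_event({"user_id": "u1", "event": "ad_view", "category": category})
    print(recommend("u1"))
```

Because the view is updated per event, the response to an API call reflects what the user did seconds ago rather than what the last batch load captured.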
  • 38. Example: Shedd analytics APIs ▪ FRONTEND: Shedd app (Android / iOS SDK) ▪ API: Endpoint(s) (API gateway), Analytics API handler (Lambda) ▪ BACKEND: Data warehouse (Redshift), Redis bulk loader (Lambda), Online customer view (ElastiCache (Redis)) ▪ TRACKING: Shedd app (Ninja), Hydra tracker (EC2), Platform DB (Mongo)
  • 39. Contents ▪ Introduction ▪ Enabling technology ▪ Putting it all together ▪ Future direction ▪ Q&A
  • 40. Thank you. Questions? Feedback? Dobo Radichkov, Analytics summit, Jan 2018