Architecting for Real-Time Big Data Analytics
Robert Winters
www.robertwinters.nl
About Me
ROBERT WINTERS
Head of Business Intelligence, TravelBird
Ten years’ experience in analytics, five years with Vertica and big data
With ten years’ experience leading analytics teams across retail, gaming, travel, and telecom, I’ve had the opportunity to roll out several successful big data projects based on a variety of platforms. In my current role as Head of BI at TravelBird I am responsible for the infrastructure, development, and products associated with personalization, business intelligence, A/B testing, and general business analytics.
What is different about big data?
Volume
The average Fortune 500 company has 235 terabytes of data, doubling every two years. However, only 0.5% of data is used for analysis.
Velocity
“Real-time” has gone from “by start of business tomorrow” to “within milliseconds”.
Variety
More than 70% of data is semi-structured or unstructured and growing more complex; user-generated content is one of the richest sources of information.
Veracity
Increasing volume, velocity, and variety, combined with decreasing control, lead to lower reliability of any given data point. Data can arrive several times, out of order, or not at all.
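The veracity problem above (duplicate, out-of-order, or missing data) can be illustrated with a minimal Python sketch. The `Event` fields, the `clean_batch` and `missing_sequence_numbers` helpers, and the idea that clients number their events are all assumptions made for illustration; they are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str      # hypothetical unique id assigned by the client
    occurred_at: int   # hypothetical event time, epoch milliseconds
    payload: str


def clean_batch(events):
    """Drop duplicate deliveries and restore event-time order."""
    seen = set()
    unique = []
    for event in events:
        if event.event_id in seen:
            continue  # the same event arrived more than once
        seen.add(event.event_id)
        unique.append(event)
    # Arrival order is not trusted; sort by when the event happened.
    return sorted(unique, key=lambda e: e.occurred_at)


def missing_sequence_numbers(received, expected_count):
    """Report gaps, assuming clients number events 0..expected_count-1."""
    return sorted(set(range(expected_count)) - set(received))


raw = [
    Event("b", 2000, "click"),
    Event("a", 1000, "view"),
    Event("b", 2000, "click"),  # duplicate delivery
]
cleaned = clean_batch(raw)
gaps = missing_sequence_numbers([0, 1, 3], expected_count=4)
```

Deduplicating on an id and sorting by event time only works within a bounded batch; a continuous stream would need some cut-off for late arrivals, which this sketch leaves out.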
What can big data bring?
NEW INSIGHTS
Big data analysis has allowed us to discover which days were in demand but unavailable, how often individuals read a given email, and how much patience people have for video ads.
NEW OPPORTUNITIES
The power of additional insights has unlocked additional revenue streams via product creation, new partnerships based on insights, and demand-driven optimisation.
PREDICTIVE POWER
Having thousands of events per customer allows us to know what to say, when to say it, and how to say it on an individual level.
BETTER CONTROL
Real-time monitoring of events provides faster identification of service failures and risks, reducing the chance of negative customer experiences.
Features benefitting from real-time analytics
A number of features can be built on real-time data processing feeds that significantly improve customer experience and business performance. All of the features below have either been tested or are under development.
Personal Suggestions
Suggestions can be made based on changing behaviour, maximising relevancy.
Behaviour Prediction
Models can estimate emotional propensity, activating business rules.
Product Optimisation
Offer content and pricing can be adjusted continuously based on customer behaviour.
Experience Improvement
Customer interactions can highlight demand or difficulty, allowing rapid correction.
Relevant Messaging
Messaging can be sent at the exact moment of effectiveness based on location or state.
But it brings new challenges
Staff shortage (1.7M)
McKinsey estimates that 1.7M unstaffed big data positions will exist by 2018.
Management Complexity (75%)
75% of companies report big data platforms as somewhat or extremely challenging.
Untrustworthy (33%)
One in three business executives reports their big data sources as untrustworthy.
Value Creation (48%)
Almost half of all big data projects create low or no ROI for organisations.
Architecture to enable real-time big data
Data Collection
Data gathering must be highly performant, dynamically scalable, and able to ingest both unstructured and structured records.
Data Pipeline
The data pipeline should support multiple speeds of processing and allow for out-of-sync or corrupted data.
Data Storage
In an ideal environment, storage allows for the analysis of both structured and semi-structured data and can scale horizontally.
Analytical Tools
Self-service oriented tooling dominates today’s BI requirements and should empower end users on all data sources.
Lambda Architecture
Real-time stream: when you need it now
For decisions and reporting where timeliness is more important than completeness, real-time pipelines are used to achieve sub-second timing between the event and the targeted feed. This is useful for recommendations, in-session optimization, and monitoring.
“Slow” stream: when you need it right
At the same time, data is written to a more persistent store for later (e.g. nightly) processing and analysis. This path is used to ensure completeness and cleanliness, to apply complex transformations, and to ensure that data can be trusted.
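The two paths can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `SpeedLayer`, `BatchLayer`, and `dispatch` are hypothetical names, and in-memory structures stand in for the real-time systems and the persistent store.

```python
import json


class SpeedLayer:
    """Keeps running counts that are immediately queryable."""

    def __init__(self):
        self.counts = {}

    def handle(self, event):
        # Minimal work, no validation: timeliness beats completeness here.
        key = event.get("type", "unknown")
        self.counts[key] = self.counts.get(key, 0) + 1


class BatchLayer:
    """Appends raw events to a durable log for later, trusted processing."""

    def __init__(self):
        self.log = []  # stand-in for a file store or warehouse staging table

    def handle(self, event):
        self.log.append(json.dumps(event, sort_keys=True))

    def recompute(self):
        # Later pass: deduplicate and validate before counting.
        counts = {}
        for line in set(self.log):
            event = json.loads(line)
            if "type" not in event:
                continue  # reject malformed records
            counts[event["type"]] = counts.get(event["type"], 0) + 1
        return counts


def dispatch(event, layers):
    """Send every event to both paths."""
    for layer in layers:
        layer.handle(event)


speed, batch = SpeedLayer(), BatchLayer()
events = [
    {"id": 1, "type": "view"},
    {"id": 1, "type": "view"},   # duplicate delivery
    {"id": 2, "type": "click"},
    {"id": 3},                   # malformed: no type
]
for e in events:
    dispatch(e, [speed, batch])
```

The speed layer counts the duplicate and the malformed record because it trades accuracy for latency, while the batch recomputation removes both. That difference is the reason for running the two streams side by side.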
Data Collection
Event Collectors: email, mobile, web events
JSON objects are created in clients and handed to event collectors. We have had success with both “roll your own” and open-source (Snowplow) collectors.
Relational Pipelines: transactions and dimensions
To capture changes in production databases, both heterogeneous replication and a pull-based approach can be used.
Data Bus: landing point for data streams
Both stream types write to a bus (Kafka, Kinesis, etc.), which makes the data available for both real-time analysis and ELT.
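A “roll your own” collector along these lines might look like the sketch below. The event fields, `build_event`, and `EventCollector` are hypothetical, and a standard-library `queue.Queue` stands in for the bus so that the example runs without Kafka or Kinesis.

```python
import json
import queue
import time
import uuid


def build_event(event_type, user_id, properties):
    """Create a JSON-serialisable event on the client side."""
    return {
        "event_id": str(uuid.uuid4()),
        "type": event_type,
        "user_id": user_id,
        "sent_at_ms": int(time.time() * 1000),
        "properties": properties,
    }


class EventCollector:
    """Accepts events from clients and appends them to the bus."""

    def __init__(self, bus):
        self.bus = bus

    def collect(self, raw_json):
        try:
            event = json.loads(raw_json)
        except json.JSONDecodeError:
            return False  # unparseable input is rejected, not forwarded
        if not isinstance(event, dict):
            return False
        self.bus.put(json.dumps(event))
        return True


bus = queue.Queue()  # in-memory stand-in for Kafka or Kinesis
collector = EventCollector(bus)
accepted = collector.collect(
    json.dumps(build_event("page_view", 42, {"page": "/offers"}))
)
rejected = collector.collect("{not json")
```

The collector deliberately does very little: it checks that the payload parses and hands it on, leaving schema checks and business rules to the downstream pipelines.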
Data Processing
STREAM PROCESSING
Data is read directly off the bus and receives minimal transformation, and each event is piped into the real-time systems. No validation is done for quality or completeness.
BATCH PROCESSING
Every few minutes, batches are fetched from the bus for more robust processing, loaded into the DWH for analysis, and rotated to permanent storage.
TRADITIONAL ETL
Every hour, data from all sources is processed into the “system of truth” data warehouse model and subjected to all business rules before presentation to reporting.
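The batch step can be sketched as a periodic drain of the bus followed by validation. Everything named here is an assumption for illustration: `REQUIRED_FIELDS` is a made-up minimum schema, `drain` and `process_batch` are hypothetical helpers, and `queue.Queue` again stands in for the bus.

```python
import json
import queue


REQUIRED_FIELDS = ("event_id", "type")  # hypothetical minimum schema


def drain(bus, max_items):
    """Fetch up to max_items messages that are currently on the bus."""
    batch = []
    while len(batch) < max_items:
        try:
            batch.append(bus.get_nowait())
        except queue.Empty:
            break
    return batch


def process_batch(messages):
    """Validate a micro-batch, splitting loadable rows from rejects."""
    rows, rejects = [], []
    for message in messages:
        try:
            event = json.loads(message)
        except json.JSONDecodeError:
            rejects.append(message)
            continue
        if isinstance(event, dict) and all(f in event for f in REQUIRED_FIELDS):
            rows.append(event)
        else:
            rejects.append(message)
    return rows, rejects


bus = queue.Queue()
for message in ['{"event_id": "1", "type": "view"}', '{"type": "click"}', "garbage"]:
    bus.put(message)

rows, rejects = process_batch(drain(bus, max_items=100))
```

Keeping the rejects rather than discarding them leaves room for a later repair pass, which fits the requirement that the pipeline tolerate corrupted data.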
Data Persistence
The data problem that you are optimising for dictates the best technology for that scope. With the prevalence of cloud hosting and managed solutions, having a variety of components in your landscape is easier and more cost effective than ever.
For teams dealing with mostly relational data in the tens to hundreds of terabytes range, an MPP column store is often the best tool for the primary data environment.
Real-time aggregations in Redis or Spark often provide the fastest way to get meaningful insights from your data when seconds matter.
Some form of “data lake” should exist to keep a snapshot of all time and allow broad analysis. The lake could be either Hadoop or a well-organised file store like S3/GFS.
We can use these technologies in conjunction to have a performant, federated data platform.
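To make the idea of a real-time aggregation concrete, here is a small in-process sketch of a sliding-window counter. It does not use Redis or Spark; the `SlidingWindowCounter` class and the key names are hypothetical, and it assumes timestamps arrive in non-decreasing order.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events per key over the last window_seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()   # (timestamp, key) in arrival order
        self.counts = {}

    def add(self, key, timestamp):
        self.events.append((timestamp, key))
        self.counts[key] = self.counts.get(key, 0) + 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def count(self, key):
        return self.counts.get(key, 0)


counter = SlidingWindowCounter(window_seconds=60)
counter.add("offer:123", timestamp=0)
counter.add("offer:123", timestamp=30)
counter.add("offer:456", timestamp=61)
```

Reads are a dictionary lookup, which is what makes this kind of pre-aggregated state attractive when seconds matter. A shared store would be needed once more than one process writes to the counters.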
Why Vertica
Speed of data loading
Data loading which took 70 minutes in Redshift completes in nine minutes in Vertica, lowering time to analysis.
Flex tables
All event data is loaded as semi-structured JSON, allowing rapid iteration without any DB changes.
Ease of administration and tuning
Adjusting table structure for performance optimisation is trivially easy; multiple projections can be built to cover different query types.
High performance queries
Queries complete orders of magnitude faster than OLTP systems or Hive, allowing faster iteration in analytics.
Event series analytics
We use event series analytics to do complex work like dynamic sessionization using simple SQL.
Significant SQL extensions
Extensions for text analytics, data manipulation, and analytical aggregation allow us to do most work directly in the database, speeding analysis.
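The talk does this work in SQL inside the database; the Python sketch below is not that SQL, only an illustration of the sessionization logic itself. The `sessionize` function and the 30-minute inactivity timeout are assumptions.

```python
def sessionize(events, timeout_seconds=1800):
    """Assign a session number to each (user_id, timestamp) event.

    A new session starts when a user has been inactive for longer than
    timeout_seconds. Returns (user_id, timestamp, session_number) tuples.
    """
    last_seen = {}
    session_no = {}
    result = []
    for user_id, ts in sorted(events):
        previous = last_seen.get(user_id)
        if previous is None:
            session_no[user_id] = 0
        elif ts - previous > timeout_seconds:
            session_no[user_id] += 1
        last_seen[user_id] = ts
        result.append((user_id, ts, session_no[user_id]))
    return result


events = [("u1", 0), ("u1", 600), ("u1", 4000), ("u2", 100)]
sessions = sessionize(events)
```

Because the timeout is a parameter rather than a stored attribute, sessions can be redefined at query time, which is presumably what “dynamic” refers to here.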
Analysis Tooling
Machine learning (Spark)
Predictive analytical models are built and evaluated within Spark running on AWS, then loaded into Vertica for user scoring.
Business analysis (Tableau)
Business users analyse data via Tableau’s web interface, generating >1000 view requests per day.
Data analysis (Python)
General data analytics including forecasting and regression is conducted via Python models on top of Vertica queries.
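As a rough illustration of a Python model sitting on top of query results, the sketch below fits a straight line by ordinary least squares and extrapolates one step ahead. The rows are invented sample data standing in for a warehouse query result, and `fit_line` is a hypothetical helper, not the models used at TravelBird.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical rows as a warehouse query might return them: (day, bookings).
rows = [(1, 12.0), (2, 14.0), (3, 16.0), (4, 18.0)]
days = [r[0] for r in rows]
bookings = [r[1] for r in rows]
intercept, slope = fit_line(days, bookings)
forecast_day_5 = intercept + slope * 5
```

In practice the aggregation would stay in the database and only the small result set would be pulled into Python, which keeps the model code simple.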
All Together
Service oriented
Using a number of simple services provides technical flexibility.
Analytics focused
Every aspect of the stack supports either analysis or the application of insight.
Low cost
Use of cloud and dynamic scaling allows excellent cost management.
Cloud based
AWS hosting allows effectively unlimited scaling and easy administration.
Feature Successes
Successes across a variety of market segments
Personalised Emails
Delivered +20% CTR and +35% offer views vs marketing selections.
Clustering
Achieved a 60% reduction in email pressure while maintaining 90% of clicks.
Individual Suggestions
11% increase in content consumption.
Calendar Heatmap
Increased price based on demand, increasing margin while maintaining sales.
Questions?
Rob Winters wintersrd@gmail.com http://www.robertwinters.nl

Architecting for Real-Time Big Data Analytics

  • 1. Architecting for Real- Time Big Data Analytics Robert Winters
  • 2. www.robertwinters.nl 2 ROBERT WINTERS Head of Business Intelligence, TravelBird With ten years’ experience leading analytics teams across retail, gaming, travel, and telecom, I’ve had the opportunity to roll out several successful big data projects based on a variety of platforms. In my current role as Head of BI at TravelBird I am responsible for the infrastructure, development, and products associated with personalization, business intelligence, A/B testing, and general business analytics. Ten years’ experience in analytics, five years with Vertica and big data About Me
  • 3. www.robertwinters.nl 3 What is different about big data?Architecting for real-time big data analytics Volume The average Fortune 500 company has 235 terabytes of data, doubling every two years. However, only 0,5% of data is used for analysis Velocity “Real-time” has gone from “by start of business tomorrow” to “within milliseconds”. Variety More than 70% of data is semi- structured/unstructured and growing more complex; user generated content is one of the richest sources of information. Veracity Increasing volume, velocity, and veracity with decreasing control leads to lower reliability of any given data point. Data can arrive several times, out of order, or not at all.
  • 4. www.robertwinters.nl 4 NEW INSIGHTS Big data analysis has allowed us to discover which days were in demand but unavailable, how often individuals read a given email, and how much patience people have for video ads. NEW OPPORTUNITIES The power of additional insights has unlocked additional revenue streams via product creation, new partnerships based on insights, and demand-driven optimisation. PREDICTIVE POWER Having thousands of events per customer allows us to know what to say, when to say it, and how to say it on an individual level. BETTER CONTROL Real time monitoring of events provides faster identification of service failures and risk identification, reducing the risk of negative customer experiences. What can big data bring?Architecting for real-time big data analytics
  • 5. www.robertwinters.nl 5 Features benefitting from real-time analyticsArchitecting for real-time big data analytics A number of features can be built based on real-time data processing feeds which significantly improve customer experience and business performance. All of the above have either been tested or are under development. Personal Suggestions Suggestions can be made based on changing behaviour, maximising relevancy Behaviour Prediction Models can estimate emotional propensity, activating business rules Product Optimisation Offer content and pricing can be adjusted continuously based on customer behaviour Experience Improvement Customer interactions can highlight demand or difficulty, allowing rapid correction Relevant Messaging Messaging can be send at the exact moment of effectiveness based on location or state
  • 6. www.robertwinters.nl 6 1,7M 75% 33% 48% Staff shortage McKinsey estimates that 1,7M unstaffed big data positions will exist by 2018. Management Complexity 75% of companies report big data platforms as somewhat or extremely challenging Untrustworthy One in three business executives reports their big data sources as untrustworthy Value Creation Almost half of all big data projects create low or no ROI for organisations But it brings new challengesArchitecting for real-time big data analytics
  • 7. Architecture to enable real-time big data. Data Collection: How you gather data must be highly performant, dynamically scalable, and able to ingest both unstructured and structured records. Data Pipeline: The data pipeline should support multiple speeds of processing and tolerate out-of-sync or corrupted data. Data Storage: In an ideal environment, storage allows for the analysis of both structured and semi-structured data and can scale horizontally. Analytical Tools: Self-service oriented tooling dominates today’s BI requirements and should empower end users on all data sources.
  • 8. Lambda Architecture. Real-time stream: when you need it now. For decisions and reporting where timeliness is more important than completeness, real-time pipelines are used to achieve sub-second timing between the event and the targeted feed. This is useful for recommendations, in-session optimization, and monitoring. “Slow” stream: when you need it right. At the same time, data is written to a more persistent store for later (e.g. nightly) processing and analysis. This is used to ensure completeness and cleanliness, apply complex transformations, and make sure that the data can be trusted.
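The two streams on this slide can be illustrated with a minimal sketch. It is not the author's implementation: the class name `LambdaPipeline`, the event fields `id` and `type`, and the in-memory list standing in for the persistent store are all assumptions made for illustration. The speed view counts every event as it arrives, while the batch view is rebuilt later from the stored log with duplicates and incomplete records dropped.

```python
from collections import Counter


class LambdaPipeline:
    """Hypothetical toy model of a real-time stream plus a "slow" stream."""

    def __init__(self):
        self.speed_view = Counter()  # updated per event, no validation
        self.event_log = []          # stand-in for the persistent store

    def ingest(self, event):
        # Real-time path: count immediately, even if the event is a duplicate.
        self.speed_view[event.get("type", "unknown")] += 1
        # Slow path: keep the raw event for later processing.
        self.event_log.append(event)

    def rebuild_batch_view(self):
        # Later (e.g. nightly) pass: drop incomplete and duplicate events.
        seen_ids = set()
        view = Counter()
        for event in self.event_log:
            event_id = event.get("id")
            event_type = event.get("type")
            if event_id is None or event_type is None or event_id in seen_ids:
                continue
            seen_ids.add(event_id)
            view[event_type] += 1
        return view


pipeline = LambdaPipeline()
for raw in [
    {"id": "1", "type": "click"},
    {"id": "1", "type": "click"},  # same event delivered twice
    {"id": "2", "type": "click"},
    {"type": "view"},              # missing id
]:
    pipeline.ingest(raw)

batch_view = pipeline.rebuild_batch_view()
```

Here the speed view is available at once but overcounts the duplicated click, while the batch view is slower to produce and can be trusted.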
  • 9. Data Collection. Event Collectors (email, mobile, web events): JSON objects are created in clients and handed to event collectors. We have had success with both “roll your own” and open source (Snowplow) collectors. Relational Pipelines (transactions and dimensions): To capture changes in production databases, both heterogeneous replication and a pull-based approach can be used. Data Bus (landing point for data streams): Both stream types write to a bus (Kafka, Kinesis, etc.) which makes the data available for both real-time analysis and ELT.
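As a rough sketch of the event collection path described on this slide: a client builds a JSON object, a collector stamps it, and the result lands on a bus. Everything named here is hypothetical. `InMemoryBus` is only a stand-in for Kafka or Kinesis, and the topic names, field names, and the `build_event` and `collect` helpers are assumptions, not Snowplow's or any real collector's API.

```python
import json
import time
import uuid
from collections import defaultdict


class InMemoryBus:
    """Stand-in for a real bus: one append-only list per topic."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, message):
        self.topics[topic].append(message)
        return len(self.topics[topic]) - 1  # offset of the new message


def build_event(event_type, user_id, payload):
    # Client side: serialise the event as a JSON object.
    return json.dumps({
        "id": str(uuid.uuid4()),
        "type": event_type,
        "user_id": user_id,
        "client_ts": time.time(),
        "payload": payload,
    })


def collect(bus, raw_event):
    # Collector side: accept anything, park what cannot be parsed.
    try:
        event = json.loads(raw_event)
    except json.JSONDecodeError:
        event = None
    if not isinstance(event, dict):
        bus.publish("events_bad", raw_event)
        return None
    event["collector_ts"] = time.time()
    return bus.publish("events", json.dumps(event))


bus = InMemoryBus()
good_offset = collect(bus, build_event("page_view", "u1", {"url": "/"}))
bad_offset = collect(bus, "{not json")
```

Parking unparseable input on a separate topic rather than dropping it keeps the collector fast while leaving the decision about corrupted data to the downstream pipeline.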
  • 10. Data Processing. STREAM PROCESSING: Data is read directly off the bus, receives minimal transformation, and is piped event by event into the real-time systems. No validation is done for quality or completeness. BATCH PROCESSING: Every few minutes, batches are fetched from the bus for more robust processing, loaded into the DWH for analysis, and rotated to permanent storage. TRADITIONAL ETL: Every hour, data from all sources is processed into the “system of truth” data warehouse model and subjected to all business rules before presentation to reporting.
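The batch step on this slide might look something like the following sketch, which assumes the bus can be read as an ordered log from a saved offset. The function names, the required fields, and the choice to sort by an event timestamp `ts` are illustrative assumptions rather than details from the talk.

```python
REQUIRED_FIELDS = ("id", "type", "ts")  # assumed minimal schema


def fetch_batch(bus_log, offset, max_events=1000):
    """Read up to max_events from the log, returning the batch and new offset."""
    batch = bus_log[offset:offset + max_events]
    return batch, offset + len(batch)


def process_batch(batch, seen_ids):
    """Validate, de-duplicate, and order a batch before loading it."""
    clean, rejected = [], []
    for event in batch:
        if any(field not in event for field in REQUIRED_FIELDS):
            rejected.append(event)  # incomplete record, kept for inspection
            continue
        if event["id"] in seen_ids:
            continue                # event arrived more than once
        seen_ids.add(event["id"])
        clean.append(event)
    clean.sort(key=lambda e: e["ts"])  # restore event-time order
    return clean, rejected


log = [
    {"id": "a", "type": "click", "ts": 5},
    {"id": "b", "type": "click", "ts": 3},  # arrived out of order
    {"id": "a", "type": "click", "ts": 5},  # duplicate delivery
    {"type": "click", "ts": 9},             # missing id
]
batch, next_offset = fetch_batch(log, 0, max_events=10)
clean, rejected = process_batch(batch, seen_ids=set())
```

Persisting `next_offset` between runs lets each few-minute batch resume where the previous one stopped.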
  • 11. Data Persistence. The data problem that you are optimising for dictates the best technology for that scope. With the prevalence of cloud hosting and managed solutions, having a variety of components in your landscape is easier and more cost effective than ever. For teams dealing with mostly relational data in the tens to hundreds of terabytes range, an MPP column store is often the best tool for the primary data environment. Real-time aggregations in Redis or Spark often provide the fastest way to get meaningful insights from your data when seconds matter. Some form of “data lake” should exist to keep a snapshot of all time and allow broad analysis. The lake could be either Hadoop or a well organised file store like S3/GFS. We can use these technologies in conjunction to have a performant, federated data platform.
  • 12. Why Vertica. Speed of data loading: Data loading which took 70 minutes in Redshift completes in nine minutes in Vertica, lowering time to analysis. Flex tables: All event data is loaded as semi-structured JSON, allowing rapid iteration without any DB changes. Ease of administration and tuning: Adjusting table structure for performance optimisation is trivially easy; multiple projections can be built to cover different query types. High performance queries: Queries complete orders of magnitude faster than on OLTP systems or Hive, allowing faster iteration in analytics. Event series analytics: We use event series analytics for complex tasks such as dynamic sessionization using simple SQL. Significant SQL extensions: Extensions for text analytics, data manipulation, and analytical aggregation allow us to do most work directly in the database, speeding analysis.
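To make the sessionization mentioned on this slide concrete, here is a small sketch of the idea in Python. The slide describes doing this in SQL inside Vertica; the code below is not that SQL and only illustrates the underlying rule, assuming a session ends after a fixed period of inactivity. The function name, the 30-minute default, and the `(user_id, ts)` input shape are assumptions.

```python
from collections import defaultdict


def sessionize(events, timeout_seconds=1800):
    """Assign a per-user session number to each (user_id, ts) event.

    A new session starts whenever the gap since the user's previous
    event exceeds timeout_seconds.
    """
    by_user = defaultdict(list)
    for user_id, ts in events:
        by_user[user_id].append(ts)

    result = []
    for user_id, timestamps in by_user.items():
        session_id = 0
        previous = None
        for ts in sorted(timestamps):
            if previous is not None and ts - previous > timeout_seconds:
                session_id += 1
            result.append((user_id, ts, session_id))
            previous = ts
    return result


sessions = sessionize([("a", 0), ("a", 100), ("a", 5000), ("b", 50)])
```

Because the timeout is a parameter rather than a stored attribute, the session boundaries can be recomputed with a different threshold at any time, which is what makes the sessionization dynamic.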
  • 13. Analysis Tooling. Machine learning (Spark): Predictive analytical models are built and evaluated within Spark running on AWS, then loaded into Vertica for user scoring. Business analysis (Tableau): Business users analyse data via Tableau’s web interface, generating >1000 view requests per day. Data analysis (Python): General data analytics including forecasting and regression is conducted via Python models on top of Vertica queries.
  • 14. All Together. Service oriented: Using a number of simple services provides technical flexibility. Analytics focused: Every aspect of the stack either supports analysis or application of insight. Low cost: Use of cloud and dynamic scaling allows excellent cost management. Cloud based: AWS hosting allows infinite scaling and easy administration.
  • 15. Feature Successes: successes across a variety of market segments. Personalised Emails: Delivered +20% CTR and +35% offer views vs marketing selections. Clustering: Achieved a 60% reduction in email pressure while maintaining 90% of clicks. Individual Suggestions: 11% increase in content consumption. Calendar Heatmap: Increased price based on demand, increasing margin while maintaining sales.