SlideShare ist ein Scribd-Unternehmen logo
1 von 29
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Murtaza Doctor Principal Architect
Giang Nguyen Sr. Software Engineer
How we solved
Real-Time User
Segmentation using
Hbase
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Outline
• {rr} Story
• {rr} Personalization Platform
• User Segmentation: Problem Statement
• Design & Architecture
• Performance Metrics
• Q&A
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
• Personalized Product
Recommendations
• Targeted Promotional
Content
• Relevant Brand
Advertising
RichRelevance delivers:
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Multiple Personalization Placements
Personalized Product
Recommendations Targeted Promotions Relevant Brand Advertising
Inventory and
Margin Data
Catalog
Attribute Data
Real time user
behavior
Multi-Channel
purchase
history
3rd Party Data
Sources
Future Data
Sources
Input
Data
100+ algorithms dynamically targeted by
page, context, and user behavior, across
all retail channels.
Targeted content optimization to
customer segments
Monetization through relevant
brand advertising
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
 One place where all data
about your customer lives
(e.g. loyalty, customer
service, offline, catalogue)
 Direct access to data for
your entire enterprise via API
 Real-time actionable data
and business intelligence
 Link customer activity across
any channel
 Leverage 3rd party data (e.g.
weather, geography, census)
Delivering a
Customer-centric
Omni-channel
Data model
RichRelevanceDataMesh
Real-time
Segmentation
DMP
Integration
RichRelevance DataMesh Cloud Platform
Delivering a Single View of your Customer
Event-based
Triggers
Ad Hoc
Reporting
Omni-channel
Personalization
Loyalty
Integration
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Our cloud-based platform supports both real-time processes
and analytical use cases, utilizing technologies to name a
few: Crunch, Hive, HBase, Avro, Azkaban, Voldemort, Kafka
Someone clicks on a {rr} recommendation
every 21 milliseconds
Did You Know?
Our data capacity includes a 1.5 PB Hadoop infrastructure,
which enables us to employ 100+ algorithms in real-time
In the US, we serve 7000 requests per second with an average
response time of 50 ms
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
User Segmentation
Finding and Targeting the Right Audience
Example segment : savvy shopper
• Demographics
– Female
– Age 30 to 25
• DMA
– 98004
• Behavior
– Purchased a Gucci hand bag last month,
Louis Vuitton hand bag 2 weeks back,
Chanel hand bag last week
– Viewed products in the Cartier brand 2 days
ago
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
• Utilizes valuable targeting
data such as event logs,
off-line data, DMA,
demographics, and etc.
• Finds highly qualified
consumers and buckets them
into segments
– Example: for Retail, Segment
Builder supports view, click,
purchase on products,
categories, and brands
What is the {rr} Segment Builder?
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Segment Builder
Segment’s list page
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Segment Builder
Add/Edit segment page
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
• Create segments to capture the audience via web UI
– Segment definition consists of one or more behaviors joined
together using logical expressions
– Example: Find all users who viewed a category in the last 7 days at least
3 times, or clicked on a product at least 2 times in the last 2 days, and
from zip code 90210
• Each behavior is captured by a rule
• Each rule corresponds to a row key in HBase
• Each rule returns the users
• Rules are joined using set union or intersection
• Segment definition consists of one or more rules
Design: Segment Evaluation Engine
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
1.
Segment Builder
Add rule
2.
3.
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Segment Builder
Add additional rule
1.
3.
2.
4.
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
• User interacts with a retail site
• Events are triggered in real-time, each event is
converted into an Avro record
• Events are ingested using our real-time
framework built using Apache Kafka
Let’s start with real-time data ingestion
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Design Principles: Real-Time Solution
• Streaming: Support for streaming versus batch
• Reliability: no loss of events and at least once delivery of
messages
• Scalable: add more brokers, add more consumers,
partition data
• Distributed system with central coordinator like
zookeeper
• Persistence of messages
• Push & pull mechanism
• Support for compression
• Low operational complexity & easy to monitor
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Our Decision: Apache Kafka
• Distributed pub-sub messaging system
• More generic concept (Ingest & Publish)
• Can support both offline & online use-
cases
• Designed for persistent messages as
the common case
• Guarantee ordering of events
• Supports gzip & snappy (0.8) compression protocols
• Supports rewind offset & re-consumption of data
• No concept of master node, brokers are peers
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Common Consumer Framework
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Volume Facts?
• Daily clickstream event data ~ 150 GB
• Average size of message ~ 2 KB
• Batch Size ~ 5000 messages
• Producer throughput: 5000 messages/sec
• Real time Hbase Consumer: 7000 messages/sec
on average
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
End-to-end Real-time Story…
• User exhibits a behavior
• Events are generated at front-end data-center
• Events are streamed to backend data center via
Kafka
• Events are consumed and indexed in HBase
tables
• Takes seconds from event origination to indexing
in HBase
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
site.com
[retailer]
runtime
[websever]
clickstream
events
clicks
views
purchases kafka Producer
kafka
Broker
runtime colo
backend colo
mirrored
Kafka Broker
HBase
Consumer
HBase
Segment
Manager
Segment
History
Segment
Evaluator
avro
events
dimension tables
events
available in msecs
shopper
End-to-end Real-time Story…
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
• Row key represents the behavior
• Columns store the user id
• Cell stores the time when user exhibited
behavior. Used to capture frequency
• One column family U
Design: HBase Table
Rowkey Columns
338VBChanel1370377143 23b93laddf82:1370377143973
Hd92jslahd0a:1313323414192
338CCElectronic1370377143 z3be3la2dfa2:1370477142970
kd9zjsla3d01:1313323414192
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
• Row key contains an attribute value, siteId, metric, and
attribute, followed by the timestamp that this occured,
and finally the userGUID
[salt]_len_[siteId]_len_[metric]_len_[dimension]_len_[value]
[timestamp]
• _len_ is the integer length of the field following it. Fixed-
length fields (timestamp) don't get a length
• Timestamp is stored in day granularity
• [salt] is computed by first creating a hash from the siteId,
metric, and dimension, then combining this with a random
number between 0 and N (number of shards) for sharding
Design: User Index
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
• Each row is sharded to prevent hot-spotting
• Popular products or high level categories can
have millions of users, each day
• Each row key is sharded to keep the row small
• Shard into N number of regions (predefined)
• Calculate the hash with a random number
between 0 and N; we then spread this load to N
regions
Design: Row Sharding
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
• Each row key returns a set of users
• OR = Full outer join
• AND = Inner join
• Done in memory for small rules
• Merged on disk for large rules
Design: Joins
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Segmentation Performance
• Data indexed in real time
– User browses site and clicks product and is
added to a segment in less than a second
• 40K puts/sec over 2 tables, 8 regions per table
• Scaling achieved through addition of regions
• Segments with 4-5 attributes/dimensions can be
evaluated in msecs
• Large segments with millions of users can be
evaluated in secs
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
• Real-time data
ingestion an important
ingredient
• Indexing of
segmentation attributes
• HBase key design
critical for performance
• Distributed data
centers complicates
matters
Lessons Learned
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Thank You!

Weitere ähnliche Inhalte

Was ist angesagt?

How much money do you lose every time your ecommerce site goes down?
How much money do you lose every time your ecommerce site goes down?How much money do you lose every time your ecommerce site goes down?
How much money do you lose every time your ecommerce site goes down?DataStax
 
Relying on Data for Strategic Decision-Making--Financial Services Experience
Relying on Data for Strategic Decision-Making--Financial Services ExperienceRelying on Data for Strategic Decision-Making--Financial Services Experience
Relying on Data for Strategic Decision-Making--Financial Services ExperienceCloudera, Inc.
 
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsVerizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsDataWorks Summit
 
IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...
IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...
IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...In-Memory Computing Summit
 
How to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsHow to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsCloudera, Inc.
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...Mark Rittman
 
Moving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache KuduMoving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache KuduCloudera, Inc.
 
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...DataWorks Summit
 
Intuitive Real-Time Analytics with Search
Intuitive Real-Time Analytics with SearchIntuitive Real-Time Analytics with Search
Intuitive Real-Time Analytics with SearchCloudera, Inc.
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
 
IoT Connected Brewery
IoT Connected BreweryIoT Connected Brewery
IoT Connected BreweryJason Hubbard
 
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...DataWorks Summit
 
Mainframe Modernization with Precisely and Microsoft Azure
Mainframe Modernization with Precisely and Microsoft AzureMainframe Modernization with Precisely and Microsoft Azure
Mainframe Modernization with Precisely and Microsoft AzurePrecisely
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduCloudera, Inc.
 
Webinar: Don't Leave Your Data in the Dark
Webinar: Don't Leave Your Data in the DarkWebinar: Don't Leave Your Data in the Dark
Webinar: Don't Leave Your Data in the DarkDataStax
 
Big Data, HPC and Streaming
Big Data, HPC and StreamingBig Data, HPC and Streaming
Big Data, HPC and StreamingAnjani Phuyal
 
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...PivotalOpenSourceHub
 
Part 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache KuduPart 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache KuduCloudera, Inc.
 

Was ist angesagt? (20)

HIPAA Compliance in the Cloud
HIPAA Compliance in the CloudHIPAA Compliance in the Cloud
HIPAA Compliance in the Cloud
 
How much money do you lose every time your ecommerce site goes down?
How much money do you lose every time your ecommerce site goes down?How much money do you lose every time your ecommerce site goes down?
How much money do you lose every time your ecommerce site goes down?
 
Relying on Data for Strategic Decision-Making--Financial Services Experience
Relying on Data for Strategic Decision-Making--Financial Services ExperienceRelying on Data for Strategic Decision-Making--Financial Services Experience
Relying on Data for Strategic Decision-Making--Financial Services Experience
 
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsVerizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
 
IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...
IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...
IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...
 
How to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsHow to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of Things
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
 
Moving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache KuduMoving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache Kudu
 
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
 
Intuitive Real-Time Analytics with Search
Intuitive Real-Time Analytics with SearchIntuitive Real-Time Analytics with Search
Intuitive Real-Time Analytics with Search
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
IoT Connected Brewery
IoT Connected BreweryIoT Connected Brewery
IoT Connected Brewery
 
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
 
Mainframe Modernization with Precisely and Microsoft Azure
Mainframe Modernization with Precisely and Microsoft AzureMainframe Modernization with Precisely and Microsoft Azure
Mainframe Modernization with Precisely and Microsoft Azure
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
 
Webinar: Don't Leave Your Data in the Dark
Webinar: Don't Leave Your Data in the DarkWebinar: Don't Leave Your Data in the Dark
Webinar: Don't Leave Your Data in the Dark
 
Instrumenting your Instruments
Instrumenting your Instruments Instrumenting your Instruments
Instrumenting your Instruments
 
Big Data, HPC and Streaming
Big Data, HPC and StreamingBig Data, HPC and Streaming
Big Data, HPC and Streaming
 
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
 
Part 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache KuduPart 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache Kudu
 

Ähnlich wie HBaseCon 2013: Realtime User Segmentation using Apache HBase -- Architectural Case Study

Deep.bi - Real-time, Deep Data Analytics Platform For Ecommerce
Deep.bi - Real-time, Deep Data Analytics Platform For EcommerceDeep.bi - Real-time, Deep Data Analytics Platform For Ecommerce
Deep.bi - Real-time, Deep Data Analytics Platform For EcommerceDeep.BI
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)DataStax
 
Hello Streams Overview
Hello Streams OverviewHello Streams Overview
Hello Streams Overviewpsanet
 
Vistara 3.1 - Delivering Unified IT Operations
Vistara 3.1 - Delivering Unified IT OperationsVistara 3.1 - Delivering Unified IT Operations
Vistara 3.1 - Delivering Unified IT OperationsVistara
 
Patterns of Distributed Application Design
Patterns of Distributed Application DesignPatterns of Distributed Application Design
Patterns of Distributed Application DesignOrkhan Gasimov
 
Implementing Advanced Analytics Platform
Implementing Advanced Analytics PlatformImplementing Advanced Analytics Platform
Implementing Advanced Analytics PlatformArvind Sathi
 
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...MSAdvAnalytics
 
Navigator Systems ltd HireTrack NX questions
Navigator Systems ltd   HireTrack NX questionsNavigator Systems ltd   HireTrack NX questions
Navigator Systems ltd HireTrack NX questionsDavid Rose
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Big Data Spain
 
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...Kai Wähner
 
OC Big Data Monthly Meetup #6 - Session 1 - IBM
OC Big Data Monthly Meetup #6 - Session 1 - IBMOC Big Data Monthly Meetup #6 - Session 1 - IBM
OC Big Data Monthly Meetup #6 - Session 1 - IBMBig Data Joe™ Rossi
 
SD Big Data Monthly Meetup #4 - Session 1 - IBM
SD Big Data Monthly Meetup #4 - Session 1 - IBMSD Big Data Monthly Meetup #4 - Session 1 - IBM
SD Big Data Monthly Meetup #4 - Session 1 - IBMBig Data Joe™ Rossi
 
5 Ways to Make Waves with Informatica and Salesforce Analytics
5 Ways to Make Waves with Informatica and Salesforce Analytics5 Ways to Make Waves with Informatica and Salesforce Analytics
5 Ways to Make Waves with Informatica and Salesforce AnalyticsInformatica Cloud
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...DataWorks Summit/Hadoop Summit
 
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy ClustersData Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy ClustersDavid Walker
 
Race IT Review
Race IT ReviewRace IT Review
Race IT Reviewcschmidtva
 
Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...
Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...
Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...Greg Makowski
 
Using Data Science for Cybersecurity
Using Data Science for CybersecurityUsing Data Science for Cybersecurity
Using Data Science for CybersecurityVMware Tanzu
 
HBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBaseHBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBaseCloudera, Inc.
 

Ähnlich wie HBaseCon 2013: Realtime User Segmentation using Apache HBase -- Architectural Case Study (20)

Deep.bi - Real-time, Deep Data Analytics Platform For Ecommerce
Deep.bi - Real-time, Deep Data Analytics Platform For EcommerceDeep.bi - Real-time, Deep Data Analytics Platform For Ecommerce
Deep.bi - Real-time, Deep Data Analytics Platform For Ecommerce
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)
 
Hello Streams Overview
Hello Streams OverviewHello Streams Overview
Hello Streams Overview
 
Vistara 3.1 - Delivering Unified IT Operations
Vistara 3.1 - Delivering Unified IT OperationsVistara 3.1 - Delivering Unified IT Operations
Vistara 3.1 - Delivering Unified IT Operations
 
Patterns of Distributed Application Design
Patterns of Distributed Application DesignPatterns of Distributed Application Design
Patterns of Distributed Application Design
 
Implementing Advanced Analytics Platform
Implementing Advanced Analytics PlatformImplementing Advanced Analytics Platform
Implementing Advanced Analytics Platform
 
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
 
Navigator Systems ltd HireTrack NX questions
Navigator Systems ltd   HireTrack NX questionsNavigator Systems ltd   HireTrack NX questions
Navigator Systems ltd HireTrack NX questions
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
 
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
 
OC Big Data Monthly Meetup #6 - Session 1 - IBM
OC Big Data Monthly Meetup #6 - Session 1 - IBMOC Big Data Monthly Meetup #6 - Session 1 - IBM
OC Big Data Monthly Meetup #6 - Session 1 - IBM
 
SD Big Data Monthly Meetup #4 - Session 1 - IBM
SD Big Data Monthly Meetup #4 - Session 1 - IBMSD Big Data Monthly Meetup #4 - Session 1 - IBM
SD Big Data Monthly Meetup #4 - Session 1 - IBM
 
5 Ways to Make Waves with Informatica and Salesforce Analytics
5 Ways to Make Waves with Informatica and Salesforce Analytics5 Ways to Make Waves with Informatica and Salesforce Analytics
5 Ways to Make Waves with Informatica and Salesforce Analytics
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
 
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy ClustersData Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
 
Race IT Review
Race IT ReviewRace IT Review
Race IT Review
 
Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...
Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...
Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...
 
Breaking Down the 'Monowhat'
Breaking Down the 'Monowhat'Breaking Down the 'Monowhat'
Breaking Down the 'Monowhat'
 
Using Data Science for Cybersecurity
Using Data Science for CybersecurityUsing Data Science for Cybersecurity
Using Data Science for Cybersecurity
 
HBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBaseHBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBase
 

Mehr von Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Mehr von Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Kürzlich hochgeladen

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 

Kürzlich hochgeladen (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 

HBaseCon 2013: Realtime User Segmentation using Apache HBase -- Architectural Case Study

  • 1. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Murtaza Doctor Principal Architect Giang Nguyen Sr. Software Engineer How we solved Real-Time User Segmentation using Hbase
  • 2. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Outline • {rr} Story • {rr} Personalization Platform • User Segmentation: Problem Statement • Design & Architecture • Performance Metrics • Q&A
  • 3. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. • Personalized Product Recommendations • Targeted Promotional Content • Relevant Brand Advertising RichRelevance delivers:
  • 4. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Multiple Personalization Placements Personalized Product Recommendations Targeted Promotions Relevant Brand Advertising Inventory and Margin Data Catalog Attribute Data Real time user behavior Multi-Channel purchase history 3rd Party Data Sources Future Data Sources Input Data 100+ algorithms dynamically targeted by page, context, and user behavior, across all retail channels. Targeted content optimization to customer segments Monetization through relevant brand advertising
  • 5. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential.  One place where all data about your customer lives (e.g. loyalty, customer service, offline, catalogue)  Direct access to data for your entire enterprise via API  Real-time actionable data and business intelligence  Link customer activity across any channel  Leverage 3rd party data (e.g. weather, geography, census) Delivering a Customer-centric Omni-channel Data model RichRelevanceDataMesh Real-time Segmentation DMP Integration RichRelevance DataMesh Cloud Platform Delivering a Single View of your Customer Event-based Triggers Ad Hoc Reporting Omni-channel Personalization Loyalty Integration
  • 6. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Our cloud-based platform supports both real-time processes and analytical use cases, utilizing technologies to name a few: Crunch, Hive, HBase, Avro, Azkaban, Voldemort, Kafka Someone clicks on a {rr} recommendation every 21 milliseconds Did You Know? Our data capacity includes a 1.5 PB Hadoop infrastructure, which enables us to employ 100+ algorithms in real-time In the US, we serve 7000 requests per second with an average response time of 50 ms
  • 7. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. User Segmentation Finding and Targeting the Right Audience Example segment : savvy shopper • Demographics – Female – Age 30 to 25 • DMA – 98004 • Behavior – Purchased a Gucci hand bag last month, Louis Vuitton hand bag 2 weeks back, Chanel hand bag last week – Viewed products in the Cartier brand 2 days ago
  • 8. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. • Utilizes valuable targeting data such as event logs, off-line data, DMA, demographics, and etc. • Finds highly qualified consumers and buckets them into segments – Example: for Retail, Segment Builder supports view, click, purchase on products, categories, and brands What is the {rr} Segment Builder?
  • 9. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Segment Builder Segment’s list page
  • 10. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Segment Builder Add/Edit segment page
  • 11. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. • Create segments to capture the audience via web UI – Segment definition consists of one or more behaviors joined together using logical expressions – Example: Find all users who viewed a category in the last 7 days at least 3 times, or clicked on a product at least 2 times in the last 2 days, and from zip code 90210 • Each behavior is captured by a rule • Each rule corresponds to a row key in HBase • Each rule returns the users • Rules are joined using set union or intersection • Segment definition consists of one or more rules Design: Segment Evaluation Engine
  • 12. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. 1. Segment Builder Add rule 2. 3.
  • 13. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Segment Builder Add additional rule 1. 3. 2. 4.
  • 14. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. • User interacts with a retail site • Events are triggered in real-time, each event is converted into an Avro record • Events are ingested using our real-time framework built using Apache Kafka Let’s start with real-time data ingestion
  • 15. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Design Principles: Real-Time Solution • Streaming: Support for streaming versus batch • Reliability: no loss of events and at least once delivery of messages • Scalable: add more brokers, add more consumers, partition data • Distributed system with central coordinator like zookeeper • Persistence of messages • Push & pull mechanism • Support for compression • Low operational complexity & easy to monitor
  • 16. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Our Decision: Apache Kafka • Distributed pub-sub messaging system • More generic concept (Ingest & Publish) • Can support both offline & online use- cases • Designed for persistent messages as the common case • Guarantee ordering of events • Supports gzip & snappy (0.8) compression protocols • Supports rewind offset & re-consumption of data • No concept of master node, brokers are peers
  • 17. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
  • 18. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Common Consumer Framework
  • 19. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Volume Facts? • Daily clickstream event data ~ 150 GB • Average size of message ~ 2 KB • Batch Size ~ 5000 messages • Producer throughput: 5000 messages/sec • Real time Hbase Consumer: 7000 messages/sec on average
  • 20. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. End-to-end Real-time Story… • User exhibits a behavior • Events are generated at front-end data-center • Events are streamed to backend data center via Kafka • Events are consumed and indexed in HBase tables • Takes seconds from event origination to indexing in HBase
  • 21. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. site.com [retailer] runtime [websever] clickstream events clicks views purchases kafka Producer kafka Broker runtime colo backend colo mirrored Kafka Broker HBase Consumer HBase Segment Manager Segment History Segment Evaluator avro events dimension tables events available in msecs shopper End-to-end Real-time Story…
  • 22. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. • Row key represents the behavior • Columns store the user id • Cell stores the time when user exhibited behavior. Used to capture frequency • One column family U Design: HBase Table Rowkey Columns 338VBChanel1370377143 23b93laddf82:1370377143973 Hd92jslahd0a:1313323414192 338CCElectronic1370377143 z3be3la2dfa2:1370477142970 kd9zjsla3d01:1313323414192
  • 23. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. • Row key contains an attribute value, siteId, metric, and attribute, followed by the timestamp that this occured, and finally the userGUID [salt]_len_[siteId]_len_[metric]_len_[dimension]_len_[value] [timestamp] • _len_ is the integer length of the field following it. Fixed- length fields (timestamp) don't get a length • Timestamp is stored in day granularity • [salt] is computed by first creating a hash from the siteId, metric, and dimension, then combining this with a random number between 0 and N (number of shards) for sharding Design: User Index
  • 24. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. • Each row is sharded to prevent hot-spotting • Popular products or high level categories can have millions of users, each day • Each row key is sharded to keep the row small • Shard into N number of regions (predefined) • Calculate the hash with a random number between 0 and N; we then spread this load to N regions Design: Row Sharding
  • 25. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. • Each row key returns a set of users • OR = Full outer join • AND = Inner join • Done in memory for small rules • Merged on disk for large rules Design: Joins
  • 26. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Segmentation Performance • Data indexed in real time – User browses site and clicks product and is added to a segment in less than a second • 40K puts/sec over 2 tables, 8 regions per table • Scaling achieved through addition of regions • Segments with 4-5 attributes/dimensions can be evaluated in msecs • Large segments with millions of users can be evaluated in secs
  • 27. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. • Real-time data ingestion an important ingredient • Indexing of segmentation attributes • HBase key design critical for performance • Distributed data centers complicates matters Lessons Learned
  • 28. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
  • 29. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Thank You!

Hinweis der Redaktion

  1. How we plan to go over stuff
  2. The RichRelevance DataMesh Cloud Platform delivers a single view of your customer by:Giving you one single place to house unlimited sets of dataExample use cases:Create your own run-time strategies (predictive models)Create and manage segments via toolAutomatic & real-time segment creationView performance of strategies against KPIs Run adhoc queries using SQL-like toolImport into offline toolsOLAP capabilitiesMarket Basket AnalysisCustomer Lifetime ValueSequential Pattern miningManage APIs, build products & applications
  3. Nuggets or Data Points1.5PB not as big as yahoo or facebook – huge from a retail industry perspective
  4. Distributed System:: i.e. producers, brokers and consumer entities can all be deployed to different hosts in different colos in a truly distributed fashion and coordination controlled through zookeeperPersistence of Messages: messages need to be persisted on the broker for reliability, replay and temporary storagePush & Pull Mechanism:: i.e. push data to Kafka server and pull data from it using a consumer. This allows for two different rates: rate at which messages are transferred to the kafka server and the rate at which the messages are consumed.: Kafka supports GZIP and version 0.8 will additionally support Snappy compression.