Leveraging Customer Data to Enhance Relevancy in Personalization
“Using Apache Data Processing Projects on top of MongoDB”
Marc Schwering
Sr. Solution Architect – EMEA
marc@mongodb.com
@m4rcsch
Big Data Analytics Track
1. Driving Personalized Experiences Using Customer Profiles
2. Leveraging Data to Enhance Relevancy in Personalization
3. Machine Learning to Engage the Customer, with Apache Spark, IBM Watson, and MongoDB
Agenda For This Session
• Personalization Process Review
• The Life of an Application
• Separation of Concerns / Real World Architecture
• Apache Spark and Flink Data Processing Projects
• Clustering with Apache Flink
• Next Steps
High Level Personalization Process
1. Profile created
2. Enrich with public data
3. Capture activity
4. Clustering analysis
5. Define Personas
6. Tag with personas
7. Personalize interactions
[Diagram labels: Batch analytics; Public data]
Common technologies
• R
• Hadoop
• Spark
• Python
• Java
• Many other options
Personas change much less often than tags are assigned.
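Steps 4 to 6 can be illustrated with a minimal sketch of the tagging step: once batch analytics has produced persona centroids, each profile is tagged with the persona whose centroid is nearest to the profile's feature vector. The persona names, the three-dimensional feature layout and the use of squared Euclidean distance are assumptions for illustration, not taken from the talk.

```python
from typing import Dict, List

# Hypothetical persona centroids, as a batch clustering job might produce them.
# Feature order (assumption): [watches, shirts, shoes] visit counts.
PERSONAS: Dict[str, List[float]] = {
    "watch-lover": [5.0, 0.5, 0.5],
    "shirt-buyer": [0.5, 4.0, 1.0],
    "shoe-fanatic": [0.5, 1.0, 6.0],
}


def squared_distance(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tag_with_persona(features: List[float], personas: Dict[str, List[float]]) -> str:
    """Return the name of the persona whose centroid is closest to `features`."""
    return min(personas, key=lambda name: squared_distance(features, personas[name]))


print(tag_with_persona([3.0, 1.0, 0.0], PERSONAS))  # prints: watch-lover
```

Because personas change much less often than tags are assigned, the centroids can be recomputed in an occasional batch job, while this cheap nearest-centroid lookup can run whenever a profile changes.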
Evolution of a Profile (1)
{
"_id" : ObjectId("553ea57b588ac9ef066428e1"),
"ipAddress" : "216.58.219.238",
"referrer" : "kay.com",
"firstName" : "John",
"lastName" : "Doe",
"email" : "johndoe@gmail.com"
}
Evolution of a Profile (n+1)
{
"_id" : ObjectId("553e7dca588ac9ef066428e0"),
"firstName" : "John",
"lastName" : "Doe",
"address" : "229 W. 43rd St.",
"city" : "New York",
"state" : "NY",
"zipCode" : "10036",
"age" : 30,
"email" : "john.doe@mongodb.com",
"twitterHandle" : "johndoe",
"gender" : "male",
"interests" : [
"electronics",
"basketball",
"weightlifting",
"ultimate frisbee",
"traveling",
"technology"
],
"visitedCounts" : {
"watches" : 3,
"shirts" : 1,
"sunglasses" : 1,
"bags" : 2
},
"purchases" : [
{
"id" : 1,
"desc" : "Power Oxford Dress Shoe",
"category" : "Mens shoes"
},
{
"id" : 2,
"desc" : "Striped Sportshirt",
"category" : "Mens shirts"
}
],
"persona" : "shoe-fanatic"
}
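Before a profile like the one above can be clustered or tagged, its behavioral fields have to be turned into a numeric vector. A minimal sketch, assuming a fixed, hypothetical category order and using only the `visitedCounts` sub-document; a real pipeline would likely also encode purchases and interests.

```python
from typing import Any, Dict, List

# Fixed feature order (assumption): every vector uses the same positions.
CATEGORIES: List[str] = ["watches", "shirts", "sunglasses", "bags", "shoes"]


def to_feature_vector(profile: Dict[str, Any]) -> List[float]:
    """Map a profile document's visitedCounts to a fixed-length vector.

    Categories missing from the profile count as 0, so profiles at
    different stages of their evolution yield comparable vectors.
    """
    counts = profile.get("visitedCounts", {})
    return [float(counts.get(category, 0)) for category in CATEGORIES]


profile = {
    "firstName": "John",
    "visitedCounts": {"watches": 3, "shirts": 1, "sunglasses": 1, "bags": 2},
}
print(to_feature_vector(profile))  # [3.0, 1.0, 1.0, 2.0, 0.0]
```

An early profile such as version (1), which has no `visitedCounts` yet, simply maps to the all-zero vector.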
One size/document fits all?
• Profile Data
– Preferences
– Personal information
• Contact information
• DOB, gender, ZIP...
• Customer Data
– Purchase History
– Marketing History
• "Session Data"
– View History
– Shopping Cart Data
– Information Broker Data
• Personalization Data
– Persona Vectors
– Product and Category recommendations
[Diagram: Application; Batch analytics]
Separation of Concerns
• Profile Data
– Preferences
– Personal information
• Contact information
• DOB, gender, ZIP...
• Customer Data
– Purchase History
– Marketing History
• "Session Data"
– View History
– Shopping Cart Data
– Information Broker Data
• Personalization Data
– Persona Vectors
– Product and Category recommendations
[Diagram: Batch analytics Layer; Frontend System; Profile Service, Customer Service, Session Service, Persona Service]
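The split shown on this slide can be sketched as a function that partitions one monolithic profile into the documents each service owns. The field-to-service mapping below is an assumption derived from the bullet list (for example, that `purchases` belongs to the Customer Service and `visitedCounts` to the Session Service); the talk does not prescribe concrete field names. The `_id` is a plain string here instead of a BSON `ObjectId` to keep the sketch dependency-free.

```python
from typing import Any, Dict, List

# Assumed ownership of the fields from the "Evolution of a Profile" document.
OWNERSHIP: Dict[str, List[str]] = {
    "profile": ["firstName", "lastName", "email", "address", "city", "state",
                "zipCode", "age", "gender", "interests", "twitterHandle"],
    "customer": ["purchases"],
    "session": ["visitedCounts"],
    "persona": ["persona"],
}


def split_profile(doc: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Return one document per service; each keeps the shared _id as its key."""
    parts: Dict[str, Dict[str, Any]] = {}
    for service, fields in OWNERSHIP.items():
        part = {"_id": doc["_id"]}
        part.update({f: doc[f] for f in fields if f in doc})
        parts[service] = part
    return parts


monolith = {
    "_id": "553e7dca588ac9ef066428e0",
    "firstName": "John",
    "purchases": [{"id": 1, "desc": "Power Oxford Dress Shoe", "category": "Mens shoes"}],
    "visitedCounts": {"watches": 3},
    "persona": "shoe-fanatic",
}
for service, part in split_profile(monolith).items():
    print(service, part)
```

Sharing the `_id` across the per-service documents keeps them joinable for the batch analytics layer, while each service can evolve its own document independently.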
Result
• Code does less; documents and code stay focused
• Ability to split
– Different teams
– New languages
– Defined dependencies
KISS
=> Keep it simple and save!
=> Clean Code <=
• Robert C. Martin: https://cleancoders.com/
• M. Fowler / B. Meyer et al.: Command Query Separation
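Command Query Separation, cited on the slide, says a method should either change state (a command) or return data (a query), but not both. A minimal sketch of a hypothetical profile service written in that style; the class name, method names and in-memory dictionary store are illustrative assumptions, standing in for a service backed by its own MongoDB collection.

```python
from typing import Dict, List, Optional


class ProfileService:
    """Hypothetical service that owns profile interests and follows CQS."""

    def __init__(self) -> None:
        # In-memory stand-in for the service's own collection (assumption).
        self._interests: Dict[str, List[str]] = {}

    # Commands: change state and return nothing.
    def register(self, profile_id: str) -> None:
        self._interests.setdefault(profile_id, [])

    def add_interest(self, profile_id: str, interest: str) -> None:
        # Raises KeyError if the profile was never registered.
        interests = self._interests[profile_id]
        if interest not in interests:
            interests.append(interest)

    # Queries: return data and never change state.
    def interests_of(self, profile_id: str) -> Optional[List[str]]:
        interests = self._interests.get(profile_id)
        # Return a copy so callers cannot mutate the service's state.
        return list(interests) if interests is not None else None


service = ProfileService()
service.register("john")
service.add_interest("john", "basketball")
service.add_interest("john", "basketball")  # no duplicate entry is added
print(service.interests_of("john"))  # ['basketball']
```

Keeping queries free of side effects makes them safe to cache and to call from other services or from the batch analytics layer.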
Analytics and Personalization
From Query to Clustering
Architecture revised
[Diagram: Profile Service, Customer Service, Session Service, Persona Service; Frontend System; Backend Systems; Data Processing]
Advice for Developers
• OWN YOUR DATA! (but only the relevant data)
• Say no! (to direct data access, i.e. direct DB access)
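The advice can be sketched as a persona service that keeps its own copy of only the fields it needs, fed through the owning service's public interface instead of reading that service's database directly. All names here are hypothetical, and the event-style `on_profile_changed` hook is an assumption about how updates might be delivered.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class PersonaInput:
    """The only profile fields the persona service cares about (assumption)."""
    age: Optional[int]
    gender: Optional[str]


class PersonaService:
    def __init__(self) -> None:
        # The persona service's own store: relevant fields only.
        self._inputs: Dict[str, PersonaInput] = {}

    def on_profile_changed(self, profile_id: str, profile: Dict[str, Any]) -> None:
        """Receives data published by the profile service; never reads its DB."""
        self._inputs[profile_id] = PersonaInput(
            age=profile.get("age"), gender=profile.get("gender")
        )

    def input_for(self, profile_id: str) -> Optional[PersonaInput]:
        return self._inputs.get(profile_id)


personas = PersonaService()
personas.on_profile_changed(
    "john", {"age": 30, "gender": "male", "email": "john.doe@mongodb.com"}
)
print(personas.input_for("john"))  # PersonaInput(age=30, gender='male')
```

The email address is dropped on purpose: the persona service does not store data it does not use, which keeps its documents small and its dependency on the profile service explicit.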
Data Processing
Hadoop in a Nutshell
• An open source framework for distributed storage and distributed, batch-oriented processing
• Hadoop Distributed File System (HDFS) to store data on commodity hardware
• YARN as the resource management platform
• MapReduce as the programming model, working on top of HDFS
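The MapReduce programming model can be illustrated without a cluster. The sketch below runs the map, shuffle and reduce phases in a single Python process to count purchases per category across profile documents. It mirrors the model only and is not the Hadoop API, which distributes these phases across nodes and reads its input from HDFS (or, via the Hadoop Connector, from MongoDB).

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def map_phase(profile: Dict[str, Any]) -> Iterator[Tuple[str, int]]:
    """Emit (category, 1) for every purchase in one profile document."""
    for purchase in profile.get("purchases", []):
        yield purchase["category"], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts for one category."""
    return key, sum(values)


def run_job(profiles: List[Dict[str, Any]]) -> Dict[str, int]:
    emitted = (pair for p in profiles for pair in map_phase(p))
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())


profiles = [
    {"purchases": [{"category": "Mens shoes"}, {"category": "Mens shirts"}]},
    {"purchases": [{"category": "Mens shoes"}]},
    {},  # a young profile without purchases
]
print(run_job(profiles))  # {'Mens shoes': 2, 'Mens shirts': 1}
```

Because map and reduce only see one record or one key at a time, the framework is free to run many instances of each in parallel; the shuffle is the step that moves data between them.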
Spark in a Nutshell
• Spark is a top-level Apache project
• Can run on top of YARN and can read any data accessible through the Hadoop API, including HDFS or MongoDB
• Fast and general engine for large-scale data processing and analytics
• Advanced DAG execution engine with support for data locality and in-memory computing
Flink in a Nutshell
• Flink is a top-level Apache project
• Can run on top of YARN and can read any data accessible through the Hadoop API, including HDFS or MongoDB
• A distributed streaming dataflow engine
• Streaming and batch processing
• Native handling of iterations, with in-memory execution
• Cost-based optimizer
Latency of query operations
[Chart: time for Query, Aggregation, MapReduce and Cluster Algorithms; MongoDB, Hadoop, Spark/Flink]
Iterative Algorithms / Clustering
K-Means in Pictures
• Source: Wikipedia K-Means
K-Means as a Process
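The K-Means process can be written out as a short, single-machine sketch of Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the centroids stop moving. This is not the Flink implementation from the demo. The initial centroids are passed in explicitly (an assumption; real jobs typically pick them randomly or with k-means++), and Flink or Spark would run the same loop distributed over the profile vectors.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points: List[Point], centroids: List[Point], max_iter: int = 100) -> List[Point]:
    """Lloyd's algorithm: alternate assignment and update until stable."""
    for _ in range(max_iter):
        # Assignment step: group points by the index of their nearest centroid.
        clusters: List[List[Point]] = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if empty).
        new_centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids


points = [(1.0, 1.0), (1.5, 2.0), (0.5, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
print(k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)]))  # [(1.0, 1.5), (8.5, 8.5)]
```

Each pass over the data is one iteration; this loop is exactly what the following slides compare across Hadoop, Spark and Flink.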
Iterations in Hadoop and Spark
Iterations in Flink
• Dedicated iteration operators
• Tasks keep running across iterations instead of being redeployed for each step
• Caching and optimizations are done automatically
Demo
Result
More…?
Takeaways
• Stay focused => Start and stay small
– Evaluate with BigDocuments, but do a PoC focused on the topic
• Extending functionality is easy
– Aggregation, MapReduce
– The Hadoop Connector opens up a variety of new use cases
• Extending functionality could be challenging
– Evolution is outpacing help channels
– A lot of options (Spark, Flink, Storm, Hadoop, ...)
– More than just a binary
Next Steps
• Next Session => Hands-on Spark and Watson content!
– "Machine Learning to Engage the Customer, with Apache Spark, IBM Watson, and MongoDB"
– RDD Examples
• Try out Spark and Flink
– http://bit.ly/MongoDB_Hadoop_Spark_Webinar
– http://flink.apache.org/
– https://github.com/mongodb/mongo-hadoop
– https://github.com/m4rcsch/flink-mongodb-example
• Participate and ask questions!
– @m4rcsch
– marc@mongodb.com
Thank you!
Marc Schwering
Sr. Solutions Architect – EMEA
marc@mongodb.com
@m4rcsch

Weitere ähnliche Inhalte

Was ist angesagt?

MongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business InsightsMongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business Insights
MongoDB
 
The openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query LanguageThe openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query Language
Neo4j
 
Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...
Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...
Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...
Simplilearn
 
The Real-time Web in the Age of Agents
The Real-time Web in the Age of AgentsThe Real-time Web in the Age of Agents
The Real-time Web in the Age of Agents
Joshua Shinavier
 

Was ist angesagt? (19)

Using MongoDB + Hadoop Together
Using MongoDB + Hadoop TogetherUsing MongoDB + Hadoop Together
Using MongoDB + Hadoop Together
 
Solr 6.0 Graph Query Overview
Solr 6.0 Graph Query OverviewSolr 6.0 Graph Query Overview
Solr 6.0 Graph Query Overview
 
RDBMS to Graph
RDBMS to GraphRDBMS to Graph
RDBMS to Graph
 
Using Neo4j from Java
Using Neo4j from JavaUsing Neo4j from Java
Using Neo4j from Java
 
MongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business InsightsMongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business Insights
 
The openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query LanguageThe openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query Language
 
Cascalog
CascalogCascalog
Cascalog
 
MongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business InsightsMongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business Insights
 
Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...
Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...
Pig Tutorial | Apache Pig Tutorial | What Is Pig In Hadoop? | Apache Pig Arch...
 
Performance of graph query languages
Performance of graph query languagesPerformance of graph query languages
Performance of graph query languages
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
 
MongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big DataMongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big Data
 
Neo4j - graph database for recommendations
Neo4j - graph database for recommendationsNeo4j - graph database for recommendations
Neo4j - graph database for recommendations
 
Solr Graph Query: Presented by Kevin Watters, KMW Technology
Solr Graph Query: Presented by Kevin Watters, KMW TechnologySolr Graph Query: Presented by Kevin Watters, KMW Technology
Solr Graph Query: Presented by Kevin Watters, KMW Technology
 
The Real-time Web in the Age of Agents
The Real-time Web in the Age of AgentsThe Real-time Web in the Age of Agents
The Real-time Web in the Age of Agents
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
 
Apache Spark and MongoDB - Turning Analytics into Real-Time Action
Apache Spark and MongoDB - Turning Analytics into Real-Time ActionApache Spark and MongoDB - Turning Analytics into Real-Time Action
Apache Spark and MongoDB - Turning Analytics into Real-Time Action
 
Performance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4jPerformance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4j
 
Introduction to MongoDB and Workshop
Introduction to MongoDB and WorkshopIntroduction to MongoDB and Workshop
Introduction to MongoDB and Workshop
 

Ähnlich wie Big Data Analytics 2: Leveraging Customer Behavior to Enhance Relevancy in Personalization

Marc Schwering – Using Flink with MongoDB to enhance relevancy in personaliza...
Marc Schwering – Using Flink with MongoDB to enhance relevancy in personaliza...Marc Schwering – Using Flink with MongoDB to enhance relevancy in personaliza...
Marc Schwering – Using Flink with MongoDB to enhance relevancy in personaliza...
Flink Forward
 
MongoDB Partner Program Update - November 2013
MongoDB Partner Program Update - November 2013MongoDB Partner Program Update - November 2013
MongoDB Partner Program Update - November 2013
MongoDB
 
The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)
The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)
The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)
DataPad Inc.
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Perficient, Inc.
 

Ähnlich wie Big Data Analytics 2: Leveraging Customer Behavior to Enhance Relevancy in Personalization (20)

Marc Schwering – Using Flink with MongoDB to enhance relevancy in personaliza...
Marc Schwering – Using Flink with MongoDB to enhance relevancy in personaliza...Marc Schwering – Using Flink with MongoDB to enhance relevancy in personaliza...
Marc Schwering – Using Flink with MongoDB to enhance relevancy in personaliza...
 
MongoDB Days Germany: Data Processing with MongoDB
MongoDB Days Germany: Data Processing with MongoDBMongoDB Days Germany: Data Processing with MongoDB
MongoDB Days Germany: Data Processing with MongoDB
 
MongoDB Partner Program Update - November 2013
MongoDB Partner Program Update - November 2013MongoDB Partner Program Update - November 2013
MongoDB Partner Program Update - November 2013
 
Engineering patterns for implementing data science models on big data platforms
Engineering patterns for implementing data science models on big data platformsEngineering patterns for implementing data science models on big data platforms
Engineering patterns for implementing data science models on big data platforms
 
The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)
The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)
The Last Mile: Challenges and Opportunities in Data Tools (Strata 2014)
 
Apache drill
Apache drillApache drill
Apache drill
 
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...
 
L’architettura di Classe Enterprise di Nuova Generazione
L’architettura di Classe Enterprise di Nuova GenerazioneL’architettura di Classe Enterprise di Nuova Generazione
L’architettura di Classe Enterprise di Nuova Generazione
 
Partner Enablement: Key Differentiators of Denodo Platform 6.0 for the Field
Partner Enablement: Key Differentiators of Denodo Platform 6.0 for the FieldPartner Enablement: Key Differentiators of Denodo Platform 6.0 for the Field
Partner Enablement: Key Differentiators of Denodo Platform 6.0 for the Field
 
Neo4j GraphDay Seattle- Sept19- in the enterprise
Neo4j GraphDay Seattle- Sept19-  in the enterpriseNeo4j GraphDay Seattle- Sept19-  in the enterprise
Neo4j GraphDay Seattle- Sept19- in the enterprise
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
 
Hadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data Architect
Hadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data ArchitectHadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data Architect
Hadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data Architect
 
Pacemaker hadoop infrastructure and soft serve experience
Pacemaker   hadoop infrastructure and soft serve experiencePacemaker   hadoop infrastructure and soft serve experience
Pacemaker hadoop infrastructure and soft serve experience
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
 
Spark Internals Training | Apache Spark | Spark | Anika Technologies
Spark Internals Training | Apache Spark | Spark | Anika TechnologiesSpark Internals Training | Apache Spark | Spark | Anika Technologies
Spark Internals Training | Apache Spark | Spark | Anika Technologies
 
Proud to be polyglot
Proud to be polyglotProud to be polyglot
Proud to be polyglot
 
awari-ds-aula1.pdf
awari-ds-aula1.pdfawari-ds-aula1.pdf
awari-ds-aula1.pdf
 
Retail & CPG
Retail & CPGRetail & CPG
Retail & CPG
 
Large scale computing
Large scale computing Large scale computing
Large scale computing
 
Presentacion day f-core v1.2.1.2-technical - english
Presentacion day f-core v1.2.1.2-technical - englishPresentacion day f-core v1.2.1.2-technical - english
Presentacion day f-core v1.2.1.2-technical - english
 

Mehr von MongoDB

Mehr von MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Kürzlich hochgeladen

如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
gajnagarg
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ranjankumarbehera14
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
HyderabadDolls
 

Kürzlich hochgeladen (20)

20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 

Big Data Analytics 2: Leveraging Customer Behavior to Enhance Relevancy in Personalization

  • 1. Leveraging Customer Data to Enhance Relevancy in Personalization “Using Apache Data Processing Projects on top of MongoDB” Marc Schwering Sr. Solution Architect – EMEA marc@mongodb.com @m4rcsch
  • 2. 2 Big Data Analytics Track 1. Driving Personalized Experiences Using Customer Profiles 2. Leveraging Data to Enhance Relevancy in Personalization 3. Machine Learning to Engage the Customer, with Apache Spark, IBM Watson, and MongoDB
  • 3. 3 Agenda For This Session • Personalization Process Review • The Life of an Application • Separation of Concerns / Real World Architecture • Apache Spark and Flink Data Processing Projects • Clustering with Apache Flink • Next Steps
  • 4. 4 High Level Personalization Process 1. Profile created 2. Enrich with public data 3. Capture activity 4. Clustering analysis 5. Define Personas 6. Tag with personas 7. Personalize interactions Batch analytics Public data Common technologies • R • Hadoop • Spark • Python • Java • Many other options Personas change much less often than tagging
  • 5. 5 Evolution of a Profile (1) { "_id" : ObjectId("553ea57b588ac9ef066428e1"), "ipAddress" : "216.58.219.238", "referrer" : "kay.com", "firstName" : "John", "lastName" : "Doe", "email" : "johndoe@gmail.com" }
  • 6. 6 Evolution of a Profile (n+1) { "_id" : ObjectId("553e7dca588ac9ef066428e0"), "firstName" : "John", "lastName" : "Doe", "address" : "229 W. 43rd St.", "city" : "New York", "state" : "NY", "zipCode" : "10036", "age" : 30, "email" : "john.doe@mongodb.com", "twitterHandle" : "johndoe", "gender" : "male", "interests" : [ "electronics", "basketball", "weightlifting", "ultimate frisbee", "traveling", "technology" ], "visitedCounts" : { "watches" : 3, "shirts" : 1, "sunglasses" : 1, "bags" : 2 }, "purchases" : [ { "id" : 1, "desc" : "Power Oxford Dress Shoe", "category" : "Mens shoes" }, { "id" : 2, "desc" : "Striped Sportshirt", "category" : "Mens shirts" } ], "persona" : "shoe-fanatic" }
  • 7. 7 One size/document fits all? • Profile Data – Preferences – Personal information • Contact information • DOB, gender, ZIP... • Customer Data – Purchase History – Marketing History • "Session Data" – View History – Shopping Cart Data – Information Broker Data • Personalization Data – Persona Vectors – Product and Category recommendations Application Batch analytics
  • 8. 8 Separation of Concerns • Profile Data – Preferences – Personal information • Contact information • DOB, gender, ZIP... • Customer Data – Purchase History – Marketing History • "Session Data" – View History – Shopping Cart Data – Information Broker Data • Personalization Data – Persona Vectors – Product and Category recommendations Batch analytics Layer Frontend – System Profile Service Customer Service Session Service Persona Service
  • 9. 9 Benefits • Code does less; documents and code stay focused • Splittability – Different teams – New languages – Defined dependencies
  • 10. 10 Result • Code does less; documents and code stay focused • Splittability – Different teams – New languages – Defined dependencies • KISS => Keep it simple and safe! => Clean Code <= • Robert C. Martin: https://cleancoders.com/ • M. Fowler / B. Meyer et al.: Command Query Separation
  • 11. Analytics and Personalization: From Query to Clustering
  • 12. 12 Separation of Concerns • Profile Data – Preferences – Personal information • Contact information • DOB, gender, ZIP... • Customer Data – Purchase History – Marketing History • "Session Data" – View History – Shopping Cart Data – Information Broker Data • Personalization Data – Persona Vectors – Product and Category recommendations Batch analytics Layer Frontend – System Profile Service Customer Service Session Service Persona Service
  • 14. 14 Architecture revised Profile Service Customer Service Session Service Persona Service Frontend – System Backend – Systems Data Processing
  • 15. 15 Advice for Developers • OWN YOUR DATA! (but only relevant data) • Say no! (to direct data access, i.e. direct DB access)
  • 17. 17 Hadoop in a Nutshell • An open source distributed storage and distributed batch-oriented processing framework • Hadoop Distributed File System (HDFS) to store data on commodity hardware • YARN as resource management platform • MapReduce as programming model working on top of HDFS
  • 18. 18 Spark in a Nutshell • Spark is a top-level Apache project • Can be run on top of YARN and can read any Hadoop API data, including HDFS or MongoDB • Fast and general engine for large-scale data processing and analytics • Advanced DAG execution engine with support for data locality and in-memory computing
  • 19. 19 Flink in a Nutshell • Flink is a top-level Apache project • Can be run on top of YARN and can read any Hadoop API data, including HDFS or MongoDB • A distributed streaming dataflow engine • Streaming and batch • Iterative in-memory execution and handling • Cost-based optimizer
  • 20. 20 Latency of query operations [Figure: query, aggregation, MapReduce, and cluster algorithms arranged along a time axis, labeled MongoDB, Hadoop, and Spark/Flink]
  • 22. 22 K-Means in Pictures • Source: Wikipedia K-Means
  • 23. 23 K-Means as a Process
  • 25. 25 Iterations in Flink • Dedicated iteration operators • Tasks keep running for the iterations, not redeployed for each step • Caching and optimizations done automatically
  • 26. Demo
  • 29. 29 Takeaways • Stay focused => Start and stay small – Evaluate with big documents, but do a PoC focused on the topic • Extending functionality is easy – Aggregation, MapReduce – Hadoop Connector opens a new variety of use cases • Extending functionality could be challenging – Evolution is outpacing help channels – A lot of options (Spark, Flink, Storm, Hadoop, ...) – More than just a binary
  • 30. 30 Next Steps • Next Session => Hands-on Spark and Watson content! – "Machine Learning to Engage the Customer, with Apache Spark, IBM Watson, and MongoDB" – RDD Examples • Try out Spark and Flink – http://bit.ly/MongoDB_Hadoop_Spark_Webinar – http://flink.apache.org/ – https://github.com/mongodb/mongo-hadoop – https://github.com/m4rcsch/flink-mongodb-example • Participate and ask questions! – @m4rcsch – marc@mongodb.com
  • 31. Thank you! Marc Schwering Sr. Solutions Architect – EMEA marc@mongodb.com @m4rcsch
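Slides 4 and 22–25 describe clustering profiles into personas and then tagging each profile with its nearest persona. The sketch below illustrates that idea with a plain-Python k-means (Lloyd's algorithm) over feature vectors derived from `visitedCounts`. It is not the Flink job from the demo: the four profiles, the choice of k = 2, the seeding with two existing profiles, and the persona names are all hypothetical, chosen only to make the steps visible.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point: Sequence[float], centroids: Sequence[Vector]) -> int:
    """Index of the closest centroid (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))


def kmeans(points: List[Vector], centroids: List[Vector], max_iter: int = 20) -> List[Vector]:
    """Lloyd's algorithm: assign each point to its nearest centroid, then move
    every centroid to the mean of its points; stop when the centroids stop moving."""
    for _ in range(max_iter):
        clusters: List[List[Vector]] = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        new_centroids = [
            # An empty cluster keeps its previous centroid.
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


# Hypothetical feature vectors built from visitedCounts:
# (watches, shirts, sunglasses, bags)
profiles: Dict[str, Vector] = {
    "alice": (3, 1, 1, 2),
    "bob": (4, 0, 1, 3),
    "carol": (0, 5, 0, 0),
    "dave": (1, 6, 1, 0),
}
persona_names = ["accessory-shopper", "shirt-fan"]  # hypothetical persona labels

# Seed k = 2 centroids with two of the profiles (a simple, deterministic choice).
initial = [profiles["alice"], profiles["carol"]]
centroids = kmeans(list(profiles.values()), initial)

# Tag every profile with the persona of its nearest centroid.
tags = {name: persona_names[nearest(vec, centroids)] for name, vec in profiles.items()}
```

With these inputs the loop stops on its second pass with centroids (3.5, 0.5, 1.0, 2.5) and (0.5, 5.5, 0.5, 0.0), so alice and bob are tagged `accessory-shopper` and carol and dave `shirt-fan`. Matching slide 4, the expensive clustering step would run rarely in batch, while the cheap `nearest` lookup is what tagging repeats as profiles change.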

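Slide 17 names MapReduce as Hadoop's programming model, and slide 20 places it between aggregation and clustering on the latency scale. To make the model itself concrete, here is a single-process sketch of the map, shuffle, and reduce phases that turns raw view events into per-customer category counts shaped like the `visitedCounts` field of the profile document. The event format and the sample events are assumptions; a real job would run these phases distributed across a Hadoop (or Spark/Flink) cluster.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Iterator, List, Tuple

Key = Tuple[str, str]  # (customer id, product category)


def map_phase(events: Iterable[Dict[str, str]]) -> Iterator[Tuple[Key, int]]:
    """Map: emit ((customer, category), 1) for every view event."""
    for event in events:
        yield (event["customer"], event["category"]), 1


def shuffle_phase(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, List[int]]:
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    groups: DefaultDict[Key, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[Key, List[int]]) -> Dict[str, Dict[str, int]]:
    """Reduce: sum the values per key, then nest the totals per customer."""
    visited: DefaultDict[str, Dict[str, int]] = defaultdict(dict)
    for (customer, category), values in groups.items():
        visited[customer][category] = sum(values)
    return dict(visited)


# Hypothetical raw view events: (customer id, category viewed)
raw = [
    ("johndoe", "watches"), ("johndoe", "bags"), ("johndoe", "watches"),
    ("johndoe", "shirts"), ("janedoe", "bags"), ("johndoe", "sunglasses"),
    ("johndoe", "bags"), ("johndoe", "watches"),
]
events = [{"customer": c, "category": cat} for c, cat in raw]

visited_counts = reduce_phase(shuffle_phase(map_phase(events)))
```

For `johndoe` this yields `{"watches": 3, "bags": 2, "shirts": 1, "sunglasses": 1}`, the same counts as the `visitedCounts` sub-document on slide 6. Following slide 8, such view-history aggregates would belong to the session service's data rather than to the core profile; the write-back is not shown.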
Speaker notes

  1. Personalization Process Review (What We Heard) Access Pattern and Development Cycle Separation of Concerns (MongoDB Point of View)
  2. Todo: zoom in on common tech
  3. Even counts (and therefore personas) are very helpful. A good problem to have is too much information to personalize with: start simple, measure, and add.
  4. Profile: show logical document parts
  5. Frontend caching system such as Varnish
  6. KISS => Keep it simple, stupid! Todo: References!!!
  7. Hadoop: great for big data that is partitionable Spark: MapReduce iterations are fast
  8. Amongst Hadoop and others these ar... In a distributed system, a conventional program would not work, because the data is split across nodes. A DAG (directed acyclic graph) is a programming style for distributed systems; you can think of it as an alternative to MapReduce. While MapReduce has just two steps (map and reduce), a DAG can chain multiple stages. To execute a SQL query, for example, a DAG is more flexible, offering more operations such as map, filter, and union. DAG execution is also faster, as in the case of Apache Tez, which succeeds MapReduce, because intermediate results are not written to disk. Coming to Spark, the main concept is the RDD (Resilient Distributed Dataset). To understand the Spark architecture, it is best to read the Berkeley paper (on berkeley.edu). In brief, RDDs are distributed datasets that can stay in memory and fall back to disk gracefully. If lost, RDDs can easily be rebuilt from a graph that records how to reconstruct them. RDDs are great if you want to hold a dataset in memory and fire a series of queries at it; this works better than fetching the data from disk every time. Another important RDD concept is that two kinds of operations can be performed on an RDD: 1) transformations such as map and filter, which result in another RDD, and 2) actions such as count, which produce an output. A Spark job consists of a DAG of tasks executing transformations and actions on RDDs.
  11. Better graphic. If needed, take Chris's graphic and modify it. Cluster Algorithms… Chris's slides
  12. Wikipedia! Gray squares!
  13. Todo: proper graphic
  14. Todo: add reference
  15. Todo: redesign graphic into a MongoDB version. No black box; logic and hook
  16. K-means explained; more complex topics also explained
  17. Insert graph
  18. Don't buy in too early. Solve real problems; choose the right tool. RDD and/or clustering jobs are "natural". Stay operational and focused on low latency.
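Note 8 distinguishes RDD transformations (such as `map` and `filter`, which produce another RDD) from actions (such as `count`, which produce an output). The toy class below is not Spark's API; it is a hypothetical, single-process illustration of that split under the simplest possible model: transformations only record a step in a lineage, and nothing is computed until an action replays it. In PySpark the corresponding chain would be `sc.parallelize(purchases).map(...).filter(...).count()`. The purchase prices are invented for the example.

```python
from typing import Any, Callable, Iterable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


class LazyDataset:
    """Hypothetical toy (not Spark's API): transformations are only recorded,
    and the recorded lineage is replayed when an action is called."""

    def __init__(self, source: Iterable[Any], lineage: Tuple[Step, ...] = ()) -> None:
        self._source = list(source)
        self._lineage = lineage  # ordered steps; enough to recompute the result

    # Transformations: return a new dataset, compute nothing.
    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        return LazyDataset(self._source, self._lineage + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset(self._source, self._lineage + (("filter", pred),))

    # Actions: replay the lineage over the source and produce a result.
    def collect(self) -> List[Any]:
        data = list(self._source)
        for kind, fn in self._lineage:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        return data

    def count(self) -> int:
        return len(self.collect())


calls: List[str] = []  # records every time the map function actually runs


def price_of(purchase: dict) -> int:
    calls.append(purchase["desc"])
    return purchase["price"]


# Hypothetical purchases with invented prices.
purchases = [
    {"desc": "Power Oxford Dress Shoe", "price": 120},
    {"desc": "Striped Sportshirt", "price": 45},
    {"desc": "Leather Bag", "price": 80},
]

expensive = LazyDataset(purchases).map(price_of).filter(lambda p: p > 50)
calls_before_action = len(calls)  # 0: building the chain ran nothing
n_expensive = expensive.count()   # the action triggers the map and the filter
```

Because each dataset keeps only its source and lineage, a lost result can be recomputed by replaying the steps, which is the idea behind the note's remark that lost RDDs can easily be rebuilt. Real Spark additionally partitions the data across nodes and can cache an RDD in memory so repeated actions do not recompute it.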