SlideShare ist ein Scribd-Unternehmen logo
1 von 73
Downloaden Sie, um offline zu lesen
Donald Szeto
Kenneth Chan
Simon Chan
support@prediction.io
Big Data TechCon - Oct 17, 2013
Building Applications That Predict User Behavior Through Big Data
Using Open-Source Technologies
Machine Learning is....
computers learning to predict
from data
putting
Machine Learning
into practice
Existing data tools are like raw material
PredictionIO

3 Product Principles
Love Developers
•
•
•

By Developers, For Developers
REST APIs & SDKs
Streamlined Data Science Process with UIs
Be Open
•
•
•

Open Source
Github - github.com/predictionio
Supported by Mozilla WebFWD
For Production
challenge #1

Scalability
Big Data Bottlenecks

Machine Learning Processing
PredictionIO has a
horizontally scalable
architecture
challenge #2

Productivity
‣
‣

Conventional
Routine

Collecting Data
Picking an Algorithm

‣

Writing Scalable Code

‣

Evaluating Results

‣

Tuning Algorithm Parameters

‣

Iterate . . .
‣

Item Similarity

‣
‣

Algorithms

Collaborative Filtering (CF)

???

Item Recommendation

‣

kNN Item-based CF

???

‣

Threshold Item-based CF

???

‣

Alternating Least Squares

???

‣

SlopeOne

???

‣

Singular Value Decomposition

???
MapReduce
- Native Java

public class WordCount {
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, Context context) throws .....{
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
context.write(word, one);
}
}
}
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) { sum += val.get(); }
context.write(key, new IntWritable(sum));
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf, "wordcount");
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
}
‣

Evaluating Results

‣
‣

What else?

But how?

Tuning Algorithm Parameters

‣

Lambda

‣

Regularization

‣

# Iterations

‣

...
‣

Collecting Data

‣
‣

Using PredictionIO SDKs and API

Picking an Algorithm & Writing Scalable Code

Better
‣ Ready-to-use Algorithms
Routine
‣ Evaluating Results & Tuning Algorithm
with
Parameters
PredictionIO
‣ Ready-to-use Metrics
‣
‣

Baseline Algorithms

All of the Above Driven by a GUI (!)
Efficiently tune algorithms
as a developer.
Develop intuition with
how algorithms behave.
(Very) Light Intro to
kNN CF

Users
1 2 3 4 5 6 7 8 9 10
Predict user
rating on unseen
item

Items

Users and items

1

3

2
3

2

4
2

?

3
5

1

1

1

2

5

2

1

4

1

4

3

4
5

5

2

2

5
3

2

1

1
1
k-Nearest Neighbor
(kNN)
Find k most
similar items to
the targeted
items

•Cosine / Adjusted
Cosine

1 2 3 4 5 6 7 8 9 10

Items

•Pearson Correlation

Users

1

3

2

•Jaccard coefficient

3

Weight-sum the
known ratings of
the targeted
users on these
items

2

4
2

4

?

1

1

2

5

2

3

5

1

1

4

1

3

4
5

5

2

2

5
3

2

1

1
1
k-Nearest Neighbor
(kNN)

Users

Correlation as
Item Similarity:
Item 1 & 3:
R1 = [3, 5, 1, 4]
R3 = [3, 4, 2, 5]
Corr(R1, R3) = 0.8

Items

1 2 3 4 5 6 7 8 9 10
1

3

2
3

2
3

4
5

5

4
2

1

4

1

4
1

1

2

5

2

5
3

2

1
2

2

3
5

1

1
1
k-Nearest Neighbor
(kNN)
Sij:
similarity score of
item i and item j

(4 * 0.8 + 2 * 0.9)
/(0.8 + 0.9)
= 5 / 1.7
= 2.9

1 2 3 4 5 6 7 8 9 10

Items

Let S13 = 0.8, S35 =
0.9,
predicted score of
User 6 on Item 3
would be:

Users

1

3

2
3

2

4
2

?

3
5

1

1

1

2

5

2

1

4

1

4

3

4
5

5

2

2

5
3

2

1

1
1
PredictionIO In Action
Something is missing here...
“If you follow this startup, you may also
want to follow these X, Y, Z startups.”
How are we going to do this?

Data

+
What data do we need?

Users - the followers (AngelList users)
Items - the startups
Actions - the “follow” actions
What data do we need?
For this demo...
- Use AngelList API (https://angel.co/api)
- Retrieve all startups (Items) belonging to the following accelerators:

- Retrieve all users following these startups (Users and Actions).
- 16382 users. 993 items. 204176 actions.
What data do we need?
Generate these files...
users.csv - list of unique user ids
<user_id>

startup_id_name_url_incubator.csv - list of startups with ids, names, AngelList URL and
incubators
<startup_id>, <name>, <url>, <incubator>

follows.csv - list of ids for startups and users who follow the startup
<startup_id>, <user_id>
How to integrate with PredictionIO?
How to integrate with PredictionIO?
How to integrate with PredictionIO?
Python SDK is used in this demo (other SDKs are similar)....
# PredictionIO Client Object
client = predictionio.Client(“APP_KEY”)

# Import users and items
client.create_user(“user_id”)
client.create_item(“startup_id”, (“item_type_1”,))

# Example
client.create_user(“5”)
client.create_item(“17”, (“startup”, “y-combinator”,))
How to integrate with PredictionIO?
Python SDK is used in this demo (other SDKs are similar)....
# Import actions
client.identify(“user_id”)
client.record_action_on_item(“action_name”, “startup_id”)

PredictionIO comes with the following built-in actions:
“like”, “dislike”, “rate”, “view”, “conversion”

# Example
client.identify(“5”)
client.record_action_on_item(“like”, “17”) # use like to represent follow
How to integrate with PredictionIO?
Python SDK is used in this demo (other SDKs are similar)....
# More advanced usage
client = predictionio.Client(APP_KEY, THREADS, API_URL, qsize=REQUEST_QSIZE)

# Asynchronous version
client.acreate_user(“user_id”)
client.acreate_item(“item_id”, (“item_type_1”,))
client.arecord_action_on_item(“action_name”, “item_id”)
How to integrate with PredictionIO?
How to integrate with PredictionIO?
How to integrate with PredictionIO?
How to retrieve prediction results?
Python SDK is used in this demo (other SDKs are similar)....
# retrieve prediction results from engine
rec = client.get_itemsim_topn(“engine_name”, “startup_id”, num)

# Example
rec = client.get_itemsim_topn(“my-engine”, “17”, 10)

# Asynchronous version
req = client.aget_itemsim_topn(“engine_name”, “startup_id”, num)
# can do something else in between.....
rec = client.aresp(req)
Demo Setup
Retrieve prediction
results through REST
API
Batch
import data
through
Python SDK

+

Retrieve startups data
through REST API
Store and retrieve
startups data

https://github.com/PredictionIO/Demo-AngelList
Analytics platform for mobile & web.
E-commerce across social, mobile, and web
Data analysis tool

Analytics Backend as a Service (web + mobile + internet of things)
Production and distribution of visual content
Mobile Application Performance Monitoring
Mobile monetization

Payments for marketplaces and crowdfunding
Xbox Kinect for iPhone

Analytics Backend as a Service (web + mobile + internet of things)
Live shows
Youtube network for Artists

Analytics Backend as a Service (web + mobile + internet of things)
Production and distribution of visual content
Mobile Application Performance Monitoring
Analytics platform for mobile & web.
Mobile monetization
E-commerce across social, mobile, and web
Data analysis tool
Live shows
Youtube network for Artists
Payments for marketplaces and crowdfunding
Xbox Kinect for iPhone

Analytics Backend as a Service (web + mobile + internet of things)
How to try different algorithms?
How to try different algorithms?
How to try different algorithms?
How to change algorithm parameters?
How to change algorithm parameters?
How do we evaluate how good the
prediction results are?

We need metrics!
MAP@k
MAP@k

(Mean Average Precision at k)

For this user A...
Retrieve Top 5
items
(prediction):

Relevant items
(actual
behavior):

item4

item7

item10

item10

item6

item25

item15
item7

Total relevant items: 3
MAP@k

(Mean Average Precision at k)

For this user A...
Retrieve Top 5
items
(prediction):

Relevant items
(actual
behavior):

item4

item7

item10

item10

item6

item25

item15
item7

Total relevant items: 3

How good is this prediction?
MAP@k

(Mean Average Precision at k)

For this user A...

Retrieve Top 5
items
(prediction):
Position:

Number of
relevant items
up to this
position:

P@k:

item4

1

0

0

item10

2

1

1/2

item6

3

1

0

item15

4

1

0

item7

5

2

2/5

AP@5 = (1/2 + 2/5) / min(5,3)

Total relevant items: 3

MAP@5 = average of AP@5 of all users
ISMAP@k
- Extension to MAP@k for Item Similarity Engine
MAP@k
For each user....
Retrieve Top 5
items
(prediction):

Relevant items
(actual
behavior):

item4

item7

item10

item10

item6

item25

item15
item7

AP@5 = (1/2 + 2/5) / min(5,3)
MAP@5 = average of AP@5 of all users
ISMAP@k
For each item X...
For each user who “follows” this item X....
Retrieve Top 5 similar
items to X that the user
may also follow
(prediction):

Relevant items
(actual
behavior):

item4

item7

item10

item25

MAP@5 = average of AP@5 of all users

item10

item6

AP@5 = (1/2 + 2/5) / min(5,3)

ISMAP@5 = average of MAP@5 of all items

item15
item7

We have metrics, but how to evaluate?
Offline
Evaluation
Offline Evaluation
What does it mean by “relevant”? What’s the prediction goal?
Retrieve Top 5
items
(prediction):

Relevant items
(actual
behavior):

Get relevant
items
Test set

(Tested with unseen
data)
item4

item7

item10

item10

item6

Training set

item25

item15
item7
Retrieve
prediction

Training

(treated like
unseen)
Split

Data
(seen)
How to set the goal?
How to set the goal?
How do we evaluate how good the
prediction results are?
How do we evaluate how good the
prediction results are?
Sample evaluation scores

Default-Algo

0.0335

random

0.0039
‣

Predict
User
Preference

Smarter Ride-Sharing Apps

‣

Personalized Meal Recommendation

‣

Intelligent Online Course Discovery

‣

App Discovery

‣

and many more...
Getting
Started!

-

@PredictionIO
prediction.io
github.com/predictionio

Weitere ähnliche Inhalte

Was ist angesagt?

Introduction to GraphQL at API days
Introduction to GraphQL at API daysIntroduction to GraphQL at API days
Introduction to GraphQL at API daysyann_s
 
Do we need a bigger dev data culture
Do we need a bigger dev data cultureDo we need a bigger dev data culture
Do we need a bigger dev data cultureSimon Dittlmann
 
Postgres Conf Keynote: What got you here WILL get you there
Postgres Conf Keynote: What got you here WILL get you therePostgres Conf Keynote: What got you here WILL get you there
Postgres Conf Keynote: What got you here WILL get you thereAnant Jhingran
 
GraphQL, Redux, and React
GraphQL, Redux, and ReactGraphQL, Redux, and React
GraphQL, Redux, and ReactKeon Kim
 
Using Azure Machine Learning Models
Using Azure Machine Learning ModelsUsing Azure Machine Learning Models
Using Azure Machine Learning ModelsEng Teong Cheah
 
An introduction to Machine Learning with scikit-learn (October 2018)
An introduction to Machine Learning with scikit-learn (October 2018)An introduction to Machine Learning with scikit-learn (October 2018)
An introduction to Machine Learning with scikit-learn (October 2018)Julien SIMON
 
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML ToolkitAugmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML ToolkitDatabricks
 
SharePoint Saturday Atlanta 2015
SharePoint Saturday Atlanta 2015SharePoint Saturday Atlanta 2015
SharePoint Saturday Atlanta 2015Pushkar Chivate
 
MongoDB.local Berlin: Building a GraphQL API with MongoDB, Prisma and Typescript
MongoDB.local Berlin: Building a GraphQL API with MongoDB, Prisma and TypescriptMongoDB.local Berlin: Building a GraphQL API with MongoDB, Prisma and Typescript
MongoDB.local Berlin: Building a GraphQL API with MongoDB, Prisma and TypescriptMongoDB
 
MongoDB.local Berlin: App development in a Serverless World
MongoDB.local Berlin: App development in a Serverless WorldMongoDB.local Berlin: App development in a Serverless World
MongoDB.local Berlin: App development in a Serverless WorldMongoDB
 
Intershop Commerce Management with Microsoft SQL Server
Intershop Commerce Management with Microsoft SQL ServerIntershop Commerce Management with Microsoft SQL Server
Intershop Commerce Management with Microsoft SQL ServerMauro Boffardi
 
Leveraging APIs from SharePoint Framework solutions
Leveraging APIs from SharePoint Framework solutionsLeveraging APIs from SharePoint Framework solutions
Leveraging APIs from SharePoint Framework solutionsDragan Panjkov
 
Optimizing PV energy yield with Elasticsearch and graphQL
Optimizing PV energy yield with Elasticsearch and graphQLOptimizing PV energy yield with Elasticsearch and graphQL
Optimizing PV energy yield with Elasticsearch and graphQLChijioke “CJ” Ejimuda
 
Pathway to Cloud-Native .NET
Pathway to Cloud-Native .NETPathway to Cloud-Native .NET
Pathway to Cloud-Native .NETVMware Tanzu
 
SciVerse Application Integration Points
SciVerse Application Integration PointsSciVerse Application Integration Points
SciVerse Application Integration PointsElsevier Developers
 
Cloud Abstraction Libraries: Implementation and Comparison
Cloud Abstraction Libraries: Implementation and ComparisonCloud Abstraction Libraries: Implementation and Comparison
Cloud Abstraction Libraries: Implementation and ComparisonUdit Agarwal
 
Build a predictive analytics model on a terabyte of data within hours
Build a predictive analytics model on a terabyte of data within hoursBuild a predictive analytics model on a terabyte of data within hours
Build a predictive analytics model on a terabyte of data within hoursDataWorks Summit
 
About The Event-Driven Data Layer & Adobe Analytics
About The Event-Driven Data Layer & Adobe AnalyticsAbout The Event-Driven Data Layer & Adobe Analytics
About The Event-Driven Data Layer & Adobe AnalyticsKevin Haag
 

Was ist angesagt? (20)

Introduction to GraphQL at API days
Introduction to GraphQL at API daysIntroduction to GraphQL at API days
Introduction to GraphQL at API days
 
Do we need a bigger dev data culture
Do we need a bigger dev data cultureDo we need a bigger dev data culture
Do we need a bigger dev data culture
 
Postgres Conf Keynote: What got you here WILL get you there
Postgres Conf Keynote: What got you here WILL get you therePostgres Conf Keynote: What got you here WILL get you there
Postgres Conf Keynote: What got you here WILL get you there
 
GraphQL, Redux, and React
GraphQL, Redux, and ReactGraphQL, Redux, and React
GraphQL, Redux, and React
 
Using Azure Machine Learning Models
Using Azure Machine Learning ModelsUsing Azure Machine Learning Models
Using Azure Machine Learning Models
 
An introduction to Machine Learning with scikit-learn (October 2018)
An introduction to Machine Learning with scikit-learn (October 2018)An introduction to Machine Learning with scikit-learn (October 2018)
An introduction to Machine Learning with scikit-learn (October 2018)
 
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML ToolkitAugmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
 
the Indivo X iOS Framework
the Indivo X iOS Frameworkthe Indivo X iOS Framework
the Indivo X iOS Framework
 
Serverless GraphQL with AWS AppSync & AWS Amplify
Serverless GraphQL with AWS AppSync & AWS AmplifyServerless GraphQL with AWS AppSync & AWS Amplify
Serverless GraphQL with AWS AppSync & AWS Amplify
 
SharePoint Saturday Atlanta 2015
SharePoint Saturday Atlanta 2015SharePoint Saturday Atlanta 2015
SharePoint Saturday Atlanta 2015
 
MongoDB.local Berlin: Building a GraphQL API with MongoDB, Prisma and Typescript
MongoDB.local Berlin: Building a GraphQL API with MongoDB, Prisma and TypescriptMongoDB.local Berlin: Building a GraphQL API with MongoDB, Prisma and Typescript
MongoDB.local Berlin: Building a GraphQL API with MongoDB, Prisma and Typescript
 
MongoDB.local Berlin: App development in a Serverless World
MongoDB.local Berlin: App development in a Serverless WorldMongoDB.local Berlin: App development in a Serverless World
MongoDB.local Berlin: App development in a Serverless World
 
Intershop Commerce Management with Microsoft SQL Server
Intershop Commerce Management with Microsoft SQL ServerIntershop Commerce Management with Microsoft SQL Server
Intershop Commerce Management with Microsoft SQL Server
 
Leveraging APIs from SharePoint Framework solutions
Leveraging APIs from SharePoint Framework solutionsLeveraging APIs from SharePoint Framework solutions
Leveraging APIs from SharePoint Framework solutions
 
Optimizing PV energy yield with Elasticsearch and graphQL
Optimizing PV energy yield with Elasticsearch and graphQLOptimizing PV energy yield with Elasticsearch and graphQL
Optimizing PV energy yield with Elasticsearch and graphQL
 
Pathway to Cloud-Native .NET
Pathway to Cloud-Native .NETPathway to Cloud-Native .NET
Pathway to Cloud-Native .NET
 
SciVerse Application Integration Points
SciVerse Application Integration PointsSciVerse Application Integration Points
SciVerse Application Integration Points
 
Cloud Abstraction Libraries: Implementation and Comparison
Cloud Abstraction Libraries: Implementation and ComparisonCloud Abstraction Libraries: Implementation and Comparison
Cloud Abstraction Libraries: Implementation and Comparison
 
Build a predictive analytics model on a terabyte of data within hours
Build a predictive analytics model on a terabyte of data within hoursBuild a predictive analytics model on a terabyte of data within hours
Build a predictive analytics model on a terabyte of data within hours
 
About The Event-Driven Data Layer & Adobe Analytics
About The Event-Driven Data Layer & Adobe AnalyticsAbout The Event-Driven Data Layer & Adobe Analytics
About The Event-Driven Data Layer & Adobe Analytics
 

Ähnlich wie PredictionIO - Building Applications That Predict User Behavior Through Big Data Using Open-Source Technologies

Architecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystemArchitecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystemYael Garten
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemShirshanka Das
 
AppliFire Blue Print Design Guidelines
AppliFire Blue Print Design GuidelinesAppliFire Blue Print Design Guidelines
AppliFire Blue Print Design GuidelinesAppliFire Platform
 
Telecom datascience master_public
Telecom datascience master_publicTelecom datascience master_public
Telecom datascience master_publicVincent Michel
 
Recommendation Engine Powered by Hadoop
Recommendation Engine Powered by HadoopRecommendation Engine Powered by Hadoop
Recommendation Engine Powered by HadoopPranab Ghosh
 
Recommendation Engine Powered by Hadoop - Pranab Ghosh
Recommendation Engine Powered by Hadoop - Pranab GhoshRecommendation Engine Powered by Hadoop - Pranab Ghosh
Recommendation Engine Powered by Hadoop - Pranab GhoshBigDataCloud
 
Cloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataCloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataAbhishek M Shivalingaiah
 
Cracking web development
Cracking web developmentCracking web development
Cracking web developmentEyal Kenig
 
Anaconda and PyData Solutions
Anaconda and PyData SolutionsAnaconda and PyData Solutions
Anaconda and PyData SolutionsTravis Oliphant
 
An Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time ApplicationsAn Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time ApplicationsJohann Schleier-Smith
 
Building the BI system and analytics capabilities at the company based on Rea...
Building the BI system and analytics capabilities at the company based on Rea...Building the BI system and analytics capabilities at the company based on Rea...
Building the BI system and analytics capabilities at the company based on Rea...GameCamp
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Shirshanka Das
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Yael Garten
 
Using bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-REDUsing bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-REDLionel Mommeja
 
Comment transformer vos données en informations exploitables
Comment transformer vos données en informations exploitablesComment transformer vos données en informations exploitables
Comment transformer vos données en informations exploitablesElasticsearch
 
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...Provectus
 
Graph Gurus 15: Introducing TigerGraph 2.4
Graph Gurus 15: Introducing TigerGraph 2.4 Graph Gurus 15: Introducing TigerGraph 2.4
Graph Gurus 15: Introducing TigerGraph 2.4 TigerGraph
 
Implementing a data_science_project (Python Version)_part1
Implementing a data_science_project (Python Version)_part1Implementing a data_science_project (Python Version)_part1
Implementing a data_science_project (Python Version)_part1Dr Sulaimon Afolabi
 
Real time analytics @ netflix
Real time analytics @ netflixReal time analytics @ netflix
Real time analytics @ netflixCody Rioux
 

Ähnlich wie PredictionIO - Building Applications That Predict User Behavior Through Big Data Using Open-Source Technologies (20)

Architecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystemArchitecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystem
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
 
AppliFire Blue Print Design Guidelines
AppliFire Blue Print Design GuidelinesAppliFire Blue Print Design Guidelines
AppliFire Blue Print Design Guidelines
 
Telecom datascience master_public
Telecom datascience master_publicTelecom datascience master_public
Telecom datascience master_public
 
Recommendation Engine Powered by Hadoop
Recommendation Engine Powered by HadoopRecommendation Engine Powered by Hadoop
Recommendation Engine Powered by Hadoop
 
Recommendation Engine Powered by Hadoop - Pranab Ghosh
Recommendation Engine Powered by Hadoop - Pranab GhoshRecommendation Engine Powered by Hadoop - Pranab Ghosh
Recommendation Engine Powered by Hadoop - Pranab Ghosh
 
Cloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataCloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big Data
 
Cracking web development
Cracking web developmentCracking web development
Cracking web development
 
Anaconda and PyData Solutions
Anaconda and PyData SolutionsAnaconda and PyData Solutions
Anaconda and PyData Solutions
 
An Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time ApplicationsAn Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time Applications
 
Building the BI system and analytics capabilities at the company based on Rea...
Building the BI system and analytics capabilities at the company based on Rea...Building the BI system and analytics capabilities at the company based on Rea...
Building the BI system and analytics capabilities at the company based on Rea...
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
 
Using bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-REDUsing bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-RED
 
Comment transformer vos données en informations exploitables
Comment transformer vos données en informations exploitablesComment transformer vos données en informations exploitables
Comment transformer vos données en informations exploitables
 
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
 
Monitoring AI with AI
Monitoring AI with AIMonitoring AI with AI
Monitoring AI with AI
 
Graph Gurus 15: Introducing TigerGraph 2.4
Graph Gurus 15: Introducing TigerGraph 2.4 Graph Gurus 15: Introducing TigerGraph 2.4
Graph Gurus 15: Introducing TigerGraph 2.4
 
Implementing a data_science_project (Python Version)_part1
Implementing a data_science_project (Python Version)_part1Implementing a data_science_project (Python Version)_part1
Implementing a data_science_project (Python Version)_part1
 
Real time analytics @ netflix
Real time analytics @ netflixReal time analytics @ netflix
Real time analytics @ netflix
 

Kürzlich hochgeladen

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 

Kürzlich hochgeladen (20)

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

PredictionIO - Building Applications That Predict User Behavior Through Big Data Using Open-Source Technologies

  • 1. Donald Szeto Kenneth Chan Simon Chan support@prediction.io Big Data TechCon - Oct 17, 2013 Building Applications That Predict User Behavior Through Big Data Using Open-Source Technologies
  • 2. Machine Learning is.... computers learning to predict from data
  • 4. Existing data tools are like raw material
  • 6. Love Developers • • • By Developers, For Developers REST APIs & SDKs Streamlined Data Science Process with UIs
  • 7. Be Open • • • Open Source Github - github.com/predictionio Supported by Mozilla WebFWD
  • 10. Big Data Bottlenecks Machine Learning Processing
  • 11. PredictionIO has a horizontally scalable architecture
  • 12.
  • 14. ‣ ‣ Conventional Routine Collecting Data Picking an Algorithm ‣ Writing Scalable Code ‣ Evaluating Results ‣ Tuning Algorithm Parameters ‣ Iterate . . .
  • 15. ‣ Item Similarity ‣ ‣ Algorithms Collaborative Filtering (CF) ??? Item Recommendation ‣ kNN Item-based CF ??? ‣ Threshold Item-based CF ??? ‣ Alternating Least Squares ??? ‣ SlopeOne ??? ‣ Singular Value Decomposition ???
  • 16. MapReduce - Native Java public class WordCount { public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> { private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(LongWritable key, Text value, Context context) throws .....{ String line = value.toString(); StringTokenizer tokenizer = new StringTokenizer(line); while (tokenizer.hasMoreTokens()) { word.set(tokenizer.nextToken()); context.write(word, one); } } } public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> { public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { int sum = 0; for (IntWritable val : values) { sum += val.get(); } context.write(key, new IntWritable(sum)); } } public static void main(String[] args) throws Exception { Configuration conf = new Configuration(); Job job = new Job(conf, "wordcount"); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); job.setMapperClass(Map.class); job.setReducerClass(Reduce.class); job.setInputFormatClass(TextInputFormat.class); job.setOutputFormatClass(TextOutputFormat.class); FileInputFormat.addInputPath(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); job.waitForCompletion(true); } }
  • 17. ‣ Evaluating Results ‣ ‣ What else? But how? Tuning Algorithm Parameters ‣ Lambda ‣ Regularization ‣ # Iterations ‣ ...
  • 18.
  • 19. ‣ Collecting Data ‣ ‣ Using PredictionIO SDKs and API Picking an Algorithm & Writing Scalable Code Better ‣ Ready-to-use Algorithms Routine ‣ Evaluating Results & Tuning Algorithm with Parameters PredictionIO ‣ Ready-to-use Metrics ‣ ‣ Baseline Algorithms All of the Above Driven by a GUI (!)
  • 20.
  • 21.
  • 22.
  • 23. Efficiently tune algorithms as a developer. Develop intuition with how algorithms behave.
  • 24. (Very) Light Intro to kNN CF Users 1 2 3 4 5 6 7 8 9 10 Predict user rating on unseen item Items Users and items 1 3 2 3 2 4 2 ? 3 5 1 1 1 2 5 2 1 4 1 4 3 4 5 5 2 2 5 3 2 1 1 1
  • 25. k-Nearest Neighbor (kNN) Find k most similar items to the targeted items •Cosine / Adjusted Cosine 1 2 3 4 5 6 7 8 9 10 Items •Pearson Correlation Users 1 3 2 •Jaccard coefficient 3 Weight-sum the known ratings of the targeted users on these items 2 4 2 4 ? 1 1 2 5 2 3 5 1 1 4 1 3 4 5 5 2 2 5 3 2 1 1 1
  • 26. k-Nearest Neighbor (kNN) Users Correlation as Item Similarity: Item 1 & 3: R1 = [3, 5, 1, 4] R3 = [3, 4, 2, 5] Corr(R1, R3) = 0.8 Items 1 2 3 4 5 6 7 8 9 10 1 3 2 3 2 3 4 5 5 4 2 1 4 1 4 1 1 2 5 2 5 3 2 1 2 2 3 5 1 1 1
  • 27. k-Nearest Neighbor (kNN) Sij: similarity score of item i and item j (4 * 0.8 + 2 * 0.9) /(0.8 + 0.9) = 5 / 1.7 = 2.9 1 2 3 4 5 6 7 8 9 10 Items Let S13 = 0.8, S35 = 0.9, predicted score of User 6 on Item 3 would be: Users 1 3 2 3 2 4 2 ? 3 5 1 1 1 2 5 2 1 4 1 4 3 4 5 5 2 2 5 3 2 1 1 1
  • 29.
  • 31. “If you follow this startup, you may also want to follow these X, Y, Z startups.”
  • 32. How are we going to do this? Data +
  • 33. What data do we need? Users - the followers (AngelList users) Items - the startups Actions - the “follow” actions
  • 34. What data do we need? For this demo... - Use AngelList API (https://angel.co/api) - Retrieve all startups (Items) belonging to the following accelerators: - Retrieve all users following these startups (Users and Actions). - 16382 users. 993 items. 204176 actions.
  • 35. What data do we need? Generate these files... users.csv - list of unique user ids <user_id> startup_id_name_url_incubator.csv - list of startups with ids, names, AngelList URL and incubators <startup_id>, <name>, <url>, <incubator> follows.csv - list of ids for startups and users who follow the startup <startup_id>, <user_id>
  • 36. How to integrate with PredictionIO?
  • 37.
  • 38. How to integrate with PredictionIO?
  • 39. How to integrate with PredictionIO? Python SDK is used in this demo (other SDKs are similar).... # PredictionIO Client Object client = predictionio.Client(“APP_KEY”) # Import users and items client.create_user(“user_id”) client.create_item(“startup_id”, (“item_type_1”,)) # Example client.create_user(“5”) client.create_item(“17”, (“startup”, “y-combinator”,))
  • 40. How to integrate with PredictionIO? Python SDK is used in this demo (other SDKs are similar).... # Import actions client.identify(“user_id”) client.record_action_on_item(“action_name”, “startup_id”) PredictionIO comes with the following built-in actions: “like”, “dislike”, “rate”, “view”, “conversion” # Example client.identify(“5”) client.record_action_on_item(“like”, “17”) # use like to represent follow
  • 41. How to integrate with PredictionIO? Python SDK is used in this demo (other SDKs are similar).... # More advanced usage client = predictionio.Client(APP_KEY, THREADS, API_URL, qsize=REQUEST_QSIZE) # Asynchronous version client.acreate_user(“user_id”) client.acreate_item(“item_id”, (“item_type_1”,)) client.arecord_action_on_item(“action_name”, “item_id”)
  • 42. How to integrate with PredictionIO?
  • 43. How to integrate with PredictionIO?
  • 44. How to integrate with PredictionIO?
  • 45. How to retrieve prediction results? Python SDK is used in this demo (other SDKs are similar).... # retrieve prediction results from engine rec = client.get_itemsim_topn(“engine_name”, “startup_id”, num) # Example rec = client.get_itemsim_topn(“my-engine”, “17”, 10) # Asynchronous version req = client.aget_itemsim_topn(“engine_name”, “startup_id”, num) # can do something else in between..... rec = client.aresp(req)
  • 46. Demo Setup Retrieve prediction results through REST API Batch import data through Python SDK + Retrieve startups data through REST API Store and retrieve startups data https://github.com/PredictionIO/Demo-AngelList
  • 47.
  • 48. Analytics platform for mobile & web. E-commerce across social, mobile, and web Data analysis tool Analytics Backend as a Service (web + mobile + internet of things)
  • 49. Production and distribution of visual content Mobile Application Performance Monitoring Mobile monetization Payments for marketplaces and crowdfunding Xbox Kinect for iPhone Analytics Backend as a Service (web + mobile + internet of things)
  • 50. Live shows Youtube network for Artists Analytics Backend as a Service (web + mobile + internet of things)
  • 51. Production and distribution of visual content Mobile Application Performance Monitoring Analytics platform for mobile & web. Mobile monetization E-commerce across social, mobile, and web Data analysis tool Live shows Youtube network for Artists Payments for marketplaces and crowdfunding Xbox Kinect for iPhone Analytics Backend as a Service (web + mobile + internet of things)
  • 52. How to try different algorithms?
  • 53. How to try different algorithms?
  • 54. How to try different algorithms?
  • 55. How to change algorithm parameters?
  • 56. How to change algorithm parameters?
  • 57. How do we evaluate how good the prediction results are? We need metrics!
  • 58. MAP@k
  • 59. MAP@k (Mean Average Precision at k) For this user A... Retrieve Top 5 items (prediction): Relevant items (actual behavior): item4 item7 item10 item10 item6 item25 item15 item7 Total relevant items: 3
  • 60. MAP@k (Mean Average Precision at k) For this user A... Retrieve Top 5 items (prediction): Relevant items (actual behavior): item4 item7 item10 item10 item6 item25 item15 item7 Total relevant items: 3 How good is this prediction?
  • 61. MAP@k (Mean Average Precision at k) For this user A... Retrieve Top 5 items (prediction): Position: Number of relevant items up to this position: P@k: item4 1 0 0 item10 2 1 1/2 item6 3 1 0 item15 4 1 0 item7 5 2 2/5 AP@5 = (1/2 + 2/5) / min(5,3) Total relevant items: 3 MAP@5 = average of AP@5 of all users
  • 62. ISMAP@k - Extension to MAP@k for Item Similarity Engine
  • 63. MAP@k For each user.... Retrieve Top 5 items (prediction): Relevant items (actual behavior): item4 item7 item10 item10 item6 item25 item15 item7 AP@5 = (1/2 + 2/5) / min(5,3) MAP@5 = average of AP@5 of all users
  • 64. ISMAP@k For each item X... For each user who “follows” this item X.... Retrieve Top 5 similar items to X that the user may also follow (prediction): Relevant items (actual behavior): item4 item7 item10 item25 MAP@5 = average of AP@5 of all users item10 item6 AP@5 = (1/2 + 2/5) / min(5,3) ISMAP@5 = average of MAP@5 of all items item15 item7 We have metrics, but how to evaluate?
  • 66. Offline Evaluation What does it mean by “relevant”? What’s the prediction goal? Retrieve Top 5 items (prediction): Relevant items (actual behavior): Get relevant items Test set (Tested with unseen data) item4 item7 item10 item10 item6 Training set item25 item15 item7 Retrieve prediction Training (treated like unseen) Split Data (seen)
  • 67. How to set the goal?
  • 68. How to set the goal?
  • 69. How do we evaluate how good the prediction results are?
  • 70. How do we evaluate how good the prediction results are?
  • 72. ‣ Predict User Preference Smarter Ride-Sharing Apps ‣ Personalized Meal Recommendation ‣ Intelligent Online Course Discovery ‣ App Discovery ‣ and many more...