SlideShare ist ein Scribd-Unternehmen logo
1 von 18
Downloaden Sie, um offline zu lesen
Simon Chan
simon@prediction.io
Data Science London - April 24, 2013
Big Data Week
Machine Learning is....
computers learning to predict
from data
putting
Machine Learning
into practice
challenge #1
Scalability
Big Data Bottlenecks
Machine Learning Processing
PredictionIO has a
horizontally scalable
architecture
Async SDK
Client client = new Client(appkey);
// Adding user behaviors
req = client.getUserRateItemRequestBuilder(uid, iid, rate);
client.userRateItemAsFuture(req);
Play
Framework
‣ stateless - no server session
‣ non-blocking web request
Play: A Non-blocking Example
def index = Action {
val futureInt = scala.concurrent.Future { slowDataProcess() }
Async {
futureInt.map(i => Ok(views.html.result.render(i)))
}
}
MongoDB
‣ Read scaling: Replica Sets
‣ Write scaling: Sharding
‣ Indexes (e.g. geospatial)
{ geoSearch : "places", near : [33, 33],
maxDistance : 6, search : { uid : "user1" } }
Hadoop
Hadoop&
Cascading&(Java)&
Scalding&(Scala)&
MapReduce
- Native Java
public class WordCount {
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, Context context) throws .....{
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
context.write(word, one);
}
}
}
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) { sum += val.get(); }
context.write(key, new IntWritable(sum));
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf, "wordcount");
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
}
MapReduce
- Scalding
class ScaldingTestJob(args: Args) extends Job(args) {
Tsv(args(0), 'text)
.flatMap('text -> 'word) { text : String => text.split("s+") }
.groupBy('word) { _.size }
.write(Tsv(args(1))
}
Sample Code
### Sample PredictionIO Python SDK Code
client = predictionio.Client(appkey="<your app key>")
# Add Data
client.create_user(uid=”user123”)
client.create_item(iid=”itemXYZ”, itypes=(1,))
client.user_view_item(uid=”user123”, iid=”itemXYZ”)
# Get Prediction
rec = client.get_itemrec(engine="<engine name>", uid=”user123”, n=5)
Getting
Involved!
- @PredictionIO
- prediction.io - Newsletter
- github.com/predictionio
Q&A
Q: Selecting the right features is a big problem. Can PredictionIO solve this problem?
A: Not at this moment.That’s why we focus on collaborative filtering algorithms right now
which don’t require the use of features.And we believe that the involvement of data
scientists is needed for many specific problems. PredictionIO is positioned as a tool to
make their work easier, but not as a replacement.
Q: How’s PredictionIO different from Weka?
A:Weka, like Mahout, is a ML algorithm library.You can see PredictionIO as a layer on top
of it, which helps you to implement algorithm into production environment by providing a
complete infrastructure.
Q: How do you compare PredictionIO with RapidMiner?
A: RapidMiner is a great product to define data engineering workflow visually.
PredictionIO focuses on a different problem -- i.e. deploying ML solution into production
environment.
Q: How does the algorithm evaluation metrics work in PredictionIO?
A: At this moment, you can evaluate algorithms by some offline metrics, such as Mean
Average Precision, based on your existing data.
Q:What’s the business model?
A: We focus on making PredictionIO a useful open source product at this moment.

Weitere ähnliche Inhalte

Was ist angesagt?

An Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time ApplicationsAn Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time Applications
Johann Schleier-Smith
 
Driverless AI - Arno Candel, H2O.ai
Driverless AI - Arno Candel, H2O.aiDriverless AI - Arno Candel, H2O.ai
Driverless AI - Arno Candel, H2O.ai
Sri Ambati
 

Was ist angesagt? (20)

Современная архитектура Android-приложений - Archetype / Степан Гончаров (90 ...
Современная архитектура Android-приложений - Archetype / Степан Гончаров (90 ...Современная архитектура Android-приложений - Archetype / Степан Гончаров (90 ...
Современная архитектура Android-приложений - Archetype / Степан Гончаров (90 ...
 
Megan Kurka, H2O.ai - AutoDoc with H2O Driverless AI - H2O World 2019 NYC
Megan Kurka, H2O.ai - AutoDoc with H2O Driverless AI - H2O World 2019 NYCMegan Kurka, H2O.ai - AutoDoc with H2O Driverless AI - H2O World 2019 NYC
Megan Kurka, H2O.ai - AutoDoc with H2O Driverless AI - H2O World 2019 NYC
 
AppSync and GraphQL on iOS
AppSync and GraphQL on iOSAppSync and GraphQL on iOS
AppSync and GraphQL on iOS
 
Introducing AWS AppSync: serverless data driven apps with real-time and offli...
Introducing AWS AppSync: serverless data driven apps with real-time and offli...Introducing AWS AppSync: serverless data driven apps with real-time and offli...
Introducing AWS AppSync: serverless data driven apps with real-time and offli...
 
Using Azure Machine Learning Models
Using Azure Machine Learning ModelsUsing Azure Machine Learning Models
Using Azure Machine Learning Models
 
Intershop Commerce Management with Microsoft SQL Server
Intershop Commerce Management with Microsoft SQL ServerIntershop Commerce Management with Microsoft SQL Server
Intershop Commerce Management with Microsoft SQL Server
 
Supercharging Applications with GraphQL and AWS AppSync
Supercharging Applications with GraphQL and AWS AppSyncSupercharging Applications with GraphQL and AWS AppSync
Supercharging Applications with GraphQL and AWS AppSync
 
How BigQuery broke my heart
How BigQuery broke my heartHow BigQuery broke my heart
How BigQuery broke my heart
 
Optimizing PV energy yield with Elasticsearch and graphQL
Optimizing PV energy yield with Elasticsearch and graphQLOptimizing PV energy yield with Elasticsearch and graphQL
Optimizing PV energy yield with Elasticsearch and graphQL
 
Redshift VS BigQuery
Redshift VS BigQueryRedshift VS BigQuery
Redshift VS BigQuery
 
BigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLBigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQL
 
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
 
An Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time ApplicationsAn Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time Applications
 
Pathway to Cloud-Native .NET
Pathway to Cloud-Native .NETPathway to Cloud-Native .NET
Pathway to Cloud-Native .NET
 
Analytics in the Cloud
Analytics in the CloudAnalytics in the Cloud
Analytics in the Cloud
 
Driverless AI - Arno Candel, H2O.ai
Driverless AI - Arno Candel, H2O.aiDriverless AI - Arno Candel, H2O.ai
Driverless AI - Arno Candel, H2O.ai
 
Performance optimisation with GraphQL
Performance optimisation with GraphQLPerformance optimisation with GraphQL
Performance optimisation with GraphQL
 
Big objects in Salesforce Technology
Big objects in Salesforce TechnologyBig objects in Salesforce Technology
Big objects in Salesforce Technology
 
Agile Machine Learning for Real-time Recommender Systems
Agile Machine Learning for Real-time Recommender SystemsAgile Machine Learning for Real-time Recommender Systems
Agile Machine Learning for Real-time Recommender Systems
 
30 days of google cloud event
30 days of google cloud event30 days of google cloud event
30 days of google cloud event
 

Andere mochten auch

Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)
Nikhil Garg
 
Applying Machine Learning to Software Clustering
Applying Machine Learning to Software ClusteringApplying Machine Learning to Software Clustering
Applying Machine Learning to Software Clustering
butest
 
Co-occurrence Based Recommendations with Mahout, Scala and Spark
Co-occurrence Based Recommendations with Mahout, Scala and SparkCo-occurrence Based Recommendations with Mahout, Scala and Spark
Co-occurrence Based Recommendations with Mahout, Scala and Spark
sscdotopen
 
201203 Adaptive Empathetic Software
201203 Adaptive Empathetic Software201203 Adaptive Empathetic Software
201203 Adaptive Empathetic Software
Javier Gonzalez-Sanchez
 

Andere mochten auch (19)

PredictionIO - The 1st International Conference on Predictive APIs and Apps
PredictionIO - The 1st International Conference on Predictive APIs and AppsPredictionIO - The 1st International Conference on Predictive APIs and Apps
PredictionIO - The 1st International Conference on Predictive APIs and Apps
 
Machine Learning & Ecommerce - by David Jones - PAPIs Connect
Machine Learning & Ecommerce - by David Jones - PAPIs ConnectMachine Learning & Ecommerce - by David Jones - PAPIs Connect
Machine Learning & Ecommerce - by David Jones - PAPIs Connect
 
Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)
 
Инфраструктура как услуга (IaaS) в Windows Azure
Инфраструктура как услуга (IaaS) в Windows AzureИнфраструктура как услуга (IaaS) в Windows Azure
Инфраструктура как услуга (IaaS) в Windows Azure
 
Microsoft Azure - введение в основные сервисы для разработки и инфраструктуры...
Microsoft Azure - введение в основные сервисы для разработки и инфраструктуры...Microsoft Azure - введение в основные сервисы для разработки и инфраструктуры...
Microsoft Azure - введение в основные сервисы для разработки и инфраструктуры...
 
Презентация MS Azure
Презентация MS AzureПрезентация MS Azure
Презентация MS Azure
 
Naive application of Machine Learning to Software Development
Naive application of Machine Learning to Software DevelopmentNaive application of Machine Learning to Software Development
Naive application of Machine Learning to Software Development
 
Applying Machine Learning to Software Clustering
Applying Machine Learning to Software ClusteringApplying Machine Learning to Software Clustering
Applying Machine Learning to Software Clustering
 
Discovery
DiscoveryDiscovery
Discovery
 
Pragmatic machine learning for the real world
Pragmatic machine learning for the real worldPragmatic machine learning for the real world
Pragmatic machine learning for the real world
 
Setting up a Machine Learning Platform - Monitoring social media the “smart” way
Setting up a Machine Learning Platform - Monitoring social media the “smart” waySetting up a Machine Learning Platform - Monitoring social media the “smart” way
Setting up a Machine Learning Platform - Monitoring social media the “smart” way
 
Seldon - Open Sourcing a Predictive API - Data Science London #ds_ldn
Seldon - Open Sourcing a Predictive API - Data Science London #ds_ldnSeldon - Open Sourcing a Predictive API - Data Science London #ds_ldn
Seldon - Open Sourcing a Predictive API - Data Science London #ds_ldn
 
Co-occurrence Based Recommendations with Mahout, Scala and Spark
Co-occurrence Based Recommendations with Mahout, Scala and SparkCo-occurrence Based Recommendations with Mahout, Scala and Spark
Co-occurrence Based Recommendations with Mahout, Scala and Spark
 
Big wins with small data. PredictionIO in ecommerce - David Jones
Big wins with small data. PredictionIO in ecommerce - David JonesBig wins with small data. PredictionIO in ecommerce - David Jones
Big wins with small data. PredictionIO in ecommerce - David Jones
 
Prediction io–final 2014-jp-handout
Prediction io–final 2014-jp-handoutPrediction io–final 2014-jp-handout
Prediction io–final 2014-jp-handout
 
Machine Learning system architecture – Microsoft Translator, a Case Study : ...
Machine Learning system architecture – Microsoft Translator, a Case Study :  ...Machine Learning system architecture – Microsoft Translator, a Case Study :  ...
Machine Learning system architecture – Microsoft Translator, a Case Study : ...
 
201203 Adaptive Empathetic Software
201203 Adaptive Empathetic Software201203 Adaptive Empathetic Software
201203 Adaptive Empathetic Software
 
AI For Enterprise
AI For EnterpriseAI For Enterprise
AI For Enterprise
 
The Universal Recommender
The Universal RecommenderThe Universal Recommender
The Universal Recommender
 

Ähnlich wie PredictionIO - Scalable Machine Learning Architecture

Evolving your Data Access with MongoDB Stitch
Evolving your Data Access with MongoDB StitchEvolving your Data Access with MongoDB Stitch
Evolving your Data Access with MongoDB Stitch
MongoDB
 
Architecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystemArchitecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystem
Yael Garten
 

Ähnlich wie PredictionIO - Scalable Machine Learning Architecture (20)

GDSC Backend Bootcamp.pptx
GDSC Backend Bootcamp.pptxGDSC Backend Bootcamp.pptx
GDSC Backend Bootcamp.pptx
 
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
Are API Services Taking Over All the Interesting Data Science Problems?
Are API Services Taking Over All the Interesting Data Science Problems?Are API Services Taking Over All the Interesting Data Science Problems?
Are API Services Taking Over All the Interesting Data Science Problems?
 
Coding Naked 2023
Coding Naked 2023Coding Naked 2023
Coding Naked 2023
 
How We Built a Mobile Electronic Health Record App Using Xamarin, Angular, an...
How We Built a Mobile Electronic Health Record App Using Xamarin, Angular, an...How We Built a Mobile Electronic Health Record App Using Xamarin, Angular, an...
How We Built a Mobile Electronic Health Record App Using Xamarin, Angular, an...
 
Evolving your Data Access with MongoDB Stitch
Evolving your Data Access with MongoDB StitchEvolving your Data Access with MongoDB Stitch
Evolving your Data Access with MongoDB Stitch
 
Mobile optimization
Mobile optimizationMobile optimization
Mobile optimization
 
Abhishek_Kumar
Abhishek_KumarAbhishek_Kumar
Abhishek_Kumar
 
Developing Next-Gen Enterprise Web Application
Developing Next-Gen Enterprise Web ApplicationDeveloping Next-Gen Enterprise Web Application
Developing Next-Gen Enterprise Web Application
 
Yufeng Guo | Coding the 7 steps of machine learning | Codemotion Madrid 2018
Yufeng Guo |  Coding the 7 steps of machine learning | Codemotion Madrid 2018 Yufeng Guo |  Coding the 7 steps of machine learning | Codemotion Madrid 2018
Yufeng Guo | Coding the 7 steps of machine learning | Codemotion Madrid 2018
 
MongoDB.local Atlanta: Introduction to Serverless MongoDB
MongoDB.local Atlanta: Introduction to Serverless MongoDBMongoDB.local Atlanta: Introduction to Serverless MongoDB
MongoDB.local Atlanta: Introduction to Serverless MongoDB
 
Clean Architecture @ Taxibeat
Clean Architecture @ TaxibeatClean Architecture @ Taxibeat
Clean Architecture @ Taxibeat
 
I Know It Was MEAN, But I Cut the Cord to LAMP Anyway
I Know It Was MEAN, But I Cut the Cord to LAMP AnywayI Know It Was MEAN, But I Cut the Cord to LAMP Anyway
I Know It Was MEAN, But I Cut the Cord to LAMP Anyway
 
Large scale data capture and experimentation platform at Grab
Large scale data capture and experimentation platform at GrabLarge scale data capture and experimentation platform at Grab
Large scale data capture and experimentation platform at Grab
 
Sufan presentation
Sufan presentationSufan presentation
Sufan presentation
 
MongoDB World 2018: Ch-Ch-Ch-Ch-Changes: Taking Your Stitch Application to th...
MongoDB World 2018: Ch-Ch-Ch-Ch-Changes: Taking Your Stitch Application to th...MongoDB World 2018: Ch-Ch-Ch-Ch-Changes: Taking Your Stitch Application to th...
MongoDB World 2018: Ch-Ch-Ch-Ch-Changes: Taking Your Stitch Application to th...
 
Architecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystemArchitecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystem
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
 
Do we need a bigger dev data culture
Do we need a bigger dev data cultureDo we need a bigger dev data culture
Do we need a bigger dev data culture
 
Evolving your Data Access with MongoDB Stitch - Drew Di Palma
Evolving your Data Access with MongoDB Stitch - Drew Di PalmaEvolving your Data Access with MongoDB Stitch - Drew Di Palma
Evolving your Data Access with MongoDB Stitch - Drew Di Palma
 

Kürzlich hochgeladen

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Kürzlich hochgeladen (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

PredictionIO - Scalable Machine Learning Architecture

  • 1. Simon Chan simon@prediction.io Data Science London - April 24, 2013 Big Data Week
  • 2. Machine Learning is.... computers learning to predict from data
  • 5. Big Data Bottlenecks Machine Learning Processing
  • 6. PredictionIO has a horizontally scalable architecture
  • 7.
  • 8. Async SDK Client client = new Client(appkey); // Adding user behaviors req = client.getUserRateItemRequestBuilder(uid, iid, rate); client.userRateItemAsFuture(req);
  • 9. Play Framework ‣ stateless - no server session ‣ non-blocking web request
  • 10. Play: A Non-blocking Example def index = Action { val futureInt = scala.concurrent.Future { slowDataProcess() } Async { futureInt.map(i => Ok(views.html.result.render(i))) } }
  • 11. MongoDB ‣ Read scaling: Replica Sets ‣ Write scaling: Sharding ‣ Indexes (e.g. geospatial) { geoSearch : "places", near : [33, 33], maxDistance : 6, search : { uid : "user1" } }
  • 13. MapReduce - Native Java public class WordCount { public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> { private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(LongWritable key, Text value, Context context) throws .....{ String line = value.toString(); StringTokenizer tokenizer = new StringTokenizer(line); while (tokenizer.hasMoreTokens()) { word.set(tokenizer.nextToken()); context.write(word, one); } } } public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> { public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { int sum = 0; for (IntWritable val : values) { sum += val.get(); } context.write(key, new IntWritable(sum)); } } public static void main(String[] args) throws Exception { Configuration conf = new Configuration(); Job job = new Job(conf, "wordcount"); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); job.setMapperClass(Map.class); job.setReducerClass(Reduce.class); job.setInputFormatClass(TextInputFormat.class); job.setOutputFormatClass(TextOutputFormat.class); FileInputFormat.addInputPath(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); job.waitForCompletion(true); } }
  • 14. MapReduce - Scalding class ScaldingTestJob(args: Args) extends Job(args) { Tsv(args(0), 'text) .flatMap('text -> 'word) { text : String => text.split("s+") } .groupBy('word) { _.size } .write(Tsv(args(1)) }
  • 16. ### Sample PredictionIO Python SDK Code client = predictionio.Client(appkey="<your app key>") # Add Data client.create_user(uid=”user123”) client.create_item(iid=”itemXYZ”, itypes=(1,)) client.user_view_item(uid=”user123”, iid=”itemXYZ”) # Get Prediction rec = client.get_itemrec(engine="<engine name>", uid=”user123”, n=5)
  • 17. Getting Involved! - @PredictionIO - prediction.io - Newsletter - github.com/predictionio
  • 18. Q&A Q: Selecting the right features is a big problem. Can PredictionIO solve this problem? A: Not at this moment.That’s why we focus on collaborative filtering algorithms right now which don’t require the use of features.And we believe that the involvement of data scientists is needed for many specific problems. PredictionIO is positioned as a tool to make their work easier, but not as a replacement. Q: How’s PredictionIO different from Weka? A:Weka, like Mahout, is a ML algorithm library.You can see PredictionIO as a layer on top of it, which helps you to implement algorithm into production environment by providing a complete infrastructure. Q: How do you compare PredictionIO with RapidMiner? A: RapidMiner is a great product to define data engineering workflow visually. PredictionIO focuses on a different problem -- i.e. deploying ML solution into production environment. Q: How does the algorithm evaluation metrics work in PredictionIO? A: At this moment, you can evaluate algorithms by some offline metrics, such as Mean Average Precision, based on your existing data. Q:What’s the business model? A: We focus on making PredictionIO a useful open source product at this moment.