Spark tutorial pycon 2016 part 1

Build a Machine Learning model with Apache Spark MLLib to predict flight delays using weather data

Published in: Data & Analytics

  1. David Taieb, STSM - IBM Cloud Data Services, Developer advocate, david_taieb@us.ibm.com
     HANDS-ON SESSION: DEVELOPING ANALYTIC APPLICATIONS USING APACHE SPARK™ AND PYTHON
     Part 1: Flight Delay Predict with Spark ML
     PyCon 2016, Portland
  2. Agenda
     •  Pre-requisite steps to be completed before the session
     •  Flight Predict app description and architecture
     •  Train the models in the Notebook
     •  Accuracy analysis and model refinement
     •  Deploy and run the models
     ©2016 IBM Corporation
  3. Sign up for Bluemix
     •  Access the IBM Bluemix website at https://console.ng.bluemix.net
     •  Click Get Started for Free
     •  Complete the form and click Create account
     •  Look for the confirmation email and click the link to confirm your account
     Next: Sign up for FlightStats
  4. Sign up for a free trial at Flightstats.com
     •  Sign up at https://developer.flightstats.com/signup
     •  Fill out the form and watch your email for the confirmation link (access to the APIs may take up to 24 hours)
     •  Once access is granted, go to https://developer.flightstats.com/admin/applications to view your appId and appKey (you will need them in the simple-data-pipe tool to create the training sets)
     •  Optional: get familiar with the FlightStats APIs:
        –  https://developer.flightstats.com/api-docs/scheduledFlights/v1
        –  https://developer.flightstats.com/api-docs/airports/v1
     Next: How to find your app id and key
  5. Where to find the FlightStats app id and app key
     Callouts: APP ID; APP Key
     Next: Prepare your Bluemix space
  6. Create a new space on Bluemix
     In preparation for running the project, we create a new space on Bluemix.
     Next: Create a Spark Instance (optional: you can skip this step if you already have a space with a Spark instance that you would like to reuse)
  7. Create a Spark Instance
     Optional: you can skip this step if you already have a space with a Spark instance that you would like to reuse
  8. Create New Spark Instance
     Optional: you can skip this step if you already have a space with a Spark instance that you would like to reuse
  9. Agenda (repeated from slide 2)
  10. Flight App Project Description
      •  Use case
         –  Flight delays are a common disturbance during business trips
         –  Being able to predict how likely a flight is to be delayed removes uncertainty and lets users plan around it
         –  Idea: weather data can be a good explanatory variable for building predictive models
      •  Implementation
         –  Combine flight statistics from flightstats.com (system of record) with weather data from IBM Insight for Weather (system of operations) to build training, test and blind sets
         –  Use Spark MLlib to train predictive models and cross-validate them
         –  Create a custom card for Google Now that automatically notifies the user of an impending flight delay
         –  Propose alternative flight routes (e.g. Freebird)
      Next: Get/Build/Analyze
  11. Get/Build/Analyze methodology
  12. Flight Predict App Architecture
      Diagram labels: Weather; Simple Data Pipes; Airports; Flight Schedules; Flight Status; Metadata; Training Set; Test Set; Blind Set; Custom Connector run every 24 hours; Notebook
  13. Flow Diagram
      Diagram labels: Data Acquisition; Data Preparation; Data Annotation (Ground Truth); Model Training; Cleansing, Shaping, Enrichment; Model Testing; Training Set, Test Set, Blind Set; Iterative Cross-Validation; Evaluate Performance and optimize model; Train Model; Deploy and Run Model
      •  Iterative in nature: we are never done!
      •  We will use this diagram as a roadmap throughout this course
  14. Get the data and build the training/test/blind sets
      In this step we'll use the Simple Data Pipes open source project to acquire data from FlightStats, combine it with weather data from IBM Insight for Weather, and save the data sets into a Cloudant NoSQL database.
      [Flow diagram from slide 13]
  15. Acquiring the data
      •  In the next section, we show how to acquire the training data by using the simple-data-pipe tool and the flight predict connector
      •  The flight predict connector combines historical flight data from flightstats.com with weather data from IBM Insight for Weather
      •  If you want to skip these steps, you can use the already built dataset with the following credentials:
         –  cloudantHost: dtaieb.cloudant.com
         –  cloudantUserName: weenesserliffircedinvers
         –  cloudantPassword: 72a5c4f939a9e2578698029d2bb041d775d088b5
      Next: Deploy simple-data-pipe
  16. Deploy simple-data-pipe with flightstats connector
      •  Go to https://github.com/ibm-cds-labs/simple-data-pipe
      •  Click the Deploy to Bluemix button (clicking the button takes you to Bluemix)
  17. Complete simple-data-pipe deployment
      Next: Add Weather service
  18. Add an instance of IBM Weather Service on Bluemix
      •  Return to the application dashboard
      •  The Weather service is required by the flight predict connector and must be installed first
      •  From the app dashboard, click Add a service or API
  19. Create an instance of IBM Weather Service on Bluemix
      •  Search for Weather
      •  Make sure to select the “premium plan” to have enough authorized API calls
  20. Checkpoint: simple data pipe app dashboard
      •  Verify that your app is correctly bound to the right services:
         –  Weather Service, used to enrich flight records with weather observations
         –  Cloudant Service, used to store the training, test and blind data sets
      •  You'll need to click on this button for the step on the next page
      •  It is recommended to increase the app memory to 1GB
  21. Install flight predict connector
      •  Click the Edit Code button, then edit package.json to add the flight predict module to dependencies:
         –  "simple-data-pipe-connector-flightstats":"git://github.com/ibm-cds-labs/simple-data-pipe-connector-flightstats.git"
      •  Don't forget to add a comma at the end of the preceding line to keep the JSON valid
      •  Save your changes
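The package.json change on slide 21 can be sketched in Python. This is only an illustration of the edit, not part of the tutorial: the slide performs it by hand in the Bluemix editor. The sample package.json content and the helper name `add_connector` are hypothetical; the dependency name and Git URL are the ones given on the slide.

```python
import json

# Dependency entry taken from slide 21
CONNECTOR_NAME = "simple-data-pipe-connector-flightstats"
CONNECTOR_URL = "git://github.com/ibm-cds-labs/simple-data-pipe-connector-flightstats.git"


def add_connector(package_json_text):
    """Return package.json text with the flight predict connector in dependencies."""
    package = json.loads(package_json_text)
    # Create the dependencies section if it is missing, then add the connector
    package.setdefault("dependencies", {})[CONNECTOR_NAME] = CONNECTOR_URL
    # Serializing with json.dumps places the commas correctly, which is the
    # step the slide warns about when editing by hand
    return json.dumps(package, indent=2)


# Hypothetical, minimal package.json used only for illustration
sample = '{"name": "simple-data-pipe", "dependencies": {"express": "^4.0.0"}}'
updated = add_connector(sample)
```

Parsing and re-serializing the file avoids the missing-comma mistake entirely, at the cost of reformatting the file.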
  22. Install flight predict connector
      •  Click File/Save to save your changes
      Next: Redeploy simple data pipe
  23. Redeploy simple data pipe app
      •  Use the Live Edit editor to redeploy the app
      Next: Verify your sdp install
  24. Verify connector install
      •  In this step, we verify through the UI that the flight predict connector is correctly installed
      Callout: Flight connector correctly installed
      Next: Create new flightstats pipe
  25. Create a new FlightStats pipe
      •  Follow each screen to create and configure a new pipe
      Next: Run the pipe
  26. Run the pipe
      •  Skip over the schedule tab
      •  In the activity tab, click Run Now to start the pipe, then open the log to monitor the activity
      Next: Explore the data set
  27. Explore the data sets
      •  In this step, we take a moment to explore the different data sets created by the simple data pipe tool
      •  From the Bluemix dashboard, click the Cloudant service tile, then the Launch button
      •  From the Cloudant dashboard, open the training database
      •  Open a document to look at the data structure
      Next: Build the test set
  28. Run the pipe again to build the test set
      Next: Train the models
  29. Train the Models
      •  In the previous section we created the training data; we are now ready to train the models
      •  Steps in this section:
         –  Create an IPython Notebook
         –  Load the data sets from the Cloudant database into a Spark cluster
         –  Explore the data and train the machine learning models
      [Flow diagram from slide 13]
      Next: Create IPython Notebook
  30. Create a new IPython Notebook
  31. Notebook tour
  32. Notebook tour: Notebook Info
  33. Notebook tour: Environment
  34. Notebook tour: Sharing
  35. Agenda (repeated from slide 2)
  36. Before we start building the app…
      •  You can optionally follow this tutorial from GitHub by using a fully built notebook:
         –  https://github.com/ibm-cds-labs/simple-data-pipe-connector-flightstats/blob/master/notebook/Flight%20Predict%20PyCon%202016.ipynb
  37. Optional: use prebuilt notebook
      •  Create notebook from URL
      •  Use https://github.com/ibm-cds-labs/simple-data-pipe-connector-flightstats/raw/master/notebook/Flight%20Predict%20PyCon%202016.ipynb
      Next: Import required Python packages
  38. Using Python Packages
      •  Write code inline within cells
      •  Encapsulate helper APIs within a Python package
      •  2 ways of using helper Python packages
         –  egg distribution package: pip install from a PyPI server or file server (e.g. GitHub)
            •  Persistent install across sessions
            •  Recommended in production
         –  SparkContext.addPyFile
            •  Easy addition of a Python module file
            •  Supports multiple module files via zip format
            •  Recommended during development, where frequent code changes occur
      Next: Manage egg packages
  39. Flight Predict Python Package on Github
      Callouts: Setup script for installing Python Package; Flight Predict Python library
  40. Method 1: Install Flight Predict Package
      •  Use pip to install the Flight Predict package
      •  Recommended alternative: build an egg distribution package and deploy it to PyPI
  41. Manage Python packages
      •  Check status
      •  Uninstall package
      Next: Install packages via sc.addPyFile method
  42. Method 2: Install py modules via sc.addPyFile
      •  addPyFile installs individual py modules and makes them available to all executor processes
      •  Works with modules in zipped files
      Callouts: Module containing APIs for training the models; Module containing APIs for running the models
      Next: Configure credentials for various services
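As a rough sketch of Method 2, the helper below bundles several module files into one zip archive and registers it with a SparkContext. `SparkContext.addPyFile` is the real Spark API named on the slide; the function names `zip_modules` and `ship_modules` and any file names passed to them are assumptions made for illustration, not the names used in the Flight Predict package.

```python
import os
import tempfile
import zipfile


def zip_modules(module_paths, zip_path):
    """Bundle individual .py files into one zip archive, stored at the archive root."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in module_paths:
            # arcname keeps only the file name so the module is importable by name
            archive.write(path, arcname=os.path.basename(path))
    return zip_path


def ship_modules(sc, module_paths, zip_path):
    """Zip the modules and make them available to the driver and all executors.

    `sc` is an existing SparkContext, such as the one a notebook provides.
    """
    sc.addPyFile(zip_modules(module_paths, zip_path))
```

In a notebook this would be called as `ship_modules(sc, ["training.py", "run.py"], "helpers.zip")`, where both file names are placeholders. Re-running the cell after a code change is what makes this route convenient during development.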
  43. Setup credentials and Import required python modules
      In this step, we import the Python modules that will be needed throughout the notebook and set up credentials for the various services.
      Callouts: Credentials for Cloudant NoSQL Service; Credentials for Weather Service
      Next: How to get credentials for Cloudant and Weather
  44. Get Credentials for Cloudant
      From the app dashboard, click Environment Variables in the left sidebar
  45. Get Credentials for Weather
      Next: Load training set from Cloudant
  46. Load training set in Spark SQL DataFrame
      …
      •  In this step, we use the cloudant-spark connector (https://github.com/cloudant-labs/spark-cloudant) to load data into Spark
      •  Make sure to change the db name to match the one created for your training set by your activity (open the Cloudant dashboard to find the name)
  47. Loading data: Behind the scene
      Callouts: Use Spark SQL connector to load data into a DataFrame; connector id; Options; Cache data for optimized reuse; Create temp SQL Table
      Next: Scatter Plot Visualization
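The four callouts on slide 47 (connector id, options, cache, temp table) might look roughly like the sketch below. The connector id `com.cloudant.spark` and the `cloudant.host`, `cloudant.username` and `cloudant.password` option keys follow the spark-cloudant project's README as best recalled and should be treated as assumptions to check against the connector version in use; the function name and its parameters are hypothetical. `registerTempTable` is the Spark 1.x DataFrame call.

```python
def load_cloudant_dataframe(sql_context, host, username, password, db_name, table_name):
    """Load a Cloudant database into a cached DataFrame and expose it to Spark SQL."""
    reader = (
        sql_context.read.format("com.cloudant.spark")  # connector id (assumed)
        .option("cloudant.host", host)                 # connection options (assumed keys)
        .option("cloudant.username", username)
        .option("cloudant.password", password)
    )
    df = reader.load(db_name)
    # Cache the data: it is reused by every model trained later on
    df.cache()
    # Register a temporary table so the data can be queried with SQL
    df.registerTempTable(table_name)
    return df
```

A call would pass the notebook's `sqlContext`, the Cloudant credentials, and the database name created by the pipe run, which has to be looked up in the Cloudant dashboard as noted on slide 46.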
  48. Scatter plot visualization
  49. Visualization api
      Next: Create an RDD of LabeledPoint
  50. Transform into an RDD of LabeledPoint
      Callout: Use Spark SQL connector to load data into a DataFrame
  51. loadLabeledDataRDD api
      Next: Train Machine Learning Models
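The transcript does not show the body of loadLabeledDataRDD, so the following is only a guess at the general shape of such a transformation, not the package's implementation. `LabeledPoint` is the real class from `pyspark.mllib.regression`; the `classification` field is the one slide 69 names as the default label source; the helper names and the list of feature field names are hypothetical.

```python
def to_label_and_features(record, feature_names):
    """Split one record (a dict) into a numeric label and a list of numeric features."""
    label = float(record["classification"])
    features = [float(record[name]) for name in feature_names]
    return label, features


def to_labeled_point_rdd(df, feature_names):
    """Map every row of a Spark DataFrame to an MLlib LabeledPoint."""
    # Imported lazily so the pure-Python helper above works without Spark installed
    from pyspark.mllib.regression import LabeledPoint

    def convert(row):
        label, features = to_label_and_features(row.asDict(), feature_names)
        return LabeledPoint(label, features)

    return df.rdd.map(convert)
```

The resulting RDD is the input format expected by MLlib trainers such as `NaiveBayes.train` or `DecisionTree.trainClassifier`.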
  52. Machine Learning Algorithms
      •  Supervised Learning (requires Ground-Truth)
         –  Continuous Output: Regression (Linear, Ridge, Lasso, Isotonic); Decision Tree; RandomForest; GradientBoostedTree
         –  Discrete Output: Classification (Logistic Regression, SVM, NaiveBayes); Decision Tree; RandomForest; GradientBoostedTree; K-NN (available as add-on Spark package)
      •  Unsupervised Learning (no Ground-Truth data required)
         –  Clustering (KMeans, Gaussian Mixture)
         –  Dimensionality Reduction (PCA, SVD)
         –  FP-Growth
      Next: Train Logistic Regression Model
  53. Train Logistic Regression Model
      Next: Train Naïve Bayes Models
  54. Train NaiveBayes Model
      Next: Train Decision Tree Model
  55. Train Decision Tree Model
      Next: Train Random Forest Model
  56. Train Random Forest Model
      Next: Accuracy Analysis
  57. Naïve Bayes vs Decision Tree
      •  Naïve Bayes
         –  Probabilistic: computes the probability that a data instance belongs to a specific class
         –  Assumes that each feature (variable) is independent of the others
         –  Performance depends on the predictive nature of the features (non-predictive features will affect the accuracy)
         –  Works well with a small amount of training data; doesn't need all the possibilities
         –  Doesn't work with categorical features
      •  Decision Tree
         –  Non-probabilistic: partitions the data into subsets that best describe the variable
         –  The deeper the tree, the better the model fits the data
         –  Watch out for overfitting: need to prune the tree
         –  Can handle categorical or continuous features
         –  No need for input to be scaled or standardized: set your features and go!
         –  Requires a lot of data covering all possibilities
  58. Accuracy Analysis of the Machine Learning Models
      In this section, we perform accuracy analysis on the test data. We start by computing the accuracy metrics for each model, including the confusion matrix. We then use a histogram chart to understand the data distribution and refine how the classes are computed.
      [Flow diagram from slide 13]
  59. Agenda (repeated from slide 2)
  60. Load Test data
      Make sure to change the db name to match the one created for your test set by your activity (open the Cloudant dashboard to find the name)
  61. Accuracy Metrics
  62. Confusion Matrix
  63. Confusion Matrix
  64. Confusion Matrix
  65. Confusion Matrix
  66. Accuracy metrics API
      Callouts: Output HTML; Display results HTML in Notebook Cell; Compute Metrics from labeled and prediction data; Get the confusion matrix and build html table
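To make the confusion matrix slides concrete, here is a small pure-Python sketch of the steps the callouts on slide 66 list: compute metrics from (label, prediction) pairs, then build an HTML table. It is an illustration under the assumption that classes are integers starting at 0, not the package's accuracy metrics API; all function names are hypothetical.

```python
def confusion_matrix(labels_and_predictions, num_classes):
    """Count outcomes: rows are actual classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for label, prediction in labels_and_predictions:
        matrix[int(label)][int(prediction)] += 1
    return matrix


def accuracy(matrix):
    """Fraction of records on the diagonal, i.e. predicted correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return float(correct) / total if total else 0.0


def precision_recall(matrix, cls):
    """Precision and recall for one class."""
    true_positives = matrix[cls][cls]
    predicted = sum(row[cls] for row in matrix)  # column sum
    actual = sum(matrix[cls])                    # row sum
    precision = float(true_positives) / predicted if predicted else 0.0
    recall = float(true_positives) / actual if actual else 0.0
    return precision, recall


def to_html(matrix):
    """Render the matrix as an HTML table for display in a notebook cell."""
    header = "".join("<th>Predicted {}</th>".format(i) for i in range(len(matrix)))
    rows = "".join(
        "<tr><th>Actual {}</th>{}</tr>".format(
            i, "".join("<td>{}</td>".format(value) for value in row)
        )
        for i, row in enumerate(matrix)
    )
    return "<table><tr><th></th>{}</tr>{}</table>".format(header, rows)
```

In a notebook, the table can be shown with `display(HTML(to_html(matrix)))` after `from IPython.display import HTML, display`. On a cluster, the pairs would normally come from an RDD of (label, prediction) tuples collected or aggregated first.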
  67. Understand the distribution of your data with Histograms
  68. Training Handler class
      •  Provides flexibility and extensibility to the application
      •  Provides a "fail fast and try something else" mechanism
      •  Enables the user to easily customize classes of data based on how the data is distributed
      •  Enables the user to easily add training features
  69. Default Training Handler class
      Callouts on the class methods:
      •  Return the description for each class
      •  Return the total number of classes: default is 5
      •  Re-classify a record: default uses the s.classification field in the JSON record
      •  Extra feature names to be added: none by default
      •  Extra features to be added: the array must match the one returned by customTrainingFeaturesNames
  70. Customize Training Handler
      •  Provide a new classification and add day of departure as a new feature
      •  Inherit from defaultTrainingHandler
      •  Add day of the week using a technique called dummy coding
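The customization on slide 70 can be sketched as follows. Only two names come from the slides: `defaultTrainingHandler` and `customTrainingFeaturesNames`. Everything else is an assumption for illustration: the stand-in base class, the other method names, the `departureTime` field and its timestamp format, and the choice of Sunday as the reference category. Dummy coding here means encoding the 7 weekdays as 6 indicator values, with the reference day represented by all zeros. The re-classification part of the slide is not shown.

```python
from datetime import datetime


class DefaultTrainingHandlerSketch(object):
    """Hypothetical stand-in for the package's defaultTrainingHandler."""

    def numClasses(self):
        return 5  # slide 69: default is 5

    def getClass(self, record):
        return int(record["classification"])

    def customTrainingFeaturesNames(self):
        return []  # no extra features by default

    def customTrainingFeatures(self, record):
        return []


# Sunday is the reference category, so it gets no indicator of its own
DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def dummy_code_weekday(weekday):
    """Encode datetime.weekday() (Monday=0 ... Sunday=6) as 6 indicator values."""
    return [1.0 if weekday == i else 0.0 for i in range(len(DAY_NAMES))]


class DayOfWeekTrainingHandler(DefaultTrainingHandlerSketch):
    """Adds the day of departure as extra training features."""

    def customTrainingFeaturesNames(self):
        return ["departure_" + day for day in DAY_NAMES]

    def customTrainingFeatures(self, record):
        # The field name and timestamp format are assumptions, not the real schema
        departure = datetime.strptime(record["departureTime"], "%Y-%m-%dT%H:%M:%S")
        return dummy_code_weekday(departure.weekday())
```

The feature array has the same length and order as the names array, which is the constraint slide 69 states for these two methods.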
  71. Re-train the models
  72. Re-compute accuracy
      Callouts: Models 1; Models 2
      •  Better accuracy for NaiveBayes and Logistic Regression
      •  Worse for DecisionTree and RandomForest
  73. Agenda (repeated from slide 2)
  74. Deploy and Run the models
      In the last section, we simulate deploying and running the models through the notebook by calling APIs from the run package.
      [Flow diagram from slide 13]
  75. Run the predictive model
  76. runModel API
  77. Get Weather Predictions
  78. Show prediction results
  79. Resource
      •  https://developer.ibm.com/clouddataservices/
      •  https://github.com/ibm-cds-labs/simple-data-pipe
      •  https://github.com/ibm-cds-labs/pipes-connector-flightstats
      •  http://spark.apache.org/docs/latest/mllib-guide.html
      •  https://console.ng.bluemix.net/data/analytics/
  80. Thank You
