David Taieb
STSM - IBM Cloud Data Services
Developer advocate
david_taieb@us.ibm.com
HANDS-ON SESSION:
DEVELOPING ANALYTIC APPLICATIONS
USING APACHE SPARK™ AND PYTHON

Part 1: Flight Delay Predict with Spark ML
PyCon 2016, Portland
©2016 IBM Corporation
	
Agenda
•  Prerequisite steps to be completed before the session
•  Flight Predict app description and architecture
•  Train the models in the Notebook
•  Accuracy analysis and model refinement
•  Deploy and run the models
	
Sign up for Bluemix
•  Access the IBM Bluemix website at https://console.ng.bluemix.net
•  Click Get Started for Free
•  Complete the form and click Create account
•  Look for the confirmation email and click the "confirm your account" link
	
Sign up for a free trial at Flightstats.com
•  Sign up at https://developer.flightstats.com/signup
•  Fill out the form and watch your email for the confirmation link (access to the APIs may take up to 24 hours)
•  Once access is granted, go to https://developer.flightstats.com/admin/applications to view your appId and appKey (you will need them in the simple-data-pipe tool to create the training sets)
•  Optional: get familiar with the various FlightStats APIs:
–  https://developer.flightstats.com/api-docs/scheduledFlights/v1
–  https://developer.flightstats.com/api-docs/airports/v1
	
Where to find the FlightStats app id and app key
[Screenshot with callouts: APP ID, APP Key]
	
Create a new space on Bluemix
In preparation for running the project, we create a new space on Bluemix.
Optional: you can skip this step if you already have a space with a Spark instance that you would like to reuse.
	
Create a Spark Instance
Optional: you can skip this step if you already have a space with a Spark instance that you would like to reuse.
	
Create New Spark Instance
Optional: you can skip this step if you already have a space with a Spark instance that you would like to reuse.
	
Flight App Project Description
•  Use case
–  Flight delays are a common disruption during business trips
–  Being able to predict how likely a flight is to be delayed can remove uncertainty and enable users to plan around it
–  Idea: weather data can be a good explanatory variable for building predictive models
•  Implementation
–  Combine flight statistics from flightstats.com (system of record) with weather data from IBM Insight for Weather (system of operations) to build training, test and blind sets
–  Use Spark MLlib to train predictive models and cross-validate them
–  Create a custom card for Google Now that automatically notifies the user of an impending flight delay
–  Propose alternative flight routes (e.g. Freebird)
	
Get/Build/Analyze methodology
	
Flight Predict App Architecture
[Architecture diagram. Labels: Weather; Simple Data Pipes; Airports; Flight Schedules; Flight Status; Metadata; Training Set; Test Set; Blind Set; Custom Connector, run every 24 hours; Notebook]
	
Flow Diagram
[Flow diagram. Stages: Data Acquisition; Data Preparation; Data Annotation (Ground Truth); Model Training; Model Testing; Deploy and Run Model. Other labels: Cleansing, Shaping, Enrichment; Training Set, Test Set, Blind Set; Iterative: Train Model, Cross-Validation, Evaluate Performance and optimize model]
•  Iterative in nature: we are never done!
•  We will be using this diagram as a roadmap throughout this course
	
Get the data and build the training/test/blind sets
In this step we'll use the Simple Data Pipes open source project to acquire data from FlightStats, combine it with weather data from IBM Insight for Weather, and save the data sets into a Cloudant NoSQL database.
	
Acquiring the data
•  In the next section, we show how to acquire the training data by using the simple-data-pipe tool and the flight predict connector.
•  The flight predict connector combines historical flight data from flightstats.com with weather data from IBM Insight for Weather
•  If you want to skip these steps, you can use the already-built dataset with the following credentials:
–  cloudantHost: dtaieb.cloudant.com
–  cloudantUserName: weenesserliffircedinvers
–  cloudantPassword: 72a5c4f939a9e2578698029d2bb041d775d088b5
	
Deploy simple-data-pipe with flightstats connector
•  Go to https://github.com/ibm-cds-labs/simple-data-pipe
•  Click the Deploy to Bluemix button
•  Clicking the button takes you to Bluemix
	
Complete simple-data-pipe deployment
	
Add an instance of IBM Weather Service on Bluemix
•  Return to the application dashboard
•  The Weather service is required by the flight predict connector and must be installed first
•  From the app dashboard, click Add a service or API
	
Create an instance of IBM Weather Service on Bluemix
•  Search for Weather
•  Make sure to select the "premium plan" to have enough authorized API calls
	
Checkpoint: simple data pipe app dashboard
•  Verify that your app is correctly bound to the right services
–  Weather service: used to enrich flight records with weather observations
–  Cloudant service: used to store the training, test and blind data sets
•  You'll need to click on this button for the step on the next page
•  It is recommended to increase the app memory to 1GB
	
Install flight predict connector
•  Click the Edit Code button, then edit package.json to add the flight predict module to the dependencies:
–  "simple-data-pipe-connector-flightstats":"git://github.com/ibm-cds-labs/simple-data-pipe-connector-flightstats.git"
•  Don't forget to add a comma at the end of the preceding line to keep the JSON valid
•  Save your changes
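The package.json edit above can also be checked programmatically. The following is a minimal sketch, assuming a heavily trimmed, hypothetical package.json (the real file in simple-data-pipe has more fields); it adds the dependency entry quoted on the slide and confirms that the result is still valid JSON.

```python
import json

# Dependency entry quoted on the slide.
CONNECTOR_NAME = "simple-data-pipe-connector-flightstats"
CONNECTOR_URL = "git://github.com/ibm-cds-labs/simple-data-pipe-connector-flightstats.git"


def add_connector(package_json_text):
    """Return package.json text with the flight predict connector added."""
    package = json.loads(package_json_text)  # raises ValueError on invalid JSON
    package.setdefault("dependencies", {})[CONNECTOR_NAME] = CONNECTOR_URL
    return json.dumps(package, indent=2)


# Hypothetical, trimmed package.json used only for illustration.
sample = '{"name": "simple-data-pipe", "dependencies": {"express": "4.x"}}'
updated = add_connector(sample)
print(updated)
```

Parsing the edited file with `json.loads` is a quick way to catch the missing-comma mistake before redeploying.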
	
Install flight predict connector
•  Click File/Save to save your changes
	
Redeploy simple data pipe app
•  Use the live edit editor to redeploy the app
	
Verify connector install
•  In this step, we verify through the UI that the flight predict connector is correctly installed
Flight connector correctly installed
	
Create a new FlightStats pipe
•  Follow each screen to create and configure a new pipe
	
Run the pipe
•  Skip over the Schedule tab
•  In the Activity tab, click Run Now to start the pipe
•  Then open the log to monitor the activity
	
Explore the data sets
•  In this step, we take a moment to explore the different data sets that have been created by the simple data pipe tool
•  From the Bluemix dashboard, click on the Cloudant service tile, then on the Launch button
•  From the Cloudant dashboard, open the training database
•  Open a document to look at the data structure
	
Run the pipe again to build the test set
	
Train the Models
•  In the previous section we created the training data; we are now ready to train the models.
•  Steps in this section:
–  Create an IPython Notebook
–  Load the data sets from the Cloudant database into a Spark cluster
–  Explore the data and train the machine learning models
	
Create a new IPython Notebook

Notebook tour

Notebook tour: Notebook Info

Notebook tour: Environment

Notebook tour: Sharing
	
Before we start building the app…
•  You can optionally follow this tutorial from GitHub by using a fully built notebook:
–  https://github.com/ibm-cds-labs/simple-data-pipe-connector-flightstats/blob/master/notebook/Flight%20Predict%20PyCon%202016.ipynb
	
Optional: use prebuilt notebook
•  Create notebook from URL
•  Use https://github.com/ibm-cds-labs/simple-data-pipe-connector-flightstats/raw/master/notebook/Flight%20Predict%20PyCon%202016.ipynb
	
Using Python Packages
•  Write code inline within cells
•  Encapsulate helper APIs within a Python package
•  Two ways of using helper Python packages
–  Egg distribution package: pip install from a PyPI server or file server (e.g. GitHub)
    •  Persistent install across sessions
    •  Recommended in production
–  SparkContext.addPyFile
    •  Easy addition of a Python module file
    •  Supports multiple module files via zip format
    •  Recommended during development, where frequent code changes occur
	
Flight Predict Python Package on GitHub
Setup script for installing the Python package
Flight Predict Python library
	
Method 1: Install Flight Predict Package
•  Use pip to install the Flight Predict package
•  Recommended alternative: build an egg distribution package and deploy it to PyPI
	
Manage Python packages
•  Check status
•  Uninstall package
	
Method 2: Install py modules via sc.addPyFile
•  addPyFile installs individual py modules and makes them available to all executor processes
•  Works with modules in zipped files
Module containing APIs for training the models
Module containing APIs for running the models
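As an illustration of the zipped-module route, here is a minimal sketch that bundles several module files into one archive and hands it to `SparkContext.addPyFile`. The module names `training_helpers.py` and `run_helpers.py` are placeholders, not the names used in the Flight Predict package, and `sc` is assumed to be the SparkContext that the notebook provides.

```python
import os
import tempfile
import zipfile


def zip_modules(module_paths, zip_path):
    """Bundle several .py files into one zip archive for SparkContext.addPyFile."""
    with zipfile.ZipFile(zip_path, "w") as archive:
        for path in module_paths:
            # Store each module at the archive root so `import <name>` works.
            archive.write(path, arcname=os.path.basename(path))
    return zip_path


def ship_modules(sc, zip_path):
    """Make the zipped modules importable on the driver and on every executor."""
    sc.addPyFile(zip_path)  # `sc` is assumed to be the notebook's SparkContext


# Demo with two placeholder modules (hypothetical names) in a temp directory.
workdir = tempfile.mkdtemp()
paths = []
for name in ("training_helpers.py", "run_helpers.py"):
    path = os.path.join(workdir, name)
    with open(path, "w") as handle:
        handle.write("VALUE = 1\n")
    paths.append(path)

bundle = zip_modules(paths, os.path.join(workdir, "helpers.zip"))
print(zipfile.ZipFile(bundle).namelist())
```

In a notebook, calling `ship_modules(sc, bundle)` after each code change re-ships the helpers without a package reinstall, which is why this route suits development.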
	
Setup credentials and import required Python modules
In this step, we import the Python modules that will be needed throughout the notebook and set up credentials for the various services.
Credentials for Cloudant NoSQL Service
Credentials for Weather Service
	
Get Credentials for Cloudant
From the app dashboard, click on Environment Variables in the left sidebar
	
Get Credentials for Weather
	
Load training set in Spark SQL DataFrame
…
In this step, we use the cloudant-spark connector (https://github.com/cloudant-labs/spark-cloudant) to load data into Spark.
Make sure to change the db name to match the one created for your training set by your activity (open the Cloudant dashboard to find the name).
	
Loading data: Behind the scenes
•  Use the Spark SQL connector to load data into a DataFrame
–  Connector id
–  Options
•  Cache data for optimized reuse
•  Create a temp SQL table
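The steps listed above can be sketched as a small helper. This is an illustration, not the notebook's actual code: the function name, the parameters and the default table name are assumptions, while the connector id `com.cloudant.spark` and the `cloudant.*` option keys are the ones documented for the spark-cloudant connector. The calls follow the Spark 1.x DataFrame API in use at the time of the talk.

```python
def load_cloudant_dataframe(sql_context, host, username, password, db_name,
                            table_name="training"):
    """Load a Cloudant database into a cached DataFrame and register a temp table."""
    df = (
        sql_context.read.format("com.cloudant.spark")  # connector id
        .option("cloudant.host", host)                  # connection options
        .option("cloudant.username", username)
        .option("cloudant.password", password)
        .load(db_name)                                  # name of the Cloudant database
    )
    df.cache()                         # keep the data in memory for reuse
    df.registerTempTable(table_name)   # Spark 1.x call; later versions use createOrReplaceTempView
    return df
```

With the table registered, the data can be queried with `sql_context.sql("SELECT ... FROM training")`.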
	
Scatter plot visualization
	
Visualization api
	
Transform into an RDD of LabeledPoint
Use the Spark SQL connector to load data into a DataFrame
	
loadLabeledDataRDD api
	
Machine Learning Algorithms
Supervised Learning (requires Ground Truth)
•  Continuous output
–  Regression: Linear, Ridge, Lasso, Isotonic
–  Decision Tree
–  RandomForest
–  GradientBoostedTree
•  Discrete output
–  Classification: Logistic Regression, SVM, NaiveBayes
–  Decision Tree
–  RandomForest
–  GradientBoostedTree
–  K-NN (available as an add-on Spark package)
Unsupervised Learning (no Ground Truth data required)
•  Clustering: KMeans, Gaussian Mixture
•  Dimensionality Reduction: PCA, SVD
•  FP-Growth
	
Train Logistic Regression Model

Train NaiveBayes Model

Train Decision Tree Model

Train Random Forest Model
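The four training slides show notebook screenshots only. As a rough sketch of what training these models looks like with the RDD-based Spark MLlib API, the function below trains all four classifiers on an RDD of LabeledPoint. The function name and the hyperparameter values are illustrative assumptions, not the settings used in the Flight Predict notebook.

```python
def train_models(labeled_points, num_classes):
    """Train four MLlib classifiers on an RDD of LabeledPoint and return them by name."""
    # Imported lazily so this sketch can be read or imported without Spark installed.
    from pyspark.mllib.classification import LogisticRegressionWithLBFGS, NaiveBayes
    from pyspark.mllib.tree import DecisionTree, RandomForest

    return {
        "LogisticRegression": LogisticRegressionWithLBFGS.train(
            labeled_points, iterations=100, numClasses=num_classes),
        # MLlib's NaiveBayes requires non-negative feature values.
        "NaiveBayes": NaiveBayes.train(labeled_points, 1.0),
        "DecisionTree": DecisionTree.trainClassifier(
            labeled_points, numClasses=num_classes, categoricalFeaturesInfo={},
            impurity="gini", maxDepth=5, maxBins=32),
        "RandomForest": RandomForest.trainClassifier(
            labeled_points, numClasses=num_classes, categoricalFeaturesInfo={},
            numTrees=10, featureSubsetStrategy="auto", impurity="gini",
            maxDepth=5, maxBins=32),
    }
```

Each returned model exposes `predict`, so the same test set can be scored against all four for the accuracy comparison that follows.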
	
Naïve Bayes vs Decision Tree
Naïve Bayes
•  Probabilistic: computes the probability of a data instance being in a specific class
•  Assumes that each feature (variable) is independent from the others
•  Performance depends on the predictive nature of the features (non-predictive features will affect the accuracy)
•  Works well with a small amount of training data; doesn't need all the possibilities
•  Doesn't work with categorical features
Decision Tree
•  Non-probabilistic: partitions the data into subsets that best describe the variable
•  The deeper the tree, the better the model fits the data
•  Watch out for overfitting: need to prune the tree
•  Can handle categorical or continuous features
•  No need for input to be scaled or standardized: set your features and go!
•  Requires a lot of data covering all possibilities
	
Accuracy Analysis of the Machine Learning Models
In this section, we will perform accuracy analysis on the test data. We will start by computing the accuracy metrics for each model, including the confusion matrix. We will then use a histogram chart to understand the data distribution and refine how the classes are computed.
	
Load Test data
Make sure to change the db name to match the one created for your test set by your activity (open the Cloudant dashboard to find the name).
	
Accuracy Metrics

Confusion Matrix
	
Accuracy metrics API
•  Output HTML
•  Display the results HTML in a notebook cell
•  Compute metrics from labeled and prediction data
•  Get the confusion matrix and build an HTML table
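To make the metrics concrete, the sketch below computes a confusion matrix and overall accuracy from label and prediction lists in plain Python, then renders the matrix as an HTML table. It is not the package's implementation (the slides suggest that one works on Spark data); the function names and the sample data are made up for illustration.

```python
from collections import Counter


def confusion_matrix(labels, predictions, num_classes):
    """Return a num_classes x num_classes matrix; rows are actual, columns predicted."""
    counts = Counter(zip(labels, predictions))
    return [[counts[(actual, predicted)] for predicted in range(num_classes)]
            for actual in range(num_classes)]


def accuracy(matrix):
    """Fraction of records on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / float(total) if total else 0.0


def to_html(matrix):
    """Render the matrix as a plain HTML table for display in a notebook cell."""
    rows = "".join(
        "<tr>" + "".join("<td>%d</td>" % cell for cell in row) + "</tr>"
        for row in matrix)
    return "<table>" + rows + "</table>"


# Made-up labels and predictions for three classes.
labels = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 2, 0]
matrix = confusion_matrix(labels, predictions, 3)
print(matrix)            # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(accuracy(matrix))  # 4 correct out of 6
```

In a notebook cell, `IPython.display.HTML(to_html(matrix))` displays the table inline.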
	
Understand the distribution of your data with Histograms
	
Training Handler class
•  Provides flexibility and extensibility to the application
•  Provides a "fail fast and try something else" mechanism
•  Enables the user to easily customize the classes of data based on how the data is distributed
•  Enables the user to easily add training features
	
Default Training Handler class
Return a description for each class
Return the total number of classes: default is 5
Re-classify a record: default uses the s.classification field in the JSON record
Extra feature names to be added: none by default
Extra features to be added: the array must match the one returned by customTrainingFeaturesNames
	
Customize Training Handler
Provide a new classification and add the day of departure as a new feature
Inherit from defaultTrainingHandler
Add the day of the week using a technique called dummy coding
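The customization described above can be sketched as follows. Only `defaultTrainingHandler` and `customTrainingFeaturesNames` are named in the slides; the base class here is a hypothetical stand-in, and the other method names, the record fields (`classification`, `delayMinutes`, `departureDate`) and the delay thresholds are assumptions made for illustration. The sketch emits one 0/1 column per weekday; strict dummy coding would drop one reference day.

```python
import datetime


class DefaultHandlerSketch(object):
    """Hypothetical stand-in for defaultTrainingHandler; method names are assumed."""

    def numClasses(self):
        return 5  # the slides state that the default is 5 classes

    def getClassLabel(self, record):
        # Default: use the classification already stored in the record.
        return record["classification"]

    def customTrainingFeaturesNames(self):
        return []  # no extra features by default

    def customTrainingFeatures(self, record):
        return []


class DayOfWeekHandler(DefaultHandlerSketch):
    """Three delay classes plus the departure weekday as 0/1 indicator features."""

    DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

    def numClasses(self):
        return 3

    def getClassLabel(self, record):
        delay = record["delayMinutes"]  # hypothetical field name
        if delay < 15:                  # thresholds are illustrative only
            return 0
        return 1 if delay < 60 else 2

    def customTrainingFeaturesNames(self):
        return ["departure_" + day for day in self.DAYS]

    def customTrainingFeatures(self, record):
        # One indicator column per weekday, with a single 1.0 for the departure day.
        weekday = datetime.datetime.strptime(
            record["departureDate"], "%Y-%m-%d").weekday()  # Monday == 0
        return [1.0 if i == weekday else 0.0 for i in range(len(self.DAYS))]
```

The feature list returned by `customTrainingFeatures` has the same length and order as the names returned by `customTrainingFeaturesNames`, which is the constraint stated on the previous slide.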
	
Re-train the models
	
Re-compute accuracy
[Screenshots: accuracy results labeled Models 1 and Models 2]
•  Better accuracy for NaiveBayes and Logistic Regression
•  Worse for DecisionTree and RandomForest
	
Deploy and Run the models
In this last section, we will simulate deploying and running the models through the notebook by calling APIs from the run package.
	
Run the predictive model

runModel API

Get Weather Predictions

Show prediction results
	
Resources
•  https://developer.ibm.com/clouddataservices/
•  https://github.com/ibm-cds-labs/simple-data-pipe
•  https://github.com/ibm-cds-labs/pipes-connector-flightstats
•  http://spark.apache.org/docs/latest/mllib-guide.html
•  https://console.ng.bluemix.net/data/analytics/
	
Thank You

Weitere ähnliche Inhalte

Was ist angesagt?

Virtual Flink Forward 2020: Everything is connected: How watermarking, scalin...
Virtual Flink Forward 2020: Everything is connected: How watermarking, scalin...Virtual Flink Forward 2020: Everything is connected: How watermarking, scalin...
Virtual Flink Forward 2020: Everything is connected: How watermarking, scalin...
Flink Forward
 
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Kai Wähner
 

Was ist angesagt? (20)

Shalini Agarwal, LinkedIn. Engineering excellence: marathon, not a sprint
Shalini Agarwal, LinkedIn. Engineering excellence: marathon, not a sprintShalini Agarwal, LinkedIn. Engineering excellence: marathon, not a sprint
Shalini Agarwal, LinkedIn. Engineering excellence: marathon, not a sprint
 
APEX Interactive Grid API Essentials: The Stuff You Will Really Use
APEX Interactive Grid API Essentials:  The Stuff You Will Really UseAPEX Interactive Grid API Essentials:  The Stuff You Will Really Use
APEX Interactive Grid API Essentials: The Stuff You Will Really Use
 
E fw b4rbr62uiizvvipyb_cannell_lowcodelowdown_apex_vbcs
E fw b4rbr62uiizvvipyb_cannell_lowcodelowdown_apex_vbcsE fw b4rbr62uiizvvipyb_cannell_lowcodelowdown_apex_vbcs
E fw b4rbr62uiizvvipyb_cannell_lowcodelowdown_apex_vbcs
 
Serverless London 2019 FaaS composition using Kafka and CloudEvents
Serverless London 2019   FaaS composition using Kafka and CloudEventsServerless London 2019   FaaS composition using Kafka and CloudEvents
Serverless London 2019 FaaS composition using Kafka and CloudEvents
 
Virtual Flink Forward 2020: Everything is connected: How watermarking, scalin...
Virtual Flink Forward 2020: Everything is connected: How watermarking, scalin...Virtual Flink Forward 2020: Everything is connected: How watermarking, scalin...
Virtual Flink Forward 2020: Everything is connected: How watermarking, scalin...
 
Snowplow and Kinesis - Presentation to the inaugural Amazon Kinesis London Us...
Snowplow and Kinesis - Presentation to the inaugural Amazon Kinesis London Us...Snowplow and Kinesis - Presentation to the inaugural Amazon Kinesis London Us...
Snowplow and Kinesis - Presentation to the inaugural Amazon Kinesis London Us...
 
Bridging the Gap Between Datasets and DataFrames
Bridging the Gap Between Datasets and DataFramesBridging the Gap Between Datasets and DataFrames
Bridging the Gap Between Datasets and DataFrames
 
"It’s not only Lambda! Economics behind Serverless" at Serverless Architectur...
"It’s not only Lambda! Economics behind Serverless" at Serverless Architectur..."It’s not only Lambda! Economics behind Serverless" at Serverless Architectur...
"It’s not only Lambda! Economics behind Serverless" at Serverless Architectur...
 
[Collinge] Office 365 Enterprise Network Connectivity Using Published Office ...
[Collinge] Office 365 Enterprise Network Connectivity Using Published Office ...[Collinge] Office 365 Enterprise Network Connectivity Using Published Office ...
[Collinge] Office 365 Enterprise Network Connectivity Using Published Office ...
 
Kafka Connect and KSQL: Useful Tools in Migrating from a Legacy System to Kaf...
Kafka Connect and KSQL: Useful Tools in Migrating from a Legacy System to Kaf...Kafka Connect and KSQL: Useful Tools in Migrating from a Legacy System to Kaf...
Kafka Connect and KSQL: Useful Tools in Migrating from a Legacy System to Kaf...
 
DEVELOPING SHAREPOINT FRAMEWORK SOLUTIONS FOR THE ENTERPRISE
DEVELOPING SHAREPOINT FRAMEWORK SOLUTIONS FOR THE ENTERPRISEDEVELOPING SHAREPOINT FRAMEWORK SOLUTIONS FOR THE ENTERPRISE
DEVELOPING SHAREPOINT FRAMEWORK SOLUTIONS FOR THE ENTERPRISE
 
What's New in Toolkits for IBM Streams V4.1
What's New in Toolkits for IBM Streams V4.1What's New in Toolkits for IBM Streams V4.1
What's New in Toolkits for IBM Streams V4.1
 
Apache Kafka in the Airline, Aviation and Travel Industry
Apache Kafka in the Airline, Aviation and Travel IndustryApache Kafka in the Airline, Aviation and Travel Industry
Apache Kafka in the Airline, Aviation and Travel Industry
 
Forge - DevCon 2016: Developing & Deploying Secure, Scalable Applications on ...
Forge - DevCon 2016: Developing & Deploying Secure, Scalable Applications on ...Forge - DevCon 2016: Developing & Deploying Secure, Scalable Applications on ...
Forge - DevCon 2016: Developing & Deploying Secure, Scalable Applications on ...
 
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
 
Apache Kafka Open Source Ecosystem for Machine Learning at Extreme Scale (Apa...
Apache Kafka Open Source Ecosystem for Machine Learning at Extreme Scale (Apa...Apache Kafka Open Source Ecosystem for Machine Learning at Extreme Scale (Apa...
Apache Kafka Open Source Ecosystem for Machine Learning at Extreme Scale (Apa...
 
ECS19 - Johan Delimon - Keep your Skype for Business Hybrid working like a ch...
ECS19 - Johan Delimon - Keep your Skype for Business Hybrid working like a ch...ECS19 - Johan Delimon - Keep your Skype for Business Hybrid working like a ch...
ECS19 - Johan Delimon - Keep your Skype for Business Hybrid working like a ch...
 
How mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceHow mentoring can help you start contributing to open source
How mentoring can help you start contributing to open source
 
Deploying in a Cloud First World
Deploying in a Cloud First WorldDeploying in a Cloud First World
Deploying in a Cloud First World
 
ECS19 - Mike Ammerlaan - Microsoft Graph Data Connect
ECS19 - Mike Ammerlaan - Microsoft Graph Data ConnectECS19 - Mike Ammerlaan - Microsoft Graph Data Connect
ECS19 - Mike Ammerlaan - Microsoft Graph Data Connect
 

Andere mochten auch

Spark tutorial @ KCC 2015
Spark tutorial @ KCC 2015Spark tutorial @ KCC 2015
Spark tutorial @ KCC 2015
Jongwook Woo
 
Halko_santafe_2015
Halko_santafe_2015Halko_santafe_2015
Halko_santafe_2015
Nathan Halko
 
Preso spark leadership
Preso spark leadershipPreso spark leadership
Preso spark leadership
sjoerdluteyn
 

Andere mochten auch (20)

GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™
 
Pixie dust overview
Pixie dust overviewPixie dust overview
Pixie dust overview
 
Data Science with Spark - Training at SparkSummit (East)
Data Science with Spark - Training at SparkSummit (East)Data Science with Spark - Training at SparkSummit (East)
Data Science with Spark - Training at SparkSummit (East)
 
Scala meetup - Intro to spark
Scala meetup - Intro to sparkScala meetup - Intro to spark
Scala meetup - Intro to spark
 
Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark Tutorial
 
Spark tutorial @ KCC 2015
Spark tutorial @ KCC 2015Spark tutorial @ KCC 2015
Spark tutorial @ KCC 2015
 
Halko_santafe_2015
Halko_santafe_2015Halko_santafe_2015
Halko_santafe_2015
 
Spark, the new age of data scientist
Spark, the new age of data scientistSpark, the new age of data scientist
Spark, the new age of data scientist
 
Performance
PerformancePerformance
Performance
 
Spark - Philly JUG
Spark  - Philly JUGSpark  - Philly JUG
Spark - Philly JUG
 
Preso spark leadership
Preso spark leadershipPreso spark leadership
Preso spark leadership
 
Spark introduction - In Chinese
Spark introduction - In ChineseSpark introduction - In Chinese
Spark introduction - In Chinese
 
Apache Spark with Scala
Apache Spark with ScalaApache Spark with Scala
Apache Spark with Scala
 
Spark the next top compute model
Spark   the next top compute modelSpark   the next top compute model
Spark the next top compute model
 
NYC_2016_slides
NYC_2016_slidesNYC_2016_slides
NYC_2016_slides
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Scala in practice
Scala in practiceScala in practice
Scala in practice
 
An Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark MeetupAn Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark Meetup
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Science
 
A Deeper Understanding of Spark Internals (Hadoop Conference Japan 2014)
A Deeper Understanding of Spark Internals (Hadoop Conference Japan 2014)A Deeper Understanding of Spark Internals (Hadoop Conference Japan 2014)
A Deeper Understanding of Spark Internals (Hadoop Conference Japan 2014)
 

Ähnlich wie Spark tutorial pycon 2016 part 1

Machine Learning with Apache Spark
Machine Learning with Apache SparkMachine Learning with Apache Spark
Machine Learning with Apache Spark
IBM Cloud Data Services
 

Ähnlich wie Spark tutorial pycon 2016 part 1 (20)

Platform Showcase: Making the Ultimate Live Demo, by Gabriel Michaud
Platform Showcase: Making the Ultimate Live Demo, by Gabriel MichaudPlatform Showcase: Making the Ultimate Live Demo, by Gabriel Michaud
Platform Showcase: Making the Ultimate Live Demo, by Gabriel Michaud
 
Rock the activity stream api
Rock the activity stream api Rock the activity stream api
Rock the activity stream api
 
A301 ctu madrid2016-monitoring
A301 ctu madrid2016-monitoringA301 ctu madrid2016-monitoring
A301 ctu madrid2016-monitoring
 
Integrating SAP with codeBeamer ALM for Traceability and Data Consistency
Integrating SAP with codeBeamer ALM for Traceability and Data ConsistencyIntegrating SAP with codeBeamer ALM for Traceability and Data Consistency
Integrating SAP with codeBeamer ALM for Traceability and Data Consistency
 
Scribe online 01 best practices for sol performance
Scribe online 01   best practices for sol performanceScribe online 01   best practices for sol performance
Scribe online 01 best practices for sol performance
 
Joel Oleson: Business Process Automation Made Easy in SharePoint and Office 365
Joel Oleson: Business Process Automation Made Easy in SharePoint and Office 365Joel Oleson: Business Process Automation Made Easy in SharePoint and Office 365
Joel Oleson: Business Process Automation Made Easy in SharePoint and Office 365
 
How to deploy machine learning models into production
How to deploy machine learning models into productionHow to deploy machine learning models into production
How to deploy machine learning models into production
 
Facilitez votre transition DevOps grâce à l'automatisation de votre infras...
 Facilitez votre transition DevOps grâce à l'automatisation de votre infras... Facilitez votre transition DevOps grâce à l'automatisation de votre infras...
Facilitez votre transition DevOps grâce à l'automatisation de votre infras...
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
Building Serverless Applications That Align with Twelve-Factor Methods - AWS ...
Building Serverless Applications That Align with Twelve-Factor Methods - AWS ...Building Serverless Applications That Align with Twelve-Factor Methods - AWS ...
Building Serverless Applications That Align with Twelve-Factor Methods - AWS ...
 
rapidMATION Webinar: The future of work: humans and software bots working tog...
rapidMATION Webinar: The future of work: humans and software bots working tog...rapidMATION Webinar: The future of work: humans and software bots working tog...
rapidMATION Webinar: The future of work: humans and software bots working tog...
 
Spark tutorial pycon 2016 part 1