1. Machine Learning without the PhD - Azure ML
Simon Elliston Ball
Head of Big Data
@sireb
http://bit.ly/learningAzureML
#learnAzureML
6. Goals of machine learning
Prediction
Explanation
Automation
Anomaly detection
8. Classification
Binary
A or B (i.e. A or !A)
Multi-Class
A, B, C, D, E
Multi-Class from Binary (one-vs-rest)
A or !A, OK; B or !B; how about C? …
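The "A or !A, then B or !B" idea above is the one-vs-rest way of building a multi-class classifier out of binary ones. Below is a minimal sketch in Python. It assumes a tiny hand-rolled logistic regression as the binary learner. The function names (train_binary, train_one_vs_rest, predict) are hypothetical, and this is not necessarily how Azure ML does it internally.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_binary(xs, ys, epochs=500, lr=0.1):
    """Fit one 'A or !A' model: logistic regression via stochastic gradient descent.
    xs: list of feature vectors, ys: list of 0/1 labels."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def binary_score(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_one_vs_rest(xs, labels):
    # One binary model per class: "A or !A", "B or !B", "C or !C", ...
    return {c: train_binary(xs, [1 if lab == c else 0 for lab in labels])
            for c in sorted(set(labels))}

def predict(models, x):
    # Pick the class whose "is it me?" model is most confident.
    return max(models, key=lambda c: binary_score(models[c], x))

# Usage: three well-separated clusters labelled A, B and C.
xs = [[0, 0], [0.5, 0.2], [0.2, 0.4],
      [5, 0], [5.3, 0.4], [4.8, 0.1],
      [0, 5], [0.3, 5.2], [0.1, 4.7]]
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
models = train_one_vs_rest(xs, labels)
print(predict(models, [0.1, 0.1]), predict(models, [5, 0.2]), predict(models, [0.2, 5]))
```

Each binary model only answers "is it me?". The multi-class answer is simply whichever model says yes most confidently.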
9. Types of learning
Regression
Clustering
Classification
Natural Language Processing
Deep learning
11. Cleaning
80% of any data scientist's time is spent cleaning the data,
leaving just 20% to complain about cleaning the data
https://www.flickr.com/photos/derekgavey/4283300990
13. Cleaning
Missing values
Normalization
Scaling
Filtering
Signal Processing
Complex Models
Simple Thresholds
Smoothing
Moving Average
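Several of these cleaning steps are easy to sketch in plain Python. This is a minimal, standard-library-only illustration. It assumes mean imputation for missing values, z-score standardization for "normalization", min-max to [0, 1] for "scaling", and a simple threshold plus a trailing moving average for "filtering". The slides don't say which variants Azure ML's own modules use, and all function names here are hypothetical.

```python
from statistics import mean, pstdev

def fill_missing(values):
    """Missing values: replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def z_score(values):
    """Normalization (assumed z-score): centre on 0 with unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]

def min_max_scale(values):
    """Scaling (assumed min-max): squash into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def threshold_filter(values, low, high):
    """Filtering with a simple threshold: drop values outside [low, high]."""
    return [v for v in values if low <= v <= high]

def moving_average(values, window=3):
    """Smoothing: trailing moving average; the first window-1 points are dropped."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

print(fill_missing([1.0, None, 3.0]))            # [1.0, 2.0, 3.0]
print(min_max_scale([2.0, 4.0, 6.0]))            # [0.0, 0.5, 1.0]
print(threshold_filter([1, 50, 3, -20], 0, 10))  # [1, 3]
print(moving_average([1, 2, 3, 4, 5]))           # [2.0, 3.0, 4.0]
```

Mean imputation is the bluntest option. It is good enough to get a model running, but it shrinks the variance of the column you fill.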
14. Cleaning
Missing values
Normalization
Scaling
Filtering
Metadata and naming
Column Names
Projection
Type Cleanups
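Column names, projection and type cleanups can be sketched the same way. The hypothetical helpers below assume rows arrive as Python dicts of raw strings, for example from a headerless CSV with generic column names like Col1. They rename columns, keep only the ones you want, and cast the rest to proper types.

```python
def rename_columns(rows, mapping):
    """Column names: swap generic names for meaningful ones (unmapped names are kept)."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]

def project(rows, columns):
    """Projection: keep only the listed columns, in that order."""
    return [{c: row[c] for c in columns} for row in rows]

def clean_types(rows, casts):
    """Type cleanups: cast each listed column; blank or missing values become None."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for col, cast in casts.items():
            raw = new_row.get(col)
            new_row[col] = None if raw in (None, "") else cast(raw)
        cleaned.append(new_row)
    return cleaned

# Hypothetical raw input with generic column names.
raw = [{"Col1": "42", "Col2": "3.5", "Col3": "ignore me"},
       {"Col1": "", "Col2": "1.0", "Col3": "me too"}]
rows = rename_columns(raw, {"Col1": "age", "Col2": "score"})
rows = project(rows, ["age", "score"])
rows = clean_types(rows, {"age": int, "score": float})
print(rows)  # [{'age': 42, 'score': 3.5}, {'age': None, 'score': 1.0}]
```

Note that the blank "age" becomes None here, which is exactly the kind of missing value the previous cleaning step then has to deal with.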
15. Splits
Training (80%)
Testing (20%)
Validation (e.g. when comparing different models)
https://www.flickr.com/photos/tabor-roeder/11606138806
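Here is a minimal sketch of the 80%/20% split. It assumes rows can be shuffled freely, with no time ordering to preserve, and uses a fixed seed so the split is repeatable. Carving a validation set out of the training portion, for comparing models, is just a second split. The helper name split_rows is hypothetical.

```python
import random

def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows (seeded, so repeatable) and split into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(100))
train, test = split_rows(rows)                      # 80 / 20
train, validation = split_rows(train, 0.2, seed=7)  # 64 / 16, for comparing models
print(len(train), len(validation), len(test))       # 64 16 20
```

The test set is only touched once, at the end. Choosing between models happens on the validation set, so the final test score isn't tuned to the test data.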
22. Scoring
Apply model to your data
Outputs:
Result
The probability that the result is sort of right
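Scoring can be sketched as applying an already-trained model row by row. For illustration only, the sketch assumes a logistic-regression model with made-up weights. Each row gets a result (0 or 1) and a probability. This sketch reports the probability of the positive class, P(result = 1), as binary scorers commonly do, so the confidence in a 0 result is 1 minus that value. The names sigmoid and score_rows are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score_rows(weights, bias, rows, threshold=0.5):
    """Apply an already-trained linear model to each row.
    Returns (result, probability) pairs, where probability is P(result = 1)."""
    scored = []
    for x in rows:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        scored.append((1 if p >= threshold else 0, p))
    return scored

# Hypothetical trained model: weight 2.0 on a single feature, bias -1.0.
for result, prob in score_rows([2.0], -1.0, [[0.0], [1.0]]):
    print(result, round(prob, 3))
```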
23. Demo time…
Azure Portal -> Machine Learning Studio
New Experiment
https://www.flickr.com/photos/mgdtgd/144569790
24. Evaluating
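Is my classifier any good?
                  Predicted positive    Predicted negative
Actual positive   True Positive (TP)    False Negative (FN)
Actual negative   False Positive (FP)   True Negative (TN)
Precision: TP/(TP+FP)
Accuracy: (TP+TN)/(P+N), where P = TP+FN (actual positives) and N = FP+TN (actual negatives)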
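The precision and accuracy formulas above translate directly into code. Here is a minimal sketch with made-up labels, where 1 is the positive class. The function names are illustrative, not an Azure ML API.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, TN, FN for a binary classifier."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def precision(tp, fp):
    # Of everything we called positive, how much really was?
    return tp / (tp + fp) if tp + fp else 0.0

def accuracy(tp, fp, tn, fn):
    p, n = tp + fn, fp + tn  # P = actual positives, N = actual negatives
    return (tp + tn) / (p + n)

actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 0]
tp, fp, tn, fn = confusion_counts(actual, predicted)
print(tp, fp, tn, fn)                             # 2 1 3 2
print(precision(tp, fp), accuracy(tp, fp, tn, fn))
```

Here precision is 2/3 and accuracy is 5/8. The two can disagree a lot, which is why it is worth looking at more than one number.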
25. Evaluating
How far out was I?
Error distance functions:
Mean squared error
Mean absolute error
R² (coefficient of determination)
https://www.flickr.com/photos/dahlstroms/3945656390
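The three error measures on this slide are short enough to write out directly. This is a minimal sketch with made-up numbers, and the function names are illustrative rather than any Azure ML API. Mean squared error punishes big misses more heavily than mean absolute error does. R² compares your errors with those of always predicting the mean: 1 is perfect, and 0 is no better than the mean.

```python
def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares around the mean).
    Undefined when every actual value is the same (total sum of squares is 0)."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual    = [3, 5, 7, 9]
predicted = [2, 5, 9, 9]
print(mean_squared_error(actual, predicted))   # 1.25
print(mean_absolute_error(actual, predicted))  # 0.75
print(r_squared(actual, predicted))            # 0.75
```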
27. R you ready?
Programming language based on S
By statisticians
For statisticians
Many, many libraries
Included in Azure ML…
28. R you ready?
abc
abind
actuar
ade4
AdMit
aod
ape
approximator
arm
arules
arulesViz
ash
assertthat
AtelieR
BaBooN
BACCO
BaM
bark
BAS
base
BayesDA
bayesGARCH
bayesm
bayesmix
bayesQR
bayesSurv
Bayesthresh
BayesTree
BayesValidate
BayesX
BayHaz
bbemkr
BCBCSF
BCE
bclust
bcp
BenfordTests
bfp
BH
bisoreg
bit
bitops
BLR
BMA
Bmix
BMS
bnlearn
boa
Bolstad
boot
bootstrap
bqtl
BradleyTerry2
brew
brglm
bspec
bspmma
BVS
cairoDevice
calibrator
car
caret
catnet
caTools
chron
class
cluster
clusterSim
coda
codetools
coin
colorspace
combinat
compiler
corpcor
cslogistic
ctv
cubature
data.table
datasets
date
dclone
deal
Deducer
DeducerExtras
deldir
DEoptimR
deSolve
devtools
dichromat
digest
distrom
dlm
doSNOW
dplyr
DPpackage
dse
e1071
EbayesThresh
ebdbNet
effects
emulator
ensembleBMA
entropy
EvalEst
evaluate
evdbayes
evora
exactLoglinTest
expm
extremevalues
factorQR
faoutlier
fitdistrplus
FME
foreach
forecast
foreign
formatR
Formula
fracdiff
gam
gamlr
gbm
gclus
gdata
gee
genetics
geoR
geoRglm
geosphere
ggmcmc
ggplot2
glmmBUGS
glmnet
gmodels
gmp
gnm
googlePublicData
googleVis
GPArotation
gplots
graphics
grDevices
gregmisc
grid
gridExtra
growcurves
grpreg
gsubfn
gtable
gtools
gWidgets
gWidgetsRGtk2
haplo.stats
hbsae
hdrcde
heavy
hflights
HH
HI
highr
Hmisc
htmltools
httpuv
httr
IBrokers
igraph
intervals
iplots
ipred
irr
iterators
JavaGD
JGR
kernlab
KernSmooth
KFKSDS
kinship2
kknn
klaR
knitr
ks
labeling
Lahman
lars
lattice
latticeExtra
lava
lavaan
leaps
LearnBayes
limSolve
lme4
lmm
lmPerm
lmtest
locfit
lpSolve
magic
magrittr
mapdata
mapproj
maps
maptools
maptree
markdown
MASS
MasterBayes
Matrix
matrixcalc
MatrixModels
maxent
maxLik
mcmc
MCMCglmm
MCMCpack
memoise
methods
mgcv
mice
microbenchmark
mime
minpack.lm
minqa
misc3d
miscF
miscTools
mixtools
mlbench
mlogitBMA
mnormt
MNP
modeltools
mombf
monomvn
monreg
mosaic
MSBVAR
msm
multcomp
multicool
munsell
mvoutlier
mvtnorm
ncvreg
nlme
NLP
nnet
numbers
numDeriv
openNLP
openNLPdata
OutlierDC
OutlierDM
outliers
pacbpred
parallel
partitions
party
PAWL
pbivnorm
pcaPP
permute
pls
plyr
png
polynom
PottsUtils
predmixcor
PresenceAbsence
prodlim
profdpm
profileModel
proto
pscl
psych
quadprog
quantreg
qvcalc
R.matlab
R.methodsS3
R.oo
R.utils
R2HTML
R2jags
R2OpenBUGS
R2WinBUGS
ramps
RandomFields
randomForest
RArcInfo
raster
rbugs
RColorBrewer
Rcpp
RcppArmadillo
rcppbugs
RcppEigen
RcppExamples
RCurl
relimp
reshape
reshape2
rgdal
rgeos
rgl
RGraphics
RGtk2
RJaCGH
rjags
rJava
RJSONIO
robCompositions
robustbase
RODBC
rootSolve
roxygen
roxygen2
rpart
rrcov
rscproxy
RSGHB
RSNNS
RTextTools
RUnit
runjags
Runuran
rworldmap
rworldxtra
SampleSizeMeans
SampleSizeProportions
sandwich
sbgcop
scales
scapeMCMC
scatterplot3d
sciplot
segmented
sem
seriation
setRNG
sgeostat
shapefiles
shiny
SimpleTable
slam
smoothSurv
sna
snow
SnowballC
snowFT
sp
spacetime
SparseM
spatial
spBayes
spdep
spikeslab
splancs
splines
spTimer
stats
stats4
stochvol
stringr
strucchange
stsm
stsm.class
SuppDists
survival
svmpath
tau
tcltk
tcltk2
TeachingDemos
tensorA
testthat
textcat
textir
tfplot
tframe
tgp
TH.data
timeDate
tm
tools
translations
tree
tseries
tsfa
tsoutliers
TSP
UsingR
utils
varSelectIP
vcd
vegan
VGAM
VIF
whisker
wordcloud
XLConnect
XML
xtable
xts
yaml
zic
zipfR
zoo
29. R you ready?
Two Data Sets enter.
One Data Set leaves.
(And a chart if you’re lucky)
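The slide's contract for an R script step (two datasets in, one dataset out) doesn't depend on the language, so here it is sketched in Python instead of R. The join-on-a-key logic, the function name combine_datasets and the sample data are all hypothetical. A real script could do any transformation as long as it hands back a single dataset.

```python
def combine_datasets(dataset1, dataset2, key):
    """Two datasets in, one dataset out: inner-join rows (dicts) on `key`.
    Hypothetical stand-in for what a script step might do with its two inputs."""
    by_key = {}
    for row in dataset2:
        by_key.setdefault(row[key], []).append(row)
    result = []
    for left in dataset1:
        for right in by_key.get(left[key], []):
            merged = dict(left)
            merged.update(right)  # on clashing column names, dataset2 wins
            result.append(merged)
    return result

customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
scores = [{"id": 2, "score": 0.9}, {"id": 3, "score": 0.1}]
print(combine_datasets(customers, scores, "id"))  # [{'id': 2, 'name': 'Grace', 'score': 0.9}]
```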
31. Publish a web service.
https://www.flickr.com/photos/jurvetson/6858583426