1. ML made easy
jss 2011-05-19
Thursday, May 19, 2011
2. Google Prediction API
• The announced subject of this session
• RESTful machine learning service
• Limits: no access to models (or any internals), max. 100 MB of training data, max. 40k predictions/day (100/day in the free tier)
• No fun for serious use
• Might work well for people without a background in ML
3. Still:
A simple, unified API to access a range of ML algorithms, plus measures
and infrastructure for parameter search,
would be a good thing to have. Enter:
4. scikits.learn
• Python module for machine learning, built on SciPy & NumPy
• Started in 2007 as a GSoC project; main contributions by INRIA
5. Features
• Solid: supervised learning (Support Vector Machines, Generalized Linear Models)
• Work in progress: unsupervised learning (clustering, Gaussian mixture models, manifold learning, ICA), Gaussian Processes
• Planned: Gaussian graphical models, matrix factorization
6. Back End
• Own NumPy/SciPy implementations
• C/C++ modules (liblinear & libsvm)
• Cython (for linear models not covered by liblinear)
• Multi-processing
7. Docs
• In-depth RST documentation
• Interfaces, Narrative, Method Background, Practical Tips
• Lots of examples
• Active community & mailing list
• Developer docs: optimization, coding conventions, etc.
8. API
clf = Classifier(kernel='rbf')   # clf is a (picklable) model object
clf.fit(X, y)                    # train on features X and labels y
clf.predict(X2)                  # predict labels for new features X2; same API for all ML techniques
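To make the convention concrete, here is a minimal sketch in plain Python with two hypothetical toy estimators, `MajorityClassifier` and `NearestCentroidClassifier` (illustrative only, not part of scikits.learn). The only contract assumed is `fit(X, y)` returning the model and `predict(X)` returning one label per row. Because both follow that interface, the loop at the end can train, pickle and query either model without knowing which one it holds. The data is made up.

```python
import pickle
from collections import Counter


class MajorityClassifier:
    """Hypothetical toy estimator: predicts the most frequent training label."""

    def fit(self, X, y):
        # Learn one thing: the most common label in y
        self.label_ = Counter(y).most_common(1)[0][0]
        return self  # returning self allows clf.fit(X, y).predict(X2)

    def predict(self, X):
        # One prediction per input row; the features are ignored
        return [self.label_ for _ in X]


class NearestCentroidClassifier:
    """Hypothetical toy estimator: assigns each row to the class with the closest mean."""

    def fit(self, X, y):
        counts = Counter(y)
        sums = {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
        # Per-class mean feature vector
        self.centroids_ = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        # Label whose centroid has the smallest squared Euclidean distance
        return [
            min(self.centroids_, key=lambda label: sq_dist(row, self.centroids_[label]))
            for row in X
        ]


# Made-up toy data: two well-separated groups
X = [[0, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
y = ["a", "a", "b", "b", "b"]
X2 = [[0, 0], [6, 6]]

# The loop body is identical for every model: that is the point of a shared API
for clf in (MajorityClassifier(), NearestCentroidClassifier()):
    clf.fit(X, y)
    restored = pickle.loads(pickle.dumps(clf))  # models round-trip through pickle
    print(type(clf).__name__, restored.predict(X2))
```

With this toy data the loop prints `['b', 'b']` for the majority baseline and `['a', 'b']` for the nearest-centroid model. Swapping in a different estimator changes only the constructor call, not the surrounding code.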
9. Full Example
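from scikits.learn.svm import SVC  # the package was renamed to sklearn in later releases
from scikits.learn.metrics import classification_report
from numpy import array

# Feature matrix (one row per sample) and labels; further samples are elided on the slide
X = array([[1, 1, 1], [1, 0, 1], [0, 1, 1], [0, 0, 1], ..])
y = array([0, 1, 1, 0, ..])
N = 4  # train on the first N samples, evaluate on the rest

clf = SVC(kernel='rbf', gamma=1e-4, C=1000)
clf.fit(X[:N], y[:N])
pred = clf.predict(X[N:])
print(classification_report(y[N:], pred))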
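`classification_report` prints per-class precision, recall, F1-score and support. The sketch below recomputes those per-class numbers from scratch with the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = their harmonic mean, support = number of true samples of the class). It uses a hypothetical helper, `per_class_scores`, which is not a library function. It does not reproduce the library's text layout or its averaged summary row, and the labels are made up.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Hypothetical helper: per-class (precision, recall, f1, support)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per label
    pred_count = Counter(y_pred)  # TP + FP per label
    true_count = Counter(y_true)  # TP + FN per label (the "support")
    report = {}
    for label in labels:
        precision = tp[label] / pred_count[label] if pred_count[label] else 0.0
        recall = tp[label] / true_count[label] if true_count[label] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = (precision, recall, f1, true_count[label])
    return report


# Made-up true and predicted labels
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]
for label, (p, r, f, s) in per_class_scores(y_true, y_pred).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f} support={s}")
```

Here each class has 2 of its 3 samples predicted correctly and 3 predictions in total, so both rows show precision, recall and F1 of 0.67 with support 3.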