Machine Learning has been getting a lot of attention again in recent years. Thanks in part to the fact that huge volumes of data can now be processed (relatively) quickly, this branch of computer science is enjoying a second youth.
In this session we will see what machine learning is and which technical and functional scenarios it can be used in, and we will start to "play" with data to see how far we can go. We begin with on-premises tools and then move to the Azure Machine Learning offering where, once the theory has been mastered, extremely complex solutions can be built in a very visual way, or by integrating with R and IPython and exploiting Azure's scalability to get optimal performance. All without forgetting that the resulting algorithms can be easily integrated into our applications simply by invoking a web service.
3. Microsoft SQL Server MVP
Working with SQL Server since 6.5, on BI since 2003
Specialized in Data Solution Architecture, Database Design,
Performance Tuning, High-Performance Data Warehousing, BI, Big Data
President of UGISS (Italian SQL Server UG)
Regular Speaker @ SQL Server events
Consulting & Training, Mentor @ SolidQ
E-mail: dmauri@solidq.com
Twitter: @mauridb
Blog: http://sqlblog.com/blogs/davide_mauri
Davide Mauri
4. • MACHINE LEARNING, WHAT’S THAT?
• SUPERVISED & UNSUPERVISED METHODS
• TOOLS & LANGUAGES
• EXPERIMENTING ON-PREMISES
• IPYTHON & R
• AZURE MACHINE LEARNING
• AZUREML STUDIO
• NOTEBOOKS
• INTEGRATING AZUREML IN CUSTOM APPLICATIONS
• CREATING AZUREML SERVICES WITH PYTHON AND R
6. MACHINE LEARNING
•Algorithms that learn from data
•Nothing really new from a scientific point of view
• "Field of study that gives computers the ability to learn without being explicitly programmed" - Arthur Samuel, 1959
•Requires *a lot* of compute power (even for not-so-big data)
• Azure, here we come!
7. MACHINE LEARNING
•Very useful for
• Identifying unknown and complex patterns
• Identifying hidden correlations
• Automatically classifying data
• Predicting future trends and/or values based on past knowledge
8. MACHINE LEARNING
•Thanks to the cloud it’s now possible to integrate ML
Algorithms into Line-Of-Business applications
• Choose the algorithm
• Train it
• Expose as a RESTful Web Service
• Call it from your app
• You’re Happy
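The last two steps above (expose as a RESTful web service, call it from your app) can be sketched with the Python standard library alone. This is a minimal sketch, not the definitive client: `SERVICE_URL`, `API_KEY`, the input name `input1` and the `ColumnNames`/`Values` payload shape are assumptions for illustration, so check the request format documented for your own service before using it.

```python
import json
import urllib.request

# Hypothetical values: replace with the endpoint URL and API key of your own service.
SERVICE_URL = "https://example.invalid/ml-service/score"
API_KEY = "replace-with-your-api-key"


def build_scoring_request(features: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one row of features."""
    # Assumed payload shape: column names plus a list of value rows.
    payload = {
        "Inputs": {
            "input1": {
                "ColumnNames": list(features.keys()),
                "Values": [[str(v) for v in features.values()]],
            }
        },
        "GlobalParameters": {},
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + API_KEY,
    }
    return urllib.request.Request(SERVICE_URL, data=body, headers=headers, method="POST")


def score(features: dict) -> dict:
    """Send the request and decode the JSON answer returned by the service."""
    request = build_scoring_request(features)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no network access; score() does.
request = build_scoring_request({"sepal_length": 5.1, "sepal_width": 3.5})
print(request.get_method(), request.full_url)
```

Keeping request construction separate from the network call makes the payload easy to inspect and test before a real endpoint exists.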
9. MACHINE LEARNING
•Two main categories (though some taxonomies split them into as many as five!)
• Supervised
• Unsupervised
•Supervised: humans (usually) teach the algorithm what the expected result is
•Unsupervised: the algorithm tries to autonomously identify patterns and rules in the given dataset
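The difference between the two categories can be sketched on a toy example. The data below is made up for illustration, and nearest-centroid classification and 2-means clustering are simply two small representatives chosen for this sketch, not the only options: the supervised function needs the labels, the unsupervised one never sees them.

```python
from statistics import mean

# Toy 1-D measurements (made-up numbers, for illustration only).
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
labels = ["small", "small", "small", "large", "large", "large"]


def train_supervised(xs, ys):
    """Supervised: use the human-provided labels to compute one centroid per class."""
    centroids = {}
    for label in set(ys):
        centroids[label] = mean(x for x, y in zip(xs, ys) if y == label)
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def cluster_unsupervised(xs, iterations=10):
    """Unsupervised: 2-means clustering, no labels involved."""
    centers = [min(xs), max(xs)]  # simple deterministic initialisation
    groups = [list(xs), []]
    for _ in range(iterations):
        groups = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[nearest].append(x)
        # Keep the old center if a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return groups


model = train_supervised(values, labels)
print(predict(model, 4.0))           # class chosen using the learned centroids
print(cluster_unsupervised(values))  # two groups found without any labels
```

Note that the clustering returns anonymous groups: giving them a meaning ("small", "large") is still up to a human.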
10. LANGUAGES
•Most common languages used for machine learning
• R
• Python
•Less common but on the rise
• Julia
• Scala
• Go
• Rust
11. TOOLS - PYTHON
•Python Packages
• Scikit-Learn
• SciPy, NumPy, Pandas, Matplotlib, Seaborn
•Jupyter (was: IPython)
• Anaconda
•Microsoft Data Science Virtual Machine
•Python Tools for Visual Studio (PTVS)
12. TOOLS - R
•R
•RStudio
•Microsoft Open R Portal
• Microsoft R Open (MRO)
•Microsoft Data Science Virtual Machine
•Anaconda
• https://www.continuum.io/conda-for-r
13. DATASETS
•To learn ML, sample and well-known datasets are needed
•Here are some places where good datasets can be found
• http://archive.ics.uci.edu/ml/datasets.html
• http://www.kdnuggets.com/datasets/index.html
• http://homepages.inf.ed.ac.uk/rbf/IAPR/researchers/MLPAGES/mldat.htm
• https://en.wikipedia.org/wiki/Data_set#Classic_datasets
• https://mran.revolutionanalytics.com/documents/data/
14. IRIS DATASET
•150 instances of Iris Flowers
• 3 classes: Virginica, Versicolor, Setosa
• 4 features: Sepal Width & Length, Petal Width & Length
•One of the most used for educational purposes
• Simple, but...
• One class (Setosa) is linearly separable from the other two
• The other two classes are NOT linearly separable from each other
•Available at UC Irvine Machine Learning Repository
• http://archive.ics.uci.edu/ml/datasets/Iris
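What "linearly separable" means for Setosa can be shown with a few rows laid out in the column order of the UCI file. This is only a sketch: the 2.5 cm petal-length threshold is an assumption picked by eye, and six rows prove nothing about the full 150, so run the same check on the complete file before trusting it. No comparably simple rule is attempted here for Versicolor versus Virginica.

```python
import csv
import io

# A few rows in the UCI file's column order:
# sepal length, sepal width, petal length, petal width (cm), class.
SAMPLE = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""

PETAL_LENGTH_THRESHOLD = 2.5  # assumed cut-off, chosen by eye for this sketch


def load_rows(text):
    """Parse CSV text into (features, class_name) pairs."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if record:  # skip blank lines
            rows.append(([float(v) for v in record[:4]], record[4]))
    return rows


def is_setosa(features):
    """Linear rule on a single feature: short petals mean Setosa."""
    return features[2] < PETAL_LENGTH_THRESHOLD


rows = load_rows(SAMPLE)
for features, class_name in rows:
    print(class_name, is_setosa(features))
```

To try it on the whole dataset, pass the text of the downloaded data file to `load_rows` instead of `SAMPLE`.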
18. AZUREML STUDIO
•www.azureml.com
•Azure ML Studio
• Web application (“Workspace”) for developing ML solutions
•Development Process
• Experiment
• Score
• Evaluate
• Publish
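The four stages above can be mimicked in a few lines of plain Python to make their roles concrete. This is a loose analogy under stated assumptions, not how Azure ML Studio works internally: the data is made up, the 1-nearest-neighbour model is just a convenient stand-in, and `publish` only imitates a web service with a JSON-in, JSON-out function whose `features` and `label` field names are hypothetical.

```python
import json

# Experiment: made-up labelled rows, split into training and held-out test data.
TRAIN = [([1.0, 1.0], "A"), ([1.2, 0.9], "A"), ([4.0, 4.2], "B"), ([3.8, 4.1], "B")]
TEST = [([0.9, 1.1], "A"), ([4.1, 3.9], "B")]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def score(train_rows, features):
    """Score: predict the label of the closest training row (1-nearest neighbour)."""
    _, label = min(train_rows, key=lambda row: squared_distance(row[0], features))
    return label


def evaluate(train_rows, test_rows):
    """Evaluate: fraction of held-out rows predicted correctly (accuracy)."""
    hits = sum(score(train_rows, f) == label for f, label in test_rows)
    return hits / len(test_rows)


def publish(train_rows):
    """Publish: wrap scoring in a JSON-in, JSON-out function, as a web service would."""
    def handler(request_body: str) -> str:
        features = json.loads(request_body)["features"]
        return json.dumps({"label": score(train_rows, features)})
    return handler


accuracy = evaluate(TRAIN, TEST)
service = publish(TRAIN)
print(accuracy, service('{"features": [4.0, 4.0]}'))
```

Evaluating on rows the model has not seen during training is what makes the accuracy figure meaningful.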
19. AZUREML STUDIO
• “Democratize Machine Learning”
• Free Tier Available
• 10 GB Storage Space
• 1h max experiment duration
• Staging Web API
• Standard Tier
• Costs per “Seat”, Studio and API Usage
• https://azure.microsoft.com/en-us/pricing/details/machine-learning/
20. AZUREML STUDIO
•Fully Interactive Environment
•Fully integrated with the Azure ecosystem, but not limited to it
• Very easy to use external data sources
•Supports Jupyter/IPython Notebooks!
• Even more interactive!