SlideShare ist ein Scribd-Unternehmen logo
1 von 7
Downloaden Sie, um offline zu lesen
Start Machine learning programming in
5 simple steps
By Renjith M P
https://www.linkedin.com/in/renjith-m-p-bbb67860/
To start with machine learning, we need to follow five basic steps.
Steps
1. Choose a use case / problem statement :- Define your objective
2. Prepare data to train the system :- for any machine learning project, first your need to train
the system with some data
3. Choose a programming language and useful libraries for machine learning :- Yes, obviously
you need to choose a programming language to implement your machine learning
4. Training and prediction implementation :- Implement your solution using the programming
language that you have selected
5. Evaluate the result accuracy :- validate the results (Based on accuracy results, we could
accept the model or we could fine tune the model with various parameters and improve the
model until we get a satisfactory result )
Warning:
Target Audience : Basic knowledge on python (execute python scripts, install packages etc ) is
mandatory to follow this course.
Lets get into action. We will choose a use case and implement the machine learning for the same.
1. Choose a use case / problem statement
Usecase : Predict the species of iris flower based on the lengths and widths of sepals and
petals .
Iris setosa Iris versicolor Iris virginica
2. Prepare data to train the system
We will be using iris flower data set (https://en.wikipedia.org/wiki/Iris_flower_data_set )
which consist of 150 rows. Each row will have 5 columns
1. sepal length
2. sepal width
3. petal length
4.petal width
5.species of iris plant
out of 150 rows, only 120 rows will be used to train the model and rest will be used to
validate the accuracy of predictions.
3. Choose a programming language and libaries for machine learning
There are quite few options available however the famous once are R & Python.
My choice is Python. Unlike R, Python is a complete language and platform that you can
use for both research and development and to develop production systems
Ecosystem & Libraries
Machine learning needs plenty of numeric computations, data mining, algorithms and
plotting.
Python offers a few ecosystems and libraries for multiple functionalities.One of the
commonly used ecosystem is SciPy,which is a collection of open source software for
scientific computing in Python, which has many packages or libraries.
Out of that please find the list of packages from SciPy ecosystem,that we are going to use
Package Desciption
NumPy The fundamental package for numerical computation. It defines the
numerical array and matrix types and basic operations on them.
MatplotLib a mature and popular plotting package, that provides publication-
quality 2D plotting as well as rudimentary 3D plotting
SciPy Library One of the components of the SciPy stack, providing many numerical
routines
Pandas Providing high-performance, easy to use data structures
sklearn Simple and efficient tools for data mining and data analysis
Accessible to everybody, and reusable in various contexts
Built on NumPy, SciPy, and matplotlib
4. Training, Prediction and validation implementation
4.1. Import libraries (before importing make sure you install them using pip/pip3)
4.2. Load data to train the model
import pandas
import matplotlib.pyplot as plt
from sklearn import model_selection
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import
LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
# Load dataset
url =
"https://raw.githubusercontent.com/renjithmp/machinelearning/maste
r/python/usecases/1_irisflowers/iris.csv"
names = ['sepal-length', 'sepal-width', 'petal-length',
'petal-width', 'class']
dataset = pandas.read_csv(url, names=names)
#print important information about dataset
print(dataset.shape)
print (dataset.head(20))
print (dataset.describe())
print(dataset.groupby('class').size())
#visualize the data
dataset.plot(kind='box', subplots=True, layout=(2,2),
sharex=False, sharey=False)
plt.show()
Explanation :
dataset.plot() function :-
The box plot (a.k.a. box and whisker diagram) is a standardized way of displaying the
distribution of data based on the five number summary: minimum, first quartile, median, third
quartile, and maximum. In the simplest box plot the central rectangle spans the first quartile to the
third quartile (the interquartile range or IQR). A segment inside the rectangle shows the median and
"whiskers" above and below the box show the locations of the minimum and maximum.
In machine learning, it is important to analys the data using different parameters.
Visualize them using plot methods makes it much easier than analyze data in tabular format.
For our use case, we will get below plots for sepal,petal length’s and widths.
4.3. split the data for training and validation
Explanation :-
X_train – training data (120 rows consist of petal ,sepal lengths and widths)
Y_train – training data (120 rows consist of class of plant)
x_validate – validation data (30 rows conist of petal,sepal lengths and widths)
Y_train -validation data(30 rows consist of class of plant)
4.4.Train few models using training data
Lets use X_train and Y_train to train few models
models=[]
models.append(('LR',LogisticRegression()))
models.append(('LDA',LinearDiscriminantAnalysis()))
models.append(('KNN',KNeighborsClassifier()))
models.append(('CART',DecisionTreeClassifier()))
models.append(('NB',GaussianNB()))
models.append(('SVM',SVC()))
array=dataset.values
X=array[:,0:4]
Y=array[:,4]
validation_size=0.20
seed=7
scoring='accuracy'
X_train,X_validation,Y_train,Y_validation=model_selection.train_t
est_split(X,Y,test_size=validation_size,random_state=seed)
The explanation of algorithms can be found @ scikit-learn.org e.g
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.ht
ml
I am not covering them here as it need a much deeper explanation. For now, we need to keep in mind that
a model is something that has the capability to learn by it self using the training data and predict the output
for future use cases
Explanation :
Kfold :- it is a very useful function to divide and shuffle the data in dataset.
Here we are dividing the data in to 10 equal parts.
Cross_val_score :– This is the most important step. We are feeding the model with training data (X_train
-input and Y_train -corresponding output ). The method will execute the model and provide accuracy for
each of the fold (remember we used 10 folds)
take the mean and std deviation of 10 fold’s to see the accuracy for the entire training set.
4.5. Choose the best model which seems to be more accurate
As you can see, we have executed 5 different models for the training data (5 different algorithms) and
results shows that (cv_results.mean())
KneighborsClassifier() gives the most accurate results (0.98 or 98 %)
4.6.Predict and validate the results using validation data set
results=[]
names=[]
for name,model in models:
kfold=model_selection.KFold(n_splits=10,random_state=seed)
cv_results=model_selection.cross_val_score(model,X_train,Y_train,c
v=kfold,scoring=scoring)
results.append(cv_results)
names.append(name)
msg="%s: %f (%f)" % (name,cv_results.mean(),cv_results.std())
print(msg)
knn=KNeighborsClassifier()
knn.fit(X_train,Y_train)
predictions=knn.predict(X_validation)
print(accuracy_score(Y_validation,predictions))
Lets choose KNN and find predict the output for validation data
5. Publish results
The accuracy_score() function can be used to see the accuracy of the prediction. In our use case we
can see an accuracy of 0.90 (90%)
You can find the source code here
https://github.com/renjithmp/machinelearning/blob/master/python/usecases/1_irisflowers/
flowerclassprediction.py
Reference
Jason Brownlee article
https://machinelearningmastery.com/machine-learning-in-python-step-by-step/
Scikit
https://scikit-learn.org

Weitere ähnliche Inhalte

Was ist angesagt?

Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.butest
 
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...Edureka!
 
Feature extraction for classifying students based on theirac ademic performance
Feature extraction for classifying students based on theirac ademic performanceFeature extraction for classifying students based on theirac ademic performance
Feature extraction for classifying students based on theirac ademic performanceVenkat Projects
 
Object Oriented Programming Lab Manual
Object Oriented Programming Lab Manual Object Oriented Programming Lab Manual
Object Oriented Programming Lab Manual Abdul Hannan
 
How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? HackerEarth
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearnPratap Dangeti
 
10 R Packages to Win Kaggle Competitions
10 R Packages to Win Kaggle Competitions10 R Packages to Win Kaggle Competitions
10 R Packages to Win Kaggle CompetitionsDataRobot
 
Object Oriented Programming in Matlab
Object Oriented Programming in Matlab Object Oriented Programming in Matlab
Object Oriented Programming in Matlab AlbanLevy
 
Tweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVMTweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVMTrilok Sharma
 
Using Optimal Learning to Tune Deep Learning Pipelines
Using Optimal Learning to Tune Deep Learning PipelinesUsing Optimal Learning to Tune Deep Learning Pipelines
Using Optimal Learning to Tune Deep Learning PipelinesScott Clark
 
Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentationHJ van Veen
 
Presentation on BornoNet Research Paper and Python Basics
Presentation on BornoNet Research Paper and Python BasicsPresentation on BornoNet Research Paper and Python Basics
Presentation on BornoNet Research Paper and Python BasicsShibbir Ahmed
 
machine learning
machine learningmachine learning
machine learningMounisha A
 
H2O World - Top 10 Deep Learning Tips & Tricks - Arno Candel
H2O World - Top 10 Deep Learning Tips & Tricks - Arno CandelH2O World - Top 10 Deep Learning Tips & Tricks - Arno Candel
H2O World - Top 10 Deep Learning Tips & Tricks - Arno CandelSri Ambati
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learningbutest
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learningTonmoy Bhagawati
 
Feature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsFeature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsGabriel Moreira
 
Machine Learning Fundamentals
Machine Learning FundamentalsMachine Learning Fundamentals
Machine Learning FundamentalsSigOpt
 
Supervised Machine Learning in R
Supervised  Machine Learning  in RSupervised  Machine Learning  in R
Supervised Machine Learning in RBabu Priyavrat
 

Was ist angesagt? (20)

Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
 
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc...
 
Feature extraction for classifying students based on theirac ademic performance
Feature extraction for classifying students based on theirac ademic performanceFeature extraction for classifying students based on theirac ademic performance
Feature extraction for classifying students based on theirac ademic performance
 
Object Oriented Programming Lab Manual
Object Oriented Programming Lab Manual Object Oriented Programming Lab Manual
Object Oriented Programming Lab Manual
 
How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ?
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
 
10 R Packages to Win Kaggle Competitions
10 R Packages to Win Kaggle Competitions10 R Packages to Win Kaggle Competitions
10 R Packages to Win Kaggle Competitions
 
Object Oriented Programming in Matlab
Object Oriented Programming in Matlab Object Oriented Programming in Matlab
Object Oriented Programming in Matlab
 
Tweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVMTweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVM
 
Using Optimal Learning to Tune Deep Learning Pipelines
Using Optimal Learning to Tune Deep Learning PipelinesUsing Optimal Learning to Tune Deep Learning Pipelines
Using Optimal Learning to Tune Deep Learning Pipelines
 
Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentation
 
Presentation on BornoNet Research Paper and Python Basics
Presentation on BornoNet Research Paper and Python BasicsPresentation on BornoNet Research Paper and Python Basics
Presentation on BornoNet Research Paper and Python Basics
 
machine learning
machine learningmachine learning
machine learning
 
H2O World - Top 10 Deep Learning Tips & Tricks - Arno Candel
H2O World - Top 10 Deep Learning Tips & Tricks - Arno CandelH2O World - Top 10 Deep Learning Tips & Tricks - Arno Candel
H2O World - Top 10 Deep Learning Tips & Tricks - Arno Candel
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learning
 
Machine learning
Machine learning Machine learning
Machine learning
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learning
 
Feature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsFeature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive models
 
Machine Learning Fundamentals
Machine Learning FundamentalsMachine Learning Fundamentals
Machine Learning Fundamentals
 
Supervised Machine Learning in R
Supervised  Machine Learning  in RSupervised  Machine Learning  in R
Supervised Machine Learning in R
 

Ähnlich wie Start machine learning in 5 simple steps

IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...IRJET Journal
 
Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Gülden Bilgütay
 
The ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptxThe ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptxRuby Shrestha
 
House price prediction
House price predictionHouse price prediction
House price predictionSabahBegum
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnBenjamin Bengfort
 
Analysis using r
Analysis using rAnalysis using r
Analysis using rPriya Mohan
 
employee turnover prediction document.docx
employee turnover prediction document.docxemployee turnover prediction document.docx
employee turnover prediction document.docxrohithprabhas1
 
Computer Tools for Academic Research
Computer Tools for Academic ResearchComputer Tools for Academic Research
Computer Tools for Academic ResearchMiklos Koren
 
Machine Learning - Simple Linear Regression
Machine Learning - Simple Linear RegressionMachine Learning - Simple Linear Regression
Machine Learning - Simple Linear RegressionSiddharth Shrivastava
 
Workshop: Your first machine learning project
Workshop: Your first machine learning projectWorkshop: Your first machine learning project
Workshop: Your first machine learning projectAlex Austin
 
A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptSanket Shikhar
 
B4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningB4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningHoa Le
 
Azure machine learning service
Azure machine learning serviceAzure machine learning service
Azure machine learning serviceRuth Yakubu
 
Inteligencia artificial para android como empezar
Inteligencia artificial para android como empezarInteligencia artificial para android como empezar
Inteligencia artificial para android como empezarIsabel Palomar
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceLviv Startup Club
 
Cse 7th-sem-machine-learning-laboratory-csml1819
Cse 7th-sem-machine-learning-laboratory-csml1819Cse 7th-sem-machine-learning-laboratory-csml1819
Cse 7th-sem-machine-learning-laboratory-csml1819HODCSE21
 
MCL309_Deep Learning on a Raspberry Pi
MCL309_Deep Learning on a Raspberry PiMCL309_Deep Learning on a Raspberry Pi
MCL309_Deep Learning on a Raspberry PiAmazon Web Services
 
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...Robert Grossman
 

Ähnlich wie Start machine learning in 5 simple steps (20)

Lecture-6-7.pptx
Lecture-6-7.pptxLecture-6-7.pptx
Lecture-6-7.pptx
 
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
 
Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21
 
The ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptxThe ABC of Implementing Supervised Machine Learning with Python.pptx
The ABC of Implementing Supervised Machine Learning with Python.pptx
 
House price prediction
House price predictionHouse price prediction
House price prediction
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
Analysis using r
Analysis using rAnalysis using r
Analysis using r
 
employee turnover prediction document.docx
employee turnover prediction document.docxemployee turnover prediction document.docx
employee turnover prediction document.docx
 
Computer Tools for Academic Research
Computer Tools for Academic ResearchComputer Tools for Academic Research
Computer Tools for Academic Research
 
Machine Learning - Simple Linear Regression
Machine Learning - Simple Linear RegressionMachine Learning - Simple Linear Regression
Machine Learning - Simple Linear Regression
 
Workshop: Your first machine learning project
Workshop: Your first machine learning projectWorkshop: Your first machine learning project
Workshop: Your first machine learning project
 
A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.ppt
 
B4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningB4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearning
 
DS LAB MANUAL.pdf
DS LAB MANUAL.pdfDS LAB MANUAL.pdf
DS LAB MANUAL.pdf
 
Azure machine learning service
Azure machine learning serviceAzure machine learning service
Azure machine learning service
 
Inteligencia artificial para android como empezar
Inteligencia artificial para android como empezarInteligencia artificial para android como empezar
Inteligencia artificial para android como empezar
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning Service
 
Cse 7th-sem-machine-learning-laboratory-csml1819
Cse 7th-sem-machine-learning-laboratory-csml1819Cse 7th-sem-machine-learning-laboratory-csml1819
Cse 7th-sem-machine-learning-laboratory-csml1819
 
MCL309_Deep Learning on a Raspberry Pi
MCL309_Deep Learning on a Raspberry PiMCL309_Deep Learning on a Raspberry Pi
MCL309_Deep Learning on a Raspberry Pi
 
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
 

Kürzlich hochgeladen

Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 

Kürzlich hochgeladen (20)

Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 

Start machine learning in 5 simple steps

  • 1. Start Machine learning programming in 5 simple steps By Renjith M P https://www.linkedin.com/in/renjith-m-p-bbb67860/ To start with machine learning, we need to follow five basic steps. Steps 1. Choose a use case / problem statement :- Define your objective 2. Prepare data to train the system :- for any machine learning project, first your need to train the system with some data 3. Choose a programming language and useful libraries for machine learning :- Yes, obviously you need to choose a programming language to implement your machine learning 4. Training and prediction implementation :- Implement your solution using the programming language that you have selected 5. Evaluate the result accuracy :- validate the results (Based on accuracy results, we could accept the model or we could fine tune the model with various parameters and improve the model until we get a satisfactory result ) Warning: Target Audience : Basic knowledge on python (execute python scripts, install packages etc ) is mandatory to follow this course. Lets get into action. We will choose a use case and implement the machine learning for the same. 1. Choose a use case / problem statement Usecase : Predict the species of iris flower based on the lengths and widths of sepals and petals . Iris setosa Iris versicolor Iris virginica
  • 2. 2. Prepare data to train the system We will be using iris flower data set (https://en.wikipedia.org/wiki/Iris_flower_data_set ) which consist of 150 rows. Each row will have 5 columns 1. sepal length 2. sepal width 3. petal length 4.petal width 5.species of iris plant out of 150 rows, only 120 rows will be used to train the model and rest will be used to validate the accuracy of predictions. 3. Choose a programming language and libaries for machine learning There are quite few options available however the famous once are R & Python. My choice is Python. Unlike R, Python is a complete language and platform that you can use for both research and development and to develop production systems Ecosystem & Libraries Machine learning needs plenty of numeric computations, data mining, algorithms and plotting. Python offers a few ecosystems and libraries for multiple functionalities.One of the commonly used ecosystem is SciPy,which is a collection of open source software for scientific computing in Python, which has many packages or libraries. Out of that please find the list of packages from SciPy ecosystem,that we are going to use Package Desciption NumPy The fundamental package for numerical computation. It defines the numerical array and matrix types and basic operations on them. MatplotLib a mature and popular plotting package, that provides publication- quality 2D plotting as well as rudimentary 3D plotting SciPy Library One of the components of the SciPy stack, providing many numerical routines Pandas Providing high-performance, easy to use data structures sklearn Simple and efficient tools for data mining and data analysis Accessible to everybody, and reusable in various contexts Built on NumPy, SciPy, and matplotlib
  • 3. 4. Training, Prediction and validation implementation 4.1. Import libraries (before importing make sure you install them using pip/pip3) 4.2. Load data to train the model import pandas import matplotlib.pyplot as plt from sklearn import model_selection from sklearn.metrics import classification_report from sklearn.metrics import confusion_matrix from sklearn.metrics import accuracy_score from sklearn.linear_model import LogisticRegression from sklearn.tree import DecisionTreeClassifier from sklearn.neighbors import KNeighborsClassifier from sklearn.discriminant_analysis import LinearDiscriminantAnalysis from sklearn.naive_bayes import GaussianNB from sklearn.svm import SVC # Load dataset url = "https://raw.githubusercontent.com/renjithmp/machinelearning/maste r/python/usecases/1_irisflowers/iris.csv" names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class'] dataset = pandas.read_csv(url, names=names) #print important information about dataset print(dataset.shape) print (dataset.head(20)) print (dataset.describe()) print(dataset.groupby('class').size()) #visualize the data dataset.plot(kind='box', subplots=True, layout=(2,2), sharex=False, sharey=False) plt.show()
  • 4. Explanation : dataset.plot() function :- The box plot (a.k.a. box and whisker diagram) is a standardized way of displaying the distribution of data based on the five number summary: minimum, first quartile, median, third quartile, and maximum. In the simplest box plot the central rectangle spans the first quartile to the third quartile (the interquartile range or IQR). A segment inside the rectangle shows the median and "whiskers" above and below the box show the locations of the minimum and maximum. In machine learning, it is important to analys the data using different parameters. Visualize them using plot methods makes it much easier than analyze data in tabular format. For our use case, we will get below plots for sepal,petal length’s and widths.
  • 5. 4.3. split the data for training and validation Explanation :- X_train – training data (120 rows consist of petal ,sepal lengths and widths) Y_train – training data (120 rows consist of class of plant) x_validate – validation data (30 rows conist of petal,sepal lengths and widths) Y_train -validation data(30 rows consist of class of plant) 4.4.Train few models using training data Lets use X_train and Y_train to train few models models=[] models.append(('LR',LogisticRegression())) models.append(('LDA',LinearDiscriminantAnalysis())) models.append(('KNN',KNeighborsClassifier())) models.append(('CART',DecisionTreeClassifier())) models.append(('NB',GaussianNB())) models.append(('SVM',SVC())) array=dataset.values X=array[:,0:4] Y=array[:,4] validation_size=0.20 seed=7 scoring='accuracy' X_train,X_validation,Y_train,Y_validation=model_selection.train_t est_split(X,Y,test_size=validation_size,random_state=seed)
  • 6. The explanation of algorithms can be found @ scikit-learn.org e.g https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.ht ml I am not covering them here as it need a much deeper explanation. For now, we need to keep in mind that a model is something that has the capability to learn by it self using the training data and predict the output for future use cases Explanation : Kfold :- it is a very useful function to divide and shuffle the data in dataset. Here we are dividing the data in to 10 equal parts. Cross_val_score :– This is the most important step. We are feeding the model with training data (X_train -input and Y_train -corresponding output ). The method will execute the model and provide accuracy for each of the fold (remember we used 10 folds) take the mean and std deviation of 10 fold’s to see the accuracy for the entire training set. 4.5. Choose the best model which seems to be more accurate As you can see, we have executed 5 different models for the training data (5 different algorithms) and results shows that (cv_results.mean()) KneighborsClassifier() gives the most accurate results (0.98 or 98 %) 4.6.Predict and validate the results using validation data set results=[] names=[] for name,model in models: kfold=model_selection.KFold(n_splits=10,random_state=seed) cv_results=model_selection.cross_val_score(model,X_train,Y_train,c v=kfold,scoring=scoring) results.append(cv_results) names.append(name) msg="%s: %f (%f)" % (name,cv_results.mean(),cv_results.std()) print(msg) knn=KNeighborsClassifier() knn.fit(X_train,Y_train) predictions=knn.predict(X_validation) print(accuracy_score(Y_validation,predictions))
  • 7. Lets choose KNN and find predict the output for validation data 5. Publish results The accuracy_score() function can be used to see the accuracy of the prediction. In our use case we can see an accuracy of 0.90 (90%) You can find the source code here https://github.com/renjithmp/machinelearning/blob/master/python/usecases/1_irisflowers/ flowerclassprediction.py Reference Jason Brownlee article https://machinelearningmastery.com/machine-learning-in-python-step-by-step/ Scikit https://scikit-learn.org