The presentation describes how machine learning algorithms can be automated through a Flask web API. It demonstrates the effectiveness of machine learning automation, which can reduce operation time dramatically.
1. DRIVERLESS ML
Automation of ML using Driverless API
Sayantan Ghosh
Kalinga Institute of Industrial Technology
2. Key Capabilities of Driverless API
Visualization: It can produce interactive graphical visualizations using the pygal library.
Data Preprocessing: It can preprocess the dataset efficiently (e.g., it can handle categorical data as well as NaN or missing values).
Feature Scaling: It can apply feature scaling to improve accuracy and acceptability.
Time Efficient: The API can analyze a dataset in very little time.
Use cases: It can be used for binary as well as multiclass classification, churn modeling, credit card fraud detection, and marketing analysis.
4. Methodology & Implementations
Workflow Diagram of the API
Data Collection Stage (.csv, Split Amount, Epoch): At the input phase, the user provides the .csv file, the split amount for the dataset, the epoch count, and the optimizer algorithm.
Data Preprocessing (Categorical, Missing Value Handling): All categorical data are one-hot encoded and missing values are handled using mean values.
Feature Selection & Dimensionality Reduction: Compute the feature importances and reduce the dataset using relevant features.
Merging of Classification Algorithms: All the ML classifiers are applied to the dataset through K-Fold Cross Validation and the results are stored.
Analysis Report & Visualization of predicted results using pygal.
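The preprocessing stage (one-hot encoding of categorical data, mean values for missing entries) can be illustrated with a minimal sketch. It assumes pandas; the helper name `preprocess`, its signature, and the toy columns are hypothetical and are not taken from the API itself.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, target: str):
    """Mean-impute numeric columns and one-hot encode categorical ones.

    Hypothetical helper: the name and signature are assumptions.
    """
    y = df[target]
    X = df.drop(columns=[target])

    numeric_cols = X.select_dtypes(include="number").columns
    categorical_cols = X.columns.difference(numeric_cols)

    # Replace NaN in each numeric column with that column's mean.
    X[numeric_cols] = X[numeric_cols].fillna(X[numeric_cols].mean())

    # One-hot encode the categorical columns (one 0/1 column per category).
    X = pd.get_dummies(X, columns=list(categorical_cols), dtype=float)
    return X, y


# Toy data standing in for an uploaded .csv file.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0],
    "Sex": ["male", "female", "female"],
    "Survived": [0, 1, 1],
})
X, y = preprocess(toy, target="Survived")
print(X)
```

In this sketch the mean is applied only to numeric columns, since a mean is not defined for categorical values; how the API treats missing categorical entries is not stated in the slides.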
5. TECHNOLOGIES USED
1 Keras: used for implementing the Artificial Neural Network.
2 Flask: used for implementing the Web API.
3 Scikit-Learn: used for implementing the classification algorithms and for the preprocessing phase.
4 Pygal: used for implementing the visualizations as Scalable Vector Graphics (SVG).
5 Numpy: used for computing the numerical operations.
6 Pandas: used for implementing all the DataFrame processing.
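To show how Flask and Pandas could fit together at the input phase, here is a minimal sketch of an endpoint that accepts the .csv file, split amount, epoch count, and optimizer. The route name `/analyze`, the form-field names, and the JSON response shape are all assumptions for illustration; the slides do not show the API's actual routes.

```python
import io

import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/analyze", methods=["POST"])  # hypothetical route name
def analyze():
    # Hypothetical form fields: "file", "target", "split", "epochs", "optimizer".
    uploaded = request.files.get("file")
    if uploaded is None:
        return jsonify(error="a .csv file is required"), 400

    df = pd.read_csv(io.BytesIO(uploaded.read()))
    params = {
        "target": request.form.get("target", ""),
        "split": float(request.form.get("split", 0.3)),
        "epochs": int(request.form.get("epochs", 100)),
        "optimizer": request.form.get("optimizer", "adam"),
    }
    # A real pipeline would preprocess, train, and evaluate here;
    # this sketch only echoes the inputs and the dataset shape.
    return jsonify(rows=len(df), columns=list(df.columns), params=params)
```

The endpoint can be exercised without starting a server through `app.test_client()`, posting the file as `multipart/form-data`.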
6. Automation Of Classification Algorithms
For the automation process I have used 6 classification algorithms, and each algorithm is
fed into K-Fold Cross Validation with 10 splits.
The accuracy comparison of the classification algorithms can
help to choose proper classifiers in less time.
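The comparison of six classifiers under 10-split K-Fold Cross Validation can be sketched with scikit-learn as follows. The classifier names match those reported in the result slides; the hyperparameters, the scaling step, and the bundled stand-in dataset are assumptions, not the API's confirmed configuration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset bundled with scikit-learn (an assumption, not the slides' CSV).
X, y = load_breast_cancer(return_X_y=True)

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(),
}

cv = KFold(n_splits=10, shuffle=True, random_state=0)

results = {}
for name, clf in classifiers.items():
    # Scaling lives inside the pipeline, so it is refit on each fold's training part.
    model = make_pipeline(StandardScaler(), clf)
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = scores.mean() * 100  # mean accuracy in percent

# Print the classifiers from most to least accurate.
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.2f}%")
```

Storing the mean accuracy per classifier in one dictionary keeps the later comparison and charting steps simple.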
7. Result Analysis on Various Datasets
Dataset : titanic_train.csv
Target Column : Survived
Split Amount: 0.3
Epoch Count: 100
Optimizer : adam
Accuracy (%) by classifier:
Logistic Regression: 79.01
KNN: 80.36
Decision Tree: 73.63
Random Forest: 80.26
Naive Bayes: 62.86
SVM: 82.27
8. Comparative Accuracy Analysis of Classifiers
Dataset : Breast_Tumor_Classification.csv
Target Column : diagnosis
Split Amount: 0.3
Epoch Count: 100
Optimizer : adam
Accuracy (%) by classifier:
Logistic Regression: 97.36
KNN: 96.48
Decision Tree: 92.612
Random Forest: 95.96
Naive Bayes: 93.15
SVM: 97.88
9. Auto-Visualization of Feature Importance and Data Details
The API analyzes and visualizes the feature importances automatically.
The figure on this slide is the feature importance report for the Titanic dataset.
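A feature importance report of this kind can be approximated with a minimal sketch. The slides do not say how the importances are computed; this example assumes a random forest's impurity-based importances and a fixed top-k cutoff, and it uses a dataset bundled with scikit-learn as a stand-in for the Titanic file.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Stand-in dataset (an assumption; the slide's report uses the Titanic data).
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Rank features by importance, highest first.
importances = pd.Series(forest.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)

# Reduce the dataset to the most relevant features (top_k is an assumed cutoff).
top_k = 10
selected = importances.head(top_k).index.tolist()
X_reduced = X[selected]

print(importances.head(top_k))
```

The ranked series can be passed to a charting library to produce the visual report, and `X_reduced` can feed the classification stage.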
10. Future Applications of the API
Financial Analysis and Bank Churn Model
Business Modeling
Health Care Applications
Weather Prediction
11. CONCLUSION
Machine learning has become one of the main engines of the current era. The
production pipeline of a machine learning model passes through different phases
and stages that require wide knowledge of several available tools and algorithms.
However, as the scale of data produced daily keeps increasing at an
exponential rate, it has become essential to automate this process. In this
project, I have comprehensively covered the state-of-the-art research effort in the
domain of Driverless ML frameworks. I have also highlighted research directions
and open challenges that need to be addressed in order to achieve the vision
and goals of the Driverless ML process. I have already built the working API and
am currently aiming to integrate a Convolutional Neural Network in order to automate
disease recognition using image processing.