This document discusses moving from traditional business intelligence (BI) tools to adopting machine learning. It begins with an overview of common BI workflows and their limitations. It then introduces machine learning, deep learning, and artificial intelligence. The machine learning pipeline is explained, along with examples of adopting machine learning in products. Challenges of adopting machine learning are discussed, as well as cost optimization strategies. Real-world use cases are presented and open source options are mentioned.
2. Agenda
● Workflow with common BI tools
● Limitations of BI tools
● Black-Box Introduction to Machine Learning
● Machine Learning, Deep Learning & AI
● Machine Learning Pipeline
● Adopting Machine Learning in Your Product: Use Cases
● Challenges in adopting Machine Learning
● Open Source Options
● Cost Optimization
3. What and Why of BI
● Data is core to business strategy and to gaining a competitive advantage
● BI has grown from a decision support system into a decisive factor
● BI gives a first-hand look at business health
● Answers the “What” and “Where” of business
○ Critical to run operations
○ Important for tactical decisions
[Analytics stages: Descriptive → Inquisitive → Predictive → Prescriptive]
4. How Is BI Done Today?
[Diagram: OLTP Systems (Web, Mobile) → ETL → OLAP Systems: Data Marts/Lakes → Reporting Server → Visualization, supported by BI Administration and repeated Iterations]
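The ETL step in this flow can be sketched in miniature with Python's built-in sqlite3 module. Rows are extracted from an OLTP-style table of individual orders, transformed (cleaned and aggregated per day and channel), and loaded into a reporting table of the kind a data mart might hold. The table names `orders` and `daily_sales` are hypothetical, and a single in-memory database stands in for both the OLTP and OLAP systems.

```python
import sqlite3


def run_etl(conn: sqlite3.Connection) -> list[tuple[str, str, float]]:
    """Extract raw orders, aggregate them per day and channel, load the result."""
    # Extract: read transactional rows from the hypothetical OLTP table.
    rows = conn.execute("SELECT order_date, channel, amount FROM orders").fetchall()

    # Transform: normalize channel names, then sum order amounts per (date, channel).
    totals: dict[tuple[str, str], float] = {}
    for order_date, channel, amount in rows:
        key = (order_date, channel.strip().lower())
        totals[key] = totals.get(key, 0.0) + amount

    # Load: write the aggregates into a hypothetical reporting table for dashboards.
    conn.execute("CREATE TABLE IF NOT EXISTS daily_sales (day TEXT, channel TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO daily_sales VALUES (?, ?, ?)",
        [(day, channel, total) for (day, channel), total in sorted(totals.items())],
    )
    conn.commit()
    return conn.execute(
        "SELECT day, channel, total FROM daily_sales ORDER BY day, channel"
    ).fetchall()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (order_date TEXT, channel TEXT, amount REAL)")
    db.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [("2024-01-01", "Web", 10.0), ("2024-01-01", "web ", 5.0), ("2024-01-01", "Mobile", 7.5)],
    )
    print(run_etl(db))
```

In a real deployment each stage would typically run against separate systems on a schedule, which is where the turnaround delay discussed in the following slides comes from.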
6. BI Approaches - Self Service BI
● Self Service BI (Tableau, QlikView, Power BI, Jaspersoft)
○ Pros:
■ Business Analyst Friendly
■ Quick Turnaround time
■ Quick changes and fixes are easy to make
○ Cons:
■ Fewer customization opportunities
■ Major changes require incremental development cycles
■ Least flexible from a developer's perspective, since most solutions are out-of-the-box tools offered by third-party products
7. BI Approaches - Web Technologies based BI
● Web Technologies based BI (D3, FusionCharts, Highcharts, etc.)
○ Pros:
■ Excellent visuals possible
■ Fine-grained customizations possible
■ No limit to the kinds of visualizations; integrates with third-party libraries
■ Uses modern web technologies (HTML5, CSS3, JavaScript)
○ Cons:
■ Long turnaround time
■ Even simple customizations or fixes need to go through a full development cycle
■ Requires highly skilled web developers
8. BI Approaches - Hybrid BI
● Hybrid BI (Jaspersoft, QlikView)
○ Pros:
■ Self service BI
■ Third-party integrations with a few supported charting libraries to extend capabilities
■ Libraries available to build reports on the fly (dynamic reports)
○ Cons:
■ Long turnaround time
■ Even simple customizations or fixes need to go through a full development cycle
■ Requires highly skilled web developers
9. Current Gaps of BI Approaches
● The essence of BI is visual decision making
○ 2D visuals are the best a human can perceive
● Answers from a BI system come with a considerable delay
○ Delay means a loss of both money and opportunity
○ The business does not want to wait to fetch its own data
● Dashboards are static until the next change
○ User interactivity and interest drop significantly after the first few visits (especially for strategic dashboards)
● Change management is expensive in cost, time and effort
10. Future of BI - Embrace AI
1. BI is about depth. Exploratory data analysis (EDA) is still a human forte, while machines are good at repeating tasks at unparalleled speed
2. AI provides scale and speed that humans currently can’t offer
3. AI promises to close the process delay between a business question and its answer
4. AI lets an enterprise’s BI talent invest time in learning new AI skills and spend quality time on data exploration rather than repetitive BI work
5. AI offers innovative forms of user interactivity that can make a dashboard as easy to use as a personal assistant (voice- and text-driven BI)
12. What is not Machine Learning?
● Rule Based Approach
● Legacy Systems
13. What is Machine Learning?
● Solves prediction problems
● Logic is learned from examples, not from rules
[Diagram: Training Data → Learning Algorithm → Prediction Function or Trained Model, which is then applied to Input Data]
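To make "learned from examples, not by rules" concrete, here is a deliberately tiny sketch. Instead of hard-coding a decision rule, the program picks the threshold on a single numeric feature that misclassifies the fewest labeled training examples, then uses that learned threshold to predict. The feature, the labels and the training data are hypothetical.

```python
def learn_threshold(examples: list[tuple[float, int]]) -> float:
    """Return the threshold t with the fewest training errors for: predict 1 if x >= t."""
    # Candidate thresholds: every observed feature value, plus one above the maximum.
    candidates = sorted({x for x, _ in examples})
    candidates.append(candidates[-1] + 1.0)

    def errors(t: float) -> int:
        # Count training examples the rule "x >= t means label 1" gets wrong.
        return sum((1 if x >= t else 0) != label for x, label in examples)

    # min() returns the first (smallest) candidate among ties.
    return min(candidates, key=errors)


def predict(threshold: float, x: float) -> int:
    """Apply the learned rule to a new input."""
    return 1 if x >= threshold else 0


# Hypothetical labeled examples: (feature value, label).
training = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
threshold = learn_threshold(training)
print(threshold, predict(threshold, 6.5), predict(threshold, 2.5))
```

Changing the training examples changes the learned threshold without anyone editing the rule, which is the practical difference from the rule-based approach on the previous slide.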
14. Types of Machine Learning
● Supervised: Task Driven
● Unsupervised: Data Driven
● Reinforcement: Environment Driven
15. Spam Mail Detection
● Input - Mail
● Output - Spam or Ham
● Supervised Machine Learning
● Binary Classification Problem
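Spam detection as binary classification can be sketched with a perceptron over word-presence features. The perceptron is chosen here only because it fits in a few lines; the slide does not prescribe an algorithm. The toy mails and labels below are hypothetical, and a real filter would need far more labeled data.

```python
def features(mail: str) -> set[str]:
    """Represent a mail by the set of lowercase words it contains."""
    return set(mail.lower().split())


def train_perceptron(
    mails: list[str], labels: list[int], epochs: int = 10
) -> tuple[dict[str, float], float]:
    """Learn per-word weights and a bias; labels are 1 for spam, 0 for ham."""
    weights: dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        for mail, label in zip(mails, labels):
            words = features(mail)
            score = bias + sum(weights.get(w, 0.0) for w in words)
            predicted = 1 if score > 0 else 0
            update = label - predicted  # +1 or -1 on a mistake, 0 when correct
            if update:
                for w in words:
                    weights[w] = weights.get(w, 0.0) + update
                bias += update
    return weights, bias


def classify(weights: dict[str, float], bias: float, mail: str) -> str:
    """Output "spam" or "ham" for a new mail."""
    score = bias + sum(weights.get(w, 0.0) for w in features(mail))
    return "spam" if score > 0 else "ham"


# Hypothetical training mails.
mails = ["win free money now", "free prize claim now", "meeting agenda for monday", "lunch on monday"]
labels = [1, 1, 0, 0]
w, b = train_perceptron(mails, labels)
print(classify(w, b, "claim your free money"), classify(w, b, "monday meeting"))
```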
16. Predicting Lift Failure
● Input - Sensor Data
● Output - Failure time
● Supervised Machine Learning
● Regression Problem
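Predicting failure time is a regression problem. A minimal sketch, assuming a single hypothetical sensor reading (for example a vibration level) as the only feature, fits a straight line y = w·x + b by ordinary least squares using the closed-form slope and intercept. Real predictive-maintenance models would use many sensor features.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = w * x + b with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept passes through the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


# Hypothetical data: vibration level vs. hours until failure.
vibration = [1.0, 2.0, 3.0, 4.0]
hours_to_failure = [900.0, 800.0, 700.0, 600.0]
w, b = fit_line(vibration, hours_to_failure)
print(w, b, w * 5.0 + b)  # predicted hours to failure at vibration level 5
```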
26. Machine Learning Pipeline - Business Understanding
● Business understanding means being clear about what you are trying to achieve.
● Machine learning is not feasible with very small datasets.
● Consolidate data pipelines to channel a continuous flow of data.
● Sources include web scraping, data lake access, REST APIs, etc.
27. Machine Learning Pipeline - Data Wrangling
● Production data is never clean.
● Making it ready for the next stage takes a major effort (around 70% of the total effort).
● Transform and map data from its raw format into a format ready for the next stage.
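A small taste of that wrangling, on hypothetical raw records: whitespace is trimmed, inconsistent spellings are normalized, numbers stored as text are parsed, and rows whose required values cannot be recovered are dropped. The field names `city` and `amount` are illustrative only.

```python
def clean_records(raw: list[dict[str, str]]) -> list[dict[str, object]]:
    """Normalize raw string records into typed rows, dropping unusable ones."""
    cleaned = []
    for record in raw:
        # Trim whitespace and normalize capitalization of the category field.
        city = record.get("city", "").strip().title()
        # Remove thousands separators before parsing the number.
        amount_text = record.get("amount", "").strip().replace(",", "")
        try:
            amount = float(amount_text)
        except ValueError:
            continue  # missing or unparseable amount: drop the row
        if not city:
            continue  # a row without a city cannot be grouped later
        cleaned.append({"city": city, "amount": amount})
    return cleaned


# Hypothetical raw rows as they might arrive from production systems.
raw_rows = [
    {"city": " bangalore ", "amount": "1,200"},
    {"city": "MUMBAI", "amount": "n/a"},
    {"city": "", "amount": "50"},
    {"city": "delhi", "amount": " 75.5 "},
]
print(clean_records(raw_rows))
```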
28. Machine Learning Pipeline - Data Visualization
● Visualization makes it easy to grasp difficult concepts
● Find useful patterns in the data
● Interactively drill down into charts for deeper details
29. Machine Learning Pipeline - Data Preprocessing
Feature Extraction - Vectors: fixed-length arrays of numbers, built from
● Text documents
● Image files
● CSV
● Audio
● Video
● Time Series data
● Many more ...
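For text, one common way to turn documents into fixed-length vectors is a bag-of-words count: build a vocabulary from the training documents, then represent each document by the count of every vocabulary word. This sketch uses plain whitespace tokenization as a simplifying assumption; libraries such as scikit-learn's `CountVectorizer` provide a fuller version.

```python
def build_vocabulary(docs: list[str]) -> list[str]:
    """Sorted list of distinct lowercase tokens across all documents."""
    return sorted({token for doc in docs for token in doc.lower().split()})


def vectorize(doc: str, vocabulary: list[str]) -> list[int]:
    """Count occurrences of each vocabulary word; words outside the vocabulary are ignored."""
    index = {word: i for i, word in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)  # same length for every document
    for token in doc.lower().split():
        if token in index:
            vector[index[token]] += 1
    return vector


# Hypothetical training documents.
docs = ["good service", "bad service bad food"]
vocab = build_vocabulary(docs)
print(vocab, vectorize("bad bad service", vocab), vectorize("great service", vocab))
```

Every document, whatever its length, maps to a vector of len(vocab) numbers, which is the fixed-length form learning algorithms expect.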
30. Machine Learning Pipeline - Model Training
[Diagram: Learning Algorithm (Regression / Trees / SVM / Naive Bayes / Neural Networks) → Prediction Function or Trained Model]
31. Machine Learning Pipeline - Learning Algorithms
● Linear Regression
● Logistic Regression
● Naive Bayes
● Nearest Neighbors
● Decision Trees
● Ensemble Methods
● Clustering
● Support Vector Machines
● Neural Networks
● Convolutional Neural Networks (CNN)
● Recurrent Neural Networks (RNN)
● Generative Adversarial Networks (GAN)
33. Machine Learning Pipeline - Model Validation
● Training with different learning methods gives you different trained models.
● Each model also has many possible configurations (hyperparameters).
● Model validation finds the best model among all candidates and the best configuration for it.
● If the results are not satisfactory, go back up the pipeline and fix the earlier stages.
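Model validation can be sketched as k-fold cross-validation used to pick a hyperparameter, here the number of neighbors k in a one-feature nearest-neighbors classifier. Each candidate value is trained on all but one fold, scored on the held-out fold, and the value with the best average accuracy wins. The classifier, the candidate values and the data are illustrative assumptions, not a prescribed setup.

```python
def knn_predict(train: list[tuple[float, int]], x: float, k: int) -> int:
    """Majority label among the k training points closest to x (ties go to the smaller label)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes: dict[int, int] = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(sorted(votes), key=lambda lbl: votes[lbl])


def cross_val_accuracy(data: list[tuple[float, int]], k: int, folds: int = 3) -> float:
    """Mean held-out accuracy of k-NN over `folds` interleaved folds."""
    scores = []
    for f in range(folds):
        held_out = data[f::folds]  # every folds-th point, starting at offset f
        train = [p for i, p in enumerate(data) if i % folds != f]
        correct = sum(knn_predict(train, x, k) == y for x, y in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / folds


def select_k(data: list[tuple[float, int]], candidates: list[int]) -> int:
    """Candidate k with the highest cross-validated accuracy (first one wins ties)."""
    return max(candidates, key=lambda k: cross_val_accuracy(data, k))


# Hypothetical data: two well-separated groups of (feature, label) points.
data = [(0.0, 0), (0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0), (0.5, 0),
        (5.0, 1), (5.1, 1), (5.2, 1), (5.3, 1), (5.4, 1), (5.5, 1)]
print(cross_val_accuracy(data, 3), cross_val_accuracy(data, 8), select_k(data, [8, 3]))
```

With k = 8 the vote always ties between the two groups, so accuracy halves; cross-validation exposes this before the model reaches production.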
35. Business Intelligence vs Machine Learning
Image sourced from DataRobot
● BI is about deriving relatively simple patterns from historical data
● ML can find complex patterns in high volumes of data
● ML is about predicting the future based on past data
● ML can be automated
40. 1. Customer Service Industry
● Pipeline:
○ Business understanding: (1) reduce the manual effort of classifying reviews; (2) channel data from the web server to the analytics engine
○ Data wrangling: (1) get data ready for visualization; (2) historical data shows past trends
○ Data visualization: visualize the trend
○ Data preprocessing: text needs to be tokenized and vectorized
○ Model training: different models were trained (Naive Bayes, SGD Classifier)
○ Model validation: choose the best model with the best hyperparameters
○ Deployment: Naive Bayes (MultinomialNB) was chosen and deployed
● Manually labeled data is used to train the model.
● Labels are the target; reviews are the feature data.
● MultinomialNB supports batch-wise training (partial_fit), allowing incremental learning.
● Any misclassification by the model is relabeled correctly and fed back for training.
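The incremental-learning loop can be sketched with a from-scratch multinomial Naive Bayes that keeps only per-label word counts, so a new batch of labeled (or corrected) reviews updates the model without retraining from scratch. This is a simplified stand-in for scikit-learn's `MultinomialNB`, whose `partial_fit` method plays the same role; the class name `IncrementalNB`, the whitespace tokenizer, the smoothing value and the sample reviews are assumptions.

```python
import math
from collections import Counter, defaultdict


class IncrementalNB:
    """Hypothetical multinomial Naive Bayes over whitespace tokens, updatable batch by batch."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing
        self.doc_counts: Counter = Counter()  # number of documents seen per label
        self.word_counts: dict = defaultdict(Counter)  # token counts per label
        self.vocab: set = set()

    def partial_fit(self, texts: list[str], labels: list[str]) -> "IncrementalNB":
        """Add one batch of labeled texts to the running counts."""
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> str:
        """Label with the highest log-posterior; tokens never seen in training are ignored."""
        if not self.doc_counts:
            raise ValueError("model has not been trained yet")
        tokens = [t for t in text.lower().split() if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            # log P(label) + sum over tokens of log P(token | label), smoothed.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for token in tokens:
                score += math.log((self.word_counts[label][token] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = IncrementalNB()
model.partial_fit(["great product love it", "terrible service very slow"], ["positive", "negative"])
print(model.predict("love this great product"), model.predict("slow delivery"))

# Later batch, e.g. a misclassified review relabeled by an agent and fed back.
model.partial_fit(["not great at all"], ["negative"])
print(model.predict("not at all"))
```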
42. 2. Fast Query Chatbots
● Pipeline:
○ Business understanding: (1) reduce the manual effort of understanding text queries; (2) waiting for BI means a long turnaround time; (3) we are trying to address this with a chatbot
○ Data wrangling: (1) get data ready for visualization; (2) historical data shows past trends
○ Data visualization: visualize trends in the text and SQL
○ Data preprocessing: raw text cannot be used for ML directly; it needs to be tokenized and vectorized
○ Model training: deep learning models with different layer configurations
○ Model validation: choose the best model with the best hyperparameters
○ Deployment: the model with the best configuration was chosen and deployed
● Convert a natural language query to a SQL query
● The model is trained with historical text (feature) and SQL (target)
● The generated SQL is executed and its output is passed to visualization libraries
● Anyone without database or infrastructure knowledge can get a visualization in seconds
45. Data & Security
● Volume of data - Machine learning on small datasets is infeasible.
● Accessibility of data - Important data may not be accessible and may be encrypted.
46. Infrastructure for development
● Finding the best model is an iterative process.
● More experiments lead to better models.
● Hyperparameter tuning
● Scalable infrastructure for developers is important.
47. Infrastructure for deployment
● Speedy deployment
● Easy deployment
● Fluctuating demand
● Need for elastic infrastructure
● Cost optimization
52. Why Does Python Make Life Easy?
● Easy to learn for ETL developers
● Integrates very well with other technologies
● Full-stack development:
○ Dashboards using Bokeh
○ Web applications using Django
○ Machine learning models using scikit-learn
○ Scaling using PySpark
57. What is Deep Learning ?
● Specialized learning technique
● Rather than relying on us to choose features, this technique learns the important derived features itself.
● The objective is to learn the derived features best suited for prediction.
● It is loosely inspired by the way our brain learns
● Very useful for natural language, computer vision, audio, video, etc.
58. Do You Always Need Deep Learning?
● Requires more data
● Requires more compute power
● Models are less interpretable
“Don’t kill a mosquito with a cannonball”
Don’t use Deep Learning if you don’t need to
59. Cost Optimization
● Use Open Source alternatives
● Infrastructure optimization
● Don’t reinvent the wheel
62. Monolithic Infrastructure - Preallocated Infra
Model Training
● Developers request access whenever required
● May incur delays during peak working hours
● Idle in non-working hours
Model Inference
● Idle in non-peak hours
● May fall short during spikes
● You pay even when the infrastructure is not used
63. Serverless Infrastructure - Elastic Allocation
Model Training
● No preallocation
● Pay only for what you use
● No idle infrastructure
● No wait time for developers
Model Inference
● Infrastructure is allocated only when required
● Scales down during non-peak hours
● Improved customer experience even in peak hours
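The cost difference between the two models comes down to simple arithmetic on utilization. The sketch below uses hypothetical numbers (a preallocated rate of 2.0 and an elastic rate of 3.0 currency units per hour, a 720-hour month, 200 busy hours) purely for illustration. Real pricing varies by provider, and elastic capacity often costs more per hour of actual use, so the break-even point matters.

```python
HOURS_PER_MONTH = 720  # assumption: 30 days x 24 hours


def preallocated_cost(hourly_rate: float) -> float:
    """Fixed capacity is billed for every hour of the month, busy or idle."""
    return hourly_rate * HOURS_PER_MONTH


def elastic_cost(hourly_rate: float, busy_hours: float) -> float:
    """Elastic capacity is billed only for the hours it actually runs."""
    return hourly_rate * busy_hours


def break_even_hours(reserved_rate: float, elastic_rate: float) -> float:
    """Busy hours per month above which preallocation becomes the cheaper option."""
    return preallocated_cost(reserved_rate) / elastic_rate


# Hypothetical rates and workload.
print(preallocated_cost(2.0), elastic_cost(3.0, 200), break_even_hours(2.0, 3.0))
```

With these assumed numbers, 200 busy hours cost 600.0 on elastic infrastructure versus 1440.0 preallocated, and preallocation only becomes cheaper above 480 busy hours a month.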
66. Distributed Machine Learning using Spark
● Apache Spark is a distributed data processing framework.
● Many machine learning algorithms are implemented in Spark (MLlib).
● Its ML Pipeline API follows a fit/transform design inspired by scikit-learn.
● Both ETL and machine learning can be scaled out using Spark.
71. Visit: www.zekeLabs.com for more details
THANK YOU
Let us know how we can help your organization upskill its employees to stay updated in the ever-evolving IT industry.
Get in touch:
www.zekeLabs.com | +91-8095465880 | info@zekeLabs.com