Moving from BI to AI: For decision makers

zekeLabs
Learning made Simpler!
www.zekeLabs.com

Agenda
● Workflow with common BI tools
● Limitations of BI tools
● Black-box Introduction to Machine Learning
● Machine Learning, Deep Learning & AI
● Machine Learning Pipeline
● Adopting Machine Learning in Your Product: Use Cases
● Challenges in adopting Machine Learning
● Open Source Options
● Cost Optimization

What and Why of BI
● Data is core to business strategy and to gaining competitive advantage
● BI has grown from a decision support system into a decisive factor
● BI gives a first-hand look at business health
● Answers the “What” and “Where” of business
○ Critical to run operations
○ Important for making tactical decisions
[Diagram: Descriptive → Inquisitive → Predictive → Prescriptive]

How is BI done today?
[Diagram: OLTP Systems → ETL → OLAP Systems (Data Marts/Lakes) → Reporting Server → Visualization; also shown: BI Administration, Iterations, Web, Mobile]
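The OLTP → ETL → OLAP → reporting flow can be sketched in miniature. This is a hypothetical example that uses Python's built-in `sqlite3` module as a stand-in for both the transactional (OLTP) source and the data mart; the table names `orders` and `sales_by_region` and the `run_etl` helper are made up for illustration and are not part of any BI product.

```python
import sqlite3

def run_etl(conn: sqlite3.Connection) -> None:
    """Extract raw orders, transform them into per-region totals, load a mart table."""
    # Extract: read transactional rows from the (hypothetical) OLTP table.
    rows = conn.execute("SELECT region, amount FROM orders").fetchall()

    # Transform: normalise region names and aggregate amounts per region.
    totals: dict[str, float] = {}
    for region, amount in rows:
        key = region.strip().title()
        totals[key] = totals.get(key, 0.0) + amount

    # Load: rebuild the reporting table that dashboards query.
    conn.execute("DROP TABLE IF EXISTS sales_by_region")
    conn.execute("CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO sales_by_region VALUES (?, ?)", sorted(totals.items()))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("North ", 80.0), ("south", 50.0)],  # made-up, slightly dirty rows
)
run_etl(conn)

# Reporting layer: the query a dashboard would run against the mart.
report = conn.execute("SELECT region, total FROM sales_by_region ORDER BY region").fetchall()
print(report)  # [('North', 200.0), ('South', 50.0)]
```

A real pipeline would run on a schedule against separate source and warehouse databases; the sketch only shows the extract, transform, load, and report sequence.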

Current BI Implementation - 3 Approaches
● Web Technologies Based BI
● Self Service BI
● Hybrid BI

BI Approaches - Self Service BI
● Self Service BI (Tableau, QlikView, Power BI, Jaspersoft)
○ Pros:
■ Business analyst friendly
■ Quick turnaround time
■ Quick changes and fixes are easy to make
○ Cons:
■ Fewer customization opportunities
■ Major changes require incremental development cycles
■ Least flexible from a developer's perspective, since most solutions are out-of-the-box tools offered by third-party products

BI Approaches - Web Technologies based BI
● Web Technologies based BI (D3, FusionCharts, Highcharts, etc.)
○ Pros:
■ Excellent visuals possible
■ Fine-grained customizations possible
■ No limit to the kinds of visualizations or integrations with third-party libraries
■ Uses a modern web technology approach based on HTML5, CSS3, and JavaScript
○ Cons:
■ Long turnaround time
■ Even simple customizations or fixes need to go through a full development cycle
■ Highly skilled web developers needed

BI Approaches - Hybrid BI
● Hybrid BI (Jaspersoft, QlikView)
○ Pros:
■ Self service BI
■ Third-party integrations with a few supported charting libraries to extend capabilities
■ Libraries available to build on-the-fly reports (dynamic reports)
○ Cons:
■ Long turnaround time
■ Even simple customizations or fixes need to go through a full development cycle
■ Highly skilled web developers needed

Current Gaps of BI Approaches
● The essence of BI is visual decision making
○ 2D visuals are the best a human can perceive
● Answers from a BI system come with a considerable delay
○ Delay means loss of money as well as opportunity
○ The business does not want to wait to fetch its own data
● Dashboards are static until the next change
○ User interactivity and interest drop significantly after the first few hits (especially for strategic dashboards)
● Change management is expensive in cost, time, and effort

Future of BI - Embrace AI
1. BI is about depth. Exploratory data analysis (EDA) is still a human forte, while machines are good at repeating tasks at unparalleled speed.
2. AI provides the scale and speed that humans currently can’t offer.
3. AI promises to close the process delay between a business question and its answer.
4. AI gives an enterprise's BI talent the opportunity to learn new AI skills and spend quality time on data exploration rather than on repetitive BI work.
5. AI offers innovative forms of user interactivity that can make a dashboard as easy to use as a personal assistant (voice- and text-driven BI).

What is not Machine Learning?
● Rule Based Approach
● Legacy Systems

What is Machine Learning?
● Solves prediction problems
● Logic is learned from examples, not from rules
[Diagram: Training Data → Learning Algorithm → Prediction Function or Trained Model, which is then applied to Input Data]
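To make "learned from examples, not from rules" concrete, here is a minimal sketch in plain Python. Instead of a hand-written rule such as "spam if the email has more than N links", a one-feature decision stump picks the threshold that best separates labeled training examples. The feature, the data, and the `fit_threshold` helper are hypothetical.

```python
def fit_threshold(examples: list[tuple[float, int]]) -> float:
    """Learn the cut-off that misclassifies the fewest training examples.

    Each example is (feature_value, label), label 1 meaning "positive".
    Prediction rule: positive if value >= threshold.
    """
    candidates = sorted({x for x, _ in examples})

    def errors(t: float) -> int:
        # Count examples whose predicted class disagrees with the true label.
        return sum((x >= t) != bool(y) for x, y in examples)

    return min(candidates, key=errors)

def predict(threshold: float, value: float) -> int:
    return int(value >= threshold)

# Training data: (number of links in an email, is_spam), made-up values.
training_data = [(0, 0), (1, 0), (2, 0), (6, 1), (8, 1), (9, 1)]
model = fit_threshold(training_data)  # the "trained model" is just the learned threshold
print(model, predict(model, 7), predict(model, 1))  # 6 1 0
```

Feeding in different training data changes the learned threshold without anyone rewriting a rule, which is the essential difference from the rule-based approach.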

Types of Machine Learning
Machine Learning
● Supervised - Task Driven
● Unsupervised - Data Driven
● Reinforcement - Environment Driven

Spam Mail Detection
● Input - Mail
● Output - Spam or Ham
● Supervised Machine Learning
● Binary Classification Problem

Predicting Lift Failure
● Input - Sensor Data
● Output - Failure time
● Regression Problem

Predicting Insurance Amount
● Input - Accident details
● Output - Insurance amount
● Regression Problem
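Both regression examples predict a continuous number rather than a class. A minimal sketch, assuming a single numeric feature and made-up data, is ordinary least squares for a straight line y = a + b·x, computed in closed form: b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², a = ȳ - b·x̄. The feature names are hypothetical; real lift or insurance models would use many features.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var           # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# Hypothetical data: vibration reading vs. hours until lift failure.
vibration = [1.0, 2.0, 3.0, 4.0]
hours_left = [90.0, 70.0, 50.0, 30.0]
a, b = fit_line(vibration, hours_left)
print(a, b, a + b * 2.5)  # 110.0 -20.0 60.0
```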

Medical Diagnosis
● Input - Patient Synopsis (fever, temperature, BP, etc.)
● Output - Diagnosis
● Multi-class Classification Problem

Question - What is common between them?

Market Segmentation
● Input - Customer Details
● Output - Clusters
● Unsupervised Machine Learning
● Clustering Problem
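Clustering has no labels: the algorithm groups similar customers on its own. A minimal k-means sketch in plain Python, assuming two made-up features per customer (annual spend, visits per month) and k = 2. The initial centroids are simply the first k points, which keeps the example deterministic; libraries such as scikit-learn's `KMeans` use smarter initialization (k-means++ by default).

```python
import math

Point = tuple[float, float]

def kmeans(points: list[Point], k: int, iterations: int = 10) -> tuple[list[Point], list[int]]:
    """Plain k-means: assign points to the nearest centroid, then move centroids to cluster means."""
    centroids = list(points[:k])  # naive initialisation: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels

customers = [(100.0, 1.0), (900.0, 8.0), (120.0, 2.0), (950.0, 9.0), (110.0, 1.5)]
centroids, labels = kmeans(customers, k=2)
print(labels)  # [0, 1, 0, 1, 0]
```

In practice features are usually scaled first; here annual spend dominates the distance because its values are much larger than visit counts.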

Robot playing Football
● Input - Player information, Rewards
● Output - Action to score
● Reinforcement Learning
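Reinforcement learning learns from rewards rather than from labeled examples. A football-playing robot is far beyond a slide-sized example, so this is a deliberately tiny stand-in: tabular Q-learning on a hypothetical five-cell corridor where the agent earns a reward of 1 for reaching the goal cell (think "scoring"). The environment, rewards, and hyper-parameters are all assumptions chosen for illustration.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal ("scoring")
ACTIONS = (-1, 1)   # move left, move right

def step(state: int, action: int) -> tuple[int, float, bool]:
    """Hypothetical environment: walls at both ends, reward 1.0 only on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[state].values())  # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Move Q(s, a) toward reward + discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state].values())
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)}
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}
```

Nobody tells the agent that "right" is correct; it discovers that moving right leads to the reward, which is the environment-driven learning the slide describes.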

Machine Learning Pipeline - Business Understanding
● Business understanding means clarity on what you are trying to achieve.
● Machine learning is not possible with very small datasets.
● Consolidate the data pipeline to channel a continuous flow of data.
● Data sources include web scraping, data lake access, REST APIs, etc.

Machine Learning Pipeline - Data Wrangling
● Production data is never clean.
● Making it ready for the next stage takes major effort (around 70% of the total effort).
● Wrangling transforms and maps data from its raw format into a format ready for the next stage.
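A minimal wrangling sketch using only the standard library, assuming a hypothetical raw CSV export with stray spaces, inconsistent casing, mixed date formats, and missing values. The field names and the `clean_record` helper are made up for illustration; the date formats are assumptions about this imaginary feed.

```python
import csv
import io
from datetime import date, datetime
from typing import Optional

# Hypothetical raw export: stray spaces, inconsistent casing, mixed date formats, missing values.
RAW = """customer, signup_date ,amount
 Alice ,2023-01-05, 120.50
BOB,05/02/2023,
carol,2023-03-10,n/a
"""

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed formats; the second is day-first

def parse_date(text: str) -> Optional[date]:
    """Try each known format; unparseable dates become missing values (None)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None

def parse_amount(text: str) -> Optional[float]:
    """Convert to float; blanks and placeholders like 'n/a' become None."""
    try:
        return float(text.strip())
    except ValueError:
        return None

def clean_record(row: dict) -> dict:
    return {
        "customer": row["customer"].strip().title(),
        "signup_date": parse_date(row["signup_date"]),
        "amount": parse_amount(row["amount"]),
    }

reader = csv.DictReader(io.StringIO(RAW))
reader.fieldnames = [name.strip() for name in reader.fieldnames]  # " signup_date " -> "signup_date"
cleaned = [clean_record(row) for row in reader]
print(cleaned[1])  # {'customer': 'Bob', 'signup_date': datetime.date(2023, 2, 5), 'amount': None}
```

Deciding what to do with the resulting `None` values (drop, impute, or flag) is itself a wrangling decision that depends on the business question.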

Machine Learning Pipeline - Data Visualization
● Visualization makes it easy to grasp difficult concepts
● Find useful patterns in the data
● Interactively drill down into charts for deeper details

Machine Learning Pipeline - Data Preprocessing
Feature Extraction: converting raw data into vectors (fixed-length arrays of numbers)
● Text documents
● Image files
● CSV
● Audio
● Video
● Time series data
● Many more ...
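As a minimal sketch of feature extraction for text, a bag-of-words vectorizer maps every document to a fixed-length vector of word counts over a shared vocabulary. This is plain Python with naive whitespace tokenization; libraries such as scikit-learn's `CountVectorizer` offer a more complete version of the same idea.

```python
from collections import Counter

def build_vocabulary(docs: list[str]) -> list[str]:
    """Sorted list of every distinct lower-cased token across the corpus."""
    return sorted({token for doc in docs for token in doc.lower().split()})

def vectorize(doc: str, vocabulary: list[str]) -> list[int]:
    """Fixed-length count vector: position i holds the count of vocabulary[i]."""
    counts = Counter(doc.lower().split())
    return [counts[word] for word in vocabulary]

docs = ["Great product great price", "Poor product"]
vocab = build_vocabulary(docs)
vectors = [vectorize(d, vocab) for d in docs]
print(vocab)    # ['great', 'poor', 'price', 'product']
print(vectors)  # [[2, 0, 1, 1], [0, 1, 0, 1]]
```

Every document, however long, becomes a vector of the same length, which is what the learning algorithms in the next stage expect.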

Machine Learning Pipeline - Model Training
[Diagram: Learning Algorithm (Regression / Trees / SVM / Naive Bayes / Neural Networks / ...) → Prediction Function or Trained Model]

Machine Learning Pipeline - Learning Algorithms
● Linear Regression
● Logistic Regression
● Naive Bayes
● Nearest Neighbors
● Decision Trees
● Ensemble Methods
● Clustering
● Support Vector Machines
● Neural Networks
● CNN
● RNN
● GAN

[Diagram: Prediction Function or Trained Model → Prediction]

Machine Learning Pipeline - Model Validation
● Training with different learning methods gives you different trained models.
● Each model also has a huge number of possible configurations (hyper-parameters).
● Model Validation finds the best model among all the candidates, and the best configuration for it.
● If the results are not satisfactory, go back up the pipeline and fix the earlier stages.
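A minimal sketch of hyper-parameter selection on a held-out validation set, in plain Python. The model is k-nearest neighbours on one made-up feature, and the hyper-parameter being tuned is k; the data, including the deliberately noisy training point at 2.5, is invented for illustration.

```python
from collections import Counter

Example = tuple[float, int]  # (feature value, class label)

def knn_predict(train: list[Example], x: float, k: int) -> int:
    """Majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train: list[Example], data: list[Example], k: int) -> float:
    """Fraction of examples in data that the model predicts correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

# Made-up data; 2.5 is a deliberately noisy training label.
train = [(1.0, 0), (2.0, 0), (2.5, 1), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
validation = [(2.4, 0), (2.6, 0), (7.5, 1), (8.5, 1)]

# Try each candidate hyper-parameter; keep the one that scores best on held-out data.
scores = {k: accuracy(train, validation, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)  # {1: 0.5, 3: 1.0, 5: 1.0} 3
```

k = 1 overfits the noisy point, while larger k smooths it out. In practice cross-validation (for example scikit-learn's `GridSearchCV`) is more reliable than a single small validation split, and a separate test set is kept for the final estimate.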

Machine Learning Pipeline - Deployment
[Diagram: Consumers → RESTful Interface → Trained Model (or Interface Model)]
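A minimal sketch of putting a trained model behind a RESTful interface, using only Python's standard-library WSGI tools. The "model" is a hypothetical threshold on a made-up `links` feature, and the `/predict` route, `call_app`, and `serve` helpers are illustrative assumptions; a production service would typically use a web framework (for example Django) and a proper application server.

```python
import io
import json
from wsgiref.simple_server import make_server

THRESHOLD = 6  # hypothetical "trained model": flag an email as spam if it has >= 6 links

def app(environ, start_response):
    """WSGI app exposing POST /predict: JSON features in, JSON prediction out."""
    headers = [("Content-Type", "application/json")]
    if environ["REQUEST_METHOD"] != "POST" or environ.get("PATH_INFO") != "/predict":
        start_response("404 Not Found", headers)
        return [json.dumps({"error": "not found"}).encode()]
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        features = json.loads(environ["wsgi.input"].read(length))
        result = {"spam": int(features["links"]) >= THRESHOLD}
    except (ValueError, KeyError, TypeError):
        start_response("400 Bad Request", headers)
        return [json.dumps({"error": "bad request"}).encode()]
    start_response("200 OK", headers)
    return [json.dumps(result).encode()]

def call_app(method: str, path: str, payload: bytes = b"") -> tuple[str, bytes]:
    """Invoke the app in-process, without a network socket (useful for tests)."""
    captured = {}
    def start_response(status, headers, exc_info=None):
        captured["status"] = status
    environ = {"REQUEST_METHOD": method, "PATH_INFO": path,
               "CONTENT_LENGTH": str(len(payload)), "wsgi.input": io.BytesIO(payload)}
    body = b"".join(app(environ, start_response))
    return captured["status"], body

def serve(port: int = 8000) -> None:
    """Blocking development server; try: curl -X POST -d '{"links": 7}' localhost:8000/predict"""
    with make_server("localhost", port, app) as server:
        server.serve_forever()

print(call_app("POST", "/predict", b'{"links": 7}'))  # ('200 OK', b'{"spam": true}')
```

Consumers only see the HTTP contract, so the model behind it can be retrained and swapped without changing any client.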

Business Intelligence vs Machine Learning
Image sourced from DataRobot
● BI is about deriving relatively simple patterns from historical data
● ML can find complex patterns in high volumes of data
● ML is about predicting the future based on past data
● ML can be automated

Break - Let’s meet in 5 minutes.

Adopting Machine Learning - Real Stories

1. Customer Service Industry
● Pipeline:
○ Business Understanding: reduce the manual effort of classifying reviews; channel data from the web server to the analytics engine
○ Data Wrangling: get the data ready for visualization; historical data shows past trends
○ Data Visualization: visualize the trend
○ Data Preprocessing: text needs to be tokenized and vectorized
○ Model Training: different models were trained (Naive Bayes, SGD Classifier)
○ Model Validation: choose the best model with the best hyper-parameters
○ Deployment: Naive Bayes (MultinomialNB) was chosen and put into deployment
● Manually labeled data is used to train the model.
● The labels are the target and the reviews are the feature data.
● MultinomialNB supports batch training, which allows incremental learning.
● Any misclassification by the model is relabeled correctly and fed back for training.
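The story says MultinomialNB's batch training allows incremental learning; in scikit-learn that is the `partial_fit` method. A minimal sketch, assuming a hypothetical two-label scheme (`complaint`/`praise`) and made-up reviews; `HashingVectorizer` is one reasonable way to tokenize and vectorize text that keeps arriving, not necessarily what the original project used.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB

LABELS = ["complaint", "praise"]  # hypothetical label set; the real one is not given

# HashingVectorizer is stateless, so later batches need no refitted vocabulary.
# alternate_sign=False keeps features non-negative, as MultinomialNB requires.
vectorizer = HashingVectorizer(n_features=2**16, alternate_sign=False)
model = MultinomialNB()

def train_batch(reviews: list[str], labels: list[str]) -> None:
    """Update the model incrementally with one batch of manually labeled reviews."""
    # classes is required on the first partial_fit call; repeating the same list is fine.
    model.partial_fit(vectorizer.transform(reviews), labels, classes=LABELS)

def classify(reviews: list[str]) -> list[str]:
    return [str(label) for label in model.predict(vectorizer.transform(reviews))]

# Initial labeled batches (made-up reviews).
train_batch(["terrible service, very slow", "refund never arrived"], ["complaint", "complaint"])
train_batch(["excellent support, very helpful", "great and fast delivery"], ["praise", "praise"])
print(classify(["slow service", "helpful and fast"]))

# Feedback loop: corrected labels for misclassified reviews go in as just another batch.
train_batch(["package arrived damaged", "lovely packaging"], ["complaint", "praise"])
```

Because each `partial_fit` call only updates the per-class word statistics, the model never needs to be retrained from scratch when corrected labels arrive.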

2. Fast Query Chatbots
● Pipeline:
○ Business Understanding: reduce the manual effort of understanding text queries; waiting for BI has a long turnaround time; we are trying to solve this with a chatbot
○ Data Wrangling: get the data ready for visualization; historical data shows past trends
○ Data Visualization: visualize the trend of text and SQL
○ Data Preprocessing: text cannot be used for ML directly; it needs to be tokenized and vectorized
○ Model Training: deep learning models with different layer configurations
○ Model Validation: choose the best model with the best hyper-parameters
○ Deployment: the model with the best configuration was chosen and put into deployment
● Convert a natural language query into an SQL query
● The model is trained on historical text (feature) and SQL (target)
● The generated SQL was executed and the output was passed to visualization libraries
● Anyone without database or infrastructure knowledge can get visualizations in seconds

Challenges of Adopting Machine Learning

Data & Security
● Volume of data - Machine learning on small datasets is infeasible.
● Accessibility of data - Important data may not be accessible and may be in encrypted form.

Infrastructure for development
● Finding the best model is an iterative process.
● More experiments lead to better models.
● Hyper-parameter tuning
● Scalable infrastructure for developers is important.

Infrastructure for deployment
● Speedy deployment
● Easy deployment
● Fluctuating demand
● Need for elastic infrastructure
● Cost optimization

Talent Acquisition
● Upskill your current team?

Overcoming the challenges - Getting started

Choose a Good Programming Language

Why does Python make life easy?
● Easy to learn for ETL developers
● Integrates very well with other technologies
● Full-stack development:
○ Dashboards using Bokeh
○ Web applications using Django
○ Machine learning models using scikit-learn
○ Scaling using PySpark

Choose appropriate Libraries
- Statistical Modeling & Data Processing

- Visualization

- Machine Learning or Deep Learning

Choosing between Machine Learning and Deep Learning

What is Deep Learning?
● Specialized learning technique
● Rather than us choosing the features for learning, this technique derives the important features itself.
● The objective is to learn the best derived features for prediction.
● It is loosely inspired by the way our brain learns
● Very useful for natural language, computer vision, audio, video etc.

Do you always need Deep Learning?
● More data is required for Deep Learning
● More compute power is required
● Models are less interpretable
“Don’t kill a mosquito with a cannonball”
Don’t use Deep Learning if you don’t need to

Cost Optimization
● Use Open Source alternatives
● Infrastructure optimization
● Don’t reinvent the wheel

Infrastructure Optimization
Monolithic or Serverless

Monolithic Infrastructure - Preallocated Infra
Model Training
● Developers request access whenever required
● May cause delays during peak working hours
● Idle during non-working hours
Model Interfacing
● Idle during non-peak hours
● May fall short during traffic spikes
● You pay even if the infrastructure is not used

Serverless Infrastructure - Elastic Allocation
Model Training
● No pre-allocation
● Pay only for what you use
● No idle infrastructure at all
● No wait time for developers
Model Interfacing
● Allocate infrastructure only when required
● Scales down during non-peak hours
● Improved customer experience even in peak hours

Serverless Infrastructure Solutions
● OpenFaaS (Open Function as a Service)
● AWS Lambda
● Google Cloud Functions
● Azure Functions
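On AWS Lambda, the Python runtime calls a handler function with `(event, context)`; with an API Gateway proxy integration, the HTTP body arrives as the string `event["body"]` and the function returns a dict with `statusCode`, `headers`, and `body`. A minimal sketch of serverless model inference under those assumptions; the threshold "model" and the `links` feature are hypothetical, and Google Cloud Functions, Azure Functions, and OpenFaaS follow the same idea with their own handler signatures.

```python
import json

# Hypothetical model parameter. Module-level state is created once per execution
# environment and is typically reused across warm invocations.
THRESHOLD = 6

def lambda_handler(event, context):
    """Run inference for one request; no server sits idle between requests."""
    try:
        features = json.loads(event.get("body") or "{}")
        links = int(features["links"])
    except (ValueError, KeyError, TypeError):
        return {"statusCode": 400, "body": json.dumps({"error": "missing or invalid 'links'"})}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"spam": links >= THRESHOLD}),
    }

# Local smoke test with a hand-built, API Gateway-style event (no AWS account needed).
print(lambda_handler({"body": json.dumps({"links": 7})}, None))
# {'statusCode': 200, 'headers': {'Content-Type': 'application/json'}, 'body': '{"spam": true}'}
```

Because the platform starts and stops handler instances with demand, idle capacity and peak-hour shortfalls from the monolithic setup largely disappear; the trade-off is occasional cold-start latency.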

Container-based CI/CD for ML/AI applications

Distributed Machine Learning using Spark
● Apache Spark is a distributed data processing framework.
● Many machine learning algorithms are implemented in Spark (MLlib).
● Its ML Pipelines API (estimators, transformers, pipelines) follows concepts similar to scikit-learn's.
● Scaled ETL and machine learning can both be done using Spark.

Other alternatives
Google Cloud AI

Repositories
● https://github.com/zekelabs/machine-learning-for-beginners
● https://github.com/zekelabs/tensorflow-tutorial/
● Dog breed prediction -
https://www.edyoda.com/resources/watch/54AEA4CDC35394F1183A9DD17AA47/
● Python learning course -
https://www.edyoda.com/resources/videolisting/98/

Visit www.zekeLabs.com for more details
THANK YOU
Let us know how we can help your organization upskill its employees to stay current in the ever-evolving IT industry.
Get in touch:
www.zekeLabs.com | +91-8095465880 | info@zekeLabs.com
