Bohitesh Misra
Co-Founder and Director
Decisiontree Endeavour Pvt Ltd (www.ndtepl.com)
Bohitesh.misra@gmail.com
#ITNEXT100 #Eminent CIOs of India #CIO200
 Health passports - These are mobile apps that indicate a person's relative
level of infection risk and whether they can gain access to
buildings, supermarkets, restaurants, public spaces and
transportation. Ex. Aarogya Setu
 Embedded AI - It has the potential to increase the accuracy, insights
and intelligence gained from current and next-generation sensors.
 Responsible AI - Its purpose is to assist businesses in making more
ethical, balanced business decisions by attempting to reduce bias.
It can also help identify fake news.
 Generative AI - It is the technology most often used for creating
“deepfake” videos and digital content.
 AI-augmented development - its purpose is to improve the cycle times
of application and DevOps teams in creating high-quality software
faster and more consistently.
 By 2022, at least 40% of new application development projects will
have artificial intelligence co-developers on the team.
 By 2022, 10% of new vehicles will have autonomous driving
capabilities, compared with less than 1% in 2018.
 By 2030, blockchain will create $3.1 trillion in business value.
 Through 2028, storage, computing and advanced AI and analytics
technologies will expand the capabilities of edge devices.
 By 2022, 100 million consumers will shop in Augmented Reality.
 By 2022, more than 50% of all people collaborating in Industry 4.0
ecosystems will use virtual assistants or intelligent agents to
interact more naturally with their surroundings and with people.
The Internet of Things (IoT) refers to the ever-growing network of physical objects that feature an IP address, and the communication that occurs between these objects and other Internet-enabled devices and systems.
IoT: The Internet of Things connects all manner of end-points, a treasure trove of data.
Big Data: Networks and device proliferation enable access to a massive and growing amount of traditionally siloed information.
Analytics: Analytics and business intelligence tools empower decision makers by extracting and presenting meaningful information in real time.
▪ There has been enormous data
growth in both commercial and
scientific databases due to
advances in data generation and
collection technologies
▪ New mantra
▪ Gather whatever data you can
whenever and wherever possible.
▪ Expectations
▪ Gathered data will have value either
for the purpose collected or for a
purpose not envisioned.
Computational Simulations
Social Networking: Twitter
Sensor Networks
Traffic Patterns
Cyber Security
E-Commerce
Big Data is a phrase used to mean a massive volume of both structured and
unstructured data that is so large it is difficult to process using traditional database
and software techniques.
An example of big data might be petabytes
(1,024 terabytes) or exabytes (1,024
petabytes) of data consisting of billions to
trillions of records of millions of people—all
from different sources (e.g. Web, sales,
customer contact center, social media, mobile
data, e-Commerce and so on).
A single jet engine generates 10+ terabytes of data
in 30 minutes of flight time. With many thousands of
flights per day, data generation reaches
many petabytes.
Application of Big Data analytics
▪ Homeland Security
▪ Smarter Healthcare: Integrated and smart patient care systems and processes
▪ Retail & Multi-channel sales: Highly personalized customer experience across channels and devices
▪ Telecom
▪ Manufacturing: Intelligent interconnectivity across the enterprise for enhanced control, speed and efficiency
▪ Traffic Control
▪ Trading Analytics
▪ Search Quality
▪ Log Analysis
▪ Finance & Banking: Seamless customer experience across all banking channels
 Lots of data is being collected and warehoused
◦ Web data
 Yahoo has petabytes of web data
 Facebook has billions of active users
◦ purchases at department/ grocery stores, e-commerce
 Amazon handles millions of visits/day
◦ Bank/Credit Card transactions
 Computers have become cheaper and more powerful
 Competitive Pressure is Strong
◦ Provide better, customized services for an edge (e.g. CRM)
 Data collected and stored at enormous speeds
◦ Remote sensors on a satellite
 NASA archives over petabytes of earth science data / year
◦ Telescopes scanning the skies
 Sky survey data
◦ High-throughput biological data
◦ Scientific simulations
 terabytes of data generated in a few hours
 Data mining helps scientists
◦ in automated analysis of massive datasets
◦ In hypothesis formation
MRI Data from Brain
Sky Survey Data
Surface Temperature of Earth
Improving health care and reducing costs
Finding alternative/ green energy sources
Predicting the impact of climate change
Reducing hunger and poverty by increasing agriculture production
 Data Mining is Extraction of Knowledge from large volumes of data
that are structured or unstructured.
 Data mining is a potential solution to a big problem facing many
firms : an overabundance of data and a relative dearth of staff,
technology, and time to transform numbers and notes into
meaningful information about existing and prospective customers.
 Alternative names
◦ Knowledge discovery (mining) in databases (KDD), knowledge extraction, data /
pattern analysis, data archeology, data dredging, information harvesting,
business intelligence
 AI refers to the ability of machines to perform cognitive tasks like
thinking, perceiving, learning, problem solving and decision making
 Science
◦ Astronomy, bioinformatics, drug discovery
 Business
◦ CRM (Customer Relationship management), fraud detection, e-commerce,
manufacturing, sports/entertainment, telecom, targeted marketing, health care,
warehouses
 Web:
◦ Search engines, advertising, web and text mining
 Government
◦ Surveillance, crime detection, profiling tax cheaters
[Figure: Data mining at the confluence of several disciplines: machine learning, statistics, applications, algorithms, pattern recognition, high-performance computing, visualization and database technology]
 Supervised Learning
 Unsupervised Learning
 Reinforcement Learning
 Supervised Learning
◦ Supervised learning is learning in which we teach or train the machine using well-labelled
data, meaning that the data is already tagged with the correct answer.
◦ After that, the machine is given a new set of examples (data); the supervised learning
algorithm uses what it learned from the training data (the set of training examples) to
produce an outcome for them.
◦ Suppose you are given a basket filled with different kinds of fruit. The first step is to
train the machine on the different fruits one by one:
 If the object is rounded with a depression at the top and is red, it is labelled Apple.
 If the object is a long curving cylinder and is green-yellow, it is labelled Banana.
◦ Now suppose that after training you take a new fruit from the basket, say a banana, and
ask the machine to identify it.
◦ Since the machine has already learned from the previous data, it applies that knowledge:
it classifies the fruit by its shape and color, confirms the fruit name as BANANA and
puts it in the Banana category.
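The fruit example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method from the slides: the feature encoding (a shape string and a color string) and the `train` / `predict` helpers are hypothetical, and the "model" simply memorises the most common label seen for each feature combination.

```python
from collections import Counter, defaultdict

def train(examples):
    """Learn the most common label for each (shape, color) pair."""
    counts = defaultdict(Counter)
    for shape, color, label in examples:
        counts[(shape, color)][label] += 1
    # Keep only the majority label per feature combination.
    return {features: c.most_common(1)[0][0] for features, c in counts.items()}

def predict(model, shape, color):
    """Return the learned label, or None for an unseen combination."""
    return model.get((shape, color))

# Labelled training data (hypothetical encoding of the two rules on the slide).
training_data = [
    ("rounded with depression at top", "red", "Apple"),
    ("long curving cylinder", "green-yellow", "Banana"),
]
model = train(training_data)

# A new fruit taken from the basket is classified by its shape and color.
print(predict(model, "long curving cylinder", "green-yellow"))
```

A fruit whose shape and color were never seen in training gets `None` here; a real classifier would generalise from similar examples instead of requiring an exact match.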
 Types:-
• Regression
• Logistic Regression
• Classification
• Naïve Bayes Classifiers
• Decision Trees
• Support Vector Machine
 Advantages:-
• Supervised learning lets a model collect data and produce outputs based on previous experience.
• Helps to optimize performance criteria with the help of experience.
• Supervised machine learning helps to solve various types of real-world computation problems.
 Disadvantages:-
• Classifying big data can be challenging.
• Training for supervised learning needs a lot of computation time.
 Unsupervised learning is the training of a machine using information that is neither
classified nor labelled, allowing the algorithm to act on that information without
guidance.
 Here the task of the machine is to group unsorted information according to similarities,
patterns and differences without any prior training on the data.
 For instance, suppose it is given an image containing both dogs and cats, which it has
never seen before.
 The machine has no idea about the features of dogs and cats, so it cannot label them as
dogs and cats. But it can categorize them according to their similarities, patterns
and differences.
 It allows the model to work on its own to discover patterns and information that
were previously undetected. It mainly deals with unlabelled data.
 Unsupervised learning is classified into two categories of
algorithms:
• Clustering: A clustering problem is where you want to discover the
inherent groupings in the data, such as grouping customers by
purchasing behaviour.
• Association: An association rule learning problem is where you want to
discover rules that describe large portions of your data, such as people
that buy X also tend to buy Y.
 Classification: predicting an item class
 Clustering: finding clusters in data
 Associations: e.g. A & B & C occur frequently
 Visualization: to facilitate human discovery
 Summarization: describing a group
 Deviation Detection: finding changes
 Estimation: predicting a continuous value
 Link Analysis: finding relationships
Data Mining Tasks

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
11   No      Married         60K             No
12   Yes     Divorced        220K            No
13   No      Single          85K             Yes
14   No      Married         75K             No
15   No      Single          90K             Yes
 Find a model for class attribute as a function of the values of
other attributes
Tid  Employed  Level of Education  # years at present address  Credit Worthy
1    Yes       Graduate            5                           Yes
2    Yes       High School         2                           No
3    No        Undergrad           1                           No
4    Yes       High School         10                          Yes
…    …         …                   …                           …

Model for predicting credit worthiness (decision tree):
Employed?
  No  -> Class: No
  Yes -> Education?
    Graduate -> Number of years?
      > 3 yr  -> Yes
      < 3 yr  -> No
    { High school, Undergrad } -> Number of years?
      > 7 yrs -> Yes
      < 7 yrs -> No
Predictive Modeling: Classification
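The credit-worthiness model above can be written as a small function. This is a sketch under assumptions: the branch layout is read from the slide's flattened tree diagram (Employed, then Education, then Number of years) and was chosen because it reproduces all four visible table rows. The slide does not say what happens at exactly 3 or 7 years, so treating those boundaries as "No" is an assumption, and `credit_worthy` is a hypothetical name.

```python
def credit_worthy(employed, education, years_at_address):
    """Walk the decision tree and return the predicted class, 'Yes' or 'No'."""
    if not employed:
        return "No"
    if education == "Graduate":
        # Assumption: exactly 3 years falls on the "No" side.
        return "Yes" if years_at_address > 3 else "No"
    # Remaining branch covers {High School, Undergrad}.
    # Assumption: exactly 7 years falls on the "No" side.
    return "Yes" if years_at_address > 7 else "No"

# The four visible rows of the training table:
# (employed, education, years at present address, credit worthy)
rows = [
    (True, "Graduate", 5, "Yes"),
    (True, "High School", 2, "No"),
    (False, "Undergrad", 1, "No"),
    (True, "High School", 10, "Yes"),
]
for employed, education, years, label in rows:
    print(education, years, "predicted:", credit_worthy(employed, education, years),
          "actual:", label)
```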
 Classification and label prediction
◦ Construct models (functions) based on some training examples
◦ Describe and distinguish classes or concepts for future prediction
 E.g., classify countries based on (climate), or classify cars based on (gas mileage)
◦ Predict some unknown class labels
 Typical methods
◦ Decision trees, naïve Bayesian classification, support vector machines, neural
networks, rule-based classification, pattern-based classification, logistic regression
 Typical applications:
◦ Credit card fraud detection, direct marketing, classifying stars, diseases, web-
pages
▪ Classifying credit card transactions as legitimate or
fraudulent
▪ Classifying land covers (water bodies, urban areas,
forests, etc.) using satellite data
▪ Categorizing news stories as finance, weather,
entertainment, sports, etc
▪ Identifying intruders in the cyberspace
▪ Predicting tumor cells as benign or malignant
◦ Goal: Predict fraudulent cases in credit card transactions.
◦ Approach:
 Use credit card transactions and the information on its account-
holder as attributes.
 When does a customer buy, what does he buy, how often he pays on
time, etc
 Label past transactions as fraud or fair transactions. This forms the
class attribute.
 Learn a model for the class of the transactions.
 Use this model to detect fraud by observing credit card transactions
on an account.
 Churn prediction for telephone customers
◦ Goal: To predict whether a customer is likely to be lost to a competitor.
◦ Approach:
 Use detailed record of transactions with each of the past and
present customers, to find attributes.
 How often the customer calls, where he calls, what time-of-the
day he calls most, his financial status, marital status, etc.
 Label the customers as loyal or disloyal.
 Find a model for loyalty.
Finding groups of objects such that the objects in a group will be
similar (or related) to one another and different from (or unrelated to)
the objects in other groups
[Figure: Clustering. Intra-cluster distances are minimized; inter-cluster distances are maximized]
 Unsupervised learning (i.e., Class label is unknown)
 Group data to form new categories (i.e., clusters), e.g., cluster houses to
find distribution patterns
 Principle: Maximizing intra-class similarity & minimizing interclass
similarity
 K-means clustering
◦ aims to partition n observations
into k clusters in which each
observation belongs to
the cluster with the nearest mean
 Hierarchical clustering
◦ Produces a set of nested clusters
organized as a hierarchical tree
◦ Can be visualized as a dendrogram
[Figure: K-means clustering of points in the x-y plane, shown at iterations 1 to 6]
[Figure: Hierarchical clustering, showing nested clusters of points 1 to 6 and the corresponding dendrogram]
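The k-means iterations described above (assign each observation to the cluster with the nearest mean, then recompute the means) can be sketched in plain Python. This is a minimal illustration, not a production implementation: the sample points and the hand-picked starting centroids are hypothetical, and real libraries usually choose starting centroids randomly.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm on 2-D points; returns (centroids, assignments)."""
    assignments = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        assignments = [
            min(range(len(centroids)),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                              + (p[1] - centroids[j][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                new_centroids.append((sum(p[0] for p in members) / len(members),
                                      sum(p[1] for p in members) / len(members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assignments

# Hypothetical data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(labels)
```

With well-separated groups like these the loop settles after two passes; on real data the result depends on the starting centroids, which is why several restarts are commonly used.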
 Market Segmentation:
◦ Goal: subdivide a market into distinct subsets of customers where
any subset may conceivably be selected as a market target to be
reached with a distinct marketing mix.
◦ Approach:
 Collect different attributes of customers based on their geographical and
lifestyle related information.
 Find clusters of similar customers.
 Measure the clustering quality by observing buying patterns of customers
in same cluster vs. those from different clusters.
 Given a set of records each of which contain some number of
items from a given collection
◦ Produce dependency rules which will predict occurrence of an item
based on occurrences of other items.
TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
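How strongly the five transactions back each discovered rule can be checked by computing support and confidence directly. The sketch below uses the standard definitions (support is the fraction of transactions containing all items in the rule; confidence is that support divided by the support of the left-hand side); the helper names are hypothetical.

```python
# The five transactions from the table above.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    return support(antecedent | consequent) / support(antecedent)

for lhs, rhs in [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]:
    print(sorted(lhs), "-->", sorted(rhs),
          f"support={support(lhs | rhs):.2f}",
          f"confidence={confidence(lhs, rhs):.2f}")
```

Counting by hand gives the same figures: Milk and Coke occur together in 3 of 5 transactions and Milk in 4, so {Milk} --> {Coke} has support 3/5 and confidence 3/4; {Diaper, Milk} --> {Beer} has support 2/5 and confidence 2/3.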
 Frequent patterns (or frequent itemsets)
◦ What items are frequently purchased together in your Walmart?
 Association, correlation vs. causality
◦ A typical association rule
 Diaper → Beer [0.5%, 75%] (support, confidence)
◦ Are strongly associated items also strongly correlated?
 How to mine such patterns and rules efficiently in large datasets?
 Market-basket analysis
◦ Rules are used for sales promotion, shelf management, and
inventory management
 Medical Informatics
◦ Rules are used to find combination of patient symptoms and test
results associated with certain diseases
 An Example Subspace Differential Coexpression Pattern from lung
cancer dataset
Enriched with the TNF/NF-κB signaling pathway,
which is well known to be related to lung cancer
P-value: 1.4 × 10^-5 (6/10 overlap with the pathway)
Three lung cancer datasets [Bhattacharjee et al. 2001], [Stearman et al. 2005], [Su et al. 2007]
Association Analysis - Applications
 Outlier analysis
◦ Outlier: A data object that does not comply with the general behavior of the data.
Detect significant deviations from normal behavior
◦ Noise or exception? ― One person’s garbage could be another person’s treasure
◦ Methods: by-product of clustering or regression analysis
◦ Useful in fraud detection, rare events analysis
 Outlier: A data object that deviates significantly from the normal objects as if it
were generated by a different mechanism
◦ Ex.: Unusual credit card purchase
 Outliers are different from the noise data
◦ Noise is random error or variance in a measured variable
◦ Noise should be removed before outlier detection
 Applications:
◦ Credit card fraud detection
◦ Telecom fraud detection
◦ Customer segmentation
◦ Medical analysis
 Three kinds: global, contextual and collective outliers
 Global outlier (or point anomaly)
◦ An object Og is a global outlier if it deviates significantly from the rest of the data set
◦ Ex. Intrusion detection in computer networks
◦ Issue: Find an appropriate measurement of deviation
 Contextual outlier (or conditional outlier)
◦ An object Oc is a contextual outlier if it deviates significantly based on a selected context
◦ Ex. 80°F in Urbana: outlier? (It depends on whether it is summer or winter.)
◦ Can be viewed as a generalization of local outliers, whose density significantly deviates
from that of their local area
◦ Issue: How to define or formulate a meaningful context?
[Figure: Global outlier]
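One possible "measurement of deviation" for global outliers is the modified z-score, which uses the median and the median absolute deviation (MAD) so that the outlier itself does not distort the baseline. This technique is not named in the slides; it is offered here as a sketch, with the 0.6745 constant and the 3.5 cutoff being the conventional values for this score and the purchase amounts being hypothetical.

```python
from statistics import median

def global_outliers(values, cutoff=3.5):
    """Return the values whose modified z-score exceeds the cutoff."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return []  # no spread to measure deviation against
    return [v for v in values if abs(0.6745 * (v - med) / mad) > cutoff]

# Hypothetical credit card purchase amounts; one is far from the rest.
purchases = [52, 48, 50, 51, 49, 47, 53, 50, 49, 500]
print(global_outliers(purchases))
```

A contextual outlier would need the same kind of test applied within each context (for example, per season) instead of over the whole data set.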
 Collective Outliers
◦ A subset of data objects collectively deviate significantly from the
whole data set, even if the individual data objects may not be
outliers
◦ Applications: E.g., intrusion detection:
 When a number of computers keep sending denial-of-service
packets to each other
[Figure: Collective outlier]
◼ Detection of collective outliers
◼ Consider not only behavior of individual objects, but also that of groups of objects
◼ Need to have the background knowledge on the relationship among data objects, such
as a distance or similarity measure on objects.
◼ A data set may have multiple types of outlier
◼ One object may belong to more than one type of outlier
 Regression is the measure of the average relationship between two or more
variables in terms of original units of data.
 Predict a value of a given continuous valued variable based on the values of
other variables, assuming a linear or nonlinear model of dependency.
 Independent variable: variable which is used to predict the variable of interest
 Dependent variable: variable which we want to predict
 Examples:
◦ Predicting sales amounts of new product based on advertising expenditure.
◦ Predicting wind velocities as a function of temperature, humidity, air
pressure, etc.
◦ Time series prediction of stock market indices.
 Regression analysis provides estimates of values of the
dependent variable (DV) from the values of the independent variable (IV)
by means of a device called regression lines.
 It helps in obtaining a measure of error involved in using the
regression lines as a basis of estimation.
 With the help of regression coefficients, we can calculate the
correlation coefficient.
Regression is the attempt to explain the variation in a dependent variable using the variation in
independent variables.
If the independent variable(s) sufficiently explain the variation in the dependent variable, the
model can be used for prediction.
[Figure: Scatter plots of the dependent variable (y) against the independent variable (x)]
The output of a regression is a function that predicts the dependent variable based upon
values of the independent variables.
Simple regression fits a straight line to the data:
y’ = b0 + b1x ± є
where b0 is the y-intercept, b1 = ∆y/∆x is the slope, and є is the error term.
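The intercept b0 and slope b1 of the straight line can be estimated by ordinary least squares. The sketch below uses the textbook closed-form formulas; the `fit_line` helper and the advertising/sales numbers are hypothetical and chosen so the points lie exactly on a line.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data: advertising expenditure (x) and sales (y).
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
b0, b1 = fit_line(spend, sales)
print("intercept:", b0, "slope:", b1)
print("predicted sales at spend 6:", b0 + b1 * 6)
```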
Logistic Regression is used for a different class of problems known as
classification problems. Here the aim is to predict the group to which the
object under observation belongs. It gives a discrete binary
outcome (0 or 1). A simple example would be whether a person
will vote or not in upcoming elections.
How does it work?
LR measures the relationship between the dependent variable (what we
want to predict) and one or more independent variables by estimating
probabilities using its underlying logistic function.
Making predictions?
The logistic (sigmoid) function produces probabilities, with values ranging
between 0 and 1. These probabilities must then be transformed into binary
values in order to actually make a prediction, which is done by applying
a threshold classifier.
Logistic vs Linear?
Logistic regression gives a discrete outcome, but linear regression gives a
continuous outcome.
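The two steps described above, a sigmoid that yields a probability and a threshold that turns it into 0 or 1, can be sketched as follows. The coefficients `b0` and `b1` are hypothetical (in practice they are estimated from training data), as is the use of age as the single feature, and 0.5 is simply a common threshold choice.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, b0, b1):
    """Probability of the positive class for a single feature x."""
    return sigmoid(b0 + b1 * x)

def predict(x, b0, b1, threshold=0.5):
    """Turn the probability into a 0/1 decision using a threshold."""
    return 1 if predict_proba(x, b0, b1) >= threshold else 0

# Hypothetical coefficients: x is a person's age, output 1 means "will vote".
b0, b1 = -4.0, 0.1
for age in (20, 45, 60):
    print(age, round(predict_proba(age, b0, b1), 3), predict(age, b0, b1))
```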
 Mining Methodology
◦ Mining knowledge in multi-dimensional
space
◦ Data mining: An interdisciplinary effort
◦ Handling noise, uncertainty, and
incompleteness of data
◦ Pattern evaluation and pattern- or
constraint-guided mining
 User Interaction
◦ Interactive mining
◦ Incorporation of background knowledge
◦ Presentation and visualization of data
mining results
 Efficiency and Scalability
◦ Efficiency and scalability of data mining
algorithms
◦ Parallel, distributed, stream, and
incremental mining methods
 Diversity of data types
◦ Handling complex types of data
◦ Mining dynamic, networked, and global
data repositories
 Data mining and society
◦ Social impacts of data mining
◦ Privacy-preserving data mining
 Skill #1: Statistics, probability, hypothesis testing, multivariate analysis
 Skill #2: Computer science, data structures, algorithms, parallel computing,
scripting languages (R, Python and Perl), cloud computing
 Skill #3: Correlation, modeling exercises, business understanding and the
ability to assess which models are feasible
So what are the skills needed for a data scientist?
 Identify the problem or opportunity.
◦ Understanding customer relationships and the firm's goals is more
crucial than understanding the technology.
◦ Always build bilateral trust and intimacy with consumers.
 Prepare the data
◦ To overcome hidden agendas in interpreting data, have a person in the organisation
act as a bridge between statisticians and the concerned heads of department (HODs).
 Transform the data into meaningful information
◦ Firms need to establish clear-cut objectives to limit what needs to be found.
◦ Develop a standardized data recording system.
 Validate the model on different samples
 Fine-tune the model
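One common way to "validate the model on different samples" is k-fold cross-validation, where the data is split into k parts and each part takes a turn as the held-out test sample. The slides do not name a specific scheme, so this is an assumed illustration; `k_fold_indices` is a hypothetical helper, and it does not shuffle, which real use would normally require.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k roughly equal folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# Each record is used for testing exactly once across the folds.
for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)
```

A model would be fitted on each `train` split and scored on the matching `test` split; a large spread between fold scores suggests the model needs fine-tuning.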
 The Explosive Growth of Data: from terabytes to petabytes
◦ Data collection and data availability
 Automated data collection tools, database systems, Web, computerized
society
◦ Major sources of abundant data
 Business: Web, e-commerce, transactions, stocks, …
 Science: Remote sensing, bioinformatics, scientific simulation, …
 Society and everyone: news, digital cameras, YouTube
 We are drowning in data, but starving for knowledge!
 “Necessity is the mother of invention”: data mining, the automated analysis of
massive data sets
 There is ready availability of large amounts of data with
exponential growth of data,
 Sharp decline in cost of storage, unlimited level of computing
power and bandwidth
 Fact-based decisions have resulted in emergence of self-service
analytics and BI
 Rapid rise in AI investments and Advanced analytics techniques
helped analysts to have access to sophisticated algorithms
 Analytics helps profitability in cross-selling and up-selling to
current customers
 Helps in reducing costs
◦ early payments to suppliers to take advantage of discounts,
◦ Retain cash as long as possible
◦ Use of metrics to find optimal balance
 Helps in detection and prevention of frauds
◦ Sophisticated forensic analytics to find irregularities in financial
transactions
 Helps in extrapolating current trends
Government of India (NITI Aayog) has released the draft National Strategy
on Artificial Intelligence.
Key features are setting up of research centers to foster breakthroughs,
Intellectual Property (IP) protection and continuous re-skilling to keep
talent up-to-date
http://niti.gov.in/writereaddata/files/document_publication/NationalStrategy-for-AI-Discussion-Paper.pdf
Major focus of use of analytics in areas like:
 Healthcare
 Agriculture
 Education
 Smart cities and infrastructure
 Transportation
Machine learning uses data to find patterns and generate business insights.
 User churn prediction - Engaging a customer at the right time can help reduce churn if we know which
customers are about to churn
 Recommendation engine - Up-selling and cross-selling based on machine learning basket analytics
 Customer segmentation - With statistical segmentation, users can be grouped into specific types to
better understand your customer base
 Marketing campaign optimisation - To better manage the marketing budget, one needs to analyse which
campaigns are doing well and why
 Product inventory optimisation - With demand prediction, a business can be lean enough to reduce storage
and waiting costs for various products
 Dynamic deal scoring - Helps to price smartly
 Crop and Soil Monitoring – Companies are leveraging sensors and various IoT-based technologies
to monitor crop and soil health, using deep learning for image analysis, agricultural product
grading and alerts on crop infestation.
 Predictive Agricultural Analytics – Various AI and machine learning tools are being used to predict
the optimal time to sow seeds, get alerts on risks from pest attacks, and more.
 Supply Chain Efficiencies – Companies are using real-time data analytics on data-streams coming
from multiple sources to build an efficient and smart supply chain.
 Image Recognition for Soil Science - Use of AI and machine learning to predict pests and diseases,
forecast commodity prices for better price realization and recommend products to farmers.
 Minimum Support Price estimation - Use of AI and machine learning to estimate the MSP for various
crops in real time.
Price optimization allows retailers to consider factors
such as:
•Yield prediction using ML
•Competition
•Weather (IMD), Satellite imagery
•Season
•Special events / holidays
•Macroeconomic variables, farm machinery,
•Operating costs, Input cost - seed, Fertilizer,
pesticides
•Warehouse information (FCI), cold storage
to determine:
•The initial price
•The best price
•The discount price
•The promotional price
•MSP for major crops
Using Dimension reduction, Naïve Bayes Algorithm
which is a Machine Learning Classification technique
e-National Agriculture Market
Soil Health Card
mKisan Portal
Multivariate agricultural commodity MSP price
forecasting model
Directorate of Marketing & Inspection, Ministry of
Agriculture
(AI Bots to help evaluate live player performance using IoT)
 Facebook can predict break-ups?
 http://www.huffingtonpost.com/2014/02/14/facebook-relationship-study_n_4784291.html
https://www.theguardian.com/technology/2016/jul/05/google-deepmind-nhs-machine-learning-blindness
 http://karpathy.github.io/2015/10/25/selfie/
 http://www.wired.co.uk/article/crimeradar-rio-app-predict-crime
https://interestingengineering.com/ai-camera-mistakes-referees-bald-head-for-ball-ruining-game-for-viewers
AI robot cameras, which were trained to follow the ball, kept mistaking a referee's bald head for
the ball. Hilarious ☺
~1 billion cameras worldwide by 2020 → 30 billion inferences/sec
Tesla P40: 2,500 inferences/sec @ 720p → an AI city needs ~10M P40 servers
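The figures above can be reproduced with simple arithmetic. The sketch assumes one inference per video frame at 30 frames per second, which is what the jump from 1 billion cameras to 30 billion inferences/sec implies; the slide does not state the frame rate explicitly.

```python
cameras = 1_000_000_000        # ~1 billion cameras (from the slide)
frames_per_second = 30         # assumption: one inference per frame at 30 fps
per_gpu_throughput = 2_500     # Tesla P40 inferences/sec at 720p (from the slide)

total_inferences = cameras * frames_per_second
gpus_needed = total_inferences / per_gpu_throughput

print(f"{total_inferences:,} inferences/sec")
print(f"{gpus_needed:,.0f} P40-class GPUs")
```

Under these assumptions the result is 12 million GPUs, the same order of magnitude as the slide's rounded figure of about 10 million.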
Real time Surveillance
1. Real-time Surveillance Zone: real-time display of the monitoring screen.
2. Warning/Comparison Zone: real-time display of the currently pictured people vs. the status of people under surveillance.
3. Pictured Display Zone: real-time display of the pictured photos.
4. Menu Zone: capability to complete real-time surveillance, picture/inquiry, police notification/inquiry and database management.
Scene Parsing
Crowd Density Analysis
Crowd Tracking
Search by Face
Face Recognition
License Recognition
Pedestrian Detection
People Counting
Face Alignment
Face Detection
Vehicle Model Recognition
Vehicle Detection
AI analytics to detect COVID violations in high-traffic public places
Camera based AI system detects:
•Intrusive monitoring of temperature of people entering any premises
•Detects through its AI-based algorithms whether a person is wearing a face cover (mask)
•Detects social distancing between people
MONITORING SUSPICIOUS BEHAVIOUR
Use of Data Science by Zomato
https://analyticsindiamag.com/the-amazing-way-zomato-uses-data-science-for-success/
AIM:
Driving commercial and operational efficiencies such as for logistics
optimisation, call centre/driver fleet capacity planning, delivery time
prediction, ad delivery, supply prioritization which are some of the key
areas
Process:
The Zomato team uses a Scala pipeline which ingests data from S3 and performs
the ETL operations needed for machine learning algorithms. “Most of the
machine learning modelling happens in Python and leverages Scala-transformed
historic raw data as input. The model once finalised is then
set up as a service in production, deployed on dedicated servers as
dockerized REST APIs using Elastic Beanstalk/ECS.”
 Example – “Havells Adonia” in @HavellsIndia
Areas where AI is most likely to be exploited
• Physical
• Remote-controlled car crashes - The biggest concern involves AI being used to
carry out physical attacks on humans, such as hacking into self-driving cars to
cause major collisions.
• Digital
• Sophisticated phishing - In the future, attempts to access sensitive and
personal information from an individual could be carried out almost
entirely by AI. “These attacks may use AI systems to complete certain tasks more
successfully than any human could.”
• Political
• Manipulating public opinion - Fake news and fake videos generated by bots
and AI could have a big impact on public opinion, disrupting all layers of
society, from politics to media. The use of social media bots spreading fake
news was already a reality during the 2016 US presidential campaign.
AI could threaten our world
https://analyticsindiamag.com/how-fintech-firm-xpay-life-is-taking-ai-ml-to-rural-india/
https://yourstory.com/2019/12/startup-rural-india-digital-bill-payments-xpay-life-fintech
http://www.enterpriseitworld.com/guest-talk/xiphias-holds-the-promise-of-transactional-transparency/
https://www.linkedin.com/pulse/post-covid-19-challenges-opportunities-indian-industry-misra/
Data Science Everywhere
INTERNET & CLOUD
Image Classification
Speech Recognition
Language Translation
Language Processing
Sentiment Analysis
Recommendation
MEDIA &
ENTERTAINMENT
Video Captioning
Video Search
Real Time Translation
AUTONOMOUS
MACHINES
Pedestrian Detection
Lane Tracking
Recognize Traffic Sign
SECURITY & DEFENSE
Face Detection
Video Surveillance
Satellite Imagery
MEDICINE & BIOLOGY
Cancer Cell Detection
Diabetic Grading
Drug Discovery
COVID-19 detection
BOHITESH MISRA, PMP
CO-FOUNDER, XIPHIAS XPAY LIFE PVT LTD
BOHITESH.MISRA@GMAIL.COM
#CIO200 #ITNEXT100 #EMINENT CIOS OF INDIA 2019
@bohiteshmisra
/in/bohitesh

Use of data science for startups_Sept 2021

  • 8. IoT: The Internet of Things connects all manner of end-points, a treasure trove of data. Big Data: Networks and device proliferation enable access to a massive and growing amount of traditionally siloed information. Analytics: Analytics and business intelligence tools empower decision makers by extracting and presenting meaningful information in real time.
  • 10. There has been enormous data growth in both commercial and scientific databases due to advances in data generation and collection technologies. The new mantra: gather whatever data you can, whenever and wherever possible. The expectation: gathered data will have value, either for the purpose for which it was collected or for a purpose not yet envisioned. Examples: computational simulations, social networking (Twitter), sensor networks, traffic patterns, cyber security, e-commerce.
  • 11. Big Data is a phrase used to mean a volume of structured and unstructured data so massive that it is difficult to process using traditional database and software techniques. An example of big data might be petabytes (1 petabyte = 1,024 terabytes) or exabytes (1 exabyte = 1,024 petabytes) of data consisting of billions to trillions of records about millions of people, all from different sources (e.g. web, sales, customer contact centers, social media, mobile data and e-commerce). A single jet engine generates 10+ terabytes of data in 30 minutes of flight time. With many thousands of flights per day, the data generated reaches many petabytes.
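The jet engine figure in slide 11 can be turned into a rough back-of-the-envelope estimate. The sketch below is only illustrative: the number of flights per day, engines per flight and hours per flight are assumed values that do not come from the slides; only the 10 TB per 30 minutes rate and the 1,024 TB per PB conversion do.

```python
TB_PER_PB = 1024  # 1 petabyte = 1,024 terabytes, as stated in slide 11


def daily_engine_data_pb(flights_per_day, engines_per_flight,
                         hours_per_flight, tb_per_half_hour=10):
    """Estimate the engine data generated per day, in petabytes."""
    half_hours = hours_per_flight * 2
    total_tb = (flights_per_day * engines_per_flight
                * half_hours * tb_per_half_hour)
    return total_tb / TB_PER_PB


# Assumed inputs (hypothetical): 5,000 flights/day, 2 engines, 2 hours each.
estimate = daily_engine_data_pb(flights_per_day=5000,
                                engines_per_flight=2,
                                hours_per_flight=2)
print(f"{estimate:.1f} PB per day")
```

Even with these modest assumed inputs the total lands in the hundreds of petabytes per day, which is consistent with the slide's claim of "many petabytes".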
  • 12. Applications of Big Data analytics: homeland security; smarter healthcare (integrated and smart patient care systems and processes); retail and multi-channel sales (highly personalized customer experience across channels and devices); telecom; manufacturing (intelligent interconnectivity across the enterprise for enhanced control, speed and efficiency); traffic control; trading analytics; search quality; log analysis; finance and banking (seamless customer experience across all banking channels).
  • 13. Lots of data is being collected and warehoused: web data (Yahoo has petabytes of web data; Facebook has billions of active users), purchases at department and grocery stores and in e-commerce (Amazon handles millions of visits per day), and bank and credit card transactions. Computers have become cheaper and more powerful. Competitive pressure is strong: provide better, customized services for an edge (e.g. CRM).
  • 14. Data is collected and stored at enormous speeds: remote sensors on satellites (NASA archives petabytes of earth science data per year), telescopes scanning the skies (sky survey data), high-throughput biological data, and scientific simulations (terabytes of data generated in a few hours). Data mining helps scientists in the automated analysis of massive datasets and in hypothesis formation. Figure labels: MRI data from the brain, sky survey data, surface temperature of the Earth.
  • 15. Improving health care and reducing costs; finding alternative/green energy sources; predicting the impact of climate change; reducing hunger and poverty by increasing agricultural production.
  • 16. Data mining is the extraction of knowledge from large volumes of structured or unstructured data. It is a potential solution to a big problem facing many firms: an overabundance of data and a relative dearth of staff, technology and time to transform numbers and notes into meaningful information about existing and prospective customers. Alternative names: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence. AI refers to the ability of machines to perform cognitive tasks such as thinking, perceiving, learning, problem solving and decision making.
  • 17. Science: astronomy, bioinformatics, drug discovery. Business: CRM (customer relationship management), fraud detection, e-commerce, manufacturing, sports/entertainment, telecom, targeted marketing, health care, warehouses. Web: search engines, advertising, web and text mining. Government: surveillance, crime detection, profiling tax cheaters.
  • 18. Data Mining (diagram labels): Machine Learning, Statistics, Applications, Algorithm, Pattern Recognition, High-Performance Computing, Visualization, Database Technology.
  • 20. Supervised Learning, Unsupervised Learning, Reinforcement Learning.
  • 21. Supervised Learning: In supervised learning we train the machine using well-labelled data, meaning each training example is already tagged with the correct answer. The algorithm analyses this training set and is then given new examples, for which it should produce the correct outcome based on what it learned from the labelled data. Suppose you are given a basket filled with different kinds of fruit. The first step is to train the machine on each fruit in turn: if the object is round, has a depression at the top and is red, it is labelled Apple; if the object is a long curving cylinder and is green-yellow, it is labelled Banana. Now suppose that after training you take a new fruit from the basket, say a banana, and ask the machine to identify it. Because the machine has already learned from the previous data, it classifies the fruit by its shape and color, confirms the name as Banana and puts it in the Banana category.
  • 22. Bohitesh Misra ▪ Types: • Regression • Logistic Regression • Classification • Naïve Bayes Classifiers • Decision Trees • Support Vector Machine ▪ Advantages: • Supervised learning lets you collect data and produce outputs based on previous experience. • Helps to optimize performance criteria with the help of experience. • Helps to solve various types of real-world computation problems. ▪ Disadvantages: • Classifying big data can be challenging. • Training needs a lot of computation and therefore a lot of time.
  • 23. Bohitesh Misra ▪ Unsupervised learning trains the machine on information that is neither classified nor labelled and lets the algorithm act on that information without guidance. ▪ Here the task of the machine is to group unsorted information according to similarities, patterns and differences without any prior training on the data. ▪ For instance, suppose it is given an image containing both dogs and cats, which it has never seen before. ▪ The machine has no idea about the features of dogs and cats, so it cannot label them as dogs and cats, but it can group them according to their similarities, patterns and differences. ▪ It allows the model to work on its own to discover patterns and information that were previously undetected. It mainly deals with unlabelled data.
  • 24. Bohitesh Misra ▪ Unsupervised learning is classified into two categories of algorithms: • Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behaviour. • Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people that buy X also tend to buy Y.
  • 25. Bohitesh Misra 25  Classification: predicting an item class  Clustering: finding clusters in data  Associations: e.g. A & B & C occur frequently  Visualization: to facilitate human discovery  Summarization: describing a group  Deviation Detection: finding changes  Estimation: predicting a continuous value  Link Analysis: finding relationships
  • 26. Bohitesh Misra Data Mining Tasks
      Tid  Refund  Marital Status  Taxable Income  Cheat
      1    Yes     Single          125K            No
      2    No      Married         100K            No
      3    No      Single          70K             No
      4    Yes     Married         120K            No
      5    No      Divorced        95K             Yes
      6    No      Married         60K             No
      7    Yes     Divorced        220K            No
      8    No      Single          85K             Yes
      9    No      Married         75K             No
      10   No      Single          90K             Yes
      11   No      Married         60K             No
      12   Yes     Divorced        220K            No
      13   No      Single          85K             Yes
      14   No      Married         75K             No
      15   No      Single          90K             Yes
  • 27. Bohitesh Misra Predictive Modeling: Classification ▪ Find a model for the class attribute as a function of the values of the other attributes.
      Tid  Employed  Level of Education  # years at present address  Credit Worthy (class)
      1    Yes       Graduate            5                           Yes
      2    Yes       High School         2                           No
      3    No        Undergrad           1                           No
      4    Yes       High School         10                          Yes
      …    …         …                   …                           …
      Model for predicting credit worthiness (decision tree):
      Employed = No: No
      Employed = Yes: split on Education
          Education = Graduate: Number of years > 3 yr: Yes; < 3 yr: No
          Education in {High school, Undergrad}: Number of years > 7 yrs: Yes; < 7 yrs: No
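Read as code, the credit-worthiness decision tree on this slide is a short chain of conditions. The sketch below is one possible reading of it: the function name `credit_worthy` is hypothetical, and because the slide only gives "> 3 yr / < 3 yr" and "> 7 yrs / < 7 yrs", treating a value exactly on the boundary as "No" is an assumption.

```python
def credit_worthy(employed, education, years_at_address):
    """Walk the decision tree from the slide and return 'Yes' or 'No'."""
    if not employed:
        return "No"
    # Graduates are split at 3 years, all other education levels at 7 years.
    # Assumption: a value exactly on the boundary is classified as 'No'.
    threshold = 3 if education == "Graduate" else 7
    return "Yes" if years_at_address > threshold else "No"


# The four labelled records shown on the slide: the tree reproduces each label.
records = [
    (True, "Graduate", 5, "Yes"),
    (True, "High School", 2, "No"),
    (False, "Undergrad", 1, "No"),
    (True, "High School", 10, "Yes"),
]
for employed, education, years, label in records:
    predicted = credit_worthy(employed, education, years)
    print(employed, education, years, "predicted:", predicted, "actual:", label)
```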
  • 28. Bohitesh Misra ▪ Classification and label prediction ◦ Construct models (functions) based on some training examples ◦ Describe and distinguish classes or concepts for future prediction - E.g., classify countries based on climate, or classify cars based on gas mileage ◦ Predict some unknown class labels ▪ Typical methods ◦ Decision trees, naïve Bayesian classification, support vector machines, neural networks, rule-based classification, pattern-based classification, logistic regression ▪ Typical applications: ◦ Credit card fraud detection, direct marketing, classifying stars, diseases, web pages
  • 29. Bohitesh Misra ▪ Classifying credit card transactions as legitimate or fraudulent ▪ Classifying land covers (water bodies, urban areas, forests, etc.) using satellite data ▪ Categorizing news stories as finance, weather, entertainment, sports, etc ▪ Identifying intruders in the cyberspace ▪ Predicting tumor cells as benign or malignant 29
  • 30. Bohitesh Misra ◦ Goal: Predict fraudulent cases in credit card transactions. ◦ Approach: - Use credit card transactions and the information on the account holder as attributes. - When does a customer buy, what does he buy, how often does he pay on time, etc. - Label past transactions as fraud or fair transactions. This forms the class attribute. - Learn a model for the class of the transactions. - Use this model to detect fraud by observing credit card transactions on an account.
  • 31. Bohitesh Misra  Churn prediction for telephone customers ◦ Goal: To predict whether a customer is likely to be lost to a competitor. ◦ Approach:  Use detailed record of transactions with each of the past and present customers, to find attributes.  How often the customer calls, where he calls, what time-of-the day he calls most, his financial status, marital status, etc.  Label the customers as loyal or disloyal.  Find a model for loyalty. 31
  • 32. Bohitesh Misra Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups 32 Inter-cluster distances are maximized Intra-cluster distances are minimized Clustering
  • 33. Bohitesh Misra  Unsupervised learning (i.e., Class label is unknown)  Group data to form new categories (i.e., clusters), e.g., cluster houses to find distribution patterns  Principle: Maximizing intra-class similarity & minimizing interclass similarity 33
  • 34. Bohitesh Misra ▪ K-means clustering ◦ Aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean ▪ Hierarchical clustering ◦ Produces a set of nested clusters organized as a hierarchical tree ◦ Can be visualized as a dendrogram [Figure: K-means cluster assignments on x-y scatter plots, Iterations 1 to 6] [Figure: Nested clusters of six points and the corresponding dendrogram]
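The K-means idea of assigning each observation to the cluster with the nearest mean, then recomputing the means, can be sketched as follows. This is a minimal illustration and not a production implementation: the `k_means` function, the sample points and the choice to seed the centroids with the first k points are all assumptions made for the example.

```python
import math


def k_means(points, k, iterations=10):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    centroids = list(points[:k])  # assumption: seed with the first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, clusters


# Hypothetical 2-D observations forming two visible groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
centroids, clusters = k_means(points, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

Each pass through the loop corresponds to one "Iteration" panel in the figure: the assignments shift until the centroids stop moving.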
  • 35. Bohitesh Misra  Market Segmentation: ◦ Goal: subdivide a market into distinct subsets of customers where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix. ◦ Approach:  Collect different attributes of customers based on their geographical and lifestyle related information.  Find clusters of similar customers.  Measure the clustering quality by observing buying patterns of customers in same cluster vs. those from different clusters. 35
  • 36. Bohitesh Misra ▪ Given a set of records, each of which contains some number of items from a given collection ◦ Produce dependency rules which will predict the occurrence of an item based on occurrences of other items.
      TID  Items
      1    Bread, Coke, Milk
      2    Beer, Bread
      3    Beer, Coke, Diaper, Milk
      4    Beer, Bread, Diaper, Milk
      5    Coke, Diaper, Milk
      Rules discovered: {Milk} --> {Coke}; {Diaper, Milk} --> {Beer}
  • 37. Bohitesh Misra  Frequent patterns (or frequent itemsets) ◦ What items are frequently purchased together in your Walmart?  Association, correlation vs. causality ◦ A typical association rule  Diaper → Beer [0.5%, 75%] (support, confidence) ◦ Are strongly associated items also strongly correlated?  How to mine such patterns and rules efficiently in large datasets? 37
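Support and confidence, the two numbers attached to an association rule, can be computed directly from a transaction list. The sketch below uses the five market-basket transactions from slide 36; the helper names `support` and `confidence` are illustrative, and it uses the common definitions (support = fraction of transactions containing all items of the rule, confidence = support of the whole rule divided by support of its left-hand side).

```python
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    return support(antecedent | consequent) / support(antecedent)


for lhs, rhs in [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]:
    print(sorted(lhs), "->", sorted(rhs),
          f"support={support(lhs | rhs):.2f}",
          f"confidence={confidence(lhs, rhs):.2f}")
```

For {Milk} --> {Coke}, three of the five baskets contain both items and four contain Milk, so support is 0.60 and confidence is 0.75.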
  • 38. Bohitesh Misra  Market-basket analysis ◦ Rules are used for sales promotion, shelf management, and inventory management  Medical Informatics ◦ Rules are used to find combination of patient symptoms and test results associated with certain diseases 38
  • 39. Bohitesh Misra Association Analysis - Applications ▪ An example subspace differential coexpression pattern from a lung cancer dataset ▪ Enriched with the TNF/NF-κB signaling pathway, which is well known to be related to lung cancer ▪ P-value: 1.4 × 10^-5 (6/10 overlap with the pathway) ▪ Three lung cancer datasets: [Bhattacharjee et al. 2001], [Stearman et al. 2005], [Su et al. 2007]
  • 40. Bohitesh Misra ▪ Outlier analysis ◦ Outlier: A data object that does not comply with the general behavior of the data. Detect significant deviations from normal behavior ◦ Noise or exception? One person’s garbage could be another person’s treasure ◦ Methods: by-product of clustering or regression analysis ◦ Useful in fraud detection, rare events analysis
  • 41. Bohitesh Misra 41  Outlier: A data object that deviates significantly from the normal objects as if it were generated by a different mechanism ◦ Ex.: Unusual credit card purchase  Outliers are different from the noise data ◦ Noise is random error or variance in a measured variable ◦ Noise should be removed before outlier detection  Applications: ◦ Credit card fraud detection ◦ Telecom fraud detection ◦ Customer segmentation ◦ Medical analysis
  • 42. Bohitesh Misra ▪ Three kinds: global, contextual and collective outliers ▪ Global outlier (or point anomaly) ◦ An object is a global outlier (Og) if it significantly deviates from the rest of the data set ◦ Ex. Intrusion detection in computer networks ◦ Issue: Find an appropriate measurement of deviation ▪ Contextual outlier (or conditional outlier) ◦ An object is a contextual outlier (Oc) if it deviates significantly based on a selected context ◦ Ex. 80°F in Urbana: outlier? (depends on whether it is summer or winter) ◦ Can be viewed as a generalization of local outliers, whose density significantly deviates from that of their local area ◦ Issue: How to define or formulate a meaningful context? [Figure: Global outlier]
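One simple "measurement of deviation" for global outliers is the z-score: how many standard deviations a value lies from the mean. The sketch below is only one of many possible choices; the function name, the threshold of 2.0 and the purchase amounts are assumptions made for illustration.

```python
import statistics


def global_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical credit card purchase amounts with one unusual purchase.
purchases = [42, 38, 45, 40, 41, 39, 43, 950]
print(global_outliers(purchases))  # [950]
```

A contextual outlier would need the same check applied within a context, for example computing the mean and standard deviation per season before scoring a temperature reading.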
  • 43. Bohitesh Misra ▪ Collective Outliers ◦ A subset of data objects collectively deviates significantly from the whole data set, even if the individual data objects may not be outliers ◦ Applications: e.g., intrusion detection: when a number of computers keep sending denial-of-service packets to each other ◼ Detection of collective outliers ◼ Consider not only the behavior of individual objects, but also that of groups of objects ◼ Need background knowledge on the relationship among data objects, such as a distance or similarity measure on objects ◼ A data set may have multiple types of outlier ◼ One object may belong to more than one type of outlier [Figure: Collective outlier]
  • 44. Bohitesh Misra ▪ Regression is the measure of the average relationship between two or more variables in terms of the original units of the data. ▪ It predicts the value of a continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency. ▪ Independent variable: a variable used to predict the variable of interest ▪ Dependent variable: the variable we want to predict ▪ Examples: ◦ Predicting sales amounts of a new product based on advertising expenditure. ◦ Predicting wind velocities as a function of temperature, humidity, air pressure, etc. ◦ Time series prediction of stock market indices.
  • 45. Bohitesh Misra ▪ Regression analysis provides estimates of the values of the dependent variable (DV) from the values of the independent variable (IV) by means of regression lines. ▪ It helps in obtaining a measure of the error involved in using the regression lines as a basis for estimation. ▪ With the help of the regression coefficients, we can calculate the correlation coefficient: its square equals the product of the two regression coefficients (of y on x and of x on y).
  • 46. Regression is the attempt to explain the variation in a dependent variable using the variation in independent variables. If the independent variable(s) sufficiently explain the variation in the dependent variable, the model can be used for prediction. [Figure: scatter plot of the dependent variable (y) against the independent variable (x)]
  • 47. The output of a regression is a function that predicts the dependent variable based upon values of the independent variables. Simple regression fits a straight line to the data: $y' = b_0 + b_1 x \pm \epsilon$, where $b_0$ is the y-intercept, $b_1 = \Delta y / \Delta x$ is the slope and $\epsilon$ is the error term. [Figure: fitted line, dependent variable (y) against independent variable (x)]
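Fitting the straight line of simple regression comes down to two formulas: the slope b1 is the covariance of x and y divided by the variance of x, and the intercept b0 makes the line pass through the point of means. The sketch below uses ordinary least squares; the function name `fit_line` and the advertising and sales figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for the line y' = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx              # slope: change in y per unit change in x
    b0 = mean_y - b1 * mean_x   # intercept: line passes through the means
    return b0, b1


ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical advertising expenditure
sales = [3.1, 4.9, 7.2, 8.8, 11.0]     # hypothetical sales amounts
b0, b1 = fit_line(ad_spend, sales)

# The residuals are the errors left over after the line is fitted.
residuals = [y - (b0 + b1 * x) for x, y in zip(ad_spend, sales)]
print(f"b0={b0:.2f}, b1={b1:.2f}")
print("prediction at x=6:", round(b0 + b1 * 6.0, 2))
```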
  • 48. Bohitesh Misra Logistic Regression is used for a different class of problems known as classification problems. Here the aim is to predict the group to which the object under observation belongs. It produces a probability between 0 and 1, which is then mapped to a discrete binary outcome (0 or 1). A simple example would be whether a person will vote or not in upcoming elections. How does it work? LR measures the relationship between the dependent variable (what we want to predict) and one or more independent variables by estimating probabilities using its underlying logistic function. Making predictions? These probabilities must then be transformed into binary values in order to actually make a prediction. The logistic function, also called the sigmoid function, produces the probabilities, and its values range between 0 and 1. We can transform them into 0 or 1 using a threshold classifier. Logistic vs Linear? Logistic regression gives a discrete outcome, but linear regression gives a continuous outcome.
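The two stages of a logistic regression prediction, a sigmoid that yields a probability and a threshold that turns it into 0 or 1, can be sketched as below. The weights, bias and feature encoding for the voting example are hypothetical values chosen for illustration; in practice they would be learned from labelled data.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one observation."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def classify(probability, threshold=0.5):
    """Threshold classifier: turn a probability into a 0/1 decision."""
    return 1 if probability >= threshold else 0


# Hypothetical model: features are (age in decades, voted last time: 0 or 1).
weights = [0.3, 1.5]
bias = -2.0
voter = [4.0, 1.0]

p = predict_proba(voter, weights, bias)
print(f"P(will vote) = {p:.3f}, predicted class = {classify(p)}")
```

Moving the threshold away from 0.5 trades one kind of error for the other, which is a design choice that depends on the cost of each mistake.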
  • 49. Bohitesh Misra  Mining Methodology ◦ Mining knowledge in multi-dimensional space ◦ Data mining: An interdisciplinary effort ◦ Handling noise, uncertainty, and incompleteness of data ◦ Pattern evaluation and pattern- or constraint-guided mining  User Interaction ◦ Interactive mining ◦ Incorporation of background knowledge ◦ Presentation and visualization of data mining results 49  Efficiency and Scalability ◦ Efficiency and scalability of data mining algorithms ◦ Parallel, distributed, stream, and incremental mining methods  Diversity of data types ◦ Handling complex types of data ◦ Mining dynamic, networked, and global data repositories  Data mining and society ◦ Social impacts of data mining ◦ Privacy-preserving data mining
  • 54. Bohitesh Misra  Skill #1 : Statistics, Probability, Hypothesis testing, multivariate analysis  Skill #2 : Computer science, data structures, algorithms, parallel computing, scripting languages-R, Python and Perl, Cloud computing  Skill #3: Correlation, Modeling exercises, Business Understanding and ability to assess which models are feasible So what are the skills needed for data scientist?
  • 56. Bohitesh Misra ▪ Identify the problem or opportunity. ◦ Understanding the customer relationship and the firm's goals is more crucial than understanding the technology. ◦ Always build bilateral trust and intimacy with consumers. ▪ Prepare the data ◦ To overcome hidden agendas in interpreting data, have someone in the organisation act as a bridge between the statisticians and the concerned heads of department (HODs). ▪ Transform the data into meaningful information ◦ Firms need to establish clear-cut objectives to limit what needs to be found. ◦ Develop a standardized data recording system. ▪ Validate the model on different samples ▪ Fine-tune the model
  • 61. Bohitesh Misra ▪ The Explosive Growth of Data: from terabytes to petabytes ◦ Data collection and data availability - Automated data collection tools, database systems, Web, computerized society ◦ Major sources of abundant data - Business: Web, e-commerce, transactions, stocks, … - Science: Remote sensing, bioinformatics, scientific simulation, … - Society and everyone: news, digital cameras, YouTube ▪ We are drowning in data, but starving for knowledge! ▪ “Necessity is the mother of invention”: data mining is the automated analysis of massive data sets
  • 62. Bohitesh Misra ▪ Large amounts of data are readily available, and data volumes are growing exponentially ▪ Sharp decline in the cost of storage, with abundant computing power and bandwidth ▪ Fact-based decisions have resulted in the emergence of self-service analytics and BI ▪ The rapid rise in AI investments and advanced analytics techniques has given analysts access to sophisticated algorithms
  • 63. Bohitesh Misra ▪ Analytics helps profitability in cross-selling and up-selling to current customers ▪ Helps in reducing costs ◦ Early payments to suppliers to take advantage of discounts ◦ Retain cash as long as possible ◦ Use of metrics to find the optimal balance ▪ Helps in detection and prevention of frauds ◦ Sophisticated forensic analytics to find irregularities in financial transactions ▪ Helps in extrapolating current trends
  • 64. Bohitesh Misra Government of India (NITI Aayog) has released the draft National Strategy on Artificial Intelligence. Key features are the setting up of research centers to foster breakthroughs, Intellectual Property (IP) protection and continuous re-skilling to keep talent up to date. http://niti.gov.in/writereaddata/files/document_publication/NationalStrategy-for-AI-Discussion-Paper.pdf Major focus on the use of analytics in areas like: ▪ Healthcare ▪ Agriculture ▪ Education ▪ Smart cities and infrastructure ▪ Transportation
  • 66. Bohitesh Misra Machine learning uses data to find patterns and generate business insights. ▪ User churn prediction - Knowing which specific customers are about to churn lets us engage them at the right time and reduce churn ▪ Recommendation engine - Up-selling and cross-selling based on machine-learning basket analytics ▪ Customer segmentation - Statistical segmentation groups users into specific types to give a better understanding of your customer base ▪ Marketing campaign optimisation - To manage the marketing budget better, one needs to analyse which campaigns are doing well and why ▪ Product inventory optimisation - With demand prediction, a business can stay lean and reduce storage and waiting costs for various products ▪ Dynamic deal scoring - Helps to price smartly
  • 69. Bohitesh Misra ▪ Crop and Soil Monitoring - Companies are leveraging sensors and various IoT-based technologies to monitor crop and soil health, using deep learning for image analysis, agricultural product grading and alerts on crop infestation. ▪ Predictive Agricultural Analytics - Various AI and machine learning tools are being used to predict the optimal time to sow seeds, get alerts on risks from pest attacks, and more. ▪ Supply Chain Efficiencies - Companies are using real-time data analytics on data streams coming from multiple sources to build an efficient and smart supply chain. ▪ Image Recognition for Soil Science - Use of AI and machine learning to predict pests and disease, forecast commodity prices for better price realization and recommend products to farmers. ▪ Minimum Support Price estimation - Use of AI and machine learning to estimate the MSP for various crops in real time.
  • 70. Bohitesh Misra Price optimization allows retailers to consider factors such as: • Yield prediction using ML • Competition • Weather (IMD), satellite imagery • Season • Special events / holidays • Macroeconomic variables, farm machinery • Operating costs, input costs (seed, fertilizer, pesticides) • Warehouse information (FCI), cold storage; to determine: • The initial price • The best price • The discount price • The promotional price • MSP for major crops. Method: dimension reduction and the Naïve Bayes algorithm, a machine learning classification technique. [Diagram: multivariate agricultural commodity MSP price forecasting model, with labels e-National Agriculture Market, Soil Health Card, mKisan Portal, and Directorate of Marketing & Inspection, Ministry of Agriculture]
  • 71. Bohitesh Misra (AI Bots to help evaluate live player performance using IoT) 73
  • 72. Bohitesh Misra ▪ Facebook can predict break-ups? ▪ http://www.huffingtonpost.com/2014/02/14/facebook-relationship-study_n_4784291.html
  • 76. Bohitesh Misra https://interestingengineering.com/ai-camera-mistakes-referees-bald-head-for-ball-ruining-game-for-viewers AI robot cameras, which are trained to follow the ball, kept mistaking a referee's bald head for the ball. Hilarious ☺
  • 77. Bohitesh Misra 1B cameras by 2020: ~1 billion cameras worldwide by 2020 → 30 billion inferences/sec; Tesla P40: 2,500 inferences/sec @ 720p → AI City needs ~10M P40 servers
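The sizing on this slide is a back-of-the-envelope calculation that can be checked in a few lines. The camera count and the P40 throughput come from the slide; the rate of 30 inferences per camera per second (one per video frame) is an assumption chosen because it reproduces the slide's figure of 30 billion inferences/sec.

```python
cameras = 1_000_000_000            # ~1 billion cameras (from the slide)
frames_per_second = 30             # assumption: one inference per frame at 30 fps
p40_inferences_per_second = 2_500  # Tesla P40 at 720p (from the slide)

total_inferences = cameras * frames_per_second
gpus_needed = total_inferences / p40_inferences_per_second
print(f"{total_inferences:,} inferences/sec -> {gpus_needed:,.0f} P40 GPUs")
```

Under these assumptions the result is 12 million P40s, the same order of magnitude as the slide's rounded figure of ~10M.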
  • 78. Bohitesh Misra Real-time Surveillance 1. Real-time Surveillance Zone: real-time display of the monitoring screen. 2. Warning/Comparison Zone: real-time display of the currently pictured people vs. the status of people under surveillance. 3. Pictured Display Zone: real-time display of the pictured photos. 4. Menu Zone: real-time surveillance, picture/inquiry, police notification/inquiry and database management.
  • 79. Bohitesh Misra Scene Parsing, Crowd Density Analysis, Crowd Tracking, Search by Face, Face Recognition, License Recognition, Pedestrian Detection, People Counting, Face Alignment, Face Detection, Vehicle Model Recognition, Vehicle Detection
  • 80. Bohitesh Misra AI analytics to detect COVID violations in high-traffic public places. Camera-based AI system: • Intrusive monitoring of the temperature of people entering any premises • Detects, through its AI-based algorithms, whether a person is wearing a face cover (mask) • Detects social distancing between people
  • 82. Bohitesh Misra Use of Data Science by Zomato https://analyticsindiamag.com/the-amazing-way-zomato-uses-data-science-for-success/ AIM: Driving commercial and operational efficiencies; key areas include logistics optimisation, call centre/driver fleet capacity planning, delivery time prediction, ad delivery and supply prioritization. Process: The Zomato team uses a Scala pipeline which ingests data from S3 and performs the ETL operations needed for machine learning algorithms. “Most of the machine learning modelling happens in Python and leverages scale transformed historic raw data as input. The model once finalised is then set up as a service in production, deployed on dedicated servers as dockerized REST APIs using Elastic Beanstalk/ECS.”
  • 83. Bohitesh Misra  Example – “Havells Adonia” in @HavellsIndia 85
  • 85. Bohitesh Misra AI could threaten our world. Areas where AI is most likely to be exploited: • Physical • Remote-controlled car crashes - The biggest concern involves AI being used to carry out physical attacks on humans, such as hacking into self-driving cars to cause major collisions. • Digital • Sophisticated phishing - In the future, attempts to access sensitive and personal information from an individual could be carried out by AI almost entirely. “These attacks may use AI systems to complete certain tasks more successfully than any human could.” • Political • Manipulating public opinion - Fake news and fake videos generated by bots and AI could have a big impact on public opinion, disrupting all layers of society, from politics to media. The use of social media bots spreading fake news was already a reality during the 2016 US presidential campaign.
  • 88. Bohitesh Misra Data Science Everywhere ▪ INTERNET & CLOUD: Image Classification, Speech Recognition, Language Translation, Language Processing, Sentiment Analysis, Recommendation ▪ MEDIA & ENTERTAINMENT: Video Captioning, Video Search, Real Time Translation ▪ AUTONOMOUS MACHINES: Pedestrian Detection, Lane Tracking, Recognize Traffic Sign ▪ SECURITY & DEFENSE: Face Detection, Video Surveillance, Satellite Imagery ▪ MEDICINE & BIOLOGY: Cancer Cell Detection, Diabetic Grading, Drug Discovery, COVID-19 detection
  • 89. BOHITESH MISRA, PMP CO-FOUNDER, XIPHIAS XPAY LIFE PVT LTD BOHITESH.MISRA@GMAIL.COM #CIO200 #ITNEXT100 #EMINENT CIOS OF INDIA 2019 @bohiteshmisra /in/bohitesh