Big Data and Machine Learning for Businesses
By Abdul Wahid

Who Am I
Abdul Wahid
CEO at GleeTech (http://gleetech.com)
• Started working as a Software Engineer in 2005.
• PhD (Computer Science), 2016, Victoria University of Wellington.
• Help people develop awesome digital products.
• Passionate about:
  • Big Data
  • Machine Learning
  • Tech Startups/Businesses
• Email: abdul@gleetech.com
• Personal site: http://awahid.net

Big Data
• Additional Dimensions
  • Complexity
    • Multiple sources and data streams
  • Variability
    • Unpredictable data flows
    • Social media trending

“Data is the new oil of the digital economy.”
Wired, 2014

Why Big Data is important
• Data contains information.
• Information leads to insights.
• Insights help in making better decisions.

Examples - Using data
• Weather Data
  • Utility & Manufacturing companies
  • Transport & Tourism
• News, Accidents, Census
  • Insurance Companies
• Survey, Blogs, Comments, Tweets
  • Marketing Analysis
  • Customer Segmentation

Examples - Benefits of data
• Sports
  • Strategies, data-driven analytics
• Entertainment
  • User opinions, sentiments
• Retail
  • Consumer behavior
• Health Care
  • Patient/symptom analysis
• Financial Institutions
  • Fraud detection

How to derive insights from data?

Machine Learning?
“Field of study that gives computers the ability to learn without being explicitly programmed.” Arthur Samuel (1959).

https://www.linkedin.com/pulse/business-intelligence-its-relationship-big-data-geekstyle

Top Algorithms
• Supervised Learning
  • Decision Trees
  • Naïve Bayes Classification
  • Ordinary Least Squares Regression
  • Logistic Regression
  • Support Vector Machines
  • Ensemble Methods
• Unsupervised Learning
  • Clustering Algorithms
    • Centroid-based algorithms
    • Connectivity-based algorithms
    • Density-based algorithms
    • Probabilistic
  • Dimensionality Reduction
  • Neural networks / Deep Learning
  • Principal Component Analysis
  • Singular Value Decomposition
  • Independent Component Analysis

Example
• We want some hypothesis h that predicts whether we will be paid back
1. +a, -c, +i, +e, +o, +u: Y
2. -a, +c, -i, +e, -o, -u: N
3. +a, -c, +i, -e, -o, -u: Y
4. -a, -c, +i, +e, -o, -u: Y
5. -a, +c, +i, -e, -o, -u: N
6. -a, -c, +i, -e, -o, +u: Y
7. +a, -c, -i, -e, +o, -u: N
8. +a, +c, +i, -e, +o, -u: N
• Lots of possible hypotheses: will be paid back if…
  • Income is high (wrong on 2 occasions in training data)
  • Income is high and no Criminal record (always right in training data)
  • (Address is known AND ((NOT Old) OR Unemployed)) OR ((NOT Address is known) AND (NOT Criminal Record)) (always right in training data)
• Which one seems best? Anything better?
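
The error counts quoted for the three hypotheses can be checked mechanically. The following is a minimal sketch, not part of the original slides. It reads the letters a, c, i, o, u as Address known, Criminal record, high Income, Old, and Unemployed, which is an assumption based on the wording of the hypotheses; the slide does not say what "e" stands for, and the hypothesis names are made up for this illustration.

```python
# The eight training examples from the slide, copied as-is.
ROWS = [
    ("+a -c +i +e +o +u", "Y"),
    ("-a +c -i +e -o -u", "N"),
    ("+a -c +i -e -o -u", "Y"),
    ("-a -c +i +e -o -u", "Y"),
    ("-a +c +i -e -o -u", "N"),
    ("-a -c +i -e -o +u", "Y"),
    ("+a -c -i -e +o -u", "N"),
    ("+a +c +i -e +o -u", "N"),
]


def parse(spec):
    """Turn '+a -c ...' into {'a': True, 'c': False, ...}."""
    return {token[1]: token[0] == "+" for token in spec.split()}


# Each example becomes (attribute dict, True if paid back).
DATA = [(parse(spec), label == "Y") for spec, label in ROWS]

# Assumed reading of the letters: a=address known, c=criminal record,
# i=high income, o=old, u=unemployed ("e" is not used by any hypothesis).
HYPOTHESES = {
    "income": lambda x: x["i"],
    "income and no criminal record": lambda x: x["i"] and not x["c"],
    "long rule": lambda x: (x["a"] and ((not x["o"]) or x["u"]))
    or ((not x["a"]) and (not x["c"])),
}


def training_errors(hypothesis):
    """Count the training examples the hypothesis gets wrong."""
    return sum(hypothesis(x) != paid_back for x, paid_back in DATA)


ERRORS = {name: training_errors(h) for name, h in HYPOTHESES.items()}
for name, count in ERRORS.items():
    print(f"{name}: {count} training error(s)")
```

Two quite different rules fit the training data perfectly, which is why the slide asks which one seems best: training accuracy alone cannot tell them apart.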

Decision trees
high Income?
  no: NO
  yes: Criminal record?
    yes: NO
    no: YES
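
One way to make the tree concrete is to store it as nested tuples and walk it from the root. This is an illustrative sketch only: the tuple layout, the attribute names, and the `classify` helper are assumptions made for the example, and the tree itself follows the reading that income is asked first and criminal record only for high-income applicants.

```python
# An internal node is (attribute, subtree_if_yes, subtree_if_no);
# a leaf is just the predicted answer as a string.
TREE = ("high income", ("criminal record", "NO", "YES"), "NO")


def classify(tree, applicant):
    """Follow yes/no answers from the root until a leaf is reached."""
    while not isinstance(tree, str):
        attribute, if_yes, if_no = tree
        tree = if_yes if applicant[attribute] else if_no
    return tree


# Hypothetical applicant: high income, no criminal record.
applicant = {"high income": True, "criminal record": False}
print(classify(TREE, applicant))
```

Each prediction needs at most two questions here, which is part of the appeal of decision trees: the path from root to leaf doubles as an explanation of the decision.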

Constructing a decision tree, one step at a time

All training examples:
+a, -c, +i, +e, +o, +u: Y
-a, +c, -i, +e, -o, -u: N
+a, -c, +i, -e, -o, -u: Y
-a, -c, +i, +e, -o, -u: Y
-a, +c, +i, -e, -o, -u: N
-a, -c, +i, -e, -o, +u: Y
+a, -c, -i, -e, +o, -u: N
+a, +c, +i, -e, +o, -u: N

address?
  no: criminal?
    yes:
      -a, +c, -i, +e, -o, -u: N
      -a, +c, +i, -e, -o, -u: N
    no:
      -a, -c, +i, +e, -o, -u: Y
      -a, -c, +i, -e, -o, +u: Y
  yes: criminal?
    yes:
      +a, +c, +i, -e, +o, -u: N
    no: income?
      yes:
        +a, -c, +i, +e, +o, +u: Y
        +a, -c, +i, -e, -o, -u: Y
      no:
        +a, -c, -i, -e, +o, -u: N

Address was maybe not the best attribute to start with…

Starting with a different attribute
• Criminal record seems like a much better starting point than address
• Each node is almost completely uniform
• It almost completely predicts whether we will be paid back

criminal?
  yes:
    -a, +c, -i, +e, -o, -u: N
    -a, +c, +i, -e, -o, -u: N
    +a, +c, +i, -e, +o, -u: N
  no:
    +a, -c, +i, +e, +o, +u: Y
    +a, -c, +i, -e, -o, -u: Y
    -a, -c, +i, +e, -o, -u: Y
    -a, -c, +i, -e, -o, +u: Y
    +a, -c, -i, -e, +o, -u: N
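
The idea that one attribute is a "better starting point" than another can be quantified. The slides do not name a criterion, so the sketch below assumes information gain (the reduction in Shannon entropy of the labels after a split), which is one common choice. The helper names are made up for this example.

```python
from math import log2

# The eight training examples from the slides.
ROWS = [
    ("+a -c +i +e +o +u", "Y"),
    ("-a +c -i +e -o -u", "N"),
    ("+a -c +i -e -o -u", "Y"),
    ("-a -c +i +e -o -u", "Y"),
    ("-a +c +i -e -o -u", "N"),
    ("-a -c +i -e -o +u", "Y"),
    ("+a -c -i -e +o -u", "N"),
    ("+a +c +i -e +o -u", "N"),
]
DATA = [
    ({tok[1]: tok[0] == "+" for tok in spec.split()}, label)
    for spec, label in ROWS
]


def entropy(labels):
    """Shannon entropy in bits of a list of 'Y'/'N' labels."""
    if not labels:
        return 0.0
    p = labels.count("Y") / len(labels)
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)


def information_gain(attr):
    """Entropy of all labels minus the weighted entropy after splitting."""
    all_labels = [label for _, label in DATA]
    yes = [label for x, label in DATA if x[attr]]
    no = [label for x, label in DATA if not x[attr]]
    remainder = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(DATA)
    return entropy(all_labels) - remainder


GAINS = {attr: information_gain(attr) for attr in "acieou"}
BEST = max(GAINS, key=GAINS.get)

for attr, gain in sorted(GAINS.items(), key=lambda item: -item[1]):
    print(f"{attr}: {gain:.3f} bits")
print("best first split:", BEST)
```

Under this criterion, splitting on address yields no gain at all, because both branches keep an even mix of Y and N, while splitting on criminal record gives the largest gain of the six attributes.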

Linear Regression
• The task of fitting a straight line through a set of points.
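
For a single input variable, the least-squares line has a closed-form solution. The sketch below is illustrative only: `fit_line` is a hypothetical helper and the data points are made up.

```python
def fit_line(points):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up observations that lie roughly on a straight line.
POINTS = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8), (5, 11.0)]
slope, intercept = fit_line(POINTS)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```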

Logistic Regression
• Measures the relationship between a categorical dependent variable and one or more independent variables.
• Applications:
  • Credit scoring
  • Measuring the success rates of marketing campaigns
  • Predicting the revenues of a certain product
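
As a rough illustration of the credit-scoring use case, the sketch below fits a one-variable logistic model with plain gradient descent. The data (an income score against whether the loan was repaid), the learning rate, and the number of epochs are all made up for this example; a real project would normally use a library implementation instead.

```python
from math import exp


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Minimise the average log-loss of p = sigmoid(w * x + b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        residuals = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: income score (1 = low, 8 = high) and repaid (1) or not (0).
INCOME = [1, 2, 3, 4, 5, 6, 7, 8]
REPAID = [0, 0, 0, 1, 0, 1, 1, 1]

W, B = fit_logistic(INCOME, REPAID)
for income in (1, 8):
    probability = sigmoid(W * income + B)
    print(f"income score {income}: P(repaid) = {probability:.2f}")
```

The output of the model is a probability rather than a hard yes or no, so a lender can choose the cut-off that matches its appetite for risk.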

Support Vector Machine
• Binary classification algorithm.
• Given points of two types in N-dimensional space, SVM generates an (N - 1)-dimensional hyperplane that separates those points into 2 groups.
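
In two dimensions the separating hyperplane is simply a line. The sketch below trains a linear SVM by sub-gradient descent on the regularised hinge loss, which is one simple way to do it and not necessarily what a production library does. The points, labels, and hyperparameters are made up for the illustration.

```python
# Made-up, linearly separable 2-D points with labels -1 and +1.
POINTS = [(-2, -1), (-1, -2), (-1, -1), (1, 1), (2, 1), (1, 2)]
LABELS = [-1, -1, -1, 1, 1, 1]


def score(w, b, x):
    """Signed value of w . x + b; its sign tells the side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(points, labels, lr=0.01, lam=0.01, epochs=500):
    """Sub-gradient descent on the L2-regularised hinge loss."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * score(w, b, x)
            # The regulariser shrinks w a little on every step.
            w = [wi - lr * lam * wi for wi in w]
            if margin < 1:
                # Inside the margin or misclassified: move the hyperplane.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Classify a point by the side of the hyperplane it falls on."""
    return 1 if score(w, b, x) >= 0 else -1


W, B = train_linear_svm(POINTS, LABELS)
PREDICTIONS = [predict(W, B, p) for p in POINTS]
print("weights:", W, "bias:", B)
print("predictions:", PREDICTIONS)
```

The hinge loss only reacts to points that are misclassified or too close to the boundary, which is why the final hyperplane is determined by the few points nearest to it.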

Singular Value Decomposition
• PCA is actually a simple application of SVD.
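
The link between the two can be written out in a few lines. This is a standard derivation, added here as a sketch; it assumes X is an n × p data matrix whose columns have already been mean-centred.

```latex
% Assumption: X is n x p with mean-centred columns.
X = U \Sigma V^{\top}
\quad\Longrightarrow\quad
C = \frac{1}{n-1} X^{\top} X
  = \frac{1}{n-1} V \Sigma^{\top} U^{\top} U \Sigma V^{\top}
  = V \left( \frac{\Sigma^{\top} \Sigma}{n-1} \right) V^{\top}
```

The columns of V are therefore the principal directions, the variance along the i-th direction is σ_i² / (n - 1), and the data projected onto the principal directions is XV = UΣ.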

Independent Component Analysis
• Reveals hidden factors that underlie sets of random variables.
• Data variables are assumed to be linear mixtures of some unknown latent variables, and the mixing system is also unknown.
• The latent variables are assumed to be non-Gaussian and mutually independent.
• Applications: images, documents
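
In symbols, the model described above is usually written as follows. The notation (x, s, A, W) is introduced here for illustration, and a square, invertible mixing matrix is assumed for simplicity.

```latex
% x: observed variables, s: latent sources, A: unknown mixing matrix.
x = A s, \qquad \hat{s} = W x, \qquad W \approx A^{-1}
```

ICA estimates the unmixing matrix W from the observations alone by making the components of ŝ as statistically independent as possible; the sources can only be recovered up to their order and scale.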

https://thinkgrowth.org/artificial-intelligence-and-you-demystifying-the-technology-landscape-21d4d34fbb10

Conclusions
• Data is nothing without insights
• Machine learning is the key for deriving insights from data
• Big Data and Machine Learning have huge potential
• Shift from a Mobile-First to an AI-First approach

Next Steps
• Machine Learning Course
  • https://www.coursera.org/learn/machine-learning
• Big Data Course
  • https://www.coursera.org/specializations/big-data
• Must-know Machine Learning Algorithms
  • http://www.kdnuggets.com/2016/08/10-algorithms-machine-learning-engineers.html
• Machine Learning Startups
  • https://angel.co/machine-learning
• Recent Cool Startups
  • http://www.informationweek.com/big-data/software-platforms/10-cool-machine-learning-startups-to-watch/d/d-id/1326571

References
• https://www.slideshare.net/nasrinhussain1/big-data-ppt-31616290?from_action=save
• http://www.kdnuggets.com/2016/08/10-algorithms-machine-learning-engineers.html
• https://www.slideshare.net/larsga/introduction-to-big-datamachine-learning/40-Top_10_machine_learning_algs1
• https://www.google.com.pk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0ahUKEwjx0_2Q5b7TAhXFPRQKHfiNCV8QFggmMAE&url=http%3A%2F%2Fnlp.cs.rpi.edu%2Fcourse%2Fspring16%2Flecture10.pptx&usg=AFQjCNFFg6FbId7hqMj1lwbufcfPvb7Wmw
• https://thinkgrowth.org/artificial-intelligence-and-you-demystifying-the-technology-landscape-21d4d34fbb10
• https://www.linkedin.com/pulse/business-intelligence-its-relationship-big-data-geekstyle
