© 2018 Leap Research. All rights reserved.
Prepared for:
A Guest Lecture at SBM ITB
March 26, 2018
Big Data and Big Data Analytics
A Word of Caution
Big Data Projects
Frameworks
Analytical Methods in Data Mining
Business Applications
Epilogue
noun, (used with a singular or plural verb)
1. Computers. data sets, typically consisting of billions or trillions of records, that are so
vast and complex that they require new and powerful computational resources to
process:
Supercomputers can analyze big data to create models of global climate change.
Reference: big data. Dictionary.com. Dictionary.com Unabridged. Random House, Inc.
http://www.dictionary.com/browse/big-data (accessed: January 29, 2018)
Big data is data sets that are so voluminous and complex that traditional data processing
application software are inadequate to deal with them. Big data challenges include
capturing data, data storage, data analysis, search, sharing, transfer, visualization,
querying, updating and information privacy. There are three dimensions to big data
known as Volume, Variety and Velocity.
Reference: https://en.wikipedia.org/wiki/Big_data (accessed: January 29, 2018)
Big data is a term that describes the large volume of data – both structured and
unstructured – that inundates a business on a day-to-day basis. But it’s not the amount
of data that’s important. It’s what organizations do with the data that matters. Big data
can be analyzed for insights that lead to better decisions and strategic business moves.
Reference: https://www.sas.com/en_id/insights/big-data/what-is-big-data.html
(accessed: January 29, 2018)
Regarding big data, there are two views. One view is that of classical statistics, which
considers big as simply not small. Theoretically, big is the sample size after which
asymptotic properties of the method "kick in" for valid results. The other view is that of
contemporary statistics, which considers big with regard to lifting (mathematical
calculation) the observations and learning from the variables.
Reference: Ratner, Bruce. Statistical and Machine-Learning Data Mining: Techniques for
Better Predictive Modeling and Analysis of Big Data. 3rd ed. Boca Raton, FL: CRC Press,
2017
“Big Data” refers to datasets whose size is beyond the ability of typical database
software tools to capture, store, manage and analyze.
Reference: McKinsey Global Institute (MGI), Big Data: The next frontier for innovation,
competition, and productivity, Report, June, 2012
Top recurring themes in the definitions provided by 43 thought leaders.
Reference: Dutcher, Jennifer. "What Is Big Data?" University of California Berkeley School of Information. September 03, 2014. Accessed March 19, 2018. https://datascience.berkeley.edu/what-is-big-data
Source: http://www.ibmbigdatahub.com/infographic/four-vs-big-data
Big data analytics examines large amounts of data to uncover hidden patterns, correlations and other insights.
https://www.sas.com/en_id/insights/analytics/big-data-analytics.html
Big data analytics refers to the strategy of analyzing large volumes of data, or big data. This big data is
gathered from a wide variety of sources, including social networks, videos, digital images, sensors, and sales
transaction records. The aim in analyzing all this data is to uncover patterns and connections that might
otherwise be invisible, and that might provide valuable insights about the users who created it. Through this
insight, businesses may be able to gain an edge over their rivals and make superior business decisions.
https://www.techopedia.com/definition/28659/big-data-analytics
Big Data analytics is the process of collecting, organizing and analyzing large sets of data (called Big Data) to
discover patterns and other useful information. Big Data analytics can help organizations to better
understand the information contained within the data and will also help identify the data that is most
important to the business and future business decisions. Analysts working with Big Data typically want the
knowledge that comes from analyzing the data.
https://www.webopedia.com/TERM/B/big_data_analytics.html
It’s important to remember that the primary value
from big data comes not from the data in its raw form,
but from the processing and analysis of it and the
insights, products, and services that emerge from
analysis. The sweeping changes in big data
technologies and management approaches need to be
accompanied by similarly dramatic shifts in how data
supports decisions and product/service innovation.
Thomas H. Davenport in Big Data in Big Companies
Marketing / Retail
Data mining helps marketing companies build models based on historical data to predict who will respond to new marketing campaigns
Finance / Banking
Data mining gives financial institutions insight into loans and credit reporting
Manufacturing
By applying data mining to operational engineering data, manufacturers can detect faulty equipment and determine optimal control parameters
Government
Data mining helps government agencies by digging into and analyzing records of financial transactions to build patterns that can detect money laundering or criminal activities
Privacy Issues
People are afraid that their personal information is collected and used in unethical ways that could cause them a lot of trouble, while businesses collect information about their customers in many ways to understand their purchasing behavior and trends.
Security Issues
Businesses hold information about their employees and customers, including social security numbers, birthdays, payroll data, etc.
Misuse of Information / Inaccurate Information
Information collected through data mining for ethical purposes can be misused. It may be exploited by unethical people or businesses to take advantage of vulnerable people or to discriminate against a group of people. (Read “Ombudsman reminds the government to take protecting citizens' data seriously” and “Cambridge Analytica: What is the company now embroiled in Facebook data controversy?”)
Source: http://www.zentut.com/data-mining/advantages-and-disadvantages-of-data-mining/
Source: https://www.forbes.com/sites/bernardmarr/2015/03/17/where-big-data-projects-fail/#72098e0468b5
Not starting with clear business objectives
Not making a good business case
Management failure
Poor communication
Not having the right skills for the job
What people who fall into that trap often failed to
appreciate is that analytics in business is about
problem solving – and first you need to know what
problem you are trying to solve
You need to know why your business needs to use Big
Data before you start doing it. If you don’t – wait until
you do.
Those holding the purse strings haven’t taken into
account some long-term or ongoing cost associated
with the project; the senior project managers just
can’t talk productively to the data scientist workers in
the lab; senior managers don’t trust the algorithms
Big Data in business is about the interface between
the analytical, experimental science that goes on in
data labs, and the profit and target chasing sales force
and boardroom
Companies are often fond of starting up data projects
“left, right and centre” without thinking enough about
how this might impact resources in the future
Launched in 2008, Google Flu Trends (GFT) is Google’s algorithm which mined five years of web logs, containing
hundreds of billions of searches, and created a predictive model utilizing 45 search terms that “proved to be a more
useful and timely indicator [of flu] than government statistics with their natural reporting lags.” (described in the
2013 book Big Data: A Revolution That Will Transform How We Live, Work and Think by Viktor Mayer-Schönberger
and Kenneth Cukier)
In Science, a team of Harvard-affiliated researchers published their findings that GFT has over-estimated the
prevalence of flu for 100 out of the last 108 weeks; it’s been wrong since August 2011. The Science article further
points out that a simplistic forecasting model – a model as basic as one that predicts the temperature by looking at
recent-past temperatures – would have forecasted flu better than GFT.
Large datasets don’t guarantee valid datasets
Correlation is not causation!
They were merely finding statistical patterns in the data. They cared about correlation rather than causation. This is common in
big data analysis. Figuring out what causes what is hard (impossible, some say). Figuring out what is correlated with what is
much cheaper and easier. (Source: https://www.ft.com/content/21a6e7d8-b479-11e3-a09a-00144feabdc0)
Source: https://hbr.org/2014/03/google-flu-trends-failure-shows-good-data-big-data
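The point about simple forecasting models can be made concrete by benchmarking any predictive model against a naive "last observed value" forecast. The sketch below is only an illustration: the weekly rates are made-up numbers, and the over-estimating model is a hypothetical stand-in, not Google Flu Trends itself.

```python
def persistence_forecast(series):
    """Predict each value as the previously observed value (lag-1 baseline)."""
    return series[:-1]


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic weekly flu-like illness rates (made-up numbers, illustration only).
observed = [1.2, 1.5, 2.1, 2.8, 3.0, 2.6, 2.0, 1.6]
actual = observed[1:]  # weeks 2..8, the weeks we can forecast

# Hypothetical model that systematically over-estimates by 50%.
overestimating_model = [v * 1.5 for v in actual]

baseline_mae = mean_absolute_error(actual, persistence_forecast(observed))
model_mae = mean_absolute_error(actual, overestimating_model)
print(f"baseline MAE = {baseline_mae:.3f}, model MAE = {model_mae:.3f}")
```

If a sophisticated model cannot beat this kind of baseline on held-out weeks, the size of its training data is beside the point.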
Getting Ahead of Rodents in Chicago, Illinois, USA
Analytics were developed by the advanced analytics unit in the city of Chicago office that enables the city to more
efficiently focus its limited resources on the control of rodents, an important disease vector for both humans and
pets. Using citizen complaints, these analytics generate a heat map showing rodent hotspots as they develop across
the city. Working with researchers from Carnegie Mellon University's Event and Pattern Detection Laboratory, the
team developed a spatiotemporal model to identify correlations among 350 different factors and spikes in rodent
complaints across the city. The model identified 31 factors that predict when and where rodent complaints are most
likely to occur over the subsequent week. (Source: https://www.ncbi.nlm.nih.gov/books/NBK401950/)
Transforming Patient Care at the University of Iowa Hospital and Clinics, Iowa, USA
Surgeons at the University of Iowa Hospitals and Clinics needed to know if patients were susceptible to infections in
order to make critical treatment decisions in the operating room. The goals of the project were to predict patients
with the biggest risk of surgical site infections, in order to reduce infection rate to improve patient care and decrease
costs. Reducing the infection rate has major implications for overall patient health and cost savings. The solution
consisted of the ability to merge historical electronic health record data and live patient vital signs to predict infection
likelihood, and provide doctors with real-time, predictive decisions during surgical procedures so they can create a
plan to reduce risk. The surgical team harnessed the power of big data analytics, coupled with other methods, to
keep patients safe—reducing surgical site infections by 58 percent—while decreasing the cost of care and decreasing
the incidence of readmissions by 40%. (Source: https://insidebigdata.com/2016/11/16/case-studies-big-data-and-healthcare-life-sciences/)
In businesses, big data is usually concerned with database marketing
Database Marketing is the process of building, maintaining, and using customer database and other databases for the purposes of contacting and transactingⱡ
Database marketing applications have 3 broad categories:
1. Selling products and services to new customers
2. Selling additional products and services to existing customers
3. Monitoring and maintaining existing customer relationships
The underlying technology driving database marketing efforts is Data Mining
ⱡ Peppers, Don, and Martha Rogers. The One to One Future: Building Business Relationships One Customer at a Time. London: Piatkus, 1993.
Database Marketing Applications
Reference: Putler and Krider (2012)
Prospecting for new customers
• Customer acquisition through various channels
Qualifying those potential new customers once they
have been found
• Through activities such as credit scoring
Cross-selling
Targeting a current customer in order to sell a product or service to that customer that is
different from the products or services that the customer has previously purchased from the
organization
Up-selling
Targeting an offer to an existing customer to upgrade the product or service he / she is
currently purchasing from an organization
Market basket analysis
Examining the composition of items in customers’ “baskets” on single purchase occasions to
find merchandising opportunities that could lead to additional product sales
Example: the famous “beer and diaper” story (https://tdwi.org/articles/2016/11/15/beer-and-diapers-impossible-correlation.aspx)
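Market basket analysis is usually summarized with three measures for a pair of items: support, confidence, and lift. A minimal sketch follows, assuming a handful of hypothetical baskets; the item names and the `pair_metrics` helper are illustrative and not taken from the talk.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, one set of items per purchase occasion.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def pair_metrics(baskets):
    """Return {(a, b): (support, confidence of a -> b, lift)} for co-occurring pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    metrics = {}
    for (a, b), count in pair_counts.items():
        support = count / n                   # share of baskets containing both items
        confidence = count / item_counts[a]   # share of a-baskets that also contain b
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        metrics[(a, b)] = (support, confidence, lift)
    return metrics


metrics = pair_metrics(baskets)
support, confidence, lift = metrics[("beer", "diapers")]
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the two items appear together more often than independence would suggest, which is the kind of pattern a merchandiser would look into.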
Customer attrition (churn) analysis
To find patterns in a current customer’s purchase, usage, and/or complaint behavior that suggest that the customer is about to become an ex-customer
Customer segmentation
Grouping customers into segments based on their past purchase behavior allows the organization to develop customized promotions and communications for each segment
Recommendation systems
Group products based on which customers have bought them and then make recommendations based on the overlap of the buyers of two or more products
E.g., “Your Recommendations” in Amazon.com
Fraud detection
Allows an organization to uncover customers who’re engaged in fraudulent behavior against them
Protects customer and enterprise information, assets, accounts and transactions through the real-time, near-real-time or batch analysis of activities by users and other defined entities (such as kiosks) (https://www.gartner.com/it-glossary/fraud-detection)
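The recommendation idea described above, scoring products by the overlap of their buyers, can be sketched with Jaccard similarity. This is one simple way to measure overlap and is an assumption here; the purchase data and the `recommend` helper are hypothetical.

```python
# Hypothetical purchase history: product -> set of customer IDs who bought it.
buyers = {
    "camera": {"c1", "c2", "c3"},
    "tripod": {"c2", "c3", "c4"},
    "novel": {"c5"},
}


def jaccard(a, b):
    """Share of buyers two products have in common, out of all their buyers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(product, buyers, top_n=2):
    """Rank other products by buyer overlap with the given product."""
    scores = [
        (other, jaccard(buyers[product], other_buyers))
        for other, other_buyers in buyers.items()
        if other != product
    ]
    scores = [pair for pair in scores if pair[1] > 0]  # drop products with no shared buyers
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


print(recommend("camera", buyers))
```

Here the camera and the tripod share two of four distinct buyers, so the tripod is suggested to camera buyers while the novel is not.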
Source: https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining
CRISP-DM (Cross-Industry Standard Process for Data Mining) is a data mining process model that
describes commonly used approaches
that data mining experts use to tackle
problems.
Polls conducted at one and the same
website (KDNuggets) in 2002, 2004,
2007 and 2014 show that it was the
leading methodology used by industry
data miners who decided to respond to
the survey.
Reference: Chapman, Pete, Julia Clinton, Randy Kerber, Thomas Khabaza, Thomas Reinartz, Colin Shearer, and Rüdiger Wirth. CRISP-DM 1.0: step-by-step data mining guide. Chicago, IL: SPSS, 2000.
Time spent per phase: 10-20%, 20-30%, 50-70%, 10-20%, 10-20%, 5-10%
Source: ftp://ftp.software.ibm.com/software/data/sw-library/services/ASUM.pdf
https://hbr.org/video/3633937151001/the-explainer-big-data-and-analytics
Statistics + Big Data + Machine Learning & Lifting = Data Mining ⱡ
Data mining is all of statistics and exploratory data analysis for big and small data with the power of the PC for lifting data and learning the structures within the data ⱡ
ⱡ Ratner (2017)
1. Statistics with emphasis on a proper exploratory data analysis
• This includes using the descriptive and noninferential parts of classical statistical machinery as indicators. The parts include the sum of squares, degrees of freedom, F-ratios, chi-squared values, and p-values but exclude inferential conclusions.
2. Big data
• Big data are given special mention because of today’s digital environment. However, because small data are a component of big data, they are not excluded.
3. Machine learning & lifting
• The PC is the learning machine, the essential processing unit, having the ability to learn without being explicitly programmed and the intelligence to find structure in the data. Moreover, the PC is essential for big data because it can always do what it is explicitly programmed to do.
Predictive Modeling Methods
• Linear and Logistic Regression
• Decision Trees & Rules
• Artificial Neural Networks
• Time series analysis
Grouping Methods
• K-Means
• K-Nearest Neighbor
• Hierarchical agglomerative methods
• Self-organizing maps
Other Methods
• Data description, summarization, visualization
• Dependency analysis
• Text mining
• Web log mining
Open Source
• R: https://cran.r-project.org
• Microsoft R Open: https://mran.revolutionanalytics.com
Commercial
• SAS® Enterprise Miner™: https://www.sas.com/en_id/software/enterprise-miner.html
• IBM SPSS Modeler: https://www.ibm.com/id-en/marketplace/spss-modeler
Linear Regression estimates the relationship between several independent variables and a dependent variable
The dependent variable is of quantitative (numerical) type with ordinal, interval, or ratio measurement level
Examples: amount of loan, savings account monthly balance, revenue, sales
The independent variables can be of qualitative (categorical) or quantitative (numerical) type
Examples of qualitative (categorical) type variables: marital status, highest completed education level, city, housing ownership
Mathematical equation: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$
$y$ is the dependent variable
$x_1, x_2, \dots, x_k$ are the independent variables
$\beta_0, \beta_1, \beta_2, \dots, \beta_k$ are the model parameters that need to be estimated from the data
$\varepsilon$ is the random error component of the model (for statistical inference purposes it’s assumed to be normally distributed with mean 0 and constant variance $\sigma^2$)
The model parameters are estimated by the least squares method
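For reference, the least squares criterion and its closed-form solution can be written as follows. The sample size $n$, the observation index $i$, and the design matrix $X$ (an $n \times (k+1)$ matrix whose first column is all ones) are notation introduced here, and the closed form assumes $X^\top X$ is invertible.

```latex
\hat{\boldsymbol{\beta}}
  = \arg\min_{\boldsymbol{\beta}}
    \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \dots - \beta_k x_{ik} \right)^2
\quad\Longrightarrow\quad
\hat{\boldsymbol{\beta}} = \left( X^\top X \right)^{-1} X^\top \mathbf{y}
```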
Predictive Modeling Methods
Source: https://en.wikipedia.org/wiki/Linear_regression
In linear regression, the observations (red) are
assumed to be the result of random deviations
(green) from an underlying relationship (blue)
between the dependent variable (y) and
independent variable (x).
The data sets in the Anscombe's quartet are designed to have the same linear
regression line (as well as identical means, standard deviations, and correlations)
but are graphically very different. This illustrates the pitfalls of relying solely on a
fitted model to understand the relationship between variables.
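The Anscombe point is easy to check numerically. The sketch below fits a one-predictor least squares line to the first two data sets of the quartet, using the values as published by Anscombe (1973); the `fit_line` helper is illustrative. Both fits come out near slope 0.5 and intercept 3.0, although the second data set is clearly curved when plotted.

```python
def fit_line(x, y):
    """Least squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Anscombe's quartet, data sets I and II (they share the same x values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

for name, y in (("I", y1), ("II", y2)):
    slope, intercept = fit_line(x, y)
    print(f"data set {name}: slope={slope:.3f} intercept={intercept:.3f}")
```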
Based on IBM SPSS Modeler
Logistic regression is a type of regression used when the dependent variable is binary or ordinal
The outcome is either “success” or “failure”; e.g., buy or not buy, churn or not churn
The independent variables can be either numerical or categorical
It is commonly used for predicting the probability of occurrence of an event or outcome, based on several
independent variables of categorical or numerical type
Examples: probability of a prospect to buy a new insurance, probability of a current bank customer to accept a
new credit card offer, probability of a customer to close his/her bank account
The model is based on the Binomial distribution
Mathematical equation:
Let the conditional probability that the outcome is present be denoted by $P(Y = 1 \mid \mathbf{x}) = \pi(\mathbf{x})$
The logit of the logistic regression model is given by the equation: $g(\mathbf{x}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$
The logistic regression model is $\pi(\mathbf{x}) = \dfrac{e^{g(\mathbf{x})}}{1 + e^{g(\mathbf{x})}}$
The model parameters are estimated by the maximum likelihood method
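To make the maximum likelihood step tangible, here is a minimal sketch with one predictor that climbs the log-likelihood by plain gradient ascent. Statistical packages typically use faster iterative schemes, so this is a stand-in for the idea rather than the routine any particular tool uses; the data, learning rate, and step count are assumptions.

```python
import math


def sigmoid(g):
    """Logistic function e^g / (1 + e^g), written in a numerically stable form."""
    if g >= 0:
        return 1.0 / (1.0 + math.exp(-g))
    e = math.exp(g)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Estimate (b0, b1) by gradient ascent on the average log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the log-likelihood: residual (y - pi(x)) times each input.
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += lr * sum(residuals) / n
        b1 += lr * sum(r * x for r, x in zip(residuals, xs)) / n
    return b0, b1


# Hypothetical data: x = number of past purchases, y = 1 if the offer was accepted.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
print(f"P(accept | x=0) = {sigmoid(b0):.2f}, P(accept | x=7) = {sigmoid(b0 + b1 * 7):.2f}")
```

The fitted probability rises with the predictor, which is what a positive slope coefficient on the logit scale means.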
Decision Trees relax the assumption about the probability distribution of the data
Hence, it’s a distribution free method
It works by partitioning the data set recursively into a set of “purer” regions
The dependent and independent variables can be either numerical or categorical
Reference: Breiman et al., 1998
A tree structured classification rule to identify high
risk patients (those who will not survive at least 30
days) on the basis of the initial 24-hour data.
The letter F means not high risk; G means high risk.
The entire construction of a tree, then, revolves
around three elements:
1. The selection of the splits
2. The decisions when to declare a node terminal or
to continue splitting it
3. The assignment of each terminal node to a class
(classification tree) or the estimation of the value
of the dependent variable in the terminal node
(regression tree)
The crux of the problem is how to use the data L
to determine the splits, the terminal nodes, and
their assignments. It turns out that the class
assignment problem is simple. The whole story
is in finding good splits and in knowing when to
stop splitting.
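The split-selection step can be sketched for a single numeric variable. The example below scores candidate thresholds with Gini impurity, one common purity measure for classification trees; the patient values are hypothetical and the helper names are illustrative.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Try midpoints between sorted distinct values; return (threshold, weighted impurity)."""
    best = (None, gini(labels))  # no split unless a threshold lowers the impurity
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical patients: blood pressure reading and class (G = high risk, F = not high risk).
pressure = [85, 88, 90, 95, 110, 120, 130]
risk = ["G", "G", "G", "F", "F", "F", "F"]

print(best_split(pressure, risk))
```

A full tree applies this search to every variable at every node and then needs a stopping or pruning rule, which is the harder part the quoted passage refers to.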
Source: Hastie and Tibshirani (2016)
Neural network diagram with a single hidden layer. The hidden layer derives transformations of the inputs (nonlinear transformations of linear combinations), which are then used to model the output.
Neural network diagram with three hidden layers and
multiple outputs
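The single-hidden-layer description can be written out as a forward pass: each hidden unit applies a nonlinear function to a linear combination of the inputs, and the output is a linear combination of the hidden units. The sigmoid activation, the linear output, and all weights below are assumptions chosen for illustration.

```python
import math


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass through one hidden layer with sigmoid units and a linear output."""
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(weights, x)) + bias)))
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical network: 2 inputs, 2 hidden units, 1 output.
hidden_weights = [[1.0, -1.0], [0.5, 0.5]]
hidden_biases = [0.0, 0.0]
output_weights = [2.0, -2.0]
output_bias = 1.0

print(forward([0.0, 0.0], hidden_weights, hidden_biases, output_weights, output_bias))
print(forward([1.0, 0.0], hidden_weights, hidden_biases, output_weights, output_bias))
```

Training consists of adjusting these weights to reduce prediction error; deeper networks repeat the hidden-layer step several times.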
Key features of the TwoStep Cluster Analysis procedure:
Handling of categorical and continuous variables. By assuming variables to be independent, a joint multinomial-
normal distribution can be placed on categorical and continuous variables.
Automatic selection of number of clusters. By comparing the values of a model-choice criterion across different
clustering solutions, the procedure can automatically determine the optimal number of clusters.
Scalability. By constructing a cluster features (CF) tree that summarizes the records, the TwoStep algorithm
allows you to analyze large data files.
In order to handle categorical and continuous variables, the TwoStep Cluster Analysis procedure uses a
likelihood distance measure which assumes that variables in the cluster model are independent
Further, each continuous variable is assumed to have a normal (Gaussian) distribution and each
categorical variable is assumed to have a multinomial distribution
Grouping Methods
Source: https://www.ibm.com/support/knowledgecenter/en/SSLVMB_22.0.0/com.ibm.spss.statistics.cs/spss/tutorials/twostepcluster_table.htm
Source: http://www.norusis.com/pdf/SPC_v13.pdf
Data came from Global Market Research 2016: An ESOMAR Industry Report. Amsterdam: ESOMAR, 2016
Using a dendrogram analysis based on the detailed data from 52 countries and setting the number of groups to five, Indonesia appears to share a similar profile with the Philippines and Vietnam in the Southeast Asia region
Malaysia turns out to be closer to China, while Singapore is closer to Hong Kong, South Korea, and Taiwan. They’re all grouped together with Germany, the 3rd largest market.
Analysis was done in R
Grouping Method: An Example of a Dendrogram Analysis
Dendrogram was derived using Euclidean distance measure and “complete linkage” clustering method
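The analysis in the talk was done in R on the ESOMAR data. As a language-neutral illustration of the same technique, the sketch below implements agglomerative clustering with Euclidean distance and complete linkage in plain Python. The five labelled points are hypothetical and unrelated to the country data, and the function stops at a requested number of groups instead of drawing the full dendrogram.

```python
import math


def complete_linkage(points, n_groups):
    """Merge clusters bottom-up until n_groups remain, using complete linkage."""
    clusters = [[name] for name in points]

    def distance(a, b):
        # Complete linkage: the largest Euclidean distance between any two members.
        return max(math.dist(points[p], points[q]) for p in a for q in b)

    while len(clusters) > n_groups:
        # Find the pair of clusters that are closest under complete linkage.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical two-feature profiles.
points = {
    "A": (1.0, 1.0),
    "B": (1.2, 0.9),
    "C": (5.0, 5.0),
    "D": (5.3, 4.8),
    "E": (9.0, 1.0),
}

groups = complete_linkage(points, 3)
print(groups)
```

Cutting the merge sequence at a chosen number of groups corresponds to cutting the dendrogram at a given height.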
Banking
• Cross-selling and up-selling
• Credit scoring for home loan application
• Credit scoring for credit card application
• Customer segmentation based on transactional data (via ATM, mobile/SMS/Internet banking, teller)
• Churn analysis (predicting if an account would become inactive/dormant)
Information and Communication Technology
• Cross-selling and up-selling
• Churn analysis (predicting if the customer would stop using the service)
Financing Company
• Cross-selling and up-selling of two- and four-wheelers
• Churn analysis (predicting if the loan would be fully paid before the end of the contract)
• Credit scoring for two- and four-wheeler loan application
• Credit scoring for refinancing application
Objective: to predict the likelihood of an account to become inactive / dormant in the next 6 months
Challenges faced:
How to define “inactive” or “dormant”? What would be the time period to be classified as “inactive” /
“dormant”?
Unclean data; e.g., wrong dates in the database, wrong account status
Merging customer’s profile data and transactional data
Customers’ complaint data were not recorded completely
Variables used in the analysis include the following:
Customer’s demographic data
Product ownership with the bank: savings, investment, credit card, secured and unsecured loan
Customer’s detailed transaction
Customer’s monthly balance
Methods employed:
Decision trees
Logistic regression
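The definitional challenge above has to be settled before any model is fitted, because the definition creates the target variable. One possible rule, shown purely as an assumption, is to call an account dormant when it has had no transaction within a fixed window; the 180-day default and the `is_dormant` helper are hypothetical and not the definition used in the project.

```python
from datetime import date


def is_dormant(last_transaction, as_of, window_days=180):
    """Flag an account as dormant if its last transaction is older than the window."""
    return (as_of - last_transaction).days > window_days


# Hypothetical accounts evaluated on the same reference date.
reference = date(2018, 1, 29)
print(is_dormant(date(2017, 1, 15), reference))  # idle for more than a year
print(is_dormant(date(2018, 1, 1), reference))   # active within the last month
```

Changing the window changes which accounts count as churned, so the choice should be agreed with the business side and kept fixed across training and scoring.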
Objective: to predict the likelihood that a loan (for a two- or four-wheeler) would be fully paid before the
end of the contract
Variables used in the analysis include the following:
Customer’s demographic data
Type of customer: new or recurring
Info on the current contract: new or used vehicle, amount of loan, loan tenor, amount of down payment
Monthly payment data
Delinquent record
Methods employed:
Decision trees
Logistic regression
Challenges faced:
After investigating many variables and fitting many models, we found that Lebaran (Eid al-Fitr) appeared to have a significant impact on payment behavior
Hence, we created a dummy variable to indicate the month before Lebaran and the month of Lebaran to improve the prediction
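A dummy variable of this kind can be built from a small lookup of Lebaran months. The sketch below is an illustration, not the project's actual code: the `lebaran_dummy` helper is hypothetical, and the listed months (July 2016 and June 2017) should be checked against an official calendar before use, since the date shifts every year.

```python
def lebaran_dummy(year, month, lebaran_months):
    """Return 1 if (year, month) is a Lebaran month or the month just before one, else 0."""
    index = year * 12 + (month - 1)  # running month count, so December/January wrap correctly
    flagged = set()
    for leb_year, leb_month in lebaran_months:
        leb_index = leb_year * 12 + (leb_month - 1)
        flagged.update({leb_index - 1, leb_index})
    return 1 if index in flagged else 0


# Illustrative lookup of (year, month) pairs in which Lebaran fell.
LEBARAN_MONTHS = [(2016, 7), (2017, 6)]

for month in range(4, 9):
    print(2017, month, lebaran_dummy(2017, month, LEBARAN_MONTHS))
```

The resulting 0/1 column is added to the monthly payment records as one more independent variable for the decision tree or logistic regression.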
Objective: to predict the likelihood that a current customer would accept a new loan offer (for a two- or
four-wheeler) when the current loan contract has expired
Variables used in the analysis include the following:
Customer’s demographic data
Type of customer: new or recurring
Info on the current contract: new or used vehicle, amount of loan, loan tenor, amount of down payment
Monthly payment data
Delinquent record
Methods employed:
Decision trees
Logistic regression
Challenges faced:
The prediction accuracy was poor (close to 50%); most likely due to the irrelevant variables used in the model
It appeared that payment behavior data was not correlated with the likelihood of accepting a new loan offer
It could be more fruitful to conduct a marketing research project to test several loan concepts and offer the
winning concept to the segments of customers showing the highest interest towards it
*A term coined by Stephen Few in his book Big Data, Big Dupe (2018)
It’s all about the (explicit) objectives!
Data deep dive by solid data sensemaking skills
Augmented by good technologies
Proposal to mine value from data by Stephen Few in Big Data, Big Dupe (2018)
1 Useful information can be derived from data
2 Only relevant data of good quality is useful
3 Information is only useful if it is understood
4 The steps that lead to understanding require skills that must be learned
5 The development of data-sensemaking skills involves a great deal of study and practice
6 Study and practice take time – lots of time
7 Technologies are needed to augment human data-sensemaking abilities
8 Only well-designed technologies are worth the investment of time and money
9 Skilled data sensemakers combine general analytical skills with specific domain knowledge; one without the other is not enough
Books and Articles
Breiman, Leo, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. Classification and Regression Trees. First
reprint ed. Boca Raton, FL: CRC Press, 1998.
Brown, Meta S. Data Mining for Dummies. Hoboken, NJ: Wiley, 2014.
Few, Stephen. Big Data, Big Dupe: A Little Book about a Big Bunch of Nonsense. El Dorado Hills, CA: Analytics Press,
2018.
Hastie, Trevor J., Robert John Tibshirani, and Jerome H. Friedman. The Elements of Statistical Learning: Data Mining,
Inference, and Prediction. 2nd ed. New York, NY: Springer, 2016.
Hosmer, David W., and Stanley Lemeshow. Applied Logistic Regression. 2nd ed. New York: Wiley, 2000.
Linoff, Gordon S., and Michael J.A. Berry. Data Mining Techniques For Marketing, Sales, and Customer Relationship
Management. 3rd ed. Indianapolis, IN: Wiley, 2011.
Olson, David L., and Yong Shi. Introduction to Business Data Mining. International ed. Boston, MA: McGraw-Hill, 2007.
Putler, Daniel S., and Robert E. Krider. Customer and Business Analytics: Applied Data Mining for Business Decision
Making Using R. International ed. Boca Raton, FL: CRC Press, 2012.
Ratner, Bruce. Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis
of Big Data. 3rd ed. Boca Raton, FL: CRC Press, 2017.
Siddiqi, Naeem. Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. Hoboken, NJ: Wiley,
2006.
Zicari, Roberto V. "Big Data: Challenges and Opportunities." In Big Data Computing, edited by Rajendra Akerkar, 104-28. Boca Raton, FL: CRC Press, 2014.
Online Resources
KDNuggets™ (https://www.kdnuggets.com/)
A leading site on Business Analytics, Big Data, Data Mining, Data Science, and Machine Learning and is edited by
Gregory Piatetsky-Shapiro and Matthew Mayo
IBM Big Data & Analytics Hub (http://www.ibmbigdatahub.com/)
Brought by IBM, the Hub is the home for current content and conversation regarding big data and analytics for the
enterprise from thought-leaders, subject matter experts and big data practitioners
IBM developerWorks Data and Analytics (https://www.ibm.com/developerworks/learn/analytics/)
A source of innovative how-tos, tools, code, and communities to help you solve the complex problems that you face
as a developer in an enterprise organization
SAS Big Data Insights (https://www.sas.com/en_id/insights/big-data.html)
Explore hot topics and trends in big data curated by SAS
Coursera (https://www.coursera.org)
Provides universal access to the world’s best education, partnering with top universities and organizations to offer
courses online
I would like to thank
Sani Susanto, Ph.D for inviting me to give
the talk
Bruce Ratner, Ph.D for suggesting the
outline of the presentation
Leap-Research.com
Leap Research
SOHO Podomoro City, Unit 18-05
Jl. Letjen S. Parman Kav. 28, Jakarta 11470
Quantitative Senior Research Director and
Partner
ts.lim@leap-research.com
+62 818 906 875

Weitere ähnliche Inhalte

Was ist angesagt?

Importance of Data Analytics
 Importance of Data Analytics Importance of Data Analytics
Importance of Data AnalyticsProduct School
 
Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An OverviewMachinePulse
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadhMithlesh Sadh
 
Ppt for Application of big data
Ppt for Application of big dataPpt for Application of big data
Ppt for Application of big dataPrashant Sharma
 
Analytics with Descriptive, Predictive and Prescriptive Techniques
Analytics with Descriptive, Predictive and Prescriptive TechniquesAnalytics with Descriptive, Predictive and Prescriptive Techniques
Analytics with Descriptive, Predictive and Prescriptive Techniquesleadershipsoil
 
The Importance of Data Visualization
The Importance of Data VisualizationThe Importance of Data Visualization
The Importance of Data VisualizationCenterline Digital
 
Big Data & Data Science
Big Data & Data ScienceBig Data & Data Science
Big Data & Data ScienceBrijeshGoyani
 
Tools and techniques adopted for big data analytics
Tools and techniques adopted for big data analyticsTools and techniques adopted for big data analytics
Tools and techniques adopted for big data analyticsJOSEPH FRANCIS
 
What’s The Difference Between Structured, Semi-Structured And Unstructured Data?
What’s The Difference Between Structured, Semi-Structured And Unstructured Data?What’s The Difference Between Structured, Semi-Structured And Unstructured Data?
What’s The Difference Between Structured, Semi-Structured And Unstructured Data?Bernard Marr
 
Business analytics workshop presentation final
Business analytics workshop presentation   finalBusiness analytics workshop presentation   final
Business analytics workshop presentation finalBrian Beveridge
 

Applications of Big Data Analytics in Businesses

  • 1. © 2018 Leap Research. All rights reserved. Prepared for: A Guest Lecture at SBM ITB, March 26, 2018
  • 2. Big Data and Big Data Analytics (3); A Word of Caution (10); Big Data Projects (14); Frameworks (20); Analytical Methods in Data Mining (25); Business Applications (44); Epilogue (49)
  • 3.
  • 4. noun, (used with a singular or plural verb) 1.Computers. data sets, typically consisting of billions or trillions of records, that are so vast and complex that they require new and powerful computational resources to process: Supercomputers can analyze big data to create models of global climate change. Reference: big data. Dictionary.com. Dictionary.com Unabridged. Random House, Inc. http://www.dictionary.com/browse/big-data (accessed: January 29, 2018) Big data is data sets that are so voluminous and complex that traditional data processing application software are inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating and information privacy. There are three dimensions to big data known as Volume, Variety and Velocity. Reference: https://en.wikipedia.org/wiki/Big_data (accessed: January 29, 2018)
  • 5. Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves. Reference: https://www.sas.com/en_id/insights/big-data/what-is-big-data.html (accessed: January 29, 2018) Regarding big data, there are two views. One view is that of classical statistics, which considers big as simply not small. Theoretically, big is the sample size after which asymptotic properties of the method "kick in" for valid results. The other view is that of contemporary statistics, which considers big with regard to lifting (mathematical calculation) the observations and learning from the variables. Reference: Ratner, Bruce. Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data. 3rd ed. Boca Raton, FL: CRC Press, 2017
  • 6. “Big Data” refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze. Reference: McKinsey Global Institute (MGI), Big Data: The next frontier for innovation, competition, and productivity, Report, June, 2012 Top recurring themes in the definitions provided by 43 thought leaders. Reference: Dutcher, Jennifer. "What Is Big Data?" University of California Berkeley School of Information. September 03, 2014. Accessed March 19, 2018. https://datascience.berkeley.edu/what-is-big-data
  • 8. Big data analytics examines large amounts of data to uncover hidden patterns, correlations and other insights. https://www.sas.com/en_id/insights/analytics/big-data-analytics.html Big data analytics refers to the strategy of analyzing large volumes of data, or big data. This big data is gathered from a wide variety of sources, including social networks, videos, digital images, sensors, and sales transaction records. The aim in analyzing all this data is to uncover patterns and connections that might otherwise be invisible, and that might provide valuable insights about the users who created it. Through this insight, businesses may be able to gain an edge over their rivals and make superior business decisions. https://www.techopedia.com/definition/28659/big-data-analytics Big Data analytics is the process of collecting, organizing and analyzing large sets of data (called Big Data) to discover patterns and other useful information. Big Data analytics can help organizations to better understand the information contained within the data and will also help identify the data that is most important to the business and future business decisions. Analysts working with Big Data typically want the knowledge that comes from analyzing the data. https://www.webopedia.com/TERM/B/big_data_analytics.html
  • 9. It’s important to remember that the primary value from big data comes not from the data in its raw form, but from the processing and analysis of it and the insights, products, and services that emerge from analysis. The sweeping changes in big data technologies and management approaches need to be accompanied by similarly dramatic shifts in how data supports decisions and product/service innovation. Thomas H. Davenport in Big Data in Big Companies
  • 10.
  • 11. Marketing / Retail: Data mining helps marketing companies build models based on historical data to predict who will respond to new marketing campaigns. Finance / Banking: Data mining gives financial institutions information about loans and credit reporting. Manufacturing: By applying data mining to operational engineering data, manufacturers can detect faulty equipment and determine optimal control parameters. Government: Data mining helps government agencies by digging through and analyzing records of financial transactions to build patterns that can detect money laundering or criminal activities. Privacy Issues: People are afraid that their personal information is collected and used in unethical ways that could cause them a lot of trouble, while businesses collect information about their customers in many ways to understand their purchasing behavior and trends. Security Issues: Businesses hold information about their employees and customers, including social security numbers, birthdays, payroll data, etc. Misuse of Information / Inaccurate Information: Information collected through data mining for ethical purposes can be misused. It may be exploited by unethical people or businesses to take advantage of vulnerable people or to discriminate against a group of people. (Read: “Ombudsman mengingatkan pemerintah agar serius lindungi data masyarakat” [Ombudsman reminds the government to take protecting citizens' data seriously]; “Cambridge Analytica: What is the company now embroiled in Facebook data controversy?”) Source: http://www.zentut.com/data-mining/advantages-and-disadvantages-of-data-mining/
  • 12. Source: https://www.forbes.com/sites/bernardmarr/2015/03/17/where-big-data-projects-fail/#72098e0468b5 Not starting with clear business objectives; not making a good business case; management failure; poor communication; not having the right skills for the job. What people who fall into that trap often failed to appreciate is that analytics in business is about problem solving – and first you need to know what problem you are trying to solve. You need to know why your business needs to use Big Data before you start doing it. If you don’t – wait until you do. Those holding the purse strings haven’t taken into account some long-term or ongoing cost associated with the project; the senior project managers just can’t talk productively to the data scientist workers in the lab; senior managers don’t trust the algorithms. Big Data in business is about the interface between the analytical, experimental science that goes on in data labs, and the profit and target chasing sales force and boardroom. Companies are often fond of starting up data projects “left, right and centre” without thinking enough about how this might impact resources in the future.
  • 13. Launched in 2008, Google Flu Trends (GFT) is Google’s algorithm which mined five years of web logs, containing hundreds of billions of searches, and created a predictive model utilizing 45 search terms that “proved to be a more useful and timely indicator [of flu] than government statistics with their natural reporting lags.” (Described in the 2013 book Big Data: A Revolution That Will Transform How We Live, Work and Think by Viktor Mayer-Schönberger and Kenneth Cukier.) In Science, a team of Harvard-affiliated researchers published their findings that GFT has over-estimated the prevalence of flu for 100 out of the last 108 weeks; it’s been wrong since August 2011. The Science article further points out that a simplistic forecasting model – a model as basic as one that predicts the temperature by looking at recent-past temperatures – would have forecasted flu better than GFT. Large datasets don’t guarantee valid datasets. Correlation is not causation! They were merely finding statistical patterns in the data. They cared about correlation rather than causation. This is common in big data analysis. Figuring out what causes what is hard (impossible, some say). Figuring out what is correlated with what is much cheaper and easier. (Source: https://www.ft.com/content/21a6e7d8-b479-11e3-a09a-00144feabdc0) Source: https://hbr.org/2014/03/google-flu-trends-failure-shows-good-data-big-data
  • 14.
  • 15. Getting Ahead of Rodents in Chicago, Illinois, USA: Analytics developed by the advanced analytics unit in the city of Chicago office enable the city to focus its limited resources more efficiently on the control of rodents, an important disease vector for both humans and pets. Using citizen complaints, these analytics generate a heat map showing rodent hotspots as they develop across the city. Working with researchers from Carnegie Mellon University's Event and Pattern Detection Laboratory, the team developed a spatiotemporal model to identify correlations among 350 different factors and spikes in rodent complaints across the city. The model identified 31 factors that predict when and where rodent complaints are most likely to occur over the subsequent week. (Source: https://www.ncbi.nlm.nih.gov/books/NBK401950/) Transforming Patient Care at the University of Iowa Hospitals and Clinics, Iowa, USA: Surgeons at the University of Iowa Hospitals and Clinics needed to know if patients were susceptible to infections in order to make critical treatment decisions in the operating room. The goal of the project was to predict which patients had the biggest risk of surgical site infections, in order to reduce the infection rate, improve patient care, and decrease costs. Reducing the infection rate has major implications for overall patient health and cost savings. The solution merged historical electronic health record data and live patient vital signs to predict infection likelihood, and provided doctors with real-time, predictive decisions during surgical procedures so they can create a plan to reduce risk. The surgical team harnessed the power of big data analytics, coupled with other methods, to keep patients safe, reducing surgical site infections by 58 percent while decreasing the cost of care and decreasing the incidence of readmissions by 40%. (Source: https://insidebigdata.com/2016/11/16/case-studies-big-data-and-healthcare-life-sciences/)
  • 16. In businesses, big data is usually concerned with database marketing. Database marketing is the process of building, maintaining, and using customer databases and other databases for the purposes of contacting and transactingⱡ. Database marketing applications have 3 broad categories: 1. Selling products and services to new customers. 2. Selling additional products and services to existing customers. 3. Monitoring and maintaining existing customer relationships. The underlying technology driving database marketing efforts is data mining. ⱡ Peppers, Don, and Martha Rogers. The One to One Future: Building Business Relationships One Customer at a Time. London: Piatkus, 1993.
  • 17. Database Marketing Applications. Reference: Putler and Krider (2012). Prospecting for new customers: • Customer acquisition through various channels. Qualifying those potential new customers once they have been found: • Through activities such as credit scoring.
  • 18. Database Marketing Applications. Reference: Putler and Krider (2012). Cross-selling: targeting a current customer in order to sell a product or service to that customer that is different from the products or services that the customer has previously purchased from the organization. Up-selling: targeting an offer to an existing customer to upgrade the product or service he / she is currently purchasing from an organization. Market basket analysis: examining the composition of items in customers’ “baskets” on single purchase occasions to find merchandising opportunities that could lead to additional product sales. Example: the famous “beer and diaper” story (https://tdwi.org/articles/2016/11/15/beer-and-diapers-impossible-correlation.aspx)
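Market basket analysis is commonly summarized with three measures for a rule "A → B": support (share of baskets containing both items), confidence (share of baskets with A that also contain B), and lift (confidence divided by the overall share of baskets containing B). The sketch below computes them on a handful of made-up baskets; the data and the function name `rule_metrics` are illustrative assumptions, not taken from the slides or from Putler and Krider.

```python
# Hypothetical baskets, one set of items per purchase occasion (illustrative only).
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Assumes both items occur in at least one basket, so no division by zero.
    """
    n = len(baskets)
    count_a = sum(1 for b in baskets if antecedent in b)
    count_c = sum(1 for b in baskets if consequent in b)
    count_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    support = count_both / n
    confidence = count_both / count_a
    lift = confidence / (count_c / n)
    return support, confidence, lift


support, confidence, lift = rule_metrics(baskets, "beer", "diapers")
print(support, confidence, lift)
```

A lift above 1 suggests the two items appear together more often than they would if purchases were independent; with so few baskets, the numbers here are for illustration only.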
  • 19. Database Marketing Applications. Reference: Putler and Krider (2012). Customer attrition (churn) analysis: to find patterns in a current customer’s purchase, usage, and/or complaint behavior that suggest that the customer is about to become an ex-customer. Customer segmentation: grouping customers into segments based on their past purchase behavior allows the organization to develop customized promotions and communications for each segment. Recommendation systems: group products based on which customers have bought them and then make recommendations based on the overlap of the buyers of two or more products, e.g., “Your Recommendations” in Amazon.com. Fraud detection: allows an organization to uncover customers who are engaged in fraudulent behavior against it; protects customer and enterprise information, assets, accounts and transactions through the real-time, near-real-time or batch analysis of activities by users and other defined entities (such as kiosks) (https://www.gartner.com/it-glossary/fraud-detection)
  • 20.
  • 21. Source: https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining A data mining process model that describes commonly used approaches that data mining experts use to tackle problems. Polls conducted at one and the same website (KDNuggets) in 2002, 2004, 2007 and 2014 show that it was the leading methodology used by industry data miners who decided to respond to the survey.
  • 22. Reference: Chapman, Pete, Julia Clinton, Randy Kerber, Thomas Khabaza, Thomas Reinartz, Colin Shearer, and Rüdiger Wirth. CRISP-DM 1.0: step-by-step data mining guide. Chicago, IL: SPSS, 2000. [Figure labels] Time spent: 10-20%, 20-30%, 50-70%, 10-20%, 10-20%, 5-10%
  • 25.
  • 26. Data Mining = Statistics + Big Data + Machine Learning & Lifting. Data mining is all of statistics and exploratory data analysis for big and small data with the power of the PC for lifting data and learning the structures within the data. Reference: Ratner (2017)
  • 27. Reference: Ratner (2017). 1. Statistics with emphasis on a proper exploratory data analysis • This includes using the descriptive and noninferential parts of classical statistical machinery as indicators. The parts include the sum of squares, degrees of freedom, F-ratios, chi-squared values, and p-values but exclude inferential conclusions. 2. Big data • Big data are given special mention because of today’s digital environment. However, because small data are a component of big data, they are not excluded. 3. Machine learning & lifting • The PC is the learning machine, the essential processing unit, having the ability to learn without being explicitly programmed and the intelligence to find structure in the data. Moreover, the PC is essential for big data because it can always do what it is explicitly programmed to do.
  • 28. Predictive Modeling Methods: Linear and Logistic Regression; Decision Trees & Rules; Artificial Neural Networks; Time series analysis. Grouping Methods: K-Means; K-Nearest Neighbor; Hierarchical agglomerative methods; Self-organizing maps. Other Methods: Data description, summarization, visualization; Dependency analysis; Text mining; Web log mining.
  • 29. Open Source: R (https://cran.r-project.org); Microsoft R Open (https://mran.revolutionanalytics.com). Commercial: SAS® Enterprise Miner™ (https://www.sas.com/en_id/software/enterprise-miner.html); IBM SPSS Modeler (https://www.ibm.com/id-en/marketplace/spss-modeler)
  • 30. Predictive Modeling Methods. Linear regression estimates the relationship between several independent variables and a dependent variable. The dependent variable is of quantitative (numerical) type with ordinal, interval, or ratio measurement level. Examples: amount of loan, savings account monthly balance, revenue, sales. The independent variables can be of qualitative (categorical) or quantitative (numerical) type. Examples of qualitative (categorical) variables: marital status, highest completed education level, city, housing ownership. Mathematical equation: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$, where $y$ is the dependent variable; $x_1, x_2, \dots, x_k$ are the independent variables; $\beta_0, \beta_1, \beta_2, \dots, \beta_k$ are the model parameters that need to be estimated from the data; and $\varepsilon$ is the random error component of the model (for statistical inference purposes it is assumed to be normally distributed with mean 0 and constant variance $\sigma^2$). The model parameters are estimated by the least squares method.
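For the single-predictor case ($k = 1$), the least squares estimates have a closed form: $\hat\beta_1 = \sum (x_i - \bar x)(y_i - \bar y) / \sum (x_i - \bar x)^2$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. The sketch below applies these formulas to made-up data; the function name and the numbers are illustrative assumptions, and the general $k$-variable case would normally be fitted with statistical software such as the tools listed earlier.

```python
def fit_simple_ols(x, y):
    """Least squares estimates (beta0, beta1) for y = beta0 + beta1 * x + error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


# Hypothetical data: x could be advertising spend, y could be sales.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

beta0, beta1 = fit_simple_ols(x, y)
residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
print(round(beta0, 2), round(beta1, 2))
```

With an intercept in the model, the least squares residuals sum to zero, which is a quick sanity check on any fit.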
  • 31. Predictive Modeling Methods. Source: https://en.wikipedia.org/wiki/Linear_regression [Figure] In linear regression, the observations (red) are assumed to be the result of random deviations (green) from an underlying relationship (blue) between the dependent variable (y) and independent variable (x). [Figure] The data sets in Anscombe's quartet are designed to have the same linear regression line (as well as identical means, standard deviations, and correlations) but are graphically very different. This illustrates the pitfalls of relying solely on a fitted model to understand the relationship between variables.
  • 32. Based on IBM SPSS Modeler Predictive Modeling Methods
  • 33. Predictive Modeling Methods. Logistic regression is a type of regression used when the dependent variable is binary or ordinal. The outcome is either “success” or “failure”; e.g., buy or not buy, churn or not churn. The independent variables can be either numerical or categorical. It is commonly used for predicting the probability of occurrence of an event or outcome, based on several independent variables of categorical or numerical type. Examples: probability of a prospect buying a new insurance policy, probability of a current bank customer accepting a new credit card offer, probability of a customer closing his/her bank account. The model is based on the binomial distribution. Mathematical equation: let the conditional probability that the outcome is present be denoted by $P(Y = 1 \mid \mathbf{x}) = \pi(\mathbf{x})$. The logit of the logistic regression model is given by the equation $g(\mathbf{x}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$. The logistic regression model is $\pi(\mathbf{x}) = \dfrac{e^{g(\mathbf{x})}}{1 + e^{g(\mathbf{x})}}$. The model parameters are estimated by the maximum likelihood method.
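The relationship between the logit $g(\mathbf{x})$, the probability $\pi(\mathbf{x})$, and the quantity that maximum likelihood maximizes can be sketched in a few lines. The coefficients and inputs below are hypothetical, chosen only to show the arithmetic; the sketch evaluates the model and its log-likelihood but does not perform the estimation itself.

```python
import math


def linear_predictor(beta, x):
    """g(x) = beta0 + beta1*x1 + ... + betak*xk (beta[0] is the intercept)."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


def probability(beta, x):
    """pi(x) = e^g / (1 + e^g), written in the equivalent form 1 / (1 + e^-g)."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(beta, x)))


def log_likelihood(beta, rows, outcomes):
    """Log-likelihood of binary outcomes; maximum likelihood picks beta to maximize it."""
    total = 0.0
    for x, y in zip(rows, outcomes):
        p = probability(beta, x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


# Hypothetical coefficients and one customer with two predictors.
beta = [-1.0, 0.8, 0.5]
p = probability(beta, [2.0, 1.0])   # g = -1.0 + 0.8*2.0 + 0.5*1.0 = 1.1

# With all coefficients at zero, every predicted probability is 0.5.
null_ll = log_likelihood([0.0, 0.0, 0.0], [[2.0, 1.0], [0.5, 3.0]], [1, 0])
print(round(p, 3), round(null_ll, 3))
```

In practice an iterative optimizer searches for the coefficients that make `log_likelihood` as large as possible on the training data.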
  • 34. Predictive Modeling Methods Based on IBM SPSS Modeler
  • 35. Predictive Modeling Methods. Decision trees relax the assumption about the probability distribution of the data; hence, it’s a distribution-free method. It works by partitioning the data set recursively into a set of “purer” regions. The dependent and independent variables can be either numerical or categorical. [Figure] A tree structured classification rule to identify high risk patients (those who will not survive at least 30 days) on the basis of the initial 24-hour data. The letter F means not high risk; G means high risk. Reference: Breiman et al., 1998
  • 36. The entire construction of a tree, then, revolves around three elements: 1. The selection of the splits 2. The decisions when to declare a node terminal or to continue splitting it 3. The assignment of each terminal node to a class (classification tree) or the estimation of the value of the dependent variable in the terminal node (regression tree) The crux of the problem is how to use the data L to determine the splits, the terminal nodes, and their assignments. It turns out that the class assignment problem is simple. The whole story is in finding good splits and in knowing when to stop splitting. Predictive Modeling Methods
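The first element, the selection of splits, can be illustrated with Gini impurity, one common splitting criterion for classification trees. The sketch below searches the thresholds of a single numerical variable for the split that yields the "purest" child nodes. The data are made up (only the F / G labels echo the figure on the previous slide), and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best split 'value <= threshold'.

    Returns (None, parent_gini) if no split improves on the unsplit node.
    """
    n = len(values)
    best = (None, gini(labels))
    candidates = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical patients: a blood pressure reading and a risk label
# (G = high risk, F = not high risk).
pressure = [85, 88, 90, 95, 110, 120, 130]
risk = ["G", "G", "G", "F", "F", "F", "F"]

threshold, impurity = best_split(pressure, risk)
print(threshold, impurity)
```

A full tree builder would apply this search to every variable at every node and combine it with a stopping or pruning rule, which is the second element listed above.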
  • 37. Based on IBM SPSS Modeler Predictive Modeling Methods
  • 38. Source: Hastie and Tibshirani (2016) Predictive Modeling Methods Neural network diagram with a single hidden layer. The hidden layer derives transformations of the inputs – nonlinear transformations of linear combinations—which are then used to model the output. Neural network diagram with three hidden layers and multiple outputs
  • 39. Predictive Modeling Methods Based on IBM SPSS Modeler
  • 40. Grouping Methods. Key features of the TwoStep Cluster Analysis procedure: Handling of categorical and continuous variables: by assuming variables to be independent, a joint multinomial-normal distribution can be placed on categorical and continuous variables. Automatic selection of number of clusters: by comparing the values of a model-choice criterion across different clustering solutions, the procedure can automatically determine the optimal number of clusters. Scalability: by constructing a cluster features (CF) tree that summarizes the records, the TwoStep algorithm allows you to analyze large data files. In order to handle categorical and continuous variables, the TwoStep Cluster Analysis procedure uses a likelihood distance measure which assumes that variables in the cluster model are independent. Further, each continuous variable is assumed to have a normal (Gaussian) distribution and each categorical variable is assumed to have a multinomial distribution. Source: https://www.ibm.com/support/knowledgecenter/en/SSLVMB_22.0.0/com.ibm.spss.statistics.cs/spss/tutorials/twostepcluster_table.htm
  • 42. Grouping Methods Based on IBM SPSS Modeler
  • 43. Grouping Method: An Example of a Dendrogram Analysis. Data came from Global Market Research 2016: An ESOMAR Industry Report. Amsterdam: ESOMAR, 2016. Using a dendrogram analysis based on the detailed data from 52 countries and setting the number of groups to five, Indonesia appears to share a similar profile with the Philippines and Vietnam in the South East Asia region. Malaysia turns out to be closer to China, while Singapore is closer to Hong Kong, South Korea, and Taiwan. They’re all grouped together with Germany, the 3rd largest market. The dendrogram was derived using the Euclidean distance measure and the “complete linkage” clustering method. Analysis was done in R.
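The clustering behind such a dendrogram can be sketched as follows: start with every item in its own cluster, then repeatedly merge the two clusters whose farthest members are closest (complete linkage), measuring distance with the Euclidean metric, until the requested number of groups remains. This is a plain-Python illustration on made-up two-feature profiles with hypothetical labels; it is not the R analysis or the ESOMAR data used on the slide.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def complete_linkage(points, n_clusters):
    """Agglomerative clustering; returns a list of clusters (lists of labels)."""
    clusters = [[name] for name in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the farthest member pair.
                d = max(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical two-feature profiles for five items.
profiles = {
    "A": (1.0, 1.0),
    "B": (1.2, 0.9),
    "C": (5.0, 5.1),
    "D": (5.2, 4.8),
    "E": (9.0, 1.0),
}

groups = complete_linkage(profiles, 3)
print(groups)
```

Recording the distance at each merge would give the heights needed to draw the dendrogram; cutting it at a chosen number of groups corresponds to the `n_clusters` argument here.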
  • 44.
  • 45. Banking: • Cross-selling and up-selling • Credit scoring for home loan application • Credit scoring for credit card application • Customer segmentation based on transactional data (via ATM, mobile/SMS/Internet banking, teller) • Churn analysis (predicting if an account would become inactive/dormant). Information and Communication Technology: • Cross-selling and up-selling • Churn analysis (predicting if the customer would stop using the service). Financing Company: • Cross-selling and up-selling of two- and four-wheelers • Churn analysis (predicting if the loan would be fully paid before the end of the contract) • Credit scoring for two- and four-wheeler loan application • Credit scoring for refinancing application
  • 46. Objective: to predict the likelihood of an account becoming inactive / dormant in the next 6 months. Challenges faced: how to define “inactive” or “dormant” (what time period should qualify an account as “inactive” / “dormant”?); unclean data, e.g., wrong dates in the database, wrong account status; merging customer profile data and transactional data; customer complaint data were not recorded completely. Variables used in the analysis include the following: customer’s demographic data; product ownership with the bank (savings, investment, credit card, secured and unsecured loan); customer’s detailed transactions; customer’s monthly balance. Methods employed: decision trees; logistic regression.
  • 47. Objective: to predict the likelihood that a loan (for a two- or four-wheeler) would be fully paid before the end of the contract. Variables used in the analysis include the following: customer’s demographic data; type of customer (new or recurring); info on the current contract (new or used vehicle, amount of loan, loan tenor, amount of down payment); monthly payment data; delinquency record. Methods employed: decision trees; logistic regression. Challenges faced: after investigating many variables and fitting many models, we found that Lebaran appeared to have a significant impact on payment behavior. Hence, we created a dummy variable to indicate the month before Lebaran and the month of Lebaran to improve the prediction.
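A dummy variable of this kind can be derived from the payment month with a small lookup. The sketch below is only an illustration of the idea: the lookup table `LEBARAN_MONTH`, the function name, and the choice of years are assumptions made here, not the variables used in the actual project.

```python
# Calendar month in which Lebaran (Eid al-Fitr) fell, per year.
# Assumed lookup for illustration; extend and verify for the years in your data.
LEBARAN_MONTH = {2016: 7, 2017: 6, 2018: 6}


def lebaran_dummy(year, month):
    """Return 1 for the month of Lebaran and the month before it, else 0.

    Assumes Lebaran does not fall in January for the years covered,
    so 'the month before' stays within the same calendar year.
    """
    lebaran = LEBARAN_MONTH[year]
    return 1 if month in (lebaran - 1, lebaran) else 0


# Flag each monthly payment record (year, month) of a hypothetical contract.
payment_months = [(2017, 4), (2017, 5), (2017, 6), (2017, 7)]
flags = [lebaran_dummy(y, m) for y, m in payment_months]
print(flags)
```

The resulting 0/1 column can then be added to the other predictors before fitting the decision tree or logistic regression model.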
  • 48. Objective: to predict the likelihood that a current customer would accept a new loan offer (for a two- or four-wheeler) when the current loan contract has expired. Variables used in the analysis include the following: customer’s demographic data; type of customer (new or recurring); info on the current contract (new or used vehicle, amount of loan, loan tenor, amount of down payment); monthly payment data; delinquency record. Methods employed: decision trees; logistic regression. Challenges faced: the prediction accuracy was poor (close to 50%), most likely due to irrelevant variables used in the model. It appeared that payment behavior data was not correlated with the likelihood of accepting a new loan offer. It could be more fruitful to conduct a marketing research project to test several loan concepts and offer the winning concept to the segments of customers showing the highest interest in it.
  • 49.
  • 50. It’s all about the (explicit) objectives! Data deep dive by solid data sensemaking* skills, augmented by good technologies. *A term coined by Stephen Few in his book Big Data, Big Dupe (2018)
  • 51. Proposal to mine value from data by Stephen Few in Big Data, Big Dupe (2018): 1 Useful information can be derived from data 2 Only relevant data of good quality is useful 3 Information is only useful if it is understood 4 The steps that lead to understanding require skills that must be learned 5 The development of data-sensemaking skills involves a great deal of study and practice 6 Study and practice take time – lots of time 7 Technologies are needed to augment human data-sensemaking abilities 8 Only well-designed technologies are worth the investment of time and money 9 Skilled data sensemakers combine general analytical skills with specific domain knowledge; one without the other is not enough
  • 52. Books and Articles Breiman, Leo, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. Classification and Regression Trees. First reprint ed. Boca Raton, FL: CRC Press, 1998. Brown, Meta S. Data Mining for Dummies. Hoboken, NJ: Wiley, 2014. Few, Stephen. Big Data, Big Dupe: A Little Book about a Big Bunch of Nonsense. El Dorado Hills, CA: Analytics Press, 2018. Hastie, Trevor J., Robert John Tibshirani, and Jerome H. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York, NY: Springer, 2016. Hosmer, David W., and Stanley Lemeshow. Applied Logistic Regression. 2nd ed. New York: Wiley, 2000. Linoff, Gordon S., and Michael J.A. Berry. Data Mining Techniques For Marketing, Sales, and Customer Relationship Management. 3rd ed. Indianapolis, IN: Wiley, 2011. Olson, David L., and Yong Shi. Introduction to Business Data Mining. International ed. Boston, MA: McGraw-Hill, 2007. Putler, Daniel S., and Robert E. Krider. Customer and Business Analytics: Applied Data Mining for Business Decision Making Using R. International ed. Boca Raton, FL: CRC Press, 2012.
  • 53. Books and Articles Ratner, Bruce. Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data. 3rd ed. Boca Raton, FL: CRC Press, 2017. Siddiqi, Naeem. Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. Hoboken, NJ: Wiley, 2006. Zicari, Roberto V. "Big Data: Challenges and Opportunities." In Big Data Computing, edited by Rajendra Akerkar, 104-28. Boca Raton, FL: CRC Press, 2014.
  • 54. Online Resources KDNuggets™ (https://www.kdnuggets.com/) A leading site on Business Analytics, Big Data, Data Mining, Data Science, and Machine Learning and is edited by Gregory Piatetsky-Shapiro and Matthew Mayo IBM Big Data & Analytics Hub (http://www.ibmbigdatahub.com/) Brought by IBM, the Hub is the home for current content and conversation regarding big data and analytics for the enterprise from thought-leaders, subject matter experts and big data practitioners IBM developerWorks Data and Analytics (https://www.ibm.com/developerworks/learn/analytics/) A source of innovative how-tos, tools, code, and communities to help you solve the complex problems that you face as a developer in an enterprise organization SAS Big Data Insights (https://www.sas.com/en_id/insights/big-data.html) Explore hot topics and trends in big data curated by SAS Coursera (https://www.coursera.org) Provides universal access to the world’s best education, partnering with top universities and organizations to offer courses online
  • 55. I would like to thank Sani Susanto, Ph.D., for inviting me to give the talk, and Bruce Ratner, Ph.D., for suggesting the outline of the presentation.
  • 56. Leap Research: Leap-Research.com, Leap.Research, LeapResearchID. SOHO Podomoro City, Unit 18-05, Jl. Letjen S. Parman Kav. 28, Jakarta 11470. Quantitative Senior Research Director and Partner: ts.lim@leap-research.com, +62 818 906 875