Overview of BA Discussion
• Business Analytics (BA)
  • Overview
  • History
  • Types of Business Analytics
  • Real world examples
  • Challenges
  • Relations to Data Mining
Business Analytics (BA): an overview
• BA can be considered a subset of business intelligence
• A set of skills, technologies, applications and practices
  • For the exploration and investigation of past business performance to gain insight and drive business planning
• Like business intelligence, BA can focus either on the business as a whole or only on segments of it
• Focuses on developing new insights and understanding of performance based on data and statistical methods
BA: Short History
• Analytics in business long predates computing
  • Frederick Taylor, father of scientific management, 19th century
    • Time management exercises used in industrial settings
  • Henry Ford: assembly line pacing used to improve output and business profitability
• BA became widespread when computers were used in decision support systems (DSS) in the 1960s
  • Evolved into ERP, data warehouses, etc.
Types of Business Analytics
• Reporting or descriptive analytics
• Affinity grouping
• Clustering
• Modeling or predictive analytics
BA: Reporting
• Based on the need to locate and distribute business insights and experiences
• Often involves ETL (extract, transform, load) procedures used alongside a data warehousing scheme
  • The data is then collected, quantified, and organized using reporting tools
• Reporting allows information describing different views of an enterprise to come together in one place
  • A user could query a production and marketing database to determine whether production of a product could be moved closer to where the product is sold
BA: Affinity grouping
• A tool used by businesses and organizations to take ideas and data and organize them
• Often takes the form of an affinity diagram
• Enables data and ideas stemming from brainstorming to be sorted into groups
  • Sorting is based on their natural relationships
BA: Clustering
• "Placing a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters" (Wikipedia)
• A main task of exploratory data mining and statistical data analysis
• Clustering is a general task that does not have one set solution
• Clustering can be hard (each object belongs to exactly one cluster) or fuzzy (each object belongs to clusters to some degree)
• Can be done by people or machines
  • The latter is preferred
BA: how do we model clusters?
• Connectivity models: how data points can be connected to other points
• Density models: defining a cluster by determining where sets of data points are densest
• Distribution models: clusters are modeled using statistical distributions
  • Expectation maximization
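To make the connectivity idea concrete, here is a minimal sketch in Python. It assumes a simple rule that is not stated on the slide: two points belong to the same cluster whenever a chain of points links them, with each hop no longer than a chosen threshold. The function name `connectivity_clusters`, the `max_gap` parameter, and the sample points are all hypothetical.

```python
from math import dist

def connectivity_clusters(points, max_gap):
    """Group points that are linked by hops of at most max_gap (single linkage)."""
    parent = list(range(len(points)))  # union-find forest, one node per point

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the groups of every pair of points that lie close enough together.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= max_gap:
                parent[find(i)] = find(j)

    clusters = {}
    for i, point in enumerate(points):
        clusters.setdefault(find(i), []).append(point)
    return sorted(clusters.values())

# Hypothetical data: two tight groups and one isolated point.
points = [(1.0, 1.0), (1.5, 1.2), (1.2, 0.8), (8.0, 8.0), (8.3, 7.9), (4.0, 15.0)]
clusters = connectivity_clusters(points, max_gap=1.0)
for cluster in clusters:
    print(cluster)
```

The pairwise loop is quadratic in the number of points, which is fine for illustration but would likely need an index or a library implementation for large data sets.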
BA: Predictive Analysis
• Stems from the desire to predict future events by analyzing data an enterprise has collected
• Pattern exploitation results in the identification of opportunities as well as risks
• Allows relationships in disparate data to be identified
• Helps guide decision making in a business
• Is often implemented in the form of data mining
BA: Examples
• Credit company: uses business analytics to track the credit risk of customers as well as to match customers to offerings
• Sales and offers: companies can track customer interaction and use that information to determine appropriate product offerings
  • Sales groups can use BA to optimize inventory and analyze past sales
    • Could measure peak purchasing times for products
    • Could decide whether or not to stock poorly selling items
• Give examples of business cases where data mining might be useful, and describe how data mining would be used
  • Preventing credit card fraud by detecting spending patterns
  • Inventory management by tracking sales
BA: Challenges
• Acquiring sufficient volumes of high quality data
  • Most data acquired in the field is unsorted and appears in many different formats
  • When dealing with high volume data, deciding what is important and what is noise
• Rapidly reacting storage structures
  • BA can influence customer interactions, so the information must be available fast
    • Ex: a customized sales pitch
Business Analytics & Data Mining
• Data mining is an important subtask of business analytics
  • Both predictive analysis and clustering tasks use information retrieved from data mining
• Data mining helps handle some of the specific problems faced when conducting business analytics
  • Dealing with and sorting through large data sets
Data Mining: An Overview
• What is data mining?
• History
• Applications of data mining
  • Detecting data discrepancies or outliers
  • Relationship identification
  • Data-function mapping for modeling/prediction
  • Categorizing and summarizing data
• Standards
• Challenges
Data Mining: What is it?
• Applying statistical analysis techniques to data
  • The goal is often to uncover unnoticed patterns or to collect categorized information
  • Turns collected data into understandable structures
• "Data mining" is often used as a buzzword to describe processing large amounts of data
  • In essence, its correct use relates to the discovery of new things through observation
  • Often treated as synonymous with knowledge discovery
Data Mining: History
• Though the term only came into use around 1990 (HNC had trademarked the related phrase "database mining"), hands-on pattern extraction is centuries old
  • It has existed as long as statistical analysis has
• Advances in computer science have increasingly shifted the field from hands-on to machine-dependent work, which allows for:
  • The use of data indexing and DB systems to handle data efficiently
  • The application of statistical algorithms on a large scale, possibly in a distributed manner, with less error
Data Mining: Use: Application
• Data mining is often broken into several different categories of tasks
  • Detecting data discrepancies or outliers
  • Relationship identification
  • Data-function mapping for modeling/prediction
  • Categorizing and summarizing data
Data Mining: Finding outliers
• The process of analyzing large, mostly homogeneous sets of data and determining which sets or points
  • "go with the flow" and conform to the patterns the rest of the data seem to follow
  • do not follow expected results when viewed against the entire set of data
• An outlier can be a point or set of points, but can also be defined through other means
  • A period of time could yield unexpected results
  • Ex: network intrusion
Data Mining: Techniques in finding outliers
• Rule based: deciding on a set of rules that determine what is an outlier (or what isn't one)
  • Can be fuzzy or hard rules
• Cluster analysis: as mentioned earlier
• Distance or standard deviation: determining an average over a data set and marking points that aren't within a given deviation or distance of it
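The standard deviation technique can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cutoff of two standard deviations, the function name `find_outliers`, and the login counts are hypothetical choices, not values from the slides.

```python
from statistics import mean, pstdev

def find_outliers(values, max_deviations=2.0):
    """Return the values lying more than max_deviations standard deviations from the mean."""
    centre = mean(values)
    spread = pstdev(values)  # population standard deviation of the data set
    if spread == 0:
        return []  # all values identical, so nothing can stand out
    return [v for v in values if abs(v - centre) > max_deviations * spread]

# Hypothetical daily login counts with one unusual burst of activity.
daily_logins = [102, 98, 101, 97, 103, 99, 100, 250]
outliers = find_outliers(daily_logins)
print(outliers)
```

Note that an extreme value also inflates the mean and the standard deviation it is measured against, so in practice a more robust centre (such as the median) may be preferable.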
Applications of Outlier Detection
• Network intrusion detection
  • Unusual bursts of network activity
• Identity theft detection
  • Unusual spending or customer activity
• Detecting software bugs
  • Software does not deliver expected outputs
• Sensor event detection
  • Monitoring patient health fluctuations in a medical setting
• Preprocessing
  • Removing data skews based on extenuating circumstances
Relationship Discovery: Basics
• Understanding how data is related is a key factor in trend and knowledge discovery
  • This is at the heart of data mining
  • Ex: which products are often bought before a major forecasted storm
  • {hamburger buns} => {???}
• With small sets of data, or with correlations that aren't subtle (like the one above), identifying relationships is not as difficult
• With large data sets or subtle relations, a combination of rule generation and data analysis can be used to expedite the process
Relationship Discovery: How it's done
• Since the number of possible relationships between points of data could be boundless, two important concepts are often introduced in relationship discovery:
  • The share of the data in which the items of a rule occur together, called the support of the rule
  • The probability that data matching the left-hand side of a rule also matches its right-hand side, called the confidence of the rule
Relationship Discovery: How it's done
• Generally we apply minimum bounds to both the support of a rule and its confidence to determine relationships
• First: determine possible relationships
  • Set a minimum support
    • Orders with hamburgers, orders with hamburger buns
  • Other, user-specific rules can be used here
• Second: take the remaining sets and look for patterns in the item sets whose occurrence rate is above the minimum confidence
  • How many people bought hamburgers and buns together
• Ex: we find that if the customer is male and buys diapers, he is likely to also buy beer
  • {male, diapers} => {beer}
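Support and confidence can be computed directly from a list of orders. The sketch below uses the standard definitions (support as the fraction of orders containing all items of a rule, confidence as the support of the whole rule divided by the support of its left-hand side). The order data and the function names are hypothetical and exist only for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction also containing `consequent`."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies, so it has no evidence behind it
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical orders; each order is the set of items bought together.
orders = [
    {"hamburgers", "hamburger buns", "ketchup"},
    {"hamburgers", "hamburger buns"},
    {"hamburgers", "beer"},
    {"hamburger buns", "hot dogs"},
    {"diapers", "beer"},
]

# Rule under test: {hamburgers} => {hamburger buns}
rule_support = support(orders, {"hamburgers", "hamburger buns"})
rule_confidence = confidence(orders, {"hamburgers"}, {"hamburger buns"})
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

A rule would then be kept only if both numbers clear the minimum bounds chosen by the analyst.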
Matching data to functions
• Often, it is desirable to match data sets, and the factors that determine them, to functions
  • Allows for the possibility of predicting future results
• Involves learning how dependent and independent variables in our data interact
  • Dependent: the result, or where a point exists
  • Independent: a cause or circumstance that determines the dependent variable
• If we know how dependent and independent variables interact, we can create a function and run simulations to see results
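One simple way to match data to a function is to fit a straight line by least squares. The slides do not name a specific technique, so this is a swapped-in illustration; the variable names and the advertising figures are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend (independent) and units sold (dependent).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(ad_spend, units_sold)

# "Run a simulation": predict the result for a spend level not yet observed.
predicted = slope * 6.0 + intercept
print(slope, intercept, predicted)
```

Real data rarely falls exactly on a line, so the fitted function should be checked against held-back observations before its predictions are trusted.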
Uses of Function-Data Mapping
• Weather forecasting
  • Determining what conditions lead to what kinds of weather
• Stock market analysis
  • When to buy and when to sell
• Crime prevention
  • What conditions cause or prevent crime
Categorizing
• Categorizing: often we want to separate data based on a set of predefined attributes
  • Very helpful in pattern recognition
  • Ex: a person's political preference
• The process:
  • We synthetically generate or measure a set of observations (data points) with known categories
  • We extract properties from those observations which we believe contribute to the category
    • These are called explanatory variables
  • Finally, we examine new data for these properties
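The three-step process above can be sketched with a nearest-centroid classifier, one of the simplest categorizers. The choice of method, the customer segments, and the two explanatory variables (monthly spend and number of visits) are assumptions made for illustration only.

```python
from math import dist

def train_centroids(labelled):
    """Average the explanatory variables of the observations in each known category."""
    sums, counts = {}, {}
    for features, label in labelled:
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, value in enumerate(features):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}

def classify(centroids, features):
    """Assign new data to the category whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Step 1: observations with known categories (hypothetical customers).
# Step 2: explanatory variables are (monthly spend, visits per month).
training = [
    ((25, 1), "occasional"), ((30, 2), "occasional"),
    ((200, 12), "frequent"), ((220, 15), "frequent"),
]
centroids = train_centroids(training)

# Step 3: examine new data for the same properties.
label = classify(centroids, (180, 10))
print(label)
```

Because plain Euclidean distance is used, variables on larger scales dominate the result, so normalizing the explanatory variables first is usually advisable.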
Summarizing
• Summarizing: we almost never want to look at all of the data individually
  • Having too much data can actually hinder the decision making process
    • Known as information overload
• Summarizing takes the results from data mining and transforms them into formats that can be easily read without omitting important information
• Summarizing might:
  • Extract and display only important data
  • Correlate and abstract data to display trends
• Formats include: reports, graphs, dashboards, etc.
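As a small illustration of summarizing, the sketch below collapses individual sales records into one total per region, ranked from highest to lowest, which is the kind of figure a report or dashboard might show. The records and the function name `summarize_sales` are hypothetical.

```python
from collections import defaultdict

def summarize_sales(records):
    """Collapse individual (region, amount) records into per-region totals, largest first."""
    totals = defaultdict(float)
    for region, amount in records:
        totals[region] += amount
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical raw records; a real data set would have far too many to read one by one.
sales = [
    ("north", 120.0), ("south", 80.0), ("north", 60.0),
    ("east", 150.0), ("south", 20.0),
]
summary = summarize_sales(sales)
for region, total in summary:
    print(f"{region:<6} {total:>8.2f}")
```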
Standards: CRISP-DM
• Cross Industry Standard Process for Data Mining
  • Describes common practice for conducting data mining in an enterprise setting
• KDnuggets, a community resource for data mining and analytics, took polls and found CRISP-DM was the top methodology in 2002, 2004, and 2007
• Six-step methodology
  • Business Understanding
  • Data Understanding
  • Data Preparation
  • Modeling
  • Evaluation
  • Deployment
CRISP-DM: Explained
• Business Understanding
  • Determining the business purpose
  • Define success conditions: how do we know we succeeded?
    • Ex: improved prediction accuracy
  • Map purpose/success conditions to data mining results
    • Ex: fraud prevention => detect deviations
• Data Understanding
  • Collecting and exploring data: defining its attributes
  • Data quality verification
CRISP-DM: Explained
• Data Preparation
  • Data cleaning
    • Normalization: fitting data within ranges
    • Outlier removal: removing cases that could skew the model
    • Handling missing attributes: cases where the data was not obtained
  • Formatting: changing data so that it fits with our tools
• Modeling: fitting the data to a model following the methods previously described and then interpreting that model
  • Assess the accuracy of the collected data
  • General purpose divided into prediction or description
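Two of the data cleaning steps can be shown in a short sketch: filling in missing attributes and normalizing values into a fixed range. Replacing missing values with the mean and scaling to the range 0 to 1 are common choices assumed here, not prescriptions from CRISP-DM; the function names and the sample ages are hypothetical.

```python
def fill_missing(values):
    """Replace missing entries (None) with the mean of the known entries."""
    known = [v for v in values if v is not None]
    fallback = sum(known) / len(known)
    return [fallback if v is None else v for v in values]

def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # no spread to rescale
    return [(v - low) / (high - low) for v in values]

# Hypothetical attribute with one value that was never obtained.
raw_ages = [20, None, 40, 60]
cleaned = fill_missing(raw_ages)
scaled = min_max_normalize(cleaned)
print(cleaned)
print(scaled)
```

Outlier removal would normally happen before normalization, since a single extreme value otherwise compresses every other value toward one end of the range.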
CRISP-DM: Explained
• Evaluation: look at results and measure them with respect to the success cases defined earlier
  • Determine if one has succeeded
  • Determine next steps: how do we apply the results?
• Deployment: the execution of a strategy for using the results of our data mining
  • Includes preparing ways to monitor and maintain the application of data mining results in day-to-day operations
  • Includes some sort of final summary
SEMMA
• Sample, Explore, Modify, Model and Assess
• Proposed by SAS Institute, a producer of BI and BA software suites
  • Though this model is often considered general, SAS prefers to apply it directly to its products
• Focuses mainly on data mining and not on applying results to business (unlike CRISP-DM)
Sample    Selecting the data set
Explore   Understand data through discovering relationships, both expected and otherwise
Modify    Transform and clean the data in order to prepare it for the modeling process
Model     Apply models to the data in order to discover trends and make predictions
Assess    Evaluate the results of the modeling process to determine the reliability of the mined data
Challenges in data mining
• Not enough or too much data
  • Small enterprises often find it difficult to access sufficient quantities of data
  • Large enterprises, however, sometimes have too much, and deciding what to keep is difficult
• Acquiring clean data
  • Multiple formats or no format at all
• Privacy and ethical concerns
  • Data aggregation: data compiled from multiple sources can lead to revelations that violate privacy
    • Ex: anonymous data is collected and aggregated, leading to identification

WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 

KĂĽrzlich hochgeladen (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 

Business analytics and data mining

  • 1. Overview of BA Discussion
      • Business Analytics (BA)
      • Overview
      • History
      • Types of Business Analytics
      • Real world examples
      • Challenges
      • Relations to Data Mining
  • 2. Business Analytics (BA): an overview
      • BA can be considered a subset of business intelligence
      • A set of skills, technologies, applications and practices for the exploration and investigation of past business performance to gain insight and drive business planning
      • Like business intelligence, BA can focus either on the business as a whole or only on segments of it
      • Focuses on developing new insights and understanding of performance based on data and statistical methods
  • 3. BA: Short History
      • Analytics in business long predates computing
      • Frederick Taylor, father of scientific management, 19th century
      • Time management exercises used in industrial settings
      • Henry Ford: assembly line pacing used to improve output and business profitability
      • BA became widespread when computers were used in decision support systems (DSS) in the 1960s
      • Evolved into ERP, data warehouses, etc.
  • 4. Types of Business Analytics
      • Reporting or descriptive analytics
      • Affinity grouping
      • Clustering
      • Modeling or predictive analytics
  • 5. BA: Reporting
      • Based on the need to locate and distribute business insights and experiences
      • Often involves ETL procedures used alongside a data warehousing scheme
      • The data is then collected, quantified, and organized using reporting tools
      • Reporting allows information describing different views of an enterprise to come together in one place
      • A user could query a production and marketing database to determine if production of a product could be moved closer to where the product is sold
  • 6. BA: Affinity grouping
      • A tool used by businesses and organizations to take ideas and data and organize them
      • Often takes the form of an affinity diagram
      • Enables data and ideas stemming from brainstorming to be sorted into groups
      • Sorting is based on their natural relationships
  • 7. BA: Clustering
      • Placing a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters (Wikipedia)
      • Is a main task of exploratory data mining and statistical data analysis
      • Clustering is a general task that does not have one set solution
      • Clustering can be hard or fuzzy
      • Can be done by people or machines
      • The latter is preferred
  • 8. BA: How do we model clusters?
      • Connectivity models: how data points can be connected to other points
      • Density models: defining a cluster by determining where sets of data points are densest
      • Distribution models: clusters are modeled using statistical distributions
      • Expectation maximization
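To make the connectivity idea concrete, here is a minimal sketch of single-linkage grouping with a distance threshold. It is one simple connectivity model, not the presenters' method; the function name `single_linkage_clusters`, the `max_gap` parameter, and the `daily_spend` values are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def single_linkage_clusters(points, max_gap):
    """Group 1-D points: two points share a cluster if a chain of
    neighbors, each at most max_gap apart, connects them."""
    parent = list(range(len(points)))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Connect every pair of points that lies within the threshold.
    for i, j in combinations(range(len(points)), 2):
        if abs(points[i] - points[j]) <= max_gap:
            parent[find(i)] = find(j)

    clusters = {}
    for i, p in enumerate(points):
        clusters.setdefault(find(i), []).append(p)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical data: daily spend per customer.
daily_spend = [10, 12, 11, 48, 50, 95]
clusters = single_linkage_clusters(daily_spend, max_gap=5)
print(clusters)
```

This is a hard clustering: each point lands in exactly one group. A fuzzy variant would instead assign each point a degree of membership in every cluster.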
  • 9. BA: Predictive Analysis
      • Stems from the desire to predict future events by analyzing data an enterprise has collected
      • Pattern exploitation results in the identification of opportunities and also risks
      • Allows relationships in disparate data to be identified
      • Helps guide decision making in a business
      • Is often implemented in the form of data mining
  • 10. BA: Examples
      • Credit companies: use business analytics to track the credit risk of customers and to match customers to offerings
      • Sales and offers: companies can track customer interaction and use that information to determine appropriate product offerings
      • Sales groups can use BA to optimize inventory and analyze past sales
      • Could measure peak purchasing times for products
      • Could decide whether or not to stock poorly selling items
      • Give examples of business cases where data mining might be useful, and describe how data mining would be used
      • Preventing credit card fraud by detecting spending patterns
      • Inventory management by tracking sales
  • 11. BA: Challenges
      • Acquiring sufficient volumes of high quality data
      • Most data acquired in the field is unsorted and appears in many different formats
      • When dealing with high volume data, deciding what is important and what is noise
      • Rapidly reacting storage structures
      • BA can influence customer interactions, so that information must be available quickly
      • Ex: a customized sales pitch
  • 12. Business Analytics & Data Mining
      • Data mining is an important subtask of business analytics
      • Both predictive analysis and clustering tasks utilize information retrieved from data mining
      • Data mining helps handle some of the specific problems faced when conducting business analytics
      • Dealing with and sorting through large data sets
  • 13. Data Mining: An Overview
      • What is data mining?
      • History
      • Applications of data mining
      • Detecting data discrepancies or outliers
      • Relationship identification
      • Data-function mapping for modeling/prediction
      • Categorizing and summarizing data
      • Standards
      • Challenges
  • 14. Data Mining: What is it?
      • Applying statistical analysis techniques to data
      • The goal often being to determine unnoticed patterns or to collect categorized information
      • Turns collected data into understandable structures
      • Data mining is often used as a buzzword to describe processing large amounts of data
      • In essence, its correct use relates to discovery of new things through observation
      • Synonymous with knowledge discovery
  • 15. Data Mining: History
      • Though HNC trademarked the term in 1990, hands-on pattern extraction is centuries old
      • As long as statistical analysis has existed
      • Discoveries in computer science have increasingly shifted the field from hands-on to machine dependent; this allows for:
      • The use of data indexing and DB systems to handle data efficiently
      • The application of statistical algorithms on a large scale, possibly in a distributed manner, with less error
  • 16. Data Mining: Use: Application
      • Data mining is often broken into several different categories of tasks
      • Detecting data discrepancies or outliers
      • Relationship identification
      • Data-function mapping for modeling/prediction
      • Categorizing and summarizing data
  • 17. Data Mining: Finding outliers
      • The process of analyzing large, mostly homogeneous sets of data and determining which sets or points:
      • "go with the flow" and conform with patterns the rest of the data seem to follow
      • do not follow expected results when viewed against the entire set of data
      • An outlier can be a point or set of points, but can also be defined through other means
      • A period of time could yield unexpected results
      • Ex: network intrusion
  • 18. Data Mining: Techniques in finding outliers
      • Rule based: deciding on a set of rules that determine an outlier (or what isn't one)
      • Can be fuzzy or hard rules
      • Cluster analysis: as mentioned earlier
      • Distance or standard deviation: determining an average over a data set and marking points that fall outside a chosen deviation or distance from it
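The standard-deviation technique can be sketched in a few lines. This is an illustrative example only: the function name `find_outliers`, the cutoff of two standard deviations, and the `requests_per_minute` readings are assumptions, not values from the slides.

```python
from statistics import mean, pstdev


def find_outliers(values, max_deviations=2.0):
    """Return the values lying more than max_deviations population
    standard deviations away from the mean of the data set."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - center) > max_deviations * spread]


# Hypothetical network traffic, with one unusual burst at the end.
requests_per_minute = [52, 49, 51, 50, 48, 53, 50, 47, 300]
outliers = find_outliers(requests_per_minute)
print(outliers)
```

Note that an extreme value inflates both the mean and the standard deviation, so in small samples a very strict cutoff may never be exceeded; a cutoff is best chosen with the sample size in mind.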
  • 19. Applications of Outlier Detection
      • Network intrusion detection
      • Unusual bursts of network activity
      • Identity theft detection
      • Unusual spending or customer activity
      • Detecting software bugs
      • Software does not deliver expected outputs
      • Sensor event detection
      • Monitoring patient health fluctuations in a medical setting
      • Preprocessing
      • Removing data skews based on extenuating circumstances
  • 20. Relationship Discovery: Basics
      • Understanding how data is related is a key factor in trend and knowledge discovery
      • This is the definition of data mining
      • Ex: Which products are often bought before a major forecasted storm?
      • {hamburger buns} => {???}
      • With small sets of data, or with correlations that aren't subtle (like the one above), identifying relationships is not as difficult
      • With large data sets or subtle relations, a combination of rule generation and data analysis can be used to expedite the process
  • 21. Relationship Discovery: How it's done
      • Since the number of relationships between points of data could be boundless, two important concepts are often introduced in relationship discovery:
      • The amount of data within which a relationship might exist, called the support of a rule
      • The probability that data in the support will verify a selected rule, called the confidence of a rule
  • 22. Relationship Discovery: How it's done
      • Generally we apply minimum bounds to both the support of a rule and its confidence to determine relationships
      • First: determine possible relationships
      • Set a minimum support
      • Orders with hamburgers, orders with hamburger buns
      • Other, user-specific rules can be used here
      • Second: take the remaining sets and look for patterns in the item sets such that the occurrence rate is above the minimum confidence
      • How many people bought hamburgers and buns together
      • Ex: we find that if the customer is male and buys diapers, he will also buy beer
      • {male, diapers} => {beer}
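Support and confidence are easy to compute directly on a small set of orders. The sketch below assumes the usual association-rule definitions (support is the fraction of transactions containing every item in the rule; confidence is the support of the whole rule divided by the support of its left-hand side). The `orders` data and function names are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the share that
    also contain the consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0


# Hypothetical grocery orders.
orders = [
    {"hamburgers", "hamburger buns", "ketchup"},
    {"hamburgers", "hamburger buns"},
    {"hamburgers", "soda"},
    {"hamburger buns", "hot dogs"},
    {"soda", "chips"},
]

# Rule under test: {hamburgers} => {hamburger buns}
rule_support = support(orders, {"hamburgers", "hamburger buns"})
rule_confidence = confidence(orders, {"hamburgers"}, {"hamburger buns"})
print(rule_support, rule_confidence)
```

A rule would be kept only if both numbers clear the chosen minimum support and minimum confidence.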
  • 23. Matching data to functions
      • Often, it is desirable to match data sets and the factors that determine them to functions
      • Allows for the possibility of predicting future results
      • Involves learning how dependent and independent variables in our data interact
      • Dependent: the result, or where a point exists
      • Independent: a cause or circumstance that determines the dependent variable
      • If we know how dependent and independent variables interact, we can create a function and run simulations to see results
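As one illustration of matching data to a function, the sketch below fits a straight line by ordinary least squares and then uses it to predict a new point. Linear regression is just one possible choice of function; the variable names and the `ad_spend` / `units_sold` figures are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: independent variable (ad spend) and
# dependent variable (units sold).
ad_spend = [1.0, 2.0, 3.0, 4.0]
units_sold = [12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(ad_spend, units_sold)
predicted = slope * 5.0 + intercept  # simulate an unseen spend level
print(slope, intercept, predicted)
```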
  • 24. Uses of Function-Data Mapping
      • Weather forecasting
      • Determining what conditions lead to what kinds of weather
      • Stock market analysis
      • When to buy and when to sell
      • Crime prevention
      • What conditions cause or prevent crime
  • 25. Categorizing
      • Categorizing: often we want to separate data based on a set of predefined attributes
      • Very helpful in pattern recognition
      • Ex: a person's political preference
      • The process:
      • We synthetically generate or measure a set of observations (data points) with known categories
      • We extract properties from said observations which we believe contribute to the category
      • These are called explanatory variables
      • Finally we examine new data for these properties
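The three-step process above (labeled observations, explanatory variables, then examining new data) can be sketched with a nearest-centroid classifier. This is one simple classification technique chosen for brevity, not necessarily the one the presenters had in mind; the labels, feature values, and function names are hypothetical.

```python
from math import dist


def train_centroids(labeled_points):
    """Average the explanatory variables of each known category."""
    sums, counts = {}, {}
    for features, label in labeled_points:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify(centroids, features):
    """Assign new data to the category with the closest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical observations: (explanatory variables, known category).
training = [
    ((1.0, 1.0), "low_risk"),
    ((2.0, 1.0), "low_risk"),
    ((8.0, 9.0), "high_risk"),
    ((9.0, 8.0), "high_risk"),
]

centroids = train_centroids(training)
label = classify(centroids, (7.5, 8.0))
print(label)
```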
  • 26. Summarizing
      • Summarizing: we almost never want to look at all of the data individually
      • Having too much data can actually hinder the decision making process
      • Known as information overload
      • Summarizing takes the results from data mining and transforms them into formats that can be easily read without omitting important information
      • Summarizing might:
      • Extract and display only important data
      • Correlate and abstract data to display trends
      • Formats include: reports, graphs, dashboards, etc.
  • 27. Standards: CRISP-DM
      • Cross Industry Standard Process for Data Mining
      • Describes common practice for conducting data mining in an enterprise setting
      • KDnuggets, a community resource for data mining and analytics, ran polls that found CRISP-DM was the top methodology in 2002, 2004, and 2007
      • Six step methodology
      • Business Understanding
      • Data Understanding
      • Data Preparation
      • Modeling
      • Evaluation
      • Deployment
  • 28. CRISP-DM: Explained
      • Business Understanding
      • Determining the business purpose
      • Define success conditions: how do we know we succeeded?
      • Ex: improved prediction accuracy
      • Map purpose/success conditions to data mining results
      • Ex: fraud prevention => detect deviations
      • Data Understanding
      • Collecting and exploring data: defining its attributes
      • Data quality verification
  • 29. CRISP-DM: Explained
      • Data Preparation
      • Data cleaning
      • Normalization: fitting data within ranges
      • Outlier removal: removing cases that could skew the model
      • Handling missing attributes: the data was not obtained
      • Formatting: changing data so that it fits with our tools
      • Modeling: fitting the data to a model following the methods previously described and then interpreting that model
      • Assess the accuracy of the collected data
      • General purpose divided into prediction or description
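Normalization, "fitting data within ranges", is commonly done with min-max scaling. The sketch below shows that one approach under the assumption that a linear rescale to a target range is wanted; the function name, default range, and `ages` data are hypothetical.

```python
def min_max_normalize(values, low=0.0, high=1.0):
    """Linearly rescale values so the smallest maps to `low`
    and the largest maps to `high`."""
    smallest, largest = min(values), max(values)
    span = largest - smallest
    if span == 0:
        # Every value is identical; map them all to the lower bound.
        return [low for _ in values]
    return [low + (v - smallest) / span * (high - low) for v in values]


# Hypothetical attribute: customer ages.
ages = [20, 30, 40, 60]
normalized = min_max_normalize(ages)
print(normalized)
```

Because the minimum and maximum define the scale, a single extreme value compresses everything else, which is one reason outlier removal appears alongside normalization in the data preparation step.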
  • 30. CRISP-DM: Explained
      • Evaluation: look at results and measure them with respect to the success conditions defined earlier
      • Determine if one has succeeded
      • Determine next steps: how do we apply the results?
      • Deployment: the execution of a strategy for using the results of our data mining
      • Includes preparing ways to monitor and maintain the application of data mining results in the day to day
      • Includes some sort of final summary
  • 31. SEMMA
      • Sample, Explore, Modify, Model and Assess
      • Proposed by SAS Institute, a producer of BI and BA software suites
      • Though this model is often considered general, SAS prefers to apply it directly to their products
      • Focuses mainly on data mining and not on applying results to business (unlike CRISP-DM)
      • Sample: selecting the data set
      • Explore: understand data through discovering relationships, both expected and otherwise
      • Modify: transform and clean the data in order to prepare it for the modeling process
      • Model: apply models to the data in order to discover trends and make predictions
      • Assess: evaluate the results of the modeling process to determine the reliability of the mined data
  • 32. Challenges in data mining
      • Not enough or too much data
      • Oftentimes it is difficult for small enterprises to access sufficient quantities of data
      • If the enterprise is large, however, sometimes there is too much and deciding what to keep is difficult
      • Acquiring clean data
      • Multiple formats or no format at all
      • Privacy and ethical concerns
      • Data aggregation: data compiled from multiple sources can lead to revelations that violate privacy
      • Ex: anonymous data is collected and aggregated, leading to identification

Editor's notes

  1. Taylor: mechanical engineer who focused on improving industrial efficiency. DSS: Decision Support Systems; ERP: Enterprise Resource Planning
  2. 4:40
  3. Fuzzy clustering: each object has a likelihood of belonging to each cluster
  4. Expectation maximization (multivariate normal distributions): one can simply pick arbitrary values for one of the two sets of unknowns, use them to estimate the second set, then use these new values to find a better estimate of the first set, and then keep alternating between the two until the resulting values both converge to fixed points
  5. 17:20
  6. Agrawal, R.; Imieliński, T.; Swami, A. (1993). "Mining association rules between sets of items in large databases". Proceedings of the 1993 ACM SIGMOD international conference on Management of data - SIGMOD '93. pp. 207. doi:10.1145/170035.170072. ISBN 0897915925. http://en.wikipedia.org/wiki/Association_rule_learning#Useful_Concepts
  7. Agrawal, R.; Imieliński, T.; Swami, A. (1993). "Mining association rules between sets of items in large databases". Proceedings of the 1993 ACM SIGMOD international conference on Management of data - SIGMOD '93. pp. 207. 30 min
  8. http://en.wikipedia.org/wiki/Regression_analysis
  9. http://en.wikipedia.org/wiki/Statistical_classification
  10. http://en.wikipedia.org/wiki/Information_overload
  11. http://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining http://www.kdnuggets.com/polls/2002/methodology.htm http://www.kdnuggets.com/polls/2004/data_mining_methodology.htm http://www.kdnuggets.com/polls/2007/data_mining_methodology.htm
  12. http://dms.irb.hr/tutorial/tut_prob_understand.php http://dms.irb.hr/tutorial/tut_data_understand.php
  13. http://dms.irb.hr/tutorial/tut_data_prepare.php http://dms.irb.hr/tutorial/tut_modelling.php
  14. http://dms.irb.hr/tutorial/tut_evaluation.php http://dms.irb.hr/tutorial/tut_deployment.php
  15. http://www.sas.com/offices/europe/uk/technologies/analytics/datamining/miner/semma.html
  16. Data mining - wikipedia