Combining Data Mining and Machine Learning for Effective User Profiling
Saturday, 14 May 2016
Wealth of data/information, Lack of knowledge
Databases keep growing larger
• Terrorbytes!
A deluge of data, containing a lot of hidden information
• new knowledge
What are the technological motivations?
• Technologies to collect data
– Bar code readers, scanners, cameras, etc.
• Technologies to store data
– Databases, data warehouses, other repositories
• Network (Web) as computing and storage platform
An example of data deluge:
• the Web and social media!
Why Mine Data? Commercial Viewpoint
Lots of data is being collected and warehoused
• Web data, e-commerce
• Purchases at department/grocery stores
• Bank/Credit Card transactions
Competitive Pressure is Strong
• Use data mining to provide better, customized services for an edge (e.g. in Customer Relationship Management)
Why Mine Data? Scientific Viewpoint
Data collected and stored at enormous speeds (GB/hour)
• remote sensors on a satellite
• telescopes scanning the skies
• microarrays generating gene expression data
• scientific simulations generating terabytes of data
Traditional techniques are infeasible for raw data
Data mining may help scientists:
• in classifying and segmenting data
• in hypothesis formation
What is Data Mining?
Data mining (many definitions)
Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
Data mining: a misnomer?
Gold mining is named for what is extracted, so by analogy it should be called pattern mining
Alternative names:
Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
Origins of Data Mining
• Draws ideas from machine learning/AI, pattern recognition, statistics, database systems, HPC
• Traditional techniques may be unsuitable due to
1. Enormity of data
2. High dimensionality of data
3. Heterogeneous, distributed nature of data
[Figure: data mining at the intersection of machine learning/pattern recognition, statistics/AI, database systems, and high performance computing]
KDD is a process
– Data mining is the core of the KDD process
[Figure: KDD pipeline: Databases → Data Integration → Data Warehouse → Data Selection (cleansing / selection / transformation) → Task-relevant Data → Data Mining → Pattern Interpretation / Evaluation]
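The KDD stages named above can be sketched as a chain of small functions. This is only an illustrative sketch: the two record sources, the field names (`user`, `item`), the pair-counting "mining" step, and the `min_support` threshold are all assumptions made for the example and do not come from the slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw sources (assumed for illustration only).
web_orders = [
    {"user": "u1", "item": "bread"}, {"user": "u1", "item": "milk"},
    {"user": "u2", "item": "bread"}, {"user": "u2", "item": "milk"},
    {"user": "u3", "item": "milk"}, {"user": "u3", "item": None},
]
store_orders = [
    {"user": "u3", "item": "bread"}, {"user": "u4", "item": "eggs"},
    {"user": "u4", "item": "bread"},
]

def integrate(*sources):
    """Data integration: merge several sources into one collection."""
    return [row for source in sources for row in source]

def select_and_clean(rows):
    """Cleansing / selection: drop rows whose item is missing."""
    return [row for row in rows if row["item"] is not None]

def transform(rows):
    """Transformation: build one basket (set of items) per user."""
    baskets = {}
    for row in rows:
        baskets.setdefault(row["user"], set()).add(row["item"])
    return baskets

def mine(baskets):
    """Data mining: count how often each pair of items shares a basket."""
    counts = Counter()
    for items in baskets.values():
        counts.update(combinations(sorted(items), 2))
    return counts

def evaluate(pair_counts, min_support):
    """Pattern evaluation: keep pairs seen in at least min_support baskets."""
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}

baskets = transform(select_and_clean(integrate(web_orders, store_orders)))
patterns = evaluate(mine(baskets), min_support=2)
print(patterns)
```

Each function maps onto one box of the pipeline, so a different mining technique could replace `mine` without touching the surrounding stages.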
Data Mining: On What Kind of Data?
• Relational databases
• Data warehouses
• Transactional databases
• Advanced DB and information repositories
1. Object-oriented and object-relational databases
2. Spatial databases
3. Time-series data and temporal data
4. Text databases and multimedia databases
5. Heterogeneous and legacy databases
6. WWW
Web Mining applies DM to WWW
Data Mining
• Often applied to structured databases
Web mining
• Applied to less structured, dynamic data of huge size
• Not only Web content, but also hyperlinks and access logs
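As a rough illustration of mining access logs for user profiles, the sketch below counts which site sections each visitor requests. The log lines, their simplified `client method path` layout, and the function name `profile_from_log` are invented for this example; real server logs carry more fields and need a fuller parser.

```python
from collections import Counter, defaultdict

# Made-up, simplified access-log lines: "<client> <method> <path>".
LOG_LINES = [
    "10.0.0.1 GET /sports/football",
    "10.0.0.1 GET /sports/tennis",
    "10.0.0.2 GET /tech/python",
    "10.0.0.1 GET /tech/python",
    "10.0.0.2 GET /tech/rust",
    "malformed line",
]

def profile_from_log(lines):
    """Return {client: Counter of top-level sections requested}."""
    profiles = defaultdict(Counter)
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or not parts[2].startswith("/"):
            continue  # skip lines that do not match the assumed layout
        client, _method, path = parts
        section = path.strip("/").split("/")[0]
        profiles[client][section] += 1
    return dict(profiles)

profiles = profile_from_log(LOG_LINES)
for client, sections in sorted(profiles.items()):
    # Print each client's most requested section and its count.
    print(client, sections.most_common(1)[0])
```

The per-client counters are a very small stand-in for a user profile; hyperlink structure and page content would be further inputs in a fuller treatment.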
[Figure: Web Mining Hierarchy]
Why?
Data gathered from both the web and more conventional sources can
be used to answer such questions as:
• Marketing - those likely to buy.
• Forecasts - predicting demand.
• Loyalty - those likely to defect.
• Credit - which were the profitable items.
• Fraud - when and where it occurs.
Related Terms
• Analytics: discovery and communication of meaningful patterns in data.
• Data mining: process of discovering patterns in large datasets using methods from AI, machine learning, statistics and database systems.
• Predictive analytics: techniques from statistics, machine learning and data mining, used in conjunction with historical and current data to make predictions about the future.
Machine Learning
“Using data to understand an underlying process”
[Figure: an underlying process maps inputs x to outputs y; a machine learning algorithm uses the data to build a model that approximates the underlying process]
[Figure: the same diagram with observations {x1, x2, …} as the data]
The created model depends on the data values used for training.
[Figure: data set 1 → machine learning algorithm → model 1; data set 2 → machine learning algorithm → model 2]
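The point that the created model depends on the training data can be seen with a minimal sketch: the same algorithm (an ordinary least squares line fit) applied to two samples gives two slightly different models. The two data sets are hypothetical values chosen for illustration, roughly following y = 2x + 1 with noise, and `fit_line` is a name introduced here.

```python
def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Two hypothetical samples from the same underlying process.
data_set_1 = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8)]
data_set_2 = [(0, 0.7), (1, 3.4), (2, 4.6), (3, 7.5)]

model_1 = fit_line(data_set_1)  # (slope, intercept) learned from data set 1
model_2 = fit_line(data_set_2)  # (slope, intercept) learned from data set 2
print(model_1, model_2)
```

Both models approximate the same process, but their slopes and intercepts differ because the training values differ.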
Why build a model?
[Figure: scatter plot of data points (o, x) against Time]
• Predict
– A continuous value
– A category label
• Find clusters in data
• Identify key predictors
• …
Training phase
– The machine learning algorithm learns from data
– Output is a trained model
– Time consuming
– Typically involves multiple iterations over training data
Testing or scoring phase
– The trained model is used in conjunction with new data inputs to estimate the corresponding output
– Much quicker than training
[Figure: training data → machine learning algorithm → trained model; new data input → trained model → corresponding data output]
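The contrast between the two phases can be sketched as follows. Training here is gradient descent on squared error, which makes many passes over the training data; scoring is a single evaluation of the trained model. The data values, learning rate, and epoch count are assumptions chosen for the example, and gradient descent is just one possible training procedure.

```python
def train(data, learning_rate=0.05, epochs=2000):
    """Training phase: gradient descent on mean squared error,
    making one full pass over the data per epoch."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b  # the trained model

def score(model, x_new):
    """Scoring phase: one cheap evaluation of the trained model."""
    w, b = model
    return w * x_new + b

# Hypothetical training data lying exactly on y = 2x + 1.
training_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
model = train(training_data)
prediction = score(model, 4.0)
print(model, prediction)
```

`train` touches every training point 2000 times, while `score` performs one multiplication and one addition, which is why scoring is much quicker than training.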
Linear
– OLS regression
Generalized linear
– Logistic regression, GAMs
Rule based
– Decision trees
Kernel-based
– Support vector machines
White box
– Regression family, decision tree family
Black box
– Neural networks
Parametric
– Regression family
Non-parametric
– Support vector machines, rule-based fuzzy systems
Ensemble based
– Random forest, AdaBoost
Supervised
– Decision trees, logistic regression
Unsupervised
– K-means clustering, hierarchical clustering
Generative
– Naïve Bayes, mixture of Gaussians
Discriminative
– Support vector machines, logistic regression, decision trees
Classification
– Decision trees, logistic regression
Regression (predicting a continuous value)
– OLS regression
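To make the unsupervised category concrete, here is a minimal sketch of k-means clustering on one-dimensional data. No labels are supplied; the algorithm only groups nearby values. The sample values (imagined session lengths in minutes), the starting centers, and the function name `k_means_1d` are assumptions for illustration.

```python
def k_means_1d(values, centers, iterations=10):
    """Plain k-means on 1-D data, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical session lengths in minutes, with no labels attached.
session_minutes = [1, 2, 3, 20, 22, 24]
centers, clusters = k_means_1d(session_minutes, centers=[0, 10])
print(centers, clusters)
```

A supervised method such as a decision tree or logistic regression would instead need a known label for every training value.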
Algorithm
[Figures: linear regression, logistic regression, decision trees, multi-layer perceptron, random forest]
Source: Wikipedia
Source: http://what-when-how.com/face-recognition/facial-landmark-localization-face-recognition-techniques-part-1/
Ref: http://www.saedsayad.com/logistic_regression.htm
Which algorithm should I use?
• Objective of analysis
– Prediction of a continuous value
– Classification
– Identifying key predictors
• Data type and distribution
• Computational complexity of the algorithm
• Data volume
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 

Combining Data Mining and Machine Learning for Effective User Profiling

  • 1. Combining Data Mining and Machine Learning for Effective User Profiling Saturday, 14 May 2016
  • 2. Wealth of data/information, lack of knowledge Databases keep growing larger • Terrorbytes! A deluge of data, containing a lot of hidden information • new knowledge What are the technological motivations? • Technologies to collect data • Bar code readers, scanners, cameras, etc. • Technologies to store data • Databases, data warehouses, other repositories • Network (Web) as computing and storage platform An example of data deluge: • the Web and social media!
  • 3. Why Mine Data? Commercial Viewpoint Lots of data is being collected and warehoused • Web data, e-commerce • Purchases at department/ grocery stores • Bank/Credit Card transactions Competitive Pressure is Strong • Use Data Mining to provide better, customized services for an edge (e.g. in Customer Relationship Management)
  • 4. Why Mine Data? Scientific Viewpoint Data collected and stored at enormous speeds (GB/hour) • remote sensors on a satellite • telescopes scanning the skies • microarrays generating gene expression data • scientific simulations generating terabytes of data Traditional techniques infeasible for raw data Data mining may help scientists: • in classifying and segmenting data • in Hypothesis Formation
  • 5. What is Data Mining Data mining (Many Definitions) Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns Data mining: a misnomer? It should be pattern mining in analogy to gold mining Alternative names: Knowledge discovery(mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
  • 6. Origins of Data Mining • Draws ideas from machine learning/AI, pattern recognition, statistics, database systems, HPC • Traditional techniques may be unsuitable due to 1. Enormity of data 2. High dimensionality of data 3. Heterogeneous, distributed nature of data [Diagram labels: Machine Learning / Pattern Recognition; Statistics / AI; Data Mining; Database systems; High Performance Computing]
  • 7. KDD is a process – Data mining is the core of the KDD process [Diagram labels: Databases; Data Integration; Data Warehouse; Cleansing / Selection / Transformation; Data Selection; Task-relevant Data; Data Mining; Pattern Interpretation / Evaluation]
  • 8. Data Mining: On What Kind of Data? • Relational databases • Data warehouses • Transactional databases • Advanced DB and information repositories 1. Object-oriented and object-relational databases 2. Spatial databases 3. Time-series data and temporal data 4. Text databases and multimedia databases 5. Heterogeneous and legacy databases 6. WWW
  • 9. Web Mining applies DM to WWW Data mining • Often applied to structured databases Web mining • Applied to less structured, dynamic data of huge size • Not only Web content, but also hyperlinks and access logs
  • 11. Why? Data gathered from both the web and more conventional sources can be used to answer such questions as: • Marketing - those likely to buy. • Forecasts - predicting demand. • Loyalty - those likely to defect. • Credit - which were the profitable items. • Fraud - when and where they occur.
  • 12. Related Terms • Analytics: discovery and communication of meaningful patterns in data. • Data mining: process of discovering patterns in large datasets using methods from AI, machine learning, statistics and database systems. • Predictive analytics: techniques from statistics, machine learning and data mining, used in conjunction with historical and current data to make predictions about the future.
  • 13.
  • 14. Machine Learning Underlying process x → y Machine learning algorithm Model that approximates the underlying process “Using data to understand an underlying process”
  • 15. Underlying process {x1, x2, …} Machine learning algorithm Model that approximates the underlying process “Using data to understand an underlying process”
  • 16. Data set 1 Model 1 Data set 2 Model 2 The created model depends on the data values used for training. Machine learning algorithm Machine learning algorithm
  • 17. Why build a model? [Scatter plot of data points over Time] • Predict – A continuous value – A category label • Find clusters in data • Identify key predictors • …
  • 18. Why build a model (cont.) [Scatter plot of data points over Time] • Predict – A continuous value – A category label • Find clusters in data • Identify key predictors • …
  • 19. Why build a model (cont.) [Scatter plot of data points over Time] • Predict – A continuous value – A category label • Find clusters in data • Identify key predictors • …
  • 20. Why build a model (cont.) [Scatter plot of data points over Time] • Predict – A continuous value – A category label • Find clusters in data • Identify key predictors • …
  • 21. Why build a model (cont.) [Scatter plot of data points over Time] – A continuous value – A category label • Find clusters in data • Identify key predictors • …
  • 22. Training phase – The machine learning algorithm learns from data – Output is a trained model – Time consuming – Typically involves multiple iterations over training data Testing or scoring phase – The trained model is used in conjunction with new data inputs to estimate the corresponding output – Much quicker than training [Diagram: Training data → Machine learning algorithm → Trained model; New data input → Trained model → Corresponding data output]
  • 23. Algorithm • Linear – OLS regression • Generalized linear – Logistic regression, GAMs • Rule based – Decision trees • Kernel-based – Support vector machines • White box – Regression family, Decision tree family • Black box – Neural networks • Parametric – Regression family • Non-parametric – Support vector machines, Rule based fuzzy systems • Ensemble based – Random forest, AdaBoost • Supervised – Decision trees, logistic regression • Unsupervised – K-means clustering, hierarchical clustering • Generative – Naïve Bayes, mixture of Gaussians • Discriminative – Support vector machines, logistic regression, Decision trees • Classification – Decision trees, logistic regression • Regression (predicting a continuous value) – OLS regression
  • 24. Linear regression • Logistic regression • Decision trees • Multi-layer perceptron • Random forest Source: http://what-when-how.com/face-recognition/facial-landmark-localization-face-recognition-techniques-part-1/ Source: Wikipedia Ref: http://www.saedsayad.com/logistic_regression.htm Source: Wikipedia
  • 25. Which algorithm should I use? • Objective of analysis – Prediction of a continuous value – Classification – Identifying key predictors • Data type and distribution • Computational complexity of the algorithm • Data volume
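
The training and scoring phases described on slide 22, together with the point on slide 16 that the created model depends on the training data, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses closed-form ordinary least squares with a single predictor (so there are no training iterations here, unlike many algorithms), and the names `train`, `score`, `data_set_1` and `data_set_2` as well as the toy values are hypothetical, not taken from the slides.

```python
from statistics import mean


def train(xs, ys):
    """Training phase: fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope  # the "trained model" is just these two numbers


def score(model, x_new):
    """Scoring phase: apply the trained model to a new input."""
    intercept, slope = model
    return intercept + slope * x_new


# Hypothetical toy data: same inputs, different underlying processes.
data_set_1 = ([1, 2, 3, 4], [2, 4, 6, 8])      # generated by y = 2x
data_set_2 = ([1, 2, 3, 4], [5, 8, 11, 14])    # generated by y = 3x + 2

# Same algorithm, different training data, different models.
model_1 = train(*data_set_1)
model_2 = train(*data_set_2)

print(score(model_1, 5), score(model_2, 5))
```

Training touches every data point, while scoring is a single multiply-and-add, which mirrors the slide's remark that scoring is much quicker than training.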

Editor's notes

  1. Big Data: (1) The increasing ‘datafication’ of the world, which means we generate new data at frightening rates. (2) Our increasing ability to harness and analyse large and complex sets of data. Activity Data: Simple activities like listening to music or reading a book now generate data. Digital music players and eBooks collect data on our activities. Your smart phone collects data on how you use it, and your web browser collects information on what you are searching for. Your credit card company collects data on where you shop, and your shop collects data on what you buy. It is hard to imagine any activity that does not generate data. Conversation Data: Our conversations are now digitally recorded. It all started with emails, but nowadays most of our conversations leave a digital trail. Just think of all the conversations we have on social media sites like Facebook or Twitter. Even many of our phone conversations are now digitally recorded. Photo and Video Image Data: Just think about all the pictures we take on our smart phones or digital cameras. We upload and share hundreds of thousands of them on social media sites every second. The growing number of CCTV cameras take video images, and every minute we upload hundreds of hours of video to YouTube and other sites. Sensor Data: We are increasingly surrounded by sensors that collect and share data. Take your smart phone: it contains a global positioning sensor to track exactly where you are every second of the day, and an accelerometer to track the speed and direction at which you are travelling. We now have sensors in many devices and products. The Internet of Things Data: We now have smart TVs that are able to collect and process data, as well as smart watches, smart fridges, and smart alarms. The Internet of Things, or Internet of Everything, connects these devices so that the traffic sensors on the road send data to your alarm clock, which will wake you up earlier than planned because the blocked road means you have to leave earlier to make your 9am meeting…
  2. (1) Learning the application domain: relevant prior knowledge and goals of the application. (2) Creating a target data set: data selection. (3) Data cleaning and preprocessing (may take 60% of the effort!). (4) Data reduction and transformation: find useful features, dimensionality/variable reduction, invariant representation. (5) Choosing functions of data mining: summarization, classification, regression, association, clustering. (6) Choosing the mining algorithm(s). (7) Data mining: search for patterns of interest. Pattern evaluation and knowledge presentation: visualization, transformation, removing redundant patterns, etc. (8) Use of discovered knowledge.
  3. Content mining seeks to uncover the objects and resources within a site, while structure mining reveals the inter- and intra-connectivity of the web pages. Usage mining analyzes web server logs to track the activities of users as they traverse a site. A web site is often the first point of contact between a potential customer and a company. It is therefore essential that browsing and using the web site is made as simple and pleasurable as possible for the customer. Carefully designed web pages play a major part here and can be enhanced through information relating to web access. The progress of the customer is monitored by the web server log, which holds details of every web page visited.
  4. Information is available from: • Registration forms: these are very useful, and customers should be persuaded to fill out at least one. Useful information such as age, sex and location can be obtained. • Server log: this provides details of each web page visited and timings. The main advantage of mining web server logs, however, relates to sales and marketing. Sites like Amazon hold an individual customer’s previous product searches and past purchases, which they use to target that particular individual. • Past purchases and previous search patterns: useful for personalization of web pages. • Cookies: these reside on the customer’s hard drive and enable details to be recorded between sessions.
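
The usage mining described in the notes above, where the web server log records every page a visitor requests, can be sketched roughly as follows. This is a minimal illustration, assuming the log is in the Common Log Format and that a visitor can be identified by client address alone (real sites would typically rely on cookies or logins instead). The sample log lines, the `LOG_PATTERN` regular expression and the function `pages_per_visitor` are hypothetical and not part of the original slides.

```python
import re
from collections import defaultdict

# Assumed Common Log Format: host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Invented sample data, for illustration only.
SAMPLE_LOG = """\
10.0.0.1 - - [14/May/2016:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 1043
10.0.0.2 - - [14/May/2016:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 1043
10.0.0.1 - - [14/May/2016:10:00:09 +0000] "GET /products.html HTTP/1.1" 200 2310
10.0.0.1 - - [14/May/2016:10:00:30 +0000] "GET /checkout.html HTTP/1.1" 200 512
10.0.0.2 - - [14/May/2016:10:01:12 +0000] "GET /missing.html HTTP/1.1" 404 0
"""


def pages_per_visitor(log_text):
    """Return the ordered list of successfully served pages for each client host."""
    paths = defaultdict(list)
    for line in log_text.splitlines():
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed format
        if match["status"] != "200":
            continue  # keep only pages that were actually served
        paths[match["host"]].append(match["path"])
    return dict(paths)


visits = pages_per_visitor(SAMPLE_LOG)
for host, pages in visits.items():
    print(host, " -> ".join(pages))
```

The per-visitor page sequences are the raw material for a user profile; joining them with registration data or purchase history, as the notes suggest, would be a further step not shown here.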