Christof Monz
Informatics Institute
University of Amsterdam
Data Mining
Week 1: Introduction
Today’s Class
Christof Monz
Data Mining - Week 1: Introduction
1
Overview of Data Mining
Overview of Machine Learning
Course administrivia
What’s Data Mining?
Data: Records, web pages, documents, etc.
Mining: The process or business of extracting
ore or minerals from the ground (The American
Heritage Dictionary)
Data Mining: The nontrivial extraction of
implicit, previously unknown, and potentially
useful information from large amounts of data
Why Data Mining?
There is an abundance of data resources:
commercial databases, intranets, the Internet,
. . .
These resources contain a large amount of
valuable data
The best way to structure the data depends on
how one wants to exploit it
Manual data organization is very laborious and
expensive
There is a need to automate this process
Some Application Areas
Customer analysis (what impacts customer
behavior?)
Medical research (what is the impact of
lifestyle/drug effects?)
Insurance (risk assessment)
Stock investment (which factors impact stock
performance?)
Fraud detection (when is a transaction likely to
be fraudulent?)
The Need for Automated Analysis
Much of the available data is never analyzed!
What is and isn’t Data Mining
Looking up John Doe's phone number and address
in an electronic phone book (isn't Data Mining,
but database management)
Inferring John Doe's phone number from an
analysis of a number of web pages, even though
this information is not stated explicitly there (is
Data Mining)
Situating Data Mining
Data Mining lies at the intersection of a
number of research areas
Data Mining Tasks
Prediction
• Use some variables to predict unknown or future values
of other variables
Description
• Find human-interpretable patterns that describe the data
Some Data Mining Tasks
Classification (Predictive)
Clustering (Descriptive)
Association Rule Discovery (Descriptive)
Sequential Pattern Discovery (Descriptive)
Regression (Predictive)
Deviation Detection (Predictive)
Classification
Given a collection of records (training set)
• Each record contains a set of attributes, one of the
attributes is the class.
Find a model for the class attribute as a function of
the values of the other attributes
Goal: previously unseen records should be
assigned a class as accurately as possible.
• A test set is used to determine the accuracy of the model
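The training/test protocol above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the records and the attribute names `age`, `income`, and `buys` are hypothetical, and the "model" is a deliberately trivial majority-class baseline, not a method taken from this course. It only shows that the model is built from the training set and that its accuracy is measured on a separate test set.

```python
from collections import Counter

# Hypothetical records: each is a dict of attributes; "buys" is the class attribute.
train = [
    {"age": "young", "income": "high", "buys": "yes"},
    {"age": "young", "income": "low", "buys": "no"},
    {"age": "old", "income": "high", "buys": "yes"},
    {"age": "old", "income": "low", "buys": "yes"},
]
test = [
    {"age": "young", "income": "high", "buys": "yes"},
    {"age": "old", "income": "low", "buys": "no"},
]

def fit_majority(records, class_attr):
    """Trivial baseline model: always predict the most frequent training class.
    (A real classifier would use the other attributes; this one ignores them.)"""
    counts = Counter(r[class_attr] for r in records)
    majority = counts.most_common(1)[0][0]
    return lambda record: majority

def accuracy(model, records, class_attr):
    """Fraction of records whose predicted class equals the true class."""
    correct = sum(model(r) == r[class_attr] for r in records)
    return correct / len(records)

model = fit_majority(train, "buys")
print(accuracy(model, test, "buys"))  # 0.5
```

Any real classifier can be dropped in place of `fit_majority`; the accuracy computation on the held-out test set stays the same.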
Example: Direct Marketing
Goal: Reduce the cost of mailing by targeting the set
of consumers likely to buy a new cell-phone
product
Approach:
• Use the data for a similar product introduced before
• We know which customers decided to buy and which
decided otherwise. This buy/don’t buy decision forms
the class attribute
• Collect various demographic, lifestyle, and
company-interaction related information about all such
customers (where they live, how much they earn, . . . )
• Use this information as input attributes to learn a
classifier model
Classify This!
Some Observations
Training data (examples for which the class is
known)
Feature extraction (what are the ‘things’ that
are relevant to predict a class?)
Feature weight (how important is a feature?)
Feature combination (sometimes features act
together)
Over-fitting (some features don’t generalize
well)
Evaluation (how accurate is the prediction?)
Machine Learning
The research area of machine learning
investigates and formalizes the challenge of
prediction and description by computer
Machine learning plays a central role in data
mining
It is used for:
• Building new models
• Adapting existing models to new situations
• Comparing the performance of competing models
Machine Learning is . . .
. . . the principles, methods, and algorithms for
learning and prediction on the basis of past
experience
. . . already everywhere: speech recognition,
hand-written character recognition, computer
vision, information retrieval, operating systems,
compilers, fraud detection, security, defense
applications, . . .
Learning
Steps
• entertain a (biased) set of possibilities
• adjust predictions based on feedback
• rethink the set of possibilities
Principles of learning are ‘universal’
• society (e.g., scientific community)
• animal (e.g., human)
• machine
Learning and Prediction
We make predictions all the time but rarely
investigate the processes underlying our
predictions
In carrying out scientific research we are also
governed by how theories are evaluated
To automate the process of making predictions
we need to understand in addition how we
search and refine ‘theories’
Learning: Key Steps
Data and assumptions
• What data is available for the learning task?
• What can we assume about the problem?
Representation
• How should we represent the examples to be classified?
Evaluation and Estimation
• How well are we doing?
• How do we adjust our predictions based on the
feedback?
• Can we rethink the approach to do even better?
Example
A classification problem: predict the grades for
students taking this course
Key Steps:
1. data
2. assumptions
3. representation
4. estimation
5. evaluation
6. model selection
Example
Key Steps:
1. data: what ‘past experience’ can we rely on?
2. assumptions: what can we assume about the students or
the course?
3. representation: how do we ‘summarize’ a student?
4. estimation: how do we construct a map from students to
grades?
5. evaluation: how well are we predicting?
6. model selection: perhaps we can do even better?
Example: Data
The data we have available (in principle):
• Names and grades of students in past years' ML courses
• Academic record of past and current students
Training data:
Student   ML   course 1   course 2   . . .
Peter     A    B          A          . . .
David     B    A          A          . . .
Test data:
Student   ML   course 1   course 2   . . .
Jack      ?    C          A          . . .
Kate      ?    A          A          . . .
Assumptions
There are many assumptions we can make to
facilitate predictions:
• The course has remained roughly the same over the years
• Each student performs independently from others
Example: Representation
Academic records are rather diverse so we might
limit the summaries to a select few courses
For example, we can summarize the i-th student
(say David) with a vector: x_i = [B A A]
The available data in this representation:
Training               Testing
Student   ML grade     Student   ML grade
x_1       A            x_1       ?
x_2       B            x_2       ?
. . .     . . .        . . .     . . .
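One way to turn such a summary into numbers that a learning algorithm can compare is sketched below. This is an illustration under assumptions: the grade-to-number scale (A=4 down to F=0), the helper name `summarize`, and the restriction to the two course columns shown in the data example are all hypothetical choices, not part of the slides.

```python
# Hypothetical numeric scale for letter grades (an assumption, not from the slides).
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def summarize(record, courses):
    """Summarize a student as a vector of grade points over a fixed list of courses."""
    return [GRADE_POINTS[record[c]] for c in courses]

# Only the two course columns visible in the data example are used here.
courses = ["course 1", "course 2"]
peter = {"ML": "A", "course 1": "B", "course 2": "A"}
jack = {"course 1": "C", "course 2": "A"}

print(summarize(peter, courses))  # [3, 4]
print(summarize(jack, courses))   # [2, 4]
```

Choosing a different list of courses gives a different representation, which is one of the refinements discussed under model selection.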
Example: Estimation
Given the training data
Student   ML grade
x_1       A
x_2       B
. . .     . . .
find a mapping from input vectors x to ‘labels’
y encoding the grades for the ML course.
Possible solution (nearest neighbor classifier):
1. For any student x in the test set find the ‘closest’
student xi in the training set
2. Predict yi as the grade of the closest student
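The two steps of the nearest neighbor classifier can be sketched as follows. Assumptions: "closest" is taken to mean Euclidean distance (`math.dist`, Python 3.8+), letter grades are mapped to the hypothetical scale A=4 ... F=0, and only the course 1 and course 2 columns from the data example are used. Ties are broken by taking the first training student found.

```python
import math

# Hypothetical numeric scale for letter grades (an assumption, not from the slides).
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def to_vector(grades):
    """Map a list of letter grades to a numeric vector."""
    return [GRADE_POINTS[g] for g in grades]

def nearest_neighbor(train, x):
    """train: list of (vector, label) pairs.
    Step 1: find the training vector closest to x (Euclidean distance).
    Step 2: return that training student's label as the prediction."""
    closest_vector, closest_label = min(train, key=lambda pair: math.dist(pair[0], x))
    return closest_label

train = [
    (to_vector(["B", "A"]), "A"),  # Peter: course 1 = B, course 2 = A, ML = A
    (to_vector(["A", "A"]), "B"),  # David: course 1 = A, course 2 = A, ML = B
]

print(nearest_neighbor(train, to_vector(["C", "A"])))  # Jack -> A
print(nearest_neighbor(train, to_vector(["A", "A"])))  # Kate -> B
```

Jack ([2, 4]) is at distance 1 from Peter and 2 from David, so he receives Peter's ML grade; Kate's summary is identical to David's.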
Example: Evaluation
How can we tell how good our predictions are?
• We can wait until the end of this course
• We can try to assess the accuracy based on the data we
already have (part of the training data)
Possible solution:
• Divide the training set further into training and test sets
• Evaluate the classifier constructed on the basis of only
the smaller training set on the new test set
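The hold-out procedure can be sketched as below. The function name `holdout_accuracy`, the 25% test fraction, the fixed random seed, and the labelled vectors in `data` are hypothetical; the classifier is the same 1-nearest-neighbor rule as before, built only from the smaller training part.

```python
import math
import random

def nearest_neighbor(train, x):
    """Predict the label of the training vector closest to x (Euclidean distance)."""
    return min(train, key=lambda pair: math.dist(pair[0], x))[1]

def holdout_accuracy(data, classify, test_fraction=0.25, seed=0):
    """Split labelled data into a smaller training set and a held-out test set,
    build predictions from the smaller training set only, and return the accuracy
    on the held-out part."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    correct = sum(classify(train, x) == y for x, y in test)
    return correct / len(test)

# Hypothetical labelled grade vectors (two well-separated groups).
data = [
    ([4, 4], "A"), ([4, 3], "A"), ([3, 4], "A"), ([4, 4], "A"),
    ([1, 1], "C"), ([1, 2], "C"), ([2, 1], "C"), ([1, 1], "C"),
]

print(holdout_accuracy(data, nearest_neighbor))  # 1.0
```

On realistic data the estimate depends on which records end up in the held-out part, so it is common to repeat the split with different seeds and average the results.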
Example: Model Selection
We can refine
• the estimation algorithm (e.g., using a classifier other
than the nearest neighbor classifier)
• the representation (e.g., base the summaries on a
different set of courses)
• the assumptions (e.g., perhaps students work in groups)
etc.
We have to rely on the method of evaluating
the accuracy of our predictions to select among
the possible refinements
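Using the evaluation method to choose among refinements can be sketched as follows: every candidate is scored on the same held-out validation data, and the most accurate one is kept. The two candidates (1-nearest-neighbor and a majority-class baseline), the helper `select_model`, and the data are hypothetical examples chosen only to illustrate the selection step.

```python
import math
from collections import Counter

def nearest_neighbor(train, x):
    """Predict the label of the closest training vector (Euclidean distance)."""
    return min(train, key=lambda pair: math.dist(pair[0], x))[1]

def majority_class(train, x):
    """Baseline: ignore x and predict the most frequent training label."""
    return Counter(y for _, y in train).most_common(1)[0][0]

def select_model(candidates, train, validation):
    """Score each candidate classifier on the validation set; return (name, accuracy) of the best."""
    def acc(classify):
        return sum(classify(train, x) == y for x, y in validation) / len(validation)
    scores = {name: acc(fn) for name, fn in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical training and validation data.
train = [([4, 4], "A"), ([4, 3], "A"), ([1, 1], "C"), ([1, 2], "C"), ([2, 1], "C")]
validation = [([3, 4], "A"), ([2, 2], "C")]

candidates = {"1-NN": nearest_neighbor, "majority": majority_class}
print(select_model(candidates, train, validation))  # ('1-NN', 1.0)
```

The same loop works for comparing representations (for example, different course subsets) as long as each candidate is evaluated on data it was not built from.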
Types of Learning Approaches
Supervised learning: where we get a set of
training inputs and outputs
• E.g., classification, regression
Unsupervised learning: where we are
interested in capturing inherent organization in
the data
• E.g., clustering, density estimation
Reinforcement learning: where we only get
feedback in the form of how well we are doing
(not what we should be doing)
• E.g., planning
Challenges of Data Mining
Scalability
Dimensionality/Complexity
Data quality
Data ownership
Privacy considerations
Continually updated data
Recap
Difference between data mining and other
research areas
Applications of data mining
Need for automation and the use of machine
learning
Key steps in machine learning
About This Course
This course does not:
• give a comprehensive introduction to data mining
• cover how to adapt data mining to specific applications
• cover feature extraction
• cover evaluation issues in detail
This course does:
• focus on the predominant approach in data mining:
machine learning
• sketch some of the example applications
• introduce a representative selection of machine learning
techniques used in data mining
• focus on the algorithmic fundamentals of machine
learning
Approaches Covered
Linear regression (regression)
Decision Trees (classification)
Neural Networks (classification)
k-Nearest-Neighbors (classification)
Naive Bayes (classification)
K-Means (clustering)
Hierarchical Clustering (clustering)
What to get out of this Course
At the end of this course you will have learned:
• what type of problems can be addressed by data mining
techniques
• what the most common machine learning approaches in
data mining are
• which machine learning approaches are appropriate for a
given type of data mining application
• the algorithmic fundamentals of a number of relevant
machine learning approaches
Course Administrivia
The exam counts for 40%, homework for 20%,
and practical assignments for 40%
Lectures are on Tuesdays 9-11am (D1.116)
Tutorials (werkcolleges) are on Thursdays
9-11am (G0.05) and Fridays 9-11am (G5.29)
Labs are on Thursdays 11am-1pm (G0.18)
or Fridays 11am-1pm (G0.18)
Course Administrivia
Teaching assistants:
Jiyin He (email: jiyinhe@gmail.com)
(English only!)
Spyros Martzoukos (email: S.Martzoukos@uva.nl)
(English only!)
Course web page: on Blackboard
Check course web page regularly for
announcements, slides, . . .
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 

Dm week01 intro.handout

  • 1. Data Mining, Week 1: Introduction
      Christof Monz, Informatics Institute, University of Amsterdam
    Today’s Class
      Overview of Data Mining
      Overview of Machine Learning
      Course administrivia
  • 2. What’s Data Mining?
      Data: records, web pages, documents, etc.
      Mining: the process or business of extracting ore or minerals from the ground (The American Heritage Dictionary)
      Data Mining: the nontrivial extraction of implicit, previously unknown, and potentially useful information from large amounts of data
    Why Data Mining?
      There is an abundance of data resources: commercial databases, intranets, the Internet, ...
      These resources contain a large amount of valuable data
      The best way to structure the data depends on how one wants to exploit it
      Manual data organization is very laborious and expensive
      There is a need to automate this process
  • 3. Some Application Areas
      Customer analysis (what influences customer behavior?)
      Medical research (what are the effects of lifestyle and drugs?)
      Insurance (risk assessment)
      Stock investment (which factors affect stock performance?)
      Fraud detection (when is a transaction likely to be fraudulent?)
    The Need for Automated Analysis
      Much of the available data is never analyzed!
  • 4. What Is and Isn’t Data Mining
      Looking up John Doe’s phone number and address in an electronically available phone book isn’t Data Mining but database management
      Inferring John Doe’s phone number by analyzing a number of web pages, although this information is not stated explicitly, is Data Mining
    Situating Data Mining
      Data Mining lies at the intersection of a number of research areas
  • 5. Data Mining Tasks
      Prediction
        • Use some variables to predict unknown or future values of other variables
      Description
        • Find human-interpretable patterns that describe the data
    Some Data Mining Tasks
      Classification (Predictive)
      Clustering (Descriptive)
      Association Rule Discovery (Descriptive)
      Sequential Pattern Discovery (Descriptive)
      Regression (Predictive)
      Deviation Detection (Predictive)
  • 6. Classification
      Given a collection of records (the training set)
        • Each record contains a set of attributes; one of the attributes is the class
      Find a model for the class attribute as a function of the values of the other attributes
      Goal: previously unseen records should be assigned a class as accurately as possible
        • A test set is used to determine the accuracy of the model
    Example: Direct Marketing
      Goal: reduce the cost of mailing by targeting a set of consumers likely to buy a new cell-phone product
      Approach:
        • Use the data for a similar product introduced before
        • We know which customers decided to buy and which decided otherwise; this buy/don’t-buy decision forms the class attribute
        • Collect various demographic, lifestyle, and company-interaction related information about all such customers (where they live, how much they earn, ...)
        • Use this information as input attributes to learn a classifier model
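The classification setup above (training records, a class attribute, a learned model, accuracy measured on a separate test set) can be sketched in a few lines. This is a minimal illustration, not the course's method. It assumes numeric attributes and a boolean class attribute `buys`, and it uses about the simplest possible model: a one-attribute threshold rule (a "decision stump"). The attribute names `income` and `age` and all record values are made up.

```python
from typing import Any, Dict, List, Tuple

# A record maps attribute names to values; "buys" is the (hypothetical) class attribute.
Record = Dict[str, Any]
# A stump is (attribute, threshold, class predicted when the value is at or above the threshold).
Stump = Tuple[str, Any, bool]


def stump_predict(stump: Stump, record: Record) -> bool:
    attr, threshold, label_if_above = stump
    return label_if_above if record[attr] >= threshold else not label_if_above


def accuracy(stump: Stump, records: List[Record], class_attr: str) -> float:
    """Fraction of records whose class attribute the stump predicts correctly."""
    correct = sum(stump_predict(stump, r) == r[class_attr] for r in records)
    return correct / len(records)


def learn_stump(train: List[Record], attributes: List[str], class_attr: str) -> Stump:
    """Try every (attribute, observed value, label) rule; keep the most accurate one on the training set."""
    candidates = [
        (attr, threshold, label)
        for attr in attributes
        for threshold in sorted({r[attr] for r in train})
        for label in (True, False)
    ]
    return max(candidates, key=lambda s: accuracy(s, train, class_attr))


# Hypothetical customers from an earlier, similar product (income in thousands).
train = [
    {"income": 20, "age": 22, "buys": False},
    {"income": 35, "age": 30, "buys": False},
    {"income": 60, "age": 41, "buys": True},
    {"income": 80, "age": 35, "buys": True},
]
# Previously unseen customers, used only to measure accuracy.
test = [
    {"income": 25, "age": 50, "buys": False},
    {"income": 70, "age": 28, "buys": True},
]

stump = learn_stump(train, ["income", "age"], "buys")
```

On this toy data the learned rule is "buys if income >= 60". Its accuracy on `test`, rather than on `train`, is the number that estimates how well it handles new customers.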
  • 7. Classify This!
    Some Observations
      Training data (examples for which the class is known)
      Feature extraction (which ‘things’ are relevant for predicting a class?)
      Feature weights (how important is a feature?)
      Feature combination (sometimes features act together)
      Over-fitting (some features don’t generalize well)
      Evaluation (how accurate is the prediction?)
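The over-fitting observation can be made concrete with a toy comparison. A feature that identifies individual records, such as a customer ID, lets a model memorize the training set perfectly, but it says nothing about new records. A feature shared across records generalizes better. In this sketch the rows, the `customer_id` and `income_band` features, and the lookup-table model (majority class per feature value, with a fallback to the overall majority) are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from typing import Callable, List, Tuple

# Hypothetical rows: (customer_id, income_band, buys).
Row = Tuple[str, str, bool]

train: List[Row] = [
    ("c1", "high", True),
    ("c2", "low", False),
    ("c3", "high", True),
    ("c4", "low", False),
    ("c5", "high", False),
]
test: List[Row] = [("c6", "high", True), ("c7", "low", False)]


def majority(labels: List[bool]) -> bool:
    return Counter(labels).most_common(1)[0][0]


def fit_lookup(rows: List[Row], feature: int) -> Callable[[Row], bool]:
    """Predict the majority class seen with a row's value of one feature;
    fall back to the overall majority class for values never seen in training."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[-1])
    table = {value: majority(labels) for value, labels in groups.items()}
    default = majority([row[-1] for row in rows])
    return lambda row: table.get(row[feature], default)


def accuracy(model: Callable[[Row], bool], rows: List[Row]) -> float:
    return sum(model(row) == row[-1] for row in rows) / len(rows)


id_model = fit_lookup(train, 0)    # memorizes individual customers
band_model = fit_lookup(train, 1)  # uses a feature shared across customers
```

The ID-based model is perfect on the training data (1.0) but drops to 0.5 on the test rows. The income-band model scores only 0.8 on the training data yet gets both test rows right. Training accuracy alone would have picked the wrong model.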
  • 8. Machine Learning
      The research area of machine learning investigates and formalizes the challenge of prediction and description by computer
      Machine learning plays a central role in data mining
      It is used for:
        • Building new models
        • Adapting existing models to new situations
        • Comparing the performance of competing models
    Machine Learning Is ...
      ... the principles, methods, and algorithms for learning and prediction on the basis of past experience
      ... already everywhere: speech recognition, hand-written character recognition, computer vision, information retrieval, operating systems, compilers, fraud detection, security, defense applications, ...
  • 9. Learning
      Steps
        • Entertain a (biased) set of possibilities
        • Adjust predictions based on feedback
        • Rethink the set of possibilities
      Principles of learning are ‘universal’
        • Society (e.g., the scientific community)
        • Animals (e.g., humans)
        • Machines
    Learning and Prediction
      We make predictions all the time but rarely investigate the processes underlying them
      In carrying out scientific research, we are also governed by how theories are evaluated
      To automate the process of making predictions, we additionally need to understand how we search for and refine ‘theories’
  • 10. Learning: Key Steps
      Data and assumptions
        • What data is available for the learning task?
        • What can we assume about the problem?
      Representation
        • How should we represent the examples to be classified?
      Evaluation and estimation
        • How well are we doing?
        • How do we adjust our predictions based on the feedback?
        • Can we rethink the approach to do even better?
    Example
      A classification problem: predict the grades of students taking this course
      Key steps: 1. data, 2. assumptions, 3. representation, 4. estimation, 5. evaluation, 6. model selection
  • 11. Example
      Key steps:
        1. Data: what ‘past experience’ can we rely on?
        2. Assumptions: what can we assume about the students or the course?
        3. Representation: how do we ‘summarize’ a student?
        4. Estimation: how do we construct a map from students to grades?
        5. Evaluation: how well are we predicting?
        6. Model selection: can we do even better?
    Example: Data
      The data we have available (in principle):
        • Names and grades of students in past years’ ML courses
        • Academic records of past and current students
      Training data:
        Student   ML   course 1   course 2   ...
        Peter     A    B          A          ...
        David     B    A          A          ...
      Test data:
        Student   ML   course 1   course 2   ...
        Jack      ?    C          A          ...
        Kate      ?    A          A          ...
  • 12. Assumptions
      There are many assumptions we can make to facilitate predictions:
        • The course has remained roughly the same over the years
        • Each student performs independently of the others
    Example: Representation
      Academic records are rather diverse, so we might limit the summaries to a select few courses
      For example, we can summarize the i-th student (say, David) with a vector: x_i = [B A A]
      The available data in this representation:
        Training              Testing
        Student   ML grade    Student   ML grade
        x_1       A           x_1       ?
        x_2       B           x_2       ?
        ...       ...         ...       ...
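A representation like the one above has to be numeric before a distance-based method can compare students. The following is a sketch under stated assumptions. The letter-to-number scale (A=4 down to F=0) is hypothetical. The vector holds only the two course grades shown in the data tables, and the ML grade is kept apart as the label y that we want to predict. The slides do not fix either choice.

```python
from typing import Dict, List

# Hypothetical numeric scale for letter grades (an assumption, not from the slides).
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

# The "select few courses" used to summarize each student (hypothetical choice).
SELECTED_COURSES = ["course 1", "course 2"]


def summarize(record: Dict[str, str], courses: List[str] = SELECTED_COURSES) -> List[float]:
    """Turn a student's academic record into a fixed-length feature vector x_i."""
    return [GRADE_POINTS[record[course]] for course in courses]


training = {
    "Peter": {"ML": "A", "course 1": "B", "course 2": "A"},
    "David": {"ML": "B", "course 1": "A", "course 2": "A"},
}
test = {
    "Jack": {"course 1": "C", "course 2": "A"},
    "Kate": {"course 1": "A", "course 2": "A"},
}

X_train = [summarize(r) for r in training.values()]  # input vectors x_i
y_train = [r["ML"] for r in training.values()]       # labels y_i (ML grades)
X_test = [summarize(r) for r in test.values()]       # ML grade unknown ("?")
```

Changing `SELECTED_COURSES` is exactly the kind of representation refinement that model selection later has to choose among.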
  • 13. Example: Estimation
      Given the training data
        Student   ML grade
        x_1       A
        x_2       B
        ...       ...
      find a mapping from input vectors x to ‘labels’ y encoding the grades for the ML course.
      Possible solution (nearest neighbor classifier):
        1. For any student x in the test set, find the ‘closest’ student x_i in the training set
        2. Predict y_i, the grade of the closest student
    Example: Evaluation
      How can we tell how good our predictions are?
        • We can wait until the end of this course
        • We can try to assess the accuracy based on the data we already have (part of the training data)
      Possible solution:
        • Divide the training set further into training and test sets
        • Evaluate the classifier, constructed using only the smaller training set, on the new test set
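The two steps on this page, estimation with a nearest neighbor classifier and evaluation on a held-out part of the training data, fit together in a short sketch. The slides leave "closest" undefined, so Euclidean distance on numeric grade vectors is an assumption here. The 25% test fraction, the fixed random seed, and the toy vectors are arbitrary illustrative choices.

```python
import math
import random
from typing import List, Sequence, Tuple

# (feature vector x_i, grade label y_i)
Example = Tuple[List[float], str]


def nearest_neighbor_predict(train: Sequence[Example], x: List[float]) -> str:
    """Return the label y_i of the training vector x_i closest to x (Euclidean distance)."""
    _, label = min(train, key=lambda example: math.dist(example[0], x))
    return label


def holdout_accuracy(data: Sequence[Example], test_fraction: float = 0.25, seed: int = 0) -> float:
    """Split the labelled data into a smaller training set and a test set, then
    measure how often the 1-NN classifier built on the former is right on the latter."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return correct / n_test


# Toy data: two well-separated groups of (hypothetical) grade vectors.
data: List[Example] = [
    ([4.0, 4.0], "A"), ([4.0, 3.0], "A"), ([3.0, 4.0], "A"), ([3.0, 3.0], "A"),
    ([0.0, 0.0], "C"), ([0.0, 1.0], "C"), ([1.0, 0.0], "C"), ([1.0, 1.0], "C"),
]
```

Because 1-NN has no training phase beyond storing the examples, "constructing the classifier on the smaller training set" just means restricting the neighbor search to `train`. The held-out examples must never be searched, or the accuracy estimate becomes optimistic.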
  • 14. Example: Model Selection
      We can refine
        • the estimation algorithm (e.g., using a classifier other than the nearest neighbor classifier)
        • the representation (e.g., base the summaries on a different set of courses)
        • the assumptions (e.g., perhaps students work in groups)
        • etc.
      We have to rely on our method of evaluating the accuracy of our predictions to select among the possible refinements
    Types of Learning Approaches
      Supervised learning: we are given a set of training inputs and outputs
        • E.g., classification, regression
      Unsupervised learning: we are interested in capturing the inherent organization of the data
        • E.g., clustering, density estimation
      Reinforcement learning: we only get feedback on how well we are doing (not on what we should be doing)
        • E.g., planning
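Selecting among refinements by measured accuracy can be sketched with one concrete refinement of the estimation algorithm: the number of neighbors k in a k-nearest-neighbors classifier, which the course lists among its approaches. The candidate values of k, the majority-vote rule, Euclidean distance, and the toy data (including one deliberately mislabelled training point) are all assumptions for illustration. The validation set plays the role of the held-out part of the training data from the Evaluation page.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Example = Tuple[List[float], str]


def knn_predict(train: Sequence[Example], x: List[float], k: int) -> str:
    """Majority label among the k training vectors closest to x (Euclidean distance)."""
    neighbors = sorted(train, key=lambda example: math.dist(example[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def accuracy(train: Sequence[Example], test: Sequence[Example], k: int) -> float:
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def select_k(train: Sequence[Example], validation: Sequence[Example],
             candidates: Tuple[int, ...] = (1, 3, 5)) -> Tuple[int, Dict[int, float]]:
    """Score each candidate k on the validation set; return the first best k and all scores."""
    scores = {k: accuracy(train, validation, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores


train: List[Example] = [
    ([4.0, 4.0], "A"), ([4.0, 3.0], "A"), ([3.0, 4.0], "A"),
    ([3.5, 3.5], "B"),  # a noisy, mislabelled point inside the "A" region
    ([0.0, 0.0], "B"), ([0.0, 1.0], "B"), ([1.0, 0.0], "B"),
]
validation: List[Example] = [([3.6, 3.6], "A"), ([0.5, 0.5], "B")]

best_k, scores = select_k(train, validation)
```

With k = 1 the noisy point decides the first validation example and accuracy is 0.5. Voting over 3 or 5 neighbors outvotes it, so both reach 1.0, and the smallest such k (3) is chosen. Different data could favor a different k, which is why the choice is made by evaluation rather than fixed in advance.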
  • 15. Challenges of Data Mining
      Scalability
      Dimensionality/complexity
      Data quality
      Data ownership
      Privacy considerations
      Continually updated data
    Recap
      Difference between data mining and other research areas
      Applications of data mining
      Need for automation and the use of machine learning
      Key steps in machine learning
  • 16. About This Course
      This course does not:
        • give a comprehensive introduction to data mining
        • cover how to adapt data mining to specific applications
        • cover feature extraction
        • cover evaluation issues in detail
      This course does:
        • focus on the predominant approach in data mining: machine learning
        • sketch some example applications
        • introduce a representative selection of machine learning techniques used in data mining
        • focus on the algorithmic fundamentals of machine learning
    Approaches Covered
      Linear regression (regression)
      Decision trees (classification)
      Neural networks (classification)
      k-nearest neighbors (classification)
      Naive Bayes (classification)
      k-means (clustering)
      Hierarchical clustering (clustering)
  • 17. What to Get out of This Course
      At the end of this course you will have learned:
        • what types of problems can be addressed by data mining techniques
        • what the most common machine learning approaches in data mining are
        • which machine learning approaches are appropriate for a given type of data mining application
        • the algorithmic fundamentals of a number of relevant machine learning approaches
    Course Administrivia
      The exam counts for 40%, homework for 20%, and practical assignments for 40%
      Lectures are on Tuesdays 9-11am (D1.116)
      Tutorials (werkcolleges) are on Thursdays 9-11am (G0.05) and Fridays 9-11am (G5.29)
      Labs are on Thursdays 11am-1pm (G0.18) or Fridays 11am-1pm (G0.18)
  • 18. Course Administrivia
      Teaching assistants:
        • Yijin He (email: jiyinhe@gmail.com) (English only!)
        • Spyros Martzoukos (email: S.Martzoukos@uva.nl) (English only!)
      Course web page: on Blackboard
      Check the course web page regularly for announcements, slides, ...