An Efficient Two-Step Method for
Classification of Spatial Data
Authors: Krzysztof Koperski, Jiawei Han, Nebojsa Stefanovic
Presented at: International Symposium on Spatial Data Handling (SDH’98)
Reviewed by: Abhishek Agrawal
Introduction
• Spatial databases hold very large amounts of spatial data, collected and used in applications
ranging from remote sensing and geographical information systems (GIS) to computer
cartography, environmental assessment and planning.
• These databases contain many interesting implicit spatial relations and patterns that are not
stored explicitly and have to be extracted.
• One spatial data mining technique is the classification of the spatial objects stored in such
databases: the objective is to label spatial objects by identifying a set of rules that describes
the partition into classes.
Classification Approach : Spatial Decision Tree
❖ In this paper [1], the authors use a decision tree to classify spatial objects based on
➢ Non-spatial properties of the classified objects (the traditional approach)
➢ Spatial relations of the classified objects to other objects in the database
❖ The authors also analyze the classification of spatial objects in relation to thematic maps and
to spatial relationships with other objects in the database.
❖ For this approach to spatial classification, the authors report experimental results on both real
and synthetic data and compare the performance and quality of the results with existing methods
for the same problem.
Problem Definition
Business Problem: Label local business units such as shopping malls
or stores by their profit status, based on the influence of
their trade area.
Data Mining Problem: Classify spatial objects such as shopping
malls or stores, each described by its attributes, into one of two
classes, Y and N, determined by the attribute high_profit, which takes
the value Y for “yes” or N for “no”.
● In our example, objects OID1 and OID2
belong to class Y and objects OID3,
OID4 and OID5 belong to class N.
● We want to build a decision tree
classifying objects Oi based on two
types of information:
➢ descriptions of the objects in the
proximity of objects Oi
➢ non-spatial attributes of the
thematic map
State of the Art
● Fayyad et al. [2] used decision tree methods to classify images of stellar objects in order to detect
stars and galaxies. They used the low-level image processing system FOCAS to select and generate
basic attributes. The method works on image databases and is tailored to astronomical
attributes, so it is not suitable for vector data (GIS databases).
● Ester et al. [3] proposed an approach based on the ID3 algorithm that uses the concept of
neighbourhood graphs. It does not analyze aggregate values of non-spatial attributes of
neighbouring objects, and it performs no relevance analysis to narrow its search
space.
● Ng and Yu [4] described a method for extracting strong, common and discriminating
characteristics of clusters based on thematic maps. They did not extend these
characteristics to the construction of decision trees.
Classification Algorithm
Building a decision tree to classify spatial objects based on spatial predicates, functions and
thematic maps.
Input :
1. Spatial Database containing:
a. classified objects Oc
b. other spatial objects with non-spatial attributes
2. Geo-mining query specifying:
a. objects to be used, predictive attributes, predicates and functions
b. attribute, predicate or function used as a class label
Output :
Binary Decision Tree
Method: Spatial Decision Tree
1. Collect a set S of classified objects and the other objects that are used to describe them
2. For a sample of spatial objects Oc from S:
a. Build sets of predicates describing all objects using coarse predicates, functions and
attributes.
b. Perform generalization of the sets of predicates based on concept hierarchies
c. Find relevant coarse predicates, functions and attributes using the RELIEF algorithm
3. Find the best buffer size for aggregates of thematic map polygons: for every relevant
non-spatial aggregate attribute, find the buffer size Xmax at which the information gain
of the aggregated attribute is maximal.
4. Build sets of predicates using relevant fine predicates and generalize based on concept
hierarchies.
5. Generate the decision tree
Step 1.a: Define the MBR (minimum bounding rectangle)
using the data distribution, with a confidence
level as threshold.
Step 2.a: Find a coarse description for the sample,
listing the spatial attributes, functions, etc.
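A coarse predicate can be evaluated on minimum bounding rectangles instead of exact geometries. The following is a minimal sketch of that idea, not the authors' implementation: the names MBR, mbr_of, mbr_distance and coarse_close_to, and the distance threshold, are all hypothetical. Because an MBR encloses its geometry, the distance between two MBRs never exceeds the distance between the geometries, so the coarse test may accept pairs that the exact test later rejects but does not miss true matches.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class MBR:
    """Axis-aligned minimum bounding rectangle (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def mbr_of(points: List[Point]) -> MBR:
    """Smallest axis-aligned rectangle enclosing all vertices of an object."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return MBR(min(xs), min(ys), max(xs), max(ys))


def mbr_distance(a: MBR, b: MBR) -> float:
    """Distance between two rectangles; 0.0 when they touch or overlap."""
    # Gap along each axis is zero when the projections overlap.
    dx = max(a.xmin - b.xmax, b.xmin - a.xmax, 0.0)
    dy = max(a.ymin - b.ymax, b.ymin - a.ymax, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def coarse_close_to(a: MBR, b: MBR, threshold: float) -> bool:
    """Coarse close_to test: cheap, may over-accept, refined later if relevant."""
    return mbr_distance(a, b) <= threshold
```

Only predicates that survive the relevance analysis need the more expensive exact computation in the fine step.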
Step 2.b : Generalize the predicates using concept
hierarchies
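Generalization replaces specific concepts in a predicate with their ancestors in a concept hierarchy, so that several fine-grained predicates collapse into one. A small sketch, assuming a hierarchy stored as a child-to-parent mapping; the concepts in HIERARCHY and the function names are made up for illustration and do not come from the paper.

```python
from typing import Dict, FrozenSet, Set, Tuple

# A predicate is (predicate name, concept of the related object).
Predicate = Tuple[str, str]

# Hypothetical concept hierarchy: child concept -> parent concept.
HIERARCHY: Dict[str, str] = {
    "elementary_school": "school",
    "high_school": "school",
    "school": "public_facility",
    "city_park": "park",
    "state_park": "park",
}


def generalize(concept: str, levels: int, hierarchy: Dict[str, str]) -> str:
    """Climb at most `levels` steps up the hierarchy, stopping at a root."""
    for _ in range(levels):
        if concept not in hierarchy:
            break
        concept = hierarchy[concept]
    return concept


def generalize_predicates(
    preds: Set[Predicate], levels: int, hierarchy: Dict[str, str]
) -> FrozenSet[Predicate]:
    """Generalize every predicate; duplicates merge because the result is a set."""
    return frozenset((name, generalize(c, levels, hierarchy)) for name, c in preds)
```

For example, close_to(elementary_school) and close_to(high_school) both become close_to(school) after one level of generalization.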
RELIEF ALGORITHM
Find Relevant Attributes
Step 2.c: For every object s in the sample, two nearest neighbours are found:
one belongs to the same class (Y/N) as s (nearest hit)
and the other belongs to a different class than s (nearest miss)
Step 2.c (continued): Update the weight of each predicate based on the neighbours' predicate values:
➔ If the nearest hit has the same predicate value, the weight of this predicate increases ↑
➔ If the nearest hit has a different predicate value, the weight of this predicate decreases ↓
➔ If the nearest miss has the same predicate value, the weight of this predicate decreases ↓
➔ If the nearest miss has a different predicate value, the weight of this predicate increases ↑
Predicates whose weight exceeds a threshold are selected as relevant
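The four update rules above can be sketched as follows. The slides give only the direction of each update, so several details here are assumptions: each object is represented as the set of predicates that hold for it, neighbours are found by Hamming distance over those sets, and every update has size 1/m for m sampled objects. All function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# (set of predicates that hold for the object, class label)
Sample = Tuple[FrozenSet[str], str]


def _nearest(idx: int, samples: List[Sample], same_class: bool) -> Optional[FrozenSet[str]]:
    """Nearest hit (same_class=True) or nearest miss of samples[idx]."""
    preds, label = samples[idx]
    best, best_d = None, None
    for i, (other, other_label) in enumerate(samples):
        if i == idx or (other_label == label) != same_class:
            continue
        d = len(preds ^ other)  # assumed metric: number of differing predicates
        if best_d is None or d < best_d:
            best, best_d = other, d
    return best


def relief_weights(samples: List[Sample], predicates: List[str]) -> Dict[str, float]:
    """Apply the hit/miss update rules once per sampled object."""
    m = len(samples)
    w = {p: 0.0 for p in predicates}
    for i, (preds, _) in enumerate(samples):
        hit = _nearest(i, samples, same_class=True)
        miss = _nearest(i, samples, same_class=False)
        for p in predicates:
            if hit is not None:   # same value as hit: up, different: down
                w[p] += (1 if (p in preds) == (p in hit) else -1) / m
            if miss is not None:  # different value from miss: up, same: down
                w[p] += (1 if (p in preds) != (p in miss) else -1) / m
    return w


def relevant(weights: Dict[str, float], threshold: float) -> List[str]:
    """Predicates whose weight exceeds the threshold."""
    return sorted(p for p, v in weights.items() if v > threshold)
```

With this step size the weights fall in the range -2 to 2, so a threshold chosen for one normalization does not carry over to another.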
Step 3: Find the best size for the buffer for aggregates of thematic map
polygons.
• Different criteria may be used for the shape of
the buffer: buffers may be based on
rings or on customer penetration polygons.
• Rings have some advantages:
1. ease of use
2. no need to determine the trade area from
customer data
3. easy comparison between sites
• Buffers represent the areas that have an impact
on the class label attribute of the classified objects.
• The buffer size is fixed by finding, for every
relevant non-spatial aggregate attribute,
the size Xmax at which the
information gain of the aggregated attribute
is maximal.
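The buffer-size search can be illustrated for a single aggregate attribute. This is a simplified sketch under stated assumptions: buffers are rings (circles) around each site, thematic map polygons are reduced to centroid points carrying one value, the aggregate is a sum, and information gain is taken from the best binary threshold on the aggregated value. The function names and the tie-break toward the smallest radius are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Site = Tuple[float, float]
Cell = Tuple[float, float, float]  # (x, y, attribute value) of a polygon centroid


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for c in set(labels):
        p = list(labels).count(c) / n
        h -= p * math.log2(p)
    return h


def best_threshold_gain(values: Sequence[float], labels: Sequence[str]) -> float:
    """Largest information gain over all binary splits value <= t."""
    n = len(labels)
    base = entropy(labels)
    best = 0.0
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        best = max(best, gain)
    return best


def aggregate_in_buffer(site: Site, cells: List[Cell], radius: float) -> float:
    """Sum the attribute over centroids that fall inside the ring buffer."""
    sx, sy = site
    return sum(v for x, y, v in cells if math.hypot(x - sx, y - sy) <= radius)


def best_buffer_size(sites: List[Site], labels: List[str], cells: List[Cell],
                     radii: Sequence[float]) -> Tuple[float, float]:
    """Return (Xmax, gain): the radius with maximal gain, smallest on ties."""
    scored = []
    for r in radii:
        values = [aggregate_in_buffer(s, cells, r) for s in sites]
        scored.append((best_threshold_gain(values, labels), r))
    best_gain = max(g for g, _ in scored)
    return min(r for g, r in scored if g == best_gain), best_gain
```

With several aggregate attributes the same search would be run per attribute; how the per-attribute results are combined is not specified in the slides.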
Step 4 : Build sets of predicates using relevant fine predicates and
generalize based on concept hierarchies.
Step 5: Build the decision tree using binary splits chosen by information gain
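A binary tree of this kind can be grown by repeatedly splitting on the predicate with the highest information gain. The sketch below is a generic illustration rather than the paper's procedure: objects are sets of predicates, each internal node tests whether one predicate holds, and growth stops when a node is pure or no predicate gives positive gain. The tuple-based tree representation and all names are assumptions.

```python
import math
from collections import Counter
from typing import FrozenSet, List, Sequence, Tuple, Union

Sample = Tuple[FrozenSet[str], str]      # (predicates that hold, class label)
Tree = Union[str, Tuple[str, "Tree", "Tree"]]  # leaf label or (predicate, yes, no)


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(samples: List[Sample], pred: str) -> float:
    """Entropy reduction from splitting on whether `pred` holds."""
    n = len(samples)
    labels = [l for _, l in samples]
    yes = [l for p, l in samples if pred in p]
    no = [l for p, l in samples if pred not in p]
    return entropy(labels) - len(yes) / n * entropy(yes) - len(no) / n * entropy(no)


def build_tree(samples: List[Sample], predicates: List[str]) -> Tree:
    """Grow a binary tree; leaves carry the majority class label."""
    labels = [l for _, l in samples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not predicates:
        return majority
    best = max(predicates, key=lambda p: info_gain(samples, p))
    if info_gain(samples, best) <= 1e-12:  # no useful split left
        return majority
    yes = [s for s in samples if best in s[0]]
    no = [s for s in samples if best not in s[0]]
    rest = [p for p in predicates if p != best]
    return (best, build_tree(yes, rest), build_tree(no, rest))


def classify(tree: Tree, preds: FrozenSet[str]) -> str:
    """Follow the yes/no branches until a leaf label is reached."""
    while isinstance(tree, tuple):
        pred, yes, no = tree
        tree = yes if pred in preds else no
    return tree
```

Restricting the candidate predicates to those selected by the relevance analysis is what keeps the resulting tree small.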
Complexity Analysis
Results & Performance Evaluation
• Experiments were performed on synthetic data merged with TIGER U.S. census data for
Washington State.
• With real data, the best results were obtained with a threshold between 0 and 0.2, and accuracy
increased drastically when relevance analysis was used.
Conclusion and Future Directions
• Classification of geographical objects enables researchers to explore
interesting relations between spatial and non-spatial data.
• The algorithm uses less costly, approximate spatial computations and
relevance analysis to produce smaller and more accurate decision trees.
• Pre-computed spatial indexes can be stored and used in regular spatial
queries to find neighbourhood attributes.
• The authors plan to perform experiments using aggregate values for thematic
maps and varying the distance for close_to spatial predicates.
• They also plan to integrate the method with their spatial data mining prototype, GeoMiner.
References
[1] Koperski, Krzysztof, Jiawei Han, and Nebojsa Stefanovic. "An efficient two-step method for
classification of spatial data." Proceedings of the International Symposium on Spatial Data Handling
(SDH’98). 1998.
[2] Fayyad, Usama M., S. George Djorgovski, and Nicholas Weir. "Automating the analysis and
cataloging of sky surveys." Advances in knowledge discovery and data mining. American
Association for Artificial Intelligence, 1996.
[3] Ester, Martin, Hans-Peter Kriegel, and Jörg Sander. "Spatial data mining: A database approach."
Advances in spatial databases. Springer Berlin Heidelberg, 1997.
[4] Ng, R. T., and Y. Yu. "Discovering Strong, Common and Discriminating Characteristics of Clusters
from Thematic Maps." Proc. of the 11th Annual Symp. on Geographic Information Systems. 1997.

  • 1. An Efficient Two-Step Method for Classification of Spatial Data Authors : Krzysztof Koperski, Jiawei Han, Nebojsa Stefanovic Presented on : Spatial Data Handling (SDH’ 98) Reviewed by: Abhishek Agrawal
  • 2. Introduction • In spatial databases very large amounts of Spatial Data have been collected used in various applications ranging from remote sensing to geographical information systems (GIS), computer cartography, environmental assessment and planning etc. • These spatial databases contains many hidden and interesting implicit spatial relations and patterns which are extracted which are not explicitly stored in such databases. • One of the spatial data mining techniques is the classification of the spatial objects stored in the spatial databases where the objective is to label different spatial objects by identifying set of rules that can describe the partition.
  • 3. Classification Approach : Spatial Decision Tree ❖ In this paper[1], authors have used decision tree to classify spatial objects based on ➢ Non-Spatial properties of the classified objects (Traditional) ➢ Spatial relations of the classified objects to other objects in the database ❖ Also, authors have analyzed the problem of classification of spatial objects in relevance to thematic maps and and spatial relationships to other objects in the database. ❖ With the new approach of spatial classification using decision tree, authors provided the experimental results of both real and synthetic data to compare the performance and quality of the results with other existing methods in the same problem space.
  • 4. Business Problem: Label the local business units such as shopping malls or stores based on their business profit status based on the influence of their trade area. Problem Definition
  • 5. Problem Definition Continue.. Data Mining Problem: Classification of spatial objects such as shopping malls or stores defined by its attributes, that belong to two or different classes Y and N which are selected based on attribute high_profit with two values Y for “yes” and N for “no”.
  • 6. ● In our example, objects OID1 and OID2 belong to class Y and objects OID3, OID4 and OID5 belong to class N.
  • 7. ● In our example, objects OID1 and OID2 belong to class Y and objects OID3, OID4 and OID5 belong to class N. ● We want to build a decision tree classifying objects Oi based on two types of information: ➢ descriptions of the objects in the proximity of objects Oi
  • 8. ● In our example, objects OID1 and OID2 belong to class Y and objects OID3, OID4 and OID5 belong to class N. ● We want to build a decision tree classifying objects Oi based on two types of information: ➢ descriptions of the objects in the proximity of objects Oi ➢ non-spatial attributes of the thematic map
  • 9. State of the Art ● Fayyad et al. [2] used decision tree methods to classify images of stellar objects in order to detect stars and galaxies. They used the low-level image processing system FOCAS to select and generate basic attributes. The method deals with image databases and is tailored to astronomical attributes, so it is not suitable for the vector data format (GIS databases). ● Another approach, by Ester et al. [3], is based on the ID3 algorithm and uses the concept of neighbourhood graphs. It does not analyze aggregate values of non-spatial attributes for neighbouring objects, nor does it perform any relevance analysis to narrow its search space. ● Ng and Yu [4] described a method for extracting strong, common and discriminating characteristics of clusters based on thematic maps. They did not extend the resulting characteristics to the construction of decision trees.
  • 10. Classification Algorithm Builds a decision tree to classify spatial objects based on spatial predicates, functions and thematic maps. Input : 1. Spatial database containing: a. classified objects Oc b. other spatial objects with non-spatial attributes 2. Geo-mining query specifying: a. objects to be used, predictive attributes, predicates and functions b. attribute, predicate or function used as the class label Output : Binary decision tree
  • 11. Method: Spatial Decision Tree 1. Collect a set S of classified objects and other objects that are used for description 2. For a sample of spatial objects Oc from S: a. Build sets of predicates describing all objects using coarse predicates, functions and attributes. b. Perform generalization of the sets of predicates based on concept hierarchies c. Find coarse predicates, functions, and relevant attributes using the RELIEF algorithm 3. Find the best buffer size for aggregates of thematic map polygons: for all relevant non-spatial aggregate attributes, find the buffer size Xmax at which the information gain of the aggregated attribute is maximal. 4. Build sets of predicates using relevant fine predicates and generalize based on concept hierarchies. 5. Generate the decision tree
  • 13. Method: Spatial Decision Tree 1. Collect a set S of classified objects and other objects that are used for description 2. For a sample of spatial objects Oc from S: a. Build sets of predicates describing all objects using coarse predicates, functions and attributes. Step 1.a : Define the MBR (minimum bounding rectangle) using the data distribution and a confidence level as threshold.
  • 14. Method: Spatial Decision Tree 1. Collect a set S of classified objects and other objects that are used for description 2. For a sample of spatial objects Oc from S: a. Build sets of predicates describing all objects using coarse predicates, functions and attributes. Step 2.a : Find a coarse description of the sample that lists its spatial attributes, functions, etc.
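One way to picture a coarse predicate is to evaluate it on minimum bounding rectangles instead of exact geometries. The sketch below is only an illustration under that assumption: the function names (`mbr`, `mbr_distance`, `coarse_close_to`), the coordinates and the distance thresholds are all hypothetical and are not taken from the paper. Because the distance between two MBRs never exceeds the distance between the shapes they enclose, such a test can admit false positives but does not reject a pair that is truly close, which is why a finer check can follow later.

```python
import math

def mbr(points):
    """Minimum bounding rectangle (min_x, min_y, max_x, max_y) of a point list."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def mbr_distance(a, b):
    """Smallest distance between two axis-aligned rectangles (0 if they overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)

def coarse_close_to(geom_a, geom_b, threshold):
    """Coarse close_to test: compares MBRs only, never the exact shapes."""
    return mbr_distance(mbr(geom_a), mbr(geom_b)) <= threshold

# Hypothetical geometries given as vertex lists.
mall = [(0, 0), (2, 0), (2, 2), (0, 2)]
lake = [(5, 0), (7, 1), (6, 3)]

near = coarse_close_to(mall, lake, threshold=4.0)
far = coarse_close_to(mall, lake, threshold=2.0)
```

Here the two rectangles are 3 units apart, so the pair passes the coarse test at a threshold of 4 and fails it at a threshold of 2.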
  • 15. Method: Spatial Decision Tree 1. Collect a set S of classified objects and other objects that are used for description 2. For a sample of spatial objects Oc from S: a. Build sets of predicates describing all objects using coarse predicates, functions and attributes. b. Perform generalization of the sets of predicates based on concept hierarchies Step 2.b : Generalize the predicates using concept hierarchies
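Generalization along a concept hierarchy can be sketched as replacing the object concept in each predicate with its ancestor. The hierarchy, the predicate names and the `(predicate, concept)` representation below are hypothetical, chosen only to illustrate the idea; the paper's own hierarchies are not reproduced here.

```python
# Hypothetical concept hierarchy: child concept -> parent concept.
PARENT = {
    "elementary_school": "school",
    "high_school": "school",
    "school": "public_building",
    "hospital": "public_building",
    "creek": "water",
    "lake": "water",
}

def generalize(concept, levels=1):
    """Climb `levels` steps up the hierarchy, stopping early at a root."""
    for _ in range(levels):
        if concept not in PARENT:
            break
        concept = PARENT[concept]
    return concept

def generalize_predicates(predicates, levels=1):
    """Generalize the concept in each (predicate_name, concept) pair.

    A set is returned, so predicates that coincide after generalization merge.
    """
    return {(name, generalize(concept, levels)) for name, concept in predicates}

description = {
    ("close_to", "elementary_school"),
    ("close_to", "high_school"),
    ("close_to", "lake"),
}
generalized = generalize_predicates(description)
```

After one level of generalization the two school predicates collapse into a single `("close_to", "school")` entry, which shrinks the description that later steps have to process.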
  • 16. Method: Spatial Decision Tree 1. Collect a set S of classified objects and other objects that are used for description 2. For a sample of spatial objects Oc from S: a. Build sets of predicates describing all objects using coarse predicates, functions and attributes. b. Perform generalization of the sets of predicates based on concept hierarchies c. Find coarse predicates, functions, and relevant attributes using the RELIEF algorithm RELIEF ALGORITHM Find Relevant Attributes Step 2.c : For every object s in the sample, two nearest neighbours are found: one that belongs to the same class (Y/N) as s (the nearest hit) and one that belongs to a different class from s (the nearest miss)
  • 17. Method: Spatial Decision Tree 1. Collect a set S of classified objects and other objects that are used for description 2. For a sample of spatial objects Oc from S: a. Build sets of predicates describing all objects using coarse predicates, functions and attributes. b. Perform generalization of the sets of predicates based on concept hierarchies c. Find coarse predicates, functions, and relevant attributes using the RELIEF algorithm RELIEF ALGORITHM Find Relevant Attributes Step 2.c : Assign weights to each predicate based on the predicates of the neighbours: ➔ If the nearest hit has the same predicate value, the weight of this predicate increases ↑ ➔ If the nearest hit has a different predicate value, the weight of this predicate decreases ↓ ➔ If the nearest miss has the same predicate value, the weight of this predicate decreases ↓ ➔ If the nearest miss has a different predicate value, the weight of this predicate increases ↑ The relevant predicates are then those whose weight exceeds a threshold.
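The four weighting rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes binary predicate values, uses Hamming distance to find the nearest hit and nearest miss, and scales each update by 1/(2m) so that weights stay in [-1, 1]. The sample data, the function names and the threshold of 0.1 are hypothetical.

```python
def hamming(a, b):
    """Number of predicates on which two objects disagree."""
    return sum(x != y for x, y in zip(a, b))

def relief_weights(rows, labels):
    """Weight each predicate using the nearest hit and nearest miss of every object."""
    m = len(rows)
    step = 1.0 / (2 * m)          # keeps every weight within [-1, 1]
    weights = [0.0] * len(rows[0])
    for i, (row, label) in enumerate(zip(rows, labels)):
        hits = [j for j in range(m) if j != i and labels[j] == label]
        misses = [j for j in range(m) if labels[j] != label]
        if not hits or not misses:
            continue
        hit = rows[min(hits, key=lambda j: hamming(row, rows[j]))]
        miss = rows[min(misses, key=lambda j: hamming(row, rows[j]))]
        for p in range(len(weights)):
            # Nearest hit: same value raises the weight, different value lowers it.
            weights[p] += step if hit[p] == row[p] else -step
            # Nearest miss: different value raises the weight, same value lowers it.
            weights[p] += step if miss[p] != row[p] else -step
    return weights

# Five objects as in the running example: two in class Y, three in class N.
# Predicate 0 follows the class label; predicate 1 is unrelated to it.
rows = [(1, 0), (1, 1), (0, 0), (0, 1), (0, 0)]
labels = ["Y", "Y", "N", "N", "N"]

weights = relief_weights(rows, labels)
threshold = 0.1                   # hypothetical relevance threshold
relevant = [p for p, w in enumerate(weights) if w > threshold]
```

On this toy sample, predicate 0 ends up with a high positive weight and predicate 1 with a negative one, so only predicate 0 passes the threshold.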
  • 19. Method: Spatial Decision Tree Step 3 : Find the best size for the buffer for aggregates of thematic map polygons. • Different criteria may be used for the shape of the buffer: buffers may be based on rings or on customer penetration polygons. • Rings have some advantages: 1. ease of use, 2. no need to determine the trade area from customer data, 3. easy comparison between sites
  • 20. Method: Spatial Decision Tree Step 3 : Find the best size for the buffer for aggregates of thematic map polygons. • Buffers represent the areas that have an impact on the class label attribute of the classified objects. • The buffer size is fixed by finding, for all relevant non-spatial aggregate attributes, the size Xmax at which the information gain of the aggregated attribute is maximal.
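The search for Xmax can be sketched as a loop over candidate ring radii that keeps the radius with the highest information gain. The example below makes several simplifying assumptions that are not from the paper: thematic map polygons are reduced to centroid points carrying one attribute value, the aggregate is a plain sum, and the gain of a numeric attribute is taken as the gain of its best binary threshold split. All names, coordinates and values are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_gain(values, labels):
    """Information gain of the best binary split `value <= t` on a numeric attribute."""
    n = len(labels)
    base = entropy(labels)
    best = 0.0
    for t in sorted(set(values))[:-1]:   # skip the maximum: it would leave one side empty
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        best = max(best, gain)
    return best

def aggregate(site, blocks, radius):
    """Sum the attribute of all blocks whose centroid lies within `radius` of the site."""
    return sum(value for x, y, value in blocks
               if math.hypot(x - site[0], y - site[1]) <= radius)

def best_buffer_size(sites, labels, blocks, radii):
    """Return the radius with maximal information gain, plus the gain per radius."""
    gains = {r: best_gain([aggregate(s, blocks, r) for s in sites], labels)
             for r in radii}
    return max(gains, key=gains.get), gains

# Hypothetical sites (x, y) with class labels, and blocks (x, y, population).
sites = [(0, 0), (10, 0), (20, 0), (30, 0)]
labels = ["Y", "Y", "N", "N"]
blocks = [(0, 2, 500), (10, 2, 600), (20, 2, 100), (30, 2, 50), (15, 8, 2000)]

x_max, gains = best_buffer_size(sites, labels, blocks, radii=[1, 3, 12])
```

In this toy data a radius of 1 captures no blocks, a radius of 3 captures only each site's own neighbourhood and separates the classes cleanly, and a radius of 12 pulls in a large block shared by a Y site and an N site, which blurs the separation. The intermediate radius therefore has the highest gain.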
  • 21. Method: Spatial Decision Tree Step 4 : Build sets of predicates using relevant fine predicates and generalize based on concept hierarchies.
  • 24. Method: Spatial Decision Tree Step 5 : Build Decision Tree : Binary Split (based on information gain)
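A binary split on information gain can be illustrated with a small recursive tree builder over predicate sets. This is a generic sketch of the technique rather than the authors' procedure: each object is assumed to be a set of predicate strings that hold for it, and the predicate names and the five sample objects are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, pred):
    """Gain from splitting on whether `pred` holds for an object."""
    yes = [l for r, l in zip(rows, labels) if pred in r]
    no = [l for r, l in zip(rows, labels) if pred not in r]
    if not yes or not no:
        return 0.0
    n = len(labels)
    return entropy(labels) - len(yes) / n * entropy(yes) - len(no) / n * entropy(no)

def build_tree(rows, labels, predicates):
    """Return a class label (leaf) or a (predicate, yes_subtree, no_subtree) tuple."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not predicates:
        return majority
    best = max(predicates, key=lambda p: info_gain(rows, labels, p))
    if info_gain(rows, labels, best) <= 0:
        return majority
    rest = [p for p in predicates if p != best]
    yes = [(r, l) for r, l in zip(rows, labels) if best in r]
    no = [(r, l) for r, l in zip(rows, labels) if best not in r]
    return (best,
            build_tree([r for r, _ in yes], [l for _, l in yes], rest),
            build_tree([r for r, _ in no], [l for _, l in no], rest))

def classify(tree, row):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, tuple):
        pred, yes_branch, no_branch = tree
        tree = yes_branch if pred in row else no_branch
    return tree

# Hypothetical predicate sets for OID1..OID5 (classes Y, Y, N, N, N).
rows = [
    {"close_to(highway)", "high_income_area"},
    {"close_to(highway)"},
    {"high_income_area"},
    set(),
    {"close_to(water)"},
]
labels = ["Y", "Y", "N", "N", "N"]
predicates = ["close_to(highway)", "high_income_area", "close_to(water)"]

tree = build_tree(rows, labels, predicates)
prediction = classify(tree, {"close_to(highway)", "close_to(water)"})
```

With this sample, `close_to(highway)` separates the two classes completely, so the tree consists of a single binary split on that predicate.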
  • 26. Results & Performance Evaluation • Experiments were performed on synthetic data merged with TIGER U.S. census data for Washington State. • With real data, the best results were obtained with a threshold between 0 and 0.2, and accuracy increased drastically when relevance analysis was used.
  • 27. Conclusion and Future Directions • Classification of geographical objects enables researchers to explore interesting relations between spatial and non-spatial data. • The algorithm uses less costly, approximate spatial computations and relevance analysis to produce smaller and more accurate decision trees. • Pre-computed spatial indexes can be stored and used in regular spatial queries to find neighbourhood attributes. • The authors plan to perform experiments using aggregate values for thematic maps and varying the distance for the close_to spatial predicate. • They also plan to integrate the method with their spatial data mining prototype, GeoMiner.
  • 28. References [1] Koperski, Krzysztof, Jiawei Han, and Nebojsa Stefanovic. "An efficient two-step method for classification of spatial data." Proceedings of the International Symposium on Spatial Data Handling (SDH’98). 1998. [2] Fayyad, Usama M., S. George Djorgovski, and Nicholas Weir. "Automating the analysis and cataloging of sky surveys." Advances in Knowledge Discovery and Data Mining. American Association for Artificial Intelligence, 1996. [3] Ester, Martin, Hans-Peter Kriegel, and Jörg Sander. "Spatial data mining: A database approach." Advances in Spatial Databases. Springer Berlin Heidelberg, 1997. [4] Ng, R. T., and Y. Yu. "Discovering Strong, Common and Discriminating Characteristics of Clusters from Thematic Maps." Proc. of the 11th Annual Symp. on Geographic Information Systems. 1997.