This document summarizes an article on classifying spatial data using a two-step decision tree method. It introduces spatial classification and describes the authors' approach of using spatial relationships and attributes in decision trees. The method collects classified spatial objects, builds predicate descriptions, performs relevance analysis to identify important attributes, determines optimal buffer sizes, constructs the decision tree using fine-grained predicates, and evaluates performance on real and synthetic datasets.
1. An Efficient Two-Step Method for
Classification of Spatial Data
Authors : Krzysztof Koperski, Jiawei Han, Nebojsa Stefanovic
Presented at: International Symposium on Spatial Data Handling (SDH’98)
Reviewed by: Abhishek Agrawal
2. Introduction
• Spatial databases hold very large amounts of spatial data, collected and used in applications
ranging from remote sensing to geographical information systems (GIS), computer cartography,
and environmental assessment and planning.
• These databases contain many interesting implicit spatial relations and patterns that are not
explicitly stored and therefore have to be extracted.
• One spatial data mining technique is the classification of the spatial objects stored in these
databases, where the objective is to label the objects by identifying a set of rules that
describes the partition.
3. Classification Approach : Spatial Decision Tree
❖ In this paper [1], the authors use a decision tree to classify spatial objects based on
➢ Non-spatial properties of the classified objects (traditional)
➢ Spatial relations of the classified objects to other objects in the database
❖ The authors also analyze the classification of spatial objects in relation to thematic maps
and to spatial relationships with other objects in the database.
❖ For this new approach to spatial classification, the authors report experimental results on
both real and synthetic data, comparing performance and result quality with existing methods
for the same problem.
4. Problem Definition
Business Problem: Label local business units such as shopping malls
or stores by their profit status, based on the influence of
their trade area.
5. Problem Definition (continued)
Data Mining Problem: Classify spatial objects such as shopping
malls or stores, each described by its attributes, into one of two
classes, Y and N, given by the attribute high_profit with the
values Y for “yes” and N for “no”.
8. ● In our example, objects OID1 and OID2
belong to class Y and objects OID3,
OID4 and OID5 belong to class N.
● We want to build a decision tree
classifying objects Oi based on two
types of information:
➢ descriptions of the objects in the
proximity of objects Oi
➢ non-spatial attributes of the
thematic map
9. State of the Art
● Fayyad et al. [2] used decision tree methods to classify images of stellar objects in order to
detect stars and galaxies. They used the low-level image processing system FOCAS to select and
generate basic attributes. The method deals with image databases and is tailored to astronomical
attributes, so it is not suitable for vector data (GIS databases).
● Another approach, by Ester et al. [3], is based on the ID3 algorithm and uses the concept of
neighbourhood graphs. This method does not analyze aggregate values of non-spatial attributes
for neighbouring objects, nor does it perform any relevance analysis to narrow its search
space.
● Ng and Yu [4] described a method for extracting strong, common and discriminating
characteristics of clusters based on thematic maps. They did not extend these characteristics
to the construction of decision trees.
10. Classification Algorithm
Building a decision tree to classify spatial objects based on spatial predicates, functions and
thematic maps.
Input :
1. Spatial Database containing:
a. classified objects Oc
b. other spatial objects with non-spatial attributes
2. Geo-mining query specifying:
a. objects to be used, predictive attributes, predicates and functions
b. attribute, predicate or function used as a class label
Output :
Binary Decision Tree
11. Method: Spatial Decision Tree
1. Collect a set S of classified objects and other objects that are used for description.
2. For a sample of spatial objects Oc from S:
a. Build sets of predicates describing all objects using coarse predicates, functions and
attributes.
b. Generalize the sets of predicates based on concept hierarchies.
c. Find the relevant coarse predicates, functions and attributes using the RELIEF algorithm.
3. Find the best buffer size for aggregates of thematic map polygons: for each relevant
non-spatial aggregate attribute, find the buffer size Xmax at which the information gain for
the aggregated attribute is maximum.
4. Build sets of predicates using relevant fine predicates and generalize them based on concept
hierarchies.
5. Generate the decision tree.
13. Method: Spatial Decision Tree
Step 1.a: Define the MBR (minimum bounding rectangle)
using the data distribution, with the confidence
level as a threshold.
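The slides do not say how the MBRs are used afterwards, so the following is only a minimal sketch of one common use: computing an MBR from an object's vertices and running a cheap, approximate close_to test on the rectangles before any exact geometry is touched. The names MBR, mbr_of, mbr_distance and coarse_close_to are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class MBR:
    """Axis-aligned minimum bounding rectangle."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def mbr_of(points):
    """Return the MBR enclosing a non-empty list of (x, y) vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return MBR(min(xs), min(ys), max(xs), max(ys))


def mbr_distance(a, b):
    """Smallest distance between two MBRs (0.0 if they touch or overlap)."""
    dx = max(a.xmin - b.xmax, b.xmin - a.xmax, 0.0)
    dy = max(a.ymin - b.ymax, b.ymin - a.ymax, 0.0)
    return hypot(dx, dy)


def coarse_close_to(a, b, threshold):
    """Approximate close_to predicate evaluated on MBRs only (an assumption)."""
    return mbr_distance(a, b) <= threshold
```

Because an MBR encloses its object, the MBR distance never exceeds the true distance, so this coarse test can only over-report closeness; a fine predicate would then be checked on the candidates that pass.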
14. Method: Spatial Decision Tree
Step 2.a: Find a coarse description of the sample,
listing its spatial attributes, functions, etc.
15. Method: Spatial Decision Tree
Step 2.b : Generalize the predicates using concept
hierarchies
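As an illustration of Step 2.b, generalization can be sketched as climbing a concept hierarchy and replacing each predicate argument by an ancestor concept. The hierarchy contents, the (name, argument) predicate encoding and the function names below are assumptions made for this example, not taken from the paper.

```python
# Hypothetical concept hierarchy: child concept -> parent concept.
HIERARCHY = {
    "elementary_school": "school",
    "high_school": "school",
    "school": "public_building",
    "lake": "water",
    "river": "water",
}


def ancestors(concept):
    """Return the chain [concept, parent, grandparent, ...] up to the root."""
    chain = [concept]
    while chain[-1] in HIERARCHY:
        chain.append(HIERARCHY[chain[-1]])
    return chain


def generalize(concept, levels_up=1):
    """Climb at most `levels_up` levels, stopping at the root if reached first."""
    chain = ancestors(concept)
    return chain[min(levels_up, len(chain) - 1)]


def generalize_predicates(predicates, levels_up=1):
    """Generalize the argument of every (name, argument) predicate.

    Predicates that become identical after generalization collapse into one,
    because the result is a set.
    """
    return {(name, generalize(arg, levels_up)) for name, arg in predicates}
```

For example, close_to(elementary_school) and close_to(high_school) both generalize to close_to(school), which shrinks the predicate set the later steps have to examine.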
16. Method: Spatial Decision Tree
RELIEF ALGORITHM
Find Relevant Attributes
Step 2.c: For every object s in the sample, two nearest neighbours are found:
one that belongs to the same class (Y/N) as s (the nearest hit)
and one that belongs to a different class than s (the nearest miss).
17. Method: Spatial Decision Tree
RELIEF ALGORITHM
Find Relevant Attributes
Step 2.c: Assign a weight to each predicate based on the predicate values of the nearest neighbours:
➔ If the nearest hit has the same predicate value, the weight for this predicate increases ↑
➔ If the nearest hit has a different predicate value, the weight for this predicate decreases ↓
➔ If the nearest miss has the same predicate value, the weight for this predicate decreases ↓
➔ If the nearest miss has a different predicate value, the weight for this predicate increases ↑
Predicates whose weight exceeds a threshold are selected as relevant.
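The weighting rule above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes each object is a tuple of 0/1 predicate values, uses Hamming distance to find the nearest hit and miss, and changes a weight by exactly 1 per rule before averaging over the sample. All function names are hypothetical.

```python
def hamming(a, b):
    """Number of predicate positions in which two objects differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest(index, samples, labels, same_class):
    """Index of the nearest hit (same_class=True) or nearest miss (False)."""
    candidates = [
        j for j in range(len(samples))
        if j != index and (labels[j] == labels[index]) == same_class
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda j: hamming(samples[index], samples[j]))


def relief_weights(samples, labels):
    """Average per-predicate weight following the four rules on the slide."""
    n_pred = len(samples[0])
    weights = [0.0] * n_pred
    for i, s in enumerate(samples):
        hit = nearest(i, samples, labels, True)
        miss = nearest(i, samples, labels, False)
        for p in range(n_pred):
            if hit is not None:
                # Same value as the nearest hit: increase; different: decrease.
                weights[p] += 1 if samples[hit][p] == s[p] else -1
            if miss is not None:
                # Different value from the nearest miss: increase; same: decrease.
                weights[p] += 1 if samples[miss][p] != s[p] else -1
    return [w / len(samples) for w in weights]


def relevant(weights, threshold):
    """Indices of the predicates whose weight exceeds the threshold."""
    return [p for p, w in enumerate(weights) if w > threshold]
```

A predicate that agrees with the class label on every object ends up with a high positive weight, while one that varies independently of the class drifts towards a negative weight and is filtered out by the threshold.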
19. Method: Spatial Decision Tree
Step 3: Find the best size for the buffer for aggregates of thematic map
polygons.
• Different criteria may be used for the shape of
the buffer. Buffers may be based on rings or on
customer penetration polygons.
• Rings have some advantages:
1. ease of use
2. no need to determine the trade area from
customer data
3. easy comparison between sites
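A ring buffer makes the aggregation itself very simple, as the sketch below suggests. It assumes each thematic map polygon (for example a census block) is reduced to a centroid with attribute values and counts a block if its centroid lies within the ring radius; a real GIS would intersect the polygon geometries with the ring instead. The dictionary keys and the function name are hypothetical.

```python
from math import hypot


def ring_aggregate(site, blocks, radius, attribute):
    """Sum `attribute` over all blocks whose centroid lies within `radius` of `site`.

    `site` is an (x, y) pair; each block is a dict with "x", "y" and attribute keys.
    """
    sx, sy = site
    return sum(
        block[attribute]
        for block in blocks
        if hypot(block["x"] - sx, block["y"] - sy) <= radius
    )
```

Since every site uses the same shape and radius, aggregates computed this way are directly comparable between sites, which is the third advantage listed above.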
20. Method: Spatial Decision Tree
Step 3: Find the best size for the buffer for aggregates of thematic map
polygons.
• Buffers represent the areas that have an impact
on the class label attribute of the classified objects.
• The buffer size is chosen by finding, for each
relevant non-spatial aggregate attribute,
the buffer size Xmax at which the
information gain for the aggregated attribute
is maximum.
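The search for Xmax can be sketched as a loop over candidate buffer sizes that keeps the size with the highest information gain. The sketch assumes the aggregate values have already been computed for every candidate size and that gain is measured for the best binary threshold split of the aggregated attribute; both choices, and the function names, are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def best_gain(values, labels):
    """Largest information gain over all binary splits `value <= t`."""
    base = entropy(labels)
    best = 0.0
    # Skipping the largest value keeps both sides of every split non-empty.
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        best = max(best, base - remainder)
    return best


def best_buffer_size(aggregates_by_size, labels):
    """Return the buffer size whose aggregated attribute has maximum gain.

    `aggregates_by_size` maps each candidate size to one aggregate value per object.
    """
    return max(aggregates_by_size, key=lambda size: best_gain(aggregates_by_size[size], labels))
```

In the method as summarized here, this search would be repeated for each relevant non-spatial aggregate attribute.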
21. Method: Spatial Decision Tree
Step 4 : Build sets of predicates using relevant fine predicates and
generalize based on concept hierarchies.
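Once every object is described by a set of fine predicates, a binary decision tree can be grown by repeatedly splitting on the predicate with the highest information gain. The sketch below is a generic ID3-style construction, not the authors' exact procedure; it assumes each object is a Python set of predicate strings, and the predicate names in the usage note are made up.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def split_gain(rows, labels, predicate):
    """Information gain of splitting on whether `predicate` holds for an object."""
    yes = [l for r, l in zip(rows, labels) if predicate in r]
    no = [l for r, l in zip(rows, labels) if predicate not in r]
    if not yes or not no:
        return 0.0
    remainder = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
    return entropy(labels) - remainder


def build_tree(rows, labels, predicates):
    """Return a class label (leaf) or a dict with "predicate", "yes" and "no"."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not predicates:
        return majority
    best = max(predicates, key=lambda p: split_gain(rows, labels, p))
    if split_gain(rows, labels, best) <= 0.0:
        return majority
    rest = [p for p in predicates if p != best]
    yes_idx = [i for i, r in enumerate(rows) if best in r]
    no_idx = [i for i, r in enumerate(rows) if best not in r]
    return {
        "predicate": best,
        "yes": build_tree([rows[i] for i in yes_idx], [labels[i] for i in yes_idx], rest),
        "no": build_tree([rows[i] for i in no_idx], [labels[i] for i in no_idx], rest),
    }


def classify(tree, row):
    """Walk the tree for one object and return the predicted class label."""
    while isinstance(tree, dict):
        tree = tree["yes"] if tree["predicate"] in row else tree["no"]
    return tree
```

With five objects labelled Y, Y, N, N, N, as in the earlier example, and a hypothetical predicate such as high_income_area that holds only for the two Y objects, the tree reduces to a single split on that predicate.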
26. Results & Performance Evaluation
• Experiments were performed on synthetic data merged with TIGER U.S. census data for
Washington State.
• With real data, the best results were obtained with a threshold between 0 and 0.2, and accuracy
increased markedly when relevance analysis was used.
27. Conclusion and Future Directions
• Classification of geographical objects enables researchers to explore
interesting relations between spatial and non-spatial data.
• The algorithm uses less costly, approximate spatial computations and
relevance analysis to produce smaller and more accurate decision trees.
• Pre-computed spatial indexes can be stored and used by regular spatial
queries to find neighbourhood attributes.
• The authors plan to perform experiments using aggregate values for thematic
maps and varying the distance for close_to spatial predicates.
• They also plan to integrate the method with their spatial data mining prototype, GeoMiner.
28. References
[1] Koperski, Krzysztof, Jiawei Han, and Nebojsa Stefanovic. "An efficient two-step method for
classification of spatial data." Proceedings of the International Symposium on Spatial Data Handling
(SDH’98). 1998.
[2] Fayyad, Usama M., S. George Djorgovski, and Nicholas Weir. "Automating the analysis and
cataloging of sky surveys." Advances in knowledge discovery and data mining. American
Association for Artificial Intelligence, 1996.
[3] Ester, Martin, Hans-Peter Kriegel, and Jörg Sander. "Spatial data mining: A database approach."
Advances in spatial databases. Springer Berlin Heidelberg, 1997.
[4] Ng, R. T., and Y. Yu. "Discovering Strong, Common and Discriminating Characteristics of Clusters
from Thematic Maps." Proc. of the 11th Annual Symp. on Geographic Information Systems. 1997.