3. “The function of the brain and nervous
system is to protect us from being
overwhelmed and confused by this mass of
largely useless and irrelevant knowledge,
by shutting out most of what we should
otherwise perceive or remember at any
moment, and leaving only that very small
and special selection which is likely to
be practically useful.”
-Aldous Huxley
7. Operation Point Cluster
• Review general clustering
algorithms
• Suggest strategies &
implementations for clustering
for web applications
– Server-side (C#)
– Offline w/ArcGIS (Python)
– Offline w/3rd Party (Python)
8. Data Classification
(One Dimensional Clustering)
• Equal-interval
– Clusters have same max – min
(interval)
• Quantile
– Clusters have same count
• Natural Breaks (Jenks)
– Clusters minimize deviation from
their own class mean
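The first two schemes above can be sketched in a few lines of plain Python. This is a minimal illustration only: the function names `equal_interval_breaks` and `quantile_breaks` are hypothetical, and Natural Breaks (Jenks) is left out because it needs an optimization step that does not fit a short sketch.

```python
def equal_interval_breaks(values, k):
    """Upper bounds of k classes that all span the same width, (max - min) / k."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]


def quantile_breaks(values, k):
    """Upper bounds of k classes that each hold roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    # Index of the last value in each class, clamped to the valid range.
    return [ordered[max(0, min(n - 1, (n * i) // k - 1))] for i in range(1, k + 1)]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
    print(equal_interval_breaks(data, 3))  # one outlier stretches every interval
    print(quantile_breaks(data, 3))        # each class gets three values
```

With a skewed input like the one in the usage example, equal-interval puts almost every value in the first class, while quantile spreads the values evenly across classes.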
10. KMeans
(Centroid-based)
1. Choose random starting points
2. Assign each target point to
the nearest centroid
3. Replace each centroid
point with mean of its group.
4. Repeat steps 2 & 3 until
convergence.
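The four steps above can be sketched in plain Python as follows. This is an illustrative version under stated assumptions, not a production implementation: points are 2D `(x, y)` tuples, the function name `kmeans` and its `max_iter` and `seed` parameters are hypothetical, and convergence is taken to mean that the centroids stop changing.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster 2D points into k groups; returns (centroids, groups)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 1: random starting points
    groups = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            groups[nearest].append(p)
        # Step 3: move each centroid to the mean of its group.
        new_centroids = []
        for i, group in enumerate(groups):
            if group:
                mean_x = sum(x for x, _ in group) / len(group)
                mean_y = sum(y for _, y in group) / len(group)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster in place
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, groups
```

Because the starting points are random, different seeds can give different results on data that is not well separated.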
12. Grid Clustering
(Grid-based)
1. Overlay mesh sized appropriately
for zoom level
2. Compare point coordinates to
mesh to create clusters.
• Very common on client-side
• Can lead to undesired “Grid”
effect
• Somewhat non-deterministic
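A minimal sketch of the two steps above, in plain Python. The function name `grid_cluster` and the `cell_size` parameter are hypothetical, and the caller is assumed to choose `cell_size` to suit the current zoom level.

```python
import math
from collections import defaultdict


def grid_cluster(points, cell_size):
    """Group 2D points by the grid cell they fall in; one cluster per occupied cell."""
    cells = defaultdict(list)
    for x, y in points:
        # Compare the point's coordinates to the mesh to find its cell.
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append((x, y))
    clusters = []
    for members in cells.values():
        n = len(members)
        clusters.append({
            "x": sum(x for x, _ in members) / n,  # cluster is drawn at the mean position
            "y": sum(y for _, y in members) / n,
            "count": n,
        })
    return clusters
```

For example, with `cell_size=10` the points `(9, 9)` and `(11, 11)` land in different clusters even though they are close together, which is one way the "Grid" effect shows up.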
14. QuadTree
(Distance-based)
1. Input minimum cluster tolerance
2. Recursively insert points into
existing tree
1. Where distance < tolerance, number
of points++
2. Where distance > tolerance, insert
to child node.
• Easy to implement
• Can lead to “Grid” effect
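One way to read the insertion rule above is sketched below in Python. It is an interpretation rather than the implementation behind the slides: the class name `ClusterNode` is hypothetical, each node is assumed to keep the first point it receives as the cluster's representative, and the child node is assumed to be the quadrant of the node's bounds that contains the new point.

```python
import math


class ClusterNode:
    """Quadtree node holding one cluster: a representative point and a count."""

    def __init__(self, xmin, ymin, xmax, ymax):
        self.bounds = (xmin, ymin, xmax, ymax)
        self.point = None   # representative (first) point of this cluster
        self.count = 0
        self.children = {}  # quadrant key -> ClusterNode

    def insert(self, x, y, tolerance):
        if self.point is None:  # empty node: start a new cluster here
            self.point, self.count = (x, y), 1
            return
        if math.hypot(x - self.point[0], y - self.point[1]) < tolerance:
            self.count += 1     # distance < tolerance: number of points++
            return
        # Otherwise: insert into the child quadrant that contains the point.
        xmin, ymin, xmax, ymax = self.bounds
        xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
        east, north = x >= xmid, y >= ymid
        key = (east, north)
        if key not in self.children:
            self.children[key] = ClusterNode(
                xmid if east else xmin,
                ymid if north else ymin,
                xmax if east else xmid,
                ymax if north else ymid,
            )
        self.children[key].insert(x, y, tolerance)

    def clusters(self):
        """Yield (representative point, count) for every cluster in the tree."""
        if self.point is not None:
            yield self.point, self.count
        for child in self.children.values():
            yield from child.clusters()
```

A point is only compared with the nodes on its own path down the tree, so two nearby points on opposite sides of a quadrant boundary can end up in separate clusters.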
16. DBSCAN
(Density-based)
1. Takes search radius and
minimum number of points for
cluster
2. Visit each point and count
number of points in search
radius
• Clusters can be any shape
• Search radius determined by zoom
level
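The steps above can be sketched in plain Python (3.8 or later, for `math.dist`). This is a brute-force illustration with hypothetical names (`dbscan`, `eps`, `min_pts`). It assumes a point counts as its own neighbor, and it leaves the choice of search radius per zoom level to the caller.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""

    def neighbors(i):
        # Every point within the search radius of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # too few neighbors: noise (for now)
            continue
        cluster += 1                # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is also a core point: keep expanding
    return labels
```

Each neighbor query scans every point, so this version is only suitable for small inputs. A spatial index would be the usual way to speed it up.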
18. Where should clustering
occur?
Client-side
• Small number of points ( < 10,000 )
• No additional server load
• Widely available within client APIs
• Limited by client-side languages
Server-side
• Medium number of points ( < 1M )
• Many language/library options
• Robust querying
• Very maintainable / extensible
Offline
• Large number of points ( > 1M )
• Many language/library options
• Limited querying
• Output normal feature class
19. Clustering Server Object Extension
(C#/QuadTree)
1. Extends MapServer
2. Wraps map query based on extent
3. Returns clustered results
4. Stateless
5. Problems
1. Re-calculates tree on each request
2. Client-side wrappers
3. Lost out-of-box ArcGIS Server
functions
20. Clustering with Arcpy
(distance-based / offline)
1. Divide data into logical
chunks (where clause)
2. Integrate using tolerance
3. Collect Events
4. Spatial Join to
add descriptive statistics
5. Append all results
21. Clustering w/Python
• Numpy/Scipy
– De facto standard
• Scikit-Learn
– (Python machine learning library)
• PyTables
– HDF5, akin to NetCDF, but with
support for hierarchical tables and
very scalable
– http://bcdcspatial.blogspot.com/2013/02/converting-arcgis-feature-class-to.html