Brendan Collins
“The function of the brain and nervous
system is to protect us from being
overwhelmed and confused by this mass of
largely useless and irrelevant knowledge,
by shutting out most of what we should
otherwise perceive or remember at any
moment, and leaving only that very small
and special selection which is likely to
be practically useful.”

                          -Aldous Huxley
103,000 Public Schools
(No Clustering)
103,000 Public Schools
(Count)
103,000 Public Schools
(Mean Student Teacher Ratio)
Operation Point Cluster
• Review general clustering
  algorithms
• Suggest strategies &
  implementations for clustering
  for web applications
 – Server-side (C#)
 – Offline w/ArcGIS (Python)
 – Offline w/3rd Party (Python)
Data Classification
       (One Dimensional Clustering)

• Equal-interval
 – Clusters have same max – min
   (interval)
• Quantile
 – Clusters have same count
• Natural Breaks (Jenks)
 – Clusters minimize each value's
   deviation from its class mean
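
As a rough sketch of the first two schemes above, the following computes class upper bounds for equal-interval and quantile classification. The function names and the sample values are illustrative assumptions, not taken from the slides, and Natural Breaks (Jenks) is not covered here.

```python
def equal_interval_breaks(values, k):
    """Upper bounds of k classes that all span the same (max - min) / k width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]


def quantile_breaks(values, k):
    """Upper bounds of k classes that each hold roughly len(values) / k items."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k - 1] for i in range(1, k + 1)]


def classify(value, breaks):
    """Index of the first class whose upper bound is >= value."""
    for index, upper in enumerate(breaks):
        if value <= upper:
            return index
    return len(breaks) - 1


# Hypothetical student-teacher ratios with one outlier.
ratios = [10, 12, 13, 14, 15, 16, 18, 40]
print(equal_interval_breaks(ratios, 3))
print(quantile_breaks(ratios, 3))
```

With an outlier in the data, equal-interval classes tend to leave most values in the lowest class, while quantile classes keep the counts balanced at the cost of uneven class widths.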
KMeans
(Centroid-based)

1.   Choose random starting points
     (initial centroids)
2.   Assign each target point to
     its nearest centroid
3.   Replace each centroid with
     the mean of its group.
4.   Repeat steps 2 & 3 until
     convergence.
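
The four steps can be sketched in plain Python as below. This is a minimal illustration for 2-D points, assuming the starting centroids are drawn from the input points and that convergence means no assignment changed; the `kmeans` name, the `seed` parameter, and the iteration cap are assumptions for the example.

```python
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Basic k-means on a list of (x, y) tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: choose k distinct input points as the starting centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iterations):
        # Step 2: assign each point to its nearest centroid (squared distance).
        new_labels = [
            min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
            for x, y in points
        ]
        # Step 4: stop once no assignment changes between passes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels
```

Because the starting points are random, different seeds can give different clusters on less cleanly separated data; a fixed seed is used here only to make runs repeatable.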
Grid Clustering
            (Grid-based)

1.   Overlay a mesh sized appropriately
    for the zoom level
2. Assign each point to the mesh
    cell that contains it; each
    occupied cell becomes a cluster.
• Very common on client-side
• Can lead to undesired “Grid”
  effect
• Somewhat non-deterministic
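
A minimal sketch of the two steps above, assuming planar coordinates and a `cell_size` that the caller has already derived from the zoom level (that mapping is not shown, and the function name is illustrative):

```python
from collections import defaultdict


def grid_cluster(points, cell_size):
    """Group (x, y) points by mesh cell; return one summary dict per occupied cell."""
    cells = defaultdict(list)
    for x, y in points:
        # Integer cell indices; floor division also handles negative coordinates.
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))
    clusters = []
    for members in cells.values():
        n = len(members)
        clusters.append({
            "x": sum(p[0] for p in members) / n,  # cluster marker at the mean position
            "y": sum(p[1] for p in members) / n,
            "count": n,
        })
    return clusters
```

Two nearby points that fall on opposite sides of a cell boundary end up in different clusters, which is one source of the "Grid" effect noted above.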
QuadTree
(Distance-based)




                   http://en.wikipedia.org/wiki/QUADTREE
1. Input minimum cluster tolerance
2. Recursively insert points into
   existing tree
 1. Where distance < tolerance, number
    of points++
 2. Where distance > tolerance, insert
    to child node.
• Easy to implement
• Can lead to “Grid” effect
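
One way to read the insert rule above is as a point quadtree in which every node doubles as a cluster marker. The sketch below follows that reading; the class and function names are assumptions, and a production version would likely also bound the tree by the map extent.

```python
import math


class ClusterNode:
    """One node of a point quadtree; it also acts as a cluster marker."""

    def __init__(self, x, y):
        self.x, self.y = x, y
        self.count = 1
        self.children = {}  # quadrant key -> ClusterNode

    def insert(self, x, y, tolerance):
        # Within tolerance: absorb the point into this cluster.
        if math.hypot(x - self.x, y - self.y) < tolerance:
            self.count += 1
            return
        # Otherwise descend into the quadrant relative to this node's point.
        quadrant = (x >= self.x, y >= self.y)
        child = self.children.get(quadrant)
        if child is None:
            self.children[quadrant] = ClusterNode(x, y)
        else:
            child.insert(x, y, tolerance)

    def clusters(self):
        """Yield (x, y, count) for this node and all of its descendants."""
        yield (self.x, self.y, self.count)
        for child in self.children.values():
            yield from child.clusters()


def quadtree_cluster(points, tolerance):
    """Insert points in order and return the resulting (x, y, count) clusters."""
    if not points:
        return []
    root = ClusterNode(*points[0])
    for x, y in points[1:]:
        root.insert(x, y, tolerance)
    return list(root.clusters())
```

The result depends on insertion order, since the first point to land in a region fixes that cluster's position.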
DBSCAN
(Density-based)




                  http://en.wikipedia.org/wiki/DBSCAN

1.   Takes search radius and
    minimum number of points for
    cluster
2. Visit each point and count
    number of points in search
    radius
• Clusters can be any shape
• Search radius determined by zoom
  level
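
The procedure can be sketched as follows, using a brute-force radius search for clarity. The names are illustrative, and a real implementation would normally use a spatial index for the neighbor lookup; choosing `radius` from the zoom level is left to the caller.

```python
import math


def dbscan(points, radius, min_points):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within the search radius, including i itself.
        px, py = points[i]
        return [
            j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= radius
        ]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_points:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_points:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels
```

Points that never fall within the radius of a sufficiently dense neighborhood stay labeled -1, so sparse outliers are reported as noise rather than forced into a cluster.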
Strategies & Implementations for Web Apps
    (Server Object Extension vs. Pre-Crunched)
Where should clustering
            occur?
Client-side
              • Small number of points ( < 10,000 )
              • No additional server load
              • Widely available within client APIs
              • Limited by client-side languages
Server-side
              • Medium number of points ( < 1M )
              • Many language/library options
              • Robust querying
              • Very maintainable / extensible
Offline
              • Large number of points ( > 1M )
              • Many language/library options
              • Limited querying
              • Output Normal Feature Class
Clustering Server Object Extension
           (C#/QuadTree)

1. Extends MapServer
2. Wraps map query based on extent
3. Returns clustered results
4. Stateless
5. Problems
 1. Re-calculates tree on each request
 2. Client-side wrappers
 3. Lost out-of-box ArcGIS Server
    functions
Clustering with Arcpy
     (distance-based / offline)

1. Divide data into logical
   chunks (where clause)
2. Integrate using tolerance
3. Collect Events
4. Spatial Join
   – add descriptive statistics
5. Append all results
Clustering w/Python
• Numpy/Scipy
  – De facto standard
• Scikit-Learn
  – (Python machine learning library)
• PyTables
  – HDF5, akin to NetCDF, but with
    support for hierarchical tables and
    very scalable
  – http://bcdcspatial.blogspot.com/2013
    /02/converting-arcgis-feature-class-
    to.html
Scikit-Learn




       SciKit – Learn…btw it’s awesome - http://scikit-learn.org/stable/
Bleeding Edge Python
• PyPy, Cython, Anaconda,
  NumbaPro, Pandas
• Python is now a first-class
  citizen on the GPU!
In Summary:
• Clustering is not Panning
• Think outside Count
• Clustering is not only for
  spatial data
Thank You!



Follow us on Twitter:
     @blueraster
     @brendancol



Visit us at:
     blueraster.com/blog
     bcdcspatial.blogspot.com

 
Javascript Editing Tools Made Easy Blue Raster - Esri Developer Summit 2013 L...
Javascript Editing Tools Made Easy Blue Raster - Esri Developer Summit 2013 L...Javascript Editing Tools Made Easy Blue Raster - Esri Developer Summit 2013 L...
Javascript Editing Tools Made Easy Blue Raster - Esri Developer Summit 2013 L...
 
Moving ArcGIS Servers to AWS Cloud Hosting - NCES, Blue Raster, Sanametrix - ...
Moving ArcGIS Servers to AWS Cloud Hosting - NCES, Blue Raster, Sanametrix - ...Moving ArcGIS Servers to AWS Cloud Hosting - NCES, Blue Raster, Sanametrix - ...
Moving ArcGIS Servers to AWS Cloud Hosting - NCES, Blue Raster, Sanametrix - ...
 
What's New with the NCES SDDS and Web Mapping Tools - Blue Raster and Sanamet...
What's New with the NCES SDDS and Web Mapping Tools - Blue Raster and Sanamet...What's New with the NCES SDDS and Web Mapping Tools - Blue Raster and Sanamet...
What's New with the NCES SDDS and Web Mapping Tools - Blue Raster and Sanamet...
 
Innovative Data Collection Techniques for Public School Boundaries - Blue Ras...
Innovative Data Collection Techniques for Public School Boundaries - Blue Ras...Innovative Data Collection Techniques for Public School Boundaries - Blue Ras...
Innovative Data Collection Techniques for Public School Boundaries - Blue Ras...
 

Operation Point Cluster - Blue Raster Esri Developer Summit 2013 Presentation

  • 3. “The function of the brain and nervous system is to protect us from being overwhelmed and confused by this mass of largely useless and irrelevant knowledge, by shutting out most of what we should otherwise perceive or remember at any moment, and leaving only that very small and special selection which is likely to be practically useful.” -Aldous Huxley
  • 6. 103,000 Public Schools (Mean Student Teacher Ratio)
  • 7. Operation Point Cluster • Review general clustering algorithms • Suggest strategies & implementations for clustering for web applications – Server-side (C#) – Offline w/ArcGIS (Python) – Offline w/3rd Party (Python)
  • 8. Data Classification (One Dimensional Clustering) • Equal-interval – Each cluster spans the same range (max – min) • Quantile – Each cluster holds the same number of values • Natural Breaks (Jenks) – Clusters minimize deviation from their own mean
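A minimal sketch of the first two schemes on slide 8, assuming plain Python lists of numbers. The function names (`equal_interval_breaks`, `quantile_breaks`, `classify`) are illustrative and not from the talk; Natural Breaks (Jenks) is left out because it needs an optimization step.

```python
def equal_interval_breaks(values, k):
    """Upper bounds of k classes that each span the same range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]


def quantile_breaks(values, k):
    """Upper bounds of k classes that each hold roughly the same count."""
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for i in range(1, k + 1):
        last = (n * i + k - 1) // k - 1  # index of the last value in class i
        breaks.append(ordered[last])
    return breaks


def classify(value, breaks):
    """Index of the first class whose upper bound is >= value."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks) - 1


# Hypothetical student/teacher ratios with one outlier.
ratios = [10, 12, 13, 15, 16, 18, 20, 40]
print(equal_interval_breaks(ratios, 3))
print(quantile_breaks(ratios, 4))
```

With skewed data like this, equal-interval puts almost every value in the first class, while quantile spreads them evenly.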
  • 10. KMeans (Centroid-based) 1. Choose random starting centroids 2. Assign each point to its nearest centroid 3. Replace each centroid with the mean of its group. 4. Repeat steps 2 & 3 until convergence.
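The four KMeans steps on slide 10 can be sketched in pure Python as below. This is an illustration under stated assumptions, not the presenter's code: points are `(x, y)` tuples, distance is Euclidean, and the `seed` and `max_iter` parameters are additions for reproducibility.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, groups) for 2-D points."""
    rng = random.Random(seed)
    # Step 1: choose k random starting centroids from the data.
    centroids = rng.sample(points, k)
    groups = []
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(
                range(k),
                key=lambda i: (x - centroids[i][0]) ** 2 + (y - centroids[i][1]) ** 2,
            )
            groups[nearest].append((x, y))
        # Step 3: replace each centroid with the mean of its group
        # (an empty group keeps its previous centroid).
        updated = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        # Step 4: stop once the centroids no longer move.
        if updated == centroids:
            break
        centroids = updated
    return centroids, groups


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, groups = kmeans(pts, 2)
print(centroids)
```

Because the starting centroids are random, different seeds can give different results on less clearly separated data.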
  • 12. Grid Clustering (Grid-based) 1. Overlay a mesh sized appropriately for the zoom level 2. Compare point coordinates to the mesh to create clusters. • Very common on client-side • Can lead to undesired “Grid” effect • Somewhat non-deterministic
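A rough sketch of the grid approach on slide 12, assuming `(x, y)` points in map units and a square mesh. `grid_cluster` and `cell_size` are hypothetical names; in a web map the cell size would be derived from the current zoom level.

```python
from collections import defaultdict


def grid_cluster(points, cell_size):
    """Bucket points by mesh cell and return one cluster per occupied cell."""
    cells = defaultdict(list)
    for x, y in points:
        # Compare the coordinates to the mesh: the cell index is the key.
        key = (int(x // cell_size), int(y // cell_size))
        cells[key].append((x, y))
    clusters = []
    for members in cells.values():
        n = len(members)
        cx = sum(p[0] for p in members) / n
        cy = sum(p[1] for p in members) / n
        clusters.append({"x": cx, "y": cy, "count": n})
    return clusters


pts = [(0.9, 0.5), (1.1, 0.5), (5.5, 5.5)]
print(grid_cluster(pts, cell_size=1))  # the two neighbours straddle a cell edge
print(grid_cluster(pts, cell_size=2))  # a coarser mesh merges them
```

The first call shows the “Grid” effect: two points only 0.2 units apart land in separate clusters because a cell boundary runs between them.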
  • 13. QuadTree (Distance-based) http://en.wikipedia.org/wiki/QUADTREE
  • 14. QuadTree (Distance-based) 1. Input minimum cluster tolerance 2. Recursively insert points into existing tree: (a) where distance < tolerance, increment the node's point count; (b) where distance > tolerance, insert into a child node. • Easy to implement • Can lead to “Grid” effect
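One possible reading of the QuadTree steps on slide 14, sketched as a point quadtree in which every node stands for one cluster. The `QuadNode` class and `quadtree_cluster` function are illustrative assumptions; the C# implementation mentioned in the talk may differ.

```python
class QuadNode:
    """A node holds one cluster seed, its point count, and up to four children."""

    def __init__(self, x, y):
        self.x, self.y, self.count = x, y, 1
        self.children = {}  # quadrant key -> QuadNode

    def insert(self, x, y, tolerance):
        dist = ((x - self.x) ** 2 + (y - self.y) ** 2) ** 0.5
        if dist < tolerance:
            # Close enough: absorb the point into this cluster.
            self.count += 1
            return
        # Otherwise descend into the quadrant relative to this node's seed.
        quadrant = (x >= self.x, y >= self.y)
        child = self.children.get(quadrant)
        if child is None:
            self.children[quadrant] = QuadNode(x, y)
        else:
            child.insert(x, y, tolerance)

    def clusters(self):
        yield (self.x, self.y, self.count)
        for child in self.children.values():
            yield from child.clusters()


def quadtree_cluster(points, tolerance):
    """Return (x, y, count) tuples, one per cluster."""
    if not points:
        return []
    root = QuadNode(*points[0])
    for x, y in points[1:]:
        root.insert(x, y, tolerance)
    return list(root.clusters())


pts = [(0, 0), (0.1, 0.1), (5, 5), (5.1, 5), (-3, 4)]
print(quadtree_cluster(pts, tolerance=1))
```

In this sketch the result depends on insertion order, and two nearby points that fall into different quadrants of an ancestor never merge, which is one way the “Grid” effect can appear.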
  • 15. DBSCAN (Density-based) http://en.wikipedia.org/wiki/DBSCAN
  • 16. DBSCAN (Density-based) 1. Takes search radius and minimum number of points for cluster 2. Visit each point and count number of points in search radius • Clusters can be any shape • Search radius determined by zoom level
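The DBSCAN procedure on slide 16 can be sketched in pure Python as follows. This is a naive O(n²) illustration, assuming `(x, y)` tuples and Euclidean distance; `eps` stands for the search radius and `min_pts` for the minimum number of points, and in a web map `eps` would be chosen per zoom level.

```python
def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""

    def neighbors(i):
        # All points within the search radius of point i (including i itself).
        xi, yi = points[i]
        return [
            j for j, (xj, yj) in enumerate(points)
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2
        ]

    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))
```

For larger datasets the neighbor search would normally use a spatial index, or a library implementation such as the one in Scikit-Learn.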
  • 17. Strategies & Implementations for Web Apps (Server Object Extension vs. Pre-Crunched)
  • 18. Where should clustering occur? Client-side: • Small number of points (< 10,000) • No additional server load • Widely available within client APIs • Limited by client-side languages Server-side: • Medium number of points (< 1M) • Many language/library options • Robust querying • Very maintainable / extensible Offline: • Large number of points (> 1M) • Many language/library options • Limited querying • Outputs a normal feature class
  • 19. Clustering Server Object Extension (C#/QuadTree) 1. Extends MapServer 2. Wraps map query based on extent 3. Returns clustered results 4. Stateless 5. Problems: (a) Re-calculates tree on each request (b) Client-side wrappers (c) Lost out-of-box ArcGIS Server functions
  • 20. Clustering with Arcpy (distance-based / offline) 1. Divide data into logical chunks (where clause) 2. Integrate using tolerance 3. Collect Events 4. Spatial Join to add descriptive statistics 5. Append all results
  • 21. Clustering w/Python • NumPy/SciPy – De facto standard • Scikit-Learn – (Python machine learning library) • PyTables – HDF5, akin to NetCDF, but with support for hierarchical tables and very scalable – http://bcdcspatial.blogspot.com/2013/02/converting-arcgis-feature-class-to.html
  • 22. Scikit-Learn – btw it’s awesome - http://scikit-learn.org/stable/
  • 23. Bleeding Edge Python • PyPy, Cython, Anaconda, NumbaPro, Pandas • Python is now a first-class citizen on the GPU!
  • 24. In Summary: • Clustering is not Panning • Think outside Count • Clustering is not only for spatial data
  • 25. Thank You! Follow us on Twitter: @blueraster @brendancol Visit us at: blueraster.com/blog bcdcspatial.blogspot.com