Geo Analytics Tutorial - Where 2.0 2011
Pete Skomoroch
Sr. Data Scientist - LinkedIn (@peteskomoroch)

#geoanalytics
** Hadoop Intro slides from Kevin Weil, Twitter
Topics
‣   Data Science & Geo Analytics
‣   Useful Geo tools and Datasets
‣   Hadoop, Pig, and Big Data
‣   Cleaning Location Data with Mechanical Turk
‣   Spatial Tweet Analytics with Hadoop & Python
‣   Using Social Data to Understand Cities
‣   Q&A
Analytics & Data are Hot Topics
Data Exhaust
               My Delicious Tags
Data Science




       * http://www.drewconway.com/zia/?p=2378
Data Visualization

‣   http://www.dataspora.com/blog/
Spatial Analysis

                   Map by Dr. John Snow of London,
                   showing clusters of cholera cases in
                   the 1854 Broad Street cholera
                   outbreak. This was one of the first
                   uses of map-based spatial analysis.
Spatial Analysis

• Spatial regression - estimate dependencies between variables
• Gravity models - estimate the flow of people, material, or information between locations
• Spatial interpolation - estimate variables at unobserved locations based on other measured values
• Simulation - use models and data to predict spatial phenomena
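As one concrete instance of spatial interpolation, here is a minimal Python sketch of inverse-distance weighting (IDW). The slides do not name a specific interpolation method, so IDW is an illustrative choice, and the function names, the `power` parameter, and the sample readings are all hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(min(1.0, a)))

def idw_estimate(target, samples, power=2.0):
    """Estimate a value at target=(lat, lon) from (lat, lon, value) samples.

    Each sample is weighted by 1 / distance**power, so nearby
    measurements dominate the estimate.
    """
    num = 0.0
    den = 0.0
    for lat, lon, value in samples:
        d = haversine_km(target[0], target[1], lat, lon)
        if d == 0.0:
            return value  # target coincides with a measured location
        w = 1.0 / d ** power
        num += w * value
        den += w
    if den == 0.0:
        raise ValueError("need at least one sample")
    return num / den

# Hypothetical readings: the target lies midway between two samples,
# so the estimate comes out at about 15.
readings = [(0.0, 1.0, 10.0), (0.0, -1.0, 20.0)]
print(idw_estimate((0.0, 0.0), readings))
```

A larger `power` makes the estimate follow the nearest sample more closely; `power=2` is a common default rather than a recommendation from the talk.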
Life Span & Food by Zip Code




* http://zev.lacounty.gov/news/health/death-by-zip-code
* http://www.verysmallarray.com/?p=975
Where Americans Are Moving (IRS Data)

‣   (Jon Bruner) http://jebruner.com/2010/06/the-migration-map/
Facebook Connectivity (Pete Warden)




* http://petewarden.typepad.com/searchbrowser/2010/02/how-to-split-up-the-us.html
Useful Geo Tools

• R, Matlab, SciPy, Commercial Geo Software
• R Spatial Pkgs http://cran.r-project.org/web/views/Spatial.html
• Hadoop, Amazon EC2, Mechanical Turk
• Data Science Toolkit: http://www.datasciencetoolkit.org/
• 80% of effort is often in cleaning and processing data
DataScienceToolkit.org

• Runs on VM or Amazon EC2
• Street Address to Coordinates
• Coordinates to Political Areas
• Geodict (text extraction)
• IP Address to Coordinates
• New UK release on GitHub
Resources for location data

ā€¢ SimpleGeo
ā€¢ Factual
ā€¢ Geonames
ā€¢ Infochimps
ā€¢ Data.gov
ā€¢ DataWrangling.com
Hadoop: Motivation

  • We want to crunch 1 TB of Twitter stream data and understand spatial patterns in Tweets
  • Data collected from the Twitter “Garden Hose” API last Spring
Data is Getting Big
‣   NYSE: 1 TB/day
‣   Facebook: 20+ TB compressed/day
‣   CERN/LHC: 40 TB/day (15 PB/year!)
‣   And growth is accelerating
‣   Need multiple machines, horizontal scalability
Hadoop
‣   Distributed file system (hard to store a PB)
‣   Fault-tolerant, handles replication, node failure, etc.
‣   MapReduce-based parallel computation (even harder to process a PB)
‣   Generic key-value based computation interface allows for wide applicability
‣   Open source, top-level Apache project
‣   Scalable: Y! has a 4000-node cluster
‣   Powerful: sorted a TB of random integers in 62 seconds
MapReduce?
cat file | grep geo | sort | uniq -c > output
‣   Challenge: how many tweets per county, given tweets table?
‣   Input: key=row, value=tweet info
‣   Map: output key=county, value=1
‣   Shuffle: sort by county
‣   Reduce: for each county, sum
‣   Output: county, tweet count
‣   With 2x machines, runs close to 2x faster.
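The map, shuffle, and reduce steps for the tweets-per-county challenge can be sketched in plain Python on a single machine. This is only an in-memory illustration of the data flow, not Hadoop code; the `county` field and the sample tweets are assumptions made for the example.

```python
from collections import defaultdict

def map_tweet(row_id, tweet):
    """Map: emit (county, 1) for every tweet that carries a county."""
    county = tweet.get("county")  # hypothetical field name
    if county:
        yield county, 1

def shuffle(pairs):
    """Shuffle: group all values by key and sort the groups by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_county(county, ones):
    """Reduce: sum the 1s emitted for a single county."""
    return county, sum(ones)

def tweets_per_county(tweets):
    mapped = [pair
              for row_id, tweet in enumerate(tweets)
              for pair in map_tweet(row_id, tweet)]
    return [reduce_county(county, ones) for county, ones in shuffle(mapped)]

# Hypothetical sample rows; the last one has no county and is skipped.
sample = [
    {"county": "Kings", "text": "a"},
    {"county": "Cook", "text": "b"},
    {"county": "Kings", "text": "c"},
    {"county": None, "text": "d"},
]
print(tweets_per_county(sample))  # [('Cook', 1), ('Kings', 2)]
```

On a cluster, the map and reduce functions run in parallel on different machines and the framework performs the shuffle, which is where the near-linear speedup comes from.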
But...
‣   Analysis typically done in Java
‣   Single-input, two-stage data flow is rigid
‣   Projections, filters: custom code
‣   Joins: lengthy, error-prone
‣   n-stage jobs: hard to manage
‣   Prototyping/exploration requires compilation
‣   analytics in Eclipse? ur doin it wrong...
Enter Pig

‣   High level language
‣   Transformations on sets of records
‣   Process data one step at a time
‣   Easier than SQL?
Why Pig?
‣   Because I bet you can read the following script.
A Real Pig Script

‣   Now, just for fun... the same calculation in vanilla Hadoop MapReduce.
No, seriously.
Pig Simplifies Analysis

‣   The Pig version is:
    ‣   5% of the code, 5% of the development time
    ‣   Within 50% of the execution time
‣   Pig and Geo:
    ‣   Programmable: fuzzy matching, custom filtering
    ‣   Easily link multiple datasets, regardless of size/structure
    ‣   Iterative, quick
A Real Example

‣   Fire up your Elastic MapReduce Cluster.
    ‣   ... or follow along at http://bit.ly/whereanalytics
‣   I used Twitter’s streaming API to store some tweets
‣   Simplest thing: group by location and count with Pig
    ‣   http://bit.ly/where20pig
‣   Here comes some code!
tweets = LOAD 's3://where20demo/sample-tweets' as (
  user_screen_name:chararray,
  tweet_id:chararray,
  ...
  user_friends_count:int,
  user_statuses_count:int,
  user_location:chararray,
  user_lang:chararray,
  user_time_zone:chararray,
  place_id:chararray,
  ...);
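-- Keep tweets with a profile location (assumes missing values appear as the literal string 'NULL')
tweets_with_location = FILTER tweets BY user_location != 'NULL';
-- Lowercase so that 'London' and 'london' fall into the same group
normalized_locations = FOREACH tweets_with_location GENERATE LOWER(user_location) AS user_location;
grouped_tweets = GROUP normalized_locations BY user_location PARALLEL 10;
-- $0 is the group key (location); $1 is the bag of rows for that location.
-- SIZE($1) counts rows (tweets), despite the user_count name.
location_counts = FOREACH grouped_tweets GENERATE $0 AS location, SIZE($1) AS user_count;
sorted_counts = ORDER location_counts BY user_count DESC;
-- Output path chosen to match the hadoop dfs -cat command used to inspect the results
STORE sorted_counts INTO '/global_location_counts';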
hadoop@ip-10-160-113-142:~$ hadoop dfs -cat /global_location_counts/part* | head -30

brasil              37985
indonesia           33777
brazil              22432
london              17294
usa                 14564
são paulo           14238
new york            13420
tokyo               10967
singapore           10225
rio de janeiro      10135
los angeles         9934
california          9386
chicago             9155
uk                  9095
jakarta             9086
germany             8741
canada              8201
                    7696
                    7121
jakarta, indonesia  6480
nyc                 6456
new york, ny        6331
Neat, but...

‣   Wow, that data is messy!
    ‣   brasil, brazil at #1 and #3
    ‣   new york, nyc, and new york, ny all in the top 30
‣   Mechanical Turk to the rescue...
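To see what merging these variants buys, here is a small Python sketch that folds the duplicate spellings from the output above into one canonical name using a hand-written alias table. The alias table and function names are hypothetical, and this is a baseline for illustration rather than the method used in the talk: a hand-built table does not scale to the long tail of location strings, which is the motivation for using Mechanical Turk.

```python
from collections import Counter

# Hypothetical hand-built alias table: variant spelling -> canonical name.
ALIASES = {
    "brasil": "brazil",
    "nyc": "new york",
    "new york, ny": "new york",
}

def canonical(location):
    """Lowercase, collapse whitespace, then apply the alias table."""
    key = " ".join(location.lower().split())
    return ALIASES.get(key, key)

def merge_counts(rows):
    """Sum counts for all rows that share a canonical location."""
    merged = Counter()
    for location, count in rows:
        merged[canonical(location)] += count
    return merged

# Counts copied from the Pig output shown earlier.
rows = [
    ("brasil", 37985),
    ("brazil", 22432),
    ("new york", 13420),
    ("nyc", 6456),
    ("new york, ny", 6331),
]
merged = merge_counts(rows)
print(merged.most_common())  # [('brazil', 60417), ('new york', 26207)]
```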
Code examples we’ll cover are on GitHub
You can run them on Elastic MapReduce
Cleaning Twitter Profile Location Names


                     Filter Exact
                       Matches
Extract Top Tweet
    Locations                           Clean with
                                          MTurk
                    Aggregate Context
                      with Hadoop
We will map locations to GeoNames IDs
Start with Location Exact Matches
Use Mechanical Turk to improve results
Workers do simple tasks for a few cents
We constructed the following task
Workers used a Geonames search tool
Location search tool code is on GitHub
Preparing Data to send to MTurk
We use consensus answers from workers
Processing MTurk Output
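The slide text does not show how worker answers were combined, so the following Python sketch is one plausible reading of "consensus answers": accept a GeoNames ID for a location string only when a clear majority of workers agree on it. The function names, the `min_agreement` threshold, and the placeholder IDs are all assumptions.

```python
from collections import Counter

def consensus(answers, min_agreement=2):
    """Return the winning answer, or None if there is no clear winner.

    An answer wins when at least `min_agreement` workers gave it and
    no other answer received the same number of votes.
    """
    if not answers:
        return None
    ranked = Counter(answers).most_common()
    top, votes = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == votes
    if votes < min_agreement or tied:
        return None
    return top

def resolve(assignments, min_agreement=2):
    """Group (location_string, geonames_id) answers and vote per location."""
    by_location = {}
    for location, geonames_id in assignments:
        by_location.setdefault(location, []).append(geonames_id)
    return {loc: consensus(ids, min_agreement)
            for loc, ids in by_location.items()}

# Placeholder IDs, not real GeoNames identifiers.
assignments = [
    ("nyc", 111), ("nyc", 111), ("nyc", 222),
    ("springfield", 333), ("springfield", 444),
]
print(resolve(assignments))  # {'nyc': 111, 'springfield': None}
```

Locations that come back as `None` could be sent out for another round of answers or left unmatched, depending on budget.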
Tokenizing and Cleaning Tweet Text
‣   Extract Tweet topics with Hadoop + Python + NLTK + Wikipedia
Build Phrase Dictionary with Wikipedia
Streaming Tweet Parser (Python + NLTK)
Parse Tweets and Join to Wikipedia (Pig)
Aggregate by US County for Analysis
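The parser code itself is not included in the slide text, so here is a rough Python sketch of what a Hadoop Streaming style mapper for this pipeline might look like: tokenize each tweet, look up word n-grams in a phrase dictionary, and emit one record per (county, phrase) match for later aggregation. It uses a regular expression instead of NLTK to stay self-contained, a four-entry set stands in for the Wikipedia-derived dictionary, and the tab-separated input layout is an assumption.

```python
import io
import re

# Stand-in for the Wikipedia-derived phrase dictionary (hypothetical contents).
PHRASES = {"lady gaga", "tea party", "dallas", "stephen colbert"}
MAX_WORDS = 3  # longest n-gram to look up
TOKEN_RE = re.compile(r"[a-z0-9']+")

def tokenize(text):
    """Lowercase and split into word tokens (simplified; the talk used NLTK)."""
    return TOKEN_RE.findall(text.lower())

def find_phrases(tokens, phrases=PHRASES, max_words=MAX_WORDS):
    """Return every n-gram of up to max_words tokens found in the dictionary."""
    found = []
    for i in range(len(tokens)):
        for n in range(1, max_words + 1):
            if i + n > len(tokens):
                break
            candidate = " ".join(tokens[i:i + n])
            if candidate in phrases:
                found.append(candidate)
    return found

def map_line(line):
    """Turn 'county<TAB>tweet text' into 'county<TAB>phrase<TAB>1' records."""
    county, _, text = line.rstrip("\n").partition("\t")
    if not county or not text:
        return []
    return ["%s\t%s\t1" % (county, phrase)
            for phrase in find_phrases(tokenize(text))]

def run(stream_in, stream_out):
    """Mapper loop; a streaming job would call run(sys.stdin, sys.stdout)."""
    for line in stream_in:
        for record in map_line(line):
            stream_out.write(record + "\n")

# Local demo with an example county key and a made-up tweet.
out = io.StringIO()
run(io.StringIO("06085\tLady Gaga at the Tea Party?\n"), out)
print(out.getvalue(), end="")
```

A reducer, or a Pig GROUP like the one shown earlier, can then sum the trailing 1s per (county, phrase) key to produce the counts behind a county map.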
Clean Data => Thematic US County Map
Twitter users by county in our sample
“Lady Gaga” Tweets
“Tea Party” Tweets
“Dallas” Tweets
“Stephen Colbert” Tweets
LinkedIn Skills
Skills in the Design Industry
Exploring the Spatial Distribution of Skills
People with “Ship Building” Skills
What is the Skill profile of a given city?
Expertise correlated with Santa Clara, CA
Expertise correlated with Los Angeles
Expertise correlated with Washington, DC
Yuba City, CA has 21.3% Unemployment
Ames, Iowa has 4.7% Unemployment
Questions?   Follow me at
             twitter.com/peteskomoroch
             datawrangling.com

Weitere Ƥhnliche Inhalte

Was ist angesagt?

Introduction to data science.pptx
Introduction to data science.pptxIntroduction to data science.pptx
Introduction to data science.pptxSadhanaParameswaran
Ā 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientistryanorban
Ā 
ChatGPT.pdf
ChatGPT.pdfChatGPT.pdf
ChatGPT.pdfdhatura
Ā 
Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)SiamAhmed16
Ā 
Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big DataMd. Salman Ahmed
Ā 
Data Lineage: Using Knowledge Graphs for Deeper Insights into Your Data Pipel...
Data Lineage: Using Knowledge Graphs for Deeper Insights into Your Data Pipel...Data Lineage: Using Knowledge Graphs for Deeper Insights into Your Data Pipel...
Data Lineage: Using Knowledge Graphs for Deeper Insights into Your Data Pipel...Neo4j
Ā 
Landscape of AI/ML in 2023
Landscape of AI/ML in 2023Landscape of AI/ML in 2023
Landscape of AI/ML in 2023HyunJoon Jung
Ā 
Graph Databases ā€“ Benefits and Risks
Graph Databases ā€“ Benefits and RisksGraph Databases ā€“ Benefits and Risks
Graph Databases ā€“ Benefits and RisksDATAVERSITY
Ā 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceSampath Kumar
Ā 
Data Modeling Techniques
Data Modeling TechniquesData Modeling Techniques
Data Modeling TechniquesDATAVERSITY
Ā 
How ChatGPT and AI-assisted coding changes software engineering profoundly
How ChatGPT and AI-assisted coding changes software engineering profoundlyHow ChatGPT and AI-assisted coding changes software engineering profoundly
How ChatGPT and AI-assisted coding changes software engineering profoundlyPekka Abrahamsson / Tampere University
Ā 
Big Data
Big DataBig Data
Big DataRohit Jain
Ā 
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....ArangoDB Database
Ā 
Build Knowledge Graphs with Oracle RDF to Extract More Value from Your Data
Build Knowledge Graphs with Oracle RDF to Extract More Value from Your DataBuild Knowledge Graphs with Oracle RDF to Extract More Value from Your Data
Build Knowledge Graphs with Oracle RDF to Extract More Value from Your DataJean Ihm
Ā 
A Universe of Knowledge Graphs
A Universe of Knowledge GraphsA Universe of Knowledge Graphs
A Universe of Knowledge GraphsNeo4j
Ā 
Slides: Knowledge Graphs vs. Property Graphs
Slides: Knowledge Graphs vs. Property GraphsSlides: Knowledge Graphs vs. Property Graphs
Slides: Knowledge Graphs vs. Property GraphsDATAVERSITY
Ā 

Was ist angesagt? (20)

Introduction to data science.pptx
Introduction to data science.pptxIntroduction to data science.pptx
Introduction to data science.pptx
Ā 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientist
Ā 
ChatGPT.pdf
ChatGPT.pdfChatGPT.pdf
ChatGPT.pdf
Ā 
Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)Presentation About Big Data (DBMS)
Presentation About Big Data (DBMS)
Ā 
Data science
Data scienceData science
Data science
Ā 
Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big Data
Ā 
Data Lineage: Using Knowledge Graphs for Deeper Insights into Your Data Pipel...
Data Lineage: Using Knowledge Graphs for Deeper Insights into Your Data Pipel...Data Lineage: Using Knowledge Graphs for Deeper Insights into Your Data Pipel...
Data Lineage: Using Knowledge Graphs for Deeper Insights into Your Data Pipel...
Ā 
Landscape of AI/ML in 2023
Landscape of AI/ML in 2023Landscape of AI/ML in 2023
Landscape of AI/ML in 2023
Ā 
Graph Databases ā€“ Benefits and Risks
Graph Databases ā€“ Benefits and RisksGraph Databases ā€“ Benefits and Risks
Graph Databases ā€“ Benefits and Risks
Ā 
Big data ppt
Big data pptBig data ppt
Big data ppt
Ā 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
Ā 
Big data ppt
Big data pptBig data ppt
Big data ppt
Ā 
Data Modeling Techniques
Data Modeling TechniquesData Modeling Techniques
Data Modeling Techniques
Ā 
How ChatGPT and AI-assisted coding changes software engineering profoundly
How ChatGPT and AI-assisted coding changes software engineering profoundlyHow ChatGPT and AI-assisted coding changes software engineering profoundly
How ChatGPT and AI-assisted coding changes software engineering profoundly
Ā 
Big Data
Big DataBig Data
Big Data
Ā 
Generative AI
Generative AIGenerative AI
Generative AI
Ā 
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
Ā 
Build Knowledge Graphs with Oracle RDF to Extract More Value from Your Data
Build Knowledge Graphs with Oracle RDF to Extract More Value from Your DataBuild Knowledge Graphs with Oracle RDF to Extract More Value from Your Data
Build Knowledge Graphs with Oracle RDF to Extract More Value from Your Data
Ā 
A Universe of Knowledge Graphs
A Universe of Knowledge GraphsA Universe of Knowledge Graphs
A Universe of Knowledge Graphs
Ā 
Slides: Knowledge Graphs vs. Property Graphs
Slides: Knowledge Graphs vs. Property GraphsSlides: Knowledge Graphs vs. Property Graphs
Slides: Knowledge Graphs vs. Property Graphs
Ā 

Andere mochten auch

Geo data analytics
Geo data analyticsGeo data analytics
Geo data analyticsDaniel Marcous
Ā 
Enrich Gis With Social Media And Open Data
Enrich Gis With Social Media And Open DataEnrich Gis With Social Media And Open Data
Enrich Gis With Social Media And Open DataJan Willem van Eck
Ā 
Data mining in big databases with geo reference and easy web publishing and s...
Data mining in big databases with geo reference and easy web publishing and s...Data mining in big databases with geo reference and easy web publishing and s...
Data mining in big databases with geo reference and easy web publishing and s...MapWindow GIS
Ā 
Atlas Of Cambodia 2007
Atlas Of Cambodia 2007Atlas Of Cambodia 2007
Atlas Of Cambodia 2007Jan-Peter Mund
Ā 
Geo-analytics Architecture - Technologies
Geo-analytics Architecture - TechnologiesGeo-analytics Architecture - Technologies
Geo-analytics Architecture - TechnologiesBlue BRIDGE
Ā 
Intro to open refine
Intro to open refineIntro to open refine
Intro to open refineSchool of Data
Ā 
Big Data: Small Screen Location-Based Services 2.0
Big Data: Small Screen Location-Based Services 2.0 Big Data: Small Screen Location-Based Services 2.0
Big Data: Small Screen Location-Based Services 2.0 Kevin Foreman
Ā 
Citizen science, vgi, geo crowd sourcing, big geo data how they matter to th...
Citizen science, vgi, geo  crowd sourcing, big geo data how they matter to th...Citizen science, vgi, geo  crowd sourcing, big geo data how they matter to th...
Citizen science, vgi, geo crowd sourcing, big geo data how they matter to th...Maria Antonia Brovelli
Ā 
The Power of Geo Analytics (and maps) to Improve Predictive Analytics in Heal...
The Power of Geo Analytics (and maps) to Improve Predictive Analytics in Heal...The Power of Geo Analytics (and maps) to Improve Predictive Analytics in Heal...
The Power of Geo Analytics (and maps) to Improve Predictive Analytics in Heal...Health Catalyst
Ā 
Leveraging Geo-Spatial (Big) Data for Financial Services Solutions
Leveraging Geo-Spatial (Big) Data for Financial Services SolutionsLeveraging Geo-Spatial (Big) Data for Financial Services Solutions
Leveraging Geo-Spatial (Big) Data for Financial Services SolutionsCapgemini
Ā 
DataMeet 4: Data cleaning & census data
DataMeet 4: Data cleaning & census dataDataMeet 4: Data cleaning & census data
DataMeet 4: Data cleaning & census dataRitvvij Parrikh
Ā 
Charles Birnbaum (Foursquare) Ā«Big data and user generated content LBSĀ»
Charles Birnbaum (Foursquare) Ā«Big data and user generated content LBSĀ»Charles Birnbaum (Foursquare) Ā«Big data and user generated content LBSĀ»
Charles Birnbaum (Foursquare) Ā«Big data and user generated content LBSĀ»e-Legion
Ā 
Lecture8 rocks
Lecture8 rocksLecture8 rocks
Lecture8 rocksairporte
Ā 
Linked (Geo) Data - Adding a Spatial Dimension to the Web of Data
Linked (Geo) Data - Adding a Spatial Dimension to the Web of DataLinked (Geo) Data - Adding a Spatial Dimension to the Web of Data
Linked (Geo) Data - Adding a Spatial Dimension to the Web of DataAndreas Langegger
Ā 
Big Data in Retail
Big Data in RetailBig Data in Retail
Big Data in RetailCisco Services
Ā 
Dw & etl concepts
Dw & etl conceptsDw & etl concepts
Dw & etl conceptsjeshocarme
Ā 
SXSW Keynote - The Game Layer On Top Of The World
SXSW Keynote - The Game Layer On Top Of The WorldSXSW Keynote - The Game Layer On Top Of The World
SXSW Keynote - The Game Layer On Top Of The WorldSeth Priebatsch
Ā 

Andere mochten auch (20)

Geo data analytics
Geo data analyticsGeo data analytics
Geo data analytics
Ā 
Enrich Gis With Social Media And Open Data
Enrich Gis With Social Media And Open DataEnrich Gis With Social Media And Open Data
Enrich Gis With Social Media And Open Data
Ā 
Data mining in big databases with geo reference and easy web publishing and s...
Data mining in big databases with geo reference and easy web publishing and s...Data mining in big databases with geo reference and easy web publishing and s...
Data mining in big databases with geo reference and easy web publishing and s...
Ā 
Atlas Of Cambodia 2007
Atlas Of Cambodia 2007Atlas Of Cambodia 2007
Atlas Of Cambodia 2007
Ā 
Get Big Geo Data
Get Big Geo DataGet Big Geo Data
Get Big Geo Data
Ā 
Geo-analytics Architecture - Technologies
Geo-analytics Architecture - TechnologiesGeo-analytics Architecture - Technologies
Geo-analytics Architecture - Technologies
Ā 
Intro to open refine
Intro to open refineIntro to open refine
Intro to open refine
Ā 
Big Data: Small Screen Location-Based Services 2.0
Big Data: Small Screen Location-Based Services 2.0 Big Data: Small Screen Location-Based Services 2.0
Big Data: Small Screen Location-Based Services 2.0
Ā 
Citizen science, vgi, geo crowd sourcing, big geo data how they matter to th...
Citizen science, vgi, geo  crowd sourcing, big geo data how they matter to th...Citizen science, vgi, geo  crowd sourcing, big geo data how they matter to th...
Citizen science, vgi, geo crowd sourcing, big geo data how they matter to th...
Ā 
The Power of Geo Analytics (and maps) to Improve Predictive Analytics in Heal...
The Power of Geo Analytics (and maps) to Improve Predictive Analytics in Heal...The Power of Geo Analytics (and maps) to Improve Predictive Analytics in Heal...
The Power of Geo Analytics (and maps) to Improve Predictive Analytics in Heal...
Ā 
Leveraging Geo-Spatial (Big) Data for Financial Services Solutions
Leveraging Geo-Spatial (Big) Data for Financial Services SolutionsLeveraging Geo-Spatial (Big) Data for Financial Services Solutions
Leveraging Geo-Spatial (Big) Data for Financial Services Solutions
Ā 
Mobile LBS
Mobile LBSMobile LBS
Mobile LBS
Ā 
NFC, LBS and Smart Screens in Transport
NFC, LBS and Smart Screens in Transport NFC, LBS and Smart Screens in Transport
NFC, LBS and Smart Screens in Transport
Ā 
DataMeet 4: Data cleaning & census data
DataMeet 4: Data cleaning & census dataDataMeet 4: Data cleaning & census data
DataMeet 4: Data cleaning & census data
Ā 
Charles Birnbaum (Foursquare) Ā«Big data and user generated content LBSĀ»
Charles Birnbaum (Foursquare) Ā«Big data and user generated content LBSĀ»Charles Birnbaum (Foursquare) Ā«Big data and user generated content LBSĀ»
Charles Birnbaum (Foursquare) Ā«Big data and user generated content LBSĀ»
Ā 
Lecture8 rocks
Lecture8 rocksLecture8 rocks
Lecture8 rocks
Ā 
Linked (Geo) Data - Adding a Spatial Dimension to the Web of Data
Linked (Geo) Data - Adding a Spatial Dimension to the Web of DataLinked (Geo) Data - Adding a Spatial Dimension to the Web of Data
Linked (Geo) Data - Adding a Spatial Dimension to the Web of Data
Ā 
Big Data in Retail
Big Data in RetailBig Data in Retail
Big Data in Retail
Ā 
Dw & etl concepts
Dw & etl conceptsDw & etl concepts
Dw & etl concepts
Ā 
SXSW Keynote - The Game Layer On Top Of The World
SXSW Keynote - The Game Layer On Top Of The WorldSXSW Keynote - The Game Layer On Top Of The World
SXSW Keynote - The Game Layer On Top Of The World
Ā 

Ƅhnlich wie Geo Analytics Tutorial - Where 2.0 2011

NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)Kevin Weil
Ā 
Analyzing Big Data at Twitter (Web 2.0 Expo NYC Sep 2010)
Analyzing Big Data at Twitter (Web 2.0 Expo NYC Sep 2010)Analyzing Big Data at Twitter (Web 2.0 Expo NYC Sep 2010)
Analyzing Big Data at Twitter (Web 2.0 Expo NYC Sep 2010)Kevin Weil
Ā 
Spatial Analytics, Where 2.0 2010
Spatial Analytics, Where 2.0 2010Spatial Analytics, Where 2.0 2010
Spatial Analytics, Where 2.0 2010Kevin Weil
Ā 
Where20 Spatial Analytics 2010
Where20 Spatial Analytics 2010Where20 Spatial Analytics 2010
Where20 Spatial Analytics 2010seagor
Ā 
Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Kevin Weil
Ā 
Hadoop and pig at twitter (oscon 2010)
Hadoop and pig at twitter (oscon 2010)Hadoop and pig at twitter (oscon 2010)
Hadoop and pig at twitter (oscon 2010)Kevin Weil
Ā 
Big Data at Twitter, Chirp 2010
Big Data at Twitter, Chirp 2010Big Data at Twitter, Chirp 2010
Big Data at Twitter, Chirp 2010Kevin Weil
Ā 
Big Data Analytics with Hadoop with @techmilind
Big Data Analytics with Hadoop with @techmilindBig Data Analytics with Hadoop with @techmilind
Big Data Analytics with Hadoop with @techmilindEMC
Ā 
MapReduce Algorithm Design
MapReduce Algorithm DesignMapReduce Algorithm Design
MapReduce Algorithm DesignGabriela Agustini
Ā 
Concurrency in Distributed Systems : Leslie Lamport papers
Concurrency in Distributed Systems : Leslie Lamport papersConcurrency in Distributed Systems : Leslie Lamport papers
Concurrency in Distributed Systems : Leslie Lamport papersSubhajit Sahu
Ā 
Terascale Learning
Terascale LearningTerascale Learning
Terascale Learningpauldix
Ā 
Online learning, Vowpal Wabbit and Hadoop
Online learning, Vowpal Wabbit and HadoopOnline learning, Vowpal Wabbit and Hadoop
Online learning, Vowpal Wabbit and HadoopHĆ©loĆÆse Nonne
Ā 
Hadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticiansHadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticiansattilacsordas
Ā 
COCOA: Communication-Efficient Coordinate Ascent
COCOA: Communication-Efficient Coordinate AscentCOCOA: Communication-Efficient Coordinate Ascent
COCOA: Communication-Efficient Coordinate Ascentjeykottalam
Ā 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataAlbert Bifet
Ā 
Processing Big Data in Real-Time - Yanai Franchi, Tikal


Geo Analytics Tutorial - Where 2.0 2011

  • 1. Geo Analytics Tutorial Pete Skomoroch Sr. Data Scientist - LinkedIn (@peteskomoroch) #geoanalytics ** Hadoop Intro slides from Kevin Weil, Twitter
  • 2. Topics
      ‣ Data Science & Geo Analytics
      ‣ Useful Geo tools and Datasets
      ‣ Hadoop, Pig, and Big Data
      ‣ Cleaning Location Data with Mechanical Turk
      ‣ Spatial Tweet Analytics with Hadoop & Python
      ‣ Using Social Data to Understand Cities
      ‣ Q&A
  • 4. Analytics & Data are Hot Topics
  • 10. Data Exhaust My Delicious Tags
  • 11. Data Science * http://www.drewconway.com/zia/?p=2378
  • 12. Data Visualization ‣ http://www.dataspora.com/blog/
  • 13. Spatial Analysis Map by Dr. John Snow of London, showing clusters of cholera cases in the 1854 Broad Street cholera outbreak. This was one of the first uses of map-based spatial analysis.
  • 14. Spatial Analysis
      • Spatial regression - estimate dependencies between variables
      • Gravity models - estimate the flow of people, material, or information between locations
      • Spatial interpolation - estimate variables at unobserved locations based on other measured values
      • Simulation - use models and data to predict spatial phenomena
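To make the gravity-model bullet concrete, here is a minimal Python sketch. It assumes the textbook form flow = k * mass_a * mass_b / distance^beta; the slides do not give a formula, so the constants `k` and `beta`, the function names, and the sample values are all illustrative assumptions rather than anything from the talk.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def gravity_flow(mass_a, mass_b, distance_km, k=1.0, beta=2.0):
    """Estimated flow between two places under a simple gravity model.

    mass_a and mass_b stand for any size measure (population, user count);
    k and beta are assumed constants that would normally be fitted to data.
    """
    return k * mass_a * mass_b / distance_km ** beta


# Hypothetical example: two places one degree of longitude apart on the equator.
distance = haversine_km(0.0, 0.0, 0.0, 1.0)
print(gravity_flow(1000, 2000, distance))
```

In practice `k` and `beta` would be estimated from observed flows, for example by regressing log(flow) on log(distance).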
  • 15. Life Span & Food by Zip Code * http://zev.lacounty.gov/news/health/death-by-zip-code * http://www.verysmallarray.com/?p=975
  • 16. Where Americans Are Moving (IRS Data) ‣ (Jon Bruner) http://jebruner.com/2010/06/the-migration-map/
  • 17. Facebook Connectivity (Pete Warden) * http://petewarden.typepad.com/searchbrowser/2010/02/how-to-split-up-the-us.html
  • 19. Useful Geo Tools
      • R, Matlab, SciPy, Commercial Geo Software
      • R Spatial Pkgs http://cran.r-project.org/web/views/Spatial.html
      • Hadoop, Amazon EC2, Mechanical Turk
      • Data Science Toolkit: http://www.datasciencetoolkit.org/
      • 80% of effort is often in cleaning and processing data
  • 20. DataScienceToolkit.org
      • Runs on VM or Amazon EC2
      • Street Address to Coordinates
      • Coordinates to Political Areas
      • Geodict (text extraction)
      • IP Address to Coordinates
      • New UK release on GitHub
  • 21. Resources for location data
      • SimpleGeo
      • Factual
      • Geonames
      • Infochimps
      • Data.gov
      • DataWrangling.com
  • 23. Hadoop: Motivation
      • We want to crunch 1TB of Twitter stream data and understand spatial patterns in Tweets
      • Data collected from the Twitter "Garden Hose" API last Spring
  • 24. Data is Getting Big
      ‣ NYSE: 1 TB/day
      ‣ Facebook: 20+ TB compressed/day
      ‣ CERN/LHC: 40 TB/day (15 PB/year!)
      ‣ And growth is accelerating
      ‣ Need multiple machines, horizontal scalability
  • 25. Hadoop
      ‣ Distributed file system (hard to store a PB)
      ‣ Fault-tolerant, handles replication, node failure, etc
      ‣ MapReduce-based parallel computation (even harder to process a PB)
      ‣ Generic key-value based computation interface allows for wide applicability
      ‣ Open source, top-level Apache project
      ‣ Scalable: Y! has a 4000-node cluster
      ‣ Powerful: sorted a TB of random integers in 62 seconds
  • 26. MapReduce? cat file | grep geo | sort | uniq -c > output
      ‣ Challenge: how many tweets per county, given tweets table?
      ‣ Input: key=row, value=tweet info
      ‣ Map: output key=county, value=1
      ‣ Shuffle: sort by county
      ‣ Reduce: for each county, sum
      ‣ Output: county, tweet count
      ‣ With 2x machines, runs close to 2x faster.
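The map, shuffle, and reduce steps on this slide can be imitated in a single Python process. This is only a sketch of the data flow, not Hadoop code: the `county` field on each tweet and the function names are assumptions made for illustration, and a real job would distribute each phase across machines.

```python
from itertools import groupby
from operator import itemgetter


def map_tweet(row_id, tweet):
    """Map: emit (county, 1) for every tweet that carries a county."""
    county = tweet.get("county")  # hypothetical field name
    if county:
        yield county, 1


def reduce_county(county, counts):
    """Reduce: sum the 1s emitted for a single county."""
    return county, sum(counts)


def run_job(tweets):
    """Run map, shuffle (sort by key), and reduce in one process."""
    mapped = [pair
              for row_id, tweet in enumerate(tweets)
              for pair in map_tweet(row_id, tweet)]
    shuffled = sorted(mapped, key=itemgetter(0))  # shuffle: sort by county
    return [reduce_county(county, (value for _, value in group))
            for county, group in groupby(shuffled, key=itemgetter(0))]


sample = [{"county": "Kings"}, {"county": "Cook"}, {"county": "Kings"}, {}]
print(run_job(sample))
```

Because each county's pairs are reduced independently after the shuffle, the reduce phase can be split across machines, which is where the near-linear speedup on the slide comes from.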
  • 33. But...
      ‣ Analysis typically done in Java
      ‣ Single-input, two-stage data flow is rigid
      ‣ Projections, filters: custom code
      ‣ Joins: lengthy, error-prone
      ‣ n-stage jobs: Hard to manage
      ‣ Prototyping/exploration requires compilation
      ‣ analytics in Eclipse? ur doin it wrong...
  • 34. Enter Pig
      ‣ High level language
      ‣ Transformations on sets of records
      ‣ Process data one step at a time
      ‣ Easier than SQL?
  • 35. Why Pig? ‣ Because I bet you can read the following script.
  • 36. A Real Pig Script ‣ Now, just for fun... the same calculation in vanilla Hadoop MapReduce.
  • 38. Pig Simplifies Analysis
      ‣ The Pig version is:
          ‣ 5% of the code, 5% of the time
          ‣ Within 50% of the execution time.
      ‣ Pig Geo:
          ‣ Programmable: fuzzy matching, custom filtering
          ‣ Easily link multiple datasets, regardless of size/structure
          ‣ Iterative, quick
  • 39. A Real Example
      ‣ Fire up your Elastic MapReduce Cluster.
      ‣ ... or follow along at http://bit.ly/whereanalytics
      ‣ I used Twitter's streaming API to store some tweets
      ‣ Simplest thing: group by location and count with Pig
      ‣ http://bit.ly/where20pig
      ‣ Here comes some code!
  • 41. tweets = LOAD 's3://where20demo/sample-tweets' as ( user_screen_name:chararray, tweet_id:chararray, ... user_friends_count:int, user_statuses_count:int, user_location:chararray, user_lang:chararray, user_time_zone:chararray, place_id:chararray, ...);
  • 43. tweets_with_location = FILTER tweets BY user_location != 'NULL';
  • 44. normalized_locations = FOREACH tweets_with_location GENERATE LOWER(user_location) as user_location;
  • 45. grouped_tweets = GROUP normalized_locations BY user_location PARALLEL 10;
  • 46. location_counts = FOREACH grouped_tweets GENERATE $0 as location, SIZE($1) as user_count;
  • 47. sorted_counts = ORDER location_counts BY user_count DESC;
  • 48. STORE sorted_counts INTO 'global_location_tweets';
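For readers who do not have a Pig cluster at hand, the same filter, normalize, group, count, and sort steps from slides 43 to 47 can be tried on a small in-memory sample. This is a rough single-machine Python analogue, not the talk's code: the dictionary-per-tweet layout and the function name are assumptions, and the alphabetical tie-break in the sort is an addition that the Pig script does not specify.

```python
from collections import Counter


def count_locations(tweets):
    """Count tweets per lowercased user_location, most frequent first."""
    # FILTER tweets BY user_location != 'NULL' (missing values are dropped too)
    with_location = (t for t in tweets
                     if t.get("user_location") not in (None, "NULL"))
    # FOREACH ... GENERATE LOWER(user_location)
    normalized = (t["user_location"].lower() for t in with_location)
    # GROUP ... BY user_location, then SIZE of each group
    counts = Counter(normalized)
    # ORDER ... BY user_count DESC (ties broken alphabetically, an assumption)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


sample = [
    {"user_location": "Brasil"},
    {"user_location": "brasil"},
    {"user_location": "NULL"},
    {"user_location": "London"},
    {},
]
print(count_locations(sample))
```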
  • 49. hadoop@ip-10-160-113-142:~$ hadoop dfs -cat /global_location_counts/part* | head -30
        brasil 37985
        indonesia 33777
        brazil 22432
        london 17294
        usa 14564
        são paulo 14238
        new york 13420
        tokyo 10967
        singapore 10225
        rio de janeiro 10135
        los angeles 9934
        california 9386
        chicago 9155
        uk 9095
        jakarta 9086
        germany 8741
        canada 8201
        7696
        7121
        jakarta, indonesia 6480
        nyc 6456
        new york, ny 6331
  • 50. Neat, but...
      ‣ Wow, that data is messy!
      ‣ brasil, brazil at #1 and #3
      ‣ new york, nyc, and new york ny all in the top 30
      ‣ Mechanical Turk to the rescue...
  • 52. Code examples we'll cover are on GitHub
  • 53. You can run them on Elastic MapReduce
  • 54. Cleaning Twitter Profile Location Names
      • Filter Exact Matches
      • Extract Top Tweet Locations
      • Clean with MTurk
      • Aggregate Context with Hadoop
  • 55. We will map locations to GeoNames IDs
  • 56. Start with Location Exact Matches
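The exact-match pass can be pictured as a dictionary lookup against a gazetteer, with everything that fails the lookup set aside for manual cleaning. The Python sketch below assumes a small in-memory gazetteer. The IDs shown are made-up placeholders, not real GeoNames IDs, and the function name is hypothetical. The counts are taken from the sample output on slide 49.

```python
def split_exact_matches(location_counts, gazetteer):
    """Split (name, count) pairs into exact gazetteer matches and leftovers.

    gazetteer maps a normalized place name to an ID; the IDs used in the
    example below are placeholders, not real GeoNames IDs.
    """
    matched = {}    # place ID -> total tweet count
    unmatched = []  # (name, count) pairs that still need cleaning
    for name, count in location_counts:
        key = " ".join(name.lower().split())  # lowercase, collapse whitespace
        if key in gazetteer:
            place_id = gazetteer[key]
            matched[place_id] = matched.get(place_id, 0) + count
        else:
            unmatched.append((name, count))
    return matched, unmatched


gazetteer = {"london": 1001, "new york": 1002}  # placeholder IDs
counts = [("london", 17294), ("new york", 13420), ("nyc", 6456)]
matched, unmatched = split_exact_matches(counts, gazetteer)
print(matched)
print(unmatched)
```

Strings such as "nyc" fall through to the unmatched list, which is the kind of input the Mechanical Turk step is meant to resolve.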
  • 57. Use Mechanical Turk to improve results
  • 58. Workers do simple tasks for a few cents
  • 59. We constructed the following task
  • 60. Workers used a Geonames search tool
  • 61. Location search tool code is on GitHub
  • 62. Preparing Data to send to MTurk
  • 63. We use consensus answers from workers
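The slides do not say how consensus was computed, so the following Python sketch assumes a simple majority vote: an answer is accepted only if enough workers agree and there is no tie for first place. The function name, the `min_agreement` threshold, and the placeholder IDs are all illustrative assumptions.

```python
from collections import Counter


def consensus_answer(answers, min_agreement=2):
    """Return the most common worker answer, or None when there is no consensus.

    answers is a list of IDs submitted by different workers for one location
    string; min_agreement is an assumed threshold, not a value from the talk.
    """
    if not answers:
        return None
    (top, votes), *runner_up = Counter(answers).most_common(2)
    if votes < min_agreement:
        return None  # too few workers agree
    if runner_up and runner_up[0][1] == votes:
        return None  # tie between the two leading answers
    return top


print(consensus_answer([1002, 1002, 1003]))  # two of three workers agree
print(consensus_answer([1001, 1002, 1003]))  # no agreement
```

Items that come back without consensus could be resubmitted to more workers or reviewed by hand.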
  • 66. Tokenizing and Cleaning Tweet Text ‣ Extract Tweet topics with Hadoop + Python + NLTK + Wikipedia
  • 67. Build Phrase Dictionary with Wikipedia
  • 68. Streaming Tweet Parser (Python + NLTK)
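The tokenizing and phrase-dictionary steps can be sketched without any dependencies. The talk's parser uses NLTK and a dictionary built from Wikipedia. The version below swaps in a plain regular-expression tokenizer and a hand-written phrase set so that it runs anywhere, and it uses greedy longest-match lookup, which is an assumption about how the matching might work. All names and the sample tweet are hypothetical.

```python
import re

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase a tweet, drop URLs and @mentions, and split into word tokens."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"@\w+", " ", text)
    return TOKEN_RE.findall(text)


def extract_phrases(tokens, phrase_dict, max_len=3):
    """Greedy longest-match lookup of token n-grams in a phrase dictionary."""
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in phrase_dict:
                found.append(candidate)
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no phrase starts at this token
    return found


# Stand-in for a dictionary of Wikipedia article titles.
phrases = {"golden gate bridge", "san francisco", "fog"}
tweet = "Fog over the Golden Gate Bridge in San Francisco http://t.co/x @pal"
print(extract_phrases(tokenize(tweet), phrases))
```

As a Hadoop streaming mapper, the same functions would read one tweet per line from standard input and write one tab-separated phrase record per match to standard output.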
  • 69. Parse Tweets and Join to Wikipedia (Pig)
  • 70. Aggregate by US County for Analysis
  • 71. Clean Data => Thematic US County Map
  • 72. Twitter users by county in our sample
  • 79. Skills in the Design Industry
  • 80. Exploring the Spatial Distribution of Skills
  • 81. People with "Ship Building" Skills
  • 82. What is the Skill profile of a given city?
  • 83. Expertise correlated with Santa Clara, CA
  • 85. Expertise correlated with Washington, DC
  • 86. Yuba City, CA has 21.3% Unemployment
  • 87. Ames, Iowa has 4.7% Unemployment
  • 89. Questions? Follow me at twitter.com/peteskomoroch datawrangling.com
