Python for Open Data Lovers:
                    Explore It, Analyze It, Map It

                           Jackie Kazil    Dana Bauer
                            @jackiekazil   @geography76




Saturday, March 10, 2012
Where are we going?
                    • open data everywhere
                    • a data swiss army knife
                    • finding network patterns
                    • finding spatial patterns
                    • which stories to pursue? moving beyond
                           data analysis




•        Data.gov

                  •        OpenDataPhilly

                  •        DC Data Catalog

                  •        DataSF

                  •        Chicago Data Portal

                  •        NYC Open Data

                  •        London Datastore




assembly member expenses
                                                        bicycle lanes
                                                  city purchase orders
                                                     dialysis centers
                                                       elevation data
                                                     filming locations
                                Google Transit Feed Specification (GTFS)
                                                    historical photos
                                                      influenza rates
                                                     judicial districts
                               Key Stage 2 test results by free school meal eligibility
                                                         land cover
                           monthly calls to Human Services Agency switchboard operators
                                              neighborhood health clinics
                                              Oyster ticket stop locations
                                                 political districts
                                                quality of life indicators
                                                restaurant inspections
                                                         sewer lines
                                                       traffic counts
                                     utility excavation and paving five-year plan
                                             violent crime incidents
                                                        ward offices
                                                       youth centers
                                                           zoning

                              **real-time parking availability and pricing**




http://bit.ly/DCdatafail




• What are DC agencies spending money on?
                    • How much are they spending?
                    • What are the relationships between
                           businesses and agencies?

                    • Where are these businesses located?

swiss army knife

                    •      csvkit: http://csvkit.readthedocs.org/

                    •      a set of Python utilities for working with csv

                    •      meant to replace csv module

                    •      pip install csvkit (no issues!)




$       csvcut -n purchase2011_cleaned.csv
                  1: PO_NUMBER
                  2: AGENCY_NAME
                  3: NIGP_DESCRIPTION
                  4: PO_TOTAL_AMOUNT
                  5: ORDER_DATE
                  6: SUPPLIER
                  7: SUPPLIER_FULL_ADDRESS


$   csvcut -c 2,6 purchase2011_cleaned.csv | csvstat
               1. AGENCY_NAME
                  <type 'unicode'>
                  Nulls: False
                  Unique values: 85
                  5 most frequent values:
                     DISTRICT OF COLUMBIA PUBLIC SCHOOLS: 2410
                     STATE SUPERINTENDENT OF EDUCATION (OSSE): 1340
                     DEPARTMENT OF HEALTH: 895
                     OFFICE OF CHIEF TECHNOLOGY OFFICER: 786
                     OFF PUBLIC ED FACILITIES MODERNIZATION: 722
                  Max length: 40
               2. SUPPLIER
                  <type 'unicode'>
                  Nulls: False
                  Unique values: 4357
                  5 most frequent values:
                     OST, INC.: 841
                     DELL COMPUTER CORP.: 366
                     AMERICAN EXPRESS COMPANY: 282
                     MVS, INC.: 176
                     CAPITAL SERVICES AND SUPPLIES: 167
                  Max length: 52

           Row count: 16075
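
The same kind of per-column summary can be sketched in plain Python with the standard library. This is only an illustration, not how csvstat is implemented: the in-memory sample rows are made up to stand in for purchase2011_cleaned.csv, and `column_summary` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Made-up sample rows standing in for purchase2011_cleaned.csv (assumption).
SAMPLE = """AGENCY_NAME,SUPPLIER
DEPARTMENT OF HEALTH,"OST, INC."
DEPARTMENT OF HEALTH,DELL COMPUTER CORP.
PUBLIC CHARTER SCHOOLS,"OST, INC."
"""

def column_summary(rows, column, top=5):
    """Summarize one column: null check, distinct count, most frequent values."""
    values = [row[column] for row in rows]
    counts = Counter(v for v in values if v != "")
    return {
        "nulls": any(v == "" for v in values),
        "unique": len(counts),
        "most_frequent": counts.most_common(top),
        "max_length": max((len(v) for v in values), default=0),
    }

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for name in ("AGENCY_NAME", "SUPPLIER"):
    print(name, column_summary(rows, name))
print("Row count:", len(rows))
```

To try it on a real file, swap `io.StringIO(SAMPLE)` for an opened file handle.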

$   csvgrep -c 6 -r ^MAYA purchase2011_cleaned.csv

PO_NUMBER,AGENCY_NAME,NIGP_DESCRIPTION,PO_TOTAL_AMOUNT,ORDER_DATE,SUPPLIER,SUPPLIER_FULL_ADDRESS
PO352244,STATE SUPERINTENDENT OF EDUCATION (OSSE),EDUCATION AND TRAINING CONSULTING 38,408644.73,01/04/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
PO352652,STATE SUPERINTENDENT OF EDUCATION (OSSE),EDUCATION AND TRAINING CONSULTING 38,111679.16,01/07/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
PO352920,PUBLIC CHARTER SCHOOLS,SCHOOL OPERATION AND MANAGEMENT SERVICES 71,2205630.13,01/11/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
PO355150,STATE SUPERINTENDENT OF EDUCATION (OSSE),EDUCATION AND TRAINING CONSULTING 38,391092.49,02/07/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
PO356426,STATE SUPERINTENDENT OF EDUCATION (OSSE),FINANCIAL SERVICES (NOT OTHERWISE CLASSIFIED) 49,999891,02/23/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
PO356632,STATE SUPERINTENDENT OF EDUCATION (OSSE),PROFESSIONAL SERVICES (NOT OTHERWISE CLASSIFIED) 58,187200,02/25/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
PO359961,PUBLIC CHARTER SCHOOLS,SCHOOL OPERATION AND MANAGEMENT SERVICES 71,1753238,04/12/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
PO360284,STATE SUPERINTENDENT OF EDUCATION (OSSE),EDUCATION AND TRAINING CONSULTING 38,110729.88,04/14/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
PO361203,STATE SUPERINTENDENT OF EDUCATION (OSSE),EDUCATION AND TRAINING CONSULTING 38,92617.32,04/28/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
PO351462-V2,STATE SUPERINTENDENT OF EDUCATION (OSSE),EDUCATIONAL RESEARCH SERVICES 19,152229.95,05/05/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
PO364208,STATE SUPERINTENDENT OF EDUCATION (OSSE),EDUCATION AND TRAINING CONSULTING 38,118825.51,06/09/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
PO366839,PUBLIC CHARTER SCHOOLS,SCHOOL OPERATION AND MANAGEMENT SERVICES 71,2767027,07/12/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
PO365094-V2,STATE SUPERINTENDENT OF EDUCATION (OSSE),YOUTH CARE SERVICES 95,98092.35,08/15/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
PO370948,STATE SUPERINTENDENT OF EDUCATION (OSSE),YOUTH CARE SERVICES 95,45736.58,08/25/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
PO361027-V5,STATE SUPERINTENDENT OF EDUCATION (OSSE),EDUCATION AND TRAINING CONSULTING 38,29424.86,09/06/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
PO374132,STATE SUPERINTENDENT OF EDUCATION (OSSE),YOUTH CARE SERVICES 95,9000,09/28/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
PO377919,STATE SUPERINTENDENT OF EDUCATION (OSSE),YOUTH CARE SERVICES 95,491663.6,10/25/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
PO381219,STATE SUPERINTENDENT OF EDUCATION (OSSE),EDUCATION AND TRAINING CONSULTING 38,120188.81,11/29/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
PO383965,STATE SUPERINTENDENT OF EDUCATION (OSSE),YOUTH CARE SERVICES 95,294690.57,12/22/2011,MAYA ANGELOU PCS,"1436 U STREET, NW SUITE 203, WASHINGTON, DC, 20009"
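
A regular-expression filter on one column, as csvgrep does above, can be approximated in a few lines of standard-library Python. This is a sketch under assumptions: the sample rows are invented (including the second "MAYA..." supplier), and `grep_rows` is a hypothetical helper, not part of csvkit.

```python
import csv
import io
import re

# Made-up rows that reuse some of the column names listed by csvcut -n (assumption).
SAMPLE = """PO_NUMBER,AGENCY_NAME,PO_TOTAL_AMOUNT,SUPPLIER
PO1,PUBLIC CHARTER SCHOOLS,100.5,MAYA ANGELOU PCS
PO2,DEPARTMENT OF HEALTH,20,"OST, INC."
PO3,PUBLIC CHARTER SCHOOLS,7,MAYATECH CORP
"""

def grep_rows(rows, column, pattern):
    """Keep rows whose value in `column` matches the regular expression."""
    regex = re.compile(pattern)
    return [row for row in rows if regex.search(row[column])]

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
matches = grep_rows(rows, "SUPPLIER", r"^MAYA")
for row in matches:
    print(row["PO_NUMBER"], row["SUPPLIER"], row["PO_TOTAL_AMOUNT"])
```

Because the pattern is anchored with `^`, it matches every supplier whose name starts with MAYA, not just one school, so the results may still need a manual check.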
$ csvcut -c 4,2,6,5 purchase2011_cleaned.csv | csvsort -r | head -n 20 | csvlook
----------------------------------------------------------------------------------------------------------
| PO_TOTAL_AMOUNT | AGENCY_NAME                            | SUPPLIER                       | ORDER_DATE |
----------------------------------------------------------------------------------------------------------
| 154133337.02    | DEPARTMENT OF TRANSPORTATION           | SKANSKA-FACCHINA JV            | 2011-11-10 |
| 62677473.88     | DEPARTMENT OF REAL ESTATE SERVICES     | EEC OF DC INC-FORRESTER CONSTR | 2011-09-22 |
| 31809425.48     | DEPARTMENT OF HEALTH                   | DEFENSE LOGISTIC AGENCY        | 2011-09-08 |
| 23600580.0      | DEPARTMENT OF CORRECTIONS              | UNITY HEALTH CARE, INC.        | 2011-10-24 |
| 23538552.0      | DEPARTMENT OF REAL ESTATE SERVICES     | EEC-FORRESTER ANACOSTIA        | 2011-11-08 |
| 22375314.45     | DEPARTMENT OF CORRECTIONS              | CORRECTIONS CORPORATION OF     | 2011-05-25 |
| 21450000.04     | DEPARTMENT OF HUMAN SERVICES           | THE COMMUNITY PARTNERSHIPHOME  | 2011-08-18 |
| 20813348.99     | DEPARTMENT OF REAL ESTATE SERVICES     | THE JOHN AKRIDGE CO            | 2011-06-28 |
| 20622000.0      | DEPARTMENT OF TRANSPORTATION           | W M SCHLOSSER CO INC           | 2011-08-29 |
| 19824914.0      | DEPARTMENT OF CORRECTIONS              | CORRECTIONS CORPORATION OF     | 2011-10-24 |
| 18300956.56     | DEPARTMENT OF HUMAN SERVICES           | THE COMMUNITY PARTNERSHIPHOME  | 2011-11-29 |
| 18104339.98     | DEPARTMENT OF HUMAN SERVICES           | THE COMMUNITY PARTNERSHIPHOME  | 2011-05-17 |
| 18000000.0      | DEPARTMENT OF HEALTH                   | DC PRIMARY CARE ASSOCIATION    | 2011-03-10 |
| 17000000.0      | DEPARTMENT OF HEALTH                   | CHILDRENS NATIONAL MEDICAL CTR | 2011-11-25 |
| 16850000.0      | DEPUTY MAYOR FOR ECONOMIC DEVELOPMENT  | 2 M STREET REDEVELOPMENT LLC   | 2011-09-29 |
| 16333257.33     | DEPARTMENT OF HUMAN SERVICES           | THE COMMUNITY PARTNERSHIPHOME  | 2011-06-02 |
| 14206937.0      | PUBLIC CHARTER SCHOOLS                 | FRIENDSHIP PCS                 | 2011-07-12 |
| 13862557.44     | MUNICIPAL FACILITIES: NON-CAPITAL      | US SECURITY ASSOCIATES, INC.   | 2011-10-07 |
| 13800000.0      | DISTRICT DEPARTMENT OF THE ENVIRONMENT | VERMONT ENERGY INVESTMENT CORP | 2011-10-04 |
----------------------------------------------------------------------------------------------------------
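
Ranking purchase orders by amount can also be sketched in plain Python. The rows below are placeholders (the agency and supplier names are made up), and `largest_orders` is a hypothetical helper; only the column names come from the file shown above.

```python
import csv
import io

# Placeholder rows using the four columns selected by csvcut above (assumption).
SAMPLE = """PO_TOTAL_AMOUNT,AGENCY_NAME,SUPPLIER,ORDER_DATE
9000,AGENCY A,SUPPLIER X,2011-09-28
154133337.02,AGENCY B,SUPPLIER Y,2011-11-10
62677473.88,AGENCY C,SUPPLIER Z,2011-09-22
"""

def largest_orders(rows, n=20):
    """Return the n rows with the highest PO_TOTAL_AMOUNT, largest first."""
    return sorted(rows, key=lambda r: float(r["PO_TOTAL_AMOUNT"]), reverse=True)[:n]

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
top = largest_orders(rows, n=2)
for row in top:
    print(row["PO_TOTAL_AMOUNT"], row["AGENCY_NAME"], row["SUPPLIER"], row["ORDER_DATE"])
```

The amounts are converted to `float` before sorting. Compared as strings, "9000" would rank above "154133337.02".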

Social Network Analysis

                       “Social network analysis is focused on
                       uncovering the patterning of people's
                                    interaction.”
                        - http://www.insna.org/sna/what.html




99th House




                               President: Reagan
                               House majority: Democrats
                               Years: 1985, 1986


107th House




                               President: Bush
                               House majority: Republicans
                               Years: 2001, 2002


108th House




                   President: Bush
                   House majority: Republicans
                   Years: 2003, 2004

109th House




                                President: Bush
                                House majority: Republicans
                                Years: 2005, 2006


110th House




                                President: Bush
                                House majority: Democrats
                                Years: 2007, 2008


111th House




                                President: Obama
                                House majority: Democrats
                                Years: 2009, 2010


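CSV to network
                import csv
                import networkx as nx

                G = nx.Graph()
                node_edgelist = []

                def add_edge_or_weight(G, a, b):
                    # helper not shown on the slide; assumed to add the edge
                    # or bump the weight of an edge that already exists
                    if G.has_edge(a, b):
                        G[a][b]['weight'] += 1
                    else:
                        G.add_edge(a, b, weight=1)

                # grab edges: n is the node, e is the value that links nodes
                # (the file and columns used here are assumptions)
                with open('purchase2011_cleaned.csv') as csv_file:
                    for row in csv.DictReader(csv_file):
                        n, e = row['SUPPLIER'], row['AGENCY_NAME']
                        node_edgelist.append((n, e))

                # create edges
                for f in node_edgelist:
                    for t in node_edgelist:
                        if t != f:
                            add_edge_or_weight(G, f[0], t[0])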



Centrality Analysis (networkx)
                Degree
                nx.degree(G)
                Number of connections; more connections = more important

                Closeness centrality
                nx.closeness_centrality(G)
                Distance to all other nodes; closer = more important

                Betweenness centrality
                nx.betweenness_centrality(G)
                How often a node sits on the shortest paths between other nodes (control over information flow)

                PageRank
                nx.pagerank(G)
                A node gains importance from the importance of its neighbors
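
A minimal sketch of these measures on a toy graph follows. The graph is made up for illustration and `top_node` is a hypothetical helper. `nx.pagerank(G)` is called the same way as the others; it is left out here only because some NetworkX releases need SciPy for it.

```python
import networkx as nx

# Toy graph (assumption): a hub "A" linked to B, C and D, plus a chain D - E.
G = nx.Graph()
G.add_edges_from([("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")])

# dict() works whether nx.degree returns a dict or a view object.
degree = dict(nx.degree(G))
closeness = nx.closeness_centrality(G)
betweenness = nx.betweenness_centrality(G)

def top_node(scores):
    """Return the node with the highest score."""
    return max(scores, key=scores.get)

print(top_node(degree), top_node(closeness), top_node(betweenness))
```

In this toy graph the hub comes out on top for all three measures. On real data the rankings often disagree, and the disagreements are usually the interesting part.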

Centrality Analysis (networkx)
                Digi Docs Inc Document Managers (Dallas)
                “Offers software that generates loan documents for electronic delivery.”

                Iron Mountain (Mountain View)
                “Iron Mountain provides information management services that help organizations
                lower the costs, risks and inefficiencies of managing their physical and digital data.”

                MVS, Inc. (Washington, DC)
                “MVS Consulting is an 8(a) STARS II, HUBZone, LSDBE, CBE, and MBE IT
                Solutions company that provides IT solutions to Federal, State and Local
                Government Agencies.”

                MDM OFFICE SYSTEMS INC (Washington, DC)
                "Standard Office Supply - Office Supplies, Furniture Dealer, Educational Products,
                Breakroom Supplies, Imaging Supplies, and Coffee Services"

                Capital Services and Supplies (Washington, DC)
                “CSSI is an office solutions firm located in Washington, DC since 1980. CSSI’s
                goods and services are available to commercial, government, and educational
                institutions throughout the continental United States.”


Centrality Analysis (networkx)
                           Not included in previous slide...

                            United States Postal Service
                                         &
                               Dell Computer Corp




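Visualize the network
                import matplotlib.pyplot as plot

                # lay out the nodes with a force-directed (spring) algorithm
                pos = nx.spring_layout(G, iterations=100)
                plot.figure(1, figsize=(15, 15))
                plot.axis('off')

                nx.draw_networkx_nodes(
                    G,
                    pos,
                    node_size=100,
                    alpha=1,
                    node_color='g'
                )

                # faint edges keep a dense network readable
                nx.draw_networkx_edges(G, pos, alpha=0.2)
                plot.savefig('graph.png')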

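Trimming nodes

                # the slide's `return` implies a function; its name and the
                # default threshold are placeholders
                def trim_degrees(G, degree=1):
                    g2 = G.copy()
                    d = dict(nx.degree(g2))
                    # loop over a snapshot, since nodes are removed along the way
                    for n in list(g2.nodes()):
                        if d[n] <= degree:
                            g2.remove_node(n)
                    return g2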




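Degree Distribution
                               # dict() works whether nx.degree returns a dict or a view
                               d = dict(nx.degree(G))
                               plot.figure(1, figsize=(15, 10))
                               # histogram of node degrees in 100 bins
                               h = plot.hist(list(d.values()), 100)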




Trimmed nodes




Adding labels




                           nx.draw_networkx_labels(g3, pos, alpha=1)
                           nx.draw_networkx_edges(g3, pos, alpha=0.05)



Maps to maps




Spatial is special

                    • spatial data = attributes, location, time
                    • mappable!
                    • spatial data must be referenced in space
                    • Tobler’s First Law of Geography: everything is related to
                           everything else, but near things are more related than
                           distant things


Spatial analysis
                    • large data sets → a smaller amount of
                           meaningful information
                    • exploratory (ESDA)
                    • spatial statistics
                    • mathematical modeling and prediction of
                           spatial processes




Techniques

                    • point pattern analysis -- hot spots, k
                           density, nearest neighbor
                    • spatial interpolation -- kriging
                    • spatial regression -- ordinary least squares,
                           geographically weighted regression
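
To make one of these techniques concrete, here is a small sketch of a nearest-neighbor statistic for point patterns. It compares the observed mean nearest-neighbor distance with the value expected for a random pattern of the same density, 0.5 / sqrt(n / area). The coordinates and function names are hypothetical, edge effects are ignored, and the brute-force search only suits small point sets.

```python
import math

def mean_nearest_neighbor_distance(points):
    """Average distance from each point to its closest other point (brute force)."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

def nearest_neighbor_ratio(points, area):
    """Observed mean nearest-neighbor distance divided by the value expected
    for a random pattern of the same density, 0.5 / sqrt(n / area)."""
    expected = 0.5 / math.sqrt(len(points) / area)
    return mean_nearest_neighbor_distance(points) / expected

# Hypothetical coordinates in a 10 x 10 study area.
clustered = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1)]
spread = [(2.5, 2.5), (7.5, 2.5), (2.5, 7.5), (7.5, 7.5)]

print(nearest_neighbor_ratio(clustered, area=100.0))
print(nearest_neighbor_ratio(spread, area=100.0))
```

A ratio below 1 suggests clustering and a ratio above 1 suggests dispersion.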




PySAL

                    •      GeoDa Center at ASU

                    •      Python library for spatial analysis, with modules for
                           exploratory spatial data analysis, spatial
                           econometrics, and location modeling

                    •      http://code.google.com/p/pysal/

                    •      requires NumPy, SciPy
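
As an illustration of the kind of statistic used in exploratory spatial data analysis, here is a plain-Python sketch of global Moran's I, a measure of spatial autocorrelation. PySAL ships its own implementations of such statistics; the function names, weights layout, and data below are hypothetical and are not PySAL's API.

```python
def morans_i(values, weights):
    """Global Moran's I for `values`, given a dict of pairwise spatial weights
    {(i, j): w_ij}. Pairs that are absent count as weight 0."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(weights.values())
    cross = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    return (n / s0) * cross / sum(d * d for d in dev)

def chain_weights(n):
    """Binary contiguity for n units in a row: each unit neighbors the next."""
    w = {}
    for i in range(n - 1):
        w[(i, i + 1)] = 1.0
        w[(i + 1, i)] = 1.0
    return w

w = chain_weights(6)
clustered = [10, 10, 10, 1, 1, 1]     # similar values sit side by side
alternating = [10, 1, 10, 1, 10, 1]   # neighbors always differ

print(morans_i(clustered, w))
print(morans_i(alternating, w))
```

Positive values indicate that similar values sit near each other; negative values indicate that neighbors tend to differ.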




PySAL
                    •      developers looking for spatial analytical methods
                           to incorporate in application development

                    •      analysts working on projects that require custom
                           scripting

                    •      looking for a user-friendly GUI? Try STARS,
                           GeoDa, GeoDaSpace.

                    •      want to integrate into a powerful GIS? Look for
                           plug-ins for ArcGIS & QGIS.




Next steps

                    •      quantify clusters in city, region, nation

                    •      examine clusters along networks, business
                           corridors

                    •      create beautiful, interactive maps and charts to
                           allow users to explore spending patterns on their
                           own




From data analysis to stories




Which stories would we go after?
                    • construction contracts
                    • funding to charter schools
                    • health care costs in prisons
                    • local vs. regional vs. national purchases
                    • technology services -- look for overlap


Want to learn more?
          The SAGE Handbook of Spatial Analysis
          eds. A. Stewart Fotheringham and Peter A. Rogerson

          Interactive Spatial Data Analysis
          Trevor Bailey and Tony Gatrell

          Geographic Information Analysis
          David O’Sullivan and David Unwin

          PySAL
          Luc Anselin, GeoDa Center, Arizona State University

          [Photo: Mia, age 3, geographer in training]




And even more?
   NetworkX tutorial
   http://networkx.lanl.gov/networkx_tutorial.pdf

   UCD Dublin summer course
   http://mlg.ucd.ie/summer

   Social Network Analysis for Startups (O'Reilly Media)
   http://shop.oreilly.com/product/0636920020424.do




DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Kürzlich hochgeladen (20)

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

PyCon 2012: Python for data lovers: explore it, analyze it, map it

  • 1. Python for Open Data Lovers: Explore It, Analyze It, Map It Jackie Kazil Dana Bauer @jackiekazil @geography76 Saturday, March 10, 2012
  • 6. Where are we going? • open data everywhere • a data swiss army knife • finding network patterns • finding spatial patterns • which stories to pursue? moving beyond data analysis
  • 7. Data.gov • OpenDataPhilly • DC Data Catalog • DataSF • Chicago Data Portal • NYC Open Data • London Datastore
  • 8. assembly member expenses; bicycle lanes; city purchase orders; dialysis centers; elevation data; filming locations; Google Transit Feed Specification (GTFS); historical photos; influenza rates; judicial districts; Key Stage 2 test results by free school meal eligibility; land cover; monthly calls to Human Services Agency switchboard operators; neighborhood health clinics; Oyster ticket stop locations; political districts; quality of life indicators; restaurant inspections; sewer lines; traffic counts; utility excavation and paving five-year plan; violent crime incidents; ward offices; youth centers; zoning; **real-time parking availability and pricing**
  • 12. What are DC agencies spending money on? • How much are they spending? • What are the relationships between businesses and agencies? • Where are these businesses located?
  • 14. swiss army knife • csvkit: http://csvkit.readthedocs.org/ • a set of Python utilities for working with CSV • meant to replace csv module • pip install csvkit (no issues!)
  • 15. $ csvcut -n purchase2011_cleaned.csv
        1: PO_NUMBER
        2: AGENCY_NAME
        3: NIGP_DESCRIPTION
        4: PO_TOTAL_AMOUNT
        5: ORDER_DATE
        6: SUPPLIER
        7: SUPPLIER_FULL_ADDRESS
  • 16. $ csvcut -c 2,6 purchase2011_cleaned.csv | csvstat
        1. AGENCY_NAME
            <type 'unicode'>
            Nulls: False
            Unique values: 85
            5 most frequent values:
                DISTRICT OF COLUMBIA PUBLIC SCHOOLS: 2410
                STATE SUPERINTENDENT OF EDUCATION (OSSE): 1340
                DEPARTMENT OF HEALTH: 895
                OFFICE OF CHIEF TECHNOLOGY OFFICER: 786
                OFF PUBLIC ED FACILITIES MODERNIZATION: 722
            Max length: 40
        2. SUPPLIER
            <type 'unicode'>
            Nulls: False
            Unique values: 4357
            5 most frequent values:
                OST, INC.: 841
                DELL COMPUTER CORP.: 366
                AMERICAN EXPRESS COMPANY: 282
                MVS, INC.: 176
                CAPITAL SERVICES AND SUPPLIES: 167
            Max length: 52
        Row count: 16075
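The per-column summary that csvstat prints on this slide can be approximated with the standard library alone. The following is a minimal sketch, not csvkit's implementation: the function name `column_stats` and the four-row `SAMPLE` are assumptions made for illustration (the agency and supplier names appear in the slides, but the selection of rows is arbitrary).

```python
import csv
import io
from collections import Counter

# Tiny stand-in for purchase2011_cleaned.csv. The names appear in the slides;
# which rows are included here is an arbitrary choice for illustration.
SAMPLE = """AGENCY_NAME,SUPPLIER
DEPARTMENT OF CORRECTIONS,"UNITY HEALTH CARE, INC."
DEPARTMENT OF CORRECTIONS,CORRECTIONS CORPORATION OF
DEPARTMENT OF HEALTH,DC PRIMARY CARE ASSOCIATION
DEPARTMENT OF CORRECTIONS,CORRECTIONS CORPORATION OF
"""


def column_stats(csv_text, column, top=5):
    """Summarize one column: unique values, most frequent values, max length."""
    values = [row[column] for row in csv.DictReader(io.StringIO(csv_text))]
    counts = Counter(values)
    return {
        "unique": len(counts),
        "most_frequent": counts.most_common(top),
        "max_length": max(len(value) for value in values),
        "row_count": len(values),
    }


stats = column_stats(SAMPLE, "AGENCY_NAME")
print(stats)
```

Reading with `csv.DictReader` rather than splitting on commas matters here, because supplier names such as "UNITY HEALTH CARE, INC." contain commas inside quoted fields.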
  • 17. $ csvgrep -c 6 -r ^MAYA purchase2011_cleaned.csv
        PO_NUMBER,AGENCY_NAME,NIGP_DESCRIPTION,PO_TOTAL_AMOUNT,ORDER_DATE,SUPPLIER,SUPPLIER_FULL_ADDRESS
        PO352244,STATE SUPERINTENDENT OF EDUCATION (OSSE),EDUCATION AND TRAINING CONSULTING 38,408644.73,01/04/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
        PO352652,STATE SUPERINTENDENT OF EDUCATION (OSSE),EDUCATION AND TRAINING CONSULTING 38,111679.16,01/07/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
        PO352920,PUBLIC CHARTER SCHOOLS,SCHOOL OPERATION AND MANAGEMENT SERVICES 71,2205630.13,01/11/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
        PO355150,STATE SUPERINTENDENT OF EDUCATION (OSSE),EDUCATION AND TRAINING CONSULTING 38,391092.49,02/07/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
        PO356426,STATE SUPERINTENDENT OF EDUCATION (OSSE),FINANCIAL SERVICES (NOT OTHERWISE CLASSIFIED) 49,999891,02/23/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
        PO356632,STATE SUPERINTENDENT OF EDUCATION (OSSE),PROFESSIONAL SERVICES (NOT OTHERWISE CLASSIFIED) 58,187200,02/25/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
        PO359961,PUBLIC CHARTER SCHOOLS,SCHOOL OPERATION AND MANAGEMENT SERVICES 71,1753238,04/12/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
        PO360284,STATE SUPERINTENDENT OF EDUCATION (OSSE),EDUCATION AND TRAINING CONSULTING 38,110729.88,04/14/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
        PO361203,STATE SUPERINTENDENT OF EDUCATION (OSSE),EDUCATION AND TRAINING CONSULTING 38,92617.32,04/28/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
        PO351462-V2,STATE SUPERINTENDENT OF EDUCATION (OSSE),EDUCATIONAL RESEARCH SERVICES 19,152229.95,05/05/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
        PO364208,STATE SUPERINTENDENT OF EDUCATION (OSSE),EDUCATION AND TRAINING CONSULTING 38,118825.51,06/09/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
        PO366839,PUBLIC CHARTER SCHOOLS,SCHOOL OPERATION AND MANAGEMENT SERVICES 71,2767027,07/12/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
        PO365094-V2,STATE SUPERINTENDENT OF EDUCATION (OSSE),YOUTH CARE SERVICES 95,98092.35,08/15/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
        PO370948,STATE SUPERINTENDENT OF EDUCATION (OSSE),YOUTH CARE SERVICES 95,45736.58,08/25/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
        PO361027-V5,STATE SUPERINTENDENT OF EDUCATION (OSSE),EDUCATION AND TRAINING CONSULTING 38,29424.86,09/06/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
        PO374132,STATE SUPERINTENDENT OF EDUCATION (OSSE),YOUTH CARE SERVICES 95,9000,09/28/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
        PO377919,STATE SUPERINTENDENT OF EDUCATION (OSSE),YOUTH CARE SERVICES 95,491663.6,10/25/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
        PO381219,STATE SUPERINTENDENT OF EDUCATION (OSSE),EDUCATION AND TRAINING CONSULTING 38,120188.81,11/29/2011,MAYA ANGELOU PCS,"1851 9TH STREET NW, WASHINGTON, DC, 20001"
        PO383965,STATE SUPERINTENDENT OF EDUCATION (OSSE),YOUTH CARE SERVICES 95,294690.57,12/22/2011,MAYA ANGELOU PCS,"1436 U STREET, NW SUITE 203, WASHINGTON, DC, 20009"
  • 18. $ csvcut -c 4,2,6,5 purchase2011_cleaned.csv | csvsort -r | head -n 20 | csvlook
        ----------------------------------------------------------------------------------------------------------
        | PO_TOTAL_AMOUNT | AGENCY_NAME                            | SUPPLIER                       | ORDER_DATE |
        ----------------------------------------------------------------------------------------------------------
        | 154133337.02    | DEPARTMENT OF TRANSPORTATION           | SKANSKA-FACCHINA JV            | 2011-11-10 |
        | 62677473.88     | DEPARTMENT OF REAL ESTATE SERVICES     | EEC OF DC INC-FORRESTER CONSTR | 2011-09-22 |
        | 31809425.48     | DEPARTMENT OF HEALTH                   | DEFENSE LOGISTIC AGENCY        | 2011-09-08 |
        | 23600580.0      | DEPARTMENT OF CORRECTIONS              | UNITY HEALTH CARE, INC.        | 2011-10-24 |
        | 23538552.0      | DEPARTMENT OF REAL ESTATE SERVICES     | EEC-FORRESTER ANACOSTIA        | 2011-11-08 |
        | 22375314.45     | DEPARTMENT OF CORRECTIONS              | CORRECTIONS CORPORATION OF     | 2011-05-25 |
        | 21450000.04     | DEPARTMENT OF HUMAN SERVICES           | THE COMMUNITY PARTNERSHIPHOME  | 2011-08-18 |
        | 20813348.99     | DEPARTMENT OF REAL ESTATE SERVICES     | THE JOHN AKRIDGE CO            | 2011-06-28 |
        | 20622000.0      | DEPARTMENT OF TRANSPORTATION           | W M SCHLOSSER CO INC           | 2011-08-29 |
        | 19824914.0      | DEPARTMENT OF CORRECTIONS              | CORRECTIONS CORPORATION OF     | 2011-10-24 |
        | 18300956.56     | DEPARTMENT OF HUMAN SERVICES           | THE COMMUNITY PARTNERSHIPHOME  | 2011-11-29 |
        | 18104339.98     | DEPARTMENT OF HUMAN SERVICES           | THE COMMUNITY PARTNERSHIPHOME  | 2011-05-17 |
        | 18000000.0      | DEPARTMENT OF HEALTH                   | DC PRIMARY CARE ASSOCIATION    | 2011-03-10 |
        | 17000000.0      | DEPARTMENT OF HEALTH                   | CHILDRENS NATIONAL MEDICAL CTR | 2011-11-25 |
        | 16850000.0      | DEPUTY MAYOR FOR ECONOMIC DEVELOPMENT  | 2 M STREET REDEVELOPMENT LLC   | 2011-09-29 |
        | 16333257.33     | DEPARTMENT OF HUMAN SERVICES           | THE COMMUNITY PARTNERSHIPHOME  | 2011-06-02 |
        | 14206937.0      | PUBLIC CHARTER SCHOOLS                 | FRIENDSHIP PCS                 | 2011-07-12 |
        | 13862557.44     | MUNICIPAL FACILITIES: NON-CAPITAL      | US SECURITY ASSOCIATES, INC.   | 2011-10-07 |
        | 13800000.0      | DISTRICT DEPARTMENT OF THE ENVIRONMENT | VERMONT ENERGY INVESTMENT CORP | 2011-10-04 |
        ----------------------------------------------------------------------------------------------------------
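The same "largest purchase orders first" view can be sketched in plain Python. This is an illustration under stated assumptions, not a description of how csvsort works internally: `largest_orders` is a hypothetical helper, and `SAMPLE` holds three rows copied from the table on this slide.

```python
import csv
import io

# Three rows copied from the slide's table, deliberately out of order.
SAMPLE = """PO_TOTAL_AMOUNT,AGENCY_NAME,SUPPLIER
20622000.0,DEPARTMENT OF TRANSPORTATION,W M SCHLOSSER CO INC
154133337.02,DEPARTMENT OF TRANSPORTATION,SKANSKA-FACCHINA JV
31809425.48,DEPARTMENT OF HEALTH,DEFENSE LOGISTIC AGENCY
"""


def largest_orders(csv_text, n=20):
    """Return the n rows with the largest PO_TOTAL_AMOUNT, largest first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Compare amounts as numbers: as strings, "20622000.0" would sort
    # above "154133337.02" because "2" > "1".
    return sorted(rows, key=lambda row: float(row["PO_TOTAL_AMOUNT"]), reverse=True)[:n]


top = largest_orders(SAMPLE, n=2)
for row in top:
    print(row["PO_TOTAL_AMOUNT"], row["SUPPLIER"])
```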
  • 19. Social Network Analysis • “Social network analysis is focused on uncovering the patterning of people's interaction.” - http://www.insna.org/sna/what.html
  • 20. 99th House • President: Reagan • House majority: Democrats • Years: 1985, 1986
  • 21. 107th House • President: Bush • House majority: Republicans • Years: 2001, 2002
  • 22. 108th House • President: Bush • House majority: Republicans • Years: 2003, 2004
  • 23. 109th House • President: Bush • House majority: Republicans • Years: 2005, 2006
  • 24. 110th House • President: Bush • House majority: Democrats • Years: 2007, 2008
  • 25. 111th House • President: Obama • House majority: Democrats • Years: 2009, 2010
  • 26. CSV to network
        import networkx as nx

        G = nx.Graph()
        node_edgelist = []

        # grab edges (csv_file, n and e are not defined on the slide)
        for row in csv_file:
            node_edgelist.append((n, e))

        # create edges (add_edge_or_weight is not shown on the slide)
        for f in node_edgelist:
            for t in node_edgelist:
                if t != f:
                    add_edge_or_weight(G, f[0], t[0])
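The slide leaves out how rows become weighted edges. One plausible reading, and it is only an assumption, is that two suppliers are linked whenever they sell to the same agency, with the edge weight counting how many agencies they share. The sketch below implements that reading with the standard library; `weighted_edges` and the supplier and agency names are hypothetical and do not come from the purchase-order data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (supplier, agency) pairs standing in for rows of the CSV.
node_edgelist = [
    ("supplier_a", "agency_x"),
    ("supplier_b", "agency_x"),
    ("supplier_c", "agency_x"),
    ("supplier_a", "agency_y"),
    ("supplier_b", "agency_y"),
]


def weighted_edges(pairs):
    """Link suppliers that share an agency; weight = number of shared agencies."""
    suppliers_by_agency = defaultdict(set)
    for supplier, agency in pairs:
        suppliers_by_agency[agency].add(supplier)

    weights = defaultdict(int)
    for suppliers in suppliers_by_agency.values():
        # Sorting gives each undirected edge one canonical (u, v) key.
        for u, v in combinations(sorted(suppliers), 2):
            weights[(u, v)] += 1
    return dict(weights)


edges = weighted_edges(node_edgelist)
print(edges)
```

Each `(u, v)` key and its weight could then be loaded into the slide's graph with `G.add_edge(u, v, weight=w)`. Grouping by agency first avoids comparing every row with every other row, which becomes slow at 16,075 rows.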
  • 27. Centrality Analysis (networkx) • Degree: nx.degree(G) • number of connections; more connections = more important • Closeness centrality: nx.closeness_centrality(G) • distance to all other nodes; closer = more important • Betweenness centrality: nx.betweenness_centrality(G) • how often a node lies on the shortest paths between other nodes, i.e. how much it controls the flow of information • PageRank: nx.pagerank(G) • a node gains importance from the importance of the nodes around it
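To make the first two measures concrete, here is a small pure-Python sketch of degree and closeness on a toy graph. It assumes a connected, undirected graph stored as an adjacency dict; the `graph`, `degree` and `closeness` names are illustrative and are not networkx code. In networkx itself the corresponding calls are the ones named on the slide, and their handling of disconnected graphs may differ from this simplified version.

```python
from collections import deque

# Toy star-shaped graph: "b" is connected to everyone else (hypothetical data).
graph = {
    "a": {"b"},
    "b": {"a", "c", "d"},
    "c": {"b"},
    "d": {"b"},
}


def degree(graph):
    """Number of connections per node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def closeness(graph):
    """(n - 1) divided by the sum of shortest-path distances to all other nodes."""
    result = {}
    for source in graph:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        result[source] = (len(graph) - 1) / total if total else 0.0
    return result


print(degree(graph))
print(closeness(graph))
```

The hub "b" scores highest on both measures: it has the most connections and is one step from every other node.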
  • 29. Centrality Analysis (networkx) • Digi Docs Inc Document Mangers (Dallas): “Offers software that generates loan documents for electronic delivery.” • Iron Mountain (Mountain View): “Iron Mountain provides information management services that help organizations lower the costs, risks and inefficiencies of managing their physical and digital data.” • MVS, Inc. (Washington, DC): “MVS Consulting is an 8(a) STARS II, HUBZone, LSDBE, CBE, and MBE IT Solutions company that provides IT solutions to Federal, State and Local Government Agencies.” • MDM OFFICE SYSTEMS INC (Washington, DC): "Standard Office Supply - Office Supplies, Furniture Dealer, Educational Products, Breakroom Supplies, Imaging Supplies, and Coffee Services" • Capital Services and Supplies (Washington, DC): “CSSI is an office solutions firm located in Washington, DC since 1980. CSSI’s goods and services are available to commercial, government, and educational institutions throughout the continental United States.”
  • 30. Centrality Analysis (networkx) • Not included in previous slide: United States Postal Service & Dell Computer Corp
  • 31. Visualize the network
        import matplotlib.pyplot as plot

        # lay out the nodes with a force-directed (spring) layout
        pos = nx.spring_layout(G, iterations=100)
        plot.figure(1, figsize=(15, 15))
        plot.axis('off')
        nx.draw_networkx_nodes(G, pos, node_size=100, alpha=1, node_color='g')
        nx.draw_networkx_edges(G, pos, alpha=0.2)
        plot.savefig('graph.png')
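  • 33. Trimming nodes
        # the function name and default are assumed; the slide shows only the body
        def trim_nodes(G, degree=1):
            g2 = G.copy()
            d = dict(nx.degree(g2))
            # iterate over a snapshot so nodes can be removed safely
            for n in list(g2.nodes()):
                if d[n] <= degree:
                    g2.remove_node(n)
            return g2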
  • 34. Degree Distribution
        d = dict(nx.degree(G))
        plot.figure(1, figsize=(15, 10))
        h = plot.hist(list(d.values()), 100)
  • 39. nx.draw_networkx_labels(g3, pos, alpha=1)
        nx.draw_networkx_edges(g3, pos, alpha=0.05)
  • 40. Maps to maps
  • 41. Spatial is special • spatial data = attributes, location, time • mappable! • spatial data must be referenced in space • Tobler’s First Law of Geography (near things are more related than distant things)
  • 42. Spatial analysis • reduces large data sets to a smaller amount of meaningful information • exploratory (ESDA) • spatial statistics • mathematical modeling and prediction of spatial processes
  • 43. Techniques • point pattern analysis -- hot spots, k density, nearest neighbor • spatial interpolation -- kriging • spatial regression -- ordinary least squares, geographically weighted regression
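As one concrete instance of the nearest-neighbor technique listed above, the sketch below computes a Clark-Evans style nearest neighbor index: the observed mean distance to each point's nearest neighbor, divided by the mean distance expected for a random pattern of the same density. Values below 1 suggest clustering and values above 1 suggest dispersion. This is a simplified illustration with no edge correction; the function name, the points and the study area are all hypothetical, and it is not PySAL code. It needs Python 3.8 or later for `math.dist`.

```python
import math


def nearest_neighbor_index(points, area):
    """Observed mean nearest-neighbor distance over the random-pattern expectation."""
    n = len(points)
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    # Expected mean nearest-neighbor distance for a random pattern
    # with density n / area.
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


# Hypothetical point sets inside a study area of 4 square units.
grid = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
cluster = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (0.1, 0.1)]

print(nearest_neighbor_index(grid, area=4.0))
print(nearest_neighbor_index(cluster, area=4.0))
```

The brute-force distance search is quadratic in the number of points, which is fine for a sketch but would call for a spatial index on a full set of geocoded supplier addresses.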
  • 52. PySAL • GeoDa Center at ASU • Python library for spatial analysis, with modules for exploratory spatial data analysis, spatial econometrics, and location modeling • http://code.google.com/p/pysal/ • requires NumPy, SciPy
  • 53. PySAL • for developers looking for spatial analytical methods to incorporate in application development • for analysts working on projects that require custom scripting • looking for a user-friendly GUI? Try STARS, GeoDa, GeoDaSpace. • want to integrate into a powerful GIS? Look for plug-ins for ArcGIS & QGIS.
  • 55. Next steps • quantify clusters in city, region, nation • examine clusters along networks, business corridors • create beautiful, interactive maps and charts to allow users to explore spending patterns on their own
  • 56. From data analysis to stories
  • 57. Which stories would we go after? • construction contracts • funding to charter schools • health care costs in prisons • local vs. regional vs. national purchases • technology services -- look for overlap
  • 58. Want to learn more? • The SAGE Handbook of Spatial Analysis, eds. A. Stewart Fotheringham and Peter A. Rogerson • Interactive Spatial Data Analysis, Trevor Bailey and Tony Gatrell • Geographic Information Analysis, David O’Sullivan and David Unwin • PySAL, Luc Anselin, GeoDa Center, Arizona State University • Mia, age 3, geographer in training
  • 59. And even more? • NetworkX tutorial: http://networkx.lanl.gov/networkx_tutorial.pdf • UCD Dublin summer course: http://mlg.ucd.ie/summer • Social Network Analysis for Startups (O'Reilly Media): http://shop.oreilly.com/product/0636920020424.do