SlideShare ist ein Scribd-Unternehmen logo
1 von 42
Downloaden Sie, um offline zu lesen
Stat405
              Visualising time & space


                             Hadley Wickham
Wednesday, 21 October 2009
1. New data: baby names by state
               2. Visualise time (done!)
               3. Visualise time conditional on space
               4. Visualise space
               5. Visualise space conditional on time



Wednesday, 21 October 2009
Baby names by state
                   Top 100 male and female baby names
                   for each state, 1960–2008.
                   480,000 records (100 * 50 * 2 * 48)
                   Slightly different variables: state, year,
                   name, sex and number.
                   To keep the data manageable, we will
                   look at the top 25 names of all time.

                                           CC BY http://www.flickr.com/photos/the_light_show/2586781132
Wednesday, 21 October 2009
Subset
                   Easier to compare states if we have
                   proportions. To calculate proportions,
                   need births. But could only find data from
                   1981.
                   Then selected 30 names that occurred
                   fairly frequently, and had interesting
                   patterns.


Wednesday, 21 October 2009
Aaron Alex Allison Alyssa Angela Ashley
                   Carlos Chelsea Christian Eric Evan
                   Gabriel Jacob Jared Jennifer Jonathan
                   Juan Katherine Kelsey Kevin Matthew
                   Michelle Natalie Nicholas Noah Rebecca
                   Sara Sarah Taylor Thomas




Wednesday, 21 October 2009
Getting started

               library(ggplot2)
               library(plyr)

               bnames <- read.csv("interesting-names.csv",
                 stringsAsFactors = F)

               matthew <- subset(bnames, name == "Matthew")




Wednesday, 21 October 2009
Time | Space



Wednesday, 21 October 2009
0.04




        0.03
 prop




        0.02




        0.01




                             1985   1990          1995   2000   2005
                                           year
Wednesday, 21 October 2009
0.04




        0.03
 prop




        0.02




        0.01




                             1985   1990   1995   2000     2005
qplot(year, prop, data = matthew,year
                                  geom = "line", group = state)
Wednesday, 21 October 2009
AK        AL         AR         AZ          CA        CO         CT         DC
        0.04
        0.03
        0.02
        0.01
                    DE        FL         GA         HI          IA        ID         IL         IN
        0.04
        0.03
        0.02
        0.01
                    KS        KY         LA         MA          MD        ME         MI         MN
        0.04
        0.03
        0.02
        0.01
                   MO        MS          MT         NC          NE        NH         NJ         NM
        0.04
 prop




        0.03
        0.02
        0.01
                    NV        NY         OH         OK          OR        PA         RI         SC
        0.04
        0.03
        0.02
        0.01
                    SD        TN         TX         UT          VA        VT         WA         WI
        0.04
        0.03
        0.02
        0.01
                   WV        WY
        0.04
        0.03
        0.02
        0.01
               198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000
                 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005
                                                         year
Wednesday, 21 October 2009
AK        AL         AR         AZ         CA         CO         CT         DC
        0.04
        0.03
        0.02
        0.01
                    DE        FL         GA         HI         IA         ID         IL         IN
        0.04
        0.03
        0.02
        0.01
                    KS        KY         LA         MA         MD         ME         MI         MN
        0.04
        0.03
        0.02
        0.01
                   MO        MS          MT         NC         NE         NH         NJ         NM
        0.04
 prop




        0.03
        0.02
        0.01
                    NV        NY         OH         OK         OR         PA         RI         SC
        0.04
        0.03
        0.02
        0.01
                    SD        TN         TX         UT         VA         VT         WA         WI
        0.04
        0.03
        0.02
        0.01
                   WV        WY
        0.04
        0.03
        0.02
        0.01
               198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000
                 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005

last_plot() + facet_wrap(~ state)year
Wednesday, 21 October 2009
Your turn

                   Pick some names out of the list and
                   explore. What do you see?
                   Can you write a function that plots the
                   trend for a given name?




Wednesday, 21 October 2009
show_name <- function(name) {
       name <- bnames[bnames$name == name, ]
       qplot(year, prop, data = name, geom = "line",
         group = state)
     }

     show_name("Jessica")
     show_name("Aaron")
     show_name("Juan") + facet_wrap(~ state)




Wednesday, 21 October 2009
0.04




        0.03
 prop




        0.02




        0.01




                             1985   1990          1995   2000   2005
                                           year
Wednesday, 21 October 2009
0.04




        0.03
 prop




        0.02




        0.01




qplot(year, prop, data = matthew, geom1995 "line", 2000
                1985       1990         =          group = state) +
                                                              2005
  geom_smooth(aes(group = 1), se year size = 3)
                                 = F,
Wednesday, 21 October 2009
0.04




        0.03
 prop




        0.02




        0.01


                         So we only get one smooth
                             for the whole dataset
qplot(year, prop, data = matthew, geom1995 "line", 2000
                1985       1990          =         group = state) +
                                                              2005
  geom_smooth(aes(group = 1), se year size = 3)
                                   = F,
Wednesday, 21 October 2009
Two useful tools
                   Smoothing: can be easier to perceive
                   overall trend by smoothing individual
                   functions
                   Indexing: remove initial differences by
                   indexing to the first value.
                   Not that useful here, but good tools to
                   have in your toolbox.


Wednesday, 21 October 2009
library(mgcv)
     smooth <- function(y, x) {
       as.numeric(predict(gam(y ~ s(x))))
     }

     matthew <- ddply(matthew, "state", transform,
       prop_s = smooth(prop, year))

     qplot(year, prop_s, data = matthew, geom = "line",
       group = state)




Wednesday, 21 October 2009
index <- function(y, x) {
       y / y[order(x)[1]]
     }

     matthew <- ddply(matthew, "state", transform,
       prop_i = index(prop, year),
       prop_si = index(prop_s, year))

     qplot(year, prop_i, data = matthew, geom = "line",
       group = state)
     qplot(year, prop_si, data = matthew, geom = "line",
       group = state)


Wednesday, 21 October 2009
Your turn
                   Create a plot to show all names
                   simultaneously. Does smoothing every
                   name in every state make it easier to see
                   patterns?
                   Hint: run the following R code on the next
                   slide to eliminate names with less than 10
                   years of data


Wednesday, 21 October 2009
longterm <- ddply(bnames, c("name", "state"),
     function(df) {
        if (nrow(df) > 10) {
          df
        }
     })




Wednesday, 21 October 2009
qplot(year, prop, data = bnames, geom = "line",
       group = state, alpha = I(1 / 4)) +
       facet_wrap(~ name)

     longterm <- ddply(longterm, c("name", "state"),
       transform, prop_s = smooth(prop, year))

     qplot(year, prop_s, data = longterm, geom = "line",
       group = state, alpha = I(1 / 4)) +
       facet_wrap(~ name)
     last_plot() + facet_wrap(scales = "free_y")



Wednesday, 21 October 2009
Space



Wednesday, 21 October 2009
Spatial plots

                   Choropleth map:
                   map colour of areas to value.
                   Proportional symbol map:
                   map size of symbols to value




Wednesday, 21 October 2009
juan2000 <- subset(bnames, name == "Juan" & year == 2000)

     # Turn map data into normal data frame
     library(maps)
     states <- map_data("state")
     states$state <- state.abb[match(states$region,
       tolower(state.name))]

     # Merge and then restore original order
     choropleth <- merge(juan2000, states,
       by = "state", all.y = T)
     choropleth <- choropleth[order(choropleth$order), ]

     # Plot with polygons
     qplot(long, lat, data = choropleth, geom = "polygon",
       fill = prop, group = group)

Wednesday, 21 October 2009
45




       40
                                                                 prop
                                                                        0.004
                                                                        0.006
 lat




                                                                        0.008
       35
                                                                         0.01




       30




                       −120   −110   −100      −90   −80   −70
                                        long
Wednesday, 21 October 2009
What’s the problem
                                                   with this map?
       45                                      How could we fix it?


       40
                                                                     prop
                                                                            0.004
                                                                            0.006
 lat




                                                                            0.008
       35
                                                                             0.01




       30




                       −120   −110   −100      −90    −80    −70
                                        long
Wednesday, 21 October 2009
ggplot(choropleth, aes(long, lat, group = group)) +
       geom_polygon(fill = "white", colour = "grey50") +
       geom_polygon(aes(fill = prop))




Wednesday, 21 October 2009
45




       40
                                                                 prop
                                                                        0.004
                                                                        0.006
 lat




                                                                        0.008
       35
                                                                         0.01




       30




                       −120   −110   −100      −90   −80   −70
                                        long
Wednesday, 21 October 2009
Problems?

                   What are the problems with this sort of
                   plot?
                   Take one minute to brainstorm some
                   possible issues.




Wednesday, 21 October 2009
Problems
                   Big areas most striking. But in the US (as
                   with most countries) big areas tend to
                   least populated. Most populated areas
                   tend to be small and dense - e.g. the East
                   coast.
                   (Another computational problem: need to
                   push around a lot of data to create these
                   plots)


Wednesday, 21 October 2009
●




       45
                        ●

                                                                                 ●




                                                                                         ●
                                                  ●




       40                                                     ●
                                                                                     ●



                                            ●                                                      prop
                                 ●                    ●
                                                                                                    ●     0.004
                             ●                                                                     ●      0.006
 lat




                                                      ●                     ●
                                                                                                   ●      0.008
       35
                                     ●      ●                                                      ●      0.010

                                                                       ●




                                                ●
       30

                                                                   ●




                       −120          −110       −100         −90           −80               −70
                                                      long
Wednesday, 21 October 2009
mid_range <- function(x) mean(range(x))
     centres <- ddply(states, c("state"), summarise,
       lat = mid_range(lat), long = mid_range(long))

     bubble <- merge(juan2000, centres, by = "state")
     qplot(long, lat, data = bubble,
       size = prop, colour = prop)

     ggplot(bubble, aes(long, lat)) +
       geom_polygon(aes(group = group), data = states,
         fill = NA, colour = "grey50") +
       geom_point(aes(size = prop, colour = prop))


Wednesday, 21 October 2009
Your turn


                   Replicate either a choropleth or a
                   proportional symbol map with the name
                   of your choice.




Wednesday, 21 October 2009
Space | Time



Wednesday, 21 October 2009
Wednesday, 21 October 2009
Your turn


                   Try and create this plot yourself. What is
                   the main difference between this plot and
                   the previous?




Wednesday, 21 October 2009
juan <- subset(bnames, name == "Juan")
     bubble <- merge(juan, centres, by = "state")

     ggplot(bubble, aes(long, lat)) +
       geom_polygon(aes(group = group), data = states,
         fill = NA, colour = "grey50") +
       geom_point(aes(size = prop, colour = prop)) +
       facet_wrap(~ year)




Wednesday, 21 October 2009
Aside: geographic data

                   Boundaries for most countries available
                   from: http://gadm.org
                   To use with ggplot2, use the fortify
                   function to convert to usual data frame.
                   Will also need to install the sp package.



Wednesday, 21 October 2009
# install.packages("sp")

     library(sp)
     load(url("http://gadm.org/data/rda/CHE_adm1.RData"))

     head(as.data.frame(gadm))
     ch <- fortify(gadm, region = "ID_1")
     str(ch)

     qplot(long, lat, group = group, data = ch,
       geom = "polygon", colour = I("white"))



Wednesday, 21 October 2009
Wednesday, 21 October 2009
This work is licensed under the Creative
      Commons Attribution-Noncommercial 3.0 United
      States License. To view a copy of this license,
      visit http://creativecommons.org/licenses/by-nc/
      3.0/us/ or send a letter to Creative Commons,
      171 Second Street, Suite 300, San Francisco,
      California, 94105, USA.




Wednesday, 21 October 2009

Weitere ähnliche Inhalte

Mehr von Hadley Wickham (20)

22 spam
22 spam22 spam
22 spam
 
21 spam
21 spam21 spam
21 spam
 
20 date-times
20 date-times20 date-times
20 date-times
 
19 tables
19 tables19 tables
19 tables
 
18 cleaning
18 cleaning18 cleaning
18 cleaning
 
17 polishing
17 polishing17 polishing
17 polishing
 
16 critique
16 critique16 critique
16 critique
 
14 case-study
14 case-study14 case-study
14 case-study
 
13 case-study
13 case-study13 case-study
13 case-study
 
12 adv-manip
12 adv-manip12 adv-manip
12 adv-manip
 
11 adv-manip
11 adv-manip11 adv-manip
11 adv-manip
 
11 adv-manip
11 adv-manip11 adv-manip
11 adv-manip
 
10 simulation
10 simulation10 simulation
10 simulation
 
10 simulation
10 simulation10 simulation
10 simulation
 
09 bootstrapping
09 bootstrapping09 bootstrapping
09 bootstrapping
 
08 functions
08 functions08 functions
08 functions
 
07 problem-solving
07 problem-solving07 problem-solving
07 problem-solving
 
06 data
06 data06 data
06 data
 
05 subsetting
05 subsetting05 subsetting
05 subsetting
 
04 reports
04 reports04 reports
04 reports
 

Kürzlich hochgeladen

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 

Kürzlich hochgeladen (20)

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 

15 Time Space

  • 1. Stat405 Visualising time & space Hadley Wickham Wednesday, 21 October 2009
  • 2. 1. New data: baby names by state 2. Visualise time (done!) 3. Visualise time conditional on space 4. Visualise space 5. Visualise space conditional on time Wednesday, 21 October 2009
  • 3. Baby names by state Top 100 male and female baby names for each state, 1960–2008. 480,000 records (100 * 50 * 2 * 48) Slightly different variables: state, year, name, sex and number. To keep the data manageable, we will look at the top 25 names of all time. CC BY http://www.flickr.com/photos/the_light_show/2586781132 Wednesday, 21 October 2009
  • 4. Subset Easier to compare states if we have proportions. To calculate proportions, need births. But could only find data from 1981. Then selected 30 names that occurred fairly frequently, and had interesting patterns. Wednesday, 21 October 2009
  • 5. Aaron Alex Allison Alyssa Angela Ashley Carlos Chelsea Christian Eric Evan Gabriel Jacob Jared Jennifer Jonathan Juan Katherine Kelsey Kevin Matthew Michelle Natalie Nicholas Noah Rebecca Sara Sarah Taylor Thomas Wednesday, 21 October 2009
  • 6. Getting started library(ggplot2) library(plyr) bnames <- read.csv("interesting-names.csv", stringsAsFactors = F) matthew <- subset(bnames, name == "Matthew") Wednesday, 21 October 2009
  • 7. Time | Space Wednesday, 21 October 2009
  • 8. 0.04 0.03 prop 0.02 0.01 1985 1990 1995 2000 2005 year Wednesday, 21 October 2009
  • 9. 0.04 0.03 prop 0.02 0.01 1985 1990 1995 2000 2005 qplot(year, prop, data = matthew,year geom = "line", group = state) Wednesday, 21 October 2009
  • 10. AK AL AR AZ CA CO CT DC 0.04 0.03 0.02 0.01 DE FL GA HI IA ID IL IN 0.04 0.03 0.02 0.01 KS KY LA MA MD ME MI MN 0.04 0.03 0.02 0.01 MO MS MT NC NE NH NJ NM 0.04 prop 0.03 0.02 0.01 NV NY OH OK OR PA RI SC 0.04 0.03 0.02 0.01 SD TN TX UT VA VT WA WI 0.04 0.03 0.02 0.01 WV WY 0.04 0.03 0.02 0.01 198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005 year Wednesday, 21 October 2009
  • 11. AK AL AR AZ CA CO CT DC 0.04 0.03 0.02 0.01 DE FL GA HI IA ID IL IN 0.04 0.03 0.02 0.01 KS KY LA MA MD ME MI MN 0.04 0.03 0.02 0.01 MO MS MT NC NE NH NJ NM 0.04 prop 0.03 0.02 0.01 NV NY OH OK OR PA RI SC 0.04 0.03 0.02 0.01 SD TN TX UT VA VT WA WI 0.04 0.03 0.02 0.01 WV WY 0.04 0.03 0.02 0.01 198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000 19902000 198519952005 19902000 198519952005 19902000 198519952005 19902000 198519952005 last_plot() + facet_wrap(~ state)year Wednesday, 21 October 2009
  • 12. Your turn Pick some names out of the list and explore. What do you see? Can you write a function that plots the trend for a given name? Wednesday, 21 October 2009
  • 13. show_name <- function(name) { name <- bnames[bnames$name == name, ] qplot(year, prop, data = name, geom = "line", group = state) } show_name("Jessica") show_name("Aaron") show_name("Juan") + facet_wrap(~ state) Wednesday, 21 October 2009
  • 14. 0.04 0.03 prop 0.02 0.01 1985 1990 1995 2000 2005 year Wednesday, 21 October 2009
  • 15. 0.04 0.03 prop 0.02 0.01 qplot(year, prop, data = matthew, geom1995 "line", 2000 1985 1990 = group = state) + 2005 geom_smooth(aes(group = 1), se year size = 3) = F, Wednesday, 21 October 2009
  • 16. 0.04 0.03 prop 0.02 0.01 So we only get one smooth for the whole dataset qplot(year, prop, data = matthew, geom1995 "line", 2000 1985 1990 = group = state) + 2005 geom_smooth(aes(group = 1), se year size = 3) = F, Wednesday, 21 October 2009
  • 17. Two useful tools Smoothing: can be easier to perceive overall trend by smoothing individual functions Indexing: remove initial differences by indexing to the first value. Not that useful here, but good tools to have in your toolbox. Wednesday, 21 October 2009
  • 18. library(mgcv) smooth <- function(y, x) { as.numeric(predict(gam(y ~ s(x)))) } matthew <- ddply(matthew, "state", transform, prop_s = smooth(prop, year)) qplot(year, prop_s, data = matthew, geom = "line", group = state) Wednesday, 21 October 2009
  • 19. index <- function(y, x) { y / y[order(x)[1]] } matthew <- ddply(matthew, "state", transform, prop_i = index(prop, year), prop_si = index(prop_s, year)) qplot(year, prop_i, data = matthew, geom = "line", group = state) qplot(year, prop_si, data = matthew, geom = "line", group = state) Wednesday, 21 October 2009
  • 20. Your turn Create a plot to show all names simultaneously. Does smoothing every name in every state make it easier to see patterns? Hint: run the following R code on the next slide to eliminate names with less than 10 years of data Wednesday, 21 October 2009
  • 21. longterm <- ddply(bnames, c("name", "state"), function(df) { if (nrow(df) > 10) { df } }) Wednesday, 21 October 2009
  • 22. qplot(year, prop, data = bnames, geom = "line", group = state, alpha = I(1 / 4)) + facet_wrap(~ name) longterm <- ddply(longterm, c("name", "state"), transform, prop_s = smooth(prop, year)) qplot(year, prop_s, data = longterm, geom = "line", group = state, alpha = I(1 / 4)) + facet_wrap(~ name) last_plot() + facet_wrap(scales = "free_y") Wednesday, 21 October 2009
  • 24. Spatial plots Choropleth map: map colour of areas to value. Proportional symbol map: map size of symbols to value Wednesday, 21 October 2009
  • 25. juan2000 <- subset(bnames, name == "Juan" & year == 2000) # Turn map data into normal data frame library(maps) states <- map_data("state") states$state <- state.abb[match(states$region, tolower(state.name))] # Merge and then restore original order choropleth <- merge(juan2000, states, by = "state", all.y = T) choropleth <- choropleth[order(choropleth$order), ] # Plot with polygons qplot(long, lat, data = choropleth, geom = "polygon", fill = prop, group = group) Wednesday, 21 October 2009
  • 26. 45 40 prop 0.004 0.006 lat 0.008 35 0.01 30 −120 −110 −100 −90 −80 −70 long Wednesday, 21 October 2009
  • 27. What’s the problem with this map? 45 How could we fix it? 40 prop 0.004 0.006 lat 0.008 35 0.01 30 −120 −110 −100 −90 −80 −70 long Wednesday, 21 October 2009
  • 28. ggplot(choropleth, aes(long, lat, group = group)) + geom_polygon(fill = "white", colour = "grey50") + geom_polygon(aes(fill = prop)) Wednesday, 21 October 2009
  • 29. 45 40 prop 0.004 0.006 lat 0.008 35 0.01 30 −120 −110 −100 −90 −80 −70 long Wednesday, 21 October 2009
  • 30. Problems? What are the problems with this sort of plot? Take one minute to brainstorm some possible issues. Wednesday, 21 October 2009
  • 31. Problems Big areas most striking. But in the US (as with most countries) big areas tend to least populated. Most populated areas tend to be small and dense - e.g. the East coast. (Another computational problem: need to push around a lot of data to create these plots) Wednesday, 21 October 2009
  • 32. 45 ● ● ● ● 40 ● ● ● prop ● ● ● 0.004 ● ● 0.006 lat ● ● ● 0.008 35 ● ● ● 0.010 ● ● 30 ● −120 −110 −100 −90 −80 −70 long Wednesday, 21 October 2009
  • 33. mid_range <- function(x) mean(range(x)) centres <- ddply(states, c("state"), summarise, lat = mid_range(lat), long = mid_range(long)) bubble <- merge(juan2000, centres, by = "state") qplot(long, lat, data = bubble, size = prop, colour = prop) ggplot(bubble, aes(long, lat)) + geom_polygon(aes(group = group), data = states, fill = NA, colour = "grey50") + geom_point(aes(size = prop, colour = prop)) Wednesday, 21 October 2009
  • 34. Your turn Replicate either a choropleth or a proportional symbol map with the name of your choice. Wednesday, 21 October 2009
  • 35. Space | Time Wednesday, 21 October 2009
  • 37. Your turn Try and create this plot yourself. What is the main difference between this plot and the previous? Wednesday, 21 October 2009
  • 38. juan <- subset(bnames, name == "Juan") bubble <- merge(juan, centres, by = "state") ggplot(bubble, aes(long, lat)) + geom_polygon(aes(group = group), data = states, fill = NA, colour = "grey50") + geom_point(aes(size = prop, colour = prop)) + facet_wrap(~ year) Wednesday, 21 October 2009
  • 39. Aside: geographic data Boundaries for most countries available from: http://gadm.org To use with ggplot2, use the fortify function to convert to usual data frame. Will also need to install the sp package. Wednesday, 21 October 2009
  • 40. # install.packages("sp") library(sp) load(url("http://gadm.org/data/rda/CHE_adm1.RData")) head(as.data.frame(gadm)) ch <- fortify(gadm, region = "ID_1") str(ch) qplot(long, lat, group = group, data = ch, geom = "polygon", colour = I("white")) Wednesday, 21 October 2009
  • 42. This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/ 3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. Wednesday, 21 October 2009