Data Visualization

http://nycdatascience.com/part4_en/

Data Visualization
class 5

Vivian Zhang | Scott Kostyshak
CTO @Supstat Inc | Data Scientist @Supstat Inc

1 of 98

2/4/14, 7:31 AM

Data visualization
We will study basic and advanced plotting functions in R, with a focus on exploring data
through visualization.
· The related functions in R
· The properties of a single variable
· Displaying compositions
· The relationship between variables
· Exhibiting change over time
· Geographic information
Case study and exercise: Analyzing the NBA data with graphics


Why use visualization?


Data visualization
A figure is worth a thousand words.
data <- read.table('data/anscombe.txt',T)
data <- data[,-1]
head(data)

  x1 x2 x3 x4   y1   y2    y3   y4
1 10 10 10  8 8.04 9.14  7.46 6.58
2  8  8  8  8 6.95 8.14  6.77 5.76
3 13 13 13  8 7.58 8.74 12.74 7.71
4  9  9  9  8 8.81 8.77  7.11 8.84
5 11 11 11  8 8.33 9.26  7.81 8.47
6 14 14 14  8 9.96 8.10  8.84 7.04


Data visualization
Start with some summary statistics: first compute the mean of each column, and then
the correlation coefficient between x and y in each of the four groups of data.
colMeans(data)

x1 x2 x3 x4 y1 y2 y3 y4
9.0 9.0 9.0 9.0 7.5 7.5 7.5 7.5

sapply(1:4,function(x) cor(data[,x],data[,x+4]))

[1] 0.816 0.816 0.816 0.817
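The same check can be reproduced without the text file, because R ships Anscombe's quartet as the built-in `anscombe` data frame. The sketch below assumes that the built-in copy matches the course's `data/anscombe.txt` (the file itself is not shown here). It also adds the fitted regression coefficients, which are nearly identical across the four groups, and then plots each group.

```r
# Assumption: the built-in datasets::anscombe stands in for data/anscombe.txt.
quartet <- datasets::anscombe

# Column means: every x has the same mean, and the y means agree to two decimals.
means <- colMeans(quartet)

# Correlation of each (x_i, y_i) pair.
cors <- sapply(1:4, function(i) {
  cor(quartet[[paste0("x", i)]], quartet[[paste0("y", i)]])
})

# Least-squares intercept and slope for each pair.
fits <- sapply(1:4, function(i) {
  coef(lm(quartet[[paste0("y", i)]] ~ quartet[[paste0("x", i)]]))
})
rownames(fits) <- c("intercept", "slope")

print(round(means, 2))
print(round(cors, 3))
print(round(fits, 2))

# Plotting the four panels is what reveals how different the groups are.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(quartet[[paste0("x", i)]], quartet[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
}
par(op)
```

The numbers alone suggest four interchangeable datasets; only the plots show that they are not.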


Data visualization


Some basic principles
1. Determine the goal of the visualization from the beginning
· Exploratory visualization
· Explanatory visualization
2. Understand the characteristics of the data and the audience
· Which variables are important and interesting
· Consider the role and background of the audience
· Select a proper mapping
3. Keep it concise, but give enough information


Mapping elements of a graph:
1. Coordinate position
2. Line
3. Size
4. Color
5. Shape
6. Text


Visualization functions in R


Visualization functions in R
· base graphics
· lattice
· ggplot2


Elementary graphing functions
plot(cars$dist~cars$speed)


Elementary graphing functions
plot(cars$dist,type='l')


Elementary graphing functions
plot(cars$dist,type='h')


Elementary graphing functions
hist(cars$dist)


lattice package
library(lattice)
num <- sample(1:3,size=50,replace=T)
barchart(table(num))


lattice package
qqmath(rnorm(100))


lattice package
stripplot(~ Sepal.Length | Species, data = iris,layout=c(1,3))


lattice package
densityplot(~ Sepal.Length, groups=Species, data = iris,plot.points=FALSE)


lattice package
bwplot(Species~ Sepal.Length, data = iris)


lattice package
xyplot(Sepal.Width~ Sepal.Length, groups=Species, data = iris)


lattice package
splom(iris[1:4])


lattice package
histogram(~ Sepal.Length | Species, data = iris,layout=c(1,3))


Three-dimensional graphs in the lattice package
library(plyr)
func3d <- function(x,y) {
  sin(x^2/2 - y^2/4) * cos(2*x - exp(y))
}
vec1 <- vec2 <- seq(0,2,length=30)
para <- expand.grid(x=vec1,y=vec2)
result6 <- mdply(.data=para,.fun=func3d)


Three-dimensional graphs in the lattice package
library(lattice)
wireframe(V1~x*y,data=result6,scales = list(arrows = FALSE),
          drape = TRUE, colorkey = FALSE)


ggplot package
Data, Mapping and Geom
library(ggplot2)
p <- ggplot(data=mpg,mapping=aes(x=cty,y=hwy)) + geom_point()
print(p)


ggplot package
Observe the internal structure
summary(p)

data: manufacturer, model, displ, year, cyl, trans, drv, cty, hwy, fl, class [234x11]
mapping: x = cty, y = hwy
faceting: facet_null()
-----------------------------------
geom_point: na.rm = FALSE
stat_identity:
position_identity: (width = NULL, height = NULL)


ggplot package
Add other data mappings
p <- ggplot(data=mpg,mapping=aes(x=cty,y=hwy,colour=factor(year)))
p <- p + geom_point()
print(p)


ggplot package
Add a statistical transformation such as a smooth
p <- ggplot(data=mpg,mapping=aes(x=cty,y=hwy,colour=factor(year)))
p <- p + geom_smooth()
print(p)


ggplot package
Add points and smooth lines on the plot layer
p <- ggplot(data=mpg,mapping=aes(x=cty,y=hwy)) +
geom_point(aes(colour=factor(year))) +
geom_smooth()


ggplot package
Scale control
p <- ggplot(data=mpg,mapping=aes(x=cty,y=hwy)) +
geom_point(aes(colour=factor(year))) +
geom_smooth() +
scale_color_manual(values=c('blue2','red4'))


ggplot package
Facet control
p <- ggplot(data=mpg,mapping=aes(x=cty,y=hwy)) +
geom_point(aes(colour=factor(year))) +
geom_smooth() +
scale_color_manual(values=c('blue2','red4')) +
facet_wrap(~ year,ncol=1)


ggplot package
Polishing your plots for publication
p <- ggplot(data=mpg, mapping=aes(x=cty,y=hwy)) +
geom_point(aes(colour=class,size=displ),
alpha=0.5,position = "jitter") +
geom_smooth() +
scale_size_continuous(range = c(4, 10)) +
facet_wrap(~ year,ncol=1) +
ggtitle('Vehicle model and fuel consumption') + # opts() is no longer available in current ggplot2
labs(y='Highway miles per gallon',
x='Urban miles per gallon',
size='Displacement',
colour = 'Model')


ggplot exercise I
Change the coordinate system, for example with coord_flip(), coord_polar(), or coord_cartesian().
p <- ggplot(data=mpg, mapping=aes(x=cty,y=hwy)) +
geom_point(aes(colour=factor(year),size=displ), alpha=0.5,position = "jitter")+
stat_smooth()+
scale_color_manual(values =c('steelblue','red4'))+
scale_size_continuous(range = c(4, 10))
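As a starting point for the exercise, the sketch below applies each of the three coordinate functions to a simplified version of the plot. The plot is deliberately reduced (no size or colour mappings), and the zoom limits passed to `coord_cartesian()` are arbitrary values chosen for illustration, not part of the original exercise.

```r
library(ggplot2)

# Simplified version of the exercise plot (fewer aesthetics, same data).
base_plot <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point(alpha = 0.5, position = "jitter")

# Swap the x and y axes.
p_flip <- base_plot + coord_flip()

# Interpret x as an angle and y as a radius.
p_polar <- base_plot + coord_polar()

# Zoom in on a window of the plot; the limits here are illustrative only.
p_zoom <- base_plot + coord_cartesian(xlim = c(10, 20), ylim = c(15, 30))

print(p_flip)
print(p_polar)
print(p_zoom)
```

Unlike setting limits on the scales, `coord_cartesian()` only changes the visible window, so layers that compute statistics still use all of the data.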


The properties of a single variable


Histogram
library(ggplot2)
p <- ggplot(data=iris,aes(x=Sepal.Length))+
geom_histogram()
print(p)


Histogram
We can customize the histogram as follows:
p <- ggplot(iris,aes(x=Sepal.Length))+
geom_histogram(binwidth=0.1, # Set the bin width
fill='skyblue', # Set the fill color
colour='black') # Set the border color


Histograms plus density curve
The main role of a histogram is to show counts by bin and the shape of the distribution, and the
distribution of a sample plays an important part in traditional statistics. Another way to show
the distribution of the data is the kernel density estimate: a smooth density curve estimated
from the data. We can display the histogram and the density curve at the same time.
p <- ggplot(iris,aes(x=Sepal.Length)) +
geom_histogram(aes(y=..density..),
fill='skyblue',
color='black') +
geom_density(color='black',
linetype=2,adjust=2)
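Mapping `y` to the density is what lets the two layers share one axis: on the density scale the histogram bars have a total area of 1, just like the density curve. The base R sketch below checks this numerically for the same variable. It uses `hist()` and `density()` rather than ggplot2, so the default binning differs from the plot above.

```r
x <- iris$Sepal.Length

# hist() with plot = FALSE returns the binning without drawing anything.
h <- hist(x, plot = FALSE)

# On the density scale, bar height times bin width sums to 1 ...
hist_area <- sum(h$density * diff(h$breaks))

# ... and the kernel density estimate also integrates to roughly 1
# (simple Riemann sum over the evenly spaced evaluation grid).
d <- density(x)
kde_area <- sum(d$y) * (d$x[2] - d$x[1])

print(hist_area)
print(kde_area)

# Both layers therefore fit on the same y-axis.
hist(x, freq = FALSE, col = "skyblue", border = "black", main = "")
lines(d, lty = 2)
```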


Density curve
Similar to a window width (bandwidth) parameter, the adjust parameter controls the appearance of
the density curve. We try different values to draw multiple density curves. The smaller the value,
the more volatile and sensitive the curve.
p <- ggplot(iris,aes(x=Sepal.Length)) +
geom_histogram(aes(y=..density..), # Note: put y on the density scale
fill='gray60',
color='gray') +
geom_density(color='black',linetype=1,adjust=0.5) +
geom_density(color='black',linetype=2,adjust=1) +
geom_density(color='black',linetype=3,adjust=2)
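The effect of `adjust` can be seen directly in base R, where `density()` takes the same argument: it multiplies the default bandwidth. A minimal sketch using the three values from the plot above:

```r
x <- iris$Sepal.Length

# bw is the bandwidth actually used; adjust multiplies the default bandwidth.
bw_default <- density(x)$bw
bw_half    <- density(x, adjust = 0.5)$bw
bw_double  <- density(x, adjust = 2)$bw

print(c(half = bw_half, default = bw_default, double = bw_double))

# Overlay the three curves: the smallest bandwidth gives the wiggliest line.
plot(density(x, adjust = 0.5), lty = 1, main = "")
lines(density(x, adjust = 1), lty = 2)
lines(density(x, adjust = 2), lty = 3)
```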


Density curve
Density curves are also convenient for comparing groups. For example, to compare the
Sepal.Length distributions of the three iris species:
p <- ggplot(iris,aes(x=Sepal.Length,fill=Species)) + geom_density(alpha=0.5,color='gray')
print(p)


Boxplot
In addition to histograms and density curves, we can also use boxplots to show the distribution of
one-dimensional data. Boxplots are also convenient for comparing groups.
p <- ggplot(iris,aes(x=Species,y=Sepal.Length,fill=Species)) + geom_boxplot()
print(p)
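A boxplot is essentially a drawing of a five-number summary per group. The base R sketch below computes those numbers for each species with `fivenum()`. Note that the whiskers of a boxplot stop at the most extreme points within 1.5 times the box length, so they coincide with the minimum and maximum only when there are no outliers.

```r
# fivenum() returns min, lower hinge, median, upper hinge, max.
five <- sapply(split(iris$Sepal.Length, iris$Species), fivenum)
rownames(five) <- c("min", "lower hinge", "median", "upper hinge", "max")
print(five)

# Base R equivalent of the grouped boxplot.
boxplot(Sepal.Length ~ Species, data = iris)
```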


Violin plot
A violin plot contains more information than a boxplot about the (sub-)distributions of the data:
p <- ggplot(iris,aes(x=Species,y=Sepal.Length,fill=Species)) + geom_violin()
print(p)


Violin plot plus points
p <- ggplot(iris,aes(x=Species,y=Sepal.Length,
fill=Species)) +
geom_violin(fill='gray',alpha=0.5) +
geom_dotplot(binaxis = "y", stackdir = "center")
print(p)


Displaying compositions


Bar chart
The number of vehicles of each class in the mpg dataset
p <- ggplot(mpg,aes(x=class)) +
geom_bar()
print(p)


Stacked bar chart
The number of vehicles of each class in the mpg dataset, with each bar split by year
mpg$year <- factor(mpg$year)
p <- ggplot(mpg,aes(x=class,fill=year)) +
geom_bar(color='black')


Stacked bar chart
Setting position=position_dodge() places the bars for each year side by side instead of stacking them.
p <- ggplot(mpg,aes(x=class,fill=year)) +
geom_bar(color='black',
position=position_dodge())


Pie chart
p <- ggplot(mpg, aes(x = factor(1), fill = factor(class))) +
geom_bar(width = 1)+
coord_polar(theta = "y")


Rose diagram
The wind rose, a chart commonly used by meteorologists, describes the distribution of wind speed
and direction at a specific place.

set.seed(1)
# Randomly generate 100 wind directions, and divide them into 16 intervals.
dir <- cut_interval(runif(100,0,360),n=16)
# Randomly generate 100 wind speeds, and divide them into 4 intensities.
mag <- cut_interval(rgamma(100,15),4)
sample <- data.frame(dir=dir,mag=mag)
# Map wind direction to the x-axis, frequency to the y-axis, and speed to the fill color;
# coord_polar() then switches to polar coordinates.
p <- ggplot(sample,aes(x=dir,fill=mag)) +
geom_bar()+ coord_polar()


Mosaic Plot
Divide the data according to different variables, and then use rectangles of different sizes to
represent different groups of data. Let's look at the gender breakdown of survivors:
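The code for this slide did not survive, so the following is only a sketch of one way to draw such a plot. It assumes the data are the built-in `Titanic` table (a guess based on the mention of survivors) and uses base R's `mosaicplot()`, which may not be the function the original slide used.

```r
# Assumption: the built-in Titanic table (Class x Sex x Age x Survived)
# is the data behind this slide; the original code is not shown.
sex_by_survival <- apply(Titanic, c(2, 4), sum)  # collapse Class and Age
print(sex_by_survival)

# Tile areas are proportional to the cell counts.
mosaicplot(sex_by_survival, main = "Survival by sex", color = TRUE)
```

The width of each column reflects how many passengers were male or female, and the split within a column shows the share that survived.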


The proportion structure of continuous data
data <- read.csv('data/soft_impact.csv',T)
library(reshape2)
data.melt <- melt(data,id='Year')
p <- ggplot(data.melt,aes(x=Year,y=value,
group=variable,fill=variable)) +
geom_area(color='black',size=0.3,
position=position_fill()) +
scale_fill_brewer()


The relationship between variables


Scatter diagram
Show the relationship between two variables with a scatter diagram.
p <- ggplot(data=mpg,aes(x=cty,y=hwy)) +
geom_point()
print(p)


Scatter plot of multidimensional data
mpg$year <- factor(mpg$year)
p <- ggplot(data=mpg,aes(x=cty,y=hwy)) + geom_point(aes(color=year))
print(p)


Scatter plot of multidimensional data
Represent different years with different shapes
mpg$year <- factor(mpg$year)
p <- ggplot(data=mpg,aes(x=cty,y=hwy)) + geom_point(aes(color=year,shape=year))
print(p)


Scatter plot of multidimensional data
With large data sets, the points in a scatter plot may obscure each other due to overplotting. We can
add a small random disturbance (jitter) to the points to reduce this problem.
p <- ggplot(data=mpg,aes(x=cty,y=hwy)) + geom_point(aes(color=year),alpha=0.5,position = "jitter")
print(p)
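What jittering does can be shown in base R with `jitter()`, which adds a small amount of uniform noise to each value. The data below are made up for illustration: 100 points that fall on only 25 distinct grid positions.

```r
set.seed(42)  # make the random noise reproducible

# Integer-valued data: many observations share the same coordinates.
x <- rep(1:5, each = 20)
y <- rep(1:5, times = 20)
distinct_before <- nrow(unique(data.frame(x, y)))

# jitter() adds uniform noise in [-amount, amount] to each value.
xj <- jitter(x, amount = 0.2)
yj <- jitter(y, amount = 0.2)
distinct_after <- nrow(unique(data.frame(xj, yj)))

print(c(before = distinct_before, after = distinct_after))

# Semi-transparent points make the remaining overlap visible.
plot(xj, yj, col = rgb(0, 0, 1, alpha = 0.5), pch = 16)
```

The noise is small relative to the spacing of the data, so the overall pattern is preserved while individual points become visible.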


Scatter plot of multidimensional data
To show the trend in the scatter plot, we can add a regression line.
p <- ggplot(data=mpg,aes(x=cty,y=hwy)) +
geom_point(aes(color=year),alpha=0.5,position = "jitter") +
geom_smooth(method='lm')
print(p)
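The line drawn by `geom_smooth(method='lm')` is an ordinary least-squares fit, the same model that `lm()` estimates. The sketch below shows the equivalent steps in base R. It uses the built-in `cars` data from the earlier slides instead of `mpg`, so that it runs without ggplot2.

```r
# Fit stopping distance as a linear function of speed.
fit <- lm(dist ~ speed, data = cars)
print(coef(fit))

# Scatter plot with the fitted line on top.
plot(dist ~ speed, data = cars)
abline(fit, col = "red4")
```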


Scatter plot of multidimensional data
In addition to color, we can also use the size of the points to reflect another variable, such as
engine displacement. Some refer to plots like this as "bubble charts".
p <- ggplot(data=mpg,aes(x=cty,y=hwy)) +
geom_point(aes(color=year,size=displ),alpha=0.5,position = "jitter") +
geom_smooth(method='lm') +
scale_size_continuous(range = c(4, 10))


Scatter plot of multidimensional data
Although we can show all the variables in a picture, we can also split it into multiple pictures to show
the characteristics of different variables. This method is called grouping, conditioning, or faceting.
p <- ggplot(data=mpg,aes(x=cty,y=hwy)) +
geom_point(aes(colour=class,size=displ),
alpha=0.5,position = "jitter") +
geom_smooth() +
scale_size_continuous(range = c(4, 10)) +
facet_wrap(~ year,ncol=1)


ggplot exercise II
· Make a scatter plot of the diamonds data
· Use transparency and small points; look into the size and alpha options in geom_point()
· Use a binned chart to observe the density of points; look into stat_bin2d()
· Estimate the data density; look into stat_density2d() and use
+ coord_cartesian(xlim=c(0,1.5), ylim=c(0,6000))


Scatter plot of multidimensional data
The typical scatter plot is to show a relationship between two variables. When you want to look at
many bivariate relationships at once, you can use a scatter plot matrix.
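The code for this slide is not preserved. As one possible sketch, base R's `pairs()` draws a scatter plot matrix. The choice of the `iris` measurements here is an assumption made for illustration.

```r
# Numeric columns of iris: four variables give a 4 x 4 grid of panels.
measurements <- iris[, 1:4]
pairs(measurements, col = as.integer(iris$Species))

# The matching correlation matrix summarizes the same pairwise relationships.
cor_matrix <- cor(measurements)
print(round(cor_matrix, 2))
```

The lattice function `splom()` shown earlier produces the same kind of display.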


Scatter plot of multidimensional data
When there are many numerical variables, they can be displayed together in one condensed plot.


Change over time


Change over time
For visualization of time series data, the first step is to look at how the variable changes over time.
For example, we'll look at U.S. unemployment data.
# Extract the column as a plain vector so that ifelse() returns one color per bar
fillcolor <- ifelse(economics$unemploy[440:470]<8000,'steelblue','red4')
p <- ggplot(economics[440:470,],aes(x=date,y=unemploy)) +
geom_bar(stat='identity',
fill=fillcolor)


Change over time
For a short time series, a bar chart works well, and positive and negative values can be shown
in different colors. For a long time series the bars become crowded, so thin lines and points
can be used in place of the bars.

p <- ggplot(economics[300:470,],aes(x=date,ymax=psavert,ymin=0)) +
geom_linerange(color='grey20',size=0.5) +
geom_point(aes(y=psavert),color='red4') +
theme_bw()


Change over time
When the data are denser, we can use a line graph or an area chart to show the trend.
Important time points or intervals can also be marked on the time series graph, for example
highlighting the 1980s as a key period.
fill.color <- ifelse(economics$date > '1980-01-01' &
economics$date < '1990-01-01',
'steelblue','red4')
p <- ggplot(economics,aes(x=date,ymax=psavert,ymin=0)) +
geom_linerange(color=fill.color,size=0.9) +
annotate('text',x=as.Date("1985-01-01"),y=13,label="1980s") + # draws a single label rather than one per row
theme_bw()


Geographic information visualization


Map
Two ways to draw a map
· Download geographic boundary data, draw the boundaries, and then mark areas and locations
as needed
· Download a bitmap from Google Maps, and then mark location and path information on top
of it


Map
world map
library(ggplot2)
world <- map_data("world")
worldmap <- ggplot(world, aes(x=long, y=lat, group=group)) +
geom_path(color='gray10',size=0.3) +
geom_point(x=114,y=30,size=10,shape='*') +
scale_y_continuous(breaks=(-2:2) * 30) +
scale_x_continuous(breaks=(-4:4) * 45) +
coord_map("ortho", orientation=c(30, 120, 0)) +
theme(panel.grid.major = element_line(colour = "gray50"),
panel.background = element_rect(fill = "white"),
axis.text=element_blank(),
axis.ticks=element_blank(),
axis.title=element_blank())


map of the U.S.
map <- map_data('state')
arrests <- USArrests
names(arrests) <- tolower(names(arrests))
arrests$region <- tolower(rownames(USArrests))
usmap <- ggplot(data=arrests) +
geom_map(map =map,aes(map_id = region,fill = murder),color='gray40' ) +
expand_limits(x = map$long, y = map$lat) +
scale_fill_continuous(high='red2',low='white') +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.background = element_blank(),
axis.text=element_blank(),
axis.ticks=element_blank(),
axis.title=element_blank(),
legend.position = c(0.95,0.28),
legend.background=element_rect(fill="white", colour="white")) +
coord_map('mercator')


Drawing a map of China based on a bitmap
Another way to draw a map of China is to download a bitmap from Google or OpenStreetMap and
then overlay point and line elements on it with ggplot2. The downloaded file is just a simple
bitmap and does not include latitude and longitude information, which makes for fast
mapping.
library(ggmap)
library(XML)
webpage <-'http://data.earthquake.cn/datashare/globeEarthquake_csn.html'
tables <- readHTMLTable(webpage,stringsAsFactors = FALSE)
raw <- tables[[6]]
data <- raw[,c(1,3,4)]
names(data) <- c('date','lan','lon')
data$lan <- as.numeric(data$lan)
data$lon <- as.numeric(data$lon)
data$date <- as.Date(data$date, "%Y-%m-%d")
# Read the map from Google with the ggmap package, and mark the earthquake data on it.
earthquake <- ggmap(get_googlemap(center = 'china', zoom=4,maptype='terrain'),extent='device') +
geom_point(data=data,aes(x=lon,y=lan),colour = 'red',alpha=0.7)+
theme(legend.position = "none")


R and interactive visualization
googleVis is an R package that provides an interface between R and the Google Visualization API.
It allows the user to use the Google Visualization API for data visualization without the need to upload data.
We want to compare the development trajectories of a group of 20 countries over the past several years.
To obtain the data, we selected three variables from the World Bank database, which reflect the
change in GDP, CO2 emissions, and life expectancy from 2001 to 2009.
library(googleVis)
library(WDI)
DF <- WDI(country=c("CN","RU","BR","ZA","IN",'DE','AU','CA','FR','IT','JP','MX','GB','US'
M <- gvisMotionChart(DF, idvar="country", timevar="year",
xvar='EN.ATM.CO2E.KT',
yvar='NY.GDP.MKTP.CD')
plot(M)


Case study and exercise


Exercise III: Analyzing NBA data
· Calculate the winning rate for each season, and draw a bar chart
· Calculate the winning rate for each season at home and on the road, and draw a bar chart
· Using the home team's scores by season, draw a set of four histograms
· Using the home team's scores by season, draw boxplots for the five seasons
· Draw boxplots of the scores in all games for the home team and the opposing team
· Calculate the average and the winning percentage against each opponent, and make a scatterplot
to find the strong and the weak teams.

98 of 98

2/4/14, 7:31 AM

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (11)

Data visualization using the grammar of graphics
Data visualization using the grammar of graphicsData visualization using the grammar of graphics
Data visualization using the grammar of graphics
 
QMC: Undergraduate Workshop, Tutorial on 'R' Software - Yawen Guan, Feb 26, 2...
QMC: Undergraduate Workshop, Tutorial on 'R' Software - Yawen Guan, Feb 26, 2...QMC: Undergraduate Workshop, Tutorial on 'R' Software - Yawen Guan, Feb 26, 2...
QMC: Undergraduate Workshop, Tutorial on 'R' Software - Yawen Guan, Feb 26, 2...
 
Data flow vs. procedural programming: How to put your algorithms into Flink
Data flow vs. procedural programming: How to put your algorithms into FlinkData flow vs. procedural programming: How to put your algorithms into Flink
Data flow vs. procedural programming: How to put your algorithms into Flink
 
Maps
MapsMaps
Maps
 
R studio
R studio R studio
R studio
 
Gate-Cs 2010
Gate-Cs 2010Gate-Cs 2010
Gate-Cs 2010
 
DocEng2010 Bilauca Healy - A New Model for Automated Table Layout
DocEng2010 Bilauca Healy - A New Model for Automated Table LayoutDocEng2010 Bilauca Healy - A New Model for Automated Table Layout
DocEng2010 Bilauca Healy - A New Model for Automated Table Layout
 
Report
ReportReport
Report
 
Scaling up data science applications
Scaling up data science applicationsScaling up data science applications
Scaling up data science applications
 
Ijetr042170
Ijetr042170Ijetr042170
Ijetr042170
 
Gis (model questions)777
Gis (model questions)777Gis (model questions)777
Gis (model questions)777
 

Ähnlich wie Data visualization

Ähnlich wie Data visualization (20)

Introduction to R Graphics with ggplot2
Introduction to R Graphics with ggplot2Introduction to R Graphics with ggplot2
Introduction to R Graphics with ggplot2
 
Generalized Notions of Data Depth
Generalized Notions of Data DepthGeneralized Notions of Data Depth
Generalized Notions of Data Depth
 
Scalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduceScalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduce
 
RDataMining slides-regression-classification
RDataMining slides-regression-classificationRDataMining slides-regression-classification
RDataMining slides-regression-classification
 
EDBT 2015: Summer School Overview
EDBT 2015: Summer School OverviewEDBT 2015: Summer School Overview
EDBT 2015: Summer School Overview
 
Formations & Deformations of Social Network Graphs
Formations & Deformations of Social Network GraphsFormations & Deformations of Social Network Graphs
Formations & Deformations of Social Network Graphs
 
R visualization: ggplot2, googlevis, plotly, igraph Overview
R visualization: ggplot2, googlevis, plotly, igraph OverviewR visualization: ggplot2, googlevis, plotly, igraph Overview
R visualization: ggplot2, googlevis, plotly, igraph Overview
 
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace DataMPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
 
Carpita metulini 111220_dssr_bari_version2
Carpita metulini 111220_dssr_bari_version2Carpita metulini 111220_dssr_bari_version2
Carpita metulini 111220_dssr_bari_version2
 
Grouping & Summarizing Data in R
Grouping & Summarizing Data in RGrouping & Summarizing Data in R
Grouping & Summarizing Data in R
 
CLIM: Transition Workshop - A Notional Framework for a Theory of Data Systems...
CLIM: Transition Workshop - A Notional Framework for a Theory of Data Systems...CLIM: Transition Workshop - A Notional Framework for a Theory of Data Systems...
CLIM: Transition Workshop - A Notional Framework for a Theory of Data Systems...
 
Exploratory Analysis Part1 Coursera DataScience Specialisation
Exploratory Analysis Part1 Coursera DataScience SpecialisationExploratory Analysis Part1 Coursera DataScience Specialisation
Exploratory Analysis Part1 Coursera DataScience Specialisation
 
2019 GDRR: Blockchain Data Analytics - Dissecting Blockchain Price Analytics...
2019 GDRR: Blockchain Data Analytics  - Dissecting Blockchain Price Analytics...2019 GDRR: Blockchain Data Analytics  - Dissecting Blockchain Price Analytics...
2019 GDRR: Blockchain Data Analytics - Dissecting Blockchain Price Analytics...
 
Two mark qn answer
Two mark qn answerTwo mark qn answer
Two mark qn answer
 
Tech talk ggplot2
Tech talk   ggplot2Tech talk   ggplot2
Tech talk ggplot2
 
DATA VISUALIZATION WITH R PACKAGES
DATA VISUALIZATION WITH R PACKAGESDATA VISUALIZATION WITH R PACKAGES
DATA VISUALIZATION WITH R PACKAGES
 
Presentation1.1
Presentation1.1Presentation1.1
Presentation1.1
 
Vivarana literature survey
Vivarana literature surveyVivarana literature survey
Vivarana literature survey
 
Fundamentals_of_GIS_Estoque.pdf
Fundamentals_of_GIS_Estoque.pdfFundamentals_of_GIS_Estoque.pdf
Fundamentals_of_GIS_Estoque.pdf
 
Informatics Practices (new) solution CBSE 2021, Compartment, improvement ex...
Informatics Practices (new) solution CBSE  2021, Compartment,  improvement ex...Informatics Practices (new) solution CBSE  2021, Compartment,  improvement ex...
Informatics Practices (new) solution CBSE 2021, Compartment, improvement ex...
 

Mehr von Vivian S. Zhang

Wikipedia: Tuned Predictions on Big Data
Wikipedia: Tuned Predictions on Big DataWikipedia: Tuned Predictions on Big Data
Wikipedia: Tuned Predictions on Big Data
Vivian S. Zhang
 
Streaming Python on Hadoop
Streaming Python on HadoopStreaming Python on Hadoop
Streaming Python on Hadoop
Vivian S. Zhang
 
Max Kuhn's talk on R machine learning
Max Kuhn's talk on R machine learningMax Kuhn's talk on R machine learning
Max Kuhn's talk on R machine learning
Vivian S. Zhang
 
Introducing natural language processing(NLP) with r
Introducing natural language processing(NLP) with rIntroducing natural language processing(NLP) with r
Introducing natural language processing(NLP) with r
Vivian S. Zhang
 

Data visualization