R-Excel Integration
R – Excel Integration using XLConnect package
Example of writing R data frames and graphs to Excel templates
Create and save an Excel template on the Desktop:
• Two predefined cell styles called header (Font - Calibri (Body), Font Style - Bold Italic, Size - 11, Fill - Blue) and number (Number, Comma 1000 Separator, Decimal Places - 3).
• Sheet2 – Table with range Sheet2!B2:C3 formatted as Number, Comma 1000 Separator and Decimal Places - 2
• Named Region graph - Sheet2!$A$5
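The template above is built by hand in Excel. As a rough alternative, most of it can probably be set up from R with XLConnect itself. The sketch below is only an approximation: the file location (a temporary directory), the style name number2 and the fill colour constant are assumptions, and the font settings (Calibri, Bold Italic, Size 11) are not covered.

```r
library(XLConnect)

# Hypothetical location; the slides keep the template on the Desktop instead
template_path <- file.path(tempdir(), "template.xlsx")
if (file.exists(template_path)) file.remove(template_path)

# create = TRUE starts a new workbook when the file does not exist yet
wb <- loadWorkbook(template_path, create = TRUE)
createSheet(wb, name = c("Sheet1", "Sheet2"))

# Named style "header": solid blue fill (font settings are not set here)
header <- createCellStyle(wb, name = "header")
setFillPattern(header, fill = XLC$FILL.SOLID_FOREGROUND)
setFillForegroundColor(header, color = XLC$COLOR.BLUE)

# Named style "number": thousands separator, 3 decimal places
number <- createCellStyle(wb, name = "number")
setDataFormat(number, format = "#,##0.000")

# Two-decimal style (the name "number2" is an assumption) for Sheet2!B2:C3
number2 <- createCellStyle(wb, name = "number2")
setDataFormat(number2, format = "#,##0.00")
cells <- expand.grid(row = 2:3, col = 2:3)
setCellStyle(wb, sheet = "Sheet2", row = cells$row, col = cells$col,
             cellstyle = number2)

# Named region "graph" pointing at Sheet2!$A$5
createName(wb, name = "graph", formula = "Sheet2!$A$5", overwrite = TRUE)

saveWorkbook(wb)
```

A template produced this way should be loadable with loadWorkbook() in the same way as a hand-made one.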
R code (requires packages XLConnect, plyr and ggplot2):
#Import XLConnect library
library(XLConnect)
#Define working directory
setwd("C:/Users/PC/Desktop")
#Simulate some data frame
set.seed(10)
DF <- data.frame(Gender=sample(c("female", "male"), 100, replace=TRUE),
Consumption=runif(100, 1000, 2000), stringsAsFactors=FALSE)
#Load Excel template file named template.xlsx
wb <- loadWorkbook("C:/Users/PC/Desktop/template.xlsx")
#Import predefined cell styles
header <- getCellStyle(wb, name = "header")
number <- getCellStyle(wb, name="number")
#Write data frame to Sheet1
writeWorksheet(wb, DF, sheet="Sheet1", startRow=1, startCol=1)
#Apply predefined cell styles
setCellStyle(wb, sheet="Sheet1", row=rep(1,ncol(DF)), col=rep(c(1:2), 1),
cellstyle=header)
setCellStyle(wb, sheet="Sheet1", row=2:(nrow(DF)+1), col=rep(c(1:2),
each=nrow(DF)),
cellstyle=number)
#Keep predefined cell style from Sheet2
setStyleAction(wb, XLC$"STYLE_ACTION.PREDEFINED")
#Summarize data by gender using plyr package
library(plyr)
DF1 <- ddply(DF, .(Gender), summarise,
No_Respondents=length(Gender),
Total_Consumption=sum(Consumption))
#Write data frame to Sheet2
writeWorksheet(wb, sheet="Sheet2", DF1[,-1], header=FALSE, startRow=2,
startCol=2)
#Create graph using ggplot2 library and save it as png file
png("graph1.png")
library(ggplot2)
#stat="identity" uses the precomputed counts in No_Respondents as bar heights
ggplot(DF1, aes(Gender, No_Respondents, fill=Gender)) +
geom_bar(stat="identity", position="dodge") +
scale_fill_manual(values=c("darkgray", "lightblue"))
dev.off()
#Add graph (png) to Excel file
#Named region graph on Sheet2
addImage(wb, filename = "graph1.png", name = "graph",
originalSize = TRUE)
#Delete png file from working directory
file.remove("graph1.png")
#Save changes
saveWorkbook(wb)
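As a cross-check on the ddply() summary written to Sheet2, the same per-gender figures can be reproduced in base R. This is a minimal sketch that assumes the same simulated data; the name DF1_base is introduced here for illustration only.

```r
# Same simulated data as in the slides
set.seed(10)
DF <- data.frame(Gender = sample(c("female", "male"), 100, replace = TRUE),
                 Consumption = runif(100, 1000, 2000),
                 stringsAsFactors = FALSE)

# Count and total per gender, mirroring the ddply() call
counts <- aggregate(Consumption ~ Gender, data = DF, FUN = length)
totals <- aggregate(Consumption ~ Gender, data = DF, FUN = sum)

# Both results are ordered by Gender, so the columns line up row by row
DF1_base <- data.frame(Gender = counts$Gender,
                       No_Respondents = counts$Consumption,
                       Total_Consumption = totals$Consumption)
print(DF1_base)
```

The respondent counts should add up to the 100 simulated rows, and the totals to sum(DF$Consumption).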
R session info:
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_0.9.1 plyr_1.7.1 XLConnect_0.1-9 XLConnectJars_0.1-4
[5] rJava_0.9-3
loaded via a namespace (and not attached):
[1] colorspace_1.1-1 dichromat_1.2-4 digest_0.5.2 grid_2.15.0
[5] labeling_0.1 MASS_7.3-17 memoise_0.1 munsell_0.3
[9] proto_0.3-9.2 RColorBrewer_1.0-5 reshape2_1.2.1 scales_0.2.1
[13] stringr_0.6
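Before running the example on another machine, it may help to check that the three packages used above are installed. The sketch below only tests for their presence and does not compare versions with the session info listed here; the variable names are illustrative.

```r
# Packages used by the example
required <- c("XLConnect", "plyr", "ggplot2")

# installed.packages() returns a matrix whose row names are package names
installed <- required %in% rownames(installed.packages())
names(installed) <- required

missing_pkgs <- required[!installed]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

XLConnect also depends on rJava, so a working Java installation is needed in addition to the R packages.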