Los Angeles R users group - Dec 14 2010 - Part 4 updated
1. Excel and R: data exchange (updated)
R-meetup of Los Angeles
Eric Kostello
(Corrections appreciated, especially in summary capability table.)
Sunday, December 19, 2010
2. Topics
• General remarks on spreadsheets
• Overview of options for R and Excel to exchange
data
• Example: Pre-formatted template + automated
population
• Possible comprehensive solution: package xlsx
• but only for .xlsx file formats currently
• Sorry: nothing on non-Excel spreadsheets
3. On spreadsheets
• Power of spreadsheets: you can do “anything”
• Unfortunately, anything can happen
• Spreadsheets are ubiquitous
• Very handy for certain types of problems
• Users like the control they give
• When the scope of the spreadsheet part of the data creation/
analysis/production activities is appropriate, they may be useful
• More generally: when the scope of [ specific technology ] is
appropriate to [ the problem ] it may be useful to use it.
4. A few perils (among many)
• Spreadsheets are typically built with little or no enforcement of data
integrity
• Errors can creep into spreadsheets
• Huge challenge for automated interaction
• No solution proposed here. Often the only way to work with data
from such sources is by manual cleaning.
• This is not a talk about why not to use spreadsheets, but check these out...
• http://lib.stat.cmu.edu/S/Spoetry/Tutor/spreadsheet_addiction.html
• Encyclopedia of the Evils, but acknowledges utility when limited in
scope
• “spreadsheet addiction”: search the web with this phrase to see that
problems with spreadsheets are not confined to data analysis
5. Living with spreadsheets
• R users often must exchange data with spreadsheet
users
• Data is stored in spreadsheets because...
• That is the way it was archived/sent/obtained
• It is still being created that way and change is
difficult/impossible
• So, communication is essential
• Easier communication may make your day easier and
your exchange more reliable
• With that in mind...
6. Data exchange between R and Excel
Method/package | RW | Details | Pros | Cons | Cross platform
---------------+----+---------+------+------+---------------
Avoid | RW | Import/Export CSV | Avoid some Excel pitfalls | Manual steps required | Yes
RODBC + drivers | R | Adaptation of SQL APIs | Can read rows and columns. Some writing ability on Windows. | Complexity & inconsistencies | With driver purchase (if non-MS OS)
read.xls (gdata, Perl) | R | Automates creation of CSV using Perl, then imports | Reads xls and xlsx | Data frame to sheet only. Trouble with quotes. Perl dependencies nuisance | Yes
write.xls (dataframe2xls, Python) | W | Automates creation of CSVs, then converts | Some formatting ability | Data frame to sheet only. (Coerces to dataframe.) | Yes
WriteXLS (WriteXLS & Perl) | W | Automates creation of CSVs, then converts | Some formatting ability. Multiple sheets/one call | Limited flexibility. Data frame to sheet only | Yes
RDCOMClient | RW | via Windows APIs | Cell level control | Not fully vectorized? | No
xlsReadWrite | RW | Free version and Pro version (shareware) without dependencies | Fast, mature. Pro version can read/write rows/columns, ranges. | .xls format only (.xlsx a future possibility). No formatting. | No
xlsx (rJava & xlsxJars) | RW | Using Java library from Apache | Data frames and smaller. Fine formatting control. xlsx file format. | Slow. xlsx format only. Not fully vectorized. | Yes
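The first row of the table (avoid Excel formats and exchange CSV instead) can be sketched in base R as follows. This is only a minimal illustration: the file name and the temporary directory are assumptions, and the manual step of opening and re-saving the file in a spreadsheet program is not shown.

```r
# Round trip a data frame through CSV, which both R and Excel can read and write.
exampleData <- data.frame(X = 10:19, Y = 656:647)

# Hypothetical file location, chosen so the sketch runs anywhere.
csvFile <- file.path(tempdir(), "exampleData.csv")

# row.names = FALSE keeps R's row labels out of the first spreadsheet column.
write.csv(exampleData, file = csvFile, row.names = FALSE)

# Read the CSV back, as one would after the spreadsheet user saves it as CSV.
# stringsAsFactors = FALSE keeps any text columns as character vectors.
roundTrip <- read.csv(csvFile, stringsAsFactors = FALSE)
```

The trade-off noted in the table still applies: someone has to do the Save As CSV step by hand on the spreadsheet side.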
7. Write to pre-formatted spreadsheet using RDCOMClient
• Hybrid approach to repeated report creation
• Windows-only approach to creating .xls Excel spreadsheets without
programmatic formatting
• Inherit/create formatting and/or formulas in Excel (“by hand”)
• Save a template file to copy and populate for each new report
• Use shell commands to
• Copy the template into a new version with an appropriate name
• Use RDCOMClient functions to...
• Open the copy
• Write to specific cells in the spreadsheet
• Close the copy
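As a possible alternative to the shell command for the copy step, base R's file.copy can do the same job without depending on the Windows command interpreter. The sketch below is an assumption-laden illustration, not the approach used in the talk: the file names are hypothetical, and an empty stand-in file is created so the code runs without a real template.

```r
# Hypothetical template location; a real template would already contain
# the formatting and formulas created by hand in Excel.
templateFile <- file.path(tempdir(), "Example_Template.xls")
if (!file.exists(templateFile)) {
  file.create(templateFile)  # empty stand-in so this sketch can run
}

# Build a dated name for the new report, e.g. Report_for_19_Dec_2010.xls
reportFile <- file.path(
  tempdir(),
  paste0("Report_for_", format(Sys.Date(), "%d_%b_%Y"), ".xls")
)

# Copy the template; file.copy returns TRUE on success.
copied <- file.copy(from = templateFile, to = reportFile, overwrite = TRUE)
if (!copied) {
  stop("Could not copy template to ", reportFile)
}
```

The resulting reportFile path would then be the file to open, populate, and close.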
8. RDCOMClient example
library("RDCOMClient")

# The template would have all necessary formatting and formulas in place
exampleTemplateFilename <- "Example_Template.xls"
newExcelReportInstance <- paste("reportsDirectoryReport_for_", format(Sys.Date(), "%d_%b_%Y"), ".xls", sep = '')

# Copy the template to the new report file with a Windows shell command
copyCommand <- paste("copy", exampleTemplateFilename, newExcelReportInstance)
shell(copyCommand, shell = 'cmd %WINDIR%')
print("Ignore the error message about UNC paths if it occurs; it does not matter.")

exampleData <- data.frame(X = 10:19, Y = 656:647)

.COMInit() # Start server
exl <- COMCreate("Excel.Application") # Hook to Excel
books <- exl[["workbooks"]] # Talk to workbooks
exampleBook <- books$open(newExcelReportInstance) # Talk to book
exampleSheets <- exampleBook[["sheets"]] # Talk to sheets
exampleSheet <- exampleSheets$Item(as.integer(1)) # Talk to a specific sheet

# But I cannot figure out how to get the "Range" to be larger than 1x1,
# so iterate through rows. Does Range only apply to rows[??]
headerRowPadding <- 1 # Allow for this many header rows
for (ithRow in 1:nrow(exampleData)) {
  # Reference to worksheet column A, row ithRow + headerRowPadding
  cellReferenceA <- exampleSheet$Range(paste("A", ithRow + headerRowPadding, sep = ''))
  cellReferenceA[["Value"]] <- exampleData[ithRow, "X"]
  # Reference to worksheet column B, same row
  cellReferenceB <- exampleSheet$Range(paste("B", ithRow + headerRowPadding, sep = ''))
  cellReferenceB[["Value"]] <- exampleData[ithRow, "Y"]
}

exampleBook$save()
exampleBook$close()
9. xlsx package overview
• Philosophy: use Excel interface capabilities built in a more
widely used codebase: Apache POI, the Java API for Microsoft
documents.
• Many capabilities are obtained “for free.”
• Full-featured cross platform solution
• This is a suitable candidate for one stop shopping in R to Excel
communications
• but requiring it may be a problem for some installations
(rJava dependency)
• It is somewhat slow, which is noticeable for larger Excel files
10. xlsx package capabilities
• Easy data frame import/export: read.xlsx and write.xlsx
• write.xlsx ( exampleData, file = "exampleData Workbook.xlsx")
• read.xlsx ( file = ..., sheetIndex = ... )
• One sheet at a time. Can keep formulas, provide colClasses.
• Reads/writes at the cell level (but writing not fully vectorized)
• Formatting control (using Excel native capabilities, such as
borderColor)
• Reads/Writes comments in cells
• Merging regions, freezing panes, set print area, set zoom
• Can insert images (dib, emf, jpeg, pict, png, wmf).
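The data frame import/export bullets above might look like this in practice. It is a minimal sketch that assumes the xlsx package (and the Java runtime it needs through rJava) is installed; the file name is hypothetical, and the code skips the round trip when the package cannot be loaded.

```r
exampleData <- data.frame(X = 10:19, Y = 656:647)

# Hypothetical output location, chosen so the sketch runs anywhere.
xlsxFile <- file.path(tempdir(), "exampleData_Workbook.xlsx")

if (requireNamespace("xlsx", quietly = TRUE)) {
  # Write the data frame to a single sheet, without R's row names.
  xlsx::write.xlsx(exampleData, file = xlsxFile,
                   sheetName = "Sheet1", row.names = FALSE)

  # Read one sheet back; sheets are addressed by index (or by sheetName).
  roundTrip <- xlsx::read.xlsx(xlsxFile, sheetIndex = 1)
} else {
  # Assumption: without xlsx and Java available there is nothing to demonstrate.
  roundTrip <- NULL
  message("Package 'xlsx' is not available; skipping the round trip.")
}
```

Numeric columns typically come back as doubles rather than integers, so comparisons after a round trip are safer on values than on exact types.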