This document provides an overview of basic data ingestion techniques in R, including reading data from flat files, relational databases, and other sources using functions like read.table, read.csv, read.delim, and scan. It also discusses other methods like readLines, sqldf, and bigmemory, as well as tricks for handling column classes, names, and missing values.
Basic data ingestion in R
1. Basic Data Ingestion in R
Denver RUG 11/16/10
@jrideout
Software Engineer & Data Monkey
@ReturnPath
2. Where is the data?
• Flat-file (text/binary)
• Relational Database
• Where is … (from Google suggestions)
– chuck norris
– the love
– my mind
– the love lyrics (apparently a song by Black Eyed Peas)
7. Some tricks
• comment.char=""
• Use colClasses or as.is for read.table
– stringsAsFactors
• colnames(data) <- c('newName', 'other')
• na.strings = "."
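The tricks above can be combined in a single read.table call. The following is a minimal sketch, not the presenter's own code: the inline text stands in for a flat file, and the column names and values are made up for illustration.

```r
# Hypothetical inline data standing in for a flat file on disk.
raw <- "id,score,label
1,3.5,a
2,.,b
3,4.0,#c"

df <- read.table(
  text = raw,
  sep = ",",
  header = TRUE,
  comment.char = "",        # disable comment handling so '#' is kept as data
  na.strings = ".",         # a lone "." marks a missing value
  colClasses = c("integer", "numeric", "character"),
  stringsAsFactors = FALSE  # redundant with colClasses here, shown for completeness
)

# Rename the columns after reading.
colnames(df) <- c("id", "score", "tag")

str(df)
```

Declaring colClasses up front means read.table does not have to guess the type of each column, and the "." in the score column becomes NA instead of forcing the whole column to character.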
8. Working with the DF
• attach(df); fieldname
• df[[index]]
• df$fieldname
• plyr/reshape packages
• name abbreviation
• as.*, matrix, data.matrix
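A short sketch of the access styles listed above, using a made-up data frame (the names df, fieldname, and other are illustrative only). It assumes "name abbreviation" refers to the partial matching that `$` performs on data frame column names. plyr and reshape are left out because they require extra packages.

```r
# Hypothetical data frame for illustration.
df <- data.frame(
  fieldname = c(10, 20, 30),
  other = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

by_index  <- df[[1]]        # column by position
by_name   <- df$fieldname   # column by full name
by_abbrev <- df$field       # `$` partially matches an unambiguous prefix

# attach() puts the columns on the search path; detach when done
# so they do not shadow other variables later.
attach(df)
via_attach <- fieldname
detach(df)

# data.matrix() turns the whole data frame into a numeric matrix;
# character columns are converted to factors and then to integer codes.
m <- data.matrix(df)

# as.* functions convert a single column explicitly.
as_char <- as.character(df$fieldname)
```

All four access styles return the same numeric vector here. Partial matching is convenient interactively but fragile in scripts, since adding a second column with the same prefix makes the abbreviation ambiguous.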
9. Type coercion
• Check types with str(), typeof()
• attributes()
• logical < integer < double < complex
• It’s better to get the read.* methods right than coerce later.
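The coercion order and the last point can be illustrated with a small sketch. The zip-code data below is invented for the example; it shows one case where coercing after the read cannot recover what the read discarded.

```r
# Combining types promotes to the highest type in the ordering
# logical < integer < double < complex.
x <- c(TRUE, 2L)      # logical + integer
y <- c(x, 1.5)        # integer + double
z <- c(y, 1+2i)       # double + complex
typeof(x)  # "integer"
typeof(y)  # "double"
typeof(z)  # "complex"

# Hypothetical CSV where one column has meaningful leading zeros.
csv <- "id,zip\n1,02134\n2,80202"

# Letting read.csv guess: zip is read as integer and the leading zero is lost.
guessed <- read.csv(text = csv)
as.character(guessed$zip[1])  # "2134"

# Declaring the type at read time keeps the original text.
fixed <- read.csv(text = csv, colClasses = c("integer", "character"))
fixed$zip[1]  # "02134"

# Inspect the result.
str(fixed)
names(attributes(fixed))
```

Once the column has been read as an integer, as.character() can only convert the number that remains, which is why setting colClasses in the read call is the safer choice.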