SlideShare ist ein Scribd-Unternehmen logo
1 von 2
Downloaden Sie, um offline zu lesen
Other types of data
Try one of the following packages to import
other types of files
• haven - SPSS, Stata, and SAS files
• readxl - excel files (.xls and .xlsx)
• DBI - databases
• jsonlite - json
• xml2 - XML
• httr - Web APIs
• rvest - HTML (Web Scraping)
write_csv(x, path, na = "NA", append = FALSE,
col_names = !append)
Tibble/df to comma delimited file.
write_delim(x, path, delim = " ", na = "NA",
append = FALSE, col_names = !append)
Tibble/df to file with any delimiter.
write_excel_csv(x, path, na = "NA", append =
FALSE, col_names = !append)
Tibble/df to a CSV for excel
write_file(x, path, append = FALSE)
String to file.
write_lines(x, path, na = "NA", append =
FALSE)
String vector to file, one element per line.
write_rds(x, path, compress = c("none", "gz",
"bz2", "xz"), ...)
Object to RDS file.
write_tsv(x, path, na = "NA", append = FALSE,
col_names = !append)
Tibble/df to tab delimited files.
Write functions
Save x, an R object, to path, a file path, with:
Read functions Parsing data types
Tidy Data
with tidyr Cheat Sheet
R’s tidyverse is built around tidy data stored in
tibbles, an enhanced version of a data frame.
The front side of this sheet shows how
to read text files into R with readr.
The reverse side shows how to create
tibbles with tibble and to layout tidy
data with tidyr.
Data Importwith readr, tibble, and tidyr
Cheat Sheet
## Parsed with column specification:
## cols(
## age = col_integer(),
## sex = col_character(),
## earn = col_double()
## )
read_file(file, locale = default_locale())
Read a file into a single string.
read_file_raw(file)
Read a file into a raw vector.
read_lines(file, skip = 0, n_max = -1L, locale =
default_locale(), na = character(), progress =
interactive())
Read each line into its own string.
Skip lines
read_csv("file.csv",
skip = 1)
Read in a subset
read_csv("file.csv",
n_max = 1)
Missing Values
read_csv("file.csv",
na = c("4", "5", "."))
1. Use problems() to diagnose problems
x <- read_csv("file.csv"); problems(x)
2. Use a col_ function to guide parsing
• col_guess() - the default
• col_character()
• col_double()
• col_euro_double()
• col_datetime(format = "") Also
col_date(format = "") and col_time(format = "")
• col_factor(levels, ordered = FALSE)
• col_integer()
• col_logical()
• col_number()
• col_numeric()
• col_skip()
x <- read_csv("file.csv", col_types = cols(
A = col_double(),
B = col_logical(),
C = col_factor()
))
3. Else, read in as character vectors then parse
with a parse_ function.
• parse_guess(x, na = c("", "NA"), locale =
default_locale())
• parse_character(x, na = c("", "NA"), locale =
default_locale())
• parse_datetime(x, format = "", na = c("", "NA"),
locale = default_locale()) Also parse_date()
and parse_time()
• parse_double(x, na = c("", "NA"), locale =
default_locale())
• parse_factor(x, levels, ordered = FALSE, na =
c("", "NA"), locale = default_locale())
• parse_integer(x, na = c("", "NA"), locale =
default_locale())
• parse_logical(x, na = c("", "NA"), locale =
default_locale())
• parse_number(x, na = c("", "NA"), locale =
default_locale())
x$A <- parse_number(x$A)
read_lines_raw(file, skip = 0, n_max = -1L,
progress = interactive())
Read each line into a raw vector.
read_log(file, col_names = FALSE, col_types =
NULL, skip = 0, n_max = -1, progress =
interactive())
Apache style log files.
Read non-tabular data
Read tabular data to tibbles
read_csv()
Reads comma delimited files.
read_csv("file.csv")
read_csv2()
Reads Semi-colon delimited files.
read_csv2("file2.csv")
read_delim(delim, quote = """, escape_backslash = FALSE,
escape_double = TRUE) Reads files with any delimiter.
read_delim("file.txt", delim = "|")
read_fwf(col_positions)
Reads fixed width files.
read_fwf("file.fwf", col_positions = c(1, 3, 5))
read_tsv()
Reads tab delimited files. Also read_table().
read_tsv("file.tsv")
a,b,c
1,2,3
4,5,NA
a;b;c
1;2;3
4;5;NA
a|b|c
1|2|3
4|5|NA
a b c
1 2 3
4 5 NA
These functions share the common arguments:
read_*(file, col_names = TRUE, col_types = NULL, locale = default_locale(), na = c("", "NA"),
quoted_na = TRUE, comment = "", trim_ws = TRUE, skip = 0, n_max = Inf, guess_max =
min(1000, n_max), progress = interactive())
A B C
1 2 3
A B C
1 2 3
4 5 NA
x y z
A B C
1 2 3
4 5 NA
A B C
1 2 3
NA NA NA
1 2 3
4 5 NA
A B C
1 2 3
4 5 NA
A B C
1 2 3
4 5 NA
A B C
1 2 3
4 5 NA
A B C
1 2 3
4 5 NA
Useful arguments
a,b,c
1,2,3
4,5,NA
Example file
write_csv (path = "file.csv",
x = read_csv("a,b,cn1,2,3n4,5,NA"))
No header
read_csv("file.csv",
col_names = FALSE)
Provide header
read_csv("file.csv",
col_names = c("x", "y", "z"))
readr functions guess the types of each column
and convert types when appropriate (but will
NOT convert strings to factors automatically).
A message shows the type of each column in
the result.
earn is a double (numeric)
sex is a
character
age is an
integer
RStudio® is a trademark of RStudio, Inc. • CC BY RStudio info@rstudio.com • 844-448-1212 • rstudio.com Learn more at browseVignettes(package = c("readr", "tibble", "tidyr")) • readr 1.1.0 • tibble 1.2.12 • tidyr 0.6.0 • Updated: 2017-01
gather(data, key, value, ..., na.rm = FALSE,
convert = FALSE, factor_key = FALSE)
Gather moves column names into a key
column, gathering the column values into a
single value column.
spread(data, key, value, fill = NA, convert = FALSE,
drop = TRUE, sep = NULL)
Spread moves the unique values of a key column
into the column names, spreading the values of a
value column across the new columns that result.
Use gather() and spread() to reorganize the values of a table into a new layout. Each uses the idea of a
key column: value column pair.
gather(table4a, `1999`, `2000`,
key = "year", value = "cases") spread(table2, type, count)
valuekey
country 1999 2000
A 0.7K 2K
B 37K 80K
C 212K 213K
table4a
country year cases
A 1999 0.7K
B 1999 37K
C 1999 212K
A 2000 2K
B 2000 80K
C 2000 213K
valuekey
country year cases pop
A 1999 0.7K 19M
A 2000 2K 20M
B 1999 37K 172M
B 2000 80K 174M
C 1999 212K 1T
C 2000 213K 1T
table2
country year type count
A 1999 cases 0.7K
A 1999 pop 19M
A 2000 cases 2K
A 2000 pop 20M
B 1999 cases 37K
B 1999 pop 172M
B 2000 cases 80K
B 2000 pop 174M
C 1999 cases 212K
C 1999 pop 1T
C 2000 cases 213K
C 2000 pop 1T
Split and Combine Cells
unite(data, col, ..., sep = "_", remove = TRUE)
Collapse cells across several columns to
make a single column.
drop_na(data, ...)
Drop rows containing
NA’s in … columns.
fill(data, ..., .direction = c("down", "up"))
Fill in NA’s in … columns with most
recent non-NA values.
replace_na(data,
replace = list(), ...)
Replace NA’s by column.
Use these functions to split or combine cells into
individual, isolated values.
country year rate
A 1999 0.7K/19M
A 2000 2K/20M
B 1999 37K/172M
B 2000 80K/174M
C 1999 212K/1T
C 2000 213K/1T
country year cases pop
A 1999 0.7K 19M
A 2000 2K 20M
B 1999 37K 172
B 2000 80K 174
C 1999 212K 1T
C 2000 213K 1T
table3
separate(data, col, into, sep = "[^[:alnum:]]+",
remove = TRUE, convert = FALSE,
extra = "warn", fill = "warn", ...)
Separate each cell in a column to make several
columns.
separate_rows(data, ..., sep = "[^[:alnum:].]+",
convert = FALSE)
Separate each cell in a column to make several
rows. Also separate_rows_().
country century year
Afghan 19 99
Afghan 20 0
Brazil 19 99
Brazil 20 0
China 19 99
China 20 0
country year
Afghan 1999
Afghan 2000
Brazil 1999
Brazil 2000
China 1999
China 2000
table5
separate_rows(table3, rate,
into = c("cases", "pop"))
separate_rows(table3, rate)
unite(table5, century, year,
col = "year", sep = "")
Tidy Data with tidyr
x1 x2
A 1
B NA
C NA
D 3
E NA
x1 x2
A 1
D 3
x
x1 x2
A 1
B NA
C NA
D 3
E NA
x1 x2
A 1
B 1
C 1
D 3
E 3
x
x1 x2
A 1
B NA
C NA
D 3
E NA
x1 x2
A 1
B 2
C 2
D 3
E 2
x
drop_na(x, x2) fill(x, x2) replace_na(x,list(x2 = 2), x2)
country year rate
A 1999 0.7K
A 1999 19M
A 2000 2K
A 2000 20M
B 1999 37K
B 1999 172M
B 2000 80K
B 2000 174M
C 1999 212K
C 1999 1T
C 2000 213K
C 2000 1T
table3
country year rate
A 1999 0.7K/19M
A 2000 2K/20M
B 1999 37K/172M
B 2000 80K/174M
C 1999 212K/1T
C 2000 213K/1T
• Control the default appearance with options:
options(tibble.print_max = n,
tibble.print_min = m, tibble.width = Inf)
• View entire data set with View(x, title) or
glimpse(x, width = NULL, …)
• Revert to data frame with as.data.frame()
(required for some older packages)
Tibbles - an enhanced data frame
Handle Missing Values
Reshape Data - change the layout of values in a table
Tidy data is a way to organize tabular data. It provides a consistent data structure across packages.
CBA
A * B -> C
*A B C
Each observation, or
case, is in its own row
A B C
Each variable is in
its own column
A B C
&
Learn more at browseVignettes(package = c("readr", "tibble", "tidyr")) • readr 1.1.0 • tibble 1.2.12 • tidyr 0.6.0 • Updated: 2017-01
The tibble package provides a new S3 class for
storing tabular data, the tibble. Tibbles inherit the
data frame class, but improve two behaviors:
• Display - When you print a tibble, R provides a
concise view of the data that fits on one screen.
• Subsetting - [ always returns a new tibble,
[[ and $ always return a vector.
• No partial matching - You must use full
column names when subsetting
as_tibble(x, …) Convert data frame to tibble.
enframe(x, name = "name", value = "value")
Converts named vector to a tibble with a
names column and a values column.
is_tibble(x) Test whether x is a tibble.
data frame display
tibble display
Construct a tibble in two ways
tibble(…)
Construct by columns.
tibble(x = 1:3,
y = c("a", "b", "c"))
tribble(…)
Construct by rows.
tribble(
~x, ~y,
1, "a",
2, "b",
3, "c")
A tibble: 3 × 2
x y
<int> <dbl>
1 1 a
2 2 b
3 3 c
Both make
this tibble
ww
# A tibble: 234 × 6
manufacturer model displ
<chr> <chr> <dbl>
1 audi a4 1.8
2 audi a4 1.8
3 audi a4 2.0
4 audi a4 2.0
5 audi a4 2.8
6 audi a4 2.8
7 audi a4 3.1
8 audi a4 quattro 1.8
9 audi a4 quattro 1.8
10 audi a4 quattro 2.0
# ... with 224 more rows, and 3
# more variables: year <int>,
# cyl <int>, trans <chr>
156 1999 6 auto(l4)
157 1999 6 auto(l4)
158 2008 6 auto(l4)
159 2008 8 auto(s4)
160 1999 4 manual(m5)
161 1999 4 auto(l4)
162 2008 4 manual(m5)
163 2008 4 manual(m5)
164 2008 4 auto(l4)
165 2008 4 auto(l4)
166 1999 4 auto(l4)
[ reached getOption("max.print")
-- omitted 68 rows ]A large table
to display
RStudio® is a trademark of RStudio, Inc. • CC BY RStudio info@rstudio.com • 844-448-1212 • rstudio.com
A table is tidy if: Tidy data:
Makes variables easy
to access as vectors
Preserves cases during
vectorized operations
Expand Tables - quickly create tables with combinations of values
complete(data, ..., fill = list())
Adds to the data missing combinations of the
values of the variables listed in …
complete(mtcars, cyl, gear, carb)
expand(data, ...)
Create new tibble with all possible combinations
of the values of the variables listed in …
expand(mtcars, cyl, gear, carb)

Weitere ähnliche Inhalte

Was ist angesagt?

Introduction to Monads in Scala (2)
Introduction to Monads in Scala (2)Introduction to Monads in Scala (2)
Introduction to Monads in Scala (2)
stasimus
 
R short-refcard
R short-refcardR short-refcard
R short-refcard
conline
 

Was ist angesagt? (19)

Stata cheat sheet analysis
Stata cheat sheet analysisStata cheat sheet analysis
Stata cheat sheet analysis
 
Scala. Introduction to FP. Monads
Scala. Introduction to FP. MonadsScala. Introduction to FP. Monads
Scala. Introduction to FP. Monads
 
Introduction to Monads in Scala (2)
Introduction to Monads in Scala (2)Introduction to Monads in Scala (2)
Introduction to Monads in Scala (2)
 
Mysql
MysqlMysql
Mysql
 
Mysql
MysqlMysql
Mysql
 
Practical cats
Practical catsPractical cats
Practical cats
 
Pandas,scipy,numpy cheatsheet
Pandas,scipy,numpy cheatsheetPandas,scipy,numpy cheatsheet
Pandas,scipy,numpy cheatsheet
 
Pandas Cheat Sheet
Pandas Cheat SheetPandas Cheat Sheet
Pandas Cheat Sheet
 
Python3 cheatsheet
Python3 cheatsheetPython3 cheatsheet
Python3 cheatsheet
 
Python Pandas
Python PandasPython Pandas
Python Pandas
 
Python Pandas for Data Science cheatsheet
Python Pandas for Data Science cheatsheet Python Pandas for Data Science cheatsheet
Python Pandas for Data Science cheatsheet
 
4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply Function4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply Function
 
Language R
Language RLanguage R
Language R
 
02 arrays
02 arrays02 arrays
02 arrays
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query Language
 
R programming intro with examples
R programming intro with examplesR programming intro with examples
R programming intro with examples
 
20170509 rand db_lesugent
20170509 rand db_lesugent20170509 rand db_lesugent
20170509 rand db_lesugent
 
R reference card
R reference cardR reference card
R reference card
 
R short-refcard
R short-refcardR short-refcard
R short-refcard
 

Ähnlich wie Data import-cheatsheet

R
RR
R
exsuns
 
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavSeminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Vyacheslav Arbuzov
 
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptxfINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
dataKarthik
 

Ähnlich wie Data import-cheatsheet (20)

tidyr.pdf
tidyr.pdftidyr.pdf
tidyr.pdf
 
R language introduction
R language introductionR language introduction
R language introduction
 
R_CheatSheet.pdf
R_CheatSheet.pdfR_CheatSheet.pdf
R_CheatSheet.pdf
 
Practical data science_public
Practical data science_publicPractical data science_public
Practical data science_public
 
A quick introduction to R
A quick introduction to RA quick introduction to R
A quick introduction to R
 
Basic R Data Manipulation
Basic R Data ManipulationBasic R Data Manipulation
Basic R Data Manipulation
 
Broom: Converting Statistical Models to Tidy Data Frames
Broom: Converting Statistical Models to Tidy Data FramesBroom: Converting Statistical Models to Tidy Data Frames
Broom: Converting Statistical Models to Tidy Data Frames
 
R
RR
R
 
R-Excel Integration
R-Excel IntegrationR-Excel Integration
R-Excel Integration
 
R Cheat Sheet – Data Management
R Cheat Sheet – Data ManagementR Cheat Sheet – Data Management
R Cheat Sheet – Data Management
 
Introduction to r
Introduction to rIntroduction to r
Introduction to r
 
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavSeminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
 
Data Wrangling with dplyr and tidyr Cheat Sheet
Data Wrangling with dplyr and tidyr Cheat SheetData Wrangling with dplyr and tidyr Cheat Sheet
Data Wrangling with dplyr and tidyr Cheat Sheet
 
R Programming Reference Card
R Programming Reference CardR Programming Reference Card
R Programming Reference Card
 
Python Cheat Sheet
Python Cheat SheetPython Cheat Sheet
Python Cheat Sheet
 
Introduction to DAX Language
Introduction to DAX LanguageIntroduction to DAX Language
Introduction to DAX Language
 
R programming
R programmingR programming
R programming
 
An overview of Python 2.7
An overview of Python 2.7An overview of Python 2.7
An overview of Python 2.7
 
A tour of Python
A tour of PythonA tour of Python
A tour of Python
 
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptxfINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
fINAL Lesson_5_Data_Manipulation_using_R_v1.pptx
 

Mehr von Dieudonne Nahigombeye (9)

Rstudio ide-cheatsheet
Rstudio ide-cheatsheetRstudio ide-cheatsheet
Rstudio ide-cheatsheet
 
Rmarkdown cheatsheet-2.0
Rmarkdown cheatsheet-2.0Rmarkdown cheatsheet-2.0
Rmarkdown cheatsheet-2.0
 
Reg ex cheatsheet
Reg ex cheatsheetReg ex cheatsheet
Reg ex cheatsheet
 
How big-is-your-graph
How big-is-your-graphHow big-is-your-graph
How big-is-your-graph
 
Ggplot2 cheatsheet-2.1
Ggplot2 cheatsheet-2.1Ggplot2 cheatsheet-2.1
Ggplot2 cheatsheet-2.1
 
Eurostat cheatsheet
Eurostat cheatsheetEurostat cheatsheet
Eurostat cheatsheet
 
Devtools cheatsheet
Devtools cheatsheetDevtools cheatsheet
Devtools cheatsheet
 
Base r
Base rBase r
Base r
 
Advanced r
Advanced rAdvanced r
Advanced r
 

KĂźrzlich hochgeladen

Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
SUHANI PANDEY
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 

KĂźrzlich hochgeladen (20)

Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 

Data import-cheatsheet

  • 1. Other types of data Try one of the following packages to import other types of files • haven - SPSS, Stata, and SAS files • readxl - excel files (.xls and .xlsx) • DBI - databases • jsonlite - json • xml2 - XML • httr - Web APIs • rvest - HTML (Web Scraping) write_csv(x, path, na = "NA", append = FALSE, col_names = !append) Tibble/df to comma delimited file. write_delim(x, path, delim = " ", na = "NA", append = FALSE, col_names = !append) Tibble/df to file with any delimiter. write_excel_csv(x, path, na = "NA", append = FALSE, col_names = !append) Tibble/df to a CSV for excel write_file(x, path, append = FALSE) String to file. write_lines(x, path, na = "NA", append = FALSE) String vector to file, one element per line. write_rds(x, path, compress = c("none", "gz", "bz2", "xz"), ...) Object to RDS file. write_tsv(x, path, na = "NA", append = FALSE, col_names = !append) Tibble/df to tab delimited files. Write functions Save x, an R object, to path, a file path, with: Read functions Parsing data types Tidy Data with tidyr Cheat Sheet R’s tidyverse is built around tidy data stored in tibbles, an enhanced version of a data frame. The front side of this sheet shows how to read text files into R with readr. The reverse side shows how to create tibbles with tibble and to layout tidy data with tidyr. Data Importwith readr, tibble, and tidyr Cheat Sheet ## Parsed with column specification: ## cols( ## age = col_integer(), ## sex = col_character(), ## earn = col_double() ## ) read_file(file, locale = default_locale()) Read a file into a single string. read_file_raw(file) Read a file into a raw vector. read_lines(file, skip = 0, n_max = -1L, locale = default_locale(), na = character(), progress = interactive()) Read each line into its own string. Skip lines read_csv("file.csv", skip = 1) Read in a subset read_csv("file.csv", n_max = 1) Missing Values read_csv("file.csv", na = c("4", "5", ".")) 1. Use problems() to diagnose problems x <- read_csv("file.csv"); problems(x) 2. Use a col_ function to guide parsing • col_guess() - the default • col_character() • col_double() • col_euro_double() • col_datetime(format = "") Also col_date(format = "") and col_time(format = "") • col_factor(levels, ordered = FALSE) • col_integer() • col_logical() • col_number() • col_numeric() • col_skip() x <- read_csv("file.csv", col_types = cols( A = col_double(), B = col_logical(), C = col_factor() )) 3. Else, read in as character vectors then parse with a parse_ function. • parse_guess(x, na = c("", "NA"), locale = default_locale()) • parse_character(x, na = c("", "NA"), locale = default_locale()) • parse_datetime(x, format = "", na = c("", "NA"), locale = default_locale()) Also parse_date() and parse_time() • parse_double(x, na = c("", "NA"), locale = default_locale()) • parse_factor(x, levels, ordered = FALSE, na = c("", "NA"), locale = default_locale()) • parse_integer(x, na = c("", "NA"), locale = default_locale()) • parse_logical(x, na = c("", "NA"), locale = default_locale()) • parse_number(x, na = c("", "NA"), locale = default_locale()) x$A <- parse_number(x$A) read_lines_raw(file, skip = 0, n_max = -1L, progress = interactive()) Read each line into a raw vector. read_log(file, col_names = FALSE, col_types = NULL, skip = 0, n_max = -1, progress = interactive()) Apache style log files. Read non-tabular data Read tabular data to tibbles read_csv() Reads comma delimited files. read_csv("file.csv") read_csv2() Reads Semi-colon delimited files. read_csv2("file2.csv") read_delim(delim, quote = """, escape_backslash = FALSE, escape_double = TRUE) Reads files with any delimiter. read_delim("file.txt", delim = "|") read_fwf(col_positions) Reads fixed width files. read_fwf("file.fwf", col_positions = c(1, 3, 5)) read_tsv() Reads tab delimited files. Also read_table(). read_tsv("file.tsv") a,b,c 1,2,3 4,5,NA a;b;c 1;2;3 4;5;NA a|b|c 1|2|3 4|5|NA a b c 1 2 3 4 5 NA These functions share the common arguments: read_*(file, col_names = TRUE, col_types = NULL, locale = default_locale(), na = c("", "NA"), quoted_na = TRUE, comment = "", trim_ws = TRUE, skip = 0, n_max = Inf, guess_max = min(1000, n_max), progress = interactive()) A B C 1 2 3 A B C 1 2 3 4 5 NA x y z A B C 1 2 3 4 5 NA A B C 1 2 3 NA NA NA 1 2 3 4 5 NA A B C 1 2 3 4 5 NA A B C 1 2 3 4 5 NA A B C 1 2 3 4 5 NA A B C 1 2 3 4 5 NA Useful arguments a,b,c 1,2,3 4,5,NA Example file write_csv (path = "file.csv", x = read_csv("a,b,cn1,2,3n4,5,NA")) No header read_csv("file.csv", col_names = FALSE) Provide header read_csv("file.csv", col_names = c("x", "y", "z")) readr functions guess the types of each column and convert types when appropriate (but will NOT convert strings to factors automatically). A message shows the type of each column in the result. earn is a double (numeric) sex is a character age is an integer RStudioÂŽ is a trademark of RStudio, Inc. • CC BY RStudio info@rstudio.com • 844-448-1212 • rstudio.com Learn more at browseVignettes(package = c("readr", "tibble", "tidyr")) • readr 1.1.0 • tibble 1.2.12 • tidyr 0.6.0 • Updated: 2017-01
  • 2. gather(data, key, value, ..., na.rm = FALSE, convert = FALSE, factor_key = FALSE) Gather moves column names into a key column, gathering the column values into a single value column. spread(data, key, value, fill = NA, convert = FALSE, drop = TRUE, sep = NULL) Spread moves the unique values of a key column into the column names, spreading the values of a value column across the new columns that result. Use gather() and spread() to reorganize the values of a table into a new layout. Each uses the idea of a key column: value column pair. gather(table4a, `1999`, `2000`, key = "year", value = "cases") spread(table2, type, count) valuekey country 1999 2000 A 0.7K 2K B 37K 80K C 212K 213K table4a country year cases A 1999 0.7K B 1999 37K C 1999 212K A 2000 2K B 2000 80K C 2000 213K valuekey country year cases pop A 1999 0.7K 19M A 2000 2K 20M B 1999 37K 172M B 2000 80K 174M C 1999 212K 1T C 2000 213K 1T table2 country year type count A 1999 cases 0.7K A 1999 pop 19M A 2000 cases 2K A 2000 pop 20M B 1999 cases 37K B 1999 pop 172M B 2000 cases 80K B 2000 pop 174M C 1999 cases 212K C 1999 pop 1T C 2000 cases 213K C 2000 pop 1T Split and Combine Cells unite(data, col, ..., sep = "_", remove = TRUE) Collapse cells across several columns to make a single column. drop_na(data, ...) Drop rows containing NA’s in … columns. fill(data, ..., .direction = c("down", "up")) Fill in NA’s in … columns with most recent non-NA values. replace_na(data, replace = list(), ...) Replace NA’s by column. Use these functions to split or combine cells into individual, isolated values. country year rate A 1999 0.7K/19M A 2000 2K/20M B 1999 37K/172M B 2000 80K/174M C 1999 212K/1T C 2000 213K/1T country year cases pop A 1999 0.7K 19M A 2000 2K 20M B 1999 37K 172 B 2000 80K 174 C 1999 212K 1T C 2000 213K 1T table3 separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE, extra = "warn", fill = "warn", ...) Separate each cell in a column to make several columns. separate_rows(data, ..., sep = "[^[:alnum:].]+", convert = FALSE) Separate each cell in a column to make several rows. Also separate_rows_(). country century year Afghan 19 99 Afghan 20 0 Brazil 19 99 Brazil 20 0 China 19 99 China 20 0 country year Afghan 1999 Afghan 2000 Brazil 1999 Brazil 2000 China 1999 China 2000 table5 separate_rows(table3, rate, into = c("cases", "pop")) separate_rows(table3, rate) unite(table5, century, year, col = "year", sep = "") Tidy Data with tidyr x1 x2 A 1 B NA C NA D 3 E NA x1 x2 A 1 D 3 x x1 x2 A 1 B NA C NA D 3 E NA x1 x2 A 1 B 1 C 1 D 3 E 3 x x1 x2 A 1 B NA C NA D 3 E NA x1 x2 A 1 B 2 C 2 D 3 E 2 x drop_na(x, x2) fill(x, x2) replace_na(x,list(x2 = 2), x2) country year rate A 1999 0.7K A 1999 19M A 2000 2K A 2000 20M B 1999 37K B 1999 172M B 2000 80K B 2000 174M C 1999 212K C 1999 1T C 2000 213K C 2000 1T table3 country year rate A 1999 0.7K/19M A 2000 2K/20M B 1999 37K/172M B 2000 80K/174M C 1999 212K/1T C 2000 213K/1T • Control the default appearance with options: options(tibble.print_max = n, tibble.print_min = m, tibble.width = Inf) • View entire data set with View(x, title) or glimpse(x, width = NULL, …) • Revert to data frame with as.data.frame() (required for some older packages) Tibbles - an enhanced data frame Handle Missing Values Reshape Data - change the layout of values in a table Tidy data is a way to organize tabular data. It provides a consistent data structure across packages. CBA A * B -> C *A B C Each observation, or case, is in its own row A B C Each variable is in its own column A B C & Learn more at browseVignettes(package = c("readr", "tibble", "tidyr")) • readr 1.1.0 • tibble 1.2.12 • tidyr 0.6.0 • Updated: 2017-01 The tibble package provides a new S3 class for storing tabular data, the tibble. Tibbles inherit the data frame class, but improve two behaviors: • Display - When you print a tibble, R provides a concise view of the data that fits on one screen. • Subsetting - [ always returns a new tibble, [[ and $ always return a vector. • No partial matching - You must use full column names when subsetting as_tibble(x, …) Convert data frame to tibble. enframe(x, name = "name", value = "value") Converts named vector to a tibble with a names column and a values column. is_tibble(x) Test whether x is a tibble. data frame display tibble display Construct a tibble in two ways tibble(…) Construct by columns. tibble(x = 1:3, y = c("a", "b", "c")) tribble(…) Construct by rows. tribble( ~x, ~y, 1, "a", 2, "b", 3, "c") A tibble: 3 × 2 x y <int> <dbl> 1 1 a 2 2 b 3 3 c Both make this tibble ww # A tibble: 234 × 6 manufacturer model displ <chr> <chr> <dbl> 1 audi a4 1.8 2 audi a4 1.8 3 audi a4 2.0 4 audi a4 2.0 5 audi a4 2.8 6 audi a4 2.8 7 audi a4 3.1 8 audi a4 quattro 1.8 9 audi a4 quattro 1.8 10 audi a4 quattro 2.0 # ... with 224 more rows, and 3 # more variables: year <int>, # cyl <int>, trans <chr> 156 1999 6 auto(l4) 157 1999 6 auto(l4) 158 2008 6 auto(l4) 159 2008 8 auto(s4) 160 1999 4 manual(m5) 161 1999 4 auto(l4) 162 2008 4 manual(m5) 163 2008 4 manual(m5) 164 2008 4 auto(l4) 165 2008 4 auto(l4) 166 1999 4 auto(l4) [ reached getOption("max.print") -- omitted 68 rows ]A large table to display RStudioÂŽ is a trademark of RStudio, Inc. • CC BY RStudio info@rstudio.com • 844-448-1212 • rstudio.com A table is tidy if: Tidy data: Makes variables easy to access as vectors Preserves cases during vectorized operations Expand Tables - quickly create tables with combinations of values complete(data, ..., fill = list()) Adds to the data missing combinations of the values of the variables listed in … complete(mtcars, cyl, gear, carb) expand(data, ...) Create new tibble with all possible combinations of the values of the variables listed in … expand(mtcars, cyl, gear, carb)