Next generation programming in R
Florian Uhlitz
uhlitz@hu-berlin.de
uhlitz.github.io
magrittr %>% · readr · tidyr · dplyr
load data %>% reshape data %>% manipulate data
Stefan Milton Bache,
University of Southern Denmark
Hadley Wickham,
Rice University, RStudio
Recent developments in the R environment
Toolbox for data wrangling in R (adapted from H. Wickham)
data wrangling: load (readr) %>% reshape (tidyr) %>% manipulate (dplyr), with the steps chained by magrittr's %>%
data analysis: data wrangling, followed by model, visualise and report; tools: base, ggplot2, rmarkdown, broom
magrittr
In a pipe, the result of the left-hand expression is passed as the first argument to the function on the right-hand side, similar to the Unix pipe operator |:
f(x, y)        is the same as   x %>% f(y)
f(x, y, z)     is the same as   x %>% f(y, z)
f2(f1(x), y)   is the same as   f1(x) %>% f2(y)
magrittr
nested functions vs. chain of functions
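The contrast between nesting and chaining can be sketched with a small numeric vector. This is an illustrative example, not taken from the talk, and it assumes only that the magrittr package is installed. Both expressions compute the same value; the piped version lists the functions in the order in which they are applied.

```r
library(magrittr)  # provides the %>% operator

x <- c(1, 4, 9, 16)

# Nested: the call has to be read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Chain: the same steps, read from left to right
piped <- x %>% sqrt() %>% mean() %>% round(1)

print(identical(nested, piped))
```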
readr, readxl, haven
readr::read_csv()
readr::read_tsv()
readr::read_log()
readr::read_delim()
readr::read_fwf()
readr::read_table()
readxl::read_excel()
haven::read_sas()
haven::read_spss()
haven::read_stata()
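A minimal round trip with readr might look like this. It assumes readr is installed and writes the first rows of the built-in iris data to a temporary file, so no external data is needed; readxl and haven are used the same way but need Excel, SAS, SPSS or Stata files as input.

```r
library(readr)

# Write the first five rows of iris to a temporary CSV file
path <- tempfile(fileext = ".csv")
write_csv(head(iris, 5), path)

# Read it back; cols() with no arguments lets readr guess every column's type
iris5 <- read_csv(path, col_types = cols())
print(iris5)
```

Note that the factor column Species comes back as plain character data, because readr does not guess factors.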
tidyr
Reshaping: gather() and spread(); separate() and unite()
adapted from rstudio.com/resources/cheatsheets/
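gather() and spread() are inverses of each other, which a small sketch can show. The table `wide` and its numbers are hypothetical, and the example assumes tidyr is installed. In tidyr 1.0 and later both functions are superseded by pivot_longer() and pivot_wider() but still work.

```r
library(tidyr)

# Hypothetical wide table: one column per year
wide <- data.frame(country = c("A", "B"),
                   `2011` = c(7, 5),
                   `2012` = c(8, 6),
                   check.names = FALSE, stringsAsFactors = FALSE)

# gather(): turn the year columns into key-value pairs, one row per country and year
long <- gather(wide, "year", "n", -country)

# spread(): the inverse, one column per distinct value of year
back <- spread(long, year, n)

print(long)
print(back)
```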
dplyr
Subsetting: filter(x > 1) keeps only the rows in which x is greater than 1; select(B, C, E) keeps only columns B, C and E of the columns A to E.
adapted from rstudio.com/resources/cheatsheets/
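The two subsetting verbs can be chained as sketched below. The data frame `df` is hypothetical, with column names chosen to match the figure, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical table with the column names used on the slide
df <- data.frame(A = 1:3, B = 4:6, C = 7:9, D = 10:12, E = 13:15, x = c(1, 2, 3))

sub <- df %>%
  filter(x > 1) %>%   # keep rows where x is greater than 1
  select(B, C, E)     # keep only columns B, C and E

print(sub)
```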
dplyr
Transforming and summarising
mutate(z = x + y): from x = 1, 2, 3 and y = 4, 5, 6, adds a new column z = 5, 7, 9
summarise(A = sum(x), B = sum(y)): collapses the table into a single row with A = 6 and B = 15
group_by() %>% mutate() and group_by() %>% summarise(): the same operations, applied separately within each group
adapted from rstudio.com/resources/cheatsheets/
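The figure's numbers can be reproduced in code. This sketch assumes dplyr is installed; x and y carry the values from the figure, while the grouping column g is a hypothetical addition used to show the grouped variants.

```r
library(dplyr)

# x and y as in the figure; g is a hypothetical grouping column
df <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 3), y = c(4, 5, 6),
                 stringsAsFactors = FALSE)

with_z <- df %>% mutate(z = x + y)                   # adds z = 5, 7, 9
totals <- df %>% summarise(A = sum(x), B = sum(y))   # one row: A = 6, B = 15

# Grouped versions: the same verbs, applied within each level of g
shares <- df %>% group_by(g) %>% mutate(share = x / sum(x))
by_g   <- df %>% group_by(g) %>% summarise(A = sum(x), B = sum(y))

print(by_g)
```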
What's tidy data?
KEEP CALM AND TIDY UP
»Happy families are all alike; every unhappy family is unhappy in its own way.«
Leo Tolstoy
Anna Karenina principle
»Tidy data sets are all alike; every messy data set is messy in its own way.«
Hadley Wickham
Tidy data principle
Tidy data definition
Wickham, H. (2014). Tidy Data. Journal of Statistical Software
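library(readxl)   # read_excel()
library(magrittr) # %>%, set_colnames()
library(dplyr)    # slice()
library(tidyr)    # fill(), separate()
library(readr)    # write_tsv()

read_excel("untidy_data.xlsx") %>%
  set_colnames(mynames) %>%   # mynames: character vector of column names, assumed to be defined beforehand
  slice(1:36) %>%
  fill(group, condition) %>%
  separate(group, into = c("Gene", "Mutation", "clone"), sep = "_") %>%
  write_tsv("tidy_data.tsv")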
read_excel
read_excel %>% set_colnames
read_excel %>% set_colnames %>% tail
read_excel %>% set_colnames %>% slice
read_excel %>% set_colnames %>% slice %>% fill
read_excel %>% set_colnames %>% slice %>% fill %>% select
read_excel %>% set_colnames %>% slice %>% fill %>% select %>% distinct
read_excel %>% set_colnames %>% slice %>% fill %>% select %>% distinct %>% separate
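The fill(), separate() and unite() steps of this pipeline can be tried on a small stand-in table. The data frame `raw` and its labels are hypothetical (they only mimic a spreadsheet in which a group label is written once per block), and dplyr and tidyr are assumed to be installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical spreadsheet export: the group label appears only once per block
raw <- data.frame(
  group = c("geneA_mut1_c1", NA, "geneB_mut2_c2", NA),
  value = c(1.2, 1.4, 0.8, 0.9),
  stringsAsFactors = FALSE
)

tidy <- raw %>%
  fill(group) %>%   # copy the last non-missing label downwards
  separate(group, into = c("Gene", "Mutation", "clone"), sep = "_")

# unite() reverses separate() for the columns it is given
labelled <- tidy %>% unite("label", Gene, Mutation, sep = "_")

print(tidy)
print(labelled)
```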
Caution!
readr, tidyr and dplyr do “clever” things behind the scenes,
e.g. heuristics such as guessing a column's class from its first 1000 entries.
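One way to avoid relying on the type-guessing heuristic is to state the column types explicitly, as in the sketch below. The file and its IDs are hypothetical, and the example assumes readr is installed. Without a specification, readr may guess the id column as numeric and drop the leading zeros; the guess_max argument of the read functions controls how many rows are used for guessing.

```r
library(readr)

# Hypothetical file whose IDs carry leading zeros
path <- tempfile(fileext = ".tsv")
writeLines(c("id\tcount", "001\t10", "002\t25"), path)

# An explicit column specification removes the guesswork
dat <- read_tsv(path, col_types = cols(id = col_character(), count = col_integer()))
print(dat)
```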
read_excel %>% set_colnames %>% slice %>% fill %>% select %>% distinct %>% separate %>% unite
Tidy data definition
Wickham, H. (2014). Tidy Data. Journal of Statistical Software
read_tsv
read_tsv %>% gather(key, value, -variable)
read_tsv %>% gather %>% spread(key, value)
read_tsv %>% gather %>% filter
read_tsv %>% gather %>% filter %>% group_by
read_tsv %>% gather %>% filter %>% group_by %>% summarise %>% arrange
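The full chain after reading the file might look like the sketch below. Instead of read_tsv() it starts from a small in-memory table `wide`, whose gene names and values are hypothetical; dplyr and tidyr are assumed to be installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per variable, one column per sample
wide <- data.frame(
  variable = c("geneA", "geneB", "geneC"),
  s1 = c(5, 1, 8), s2 = c(7, NA, 6), s3 = c(6, 2, 10),
  stringsAsFactors = FALSE
)

result <- wide %>%
  gather(key, value, -variable) %>%   # long format: variable, key, value
  filter(!is.na(value)) %>%           # drop missing measurements
  group_by(variable) %>%
  summarise(mean_value = mean(value)) %>%
  arrange(desc(mean_value))           # highest mean first

print(result)
```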
Data Wrangling
with dplyr and tidyr
Cheat Sheet
RStudio® is a trademark of RStudio, Inc. • CC BY RStudio • info@rstudio.com • 844-448-1212 • rstudio.com
Syntax - Helpful conventions for wrangling
dplyr::tbl_df(iris)
Converts data to tbl class. tbls are easier to examine than data frames. R displays only the data that fits onscreen:
    Source: local data frame [150 x 5]

       Sepal.Length Sepal.Width Petal.Length
    1           5.1         3.5          1.4
    2           4.9         3.0          1.4
    3           4.7         3.2          1.3
    4           4.6         3.1          1.5
    5           5.0         3.6          1.4
    ..          ...         ...          ...
    Variables not shown: Petal.Width (dbl), Species (fctr)
dplyr::glimpse(iris)
Information-dense summary of tbl data.
utils::View(iris)
View data set in spreadsheet-like display (note capital V).
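In recent dplyr releases tbl_df() is deprecated; tibble::as_tibble() performs the same conversion. A minimal sketch, assuming dplyr and tibble are installed:

```r
library(dplyr)   # glimpse()
library(tibble)  # as_tibble()

# Convert the built-in iris data frame to a tibble
iris_tbl <- as_tibble(iris)

print(iris_tbl)    # shows only the first rows and the columns that fit
glimpse(iris_tbl)  # one line per column: name, type and first values
```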
dplyr::%>%
Passes the object on the left-hand side as the first argument (or the . argument) of the function on the right-hand side.
"Piping" with %>% makes code more readable, e.g.
iris %>%
  group_by(Species) %>%
  summarise(avg = mean(Sepal.Width)) %>%
  arrange(avg)
x %>% f(y) is the same as f(x, y)
y %>% f(x, ., z) is the same as f(x, y, z)
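Both rules can be checked with base functions. The sketch assumes magrittr is installed; the numbers are arbitrary. When the dot appears as an argument, the left-hand side is placed there instead of in the first position, which is useful for functions such as lm() whose data argument is not the first one.

```r
library(magrittr)

# x %>% f(y) is the same as f(x, y): 16 becomes the first argument of log()
a <- 16 %>% log(2)             # log(16, 2)

# y %>% f(x, ., z) is the same as f(x, y, z): the dot marks where the left side goes
b <- 4 %>% seq(1, ., by = 1)   # seq(1, 4, by = 1)

# The dot lets the piped data fill a non-first argument
fit <- mtcars %>% lm(mpg ~ wt, data = .)
```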
Tidy Data - A foundation for wrangling in R
In a tidy data set, each variable is saved in its own column and each observation is saved in its own row.
Tidy data complements R’s vectorized operations. R will automatically preserve observations as you manipulate variables. No other format works as intuitively with R.

Reshaping Data - Change the layout of a data set
tidyr::gather(cases, "year", "n", 2:4)
Gather columns into rows.
tidyr::spread(pollution, size, amount)
Spread rows into columns.
tidyr::separate(storms, date, c("y", "m", "d"))
Separate one column into several.
tidyr::unite(data, col, ..., sep)
Unite several columns into one.
dplyr::data_frame(a = 1:3, b = 4:6)
Combine vectors into data frame (optimized).
dplyr::arrange(mtcars, mpg)
Order rows by values of a column (low to high).
dplyr::arrange(mtcars, desc(mpg))
Order rows by values of a column (high to low).
dplyr::rename(tb, y = year)
Rename the columns of a data frame.

Subset Observations (Rows)
dplyr::filter(iris, Sepal.Length > 7)
Extract rows that meet logical criteria.
dplyr::distinct(iris)
Remove duplicate rows.
dplyr::sample_frac(iris, 0.5, replace = TRUE)
Randomly select fraction of rows.
dplyr::sample_n(iris, 10, replace = TRUE)
Randomly select n rows.
dplyr::slice(iris, 10:15)
Select rows by position.
dplyr::top_n(storms, 2, date)
Select the top n entries by the values of a column (by group if grouped data).

Logic in R - ?Comparison, ?base::Logic
<    Less than                   !=                      Not equal to
>    Greater than                %in%                    Group membership
==   Equal to                    is.na                   Is NA
<=   Less than or equal to       !is.na                  Is not NA
>=   Greater than or equal to    &, |, !, xor, any, all  Boolean operators

Subset Variables (Columns)
dplyr::select(iris, Sepal.Width, Petal.Length, Species)
Select columns by name or helper function.
Helper functions for select - ?select
select(iris, contains("."))
Select columns whose name contains a character string.
select(iris, ends_with("Length"))
Select columns whose name ends with a character string.
select(iris, everything())
Select every column.
select(iris, matches(".t."))
Select columns whose name matches a regular expression.
select(iris, num_range("x", 1:5))
Select columns named x1, x2, x3, x4, x5.
select(iris, one_of(c("Species", "Genus")))
Select columns whose names are in a group of names.
select(iris, starts_with("Sepal"))
Select columns whose name starts with a character string.
select(iris, Sepal.Length:Petal.Width)
Select all columns between Sepal.Length and Petal.Width (inclusive).
select(iris, -Species)
Select all columns except Species.
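Several of the subsetting verbs above can be combined on the built-in iris data, as in this sketch (assuming dplyr is installed; the variable names are illustrative only).

```r
library(dplyr)

# Rows: filter() by a logical condition, then arrange() from high to low
long_sepals <- iris %>%
  filter(Sepal.Length > 7) %>%
  arrange(desc(Sepal.Length))

# Rows by position
rows_10_15 <- iris %>% slice(10:15)

# Columns by helper function or by exclusion
sepal_cols <- iris %>% select(starts_with("Sepal"))
no_species <- iris %>% select(-Species)
```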
Learn more with browseVignettes(package = c("dplyr", "tidyr")) • dplyr 0.4.0 • tidyr 0.2.0 • Updated: 1/15
devtools::install_github("rstudio/EDAWR") for data sets
rstudio.com/resources/cheatsheets/
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 

Next Generation Programming in R

  • 1. Next generation programming in R. Florian Uhlitz, uhlitz@hu-berlin.de, uhlitz.github.io
  • 2. Recent developments in the R environment: magrittr provides the pipe %>% (Stefan Milton Bache, University of Southern Denmark); readr loads data, tidyr reshapes data and dplyr manipulates data (Hadley Wickham, Rice University, RStudio)
  • 3. Toolbox for data wrangling in R: load (readr) %>% reshape (tidyr) %>% manipulate (dplyr), with the steps joined by the magrittr pipe %>% (adapted from H. Wickham)
  • 4. Toolbox for data wrangling in R, followed by the steps model and visualise (adapted from H. Wickham)
  • 5. As slide 4, with report added as a final step (adapted from H. Wickham)
  • 6. The same toolbox annotated with the packages base, ggplot2, rmarkdown and broom for the steps beyond data wrangling (adapted from H. Wickham)
  • 7. The same diagram with the label data analysis added (adapted from H. Wickham)
  • 8. magrittr: in a pipe, the result of the left-hand statement is handed over as the first argument to the function on the right-hand side, similar to the Unix pipe operator |. f(x, y) becomes x %>% f(y); f(x, y, z) becomes x %>% f(y, z); the nested call f2(f1(x), y) becomes the chain f1(x) %>% f2(y).
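A minimal sketch of the three equivalences on slide 8, assuming the magrittr package is installed. f1 and f2 are hypothetical helper functions made up for the illustration; round and seq are base R.

```r
library(magrittr)

# Hypothetical helpers standing in for f1 and f2 on the slide
f1 <- function(x) x * 2
f2 <- function(x, y) x + y

x <- c(1, 4, 9)

# f(x, y) is the same as x %>% f(y)
nested_round <- round(pi, 2)
piped_round  <- pi %>% round(2)

# f(x, y, z) is the same as x %>% f(y, z)
nested_seq <- seq(1, 10, 3)
piped_seq  <- 1 %>% seq(10, 3)

# f2(f1(x), y) is the same as f1(x) %>% f2(y)
nested <- f2(f1(x), 10)
piped  <- f1(x) %>% f2(10)

print(piped)  # 12 18 28
```

Reading the chained form left to right follows the order in which the functions are actually applied, which is the readability argument behind the pipe.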
  • 12. tidyr, reshaping: gather() and spread() (adapted from rstudio.com/resources/cheatsheets/)
  • 13. tidyr, reshaping: gather() and spread(), plus separate() and unite() (adapted from rstudio.com/resources/cheatsheets/)
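A sketch of separate() and unite() on a made-up sample column, assuming tidyr is installed. The gene, mutation and clone values are hypothetical and only echo the column names used on slide 27.

```r
library(tidyr)

# Hypothetical sample IDs that pack gene, mutation and clone into one string
samples <- data.frame(
  group = c("TP53_R175H_c1", "TP53_R175H_c2", "KRAS_G12D_c1"),
  value = c(0.8, 1.1, 2.3),
  stringsAsFactors = FALSE
)

# separate(): split one column into several at the "_" separator
split_df <- separate(samples, group, into = c("Gene", "Mutation", "clone"), sep = "_")

# unite(): glue the pieces back together into a single column
rejoined <- unite(split_df, "group", Gene, Mutation, clone, sep = "_")

print(split_df)
```

The two verbs are inverses of each other as long as the separator does not occur inside any of the pieces.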
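  • 14. dplyr, subsetting: filter(x > 1) keeps the rows whose x is greater than 1 (x = 1, 2, 3 becomes x = 2, 3); select(B, C, E) keeps columns B, C and E of a table with columns A to E (adapted from rstudio.com/resources/cheatsheets/)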
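The two subsetting verbs from slide 14 on a toy data frame; the columns A to E plus x are made up to mirror the cheat-sheet figure. Assumes dplyr is installed.

```r
library(dplyr)

# Toy data frame with columns A to E plus x (values are arbitrary)
df <- data.frame(A = 1:3, B = 4:6, C = 7:9, D = 10:12, E = 13:15, x = 1:3)

# filter(): keep the rows where x > 1 (here, the last two rows)
rows_kept <- filter(df, x > 1)

# select(): keep only columns B, C and E
cols_kept <- select(df, B, C, E)

print(rows_kept)
print(cols_kept)
```

filter() works on rows (observations) and select() on columns (variables), so the two can be chained in either order.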
  • 15. dplyr, transforming and summarising: mutate(z = x + y) adds the column z = 5, 7, 9 to a table with x = 1, 2, 3 and y = 4, 5, 6; summarise(A = sum(x), B = sum(y)) collapses the same table to a single row with A = 6 and B = 15 (adapted from rstudio.com/resources/cheatsheets/)
  • 16. dplyr, transforming and summarising as on slide 15, plus the grouped versions group_by() %>% mutate() and group_by() %>% summarise() (adapted from rstudio.com/resources/cheatsheets/)
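The numbers on slides 15 and 16 can be reproduced with a small data frame. The grouping column g is a hypothetical addition to show what group_by() changes. Assumes dplyr is installed.

```r
library(dplyr)

df <- data.frame(x = 1:3, y = 4:6)

# mutate(): add a new column computed row by row
with_z <- mutate(df, z = x + y)                   # z = 5, 7, 9

# summarise(): collapse all rows into one summary row
totals <- summarise(df, A = sum(x), B = sum(y))   # A = 6, B = 15

# With group_by(), both verbs work within each group (hypothetical column g)
grouped <- data.frame(g = c("a", "a", "b"), x = 1:3, stringsAsFactors = FALSE)
group_totals <- grouped %>% group_by(g) %>% summarise(total = sum(x))
group_shares <- grouped %>% group_by(g) %>% mutate(share = x / sum(x))
```

In the grouped mutate(), sum(x) is the sum within each group, so the shares add up to 1 per group rather than over the whole table.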
  • 18. »Happy families are all alike; every unhappy family is unhappy in its own way.« Leo Tolstoy, Anna Karenina (Anna Karenina principle)
  • 19. »Tidy data sets are all alike; every messy data set is messy in its own way.« Hadley Wickham (tidy data principle)
  • 20. Tidy data definition (Wickham, H. (2014). Tidy Data. Journal of Statistical Software)
  • 27. read_excel("untidy_data.xlsx") %>%
        set_colnames(mynames) %>%
        slice(1:36) %>%
        fill(group, condition) %>%
        separate(group, into = c("Gene", "Mutation", "clone"), sep = "_") %>%
        write_tsv("tidy_data.tsv")
  • 34. read_excel %>% set_colnames %>% slice %>% fill
  • 35. read_excel %>% set_colnames %>% slice %>% fill %>% select
  • 36. read_excel %>% set_colnames %>% slice %>% fill %>% select %>% distinct
  • 37. read_excel %>% set_colnames %>% slice %>% fill %>% select %>% distinct %>% separate
  • 38. read_excel %>% set_colnames %>% slice %>% fill %>% select %>% distinct %>% separate
    Caution! readr, tidyr and dplyr do “clever” things, e.g. heuristics such as guessing a column's class from its first 1000 entries.
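One way to sidestep readr's type guessing is to spell out the column types. The sketch below, assuming readr is installed, writes a hypothetical CSV with zero-padded IDs to a temporary file. Left to guess, readr will typically read such IDs as numbers and drop the leading zeros, whereas an explicit col_types specification keeps them as text.

```r
library(readr)

# Hypothetical CSV whose IDs carry leading zeros
path <- tempfile(fileext = ".csv")
writeLines(c("id,count", "001,5", "002,7"), path)

# Default: readr guesses each column's type from the first rows (see guess_max)
guessed <- read_csv(path)
print(guessed$id)

# Explicit: state the types yourself so nothing is guessed
explicit <- read_csv(
  path,
  col_types = cols(id = col_character(), count = col_integer())
)
print(explicit$id)
```

Declaring col_types also makes a script fail loudly if the input format changes, instead of silently producing a different column class.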
  • 39. read_excel %>% set_colnames %>% slice %>% fill %>% select %>% distinct %>% separate
  • 40. read_excel %>% set_colnames %>% slice %>% fill %>% select %>% distinct %>% separate %>% unite
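A self-contained sketch of the clean-up pipeline from slides 27 and 34 to 40. Because untidy_data.xlsx is not available, a small hypothetical data frame stands in for the read_excel() result, mynames is made up, and slice(1:4) takes the place of slice(1:36). The assumed spreadsheet layout (group and condition filled in only on the first row of each block, a trailing notes row, a comment column) is illustrative only. Assumes magrittr, dplyr and tidyr are installed.

```r
library(magrittr)  # %>%, set_colnames()
library(dplyr)     # slice(), select(), distinct()
library(tidyr)     # fill(), separate(), unite()

# Hypothetical stand-in for read_excel("untidy_data.xlsx")
raw <- data.frame(
  V1 = c("TP53_R175H_c1", NA, "KRAS_G12D_c1", NA, "notes"),
  V2 = c("ctrl", NA, "treated", NA, NA),
  V3 = c(0.8, 0.8, 2.3, 2.1, NA),
  V4 = c("ok", "ok", NA, "check", NA),
  stringsAsFactors = FALSE
)
mynames <- c("group", "condition", "value", "comment")  # hypothetical names

tidy <- raw %>%
  set_colnames(mynames) %>%
  slice(1:4) %>%                 # keep only the data rows
  fill(group, condition) %>%     # copy the last non-missing value downwards
  select(-comment) %>%           # drop a column that is not needed
  distinct() %>%                 # remove exact duplicate rows
  separate(group, into = c("Gene", "Mutation", "clone"), sep = "_") %>%
  unite("sample", Gene, clone, sep = "_", remove = FALSE)

print(tidy)
```

Each step does one small, named thing, which is what makes the pipeline easy to read back and to extend one verb at a time, as the slide sequence does.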
  • 45. read_tsv %>% gather(key, value, -variable)
  • 46. read_tsv %>% gather %>% spread(key, value)
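A sketch of slides 45 and 46 on a made-up wide table (the variable, s1 and s2 columns are hypothetical), assuming tidyr is installed. gather() with -variable stacks every other column into key/value pairs, and spread(key, value) undoes it. In tidyr 1.0 and later both verbs still work but are superseded by pivot_longer() and pivot_wider().

```r
library(tidyr)

# Hypothetical wide table: one row per variable, one column per sample
wide <- data.frame(
  variable = c("height", "weight"),
  s1 = c(170, 65),
  s2 = c(180, 80),
  stringsAsFactors = FALSE
)

# gather(): stack every column except `variable` into key/value pairs
long <- gather(wide, key, value, -variable)

# spread(): turn the key/value pairs back into columns
back <- spread(long, key, value)

print(long)
print(back)
```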
  • 48. read_tsv %>% gather %>% filter
  • 49. read_tsv %>% gather %>% filter %>% group_by
  • 50. read_tsv %>% gather %>% filter %>% group_by %>% summarise %>% arrange
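The full chain from slides 48 to 50, run on a small hypothetical TSV written to a temporary file (gene names, sample columns and values are made up). Assumes readr, dplyr and tidyr are installed.

```r
library(readr)
library(dplyr)
library(tidyr)

# Hypothetical TSV: one row per variable, one column per sample
path <- tempfile(fileext = ".tsv")
writeLines(c(
  "variable\tctrl_1\tctrl_2\ttreat_1",
  "geneA\t1.0\t3.0\t10.0",
  "geneB\t2.0\t4.0\tNA"
), path)

result <- read_tsv(path) %>%
  gather(key, value, -variable) %>%        # long format: variable, key, value
  filter(!is.na(value)) %>%                # drop missing measurements
  group_by(variable) %>%
  summarise(mean_value = mean(value)) %>%  # one row per variable
  arrange(desc(mean_value))                # largest mean first

print(result)
```

Filtering out the NA rows before summarising avoids having to pass na.rm = TRUE to every summary function.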
  • 53. Data Wrangling with dplyr and tidyr Cheat Sheet (RStudio® is a trademark of RStudio, Inc. • CC BY RStudio • info@rstudio.com • 844-448-1212 • rstudio.com)
    Syntax - Helpful conventions for wrangling
      dplyr::tbl_df(iris): Converts data to tbl class. tbls are easier to examine than data frames. R displays only the data that fits onscreen:
        Source: local data frame [150 x 5]

           Sepal.Length Sepal.Width Petal.Length
        1           5.1         3.5          1.4
        2           4.9         3.0          1.4
        3           4.7         3.2          1.3
        4           4.6         3.1          1.5
        5           5.0         3.6          1.4
        ..          ...         ...          ...
        Variables not shown: Petal.Width (dbl), Species (fctr)
      dplyr::glimpse(iris): Information-dense summary of tbl data.
      utils::View(iris): View data set in spreadsheet-like display (note capital V).
      dplyr::%>%: Passes the object on the left-hand side as the first argument (or . argument) of the function on the right-hand side. "Piping" with %>% makes code more readable, e.g.
        iris %>% group_by(Species) %>% summarise(avg = mean(Sepal.Width)) %>% arrange(avg)
        x %>% f(y) is the same as f(x, y)
        y %>% f(x, ., z) is the same as f(x, y, z)
    Tidy Data - A foundation for wrangling in R
      In a tidy data set, each variable is saved in its own column and each observation is saved in its own row.
      Tidy data complements R's vectorized operations. R will automatically preserve observations as you manipulate variables. No other format works as intuitively with R.
    Reshaping Data - Change the layout of a data set
      tidyr::gather(cases, "year", "n", 2:4): Gather columns into rows.
      tidyr::spread(pollution, size, amount): Spread rows into columns.
      tidyr::separate(storms, date, c("y", "m", "d")): Separate one column into several.
      tidyr::unite(data, col, ..., sep): Unite several columns into one.
      dplyr::data_frame(a = 1:3, b = 4:6): Combine vectors into data frame (optimized).
      dplyr::arrange(mtcars, mpg): Order rows by values of a column (low to high).
      dplyr::arrange(mtcars, desc(mpg)): Order rows by values of a column (high to low).
      dplyr::rename(tb, y = year): Rename the columns of a data frame.
    Subset Observations (Rows)
      dplyr::filter(iris, Sepal.Length > 7): Extract rows that meet logical criteria.
      dplyr::distinct(iris): Remove duplicate rows.
      dplyr::sample_frac(iris, 0.5, replace = TRUE): Randomly select fraction of rows.
      dplyr::sample_n(iris, 10, replace = TRUE): Randomly select n rows.
      dplyr::slice(iris, 10:15): Select rows by position.
      dplyr::top_n(storms, 2, date): Select and order top n entries (by group if grouped data).
      Logic in R - ?Comparison, ?base::Logic: < Less than; > Greater than; == Equal to; <= Less than or equal to; >= Greater than or equal to; != Not equal to; %in% Group membership; is.na Is NA; !is.na Is not NA; &, |, !, xor, any, all Boolean operators
    Subset Variables (Columns)
      dplyr::select(iris, Sepal.Width, Petal.Length, Species): Select columns by name or helper function.
      Helper functions for select - ?select
        select(iris, contains(".")): Select columns whose name contains a character string.
        select(iris, ends_with("Length")): Select columns whose name ends with a character string.
        select(iris, everything()): Select every column.
        select(iris, matches(".t.")): Select columns whose name matches a regular expression.
        select(iris, num_range("x", 1:5)): Select columns named x1, x2, x3, x4, x5.
        select(iris, one_of(c("Species", "Genus"))): Select columns whose names are in a group of names.
        select(iris, starts_with("Sepal")): Select columns whose name starts with a character string.
        select(iris, Sepal.Length:Petal.Width): Select all columns between Sepal.Length and Petal.Width (inclusive).
        select(iris, -Species): Select all columns except Species.
    Learn more with browseVignettes(package = c("dplyr", "tidyr")) • dplyr 0.4.0 • tidyr 0.2.0 • Updated: 1/15
    devtools::install_github("rstudio/EDAWR") for data sets • rstudio.com/resources/cheatsheets/
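A few of the select() helpers listed on the cheat sheet, applied to the built-in iris data; assumes dplyr is installed.

```r
library(dplyr)

sepal_cols  <- select(iris, starts_with("Sepal"))      # Sepal.Length, Sepal.Width
length_cols <- select(iris, ends_with("Length"))       # Sepal.Length, Petal.Length
range_cols  <- select(iris, Sepal.Length:Petal.Width)  # the four measurement columns
no_species  <- select(iris, -Species)                  # everything except Species

print(names(length_cols))
```

The last two calls select the same four columns, once by naming a range and once by excluding the remaining column.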