R SEMINAR
Antony Karanja N.
Research Methods Group, ICRAF
2nd April, 15
Data Management and Analysis
AIM
• Recap of the steps and tips for learning to
code in R
• Introduction to the dplyr package
• How to use the dplyr package for data
manipulation* and basic statistics
• Ultimate: dplyr and ggplot2
RECAP
• Set working directory (creating project, setwd)
• Installing and calling library packages
• Reading/loading data (read.???)
• What is the R object type (class)
• Variables within data frames
• Knowing the data type of each variable
• Viewing the head and tail of the data
RECAP
###################
# IMPORT datasets #
###################
tree<-read.csv(file="datavis.csv",header=T)
#-------------------------
# Inspect data with head()
#-------------------------
names(tree);colnames(tree)
head(tree)
tail(tree)
#-------------------------
# Inspect R object type
#-------------------------
class(tree)
#-------------------------
# Inspect Internal structure of R object type
#-------------------------
str(tree)
glimpse(tree) # glimpse() comes from dplyr: run library(dplyr) first
#-------------------------
# Inspect data types
#-------------------------
sapply(tree,class) #-horizontal view
lapply(tree,class) #-Vertical view
##############################
# LOOK FOR DUPLICATE RECORDS #
##############################
# duplicated() flags every repeated row; anyDuplicated() returns only the index of the first one
duplicates<-tree[duplicated(tree[c("Country","Site","PosTopoSeq")]),] #Base function
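A minimal sketch of the difference between the two base R functions when looking for duplicate records. The data frame `toy` and its values are hypothetical; only the column names follow the slides.

```r
# Toy data with hypothetical values; only the column names come from the slides.
toy <- data.frame(
  Country    = c("Nicaragua", "Nicaragua", "Kenya", "Kenya"),
  Site       = c("A", "A", "B", "B"),
  PosTopoSeq = c(1, 1, 1, 2),
  Carbon     = c(2.1, 2.4, 1.8, 3.0),
  stringsAsFactors = FALSE
)

# Columns that together should identify a record
keys <- toy[c("Country", "Site", "PosTopoSeq")]

# anyDuplicated() returns one number: the index of the first duplicated row (0 if none)
first_dup <- anyDuplicated(keys)

# duplicated() returns one TRUE/FALSE per row, so it can pick out every repeated record
is_dup   <- duplicated(keys)
dup_rows <- toy[is_dup, ]

print(first_dup)
print(dup_rows)
```

Indexing with `duplicated()` returns the second and later copies of each record; the first copy is not flagged.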
dplyr
• #install.packages("dplyr")
• >library(dplyr)
• Grammar of data manipulation
– filter() (and slice())
– arrange()
– select() (and rename())
– distinct()
– mutate() (and transmute())
– summarise()
– sample_n() and sample_frac()
filter()
• filter() allows you to select a subset of the rows of a
data frame.
• filter() works similarly to subset()
• filter(DF, condition(s))
#1.0 #### filter - combine conditions with AND (comma or &) or with OR (|)
table(tree$Country)
Nicaragua<-filter(tree, Country == "Nicaragua")
SA<-filter(tree, Country == "South Africa")
#1.1 #### slice
Nicaragua2<-slice(tree, 1:16)
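The AND and OR forms mentioned in the comment above can be sketched on a small example. The data frame `toy` and its values are hypothetical; the column names `Country`, `Site` and `pH` follow the slides.

```r
library(dplyr)

# Toy data with hypothetical values
toy <- data.frame(
  Country = c("Nicaragua", "South Africa", "Nicaragua", "Kenya"),
  Site    = c("A", "B", "A", "C"),
  pH      = c(5.2, 6.8, 7.1, 4.9),
  stringsAsFactors = FALSE
)

# AND: conditions separated by commas must all be TRUE
nic_high_ph <- filter(toy, Country == "Nicaragua", pH >= 6)

# OR: at least one side of | must be TRUE
nic_or_sa <- filter(toy, Country == "Nicaragua" | Country == "South Africa")

# %in% is a shorter way to write several ORs on the same column
nic_or_sa2 <- filter(toy, Country %in% c("Nicaragua", "South Africa"))

# slice() picks rows by position instead of by condition
first_two <- slice(toy, 1:2)

print(nic_high_ph)
print(nic_or_sa)
```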
arrange()
• arrange() works similarly to filter() except that
instead of filtering or selecting rows, it reorders
them.
#2.0 #### arrange
arrange(tree, Site,PosTopoSeq,VegStructure)
tree_arr<-arrange(tree, Site,PosTopoSeq,VegStructure)
tree_arr<-arrange(tree, desc(Site),PosTopoSeq,VegStructure)
select()
• Very helpful when working with a dataset with many
columns/variables
• Helper functions within select() include starts_with(),
ends_with(), matches() and contains()
#2.0 #### select
tree_select<-select(tree,Country,SEVEREERO,avSlope,avTreeDen,Carbon,pH,Clay)
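tree_select<-select(tree,Country,SEVEREERO,avSlope,avTreeDen,Carbon,pH>=5,Clay)
# err!!!! What is happening here?
# select() chooses columns by name; it does not accept a row condition such as pH>=5.
# To keep only rows with pH>=5, use filter(tree, pH>=5) and then select() the columns.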
tree_select<-select(tree,-c(Site,PosTopoSeq,VegStructure))
tree_select<-select(tree,-(Site:VegStructure))
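One way to get what the failing `pH>=5` call seems to be after is to split the job in two: filter() for the rows, then select() for the columns. This is a sketch on a hypothetical data frame `toy`, not the seminar dataset; the column names follow the slides.

```r
library(dplyr)

# Toy data with hypothetical values
toy <- data.frame(
  Country = c("Kenya", "Kenya", "Nicaragua", "Nicaragua"),
  Site    = c("A", "B", "C", "D"),
  Carbon  = c(2.1, 1.8, 3.0, 2.6),
  pH      = c(5.2, 6.8, 4.1, 7.0),
  Clay    = c(30, 25, 40, 35),
  stringsAsFactors = FALSE
)

# Step 1: filter() keeps the rows that meet the condition
# Step 2: select() keeps the named columns
toy_select <- toy %>%
  filter(pH >= 5) %>%
  select(Country, Carbon, pH, Clay)

print(toy_select)
```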
select()
#2.0.1 select and helper functions
# Keep variables or drop if negative sign (-)
select(tree, starts_with("av",ignore.case=T),starts_with("C"))
select(tree, ends_with("e"))
select(tree, contains("p"))
select(tree, matches("av"))
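To see what each helper matches, it can help to look only at the column names that come back. The one-row data frame `toy` below is hypothetical and uses a subset of the column names from the slides. Note that the helpers ignore case by default, and that matches() takes a regular expression.

```r
library(dplyr)

# One-row toy data frame (hypothetical values); only the names matter here
toy <- data.frame(
  Country = "Kenya", Site = "A", avSlope = 12, avTreeDen = 40,
  Carbon = 2.1, pH = 6.3, Clay = 30,
  stringsAsFactors = FALSE
)

# Columns come back in the order the helpers are listed
n_start <- names(select(toy, starts_with("av"), starts_with("C")))

# Names ending in "e"
n_end <- names(select(toy, ends_with("e")))

# Names containing "p" anywhere (case is ignored, so "pH" and "avSlope" both match)
n_contains <- names(select(toy, contains("p")))

# matches() uses a regular expression; "^av" anchors the match at the start
n_matches <- names(select(toy, matches("^av")))

# A minus sign drops the matched columns instead of keeping them
n_drop <- names(select(toy, -starts_with("av")))

print(n_start)
print(n_end)
print(n_contains)
print(n_matches)
print(n_drop)
```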
rename()
• Assigns a new name to an existing
variable (new name = old name)
#2.1 #### rename
tree_rename<-rename(tree,Slope=avSlope)
tree_rename<-rename(tree,Slope=avSlope,TreeDen=avTreeDen)
distinct()
• Extract distinct (unique) rows
#3.0 ### distinct
tree_distinct<-distinct(tree)
tree_distinct<-distinct(select(tree,Country,Site,PosTopoSeq))
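distinct() can also take the column names directly, which avoids the nested select(). The sketch below assumes a dplyr version that has the `.keep_all` argument; the data frame `toy` and its values are hypothetical.

```r
library(dplyr)

# Toy data with hypothetical values; rows 1 and 2 share the same Country and Site
toy <- data.frame(
  Country = c("Kenya", "Kenya", "Nicaragua"),
  Site    = c("A", "A", "B"),
  Carbon  = c(1.0, 2.0, 3.0),
  stringsAsFactors = FALSE
)

# Unique combinations of the named columns; other columns are dropped
combos <- distinct(toy, Country, Site)

# .keep_all = TRUE keeps the other columns, taken from the first row of each combination
first_rows <- distinct(toy, Country, Site, .keep_all = TRUE)

print(combos)
print(first_rows)
```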
mutate()
• Adds new columns that are functions of
existing columns.
#4.0 ### Mutate
tree_mute<-mutate(tree,Acidbase = 7-pH,clay.cover = Clay / avTreeDen)
#4.0.1 ### transmute (keeps only the new columns)
tree_trans<-transmute(tree,Acidbase = 7-pH,clay.cover = Clay / avTreeDen)
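The difference between the two calls shows up in which columns come back. A minimal sketch on a hypothetical data frame `toy`, reusing the formulas from the slide:

```r
library(dplyr)

# Toy data with hypothetical values
toy <- data.frame(
  pH        = c(5, 7, 8),
  Clay      = c(20, 30, 40),
  avTreeDen = c(10, 15, 20)
)

# mutate() keeps the existing columns and appends the new ones
with_new <- mutate(toy, Acidbase = 7 - pH, clay.cover = Clay / avTreeDen)

# transmute() returns only the columns created in the call
only_new <- transmute(toy, Acidbase = 7 - pH, clay.cover = Clay / avTreeDen)

print(names(with_new))
print(names(only_new))
```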
sample_n()
• Use sample_n() and sample_frac() to take a
random sample of rows
#5.0 ### sample_n()
sample_n(tree, 10,replace=F)
#5.0.1 ### sample_frac()
sample_frac(tbl=tree, size=0.1)
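Random samples change on every run unless the random seed is fixed first. The sketch below uses base R's set.seed() for that; the seed value 42 and the data frame `toy` are arbitrary choices for illustration. In dplyr 1.0.0 and later, slice_sample() supersedes both functions, although they still work.

```r
library(dplyr)

# Toy data with hypothetical values: 10 rows
toy <- data.frame(id = 1:10, Carbon = seq(1, 10))

# Fixing the seed makes the same rows come back each time
set.seed(42)
s1 <- sample_n(toy, 3, replace = FALSE)

set.seed(42)
s2 <- sample_n(toy, 3, replace = FALSE)

# sample_frac() takes a proportion of the rows instead of a count
half <- sample_frac(toy, size = 0.5)

print(s1)
print(nrow(half))
```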
summarise()
• Generates summary statistics from existing columns/variables.
Also generates statistics by grouping variable(s)
summarise(tree,
count = n(),
MeanCarb = mean(Carbon, na.rm = TRUE),
MeanClay = mean(Clay, na.rm = TRUE),
MedPh=median(pH,na.rm=T))
summarise()
• Stats by grouping variable(s)
tree.summary <- tree %>%
group_by(Country,Site,SEVEREERO) %>%
summarise(count = n(),
meanC = mean(Carbon,na.rm=T),
meanClay = mean(Clay,na.rm=T),
sdC=sd(Carbon,na.rm=T),
sdClay=sd(Clay,na.rm=T),
medPh=median(pH,na.rm=T))
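A small worked example of the grouped pattern, on a hypothetical data frame `toy` with one missing Carbon value. It shows that n() counts every row in the group, including rows where the variable is NA, while na.rm=TRUE drops the NA before the mean and standard deviation are computed. The extra column `nonMissing` is an addition for illustration, not part of the slides.

```r
library(dplyr)

# Toy data with hypothetical values; one Carbon value is missing
toy <- data.frame(
  Country = c("Kenya", "Kenya", "Nicaragua", "Nicaragua", "Nicaragua"),
  Carbon  = c(1, 3, 2, NA, 4),
  stringsAsFactors = FALSE
)

toy.summary <- toy %>%
  group_by(Country) %>%
  summarise(count      = n(),                   # rows per group, NA rows included
            nonMissing = sum(!is.na(Carbon)),   # rows with a Carbon value
            meanC      = mean(Carbon, na.rm = TRUE),
            sdC        = sd(Carbon, na.rm = TRUE))

print(toy.summary)
```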
R Version
>R.Version()$version.string
OR
>R.version.string
BONUS
Update R
For Windows OS
# installing/loading the package:
>if(!require(installr)) {
   install.packages("installr")
   require(installr)} #load / install+load installr
# using the package:
>updateR() # this will start the updating process of your R installation.
Note: It will check for newer versions, and if one is available, will guide you
through the decisions you'd need to make.
Exercise
Use the data you are working on and:
1. Manipulate it using the functions above
2. Explore more dplyr functions, e.g. how to add data row-wise,
column-wise, etc.
