1
INTRODUCTION TO R
STACY IRWIN
FEB 2017
2
•What is R?
•Data Structures and Types
•Syntax
•Statistics
•Visualizations
•File I/O
•Packages
•Finding Help
•More Syntax & Common Functions
OVERVIEW
3
• Presentation steps through code and responses (Quick Starts)
user.input() Input at the prompt, typed by you
#> console output Results produced after hitting enter
• Followed by slides with functions and descriptions (Basics)
... More data was produced, but not displayed here for space
### Comment, note, tip
FORMAT
4
WHAT IS R?
5
• R, the [interpreted] language
• 800,000 lines of code
• 45% C
• 19% R
• 17% Fortran
• R, the implementation(s)
• GNU-R is the most popular implementation
• Open source version (GNU) of the S language and environment
• S was developed at Bell Labs by John Chambers et al.
• Licensed under the GNU General Public License (GPL)
https://www.r-project.org/about.html
WHAT IS R?
6
WHY R?
WHY NOT R?
7
• R is easy to learn, intuitive
• R was made for statistics
• R makes great graphics, publication quality
WHY R?
12
• R is easy to learn, intuitive
• R was made for statistics
• R makes great graphics, pub quality
• R is optimized to work with tabular data structures
• R – it’s fast
• R is versatile – thousands of packages on CRAN alone
• R is open source
BUT…
• Memory limitations***
• Some data wrangling problems are clumsy
WHY R?
…enough!
13
• GNU-R 3.3.2
• https://cran.r-project.org
• Microsoft R Open / MRO
• https://mran.microsoft.com/download/
• RStudio dev environment
• https://www.rstudio.com/
…and many other implementations
HOW TO GET R?
14
DATA STRUCTURES AND TYPES
15
• Vector 1-dimensional array of elements of the same kind
• Scalar?
• Matrix 2D array of elements of the same kind
• Array multi-dimensional structure of one kind of value
• List Ordered collection whose elements can be of different types (including other lists)
• Data frame 2D structure of possibly different types of columns
DATA STRUCTURES
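A minimal sketch of the structures listed above, using only base R. The variable names are illustrative and not from the slides. On the "Scalar?" question: R has no separate scalar type, so a single value is simply a vector of length 1.

```r
v <- c(2.5, 3.5, 4.5)                  # vector: one type, one dimension
s <- 42                                # "scalar": a vector of length 1
m <- matrix(1:6, nrow = 2, ncol = 3)   # matrix: 2D, one type
a <- array(1:24, dim = c(2, 3, 4))     # array: n-dimensional, one type
l <- list(1:3, "text", list(TRUE))     # list: elements of any type
d <- data.frame(id = 1:3,              # data frame: columns may differ in type
                name = c("a", "b", "c"))

length(s)      # 1
dim(m)         # 2 3
dim(a)         # 2 3 4
class(d)       # "data.frame"
```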
16
• Logical TRUE/FALSE or T/F
• Integers …-1,0,1,2,3…
• Double or “numeric” 1.0, 3.14, 4.002e-6
• Character "Hello", '123abc'
Some query and coercion functions
Informative: typeof( )
T/F Testing: is.numeric( )
is.character( )
Coercion: as.numeric("02")
as.character(3.1415)
DATA VALUE TYPES
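The query and coercion functions above can be tried directly at the prompt. This is a short sketch in base R; as.integer( ) is added here for illustration and is not on the slide.

```r
typeof(TRUE)           # "logical"
typeof(1L)             # "integer" (the L suffix requests an integer)
typeof(1)              # "double"
typeof("Hello")        # "character"

is.numeric(1L)         # TRUE: integers count as numeric
is.character(3.1415)   # FALSE

as.numeric("02")       # 2
as.character(3.1415)   # "3.1415"
as.integer(3.9)        # 3: truncates toward zero, does not round
```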
17
1:5 ### integer sequence
#> [1] 1 2 3 4 5
x <- 1:5 ### assignment
x ### evaluation. Try: x+x
#> [1] 1 2 3 4 5
y = x^2
print(y)
#> [1] 1 4 9 16 25
y == x ### comparison. What results?
#> [1] TRUE FALSE FALSE FALSE FALSE
QUICK START: NUMERICS
18
"Hello, World!" ### char string
#> [1] "Hello, World!"
c("Smart", "Data") -> x ### vector of strings
LETTERS
#> [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"…
LETTERS[1:5] -> myABC
c(x, myABC)
#> [1] "Smart" "Data" "A" "B" "C" "D" "E"
c(x, y)
#> [1] "Smart" "Data" "1" "4" "9" "16" "25"
QUICK START: CHARACTERS
19
• Command line: Evaluates the line entered, or if the line is incomplete,
it waits for the end of an expression
• Variable names: Consist of numbers, letters, underscores, and periods
• Must start with a letter, or a period not followed by a digit …CASE sensitive!
• Assignment: <- (preferred), = right to left
-> left to right
• Comparison: == != < > <= >=
SYNTAX BASICS
20
; Ends a statement
x <- 8; print(x)
# Comments out anything following
#but only on that line
( ) Groups expressions, enclosing function arguments
print(2*(6+c((1:4)+x)))
{ } Encloses groups of expressions, loops, if/then
if (x) { print(c("x^2 =", x^2))
print("Done.") }
[ [[ $ Subsets elements of a data object
SYNTAX BASICS
21
? <fxn> Opens help page for the command/concept/constant
?? <term> Lists help pages with term in their content
typeof( ) Identifies the data type of the object
str( ) Shows structure of data object, types of cols in df
length( ), nchar( )
Counts elements in vectors, letters in string
dim( ), nrow( ), ncol( )
Returns dimensions of data object
names( ), colnames( ), rownames( )
Displays only list/column/row names
SELF-HELP BASICS
22
NA missing values
NaN not a number
Inf infinity
NULL empty/nothing
• NA has a type, determined at time of assignment
• Mixed types are coerced into the most flexible type
x <- c(TRUE, 1, 4.4, NA) ; typeof(x[4])
#> [1] "double"
• Predefined constants in base R:
letters, LETTERS, month.abb, month.name, pi
SPECIAL VALUES
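A brief sketch of how the special values behave in practice, in base R only. The vector x is redefined here for illustration.

```r
x <- c(1, NA, 3)
is.na(x)                  # FALSE TRUE FALSE
mean(x)                   # NA: missing values propagate
mean(x, na.rm = TRUE)     # 2

0 / 0                     # NaN
1 / 0                     # Inf
is.na(NaN)                # TRUE: NaN also counts as missing
is.nan(NA)                # FALSE

length(NULL)              # 0
length(NA)                # 1
typeof(NA)                # "logical" by default
typeof(NA_character_)     # "character"
```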
23
list(11:15)
#> [[1]]
#> [1] 11 12 13 14 15
X <- list(A = 1:5, B = c("Yes", "No")); X
#> $A
#> [1] 1 2 3 4 5
#> $B
#> [1] "Yes" "No"
QUICK START: LISTS
24
Y <- list(a = list(b = list(c = "R"))); str(Y)
#> List of 1
#> $ a:List of 1
#> ..$ b:List of 1
#> .. ..$ c: chr "R"
Y[[1]][[1]][[1]]
#> [1] "R"
### also: Y$a$b$c
### also: Y[['a']][['b']][['c']]
c(X, Y) ### try it!
QUICK START: LISTS
25
[[ , $   Simplifying subset: returns the simplest data structure
[        Preserving subset: output structure == input structure
str(X)
#> List of 2
#> $ A: int [1:5] 1 2 3 4 5
#> $ B: chr [1:2] "Yes" "No"
SUBSETTING LISTS
26
[[ , $   Simplifying subset: returns the simplest data structure, e.g. X[[1]] returns the int vector
[        Preserving subset: output structure == input structure, e.g. X[1] returns a list holding the int vector
X[[1]]
#> [1] 1 2 3 4 5 ### vector
X[1]
#> $A ### list...
#> [1] 1 2 3 4 5 ### holding a vector
SUBSETTING LISTS
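The difference between the two subsetting styles matters as soon as the result is passed to another function. A small sketch in base R, rebuilding the list X from the Quick Start; the names inner and outer are illustrative.

```r
X <- list(A = 1:5, B = c("Yes", "No"))

inner <- X[["A"]]    # simplifying: the integer vector itself
outer <- X["A"]      # preserving: a list of length 1 holding that vector

typeof(inner)        # "integer"
typeof(outer)        # "list"
sum(inner)           # 15; sum(outer) would fail because outer is a list
X$B[2]               # "No"
X[c("A", "B")]       # only [ can select several elements at once
```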
27
mtcars
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#> ...
Row names (left) and column names (top) are not data ... just referential metadata
QUICK START: DATAFRAMES
28
mtcars[3,2] # [row, col] ### What will this return?
#> [1] 4
QUICK START: DATAFRAMES
29
mtcars[1]
#> mpg
#> Mazda RX4 21.0
#> Mazda RX4 Wag 21.0
#> Datsun 710 22.8
#> Hornet 4 Drive 21.4
#>...
mtcars[[1]] # also: mtcars$mpg or mtcars[,'mpg']
#> [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
mtcars[,1]
#> [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
mtcars[1,]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21 6 160 110 3.9 2.62 16.46 0 1 4 4
QUICK START: DATAFRAMES
Recall: indexing is [row, col]; a single index, as in mtcars[1], selects the first list element
30
[[ , $   Simplifying subset: simplest data structure, e.g. mtcars[[1]] returns the contents of the 1st column
[        Preserving subset: output structure == input structure, e.g. mtcars[1] returns the first column as a one-column data.frame ($mpg)
mtcars[[1]]
#> [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 ...
mtcars[1]
#> mpg
#> Mazda RX4 21.0
#> Mazda RX4 Wag 21.0
#> Datsun 710 22.8
#> Hornet 4 Drive 21.4
#> ...
SUBSETTING DATAFRAMES
data.frame columns ARE list elements
31
str(mtcars)
#> 'data.frame': 32 obs. of 11 variables:
#> $ mpg : num 21 21 22.8 21.4 18.7 18.1...
#> $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
#> ...
### BUT WAIT ###
typeof(mtcars)
#> [1] "list"
EVEN DATAFRAMES ARE LISTS!
32
mtcars[mtcars$cyl == 4 & mtcars$gear == 4, ]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#> Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
#> Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
#> Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
#> Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
#> Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
MT <- mtcars[mtcars$gear > 4, c(1,4,6)] ; MT # only show mpg, hp, wt
#> mpg hp wt
#> Porsche 914-2 26.0 91 2.140
#> Lotus Europa 30.4 113 1.513
#> Ford Pantera L 15.8 264 3.170
#> Ferrari Dino 19.7 175 2.770
#> Maserati Bora 15.0 335 3.570
QUICK START: DATAFRAMES
“which” also returns an index vector:
mtcars[which(mtcars$gear > 4), …]
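The row filters above can be written in several equivalent ways. The sketch below compares a logical index, which( ), and subset( ); subset( ) is a base R function not mentioned on the slide, and the object names are illustrative. The last lines show where a logical index and which( ) differ: which( ) drops NA results.

```r
cols <- c("mpg", "hp", "wt")

by_logical <- mtcars[mtcars$gear > 4, cols]          # logical index
by_which   <- mtcars[which(mtcars$gear > 4), cols]   # integer positions
by_subset  <- subset(mtcars, gear > 4, select = c(mpg, hp, wt))

nrow(by_logical)        # 5 cars have more than 4 gears
rownames(by_which)

# The approaches differ when the condition yields NA:
g <- c(5, NA, 3)
g > 4                   # TRUE NA FALSE
which(g > 4)            # 1 (the NA is dropped)
```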
33
tf <- c(T,F)
df <- data.frame(x = 1:5, y = letters[1:10], z = tf)
df
#> x y z
#> 1 1 a TRUE
#> 2 2 b FALSE
#> 3 3 c TRUE
#> 4 4 d FALSE
#> 5 5 e TRUE
#> 6 1 f FALSE
#> 7 2 g TRUE
#> 8 3 h FALSE
#> 9 4 i TRUE
#> 10 5 j FALSE
QUICK START: DATAFRAMES
Combining and amending:
cbind( ), rbind( )
merge( )
data.frame(df, tf)
df$newCol <- NA
Deleting column x (any one of these):
df[-1] -> df
df[,-1] -> df
df$x <- NULL
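A small sketch of the combining functions named above, in base R. The data frames left and right are made up for illustration and are not from the slides.

```r
left  <- data.frame(id = 1:3, score = c(90, 85, 70))
right <- data.frame(id = c(2, 3, 4), grade = c("B", "C", "D"))

more_rows <- rbind(left, data.frame(id = 4, score = 60))  # append a row
more_cols <- cbind(left, passed = left$score >= 75)       # append a column

inner <- merge(left, right, by = "id")               # only ids in both: 2 and 3
full  <- merge(left, right, by = "id", all = TRUE)   # all ids, NA where unmatched

nrow(more_rows)   # 4
ncol(more_cols)   # 3
inner$id          # 2 3
nrow(full)        # 4
```

rbind( ) needs matching column names, cbind( ) needs matching row counts, and merge( ) matches rows on the key column given in by.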
34
• With what you know about lists and dataframes…
• What happens when we execute these lines?
c(df, df)
typeof(c(df, df))
str(c(df, df))
c(df, mtcars)
DATAFRAMES QUIZ
35
BASIC STATISTICS
36
summary
round, ceiling, floor
sin, cos, …, exp, log, log10, log2
sum, diff, filter
union, intersect
mean, sd, var, weighted.mean
median, quantile, fivenum (no built-in statistical mode; mode( ) returns the storage mode)
min, max, range
&, |, !, xor
all, any
SELECTED STATS FUNCTIONS
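A few of the functions above applied to the built-in mtcars data, as a sketch. The helper stat_mode( ) is hypothetical: base R has no function for the statistical mode, and this is just one possible way to write one.

```r
mpg <- mtcars$mpg

summary(mpg)                       # min, quartiles, median, mean, max
range(mpg)                         # 10.4 33.9
median(mpg)                        # 19.2
quantile(mpg, c(0.25, 0.75))       # first and third quartiles
round(sd(mpg), 2)
weighted.mean(c(1, 2, 3), w = c(3, 2, 1))

any(mpg > 30)                      # TRUE
all(mpg > 10)                      # TRUE

# Hypothetical helper: most frequent value, returned as a string
stat_mode <- function(v) {
  counts <- table(v)
  names(counts)[which.max(counts)]
}
stat_mode(mtcars$cyl)              # "8"
```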
37
mymodel <- lm(MT$mpg ~ MT$hp) ### y ~ x
mymodel
#>
#> Call:
#> lm(formula = MT$mpg ~ MT$hp)
#>
#> Coefficients:
#> (Intercept) MT$hp
#> 32.77745 -0.05827
LINEAR MODEL EXAMPLE
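The same model can also be fitted with the data argument, which keeps the formula short and makes prediction on new data possible. A sketch, rebuilding MT as defined earlier; the 200 HP car is a made-up input.

```r
MT <- mtcars[mtcars$gear > 4, c("mpg", "hp", "wt")]

fit <- lm(mpg ~ hp, data = MT)     # same model as lm(MT$mpg ~ MT$hp)
coef(fit)                          # intercept about 32.78, slope about -0.058
summary(fit)$r.squared             # share of variance explained

# Predict MPG for a hypothetical 200 HP car
predict(fit, newdata = data.frame(hp = 200))
```

With only five cars in MT, the fit is an illustration of the syntax rather than a meaningful model.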
38
VISUALIZATIONS
39
plot(x <- mtcars$hp, y <- mtcars$mpg,
xlab = "HP", ylab = "MPG", pch = 20)
### assignment can occur inside plot call
### also: plot(y ~ x)
BASIC PLOTS: SCATTER PLOT
40
SCATTER PLOT
41
plot(x <- mtcars$hp, y <- mtcars$mpg,
xlab = "HP", ylab = "MPG", pch = 20)
### assignment can occur inside plot call
### also: plot(y ~ x)
mymodel <- lm(y ~ x)
abline(mymodel, col="red", lwd=2)
BASIC PLOTS: SCATTER PLOT
42
TREND LINE
43
• We can place multiple plots on one window:
par(mfrow = c(1, 2)) ### request 1x2 layout
hist(mtcars$hp, xlab = "Horsepower",
main = "Histogram of HP")
hist(mtcars$mpg, xlab = "Miles per gallon",
breaks = 10, main = "Histogram of MPG")
BASIC PLOTS: HISTOGRAM
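The plotting calls above draw to the screen. As a sketch, the same histograms can be sent to a file by opening a graphics device first; pdf( ) and dev.off( ) are base R, while the output path is a temporary file chosen for illustration. Saving and restoring par( ) avoids leaving the 1x2 layout active for later plots.

```r
out <- tempfile(fileext = ".pdf")   # illustrative output location
pdf(out, width = 8, height = 4)     # open a file device instead of a window

old <- par(mfrow = c(1, 2))         # 1x2 layout; par() returns the old settings
hist(mtcars$hp, xlab = "Horsepower", main = "Histogram of HP")
hist(mtcars$mpg, xlab = "Miles per gallon", breaks = 10,
     main = "Histogram of MPG")
par(old)                            # restore the previous layout

dev.off()                           # close the device so the file is written
file.exists(out)
```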
44
HISTOGRAMS
45
FILE I/O
46
readLines(con = "C:/Users/stacy/text.txt")
returns vector of character strings
read.table(file = "…") returns data.frame of rows & cols
read.csv(file = "…") read.table( ) with sep = ",", header = TRUE
read.delim(file = "…") read.table( ) with sep = "\t", header = TRUE
writeLines(text = …, con = "…")
write(x = …, file = "…")
write.csv(x, "…")
write.table(x, "…")
FILE I/O BASICS
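A round trip through a CSV file, as a minimal sketch. It writes to a temporary file so that it runs anywhere; the path and object names are illustrative.

```r
path <- tempfile(fileext = ".csv")        # temporary file for the example

write.csv(head(mtcars, 3), path)          # row names go into the first column
back <- read.csv(path, row.names = 1)     # read them back as row names

dim(back)                                 # 3 11
rownames(back)                            # "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"

lines <- readLines(path)                  # the same file as raw text
length(lines)                             # 4: one header line plus three rows
```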
47
mydata <- readLines("http://ford.com", n = 10) ; mydata
#> [1] "<!DOCTYPE HTML>"
#> [2] "<html>"
#> [3] " <!-- appVersion: 09302016 rel_21.0 -->"
#> [4] " <!-- htmllint preset="$none" -->"
#> [5] " <head>"
#> [6] " <meta http-equiv="content-type" content="text/htm
#> [7] " "
#> [8] " <meta name="keywords">"
#> [9] " <meta name="description" content="The Official Fo
#>[10] ""
save(mydata, file = "…")
load("…")
FILE I/O BASICS
48
PACKAGES
49
•Package-related commands:
library() Lists installed packages
search() Lists loaded packages
install.packages("mylib") Installs package called “mylib”
library(mylib) Loads mylib into current env
require(mylib) …used inside other functions
https://cran.r-project.org/web/packages/
PACKAGE MANAGEMENT
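library( ) stops with an error when a package is missing, while require( ) returns FALSE with a warning. One common pattern, sketched here, is to check with requireNamespace( ) before attaching. The stats package is used only because it ships with R; substitute any package name.

```r
pkg <- "stats"                          # ships with R, so this runs anywhere

if (requireNamespace(pkg, quietly = TRUE)) {
  library(pkg, character.only = TRUE)   # needed when the name is in a variable
} else {
  message("Install first: install.packages(\"", pkg, "\")")
}

"package:stats" %in% search()           # TRUE once attached
```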
50
plyr Tools for splitting, applying and combining data
data.table Extension of data.frame, highly optimized
ggplot2 An Implementation of the Grammar of Graphics
colorspace Color Space Manipulation
shiny Web Application Framework for R
chron Chronological Objects which handle dates and times
RCurl General Network (HTTP/FTP/...) Client Interface for R
wordcloud Make Word Clouds
rjson, RJSONIO JSON tools for R
htmltools Tools for HTML
pdftools Extract Text and Data from PDF Documents
xlsx Read, write, format Excel 2007 and Excel 97/2000/XP/2003 files
XML Tools for Parsing and Generating XML Within R and S-Plus
xtable Export Tables to LaTeX or HTML
MY "MUST-HAVE" & OTHER POPULAR PACKAGES
51
FINDING HELP
52
• Beginning R (Wiley) FREE Chapter 1 online
• Mark Gardener
• R Graphics Cookbook (O'Reilly) FREE full text online
• Winston Chang
• Main focus is on ggplot2 package. Problem-Solution format
• R for Data Science (O'Reilly) FREE full text online
• Hadley Wickham and Garrett Grolemund
• Advanced R (CRC Press) FREE full text online
• Hadley Wickham
RECOMMENDED BOOKS
53
• StackExchange
• StackOverflow http://stackoverflow.com/tags/r
• CrossValidated http://stats.stackexchange.com
• Post questions with MWE = Minimum Working Example
• R-bloggers https://www.r-bloggers.com
• News, Tutorials, Jobs … common issues often documented clearly
• R help mailing list https://stat.ethz.ch/mailman/listinfo/r-help
• Quick-R (2014) http://www.statmethods.net/
• Robert I. Kabacoff, Ph.D.
RECOMMENDED ONLINE HELP & TUTORIALS
www.modusoperandi.com
709 South Harbor City Blvd., Suite 400
Melbourne, FL 32901-1936
321-473-1400
Stacy Irwin
sirwin@modusoperandi.com
sirwin@gmail.com
55
MORE COMMANDS & SYNTAX
56
ls(), rm( ) list, remove objects from memory
getwd(), setwd("…") get, set working directory
grep, grepl, gsub grep family
match, identical, setdiff, unique, %in%
matching functions
• String manipulation:
nchar, strsplit, unlist, paste0, pmatch
toupper, tolower, sub, strtrim, strtoi
help.search(keyword = "character")
COMMON FUNCTIONS
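Several of the string and matching functions above, applied to one example string. A sketch in base R; the string and variable names are illustrative.

```r
s <- "Introduction to R"

nchar(s)                          # 17
toupper(s)                        # "INTRODUCTION TO R"
words <- unlist(strsplit(s, " ")) # "Introduction" "to" "R"
paste0(words, collapse = "_")     # "Introduction_to_R"

grepl("R$", s)                    # TRUE: is the pattern found?
grep("o", words)                  # 1 2: positions of matching elements
sub("o", "0", s)                  # "Intr0duction to R" (first match only)
gsub("o", "0", s)                 # "Intr0ducti0n t0 R" (all matches)

"R" %in% words                    # TRUE
setdiff(words, "to")              # "Introduction" "R"
```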
57
for(i in 1:100) ...
for(myLetter in LETTERS) ...
while(i < 100) i <- i+5
if(this == that) <do_something>
if(this %in% that) {
<do_this1>
<do_this2>
} else {
<do_that1>
<do_that2>
}
LOOPS AND CONDITIONAL FUNCTIONS
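The templates above, filled in as a runnable sketch. The variables total, vowels, and found are made up for illustration. The last two lines show that many loops can be replaced by vectorized expressions.

```r
total <- 0
for (i in 1:100) total <- total + i     # total is now 5050

i <- 0
while (i < 100) i <- i + 5              # stops when i reaches 100

vowels <- c("A", "E", "I", "O", "U")
found <- character(0)
for (myLetter in LETTERS[1:10]) {
  if (myLetter %in% vowels) {
    found <- c(found, myLetter)         # collects "A", "E", "I"
  } else {
    next                                # skip consonants
  }
}

# Vectorized equivalents
sum(1:100)                               # 5050
LETTERS[1:10][LETTERS[1:10] %in% vowels] # "A" "E" "I"
```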

Weitere ähnliche Inhalte

Was ist angesagt?

Lecture3 mysql gui by okello erick
Lecture3 mysql gui by okello erickLecture3 mysql gui by okello erick
Lecture3 mysql gui by okello erickokelloerick
 
What's New in MariaDB Server 10.3
What's New in MariaDB Server 10.3What's New in MariaDB Server 10.3
What's New in MariaDB Server 10.3MariaDB plc
 
Boost performance with MySQL 5.1 partitions
Boost performance with MySQL 5.1 partitionsBoost performance with MySQL 5.1 partitions
Boost performance with MySQL 5.1 partitionsGiuseppe Maxia
 
Intro to Forth - 2018/09/13 ACM Greenville
Intro to Forth - 2018/09/13 ACM GreenvilleIntro to Forth - 2018/09/13 ACM Greenville
Intro to Forth - 2018/09/13 ACM GreenvilleDave Johnson
 
Patterns for slick database applications
Patterns for slick database applicationsPatterns for slick database applications
Patterns for slick database applicationsSkills Matter
 
Python data structures
Python data structuresPython data structures
Python data structuresHarry Potter
 
MariaDB Server 10.3 - Temporale Daten und neues zur DB-Kompatibilität
MariaDB Server 10.3 - Temporale Daten und neues zur DB-KompatibilitätMariaDB Server 10.3 - Temporale Daten und neues zur DB-Kompatibilität
MariaDB Server 10.3 - Temporale Daten und neues zur DB-KompatibilitätMariaDB plc
 
data constraints,group by
data constraints,group by data constraints,group by
data constraints,group by Visakh V
 
Lecture2 mysql by okello erick
Lecture2 mysql by okello erickLecture2 mysql by okello erick
Lecture2 mysql by okello erickokelloerick
 
Lecture5 my sql statements by okello erick
Lecture5 my sql statements by okello erickLecture5 my sql statements by okello erick
Lecture5 my sql statements by okello erickokelloerick
 

Was ist angesagt? (14)

mysqlHiep.ppt
mysqlHiep.pptmysqlHiep.ppt
mysqlHiep.ppt
 
Lecture3 mysql gui by okello erick
Lecture3 mysql gui by okello erickLecture3 mysql gui by okello erick
Lecture3 mysql gui by okello erick
 
What's New in MariaDB Server 10.3
What's New in MariaDB Server 10.3What's New in MariaDB Server 10.3
What's New in MariaDB Server 10.3
 
Select To Order By
Select  To  Order BySelect  To  Order By
Select To Order By
 
Boost performance with MySQL 5.1 partitions
Boost performance with MySQL 5.1 partitionsBoost performance with MySQL 5.1 partitions
Boost performance with MySQL 5.1 partitions
 
MySQL constraints
MySQL constraintsMySQL constraints
MySQL constraints
 
Intro to Forth - 2018/09/13 ACM Greenville
Intro to Forth - 2018/09/13 ACM GreenvilleIntro to Forth - 2018/09/13 ACM Greenville
Intro to Forth - 2018/09/13 ACM Greenville
 
Patterns for slick database applications
Patterns for slick database applicationsPatterns for slick database applications
Patterns for slick database applications
 
Python data structures
Python data structuresPython data structures
Python data structures
 
MariaDB Server 10.3 - Temporale Daten und neues zur DB-Kompatibilität
MariaDB Server 10.3 - Temporale Daten und neues zur DB-KompatibilitätMariaDB Server 10.3 - Temporale Daten und neues zur DB-Kompatibilität
MariaDB Server 10.3 - Temporale Daten und neues zur DB-Kompatibilität
 
data constraints,group by
data constraints,group by data constraints,group by
data constraints,group by
 
Lecture2 mysql by okello erick
Lecture2 mysql by okello erickLecture2 mysql by okello erick
Lecture2 mysql by okello erick
 
Lecture5 my sql statements by okello erick
Lecture5 my sql statements by okello erickLecture5 my sql statements by okello erick
Lecture5 my sql statements by okello erick
 
Sql
SqlSql
Sql
 

Andere mochten auch

Andere mochten auch (13)

La crisis mundial de 1929
La crisis mundial de 1929La crisis mundial de 1929
La crisis mundial de 1929
 
Diputados y Senadores Nacionales de la Patagonia.
Diputados y  Senadores Nacionales de la Patagonia.Diputados y  Senadores Nacionales de la Patagonia.
Diputados y Senadores Nacionales de la Patagonia.
 
Ley 254
Ley 254 Ley 254
Ley 254
 
Cma visitas analistas teck-glencore 2017
Cma   visitas analistas teck-glencore 2017Cma   visitas analistas teck-glencore 2017
Cma visitas analistas teck-glencore 2017
 
MOGOK TOWN GEMS MINE AREA ILLUSTRATION COLLECTION
MOGOK TOWN GEMS MINE AREA ILLUSTRATION COLLECTIONMOGOK TOWN GEMS MINE AREA ILLUSTRATION COLLECTION
MOGOK TOWN GEMS MINE AREA ILLUSTRATION COLLECTION
 
Step By Step Guide to Learn R
Step By Step Guide to Learn RStep By Step Guide to Learn R
Step By Step Guide to Learn R
 
3Com 2150A026
3Com 2150A0263Com 2150A026
3Com 2150A026
 
Taller 2 keila
Taller 2 keilaTaller 2 keila
Taller 2 keila
 
Fly Fishing Advice for the Novice
Fly Fishing Advice for the NoviceFly Fishing Advice for the Novice
Fly Fishing Advice for the Novice
 
Animales que-nos-alimentan
Animales que-nos-alimentanAnimales que-nos-alimentan
Animales que-nos-alimentan
 
Dhammapada
DhammapadaDhammapada
Dhammapada
 
Escuelas psicologicas
Escuelas psicologicas Escuelas psicologicas
Escuelas psicologicas
 
PLE GOBIERNO ESCOLAR
PLE GOBIERNO ESCOLARPLE GOBIERNO ESCOLAR
PLE GOBIERNO ESCOLAR
 

Ähnlich wie Introduction to R

Data manipulation and visualization in r 20190711 myanmarucsy
Data manipulation and visualization in r 20190711 myanmarucsyData manipulation and visualization in r 20190711 myanmarucsy
Data manipulation and visualization in r 20190711 myanmarucsySmartHinJ
 
R Programming: Transform/Reshape Data In R
R Programming: Transform/Reshape Data In RR Programming: Transform/Reshape Data In R
R Programming: Transform/Reshape Data In RRsquared Academy
 
Data manipulation on r
Data manipulation on rData manipulation on r
Data manipulation on rAbhik Seal
 
Read/Import data from flat/delimited files into R
Read/Import data from flat/delimited files into RRead/Import data from flat/delimited files into R
Read/Import data from flat/delimited files into RRsquared Academy
 
R tutorial for a windows environment
R tutorial for a windows environmentR tutorial for a windows environment
R tutorial for a windows environmentYogendra Chaubey
 
Basic R Data Manipulation
Basic R Data ManipulationBasic R Data Manipulation
Basic R Data ManipulationChu An
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Writing Readable Code with Pipes
Writing Readable Code with PipesWriting Readable Code with Pipes
Writing Readable Code with PipesRsquared Academy
 
R Programming: Numeric Functions In R
R Programming: Numeric Functions In RR Programming: Numeric Functions In R
R Programming: Numeric Functions In RRsquared Academy
 
An overview of Python 2.7
An overview of Python 2.7An overview of Python 2.7
An overview of Python 2.7decoupled
 
Python Programming.pptx
Python Programming.pptxPython Programming.pptx
Python Programming.pptxSudhakarVenkey
 

Ähnlich wie Introduction to R (20)

Data manipulation and visualization in r 20190711 myanmarucsy
Data manipulation and visualization in r 20190711 myanmarucsyData manipulation and visualization in r 20190711 myanmarucsy
Data manipulation and visualization in r 20190711 myanmarucsy
 
Introduction to tibbles
Introduction to tibblesIntroduction to tibbles
Introduction to tibbles
 
R Programming: Transform/Reshape Data In R
R Programming: Transform/Reshape Data In RR Programming: Transform/Reshape Data In R
R Programming: Transform/Reshape Data In R
 
R programming language
R programming languageR programming language
R programming language
 
Data manipulation on r
Data manipulation on rData manipulation on r
Data manipulation on r
 
R programming
R programmingR programming
R programming
 
Programming in R
Programming in RProgramming in R
Programming in R
 
Read/Import data from flat/delimited files into R
Read/Import data from flat/delimited files into RRead/Import data from flat/delimited files into R
Read/Import data from flat/delimited files into R
 
R tutorial for a windows environment
R tutorial for a windows environmentR tutorial for a windows environment
R tutorial for a windows environment
 
Basic R Data Manipulation
Basic R Data ManipulationBasic R Data Manipulation
Basic R Data Manipulation
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
RBootcam Day 2
RBootcam Day 2RBootcam Day 2
RBootcam Day 2
 
Writing Readable Code with Pipes
Writing Readable Code with PipesWriting Readable Code with Pipes
Writing Readable Code with Pipes
 
R Programming: Numeric Functions In R
R Programming: Numeric Functions In RR Programming: Numeric Functions In R
R Programming: Numeric Functions In R
 
An overview of Python 2.7
An overview of Python 2.7An overview of Python 2.7
An overview of Python 2.7
 
A tour of Python
A tour of PythonA tour of Python
A tour of Python
 
R Programming Intro
R Programming IntroR Programming Intro
R Programming Intro
 
Python Programming.pptx
Python Programming.pptxPython Programming.pptx
Python Programming.pptx
 
R Programming Homework Help
R Programming Homework HelpR Programming Homework Help
R Programming Homework Help
 

Kürzlich hochgeladen

Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Kürzlich hochgeladen (20)

Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Introduction to R

  • 1. 1 INTRODUCTION TO R STACY IRWIN FEB 2017
  • 2. 2 •What is R? •Data Structures and Types •Syntax •Statistics •Visualizations •File I/O •Packages •Finding Help •More Syntax & Common Functions OVERVIEW
  • 3. 3 • Presentation steps thru code and responses (Quick Starts) user.input() Input at the prompt, typed by you #> console output Results produced after hitting enter • Followed by slides with functions and descriptions (Basics) ... More data was produced, but not displayed here for space ### Comment, note, tip FORMAT
  • 5. 5 • R, the [interpreted] language • 800,000 lines of code • 45% C • 19% R • 17% Fortran • R, the implementation(s) • GNU-R is the most popular implementation • Open source version (GNU) of the S language and environment • Developed at Bell Labs by John Chambers, et al. • Licensed under the GNU General Public License (GPL) https://www.r-project.org/about.html WHAT IS R?
  • 7. 7 • R is easy to learn, intuitive • R was made for statistics • R makes great graphics, publication quality WHY R?
  • 12. 12 • R is easy to learn, intuitive • R was made for statistics • R makes great graphics, pub quality • R is optimized to work with tabular data structures • R – it’s fast • R is versatile – thousands of packages on CRAN alone • R is open source BUT… • Memory limitations*** • Some data wrangling problems are clumsy WHY R? …enough!
  • 13. 13 • GNU-R 3.3.2 • https://cran.r-project.org • Microsoft R Open / MRO • https://mran.microsoft.com/download/ • RStudio dev environment • https://www.rstudio.com/ …and many other implementations HOW TO GET R?
  • 15. 15 • Vector 1-dimensional array of elements of the same kind • Scalar? • Matrix 2D array of elements of the same kind • Array multi-dimensional structure of one kind of value • List Something that holds something else (e.g. another list) • Data frame 2D structure of possibly different types of columns DATA STRUCTURES
  • 16. 16 • Logical TRUE/FALSE or T/F • Integers …-1,0,1,2,3… • Double or “numeric” 1.0, 3.14, 4.002e-6 • Character "Hello", '123abc' Some query and coercion functions Informative: typeof( ) T/F Testing: is.numeric( ) is.character( ) Coercion: as.numeric("02") as.character(3.1415) DATA VALUE TYPES
  • 17. 17 1:5 ### integer sequence #> [1] 1 2 3 4 5 x <- 1:5 ### assignment x ### evaluation. Try: x+x #> [1] 1 2 3 4 5 y = x^2 print(y) #> [1] 1 4 9 16 25 y == x ### comparison. What results? #> [1] TRUE FALSE FALSE FALSE FALSE QUICK START: NUMERICS
  • 18. 18 "Hello, World!" ### char string #> [1] "Hello, World!" c("Smart", "Data") -> x ### vector of strings LETTERS #> [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"… LETTERS[1:5] -> myABC c(x, myABC) #> [1] "Smart" "Data" "A" "B" "C" "D" "E" c(x, y) #> [1] "Smart" "Data" "1" "4" "9" "16" "25" QUICK START: CHARACTERS
  • 19. 19 • Command line: Evaluates the line entered, or if the line is incomplete, it waits for the end of an expression • Variable names: Consist of numbers, letters, underscores, and periods • Must start with a letter or a period+letter …CASE sensitive! • Assignment: <- (preferred) , = right to left -> left to right • Comparison: == != < > <= >= SYNTAX BASICS
  • 20. 20 ; Ends a statement x <- 8; print(x) # Comments out anything following #but only on that line ( ) Groups expressions, enclosing function arguments print(2*(6+c((1:4)+x))) { } Encloses groups of expressions, loops, if/then if (x) { print(c("x^2 =", x^2)) print("Done.") } [ [[ $ Subsets elements of a data object SYNTAX BASICS
  • 21. 21 ? <fxn> Opens help page for the command/concept/constant ?? <term> Lists help pages with term in their content typeof( ) Identifies the data type of the object str( ) Shows structure of data object, types of cols in df length( ), nchar( ) Counts elements in vectors, letters in string dim( ), nrow( ), ncol( ) Returns dimensions of data object names( ), colnames( ), rownames( ) Displays only list/column/row names SELF-HELP BASICS
  • 22. 22 NA missing values NaN not a number Inf infinity NULL empty/nothing • NA has a type, determined at time of assignment • Mixed types are coerced into the most flexible type x <- c(TRUE, 1, 4.4, NA) ; typeof(x[4]) #> [1] "double" • Predefined constants in base R: letters, LETTERS, month.abb, month.name, pi SPECIAL VALUES
  • 23. 23 list(11:15) #> [[1]] #> [1] 11 12 13 14 15 X <- list(A = 1:5, B = c("Yes", "No")); X #> $A #> [1] 1 2 3 4 5 #> $B #> [1] "Yes" "No" QUICK START: LISTS
  • 24. 24 Y <- list(a = list(b = list(c = "R"))); str(Y) #> List of 1 #> $ a:List of 1 #> ..$ b:List of 1 #> .. ..$ c: chr "R" Y[[1]][[1]][[1]] #> [1] "R" ### also: Y$a$b$c ### also: Y[['a']][['b']][['c']] c(X, Y) ### try it! QUICK START: LISTS
  • 25. 25 [[, $ [ Simplifying subset Preserving subset Returns simplest data structure Output structure == Input structure str(X) #> List of 2 #> $ A: int [1:5] 1 2 3 4 5 #> $ B: chr [1:2] "Yes" "No" SUBSETTING LISTS
  • 26. 26 [[, $ [ Simplifying subset Preserving subset Returns simplest data structure Output structure == Input structure X[[1]] returns int vector X[1] returns list holding int vector X[[1]] #> [1] 1 2 3 4 5 ### vector X[1] #> $A ### list... #> [1] 1 2 3 4 5 ### holding a vector SUBSETTING LISTS
  • 27. 27 mtcars #> mpg cyl disp hp drat wt qsec vs am gear carb #> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 #> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 #> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 #> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 #> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 #> Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 #> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 #> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 #> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 #> ... Row and Column Names are not data ... just referential meta-data QUICK START: DATAFRAMES column names row names
  • 28. 28 mtcars #> mpg cyl disp hp drat wt qsec vs am gear carb #> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 #> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 #> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 #> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 #> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 #> Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 #> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 #> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 #> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 #> ... mtcars[3,2] # [row, col] ### What will this return? #> [1] 4 QUICK START: DATAFRAMES
  • 29. 29 mtcars[1] #> mpg #> Mazda RX4 21.0 #> Mazda RX4 Wag 21.0 #> Datsun 710 22.8 #> Hornet 4 Drive 21.4 #>... mtcars[[1]] # also: mtcars$mpg or mtcars[,'mpg'] #> [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ... mtcars[,1] #> [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ... mtcars[1,] #> mpg cyl disp hp drat wt qsec vs am gear carb #> Mazda RX4 21 6 160 110 3.9 2.62 16.46 0 1 4 4 QUICK START: DATAFRAMES Recall: # [row, col] first list element
  • 30. 30 [[, $ [ Simplifying subset Preserving subset Simplest data structure Output structure == Input structure mtcars[[1]] returns contents of 1st col mtcars[1] returns first col as list: $mpg mtcars[[1]] #> [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 ... mtcars[1] #> mpg #> Mazda RX4 21.0 #> Mazda RX4 Wag 21.0 #> Datsun 710 22.8 #> Hornet 4 Drive 21.4 #> ... SUBSETTING DATAFRAMES data.frame columns ARE lists
  • 31. 31 str(mtcars) #> 'data.frame': 32 obs. of 11 variables: #> $ mpg : num 21 21 22.8 21.4 18.7 18.1... #> $ cyl : num 6 6 4 6 8 6 8 4 4 6 ... #> ... ### BUT WAIT ### typeof(mtcars) #> [1] "list" EVEN DATAFRAMES ARE LISTS!
  • 32. 32 mtcars[mtcars$cyl == 4 & mtcars$gear == 4, ] #> mpg cyl disp hp drat wt qsec vs am gear carb #> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 #> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 #> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 #> Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 #> Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 #> Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 #> Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 #> Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 MT <- mtcars[mtcars$gear > 4, c(1,4,6)] ; MT # only show mpg, hp, wt #> mpg hp wt #> Porsche 914-2 26.0 91 2.140 #> Lotus Europa 30.4 113 1.513 #> Ford Pantera L 15.8 264 3.170 #> Ferrari Dino 19.7 175 2.770 #> Maserati Bora 15.0 335 3.570 QUICK START: DATAFRAMES “which” also returns an index vector: mtcars[which(mtcars$gear > 4), …]
  • 33. 33 tf <- c(T,F) df <- data.frame(x = 1:5, y = letters[1:10], z = tf) df #> x y z #> 1 1 a TRUE #> 2 2 b FALSE #> 3 3 c TRUE #> 4 4 d FALSE #> 5 5 e TRUE #> 6 1 f FALSE #> 7 2 g TRUE #> 8 3 h FALSE #> 9 4 i TRUE #> 10 5 j FALSE QUICK START: DATAFRAMES Combining and amending: cbind( ), rbind( ) merge( ) data.frame(df, tf) df$newCol <- NA Deleting column x: df[-1] -> df df[,-1] -> df df$x <- NULL
  • 34. 34 • With what you know about lists and dataframes… • What happens when we execute these lines? c(df, df) typeof(c(df, df)) str(c(df, df)) c(df, mtcars) DATAFRAMES QUIZ
  • 36. 36 summary round, ceiling, floor sin, cos, …, exp, log, log10, log2 sum, diff, filter union, intersect mean, sd, var, weighted.mean median, Mode (user-defined; base R's mode( ) reports the storage mode), quantile, fivenum min, max, range &, |, !, xor all, any SELECTED STATS FUNCTIONS
  • 37. 37 mymodel <- lm(MT$mpg ~ MT$hp) ### y ~ x mymodel #> #> Call: #> lm(formula = MT$mpg ~ MT$hp) #> #> Coefficients: #> (Intercept) MT$hp #> 32.77745 -0.05827 LINEAR MODEL EXAMPLE
  • 39. 39 plot(x <- mtcars$hp, y <- mtcars$mpg, xlab = "HP", ylab = "MPG", pch = 20) ### assignment can occur inside plot call ### also: plot(y ~ x) BASIC PLOTS: SCATTER PLOT
  • 41. 41 plot(x <- mtcars$hp, y <- mtcars$mpg, xlab = "HP", ylab = "MPG", pch = 20) ### assignment can occur inside plot call ### also: plot(y ~ x) mymodel <- lm(y ~ x) abline(mymodel, col="red", lwd=2) BASIC PLOTS: SCATTER PLOT
  • 43. 43 • We can place multiple plots on one window: par(mfrow = c(1, 2)) ### request 1x2 layout hist(mtcars$hp, xlab = "Horsepower", main = "Histogram of HP") hist(mtcars$mpg, xlab = "Miles per gallon", breaks = 10, main = "Histogram of MPG") BASIC PLOTS: HISTOGRAM
  • 46. 46 readLines(con = "C:/Users/stacy/text.txt") returns vector of character strings read.table(file = "…") returns data.frame of rows & cols read.csv(file = "…") read.table( ) with sep = "," read.delim(file = "…") read.table( ) with sep = "\t" writeLines(text = …, con = "…") write(x = …, file = "…") write.csv(x, "…") write.table(x, "…") FILE I/O BASICS
  • 47. 47 mydata <- readLines("http://ford.com", n = 10) ; mydata #> [1] "<!DOCTYPE HTML>" #> [2] "<html>" #> [3] " <!-- appVersion: 09302016 rel_21.0 -->" #> [4] " <!-- htmllint preset="$none" -->" #> [5] " <head>" #> [6] " <meta http-equiv="content-type" content="text/htm #> [7] " " #> [8] " <meta name="keywords">" #> [9] " <meta name="description" content="The Official Fo #>[10] "" save(mydata, file = "…") load("…") FILE I/O BASICS
  • 49. 49 •Package-related commands: library() Lists installed packages search() Lists loaded packages install.packages("mylib") Installs package called “mylib” library(mylib) Loads mylib into current env require(mylib) …used inside other functions https://cran.r-project.org/web/packages/ PACKAGE MANAGEMENT
  • 50. 50 plyr Tools for splitting, applying and combining data data.table Extension of data.frame, highly optimized ggplot2 An Implementation of the Grammar of Graphics colorspace Color Space Manipulation shiny Web Application Framework for R chron Chronological Objects which handle dates and times RCurl General Network (HTTP/FTP/...) Client Interface for R wordcloud Make Word Clouds rjson, RJSONIO JSON tools for R htmltools Tools for HTML pdftools Extract Text and Data from PDF Documents xlsx Read, write, format Excel 2007 and Excel 97/2000/XP/2003 files XML Tools for Parsing and Generating XML Within R and S-Plus xtable Export Tables to LaTeX or HTML MY "MUST-HAVE" & OTHER POPULAR PACKAGES
  • 52. 52 • Beginning R (Wiley) FREE Chapter 1 online • Mark Gardner • R Graphics Cookbook (O'Reilly) FREE full text online • Winston Chang • Main focus is on ggplot2 package. Problem-Solution format • R for Data Science (O'Reilly) FREE full text online • Hadley Wickham and Garrett Grolemund • Advanced R (CRC Press) FREE full text online • Hadley Wickham RECOMMENDED BOOKS
  • 53. 53 • StackExchange • StackOverflow http://stackoverflow.com/tags/r • CrossValidated http://stats.stackexchange.com • Post questions with MWE = Minimum Working Example • R-bloggers https://www.r-bloggers.com • News, Tutorials, Jobs … common issues often documented clearly • R help mailing list https://stat.ethz.ch/mailman/listinfo/r-help • Quick-R (2014) http://www.statmethods.net/ • Robert I. Kabacoff, Ph.D. RECOMMENDED ONLINE HELP & TUTORIALS
  • 54. www.modusoperandi.com 709 South Harbor City Blvd., Suite 400 Melbourne, FL 32901-1936 321-473-1400 Stacy Irwin sirwin@modusoperandi.com sirwin@gmail.com
  • 56. 56 ls(), rm( ) list, remove objects from memory getwd(), setwd("…") get, set working directory grep, grepl, gsub grep family match, identical, setdiff, unique, %in% matching functions • String manipulation: nchar, strsplit, unlist, paste0, pmatch toupper, tolower, sub, strtrim, strtoi help.search(keyword = "character") COMMON FUNCTIONS
  • 57. 57 for(i in 1:100) ... for(myLetter in LETTERS) ... while(i < 100) i <- i+5 if(this == that) <do_something> if(this %in% that) { <do_this1> <do_this2> } else { <do_that1> <do_that2> } LOOPS AND CONDITIONAL FUNCTIONS
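
The preserving and simplifying subset operators from the Subsetting Lists and Subsetting Dataframes slides can be compared side by side. This is a minimal sketch, assuming a fresh R session in which the built-in mtcars dataset is available; X repeats the list defined on the Quick Start: Lists slide, and the choice of class() as the inspection tool is ours, not the deck's.

```r
# Sketch only: X mirrors the list from the "Quick Start: Lists" slide.
X <- list(A = 1:5, B = c("Yes", "No"))

# Preserving subset: `[` returns the same kind of container it was given.
class(X[1])
#> [1] "list"
class(mtcars[1])
#> [1] "data.frame"

# Simplifying subset: `[[` and `$` return the element itself.
class(X[[1]])
#> [1] "integer"
class(mtcars[[1]])
#> [1] "numeric"
identical(mtcars[[1]], mtcars$mpg)
#> [1] TRUE

# A data frame is stored as a list of equal-length columns,
# so length() counts columns, not rows.
typeof(mtcars)
#> [1] "list"
length(mtcars)
#> [1] 11
```

Using `[` keeps the result safe to pass to functions that expect the original container, while `[[` or `$` is the usual choice when the values themselves are needed for arithmetic.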

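The functions on the File I/O Basics slide can be tried without touching real files. The following is a hedged sketch of a write-then-read round trip; the temporary file and the small data frame df are made up for illustration and do not come from the slides.

```r
# Sketch only: a temporary file is used so that no real path is assumed.
tmp <- tempfile(fileext = ".csv")

# Hypothetical example data, not taken from the deck.
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# write.csv() would also write row names unless row.names = FALSE.
write.csv(df, tmp, row.names = FALSE)

# readLines() returns the raw text, one character string per line.
raw <- readLines(tmp)
length(raw)   # one header line plus three data rows
#> [1] 4

# read.csv() parses the same file back into a data.frame.
back <- read.csv(tmp, stringsAsFactors = FALSE)
str(back)

# Remove the temporary file when done.
unlink(tmp)
```

Passing stringsAsFactors = FALSE explicitly keeps the character column as character on older R versions such as the 3.3.2 release mentioned in the slides, where the default was to convert strings to factors.
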
Editor's Notes

  1. I learned about R while completing my PhD, from a friend of mine studying meteorology. She was using R to process huge amounts of worldwide temperature and moisture data, subset it, and analyze it to gauge the effectiveness of weather prediction models. Like her, I was dealing with a big bucket of data and comparing it against models, but mine had to do with stars, planets, and their measured properties. I borrowed her intro book on R, and the rest is history. R is a fun and versatile language, and I hope I can share some of this excitement with you through this presentation. The emphasis in this lesson is on data structures, and it is light on analysis examples (to be covered at a later date), but you are exposed to the basic concepts and commands.
  2. Syntax will be addressed throughout
  3. Code/typing/responses will be in Courier New font. Adopts the “Advanced R” style: input lines are shown as you would type them; output lines are commented with #>. This allows easy copy-paste with comparison of results. In reality, the prompt is just “> ”.
  4. R is a language: an interpreted language, great for statistical analysis, visualization, pub-quality graphics, data exploration and manipulation, and scripting. 800,000 lines of code: 45% C, 19% R, 17% Fortran. R is an implementation: an open source version (GNU) of the S language and environment developed at Bell Labs by John Chambers, et al. R is a GNU Project and is licensed under the GNU General Public License (GPL). Current version 3.3.2 (Bell Labs (was: AT&T, now Lucent) in Aug 1993). Much S code runs unaltered in R, and R is highly extensible (through additional packages/libraries).
  5. file manipulation prototyping biostatistics astrophysics image processing text analysis statistical analysis fast visualization pub-quality graphics data exploration ETL scripting NLP Word Clouds time series networks graph analysis outliers patterns AB testing machine learning neural.nets JSON HTML XML
  6. Poorly written code: most R users are not programmers or software engineers and have no formal training. They are exploring data for a quick answer. R won’t do: it has problems handling certain kinds of problems, for example raw byte data. “R is slow”: poorly written code; the implementation is at fault; there are 5 different ways to access a value from a dataframe, and the slowest takes 30X longer than the fastest. More advanced topics include utilizing the object-oriented systems of R, using virtual memory, parallel programming with multiple cores, and rewriting functions in C++ within R.
  7. GDELT data cleaning
  8. If R is "slow", it is often due to poorly written code: most R users are not programmers or software engineers and have no formal training. They are exploring data for a quick answer. Admittedly, R has problems handling certain kinds of problems, for example raw byte data manipulation and conversion. More advanced topics include utilizing the object-oriented systems of R, using virtual memory, parallel programming with multiple cores, and rewriting functions in C++ within R (so they can run faster).
  9. There are no single-value (scalar) variables; these are instead vectors of length 1. I’ll note here that the indexing of vectors, lists, dataframes, etc. starts at 1, not 0. In this tutorial we will look mainly at vectors, lists, and data frames.
  10. Other data types: complex and raw, not covered here. Character strings can be enclosed in single or double quotes, just be consistent
  11. ":" produces a sequence of numbers at unit intervals, not necessarily integers! 1.1:5 produces 1.1 2.1 3.1 4.1 (5.1 is beyond the limit of the expression)
  12. c() produces a vector of n elements (“combine”). Overwriting variables with re-assignment is allowed. Built-in constants: LETTERS, letters, month.abb, month.name, pi. Index/select elements with [ ]. Coercion occurs implicitly: 1 is numeric, but "1" is a character.
  13. There is no mandatory character to end a line of code, but if a parenthetical (or bracketed) expression is incomplete when enter is pressed, R will, at least for simple expressions, wait for you to complete and close the expression. It is considered good practice to use the arrow form of assignment. Logical comparisons generate T/F and are similar to other languages’ syntax. Most comparisons are performed vector-wise, or by multiples of vector lengths. Tip: everything is divisible by 1.
  14. x <- 8; print(2*(6+c((1:4)+x))) #> [1] 30 32 34 36 It is considered good practice to use the arrow form of assignment. Logical comparisons generate T/F and are similar to other languages’ syntax. Most comparisons are performed vector-wise, or by multiples of vector lengths.
  15. Join lists with c()
  16. Join lists with c()
  17. Until now, we have only worked with vectors (1D). There are 3 ways to subset lists (these also work with vectors!).
  18. Simplifying vs. preserving. Try these simple subsetting examples and examine their structure with str(X[[1]]) and str(X[1]). Join lists with c().
  19. mtcars: Motor Trend road test data from 1974. Dataframes are groups of lists. Columns may be different types, but each must be self-consistent.
  20. mtcars: Motor Trend road test data from 1974. Dataframes are groups of lists. Columns may be different types, but each must be self-consistent.
  21. The first list element [1] is referenced by the preserving subset type.
  22. Until now, we have only worked with vectors (1D). There are 3 ways to subset lists (these also work with vectors!). Try these simple subsetting examples and examine their structure with str(). Join lists with c().
  23. Subsetting and filtering. MT is a new subsetted data frame.
  24. Creating a dataframe from scratch. A dataframe is a special kind of list: the elements (columns!) of the dataframe must all be the same length! rbind/cbind, for adding rows/columns to dataframes
  25. Do you think any of them will produce an error? Try it!
  26. ### also short-circuiting: &&, ||
  27. ### also short-circuiting: &&, ||
  28. Options (arguments) can specify whether header exists, col.names, col types, quote handling, missing values, etc. Also: read.xls(), write.xls()
  29. It's considered better practice to load packages with library(), instead of require()
  30. Quick-R author: "I created Quick-R for one simple reason. I wanted to learn R and I am a teacher at heart. The easiest way for me to learn something is to teach it."
  31. Textual Machine Learning for Topic Extraction and Document Similarity Matching
  32. ls,rm -- not the Unix functions!
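
Several of the notes above (length-1 vectors, 1-based indexing, the ":" operator, recycling "by multiples of vector lengths", implicit coercion, and the && and || operators) can be checked directly at the prompt. The snippet below is a small sketch using made-up values; only base R is assumed.

```r
# `:` steps by 1 from the left value and stops before passing the right one.
1.1:5
#> [1] 1.1 2.1 3.1 4.1

# A "scalar" is a vector of length 1, and indexing starts at 1.
x <- 8
length(x)
#> [1] 1
x[1]
#> [1] 8

# Shorter vectors are recycled to match longer ones.
1:6 + c(10, 20)
#> [1] 11 22 13 24 15 26

# Mixed types are coerced to the most flexible type.
typeof(c(1L, 2.5))
#> [1] "double"
typeof(c(1, "a"))
#> [1] "character"

# `&` works element by element; `&&` expects single values and skips
# the right-hand side when the left side already decides the result.
c(TRUE, FALSE) & c(TRUE, TRUE)
#> [1]  TRUE FALSE
FALSE && stop("never evaluated")
#> [1] FALSE
```

Inside if() conditions, && and || are the usual choice because they return a single TRUE or FALSE; & and | are meant for filtering vectors, as in the mtcars subsetting slides.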