Basic R
1. Prof. Dr. Roberto Dantas de Pinho, roberto.pinho@mct.gov.br
26/jul/2012
This presentation is based on courses by
Dr. Paulo Justiniano Ribeiro Jr (UFPR) &
Dr. Cosme Marcelo Furtado Passos da Silva (FIOCRUZ)
SEXECASCAV|CGIN 1
2. A first R session
Objects
Data input
Now that we have data...
Some analyses
Filter & select
Saving your work
Changing data
Sums and aggregates
Linear regression
And lots of other things along the way
3. Install, configuration etc.
R internals, structure etc.
Handling large datasets
Fancy plots beyond the basics
4. You can use R to evaluate some simple
expressions. Just type:
1 + 2 + 3
2 + 3 * 4
3/2 + 1
4 * 3**3
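As a small sketch, the same expressions can be collected in a vector to see all the results at once. The name results is only an illustration. Note that ** is accepted as a synonym for ^ (power).

```r
# Evaluate the four expressions from the slide and keep the results
results <- c(1 + 2 + 3,   # 6
             2 + 3 * 4,   # 14: multiplication comes before addition
             3/2 + 1,     # 2.5
             4 * 3**3)    # 108: same as 4 * 3^3
results
```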
R is an environment and a language
5. The R environment lets you submit commands and see the results immediately.
The R language is the set of rules and functions that can be run by the R environment.
You may keep command sequences (scripts) for later use.
6. Several functions are available. A couple of simple examples:
sqrt(2) # square root of 2
abs(-10) # absolute value of -10
sin(pi) # sine of pi
pi is a constant in R; its value is already defined.
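One thing worth trying with these functions: results are floating-point numbers, so they may differ slightly from the exact mathematical value. A minimal sketch using only base R:

```r
sqrt(2)^2 == 2                    # FALSE: tiny rounding error
isTRUE(all.equal(sqrt(2)^2, 2))   # TRUE: equal within tolerance
sin(pi)                           # 1.224647e-16, very close to 0 but not exactly 0
abs(sin(pi)) < 1e-12              # TRUE
abs(-10)                          # 10
```

When comparing computed numbers, all.equal() or a small tolerance is usually safer than ==.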
7. Results, input data, tables etc. are all stored in R as objects.
Objects have a name, content and type, and are stored in memory. Ex.
Create object "x" with the number 10:
x <- 10
Show the content of x:
x
R is case sensitive: abc is different from ABC.
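A short sketch of these ideas. The objects abc and ABC are made up for illustration.

```r
x <- 10
x            # shows the content: [1] 10
class(x)     # the type of the object: "numeric"

abc <- 1     # two different objects:
ABC <- 2     # names differ only in case
abc == ABC   # FALSE
```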
8. Try:
X <- sqrt(2)
Y = sin(pi)
Z = sqrt(X+Y)
For assignment, <- and = are equivalent.
In the above examples, X, Y and Z store the result of each operation.
In R, there are always many ways of doing the same thing.
We will try to focus on a single way of doing each task.
9. What is the value of C at the end of the script?
A = 1
B = 2
C = A + B
A = 5
B = 5
Why?
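One way to check the answer is simply to run the script. An object stores the value computed at the moment of assignment, not the formula, so later changes to A and B do not affect C.

```r
A = 1
B = 2
C = A + B   # C receives the value 3 here
A = 5
B = 5
C           # still 3: C is not recomputed when A and B change
```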
14. Vectors are objects that hold multiple values of a single data type.
Function c( ) ("c" from concatenate) groups values to build a vector:
X = c(1,3,6)
To access vector elements:
X[1]
X[3]
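A few more ways of indexing a vector, as a minimal sketch. The vector mixed is an assumption added to show what "a single type" means in practice.

```r
X = c(1, 3, 6)
X[1]          # 1
X[3]          # 6
X[c(1, 3)]    # 1 6: several positions at once
X[-1]         # 3 6: everything except the first element
length(X)     # 3

# Mixing types forces everything into one type
mixed = c(1, "a")
typeof(mixed) # "character"
```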
15. Operations may be performed and functions
applied over the whole vector. Ex.
X = c(1,3,5)
Y = c(10,20,30)
X+Y
[1] 11 23 35
sum(X)
[1] 9
How about X + 100?
[1] 101 103 105
This works due to the recycling rule.
16. When the size of an object required by an
operation is different from the actual size,
available data is repeated as needed.
As X has 3 elements, X+100 is the same as
X + c(100,100,100)
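Recycling also applies when the shorter vector has more than one element. A small sketch, where the vectors long and short are made up for illustration:

```r
X = c(1, 3, 5)
X + 100                 # 101 103 105
X + c(100, 100, 100)    # same result

long  = c(1, 2, 3, 4)
short = c(10, 20)
long + short            # 11 22 13 24: short is used as 10 20 10 20
```

R gives a warning when the longer length is not a multiple of the shorter one.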
24. Arrays: a "matrix" with many dimensions. Ex. 3 dimensions:
ar1 <- array(1:24, dim = c(3, 4, 2))
For a 3-dimensional array, you might visualize the 3rd dimension as a collection of matrices.
1st matrix:
, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

2nd matrix:
, , 2

     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24
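Elements of an array are accessed with one index per dimension. A minimal sketch using the same ar1:

```r
ar1 <- array(1:24, dim = c(3, 4, 2))
dim(ar1)        # 3 4 2
ar1[2, 3, 1]    # 8: row 2, column 3 of the 1st matrix
ar1[3, 4, 2]    # 24: row 3, column 4 of the 2nd matrix
second <- ar1[, , 2]   # leaving an index empty selects everything: the whole 2nd matrix
second
```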
25. How to work with this kind of data?
Year | State (UF) | Agency code | Agency | Budget unit (UO) code | Budget unit | Function | Subfunction | Program | Action | Locator | Action description | R&D (P&D) value | ACTC value
2010 | AC | 1 | Adm direta e indireta | 1 | Adm direta e indireta | 19 | 121 | 2056 | 1548 | | MODERNIZAÇÃO DO SISTEMA DE PLANEJAMENTO E GESTÃO DA SDCT | R$ - | R$ 16.655,00
2010 | AC | 1 | Adm direta e indireta | 1 | Adm direta e indireta | 19 | 121 | 2056 | 1549 | | PROGRAMA DE COOPERAÇÃO TÉCNICA E FINANCEIRA COM INSTIT. NAC. INTERN. GOVERNAMENTAIS E NÃO GOVERNAMENTAIS | R$ - | R$ 715.000,00
2010 | AC | 1 | Adm direta e indireta | 1 | Adm direta e indireta | 19 | 122 | 2009 | 2224 | | MANUTENÇÃO DO GABINETE DO SECRETÁRIO | R$ - | R$ 27.732,11
2010 | AC | 1 | Adm direta e indireta | 1 | Adm direta e indireta | 19 | 122 | 2009 | 2227 | | DEPARTAMENTO DE GESTÃO INTERNA | R$ - | R$ 2.266.169,90
26. Data frames: each column has its own data type.
We will be using data.frames most of the time.
d = data.frame(letters[1:4], 1:4, 10.5)
d
  letters.1.4. X1.4 X10.5
1            a    1  10.5
2            b    2  10.5
3            c    3  10.5
4            d    4  10.5
We can change column names:
colnames(d) = c("letra","num", "valor")
colnames(d)
[1] "letra" "num" "valor"
d$valor # selects column "valor" from d
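A short sketch of other ways to look into the same data.frame, using only base R functions:

```r
d = data.frame(letters[1:4], 1:4, 10.5)
colnames(d) = c("letra", "num", "valor")

d$valor          # the whole column: 10.5 10.5 10.5 10.5
d[2, "num"]      # row 2 of column "num": 2
d[d$num > 2, ]   # rows where num is greater than 2
nrow(d)          # 4 rows
ncol(d)          # 3 columns
str(d)           # structure: one line per column, with its type
```

Note that the single value 10.5 was recycled to fill all four rows.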
30. require(XLConnect)
Loads package XLConnect.
Packages are sets of functions and data that add capabilities to R.
If the package is not installed:
setInternet2() # only on Windows, and only needed by older R versions
install.packages("XLConnect", dependencies = TRUE)
31. Create an object "wb" that points to the Excel file:
wb <- loadWorkbook("AC_PDACTCaula.xls")
32. Load the data from the first sheet into an object called "plan1":
plan1 <- readWorksheet(wb, sheet = 1)
R functions identify parameters by order, by name, or both.
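The three ways of passing parameters can be tried with any function. A minimal sketch using base R's seq(), chosen here only because it needs no extra package:

```r
by_order <- seq(1, 10, 3)                    # parameters identified by order
by_name  <- seq(from = 1, to = 10, by = 3)   # parameters identified by name
mixed    <- seq(1, 10, by = 3)               # both: first two by order, last by name
by_order                                     # 1 4 7 10

# With names, the order no longer matters
shuffled <- seq(by = 3, to = 10, from = 1)
```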
33. Show the structure of the new object:
str(plan1)
str() works with any R object. It is very useful.
Show the data in a window:
View(plan1)
In RStudio, you may click on an object in the objects list to the same effect.
34. args(readWorksheet) # shows available parameters
function (
object, # workbook "wb"
sheet, # number or name of the sheet
startRow, #
startCol, #
endRow, #
endCol, #
header # T or F: use first line to name columns
)
35. Comma-separated values (CSV)
Very popular format for data interchange.
Other separators are also popular: ; <tab> <space>
Example:
uf ano valido somaactc somapd
AC 2009 1 34296430.67 3630841.04
AC 2010 1 29397712.04 3579715.12
AL 2009 1 12650160.51 8903714.41
36. Example:
uf ano valido somaactc somapd
AC 2009 1 34296430,67 3630841,04
AC 2010 1 29397712,04 3579715,12
AL 2009 1 12650160,51 8903714,41
To read this file:
d = read.csv(file="AgregaUF20110930_b.txt",
header=TRUE, # use the first line as column names
sep="\t", # separator is <tab>
dec="," # decimal mark is a comma
)
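To try these parameters without the original file, the same rows can be read from a text string. This is a sketch: the text= argument replaces file=, and the object txt is an assumption that reproduces the three example rows with tab separators.

```r
# Tab-separated text with decimal commas, as in the example above
txt <- paste(
  "uf\tano\tvalido\tsomaactc\tsomapd",
  "AC\t2009\t1\t34296430,67\t3630841,04",
  "AC\t2010\t1\t29397712,04\t3579715,12",
  "AL\t2009\t1\t12650160,51\t8903714,41",
  sep = "\n")

d <- read.csv(text = txt, header = TRUE, sep = "\t", dec = ",")
str(d)           # somaactc and somapd should be read as numbers
sum(d$somapd)    # sum of the three values
```

Without dec=",", the last two columns would be read as text and could not be summed.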
39. How to get the sum of values from a
data.frame column?
sum(data.frame$column)
sum(d$somapd)
[1] NA
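The sum is NA because at least one value in the column is missing. A minimal sketch of how to handle this with the na.rm parameter, using a made-up vector v instead of the original data:

```r
v <- c(1, NA, 3)
sum(v)                  # NA: one missing value makes the whole sum unknown
sum(v, na.rm = TRUE)    # 4: missing values are removed before summing
mean(v, na.rm = TRUE)   # 2
sum(is.na(v))           # 1: how many values are missing
```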
40. NA: Not Available
Missing values.
NaN: Not a Number
A result that cannot be represented as a number.
Inf & -Inf
Plus and minus infinity.
Try: c(-1,0,1)/0
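What to expect from the suggested expression, plus the base R functions that test for each special value. The name r is only an illustration.

```r
r <- c(-1, 0, 1) / 0
r               # -Inf  NaN  Inf
is.nan(r)       # FALSE  TRUE FALSE
is.finite(r)    # FALSE FALSE FALSE
is.na(r)        # FALSE  TRUE FALSE: NaN also counts as missing
is.nan(NA)      # FALSE: but NA is not NaN
```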
42. For these examples:
milsa = read.csv("milsaText.txt", sep="\t", header=TRUE, dec=".")
43. Absolute frequencies
table(milsa$civil)
Relative frequencies
table(milsa$civil) / length(milsa$civil)
or
prop.table(table(milsa$civil))
Pie chart
pie(table(milsa$civil))
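A self-contained sketch of the frequency tables. The vector civil below is made up for illustration and is not the milsa data. prop.table() takes a table, not the raw column.

```r
civil <- c("casado", "solteiro", "casado", "solteiro", "casado")  # made-up values

tb <- table(civil)        # absolute frequencies: casado 3, solteiro 2
tb
tb / length(civil)        # relative frequencies: 0.6 and 0.4
prop.table(tb)            # same result
```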
44. With attach(milsa)
Absolute frequencies
table(civil)
Relative frequencies
table(civil) / length(civil)
or
prop.table(table(civil))
Pie chart
pie(table(civil))
When you are done: detach(milsa)
45. Bar plot:
barplot(table(instrucao))
Remember: any result may be saved as an object to use later.
instrucao.tb = table(instrucao)
barplot(instrucao.tb)
pie(instrucao.tb)
50. Who earns above the median?
acimamediana = milsa[salario > median(salario), ]
Who is married and has a higher education degree?
casadoEsuperior = milsa[civil=="casado" & instrucao=="Superior", ]
AND (&): both must be true
51. Who is married or has a higher education degree?
casadoOUsuperior = milsa[civil=="casado" | instrucao=="Superior", ]
OR (|): at least one must be true
52. NOT
milsaLimpo=milsa[!is.na(salario), ]
In English:
New table: milsaLimpo
equals: =
Old table: milsa
Select: [
Rows where salary is not NA: !is.na(salario)
And all columns: , ]
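The AND, OR and NOT filters can be tried on a small made-up table. The data.frame toy and its values are assumptions for illustration, not the milsa data. Columns are written as toy$column so the sketch works without attach().

```r
toy <- data.frame(
  civil     = c("casado", "solteiro", "casado", "solteiro"),
  instrucao = c("Superior", "Superior", "1o Grau", "2o Grau"),
  salario   = c(10, NA, 5, 8)
)

# AND: both conditions must be true (row 1 only)
both <- toy[toy$civil == "casado" & toy$instrucao == "Superior", ]

# OR: at least one condition must be true (rows 1, 2 and 3)
either <- toy[toy$civil == "casado" | toy$instrucao == "Superior", ]

# NOT: keep rows where salario is not missing (rows 1, 3 and 4)
clean <- toy[!is.na(toy$salario), ]

# Above the median, computed on the clean table to avoid NA
above <- clean[clean$salario > median(clean$salario), ]
```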
53. How many are married?
sum(civil=="casado")
or
table(civil)["casado"]
How many are married and have a higher education degree?
sum(civil=="casado" & instrucao=="Superior")
or
table(civil,instrucao)["casado","Superior"]
54. milsaNovo is equal to milsa, without rows 1, 2 & 5 and without columns 1 & 8:
milsaNovo = milsa[-c(1,2,5), -c(1,8)]
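Negative indexes can be tried on a small made-up table. The data.frame m below is an assumption for illustration.

```r
m <- data.frame(a = 1:5, b = letters[1:5], c = 5:1)

smaller <- m[-c(1, 2, 5), -1]   # drop rows 1, 2 and 5, and drop column 1
smaller                         # rows 3 and 4, columns b and c
```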
55. which() returns the rows where a condition is TRUE:
sup = which(instrucao=="Superior")
sup
[1] 19 24 31 33 34 36
May use it again later:
mean(milsa[sup,"salario"])
Mean salary for those with higher education.
Advantage: sup stores only the row numbers, not a copy of the data!
56. A random sample of 10 rows from milsa:
amostra = sample(x=nrow(milsa), size=10)
amostra
[1] 12 29 1 3 17 14 26 33 20 31
Mean salary for the sample:
mean(milsa[amostra,"salario"])
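Each call to sample() gives a different result. If the same sample is needed again, set.seed() can be called first. A minimal sketch, where the seed 42 and the population size 100 are arbitrary choices for illustration:

```r
set.seed(42)                          # fix the random number generator
rows <- sample(x = 100, size = 10)    # 10 row numbers out of 1..100
rows

set.seed(42)                          # same seed...
again <- sample(x = 100, size = 10)   # ...same sample
identical(rows, again)                # TRUE

anyDuplicated(rows)                   # 0: by default, sampling is without replacement
```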
57. Sorting by number of children:
milsa[order(filhos),]
Decreasing:
milsa[order(filhos, decreasing=TRUE),]
By number of children and then age:
milsa[order(filhos,ano),]
10 youngest:
head(milsa[order(ano),], 10)
10 oldest:
tail(milsa[order(ano),], 10)
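order() returns row positions, which are then used to select the rows in the new order. A sketch with a small made-up table (toy and its values are assumptions):

```r
toy <- data.frame(filhos = c(2, 0, 1, 0), ano = c(30, 25, 41, 22))

order(toy$filhos)                # 2 4 3 1: positions of the rows, smallest first
order(toy$filhos, toy$ano)       # 4 2 3 1: ties in filhos are broken by ano

sorted   <- toy[order(toy$filhos, toy$ano), ]
youngest <- head(toy[order(toy$ano), ], 2)   # the 2 youngest: ano 22 and 25
```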
58. Removing an object
rm(milsaNovo)
Removing every object
rm(list = ls()) # ls() lists the current objects
59. List objects are collections that may include different types of objects.
lis = list(A=1:10, B="Text", C=matrix(1:9,ncol=3))
They are often used as parameters to functions or as result sets from them.
lis[1:2]
A list with the first two objects from lis (A & B).
lis[[1]]
The object stored at the first position of the list (the content of A). The same as lis$A.
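The difference between single and double brackets can be checked with class(). A minimal sketch using the same list:

```r
lis = list(A = 1:10, B = "Text", C = matrix(1:9, ncol = 3))

class(lis[1])                # "list": single brackets return a (smaller) list
class(lis[[1]])              # "integer": double brackets return the content
identical(lis[[1]], lis$A)   # TRUE
names(lis)                   # "A" "B" "C"
lis$C[2, 3]                  # 8: row 2, column 3 of the matrix stored in C
```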
60. Saving all objects:
save.image("file.RData")
Saving selected objects:
save(x, y, file="file.RData")
Loading:
load("file.RData")
After several loads, objects with distinct names are all kept in memory; an object with the same name as an existing one replaces it.
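A round trip of save() and load() can be sketched with a temporary file. The path from tempfile() and the objects x and y are assumptions for illustration.

```r
path <- tempfile(fileext = ".RData")   # temporary file, removed when R closes

x <- 1:3
y <- "text"
save(x, y, file = path)   # save the selected objects

rm(x, y)                  # remove them from memory
loaded <- load(path)      # restore them; load() returns the restored names
loaded                    # "x" "y"
x                         # 1 2 3 is back
```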
61. Saving a script “.R” that reproduces the desired
output.
Advantage:
It may be used to document the work performed;
It may be used again over updated data to update
results.
Hybrid model:
Save intermediate results that take a long time to
process. Update them less often.
62. Add a column to a data.frame:
milsa$idade = milsa$ano + milsa$mes/12
69. Only rows found in both data.frames:
merge(x=milsa, y=tabInst, by.x="instrucao", by.y="desc", all=FALSE)
All rows from data.frame x:
merge(x=milsa, y=tabInst, by.x="instrucao", by.y="desc", all.x=TRUE)
70. All rows from data.frame y:
merge(x=milsa, y=tabInst, by.x="instrucao", by.y="desc", all.y=TRUE)
All rows from data.frames x & y:
merge(x=milsa, y=tabInst, by.x="instrucao", by.y="desc", all=TRUE)
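The four kinds of merge can be compared on two small made-up tables. The data.frames x and y, their values and the column anos are assumptions for illustration; only the parameter names come from merge() itself.

```r
x <- data.frame(instrucao = c("Superior", "1o Grau", "Mestrado"),
                salario   = c(10, 5, 12))
y <- data.frame(desc = c("1o Grau", "2o Grau", "Superior"),
                anos = c(8, 11, 15))

inner <- merge(x, y, by.x = "instrucao", by.y = "desc", all = FALSE)   # keys in both
left  <- merge(x, y, by.x = "instrucao", by.y = "desc", all.x = TRUE)  # all rows of x
right <- merge(x, y, by.x = "instrucao", by.y = "desc", all.y = TRUE)  # all rows of y
full  <- merge(x, y, by.x = "instrucao", by.y = "desc", all = TRUE)    # all rows of both

c(nrow(inner), nrow(left), nrow(right), nrow(full))   # 2 3 3 4
```

Rows without a match on the other side get NA in the columns that come from it.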
71. From text to numeric
d.f$novaColuna = as.numeric(d.f$coluna)
From numeric to text:
d.f$novaColuna=as.character(d.f$coluna)
From text or numeric to integer:
d.f$novaColuna = as.integer(d.f$coluna)
Integers save memory
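A few conversions on single values, as a sketch of what to expect, including two cases that often surprise:

```r
as.numeric("3.14")     # 3.14
as.character(42)       # "42"
as.integer("7")        # 7
as.integer(3.9)        # 3: the decimal part is dropped, not rounded

# Text that is not a number becomes NA (R also gives a warning)
bad <- suppressWarnings(as.numeric("abc"))
bad                    # NA
```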
72. Factors: representation for categorical data
Nominal
▪ "married", "single"
Ordinal
▪ "tall", "short"
Factors save memory.
They ensure proper treatment of these variables by many R functions.
74. From factor to text:
d.f$novaColuna = as.character(d.f$colunaFator)
From factor to numeric:
d.f$novaColuna = as.numeric(as.character(d.f$colunaFator))
The internal representation of a factor is different from its text description.
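Why the extra as.character() step is needed can be seen with a small made-up factor f:

```r
f <- factor(c("10", "20", "10", "30"))

levels(f)                     # "10" "20" "30": the text descriptions
as.numeric(f)                 # 1 2 1 3: the internal codes, not the values!
as.numeric(as.character(f))   # 10 20 10 30: the values we wanted
```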
75. Using:
m1 <- matrix(1:12, ncol = 3)
Sum of columns (a value for each column):
colSums(m1)
[1] 10 26 42
or
apply(m1,2,sum)
[1] 10 26 42
76. Sum of rows (one value for each row):
rowSums(m1)
[1] 15 18 21 24
or
apply(m1,1,sum)
[1] 15 18 21 24
apply() may use any function, even your own.
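A sketch of apply() with a user-defined function. The function amplitude is hypothetical, written here only as an example of "your own" function.

```r
m1 <- matrix(1:12, ncol = 3)

# Hypothetical function: difference between largest and smallest value
amplitude <- function(v) max(v) - min(v)

apply(m1, 1, amplitude)   # one value per row:    8 8 8 8
apply(m1, 2, amplitude)   # one value per column: 3 3 3
apply(m1, 2, mean)        # column means: 2.5 6.5 10.5
```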
79. Linear regression:
model = lm(formula = salario ~ ano + instrucao, data = milsa)
summary(model)
Just one line!!!
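The same call can be tried without the milsa file. This sketch swaps in the cars dataset that ships with R (stopping distance as a function of speed); it is not the data used in this presentation.

```r
# cars has two columns: speed and dist
model <- lm(formula = dist ~ speed, data = cars)

summary(model)   # coefficients, standard errors, R-squared etc.
coef(model)      # just the estimated intercept and slope
```

The formula reads "dist explained by speed"; more explanatory variables are added with +, as in salario ~ ano + instrucao.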