Prof. Dr. Roberto Dantas de Pinho, roberto.pinho@mct.gov.br




                                                                        26/jul/2012


                            This presentation is based on courses by
                            Dr. Paulo Justiniano Ribeiro Jr (UFPR) &
                            Dr. Cosme Marcelo Furtado Passos da Silva (FIOCRUZ)

                                 SEXECASCAV|CGIN                            1
 A First R Session
 Objects
 Data input
 Now that we have data...
    Some analyses
 Filter & select
 Saving your work
 Changing data
 Sums and aggregates
 Linear regression

And lots of other things along the way
Install, configuration etc.
R internals, structure etc.
Handling large datasets
Fancy plots beyond the basics




 You can use R to evaluate simple expressions. Just type:
    1 + 2 + 3
    2 + 3 * 4
    3/2 + 1
    4 * 3**3    # ** means "to the power of", the same as ^

 R is an environment and a language
 The R environment lets you submit commands and see the results immediately.
 The R language is the set of rules and functions that the R environment can run.
 You may keep command sequences (scripts) for later use.
 Several functions are available. A couple of simple examples:
     sqrt(2)     # square root of 2
     abs(-10)    # absolute value of -10
     sin(pi)     # sine of pi
 pi is a constant in R; its value is already defined.
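One detail worth checking when trying these functions: sin(pi) does not print exactly 0, because pi is stored with finite precision. A minimal sketch (the object name s is ours, not from the slides):

```r
# sin(pi) is mathematically 0, but pi is stored with finite precision
s <- sin(pi)
s == 0                    # FALSE: the result is a tiny non-zero number
isTRUE(all.equal(s, 0))   # TRUE: equal within numerical tolerance
abs(-10)                  # 10
sqrt(2)^2 == 2            # FALSE, for the same reason
```

When comparing computed decimal numbers, all.equal() is usually a safer choice than ==.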
 Results, input data, tables etc. are all stored in R as objects.
 Objects have a name, content and type, and are stored in memory. Ex.
    Creates object "x" with the number 10:
     x <- 10
    Shows the content of x:
     x
 R is case-sensitive: abc is different from ABC.
 Try:
    X <- sqrt(2)
    Y = sin(pi)
    Z = sqrt(X+Y)
 <- and = are equivalent for assignment.
 In the above examples, X, Y and Z store the results of each operation.
In R, there are always many ways of doing the same thing.
We will try to focus on a single way of doing each task.
 What is the value of C at the end of the script?
    A = 1
    B = 2
    C = A + B
    A = 5
    B = 5
 Why?
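The script above can be checked directly. This sketch repeats it with comments; the final recomputation step is our addition, not part of the slide:

```r
A <- 1
B <- 2
C <- A + B   # C stores the value 3, not the expression "A + B"
A <- 5
B <- 5
C            # still 3: later changes to A and B do not affect C
C_before <- C
C <- A + B   # recompute explicitly to use the new values
C            # 10
```

An assignment evaluates the right-hand side once and stores the result, so C only changes when it is assigned again.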
   Tool that makes it easier to use R
   Manages work windows
   Easier access to objects, scripts, history of
    commands and plots.




[Screenshots with the following work windows labelled]
    Editing scripts & object view
    Console
    Object list & history
    Help, plots, files & packages
 A vector is an object that holds multiple values of a single type.
 Function c( ) ("c" for concatenate) groups values to build a vector:
    X = c(1,3,6)
 To access vector elements:
    X[1]              X[3]
 Operations may be performed and functions applied over the whole vector. Ex.
    X = c(1,3,5)
    Y = c(10,20,30)
    X+Y
    [1] 11 23 35
    sum(X)
    [1] 9
 How about X + 100?
    [1] 101 103 105
    due to the recycling rule
 When the size of an object required by an
  operation is different from the actual size,
  available data is repeated as needed.
 As X has 3 elements, X+100 is the same as
    X + c(100,100,100)
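A small sketch of recycling with vectors of different lengths. The values are made up for illustration; the warning case is standard R behaviour when the longer length is not a multiple of the shorter one:

```r
X <- c(1, 3, 5)
X + 100                          # 101 103 105
X + c(100, 100, 100)             # same result

# A shorter vector is repeated until it matches the longer one
r <- c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24

# If the longer length is not a multiple of the shorter one,
# R still recycles but issues a warning; here we just capture it
w <- tryCatch(c(1, 2, 3) + c(10, 20), warning = function(e) "warning")
w                                # "warning"
```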




> X = 1:10
> X
 [1]  1  2  3  4  5  6  7  8  9 10
> X = seq(0,1,by=0.1)
> X
 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
> rep("a",5)
[1] "a" "a" "a" "a" "a"
> names = c("fulano", "beltrano", "cicrano")
> names
[1] "fulano"   "beltrano" "cicrano"
> letras = letters[1:5]
> letras
[1] "a" "b" "c" "d" "e"
> letras = LETTERS[1:5]
> letras
[1] "A" "B" "C" "D" "E"
 numeric
     is.numeric( )
     as.numeric( )
 integer
     is.integer( )
     as.integer( )
 character
     is.character( )
     as.character( )
 logical
     T == TRUE == 1
     F == FALSE == 0

A == B means "is A equal to B?"
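A short sketch of the is.* and as.* functions listed above, using made-up values:

```r
x <- "3.5"
is.character(x)             # TRUE
y <- as.numeric(x)          # 3.5
is.numeric(y)               # TRUE
as.integer(3.9)             # 3: the fractional part is dropped, not rounded
TRUE + TRUE                 # 2: logical values count as 1 and 0 in arithmetic
sum(c(TRUE, FALSE, TRUE))   # 2: a common way to count TRUE values
z <- suppressWarnings(as.numeric("abc"))   # NA: text that is not a number
```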
 A matrix is a vector arranged in rows & columns
    m1 <- matrix(1:12, ncol = 3)
         [,1] [,2] [,3]
    [1,]    1    5    9
    [2,]    2    6   10
    [3,]    3    7   11
    [4,]    4    8   12
 length(m1)
 [1] 12
 dim(m1)
 [1] 4 3
 nrow(m1)
 [1] 4
 ncol(m1)
 [1] 3


 m1[1, 2]
 [1] 5
 m1[2, 2]
 [1] 6
 m1[ , 2]
 [1] 5 6 7 8
 m1[3, ]
 [1]  3  7 11
 m1[1,2] = 99 changes the value of the cell
m1[1:2, 2:3]
     [,1] [,2]
[1,]    5    9
[2,]    6   10
colnames(m1)
NULL
rownames(m1)
NULL

colnames(m1) = c("C1","C2","C3")

m1[,"C1"]
[1] 1 2 3 4

t(m1)    # transpose of m1
 An array is a "matrix" with many dimensions. Ex. 3 dimensions:
ar1 <- array(1:24, dim = c(3, 4, 2))
, , 1                                   (1st matrix)

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

, , 2                                   (2nd matrix)

     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24

For a 3-dimensional array, you might visualize the 3rd dimension as a collection of matrices.
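Indexing an array works like indexing a matrix, with one position per dimension. A minimal sketch using the same ar1 as above:

```r
ar1 <- array(1:24, dim = c(3, 4, 2))
ar1[2, 3, 1]        # 8: row 2, column 3 of the 1st matrix
ar1[ , , 2]         # the whole 2nd matrix
dim(ar1[ , , 2])    # 3 4
ar1[1, , 2]         # 13 16 19 22: first row of the 2nd matrix
```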
 How to work with this kind of data?

Columns: Ano, UF, Código do Órgão, Órgão, Código da UO, unidade orçamentária, função, subfunção, programa, ação, localizador, descrição da ação, valor P&D, valor ACTC

2010  AC  1  Adm direta e indireta  1  Adm direta e indireta  19  121  2056  1548  MODERNIZAÇÃO DO SISTEMA DE PLANEJAMENTO E GESTÃO DA SDCT  R$ -  R$ 16.655,00
2010  AC  1  Adm direta e indireta  1  Adm direta e indireta  19  121  2056  1549  PROGRAMA DE COOPERAÇÃO TÉCNICA E FINANCEIRA COM INSTIT. NAC. INTERN. GOVERNAMENTAIS E NÃO GOVERNAMENTAIS  R$ -  R$ 715.000,00
2010  AC  1  Adm direta e indireta  1  Adm direta e indireta  19  122  2009  2224  MANUTENÇÃO DO GABINETE DO SECRETÁRIO  R$ -  R$ 27.732,11
2010  AC  1  Adm direta e indireta  1  Adm direta e indireta  19  122  2009  2227  DEPARTAMENTO DE GESTÃO INTERNA  R$ -  R$ 2.266.169,90
 In a data.frame, each column has its own data type
    d = data.frame(letters[1:4], 1:4, 10.5)
      letters.1.4. X1.4 X10.5
    1            a    1  10.5
    2            b    2  10.5
    3            c    3  10.5
    4            d    4  10.5
 We will be using data.frames most of the time
 We can change column names:
    colnames(d) = c("letra","num", "valor")
    colnames(d)
    [1] "letra" "num"   "valor"
    d$valor   # selects column "valor" from d
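As an alternative to renaming afterwards, column names may be given when the data.frame is created. A minimal sketch with the same content as d above:

```r
d <- data.frame(letra = letters[1:4], num = 1:4, valor = 10.5)
colnames(d)    # "letra" "num" "valor"
d$valor        # 10.5 10.5 10.5 10.5: the single value was recycled
nrow(d)        # 4
str(d)         # one line per column, each with its own type
```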
 list
 factor

Later...
 Several possible sources.
 We will see:
     Keyboard: x = scan( )
     Excel files
     CSV files
     SQL databases
require(XLConnect)

wb <- loadWorkbook("AC_PDACTCaula.xls")

plan1 <- readWorksheet(wb, sheet = 1)

str(plan1)

View(plan1)
require(XLConnect)
 Loads package XLConnect
 Packages are sets of functions and data that add capabilities to R.
 If the package is not installed:

setInternet2()   # only on Windows, and only in old versions of R
install.packages("XLConnect", dep=T)
 Creates an object "wb" that points to the Excel file:
wb <- loadWorkbook("AC_PDACTCaula.xls")
 Load the first sheet's data into an object called "plan1":

plan1 <- readWorksheet(wb, sheet = 1)

 R functions identify parameters by order, by name, or both.
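The same idea can be tried without any package, using the base function seq(), whose first three parameters are from, to and by. A minimal sketch:

```r
a <- seq(0, 1, 0.25)                    # by order: from, to, by
b <- seq(from = 0, to = 1, by = 0.25)   # by name
c2 <- seq(0, 1, by = 0.25)              # both
d2 <- seq(by = 0.25, to = 1, from = 0)  # named arguments may come in any order
a                                       # 0.00 0.25 0.50 0.75 1.00
```

Naming the less common parameters, as in readWorksheet(wb, sheet = 1), tends to make scripts easier to read.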
 Show the structure of the new object:

str(plan1)

 str() works with any R object. It is very useful.

 Show the data in a window:

View(plan1)

 In RStudio, you may click on an object in the objects list to the same effect.
args(readWorksheet)   # shows available parameters
function (
  object,     # workbook "wb"
  sheet,      # number or name of the sheet
  startRow,
  startCol,
  endRow,
  endCol,
  header      # T or F: use first line to name columns
)
 Comma-separated values
 Very popular format for data interchange
 Other separators are also popular: ; <tab> <space>
 Example:
uf    ano    valido   somaactc            somapd
AC    2009   1        34296430.67         3630841.04
AC    2010   1        29397712.04         3579715.12
AL    2009   1        12650160.51         8903714.41
 Example:
uf     ano    valido     somaactc           somapd
AC     2009   1          34296430,67        3630841,04
AC     2010   1          29397712,04        3579715,12
AL     2009   1          12650160,51        8903714,41
 To read this file:
d = read.csv(file="AgregaUF20110930_b.txt",
  header=T,  # uses first line as column names
  sep="\t",  # separator is <tab>
  dec=","    # decimals use comma
)
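To try these options without the original file, the same three rows can be passed as text. This sketch assumes the columns are separated by tabs, as in the call above; the object name txt is ours:

```r
# The example rows from the slide, tab-separated, with decimal commas
txt <- "uf\tano\tvalido\tsomaactc\tsomapd
AC\t2009\t1\t34296430,67\t3630841,04
AC\t2010\t1\t29397712,04\t3579715,12
AL\t2009\t1\t12650160,51\t8903714,41"

d <- read.csv(text = txt, header = TRUE, sep = "\t", dec = ",")
str(d)
d$somapd[1]    # 3630841.04, read as a number
```

Without dec=",", the last two columns would not be read as numbers, since "34296430,67" is not a valid number with the default decimal point.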
   str(d)       #structure

   summary(d)   #Statistical summary

   head(d)      #first rows

   tail(d)      #last rows

   plot(d)      #standard plot

require(RODBC)
canal <- odbcConnect(
  "base_ODBC",
  case="tolower",
  uid="user",
  pwd="password")
d <- sqlQuery(canal,
  "select * from table where year = 2010",
  as.is=T)
   How to get the sum of values from a
    data.frame column?
    sum(data.frame$column)
    sum(d$somapd)
    [1] NA




 NA: Not Available
     Missing values.
 NaN: Not a Number
     A value that cannot be represented as a number.
 Inf & -Inf
     Plus and minus infinity.

Try: c(-1,0,1)/0
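A short sketch of how these special values behave, including the suggested c(-1,0,1)/0. The vector x is made up for illustration:

```r
v <- c(-1, 0, 1) / 0
v                       # -Inf  NaN  Inf
is.nan(v)               # FALSE  TRUE FALSE
is.finite(v)            # FALSE FALSE FALSE

x <- c(1, NA, 3)
is.na(x)                # FALSE  TRUE FALSE
sum(x)                  # NA: one missing value makes the sum unknown
sum(x, na.rm = TRUE)    # 4
is.na(NaN)              # TRUE: NaN also counts as missing for is.na()
```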
   Sum:
    sum(d$somapd, na.rm=T)
    [1] 4836882446
   Mean:
mean(d$somapd, na.rm=T)
   Median:
median(d$somapd, na.rm=T)
   Standard deviation:
sd(d$somapd, na.rm=T)

 For these examples:
    milsa = read.csv("milsaText.txt",
      sep="\t", head=T, dec=".")
 Absolute frequencies
table(milsa$civil)
 Relative frequencies
table(milsa$civil) /
    length(milsa$civil)
    or
prop.table(table(milsa$civil))
 Pie chart
pie(table(milsa$civil))
 With attach(milsa)
 Absolute frequencies
table(civil)
 Relative frequencies
table(civil) /
    length(civil)
    or
prop.table(table(civil))
 Pie chart
pie(table(civil))
 Afterwards: detach(milsa)
 Bar plot:
barplot(table(instrucao))
 Remember:
     You may save any result as an object to use it later.

instrucao.tb = table(instrucao)
barplot(instrucao.tb)
pie(instrucao.tb)
 Try:
prop.table(filhos)
 Solution:
prop.table(table(filhos))
 Other solution:
     Filter out elements with NA




 mean(filhos, na.rm=T)
     median(filhos, na.rm=T)
     range(filhos, na.rm=T)
     var(filhos, na.rm=T)   # variance
     sd(filhos, na.rm=T)    # standard deviation
 Quantiles:
     filhos.quartis = quantile(filhos, na.rm=T)
 Interquartile range:
     filhos.quartis[4] - filhos.quartis[2]
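By default quantile() returns five values (0%, 25%, 50%, 75%, 100%), so the interquartile range is the 4th element minus the 2nd. A minimal sketch with made-up data standing in for filhos; the base function IQR() gives the same value:

```r
filhos <- c(0, 1, 2, 2, 3, NA, 5)      # made-up data, not the milsa column
q <- quantile(filhos, na.rm = TRUE)
names(q)                               # "0%" "25%" "50%" "75%" "100%"
iqr_manual <- unname(q[4] - q[2])      # 75% quantile minus 25% quantile
iqr_builtin <- IQR(filhos, na.rm = TRUE)
```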
 plot(milsa)
 plot(salario ~ ano)
 hist(salario)
 boxplot(salario)
 stem(salario)

 Selecting some rows
    milsaNovo = milsa[c(1,3,5,6), ]
 Selecting some columns
    milsaNovo = milsa[, c(1,3,5)]
    milsaNovo = milsa[, c("funcionario", "instrucao", "salario")]
 Attention:
     New copy:
 milsaNovo = milsa[c(1,3,5,6), ]
     Replaces previous:
 milsa = milsa[c(1,3,5,6), ]
 Who earns above the median?
 acimamediana = milsa[salario > median(salario), ]
 Who is married and has a higher education degree?
 casadoEsuperior = milsa[civil=="casado" & instrucao=="Superior", ]
 AND (&): both must be true
 Who is married or has a higher education degree?
 casadoOUsuperior = milsa[civil=="casado" | instrucao=="Superior", ]
 OR (|): at least one must be true
NOT (!)
 milsaLimpo = milsa[!is.na(salario), ]
 In English:
     New table            milsaLimpo
     equals               =
     old table            milsa
     select               [
     rows where
     salary is not NA     !is.na(salario)
     and all columns      , ]
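The three logical operators can be tried on a small table. The data.frame below is made up and only borrows the column names from milsa; columns are referenced with df$ so that attach() is not needed:

```r
# Made-up data standing in for milsa
df <- data.frame(
  civil     = c("casado", "solteiro", "casado", "solteiro"),
  instrucao = c("Superior", "Superior", "1oGrau", "2oGrau"),
  salario   = c(17.2, NA, 7.0, 9.1)
)

both   <- df[df$civil == "casado" & df$instrucao == "Superior", ]  # row 1
either <- df[df$civil == "casado" | df$instrucao == "Superior", ]  # rows 1, 2, 3
clean  <- df[!is.na(df$salario), ]                                 # rows 1, 3, 4

# A comparison with NA gives NA, and an NA index returns a row of NAs;
# which() keeps only the positions where the condition is TRUE
high <- df[which(df$salario > 8), ]                                # rows 1, 4
```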
How many are married?
sum(civil=="casado")
     or
table(civil)["casado"]
How many are married and have a higher education degree?
sum(civil=="casado" & instrucao=="Superior")
     or
table(civil,instrucao)["casado","Superior"]
 milsaNovo is equal to milsa, without rows 1, 2 & 5 and without columns 1 & 8:

milsaNovo = milsa[-c(1,2,5), -c(1,8)]
 which() gives the rows where a condition is TRUE:
 sup = which(instrucao=="Superior")
 [1] 19 24 31 33 34 36
 May use it again later:
     mean(milsa[sup,"salario"])
     Mean salary for those with higher education
 Advantage: it is not a copy!
 A random sample of 10 rows from milsa:
    amostra = sample(x=nrow(milsa), size=10)
    [1] 12 29 1 3 17 14 26 33 20 31
 Mean salary for the sample:
    mean(milsa[amostra,"salario"])
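Each call to sample() gives a different result. If the same sample is needed again, for instance to reproduce a report, the random seed can be fixed first. A minimal sketch; the seed value and n (standing in for nrow(milsa)) are arbitrary assumptions:

```r
n <- 36                                # stand-in for nrow(milsa)
set.seed(42)                           # the seed value is arbitrary
amostra <- sample(x = n, size = 10)    # 10 distinct row numbers between 1 and n
set.seed(42)
amostra2 <- sample(x = n, size = 10)
identical(amostra, amostra2)           # TRUE: same seed, same sample
```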
 By number of children:
    milsa[order(filhos),]
 Decreasing:
    milsa[order(filhos, decreasing=T),]
 By number of children and then age:
    milsa[order(filhos,ano),]
 10 youngest:
    head(milsa[order(ano),], 10)
 10 oldest:
    tail(milsa[order(ano),], 10)
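order() returns row positions, not sorted values, which is why it is used inside the brackets. A small sketch with made-up data; the column nome is hypothetical and only labels the rows:

```r
# Made-up data: nome labels the rows, filhos and ano as in milsa
df <- data.frame(nome   = c("a", "b", "c", "d"),
                 filhos = c(2, 0, 2, 1),
                 ano    = c(40, 25, 31, 52))

order(df$filhos)                               # 2 4 1 3: positions, smallest first
by_kids     <- df[order(df$filhos), ]          # b d a c (ties keep original order)
by_kids_age <- df[order(df$filhos, df$ano), ]  # b d c a (ties broken by age)
mixed       <- df[order(-df$filhos, df$ano), ] # c a d b (children decreasing, age increasing)
```

The minus sign is one way to sort a single numeric column in decreasing order while the others stay increasing.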
 Removing an object
  rm(milsaNovo)
 Removing every object
  rm(list = ls())
 ls(): list of current objects
 List objects are collections that may include different types of objects.
lis = list(A=1:10, B="Text",
           C = matrix(1:9,ncol=3))
 They are often used as parameters to functions or as result sets from them.
 lis[1:2]
     A list with the first two objects from lis (A & B)
 lis[[1]]
     The object stored at the first position of the list (the content of A). The same as lis$A
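The difference between single and double brackets can be checked with class(). A minimal sketch using the same lis as above:

```r
lis <- list(A = 1:10, B = "Text", C = matrix(1:9, ncol = 3))
class(lis[1])                # "list": single brackets return a (shorter) list
class(lis[[1]])              # "integer": double brackets return the content
identical(lis[[1]], lis$A)   # TRUE
lis$C[2, 3]                  # 8: row 2, column 3 of the matrix stored in C
names(lis)                   # "A" "B" "C"
```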
 Saving all objects:
  save.image("file.RData")
 Saving selected objects:
  save(x, y, file="file.RData")
 Loading:
  load("file.RData")

After several loads, objects with distinct names are all kept in memory.
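A round trip through save() and load() can be tried safely with a temporary file. The objects and the file below exist only for this sketch:

```r
x <- 1:5
y <- "text"
f <- tempfile(fileext = ".RData")   # temporary file, just for this example
save(x, y, file = f)
rm(x, y)                            # remove the objects from memory
loaded <- load(f)                   # load() returns the names of the restored objects
loaded                              # "x" "y"
x                                   # 1 2 3 4 5
```

Note that load() restores objects under their original names, so an existing object with the same name is replaced.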
 Saving a ".R" script that reproduces the desired output.
 Advantages:
     It may be used to document the work performed;
     It may be run again over updated data to update the results.
 Hybrid model:
     Save intermediate results that take a long time to process. Update them less often.
 Add a column to a data.frame:
    milsa$idade = milsa$ano + milsa$mes/12
[Figures: five diagrams of two tables, X and Y (the first annotated "6+3+5=14"), followed by an example with two tables]
 Only rows found in both data.frames:
merge(x=milsa,
    y=tabInst, by.x="instrucao", by.y="desc",
    all=F)
 All rows from data.frame x:
merge(x=milsa,
    y=tabInst, by.x="instrucao", by.y="desc",
    all.x=T)
 All rows from data.frame y:
merge(x=milsa,
    y=tabInst, by.x="instrucao", by.y="desc",
    all.y=T)
 All rows from data.frames x & y:
merge(x=milsa,
    y=tabInst, by.x="instrucao", by.y="desc",
    all=T)
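The four variants can be compared on two small tables. Both data.frames below are made up; they only borrow the key column names (instrucao, desc) from the calls above, and the column anos is hypothetical:

```r
# Made-up tables standing in for milsa (x) and tabInst (y)
x <- data.frame(id = 1:4,
                instrucao = c("1oGrau", "2oGrau", "Superior", "Mestrado"))
y <- data.frame(desc = c("1oGrau", "2oGrau", "Superior", "Doutorado"),
                anos = c(8, 11, 15, 21))

inner <- merge(x, y, by.x = "instrucao", by.y = "desc")                # 3 rows
left  <- merge(x, y, by.x = "instrucao", by.y = "desc", all.x = TRUE)  # 4 rows
right <- merge(x, y, by.x = "instrucao", by.y = "desc", all.y = TRUE)  # 4 rows
full  <- merge(x, y, by.x = "instrucao", by.y = "desc", all = TRUE)    # 5 rows

left[left$instrucao == "Mestrado", ]   # anos is NA: no match in y
```

Rows without a match are kept only when the corresponding all option is TRUE, and their missing columns are filled with NA.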
   From text to numeric
d.f$novaColuna = as.numeric(d.f$coluna)

   From numeric to text:
d.f$novaColuna=as.character(d.f$coluna)

   From text or numeric to integer:
d.f$novaColuna = as.integer(d.f$coluna)

                               Integers save memory

 Factors: a representation for categorical data
     Nominal
      ▪ "married", "single"
     Ordinal
      ▪ "tall", "short"
 Factors save memory
 They ensure these variables are treated properly by many R functions
Nominal:
milsa$fatorcivil = factor(milsa$civil, ordered=F)

$fatorcivil : Factor w/ 2 levels "casado","solteiro": 2 1 1 2 2 1 2 2 1 2

Ordinal:
milsa$fatormes = factor(milsa$mes, ordered=T)

$fatormes : Ord.factor w/ 12 levels "0"<"1"<"2"<"3"<..: 4 11 6 11 8 1 1 5 11 7 ...

It is possible to define a custom order: see ?factor
 From factor to text:
d.f$novaColuna = as.character(d.f$colunaFator)
 From factor to numeric:
d.f$novaColuna = as.numeric(as.character(d.f$colunaFator))

The internal representation of a factor is different from its text description.
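Why the intermediate as.character() is needed can be seen with a small factor whose labels are numbers. The values are made up for illustration:

```r
f <- factor(c("10", "5", "20", "5"))
levels(f)                      # "10" "20" "5": levels are sorted as text
as.numeric(f)                  # 1 3 2 3: internal codes, not the values
as.numeric(as.character(f))    # 10 5 20 5: the values
```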
   Using:
    m1 <- matrix(1:12, ncol = 3)
   Sum of columns (a value for each column):
colSums(m1)
[1] 10 26 42
     or
apply(m1,2,sum)
[1] 10 26 42



 Sum of rows (one value for each row):
rowSums(m1)
[1] 15 18 21 24
   or
apply(m1,1,sum)
[1] 15 18 21 24

apply() may use any function, even your own.
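A sketch of apply() with a user-defined function. The function name amplitude is hypothetical; it returns the difference between the largest and smallest value:

```r
m1 <- matrix(1:12, ncol = 3)
amplitude <- function(v) max(v) - min(v)   # hypothetical helper
apply(m1, 2, amplitude)                    # 3 3 3: one value per column
apply(m1, 1, amplitude)                    # 8 8 8 8: one value per row
apply(m1, 1, function(v) sum(v^2))         # a function may also be written inline
```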
aggregate(salario ~ instrucao,
          data = milsa, mean)

  instrucao   salario
1    1oGrau  7.836667
2    2oGrau 11.528333
3  Superior 16.475000
aggregate(
  salario ~ instrucao + civil,
  data = milsa, mean)

  instrucao    civil   salario
1    1oGrau   casado  7.044000
2    2oGrau   casado 12.825000
3  Superior   casado 17.783333
4    1oGrau solteiro  8.402857
5    2oGrau solteiro  8.935000
6  Superior solteiro 15.166667
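The formula reads as "salario grouped by instrucao". A small sketch with made-up data, so that the group means can be checked by hand; any summary function may replace mean:

```r
# Made-up data standing in for milsa
df <- data.frame(
  instrucao = c("1oGrau", "1oGrau", "Superior", "Superior", "Superior"),
  civil     = c("casado", "solteiro", "casado", "casado", "solteiro"),
  salario   = c(6, 8, 15, 17, 19)
)

medias <- aggregate(salario ~ instrucao, data = df, FUN = mean)
medias     # 1oGrau: (6 + 8) / 2 = 7; Superior: (15 + 17 + 19) / 3 = 17

maximos <- aggregate(salario ~ instrucao + civil, data = df, FUN = max)
maximos    # one row per combination that occurs in the data
```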
model = lm(
 formula = salario ~ ano + instrucao,
 data = milsa)

summary(model)




                                Just one line!!!
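To see what lm() returns, it can be tried on a tiny made-up data set. The values below were chosen so that the fitted line is exactly salario = 1 + 2 * ano; they are not from milsa:

```r
# Made-up data: 1 + 2 * ano plus small deviations (1, -2, 1, 0, 0)
df <- data.frame(ano = 1:5, salario = c(4, 3, 8, 9, 11))

model <- lm(formula = salario ~ ano, data = df)
coef(model)                                  # intercept 1, slope 2
predict(model, newdata = data.frame(ano = 6))   # 13
summary(model)                               # coefficients, standard errors, R-squared
```

coef() and predict() are convenient when only the fitted values are needed; summary() gives the full report shown in the course.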


Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Basic R

  • 1. Prof. Dr. Roberto Dantas de Pinho, roberto.pinho@mct.gov.br 26/jul/2012 This presentation is based on courses by Dr. Paulo Justiniano Ribeiro Jr (UFPR) & Dr. Cosme Marcelo Furtado Passos da Silva (FIOCRUZ) SEXECASCAV|CGIN 1
  • 2. A First R Session; Objects; Data input; Now that we have data... some analyses; Filter & select; Saving your work; Changing data; Sums and aggregates; Linear regression. And lots of other things along the way.
  • 3. Install, configuration etc.; R internals, structure etc.; Handling large datasets; Fancy plots beyond the basics.
  • 4. You can use R to evaluate some simple expressions. Just type:
      1 + 2 + 3
      2 + 3 * 4
      3/2 + 1
      4 * 3**3
    R is an environment and a language.
  • 5. The R environment lets you submit commands and see the results immediately. The R language is the set of rules and functions that the R environment can run. You may keep command sequences (scripts) for later use.
  • 6. Several functions are available. A couple of simple examples:
      sqrt(2)     # √2
      abs(-10)    # |-10| = 10
      sin(pi)     # sin(π)
    pi is a constant in R; its value is already defined.
  • 7. Results, input data, tables etc. are all stored in R as objects. Objects have a name, content and type, and are stored in memory. Ex.:
    Create object "x" with the number 10:
      x <- 10
    Show the content of x:
      x
    In R, abc is different from ABC.
  • 8. Try:
      X <- sqrt(2)
      Y = sin(pi)
      Z = sqrt(X+Y)
    <- and = are equivalent in assignments like these. In the above examples, X, Y and Z store the result of each operation. In R, there are usually many ways of doing the same thing. We will try to focus on a single way of doing each task.
  • 9. What is the value of C at the end of the script?
      A = 1
      B = 2
      C = A + B
      A = 5
      B = 5
    Why?
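The script from slide 9 can be run as written. A minimal sketch, with comments added here, suggesting why C keeps the value it had when its own line ran:

```r
A = 1
B = 2
C = A + B   # evaluated now, with the current values: 1 + 2
A = 5       # reassigning A afterwards does not recompute C
B = 5
print(C)    # 3
```

C stores a value, not a formula, so later changes to A and B leave it untouched.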
  • 11. A tool that makes it easier to use R: it manages work windows and gives easier access to objects, scripts, command history and plots.
  • 12. Editing scripts & object view; Console.
  • 13. Object list & history; Help, plots, files & packages.
  • 14. An object that holds multiple values of a single type. The function c( ) ("c" from concatenate) groups values to build a vector:
      X = c(1,3,6)
    To access vector elements:
      X[1]
      X[3]
  • 15. Operations may be performed and functions applied over the whole vector. Ex.:
      X = c(1,3,5)
      Y = c(10,20,30)
      X+Y
      [1] 11 23 35
      sum(X)
      [1] 9
    How about X + 100?
      [1] 101 103 105
    This works because of the recycling rule.
  • 16. When the size of an object required by an operation is different from its actual size, the available data is repeated as needed. As X has 3 elements, X+100 is the same as X + c(100,100,100).
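A short sketch of the recycling rule described above. The last line goes slightly beyond the slides and is only an illustration: a length-2 vector is repeated three times to match a length-6 vector.

```r
X <- c(1, 3, 5)
r1 <- X + 100                 # 100 is recycled to length 3
r2 <- X + c(100, 100, 100)    # the explicit equivalent
identical(r1, r2)             # TRUE

r3 <- 1:6 + c(10, 20)         # c(10, 20) is repeated three times
r3                            # 11 22 13 24 15 26
```

When the longer length is not a multiple of the shorter one, R still recycles but issues a warning.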
  • 17. > X = 1:10
    > X
    [1] 1 2 3 4 5 6 7 8 9 10
    > X = seq(0,1,by=0.1)
    > X
    [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
    > rep("a",5)
    [1] "a" "a" "a" "a" "a"
    > names = c("fulano", "beltrano", "cicrano")
    > names
    [1] "fulano" "beltrano" "cicrano"
    > letras = letters[1:5]
    > letras
    [1] "a" "b" "c" "d" "e"
    > letras = LETTERS[1:5]
    > letras
    [1] "A" "B" "C" "D" "E"
  • 18. Data types, with their test and conversion functions:
      numeric:   is.numeric( ), as.numeric( )
      integer:   is.integer( ), as.integer( )
      character: is.character( ), as.character( )
      logical:   T == TRUE == 1, F == FALSE == 0
    A == B means "is A equal to B?"
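The test and conversion functions listed above can be tried on a few values. This is a small sketch with made-up inputs; suppressWarnings( ) is used only to silence the warning R gives when a text value cannot be converted.

```r
x <- c("10", "2.5", "abc")
is.character(x)                        # TRUE
n <- suppressWarnings(as.numeric(x))   # "abc" cannot be converted and becomes NA
n                                      # 10.0 2.5 NA

as.integer(2.9)                        # 2: the fractional part is dropped
is.integer(5)                          # FALSE: plain numbers are stored as numeric
is.integer(5L)                         # TRUE: the L suffix creates an integer

TRUE + TRUE                            # 2: logical values behave as 1 and 0
sum(c(TRUE, FALSE, TRUE))              # 2: counts the TRUE values
```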
  • 19. A vector arranged in rows & columns:
      m1 <- matrix(1:12, ncol = 3)
           [,1] [,2] [,3]
      [1,]    1    5    9
      [2,]    2    6   10
      [3,]    3    7   11
      [4,]    4    8   12
  • 20. length(m1)
      [1] 12
      dim(m1)
      [1] 4 3
      nrow(m1)
      [1] 4
      ncol(m1)
      [1] 3
  • 21. m1[1, 2]
      [1] 5
      m1[2, 2]
      [1] 6
      m1[ , 2]
      [1] 5 6 7 8
      m1[3, ]
      [1] 3 7 11
    m1[1,2] = 99 changes the value of the cell.
  • 22. m1[1:2, 2:3]
           [,1] [,2]
      [1,]    5    9
      [2,]    6   10
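A compact sketch of the indexing forms shown above, on the same m1. The drop = FALSE argument is not in the slides; it is included as an aside because selecting a single column otherwise returns a plain vector.

```r
m1 <- matrix(1:12, ncol = 3)

m1[2, 3]                        # 10: row 2, column 3
m1[, 2]                         # 5 6 7 8: the whole second column, as a vector
sub <- m1[1:2, 2:3]             # a 2 x 2 block
dim(sub)                        # 2 2

col1 <- m1[, 1, drop = FALSE]   # keeps the result as a 4 x 1 matrix
dim(col1)                       # 4 1
```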
  • 24. A "matrix" with many dimensions. Ex. with 3 dimensions:
      ar1 <- array(1:24, dim = c(3, 4, 2))
      , , 1          (1st matrix)
           [,1] [,2] [,3] [,4]
      [1,]    1    4    7   10
      [2,]    2    5    8   11
      [3,]    3    6    9   12
      , , 2          (2nd matrix)
           [,1] [,2] [,3] [,4]
      [1,]   13   16   19   22
      [2,]   14   17   20   23
      [3,]   15   18   21   24
    For a 3-dimensional array, you might visualize the 3rd dimension as a collection of matrices.
  • 25. How to work with this kind of data?
    Columns: Ano, UF, Código do Órgão, Órgão, Código da UO, unidade orçamentária, função, subfunção, programa, ação, localizador, descrição da ação, valor P&D, valor ACTC
      2010 AC 1 Adm direta e indireta 1 Adm direta e indireta 19 121 2056 1548 MODERNIZAÇÃO DO SISTEMA DE PLANEJAMENTO E GESTÃO DA SDCT R$ - R$ 16.655,00
      2010 AC 1 Adm direta e indireta 1 Adm direta e indireta 19 121 2056 1549 PROGRAMA DE COOPERAÇÃO TÉCNICA E FINANCEIRA COM INSTIT. NAC. INTERN. GOVERNAMENTAIS E NÃO GOVERNAMENTAIS R$ - R$ 715.000,00
      2010 AC 1 Adm direta e indireta 1 Adm direta e indireta 19 122 2009 2224 MANUTENÇÃO DO GABINETE DO SECRETÁRIO R$ - R$ 27.732,11
      2010 AC 1 Adm direta e indireta 1 Adm direta e indireta 19 122 2009 2227 DEPARTAMENTO DE GESTÃO INTERNA R$ - R$ 2.266.169,90
  • 26. Each column has its own data type:
      d = data.frame(letters[1:4], 1:4, 10.5)
        letters.1.4. X1.4 X10.5
      1            a    1  10.5
      2            b    2  10.5
      3            c    3  10.5
      4            d    4  10.5
    We will be using data.frames most of the time. We can change column names:
      colnames(d) = c("letra","num", "valor")
      colnames(d)
      [1] "letra" "num" "valor"
      d$valor # selects column "valor" from d
  • 27. list and factor: later...
  • 28. Several possible sources. We will see:
      Keyboard: x = scan( )
      Excel files
      CSV files
      SQL databases
  • 29. require(XLConnect)
      wb <- loadWorkbook("AC_PDACTCaula.xls")
      plan1 <- readWorksheet(wb, sheet = 1)
      str(plan1)
      View(plan1)
  • 30. require(XLConnect) loads the package XLConnect. Packages are sets of functions and data that add capabilities to R. If the package is not installed:
      setInternet2() #only on windows
      install.packages("XLConnect", dep=T)
  • 31. Creates an object "wb" that points to the Excel file:
      wb <- loadWorkbook("AC_PDACTCaula.xls")
  • 32. Loads the data from the first sheet into an object called "plan1":
      plan1 <- readWorksheet(wb, sheet = 1)
    R functions identify parameters by name, by order, or both.
  • 33. Show the structure of the new object:
      str(plan1)
    str() works with any R object. It is very useful. Show the data in a window:
      View(plan1)
    In RStudio, you may click on an object in the objects list to the same effect.
  • 34. args(readWorksheet) #shows available parameters
      function (
        object,    # workbook "wb"
        sheet,     # number or name of the sheet
        startRow,
        startCol,
        endRow,
        endCol,
        header     # T or F: use first line to name columns
      )
  • 35. Comma-separated values: a very popular format for data interchange. Other separators are also popular: ; <tab> <space>. Example:
      uf  ano   valido  somaactc     somapd
      AC  2009  1       34296430.67  3630841.04
      AC  2010  1       29397712.04  3579715.12
      AL  2009  1       12650160.51  8903714.41
  • 36. Example:
      uf  ano   valido  somaactc     somapd
      AC  2009  1       34296430,67  3630841,04
      AC  2010  1       29397712,04  3579715,12
      AL  2009  1       12650160,51  8903714,41
    To read this file:
      d = read.csv(file="AgregaUF20110930_b.txt",
                   header=T,   # uses first line as column names
                   sep="\t",   # separator is <tab>
                   dec=","     # decimal mark is a comma
                   )
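The same read.csv( ) call can be tried without the original file. In this sketch the first two data rows of the example are supplied through the text argument instead of file, which is an assumption made here so the code runs on its own.

```r
# Tab-separated text with a comma as decimal mark, copied from the example rows
txt <- paste(
  "uf\tano\tvalido\tsomaactc\tsomapd",
  "AC\t2009\t1\t34296430,67\t3630841,04",
  "AC\t2010\t1\t29397712,04\t3579715,12",
  sep = "\n"
)

d <- read.csv(text = txt, header = TRUE, sep = "\t", dec = ",")
str(d)                  # somaactc and somapd are read as numbers
is.numeric(d$somapd)    # TRUE
```

Without dec = ",", the last two columns would be read as text, because "3630841,04" is not a valid number with the default decimal point.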
  • 37. str(d)      #structure
      summary(d)  #statistical summary
      head(d)     #first rows
      tail(d)     #last rows
      plot(d)     #standard plot
  • 38. require(RODBC)
      canal <- odbcConnect("base_ODBC", case="tolower", uid="user", pwd="password")
      d <- sqlQuery(canal, "select * from table where year = 2010", as.is=T)
  • 39. How to get the sum of values from a data.frame column?
      sum(data.frame$column)
      sum(d$somapd)
      [1] NA
  • 40. NA (Not Available): missing values. NaN (Not a Number): a value that cannot be represented as a number. Inf & -Inf: plus and minus infinity. Try:
      c(-1,0,1)/0
  • 41. Sum:
      sum(d$somapd, na.rm=T)
      [1] 4836882446
    Mean:
      mean(d$somapd, na.rm=T)
    Median:
      median(d$somapd, na.rm=T)
    Standard deviation:
      sd(d$somapd, na.rm=T)
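A small sketch of how missing values propagate and how na.rm changes the result. The vector v is made up for illustration.

```r
v <- c(10, NA, 30)

sum(v)                  # NA: one missing value makes the total unknown
sum(v, na.rm = TRUE)    # 40
mean(v, na.rm = TRUE)   # 20
sum(is.na(v))           # 1: how many values are missing

z <- c(-1, 0, 1) / 0
z                       # -Inf NaN Inf
```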
  • 42. For these examples:
      milsa = read.csv("milsaText.txt", sep="\t", head=T, dec=".")
  • 43. Absolute frequencies:
      table(milsa$civil)
    Relative frequencies:
      table(milsa$civil) / length(milsa$civil)
    or
      prop.table(table(milsa$civil))
    Pie chart:
      pie(table(milsa$civil))
  • 44. With attach(milsa), columns can be referenced by name alone.
    Absolute frequencies:
      table(civil)
    Relative frequencies:
      table(civil) / length(civil)
    or
      prop.table(table(civil))
    Pie chart:
      pie(table(civil))
    Afterwards: detach(milsa)
  • 45. Bar plot:
      barplot(table(instrucao))
    Remember: I may save any result as an object to use it later.
      instrucao.tb = table(instrucao)
      barplot(instrucao.tb)
      pie(instrucao.tb)
  • 46. Try:
      prop.table(filhos)
    Solution:
      prop.table(table(filhos))
    Other solution: filter out elements with NA.
  • 47. mean(filhos, na.rm=T)
      median(filhos, na.rm=T)
      range(filhos, na.rm=T)
      var(filhos, na.rm=T) #variance
      sd(filhos, na.rm=T)  #standard deviation
    Quantiles:
      filhos.quartis = quantile(filhos, na.rm=T)
    Interquartile range (3rd quartile minus 1st quartile):
      filhos.quartis[4] - filhos.quartis[2]
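A sketch of the quartile calculation on made-up values (the real filhos column is not available here). quantile( ) returns five values, for 0%, 25%, 50%, 75% and 100%, so the interquartile range is the 75% value minus the 25% value. The built-in IQR( ) is shown as a cross-check.

```r
filhos <- c(0, 1, 2, 2, 3, NA, 5)        # hypothetical data with one missing value

q <- quantile(filhos, na.rm = TRUE)      # named vector: 0%, 25%, 50%, 75%, 100%
names(q)

iqr_manual  <- unname(q["75%"] - q["25%"])
iqr_builtin <- IQR(filhos, na.rm = TRUE)
all.equal(iqr_manual, iqr_builtin)       # TRUE
```

Selecting by name ("75%", "25%") instead of by position makes it harder to pick the wrong element.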
  • 48. plot(milsa)
      plot(salario ~ ano)
      hist(salario)
      boxplot(salario)
      stem(salario)
  • 49. Selecting some rows:
      milsaNovo = milsa[c(1,3,5,6) , ]
    Selecting some columns:
      milsaNovo = milsa[ , c(1,3,5)]
      milsaNovo = milsa[ , c("funcionario", "instrucao", "salario")]
    Attention:
      New copy:          milsaNovo = milsa[c(1,3,5,6) , ]
      Replaces previous: milsa = milsa[c(1,3,5,6) , ]
  • 50. Who earns above the median?
      acimamediana = milsa[ salario > median(salario), ]
    Who is married and has a higher education degree?
      casadoEsuperior = milsa[ civil=="casado" & instrucao == "Superior", ]
    AND: both must be true.
  • 51. Who is married or has a higher education degree?
      casadoOUsuperior = milsa[ civil=="casado" | instrucao == "Superior", ]
    OR: at least one must be true.
  • 52. NOT:
      milsaLimpo = milsa[!is.na(salario), ]
    In English:
      New table               milsaLimpo
      equals                  =
      old table               milsa
      select                  [
      rows where
      salary is not NA        !is.na(salario)
      and all columns         , ]
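The three logical operators can be tried on a small stand-in table. The data frame df below is invented for illustration and only borrows the column names used in the slides; columns are referenced with df$ so the sketch does not depend on attach( ).

```r
# Hypothetical stand-in for milsa
df <- data.frame(
  civil     = c("casado", "solteiro", "casado", "solteiro"),
  instrucao = c("Superior", "Superior", "1oGrau", "2oGrau"),
  salario   = c(17.2, NA, 7.0, 9.1)
)

both   <- df[df$civil == "casado" & df$instrucao == "Superior", ]   # AND
either <- df[df$civil == "casado" | df$instrucao == "Superior", ]   # OR
clean  <- df[!is.na(df$salario), ]                                  # NOT

nrow(both)     # 1
nrow(either)   # 3
nrow(clean)    # 3
```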
  • 53. How many are married?
      sum(civil=="casado")
    or
      table(civil)["casado"]
    How many are married and have a higher education degree?
      sum(civil=="casado" & instrucao == "Superior")
    or
      table(civil,instrucao)["casado","Superior"]
  • 54. milsaNovo is equal to milsa, without rows 1, 2 & 5 and without columns 1 & 8:
      milsaNovo = milsa[-c(1,2,5), -c(1,8)]
  • 55. Which rows is this TRUE for?
      sup = which(instrucao=="Superior")
      [1] 19 24 31 33 34 36
    May use it again later:
      mean(milsa[sup,"salario"])   # mean salary for those with higher education
    Advantage: it is not a copy!!
  • 56. A random sample of 10 rows from milsa:
      amostra = sample(x=nrow(milsa),size=10)
      [1] 12 29 1 3 17 14 26 33 20 31
    Mean salary for the sample:
      mean(milsa[amostra,"salario"])
  • 57. By number of children:
      milsa[order(filhos),]
    Decreasing:
      milsa[order(filhos, decreasing=T),]
    By number of children and then age:
      milsa[order(filhos,ano),]
    10 youngest:
      head(milsa[order(ano),], 10)
    10 oldest:
      tail(milsa[order(ano),], 10)
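A sketch of order( ) on a made-up table that reuses the column names filhos and ano. order( ) returns row positions, which are then used to index the data frame.

```r
# Hypothetical data: four people with number of children and age
df <- data.frame(
  nome   = c("a", "b", "c", "d"),
  filhos = c(2, 0, 2, 1),
  ano    = c(40, 25, 31, 28)
)

df[order(df$filhos), "nome"]                   # "b" "d" "a" "c"
df[order(df$filhos, df$ano), "nome"]           # "b" "d" "c" "a": ties broken by age
df[order(df$ano, decreasing = TRUE), "nome"]   # "a" "c" "d" "b"
head(df[order(df$ano), ], 2)                   # the 2 youngest rows
```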
  • 58. Removing an object:
      rm(milsaNovo)
    Removing every object:
      rm(list = ls())
    ls() returns the list of current objects.
  • 59. List objects are collections that may include different types of objects:
      lis = list(A=1:10, B="Text", C = matrix(1:9,ncol=3))
    They are often used as parameters to functions or as result sets from them.
      lis[1:2]   a list with the first two objects from lis (A & B)
      lis[[1]]   the object stored at the first position of the list (the content of A); the same as lis$A
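The difference between single and double brackets on a list can be checked with class( ). A short sketch using the same lis object:

```r
lis <- list(A = 1:10, B = "Text", C = matrix(1:9, ncol = 3))

class(lis[1])                # "list": single brackets return a shorter list
class(lis[[1]])              # "integer": double brackets return the stored object
identical(lis[[1]], lis$A)   # TRUE
names(lis[1:2])              # "A" "B"
length(lis)                  # 3
```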
  • 60. Saving all objects:
      save.image("file.RData")
    Saving selected objects:
      save(x, y, file="file.RData")
    Loading:
      load("file.RData")
    After several loads, objects with distinct names are all kept in memory; an object with the same name as an existing one replaces it.
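A round trip through save( ) and load( ), sketched with a temporary file so that nothing is left in the working directory. The use of tempfile( ) is a choice made for this example, not something from the slides.

```r
x <- 1:5
y <- "hello"

f <- tempfile(fileext = ".RData")   # temporary file path
save(x, y, file = f)                # save the selected objects

rm(x, y)                            # remove them from memory
loaded <- load(f)                   # load() returns the names of the restored objects
loaded                              # "x" "y"
x                                   # 1 2 3 4 5
```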
  • 61. Saving a ".R" script that reproduces the desired output. Advantages: it documents the work performed, and it may be run again over updated data to update the results. Hybrid model: save intermediate results that take a long time to process, and update them less often.
  • 62. Add a column to a data.frame:
      milsa$idade = milsa$ano + milsa$mes/12
  • 63-67. [Figures: diagrams of two data sets, X and Y; the one on slide 63 is annotated "6+3+5=14".]
  • 68. Example: [figure]
  • 69. Only rows found in both data.frames:
      merge(x=milsa, y=tabInst, by.x="instrucao", by.y="desc", all=F)
    All rows from data.frame x:
      merge(x=milsa, y=tabInst, by.x="instrucao", by.y="desc", all.x=T)
  • 70. All rows from data.frame y:
      merge(x=milsa, y=tabInst, by.x="instrucao", by.y="desc", all.y=T)
    All rows from data.frames x & y:
      merge(x=milsa, y=tabInst, by.x="instrucao", by.y="desc", all=T)
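The four merge( ) variants can be compared by counting rows. Both tables below are invented for this sketch: milsa_demo stands in for milsa, and the contents of tabInst (including the column anos) are assumptions, since the real lookup table is not shown.

```r
# Hypothetical tables: "Mestrado" has no match in tabInst, "2oGrau" has no match in milsa_demo
milsa_demo <- data.frame(funcionario = 1:3,
                         instrucao   = c("1oGrau", "Superior", "Mestrado"))
tabInst    <- data.frame(desc = c("1oGrau", "2oGrau", "Superior"),
                         anos = c(8, 11, 15))

inner <- merge(milsa_demo, tabInst, by.x = "instrucao", by.y = "desc")                 # matches only
left  <- merge(milsa_demo, tabInst, by.x = "instrucao", by.y = "desc", all.x = TRUE)   # all rows of x
right <- merge(milsa_demo, tabInst, by.x = "instrucao", by.y = "desc", all.y = TRUE)   # all rows of y
full  <- merge(milsa_demo, tabInst, by.x = "instrucao", by.y = "desc", all = TRUE)     # all rows of both

c(nrow(inner), nrow(left), nrow(right), nrow(full))   # 2 3 3 4
```

Rows without a match are kept with NA in the columns that come from the other table.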
  • 71. From text to numeric:
      d.f$novaColuna = as.numeric(d.f$coluna)
    From numeric to text:
      d.f$novaColuna = as.character(d.f$coluna)
    From text or numeric to integer:
      d.f$novaColuna = as.integer(d.f$coluna)
    Integers save memory.
  • 72. Representation for categorical data:
      Nominal: "married", "single"
      Ordinal: "tall", "short"
    Factors save memory and ensure proper treatment of these variables by many R functions.
  • 73. Nominal:
      milsa$fatorcivil = factor(milsa$civil, ordered=F)
      $fatorcivil : Factor w/ 2 levels "casado","solteiro": 2 1 1 2 2 1 2 2 1 2
    Ordinal:
      milsa$fatormes = factor(milsa$mes, ordered=T)
      $fatormes : Ord.factor w/ 12 levels "0"<"1"<"2"<"3"<..: 4 11 6 11 8 1 1 5 11 7 ...
    It is possible to define a custom order: see ?factor
  • 74. From factor to text:
      d.f$novaColuna = as.character(d.f$colunaFator)
    From factor to numeric:
      d.f$novaColuna = as.numeric(as.character(d.f$colunaFator))
    The internal representation of a factor is different from its text description.
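A sketch of why the conversion from factor to numeric goes through as.character( ). The factor f is made up; calling as.numeric( ) on it directly returns the internal level codes rather than the values shown in the labels.

```r
f <- factor(c("10", "30", "20", "30"))

levels(f)                      # "10" "20" "30"
as.numeric(f)                  # 1 3 2 3: internal codes, not the values
as.numeric(as.character(f))    # 10 30 20 30: the values in the labels
```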
  • 75. Using:
      m1 <- matrix(1:12, ncol = 3)
    Sum of columns (one value for each column):
      colSums(m1)
      [1] 10 26 42
    or
      apply(m1,2,sum)
      [1] 10 26 42
  • 76. Sum of rows (one value for each row):
      rowSums(m1)
      [1] 15 18 21 24
    or
      apply(m1,1,sum)
      [1] 15 18 21 24
    May use any function, even your own.
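To illustrate using your own function with apply( ), here is a sketch with a hypothetical helper, amplitude, that returns the difference between the largest and smallest value. The helper name and definition are not from the slides.

```r
m1 <- matrix(1:12, ncol = 3)

# Hypothetical helper: range width of a vector
amplitude <- function(v) max(v) - min(v)

apply(m1, 1, amplitude)   # 8 8 8 8: one value per row
apply(m1, 2, amplitude)   # 3 3 3: one value per column
```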
  • 77. aggregate(salario ~ instrucao, data = milsa, mean)
        instrucao   salario
      1    1oGrau  7.836667
      2    2oGrau 11.528333
      3  Superior 16.475000
  • 78. aggregate(salario ~ instrucao + civil, data = milsa, mean)
        instrucao    civil   salario
      1    1oGrau   casado  7.044000
      2    2oGrau   casado 12.825000
      3  Superior   casado 17.783333
      4    1oGrau solteiro  8.402857
      5    2oGrau solteiro  8.935000
      6  Superior solteiro 15.166667
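The formula interface of aggregate( ) reads as "summarize the left side for each group on the right side". A sketch on invented data with the same column names, with tapply( ) shown as one alternative way to get the same group means:

```r
# Hypothetical data, not the milsa table
df <- data.frame(
  instrucao = c("1oGrau", "1oGrau", "Superior", "Superior"),
  salario   = c(6, 8, 15, 17)
)

res <- aggregate(salario ~ instrucao, data = df, FUN = mean)
res                                              # 1oGrau 7, Superior 16

by_group <- tapply(df$salario, df$instrucao, mean)
by_group[["Superior"]]                           # 16
```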
  • 79. model = lm(formula = salario ~ ano + instrucao, data = milsa)
      summary(model)
    Just one line!!!
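Since milsa is not available here, the same lm( ) call can be sketched on synthetic data. Everything in df below is made up: salaries are generated as 2 + 0.3 * ano, plus 5 for "Superior", plus a small fixed perturbation, so the fitted coefficients should land close to those values.

```r
# Synthetic data with a known relationship (assumed for illustration)
ano       <- 20:49
instrucao <- rep(c("1oGrau", "Superior"), 15)
noise     <- rep(c(0.2, -0.2, -0.2, 0.2), length.out = 30)
salario   <- 2 + 0.3 * ano + 5 * (instrucao == "Superior") + noise
df <- data.frame(salario, ano, instrucao)

model <- lm(formula = salario ~ ano + instrucao, data = df)
b <- coef(model)   # (Intercept), ano, instrucaoSuperior
b
summary(model)     # full table with standard errors and R-squared
```

The text column instrucao is treated as a categorical predictor, so the model reports one coefficient for "Superior" relative to the first level, "1oGrau".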
  • 80. Prof. Dr. Roberto Dantas de Pinho, roberto.pinho@mct.gov.br. This presentation is based on courses by Dr. Paulo Justiniano Ribeiro Jr (UFPR) & Dr. Cosme Marcelo Furtado Passos da Silva (FIOCRUZ).