R: THE TRUE BASICS
R, a language and environment for statistical computing and graphics, was developed by Ross
Ihaka and Robert Gentleman at the University of Auckland in the nineties. It is
considered an open source implementation of the S language, which was
developed by John Chambers at Bell Laboratories in the eighties.
R provides a wide variety of statistical techniques and visualization capabilities.
Another very important feature of R is that it is highly extensible. Because
of this, and more importantly because R is open source, it became the
vehicle that brought the power of S to a larger community. Like every
programming language, R has its pros and cons.
ADVANTAGES:
1) It is open source and free.
2) Master at graphics
3) Command-line interface
4) Reproducibility through R scripts
5) R packages: Extensions of R
DISADVANTAGES:
1) Easy to learn, harder to master
2) Poorly written code hard to read/maintain
3) Command-line interface daunting at first
4) Poorly written code is slow
The first stop in R is one of its most important components, and the place where
most of the action happens: the R console. It's a place where you can
execute R commands. You simply type something at the prompt in the console,
hit Enter, and R interprets and executes your command.
Let's start our experiments by having R do some basic arithmetic; we'll
calculate the sum of 1 and 2. We simply type 1 + 2 in the console and hit Enter.
R interprets what you typed, calculates the result and prints that result as a
numerical value.
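Typed at the console (the `>` prompt is left out here), the sum looks like this. The `[1]` in the printed output is simply R's index of the first element shown.

```r
# Add two numbers; R evaluates the expression and prints the result
1 + 2
# [1] 3
```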
Now let's try to type some text in the console. We use double quotes for this.
You can also simply type a number and hit Enter. R understands your character
string and numerical value, and simply prints them as output. This
brings me to the first super important concept in R: the variable. A variable
allows you to store a value or an object in R. You can then later use this
variable's name to easily access the value or the object that is stored within
this variable. You can use the less-than sign followed by a dash, <-, to create a
variable.
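A small sketch of the steps just described. The string "Hello, R", the number 42 and the variable name `greeting` are arbitrary examples chosen for illustration.

```r
# A character string in double quotes is printed back as-is
"Hello, R"
# [1] "Hello, R"

# A bare number is printed back as a numeric value
42
# [1] 42

# The assignment operator <- stores a value under a name (greeting is hypothetical)
greeting <- "Hello, R"
greeting
# [1] "Hello, R"
```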
Suppose the number 2 is the height of a rectangle. Let's assign this value 2 to a
variable height. We type height, less-than sign, dash, 2, that is, height <- 2. This time, R does not
print anything, because it assumes that you will be using this variable in the
future. If we now simply type and execute height in the console, R returns 2.
We can do a similar thing for the width of our imaginary rectangle. We assign
the value 4 to a variable width. If we type width, we see that indeed, it
contains the value 4. As you're assigning variables in the R console, you're
actually accumulating an R workspace. It's the place where variables and
information are stored in R.
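The rectangle example from the paragraph above, written out as console input:

```r
# Store the rectangle's dimensions; an assignment prints nothing
height <- 2
width <- 4

# Typing a variable's name prints the value stored in it
height
# [1] 2
width
# [1] 4
```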
You can access the objects in the workspace with the ls() function. Simply type
ls followed by empty parentheses and hit Enter. This shows you a list of all the
variables you have created in the R session. If you have followed all the
examples up to now, you should see "height" and "width". This tells you that
there are two objects in your workspace at the moment. When you type height
in the console, R looks for the variable height in the workspace, finds it, and
prints the corresponding value. If, however, we try to print a non-existing
variable, depth for example, R throws an error, because depth is not defined in
the workspace and thus not found. The principle of accumulating a workspace
through variable assignment makes these variables available for further use.
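A sketch of inspecting the workspace. In the console you would simply type `depth` and see the error; here the lookup is wrapped in `tryCatch()` (an addition for this example) only so that the snippet keeps running and we can print the error message.

```r
height <- 2
width <- 4

# List the objects in the workspace; in a fresh session: "height" and "width"
ls()

# Looking up an undefined variable raises an error
tryCatch(depth, error = function(e) conditionMessage(e))
# [1] "object 'depth' not found"
```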
Suppose we want to find out the area of our imaginary rectangle, which is
height multiplied by width. Let's go ahead and type height asterisk width, that is, height * width. The
result is 8, as you'd expect. We can take it one step further and also assign the
result of this calculation to a new variable, area. We again use the assignment
operator. If you now type area, you'll see that it contains 8 as well. Inspecting
the workspace again with ls() shows that the workspace now contains three objects:
area, height and width.
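The area calculation and the follow-up workspace check, as they would be typed at the console:

```r
height <- 2
width <- 4

# Multiply the two variables directly...
height * width
# [1] 8

# ...or store the result in a new variable
area <- height * width
area
# [1] 8

# In a fresh session, ls() now lists "area", "height" and "width"
ls()
```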
Basic Data Types
Let's look at R's fundamental data types, also called atomic vector types. Throughout our
experiments, we will use the function class(). This is a useful way to see what
type a variable is. Let's head over to the console and start with TRUE, in capital
letters. TRUE is a logical. That's also what class(TRUE) tells us. Logicals are so-called
Boolean values, and can be either `TRUE` or `FALSE`. Well, actually, `NA`,
which denotes missing values, is also a logical.
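Checking the logical type with `class()`, including the special value `NA`:

```r
class(TRUE)
# [1] "logical"
class(FALSE)
# [1] "logical"

# NA marks a missing value and is a logical too
class(NA)
# [1] "logical"
```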
Next up are numerics, such as 2 or 2.5; class(2) returns "numeric".
We can perform all sorts of operations on them, such as addition, subtraction,
multiplication, division and many more. A special type of numeric is the
integer. It is a way to represent whole numbers like 1 and 2. To specify that a
number is an integer, you can add a capital L to it, as in 2L. We don't see the difference
between the integer 2 and the numeric 2 from the output. However, the
`class()` function reveals the difference. Instead of asking for the class of a
variable, you can also use the is.*() functions to see whether variables are
actually of a certain type. To see if a variable is numeric, we can use the
is.numeric() function. It appears that both are numeric.
To see if a variable is an integer, we can use is.integer(). This shows us that
integers are numeric, but that not all numerics are integers, so there's some
kind of type hierarchy going on here. Last but not least, there's the character
string. The class of this type of object is "character".
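A sketch of the numeric, integer and character checks described above; the string "I love data science!" is just an example value.

```r
# Numerics
class(2)
# [1] "numeric"
2.5 * 2
# [1] 5

# A trailing capital L makes an integer
class(2L)
# [1] "integer"

# is.numeric() accepts both; is.integer() only accepts the integer
is.numeric(2)
# [1] TRUE
is.numeric(2L)
# [1] TRUE
is.integer(2)
# [1] FALSE
is.integer(2L)
# [1] TRUE

# Character strings
class("I love data science!")
# [1] "character"
```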
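It's important to note that there are other data types in R, such as complex for
handling complex numbers and raw to store raw bytes. Note also that double is not a
separate, higher-precision kind of numeric: it is the storage type R uses for numerics,
which are double-precision floating-point numbers.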
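`typeof()` reports the internal storage type, which makes the relationship between numeric and double visible. The values below are arbitrary examples.

```r
# Numerics are stored as double-precision floating-point numbers
typeof(2)
# [1] "double"
typeof(2L)
# [1] "integer"

# Complex numbers use the i suffix
class(1 + 2i)
# [1] "complex"

# Raw values hold bytes and print in hexadecimal
as.raw(255)
# [1] ff
class(as.raw(255))
# [1] "raw"
```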
Vectors
1) Create and name vectors:
A vector is nothing more than a sequence of data elements of the _same_
basic data type.
First things first: creating a vector in R! You use the `c()` function for this, which
allows us to combine values into a vector. Suppose you're playing a basic card
game, and record the suit of 5 cards you draw from a deck. A possible outcome
can be stored in a character vector containing the five suits. Of course,
we could also assign this character vector to a new variable, drawn_suits,
for example. We now have a character vector, drawn_suits. We can assert that
it is a vector by typing is.vector(drawn_suits).
Likewise, we could create a vector of integers, for example to store how many
cards of each suit remain after we drew the 5 cards.
Let's call this vector remain. There are 11 more spades, 12 more hearts, 11
diamonds, and all 13 clubs still remain.
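A sketch of both vectors. The exact order of the drawn suits is an assumption for illustration; it is chosen so that it agrees with the remaining counts (2 spades, 1 heart, 2 diamonds and no clubs drawn).

```r
# One possible draw of five cards (hypothetical order)
drawn_suits <- c("hearts", "spades", "diamonds", "diamonds", "spades")
is.vector(drawn_suits)
# [1] TRUE

# Cards left per suit: spades, hearts, diamonds, clubs
# (the L suffix makes these integers)
remain <- c(11L, 12L, 11L, 13L)
class(remain)
# [1] "integer"

# 52 cards minus the 5 drawn
sum(remain)
# [1] 47
```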
To give names to the elements of a vector, we can use the `names()` function. Let's first create another character
vector, `suits`, that contains the strings "spades", "hearts", "diamonds", and
"clubs", the names we want to give our vector elements.
2) Vector Arithmetic:
We learned that we can use variables to perform arithmetic. Remember how
you summed apples and oranges? From the previous section, we also know
that these variables, `my_apples` and `my_oranges`, are actually simply
vectors. This means that we can perform arithmetic with vectors in R.
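A quick check that a single number is already a vector. The values 5 and 6 are hypothetical; the original values of `my_apples` and `my_oranges` are not shown here.

```r
# Hypothetical values
my_apples <- 5
my_oranges <- 6

# A single number is a vector of length 1
is.vector(my_apples)
# [1] TRUE
length(my_apples)
# [1] 1

my_apples + my_oranges
# [1] 11
```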
The most important thing to remember about operations with vectors in R is
that they are applied element by element. This means that standard
mathematics is extended to vectors in an element-wise fashion.
Imagine you have a vector containing your gambling earnings for the past 3
days. Not bad for a few days in the desert, is it? Imagine a well-dressed
gentleman approaches you and offers to triple your earnings for the past three
days if you beat him in one round of poker. If you want to calculate the
promised earnings for each of the past three days, you can easily do it in R.
R multiplies each element in the `earnings` vector by 3,
resulting in 150 dollars of promised earnings on the first day, 300 on the second
day and 90 on the third day.
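The earnings values below (50, 100 and 30 dollars) are inferred from the tripled amounts given in the text; multiplying the vector by a single number scales every element.

```r
# Earnings for the past three days, in dollars
earnings <- c(50, 100, 30)

# Each element is multiplied by 3
earnings * 3
# [1] 150 300  90
```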
Likewise, division, subtraction, addition and many more operations are all carried out
element-wise, just as if you were carrying out the operation between two scalars
three times. These operations look no different from
what we've done before, because, of course, you were working with vectors all
along. The mathematics naturally extends to vectors that contain more than
one element. Let's go back to your Vegas adventures. To enjoy your earnings,
you also decided to go shopping and spend some money every day on the Las
Vegas Strip. You recorded a vector of expenses.
Because you are a very conscientious programmer in training, you decide to
compute whether your luck in the casino was sufficient to pay for your
expenses.
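A sketch of that check. The expense amounts are hypothetical, since the original values are not given; operations between two vectors of the same length pair up their elements day by day.

```r
earnings <- c(50, 100, 30)
# Hypothetical daily expenses on the Strip
expenses <- c(30, 40, 80)

# Element-wise subtraction: net result per day
earnings - expenses
# [1]  20  60 -50

# Element-wise comparison: did earnings cover expenses each day?
earnings > expenses
# [1]  TRUE  TRUE FALSE

# Net result over all three days
sum(earnings - expenses)
# [1] 30
```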
MATRICES
Creating and naming matrices:
A matrix is kind of like the big brother of the vector. Where a vector is a
sequence of data elements, which is one-dimensional, a matrix is a similar
collection of data elements, but this time arranged into a fixed number of rows
and columns. Since we are only working with rows and columns, a matrix is
called two-dimensional.
A matrix can contain only one atomic vector type. This means that you can't
have logical and numeric values in a matrix, for example. There's really not much
more theory about matrices than this: a matrix is a natural extension of the
vector, going from one to two dimensions. Of course, this has its implications
for manipulating and subsetting matrices, but let's start with simply creating
and naming them. To build a matrix, you use the `matrix()` function. Most
importantly, it needs a vector containing the values you want to place in the
matrix, and at least one matrix dimension. You can choose to specify the
number of rows or the number of columns. Consider an
example that creates a 2-by-3 matrix containing the values 1 to 6, by
specifying the vector and setting the `nrow` argument to 2. R sees that the input
vector has length 6 and that there have to be two rows. It then infers that
you'll probably want 3 columns, such that the number of matrix elements
matches the number of input vector elements.
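The 2-by-3 example from the paragraph above. By default `matrix()` fills the values column by column.

```r
# 2-by-3 matrix holding 1 to 6; the number of columns is inferred
m <- matrix(1:6, nrow = 2)
m
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

dim(m)
# [1] 2 3
```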
If you prefer to fill up the matrix in a row-wise fashion, such
that the 1, 2 and 3 are in the first row, you can set the `byrow` argument of
`matrix()` to `TRUE`. Remember how R did recycling
when you were subsetting vectors using logical vectors? The same thing
happens when you pass the matrix function a vector that is too short to fill up
the entire matrix. Suppose you pass a vector containing the values 1 to 3 to the
matrix function, and explicitly say you want a matrix with 2 rows and 3
columns: R fills up the matrix column by column and simply repeats the vector.
If you try to fill up the matrix with a vector whose length does not fit nicely into
the matrix, for example when you want to put a 4-element vector in a
6-element matrix, R generates a warning message.
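A sketch of the three cases: row-wise filling, clean recycling, and recycling that triggers a warning. In the last case R still returns a matrix; it only warns that the data length does not divide evenly.

```r
# Fill row by row instead of column by column
matrix(1:6, nrow = 2, byrow = TRUE)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

# A 3-element vector is recycled to fill 6 cells, column by column
matrix(1:3, nrow = 2, ncol = 3)
#      [,1] [,2] [,3]
# [1,]    1    3    2
# [2,]    2    1    3

# A 4-element vector does not fit 6 cells evenly, so R emits a warning
matrix(1:4, nrow = 2, ncol = 3)
```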
Actually, apart from the `matrix()` function, there's yet another easy way to
create matrices that is more intuitive in some cases. You can combine vectors
using the `cbind()` and `rbind()` functions.
`cbind()`, short for column bind, takes the vectors you pass it, and sticks them
together as if they were columns of a matrix. The `rbind()` function, short for
row bind, does the same thing but takes the inputs as rows and makes a matrix
out of them. These functions can come in pretty handy, because they're often
easier to use than the `matrix()` function.
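Binding the same two vectors both ways shows the difference; the input vectors are arbitrary examples.

```r
# Vectors become columns
cbind(1:3, 1:3)
#      [,1] [,2]
# [1,]    1    1
# [2,]    2    2
# [3,]    3    3

# Vectors become rows
rbind(1:3, 1:3)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    1    2    3
```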
If you want to add another row containing the values 7, 8 and 9 to a matrix, you can
simply pass the matrix and the new row to `rbind()`. You can do a similar thing with
`cbind()` to add a column. Next up is naming the
matrix. In the case of vectors, you simply used the names() function, but in the
case of matrices, you can assign names to both columns and rows. That's
why R provides the rownames() and colnames() functions. Their use is
pretty straightforward. Taking the matrix `m` from before,
we can set the row names just the same way as we named vectors, but this
time with the `rownames()` function.
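A sketch of extending and naming a matrix. It assumes `m` was built as `rbind(1:3, 4:6)`, since its original definition is not shown; the extra column values 10 and 11 and the names "row1", "col1" and so on are chosen for illustration.

```r
# Assumed starting matrix
m <- rbind(1:3, 4:6)

# Add a row with the values 7, 8 and 9
rbind(m, 7:9)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
# [3,]    7    8    9

# Add a column (example values)
cbind(m, c(10, 11))
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3   10
# [2,]    4    5    6   11

# Name the rows and columns (hypothetical names)
rownames(m) <- c("row1", "row2")
colnames(m) <- c("col1", "col2", "col3")
m
#      col1 col2 col3
# row1    1    2    3
# row2    4    5    6
```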