Seminar Report’16 Statistical Computation Using R
Department of Computer Applications MESCE, Kuttippuram
STATISTICAL COMPUTATION USING R
A Seminar Report
submitted in partial fulfillment of the requirements
for the award of the Degree of
MASTER OF COMPUTER APPLICATIONS
under the
UNIVERSITY OF CALICUT
by
KAMARUDHEEN KV
Register No: MKANMCA018
DEPARTMENT OF COMPUTER APPLICATIONS
MES COLLEGE OF ENGINEERING,
KUTTIPPURAM, MALAPPURAM- 679 573
April-2016
MES COLLEGE OF ENGINEERING
KUTTIPPURAM, KERALA -679573
(AN ISO 9001: 2008 CERTIFIED INSTITUTION & WITH NBA ACCREDITED DEPARTMENTS,
APPROVED BY AICTE AND AFFILIATED TO THE UNIVERSITY OF CALICUT)
DEPARTMENT OF COMPUTER APPLICATIONS
C E R T I F I C A T E
This is to certify that the report entitled STATISTICAL COMPUTATION
USING R has been prepared and presented by Mr. KAMARUDHEEN KV (Register
No: MKANMCA018), fifth-semester student of the department, during the academic
year 2015-16, in partial fulfillment of the requirements for the award of the Degree of Master
of Computer Applications under the University of Calicut.
Staff in Charge Head of the Department
Date:
ACKNOWLEDGEMENT
My endeavor stands incomplete without dedicating my gratitude to a few people who
have contributed towards the successful completion of my seminar. I pay my gratitude to
the Almighty for His invisible help and blessings in the fulfillment of this work. At the outset,
I express my heartfelt thanks to our Head of the Department, Prof. Hyderali K, for
permitting me to present this seminar.
I take this opportunity to express my profound gratitude to Mr. Pradeep Uduppa, our
group tutor, for his valuable support and help in presenting my seminar.
I am also grateful to all our teaching and non-teaching staff for their encouragement,
guidance and wholehearted support.
Last but not least, I am deeply indebted to my family and friends, who gave me
precious help in presenting my seminar.
Sincerely,
KAMARUDHEEN KV
MKANMCA018
SYNOPSIS
The rapid and sustained increases in computing power starting from the second half of the
20th century have had a substantial impact on the practice of statistical science. Early
statistical models were almost always from the class of linear models, but powerful
computers, coupled with suitable numerical algorithms, caused an increased interest
in nonlinear models (such as neural networks) as well as the creation of new types, such
as generalized linear models and multilevel models.
R is rapidly becoming the leading language in data science and statistics. It is a
programming language and software environment for statistical computing and graphics
supported by the R Foundation for Statistical Computing. The R language is widely used
among statisticians and data miners for developing statistical software and data analysis.
R is an implementation of the S programming language combined with lexical scoping
semantics inspired by Scheme. S was created by John Chambers while at Bell Labs.
There are some important differences, but much of the code written for S runs unaltered in R.
TABLE OF CONTENTS
1. INTRODUCTION
2. R AS A STATISTICAL SOFTWARE
2.1 Programming Features
3. ADVANTAGES OVER OTHER STATISTICAL TOOLS
4. R PRELIMINARIES
4.1 Common Operators
5. R LANGUAGE ESSENTIALS
5.1 Expressions and Objects
5.2 Functions and Arguments
5.3 Vectors
5.4 Matrices and Arrays
5.5 Lists
5.6 Data Frames
5.7 Indexing
5.8 Commonly used methods of data input
6. GRAPHICS
6.1 Standard Plots
7. CONCLUSION
8. REFERENCES
1. INTRODUCTION
The R system for statistical computing is an environment for data analysis and graphics.
The root of R is the S language, developed by John Chambers and colleagues (Becker et
al., 1988, Chambers and Hastie, 1992, Chambers, 1998) at Bell Laboratories (formerly
AT&T, now owned by Lucent Technologies) starting in the 1970s. The S language was
designed and developed as a programming language for data analysis tasks, but in its
current implementations it is in fact a full-featured programming language. The development of
the R system for statistical computing is heavily influenced by the open source idea: the
base distribution of R and a large number of user-contributed extensions are available
in source code form under the terms of the Free Software Foundation's GNU General Public
License. This license has two major implications for the data analyst working with R.
First, the complete source code is available, and thus the practitioner can investigate the details
of the implementation of a particular method, make changes and distribute
modifications to colleagues. Second, as a side effect, the R system for statistical computing is
available to everyone. All scientists, including in particular those working in developing
countries, have access to state-of-the-art tools for statistical data analysis without
additional costs. The R system itself, a collection of add-on packages, manuals,
documentation and more are all freely available.
The fact that R is based on a formal computer language gives it tremendous flexibility.
Other systems present simpler interfaces in terms of menus and forms, but often the
apparent user friendliness turns into a hindrance in the longer run. Although elementary
statistics is often presented as a collection of fixed procedures, analysis of moderately
complex data requires ad hoc statistical model building, which makes the added
flexibility of R highly desirable.
2. R AS A STATISTICAL SOFTWARE
R and its libraries implement a wide variety of statistical and graphical techniques,
including linear and nonlinear modeling, classical statistical tests, time-series analysis,
classification, clustering, and others. R is easily extensible through functions and
extensions, and the R community is noted for its active contributions in terms of
packages. Many of R's standard functions are written in R itself, which makes it easy for
users to follow the algorithmic choices made. For computationally intensive tasks, C,
C++, and Fortran code can be linked and called at run time. Advanced users can write C,
C++, Java, .NET or Python code to manipulate R objects directly.
R is highly extensible through the use of user-submitted packages for specific functions
or specific areas of study. Due to its S heritage, R has stronger object-oriented
programming facilities than most statistical computing languages. Extending R is also
eased by its lexical scoping rules. Another strength of R is static graphics, which can
produce publication-quality graphs, including mathematical symbols. Dynamic and
interactive graphics are available through additional packages.
R has its own LaTeX-like documentation format, which is used to supply comprehensive
documentation, both on-line in a number of formats and in hard copy.
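The lexical scoping rules mentioned above mean that a function can use variables from the environment in which it was defined. As a minimal sketch (make_counter is a hypothetical name chosen for this example, not a base R function), a function can return another function that keeps private state between calls:

```r
# make_counter() is a hypothetical helper written only for this illustration.
make_counter <- function() {
  count <- 0                 # stored in the environment created by this call
  function() {
    count <<- count + 1      # <<- updates 'count' in the enclosing environment
    count
  }
}

counter <- make_counter()
counter()                    # 1
counter()                    # 2

other <- make_counter()      # a second counter has its own, separate 'count'
other()                      # 1
```

Each call to make_counter() creates a fresh environment, which is why counter and other do not interfere with each other.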
2.1 Programming features of R
R is an interpreted language; users typically access it through a command-line interpreter.
If a user types 2+2 at the R command prompt and presses enter, the computer replies with
4, as shown below:
> 2+2
[1] 4
R's data structures include vectors, matrices, arrays, data frames (similar to tables in a
relational database) and lists. R's extensible object system includes objects for (among
others) regression models, time series and geospatial coordinates. R has no separate
scalar data type; instead, a scalar is represented as a vector of length one.
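This vector-of-length-one behavior can be checked directly at the prompt. The short sketch below uses only base R functions:

```r
x <- 5                 # what looks like a scalar...
length(x)              # 1: ...is a vector with one element
is.vector(x)           # TRUE
x[1]                   # 5: indexing works exactly as for longer vectors

y <- c(x, 6, 7)        # combining it with other values gives a longer vector
length(y)              # 3
```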
R supports procedural programming with functions and, for some functions, object-
oriented programming with generic functions. A generic function acts differently
depending on the type of arguments passed to it. In other words, the generic function
dispatches the function (method) specific to that type of object. For example, R has a
generic print function that can print almost every type of object in R with a simple
print(objectname) syntax.
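Here is a minimal sketch of generic dispatch. It assumes a made-up class called "temperature"; neither the class nor its method exists in R itself, and both are defined here only for illustration. Defining print.temperature() is enough for print(), and for automatic printing at the prompt, to pick it up:

```r
# An object of a hypothetical class "temperature"
temp <- structure(list(value = 21.5, unit = "C"), class = "temperature")

# A method for the generic print(); R dispatches to it for this class
print.temperature <- function(x, ...) {
  cat("Temperature:", x$value, x$unit, "\n")
  invisible(x)         # print methods conventionally return their argument invisibly
}

print(temp)            # Temperature: 21.5 C
temp                   # auto-printing also dispatches to print.temperature()
print(1:3)             # other objects still use the default method
```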
3. ADVANTAGES OVER OTHER STATISTICAL TOOLS
In R, statistical analyses are normally done as a series of steps, with intermediate results
being stored in objects, where the objects are later “interrogated” for the information of
interest. This is in contrast to other widely used programs (e.g., SAS and SPSS), which
print a large amount of output to the screen. Storing the results in objects so that
information can be retrieved at later times allows for easily using the results of one
analysis as input for another analysis. Furthermore, because the objects contain all
pertinent model information, model modification can be easily performed by
manipulation of the objects, a valuable benefit in many cases. R packages for new
innovations in statistical computing also tend to become available more quickly than do
such developments in other statistical software packages.
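The object-based workflow described above can be sketched with a linear regression. The example assumes the cars data set that ships with R (package datasets) and uses the standard functions lm(), coef(), summary(), update() and anova(). The choice of data and model is purely illustrative:

```r
# 'cars' ships with R (package 'datasets'): speed and stopping distance of 50 cars.
fit <- lm(dist ~ speed, data = cars)  # the fitted model is stored in an object

class(fit)                 # "lm"
coef(fit)                  # interrogate the object: intercept and slope
summary(fit)$r.squared     # pull a single number out of the summary

# Modify the stored model instead of re-typing the whole call:
fit0 <- update(fit, . ~ 1) # intercept-only model, dist ~ 1
anova(fit0, fit)           # compare the reduced and the full model
```

Because fit holds the complete model, the reduced model is obtained with update() rather than by repeating the original call.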
Using R requires a more thoughtful approach to data analysis than does using some other
programs, but this reflects the design of the S language, in which the user interacts with
the data, as opposed to a “shotgun” approach, where the computer program provides
everything thought to be relevant to the particular problem. For those who want to stay on
the cutting edge of statistical developments, R is a natural choice. The flexibility of R is
arguably unmatched by any other statistics program, as its object-oriented programming
language allows for the creation of functions that perform customized procedures and/or
the automation of tasks that are commonly performed.
4. R PRELIMINARIES
Expressions are entered directly into an R session at the prompt, which is generally
denoted by the symbol >. The number sign (#) is used for comments; anything that
follows a number sign on a line is ignored.
4.1 Common Operators
4.1.1 Assignment Operator
The operator <- is the assignment operator (assign what is on the right to the object on
the left), as is -> (assign what is on the left to the object on the right).
Eg: x<-2 Assigns the value 2 to the object x
x^2->y Assigns the value x^2 to the object y
4.1.2 Arithmetic Operators
+ Addition
- Subtraction
* Multiplication
/ Division
^ Exponentiation
4.1.3 Relational Operators
< Less than
> Greater than
<= Less than or equal to
>= Greater than or equal to
== Equal to
!= Not equal to
4.1.4 Logical Operators
! NOT
& AND
| OR
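The operators listed above can be tried together in one short base R sketch. The values in the comments follow from x being 7 and y being 14:

```r
x <- 7                 # leftward assignment
x * 2 -> y             # rightward assignment: y is now 14

x + 3                  # 10
x - 3                  # 4
x / 2                  # 3.5
x ^ 2                  # 49

x > 5                  # TRUE
x == 7                 # TRUE
x != 7                 # FALSE

!(x > 5)               # FALSE
(x > 5) & (y < 10)     # FALSE, because y is 14
(x > 5) | (y < 10)     # TRUE

# Operators are vectorised: they act element by element
c(1, 5, 9) > 4         # FALSE TRUE TRUE
```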
5. R LANGUAGE ESSENTIALS
This section outlines the basic aspects of the R language. It is necessary to do this in a
slightly superficial manner, with some of the finer points glossed over. The emphasis is
on items that are useful to know in interactive usage as opposed to actual programming.
5.1 Expressions and Objects
The basic interaction mode in R is one of expression evaluation. The user enters an
expression; the system evaluates it and prints the result. Some expressions are evaluated
not for their result but for side effects such as putting up a graphics window or writing to
a file. All R expressions return a value (possibly NULL), but sometimes it is “invisible”
and not printed. Expressions typically involve variable references, operators such as +,
and function calls, as well as some other items that have not been introduced yet.
Expressions work on objects. This is an abstract term for anything that can be assigned to
a variable. R contains several different types of objects.
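Invisible results and object types can both be observed with a few lines of base R. In this sketch f is a hypothetical function written only to return an invisible value:

```r
# A value returned invisibly is not printed, but it is still returned
f <- function() invisible(42)   # hypothetical example function
f()                             # prints nothing
v <- f()                        # ...yet the value can be assigned
v                               # 42
withVisible(f())$visible        # FALSE

# Anything that can be assigned to a variable is an object with a type
typeof(3.14)                    # "double"
typeof("text")                  # "character"
typeof(TRUE)                    # "logical"
typeof(list(1, "a"))            # "list"
typeof(mean)                    # "closure": functions are objects too
```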
5.2 Functions and Arguments
Many things in R are done using function calls, commands that look like an application of
a mathematical function of one or several variables; for example, log(x) or plot(height,
weight). The format is that a function name is followed by a set of parentheses containing
one or more arguments. For instance, in plot(height,weight) the function name is plot and
the arguments are height and weight. These are the actual arguments, which apply only
to the current call. A function also has formal arguments, which get connected to actual
arguments in the call.
When you write plot(height, weight), R assumes that the first argument corresponds to the
x-variable and the second one to the y-variable. This is known as positional matching.
Positional matching becomes unwieldy when a function has many arguments. Fortunately,
R offers alternatives: most arguments have sensible defaults and can be omitted in the
standard cases, and there are nonpositional ways of specifying them when you need to
depart from the default settings, namely by giving the argument's name, as in
plot(height, weight, pch=2).
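Positional and named matching can be compared with a small user-defined function. In the sketch below raise_to is a hypothetical name, and its argument p is given a default value of 2:

```r
# raise_to() is a hypothetical function; p defaults to 2
raise_to <- function(x, p = 2) x^p

raise_to(3)                     # 9:  p takes its default value
raise_to(3, 3)                  # 27: positional matching, x = 3 and p = 3
raise_to(p = 3, x = 2)          # 8:  named matching, so order does not matter

# The same rules apply to built-in functions such as seq():
seq(1, 10, by = 3)              # 1 4 7 10
seq(by = 3, to = 10, from = 1)  # same result
```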
5.3 Vectors
A character vector is a vector of text strings, whose elements are specified and printed in
quotes:
> c("Huey","Dewey","Louie")
[1] "Huey" "Dewey" "Louie"
It does not matter whether you use single or double quotes, as long as the left quote
matches the right quote:
> c('Huey','Dewey','Louie')
[1] "Huey" "Dewey" "Louie"
Logical vectors are constructed using the c function just like the other vector types:
> c(T,T,F,T)
[1] TRUE TRUE FALSE TRUE
In practice, logical vectors are rarely typed in like this; it is much more common to use
single logical values to turn an option on or off in a function call.
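In practice, logical vectors usually arise from comparisons rather than from typing T and F. The short sketch below uses made-up weights purely for illustration:

```r
weight <- c(60, 72, 57, 90, 95, 72)  # made-up values for illustration
weight > 70                          # FALSE TRUE FALSE TRUE TRUE TRUE
sum(weight > 70)                     # 4: TRUE counts as 1, FALSE as 0

# A single logical value switching an option in a function call
mean(c(1, NA, 3), na.rm = TRUE)      # 2

# TRUE and FALSE are reserved words, whereas T and F are ordinary
# variables that can be overwritten, so TRUE/FALSE are safer in scripts.
```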
5.4 Matrices and Arrays
A matrix in mathematics is just a two-dimensional array of numbers. Matrices are used
for many purposes in theoretical and practical statistics. However, matrices and also
higher-dimensional arrays do get used for simpler purposes as well, mainly to hold tables.
In R, the matrix notion is extended to elements of any type. Matrices and arrays are
represented as vectors with dimensions:
> x <- 1:12
> dim(x) <- c(3,4)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
The dim assignment function sets or changes the dimension attribute of x, causing R to
treat the vector of 12 numbers as a 3 × 4 matrix. Notice that the storage is column-major;
that is, the elements of the first column are followed by those of the second.
A convenient way to create matrices is to use the matrix function:
> matrix(1:12,nrow=3,byrow=T)
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
Notice how the byrow=T switch causes the matrix to be filled in a row-wise fashion rather
than column-wise. Useful functions that operate on matrices include rownames, colnames,
and the transposition function t (notice the lowercase t as opposed to uppercase T for
TRUE), which turns rows into columns and vice versa.
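The matrix functions named above can be sketched on a small 2 × 3 matrix. The names a, b, x, y and z are arbitrary labels chosen for the example:

```r
m <- matrix(1:6, nrow = 2)        # filled column by column
rownames(m) <- c("a", "b")
colnames(m) <- c("x", "y", "z")
m
#   x y z
# a 1 3 5
# b 2 4 6

m["b", "z"]                       # 6: elements can be selected by name
t(m)                              # transpose: rows x, y, z and columns a, b
dim(t(m))                         # 3 2

# Higher-dimensional arrays follow the same column-major storage
a <- array(1:24, dim = c(2, 3, 4))
a[1, 1, 2]                        # 7: the second 2 x 3 slice starts at element 7
```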
5.5 Lists
It is sometimes useful to combine a collection of objects into a larger composite object.
This can be done using lists. You can construct a list from its components with the
function list.
As an example, consider a set of data concerning pre- and postmenstrual energy intake in
a group of women. We can place these data in two vectors as follows:
> intake.pre <- c(5260,5470,5640,6180,6390,6515,6805,7515,7515,8230,8770)
> intake.post <- c(3910,4220,3885,5160,5645,4680,5265,5975,6790,6900,7335)
To combine these individual vectors into a list:
> mylist <- list(before=intake.pre,after=intake.post)
> mylist
$before
[1] 5260 5470 5640 6180 6390 6515 6805 7515 7515 8230 8770
$after
[1] 3910 4220 3885 5160 5645 4680 5265 5975 6790 6900 7335
The components of the list are named according to the argument names used in list.
Named components may be extracted like this:
> mylist$before
[1] 5260 5470 5640 6180 6390 6515 6805 7515 7515 8230 8770
Many of R's built-in functions compute more than a single vector of values and return
their results in the form of a list.
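As an illustration of a built-in function that returns a list, the sketch below runs a paired t test on the intake data with t.test() from the stats package and then extracts components from the result. The choice of test is ours and is not part of the text above:

```r
intake.pre  <- c(5260,5470,5640,6180,6390,6515,6805,7515,7515,8230,8770)
intake.post <- c(3910,4220,3885,5160,5645,4680,5265,5975,6790,6900,7335)

# The result is a list with class "htest"
res <- t.test(intake.pre, intake.post, paired = TRUE)
is.list(res)           # TRUE
names(res)             # the components that can be extracted
res$estimate           # mean of the pairwise differences
res$p.value

# [[ ]] extracts one component; [ ] returns a smaller list
mylist <- list(before = intake.pre, after = intake.post)
mylist[["after"]][1]   # 3910
class(mylist["after"]) # "list"
```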
5.6 Data Frames
A data frame corresponds to what other statistical packages call a “data matrix” or a “data
set”. It is a list of vectors and/or factors of the same length that are related “across” such
that data in the same position come from the same experimental unit (subject, animal,
etc.). In addition, it has a unique set of row names. We can create data frames from
preexisting variables:
> d <- data.frame(intake.pre,intake.post)
> d
   intake.pre intake.post
1        5260        3910
2        5470        4220
3        5640        3885
4        6180        5160
5        6390        5645
6        6515        4680
7        6805        5265
8        7515        5975
9        7515        6790
10       8230        6900
11       8770        7335
As with lists, components (i.e., individual variables) can be accessed using the $ notation:
> d$intake.pre
[1] 5260 5470 5640 6180 6390 6515 6805 7515 7515 8230 8770
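A data frame can be queried and extended much like a list. This sketch re-creates d from the intake vectors so that it runs on its own; the new column name diff is an arbitrary choice:

```r
intake.pre  <- c(5260,5470,5640,6180,6390,6515,6805,7515,7515,8230,8770)
intake.post <- c(3910,4220,3885,5160,5645,4680,5265,5975,6790,6900,7335)
d <- data.frame(intake.pre, intake.post)

nrow(d)                       # 11: one row per woman
d[5, ]                        # all variables for woman no. 5
d[["intake.post"]]            # same as d$intake.post

# A derived variable stays aligned row by row with the others
d$diff <- d$intake.pre - d$intake.post
head(d, 3)

# Keep only the rows that satisfy a condition
subset(d, intake.pre > 7000)
```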
5.7 Indexing
If you need a particular element in a vector, for instance the premenstrual energy intake
for woman no. 5, you can write
> intake.pre[5]
[1] 6390
The brackets are used for selection of data, also known as indexing or subsetting. This
also works on the left-hand side of an assignment, so you can modify an element of a
vector by writing, for instance, intake.pre[5] <- 6390. If you want a subvector
consisting of data for more than one woman, for instance nos. 3, 5, and 7, you can index
with a vector:
> intake.pre[c(3,5,7)]
[1] 5640 6390 6805
Note that it is necessary to use the c(...)-construction to define the vector consisting of the
three numbers 3, 5, and 7. intake.pre[3,5,7] would mean something completely different.
It would specify indexing into a three-dimensional array. Indexing with a vector also
works if the index vector is stored in a variable. This is useful when we need to index
several variables in the same way.
> v <- c(3,5,7)
> intake.pre[v]
[1] 5640 6390 6805
It is also worth noting that to get a sequence of elements, for instance the
first five, you can use the a:b notation:
> intake.pre[1:5]
[1] 5260 5470 5640 6180 6390
A neat feature of R is the possibility of negative indexing. We can get all observations
except nos. 3, 5, and 7 by writing
> intake.pre[-c(3,5,7)]
[1] 5260 5470 6180 6515 7515 7515 8230 8770
It is not possible to mix positive and negative indices. That would be highly ambiguous.
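The remaining indexing rules can be checked in a few lines. This sketch works on a copy x of intake.pre, so the original data stay unchanged, and uses tryCatch() to show that mixing positive and negative indices stops with an error:

```r
intake.pre <- c(5260,5470,5640,6180,6390,6515,6805,7515,7515,8230,8770)

x <- intake.pre
x[5] <- 6400            # indexing on the left-hand side replaces element 5
x[5]                    # 6400

x[c(3, 5, 7)] <- 0      # several elements can be replaced at once
x[1:8]                  # 5260 5470 0 6180 0 6515 0 7515

intake.pre[-(1:8)]      # drop the first eight: 7515 8230 8770

# Mixing positive and negative indices raises an error
ok <- tryCatch({ intake.pre[c(-1, 2)]; TRUE },
               error = function(e) FALSE)
ok                      # FALSE
```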
5.8 Commonly used methods of data input
The following are commonly used methods of data input.
5.8.1 Combine Function
The most useful R command for quickly entering small data sets is the c (combine)
function, which combines its arguments into a vector.
Eg: > y<-c(1,5,3,9)
> y
[1] 1 5 3 9
The combine function can also be used to construct a vector of character strings
Eg: > Name<-c("bob","Jack","Simon")
> Name
[1] "bob" "Jack" "Simon"
5.8.2 Sequence Function
The sequence operator “:” generates consecutive numbers, while the seq function does the
same thing but is more flexible.
Eg: > 1:4
[1] 1 2 3 4
seq function:
> seq(2,8,by=2)
[1] 2 4 6 8
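A few more forms of seq() are shown in this sketch: length.out fixes the number of points, while a negative by counts down:

```r
seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
seq(10, 1, by = -3)         # 10 7 4 1
seq(5)                      # 1 2 3 4 5, like 1:5
10:7                        # the colon operator also counts down: 10 9 8 7
```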
5.8.3 Scan Function
The scan function is used to enter comparatively small quantities of data from the keyboard.
The R command for this function is
Variable=scan()
After this command, type in the data values separated by spaces (for comma-separated
values, use scan(sep=",") instead), and terminate data entry by pressing the Enter key on
an empty line, i.e., by pressing Enter twice.
Eg: A<-scan()
1: 25 50 63 64 55 47
7:
Read 6 items
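In scripts, where no one is typing at the keyboard, scan() can read the same kind of input from a character string through its text argument. The minimal sketch below sets quiet = TRUE, which only suppresses the "Read ... items" message:

```r
# Read whitespace-separated numbers from a string instead of the keyboard
A <- scan(text = "25 50 63 64 55 47", quiet = TRUE)
A                 # 25 50 63 64 55 47

# Comma-separated input needs an explicit separator
B <- scan(text = "25,50,63", sep = ",", quiet = TRUE)
B                 # 25 50 63
```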
5.8.4 Rep Function
In order to enter data containing repeated values, the rep function is useful:
y=rep(x,n)
creates the vector y, in which the values of x are repeated n times.
Eg: > x<-c(rep(1,4),rep(2,5))
> x
[1] 1 1 1 1 2 2 2 2 2
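Besides a single count n, rep() accepts a vector of counts through times, as well as an each argument. A short sketch:

```r
rep(c(1, 2), times = 3)            # 1 2 1 2 1 2: repeat the whole vector
rep(c(1, 2), each = 3)             # 1 1 1 2 2 2: repeat each element
rep(c("a", "b"), times = c(2, 3))  # "a" "a" "b" "b" "b"

# The vector x above can therefore also be written in a single call:
rep(c(1, 2), times = c(4, 5))      # 1 1 1 1 2 2 2 2 2
```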
5.8.5 Class Function
This function is useful in determining the class of a data object.
Eg: > x<-c(1,2,3,4)
> class(x)
[1] "numeric"
> y<-c("a","b","c")
> class(y)
[1] "character"
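class() works on any object, not only on vectors. A brief sketch with several built-in object types:

```r
class(5L)                    # "integer"
class(TRUE)                  # "logical"
class(list(1, "a"))          # "list"
class(data.frame(a = 1:3))   # "data.frame"
class(factor(c("m", "f")))   # "factor"
class(mean)                  # "function"

# The is.*() functions give a TRUE/FALSE answer instead
is.numeric(c(1, 2, 3))       # TRUE
is.character(c(1, 2, 3))     # FALSE
```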
6. GRAPHICS
In order to produce graphical output, the user calls a series of graphics functions, each of
which either produces a complete plot or adds some output to an existing plot. R graphics
follows a “painter's model”, which means that graphics output occurs in steps, with later
output obscuring any previous output that it overlaps.
Functions in the graphics systems and graphics packages can be broken down into three
main types: high-level functions that produce complete plots; low-level functions that
add further output to an existing plot; and functions for working interactively with
graphical output.
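The painter's model can be seen by building a plot in steps: one high-level call followed by several low-level calls. This sketch writes to a PDF file in R's temporary directory (the file name is arbitrary) so that it also runs where no screen device is available:

```r
outfile <- file.path(tempdir(), "painters_model.pdf")
pdf(outfile)                                        # open a file graphics device

x <- 1:10
y <- x^2

plot(x, y, type = "n", main = "Built up in steps")  # high-level: axes only, no data
lines(x, y, col = "grey")                           # low-level: add a line
points(x, y, pch = 19)                              # low-level: points drawn over the line
abline(h = 50, lty = 2)                             # low-level: dashed reference line
legend("topleft", legend = c("y = x^2", "y = 50"),
       lty = c(1, 2), col = c("grey", "black"))

dev.off()                                           # close the device and write the file
```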
6.1 Standard Plots
R provides the usual range of standard statistical plots, including scatterplots, boxplots,
histograms, bar plots, pie charts, and basic 3D plots.
6.1.1 Scatter Plot
The function plot() can be used to plot data. Although it has a diverse array of arguments,
the most common specification is of the form plot(x, y, type, col, xlim, ylim, xlab, ylab,
main), where
x is the data to be represented on the abscissa (x-axis) of the plot;
y is the data to be represented on the ordinate (y-axis); the ordering of the values in x and
y must be consistent, meaning that the first element in y is linked to the first element in x,
and so on;
type is the type of plot (e.g., "p" for points, "l" for lines, "n" for no plotting, which sets up
the structure of the plot so that points and/or lines can be added later);
col is the color of the points and lines;
xlim and ylim are the ranges of the x-axis and y-axis, respectively;
xlab and ylab are the labels of the x-axis and y-axis, respectively; and
main is the title of the plot.
All of the above arguments except x are optional; if y is omitted, the values of x are
plotted against their index.
Eg:
> age<-c(25,35,45,55,65)
> frequency<-c(55,93,113,90,85)
> plot(age,frequency,xlab="age",ylab="frequency",pch=1,main="frequency vs age")
6.1.2 Histogram
The function to plot histograms is hist(). The basic specification is of the form hist(x,
breaks, freq), where x is the data to be plotted, breaks defines the way to determine the
location and/or quantity of bins, and freq is a logical value indicating whether the histogram
represents frequencies (TRUE) or probability densities (FALSE).
Eg:
> midx<-seq(25,85,10)
> fr<-c(10,24,18,12,8,5,3)
> x<-rep(midx,fr)
> brk<-seq(20,90,10)
> hist(x,brk,main="Histogram",xlab="pocket money",ylab="no.of students")
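hist() also returns the bin information it computed, which fits the object-based style of R. With plot = FALSE nothing is drawn. This sketch re-creates the pocket-money data from the example above:

```r
midx <- seq(25, 85, 10)
fr   <- c(10, 24, 18, 12, 8, 5, 3)
x    <- rep(midx, fr)
brk  <- seq(20, 90, 10)

h <- hist(x, breaks = brk, plot = FALSE)  # compute the histogram without drawing it
h$counts     # 10 24 18 12 8 5 3: the frequencies we started from
h$mids       # 25 35 45 55 65 75 85: bin midpoints
h$density    # counts / (n * bin width): the heights drawn when freq = FALSE
```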
6.1.3 Bar Plot
A bar plot is used to represent grouped data. A bar graph is a chart that uses either vertical or
horizontal bars to show comparisons among categories.
The function to plot a bar chart is
barplot(height, names.arg, col, xlim, ylim, xlab, ylab, main)
where height is a vector (or matrix) of bar heights and names.arg gives the labels drawn
under the bars; the remaining arguments have the same meaning as for plot().
Eg:
> year<-1995:2000
> sales<-c(15,25,27,28,26,26.6)
> sales.year<-data.frame(year,sales)
> sales.year
  year sales
1 1995  15.0
2 1996  25.0
3 1997  27.0
4 1998  28.0
5 1999  26.0
6 2000  26.6
> barplot(sales.year$sales,names.arg=sales.year$year,xlab="year",ylab="sales",col="grey")
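For grouped data, barplot() accepts a matrix, and beside = TRUE places the bars of each group next to each other. This sketch uses the VADeaths matrix that ships with R (package datasets) and draws to a temporary PDF file; the file name and titles are arbitrary choices:

```r
outfile <- file.path(tempdir(), "vadeaths_barplot.pdf")
pdf(outfile)

# VADeaths: 5 x 4 matrix of death rates per 1000 (age group x population group)
mids <- barplot(VADeaths, beside = TRUE,            # one group of bars per column
                legend.text = rownames(VADeaths),   # age groups in the legend
                ylab = "Deaths per 1000",
                main = "Death rates in Virginia (1940)")
dev.off()

dim(mids)   # 5 4: barplot() invisibly returns the bar midpoints
```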
[Figure: Histogram (x-axis: pocket money; y-axis: no. of students)]
6.1.4 Box Plot
It is a convenient way of graphically depicting groups of numerical data through their
quartiles. Box plots may also have lines extending vertically from the boxes (whiskers),
indicating variability outside the upper and lower quartiles.
The function to draw a box plot is boxplot(). In the example below, rnorm(100,1,1) draws
100 random values from a normal distribution with mean 1 and standard deviation 1.
Eg:
> x<-rnorm(100,1,1)
> boxplot(x,lwd=2)
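The quantities a box plot draws can also be computed directly with boxplot.stats() from the grDevices package. In this sketch, set.seed(1) is added only to make the random sample reproducible:

```r
set.seed(1)                    # reproducible random sample
x <- rnorm(100, mean = 1, sd = 1)

s <- boxplot.stats(x)$stats    # the five values drawn by boxplot()
s
# s[1], s[5]: whisker ends; s[2], s[4]: hinges (box edges); s[3]: median
median(x)                      # equals s[3]
boxplot.stats(x)$out           # points beyond the whiskers, drawn individually
```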
[Figure: Bar plot of sales by year (x-axis: year; y-axis: sales) and box plot of x]
7. CONCLUSION
R is a flexible programming language designed to facilitate exploratory data analysis,
classical statistical tests, and high-level graphics.
R is a full-fledged programming language, with a rich complement of mathematical
functions, matrix operations and control structures. With its rich and ever-expanding
library of packages, R is on the leading edge of development in statistics, data analytics,
and data mining.
R has proven itself a useful tool within the growing field of big data and has been
integrated into several commercial packages, such as IBM SPSS and InfoSphere, as well
as Mathematica.
8. REFERENCES
• Introductory Statistics with R, Peter Dalgaard (2nd edition)
• Statistical Computing with R, Eric Slud
• Quick-R: Creating Graphs, http://www.statmethods.net/graphs

Graph-Based Analysis and Visualization of Software Traces [SSP 2019]Graph-Based Analysis and Visualization of Software Traces [SSP 2019]
Graph-Based Analysis and Visualization of Software Traces [SSP 2019]
 
European Pharmaceutical Contractor: SAS and R Team in Clinical Research
European Pharmaceutical Contractor: SAS and R Team in Clinical ResearchEuropean Pharmaceutical Contractor: SAS and R Team in Clinical Research
European Pharmaceutical Contractor: SAS and R Team in Clinical Research
 
Development of Information Extraction for Data Analysis using NLP
Development of Information Extraction for Data Analysis using NLPDevelopment of Information Extraction for Data Analysis using NLP
Development of Information Extraction for Data Analysis using NLP
 

Kürzlich hochgeladen

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 

Kürzlich hochgeladen (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 

statistical computation using R- report

  • 1. Seminar Report’16 Statistical Computation Using R Department of Computer Applications MESCE, Kuttippuram STATISTICAL COMPUTATION USING R A Seminar Report submitted in partial fulfillment of the requirements for the award of the Degree of MASTER OF COMPUTER APPLICATIONS under the UNIVERSITY OF CALICUT by KAMARUDHEEN KV Register No:MKANMCA018 . DEPARTMENT OF COMPUTER APPLICATIONS MES COLLEGE OF ENGINEERING, KUTTIPPURAM, MALAPPURAM- 679 573 April-2016
MES COLLEGE OF ENGINEERING
KUTTIPPURAM, KERALA - 679573
(AN ISO 9001:2008 CERTIFIED INSTITUTION WITH NBA ACCREDITED DEPARTMENTS, APPROVED BY AICTE AND AFFILIATED TO THE UNIVERSITY OF CALICUT)
DEPARTMENT OF COMPUTER APPLICATIONS
CERTIFICATE
This is to certify that the report entitled STATISTICAL COMPUTATION USING R has been prepared and presented by Mr. KAMARUDHEEN KV (Register No: MKANMCA018), fifth semester student of the department, during the academic year 2015-16, in partial fulfillment of the requirements for the award of the Degree of Master of Computer Applications under the University of Calicut.
Staff in Charge          Head of the Department
Date:
ACKNOWLEDGEMENT
My endeavor would be incomplete without expressing my gratitude to the people who contributed to the successful completion of my seminar. I thank the Almighty for the unseen help and blessings that made this work possible. At the outset, I express my heartfelt thanks to our Head of the Department, Prof. Hyderali K, for permitting me to present this seminar. I take this opportunity to express my profound gratitude to Mr. Pradeep Uduppa, our group tutor, for his valuable support and help in presenting my seminar. I am also grateful to all our teaching and non-teaching staff for their encouragement, guidance and wholehearted support. Last but not least, I am indebted to my family and friends, who gave me precious help in presenting my seminar.
Sincerely,
KAMARUDHEEN KV
MKANMCA018
SYNOPSIS
The rapid and sustained increases in computing power since the second half of the 20th century have had a substantial impact on the practice of statistical science. Early statistical models were almost always linear models, but powerful computers, coupled with suitable numerical algorithms, led to increased interest in nonlinear models (such as neural networks) as well as the creation of new types of models, such as generalized linear models and multilevel models.
R is rapidly becoming a leading language in data science and statistics. It is a programming language and software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing. R is widely used among statisticians and data miners for developing statistical software and for data analysis. R is an implementation of the S programming language combined with lexical scoping semantics inspired by Scheme. S was created by John Chambers while at Bell Labs. There are some important differences between R and S, but much of the code written for S runs unaltered under R.
TABLE OF CONTENTS
1. INTRODUCTION
2. R AS STATISTICAL SOFTWARE
   2.1 Programming Features of R
3. ADVANTAGES OVER OTHER STATISTICAL TOOLS
4. R PRELIMINARIES
   4.1 Common Operators
5. R LANGUAGE ESSENTIALS
   5.1 Expressions and Objects
   5.2 Functions and Arguments
   5.3 Vectors
   5.4 Matrices and Arrays
   5.5 Lists
   5.6 Data Frames
   5.7 Indexing
   5.8 Commonly used methods of data input
6. GRAPHICS
   6.1 Standard Plots
7. CONCLUSIONS
8. REFERENCES
1. INTRODUCTION
The R system for statistical computing is an environment for data analysis and graphics. The root of R is the S language, developed by John Chambers and colleagues (Becker et al., 1988; Chambers and Hastie, 1992; Chambers, 1998) at Bell Laboratories (then part of AT&T) starting in the mid-1970s. The S language was designed as a programming language for data analysis tasks, but its current implementations are full-featured programming languages.
The development of R is heavily influenced by the open source idea: the base distribution of R and a large number of user-contributed extensions are available in source code form under the terms of the Free Software Foundation's GNU General Public License. This license has two major implications for the data analyst working with R. First, the complete source code is available, so practitioners can investigate the implementation details of a particular method, make changes, and distribute modifications to colleagues. Second, R is available to everyone: all scientists, including those working in developing countries, have access to state-of-the-art tools for statistical data analysis at no additional cost. The Comprehensive R Archive Network (CRAN) distributes the R system itself, a collection of add-on packages, manuals, documentation and more.
The fact that R is based on a formal computer language gives it tremendous flexibility. Other systems present simpler interfaces in terms of menus and forms, but the apparent user-friendliness often turns into a hindrance in the long run.
Although elementary statistics is often presented as a collection of fixed procedures, analysis of moderately complex data requires ad hoc statistical model building, which makes the added flexibility of R highly desirable.
2. R AS STATISTICAL SOFTWARE
R and its libraries implement a wide variety of statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests, time-series analysis, classification and clustering. R is easily extensible through functions and extensions, and the R community is noted for its active contributions in terms of packages. Many of R's standard functions are written in R itself, which makes it easy for users to follow the algorithmic choices made. For computationally intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C, C++, Java, .NET or Python code to manipulate R objects directly. R is highly extensible through user-submitted packages for specific functions or specific areas of study. Due to its S heritage, R has stronger object-oriented programming facilities than most statistical computing languages. Extending R is also eased by its lexical scoping rules.
Another strength of R is static graphics, which can produce publication-quality graphs, including mathematical symbols. Dynamic and interactive graphics are available through additional packages. R has its own LaTeX-like documentation format (Rd), which is used to supply comprehensive documentation, both online in a number of formats and in hard copy.
2.1 Programming features of R
R is an interpreted language; users typically access it through a command-line interpreter. If a user types 2+2 at the R command prompt and presses Enter, R replies with 4 (the [1] marks the index of the first element shown on that line):
> 2+2
[1] 4
R's data structures include vectors, matrices, arrays, data frames (similar to tables in a relational database) and lists. R's extensible object system includes objects for (among others) regression models, time series and geospatial coordinates. R has no separate scalar data type; a scalar is represented as a vector of length one.
R supports procedural programming with functions and, for some functions, object-oriented programming with generic functions. A generic function acts differently depending on the class of the arguments passed to it; in other words, it dispatches to the function (method) specific to that class of object. For example, R has a generic print function that can print almost every type of object in R with a simple print(objectname) call.
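A minimal sketch of the two points above, run in a plain R session: a single number is a length-one vector, and generic functions such as summary() and print() dispatch on the class of their argument. The class name "temperature" and its print method are hypothetical, invented only to show how a user-defined method is picked up.

```r
# A "scalar" is just a numeric vector of length one
x <- 5
length(x)       # 1
is.vector(x)    # TRUE

# summary() is generic: the method used depends on the class of the argument
num <- c(2.5, 3.1, 4.8)
fac <- factor(c("a", "b", "a"))
summary(num)    # summary.default: minimum, quartiles, mean, maximum
summary(fac)    # summary.factor: counts per level (a = 2, b = 1)

# Hypothetical user-defined class with its own print method
t1 <- structure(list(celsius = 21.5), class = "temperature")
print.temperature <- function(x, ...) {
  cat("Temperature:", x$celsius, "degrees C\n")
  invisible(x)  # print methods conventionally return their input invisibly
}
print(t1)       # dispatches to print.temperature
```

Calling print(t1) writes "Temperature: 21.5 degrees C" instead of the default list display, because R finds a method named print.temperature for an object of class "temperature".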
3. ADVANTAGES OVER OTHER STATISTICAL TOOLS
In R, statistical analyses are normally done as a series of steps, with intermediate results stored in objects that are later "interrogated" for the information of interest. This contrasts with other widely used programs (e.g., SAS and SPSS), which print a large amount of output to the screen. Storing results in objects makes it easy to use the results of one analysis as input for another. Furthermore, because the objects contain all pertinent model information, a model can be modified simply by manipulating the object, which is valuable in many cases. R packages implementing new statistical methods also tend to become available more quickly than comparable developments in other statistical software.
Using R requires a more thoughtful approach to data analysis than some other programs do. This dates back to the design of the S language, in which the user interacts with the data, as opposed to a "shotgun" approach, in which the program prints everything thought to be relevant to the problem. For those who want to stay at the cutting edge of statistical developments, using R is practically a must. The flexibility of R is arguably unmatched by other statistics programs, as its programming language allows users to create functions that perform customized procedures and automate commonly performed tasks.
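As a rough illustration of the "store, then interrogate" style, the sketch below reuses the energy-intake vectors from Section 5.5 and fits a simple linear model with lm(). The choice of model is only an assumption for demonstration; the point is that nothing is printed at fit time, the fitted object is queried afterwards with coef(), summary() and residuals(), and it is then modified with update() rather than rebuilt by hand.

```r
# Energy intake data (same values as in Section 5.5)
intake.pre  <- c(5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770)
intake.post <- c(3910, 4220, 3885, 5160, 5645, 4680, 5265, 5975, 6790, 6900, 7335)

# Fit a linear model; the result is stored in 'fit', not dumped to the screen
fit <- lm(intake.post ~ intake.pre)

# Interrogate the stored object for just the pieces of interest
coef(fit)                 # intercept and slope
summary(fit)$r.squared    # proportion of variance explained
head(residuals(fit), 3)   # first three residuals

# Reuse the object: refit the same model without an intercept
fit0 <- update(fit, . ~ . - 1)
coef(fit0)                # slope only
```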
4. R PRELIMINARIES
Expressions are entered directly into an R session at the prompt, which is denoted by the symbol >. The number sign (#) is used for comments; anything that follows a number sign on a line is ignored.
4.1 Common Operators
4.1.1 Assignment Operators
The expression <- is the assignment operator (it assigns what is on the right to the object on the left), as is -> (which assigns what is on the left to the object on the right). For example:
x <- 2      # assigns the value 2 to the object x
x^2 -> y    # assigns the value of x^2 to the object y
4.1.2 Arithmetic Operators
+    Addition
-    Subtraction
*    Multiplication
/    Division
^    Exponentiation
4.1.3 Relational Operators
<    Less than
>    Greater than
<=   Less than or equal to
>=   Greater than or equal to
==   Equal to
!=   Not equal to
4.1.4 Logical Operators
!    NOT
&    AND
|    OR
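A short sketch of the operators listed above, using two small made-up vectors a and b. It also shows that, when given vectors, R applies these operators element by element and relational operators return logical vectors.

```r
x <- 2          # left assignment
x^2 -> y        # right assignment: y is now 4

# Arithmetic operators work element-wise on vectors
a <- c(6, 7, 8)
b <- c(3, 2, 4)
a + b           # 9 9 12
a - b           # 3 5 4
a * b           # 18 14 32
a / b           # 2.0 3.5 2.0
a^2             # 36 49 64

# Relational operators return logical vectors
a < b           # FALSE FALSE FALSE
a >= 7          # FALSE  TRUE  TRUE
a == 8          # FALSE FALSE  TRUE
a != 7          #  TRUE FALSE  TRUE

# Logical operators combine logical vectors element-wise
!(a > 6)              #  TRUE FALSE FALSE
(a > 6) & (b < 4)     # FALSE  TRUE FALSE
(a > 6) | (b < 3)     # FALSE  TRUE  TRUE
```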
5. R LANGUAGE ESSENTIALS
This section outlines the basic aspects of the R language. It does so in a somewhat superficial manner, glossing over some of the finer points. The emphasis is on items that are useful in interactive use, as opposed to actual programming.
5.1 Expressions and Objects
The basic interaction mode in R is expression evaluation: the user enters an expression, and the system evaluates it and prints the result. Some expressions are evaluated not for their result but for side effects, such as opening a graphics window or writing to a file. All R expressions return a value (possibly NULL), but sometimes it is "invisible" and not printed. Expressions typically involve variable references, operators such as +, and function calls, as well as some other items not yet introduced.
Expressions work on objects. This is an abstract term for anything that can be assigned to a variable. R contains several different types of objects.
5.2 Functions and Arguments
Many things in R are done using function calls, commands that look like an application of a mathematical function of one or several variables, for example log(x) or plot(height, weight). A function name is followed by a set of parentheses containing one or more arguments. For instance, in plot(height, weight) the function name is plot and the arguments are height and weight. These are the actual arguments, which apply only to the current call. A function also has formal arguments, which get connected to actual arguments in the call.
When you write plot(height, weight), R assumes that the first argument corresponds to the x-variable and the second one to the y-variable. This is known as positional matching. Positional matching becomes impractical for functions with many arguments, since you would have to remember their exact order. Fortunately, R has ways around this: most arguments have sensible defaults and can be omitted in the standard cases, and there are nonpositional ways of specifying them, such as naming the argument (e.g. plot(height, weight, pch = 2)), when you need to depart from the default settings.
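The difference between positional and named matching can be seen with the built-in log(), whose formal arguments are x and base (base defaults to e). The function scale_by below is a hypothetical example written only to show how a default value is used when an argument is omitted, and how named and positional arguments can be mixed.

```r
# log() has formal arguments x and base; base defaults to exp(1)
log(100)                  # natural logarithm, base left at its default
log(100, 10)              # positional matching: second argument is base -> 2
log(base = 10, x = 100)   # named matching: order no longer matters -> 2

# Hypothetical function with a default argument, for illustration only
scale_by <- function(x, factor = 2) {
  x * factor
}
scale_by(5)               # factor takes its default of 2 -> 10
scale_by(5, 3)            # positional matching           -> 15
scale_by(factor = 10, 5)  # named first, then positional  -> 50
```

In the last call R first matches factor by name, then assigns the remaining unnamed value 5 to the first unmatched formal argument, x.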
5.3 Vectors
A character vector is a vector of text strings, whose elements are specified and printed in quotes:
> c("Huey","Dewey","Louie")
[1] "Huey"  "Dewey" "Louie"
It does not matter whether you use single- or double-quote symbols, as long as the left quote is the same as the right quote:
> c('Huey','Dewey','Louie')
[1] "Huey"  "Dewey" "Louie"
Logical vectors are constructed using the c function just like the other vector types:
> c(T,T,F,T)
[1]  TRUE  TRUE FALSE  TRUE
In practice, logical vectors are rarely typed in like this; it is much more common to use single logical values to turn an option on or off in a function call.
5.4 Matrices and Arrays
A matrix in mathematics is just a two-dimensional array of numbers. Matrices are used for many purposes in theoretical and practical statistics, but matrices and higher-dimensional arrays are also used for simpler purposes, mainly to hold tables. In R, the matrix notion is extended to elements of any type. Matrices and arrays are represented as vectors with dimensions:
> x <- 1:12
> dim(x) <- c(3,4)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
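A minimal sketch reproducing the vectors above as script code. Writing TRUE and FALSE instead of T and F is a suggestion rather than something the text requires: T and F are ordinary variables that can be overwritten, whereas TRUE and FALSE are reserved words.

```r
# Character and logical vectors, built with c()
names3 <- c("Huey", "Dewey", "Louie")
flags  <- c(TRUE, TRUE, FALSE, TRUE)
length(names3)   # 3
sum(flags)       # TRUE counts as 1 in arithmetic, so this gives 3

# A plain vector becomes a 3 x 4 matrix once its dim attribute is set
x <- 1:12
dim(x) <- c(3, 4)
is.matrix(x)     # TRUE
x[2, 3]          # row 2, column 3 -> 8, because storage is column-major
```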
The dim assignment function sets or changes the dimension attribute of x, causing R to treat the vector of 12 numbers as a 3 × 4 matrix. Notice that the storage is column-major; that is, the elements of the first column are followed by those of the second, and so on.
A convenient way to create matrices is to use the matrix function:
> matrix(1:12,nrow=3,byrow=T)
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
Notice how the byrow=T switch causes the matrix to be filled row-wise rather than column-wise. Useful functions that operate on matrices include rownames, colnames, and the transposition function t (notice the lowercase t as opposed to uppercase T for TRUE), which turns rows into columns and vice versa.
5.5 Lists
It is sometimes useful to combine a collection of objects into a larger composite object. This can be done using lists. You can construct a list from its components with the function list.
As an example, consider a set of data concerning pre- and postmenstrual energy intake in a group of women. We can place these data in two vectors as follows:
> intake.pre <- c(5260,5470,5640,6180,6390,6515,6805,7515,7515,8230,8770)
> intake.post <- c(3910,4220,3885,5160,5645,4680,5265,5975,6790,6900,7335)
To combine these individual vectors into a list:
> mylist <- list(before=intake.pre,after=intake.post)
> mylist
$before
 [1] 5260 5470 5640 6180 6390 6515 6805 7515 7515 8230 8770

$after
 [1] 3910 4220 3885 5160 5645 4680 5265 5975 6790 6900 7335
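The sketch below combines matrix(), rownames(), colnames() and t(), then rebuilds the list from this section. The row labels "A" to "C" and column labels "w" to "z" are arbitrary choices for illustration.

```r
# Fill a 3 x 4 matrix row by row and label its dimensions
m <- matrix(1:12, nrow = 3, byrow = TRUE)
rownames(m) <- c("A", "B", "C")
colnames(m) <- c("w", "x", "y", "z")
m["B", "y"]      # element in row "B", column "y" -> 7
tm <- t(m)       # transpose: rows become columns
dim(tm)          # 4 3

# A list bundling the two intake vectors under names
intake.pre  <- c(5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770)
intake.post <- c(3910, 4220, 3885, 5160, 5645, 4680, 5265, 5975, 6790, 6900, 7335)
mylist <- list(before = intake.pre, after = intake.post)
names(mylist)            # "before" "after"
mylist[["after"]][1:3]   # 3910 4220 3885, same as mylist$after[1:3]
```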
The components of the list are named according to the argument names used in list. Named components may be extracted like this:
> mylist$before
 [1] 5260 5470 5640 6180 6390 6515 6805 7515 7515 8230 8770
Many of R's built-in functions compute more than a single vector of values and return their results in the form of a list.
5.6 Data Frames
A data frame corresponds to what other statistical packages call a "data matrix" or a "data set". It is a list of vectors and/or factors of the same length that are related "across", such that data in the same position come from the same experimental unit (subject, animal, etc.). In addition, it has a unique set of row names. We can create data frames from preexisting variables:
> d <- data.frame(intake.pre,intake.post)
> d
   intake.pre intake.post
1        5260        3910
2        5470        4220
3        5640        3885
4        6180        5160
5        6390        5645
6        6515        4680
7        6805        5265
8        7515        5975
9        7515        6790
10       8230        6900
11       8770        7335
As with lists, components (i.e., individual variables) can be accessed using the $ notation:
> d$intake.pre
 [1] 5260 5470 5640 6180 6390 6515 6805 7515 7515 8230 8770
5.7 Indexing
If you need a particular element in a vector, for instance the premenstrual energy intake for woman no. 5, you can write:
> intake.pre[5]
[1] 6390
The brackets are used for selection of data, also known as indexing or subsetting. This also works on the left-hand side of an assignment, so you can modify elements of a vector, for instance with intake.pre[5] <- 6390.
If you want a subvector consisting of data for more than one woman, for instance nos. 3, 5, and 7, you can index with a vector:
> intake.pre[c(3,5,7)]
[1] 5640 6390 6805
Note that it is necessary to use the c(...) construction to define the vector consisting of the three numbers 3, 5, and 7. intake.pre[3,5,7] would mean something completely different: it would specify indexing into a three-dimensional array.
Indexing with a vector also works if the index vector is stored in a variable. This is useful when you need to index several variables in the same way:
> v <- c(3,5,7)
> intake.pre[v]
[1] 5640 6390 6805
It is also worth noting that to get a sequence of elements, for instance the first five, you can use the a:b notation:
> intake.pre[1:5]
[1] 5260 5470 5640 6180 6390
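A sketch of common data-frame and indexing operations on the intake data, assuming the same vectors as in Section 5.5. The derived column name drop is hypothetical; it holds the difference between pre- and postmenstrual intake for each woman. The last lines assign into a copy, v, so the original intake.pre is left unchanged.

```r
intake.pre  <- c(5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770)
intake.post <- c(3910, 4220, 3885, 5160, 5645, 4680, 5265, 5975, 6790, 6900, 7335)
d <- data.frame(intake.pre, intake.post)

nrow(d)                          # 11 women
d$intake.pre[5]                  # 6390, same as intake.pre[5]
d[5, ]                           # the whole row for woman no. 5
d[c(3, 5, 7), "intake.post"]     # 3885 5645 5265

# Hypothetical derived column: drop in intake between the two periods
d$drop <- d$intake.pre - d$intake.post
mean(d$drop)                     # 14525 / 11, roughly 1320.5

# Indexing on the left-hand side of an assignment modifies elements
v <- intake.pre                  # work on a copy
v[5] <- 6400
v[5]                             # 6400; intake.pre[5] is still 6390
```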
A neat feature of R is the possibility of negative indexing. We can get all observations except nos. 3, 5, and 7 by writing
> intake.pre[-c(3,5,7)]
[1] 5260 5470 6180 6515 7515 7515 8230 8770
It is not possible to mix positive and negative indices in the same index vector; that would be highly ambiguous.
5.8 Commonly used methods of data input
The following are the commonly used methods of data input.
5.8.1 Combine Function
The most useful R command for quickly entering small data sets is the c, or combine, function. It combines its arguments into a single vector.
Eg:
> y<-c(1,5,3,9)
> y
[1] 1 5 3 9
The combine function can also be used to construct a vector of character strings.
Eg:
> Name<-c("bob","Jack","Simon")
> Name
[1] "bob"   "Jack"  "Simon"
5.8.2 Sequence Function
The sequence operator ":" generates sequences of consecutive numbers, while the seq() function does the same thing but is more flexible, for example in allowing a step size other than 1.
Eg:
> 1:4
[1] 1 2 3 4
seq function:
> seq(2,8,by=2)
[1] 2 4 6 8
5.8.3 Scan Function
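The claims about negative indexing and about seq() being more flexible than ":" can be sketched as follows. The tryCatch() wrapper is only there so that the deliberate error from mixing index signs does not stop the script; the seq() calls use the documented by and length.out arguments.

```r
intake.pre <- c(5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770)

# Negative indices drop elements: 11 - 3 = 8 values remain.
rest <- intake.pre[-c(3, 5, 7)]
length(rest)

# Mixing positive and negative indices raises an error; catch it here.
ok <- tryCatch({ intake.pre[c(-1, 2)]; TRUE }, error = function(e) FALSE)
ok                           # FALSE: R refuses the mixed index

# seq() generalises the colon operator.
seq(0, 1, by = 0.25)         # non-integer step
seq(10, 1, by = -3)          # counting down
seq(1, 2, length.out = 5)    # fixed number of equally spaced values
```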
The scan() function is used to enter comparatively small quantities of data from the keyboard. The R command for this function is
variable <- scan()
After this command, type in the data values separated by spaces (to separate them by commas, call scan(sep=",") instead), and terminate data entry by pressing the Enter key twice, i.e. by entering a blank line.
Eg:
> A<-scan()
1: 25 50 63 64 55 47
7:
Read 6 items
5.8.4 Rep Function
To enter data containing repeated values, the rep function is useful:
y <- rep(x,n)
creates the vector y, in which the values of x are repeated n times.
Eg:
> x<-c(rep(1,4),rep(2,5))
> x
[1] 1 1 1 1 2 2 2 2 2
5.8.5 Class Function
This function is useful in determining the class of a data object.
Eg:
> x<-c(1,2,3,4)
> class(x)
[1] "numeric"
> y<-c("a","b","c")
> class(y)
[1] "character"
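The three input helpers above can be exercised non-interactively. This sketch assumes that feeding scan() through its text argument is an acceptable stand-in for typing at the keyboard; it also shows rep() with a vector of counts (times) and with each, and one subtlety of class(): the colon operator produces integers, not doubles.

```r
# scan() reads the keyboard by default; text = supplies the same input from a string.
A <- scan(text = "25 50 63 64 55 47")      # whitespace-separated (the default)
B <- scan(text = "25,50,63", sep = ",")    # commas require sep = ","

# A vector of counts in times = gives the same result as two rep() calls.
x1 <- c(rep(1, 4), rep(2, 5))
x2 <- rep(c(1, 2), times = c(4, 5))
identical(x1, x2)                          # TRUE
rep(c("a", "b"), each = 2)                 # repeat each element in turn

# class() distinguishes doubles from the integers produced by ":".
class(c(1, 2, 3, 4))                       # "numeric"
class(1:4)                                 # "integer"
```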
6 GRAPHICS
In order to produce graphical output, the user calls a series of graphics functions, each of which either produces a complete plot or adds some output to an existing plot. R graphics follows a "painter's model", which means that graphics output occurs in steps, with later output obscuring any previous output that it overlaps. Functions in the graphics systems and graphics packages can be broken down into three main types: high-level functions that produce complete plots, low-level functions that add further output to an existing plot, and functions for working interactively with graphical output.
6.1 Standard Plots
R provides the usual range of standard statistical plots, including scatterplots, boxplots, histograms, barplots, pie charts, and basic 3D plots.
6.1.1 Scatter Plot
The function plot() can be used to plot data. Although it has a diverse array of arguments, the most common specification is of the form
plot(x, y, type, col, xlim, ylim, xlab, ylab, main)
where
x is the data to be represented on the abscissa (x-axis) of the plot;
y is the data to be represented on the ordinate (y-axis); the ordering of the values in x and y must be consistent, meaning that the first element in y is linked to the first element in x, etc.;
type is the type of plot (e.g., "p" for points, "l" for lines, "n" for no plotting, which sets up the structure of the plot so that points and/or lines can be added later);
col is the color of the points and lines;
xlim and ylim are the ranges of the x-axis and y-axis, respectively;
xlab and ylab are the labels of the x-axis and y-axis, respectively; and
main is the title of the plot.
All of the above arguments except x are optional; if y is omitted, the values in x are plotted against their index numbers.
Eg:
> age<-c(25,35,45,55,65)
> frequency<-c(55,93,113,90,85)
> plot(age,frequency,xlab="age",ylab="frequency",pch=1,main="frequency vs age")
Here pch=1 selects open circles as the plotting symbol.
6.1.2 Histogram
The function to plot histograms is hist(). The basic specification is of the form
hist(x, breaks, freq)
where x is the data to be plotted, breaks determines the location and/or number of bins (given, for example, as a vector of break points or as a suggested number of bins), and freq is a logical value stating whether the histogram represents frequencies (counts) or probability densities.
Eg:
> midx<-seq(25,85,10)
> fr<-c(10,24,18,12,8,5,3)
> x<-rep(midx,fr)
> brk<-seq(20,90,10)
> hist(x,brk,main="Histogram",xlab="pocket money",ylab="no. of students")
Here x repeats each class midpoint as many times as its frequency, so that hist() with the break points 20, 30, ..., 90 reproduces the grouped frequency table.
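The painter's model and the histogram example above can both be tried without a screen. This is a sketch under the assumption that drawing to a null PDF device, pdf(NULL), is acceptable for demonstration: plot(type = "n") only sets up the axes, and the low-level functions lines() and points() then paint over it in order. hist(plot = FALSE) returns the bin counts instead of drawing them, which confirms that the pocket-money histogram reproduces the frequencies in fr.

```r
age <- c(25, 35, 45, 55, 65)
frequency <- c(55, 93, 113, 90, 85)

pdf(NULL)                            # null device: nothing is written or shown
plot(age, frequency, type = "n",     # high-level: axes and labels only
     xlab = "age", ylab = "frequency", main = "frequency vs age")
lines(age, frequency, col = "grey")  # low-level: add connecting lines
points(age, frequency, pch = 19)     # low-level: filled points painted on top
invisible(dev.off())

# hist() also returns the binned data; plot = FALSE skips the drawing.
midx <- seq(25, 85, 10)
fr <- c(10, 24, 18, 12, 8, 5, 3)
x <- rep(midx, fr)
h <- hist(x, breaks = seq(20, 90, 10), plot = FALSE)
h$counts                             # one count per bin
h$mids                               # bin midpoints
```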
6.1.3 Bar Plot
A bar plot is used to represent grouped data. A bar graph is a chart that uses either vertical or horizontal bars to show comparisons among categories. The function to plot a bar chart is
barplot(height, names.arg, col, xlim, ylim, xlab, ylab, main)
where height is a vector (or matrix) of bar heights and names.arg gives the labels printed under the bars; the remaining arguments have the same meaning as in plot(). Because height must be a vector or matrix, a column of a data frame is passed rather than the whole data frame.
Eg:
> year<-1995:2000
> sales<-c(15,25,27,28,26,26.6)
> sales.year<-data.frame(year,sales)
> sales.year
  year sales
1 1995  15.0
2 1996  25.0
3 1997  27.0
4 1998  28.0
5 1999  26.0
6 2000  26.6
> barplot(sales.year$sales,names.arg=sales.year$year,xlab="year",ylab="sales",col="grey")
[Figure: output of the histogram example in 6.1.2, titled "Histogram"; x-axis "pocket money", y-axis "no. of students"]
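barplot() also fits the painter's model: it invisibly returns the x positions of the bar centres, which low-level functions can use to annotate the chart. A minimal sketch using the sales data above; the value labels drawn with text(), the ylim chosen to leave room for them, and the null device pdf(NULL) are illustrative choices, not part of the original example. horiz = TRUE draws the same comparison with horizontal bars.

```r
year  <- 1995:2000
sales <- c(15, 25, 27, 28, 26, 26.6)

pdf(NULL)                                   # null device so the sketch runs headless
mids <- barplot(sales, names.arg = year, ylim = c(0, 32),
                xlab = "year", ylab = "sales", col = "grey")
text(mids, sales, labels = sales, pos = 3)  # label each bar just above its top

# The same data as horizontal bars.
barplot(sales, names.arg = year, horiz = TRUE,
        xlab = "sales", ylab = "year", col = "grey")
invisible(dev.off())

as.vector(mids)                             # bar centres, one per bar
```

With the default width 1 and spacing 0.2, the centres are 1.2 units apart, starting at 0.7.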
6.1.4 Box Plot
A box plot is a convenient way of graphically depicting groups of numerical data through their quartiles. Box plots may also have lines extending vertically from the boxes (whiskers) indicating variability outside the upper and lower quartiles. The function to draw a box plot is boxplot().
Eg:
> x<-rnorm(100,1,1)
> boxplot(x,lwd=2)
Here rnorm(100,1,1) draws 100 random values from a normal distribution with mean 1 and standard deviation 1, and lwd=2 doubles the line width.
[Figure: output of the bar plot example in 6.1.3; x-axis "year", y-axis "sales"]
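The numbers behind a box plot can be inspected with boxplot.stats(), which boxplot() uses to compute the box and whiskers. The sample c(1:9, 30) below is made up purely for illustration, with one extreme value. The box edges are Tukey's hinges from fivenum(), which can differ slightly from quantile(); each whisker reaches the most extreme point within 1.5 box lengths of the box, and points beyond that are returned in out and drawn individually.

```r
x <- c(1:9, 30)        # hypothetical sample with one extreme value
s <- boxplot.stats(x)
s$stats                # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out                  # values plotted as individual points
fivenum(x)             # minimum, hinges, median and maximum of the raw data
```

Here the hinges are 3 and 8, so values above 8 + 1.5 * (8 - 3) = 15.5 are treated as outlying, and the upper whisker stops at 9.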
7 CONCLUSION
R is a flexible programming language designed to facilitate exploratory data analysis, classical statistical tests, and high-level graphics. It is a full-fledged programming language, with a rich complement of mathematical functions, matrix operations and control structures. With its rich and ever-expanding library of packages, R is at the leading edge of development in statistics, data analytics, and data mining. R has proven itself a useful tool within the growing field of big data and has been integrated into several commercial packages, such as IBM SPSS and InfoSphere, as well as Mathematica.