R part I
Ruru Chowdhury
Let’s start with R
> typos = c(2,3,0,3,1,0,0,1)
> typos
[1] 2 3 0 3 1 0 0 1
> mean(typos)
[1] 1.25
> median(typos)
[1] 1
> var(typos)
[1] 1.642857

• “typos” represents the number of typing errors on each page
• Each command is stored in the history
• You can use the UP arrow key to retrieve your previous command
• You have started using built-in functions
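var() returns the sample variance, which divides the sum of squared deviations by n - 1 rather than n. The following minimal sketch recomputes it by hand for the same typos data so you can check the built-in result. The name var_by_hand is illustrative only.

```r
typos <- c(2, 3, 0, 3, 1, 0, 0, 1)

n <- length(typos)                     # number of pages (8)
dev <- typos - mean(typos)             # deviation of each page from the mean (1.25)
var_by_hand <- sum(dev^2) / (n - 1)    # sample variance divides by n - 1

print(var_by_hand)                     # 1.642857
print(all.equal(var_by_hand, var(typos)))  # TRUE: matches var()
```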
Let’s start with R
> typos.draft1 = c(2,3,0,3,1,0,0,1)
> typos.draft2 = c(0,3,0,3,1,0,0,1)
> typos.draft1
[1] 2 3 0 3 1 0 0 1
> typos.draft2
[1] 0 3 0 3 1 0 0 1

• Note the two different object names for the two drafts
• A period (.) has been used as a separator in object names
• Each object name refers to a vector
Let’s start with R
> typos.draft1 = c(2,3,0,3,1,0,0,1)
> typos.draft2 = typos.draft1 # make a copy
> typos.draft2[1] = 0 # assign the first page 0 typing error
> typos.draft2
[1] 0 3 0 3 1 0 0 1

• Note that this creates the same typos.draft2 as before
• “#” starts a comment
• ‘()’ are used for function calls and ‘[]’ for indexing vectors
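Changing typos.draft2 after the copy leaves typos.draft1 untouched. The two names behave as independent vectors because R copies a vector when one of them is modified. This short sketch confirms it.

```r
typos.draft1 <- c(2, 3, 0, 3, 1, 0, 0, 1)
typos.draft2 <- typos.draft1   # make a copy
typos.draft2[1] <- 0           # change only the copy

print(typos.draft1)            # 2 3 0 3 1 0 0 1 (unchanged)
print(typos.draft2)            # 0 3 0 3 1 0 0 1
```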
Now try and check ….
> typos.draft2 # print out the value
[1] 0 3 0 3 1 0 0 1
> typos.draft2[2] # print the 2nd page's value
[1] 3
> typos.draft2[4] # 4th page
[1] 3
> typos.draft2[-4] # all but the 4th page
[1] 0 3 0 1 0 0 1
> typos.draft2[c(1,2,3)] # print values for 1st, 2nd and 3rd.
[1] 0 3 0

• Note the output of the last command. Selecting several elements at once like this is called slicing.
Numeric Vector
• Simplest data structure in R
• To set up a numeric vector named x, assign it values:
> x <- c(23.0,17.0,12.5,11.0,17.0,12.0,14.5,9.0,11.0)
> x
[1] 23.0 17.0 12.5 11.0 17.0 12.0 14.5 9.0 11.0

Or
> assign("x", c(23.0,17.0,12.5,11.0,17.0,12.0,14.5,9.0,11.0))
> x
[1] 23.0 17.0 12.5 11.0 17.0 12.0 14.5 9.0 11.0
Numeric Vector
or
> rm(x)
> c(23.0,17.0,12.5,11.0,17.0,12.0,14.5,9.0,11.0) -> x
> x
[1] 23.0 17.0 12.5 11.0 17.0 12.0 14.5 9.0 11.0

Look at the next assignment
> y <- c(x,0,1)
> y
[1] 23.0 17.0 12.5 11.0 17.0 12.0 14.5 9.0 11.0 0.0 1.0

This creates a vector y that holds a copy of x with a zero and a one appended at the end.
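The three assignment styles shown above (<-, assign() and ->) all produce the same vector, and c() simply concatenates its arguments. The sketch below checks this. The names x1, x2 and x3 are hypothetical and were chosen only to keep the three results apart.

```r
x1 <- c(23.0, 17.0, 12.5, 11.0, 17.0, 12.0, 14.5, 9.0, 11.0)           # left arrow
assign("x2", c(23.0, 17.0, 12.5, 11.0, 17.0, 12.0, 14.5, 9.0, 11.0))   # assign() by name
c(23.0, 17.0, 12.5, 11.0, 17.0, 12.0, 14.5, 9.0, 11.0) -> x3           # right arrow

print(identical(x1, x2) && identical(x2, x3))  # TRUE

y <- c(x1, 0, 1)   # copy of x1 with 0 and 1 appended
print(length(y))   # 11
```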
Character Vector
A character vector is a set of text values

> weekdays <- c("Sun","Mon","Tues","Wed","Thurs","Fri","Sat")
> weekdays
[1] "Sun" "Mon" "Tues" "Wed" "Thurs" "Fri" "Sat"
Positive Index
• A positive index can be appended in square brackets to the name of a vector
• It selects a subset of the elements of the vector
> x[2]
[1] 17
> x[1:9]
[1] 23.0 17.0 12.5 11.0 17.0 12.0 14.5 9.0 11.0
> x[3:7]
[1] 12.5 11.0 17.0 12.0 14.5
> x[c(2,5,7)]
[1] 17.0 17.0 14.5

• How do you find the number of elements in a vector?
> x
[1] 23.0 17.0 12.5 11.0 17.0 12.0 14.5 9.0 11.0
> length(x)
[1] 9
Negative Index
• A negative index specifies the element(s) to be excluded
rather than included
> y<-x[-2] #Include all but the second element
> y
[1] 23.0 12.5 11.0 17.0 12.0 14.5 9.0 11.0
> x
[1] 23.0 17.0 12.5 11.0 17.0 12.0 14.5 9.0 11.0

• How do you exclude more than one element?
> x
[1] 23.0 17.0 12.5 11.0 17.0 12.0 14.5 9.0 11.0
> y<-x[-(2:4)]
> y
[1] 23.0 17.0 12.0 14.5 9.0 11.0
> y<-x[-c(2:4, 9)] # exclude the 2nd to 4th and the 9th elements
> y
[1] 23.0 17.0 12.0 14.5 9.0
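The parentheses in x[-(2:4)] matter. Unary minus binds more tightly than the colon operator, so -2:4 means (-2):4, which is -2 -1 0 1 2 3 4. Mixing negative and positive indices is an error. This short sketch shows the difference.

```r
x <- c(23.0, 17.0, 12.5, 11.0, 17.0, 12.0, 14.5, 9.0, 11.0)

print(-2:4)      # -2 -1 0 1 2 3 4 : the minus applies to 2 before ':'
print(-(2:4))    # -2 -3 -4 : the minus applies to the whole sequence

y1 <- x[-(2:4)]        # drop elements 2, 3 and 4
y2 <- x[-c(2:4, 9)]    # drop elements 2 to 4 and element 9
print(y1)
print(y2)

# x[-2:4] fails because negative and positive indices cannot be mixed
res <- tryCatch(x[-2:4], error = function(e) "error")
print(res)       # "error"
```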
Now try and check ….
> typos.draft2 # show all the values
[1] 0 3 0 3 1 0 0 1
> max(typos.draft2) # how many typos are on the worst pages?
[1] 3
> typos.draft2 == 3 # Where are they?
[1] FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE

• Note the use of ‘==’ for comparison
• But how do we get the indices (pages) having 3 typos?
> which(typos.draft2 == 3)
[1] 2 4

• which() returns only the indices of the matching elements, not their values
Now try and check ….
> n = length(typos.draft2) # how many pages
> pages = 1:n # how we get the page numbers
> pages # pages is simply 1 to number of pages
[1] 1 2 3 4 5 6 7 8
> pages[typos.draft2 == 3] # logical extraction. Very useful
[1] 2 4

The idea is to create a vector 1, 2, 3, … of page numbers and then keep only the ones for which typos.draft2 == 3 is TRUE.
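Logical extraction and which() give the same page numbers here. This sketch uses seq_along() as the base-R shorthand for 1:length(). The helper name worst is illustrative.

```r
typos.draft2 <- c(0, 3, 0, 3, 1, 0, 0, 1)

pages <- seq_along(typos.draft2)            # page numbers 1..8
worst <- typos.draft2 == max(typos.draft2)  # TRUE where a page has the most typos

print(pages[worst])   # 2 4 (logical extraction)
print(which(worst))   # 2 4 (the same indices via which())
```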
Now try and check ….
> sum(typos.draft2) # How many typos?
[1] 8
> sum(typos.draft2>0) # How many pages with typos?
[1] 4
> typos.draft1 - typos.draft2 # difference between the two
[1] 2 0 0 0 0 0 0 0

Well Done … Great!!
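sum(typos.draft2 > 0) counts pages because R treats TRUE as 1 and FALSE as 0 in arithmetic. For the same reason, mean() of a logical vector gives the proportion of TRUE values. This is handy for "what percent" questions. The name has_typos is illustrative.

```r
typos.draft2 <- c(0, 3, 0, 3, 1, 0, 0, 1)

has_typos <- typos.draft2 > 0   # logical: does each page have at least one typo?
print(sum(has_typos))           # 4 pages with typos (each TRUE counts as 1)
print(mean(has_typos))          # 0.5, the proportion of pages with typos
print(100 * mean(has_typos))    # 50 percent
```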
Now try and check ….
Suppose the daily closing price of your favourite stock for two weeks is
45,43,46,48,51,46,50,47,46,45
How do you keep track of this?
> x = c(45,43,46,48,51,46,50,47,46,45)
> x
[1] 45 43 46 48 51 46 50 47 46 45
> mean(x) # the mean
[1] 46.7
> median(x) # the median
[1] 46
> max(x) # the maximum or largest value
[1] 51
> min(x) # the minimum value
[1] 43

Hope you are enjoying these interesting functions!
Now try and check ….
Let’s add the next two weeks' worth of data to x. This was
48,49,51,50,49,41,40,38,35,40
> x = c(x,48,49,51,50,49) # append values to x
> length(x) # how long is x now (it was 10)
[1] 15
> x[16] = 41 # add a value at a specified index, here 16
> x[17:20] = c(40,38,35,40) # add to many specified indices
> x
[1] 45 43 46 48 51 46 50 47 46 45 48 49 51 50 49 41 40 38 35 40

We did three different things to add to a vector.
• We used the c() (combine) function to combine the previous value of x with the next week's numbers.
• We then assigned directly to the 16th index.
• Finally, we assigned to a slice of indices.
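The three steps above grow x from 10 to 20 values. One point the slide does not cover: if you assign to an index beyond the current end and skip some positions, those positions are filled with NA. The vector z below is a hypothetical example of this.

```r
x <- c(45, 43, 46, 48, 51, 46, 50, 47, 46, 45)  # first two weeks

x <- c(x, 48, 49, 51, 50, 49)   # 1. combine with c(): length 15
x[16] <- 41                     # 2. assign to the next index: length 16
x[17:20] <- c(40, 38, 35, 40)   # 3. assign to a slice: length 20
print(length(x))                # 20

z <- c(1, 2)
z[5] <- 9          # positions 3 and 4 did not exist, so they become NA
print(z)           # 1 2 NA NA 9
```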
Now try and check ….
Suppose we want a 5-day moving average, for example the average of the five days starting at day 5:
> day<-5
> mean(x[day:(day+4)])
[1] 48
> day:(day+4)
[1] 5 6 7 8 9

How do you get the running maximum or minimum to date?
> cummax(x) # running maximum
[1] 45 45 46 48 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51
> cummin(x) # running minimum
[1] 45 43 43 43 43 43 43 43 43 43 43 43 43 43 43 41 40 38 35 35
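The slide computes a single 5-day window. To get the moving average for every window, you can repeat the same mean(x[day:(day+4)]) for each possible start day. The following minimal sketch does this with sapply(). The names width, starts and moving_avg are illustrative.

```r
x <- c(45, 43, 46, 48, 51, 46, 50, 47, 46, 45,
       48, 49, 51, 50, 49, 41, 40, 38, 35, 40)

width <- 5
starts <- 1:(length(x) - width + 1)   # first day of each 5-day window
# For each start day, average that day and the following 4 days
moving_avg <- sapply(starts, function(day) mean(x[day:(day + width - 1)]))

print(moving_avg[5])       # 48, the same window as mean(x[5:9]) above
print(length(moving_avg))  # 16 windows for 20 days
```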
Self-test
Suppose you keep track of your mileage each time you fill up. At
your last 8 fill-ups the mileage was
65311 65624 65908 66219 66499 66821 67145 67447
Enter these numbers into R. Use the function ‘diff’ on the data.
What does it give?
Use the max function to find the maximum number of miles
between fill-ups, the mean function to find the average number
of miles and the min function to get the minimum number of
miles.
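diff() returns the differences between consecutive elements, so a vector of length n gives n - 1 values. The illustration below uses a small made-up vector, not the exercise data, so the self-test is still left for you.

```r
readings <- c(100, 130, 175, 190)   # hypothetical odometer readings
gaps <- diff(readings)              # same as readings[2:4] - readings[1:3]
print(gaps)                         # 30 45 15
print(c(max(gaps), mean(gaps), min(gaps)))  # 45 30 15
```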
Self-test
Suppose you track your commute times for two weeks (10 days)
and you find the following times in minutes
17 16 20 24 22 15 21 15 17 22
Enter this into R. Use the function max to find the longest
commute time, the function mean to find the average and the
function min to find the minimum.
The 24 was a mistake. It should have been 18. How can you fix
this? Do so, and then find the new average.
How many times was your commute 20 minutes or more? To
answer this you can try (if you call your numbers commutes)
> sum(commutes >= 20)
What do you get? What percent of your commutes are less than
17 minutes? How can you answer this with R?
Categorical Data
A survey asks people if they smoke or not.
The data is Yes, No, No, Yes, Yes
We can enter this into R with the c() command, and summarize
with the table command as
> x=c("Yes","No","No","Yes","Yes")
> table(x)
x
 No Yes
  2   3

The table command simply adds up the frequency of each
unique value of the data.
Categorical Data : Factor
Categorical data is often used to classify data into various levels or factors. Making a factor is easy with the function factor() or as.factor().
> x #Print the values in x
[1] "Yes" "No" "No" "Yes" "Yes"
> factor(x) # print out value in factor(x)
[1] Yes No No Yes Yes
Levels: No Yes

Note that the levels are printed as well. By default, they are sorted alphabetically.
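By default, factor() takes its levels from the sorted unique values, which is why No comes before Yes. The levels argument lets you set their order or include a category that did not occur. table() then reports a zero count for that category. The "Maybe" level here is purely illustrative.

```r
x <- c("Yes", "No", "No", "Yes", "Yes")

f <- factor(x)                                    # levels sorted: "No", "Yes"
print(levels(f))

g <- factor(x, levels = c("Yes", "No", "Maybe"))  # explicit order plus an unused level
print(table(g))                                   # Yes 3, No 2, Maybe 0
```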
Categorical Data and Bar Chart
A bar chart draws a bar with a height proportional to the count in
the table. The height could be given by the frequency, or the
proportion.
Suppose a group of 25 people is surveyed about their beer-drinking preference. The categories were (1) Domestic can, (2) Domestic bottle, (3) Microbrew and (4) Import. The raw data is
3 4 1 1 3 4 3 3 1 3 2 1 2 1 2 3 2 3 1 1 1 1 4 3 1
> beer = scan()
1: 3 4 1 1 3 4 3 3 1 3 2 1 2 1 2 3 2 3 1 1 1 1 4 3 1
26:
Read 25 items
> barplot(beer) # this isn't correct
Categorical Data and Bar Chart

barplot(beer) draws 25 bars, one for each response, instead of one bar per category. But how many bars do we actually need?
Categorical Data and Bar Chart
> table(beer)
beer
1 2 3 4
10 4 8 3
> barplot(table(beer)) # Yes, call with summarized data

There are 4 categories now; the y-axis shows frequency.
Categorical Data and Bar Chart
> barplot(table(beer)/length(beer)) # divide by n for proportion

There are 4 categories now; the y-axis shows proportion.
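Dividing the table by length(beer) turns counts into proportions, which are the bar heights in the second chart. This sketch enters the data with c() instead of scan() so it runs without any typing at the prompt. The names counts and props are illustrative.

```r
beer <- c(3, 4, 1, 1, 3, 4, 3, 3, 1, 3, 2, 1, 2, 1, 2, 3, 2, 3, 1, 1, 1, 1, 4, 3, 1)

counts <- table(beer)            # frequencies: 10 4 8 3
props <- counts / length(beer)   # proportions: 0.40 0.16 0.32 0.12
print(props)
print(sum(props))                # 1
```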
Categorical Data and Pie Charts
> beer.counts = table(beer) # store the table result
> pie(beer.counts) # first pie -- kind of dull
Categorical Data and Pie Charts
> names(beer.counts) = c("Domestic can","Domestic bottle",
+ "Microbrew","Import") # give names
> pie(beer.counts) # prints out names
Categorical Data and Pie Charts
> pie(beer.counts,col=c("purple","green2","cyan","white"))
Stem and Leaf chart
Suppose you have the box score of a basketball game with the following points scored by players on both teams:
2 3 16 23 14 12 4 13 2 0 0 0 6 28 31 14 4 8 2 5
Create a Stem and Leaf Chart
> scores = scan()
1: 2 3 16 23 14 12 4 13 2 0 0 0 6 28 31 14 4 8 2 5
21:
Read 20 items
> stem(scores)
The decimal point is 1 digit(s) to the right of the |
0 | 000222344568
1 | 23446
2 | 38
3 | 1
Stem and Leaf chart
> stem(scores,scale=2)
The decimal point is 1 digit(s) to the right of the |

0 | 000222344
0 | 568
1 | 2344
1 | 6
2 | 3
2 | 8
3 | 1
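Because the decimal point is one digit to the right of the |, each stem is the tens digit and each leaf is the units digit. This rough sketch reproduces that grouping with integer division (%/%) and remainder (%%). It matches the rows of the default chart for this data. It is not stem()'s general algorithm for choosing the scale.

```r
scores <- c(2, 3, 16, 23, 14, 12, 4, 13, 2, 0, 0, 0, 6, 28, 31, 14, 4, 8, 2, 5)

s <- sort(scores)
stems  <- s %/% 10     # tens digit: the stem
leaves <- s %% 10      # units digit: the leaf
# Group the leaves by stem; compare with the rows printed by stem(scores)
print(split(leaves, stems))
```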
Making numeric data categorical
Suppose CEO yearly compensations are sampled and the following values are found (in millions):
12 0.4 5 2 50 8 3 1 4 0.25
We want to break the data into the intervals (0, 1], (1, 5] and (5, 50] and give each interval a name.
> sals = c(12, .4, 5, 2, 50, 8, 3, 1, 4, .25) # enter data
> cats = cut(sals,breaks=c(0,1,5,max(sals))) # specify the breaks
> cats # view the values
[1] (5,50] (0,1] (1,5] (1,5] (5,50] (5,50] (1,5] (0,1] (1,5] (0,1]
Levels: (0,1] (1,5] (5,50]
> levels(cats) = c("poor","rich","rolling in it") # change labels
> table(cats)
cats
         poor          rich rolling in it
            3             4             3
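Instead of renaming the levels afterwards, cut() also accepts a labels argument. This minimal sketch is equivalent to the steps above. Intervals are right-closed by default, so 1 falls in (0,1] and 50 falls in (5,50].

```r
sals <- c(12, 0.4, 5, 2, 50, 8, 3, 1, 4, 0.25)   # compensation in millions

# Right-closed intervals (0,1], (1,5], (5,50], named directly via labels
cats <- cut(sals, breaks = c(0, 1, 5, max(sals)),
            labels = c("poor", "rich", "rolling in it"))
print(table(cats))   # poor 3, rich 4, rolling in it 3
```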

