Data Analysis and Programming in R
Eswar Sai Santosh Bandaru
R
• What is R?
• Programming language meant for statistical analysis, data mining
• https://en.wikipedia.org/wiki/R_(programming_language)
• Why R?
• Effective data manipulation, storage, and graphical display
• Free of cost, open source
• Many packages contributed by experienced programmers/ statisticians
• https://cran.r-project.org/web/packages/available_packages_by_name.html
• Simple and elegant code, easy to learn
• Microsoft is integrating R into SQL Server
• Problems:
• Memory management: data is held in RAM, so dataset size is limited by available memory
• Speed
• Many developments are under way to address these problems.
RStudio Interface: Console
• Console: run your code here
RStudio Interface: Editor
• Save and edit your code here
RStudio Interface: Output
• Output: plots and help
General Things:
• Case sensitive
• Shortcuts:
• CTRL+ENTER (Important): Send code from editor to console and execute
• CTRL+2: Move the cursor from editor to console
• CTRL+1: Move the cursor from console to editor
• CTRL+UP in console: Retrieve previous commands
• # hash is used for commenting the code
• CTRL+SHIFT+C: comment/uncomment a block of code
R as a calculator
• + : Addition -- 2+3 output: 5
• - : Subtraction -- 4-5 output: -1
• * : Multiplication -- 2*3 output: 6
• ^ or ** : Exponentiation -- 2^3 or 2**3 output: 8
• / : Division -- 17/3 output: 5.666667
• %% : Modulo (remainder) -- 17%%3 output: 2
• %/% : Integer division -- 17%/%3 output: 5
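The operators above can be tried directly in the console. The following is a minimal sketch; the variable names are chosen here only so the results can be inspected afterwards.

```r
# Basic arithmetic operators in R
add_res  <- 2 + 3      # addition: 5
sub_res  <- 4 - 5      # subtraction: -1
mul_res  <- 2 * 3      # multiplication: 6
pow_res  <- 2^3        # exponentiation: 8 (2**3 is an equivalent spelling)
div_res  <- 17 / 3     # division: 5.666667
mod_res  <- 17 %% 3    # modulo (remainder): 2
idiv_res <- 17 %/% 3   # integer division: 5
```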
Assignments and Expressions
• "<-" is the assignment operator in R
• a <- 3: the value 3 gets assigned to the variable a
• Expressions
• Combination of numbers/variables/operators
• E.g., 2+3*a/14
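• Order of evaluation:
• Brackets -> exponentiation -> unary minus -> multiplication and division (equal precedence, evaluated left to right) -> addition and subtraction (equal precedence, evaluated left to right)
• E.g., 7*9/13 -- 4.846154
• -2^0.5 -- -1.414214 (exponentiation is applied before the unary minus)
• (-2)^0.5 -- NaN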
• Q1
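A small sketch of assignment and the order of evaluation. The names e1 to e4 are illustrative and not part of the original slides.

```r
a <- 3                    # assign 3 to the variable a
e1 <- 2 + 3 * a / 14      # 3 * a / 14 is evaluated first, then added to 2
e2 <- 7 * 9 / 13          # * and / share precedence, so this is (7 * 9) / 13
e3 <- -2^0.5              # parsed as -(2^0.5), roughly -1.414214
e4 <- (-2)^0.5            # fractional power of a negative real number: NaN
```

Adding brackets, as in e4, is the simplest way to make the intended order explicit.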
Data Types
• Numeric: Real numbers. E.g., 1.24, -3.12, 1
• Integer: Integer values, written with the suffix L. E.g., 24L
• Character: E.g., 'a', "a", "Hello World!", "2"
• Logical: Boolean type. TRUE (1), FALSE (0); T and F are shorthand
• Complex: a+bi, where a and b are real numbers
• class(): function used to check the class of an object
• E.g., class(24) -- numeric
• E.g., class(24L) -- integer
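As a quick check of the types listed above, class() can be called on one literal of each kind. The variable names are illustrative.

```r
cls_num <- class(1.24)             # "numeric"
cls_int <- class(24L)              # "integer"
cls_chr <- class("Hello World!")   # "character"
cls_lgl <- class(TRUE)             # "logical"
cls_cpx <- class(2 + 3i)           # "complex"
```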
Data structures
• 4 main types:
• Vector
• Matrices
• Lists
• Data frames
• We will discuss vectors and data frames in today's session
Vectors:
• One-dimensional collection of objects of the same kind (same data type)
• Vectors in R are similar to arrays in other programming languages
• Notation: (1,2,3,4,5), where 1,2,3,4,5 are called elements
• (1,2,3,4,5): numeric vector
• ('a','b','c','d'): character vector
• (T, F, T, T): logical vector
• (1L,2L,3L): integer vector
• (1,2,3,4,6): valid vector
• (1,'a',3,'t'): mixed types, so not a valid vector as written (but R doesn't throw an error; it coerces the elements to a common type)
Creating vectors
• Basic ways:
• Using c()
• Using “:”
• Using seq()
• Using rep()
• Using vector()
c() combine function
• Syntax:
• X <- c(1,2,4,78,90) creates a numeric vector X with elements 1,2,4,78,90
• Y <- c('a','b','c','d') creates a character vector Y with elements 'a','b','c','d'
• Printing:
• X # auto printing
• print(X) # explicit printing
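The same steps as runnable code. Note that R is case sensitive, so the function is c() and print(), in lower case.

```r
X <- c(1, 2, 4, 78, 90)        # numeric vector
Y <- c('a', 'b', 'c', 'd')     # character vector
X                              # auto printing (at the console)
print(Y)                       # explicit printing
```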
Using “:”
• x <- 20:50
• Creates a numeric (integer) vector x with values from 20 to 50 in increments of 1
• Ending value > starting value: default increment is +1
• y <- 50:20
• Creates a numeric (integer) vector y with values from 50 down to 20 in increments of -1
• Ending value < starting value: default increment is -1
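A short sketch of the ":" operator in both directions. The names n and cls are illustrative.

```r
x <- 20:50         # 20, 21, ..., 50
y <- 50:20         # 50, 49, ..., 20
n <- length(x)     # 31 elements
cls <- class(x)    # "integer"
```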
seq()
• X <- seq(2,50)
• Creates a numeric vector from 2 to 50 in increments of +1
• X <- seq(50,2)
• Creates a numeric vector from 50 down to 2 in increments of -1
• X <- seq(2,50,2)
• Creates a numeric vector from 2 to 50 in increments of +2
• (2, 4, 6, 8, 10, ..., 50)
• The increment can also be negative if the starting element > ending element
• X <- seq('a','b',2) throws an error
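The seq() variants above, as a sketch. The names s1 to s4 and bad are illustrative; try() and suppressWarnings() are used only so the failing call does not stop the script.

```r
s1 <- seq(2, 50)        # 2, 3, ..., 50
s2 <- seq(50, 2)        # 50, 49, ..., 2
s3 <- seq(2, 50, 2)     # 2, 4, ..., 50
s4 <- seq(50, 2, -2)    # 50, 48, ..., 2 (negative increment)

# seq() needs numeric endpoints, so character arguments raise an error
bad <- try(suppressWarnings(seq('a', 'b', 2)), silent = TRUE)
```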
rep()
• X <- rep(c(1,2,3), times = 2)
• Creates a numeric vector X: 1,2,3,1,2,3
• The whole vector gets repeated twice
• rep(1:3, each = 2)
• Output: 1,1,2,2,3,3
• Each element in the vector gets repeated twice
• rep(1:3, each = 2, times = 3)
• Output: 1,1,2,2,3,3, 1,1,2,2,3,3, 1,1,2,2,3,3
• 2 steps:
• 1: Each element gets repeated twice
• 2: The resulting vector gets repeated three times
• For other variations of rep, see ?rep
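The three rep() calls above can be compared side by side. The names r1 to r3 are illustrative.

```r
r1 <- rep(c(1, 2, 3), times = 2)      # 1 2 3 1 2 3
r2 <- rep(1:3, each = 2)              # 1 1 2 2 3 3
r3 <- rep(1:3, each = 2, times = 3)   # 1 1 2 2 3 3, repeated three times
```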
Combining vectors
• X <-c(1,2,3,4,5)
• Y<-c(1,6,7,8)
• Z<-c(X,Y)
• Combines vectors X,Y and assigns to Z, output: 1,2,3,4,5,1,6,7,8
• Q1 – Q8
vector()
• X <- vector() creates an empty vector; the default data type is logical
• X<-vector (…)
Subsetting vectors
X <- c('a', 'b', 'c', 'd', 'e', 'f')
Index: 1 2 3 4 5 6
• X[1]: 'a'
• Unlike Python, Java, etc., indexing starts from 1 in R
• X[5]: 'e'
• X[-1]: 'b' 'c' 'd' 'e' 'f' (every element except the first)
• X[1:3]: 'a' 'b' 'c' (the first three elements; not the same as X[3:1])
• X[-1:-2]: 'c' 'd' 'e' 'f', or equivalently X[-2:-1]: 'c' 'd' 'e' 'f'
Example
• X[1:(length(X)-1)]
• Prints every element except for the last element
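The subsetting forms covered so far, collected in one sketch. The result names (first, all_but_first, and so on) are illustrative.

```r
X <- c('a', 'b', 'c', 'd', 'e', 'f')
first         <- X[1]                    # "a" (indexing starts at 1)
fifth         <- X[5]                    # "e"
all_but_first <- X[-1]                   # negative index drops that element
first_three   <- X[1:3]                  # "a" "b" "c"
reversed      <- X[3:1]                  # "c" "b" "a" (same elements, reversed)
drop_two      <- X[-1:-2]                # drops the first two elements
all_but_last  <- X[1:(length(X) - 1)]    # everything except the last element
```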
Element-wise operations
• (45, 20, 25, 3, 4) + (2, 6, 10, 1, 3) = (47, 26, 35, 4, 7)
Example:
• x1 <- c(1,2,3), x2 <- c(6,7,8). What is x1 + 2*x2?
• x1 is (1,2,3)
• 2*(6,7,8) -- (12, 14, 16): the scalar 2 is recycled across x2
• (1,2,3) + (12,14,16) -- (13, 16, 19)
Recycling
• 1:5 + 1
• Internally 1,2,3,4,5 + 1,1,1,1,1 (1 gets recycled 5 times to match the length of the longer vector, then the element-wise operation occurs)
• 1:6 + c(1,2)
• Internally 1,2,3,4,5,6 + 1,2,1,2,1,2 (c(1,2) gets recycled to match the length of the longer vector)
• c(1,2,3,4,5,6,7) + c(1,2,3,4) (gives a warning, because 7 is not a multiple of 4)
• Internally 1,2,3,4,5,6,7 + 1,2,3,4,1,2,3
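A sketch of element-wise arithmetic and recycling. The names r1 to r4 are illustrative, and suppressWarnings() is used only to keep the output clean; without it R prints a warning about the mismatched lengths.

```r
r1 <- 1:5 + 1              # 2 3 4 5 6
r2 <- 1:6 + c(1, 2)        # 2 4 4 6 6 8

# Lengths 7 and 4: R still recycles the shorter vector, but warns
r3 <- suppressWarnings(c(1, 2, 3, 4, 5, 6, 7) + c(1, 2, 3, 4))   # 2 4 6 8 6 8 10

x1 <- c(1, 2, 3)
x2 <- c(6, 7, 8)
r4 <- x1 + 2 * x2          # 13 16 19
```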
Q12: Create vector q using element-wise operations
Subsetting a vector with a logical vector
• Y <- c('a','b','c','d')
• Y[c(T,T,F,T)]
• 'a' 'b' 'd' (an element is selected where the logical vector is TRUE and skipped where it is FALSE)
• Recycling
• Y[c(T)]
• The vector c(T) gets recycled until it matches the length of Y
• Every element gets selected
Comparison operators
• X <- c(1,2,3,4,5,6,7)
• X > 4 (X greater than 4)
• Outputs a logical vector with TRUE for values greater than 4 and FALSE for values less than or equal to 4
• Output: logical vector: F,F,F,F,T,T,T
• X[X > 4]
• Selects the elements of X which are greater than 4
• Output: 5,6,7
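Logical subsetting and comparison fit together as in the sketch below. The names picked, everything, and mask are illustrative.

```r
Y <- c('a', 'b', 'c', 'd')
picked <- Y[c(TRUE, TRUE, FALSE, TRUE)]   # "a" "b" "d"
everything <- Y[TRUE]                     # TRUE is recycled, so every element is kept

X <- c(1, 2, 3, 4, 5, 6, 7)
mask <- X > 4                             # FALSE FALSE FALSE FALSE TRUE TRUE TRUE
big <- X[mask]                            # 5 6 7
```

Writing X[X > 4] in one step is equivalent to building mask first and then subsetting with it.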
Conditional operators in R
• Comparison operators used in conditions:
• x == y: checks for equality, outputs TRUE if equal, else FALSE
• x != y: checks for inequality
• x >= y: greater than or equal to
• x <= y: less than or equal to
• x < y: less than
• x > y: greater than
• You can combine conditions using the & (and) and | (or) operators
• Q13-Q16
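A sketch of the comparison operators and of combining conditions. The vectors and result names are made up for illustration.

```r
x <- c(1, 5, 10, 15)
y <- c(1, 7, 10, 12)

eq  <- x == y                # TRUE FALSE TRUE FALSE
neq <- x != y                # FALSE TRUE FALSE TRUE
ge  <- x >= y                # TRUE FALSE TRUE TRUE

both   <- x > 2 & x < 12     # FALSE TRUE TRUE FALSE (and)
either <- x < 2 | x > 12     # TRUE FALSE FALSE TRUE (or)
```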
Coercion
• x <- c(1,2,'a',3) -- does not throw an error
• The other elements in the vector get coerced to character
• Output: '1','2','a','3'
• Priority for coercion: character > numeric > logical
• Logical values convert to 1 (TRUE) and 0 (FALSE)
• Explicit coercion:
• as.* functions
• as.character(1:20) # e.g., customer IDs
• x <- c('a','b','c','d')
• as.numeric(x) -- R produces NAs (with a warning)
• Output: NA, NA, NA, NA
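Implicit and explicit coercion in a short sketch. The names mixed, nums, ids, and nas are illustrative; suppressWarnings() only hides the warning that as.numeric() would otherwise print.

```r
mixed <- c(1, 2, 'a', 3)        # "1" "2" "a" "3": everything becomes character
nums  <- c(1.5, TRUE, FALSE)    # 1.5 1.0 0.0: logical values become numeric
ids   <- as.character(1:3)      # "1" "2" "3"

# Strings that are not numbers become NA
nas <- suppressWarnings(as.numeric(c('a', 'b', 'c', 'd')))
```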
Some important functions
• which(): returns the indices of the vector at which the condition is satisfied
• x <- c(10,2,4,5,0)
• which(x > 2)
• Output: 1, 3, 4
• all(): returns TRUE if a condition is satisfied by all values in a vector
• all(x > 2): FALSE
• any(): returns TRUE if a condition is satisfied by at least one value in a vector
• any(x > 2): TRUE
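The three functions on the example vector, as a sketch. The result names are illustrative.

```r
x <- c(10, 2, 4, 5, 0)
idx     <- which(x > 2)    # 1 3 4: positions where the condition holds
all_gt2 <- all(x > 2)      # FALSE: 2 and 0 fail the condition
any_gt2 <- any(x > 2)      # TRUE: at least one value passes
vals    <- x[idx]          # 10 4 5
```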
Attributes
• Attributes give additional information about the elements of a vector
• E.g., names of elements, dimensions, levels
• attributes(x): shows all the available attributes of x
• If there are no attributes, R outputs NULL
• We can assign attributes to a created vector
• E.g., we can assign names to elements with the function names()
• names(x) <- student_names
• where student_names is a character vector containing the names of the students
Subsetting using names attribute
• X['Cory'] -- prints the marks of Cory
• Conceptually, R finds the index whose name is "Cory" (much like using which())
• It then subsets based on that index
• X[c('Cory','James')] -- prints the marks of Cory and James
• Q16
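A sketch of assigning and using the names attribute. The marks and the student "Maya" are hypothetical values chosen for illustration; the which() line is only a way to picture the lookup, not a claim about how R implements it.

```r
# Hypothetical marks and student names
X <- c(28, 45, 36)
student_names <- c('Cory', 'James', 'Maya')

before <- attributes(X)                      # NULL: no attributes yet
names(X) <- student_names
after <- attributes(X)                       # now contains the names

cory <- X['Cory']                            # Cory's marks
two <- X[c('Cory', 'James')]                 # marks of Cory and James
by_index <- X[which(names(X) == 'Cory')]     # same lookup written with which()
```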
Updating a vector: What if Cory's marks get updated?
• X[1] <- 35
• The element at index 1 gets updated to 35
• X[X < 30 & X > 25] <- 40
• All the values which are greater than 25 and less than 30 get updated to 40 (use the vectorized &, not &&)
• X["Cory"] <- 67
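The three ways of updating, run in sequence on hypothetical marks (the values and the name "Maya" are made up for illustration).

```r
# Hypothetical marks, named by student
X <- c(Cory = 28, James = 45, Maya = 26)

X[1] <- 35                   # update by position: Cory becomes 35
X[X < 30 & X > 25] <- 40     # update by condition: Maya (26) becomes 40
X["Cory"] <- 67              # update by name: Cory becomes 67
```

The single & compares element by element, which is what a subsetting condition needs.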
is.na() and mean imputation
• x<- c(1,2,4,NA,5,NA)
• is.na(x): produces a logical vector, TRUE if element is NA else FALSE
• Output: F F F T F T
• How do we replace the NA values with the mean value?
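One possible way to do mean imputation is sketched below, using is.na() for the positions and mean() with na.rm = TRUE for the mean of the observed values. The names is_missing and x_mean are illustrative.

```r
x <- c(1, 2, 4, NA, 5, NA)
is_missing <- is.na(x)               # FALSE FALSE FALSE TRUE FALSE TRUE
x_mean <- mean(x, na.rm = TRUE)      # mean of 1, 2, 4, 5 is 3
x[is_missing] <- x_mean              # 1 2 4 3 5 3
```

Without na.rm = TRUE, mean(x) would itself return NA.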
Factors
• factor() converts a vector into categorical data
• X <- c(1,1,1,2,2,2,3,3,3)
• sum(X): 18
• X <- factor(X)
• sum(X): error
• levels(X): categories in X
• Output: "1" "2" "3"
• class(X)
• Output: factor
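The factor example as a runnable sketch. The result names are illustrative, and try() is used only so that the expected error does not stop the script.

```r
X <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
total <- sum(X)                       # 18

X <- factor(X)                        # now categorical
lv <- levels(X)                       # "1" "2" "3"
cls <- class(X)                       # "factor"
res <- try(sum(X), silent = TRUE)     # sum() is not meaningful for factors: error
```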
Table function: frequency table
• Counts the number of times each element occurs in a vector
• x <- c('a','a','a','b','b','c','c')
• table(x):
• a: 3
• b: 2
• c: 2
• Useful when plotting a bar plot
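A sketch of table() on the example vector. The name counts is illustrative; the barplot() call is left commented out because it opens a graphics device.

```r
x <- c('a', 'a', 'a', 'b', 'b', 'c', 'c')
counts <- table(x)        # a: 3, b: 2, c: 2
n_a <- counts[['a']]      # 3
# barplot(counts)         # base-graphics bar plot of the frequencies
```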
ls() and rm()
• ls(): lists all the objects in the current R session (environment)
• rm("d"): removes the object d
• rm(list = ls()): removes all objects from the environment
Data frames:
• Data frames are simply "tables" (rows and columns)
• All values within a column should be of the same data type (hence all the vector operations are valid for each column)
• Creation
• X <- data.frame(data for column 1, data for column 2, ...)
• The columns get bound together side by side
• 2-dimensional
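A minimal sketch of creating a data frame from two vectors. The student names and marks are hypothetical and serve only as illustration.

```r
# Hypothetical student data
test <- data.frame(
  student_name = c('Cory', 'James', 'Maya', 'Lena'),
  marks = c(28, 45, 36, 51)
)
size <- dim(test)     # 4 rows, 2 columns
str(test)             # one line per column, with its type
```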
Subsetting data frames…why?
• Very useful for analyzing the data
• As a data frame is 2-dimensional, it has 2 indices: [row, column]
• test[3,2]: refers to the element in the 3rd row, 2nd column
• test[1:3,1:2]: first three rows, first two columns
• Using column names
• test$student_name: refers to the column student_name
• It is a vector, so we can perform all vector operations on it
• test["student_name"]: also refers to the column student_name, but returns a one-column data frame
• test["marks"]
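The subsetting forms above, applied to a small hypothetical data frame (the contents of test are made up for illustration).

```r
# Hypothetical student data
test <- data.frame(
  student_name = c('Cory', 'James', 'Maya', 'Lena'),
  marks = c(28, 45, 36, 51),
  stringsAsFactors = FALSE   # keep names as character in older R versions too
)

cell  <- test[3, 2]               # 36: 3rd row, 2nd column
block <- test[1:3, 1:2]           # first three rows, first two columns
col_v <- test$student_name        # a character vector
col_d <- test["student_name"]     # a one-column data frame
avg   <- mean(test$marks)         # vector operations work on a column: 40
```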
Students with higher than average marks?
• above_average <- (test$marks > mean(test$marks))
• test$student_name[above_average]
• Two steps:
• above_average is a logical vector
• test$student_name[above_average] selects the students for which the vector is TRUE
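The two steps on a small hypothetical data frame. The data and the name top_students are made up for illustration.

```r
# Hypothetical student data
test <- data.frame(
  student_name = c('Cory', 'James', 'Maya', 'Lena'),
  marks = c(28, 45, 36, 51),
  stringsAsFactors = FALSE
)

above_average <- test$marks > mean(test$marks)       # FALSE TRUE FALSE TRUE (mean is 40)
top_students <- test$student_name[above_average]     # "James" "Lena"
```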
Writing into csv
• write.csv(test, "test.csv")
• The file gets saved to the working directory (folder) R is currently pointing to
• To find the working directory:
• Use getwd()
Reading a csv file
• setwd("directory path")
• read.csv("file name")
• Different functions are available for reading different file types
• dir(): lists all files in the current directory
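A round trip through a CSV file, as a sketch. The data frame is hypothetical, and a temporary folder is used here (an assumption of this example) so that nothing is written to your working directory.

```r
# Hypothetical data to write out
test <- data.frame(student_name = c('Cory', 'James'), marks = c(28, 45))

# tempdir() is used instead of the working directory to avoid touching your files
path <- file.path(tempdir(), "test.csv")
write.csv(test, path, row.names = FALSE)   # row.names = FALSE omits the row-number column
files <- dir(tempdir())                    # "test.csv" is among the listed files
test2 <- read.csv(path)                    # read the file back into a data frame
```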
Data inspection
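• str(): compact summary of the structure (column names and types)
• head(): first rows (6 by default)
• tail(): last rows (6 by default)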
Dates and Times in R
• Dates are stored internally as the number of days since 1970-01-01
while times are stored internally as the number of seconds since
1970-01-01
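The internal representation can be seen by converting a date and a time to numbers. The example values are arbitrary, and the time zone is fixed to UTC so the result does not depend on the local setting.

```r
d <- as.Date("1970-01-10")
days_since_epoch <- as.numeric(d)           # 9 days after 1970-01-01

tm <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
secs_since_epoch <- as.numeric(tm)          # 60 seconds after 1970-01-01 00:00:00 UTC
```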
Data Visualization in R: Using R base graphics
• 3 main plotting systems:
• base graphics
• ggplot2
• lattice
• Boxplots
• Barplots
• Histograms
• Scatter plots
Data Analysis and Programming in R

  • 1. Data Analysis and Programming in R Eswar Sai Santosh Bandaru Eswar Sai Santosh Bandaru
  • 2. R • What is R? • Programming language meant for statistical analysis, data mining • https://en.wikipedia.org/wiki/R_(programming_language) • Why R? • Effective data manipulation, Storage and graphical display • Free of cost, open source • Many packages contributed by experienced programmers/ statisticians • https://cran.r-project.org/web/packages/available_packages_by_name.html • Simple and elegant code, easy to learn • Microsoft is integrating R in SQL server • Problems: • Memory management : data sits on RAM • Speed • Many developments are happening to address these problems. Eswar Sai Santosh Bandaru
  • 4. R studio Interface: Console Console: Run your code here Eswar Sai Santosh Bandaru
  • 5. R studio Interface: Editor Save and edit your code here Eswar Sai Santosh Bandaru
  • 6. R studio Interface: Output Output – plots and help Eswar Sai Santosh Bandaru
  • 7. General Things: • Case sensitive • Shortcuts: • CTRL+ENTER (Important): Send code from editor to console and execute • CTRL+2: Move the console from editor to console • CTRL+1: MOVE the cursor from console to editor • CTRL+UP IN CONSOLE: Retrieve previous commands • # hash is used for commenting the code • CTRL+SHIFT+C: comment/uncomment a block of code Eswar Sai Santosh Bandaru
  • 8. R as a calculator • + : Addition -- 2+3 output:5 • - : Subtraction -- 4-5 output: -1 • * : Multiplication - 2*3 output:8 • ^ or ** : Exponentiation -- 2^3 or 2**3 • / : Division - 17/3 -- 5.66667 • %% : Modulo Division - 17%3-- 2 • %/% : Integer Division -17%/%3 -- 5 Eswar Sai Santosh Bandaru
  • 9. Assignments and Expression • “<-” is the assignment operator in R • a<-3, 3 gets assigned to variable a • Expressions • Combination of numbers/variables/operators • E.g., 2+3*a/14 • Order of Evaluation: • ORDER OF EVALUATION: BRACKETS -> EXPONENTIATION-> DIVISION -> MULTILICATION -> ADDITION/SUBTRACTION • E.g., 7*9/13 - 10.1111 • -2^0.5 -- -1.414 • (-2) ^0.5 - NaN • Q1 Eswar Sai Santosh Bandaru
  • 10. Data Types • Numeric: Real Numbers. E.g., 1.24, -3.12, 1 • Integer: Integer values. Suffix L is added • Character: E.g., ‘a’ , “a”, “Hello World!”, “2” • Logical: Boolean Type. TRUE (1), FALSE(0), T, F • Complex: a+bi . a,b are real numbers • Class(): function is used to check the class • E.g., class(24) -- numeric • E.g., class(24L)-- integer Eswar Sai Santosh Bandaru
  • 11. Data structures • 4 main types: • Vector • Matrices • Lists • Data frames • We would discuss vectors and data frames in today’s session Eswar Sai Santosh Bandaru
  • 12. Vectors: • One dimension collection of objects of same kind (same data type) • Vectors in R are similar to arrays in any other programming language • Syntax: (1,2,3,4,5) . 1,2,3,4,5 are called elements • (1,2,3,4,5) : numeric vector • (‘a’,’b’,’c’,’d’): character vector • (T, F, T, T): logical vector • (1L,2L,3L): integer vector • (1,2,3,4,6) ----- valid vector • (1,’a’,3,’t’) ------ invalid vector (but R doesn’t throw an error due to coercion Eswar Sai Santosh Bandaru
  • 13. Creating • Basic ways: • Using c() • Using “:” • Using seq() • Using rep() • Using vector() Eswar Sai Santosh Bandaru
  • 14. C() combine function • Syntax: • X<- C(1,2,4,78,90) creates a Numeric vector X with elements 1,2,4,78,90 • Y<- c(‘a’,’b’,’c’,’d’) creates a character vector Y with elements ‘a’, ‘b’, ‘c’,’d’ • Printing: • X # Auto printing • Print(x) # explicit printing Eswar Sai Santosh Bandaru
  • 15. Using “:” • x <- 20:50 • Creates a numeric vector x with values starting from 20 till 50 with increments of 1 • Ending value > Starting Value - default increment +1 • y <- 50:20 • Creates a numeric vector x with values starting from 50 till 20 with increments of -1 • Ending value < Starting Value .- default increment -1 Eswar Sai Santosh Bandaru
  • 16. Seq() • X <- seq(2,50) • Creates a numeric vector starting from 2 till 50 with increment of +1 • X <- seq(50,2) • Creates a numeric vector starting from 50 till 2 with increment of -1 • X <- seq(2,50,2) • Creates a numeric vector starting from 2 till 50 with increment of +2 • Increment can also be –ve if starting element > ending element • ( 2, 4,6,8,10…….,50) • X<- seq(‘a’,’b’,2) Throws an error Eswar Sai Santosh Bandaru
  • 17. Rep() • X <- rep(c(1,2,3),times =2) • Creates vector numeric vector X: 1,2,3,1,2,3 • The vector gets repeated twice • rep(1:3, each =2) • Output: 1,1,2,2,3,3 • Each element in the vector gets repeated twice • rep(1:3,each=2,times =3) • Output: 1,1,2,2,3,3, 1,1,2,2,3,3, 1,1,2,2,3,3, • 2 steps • 1:Each element gets repeated twice • 2: the entire vector itself gets repeated thrice • Different variations of rep-- ?rep Eswar Sai Santosh Bandaru
  • 18. Combining vectors • X <-c(1,2,3,4,5) • Y<-c(1,6,7,8) • Z<-c(X,Y) • Combines vectors X,Y and assigns to Z, output: 1,2,3,4,5,1,6,7,8 • Q1 – Q8 Eswar Sai Santosh Bandaru
  • 19. vector() • X<-vector() …empty vector with default data type:logical • X<-vector (…) Eswar Sai Santosh Bandaru
  • 20. Subsetting vectors X<-( ‘a’ , ‘b’, ‘c’, ‘d’, ‘e’, ‘f’) Index: 1 2 3 4 5 6 X[1]: ‘a’ • Unlike python, java…indexing starts from 1 in R Eswar Sai Santosh Bandaru
  • 21. Subsetting vectors X<-( ‘a’ , ‘b’, ‘c’, ‘d’, ‘e’, ‘f’) Index: 1 2 3 4 5 6 X[5]: ‘e’ Eswar Sai Santosh Bandaru
  • 22. Subsetting vectors X<-( ‘a’ , ‘b’, ‘c’, ‘d’, ‘e’, ‘f’) Index: 1 2 3 4 5 6 X[-1]: ‘b’ ‘c’ ‘d’ ‘e’ ‘f’ Expect first element Eswar Sai Santosh Bandaru
  • 23. Subsetting vectors X<-( ‘a’ , ‘b’, ‘c’, ‘d’, ‘e’, ‘f’) Index: 1 2 3 4 5 6 X[1:3]: ‘a’ ‘b’ ‘c’ Not same as x[3:1] Prints first three elements Eswar Sai Santosh Bandaru
  • 24. Subsetting vectors X<-( ‘a’ , ‘b’, ‘c’, ‘d’, ‘e’, ‘f’) Index: 1 2 3 4 5 6 X[-1:-2]: ‘c’ ‘d’ ‘e’ ‘f’ or X[-2:-1]: ‘c’ ‘d’ ‘e’ ‘f’ Eswar Sai Santosh Bandaru
  • 25. Example • X[1:(length(X)-1)] • Prints every element except for the last element Eswar Sai Santosh Bandaru
  • 26. Element wise operations • (45,20, 25,3,4) + • (2, 6, 10, 1, 3) || (47, 26, 35, 4, 7) • (45,20, 25,3,4) + • (2, 6, 10, 1, 3) || (47, 26, 35, 4, 7) • (45,20, 25,3,4) + • (2, 6, 10, 1, 3) || (47, 26, 35, 4, 7) Eswar Sai Santosh Bandaru
  • 27. Example: • x1 <- c(1,2,3), x2 <- c(6,7,8). what is x1+2*x2 • (1,2,3) • 2*(6,7,8) -- (12, 14, 16) ….recycling! • (1,2,3) + (12,14,16) - (13,16,19) Eswar Sai Santosh Bandaru
  • 28. Recycling • 1:5 + 1 • Internally: 1,2,3,4,5 + 1,1,1,1,1 (1 is recycled 5 times to match the length of the longer vector, then the element-wise operation occurs) • 1:6 + c(1,2) • Internally: 1,2,3,4,5,6 + 1,2,1,2,1,2 (c(1,2) is recycled to match the length of the longer vector) • c(1,2,3,4,5,6,7) + c(1,2,3,4) (gives a warning, because 7 is not a multiple of 4) • Internally: 1,2,3,4,5,6,7 + 1,2,3,4,1,2,3
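The element-wise and recycling examples can be checked as follows. This is a minimal sketch; the names r1 to r4 and msg are illustrative, and tryCatch() is used here only to capture the warning text instead of printing it.

```r
x1 <- c(1, 2, 3)
x2 <- c(6, 7, 8)

r1 <- x1 + 2 * x2       # 2 is recycled to length 3, then added element-wise
r2 <- 1:5 + 1           # 1 is recycled five times
r3 <- 1:6 + c(1, 2)     # c(1, 2) is recycled three times

# Lengths 7 and 4: R still recycles, but raises a warning
msg <- tryCatch(
  c(1, 2, 3, 4, 5, 6, 7) + c(1, 2, 3, 4),
  warning = function(w) conditionMessage(w)
)
r4 <- suppressWarnings(c(1, 2, 3, 4, 5, 6, 7) + c(1, 2, 3, 4))

print(r1)   # 13 16 19
print(r2)   # 2 3 4 5 6
print(r3)   # 2 4 4 6 6 8
print(msg)  # the warning text
print(r4)   # 2 4 6 8 6 8 10
```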
  • 29. Q12: Create vector q using element-wise operations
  • 30. Subsetting a vector with a logical vector • Y <- c('a','b','c','d') • Y[c(T,T,F,T)] • Output: 'a' 'b' 'd' (an element is selected where the logical value is TRUE and skipped where it is FALSE) • Recycling • Y[c(T)] • The logical vector c(T) is recycled until it matches the length of Y • Every element gets selected
  • 31. Comparison operators • X <- c(1,2,3,4,5,6,7) • X > 4 (X greater than 4) • Outputs a logical vector with TRUE for values greater than 4 and FALSE for values less than or equal to 4 • Output: logical vector: F,F,F,F,T,T,T • X[X > 4] • Selects the elements of X which are greater than 4 • Output: 5,6,7
  • 32. Conditional operators in R • Used to write conditions in R • x == y: checks for equality, outputs TRUE if equal else FALSE • x != y: checks for inequality • x >= y: greater than or equal • x <= y: less than or equal • x < y: less than • x > y: greater than • You can combine conditions using the & (and) and | (or) operators • Q13-Q16
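Logical subsetting, comparisons and combined conditions fit together as shown below. This is a minimal sketch using the vectors from the slides; the two combined conditions are illustrative and not from the original deck.

```r
Y <- c("a", "b", "c", "d")
print(Y[c(TRUE, TRUE, FALSE, TRUE)])  # "a" "b" "d"
print(Y[TRUE])                        # TRUE is recycled: every element

X <- c(1, 2, 3, 4, 5, 6, 7)
is_big <- X > 4                       # logical vector, one value per element
print(is_big)
print(X[is_big])                      # 5 6 7

# Combine conditions element-wise with & (and) and | (or)
middle <- X[X > 2 & X < 6]            # 3 4 5
ends   <- X[X < 2 | X > 6]            # 1 7
print(middle)
print(ends)
```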
  • 33. Coercion • x <- c(1,2,'a',3) -- does not throw an error • The other elements in the vector get coerced to character • Output: '1','2','a','3' • Priority for coercion: character > numeric > logical • Logical values convert to 1 (TRUE) and 0 (FALSE) • Explicit coercion: • as.* functions • as.character(1:20) # e.g. customer IDs • x <- c('a','b','c','d') • as.numeric(x) -- R produces NAs (with a warning) • Output: NA, NA, NA, NA
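Implicit and explicit coercion can be observed with class(). This is a minimal sketch; the names mixed, flags, ids and bad are illustrative, and suppressWarnings() only hides the "NAs introduced by coercion" warning.

```r
# Implicit coercion: character wins over numeric
mixed <- c(1, 2, "a", 3)
print(mixed)
print(class(mixed))      # "character"

# Numeric wins over logical: TRUE becomes 1, FALSE becomes 0
flags <- c(TRUE, FALSE, 2)
print(flags)             # 1 0 2

# Explicit coercion with the as.* functions
ids <- as.character(1:3)
print(ids)

# Text that is not a number cannot be converted, so R returns NA
bad <- suppressWarnings(as.numeric(c("a", "b", "c", "d")))
print(bad)
```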
  • 34. Some important functions • which(): returns the indices of the vector at which the condition is satisfied • x <- c(10,2,4,5,0) • which(x > 2) • Output: 1, 3, 4 • all(): returns TRUE if the condition is satisfied by all values in a vector, else FALSE • all(x > 2): FALSE • any(): returns TRUE if the condition is satisfied by at least one value in a vector, else FALSE • any(x > 2): TRUE
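A quick check of which(), all() and any() on the vector from this slide; the names idx, every and some are illustrative.

```r
x <- c(10, 2, 4, 5, 0)

idx   <- which(x > 2)  # positions where the condition holds
every <- all(x > 2)    # a single TRUE/FALSE: do all values satisfy it?
some  <- any(x > 2)    # a single TRUE/FALSE: does at least one value satisfy it?

print(idx)    # 1 3 4
print(every)  # FALSE
print(some)   # TRUE
```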
  • 35. Attributes • Attributes give additional information about the elements of a vector • E.g., names of elements, dimensions, levels • attributes(x): shows all the available attributes of x • If there are no attributes, R outputs NULL • We can assign attributes to a created vector • E.g., we can assign names to elements with the function names() • names(x) <- student_names • where student_names is a character vector containing the names of the students
  • 36. Subsetting using the names attribute • X['Cory'] -- prints the marks of Cory • Conceptually, R finds the index whose name is 'Cory', much like which(names(X) == 'Cory') • Then it subsets based on that index • X[c('Cory','James')] -- prints the marks of Cory and James • Q16
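Assigning names and subsetting by name might look like this. This is a minimal sketch: the marks and the third student name are made up for illustration, and the which() line only mirrors the conceptual description above rather than R's actual internals.

```r
X <- c(30, 45, 28)                          # hypothetical marks
print(attributes(X))                        # NULL: no attributes yet

student_names <- c("Cory", "James", "Ana")  # hypothetical names
names(X) <- student_names
print(attributes(X))                        # now contains $names

print(X["Cory"])                            # subset by name
idx <- which(names(X) == "Cory")            # the equivalent index lookup
print(X[idx])
print(X[c("Cory", "James")])
```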
  • 37. Updating a vector: what if Cory's marks get updated? • X[1] <- 35 • The element at index 1 gets updated to 35 • X[X < 30 & X > 25] <- 40 • All the values which are greater than 25 and less than 30 get updated to 40 (use the element-wise &, not &&) • X['Cory'] <- 67 • The element named Cory gets updated to 67
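The three kinds of update (by index, by condition, by name) can be applied in sequence. This is a minimal sketch with made-up marks and a made-up third student.

```r
X <- c(Cory = 30, James = 45, Ana = 28)  # hypothetical marks

X[1] <- 35                  # update by position
X[X < 30 & X > 25] <- 40    # update every value between 25 and 30 (Ana: 28)
X["Cory"] <- 67             # update by name

print(X)                    # Cory 67, James 45, Ana 40
```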
  • 38. is.na() and mean imputation • x <- c(1,2,4,NA,5,NA) • is.na(x): produces a logical vector, TRUE if the element is NA else FALSE • Output: F F F T F T • How do we replace the NAs with the mean value?
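One possible answer to the question on this slide, as a minimal sketch: compute the mean while ignoring the NAs (na.rm = TRUE), then assign it to the positions where is.na() is TRUE. The name x_mean is illustrative.

```r
x <- c(1, 2, 4, NA, 5, NA)

missing <- is.na(x)              # FALSE FALSE FALSE TRUE FALSE TRUE
x_mean  <- mean(x, na.rm = TRUE) # mean of 1, 2, 4, 5 = 3

x[missing] <- x_mean             # logical subsetting on the left-hand side
print(x)                         # 1 2 4 3 5 3
```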
  • 39. Factors • factor() converts a numeric vector into categorical data • X <- c(1,1,1,2,2,2,3,3,3) • sum(X): 18 • X <- factor(X) • sum(X): error, because a sum is not meaningful for categories • levels(X): the categories in X • Output: "1" "2" "3" • class(X) • Output: factor
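The factor example can be reproduced as below. This is a minimal sketch; a separate name f is used for the factor so the numeric vector stays available, and tryCatch() captures the error from sum() so the script keeps running.

```r
X <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
print(sum(X))        # 18

f <- factor(X)       # the same values, now treated as categories
print(levels(f))     # "1" "2" "3"
print(class(f))      # "factor"

# sum() on a factor is an error; capture it instead of stopping
res <- tryCatch(sum(f), error = function(e) "error")
print(res)
```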
  • 40. table() function: frequency table • Counts the number of times each element occurs in a vector • x <- c('a','a','a','b','b','c','c') • table(x): • a - 3 • b - 2 • c - 2 • Useful when plotting a barplot
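The frequency table from this slide, as a minimal sketch; the name counts is illustrative.

```r
x <- c("a", "a", "a", "b", "b", "c", "c")

counts <- table(x)     # one count per distinct value
print(counts)
print(names(counts))   # the distinct values
print(counts[["a"]])   # 3
```

Passing counts to barplot() would then draw one bar per category.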
  • 41. ls() and rm() • ls(): lists all the objects in the current R session (environment) • rm("d"): removes the object d • rm(list = ls()): removes all objects from the environment
  • 42. Data frames: • Data frames are simply "tables" (rows and columns) • Each column holds values of a single data type (hence all the vector operations are valid for each column); different columns may have different types • Creation • X <- data.frame(data for column 1, data for column 2, …) • The columns get bound together side by side • 2-dimensional
  • 43. Subsetting data frames…why? • Very useful for analyzing the data • Because a data frame is 2-dimensional, it has 2 indices: [row, column] • test[3,2]: refers to the element in the 3rd row, 2nd column • test[1:3,1:2]: the first three rows, first two columns • Using column names • test$student_name: refers to the column student_name • The result is a vector, so we can perform all vector operations on it • test["student_name"]: also refers to the column student_name, but returns a one-column data frame • test["marks"]
  • 44. Students with higher than average marks? • above_average <- (test$marks > mean(test$marks)) • test$student_name[above_average] • Two steps: • above_average is a logical vector • test$student_name[above_average] selects the students for which the vector is TRUE
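The data frame slides can be tried end to end with a small made-up table. This is a minimal sketch: the column names student_name and marks follow the subsetting slide, while the rows themselves are hypothetical.

```r
# Hypothetical data; each column is a vector of one type
test <- data.frame(
  student_name = c("Cory", "James", "Ana", "Mia"),
  marks        = c(67, 45, 40, 80),
  stringsAsFactors = FALSE
)

print(test[3, 2])          # 3rd row, 2nd column: 40
print(test[1:3, 1:2])      # first three rows, first two columns
print(test$marks)          # a plain numeric vector
print(test["marks"])       # a one-column data frame

# Students with higher than average marks (mean is 58)
above_average <- test$marks > mean(test$marks)
print(test$student_name[above_average])
```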
  • 45. Writing into csv • write.csv(test, "test.csv") • The file gets saved to the working directory (folder) R is currently pointing to • To know the working directory: • Use getwd()
  • 46. Reading a csv file • setwd("directory path") • read.csv("file name") • There are different functions to read different file types • dir(): lists all files in the current directory
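Writing a data frame to CSV and reading it back, as a minimal sketch. The data is hypothetical, and the file is written to a temporary directory (tempdir()) rather than the working directory so the example leaves no files behind; row.names = FALSE is an optional argument that keeps the row numbers out of the file.

```r
test <- data.frame(
  student_name = c("Cory", "James", "Ana"),  # hypothetical rows
  marks        = c(67, 45, 40),
  stringsAsFactors = FALSE
)

print(getwd())                               # the current working directory

path <- file.path(tempdir(), "test.csv")
write.csv(test, path, row.names = FALSE)
print("test.csv" %in% dir(tempdir()))        # the file is now listed

back <- read.csv(path, stringsAsFactors = FALSE)
print(back)
```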
  • 47. Data inspection • str() • head() • tail()
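A quick look at the three inspection functions, sketched on mtcars, a data set that ships with R (the choice of data set is an assumption; any data frame works).

```r
str(mtcars)                     # structure: column names, types, first values

first_rows <- head(mtcars, 3)   # first 3 rows (default is 6)
last_rows  <- tail(mtcars, 3)   # last 3 rows

print(first_rows)
print(last_rows)
```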
  • 48. Dates and Times in R • Dates (class Date) are stored internally as the number of days since 1970-01-01, while times (class POSIXct) are stored internally as the number of seconds since 1970-01-01 (UTC)
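The internal representation can be seen by converting dates and times to numbers. This is a minimal sketch; the specific dates are arbitrary examples.

```r
d <- as.Date("1970-01-10")
print(as.numeric(d))       # 9 days after 1970-01-01

tm <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
print(as.numeric(tm))      # 60 seconds after 1970-01-01 00:00:00 UTC

# Because dates are numbers underneath, they can be subtracted
gap <- as.Date("2024-03-01") - as.Date("2024-02-01")
print(gap)                 # Time difference of 29 days
```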
  • 49. Data Visualization in R: Using R base graphics • 3 plotting systems: • base graphics • ggplot2 • lattice • Plot types: • Boxplots • Barplots • Histograms • Scatter plots
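The four plot types can each be drawn with a single base graphics call. This is a minimal sketch: the marks, hours and grade vectors are made-up data, and the titles are illustrative.

```r
# Hypothetical data for illustration
marks <- c(35, 42, 58, 61, 67, 70, 74, 88)
hours <- c(1, 2, 3, 4, 4, 5, 6, 8)
grade <- c("C", "C", "B", "B", "B", "A", "A", "A")

boxplot(marks, main = "Marks")                        # boxplot
barplot(table(grade), main = "Students per grade")    # barplot of counts
h <- hist(marks, main = "Distribution of marks")      # histogram
plot(hours, marks, main = "Marks vs. hours studied")  # scatter plot
```

hist() also returns the bin counts it plotted (here in h$counts), which can be handy for checking the figure.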