Introduction to R
We are drowning in information and starving for knowledge.
 Survey R
Confidential | Copyright © Fractal 2013
What does the Economic Survey tell us about Policy making & Data?
 People discount the importance of playing with the most obvious data and working creatively with it – Universal Basic Income
 When the most evident sources of data fail to suffice, some out-of-the-box thinking is very helpful – Migration
 Thanks to the world of Big Data we can now move to Space! – Cities & Property Taxes
One India: District Level Railway Passenger Flow
[Figure: average growth rate of real GDP per capita (%) plotted against real GDP per capita in PPP (log) in 2004; series: China, India, World]
[Figure: average growth rate of real GDP per capita (%) plotted against real GDP per capita in PPP (log) in 1994; series: China, India, World]
One India: Railway Traffic Movement Plot
Cities Satellite Data: Night Lights
Satellite Imagery processing through Machine Learning
Lesson #3 – Bangalore and Jaipur can collect 5-20 times their current property tax collection!
UBI: Welfare Scheme Misallocation and Poverty HCR
 Introduction R
 R vs Stata vs Excel
R Environment
Components of R language – R environment (Objects and Symbols)
 Objects:
 All R code manipulates objects
 Examples of objects in R include
 Numeric vectors
 character vectors
 Lists
 Functions
 Symbols:
 Formally, variable names in R are called symbols
 When you assign an object to a variable name, you are actually assigning the object to a symbol in the current environment
 R environment:
 An environment is defined as the set of symbols that are defined in a certain context
 For example, the statement:
> x <- 1
 assigns the symbol “x” to the object “1” in the current environment
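A minimal sketch of the same idea, using an explicitly created environment so that its symbols can be inspected. The name `env` is illustrative and does not come from the slides.

```r
# Create a fresh environment and bind the symbol "x" to the object 1 in it
env <- new.env()
assign("x", 1, envir = env)

# List the symbols defined in that environment, then look one up by name
symbols <- ls(env)
value <- get("x", envir = env)
print(symbols)
print(value)
```

At the console, a plain `x <- 1` does the same thing in the global environment.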
Components of R language - Expressions
 R code is composed of a series of expressions
 Examples of expressions in R include
 assignment statements
 conditional statements
 arithmetic expressions
 Expressions are composed of objects and functions
 You may separate expressions with new lines or with semicolons
 Example :
 Using semicolons
"this expression will be printed"; 7 + 13; exp(0+1i*pi)
 Using new lines
"this expression will be printed"
7 + 13
exp(0+1i*pi)
 Basic Operations and Data structures in R
Basic Operations in R
 R has a wide variety of data structures; we will look at a few basic ones
 Vectors (numerical, character, logical)
 Matrices
 Data frames
 Lists
 Your first Operations in R
 When you enter an expression into the R console and press the Enter key, R will evaluate that expression and display the results
 The interactive R interpreter will automatically print an object returned by an expression entered into the R console
> 1 + 2 + 3
[1] 6
 In R, any number that you enter in the console is interpreted as a vector
Variables in R
 R lets you assign values to variables and refer to them by name.
 In R, the assignment operator is <-. Usually, this is pronounced as “gets.”
 The statement: x <- 1 is usually read as “x gets 1.”
 There are two additional operators that can be used for assigning values to symbols.
 First, you can use a single equals sign (“=”) for assignment
 Second, you can assign an object on the left to a symbol on the right:
> 3 -> three
 Whichever notation you prefer, be careful because the = operator does not mean “equals.” For that, you need to use the == operator
 Note that <- does not name an argument when used inside a function call: it assigns the value to a symbol in the calling environment and then passes the value by position. To map values to argument names, use the “=” symbol.
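The assignment forms described on this slide can be tried side by side. This is a small sketch; the symbols `y`, `is_one`, and `m` are illustrative names.

```r
x <- 1      # read as "x gets 1"
y = 2       # a single equals sign also assigns
3 -> three  # assign the object on the left to the symbol on the right

# == tests for equality; it does not assign
is_one <- (x == 1)

# Inside a function call, = maps a value to an argument name
m <- matrix(1:6, nrow = 2)
print(dim(m))
```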
What is a Vector in R?
 A vector is an ordered collection of elements of the same data type
 The “[1]” means that the index of the first item displayed in the row is 1
 You can construct longer vectors using the c(...) function. (c stands for “combine.”)
> c(0, 1, 1, 2, 3, 5, 8)
[1] 0 1 1 2 3 5 8
> 1:50
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
[23] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
[45] 45 46 47 48 49 50
 The numbers in the brackets on the left hand side of the results indicate the index of the first element shown in each row
 When you perform an operation on two vectors, R will match the elements of the two vectors pairwise and return a vector
> c(1, 2, 3, 4) + c(10, 20, 30, 40)
[1] 11 22 33 44
 If the two vectors aren’t the same size, R will repeat the smaller sequence multiple times:
> c(1, 2, 3, 4, 5) + c(10, 100)
[1] 11 102 13 104 15
Warning message:
In c(1, 2, 3, 4, 5) + c(10, 100) :
longer object length is not a multiple of shorter object length
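Two consequences of the rules above are worth seeing directly: a vector holds a single data type, so mixed input is converted to a common type, and recycling is silent when the longer length is a multiple of the shorter one. A short sketch with illustrative variable names:

```r
# Mixing types: R converts every element to one common type
mixed <- c(1, "a", TRUE)
print(class(mixed))

# Recycling without a warning: length 4 is a multiple of length 2
recycled <- c(1, 2, 3, 4) + c(10, 100)
print(recycled)
```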
Arrays
 An array is a multidimensional vector.
 Vectors and arrays are stored the same way internally, but an array may be displayed differently and accessed differently.
 An array object is just a vector that’s associated with a dimension attribute.
 Let’s define an array explicitly
> a <- array(c(1,2,3,4,5,6,7,8,9,10,11,12),dim=c(3,4))
> a
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
 Here is how you reference one cell
> a[2,2]
[1] 5
 Arrays can have more than two dimensions.
> w <- array(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18),dim=c(3,3,2))
> w
Arrays & Matrix
 R uses very clean syntax for referring to part of an array. You specify separate indices for each dimension, separated by commas
> w[1,1,1]
[1] 1
 To get all rows (or columns) from a dimension, simply omit the indices
> # first row only
> a[1,]
[1] 1 4 7 10
> # first column only
> a[,1]
[1] 1 2 3
 A matrix is just a two-dimensional array
> m <- matrix(data=c(1,2,3,4,5,6,7,8,9,10,11,12),nrow=3,ncol=4)
> m
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
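The indexing rules above extend to the three-dimensional array `w` defined earlier. The sketch below rebuilds it with `1:18` and also shows the `byrow` argument of `matrix`, which the slides do not cover; the names `second_layer`, `cell`, and `m_by_row` are illustrative.

```r
# Same values as the array w defined earlier: 3 rows, 3 columns, 2 layers
w <- array(1:18, dim = c(3, 3, 2))

# Omit the first two indices to take the whole second layer (a 3 x 3 matrix)
second_layer <- w[, , 2]
print(second_layer)

# Row 2, column 3 of the first layer
cell <- w[2, 3, 1]
print(cell)

# matrix() fills column by column unless byrow = TRUE is given
m_by_row <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)
print(m_by_row[1, ])
```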
Data Frames
 A data frame is a list that contains multiple named vectors of the same length
 A data frame is a lot like a spreadsheet or a database table
 Data frames are particularly good for representing tabular data
 Let’s construct a data frame with the win/loss results in the National League
> teams <- c("PHI","NYM","FLA","ATL","WSN")
> w <- c(92, 89, 94, 72, 59)
> l <- c(70, 73, 77, 90, 102)
> nleast <- data.frame(teams,w,l)
> nleast
  teams  w   l
1   PHI 92  70
2   NYM 89  73
3   FLA 94  77
4   ATL 72  90
5   WSN 59 102
 You can refer to the components of a data frame (or items in a list) by name using the $ operator
> nleast$teams
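Building on the `nleast` example, the `$` operator can also create columns and drive row selection. A short sketch; the column `pct` and the object `winners` are illustrative additions, not part of the original deck.

```r
teams <- c("PHI", "NYM", "FLA", "ATL", "WSN")
w <- c(92, 89, 94, 72, 59)
l <- c(70, 73, 77, 90, 102)
nleast <- data.frame(teams, w, l)

# Add a derived column: winning percentage
nleast$pct <- nleast$w / (nleast$w + nleast$l)

# Keep only the rows where wins exceed losses
winners <- nleast[nleast$w > nleast$l, ]
print(winners)
```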
Lists
 It’s possible to construct more complicated structures with multiple data types.
 R has a built-in data type for mixing objects of different types, called lists.
 Lists in R may contain a heterogeneous selection of objects.
 You can name each component in a list.
 Items in a list may be referred to by either location or name.
 Creating your first list
> e <- list(thing="hat", size="8.25")
> e
 You can access an item in the list in multiple ways
 Using the name with help of $ operator
> e$thing
 Using the location as index
> e[1]
 A list can even contain other lists
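One detail is easy to miss when indexing by location: single brackets return a shorter list, while double brackets return the item itself. The sketch below also nests one list inside another; `owner = "me"` is a made-up value for illustration.

```r
e <- list(thing = "hat", size = "8.25")

sub_list <- e[1]    # single brackets: a list of length 1
item <- e[[1]]      # double brackets: the item itself
print(class(sub_list))
print(item)

# A list that contains another list
nested <- list(owner = "me", item = e)
print(nested$item$thing)
```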
Revision: Data Structures
Some of the data types are:
• Factor: Categorical variable
• Vector
• Matrix
• Data Frame
• List
To identify the data type of an object we use the function class
> library(datasets)
> air <- airquality
> class(air)
[1] "data.frame"
Data Types
To check whether an object/variable is of a certain type, use the is. functions:
is.numeric(), is.character(), is.vector(), is.matrix(), is.data.frame()
These are logical functions: they return TRUE or FALSE
To convert an object/variable from one type to another, use the as. functions:
as.numeric(), as.character(), as.vector(), as.matrix(), as.data.frame(),
as.factor(), as.list()
> is.numeric(airquality$Ozone)
[1] TRUE
> airquality$Ozone <- as.character(airquality$Ozone)
> is.numeric(airquality$Ozone)
[1] FALSE
> is.character(airquality$Ozone)
[1] TRUE
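A sketch of a full round trip with the as. functions, done on a copy so the built-in dataset is left untouched. The names `aq`, `oz_chr`, `oz_num`, and `month_f` are illustrative.

```r
aq <- datasets::airquality   # work on a copy of the built-in data

# Convert to character and back to numeric; missing values stay NA
oz_chr <- as.character(aq$Ozone)
oz_num <- as.numeric(oz_chr)
print(class(oz_chr))
print(class(oz_num))

# Convert a numeric month code to a categorical variable
month_f <- as.factor(aq$Month)
print(levels(month_f))
```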
Saving, Loading, and Editing Data
 Create a few vectors
> salary <- c(18700000,14626720,14137500,13980000,12916666)
> position <- c("QB","QB","DE","QB","QB")
> team <- c("Colts","Patriots","Panthers","Bengals","Giants")
> name.last <- c("Manning","Brady","Pepper","Palmer","Manning")
> name.first <- c("Peyton","Tom","Julius","Carson","Eli")
 Use the data.frame function to combine the vectors
> top.5.salaries <- data.frame(name.last,name.first,team,position,salary)
> top.5.salaries
 R allows you to save and load R data objects to external files
 The simplest way to save an object is with the save function
> save(top.5.salaries, file="C:/Documents and Settings/me/My Documents/top.5.salaries.Rdata")
 Note that the file argument must be explicitly named
 In R, file paths are normally written with forward slashes (“/”), even on Microsoft Windows
 You can easily load this object back into R with the load function
> load("C:/Documents and Settings/me/My Documents/top.5.salaries.Rdata")
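The save and load steps can be tried without touching a real folder by writing to R's temporary directory. This is a sketch under that assumption; the data frame `scores` and the file name are made up for illustration.

```r
scores <- data.frame(name = c("a", "b"), value = c(1, 2))

# Build a path inside the session's temporary directory
path <- file.path(tempdir(), "scores.RData")

# The file argument must be named explicitly
save(scores, file = path)

# Remove the object, then restore it; load() returns the restored names
rm(scores)
loaded_names <- load(path)
print(loaded_names)
print(scores)
```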
Importing Data into R
 read.csv
 To read comma separated values into R
 SYNTAX: read.csv(filepath)
 Sample (social sector schemes file)
 read.xlsx
 To read data from Excel sheets into R
 Requires library “xlsx”
 SYNTAX: read.xlsx(filepath, sheetName=)
 Tricky to use in case of Java version mismatch
 read.dta
 To read data from Stata files into R
 Requires library “foreign”
 SYNTAX: read.dta(filepath)
 read.table
 To read data from delimited text tables
 A general-purpose reader for delimited text files; read.csv is a wrapper around it with comma-separated defaults
 SYNTAX: read.table(filepath)
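To see how read.csv and read.table relate, the sketch below writes a small CSV file to the temporary directory and reads it back both ways. The file name and column names are assumptions made for the example.

```r
csv_path <- file.path(tempdir(), "dataframe.csv")
write.csv(data.frame(id = 1:3, score = c(2.5, 3.0, 4.5)),
          csv_path, row.names = FALSE)

# read.csv assumes a header row and a comma separator
from_csv <- read.csv(csv_path)

# read.table needs both stated explicitly
from_table <- read.table(csv_path, header = TRUE, sep = ",")

print(from_csv)
```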
Working Directory: Truncated Filepaths
 To read files easily, one option is to set the working directory
 Usual way (full path):
> file <- read.csv("/Users/parthkhare/Documents/dataframe.csv")
 Truncated way (path relative to the working directory):
> getwd()
> setwd("/Users/parthkhare/Documents/")
> file <- read.csv("dataframe.csv")
 Cheat way (pick the file interactively):
> file <- read.csv(file.choose())
R Packages
 A package is a related set of functions, help files, and data files that have been bundled together
 Typically, all of the functions in the package are related
 R offers an enormous number of packages
 Some of these packages are included with R. To get the list of packages loaded by default, use the following commands:
> getOption("defaultPackages") # This command omits the base package
> (.packages())
 To show all packages available
> (.packages(all.available=TRUE))
> library() #new window will pop up showing you the set of available packages
 Installing R package
> install.packages(c("tree","maptree"))
#This will install the packages to the default library specified by the variable .Library
 Loading Packages
> library(rpart)
 Removing Packages
> remove.packages(c("tree", "maptree"),.Library)
# You need to specify the library where the packages were installed
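Before loading a package it can help to check that it is installed. The sketch below uses the `stats` package, which ships with R, so it runs without installing anything; the variable names are illustrative.

```r
# Check that a package is installed without attaching it
has_stats <- requireNamespace("stats", quietly = TRUE)
print(has_stats)

# Call a function with an explicit package prefix
med <- stats::median(c(1, 3, 10))
print(med)

# Packages currently attached in this session
attached <- (.packages())
print(attached)
```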
Getting Help
 R includes a help system to help you get information about installed packages
 To get help on a function, say glm()
> help(glm)
or, equivalently:
> ?glm
 The following can be very helpful if you can’t remember the name of a function; R will return a list of relevant topics
> ??regression
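When part of a function name is known, `apropos()` offers a complementary search: it matches a pattern against the names of objects on the search path. A minimal sketch, assuming the default packages (including `utils`) are attached:

```r
# Find every visible object whose name starts with "read."
matches <- apropos("^read\\.")
print(matches)
```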
Data Manipulation
Names, Renaming
Syntax: names(dataset)
> names(airquality)
[1] "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
> org.names <- names(airquality) # keep a copy of the original names
> names(airquality) <- NULL
> names(airquality)
NULL
Renaming
In the following example we will change the variable name “Ozone” to “Oz”
> names(airquality) <- org.names
> names(airquality)[names(airquality)=="Ozone"] = "Oz"
> names(airquality)
[1] "Oz" "Solar.R" "Wind" "Temp" "Month" "Day"
# Renaming the second variable in data frame “airquality” to “Sol”
> names(airquality)[2] = "Sol"
> names(airquality)
[1] "Oz" "Sol" "Wind" "Temp" "Month" "Day"
Drop/Keep Variables
 Selecting (Keeping) Variables
# select variables “Ozone” and “Temp”
> names(airquality) <- org.names
> keep.airquality <- airquality[c("Ozone", "Temp")]
# select 1st and 3rd through 5th variables
> keep.airquality_1 <- airquality[c(1,3:5)]
 Excluding (DROPPING) Variables
• A variable can be dropped from the dataset by prefixing a “-” sign to its index in the data frame. This works only with numeric indices, not with variable names.
> drop.airquality <- airquality[,c(-3, -4)]
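A minus sign cannot be applied to a character vector, so dropping columns by name needs a different idiom. Two common base-R options are sketched below on a copy of the data; `drop_vars`, `kept`, and `kept2` are illustrative names.

```r
aq <- datasets::airquality
drop_vars <- c("Wind", "Temp")

# Option 1: keep the columns whose names are not in drop_vars
kept <- aq[, !(names(aq) %in% drop_vars)]

# Option 2: compute the names to keep with setdiff()
kept2 <- aq[, setdiff(names(aq), drop_vars)]

print(names(kept))
```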
Subsetting datasets
Subsetting is done using the subset function
# subsetting the data set “airquality” where Temperature is greater than 80
> subset_1 <- subset(airquality, Temp>80)
# subsetting the data set “airquality” where Temperature is greater than 80, keeping only the “Day” column
> subset_2 = subset(airquality, Temp>80, select=c("Day"))
# subsetting rows where Temperature is less than 80 and Day is equal to 8, notice the “==”
> subset_3 = subset(airquality, Temp<80 & Day==8)
# subsetting rows without using the “subset” function, notice the [ ] square brackets
> subset_4 = airquality[airquality$Temp==80, ]
# We use the %in% notation when we want to subset rows on multiple values of a variable
# c(70,90) matches exactly 70 or 90; 70:90 matches every integer from 70 to 90
> subset_5 = airquality[airquality$Temp %in% c(70,90), ]
> subset_5.1 = airquality[airquality$Temp %in% c(70:90), ]
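The two subsetting styles differ when the condition involves missing values: `subset()` drops rows where the condition is NA, while square brackets return an all-NA row for each of them. A sketch using the Ozone column, which contains NAs; wrapping the condition in `which()` makes the bracket form behave like `subset()`.

```r
aq <- datasets::airquality

by_subset <- subset(aq, Ozone > 100)         # NA conditions are dropped
by_bracket <- aq[aq$Ozone > 100, ]           # NA conditions yield NA rows
by_which <- aq[which(aq$Ozone > 100), ]      # which() keeps only TRUE positions

print(nrow(by_subset))
print(nrow(by_bracket))
print(nrow(by_which))
```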
Appending
 Appending two datasets with rbind requires that both have exactly the same number of variables with exactly the same names. If using categorical data, make sure the categories in both datasets refer to exactly the same thing (e.g. 1 “Agree”, 2 “Disagree”).
 If the datasets do not have the same number of variables, you can either drop or create variables so that both match.
 The rbind function, or smartbind from the gtools package, is used for appending two data frames.
> headair <- head(airquality)
> tailair <- tail(airquality)
> appended <- rbind(headair,tailair)
> library(gtools) # provides smartbind
> smartappend <- smartbind(headair,tailair)
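When the two datasets do not share the same variables, one base-R approach is to create the missing variable before calling rbind. A minimal sketch with two made-up data frames, `a` and `b`:

```r
a <- data.frame(id = 1:2, score = c(10, 20))
b <- data.frame(id = 3:4)

# Create the variable that b lacks, filled with missing values
b$score <- NA_real_

combined <- rbind(a, b)
print(combined)
```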
Sorting
 To sort a data frame in R, use the order( ) function. By default, sorting is ASCENDING. Prefix the sorting variable with a minus sign to indicate DESCENDING order. Here are some examples.
 Sorting examples using the mtcars dataset
# sort by hp in ascending order
> sort.mtcars<-mtcars[order(mtcars$hp),]
# sort by hp in descending order
> sort.mtcars<-mtcars[order(-mtcars$hp),]
# multi-level sort: by vs ascending, then by hp descending (note the “-” sign)
> sort.mtcars<-mtcars[order(mtcars$vs, -mtcars$hp),]
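The minus sign works only for numeric sort keys. For a character key, `order()` accepts `decreasing = TRUE` instead. The sketch below shows both on mtcars; `sorted` and `by_name_desc` are illustrative names.

```r
# Numeric keys: cylinders ascending, then miles per gallon descending
sorted <- mtcars[order(mtcars$cyl, -mtcars$mpg), ]
print(head(sorted[, c("cyl", "mpg")]))

# Character key: a minus sign is not allowed, so use decreasing = TRUE
by_name_desc <- mtcars[order(rownames(mtcars), decreasing = TRUE), ]
print(head(rownames(by_name_desc)))
```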
Remove Duplicate Values
Duplicates are identified using the “duplicated” function, which flags the second and later occurrences of a value
# To remove rows that duplicate an earlier value in the 2nd column of airquality
> dupair1 = airquality[!duplicated(airquality[,c(2)]),]
# To collect the duplicate rows in another dataset, just remove the “!” sign
> dupair2 = airquality[duplicated(airquality[,c(2)]),]
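On a small made-up data frame it is easy to see which rows `duplicated()` flags: the first occurrence of each value is kept and only the repeats are marked.

```r
df <- data.frame(id = c(1, 2, 2, 3, 1),
                 v = c("a", "b", "c", "d", "e"))

# TRUE marks a value that has already appeared earlier in the vector
flags <- duplicated(df$id)
print(flags)

first_only <- df[!flags, ]   # first occurrence of each id
repeats <- df[flags, ]       # the later repeats
print(first_only)
print(repeats)
```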
Merging 2 datasets
 Merging two datasets requires that both have at least one variable in common (either string or numeric). If the variable is a string, make sure the categories have the same spelling (e.g. country names).
 By default, merge keeps only the cases common to both datasets (an inner join). Adding the option “all=TRUE” includes all cases from both datasets.
 To merge two data frames (datasets) horizontally, use the merge function. In most cases, you join two data frames by one or more common key variables.
# merge two data frames by ID
> total <- merge(dataframeA, dataframeB, by="ID")
 Different possible cases while merging data
• a full outer join (all records from both tables) can be created with the "all" argument:
e.g. merge(d1,d2,all=TRUE)
• a left outer join of two datasets can be created with all.x:
e.g. merge(d1,d2,all.x=TRUE)
• a right outer join of two datasets can be created with all.y:
e.g. merge(d1,d2,all.y=TRUE)
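The four join types can be compared on two small data frames that share only some IDs. The contents of `d1` and `d2` below are made up for illustration.

```r
d1 <- data.frame(ID = c(1, 2, 3), x = c("a", "b", "c"))
d2 <- data.frame(ID = c(2, 3, 4), y = c(TRUE, FALSE, TRUE))

inner <- merge(d1, d2, by = "ID")                # IDs present in both
full <- merge(d1, d2, by = "ID", all = TRUE)     # every ID from either side
left <- merge(d1, d2, by = "ID", all.x = TRUE)   # every ID from d1
right <- merge(d1, d2, by = "ID", all.y = TRUE)  # every ID from d2

print(inner)
print(full)
```

Rows without a match on the other side are filled with NA.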
Date functions
 Dates are represented as the number of days since 1970-01-01, with negative values for earlier dates.
 Sys.Date() returns today’s date
 date() returns the current date and time as a character string
 Date conversion: use as.Date() to convert a string to date format
 Syntax: as.Date(x, format = "", tz = ...)
Arguments:
x: an object to be converted
format: a character string. If not specified, it will try "%Y-%m-%d" then "%Y/%m/%d" on the first non-NA element and give an error if neither works
tz: a time zone name (used when converting date-time objects)
The following symbols can be used with the format( ) function to print dates
Symbol  Meaning                  Example
%d      day as a number          01-31
%a      abbreviated weekday      Mon
%A      unabbreviated weekday    Monday
%m      month as a number        01-12
%b      abbreviated month        Jan
%B      unabbreviated month      January
%y      2-digit year             07
%Y      4-digit year             2007
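A short sketch tying the conversion function and the format symbols together. It uses only the numeric symbols, since weekday and month names depend on the session's locale; the variable names are illustrative.

```r
# Parse a day/month/year string by describing its layout
d <- as.Date("15/01/2007", format = "%d/%m/%Y")

# Print the same date in two other layouts
short <- format(d, "%d/%m/%y")
iso <- format(d, "%Y-%m-%d")
print(short)
print(iso)

# Dates are stored as days since 1970-01-01
gap <- as.numeric(as.Date("1970-01-10"))
before_epoch <- as.numeric(as.Date("1969-12-31"))
print(gap)
print(before_epoch)
```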
Useful Packages
 The reshape2 package:
 Melting:
 When you melt a dataset, you restructure it into a format where each measured variable is in its own row, along with the ID variables needed to uniquely identify it
 Syntax: melt(data, id=)
Arguments:
data: dataset that you want to melt
id: ID variables
 Example: consider the table mydata shown at the end of this list
> library(reshape2)
> md <- melt(mydata, id=(c("id", "time")))
 Package ‘data.table’: Extension of data.frame for fast indexing, fast ordered joins, fast assignment, fast grouping and list columns
 Package ‘plyr’: For splitting, applying and combining data
 Package ‘stringr’: Makes it easier to work with strings
Table mydata:
id time x1 x2
1  1    5  6
1  2    3  5
2  1    6  1
2  2    2  4
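To make the idea of melting concrete, the sketch below reproduces the reshaping in base R for the sample table, so it runs without installing any package. It is an approximation of what melt produces, not the reshape2 implementation; `measure_vars` and the output column names `variable` and `value` are assumptions chosen for the example.

```r
mydata <- data.frame(id = c(1, 1, 2, 2),
                     time = c(1, 2, 1, 2),
                     x1 = c(5, 3, 6, 2),
                     x2 = c(6, 5, 1, 4))

measure_vars <- c("x1", "x2")

# Build one block of rows per measured variable, then stack the blocks
md <- do.call(rbind, lapply(measure_vars, function(v) {
  data.frame(id = mydata$id,
             time = mydata$time,
             variable = v,
             value = mydata[[v]])
}))
print(md)
```

Each original row now appears once per measured variable, identified by its id and time.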
General Utility Function
 which()
 attach()
 head()
 tail()
 with()
 didq_summry()
 sumry_continuos()
 sumry_categorical()
 cat_ident()
 ident_cont()
 ident_cat()
General Utility Function
 read.csv
 read.xlsx
 read.dta
 read.table
 Reserve
Special Values
 NA
 In R, the NA values are used to represent missing values. (NA stands for “not available.”)
 You will encounter NA values in text loaded into R (to represent missing values) or in data loaded from databases (to replace NULL values)
 If you expand the size of a vector (or matrix or array) beyond the size where values were defined, the new spaces will have the value NA (meaning “not available”)
 Inf and -Inf
 If a computation results in a number that is too big, R will return Inf for a positive number and -Inf for a negative number (meaning positive and negative infinity, respectively)
 NaN
 Sometimes, a computation will produce a result that makes little sense. In these cases, R will often return NaN (meaning “not a number”)
 E.g. Inf - Inf or 0 / 0
 NULL
 Additionally, there is a null object in R, represented by the symbol NULL
 The symbol NULL always points to the same object
 NULL is often used as an argument in functions to mean that no value was assigned to the argument. Additionally, some functions may return NULL
 NULL is not the same as NA, Inf, -Inf, or NaN
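Each of the special values above can be produced and tested for at the console. A brief sketch with illustrative variable names:

```r
# Extending a vector past its defined length fills the gap with NA
v <- c(1, 2, 3)
v[5] <- 10
print(v)

# A result too large for a double becomes Inf
too_big <- 2^1024
print(too_big)

# An undefined result becomes NaN
not_a_number <- 0 / 0
print(not_a_number)

# Test functions for each special value
print(is.na(v))
print(is.nan(not_a_number))
print(is.null(NULL))

# Many summary functions can skip missing values on request
avg <- mean(v, na.rm = TRUE)
print(avg)
```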

VIP Model Call Girls Shikrapur ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Zechariah Boodey Farmstead Collaborative presentation - Humble Beginnings
Zechariah Boodey Farmstead Collaborative presentation -  Humble BeginningsZechariah Boodey Farmstead Collaborative presentation -  Humble Beginnings
Zechariah Boodey Farmstead Collaborative presentation - Humble Beginnings
 
Night 7k to 12k Call Girls Service In Navi Mumbai 👉 BOOK NOW 9833363713 👈 ♀️...
Night 7k to 12k  Call Girls Service In Navi Mumbai 👉 BOOK NOW 9833363713 👈 ♀️...Night 7k to 12k  Call Girls Service In Navi Mumbai 👉 BOOK NOW 9833363713 👈 ♀️...
Night 7k to 12k Call Girls Service In Navi Mumbai 👉 BOOK NOW 9833363713 👈 ♀️...
 
Financing strategies for adaptation. Presentation for CANCC
Financing strategies for adaptation. Presentation for CANCCFinancing strategies for adaptation. Presentation for CANCC
Financing strategies for adaptation. Presentation for CANCC
 

Big Data Mining in Indian Economic Survey 2017

  • 1. Introduction to R We are drowning in information and starving for knowledge.
  • 3. What does the Economic Survey tell us about policy making and data? (Confidential | Copyright © Fractal 2013)
      - People discount the importance of playing with the most obvious data and working creatively with it. Example: Universal Basic Income
      - When the most evident sources of data do not suffice, some out-of-the-box thinking is very helpful. Example: Migration
      - Thanks to the world of Big Data, we can now move to space. Example: Cities & Property Taxes
  • 4. One India: District Level Railway Passenger Flow
  • 5. One India: District Level Railway Passenger Flow [Figure: scatter plot of average growth rate of real GDP per capita (%) against real GDP per capita in PPP (log) in 2004; points labelled with state and province codes; legend: China, India, World]
  • 6. One India: District Level Railway Passenger Flow [Figure: scatter plot of average growth rate of real GDP per capita (%) against real GDP per capita in PPP (log) in 1994; points labelled with state and province codes; legend: China, India, World]
  • 7. One India: Railway Traffic Movement Plot
  • 8. Cities Satellite Data: Night Lights
  • 9. Satellite Imagery processing through Machine Learning
  • 10. Lesson #3: Bangalore and Jaipur can collect 5-20 times their current property tax collection
  • 11. UBI: Welfare Scheme Misallocation and Poverty HCR
  • 13. R vs Stata vs Excel
  • 14. R vs Stata vs Excel; R Environment
  • 15. Components of R language: R environment (Objects and Symbols)
      Objects:
      - All R code manipulates objects
      - Examples of objects in R include numeric vectors, character vectors, lists, and functions
      Symbols:
      - Formally, variable names in R are called symbols
      - When you assign an object to a variable name, you are actually assigning the object to a symbol in the current environment
      R environment:
      - An environment is defined as the set of symbols that are defined in a certain context
      - For example, the statement
          > x <- 1
        assigns the symbol "x" to the object "1" in the current environment
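
The relationship between symbols and environments can be checked directly from the console. The following is a minimal sketch using base R only; the environment name `scratch` and the value 99 are illustrative assumptions, not part of the slides.

```r
# Assign the object 1 to the symbol x in the current environment.
x <- 1

# The symbol is now defined in the current environment.
x_defined <- exists("x", envir = environment())

# A new environment is a separate set of symbols. With inherits = FALSE the
# lookup does not fall through to parent environments, so "x" is not found.
scratch <- new.env()   # hypothetical name, for illustration only
x_in_scratch <- exists("x", envir = scratch, inherits = FALSE)

# Defining a symbol in scratch does not touch x in the current environment.
assign("x", 99, envir = scratch)
scratch_x <- get("x", envir = scratch)

print(c(current = x, scratch = scratch_x))
```

The same symbol name can therefore point to different objects, depending on the environment in which it is looked up.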
  • 16. Components of R language: Expressions
      - R code is composed of a series of expressions
      - Examples of expressions in R include assignment statements, conditional statements, and arithmetic expressions
      - Expressions are composed of objects and functions
      - You may separate expressions with new lines or with semicolons
      Example using semicolons:
          "this expression will be printed"; 7 + 13; exp(0+1i*pi)
      Example using new lines:
          "this expression will be printed"
          7 + 13
          exp(0+1i*pi)
  • 17. Basic Operations and Data structures in R
  • 18. Basic Operations in R
      - R has a wide variety of data structures; we will look at a few basic ones: vectors (numerical, character, logical), matrices, data frames, and lists
      Your first operations in R:
      - When you enter an expression into the R console and press the Enter key, R evaluates that expression and displays the result
      - The interactive R interpreter automatically prints the object returned by an expression entered into the R console
          > 1 + 2 + 3
          [1] 6
      - In R, any number that you enter in the console is interpreted as a vector
  • 19. Variables in R
      - R lets you assign values to variables and refer to them by name
      - In R, the assignment operator is <-. Usually, this is pronounced as "gets". The statement x <- 1 is usually read as "x gets 1"
      - There are two additional operators that can be used for assigning values to symbols:
          - a single equals sign ("=")
          - the right-pointing arrow, which assigns the object on the left to the symbol on the right:
              > 3 -> three
      - Whichever notation you prefer, be careful: the = operator does not mean "equals". To test equality, use the == operator
      - When passing arguments to a function, map values to argument names with "=", not "<-". Writing f(arg <- value) does not name the argument; it assigns value to a symbol arg in the calling environment and passes the value by position
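
The difference between the assignment operators, the equality test, and argument passing can be seen in a few lines. This is a small sketch in base R; the symbol names (`y`, `same`, `vals`, `m1`, `m2`) are made up for illustration.

```r
x <- 1        # "x gets 1"
y = 2         # a single equals sign also assigns at the top level
3 -> three    # right-pointing arrow: object on the left, symbol on the right

same <- (x == 1)   # == tests equality and returns TRUE or FALSE

# Inside a function call, = maps a value to an argument name.
m1 <- mean(x = c(1, 2, 3))

# Using <- inside a call first assigns in the calling environment and then
# passes the value by position. Here it creates the symbol vals as a side effect.
m2 <- mean(vals <- c(10, 20, 30))

print(c(m1 = m1, m2 = m2))
print(vals)
```

Because of that side effect, `=` is the safer habit inside function calls.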
  • 20. What is a Vector in R?
      - A vector is an ordered collection of elements of the same data type
      - The "[1]" means that the index of the first item displayed in the row is 1
      - You can construct longer vectors using the c(...) function (c stands for "combine")
          > c(0, 1, 1, 2, 3, 5, 8)
          [1] 0 1 1 2 3 5 8
          > 1:50
           [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
          [23] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
          [45] 45 46 47 48 49 50
      - The numbers in the brackets on the left-hand side of the results indicate the index of the first element shown in each row
      - When you perform an operation on two vectors, R matches the elements of the two vectors pairwise and returns a vector
          > c(1, 2, 3, 4) + c(10, 20, 30, 40)
          [1] 11 22 33 44
      - If the two vectors are not the same size, R repeats the shorter sequence as many times as needed
          > c(1, 2, 3, 4, 5) + c(10, 100)
          [1]  11 102  13 104  15
          Warning message:
          In c(1, 2, 3, 4, 5) + c(10, 100) :
            longer object length is not a multiple of shorter object length
  • 21. Arrays
      - An array is a multidimensional vector
      - Vectors and arrays are stored the same way internally, but an array may be displayed and accessed differently
      - An array object is just a vector that is associated with a dimension attribute
      - Let's define an array explicitly
          > a <- array(c(1,2,3,4,5,6,7,8,9,10,11,12), dim=c(3,4))
          > a
               [,1] [,2] [,3] [,4]
          [1,]    1    4    7   10
          [2,]    2    5    8   11
          [3,]    3    6    9   12
      - Here is how you reference one cell
          > a[2,2]
          [1] 5
      - Arrays can have more than two dimensions
          > w <- array(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18), dim=c(3,3,2))
          > w
  • 22. Arrays & Matrix
      - R uses very clean syntax for referring to part of an array. You specify separate indices for each dimension, separated by commas
          > w[1,1,1]
          [1] 1
      - To get all rows (or columns) from a dimension, simply omit the indices
          > # first row only
          > a[1,]
          [1]  1  4  7 10
          > # first column only
          > a[,1]
          [1] 1 2 3
      - A matrix is just a two-dimensional array
          > m <- matrix(data=c(1,2,3,4,5,6,7,8,9,10,11,12), nrow=3, ncol=4)
          > m
               [,1] [,2] [,3] [,4]
          [1,]    1    4    7   10
          [2,]    2    5    8   11
          [3,]    3    6    9   12
  • 23. Data Frames
      - A data frame is a list that contains multiple named vectors of the same length
      - A data frame is a lot like a spreadsheet or a database table
      - Data frames are particularly good for representing data
      - Let's construct a data frame with the win/loss results in the National League
          > teams <- c("PHI","NYM","FLA","ATL","WSN")
          > w <- c(92, 89, 94, 72, 59)
          > l <- c(70, 73, 77, 90, 102)
          > nleast <- data.frame(teams,w,l)
          > nleast
            teams  w   l
          1   PHI 92  70
          2   NYM 89  73
          3   FLA 94  77
          4   ATL 72  90
          5   WSN 59 102
      - You can refer to the components of a data frame (or items in a list) by name using the $ operator
          > nleast$teams
  • 24. Lists
      - It's possible to construct more complicated structures with multiple data types
      - R has a built-in data type for mixing objects of different types, called lists
      - Lists in R may contain a heterogeneous selection of objects
      - You can name each component in a list
      - Items in a list may be referred to by either location or name
      - Creating your first list
          > e <- list(thing="hat", size="8.25")
          > e
      - You can access an item in the list in multiple ways
          - Using the name with the help of the $ operator
              > e$thing
          - Using the location as index
              > e[1]
      - A list can even contain other lists
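
Single and double brackets behave differently on lists, which is easy to miss when accessing items by location. The sketch below reuses the list `e` from the slide; the outer list `g` and the other symbol names are illustrative assumptions.

```r
e <- list(thing = "hat", size = "8.25")

by_name   <- e$thing    # the component itself
sub_list  <- e[1]       # a list of length 1 that contains the component
component <- e[[1]]     # the component itself, selected by position

# A list can hold other lists; chain the accessors to reach nested items.
g <- list("this list references another list", e)   # hypothetical outer list
nested <- g[[2]]$size

print(class(sub_list))    # a list
print(class(component))   # a character vector
print(nested)
```

Use `[[ ]]` or `$` when you need the stored object, and `[ ]` when you want a shorter list.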
  • 25. Revision: Data Structures
      Some of the data structures are:
          - Factor: categorical variable
          - Vector
          - Matrix
          - Data Frame
          - List
      To identify the class of an object, use the function class()
          > library(datasets)
          > air <- airquality
          > class(air)
          [1] "data.frame"
  • 26. Data Types
      - To check whether an object/variable is of a certain type, use the is. functions:
        is.numeric(), is.character(), is.vector(), is.matrix(), is.data.frame()
        These are logical functions: they return TRUE or FALSE
      - To convert an object/variable of a certain type to another, use the as. functions:
        as.numeric(), as.character(), as.vector(), as.matrix(), as.data.frame(), as.factor(), as.list()
          > is.numeric(airquality$Ozone)
          [1] TRUE
          > airquality$Ozone <- as.character(airquality$Ozone)
          > is.numeric(airquality$Ozone)
          [1] FALSE
          > is.character(airquality$Ozone)
          [1] TRUE
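
Type conversion can lose information silently, so it helps to convert on a copy and check the result. This is a minimal sketch on the built-in `airquality` data; the copy `air` and the sample strings are assumptions made for illustration.

```r
library(datasets)
air <- airquality                     # work on a copy, keep the original intact

air$Ozone <- as.character(air$Ozone)
was_character <- is.character(air$Ozone)

air$Ozone <- as.integer(air$Ozone)    # convert back; missing values stay NA
round_trip_ok <- identical(air$Ozone, airquality$Ozone)

# Strings that do not look like numbers become NA, and R raises a warning.
converted <- suppressWarnings(as.numeric(c("41", "36.5", "n/a")))

print(c(was_character = was_character, round_trip_ok = round_trip_ok))
print(converted)
```

Counting `is.na()` before and after a conversion is a quick way to spot values that failed to convert.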
  • 27. Saving, Loading, and Editing Data
      - Create a few vectors
          > salary <- c(18700000,14626720,14137500,13980000,12916666)
          > position <- c("QB","QB","DE","QB","QB")
          > team <- c("Colts","Patriots","Panthers","Bengals","Giants")
          > name.last <- c("Manning","Brady","Pepper","Palmer","Manning")
          > name.first <- c("Peyton","Tom","Julius","Carson","Eli")
      - Use the data.frame function to combine the vectors
          > top.5.salaries <- data.frame(name.last,name.first,team,position,salary)
          > top.5.salaries
      - R allows you to save and load R data objects to external files
      - The simplest way to save an object is with the save function
          > save(top.5.salaries, file="C:/Documents and Settings/me/My Documents/top.5.salaries.Rdata")
      - Note that the file argument must be explicitly named
      - Write file paths with forward slashes ("/"), even on Microsoft Windows; in an R string, a single backslash starts an escape sequence
      - You can easily load this object back into R with the load function
          > load("C:/Documents and Settings/me/My Documents/top.5.salaries.Rdata")
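
A save/load round trip can be tried without touching your own folders by writing to a temporary directory. The sketch below assumes a small made-up data frame `scores` and a temporary file name; neither comes from the slides.

```r
# Hypothetical data, used only to demonstrate the round trip.
scores <- data.frame(name = c("a", "b"), value = c(1.5, 2.5))

path <- file.path(tempdir(), "scores.RData")   # temporary location
save(scores, file = path)                      # the file argument must be named

original <- scores
rm(scores)                                     # remove the symbol ...
loaded_names <- load(path)                     # ... and load() restores it

print(loaded_names)                    # names of the objects that were restored
print(identical(scores, original))
```

Note that `load()` recreates objects under the names they were saved with, so it can overwrite existing symbols of the same name.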
  • 28. Importing Data into R
      - read.csv: reads comma-separated values into R
          Syntax: read.csv(filepath)
          Sample: social sector schemes file
      - read.xlsx: reads data from Excel sheets into R
          Requires library "xlsx"
          Syntax: read.xlsx(filepath, sheetName=)
          Tricky to use in case of a Java version mismatch
      - read.dta: reads data from Stata files into R
          Requires library "foreign"
          Syntax: read.dta(filepath)
      - read.table: reads delimited text tables into R
          The general function behind read.csv, which is read.table with comma-separated defaults
          Syntax: read.table(filepath)
  • 29. Working Directory: Truncated Filepaths
      - To read files easily, one way is to set the working directory
      Usual way:
          file <- read.csv("/Users/parthkhare/Documents/dataframe.csv")
      Truncated way:
          getwd()
          setwd("/Users/parthkhare/Documents/")
          file <- read.csv("dataframe.csv")
      Cheat way:
          file <- read.csv(file.choose())
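
To see how read.csv and read.table relate, one can write a small file and read it back both ways. This sketch uses base R only; the data frame `schemes` and its values are invented stand-ins for a real input file.

```r
# Hypothetical data standing in for a real input file.
schemes <- data.frame(state = c("KA", "RJ"), spend = c(120.5, 98.0),
                      stringsAsFactors = FALSE)

csv_path <- tempfile(fileext = ".csv")
write.csv(schemes, csv_path, row.names = FALSE)

# read.csv assumes a header row and a comma separator.
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

# read.table needs both stated explicitly to read the same file.
from_table <- read.table(csv_path, header = TRUE, sep = ",",
                         stringsAsFactors = FALSE)

print(from_csv)
print(all.equal(from_csv, from_table))
```

For other delimiters, such as tabs, change `sep` in the read.table call.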
  • 30. R Packages
      - A package is a related set of functions, help files, and data files that have been bundled together
      - Typically, all of the functions in a package are related
      - R offers an enormous number of packages; some of them are included with R
      - To get the list of packages loaded by default:
          > getOption("defaultPackages")   # this command omits the base package
          > (.packages())
      - To show all packages available:
          > (.packages(all.available=TRUE))
          > library()   # a new window will pop up showing the set of available packages
      - Installing R packages:
          > install.packages(c("tree","maptree"))   # installs the packages to the default library, the first entry of .libPaths()
      - Loading packages:
          > library(rpart)
      - Removing packages:
          > remove.packages(c("tree", "maptree"), lib = .libPaths()[1])   # name the library where the packages were installed
  • 31. Getting Help
      - R includes a help system to help you get information about installed packages
      - To get help on a function, say glm():
          > help(glm)
        or, equivalently:
          > ?glm
      - The following can be very helpful if you can't remember the name of a function; R will return a list of relevant topics
          > ??regression
  • 33. Names, Renaming
      Syntax: names(dataset)
          > org.names <- names(airquality)   # keep a copy of the original names
          > names(airquality)
          [1] "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
          > names(airquality) <- NULL
          > names(airquality)
          NULL
      Renaming: in the following example we change the variable name "Ozone" to "Oz"
          > names(airquality) <- org.names   # restore the original names
          > names(airquality)[names(airquality)=="Ozone"] = "Oz"
          > names(airquality)
          [1] "Oz" "Solar.R" "Wind" "Temp" "Month" "Day"
          # Renaming the second variable in data frame "airquality" to "Sol"
          > names(airquality)[2] = "Sol"
          > names(airquality)
          [1] "Oz" "Sol" "Wind" "Temp" "Month" "Day"
  • 34. Drop/Keep Variables
      Selecting (keeping) variables:
          # select variables "Ozone" and "Temp"
          > names(airquality) <- org.names
          > keep.airquality <- airquality[c("Ozone", "Temp")]
          # select 1st and 3rd through 5th variables
          > keep.airquality_1 <- airquality[c(1,3:5)]
      Excluding (dropping) variables:
      - To drop a variable, prefix its column index with a "-" sign. A "-" in front of a variable name does not work in square brackets; it does work inside subset(), e.g. subset(airquality, select = -Wind)
          > drop.airquality <- airquality[,c(-3, -4)]
  • 35. Subsetting datasets
      Subsetting is done with the subset function
          # rows of "airquality" where Temp is greater than 80
          > subset_1 <- subset(airquality, Temp>80)
          # rows where Temp is greater than 80, keeping only the "Day" column
          > subset_2 = subset(airquality, Temp>80, select=c("Day"))
          # rows where Temp is less than 80 and Day is equal to 8; notice the "=="
          > subset_3 = subset(airquality, Temp<80 & Day==8)
          # subsetting rows without the subset function; notice the [ ] square brackets
          > subset_4 = airquality[airquality$Temp==80, ]
          # use the %in% notation to subset rows on multiple values of a variable
          > subset_5 = airquality[airquality$Temp %in% c(70,90), ]     # Temp is exactly 70 or 90
          > subset_5.1 = airquality[airquality$Temp %in% c(70:90), ]   # Temp is any integer from 70 to 90
  • 36. Appending
      - Appending two datasets requires that both have exactly the same number of variables with exactly the same names. If using categorical data, make sure the categories in both datasets refer to exactly the same thing (e.g. 1 "Agree", 2 "Disagree")
      - If the datasets do not have the same number of variables, you can either drop or create variables so that both match
      - Use rbind, or smartbind from the gtools package, to append the two data frames
          > library(gtools)   # provides smartbind
          > headair <- head(airquality)
          > tailair <- tail(airquality)
          > append <- rbind(headair,tailair)
          > smartappend <- smartbind(headair,tailair)
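
When one dataset lacks a variable, creating it before calling rbind is one way to make the two match. The following sketch uses base R only; the data frame `partial`, which drops the Wind column, is a hypothetical case constructed for illustration.

```r
headair <- head(airquality)
tailair <- tail(airquality)
appended <- rbind(headair, tailair)       # same columns: works directly

# Hypothetical case: one piece lacks the Wind column.
partial <- tailair[, names(tailair) != "Wind"]

partial$Wind <- NA                        # create the missing variable
partial <- partial[, names(headair)]      # same column order as headair
fixed <- rbind(headair, partial)

print(dim(appended))
print(names(fixed))
```

The rows that came from `partial` carry NA in the created column.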
  • 37. Sorting
      - To sort a data frame in R, use the order() function. By default, sorting is ascending. Prefix the sorting variable with a minus sign to indicate descending order
      Sorting examples using the mtcars dataset:
          > attach(mtcars)
          # sort by hp in ascending order
          > sort.mtcars <- mtcars[order(mtcars$hp),]
          # sort by hp in descending order
          > sort.mtcars <- mtcars[order(-mtcars$hp),]
          # multi-level sort: by vs ascending, then by hp descending (note the "-" sign)
          > sort.mtcars <- mtcars[order(vs, -mtcars$hp),]
  • 38. Remove Duplicate Values
      Duplicates are identified using the duplicated() function
          # remove rows whose value in the 2nd column repeats that of an earlier row
          > dupair1 = airquality[!duplicated(airquality[,c(2)]),]
          # to get the duplicate rows in another dataset, just remove the "!" sign
          > dupair2 = airquality[duplicated(airquality[,c(2)]),]
  • 39. Merging 2 datasets
      - Merging two datasets requires that both have at least one variable in common (either string or numeric). If it is a string, make sure the categories have the same spelling (e.g. country names)
      - By default, merge keeps only the cases common to both datasets. Adding the option all=TRUE includes all cases from both datasets
      - To merge two data frames (datasets) horizontally, use the merge function. In most cases, you join two data frames by one or more common key variables (i.e., an inner join)
          # merge two data frames by ID
          total <- merge(dataframeA, dataframeB, by="ID")
      Different possible cases while merging data:
          - a full outer join (all records from both tables) can be created with the "all" argument: merge(d1,d2,all=TRUE)
          - a left outer join of two datasets can be created with all.x: merge(d1,d2,all.x=TRUE)
          - a right outer join of two datasets can be created with all.y: merge(d1,d2,all.y=TRUE)
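
The four join types are easiest to compare on two tiny data frames that overlap only partly. This is a sketch in base R; the data frames `d1` and `d2` and their values are made up for illustration.

```r
# Hypothetical inputs: IDs 2 and 3 appear in both, 1 and 4 in only one.
d1 <- data.frame(ID = c(1, 2, 3), income = c(10, 20, 30))
d2 <- data.frame(ID = c(2, 3, 4), state = c("KA", "RJ", "UP"))

inner <- merge(d1, d2, by = "ID")                 # only IDs found in both
left  <- merge(d1, d2, by = "ID", all.x = TRUE)   # every row of d1
right <- merge(d1, d2, by = "ID", all.y = TRUE)   # every row of d2
full  <- merge(d1, d2, by = "ID", all = TRUE)     # every row of either

print(inner)
print(full)   # unmatched cells are filled with NA
```

Comparing `nrow()` of the result with `nrow()` of the inputs is a simple check that the join kept the rows you expected.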
  • 40. Date functions
      - Dates are represented as the number of days since 1970-01-01, with negative values for earlier dates
      - Sys.Date() returns today's date
      - date() returns the current date and time as a character string
      - Date conversion: use as.Date() to convert a string to date format
      - Syntax: as.Date(x, format = "", tz = ...)
          x: an object to be converted
          format: a character string. If not specified, it will try "%Y-%m-%d" then "%Y/%m/%d" on the first non-NA element and give an error if neither works
          tz: a time zone name (used when x is a date-time)
      - The following symbols can be used with the format() function to print dates
          Symbol   Meaning                  Example
          %d       day as a number          01-31
          %a       abbreviated weekday      Mon
          %A       unabbreviated weekday    Monday
          %m       month as a number        01-12
          %b       abbreviated month        Jan
          %B       unabbreviated month      January
          %y       2-digit year             07
          %Y       4-digit year             2007
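
Parsing, formatting, and the days-since-1970 representation can be tried together in a few lines. The sketch below uses base R only; the dates themselves are arbitrary examples, and only numeric format symbols are used so the result does not depend on the locale.

```r
# Parse a string that is not in the default format.
d <- as.Date("15/08/2017", format = "%d/%m/%Y")

# The default format "%Y-%m-%d" needs no format argument.
iso <- as.Date("2017-01-31")

# Dates are stored as days since 1970-01-01; earlier dates are negative.
days_since_epoch <- as.numeric(as.Date("1970-01-10"))
before_epoch     <- as.numeric(as.Date("1969-12-31"))

label <- format(d, "%d/%m/%y")   # day, month, 2-digit year
year4 <- format(d, "%Y")         # 4-digit year
gap   <- as.numeric(d - iso)     # number of days between the two dates

print(c(days_since_epoch, before_epoch))
print(c(label, year4))
print(gap)
```

Subtracting two dates gives a time difference in days, which `as.numeric()` turns into a plain number.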
  • 41. Useful Packages
      The reshape2 package:
      - Melting: when you melt a dataset, you restructure it into a format where each measured variable is in its own row, along with the ID variables needed to uniquely identify it
      - Syntax: melt(data, id=)
          data: the dataset that you want to melt
          id: the ID variables
      - Example: consider the following table, mydata, for the melt function
          id   time   x1   x2
          1    1      5    6
          1    2      3    5
          2    1      6    1
          2    2      2    4
          library(reshape2)   # melt is also available in the older reshape package
          md <- melt(mydata, id=(c("id", "time")))
      Package 'data.table': extension of data.frame for fast indexing, fast ordered joins, fast assignment, fast grouping, and list columns
      Package 'plyr': for splitting, applying, and combining data
      Package 'stringr': makes it easier to work with strings
  • 42. General Utility Function: which(), attach(), head(), tail(), with(), didq_summry(), sumry_continuos(), sumry_categorical(), cat_ident(), ident_cont(), ident_cat()
  • 43. General Utility Function: read.csv, read.xlsx, read.dta, read.table
  • 45. Special Values
      NA:
      - In R, NA values are used to represent missing values (NA stands for "not available")
      - You will encounter NA values in text loaded into R (to represent missing values) or in data loaded from databases (to replace NULL values)
      - If you expand a vector (or matrix or array) beyond the size where values were defined, the new spaces will have the value NA
      Inf and -Inf:
      - If a computation results in a number that is too big, R will return Inf for a positive number and -Inf for a negative number (meaning positive and negative infinity, respectively)
      NaN:
      - Sometimes, a computation will produce a result that makes little sense. In these cases, R will often return NaN (meaning "not a number")
      - E.g. Inf - Inf or 0 / 0
      NULL:
      - Additionally, there is a null object in R, represented by the symbol NULL
      - The symbol NULL always points to the same object
      - NULL is often used as an argument in functions to mean that no value was assigned to the argument. Additionally, some functions may return NULL
      - NULL is not the same as NA, Inf, -Inf, or NaN
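
Each special value has its own test function, and they are not interchangeable. The following base R sketch reproduces the cases listed above; the vector `v` and the other symbol names are illustrative assumptions.

```r
v <- c(1, 2, 3)
v[5] <- 10               # growing the vector fills the gap with NA
v_is_na <- is.na(v)

big  <- 2^1024           # too large for a double, so R returns Inf
nan1 <- Inf - Inf        # NaN
nan2 <- 0 / 0            # NaN

# NaN counts as missing for is.na(), but NA is not NaN.
nan_is_na <- is.na(NaN)
na_is_nan <- is.nan(NA)

# NULL is an object of length zero; test for it with is.null().
n <- NULL
n_length <- length(n)

# NA propagates through arithmetic unless it is removed explicitly.
mean_with_na    <- mean(v)
mean_without_na <- mean(v, na.rm = TRUE)

print(v)
print(c(big, nan1, nan2))
print(c(mean_with_na, mean_without_na))
```

Many summary functions accept `na.rm = TRUE`, which is usually the simplest way to handle missing values in a quick analysis.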

Editor's notes

  1. What R and data can do. Once you decide on a question after rounds of iteration, the next question is: what data? Based on the experience of working with data in the Survey, there are 3 lessons that I wish to share.
  2. Power of data: North Korea, South Korea
  3. The building density on the ground provides an estimate of total built-up area (in square feet/km), which, when interacted with the zone-specific guidance value of property tax per unit area, gives an aggregate sum of potential property tax to be collected.
  4. The building density on the ground provides an estimate of total built-up area (in square feet/km), which, when interacted with the zone-specific guidance value of property tax per unit area, gives an aggregate sum of potential property tax to be collected. I just took you all through a journey of what potential data, creative thinking about data, and Big Data hold to influence and shape policy making.
  5. Tables in a bland format have no utility.
  6. Open R and RStudio. Difference between them: RAM usage. Objects and symbols. Description of all 4 windows.
  7. X <- 1
  8. Data types vs data structures
  9. Board: vector, matrix, data frame, list[data structures]
  10. Operations: within R doing operations [Exercise]
  11. Operations: within R doing operations [Exercise]