SHARETHIS
DATA ANALYSIS with R
Hassan Namarvar
2
WHAT IS R?
• R is a free software programming language and software environment for statistical computing and graphics.
• It is similar to the S language developed at AT&T Bell Labs by Rick Becker, John Chambers and Allan Wilks.
• R was initially developed by Ross Ihaka and Robert Gentleman (1996) at the University of Auckland, New Zealand.
• R source code is written in C, Fortran, and R.
3
R PARADIGMS
R supports multiple paradigms:
– Array programming
– Object-oriented
– Imperative
– Functional
– Procedural
– Reflective
4
STATISTICAL FEATURES
• Graphical Techniques
• Linear and nonlinear modeling
• Classical statistical tests
• Time-series analysis
• Classification
• Clustering
• Machine learning
5
PROGRAMMING FEATURES
• R is an interpreted language
• Access R through a command-line interpreter
• Like MATLAB, R supports matrix arithmetic
• Data structures:
– Vectors
– Matrices
– Arrays
– Data Frames
– Lists
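A minimal sketch of the five data structures listed above, plus the MATLAB-style matrix arithmetic mentioned on the same slide. The variable names and values are illustrative only.

```r
# Vector: one-dimensional, all elements share one type
v <- c(1.5, 2.5, 3.5)

# Matrix: two-dimensional, filled column by column by default
m <- matrix(1:6, nrow = 2)            # 2 rows x 3 columns

# Array: any number of dimensions
a <- array(1:24, dim = c(2, 3, 4))

# Data frame: equal-length columns that may differ in type
df <- data.frame(id = 1:3, label = c("a", "b", "c"))

# List: ordered collection of arbitrary objects, including other lists
l <- list(values = v, table = df, note = "mixed types are fine")

# Matrix arithmetic: %*% is the matrix product, * is element-wise
p <- m %*% t(m)                       # (2 x 3) %*% (3 x 2) gives a 2 x 2 matrix
e <- m * m                            # element-wise square
```

Unlike MATLAB, `*` on two matrices is element-wise in R; the matrix product needs `%*%`.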
6
ADVANTAGES OF R
• One of the most comprehensive statistical analysis packages available
• Outstanding graphical capabilities
• Open-source software, reviewed by experts
• R is free and licensed under the GNU General Public License (GPL).
• R had more than 5,500 packages (5,578 as of May 31, 2014)!
• R is cross-platform: it runs on GNU/Linux, Mac, and Windows.
• R works well with CSV, SAS, SPSS, Excel, Access, Oracle, MySQL, and SQLite data (several of these through add-on packages).
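As a small illustration of the CSV point, base R's `read.csv()` and `write.csv()` can round-trip a data frame through a temporary file. The data are made up; SAS, SPSS, Excel and database access rely on add-on packages and are not shown here.

```r
# Made-up example data
df <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))

path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)   # do not write a row-name column
df2 <- read.csv(path)                    # read it back as a data frame
unlink(path)                             # remove the temporary file

str(df2)                                 # same columns and types as df
```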
7
HOW TO INSTALL R?
• Download and install the latest version from:
– http://cran.r-project.org
• Install packages from the R console:
– > install.packages('package_name')
• R has its own LaTeX-like documentation format, available from the console:
– > help(topic)
8
STARTING WITH R
• In R console:
– > x <- 2
– > x
– > y <- x^2
– > y
– > ls()
– > rm(y)
• Vectors:
– > v <- c(4, 7, 23.5, 76.2, 80)
– > summary(v)
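A sketch extending the console session above: arithmetic in R is vectorized, so it applies element by element without a loop, and a logical vector can be used as an index. The extra variable names are illustrative.

```r
x <- 2
y <- x^2                        # 4

v <- c(4, 7, 23.5, 76.2, 80)
v_scaled <- v / max(v)          # every element divided by 80, no loop needed
v_big <- v[v > 20]              # logical indexing keeps 23.5, 76.2 and 80

s <- summary(v)                 # lowercase: R names are case-sensitive
print(s)                        # Min, 1st Qu., Median, Mean, 3rd Qu., Max

rm(y)                           # remove y from the workspace
```

Because R is case-sensitive, `Summary(v)` does not work: `Summary` is an internal group generic, not the `summary()` function.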
9
STARTING WITH R
• Histogram:
– > r <- rnorm(100)
– > summary(r)
– > plot(r)
– > hist(r)
• Q-Q plot (quantile-quantile):
– > qqplot(r, rnorm(1000))
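The same steps can be run without drawing anything, which makes the numbers behind the plots visible. This sketch adds `set.seed()` for reproducibility and uses `qqnorm()`, which compares a sample against theoretical normal quantiles, alongside the two-sample `qqplot()` from the slide.

```r
set.seed(42)                                    # reproducible random draws
r <- rnorm(100)

h <- hist(r, plot = FALSE)                      # histogram bins and counts only
qq <- qqnorm(r, plot.it = FALSE)                # sample vs. standard-normal quantiles
qq2 <- qqplot(r, rnorm(1000), plot.it = FALSE)  # two-sample Q-Q, as on the slide

# For normal data the Q-Q points lie close to a straight line,
# so their correlation is close to 1
cor(qq$x, qq$y)

# In an interactive session, draw the plot with a reference line:
# qqnorm(r); qqline(r)
```

When the two samples differ in length, `qqplot()` interpolates the longer one, so `qq2` has 100 points here.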
10
STARTING WITH R
• Factors:
– > g <- c("f", "m", "m", "m", "f", "m", "f", "m")
– > h <- factor(g)
– > table(g)
• Matrices:
– > r <- rnorm(100)
– > dim(r) <- c(50, 2)
– > r
– > summary(r)
– > M <- matrix(c(45, 23, 66, 77, 33, 44), 2, 3, byrow = TRUE)
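A short sketch of what the factor and matrix commands above actually produce: how a factor stores its levels, how `dim<-` reshapes a vector, and what `byrow` changes. `M_col` is an extra, illustrative name.

```r
g <- c("f", "m", "m", "m", "f", "m", "f", "m")
h <- factor(g)
levels(h)                       # "f" "m": levels are sorted alphabetically by default
counts <- table(h)              # f = 3, m = 5
codes <- as.integer(h)          # stored integer codes: 1 for "f", 2 for "m"

r <- rnorm(100)
dim(r) <- c(50, 2)              # reshape in place; values fill column by column

M <- matrix(c(45, 23, 66, 77, 33, 44), nrow = 2, ncol = 3, byrow = TRUE)
M_col <- matrix(c(45, 23, 66, 77, 33, 44), nrow = 2, ncol = 3)  # default byrow = FALSE
M[1, ]                          # 45 23 66
M_col[1, ]                      # 45 66 33
```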
11
STARTING WITH R
• Data Frames:
– > n = c(2, 3, 5)
– > s = c("aa", "bb", "cc")
– > b = c(TRUE, FALSE, TRUE)
– > df = data.frame(n, s, b)
• Built-in Data Set:
– > state.x77
– > st = as.data.frame(state.x77)
– > st$Density = st$Population * 1000 / st$Area
– > summary(st)
– > cor(st)
– > pairs(st)
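A sketch of a few common data frame operations on the objects defined above: row filtering with a logical column, adding a derived column, and pulling one column out of the correlation matrix. `n_sq` and `cm` are illustrative names; the units follow the `state.x77` help page (population in thousands, area in square miles).

```r
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc")
b <- c(TRUE, FALSE, TRUE)
df <- data.frame(n, s, b)
df[df$b, ]                       # only the rows where b is TRUE
df$n_sq <- df$n^2                # add a derived column

st <- as.data.frame(state.x77)
# Population is in thousands and Area in square miles,
# so Density is people per square mile
st$Density <- st$Population * 1000 / st$Area
cm <- cor(st)                    # 9 x 9 correlation matrix
round(cm[, "Life Exp"], 2)       # correlation of each variable with life expectancy
```

Note that `as.data.frame(state.x77)` keeps the original column names with spaces, such as `"Life Exp"`, which is why the modeling slide later renames them.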
12
STARTING WITH R
[Figure: scatterplot matrix from pairs(st), plotting Population, Income, Illiteracy, Life Exp, Murder, HS Grad, Frost, Area, and Density against each other]
13
LINEAR REGRESSION MODEL IN R
• Linear Regression Model:
– > x <- 1:100
– > y <- x^3
– Model: y = a + b*x (a straight line fitted to cubic data, so it cannot capture the curvature)
– > lm(y ~ x)
– > model <- lm(y ~ x)
– > summary(model)
– > par(mfrow=c(2,2))
– > plot(model)
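To show where the `lm()` estimates come from, this sketch recomputes the intercept and slope with the closed-form least-squares formulas and compares them with `coef(model)`. It also fits `y ~ I(x^3)`, an extra model not on the slide, to show that a model with the right functional form fits these data exactly.

```r
x <- 1:100
y <- x^3
model <- lm(y ~ x)

# Closed-form least-squares estimates for y = a + b*x
b <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
a <- mean(y) - b * mean(x)
coef(model)                       # same values as c(a, b)

summary(model)$r.squared          # about 0.8419
coef(summary(model))              # estimates, std. errors, t and p values as a matrix

# A model with the correct functional form reproduces the data
model_cubic <- lm(y ~ I(x^3))
coef(model_cubic)                 # intercept close to 0, slope close to 1
```

`I()` is needed so that `^` is treated as arithmetic rather than as formula syntax for interactions.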
14
LM MODEL
Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-129827 -103680  -29649   85058  292030

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -207070.2    23299.3  -8.887 3.14e-14 ***
x              9150.4      400.6  22.844  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 115600 on 98 degrees of freedom
Multiple R-squared:  0.8419, Adjusted R-squared:  0.8403
F-statistic: 521.9 on 1 and 98 DF,  p-value: < 2.2e-16
15
LM MODEL
[Figure: plot of y = x^3 against x for x = 1, ..., 100]
16
DIAGNOSTIC PLOTS
[Figure: the four plot(model) diagnostic panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage with Cook's distance; observations 98, 99 and 100 are labeled in every panel]
17
LINEAR REGRESSION MODEL IN R
• Model Built-in Data:
– > colnames(st)[4] = "Life.Exp"
– > colnames(st)[6] = "HS.Grad"
– > model1 = lm(Life.Exp ~ Population + Income + Illiteracy + Murder + HS.Grad + Frost + Area + Density, data = st)
– > summary(model1)
– > model2 <- step(model1)
– > model3 = update(model2, .~.-Population)
– > summary(model3)
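A self-contained sketch of the same workflow. It assumes `make.names()` as an alternative to renaming columns 4 and 6 by position, uses the formula shorthand `.` for "all other columns", and passes `trace = 0` so `step()` runs quietly. Which terms `step()` keeps is not asserted here.

```r
st <- as.data.frame(state.x77)
st$Density <- st$Population * 1000 / st$Area
# make.names() turns "Life Exp" into "Life.Exp" and "HS Grad" into "HS.Grad"
names(st) <- make.names(names(st))

model1 <- lm(Life.Exp ~ ., data = st)          # "." means every other column
model2 <- step(model1, trace = 0)              # stepwise AIC search from the full model
model3 <- update(model2, . ~ . - Population)   # drop Population from what step() kept

formula(model2)                                # terms retained by step()
extractAIC(model2)[2] <= extractAIC(model1)[2] # step() stops only when AIC cannot improve
```

Using `make.names()` avoids hard-coding column positions, which would silently break if the column order changed.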
18
LINEAR REGRESSION MODEL IN R
• Confidence limits on estimated coefficients:
– > confint(model3)
• Prediction for new data:
– > predict(model3, list(Murder = 10.5, HS.Grad = 48, Frost = 100))
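A sketch that contrasts the two kinds of interval `predict()` can return for the new observation above. It assumes the reduced model is `Life.Exp ~ Murder + HS.Grad + Frost`, matching the three predictors passed to `predict()` on the slide, and refits it directly so the block runs on its own.

```r
st <- as.data.frame(state.x77)
names(st) <- make.names(names(st))   # "Life Exp" -> "Life.Exp", "HS Grad" -> "HS.Grad"

# Assumed reduced model (the predictors used in the slide's predict() call)
model3 <- lm(Life.Exp ~ Murder + HS.Grad + Frost, data = st)

ci <- confint(model3, level = 0.95)  # one row per coefficient: 2.5 % and 97.5 % limits

new_state <- data.frame(Murder = 10.5, HS.Grad = 48, Frost = 100)
# Interval for the mean life expectancy of states like this one
conf_int <- predict(model3, new_state, interval = "confidence")
# Wider interval for the life expectancy of a single new state
pred_int <- predict(model3, new_state, interval = "prediction")
```

The prediction interval is always wider because it adds the residual variance of a single observation to the uncertainty in the fitted mean.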
19
OUTLIERS
• Boxplot:
– > v <- rnorm(100)
– > v = c(v,10)
– > boxplot(v)
– > rug(jitter(v), side=2)
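The points a boxplot marks as outliers can also be retrieved as numbers with `boxplot.stats()`, which computes the statistics `boxplot()` draws. This sketch adds `set.seed()` for reproducibility and plants the same outlier, 10, as the slide.

```r
set.seed(1)
v <- c(rnorm(100), 10)            # 100 standard-normal draws plus one planted outlier

# Points farther than coef times the box length (hinge spread) from the box
# are returned in $out
bs <- boxplot.stats(v, coef = 1.5)
bs$out                            # includes the planted value 10
bs$stats                          # lower whisker, lower hinge, median, upper hinge, upper whisker
```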
20
PROBABILITY DENSITY FUNCTION
• PDF:
– > r <- rnorm(1000)
– > hist(r, probability = TRUE)
– > lines(density(r), col = "red")
[Figure: "Histogram of r" on the density scale (x axis r, y axis Density) with the kernel density estimate overlaid in red]
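A numeric check of what the density-scale histogram and `density()` represent: both estimate a probability density, so each should integrate to about 1. This is a sketch with an added `set.seed()`; the grid sum is a simple Riemann approximation.

```r
set.seed(7)
r <- rnorm(1000)

h <- hist(r, plot = FALSE)         # $density is computed even without plotting
# On the density scale, bar areas (height x width) sum to 1
sum(h$density * diff(h$breaks))

d <- density(r)                    # Gaussian kernel, bandwidth from bw.nrd0()
# Riemann sum over the evaluation grid: the estimate also integrates to about 1
sum(d$y) * diff(d$x[1:2])

# Largest gap between the estimate and the true standard-normal density
max(abs(d$y - dnorm(d$x)))
```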
21
CASE STUDY: SHARETHIS EXAMPLE
• Relationship between clicks, winning price, and impressions on ADX:
• Data
– Analyzed ADX Hourly Impression Logs
• Method
– Detected outliers
– Predicted clicks using a regression tree model
22
CASE STUDY: SHARETHIS EXAMPLE
• Outlier Detection:
[Figure: outlier detection plots for Clicks and Impressions]
23
CASE STUDY: SHARETHIS EXAMPLE
• Regression Tree
– A flexible, easy-to-interpret technique for classification and regression
– > library(rpart)
– > fit <- rpart(log(CLK) ~ log(IMP) + AVG_PRICE + SD_PRICE, data = x)
– > plot(fit)
– > text(fit)
– > plot(predict(fit), log(x$CLK))
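The ADX logs are not available here, so this sketch runs the same `rpart()` call on hypothetical simulated data. Only the column names (`CLK`, `IMP`, `AVG_PRICE`, `SD_PRICE`) come from the slide; every value, distribution, and the "+1" click offset are assumptions made so the example runs.

```r
library(rpart)                       # recommended package shipped with R

# Hypothetical stand-in for the ADX hourly logs: all values are simulated
set.seed(1)
n <- 500
IMP <- round(exp(runif(n, 7, 13)))                     # hourly impressions
AVG_PRICE <- runif(n, 0.5, 2.5)                        # mean winning price
SD_PRICE <- runif(n, 0.05, 0.5)                        # spread of winning prices
CLK <- rpois(n, lambda = IMP * 0.001 / AVG_PRICE) + 1  # +1 keeps log(CLK) finite
x <- data.frame(CLK, IMP, AVG_PRICE, SD_PRICE)

fit <- rpart(log(CLK) ~ log(IMP) + AVG_PRICE + SD_PRICE, data = x)
printcp(fit)                         # complexity table from rpart's internal cross-validation

pred <- predict(fit)                 # fitted log(CLK) for the training rows
cor(pred, log(x$CLK))                # summarizes the agreement the scatter plot shows
```

On real logs, hours with zero clicks would need handling before taking `log(CLK)`; the `+1` offset here is just one hypothetical choice.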
24
CASE STUDY: SHARETHIS EXAMPLE
• Regression Tree
[Figure: fitted regression tree. The root split is log(IMP) < 9.33; further splits use log(IMP), SD_PRICE and AVG_PRICE, and the leaf predictions of log(CLK) range from 0.751 to 4.753]
25
CASE STUDY: SHARETHIS EXAMPLE
• Predict Log of Clicks
[Figure: scatter plot of log(x$CLK) against predict(fit)]
26
CASE STUDY: COLOR DETECTION
• Detect color from product image:
[Figure: three color-detection result plots]
27
RESOURCES
• Books:
– G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning: with Applications in R, 2013
– N. Matloff, The Art of R Programming: A Tour of Statistical Software Design, 2011
– P. Teetor, R Cookbook (O'Reilly Cookbooks), 2011
• R blog:
– http://www.r-bloggers.com

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualization
 
3. R- list and data frame
3. R- list and data frame3. R- list and data frame
3. R- list and data frame
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
R Programming Language
R Programming LanguageR Programming Language
R Programming Language
 
Data Management in R
Data Management in RData Management in R
Data Management in R
 
Data
DataData
Data
 
Introduction to R and R Studio
Introduction to R and R StudioIntroduction to R and R Studio
Introduction to R and R Studio
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in R
 
R data-import, data-export
R data-import, data-exportR data-import, data-export
R data-import, data-export
 
R programming Fundamentals
R programming  FundamentalsR programming  Fundamentals
R programming Fundamentals
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Data visualization using R
Data visualization using RData visualization using R
Data visualization using R
 
Data Analytics Life Cycle
Data Analytics Life CycleData Analytics Life Cycle
Data Analytics Life Cycle
 
R studio
R studio R studio
R studio
 
R Programming: Introduction To R Packages
R Programming: Introduction To R PackagesR Programming: Introduction To R Packages
R Programming: Introduction To R Packages
 
Introduction to Rstudio
Introduction to RstudioIntroduction to Rstudio
Introduction to Rstudio
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
R programming slides
R  programming slidesR  programming slides
R programming slides
 
Principal Component Analysis
Principal Component AnalysisPrincipal Component Analysis
Principal Component Analysis
 
R Programming
R ProgrammingR Programming
R Programming
 

Andere mochten auch

Iris data analysis example in R
Iris data analysis example in RIris data analysis example in R
Iris data analysis example in RDuyen Do
 
Discriminant analysis basicrelationships
Discriminant analysis basicrelationshipsDiscriminant analysis basicrelationships
Discriminant analysis basicrelationshipsdivyakalsi89
 
An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)Dataspora
 
Big Data Analytics with R
Big Data Analytics with RBig Data Analytics with R
Big Data Analytics with RGreat Wide Open
 
R programming Basic & Advanced
R programming Basic & AdvancedR programming Basic & Advanced
R programming Basic & AdvancedSohom Ghosh
 
R language tutorial
R language tutorialR language tutorial
R language tutorialDavid Chiu
 
Data Clustering with R
Data Clustering with RData Clustering with R
Data Clustering with RYanchang Zhao
 
Why R? A Brief Introduction to the Open Source Statistics Platform
Why R? A Brief Introduction to the Open Source Statistics PlatformWhy R? A Brief Introduction to the Open Source Statistics Platform
Why R? A Brief Introduction to the Open Source Statistics PlatformSyracuse University
 
Logistic Regression in R-An Exmple.
Logistic Regression in R-An Exmple. Logistic Regression in R-An Exmple.
Logistic Regression in R-An Exmple. Dr. Volkan OBAN
 
Applied spatial data introducing
Applied spatial data introducingApplied spatial data introducing
Applied spatial data introducingHa Hoang
 
Probability based learning (in book: Machine learning for predictve data anal...
Probability based learning (in book: Machine learning for predictve data anal...Probability based learning (in book: Machine learning for predictve data anal...
Probability based learning (in book: Machine learning for predictve data anal...Duyen Do
 
R programming language in spatial analysis
R programming language in spatial analysisR programming language in spatial analysis
R programming language in spatial analysisAbhiram Kanigolla
 
Example R usage for oracle DBA UKOUG 2013
Example R usage for oracle DBA UKOUG 2013Example R usage for oracle DBA UKOUG 2013
Example R usage for oracle DBA UKOUG 2013BertrandDrouvot
 

Andere mochten auch (20)

Iris data analysis example in R
Iris data analysis example in RIris data analysis example in R
Iris data analysis example in R
 
Discriminant analysis basicrelationships
Discriminant analysis basicrelationshipsDiscriminant analysis basicrelationships
Discriminant analysis basicrelationships
 
An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)
 
Class ppt intro to r
Class ppt intro to rClass ppt intro to r
Class ppt intro to r
 
Big Data Analytics with R
Big Data Analytics with RBig Data Analytics with R
Big Data Analytics with R
 
R for data analytics
R for data analyticsR for data analytics
R for data analytics
 
R programming Basic & Advanced
R programming Basic & AdvancedR programming Basic & Advanced
R programming Basic & Advanced
 
R language tutorial
R language tutorialR language tutorial
R language tutorial
 
R learning by examples
R learning by examplesR learning by examples
R learning by examples
 
Data Clustering with R
Data Clustering with RData Clustering with R
Data Clustering with R
 
Why R? A Brief Introduction to the Open Source Statistics Platform
Why R? A Brief Introduction to the Open Source Statistics PlatformWhy R? A Brief Introduction to the Open Source Statistics Platform
Why R? A Brief Introduction to the Open Source Statistics Platform
 
Biopilot training centre @ vadodara
Biopilot training centre @ vadodaraBiopilot training centre @ vadodara
Biopilot training centre @ vadodara
 
Logistic Regression in R-An Exmple.
Logistic Regression in R-An Exmple. Logistic Regression in R-An Exmple.
Logistic Regression in R-An Exmple.
 
Applied spatial data introducing
Applied spatial data introducingApplied spatial data introducing
Applied spatial data introducing
 
Probability based learning (in book: Machine learning for predictve data anal...
Probability based learning (in book: Machine learning for predictve data anal...Probability based learning (in book: Machine learning for predictve data anal...
Probability based learning (in book: Machine learning for predictve data anal...
 
Introtor
IntrotorIntrotor
Introtor
 
Building powerful dashboards with r shiny
Building powerful dashboards with r shinyBuilding powerful dashboards with r shiny
Building powerful dashboards with r shiny
 
R programming language in spatial analysis
R programming language in spatial analysisR programming language in spatial analysis
R programming language in spatial analysis
 
Data clustering
Data clustering Data clustering
Data clustering
 
Example R usage for oracle DBA UKOUG 2013
Example R usage for oracle DBA UKOUG 2013Example R usage for oracle DBA UKOUG 2013
Example R usage for oracle DBA UKOUG 2013
 

Ähnlich wie Data analysis with R

Learning notes of r for python programmer (Temp1)
Learning notes of r for python programmer (Temp1)Learning notes of r for python programmer (Temp1)
Learning notes of r for python programmer (Temp1)Chia-Chi Chang
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine LearningAmanBhalla14
 
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavSeminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavVyacheslav Arbuzov
 
Introduction to R.pptx
Introduction to R.pptxIntroduction to R.pptx
Introduction to R.pptxkarthikks82
 
statistical computation using R- an intro..
statistical computation using R- an intro..statistical computation using R- an intro..
statistical computation using R- an intro..Kamarudheen KV
 
Introduction to R for data science
Introduction to R for data scienceIntroduction to R for data science
Introduction to R for data scienceLong Nguyen
 
India software developers conference 2013 Bangalore
India software developers conference 2013 BangaloreIndia software developers conference 2013 Bangalore
India software developers conference 2013 BangaloreSatnam Singh
 
Spatial Analysis with R - the Good, the Bad, and the Pretty
Spatial Analysis with R - the Good, the Bad, and the PrettySpatial Analysis with R - the Good, the Bad, and the Pretty
Spatial Analysis with R - the Good, the Bad, and the PrettyNoam Ross
 
DBMS ArchitectureQuery ExecutorBuffer ManagerStora
DBMS ArchitectureQuery ExecutorBuffer ManagerStoraDBMS ArchitectureQuery ExecutorBuffer ManagerStora
DBMS ArchitectureQuery ExecutorBuffer ManagerStoraLinaCovington707
 
Introduction to R
Introduction to RIntroduction to R
Introduction to RHappy Garg
 

Ähnlich wie Data analysis with R (20)

R
RR
R
 
Big datacourse
Big datacourseBig datacourse
Big datacourse
 
R programming by ganesh kavhar
R programming by ganesh kavharR programming by ganesh kavhar
R programming by ganesh kavhar
 
Rtutorial
RtutorialRtutorial
Rtutorial
 
Perm winter school 2014.01.31
Perm winter school 2014.01.31Perm winter school 2014.01.31
Perm winter school 2014.01.31
 
Learning notes of r for python programmer (Temp1)
Learning notes of r for python programmer (Temp1)Learning notes of r for python programmer (Temp1)
Learning notes of r for python programmer (Temp1)
 
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine Learning
 
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavSeminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
 
Introduction to R.pptx
Introduction to R.pptxIntroduction to R.pptx
Introduction to R.pptx
 
Ch1
Ch1Ch1
Ch1
 
Seminar psu 20.10.2013
Seminar psu 20.10.2013Seminar psu 20.10.2013
Seminar psu 20.10.2013
 
statistical computation using R- an intro..
statistical computation using R- an intro..statistical computation using R- an intro..
statistical computation using R- an intro..
 
Language R
Language RLanguage R
Language R
 
Introduction to R for data science
Introduction to R for data scienceIntroduction to R for data science
Introduction to R for data science
 
India software developers conference 2013 Bangalore
India software developers conference 2013 BangaloreIndia software developers conference 2013 Bangalore
India software developers conference 2013 Bangalore
 
Spatial Analysis with R - the Good, the Bad, and the Pretty
Spatial Analysis with R - the Good, the Bad, and the PrettySpatial Analysis with R - the Good, the Bad, and the Pretty
Spatial Analysis with R - the Good, the Bad, and the Pretty
 
DBMS ArchitectureQuery ExecutorBuffer ManagerStora
DBMS ArchitectureQuery ExecutorBuffer ManagerStoraDBMS ArchitectureQuery ExecutorBuffer ManagerStora
DBMS ArchitectureQuery ExecutorBuffer ManagerStora
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
R lecture oga
R lecture ogaR lecture oga
R lecture oga
 

Mehr von ShareThis

ShareThis Canadian Millennials Study_2015
ShareThis Canadian Millennials Study_2015ShareThis Canadian Millennials Study_2015
ShareThis Canadian Millennials Study_2015ShareThis
 
Real time pipeline at terabyte sacle
Real time pipeline at terabyte sacleReal time pipeline at terabyte sacle
Real time pipeline at terabyte sacleShareThis
 
ShareThis TV Study
ShareThis TV StudyShareThis TV Study
ShareThis TV StudyShareThis
 
Q1/2015 ShareThis Consumer Sharing Trends Report
Q1/2015 ShareThis Consumer Sharing Trends ReportQ1/2015 ShareThis Consumer Sharing Trends Report
Q1/2015 ShareThis Consumer Sharing Trends ReportShareThis
 
ShareThis Finance Study
ShareThis Finance Study ShareThis Finance Study
ShareThis Finance Study ShareThis
 
DataScienceInnovation_ShareThis
DataScienceInnovation_ShareThisDataScienceInnovation_ShareThis
DataScienceInnovation_ShareThisShareThis
 
Share this influentialdemocrats_jan2015
Share this influentialdemocrats_jan2015Share this influentialdemocrats_jan2015
Share this influentialdemocrats_jan2015ShareThis
 
ShareThis TravelStudy-2014
ShareThis TravelStudy-2014ShareThis TravelStudy-2014
ShareThis TravelStudy-2014ShareThis
 
ShareThis Midterm Elections_2014
ShareThis Midterm Elections_2014ShareThis Midterm Elections_2014
ShareThis Midterm Elections_2014ShareThis
 
H2O platform workshop
H2O platform workshopH2O platform workshop
H2O platform workshopShareThis
 
Q3 2014 Consumer Sharing Trends Report
Q3 2014 Consumer Sharing Trends ReportQ3 2014 Consumer Sharing Trends Report
Q3 2014 Consumer Sharing Trends ReportShareThis
 
ShareThis_Return on a Share Study
ShareThis_Return on a Share StudyShareThis_Return on a Share Study
ShareThis_Return on a Share StudyShareThis
 
Share this millennial study_2014
Share this millennial study_2014Share this millennial study_2014
Share this millennial study_2014ShareThis
 
Data Pipeline Management Framework on Oozie
Data Pipeline Management Framework on OozieData Pipeline Management Framework on Oozie
Data Pipeline Management Framework on OozieShareThis
 
ShareThis_CSTR_July2014
ShareThis_CSTR_July2014ShareThis_CSTR_July2014
ShareThis_CSTR_July2014ShareThis
 
Sharing Steals the Cup
Sharing Steals the CupSharing Steals the Cup
Sharing Steals the CupShareThis
 
ShareThis Auto Study
ShareThis Auto Study ShareThis Auto Study
ShareThis Auto Study ShareThis
 
ShareThis Return on a Share Study
ShareThis Return on a Share StudyShareThis Return on a Share Study
ShareThis Return on a Share StudyShareThis
 
ShareThis RoS
ShareThis RoS ShareThis RoS
ShareThis RoS ShareThis
 

Mehr von ShareThis (20)

ShareThis Canadian Millennials Study_2015
ShareThis Canadian Millennials Study_2015ShareThis Canadian Millennials Study_2015
ShareThis Canadian Millennials Study_2015
 
Real time pipeline at terabyte sacle
Real time pipeline at terabyte sacleReal time pipeline at terabyte sacle
Real time pipeline at terabyte sacle
 
ShareThis TV Study
ShareThis TV StudyShareThis TV Study
ShareThis TV Study
 
Q1/2015 ShareThis Consumer Sharing Trends Report
Q1/2015 ShareThis Consumer Sharing Trends ReportQ1/2015 ShareThis Consumer Sharing Trends Report
Q1/2015 ShareThis Consumer Sharing Trends Report
 
ShareThis Finance Study
ShareThis Finance Study ShareThis Finance Study
ShareThis Finance Study
 
DataScienceInnovation_ShareThis
DataScienceInnovation_ShareThisDataScienceInnovation_ShareThis
DataScienceInnovation_ShareThis
 
Share this influentialdemocrats_jan2015
Share this influentialdemocrats_jan2015Share this influentialdemocrats_jan2015
Share this influentialdemocrats_jan2015
 
ShareThis TravelStudy-2014
ShareThis TravelStudy-2014ShareThis TravelStudy-2014
ShareThis TravelStudy-2014
 
ShareThis Midterm Elections_2014
ShareThis Midterm Elections_2014ShareThis Midterm Elections_2014
ShareThis Midterm Elections_2014
 
H2O platform workshop
H2O platform workshopH2O platform workshop
H2O platform workshop
 
Q3 2014 Consumer Sharing Trends Report
Q3 2014 Consumer Sharing Trends ReportQ3 2014 Consumer Sharing Trends Report
Q3 2014 Consumer Sharing Trends Report
 
ShareThis_Return on a Share Study
ShareThis_Return on a Share StudyShareThis_Return on a Share Study
ShareThis_Return on a Share Study
 
Share this millennial study_2014
Share this millennial study_2014Share this millennial study_2014
Share this millennial study_2014
 
Data Pipeline Management Framework on Oozie
Data Pipeline Management Framework on OozieData Pipeline Management Framework on Oozie
Data Pipeline Management Framework on Oozie
 
ShareThis_CSTR_July2014
ShareThis_CSTR_July2014ShareThis_CSTR_July2014
ShareThis_CSTR_July2014
 
Sharing Steals the Cup
Sharing Steals the CupSharing Steals the Cup
Sharing Steals the Cup
 
ShareThis Auto Study
ShareThis Auto Study ShareThis Auto Study
ShareThis Auto Study
 
ShareThis Return on a Share Study
ShareThis Return on a Share StudyShareThis Return on a Share Study
ShareThis Return on a Share Study
 
Social TV
Social TVSocial TV
Social TV
 
ShareThis RoS
ShareThis RoS ShareThis RoS
ShareThis RoS
 

Kürzlich hochgeladen

BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 

Kürzlich hochgeladen (20)

BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...

Data analysis with R

  • 1. DATA ANALYSIS with R
    Hassan Namarvar, ShareThis
  • 2. WHAT IS R?
    • R is a free programming language and software environment for statistical computing and graphics.
    • It is similar to the S language developed at AT&T Bell Labs by Rick Becker, John Chambers and Allan Wilks.
    • R was initially developed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand (1996).
    • R's source code is written in C, Fortran, and R.
  • 3. R PARADIGMS
    R is a multi-paradigm language:
      – Array
      – Object-oriented
      – Imperative
      – Functional
      – Procedural
      – Reflective
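To make the paradigm list concrete, here is a minimal base-R sketch of four of them. The point class, new_point(), and its print method are hypothetical names invented for this illustration, not code from the slides.

```r
# Array: arithmetic applies element-wise to whole vectors
vec_squares <- (1:5)^2                      # 1 4 9 16 25

# Functional: functions are values passed to higher-order functions
squares <- sapply(1:5, function(i) i^2)     # same result via an anonymous function
total   <- Reduce(`+`, squares)             # 55

# Object-oriented (S3): a hypothetical "point" class with a print() method
new_point <- function(x, y) structure(list(x = x, y = y), class = "point")
print.point <- function(x, ...) {
  cat("point(", x$x, ", ", x$y, ")\n", sep = "")
  invisible(x)
}
p <- new_point(1, 2)
print(p)                                    # point(1, 2)

# Reflective: a function's argument list can be inspected at run time
arg_names <- names(formals(new_point))      # "x" "y"
```

S3 dispatch is the lightest of R's object systems: print(p) looks up print.point because "point" is in class(p).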
  • 4. STATISTICAL FEATURES
    • Graphical techniques
    • Linear and nonlinear modeling
    • Classical statistical tests
    • Time-series analysis
    • Classification
    • Clustering
    • Machine learning
  • 5. PROGRAMMING FEATURES
    • R is an interpreted language.
    • R is used through a command-line interpreter.
    • Like MATLAB, R supports matrix arithmetic.
    • Data structures:
      – Vectors
      – Matrices
      – Arrays
      – Data frames
      – Lists
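A minimal base-R sketch of the five data structures listed above, plus the MATLAB-style matrix product the slide mentions; all values are arbitrary examples.

```r
v  <- c(1.5, 2, 3)                           # vector: 1-D, one element type
m  <- matrix(1:6, nrow = 2)                  # matrix: 2-D, filled column by column
a  <- array(1:24, dim = c(2, 3, 4))          # array: any number of dimensions
df <- data.frame(id = 1:3, score = v)        # data frame: columns may differ in type
l  <- list(name = "R", values = v, mat = m)  # list: elements of any type

# Matrix arithmetic: %*% is the matrix product, t() the transpose
mm <- m %*% t(m)                             # (2 x 3) times (3 x 2) gives 2 x 2
mm                                           # rows: 35 44 / 44 56
```

A data frame is itself a list of equal-length columns, which is why it can hold numbers, strings, and logicals side by side while a matrix cannot.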
  • 6. ADVANTAGES OF R
    • One of the most comprehensive statistical analysis packages available.
    • Outstanding graphical capabilities.
    • Open-source software, reviewed by experts.
    • R is free and licensed under the GNU General Public License (GPL).
    • R had 5,578 packages as of May 31, 2014.
    • R is cross-platform: it runs on GNU/Linux, Mac, and Windows.
    • R works with data from CSV files, SAS, SPSS, Excel, Access, Oracle, MySQL, and SQLite.
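For the CSV case, base R's read.csv() and write.csv() are enough. Below is a minimal round-trip sketch using a temporary file (the file and column names are made up). The other formats on the list are usually reached through add-on packages, for example foreign for SPSS and SAS transport files, or DBI with a driver such as RSQLite for databases.

```r
path <- tempfile(fileext = ".csv")              # throwaway file for the demo
scores <- data.frame(name = c("a", "b"), value = c(1.5, 2.5))

write.csv(scores, path, row.names = FALSE)      # omit the row-name column
back <- read.csv(path)                          # read it back as a data frame
str(back)
unlink(path)                                    # clean up
```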
  • 7. HOW TO INSTALL R?
    • Download and install the latest version from:
      – http://cran.r-project.org
    • Install packages from the R console:
      – > install.packages("package_name")
    • R has its own LaTeX-like documentation:
      – > help()
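A common pattern is to install a package only when it is missing and then attach it. The sketch below uses rpart (the package loaded later in the case study) as the example and assumes the cloud CRAN mirror; the help lookups are left as comments because they open help pages in an interactive session.

```r
pkg <- "rpart"   # example package; substitute any CRAN package name

# Install only if the package is not already available, then attach it
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg, repos = "https://cloud.r-project.org")
}
library(pkg, character.only = TRUE)

# Documentation lookups (interactive use):
#   help("rpart")   or   ?rpart        # help page for one function
#   help.search("regression tree")     # search all installed help pages
```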
  • 8. STARTING WITH R
    • In the R console:
      – > x <- 2
      – > x
      – > y <- x^2
      – > y
      – > ls()
      – > rm(y)
    • Vectors:
      – > v <- c(4, 7, 23.5, 76.2, 80)
      – > summary(v)
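The console session above, written as a script with the value each step produces. R is case-sensitive, so summary() and Summary() are different functions; summary() is the one that gives the five-number summary plus the mean.

```r
x <- 2
y <- x^2              # y is 4
ls()                  # lists workspace objects, e.g. "x" "y"
rm(y)                 # removes y from the workspace
exists("y")           # FALSE: y is gone

v <- c(4, 7, 23.5, 76.2, 80)
summary(v)            # Min 4, 1st Qu. 7, Median 23.5, Mean 38.14, 3rd Qu. 76.2, Max 80
```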
  • 9. STARTING WITH R
    • Histogram:
      – > r <- rnorm(100)
      – > summary(r)
      – > plot(r)
      – > hist(r)
    • QQ plot (quantile-quantile):
      – > qqplot(r, rnorm(1000))
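The same steps can be checked numerically. In this minimal sketch, set.seed() is added only for reproducibility, hist(..., plot = FALSE) returns the bin counts without drawing, and qqnorm() compares the sample with theoretical normal quantiles, whereas the slide's qqplot() compares two samples with each other.

```r
set.seed(1)                        # reproducible random draws
r <- rnorm(100)                    # 100 values from N(0, 1)

h <- hist(r, plot = FALSE)         # bins and counts; hist(r) would draw them
sum(h$counts)                      # 100: every value falls in some bin

q <- qqnorm(r, plot.it = FALSE)    # theoretical (q$x) vs. sample (q$y) quantiles
cor(q$x, q$y)                      # near 1 when r is close to normal
# Interactive equivalent: qqnorm(r); qqline(r)
```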
  • 10. STARTING WITH R
    • Factors:
      – > g <- c("f", "m", "m", "m", "f", "m", "f", "m")
      – > h <- factor(g)
      – > table(g)
    • Matrices:
      – > r <- rnorm(100)
      – > dim(r) <- c(50, 2)
      – > r
      – > summary(r)
      – > M <- matrix(c(45, 23, 66, 77, 33, 44), 2, 3, byrow = TRUE)
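A short sketch of what the factor and matrix commands above return; the counts in the comments follow from the vector g on the slide.

```r
g <- c("f", "m", "m", "m", "f", "m", "f", "m")
h <- factor(g)                  # categorical variable
levels(h)                       # "f" "m" (sorted alphabetically by default)
table(h)                        # f: 3, m: 5

r <- rnorm(100)
dim(r) <- c(50, 2)              # same 100 numbers, now a 50 x 2 matrix
is.matrix(r)                    # TRUE

M <- matrix(c(45, 23, 66, 77, 33, 44), nrow = 2, ncol = 3, byrow = TRUE)
M                               # row 1: 45 23 66, row 2: 77 33 44
```

Without byrow = TRUE, matrix() fills column by column, so M would instead start with 45 and 23 in its first column.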
  • 11. STARTING WITH R
    • Data frames:
      – > n <- c(2, 3, 5)
      – > s <- c("aa", "bb", "cc")
      – > b <- c(TRUE, FALSE, TRUE)
      – > df <- data.frame(n, s, b)
    • Built-in data set:
      – > state.x77
      – > st <- as.data.frame(state.x77)
      – > st$Density <- st$Population * 1000 / st$Area
      – > summary(st)
      – > cor(st)
      – > pairs(st)
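A runnable version of these commands with a few checks on the result. The ordering and rounding lines are additions for illustration. In state.x77, Population is given in thousands (Alabama is listed as 3615) and Area in square miles, which is why the slide multiplies by 1000.

```r
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc")
b <- c(TRUE, FALSE, TRUE)
df <- data.frame(n, s, b)       # column names come from the variable names
str(df)                         # 3 obs. of 3 variables (s is a factor before R 4.0)

st <- as.data.frame(state.x77)  # 50 states x 8 variables, row names = state names
# Population is in thousands and Area in square miles,
# so Density is people per square mile
st$Density <- st$Population * 1000 / st$Area
head(st[order(st$Density), "Density", drop = FALSE], 3)   # three least dense states
round(cor(st)["Density", ], 2)  # correlation of Density with each column
```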
  • 12. STARTING WITH R
    [Figure: scatterplot matrix from pairs(st) for Population, Income, Illiteracy, Life Exp, Murder, HS Grad, Frost, Area, and Density]
  • 13. LINEAR REGRESSION MODEL IN R
    • Linear regression model:
      – > x <- 1:100
      – > y <- x^3
      – Model: y = a + b * x
      – > lm(y ~ x)
      – > model <- lm(y ~ x)
      – > summary(model)
      – > par(mfrow = c(2, 2))
      – > plot(model)
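For a single predictor, least squares has a closed form: b = cov(x, y) / var(x) and a = mean(y) - b * mean(x). This minimal sketch checks lm() against that formula for the slide's data; the numbers agree with the coefficient table in the LM MODEL output.

```r
x <- 1:100
y <- x^3
model <- lm(y ~ x)            # least-squares fit of y = a + b * x

b <- cov(x, y) / var(x)       # slope
a <- mean(y) - b * mean(x)    # intercept
coef(model)                   # (Intercept) -207070.2, x 9150.4
c(a = a, b = b)               # the same two numbers
```

The straight line is deliberately a poor model for a cubic, which is what the curved pattern in the diagnostic plots reveals.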
  • 14. LM MODEL
        Call:
        lm(formula = y ~ x)
        Residuals:
            Min      1Q  Median      3Q     Max
        -129827 -103680  -29649   85058  292030
        Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
        (Intercept) -207070.2    23299.3  -8.887 3.14e-14 ***
        x              9150.4      400.6  22.844  < 2e-16 ***
        ---
        Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
        Residual standard error: 115600 on 98 degrees of freedom
        Multiple R-squared:  0.8419,  Adjusted R-squared:  0.8403
        F-statistic: 521.9 on 1 and 98 DF,  p-value: < 2.2e-16
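The statistics at the bottom of this output can be recomputed from the residuals. A minimal sketch for the same x and y: R-squared is 1 - SS_res / SS_tot, the adjusted version rescales by the degrees of freedom, and the residual standard error divides by the n - 2 = 98 residual degrees of freedom.

```r
x <- 1:100
y <- x^3
model <- lm(y ~ x)
n <- length(y)

res    <- residuals(model)
ss_res <- sum(res^2)                              # variation left unexplained
ss_tot <- sum((y - mean(y))^2)                    # total variation around the mean
r2     <- 1 - ss_res / ss_tot                     # Multiple R-squared, about 0.8419
adj_r2 <- 1 - (1 - r2) * (n - 1) / df.residual(model)  # Adjusted, about 0.8403
rse    <- sqrt(ss_res / df.residual(model))       # Residual standard error, about 115600
```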
  • 15. LM MODEL
    [Figure: "y=x^3", scatter plot of y against x for x from 0 to 100, with y up to about 1e+06]
  • 16. DIAGNOSTIC PLOTS
    [Figure: the four plot(model) diagnostics: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance); observations 98, 99, and 100 are labeled in each panel]
  • 17. LINEAR REGRESSION MODEL IN R
    • Model on built-in data:
      – > colnames(st)[4] <- "Life.Exp"
      – > colnames(st)[6] <- "HS.Grad"
      – > model1 <- lm(Life.Exp ~ Population + Income + Illiteracy + Murder + HS.Grad + Frost + Area + Density, data = st)
      – > summary(model1)
      – > model2 <- step(model1)
      – > model3 <- update(model2, . ~ . - Population)
      – > summary(model3)
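A self-contained version of these steps, as a minimal sketch. It swaps the two colnames() assignments for make.names(), which yields the same Life.Exp and HS.Grad names; Life.Exp ~ . is shorthand for all remaining columns, i.e. the slide's eight predictors; and trace = 0 only silences step()'s progress log. Which variables step() keeps is decided by AIC on the data, so the comments do not assume a particular result.

```r
st <- as.data.frame(state.x77)
st$Density <- st$Population * 1000 / st$Area
names(st) <- make.names(names(st))          # "Life Exp" -> "Life.Exp", "HS Grad" -> "HS.Grad"

model1 <- lm(Life.Exp ~ ., data = st)       # every other column as a predictor
model2 <- step(model1, trace = 0)           # stepwise selection by AIC from the full model
model3 <- update(model2, . ~ . - Population)  # drop Population by hand
formula(model3)                             # the predictors that remain
summary(model3)
```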
  • 18. LINEAR REGRESSION MODEL IN R
    • Confidence intervals for the estimated coefficients:
      – > confint(model3)
    • Prediction for new values:
      – > predict(model3, list(Murder = 10.5, HS.Grad = 48, Frost = 100))
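A minimal sketch of both calls with intervals. It assumes, as the slide's predict() call implies, that the final model uses Murder, HS.Grad and Frost, and fits that model directly so the block runs on its own; new_state is a hypothetical name. interval = "confidence" bounds the mean life expectancy for states with these values, while interval = "prediction" bounds a single new state and is therefore wider.

```r
st <- as.data.frame(state.x77)
names(st) <- make.names(names(st))     # "Life Exp" -> "Life.Exp", etc.

# Assumed final model: the three predictors passed to predict() on the slide
model3 <- lm(Life.Exp ~ Murder + HS.Grad + Frost, data = st)

ci <- confint(model3, level = 0.95)    # 95% interval for each coefficient
ci

new_state <- data.frame(Murder = 10.5, HS.Grad = 48, Frost = 100)
p_conf <- predict(model3, newdata = new_state, interval = "confidence")
p_pred <- predict(model3, newdata = new_state, interval = "prediction")
out <- rbind(p_conf, p_pred)           # columns: fit, lwr, upr
rownames(out) <- c("confidence", "prediction")
out
```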
  • 19. OUTLIERS
    • Boxplot:
      – > v <- rnorm(100)
      – > v <- c(v, 10)
      – > boxplot(v)
      – > rug(jitter(v), side = 2)
    [Figure: boxplot of v with jittered values marked along the left axis; y axis from -2 to 10]
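boxplot() decides which points to draw as outliers using boxplot.stats(): values more than 1.5 times the box length beyond the box. This minimal sketch reads those numbers directly instead of from the picture; set.seed() is added only to make the run reproducible, and bs is just a local name.

```r
set.seed(1)                     # reproducible draws
v <- c(rnorm(100), 10)          # 100 standard-normal values plus one planted outlier

bs <- boxplot.stats(v)          # the numbers boxplot(v) draws, without plotting
bs$stats                        # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out                          # values beyond the whiskers, including 10
```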
  • 20. PROBABILITY DENSITY FUNCTION
    • PDF:
      – > r <- rnorm(1000)
      – > hist(r, probability = TRUE)
      – > lines(density(r), col = "red")
    [Figure: "Histogram of r" on the density scale (x axis r, y axis Density), with the kernel density estimate drawn in red]
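density() returns the estimated curve as x/y vectors, so it can be checked like any other function. This minimal sketch confirms that the kernel estimate integrates to roughly 1 with a simple Riemann sum over its evaluation grid; set.seed() and the dashed true-density overlay in the comments are additions for illustration.

```r
set.seed(1)
r <- rnorm(1000)

d <- density(r)                  # Gaussian kernel density estimate on a 512-point grid
step_x <- diff(d$x[1:2])         # spacing of the evaluation grid
area <- sum(d$y) * step_x        # Riemann sum: about 1 for a density
# Plotting, as on the slide:
#   hist(r, freq = FALSE); lines(d, col = "red")
#   curve(dnorm(x), add = TRUE, lty = 2)   # true N(0, 1) density for comparison
```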
  • 21. CASE STUDY: SHARETHIS EXAMPLE
    • Relationship of clicks to winning price and impressions on ADX:
    • Data: analyzed ADX hourly impression logs
    • Method:
      – Detected outliers
      – Predicted clicks using a regression tree model
  • 22. CASE STUDY: SHARETHIS EXAMPLE
    • Outlier detection:
    [Figure: outlier detection plot with axes Clicks and Impressions]
  • 23. CASE STUDY: SHARETHIS EXAMPLE
    • Regression tree:
      – One of the most powerful classification/regression techniques
      – > library(rpart)
      – > fit <- rpart(log(CLK) ~ log(IMP) + AVG_PRICE + SD_PRICE, data = x)
      – > plot(fit)
      – > text(fit)
      – > plot(predict(fit), log(x$CLK))
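The ShareThis logs are not available here, so this is a minimal sketch on simulated data. The columns CLK, IMP, AVG_PRICE and SD_PRICE mirror the slide's formula, but their values and the click model are invented assumptions. Hours with zero clicks are dropped because log(0) is -Inf. printcp() prints the tree's complexity table, and the correlation between predicted and observed log(CLK) summarizes what the slide's predict(fit) vs. log(x$CLK) plot shows.

```r
library(rpart)

set.seed(42)
n <- 1000
# Hypothetical stand-in for the ADX hourly logs (simulated, not real data)
x <- data.frame(
  IMP       = round(exp(runif(n, 7, 13))),  # hourly impressions
  AVG_PRICE = runif(n, 0.5, 2.5),           # mean winning price
  SD_PRICE  = runif(n, 0, 0.5)              # spread of winning price
)
# Assumed click model: about 0.1% click rate, a little lower at higher prices
x$CLK <- rpois(n, lambda = 0.001 * x$IMP * exp(-0.2 * x$AVG_PRICE))
x <- x[x$CLK > 0, ]                         # log(0) is -Inf, so drop zero-click hours

fit <- rpart(log(CLK) ~ log(IMP) + AVG_PRICE + SD_PRICE, data = x)
printcp(fit)                                # error vs. tree size (complexity parameter)
# plot(fit); text(fit)                      # draw the tree, as on the slide
r_fit <- cor(predict(fit), log(x$CLK))      # agreement of fitted and observed log clicks
```

Because a regression tree predicts a constant per leaf, the predicted values fall on a few horizontal bands, which is the banded pattern expected in a predicted-vs-observed plot.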
  • 24. CASE STUDY: SHARETHIS EXAMPLE
    • Regression tree:
    [Figure: fitted tree with splits on log(IMP) (< 8.349, < 9.33, >= 10.04, < 10.39, < 11.28, < 12.49), SD_PRICE (< 0.2604), and AVG_PRICE (>= 1.713, >= 1.247, < 0.8555); leaf predictions of log(CLK) range from 0.751 to 4.753]
  • 25. CASE STUDY: SHARETHIS EXAMPLE
    • Predicted vs. observed log of clicks:
    [Figure: scatter plot of predict(fit) against log(x$CLK)]
  • 26. CASE STUDY: COLOR DETECTION
    • Detect color from product images:
    [Figure: three plots, each with both axes running from -1.0 to 1.0]
  • 27. RESOURCES
    • Books:
      – G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning: with Applications in R, 2013
      – N. Matloff, The Art of R Programming: A Tour of Statistical Software Design, 2011
      – P. Teetor, R Cookbook (O'Reilly Cookbooks), 2011
    • R blog:
      – http://www.r-bloggers.com

Editor's notes

  1. Client Interview. Position the upcoming as introductory and a launching pad for further exploration. To get started, want to share a brief video that’s been helpful for our partners …