Introduction to Data Analysis with R
Dani Solà
What is R?
●   “R is a language and environment for statistical
    computing and graphics”
●   Paradigms: array, object-oriented, imperative,
    functional, procedural, reflective
●   Everything resides in memory (no big data)
●   Easy to get started!
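
As a rough illustration of how little code a first session needs, the sketch below touches the array and functional paradigms listed above. The vector `x` and the derived names are made up for the example.

```r
# A numeric vector: the basic array type in R (hypothetical sample values)
x <- c(1, 2, 3, 4, 5)

# Array paradigm: arithmetic is applied element-wise, with no explicit loop
squares <- x^2

# Functional paradigm: pass an anonymous function to sapply()
halves <- sapply(x, function(v) v / 2)

# Built-in statistics work directly on vectors
mean(x)
sd(x)
```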
Why R?
●   Free Software (GNU General Public License)
●   Mature: v1.0 was released in 2000
●   Widely used
●   Good documentation and manuals
●   Lots of freely available packages
●   Excellent graphic capabilities
Getting the data (CSV)
●   MySQL
        SELECT * INTO OUTFILE '/path/to/file.csv'
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
        ESCAPED BY '\\'
        LINES TERMINATED BY '\n'
        FROM table WHERE <condition>;

●   Hive + sed
     INSERT OVERWRITE LOCAL DIRECTORY '/tmp_path/'
     SELECT * FROM table
     WHERE <condition>;

     cat /tmp_path/* | sed 's/[Ctrl-V][Ctrl-A]/\t/g' > out.txt


●   Consider sampling!
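
Once a file has been exported, loading it into R and sampling rows could look like the sketch below. It generates its own temporary CSV so that it runs on its own. The column names `id` and `value`, the row counts, and the seed are assumptions made for illustration, not part of the original slides.

```r
# Hypothetical stand-in for an exported file, written to a temporary path
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:1000, value = rnorm(1000)),
          csv_path, row.names = FALSE)

# Load the comma-separated file into a data frame.
# A real MySQL or Hive export has no header row, so header = FALSE
# would be needed there; for tab-separated output use sep = "\t".
data <- read.csv(csv_path, header = TRUE)

# Draw a reproducible random sample of 100 rows without replacement
set.seed(42)
sampled <- data[sample(nrow(data), 100), ]

# Inspect the structure of the sample
str(sampled)
```

Sampling in R only helps once the full file fits in memory. For larger tables, sampling in the query itself is likely the better choice.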
Linear Regression
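$$y = \alpha + \beta x$$

$$\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\mathrm{Cov}[x, y]}{\mathrm{Var}[x]}$$

$$\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}$$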


Just use lm() in R!
 (But check the assumptions)
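
A minimal sketch of this slide, assuming simulated data: it computes the closed-form estimates for the slope and intercept by hand, fits the same model with `lm()`, and draws the standard diagnostic plots used to check the assumptions. The true coefficients (2 and 3), the noise level, and the variable names are all invented for the example.

```r
# Simulated data (hypothetical): y = 2 + 3x plus Gaussian noise
set.seed(1)
x <- runif(100, min = 0, max = 10)
y <- 2 + 3 * x + rnorm(100, sd = 1)

# Closed-form least-squares estimates: slope = Cov[x, y] / Var[x]
beta_hat  <- cov(x, y) / var(x)
alpha_hat <- mean(y) - beta_hat * mean(x)

# The same fit using lm()
fit <- lm(y ~ x)
coef(fit)

# Both approaches should agree up to floating-point error
all.equal(unname(coef(fit)), c(alpha_hat, beta_hat))

# Diagnostic plots for checking the assumptions:
# residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
par(mfrow = c(2, 2))
plot(fit)
```

`summary(fit)` additionally reports standard errors, p-values, and R-squared for the fitted model.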
Want more?
●   Computing for Data Analysis – Roger D. Peng
             www.coursera.org/course/compdata

●   Statistics One – Andrew Conway
               www.coursera.org/course/stats1

●   An Introduction to R – The R Core Team
      cran.r-project.org/doc/manuals/r-release/R-intro.pdf
