a Taste of R Programming
Kyle Akepanidtaworn,
LSESU Data Science
Society
About Me
• Founder and President of LSESU Data Science Society
(2016)
• A General Course Student at LSE studying Econ & Stats
• Former Big Data Intern at IMC Institute, Thailand
• Former Teaching Assistant at Wesleyan University
• Former Quantitative Consultant for the Connection in the
course “Intro to Statistical Consulting” at Wesleyan
• Programming & Stats Packages: R, Python, SPSS, SAS,
STATA
• Business Intelligence Tools: Tableau, QlikView
• LinkedIn: https://uk.linkedin.com/in/korkridakepan
7 Quick Facts about R
• R is the highest paid IT skill (Dice.com
survey, January 2014)
• R is the most-used data science language after
SQL (O'Reilly survey, January 2014)
• R is used by 70% of data miners (Rexer
survey, October 2013)
• R is #15 of all programming languages
(RedMonk language rankings, January
2014)
• R is growing faster than any other data
science language (KDNuggets survey,
August 2013)
• R is the #1 Google Search for Advanced
Analytics software (Google Trends, March
2014)
• R has more than 2 million users
worldwide (Oracle estimate, February
2012)
• http://blog.revolutionanalytics.com/2014/04/seven-quick-facts-about-r.html
What is R?
• Developed by Ross Ihaka and Robert
Gentleman (statisticians)
• First appeared Aug 1993; 23 years ago
• Some capabilities of R include:
Software development
Data analysis and visualization
Polling, surveys of data miners
Shiny application development
Writing project report
Creating the HTML presentation
Data Science War: R vs. Python
Source: Which #superheroe are you?(#batman Vs. #Superman) == (#R Vs. #Python)?
R vs Python vs SAS (Analytics Vidhya)
Experience the Power of R
Language!
Why Learn R?
Outstanding Graphs
Big Community!
Friendly to New Users and Non-programmers
Extremely Comprehensive
Flexible & Fun!
Open-source Language
Cross-Platform Compatibility
Advanced Statistical Language
Companies Using R
• Facebook - For behavior analysis related to status
updates and profile pictures.
• Google - For advertising effectiveness and economic
forecasting.
• Twitter - For data visualization and semantic
clustering.
• Microsoft - Acquired Revolution Analytics and uses
R for a variety of purposes.
• Uber - For statistical analysis.
• Airbnb - For scaling data science.
• IBM - Joined the R Consortium.
• ANZ - For credit risk modeling.
• HP
• Ford
• Novartis
• Roche
• New York Times - For data visualization.
• McKinsey
• BCG
• Bain
Installation Guide
1. Go to https://cran.r-project.org/ (The Comprehensive R Archive Network)
2. Choose the platform (Windows, macOS, or Linux) that suits you
3. Follow the installation instructions; nothing tricky here
4. Download RStudio, an integrated development environment (IDE) for R.
https://www.rstudio.com/
RStudio User Interface
Remark
I encourage everyone to follow the latest developments in R via the R-Bloggers,
CRAN, and RStudio websites. A large community of developers continually releases
tools that make analysis easier for R users.
A Transition from Microsoft Excel to R?
• Peter Flom, an independent statistical consultant for researchers in the behavioral, social, and
medical sciences, argues against the claim that Excel is an undervalued tool for data
analysis:
• Excel isn’t undervalued as a tool for statistical analysis. If anything, it’s overvalued as such a
tool.
• Most competent analysts do not use Excel, not because it’s too easy, but because other analytical
tools have more statistical capabilities.
• The default graphs in Excel are awful; other visualization tools outperform Microsoft Excel.
• Learning statistics in Excel sometimes gives a false impression of what data analysis involves.
Doing good statistics requires rigorous, intensive training.
• Excel cannot handle big data. If you are dealing with more than 1 million data points, you
need to turn to R or Python.
• Excel makes it harder than other programs to check the assumptions made in an analysis.
Migrate Microsoft Excel to R?
Teaching Outline
Chapter 1: R in Point-and-Click
• Rcommander
• Menu in Rcommander
• R vs. STATA vs. IBM SPSS
• Why is Coding Critical?
Chapter 2: Basics of R Programming
• Basic and Complex Numerical Operations
• R Basic Data Types
◦ Numeric
◦ Integer
◦ Complex
◦ Logical
◦ Character
Chapter 2: Basics of R (Cont'd)
• Matrix
• Vector
• List
• Data Frame
• For-Loop
• Writing your functions
Chapter 3: R for Data Science
• Using External Data
• Exploratory Data Analysis (EDA)
• Predictive Modelling
◦ Linear Regression
◦ Classification
◦ Clustering
R Commander
Basics of R in a non-programming way
R Commander
• Enables analysts to access a
selection of commonly-used R
commands.
• Serves the important role of helping
users to implement R commands
and develop their knowledge and
expertise in using the command line.
• Comes with a number of plugins
available that provide direct access
to R packages.
The Complete Menu Trees: Rcmdr (i)
The Complete Menu Trees: Rcmdr (ii)
The Complete Menu Trees: Rcmdr (iii)
The Complete Menu Trees: Rcmdr (iv)
Comparing the statistical capabilities of
software packages
• A statistical consultant known only as "Stanford PhD" has put together a table comparing the
statistical capabilities of the software packages R, Matlab, SAS, Stata and SPSS.
• For each of 57 methods (including techniques like "ridge regression", "survival analysis",
"optimization") the author ranks the capabilities of each software package as "Yes" (fully
supported), "Limited" or "Experimental".
• R and Matlab capabilities outperform those of SAS, STATA, and SPSS.
• Python, to the best of my knowledge, is not rich in statistical testing functions, so it lies
somewhere between R and SAS.
Package   Methods supported (of 57)
R         57
Matlab    57
SAS       42
Stata     29
SPSS      20
Should economists learn programming?
Of course! As Keynes said: "The master-economist must possess a rare combination of gifts ....
He must be mathematician, historian, statesman, programmer, philosopher -- in some degree.
He must understand symbols, write code, and speak in words. He must contemplate the
particular, in terms of the general, and touch abstract and concrete in the same flight of thought.
He must study the present in the light of the past for the purposes of the future. He must be able
to speak a common language with a computer scientist, a physicist and a sociologist. No part of
man's nature or his institutions must be entirely outside his regard. He must be purposeful and
disinterested in a simultaneous mood, as aloof and incorruptible as an artist, yet sometimes as
near to earth as a politician.”
--- Alex Teytelboym, Research Fellow in Economics at INET, University of Oxford
Why Coding? (I)
• As a social science student at LSE, managing, analyzing, and playing with data is an
important part of your work. (charts, curves, and trends etc.)
• Without programming skills, your work becomes more limited.
• Are you always relying upon manual calculations?
• Are you hand-collecting the data when you can write the code to easily retrieve data?
• Are you working with big data? Do you think Excel will solve all data problems?
• With code, you can multiply by a huge factor the amount of work or calculations
you can perform, read millions of rows of data, try and find patterns or relations, compare
oil prices to Reddit traffic, or the natality rate to the average interest earned by investors in
Wyoming; whatever you can think of in a matter of minutes or hours and unleash your
imagination.
Why Coding? (II)
• Much experimental work requires you to pull, clean, and manipulate large sets of available
and incoming data to run experiments based on some economic question you're testing.
• If you can write code to do these tasks quickly and efficiently, you can iterate quickly
through a lot more hypotheses you might want to test.
• The cutting edge of economic research uses novel datasets and combines both theory
and empirics.
• Everyone in this room has different expectations of what they want to be able to do with
data. In social science, analytics is very important, while computer programming with
C++, Java, and similar languages is hardly necessary.
Everybody in this country
should learn to program
a computer… because it
teaches you how to think
- Steve Jobs, Co-Founder and CEO of Apple Inc. (1995 -2011)
R Quintessence
The most disastrous thing that you can ever learn is
your first programming language – Alan Kay
R Arithmetic and Logical Operators
Hey! Note that R is case
sensitive!
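The operators named on this slide can be tried directly at the R console. The following is a minimal sketch rather than the workshop's own material; the variable names `total` and `Total` are made up here purely to illustrate case sensitivity.

```r
# Arithmetic operators
7 + 2     # addition: 9
7 - 2     # subtraction: 5
7 * 2     # multiplication: 14
7 / 2     # division: 3.5
7 ^ 2     # exponentiation: 49
7 %% 2    # modulus (remainder): 1
7 %/% 2   # integer division: 3

# Comparison and logical operators
7 > 2               # TRUE
7 == 2              # FALSE
(7 > 2) & (2 > 7)   # AND: FALSE
(7 > 2) | (2 > 7)   # OR: TRUE
!TRUE               # NOT: FALSE

# Case sensitivity: `total` and `Total` are two different objects
total <- 10
Total <- 20
total == Total      # FALSE
```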
R Basic Data Types
• Numeric
• Integer
• Complex
• Logical
• Character
Constants, variables and data types – BBC BiteSize Programming Course GCSE
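As a quick illustration of the five basic types listed above, the sketch below creates one value of each and asks R for its class. The object names (`x_numeric` and so on) are invented for this example.

```r
x_numeric   <- 3.14      # numeric (double precision)
x_integer   <- 42L       # integer: note the L suffix
x_complex   <- 1 + 2i    # complex
x_logical   <- TRUE      # logical
x_character <- "LSE"     # character (string)

class(x_numeric)     # "numeric"
class(x_integer)     # "integer"
class(x_complex)     # "complex"
class(x_logical)     # "logical"
class(x_character)   # "character"

# Types can be converted explicitly
as.integer("7") + 1L   # 8
is.numeric(x_integer)  # TRUE: integers also count as numeric
```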
R Data Structures (I)
Figure: R Data Structures (AmazonS3 Website)
R Data Structures (II)
R Data Structures (III)
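The figures for these slides are not reproduced here, so the following sketch shows the four structures from the teaching outline (vector, matrix, list, data frame). The student names and scores are made-up illustration values, not workshop data.

```r
# Vector: values of a single type
v <- c(2, 4, 6)

# Matrix: two-dimensional, single type, filled column by column
m <- matrix(1:6, nrow = 2, ncol = 3)

# List: elements may have different types and lengths
l <- list(name = "R", year = 1993, tags = c("stats", "graphics"))

# Data frame: a table whose columns may have different types
df <- data.frame(
  student = c("Ann", "Ben", "Cy"),
  score   = c(71, 65, 80)
)

v[2]             # 4
m[2, 3]          # 6 (row 2, column 3)
dim(m)           # 2 3
l$year           # 1993
mean(df$score)   # 72
```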
What are “Loops”?
• “Looping”, “cycling”, “iterating” or just replicating instructions is an old practice that
originated well before the invention of computers. It is nothing more than automating a
multi-step process by organizing sequences of actions or ‘batch’ processes and by
grouping the parts that need to be repeated.
• All modern programming languages provide special constructs that allow for the repetition
of instructions or blocks of instructions.
• Broadly speaking, there are two types of these special constructs or loops in modern
programming languages. Some loops execute for a prescribed number of times, as
controlled by a counter or an index, incremented at each iteration cycle. These are part of
the for loop family.
• On the other hand, some loops are based on the onset and verification of a logical
condition. The condition is tested at the start or the end of the loop construct. These
variants belong to the while or repeat family of loops, respectively.
Looping in R
Figure: a Tutorial on Loops in R (DataCamp)
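To make the three loop families described above concrete, here is a small sketch with one example of each. The variable names are arbitrary choices for this illustration.

```r
# for: runs a prescribed number of times, controlled by an index
running_sum <- 0
for (i in 1:5) {
  running_sum <- running_sum + i
}
running_sum   # 15

# while: the condition is tested at the start of each pass
n <- 1
while (n < 100) {
  n <- n * 2
}
n   # 128

# repeat: runs until an explicit break, here checked at the end of each pass
count <- 0
repeat {
  count <- count + 1
  if (count >= 3) break
}
count   # 3
```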
Functions (I)
• Functions are used to logically break our code into simpler parts which become easy to
maintain and understand.
• It's pretty straightforward to create your own function in R programming.
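As a minimal sketch of what creating a function looks like, the example below defines a hypothetical helper, `standardise()`, which is not part of base R or of the workshop file. It rescales a numeric vector to mean 0 and standard deviation 1.

```r
# Hypothetical helper: subtract the mean, then divide by the standard deviation
standardise <- function(x, na.rm = TRUE) {
  (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
}

z <- standardise(c(2, 4, 6))
z   # -1 0 1
```

The last expression evaluated in the body is returned automatically, so an explicit `return()` is optional here.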
Functions (II)
• Conceptually, given some inputs of x, we perform some computation to get the new
output.
• Some commonly known functions are mean, median, square root and summation etc.
Functions (III)
• Fortunately, R provides some built-in functions that are widely used in mathematics and statistics:
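The original slide showed these as a figure; a few of the commonly used built-ins can be sketched as follows, using a small made-up vector `x`.

```r
x <- c(4, 9, 16, 25)

mean(x)           # 13.5
median(x)         # 12.5
sum(x)            # 54
sqrt(x)           # 2 3 4 5
max(x) - min(x)   # range width: 21
sd(x)             # sample standard deviation
length(x)         # 4
```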
Functions (IV)
Functions (V)
• Writing functions should spring to
mind when you want to reuse your own
chunk of code and automate a task
easily.
• Creating your own functions takes some
imagination and efficient coding skills.
• Please revisit the workshop file for
examples in action!
• Peace of mind: a vast community of R
developers around the world collaborates
in providing useful R packages, which
saves us a lot of time and effort.
• As you progress in data analysis with R,
finding the right packages may provide a
shortcut for your research project.
Top 10 R Packages
Packages are collections of R functions, data, and
compiled code in a well-defined format. The directory
where packages are stored is called the library.
- Quick-R
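A rough sketch of the usual package workflow follows. The install step needs an internet connection, so it is left commented out, and `ggplot2` is used only as an example of a CRAN package name.

```r
# The library: directories where installed packages are stored
.libPaths()

# One-time installation from CRAN (requires internet access):
# install.packages("ggplot2")

# Attach a package for the current session; stats ships with R itself
library(stats)

# Check whether a package is installed without attaching it
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)
has_ggplot2
```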
R for Data Science
Without data, you’re just another person with an opinion
How to Import the Data
Importing your data into R – R Tutorials by R-Bloggers
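One common way to bring external data into R is `read.csv()`. So that the sketch runs anywhere, it first writes a tiny CSV to a temporary file; the file contents (countries and growth figures) are invented placeholders, not real statistics.

```r
# Write a small made-up CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("country,gdp_growth",
    "UK,1.8",
    "Thailand,3.2"),
  csv_path
)

# Read it back into a data frame
growth <- read.csv(csv_path, stringsAsFactors = FALSE)

str(growth)    # structure: one character column, one numeric column
head(growth)   # first rows
nrow(growth)   # 2
```

With your own data, replace `csv_path` with the path to your file.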
“The simple graph has brought more information to the
data analyst’s mind than any other device.”
- John Tukey
R for Data Science: Data Visualization
• R has several systems for making
graphs, but ggplot2 is one of the most
elegant and most versatile. ggplot2
implements the grammar of graphics,
a coherent system for describing and
building graphs. With ggplot2, you can
do more faster by learning one system
and applying it in many places.
• If you’d like to learn more about the
theoretical underpinnings of ggplot2
before you start, I’d recommend
reading “The Layered Grammar of
Graphics”,
http://vita.had.co.nz/papers/layered-grammar.pdf.
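To give a feel for the layered grammar, here is a minimal sketch that builds a scatter plot from R's built-in `mtcars` data. It assumes ggplot2 is installed and skips the plot otherwise; the choice of dataset and variables is ours, not the workshop's.

```r
# Assumes the ggplot2 package is installed; otherwise p stays NULL
p <- NULL
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = wt, y = mpg)) +                    # data and aesthetic mapping
    geom_point() +                                               # layer 1: points
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +    # layer 2: fitted line
    labs(x = "Weight (1000 lbs)", y = "Miles per gallon")        # axis labels
}

# Draw the plot if it was built
if (!is.null(p)) print(p)
```

Each `+` adds a layer or a setting to the same plot object, which is what lets one system cover many kinds of graph.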
Examples of R Visualizations (I)
Examples of R
Visualizations
(II)
Examples of R Visualizations (III)
“Avoiding Chartjunk”, Tufte
Storytelling with data –
a data visualization guide for business
professionals
Simple Framework of Machine Learning
Statistical Modelling: Linear Regression
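The slide's figure is not available here, so as a minimal sketch, a linear regression can be fitted with base R's `lm()`. The built-in `mtcars` dataset and the choice of `mpg` against `wt` are assumptions made for illustration only.

```r
# Fit fuel efficiency (mpg) as a linear function of car weight (wt)
fit <- lm(mpg ~ wt, data = mtcars)

coef(fit)                 # intercept and slope
summary(fit)$r.squared    # share of variance explained

# Predict mpg for a hypothetical car weighing 3,000 lbs (wt is in 1000 lbs)
pred <- predict(fit, newdata = data.frame(wt = 3))
pred
```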
Statistical Modelling: Logistic Regression
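As a sketch of the same idea for a binary outcome, `glm()` with a binomial family fits a logistic regression. Again the built-in `mtcars` data is an assumption for illustration: `am` is 1 for manual and 0 for automatic transmission.

```r
# Model the probability of a manual transmission as a function of weight
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial)

coef(logit_fit)   # coefficients on the log-odds scale

# Predicted probability for a hypothetical car with wt = 3
prob <- predict(logit_fit, newdata = data.frame(wt = 3), type = "response")
prob
```

Setting `type = "response"` returns a probability between 0 and 1 instead of log-odds.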
Clustering Algorithms
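One widely used clustering algorithm is k-means, available in base R as `kmeans()`. The sketch below is an illustration on the built-in `iris` data with three clusters; the dataset, the seed, and the number of clusters are all choices made for this example.

```r
set.seed(42)                      # k-means starts from random centres

features <- scale(iris[, 1:4])    # standardise the four numeric measurements
km <- kmeans(features, centers = 3, nstart = 20)

# Compare the discovered clusters with the known species labels
table(km$cluster, iris$Species)
```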
Machine Learning Workflow
RStudio Tips and Tricks
These are not exactly coding tricks, but rather ways to make your life easier using key
commands.
• The up arrow on your keyboard will allow you to scroll up through your past commands
• The tab key on your keyboard will help you (particularly in RStudio) by offering ways to
finish your code.
• When working within a .R or .Rmd file, you can put your cursor on a line and hit Ctrl +
Enter to get the code to execute in the Console. (On a Mac, Command + Enter.)
• If you get stuck with some syntax (usually, mismatched parentheses or quotes), the R
Console will change from the > at the beginning of the line (which means it is waiting for a
new command) to the + at the beginning of the line (which means it is waiting for you to
finish a command). To get out, hit the Escape key.
Tearable Panes
Tearable panes are anything but terrible. This feature allows users to tear off data view
panes and source panes facilitating the use of multiple screens.
Command History
In the console it is possible to scroll through the command history by pressing Ctrl/Cmd and ↑.
The command history will be filtered as code is typed into the console:
History Pane
The history pane shows a searchable list of commands that have been run. Commands can
be written to the source pane or the console. No more copy and paste from the console to a
script!
Rename in Scope
This feature makes it easy to rename all instances of a variable. The tool is context aware;
changing ‘m’ to ‘m1’ won’t change ‘mtcars’ to ‘m1tcars’.
Gallery and Satellite View in Notebooks
A new feature built into R Notebooks: a code chunk that produces multiple plots will produce
a gallery. The plots can be viewed by toggling between thumbnails. The gallery can be
expanded into a new satellite window for closer inspection.
Code Outline
Save time scrolling with the code outline. This feature works for R Notebooks and traditional
R scripts. In R Notebooks sections are delimited by the R Markdown headers. In R scripts
sections are delimited by section comments (Try Code -> Insert Section).
Code Snippets
Code snippets are a shortcut to insert common boilerplate code. For instance, type fun and
then Tab to insert the skeleton code for a function definition. Then hit Tab to replace the
necessary components. In addition to a rich set of defaults, custom code snippets can also
be created.
File Navigation
Many people know of RStudio’s rich set of tab complete options for functions and function
arguments. Tab complete can also help find files and remove the hassle of writing out long
path locations. Hit Tab in between two double quotes ("") to open a file explorer.
Jump To Function Definition
Want to dig into the innards of a function? With the cursor on a function press F2 to jump to
the function definition, even for functions in a package.
Thanks!
ANY QUESTIONS?
You can find me at
@Korkrid Ake
datasciencesoc@lsesu.org
Further Resources
R Language
• Advanced R: http://adv-r.had.co.nz/
• Bioconductor: https://www.bioconductor.org/
• CRANberries: http://dirk.eddelbuettel.com/cranberries/
• MRAN: https://mran.revolutionanalytics.com/
• rOpenSci: https://ropensci.org/
• R Project: https://www.r-project.org/
• The R Journal: https://journal.r-project.org/
R Community
• R Consortium: https://www.r-consortium.org/
• R Weekly: https://rweekly.org/
Data Sciences
• Apache Hadoop: http://hadoop.apache.org/
• KDnuggets: http://www.kdnuggets.com/
• R for Data Science: http://r4ds.had.co.nz/
• sparklyr: http://spark.rstudio.com/
• SparkR: https://spark.apache.org/docs/latest/sparkr.html
• Tessera: http://tessera.io/
Blogs
• RStudio Blog: https://blog.rstudio.org/
• BLOGR: https://drsimonj.svbtle.com/
• Mad (Data) Scientist: https://matloff.wordpress.com/
• R Bloggers: https://www.r-bloggers.com/
• R Consortium Blog: https://www.r-consortium.org/news/blog
• Revolutions Blog: http://blog.revolutionanalytics.com
• rOpenSci Blog: https://ropensci.org/blog/
• Simply Statistics: http://simplystatistics.org/
• Statistical Modeling, Causal Inference, and Social Science: http://andrewgelman.com/
• StatsBlogs: http://www.statsblogs.com/
• Win-Vector Blog: http://www.win-vector.com/blog/
Statistics
• Journal of Statistical Software: https://www.jstatsoft.org/index
• Forecasting: principles and practice: https://www.otexts.org/fpp
• From Algorithms to Z-Scores: http://heather.cs.ucdavis.edu/~matloff/132/PLN/ProbStatBookW16ECS132.pdf
• Statistical Foundations of Machine Learning: https://www.otexts.org/book/sfml
• The Elements of Statistical Learning: http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
DataCamp: Learning R by Doing
Udemy R Courses
• Another provider is Udemy. While they do not offer video plus interactive sessions like DataCamp,
they do offer extensive video lessons covering other topics in using R and learning statistics:
• The Comprehensive Programming in R Course (25 Hours of video)
• Graphs in R (ggplot2, plotrix, base R) – Data Visualization with R Programming Language (5 Hours of video)
• Linear Mixed-Effects Models with R (11 Hours of video)
• Multivariate Data Visualization with R (7 Hours of video)
• Applied Multivariate Analysis with R (13 Hours of video)
• More Data Mining with R (11 Hours of video)
• Text Mining, Scraping and Sentiment Analysis with R (4 Hours of video)
• R Programming for Simulation and Monte Carlo Methods (12 Hours of video)
• Programming Statistical Applications in R (12 Hours of video)
• Comprehensive Linear Modeling with R (15 Hours of video)
• Bayesian Computational Analyses with R (12 Hours of video)
• Time Series Analysis and Forecasting in R (3 Hours of video)

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

R Programming
R ProgrammingR Programming
R Programming
 
R programming presentation
R programming presentationR programming presentation
R programming presentation
 
Class ppt intro to r
Class ppt intro to rClass ppt intro to r
Class ppt intro to r
 
R programming Language , Rahul Singh
R programming Language , Rahul SinghR programming Language , Rahul Singh
R programming Language , Rahul Singh
 
R programming
R programmingR programming
R programming
 
Why R? A Brief Introduction to the Open Source Statistics Platform
Why R? A Brief Introduction to the Open Source Statistics PlatformWhy R? A Brief Introduction to the Open Source Statistics Platform
Why R? A Brief Introduction to the Open Source Statistics Platform
 
R programming
R programmingR programming
R programming
 
R programming
R programmingR programming
R programming
 
R programming groundup-basic-section-i
R programming groundup-basic-section-iR programming groundup-basic-section-i
R programming groundup-basic-section-i
 
R programming
R programmingR programming
R programming
 
R language
R languageR language
R language
 
Introduction to data analysis using R
Introduction to data analysis using RIntroduction to data analysis using R
Introduction to data analysis using R
 
2 it unit-1 start learning r
2 it   unit-1 start learning r2 it   unit-1 start learning r
2 it unit-1 start learning r
 
How to get started with R programming
How to get started with R programmingHow to get started with R programming
How to get started with R programming
 
Introduction to statistical software R
Introduction to statistical software RIntroduction to statistical software R
Introduction to statistical software R
 
1 R Tutorial Introduction
1 R Tutorial Introduction1 R Tutorial Introduction
1 R Tutorial Introduction
 
R programming
R programmingR programming
R programming
 
Introducing The R Software
Introducing The R Software  Introducing The R Software
Introducing The R Software
 
R for data analytics
R for data analyticsR for data analytics
R for data analytics
 
Weka tutorial
Weka tutorialWeka tutorial
Weka tutorial
 

Andere mochten auch

Computational Biology and Bioinformatics
Computational Biology and BioinformaticsComputational Biology and Bioinformatics
Computational Biology and BioinformaticsSharif Shuvo
 
IBM - Big Value from Big Data
IBM - Big Value from Big DataIBM - Big Value from Big Data
IBM - Big Value from Big DataWilfried Hoge
 
Systems biology: Bioinformatics on complete biological system
Systems biology: Bioinformatics on complete biological systemSystems biology: Bioinformatics on complete biological system
Systems biology: Bioinformatics on complete biological systemLars Juhl Jensen
 
Computational Systems Biology (JCSB)
Computational Systems Biology (JCSB)Computational Systems Biology (JCSB)
Computational Systems Biology (JCSB)Annex Publishers
 
Data Scientist - The Sexiest Job of the 21st Century?
Data Scientist - The Sexiest Job of the 21st Century?Data Scientist - The Sexiest Job of the 21st Century?
Data Scientist - The Sexiest Job of the 21st Century?IoT User Group Hamburg
 
Tutorial 1: Your First Science App - Araport Developer Workshop
Tutorial 1: Your First Science App - Araport Developer WorkshopTutorial 1: Your First Science App - Araport Developer Workshop
Tutorial 1: Your First Science App - Araport Developer WorkshopVivek Krishnakumar
 
Day in the Life of a Computer Scientist
Day in the Life of a Computer ScientistDay in the Life of a Computer Scientist
Day in the Life of a Computer ScientistJustin Brunelle
 
Apps for Science - Elsevier Developer Network Workshop 201102
Apps for Science - Elsevier Developer Network Workshop 201102Apps for Science - Elsevier Developer Network Workshop 201102
Apps for Science - Elsevier Developer Network Workshop 201102remko caprio
 
Jupyter, A Platform for Data Science at Scale
Jupyter, A Platform for Data Science at ScaleJupyter, A Platform for Data Science at Scale
Jupyter, A Platform for Data Science at ScaleMatthias Bussonnier
 
Do you know what k-Means? Cluster-Analysen
Do you know what k-Means? Cluster-Analysen Do you know what k-Means? Cluster-Analysen
Do you know what k-Means? Cluster-Analysen Harald Erb
 
Systems biology - Understanding biology at the systems level
Systems biology - Understanding biology at the systems levelSystems biology - Understanding biology at the systems level
Systems biology - Understanding biology at the systems levelLars Juhl Jensen
 
Analytics meets Big Data – R/Python auf der Hadoop/Spark-Plattform
Analytics meets Big Data – R/Python auf der Hadoop/Spark-PlattformAnalytics meets Big Data – R/Python auf der Hadoop/Spark-Plattform
Analytics meets Big Data – R/Python auf der Hadoop/Spark-PlattformRising Media Ltd.
 
Alan Turing Scientist Unlimited | Turing100@Persistent Systems
Alan Turing Scientist Unlimited | Turing100@Persistent SystemsAlan Turing Scientist Unlimited | Turing100@Persistent Systems
Alan Turing Scientist Unlimited | Turing100@Persistent SystemsPersistent Systems Ltd.
 
System biology and its tools
System biology and its toolsSystem biology and its tools
System biology and its toolsGaurav Diwakar
 
PO WER - XX LO Gdańsk - Alan Turing
PO WER - XX LO Gdańsk - Alan TuringPO WER - XX LO Gdańsk - Alan Turing
PO WER - XX LO Gdańsk - Alan TuringAgnieszka J.
 
Multi-omics infrastructure and data for R/Bioconductor
Multi-omics infrastructure and data for R/BioconductorMulti-omics infrastructure and data for R/Bioconductor
Multi-omics infrastructure and data for R/BioconductorLevi Waldron
 

Andere mochten auch (20)

Computational Biology and Bioinformatics
Computational Biology and BioinformaticsComputational Biology and Bioinformatics
Computational Biology and Bioinformatics
 
IBM - Big Value from Big Data
IBM - Big Value from Big DataIBM - Big Value from Big Data
IBM - Big Value from Big Data
 
Systems biology: Bioinformatics on complete biological system
Systems biology: Bioinformatics on complete biological systemSystems biology: Bioinformatics on complete biological system
Systems biology: Bioinformatics on complete biological system
 
Computational Systems Biology (JCSB)
Computational Systems Biology (JCSB)Computational Systems Biology (JCSB)
Computational Systems Biology (JCSB)
 
Data Scientist - The Sexiest Job of the 21st Century?
Data Scientist - The Sexiest Job of the 21st Century?Data Scientist - The Sexiest Job of the 21st Century?
Data Scientist - The Sexiest Job of the 21st Century?
 
Tutorial 1: Your First Science App - Araport Developer Workshop
Tutorial 1: Your First Science App - Araport Developer WorkshopTutorial 1: Your First Science App - Araport Developer Workshop
Tutorial 1: Your First Science App - Araport Developer Workshop
 
Day in the Life of a Computer Scientist
Day in the Life of a Computer ScientistDay in the Life of a Computer Scientist
Day in the Life of a Computer Scientist
 
Apps for Science - Elsevier Developer Network Workshop 201102
Apps for Science - Elsevier Developer Network Workshop 201102Apps for Science - Elsevier Developer Network Workshop 201102
Apps for Science - Elsevier Developer Network Workshop 201102
 
Python for Data Science
Python for Data SciencePython for Data Science
Python for Data Science
 
Jupyter, A Platform for Data Science at Scale
Jupyter, A Platform for Data Science at ScaleJupyter, A Platform for Data Science at Scale
Jupyter, A Platform for Data Science at Scale
 
Zwischen Browser, Code & Photoshop - aus dem Leben eines Webworkers
Zwischen Browser, Code & Photoshop - aus dem Leben eines WebworkersZwischen Browser, Code & Photoshop - aus dem Leben eines Webworkers
Zwischen Browser, Code & Photoshop - aus dem Leben eines Webworkers
 
Do you know what k-Means? Cluster-Analysen
Do you know what k-Means? Cluster-Analysen Do you know what k-Means? Cluster-Analysen
Do you know what k-Means? Cluster-Analysen
 
Systems biology - Understanding biology at the systems level
Systems biology - Understanding biology at the systems levelSystems biology - Understanding biology at the systems level
Systems biology - Understanding biology at the systems level
 
Analytics meets Big Data – R/Python auf der Hadoop/Spark-Plattform
Analytics meets Big Data – R/Python auf der Hadoop/Spark-PlattformAnalytics meets Big Data – R/Python auf der Hadoop/Spark-Plattform
Analytics meets Big Data – R/Python auf der Hadoop/Spark-Plattform
 
Donald Knuth
Donald KnuthDonald Knuth
Donald Knuth
 
Job ppt1
Job ppt1Job ppt1
Job ppt1
 
Alan Turing Scientist Unlimited | Turing100@Persistent Systems
Alan Turing Scientist Unlimited | Turing100@Persistent SystemsAlan Turing Scientist Unlimited | Turing100@Persistent Systems
Alan Turing Scientist Unlimited | Turing100@Persistent Systems
 
System biology and its tools
System biology and its toolsSystem biology and its tools
System biology and its tools
 
PO WER - XX LO Gdańsk - Alan Turing
PO WER - XX LO Gdańsk - Alan TuringPO WER - XX LO Gdańsk - Alan Turing
PO WER - XX LO Gdańsk - Alan Turing
 
Multi-omics infrastructure and data for R/Bioconductor
Multi-omics infrastructure and data for R/BioconductorMulti-omics infrastructure and data for R/Bioconductor
Multi-omics infrastructure and data for R/Bioconductor
 

Ähnlich wie LSESU a Taste of R Language Workshop

An introduction to R is a document useful
An introduction to R is a document usefulAn introduction to R is a document useful
An introduction to R is a document usefulssuser3c3f88
 
Data Engineer vs Data Scientist vs Data Analyst.pptx
Data Engineer vs Data Scientist vs Data Analyst.pptxData Engineer vs Data Scientist vs Data Analyst.pptx
Data Engineer vs Data Scientist vs Data Analyst.pptxCarolineRebeccaD
 
Big data analytics with R tool.pptx
Big data analytics with R tool.pptxBig data analytics with R tool.pptx
Big data analytics with R tool.pptxsalutiontechnology
 
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...Rohit Dubey
 
2019 DSA 105 Introduction to Data Science Week 4
2019 DSA 105 Introduction to Data Science Week 42019 DSA 105 Introduction to Data Science Week 4
2019 DSA 105 Introduction to Data Science Week 4Ferdin Joe John Joseph PhD
 
In-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionIn-Database Analytics Deep Dive with Teradata and Revolution
In-Database Analytics Deep Dive with Teradata and RevolutionRevolution Analytics
 
An R primer for SQL folks
An R primer for SQL folksAn R primer for SQL folks
An R primer for SQL folksThomas Hütter
 
Introduction to basic statistics
Introduction to basic statisticsIntroduction to basic statistics
Introduction to basic statisticsIBM
 
Introduction to Data Science - Week 4 - Tools and Technologies in Data Science
Introduction to Data Science - Week 4 - Tools and Technologies in Data ScienceIntroduction to Data Science - Week 4 - Tools and Technologies in Data Science
Introduction to Data Science - Week 4 - Tools and Technologies in Data ScienceFerdin Joe John Joseph PhD
 
Demystifying Data Science
Demystifying Data Science Demystifying Data Science
Demystifying Data Science Venkat Raman
 
DataMind: An e-learning platform for Data Analysis based on R. RBelgium meetu...
DataMind: An e-learning platform for Data Analysis based on R. RBelgium meetu...DataMind: An e-learning platform for Data Analysis based on R. RBelgium meetu...
DataMind: An e-learning platform for Data Analysis based on R. RBelgium meetu...DataMind-slides
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAjaved75
 
microsoft r server for distributed computing
microsoft r server for distributed computingmicrosoft r server for distributed computing
microsoft r server for distributed computingBAINIDA
 
Introduction to Computational Statistics
Introduction to Computational StatisticsIntroduction to Computational Statistics
Introduction to Computational StatisticsSetia Pramana
 
Learn Business Analytics with R at edureka!
Learn Business Analytics with R at edureka!Learn Business Analytics with R at edureka!
Learn Business Analytics with R at edureka!Edureka!
 
Data science presentation
Data science presentationData science presentation
Data science presentationMSDEVMTL
 
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Ali Alkan
 
Basic of python for data analysis
Basic of python for data analysisBasic of python for data analysis
Basic of python for data analysisPramod Toraskar
 

LSESU a Taste of R Language Workshop

  • 1. a Taste of R Programming Kyle Akepanidtaworn, LSESU Data Science Society
  • 2. About Me • Founder and President of LSESU Data Science Society (2016) • A General Course Student at LSE studying Econ & Stats • Former Big Data Intern at IMC Institute, Thailand • Former Teaching Assistant at Wesleyan University • Former Quantitative Consultant for the Connection in the course “Intro to Statistical Consulting” at Wesleyan • Programming & Stats Packages: R, Python, SPSS, SAS, Stata • Business Intelligence Tools: Tableau, QlikView • LinkedIn: https://uk.linkedin.com/in/korkridakepan
  • 3. 7 Quick Facts about R • R is the highest paid IT skill (Dice.com survey, January 2014) • R is the most-used data science language after SQL (O'Reilly survey, January 2014) • R is used by 70% of data miners (Rexer survey, October 2013) • R is #15 of all programming languages (RedMonk language rankings, January 2014) • R is growing faster than any other data science language (KDNuggets survey, August 2013) • R is the #1 Google Search for Advanced Analytics software (Google Trends, March 2014) • R has more than 2 million users worldwide (Oracle estimate, February 2012) • http://blog.revolutionanalytics.com/2014/04/seven-quick-facts-about-r.html
  • 4. What is R? • Developed by Ross Ihaka and Robert Gentleman (statisticians) • First appeared Aug 1993; 23 years ago • Some capabilities of R include: software development; data analysis and visualization; polling and surveys of data miners; Shiny application development; writing project reports; creating HTML presentations
  • 5. Data Science War: R vs. Python Source: Which #superheroe are you?(#batman Vs. #Superman) == (#R Vs. #Python)?
  • 6. R vs Python vs SAS (Analytics Vidhya)
  • 7. Experience the Power of R Language!
  • 8. Why Learn R? • Outstanding Graphs • Big Community! • Friendly to New Users and Non-programmers • Extremely Comprehensive • Flexible & Fun! • Open-source Language • Cross-Platform Compatibility • Advanced Statistical Language
  • 9. Companies Using R • Facebook - For behavior analysis related to status updates and profile pictures. • Google - For advertising effectiveness and economic forecasting. • Twitter - For data visualization and semantic clustering. • Microsoft - Acquired Revolution Analytics (the company behind Revolution R) and uses R for a variety of purposes. • Uber - For statistical analysis. • Airbnb - To scale data science. • IBM - Joined the R Consortium. • ANZ - For credit risk modeling. • HP • Ford • Novartis • Roche • New York Times - For data visualization. • McKinsey • BCG • Bain
  • 10. Installation Guide 1. Go to https://cran.r-project.org/ (The Comprehensive R Archive Network). 2. Choose the platform (either Windows or Mac) that suits you. 3. Follow the installation instructions; nothing tricky here. 4. Download RStudio, a user interface for R programming, from https://www.rstudio.com/
  • 11. RStudio User Interface
  • 12. Remark I encourage everyone to follow the latest developments in R programming via the R-Bloggers, CRAN, and RStudio websites. A large community of developers keeps making analysis tasks easier for R users.
  • 13. A Transition from Microsoft Excel to R? • Peter Flom, an independent statistical consultant for researchers in the behavioral, social and medical sciences, gives a compelling answer to the question of whether Excel is an undervalued tool for data analysis. • Excel isn’t undervalued as a tool for statistical analysis. If anything, it’s overvalued as such a tool. • Most competent analysts do not use Excel, not because it’s too easy, but because other analytical tools have more statistical capabilities. • The default graphs in Excel are awful; other visualization tools outperform Microsoft Excel. • Learning statistics in Excel sometimes gives a misleading idea of what data analysis involves. Doing good statistics requires rigorous, intensive training. • Excel cannot handle big data. If you are dealing with more than 1 million data points, you need to seek help from R or Python. • Excel makes it harder than other programs to check the assumptions made in an analysis.
  • 15. Teaching Outline Chapter 1: R in Point-and-Click • Rcommander • Menu in Rcommander • R vs. STATA vs. IBM SPSS • Why is Coding Critical? Chapter 2: Basics of R Programming • Basic and Complex Numerical Operations • R Basic Data Types (Numeric, Integer, Complex, Logical, Character) Chapter 2: Basics of R (Cont’d) • Matrix • Vector • List • Data Frame • For-Loop • Writing your functions Chapter 3: R for Data Science • Using External Data • Exploratory Data Analysis (EDA) • Predictive Modelling (Linear Regression, Classification, Clustering)
  • 16. RCommander: Basics in R in a non-programming way
  • 17. R Commander • Enables analysts to access a selection of commonly-used R commands. • Serves the important role of helping users to implement R commands and develop their knowledge and expertise in using the command line. • Comes with a number of plugins available that provide direct access to R packages.
  • 18. The Complete Menu Trees: Rcmdr (i)
  • 19. The Complete Menu Trees: Rcmdr (ii)
  • 20. The Complete Menu Trees: Rcmdr (iii)
  • 21. The Complete Menu Trees: Rcmdr (iv)
  • 22. Comparing the statistical capabilities of software packages • A statistical consultant known only as "Stanford PhD" has put together a table comparing the statistical capabilities of the software packages R, Matlab, SAS, Stata and SPSS.
  • 23. Comparing the statistical capabilities of software packages • For each of 57 methods (including techniques like "ridge regression", "survival analysis", "optimization") the author ranks the capabilities of each software package as "Yes" (fully supported), "Limited" or "Experimental". • R and Matlab capabilities outperform those of SAS, STATA, and SPSS. • Python, to the best of my knowledge, is not rich in statistical testing functions, so it lies somewhere between R and SAS. • Counts by package: R 57, Matlab 57, SAS 42, Stata 29, SPSS 20
  • 24. Should economists learn programming? Of course! As Keynes said: "The master-economist must possess a rare combination of gifts .... He must be mathematician, historian, statesman, programmer, philosopher -- in some degree. He must understand symbols, write code, and speak in words. He must contemplate the particular, in terms of the general, and touch abstract and concrete in the same flight of thought. He must study the present in the light of the past for the purposes of the future. He must be able to speak a common language with a computer scientist, a physicist and a sociologist. No part of man's nature or his institutions must be entirely outside his regard. He must be purposeful and disinterested in a simultaneous mood, as aloof and incorruptible as an artist, yet sometimes as near to earth as a politician.” --- Alex Teytelboym, Research Fellow in Economics at INET, University of Oxford
  • 25. Why Coding? (I) • As a social science student at LSE, managing, analyzing, and playing with data is an important part of your work (charts, curves, trends, etc.). • Without programming skills, your work becomes more limited. • Are you always relying upon manual calculations? • Are you hand-collecting data when you could write code to retrieve it easily? • Are you working with big data? Do you think Excel will solve all data problems? • With code, you can multiply by a huge factor the amount of work or calculations you can perform: read millions of rows of data, look for patterns or relations, compare oil prices to Reddit traffic, or the natality rate to the average interest earned by investors in Wyoming. Whatever you can think of, you can try in a matter of minutes or hours and unleash your imagination.
  • 26. Why Coding? (II) • Much experimental work requires you to pull, clean and manipulate large sets of available and incoming data to run experiments based on the economic question you're testing. • If you can write code to do these tasks quickly and efficiently, you can iterate quickly through a lot more hypotheses you might want to test. • The cutting edge of economic research uses novel datasets and combines both theory and empirics. • Everyone in this room has different expectations of what they want to be able to do with data. In social science, analytics is very important, while computer programming with C++, Java, and similar languages is hardly necessary.
  • 27. Everybody in this country should learn to program a computer… because it teaches you how to think - Steve Jobs, Co-Founder and CEO of Apple Inc. (1995 -2011)
  • 28. R Quintessence: “The most disastrous thing that you can ever learn is your first programming language” – Alan Kay
  • 29. R Arithmetic and Logical Operators
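The operators named on this slide can be tried directly in the console. The following is a minimal sketch; the variable names are illustrative and do not come from the workshop file.

```r
# Arithmetic operators
sum_xy   <- 7 + 3     # addition
diff_xy  <- 7 - 3     # subtraction
prod_xy  <- 7 * 3     # multiplication
quot_xy  <- 7 / 3     # division
power_xy <- 7 ^ 3     # exponentiation: 343
mod_xy   <- 7 %% 3    # remainder (modulo): 1
intdiv   <- 7 %/% 3   # integer division: 2

# Comparison and logical operators
is_greater <- 7 > 3                 # TRUE
both       <- (7 > 3) & (3 > 5)     # AND: FALSE
either     <- (7 > 3) | (3 > 5)     # OR: TRUE
negated    <- !(7 > 3)              # NOT: FALSE

print(c(mod_xy, intdiv, power_xy))
print(c(is_greater, both, either, negated))
```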
  • 30. Hey! Note that R is case sensitive!
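A short sketch of what case sensitivity means in practice: two names that differ only in capitalization are separate objects. The names `total` and `Total` are made up for illustration.

```r
total <- 10   # lower-case name
Total <- 20   # a different object, because of the capital T

print(total)                      # 10
print(Total)                      # 20
same_object <- identical(total, Total)
print(same_object)                # FALSE
```

The same rule applies to function names, so `mean(x)` works while `Mean(x)` fails unless a function called `Mean` has been defined.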
  • 31. R Basic Data Types • Numeric • Integer • Complex • Logical • Character Constants, variables and data types – BBC BiteSize Programming Course GCSE
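The five basic types listed above can be inspected with `class()`. This is a small sketch with made-up values.

```r
num_value  <- 3.14      # numeric (double)
int_value  <- 42L       # integer: note the L suffix
cplx_value <- 2 + 3i    # complex
lgl_value  <- TRUE      # logical
chr_value  <- "LSE"     # character

types <- c(class(num_value), class(int_value), class(cplx_value),
           class(lgl_value), class(chr_value))
print(types)
# "numeric" "integer" "complex" "logical" "character"
```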
  • 32. R Data Structures (I) Figure: R Data Structures (AmazonS3 Website)
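As a rough illustration of the structures named in the teaching outline (vector, matrix, list, data frame), here is a minimal sketch. All object names and values are invented for the example.

```r
# Vector: values of a single type
scores <- c(90, 85, 72)

# Matrix: a two-dimensional vector, filled column by column
grid <- matrix(1:6, nrow = 2, ncol = 3)

# List: elements may have different types
record <- list(name = "Ann", age = 21, passed = TRUE)

# Data frame: a table whose columns are equal-length vectors
students <- data.frame(name  = c("Ann", "Ben", "Cat"),
                       score = scores,
                       stringsAsFactors = FALSE)

print(dim(grid))        # 2 3
print(grid[2, 3])       # 6
print(record$age)       # 21
print(students[students$score > 80, ])
```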
  • 35. What are “Loops”? • “Looping”, “cycling”, “iterating” or just replicating instructions is an old practice that originated well before the invention of computers. It is nothing more than automating a multi-step process by organizing sequences of actions or ‘batch’ processes and by grouping the parts that need to be repeated. • All modern programming languages provide special constructs that allow for the repetition of instructions or blocks of instructions. • Broadly speaking, there are two types of these special constructs or loops in modern programming languages. Some loops execute for a prescribed number of times, as controlled by a counter or an index, incremented at each iteration cycle. These are part of the for loop family. • On the other hand, some loops are based on the onset and verification of a logical condition. The condition is tested at the start or the end of the loop construct. These variants belong to the while or repeat family of loops, respectively.
  • 36. Looping in R Figure: a Tutorial on Loops in R (DataCamp)
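The loop families described above (counter-controlled `for`, condition-controlled `while` and `repeat`) might look like this in R. The tasks are arbitrary and chosen only to keep the sketch short.

```r
# for: runs a prescribed number of times, controlled by an index
total <- 0
for (i in 1:5) {
  total <- total + i
}
print(total)   # 15

# while: the condition is tested at the start of each pass
n <- 1
while (n < 100) {
  n <- n * 2
}
print(n)       # 128

# repeat: runs until an explicit break, so the test sits inside the body
count <- 0
repeat {
  count <- count + 1
  if (count >= 3) break
}
print(count)   # 3
```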
  • 37. Functions (I) • Functions are used to logically break our code into simpler parts which become easy to maintain and understand. • It's pretty straightforward to create your own function in R programming.
  • 38. Functions (II) • Conceptually, given some inputs of x, we perform some computation to get the new output. • Some commonly known functions are mean, median, square root and summation etc.
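To make the input-computation-output idea concrete, here is a sketch of a user-defined function. The name `rescale` and its arguments are hypothetical; the workshop file may use different examples.

```r
# rescale(): maps a numeric vector onto the interval [lower, upper]
rescale <- function(x, lower = 0, upper = 1) {
  rng <- range(x)                               # smallest and largest input
  lower + (x - rng[1]) / (rng[2] - rng[1]) * (upper - lower)
}

scaled <- rescale(c(2, 4, 6))
print(scaled)                                   # 0.0 0.5 1.0
print(rescale(c(2, 4, 6), lower = 10, upper = 20))
```

Default argument values (`lower = 0`, `upper = 1`) let callers omit arguments they do not need to change.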
  • 39. Functions (III) • Fortunately, R provides some built-in functions that are widely used in mathematics and statistics:
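A few of the built-in functions mentioned on the previous slide, applied to a made-up vector:

```r
x <- c(4, 9, 16, 25)

print(mean(x))     # 13.5
print(median(x))   # 12.5
print(sqrt(x))     # 2 3 4 5
print(sum(x))      # 54
print(length(x))   # 4
```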
  • 41. Functions (V) • Writing a function should spring to mind whenever you want to package your own chunk of code and reuse it easily. • Creating your own functions takes some imagination and efficient coding skill. • Please revisit the workshop file for an example in action! • Peace of mind: a vast community of R developers around the world collaborates to provide useful R packages, which saves us a lot of time and effort. • As you progress in data analysis with R, finding the right packages may provide a shortcut for your research project.
  • 43. Top 10 R Packages
  • 44. Packages are collections of R functions, data, and compiled code in a well-defined format. The directory where packages are stored is called the library. - Quick-R
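In practice, a package is installed once into the library and then attached in each session. The sketch below uses the `stats` package because it ships with R. The `install.packages()` line is commented out since it needs network access, and `ggplot2` is named only as an example.

```r
# One-time download from CRAN into your library (needs internet access):
# install.packages("ggplot2")

# Attach an installed package for the current session
library(stats)

# Where R looks for installed packages (the "library" directories)
print(.libPaths())

# Check whether a package is installed without attaching it
pkg_available <- requireNamespace("stats", quietly = TRUE)
print(pkg_available)   # TRUE
```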
  • 45. R for Data Science: “Without data, you’re just another person with an opinion”
  • 46. How to Import the Data Importing your data into R – R Tutorials by R-Bloggers
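A minimal sketch of importing a CSV file with base R. To keep it self-contained, the example first writes a small file to a temporary path. The column names and numbers are invented for illustration and are not real statistics.

```r
# Create a small CSV file to import (illustrative, made-up values)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("country,gdp_growth",
             "UK,1.8",
             "Thailand,3.2",
             "US,1.6"), csv_path)

# Import the file into a data frame
growth <- read.csv(csv_path, stringsAsFactors = FALSE)

str(growth)                        # structure: 3 observations of 2 variables
avg_growth <- mean(growth$gdp_growth)
print(avg_growth)
```

With your own data, replace `csv_path` with the path to the file on your machine.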
  • 47. “The simple graph has brought more information to the data analyst’s mind than any other device.” - John Tukey
  • 48. R for Data Science: Data Visualization • R has several systems for making graphs, but ggplot2 is one of the most elegant and most versatile. ggplot2 implements the grammar of graphics, a coherent system for describing and building graphs. With ggplot2, you can do more faster by learning one system and applying it in many places. • If you’d like to learn more about the theoretical underpinnings of ggplot2 before you start, I’d recommend reading “The Layered Grammar of Graphics”, http://vita.had.co.nz/papers/layered-grammar.pdf.
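The layered idea can be sketched with the built-in `mtcars` data set: start from the data and aesthetic mappings, then add one layer at a time. The code assumes ggplot2 is installed and skips the plot otherwise. The choice of variables is only an example.

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  p <- ggplot(mtcars, aes(x = wt, y = mpg)) +      # data + aesthetic mappings
    geom_point() +                                 # layer 1: scatter points
    geom_smooth(method = "lm", se = FALSE) +       # layer 2: fitted straight line
    labs(x = "Weight (1000 lbs)",
         y = "Miles per gallon",
         title = "Fuel efficiency vs. weight")     # labels

  print(p)
}
```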
  • 49. Examples of R Visualizations (I)
  • 51. Examples of R Visualizations (III)
  • 54. Storytelling with data – a data visualization guide for business professionals
  • 55. Simple Framework of Machine Learning
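The slide's figure is not reproduced in this transcript, so the following is only an assumed, common version of such a framework: split the data, fit a model on the training part, and evaluate it on the held-out part. It uses linear regression (listed in the teaching outline) on the built-in `mtcars` data. The 70/30 split and the predictors are arbitrary choices.

```r
set.seed(42)                                   # make the random split reproducible

# 1. Split the data into training (70%) and test (30%) sets
n         <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train     <- mtcars[train_idx, ]
test      <- mtcars[-train_idx, ]

# 2. Fit a model on the training set
model <- lm(mpg ~ wt + hp, data = train)

# 3. Predict on unseen data and measure the error
pred <- predict(model, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))        # root mean squared error

print(summary(model)$coefficients)
print(rmse)
```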
  • 65. R Studio Tips and Tricks
  • 66. RStudio Tips and Tricks These are not exactly coding tricks, but rather ways to make your life easier using key commands. • The up arrow on your keyboard lets you scroll back through your past commands. • The Tab key will help you (particularly in RStudio) by offering ways to finish your code. • When working within a .R or .Rmd file, you can put your cursor on a line and hit Ctrl + Enter to execute the code in the Console. (On a Mac, Command + Enter.) • If you get stuck with some syntax (usually mismatched parentheses or quotes), the R Console prompt will change from > at the beginning of the line (which means it is waiting for a new command) to + at the beginning of the line (which means it is waiting for you to finish a command). To get out, hit the Escape key.
  • 67. Tearable Panes Tearable panes are anything but terrible. This feature allows users to tear off data view panes and source panes facilitating the use of multiple screens.
  • 68. Command History In the console it is possible to scroll through the command history by pressing Ctrl/Cmd and ↑. The command history is filtered as code is typed into the console.
  • 69. History Pane The history pane shows a searchable list of commands that have been run. Commands can be written to the source pane or the console. No more copy and paste from the console to a script!
  • 70. Rename in Scope This feature makes it easy to rename all instances of a variable. The tool is context aware; changing ‘m’ to ‘m1’ won’t change ‘mtcars’ to ‘m1tcars’.
  • 71. Gallery and Satellite View in Notebooks A new feature built into R Notebooks, a code chunk that produces multiple plots will produce a gallery. The plots can be viewed by toggling between thumbnails. The gallery can be expanded into a new satellite window for closer inspection.
  • 72. Code Outline Save time scrolling with the code outline. This feature works for R Notebooks and traditional R scripts. In R Notebooks sections are delimited by the R Markdown headers. In R scripts sections are delimited by section comments (Try Code -> Insert Section).
  • 73. Code Snippets Code snippets are a shortcut to insert common boilerplate code. For instance, type fun and then Tab to insert the skeleton code for a function definition. Then hit Tab to replace the necessary components. In addition to a rich set of defaults, custom code snippets can also be created.
  • 74. File Navigation Many people know of RStudio’s rich set of tab complete options for functions and function arguments. Tab complete can also help find files and remove the hassle of writing out long path locations. Hit tab in between two double quotes (“ “) to open a file explorer.
  • 75. Jump To Function Definition Want to dig into the innards of a function? With the cursor on a function press F2 to jump to the function definition, even for functions in a package.
  • 76. Thanks! ANY QUESTIONS? You can find me at @Korkrid Ake datasciencesoc@lsesu.org
  • 77. Further Resources R Language • Advanced R: http://adv-r.had.co.nz/ • Bioconductor: https://www.bioconductor.org/ • CRANberries: http://dirk.eddelbuettel.com/cranberries/ • MRAN: https://mran.revolutionanalytics.com/ • rOpenSci: https://ropensci.org/ • R Project: https://www.r-project.org/ • The R Journal: https://journal.r-project.org/ R Community • R Consortium: https://www.r-consortium.org/ • R Weekly: https://rweekly.org/ Data Sciences • Apache Hadoop: http://hadoop.apache.org/ • KDnuggets: http://www.kdnuggets.com/ • R for Data Science: http://r4ds.had.co.nz/ • sparklyr: http://spark.rstudio.com/ • SparkR: https://spark.apache.org/docs/latest/sparkr.html • Tessera: http://tessera.io/
  • 78. Further Resources Blogs • RStudio Blog: https://blog.rstudio.org/ • BLOGR: https://drsimonj.svbtle.com/ • Mad (Data) Scientist: https://matloff.wordpress.com/ • R Bloggers: https://www.r-bloggers.com/ • R Consortium Blog: https://www.r-consortium.org/news/blog • Revolutions Blog: http://blog.revolutionanalytics.com • rOpenSci Blog: https://ropensci.org/blog/ • Simply Statistics: http://simplystatistics.org/ • Statistical Modeling, Causal Inference, and Social Science: http://andrewgelman.com/ • StatsBlogs: http://www.statsblogs.com/ • Win-Vector Blog: http://www.win-vector.com/blog/ Statistics • Journal of Statistical Software: https://www.jstatsoft.org/index • Forecasting: principles and practice: https://www.otexts.org/fpp • From Algorithms to Z-Scores: http://heather.cs.ucdavis.edu/~matloff/132/PLN/ProbStatBookW16ECS132.pdf • Statistical Foundations of Machine Learning: https://www.otexts.org/book/sfml • The Elements of Statistical Learning: http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
  • 80. Udemy R Courses • Another company is Udemy. While they do not offer video + interactive sessions like DataCamp, they do offer extensive video lessons covering some other topics in using R and learning statistics. • The Comprehensive Programming in R Course (25 Hours of video) • Graphs in R (ggplot2, plotrix, base R) – Data Visualization with R Programming Language (5 Hours of video) • Linear Mixed-Effects Models with R (11 Hours of video) • Multivariate Data Visualization with R (7 Hours of video) • Applied Multivariate Analysis with R (13 Hours of video) • More Data Mining with R (11 Hours of video) • Text Mining, Scraping and Sentiment Analysis with R (4 Hours of video) • R Programming for Simulation and Monte Carlo Methods (12 Hours of video) • Programming Statistical Applications in R (12 Hours of video) • Comprehensive Linear Modeling with R (15 Hours of video) • Bayesian Computational Analyses with R (12 Hours of video) • Time Series Analysis and Forecasting in R (3 Hours of video)