The Institute for Statistics Education at Statistics.com offers a graduate-level certificate program in R for those who want to use the R statistical programming environment for statistical analysis, visualization and modeling. The Institute offers continuing education credits as well as a Program completion certificate. Courses are offered year-round (there is no semester system) on a flexible schedule. The content of the Program is the equivalent of 18 credits in the US academic system. Faculty include R core development team members, package developers, and authors of books on R, among them Paul Murrell, Hadley Wickham, Thomas Lumley, Sudha Purohit, Luis Torgo, John Verzani, and others.
Join this webinar to learn about the structure of the certificate and available courses through the Institute, which are offered in 3 categories:
Basic programming skills in R
Statistical methods implemented in R
R applied to specific domains
2. About Statistics.com
• First course 2002 (resampling methods)
• 2003-2004 added courses in data mining, modeling, intro stats
• Now 100+ courses
• Hybrid model between
• Professional development (topic centered, scheduling accommodates working
professionals)
• Academic (homework and assessment)
• Taught by noted authorities
• Statistics, predictive modeling, data mining, R, optimization, risk
modeling, clinical trials…
4. The spread of R:
Phase 1: R started in 1993 by academics, and gained popularity
in universities around the world – open source & free!
Phase 2: PhD statisticians who used R in university took it to
their complex quant modeling jobs in industry.
Phase 3 (now): R is ubiquitous:
•Industry - now that R is seeded by the PhD statisticians,
other analysts in their companies need to know it.
•Academic – researchers in a variety of fields who do
statistics but are not primarily statisticians use R
5. Why learn R? Let’s look at what
employers are looking for:
[Bar chart comparing R, SAS and SPSS]
Relative proportion of mentions of statistical tools in the “job requirements” section of job postings. A single job may mention more than one tool.
Source: Statistics.com survey of approx. 4000 analytics/statistics job postings on various job sites, May 2012
6. ConAgra Foods’ Human Capital Analytics/Reporting (HCA/R) program is searching for a project
manager/statistician ….development of predictive modeling processes to answer different business issues.
Excellent computer skills specifically with advanced Excel (v-lookups, pivot tables, macros), R, and other open
source software. Experience in configuration of data to support complex data mining & statistical analysis.
SAS is seeking a Research Statistician to apply cutting-edge econometric models ...
demonstrated experience or knowledge of computer programming; ... particularly with
applied econometric modeling or time series analysis; the SAS system; statistical software
products, such as WinBugs, R, Stata, EViews, OxMetrics, or S-Plus.
AmazonLocal, Applied Machine Learning Scientist · Run sampling, clustering, classification, etc on large datasets using a
variety of analytics software (e.g. SAS, Python, R, etc).
SRA International, ...
* Use of statistical algorithms, techniques and models to define data for data integrity and process
analytics
* Use of data mining techniques to define data
...
* Experience and demonstrated expertise with at multiple data mining tools including SAS, SPSS, R,
Weka, etc.
Need for R goes hand-in-hand with need for higher-level stats skills.
8. Who are your fellow students?
Countries: China, Australia, Germany, India, Canada, UK, Netherlands, Brazil, Denmark
Sectors: Academia, Industry, Government
Roles:
• Bioinformaticist
• PhD candidate in epidemiology
• Database marketer, international bank
• Prof. of medicine
• Survey researcher
• Health researcher
• Digital marketer, J&J
• Plant ecologist
• Project manager, large consulting firm
• PhD student in animal embryology
• Circulation manager, Countryside Pubs.
• Statistical geneticist
• Anthropologist, human remains
• Web developer
• Casualty actuary
• Farmer, Calif. Central Valley
• Forecaster, Walt Disney
• Commodities analyst, hedge fund
• Researcher, K-12 school dist.
• Risk analyst, agriculture dept.
• CDC epidemiologist
• Team coordinator, aerospace medicine
• Forest monitor
9. Executive and an assistant professor in an academic medical center: I have extensive
experience with SPSS …I see R as the future for quantitative work and need to begin
doing more of my work in R.
Analyst with state government natural resource agency: We have survey designs for regional
monitoring that we continually need to evaluate and improve. Currently, I program in C and rely
heavily on Monte Carlo modeling. I plot in Excel and have wanted to learn R to get greater
flexibility.
Analyst with health and human service agency: My job is mostly data analysis and some statistical
modeling which is handled via SAS and PL/SQL. Other agencies have incorporated R. I am
looking to be prepared should our agency adopt R as well as understand how R compares with SAS
with the hope of drawing from the strengths of both in the future.
Marketing analyst, international banking: Since we are manipulating tons of data at customer
level for more than 27 countries, R would be the perfect complement tool (we have been using
SAS) for customer analytics.
Analyst with non-profit organization: We do quite a bit of data analysis (mostly descriptive
work and GIS mapping) and I started teaching myself R a few years ago in order to
automate our routine data cleaning.
Database marketer, banking: I have used SAS for 8 years, also have experience in
FICO Model Builder, but am new to R and want to learn those comprehensive
packages which are not available in base SAS to do more advanced analytics.
Commodities analyst at a hedge fund: I'm looking to use R to build more robust, stable and
dynamic econometric models.
10. Why take classes? Why not learn on your own?
• R is not like SAS, SPSS
• SAS has two very distinct user types:
• Programmer
• Statistical modeler & analyst
• SPSS users are mostly the latter
• R is powerful, but has more programming and “messiness,” even when used purely in
analysis/modeling mode.
• Often it is helpful to have an expert on hand while learning R
• 4-week courses allow an iterative process – a short intensive learning
period, ask lots of questions. Apply what you learn. Come back later for
another 4-week class. Learn more. Apply. Repeat.
11. Certificate Program Content
PREREQUISITES: None for entry into program, but
introductory statistics is a prerequisite for some courses.
6 REQUIRED (R-specific):
• Intro to R – Data Handling
• Intro to R – Statistical Analysis
• Programming in R
• Programming in R – Adv.
• Modeling in R
• Graphics in R
6 ELECTIVES (include R):
• Data mining
• Spatial
• Microarray
• SVM
• Clinical Trials Apps
• ggplot2
• Smoothing with P-splines
• Survey Analysis
• Probability Distributions
• Resampling
• Bootstrap
• Logistic Regression
• GLM
• Count data
12. 1. The principles of R programming:
• Introduction to R – Data Handling (Paul Murrell) introduces basic
expressions, symbols, assignment, functions, packages, use of code
editors (emacs), workspace, data types & structures, subsetting,
accessor functions, classes, type coercion, text files, binary files, large
files, memory management, apply function, tabulate, aggregation,
merging and splitting data, reshape, text processing.
• Programming in R (2 courses with Hadley Wickham) covers lexical
scoping, dynamic scoping, frames, environments, namespaces, active
bindings, quoting, evaluation, calling from other functions, string
processing (stringr), dates and times (lubridate), regular expressions, xml
and xpath, extracting data with SQL, executing SQL in R, writing compact
and efficient code (helper function, lapply), anonymous functions, first
class functions, object oriented programming, S3, tips for producing
reliable code, functions and options to help debug, speed, testing.
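A few of the topics listed above can be sketched in base R. This is only an illustrative sample, not course material: the data frames `scores` and `info` and the functions `make_adder` and `describe` are hypothetical names made up for the example.

```r
# Hypothetical data, made up for illustration.
scores <- data.frame(id = c(1, 2, 3, 4),
                     group = c("a", "a", "b", "b"),
                     score = c(10, 14, 9, 13))
info <- data.frame(id = c(1, 2, 3, 4), site = c("x", "y", "x", "y"))

# Subsetting: keep the rows where score exceeds 9.
high <- scores[scores$score > 9, ]

# Aggregation: mean score per group.
group_means <- aggregate(score ~ group, data = scores, FUN = mean)

# Merging: join the two data frames on the shared id column.
merged <- merge(scores, info, by = "id")

# Lexical scoping: the returned function remembers `n` from the
# environment in which it was created.
make_adder <- function(n) function(x) x + n
add_two <- make_adder(2)
shifted <- sapply(scores$score, add_two)

# S3: a generic function dispatches on the class of its argument.
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "something else"
describe.data.frame <- function(x, ...) paste("data frame with", nrow(x), "rows")
label <- describe(scores)
```

Each step stores its result in a named object so it can be inspected at the console.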
13. 2. Plotting and visualizing data in R:
Graphics in R (Paul Murrell) covers the core R capabilities for
graphing and teaches you to produce key statistical plots such as
scatterplots.
R ggplot2 (Hadley Wickham) teaches how to use his ggplot2 package,
which has its own graphics language that rests on R, to create
graphs.
14. 3. Application/method/domain specific:
Other classes are application oriented, where syntax and
programming are discussed, as necessary, on the path to
getting R to accomplish something specific. Intro stats,
statistical modeling, microarray analysis, data mining, survey
analysis.
In the most basic of these, Introduction to R – Statistical
Analysis, some familiarity with statistical procedures is assumed
and you learn R by executing these procedures (t-tests, chi-
square, correlation, regression, etc.) in R. In other cases, the
emphasis is on learning the method and R is simply the chosen
tool.
Let’s see an example from the Statistical Analysis course.
15. Snapshot: Regression. The instructions are given
step-by-step in Lesson 3 of “Introduction to R –
Statistical Analysis.”
The lm function will estimate the regression parameters for the
simple linear regression model. For the two models specified above
we have:
> lm(total ~ w.class, data = d)
Call:
lm(formula = total ~ w.class, data = d)
Coefficients:
(Intercept)      w.class
    159.815        2.732
which gives estimates $\hat{b}_0 = 159.815$ and $\hat{b}_1 = 2.732$.
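A minimal sketch of the same workflow follows. The lesson's data frame `d` is not reproduced in these slides, so the values below are simulated stand-ins (an assumption); only the column names `total` and `w.class` come from the output above, and the estimates will differ from those in the lesson.

```r
# Simulated stand-in for the lesson's data frame `d` (values are made up).
set.seed(1)
d <- data.frame(w.class = 1:20)
d$total <- 160 + 2.7 * d$w.class + rnorm(20, sd = 5)

# Fit the simple linear regression total = b0 + b1 * w.class + error.
fit <- lm(total ~ w.class, data = d)

# coef() returns the estimates as a named numeric vector.
b <- coef(fit)
b["(Intercept)"]  # estimate of b0
b["w.class"]      # estimate of b1
```

Storing the fitted model in `fit` rather than printing it directly makes it available for later steps such as `summary(fit)` or residual plots.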
16. KIM ASKS
Hi John,
I was plotting the residuals from a linear regression (example on page 19 of Lesson 3), and there was a delay before the
plots would show. The message on the R console was "Waiting to confirm page change." By clicking on the graphics, I could
switch from one plot to the next. Is there any way to make them tile so I can see all of them at once, or any way to go back and
forth once they've 'printed' on the graphics page?
JOHN VERZANI REPLIES:
A couple of possibilities exist:
You can partition your graphics device so that more than one graphic will appear. For example, par(mfrow=c(2,2)) will set up a 2
by 2 grid, perfect for the plot function called on the output of the lm function.
On some implementations you can record plots and scroll back through them. For Windows users, the RGui application (your
basic interface) allows you to turn on recording, I think by right-clicking on a plot (if I'm wrong let me know, and I'll check).
For RStudio, the graphs are already recorded. There are arrows to scroll.
Hope one of those works for you. --J
SABINA CHIMES IN
Where do you type it? In the plot command? I have tried:
> plot(res.pipeline, par(mfrow=c(2,2)))
and get
Error in plot.lm(res.pipeline, par(mfrow = c(2, 2))) :
'which' must be in 1:6
....How do you keep track of all these different ways of doing things? I find that your comments are amazing...
JOHN REPLIES
The par settings are done in their own command (well some are). Try:
par(mfrow=c(2,2))
plot(res.pipeline)
The ".lm" extra bit isn't necessary (though doesn't hurt), as R will use the class of res.pipeline to find that function in most usual
cases.
Let me know if that doesn't help.
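The exchange above can be reproduced as a short script. This is a sketch under assumptions: the course's `res.pipeline` object is not available here, so a model fitted to R's built-in `cars` data set stands in for it, and the plots are written to a temporary PDF file so the script also runs without a screen.

```r
# Stand-in for res.pipeline: a linear model on the built-in cars data.
fit <- lm(dist ~ speed, data = cars)

# Open a PDF device on a temporary file (assumption: no interactive display).
pdf(file = tempfile(fileext = ".pdf"))

# par() is its own command; it returns the previous settings invisibly.
old_par <- par(mfrow = c(2, 2))

# plot() dispatches to plot.lm() because fit has class "lm";
# the four default diagnostic plots fill the 2 by 2 grid.
plot(fit)

# Restore the previous layout and close the device.
par(old_par)
dev.off()
```

In an interactive session the `pdf()` and `dev.off()` lines can be dropped, and the four plots appear together in the graphics window.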
17. ALTA ASKS
John, what does masked in the following error message mean? and what is '.GlobalEnv'? thnx in advance
> attach(kid.weights)
The following object(s) are masked _by_ '.GlobalEnv':
age
JOHN REPLIES
R looks for objects by traversing a series of nested environments. In this case, when you
attach(kid.weights) it includes a variable 'age'. However you already have a variable 'age' in your global
workspace (.GlobalEnv is the secret name for that). Which one do you want? Well, R is answering which
one it will find. In this case the one in the global workspace, not that in kid.weights. For that one, you will
need to work harder (using $ or with or ...)
Does that help?
gotcha! very helpful--thnx
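The masking John describes can be seen in a small sketch. The contents of `kid.weights` below are made up for illustration (the course data set is not reproduced in these slides); only the names `kid.weights` and `age` come from the question.

```r
# Hypothetical stand-in for the course data set.
kid.weights <- data.frame(age = c(24, 36, 48), weight = c(12, 15, 17))

# A variable with the same name already sits in the workspace.
age <- 99

# attach() puts kid.weights on the search path, behind the workspace,
# and reports that its 'age' column is masked.
attach(kid.weights)

found_age <- age          # the workspace copy is found first

# Reaching the column inside the data frame takes $ or with().
column_age <- kid.weights$age
mean_age <- with(kid.weights, mean(age))

# Undo the attach so later code is not affected.
detach(kid.weights)
```

Many R users prefer `$` or `with()` over `attach()` for exactly this reason: the data frame being used is always explicit.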
19. How courses work
• Readings, notes, videos
• Homework
• Discussion Forum
20. Weekly Course Schedule
Most courses are 4 weeks.
Example: March 2013
• Week of Mar 1-2: Lesson 1 opens
• Week of Mar 3-9: Lesson 2 opens
• Week of Mar 10-16: Homework 1 due; feedback on Homework 1; Lesson 3 opens
• Week of Mar 17-23: Homework 2 due; feedback on Homework 2; Lesson 4 opens
• Week of Mar 24-30: Homework 3 due; feedback on Homework 3
• Week of Mar 31-Apr 6: Homework 4 due; feedback on Homework 4
21. Time Required
• Estimate 15 hours per week
• Don’t need to be online at
particular times or days
• Time zone does not matter
• Best not to leave all work until the
end of the week
• Materials remain open for a
couple of weeks after end-of-
course
• Most students are working
professionals, take courses one at
a time
22. Faculty
• Paul Murrell
• John Verzani
• Hadley Wickham
• Sudha Purohit
• Luis Torgo
• David Unwin
• Thomas Lumley
• Din Chen
• Karl Peace
• Garrett Grolemund
• Brian Marx
• Paul Eilers
23. Typical Course Contents – R Programming
• “Headquarters” Page
• Lesson Page
• Readings/notes/videos
• Homework
• Discussion Forum
29. Next Step.
For certificate program application, contact
sales@revolutionanalytics.com or call
1-855-GET-REVO (1-855-438-7386)
• Application fee will be waived (until July 30th)
• Up to 50% discount offered for Revolution Analytics
software when purchased in combination with training