R for Analytics & Business Intelligence
First Steps in R
Prof. Dr. Jan Kirenz
Hochschule der Medien

R for Analytics & Business Intelligence - Introduction to Exploratory Data Analysis

8,408 views


In this presentation, you will learn how to perform exploratory data analysis in R.

Content:

- Introduction to RStudio
- First steps with the ggplot2 package
- Exploratory data analysis

Published in: Data & Analytics

  1. R for Analytics & Business Intelligence: First Steps in R. Prof. Dr. Jan Kirenz, Hochschule der Medien
  2. What you will learn
     1. Import your data into R
     2. Tidy your data (each column should be a variable, each row an observation)
     3. Transform the data (e.g., calculate summary statistics...)
     4. Visualization & modelling
     5. Communication of results
     Source: Grolemund, G. & Wickham, H. (2016). R for Data Science. Retrieved from http://r4ds.had.co.nz/introduction.html
  3. What you will learn
     We start with some motivating examples so you can see the bigger picture, and then dive into the details.
     Source: Grolemund, G. & Wickham, H. (2016). R for Data Science. Retrieved from http://r4ds.had.co.nz/introduction.html
  4. Starting with RStudio
     Source: Grolemund, G. & Wickham, H. (2016). R for Data Science. Retrieved from http://r4ds.had.co.nz/introduction.html
     - RStudio is an integrated development environment (IDE) for R programming. There are three key regions in the interface:
  5. First Steps in RStudio: R code editor
     - The top left window is where you'll probably do most of your work.
     - That's the R code editor, which lets you create a file with multiple lines of R code. There you'll write your statistical commands.
     - For example, enter 2+2 and click on the "Run" button (see arrow on the slide).
     - You can use the hash character, #, to add comments. R will not run anything that follows a # on a line.
     Source: Machlis, S. (2013). Get started with this popular programming language. Retrieved from http://www.computerworld.com/article/2497143/business-intelligence/business-intelligence-beginner-s-guide-to-r-introduction.html
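
As a small illustration of the editor example and the comment rule on this slide, the lines below can be typed into the editor and run. The variable name `result` is only illustrative and does not appear in the slides.

```r
# A full-line comment: R skips everything after the hash character.
result <- 2 + 2   # a trailing comment; only the code before # is evaluated
print(result)
```
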
  6. First Steps in RStudio: Interactive console
     - Bottom left is the interactive console, where you can type in R statements one line at a time.
     - Any lines of code that are run from the editor window also appear in the console.
     - The console is R's way of telling you what it's doing.
     - For example, let's use R like a calculator: try typing 1+1 into the interactive console.
     Source: Machlis, S. (2013). Get started with this popular programming language. Retrieved from http://www.computerworld.com/article/2497143/business-intelligence/business-intelligence-beginner-s-guide-to-r-introduction.html
  7. First Steps in RStudio: Workspace
     - The top right window shows your workspace, which includes a list of objects currently in memory.
     - There's also a history tab with a list of your prior commands. What's handy there is that you can select one, some, or all of those lines of code and send them with one click either to the console or to whatever file is active in your code editor.
     Source: Machlis, S. (2013). Get started with this popular programming language. Retrieved from http://www.computerworld.com/article/2497143/business-intelligence/business-intelligence-beginner-s-guide-to-r-introduction.html
  8. First Steps in RStudio: Output (plots, help, ...)
     - The window at bottom right shows a plot if you've created a data visualization with your R code.
     - There's a history of previous plots and an option to export a plot to an image file or PDF.
     - This window also shows external packages (R extensions) that are available on your system, files in your working directory, and help files when called from the console.
     Source: Machlis, S. (2013). Get started with this popular programming language. Retrieved from http://www.computerworld.com/article/2497143/business-intelligence/business-intelligence-beginner-s-guide-to-r-introduction.html
  9. Starting with RStudio: Change default options
     - First, we change the default RStudio options.
     - This ensures that every time you restart RStudio you get a clean workspace.
  10. Starting with RStudio: Keyboard shortcuts
      - Cmd/Ctrl + Enter: sends the current line (or current selection) from the editor to the console and runs it.
      - Tab: suggests possible completions for the text you've typed.
      - Alt + Shift + K: shows an overview of all keyboard shortcuts.
  11. Starting with RStudio: Installation of packages
      - An R package is a collection of functions, data, and documentation that extends the capabilities of base R.
      - After you have installed the packages, you can load any of them into your current R session with the library() command.
  12. Starting with RStudio: Installation of packages
      You only need to install a package once, but you need to reload it every time you start a new session.
        install.packages("tidyr")  # only once
        library(tidyr)             # every time you start a new session
      If you need help, go to https://www.r-bloggers.com or https://stackoverflow.com/
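
The install-once, load-every-session rule can be wrapped in a small helper. This is only a sketch: `load_or_install` is a hypothetical name that does not appear in the slides, and it assumes a CRAN mirror is configured for the case where an installation is actually needed.

```r
# Hypothetical helper (not from the slides): install a package only if it is
# missing, then attach it for the current session.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # runs only when the package is not yet installed
  }
  library(pkg, character.only = TRUE)  # needed in every new session
  invisible(pkg %in% (.packages()))    # TRUE if the package is now attached
}

# "stats" ships with R, so this call downloads nothing.
# For the slide's example you would call load_or_install("tidyr") instead.
load_or_install("stats")
```

Because `requireNamespace()` only checks whether the package is available, rerunning the script in a later session skips the download and goes straight to `library()`.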
  13. Introduction
      - To better understand data, we start with visualisation.
      - We learn the basic structure of a ggplot2 plot and powerful techniques for turning data into plots.
      Source: Grolemund, G. & Wickham, H. (2016). R for Data Science. Retrieved from http://r4ds.had.co.nz/introduction.html
  14. Analytics Example 1
      Do cars with big engines use more fuel than cars with small engines?
      Source: Grolemund, G. & Wickham, H. (2016). R for Data Science. Retrieved from http://r4ds.had.co.nz/introduction.html
  15. Data Analysis 1: Questions
      - What does the relationship between engine size and fuel efficiency look like? Is it:
        - Positive?
        - Negative?
        - Linear?
        - Nonlinear?
      Source: Grolemund, G. & Wickham, H. (2016). R for Data Science. Retrieved from http://r4ds.had.co.nz/introduction.html
  16. Data Analysis 1: Open RStudio
      - Open RStudio and open a new script:
  17. Data Analysis 1: Installation of packages (only once)
      - To access the datasets, help pages, and functions that we will use in this data analysis, install the following packages (you only need to do this once):
      - ggplot2: visualisation package.
      - tibble: improves the default printing of datasets.
        install.packages("ggplot2")
        install.packages("tibble")
      Note: R code is presented in grey frames in this presentation.
      Source: Grolemund, G. & Wickham, H. (2016). R for Data Science. Retrieved from http://r4ds.had.co.nz/introduction.html
  18. Data Analysis 1: Installation of packages (only once)
      (1) Write your code.
      (2) Click on the line you want to run.
      (3) Click on "Run" or use the shortcut (macOS: Cmd + Enter; Windows: Ctrl + Enter).
  19. Data Analysis 1: Load the packages (every time you start)
      - Load the packages:
        library(ggplot2)
        library(tibble)
      - Hint: If you run this code and get the error message "there is no package called 'ggplot2'", you'll need to first install it, then run library() once again.
      Source: Grolemund, G. & Wickham, H. (2016). R for Data Science. Retrieved from http://r4ds.had.co.nz/introduction.html
  20. Data Analysis 1: Have a look at the dataset
      - To view the data, type mpg into the console:
        mpg
      - To open a help window with more information, type:
        ?mpg
      Source: Grolemund, G. & Wickham, H. (2016). R for Data Science. Retrieved from http://r4ds.had.co.nz/introduction.html
  21. Data Analysis 1: ?mpg
      - The dataset (mpg) contains observations collected by the US Environmental Protection Agency (EPA) on 38 models of car (see the help window in RStudio).
      - Format: a data frame with 234 rows and 11 variables:
        - manufacturer
        - model
        - displ: engine displacement, in litres
        - year
        - cyl: number of cylinders
        - trans: type of transmission
        - drv: f = front-wheel drive, r = rear-wheel drive, 4 = 4wd
        - cty: city miles per gallon
        - hwy: highway miles per gallon
        - fl
        - class
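
Besides the help window, the format described on this slide can be checked directly from code. The sketch below assumes ggplot2 is already installed and uses only base R inspection functions.

```r
library(ggplot2)  # provides the mpg dataset

dim(mpg)          # number of rows and number of columns
names(mpg)        # the variable names listed on the slide
head(mpg, n = 5)  # the first five observations
```
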
  22. Data Analysis 1: Dataset mpg
      - Among the variables in mpg are:
        - displ: a car's engine size, in litres.
        - hwy: a car's fuel efficiency on the highway, in miles per gallon (mpg).
      - A car with a low fuel efficiency consumes more fuel than a car with a high fuel efficiency when they travel the same distance.
      Source: Grolemund, G. & Wickham, H. (2016). R for Data Science. Retrieved from http://r4ds.had.co.nz/introduction.html
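
The link between miles per gallon and fuel consumed is simple arithmetic: fuel used equals distance divided by fuel efficiency. The helper `fuel_used` and the two example cars below are hypothetical and serve only to illustrate the statement on this slide.

```r
# Hypothetical helper: gallons needed to drive a given distance.
fuel_used <- function(distance_miles, miles_per_gallon) {
  distance_miles / miles_per_gallon
}

fuel_used(100, 20)  # low efficiency (20 mpg): 5 gallons for 100 miles
fuel_used(100, 40)  # high efficiency (40 mpg): 2.5 gallons for 100 miles
```
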
  23. Data Analysis 1: Have a look at the dataset
      - Create a plot with ggplot2. Enter this code in your editor:
        ggplot(data = mpg) +
          geom_point(mapping = aes(x = displ, y = hwy))
      Source: Grolemund, G. & Wickham, H. (2016). R for Data Science. Retrieved from http://r4ds.had.co.nz/introduction.html
  24. Data Analysis 1: Have a look at the dataset
      Source: Grolemund, G. & Wickham, H. (2016). R for Data Science. Retrieved from http://r4ds.had.co.nz/introduction.html
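
One way to complement the plot when answering the positive-or-negative question is to look at the sign of the correlation between the two variables. This is a sketch that goes slightly beyond the slides: `cor()` is base R, the variable name `r` is illustrative, and a Pearson correlation only summarises linear association, so it cannot tell you whether the relationship is nonlinear.

```r
library(ggplot2)  # provides the mpg dataset

r <- cor(mpg$displ, mpg$hwy)  # Pearson correlation by default
sign(r)  # -1 indicates a negative relationship, +1 a positive one
```
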
  25. Data Analysis 1: Code review
      With ggplot2, you begin a plot with the function ggplot(). ggplot() creates a coordinate system that you can add layers to.
        ggplot(data = mpg) +
          geom_point(mapping = aes(x = displ, y = hwy))
      Source: Grolemund, G. & Wickham, H. (2016). R for Data Science. Retrieved from http://r4ds.had.co.nz/introduction.html
  26. Data Analysis 1: Code review
      The first argument of ggplot() is the dataset to use in the graph. So ggplot(data = mpg) creates an empty graph.
        ggplot(data = mpg) +
          geom_point(mapping = aes(x = displ, y = hwy))
      Source: Grolemund, G. & Wickham, H. (2016). R for Data Science. Retrieved from http://r4ds.had.co.nz/introduction.html
  27. Data Analysis 1: Code review
      You complete your graph by adding one or more layers to ggplot(). The function geom_point() adds a layer of points to your plot, which creates a scatterplot.
        ggplot(data = mpg) +
          geom_point(mapping = aes(x = displ, y = hwy))
      Source: Grolemund, G. & Wickham, H. (2016). R for Data Science. Retrieved from http://r4ds.had.co.nz/introduction.html
  28. Data Analysis 1: Code review
      Each geom function in ggplot2 takes a mapping argument. This defines how variables in your dataset are mapped to visual properties.
        ggplot(data = mpg) +
          geom_point(mapping = aes(x = displ, y = hwy))
      Source: Grolemund, G. & Wickham, H. (2016). R for Data Science. Retrieved from http://r4ds.had.co.nz/introduction.html
  29. Data Analysis 1: Code template
      - Let's turn this code into a reusable template for making graphs with ggplot2.
      - To make a graph, replace the bracketed sections in the code below with a dataset, a geom function, or a set of mappings.
        ggplot(data = <DATA>) +
          <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
      Source: Grolemund, G. & Wickham, H. (2016). R for Data Science. Retrieved from http://r4ds.had.co.nz/introduction.html
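
As one possible way to fill in the template, the sketch below swaps the highway variable for city mileage (cty, also part of mpg). The choice of cty and the object name `p` are illustrative assumptions, not something the slides prescribe.

```r
library(ggplot2)

# <DATA> = mpg, <GEOM_FUNCTION> = geom_point, <MAPPINGS> = x = displ, y = cty
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = cty))

print(p)  # draws the scatterplot in the plot window
```

Storing the plot in an object is optional. Typing the ggplot() call on its own, as in the slides, draws the plot immediately.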
