DATA EXPLORATION METHODS &
PRACTICES
Martin Bago | Instarea
8.10.2018
2nd Data Science Club, 18/19 Winter
TABLE OF CONTENTS
INTRO
FIRST DIVE INTO THE DATASET
GOING DEEPER
CORRELATIONS
BONUS
DATA SCIENCE CLUB
Martin Bago
Data Scientist | Instarea
Ing. @ Process Automation and Informatization in Industry (2016, MTF STU BA)
Bc. @ Applied Informatics (2014, FEI STU BA)
2017- now Data Scientist, Instarea s.r.o., Market Locator
2015-2016 Head of Analyst, News and Media Holding a.s.
2014-2015 SEO Analyst, Centrum Holdings a.s.
2011-2014 Automix.sk, Centrum Holdings a.s.
2010-2013 Editor-in-chief OKO Casopis (FEI STU BA)
Passionate driver, beer&coffee&football lover
Something for you
Download this presentation +
source code here:
http://bit.ly/2QybvNV
The data journey… always the same
Dataset
How to find and use it
>> library(datasets) # ships with base R and is attached by default, so install.packages() is not needed
>> data(package = "datasets") # lists the datasets it contains
For learning, R ships with the datasets package, a collection of many real-life example datasets (from Monthly Airline Passenger Numbers, through Weight versus Age of Chicks on Different Diets, to Monthly Deaths from Lung Diseases in the UK).
For this presentation we will use the mtcars dataset.
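A minimal sketch of getting hold of mtcars, assuming a plain R session with the default packages attached. Nothing here goes beyond base R.

```r
# mtcars is available as soon as R starts, because datasets is attached by default
data(mtcars)            # (re)load a fresh copy into the workspace
class(mtcars)           # the object is a data.frame
rownames(mtcars)[1:3]   # car models are stored as row names, not as a column
# ?mtcars opens the help page that describes every variable
```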
Baby steps
head(), tail(), nrow() and ncol()
To understand what you are working with, it is important to see the dimensions of the dataset, i.e. the number of rows and columns.
>> head(mtcars)
>> tail(mtcars)
>> head(mtcars, 25)
>> nrow(mtcars)
>> ncol(mtcars)
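As a small addition to the calls above, dim() gives both dimensions in one step and names() lists the columns. This is only a sketch using base R.

```r
# dim() returns the number of rows and columns in one call
dims <- dim(mtcars)
dims            # 32 rows, 11 columns
names(mtcars)   # column names
```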
Deeper insight
str(), summary()
For a deeper understanding of the dataset, use detailed views of its variables and their summary statistics.
>> str(mtcars)
>> summary(mtcars)
Always check data types!!!
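One possible way to check the types programmatically. The list of "really categorical" columns is picked by hand here and is an assumption for illustration, not something R detects.

```r
# Class of every column; in mtcars all of them are stored as numeric
types <- sapply(mtcars, class)
types

# Columns that are coded as numbers but describe categories
# (hand-picked assumption: cyl, vs, am, gear, carb)
categorical <- c("cyl", "vs", "am", "gear", "carb")
n_levels <- sapply(mtcars[categorical], function(x) length(unique(x)))
n_levels   # few distinct values hint at a categorical variable
```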
Unique and missing values
unique(), is.na()
It is crucial to find out how many values are missing from the dataset. If two thirds of them are missing, you have the wrong dataset.
>> unique(mtcars$cyl)
>> is.na(mtcars)
If something is missing, you can use a good old method to treat it: filling the gaps with the mean.
>> mtcars$smt[is.na(mtcars$smt)] <- mean(mtcars$smt, na.rm = TRUE) # smt is a placeholder for your own column name
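A runnable sketch of the same idea. mtcars itself has no missing values, so two NAs are injected into a copy purely for illustration; the name cars_na and the chosen rows are assumptions.

```r
# mtcars has no missing values, so we punch two holes into a copy (illustration only)
cars_na <- mtcars
cars_na$hp[c(3, 10)] <- NA

colSums(is.na(cars_na))    # missing values per column
mean(is.na(cars_na$hp))    # share of missing values in hp

# Mean imputation: replace each NA with the mean of the observed values
hp_mean <- mean(cars_na$hp, na.rm = TRUE)
cars_na$hp[is.na(cars_na$hp)] <- hp_mean
anyNA(cars_na)             # FALSE after imputation
```

Keep in mind that mean imputation shrinks the variance of the column, so it is a quick fix rather than a careful treatment.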
Histograms
hist()
The best way to learn and understand is visual.
>> hist(mtcars$mpg)
>> hist(mtcars$hp)
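A slightly more controlled variant, as a sketch: hist() also returns the bins it used, which helps when the default picture hides detail. The title, label and colour are arbitrary choices.

```r
# breaks = 10 is only a suggestion; R picks "pretty" bin edges near that number
h <- hist(mtcars$mpg, breaks = 10,
          main = "Miles per gallon", xlab = "mpg", col = "grey")
h$breaks   # bin edges actually used
h$counts   # number of cars in each bin
```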
Transforming and recalculating
Often you need to calculate your own metrics. In R, it’s really
easy.
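>> mtcars2 <- mtcars
>> mtcars2$disp_l <- mtcars$disp/61.024 # displacement: cubic inches to liters
>> mtcars2$kml <- 235/mtcars$mpg # consumption: miles per gallon to (approximately) liters per 100 km
>> hist(mtcars2$disp_l)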
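The two conversions can also be wrapped in small helper functions. This is a sketch: the function names and the column name l100km are made up for illustration, and 235.215 is the more precise factor for US gallons and statute miles.

```r
cuin_to_liters <- function(cu_in) cu_in / 61.024   # 1 liter is about 61.024 cubic inches
mpg_to_l100km  <- function(mpg) 235.215 / mpg      # US gallons, statute miles

cars2 <- transform(mtcars,
                   disp_l = cuin_to_liters(disp),
                   l100km = mpg_to_l100km(mpg))
head(cars2[, c("mpg", "l100km", "disp", "disp_l")], 3)
```

Note that consumption in liters per 100 km falls as mpg rises, so the two columns are negatively related by construction.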
Understand the scope of variables
boxplot()
>> boxplot(mtcars)
>> boxplot(mtcars2$disp_l, mtcars2$kml)
>> boxplot(mtcars2$kml, main = "mtcars dataset",
xlab = "Consumption per 100 km", ylab = "Liters")
How to read a boxplot?
boxplot()
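The numbers behind the picture can be printed with boxplot.stats(), which computes what boxplot() draws with its default settings. A short sketch on the hp column:

```r
s <- boxplot.stats(mtcars$hp)
s$stats   # lower whisker, lower hinge (about Q1), median, upper hinge (about Q3), upper whisker
s$out     # points more than 1.5 * IQR away from the box, drawn as individual dots
```

The box spans the middle half of the data, the line inside it is the median, and the whiskers reach to the most extreme points that are not flagged as outliers.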
Does it correlate?
library(corrplot), cor()
>> install.packages("corrplot")
>> library(corrplot)
>> #cor(x, method = "pearson", use = "complete.obs")
>> cor(mtcars)
Not very intuitive…
Does it correlate?
library(corrplot), cor()
>> res <- cor(mtcars)
>> round(res, 2)
>> corrplot(res, type = "upper", order = "hclust",
tl.col = "black", tl.srt = 25)
Be careful!
Correlation is not causation
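To look at one pair more closely, a sketch with base R: the coefficient itself, a rank-based alternative, and a significance test. The choice of mpg and wt is just an example.

```r
r_pearson  <- cor(mtcars$mpg, mtcars$wt)                        # linear association
r_spearman <- cor(mtcars$mpg, mtcars$wt, method = "spearman")   # rank-based, less sensitive to non-linearity

# cor.test() adds a p-value and a confidence interval to the Pearson coefficient
test <- cor.test(mtcars$mpg, mtcars$wt)
test$estimate
test$p.value
```

Even a strong and significant coefficient only says that heavier cars tend to have lower mpg in this sample; it does not by itself explain why.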
Does it correlate?
Heatmap of the correlation matrix
>> col <- colorRampPalette(c("blue", "white", "red"))(20)
>> heatmap(x = res, col = col, symm = TRUE) # heatmap() is part of base R (stats); corrplot is not needed here
Or even deeper insight…
>> require(graphics)
pairs(mtcars2, main = "mtcars2 data", gap = 1/4)
coplot(kml ~ disp_l | as.factor(cyl), data = mtcars2,
panel = panel.smooth, rows = 1)
## possibly more meaningful, e.g., for summary() or bivariate plots:
mtcars2 <- within(mtcars2, {
vs <- factor(vs, labels = c("V", "S"))
am <- factor(am, labels = c("automatic", "manual"))
cyl <- ordered(cyl)
gear <- ordered(gear)
carb <- ordered(carb)
})
summary(mtcars2)
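Why the factor conversion matters, as a small sketch on a separate copy (the name cars_f is an assumption): summary() and grouping behave differently once a column is an ordered factor.

```r
cars_f <- within(mtcars, cyl <- ordered(cyl))

summary(mtcars$cyl)    # numeric column: min, quartiles, mean, max
summary(cars_f$cyl)    # ordered factor: number of cars per cylinder count

# Group-wise statistics become natural once cyl is a factor
mpg_by_cyl <- tapply(cars_f$mpg, cars_f$cyl, mean)
mpg_by_cyl
```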
Or even deeper insight…
>> install.packages("PerformanceAnalytics")
>> library(PerformanceAnalytics)
>> chart.Correlation(mtcars, histogram=TRUE, pch=19)
>> mtcars_small <- mtcars[,1:4]
>> chart.Correlation(mtcars_small, histogram=TRUE, pch=19)
Library PerformanceAnalytics
Bonus - anomaly detection
AnomalyDetectionTs()
The input is a time series or a vector covering at least two periods.
Made by Twitter
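To get a feeling for the idea without installing anything, here is a base R sketch. It is not Twitter's algorithm (the package uses Seasonal Hybrid ESD and accounts for seasonality); it is a much simpler robust z-score check. The function name flag_anomalies, the threshold 3.5 and the synthetic series are all assumptions made for illustration.

```r
# Robust z-score: distance from the median in units of the MAD
flag_anomalies <- function(x, threshold = 3.5) {
  robust_z <- (x - median(x)) / mad(x)   # mad() is scaled to be comparable to the sd for normal data
  which(abs(robust_z) > threshold)
}

set.seed(42)
# Synthetic series: two periods of a sine wave plus noise, with two injected spikes
x <- sin(seq(0, 4 * pi, length.out = 200)) + rnorm(200, sd = 0.1)
x[c(50, 150)] <- x[c(50, 150)] + c(5, -5)

flag_anomalies(x)   # positions of the flagged points
```

Because this check ignores the periodic pattern, it only catches spikes that stand out from the whole series, which is exactly the limitation a seasonal method addresses.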
What next?
To create customizable dashboards, try Shiny.
For Tableau-like drag-and-drop visualization in R, use esquisse.
Stay in touch
Instarea s.r.o.
29. Augusta 36/A
811 09 Bratislava
www.instarea.com
Martin Bago
Data Scientist
Instarea
martin.bago@instarea.com
+421 905 255 852
https://www.linkedin.com/in/martinbago/
Thank you!

Exploratory data analysis in R - Data Science Club
