Introduction to Data Mining with R 
and Data Import/Export in R 
Yanchang Zhao 
http://www.RDataMining.com 
30 September 2014 

Questions
- Do you know data mining and its algorithms and techniques?
- Have you heard of R?
- Have you used R in your research or projects?

Outline 
Introduction to R 
R Packages and Functions for Data Mining 
Data Import and Export 
Online Resources 

What is R?
- R [1] is a free software environment for statistical computing and graphics.
- R can be easily extended with 5,800+ packages available on CRAN [2] (as of 13 Sept 2014).
- Many other packages provided on Bioconductor [3], R-Forge [4], GitHub [5], etc.
- R manuals on CRAN [6]
  - An Introduction to R
  - The R Language Definition
  - R Data Import/Export
  - ...
[1] http://www.r-project.org/
[2] http://cran.r-project.org/
[3] http://www.bioconductor.org/
[4] http://r-forge.r-project.org/
[5] https://github.com/
[6] http://cran.r-project.org/manuals.html

Why R?
- R is widely used in both academia and industry.
- R was ranked no. 1 in the KDnuggets 2014 poll on Top Languages for analytics, data mining, data science [7] (R was also no. 1 in 2011, 2012 and 2013!).
- The CRAN Task Views [8] provide collections of packages for different tasks.
  - Machine learning & statistical learning
  - Cluster analysis & finite mixture models
  - Time series analysis
  - Multivariate statistics
  - Analysis of spatial data
  - ...
[7] http://www.kdnuggets.com/polls/2014/languages-analytics-data-mining-data-science.html
[8] http://cran.r-project.org/web/views/

Classification with R
- Decision trees: rpart, party
- Random forest: randomForest, party
- SVM: e1071, kernlab
- Neural networks: nnet, neuralnet, RSNNS
- Performance evaluation: ROCR

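As a minimal sketch of the decision-tree entry in the classification list, the code below fits a tree with rpart() on the built-in iris data. The dataset, the variable names and the choice to score the model on its own training data are illustrative assumptions, not part of the original slides.

```r
# Illustrative sketch: a decision tree with rpart on the built-in iris data.
library(rpart)

# Fit a classification tree predicting Species from the four measurements.
fit <- rpart(Species ~ ., data = iris, method = "class")

# Predict class labels for the training data (for illustration only;
# a real evaluation would use held-out data).
pred <- predict(fit, newdata = iris, type = "class")

# Confusion matrix and training accuracy.
conf_mat <- table(actual = iris$Species, predicted = pred)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(conf_mat)
print(accuracy)
```

The same formula interface is used by several of the other packages listed, so swapping in another classifier usually changes only the fitting call.
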
Clustering with R
- k-means: kmeans(), kmeansruns() [9]
- k-medoids: pam(), pamk()
- Hierarchical clustering: hclust(), agnes(), diana()
- DBSCAN: fpc
- BIRCH: birch
[9] Names followed by () are functions; the others are packages.

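The following sketch shows how two of the listed functions, kmeans() and hclust(), might be called using only base R. The iris data, the choice of three clusters, the average-linkage method and the seed are assumptions made for illustration.

```r
# Illustrative sketch: k-means and hierarchical clustering on iris measurements.
x <- iris[, 1:4]  # drop the Species label; clustering is unsupervised

# k-means with 3 clusters; nstart restarts reduce sensitivity to initial centres.
set.seed(42)  # arbitrary seed so the run is repeatable
km <- kmeans(x, centers = 3, nstart = 10)

# Agglomerative hierarchical clustering on Euclidean distances,
# then cut the dendrogram into 3 groups.
hc <- hclust(dist(x), method = "average")
hc_groups <- cutree(hc, k = 3)

# Cross-tabulate each clustering against the known species.
print(table(iris$Species, km$cluster))
print(table(iris$Species, hc_groups))
```
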
Association Rule Mining with R
- Association rules: apriori(), eclat() in package arules
- Sequential patterns: arulesSequences
- Visualisation of associations: arulesViz

Text Mining with R
- Text mining: tm
- Topic modelling: topicmodels, lda
- Word cloud: wordcloud
- Twitter data access: twitteR

Time Series Analysis with R
- Time series decomposition: decomp(), decompose(), arima(), stl()
- Time series forecasting: forecast
- Time series clustering: TSclust
- Dynamic Time Warping (DTW): dtw

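To make the decomposition entry concrete, here is a small sketch using decompose() and stl() from base R on the built-in AirPassengers series. The dataset, the log transform and the s.window setting are illustrative assumptions rather than choices made in the slides.

```r
# Illustrative sketch: decomposing the built-in AirPassengers monthly series.
x <- log(AirPassengers)  # log transform so an additive model is reasonable

# Classical decomposition via moving averages.
dec <- decompose(x, type = "additive")

# Seasonal-trend decomposition using loess, with a fixed seasonal pattern.
fit <- stl(x, s.window = "periodic")

# Each result holds seasonal, trend and remainder components.
print(head(dec$seasonal, 12))
print(head(fit$time.series))
```

Calling plot(dec) or plot(fit) draws the components as stacked panels, which is usually the quickest way to inspect them.
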
Social Network Analysis with R
- Packages: igraph, sna
- Centrality measures: degree(), betweenness(), closeness(), transitivity()
- Clusters: clusters(), no.clusters()
- Cliques: cliques(), largest.cliques(), maximal.cliques(), clique.number()
- Community detection: fastgreedy.community(), spinglass.community()

R and Big Data
- Hadoop
  - Hadoop (or YARN) - a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models
  - R packages: RHadoop, RHIPE
- Spark
  - Spark - a fast and general engine for large-scale data processing, reported to run up to 100 times faster than Hadoop MapReduce when the data fits in memory
  - SparkR - R frontend for Spark
- H2O
  - H2O - an open-source in-memory prediction engine for big data science
  - R package: h2o
- MongoDB
  - MongoDB - an open-source document database
  - R packages: rmongodb, RMongo

R and Hadoop
- Packages: RHadoop, RHive
- RHadoop [10] is a collection of R packages:
  - rmr2 - perform data analysis with R via MapReduce on a Hadoop cluster
  - rhdfs - connect to Hadoop Distributed File System (HDFS)
  - rhbase - connect to the NoSQL HBase database
  - ...
- You can try it on a single PC (in standalone or pseudo-distributed mode), and code developed there will also work on a cluster of PCs (in fully distributed mode)!
- Step-by-Step Guide to Setting Up an R-Hadoop System: http://www.rdatamining.com/big-data/r-hadoop-setup-guide
[10] https://github.com/RevolutionAnalytics/RHadoop/wiki

Data Import and Export [11]
Read data from and write data to
- R native formats (incl. Rdata and RDS)
- CSV files
- Excel files
- ODBC databases
- SAS databases
R Data Import/Export:
- http://cran.r-project.org/doc/manuals/R-data.pdf
[11] Chapter 2: Data Import and Export, in the book R and Data Mining: Examples and Case Studies. http://www.rdatamining.com/docs/RDataMining.pdf

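For the CSV entry in the list of formats, a round trip with base R's write.csv() and read.csv() could look like the sketch below. The data frame, its column names and the temporary file path are hypothetical and chosen only for illustration.

```r
# Illustrative sketch: CSV round trip with base R; all names are hypothetical.
df <- data.frame(id = 1:3,
                 name = c("alice", "bob", "carol"),
                 score = c(1.5, 2.25, 3),
                 stringsAsFactors = FALSE)

csv_path <- tempfile(fileext = ".csv")  # temporary file, removed below

# row.names = FALSE avoids writing an extra column of row numbers.
write.csv(df, file = csv_path, row.names = FALSE)

# stringsAsFactors = FALSE keeps text columns as character vectors.
df2 <- read.csv(csv_path, stringsAsFactors = FALSE)

print(df2)
unlink(csv_path)
```

Setting stringsAsFactors explicitly keeps the result the same across R versions, since the default changed in R 4.0.0.
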
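Save and Load R Objects
- save(): save R objects into a .Rdata file
- load(): read R objects from a .Rdata file
- rm(): remove objects from R
# create an object and save it (assumes the ./data directory exists)
a <- 1:10
save(a, file = "./data/dumData.Rdata")
# remove it from the workspace, so it can no longer be found
rm(a)
a
## Error: object 'a' not found
# load() restores the object under its original name
load("./data/dumData.Rdata")
a
##  [1]  1  2  3  4  5  6  7  8  9 10
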
Save and Load R Objects - More Functions
- save.image(): save the current workspace to a file. It saves everything!
- readRDS(): read a single R object from a .rds file
- saveRDS(): save a single R object to a file
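
A short sketch of the saveRDS()/readRDS() pair, for comparison with save()/load(): the object and the temporary file path are hypothetical, chosen only to keep the example self-contained.

```r
# Illustrative sketch: saveRDS()/readRDS() round trip; names are hypothetical.
a <- 1:10
rds_path <- tempfile(fileext = ".rds")  # temporary file, removed below

# saveRDS() stores the object itself, not its name.
saveRDS(a, file = rds_path)

# readRDS() returns the object, so it can be assigned to any name.
b <- readRDS(rds_path)
print(identical(a, b))

unlink(rds_path)
```

Unlike load(), which recreates objects under their saved names, readRDS() leaves naming to the caller, which avoids accidentally overwriting an existing object.
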

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 

Introduction to Data Mining with R and Data Import/Export

  • 1. Introduction to Data Mining with R and Data Import/Export in R. Yanchang Zhao, http://www.RDataMining.com, 30 September 2014.
  • 2. Questions
      • Do you know data mining and its algorithms and techniques?
      • Have you heard of R?
      • Have you used R in your research or projects?
  • 5. Outline
      • Introduction to R
      • R Packages and Functions for Data Mining
      • Data Import and Export
      • Online Resources
  • 6. What is R?
      • R (http://www.r-project.org/) is a free software environment for statistical computing and graphics.
      • R can be easily extended with 5,800+ packages available on CRAN (http://cran.r-project.org/), as of 13 Sept 2014.
      • Many other packages are provided on Bioconductor (http://www.bioconductor.org/), R-Forge (http://r-forge.r-project.org/), GitHub (https://github.com/), etc.
      • R manuals on CRAN (http://cran.r-project.org/manuals.html):
          • An Introduction to R
          • The R Language Definition
          • R Data Import/Export
          • . . .
  • 8. Why R?
      • R is widely used in both academia and industry.
      • R was ranked no. 1 in the KDnuggets 2014 poll on Top Languages for analytics, data mining, data science (http://www.kdnuggets.com/polls/2014/languages-analytics-data-mining-data-science.html). It was also no. 1 in 2011, 2012 and 2013.
      • The CRAN Task Views (http://cran.r-project.org/web/views/) provide collections of packages for different tasks:
          • Machine learning & statistical learning
          • Cluster analysis & finite mixture models
          • Time series analysis
          • Multivariate statistics
          • Analysis of spatial data
          • . . .
  • 12. Classification with R
      • Decision trees: rpart, party
      • Random forest: randomForest, party
      • SVM: e1071, kernlab
      • Neural networks: nnet, neuralnet, RSNNS
      • Performance evaluation: ROCR
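As a minimal sketch of the decision-tree entry in the classification list above, the following fits a tree with rpart on the built-in iris data. The 100/50 train/test split, the seed and the variable names are assumptions made for illustration; they do not come from the slides, and the code assumes rpart is installed (it ships with standard R distributions as a recommended package).

```r
# Sketch only: fit a decision tree on the built-in iris data with rpart.
library(rpart)

set.seed(42)                              # reproducible train/test split
train_idx <- sample(nrow(iris), 100)      # 100 rows for training (arbitrary choice)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Classification tree predicting Species from all other columns
fit <- rpart(Species ~ ., data = train, method = "class")

# Predicted class labels for the held-out rows
pred <- predict(fit, newdata = test, type = "class")

# Confusion matrix and accuracy on the test set
conf_mat <- table(predicted = pred, actual = test$Species)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(conf_mat)
print(accuracy)
```

The same train/predict/tabulate pattern should carry over to the other listed packages, although each has its own fitting function and arguments.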
  • 13. Clustering with R
      • k-means: kmeans(), kmeansruns()
      • k-medoids: pam(), pamk()
      • Hierarchical clustering: hclust(), agnes(), diana()
      • DBSCAN: fpc
      • BIRCH: birch
      Note: names followed by () are functions; the others are packages.
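To illustrate two of the clustering functions listed above, here is a small sketch that uses only base R: kmeans() and hclust() on the numeric columns of the built-in iris data. Choosing three clusters, average linkage and this particular seed are assumptions made for the example.

```r
# Sketch only: k-means and hierarchical clustering on the numeric iris columns.
x <- iris[, 1:4]

set.seed(42)                                   # k-means starts from random centres
km <- kmeans(x, centers = 3, nstart = 10)      # k = 3 is an assumption for this data

# Hierarchical clustering on Euclidean distances, cut into 3 groups
hc <- hclust(dist(x), method = "average")
hc_groups <- cutree(hc, k = 3)

# Compare each clustering with the known species labels
print(table(km$cluster, iris$Species))
print(table(hc_groups, iris$Species))
```

Cluster labels are arbitrary integers, so the cross-tabulations are read by looking at which species dominates each cluster rather than by matching label numbers.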
  • 14. Association Rule Mining with R
      • Association rules: apriori(), eclat() in package arules
      • Sequential patterns: arulesSequence
      • Visualisation of associations: arulesViz
  • 15. Text Mining with R
      • Text mining: tm
      • Topic modelling: topicmodels, lda
      • Word cloud: wordcloud
      • Twitter data access: twitteR
  • 16. Time Series Analysis with R
      • Time series decomposition: decomp(), decompose(), arima(), stl()
      • Time series forecasting: forecast
      • Time series clustering: TSclust
      • Dynamic Time Warping (DTW): dtw
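The decomposition functions named above can be tried without extra packages. The sketch below applies decompose() and stl() from base R to the built-in AirPassengers series. The choice of dataset, the multiplicative model and the log transform are assumptions made for illustration.

```r
# Sketch only: decompose the built-in monthly AirPassengers series.
ap <- AirPassengers                          # ts object, frequency = 12

# Classical decomposition into trend, seasonal and random components
dec <- decompose(ap, type = "multiplicative")

# Loess-based decomposition; taking logs turns the multiplicative
# seasonality into an additive one, which is what stl() expects
fit <- stl(log(ap), s.window = "periodic")

# The seasonal figure has one value per month
print(round(dec$figure, 3))
print(head(fit$time.series))
```

Calling plot(dec) or plot(fit) draws the components as stacked panels, which is usually the quickest way to inspect the result.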
  • 17. Social Network Analysis with R
      • Packages: igraph, sna
      • Centrality measures: degree(), betweenness(), closeness(), transitivity()
      • Clusters: clusters(), no.clusters()
      • Cliques: cliques(), largest.cliques(), maximal.cliques(), clique.number()
      • Community detection: fastgreedy.community(), spinglass.community()
  • 18. R and Big Data
      • Hadoop
          • Hadoop (or YARN): a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models
          • R packages: RHadoop, RHIPE
      • Spark
          • Spark: a fast and general engine for large-scale data processing, which its developers describe as up to 100 times faster than Hadoop MapReduce for in-memory workloads
          • SparkR: R frontend for Spark
      • H2O
          • H2O: an open source in-memory prediction engine for big data science
          • R package: h2o
      • MongoDB
          • MongoDB: an open-source document database
          • R packages: rmongodb, RMongo
  • 19. R and Hadoop
      • Packages: RHadoop, RHive
      • RHadoop (https://github.com/RevolutionAnalytics/RHadoop/wiki) is a collection of R packages:
          • rmr2: perform data analysis with R via MapReduce on a Hadoop cluster
          • rhdfs: connect to the Hadoop Distributed File System (HDFS)
          • rhbase: connect to the NoSQL HBase database
          • . . .
      • You can try it on a single PC (in standalone or pseudo-distributed mode), and code developed there will also work on a cluster of PCs (in fully distributed mode).
      • Step-by-Step Guide to Setting Up an R-Hadoop System: http://www.rdatamining.com/big-data/r-hadoop-setup-guide
  • 21. Data Import and Export
      Read data from and write data to:
      • R native formats (incl. Rdata and RDS)
      • CSV files
      • EXCEL files
      • ODBC databases
      • SAS databases
      See Chapter 2, "Data Import and Export", in the book R and Data Mining: Examples and Case Studies (http://www.rdatamining.com/docs/RDataMining.pdf), and the R Data Import/Export manual (http://cran.r-project.org/doc/manuals/R-data.pdf).
  • 24. Save and Load R Objects
      • save(): save R objects into a .Rdata file
      • load(): read R objects from a .Rdata file
      • rm(): remove objects from R

        a <- 1:10
        save(a, file = "./data/dumData.Rdata")
        rm(a)
        a
        ## Error: object 'a' not found
        load("./data/dumData.Rdata")
        a
        ## [1] 1 2 3 4 5 6 7 8 9 10
  • 27. Save and Load R Objects: More Functions
      • save.image(): save the current workspace to a file. It saves everything!
      • readRDS(): read a single R object from a .rds file
      • saveRDS(): save a single R object to a file
      • Advantage of readRDS() and saveRDS(): you can restore the data under a different object name.
      • Advantage of load() and save(): you can save multiple R objects to one file.
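The trade-off between the two pairs of functions can be seen in a short round trip. This is a sketch using temporary files; the object names, the contents and the use of tempfile() are assumptions made for the example.

```r
# Sketch only: contrast saveRDS()/readRDS() with save()/load().
a <- 1:10
b <- letters[1:3]

rds_file   <- tempfile(fileext = ".rds")
rdata_file <- tempfile(fileext = ".Rdata")

saveRDS(a, rds_file)              # one object, stored without its name
save(a, b, file = rdata_file)     # several objects, stored with their names

rm(a, b)

restored <- readRDS(rds_file)     # the caller chooses the new name
loaded   <- load(rdata_file)      # recreates 'a' and 'b'; returns their names

print(restored)
print(loaded)
```

Because load() silently recreates objects under their saved names, it can overwrite existing objects in the workspace; readRDS() avoids that by returning the value for explicit assignment.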
  • 32. Import from and Export to .CSV Files
      • write.csv(): write an R object to a .CSV file
      • read.csv(): read data from a .CSV file into a data frame

        # create a data frame
        var1 <- 1:5
        var2 <- (1:5)/10
        var3 <- c("R", "and", "Data Mining", "Examples", "Case Studies")
        df1 <- data.frame(var1, var2, var3)
        names(df1) <- c("VarInt", "VarReal", "VarChar")
        # save to a csv file
        write.csv(df1, "./data/dummmyData.csv", row.names = FALSE)
        # read from a csv file
        df2 <- read.csv("./data/dummmyData.csv")
        print(df2)
        ##   VarInt VarReal      VarChar
        ## 1      1     0.1            R
        ## 2      2     0.2          and
        ## 3      3     0.3  Data Mining
        ## 4      4     0.4     Examples
        ## 5      5     0.5 Case Studies
  • 35. Import from and Export to EXCEL Files
      Package xlsx: read, write and format Excel 2007 and Excel 97/2000/XP/2003 files.

        library(xlsx)
        xlsx.file <- "./data/dummmyData.xlsx"
        write.xlsx(df2, xlsx.file, sheetName = "sheet1", row.names = FALSE)
        df3 <- read.xlsx(xlsx.file, sheetName = "sheet1")
        df3
        ##   VarInt VarReal      VarChar
        ## 1      1     0.1            R
        ## 2      2     0.2          and
        ## 3      3     0.3  Data Mining
        ## 4      4     0.4     Examples
        ## 5      5     0.5 Case Studies
  • 37. Read from Databases
      • Package RODBC: provides connection to ODBC databases.
      • odbcConnect(): sets up a connection to a database
      • sqlQuery(): sends an SQL query to the database
      • odbcClose(): closes the connection

        library(RODBC)
        db <- odbcConnect(dsn = "servername", uid = "userid", pwd = "******")
        sql <- "SELECT * FROM lib.table WHERE ..."
        # or read query from file
        sql <- readChar("myQuery.sql", nchars = 99999)
        myData <- sqlQuery(db, sql, errors = TRUE)
        odbcClose(db)

      Functions sqlFetch(), sqlSave() and sqlUpdate(): read, write or update a table in an ODBC database.
  • 39. Import Data from SAS
      Package foreign provides function read.ssd() for importing SAS datasets (.sas7bdat files) into R.

        library(foreign) # for importing SAS data
        # the path of SAS on your computer
        sashome <- "C:/Program Files/SAS/SASFoundation/9.2"
        filepath <- "./data"
        # filename should be no more than 8 characters, without extension
        fileName <- "dumData"
        # read data from a SAS dataset
        a <- read.ssd(file.path(filepath), fileName, sascmd = file.path(sashome, "sas.exe"))

      Another way: use function read.xport() to read a file in SAS Transport (XPORT) format.
  • 45. Online Resources
      • RDataMining website: http://www.rdatamining.com
          • R Reference Card for Data Mining
          • R and Data Mining: Examples and Case Studies
      • RDataMining Group on LinkedIn (7,000+ members): http://group.rdatamining.com
      • RDataMining on Twitter (1,700+ followers): @RDataMining
      • Free online courses: http://www.rdatamining.com/resources/courses
      • Online documents: http://www.rdatamining.com/resources/onlinedocs
  • 46. The End. Thanks! Email: yanchang(at)rdatamining.com