Import and export Big Data using R Studio
Rupak Roy
Working with Big Data
R offers two ways to work with big data: R-Hadoop functions, or R's
built-in base packages and functions, which work in the system's RAM.
The limitation of R's built-in base functions is that they can only
handle as much data as fits in the available RAM, so more system
memory allows larger data sets and better performance.
A common memory-related error in R is "cannot allocate vector of
size ...", which is raised when an allocation exceeds the available memory.
R developers have therefore created special packages and functions
that handle big data in R through better memory management.
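As a rough illustration of why RAM is the limit, the sketch below estimates the memory a numeric vector needs (8 bytes per double value) and compares it with what object.size() reports. The vector length is an arbitrary value chosen for the example, not something from the slides.

```r
# Estimate the in-memory size of a numeric vector: each double takes 8 bytes.
n <- 1e6                               # assumed number of values (illustrative)
estimated_bytes <- 8 * n

x <- numeric(n)                        # allocate the vector
actual_bytes <- as.numeric(object.size(x))

# The actual size is the data plus a small fixed header.
cat("estimated:", estimated_bytes, "bytes\n")
cat("actual:   ", actual_bytes, "bytes\n")
```

By the same arithmetic, a single numeric column with 100 million values needs roughly 800 MB, which is why large files can fail to load on machines with little free memory.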
R-Hadoop
R-Hadoop is a way to integrate the R programming language with
Hadoop.
Because base R can only handle data that fits in the system's
available RAM, R-Hadoop uses special packages and functions to send
user instructions back and forth and process them with the Hadoop
framework.
The reasons why R with Hadoop is a good fit for big data analytics:
• R is an interactive language.
• It is also useful for advanced data visualizations.
• It can easily implement statistical programming features such as
predictive analysis.
# To learn more about integrating R and Hadoop, follow our
Big Data Analytics module.
fread()
R's special packages and functions to read big data:
1. fread(): similar to read.table in functionality, but faster and
more efficient, with more parameters.
Controls such as sep, colClasses and nrows are detected automatically.
Integer data types are also detected and read directly. Dates
are read as character and can be converted afterwards using a
date-time package or standard base R functions.
> bigdata <- fread(input, sep = "auto", header = "auto", nrows = -1L,
stringsAsFactors = FALSE, ...)
where
input = name of the file to read
nrows = the number of rows to read; -1L means all rows (recent data.table versions use Inf as the default, with the same meaning).
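The date conversion mentioned above can be sketched with base R alone. The data frame below is hypothetical and stands in for the result of a reader that returned the date column as character; the column names and values are assumptions made for illustration.

```r
# Assume a reader returned the date column as character (hypothetical data).
sales <- data.frame(
  order_date = c("2024-01-15", "2024-02-01", "2024-03-10"),
  amount     = c(120.5, 80.0, 42.25),
  stringsAsFactors = FALSE
)

# Convert the character column to Date with base R.
sales$order_date <- as.Date(sales$order_date, format = "%Y-%m-%d")
str(sales)

# Date arithmetic now works, e.g. days between the first and last order.
span_days <- as.numeric(diff(range(sales$order_date)))
print(span_days)
```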
Base functions vs fread()
Using the standard base R function:
> system.time(store <- read.table("store.csv", header = TRUE, sep = ",", fill = TRUE,
nrows = 28000))
where fill = TRUE means that if rows have unequal length, blank fields are
implicitly added.
user system elapsed
0.50 0.00 0.52 #output
-------------------------------------------------------------
> install.packages("data.table") #if the package is not installed
> library(data.table) #load the data.table package, which provides fread()
> system.time(store <- fread("store.csv", header = "auto", sep = "auto",
nrows = 28000))
user system elapsed
0.05 0.00 0.05 #output
system.time() reports the CPU and elapsed time taken to execute the code.
fread() is a fast alternative to read.table for reading big data
efficiently. To learn more about the features of fread(), use
> ?data.table::fread
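The comparison above depends on a store.csv file that is not included with the slides. A minimal, reproducible sketch is shown below, assuming a generated file with hypothetical columns (id, store, sales); the fread() part runs only if data.table is installed, and the timings will differ from the slide's numbers.

```r
# Generate a throwaway CSV so the comparison is reproducible (hypothetical data).
set.seed(1)
n <- 50000
demo <- data.frame(id    = seq_len(n),
                   store = sample(c("A", "B", "C"), n, replace = TRUE),
                   sales = round(runif(n, 0, 1000), 2))
csv_path <- tempfile(fileext = ".csv")
write.csv(demo, csv_path, row.names = FALSE)

# Base R reader.
t_base <- system.time(
  base_df <- read.table(csv_path, header = TRUE, sep = ",", fill = TRUE)
)
print(t_base)

# data.table reader, only if the package is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  t_fread <- system.time(fast_df <- data.table::fread(csv_path))
  print(t_fread)
  stopifnot(nrow(fast_df) == nrow(base_df))   # both readers see the same rows
}
```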
read.csv.sql()
2. read.csv.sql(): reads a file while filtering it with an SQL
statement, so that only the selected rows are loaded and large files can be handled in R.
> bigdata <- read.csv.sql(file, sql = "...", header = TRUE, sep = ",", nrows,
row.names, skip, ...)
where
file = name of the file to read
sql = SQL statement used to filter the data
header, sep = as in read.csv
nrows, row.names, skip = as in read.csv
Base functions vs read.csv.sql()
Using the standard base R function:
> system.time(crimedata <- read.table("crime_data.csv", header = TRUE,
sep = ","))
user system elapsed
0.00 0.00 0.08 #output
-------------------------------------------------------------
> install.packages("sqldf") #if the package is not installed
> library(sqldf) #load the sqldf package, which provides read.csv.sql()
> system.time(crimedata <- read.csv.sql("crime_data.csv", sql = "select
* from file where Assault >= 10", header = TRUE, sep = ","))
user system elapsed
0.05 0.00 0.05 #output
read.csv.sql() takes arguments similar to read.csv but adds the
features of SQL (Structured Query Language) to select a subset of the
data, which makes large files manageable. To learn more about the features of
read.csv.sql(), use > ?sqldf::read.csv.sql
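A small self-contained sketch of the same filter is given below. The crime file and its columns (State, Assault) are made up for illustration. The base R result serves as a reference, and the read.csv.sql() call runs only if sqldf is installed.

```r
# Small hypothetical crime file; the column names are assumptions.
crime <- data.frame(State   = c("A", "B", "C", "D"),
                    Assault = c(5L, 10L, 25L, 8L))
csv_path <- tempfile(fileext = ".csv")
write.csv(crime, csv_path, row.names = FALSE, quote = FALSE)

# Reference result using base R: read everything, then filter in memory.
all_rows <- read.csv(csv_path)
base_filtered <- all_rows[all_rows$Assault >= 10, ]
print(base_filtered)

# Same filter pushed into SQL, only if sqldf is installed.
if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  sql_filtered <- read.csv.sql(
    csv_path,
    sql = "select * from file where Assault >= 10",
    header = TRUE, sep = ","
  )
  stopifnot(nrow(sql_filtered) == nrow(base_filtered))
}
```

The difference is where the filtering happens: the base R version loads every row into memory first, while the SQL version hands R only the rows that match.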
read.csv.ffdf()
3. read.csv.ffdf(): reads input file data into ffdf (ff data frame) objects,
very much like (and using) read.csv and read.table but with more
effective memory management than the standard functions.
> bigdata <- read.csv.ffdf(file = "file.csv", header = FALSE, VERBOSE = TRUE,
first.rows = 30000, next.rows = 30000)
where
file = the name of the file from which the data are to be read
VERBOSE = show timings for each processed chunk (default FALSE)
first.rows = number of rows to be read in the first chunk
next.rows = number of rows to be read in further chunks
Base functions vs read.csv.ffdf()
> install.packages("ff") #if the package is not installed
> library(ff) #load the ff package, which provides read.csv.ffdf()
> system.time(bigdata <- read.csv.ffdf(file = "store.csv", header = TRUE, VERBOSE = TRUE,
first.rows = 40000, next.rows = 9000,
colClasses = c("factor", "factor", "factor", "numeric", "factor")))
In the verbose output, the first chunk (rows 1 to 40,000) took
0.47 sec, the next 9,000 rows (40,001 to 49,000) took 0.19 sec, and so on.
read.table.ffdf() works with convenience wrappers such as read.csv
and reads large files in row chunks. By default the first chunk is read with
1,000 rows; the size of subsequent chunks is adjusted to RAM availability. To learn more
about the features of read.table.ffdf(), use > ?ff::read.table.ffdf
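The idea of reading a file in row chunks can be sketched in base R with an open file connection. This only illustrates the chunking concept behind first.rows and next.rows; it is not how ff is implemented, and the file, chunk size, and column names are assumptions.

```r
# Hypothetical file with 25 rows, read in chunks of 10.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:25, value = (1:25) * 2), csv_path, row.names = FALSE)

con <- file(csv_path, open = "r")
header_line <- readLines(con, n = 1)                      # consume the header
col_names <- gsub('"', "", strsplit(header_line, ",")[[1]])

chunk_size <- 10
rows_seen <- 0
total <- 0
repeat {
  lines <- readLines(con, n = chunk_size)                 # next block of rows
  if (length(lines) == 0) break                           # end of file
  chunk <- read.table(text = lines, sep = ",", col.names = col_names)
  rows_seen <- rows_seen + nrow(chunk)
  total <- total + sum(chunk$value)                       # per-chunk work; chunk is then discarded
}
close(con)

cat("rows:", rows_seen, "sum of value:", total, "\n")
```

Only one chunk is held in memory at a time, so peak memory depends on the chunk size rather than on the file size.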
Exporting Big Data
Base R functions such as write.csv() and write.table() can also be
used to export big data.
In addition, write.csv.ffdf() exports ffdf objects (ff data frames)
to text files.
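A minimal export sketch with the base R functions named above follows. The data frame and the temporary file paths are hypothetical; the CSV is read back to confirm the round trip.

```r
# Hypothetical data frame to export.
out <- data.frame(id = 1:3, city = c("Pune", "Delhi", "Goa"),
                  stringsAsFactors = FALSE)

csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")

write.csv(out, csv_path, row.names = FALSE)                 # comma-separated
write.table(out, txt_path, sep = "\t", row.names = FALSE)   # tab-separated

# Read the CSV back to confirm the round trip.
back <- read.csv(csv_path, stringsAsFactors = FALSE)
print(back)
```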
Troubleshoot errors
Important points to remember:
1. Error in scan ... line did not have 5 elements.
If rows have unequal length, R throws an error while importing the file.
The solution is to set fill = TRUE, which pads rows of unequal
length with blank fields.
So the correct code is:
> Store <- read.table("store.csv", header = TRUE, sep = ",", nrows = 661,
blank.lines.skip = TRUE, fill = TRUE)
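The error and its fix can be reproduced on a tiny, made-up input. The three-column text below is an assumption for illustration; its second data row is missing the last field.

```r
# Hypothetical ragged input: the second data row is missing its last field.
ragged <- "a,b,c\n1,2,3\n4,5\n"

# Without fill, read.table stops with a scan() error, captured here.
res <- tryCatch(
  read.table(text = ragged, header = TRUE, sep = ","),
  error = function(e) e
)
failed <- inherits(res, "error")
if (failed) cat("import failed:", conditionMessage(res), "\n")

# With fill = TRUE the short row is padded with NA.
fixed <- read.table(text = ragged, header = TRUE, sep = ",", fill = TRUE)
print(fixed)
```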
Troubleshoot errors
2. Error in ff ... vmode 'character' not implemented.
This happens because ff does not support character vectors, so such columns need to be stored as
factors. The disadvantage is that the factor levels are held in RAM, so a
large number of levels might cause memory problems.
Specifying integer also does not work here.
So the correct code is:
> bigdata <- read.csv.ffdf(file = "store.csv", header = TRUE, VERBOSE = TRUE, first.rows
= 40000, next.rows = 9000, colClasses = c("factor", "factor", "factor",
"numeric", "factor"))
Next:
We will learn how to import, export and directly read
the worksheets of an Excel file.
Import and export Big Data
Rupak Roy

Weitere ähnliche Inhalte

Was ist angesagt?

Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...Simplilearn
 
Organized and disorganized complexity
Organized and disorganized complexityOrganized and disorganized complexity
Organized and disorganized complexityVostrikov Arkady
 
Identifying classes and objects ooad
Identifying classes and objects ooadIdentifying classes and objects ooad
Identifying classes and objects ooadMelba Rosalind
 
Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)Ali Raw
 
The Object Model
The Object Model  The Object Model
The Object Model yndaravind
 
Visual pattern recognition
Visual pattern recognitionVisual pattern recognition
Visual pattern recognitionRushin Shah
 
The origin and evaluation criteria of aes
The origin and evaluation criteria of aesThe origin and evaluation criteria of aes
The origin and evaluation criteria of aesMDKAWSARAHMEDSAGAR
 
Basics of Cryptography - Stream ciphers and PRNG
Basics of Cryptography - Stream ciphers and PRNGBasics of Cryptography - Stream ciphers and PRNG
Basics of Cryptography - Stream ciphers and PRNGjulien pauli
 
CLUSTER SILHOUETTES.pptx
CLUSTER SILHOUETTES.pptxCLUSTER SILHOUETTES.pptx
CLUSTER SILHOUETTES.pptxagniva pradhan
 
Type checking in compiler design
Type checking in compiler designType checking in compiler design
Type checking in compiler designSudip Singh
 
Hierarchical clustering in Python and beyond
Hierarchical clustering in Python and beyondHierarchical clustering in Python and beyond
Hierarchical clustering in Python and beyondFrank Kelly
 
Chap8 basic cluster_analysis
Chap8 basic cluster_analysisChap8 basic cluster_analysis
Chap8 basic cluster_analysisguru_prasadg
 
Authentication Application in Network Security NS4
Authentication Application in Network Security NS4Authentication Application in Network Security NS4
Authentication Application in Network Security NS4koolkampus
 
Introduction to text classification using naive bayes
Introduction to text classification using naive bayesIntroduction to text classification using naive bayes
Introduction to text classification using naive bayesDhwaj Raj
 
Variable scope ppt in vb6
Variable scope ppt in vb6Variable scope ppt in vb6
Variable scope ppt in vb6AmanHooda4
 

Was ist angesagt? (20)

Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
 
Organized and disorganized complexity
Organized and disorganized complexityOrganized and disorganized complexity
Organized and disorganized complexity
 
Identifying classes and objects ooad
Identifying classes and objects ooadIdentifying classes and objects ooad
Identifying classes and objects ooad
 
Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)
 
The Object Model
The Object Model  The Object Model
The Object Model
 
Visual pattern recognition
Visual pattern recognitionVisual pattern recognition
Visual pattern recognition
 
Graph mining ppt
Graph mining pptGraph mining ppt
Graph mining ppt
 
Lex
LexLex
Lex
 
The origin and evaluation criteria of aes
The origin and evaluation criteria of aesThe origin and evaluation criteria of aes
The origin and evaluation criteria of aes
 
Basics of Cryptography - Stream ciphers and PRNG
Basics of Cryptography - Stream ciphers and PRNGBasics of Cryptography - Stream ciphers and PRNG
Basics of Cryptography - Stream ciphers and PRNG
 
CLUSTER SILHOUETTES.pptx
CLUSTER SILHOUETTES.pptxCLUSTER SILHOUETTES.pptx
CLUSTER SILHOUETTES.pptx
 
8 statement level
8 statement level8 statement level
8 statement level
 
Support vector machine-SVM's
Support vector machine-SVM'sSupport vector machine-SVM's
Support vector machine-SVM's
 
Unit 4
Unit 4Unit 4
Unit 4
 
Type checking in compiler design
Type checking in compiler designType checking in compiler design
Type checking in compiler design
 
Hierarchical clustering in Python and beyond
Hierarchical clustering in Python and beyondHierarchical clustering in Python and beyond
Hierarchical clustering in Python and beyond
 
Chap8 basic cluster_analysis
Chap8 basic cluster_analysisChap8 basic cluster_analysis
Chap8 basic cluster_analysis
 
Authentication Application in Network Security NS4
Authentication Application in Network Security NS4Authentication Application in Network Security NS4
Authentication Application in Network Security NS4
 
Introduction to text classification using naive bayes
Introduction to text classification using naive bayesIntroduction to text classification using naive bayes
Introduction to text classification using naive bayes
 
Variable scope ppt in vb6
Variable scope ppt in vb6Variable scope ppt in vb6
Variable scope ppt in vb6
 

Ähnlich wie Import and Export Big Data using R Studio

Get started with R lang
Get started with R langGet started with R lang
Get started with R langsenthil0809
 
R hive tutorial - udf, udaf, udtf functions
R hive tutorial - udf, udaf, udtf functionsR hive tutorial - udf, udaf, udtf functions
R hive tutorial - udf, udaf, udtf functionsAiden Seonghak Hong
 
map reduce Technic in big data
map reduce Technic in big data map reduce Technic in big data
map reduce Technic in big data Jay Nagar
 
Data Science - Part II - Working with R & R studio
Data Science - Part II -  Working with R & R studioData Science - Part II -  Working with R & R studio
Data Science - Part II - Working with R & R studioDerek Kane
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Serban Tanasa
 
R the unsung hero of Big Data
R the unsung hero of Big DataR the unsung hero of Big Data
R the unsung hero of Big DataDhafer Malouche
 
Import Data using R
Import Data using R Import Data using R
Import Data using R Rupak Roy
 
r,rstats,r language,r packages
r,rstats,r language,r packagesr,rstats,r language,r packages
r,rstats,r language,r packagesAjay Ohri
 
SessionFive_ImportingandExportingData
SessionFive_ImportingandExportingDataSessionFive_ImportingandExportingData
SessionFive_ImportingandExportingDataHellen Gakuruh
 
Import web resources using R Studio
Import web resources using R StudioImport web resources using R Studio
Import web resources using R StudioRupak Roy
 
Pandas-(Ziad).pptx
Pandas-(Ziad).pptxPandas-(Ziad).pptx
Pandas-(Ziad).pptxSivam Chinna
 
Draft sas and r and sas (may, 2018 asa meeting)
Draft sas and r and sas (may, 2018 asa meeting)Draft sas and r and sas (may, 2018 asa meeting)
Draft sas and r and sas (may, 2018 asa meeting)Barry DeCicco
 
data stage-material
data stage-materialdata stage-material
data stage-materialRajesh Kv
 

Ähnlich wie Import and Export Big Data using R Studio (20)

Get started with R lang
Get started with R langGet started with R lang
Get started with R lang
 
Unit 3
Unit 3Unit 3
Unit 3
 
R hive tutorial - udf, udaf, udtf functions
R hive tutorial - udf, udaf, udtf functionsR hive tutorial - udf, udaf, udtf functions
R hive tutorial - udf, udaf, udtf functions
 
map reduce Technic in big data
map reduce Technic in big data map reduce Technic in big data
map reduce Technic in big data
 
Data Science - Part II - Working with R & R studio
Data Science - Part II -  Working with R & R studioData Science - Part II -  Working with R & R studio
Data Science - Part II - Working with R & R studio
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
 
e_lumley.pdf
e_lumley.pdfe_lumley.pdf
e_lumley.pdf
 
R the unsung hero of Big Data
R the unsung hero of Big DataR the unsung hero of Big Data
R the unsung hero of Big Data
 
Import Data using R
Import Data using R Import Data using R
Import Data using R
 
r,rstats,r language,r packages
r,rstats,r language,r packagesr,rstats,r language,r packages
r,rstats,r language,r packages
 
SessionFive_ImportingandExportingData
SessionFive_ImportingandExportingDataSessionFive_ImportingandExportingData
SessionFive_ImportingandExportingData
 
Hadoop workshop
Hadoop workshopHadoop workshop
Hadoop workshop
 
Migration from 8.1 to 11.3
Migration from 8.1 to 11.3Migration from 8.1 to 11.3
Migration from 8.1 to 11.3
 
Import web resources using R Studio
Import web resources using R StudioImport web resources using R Studio
Import web resources using R Studio
 
Pandas-(Ziad).pptx
Pandas-(Ziad).pptxPandas-(Ziad).pptx
Pandas-(Ziad).pptx
 
Aggregate.pptx
Aggregate.pptxAggregate.pptx
Aggregate.pptx
 
Draft sas and r and sas (may, 2018 asa meeting)
Draft sas and r and sas (may, 2018 asa meeting)Draft sas and r and sas (may, 2018 asa meeting)
Draft sas and r and sas (may, 2018 asa meeting)
 
SAS Programming Notes
SAS Programming NotesSAS Programming Notes
SAS Programming Notes
 
data stage-material
data stage-materialdata stage-material
data stage-material
 
Caerusone
CaerusoneCaerusone
Caerusone
 

Mehr von Rupak Roy

Hierarchical Clustering - Text Mining/NLP
Hierarchical Clustering - Text Mining/NLPHierarchical Clustering - Text Mining/NLP
Hierarchical Clustering - Text Mining/NLPRupak Roy
 
Clustering K means and Hierarchical - NLP
Clustering K means and Hierarchical - NLPClustering K means and Hierarchical - NLP
Clustering K means and Hierarchical - NLPRupak Roy
 
Network Analysis - NLP
Network Analysis  - NLPNetwork Analysis  - NLP
Network Analysis - NLPRupak Roy
 
Topic Modeling - NLP
Topic Modeling - NLPTopic Modeling - NLP
Topic Modeling - NLPRupak Roy
 
Sentiment Analysis Practical Steps
Sentiment Analysis Practical StepsSentiment Analysis Practical Steps
Sentiment Analysis Practical StepsRupak Roy
 
NLP - Sentiment Analysis
NLP - Sentiment AnalysisNLP - Sentiment Analysis
NLP - Sentiment AnalysisRupak Roy
 
Text Mining using Regular Expressions
Text Mining using Regular ExpressionsText Mining using Regular Expressions
Text Mining using Regular ExpressionsRupak Roy
 
Introduction to Text Mining
Introduction to Text Mining Introduction to Text Mining
Introduction to Text Mining Rupak Roy
 
Apache Hbase Architecture
Apache Hbase ArchitectureApache Hbase Architecture
Apache Hbase ArchitectureRupak Roy
 
Introduction to Hbase
Introduction to Hbase Introduction to Hbase
Introduction to Hbase Rupak Roy
 
Apache Hive Table Partition and HQL
Apache Hive Table Partition and HQLApache Hive Table Partition and HQL
Apache Hive Table Partition and HQLRupak Roy
 
Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export Rupak Roy
 
Introductive to Hive
Introductive to Hive Introductive to Hive
Introductive to Hive Rupak Roy
 
Scoop Job, import and export to RDBMS
Scoop Job, import and export to RDBMSScoop Job, import and export to RDBMS
Scoop Job, import and export to RDBMSRupak Roy
 
Apache Scoop - Import with Append mode and Last Modified mode
Apache Scoop - Import with Append mode and Last Modified mode Apache Scoop - Import with Append mode and Last Modified mode
Apache Scoop - Import with Append mode and Last Modified mode Rupak Roy
 
Introduction to scoop and its functions
Introduction to scoop and its functionsIntroduction to scoop and its functions
Introduction to scoop and its functionsRupak Roy
 
Introduction to Flume
Introduction to FlumeIntroduction to Flume
Introduction to FlumeRupak Roy
 
Apache Pig Relational Operators - II
Apache Pig Relational Operators - II Apache Pig Relational Operators - II
Apache Pig Relational Operators - II Rupak Roy
 
Passing Parameters using File and Command Line
Passing Parameters using File and Command LinePassing Parameters using File and Command Line
Passing Parameters using File and Command LineRupak Roy
 
Apache PIG Relational Operations
Apache PIG Relational Operations Apache PIG Relational Operations
Apache PIG Relational Operations Rupak Roy
 

Mehr von Rupak Roy (20)

Hierarchical Clustering - Text Mining/NLP
Hierarchical Clustering - Text Mining/NLPHierarchical Clustering - Text Mining/NLP
Hierarchical Clustering - Text Mining/NLP
 
Clustering K means and Hierarchical - NLP
Clustering K means and Hierarchical - NLPClustering K means and Hierarchical - NLP
Clustering K means and Hierarchical - NLP
 
Network Analysis - NLP
Network Analysis  - NLPNetwork Analysis  - NLP
Network Analysis - NLP
 
Topic Modeling - NLP
Topic Modeling - NLPTopic Modeling - NLP
Topic Modeling - NLP
 
Sentiment Analysis Practical Steps
Sentiment Analysis Practical StepsSentiment Analysis Practical Steps
Sentiment Analysis Practical Steps
 
NLP - Sentiment Analysis
NLP - Sentiment AnalysisNLP - Sentiment Analysis
NLP - Sentiment Analysis
 
Text Mining using Regular Expressions
Text Mining using Regular ExpressionsText Mining using Regular Expressions
Text Mining using Regular Expressions
 
Introduction to Text Mining
Introduction to Text Mining Introduction to Text Mining
Introduction to Text Mining
 
Apache Hbase Architecture
Apache Hbase ArchitectureApache Hbase Architecture
Apache Hbase Architecture
 
Introduction to Hbase
Introduction to Hbase Introduction to Hbase
Introduction to Hbase
 
Apache Hive Table Partition and HQL
Apache Hive Table Partition and HQLApache Hive Table Partition and HQL
Apache Hive Table Partition and HQL
 
Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export
 
Introductive to Hive
Introductive to Hive Introductive to Hive
Introductive to Hive
 
Scoop Job, import and export to RDBMS
Scoop Job, import and export to RDBMSScoop Job, import and export to RDBMS
Scoop Job, import and export to RDBMS
 
Apache Scoop - Import with Append mode and Last Modified mode
Apache Scoop - Import with Append mode and Last Modified mode Apache Scoop - Import with Append mode and Last Modified mode
Apache Scoop - Import with Append mode and Last Modified mode
 
Introduction to scoop and its functions
Introduction to scoop and its functionsIntroduction to scoop and its functions
Introduction to scoop and its functions
 
Introduction to Flume
Introduction to FlumeIntroduction to Flume
Introduction to Flume
 
Apache Pig Relational Operators - II
Apache Pig Relational Operators - II Apache Pig Relational Operators - II
Apache Pig Relational Operators - II
 
Passing Parameters using File and Command Line
Passing Parameters using File and Command LinePassing Parameters using File and Command Line
Passing Parameters using File and Command Line
 
Apache PIG Relational Operations
Apache PIG Relational Operations Apache PIG Relational Operations
Apache PIG Relational Operations
 

Kürzlich hochgeladen

Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...PsychoTech Services
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 

Kürzlich hochgeladen (20)

Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 

Import and Export Big Data using R Studio

  • 1. Import and export Big Data using R Studio Rupak Roy
  • 2. Working with Big Data R provides two ways to work with Big Data, one by using R-hadoop functions and an another is R’s in-built base packages and functions by using systems RAM. But the problem with the R’s in-built base functions that it can handle the amount of data based on system’s RAM availability. Therefore higher system memory provides better performance. One of the common errors related to memory in R will show: cant allocate vector of size i.e. error due to memory limitation. So R developers created special packages and functions to handle big data in R through better memory management. Rupak Roy
  • 3. R-Hadoop R-hadoop is also an another function to integrate R programming language with hadoop. Due to its limit of handling data based on system’s RAM availability R uses special packages and functions to send back and forth to process the user instructions using hadoop framework. The reasons why R-hadoop is good fit for big data analytics:  Its an interactive language.  It is also useful for advance data visualizations.  Can easily implement statistical programming features like predictive analysis. #to know more about integrating the R and the hadoop follow our big data Analytics module.
  • 4. fread() R’s special packages and functions to read big data: 1. fread(): similar to read.table in terms of functionality but faster and effective with more parameters. All the controls such as sep, colClasses and nrows are automatically detected. Integer data types are also detected and read directly. Dates are read as character and can be converted afterwards using the time package or standard R base functions. >bigdata<-fread(input, sep=“auto”, header= “auto”, nrow= -1L, stringAsFactors= FALSE,……..); Where as input= file name to read nrow= -1L the number of rows to read, by default -1 means all.
  • 5. Base functions vs fread() Using the standard R base function: > system.time(store <- read.table("store.csv", header = TRUE, sep = ",", fill = TRUE, nrows = 28000)) where fill = TRUE implicitly adds blank fields when rows have unequal length. Output: user 0.50, system 0.00, elapsed 0.52. Using fread(): > install.packages("data.table") #if the package is not installed > library(data.table) #load the data.table package, which provides fread() > system.time(store <- fread("store.csv", header = "auto", sep = "auto", nrows = 28000)) Output: user 0.05, system 0.00, elapsed 0.05. system.time() reports the processing time the system needed to execute the code. fread() is not a wrapper around read.table; it is a separate, faster implementation with similar functionality, designed to read big data efficiently. To learn more about the features of fread() use > ?data.table::fread
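
The comparison above can be reproduced without the original store.csv. The sketch below writes a hypothetical 200,000-row file to a temporary location (the column names id, sales and region are invented for illustration) and times both readers; it assumes the data.table package is installed. Actual timings depend on the machine, so none are shown.

```r
library(data.table)  # provides fread()

# Hypothetical stand-in for store.csv, written to a temporary file
tmp <- tempfile(fileext = ".csv")
set.seed(1)
demo <- data.frame(
  id     = 1:200000,
  sales  = round(runif(200000, 0, 1000), 2),
  region = sample(c("N", "S", "E", "W"), 200000, replace = TRUE)
)
write.csv(demo, tmp, row.names = FALSE)

# Time the base reader and fread() on the same file
t_base  <- system.time(a <- read.table(tmp, header = TRUE, sep = ","))
t_fread <- system.time(b <- fread(tmp))  # sep and header are detected automatically

print(t_base["elapsed"])
print(t_fread["elapsed"])
print(sapply(b, class))  # column types detected by fread()
unlink(tmp)
```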
  • 6. read.csv.sql() 2. read.csv.sql(): reads a file while filtering it with an SQL statement, so that only the selected portion is loaded and R can handle large files. > bigdata <- read.csv.sql(file, sql = "...", header = TRUE, sep = ",", nrows, row.names, skip, ...) where file = name of the file to read; sql = SQL statement used to filter the data; header, sep = as in read.csv; nrows, row.names, skip = as in read.csv.
  • 7. Base functions vs read.csv.sql() Using the standard R base function: > system.time(crimedata <- read.table("crime_data.csv", header = TRUE, sep = ",")) Output: user 0.00, system 0.00, elapsed 0.08. Using read.csv.sql(): > install.packages("sqldf") #if the package is not installed > library(sqldf) #load the sqldf package, which provides read.csv.sql() > system.time(crimedata <- read.csv.sql("crime_data.csv", sql = "select * from file where Assault >= 10", header = TRUE, sep = ",")) Output: user 0.05, system 0.00, elapsed 0.05. read.csv.sql() takes arguments similar to read.csv but adds the features of SQL (Structured Query Language) to filter the data, so that only the rows you need are loaded from a large file. To learn more about the features of read.csv.sql() use > ?sqldf::read.csv.sql
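
As a small self-contained illustration of the SQL filter, the sketch below builds a hypothetical four-row file with State and Assault columns (stand-ins for crime_data.csv) and reads only the rows that match the condition. It assumes the sqldf package is installed. The file is written without quotes because read.csv.sql does not strip quote characters by default.

```r
library(sqldf)  # provides read.csv.sql()

# Hypothetical miniature version of crime_data.csv
tmp <- tempfile(fileext = ".csv")
crime <- data.frame(State = c("A", "B", "C", "D"),
                    Assault = c(5L, 10L, 250L, 8L))
write.csv(crime, tmp, row.names = FALSE, quote = FALSE)

# Only rows satisfying the WHERE clause are returned to R
filtered <- read.csv.sql(tmp,
                         sql = "select * from file where Assault >= 10",
                         header = TRUE, sep = ",")
print(filtered)
unlink(tmp)
```

Inside the SQL statement the table is always referred to as file, whatever the actual file name is.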
  • 8. read.csv.ffdf() 3. read.csv.ffdf(): reads input file data into ffdf (ff data frame) objects, very much like (and using) read.csv and read.table, but with more effective memory management than the standard functions. > bigdata <- read.csv.ffdf(file = "file.csv", header = FALSE, VERBOSE = TRUE, first.rows = 30000, next.rows = 30000) where file = name of the file the data are read from; VERBOSE = show timings for each processed chunk (default FALSE); first.rows = number of rows to read in the first chunk; next.rows = number of rows to read in each further chunk.
  • 9. Base functions vs read.csv.ffdf() > install.packages("ff") #if the package is not installed > library(ff) #load the ff package, which provides read.csv.ffdf() > system.time(bigdata <- read.csv.ffdf(file = "store.csv", header = TRUE, VERBOSE = TRUE, first.rows = 40000, next.rows = 9000, colClasses = c("factor", "factor", "factor", "numeric", "factor"))) From the verbose output we can see that the first chunk, rows 1 to 40,000, took 0.47 sec, the next 9,000 rows, 40,001 to 49,000, took 0.19 sec, and so on. read.table.ffdf() can work with any of the convenience wrappers such as read.csv, and it reads large files in chunks of rows. By default the first chunk is read with 1,000 rows; for subsequent chunks the size adjusts to RAM availability. To learn more about the features of read.table.ffdf() use > ?ff::read.table.ffdf
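
The chunked read can be tried on a small generated file. The sketch below is an illustration under assumptions: the ff package is installed, and the two-column file (store, sales) with 5,000 rows is a hypothetical stand-in for store.csv. The chunk sizes are deliberately small so that several chunks are reported.

```r
library(ff)  # provides read.csv.ffdf()

# Hypothetical stand-in for store.csv: 5,000 rows, one factor and one numeric column
tmp <- tempfile(fileext = ".csv")
set.seed(1)
demo <- data.frame(store = sample(c("s1", "s2", "s3"), 5000, replace = TRUE),
                   sales = round(runif(5000, 0, 1000), 2))
write.csv(demo, tmp, row.names = FALSE)

# Read 1,000 rows first, then chunks of 2,000 rows; VERBOSE prints timings per chunk
big <- read.csv.ffdf(file = tmp, header = TRUE, VERBOSE = TRUE,
                     first.rows = 1000, next.rows = 2000,
                     colClasses = c("factor", "numeric"))
print(class(big))
print(dim(big))
unlink(tmp)
```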
  • 10. Exporting Big Data Base R functions such as write.csv() and write.table() can also be used to export big data. In addition, write.csv.ffdf() exports ffdf objects (ff data frames) to text files.
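
A minimal export round trip with the base functions might look like this. The small data frame and the temporary file names are hypothetical, and the sketch covers only write.csv() and write.table().

```r
# Small hypothetical data frame to export
df <- data.frame(id = 1:5, score = c(2.5, 3.1, 4.8, 1.2, 3.3))

out_csv <- tempfile(fileext = ".csv")
out_txt <- tempfile(fileext = ".txt")

# Comma-separated export without the row-name column
write.csv(df, out_csv, row.names = FALSE)

# Tab-separated export without quotes or row names
write.table(df, out_txt, sep = "\t", row.names = FALSE, quote = FALSE)

# Read the CSV back to confirm the export is complete
back <- read.csv(out_csv)
print(dim(back))
```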
  • 11. Troubleshoot errors Important points to remember: 1. Error in scan ...... line did not have 5 elements. If the rows have unequal length, importing the file throws this error. The solution is to set fill = TRUE, which pads rows of unequal length with blank fields. So the correct code will be: store <- read.table("store.csv", header = TRUE, sep = ",", nrows = 661, blank.lines.skip = TRUE, fill = TRUE)
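
The effect of fill = TRUE can be seen on a tiny ragged file. In this base R sketch the three-column file and its contents are invented for illustration; the second data row is deliberately one field short.

```r
# Hypothetical ragged file: the second data row has only two fields
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5", "7,8,9"), tmp)

# Without fill, read.table stops with an error; capture the message instead of halting
res <- tryCatch(read.table(tmp, header = TRUE, sep = ","),
                error = function(e) conditionMessage(e))
print(res)

# With fill = TRUE the short row is padded and the import succeeds
fixed <- read.table(tmp, header = TRUE, sep = ",", fill = TRUE)
print(fixed)
unlink(tmp)
```

The padded field shows up as NA in the resulting data frame, so it is worth checking for unexpected missing values after importing with fill = TRUE.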
  • 12. Troubleshoot errors 2. Error in ff ...... vmode 'character' not implemented. This occurs because ff does not support character vectors, so such columns need to be stored as factors. The disadvantage is that the factor levels are kept in RAM, so a large number of levels might cause memory problems. Integer also did not work in this case. So the correct code will be: bigdata <- read.csv.ffdf(file = "store.csv", header = TRUE, VERBOSE = TRUE, first.rows = 40000, next.rows = 9000, colClasses = c("factor", "factor", "factor", "numeric", "factor"))
  • 13. Next: we will learn how to import, export, and directly read the worksheets of an Excel file.