RHive tutorial - HDFS functions
Hive processes data stored in Hadoop's distributed file system (HDFS).
Thus, to use Hive and RHive effectively, you must be able to do things
like putting, getting, and removing big data on HDFS.
RHive provides functions that correspond to what the "hadoop fs" command
supports.
Using these functions, a user can handle HDFS from within R, without
using the Hadoop CLI (command line interface) or the Hadoop HDFS library.
If you are more comfortable with the "hadoop" CLI or the Hadoop library,
it is also fine to use them.
But if you work from RStudio Server or are not comfortable in a terminal,
the RHive HDFS functions should prove to be easy-to-use tools for handling
HDFS from R.

Before Running the Examples
The rhive.hdfs.* functions work only after RHive has been successfully
installed and library(RHive) and rhive.connect() have been executed.
Do not forget to run the following before trying the examples.

# Open R
library(RHive)
rhive.connect()


rhive.hdfs.connect
To use the RHive HDFS functions, a connection to HDFS must be
established.
If the Hadoop configuration for HDFS is set properly, rhive.connect()
performs this step automatically, so there is normally no need to call it
separately.

If you need to connect to a different HDFS, you can do it like this:

rhive.hdfs.connect("hdfs://10.1.1.1:9000")
[1] "Java-Object{DFS[DFSClient[clientName=DFSClient_630489789, ugi=root]]}"
  
The connection will fail to establish if you do not supply the exact
hostname and port on which HDFS is served.
Ask your system administrator if you do not have this information.
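If you are unsure whether the host and port are right, you can wrap the call in base R's tryCatch() to get a readable message instead of a raw Java exception. This is only a sketch; the address is the hypothetical one used above and should be replaced with your own NameNode address.

```r
# Defensive connect sketch (hypothetical host/port -- replace with yours).
ok <- tryCatch({
  rhive.hdfs.connect("hdfs://10.1.1.1:9000")
  TRUE
}, error = function(e) {
  message("HDFS connection failed: ", conditionMessage(e))
  FALSE
})
```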

rhive.hdfs.ls
This does the same thing as "hadoop fs -ls" and is used like this:

rhive.hdfs.ls("/")
  permission owner group      length   modify-time      file
1 rwxr-xr-x  root  supergroup        0 2011-12-07 14:27 /airline
2 rwxr-xr-x  root  supergroup        0 2011-12-07 13:16 /benchmarks
3 rw-r--r--  root  supergroup 11186419 2011-12-06 03:59 /messages
4 rwxr-xr-x  root  supergroup        0 2011-12-07 22:05 /mnt
5 rwxr-xr-x  root  supergroup        0 2011-12-13 20:24 /rhive
6 rwxr-xr-x  root  supergroup        0 2011-12-07 20:19 /tmp
7 rwxr-xr-x  root  supergroup        0 2011-12-14 01:14 /user

This is equivalent to the following Hadoop CLI command.

hadoop fs -ls /
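Since rhive.hdfs.ls() returns an ordinary R data frame with the columns shown above, you can work with the listing using standard R subsetting. A small sketch, assuming the listing above; as.numeric() is used defensively in case the length column comes back as character:

```r
files <- rhive.hdfs.ls("/")
# Keep only entries with data in them (length > 0)
nonempty <- files[as.numeric(files$length) > 0, ]
nonempty$file
```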
  


rhive.hdfs.get
The rhive.hdfs.get Function’s role is to bring the data in HDFS to local.
This functions in the same way as "hadoop fs -get".
The next example entails taking messages data in HDFS and saving them to
local system’s /tmp/messages, then checking the number of Records.

rhive.hdfs.get("/messages", "/tmp/messages")
[1] TRUE
system("wc -l /tmp/messages")
145889 /tmp/messages


rhive.hdfs.put
The rhive.hdfs.put Function uploads all data in local to HDFS.
This functions like "hadoop fs -put" and opposite of rhive.hdfs.get.
The following example uploads the “/tmp/messages” in local system to
“/messages_new” in HDFS.

rhive.hdfs.put("/tmp/messages", "/messages_new")
rhive.hdfs.ls("/")
  permission owner group      length   modify-time      file
1 rwxr-xr-x  root  supergroup        0 2011-12-07 14:27 /airline
2 rwxr-xr-x  root  supergroup        0 2011-12-07 13:16 /benchmarks
3 rw-r--r--  root  supergroup 11186419 2011-12-06 03:59 /messages
4 rw-r--r--  root  supergroup 11186419 2011-12-14 02:02 /messages_new
5 rwxr-xr-x  root  supergroup        0 2011-12-07 22:05 /mnt
6 rwxr-xr-x  root  supergroup        0 2011-12-13 20:24 /rhive
7 rwxr-xr-x  root  supergroup        0 2011-12-14 01:14 /user

You can see a new file, "/messages_new", now appears in HDFS.

rhive.hdfs.rm
This does the same thing as "hadoop fs -rm", deleting files in HDFS.
rhive.hdfs.rm("/messages_new")
rhive.hdfs.ls("/")
  permission owner group      length   modify-time      file
1 rwxr-xr-x  root  supergroup        0 2011-12-07 14:27 /airline
2 rwxr-xr-x  root  supergroup        0 2011-12-07 13:16 /benchmarks
3 rw-r--r--  root  supergroup 11186419 2011-12-06 03:59 /messages
4 rwxr-xr-x  root  supergroup        0 2011-12-07 22:05 /mnt
5 rwxr-xr-x  root  supergroup        0 2011-12-13 20:24 /rhive
6 rwxr-xr-x  root  supergroup        0 2011-12-14 01:14 /user

You can see the "/messages_new" file has been deleted from within HDFS.

rhive.hdfs.rename
This does the same thing as "hadoop fs -mv".
That is, it renames files in HDFS or moves files and directories.

rhive.hdfs.rename("/messages", "/messages_renamed")
[1] TRUE
rhive.hdfs.ls("/")
  permission owner group      length   modify-time      file
1 rwxr-xr-x  root  supergroup        0 2011-12-07 14:27 /airline
2 rwxr-xr-x  root  supergroup        0 2011-12-07 13:16 /benchmarks
3 rw-r--r--  root  supergroup 11186419 2011-12-06 03:59 /messages_renamed
4 rwxr-xr-x  root  supergroup        0 2011-12-07 22:05 /mnt
5 rwxr-xr-x  root  supergroup        0 2011-12-13 20:24 /rhive
6 rwxr-xr-x  root  supergroup        0 2011-12-14 01:14 /user




rhive.hdfs.exists
This checks whether a file exists in HDFS. There is no directly
corresponding hadoop command.

rhive.hdfs.exists("/messages_renamed")
[1] TRUE
rhive.hdfs.exists("/foobar")
[1] FALSE
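rhive.hdfs.exists() is handy as a guard before destructive operations. A small sketch combining it with rhive.hdfs.rm() from earlier; safe_rm is a hypothetical helper name, not part of RHive:

```r
# Remove a path only if it actually exists in HDFS,
# avoiding an error on a missing path.
safe_rm <- function(path) {
  if (rhive.hdfs.exists(path)) {
    rhive.hdfs.rm(path)
  } else {
    FALSE   # nothing to delete
  }
}
safe_rm("/foobar")
```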
  


rhive.hdfs.mkdirs
This does the same thing as "hadoop fs -mkdir".
It creates directories in HDFS, including any intermediate
subdirectories.

rhive.hdfs.mkdirs("/newdir/newsubdir")
[1] TRUE
rhive.hdfs.ls("/")
  permission owner group      length   modify-time      file
1 rwxr-xr-x  root  supergroup        0 2011-12-07 14:27 /airline
2 rwxr-xr-x  root  supergroup        0 2011-12-07 13:16 /benchmarks
3 rw-r--r--  root  supergroup 11186419 2011-12-06 03:59 /messages_renamed
4 rwxr-xr-x  root  supergroup        0 2011-12-07 22:05 /mnt
5 rwxr-xr-x  root  supergroup        0 2011-12-14 02:13 /newdir
6 rwxr-xr-x  root  supergroup        0 2011-12-13 20:24 /rhive
7 rwxr-xr-x  root  supergroup        0 2011-12-14 01:14 /user
rhive.hdfs.ls("/newdir")
  permission owner group      length modify-time      file
1 rwxr-xr-x  root  supergroup      0 2011-12-14 02:13 /newdir/newsubdir
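Because mkdirs creates intermediate directories, it combines naturally with rhive.hdfs.put for date-partitioned layouts. A sketch only; the /logs path layout here is hypothetical:

```r
# Create a per-day directory and upload a local file into it
dir <- paste0("/logs/", format(Sys.Date(), "%Y-%m-%d"))
rhive.hdfs.mkdirs(dir)
rhive.hdfs.put("/tmp/messages", paste0(dir, "/messages"))
```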
  


rhive.hdfs.close
Use this to close the connection when you have finished working with HDFS
and no longer need it.

rhive.hdfs.close()
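Putting it all together, a typical RHive HDFS session built from the functions covered in this tutorial looks like this (the paths reuse the examples above):

```r
library(RHive)
rhive.connect()            # also connects to HDFS when Hadoop is configured
rhive.hdfs.put("/tmp/messages", "/messages_new")   # upload local data
print(rhive.hdfs.ls("/"))                          # inspect the result
rhive.hdfs.rm("/messages_new")                     # clean up
rhive.hdfs.close()                                 # release the connection
```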
  

TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 

RHive tutorial - HDFS functions

Hive uses Hadoop's HDFS as its distributed file system. Thus, in order to use Hive and RHive effectively, you must be able to do things such as putting, getting, and removing big data on HDFS.

RHive provides functions corresponding to what the "hadoop fs" command supports. Using these functions, a user can handle HDFS from within R without using the Hadoop CLI (command line interface) or the Hadoop HDFS library. If you are more comfortable with the "hadoop" CLI or the Hadoop library, it is fine to keep using them. But if you are not familiar with working from a terminal, or you work in RStudio Server, the RHive HDFS functions should prove to be an easy-to-use way to handle HDFS from R.

Before Emulating this Example

The rhive.hdfs.* functions work only after RHive has been successfully installed and library(RHive) and rhive.connect() have been executed. Do not forget to run the following before trying the examples.

# Open R
library(RHive)
rhive.connect()

rhive.hdfs.connect

In order to use the RHive functions that access HDFS, a connection to HDFS must be established. If the Hadoop configuration for HDFS is properly set, this connection is made automatically when rhive.connect() is executed, so there is normally no need to call this function separately.

If you need to connect to a different HDFS, you can do it like this:

rhive.hdfs.connect("hdfs://10.1.1.1:9000")
[1] "Java-Object{DFS[DFSClient[clientName=DFSClient_630489789, ugi=root]]}"
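As a quick way to verify that the connection is live, you can list the HDFS root immediately after connecting. A minimal sketch (the URI is the same illustrative address as above, not a real cluster):

```r
library(RHive)
rhive.connect()                              # connects to Hive and, normally, to HDFS as well
rhive.hdfs.connect("hdfs://10.1.1.1:9000")   # only needed when targeting a different cluster
rhive.hdfs.ls("/")                           # a successful listing confirms the connection
```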
The connection will fail if you do not give the exact hostname and port on which HDFS is served. Ask your system administrator if you do not have this information.

rhive.hdfs.ls

This does the same thing as "hadoop fs -ls" and is used like this:

rhive.hdfs.ls("/")
  permission owner      group   length      modify-time        file
1  rwxr-xr-x  root supergroup        0 2011-12-07 14:27     /airline
2  rwxr-xr-x  root supergroup        0 2011-12-07 13:16  /benchmarks
3  rw-r--r--  root supergroup 11186419 2011-12-06 03:59    /messages
4  rwxr-xr-x  root supergroup        0 2011-12-07 22:05         /mnt
5  rwxr-xr-x  root supergroup        0 2011-12-13 20:24       /rhive
6  rwxr-xr-x  root supergroup        0 2011-12-07 20:19         /tmp
7  rwxr-xr-x  root supergroup        0 2011-12-14 01:14        /user

This is the same as the corresponding Hadoop CLI command:

hadoop fs -ls /

rhive.hdfs.get

The rhive.hdfs.get function brings data from HDFS to the local file system. It works in the same way as "hadoop fs -get". The next example takes the messages data in HDFS, saves it to /tmp/messages on the local system, then checks the number of records.

rhive.hdfs.get("/messages", "/tmp/messages")
[1] TRUE
system("wc -l /tmp/messages")
145889 /tmp/messages

rhive.hdfs.put

The rhive.hdfs.put function uploads local data to HDFS. It works like "hadoop fs -put" and is the opposite of rhive.hdfs.get. The following example uploads /tmp/messages on the local system to /messages_new in HDFS.

rhive.hdfs.put("/tmp/messages", "/messages_new")
rhive.hdfs.ls("/")
  permission owner      group   length      modify-time           file
1  rwxr-xr-x  root supergroup        0 2011-12-07 14:27       /airline
2  rwxr-xr-x  root supergroup        0 2011-12-07 13:16    /benchmarks
3  rw-r--r--  root supergroup 11186419 2011-12-06 03:59      /messages
4  rw-r--r--  root supergroup 11186419 2011-12-14 02:02  /messages_new
5  rwxr-xr-x  root supergroup        0 2011-12-07 22:05           /mnt
6  rwxr-xr-x  root supergroup        0 2011-12-13 20:24         /rhive
7  rwxr-xr-x  root supergroup        0 2011-12-14 01:14          /user

You can see that a new file, "/messages_new", now appears in HDFS.

rhive.hdfs.rm

This does the same thing as "hadoop fs -rm", deleting files in HDFS.
rhive.hdfs.rm("/messages_new")
rhive.hdfs.ls("/")
  permission owner      group   length      modify-time        file
1  rwxr-xr-x  root supergroup        0 2011-12-07 14:27     /airline
2  rwxr-xr-x  root supergroup        0 2011-12-07 13:16  /benchmarks
3  rw-r--r--  root supergroup 11186419 2011-12-06 03:59    /messages
4  rwxr-xr-x  root supergroup        0 2011-12-07 22:05         /mnt
5  rwxr-xr-x  root supergroup        0 2011-12-13 20:24       /rhive
6  rwxr-xr-x  root supergroup        0 2011-12-14 01:14        /user

You can see that the "/messages_new" file has been deleted from HDFS.

rhive.hdfs.rename

This does the same thing as "hadoop fs -mv". That is, it renames files in HDFS or moves directories.

rhive.hdfs.rename("/messages", "/messages_renamed")
[1] TRUE
rhive.hdfs.ls("/")
  permission owner      group   length      modify-time               file
1  rwxr-xr-x  root supergroup        0 2011-12-07 14:27           /airline
2  rwxr-xr-x  root supergroup        0 2011-12-07 13:16        /benchmarks
3  rw-r--r--  root supergroup 11186419 2011-12-06 03:59  /messages_renamed
4  rwxr-xr-x  root supergroup        0 2011-12-07 22:05               /mnt
5  rwxr-xr-x  root supergroup        0 2011-12-13 20:24             /rhive
6  rwxr-xr-x  root supergroup        0 2011-12-14 01:14              /user

rhive.hdfs.exists

This checks whether a file exists in HDFS. There is no "hadoop fs" command that serves as a direct counterpart.

rhive.hdfs.exists("/messages_renamed")
[1] TRUE
rhive.hdfs.exists("/foobar")
[1] FALSE

rhive.hdfs.mkdirs

This does the same thing as "hadoop fs -mkdir". It creates directories in HDFS, including any missing parent directories.

rhive.hdfs.mkdirs("/newdir/newsubdir")
[1] TRUE
rhive.hdfs.ls("/")
  permission owner      group   length      modify-time               file
1  rwxr-xr-x  root supergroup        0 2011-12-07 14:27           /airline
2  rwxr-xr-x  root supergroup        0 2011-12-07 13:16        /benchmarks
3  rw-r--r--  root supergroup 11186419 2011-12-06 03:59  /messages_renamed
4  rwxr-xr-x  root supergroup        0 2011-12-07 22:05               /mnt
5  rwxr-xr-x  root supergroup        0 2011-12-14 02:13            /newdir
6  rwxr-xr-x  root supergroup        0 2011-12-13 20:24             /rhive
7  rwxr-xr-x  root supergroup        0 2011-12-14 01:14              /user

rhive.hdfs.ls("/newdir")
  permission owner      group length      modify-time               file
1  rwxr-xr-x  root supergroup      0 2011-12-14 02:13  /newdir/newsubdir

rhive.hdfs.close

This closes the connection when you have finished using HDFS and no longer need it.

rhive.hdfs.close()
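Putting the functions above together, a typical end-to-end session might look like the following sketch. All paths are illustrative, and the rhive.hdfs.exists() guard makes the cleanup safe to re-run:

```r
library(RHive)
rhive.connect()

# Round trip: download from HDFS, inspect locally, upload under a new name.
rhive.hdfs.get("/messages_renamed", "/tmp/messages")
n <- length(readLines("/tmp/messages"))            # record count, like wc -l
rhive.hdfs.put("/tmp/messages", "/messages_copy")  # illustrative destination

# Guarded cleanup: only remove the copy if it actually exists.
if (rhive.hdfs.exists("/messages_copy")) {
  rhive.hdfs.rm("/messages_copy")
}

rhive.hdfs.close()   # release the HDFS connection when finished
```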