RHive tutorial – RStudio-server setup for RHive
This tutorial explains how to set up RStudio so that RHive is more convenient to use.
A detailed guide to setting up RStudio is available at http://rstudio.org.
This tutorial covers installing and using RStudio specifically for RHive users.

RHive is an R package that uses Hadoop and Hive to process massive data.

Many R programs written with RHive finish running and return results quickly.
However, code that processes extremely large data can take a long time to
finish its analysis. Depending on the size of the data and the complexity of
the calculations, a run can take anywhere from a few minutes to a couple of
weeks.

The problem is that the R session must stay alive until the task the user
started has completed.

If the user runs the code on a laptop, the laptop must stay on and keep the
session open until the code finishes. Even with a desktop, the user cannot
reboot or do anything else that would end the session until the task is
complete. Having to keep the session alive causes many other inconveniences
as well.

This problem is not specific to RHive: it also occurs when using Hadoop or
Hive on their own, and RHive is no exception.

One way around this is to log in to a Hadoop client machine through a
terminal and run the code in the background. However, this is not very
convenient for R users, because it makes it hard to take advantage of an IDE
or of R's interactive working environment. Users who are not familiar with
the terminal also have to learn how to use it.
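For reference, the terminal-based workaround described above might look like the sketch below. It assumes a hypothetical R script named analysis.R and that Rscript is on the PATH; the helper name run_detached is also hypothetical. It simply wraps nohup so the job keeps running after the terminal is closed, with all output going to a log file.

```shell
#!/bin/sh
# Hypothetical helper: run a command detached from the terminal with nohup,
# send stdout and stderr to a log file, and print the background job's PID.
run_detached() {
    log="$1"
    shift
    nohup "$@" > "$log" 2>&1 &
    echo $!
}

# Example (assumes an R script named analysis.R exists on the Hadoop client):
# run_detached analysis.log Rscript analysis.R
```

The log file is the only place to look at progress, which is exactly the kind of friction that makes this approach awkward for R users.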

RStudio offers a convenient solution to this problem.

RStudio comes in desktop and server versions, and the desktop version is an
excellent IDE for R.
RStudio-server, in addition, is accessed through a web browser, lets many
people share common resources, and keeps the user's session alive on the
server.
If the user's Hadoop, Hive, and RHive installations sit on a restricted
network behind a firewall, opening the RStudio-server port is enough to reach
them.
Using RStudio-server together with RHive therefore makes RHive much more
convenient to use.

Lastly, because RStudio-server connects users to the server's R environment,
several people can share the same RHive, Hadoop, and Hive setup.

This tutorial demonstrates how to install, connect to, and use RStudio-server.

Installing RStudio-server

RStudio can be downloaded from its official site.

http://rstudio.org/

RStudio's official site, rstudio.org, provides documents explaining how to
install and use RStudio.
The page below contains an installation guide, so you may read it instead of
this tutorial.

http://rstudio.org/download/server

This tutorial explains how to install RStudio-server on CentOS 5.
Most of this installation guide is taken from the site mentioned above, with
some changes.

Of course, R must be installed before RStudio-server.
If you followed the previous RHive tutorials and installed RHive accordingly,
R should already be installed.
The steps are repeated here for completeness.

To install a recent version of R, first add the EPEL repository:

$ sudo rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-4.noarch.rpm

Now install R.

$ sudo yum install R R-devel

When installing R for RHive, remember to install R-devel as well as R.

Before installing RStudio-server, you need to know whether your server has a
32-bit or a 64-bit architecture.
Most recent servers are 64-bit; you can confirm this with the uname command.

$ uname -m
x86_64

The output above shows that the server has a 64-bit architecture.
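The architecture check can also be tied directly to the choice of package. The sketch below maps the output of uname -m to the two package file names used in this tutorial; the function name rstudio_rpm_for_arch is hypothetical, and the version 0.94.110 is only the one this tutorial uses, so adjust it to the release you actually download.

```shell
#!/bin/sh
# Hypothetical helper: print the RStudio-server RPM file name that matches
# a machine architecture as reported by `uname -m`.
RSTUDIO_VERSION="0.94.110"   # version used in this tutorial

rstudio_rpm_for_arch() {
    case "$1" in
        x86_64)
            echo "rstudio-server-${RSTUDIO_VERSION}-x86_64.rpm" ;;
        i386|i486|i586|i686)
            echo "rstudio-server-${RSTUDIO_VERSION}-i686.rpm" ;;
        *)
            echo "unsupported architecture: $1" >&2
            return 1 ;;
    esac
}

# Example usage on the server itself:
# rpm_file=$(rstudio_rpm_for_arch "$(uname -m)") &&
#     wget "http://download2.rstudio.org/$rpm_file" &&
#     sudo rpm -Uvh "$rpm_file"
```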

Now download the RStudio-server package that matches your architecture.

Installing for 32-bit:

$ wget http://download2.rstudio.org/rstudio-server-0.94.110-i686.rpm
$ sudo rpm -Uvh rstudio-server-0.94.110-i686.rpm

Installing for 64-bit:

$ wget http://download2.rstudio.org/rstudio-server-0.94.110-x86_64.rpm
$ sudo rpm -Uvh rstudio-server-0.94.110-x86_64.rpm

Making a User Account

To connect to RStudio-server, a user account must exist on the server where
RStudio-server is installed.
Because RStudio-server does not allow logging in with the root account,
regular user accounts are needed.

Connect to the server, create accounts for the users of RStudio-server, and
set their passwords.

ssh root@10.1.1.1
adduser user1
passwd user1

The name user1 is arbitrary; choose any account name you like.

Starting RStudio-server

RStudio-server runs as a background process (daemon mode).
Connect to the server and start it as shown below:

ssh root@10.1.1.1
/etc/init.d/rstudio-server start

That is all it takes to start the server.

Connecting to RStudio-server

You connect to RStudio-server with a web browser.
Open your browser and go to the RStudio-server URL:

http://10.1.1.1:8787

By default, RStudio-server listens on port 8787.
You can change it to a different port if needed.
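One way to change the port, assuming your RStudio-server release reads its options from /etc/rstudio/rserver.conf and accepts a www-port setting (as later RStudio Server releases document; check this against the documentation for 0.94.110), is sketched below. The function name set_rstudio_port is hypothetical; it replaces any existing www-port line so that repeated runs leave exactly one setting.

```shell
#!/bin/sh
# Hypothetical helper: set the www-port option in an RStudio-server config file.
# The default path /etc/rstudio/rserver.conf is an assumption; verify it for
# your RStudio-server version.
set_rstudio_port() {
    port="$1"
    conf="${2:-/etc/rstudio/rserver.conf}"
    case "$port" in
        ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
    esac
    [ "$port" -ge 1 ] && [ "$port" -le 65535 ] ||
        { echo "port out of range: $port" >&2; return 1; }
    touch "$conf" || return 1
    # Drop any existing www-port line, then append the new setting.
    grep -v '^www-port=' "$conf" > "$conf.tmp" || true
    echo "www-port=$port" >> "$conf.tmp"
    mv "$conf.tmp" "$conf"
}

# Example (as root on the server), then restart RStudio-server:
# set_rstudio_port 8080
# /etc/init.d/rstudio-server stop
# /etc/init.d/rstudio-server start
```

If you change the port, remember that it is this port, not 8787, that must be opened in the firewall and used in the browser URL.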

You can now connect to RStudio-server and perform massive data analysis with
R and RHive.

Tips for using RHive in RStudio

When working in RStudio-server, RHive may fail to load because environment
variables are not set correctly.
In that case, you can fix the problem by setting the environment variables in
your R code before loading RHive:

Sys.setenv(HADOOP_HOME="/mnt/srv/hadoop-0.20.203.0")
Sys.setenv(HIVE_HOME="/mnt/srv/hive-0.7.1")
Sys.setenv(RHIVE_DATA="/mnt/srv/rhive_data")

library(RHive)

HADOOP_HOME and HIVE_HOME must point to the home directories of Hadoop and
Hive on the server where RStudio-server is installed.
RHIVE_DATA is a temporary directory that RHive uses; it is created on each
Hadoop node.
These environment variables must be set before RHive is loaded with the
library() function.
If you have already loaded RHive without setting them, set them and then call
rhive.init() to initialize RHive.
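Instead of calling Sys.setenv() in every session, the variables can be set once per account: R reads a per-user ~/.Renviron file at startup, so sessions opened through RStudio-server should pick them up too, assuming no project-level .Renviron in the working directory takes precedence. The sketch below writes the same paths used above; the helper name add_rhive_env is hypothetical, and the paths must be adapted to your installation.

```shell
#!/bin/sh
# Hypothetical helper: write RHive-related environment variables to an
# .Renviron file so that new R sessions see them at startup.
# The paths are the ones used in this tutorial; adjust them to your setup.
add_rhive_env() {
    renviron="${1:-$HOME/.Renviron}"
    touch "$renviron" || return 1
    # Remove earlier definitions so repeated runs do not create duplicates.
    grep -v -e '^HADOOP_HOME=' -e '^HIVE_HOME=' -e '^RHIVE_DATA=' \
        "$renviron" > "$renviron.tmp" || true
    cat >> "$renviron.tmp" <<'EOF'
HADOOP_HOME=/mnt/srv/hadoop-0.20.203.0
HIVE_HOME=/mnt/srv/hive-0.7.1
RHIVE_DATA=/mnt/srv/rhive_data
EOF
    mv "$renviron.tmp" "$renviron"
}

# Example: run once as each RStudio-server user, then start a new R session
# before calling library(RHive).
# add_rhive_env
```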

library(RHive)

Sys.setenv(HADOOP_HOME="/mnt/srv/hadoop-0.20.203.0")
Sys.setenv(HIVE_HOME="/mnt/srv/hive-0.7.1")
Sys.setenv(RHIVE_DATA="/mnt/srv/rhive_data")

rhive.init()

You have now finished setting up an environment in which you can write R code
in RStudio and use RHive to work with Hive and Hadoop.

Weitere ähnliche Inhalte

Was ist angesagt?

Introduction to Flume
Introduction to FlumeIntroduction to Flume
Introduction to FlumeRupak Roy
 
Setting up LAMP for Linux newbies
Setting up LAMP for Linux newbiesSetting up LAMP for Linux newbies
Setting up LAMP for Linux newbiesShabir Ahmad
 
Linux Webserver Installation Command and GUI.ppt
Linux Webserver Installation Command and GUI.pptLinux Webserver Installation Command and GUI.ppt
Linux Webserver Installation Command and GUI.pptwebhostingguy
 
Introduction to Hbase
Introduction to Hbase Introduction to Hbase
Introduction to Hbase Rupak Roy
 
Standby db creation commands
Standby db creation commandsStandby db creation commands
Standby db creation commandsPiyush Kumar
 
Power point on linux commands,appache,php,mysql,html,css,web 2.0
Power point on linux commands,appache,php,mysql,html,css,web 2.0Power point on linux commands,appache,php,mysql,html,css,web 2.0
Power point on linux commands,appache,php,mysql,html,css,web 2.0venkatakrishnan k
 
Deploying your rails application to a clean ubuntu 10
Deploying your rails application to a clean ubuntu 10Deploying your rails application to a clean ubuntu 10
Deploying your rails application to a clean ubuntu 10Maurício Linhares
 
Mahout Workshop on Google Cloud Platform
Mahout Workshop on Google Cloud PlatformMahout Workshop on Google Cloud Platform
Mahout Workshop on Google Cloud PlatformIMC Institute
 
Single node hadoop cluster installation
Single node hadoop cluster installation Single node hadoop cluster installation
Single node hadoop cluster installation Mahantesh Angadi
 

Was ist angesagt? (16)

Linux
LinuxLinux
Linux
 
Hadoop on ec2
Hadoop on ec2Hadoop on ec2
Hadoop on ec2
 
Introduction to Flume
Introduction to FlumeIntroduction to Flume
Introduction to Flume
 
Installing lemp with ssl and varnish on Debian 9
Installing lemp with ssl and varnish on Debian 9Installing lemp with ssl and varnish on Debian 9
Installing lemp with ssl and varnish on Debian 9
 
Setting up LAMP for Linux newbies
Setting up LAMP for Linux newbiesSetting up LAMP for Linux newbies
Setting up LAMP for Linux newbies
 
Linux Webserver Installation Command and GUI.ppt
Linux Webserver Installation Command and GUI.pptLinux Webserver Installation Command and GUI.ppt
Linux Webserver Installation Command and GUI.ppt
 
Introduction to Hbase
Introduction to Hbase Introduction to Hbase
Introduction to Hbase
 
Hadoop completereference
Hadoop completereferenceHadoop completereference
Hadoop completereference
 
Standby db creation commands
Standby db creation commandsStandby db creation commands
Standby db creation commands
 
Power point on linux commands,appache,php,mysql,html,css,web 2.0
Power point on linux commands,appache,php,mysql,html,css,web 2.0Power point on linux commands,appache,php,mysql,html,css,web 2.0
Power point on linux commands,appache,php,mysql,html,css,web 2.0
 
Linux presentation
Linux presentationLinux presentation
Linux presentation
 
Deploying your rails application to a clean ubuntu 10
Deploying your rails application to a clean ubuntu 10Deploying your rails application to a clean ubuntu 10
Deploying your rails application to a clean ubuntu 10
 
BD-zero lecture.pptx
BD-zero lecture.pptxBD-zero lecture.pptx
BD-zero lecture.pptx
 
Mahout Workshop on Google Cloud Platform
Mahout Workshop on Google Cloud PlatformMahout Workshop on Google Cloud Platform
Mahout Workshop on Google Cloud Platform
 
Single node hadoop cluster installation
Single node hadoop cluster installation Single node hadoop cluster installation
Single node hadoop cluster installation
 
Ex-8-hive.pptx
Ex-8-hive.pptxEx-8-hive.pptx
Ex-8-hive.pptx
 

Andere mochten auch

RHive tutorials - Basic functions
RHive tutorials - Basic functionsRHive tutorials - Basic functions
RHive tutorials - Basic functionsAiden Seonghak Hong
 
Hive vs Hbase, a Friendly Competition
Hive vs Hbase, a Friendly CompetitionHive vs Hbase, a Friendly Competition
Hive vs Hbase, a Friendly CompetitionXplenty
 
R hive tutorial - apply functions and map reduce
R hive tutorial - apply functions and map reduceR hive tutorial - apply functions and map reduce
R hive tutorial - apply functions and map reduceAiden Seonghak Hong
 
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...Revolution Analytics
 
R hive tutorial - udf, udaf, udtf functions
R hive tutorial - udf, udaf, udtf functionsR hive tutorial - udf, udaf, udtf functions
R hive tutorial - udf, udaf, udtf functionsAiden Seonghak Hong
 
Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy snehal parikh
 
Integrate Hive and R
Integrate Hive and RIntegrate Hive and R
Integrate Hive and RJunHo Cho
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Anton Kirillov
 

Andere mochten auch (11)

RHive tutorials - Basic functions
RHive tutorials - Basic functionsRHive tutorials - Basic functions
RHive tutorials - Basic functions
 
RHadoop, R meets Hadoop
RHadoop, R meets HadoopRHadoop, R meets Hadoop
RHadoop, R meets Hadoop
 
Hive vs Hbase, a Friendly Competition
Hive vs Hbase, a Friendly CompetitionHive vs Hbase, a Friendly Competition
Hive vs Hbase, a Friendly Competition
 
Running R on Hadoop - CHUG - 20120815
Running R on Hadoop - CHUG - 20120815Running R on Hadoop - CHUG - 20120815
Running R on Hadoop - CHUG - 20120815
 
R hive tutorial - apply functions and map reduce
R hive tutorial - apply functions and map reduceR hive tutorial - apply functions and map reduce
R hive tutorial - apply functions and map reduce
 
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...Performance and Scale Options for R with Hadoop: A comparison of potential ar...
Performance and Scale Options for R with Hadoop: A comparison of potential ar...
 
R hive tutorial - udf, udaf, udtf functions
R hive tutorial - udf, udaf, udtf functionsR hive tutorial - udf, udaf, udtf functions
R hive tutorial - udf, udaf, udtf functions
 
Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy
 
Integrate Hive and R
Integrate Hive and RIntegrate Hive and R
Integrate Hive and R
 
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
Data processing platforms architectures with Spark, Mesos, Akka, Cassandra an...
 
Enabling R on Hadoop
Enabling R on HadoopEnabling R on Hadoop
Enabling R on Hadoop
 

Ähnlich wie R hive tutorial supplement 3 - Rstudio-server setup for rhive

Introduction to R and R Studio
Introduction to R and R StudioIntroduction to R and R Studio
Introduction to R and R StudioRupak Roy
 
Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)Søren Lund
 
Setting Up a Cloud Server - Part 2 - Transcript.pdf
Setting Up a Cloud Server - Part 2 - Transcript.pdfSetting Up a Cloud Server - Part 2 - Transcript.pdf
Setting Up a Cloud Server - Part 2 - Transcript.pdfShaiAlmog1
 
Cloudera hadoop installation
Cloudera hadoop installationCloudera hadoop installation
Cloudera hadoop installationSumitra Pundlik
 
Rapid miner r extension 5
Rapid miner r extension 5Rapid miner r extension 5
Rapid miner r extension 5raz3366
 
1 R Tutorial Introduction
1 R Tutorial Introduction1 R Tutorial Introduction
1 R Tutorial IntroductionSakthi Dasans
 
Configure h base hadoop and hbase client
Configure h base hadoop and hbase clientConfigure h base hadoop and hbase client
Configure h base hadoop and hbase clientShashwat Shriparv
 
DC HUG Hadoop for Windows
DC HUG Hadoop for WindowsDC HUG Hadoop for Windows
DC HUG Hadoop for WindowsTerry Padgett
 
LuisRodriguezLocalDevEnvironmentsDrupalOpenDays
LuisRodriguezLocalDevEnvironmentsDrupalOpenDaysLuisRodriguezLocalDevEnvironmentsDrupalOpenDays
LuisRodriguezLocalDevEnvironmentsDrupalOpenDaysLuis Rodríguez Castromil
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slidesryancox
 
Apache web server tutorial for linux
Apache web server tutorial for linuxApache web server tutorial for linux
Apache web server tutorial for linuxSahad Sali
 
HOW TO RUN RSTUDIO SERVERS ANYWHERE WITH CONTAINERS - HPC, CLOUD, AND LOCALLY
HOW TO RUN RSTUDIO SERVERS ANYWHERE WITH CONTAINERS - HPC, CLOUD, AND LOCALLYHOW TO RUN RSTUDIO SERVERS ANYWHERE WITH CONTAINERS - HPC, CLOUD, AND LOCALLY
HOW TO RUN RSTUDIO SERVERS ANYWHERE WITH CONTAINERS - HPC, CLOUD, AND LOCALLYWendy Wong
 
Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2benjaminwootton
 
Configuration of Apache Web Server On CentOS 8
Configuration of Apache Web Server On CentOS 8Configuration of Apache Web Server On CentOS 8
Configuration of Apache Web Server On CentOS 8Kaan Aslandağ
 
Tame Your Build And Deployment Process With Hudson, PHPUnit, and SSH
Tame Your Build And Deployment Process With Hudson, PHPUnit, and SSHTame Your Build And Deployment Process With Hudson, PHPUnit, and SSH
Tame Your Build And Deployment Process With Hudson, PHPUnit, and SSHDavid Stockton
 

Ähnlich wie R hive tutorial supplement 3 - Rstudio-server setup for rhive (20)

RStudio
RStudioRStudio
RStudio
 
R Studio (Report)
R Studio (Report)R Studio (Report)
R Studio (Report)
 
Introduction to R and R Studio
Introduction to R and R StudioIntroduction to R and R Studio
Introduction to R and R Studio
 
RHadoop - beginners
RHadoop - beginnersRHadoop - beginners
RHadoop - beginners
 
Unit 5
Unit  5Unit  5
Unit 5
 
Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)
 
Setting Up a Cloud Server - Part 2 - Transcript.pdf
Setting Up a Cloud Server - Part 2 - Transcript.pdfSetting Up a Cloud Server - Part 2 - Transcript.pdf
Setting Up a Cloud Server - Part 2 - Transcript.pdf
 
Cloudera hadoop installation
Cloudera hadoop installationCloudera hadoop installation
Cloudera hadoop installation
 
Rapid miner r extension 5
Rapid miner r extension 5Rapid miner r extension 5
Rapid miner r extension 5
 
1 R Tutorial Introduction
1 R Tutorial Introduction1 R Tutorial Introduction
1 R Tutorial Introduction
 
Configure h base hadoop and hbase client
Configure h base hadoop and hbase clientConfigure h base hadoop and hbase client
Configure h base hadoop and hbase client
 
DC HUG Hadoop for Windows
DC HUG Hadoop for WindowsDC HUG Hadoop for Windows
DC HUG Hadoop for Windows
 
LuisRodriguezLocalDevEnvironmentsDrupalOpenDays
LuisRodriguezLocalDevEnvironmentsDrupalOpenDaysLuisRodriguezLocalDevEnvironmentsDrupalOpenDays
LuisRodriguezLocalDevEnvironmentsDrupalOpenDays
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
 
Apache web server tutorial for linux
Apache web server tutorial for linuxApache web server tutorial for linux
Apache web server tutorial for linux
 
HOW TO RUN RSTUDIO SERVERS ANYWHERE WITH CONTAINERS - HPC, CLOUD, AND LOCALLY
HOW TO RUN RSTUDIO SERVERS ANYWHERE WITH CONTAINERS - HPC, CLOUD, AND LOCALLYHOW TO RUN RSTUDIO SERVERS ANYWHERE WITH CONTAINERS - HPC, CLOUD, AND LOCALLY
HOW TO RUN RSTUDIO SERVERS ANYWHERE WITH CONTAINERS - HPC, CLOUD, AND LOCALLY
 
Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2
 
Configuration of Apache Web Server On CentOS 8
Configuration of Apache Web Server On CentOS 8Configuration of Apache Web Server On CentOS 8
Configuration of Apache Web Server On CentOS 8
 
Tame Your Build And Deployment Process With Hudson, PHPUnit, and SSH
Tame Your Build And Deployment Process With Hudson, PHPUnit, and SSHTame Your Build And Deployment Process With Hudson, PHPUnit, and SSH
Tame Your Build And Deployment Process With Hudson, PHPUnit, and SSH
 
Apache ppt
Apache pptApache ppt
Apache ppt
 

Mehr von Aiden Seonghak Hong

RHive tutorial supplement 3: RHive 튜토리얼 부록 3 - RStudio 설치
RHive tutorial supplement 3: RHive 튜토리얼 부록 3 - RStudio 설치RHive tutorial supplement 3: RHive 튜토리얼 부록 3 - RStudio 설치
RHive tutorial supplement 3: RHive 튜토리얼 부록 3 - RStudio 설치Aiden Seonghak Hong
 
RHive tutorial supplement 2: RHive 튜토리얼 부록 2 - Hive 설치
RHive tutorial supplement 2: RHive 튜토리얼 부록 2 - Hive 설치RHive tutorial supplement 2: RHive 튜토리얼 부록 2 - Hive 설치
RHive tutorial supplement 2: RHive 튜토리얼 부록 2 - Hive 설치Aiden Seonghak Hong
 
RHive tutorial supplement 1: RHive 튜토리얼 부록 1 - Hadoop 설치
RHive tutorial supplement 1: RHive 튜토리얼 부록 1 - Hadoop 설치RHive tutorial supplement 1: RHive 튜토리얼 부록 1 - Hadoop 설치
RHive tutorial supplement 1: RHive 튜토리얼 부록 1 - Hadoop 설치Aiden Seonghak Hong
 
RHive tutorial 5: RHive 튜토리얼 5 - apply 함수와 맵리듀스
RHive tutorial 5: RHive 튜토리얼 5 - apply 함수와 맵리듀스RHive tutorial 5: RHive 튜토리얼 5 - apply 함수와 맵리듀스
RHive tutorial 5: RHive 튜토리얼 5 - apply 함수와 맵리듀스Aiden Seonghak Hong
 
RHive tutorial 4: RHive 튜토리얼 4 - UDF, UDTF, UDAF 함수
RHive tutorial 4: RHive 튜토리얼 4 - UDF, UDTF, UDAF 함수RHive tutorial 4: RHive 튜토리얼 4 - UDF, UDTF, UDAF 함수
RHive tutorial 4: RHive 튜토리얼 4 - UDF, UDTF, UDAF 함수Aiden Seonghak Hong
 
RHive tutorial 3: RHive 튜토리얼 3 - HDFS 함수
RHive tutorial 3: RHive 튜토리얼 3 - HDFS 함수RHive tutorial 3: RHive 튜토리얼 3 - HDFS 함수
RHive tutorial 3: RHive 튜토리얼 3 - HDFS 함수Aiden Seonghak Hong
 
RHive tutorial 2: RHive 튜토리얼 2 - 기본 함수
RHive tutorial 2: RHive 튜토리얼 2 - 기본 함수RHive tutorial 2: RHive 튜토리얼 2 - 기본 함수
RHive tutorial 2: RHive 튜토리얼 2 - 기본 함수Aiden Seonghak Hong
 
RHive tutorial 1: RHive 튜토리얼 1 - 설치 및 설정
RHive tutorial 1: RHive 튜토리얼 1 - 설치 및 설정RHive tutorial 1: RHive 튜토리얼 1 - 설치 및 설정
RHive tutorial 1: RHive 튜토리얼 1 - 설치 및 설정Aiden Seonghak Hong
 

Mehr von Aiden Seonghak Hong (10)

IoT and Big data with R
IoT and Big data with RIoT and Big data with R
IoT and Big data with R
 
RHive tutorial supplement 3: RHive 튜토리얼 부록 3 - RStudio 설치
RHive tutorial supplement 3: RHive 튜토리얼 부록 3 - RStudio 설치RHive tutorial supplement 3: RHive 튜토리얼 부록 3 - RStudio 설치
RHive tutorial supplement 3: RHive 튜토리얼 부록 3 - RStudio 설치
 
RHive tutorial supplement 2: RHive 튜토리얼 부록 2 - Hive 설치
RHive tutorial supplement 2: RHive 튜토리얼 부록 2 - Hive 설치RHive tutorial supplement 2: RHive 튜토리얼 부록 2 - Hive 설치
RHive tutorial supplement 2: RHive 튜토리얼 부록 2 - Hive 설치
 
RHive tutorial supplement 1: RHive 튜토리얼 부록 1 - Hadoop 설치
RHive tutorial supplement 1: RHive 튜토리얼 부록 1 - Hadoop 설치RHive tutorial supplement 1: RHive 튜토리얼 부록 1 - Hadoop 설치
RHive tutorial supplement 1: RHive 튜토리얼 부록 1 - Hadoop 설치
 
RHive tutorial 5: RHive 튜토리얼 5 - apply 함수와 맵리듀스
RHive tutorial 5: RHive 튜토리얼 5 - apply 함수와 맵리듀스RHive tutorial 5: RHive 튜토리얼 5 - apply 함수와 맵리듀스
RHive tutorial 5: RHive 튜토리얼 5 - apply 함수와 맵리듀스
 
RHive tutorial 4: RHive 튜토리얼 4 - UDF, UDTF, UDAF 함수
RHive tutorial 4: RHive 튜토리얼 4 - UDF, UDTF, UDAF 함수RHive tutorial 4: RHive 튜토리얼 4 - UDF, UDTF, UDAF 함수
RHive tutorial 4: RHive 튜토리얼 4 - UDF, UDTF, UDAF 함수
 
RHive tutorial 3: RHive 튜토리얼 3 - HDFS 함수
RHive tutorial 3: RHive 튜토리얼 3 - HDFS 함수RHive tutorial 3: RHive 튜토리얼 3 - HDFS 함수
RHive tutorial 3: RHive 튜토리얼 3 - HDFS 함수
 
RHive tutorial 2: RHive 튜토리얼 2 - 기본 함수
RHive tutorial 2: RHive 튜토리얼 2 - 기본 함수RHive tutorial 2: RHive 튜토리얼 2 - 기본 함수
RHive tutorial 2: RHive 튜토리얼 2 - 기본 함수
 
RHive tutorial 1: RHive 튜토리얼 1 - 설치 및 설정
RHive tutorial 1: RHive 튜토리얼 1 - 설치 및 설정RHive tutorial 1: RHive 튜토리얼 1 - 설치 및 설정
RHive tutorial 1: RHive 튜토리얼 1 - 설치 및 설정
 
R hive tutorial 1
R hive tutorial 1R hive tutorial 1
R hive tutorial 1
 

Kürzlich hochgeladen

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 

Kürzlich hochgeladen (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Real Time Object Detection Using Open CV
RHive tutorial supplement 3 - RStudio-server setup for RHive

RHive tutorial – RStudio-server setup for RHive

This tutorial explains how to set up RStudio so that RHive can be used more conveniently. A detailed guide to setting up RStudio is available at http://rstudio.org; this tutorial covers installing and using RStudio specifically for RHive users.

RHive is an R package that uses Hadoop and Hive to process massive data.

Many R programs written with RHive finish quickly, but code that processes extremely large data can take a long time to complete its analysis. Depending on the size of the data and the complexity of the calculations, a run can take anywhere from a few minutes to a couple of weeks.

The problem is that the R session must stay alive until the task the user started has completed.

If the user runs the code from a laptop, the laptop must stay on and keep the session open until the code finishes. A desktop cannot be rebooted during that time either, and having to keep the session alive causes many other inconveniences.

This problem is not specific to RHive: it also occurs when using Hadoop or Hive alone.

One workaround is to log in to a terminal on a Hadoop client machine and run the code there in the background. However, this is inconvenient for R users, because it gives up the convenience of their IDE and their R working environment, and users who are not familiar with the terminal must learn to use it first.

RStudio provides a convenient solution to this.

RStudio comes in desktop and server versions. The desktop version is an excellent IDE for R, while RStudio-server is accessed through a web browser, lets many people share common resources, and keeps each user's session alive on the server. If Hadoop, Hive, and RHive are installed on a restricted network that can only be reached through a firewall, you can open the RStudio-server port in the firewall for this purpose.

Using RStudio-server with RHive therefore makes RHive more convenient to use. And because RStudio-server connects users to the server's R environment, several people can share RHive, Hadoop, and Hive.

This tutorial demonstrates how to install, connect to, and use RStudio-server.

Installing RStudio-server

RStudio can be downloaded from its official site:
http://rstudio.org/

The official site, rstudio.org, provides documents detailing how to install and use RStudio. The installation guide on the page below can be followed instead of this tutorial:
http://rstudio.org/download/server

This tutorial explains how to install RStudio-server on CentOS 5. Most of this installation guide is adapted from the site above, with some changes.

R must be installed before RStudio-server. If you followed the previous RHive tutorials to install RHive, R is already installed, but the steps are repeated here.

To install the newest version of R, first add the EPEL repository:

$ sudo rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-4.noarch.rpm

Then install R:

$ sudo yum install R R-devel
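Before moving on to RStudio-server, it can help to confirm that R actually ended up on the PATH. The POSIX shell sketch below checks for the R and Rscript commands; the function names are hypothetical, and treating "both commands are found" as a working installation is an assumption. It does not check for R-devel, which you can query separately with rpm -q R-devel.

```shell
#!/bin/sh
# Sketch: confirm that R is on the PATH before installing RStudio-server.
# Checking for the R and Rscript commands is an assumption about what a
# working installation looks like; R-devel is NOT checked here.

# require_cmd NAME: return 0 if NAME is an available command, else report it.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing command: $1" >&2
  return 1
}

# check_r_installed: succeed only if both R and Rscript are found.
check_r_installed() {
  ok=0
  for cmd in R Rscript; do
    require_cmd "$cmd" || ok=1
  done
  return "$ok"
}

# Report (but do not abort) if R is missing.
check_r_installed || echo "R not found; run: sudo yum install R R-devel" >&2
```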
When installing R for RHive, remember to install R-devel as well as R.

Before installing RStudio-server, determine whether your server has a 32-bit or a 64-bit architecture. Most recent servers are 64-bit, and you can confirm this with the uname command:

$ uname -m
x86_64

The output x86_64 shows that this server is 64-bit. Now download and install the RStudio-server package that matches your architecture.

Installing for 32-bit:

$ wget http://download2.rstudio.org/rstudio-server-0.94.110-i686.rpm
$ sudo rpm -Uvh rstudio-server-0.94.110-i686.rpm

Installing for 64-bit:

$ wget http://download2.rstudio.org/rstudio-server-0.94.110-x86_64.rpm
$ sudo rpm -Uvh rstudio-server-0.94.110-x86_64.rpm

Making a User Account

To connect to RStudio-server, a user account must exist on the server where RStudio-server is installed. RStudio-server does not allow logging in as root, so regular user accounts are needed. Connect to the server, then create an account and set a password for each person who will use RStudio-server:

ssh root@10.1.1.1
adduser user1
passwd user1
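The architecture check and package download described earlier can be folded into one helper, so the right RPM is chosen from the uname -m output automatically. This is a POSIX shell sketch assuming the 0.94.110 packages and the download2.rstudio.org host named in this tutorial; the function name is hypothetical, and mapping i386 through i686 to the i686 package is an assumption.

```shell
#!/bin/sh
# Sketch: pick the RStudio-server RPM that matches this machine.
# Version and host come from the tutorial; mapping i386..i686 to the
# i686 package is an assumption.

RSTUDIO_VERSION="0.94.110"
RSTUDIO_BASE_URL="http://download2.rstudio.org"

# rstudio_rpm_for_arch ARCH: print the RPM file name for a `uname -m` value,
# or fail with a message on stderr for architectures not covered here.
rstudio_rpm_for_arch() {
  case "$1" in
    x86_64)
      echo "rstudio-server-${RSTUDIO_VERSION}-x86_64.rpm"
      ;;
    i386|i486|i586|i686)
      echo "rstudio-server-${RSTUDIO_VERSION}-i686.rpm"
      ;;
    *)
      echo "unsupported architecture: $1" >&2
      return 1
      ;;
  esac
}

# Usage on the server (commented out so loading this file has no side effects):
#   rpm_file=$(rstudio_rpm_for_arch "$(uname -m)") &&
#   wget "${RSTUDIO_BASE_URL}/${rpm_file}" &&
#   sudo rpm -Uvh "$rpm_file"
```

Keeping the version in one variable means only one line changes if you install a different release.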
user1 is an arbitrary example name; choose any account name you like.

Starting RStudio-server

RStudio-server runs as a background process (daemon mode). Connect to the server and start it as shown below:

ssh root@10.1.1.1
/etc/init.d/rstudio-server start

Connecting to RStudio-server

Connect to RStudio-server with a web browser: open the browser and go to the RStudio-server URL.

http://10.1.1.1:8787

RStudio-server listens on port 8787 by default; you can change this to another port as needed.

You can now connect to RStudio-server and perform massive data analysis with R and RHive.

Tips for using RHive in RStudio

While working in RStudio-server, RHive may fail to load because the environment variables are not set properly. You can solve this by setting them explicitly in R:

Sys.setenv(HADOOP_HOME="/mnt/srv/hadoop-0.20.203.0")
Sys.setenv(HIVE_HOME="/mnt/srv/hive-0.7.1")
Sys.setenv(RHIVE_DATA="/mnt/srv/rhive_data")

library(RHive)

HADOOP_HOME and HIVE_HOME must be set to the Hadoop and Hive home directories on the server where RStudio-server is installed. RHIVE_DATA is a temporary directory that RHive uses; it is created on each Hadoop node.
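Since improperly set environment variables can keep RHive from loading, it can help to check the paths from a terminal on the RStudio-server host before passing them to Sys.setenv(). The POSIX shell sketch below uses this tutorial's example paths as defaults (export HADOOP_HOME and HIVE_HOME to override them); the function names are hypothetical. RHIVE_DATA is skipped because, per the tutorial, it is created on each Hadoop node rather than on this host.

```shell
#!/bin/sh
# Sketch: verify that the Hadoop and Hive home directories exist on this
# host. Defaults are the tutorial's example paths; override via the env.

HADOOP_HOME="${HADOOP_HOME:-/mnt/srv/hadoop-0.20.203.0}"
HIVE_HOME="${HIVE_HOME:-/mnt/srv/hive-0.7.1}"

# check_dir NAME PATH: report whether PATH is an existing directory.
check_dir() {
  if [ -d "$2" ]; then
    echo "ok: $1=$2"
  else
    echo "missing: $1=$2 is not a directory on this host" >&2
    return 1
  fi
}

# check_rhive_homes: succeed only if both home directories exist.
# RHIVE_DATA is intentionally not checked (it lives on the Hadoop nodes).
check_rhive_homes() {
  rc=0
  check_dir HADOOP_HOME "$HADOOP_HOME" || rc=1
  check_dir HIVE_HOME "$HIVE_HOME" || rc=1
  return "$rc"
}

# Report (but do not abort) if a path is wrong.
check_rhive_homes || echo "fix these paths before calling Sys.setenv() in R" >&2
```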
Set the environment variables before loading RHive with the library() function. If RHive has already been loaded without them, set the variables and then call rhive.init() to initialize RHive:

library(RHive)

Sys.setenv(HADOOP_HOME="/mnt/srv/hadoop-0.20.203.0")
Sys.setenv(HIVE_HOME="/mnt/srv/hive-0.7.1")
Sys.setenv(RHIVE_DATA="/mnt/srv/rhive_data")

rhive.init()

You have now set up an environment in which you can write R code in RStudio and use RHive to work with Hive and Hadoop.
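An alternative to repeating Sys.setenv() in every session, not part of the original tutorial, is to put the variables in the user's ~/.Renviron file, which R reads by default when it starts. The POSIX shell sketch below appends them only if they are not already defined; the function names are hypothetical and the paths are this tutorial's examples. Variables set this way only take effect in new R sessions, so in a session that is already running you still need Sys.setenv() followed by rhive.init().

```shell
#!/bin/sh
# Sketch: persist the RHive variables in ~/.Renviron so they are already
# set when a new R session (including one started by RStudio-server)
# runs library(RHive). Paths are the tutorial's examples.

# set_renviron_var FILE NAME VALUE: append NAME=VALUE to FILE unless NAME
# is already defined there (an existing definition is left untouched).
set_renviron_var() {
  touch "$1" || return 1
  if ! grep -q "^$2=" "$1"; then
    printf '%s=%s\n' "$2" "$3" >> "$1"
  fi
}

# write_rhive_renviron FILE: add the three variables used in this tutorial.
write_rhive_renviron() {
  set_renviron_var "$1" HADOOP_HOME /mnt/srv/hadoop-0.20.203.0 &&
  set_renviron_var "$1" HIVE_HOME /mnt/srv/hive-0.7.1 &&
  set_renviron_var "$1" RHIVE_DATA /mnt/srv/rhive_data
}

# Usage (run as the RStudio-server user, then start a new R session):
#   write_rhive_renviron "$HOME/.Renviron"
```

Skipping names that are already present makes the helper safe to run more than once, and it never overwrites a value a user has edited by hand.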