Big Data With Hadoop Setup
Mandakini Kumari
Agenda 
1. What is Big Data?
2. Limitations of the Existing System
3. Advantages of Hadoop
4. Disadvantages of Hadoop
5. Hadoop Ecosystem & Components
6. Prerequisites for Hadoop 1.x
7. Install Hadoop 1.x
1.1 Characteristics of Big Data
1.2 In Every 60 seconds on the internet
2.1 Limitations of the Existing Data Analytics Architecture
3.1 Advantages of Hadoop
•Hadoop combines storage and computation on the same nodes, whereas in an RDBMS computation happens in the CPU, which requires moving data over the bus from disk to CPU.
•Fault-tolerant hardware is expensive; Hadoop is designed to run on cheap commodity hardware.
•Traditional systems need complicated replication and failover schemes; Hadoop automatically handles data replication and node failure.
•HDFS (storage) is optimized for high throughput.
•Large HDFS block sizes suit large files (GBs to PBs).
•HDFS achieves high scalability and availability through data replication and fault tolerance.
•Extremely scalable.
•The MapReduce (MR) framework allows parallel work over huge data sets.
•Jobs are scheduled for remote execution on the slave/DataNodes, allowing parallel and fast job execution.
•MR handles the business logic and HDFS the storage, independently of each other.
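The map → shuffle → reduce flow described above can be sketched with a plain Unix pipeline (a common classroom analogy, not Hadoop itself):

```shell
# Word count in MapReduce style on a single machine (classroom analogy only):
# tr plays the mapper (one word per line), sort plays the shuffle
# (identical keys become adjacent), uniq -c plays the reducer (count each key).
printf 'big data with hadoop big data\n' | tr -s ' ' '\n' | sort | uniq -c
```

Hadoop runs the same three phases, but with the map and reduce steps spread across many DataNodes instead of one pipe.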
3.2 Advantage of Hadoop
3.3 Advantage of Hadoop
4.1 Disadvantages of Hadoop
•HDFS is inefficient at handling small files.
•Hadoop 1.x has a single point of failure at the NameNode (NN).
•Clusters beyond roughly 4,000 nodes become a problem because all metadata is stored in a single NN's RAM.
•Hadoop 2.x removes this single point of failure.
•Security is a major concern: Hadoop 1.x does offer a security model, but it is disabled by default because of its complexity.
•Hadoop 1.x offers no storage- or network-level encryption, which is a big concern for government-sector application data.
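The small-files problem is easy to quantify with a common rule of thumb (an estimate, not an exact figure): roughly 150 bytes of NameNode heap per namespace object (file or block). A million small files therefore costs real memory on one machine:

```shell
# Back-of-the-envelope NameNode heap estimate (rule of thumb: ~150 bytes/object).
files=1000000          # one million small files
blocks_per_file=1      # a small file still occupies at least one block
objects=$((files + files * blocks_per_file))   # file objects + block objects
bytes=$((objects * 150))
echo "approx $((bytes / 1024 / 1024)) MB of NameNode heap"
```

Scaling the same files up so each holds many blocks of data barely changes the object count per gigabyte stored, which is why HDFS favors large files.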
5.1 HADOOP ECOSYSTEM
5.2 ADVANTAGES OF HDFS
5.3 NAMENODE: HADOOP COMPONENT
•The master, running on high-end hardware.
•Stores all metadata in main memory (RAM).
•Types of metadata: list of files, blocks for each file, DataNodes (DN) for each block.
•File attributes: access time, replication factor.
•The JobTracker reports to the NN when a job completes.
•Receives a heartbeat from each DN.
•Transaction log: records file creations, deletions, etc.
5.4 DATANODE: HADOOP COMPONENT
•A slave, running on commodity hardware.
•Provides the actual storage.
•Responsible for serving read and write requests from clients.
•A block write is pipelined sequentially through the replica DNs (parallel writes to the same block would break replication consistency); different blocks can be written and read in parallel.
•Heartbeat: the NN receives a heartbeat from each DN every few seconds. If heartbeats stop arriving, the DN's data is re-replicated to other DataNodes.
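The heartbeat logic above can be sketched as a simple timeout check (a toy model with made-up numbers, not Hadoop code; Hadoop 1.x marks a DataNode dead after roughly ten minutes without heartbeats):

```shell
# Toy heartbeat check: a DataNode is presumed dead once its last heartbeat
# is older than the timeout, and its blocks must then be re-replicated.
now=1000                # current time in seconds (fake clock for illustration)
last_heartbeat=370      # when this DN last reported in (fake value)
timeout=600             # ~10 minutes, as in Hadoop 1.x
age=$((now - last_heartbeat))
if [ "$age" -gt "$timeout" ]; then
  status=dead           # NN would schedule re-replication of this DN's blocks
else
  status=alive
fi
echo "DataNode is $status (last heartbeat ${age}s ago)"
```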
5.5 SECONDARY NAMENODE: HADOOP COMPONENT
•Not a hot standby for the NameNode (NN).
•If the NN fails, only read operations can be performed; no blocks are replicated or deleted.
•If the NN fails, the system goes into safe mode.
•The Secondary NameNode connects to the NN periodically (e.g. every hour) and takes a backup of the NN metadata.
•The saved metadata can be used to rebuild a failed NameNode.
5.6 MAPREDUCE (BUSINESS LOGIC) ENGINE
•The TaskTracker (TT) is the slave.
•A TT acts like a worker that executes tasks.
•The JobTracker (master) acts like a manager that splits a JOB into TASKs.
5.7 HDFS: HADOOP COMPONENT
5.8 FAULT TOLERANCE: REPLICATION AND RACK AWARENESS
6. Hadoop Installation: Prerequisites
1. Ubuntu Linux 12.04.3 LTS
2. Install Java v1.5+
3. Add a dedicated Hadoop system user.
4. Configure SSH access.
5. Disable IPv6.
For PuTTY users: sudo apt-get install openssh-server
Run the command: sudo apt-get update
6.1 Install Java v1.5+
6.1.1) Download the latest Oracle Java Linux version:
wget https://edelivery.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.tar.gz
OR, to avoid being prompted for a username and password:
wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com" https://edelivery.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.tar.gz
6.1.2) Copy the Java archive into the /usr/local/java directory:
sudo cp -r jdk-7u45-linux-x64.tar.gz /usr/local/java
6.1.3) Change to that directory: cd /usr/local/java
6.1.4) Unpack the Java binaries in /usr/local/java:
sudo tar xvzf jdk-7u45-linux-x64.tar.gz
6.1.5) Edit the system PATH file /etc/profile:
sudo nano /etc/profile or sudo gedit /etc/profile
6.1 Install Java v1.5+
6.1.6) At the end of /etc/profile, add the following system variables to your system path:
JAVA_HOME=/usr/local/java/jdk1.7.0_45
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
export JAVA_HOME
export PATH
6.1.7) Tell your Ubuntu Linux system where the Oracle Java JDK/JRE is located:
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/java/jdk1.7.0_45/bin/javac" 1
6.1.8) Reload the system-wide PATH /etc/profile: . /etc/profile
6.1.9) Test Java: java -version
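A quick sanity check of the /etc/profile edit can be scripted (a sketch; it re-creates the two assignments so it runs stand-alone, with the jdk1.7.0_45 path assumed from the steps above):

```shell
# Re-create the /etc/profile assignments and confirm the JDK's bin/ is on PATH.
JAVA_HOME=/usr/local/java/jdk1.7.0_45
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
export JAVA_HOME PATH
case ":$PATH:" in
  *":$JAVA_HOME/bin:"*) echo "JAVA_HOME/bin is on the PATH" ;;
  *)                    echo "PATH is missing $JAVA_HOME/bin" ;;
esac
```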
6.2 Add a dedicated Hadoop system user
6.2.1) Add a group: sudo addgroup hadoop
6.2.2) Create a user and add it to the group:
sudo adduser --ingroup hadoop hduser
6.3 Generate an SSH key for the hduser user
6.3.1) Log in as hduser (with sudo rights).
6.3.2) Run the key-generation command: ssh-keygen -t rsa -P ""
6.3.3) When asked for the file in which to save the key, just press Enter, so the key is generated at /home/hduser/.ssh
6.3.4) Enable SSH access to your local machine with the newly created key:
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
6.3.5) Test the SSH setup by connecting to your local machine as hduser:
ssh hduser@localhost
This adds localhost permanently to the list of known hosts.
6.4 Disable IPv6
6.4.1) We need to disable IPv6 because Ubuntu uses 0.0.0.0 for various Hadoop configurations.
Run: sudo gedit /etc/sysctl.conf
Add the following lines to the end of the file and reboot the machine so the configuration takes effect:
#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Install Hadoop 1.2
Ubuntu Linux 12.04.3 LTS
Hadoop 1.2.1, released August 2013
Download and extract Hadoop:
Command: wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
Command: tar -xvf hadoop-1.2.1.tar.gz
Edit core-site.xml
Command: sudo gedit hadoop/conf/core-site.xml 
<property> 
<name>fs.default.name</name> 
<value>hdfs://localhost:8020</value> 
</property>
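For users working over SSH without a GUI, the same file can be written from the shell with a here-document (a sketch; the /tmp path is just for illustration, point it at hadoop/conf/core-site.xml in your install):

```shell
# Write a minimal core-site.xml without a GUI editor (illustrative target path).
conf=/tmp/core-site.xml
cat > "$conf" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
EOF
grep -q 'hdfs://localhost:8020' "$conf" && echo "core-site.xml written"
```

The same here-document pattern works for hdfs-site.xml and mapred-site.xml below.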
Edit hdfs-site.xml 
Command: sudo gedit hadoop/conf/hdfs-site.xml 
<property> 
<name>dfs.replication</name> 
<value>1</value> 
</property> 
<property> 
<name>dfs.permissions</name> 
<value>false</value> 
</property>
Edit mapred-site.xml
Command: sudo gedit hadoop/conf/mapred-site.xml
<property> 
<name>mapred.job.tracker</name> 
<value>localhost:8021</value> 
</property>
Get your IP address
Command: ifconfig 
Command: sudo gedit /etc/hosts
CREATE AN SSH KEY
•Command: ssh-keygen -t rsa -P ""
•Append the key to the authorized keys:
•Command: cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
Configuration
•Reboot the system
•Add JAVA_HOME in the hadoop-env.sh file:
Command: sudo gedit hadoop/conf/hadoop-env.sh
Add: export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386
JAVA_HOME
Hadoop Command 
Format the NameNode
Command: bin/hadoop namenode -format
Start the NameNode and DataNode
Command: bin/start-dfs.sh
Start the JobTracker and TaskTracker
Command: bin/start-mapred.sh
To check whether Hadoop started correctly
Command: jps
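The jps check can be scripted to confirm all five Hadoop 1.x daemons are up. The sample output below is canned for illustration (an assumption); on a live node, replace it with jps_out=$(jps):

```shell
# Verify the expected Hadoop 1.x daemons appear in jps output.
# Canned sample output (assumption) -- on a real node use: jps_out=$(jps)
jps_out='1234 NameNode
2345 DataNode
3456 SecondaryNameNode
4567 JobTracker
5678 TaskTracker
6789 Jps'
missing=0
for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
  if echo "$jps_out" | grep -qw "$d"; then
    echo "$d running"
  else
    echo "$d MISSING"
    missing=$((missing + 1))
  fi
done
```

If any daemon is missing, check its log under the Hadoop logs/ directory before retrying.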
Thank you 
References: 
http://bigdatahandler.com/2013/10/24/what-is-apache-hadoop/ 
edureka.in 
CONTACT ME @ 
http://in.linkedin.com/pub/mandakini-kumari/18/93/935
http://www.slideshare.net/mandakinikumari
