Module-2
Hadoop Architecture and HDFS
www.edureka.co/big-data-and-hadoop
Objectives
At the end of this module, you will be able to
 Analyse Hadoop 2.x Cluster Architecture – Federation
 Analyse Hadoop 2.x Cluster Architecture – High Availability
 Run Hadoop in different cluster modes
 Implement basic Hadoop commands on Terminal
 Prepare Hadoop 2.x configuration files and analyze the parameters in them
 Analyze dump of a MapReduce program
 Implement different data loading techniques
Let’s Revise
 Hadoop Core Components
 HDFS Architecture
 What is HDFS?
 Hadoop Vs. Traditional Systems
 NameNode and Secondary NameNode
[Figure: A cluster with two layers. HDFS: one NameNode and several DataNodes. YARN: one Resource Manager and several Node Managers.]
Pre-Class Questions
Annie’s Question
The default replication factor is:
a. 2
b. 4
c. 5
d. 3
Annie’s Answer
Ans. Option d.
It means that if you move a file to HDFS, then by default 3
copies of the file will be stored on different DataNodes.
Annie’s Question
In a multi-node cluster, every slave node runs two
daemons: DataNode and NodeManager.
a. TRUE
b. FALSE
Annie’s Answer
Ans. TRUE
DataNode service for HDFS and NodeManager for
processing.
Annie’s Question
A block is replicated on 4 nodes: K, L, M, and N. If
M, K, and N fail, a client can still read the data.
a. TRUE
b. FALSE
Annie’s Answer
Ans. TRUE.
The remaining node ‘L’ still holds a replica of the block in
question.
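The reasoning behind this answer can be sketched in a few lines of Python. This is a toy model, not HDFS code: the function name block_is_readable and the set-based bookkeeping are assumptions made purely for illustration.

```python
def block_is_readable(replica_nodes, failed_nodes):
    """Return True if at least one node holding a replica is still alive."""
    alive = set(replica_nodes) - set(failed_nodes)
    return len(alive) > 0


if __name__ == "__main__":
    replicas = {"K", "L", "M", "N"}
    # Three of the four replica holders fail; L still serves the block.
    print(block_is_readable(replicas, {"M", "K", "N"}))
    # Only when every replica holder fails does the block become unreadable.
    print(block_is_readable(replicas, {"K", "L", "M", "N"}))
```

The block stays readable as long as the set of live replica holders is non-empty, which is why losing three of four nodes is survivable.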
Hadoop Cluster: A Typical Use Case
Active NameNode
RAM: 64 GB
Hard disk: 1 TB
Processor: Xeon with 8 cores
Ethernet: 3 x 10 GB/s
OS: 64-bit CentOS
Power: Redundant Power Supply
Standby NameNode
RAM: 64 GB
Hard disk: 1 TB
Processor: Xeon with 8 cores
Ethernet: 3 x 10 GB/s
OS: 64-bit CentOS
Power: Redundant Power Supply
Secondary NameNode (optional)
RAM: 32 GB
Hard disk: 1 TB
Processor: Xeon with 4 cores
Ethernet: 3 x 10 GB/s
OS: 64-bit CentOS
Power: Redundant Power Supply
DataNode (each)
RAM: 16 GB
Hard disk: 6 x 2 TB
Processor: Xeon with 2 cores
Ethernet: 3 x 10 GB/s
OS: 64-bit CentOS
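A rough capacity estimate follows from the DataNode disk figures (6 x 2 TB per node) and the default replication factor of 3. The sketch below is an assumption-laden illustration: the function name usable_capacity_tb is hypothetical, the cluster size of 10 DataNodes is made up for the example, and space reserved for the OS, logs and intermediate data is ignored.

```python
def usable_capacity_tb(datanodes, disks_per_node=6, disk_tb=2, replication=3):
    """Estimate raw and usable HDFS capacity in TB.

    Ignores space reserved for the OS, logs and intermediate data.
    """
    raw_tb = datanodes * disks_per_node * disk_tb
    return raw_tb, raw_tb / replication


if __name__ == "__main__":
    # 10 DataNodes is a hypothetical cluster size, not a figure from the slide.
    raw, usable = usable_capacity_tb(datanodes=10)
    print(f"raw: {raw} TB, usable at replication 3: {usable:.0f} TB")
```

Every stored block occupies space on as many nodes as the replication factor, so usable capacity is roughly the raw capacity divided by that factor.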
[Figure: Cluster layout]
Master
 NameNode: http://master:50070/
 ResourceManager: http://master:8088
Slave01 to Slave05 (each)
 DataNode
 NodeManager
Hadoop 2.x Cluster Architecture
[Figure: A Client talks to the HDFS master (NameNode) and the YARN master (ResourceManager). Each slave node runs a DataNode and a NodeManager.]
Hadoop 2.x Cluster Architecture - Federation
[Figure: Hadoop 1.0 vs. Hadoop 2.0. A Namespace layer (NameNode) sits on top of a Block Storage layer (Block Management and DataNode storage).]
http://hadoop.apache.org/docs/stable2/hadoop-project-dist/hadoop-hdfs/Federation.html
Annie’s Question
How does HDFS Federation help HDFS scale horizontally?
a. Reduces the load on any single NameNode by using
multiple, independent NameNodes to manage individual
parts of the file system namespace.
b. Provides cross-data-centre (non-local) support for
HDFS, allowing a cluster administrator to split the Block
Storage outside the local cluster.
Annie’s Answer
Ans. Option (a)
In order to scale the name service horizontally, HDFS
federation uses multiple independent NameNodes. The
NameNodes are federated, that is, they are independent
and don’t require coordination with each other.
Annie’s Question
You have configured two name nodes to manage
/marketing and /finance respectively. What will happen if
you try to put a file in /accounting directory?
Annie’s Answer
Ans. The put will fail. None of the namespaces manages the
path, so you will get an IOException with a "No such file or
directory" error.
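The behaviour in this answer can be pictured as a lookup in a table that maps directories to NameNodes. The sketch below is a toy model and not Hadoop's implementation: the names resolve_namenode and NoSuchPathError, the host names nn1 and nn2, and port 8020 are all placeholders chosen for illustration.

```python
class NoSuchPathError(IOError):
    """Raised when no configured namespace covers a path (hypothetical name)."""


def resolve_namenode(path, mount_table):
    """Return the NameNode responsible for path, using longest-prefix match."""
    best = None
    for mount_point in mount_table:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise NoSuchPathError(f"No such file or directory: {path}")
    return mount_table[best]


if __name__ == "__main__":
    # nn1 and nn2 are placeholder host names.
    table = {"/marketing": "hdfs://nn1:8020", "/finance": "hdfs://nn2:8020"}
    print(resolve_namenode("/finance/q1/report.csv", table))
    try:
        resolve_namenode("/accounting/ledger.csv", table)
    except NoSuchPathError as err:
        print("put fails:", err)
```

Because /accounting matches neither configured directory, the lookup has no NameNode to hand the request to and the write is rejected.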
Hadoop 2.x – High Availability
[Figure: HDFS High Availability. A Client, an Active NameNode and a Standby NameNode connected through Shared Edit Logs, above DataNodes and Node Managers that host Containers and App Masters.]
 Active NameNode: all namespace edits are logged to shared NFS storage; single writer (fencing).
 Standby NameNode: reads the edit logs and applies them to its own namespace.
 With NameNode High Availability it is not necessary to configure a Secondary NameNode.
http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithNFS.html
Hadoop 2.x – Resource Management
[Figure: The HDFS High Availability diagram from the previous slide, extended with YARN (Next Generation MapReduce): a Resource Manager above Node Managers that host Containers and App Masters.]
Hadoop 2.x – Resource Management (Contd.)
YARN – Yet Another Resource Negotiator
[Figure: Masters: the Resource Manager, made up of the Scheduler and the Applications Manager (AsM), with a Client attached. Slaves: each runs a Node Manager and a DataNode and hosts Containers and App Masters.]
Annie’s Question
HDFS HA was developed to overcome which of the following
disadvantages of Hadoop 1.0?
a. Single Point of Failure of NameNode
b. Only one version can be run in classic MapReduce
c. Too much burden on Job Tracker
Annie’s Answer
Ans. Single Point of Failure of NameNode
Hadoop Cluster: Facebook
Facebook
 We use Hadoop to store copies of internal log and dimension data sources and use
it as a source for reporting/analytics and machine learning.
 Currently we have 2 major clusters:
» A 1100-machine cluster with 8800 cores and about 12 PB raw storage.
» A 300-machine cluster with 2400 cores and about 3 PB raw storage.
» Each (commodity) node has 8 cores and 12 TB of storage.
» We are heavy users of both streaming as well as the Java APIs. We have
built a higher level data warehousing framework using these features called
Hive (see http://Hadoop.apache.org/hive/). We have also developed a
FUSE implementation over HDFS.
Hadoop Cluster Modes
Hadoop can run in any of the following three modes:
Standalone (or Local) Mode
 No daemons, everything runs in a single JVM.
 Suitable for running MapReduce programs during development.
 Has no DFS.
Pseudo-Distributed Mode
 Hadoop daemons run on the local machine.
Fully-Distributed Mode
 Hadoop daemons run on a cluster of machines.
Terminal Commands
Hadoop FS Shell Commands
 HDFS organizes its data in files and directories
 It provides a command-line interface called the FS shell that lets the user interact with the data in HDFS
 The syntax of the commands is similar to bash
Terminal Commands
[Figure: Listing of files present in bin Directory]
[Figure: Listing of files present on HDFS]
Hadoop 2.x Configuration Files
Configuration Filename | Description
hadoop-env.sh | Environment variables that are used in the scripts to run Hadoop.
core-site.xml | Configuration settings for Hadoop Core, such as I/O settings that are common to HDFS and MapReduce.
hdfs-site.xml | Configuration settings for the HDFS daemons: the NameNode, the Secondary NameNode and the DataNodes.
mapred-site.xml | Configuration settings for MapReduce applications.
yarn-site.xml | Configuration settings for ResourceManager and NodeManager.
masters | A list of machines (one per line) that each run a Secondary NameNode.
slaves | A list of machines (one per line) that each run a DataNode and a NodeManager.
Hadoop 2.x Configuration Files – Apache Hadoop
Core: core-site.xml
HDFS: hdfs-site.xml
YARN: yarn-site.xml
MapReduce: mapred-site.xml
core-site.xml
-------------------------------------------------core-site.xml-----------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
------------------------------------------------core-site.xml-----------------------------------------------------
fs.defaultFS: The name of the default file system. The URL's authority is used to determine the host, port, etc. for a filesystem.
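To see how the host and port are taken from the URL's authority, the following Python sketch parses a core-site.xml like the one on this slide. It is an illustration only and not how Hadoop itself reads its configuration; the helper names read_properties and default_fs_endpoint are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Same single property as on the slide, embedded so the example is self-contained.
CORE_SITE = """<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return the <property> entries of a Hadoop-style config file as a dict."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.findall("property")
    }


def default_fs_endpoint(properties):
    """Split fs.defaultFS into scheme, host and port."""
    uri = urlparse(properties["fs.defaultFS"])
    return uri.scheme, uri.hostname, uri.port


if __name__ == "__main__":
    props = read_properties(CORE_SITE)
    print(default_fs_endpoint(props))
```

The scheme (hdfs) selects the file system type, while the authority part (localhost:9000) supplies the NameNode host and port.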
hdfs-site.xml
---------------------------------------------------------hdfs-site.xml-------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/edureka/hadoop-2.2.0/hadoop2_data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/edureka/hadoop-2.2.0/hadoop2_data/hdfs/datanode</value>
  </property>
</configuration>
---------------------------------------------------------hdfs-site.xml-------------------------------------------------------------
dfs.replication: Default block replication.
dfs.permissions: If "true", enable permission checking in HDFS. If "false", permission checking is turned off.
dfs.namenode.name.dir: Determines where on the local filesystem the DFS name node should store the name table (fsimage).
dfs.datanode.data.dir: Determines where on the local filesystem a DFS data node should store its blocks.
mapred-site.xml
-----------------------------------------------mapred-site.xml---------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
-----------------------------------------------mapred-site.xml---------------------------------------------------
mapreduce.framework.name: The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.
yarn-site.xml
-----------------------------------------------yarn-site.xml---------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- yarn-site.xml -->
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <!-- The key embeds the service name declared above (mapreduce_shuffle). -->
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
-----------------------------------------------yarn-site.xml---------------------------------------------------
yarn.nodemanager.aux-services: The auxiliary service name.
yarn.nodemanager.aux-services.mapreduce_shuffle.class: The auxiliary service class to use.
All Properties
1. http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/core-default.xml
2. http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
3. http://hadoop.apache.org/docs/r2.2.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
4. http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
Slaves and Masters
Two files are used by the startup and shutdown commands:
Slaves
 Contains a list of hosts, one per line, that are to host DataNode and NodeManager servers.
Masters
 Contains a list of hosts, one per line, that are to host Secondary NameNode servers.
Per-Process RunTime Environment
 hadoop-env.sh is sourced by all of the Hadoop Core scripts. It is located in the hadoop directory of the Hadoop installation directory (hadoop-2.2.0/etc/hadoop).
 Set parameter JAVA_HOME
 This file also offers a way to provide custom parameters for each of the servers.
 Examples of environment variables that you can specify:
export HADOOP_HEAPSIZE="512"
export HADOOP_DATANODE_HEAPSIZE="128"
[Figure: hadoop-env.sh configures the JVM]
Hadoop Daemons
 NameNode daemon
» Runs on the master node of the Hadoop Distributed File System (HDFS)
» Directs DataNodes to perform their low-level I/O tasks
 DataNode daemon
» Runs on each slave machine in HDFS
» Does the low-level I/O work
 ResourceManager
» Runs on the master node of the data processing system (MapReduce)
» Global resource scheduler
 NodeManager
» Runs on each slave node of the data processing system
» Platform for the data processing tasks
 JobHistoryServer
» Responsible for servicing all job-history-related requests from clients
Hadoop Web UI Parts
Service | Servers | Default Ports Used | Protocol | Description
NameNode WebUI | Master Nodes (NameNode and any back-up NameNodes) | 50070 | http | Web UI to look at current status of HDFS, explore file system
DataNode | All Slave Nodes | 50075 | http | DataNode WebUI to access the status, logs etc.
ResourceManager Web UI | Cluster Level resource manager | 8088 | http | Web UI for ResourceManager and for application submissions
NodeManager | Monitors resources on Data Node | 8042 | TCP | Node information, List of Applications and List of containers
MapReduce JobHistory Server | Get status on finished applications. | 19888 | TCP | Providing logs of important events in MapReduce job execution and associated profiling metrics
Web UI URLs
 NameNode status: http://localhost:50070/dfshealth.jsp
 ResourceManager status: http://localhost:8088/cluster
 MapReduce JobHistory Server status: http://localhost:19888/jobhistory
Annie’s Question
Which of the following files is used to specify the
NameNode's heap size?
a. bashrc
b. hadoop-env.sh
c. hdfs-site.sh
d. core-site.xml
Annie’s Answer
Ans. hadoop-env.sh.
This file specifies environment variables that affect the
JDK used by the Hadoop daemons (bin/hadoop).
Annie’s Question
It is necessary to define all the properties in core-site.xml,
hdfs-site.xml, yarn-site.xml and mapred-site.xml.
a. TRUE
b. FALSE
Annie’s Answer
Ans. False.
Detailed answer will be given after the next question.
Annie’s Question
Standalone mode uses the default configuration?
a) TRUE
b) FALSE
Annie’s Answer
Ans. True.
In standalone mode, Hadoop runs with the default
configuration (empty configuration files, i.e. no
configuration settings in core-site.xml, hdfs-site.xml,
mapred-site.xml and yarn-site.xml). If properties are not
defined in the configuration files, Hadoop runs with the
default values for the corresponding properties.
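The fallback to default values can be pictured as a layered lookup: site files are consulted first, built-in defaults second. The sketch below is a simplified model, not Hadoop's Configuration class. The DEFAULTS excerpt is an assumption recalled from the *-default.xml files listed under All Properties, and the name effective_config is hypothetical.

```python
from collections import ChainMap

# Assumed excerpt of the built-in defaults; check the *-default.xml files
# for the authoritative values of your Hadoop version.
DEFAULTS = {
    "fs.defaultFS": "file:///",
    "dfs.replication": "3",
    "mapreduce.framework.name": "local",
}


def effective_config(site_overrides):
    """Layer the properties from the *-site.xml files over the defaults."""
    return ChainMap(site_overrides, DEFAULTS)


if __name__ == "__main__":
    standalone = effective_config({})  # empty site files
    pseudo = effective_config({"dfs.replication": "1"})
    print(standalone["dfs.replication"], pseudo["dfs.replication"])
    print(pseudo["fs.defaultFS"])  # not overridden, so the default applies
```

Only the properties that differ from the defaults need to appear in the site files, which is why the previous answer is False.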
Sample Examples List
Running the Teragen Example
Checking the Output
Annie’s Question
The output of a MR job will be stored on HDFS:
a. TRUE
b. FALSE
Annie’s Answer
Ans. True
It is stored in separate part files, for example part-m-00000,
part-m-00001 and so on.
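The naming pattern of these part files can be reproduced with a short helper. This is an illustrative sketch with a hypothetical function name, part_file_name; it relies on the MapReduce convention that map-only jobs write part-m-NNNNN files and jobs with reducers write part-r-NNNNN files.

```python
def part_file_name(task_index, task_type="m"):
    """Build an output file name such as part-m-00000.

    task_type is "m" for map output and "r" for reduce output.
    """
    if task_type not in ("m", "r"):
        raise ValueError("task_type must be 'm' or 'r'")
    return f"part-{task_type}-{task_index:05d}"


if __name__ == "__main__":
    for index in range(3):
        print(part_file_name(index))
```

Each task writes its own file, so the number of part files in the output directory matches the number of tasks that produced output.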
Annie’s Question
To run an MR job, the data should be present on HDFS:
a. TRUE
b. FALSE
Annie’s Answer
Ans. True
In order to process data in parallel, it must be present
on HDFS so that MR can work on chunks of the data in
parallel.
Data Loading Techniques and Data Analysis
Data Loading (into HDFS): Using Flume, Using Sqoop, Using Hadoop Copy Commands
Data Analysis (on HDFS): Using Pig, Using HIVE
Hadoop Copy Commands
distcp: Distributed copy to move data between clusters; used for backup and recovery.
hadoop distcp hdfs://<source NN> hdfs://<target NN>
put: Copies file(s) from the local file system to the destination file system. It can also read from “stdin” and write to the destination file system.
hadoop dfs -put weather.txt hdfs://<target Namenode>
copyFromLocal: Similar to the “put” command, except that the source is restricted to a local file reference.
hadoop dfs -copyFromLocal weather.txt hdfs://<target Namenode>
Demo on Copy Commands
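The copy commands above can also be driven from a script. The sketch below builds the argument lists in Python and only runs them if a hadoop launcher is found on PATH. It uses the generic hadoop fs entry point of the FS shell; the helper names, the host name namenode, port 9000 and the target path are placeholders, not values from the slides.

```python
import shutil
import subprocess


def put_command(local_path, hdfs_uri):
    """Argument list for copying a local file to HDFS with the FS shell."""
    return ["hadoop", "fs", "-put", local_path, hdfs_uri]


def distcp_command(source_uri, target_uri):
    """Argument list for a distributed copy between two clusters."""
    return ["hadoop", "distcp", source_uri, target_uri]


def run(argv):
    """Run the command only if the hadoop launcher is available on PATH."""
    if shutil.which(argv[0]) is None:
        print("hadoop not found; would run:", " ".join(argv))
        return None
    return subprocess.run(argv, check=True)


if __name__ == "__main__":
    # namenode:9000 and the target path are placeholders.
    run(put_command("weather.txt", "hdfs://namenode:9000/data/weather.txt"))
```

Passing the command as a list avoids shell quoting problems with file names that contain spaces.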
Data Loading Using Flume
Flume is a distributed, reliable, and available service for efficiently collecting,
aggregating, and moving large amounts of streaming event data.
[Figure: Twitter Streaming API → Flume (Twitter Source → Memory Channel → HDFS Sink) → HDFS]
Demo will be covered in Module 10
Data Loading Using Sqoop
Apache Sqoop (TM) is a tool designed for efficiently transferring bulk data
between Apache Hadoop and structured data stores such as relational
databases.
 Imports individual tables or entire databases to HDFS.
 Generates Java classes to allow you to interact with your imported data.
 Provides the ability to import from SQL databases straight into your Hive data warehouse.
Demo will be covered in Module 10
Annie’s Question
Your website hosts a group of more than 300 sub-websites.
You want to run analytics on the shopping patterns of
different visitors. What is the best way to collect that
information from the weblogs?
a. SQOOP
b. FLUME
Annie’s Answer
Ans. FLUME.
Annie’s Question
You want to join data collected from two sources. One
source, collected from a big database of call records, is
already available in HDFS. The other source is available
in a database table. The best way to move that data into
HDFS is:
a. SQOOP import
b. PIG script
c. Hive Query
Annie’s Answer
Ans. SQOOP import.
Assignment
 Go through the Edureka VM and explore it
 Check the working condition of the Hadoop ecosystem in the Edureka VM
Follow this document to install Edureka VM
Further Reading
 Hadoop Cluster Setup
http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/ClusterSetup.html
 Hadoop on Amazon AWS ec2
http://www.edureka.in/blog/install-apache-hadoop-cluster/
 Hadoop Hardware Selection
http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/
http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-Win-1.3.0/bk_cluster-planning-guide/content/ch_hardware-recommendations.html
 Hadoop Cluster Configuration
http://www.edureka.in/blog/hadoop-cluster-configuration-files/
 MapReduce Job execution
http://www.edureka.in/blog/anatomy-of-a-mapreduce-job-in-apache-hadoop/
 Add/Remove Nodes in a Cluster
http://www.edureka.in/blog/commissioning-and-decommissioning-nodes-in-a-hadoop-cluster/
 Secondary Namenode
https://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Secondary_NameNode
Pre-work for next Class
 Set up the Hadoop development environment using the documents present in the LMS.
 Refresh your Java skills using Java Essential for Hadoop Tutorial.
 Review the interview questions for setting up a Hadoop cluster.
http://www.edureka.in/blog/hadoop-interview-questions-hadoop-cluster/
Agenda for Next Class
 Use Cases of MapReduce
 Traditional vs MapReduce Way
 Hadoop 2.x MapReduce Components and Architecture
 YARN Execution Flow
 MapReduce Concepts
Survey
Your feedback is important to us, be it a compliment, a suggestion or a complaint. It helps us to make
the course better!
Please spare a few minutes to take the survey after the webinar.
Hadoop Architecture and HDFS

Weitere ähnliche Inhalte

Was ist angesagt?

Presentation of Apache Cassandra
Presentation of Apache Cassandra Presentation of Apache Cassandra
Presentation of Apache Cassandra Nikiforos Botis
 
Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...
Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...
Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...Edureka!
 
Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Jay Patel
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Simplilearn
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...Simplilearn
 
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka Edureka!
 
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...Simplilearn
 
Introduction to HiveQL
Introduction to HiveQLIntroduction to HiveQL
Introduction to HiveQLkristinferrier
 
Introduction to HADOOP.pdf
Introduction to HADOOP.pdfIntroduction to HADOOP.pdf
Introduction to HADOOP.pdf8840VinayShelke
 

Was ist angesagt? (20)

Presentation of Apache Cassandra
Presentation of Apache Cassandra Presentation of Apache Cassandra
Presentation of Apache Cassandra
 
Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
 
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
 
Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
 
SQOOP PPT
SQOOP PPTSQOOP PPT
SQOOP PPT
 
Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...
Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...
Spark Streaming | Twitter Sentiment Analysis Example | Apache Spark Training ...
 
Cassandra Database
Cassandra DatabaseCassandra Database
Cassandra Database
 
Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
 
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
 
Introduction to hadoop
Introduction to hadoopIntroduction to hadoop
Introduction to hadoop
 
Hive
HiveHive
Hive
 
Introduction to HiveQL
Introduction to HiveQLIntroduction to HiveQL
Introduction to HiveQL
 
Introduction to HADOOP.pdf
Introduction to HADOOP.pdfIntroduction to HADOOP.pdf
Introduction to HADOOP.pdf
 
Sqoop
SqoopSqoop
Sqoop
 

Ähnlich wie Hadoop Architecture and HDFS

Introduction to hadoop administration jk
Introduction to hadoop administration   jkIntroduction to hadoop administration   jk
Introduction to hadoop administration jkEdureka!
 
Administer Hadoop Cluster
Administer Hadoop ClusterAdminister Hadoop Cluster
Administer Hadoop ClusterEdureka!
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterEdureka!
 
Learn Hadoop Administration
Learn Hadoop AdministrationLearn Hadoop Administration
Learn Hadoop AdministrationEdureka!
 
Setting High Availability in Hadoop Cluster
Setting High Availability in Hadoop ClusterSetting High Availability in Hadoop Cluster
Setting High Availability in Hadoop ClusterEdureka!
 
Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Rohit Agrawal
 
Hadoop Cluster With High Availability
Hadoop Cluster With High AvailabilityHadoop Cluster With High Availability
Hadoop Cluster With High AvailabilityEdureka!
 
Design and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on RaspberryDesign and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on RaspberryIJRESJOURNAL
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slidesryancox
 
Big data with HDFS and Mapreduce
Big data  with HDFS and MapreduceBig data  with HDFS and Mapreduce
Big data with HDFS and Mapreducesenthil0809
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosEdureka!
 
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...Simplilearn
 
Power Hadoop Cluster with AWS Cloud
Power Hadoop Cluster with AWS CloudPower Hadoop Cluster with AWS Cloud
Power Hadoop Cluster with AWS CloudEdureka!
 
big data hadoop technonolgy for storing and processing data
big data hadoop technonolgy for storing and processing databig data hadoop technonolgy for storing and processing data
big data hadoop technonolgy for storing and processing datapreetik9044
 
Data analysis on hadoop
Data analysis on hadoopData analysis on hadoop
Data analysis on hadoopFrank Y
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFSApache Apex
 
Hadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapaHadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapakapa rohit
 
Hadoop at a glance
Hadoop at a glanceHadoop at a glance
Hadoop at a glanceTan Tran
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapakapa rohit
 

Hadoop Architecture and HDFS

  • 1. Module-2 Hadoop Architecture and HDFS www.edureka.co/big-data-and-hadoop
  • 2. Objectives
      At the end of this module, you will be able to:
      • Analyze Hadoop 2.x Cluster Architecture – Federation
      • Analyze Hadoop 2.x Cluster Architecture – High Availability
      • Run Hadoop in different cluster modes
      • Implement basic Hadoop commands on Terminal
      • Prepare Hadoop 2.x configuration files and analyze the parameters in them
      • Analyze dump of a MapReduce program
      • Implement different data loading techniques
  • 3. Let’s Revise
      • Hadoop Core Components
      • HDFS Architecture
      • What is HDFS?
      • Hadoop Vs. Traditional Systems
      • NameNode and Secondary NameNode
      [Diagram: a cluster with an HDFS layer (one NameNode, four DataNodes) and a YARN layer (one Resource Manager, four Node Managers).]
  • 5. Annie’s Question
      The default replication factor is:
      a. 2
      b. 4
      c. 5
      d. 3
  • 6. Annie’s Answer
      Ans. Option d. It means that if you move a file to HDFS, then by default 3 copies of the file will be stored on different DataNodes.
  • 7. Annie’s Question
      In a multi-node cluster, every slave node runs two daemons: DataNode and NodeManager.
      a. TRUE
      b. FALSE
  • 8. Annie’s Answer
      Ans. TRUE. The DataNode service is for HDFS and the NodeManager is for processing.
  • 9. Annie’s Question
      A block is replicated on 4 nodes: K, L, M, and N. If M, K, and N fail, a client can still read the data.
      a. TRUE
      b. FALSE
  • 10. Annie’s Answer
      Ans. TRUE. The remaining node ‘L’ still contains the block in question.
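The reasoning behind the two replication answers above can be sketched in a few lines of Python. This is only an illustrative model, not HDFS code: the function name `block_is_readable` is an assumption, and it treats a block as readable whenever at least one node holding a replica is still alive.

```python
# Hypothetical model (not HDFS code): a block stays readable while at least
# one node that holds a replica of it is still alive.
def block_is_readable(replica_nodes, failed_nodes):
    """Return True if any node holding a replica has not failed."""
    return bool(set(replica_nodes) - set(failed_nodes))


replicas = ["K", "L", "M", "N"]
print(block_is_readable(replicas, ["M", "K", "N"]))       # True: L still holds the block
print(block_is_readable(replicas, ["K", "L", "M", "N"]))  # False: every replica is lost
```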
  • 11. Hadoop Cluster: A Typical Use Case
      • Active NameNode: RAM: 64 GB; Hard disk: 1 TB; Processor: Xeon with 8 cores; Ethernet: 3 x 10 GB/s; OS: 64-bit CentOS; Power: Redundant Power Supply
      • StandBy NameNode: RAM: 64 GB; Hard disk: 1 TB; Processor: Xeon with 8 cores; Ethernet: 3 x 10 GB/s; OS: 64-bit CentOS; Power: Redundant Power Supply
      • Secondary NameNode (optional): RAM: 32 GB; Hard disk: 1 TB; Processor: Xeon with 4 cores; Ethernet: 3 x 10 GB/s; OS: 64-bit CentOS; Power: Redundant Power Supply
      • DataNode (each): RAM: 16 GB; Hard disk: 6 x 2 TB; Processor: Xeon with 2 cores; Ethernet: 3 x 10 GB/s; OS: 64-bit CentOS
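As a rough worked example of what these DataNode specifications imply, the sketch below estimates raw and usable HDFS capacity. The node count of six is an assumption (taken from the number of DataNode labels on the slide), the replication factor of 3 is the default mentioned earlier, and the helper name `usable_capacity_tb` is hypothetical. It ignores space reserved for the operating system and intermediate data.

```python
def usable_capacity_tb(datanodes, disks_per_node, disk_tb, replication=3):
    """Return (raw, usable) capacity in TB for a simple, uniform cluster."""
    raw_tb = datanodes * disks_per_node * disk_tb
    # Every block is stored `replication` times, so usable space shrinks accordingly.
    return raw_tb, raw_tb / replication


# Assumed cluster: 6 DataNodes, each with 6 x 2 TB disks.
raw, usable = usable_capacity_tb(datanodes=6, disks_per_node=6, disk_tb=2)
print(raw, usable)  # 72 24.0
```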
  • 14. Hadoop 2.x Cluster Architecture - Federation
      [Diagram comparing Hadoop 1.0 and Hadoop 2.0: a Namespace layer (NameNode, NS) sits above a Block Storage layer (Block Management, and Storage on the DataNodes).]
      Reference: http://hadoop.apache.org/docs/stable2/hadoop-project-dist/hadoop-hdfs/Federation.html
  • 15. Annie’s Question
      How does HDFS Federation help HDFS scale horizontally?
      a. It reduces the load on any single NameNode by using multiple, independent NameNodes to manage individual parts of the file system namespace.
      b. It provides cross-data centre (non-local) support for HDFS, allowing a cluster administrator to split the Block Storage outside the local cluster.
  • 16. Annie’s Answer
      Ans. Option (a). In order to scale the name service horizontally, HDFS Federation uses multiple independent NameNodes. The NameNodes are federated, that is, they are independent and don’t require coordination with each other.
  • 17. Annie’s Question
      You have configured two NameNodes to manage /marketing and /finance respectively. What will happen if you try to put a file in the /accounting directory?
  • 18. Annie’s Answer
      Ans. The put will fail. Neither namespace manages the path, and you will get an IOException with a "No such file or directory" error.
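The /marketing, /finance and /accounting scenario can be illustrated with a small Python sketch. It is loosely modeled on a client-side mount table and is not the HDFS implementation: the NameNode addresses, the `MOUNT_TABLE` dictionary and the `resolve_namenode` helper are all hypothetical names chosen for illustration.

```python
import posixpath

# Hypothetical mount table: namespace prefix -> NameNode address.
# The host names and port are illustrative assumptions.
MOUNT_TABLE = {
    "/marketing": "hdfs://nn-marketing:8020",
    "/finance": "hdfs://nn-finance:8020",
}


def resolve_namenode(path, mount_table=MOUNT_TABLE):
    """Return the NameNode responsible for `path`, or raise if none covers it."""
    normalized = posixpath.normpath(path)
    for prefix, namenode in mount_table.items():
        if normalized == prefix or normalized.startswith(prefix + "/"):
            return namenode
    # FileNotFoundError is a subclass of OSError (Python's IOError).
    raise FileNotFoundError(f"No such file or directory: {path}")


print(resolve_namenode("/finance/q1/report.csv"))  # hdfs://nn-finance:8020
try:
    resolve_namenode("/accounting/ledger.csv")
except FileNotFoundError as err:
    print("put failed:", err)
```

Each prefix is owned by exactly one NameNode and the lookup needs no coordination between them, which mirrors the independence described in the federation answer above.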
  • 19. Hadoop 2.x – High Availability
      [Diagram: HDFS High Availability. A client talks to an Active NameNode, backed by a Standby NameNode. All namespace edits are logged to shared NFS storage (Shared Edit Logs) with a single writer, enforced by fencing. The Standby NameNode reads the edit logs and applies them to its own namespace. Below them are the DataNodes and the Node Managers, each hosting Containers and App Masters.]
      Note: with NameNode High Availability it is not necessary to configure a Secondary NameNode.
      Reference: http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithNFS.html
  • 20. Hadoop 2.x – Resource Management
      [Diagram: the HDFS High Availability layout (Active and Standby NameNodes, Shared Edit Logs on NFS storage with a single writer enforced by fencing, DataNodes; Secondary NameNode not necessary) combined with YARN, the Next Generation MapReduce: a Resource Manager above the Node Managers, each hosting Containers and App Masters.]
      Reference: http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithNFS.html
  • 21. Hadoop 2.x – Resource Management (Contd.)
      YARN – Yet Another Resource Negotiator
      [Diagram: Masters: the Resource Manager, made up of the Scheduler and the Applications Manager (AsM), receives requests from the Client. Slaves: each node runs a Node Manager and a DataNode and hosts Containers and App Masters.]
  • 22. Annie’s Question
      HDFS HA was developed to overcome which of the following disadvantages of Hadoop 1.0?
      a. Single Point of Failure of NameNode
      b. Only one version can be run in classic MapReduce
      c. Too much burden on Job Tracker
  • 23. Annie’s Answer
      Ans. Option a: Single Point of Failure of NameNode
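The High Availability mechanism described on the slides above (edits logged to shared storage, a single fenced writer, a standby that replays the log) can be sketched as a toy model in Python. This is a simplified illustration under stated assumptions, not how HDFS is implemented: the `SharedEditLog` and `NameNode` classes, the `mkdir`-only edit type, and the node names `nn1` and `nn2` are all hypothetical.

```python
class SharedEditLog:
    """Stand-in for the shared edit log; only the current writer may append."""

    def __init__(self):
        self.edits = []
        self.writer = None

    def fence(self, new_writer):
        """Make `new_writer` the single allowed writer (everyone else is fenced off)."""
        self.writer = new_writer

    def append(self, writer, edit):
        if writer != self.writer:
            raise PermissionError(f"{writer} is fenced off from the edit log")
        self.edits.append(edit)


class NameNode:
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.namespace = set()
        self.applied = 0  # number of log entries already applied locally

    def mkdir(self, path):
        """Active-side operation: log the edit first, then apply it locally."""
        self.log.append(self.name, ("mkdir", path))
        self.catch_up()

    def catch_up(self):
        """Standby-side operation: replay any edits not yet applied."""
        for op, path in self.log.edits[self.applied:]:
            if op == "mkdir":
                self.namespace.add(path)
        self.applied = len(self.log.edits)


log = SharedEditLog()
active, standby = NameNode("nn1", log), NameNode("nn2", log)
log.fence("nn1")            # nn1 is the single writer
active.mkdir("/data")
standby.catch_up()          # standby replays the shared edits
print(standby.namespace)    # {'/data'}

log.fence("nn2")            # failover: nn2 takes over, nn1 is fenced
standby.mkdir("/logs")
print(sorted(standby.namespace))  # ['/data', '/logs']
```

Because the standby already holds an up-to-date namespace, it can take over without rebuilding state, which is what removes the single point of failure named in the answer above.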
  • 24. Hadoop Cluster: Facebook
      • We use Hadoop to store copies of internal log and dimension data sources and use it as a source for reporting/analytics and machine learning.
      • Currently we have 2 major clusters:
          » A 1100-machine cluster with 8800 cores and about 12 PB raw storage.
          » A 300-machine cluster with 2400 cores and about 3 PB raw storage.
          » Each (commodity) node has 8 cores and 12 TB of storage.
          » We are heavy users of both streaming as well as the Java APIs. We have built a higher level data warehousing framework using these features called Hive (see http://Hadoop.apache.org/hive/). We have also developed a FUSE implementation over HDFS.
  • 25. Hadoop Cluster Modes
      Hadoop can run in any of the following three modes:
      • Standalone (or Local) Mode
          » No daemons, everything runs in a single JVM.
          » Suitable for running MapReduce programs during development.
          » Has no DFS.
      • Pseudo-Distributed Mode
          » Hadoop daemons run on the local machine.
      • Fully-Distributed Mode
          » Hadoop daemons run on a cluster of machines.
  • 28. Hadoop FS Shell Commands
      • HDFS organizes its data in files and directories.
      • It provides a command line interface called the FS shell that lets the user interact with data in HDFS.
      • The syntax of the commands is similar to bash.
  • 29. Terminal Commands
      [Screenshots: listing of files present in the bin directory; listing of files present on HDFS.]
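To give a feel for the bash-like syntax of the FS shell, the sketch below assembles a few common `hdfs dfs` invocations as argument lists and prints them without running anything. The paths and the file name `sample.txt` are hypothetical, and the helper `hdfs_dfs` is an illustrative wrapper, not part of Hadoop.

```python
import shlex


def hdfs_dfs(*args):
    """Build the argv list for an `hdfs dfs` invocation without running it."""
    return ["hdfs", "dfs", *args]


# Hypothetical paths; each command mirrors its bash namesake (mkdir, ls, cat).
commands = [
    hdfs_dfs("-mkdir", "-p", "/user/edureka/input"),
    hdfs_dfs("-put", "sample.txt", "/user/edureka/input"),
    hdfs_dfs("-ls", "/user/edureka/input"),
    hdfs_dfs("-cat", "/user/edureka/input/sample.txt"),
]
for argv in commands:
    print(shlex.join(argv))
```

On a machine with Hadoop installed and its daemons running, each list could be passed to `subprocess.run(argv, check=True)` to execute the command.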
  • 30. Hadoop 2.x Configuration Files
      Configuration Filename | Description
      hadoop-env.sh | Environment variables that are used in the scripts to run Hadoop.
      core-site.xml | Configuration settings for Hadoop Core, such as I/O settings that are common to HDFS and MapReduce.
      hdfs-site.xml | Configuration settings for HDFS daemons: the NameNode, the Secondary NameNode and the DataNodes.
      mapred-site.xml | Configuration settings for MapReduce applications.
      yarn-site.xml | Configuration settings for ResourceManager and NodeManager.
      masters | A list of machines (one per line) that each run a Secondary NameNode.
      slaves | A list of machines (one per line) that each run a DataNode and a NodeManager.
  • 31. Hadoop 2.x Configuration Files – Apache Hadoop
  • 33. core-site.xml
      <?xml version="1.0" encoding="UTF-8"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <!-- core-site.xml -->
      <configuration>
        <property>
          <name>fs.defaultFS</name>
          <value>hdfs://localhost:9000</value>
        </property>
      </configuration>
      fs.defaultFS: The name of the default file system. The URL's authority is used to determine the host, port, etc. for a filesystem.
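The statement that the URL's authority determines the host and port can be checked with Python's standard URL parser. This is a small sketch of the idea only; Hadoop does its own parsing in Java, and the helper name `describe_default_fs` is an assumption.

```python
from urllib.parse import urlparse


def describe_default_fs(value):
    """Split an fs.defaultFS value into scheme, host and port."""
    parsed = urlparse(value)
    return {"scheme": parsed.scheme, "host": parsed.hostname, "port": parsed.port}


print(describe_default_fs("hdfs://localhost:9000"))
# {'scheme': 'hdfs', 'host': 'localhost', 'port': 9000}
```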
  • 34. hdfs-site.xml
      <?xml version="1.0" encoding="UTF-8"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <!-- hdfs-site.xml -->
      <configuration>
        <property>
          <name>dfs.replication</name>
          <value>1</value>
        </property>
        <property>
          <name>dfs.permissions</name>
          <value>false</value>
        </property>
        <property>
          <name>dfs.namenode.name.dir</name>
          <value>/home/edureka/hadoop-2.2.0/hadoop2_data/hdfs/namenode</value>
        </property>
        <property>
          <name>dfs.datanode.data.dir</name>
          <value>/home/edureka/hadoop-2.2.0/hadoop2_data/hdfs/datanode</value>
        </property>
      </configuration>
      dfs.replication: Default block replication.
      dfs.permissions: If "true", enable permission checking in HDFS. If "false", permission checking is turned off.
      dfs.namenode.name.dir: Determines where on the local filesystem the DFS name node should store the name table (fsimage).
      dfs.datanode.data.dir: Determines where on the local filesystem a DFS data node should store its blocks.
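All of the *-site.xml files share the same `<configuration>` / `<property>` / `<name>` / `<value>` layout, so their parameters can be inspected with a few lines of Python. The sketch below uses an abridged copy of the hdfs-site.xml shown above; the helper name `load_properties` is hypothetical, and this is a reading aid rather than the way Hadoop itself loads configuration.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the hdfs-site.xml properties, inlined so the example is self-contained.
HDFS_SITE = """<configuration>
  <property><name>dfs.replication</name><value>1</value></property>
  <property><name>dfs.permissions</name><value>false</value></property>
</configuration>"""


def load_properties(xml_text):
    """Return a {name: value} dict for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


props = load_properties(HDFS_SITE)
print(props["dfs.replication"])  # 1
print(props["dfs.permissions"])  # false
```

Note that every value comes back as a string, so a numeric setting such as `dfs.replication` needs an explicit `int(...)` before it is used in arithmetic.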
  • 35. mapred-site.xml
      <?xml version="1.0" encoding="UTF-8"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <!-- mapred-site.xml -->
      <configuration>
        <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
        </property>
      </configuration>
      mapreduce.framework.name: The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.
  • 36. yarn-site.xml
      <?xml version="1.0" encoding="UTF-8"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <!-- yarn-site.xml -->
      <configuration>
        <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
        </property>
        <property>
          <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
          <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
      </configuration>
      yarn.nodemanager.aux-services: The auxiliary service name.
      yarn.nodemanager.aux-services.mapreduce_shuffle.class: The auxiliary service class to use.
  • 37. All Properties
      1. http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/core-default.xml
      2. http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
      3. http://hadoop.apache.org/docs/r2.2.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
      4. http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
  • 38. Slaves and Masters
      Two files are used by the startup and shutdown commands:
      • Slaves: contains a list of hosts, one per line, that are to host DataNode and NodeManager servers.
      • Masters: contains a list of hosts, one per line, that are to host Secondary NameNode servers.
  • 39. Per-Process Runtime Environment
      • hadoop-env.sh is sourced by all of the Hadoop Core scripts; it lives in the hadoop directory of the Hadoop installation (hadoop-2.2.0/etc/hadoop).
      • This file also offers a way to provide custom parameters for each of the servers.
      • Examples of environment variables that you can specify:
          export HADOOP_HEAPSIZE="512"
          export HADOOP_DATANODE_HEAPSIZE="128"
      [Diagram: hadoop-env.sh sets the JAVA_HOME parameter for the JVM.]
  • 41. Hadoop Daemons
      - NameNode daemon
          » Runs on the master node of the Hadoop Distributed File System (HDFS)
          » Directs DataNodes to perform their low-level I/O tasks
      - DataNode daemon
          » Runs on each slave machine in HDFS
          » Does the low-level I/O work
      - ResourceManager
          » Runs on the master node of the data processing system (MapReduce)
          » Global resource scheduler
      - NodeManager
          » Runs on each slave node of the data processing system
          » Platform for the data processing tasks
      - JobHistoryServer
          » Responsible for servicing all job-history-related requests from clients
  • 42. Hadoop Web UI Parts
      Service | Servers | Default Used Ports | Protocol | Description
      NameNode WebUI | Master Nodes (NameNode and any back-up NameNodes) | 50070 | http | Web UI to look at the current status of HDFS and explore the file system
      DataNode | All Slave Nodes | 50075 | http | DataNode Web UI to access the status, logs etc.
      ResourceManager Web UI | Cluster-level resource manager | 8088 | http | Web UI for the ResourceManager and for application submissions
      NodeManager | Monitors resources on the DataNode | 8042 | TCP | Node information, list of applications and list of containers
      MapReduce JobHistory Server | Gets status on finished applications | 19888 | TCP | Provides logs of important events in MapReduce job execution and associated profiling metrics
  • 43. Web UI URLs
      - NameNode status: http://localhost:50070/dfshealth.jsp
      - ResourceManager status: http://localhost:8088/cluster
      - MapReduce JobHistory Server status: http://localhost:19888/jobhistory
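The three status URLs follow one pattern: host, default port, and a fixed path. As a rough sketch, assuming the default ports and paths listed on the slides, the URLs can be built for any host as below. The table WEB_UI and the function web_ui_url are illustrative names, not part of Hadoop.

```python
from urllib.parse import urlunsplit

# Default port and status path per service, taken from the slides.
WEB_UI = {
    "NameNode": (50070, "/dfshealth.jsp"),
    "ResourceManager": (8088, "/cluster"),
    "JobHistoryServer": (19888, "/jobhistory"),
}


def web_ui_url(service, host="localhost"):
    """Build the status URL for a service on the given host (hypothetical helper)."""
    port, path = WEB_UI[service]
    return urlunsplit(("http", "%s:%d" % (host, port), path, "", ""))


for name in WEB_UI:
    print(name, web_ui_url(name))
```

On a multi-node cluster the host argument would be the machine running the corresponding daemon; a cluster whose ports were changed in the site files would need different numbers in the table.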
  • 44. Annie’s Question: Which of the following files is used to specify the NameNode's heap size? a. bashrc b. hadoop-env.sh c. hdfs-site.sh d. core-site.xml
  • 45. Annie’s Answer: Ans. hadoop-env.sh. This file specifies environment variables that affect the JDK used by the Hadoop daemons (bin/hadoop).
  • 46. Annie’s Question: It is necessary to define all the properties in core-site.xml, hdfs-site.xml, yarn-site.xml and mapred-site.xml. a. TRUE b. FALSE
  • 47. Annie’s Answer: Ans. False. A detailed answer will be given after the next question.
  • 48. Annie’s Question: Does standalone mode use the default configuration? a. TRUE b. FALSE
  • 49. Annie’s Answer: Ans. True. In standalone mode Hadoop runs with the default configuration (empty configuration files, i.e. no configuration settings in core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml). If a property is not defined in the configuration files, Hadoop runs with the default value for that property.
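The fallback behaviour can be pictured as site settings layered over built-in defaults. The sketch below is not Hadoop's own loader; it is a small Python illustration that uses two defaults from core-default.xml and hdfs-default.xml (fs.defaultFS and dfs.replication). The function name effective_config and the override values are assumptions chosen for the example.

```python
# A small subset of the shipped defaults (core-default.xml, hdfs-default.xml).
DEFAULTS = {
    "fs.defaultFS": "file:///",
    "dfs.replication": "3",
}


def effective_config(site_overrides, defaults=DEFAULTS):
    """Return defaults with any site-file properties applied on top (hypothetical helper)."""
    merged = dict(defaults)
    merged.update(site_overrides)
    return merged


# Standalone mode: the site files are empty, so every value is a default.
standalone = effective_config({})

# Illustrative single-node setup: only two properties are overridden.
single_node = effective_config({
    "fs.defaultFS": "hdfs://localhost:9000",  # assumed example value
    "dfs.replication": "1",
})

print(standalone)
print(single_node)
```

This is also why the answer to the previous question is False: only the properties that differ from the defaults need to appear in the site files.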
  • 51. Running the Teragen Example
  • 54. Annie’s Question: The output of an MR job will be stored on HDFS. a. TRUE b. FALSE
  • 55. Annie’s Answer: Ans. True. It is stored in separate part files, e.g. part-m-00000, part-m-00001 and so on (part-m for map output, part-r for reducer output).
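The part-file naming is just a task-type letter plus a zero-padded, five-digit task index. A minimal sketch of that convention, with part_file_names as a hypothetical helper name:

```python
def part_file_names(num_tasks, task_type="m"):
    """List the output file names for num_tasks tasks.

    task_type is "m" for map output or "r" for reducer output.
    """
    if task_type not in ("m", "r"):
        raise ValueError("task_type must be 'm' or 'r'")
    return ["part-%s-%05d" % (task_type, index) for index in range(num_tasks)]


print(part_file_names(2))        # map-only job with two map tasks
print(part_file_names(1, "r"))   # job with a single reducer
```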
  • 56. Annie’s Question: To run an MR job, the data should be present on HDFS. a. TRUE b. FALSE
  • 57. Annie’s Answer: Ans. True. To process data in parallel, the data must be present on HDFS so that MR can work on chunks of it at the same time.
  • 58. Data Loading Techniques and Data Analysis
      - Data loading into HDFS: using Flume, using Sqoop, using Hadoop copy commands
      - Data analysis on HDFS: using Pig, using Hive
  • 59. Hadoop Copy Commands
      - distcp: Distributed copy, used to move data between clusters for backup and recovery.
          hadoop distcp hdfs://<source NN> hdfs://<target NN>
      - put: Copies file(s) from the local file system to the destination file system. It can also read from stdin and write to the destination file system.
          hadoop dfs -put weather.txt hdfs://<target Namenode>
      - copyFromLocal: Similar to the put command, except that the source is restricted to a local file reference.
          hadoop dfs -copyFromLocal weather.txt hdfs://<target Namenode>
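When these copy commands are driven from a script, it helps to build the argument list separately from running it. The following Python sketch does that. It uses the hadoop fs form of the shell (in Hadoop 2.x, hadoop dfs still works but is reported as deprecated), and the host name namenode.example.com, the port 8020 and the target path are placeholders for illustration, not values from the course.

```python
import shutil
import subprocess


def put_command(local_path, hdfs_uri):
    """Argument list that copies a local file into HDFS with -put."""
    return ["hadoop", "fs", "-put", local_path, hdfs_uri]


def copy_from_local_command(local_path, hdfs_uri):
    """Same copy using -copyFromLocal (source must be a local file)."""
    return ["hadoop", "fs", "-copyFromLocal", local_path, hdfs_uri]


def distcp_command(source_uri, target_uri):
    """Argument list for a distributed copy between two clusters."""
    return ["hadoop", "distcp", source_uri, target_uri]


def run(command):
    """Run the command if a hadoop executable is on PATH; otherwise only report it."""
    if shutil.which(command[0]) is None:
        print("hadoop not found; would run:", " ".join(command))
        return None
    return subprocess.run(command, check=True)


# Placeholder target; replace with the real NameNode address and path.
target = "hdfs://namenode.example.com:8020/user/annie/"
print(" ".join(put_command("weather.txt", target)))
```

Calling run(put_command("weather.txt", target)) on a machine with Hadoop installed would perform the copy; on any other machine it only prints the command it would have executed.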
  • 60. Demo on Copy Commands
  • 61. Data Loading Using Flume
      Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming event data.
      Example flow (from the diagram): Twitter Streaming API, then Twitter source, memory channel and HDFS sink inside Flume, then HDFS.
      Demo will be covered in Module 10.
  • 62. Data Loading Using Sqoop
      Apache Sqoop (TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured data stores such as relational databases.
      - Imports individual tables or entire databases to HDFS.
      - Generates Java classes to allow you to interact with your imported data.
      - Provides the ability to import from SQL databases straight into your Hive data warehouse.
      Demo will be covered in Module 10.
  • 63. Annie’s Question: Your website hosts a group of more than 300 sub-websites. You want to run analytics on the shopping patterns of different visitors. What is the best way to collect that information from the weblogs? a. SQOOP b. FLUME
  • 65. Annie’s Question: You want to join data collected from two sources. One source, collected from a big database of call records, is already available in HDFS. The other source is available in a database table. The best way to move that data into HDFS is: a. SQOOP import b. PIG script c. Hive Query
  • 66. Annie’s Answer: Ans. SQOOP import.
  • 67. Assignment
      - Go through the Edureka VM and explore it.
      - Check the working condition of the Hadoop ecosystem in the Edureka VM.
      Follow this document to install the Edureka VM.
  • 68. Further Reading
      - Hadoop Cluster Setup
          http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/ClusterSetup.html
      - Hadoop on Amazon AWS EC2
          http://www.edureka.in/blog/install-apache-hadoop-cluster/
      - Hadoop Hardware Selection
          http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/
          http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-Win-1.3.0/bk_cluster-planning-guide/content/ch_hardware-recommendations.html
      - Hadoop Cluster Configuration
          http://www.edureka.in/blog/hadoop-cluster-configuration-files/
  • 69. Further Reading
      - MapReduce Job Execution
          http://www.edureka.in/blog/anatomy-of-a-mapreduce-job-in-apache-hadoop/
      - Add/Remove Nodes in a Cluster
          http://www.edureka.in/blog/commissioning-and-decommissioning-nodes-in-a-hadoop-cluster/
      - Secondary NameNode
          https://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Secondary_NameNode
  • 70. Pre-work for Next Class
      - Set up the Hadoop development environment using the documents present in the LMS.
      - Refresh your Java skills using the Java Essentials for Hadoop tutorial.
      - Review the interview questions on setting up a Hadoop cluster: http://www.edureka.in/blog/hadoop-interview-questions-hadoop-cluster/
  • 71. Agenda for Next Class
      - Use Cases of MapReduce
      - Traditional vs MapReduce Way
      - Hadoop 2.x MapReduce Components and Architecture
      - YARN Execution Flow
      - MapReduce Concepts
  • 72. Survey: Your feedback is important to us, be it a compliment, a suggestion or a complaint. It helps us make the course better. Please spare a few minutes to take the survey after the webinar.