Big data - Online Training

The following topics will be covered in our BIG DATA Online Training:
Copyright @ 2015 Learntek. All Rights Reserved.
What is Hadoop?
Hadoop is a free, Java-based programming framework that supports the
processing of large data sets in a distributed computing environment. It is an
Apache project sponsored by the Apache Software Foundation. Hadoop makes it
possible to run applications on systems with thousands of nodes and thousands
of terabytes of storage capacity. Its distributed file system facilitates rapid
data transfer among nodes and allows the system to keep operating
uninterrupted when a node fails. This approach lowers the risk of catastrophic
system failure, even if a significant number of nodes become inoperative.
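The fault tolerance described above can be illustrated with a small sketch. It is not Hadoop code: the block size, the replication factor of 3, the node names, and the round-robin placement are all assumptions chosen to keep the example short, and the real HDFS placement policy is more involved. The sketch splits a file into blocks, stores each block on several nodes, and shows that the file can still be read after some nodes fail.

```python
BLOCK_SIZE = 4      # bytes per block; tiny on purpose (assumed value)
REPLICATION = 3     # copies kept of each block (assumed value)


def place_blocks(data, nodes):
    """Split data into blocks and assign each block to REPLICATION distinct nodes."""
    placement = {}
    for block_id, offset in enumerate(range(0, len(data), BLOCK_SIZE)):
        block = data[offset:offset + BLOCK_SIZE]
        # Round-robin placement: a simplification, not HDFS's actual policy.
        holders = [nodes[(block_id + i) % len(nodes)] for i in range(REPLICATION)]
        placement[block_id] = (block, holders)
    return placement


def read_file(placement, live_nodes):
    """Reassemble the file using any surviving replica of each block."""
    out = bytearray()
    for block_id in sorted(placement):
        block, holders = placement[block_id]
        if not any(h in live_nodes for h in holders):
            raise IOError(f"block {block_id} has no live replica")
        out += block
    return bytes(out)


nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(b"hello distributed world", nodes)

# Two of the five nodes fail; every block still has at least one live replica.
survivors = set(nodes) - {"node2", "node4"}
recovered = read_file(placement, survivors)
```

With three replicas spread over five nodes, losing any two nodes still leaves one copy of every block, which is why the read succeeds here. Losing all three holders of a block would raise an error instead.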
Why Hadoop?
• Large Volumes of Data: Ability to store and process huge amounts of data of any kind
(structured, unstructured and semi-structured), quickly. With data volumes and varieties
constantly increasing, especially from social media and the Internet of Things (IoT), that’s a
key consideration.
• Computing Power: Hadoop’s distributed computing model processes big data fast. The more
computing nodes you use, the more processing power you have.
• Fault Tolerance: Data and application processing are protected against hardware failure. If a
node goes down, jobs are automatically redirected to other nodes to make sure the
distributed computation does not fail. Multiple copies of all data are stored automatically.
• Flexibility: Unlike with a traditional relational database, you don’t have to preprocess data
before storing it. You can store as much data as you want and decide how to use it later. That
includes unstructured data such as text, images and videos.
• Low Cost: The open-source framework is free and uses commodity hardware to store large
quantities of data.
• Scalability: You can easily grow your system to handle more data simply by adding nodes.
Little administration is required.
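The fault-tolerance point above, that jobs are redirected to other nodes when a node goes down, can be sketched in a few lines. This is a toy model and not how Hadoop schedules work internally: the function name `run_job`, the task and node names, and the round-robin assignment are all hypothetical.

```python
def run_job(tasks, nodes, failed):
    """Assign tasks round-robin and redirect tasks from failed nodes to live ones."""
    live = [n for n in nodes if n not in failed]
    if not live:
        raise RuntimeError("no live nodes left to run the job")
    assignment = {}
    for i, task in enumerate(tasks):
        node = nodes[i % len(nodes)]
        if node in failed:
            # Redirect the task to a surviving node instead of failing the job.
            node = live[i % len(live)]
        assignment[task] = node
    return assignment


tasks = [f"split-{i}" for i in range(6)]
nodes = ["node1", "node2", "node3"]
assignment = run_job(tasks, nodes, failed={"node2"})
```

Every task still ends up on a live node, so the job as a whole completes even though one node is down. Adding more entries to `nodes` spreads the same tasks over more machines, which is the scalability point in miniature.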
Hadoop Introduction
• Introduction to Data and Systems
• Types of Data
• Traditional ways of dealing with large data and their problems
• Types of Systems & Scaling
• What is Big Data
• Challenges in Big Data
• Challenges in Traditional Applications
• New Requirements
• What is Hadoop? Why Hadoop?
• Brief history of Hadoop
• Features of Hadoop
• Hadoop and RDBMS
• Overview of the Hadoop Ecosystem
Hadoop Installation
• Installation in detail
• Creating Ubuntu image in VMware
• Downloading Hadoop
• Installing SSH
• Configuring Hadoop, HDFS & MapReduce
• Download, Installation & Configuration of Hive
• Download, Installation & Configuration of Pig
• Download, Installation & Configuration of Sqoop
• Configuring Hadoop in Different Modes
Hadoop Distributed File System (HDFS)
• File System – Concepts
• Blocks
• Replication Factor
• Version File
• Safe mode
• Namespace IDs
• Purpose of Name Node
• Purpose of Data Node
• Purpose of Secondary Name Node
• Purpose of Job Tracker
• Purpose of Task Tracker
• HDFS Shell Commands – copy, delete, create directories etc.
• Reading and Writing in HDFS
• Differences between Unix commands and HDFS commands
• Hadoop Admin Commands
• Hands-on exercise with Unix and HDFS commands
• Read / Write in HDFS – Internal Process between Client, NameNode & DataNodes
• Accessing HDFS using Java API
• Various Ways of Accessing HDFS
• Understanding HDFS Java classes and methods
• Admin: Commissioning / Decommissioning DataNodes
• Balancer
• Replication Policy
• Network Distance / Topology Script
MapReduce Programming
• About MapReduce
• Understanding blocks and input splits
• MapReduce Data types
• Understanding Writable
• Data Flow in a MapReduce Application
• Understanding MapReduce problems on datasets
• MapReduce and Functional Programming
• Writing a MapReduce Application
• Understanding the Mapper function
• Understanding the Reducer function
• Understanding the Driver
• Usage of Combiner
• Understanding Partitioner
• Usage of Distributed Cache
• Passing parameters to the mapper and reducer
• Analysing the Results
• Log files
• Input Formats and Output Formats
• Counters, Skipping Bad and Unwanted Records
• Writing Joins in MapReduce with 2 Input files. Join Types.
• Executing a MapReduce Job – Insights
• Exercises on MapReduce
• Job Scheduling: Types of Schedulers
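To make the roles named in this module concrete, here is a minimal word-count sketch in plain Python. It only imitates the MapReduce data flow in a single process and does not use the Hadoop API. The function names, the input splits, and the toy partitioning hash are assumptions made for illustration; Hadoop's own partitioner works differently.

```python
from collections import defaultdict


def mapper(line):
    """Emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Pre-aggregate one split's mapper output locally to reduce shuffle traffic."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def partitioner(key, num_reducers):
    """Choose a reducer for a key using a stable toy hash (assumed, not Hadoop's)."""
    return sum(ord(c) for c in key) % num_reducers


def reducer(key, values):
    """Sum all partial counts for one word."""
    return key, sum(values)


def run_word_count(splits, num_reducers=2):
    """Driver: map and combine each split, shuffle by key, then reduce."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        mapped = [pair for line in split for pair in mapper(line)]
        for word, count in combiner(mapped):
            buckets[partitioner(word, num_reducers)][word].append(count)
    result = {}
    for bucket in buckets:
        for word in sorted(bucket):
            key, total = reducer(word, bucket[word])
            result[key] = total
    return result


# Each inner list stands in for one input split handled by one mapper.
splits = [["big data big"], ["data hadoop", "big hadoop"]]
counts = run_word_count(splits)
```

The combiner runs once per split, so the first split sends a single ("big", 2) pair to the shuffle instead of two separate pairs. The partitioner guarantees that all partial counts for a given word reach the same reducer.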
Hive
• Hive concepts
• Schema on Read vs Schema on Write
• Hive architecture
• Install and configure Hive on a cluster
• Metastore – Purpose & Types of Configurations
• Different types of tables in Hive
• Buckets
• Partitions
• Joins in Hive
• Hive Query Language
• Hive Data Types
• Data Loading into Hive Tables
• Hive Query Execution
• Hive library functions
• Hive UDF
• Hive Limitations
Pig
• Pig basics
• Install and configure Pig on a cluster
• Pig library functions
• Pig vs Hive
• Write sample Pig Latin scripts
• Modes of running Pig
• Running in Grunt shell
• Running as Java program
• Pig UDFs
HBase
• HBase concepts
• HBase architecture
• Region server architecture
• File storage architecture
• HBase basics
• Column access
• Scans
• HBase use cases
• Install and configure HBase on a multi-node cluster
• Create a database; develop and run sample applications
• Access data stored in HBase using Java API
Sqoop
• Install and configure Sqoop on a cluster
• Connecting to RDBMS
• Installing MySQL
• Import data from MySQL to Hive
• Export data to MySQL
• Internal mechanism of import/export
Oozie
• Introduction to Oozie
• Oozie architecture
• XML file specifications
• Specifying workflows
• Control nodes
• Oozie job coordinator
Flume
• Introduction to Flume
• Configuration and Setup
• Flume Sink with example
• Channel
• Flume Source with example
• Complex flume architecture
ZooKeeper
• Introduction to ZooKeeper
• Challenges in distributed Applications
• Coordination
• ZooKeeper: Design Goals
• Data Model and Hierarchical namespace
• Client APIs
YARN
• Hadoop 1.0 Limitations
• MapReduce Limitations
• History of Hadoop 2.0
• HDFS 2: Architecture
• HDFS 2: Quorum based storage
• HDFS 2: High availability
• HDFS 2: Federation
• YARN Architecture
• Classic vs YARN
• YARN Apps
• YARN multitenancy
• YARN Capacity Scheduler
Prerequisites:
• Knowledge of any programming language, of databases and of the
Linux operating system. Knowledge of Core Java or Python is helpful.