Introduction to Hadoop Distributed Programming

The document provides an overview of the Hadoop ecosystem, including introductory information on Hadoop and MapReduce, installing and using Hadoop, programming with Pig and Hive, using NoSQL databases like MongoDB, machine learning with Mahout, and moving data in and out of Hadoop systems. It also covers managing Hadoop clusters, running Hadoop on AWS, data structures and algorithms for Hadoop, and testing and debugging Hadoop applications.

• Introduction to Distributed Programming
  › Background of Hadoop
  › What Is Hadoop?
  › How Does Hadoop Work?
• Installing Hadoop
  › Setting up SSH
  › Setting up Environment Variables
  › Running Hadoop
  › Web-Based Cluster

• Components of Hadoop
  › Working with the Hadoop File System (HDFS)
  › Understanding Hadoop MapReduce
  › Reading and Writing
• Writing a Basic MapReduce Program
  › Getting the Patent Data Set
  › Constructing a Basic MapReduce Program
  › Working with Hadoop Streaming
  › Improving Performance with Combiners

• Advanced MapReduce
  › Summarization Patterns
  › Filtering Patterns
  › Data Organization Patterns
  › Join Patterns
  › Meta Patterns
  › Input and Output Patterns
• Programming Practices
  › Developing MapReduce Programs
  › Monitoring and Debugging on a Cluster
  › Tuning for Performance

• Hadoop Cookbook
  › Passing Job-Specific Parameters to Your Tasks
  › Probing for Task-Specific Parameters
  › Partitioning into Multiple Output Files
  › Input from and Output to a Database
  › Keeping Output in Sorted Order
• Managing Hadoop
  › Checking System Health
  › Setting Permissions
  › Managing Quotas
  › Enabling Trash
  › Adding and Deleting Nodes
  › Recovering from a Failed NameNode

• Running Hadoop in the Cloud
  › Introducing Amazon Web Services
  › Setting up AWS and a Cloud on EC2
  › Running MapReduce Programs on EC2
  › Cleaning up and Shutting Down Your EC2 Instances
  › Amazon Elastic MapReduce and Other AWS Services

• Programming with Pig
  › Thinking Like a Pig
  › Installing Pig
  › Running Pig
  › Learning Pig Latin through Grunt
  › Pig Latin Syntax
  › Working with UDFs
  › Working with Scripts

• Getting Started on Hive
• Data Types and File Formats
• HiveQL: Data Definition
• HiveQL: Data Manipulation
• HiveQL: Queries, Views, and Indexes
• Schema Design, Tuning, and Record Formats
• Hive Integration with Oozie
• Hive and Amazon Web Services

• NoSQL Databases
  › Why NoSQL?
  › Aggregate Data Models
  › Distribution Models
  › Consistency
• Types of NoSQL Databases
  › Key-Value Databases
  › Document Databases
  › Column-Family Stores
  › Graph Databases

• MongoDB
  › Introduction
  › MongoDB through the JavaScript Shell
  › Writing Programs Using MongoDB
  › Document-Oriented Data
  › Queries and Aggregation
  › Updates, Atomic Operations, and Deletes
  › Indexing, Replication, and Sharding

• Mahout: Machine Learning
  › Introduction
  › Recommenders
    • Representing Recommender Data
    • Making Recommendations
  › Clustering
    • Clustering Algorithms in Mahout
  › Classification
    • Training a Classifier
    • Evaluating and Tuning a Classifier

• Moving Data In and Out of Hadoop
  › Flume
  › Oozie
  › Sqoop
  › HBase
• Data Serialization Formats
  › XML and JSON
  › SequenceFiles, Protocol Buffers, Thrift, and Avro

• Utilizing Data Structures and Algorithms
  › Modeling Data and Solving Problems with Graphs
  › Parallelized Bloom Filter Creation in MapReduce
• Programming Pipelines with Pig
  › Using Pig to Find Malicious Actors in Log Data
  › Optimizing User Workflows with Pig

• Crunch
• Cascading
• Puppet
• Unit Testing MapReduce
• Heavyweight Job Testing Using LocalJobRunner
• Debugging User-Space Problems
