SlideShare ist ein Scribd-Unternehmen logo
1 von 25
Slide 1© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Big Data Insights using
MapReduce
Slide 2© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Session Objectives
ᗍ Introduction to Big Data and Hadoop
ᗍ Understanding HDFS
ᗍ Introduction to MapReduce – MapReduce Fundamentals
ᗍ MapReduce Programming Tutorial
ᗍ BIG Data Analytics via MapReduce
ᗍ BIG Data & Hadoop Course Details
ᗍ Webinar by Skillspeed
Get Started with BIG Data & Hadoop
Slide 3© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Big Data and its Challenges
Get Started with BIG Data & Hadoop
Slide 4© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Big Data and its Challenges
Big data is the term for a collection of data sets so
large and complex that it becomes difficult to
process using on-hand database management
tools or traditional data processing applications
Systems / Enterprises generate huge amount of
data from Terabytes to and even Petabytes of
information
It’s very difficult to manage such huge data……
Get Started with BIG Data & Hadoop
Slide 5© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Who Generates Big Data?
Have you ever wondered how Google, Facebook or LinkedIn manages to store and utilize the huge data?
Today, it is becoming a problem for all of us to manage such BIG DATA…. Get Started with BIG Data & Hadoop
Slide 6© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Hadoop can be used for easy processing of such huge Data…..
We will answer how?
Before that let’s understand what is Hadoop?
Get Started with BIG Data & Hadoop
Slide 7© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Hadoop and its Characteristics
Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of
commodity computers using a simple programming model
It is an Open-source Data Management technology with scale-out storage and distributed processing
Hadoop
Characteristics
Flexible
Reliable
Economical
Scalable Get Started with BIG Data & Hadoop
Slide 8© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Why Hadoop?
How does Hadoop solve the Big Data challenges?
Hadoop Platform is designed to address the big data problems
Size of Data
Variety of Data
Get Started with BIG Data & Hadoop
Slide 9© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Hadoop Ecosystem
Flume Sqoop
Import Or Export
Unstructured or
Semi-Structured data Structured Data
Apache Oozie (Workflow)
HDFS
(Hadoop Distributed File System)
Pig Latin
Data Analysis
Hive
DW System
MapReduce Framework HBase
Other
YARN
Frameworks (MPI,
GIRAPH)
YARN
Cluster Resource Management
Get Started with BIG Data & Hadoop
Slide 10© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Map Reduce
Get Started with BIG Data & Hadoop
Slide 11© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Map Reduce – Scenario
Let us consider a real life scenario to understand the importance of “Map Reduce” in Hadoop
Suppose, you are the
handling a project which has
x tasks and takes 100 hours
for one resource to complete
1 x 100 = 100 hours
100/10(resources) = 10 hours
Get Started with BIG Data & Hadoop
Slide 12© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Similarly,
= 100 hours 100/10 = 10 hours
Map Reduce – Scenario
Get Started with BIG Data & Hadoop
Slide 13© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
More Scenarios on Map-Reduce
Problem Statement:
Find maximum stock market levels recorded in a span of 5 years
Problem Statement:
De-identify personal identifier information
Get Started with BIG Data & Hadoop
Slide 14© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Traditional Solution
matchesSplit Data
Very
Big
Data
All
matches
grep
grep
grep
cat
grep
:
matches
matches
matches
Split Data
Split Data
Split Data
Get Started with BIG Data & Hadoop
Slide 15© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
MapReduce Solution
Very
Big
Input
Split Data
All
matches
:
Split Data
Split Data
Split Data
M
A
P
R
E
D
U
C
E
MapReduce Framework
Get Started with BIG Data & Hadoop
Slide 16© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
MapReduce Advantages
Two biggest advantages:
ᗍ Takes processing to the data
ᗍ Allows processing data in parallel
a b
c
Map Task
HDFS Block
Data Center
Rack
Node
Get Started with BIG Data & Hadoop
Slide 17© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
MapReduce Flow
1. Input data is present in data nodes
2. Map tasks = Input Splits
3. Mappers produce intermediate data
4. Data exchanged among nodes in “shuffling”
5. All data of same key goes to same reducer
6. Reducer output stored at output location
Node 1
INPUT DATA
Map
Node 2
Map
Node 1
Reduce
Node 1
Reduce
Get Started with BIG Data & Hadoop
Slide 18© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
What is Expected?
In this section, we will discuss the questions on HDFS and MapReduce that is asked during the interview
This will help you analyze the importance of the topics under study!
Get Started with BIG Data & Hadoop
Slide 19© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Job Trends – Hadoop
Get Started with BIG Data & Hadoop
Slide 20© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Why SkillSpeed?
Course
Curriculum
from Industry
Experts
Instructor Led
Live Virtual
Sessions
Lifetime access
to Course
Content via
LMS
100% Placement
Assistance
24x7 Support
Get Started with BIG Data & Hadoop
Slide 21© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Course Topics
Module 1
Introduction to Big
Data and Hadoop
Module 2
HDFS Internals, Hadoop
Configurations and
Data Loading
Module 3
Introduction to Map
Reduce
Module 4
Advanced Map Reduce
Concepts
Module 5
Introduction to Pig
Module 6
Advanced Pig and
Introduction to Hive
Module 7
Advanced Hive
Concepts
Module 8
Extending Hive and
HBase Introduction
Module 9
Advanced HBase and
Oozie Introduction
Module 10
Project Set-up
Discussion
Get Started with BIG Data & Hadoop
Slide 22© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Corporate Partners
Get Started with BIG Data & Hadoop
Slide 23© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Lines open 24/7
To know more about the course, Please contact:
IND +91-90660-20904 USA 1866-607-6547 (Toll Free)
Or reach us at
sales@skillspeed.com
Contact Us
Get Started with BIG Data & Hadoop
Slide 24© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com
Image References
Google images – credit for google, Facebook and LinkedIn LOGO and Snapshots
http://findicons.com/icon/66444/user_group
http://www.virtualizor.com/tour
https://accounts.it.et.byu.edu/
http://www.clipartsfree.net/tag/server.html
http://www.gopixpic.com/16/time-clock-icon-png-download
http://blog.smartbear.com/requirements/how-to-interview-users-to-find-out-what-they-really-want/
http://www.lincs.fr/research/areas/big-data/
http://www.counsellingpages.co.uk/
http://langfordsconsultancy.com/langfords-training-support-package/
http://cbsepathshala.blogspot.in/2012/05/physics-class-x-chapter-electricity.html
http://mmatycoon.com/tycoontimes/tycoontimesstory.php?SID=1010
http://imgarcade.com/1/big-data-cartoon/
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals

Weitere ähnliche Inhalte

Was ist angesagt?

Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Simplilearn
 
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Simplilearn
 

Was ist angesagt? (20)

Big data-ppt
Big data-pptBig data-ppt
Big data-ppt
 
Chapter 08 Data Mining Techniques
Chapter 08 Data Mining Techniques Chapter 08 Data Mining Techniques
Chapter 08 Data Mining Techniques
 
Chapter 1 big data
Chapter 1 big dataChapter 1 big data
Chapter 1 big data
 
Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big Data
 
Hadoop & MapReduce
Hadoop & MapReduceHadoop & MapReduce
Hadoop & MapReduce
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 
Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data Science
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Introduction to Big Data Analytics and Data Science
Introduction to Big Data Analytics and Data ScienceIntroduction to Big Data Analytics and Data Science
Introduction to Big Data Analytics and Data Science
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
K mean-clustering algorithm
K mean-clustering algorithmK mean-clustering algorithm
K mean-clustering algorithm
 
Data science.chapter-1,2,3
Data science.chapter-1,2,3Data science.chapter-1,2,3
Data science.chapter-1,2,3
 
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
Big Data Applications | Big Data Analytics Use-Cases | Big Data Tutorial for ...
 
Big data
Big dataBig data
Big data
 
Big data.
Big data.Big data.
Big data.
 
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
 

Andere mochten auch (8)

Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Hadoop map reduce concepts
Hadoop map reduce conceptsHadoop map reduce concepts
Hadoop map reduce concepts
 
An Introduction To Map-Reduce
An Introduction To Map-ReduceAn Introduction To Map-Reduce
An Introduction To Map-Reduce
 
Map Reduce introduction
Map Reduce introductionMap Reduce introduction
Map Reduce introduction
 
Analysing of big data using map reduce
Analysing of big data using map reduceAnalysing of big data using map reduce
Analysing of big data using map reduce
 
Map reduce vs spark
Map reduce vs sparkMap reduce vs spark
Map reduce vs spark
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reduce
 

Ähnlich wie Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals

HDFS & MapReduce
HDFS & MapReduceHDFS & MapReduce
HDFS & MapReduce
Skillspeed
 
Predicting Consumer Behaviour via Hadoop
Predicting Consumer Behaviour via HadoopPredicting Consumer Behaviour via Hadoop
Predicting Consumer Behaviour via Hadoop
Skillspeed
 
Top 5 Tasks Of A Hadoop Developer Webinar
Top 5 Tasks Of A Hadoop Developer WebinarTop 5 Tasks Of A Hadoop Developer Webinar
Top 5 Tasks Of A Hadoop Developer Webinar
Skillspeed
 
Hadoop Webinar 28July15
Hadoop Webinar 28July15Hadoop Webinar 28July15
Hadoop Webinar 28July15
Edureka!
 

Ähnlich wie Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals (20)

HDFS & MapReduce
HDFS & MapReduceHDFS & MapReduce
HDFS & MapReduce
 
Introduction to Pig | Pig Architecture | Pig Fundamentals
Introduction to Pig | Pig Architecture | Pig FundamentalsIntroduction to Pig | Pig Architecture | Pig Fundamentals
Introduction to Pig | Pig Architecture | Pig Fundamentals
 
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive ArchitectureHadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
Hadoop Hive Tutorial | Hive Fundamentals | Hive Architecture
 
Predicting Consumer Behaviour via Hadoop
Predicting Consumer Behaviour via HadoopPredicting Consumer Behaviour via Hadoop
Predicting Consumer Behaviour via Hadoop
 
Top 5 Tasks Of A Hadoop Developer Webinar
Top 5 Tasks Of A Hadoop Developer WebinarTop 5 Tasks Of A Hadoop Developer Webinar
Top 5 Tasks Of A Hadoop Developer Webinar
 
Hadoop for Business Intelligence Professionals
Hadoop for Business Intelligence ProfessionalsHadoop for Business Intelligence Professionals
Hadoop for Business Intelligence Professionals
 
BIG Data & Hadoop Applications in Social Media
BIG Data & Hadoop Applications in Social MediaBIG Data & Hadoop Applications in Social Media
BIG Data & Hadoop Applications in Social Media
 
Talend For Big Data : Secret Key to Hadoop
Talend For Big Data  : Secret Key to HadoopTalend For Big Data  : Secret Key to Hadoop
Talend For Big Data : Secret Key to Hadoop
 
Webinar : Talend : The Non-Programmer's Swiss Knife for Big Data
Webinar  : Talend : The Non-Programmer's Swiss Knife for Big DataWebinar  : Talend : The Non-Programmer's Swiss Knife for Big Data
Webinar : Talend : The Non-Programmer's Swiss Knife for Big Data
 
ETL using Big Data Talend
ETL using Big Data Talend  ETL using Big Data Talend
ETL using Big Data Talend
 
Talend webinar
Talend webinarTalend webinar
Talend webinar
 
Simplifying Big Data ETL with Talend
Simplifying Big Data ETL with TalendSimplifying Big Data ETL with Talend
Simplifying Big Data ETL with Talend
 
BIG Data & Hadoop Applications in Logistics
BIG Data & Hadoop Applications in LogisticsBIG Data & Hadoop Applications in Logistics
BIG Data & Hadoop Applications in Logistics
 
Leveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine BusinessLeveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine Business
 
Run Your First Hadoop 2.x Program
Run Your First Hadoop 2.x ProgramRun Your First Hadoop 2.x Program
Run Your First Hadoop 2.x Program
 
Hadoop Webinar 28July15
Hadoop Webinar 28July15Hadoop Webinar 28July15
Hadoop Webinar 28July15
 
Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?
 
5 Scenarios: When To Use & When Not to Use Hadoop
5 Scenarios: When To Use & When Not to Use Hadoop5 Scenarios: When To Use & When Not to Use Hadoop
5 Scenarios: When To Use & When Not to Use Hadoop
 
Webinar: Big Data & Hadoop - When not to use Hadoop
Webinar: Big Data & Hadoop - When not to use HadoopWebinar: Big Data & Hadoop - When not to use Hadoop
Webinar: Big Data & Hadoop - When not to use Hadoop
 
Realtime and Job Oriented Big Data/Hadoop Training in Marathahalli, Bangalore
Realtime and Job Oriented Big Data/Hadoop Training in Marathahalli, BangaloreRealtime and Job Oriented Big Data/Hadoop Training in Marathahalli, Bangalore
Realtime and Job Oriented Big Data/Hadoop Training in Marathahalli, Bangalore
 

Mehr von Skillspeed

Sentiment Analysis via R Programming
Sentiment Analysis via R ProgrammingSentiment Analysis via R Programming
Sentiment Analysis via R Programming
Skillspeed
 
Decoding Puppet & Jenkins via DevOps
Decoding Puppet & Jenkins via DevOpsDecoding Puppet & Jenkins via DevOps
Decoding Puppet & Jenkins via DevOps
Skillspeed
 
Python and BIG Data analytics | Python Fundamentals | Python Architecture
Python and BIG Data analytics | Python Fundamentals | Python ArchitecturePython and BIG Data analytics | Python Fundamentals | Python Architecture
Python and BIG Data analytics | Python Fundamentals | Python Architecture
Skillspeed
 

Mehr von Skillspeed (8)

Sentiment Analysis via R Programming
Sentiment Analysis via R ProgrammingSentiment Analysis via R Programming
Sentiment Analysis via R Programming
 
Decoding Puppet & Jenkins via DevOps
Decoding Puppet & Jenkins via DevOpsDecoding Puppet & Jenkins via DevOps
Decoding Puppet & Jenkins via DevOps
 
Skillspeed Affiliate Program
Skillspeed Affiliate ProgramSkillspeed Affiliate Program
Skillspeed Affiliate Program
 
Python and BIG Data analytics | Python Fundamentals | Python Architecture
Python and BIG Data analytics | Python Fundamentals | Python ArchitecturePython and BIG Data analytics | Python Fundamentals | Python Architecture
Python and BIG Data analytics | Python Fundamentals | Python Architecture
 
BIG Data & Hadoop Applications in Healthcare
BIG Data & Hadoop Applications in HealthcareBIG Data & Hadoop Applications in Healthcare
BIG Data & Hadoop Applications in Healthcare
 
BIG Data & Hadoop Applications in Finance
BIG Data & Hadoop Applications in FinanceBIG Data & Hadoop Applications in Finance
BIG Data & Hadoop Applications in Finance
 
BIG Data & Hadoop Applications in E-Commerce
BIG Data & Hadoop Applications in E-CommerceBIG Data & Hadoop Applications in E-Commerce
BIG Data & Hadoop Applications in E-Commerce
 
BIG Data & Hadoop Applications in Retail
BIG Data & Hadoop Applications in RetailBIG Data & Hadoop Applications in Retail
BIG Data & Hadoop Applications in Retail
 

Kürzlich hochgeladen

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Kürzlich hochgeladen (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 

Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals

  • 1. Slide 1© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Big Data Insights using MapReduce
  • 2. Slide 2© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Session Objectives ᗍ Introduction to Big Data and Hadoop ᗍ Understanding HDFS ᗍ Introduction to MapReduce – MapReduce Fundamentals ᗍ MapReduce Programming Tutorial ᗍ BIG Data Analytics via MapReduce ᗍ BIG Data & Hadoop Course Details ᗍ Webinar by Skillspeed Get Started with BIG Data & Hadoop
  • 3. Slide 3© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Big Data and its Challenges Get Started with BIG Data & Hadoop
  • 4. Slide 4© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Big Data and its Challenges Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications Systems / Enterprises generate huge amount of data from Terabytes to and even Petabytes of information It’s very difficult to manage such huge data…… Get Started with BIG Data & Hadoop
  • 5. Slide 5© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Who Generates Big Data? Have you ever wondered how Google, Facebook or LinkedIn manages to store and utilize the huge data? Today, it is becoming a problem for all of us to manage such BIG DATA…. Get Started with BIG Data & Hadoop
  • 6. Slide 6© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Hadoop can be used for easy processing of such huge Data….. We will answer how? Before that let’s understand what is Hadoop? Get Started with BIG Data & Hadoop
  • 7. Slide 7© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Hadoop and its Characteristics Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model It is an Open-source Data Management technology with scale-out storage and distributed processing Hadoop Characteristics Flexible Reliable Economical Scalable Get Started with BIG Data & Hadoop
  • 8. Slide 8© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Why Hadoop? How does Hadoop solve the Big Data challenges? Hadoop Platform is designed to address the big data problems Size of Data Variety of Data Get Started with BIG Data & Hadoop
  • 9. Slide 9© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Hadoop Ecosystem Flume Sqoop Import Or Export Unstructured or Semi-Structured data Structured Data Apache Oozie (Workflow) HDFS (Hadoop Distributed File System) Pig Latin Data Analysis Hive DW System MapReduce Framework HBase Other YARN Frameworks (MPI, GIRAPH) YARN Cluster Resource Management Get Started with BIG Data & Hadoop
  • 10. Slide 10© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Map Reduce Get Started with BIG Data & Hadoop
  • 11. Slide 11© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Map Reduce – Scenario Let us consider a real life scenario to understand the importance of “Map Reduce” in Hadoop Suppose, you are the handling a project which has x tasks and takes 100 hours for one resource to complete 1 x 100 = 100 hours 100/10(resources) = 10 hours Get Started with BIG Data & Hadoop
  • 12. Slide 12© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Similarly, = 100 hours 100/10 = 10 hours Map Reduce – Scenario Get Started with BIG Data & Hadoop
  • 13. Slide 13© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com More Scenarios on Map-Reduce Problem Statement: Find maximum stock market levels recorded in a span of 5 years Problem Statement: De-identify personal identifier information Get Started with BIG Data & Hadoop
  • 14. Slide 14© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Traditional Solution matchesSplit Data Very Big Data All matches grep grep grep cat grep : matches matches matches Split Data Split Data Split Data Get Started with BIG Data & Hadoop
  • 15. Slide 15© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com MapReduce Solution Very Big Input Split Data All matches : Split Data Split Data Split Data M A P R E D U C E MapReduce Framework Get Started with BIG Data & Hadoop
  • 16. Slide 16© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com MapReduce Advantages Two biggest advantages: ᗍ Takes processing to the data ᗍ Allows processing data in parallel a b c Map Task HDFS Block Data Center Rack Node Get Started with BIG Data & Hadoop
  • 17. Slide 17© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com MapReduce Flow 1. Input data is present in data nodes 2. Map tasks = Input Splits 3. Mappers produce intermediate data 4. Data exchanged among nodes in “shuffling” 5. All data of same key goes to same reducer 6. Reducer output stored at output location Node 1 INPUT DATA Map Node 2 Map Node 1 Reduce Node 1 Reduce Get Started with BIG Data & Hadoop
  • 18. Slide 18© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com What is Expected? In this section, we will discuss the questions on HDFS and MapReduce that is asked during the interview This will help you analyze the importance of the topics under study! Get Started with BIG Data & Hadoop
  • 19. Slide 19© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Job Trends – Hadoop Get Started with BIG Data & Hadoop
  • 20. Slide 20© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Why SkillSpeed? Course Curriculum from Industry Experts Instructor Led Live Virtual Sessions Lifetime access to Course Content via LMS 100% Placement Assistance 24x7 Support Get Started with BIG Data & Hadoop
  • 21. Slide 21© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Course Topics Module 1 Introduction to Big Data and Hadoop Module 2 HDFS Internals, Hadoop Configurations and Data Loading Module 3 Introduction to Map Reduce Module 4 Advanced Map Reduce Concepts Module 5 Introduction to Pig Module 6 Advanced Pig and Introduction to Hive Module 7 Advanced Hive Concepts Module 8 Extending Hive and HBase Introduction Module 9 Advanced HBase and Oozie Introduction Module 10 Project Set-up Discussion Get Started with BIG Data & Hadoop
  • 22. Slide 22© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Corporate Partners Get Started with BIG Data & Hadoop
  • 23. Slide 23© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Lines open 24/7 To know more about the course, Please contact: IND +91-90660-20904 USA 1866-607-6547 (Toll Free) Or reach us at sales@skillspeed.com Contact Us Get Started with BIG Data & Hadoop
  • 24. Slide 24© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Image References Google images – credit for google, Facebook and LinkedIn LOGO and Snapshots http://findicons.com/icon/66444/user_group http://www.virtualizor.com/tour https://accounts.it.et.byu.edu/ http://www.clipartsfree.net/tag/server.html http://www.gopixpic.com/16/time-clock-icon-png-download http://blog.smartbear.com/requirements/how-to-interview-users-to-find-out-what-they-really-want/ http://www.lincs.fr/research/areas/big-data/ http://www.counsellingpages.co.uk/ http://langfordsconsultancy.com/langfords-training-support-package/ http://cbsepathshala.blogspot.in/2012/05/physics-class-x-chapter-electricity.html http://mmatycoon.com/tycoontimes/tycoontimesstory.php?SID=1010 http://imgarcade.com/1/big-data-cartoon/

Hinweis der Redaktion

  1. SkillSpeed offer virtual instructor lead courses designed to bridge the time to competency gap experienced by the technology companies. USP of SkillSpeed is the subject matter expert (SME). SMEs are industry experts and has a good understanding and hands-on industry experience of the technology. This industry expert designs, develops, and delivers the course. SkillSpeed provides you: Course Curriculum from Industry Experts Instructor Led Live Virtual Sessions Real life industry case studies  - Live Virtual Interactions Interaction with industry experts  - Lifetime access to all course content via the LMS   - 24*7 support   - 100% placement assistance