Mr hadoop seedrocket

•

3 gefällt mir•622 views

SeedRocket

Technologie

M/R: Motivation

● Process big amount of data to
produce other data
● Scale up VS Scale out

M/R: What is it?

● Different programming paradigm
● Based on a google paper (2004)
● Automatic parallelization and distribution
● I/O Scheduling
● Fault tolerance
● Status and monitoring

M/R: Basics

● Input & Output: set of key/value pairs
● Big amount of data group & sort
● Job = Two phases = Mapper & Reducer

● Map (in_key, in_value) →
list(interm_key, interm_value)

● Reduce (interm_key, list(interm_value)) →
list (out_key, out_value)

Hadoop: What is it?

● Framework based on GMR / GFS
● Apache project
● Developed in Java
● Multiple applications
● Used by many companies
● And growing!

Hadoop: HDFS dist. systems

● Data is sent to
computation
(whereas here
computation goes to
data)
● Nodes must get info
from each other
● Very difficult to
recover from partial
failure

Hadoop: HDFS concepts

● Distributed file system.
Layer on top ext3, xfs...
● Works better on huge files
● Redundancy (default 3)
● Bad seeking, no append!
● Good rack scale. Not
good data center scale
● File divided in 64Mb –
128Mb blocks

Hadoop: Advanced stuff

● Distributed caches
● Partitioner
● Sort comparator
● Group comparator
● Combiner
● Input format & Record reader
● MultiInput
● MultiOutput
● Compression (LZO)

Hadoop: Conclussions

● Simplify large-scale computation
● Hide parallel programming issues
● Easy to get into & develop
● Deeply used & maintained by community
● Possibility yo throw away RDBMs! (Bottleneck)

MapReduce & Hadoop

● Juanjo Mostazo
● juanj.mostazo@gmail.com

Empfohlen

Google's DremelMaria Stylianou

Hadoop Case Studies in the Real WorldMobin Ranjbar

Introduction to Hadoop - FinistJugDavid Morin

Dremel: Interactive Analysis of Web-Scale Datasets robertlz

Sharding - patterns & antipatterns, Константин Осипов, Алексей РыбакOntico

Dremel interactive analysis of web scale datasetsCarl Lu

Geo data analyticsDaniel Marcous

Hadoop introduction葵慶李

Empfohlen

Google's DremelMaria Stylianou

Hadoop Case Studies in the Real WorldMobin Ranjbar

Introduction to Hadoop - FinistJugDavid Morin

Dremel: Interactive Analysis of Web-Scale Datasets robertlz

Sharding - patterns & antipatterns, Константин Осипов, Алексей РыбакOntico

Dremel interactive analysis of web scale datasetsCarl Lu

Geo data analyticsDaniel Marcous

Hadoop introduction葵慶李

Basic Hadoop Architecture V1 vs V2VIVEKVANAVAN

Davraz - A graph visualization and exploration software.TigerGraph

Hadoop eco system-first classalogarg

Hadoop introductionRabindra Nath Nandi

Apache Hadoop and Spark: Introduction and Use Cases for Data AnalysisTrieu Nguyen

Handling the growth of dataPiyush Katariya

No sqlBhanu Sekhar

Hadoop_EcoSystem_Pradeep_MGPradeep MG

Dremel Paper ReviewArinto Murdopo

Hadoop..NIKHIL P L

HBase introduction talkHayden Marchant

Big data quizParmpreet Singh

Hadoop - Introduction to map reduce programming - Reunião 12/04/2014soujavajug

Lightweight Collection and Storage of Software Repository Data with DataRoverChristoph Matthies

TechEvent Time Seriesd DatabasesTrivadis

The immutable database datomicLaurence Chen

Hadoop/Spark Non-Technical BasicsZitao Liu

Solr Payloads for Ranking Data BloomReach

Mastering Hadoop Map Reduce - Custom Types and Other Optimizationsscottcrespo

Fundamental of Big Data with Hadoop and HiveSharjeel Imtiaz

TrainingDoug Chang

Hadoop breizhjugDavid Morin

Weitere ähnliche Inhalte

Was ist angesagt?

Basic Hadoop Architecture V1 vs V2VIVEKVANAVAN

Davraz - A graph visualization and exploration software.TigerGraph

Hadoop eco system-first classalogarg

Hadoop introductionRabindra Nath Nandi

Apache Hadoop and Spark: Introduction and Use Cases for Data AnalysisTrieu Nguyen

Handling the growth of dataPiyush Katariya

No sqlBhanu Sekhar

Hadoop_EcoSystem_Pradeep_MGPradeep MG

Dremel Paper ReviewArinto Murdopo

Hadoop..NIKHIL P L

HBase introduction talkHayden Marchant

Big data quizParmpreet Singh

Hadoop - Introduction to map reduce programming - Reunião 12/04/2014soujavajug

Lightweight Collection and Storage of Software Repository Data with DataRoverChristoph Matthies

TechEvent Time Seriesd DatabasesTrivadis

The immutable database datomicLaurence Chen

Hadoop/Spark Non-Technical BasicsZitao Liu

Solr Payloads for Ranking Data BloomReach

Mastering Hadoop Map Reduce - Custom Types and Other Optimizationsscottcrespo

Fundamental of Big Data with Hadoop and HiveSharjeel Imtiaz

Was ist angesagt? (20)

Basic Hadoop Architecture V1 vs V2

Davraz - A graph visualization and exploration software.

Hadoop eco system-first class

Hadoop introduction

Apache Hadoop and Spark: Introduction and Use Cases for Data Analysis

Handling the growth of data

No sql

Hadoop_EcoSystem_Pradeep_MG

Dremel Paper Review

Hadoop..

HBase introduction talk

Big data quiz

Hadoop - Introduction to map reduce programming - Reunião 12/04/2014

Lightweight Collection and Storage of Software Repository Data with DataRover

TechEvent Time Seriesd Databases

The immutable database datomic

Hadoop/Spark Non-Technical Basics

Solr Payloads for Ranking Data

Mastering Hadoop Map Reduce - Custom Types and Other Optimizations

Fundamental of Big Data with Hadoop and Hive

Ähnlich wie Mr hadoop seedrocket

TrainingDoug Chang

Hadoop breizhjugDavid Morin

Hadoop Training Tutorial for Freshersrajkamaltibacademy

Apache Hive for modern DBAsLuis Marques

JOSA TechTalks - Big Data on HadoopJordan Open Source Association

Hadoop-2.6.0 Slideskul prasad subedi

(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...Reynold Xin

Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...spinningmatt

Glusterfs and HadoopShubhendu Tripathi

Big Data TechnologyJuan J. Mostazo

Python in big data worldRohit

Hadoop, MapReduce and R = RHadoopVictoria López

BW Tech Meetup: Hadoop and The rise of Big Data Mindgrub Technologies

Bw tech hadoopMindgrub Technologies

The Evolution of the Hadoop EcosystemCloudera, Inc.

Seminar_Report_hadoopVarun Narang

Hadoop by sunithaSunitha Satyadas

Apache hadoop, hdfs and map reduce OverviewNisanth Simon

BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...Big Data Montreal

Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...Cognizant

Ähnlich wie Mr hadoop seedrocket (20)

Training

Hadoop breizhjug

Hadoop Training Tutorial for Freshers

Apache Hive for modern DBAs

JOSA TechTalks - Big Data on Hadoop

Hadoop-2.6.0 Slides

(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...

Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...

Glusterfs and Hadoop

Big Data Technology

Python in big data world

Hadoop, MapReduce and R = RHadoop

BW Tech Meetup: Hadoop and The rise of Big Data

Bw tech hadoop

The Evolution of the Hadoop Ecosystem

Seminar_Report_hadoop

Hadoop by sunitha

Apache hadoop, hdfs and map reduce Overview

BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...

Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...

Kürzlich hochgeladen

Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq

The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech

UiPath Community: Communication Mining from Zero to HeroUiPathCommunity

TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc

Data governance with Unity Catalog PresentationKnoldus Inc.

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada

Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery

Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes

DevEX - reference for building teams, processes, and platformsSergiu Bodiu

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada

Connecting the Dots for Information Discovery.pdfNeo4j

The State of Passkeys with FIDO Alliance.pptxLoriGlavin3

Sample pptx for embedding into website for demoHarshalMandlekar2

Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani

2024 April Patch TuesdayIvanti

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3

[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra

Take control of your SAP testing with UiPath Test SuiteDianaGray10

Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen

Time Series Foundation Models - current state and future directionsNathaniel Shimoni

Kürzlich hochgeladen (20)

Genislab builds better products and faster go-to-market with Lean project man...

The Ultimate Guide to Choosing WordPress Pros and Cons

UiPath Community: Communication Mining from Zero to Hero

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy

Data governance with Unity Catalog Presentation

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024

Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...

Assure Ecommerce and Retail Operations Uptime with ThousandEyes

DevEX - reference for building teams, processes, and platforms

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024

Connecting the Dots for Information Discovery.pdf

The State of Passkeys with FIDO Alliance.pptx

Sample pptx for embedding into website for demo

Potential of AI (Generative AI) in Business: Learnings and Insights

2024 April Patch Tuesday

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx

[Webinar] SpiraTest - Setting New Standards in Quality Assurance

Take control of your SAP testing with UiPath Test Suite

Testing tools and AI - ideas what to try with some tool examples

Time Series Foundation Models - current state and future directions

Mr hadoop seedrocket

1. MapReduce & Hadoop ● Juanjo Mostazo ● SeedRocket BCN ● June 2011

2. M/R: Motivation ● Process big amount of data to produce other data ● Scale up VS Scale out

3. M/R: What is it? ● Different programming paradigm ● Based on a google paper (2004) ● Automatic parallelization and distribution ● I/O Scheduling ● Fault tolerance ● Status and monitoring

4. M/R: Basics ● Input & Output: set of key/value pairs ● Big amount of data group & sort ● Job = Two phases = Mapper & Reducer ● Map (in_key, in_value) → list(interm_key, interm_value) ● Reduce (interm_key, list(interm_value)) → list (out_key, out_value)

5. M/R: Example (word counter)

6. M/R: Workflow

7. Hadoop: What is it? ● Framework based on GMR / GFS ● Apache project ● Developed in Java ● Multiple applications ● Used by many companies ● And growing!

8. Hadoop: HDFS dist. systems ● Data is sent to computation (whereas here computation goes to data) ● Nodes must get info from each other ● Very difficult to recover from partial failure

9. Hadoop: HDFS concepts ● Distributed file system. Layer on top ext3, xfs... ● Works better on huge files ● Redundancy (default 3) ● Bad seeking, no append! ● Good rack scale. Not good data center scale ● File divided in 64Mb – 128Mb blocks

10. Hadoop: HDFS Architecture

11. Hadoop: Architecture v1

12. Hadoop: Architecture v2

13. Hadoop: Architecture v3

14. M/R: Example (word counter)

15. Hadoop: EcoSystem ● FLUME ● ZOOKEEPER

16. Hadoop: Advanced stuff ● Distributed caches ● Partitioner ● Sort comparator ● Group comparator ● Combiner ● Input format & Record reader ● MultiInput ● MultiOutput ● Compression (LZO)

17. Hadoop: Conclussions ● Simplify large-scale computation ● Hide parallel programming issues ● Easy to get into & develop ● Deeply used & maintained by community ● Possibility yo throw away RDBMs! (Bottleneck)

18. MapReduce & Hadoop ● Juanjo Mostazo ● juanj.mostazo@gmail.com