Hadoop MapReduce Streaming and Pipes
Hanborq Inc.
An introduction to Hadoop MapReduce Streaming and Pipes, for training.
1.
Hadoop Streaming and Pipes
July 10, 2012 Clay Jiang Big Data Engineering Team Hanborq Inc.
2.
Hadoop Streaming
• Hadoop Streaming is a tool that runs an MR job using any executable program or script as the Map/Reduce
• $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar
3.
First Streaming Run
• Basic command:
  – hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar -input /path/to/inputdir -output /path/to/outputdir -mapper /path/to/map_exec -reducer /path/to/reduce_exec
4.
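How Streaming Works?
• The Mapper/Reducer launches map_exec/reduce_exec as a separate process
• The Mapper/Reducer exchanges <key,value> pairs with it through stdin and stdout
• <key,value> pairs are passed to map_exec/reduce_exec in an agreed format; the default format is "key\tvalue"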
5.
How Streaming Works?
6.
Hadoop Streaming Example
• Streaming WordCount
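The slide names a Streaming WordCount example but its code did not survive extraction. Below is a minimal Python sketch of what such a mapper and reducer could look like, assuming the default "key\tvalue" text format. The script name `wordcount.py`, the function names, and the `map`/`reduce` argument convention are hypothetical, introduced only for this illustration.

```python
#!/usr/bin/env python3
import sys
from itertools import groupby

def map_words(lines):
    """Mapper: emit "word\\t1" for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reduce_counts(lines):
    """Reducer: sum the counts of each word.

    Assumes the input lines arrive sorted by key, which is what the
    framework's shuffle provides to a reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

# Hypothetical convention: "wordcount.py map" or "wordcount.py reduce".
# Nothing is read from stdin unless a role is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_words if sys.argv[1] == "map" else reduce_counts
    for record in step(sys.stdin):
        print(record)
```

With the basic command from the earlier slide, such a script could presumably be passed as `-mapper "python3 wordcount.py map"` and `-reducer "python3 wordcount.py reduce"`, provided the script is available on the task nodes.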
7.
Streaming Internal
• Just a tool, not a new mechanism
• Adds an adapter layer on top of the existing MapReduce framework:
  – PipeMapper + PipeMapRunner
  – PipeCombiner
  – PipeReducer
  – No PipePartitioner
8.
Streaming Internal
PipeMapper/PipeReducer are responsible for exchanging data with the executable through stdin/stdout
9.
Streaming Internal
• Main entry point of hadoop-streaming*.jar:
• One of three tools:
10.
Streaming-StreamJob
• StreamJob
  – parseArgv:
    • Argv to field members
  – setJobConf:
    • Field members to JobConf
  – submitAndMonitorJob:
    • JobConf submitted to JobClient
11.
Streaming Map
• -mapper <cmd|JavaClassName>
• PipeMapRunner/PipeMapper
  – startOutputThreads: starts an MROutputThread that "tails" the stdout of map_exec, reads the output with an OutputReader, parses it, and writes it to the collector
  – PipeMapper.map: uses an InputWriter to turn each key/value into a string that map_exec can parse, and writes it to the stdin of map_exec
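The two roles on this slide (one thread following the child's stdout, the main flow writing to its stdin) can be sketched in Python. This illustrates the pattern only and is not Hadoop's Java implementation; `run_with_output_thread`, `collected`, and the inline child script are hypothetical stand-ins for PipeMapper, the collector, and map_exec.

```python
import subprocess
import sys
import threading

def run_with_output_thread(cmd, pairs):
    """Write pairs to the child's stdin while a separate thread drains its stdout."""
    child = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE, text=True)
    collected = []  # stands in for the collector

    def drain_stdout():
        # Role of the output thread: follow stdout, parse each line,
        # and hand the resulting pair to the collector.
        for line in child.stdout:
            key, _, value = line.rstrip("\n").partition("\t")
            collected.append((key, value))

    reader = threading.Thread(target=drain_stdout)
    reader.start()

    # Role of the map call: one "key\tvalue" line per input record.
    for key, value in pairs:
        child.stdin.write(f"{key}\t{value}\n")
    child.stdin.close()  # the child sees EOF and can finish

    reader.join()
    child.wait()
    return collected

# Hypothetical stand-in for map_exec: prefixes every line with "seen:".
CHILD = [sys.executable, "-c",
         "import sys\nfor line in sys.stdin:\n    sys.stdout.write('seen:' + line)"]
```

Draining stdout on its own thread matters because a child that produces a lot of output would otherwise block once the pipe buffer fills, while the parent is still busy writing input.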
12.
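Streaming Reduce
• -reducer <cmd|JavaClassName>
• PipeReducer
  – Relies on the internal MapReduce shuffle to deliver data to the reducer
  – startOutputThreads: on the first reduce call, similarly starts an MROutputThread to collect the stdout of the "reducer cmd"
  – Similarly, uses an InputWriter to translate the reduce key/values and feeds them to the "reducer cmd" pair by pair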
13.
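InputWriter/OutputReader
• InputWriter
  – Writes <key,value> to the stdin of the executable in a predefined encoding
• OutputReader
  – Reads the stdout of the executable and decodes it into <key,value>
• InputWriter + OutputReader
  – Together they form the data transfer protocol between the Java process and the map/reduce executable process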
14.
TextInputWriter/TextOutputReader
• Used by default:
  – TextInputWriter, TextOutputReader
• <key,value> is encoded as key + separator + value
• Default separator: \t
15.
Streaming Data Flow
16.
Streaming Combiner
• -combiner <cmd|JavaClassName>
• PipeCombiner simply extends PipeReducer and follows the same flow as PipeReducer
17.
Streaming Partitioner
• -partitioner <javaClassName>
• For now, the partitioner must be a Java class
18.
Streaming I/O Format
• -inputformat <javaClassName>
  – JobConf.setInputFormat()
• -outputformat <javaClassName>
  – JobConf.setOutputFormat()
• -inputreader <javaClassName>:
  – Uses StreamInputFormat as the InputFormat
19.
Streaming IO Spec
• TextInputWriter/TextOutputReader:
  – stream.map/reduce.output.field.separator
    • Separator used in the output of the map/reduce executable
  – stream.map/reduce.input.field.separator
    • Separator used in the input of the map/reduce executable
  – stream.num.map/reduce.output.key.fields
    • The separator splits a line into fields; this sets how many leading fields form the key
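How the separator and the key-field count interact can be shown with a small Python sketch. It assumes the behaviour described in the Hadoop Streaming documentation as I understand it: the prefix up to the N-th separator is the key, the remainder is the value, and a line with fewer separators is treated entirely as the key. `split_key_value` is a hypothetical helper, not part of Hadoop.

```python
def split_key_value(line, separator="\t", num_key_fields=1):
    """Split one output line into (key, value).

    The key is the first num_key_fields fields, the value is the rest.
    If the line has fewer separators than that, the whole line is the
    key and the value is empty.
    """
    start = 0
    pos = -1
    for _ in range(num_key_fields):
        pos = line.find(separator, start)
        if pos == -1:
            return line, ""
        start = pos + len(separator)
    return line[:pos], line[start:]
```

For example, with separator "." and two key fields, the line "a.b.c.d" splits into the key "a.b" and the value "c.d".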
20.
Streaming IO Spec
• -io text|rawbytes|typedbytes
  – text: TextInputWriter/TextOutputReader
  – rawbytes: RawBytesInputWriter/RawBytesOutputReader
  – typedbytes: TypedBytesInputWriter/TypedBytesOutputReader
  – The option is resolved by IdentifierResolver
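To contrast with the line-oriented text format, here is a Python sketch of a length-prefixed binary framing of the kind rawbytes is generally described as using. The exact layout (a 4-byte big-endian length before the key bytes and before the value bytes) is an assumption here and should be checked against the Hadoop version in use; the function names are hypothetical.

```python
import io
import struct

def write_raw_record(stream, key, value):
    """Write key and value, each as a 4-byte big-endian length plus the bytes."""
    for item in (key, value):
        stream.write(struct.pack(">i", len(item)))
        stream.write(item)

def read_raw_records(stream):
    """Yield (key, value) byte pairs until the stream is exhausted."""
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return  # clean end of stream
        (key_len,) = struct.unpack(">i", header)
        key = stream.read(key_len)
        (value_len,) = struct.unpack(">i", stream.read(4))
        yield key, stream.read(value_len)
```

Unlike the text format, a framing like this lets keys and values contain tabs and newlines, since record boundaries come from the lengths rather than from separator characters.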
21.
User-Defined IO Spec
• MyInputWriter/MyOutputReader
  – extend InputWriter/OutputReader
• MyIdentifierResolver
  – extend IdentifierResolver
  – Used to resolve MyInputWriter/MyOutputReader
  – -D stream.io.identifier.resolver.class=MyIdentifierResolver
22.
Debug Streaming
• -mapdebug/-reducedebug
  – When a map/reduce task fails, the debug script is run
  – $script $stdout $stderr $syslog $jobconf
• -debug
  – When the job finishes, /tmp/${user.name}/streamjob.jar is not deleted
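A debug script receives the four paths shown above as arguments. The following Python sketch shows one thing such a script might do: print the tail of each log. The function names and the 20-line limit are hypothetical choices for illustration, not something the slides prescribe.

```python
import sys

def tail(path, max_lines=20):
    """Return the last max_lines lines of a text file."""
    with open(path, errors="replace") as handle:
        return handle.readlines()[-max_lines:]

def report(stdout_path, stderr_path, syslog_path, jobconf_path):
    """Build a short report from the four files passed to the debug script."""
    sections = []
    for label, path in (("stdout", stdout_path),
                        ("stderr", stderr_path),
                        ("syslog", syslog_path)):
        sections.append(f"== last lines of {label} ({path}) ==\n")
        sections.extend(tail(path))
    sections.append(f"== job configuration: {jobconf_path} ==\n")
    return "".join(sections)

# Runs only when invoked as: script $stdout $stderr $syslog $jobconf
if __name__ == "__main__" and len(sys.argv) == 5:
    sys.stdout.write(report(*sys.argv[1:5]))
```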
23.
Streaming vs. Hadoop Pipes
• Stdin/stdout (Streaming) vs. socket (Pipes)
• Pipes restricts the I/O interface: $HADOOP_HOME/c++/$PLATFORM/include
  – HadoopPipes::Mapper::map(MapContext& context)
  – HadoopPipes::Reducer::reduce(ReduceContext& context)
• Performance: is one better than the other?
24.
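Streaming vs. Hadoop Pipes
• The implementations are very similar
  – PipeMapper/PipeReducer (Streaming) vs. PipesMapper/PipesReducer (Pipes)
  – InputWriter/OutputReader (Streaming) vs. Application (Pipes)
  – Streaming accepts any executable; a Pipes client must link against the C++ library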
25.
References
• (1) "Hadoop: The Definitive Guide"
• (2) Hadoop Streaming - http://hadoop.apache.org/common/docs/r0.20.2/streaming.html
• (3) How to Debug Map/Reduce Programs - http://wiki.apache.org/hadoop/HowToDebugMapReducePrograms
• (4) Hadoop Wiki - http://wiki.apache.org/hadoop/
26.
The End
Thank You Very Much!
chiangbing@gmail.com