Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.

Hadoop Streaming One way to Getting Started on Hadoop

21.422 Aufrufe

Veröffentlicht am

Hadoop Streaming

One way to approach MapReduce jobs in Hadoop is to use streaming.
In other words, use any kind of script which can be run from a command
line and read/write data via stdin and stdout:
http://hadoop.apache.org/common/docs/current/streaming.html#Hadoop+Streaming


The following examples use Python scripts for Hadoop Streaming. One
really great benefit is that then you can dev/test/debug your MapReduce
code on small data sets from a command line simply by using pipes:


cat input.txt | mapper.py | sort | reducer.py

BTW, there are much better ways to handle Hadoop Streaming in Python
on Elastic MapReduce – for example, using the “boto” library. However,
these examples are kept simple so they’ll fit into a tech talk!

Veröffentlicht in: Technologie

×