8: MapReduce Application Scripting
Zubair Nabi
zubair.nabi@itu.edu.pk
May 25, 2013
Zubair Nabi 8: MapReduce Application Scripting May 25, 2013 1 / 28
Outline
1 Pig Latin
2 Cascading
Introduction
MapReduce is too low-level and rigid and leads to lots of custom user
code
Pig Latin is a dataflow language atop MapReduce designed by
Yahoo!
Finds the sweet spot between the declarative style of SQL and the
low-level interface of MapReduce
The Pig system compiles Pig Latin queries into physical plans that are
executed atop Hadoop
SQL query to find the average pagerank for each large category
of URLs
SELECT category, AVG(pagerank)
FROM urls WHERE pagerank > 0.2
GROUP BY category HAVING COUNT(*) > 1000000
Equivalent Pig query
good_urls = FILTER urls BY pagerank > 0.2;
groups = GROUP good_urls BY category;
big_groups = FILTER groups BY COUNT(good_urls) > 1000000;
output = FOREACH big_groups GENERATE category, AVG(good_urls.pagerank);
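The four steps of the Pig query can be mimicked in plain Python to show what each intermediate relation holds. This is only a sketch of the dataflow semantics, not Pig itself: the sample urls records, the function name average_pagerank, and the lowered group-size threshold (min_group_size, standing in for the one-million cutoff) are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (url, category, pagerank) records.
urls = [
    ("a.com", "news", 0.9),
    ("b.com", "news", 0.5),
    ("c.com", "news", 0.1),
    ("d.com", "sports", 0.8),
    ("e.com", "blogs", 0.3),
    ("f.com", "blogs", 0.7),
]


def average_pagerank(urls, min_pagerank=0.2, min_group_size=1):
    # Step 1 (FILTER): keep only URLs with a high enough pagerank.
    good_urls = [u for u in urls if u[2] > min_pagerank]
    # Step 2 (GROUP): collect the surviving records into one bag per category.
    groups = defaultdict(list)
    for record in good_urls:
        groups[record[1]].append(record)
    # Step 3 (FILTER): keep only groups with more than min_group_size members.
    big_groups = {c: bag for c, bag in groups.items() if len(bag) > min_group_size}
    # Step 4 (FOREACH ... GENERATE): emit one (category, average) pair per group.
    return {c: sum(r[2] for r in bag) / len(bag) for c, bag in big_groups.items()}


result = average_pagerank(urls)
```

Each step consumes the relation produced by the previous one, which is the property the step-by-step Pig Latin form makes explicit.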
Pig Interface
A Pig Latin program is a sequence of steps, reminiscent of traditional
programming languages
In contrast, SQL consists of declarative constraints that collectively
define the result
Each step carries out a single data transformation
A Pig Latin program is similar to specifying a query execution plan or
a dataflow graph
Due to this dataflow model, it is easier for programmers to understand
and control how their data processing task is executed
Features
Support for a fully nested data model with complex data types
Extensive support for user-defined functions
Ability to operate over plain, schema-less input files
Open-source Apache project
Interoperability
Queries can be performed atop raw data dumps directly
The user needs to provide a function to parse the content of the file into
tuples
Similarly, the user also needs to provide a function to convert tuples
into a byte sequence
Datasets can be spread across diverse data stores and applications
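The pair of user-supplied functions can be pictured as a parser and its inverse. The sketch below is an illustration in Python, not Pig's actual load/store interface: the names parse_line and serialize_tuple and the tab-delimited (url, category, pagerank) layout are assumptions.

```python
def parse_line(raw: bytes) -> tuple:
    """Load side: turn one raw line of a data dump into a tuple of fields."""
    url, category, pagerank = raw.decode("utf-8").rstrip("\n").split("\t")
    return (url, category, float(pagerank))


def serialize_tuple(record: tuple) -> bytes:
    """Store side: turn a tuple back into a byte sequence."""
    return ("\t".join(str(field) for field in record) + "\n").encode("utf-8")


raw = b"a.com\tnews\t0.9\n"
record = parse_line(raw)
round_trip = serialize_tuple(record)
```

Because the format knowledge lives entirely in these two functions, the query itself never needs a schema declared up front.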
UDFs as first-class citizens
A significant part of large-scale data analysis relies on custom
processing
For instance, the user may be interested in figuring out whether a
particular website is spam
All aspects of processing in Pig Latin including grouping, filtering,
joining, and per-tuple processing can be customized via UDFs
UDFs can take non-atomic parameters as input and produce non-atomic
values as output
UDFs are defined in Java
groups = GROUP urls BY category;
output = FOREACH groups GENERATE
    category, top10(urls);
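A UDF such as top10 receives a whole bag and returns a bag, which is what "non-atomic in, non-atomic out" means in practice. The sketch below models that in Python purely for illustration (the deck states that real UDFs are written in Java); the function name top_n, the (url, pagerank) record layout, and the sample groups are assumptions.

```python
import heapq


def top_n(bag, n=10):
    """Bag in, bag out: return the n tuples with the highest pagerank."""
    return heapq.nlargest(n, bag, key=lambda record: record[1])


# Hypothetical grouped relation: category -> bag of (url, pagerank) tuples.
groups = {
    "news": [("a.com", 0.9), ("b.com", 0.5), ("c.com", 0.7)],
    "sports": [("d.com", 0.8)],
}

# Mirrors: FOREACH groups GENERATE category, top10(urls);
output = {category: top_n(bag, n=2) for category, bag in groups.items()}
```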
Data Model
Pig has four data types:
1 Atom: A single atomic value such as a string or an integer
2 Tuple: A sequence of values, each with possibly a different data type
3 Bag: A collection of tuples
4 Map: A collection of data items, each with an associated key
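The four types nest freely inside one another. As a rough analogy only, they can be modelled with Python built-ins; the mapping of bag to list and map to dict, the sample values, and the helper depth are assumptions for illustration.

```python
# Atom: a single value.
atom = "alice"
# Tuple: a sequence of fields, possibly of different types.
record = ("alice", 25)
# Bag: a collection of tuples (duplicates allowed), modelled here as a list.
bag = [("lakers", 1), ("ipod", 2), ("lakers", 1)]
# Map: items looked up by key; values may themselves be nested.
mapping = {"fan of": bag, "age": 20}

# Full nesting: a tuple whose fields are an atom, a bag, and a map.
nested = (atom, bag, mapping)


def depth(value):
    """Return how many container levels deep a nested value goes."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, (tuple, list)):
        return 1 + max((depth(v) for v in value), default=0)
    return 0
```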
Commands
LOAD: Load and deserialize an input file
FOREACH: Process each tuple of a dataset
FILTER: Filter a dataset based on some condition or UDF
COGROUP: Group together tuples from one or more datasets that share a
grouping key
GROUP: Group together tuples from a single dataset that share a
grouping key
STORE: Materialize the output of a Pig Latin expression to a file
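The difference between COGROUP and GROUP is easiest to see on small inputs. The following Python sketch models the semantics only; it assumes the grouping key sits at the same field position in every relation, and the function names and sample data are hypothetical.

```python
from collections import defaultdict


def cogroup(key_index, *relations):
    """Group tuples from one or more relations by a shared key.

    Returns {key: [bag_from_relation_0, bag_from_relation_1, ...]}; a key that
    is missing from a relation gets an empty bag for that relation.
    """
    grouped = defaultdict(lambda: [[] for _ in relations])
    for position, relation in enumerate(relations):
        for record in relation:
            grouped[record[key_index]][position].append(record)
    return dict(grouped)


def group(key_index, relation):
    """GROUP is the single-relation special case of COGROUP."""
    return {key: bags[0] for key, bags in cogroup(key_index, relation).items()}


# Hypothetical inputs: (query, url) results and (query, revenue) records.
results = [("lakers", "nba.com"), ("lakers", "espn.com"), ("kings", "nhl.com")]
revenue = [("lakers", 50), ("iphone", 20)]

grouped = cogroup(0, results, revenue)
```

Unlike a join, the output keeps one bag per input relation for every key, so later steps can decide how to combine them.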
Other Commands
UNION: Return the union of two or more bags
CROSS: Return the cross product of two or more bags
ORDER: Order a bag by a specified field
DISTINCT: Eliminate duplicate tuples in a bag
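The four commands have simple set-like meanings, sketched here over Python lists of tuples. This is an illustration rather than Pig's implementation: the function names are hypothetical, and the output order is kept stable only to make the sketch deterministic.

```python
from itertools import product


def union(*bags):
    """UNION: concatenate bags; duplicates are kept."""
    return [record for bag in bags for record in bag]


def cross(*bags):
    """CROSS: every combination of one tuple per bag, fields concatenated."""
    return [sum(combo, ()) for combo in product(*bags)]


def order(bag, field):
    """ORDER: sort a bag by the field at the given position."""
    return sorted(bag, key=lambda record: record[field])


def distinct(bag):
    """DISTINCT: drop duplicate tuples, keeping first occurrences."""
    return list(dict.fromkeys(bag))


a = [("x", 2), ("y", 1)]
b = [("y", 1), ("z", 3)]
```

For example, distinct(union(a, b)) removes the duplicate that union deliberately keeps.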
MapReduce in Pig Latin
map_result = FOREACH input GENERATE FLATTEN(map(*));
key_groups = GROUP map_result BY $0;
output = FOREACH key_groups GENERATE reduce(*);
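The three statements correspond to the map, shuffle, and reduce phases. A minimal Python sketch of that correspondence follows; run_mapreduce and the word-count functions word_map and word_reduce are hypothetical stand-ins for the map and reduce UDFs.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    # FOREACH input GENERATE FLATTEN(map(*)): apply map to every record and
    # flatten the bags of (key, value) pairs it returns into one relation.
    map_result = [pair for record in records for pair in map_fn(record)]
    # GROUP map_result BY $0: collect the pairs that share the first field.
    key_groups = defaultdict(list)
    for key, value in map_result:
        key_groups[key].append(value)
    # FOREACH key_groups GENERATE reduce(*): apply reduce once per group.
    return [reduce_fn(key, values) for key, values in key_groups.items()]


# Hypothetical word-count UDFs.
def word_map(line):
    return [(word, 1) for word in line.split()]


def word_reduce(word, counts):
    return (word, sum(counts))


counts = run_mapreduce(["a b a", "b c"], word_map, word_reduce)
```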
Outline
1 Pig Latin
2 Cascading
Introduction
Many applications require a chain of MapReduce jobs
Cascading allows the creation of processing pipelines using languages
that run atop the JVM
Source-pipe-sink paradigm
Data comes from sources
Pipes perform data analysis
Results are written to sinks
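The source-pipe-sink paradigm can be sketched with Python generators, where each stage consumes the stream produced by the previous one. This is a conceptual illustration and does not use the Cascading API; the function names and the sample transformation are assumptions.

```python
def source(lines):
    """Source: yield raw records one at a time."""
    yield from lines


def pipe(records):
    """Pipe: transform and filter the stream (strip, drop blanks, upper-case)."""
    for record in records:
        record = record.strip()
        if record:
            yield record.upper()


def sink(records, store):
    """Sink: write every result into the given store and return it."""
    store.extend(records)
    return store


output = sink(pipe(source(["alpha\n", "\n", " beta "])), [])
```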
Terminology
Pipe: data stream
Tuple: data record
Branch: chain of pipes
Pipe Assembly: set of pipe branches
Tap: data source or sink
Flow: pipe assembly bound to its source and sink taps
Cascade: a collection of flows, in which one flow depends on the output
of another
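Since flows in a cascade depend on each other's outputs, running a cascade amounts to ordering the flows so that producers come before consumers. A possible sketch of that ordering, using Python's standard graphlib module (Python 3.9 or later), is shown below; the flow names, tap names, and the cascade_order helper are hypothetical and unrelated to Cascading's own scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical flows, each described only by the taps it reads and writes.
flows = {
    "parse": {"sources": {"raw_logs"}, "sink": "events"},
    "sessions": {"sources": {"events"}, "sink": "sessions"},
    "report": {"sources": {"sessions", "events"}, "sink": "report"},
}


def cascade_order(flows):
    """Order flows so each runs after the flows that produce its inputs."""
    produced_by = {spec["sink"]: name for name, spec in flows.items()}
    dependencies = {
        name: {produced_by[tap] for tap in spec["sources"] if tap in produced_by}
        for name, spec in flows.items()
    }
    return list(TopologicalSorter(dependencies).static_order())


order = cascade_order(flows)
```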
Pipes
Base class: Pipe
Each: Analyze, transform, or filter individual tuples
Merge: Combine streams with same fields into one
GroupBy: Group tuples based on common values in a specified field
CoGroup: Join streams (similar to SQL join)
Every: Aggregate tuples
HashJoin: Similar to CoGroup but more efficient if one stream can
be held in memory
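The reason HashJoin can beat CoGroup when one stream is small is that the small side can be loaded into a hash table once while the large side is simply streamed past it. The Python sketch below illustrates that idea for an inner join; it is not Cascading code, and the function name, key positions, and sample streams are assumptions.

```python
from collections import defaultdict


def hash_join(large, small, large_key=0, small_key=0):
    """Inner join that keeps only the small stream in memory.

    The small side is loaded into a hash table once; the large side is then
    streamed past it, so no grouping or sorting of the large side is needed.
    """
    table = defaultdict(list)
    for record in small:
        table[record[small_key]].append(record)
    for record in large:
        for match in table.get(record[large_key], []):
            yield record + match


# Hypothetical streams: (user_id, page) clicks and (user_id, name) users.
clicks = [(1, "/home"), (2, "/cart"), (1, "/pay"), (3, "/home")]
users = [(1, "ann"), (2, "bob")]

joined = list(hash_join(clicks, users))
```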
Pipe Assemblies
Define the processing of tuple streams
Tuples are read from and written to taps
Processing includes filtering, transforming, organizing, and calculating
Can use multiple taps
May also define splits, merges, and joins to manipulate tuple streams
Example: Pipe Assembly
// SomeFunction, SomeFilter, and SomeAggregator are placeholder operations
// the "left hand side" assembly head
Pipe lhs = new Pipe( "lhs" );
lhs = new Each( lhs, new SomeFunction() );
lhs = new Each( lhs, new SomeFilter() );

// the "right hand side" assembly head
Pipe rhs = new Pipe( "rhs" );
rhs = new Each( rhs, new SomeFunction() );

// join the two branches, then aggregate, regroup, and aggregate again
Pipe join = new CoGroup( lhs, rhs );
join = new Every( join, new SomeAggregator() );
join = new GroupBy( join );
join = new Every( join, new SomeAggregator() );

// the tail of the assembly
join = new Each( join, new SomeFunction() );
Data Processing
Operation: Accept an input tuple, process it, and output zero or more
tuples
Tuple: Array of fields
Field: Defines a data type, such as string, integer, etc.
Taps
Data flows in and out of taps
Represent data sources and sinks, such as local files, distributed FS
files, etc.
Each tap is associated with a scheme that describes the data, such as
TextLine, TextDelimited, etc.
Sinks have modes such as SinkMode.KEEP,
SinkMode.REPLACE, and SinkMode.UPDATE
Flows
Represent entire pipelines
A pipeline reads data from a source, processes it, and then writes it to
a sink
Example: Flow
// pipe assembly: two branches joined and aggregated
Pipe lhs = new Pipe( "lhs" );
lhs = new Each( lhs, new SomeFunction() );
lhs = new Each( lhs, new SomeFilter() );
Pipe rhs = new Pipe( "rhs" );
rhs = new Each( rhs, new SomeFunction() );
Pipe join = new CoGroup( lhs, rhs );
join = new Every( join, new SomeAggregator() );

// taps: one source per branch head, one sink for the tail
Tap lhsSource = new Hfs( new TextLine(), "lhs.txt" );
Tap rhsSource = new Hfs( new TextLine(), "rhs.txt" );
Tap sink = new Hfs( new TextLine(), "output" );

// flow: bind each branch head to its source and the tail to the sink
FlowDef flowDef = new FlowDef()
    .setName( "flow-name" )
    .addSource( rhs, rhsSource )
    .addSource( lhs, lhsSource )
    .addTailSink( join, sink );
Flow flow = new HadoopFlowConnector().connect( flowDef );
Operations
Operations manipulate data
Four kinds:
1 Function
2 Filter
3 Aggregator
4 Buffer
Operations take an input tuple and emit zero or more tuples
A Filter instead returns a Boolean
Must be wrapped in either an Each or an Every pipe
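The division of labour between the four kinds of operation and the two wrapping pipes can be sketched in Python. This is a simplified model, not the Cascading interfaces: all names are hypothetical, an aggregator is reduced to a single function over a group, and the filter follows Cascading's convention that a true result means the tuple is removed.

```python
def split_words(record):
    """Function: one input tuple in, zero or more tuples out."""
    return [(word,) for word in record[0].split()]


def is_short(record):
    """Filter: returns a boolean; True here means the tuple is removed."""
    return len(record[0]) < 2


def count(group):
    """Aggregator: folds the tuples of one group into a single result."""
    return (len(group),)


def first_two(group):
    """Buffer: sees a whole group at once and may emit several tuples."""
    return list(group)[:2]


def each(stream, function=None, filter_=None):
    """Each-like wrapper: apply a Function or a Filter tuple by tuple."""
    for record in stream:
        if filter_ is not None:
            if not filter_(record):
                yield record
        else:
            yield from function(record)


def every(groups, operation):
    """Every-like wrapper: apply an Aggregator or Buffer group by group."""
    return {key: operation(group) for key, group in groups.items()}


lines = [("to be or not to be a",)]
words = list(each(each(lines, function=split_words), filter_=is_short))
groups = {}
for word in words:
    groups.setdefault(word[0], []).append(word)
counts = every(groups, count)
```

Functions and Filters only ever see one tuple, which is why they belong in Each, while Aggregators and Buffers need a grouped stream and therefore belong in Every.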
Example: Wordcount
// source tap: each input line becomes a tuple with a single "line" field
Scheme sourceScheme = new TextLine( new Fields( "line" ) );
Tap source = new Hfs( sourceScheme, inputPath );
// sink tap: write (word, count) pairs, replacing any existing output
Scheme sinkScheme = new TextLine( new Fields( "word", "count" ) );
Tap sink = new Hfs( sinkScheme, outputPath, SinkMode.REPLACE );
Pipe assembly = new Pipe( "wordcount" );
// emit one "word" tuple per run of non-whitespace characters in the line
String regex = "\\S+";
Function function = new RegexGenerator( new Fields( "word" ), regex );
assembly = new Each( assembly, new Fields( "line" ), function );
// group by word, then count the tuples in each group
assembly = new GroupBy( assembly, new Fields( "word" ) );
Aggregator count = new Count( new Fields( "count" ) );
assembly = new Every( assembly, count );
// plan the flow and run it to completion
FlowConnector flowConnector = new HadoopFlowConnector();
Flow flow = flowConnector.connect( "word-count", source, sink, assembly );
flow.complete();
Lab 4: Interfacing with CassandraZubair Nabi
 
Topic 10: Taxonomy of Data and Storage
Topic 10: Taxonomy of Data and StorageTopic 10: Taxonomy of Data and Storage
Topic 10: Taxonomy of Data and StorageZubair Nabi
 
Topic 11: Google Filesystem
Topic 11: Google FilesystemTopic 11: Google Filesystem
Topic 11: Google FilesystemZubair Nabi
 
Lab 3: Writing a Naiad Application
Lab 3: Writing a Naiad ApplicationLab 3: Writing a Naiad Application
Lab 3: Writing a Naiad ApplicationZubair Nabi
 
Lab 1: Introduction to Amazon EC2 and MPI
Lab 1: Introduction to Amazon EC2 and MPILab 1: Introduction to Amazon EC2 and MPI
Lab 1: Introduction to Amazon EC2 and MPIZubair Nabi
 
Topic 6: MapReduce Applications
Topic 6: MapReduce ApplicationsTopic 6: MapReduce Applications
Topic 6: MapReduce ApplicationsZubair Nabi
 

Mehr von Zubair Nabi (8)

Lab 5: Interconnecting a Datacenter using Mininet
Lab 5: Interconnecting a Datacenter using MininetLab 5: Interconnecting a Datacenter using Mininet
Lab 5: Interconnecting a Datacenter using Mininet
 
Lab 4: Interfacing with Cassandra
Lab 4: Interfacing with CassandraLab 4: Interfacing with Cassandra
Lab 4: Interfacing with Cassandra
 
Topic 10: Taxonomy of Data and Storage
Topic 10: Taxonomy of Data and StorageTopic 10: Taxonomy of Data and Storage
Topic 10: Taxonomy of Data and Storage
 
Topic 11: Google Filesystem
Topic 11: Google FilesystemTopic 11: Google Filesystem
Topic 11: Google Filesystem
 
Lab 3: Writing a Naiad Application
Lab 3: Writing a Naiad ApplicationLab 3: Writing a Naiad Application
Lab 3: Writing a Naiad Application
 
Topic 9: MR+
Topic 9: MR+Topic 9: MR+
Topic 9: MR+
 
Lab 1: Introduction to Amazon EC2 and MPI
Lab 1: Introduction to Amazon EC2 and MPILab 1: Introduction to Amazon EC2 and MPI
Lab 1: Introduction to Amazon EC2 and MPI
 
Topic 6: MapReduce Applications
Topic 6: MapReduce ApplicationsTopic 6: MapReduce Applications
Topic 6: MapReduce Applications
 

Kürzlich hochgeladen

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 

Kürzlich hochgeladen (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

MapReduce Application Scripting

  • 1. 8: MapReduce Application Scripting Zubair Nabi zubair.nabi@itu.edu.pk May 25, 2013 Zubair Nabi 8: MapReduce Application Scripting May 25, 2013 1 / 28
  • 3. Outline
      1. Pig Latin
      2. Cascading
  • 7. Introduction
      • MapReduce is too low-level and rigid and leads to lots of custom user code
      • Pig Latin is a high-level dataflow language atop MapReduce, designed at Yahoo!
      • Finds the sweet spot between the declarative style of SQL and the low-level interface of MapReduce
      • The Pig system compiles Pig Latin queries into physical plans that are executed atop Hadoop
  • 8. SQL query to find average pagerank for each large category of URLs
        SELECT category, AVG(pagerank)
        FROM urls WHERE pagerank > 0.2
        GROUP BY category HAVING COUNT(*) > 1000000
  • 9. Equivalent Pig query
        good_urls = FILTER urls BY pagerank > 0.2;
        groups = GROUP good_urls BY category;
        big_groups = FILTER groups BY COUNT(good_urls) > 1000000;
        output = FOREACH big_groups GENERATE category, AVG(good_urls.pagerank);
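The step-by-step nature of the Pig query can be mirrored in plain Java. The following is only a sketch of the same dataflow (filter, group, filter groups by size, average), not Pig itself; the class `PagerankDataflow`, the `Url` type, and the parameter names are hypothetical and chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class PagerankDataflow {
    // Hypothetical stand-in for one tuple of the `urls` dataset.
    public static final class Url {
        public final String category;
        public final double pagerank;

        public Url(String category, double pagerank) {
            this.category = category;
            this.pagerank = pagerank;
        }
    }

    // Average pagerank per category, keeping only categories that have
    // more than minGroupSize URLs with pagerank above minRank.
    public static Map<String, Double> averagePagerank(
            List<Url> urls, double minRank, long minGroupSize) {
        // good_urls = FILTER urls BY pagerank > minRank;
        List<Url> goodUrls = urls.stream()
                .filter(u -> u.pagerank > minRank)
                .collect(Collectors.toList());

        // groups = GROUP good_urls BY category;
        Map<String, List<Url>> groups = goodUrls.stream()
                .collect(Collectors.groupingBy(u -> u.category));

        // big_groups = FILTER groups BY COUNT(good_urls) > minGroupSize;
        // output = FOREACH big_groups GENERATE category, AVG(good_urls.pagerank);
        Map<String, Double> output = new TreeMap<>();
        for (Map.Entry<String, List<Url>> group : groups.entrySet()) {
            if (group.getValue().size() > minGroupSize) {
                double avg = group.getValue().stream()
                        .mapToDouble(u -> u.pagerank)
                        .average()
                        .orElse(0.0);
                output.put(group.getKey(), avg);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Url> urls = Arrays.asList(
                new Url("news", 0.75), new Url("news", 0.5),
                new Url("news", 0.125), new Url("blogs", 0.25));
        // A tiny threshold of 1 replaces the 1000000 used on the slide.
        System.out.println(averagePagerank(urls, 0.2, 1));
    }
}
```

Each intermediate variable corresponds to one named step of the Pig query, which is the property the slides contrast with a single declarative SQL statement.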
  • 14. Pig Interface
      • A Pig Latin program is a sequence of steps, reminiscent of traditional programming languages
      • In contrast, SQL consists of declarative constraints that collectively define the result
      • Each step carries out a single data transformation
      • Writing a Pig Latin program is similar to specifying a query execution plan, i.e. a dataflow graph
      • Due to this dataflow model, it is easier for programmers to understand and control how their data processing task is executed
  • 18. Features
      • Support for a fully nested data model with complex data types
      • Extensive support for user-defined functions
      • Ability to operate over plain, schema-less input files
      • Open-source Apache project
  • 22. Interoperability
      • Queries can be performed atop raw data dumps directly
      • The user needs to provide a function to parse the content of the file into tuples
      • Similarly, the user also needs to provide a function to convert tuples into a byte sequence
      • Datasets can be laid across diverse data storage sources and applications
  • 27. UDFs as first-class citizens
      • A significant part of large-scale data analysis relies on custom processing
      • For instance, the user may be interested in figuring out whether a particular website is spam
      • All aspects of processing in Pig Latin, including grouping, filtering, joining, and per-tuple processing, can be customized via UDFs
      • UDFs can take non-atomic parameters as input and produce non-atomic values as output
      • UDFs are defined in Java
        groups = GROUP urls BY category;
        output = FOREACH groups GENERATE category, top10(urls);
  • 31. Data Model
      Pig has four data types:
      1. Atom: A single atomic value such as a string or an integer
      2. Tuple: A sequence of values, each with possibly a different data type
      3. Bag: A collection of tuples
      4. Map: A collection of data items, each with an associated key
  • 37. Commands
      • LOAD: Load and deserialize an input file
      • FOREACH: Process each tuple of a dataset
      • FILTER: Filter a dataset based on some condition or UDF
      • COGROUP: Group together tuples which are related in some way from one or more datasets
      • GROUP: Group together tuples which are related in some way from one dataset
      • STORE: Materialize the output of a Pig Latin expression to a file
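To make the difference between GROUP and COGROUP concrete, here is a minimal plain-Java sketch of COGROUP over two datasets. It is an illustration of the semantics only, under the assumption that each tuple is a two-element `String[]` of (key, value); the class and method names are hypothetical. For every key it produces one bag per input dataset, and a bag is empty when that dataset has no tuple for the key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CoGroupSketch {
    // Returns key -> [bag of values from left, bag of values from right].
    public static Map<String, List<List<String>>> cogroup(
            List<String[]> left, List<String[]> right) {
        Map<String, List<List<String>>> groups = new TreeMap<>();
        addSide(groups, left, 0);
        addSide(groups, right, 1);
        return groups;
    }

    // Adds every tuple's value to the bag for its key on the given side
    // (0 = left dataset, 1 = right dataset).
    private static void addSide(Map<String, List<List<String>>> groups,
                                List<String[]> tuples, int side) {
        for (String[] tuple : tuples) {
            List<List<String>> bags = groups.get(tuple[0]);
            if (bags == null) {
                // First time this key is seen: create one empty bag per dataset.
                bags = new ArrayList<>();
                bags.add(new ArrayList<>());
                bags.add(new ArrayList<>());
                groups.put(tuple[0], bags);
            }
            bags.get(side).add(tuple[1]);
        }
    }

    public static void main(String[] args) {
        List<String[]> left = Arrays.asList(
                new String[]{"a", "1"}, new String[]{"b", "2"});
        List<String[]> right = Arrays.asList(
                new String[]{"a", "x"}, new String[]{"a", "y"}, new String[]{"c", "z"});
        System.out.println(cogroup(left, right));
    }
}
```

With a single input dataset the same routine degenerates to GROUP: each key maps to just one bag.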
  • 41. Other Commands
      • UNION: Return the union of two or more bags
      • CROSS: Return the cross product of two or more bags
      • ORDER: Order a bag by a specified field
      • DISTINCT: Eliminate duplicate tuples in a bag
  • 42. MapReduce in Pig Latin
        map_result = FOREACH input GENERATE FLATTEN(map(*));
        key_groups = GROUP map_result BY $0;
        output = FOREACH key_groups GENERATE reduce(*);
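The three Pig Latin statements above express MapReduce as map, group by key, then reduce. The plain-Java sketch below walks through the same three stages for word counting. It is an illustration only: the `map` and `reduce` bodies are assumed examples, and the class name `MapReducePipeline` is hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapReducePipeline {
    // map: one input line -> zero or more (word, 1) pairs.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // reduce: a key and all of its values -> a single count.
    public static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static Map<String, Integer> run(List<String> input) {
        // map_result = FOREACH input GENERATE FLATTEN(map(*));
        List<Map.Entry<String, Integer>> mapResult = new ArrayList<>();
        for (String line : input) {
            mapResult.addAll(map(line));
        }

        // key_groups = GROUP map_result BY $0;
        Map<String, List<Integer>> keyGroups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapResult) {
            keyGroups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                     .add(pair.getValue());
        }

        // output = FOREACH key_groups GENERATE reduce(*);
        Map<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : keyGroups.entrySet()) {
            output.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a b a", "B c")));
    }
}
```

The `addAll` call plays the role of FLATTEN: the list returned by each `map` invocation is unnested into one stream of pairs before grouping.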
  • 43. Outline
      1. Pig Latin
      2. Cascading
  • 49. Introduction
      • Many applications require a chain of MapReduce jobs
      • Cascading allows the creation of processing pipelines using languages that run atop the JVM
      • Source-pipe-sink paradigm
          • Data comes from sources
          • Pipes perform data analysis
          • Results are written to sinks
  • 56. Terminology
      • Pipe: data stream
      • Tuple: data record
      • Branch: chain of pipes
      • Pipe Assembly: set of pipe branches
      • Tap: data source or sink
      • Flow: pipe assembly bound to a tap
      • Cascade: a collection of flows, in which one flow depends on the output of another
  • 63. Pipes
      • Base class: Pipe
      • Each: Analyze, transform, or filter individual tuples
      • Merge: Combine streams with the same fields into one
      • GroupBy: Group tuples based on common values in a specified field
      • CoGroup: Join streams (similar to SQL join)
      • Every: Aggregate tuples
      • HashJoin: Similar to CoGroup but more efficient if one stream can be held in memory
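The reason a hash-based join helps when one stream fits in memory can be shown with a generic hash join. The sketch below is plain Java and is not Cascading's implementation; the class `HashJoinSketch` and the (key, value) `String[]` tuple layout are assumptions made for illustration. The small side is loaded into a hash table once, and the large side is then streamed past it without any grouping or sorting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashJoinSketch {
    // Inner join on the first field. `small` must fit in memory;
    // `large` is only iterated once and never materialized by key.
    public static List<String> hashJoin(List<String[]> small, List<String[]> large) {
        // Build phase: index the small stream by join key.
        Map<String, List<String>> table = new HashMap<>();
        for (String[] tuple : small) {
            table.computeIfAbsent(tuple[0], k -> new ArrayList<>()).add(tuple[1]);
        }

        // Probe phase: look up each tuple of the large stream.
        List<String> joined = new ArrayList<>();
        for (String[] tuple : large) {
            List<String> matches = table.get(tuple[0]);
            if (matches == null) {
                continue; // no partner on the small side, so drop the tuple
            }
            for (String match : matches) {
                joined.add(tuple[0] + "," + tuple[1] + "," + match);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String[]> countries = Arrays.asList(
                new String[]{"1", "US"}, new String[]{"2", "PK"});
        List<String[]> users = Arrays.asList(
                new String[]{"1", "alice"}, new String[]{"2", "bob"},
                new String[]{"3", "carol"}, new String[]{"1", "dave"});
        System.out.println(hashJoin(countries, users));
    }
}
```

A grouping-based join, by contrast, has to bring all tuples with the same key together from both streams before it can emit anything, which on MapReduce means a shuffle of both inputs.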
  • 68. Pipe Assemblies
      • Define the processing of tuple streams
      • Tuples are read from and written to taps
      • Processing includes filtering, transforming, organizing, and calculating
      • Can use multiple taps
      • May also define splits, merges, and joins to manipulate tuple streams
  • 69. Example: Pipe Assembly
  • 70. Example: Pipe Assembly (2)

        // Left-hand branch: apply a function, then a filter, to each tuple
        Pipe lhs = new Pipe( "lhs" );
        lhs = new Each( lhs, new SomeFunction() );
        lhs = new Each( lhs, new SomeFilter() );

        // Right-hand branch: apply a function to each tuple
        Pipe rhs = new Pipe( "rhs" );
        rhs = new Each( rhs, new SomeFunction() );

        // Join the two branches, then aggregate each joined group
        Pipe join = new CoGroup( lhs, rhs );
        join = new Every( join, new SomeAggregator() );

        // Regroup the aggregated tuples and aggregate again
        join = new GroupBy( join );
        join = new Every( join, new SomeAggregator() );

        // Final per-tuple transformation
        join = new Each( join, new SomeFunction() );
  • 73. Data Processing
      • Operation: Accepts an input tuple, processes it, and outputs zero or more tuples
      • Tuple: Array of fields
      • Field: Defines a data type, such as string, integer, etc.
  • 77. Taps
      • Data flows in and out of taps
      • Represent data sources and sinks, such as local files, distributed FS files, etc.
      • Each tap is associated with a scheme that describes the data, such as TextLine, TextDelimited, etc.
      • Sinks have modes such as SinkMode.KEEP, SinkMode.REPLACE, and SinkMode.UPDATE
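The three sink modes can be pictured with a small in-memory model. This is a sketch under stated assumptions rather than Cascading's implementation: the class `SinkModeSketch`, its `Mode` enum, and the map standing in for a file system are hypothetical. It assumes KEEP refuses to overwrite an existing resource, REPLACE discards it first, and UPDATE reuses it. UPDATE is modelled here as an append, although in Cascading the exact behaviour is left to each tap type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model of sink modes; all names are hypothetical, not Cascading classes.
final class SinkModeSketch {

    enum Mode { KEEP, REPLACE, UPDATE }

    // Stand-in for a file system: resource name -> lines written so far.
    private final Map<String, List<String>> store = new HashMap<>();

    void write(String resource, List<String> lines, Mode mode) {
        boolean exists = store.containsKey(resource);
        if (exists && mode == Mode.KEEP) {
            // KEEP: never clobber existing output.
            throw new IllegalStateException("resource already exists: " + resource);
        }
        if (!exists || mode == Mode.REPLACE) {
            // New resource, or REPLACE: start from a clean slate.
            store.put(resource, new ArrayList<>(lines));
        } else {
            // UPDATE on an existing resource: modelled as an append (assumption).
            store.get(resource).addAll(lines);
        }
    }

    List<String> read(String resource) {
        return store.get(resource);
    }

    public static void main(String[] args) {
        SinkModeSketch fs = new SinkModeSketch();
        fs.write("output", Arrays.asList("a"), Mode.KEEP);    // created
        fs.write("output", Arrays.asList("b"), Mode.UPDATE);  // appended
        System.out.println(fs.read("output"));                // [a, b]
        fs.write("output", Arrays.asList("c"), Mode.REPLACE); // overwritten
        System.out.println(fs.read("output"));                // [c]
    }
}
```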
  • 79. Flows
      • Represent entire pipelines
      • A pipeline reads data from a source, processes it, and then writes it to a sink
  • 80. Example: Flow

        // Pipe assembly: two branches joined and aggregated
        Pipe lhs = new Pipe( "lhs" );
        lhs = new Each( lhs, new SomeFunction() );
        lhs = new Each( lhs, new SomeFilter() );
        Pipe rhs = new Pipe( "rhs" );
        rhs = new Each( rhs, new SomeFunction() );
        Pipe join = new CoGroup( lhs, rhs );
        join = new Every( join, new SomeAggregator() );

        // Taps: one source per branch and a single sink for the joined output
        Tap lhsSource = new Hfs( new TextLine(), "lhs.txt" );
        Tap rhsSource = new Hfs( new TextLine(), "rhs.txt" );
        Tap sink = new Hfs( new TextLine(), "output" );

        // Bind each head pipe to its source and the tail pipe to the sink
        FlowDef flowDef = new FlowDef()
            .setName( "flow-name" )
            .addSource( rhs, rhsSource )
            .addSource( lhs, lhsSource )
            .addTailSink( join, sink );

        // Plan the flow for execution on Hadoop
        Flow flow = new HadoopFlowConnector().connect( flowDef );
  • 88. Operations
      • Operations manipulate data
      • Four kinds:
          1. Function
          2. Filter
          3. Aggregator
          4. Buffer
      • Take an input tuple and emit zero or more tuples
      • Filter returns a Boolean
      • Must be wrapped in either an Each or an Every pipe
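How the four operation kinds pair with Each and Every can be sketched with a few plain-Java interfaces. Everything below is a hypothetical model for illustration (`OperationSketch`, `TupleFunction`, `TupleFilter`, `GroupAggregator`, `GroupBuffer`), not the Cascading API. It assumes, as in Cascading, that Each takes a Function or a Filter, that Every takes an Aggregator or a Buffer, and that a filter returning true means the tuple is removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the four operation kinds; all names are hypothetical.
final class OperationSketch {

    interface TupleFunction<T, R> { List<R> operate(T tuple); }          // zero or more outputs per tuple
    interface TupleFilter<T> { boolean isRemove(T tuple); }              // true means "drop this tuple"
    interface GroupAggregator<T, A> { A aggregate(A soFar, T tuple); }   // folds a tuple into a running value
    interface GroupBuffer<T, R> { List<R> operate(Iterator<T> group); }  // sees a whole group at once

    // Each + Function: every tuple is transformed independently.
    static <T, R> List<R> eachFunction(List<T> in, TupleFunction<T, R> f) {
        List<R> out = new ArrayList<>();
        for (T t : in) out.addAll(f.operate(t));
        return out;
    }

    // Each + Filter: tuples for which isRemove is true are discarded.
    static <T> List<T> eachFilter(List<T> in, TupleFilter<T> f) {
        List<T> out = new ArrayList<>();
        for (T t : in) if (!f.isRemove(t)) out.add(t);
        return out;
    }

    // Every + Aggregator: one accumulated value per group.
    static <K, T, A> Map<K, A> everyAggregator(Map<K, List<T>> groups, A initial, GroupAggregator<T, A> agg) {
        Map<K, A> out = new LinkedHashMap<>();
        for (Map.Entry<K, List<T>> g : groups.entrySet()) {
            A acc = initial;
            for (T t : g.getValue()) acc = agg.aggregate(acc, t);
            out.put(g.getKey(), acc);
        }
        return out;
    }

    // Every + Buffer: the operation iterates over the group itself.
    static <K, T, R> Map<K, List<R>> everyBuffer(Map<K, List<T>> groups, GroupBuffer<T, R> buf) {
        Map<K, List<R>> out = new LinkedHashMap<>();
        for (Map.Entry<K, List<T>> g : groups.entrySet()) {
            out.put(g.getKey(), buf.operate(g.getValue().iterator()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> evens = eachFilter(Arrays.asList(1, 2, 3, 4, 5, 6), n -> n % 2 != 0);
        List<Integer> expanded = eachFunction(evens, n -> Arrays.asList(n, n + 1));
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (Integer n : expanded) {
            groups.computeIfAbsent(n % 2 == 0 ? "even" : "odd", k -> new ArrayList<>()).add(n);
        }
        Map<String, Integer> sums = everyAggregator(groups, 0, (sum, n) -> sum + n);
        Map<String, List<Integer>> firsts = everyBuffer(groups, it -> {
            List<Integer> out = new ArrayList<>();
            if (it.hasNext()) out.add(it.next());
            return out;
        });
        System.out.println(sums);   // {even=12, odd=15}
        System.out.println(firsts); // {even=[2], odd=[3]}
    }
}
```

The split between Aggregator and Buffer in this model is that an aggregator only ever sees one tuple plus its running value, while a buffer controls the iteration over the group and may emit any number of results.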
  • 89. Example: Wordcount

        // Source tap: each input line arrives in the "line" field
        Scheme sourceScheme = new TextLine( new Fields( "line" ) );
        Tap source = new Hfs( sourceScheme, inputPath );

        // Sink tap: writes "word" and "count", replacing any existing output
        Scheme sinkScheme = new TextLine( new Fields( "word", "count" ) );
        Tap sink = new Hfs( sinkScheme, outputPath, SinkMode.REPLACE );

        Pipe assembly = new Pipe( "wordcount" );

        // Split each line on spaces, emitting one "word" tuple per token.
        // RegexSplitGenerator splits on the pattern; RegexGenerator would emit the matches (the spaces).
        String regex = " ";
        Function function = new RegexSplitGenerator( new Fields( "word" ), regex );
        assembly = new Each( assembly, new Fields( "line" ), function );

        // Group by word and count the tuples in each group
        assembly = new GroupBy( assembly, new Fields( "word" ) );
        Aggregator count = new Count( new Fields( "count" ) );
        assembly = new Every( assembly, count );

        // In Cascading 2.x FlowConnector is abstract, so use the Hadoop connector
        FlowConnector flowConnector = new HadoopFlowConnector();
        Flow flow = flowConnector.connect( "word-count", source, sink, assembly );
        flow.complete(); // run the flow and block until it finishes
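For comparison, the same word-count logic can be written against an in-memory list with the Java streams API. This is a plain-Java analogy of the pipeline stages (split, group, count), not a Cascading program: `WordCountSketch` and `wordCount` are hypothetical names, and splitting on runs of whitespace is an assumption that is slightly more forgiving than the single-space pattern in the slide.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java analogy of the word-count assembly; not Cascading code.
final class WordCountSketch {

    static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
            // Like Each + a split generator: one line in, many words out.
            .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
            // Blank lines yield a single empty token; drop it.
            .filter(word -> !word.isEmpty())
            // Like GroupBy on "word" followed by Every + Count.
            .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the quick fox", "the lazy dog", "the end");
        System.out.println(wordCount(lines)); // {dog=1, end=1, fox=1, lazy=1, quick=1, the=3}
    }
}
```

The TreeMap is used only so the printed result has a stable, sorted order; the Cascading flow makes no such promise about its output files beyond the grouping itself.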
  • 90. References
      1. Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins. 2008. Pig latin: a not-so-foreign language for data processing. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data (SIGMOD ’08). ACM, New York, NY, USA, 1099-1110.
      2. Cascading 2.1 User Guide: http://docs.cascading.org/cascading/2.1/userguide/pdf/userguide.pdf