Pipeline.
TPL Dataflow.
Usage.

by Alexey Kursov
http://www.linkedin.com/in/kursov
TPL Dataflow
The Task Parallel Library (TPL) provides dataflow components to help increase the
robustness of concurrency-enabled applications. These dataflow components are
collectively referred to as the TPL Dataflow Library. The dataflow model provides in-process message passing for coarse-grained dataflow and pipelining tasks...
WTF?

Pipeline? Dataflow?
Pipeline basics
In software engineering, a pipeline consists of a chain of processing elements
(processes, threads, coroutines, etc.), arranged so that the output of each element is
the input of the next. Usually some amount of buffering is provided between
consecutive elements. The information that flows in these pipelines is often a stream
of records, bytes or bits.

The concept is also called the pipes and filters design pattern. It was named by
analogy to a physical pipeline.
Simple example:
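A chain of processing elements like the one described above can be sketched with Python generators. This is only an illustration: the stage names and sample data are hypothetical, and plain generators pass items one at a time with no buffering between stages.

```python
def read_records(lines):
    """Stage 1: emit raw records one at a time."""
    for line in lines:
        yield line


def strip_and_drop_blank(records):
    """Stage 2: trim whitespace and drop empty records."""
    for record in records:
        cleaned = record.strip()
        if cleaned:
            yield cleaned


def to_upper(records):
    """Stage 3: transform each remaining record."""
    for record in records:
        yield record.upper()


# The output of each stage is the input of the next.
raw = ["  alpha ", "", "beta", "   "]
result = list(to_upper(strip_and_drop_blank(read_records(raw))))
print(result)  # ['ALPHA', 'BETA']
```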
Pipeline basics
A linear pipeline is a series of processing stages arranged linearly to
perform a specific function over a data stream. The basic uses of a linear pipeline are
instruction execution, arithmetic computation and memory access.

A non-linear pipeline (also called a dynamic pipeline) can be configured to perform
various functions at different times. A dynamic pipeline also has feed-forward
or feedback connections. A non-linear pipeline also allows very long instruction words.
Pipelines in real life
Dataflow programming
Dataflow programming is a programming paradigm that
models a program as a directed graph of the data flowing
between operations, thus implementing dataflow principles and
architecture.
● emphasizes the movement of data
● a program is a series of connections
● explicitly defined inputs and outputs connect operations
Popular in

● parallel computing frameworks
● database engine designs
● digital signal processing
● network routing
● graphics processing
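To make the "directed graph of operations" idea concrete, here is a minimal sketch that evaluates a tiny dataflow graph in dependency order. The node names and operations are made up for illustration, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each node maps to (operation, names of the nodes whose outputs it consumes).
graph = {
    "a": (lambda: 2, []),
    "b": (lambda: 3, []),
    "sum": (lambda x, y: x + y, ["a", "b"]),
    "square": (lambda s: s * s, ["sum"]),
}


def run(graph):
    """Evaluate every node once all of its explicit inputs are available."""
    dependencies = {name: deps for name, (_, deps) in graph.items()}
    values = {}
    for name in TopologicalSorter(dependencies).static_order():
        operation, deps = graph[name]
        values[name] = operation(*(values[d] for d in deps))
    return values


values = run(graph)
print(values["square"])  # 25
```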
Usage
In Unix-like computer operating systems, a pipeline is the original software pipeline:
a set of processes chained by their standard streams, so that the output of each
process (stdout) feeds directly as input (stdin) to the next one. Each connection is
implemented by an anonymous pipe. Filter programs are often used in this
configuration.
The concept was invented by Douglas McIlroy for Unix shells and was named by analogy
to a physical pipeline.
Abstract and concrete examples:
% program1 | program2 | program3
% ls | grep xxx
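The same stdout-to-stdin chaining can be sketched from Python with `subprocess`. To stay portable, both stages here are small Python one-off scripts rather than `ls` and `grep`; the script contents are purely illustrative.

```python
import subprocess
import sys

# Stage 1 prints three lines; stage 2 is a filter that keeps lines containing "et".
producer_code = "print('alpha'); print('beta'); print('gamma')"
filter_code = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if 'et' in line:\n"
    "        sys.stdout.write(line)\n"
)

producer = subprocess.Popen(
    [sys.executable, "-c", producer_code], stdout=subprocess.PIPE
)
consumer = subprocess.Popen(
    [sys.executable, "-c", filter_code],
    stdin=producer.stdout,  # the pipe connects producer stdout to consumer stdin
    stdout=subprocess.PIPE,
    text=True,
)
producer.stdout.close()  # the consumer now holds the only read end of the pipe
output, _ = consumer.communicate()
producer.wait()

matches = output.split()
print(matches)  # ['beta']
```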
Usage
Cascading is a Java application framework that enables typical developers to
quickly and easily develop rich Data Analytics and Data Management applications
that can be deployed and managed across a variety of computing environments.
Cascading works seamlessly with Apache Hadoop and API compatible distributions.
It follows a ‘source-pipe-sink’ paradigm: data is captured from sources, flows through
reusable ‘pipes’ that perform data analysis processes, and the results are stored in
output files or ‘sinks’.
Usage
Cascading pipeline example:
Usage
Apache Crunch (Simple and Efficient MapReduce Pipelines by Cloudera)
The Apache Crunch Java library provides a framework for writing, testing, and
running MapReduce pipelines. Its goal is to make pipelines that are composed of
many user-defined functions simple to write, easy to test, and efficient to run.

Storm
Storm is a distributed realtime computation system. Similar to how Hadoop provides
a set of general primitives for doing batch processing, Storm provides a set of
general primitives for doing realtime computation. Storm is simple and can be used with
any programming language.
TPL Dataflow
The Task Parallel Library (TPL) provides dataflow components to help increase the
robustness of concurrency-enabled applications. These dataflow components are
collectively referred to as the TPL Dataflow Library.

[Diagram: layered stack with the labels Data Flow, Tasks, Coordination data structures, Task Parallel Library, Threads]
What does it provide for me?
● provides a foundation for message passing and parallelizing CPU-intensive and I/O-intensive applications
● gives you explicit control over how data is buffered and moves around the system
● improves responsiveness and throughput by efficiently managing the underlying threads
● lets you easily create a mesh through which your data flows
● meshes can split and join the data flows, and even contain data flow loops
● lets you create custom blocks and extend functionality
Types of blocks
Dataflow blocks are data structures that buffer and process data.

1. source blocks (act as a source of data): ISourceBlock<TOutput>
2. target blocks (act as a receiver of data): ITargetBlock<TInput>
3. propagator blocks (act as both a source block and a target block): IPropagatorBlock<TInput, TOutput>
Buffering blocks
● BufferBlock<T> - stores a first in, first out (FIFO) queue of messages that can be written to by multiple sources or read from by multiple targets. Once a target receives a message from a BufferBlock<T>, that message is removed from the queue.
  [Diagram labels: input, output (original)]

● BroadcastBlock<T> - broadcasts a message to multiple components.
  [Diagram labels: input, current value, Task, output (originals or copies)]

● WriteOnceBlock<T> - resembles the BroadcastBlock<T> class, except that a WriteOnceBlock<T> object can be written to one time only.
  [Diagram labels: input, first written value (read-only), Task, output (originals or copies)]
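The difference between the three buffering behaviours can be sketched with toy Python classes. These are simplified models written for this comparison, not the .NET API: the class and method names are hypothetical, and linking, copying and threading are left out.

```python
from collections import deque


class ToyBuffer:
    """FIFO queue: each message is handed out once, then removed."""

    def __init__(self):
        self._queue = deque()

    def post(self, message):
        self._queue.append(message)

    def receive(self):
        return self._queue.popleft()


class ToyBroadcast:
    """Keeps only the latest message; reading does not remove it."""

    def __init__(self):
        self._current = None

    def post(self, message):
        self._current = message  # overwrites the previous value

    def receive(self):
        return self._current


class ToyWriteOnce:
    """Accepts the first message and declines every later one."""

    def __init__(self):
        self._value = None
        self._has_value = False

    def post(self, message):
        if self._has_value:
            return False
        self._value, self._has_value = message, True
        return True

    def receive(self):
        return self._value


buf = ToyBuffer()
buf.post(1)
buf.post(2)
buffer_reads = [buf.receive(), buf.receive()]        # [1, 2]

bc = ToyBroadcast()
bc.post(1)
bc.post(2)
broadcast_reads = [bc.receive(), bc.receive()]       # [2, 2]

once = ToyWriteOnce()
write_results = [once.post("first"), once.post("second")]  # [True, False]
once_value = once.receive()                          # 'first'
```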
Execution blocks
● ActionBlock<TInput> - a target block that calls a delegate when it receives data.
  [Diagram labels: input, Task]

● TransformBlock<TInput, TOutput> - acts as both a source and a target; the delegate that you pass must return a value of type TOutput.
  [Diagram labels: input, Task, output]

● TransformManyBlock<TInput, TOutput> - resembles TransformBlock, except that it produces zero or more output values for each input value, instead of exactly one.
  [Diagram labels: input, Task, output]
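The three execution behaviours differ only in how many outputs each input produces. A rough sketch with plain Python functions follows; the helper names are hypothetical and the real blocks run their delegates asynchronously, which this sketch ignores.

```python
def transform(func, items):
    """Exactly one output per input (TransformBlock-like)."""
    return [func(item) for item in items]


def transform_many(func, items):
    """Zero or more outputs per input (TransformManyBlock-like)."""
    return [out for item in items for out in func(item)]


def action(func, items):
    """Consume each input and produce no output (ActionBlock-like)."""
    for item in items:
        func(item)


lengths = transform(len, ["ab", "cde"])           # [2, 3]
chars = transform_many(list, ["ab", "", "cde"])   # ['a', 'b', 'c', 'd', 'e']

seen = []
action(seen.append, lengths)                      # seen becomes [2, 3]
```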
Grouping blocks
● BatchBlock<T> - combines sets of input data, which are known as batches, into arrays of output data.
  [Diagram labels: input, Task, output]

● JoinBlock<T1, T2> and JoinBlock<T1, T2, T3> - collect input elements and propagate System.Tuple<T1, T2> or System.Tuple<T1, T2, T3> objects that contain those elements.
  [Diagram labels: input (T1), input (T2), Task, output]

● BatchedJoinBlock<T1, T2> and BatchedJoinBlock<T1, T2, T3> - collect batches of input elements and propagate System.Tuple<IList<T1>, IList<T2>> or System.Tuple<IList<T1>, IList<T2>, IList<T3>> objects that contain those elements.
  [Diagram labels: input (T1), input (T2), Task, output]
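The grouping behaviours can be illustrated with small Python helpers. This is a simplified sketch under stated assumptions: the function names are hypothetical, inputs arrive as finished lists rather than as a live stream, and leftover items in `batched_join` are simply dropped.

```python
def batch(items, batch_size):
    """Group items into lists of batch_size; the final batch may be smaller."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def join(first, second):
    """Pair the n-th item of each input into a tuple."""
    return list(zip(first, second))


def batched_join(events, batch_size):
    """Collect (target_index, value) arrivals until batch_size items have been
    seen across both inputs, then emit one (list_for_0, list_for_1) tuple."""
    batches, left, right = [], [], []
    for target, value in events:
        (left if target == 0 else right).append(value)
        if len(left) + len(right) == batch_size:
            batches.append((left, right))
            left, right = [], []
    return batches


batches = batch([1, 2, 3, 4, 5], 2)            # [[1, 2], [3, 4], [5]]
pairs = join([1, 2, 3], ["a", "b"])            # [(1, 'a'), (2, 'b')]
joined = batched_join([(0, 1), (1, "a"), (0, 2), (0, 3)], 2)
# [([1], ['a']), ([2, 3], [])]
```

Note how `batched_join` can emit a tuple in which one list is empty, since it counts arrivals across both inputs rather than waiting for one item from each.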
LinkTo and Predicate
Link/Unlink
The ISourceBlock<TOutput>.LinkTo method links a source dataflow block to a target block and returns an
IDisposable. To unlink the blocks, call Dispose on the object that LinkTo returned. The predefined
dataflow block types handle all thread-safety aspects of linking and unlinking. The source is also unlinked
automatically if you set MaxMessages in DataflowLinkOptions to a positive value (the default, -1, means
unbounded): the link is removed once the declared number of messages has been received.

Predicate
When you link a target block you can supply a predicate that checks each message before it is added to the
target's input buffer. Pass the delegate to the LinkTo overload that accepts a Predicate<TOutput> (it is an
argument of LinkTo, not a property of DataflowLinkOptions). The delegate receives a message of the source's
TOutput type and returns a bool value.
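Linking, predicates, message limits and unlinking can be modeled with a toy source in Python. Everything here (`ToySource`, `link_to`, plain lists as targets) is a hypothetical stand-in and not the .NET API; in particular, this toy discards a message that no target accepts instead of keeping it buffered.

```python
class _Link:
    def __init__(self, target, predicate, remaining):
        self.target = target
        self.predicate = predicate
        self.remaining = remaining  # -1 means unbounded


class ToySource:
    def __init__(self):
        self._links = []

    def link_to(self, target, predicate=None, max_messages=-1):
        """Link a target (a plain list here); returns a function that unlinks it."""
        link = _Link(target, predicate, max_messages)
        self._links.append(link)

        def unlink():
            if link in self._links:
                self._links.remove(link)

        return unlink

    def post(self, message):
        """Offer the message to links in order; the first accepting target wins."""
        for link in list(self._links):
            if link.predicate is not None and not link.predicate(message):
                continue  # this target declines, so try the next link
            link.target.append(message)
            if link.remaining > 0:
                link.remaining -= 1
                if link.remaining == 0:
                    self._links.remove(link)  # automatic unlink
            return True
        return False


source = ToySource()
first_two, evens, rest = [], [], []
source.link_to(first_two, max_messages=2)
source.link_to(evens, predicate=lambda n: n % 2 == 0)
unlink_rest = source.link_to(rest)

for n in range(1, 7):
    source.post(n)

unlink_rest()
accepted_after_unlink = source.post(7)  # odd, and only the 'evens' link remains
```

After the loop, `first_two` holds `[1, 2]`, `evens` holds `[4, 6]` and `rest` holds `[3, 5]`; the final post returns `False`.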
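Other options
You can specify:
● degree of parallelism for a block (MaxDegreeOfParallelism)
● maximum number of messages that may be buffered by the block (BoundedCapacity)
● task scheduler (TaskScheduler)
● maximum number of messages per task (MaxMessagesPerTask)
● cancellation (CancellationToken)
● greedy behavior (Greedy)
● completion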
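Two of these options, degree of parallelism and bounded buffering, have rough analogues in the Python standard library. The sketch below only illustrates the ideas; the function and parameter names are hypothetical and it does not reproduce how the .NET blocks schedule work.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def run_stage(func, items, max_degree_of_parallelism=1):
    """Process items with at most max_degree_of_parallelism worker threads."""
    with ThreadPoolExecutor(max_workers=max_degree_of_parallelism) as pool:
        return list(pool.map(func, items))  # map returns results in input order


def offer_all(messages, bounded_capacity):
    """Offer messages to a bounded buffer; record which ones were accepted."""
    buffer = queue.Queue(maxsize=bounded_capacity)
    accepted = []
    for message in messages:
        try:
            buffer.put_nowait(message)
            accepted.append(True)
        except queue.Full:
            accepted.append(False)  # the buffer is full, so the message is declined
    return accepted


squares = run_stage(lambda n: n * n, range(5), max_degree_of_parallelism=4)
accepted = offer_all(["a", "b", "c"], bounded_capacity=2)
print(squares)   # [0, 1, 4, 9, 16]
print(accepted)  # [True, True, False]
```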
Recommendations
Recommendations for building TPL Dataflow pipelines:
● make each block do one thing well
● design for composition
● be stateless where you can
Use cases
1. Prototyping pipelines for use in more complex systems
2. Development of flexible asynchronous applications that process some data, like:
   ○ Image processors
   ○ Sound processors
   ○ Pipelines in mobile phone apps
   ○ Data analysis/mining services
   ○ Web-crawlers
   ○ etc.
3. Studying pipeline-based development
Practice
Useful links
● http://www.nuget.org/packages/Microsoft.Tpl.Dataflow/
● http://msdn.microsoft.com/en-us/library/hh228603.aspx
● http://blogs.microsoft.co.il/blogs/bnaya/archive/2012/01/28/tpl-dataflow-walkthrough-part-5.aspx
● http://www.cascading.org/
● http://crunch.apache.org/
● http://storm-project.net/
Thanks for your attention!

Weitere ähnliche Inhalte

Was ist angesagt?

Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep DiveVasia Kalavri
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Stephan Ewen
 
Apache Flink @ NYC Flink Meetup
Apache Flink @ NYC Flink MeetupApache Flink @ NYC Flink Meetup
Apache Flink @ NYC Flink MeetupStephan Ewen
 
Anwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in FlinkAnwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in FlinkFlink Forward
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingKostas Tzoumas
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseFlink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseKostas Tzoumas
 
Go faster with_native_compilation Part-2
Go faster with_native_compilation Part-2Go faster with_native_compilation Part-2
Go faster with_native_compilation Part-2Rajeev Rastogi (KRR)
 
Flink Batch Processing and Iterations
Flink Batch Processing and IterationsFlink Batch Processing and Iterations
Flink Batch Processing and IterationsSameer Wadkar
 
Vasia Kalavri – Training: Gelly School
Vasia Kalavri – Training: Gelly School Vasia Kalavri – Training: Gelly School
Vasia Kalavri – Training: Gelly School Flink Forward
 
A Deep Dive into Structured Streaming in Apache Spark
A Deep Dive into Structured Streaming in Apache Spark A Deep Dive into Structured Streaming in Apache Spark
A Deep Dive into Structured Streaming in Apache Spark Anyscale
 
Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016Stephan Ewen
 
Virtual Flink Forward 2020: Cogynt: Flink without code - Samantha Chan, Aslam...
Virtual Flink Forward 2020: Cogynt: Flink without code - Samantha Chan, Aslam...Virtual Flink Forward 2020: Cogynt: Flink without code - Samantha Chan, Aslam...
Virtual Flink Forward 2020: Cogynt: Flink without code - Samantha Chan, Aslam...Flink Forward
 
CBStreams - Java Streams for ColdFusion (CFML)
CBStreams - Java Streams for ColdFusion (CFML)CBStreams - Java Streams for ColdFusion (CFML)
CBStreams - Java Streams for ColdFusion (CFML)Ortus Solutions, Corp
 
The Road To Reactive with RxJava JEEConf 2016
The Road To Reactive with RxJava JEEConf 2016The Road To Reactive with RxJava JEEConf 2016
The Road To Reactive with RxJava JEEConf 2016Frank Lyaruu
 
Batch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache FlinkBatch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache FlinkVasia Kalavri
 
Universal metrics with Apache Beam
Universal metrics with Apache BeamUniversal metrics with Apache Beam
Universal metrics with Apache BeamEtienne Chauchot
 
Stream data mining & CluStream framework
Stream data mining & CluStream frameworkStream data mining & CluStream framework
Stream data mining & CluStream frameworkYueshen Xu
 
Tech Talk @ Google on Flink Fault Tolerance and HA
Tech Talk @ Google on Flink Fault Tolerance and HATech Talk @ Google on Flink Fault Tolerance and HA
Tech Talk @ Google on Flink Fault Tolerance and HAParis Carbone
 

Was ist angesagt? (20)

Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep Dive
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)
 
Apache Flink @ NYC Flink Meetup
Apache Flink @ NYC Flink MeetupApache Flink @ NYC Flink Meetup
Apache Flink @ NYC Flink Meetup
 
Anwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in FlinkAnwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in Flink
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
06 u 2
06 u 206 u 2
06 u 2
 
Structured streaming in Spark
Structured streaming in SparkStructured streaming in Spark
Structured streaming in Spark
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseFlink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San Jose
 
Go faster with_native_compilation Part-2
Go faster with_native_compilation Part-2Go faster with_native_compilation Part-2
Go faster with_native_compilation Part-2
 
Flink Batch Processing and Iterations
Flink Batch Processing and IterationsFlink Batch Processing and Iterations
Flink Batch Processing and Iterations
 
Vasia Kalavri – Training: Gelly School
Vasia Kalavri – Training: Gelly School Vasia Kalavri – Training: Gelly School
Vasia Kalavri – Training: Gelly School
 
A Deep Dive into Structured Streaming in Apache Spark
A Deep Dive into Structured Streaming in Apache Spark A Deep Dive into Structured Streaming in Apache Spark
A Deep Dive into Structured Streaming in Apache Spark
 
Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016
 
Virtual Flink Forward 2020: Cogynt: Flink without code - Samantha Chan, Aslam...
Virtual Flink Forward 2020: Cogynt: Flink without code - Samantha Chan, Aslam...Virtual Flink Forward 2020: Cogynt: Flink without code - Samantha Chan, Aslam...
Virtual Flink Forward 2020: Cogynt: Flink without code - Samantha Chan, Aslam...
 
CBStreams - Java Streams for ColdFusion (CFML)
CBStreams - Java Streams for ColdFusion (CFML)CBStreams - Java Streams for ColdFusion (CFML)
CBStreams - Java Streams for ColdFusion (CFML)
 
The Road To Reactive with RxJava JEEConf 2016
The Road To Reactive with RxJava JEEConf 2016The Road To Reactive with RxJava JEEConf 2016
The Road To Reactive with RxJava JEEConf 2016
 
Batch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache FlinkBatch and Stream Graph Processing with Apache Flink
Batch and Stream Graph Processing with Apache Flink
 
Universal metrics with Apache Beam
Universal metrics with Apache BeamUniversal metrics with Apache Beam
Universal metrics with Apache Beam
 
Stream data mining & CluStream framework
Stream data mining & CluStream frameworkStream data mining & CluStream framework
Stream data mining & CluStream framework
 
Tech Talk @ Google on Flink Fault Tolerance and HA
Tech Talk @ Google on Flink Fault Tolerance and HATech Talk @ Google on Flink Fault Tolerance and HA
Tech Talk @ Google on Flink Fault Tolerance and HA
 

Ähnlich wie Tpl dataflow

Real time data-pipeline from inception to production
Real time data-pipeline from inception to productionReal time data-pipeline from inception to production
Real time data-pipeline from inception to productionShreya Mukhopadhyay
 
Integration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsIntegration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsMichael Häusler
 
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of UsPossible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of UsBertram Ludäscher
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
2007 Tidc India Profiling
2007 Tidc India Profiling2007 Tidc India Profiling
2007 Tidc India Profilingdanrinkes
 
Task 803   - 1 page Instructions Distinguish between full con.docx
Task 803   - 1 page Instructions Distinguish between full con.docxTask 803   - 1 page Instructions Distinguish between full con.docx
Task 803   - 1 page Instructions Distinguish between full con.docxrudybinks
 
ITB2019 CBStreams : Accelerate your Functional Programming with the power of ...
ITB2019 CBStreams : Accelerate your Functional Programming with the power of ...ITB2019 CBStreams : Accelerate your Functional Programming with the power of ...
ITB2019 CBStreams : Accelerate your Functional Programming with the power of ...Ortus Solutions, Corp
 
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data StreamsMachine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data StreamsLightbend
 
The life of a query (oracle edition)
The life of a query (oracle edition)The life of a query (oracle edition)
The life of a query (oracle edition)maclean liu
 
Towards sql for streams
Towards sql for streamsTowards sql for streams
Towards sql for streamsRadu Tudoran
 
Taking Spark Streaming to the Next Level with Datasets and DataFrames
Taking Spark Streaming to the Next Level with Datasets and DataFramesTaking Spark Streaming to the Next Level with Datasets and DataFrames
Taking Spark Streaming to the Next Level with Datasets and DataFramesDatabricks
 
Software architecture unit 4
Software architecture unit 4Software architecture unit 4
Software architecture unit 4yawani05
 
The Overview of Discovery and Reconciliation of LTE Network
The Overview of Discovery and Reconciliation of LTE NetworkThe Overview of Discovery and Reconciliation of LTE Network
The Overview of Discovery and Reconciliation of LTE NetworkIRJET Journal
 
Web based-distributed-sesnzer-using-service-oriented-architecture
Web based-distributed-sesnzer-using-service-oriented-architectureWeb based-distributed-sesnzer-using-service-oriented-architecture
Web based-distributed-sesnzer-using-service-oriented-architectureAidah Izzah Huriyah
 

Ähnlich wie Tpl dataflow (20)

Real time data-pipeline from inception to production
Real time data-pipeline from inception to productionReal time data-pipeline from inception to production
Real time data-pipeline from inception to production
 
Google cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache FlinkGoogle cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache Flink
 
Integration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsIntegration Patterns for Big Data Applications
Integration Patterns for Big Data Applications
 
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of UsPossible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
 
ETL DW-RealTime
ETL DW-RealTimeETL DW-RealTime
ETL DW-RealTime
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
2007 Tidc India Profiling
2007 Tidc India Profiling2007 Tidc India Profiling
2007 Tidc India Profiling
 
Task 803   - 1 page Instructions Distinguish between full con.docx
Task 803   - 1 page Instructions Distinguish between full con.docxTask 803   - 1 page Instructions Distinguish between full con.docx
Task 803   - 1 page Instructions Distinguish between full con.docx
 
ITB2019 CBStreams : Accelerate your Functional Programming with the power of ...
ITB2019 CBStreams : Accelerate your Functional Programming with the power of ...ITB2019 CBStreams : Accelerate your Functional Programming with the power of ...
ITB2019 CBStreams : Accelerate your Functional Programming with the power of ...
 
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data StreamsMachine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams
 
The life of a query (oracle edition)
The life of a query (oracle edition)The life of a query (oracle edition)
The life of a query (oracle edition)
 
Towards sql for streams
Towards sql for streamsTowards sql for streams
Towards sql for streams
 
Distributed Computing
Distributed ComputingDistributed Computing
Distributed Computing
 
Taking Spark Streaming to the Next Level with Datasets and DataFrames
Taking Spark Streaming to the Next Level with Datasets and DataFramesTaking Spark Streaming to the Next Level with Datasets and DataFrames
Taking Spark Streaming to the Next Level with Datasets and DataFrames
 
Software architecture unit 4
Software architecture unit 4Software architecture unit 4
Software architecture unit 4
 
Interpreting the Data:Parallel Analysis with Sawzall
Interpreting the Data:Parallel Analysis with SawzallInterpreting the Data:Parallel Analysis with Sawzall
Interpreting the Data:Parallel Analysis with Sawzall
 
Data provenance in Hopsworks
Data provenance in HopsworksData provenance in Hopsworks
Data provenance in Hopsworks
 
The Overview of Discovery and Reconciliation of LTE Network
The Overview of Discovery and Reconciliation of LTE NetworkThe Overview of Discovery and Reconciliation of LTE Network
The Overview of Discovery and Reconciliation of LTE Network
 
2017 nov reflow sbtb
2017 nov reflow sbtb2017 nov reflow sbtb
2017 nov reflow sbtb
 
Web based-distributed-sesnzer-using-service-oriented-architecture
Web based-distributed-sesnzer-using-service-oriented-architectureWeb based-distributed-sesnzer-using-service-oriented-architecture
Web based-distributed-sesnzer-using-service-oriented-architecture
 

Kürzlich hochgeladen

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Slack Application Development 101 Slides

TPL Dataflow

  • 2. Pipeline. TPL Dataflow. Usage. by Alexey Kursov http://www.linkedin.com/in/kursov
  • 3. TPL Dataflow The Task Parallel Library (TPL) provides dataflow components to help increase the robustness of concurrency-enabled applications. These dataflow components are collectively referred to as the TPL Dataflow Library. The dataflow model provides in-process message passing for coarse-grained dataflow and pipelining tasks...
  • 5. Pipeline basics In software engineering, a pipeline consists of a chain of processing elements (processes, threads, coroutines, etc.), arranged so that the output of each element is the input of the next. Usually some amount of buffering is provided between consecutive elements. The information that flows in these pipelines is often a stream of records, bytes or bits. The concept is also called the pipes and filters design pattern. It was named by analogy to a physical pipeline. Simple example:
  • 6. Pipeline basics A linear pipeline is a series of processing stages arranged linearly to perform a specific function over a data stream. The basic uses of a linear pipeline are instruction execution, arithmetic computation and memory access. A non-linear pipeline (also called a dynamic pipeline) can be configured to perform various functions at different times. A dynamic pipeline also has feed-forward or feedback connections. A non-linear pipeline also allows very long instruction words.
  • 9. Dataflow programming Dataflow programming is a programming paradigm that models a program as a directed graph of the data flowing between operations, thus implementing dataflow principles and architecture.
      ● emphasizes the movement of data
      ● the program is a series of connections
      ● explicitly defined inputs and outputs connect operations
  • 10. Popular in
      ● parallel computing frameworks
      ● database engine designs
      ● digital signal processing
      ● network routing
      ● graphics processing
  • 11. Usage In Unix-like computer operating systems, a pipeline is the original software pipeline: a set of processes chained by their standard streams, so that the output of each process (stdout) feeds directly as input (stdin) to the next one. Each connection is implemented by an anonymous pipe. Filter programs are often used in this configuration. The concept was invented by Douglas McIlroy for Unix shells and it was named by analogy to a physical pipeline. Abstract and concrete examples:
      % program1 | program2 | program3
      % ls | grep xxx
  • 12. Usage Cascading is a Java application framework that enables typical developers to quickly and easily develop rich Data Analytics and Data Management applications that can be deployed and managed across a variety of computing environments. Cascading works seamlessly with Apache Hadoop and API-compatible distributions. It follows a ‘source-pipe-sink’ paradigm: data is captured from sources, flows through reusable ‘pipes’ that perform data analysis processes, and the results are stored in output files or ‘sinks’.
  • 14. Usage
      Apache Crunch (Simple and Efficient MapReduce Pipelines by Cloudera): The Apache Crunch Java library provides a framework for writing, testing, and running MapReduce pipelines. Its goal is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run.
      Storm: Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple and can be used with any programming language.
  • 15. TPL Dataflow The Task Parallel Library (TPL) provides dataflow components to help increase the robustness of concurrency-enabled applications. These dataflow components are collectively referred to as the TPL Dataflow Library.
      [Diagram labels: Data Flow; Tasks; Coordination data structures; Task Parallel Library; Threads]
  • 16. What does it provide for me?
      ● provides a foundation for message passing and parallelizing CPU-intensive and I/O-intensive applications
      ● gives you explicit control over how data is buffered and moves around the system
      ● improves responsiveness and throughput by efficiently managing the underlying threads
      ● allows you to easily create a mesh through which your data flows
      ● meshes can split and join the data flows, and even contain data flow loops
      ● allows you to create custom blocks and extend functionality
  • 17. Types of blocks Dataflow blocks are data structures that buffer and process data.
      1. source blocks (act as a source of data): ISourceBlock<TOutput>
      2. target blocks (act as a receiver of data): ITargetBlock<TInput>
      3. propagator blocks (act as both a source block and a target block): IPropagatorBlock<TInput, TOutput>
  • 18. Buffering blocks
      ● BufferBlock<T> - stores a first in, first out (FIFO) queue of messages that can be written to by multiple sources or read from by multiple targets. When a target receives a message from a BufferBlock<T>, that message is removed from the queue. [Diagram: input, queue, output (original)]
      ● BroadcastBlock<T> - broadcasts a message to multiple components. [Diagram: input, current value, output (originals or copies)]
      ● WriteOnceBlock<T> - resembles the BroadcastBlock<T> class, except that a WriteOnceBlock<T> object can be written to one time only. [Diagram: input, first written value (read-only), output (originals or copies)]
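The difference between a queueing block and a write-once block can be sketched in a few lines. This is a minimal illustration, assuming a modern .NET project that references the System.Threading.Tasks.Dataflow package; the class and method names (BufferingBlocksSketch, DrainBuffer, WriteOnce) are hypothetical and not part of the deck.

```csharp
using System;
using System.Threading.Tasks.Dataflow;

// Hypothetical helper class, for illustration only.
public static class BufferingBlocksSketch
{
    // BufferBlock<T> hands messages out in FIFO order; each Receive removes one.
    public static int[] DrainBuffer()
    {
        var buffer = new BufferBlock<int>();
        buffer.Post(1);
        buffer.Post(2);
        buffer.Post(3);
        return new[] { buffer.Receive(), buffer.Receive(), buffer.Receive() };
    }

    // WriteOnceBlock<T> keeps only the first value; later posts are declined,
    // and the stored value can be read repeatedly.
    public static (bool first, bool second, int read1, int read2) WriteOnce()
    {
        var once = new WriteOnceBlock<int>(cloningFunction: null);
        bool first = once.Post(10);   // accepted
        bool second = once.Post(20);  // declined, a value is already stored
        return (first, second, once.Receive(), once.Receive());
    }
}
```

Post returns a bool that reports whether the block accepted the message, which is how the second write to the WriteOnceBlock<T> is observed here.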
  • 19. Execution blocks
      ● ActionBlock<TInput> - a target block that calls a delegate when it receives data. [Diagram: input, Task]
      ● TransformBlock<TInput, TOutput> - acts as both a source and a target; the delegate that you pass must return a value of type TOutput. [Diagram: input, Task, output]
      ● TransformManyBlock<TInput, TOutput> - resembles TransformBlock, except that it produces zero or more output values for each input value instead of exactly one. [Diagram: input, Task, output]
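The three execution blocks can be chained into a small pipeline. The sketch below is one possible arrangement, not the deck's own example: it trims lines, splits them into words, and collects the words. ExecutionBlocksSketch and RunAsync are hypothetical names, and the code assumes a reference to the System.Threading.Tasks.Dataflow package.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical helper class, for illustration only.
public static class ExecutionBlocksSketch
{
    public static async Task<List<string>> RunAsync(IEnumerable<string> lines)
    {
        var seen = new List<string>();

        // One output per input.
        var trim = new TransformBlock<string, string>(line => line.Trim());

        // Zero or more outputs per input.
        var split = new TransformManyBlock<string, string>(
            line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));

        // Terminal block; default parallelism is 1, so List.Add is not called concurrently.
        var collect = new ActionBlock<string>(word => seen.Add(word));

        // PropagateCompletion lets Complete() on the head flow down the chain.
        var link = new DataflowLinkOptions { PropagateCompletion = true };
        trim.LinkTo(split, link);
        split.LinkTo(collect, link);

        foreach (var line in lines) trim.Post(line);
        trim.Complete();
        await collect.Completion;
        return seen;
    }
}
```

Awaiting the Completion task of the last block is what tells the caller that every message has passed through the whole chain.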
  • 20. Grouping blocks
      ● BatchBlock<T> - combines sets of input data, which are known as batches, into arrays of output data. [Diagram: input, Task, output]
      ● JoinBlock<T1, T2> and JoinBlock<T1, T2, T3> - collect input elements and propagate out System.Tuple<T1, T2> or System.Tuple<T1, T2, T3> objects that contain those elements. [Diagram: input (T1), input (T2), Task, output]
      ● BatchedJoinBlock<T1, T2> and BatchedJoinBlock<T1, T2, T3> - collect batches of input elements and propagate out System.Tuple<IList<T1>, IList<T2>> or System.Tuple<IList<T1>, IList<T2>, IList<T3>> objects that contain those elements. [Diagram: input (T1), input (T2), Task, output]
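A short sketch of two of the grouping blocks, assuming the System.Threading.Tasks.Dataflow package is referenced. GroupingBlocksSketch, BatchInThreesAsync and JoinOnceAsync are hypothetical names chosen for the illustration, and the batch size of 3 is arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical helper class, for illustration only.
public static class GroupingBlocksSketch
{
    // BatchBlock<T> groups inputs into arrays of the requested size.
    public static async Task<List<int[]>> BatchInThreesAsync(IEnumerable<int> values)
    {
        var batcher = new BatchBlock<int>(3);
        foreach (var v in values) batcher.Post(v);
        batcher.Complete(); // the leftover items are emitted as a final, smaller batch

        var batches = new List<int[]>();
        while (await batcher.OutputAvailableAsync())
        {
            while (batcher.TryReceive(out int[] batch)) batches.Add(batch);
        }
        return batches;
    }

    // JoinBlock<T1, T2> pairs one item from each target into a tuple.
    public static async Task<Tuple<int, string>> JoinOnceAsync(int number, string text)
    {
        var joiner = new JoinBlock<int, string>();
        joiner.Target1.Post(number);
        joiner.Target2.Post(text);
        return await joiner.ReceiveAsync();
    }
}
```

The JoinBlock<T1, T2> only produces a tuple once both of its targets have received a value, which is why each target is posted to before the receive.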
  • 21. LinkTo and Predicate
      Link/Unlink: The ISourceBlock<TOutput>.LinkTo method links a source dataflow block to a target block and returns an IDisposable. To unlink the blocks, call Dispose on the object returned by LinkTo. The predefined dataflow block types handle all thread-safety aspects of linking and unlinking. The source is also unlinked automatically if you set MaxMessages in DataflowLinkOptions to a positive value (the default, -1, means unlimited): the link is removed once the declared number of messages has been received.
      Predicate: When you link a target block, you can supply a predicate that checks each message before it is added to the target's input buffer. The predicate is a delegate, passed to a LinkTo overload alongside the optional DataflowLinkOptions, that receives a message of the target block's TInput type and returns a bool value.
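Predicate-based linking can be used to route messages. The following sketch sends even numbers to one target and everything else to a second, unfiltered target; LinkingSketch and RouteAsync are hypothetical names, and the code assumes the System.Threading.Tasks.Dataflow package.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical helper class, for illustration only.
public static class LinkingSketch
{
    public static async Task<(List<int> evens, List<int> rest)> RouteAsync(IEnumerable<int> values)
    {
        var evens = new List<int>();
        var rest = new List<int>();

        var source = new BufferBlock<int>();
        var evenSink = new ActionBlock<int>(n => evens.Add(n));
        var restSink = new ActionBlock<int>(n => rest.Add(n));

        var options = new DataflowLinkOptions { PropagateCompletion = true };

        // Links are offered a message in the order they were created.
        // LinkTo returns an IDisposable; disposing it would remove the link.
        IDisposable evenLink = source.LinkTo(evenSink, options, n => n % 2 == 0);
        IDisposable restLink = source.LinkTo(restSink, options); // takes what the predicate declined

        foreach (var v in values) source.Post(v);
        source.Complete();
        await Task.WhenAll(evenSink.Completion, restSink.Completion);
        return (evens, rest);
    }
}
```

The unfiltered second link matters: a BufferBlock<T> message that every linked target declines stays at the head of the queue and holds up the messages behind it.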
  • 22. Other options You can specify:
      ● degree of parallelism for the block
      ● maximum number of messages that may be buffered by the block
      ● task scheduler
      ● number of messages per task
      ● cancellation
      ● greedy behavior
      ● completion
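Most of these settings are properties on the options objects passed to a block's constructor. The sketch below sets several of them through ExecutionDataflowBlockOptions; the concrete values (4, 8, 16) are arbitrary, and OptionsSketch and SumOfSquaresAsync are hypothetical names. Greedy behavior is configured separately through GroupingDataflowBlockOptions.Greedy, and completion propagation through DataflowLinkOptions.PropagateCompletion.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical helper class, for illustration only.
public static class OptionsSketch
{
    public static async Task<int> SumOfSquaresAsync(
        IEnumerable<int> values,
        CancellationToken token = default(CancellationToken))
    {
        var options = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 4,            // up to 4 messages processed at once
            BoundedCapacity = 8,                   // at most 8 messages buffered by the block
            MaxMessagesPerTask = 16,               // messages handled per underlying task
            TaskScheduler = TaskScheduler.Default, // where the block's tasks run
            CancellationToken = token              // cancels the block when signalled
        };

        var square = new TransformBlock<int, int>(n => n * n, options);

        int total = 0;
        // Default options: one message at a time, so the sum needs no locking.
        var sum = new ActionBlock<int>(n => { total += n; });

        square.LinkTo(sum, new DataflowLinkOptions { PropagateCompletion = true });

        // SendAsync waits when the bounded block is full, where Post would return false.
        foreach (var v in values) await square.SendAsync(v, token);
        square.Complete();
        await sum.Completion;
        return total;
    }
}
```

With a bounded capacity, SendAsync gives the producer back-pressure: it slows down to the pace of the block instead of dropping messages.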
  • 23. Recommendations Recommendations for building TPL Dataflow pipelines:
      ● make each block do one thing well
      ● design for composition
      ● be stateless where you can
  • 24. Use cases
      1. Prototyping pipelines for use in more complex systems
      2. Development of flexible asynchronous applications that process some data, like:
          ○ Image processors
          ○ Sound processors
          ○ Pipelines in mobile phone apps
          ○ Data analysis/mining services
          ○ Web crawlers, etc.
      3. Studying pipeline-based development
  • 27. Thanks for your attention!