Apache Flink

A brief introduction to Apache Flink.

  1. Apache Flink Introduction. By: Ahmed Nader
  2. Agenda • What’s Apache Flink? • Deeper into Flink • Quick Start and Configuration • Get your hands dirty • Tips and some useful links • References
  3. What’s Apache Flink? • Open source platform for distributed stream and batch processing. • Large-scale data processing engine. • A real streaming engine: it does not cut the stream into batches. • Flink has 2 APIs: DataStream and DataSet.
  4. DataStream API • Represents a continuous stream of data of a certain type. • Operations are applied to each element of the stream or to windows of elements. • (Diagram: Source → Data Stream → Operation → Data Stream → Sink)
  5. DataStream API • Example: live stock feed. • (Diagram labels: events Apple 235, Google 516, Microsoft 124; alert if Microsoft > 120; sum every 10 seconds; alert if sum > 10000; write event to database)
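
The stock feed example above can be sketched in plain Java to show the difference between a per-element rule and a windowed rule. This is only an illustration of the idea, not Flink code: the class `StockFeedSketch`, the `Quote` type, and the event timestamps are all hypothetical, and the windowing here is a simple tumbling 10-second bucket computed from those assumed timestamps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the slide's stock feed; not the Flink API.
final class StockFeedSketch {

    // One quote from the feed. timestampMillis is an assumed event time.
    static final class Quote {
        final String symbol;
        final int price;
        final long timestampMillis;

        Quote(String symbol, int price, long timestampMillis) {
            this.symbol = symbol;
            this.price = price;
            this.timestampMillis = timestampMillis;
        }
    }

    // Per-element rule: alert if Microsoft > 120.
    static List<String> priceAlerts(List<Quote> feed) {
        List<String> alerts = new ArrayList<>();
        for (Quote q : feed) {
            if (q.symbol.equals("Microsoft") && q.price > 120) {
                alerts.add("ALERT " + q.symbol + " " + q.price);
            }
        }
        return alerts;
    }

    // Windowed rule: sum prices per tumbling window, keyed by window start.
    static Map<Long, Integer> sumPerWindow(List<Quote> feed, long windowMillis) {
        Map<Long, Integer> sums = new TreeMap<>();
        for (Quote q : feed) {
            long windowStart = (q.timestampMillis / windowMillis) * windowMillis;
            sums.merge(windowStart, q.price, Integer::sum);
        }
        return sums;
    }

    // Rule on the windowed result: which windows have a sum above the threshold.
    static List<Long> windowsOver(Map<Long, Integer> sums, int threshold) {
        List<Long> starts = new ArrayList<>();
        for (Map.Entry<Long, Integer> e : sums.entrySet()) {
            if (e.getValue() > threshold) {
                starts.add(e.getKey());
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        List<Quote> feed = new ArrayList<>();
        feed.add(new Quote("Apple", 235, 1_000L));
        feed.add(new Quote("Google", 516, 4_000L));
        feed.add(new Quote("Microsoft", 124, 12_000L));

        System.out.println(priceAlerts(feed));
        Map<Long, Integer> sums = sumPerWindow(feed, 10_000L);
        System.out.println(sums);
        System.out.println(windowsOver(sums, 10_000));
    }
}
```

In a real streaming job the feed never ends, so the window sums would be emitted continuously as each 10-second window closes rather than computed once over a list.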
  6. DataSet API • Used for batch processing. • Batch is treated as a special case of stream processing: finite data sources are just streams that happen to end. • Offers a dedicated API, with machine learning and graph processing libraries on top. • (Diagram: Source → Data Set → Operation → Data Set → Sink)
  7. DataSet API • Example: the Map/Reduce paradigm. • (Diagram labels: Map, Reduce, a 1 2 …)
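
Since the slide's diagram did not survive, here is a minimal plain-Java sketch of the Map/Reduce paradigm, assuming the usual word-count example (the slide does not confirm which example it showed). The class and method names are hypothetical, and this is not the DataSet API: the map phase turns each word into a (word, 1) pair and the reduce phase sums the values per key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of map and reduce phases; not the Flink DataSet API.
final class MapReduceSketch {

    // Map phase: emit one (word, 1) pair per input word.
    static List<Map.Entry<String, Integer>> mapPhase(List<String> words) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : words) {
            pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return pairs;
    }

    // Reduce phase: group pairs by key and sum their values.
    static Map<String, Integer> reducePhase(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "a");
        System.out.println(reducePhase(mapPhase(words)));
    }
}
```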
  8. Flink Stack
  9. Analyzing the Flink stack
     • A streaming dataflow runtime, which interprets every program as a dataflow graph.
     • Libraries on top of the DataStream and DataSet APIs, such as:
     • Table: enables SQL-like queries.
     • Gelly: graph processing, to transform and traverse graphs in a distributed fashion.
     • ML: has a couple of machine learning algorithms, but is still too basic.
     • CEP: makes it easy to detect complex events in a data stream, which helps you get hold of what is really important in your data.
  10. Deeper into Flink: Data Sources • From an input file • From a socket • From a collection
  11. Deeper into Flink: Data Sinks • Write to a CSV file • Write to a socket • Print on the terminal
  12. Deeper into Flink: Data Transformations (for the DataStream API)
     • Map: takes 1 element and produces 1 element.
     • FlatMap: takes 1 element and produces 0 or more elements.
     • Filter: evaluates a boolean function for each element and retains those for which it returns true.
     • KeyBy: partitions a stream into disjoint partitions, each holding the elements with the same key.
     • Window: groups stream events according to some characteristic, e.g. data that arrived in the last 5 seconds.
     • Union, Join, Split, Select…
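
The semantics of the first four transformations can be tried out on a finite list with `java.util.stream`. This is an analogy only, not Flink's DataStream API: the class `TransformationSketch` and its method names are hypothetical, and `Collectors.groupingBy` merely stands in for the idea behind KeyBy (elements with the same key end up together).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// java.util.stream analogy for map, flatMap, filter and keyBy; not Flink code.
final class TransformationSketch {

    // Map: 1 element in, 1 element out (each line becomes its length).
    static List<Integer> lengths(List<String> lines) {
        return lines.stream()
                .map(String::length)
                .collect(Collectors.toList());
    }

    // FlatMap: 1 element in, 0 or more out (an empty line yields no words).
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")).filter(w -> !w.isEmpty()))
                .collect(Collectors.toList());
    }

    // Filter: keep only the elements for which the predicate returns true.
    static List<String> longerThan(List<String> words, int minLength) {
        return words.stream()
                .filter(w -> w.length() > minLength)
                .collect(Collectors.toList());
    }

    // KeyBy analogy: disjoint groups, each holding the elements with the same key.
    static Map<Integer, List<String>> keyByLength(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(String::length));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be", "", "or not");
        List<String> words = words(lines);
        System.out.println(lengths(lines));
        System.out.println(words);
        System.out.println(longerThan(words, 2));
        System.out.println(keyByLength(words));
    }
}
```

Windowing has no direct counterpart here because a list is finite; on an unbounded stream, a window is what makes an aggregate such as a sum or count well defined.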
  13. Deeper into Flink: Interesting use cases
     • Processing a Twitter feed; one good application is collecting statistics on that feed. See: http://blog.brakmic.com/stream-processing-with-apache-flink/
     • Identifying popular locations where people arrive by taxi: apply filter and map functions to a data stream of taxi ride records, then compute the most popular places over, for example, the last 15 minutes. See: https://www.mapr.com/blog/essential-guide-streaming-first-processing-apache-flink
  14. Setup
     • Prerequisites: Java 7.x or higher; Maven 3.0.4 or higher.
     • Start a new Flink project using Maven. Run the following command in the terminal:
       mvn archetype:generate -DarchetypeGroupId=org.apache.flink -DarchetypeArtifactId=flink-quickstart-java -DarchetypeVersion=1.0.1
     • OR add Flink to an existing project. See: https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html
  15. Get your hands dirty
  16. Get your hands dirty
  17. Get your hands dirty: Execution • Local/debugging cluster • Command Line Interface • Web interface • See: https://ci.apache.org/projects/flink/flink-docs-release-0.7/programming_guide.htm
  Tips and some useful links:
     • Clone the Flink project on GitHub for more examples.
     • There's a free course by DataArtisans. See: http://dataartisans.github.io/flink-training/index.html
     • Here are some other useful links too:
     • http://www.slideshare.net/sbaltagi/stepbystep-introduction-to-apache-flink
     • https://ci.apache.org/projects/flink/flink-docs-release-0.7/programming_guide.html
     • https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html
  20. Thanks! Any questions?