
Apache Flink


A brief introduction to Apache Flink.

Published in: Software


  1. Apache Flink Introduction, by Ahmed Nader
  2. Agenda
     • What's Apache Flink?
     • Deeper into Flink
     • Quick Start and Configuration
     • Get your hands dirty
     • Tips and some useful links
     • References
  3. What's Apache Flink?
     • An open-source platform for distributed stream and batch processing.
     • A large-scale data processing engine.
     • A true streaming engine: it does not cut the stream into batches.
     • Flink has two APIs: DataStream and DataSet.
  4. DataStream API
     • Represents a continuous stream of data of a certain type.
     • Operations are applied to each element of the stream, or to windows of elements.
     [Diagram: Source → DataStream → Operation → DataStream → Sink]
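To make the Source → Operation → Sink shape concrete, here is a minimal plain-Java model of it (Java 8+, JDK only). It illustrates the idea and is not Flink's API: `PipelineSketch`, `run` and `demo` are hypothetical names, and a real Flink job would instead build a `DataStream` from a `StreamExecutionEnvironment` and let Flink execute the pipeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-Java model of Source -> Operation -> Sink. NOT Flink's API;
// PipelineSketch, run() and demo() are hypothetical names.
final class PipelineSketch {

    // Pulls elements from the source one at a time, applies the operation
    // to each element individually, and passes each result straight to the sink.
    static <I, O> void run(Iterator<I> source, Function<I, O> operation, Consumer<O> sink) {
        while (source.hasNext()) {
            sink.accept(operation.apply(source.next()));
        }
    }

    // Example: a source of integers, an operation that formats each one,
    // and a sink that collects the results into a list.
    static List<String> demo() {
        List<String> collected = new ArrayList<>();
        run(Arrays.asList(1, 2, 3).iterator(), x -> "value=" + (x * 10), collected::add);
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [value=10, value=20, value=30]
    }
}
```

Each element flows through the operation as soon as it is pulled, which is the per-element behavior the slide describes; windows are the case where elements are first grouped.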
  5. DataStream API
     Example:
     [Diagram: a live stock feed (e.g. Apple 235, Google 516, Microsoft 124) feeding several operations: "Alert if Microsoft > 120", "Write event to database", and "Sum every 10 seconds" followed by "Alert if sum > 10000"]
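The stock-feed example combines per-element rules with a windowed aggregate. Below is a minimal plain-Java sketch of those semantics over a finite list of quotes, assuming Java 8+ and tumbling (non-overlapping) 10-second windows. The `Quote` type, its timestamps and all method names are hypothetical; this is not Flink's windowing API, which works on unbounded streams and manages window state itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the stock-feed slide; NOT Flink's API.
// Quote, priceAlerts, sumPerWindow and windowsOverThreshold are hypothetical names.
final class StockFeedSketch {

    // One event of the live feed: event time in seconds, ticker symbol, price.
    static final class Quote {
        final long timestampSec;
        final String symbol;
        final double price;

        Quote(long timestampSec, String symbol, double price) {
            this.timestampSec = timestampSec;
            this.symbol = symbol;
            this.price = price;
        }
    }

    // Per-element rule from the slide: alert if Microsoft > 120.
    static List<String> priceAlerts(List<Quote> feed) {
        List<String> alerts = new ArrayList<>();
        for (Quote q : feed) {
            if (q.symbol.equals("Microsoft") && q.price > 120) {
                alerts.add("ALERT " + q.symbol + " " + q.price);
            }
        }
        return alerts;
    }

    // Tumbling windows: each event belongs to exactly one window, identified by
    // its start time (timestamp rounded down to a multiple of windowSec).
    static Map<Long, Double> sumPerWindow(List<Quote> feed, long windowSec) {
        Map<Long, Double> sums = new TreeMap<>();
        for (Quote q : feed) {
            long windowStart = (q.timestampSec / windowSec) * windowSec;
            sums.merge(windowStart, q.price, Double::sum);
        }
        return sums;
    }

    // Window-level rule from the slide: alert if sum > threshold (10000 on the slide).
    static List<Long> windowsOverThreshold(Map<Long, Double> sums, double threshold) {
        List<Long> flagged = new ArrayList<>();
        for (Map.Entry<Long, Double> e : sums.entrySet()) {
            if (e.getValue() > threshold) {
                flagged.add(e.getKey());
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Timestamps are made up; prices are the ones shown on the slide.
        List<Quote> feed = Arrays.asList(
                new Quote(1, "Apple", 235),
                new Quote(4, "Google", 516),
                new Quote(12, "Microsoft", 124));
        System.out.println(priceAlerts(feed));            // prints [ALERT Microsoft 124.0]
        Map<Long, Double> sums = sumPerWindow(feed, 10);
        System.out.println(sums);                         // prints {0=751.0, 10=124.0}
        System.out.println(windowsOverThreshold(sums, 10000)); // prints []
    }
}
```

With these made-up timestamps, Apple and Google land in the window starting at 0 and Microsoft in the window starting at 10, so only the Microsoft price alert fires.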
  6. DataSet API
     • Uses batch processing.
     • Treated as a special case of stream processing, where finite data sources are just streams that happen to end.
     • Offers a dedicated API with machine learning and graph processing libraries.
     [Diagram: Source → DataSet → Operation → DataSet → Sink]
  7. DataSet API
     Example: the Map/Reduce paradigm.
     [Diagram: Map → Reduce]
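Because a DataSet is finite, a Map/Reduce job over it terminates with one final result. The sketch below models the classic word-count instance of that paradigm with plain Java 8 streams; it is not Flink's DataSet API, `MapReduceSketch` and `wordCount` are hypothetical names, and since the slide's own sample data is only partly legible, the input here is made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java model of Map/Reduce word counting over a finite data set;
// NOT Flink's DataSet API.
final class MapReduceSketch {

    static Map<String, Integer> wordCount(List<String> lines) {
        return lines.stream()
                // Map phase: split every line into lower-case words (one record per word).
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                // Reduce phase: group records by word and add up a count of 1 per record.
                .collect(Collectors.groupingBy(word -> word, TreeMap::new,
                        Collectors.summingInt(word -> 1)));
    }

    public static void main(String[] args) {
        // The input is finite, so the computation ends with a single final result.
        System.out.println(wordCount(Arrays.asList("a b a", "b a"))); // prints {a=3, b=2}
    }
}
```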
  8. Flink Stack [diagram]
  9. Analyzing the Flink stack
     • At the core is a streaming dataflow runtime, which executes every program as a dataflow graph.
     • Several libraries sit on top of the DataStream and DataSet APIs, such as:
       ◦ Table: enables SQL-like queries.
       ◦ Gelly: graph processing, to transform and traverse graphs in a distributed fashion.
       ◦ ML: a few machine learning algorithms, though still very basic.
       ◦ CEP: complex event processing, to easily detect complex events in a data stream and get hold of what really matters in your data.
  10. Deeper into Flink
     Data sources:
     • From an input file
     • From a socket
     • From a collection
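Flink's `StreamExecutionEnvironment` offers a source for each of these cases (for example `readTextFile`, `socketTextStream` and `fromCollection`; check the documentation for your Flink version for the exact signatures). The JDK-only sketch below merely models what they have in common, a sequence of elements pulled from somewhere. The class and method names are hypothetical, and one line of text is assumed to be one element.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

// Plain-Java model of the three kinds of sources; NOT Flink's API.
final class SourceSketch {

    // "From a collection": elements come straight from an in-memory collection.
    static <T> Iterator<T> fromCollection(Collection<T> items) {
        return items.iterator();
    }

    // "From an input file" / "From a socket": both boil down to a character stream,
    // read here as one element per line. For a file you might pass a FileReader, for a
    // socket an InputStreamReader over socket.getInputStream(). Closing is left to the caller.
    static Iterator<String> fromReader(Reader reader) {
        return new BufferedReader(reader).lines().iterator();
    }

    // Helper that drains a source into a list (handy for testing).
    static <T> List<T> drain(Iterator<T> source) {
        List<T> out = new ArrayList<>();
        source.forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain(fromCollection(Arrays.asList("a", "b"))));       // prints [a, b]
        System.out.println(drain(fromReader(new StringReader("first\nsecond")))); // prints [first, second]
    }
}
```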
  11. Deeper into Flink
     Data sinks:
     • Write to a CSV file
     • Write to a socket
     • Print to the terminal
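On the DataStream side, Flink provides sinks such as `print()`, `writeAsCsv(...)` and `writeToSocket(...)` (signatures differ between versions, so check the docs). As a hypothetical plain-Java model of what a CSV sink does, the sketch below turns each record into one comma-separated line and writes it to whatever `PrintWriter` it receives, which could wrap a file, a socket's output stream, or the terminal.

```java
import java.io.PrintWriter;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java model of CSV and terminal sinks; NOT Flink's API.
final class CsvSinkSketch {

    // Formats one record as a CSV line. Simplification: no quoting or escaping,
    // so fields are assumed not to contain commas, quotes or newlines.
    static String toCsvLine(List<?> record) {
        return record.stream().map(String::valueOf).collect(Collectors.joining(","));
    }

    // Writes every record as one line to the given writer. The same sink can
    // target a file, a socket's output stream, or the terminal (System.out).
    static void writeCsv(List<? extends List<?>> records, PrintWriter out) {
        for (List<?> record : records) {
            out.print(toCsvLine(record));
            out.print('\n');
        }
        out.flush();
    }

    public static void main(String[] args) {
        List<List<Object>> records = Arrays.asList(
                Arrays.<Object>asList("Apple", 235),
                Arrays.<Object>asList("Google", 516));
        // "Print to the terminal": the sink simply targets System.out.
        writeCsv(records, new PrintWriter(System.out));
    }
}
```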
  12. Deeper into Flink
     Data transformations (DataStream API):
     • Map: takes one element and produces one element.
     • FlatMap: takes one element and produces zero or more elements.
     • Filter: evaluates a boolean predicate for each element and retains those for which it returns true.
     • KeyBy: partitions a stream into disjoint partitions, each containing the elements that share the same key.
     • Window: groups stream events according to some characteristic, e.g. the data that arrived in the last 5 seconds.
     • Union, Join, Split, Select, …
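The semantics of the first four transformations can be modeled with Java 8 streams on a finite list, as in the sketch below. It is only an illustration: in Flink, `map`, `flatMap`, `filter` and `keyBy` are called on a `DataStream` and run continuously over unbounded input, and `keyBy` returns a keyed stream rather than a finished map. All class and method names here are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain-Java model of map, flatMap, filter and keyBy on a finite list;
// NOT Flink's API (Flink applies these operators to an unbounded DataStream).
final class TransformationsSketch {

    // Map: exactly one output element per input element.
    static List<Integer> lengths(List<String> in) {
        return in.stream().map(String::length).collect(Collectors.toList());
    }

    // FlatMap: zero or more output elements per input element
    // (a blank line yields zero words, "to be" yields two).
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> line.trim().isEmpty()
                        ? Stream.<String>empty()
                        : Arrays.stream(line.trim().split("\\s+")))
                .collect(Collectors.toList());
    }

    // Filter: keep only the elements for which the predicate returns true.
    static List<Integer> greaterThan(List<Integer> in, int threshold) {
        return in.stream().filter(x -> x > threshold).collect(Collectors.toList());
    }

    // KeyBy: split the elements into disjoint groups, one group per key.
    static <T, K extends Comparable<? super K>> Map<K, List<T>> keyBy(List<T> in, Function<T, K> key) {
        return in.stream().collect(Collectors.groupingBy(key, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(lengths(Arrays.asList("ab", "c")));       // prints [2, 1]
        System.out.println(words(Arrays.asList("to be", "", "or"))); // prints [to, be, or]
        System.out.println(greaterThan(Arrays.asList(5, 1, 7), 4));  // prints [5, 7]
        // prints {a=[apple, avocado], b=[banana]}
        System.out.println(keyBy(Arrays.asList("apple", "avocado", "banana"), s -> s.charAt(0)));
    }
}
```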
  13. Deeper into Flink
     Interesting use cases:
     • Processing a Twitter feed; one good application is collecting statistics on that feed. See: http://blog.brakmic.com/stream-processing-with-apache-flink/
     • Identifying popular locations where people arrive by taxi: apply filter and map functions to a DataStream of taxi ride records, then compute the most popular places over, for example, the last 15 minutes. See: https://www.mapr.com/blog/essential-guide-streaming-first-processing-apache-flink
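The taxi use case boils down to a filter, a map and a count over a time window. The following is a hypothetical plain-Java sketch (Java 8+) of that logic over a finite list of rides. The `Ride` record, the minute-based timestamps and the place names are made up, and this is neither Flink's API nor the code from the linked article.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java model of "most popular arrival places in the last N minutes";
// NOT Flink's API and NOT the code from the linked article.
final class PopularPlacesSketch {

    // Hypothetical taxi ride record: arrival time (minutes) and arrival place.
    static final class Ride {
        final long arrivalMinute;
        final String place;

        Ride(long arrivalMinute, String place) {
            this.arrivalMinute = arrivalMinute;
            this.place = place;
        }
    }

    // Filter rides to the window (now - windowMinutes, now], map each ride to its
    // place, and count the rides per place.
    static Map<String, Long> countsInWindow(List<Ride> rides, long now, long windowMinutes) {
        return rides.stream()
                .filter(r -> r.arrivalMinute > now - windowMinutes && r.arrivalMinute <= now)
                .map(r -> r.place)
                .collect(Collectors.groupingBy(p -> p, TreeMap::new, Collectors.counting()));
    }

    // The place with the highest count in the window (empty if the window has no rides).
    static Optional<String> mostPopular(List<Ride> rides, long now, long windowMinutes) {
        return countsInWindow(rides, now, windowMinutes).entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        List<Ride> rides = Arrays.asList(
                new Ride(1, "Airport"), new Ride(20, "Station"), new Ride(22, "Station"),
                new Ride(25, "Airport"), new Ride(28, "Station"));
        System.out.println(countsInWindow(rides, 30, 15)); // prints {Airport=1, Station=3}
        System.out.println(mostPopular(rides, 30, 15));    // prints Optional[Station]
    }
}
```

The ride at minute 1 falls outside the 15-minute window ending at minute 30, which is why only one Airport arrival is counted.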
  14. Setup
     Prerequisites:
     • Java 7.x or higher.
     • Maven 3.0.4 or higher.
     Start a new Flink project using Maven by running the following command in the terminal:
       mvn archetype:generate -DarchetypeGroupId=org.apache.flink -DarchetypeArtifactId=flink-quickstart-java -DarchetypeVersion=1.0.1
     OR add Flink to an existing project, see: https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html
  15. Get your hands dirty
  16. Get your hands dirty
  17. Get your hands dirty
     Execution:
     • Local/debugging cluster
     • Command-line interface
     • Web interface
     See: https://ci.apache.org/projects/flink/flink-docs-release-0.7/programming_guide.html
  18. Tips and some useful links
     • Subscribe to the mailing list by sending an empty email to user-subscribe@flink.apache.org.
     • Clone the Flink project on GitHub for more examples.
     • There is a free course by DataArtisans, see: http://dataartisans.github.io/flink-training/index.html
     • Some other useful links:
       ◦ http://www.slideshare.net/sbaltagi/stepbystep-introduction-to-apache-flink
       ◦ https://ci.apache.org/projects/flink/flink-docs-release-0.7/programming_guide.html
       ◦ https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html
  19. References
     • http://blog.brakmic.com/stream-processing-with-apache-flink/
     • http://www.slideshare.net/sbaltagi/stepbystep-introduction-to-apache-flink
     • https://www.mapr.com/blog/essential-guide-streaming-first-processing-apache-flink
     • https://ci.apache.org/projects/flink/flink-docs-release-0.7/programming_guide.html
     • http://dataartisans.github.io/flink-training/index.html
     • https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html
  20. Thanks! Any questions?