SlideShare ist ein Scribd-Unternehmen logo
1 von 18
- GARY YAO
FLINK AS A LIBRARY
(AND STILL AS A FRAMEWORK)
© 2018 data Artisans2
ABOUT DATA ARTISANS
Original Creators of
Apache Flink®
RealTime Stream Processing Enterprise
Ready
© 2018 data Artisans3
FREE TRIAL DOWNLOAD
data-artisans.com/download
© 2018 data Artisans4
MOTIVATION
© 2018 data Artisans5
FLINK DEPLOYMENT OPTIONS & LIMITATIONS
BeforeApache Flink 1.5
• Shortcomings:
‒ Fixed number ofTaskManagers
‒ Difficult to support autoscaling
‒ Supporting new cluster managers difficult
‒ Akka for client—cluster communication
‒ Undisciplined about Job clusters
Session cluster Job cluster
Standalone ✓
YARN ✓ ✓
Mesos ✓
Kubernetes ✓
© 2018 data Artisans6
FLINK AS A LIBRARY
• Make deployments easier
• Application-oriented instead of cluster-oriented
• Just a start one or more Job jars, and automatically get distributed execution
• Remove distinction between different roles in the cluster (JM vs.TM)
• Participants of the cluster should determine roles on their own
‸
© 2018 data Artisans7
RESOURCE MANAGEMENT & AUTOSCALING
• Short-lived & high latency
• Resources only exist for the duration of
the job
• Different stages in a job may require
different resources
• Flink should be in control of
allocating/releasing resources
• Long-lived & low latency
• Dependent on external systems & exposed to
varying load
• External system should drive resource allocation
to meet SLO/SLAs
• Flink should react to available resources
Real-time
Applications
Batch Analytics
Continuous
processing
© 2018 data Artisans8
CONTAINERIZED DEPLOYMENT
Why Flink on containers?
• Self-contained runtime environment that includes application code, libraries,
dependencies, and configuration files
• Ease of operations by clean separation of concerns
• Enables dynamic resource allocation
• Fits the “Flink as a Library” model
© 2018 data Artisans9
CONTAINERIZED DEPLOYMENT (CONT.)
Brief introduction to Kubernetes
• De-facto industry standard for container orchestration
• Resource-oriented with declarative configuration
‒ You tell K8s the desired state, and a background process asynchronously makes it happen
‒ “3 replicas of this container should be kept running”
‒ “a load balancer should exist, listening on port 443, backed by containers with this label”
• Core resource types:
‒ Pod: a group of one or more containers running on a node
‒ Deployment: keep n pods running
© 2018 data Artisans10
REWORKING FLINK’S RUNTIME
FLIP-6 – Flink Deployment and Process Model (Flink 1.5 and up)
• Revamped distributed architecture
‒ Resource elasticity
‒ Support for different environments
‒ RESTfulAPI for client-cluster communication
• Building blocks:
ResourceManager Dispatcher
JobManager TaskManager
• Cluster manager specific
• Acquires/releases resources
• Touch point for job submissions
• Spawns JobManagers
• Single job only
• Deploys/monitors job execution
• Offers resources to ResourceManager
• Gets tasks from JobManagers
© 2018 data Artisans11
UPCOMING CHANGES
© 2018 data Artisans12
CONTAINER MODES
• Different levels of interaction with cluster manager
• Active Container Mode
‒ Flink is aware about the cluster manager that it is running on, and
interacts with it
‒ Already exists, e.g., FLIP-6YARN
• Reactive Container Mode
‒ Flink is oblivious to its environment
‒ Similar to standalone deployment
‒ Flink may react on new resources by scaling up job
© 2018 data Artisans13
ACTIVE K8S INTEGRATION
K8s deployment
controller
Client
TaskManager
JobManager
K8sResourceManager
ApplicationMaster
TaskManager
(3) Submit job
(1) SubmitAM deployment
(4) Start JM
(5) Request slots
(7) StartTM pod
© 2018 data Artisans14
REACTIVE CONTAINER MODE
• Relies on external system to start/release
TaskManagers, e.g.,
‒ Kubernetes Horizontal Pod Autoscaler
‒ GCPAutoscaling
‒ AWS Auto ScalingGroup
• Re-scale job if resources are added/removed
(take savepoint and resume job with new
parallelism automatically)
• By definition works with all cluster managers
Flink cluster
JM TM TM
ASG
Start new TM if
CPU% > threshold
Register
& offer
slots
Event rate over time
© 2018 data Artisans15
DOES ANY OF THIS WORK?
Demo: Autoscaling in Active & Reactive Container Mode
© 2018 data Artisans16
CURRENT STATE
• Recommendations on how to deploy job and session cluster on K8s
‒ Docker image build script
‒ K8s resource configs
• StandaloneJobClusterEntryPoint to start a (containerized) job cluster
• Work in progress/planned:
‒ Reactive Container Mode with automatically rescaling
• Design documents will be published soon
‒ Active K8s integration: prototype is available
• git clone -b nativeKubernetes git@github.com:tillrohrmann/flink.git
© 2018 data Artisans17
SUMMARY
• Thinking in terms of jobs & application code as an alternative to deploying Flink clusters
• Active K8s integration & Reactive Container Mode
• Existing code already enables job and session clusters on K8s
• Call to action:
‒ Umbrella tickets: FLINK-9953, FLINK-7087
‒ Follow dev@flink.apache.org to stay informed about latest state of development
THANK YOU!
@gjyao
@dataArtisans
@ApacheFlink
WE ARE HIRING
data-artisans.com/careers

Weitere ähnliche Inhalte

Mehr von Flink Forward

Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!Flink Forward
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsFlink Forward
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesFlink Forward
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleFlink Forward
 
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitFlink Forward
 
Changelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkChangelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkFlink Forward
 
Large Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior DetectionLarge Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior DetectionFlink Forward
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Flink Forward
 

Mehr von Flink Forward (20)

Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
 
Using Queryable State for Fun and Profit
Using Queryable State for Fun and ProfitUsing Queryable State for Fun and Profit
Using Queryable State for Fun and Profit
 
Changelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkChangelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache Flink
 
Large Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior DetectionLarge Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior Detection
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 

Kürzlich hochgeladen

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 

Kürzlich hochgeladen (20)

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 

Flink Forward Berlin 2018: Gary Yao - "Flink as a Library (and still as a Framework)"

  • 1. - GARY YAO FLINK AS A LIBRARY (AND STILL AS A FRAMEWORK)
  • 2. © 2018 data Artisans2 ABOUT DATA ARTISANS Original Creators of Apache Flink® RealTime Stream Processing Enterprise Ready
  • 3. © 2018 data Artisans3 FREE TRIAL DOWNLOAD data-artisans.com/download
  • 4. © 2018 data Artisans4 MOTIVATION
  • 5. © 2018 data Artisans5 FLINK DEPLOYMENT OPTIONS & LIMITATIONS BeforeApache Flink 1.5 • Shortcomings: ‒ Fixed number ofTaskManagers ‒ Difficult to support autoscaling ‒ Supporting new cluster managers difficult ‒ Akka for client—cluster communication ‒ Undisciplined about Job clusters Session cluster Job cluster Standalone ✓ YARN ✓ ✓ Mesos ✓ Kubernetes ✓
  • 6. © 2018 data Artisans6 FLINK AS A LIBRARY • Make deployments easier • Application-oriented instead of cluster-oriented • Just a start one or more Job jars, and automatically get distributed execution • Remove distinction between different roles in the cluster (JM vs.TM) • Participants of the cluster should determine roles on their own ‸
  • 7. © 2018 data Artisans7 RESOURCE MANAGEMENT & AUTOSCALING • Short-lived & high latency • Resources only exist for the duration of the job • Different stages in a job may require different resources • Flink should be in control of allocating/releasing resources • Long-lived & low latency • Dependent on external systems & exposed to varying load • External system should drive resource allocation to meet SLO/SLAs • Flink should react to available resources Real-time Applications Batch Analytics Continuous processing
  • 8. © 2018 data Artisans8 CONTAINERIZED DEPLOYMENT Why Flink on containers? • Self-contained runtime environment that includes application code, libraries, dependencies, and configuration files • Ease of operations by clean separation of concerns • Enables dynamic resource allocation • Fits the “Flink as a Library” model
  • 9. © 2018 data Artisans9 CONTAINERIZED DEPLOYMENT (CONT.) Brief introduction to Kubernetes • De-facto industry standard for container orchestration • Resource-oriented with declarative configuration ‒ You tell K8s the desired state, and a background process asynchronously makes it happen ‒ “3 replicas of this container should be kept running” ‒ “a load balancer should exist, listening on port 443, backed by containers with this label” • Core resource types: ‒ Pod: a group of one or more containers running on a node ‒ Deployment: keep n pods running
  • 10. © 2018 data Artisans10 REWORKING FLINK’S RUNTIME FLIP-6 – Flink Deployment and Process Model (Flink 1.5 and up) • Revamped distributed architecture ‒ Resource elasticity ‒ Support for different environments ‒ RESTfulAPI for client-cluster communication • Building blocks: ResourceManager Dispatcher JobManager TaskManager • Cluster manager specific • Acquires/releases resources • Touch point for job submissions • Spawns JobManagers • Single job only • Deploys/monitors job execution • Offers resources to ResourceManager • Gets tasks from JobManagers
  • 11. © 2018 data Artisans11 UPCOMING CHANGES
  • 12. © 2018 data Artisans12 CONTAINER MODES • Different levels of interaction with cluster manager • Active Container Mode ‒ Flink is aware about the cluster manager that it is running on, and interacts with it ‒ Already exists, e.g., FLIP-6YARN • Reactive Container Mode ‒ Flink is oblivious to its environment ‒ Similar to standalone deployment ‒ Flink may react on new resources by scaling up job
  • 13. © 2018 data Artisans13 ACTIVE K8S INTEGRATION K8s deployment controller Client TaskManager JobManager K8sResourceManager ApplicationMaster TaskManager (3) Submit job (1) SubmitAM deployment (4) Start JM (5) Request slots (7) StartTM pod
  • 14. © 2018 data Artisans14 REACTIVE CONTAINER MODE • Relies on external system to start/release TaskManagers, e.g., ‒ Kubernetes Horizontal Pod Autoscaler ‒ GCPAutoscaling ‒ AWS Auto ScalingGroup • Re-scale job if resources are added/removed (take savepoint and resume job with new parallelism automatically) • By definition works with all cluster managers Flink cluster JM TM TM ASG Start new TM if CPU% > threshold Register & offer slots Event rate over time
  • 15. © 2018 data Artisans15 DOES ANY OF THIS WORK? Demo: Autoscaling in Active & Reactive Container Mode
  • 16. © 2018 data Artisans16 CURRENT STATE • Recommendations on how to deploy job and session cluster on K8s ‒ Docker image build script ‒ K8s resource configs • StandaloneJobClusterEntryPoint to start a (containerized) job cluster • Work in progress/planned: ‒ Reactive Container Mode with automatically rescaling • Design documents will be published soon ‒ Active K8s integration: prototype is available • git clone -b nativeKubernetes git@github.com:tillrohrmann/flink.git
  • 17. © 2018 data Artisans17 SUMMARY • Thinking in terms of jobs & application code as an alternative to deploying Flink clusters • Active K8s integration & Reactive Container Mode • Existing code already enables job and session clusters on K8s • Call to action: ‒ Umbrella tickets: FLINK-9953, FLINK-7087 ‒ Follow dev@flink.apache.org to stay informed about latest state of development
  • 18. THANK YOU! @gjyao @dataArtisans @ApacheFlink WE ARE HIRING data-artisans.com/careers

Hinweis der Redaktion

  1. should fit
  2. • data Artisans was founded by the original creators of Apache Flink • We provide dA Platform, a complete stream processing infrastructure with open-source Apache Flink
  3. • Also included is the Application Manager, which turns dA Platform into a self-service platform for stateful stream processing applications. • dA Platform is generally available, and you can download a free trial today!