Are you interested in learning how to schedule batch jobs in container runtimes?
Maybe you’re wondering how to apply continuous delivery in practice for data-intensive applications? Perhaps you’re looking for an orchestration tool for data pipelines?
Questions like these are common, so rest assured that you’re not alone.
In this webinar, we’ll cover the recent feature improvements in Spring Cloud Data Flow. More specifically, we’ll discuss data processing use cases and how they simplify the overall orchestration experience in cloud runtimes like Cloud Foundry and Kubernetes.
Please join us and be part of the community discussion!
Presenters:
Sabby Anandan, Product Manager
Mark Pollack, Software Engineer, Pivotal
3. Data in the Enterprise
What we see in the industry:
Digital transformation is the new norm. DevOps practices play a critical role in transitioning into a data-driven business.
Event-driven architectures are on the rise, and data is at the core of them.
ETL is not going away, but its development and operating model continues to evolve.
Machine learning has brought unprecedented capabilities to the engineering domain, and making them easily accessible is on the upswing.
4. “We call an application data-intensive if data is its primary challenge—the quantity of data, the complexity of data, or the speed at which it is changing.”
(Martin Kleppmann, Designing Data-Intensive Applications)
10. WHAT IS SPRING CLOUD DATA FLOW?
A toolkit for building data integration, real-time streaming, and batch data processing pipelines.
11. Data pipelines consist of Spring Boot apps, using Spring Cloud Stream for event streaming or Spring Cloud Task for batch processes.
● Ready for data integration with >60 out-of-the-box streaming and batch apps.
● DSL, GUI, and REST APIs to build and orchestrate data pipelines on platforms like Kubernetes and Cloud Foundry.
● Continuous delivery for streaming data pipelines using Spring Cloud Skipper.
● Cron-based scheduling for batch data pipelines using Spring Cloud Scheduler.
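As a sketch of the DSL mentioned above: in the Data Flow shell, a pipeline is composed from named apps with Unix-style pipes. The `time` and `log` apps below are among the out-of-the-box starters; the stream name is arbitrary:

```
dataflow:> stream create --name ticktock --definition "time | log" --deploy
```

This defines a stream that periodically emits a timestamp and logs it; `--deploy` deploys it to the configured platform (Kubernetes or Cloud Foundry).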
16. Common denominator = Spring Boot

Spring Cloud Task (Batch)
Build short-lived microservices to perform data processing locally or in the cloud.
FEATURES
● Task execution history
● Pluggable task repository
● Remote partitioning, checkpointing, and restartability

Spring Cloud Stream (Streaming)
Build highly scalable event-driven microservices connected with shared messaging systems.
FEATURES
● Imperative vs. functional programming styles
● Partitioning and consumer groups
● Pluggable message-bus abstraction

Spring MVC / WebFlux (RESTful)
Build production-grade RESTful apps on the JVM.
FEATURES
● Separation of concerns to support modular architecture
● Built-in RESTful components
● Pluggable view resolvers

Opportunities to consolidate: development practices | test infrastructure | CI/CD tooling and automation
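The "imperative vs. functional programming styles" bullet refers to Spring Cloud Stream's functional model, where a handler is simply a `java.util.function.Function` bean and the binder wires its input and output to broker destinations. A minimal plain-JDK sketch of such a handler (the Spring wiring itself is omitted; names here are hypothetical):

```java
import java.util.function.Function;

public class UppercaseProcessor {
    // In a Spring Cloud Stream app, a Function<I, O> registered as a @Bean
    // becomes a processor: the binder subscribes its input to one destination
    // and publishes its output to another. The business logic stays plain Java.
    static final Function<String, String> uppercase = payload -> payload.toUpperCase();

    public static void main(String[] args) {
        System.out.println(uppercase.apply("order received"));
    }
}
```

Because the logic is an ordinary function, it can be unit-tested without a broker or an application context.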
23. Pluggable binder implementations: RabbitMQ, Apache Kafka, Google Pub/Sub, Amazon Kinesis, Solace, Azure Event Hubs.
Opportunities: same code, same tests; drop-in replacement for a variety of message brokers.
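As an illustration of binder pluggability (property names are from Spring Cloud Stream; the destination and binder values here are hypothetical), the binding configuration is broker-agnostic, so swapping brokers is a dependency change rather than a code change:

```
# Bind the app's output to a destination; this line is identical whether the
# Kafka binder, the RabbitMQ binder, or any other binder is on the classpath.
spring.cloud.stream.bindings.output.destination=orders

# If several binders are on the classpath, choose the default explicitly.
spring.cloud.stream.defaultBinder=kafka
```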
28. What do we mean by ‘cloud-native’ patterns for data?
■ Self-contained applications; no app server or external runtime (web server or ESB).
■ Deployment and governance handled by platforms like Cloud Foundry or Kubernetes.
■ Many data-centric use cases can be handled by self-contained apps. Leverage existing knowledge of runtime platforms and the supporting ecosystem.
29. Maintainability
● Build, test, and iterate as frequently as needed.
● CI/CD as first-class thinking for data-centric workloads.
● Data-processing guarantees in the event of rolling upgrades.

Scalability
● Auto-scale up and down based on throughput demands.
● Linear throughput characteristics as you scale applications.
● Scale back to the desired shape when demand fades away.

Portability
● The same app runs locally on a laptop or on any cloud platform with a JVM.

Reliability
● Focus on the business logic and on unit, integration, and acceptance tests.
● Depend on the platform runtime (Kubernetes or Cloud Foundry) for reliability and resiliency guarantees.
31. Build → Package → Test (unit tests, integration tests) → Candidate stage (E2E tests) → Deploy to prod

Continuous delivery: the build, package/test, and candidate stages run automatically; the deploy-to-prod step is manual.
Continuous deployment: every stage, including deploy to prod, runs automatically.
33. Modernize monolithic ETL workloads
Move SQL scripts, stored procedures, and in-house bash scripts to a cloud-native architecture.
Small and incremental releases.
Continuous delivery is the focus.
35. Batch to event-driven and streaming architectures
Events as first-class thinking in the enterprise.
Common practices include domain-driven design, event sourcing, and CQRS.
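As a toy illustration of the event-sourcing practice mentioned above (all names hypothetical, no framework involved): current state is never stored directly; it is derived by replaying an append-only event log.

```java
import java.util.List;

public class EventSourcingSketch {
    enum Type { DEPOSITED, WITHDRAWN }

    static final class Event {
        final Type type;
        final long amount;
        Event(Type type, long amount) { this.type = type; this.amount = amount; }
    }

    // Rebuild the current balance by replaying every event in order.
    static long replay(List<Event> log) {
        long balance = 0;
        for (Event e : log) {
            balance += (e.type == Type.DEPOSITED) ? e.amount : -e.amount;
        }
        return balance;
    }

    public static void main(String[] args) {
        List<Event> log = List.of(
                new Event(Type.DEPOSITED, 100),
                new Event(Type.WITHDRAWN, 30),
                new Event(Type.DEPOSITED, 5));
        System.out.println(replay(log)); // 100 - 30 + 5
    }
}
```

Because the log is append-only, new read models (the "Q" in CQRS) can be built later by replaying the same events into a different projection.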
36. File ingest and data processing
File ingest from NFS, S3, and other volume mounts.
Doing it in container-based runtimes comes with many operational benefits, including the ability to auto-scale.
37. Stateful stream processing
Stateful applications built using the Kafka Streams API, including KTable and Interactive Queries, for real-time streaming analytics and rapid dashboarding.
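Kafka Streams itself needs a running broker, but the core idea behind a KTable can be sketched without one: a continuously updated key-to-value table materialized from a stream of records, which Interactive Queries then read point-wise (a plain-JDK stand-in; names hypothetical):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KTableSketch {
    // Materialize a per-key count "table" from a stream of records,
    // the way a KTable aggregates a KStream.
    static Map<String, Long> materialize(List<String> stream) {
        Map<String, Long> countsByEvent = new HashMap<>();
        for (String event : stream) {
            countsByEvent.merge(event, 1L, Long::sum);
        }
        return countsByEvent;
    }

    public static void main(String[] args) {
        Map<String, Long> table = materialize(
                List.of("login", "click", "login", "purchase", "login"));
        // An "interactive query" is then just a point lookup against local state.
        System.out.println(table.get("login"));
    }
}
```

In real Kafka Streams the state store is fault-tolerant and backed by a changelog topic; the HashMap here only stands in for that local state.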
38. Scheduled batch jobs
Whether it is for predictive-model training, massive file movement, or classic data-migration batch jobs, these workloads are typically schedule-driven.
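Platform schedulers such as Kubernetes CronJobs or the PCF Scheduler take a cron expression (for example Spring's six-field `0 0 2 * * *` for 2 a.m. daily) and launch the task fresh each time, with no long-running JVM in between. As a plain-JDK stand-in for that launch loop (fixed-rate instead of cron; all names hypothetical):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class BatchScheduleSketch {
    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch runs = new CountDownLatch(2); // wait for two "job launches"

        // Each tick stands in for the platform launching a short-lived batch task.
        scheduler.scheduleAtFixedRate(() -> {
            System.out.println("batch job launched");
            runs.countDown();
        }, 0, 50, TimeUnit.MILLISECONDS);

        runs.await();
        scheduler.shutdownNow();
    }
}
```

The key operational difference: with a platform scheduler the process exits after each run, so you pay for compute only while the job executes.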
40. Next
■ Function chaining through Spring Cloud Function and Spring Cloud Stream.
■ Deploy apps with multiple input/output channels.
■ Audit trails: who did what, and when?
■ Task launching and rate limiting.
■ Spring Boot 2.x / Java 9, 10, 11.
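The function-chaining item builds on plain `java.util.function` composition: Spring Cloud Function composes registered function beans declaratively (e.g. a pipe-separated `spring.cloud.function.definition` property), which under the hood reduces to `andThen`. A sketch of the underlying mechanism (function names hypothetical):

```java
import java.util.function.Function;

public class FunctionChainingSketch {
    // Two independent functions; in Spring Cloud Function each would be a bean.
    static final Function<String, String> trim = String::trim;
    static final Function<String, String> uppercase = String::toUpperCase;

    // Chaining: declarative composition reduces to plain andThen().
    static final Function<String, String> chained = trim.andThen(uppercase);

    public static void main(String[] args) {
        System.out.println(chained.apply("  hello "));
    }
}
```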