Deploying commercial applications that meet their expected business needs is challenging due to the differences between how business goals are specified and how the system is evaluated. Furthermore, business goals are dynamic, requiring deployments to change constantly over time. Such difficulties make it costly to maintain application quality, as the underlying infrastructure is not always able to keep up with business changes. Serverless computing opens a new approach to building applications: by abstracting away deployment details, a serverless application can be implemented with minimal deployment effort. Serverless also reduces maintenance cost through autoscaling and pay-as-you-go pricing. These abilities suggest that, by adopting serverless, we can build applications that meet and quickly adapt to business goals.
However, simply writing applications with serverless is not sufficient. Due to best-effort invocation mechanisms and the lack of application structure awareness, serverless performance is highly variable and often fails to support applications with rigorous quality-of-service requirements. In this study, we aim to mitigate these limitations by coupling serverless deployment with business needs. In particular, we define a Serverless Service-Level Objective (SLO) interface that allows developers to describe their application structure and business goals in terms of service-level objectives. We implement an SLO enforcer, which combines this information with system performance metrics to decide a proper serverless deployment and resource allocation for meeting business goals. The Serverless SLO interface leverages a blueprint model, which allows developers to describe an application's architecture and runtime requirements, to map the application description to serverless function deployments on top of Knative. We deploy our proposed system on KinD, a tool that runs a Kubernetes cluster inside local Docker containers, and evaluate it with different system configurations. Evaluation results show that SLO definition and enforcement helps serverless applications use resources in accordance with business goals.
2. Background
[Figure: the business improvement loop. Business objectives (market growth, maximizing user experience) drive solution deployments (functionalities, latency, throughput); feedback such as performance metrics and earnings closes the loop.]
Encode business logic in terms of performance indicators that can be measured and enforced by proper system designs:
- Service-Level Agreement (SLA): a performance guarantee agreement defined by the user and enforced by the implementation
- Service-Level Objective (SLO): the key elements of an SLA, i.e., the objectives the implementation must achieve to satisfy the SLA
3. SLO/SLA Example: Reefer Container Shipment reference implementation*
*see https://ibm-cloud-architecture.github.io/refarch-kc/
• Business goal: customer experience should not be affected by delay
• Translated into SLAs/SLOs (encoded as a sketch after this list):
- 99% of order placements < 2s
- No order placement lost
- Availability > 99.999%
- Capacity > 1M concurrent orders
• Business logic changes over time:
- Market growth
- Responses to competitors' improvements
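To make such targets machine-checkable, one can encode each SLO as a small data structure. The following Python sketch is purely illustrative; the class and field names (LatencySLO, percentile, threshold_ms) are our own assumptions, not part of the reference implementation.

import math
from dataclasses import dataclass

@dataclass
class LatencySLO:
    # "percentile% of requests must finish below threshold_ms"
    percentile: float    # e.g. 99.0
    threshold_ms: float  # e.g. 2000

    def is_met(self, latencies_ms: list[float]) -> bool:
        if not latencies_ms:
            return True
        ordered = sorted(latencies_ms)
        # Nearest-rank index of the requested percentile.
        idx = min(len(ordered) - 1,
                  math.ceil(len(ordered) * self.percentile / 100.0) - 1)
        return ordered[idx] < self.threshold_ms

# The order-placement SLO above: 99% of placements under 2 seconds.
order_slo = LatencySLO(percentile=99.0, threshold_ms=2000)
print(order_slo.is_met([350.0, 420.0, 1800.0, 950.0]))  # True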
Can we quickly translate business goals into system implementations?
4. Serverless (industry perspective)
• According to the Cloud Native Computing Foundation*: "Serverless computing refers to the concept of building and running applications that do not require server management. It describes a finer-grained deployment model where applications, bundled as one or more functions, are uploaded to a platform and then executed, scaled, and billed in response to the exact demand needed at the moment"
* https://www.cncf.io/blog/2018/02/14/cncf-takes-first-step-towards-serverless-computing/
[Figure: over time, serverless allocation tracks the workload curve, so cost follows the exact demand.]
- Autoscaling: dynamically add/remove resources according to the workload, including scale-to-zero
- Truly pay-as-you-go: only pay for function execution time with fine-grained pricing (per 100ms; see the cost sketch below)
- Detach resource management from the software development/maintenance process
- Speed up software evolution
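To illustrate per-100ms billing concretely, the sketch below computes the cost of a batch of invocations; the price constant is a made-up example value, not any provider's actual rate.

# Pay only for execution time, rounded up to 100ms billing units.
PRICE_PER_100MS = 0.00000208  # USD; illustrative assumption, not a real rate

def invocation_cost(exec_time_ms: float) -> float:
    billed_units = -(-exec_time_ms // 100)  # ceiling division into 100ms slots
    return billed_units * PRICE_PER_100MS

# A 130ms function is billed as two 100ms units; idle time costs nothing.
print(round(1000 * invocation_cost(130), 6))  # cost of 1,000 invocations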
5. Serverless Implementation
Kubernetes is “an open-source system for automating
deployment, scaling, and management of containerized
applications”*
• Become the de facto standard platform for container orchestration.
• Market adoption is strong: 80% of companies used Kubernetes in
some forms in 2019, and the number is subjected to increase***
Knative is “an open source community project which adds
components for deploying, running, and managing serverless,
cloud native applications to Kubernetes”**.
* https://kubernetes.io
** https://www.redhat.com/en/topics/microservices/what-is-knative
*** Kubernetes Ecosystem, https://thenewstack.io/ebooks/kubernetes/state-of-kubernetes-ecosystem-second-edition-2020/
6. Example: handling a new shipment order
[Figure: handling a new shipment order. The pipeline Preprocess → Find container → Schedule shipment runs as function containers on Knative over Kubernetes, connected through Kafka: 1. Submit the functions; 2. Establish channels; 3. Subscribe to input topics; 4. Invoke on a new order; 5. Publish results to Kafka; 6. Trigger the next function, until the response is returned.]
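As a minimal sketch of one stage in this pipeline, the following Python program plays the role of a function container: Knative invokes it over HTTP (step 4) and it publishes its output to Kafka (step 5) to trigger the next stage (step 6). The kafka-python client, the broker address, and the topic name are our illustrative assumptions, not the reference implementation.

from http.server import BaseHTTPRequestHandler, HTTPServer
from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(bootstrap_servers="kafka:9092")  # illustrative address

class Stage(BaseHTTPRequestHandler):
    def do_POST(self):
        order = self.rfile.read(int(self.headers["Content-Length"]))
        result = order  # placeholder for the real work, e.g. "find container"
        producer.send("schedule-shipment", result)  # hand off to the next stage
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), Stage).serve_forever()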
7. Why Serverless SLO?
• In serverless, it is an order of magnitude faster to translate between business logic and system designs
• Zero deployment/maintenance effort
• A new way of utilizing resources
• Finer-grained resource management: truly pay-as-you-go
• High flexibility: more room for resource management optimization
• Spatial: stateless
• Temporal: short execution time
A potential way to quickly define and enforce SLOs
8. Experiment setup for the serverless sequence*
[Figure: the input workload drives a sequence of three 100ms stages (Preprocess → Find container → Schedule shipment), deployed as forward-1 → Kafka → forward-2 → Kafka → forward-3 → Output; a new order enters and a response is returned.]
Workload (input; generator sketched below):
- Periodic bursts
- Constant ramp-up, 4 orders per sec²
- Limited rate, up to 120 orders per sec
System specs:
- Knative v0.15.1 over Kubernetes v1.18.2 supported by KinD**
- Events are transmitted through Kafka v2.5.0
- 12 CPUs, 48 GB memory, and a 256 GB SSD
*Script available at https://github.com/ngduchai/event-driven-apps
**KinD helps deploy a Kubernetes cluster on a local computer, see https://kind.sigs.k8s.io
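The ramp-up portion of this workload can be sketched in a few lines of Python; the endpoint and payload below are illustrative stand-ins (the actual scripts are in the repository linked above).

# Constant ramp-up: the order rate grows by 4 orders/sec every second,
# capped at 120 orders/sec.
import time
import urllib.request

ENDPOINT = "http://localhost:8080/orders"  # illustrative endpoint
ACCEL, RATE_CAP = 4.0, 120.0               # orders/sec^2, orders/sec

def submit_order() -> None:
    req = urllib.request.Request(ENDPOINT, data=b'{"order": "new"}', method="POST")
    urllib.request.urlopen(req)

start = time.time()
while True:
    rate = min(RATE_CAP, ACCEL * (time.time() - start))  # current target rate
    if rate < 1e-3:
        time.sleep(0.01)  # not ramped up yet
        continue
    submit_order()
    time.sleep(1.0 / rate)  # inter-arrival gap at the current rate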
9. Invocation Overhead of forward-1
• Scaling delay: the serverless framework needs time to detect workload changes and schedule resources accordingly
• Cold start: it takes time to allocate and initialize new function sandboxes (both effects can be measured as sketched below)
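Both effects can be observed directly by timing a first (cold) invocation against subsequent warm ones; a minimal sketch, assuming the function is reachable over plain HTTP at an illustrative URL:

import time
import urllib.request

URL = "http://forward-1.default.example.com"  # illustrative Knative route

def timed_call_ms() -> float:
    start = time.perf_counter()
    urllib.request.urlopen(URL).read()
    return (time.perf_counter() - start) * 1000

cold = timed_call_ms()                           # likely includes sandbox startup
warm = sorted(timed_call_ms() for _ in range(10))
print(f"cold: {cold:.0f}ms, warm median: {warm[5]:.0f}ms")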
[Figures: latency distribution; demand vs. system capacity.]
10. Autoscaler behaviors over the Sequence
[Figures: scale-up delay propagation; latency delay propagation.]
- The serverless platform is not aware of the sequence topology
- The overhead is amplified as computation propagates through the topology
11. Problem and Approach
• Can we leverage the serverless model to define and enforce a new set of SLAs/SLOs to support fast translation from business logic to application solutions?
[Figure: the SLO enforcement loop. 1. Define: business + application logic produce an application + SLO description; 2. Deploy: the description is handed to the SLO enforcer and deployed; 3. Run: the application serves the workload; 4. Feedback: performance metrics flow back to the enforcer; 5. Config: the enforcer reconfigures the deployment; 6. Evaluate: the outcome is checked against the SLO.]
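Read as code, the loop above is a simple feedback controller. The sketch below is our schematic rendering only; deployment, slo, and plan_config are placeholders, not the enforcer's actual API.

import time

def enforce(slo, deployment, interval_s: float = 30.0) -> None:
    deployment.deploy()                         # 2. Deploy the described application
    while True:
        time.sleep(interval_s)                  # 3. Run against the live workload
        metrics = deployment.collect_metrics()  # 4. Feedback with performance metrics
        if not slo.is_met(metrics):
            config = plan_config(slo, metrics)  # 5. Config: pick pod size/reservation
            deployment.apply(config)
        # 6. Evaluate: the next iteration observes the effect of the change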
12. Application Description: Open Application Model (OAM)
• "Open Application Model (OAM) is a runtime-agnostic specification for defining cloud native applications and enable building application-centric platforms by natural. Focused on application rather than container or orchestrator, Open Application Model brings modular, extensible, and portable design for building application centric platforms on any runtime systems like Kubernetes, cloud, or IoT devices." *
• Provides a basis for translating business goals into an application design and description, including:
• Computation components (e.g., microservices)
• Topology
• Deployment environment
• We extend OAM for defining and enforcing SLOs by (see the sketch below):
• Adding serverless (Knative deployment) description support
• Adding SLO descriptions
*OAM see https://github.com/oam-dev/spec
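To make the extension concrete, here is a hedged sketch of an application description with an attached SLO, written as a Python dictionary; the structure and field names are our illustrative assumptions, not the OAM specification or a final schema.

# OAM-style description extended with serverless deployment and SLO sections.
application = {
    "name": "reefer-shipment",
    "components": [  # computation components, deployed as Knative services
        {"name": "preprocess", "runtime": "knative"},
        {"name": "find-container", "runtime": "knative"},
        {"name": "schedule-shipment", "runtime": "knative"},
    ],
    "topology": [  # the sequence the enforcer should be aware of
        ("preprocess", "find-container"),
        ("find-container", "schedule-shipment"),
    ],
    "slo": {  # business goal encoded as a measurable objective
        "condition": {"max_input_rate_per_sec": 120},
        "objective": {"latency_percentile": 95, "latency_threshold_ms": 1500},
    },
}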
14. Control Invocation Overhead with Knative
• Cold start: multiplex multiple invocations into big pods (i.e., containers) to reduce the number of cold starts
• Scaling delay: reserve extra resources to buy time when the workload surges
Note: 400m+1 = a pod size of 400m with a reservation of 400m (1 pod); see the Knative sketch below.
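In Knative terms, multiplexing and reservation map onto per-revision settings: containerConcurrency packs invocations into a pod, and the autoscaling.knative.dev/minScale annotation keeps warm pods in reserve. Both settings are real Knative fields; the concrete values and image name below are illustrative assumptions, not the enforcer's actual output.

# A Knative Service manifest (as a Python string) realizing both knobs:
KNATIVE_SERVICE = """
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: forward-1
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"  # reservation: keep 1 warm pod
    spec:
      containerConcurrency: 10  # multiplex up to 10 in-flight invocations per pod
      containers:
        - image: example.registry/forward-1  # illustrative image
          resources:
            requests:
              cpu: 400m  # the "400m" pod size from the note above
"""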
15. Topology Awareness Configuration with Knative
• Select the right pod size
• Select the right reservation
- Increasing pod size doesn't always improve concurrency significantly
- Increasing the workflow size increases the per-component deployment cost
16. Demonstration
• Naïve serverless vs. Serverless + SLO enforcement
• Reuse the previous workload, application logic, and system setup
• SLO: as long as the input rate < 120 orders per sec:
• 95th percentile of end-to-end latency < 1500ms
• Recorded Videos
• LINK TO BE ADDED
18. Results: Latency distribution
[Figures: latency distribution under plain serverless deployment vs. serverless + SLO enforcer deployment.]
- Plain serverless: extremely high latency due to scaling lag and topology unawareness
- SLO enforcer: successfully meets the SLO requirements by choosing the right pod size and reservation
19. Results: Earning
• Cost is calculated based on IBM container pricing*, at 0.000034 USD/second/core
* https://cloud.ibm.com/kubernetes/catalog/about
Per-request income:
End-to-end latency (ms)   Income ($)
< 500                     0.0000012
< 1000                    0.0000011
< 1500                    0.000001
Otherwise                 0
The SLO enforcer meets the SLO at reasonably low cost, thereby creating high earnings and satisfying the business goal (see the calculation sketch below).
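Combining the pricing rate with the income table, a per-run earning calculation might look like the sketch below (the request mix at the end is made-up sample data).

# Earning = income from served requests minus compute cost.
COST_PER_CORE_SEC = 0.000034  # USD/second/core, the IBM rate cited above

def income(latency_ms: float) -> float:
    if latency_ms < 500:
        return 0.0000012
    if latency_ms < 1000:
        return 0.0000011
    if latency_ms < 1500:
        return 0.000001
    return 0.0

def earning(latencies_ms: list[float], cores: float, seconds: float) -> float:
    revenue = sum(income(l) for l in latencies_ms)
    return revenue - COST_PER_CORE_SEC * cores * seconds

# Example: 100,000 requests at ~400ms served by 2 cores for 60 seconds.
print(earning([400.0] * 100_000, cores=2, seconds=60))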
20. Related Work
• Handling invocation overhead
• Cold start: SAND (ATC '18), SOCK (ATC '18), Catalyzer (ASPLOS '20)
• Scaling: Shahrad et al. (ATC '20)
• Per-function optimization, no topology support
• Topology-aware deployment for serverless: IBM Composer, AWS Step Functions
• Simple topologies (sequence + parallel), no performance guarantee
• Performance guarantees for serverless: Real-time Serverless (WoSC '19)
• Rate guarantee but no topology support
21. Conclusion and Future Vision
• Serverless opens opportunities to quickly build and adjust software solutions to business goals, but many challenges arise:
• Scaling overhead
• Lack of topology awareness
• We propose a Serverless SLO interface to describe and enforce business goals in terms of SLOs
• Long-term vision:
• Support more complicated workload topologies
• Efficient SLO enforcement (smarter metric selection, ML approaches, etc.)
• A generic mechanism for all serverless platforms (not just Knative)
22. Coming soon…
• Will be available soon:
• Blog post
• Demonstration scripts
• … and slides, demo recording, and talk recording, posted after the talk