This talk, held at WJAX 2019, shows how Knative changes the serverless landscape based on customer requirements. It explains how Knative autoscaling works and how Knative can help implement customer requirements. It also discusses weaknesses of the technology.
How Knative Changes the Serverless Landscape
1. A look behind the scenes,
and how Knative changes the
serverless landscape
(following charts in english)
Jeremias Werner | Senior Software Developer | IBM
2. Who I am…
Jeremias Werner @JereWerner
Jeremias Werner
Senior Software Developer
IBM Research & Development
jerewern@de.ibm.com
LinkedIn: jeremias-werner
Twitter: JereWerner
Opinions are my own!
6. Content
What is Serverless?
What is Serverless good for?
What is Knative?
How does Knative work behind the scenes?
How does Knative change the serverless landscape?
• Positioning
• Value Proposition
7. An evolution of compute!
Break up the monolith
– Inherent scaling
– Better resource utilization
– Reduced costs
Abstraction of infrastructure
– Devs focus on code, not infrastructure
– Faster time to market = $
[Figure: compute spectrum (Bare Metal, Virtual Machines, Containers, Apps/PaaS, Serverless, Functions) with increasing focus on business logic and decreasing concern (and control) over stack implementation]
8. Value Proposition
• No management and operation of infrastructure
• Focus on developing value-adding code and on driving innovation
• Transparently scales with the number of requests being served
• Only pay for resources being used, instead of resources idling around
9. This is the price of 1 GB allocated for 1s of
IBM Cloud Functions
$0.000017
10. Traditional Model
Worry about when and how to scale
Worry about resiliency & cost
Charged even when idling / not 100% utilized
Continuous polling due to missing event
programming model
11. Serverless Model
Scales inherently: one process per request
No cost overhead for resiliency
Introduces event programming model
Charges only for what is used
Only worry about code
higher dev velocity, lower operational costs
12. Demo - A FaaS experience!
13. Container cold start is in the ballpark of...
~200ms
14. Content
What is Serverless?
What is Serverless good for?
What is Knative?
How does Knative work behind the scenes?
How does Knative change the serverless landscape?
• Use Cases
• Customer Stories
• Demo
15. Some customers
Many more in-prod
customer projects
across numerous
industries
incl. automotive,
banking, insurance,
entertainment, retail,
manufacturing, etc.
List of customers has been removed!
16. Serverless is more than FaaS
While FaaS is the key anchor point for serverless,
there is a growing set of services from other
domains also delivering serverless attributes
This enables customers to build application
topologies which are entirely serverless
Build your serverless architecture!
17. Common Use Cases
• Serverless API backends / microservices
• Mobile backends
• Conversational applications
• Scheduled tasks
• Massively parallel compute / “map” operations
• Parallel data processing
• Data-at-rest processing & ETL pipelines
• Data processing enriched with cognitive capabilities
• Event stream processing
• IoT
18. Serverless API backend
Allows mapping API endpoints
to functions
[Figure: Client → API Gateway → FaaS → DBaaS]
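The endpoint-to-function mapping can be sketched as a tiny handler. This is an illustrative sketch only: the `main(params)` signature follows the IBM Cloud Functions Python convention, and the in-memory "database" is a hypothetical stand-in for the DBaaS lookup.

```python
# Sketch of a FaaS handler that an API gateway could map to an
# endpoint such as GET /users/{id}. FAKE_DB stands in for a DBaaS.
FAKE_DB = {"42": {"name": "Ada", "plan": "pro"}}  # placeholder data

def main(params: dict) -> dict:
    """Entry point in the IBM Cloud Functions Python action style."""
    user_id = params.get("id")
    user = FAKE_DB.get(user_id)
    if user is None:
        return {"statusCode": 404, "body": {"error": "not found"}}
    return {"statusCode": 200, "body": user}
```

The gateway would pass path and query parameters in `params`; the returned dict becomes the HTTP response.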
19. Mobile Backend
Remember the value
proposition!
If there is no request,
nothing to run and no
charges occur!
20. Data Processing
Ideally suited for working with structured data, text,
audio, image and video data:
• Data enrichment, transformation, validation,
cleansing
• PDF processing
• Audio normalization
• Image rotation, sharpening, noise reduction
• Thumbnail generation
• Image OCR’ing
• Video transcoding
[Figure: image-tagging example (“Elephant”, “Animal”, “Sign”); FaaS and DBaaS]
25. ESPN Fantasy Football has
10+ million daily
active users
26. Weather radar video processing
and thumbnail generation
Periodic trigger to scan
weather data from
object storage
Functions to generate
thumbnails and
animated gifs
(https://www.wunderground.com/maps)
27. Demo
Map and Reduce the values of a column in 288 CSV
files with a total of 360 million rows stored on cloud
object storage to generate a histogram
pywren.io + knative.dev
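The demo's map/reduce pattern can be sketched in plain Python. This is a toy stand-in for what PyWren parallelizes across FaaS invocations: each "map" call would run as one invocation over one CSV file, and "reduce" merges the partial histograms. The bin width and sample values are made up for illustration.

```python
# Toy sketch of the histogram demo's map/reduce structure.
from collections import Counter

BIN_WIDTH = 10  # hypothetical histogram bin width

def map_chunk(values):
    """Histogram one chunk of column values (one CSV file's worth)."""
    return Counter((v // BIN_WIDTH) * BIN_WIDTH for v in values)

def reduce_histograms(partials):
    """Merge per-chunk histograms into the final histogram."""
    total = Counter()
    for p in partials:
        total.update(p)
    return total

chunks = [[1, 5, 12], [13, 27, 8]]  # stand-in for 288 CSV files
histogram = reduce_histograms(map_chunk(c) for c in chunks)
# histogram == {0: 3, 10: 2, 20: 1}
```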
28. PyWren Rocks!
Number of forecasts   Local run       FaaS
100,000               ~10,000 secs    ~140 secs
29. Content
What is Serverless?
What is Serverless good for?
What is Knative?
How does Knative work behind the scenes?
How does Knative change the serverless landscape?
• Knative
• Knative Serving in Detail
• Demo
30. “Open source building blocks for
Serverless on Kubernetes”
https://knative.dev/
31. What is Knative?
Run serverless containers, apps and functions on
Kubernetes with ease
Knative takes care of the details of
• networking,
• autoscaling (+ from-zero, to-zero)
• revision tracking
You just have to focus on your core logic
Simplified UX on top of Kubernetes
https://knative.dev/docs/
32. Main Components
Serving is the runtime
component that hosts
and scales your
application as K8s
pods
Tekton (formerly Build) provides
Kubernetes building blocks to
run pipelines that create
images from source; now part of
the CD Foundation
Eventing contains
tools for managing
events between
loosely coupled
services
Client owns the kn CLI to
manage knative resources
https://tekton.dev
34. Serving
Service – The top-level resource that controls the
deployment and life-cycle of the workload
Route – The externally visible endpoint of the
service; routes traffic to individual revisions
Configuration – Describes the current desired state
of the deployment
Revision – Created for each modification of the
configuration; reflects a point-in-time
configuration of the deployment
https://knative.dev/docs/serving/
35. How Serving works
1. Deploy app as pod/revision
2. Networking auto-setup
3. Revisions are scaled up/down based on load
4. Updates of the Service create new Revisions
5. Traffic splitting based on %
6. Dedicated URLs to Revisions
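Percentage-based traffic splitting (step 5) is declared on the Service itself. A minimal sketch, assuming two existing revisions whose names (`helloworld-v1`, `helloworld-v2`) are illustrative:

```yaml
# Sketch: route 90% of traffic to the previous revision
# and 10% to the new one (revision names are hypothetical).
apiVersion: serving.knative.dev/v1beta1
kind: Service
metadata:
  name: helloworld
spec:
  template:
    spec:
      containers:
      - image: jerewern/helloworld
  traffic:
  - revisionName: helloworld-v1
    percent: 90
  - revisionName: helloworld-v2
    percent: 10
```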
38. Content
What is Serverless?
What is Serverless good for?
What is Knative?
How does Knative work behind the scenes?
How does Knative change the serverless landscape?
• How Knative works
• Revisit the customer requirements
• Understand scaling behaviour
• Capacity considerations
39. The API Specification
containers: Kubernetes container spec but only
1 container is allowed
containerConcurrency: The maximum number
of concurrent requests being handled by a single
container instance. Basis for scaling.
resources: The requested and limited memory
and cpu resources of the container
apiVersion: serving.knative.dev/v1beta1
kind: Service
metadata:
  name: helloworld
spec:
  template:
    spec:
      containerConcurrency: 10
      containers:
      - image: jerewern/helloworld
        resources:
          limits:
            memory: 256Mi
            cpu: 2000m
          requests:
            memory: 128Mi
            cpu: 100m
https://github.com/knative/docs/blob/master/docs/serving/spec/knative-api-specification-1.0.md
40. The Request Flow (simplified)
1. Istio Ingress Gateway is configured by the
Route and terminates the request
2. Ingress Gateway forwards the requests to the
Activator
3. Activator buffers requests when scaled to zero
or when in burst mode
4. Queue Proxy is terminating the request in the
service pod and forwards the request to the
user container
5. Autoscaler is scraping metrics from the
Activator and Queue Proxies and scales the
Deployment
(assuming Istio sidecar injection is disabled)
41. Think big! A customer requirement
~ 1s response
guarantee
43. The Problem Statement
The Pod startup depends on a couple of factors,
like:
– Size of the container image which might need
to be pulled
– Creation and startup of user container, queue
proxy and (optionally) istio sidecar
– Process startup and waiting for readiness
– Network namespace setup for the Pod
– Network setup in k8s and making the Pod
available in the deployment and service
– Load on the worker node machine
[Figure: pod-startup timeline — image pull, pod creation, container startup, cluster network setup]
44. It‘s in the ballpark of...
(do you remember the FaaS experience from above?)
~3-5s
overhead
45. Possible Knative
Improvements
Discussions in the Knative community
– Improve load-balancing in activator
– Do not wait for readiness probe
– Do not wait for Pod being reachable behind
ClusterIP and address the Pod directly
– Get rid of the queue-proxy side-car container
– Write a custom kubelet or use virtual kubelet
– Pre-warming images for specific runtimes, i.e.
nodejs, … and only inject code!
https://docs.google.com/document/d/1Jdd8eu3cJRvCVkl8Y48Fg3fVY6Bpp7Tv8HshSg58dOg/edit#
46. What the user can do...
As a user,
– Use light-weight frameworks for your
application container and ensure fast startup
times, like Quarkus
– Small container image and reduce #layers
(GraalVM)
– Find the right container concurrency > 1
– Configure min replicas to avoid scale to zero
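Configuring min replicas (the last point) is done with an annotation on the revision template. A minimal sketch; the value of "1" is illustrative:

```yaml
# Sketch: keep at least one replica warm to avoid
# scale-to-zero cold starts.
apiVersion: serving.knative.dev/v1beta1
kind: Service
metadata:
  name: helloworld
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"
    spec:
      containerConcurrency: 10
      containers:
      - image: jerewern/helloworld
```

Note that a warm minimum replica trades idle cost for latency, which is exactly the serverless pay-per-use value proposition being relaxed.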
47. Think big! A customer requirement
1 petabyte
image data
49. Revisit the Pywren Scenario
Execution with 100 requests in parallel
took 128s for ~360 million records and
7.7GB
[Chart: actual vs desired pods over time]
50. Panic and Stable Mode
6 s panic window
60 s sliding stable window
Goal: average of 70% of container concurrency in the sliding window
Panic when observed concurrency > 200% of desired concurrency
Metrics of observed concurrency are checked every 2 s (tick interval)
[Chart annotation: Panic Mode = True]
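The scaling rules above can be sketched in a few lines. This is a simplification of Knative's concurrency-based autoscaler, not its actual implementation; the constants mirror the defaults on the slide, and the function names are illustrative.

```python
import math

# Simplified sketch of concurrency-based scaling as described above.
TARGET_UTILIZATION = 0.70   # aim for 70% of containerConcurrency
PANIC_THRESHOLD = 2.0       # panic if observed > 200% of desired capacity

def desired_pods(observed_concurrency, container_concurrency, current_pods):
    """Return (desired pod count, panic flag) for one evaluation tick."""
    target_per_pod = container_concurrency * TARGET_UTILIZATION
    desired = math.ceil(observed_concurrency / target_per_pod)
    panic = observed_concurrency > PANIC_THRESHOLD * current_pods * target_per_pod
    if panic:
        # In panic mode the autoscaler only scales up, never down.
        desired = max(desired, current_pods)
    return desired, panic
```

For example, 1000 req/s at 100 ms each yields 100 concurrent requests; with containerConcurrency 40 the sketch gives ceil(100 / 28) = 4 pods.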
52. Activator as buffer!
52
”Proxy Mode” – if …
a) … the spare capacity, i.e. the number of
requests the pods could handle, is smaller
than the target burst capacity
b) … scaled to zero
“Serve Mode” – otherwise
Proxy vs Serve
https://docs.google.com/document/d/1Jdd8eu3cJRvCVkl8Y48Fg3fVY6Bpp7Tv8HshSg58dOg/edit#
[Diagram: Activator in proxy mode when spare capacity < threshold]
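The proxy-vs-serve decision can be sketched as a predicate. The variable names here are illustrative, not Knative's internals:

```python
# Sketch of the activator's mode decision described above.
def activator_mode(ready_pods, container_concurrency, in_flight,
                   target_burst_capacity):
    """Return "proxy" when the activator should stay on the data path."""
    spare_capacity = ready_pods * container_concurrency - in_flight
    if ready_pods == 0 or spare_capacity < target_burst_capacity:
        return "proxy"   # activator buffers and forwards requests
    return "serve"       # activator steps out; traffic hits pods directly
```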
53. Think big! A customer requirement
1 million
req/min peak
55. Testing autoscaler stability and precision
Produce a constant rate of N req/s
Higher container concurrency
Test:
– 1000 req/s
– 100 ms duration of each request
– container concurrency is 40
– 5 minutes test run
Actual vs Expected:
• number of scale-ups and scale-downs
• number of pods scaled
• success and error rates
• latency
Expected:
(1000 req/s × 0.1 s) / (40 req × 70%) ≈ 3.5 pods
100 ms latency
100% success
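The expected pod count follows directly from the test parameters, as a quick worked check:

```python
# Worked check of the expected pod count from the test parameters.
req_per_s = 1000
duration_s = 0.1
container_concurrency = 40
utilization = 0.70

in_flight = req_per_s * duration_s            # 100 concurrent requests
pods = in_flight / (container_concurrency * utilization)
# pods ≈ 3.57, matching the ~3.5 pods expected on the slide
```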
56. It works as expected!
[Chart annotations: initial wave buffered by the Activator in proxy mode; panic, scale-out to 100 pods; stabilizes at 4 pods; 110 ≈ 40 req × 0.7 × 4 pods;
(1000 req/s × 0.1 s) / (40 req × 70%) ≈ 3.5 pods]
57. Scaling matters...
1. Understand the components on the critical path
• The Activator is increasingly part of the
critical path (e.g., new in the 0.9 release)
2. Identify bottlenecks
• Consider network bandwidth
3. Scale horizontally and vertically
• Istio is CPU-hungry and requires ~0.5 vCPU per 1k
req/s
58. We could easily scale up to...
140k
req/s
59. Think big! A customer requirement
7 TB of
memory
61. Let's talk a bit about resources
and placement
62. Remember
resources: The requested and limited memory
and cpu resources of the container
apiVersion: serving.knative.dev/v1beta1
kind: Service
metadata:
  name: helloworld
spec:
  template:
    spec:
      containerConcurrency: 10
      timeoutSeconds: 600
      containers:
      - image: jerewern/helloworld
        resources:
          limits:
            memory: 256Mi
            cpu: 2000m
          requests:
            memory: 128Mi
            cpu: 100m
https://github.com/knative/docs/blob/master/docs/serving/spec/knative-api-specification-1.0.md
63. Placement
Placement is done based on the resource requests
values for Memory and CPU
If a pod cannot be placed, the node autoscaler
kicks in and provisions additional nodes
https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/
32/32 GB = 32 GB requests / 32 GB limit
64. Resource Limits and Quota
Containers can have resource limits assigned
If the container reaches the limit
• Memory limit → kill
• CPU limit → throttle
Note: Resource limits count for the resource quota
given to a k8s namespace
https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/
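A namespace quota that the revisions' limits count against can be sketched as a standard Kubernetes ResourceQuota. The namespace name and the numbers here are illustrative:

```yaml
# Sketch: a namespace ResourceQuota that Knative revisions'
# resource requests and limits count against.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: serving-quota
  namespace: my-apps        # hypothetical namespace
spec:
  hard:
    requests.cpu: "16"
    requests.memory: 32Gi
    limits.cpu: "32"
    limits.memory: 64Gi
```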
65. Think big! A customer requirement
20 minutes
model training
66. This is what you get if a request takes 10 min
504 Gateway
Timeout
68. We would need something
asynchronous...
1. Client should NOT need to wait for the request
to finish
2. Client should be able to submit non-blocking
requests that run asynchronously
3. Client should be able to query the state of the
request in order to check whether it
succeeded or failed
4. System should NOT scale down the pods while
a request is running asynchronously
5. System should count an asynchronous request
toward container concurrency
https://docs.google.com/document/d/11Fryfns-KQL6JXfNG9TyMsh3MDVF_gFGT2sl9bOpc_o/edit
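The submit/poll pattern these requirements describe can be sketched as follows. This is a toy in-memory illustration, not a Knative API; all names (`submit`, `status`, `JOBS`) are hypothetical:

```python
# Toy sketch of the async submit/poll pattern: the client submits a
# non-blocking job and queries its state later.
import threading
import time
import uuid

JOBS = {}  # job_id -> "running" | "succeeded" | "failed"

def submit(work):
    """Start `work` in the background and return immediately."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = "running"
    def run():
        try:
            work()
            JOBS[job_id] = "succeeded"
        except Exception:
            JOBS[job_id] = "failed"
    threading.Thread(target=run).start()
    return job_id

def status(job_id):
    """Client-side poll: was the request successful or did it fail?"""
    return JOBS[job_id]

job = submit(lambda: time.sleep(0.1))  # e.g. a long model training
```

Requirements 4 and 5 (keeping pods alive and counting the job toward container concurrency) would have to be handled by the platform itself, which is exactly what the linked proposal discusses.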
70. Content
What is Serverless?
What is Serverless good for?
What is Knative?
How does Knative work behind the scenes?
How does Knative change the serverless landscape?
71. Serverless is now more!
Run serverless
containers, apps and
functions on
Kubernetes with ease
That’s why it changes
the serverless
landscape!
[Figure: the compute spectrum again (Bare Metal, Virtual Machines, Containers, Apps/PaaS, Functions), with Serverless now spanning more of the stack]
72. Traditional 12-factor app
With Knative you can run your traditional application
and container in a serverless fashion. Easy lift-and-
shift.
Knative scales your app and container from-zero
and to-zero
Scale by number of requests instead of
CPU/Memory
73. Portability!
Knative runs where Kubernetes runs!
Knative brings serverless to Kubernetes
Operators and developers can leverage the same
infrastructure, tools and skills as for Kubernetes
74. Strength and Weakness of
Knative
Best suited for high-volume request-response
workloads, allowing much higher throughput than
traditional FaaS
Scaling based on requests and concurrency instead
of memory/CPU
Allows higher memory limits than traditional
FaaS services (AWS Lambda, Azure Functions, ...)
Designed to handle bursty workloads, but reacts
slowly due to the asynchronous feedback loop
Short critical path allows very low latency for
“warm” invocations, i.e. ~1 ms
Lags behind existing FaaS services on container
cold-start time (seconds vs. milliseconds)
No support for long-running invocations
75. A lot of work to do...
Help
wanted!
https://github.com/knative
76. Or you want to...
Try it
out?
77. It‘s available as a managed
service in IBM Cloud