4. Traditional Approach
• Copying and context switches between kernel space and user space → poor performance!
[Figure: data moves from the NIC buffer to the kernel read buffer (fast), then is copied across the kernel/application context boundary (slow) into the application buffer, and again into the socket buffer.]
Slide Source: Distributed Systems Course 2011/2012 by Dennis Pfisterer, Institute of Telematics, University of Lübeck, Germany
5. Zero-Copy Approach
• Kernel handles the copy process via Direct Memory Access (DMA)
– No CPU load
– Lower load on bus system
– No copying between kernelspace and userspace
[Figure: the NIC buffer, kernel read buffer, and socket buffer are connected via fast DMA transfers entirely within the kernel context; the application buffer in the application context receives the data without a copy through user space.]
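A minimal sketch of the zero-copy idea in Java (not from the slides; file and class names are illustrative): FileChannel.transferTo() asks the kernel to move the bytes directly from one channel to another, so on operating systems that support it (e.g. via sendfile) no copy through a user-space buffer occurs.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: copies a file without routing its bytes
// through application memory.
public class ZeroCopyFileDemo {

    // transferTo() lets the kernel move the bytes directly (via
    // sendfile/DMA where supported) from source to destination channel.
    public static String transferToCopy() {
        try {
            Path src = Files.createTempFile("zc-src", ".bin");
            Path dst = Files.createTempFile("zc-dst", ".bin");
            Files.write(src, "hello zero-copy".getBytes(StandardCharsets.UTF_8));

            try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
                 FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
                long pos = 0, count = in.size();
                while (pos < count) {          // transferTo may move fewer bytes than asked
                    pos += in.transferTo(pos, count - pos, out);
                }
            }
            return new String(Files.readAllBytes(dst), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transferToCopy());
    }
}
```

The same call is what frameworks use under the hood to serve static files to sockets without CPU-driven copies.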
6. Simple Benchmark: Copy vs. Zero-Copy
[Chart: transfer duration [ms] vs. data size [Mbyte] for the copy-based vs. zero-copy approach.]
7. Zero-Copy Between Communication Layers
• Often, copying is not necessary
– If the data is not modified, a slice can be passed forward without copying it into a different buffer
[Figure: the same Ethernet | IP | TCP | HTTP | XML frame is passed up through the link, internet, transport, and application layers; each layer works on its own slice of the single buffer instead of copying it.]
8. Zero-Copy Between Communication Layers
• Sometimes slices of multiple packets can be combined, e.g. to extract a payload that is split over multiple packets
• The newly “created” buffer points to the original buffers → no copying necessary
[Figure: a virtual buffer presents HTTP (Part 1) and HTTP (Part 2) as one contiguous payload, while the received TCP buffers remain untouched.]
10. Request Processing in Multi-Thread Servers
[Sequence diagram: thread t1 blocks in S1.accept() on a ServerSocket, creates a Socket s2 and a new thread t2 for the connection, then loops back to accept(). t2 repeatedly waits for data, reads bytes, creates a Decoder d1, and calls decode(bytes) until a complete request is available; it then creates a Servlet s1, calls processRequest(req), queries the database db1, awaits the results, and writes the response. The shaded regions show that each thread is idle and waits most of the time without doing actual work.]
11. Request Processing in Multi-Thread Servers
• Usually one thread per request
– The thread is idle most of the time (e.g. waiting for I/O)
– It is even more idle when the network is slow
– The number of simultaneous clients is mostly limited by the maximum number of threads
• Thread construction is expensive
– Higher latency when constructing a thread per request
– Can be improved using pools of threads (see Java’s ExecutorService & Executors classes)
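A brief sketch of the thread-pool mitigation mentioned above, using the stdlib ExecutorService (the task body is a hypothetical stand-in for real request handling):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: a fixed pool of worker threads serves many
// "requests" without paying thread-construction cost per request.
public class ThreadPoolDemo {

    public static int handleRequests(int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // 4 reusable workers
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final int id = i;
                // Instead of "new Thread(...)" per request, submit a task to the pool.
                results.add(pool.submit(() -> id * 2)); // stand-in for request handling
            }
            int sum = 0;
            for (Future<Integer> f : results) sum += f.get(); // wait for all responses
            return sum;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(handleRequests(10)); // 2*(0+1+...+9) = 90
    }
}
```

Note that the pool only amortizes construction cost; each in-flight request still occupies a worker thread while it blocks.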
12. Request Processing in Event-Driven Servers
[Sequence diagram: an NioWorker io1 serves two sockets s1 and s2 with interleaved requests. On each dataAvailable() event it reads the available bytes and dispatches handleEvent(socket, bytes) to an ExecutorThread e1, which creates a Decoder per connection (d1, d2) and accumulates bytes across events until decode(bytes) yields a complete request; processRequest(req) then produces a response, which is written back to the corresponding socket. No thread blocks waiting for a single connection.]
Disclaimer: this diagram may contain errors and is far from real implementation code, but it serves well for illustrative purposes.
13. Request Processing in Event-Driven Servers
• Calls to I/O functions of OS are non-blocking
• Heavy usage of zero-copy strategies
• Everything is an event
– Data available for reading
– Writing data
– Connection established / shut down
• Different requests share threads
• Work is split into small tasks
– Tasks are solved by worker threads from a pool
– Thread number and control decoupled from individual
connections / simultaneous requests
• Number of simultaneous clients can be very high
– Netty: 50,000 on commodity hardware!
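The non-blocking "data available" event at the heart of this model can be sketched with the stdlib Selector API. To keep the example self-contained it uses a java.nio Pipe in place of a network socket (a hypothetical substitution; real servers register SocketChannels the same way):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of one event-loop iteration: the Selector blocks
// until some registered channel has data; only then is a read issued.
public class EventLoopDemo {
    public static String runOnce() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            pipe.source().configureBlocking(false);               // non-blocking I/O
            pipe.source().register(selector, SelectionKey.OP_READ);

            // Simulate a peer sending data.
            pipe.sink().write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)));

            StringBuilder received = new StringBuilder();
            selector.select();                                    // wait for an event
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isReadable()) {                           // "data available" event
                    ByteBuffer buf = ByteBuffer.allocate(64);
                    ((Pipe.SourceChannel) key.channel()).read(buf);
                    buf.flip();
                    received.append(StandardCharsets.UTF_8.decode(buf));
                }
            }
            selector.selectedKeys().clear();
            pipe.sink().close();
            pipe.source().close();
            return received.toString();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

One selector thread can multiplex thousands of such channels, which is why client counts are decoupled from thread counts.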
15. Introduction to Netty
• “The Netty project is an effort to provide an asynchronous event-driven network application framework for rapid development of maintainable high-performance protocol servers & clients.”
Source: http://netty.io
• Good reasons to use Netty:
– Simplifies development
– Amazing performance
– Amazing documentation (tutorials, JavaDocs), clean concepts
– Lots of well-documented examples
– Unit testability for protocols
– Heavily used, e.g. at Twitter, for performance-critical applications
17. Introduction to Netty - Buffers
• Netty uses a zero-copy strategy for efficiency
• Primitive byte[] arrays are wrapped in a ChannelBuffer
• Simple read/write operations, e.g.:
– writeByte()
– writeLong()
– readByte()
– readLong()
– …
• Hides complexities such as byte order
• Uses low-overhead reader/writer index pointers internally
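For comparison, the stdlib ByteBuffer maintains a single position index, so switching from writing to reading requires an explicit flip(); Netty's ChannelBuffer keeps separate reader and writer indexes and so avoids this. A small illustrative sketch of the index behavior:

```java
import java.nio.ByteBuffer;

// Illustrative sketch (stdlib analogue, not Netty's ChannelBuffer):
// sequential writes then reads over the same buffer.
public class BufferIndexDemo {
    public static long roundTrip() {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.put((byte) 42);      // writes advance the (single) position index
        buf.putLong(123456789L);
        buf.flip();              // switch from writing to reading; Netty's separate
                                 // readerIndex/writerIndex makes this step unnecessary
        byte b = buf.get();      // reads come back in write order
        long l = buf.getLong();
        return b + l;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip()); // 42 + 123456789 = 123456831
    }
}
```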
18. Introduction to Netty - Buffers
• Combine & slice ChannelBuffers without copying any payload data, e.g. via
– ChannelBuffer.slice(int index, int length)
– ChannelBuffers.wrappedBuffer(ChannelBuffer... buffers)
• Pseudo-code example:
requestPart1 = buffer1.slice(OFFSET_PAYLOAD,
    buffer1.readableBytes() - OFFSET_PAYLOAD);
requestPart2 = buffer2.slice(OFFSET_PAYLOAD,
    buffer2.readableBytes() - OFFSET_PAYLOAD);
request = ChannelBuffers.wrappedBuffer(requestPart1, requestPart2);
[Figure: the virtual buffer returned by wrappedBuffer() presents HTTP (Part 1) and HTTP (Part 2) as one contiguous payload; the received TCP buffers are left untouched.]
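A runnable stdlib analogue of the pseudo-code above (the stdlib has no composite wrappedBuffer(), so the two slices are simply read in sequence; the 4-byte header and all names are hypothetical): ByteBuffer.slice() shares the backing array, so skipping the headers copies nothing.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: reassemble a payload split over two "packets"
// by slicing past each header instead of copying payload bytes.
public class SliceDemo {
    static final int OFFSET_PAYLOAD = 4;   // pretend header length (hypothetical)

    static ByteBuffer payloadSlice(ByteBuffer packet) {
        packet.position(OFFSET_PAYLOAD);   // skip the header...
        return packet.slice();             // ...and view the rest without copying
    }

    public static String reassemble() {
        // Two "received" packets: a 4-byte header plus a payload fragment each.
        ByteBuffer p1 = ByteBuffer.wrap("HDR1GET /index".getBytes(StandardCharsets.UTF_8));
        ByteBuffer p2 = ByteBuffer.wrap("HDR2.html HTTP/1.1".getBytes(StandardCharsets.UTF_8));
        // The slices share the original arrays -- only this final decode copies.
        StringBuilder request = new StringBuilder();
        for (ByteBuffer part : new ByteBuffer[] { payloadSlice(p1), payloadSlice(p2) }) {
            request.append(StandardCharsets.UTF_8.decode(part));
        }
        return request.toString();
    }

    public static void main(String[] args) {
        System.out.println(reassemble()); // GET /index.html HTTP/1.1
    }
}
```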
20. Introduction to Netty - Codecs
• Many protocol encoders/decoders included
– Base64
– Zlib
– Framing/Deframing
– HTTP
– WebSockets
– Google Protocol Buffers
– Real-Time Streaming Protocol (RTSP)
– Java Object Serialization
– String
– (SSL/TLS)
21. Introduction to Netty - Codecs
• Abstract base classes for easy implementation
– OneToOneEncoder
– OneToOneDecoder
• Converts one object (e.g. a ChannelBuffer) into another (e.g. an HttpServletRequest)
– ReplayingDecoder
• The bytes necessary to decode an object (e.g. an HttpServletRequest) may be split over multiple data events
• A manual implementation forces you to check for and accumulate data all the time
• ReplayingDecoder lets you implement decoding methods as if all required bytes had already been received
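To see the bookkeeping that ReplayingDecoder hides, here is a hedged stdlib sketch of the manual approach for a hypothetical length-prefixed protocol (4-byte big-endian length, then payload): every data event must be accumulated and re-checked for completeness.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical manual decoder (decodes a single frame, for illustration):
// accumulates chunks until a complete length-prefixed frame is available.
public class ManualFrameDecoder {
    private final ByteArrayOutputStream acc = new ByteArrayOutputStream();

    // Builds a frame: 4-byte big-endian length prefix + UTF-8 payload.
    public static byte[] frameOf(String payload) {
        byte[] p = payload.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + p.length).putInt(p.length).put(p).array();
    }

    // Returns the decoded payload, or null if more bytes are still needed --
    // exactly the checking-and-accumulating ReplayingDecoder spares you.
    public String feed(byte[] chunk) {
        acc.write(chunk, 0, chunk.length);
        byte[] data = acc.toByteArray();
        if (data.length < 4) return null;                 // length prefix incomplete
        int frameLen = ByteBuffer.wrap(data, 0, 4).getInt();
        if (data.length < 4 + frameLen) return null;      // payload incomplete
        return new String(data, 4, frameLen, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ManualFrameDecoder dec = new ManualFrameDecoder();
        byte[] frame = frameOf("hello");
        // Deliver the frame in two data events, as TCP might.
        System.out.println(dec.feed(java.util.Arrays.copyOfRange(frame, 0, 6)));
        System.out.println(dec.feed(java.util.Arrays.copyOfRange(frame, 6, frame.length)));
    }
}
```

With ReplayingDecoder, the two null-returning paths disappear: the decode method simply reads the length and the payload as if both were already present.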
23. Introduction to Netty – Pipelines & Handlers
• Every socket is attached to a ChannelPipeline
• It contains a stack of handlers
– Protocol encoders / decoders
– Security layers (SSL/TLS, authentication)
– Application logic
– ...
24. Introduction to Netty – Pipelines & Handlers
• One ChannelPipeline per connection
• Handlers can handle
– Downstream events
– Upstream events
– Or both
• Everything is an event
– Data available for reading
– Writing data
– Connection established / shut down
– ...
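The handler-stack idea can be sketched with a deliberately simplified toy class (this is not Netty's real API; names and the String-to-String handler type are illustrative assumptions): handlers are kept in order, and an inbound message passes through each in turn, like an upstream event in a ChannelPipeline.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Function;

// Toy pipeline sketch (not Netty's API): an ordered list of handlers,
// each transforming the message before passing it onward.
public class PipelineDemo {
    private final Deque<Function<String, String>> handlers = new ArrayDeque<>();

    public PipelineDemo addLast(Function<String, String> handler) {
        handlers.addLast(handler);
        return this;
    }

    // An inbound message traverses every handler in order,
    // like an upstream event moving through a ChannelPipeline.
    public String fireUpstream(String msg) {
        String current = msg;
        for (Function<String, String> h : handlers) current = h.apply(current);
        return current;
    }

    public static void main(String[] args) {
        PipelineDemo p = new PipelineDemo()
                .addLast(s -> s.trim())           // stand-in "framing" handler
                .addLast(s -> s.toUpperCase())    // stand-in "decoder" handler
                .addLast(s -> "handled:" + s);    // stand-in application logic
        System.out.println(p.fireUpstream("  ping  ")); // handled:PING
    }
}
```

In Netty, downstream (outbound) events traverse the same stack in the opposite direction, which this sketch omits for brevity.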
25. Netty – ChannelPipeline Example: HTTP(S) Client
• Applications based on Netty are built as a stack
• The application logic sits on top of the channel
• Everything else (decoding, securing, ...) is done inside the pipeline
[Figure: the client application calls write(httpRequest) / read(httpResponse) on the Channel. Inside the ChannelPipeline, an HttpRequestEncoder/HttpRequestDecoder converts between HTTP messages and Strings, a StringEncoder/StringDecoder converts between Strings and ChannelBuffers, and an SSLEncoder/SSLDecoder secures the ChannelBuffers before they reach the OS socket object.]
Disclaimer: this slide is imprecise, may contain errors, and there is no one-to-one implementation; it shows a conceptual view of the Netty pipeline.