3. Application dimension
“OSI is a beautiful dream, and TCP/IP is living it!” - Einar Stefferud
OSI Model: Application, Presentation, Session, Transport, Network, Data Link, Physical
TCP/IP Model: Application (HTTP, DNS, SSH, DHCP, …), Transport (TCP, UDP), Network (IPv4, IPv6, ARP), Data Link (Ethernet)
4. Infrastructure dimension
Management plane: user/operator/tools managing the network infrastructure (UX, CLI, REST-API, SNMP, …)
Control plane: signaling between network entities to exchange reachability states; distributed (OSPF, BGP, gossip-based) or centralized (OpenFlow, OVSDB)
Data plane: actual movement of application data packets (iptables, IPVS, OVS-DP, DPDK, BPF, routing tables, …)
5. Docker networking
Spans the Application, Transport, Network, and Data Link layers across the Management, Control, and Data planes.
• Provides portable application services
• Service-Discovery
• Load-Balancing
• Built-in and pluggable network drivers
• Overlay, macvlan, bridge
• Remote Drivers / Plugins
• Built-in Management plane
• API, CLI
• Docker Stack / Compose
• Built-in distributed control plane
• Gossip based
• Encrypted control and data planes (see the CLI sketch below)
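A minimal CLI sketch of those pieces, with illustrative names and image (not from the deck): the built-in management plane (CLI) creates an overlay network with an encrypted data plane and attaches a replicated service to it.
$ # overlay network; --opt encrypted turns on IPSec for the overlay data plane
$ docker network create -d overlay --opt encrypted demo-net
$ # replicated service attached to that network
$ docker service create --name web --replicas 2 --network demo-net nginx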
7. Application Stack
version: "3"
services:
  web:
    ports:
      - "8080:80"
    networks:
      - frontend
    deploy:
      replicas: 2
  app:
    networks:
      - frontend
      - backend
  db:
    networks:
      - backend
networks:
  frontend:
    driver: overlay
  backend:
    driver: overlay
    driver_opts:
      encrypted: "true"
Stack Deploy
$ docker stack deploy -c d.yml demo
Creating network demo_frontend
Creating network demo_backend
Creating service demo_web
Creating service demo_app
Creating service demo_db
$ docker network ls
NETWORK ID NAME DRIVER SCOPE
n5myqlubepvl demo_backend overlay swarm
4m5e9hn5x0xx demo_frontend overlay swarm
$ docker service ls
ID NAME MODE REPLICAS
69rwee5mbbzm demo_web replicated 2/2
gkwx4z4ksrz1 demo_app replicated 1/1
4m5e9hn5x0xx demo_db replicated 1/1
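The resources reserved by the deploy can be checked right away; a hedged sketch (the format string assumes the standard service API fields, and the vxlan-id shows up among the overlay driver options):
$ # VIP(s) allocated for the web service on each attached network
$ docker service inspect -f '{{json .Endpoint.VirtualIPs}}' demo_web
$ # subnet and driver options (including the vxlan-id) of the backend network
$ docker network inspect demo_backend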
8. Application Stack
$ docker stack deploy -c d.yml demo
Creating service demo_web
Creating service demo_app
Creating service demo_db
Creating network demo_frontend
Creating network demo_backend
Day in the life of a Stack Deploy
• Manager-only operation
• Reserves network resources at the management plane, such as subnet and vxlan-id; no impact on the data plane yet
• Manager reserves service and task resources: service VIP and task IPs
• Tasks scheduled to swarm workers
• Network-scoped service registration on the Docker DNS server (see the lookup sketch after this list)
• Service name -> VIP
• Task name -> task IP
• tasks.<service-name> -> all task IPs
• Exchange SD & LB states via gossip
• Prepare data plane*
• Call driver APIs and exchange driver states via gossip
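A hedged way to see those DNS registrations from inside a running task (the task container name and DNS tooling are illustrative; the image must ship nslookup or dig):
$ # the service name resolves to its VIP
$ docker exec -it demo_web.1.<task-id> nslookup app
$ # tasks.<service-name> resolves to every task IP
$ docker exec -it demo_web.1.<task-id> nslookup tasks.app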
10. De-centralized events
(Diagram: swarm-scope gossip vs. network-scope gossip across workers W1-W5.)
• Eventually consistent
• State dissemination through de-centralized events
• Service registration
• Load-balancer configs
• Routing states
• Fast convergence, ~ O(log n)
• Highly scalable
• Continues to function even if all managers are down
11. State dissemination
(Diagram: Node A creates a state change and broadcasts it to 3 random nodes in the network scope, e.g. Nodes C, D, and E. Those nodes rebroadcast, 9 more nodes receive the rebroadcast, and eventually the entire cluster receives it. A state update is accepted only if the entry's Lamport time is greater than the Lamport time of the existing entry. Each node also does a periodic bulk sync to a random node in the network scope.)
12. (Diagram: Worker1, Worker2, and Worker3 each run an embedded Docker DNS server, reachable from containers via the local DNS resolver at 127.0.0.11. task1.web and task2.web run on Worker1, task1.app on Worker2, task1.db on Worker3.)
demo_frontend overlay network (vxlan-id 4097)
• Service Discovery states: web 10.0.1.4 (vip), app 10.0.1.8 (vip), task1.web 10.0.1.5, task2.web 10.0.1.6, task1.app 10.0.1.9
• Routing states (task IP -> {node, vxlan-id}): 10.0.1.5 :{Worker1,4097}, 10.0.1.6 :{Worker2,4097}, 10.0.1.9 :{Worker2,4097}
demo_backend overlay network (vxlan-id 4098)
• Service Discovery states: db 10.0.2.4 (vip), app 10.0.2.8 (vip), task1.db 10.0.2.5, task1.app 10.0.2.6
• Routing states (task IP -> {node, vxlan-id}): 10.0.2.5 :{Worker3,4098}, 10.0.2.6 :{Worker2,4098}
These states are exchanged between the nodes via gossip.
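A hedged way to view these states on a live node is the verbose form of network inspect, available for swarm-scope networks in recent Docker releases (output layout varies by version):
$ # lists the services, their VIPs, and the tasks/IPs known on this network
$ docker network inspect --verbose demo_frontend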
17. (Same diagram as slide 12: per-network Service Discovery and Routing states on Worker1-Worker3, exchanged via gossip.)
18. Dissecting the DNS lookup
Inside task1.web, /etc/resolv.conf contains: nameserver 127.0.0.11
task1.web resolves "app": the DNS A-record query for "app" is sent to 127.0.0.11, where an iptables DNAT rule for {127.0.0.11, 53} redirects it to the embedded Docker DNS server in the Docker daemon. The DNS server holds the service-discovery table: web 10.0.1.4 (vip), app 10.0.1.8 (vip), task1.web 10.0.1.5, task2.web 10.0.1.6, task1.app 10.0.1.9, task2.app 10.0.1.10.
19. Dissecting the DNS lookup (response)
The Docker DNS server looks up "app" in the same service-discovery table and answers the A-record query with the service VIP: "app" : 10.0.1.8. The response returns to task1.web through the same iptables DNAT path ({127.0.0.11, 53}).
20. Dissecting the DNS-rr lookup
When the service is created with --endpoint-mode=dns-rr, no VIP is allocated; the Docker DNS server answers the A-record query for "app" with all of the task IPs: "app" : [ 10.0.1.9, 10.0.1.10 ]. (Table: web 10.0.1.4 (vip), app 10.0.1.9, 10.0.1.10, task1.app 10.0.1.9, task2.app 10.0.1.10, task1.web 10.0.1.5.)
docker service create --name=app --endpoint-mode=dns-rr demo/my-app
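A hedged check of the two endpoint modes from inside a client task, assuming the image ships dig (addresses echo the tables above and are illustrative):
$ # default VIP mode: the service name resolves to one virtual IP
$ dig +short app
10.0.1.8
$ # with --endpoint-mode=dns-rr: the service name resolves to every task IP
$ dig +short app
10.0.1.9
10.0.1.10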
22. $ docker info
…
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge contiv/v2plugin:latest host macvlan null overlay
Swarm: active
Drivers provide the data plane
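A hedged sketch of bringing in a remote driver like the contiv entry above (installation steps and create options depend on the plugin):
$ # install a v2 network plugin from a registry
$ docker plugin install contiv/v2plugin:latest
$ # create a network backed by that remote driver
$ docker network create -d contiv/v2plugin:latest --subnet 10.1.0.0/24 contiv-net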
23. What is Docker Overlay Networking
The overlay driver enables simple and secure multi-host networking.
(Diagram: containers CntnrA-CntnrF on Docker hosts 1-3 joined by one overlay network; all containers on the overlay network can communicate.)
24. Docker Overlay
• The overlay driver uses VXLAN technology
• A VXLAN tunnel is created on top of underlay network(s)
• At each end of the tunnel is a VXLAN tunnel end point (VTEP)
• The VTEP performs encapsulation and de-encapsulation
• The VTEP exists in the Docker host's network namespace
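Roughly the equivalent iproute2 operations behind that VTEP, as a hedged sketch (parameters mirror the vxlan0 shown on the later slides; Docker programs this via netlink rather than shelling out):
$ # create the VTEP: VXLAN id 4097 on the standard UDP port 4789
$ ip link add vxlan0 type vxlan id 4097 dstport 4789 proxy l2miss l3miss
$ # attach it to the overlay bridge and bring it up
$ ip link set vxlan0 master br0
$ ip link set vxlan0 up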
25. Building an Overlay Network (more detailed)
(Diagram: Docker Host 1 at 172.31.1.5 and Docker Host 2 at 192.168.1.25, connected over a Layer 3 IP transport network. Each host has a network namespace containing Br0 and a VTEP listening on :4789/udp; the two VTEPs form the VXLAN tunnel. C1: 10.0.0.3 and C2: 10.0.0.4 attach to Br0 via veth pairs.)
27. root@my-host $ docker network ls
NETWORK ID NAME DRIVER SCOPE
jm1eohsff6b4 demo_default overlay swarm
a5f124aef90b docker_gwbridge bridge local
root@my-host $ ls /var/run/docker/netns
1-jm1eohsff6 1-o2hnj2jm1f 2229639766c2 79f0ad997956 ingress_sbox
root@my-host $ nsenter --net=/var/run/docker/netns/1-jm1eohsff6
root@my-host $ brctl show br0
bridge name bridge id STP enabled interfaces
br0 8000.3a87525fe051 no vxlan0
veth0
veth1
Overlay dataplane
28. root@my-host $ ip -d link show br0
2: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP
mode DEFAULT group default
link/ether 3a:87:52:5f:e0:51 brd ff:ff:ff:ff:ff:ff promiscuity 0
bridge forward_delay 1500 hello_time 200 max_age 2000 addrgenmode eui64
root@my-host $ ip -d link show veth0
17: veth0@if16: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue
master br0 state UP mode DEFAULT group default
link/ether be:dc:c5:da:8c:0d brd ff:ff:ff:ff:ff:ff link-netnsid 2
promiscuity 1
veth
bridge_slave state forwarding priority 32 cost 2 hairpin off guard off
root_block off fastleave off learning on flood on addrgenmode eui64
Overlay dataplane
29. root@my-host $ ip -d link show vxlan0
14: vxlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master
br0 state UNKNOWN mode DEFAULT group default
link/ether f6:ae:70:27:6c:9c brd ff:ff:ff:ff:ff:ff link-netnsid 0
promiscuity 1
vxlan id 4097 srcport 0 0 dstport 4789 proxy l2miss l3miss ageing 300
bridge_slave state forwarding priority 32 cost 100 hairpin off guard off
root_block off fastleave off learning on flood on addrgenmode eui64
Overlay dataplane
30. root@my-host $ ip -s neighbor show
10.0.0.6 dev vxlan0 lladdr 02:42:0a:00:00:06 used 1100/1100/1100 probes 0 PERMANENT
10.0.0.3 dev vxlan0 lladdr 02:42:0a:00:00:03 used 1101/1101/1101 probes 0 PERMANENT
root@my-host $ bridge fdb show
…
f6:ae:70:27:6c:9c dev vxlan0 vlan 1 master br0 permanent
02:42:0a:00:00:03 dev vxlan0 dst 192.168.56.101 link-netnsid 0 self permanent
02:42:0a:00:00:06 dev vxlan0 dst 192.168.56.101 link-netnsid 0 self permanent
be:dc:c5:da:8c:0d dev veth0 vlan 1 master br0 permanent
3a:87:52:5f:e0:51 dev veth1 vlan 1 master br0 permanent
…
Overlay dataplane
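One hedged way to confirm the encapsulation on the wire is to capture VXLAN traffic on the underlay interface (interface name is illustrative):
$ # VXLAN rides on UDP port 4789; -nn skips name/port resolution
$ tcpdump -nn -i eth0 udp port 4789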
36. root@my-host $ iptables -nvL -t mangle
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 MARK all -- * * 0.0.0.0/0 10.0.0.7 MARK set 0x101
0 0 MARK all -- * * 0.0.0.0/0 10.0.0.4 MARK set 0x100
root@my-host $ ipvsadm -L
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
FWM 256 rr
-> 10.0.0.5:0 Masq 1 0 0
-> 10.0.0.6:0 Masq 1 0 0
FWM 257 rr
-> 10.0.0.3:0 Masq 1 0 0
root@my-host $ conntrack -L
tcp 6 431997 ESTABLISHED src=10.0.0.8 dst=10.0.0.4 sport=33635 dport=80
src=10.0.0.5 dst=10.0.0.8 sport=80 dport=33635 [ASSURED] mark=0 use=1
Client-side Load Balancing
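These mangle marks and IPVS entries live inside the container's sandbox network namespace; a hedged way to reach them from the host (container name is illustrative, and ipvsadm/conntrack must be installed on the host):
$ PID=$(docker inspect -f '{{.State.Pid}}' <container>)
$ nsenter -t $PID -n iptables -t mangle -nvL
$ nsenter -t $PID -n ipvsadm -L -n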
37. Client-side DNS-rr Load Balancing
Inside task1.web, /etc/resolv.conf contains: nameserver 127.0.0.11
The Docker DNS server answers the A-record query for "app" with all task IPs: "app" : [ 10.0.1.9, 10.0.1.10 ]. Load balancing happens on the client side: task1.web picks an address from the round-robin DNS response. (Table: web 10.0.1.4 (vip), app 10.0.1.9, 10.0.1.10, task1.app 10.0.1.9, task2.app 10.0.1.10, task1.web 10.0.1.5.)
docker service create --name=app --endpoint-mode=dns-rr demo/my-app
38. Routing Mesh
• Native load balancing of requests coming from an external source
• Services get published on a single port across the entire Swarm
• Incoming traffic to the published port can be handled by all Swarm nodes (see the curl check below)
• Traffic is internally load balanced as per normal service VIP load balancing
Ingress Network
(Diagram: Docker hosts 1-3 each run IPVS and listen on port 8080 on the ingress network; task1.myservice runs on host 1 and task2.myservice on host 2, yet every host accepts traffic on 8080.)
docker service create -p 8080:80 nginx
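A quick hedged check of the mesh (node address is illustrative): the published port answers on any Swarm node, even one that runs no task of the service.
$ curl -s http://<any-node-ip>:8080 | head -n 4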
42. Thank You.
106270 - Deep Dive in Docker Overlay Networks (Apr 19, 3:45 PM)
110420 - Docker Networking in Production at Visa (Apr 19, 2:25 PM)
@docker #dockercon