David Barroso - Spotify
Paolo Lucente - PMACCT
When you travel, do you carry an atlas or a local map? In this talk
we will explain the whys and hows of a possible approach of carrying
only a local map of the internet instead of the full internet routing
table. The solution we will present is based on free, open-source and
publicly available software and scripts and a prototype is being
evaluated at Spotify.
Language: English
Register today for the next edition of PLNOG: krakow.plnog.pl
Best SEO Services Company in Dallas | Best SEO Agency Dallas
PLNOG14: Waltzing on that gentle trade‐off between internet routes and FIB space, an SDN story - David Barroso, Paolo Lucente
1. Waltzing on that gentle
trade‐off between
internet routes and FIB
space, an SDN story
2. Network Engineer @Spotify
● +10 years in the network industry
● Python enthusiast
● Automation junkie
David Barroso
Principal Software Developer @pmacct
● +10 years measuring and correlating traffic flows
● Service Providers are his DNA
Paolo Lucente
3. Spotify is a commercial music streaming service
providing digital rights management-restricted
content from record labels [...] Paid "Premium"
subscriptions remove advertisements and allow
users to download music to listen to offline.
Source: http://en.wikipedia.org/wiki/Spotify
What is Spotify?
4. ● Over 60 million monthly active users
● 15 million paid subscribers
● More than 30 million songs
● Around 20.000 songs added each day
● Available in 58 markets
Source: https://press.spotify.com/information/
Some facts about Spotify
5. ● 4 major datacenters
● Stockholm
● London
● San Jose
● Ashburn
● Users are directed to the closest datacenter by
default.
● In case of fault or maintenance users can be
redirected to another datacenter.
Global Architecture of Spotify
6. RIB vs. FIB
● RIB (Routing Information Base)
● A representation in memory of all available paths and their attributes. This
information is fed by routing protocols.
● FIB (Forwarding Information Base)
● A copy of the RIB (usually in hardware) where some attributes are resolved
(like next-hop or outgoing interface)
7. RIB vs. FIB (cont’d)
● RIB (Routing Information Base)
● Virtually unlimited (limited only by the memory of the device)
● FIB (Forwarding Information Base)
● Limited by the underlying hardware.
● Between 64k-128k LPM prefixes in modern switches with commodity ASICs.
● Between 500k-1000k LPM prefixes in expensive routers/switches with customized ASICs.
8. RIB vs. FIB (cont’d)
● Typically there is a one to one mapping between RIB and FIB entries.
● The Internet is comprised of +512k prefixes.
● This means we need to support +512k FIB entries if we want to reach
the entire Internet.
9. When you travel...
● Do you carry an Atlas?
● Or do you carry a local map?
So…
● Why do I need all the prefixes?
● What if I only install the prefixes I really need?
10. Do we really need to reach the entire Internet?
● Spotify streams music to users.
● We typically serve users in the closest datacenter.
● Why would my datacenter in San Jose need to know how to reach
users in Poland?
11. Example: Stockholm Datacenter
● Total unique IPv4 prefixes (including transit): 519.000
● Prefixes coming from peers: 150.000
● Average prefixes active per day: approx. 16.000
● Average prefixes from peers active per day: approx. 2.000
12. Example: Stockholm Datacenter
● Green line shows the total amount
of prefixes that have been used at
least once
● Blue line the amount of new
prefixes required at a given time
● Total prefixes: 519.000
● Prefixes added within the first hour:
6412
● New prefixes added within the
second hour: 1818
● New prefixes added within the
second hour: 1242
● New prefixes added within the
tenth hour: 327
13. Example: Stockholm Datacenter
● Red line shows total bandwidth
● Blue line shows traffic with a
matching prefix in the FIB
● Purple line shows % of traffic
matched.
● When we start we match 0% of the
traffic.
● After the first iteration (6412 routes
added) we match 99.263%
● After that we go to 99.9%.
● New prefixes are most likely scans
14. Goal
● Pick only the routes I need from the RIB so I can fit them on the FIB of a
switch with commodity ASICs.
15. Motivation
Switches:
● Are cheaper.
● Have higher 10G density
● 256x10G in 2 RU or 48x10G + 2x100G in 1 RU vs 288 x 10G ports in 10 RU
● Consume less power
● 3.4W vs 33W per 10G port
16. SDN Internet Router key pieces:
Selective Route Download
Feature that allows you to pick a subset of the routes on the RIB and
install them on the FIB. Available in Cumulus (bird), Arista*, Juniper, some
Cisco routers and probably others.
* If you plan to try on Arista contact your SE.
18. pmacct: BGP integration
pmacct introduced a Quagga-based BGP daemon:
● Implemented as a parallel thread within the collector
● Doesn’t send UPDATEs and WITHDRAWs whatsoever
● Behaves as a passive BGP neighbor
● Maintains per-peer BGP RIBs
● Supports 32-bit ASNs; IPv4, IPv6 and VPN families
● Supports ADD-PATH (draft-ietf-idr-add-paths)
Why BGP at the collector?
● Telemetry reports on forwarding-plane, and a bit more.
● Extended visibility into control-plane information
19. SDN Internet Router: overview
Internet
Switch
Transit/
Backbo
ne
IXP
pmacct
BGP Controller
Spotify AP
● Transit will send the default route to the
Internet Switch. The route is installed by
default in the FIB.
● We receive from the IXP all the peers´
prefixes. Those are not installed, they
are forwarded to pmacct and the BGP
Controller.
● Pmacct will receive in addition sFlow
data.
0.0.0.0/0
Peers’
prefixes
Peers’
prefixes
Peers’
prefixes
& sFlow
20. SDN Internet Router: pmacct
Internet
Switch
Transit
IXP
pmacct
BGP Controller
Spotify AP
● pmacct aggregates sFlow data using
the BGP information previously sent by
the Internet Switch.
● pmacct reports the flows aggregated by
BGP prefix to the BGP controller.
● The BGP controller will compute the
best TopN prefixes.
● The BGP controller instructs the Internet
switch to install those TopN* prefixes.
* N is a number close to the maximum number of entries
that the FIB of the Internet Switch can support.
1. This is the flow
information
aggregated per
prefix
3. Please,
install these
prefixes I
computed.
Peers’
prefixes
& sFlow
2. Compute best
TopN prefixes.
21. SDN Internet Router: Spotify’s AP
Internet
Switch
Transit
IXP
pmacct
BGP Controller
Spotify AP
● $USER connects to the service
● The application informs the access
point that $USER has connected and
requests that the $PREFIX containing
his/her $IP is installed on the FIB.
● It might be installed already as another user
within the same range might have
connected previously or because pmacct
reported that prefix as being one of the
TopN prefixes.
● The BGP controller instructs the Internet
Switch to install the prefix if necessary.
1. $USER connects with
$IP.
2. Please, make sure
$PREFIX containing $IP
is installed.
3. Install $PREFIX (if
it wasn’t already).
22. Considerations
The BGP controller updates a prefix list containing the prefixes that the
device must take from the RIB and install on the FIB.
● If a prefix is removed from the RIB it will be removed from the FIB by the
device.
● If the BGP controller fails the prefix list remains in the device. Allowing the
device to operate normally as per the last instructions.
“Warming up” the FIB is important if you don’t want to buy as much transit
capacity as peering capacity you have. You can:
● If your application supports it, move users by region to the new location.
● Precompute the FIB using the flow information from another location and the
BGP table from the new location before you start advertising your prefixes.
23. Present and Future
● Demo run in Stockholm connected to Netnod to gather information (no
actual changes performed on the Internet Router there).
● Pilot to be run very soon in a major IXP in Europe.
24. Simulation in Stockholm: N = 10.000 prefixes
● Green line shows the total amount
of prefixes that have been used at
least once
● Blue line the amount of new
prefixes required at a given time.
● Red line shows the amount of
prefixes removed.
25. Simulation in Stockholm: N = 10.000 prefixes
● Red line shows total bandwidth
● Blue line shows traffic with a
matching prefix in the FIB
● Purple line shows % of traffic
matched.
● With only 10.000 prefixes we
offload around 99% of the traffic at
all times.
26. Simulation in Stockholm: N = 30.000 prefixes
● Green line shows the total amount
of prefixes that have been used at
least once
● Blue line the amount of new
prefixes required at a given time.
● Red line shows the amount of
prefixes removed.
27. Simulation in Stockholm: N = 30.000 prefixes
● Red line shows total bandwidth
● Blue line shows traffic with a
matching prefix in the FIB
● Purple line shows % of traffic
matched.
● With only 30.000 prefixes we
offload around 99.9% of the traffic
at all times.
28. Do you want to peer with us?
● Find us at peeringDB. We are peering under as8403.
● Send an email to peering@spotify.com even if we are not present in
your location.