Spotify - Technical Solutions & Quality of Experience Review
FACULTY OF ENGINEERING
DEPARTMENTS OF ELECTRONICS ENGINEERING AND AUTOMATION ENGINEERING
PIRAEUS UNIVERSITY OF APPLIED SCIENCES
Module: Multimedia Communications
Module Coordinator: D.Kalivas, A.Papadakis
MSc IN NETWORKING AND DATA COMMUNICATIONS
COURSEWORK
MODULE:
CI7120: MULTIMEDIA COMMUNICATIONS
ID: K1465156
Module Coordinator:
D.Kalivas, A.Papadakis
Date of Module:
22/1/2016
Name of Student:
GRIGOROPOULOS MICHAIL
Kingston University London
Subject: Spotify - Technical Solutions & Quality of Experience Review
Submission Date: 22/1/2016
Grade (%):
% Grade reduction because of submission delay:
(5% Grade reduction per every day of Cwk delay).
Final Grade (%):
Spotify
Technical Solutions & Quality of Experience Review
Grigoropoulos Michail
Kingston University London
TEI of Piraeus, Automation Department
Athens, Greece
Abstract — Because of technology's rapid development and the rapid spread of digital distribution of music
through the Internet, the music industry had to find new ways to reach new clients without losing more profit. Music
CDs nowadays seem obsolete to younger people, who can download music from the internet in a few seconds.
Online music stores were not enough to cover the huge gaps that piracy and YouTube created, but music streaming
applications such as SoundCloud, Pandora and Spotify were able to attract people's interest by offering a more complete
package. Spotify is the best known among these applications and arguably the one with the best audio and streaming
quality and the richest feature set. In this assessment, Spotify's advantages and technical specifications are presented,
together with a review of its Quality of Service and Quality of Experience. To determine how the application affects
audio quality, the streamed audio is compared with the audio from the original CD. As part of the QoS
review process, streaming is tested under different network conditions.
Keywords — Spotify; Application; Assessment; Specifications; QoS; QoE; Audio Quality; Streaming
I. INTRODUCTION
Spotify was developed in Stockholm, Sweden in 2006. It is a commercial music
application/service that offers digital-rights-protected content from record labels and media
companies. [1] The Spotify application is available for Android, iOS, Mac OS, Windows,
PlayStation, webOS and Linux. The user has to install the application on the hard drive of the
device and register an account. There is an ad-funded free version and an ad-free paid
version with extra privileges and better streaming quality (up to 320 kbit/s). After logging in
to the application with this account, the user gains access to unlimited music options. Spotify has
about 75 million users, 20 million of whom use the paid version. Both versions allow
unlimited streaming and offer the same catalogue of tracks. Spotify has an extremely large
catalogue of over 8 million tracks that is constantly updated with new songs, and there is no
cost per track. Users can search for music by song, artist, album, genre or even record label.
Songs can be cached locally so that users can listen to them even when there is no internet
connection. The application can also play local music files. Users can create radio stations or make
playlists that other users can access. Spotify also allows users to integrate their social media and
Last.fm accounts. Users can listen to what other users are listening to, exchange music
messages and "Follow" each other's profiles. Spotify is expanding fast through
collaborations with mobile operators such as Cosmote and even car manufacturers like Volvo.
Nevertheless, the most important collaboration is the one with the social network Facebook.
Spotify has been a leader in this space since taking early advantage of Facebook’s new Open
Graph. Through that tool, Spotify evolved into a ‘musical social network’. [2][3]
Every Monday Spotify delivers 75 million unique mixtapes called "Discover Weekly". Each
user gets a playlist based on their own listening as well as on what other users are adding to
playlists and listening to, which the user might also like. By combining
clever algorithms with human curation, personalization is managed at scale and with great
quality.
One of the distinguishing features of the Spotify client is its low playback latency. Spotify's
service is run by two data centers, one located in London and one in Sweden. Spotify relies
mainly on a peer-to-peer (P2P) network to stream audio, as torrent networks do, but it also
uses servers to compensate for the instability that a peer-to-peer network introduces. It
delivers good overall quality using techniques such as preloading subsequent
tracks. This way, the user does not experience playback delays under fair network
conditions. To compress audio files, Spotify uses the Vorbis coding format at different bit-rates
for non-paying and paying users.
Spotify offers neither CD-quality nor high-resolution audio, so comparing the
original CD track to the one streamed by Spotify is useful for determining to what extent
quality is reduced by the process. It is also useful to test the different quality options
that Spotify offers under different network conditions.
II. HOW IT WORKS
To deliver music with minimal delay, Spotify uses a custom protocol. It is based on a
peer-to-peer network similar to a torrent network. Files are streamed either by Spotify's host servers
or by the peer-to-peer network. Audio streams are encoded with Ogg Vorbis. In the
free version q5 is the default quality, at about 160 Kbps, while in the paid version q9, at
320 Kbps, can be used. For both versions (q5 and q9) no re-encoding is done by the peers.
In practice this means that a peer that asks for the q9 version can only get it from a peer that
has the q9 version cached, and not from one that has only the q5 version of the file.
When the user clicks to play a track and the track is already cached, Spotify starts playing it
from the cache. Otherwise, the Spotify client requests the first 15 seconds of the track from the
Spotify servers so that playback can start as soon as possible. At the same time, the client
starts looking for the track on the peer-to-peer network. The rest of the track is streamed from
a combination of multiple sources, if available (cache, multiple peers, Spotify servers). The
more popular a track is, the more likely it is to be streamed from the P2P network instead of
the Spotify servers. When the track has 30 seconds to go, the Spotify client begins searching
the P2P network for the next track. When the track has 10 seconds to go, if the client has not yet
found the next track on the network, it starts pre-fetching it from the Spotify servers. [3]
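The start-of-playback decision described above can be sketched as follows. This is a simplified illustration, not Spotify's implementation: the names `Track` and `plan_playback` are hypothetical, and the sketch treats the servers purely as a fallback for the remainder of the track, whereas the text says the remainder may come from a combination of sources.

```python
from dataclasses import dataclass

# From the text: the first 15 seconds are requested from the servers.
SERVER_HEAD_SECONDS = 15


@dataclass
class Track:
    track_id: str
    duration_s: int


def plan_playback(track, cache, peers_with_track):
    """Return the ordered list of sources used to play a track.

    cache: set of track ids stored locally (hypothetical representation).
    peers_with_track: peers found to hold the complete track.
    """
    if track.track_id in cache:
        return ["cache"]
    # Not cached: start from the servers so playback begins quickly,
    # and look for the track on the peer-to-peer network in parallel.
    plan = [f"server:first-{SERVER_HEAD_SECONDS}s", "p2p-search"]
    # Simplification: peers if any were found, otherwise the servers.
    plan.append("peers:remainder" if peers_with_track else "server:remainder")
    return plan


if __name__ == "__main__":
    song = Track("track-1", 240)
    print(plan_playback(song, cache={"track-1"}, peers_with_track=[]))
    print(plan_playback(song, cache=set(), peers_with_track=["peer-a"]))
```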
Fig.1 How a track is streamed
Fig.2 Where the music comes from
A client can upload a track to a peer only if the track is fully stored in the client's cache. Tracks are
small, so there is no need to break a file into parts. Doing so would add complexity to the
protocol, since it would have to include overhead for communicating which parts of the file each
client has. Normally most of the content delivery is done by the peers, but if a client is
running out of buffer, it requests data from the server, which is considered to be more reliable.
Unlike other streaming applications, Spotify uses TCP instead of UDP. Between two peers,
a single TCP connection is used, and messages are multiplexed over that connection by
the application's protocol. TCP is preferred because of the instability of the peer-to-peer network.
The retransmission of lost packets is useful for the application, and the explicit connection
signaling is friendlier to firewalls. TCP is also friendlier to the network because of its congestion
control. TCP uses congestion avoidance algorithms (such as TCP New Reno and CUBIC) to
limit large bursts of data. The congestion window starts out small and gets bigger as data is
transferred successfully. While clients are running, they also keep a TCP connection alive to the
server, which transmits prioritized context material, including advertisements, band information,
lyrics, messages supporting interactive browsing and so on. This connection stays active for a long
time, and as data is sent to the client the congestion window grows. This
allows the server to quickly send most or all of a response without waiting for ACKs
from the client. [9]
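To illustrate why a long-lived connection helps, the sketch below models congestion window growth in an idealized, loss-free Reno-style connection: the window doubles every round trip during slow start and then grows by one segment per round trip. The function name, the initial window and the threshold value are illustrative assumptions, not values taken from Spotify or from a specific TCP stack.

```python
def cwnd_growth(round_trips, initial_cwnd=1, ssthresh=16):
    """Return the congestion window (in segments) at the start of each round trip.

    Idealized model with no packet loss:
    - slow start: the window doubles each round trip until it reaches ssthresh
    - congestion avoidance: the window then grows by one segment per round trip
    """
    cwnd = initial_cwnd
    history = []
    for _ in range(round_trips):
        history.append(cwnd)
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start
        else:
            cwnd += 1  # congestion avoidance
    return history


if __name__ == "__main__":
    # A fresh connection can send only a few segments per round trip at first;
    # a connection that has been open for a while already has a large window.
    print(cwnd_growth(8))
```

In this model a new connection needs several round trips before it can send a large response in one burst, which is the cost the persistent server connection avoids.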
A. The Peer-to-Peer Network
Peer-to-Peer is a model that works quite differently from the standard client-server model.
In the client-server model, web services are hosted on a powerful server
and the client requests information from it. A single server can host files for hundreds of
simultaneous clients. In many cases this creates extreme demands on physical hardware
and network resources. In the Peer-to-Peer model, on the other hand, every client also becomes a
server. A two-way path is opened between two clients, and they both upload and download
files. Seeding (giving back) is critical to the success of a Peer-to-Peer network.
Unlike in the client-server model, in Peer-to-Peer performance improves as more users are
added to the network.
To keep the protocol simple and to limit latency and overhead, Spotify does not use
general routing. Two clients must be directly connected to exchange data. Clients make local
decisions about where to stream from, depending on the amount of data in their play-out buffers. If
a peer is too slow to satisfy the client's request, it is replaced by another peer.
The number of peers each client can have at any time is limited so that home routers
(which act as firewalls and NAT devices) can handle the traffic that multiple TCP connections
create. A soft limit (50 connections) and a hard limit (60 connections) are applied to all clients.
The hard limit is never exceeded. Each client tries to stay below the soft limit and
periodically drops connections if that limit is exceeded. If the client needs to disconnect from
one or more peers, it uses a heuristic that decides which connections to
close by evaluating each peer's utilization and usefulness. This is done using six
criteria:
Bytes sent in the last ten minutes.
Bytes sent in the last sixty minutes.
Bytes received in the last ten minutes.
Bytes received in the last sixty minutes.
Number of peers in the last sixty minutes.
Number of tracks the peer has.
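A possible shape for this pruning step is sketched below. The text lists the six criteria but not how they are combined, so the rank-sum scoring used here is purely an assumption for illustration; `PeerStats`, `peers_to_drop` and `may_accept` are hypothetical names.

```python
from dataclasses import dataclass

SOFT_LIMIT = 50  # from the text
HARD_LIMIT = 60  # from the text

# The six criteria listed above, as attribute names.
CRITERIA = ("sent_10m", "sent_60m", "recv_10m", "recv_60m", "peers_60m", "tracks")


@dataclass
class PeerStats:
    peer_id: str
    sent_10m: int
    sent_60m: int
    recv_10m: int
    recv_60m: int
    peers_60m: int
    tracks: int


def rank_scores(peers):
    """ASSUMPTION: score each peer by the sum of its ranks over all criteria."""
    scores = {p.peer_id: 0 for p in peers}
    for name in CRITERIA:
        ordered = sorted(peers, key=lambda p: getattr(p, name))
        for rank, peer in enumerate(ordered):
            scores[peer.peer_id] += rank
    return scores


def peers_to_drop(peers, soft_limit=SOFT_LIMIT):
    """Return ids of the least useful peers, enough to get back to the soft limit."""
    excess = len(peers) - soft_limit
    if excess <= 0:
        return []
    scores = rank_scores(peers)
    ordered = sorted(peers, key=lambda p: scores[p.peer_id])
    return [p.peer_id for p in ordered[:excess]]


def may_accept(connection_count, hard_limit=HARD_LIMIT):
    """A new connection is allowed only while the hard limit is not reached."""
    return connection_count < hard_limit
```

With 52 connected peers, for example, `peers_to_drop` returns the two peers that rank lowest across the criteria.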
Every client can upload simultaneously to at most four other clients (peers). Home internet
connections are usually asymmetric: the bandwidth available for downloading is
much greater than the bandwidth available for uploading. This limit ensures
that the application consumes only a small share of the client's network resources.
The client needs to locate peers that have the requested track. Two mechanisms are used for
locating peers: the first asks a tracker deployed on the server, and the second sends a
query through the overlay network. Audio tracks are small enough that one or a few seeders are
sufficient for a track to stream without problems. To minimize overhead, Spotify does not use
a Distributed Hash Table (DHT) to find peers. The tracker maintains a mapping from tracks to
peers that have recently reported having the requested file. Peers listed in the tracker
have the whole track. The tracker is notified only when a client plays a track; clients do not
report their cached contents continuously. This way, the implementation of the tracker is
simplified and overhead is kept low. When a client wants to play a track, it asks the
tracker for available peers that have the track. The tracker then replies with up to ten peers
that are currently available. Clients also search through the overlay
network: the client sends the request to all its neighbors, and the neighbors forward the
request to theirs, so a search reaches peers up to two hops away. To avoid responding to duplicate
messages, peers keep a log of the 50 most recent searches. This is the only overlay routing done in the Spotify
peer-to-peer protocol.[3]
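The two lookup mechanisms can be sketched as follows. This is a minimal in-memory model under stated assumptions: the class and function names are hypothetical, the tracker ignores how recently a peer reported a track, and the overlay search is run as a breadth-first walk rather than as real network messages.

```python
import random
from collections import deque

TRACKER_MAX_PEERS = 10  # from the text: the tracker replies with up to ten peers
SEARCH_LOG_SIZE = 50    # from the text: log of the 50 most recent searches
SEARCH_HOPS = 2         # from the text: neighbors, then neighbors of neighbors


class Tracker:
    """Maps track ids to peers that reported playing (and so holding) the track."""

    def __init__(self):
        self._peers_by_track = {}

    def report_play(self, peer_id, track_id):
        self._peers_by_track.setdefault(track_id, set()).add(peer_id)

    def lookup(self, track_id, online_peers):
        candidates = sorted(self._peers_by_track.get(track_id, set()) & set(online_peers))
        return random.sample(candidates, min(TRACKER_MAX_PEERS, len(candidates)))


class Peer:
    def __init__(self, peer_id):
        self.peer_id = peer_id
        self.neighbors = []
        self.cached_tracks = set()
        # Bounded log used to ignore searches this peer has already seen.
        self.recent_searches = deque(maxlen=SEARCH_LOG_SIZE)


def link(a, b):
    """Connect two peers in both directions."""
    a.neighbors.append(b)
    b.neighbors.append(a)


def overlay_search(origin, search_id, track_id):
    """Return ids of peers within two hops of origin that hold the track."""
    origin.recent_searches.append(search_id)
    found, frontier = set(), [origin]
    for _ in range(SEARCH_HOPS):
        next_frontier = []
        for peer in frontier:
            for neighbor in peer.neighbors:
                if search_id in neighbor.recent_searches:
                    continue  # duplicate message: already handled
                neighbor.recent_searches.append(search_id)
                if track_id in neighbor.cached_tracks:
                    found.add(neighbor.peer_id)
                next_frontier.append(neighbor)
        frontier = next_frontier
    return found
```

In a chain of peers a, b, c, d where c and d hold the track, a search started at a finds c (two hops away) but not d (three hops away).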
Because Spotify uses TCP instead of UDP, Spotify's clients do not perform any NAT
traversal. Instead, clients use the UPnP protocol to ask routers for a port to use for incoming
connections. Universal Plug and Play (UPnP) is an open protocol that lets
each device learn about the presence of other devices on
the network and allows connections to be established between them using HTTP,
SOAP and XML on top of IP, without manually configuring the default gateway or the clients.
Spotify's service is run by two data centers, and clients randomly choose which one to connect
to, so that the load is balanced between the two. Each data center has its own
peer-to-peer overlay.[4]
B. Coding
Ogg codecs use octet vectors of raw, compressed data (packets). These compressed packets
do not have any high-level structure or boundary information; strung together, they appear to
be streams of random bytes with no landmarks.
Raw packets may be used directly by transport mechanisms that provide their own framing
and packet-separation mechanisms (such as UDP datagrams). For stream based storage (such
as files) and transport (such as TCP streams or pipes), Vorbis and other future Ogg codecs use
the Ogg bitstream format to provide framing/sync, sync recapture after error, landmarks
during seeking, and enough information to properly separate data back into packets at the
original packet boundaries without relying on decoding to find packet boundaries.
Ogg Vorbis is an open source patent-free compressed audio format for mid to high quality
(8kHz-48.0kHz, 16+ bit, polyphonic) audio and music at fixed and variable bit-rates from 6 to
128 Kbps/channel. The Vorbis audio CODEC provides channel coupling mechanisms
designed to reduce effective bit-rate by both eliminating interchannel redundancy and
eliminating stereo image information labeled inaudible or undesirable according to spatial
psychoacoustic models.
The Vorbis codec is forward adaptive and is based on the Modified Discrete Cosine Transform
(MDCT) (1) for converting data from the time domain to the frequency domain. The data is then split
into noise floor and residue components, which are quantized and entropy coded using a
codebook-based vector quantization algorithm. [13]
(1)
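For reference, the MDCT is commonly written in the form below. This is the standard textbook definition and is given here as an assumption about the intended equation, not as a copy of it: a block of 2N input samples x_n is mapped to N coefficients X_k. In Vorbis the samples of each block are also multiplied by a window function before the transform.

```latex
X_k = \sum_{n=0}^{2N-1} x_n \cos\left[ \frac{\pi}{N} \left( n + \frac{1}{2} + \frac{N}{2} \right) \left( k + \frac{1}{2} \right) \right],
\qquad k = 0, 1, \dots, N-1
```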
Vorbis files (.ogg extension) compress to a smaller size than MP3 files, which reduces
bandwidth and storage requirements. According to many reports and listening tests, a Vorbis
file provides better sound quality than a file of the same size in MP3.[6] [11]
C. Caching
Spotify uses a cache on the hard drive of every client. Each time a user listens to a new audio
file, the file is cached on the device's hard drive. Whenever the user wants to listen to the
same track again, Spotify fetches it from the device's cache. The size of the cache is adjustable in
the application's settings. It can be set manually from 1GB to 100GB. The default setting is
10% of the free space on the client's hard disk, but not less than 50MB and not more than 10GB
(1GB of cache holds around 200 tracks). If the cache reaches maximum capacity, the least
accessed audio tracks are deleted and new ones take their place. Cached content is encrypted
and other music players cannot use it.
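The default-size rule and the eviction behaviour can be sketched as below. The size rule follows the figures in the text. The eviction policy is an assumption: "least accessed" is interpreted here as least recently used, and `default_cache_size` and `TrackCache` are hypothetical names.

```python
from collections import OrderedDict

MB = 1024 ** 2
GB = 1024 ** 3


def default_cache_size(free_disk_bytes):
    """10% of free disk space, clamped to the range 50 MB .. 10 GB."""
    return max(50 * MB, min(10 * GB, free_disk_bytes // 10))


class TrackCache:
    """Size-bounded cache; ASSUMPTION: evicts the least recently used track."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self._tracks = OrderedDict()  # track_id -> size, oldest access first

    def get(self, track_id):
        """Return True on a cache hit and mark the track as recently used."""
        if track_id not in self._tracks:
            return False
        self._tracks.move_to_end(track_id)
        return True

    def put(self, track_id, size):
        if size > self.capacity:
            return  # a track larger than the whole cache is never stored
        if track_id in self._tracks:
            self._tracks.move_to_end(track_id)
            return
        # Evict the oldest entries until the new track fits.
        while self._tracks and self.used + size > self.capacity:
            _, evicted_size = self._tracks.popitem(last=False)
            self.used -= evicted_size
        self._tracks[track_id] = size
        self.used += size
```

For example, with 20 GB of free space the default cache size is 2 GB, while with 500 GB free it is capped at 10 GB.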
D. Predicting next track
Spotify uses a mechanism to predict the next track and starts fetching it before the
current one finishes playing. Spotify decides logically which tracks it is going to play (for
example, if the user selects an album and starts playing the first song, it will play the rest of the
tracks from that album). The prediction mechanism is based on the fact that 61% of
playbacks occur in a predictable sequence. This means that the user either finishes the song
and listens to the next song in the queue, or clicks the next button and Spotify plays the next
track in the queue. Thirty seconds before the current track finishes, the client starts searching
the peer-to-peer network for the next track. If the user clicks next within those thirty seconds, the next
track can therefore load and start playing quickly. If the user instead chooses a track from
another list, the sequence is interrupted and the request is abandoned. Ten seconds
before the current track ends, the client requests the start of the next track from the server if
the peer network has not provided enough resources. [5]
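The timing of the prefetch can be expressed as a small decision function. The thresholds are the ones given in Section II (search the peer-to-peer network at 30 seconds to go, fall back to the servers at 10 seconds); the function name and the returned action labels are hypothetical.

```python
P2P_SEARCH_AT_S = 30       # seconds remaining when the P2P search starts
SERVER_FALLBACK_AT_S = 10  # seconds remaining when the servers are used instead


def prefetch_action(seconds_remaining, found_on_p2p):
    """Decide what the client does about the predicted next track."""
    if seconds_remaining > P2P_SEARCH_AT_S:
        return "idle"
    if found_on_p2p:
        return "download-from-peers"
    if seconds_remaining > SERVER_FALLBACK_AT_S:
        return "search-p2p"
    # 10 seconds or less to go and no peer found: use the servers.
    return "prefetch-from-server"


if __name__ == "__main__":
    for remaining in (45, 25, 10):
        print(remaining, prefetch_action(remaining, found_on_p2p=False))
```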
If the user has not added any tracks to the queue, or the queue comes to an end, Spotify uses a
default play-queue mechanism to determine what it will play next, based on a
number of factors:
The album the user selected - It plays tracks from the same album, in the album's order
or in shuffle mode.
The playlist the user selected - It plays tracks from the same playlist, in the playlist's order
or in shuffle mode.
The list of the user's latest search results - It plays tracks based on the user's history.
Radio - It chooses similar tracks and artists based on genre and users' preferences.
Local Files - It plays tracks from the device's hard drive.
E. Load Balancing
To ensure good QoS and QoE, Spotify uses several load balancing techniques. For example, an access
point may have a choice of sending each metadataproxy request to 1 of 4 metadataproxy machines.
The aim is to get a quick response from the Spotify servers, so if one of the machines becomes slow it should be
avoided. The techniques that Spotify uses for load balancing are:
Round Robin - This load distribution technique delivers requests in a fixed cyclical
order, without taking the current state of the machines into account. The first request is delivered to
the first machine, the second to the next machine, the third to the one after that, and so on;
after the last machine the cycle repeats from the beginning in the same
order.
Join the shortest queue (JSQ) - This technique sends a request to the machine with the
lowest number of outstanding requests. These are requests that have not returned yet,
meaning that they are on the way to a machine, being processed by the machine or on
the way back.
Circuit Breakers - Problems of round-robin and JSQ can be alleviated with circuit
breakers. A circuit breaker monitors the latency and failure rate of the different machines. It
takes a machine out of rotation if its latency or failure rate is too high. Machines are
added back to the rotation when their metrics improve.
Expected Latency Selector (ELS) - ELS is a probabilistic load balancer. Every
machine has a score, and a machine with a score twice as high as another's gets twice
as much traffic. This score (weight) is a positive real number. For each machine, ELS
measures the success latency ℓ, the success rate s, the number q of requests that have
been sent out but not yet answered, and the failure latency f.
ELS also uses an improved circuit breaker that takes a machine out of
rotation if it performs badly relative to the global average. The ELS weight is 1/E, where
E is the computed expected latency (2). The lower the latency, the higher the weight. [5]
[8][6]
E = L (q + 1) = (ℓ + (f + 800) (1 / s – 1)) (q + 1)    (2)
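Equation (2) and two of the selection rules above can be sketched as follows. The formula is the one given in the text, with latencies assumed to be in milliseconds. The function names and the use of `random.choices` for the weighted pick are illustrative assumptions, not Spotify's implementation.

```python
import random

FAILURE_CONSTANT_MS = 800  # the constant 800 from equation (2)


def expected_latency(success_latency_ms, failure_latency_ms, success_rate, outstanding):
    """E = (l + (f + 800) * (1/s - 1)) * (q + 1), as in equation (2)."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    base = success_latency_ms + (failure_latency_ms + FAILURE_CONSTANT_MS) * (1 / success_rate - 1)
    return base * (outstanding + 1)


def els_weights(machines):
    """machines: name -> (success_latency_ms, failure_latency_ms, success_rate, outstanding)."""
    return {name: 1 / expected_latency(*stats) for name, stats in machines.items()}


def pick_machine(weights, rng=random):
    """Probabilistic pick: a machine with twice the weight gets twice the traffic."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def join_shortest_queue(outstanding_by_machine):
    """JSQ: choose the machine with the fewest outstanding requests."""
    return min(outstanding_by_machine, key=outstanding_by_machine.get)


if __name__ == "__main__":
    machines = {
        "m1": (100, 200, 1.0, 0),  # fast, always succeeds, idle
        "m2": (100, 200, 0.5, 1),  # half of its requests fail, one outstanding
    }
    print(els_weights(machines))
    print(pick_machine(els_weights(machines)))
```

In this example m1 has an expected latency of 100 ms and m2 of 2200 ms, so m1 receives 22 times as much traffic as m2 on average.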
F. Monitoring
Operational monitoring of Spotify's servers is performed using Heroic, a custom time series
database developed by Spotify. Heroic is a tool for real-time data collection and presentation at
scale. It is built on Cassandra and Elasticsearch: Cassandra acts as the primary
means of storage, and Elasticsearch is used to index all the data collected. At the time of writing there are
over 200 Cassandra nodes in several clusters across the world, serving over 50 million distinct
time series. Every host in Spotify's infrastructure runs ffwd, an agent
responsible for receiving and forwarding metrics. [3][5]
III. QUALITY OF SERVICE REVIEW
To examine the level of QoS provided by Spotify, specific key performance metrics of the
service should be examined:
Data rate: A measure of transmission speed (Kbps).
Latency: Maximum time needed from transmission to reception.
Packet loss or error rate: a measure (in percentage) of error rate of the packetized data
transmission.
Dropped packets (if they arrive when buffers are full).
Jitter: a measure of smoothness of the audio playback, related to the variance of
frame/packet delays.
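As an illustration of how these metrics could be derived from a packet capture, the sketch below computes them from per-packet send and receive timestamps. The input format and the function name are assumptions, and jitter is taken here as the standard deviation of the packet delays, which is one of several common definitions.

```python
from statistics import pstdev


def qos_metrics(sent, received, payload_bytes):
    """Compute basic QoS metrics.

    sent, received: dicts mapping packet sequence number -> timestamp in seconds.
    payload_bytes: total bytes delivered in the received packets.
    """
    delays = [received[seq] - sent[seq] for seq in sent if seq in received]
    if not delays:
        raise ValueError("no packets were received")
    duration = max(received.values()) - min(sent.values())
    return {
        # Data rate in Kbps (kilobits per second) over the whole transfer.
        "data_rate_kbps": payload_bytes * 8 / 1000 / duration,
        # Latency as defined above: maximum time from transmission to reception.
        "latency_ms": max(delays) * 1000,
        # Packets that never arrived, as a percentage of packets sent.
        "packet_loss_pct": 100 * (len(sent) - len(delays)) / len(sent),
        # ASSUMPTION: jitter measured as the standard deviation of the delays.
        "jitter_ms": pstdev(delays) * 1000,
    }


if __name__ == "__main__":
    sent = {1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0}
    received = {1: 0.1, 2: 1.1, 3: 2.1}  # packet 4 was lost
    print(qos_metrics(sent, received, payload_bytes=3000))
```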
Spotify offers three quality settings for streaming, all in the Vorbis format
(Table I). [10]
TABLE I. SPOTIFY'S AVAILABLE QUALITY OPTIONS
Option      Nominal Bit-Rate   Quality Experienced (PC)   Quality Experienced (Mobile)
Normal      ~96 Kbps           Low                        Normal
High        ~160 Kbps          Normal                     High
Extreme*    ~320 Kbps          High                       Extreme
* Premium users only
Previous measurements by researchers from KTH – Royal Institute of Technology showed that
Spotify's median playback latency ranges from 265 ms to 390 ms (Figure 3). [5]
Fig.3 Weekly usage pattern of the Spotify Service. Data normalization to a 0-1 scale
It is also useful to observe the effect of different data rates on latency in practice. To
simulate different network conditions, NetLimiter was used. Measurements were made on the
same network with the same hardware, so that the only thing that
changes is the available data rate. The aim of this experiment is to
find the data rate below which latency increases to a level that has a
significant effect on the Quality of Service. Wireshark was used to capture the TCP packets and
measure latency at different data rate values.
TABLE II. MEASUREMENTS TAKEN
Data-Rate (KB/s)   Latency (ms)   Packet Loss (%)   Jitter
60                 10350          ~0                -
80                 5364           ~0                -
90                 1934           ~0                -
100                774            ~0                -
150                473            ~0                -
200                276            ~0                -
300                156            ~0                -
400                135            ~0                -
Fig.4 Latency(ms) - Bitrate(KB/s) plot
During testing, no jitter was experienced and packet loss was 0%. At 80 KB/s the user
experienced only a longer buffering time before the first non-cached track started playing. There
were no interruptions, and the following tracks played as they should without additional waiting time,
thanks to Spotify's next-track prediction and preloading. Below 70 KB/s latency grows too large
and the overall experience for the user becomes poor because of the waiting times and even
some buffering interruptions during playback.
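The measurements in Table II can be put side by side with the stream bit-rates as in the sketch below. It assumes that 1 KB/s equals 8 Kbps and that the first three latency values in the table are thousands of milliseconds (10350, 5364 and 1934 ms). The one-second threshold in the usage example is an arbitrary illustration, not a value from the experiment.

```python
# (data rate in KB/s, measured latency in ms), taken from Table II.
MEASUREMENTS = [
    (60, 10350), (80, 5364), (90, 1934), (100, 774),
    (150, 473), (200, 276), (300, 156), (400, 135),
]

EXTREME_STREAM_KBPS = 320  # nominal bit-rate of the highest quality option


def kbytes_to_kbits(rate_kbytes_per_s):
    """Convert KB/s to Kbps, assuming 8 bits per byte."""
    return rate_kbytes_per_s * 8


def rates_exceeding(measurements, max_latency_ms):
    """Return the data rates (KB/s) whose measured latency exceeds the threshold."""
    return [rate for rate, latency in measurements if latency > max_latency_ms]


if __name__ == "__main__":
    for rate, latency in MEASUREMENTS:
        kbps = kbytes_to_kbits(rate)
        ratio = kbps / EXTREME_STREAM_KBPS
        print(f"{rate:>3} KB/s = {kbps:>4} Kbps ({ratio:.2f}x stream rate): {latency} ms")
    print(rates_exceeding(MEASUREMENTS, max_latency_ms=1000))
```

The conversion shows that the 70 KB/s level mentioned above corresponds to 560 Kbps, which is well above the nominal 320 Kbps of the highest quality stream.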
IV. QUALITY OF EXPERIENCE REVIEW
While QoS is usually measured by technical parameters, Quality of Experience incorporates
the human factor. The results of real-life QoE evaluations can vary depending on factors that cannot
be easily measured. For example, a person may judge a service differently when using
it at home than when using it at work, and two people may
judge the same service differently depending on their personal experience. That is why longer
experiments with more sample data to compare can prove to be more reliable.
In Spotify's case, reviews by ordinary users and experts indicate that Spotify may be the
best music application/service a user can download. In the Google Play Store, 4,536,385 users
have reviewed the Spotify application with an average score of 4.5 stars out of 5. That is a
good indicator that the app provides a very good overall quality of experience to its users. [2]
[12]
Spotify offers an attractive, easy-to-use interface with many options and
interesting features. It has an almost complete music catalogue that grows day by day, and
users can install it on many devices that synchronize with the user's profile.
QoE depends directly on QoS, but to the user tangible results matter most. Jitter, for
example, is more important than latency when reviewing QoE. Spotify offers a reliable
network with high QoS, and the sound quality is good even when network resources are
limited. To test that, the original CD track is compared with the one played from
Spotify using Audacity. To perform the comparison, the output
spectrum from the computer's soundcard is analyzed.
Fig.5 Waveform Comparison
Fig.6 Spectral Selection Waveform Comparison
Fig.7 Spotify's Spectrum analysis
Fig.8 Audio CD's Spectrum analysis
To perform the audio quality test, Audacity was used to record the output (2-channel stereo)
from the computer's soundcard while Spotify was playing, and then the same procedure was
repeated while Windows Media Player played the same song, at the same volume level, from
the original CD. The first two waveforms are the two stereo channels recorded from Spotify, and
underneath them are the two stereo channels from the original CD recording (Figure 5). The
same applies to the spectral selection waveform (Figure 6). One of the two waveforms was
inverted to be compared with the other. The two waveforms are almost identical and they also
sound about the same. It is extremely difficult, even for an experienced ear, to hear the
difference between the two sounds. In the spectrum analysis, though, a slight difference can be
seen more easily.
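One way to put a number on such a comparison is a null test: subtracting one recording from the other (which is equivalent to inverting one and mixing the two) and measuring how much signal is left. The sketch below is an illustration with synthetic data, not the procedure used for the figures. It assumes the two recordings are sample-aligned and level-matched, and the function names are hypothetical.

```python
import math


def rms(samples):
    """Root mean square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def null_test_db(reference, candidate):
    """Level of the residual (reference minus candidate) relative to the reference, in dB.

    Very negative values mean the two signals are nearly identical;
    identical signals give negative infinity.
    """
    residual = [r - c for r, c in zip(reference, candidate)]
    residual_rms = rms(residual)
    if residual_rms == 0:
        return float("-inf")
    return 20 * math.log10(residual_rms / rms(reference))


if __name__ == "__main__":
    # Synthetic example: a 440 Hz tone sampled at 44.1 kHz, and a copy
    # that differs from it by 1% in amplitude.
    reference = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(4410)]
    candidate = [0.99 * s for s in reference]
    print(round(null_test_db(reference, candidate), 1))
```

For real recordings, any timing offset or volume mismatch between the two captures would dominate the residual, so alignment has to be done before the result is meaningful.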
V. CONCLUSION
Spotify is a very successful application. Although Spotify is based on a peer-to-peer
network, it uses clever mechanisms to take advantage of that model's strengths and combines
it with the client-server model to offset the peer-to-peer model's weaknesses. Spotify's developers have
managed to create a high quality music streaming service that, in the tests above, was free of jitter, lag and latency issues.
As long as network conditions are fair, the user does not experience any of these
problems, and even if network resources get low, the application finds a way to deliver
content while maintaining high levels of QoS and QoE. Latency increases to a level that has a
significant effect on the Quality of Service only below 70 KB/s. With current
internet technology, that scenario should not occur very often. The successful use of TCP
instead of UDP indicates that streaming over TCP can be a viable option. [10]
The player itself is lightweight, loads quickly, and does a great job of putting local files in the
cloud for access from anywhere. The Premium version adds excellent audio quality and
offline listening. The audio codec used does a great job with compression and delivers audio
quality that cannot be easily distinguished from the original WAV track.
REFERENCES
[1] Orlowski, Andrew. "Spotify, DRM and the celestial jukebox". The Register. H. Deshpande, M. Bawa, and H. Garcia-Molina,
“Streaming live media over peers,” Stanford InfoLab, Technical Report 2002-21, 2002. [Online]. Available:
http://ilpubs.stanford.edu:8090/863/
[2] Spotify.com , Retrieved 21 January 2015.
[3] labs.spotify.com, Retrieved 21 January 2015.
[4] X. Liao, H. Jin, Y. Liu, L. M. Ni, and D. Deng, “AnySee: Peer-to-peer live streaming,” in Proc. of IEEE INFOCOM’06, 2006, pp. 1–10.
[5] Gunnar Kreitz, Fredrik Niemelä, " Spotify – Large Scale, Low Latency, P2P Music-on-Demand Streaming", 2010.
[6] "About". Xiph.org. Retrieved 21 January 2015.
[7] Vinton G. Cerf, Robert E. Kahn, (May 1974). "A Protocol for Packet Network Intercommunication" (PDF). IEEE Transactions on
Communications 22 (5): 637–648. doi:10.1109/tcom.1974.1092259. Archived from the original (PDF) on 2015-07-23.
[8] https://gameserverarchitecture.com/2015/10/pattern-client-side-load-balancing/ , Retrieved 21 January 2015.
[9] L. Kalampoukas, A. Varma, and K. K. Ramakrishnan, “Improving TCP throughput over two-way asymmetric links: analysis and
solutions,” SIGMETRICS Perform. Eval. Rev., vol. 26, no. 1, pp. 78–89, 1998
[10] http://www.cisco.com/c/en/us/products/ios-nx-os-software/quality-of-service-qos/index.html , Retrieved 21 January 2015
[11] "Ogg Vorbis". Xiph.Org Foundation. Retrieved 2009-09-11.
[12] https://play.google.com/store/apps/details?id=com.spotify.music&hl=en , Retrieved 21 January 2015.
[13] H. S. Malvar, "Lapped Transforms for Efficient Transform/Subband Coding", IEEE Trans. on Acoustics, Speech, and Signal
Processing, vol. 38, no. 6, pp. 969–978 (Equation 22), June 1990