In this thesis, we present design techniques -- and systems that illustrate and validate these techniques -- for building data-intensive applications over the Internet. We enable the use of a traditional bandwidth-limited server in these applications. A large number of cooperating users contribute resources such as disk space and network bandwidth, and form the backbone of such applications. The applications we consider fall into one of two categories. The first type provides user-perceived utility in proportion to the data download rates of the participants; bulk data distribution systems are a typical example. The second type is usable only when the participants have data download rates above a certain threshold; video streaming is a prime example.
1. Scaling Data Servers via Cooperative Caching
Siddhartha Annapureddy
New York University
Committee:
David Mazières, Lakshminarayanan Subramanian, Jinyang Li,
Dennis Shasha, Zvi Kedem
2. Motivation
New functionality over the Internet
Large numbers of users consume many (large) files
YouTube, Azureus, etc. for movies – no more DVDs
Data dissemination to large distributed farms
3. Current approaches
Server
Client Client Client
4. Current approaches
Server
Client Client Client
Scalability ↓ – Server’s uplink saturates
Operations take a long time to finish
5. Current approaches (contd.)
Server farms replace central server
Scalability problem persists
Notoriously expensive to maintain
Content Distribution Networks
(CDNs)
Still quite expensive
Peer-to-peer (P2P) systems
A new development
6. Peer-to-peer (P2P) systems
Low-end central server
Large client population
File usually divided into chunks/blocks
Use client resources to disseminate file
Bandwidth, disk space etc.
Clients organized into a connected overlay
Each client knows only a few other clients
How to design P2P systems to scale data servers?
7. P2P scalability – Design principles
To achieve scalability, we need to avoid hotspots
Keep bandwidth load on the server to a minimum
Server gives data only to a few users
Leverage participants to distribute data to other users
Keep maintenance overhead for users low
Keep bandwidth load on users to a minimum
Each node interacts with only a small subset of users
8. P2P systems – Themes
Usability – Exploiting P2P for a range of systems
File systems, user-level apps etc.
Security issues in such wide-area P2P systems
Providing data integrity, privacy, authentication
Cross file(system) sharing – Use other groups
Make use of all available sources
Topology management – Arrange nodes efficiently
Underlying network (locality properties)
Node capacities (bandwidth, disk space etc.)
Application-specific (certain nodes together work well)
Scheduling dissemination of chunks efficiently
Nodes only have partial information
12. Shark – Motivation
Scenario – Want to test a new application on a hundred nodes
[Diagram: a testbed of ten workstations]
Problem – Need to push software to all nodes and collect results
13. Current approaches
Distribute from a central server
[Diagram: central server distributing to clients in Networks A and B]
rsync, unison, scp
– Server’s network uplink saturates
– Wastes bandwidth along bottleneck links
14. Current approaches
File distribution mechanisms
BitTorrent, Bullet
+ Scales by offloading burden from the server
– Clients may download from half-way across the world
15. Inherent problems with copying files
Users have to decide a priori what to ship
Ship too much – Wastes bandwidth, takes a long time
Ship too little – Hassle to work in a poor environment
Idle environments consume disk space
Users are loath to clean up ⇒ Redistribution
Need a low-cost solution for refetching files
Manual management of software versions
Instead – Point /usr/local to a central server
Illusion of having the full development environment
Programs fetched transparently on demand
16. Networked file systems
+ Know how to deploy these systems
+ Know how to administer such systems
+ Simple accountability mechanisms
E.g., NFS, AFS, SFS
19. Let’s design this new filesystem
[Diagram: a single client and the server]
Vanilla SFS
Central server model
20. Scalability – Client-side caching
[Diagram: client with a local cache, talking to the server]
Large client cache à la AFS
Whole file caching
Scales by avoiding redundant data transfers
Leases to ensure NFS-style cache consistency
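A minimal sketch of how lease-based revalidation might look under whole-file caching. The RPC names (getattr, fetch), the lease duration, and the revalidate-by-attributes check are illustrative assumptions, not Shark's exact protocol:

```python
import time

LEASE_SECS = 300  # illustrative lease duration, not Shark's actual value

class CachedFile:
    def __init__(self, data, attrs):
        self.data, self.attrs = data, attrs
        self.lease_expiry = time.time() + LEASE_SECS

def read(cache, server, fh):
    """Serve reads from the whole-file cache while the lease is valid;
    revalidate with the server once it expires (NFS-style consistency)."""
    f = cache.get(fh)
    if f and time.time() < f.lease_expiry:
        return f.data                        # lease valid: no server contact
    attrs = server.getattr(fh)               # hypothetical attribute RPC
    if f and attrs == f.attrs:
        f.lease_expiry = time.time() + LEASE_SECS
        return f.data                        # file unchanged: cache is fresh
    data = server.fetch(fh)                  # in Shark: via cooperative cache
    cache[fh] = CachedFile(data, attrs)
    return data
```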
21. Scalability – Cooperative caching
[Diagram: several clients and one server]
With multiple clients, must address bandwidth concerns
22. Scalability – Cooperative caching
Clients fetch data from each other and offload burden from the server
[Diagram: clients exchanging chunks with each other and with the server]
Shark clients maintain a distributed index
Fetch a file from multiple other clients in parallel
Read-heavy workloads – Writes still go to the server
25. P2P systems – Themes
Usability – Exploiting P2P for a range of systems
Cross file(system) sharing – Use other groups
Security issues in such wide-area P2P systems
Topology management – Arrange nodes efficiently
Scheduling dissemination of chunks efficiently
26. Cross file(system) sharing – Application
Linux distribution with LiveCDs
[Diagram: Knoppix, Slax, and Sharkcd clients, plus a Knoppix mirror, sharing one cache]
LiveCD – Run an entire OS without using hard disk
But all your programs must fit on a CD-ROM
Downloading dynamically from a server raises scalability problems
Knoppix and Slax users form global cache – Relieve servers
27. Cross file(system) sharing
Global cooperative cache regardless of origin servers
[Diagram: two groups of clients accessing the Slax and Knoppix servers]
Two groups of clients accessing Slax/Knoppix servers
Client groups share a large amount of software
Such clients automatically form a global cache
BitTorrent, in contrast, supports only per-file groups
30. Content-based chunking
[Diagram: clients and server sharing a single overlay]
Single overlay for all filesystems, with content-based chunks
Chunks – Variable-sized blocks based on content
Chunks are better – They exploit commonalities across files
LBFS-style chunks are preserved across versions and concatenations
[Muthitacharoen et al. A Low-Bandwidth Network File System. SOSP ’01]
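LBFS-style chunking picks boundaries from the content itself, typically with Rabin fingerprints over a small sliding window. The following is a minimal sketch of the idea, substituting an ordinary polynomial rolling hash for Rabin fingerprints; the window size, mask, and size bounds are illustrative assumptions, not Shark's actual parameters:

```python
from collections import deque

WINDOW = 48                      # bytes in the sliding window
BASE, PRIME = 257, 1_000_000_007
POWW = pow(BASE, WINDOW, PRIME)  # contribution of the byte leaving the window
MASK = (1 << 13) - 1             # ~8 KB expected chunk size
MIN_CHUNK, MAX_CHUNK = 2 * 1024, 64 * 1024

def chunk(data: bytes):
    """Split data into variable-sized, content-defined chunks."""
    chunks, start = [], 0
    win, h = deque(), 0
    for i, b in enumerate(data):
        win.append(b)
        h = (h * BASE + b) % PRIME           # slide the window forward
        if len(win) > WINDOW:
            h = (h - win.popleft() * POWW) % PRIME
        size = i + 1 - start
        if size >= MAX_CHUNK or (size >= MIN_CHUNK and (h & MASK) == MASK):
            chunks.append(data[start:i + 1])  # boundary chosen by content
            start = i + 1
            win.clear(); h = 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

Because a boundary depends only on the bytes near it, an insertion early in a file perturbs only nearby chunks, so identical regions of different files (or of different versions) yield identical chunks.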
31. P2P systems – Themes
Usability – Exploiting P2P for a range of systems
Cross file(system) sharing – Use other groups
Security issues in such wide-area P2P systems
Topology management – Arrange nodes efficiently
Scheduling dissemination of chunks efficiently
32. Security issues – Client authentication
[Diagram: server authenticates clients; client-to-client transfers use an authenticator derived from a token]
Traditionally, the server authenticated read requests using uids
Challenge – Interaction between mutually distrustful clients
34. Security issues – Client communication
A client should be able to check the integrity of a downloaded chunk
A client should not send chunks to unauthorized clients
An eavesdropper shouldn't be able to obtain chunk contents
Accomplish this without a complex PKI
Keep the number of rounds to a minimum
35. File metadata
[Diagram: client sends GET_TOKEN(fh, offset, count); server replies with metadata – the chunk tokens]
Possession of a token implies the server's permission to read
Tokens are a shared secret between authorized clients
Tokens can be used to check the integrity of fetched data
Given a chunk B...
Chunk token TB = H(B)
H is a collision-resistant hash function
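In code, the token and the integrity check might look like the following sketch, with SHA-256 standing in for the unspecified collision-resistant hash H:

```python
import hashlib

def chunk_token(block: bytes) -> bytes:
    """T_B = H(B): a shared secret among clients authorized to read B."""
    return hashlib.sha256(block).digest()

def verify(block: bytes, token: bytes) -> bool:
    """Integrity check: does the downloaded chunk hash to T_B?"""
    return hashlib.sha256(block).digest() == token
```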
40. Security protocol
[Message flow between receiver client C and source client P: C → P: RC; P → C: RP; C → P: IB, AuthC; P → C: EK(B)]
RC, RP – Random nonces to ensure freshness
AuthC – Authenticator proving the receiver holds the token
AuthC = MAC(TB, "Auth C", C, P, RC, RP)
K – Key to encrypt the chunk contents
K = MAC(TB, "Encryption", C, P, RC, RP)
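A sketch of the two derivations, assuming HMAC-SHA-256 as the MAC and simple byte-string encodings of the client identities C and P (the thesis fixes only the MAC inputs, not these encodings):

```python
import hashlib, hmac, os

def mac(token: bytes, *fields: bytes) -> bytes:
    """MAC keyed on the chunk token T_B over the given fields."""
    m = hmac.new(token, digestmod=hashlib.sha256)
    for f in fields:
        m.update(f)
    return m.digest()

# Both sides already know T_B; the nonces R_C, R_P travel in the clear.
T_B = hashlib.sha256(b"chunk contents").digest()
C, P = b"receiver-identity", b"source-identity"
R_C, R_P = os.urandom(16), os.urandom(16)

auth_C = mac(T_B, b"Auth C", C, P, R_C, R_P)      # proves C holds T_B
K      = mac(T_B, b"Encryption", C, P, R_C, R_P)  # key for E_K(B)
```

Since both AuthC and K are bound to the fresh nonces, a replayed authenticator is useless in a new session, and an eavesdropper without TB can compute neither.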
44. Security properties
[Message flow as above: RC, RP, then IB and AuthC, then EK(B)]
Client can check the integrity of a downloaded chunk
Client checks H(downloaded chunk) =? TB
Source should not send chunks to unauthorized clients
Malicious clients cannot send correct AuthC
Eavesdropper shouldn’t get chunk contents
All communication encrypted with K
45. P2P systems – Themes
Usability – Exploiting P2P for a range of systems
Cross file(system) sharing – Use other groups
Security issues in such wide-area P2P systems
Topology management – Arrange nodes efficiently
Scheduling dissemination of chunks efficiently
46. Topology management
[Diagram: client fetches metadata from the server, then looks up clients caching the needed chunks in the overlay]
Fetch metadata from the server
Look up clients caching needed chunks in the overlay
Overlay is common to all file systems
Overlay organized like a DHT – Get and Put
50. Overlay – Get operation
[Diagram: client discovers other clients caching chunk B]
For every chunk B, there is an indexing key IB
IB is used to index clients caching B
Cannot set IB = TB, as TB is a secret
IB = MAC(TB, "Indexing Key")
51. Overlay – Put operation
Register as a source
Client now becomes a source for the downloaded chunk
Client registers in the distributed index – PUT(IB, Addr)
Chunk reconciliation
Reuse the TCP connection to download more chunks
Exchange mutually needed chunks without indexing overhead
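Putting Get and Put together, a client's interaction with the index might look like this sketch. The overlay get/put interface is schematic (Shark's index is Coral-based, described next), HMAC-SHA-256 again stands in for the MAC, and fetch_from is a placeholder for the client-to-client security protocol above:

```python
import hashlib, hmac

def indexing_key(token: bytes) -> bytes:
    """I_B = MAC(T_B, "Indexing Key") -- safe to publish, since it
    does not reveal the secret token T_B."""
    return hmac.new(token, b"Indexing Key", hashlib.sha256).digest()

def fetch_from(src, token):
    """Placeholder for the client-to-client security protocol above."""
    return None  # this sketch treats every peer as unreachable

def download_chunk(overlay, server, token, my_addr):
    """GET to find sources; PUT to register as one after downloading."""
    key = indexing_key(token)
    for src in overlay.get(key):            # clients caching chunk B
        block = fetch_from(src, token)
        if block is not None:
            break
    else:
        block = server.fetch_chunk(token)   # fall back to the server
    overlay.put(key, my_addr)               # become a source for B
    return block
```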
52. Overlay – Locality awareness
[Diagram: clients clustered within Networks A and B]
Overlay organized as clusters based on latency
Lookups preferentially return sources in the same cluster as the client
Hence, chunks are usually transferred from nearby clients
[Freedman et al. Democratizing content publication with Coral. NSDI ’04]
53. Topology management – Shark vs. BitTorrent
BitTorrent – Neighbours are "fixed"; this makes cross-file sharing difficult
Shark – Chunk-based overlay; neighbours chosen based on data; this enables cross-file sharing
54. Evaluation
How does Shark compare with SFS? With NFS?
How scalable is the server?
How fair is Shark across clients?
Which fetch order is better – random or sequential?
55. Emulab – 100 Nodes on LAN
[Figure: percentage of clients completing a 40 MB read within a given time, on 100 Emulab LAN nodes – Shark (random order, with negotiation), NFS, and SFS]
Shark – 88s
SFS – 775s (Shark ≈ 9x faster), NFS – 350s (Shark ≈ 4x faster)
SFS less fair because of TCP backoffs
56. PlanetLab – 185 Nodes
[Figure: percentage of clients completing a 40 MB read within a given time, on 185 PlanetLab nodes – Shark vs. SFS]
Shark ≈ 7 min – 95th percentile
SFS ≈ 39 min – 95th percentile (Shark ≈ 5x faster)
NFS – Triggered kernel panics in the server
57. Data pushed by server
[Figure: normalized bandwidth pushed by the server – SFS: 7400 MB, Shark: 954 MB]
Shark vs. SFS – the server pushed ≈ 23 copies of the file vs. 185 copies (≈ 8x less data)
Overlay not responsive enough
Torn connections and timeouts force clients to go to the server
58. Data served by Clients
[Figure: bandwidth transmitted (MB) by each unique Shark proxy on PlanetLab hosts, 40 MB file]
Maximum contribution ≈ 3.5 copies
Median contribution ≈ 0.75 copies
59. P2P systems – Themes
Usability – Exploiting P2P for a range of systems
Cross file(system) sharing – Use other groups
Security issues in such wide-area P2P systems
Topology management – Arrange nodes efficiently
Scheduling dissemination of chunks efficiently
60. Fetching Chunks – Order Matters
Scheduling problem – In what order should the chunks of a file be fetched?
Natural choices – Random or Sequential
Intuitively, when many clients start simultaneously
Random
All clients fetch independent chunks
More chunks become available in the cooperative cache
Sequential
Better disk I/O scheduling on the server
Only the client that has downloaded the most chunks adds new blocks to the cache
61. Emulab – 100 Nodes on LAN
[Figure: percentage of clients completing a 40 MB read within a given time, on 100 Emulab LAN nodes – Shark with random vs. sequential fetch order]
Random – 133s
Sequential – 203s
Random wins – ≈ 35% better
62. Performance characteristics of workloads
For Shark, what matters is throughput – the operation should finish as soon as possible
[Plot: utility grows steadily with throughput]
Another interesting class of applications – video streaming (RedCarpet)
[Plot: utility jumps once throughput crosses the video encoding rate]
Study the scheduling problem for Video-on-Demand
Press a button, wait a little while, and watch playback
63. Challenges for VoD
Near VoD
Small setup time to play videos
High sustainable goodput
Goodput – Largest slope osculating the block-arrival curve; the highest video encoding rate the system can support
Goal: schedule blocks to achieve low setup time and high goodput
64. System design – Outline
Components based on BitTorrent
Central server (tracker + seed), and users interested in the data
Central server maintains a list of nodes
Each node finds neighbours by contacting the server
Another option – Use gossip protocols
Nodes exchange data with neighbours
Connections are bi-directional
65. Simulator – Outline
All nodes have unit bandwidth capacity
500 nodes in most of our experiments
Each node has 4-6 neighbours
Simulator is round-based
Block exchanges done at each round
System moves as a whole to the next round
File divided into 250 blocks
Segment is a set of consecutive blocks
Segment size = 10
Maximum possible goodput – 1 block/round
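A much-reduced sketch of such a round-based simulator, using the parameters from this slide; the neighbour selection, tie-breaking, and bandwidth accounting here are simplifications of the real simulator:

```python
import random

N_NODES, N_BLOCKS, DEGREE = 500, 250, 5   # parameters from this slide

def simulate(rounds, pick_block):
    """Each round, every node downloads at most one block (unit
    bandwidth) from a randomly chosen neighbour or the server."""
    have = [set() for _ in range(N_NODES)]             # blocks per node
    server = set(range(N_BLOCKS))                      # the seed holds the file
    neigh = [random.sample(range(N_NODES), DEGREE) for _ in range(N_NODES)]
    for _ in range(rounds):
        for n in random.sample(range(N_NODES), N_NODES):
            src = random.choice(neigh[n] + [-1])       # -1 is the server
            avail = (server if src == -1 else have[src]) - have[n]
            if avail:
                have[n].add(pick_block(avail))
    return have

# Block-selection policies plug in as functions:
sequential = min                                       # earliest block first
rand_policy = lambda avail: random.choice(sorted(avail))
```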
66. Why 95th percentile?
[Figure: number of nodes (y) achieving goodput of at least x blocks/round; the red line at the top marks the 95th percentile]
A point (x, y) means y nodes have goodput at least x
The 95th percentile is a good indicator of goodput – most nodes achieve at least that goodput
67. Feasibility of VoD – What does block/round mean?
0.6 blocks/round with 35 rounds of setup time
Two independent parameters
1 round ≈ 10 seconds
1 block/round ≈ 512 kbps
Consider 35 rounds a good setup time
35 rounds ≈ 6 min
Goodput = 0.6 blocks/round
Max encoding rate = 0.6 × 512 ≈ 307 kbps > 300 kbps
Size of video = 250 blocks / 0.6 blocks/round ≈ 417 rounds ≈ 70 min
68. Naïve scheduling policies
Segment-Random policy
Divide file into segments
Fetch blocks in random order inside same segment
Download segments in order
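Expressed against the simulator sketch above, the Segment-Random policy might look like this (SEGMENT matching the segment size of 10 from the simulator outline):

```python
import random

SEGMENT = 10  # blocks per segment

def segment_random(avail):
    """Pick a random block from the earliest segment that still has
    blocks available to fetch; segments thus fill in playback order."""
    seg = min(avail) // SEGMENT
    return random.choice([b for b in avail if b // SEGMENT == seg])
```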
69. Naïve approaches – Random
[Figure: goodput (blocks/round) vs. setup time (rounds) – Random stays at 0 blocks/round]
Each node fetches a block at random
Throughput – High, as nodes fetch disjoint blocks; more opportunities for block exchanges
Goodput – Low (0 blocks/round), as nodes do not get blocks in order
70. Naïve approaches – Sequential
[Figure: goodput vs. setup time – Sequential reaches ≈ 0.2 blocks/round; Random stays at 0]
Each node fetches blocks in order of playback
Throughput – Low, as there are fewer opportunities for exchange; need to increase this
71. Naïve approaches – Segment-Random policy
[Figure: goodput vs. setup time – Segment-Random reaches ≈ 0.4 blocks/round, ahead of Sequential and Random]
Segment-Random – Fetch a random block from the segment in which the earliest missing block falls
Increases the number of blocks propagating in the system
72. Network coding – How does it help?
A has blocks 1 and 2
B gets 1 or 2 with equal probability from A; C gets 1 in parallel
If B downloaded 1, the link B–C becomes useless
Network coding routinely sends 1 ⊕ 2, which is useful to B either way
73. Network coding – Mechanics
Coding over blocks B1, B2, ..., Bn
Choose coefficients c1, c2, ..., cn from a finite field
Generate encoded block: E1 = c1×B1 + c2×B2 + ... + cn×Bn
Without n (linearly independent) coded blocks, decoding cannot be done
Setup time is at least the time to fetch a segment
[Gkantsidis et al. Network Coding for Large Scale Content Distribution. InfoCom ’05]
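A self-contained sketch of the mechanics over GF(256); the AES field polynomial 0x11B is my choice, as the slides require only some finite field. encode mixes blocks with random coefficients; decode runs Gaussian elimination and needs n linearly independent coded blocks:

```python
import random

def gf_mul(a, b):
    """Multiply in GF(256) with the (assumed) polynomial 0x11B."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p

def gf_inv(a):
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

def encode(blocks):
    """One coded block: E = sum_i c_i * B_i with random c_i."""
    coeffs = [random.randrange(256) for _ in blocks]
    enc = [0] * len(blocks[0])
    for c, blk in zip(coeffs, blocks):
        for j, byte in enumerate(blk):
            enc[j] ^= gf_mul(c, byte)
    return coeffs, enc

def decode(packets, n):
    """Gauss-Jordan elimination; needs n linearly independent packets."""
    rows = [list(c) + list(e) for c, e in packets[:n]]
    for i in range(n):
        piv = next(r for r in range(i, n) if rows[r][i])  # assumes independence
        rows[i], rows[piv] = rows[piv], rows[i]
        inv = gf_inv(rows[i][i])
        rows[i] = [gf_mul(inv, x) for x in rows[i]]
        for r in range(n):
            if r != i and rows[r][i]:
                f = rows[r][i]
                rows[r] = [x ^ gf_mul(f, y) for x, y in zip(rows[r], rows[i])]
    return [bytes(row[n:]) for row in rows]

# 3 blocks of 4 bytes; any 3 independent coded blocks reconstruct them.
blocks = [bytes(random.randrange(256) for _ in range(4)) for _ in range(3)]
packets = [encode(blocks) for _ in range(3)]
assert decode(packets, 3) == blocks  # fails only if coefficients collide (rare)
```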
74. Benefits of network coding
1
0.9
Goodput (blocks/round)
0.8
0.7
0.6
Segment-Netcoding
0.5 Segment-Random
0.4 Sequential
0.3 Random
0.2
0.1
0
0 10 20 30 40 50 60 70 80 90 100
Setup Time (in rounds)
Goodput
≈ 0.6 blocks/round with netcoding
0.4 blocks/round with seg-rand
Siddhartha Annapureddy Thesis Defense
75. Segment scheduling
Netcoding is good for fetching blocks within a segment, but cannot be used across segments
Because decoding a segment requires all of its encoded blocks
Problem: How to schedule the fetching of segments?
Until now, sequential segment scheduling
Results in rare segments when nodes arrive at different times
Segment scheduling algorithm to avoid rare segments
Evaluation with a C# prototype
76. Segment scheduling
A has 75% of the file
A flash crowd of 20 Bs joins with empty caches
The server is shared between A and the Bs
Throughput to A decreases, as the server is busy serving the Bs
Initial segments are served repeatedly
Throughput of the Bs is low because the same segments are fetched from the server
77. Segment scheduling – Problem
8 segments in the video; A has 75% = 6 segments, the Bs have none
Green is popularity 2, red is popularity 1
Popularity = number of full copies in the system
78. Segment scheduling – Problem
Instead of serving A...
79. Segment scheduling – Problem
The server gives segments to Bs
80. Segment scheduling – Problem
[Figure: blocks downloaded over time under naive scheduling – A's curve flattens while the Bs progress]
A's goodput plummets
Get the best of both worlds – Improve both A and the Bs
Idea: Each node serves only the rarest segments in the system
81. Segment scheduling – Algorithm
When the dst node connects to the src node...
Here: dst is B, src is the server
82. Segment scheduling – Algorithm
src node sorts segments in order of popularity
Segments 7, 8 least popular at 1
Segments 1-6 equally popular at 2
83. Segment scheduling – Algorithm
src considers segments in sorted order one-by-one, and serves dst a segment that is either completely available at src, or the first segment required by dst
84. Segment scheduling – Algorithm
The server injects blocks from the segment needed by A into the system
This avoids wasting bandwidth serving the initial segments multiple times
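A sketch of the selection rule from the last three slides; the data structures (a popularity map, the set of segments complete at src, dst's next needed segment) are my framing of it, and popularity tracking itself is discussed on the next slide:

```python
def choose_segment(popularity, complete_at_src, first_needed_by_dst):
    """Walk segments from least to most popular; serve the first that
    is fully available at src or is the segment dst needs next."""
    for seg in sorted(popularity, key=popularity.get):
        if seg in complete_at_src or seg == first_needed_by_dst:
            return seg
    return None

# Slide 77's example: 8 segments, A holds 1-6 (popularity 2 each),
# segments 7-8 exist only at the server (popularity 1).
popularity = {1: 2, 2: 2, 3: 2, 4: 2, 5: 2, 6: 2, 7: 1, 8: 1}
server_has = set(range(1, 9))
print(choose_segment(popularity, server_has, first_needed_by_dst=1))  # -> 7
```

Sorting by popularity makes the server inject the rare segments 7 and 8 rather than re-serving segment 1 to every B; the Bs can then relay those segments while fetching the initial segments from A.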
85. Segment scheduling – Popularities
How does src figure out popularities?
Centrally available at the server
Our implementation uses this technique
Each node maintains popularities for segments
Could use a gossiping protocol for aggregation
86. Segment scheduling
[Figure: blocks downloaded over time – naive vs. rarest-segment scheduling, for A and for B1..n]
Note that goodput of both A and Bs improves
87. Conclusions
Problem: Designing scalable data-intensive applications
Enabled low-end servers to scale to large numbers of users
Shark: Scales file server for read-heavy workloads
RedCarpet: Scales video server for near-Video-on-Demand
Exploited P2P for file systems, solved security issues
Techniques to improve scalability
Cross file(system) sharing
Topology management – Data-based, locality awareness
Scheduling of chunks – Network coding