2. 1196 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 18, NO. 4, AUGUST 2010
delay performance, but also has the potential to do so in the face of realistic network impairments, such as long propagation delays and random bandwidth fluctuations. The delay bounds derived in our analysis can serve as delay performance benchmarks for various proposed/deployed P2P streaming systems. Insights brought forth by the study of the snowball streaming algorithm can be used to guide the design of new P2P streaming systems with shorter startup delays and playback lags.

The paper is organized as follows. In Section II, we provide a short overview of the existing P2P streaming solutions. The bounds on the delay for a single chunk dissemination are established in Section III for both homogeneous and heterogeneous P2P network environments. A snowball chunk dissemination algorithm is introduced to achieve the delay bound for a single chunk dissemination. In Section IV, we show that the snowball chunk dissemination algorithm can be extended to a snowball streaming algorithm to achieve the delay bounds in continuous video streaming. The performance of the snowball chunk dissemination algorithm in realistic network environments is studied in Section V. A centralized dynamic snowball streaming algorithm is presented in Section VI. Through simulations, we demonstrate that the dynamic snowball streaming algorithm can approach the minimum delay bounds in highly variable network environments with a small peer upload bandwidth overhead. The paper is concluded with future work in Section VII.

II. BACKGROUND AND RELATED WORK

Existing P2P streaming solutions can be classified into the following categories.

A. Single-Tree Streaming

In a single-tree-based approach, peers form a tree topology at the application layer, with the video source server as the root. Each peer receives the stream from its parent peer and forwards it to its children peers. The fan-out degree of a peer is limited by its uploading bandwidth. An early example is Overcast [10]. One major drawback of the single-tree approach is that leaf nodes do not contribute their uploading bandwidth. Since leaf nodes account for a large portion of peers in the system, this largely degrades the peer bandwidth utilization efficiency.

B. Balanced Multi-Tree Streaming

To solve the leaf nodes problem, multi-tree-based approaches have been proposed [4], [11]. In balanced multi-tree streaming, the server divides the stream into $m$ substreams. Instead of one streaming tree, $m$ subtrees are formed, one for each substream. In fully balanced multi-tree streaming, the node degree of each subtree is $m$. Each peer joins all $m$ subtrees to retrieve $m$ substreams. A single peer is positioned on an internal node in only one tree and only uploads one substream to its children peers in that tree. In each of the remaining $m-1$ subtrees, the peer is positioned on a leaf node and downloads a substream from its parent peer. Fig. 1(a) shows an example of two-tree streaming for seven peers. For balanced multi-tree streaming, a chunk is disseminated in a hierarchical way. As illustrated in Fig. 1(b), for an $m$-degree tree of $N$ peers, peer 0 sends a chunk to its $m$ children peers at level 1, each of which is then responsible for disseminating the chunk in its own subtree with $(N-1)/m$ peers (including themselves). In terms of time, after $m$ transmissions by peer 0, the task of disseminating a chunk to $N$ peers becomes $m$ subtasks of disseminating the chunk to $(N-1)/m$ peers.

Fig. 1. Balanced multi-tree-based streaming. (a) Seven-nodes example. (b) Hierarchical view.

C. Mesh-Based Streaming

The management of streaming trees is challenging in the face of frequent peer churn. Mesh-based streaming systems are more robust against peer dynamics. Many recent P2P streaming systems adopt a mesh-based streaming approach [28], [18], [25], [17], [27]. In a mesh-based system, there is no static streaming topology. Peers establish and terminate peering relationships dynamically. A peer may download/upload video from/to multiple peers simultaneously. However, in practice, the delay performance of mesh-based streaming is still not satisfactory. One important motivation of the study presented in this paper is to provide some guidelines for the design of peering strategies and chunk scheduling schemes for mesh-based streaming systems to achieve better delay performance.

D. Related Work on Delay Performance

Despite P2P streaming systems' popularity, few studies have addressed their delay performance analytically. One related work was presented in [22]. The authors of [22] studied the tradeoff between the server bandwidth cost, the maximum number of peers that can be supported, and the minimum number of streaming hops experienced by a peer. We study the optimal streaming strategy when the server only plays a minimal role in video uploading. The delay bounds obtained through our analysis are much tighter than those predicted in [22] and can be achieved by the proposed snowball streaming algorithm. A recent paper [14] studied the minimum tree depth of multi-tree-based streaming as a function of server and peer bandwidth and peer degree. They assumed that video can be divided infinitely into substreams like fluid. Consequently, the chunk transmission delay was not considered. The authors of [21] proposed a heuristic algorithm to build a low-delay overlay mesh for P2P live streaming. The delay between two peers at the overlay level is the end-to-end propagation delay along the underlay path between the two. Again, the chunk transmission delay was not taken into consideration. Different from those works, we study the delay bounds for chunk-based P2P streaming where the chunk transmission delays are not negligible compared to propagation delays. We develop continuous P2P streaming algorithms to schedule the transmissions of chunks to approach the minimum delay bounds. After
3. LIU: DELAY BOUNDS OF CHUNK-BASED PEER-TO-PEER VIDEO STREAMING 1197
the conference version of this paper [15], a recent work [2] also studied the delay bound of chunk-based P2P streaming by modeling the diffusion process using difference equations. We systematically study P2P delay bounds considering peer heterogeneity, random propagation delay, and upload bandwidth. The tightness of delay bounds in dynamic network environments is also established by a centralized streaming algorithm.

III. BOUND ON SINGLE-CHUNK DISSEMINATION

In a P2P live video streaming session, a sequence of video chunks is continuously generated by the server and disseminated to all peers in the session. The streaming delay is determined by how fast all chunks can be delivered to peers. In this section, instead of developing the streaming delay bound, we assume that there is only one chunk to be disseminated in a P2P video system and develop the delay bound for the single-chunk dissemination. Obviously, the single-chunk delay bound is a lower bound for streaming delay. We will generalize the analysis for the single-chunk dissemination to continuous streaming in Section IV.

Given a P2P system with a server and $N$ peers, one can ask the question: If the server generates a chunk of content at time $t = 0$, how does one disseminate that chunk to all peers in the shortest time possible? The answer depends on the size of the chunk, available bandwidth, and the propagation delays among all nodes in the system, including the server and all peers. Without loss of generality, we can normalize the chunk size to be one and choose the video streaming rate as the bandwidth unit. Consequently, the chosen time unit after the normalization equals the chunk transmission time on a unit bandwidth link, which in turn equals the average playback time of video contained in a chunk. For now, let us assume the propagation delay between any two nodes is dominated by the chunk transmission delay and thus can be ignored. We will take propagation delays into account in Sections V-A and VI when the chunk transmission delay becomes small.

A. Homogeneous Case

We start with a homogeneous case where the server and all peers have upload bandwidth of 1. Each peer uploads and downloads at the same rate, and the whole P2P streaming system is self-scalable. Throughout this paper, we assume all peers have enough download bandwidth to receive the whole video stream. Therefore, the download part is never a bottleneck in our analysis. We further assume that the server will upload only one copy of the chunk to one peer and will not participate in the chunk dissemination afterward.

1) Single-Tree Chunk Dissemination: Given the unit bandwidth on all peers, a peer can only have one child. The only possible single-tree-based streaming solution is a chain: The server uploads the chunk to peer 0, then peer 0 uploads it to peer 1, and so on until peer $N-2$ uploads it to peer $N-1$. The chunk propagates along the chain from the server to all peers in time $N$. The average delay is $(N+1)/2$.

2) Multi-Tree Chunk Dissemination: If the multi-tree approach with degree $m$ is employed, a chunk propagates from the server to all peers along a subtree with node degree of $m$. If there are $N$ peers, the number of levels of each subtree is $K = \lceil \log_m (N(m-1)+1) \rceil$. The only peer at level 0 downloads the chunk directly from the server, and a peer at level $k$ then uploads a video chunk to $m$ children peers at level $k+1$. Let $n_k$ be the number of peers at level $k$. Then, $n_k = m^k$ for $0 \le k \le K-2$, and $n_{K-1} = N - (m^{K-1}-1)/(m-1)$. Since each peer/server only has uploading bandwidth of 1, if the uploading is done in parallel, all children peers of one peer will receive the chunk $m$ time slots after their common parent receives the chunk. The peer at the top level can always receive the chunk from the server after one time slot. For parallel uploading, the peers at the very bottom level will receive the chunk in $1 + m(K-1)$ time slots. The average delay among all peers is

$$\frac{1}{N}\sum_{k=0}^{K-1} n_k\,(1 + mk). \tag{1}$$

When $N$ is large, the average delay and the worst-case delay are both of the form

$$m \log_m N = \frac{m}{\log_2 m}\,\log_2 N. \tag{2}$$

To achieve the shortest delay, one can choose tree degree $m = 3$, i.e., the server divides the stream into three substreams and feeds each stream into one subtree with node degree of 3. The minimum delay, in both average and worst-case sense, is $3\log_3 N \approx 1.89\log_2 N$.

If the uploading is done sequentially, the first child peer will receive the chunk from its parent within one time slot, and the last child of a peer will receive the chunk after $m$ time slots. The longest delay at level $k$ is still $1 + mk$. Therefore, the worst-case delay is still $1 + m(K-1)$. A degree of 3 can achieve the minimum worst-case delay of approximately $1.89\log_2 N$. The average delay at level $k+1$ is $(m+1)/2$ time slots more than the average delay at level $k$. The peer at the top level can always receive the chunk from the server after one time slot. We can calculate the average delay among all peers as

$$\frac{1}{N}\sum_{k=0}^{K-1} n_k\left(1 + \frac{m+1}{2}\,k\right).$$

Again, when $N$ is large, the average delay is

$$\frac{m+1}{2}\,\log_m N = \frac{m+1}{2\log_2 m}\,\log_2 N.$$

When the tree degree is 4, the average delay is minimized to $1.25\log_2 N$, which is less than $2/3$ of the average delay of parallel uploading.

3) Snowball Chunk Dissemination: For single-chunk dissemination, peers only need to disseminate one chunk instead of a continuous stream of chunks. After downloading the chunk, a peer can keep uploading that chunk to other peers until all peers receive it. This will largely reduce the chunk dissemination time. The accumulation of the aggregate uploading bandwidth for the chunk mimics the formation of a snowball. We refer to it as the snowball chunk dissemination approach. Fig. 2(a) illustrates the
progress of snowball chunk dissemination for eight peers. An arc from node $i$ to node $j$ with a label $k$ represents peer $i$ (or the server) uploading the chunk to peer $j$ in time slot $k$. The server uploads the chunk to peer 0 in time slot 0. In time slot 1, peer 0 uploads the received chunk to peer 1. In time slot 2, both peer 0 and peer 1 will upload the chunk to peers 2 and 3, respectively. Peers 0, 1, 2, 3 will upload the chunk to peers 4, 5, 6, 7 in time slot 3. All peers receive the chunk after four time slots.

Fig. 2. Snowball chunk dissemination. (a) Eight nodes. (b) Recursive view.

For a general case, the snowball approach disseminates a chunk in a recursive way. As illustrated in Fig. 2(b), after peer 0 sends a chunk to peer 1, the task of disseminating a chunk to $N$ peers becomes two subtasks of disseminating the chunk to $N/2$ peers. Peer 0 continues to lead one subtask, and peer 1 becomes the leader for the other subtask. Even though the task-splitting degree is only 2, compared to degree $m$ in Fig. 1(b), it happens after only one chunk transmission instead of $m$ transmissions in Fig. 1(b). We will show that this is indeed the fastest branching process.

Let $P(k)$ denote the number of peers that have the chunk at the beginning of time slot $k$. In time slot 0, the server uploads the chunk to one peer; therefore, $P(1) = 1$. Afterward, every peer with the chunk will upload it to another peer in one time slot, and we have $P(k+1) = \min\{2P(k), N\}$. Therefore, it takes $1 + \lceil \log_2 N \rceil$ time slots for all peers to receive the chunk. One peer receives the chunk after one time slot, $2^{k-2}$ peers receive the chunk after $k$ time slots ($2 \le k \le \lceil \log_2 N \rceil$), and $N - 2^{\lceil \log_2 N \rceil - 1}$ peers receive the chunk after $1 + \lceil \log_2 N \rceil$ time slots. The average delay performance is

$$\frac{1}{N}\left[1 + \sum_{k=2}^{\lceil \log_2 N \rceil} k\,2^{k-2} + \left(1 + \lceil \log_2 N \rceil\right)\left(N - 2^{\lceil \log_2 N \rceil - 1}\right)\right].$$

If $N = 2^q$ for an integer $q$, the average delay is $q + 1/N = \log_2 N + 1/N$.

Theorem 1: In a homogeneous P2P streaming system, the snowball chunk dissemination approach simultaneously achieves the minimum average peer delay and the minimum worst-case peer delay.

Proof: See proof of Theorem 1 in [16].

TABLE I
MINIMUM DELAY ACHIEVED BY DIFFERENT STREAMING STRATEGIES FOR HOMOGENEOUS CASE

Table I compares the delay performance of snowball chunk dissemination with the single-tree and the optimal multi-tree approaches. For a system of 1024 peers, if the transmission delay of a chunk is 0.2 s, it takes only 2 s for the snowball approach to complete chunk dissemination to all peers, while the minimum delay achieved by the multi-tree approach is 3.78 s. Since the single-tree approach degrades to a chain, peers' average delay is around 100 s.

As detailed in [15], if the server bandwidth is increased from 1 to $M$, one can divide peers into $M$ clusters, and let the server upload the chunk to one peer in each cluster within one time slot. The delay bound can be reduced by a constant of $\log_2 M$ to $1 + \log_2 (N/M)$. However, if all peers' upload bandwidth is increased to $u$, with sequential chunk uploading between peers, each chunk can be uploaded within $1/u$ time slot. Consequently, the delay bound shrinks roughly in proportion to $1/u$.

In the snowball approach, peers who receive the chunk in the $k$th time slot upload the chunk $\lceil \log_2 N \rceil - k$ times, and the peers who receive the chunk in the last time slot (about half of the peers) do not get a chance to upload the chunk to other peers. Their uploading bandwidth can be utilized to upload other chunks in continuous video streaming when multiple chunks are in transition simultaneously. We will further show in Section IV that the snowball chunk dissemination can be extended to snowball continuous streaming to continuously disseminate a stream of chunks, and the worst-case delay for each chunk is still $1 + \lceil \log_2 N \rceil$. The snowball streaming in Section IV is designed in an optimal way such that the uploading bandwidth of all peers is fully utilized to achieve the minimum delay bound for each chunk.

B. Heterogeneous Cases

In a real network environment, different peers have different types of network access with different upload bandwidth. The chunk dissemination delay is determined by how quickly peers' bandwidth can be utilized to upload the chunk. We define the system-wide usable uploading bandwidth for the chunk as $U(t)$, the aggregate uploading bandwidth that can be utilized to upload the chunk at any time $t$. In the homogeneous case, every peer has the same uploading bandwidth. $U(t)$ is proportional to the number of peers with the chunk $P(t)$. The order in which peers receive the chunk has no impact on how $U(t)$ grows over time. However, in a heterogeneous environment, the order in which peers receive the chunk determines the growth speed of $U(t)$ and consequently the chunk dissemination delay. For the quick growth of $U(t)$, the intuition is to upload the chunk to peers with large uploading capacities first.

In this section, we study the impact of uploading bandwidth heterogeneity among peers on the chunk dissemination delay by studying several typical cases. It will become clear that the peer uploading bandwidth heterogeneity enables the snowball
approach to achieve a shorter chunk dissemination delay than the homogeneous case.

1) Case 1: Super Peers and Free-Riders: Suppose there are [...] super peers that can upload at rate [...]. All the remaining peers are free-riders and do not participate in the uploading. The chunk can be disseminated by the snowball approach to all super peers within [...] time slots. Then, all super peers can upload the chunk to the remaining free-riders in an additional [...] time slots. The total delay is [...]. In this case, the average uploading bandwidth of peers is [...]. If all peers have the average uploading bandwidth 1, the shortest delay is $1 + \lceil \log_2 N \rceil$, which is around [...] times that of the heterogeneous case. This shows that the heterogeneity of peer uploading bandwidth helps reduce the chunk dissemination delay.

2) Case 2: Multilevel Bandwidth Hierarchy: In the previous case, peers form a two-level hierarchy according to their uploading contribution. A fraction [...] of super peers with uploading bandwidth [...] stay at the top level and feed video chunks to the free-riders at the bottom level. In a real network environment, peers can be clustered based on the types of their network access. In this case, we extend the two-level hierarchy to accommodate multiple levels and show that even a very small percentage of super peers can bootstrap the chunk dissemination.

Suppose there are [...] super peers with bandwidth [...], [...] medium peers with bandwidth [...], and [...] slow peers with bandwidth [...]. To quickly disseminate the chunk to all peers, the following chunk scheduling algorithm can be employed:
1) Use the snowball algorithm to upload to [...] super peers within time [...].
2) Each of those [...] super peers acts as a server with bandwidth [...] and uploads to other [...] medium peers. As studied in Section III-A, the uploading can finish within time [...]. Now, [...] medium peers have the chunk.
3) Each of those [...] medium peers acts as a server with bandwidth [...] and uploads to other [...] slow peers within time [...]. Now, [...] slow peers have the chunk.

The total delay is

[...]

Without those super and medium peers, the fastest chunk dissemination to [...] slow peers takes time [...].

This suggests that the existence of super peers (even if a very small percentage) can dramatically reduce the chunk dissemination delay. For example, it takes at least 15 time slots to disseminate a chunk to [...] peers with bandwidth 1. Meanwhile, if [...] and [...], in other words, 32 (only 0.1%) of them have bandwidth of 10 and 1024 (only 3%) of them have bandwidth of 5, the time to disseminate a chunk to all peers is less than 5.2 time slots.

The example can be easily extended to incorporate more than three levels. Another insight obtained from this example is that peers should be organized into tiers according to their uploading bandwidth; peers within each tier should help each other to obtain the chunk in the shortest possible time, then pass it down to the neighboring lower tier. This way, the delay of the whole system can be reduced.

3) General Heterogeneous Case: For general heterogeneous cases, one can index peers according to the decreasing order of their uploading capacities. Suppose the sorted uploading capacities of peers are [...]. To derive a lower bound on the shortest chunk dissemination time, let us allow infinitely fine chunk striping, namely, multiple peers can upload different bits of a chunk to the same peer simultaneously. If the first [...] peers have the chunk at time [...], the uploading to peer [...] can finish by [...]; therefore, the lower delay bound can be calculated as

[...]

However, this is a loose bound. For example, for the homogeneous case, the bound is [...]. We know the shortest delay without chunk striping is instead $1 + \lceil \log_2 N \rceil$. In [15], we developed several variations of the snowball chunk dissemination algorithm to achieve short delay in the general heterogeneous case. Due to the space limit, we refer interested readers to [15] for more details.

IV. SNOWBALL STREAMING

In single-chunk dissemination, any peer can be utilized to upload the chunk after it has downloaded the chunk. In continuous streaming, one new chunk is generated by the server each time slot. When the server capacity is less than $N$, one chunk cannot be disseminated to all peers within one time slot. Therefore, there will be more than one chunk in transition at any given time. If $D$ is the minimum transmission delay for a single chunk, there will be at least $D$ chunks in transition at any given time. If the chunk scheduling is not set up appropriately, some chunks cannot be disseminated to all peers within $D$ time slots.

A. Homogeneous Environment

In this section, we show that for the homogeneous case, there exists a chunk streaming schedule such that all chunks can be disseminated to all peers within the minimum chunk delay time. In the snowball chunk dissemination approach, the server uploads the chunk to the first peer at time slot 0. Before the beginning of time slot $1 + \lceil \log_2 N \rceil$, all peers will receive the chunk. Let $n_k$ be the number of peers with the chunk at the beginning of time slot $k$ that will upload that chunk in time slot $k$. We have

$$n_k = \begin{cases} 2^{k-1}, & 1 \le k < \lceil \log_2 N \rceil \\ N - 2^{k-1}, & k = \lceil \log_2 N \rceil. \end{cases}$$

We call $\{n_k, 1 \le k \le \lceil \log_2 N \rceil\}$ the snowball chunk dissemination profile. To achieve the minimum chunk delay, all chunks have to be disseminated according to the optimal profile $\{n_k\}$.
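As a quick check of the dissemination profile described above, the following minimal sketch computes it for a given number of peers. It assumes unit upload bandwidth, that the server hands the chunk to a single peer in slot 0, and that the final slot needs only as many uploaders as there are peers still missing the chunk. The function names `snowball_profile` and `worst_case_delay` are hypothetical and do not come from the paper.

```python
def snowball_profile(num_peers):
    """Return the number of uploading peers in slots 1, 2, ... (assumed model).

    Assumption: one peer holds the chunk at the start of slot 1, and every
    holder uploads to one new peer per slot until everyone has the chunk.
    """
    if num_peers < 1:
        raise ValueError("need at least one peer")
    profile = []
    have = 1  # peers holding the chunk at the start of slot 1
    while have < num_peers:
        # In the last slot only the peers still missing the chunk are served.
        uploaders = min(have, num_peers - have)
        profile.append(uploaders)
        have += uploaders
    return profile


def worst_case_delay(num_peers):
    """One slot for the server upload plus one slot per profile entry."""
    return 1 + len(snowball_profile(num_peers))


if __name__ == "__main__":
    for n in (5, 8, 1024):
        prof = snowball_profile(n)
        print(n, prof, worst_case_delay(n), sum(prof))
```

Under these assumptions the peer uploads for one chunk always add up to one less than the number of peers, since the server supplies the remaining copy. This is what leaves room for a new chunk to enter the system in every slot.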
Theorem 2: For a homogeneous P2P streaming system, there exists a continuous streaming schedule such that all chunks in the stream will be disseminated to all peers with the shortest delay $1 + \lceil \log_2 N \rceil$ achieved by the snowball algorithm in the single-chunk dissemination.

Proof: Without loss of generality, the server uploads chunk [...] to some peer at time slot [...]. Let [...] be the number of peers that have chunk [...] and will upload chunk [...] to other peers at time slot [...]. For any feasible schedule, we should have [...], i.e., at any time slot the aggregate uploading bandwidth for all chunks is at most $N$, and [...], i.e., each peer can upload to at most one peer within any time slot. A streaming schedule can achieve the optimal delay for each chunk if and only if each chunk can be uploaded according to the snowball chunk dissemination profile after it is uploaded to some peer by the server, i.e.,

[...] otherwise.

It can be verified that such a schedule satisfies the feasibility constraints

[...]

and [...]. To complete the proof, for each time slot, we need to construct an uploading schedule for all active chunks. Let [...] be the set of all peers. Denote by [...] the set of peers with chunk [...] at the beginning of time slot [...] that will upload the chunk to other peers without chunk [...] in the time slot. To follow the optimal dissemination profile [...], it is sufficient to have [...], and [...] are pairwise-disjoint (since each peer can only upload one chunk in one time slot). We call the previous condition the sufficient condition to achieve the minimum delay streaming. We complete the proof of the theorem by constructing a chunk uploading schedule for each time slot through induction.

Initial Condition: The server uploads chunk 0 to peer 0 in time slot 0. Therefore, at the beginning of time slot 1, [...], and [...]. It can be easily verified that the sufficient condition is satisfied at the beginning of time slot 1.

Induction: If at the beginning of time slot [...], the condition is satisfied, we can construct a schedule in time slot [...] such that [...] is still satisfied at the beginning of time slot [...]. At the beginning of time slot [...], according to [...], [...] is the ID of the oldest chunk that needs to be uploaded in time slot [...]. Then, [...]; [...] are pairwise-disjoint, [...]. Define a set [...], i.e., the set of peers that do not need to upload any chunk at the beginning of time slot [...]. The following scheduling will guarantee the condition is still satisfied at the beginning of time slot [...].

I. If [...], chunk [...] will be uploaded for the last time in slot [...]. Since the chunk has been uploaded [...] times by the server and peers in the previous time slots, only [...] peers do not have it. Let all peers in set [...] upload chunk [...] to those peers and finish the upload of chunk [...]. Peers in [...] can be used to upload other chunks in time slot [...]. We set [...]. Then, [...].

II. If [...], chunk [...] will be uploaded for the second-to-last time in slot [...]. According to [...], [...] peers in set [...] will upload chunk [...] to other peers that do not have chunk [...]. In addition, the schedule should guarantee that there will be [...] peers available in time slot [...] to upload chunk [...].

If [...], let each peer in [...] upload chunk [...] to any peer without chunk [...], then pick [...] peers out of [...] to form the set of peers to upload chunk [...] in the next time slot, i.e., [...]. Other peers in [...] can be used to upload other chunks in time slot [...]. We set [...]. We have [...].

If [...], from step 1, [...], we can take a subset [...] of [...] peers out of [...], and let peers in [...] upload chunk [...] to peers in [...]. Remaining peers in [...] then upload chunk [...] to arbitrary peers without chunk [...]. Now, peers in [...] are ready to upload chunk [...] in time slot [...]; therefore, we set [...]. We also have [...].

III. Let [...]. Any chunk [...], [...], needs to be uploaded to [...] peers by peers in set [...]. We have

[...] (3)

Then [...], take a subset [...] of [...] peers out of [...], let all peers in [...] upload chunk [...] to peers in [...], and set [...]. At the end, due to (3), we will have [...].

IV. The server uploads chunk [...] to some peer in [...], and set [...].

Following the previous scheduling steps, the sufficient condition will be satisfied at the beginning of time slot [...].

Conclusion: There exists a schedule such that all chunks can be disseminated with snowball chunk dissemination profile [...] and achieve the optimal delay $1 + \lceil \log_2 N \rceil$.

Fig. 3 illustrates the previous snowball streaming schedule in a system with eight peers. We use a sequence of eight subfigures to show the snowball streaming schedule among all peers within eight consecutive time slots. Blocks represent chunks, and circles represent peers. For time slot $k$, a white chunk beside a peer is the chunk that the peer has and will upload to another peer within that time slot. An arc from peer $i$ to $j$ indicates that peer $i$ uploads its chunk to peer $j$. A black chunk beside a peer indicates that the server will inject that chunk to the peer in time slot $k$. Chunk 0 is uploaded to all peers by the end of time slot 3, and chunk 1 is uploaded to all peers by the end of time slot 4.
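The counting side of this argument can be sketched in a few lines. The code below is only an illustration under stated assumptions: chunk `j` is injected by the server in slot `j`, every chunk then follows the same single-chunk profile, and only the number of busy peers per slot is tracked. It does not choose which peers upload which chunk, so it checks the aggregate bandwidth constraint rather than reproducing the set construction in the proof. The names `snowball_profile` and `streaming_load` are hypothetical.

```python
def snowball_profile(num_peers):
    """Uploaders per slot for one chunk (assumed unit upload bandwidth)."""
    profile, have = [], 1
    while have < num_peers:
        uploaders = min(have, num_peers - have)
        profile.append(uploaders)
        have += uploaders
    return profile


def streaming_load(num_peers, num_slots):
    """Return (busy peers per slot, completion slot per chunk).

    Assumption: chunk j reaches its first peer in slot j and is uploaded by
    profile[age - 1] peers in slot j + age. Only counts are tracked.
    """
    profile = snowball_profile(num_peers)
    busy = [0] * num_slots
    finished = {}
    for chunk in range(num_slots):
        for age, uploaders in enumerate(profile, start=1):
            slot = chunk + age
            if slot < num_slots:
                busy[slot] += uploaders
        last_slot = chunk + len(profile)
        if last_slot < num_slots:
            finished[chunk] = last_slot  # chunk is everywhere by end of this slot
    return busy, finished


if __name__ == "__main__":
    busy, finished = streaming_load(num_peers=8, num_slots=8)
    print(busy)      # busy peers in slots 0..7
    print(finished)  # slot by whose end each chunk has reached all peers
```

For eight peers the busy count never exceeds the number of peers, and chunks 0 and 1 complete by the end of slots 3 and 4, which agrees with the description of Fig. 3.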
Fig. 3. Evolution of chunk scheduling of snowball streaming among eight peers. All chunks are delivered to all peers three time slots after they are injected by the server. (a) Time 0; (b) Time 1; (c) Time 2; (d) Time 3; (e) Time 4; (f) Time 5; (g) Time 6; (h) Time 7.

The example in Fig. 3 shows that all chunks can be disseminated to all peers within the minimum chunk dissemination delay bound.

B. Heterogeneous Environment

For the heterogeneous case, the delay bound for single-chunk dissemination cannot always be achieved in continuous streaming. For example, if the server's upload capacity is 1, and seven peers' upload capacities are 2, 1, 1, 1, 1, 1, and 0, following the snowball chunk dissemination approach in Section III, a single chunk can be disseminated to the seven peers in three time slots. However, no streaming algorithm can achieve this. If peer 0 is still uploading chunk 0 at time slot 2, chunk 1 cannot be uploaded according to the greedy chunk profile [...]. In this case, the first peer with bandwidth 2 becomes the scheduling bottleneck for adjacent chunks. For the two special heterogeneous cases considered in Section III-B, we are able to prove the existence of snowball streaming to achieve the minimum chunk dissemination delay for all chunks.

Theorem 3: For a P2P streaming system with [...] super peers and [...] free-riders, there exists a continuous streaming schedule such that all chunks in the stream will be disseminated to all peers within a delay of [...] time slots.

Proof: See proof of Theorem 3 in [16].

Corollary 4: If peers in a streaming system form an [...]-level hierarchy with [...] peers on level [...] with uploading capacity of [...], there exists a continuous streaming schedule such that chunks can be streamed to all peers with a delay of [...], where [...].

Proof: See proof of Corollary 4 in [16].

Examples of snowball streaming schedules in heterogeneous environments can be found in the technical report [16].

V. IMPACT OF NETWORK IMPAIRMENTS ON SINGLE-CHUNK DISSEMINATION

In real networks, the performance of P2P video streaming is subject to various network impairments. In this section, we evaluate the performance of the snowball chunk dissemination in network settings with long propagation delays and random bandwidth variations.

A. Impact of Propagation Delays

From the analysis in the previous sections, using smaller chunks in P2P video streaming leads to smaller chunk transmission delay and consequently smaller overall dissemination latency. On the other hand, using smaller chunks increases the signaling overhead and the scheduling complexity among peers. Meanwhile, as the chunk transmission delay gets smaller, the propagation delay between peers will play a more important role. We still use the transmission time of a chunk as the time unit. Now, suppose the propagation delay is $d$ time slots. The time between when a sender begins to upload the chunk and when the receiving peer gets the whole chunk is $1 + d$ time slots. For the multi-tree approach, if parallel uploading is employed, the chunk transmission delay from a peer to all its children increases from $m$ to $m + d$, and the delay performance is $(m + d)\log_m N$. If sequential uploading is employed, the worst-case delay is still $(m + d)\log_m N$, and the average delay is $\left(\frac{m+1}{2} + d\right)\log_m N$.

Again, denote by $P(k)$ the number of peers with the chunk at the beginning of time slot $k$. All the chunks received right before the beginning of time slot $k+1$ were sent out at the beginning of time slot $k-d$. Therefore, we have

$$P(k+1) = P(k) + P(k-d). \tag{4}$$
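To see how the propagation delay stretches the dissemination time, the sketch below iterates a lagged recursion of the form P(k+1) = P(k) + P(k-d), where P(k) is the number of peers holding the chunk at the beginning of slot k and d is the propagation delay in time slots. Both the recursion as written here and the initial condition (nobody holds the chunk before slot d+1, and one peer holds it at slot d+1) are assumptions made for this illustration, as is the hypothetical function name `finish_slot`.

```python
def finish_slot(num_peers, delay):
    """First slot at whose beginning all peers hold the chunk (assumed model).

    Assumptions: transmitting a chunk takes one slot, propagation adds
    `delay` slots, senders start a new upload every slot, and a central
    scheduler never sends the chunk twice to the same peer.
    """
    if num_peers < 1 or delay < 0:
        raise ValueError("num_peers must be >= 1 and delay >= 0")
    # holders[k] = peers holding the chunk at the beginning of slot k.
    # The server's upload in slot 0 arrives at the beginning of slot delay + 1.
    holders = [0] * (delay + 1) + [1]
    k = delay + 1
    while holders[k] < num_peers:
        # Arrivals at slot k + 1 were sent at slot k - delay by holders[k - delay] peers.
        holders.append(min(num_peers, holders[k] + holders[k - delay]))
        k += 1
    return k


if __name__ == "__main__":
    for d in (0, 1, 2):
        print(d, finish_slot(1024, d))
```

With zero delay the count doubles every slot and the sketch reduces to the snowball case; with a delay of one slot the holder counts follow the standard Fibonacci numbers 1, 1, 2, 3, 5, 8.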
In (4), $P(k)$, the number of peers with the chunk at the beginning of time slot $k$, is a Fibonacci series with time lag $d$, the propagation delay in time slots ($d = 1$ is the standard Fibonacci series). We can solve $P(k)$ by taking the Z-transform of (4)

[...]

where [...] is the largest root of the denominator. The finish time is approximately [...], which is [...] times the snowball delay without propagation delay.

TABLE II
MINIMUM DELAY ACHIEVED BY DIFFERENT STREAMING STRATEGIES WITH PROPAGATION DELAYS

We compare the delay performance of multi-tree-based strategies and the snowball strategy in Table II. The delay performance is measured in units of the average delay of the snowball approach when there is no propagation delay. For the parallel multi-tree strategy, we fix the node degree $m = 3$ that minimizes the average and worst-case delay when there is no propagation delay. For the sequential multi-tree strategy, at different propagation delays, the node degree is optimized for the average delay. The associated worst-case delay is also calculated. As the propagation delay increases, the delay performance of all three strategies degrades. For the parallel multi-tree with fixed degree, the delay increases fastest among the three. As the propagation delay increases, the optimal node degree for the sequential multi-tree also increases. This is because propagation delays provide an additional chance for pipelining chunk transmissions from a peer to its children. The sequential multi-tree strategy exploits this pipelining gain. It increases the node degree, and a peer will spend more time uploading the same chunk to all its children. This makes it closer to the uploading philosophy of snowball streaming: A peer should keep uploading the same chunk until all peers have it. As a result, the sequential multi-tree has better delay performance than the fixed-degree parallel multi-tree. However, its worst-case delay performance is much worse than that of the snowball approach. This is because leaf peers at the bottom level have large delay variations. Leaf peers do not contribute to the uploading of the chunk even if they receive the chunk early. On the contrary, in the snowball approach, a peer always contributes to the uploading as long as the chunk is still missing on some peers.

Analysis and simulation for single-chunk dissemination under random propagation delays can be found in our technical report [16].

[...] variations, the available bandwidth on a connection between two peers varies over time. Consequently, the transmission time of a chunk is not constant. In this section, we investigate the robustness of different streaming strategies against the randomness in chunk transmissions.

For clarity of presentation, we assume all transmission delays are independent and follow the same distribution. We introduce random variable [...] for sequential transmission time, with [...] and [...]; the $m$-parallel transmission time is [...], with [...] and [...].

For the chain-based approach, we have shown in [15] that the mean and variance of the worst-case delay are

[...]

For the average delay among all peers, we have

[...]

This suggests that, in a chain topology, the impact of the randomness of individual chunk transmission on the average and worst-case chunk delay performance of all peers is proportional to the number of peers.

For the parallel multi-tree approach, all peers at the bottom level will receive the chunk after [...] independent parallel chunk transmissions. Then, we have for the worst-case delay

[...]

For the sequential multi-tree approach, there is one peer at the bottom level that will receive the chunk after [...] independent sequential chunk transmissions. We have for the worst-case delay

[...]

Therefore, the mean and variance of the worst-case delay for multi-tree-based approaches are proportional to [...].

As detailed in [15], we also calculated the mean and variance of the average delay performance for multi-tree-based approaches using recursions. It was shown that for both parallel and sequential multi-tree, [...] and [...] are the same as in the deterministic case. The variances of the average delay performance for parallel and sequential uploading are

[...]

In both cases, the impact of the variability of individual transmissions on the average delay performance is independent of the number of peers. Also, the average delay variance will not diminish as $N$ grows. This is due to the variability at the first
few transmission steps will affect almost all peers.
B. Impact of Bandwidth Variations In the proposed snowball approach, a peer will keep up-
In previous sections, we assume that peers have constant up- loading a chunk until all peers have the chunk. Within one time
loading bandwidth and a chunk transmission completes in con- period, a peer that has more bandwidth will upload to more
stant time. In sequential transmission, a chunk can be trans- peers than a peer with less bandwidth. Over time, the workload
mitted from a peer to another peer in one time slot. In parallel of a peer is naturally adaptive to its bandwidth: uploads more if
transmission with degree , a peer can transmit a chunk simul- it has more bandwidth; uploads less if its bandwidth reduces.
taneously to children in time slots. Due to network traffic As for the recursive view in Fig. 2(b), due to the workload
self-adaptiveness, the number of peers in each subtree is no longer . What remains true is that the uploading in both subgroups will finish around the same time.
To further illustrate, let us assume the chunk transmission time between two peers follows an exponential distribution with mean 1. Denote by the time interval between the time instants when the th and the th peers receive the chunk. is the transmission time from peer 0 to peer 1; it is an exponential random variable with rate 1. For , due to the memoryless property of the exponential distribution, follows an exponential distribution with rate . Therefore, the worst-case delay is . We have
The expected chunk dissemination finish time is only % of the deterministic case. Due to the constant bounded delay variance, for large , the snowball algorithm has better delay performance in the exponential case than in the deterministic case. Similarly, we have shown in [15] that the average delay performance of the snowball algorithm is better in the exponential case than in the deterministic case
For chunk transmission time following a general distribution with mean 1, the interval between two chunk upload finish times is no longer exponential. However, the superposition of a large number of point processes converges to a Poisson process [3]. For large approximately follows an exponential distribution with rate . We can apply the previous exponential distribution analysis to study the behavior of a large system with generally distributed chunk transmission time. It is our conjecture that for a reasonably large , e.g., , one can expect the snowball approach to achieve better delay performance than the deterministic case. Analysis and simulations of snowball chunk dissemination when the chunk transmission time follows a general distribution are presented in [16].
VI. CONTINUOUS STREAMING IN DYNAMIC ENVIRONMENT
In the previous section, we studied the delay performance of the snowball single-chunk dissemination scheme under long propagation delays and network bandwidth variations. However, for continuous streaming, due to the randomness in network bandwidth and propagation delays, we can no longer predetermine fixed chunk streaming schedules among peers as in the static network case studied in Section IV. Instead, chunk uploading schedules have to be calculated dynamically to adapt to network bandwidth and delay variations. Now, we extend the static snowball streaming algorithm to the Dynamic Snowball (DSB) streaming algorithm. We will show through simulations that, with a small peer upload bandwidth overhead, the proposed DSB streaming algorithm can approach the minimum delay bounds in a dynamic network environment. The main purpose of this section is to demonstrate the potential of snowball-type streaming algorithms to achieve the minimum delay bound. The DSB algorithm is developed as a centralized streaming algorithm. We defer the distributed implementation of DSB algorithms to future work.
A. Dynamic Snowball (DSB) Streaming Algorithm
The philosophy of the DSB streaming algorithm follows the static snowball streaming algorithm. DSB aims at pushing out older chunks as quickly as possible to reduce the chunk dissemination delays as well as the number of active chunks in transition in the system. At the same time, DSB should also make sure that newer chunks get enough peer upload bandwidth access to quickly grow the usable upload bandwidth for them. In a static network environment, as studied in Section IV, these two seemingly conflicting objectives can be simultaneously achieved by employing a carefully calculated chunk upload schedule among peers. The challenge for DSB streaming in a dynamic network environment is that the chunk transmission completion time is not predictable. Therefore, there is no optimal static streaming schedule that can achieve the minimum delay bound for all chunks in a video stream. Instead, our DSB algorithm is a simple heuristic algorithm that mimics the static snowball streaming algorithm and dynamically resolves the conflicts between active chunks in continuous streaming.
The DSB streaming algorithm works in rounds. At each round, let be the set of active chunks that have been generated by the video source server but have not been uploaded to all peers. For any chunk , let be the number of peers with chunk and be the number of peers without chunk . Define the demand factor for chunk as , which is the expected workload for each peer with chunk to upload it to some peers without it. Then, for any peer , let be the set of chunks in its buffer. The total expected workload for peer can be calculated as . The DSB algorithm calculates the chunk uploading schedule among peers round by round. The DSB algorithm is outlined in Algorithm 1.
B. Performance Study of DSB
We implemented the centralized DSB streaming algorithm and conducted simulations of a P2P video streaming system with 4000 peers. For each simulation, a stream of 1000 continuous chunks is disseminated to all peers. We introduce random variations in peer upload bandwidth and propagation delays between peers. More specifically, for each chunk transmission between peers, the transmission time follows a truncated exponential distribution. The propagation delays between two peers follow a truncated normal distribution. We record how long it takes for each chunk to be received by each peer. Then, we calculate the average and worst-case streaming delay for each chunk and compare them to the single-chunk dissemination delays obtained using a single-chunk dissemination simulator² in a system with the same bandwidth and propagation delay settings.
² Details of the simulator and results are described in [16].
When there is no bandwidth variation and the propagation delays are negligible, the transmission time of a chunk is set to be eight simulation time-steps. The DSB streaming algorithm achieves the minimum single-chunk delay bound as presented
Fig. 4. Delay performance of the dynamic snowball streaming algorithm degrades when there are variations in propagation delays and peer upload bandwidth. The minimum delay bounds can be approached by slightly increasing peer upload bandwidth. (a) Random propagation delay; (b) random upload bandwidth; (c) random delay and random bandwidth.
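As a hedged illustration of the exponential-case argument in Section V-B, the sketch below assumes that once k+1 peers (counting peer 0) hold the chunk and all of them upload concurrently, the wait until the next peer receives it is exponential with rate k+1. Under that assumption the expected worst-case delay for N receiving peers is the harmonic sum 1 + 1/2 + ... + 1/N, and its variance stays below pi^2/6 for any N. The function names, the symbol N, and the deterministic baseline of ceil(log2(N+1)) doubling slots are illustrative assumptions, not notation recovered from the paper.

```python
import math


def expected_worst_case_delay(num_peers: int) -> float:
    """Mean finish time (in mean transmission times) when the gap before
    the (k+1)-th reception is exponential with rate k + 1 (assumption)."""
    return sum(1.0 / (k + 1) for k in range(num_peers))


def worst_case_delay_variance(num_peers: int) -> float:
    """Variance of the finish time: sum of variances of independent gaps."""
    return sum(1.0 / (k + 1) ** 2 for k in range(num_peers))


def deterministic_slots(num_peers: int) -> int:
    """Slots needed when the number of chunk holders doubles every slot,
    starting from peer 0 alone: ceil(log2(num_peers + 1))."""
    return num_peers.bit_length()


if __name__ == "__main__":
    for n in (100, 4000, 1_000_000):
        mean = expected_worst_case_delay(n)
        slots = deterministic_slots(n)
        print(f"N={n}: exponential mean={mean:.2f}, "
              f"deterministic slots={slots}, ratio={mean / slots:.3f}, "
              f"variance={worst_case_delay_variance(n):.3f}")
```

Because the harmonic sum grows like ln N while the deterministic baseline grows like log2 N, the ratio printed above drifts toward ln 2 as N grows, which is one way to read the claim that the exponential case finishes earlier on average.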
TABLE III
DELAY PERFORMANCE OF DSB UNDER RANDOM PROPAGATION DELAYS
AND UPLOAD BANDWIDTH
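The demand factor and per-peer workload described in Section VI-A can be sketched as follows. This is a minimal illustration under stated assumptions: the demand factor of a chunk is read as (number of peers without the chunk) / (number of peers with it), matching its description as the expected upload workload per holder, and a peer's total expected workload is taken as the sum of the demand factors of the active chunks in its buffer. The names `demand_factors` and `expected_workload` and the dictionary layout are hypothetical, and the sketch does not reproduce Algorithm 1 or its scheduling rule.

```python
from typing import Dict, Hashable, Set


def demand_factors(buffers: Dict[Hashable, Set[int]],
                   active_chunks: Set[int]) -> Dict[int, float]:
    """Per-chunk demand factor: peers lacking the chunk / peers holding it."""
    num_peers = len(buffers)
    factors: Dict[int, float] = {}
    for chunk in active_chunks:
        holders = sum(1 for buf in buffers.values() if chunk in buf)
        if holders == 0:
            # Assumption: no peer can upload this chunk yet (only the
            # source has it), so it adds no peer workload this round.
            continue
        factors[chunk] = (num_peers - holders) / holders
    return factors


def expected_workload(buffers: Dict[Hashable, Set[int]],
                      factors: Dict[int, float]) -> Dict[Hashable, float]:
    """Total expected workload of each peer over the chunks it buffers."""
    return {peer: sum(factors.get(chunk, 0.0) for chunk in buf)
            for peer, buf in buffers.items()}


if __name__ == "__main__":
    # Hypothetical round: four peers, two active chunks.
    buffers = {"p1": {1, 2}, "p2": {1}, "p3": set(), "p4": {1}}
    factors = demand_factors(buffers, {1, 2})
    print(factors)
    print(expected_workload(buffers, factors))
```

In this toy round the newer chunk 2 has a single holder and three peers waiting, so its holder carries most of the expected workload, while the older chunk 1 is nearly fully disseminated and contributes little.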
in Section III. Each of the 1000 chunks in the stream is delivered to 4000 peers after exactly 97 time-steps, and the average delay experienced by peers is 88.81 time-steps. This demonstrates that dynamic snowball streaming is delay-optimal in a static homogeneous environment.
Next, we conduct simulations to evaluate the performance of DSB in a dynamic network environment. For comparison, we also run the same simulations for the balanced multi-tree streaming with sequential upload and node degree of 5.³
³ The degree is optimized for the transmission and propagation delay ratio of 1 according to Table II.
We first introduce random propagation delays according to a truncated normal distribution with the mean equal to eight time-steps (the chunk transmission time) and the standard deviation equal to four time-steps. The lower and upper bounds for the random propagation delay are one and 16 time-steps, respectively. In Fig. 4(a), we compare the delay performance of DSB when the average peer upload bandwidth varies from 1 to 1.125 to 1.25. We use as a reference point the single-chunk delays obtained from 100 simulation runs of a single-chunk dissemination between 4000 peers with the same random propagation delay setting using the simulator described in [16]. Due to the propagation delay, the average and worst-case single-chunk delays are 127.3 and 151.6, respectively, which are larger than 88.81 and 97 for the zero propagation delay case. Fig. 4(a) plots the average streaming delay for each chunk in DSB. The system resource index in the figure is defined as the ratio between the average peer upload bandwidth and the streaming rate [6], [12]. The single-chunk delays can be approached by the DSB algorithm if peers have upload bandwidth slightly higher than the streaming rate. The statistics of average and worst-case delay performance of DSB and Multi-Tree are compared in Table III. The delay performance of Multi-Tree is worse than that of DSB. However, its delay variance is much smaller than that of DSB. This suggests that Multi-Tree can adapt well to propagation delay variations.
Now, we repeat the previous simulation with zero propagation delay and random peer upload bandwidth. Each chunk transmission time now follows a truncated exponential distribution with the mean equal to eight time-steps and lower and upper limits of 1 and 24, respectively. Again, we use the single-chunk dissemination simulation as the reference point. As predicted by the analysis in Section V, with random chunk transmission time, the average and worst-case single-chunk delays (66.8 and 91.3, respectively) are smaller than those for the zero propagation delay case (88.81 and 97). The streaming delay performance of DSB is plotted in Fig. 4(b). When the average peer upload bandwidth is equal to the streaming rate, due to conflicts between chunks, the streaming delay performance is much worse than the corresponding single-chunk delay performance. By increasing peer upload bandwidth by 12.5%, the delay is reduced by 25%. If we further increase the average peer upload bandwidth to 1.25 times the streaming rate [corresponding to the curve labeled with resource in Fig. 4(b)], the delay performance is getting closer to the single-chunk delay
bound. As seen in Table III, the delay performance of Multi-Tree is much worse than that of DSB. This is because the delivery paths of chunks are predetermined in Multi-Tree. If one chunk transmission gets delayed on one link, all subsequent chunks have to be queued. This can happen on any link on the path and results in a “chain effect.” In contrast, DSB can adaptively find peers with available bandwidth to quickly disseminate chunks. Its delay performance is much better than that of Multi-Tree.
Next, we introduce both random propagation delays and random peer upload bandwidth by combining the random delay and bandwidth variations introduced in the previous two sets of simulations. In Fig. 4(c), we compare the delay performance of DSB when the average peer upload bandwidth varies from 1 to 1.125 to 1.25. Again, the minimum single-chunk delay bounds can be approached by the DSB algorithm if peers have upload bandwidth slightly higher than the streaming rate. In Table III, due to the bandwidth variations, the delay performance of Multi-Tree is still much worse than that of DSB. To study the impact of chunk size on the delay performance improvement of DSB, we fix the average propagation delay at eight time slots and vary the size of chunks such that the chunk transmission time ranges from two to 16 time slots. As indicated in Table IV, as chunk size decreases, the delay performance of both DSB and Multi-Tree degrades. The performance gap between them also decreases. When there is random bandwidth variation, DSB still outperforms Multi-Tree by around 25% even with a small chunk size of 2.
TABLE IV
DELAY PERFORMANCE IMPROVEMENT OF DSB AT DIFFERENT CHUNK SIZES
Through simulations, we demonstrated that with a little bit of extra peer uploading bandwidth, our dynamic snowball streaming algorithm can approach the minimum delay bounds in the face of random variations in peer uploading bandwidth and propagation delays on peering connections.
VII. CONCLUSION AND FUTURE WORK
In this paper, we analytically study the delay performance of P2P live video streaming systems. We derive various delay bounds that can serve as delay performance benchmarks for proposed/deployed P2P streaming systems. Through our analysis, we quantify the impact of the bandwidth distribution among peers on their delay performance. Insights brought forth by our study can be used to guide the design of new P2P streaming systems with shorter startup delays and playback lags. Static snowball streaming algorithms are proposed to achieve the minimum delay bounds in static homogeneous and heterogeneous P2P video systems. A dynamic snowball streaming algorithm is also developed to approach the minimum delay bounds with a small peer upload bandwidth overhead. Through analysis and simulation, we show that snowball-type streaming algorithms are robust to network impairments, such as long propagation delays and random bandwidth variations.
The next step is to develop a distributed implementation of the proposed snowball streaming algorithms in mesh-based P2P video systems. The proposed DSB algorithm can be implemented in a clustered P2P streaming framework, such as HCPS [13]. Within a cluster, peers can employ the DSB chunk scheduling to achieve the minimum delay. For general mesh-based systems, using insights obtained in this study, we will design a distributed chunk scheduling algorithm that mimics the spirit of the snowball schedule. It will exploit the peer bandwidth heterogeneity for shorter delay. It will also schedule uploads of multiple active chunks to achieve a delay performance close to the ideal contention-free delay bound. The snowball algorithm minimizes the delay by pushing the oldest chunk first. As chunk delays decrease, the number of active chunks in the system decreases. This increases the chance of a content bottleneck. Rarest-first chunk scheduling has proven efficient in eliminating content bottlenecks in P2P file sharing [24]. We will develop chunk scheduling algorithms that efficiently combine oldest-first and rarest-first scheduling rules to achieve a good balance between delay performance and peer bandwidth utilization. We will test their performance in a real network environment and compare it to the theoretical bounds predicted by our analysis here. Another direction for future work is to extend the delay performance analysis to take into consideration other factors, such as peer churn, geographic locality of peers, and correlations among individual chunk transmissions. More broadly, we are interested in extending our design and analysis of snowball-type algorithms to other forms of P2P systems with stringent delay requirements, such as Content Delivery Networks and P2P gaming systems [1].
REFERENCES
[1] A. Bharambe, J. R. Douceur, J. R. Lorch, T. Moscibroda, J. Pang, S. Seshan, and X. Zhuang, “Donnybrook: Enabling large-scale, high-speed, peer-to-peer games,” in Proc. ACM SIGCOMM, 2008, pp. 389–400.
[2] G. Bianchi, N. B. Melazzi, L. Bracciale, F. L. Piccolo, and S. Salsano, “Fundamental delay bounds in peer-to-peer chunk-based real-time streaming systems,” Tech. Rep., Feb. 2009 [Online]. Available: http://arxiv.org/PS_cache/arxiv/pdf/0902/0902.1394v1.pdf
[3] J. Cao and K. Ramanan, “A Poisson limit for buffer overflow probabilities,” in Proc. IEEE INFOCOM, 2002, vol. 2, pp. 994–1003.
[4] M. Castro, P. Druschel, A.-M. Kermarrec, A. Nandi, A. Rowstron, and A. Singh, “SplitStream: High-bandwidth multicast in cooperative environments,” in Proc. ACM SOSP, 2003, pp. 298–313.
[5] M. Cha, H. Kwak, P. Rodriguez, Y.-Y. Ahn, and S. Moon, “I tube, you tube, everybody tubes: Analyzing the world’s largest user generated content video system,” in Proc. IMC, 2007, pp. 1–14.
[6] Y. Chu, S. Rao, S. Seshan, and H. Zhang, “Enabling conferencing applications on the internet using an overlay multicast architecture,” in Proc. ACM SIGCOMM, 2001, pp. 55–67.
[7] Y.-H. G. Chu, S. Rao, and H. Zhang, “A case for end system multicast,” in Proc. ACM SIGMETRICS, 2000, pp. 1–12.
[8] X. Hei, C. Liang, J. Liang, Y. Liu, and K. W. Ross, “A measurement study of a large-scale P2P IPTV system,” IEEE Trans. Multimedia, vol. 9, no. 8, pp. 1672–1687, Dec. 2007.
[9] X. Hei, Y. Liu, and K. Ross, “Inferring network-wide quality in P2P live streaming systems,” IEEE J. Sel. Areas Commun., vol. 25, no. 9, pp. 1640–1654, Dec. 2007, Special Issue on Advances in P2P Streaming.
[10] J. Jannotti, D. K. Gifford, K. L. Johnson, M. F. Kaashoek, and J. W. O’Toole, Jr., “Overcast: Reliable multicasting with an overlay network,” in Proc. OSDI, 2000, pp. 197–212.
[11] D. Kostic, A. Rodriguez, J. Albrecht, and A. Vahdat, “Bullet: High bandwidth data dissemination using an overlay mesh,” in Proc. ACM SOSP, 2003, pp. 282–297.
[12] R. Kumar, Y. Liu, and K. Ross, “Stochastic fluid theory for P2P streaming systems,” in Proc. IEEE INFOCOM, 2007, pp. 919–927.
[13] C. Liang, Y. Guo, and Y. Liu, “Hierarchically clustered P2P streaming system,” in Proc. IEEE GLOBECOM, 2007, pp. 236–241.
[14] S. Liu, R. Zhang-Shen, W. Jiang, J. Rexford, and M. Chiang, “Performance bounds for peer-assisted live streaming,” in Proc. ACM SIGMETRICS, 2008, pp. 313–324.
[15] Y. Liu, “On the minimum delay peer-to-peer video streaming: How real-time can it be?,” in Proc. ACM Multimedia, 2007 [Online]. Available: http://eeweb.poly.edu/faculty/yongliu/docs/mm07.pdf
[16] Y. Liu, “Delay bounds of peer-to-peer video streaming,” Polytechnic Inst. NYU, Tech. Rep., Jun. 2009 [Online]. Available: http://eeweb.poly.edu/faculty/yongliu/docs/mm2009.pdf
[17] N. Magharei and R. Rejaie, “Prime: Peer-to-peer receiver-driven mesh-based streaming,” in Proc. IEEE INFOCOM, 2007, pp. 1415–1423.
[18] V. Pai, K. Kumar, K. Tamilmani, V. Sambamurthy, and A. Mohr, “Chainsaw: Eliminating trees from overlay multicast,” in Proc. 4th Int. Workshop Peer-to-Peer Syst., 2005, pp. 127–140.
[19] “PPLive homepage,” PPLIVE [Online]. Available: http://www.pplive.com
[20] “PPStream homepage,” PPSTREAM [Online]. Available: http://www.ppstream.com
[21] D. Ren, Y. Li, and S. Chan, “On reducing mesh delay for peer-to-peer live streaming,” in Proc. IEEE INFOCOM, 2008, pp. 1058–1066.
[22] T. Small, B. Liang, and B. Li, “Scaling laws and tradeoffs in peer-to-peer live multimedia streaming,” in Proc. 14th Annu. ACM Int. Conf. Multimedia, 2006, pp. 539–548.
[23] “SopCast homepage,” SOPCAST [Online]. Available: http://www.sopcast.org
[24] “BitTorrent Web site,” [Online]. Available: http://www.bittorrent.com/
[25] V. Venkataraman, K. Yoshida, and J. P. Francis, “Chunkyspread: Heterogeneous unstructured tree-based peer-to-peer multicast,” in Proc. 14th IEEE ICNP, 2006, pp. 2–11.
[26] M. Zhang, J.-G. Luo, L. Zhao, and S.-Q. Yang, “A peer-to-peer network for live media streaming using a push-pull approach,” in Proc. ACM Multimedia, 2005, pp. 287–290.
[27] M. Zhang, L. Zhao, J. Tang, and L. Y. S. Yang, “A peer-to-peer network for live media streaming: Using a push–pull approach,” in Proc. ACM Multimedia, 2005, pp. 287–290.
[28] X. Zhang, J. Liu, B. Li, and T.-S. P. Yum, “CoolStreaming/DONet: A data-driven overlay network for live media streaming,” in Proc. IEEE INFOCOM, 2005, vol. 3, pp. 2102–2111.
Yong Liu (M’02) received the Bachelor’s and Master’s degrees in automatic control from the University of Science and Technology of China, Hefei, China, in 1994 and July 1997, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Massachusetts, Amherst, in May 2002.
He has been an Assistant Professor with the Electrical and Computer Engineering Department, Polytechnic Institute of NYU, Brooklyn, NY, since March 2005. His general research interests lie in modeling, design, and analysis of communication networks. His current research directions include robust network routing, peer-to-peer IPTV systems, overlay networks, and network measurement.
Dr. Liu is a Member of the Association for Computing Machinery (ACM). He is the winner of the IEEE INFOCOM Best Paper Award in 2009 and the IEEE Communications Society Best Paper Award in Multimedia Communications in 2008.