1. Computer Networks 11/11/2009
Scalable Internet Servers and Load Balancing
Kai Shen

Internet Online Applications
Internet online applications
  Applications accessible to online users through the Internet.
  Examples
    Online keyword search engine: Google.
    Web email: Gmail.
    News: CNN, NBC news.
    Web directory: Yahoo!, MSN.
Scalability requirements
  Many simultaneous user accesses; large amount of hosted data, …
Internet servers
  Computer systems that host these online applications
11/11/2009 CSC 257/457 - Fall 2009 1 11/11/2009 CSC 257/457 - Fall 2009 2
Internet Servers are at the Application Layer
  Normally on the end hosts, involving no routers
  Run on top of the transport-layer protocols TCP/UDP
  [Figure: Google, Yahoo!, and CNN servers connected to the Internet]

Search Engine as An Example: Step 1 – Crawling
Crawling – get all these Web pages out there:
  First retrieve some root pages;
  Parse their content and follow hyperlinks to retrieve more pages;
  Depth-first search or breadth-first search? Remove duplicates.
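
The crawling steps (start from root pages, follow hyperlinks, remove duplicates) can be sketched as a breadth-first traversal. This is only an illustration: TOY_WEB, toy_fetch_links, and crawl are hypothetical names, and the "web" is an in-memory dictionary rather than real HTTP fetching and HTML parsing.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the hyperlinks found on that page.
TOY_WEB = {
    "root": ["a", "b"],
    "a": ["b", "c"],
    "b": ["root", "d"],
    "c": [],
    "d": ["a"],
}

def toy_fetch_links(url):
    """Stand-in for 'retrieve the page and parse its hyperlinks'."""
    return TOY_WEB.get(url, [])

def crawl(roots, fetch_links):
    """Breadth-first crawl that never visits the same URL twice."""
    queue = deque(dict.fromkeys(roots))  # de-duplicate roots, keep their order
    seen = set(queue)
    order = []
    while queue:
        url = queue.popleft()            # FIFO order gives breadth-first search
        order.append(url)
        for link in fetch_links(url):
            if link not in seen:         # duplicate removal
                seen.add(link)
                queue.append(link)
    return order

crawl_order = crawl(["root"], toy_fetch_links)
print(crawl_order)
```

Replacing queue.popleft() with queue.pop() turns the same loop into a depth-first crawl; the seen set is what removes duplicates in both cases.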
Performance Analysis for Crawling
What are the resources involved?
  CPU processing for TCP/HTTP protocol handling and the parsing of page content
  writing to disk storage
  network bandwidth to remote web sites
Assume average page size 10KB
  raw processing power of a single CPU
    1000 requests/sec
  I/O to a single disk
    100 seeks/sec, up to 100 requests/sec
  network bandwidth from/to the Internet
    T1 link (1.5Mbit/s): 12 requests/sec
    T3 link (45Mbit/s): 360 requests/sec

Search Engine as An Example: Step 2 – Indexing
Indexing
  Crawled raw web pages are not easy to search.
  We index them to formats that are easy to search.
  As part of indexing, we need to give each page an ID using a hash function.
  [Figure: index entries mapping words to page IDs]
    Computer: Page #123, Page #357, ……
    Networks: Page #124, Page #468, ……
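
The crawling throughput figures can be checked with a little arithmetic. The sketch below assumes 1 KB = 1000 bytes; the function and variable names are hypothetical. Dividing link speed by a 10 KB page gives about 18.75 pages/sec for T1 and 562.5 for T3. The quoted 12 and 360 requests/sec are lower, and both correspond to about 125 kbit moved per request. That gap may reflect protocol overhead, but the slide does not say, so treat this reading as an assumption.

```python
PAGE_BITS = 10 * 1000 * 8  # 10 KB average page, taking 1 KB as 1000 bytes (assumption)

def raw_pages_per_sec(link_bits_per_sec, page_bits=PAGE_BITS):
    """Upper bound on pages/sec if the link carried nothing but page bytes."""
    return link_bits_per_sec / page_bits

def bottleneck(rates):
    """Return (resource, requests/sec) for the slowest resource."""
    name = min(rates, key=rates.get)
    return name, rates[name]

raw_t1 = raw_pages_per_sec(1.5e6)   # T1 link, 1.5 Mbit/s
raw_t3 = raw_pages_per_sec(45e6)    # T3 link, 45 Mbit/s

# Per-resource rates exactly as quoted on the slide.
with_t1 = {"cpu": 1000, "disk": 100, "network": 12}
with_t3 = {"cpu": 1000, "disk": 100, "network": 360}

print(raw_t1, raw_t3)
print(bottleneck(with_t1), bottleneck(with_t3))
```

With the quoted numbers, the T1 link is the bottleneck at 12 requests/sec, while with a T3 link the single disk becomes the limit at 100 requests/sec.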
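
The indexing step (give each page an ID with a hash function, then map words to page IDs) might look like the following minimal sketch. The slides do not say which hash function is used or what exactly is hashed; hashing the URL with SHA-1 and keeping 64 bits is an assumption made here for illustration, and page_id and build_index are hypothetical names.

```python
import hashlib
from collections import defaultdict

def page_id(url):
    """Derive a stable numeric page ID by hashing the URL (assumed scheme)."""
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")  # keep the first 64 bits

def build_index(pages):
    """Build a word -> set of page IDs mapping from {url: page text}."""
    index = defaultdict(set)
    for url, text in pages.items():
        pid = page_id(url)
        for word in text.lower().split():
            index[word].add(pid)
    return index

# Hypothetical crawled pages.
toy_pages = {
    "http://example.org/a": "Computer networks",
    "http://example.org/b": "computer architecture",
}
toy_index = build_index(toy_pages)
print(sorted(toy_index))
```

A keyword lookup then becomes a dictionary access, for example toy_index["computer"], instead of a scan over every raw page.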
Search Engine as An Example: Step 3 – Online Search
  [Figure: Internet, firewall/router, local-area network, web server/query handler, index server, page server]

Partitioning and Replication
  [Figure: Internet, firewall/router, local-area network, web servers/query handlers, index servers (partition 1), index servers (partition 2), page servers]
  Scalability, reliability
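
One way to picture how a query handler could use partitioned and replicated index servers is sketched below. It assumes the index is split by page across partitions and that the replicas of a partition hold identical data; the slides do not specify either. IndexServer and search are hypothetical names, and the page numbers are borrowed from the indexing illustration.

```python
import random

class IndexServer:
    """One replica holding the postings of a single index partition."""
    def __init__(self, name, postings):
        self.name = name
        self.postings = postings  # word -> set of page IDs
        self.alive = True

    def lookup(self, word):
        return self.postings.get(word, set())

def search(word, partitions, rng=random):
    """Ask one live replica of every partition and merge the answers."""
    results = set()
    for replicas in partitions:
        live = [server for server in replicas if server.alive]
        if not live:
            continue  # whole partition down: return a partial result
        results |= rng.choice(live).lookup(word)
    return results

part1 = {"computer": {123}, "networks": {124}}
part2 = {"computer": {357}, "networks": {468}}
partitions = [
    [IndexServer("p1-a", part1), IndexServer("p1-b", part1)],
    [IndexServer("p2-a", part2), IndexServer("p2-b", part2)],
]
partitions[0][0].alive = False  # one replica fails; its twin still answers
print(search("computer", partitions))
```

In this sketch, partitioning lets each server hold and search only part of the index (scalability), while replication keeps a partition reachable when one replica fails (reliability).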
Load Balancing over Internet Servers
  Popular sites like Google or CNN receive tens or hundreds of millions of hits per day.
  A large number of replicated servers are used at these sites.
  Key question: how to balance client requests over these servers?

Load Balancing on Internet Servers
Technique 1 - DNS Rotation
  [Figure: clients ask the DNS server for CNN.com "IP address of CNN.com?" and receive different answers (128.111.1.2, 128.111.1.3); behind the firewall/router, the web servers for CNN.com are 128.111.1.2, 128.111.1.3, and 128.111.1.4]
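
DNS rotation can be illustrated with a toy name server that hands out the replica addresses in turn. This is a sketch of the idea only, not how any real DNS server is implemented; RotatingDNS is a hypothetical class, and the addresses are the ones shown in the figure.

```python
from itertools import cycle

class RotatingDNS:
    """Toy authoritative server that answers each query with the next replica."""
    def __init__(self, records):
        # records: hostname -> list of replica IP addresses
        self._rotation = {name: cycle(addrs) for name, addrs in records.items()}

    def resolve(self, name):
        return next(self._rotation[name])

dns = RotatingDNS({"CNN.com": ["128.111.1.2", "128.111.1.3", "128.111.1.4"]})
answers = [dns.resolve("CNN.com") for _ in range(4)]
print(answers)
```

Successive queries are spread evenly over the three web servers, and the fourth query wraps around to the first address again.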
Discussions on DNS Rotation
Advantages
  Requires almost no change to the existing Internet architecture
Problems
  DNS Caching
  Rigid load balancing policy
    can’t balance based on runtime load changes
    slow or no adjustment in response to failures

Load Balancing on Internet Servers
Technique 2 – Cooperative Offloading
  [Figure: Internet, firewall/router, and the web servers for CNN.com at 128.111.1.2, 128.111.1.3, and 128.111.1.4]
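
A rough sketch of cooperative offloading follows. The slides do not spell out the offloading rule, so the rule used here (serve locally while below a capacity limit, otherwise pass the request to the least-loaded peer) is an assumption, as are the WebServer class and its capacity parameter.

```python
class WebServer:
    """Toy server that offloads requests to a peer once it is saturated."""
    def __init__(self, addr, capacity):
        self.addr = addr
        self.capacity = capacity  # hypothetical limit on concurrent requests
        self.active = 0           # requests currently being served
        self.peers = []

    def handle(self, request):
        """Return the address of the server that ends up serving the request."""
        if self.active < self.capacity or not self.peers:
            self.active += 1
            return self.addr
        target = min(self.peers, key=lambda peer: peer.active)
        if target.active >= self.active:
            self.active += 1      # no peer is better off; keep the request
            return self.addr
        target.active += 1        # offload to the least-loaded peer
        return target.addr

s2 = WebServer("128.111.1.2", capacity=2)
s3 = WebServer("128.111.1.3", capacity=2)
s4 = WebServer("128.111.1.4", capacity=2)
s2.peers, s3.peers, s4.peers = [s3, s4], [s2, s4], [s2, s3]

served_by = [s2.handle("GET /") for _ in range(4)]
print(served_by)
```

Because the decision is made on the servers at request time, it can react to runtime load, at the cost of server-side software and an extra hop for offloaded requests.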
Discussions on Cooperative Offloading
  Can be combined with the DNS rotation.
Advantages:
  More flexible policy is possible
  More responsive to runtime workload and server failures (to a certain degree)
Problems:
  Need software changes on servers
  Longer delay
Must all packets in a TCP connection be offloaded to one server?

Cooperative Offloading with TCP Handoff [Pai et al. ASPLOS1998]
  What does 1.3 do?
  What does 1.4 do?
  [Figure: Internet, firewall/router, and the web servers for CNN.com at 128.111.1.2, 128.111.1.3, and 128.111.1.4; packets are labeled with the addresses clt IP, 1.3, and 1.4]
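
One possible reading of the TCP handoff figure is sketched below as a toy simulation. It assumes that 1.3 accepts the client connection and hands it to 1.4, that 1.3 keeps forwarding the client's packets for that connection, and that 1.4 replies directly to the client using 1.3's address as the source. This is only an illustration of the general idea, not the kernel mechanism of Pai et al.; Packet, FrontEnd, and BackEnd are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: str

class BackEnd:
    """Server that takes over a connection (1.4 in the figure)."""
    def __init__(self, addr):
        self.addr = addr
        self.handed_off = {}  # client address -> address the client connected to

    def accept_handoff(self, client, original_dst):
        self.handed_off[client] = original_dst

    def receive(self, pkt):
        # Reply straight to the client, but keep the original server's address
        # as the source so the client still sees one unchanged connection.
        visible_src = self.handed_off[pkt.src]
        return Packet(visible_src, pkt.src, "response to " + pkt.payload)

class FrontEnd:
    """Server the client first contacted (1.3 in the figure)."""
    def __init__(self, addr):
        self.addr = addr
        self.forward_to = {}  # client address -> back end handling it

    def hand_off(self, client, backend):
        self.forward_to[client] = backend
        backend.accept_handoff(client, self.addr)

    def receive(self, pkt):
        # Every packet of a handed-off connection goes to the same back end.
        return self.forward_to[pkt.src].receive(pkt)

front = FrontEnd("128.111.1.3")
back = BackEnd("128.111.1.4")
front.hand_off("clt IP", back)
reply = front.receive(Packet("clt IP", "128.111.1.3", "GET /"))
print(reply)
```

In this sketch the forwarding table on the front end is what keeps all packets of one TCP connection on a single server.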
Cooperative Offloading vs. TCP Handoff
  Software changes on the servers
  Delays

Load Balancing on Internet Servers
Technique 3 – Load Balancing Router
  [Figure: Internet, firewall/LB router at 128.111.1.1, and the web servers for CNN.com at 128.111.1.2, 128.111.1.3, and 128.111.1.4; packets are labeled with the address pairs clt IP / 1.1 and clt IP / 1.2]
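
The address pairs in the load balancing router figure suggest that clients talk to 128.111.1.1 while the router rewrites addresses toward a chosen server. A minimal sketch under that assumption is shown below; LBRouter is a hypothetical class, packets are reduced to tuples, and servers are picked by simple rotation.

```python
from itertools import cycle

class LBRouter:
    """Toy load balancing router that rewrites packet addresses."""
    def __init__(self, virtual_ip, servers):
        self.virtual_ip = virtual_ip
        self._rotation = cycle(servers)
        self.connections = {}  # (client ip, client port) -> chosen server

    def inbound(self, src, sport, dst):
        """Rewrite a client packet's destination from the virtual IP to a server."""
        if dst != self.virtual_ip:
            raise ValueError("packet is not addressed to the virtual IP")
        key = (src, sport)
        if key not in self.connections:
            # First packet of a connection: pick a server and remember it,
            # so later packets of the same connection reach the same server.
            self.connections[key] = next(self._rotation)
        return (src, sport, self.connections[key])

    def outbound(self, server, dst, dport):
        """Rewrite a server reply so that it appears to come from the virtual IP."""
        return (self.virtual_ip, dst, dport)

router = LBRouter("128.111.1.1", ["128.111.1.2", "128.111.1.3", "128.111.1.4"])
first = router.inbound("clt IP", 5000, "128.111.1.1")
again = router.inbound("clt IP", 5000, "128.111.1.1")
other = router.inbound("clt IP", 5001, "128.111.1.1")
back = router.outbound("128.111.1.2", "clt IP", 5000)
print(first, again, other, back)
```

Neither the client nor the web servers need to change in this sketch; all of the balancing state lives in the router's connection table.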
More About Load Balancing Router
How deep do we look into the network protocol stack?
  Network layer (IP)?
  Transport layer (TCP/UDP)?
  Application layer?
Load balancing policies in LB routers (Goal: transparency, plug-and-play)
  Simple rotation
  Least number of active requests
  Shortest response time

Summary
Scalable Internet servers
  partitioning
  replication
Load balancing for Internet servers
  DNS rotation
  cooperative offloading (w. TCP handoff)
  Load balancing router
Changes required on the components:
  DNS server??
  Web server??
  client??
  router??
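
The three load balancing policies listed for LB routers (simple rotation, least number of active requests, shortest response time) can be compared with a small sketch. The Server class and its fields are hypothetical, and the load and response-time values are made up purely to show how each policy selects a server.

```python
from itertools import cycle

class Server:
    """Per-server state an LB router might track (hypothetical fields)."""
    def __init__(self, addr, active=0, response_ms=0.0):
        self.addr = addr
        self.active = active            # requests currently in progress
        self.response_ms = response_ms  # most recently measured response time

def make_simple_rotation(servers):
    """Return a picker that walks through the servers in turn."""
    rotation = cycle(servers)
    return lambda: next(rotation)

def least_active(servers):
    """Pick the server with the fewest active requests."""
    return min(servers, key=lambda s: s.active)

def shortest_response(servers):
    """Pick the server with the shortest measured response time."""
    return min(servers, key=lambda s: s.response_ms)

pool = [
    Server("128.111.1.2", active=5, response_ms=40.0),
    Server("128.111.1.3", active=2, response_ms=90.0),
    Server("128.111.1.4", active=7, response_ms=15.0),
]
pick = make_simple_rotation(pool)
rotation_choices = [pick().addr for _ in range(4)]
print(rotation_choices, least_active(pool).addr, shortest_response(pool).addr)
```

Simple rotation needs no per-server measurements, while the other two policies depend on the router observing active requests or response times at runtime.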