Evaluating the Impact of Content Delivery Networks
on N-tier E-commerce Environments
Witold Rzepnicki
March 27th, 2007
2
Short Bio
• Moved to U.S. from Poland circa 1995
• Completed undergraduate studies in Computer
Information Systems at Missouri State University
• I have worked for Hallmark Cards since 1998 as a Java EE
developer, project manager/lead and a technology
architect
• PMP and SCEA certifications….811, 816 and 818 came in
handy
• Hobbies: travel, foreign languages, tennis (outdoors and
on Nintendo Wii)
3
Acknowledgments
• Dr. Hossein Saiedian
• Dr. Arvin Agah
• Dr. Prasad Kulkarni
• My wife, Masako
4
Outline
• Problem
• Significance
• Methodology/Solution
• Results/Evaluation
• Conclusion
• Further Research
5
Outline
Problem
• Significance
• Methodology/Solution
• Results/Evaluation
• Conclusion
• Further Research
6
The Non-technical Introduction
What Matters to Consumers?
• Are you happy with the web sites you visit?
• Consumers cite website performance and
responsiveness as key challenges for E-commerce
environments (Nielsen Research)
• Role of content and content delivery
Satisfaction Level      2005   2004   2003   2002   2002 vs 2005 change
Very Satisfied          40%    37%    40%    37%     3%
Somewhat Satisfied      24%    24%    23%    22%     2%
Neutral                 31%    32%    30%    33%    -3%
Somewhat Dissatisfied    4%     5%     5%     5%    -1%
Very Dissatisfied        2%     3%     3%     3%    -1%
7
Typical Hourly Downtime Costs
• Brokerage operations $6,450,000
• Credit card authorization $2,600,000
• eBay $225,000
• Amazon.com $180,000
• Package shipping services $150,000
• Home shopping channel $113,000
• Catalog sales center $90,000
• Airline reservation center $89,000
Source: Pp. 185-188 of the Proceedings of LISA '02: Sixteenth Systems Administration Conference,
(Berkeley, CA: USENIX Association, 2002).
8
Subject E-commerce Environment
• Architecture
– Infrastructure
– Logical
• Workload characteristics
– Seasonal spikes
– Content size increase
• Current content delivery model
– Server-farm based content
delivery
– Not geographically dispersed
Content Type    1998 Size   2004 Size   2007 Size
Documents        0 KB       28 KB        34 KB
Images          36 KB       42 KB        96 KB
Scripts          0 KB        2 KB        20 KB
StyleSheets      0 KB        0 KB        62 KB
Total           36 KB       72 KB       181 KB
9
Architectural Views - Infrastructure
10
Content Delivery Bottlenecks
Even with infinite server-side scalability we
would still encounter WAN bottlenecks
11
Page View Workload
12
Content Publishing
13
Problem Statement
• Insufficient performance and scalability during peaks
• Tactics to-date do not fully address the content
delivery layer
– Last-mile, first-mile, peering and backbone problems
– Upper limit to bandwidth scalability for content
delivery (single hosting site)
– Cost factors
• Symptom: performance degrades as Web servers get
overloaded with requests
14
Outline
• Problem
Significance
• Methodology/Solution
• Results/Evaluation
• Conclusion
• Further research
15
Content Delivery Networks
• CDNs offload some or all of the content
delivery from the origin Web servers.
• A CDN is a large set of replica servers, called
edge servers, that deliver content on behalf of
the origin server.
• CDNs claim to address
– Client-perceived latency (e.g., in Web browsers)
– Capacity management of the servers
– Static content caching requirements
16
Research Focus
• Quality attribute evaluation of the CDN claim
– Performance
– Scalability
– Availability
– Maintainability
• Consumer and server-side measurements
• Infrastructure footprint impact
– Potential cost savings can be significant
– One hosting center versus two
– Resilience of a geographically dispersed network
• Research to-date focuses on network impacts alone
17
Performance and Scalability Issues
[Figure: consumer experience and site visits]
18
Performance and Scalability Issues
[Figure: resource utilization]
19
Tactics Implemented To-date
• Horizontal and vertical scalability strategies
implemented to-date
– Clustering
– Origin server caching – content and application
– Scaling individual nodes’ CPU and memory capacity
– Application and database tuning
– Additional bandwidth and switching improvements
– Considered introducing another hosting site to
further improve bandwidth
20
Outline
• Problem
• Significance
Methodology/Solution
• Results/Evaluation
• Conclusion
• Further research
21
Why a CDN?
• Server-side caching approaches not sufficient
• Fewer “hops” and more efficient routing
• Ease of implementation versus establishing a
set of secondary hosting facilities
• CDNs (e.g., Akamai) improve web performance
by
– Performing extensive network and server
measurements
– Using DNS redirection to send clients to the most efficient servers
22
DNS Fundamentals
• Client-server architecture
• TTL and caching
• Name resolution steps
23
Content Delivery Network
• Browser requests are redirected to the most suitable edge
server
• Browser gets the web site’s DNS CNAME entry, which points to a
domain name in the CDN’s network
• A hierarchy of the CDN’s DNS servers directs the client to a
“nearby” server
• The choice is based on current network conditions as measured
by the CDN
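The redirection steps above can be sketched as a toy resolver. This is only an illustration under stated assumptions: every host name, address, region, and latency figure below is hypothetical, and a real CDN's server-mapping logic is proprietary and far more involved.

```python
# Toy model of CDN request routing via DNS. All names and numbers are hypothetical.

# The web site's own DNS answers with a CNAME that points into the CDN's domain.
CNAME_RECORDS = {"www.shop.example": "shop.cdn.example"}

# Edge servers the CDN could hand out for that name, keyed by region.
EDGE_SERVERS = {
    "shop.cdn.example": {
        "chicago": "192.0.2.10",
        "miami": "192.0.2.20",
        "los-angeles": "192.0.2.30",
    }
}


def resolve(hostname, measured_latency_ms):
    """Follow the CNAME, then pick the edge with the lowest measured latency."""
    cdn_name = CNAME_RECORDS.get(hostname, hostname)
    edges = EDGE_SERVERS[cdn_name]
    best_region = min(edges, key=lambda region: measured_latency_ms[region])
    return best_region, edges[best_region]


# Latencies toward one client, as the CDN might have measured them.
latency = {"chicago": 18.0, "miami": 42.0, "los-angeles": 61.0}
region, address = resolve("www.shop.example", latency)
print(region, address)
```

Choosing the minimum-latency edge stands in for "most suitable"; a real CDN would also weigh server load and content availability.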
24
CDN Selection and Implementation
• Redirect method selection: URL rewrite vs. URL redirect,
partial-site vs. full-site
• DNS changes
– Local name server
• CDN configuration changes
25
How To Measure Quality Attribute
Impacts?
• Performance
– Page response times
– Java EE component processing times
– Data center network latency
• Scalability
– Ability to sustain traffic spikes while maintaining the
same resource footprint
– Resource utilization (bandwidth, CPU, etc.)
• Other QA impacts
– Availability and maintainability
26
Experimental Challenges
• Scalability
– Requires sufficient load to test elasticity of
resources
– Need to simulate fast transactional bursts
– Gather production environment data during the
February peak
• Performance
– Establish pre-CDN and post-CDN baselines under
steady state
– Eliminate outside “noise” by isolating transactions in
a non-production environment
27
Monitoring and Measurement
Framework
• Consumer perspective
– Real-time user monitoring
– Browse versus shop transactions
– Geographic distribution
– Consistent and sustained rate
• Application perspective
– URI stem-level performance measurement
– Host, network and end-to-end times
• System perspective
– Vmstat and bandwidth utilization
28
Consumer Transaction Emulation
• Response times before and after CDN
• Real-time user monitoring
• Transaction characteristics and frequency
ISP         City and State
Level3      Los Angeles, CA
Savvis      Santa Clara, CA
Verizon     Denver, CO
MFN         Washington, D.C.
Internap    Miami, FL
Level3      Chicago, IL
Sprint      New York, NY
29
End-to-end Performance Timeline
30
Browse and Shop Transaction
Characteristics
Transaction workload characteristics                             Browse    Shop
Number of transaction steps                                      9         6
Number of images retrieved                                       163       94
Number of scripts, HTML, CSS, Flash components                   57        39
Number of server-side J2EE components accessed                   12        15
Average image size                                               2.9 KB    2.8 KB
Average size of HTML, script and Flash                           4.9 KB    5.8 KB
Total number of bytes retrieved per connection                   250 KB    98 KB
Number of web-server connections initiated from the browser      4         5
31
Appliance-based Server-side
Monitoring
32
Outline
• Problem
• Significance
• Methodology/Solution
Results/Evaluation
• Evaluation
• Conclusion
• Further research
33
Scalability: Memory Utilization
34
Scalability: CPU Utilization
35
Scalability: TCP Packet Count
Reduction
36
Scalability: CDN Bandwidth
Utilization
37
Bandwidth Efficiency Improvements
38
Web Tier Scale Factor
• Maximum concurrent Web server socket threads
• Maximum object “hits” in Akamai
• 16,000 hits / 3,600 threads
• Equivalent to 4x of our Web server farm
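The scale factor on this slide appears to be the ratio of the two figures quoted. A minimal sketch of that arithmetic follows, assuming peak CDN object hits and maximum Web server socket threads are directly comparable units of concurrent work. The variable names are illustrative.

```python
# Figures taken from the slide; the interpretation as a simple ratio is an assumption.
peak_cdn_hits = 16_000            # maximum object "hits" served by the CDN
max_web_server_threads = 3_600    # maximum concurrent Web server socket threads

scale_factor = peak_cdn_hits / max_web_server_threads
print(f"scale factor: {scale_factor:.2f}x")
```

The ratio is a little over 4, in line with the "4x of our Web server farm" summary.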
39
Performance: Shop Tx Impact
20% improvement
40
Performance: Browse Tx Impact
30% improvement – more content for “window”
shopping
41
Performance: Server-side
42
Performance: HTML Object Download
Time
• Browse
Transaction
• Shop
Transaction
• Why the discrepancy between the RTUM and
Server performance?
43
Maintainability and Availability
• Configuration management
– 2 hours on average to deploy configuration changes
• Content management
– 7-10 minutes to propagate content across edge
servers
• Achieved 100% availability during the observed
February peak
44
Outline
• Problem
• Significance
• Methodology/Solution
• Results
• Evaluation
Conclusions
• Further research
45
Conclusions – The Good
• Improved user-perceived performance
• Significant scalability impact
• Availability improvements
46
Conclusions – The Not-so-Good
• Server-side performance impacted by additional
DNS redirects
• Maintainability impacts
– Configuration changes
– Content changes
47
Outline
• Problem
• Significance
• Methodology/Solution
• Evaluation/Results
• Conclusion
Further research
48
Future Work
• Edge computing
– Edge delivery of applications
• Impact of edge delivery on media streaming
and protocols other than HTTP
– RTSP, MMS
49
End of Presentation
• Thank you
• Questions are welcome

Editor's Notes

  1. Welcome! Let’s get started.
  2. Question for the audience: why is this still a problem after all these years? Focus on how little progress was made between 2002 and 2005 in terms of customer satisfaction, and discuss the reasons: traffic growth, exponential growth of online transactions, and infrastructures not always keeping up with demand. Discuss content as a driver of traffic to the website and an enabler of shopping transactions.
  3. Now on to the dollars and cents of what it costs to be unavailable. These figures are from real research, and they are likely to be much higher these days. It is surprising that the airline reservation center would have lower downtime costs than a home shopping channel.
  4. This presentation will focus on an e-commerce environment similar to the ones on slide 6, although we can’t really say what it costs per hour to be unavailable. This is a quick overview of our e-commerce environment from the architecture, workload and content delivery perspectives. Seasonal spikes are between 6x and 10x for different metrics: visits or page views. In subsequent slides, we’ll cover the architectural views and the content delivery model and its potential shortcomings. Typical things to consider in content delivery and management. On the first bullet, bring up AJAX, RIAs and heavy Flash usage on some sites.
  5. This is a generic model of the architecture. We’ll discuss potential problems with content delivery that result from this type of architecture. Define STATIC and DYNAMIC content. Define performance and scalability as key quality attributes. Consumer and server-side views of performance. Static = not unique to a particular consumer (images, article pages). Dynamic = based on individual consumer characteristics (JSPs). Describe interactions, differences between static and dynamic elements, and how they’re served. Server-side caching helps offload repetitive requests for dynamic content. Function of load balancing in the context of scalability and performance. Describe where the content delivery problems may reside from the scalability and performance perspectives. Internet cloud and its role in content delivery. Web servers: static. Application servers: dynamic. DB: dynamic.
  6. The First Mile bottleneck refers to the limitations in the website’s connectivity to the Internet via its Internet Service Provider (ISP). In order to reach desired scalability it needs to continuously expand its connectivity to the ISP. The ISP, in turn, must also expand its capacity in order to meet its customers’ scalability requirements. Peering points also represent potential bottlenecks as large networks are not economically motivated to scale up the number of peering points with the networks of their competitors, especially since a significant portion of the traffic handled by the peering points is transit traffic with packets originating on other networks. This lack of competitive and financial motivation over time has resulted in a limited number of peering points across major networks. The Backbone Problem refers to the fact that the ISPs’ router capacity has historically not kept up with growth of traffic demands. Finally, the Last Mile problem reflects the limited capacity of a typical user’s connection to their ISP. 85% of our website’s consumers have broadband access, so this is less of a problem for our website. It’s important to note that just solving one of the above bottlenecks, such as the Last Mile, by increasing the reach of broadband connectivity at home will not automatically address the other limitations. These need to be treated as separate problems that, if addressed, would help solve the problem as a whole. The problems with the Internet cloud compound the other potential scalability and performance problems we discussed earlier.
  7. Let’s talk about workload in terms of page views. Traffic spikes several times a year and it’s “bursty” in nature. The weekly picture does not reflect hourly spikes we experience. Quick slide! This slide suggests the need to scale 5x based on page views alone.
  8. Can’t talk about content delivery without discussing content management and publishing. This is a generic content management model…. Describes differences between static and dynamic content and catalog data vs. article pages. Static content tags are embedded in the JSPs, which are rendered within the application server and usually contain static and dynamic content elements.
  9. Refer to outages during peaks from slide 15. With a single hosting facility we cannot control the efficiency of content delivery once it leaves our network. We could create our own network of geographically dispersed servers, but it would be cost prohibitive. We have attempted to scale horizontally and vertically (define each). A Web page download consists of the following basic steps: server name resolution, TCP connection establishment, transmission of the HTTP request, reception of the HTTP response, reception of data packets, and TCP connection termination. Using HTTP/1.0 results in repeating the above steps for each embedded object within a composite page. Note that when the embedded objects are stored on another server (e.g., servers in a content distribution service), having HTTP/1.1 support for persistent TCP connections across multiple HTTP requests does not eliminate the first two steps, but it reduces them by a factor of 2 to 10. Our challenge is not only how many connections we have open, but also for how long (large video files).
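To make the connection-count point concrete, here is a minimal sketch that counts TCP connection setups for one composite page. It assumes one connection per object without keep-alive (HTTP/1.0 style) and one reused connection per host with keep-alive (HTTP/1.1 style); real browsers open several parallel connections per host, so treat the numbers as illustrative only. The function name and the sample page are hypothetical.

```python
def connection_setups(object_hosts, persistent):
    """Count TCP connection setups needed to fetch every object on a page.

    object_hosts: one hostname per object (the page plus its embedded objects).
    persistent:   False models a new connection per object (HTTP/1.0 style);
                  True models one reused connection per host (an assumption,
                  browsers usually open a few parallel connections per host).
    """
    if not persistent:
        return len(object_hosts)
    return len(set(object_hosts))


# Hypothetical page: one HTML document from the origin, nine images from a CDN host.
page = ["www.example.com"] + ["images.example.com"] * 9

print(connection_setups(page, persistent=False))  # 10
print(connection_setups(page, persistent=True))   # 2
```

Even with persistent connections, the second host still costs its own name resolution and connection setup, which is the point made in the note above.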
  10. We’ll discuss significance from two perspectives: the impact on our e-commerce environment and other e-commerce environments (practical value), and the additional CDN research aspects evaluated in this work (research value).
  11. When is a website suitable for a CDN? It has a high ratio of reads compared to writes; client access patterns tend to access some set of objects more frequently; limited windows of inconsistent data are acceptable; data updates occur relatively slowly.
  12. CDN stands for Content Delivery Network. What do CDNs claim to help with?
  13. The spikes create extra load on our infrastructure that causes outages. It’s worth noting that 2006 is the year with the CDN in place…just a little preview of the results.
  14. Hypothesis: could we reduce resource utilization with a CDN? Here’s what we could address if we were to solve the problem….. This chart is showing CPU utilization spike in the web tier, but we experience similar curves for bandwidth and memory.
  15. Why do we even need to explore a new tactic? Refer to the definition of tactic from Bass et al.: a design decision that is influential in the control of a quality attribute response. Tactics tell you what to do in order to affect a quality attribute response measure. Unlike sensitivity points, tactics are independent of any specific system.
  16. How we went about determining the criteria to measure impact of a CDN.
  17. Akarouting promises one-hop routing
  18. DNS is essentially a distributed database that follows the client-server architecture. Adequate performance of DNS is achieved through replication and caching. The server side portion of a request is handled by programs called name servers. They contain information about a portion of the global database and are capable of forwarding requests to other authoritative servers if necessary. The information is made available to the client-side software components called resolvers. A typical domain name on the Internet consists of two or more parts separated by dots such as my.yahoo.com. Top-level domain (TLD) represents the rightmost portion, .com in our case, while the subdomain(s) are represented by the labels to the left of the top-level domain. In our example, my.yahoo.com is a subdomain of yahoo.com, which in turn belongs to the .com top-level domain. Finally, the hostname refers to a domain name that has one or more IP addresses associated with it. Each domain or subdomain has an authoritative server associated with it. It contains and publishes information about the domain and any other domains it encapsulates. Root nameservers reside at the top of the DNS hierarchy and they are queried first to resolve the TLD. Caching and time-to-live (TTL) are very important concepts in DNS and, as we will later discover, in CDN implementations. IP mappings obtained from DNS can be stored in the local resolver for a period of time as defined in TTL. This greatly reduces the load on the DNS servers. Figure 1 illustrates how a client typically finds the address of a service using DNS. The client application uses a resolver, usually implemented as a set of operating system library routines, to make a recursive query to its local nameserver. The local nameserver may be configured statically (e.g., in a system file), or dynamically using protocols like DHCP or PPP. 
After making the request, the client waits as the local nameserver iteratively tries to resolve the name (www.service.com in this example). The local nameserver first sends an iterative query to the root to resolve the name (steps 1 and 2), but since the subdomain service.com has been delegated, the root server responds with the address of the authoritative nameserver for the sub-domain, i.e., ns.service.com (step 3). The client’s nameserver then queries ns.service.com and receives the IP address of www.service.com (steps 4 and 5). Finally the nameserver returns the address to the client (step 6) and the client is able to connect to the server (step 7).
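The lookup sequence and the effect of TTL caching can be sketched with a toy local nameserver. This is a simulation under stated assumptions, not real DNS code: the `ROOT` and `AUTHORITATIVE` tables, the 192.0.2.10 address, the 300-second TTL, and the `LocalNameserver` class are all made up for illustration, and no network traffic is generated.

```python
import time

# Hypothetical zone data for illustration only.
ROOT = {"service.com": "ns.service.com"}  # delegations known to the root
AUTHORITATIVE = {
    "ns.service.com": {"www.service.com": ("192.0.2.10", 300)},  # (address, TTL in seconds)
}


class LocalNameserver:
    """Toy local nameserver: resolves iteratively and caches answers until the TTL expires."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.cache = {}            # name -> (address, expiry time)
        self.upstream_queries = 0  # queries sent to root or authoritative servers

    def resolve(self, name):
        now = self.clock()
        hit = self.cache.get(name)
        if hit is not None and hit[1] > now:
            return hit[0]  # answered from cache, no upstream traffic

        # Ask the root, which refers us to the delegated zone's nameserver.
        zone = ".".join(name.split(".")[-2:])
        self.upstream_queries += 1
        nameserver = ROOT[zone]

        # Ask the authoritative nameserver for the address.
        self.upstream_queries += 1
        address, ttl = AUTHORITATIVE[nameserver][name]

        self.cache[name] = (address, now + ttl)
        return address


ns = LocalNameserver()
ns.resolve("www.service.com")
ns.resolve("www.service.com")  # second call is served from the cache
print(ns.upstream_queries)     # 2
```

A short TTL trades more upstream queries for faster re-mapping, which matters later when the CDN uses DNS answers to steer clients.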
  19. Very simple extension of the DNS redirection mechanism, but the complexity lies in the algorithm that measures current network conditions. In essence, Akamai performs a highly complex translation of a customer’s domain to the IP address of the most suitable edge server. First, the Web browser requests an HTML object. In order to accommodate this request, the local DNS resolver has to translate the domain name into an IP address. The resolver issues a query to the customer’s DNS server which in turn forwards the request to the Akamai network. This is enabled via a configuration of a canonical name record (CNAME) in the origin site’s DNS name server. The CNAME triggers the request redirection to the CDN. Next, a hierarchy of Akamai servers responds to the request using the requestor’s IP address, the name of the CDN customer, and the name of the requested content as seeds for its DNS resolution. The CDN name resolution step is perhaps the most critical in this sequence of events. Configuration of the Akamai CDN is described in [4]. The steps for our deployment can be summarized as follows: 1. Create origin hostname; 2. Activate Akamai edge hostname; 3. Activate content delivery configuration; 4. Point website to Akamai network. In our case, this process begins with configuration of a CNAME in our DNS name server. A CNAME record maps an alias or nickname to the real name which may lie outside the current zone. Typical format of a CNAME entry is as follows:
name  ttl  class  rr     canonical name
www        IN     CNAME  joe.example.com.
We need to set up an origin server hostname that will resolve to our content server. This server will be used by Akamai edge servers to retrieve our content, so it can be made available to all of the nodes in the CDN. The naming convention for the origin server is: origin-<website> where “website” is the hostname for our content that will be delivered from Akamai. 
Our website stores all of its static content in the generic images folder, so we will define the following origin server name: origin-images.example.com for images.example.com. Next, we will create a DNS record for our origin server hostname on our authoritative name server. We will use the CNAME record type for this step:
origin-www.example.com IN CNAME loadbalancer.example.com
We are now pointing our website to the Akamai network. An edge hostname will need to be activated on an Akamai domain for our website using the CDN’s configuration console. It will resolve to the Akamai network. For example, www.example.com would have to point to www.example.edgesuite.net and www.example.edgesuite.net would in turn resolve to individual servers on the Akamai network since it owns the edgesuite.net domain. The remaining configuration steps need to be performed in the configuration console of Akamai and they are covered in-depth in [4].
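The two CNAME chains described above (public hostname to edge hostname, origin hostname to load balancer) can be traced with a small lookup table. This is a sketch under assumptions: the hostnames come from the example in these notes, the A-record addresses are documentation addresses chosen for illustration, and the edge hostname is collapsed into a single A record even though a real CDN answers it with its own dynamic resolution. The `resolve` helper is hypothetical.

```python
# Hypothetical records mirroring the example hostnames in the notes.
RECORDS = {
    "www.example.com": ("CNAME", "www.example.edgesuite.net"),
    "www.example.edgesuite.net": ("A", "203.0.113.7"),    # stands in for an edge server
    "origin-www.example.com": ("CNAME", "loadbalancer.example.com"),
    "loadbalancer.example.com": ("A", "198.51.100.20"),   # stands in for the origin
}


def resolve(name, records=RECORDS, max_hops=8):
    """Follow CNAME records until an A record is found.

    Returns the address and the chain of names visited along the way.
    """
    chain = [name]
    for _ in range(max_hops):
        record_type, value = records[name]
        if record_type == "A":
            return value, chain
        name = value  # CNAME: continue with the canonical name
        chain.append(name)
    raise RuntimeError("CNAME chain too long")


address, chain = resolve("www.example.com")
print(" -> ".join(chain), "=", address)
```

Consumers resolve the public name and land on the CDN, while edge servers use the origin hostname to fetch content from the hosting facility.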
  20. The main purpose of a CDN is to direct consumer requests for objects to a server at the optimal Internet location relative to the consumer’s location. The key components of a CDN architecture are described in [37]. They are defined as: overlay network formation, client request redirection, content routing and last-mile content delivery. The two most common techniques employed by the networks are DNS redirection and URL rewriting. The DNS redirection technique utilizes a series of DNS resolutions based on several factors such as server availability and network conditions with the purpose of identifying the most suitable server. The end result is a DNS response with the IP address of the content server. The response includes a time-to-live value that is usually limited to less than a minute (in the case of Akamai it is 20 seconds). The TTL has to be set to a relatively low value because the network conditions and server availability change constantly and quick IP re-mapping is key. The DNS redirection technique can facilitate either a full- or partial-site delivery. With full-site delivery, all requests to the origin server are directed using DNS to a CDN server. If the CDN server can’t fulfill the request it simply routes it back to the origin server. Several networks, including Adero and NetCaching, employ this delivery model. The main shortcoming of this model is the additional routing overhead of wasted DNS requests that could have been handled by the origin server to begin with. With partial-site content delivery, on the other hand, the origin site modifies the URLs for certain objects or object directory locations to be resolved by the CDN’s DNS server. This approach seems to be well suited for our website due to its combination of static digital assets and dynamically generated server-side presentation components. URL rewriting is another potential solution for server lookups. 
With this technique, the origin server continuously rewrites the URL links for dynamically generated pages in order to redirect them to the appropriate CDN server. The DNS functionality remains on the origin site with this approach. When a page is requested by the user it will be served from the origin server. However, before it is served, all of the embedded links will be rewritten to point to the CDN’s DNS. Figure 3.1 shows a typical rewrite approach. The main drawback to the URL rewrite approach from the measurement standpoint is the fact that the rewrites usually take place at the Web server tier. Hence, the rewrite steps would inevitably introduce additional background noise to our performance measurements. Therefore, we decided to avoid this approach for the purpose of our study. At the time of writing of this thesis, we counted 18 different networks on Davison’s website. It is not our primary purpose to evaluate tradeoffs between the various networks and their implementations. The choice we made does not reflect a belief in superiority of one network over others - it is merely a reflection of the need to get our experimental test bed up and running as quickly as possible within boundaries imposed on us by our existing hosting facility. For our implementation, we settled on partial-site, DNS redirection-based CDN implementation using the Akamai delivery network.
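For reference, the URL rewrite technique we decided against can be sketched as a filter that repoints static links at a CDN hostname before the page is served. This is only an illustration of the general idea: the `images.example.com` hostname, the `/images/` prefix, and the `rewrite_static_links` helper are assumptions, and a production rewriter would parse the HTML rather than rely on a regular expression.

```python
import re

CDN_HOST = "images.example.com"  # hypothetical hostname delegated to the CDN

# Match src/href attributes whose value is a path under the static images folder.
_STATIC_LINK = re.compile(r'(?P<attr>\b(?:src|href))="(?P<path>/images/[^"]*)"')


def rewrite_static_links(html, cdn_host=CDN_HOST):
    """Rewrite links to static objects so the browser fetches them from the CDN.

    Links to dynamic components (for example JSPs) are left untouched.
    """
    def repoint(match):
        return f'{match.group("attr")}="http://{cdn_host}{match.group("path")}"'

    return _STATIC_LINK.sub(repoint, html)


page = '<img src="/images/card.jpg"><a href="/basket.jsp">Basket</a>'
print(rewrite_static_links(page))
```

Because this work runs on every dynamically generated page, it adds processing in the Web tier, which is the measurement noise mentioned above.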
  21. Why are availability and maintainability important? Bandwidth utilization: hosting facility, CDN. Server resource utilization: CPU and run queues, memory page-ins and page-outs. Measured in the context of traffic and page views. Research to date focused on network impacts alone.
  22. This is different from research to date.
  23. What does a transaction consist of? Why are geographically dispersed locations for testing important? We ran over 1K transactions over a period of 48 hrs before and after.
  24. DNS look-up: The process of calling a DNS server to look up and convert a hostname to an IP address, for instance, to convert www.foo.com to 10.0.0.1. Connect time: The time it takes to connect to a Web server (or CDN edge server in our case) across a network from a client browser or an RTUM agent. Secure sockets layer time: The time it takes to create an SSL TCP/IP connection with a website. First byte time: The time between the completion of the TCP connection with the destination server that will provide the displayed page’s HTML, graphic, or other component and the reception of the first packet (also known as first byte) for that object. Content download time: The time in seconds that measures the actual time to deliver content (images, HTML, or other objects) from the Web server to the browser. The application perspective will be captured using an appliance based application monitoring solution. The network location of this appliance is depicted in Figure 3.3.1. We will configure “watchpoints” using the appliance’s configuration tool to capture server-side response times of Java EE components corresponding to the transaction steps defined in the RTUM service. The appliance uses passive traffic analysis to capture actual transactions from the RTUM within our hosting environment and measures performance and availability of our e-commerce application as a whole. The important difference between this and other approaches is that our appliance does not generate any traffic and the only performance overhead it introduces is reading the copy of traffic from the network connection. The data is assembled into requests for objects, pages and user sessions. Performance metrics include host, SSL and redirect times. This solution also measures server errors or prematurely terminated connections due to increase in traffic. Figure 3.3.1 depicts the measurement timeline for a sample request that would be captured by our appliance [26]. 
The appliance solution groups latency into the following six categories and defines them as follows: Host time: This is the combined time the Web, application, and database servers take to process a request. Host time is a key measure to assess performance implications of implementing a CDN on performance of our Java EE components (servlet, EJBs, etc.). It can be very short in the case of a static image or long in cases of long reports and complex server-side transactions such as adding a list of items to the shopping basket. Network time: This is the time spent traveling across intervening networks. Once the server has prepared its response, host time is over and network time begins. A small object might be delivered quickly; a large one might take a long time. This time is highly dependent on the type of consumer’s connection. Low-bandwidth connections will result in higher network times and vice versa with broadband connections. Our monitoring appliance also records additional information on packet loss, out-of-order delivery, and round-trip time to help with this diagnosis. SSL time: The appliance will record the time spent negotiating the encryption of encrypted transactions. This portion of the SSL time represents the server-side latency elements of the handshake versus the client-side SSL time captured by the RTUM. Redirect time: This is the time the site spends sending a request on to other pages. In some applications, a request for a page results in a redirect that usually points elsewhere. This delay is recorded as redirect time. Idle time: When a browser is retrieving a page, but there is no activity between objects on the same page, the HTTP interaction is defined as “idle”. This measurement is key to understanding the amount of time spent processing client-side scripts such as JavaScript. When there is inactivity in the middle of rendering the page within the browser, our appliance will measure it as idle time. 
End-to-end: This is the total time for the object or page, from the moment the first packet of a request is seen until the browser acknowledges delivery of the last packet.
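One way to work with the client-side timing components defined above is to hold them in a small record and compare a before and after measurement component by component. This is a sketch, not the monitoring tool’s data model: the `ClientTiming` class and `percent_change` helper are hypothetical, the sample numbers are made up rather than measured, and summing the components assumes they occur one after another for a single object.

```python
from dataclasses import dataclass, fields


@dataclass
class ClientTiming:
    """Client-side timing components for one object, in seconds."""
    dns_lookup: float
    connect: float
    ssl: float
    first_byte: float
    content_download: float

    def total(self):
        # Assumes the components are sequential and do not overlap.
        return sum(getattr(self, f.name) for f in fields(self))


def percent_change(before, after):
    """Per-component percent change; None where the baseline is zero."""
    changes = {}
    for f in fields(before):
        base = getattr(before, f.name)
        new = getattr(after, f.name)
        changes[f.name] = None if base == 0 else 100.0 * (new - base) / base
    return changes


# Made-up sample values for illustration only, not measurements from the study.
before = ClientTiming(dns_lookup=0.05, connect=0.10, ssl=0.0, first_byte=0.20, content_download=1.65)
after = ClientTiming(dns_lookup=0.05, connect=0.05, ssl=0.0, first_byte=0.10, content_download=0.80)

print(before.total(), after.total())
print(percent_change(before, after))
```

Keeping the components separate makes it easier to see which part of the transaction a CDN actually affects, rather than looking only at the end-to-end figure.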
  25. Differentiate between the two types and discuss why we expect the browse transaction to benefit more. Also discuss the number of Web server connections.
  26. What does the appliance do?
  27. Physical memory is a finite resource on any system. The UNIX memory handler manages the memory allocations. The kernel is responsible for freeing up physical memory of idle processes by saving it to disk until it is needed again. Paging and swapping are used to accomplish this task. Paging refers to writing portions, termed pages, of a process’ memory to disk. Swapping refers to writing the entire process, not just part, to disk. Page-out represents the event of writing pages to disk, while page-in is defined as retrieving memory data from disk. Page-ins are common and under normal circumstances are not a cause for concern. However, if page-ins become excessive the kernel can reach a point where it’s actually spending more time managing paging activity than running the applications, and system performance suffers.
  28. We decided to look at all tiers. The application servers experienced a spike in CPU utilization – probably due to the elimination of the Web server bottleneck and more traffic going to the app servers. The Web servers experienced the most benefit. The DB server improvements were not related to the CDN, but rather a DB tuning exercise we undertook.
  29. Another way to look at the network efficiencies gained from offloading. High-content pages experienced the higher packet count drops. This results in lower resource utilization of the network gear (routers, switches, etc.), but we didn’t measure it as part of the experiments.
  30. Note the spike over a period of just a couple of hours.
  31. Discuss implications from hosting and cost perspective. For example, we could avoid start-up costs of a new hosting center. Our bandwidth utilization went up because we eliminated the bottleneck in the Web tier. Akamai offloaded the equivalent of one Gbps connection to our hosting facility.
  32. First time in a few years we had an unqualified success in terms of availability.