2. > Introduction
Size matters

  Facebook (2009)
    • +200B pageviews /month
    • >3.9T feed actions /day
    • +300M active users
    • >1B chat messages /day
    • 100M search queries /day
    • >6B minutes spent /day (ranked #2 on Internet)
    • +20B photos, +2B/month growth
    • 600,000 photos served /sec
    • 25TB log data /day processed through Scribe
    • 120M queries /sec on memcache

  Twitter (2009)
    • 600 requests /sec
    • avg 200-300 connections /sec; peak at 800
    • MySQL handles 2,400 requests /sec
    • 30+ processes for handling odd jobs
    • process a request in 200 milliseconds in Rails
    • average time spent in the database is 50-100 milliseconds
    • +16 GB of memcached

  Google (2007)
    • +20 petabytes of data processed /day by +100K MapReduce jobs
    • 1 petabyte sort took ~6 hours on ~4K servers replicated onto ~48K disks
    • +200 GFS clusters, each at 1-5K nodes, handling +5 petabytes of storage
    • ~40 GB /sec aggregate read/write throughput across the cluster
    • +500 servers for each search query < 500ms
    • >1B views /day on YouTube (2009)

  Myspace (2007)
    • 115B pageviews /month
    • 5M concurrent users @ peak
    • +3B images, mp3, videos
    • +10M new images /day
    • 160 Gbit/sec peak bandwidth

  Flickr (2007)
    • +4B queries /day
    • +2B photos served
    • ~35M photos in squid cache
    • ~2M photos in squid’s RAM
    • 38k req/sec to memcached (12M objects)
    • 2 PB raw storage
    • +400K photos added /day

  Source: multiple articles, High Scalability http://highscalability.com/
3. > Introduction
Cloud levels the playing field

  Company timeline (Zynga, maker of FarmVille)
    • 2007: founded by 6 people
    • 2008: $29M funding from VC
    • 2009: revenue of $270M; $180M funding from Digital Sky Technologies
    • 2010: 1,000+ employees; $300M funding from Google and Softbank

  Active unique players
    • 75M monthly
    • 60M daily
    • 1M daily 4 days after launch
    • 10M after 60 days

  Hosted in Amazon Web Services
    • 12,000 EC2 nodes
    • 3 Gigabits/sec of traffic between FarmVille and Facebook (at peak)
    • caching cluster serves another 1.5 Gigabits/sec to the application

  Source: “How FarmVille Scales to Harvest 75 Million Players a Month”, 2010.02.08, Todd Hoff http://highscalability.com/blog/2010/2/8/how-farmville-scales-to-harvest-75-million-players-a-month.html
4. > Introduction
Cloud computing

  Characteristics
    • On-demand self-service
    • Broad network access
    • Resource pooling
    • Rapid elasticity
    • Measured service

  Service models
    • Software as a service
    • Platform as a service
    • Infrastructure as a service

  Deployment models
    • Private cloud
    • Community cloud
    • Public cloud
    • Hybrid cloud

  “Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.”

  Source: The NIST Definition of Cloud Computing, Version 15, 2009.10.07, Peter Mell and Tim Grance http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc
5. > Introduction
Service delivery models

  Layer            On-Premise    Infrastructure    Platform          Software
                                 (as a Service)    (as a Service)    (as a Service)
  Applications     You manage    You manage        You manage        Vendor
  Data             You manage    You manage        You manage        Vendor
  Runtime          You manage    You manage        Vendor            Vendor
  Middleware       You manage    You manage        Vendor            Vendor
  O/S              You manage    You manage        Vendor            Vendor
  Virtualization   You manage    Vendor            Vendor            Vendor
  Servers          You manage    Vendor            Vendor            Vendor
  Storage          You manage    Vendor            Vendor            Vendor
  Networking       You manage    Vendor            Vendor            Vendor

  (Vendor = managed by vendor)
6. > Architecting for Scale > Vertical Scaling
Traditional scale-up architecture

  Common characteristics
    • synchronous processes
    • sequential units of work
    • tight coupling
    • stateful
    • pessimistic concurrency
    • clustering for HA
    • vertical scaling

  [Diagram: units of work passing through clustered web, app server, and data store tiers]
7. > Architecting for Scale > Vertical Scaling
Traditional scale-up architecture

  To scale, get bigger servers
    • expensive
    • has scaling limits
    • inefficient use of resources

  [Diagram: web, app server, and data store tiers]
8. > Architecting for Scale > Vertical Scaling
Traditional scale-up architecture

  When problems occur
    • bigger failure impact

  [Diagram: web, app server, and data store tiers]
9. > Architecting for Scale > Vertical Scaling
Traditional scale-up architecture

  When problems occur
    • bigger failure impact
    • more complex recovery

  [Diagram: web, app server, and data store tiers]
19. > Architecting for Scale > Horizontal scaling
Scale-out architecture

  To scale, add more servers
    • not bigger servers

  [Diagram: six independent stacks, each with web, app server, and data store]
20. > Architecting for Scale > Horizontal scaling
Scale-out architecture

  When problems occur
    • smaller failure impact
    • higher perceived availability

  [Diagram: six independent stacks, each with web, app server, and data store]
21. > Architecting for Scale > Horizontal scaling
Scale-out architecture

  When problems occur
    • smaller failure impact
    • higher perceived availability
    • simpler recovery

  [Diagram: multiple independent stacks, each with web, app server, and data store]
22. > Architecting for Scale > Horizontal scaling
Scale-out architecture + distributed computing

  Scalable performance at extreme scale
    • asynchronous processes
    • parallelization
    • smaller footprint
    • optimized resource usage
    • reduced response time
    • improved throughput

  [Diagram: multiple web / app server / data store stacks, labeled "parallel tasks", "async tasks", and "perceived response time"]
23. > Architecting for Scale > Horizontal scaling
Scale-out architecture + distributed computing

  When problems occur
    • smaller units of work
    • decoupling shields impact

  [Diagram: multiple independent stacks, each with web, app server, and data store]
24. > Architecting for Scale > Horizontal scaling
Scale-out architecture + distributed computing

  When problems occur
    • smaller units of work
    • decoupling shields impact
    • even simpler recovery

  [Diagram: multiple independent stacks, each with web, app server, and data store]
25. > Architecting for Scale > Cloud Architecture Patterns
LiveJournal (from Brad Fitzpatrick, then Founder at LiveJournal, 2007)

  Diagram labels
    • Web Frontend
    • Apps & Services
    • Partitioned Data
    • Distributed Cache
    • Distributed Storage
26. > Architecting for Scale > Cloud Architecture Patterns
Flickr (from Cal Henderson, then Director of Engineering at Yahoo, 2007)

  Diagram labels
    • Web Frontend
    • Apps & Services
    • Distributed Storage
    • Distributed Cache
    • Partitioned Data
27. > Architecting for Scale > Cloud Architecture Patterns
SlideShare (from John Boutelle, CTO at SlideShare, 2008)

  Diagram labels
    • Web Frontend
    • Apps & Services
    • Distributed Cache
    • Partitioned Data
    • Distributed Storage
28. > Architecting for Scale > Cloud Architecture Patterns
Twitter (from John Adams, Ops Engineer at Twitter, 2010)

  Diagram labels
    • Web Frontend
    • Apps & Services
    • Partitioned Data
    • Queues
    • Async Processes
    • Distributed Cache
    • Distributed Storage
29. > Architecting for Scale > Cloud Architecture Patterns
Facebook (from Jeff Rothschild, VP Technology at Facebook, 2009)

  2010 stats (Source: http://www.facebook.com/press/info.php?statistics)

  People
    • +500M active users
    • 50% of active users log on in any given day
    • people spend +700B minutes /month

  Activity on Facebook
    • +900M objects that people interact with
    • +30B pieces of content shared /month

  Global Reach
    • +70 translations available on the site
    • ~70% of users outside the US
    • +300K users helped translate the site through the translations application

  Platform
    • +1M developers from +180 countries
    • +70% of users engage with applications /month
    • +550K active applications
    • +1M websites have integrated with Facebook Platform
    • +150M people engage with Facebook on external websites /month

  Diagram labels
    • Web Frontend
    • Apps & Services
    • Distributed Cache
    • Distributed Storage
    • Parallel Processes
    • Partitioned Data
    • Async Processes
31. > Architecting for Scale
Fundamental concepts

  Horizontal scaling for cloud computing
    • Small pieces, loosely coupled

  Distributed computing best practices
    • asynchronous processes (event-driven design)
    • parallelization
    • idempotent operations (handle duplicates)
    • de-normalized, partitioned data (sharding)
    • shared nothing architecture
    • optimistic concurrency
    • fault-tolerance by redundancy and replication
    • etc.
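
Of the practices listed on this slide, optimistic concurrency is perhaps the easiest to show in a few lines. The following is a minimal in-memory sketch, not a reference to any Windows Azure API: the class name `OptimisticRecordSketch`, its `Versioned` snapshot type, and the `tryWrite` method are all hypothetical names introduced for illustration. It assumes a single record whose writers read a versioned snapshot first and then write only if nobody else has written in the meantime.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration class; not part of any Azure SDK.
class OptimisticRecordSketch {

    // Immutable snapshot: a value plus the version at which it was written.
    static final class Versioned {
        final long version;
        final String value;

        Versioned(long version, String value) {
            this.version = version;
            this.value = value;
        }
    }

    private final AtomicReference<Versioned> current =
            new AtomicReference<>(new Versioned(0, ""));

    // Readers take a snapshot without acquiring any lock.
    Versioned read() {
        return current.get();
    }

    // The write succeeds only if `expected` is still the latest snapshot;
    // a writer holding a stale snapshot gets false and must re-read and retry.
    boolean tryWrite(Versioned expected, String newValue) {
        Versioned next = new Versioned(expected.version + 1, newValue);
        return current.compareAndSet(expected, next);
    }
}
```

A caller would typically loop: `read()`, compute the new value, call `tryWrite`, and repeat on `false`. No lock is held while the caller is thinking, which is the contrast with the pessimistic concurrency listed for scale-up architectures.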
32. > Architecting for Scale > Fundamental Concepts
Asynchronous processes & parallelization

  Defer work as late as possible
    • return to user as quickly as possible
    • event-driven design (instead of request-driven)

  Cloud computing friendly
    • distributes work to more servers (divide & conquer)
    • smaller resource usage/footprint
    • smaller failure surface
    • decouples process dependencies

  Windows Azure platform services
    • Queue Service
    • AppFabric Service Bus
    • inter-node communication

  [Diagram: Web Roles passing work to Worker Roles through Queues and the Service Bus]
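
The "defer work, return quickly" idea on this slide can be sketched with an in-process queue. This is only an illustration under stated assumptions: `AsyncDeferralSketch`, `handleRequest`, and `drain` are hypothetical names, and a `ConcurrentLinkedQueue` stands in for a durable queue service such as the Windows Azure Queue Service, whose actual API is not shown here.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical illustration class; the in-memory queue is a stand-in
// for a durable queue service.
class AsyncDeferralSketch {
    private final Queue<String> queue = new ConcurrentLinkedQueue<>();
    private final List<String> completed = new CopyOnWriteArrayList<>();

    // "Web role" side: record the job and return to the user immediately.
    String handleRequest(String job) {
        queue.add(job);
        return "accepted";
    }

    // "Worker role" side: process up to maxJobs queued jobs, then stop.
    // Returns the number of jobs actually processed.
    int drain(int maxJobs) {
        int done = 0;
        String job;
        while (done < maxJobs && (job = queue.poll()) != null) {
            completed.add("done: " + job); // placeholder for the real work
            done++;
        }
        return done;
    }

    int pending() {
        return queue.size();
    }

    List<String> completed() {
        return completed;
    }
}
```

Because both collections are thread-safe, several worker threads could call `drain` at the same time, which is where the parallelization comes from. The request path never waits for the work to finish, so its response time does not depend on how slow the deferred job is.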
33. > Architecting for Scale > Fundamental Concepts
Partitioned data

  Shared nothing architecture
    • transaction locality (partition based on an entity that is the “atomic” target of the majority of transactional processing)
    • loosened referential integrity (avoid distributed transactions across shard and entity boundaries)
    • design for dynamic redistribution and growth of data (elasticity)

  Cloud computing friendly
    • divide & conquer
    • size growth with virtually no limits
    • smaller failure surface

  Windows Azure platform services
    • Table Storage Service
    • SQL Azure

  [Diagram: Web Roles, Queues, a Worker Role, and three relational databases, with "read" and "write" paths]
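
Transaction locality can be illustrated with a small shard router. In this sketch the partition key is assumed to be a user ID, each shard is an in-memory map standing in for a separate database, and the names `ShardRouterSketch`, `shardFor`, `addPost`, and `postsOf` are hypothetical. Hash-modulo routing is used only because it is short; it is not what the slide prescribes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; each map stands in for one database shard.
class ShardRouterSketch {
    private final List<Map<String, List<String>>> shards = new ArrayList<>();

    ShardRouterSketch(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
    }

    // The same user ID always maps to the same shard; floorMod keeps the
    // index non-negative even when hashCode() is negative.
    int shardFor(String userId) {
        return Math.floorMod(userId.hashCode(), shards.size());
    }

    // All of a user's posts live in that user's shard, so a write
    // touches exactly one shard (no distributed transaction).
    void addPost(String userId, String post) {
        shards.get(shardFor(userId))
              .computeIfAbsent(userId, k -> new ArrayList<>())
              .add(post);
    }

    List<String> postsOf(String userId) {
        return shards.get(shardFor(userId))
                     .getOrDefault(userId, Collections.emptyList());
    }
}
```

Plain modulo routing moves most keys when the shard count changes, so it does not by itself satisfy the slide's "design for dynamic redistribution" point. A lookup directory or consistent hashing would be the usual way to address that, at the cost of more code than fits here.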
34. > Architecting for Scale > Fundamental Concepts
Idempotent operations

  Repeatable processes
    • allow duplicates (additive)
    • allow re-tries (overwrite)
    • reject duplicates (optimistic locking)
    • stateless design

  Cloud computing friendly
    • resiliency

  Windows Azure platform services
    • Queue Service
    • AppFabric Service Bus

  [Diagram: Worker Roles connected through the Service Bus]
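
Two of the strategies on this slide can be sketched side by side. The example assumes an inventory counter updated by queued messages that may be delivered more than once; `IdempotentHandlerSketch`, `setStock`, and `applyDelta` are hypothetical names. Note that the duplicate rejection here is done by remembering message IDs, which is a simpler stand-in for the optimistic locking the slide names.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration class; state is in memory only.
class IdempotentHandlerSketch {
    private final Set<String> seenMessageIds = ConcurrentHashMap.newKeySet();
    private final Map<String, Integer> inventory = new ConcurrentHashMap<>();

    // Overwrite style: writing an absolute value is naturally repeatable,
    // so a retried message leaves the same end state.
    void setStock(String sku, int quantity) {
        inventory.put(sku, quantity);
    }

    // Reject-duplicates style: a relative change is applied at most once
    // per message ID. Returns false when the message was already handled.
    boolean applyDelta(String messageId, String sku, int delta) {
        if (!seenMessageIds.add(messageId)) {
            return false; // duplicate delivery, ignore
        }
        inventory.merge(sku, delta, Integer::sum);
        return true;
    }

    int stock(String sku) {
        return inventory.getOrDefault(sku, 0);
    }
}
```

Calling `setStock("sku-1", 10)` twice leaves the stock at 10, and delivering the same `applyDelta` message twice changes the stock only once. In a real deployment the set of seen IDs would have to live in shared, durable storage, since any worker may receive the redelivered message.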
35. > Architecting for Scale > Fundamental Concepts
Hybrid architectures

  Scale-out (horizontal)
    • BASE: Basically Available, Soft state, Eventually consistent
    • availability first; best effort
    • aggressive (optimistic)
    • shared nothing
    • favor extreme size
    • e.g., user requests, data collection & processing, etc.

  Scale-up (vertical)
    • ACID: Atomicity, Consistency, Isolation, Durability
    • focus on “commit”
    • conservative (pessimistic)
    • transactional
    • favor accuracy/consistency
    • e.g., BI & analytics, financial processing, etc.

  Most distributed systems employ both approaches
Microsoft's Windows Azure platform is a virtualized and abstracted application platform that can be used to build highly scalable and reliable applications with Java. The environment consists of a set of services such as NoSQL table storage, blob storage, queues, a relational database service, an internet service bus, access control, and more. Java applications can be built on these services through Web services APIs, running on your own Java Virtual Machine, without worrying about the underlying server OS and infrastructure.

Highlights of this session will include:
  • An overview of the Windows Azure environment
  • How to develop and deploy Java applications in Windows Azure
  • How to architect horizontally scalable applications in Windows Azure