Case Study with Answers.com on Scaling with Memcached and MySQL
1. Answers to the Scaling Challenge: A Case Study with Answers.com on Scaling with Memcached and MySQL Presenters: Dan Marriott, Director - Production Operations, Answers.com Joaquín Ruiz, EVP Products, Gear6
2. Answers to the Scaling Challenge: A Case Study With Answers.com on Scaling With Memcached and MySQL. April 14, 2010. Dan Marriott, Director - Production Operations, danm@answers.com
15. Some key infrastructure elements: 2 data centers (active/active); HP blade servers (BL460) and 2U servers (DL380); Fusion-io SSDs in all large MySQL DB servers; VMware; HP/LeftHand SAN; hardware load balancers
16. Software: Linux (mainly CentOS 5.x, 64-bit); Apache/JBoss; Lucene/Solr; Memcached; MySQL 5.0.x; Apache-Tomcat; Memcached; MySQL 5.0.x
17. Secret to Web 2.0: "Everything runs from memory in Web 2.0." (Evan Weaver, Twitter)
18. Questions: How do I scale? Four answers to the scaling challenge from Answers.com: use enterprise-grade PCI SSDs in your MySQL servers; enhance the Memcached tier; control DB slave server clusters; hardware load balancers
19. Scaling Answer #1: Use enterprise-grade SSDs in your MySQL servers
21. Quick plug :-) Panel: How Solid-state Technologies are Transforming MySQL Server Performance and the Datacenter Architectures. Panelists: Sumeet Bansal (Fusion-io), Ryan White (Cloudmark), Dan Marriott (Answers.com), Vadim Tkachenko (Percona Inc), Jeremy Zawodny (craigslist.org). Weds 5:15pm, Ballroom D
23. Memcached tier, before: 10 x 16/32GB RAM Memcached servers per data center; divided into several clusters; instances striped across each cluster; tens of millions of items per instance
24. Memcached tier, challenges: no redundancy; lose 25% of cache (or worse) on any server failure; loss of cache = poor performance and user experience; costly (OpEx: rack space, power, cooling, maintenance, admin)
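The 25% figure is what you would expect if keys are spread evenly over four nodes. The sketch below illustrates that arithmetic only; the four-node cluster, the node names, and the MD5-modulo placement are assumptions for illustration, not details taken from the slides.

```python
import hashlib

def node_for(key: str, nodes: list) -> str:
    """Pick a node by hashing the key and taking it modulo the node count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

def fraction_lost(keys: list, nodes: list, failed: str) -> float:
    """Share of keys whose cached copy lived on the failed node."""
    lost = sum(1 for k in keys if node_for(k, nodes) == failed)
    return lost / len(keys)

# Hypothetical 4-node cluster and synthetic page keys.
nodes = ["mc1", "mc2", "mc3", "mc4"]
keys = [f"page:{i}" for i in range(100_000)]

share = fraction_lost(keys, nodes, failed="mc3")
print(f"{share:.1%} of cached items lost")
```

If the client also drops the failed node from its list, naive modulo placement would remap most of the surviving keys as well, which is one way the loss could be worse than 25%.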
25. Cache data critical to performance: tens of millions of pages; pages are dynamic (always publishing); data on the same topic from different data sources is unified onto one page
31. Lose a memcached node? Lose 25% of cached data (or more); 2-20 additional MySQL queries to retrieve the metadata and data needed to construct a page; heavy load on MySQL servers; longer page load times; site slows down
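A minimal sketch of why a cold cache hurts: on a miss, the page has to be rebuilt from several database queries before it can be cached again. The dict standing in for memcached, the three lambda "queries", and the key format are all hypothetical stand-ins, not the Answers.com code.

```python
from typing import Callable, Dict, List, Tuple

def get_page(topic: str,
             cache: Dict[str, str],
             queries: List[Callable[[str], str]]) -> Tuple[str, int]:
    """Return (page, number of DB queries issued) using a cache-aside read."""
    key = f"page:{topic}"
    if key in cache:
        return cache[key], 0                  # hit: no database work
    parts = [q(topic) for q in queries]       # miss: run every source query
    page = "\n".join(parts)
    cache[key] = page                         # repopulate for the next reader
    return page, len(parts)

# Hypothetical data sources feeding one topic page.
queries = [
    lambda t: f"definition of {t}",
    lambda t: f"encyclopedia entry for {t}",
    lambda t: f"related topics for {t}",
]

cache: Dict[str, str] = {}
first = get_page("memcached", cache, queries)    # cold: 3 queries
second = get_page("memcached", cache, queries)   # warm: 0 queries
cache.clear()                                    # simulate losing the node
third = get_page("memcached", cache, queries)    # cold again: 3 queries
```

With millions of pages on the failed node, each first request after the failure pays this rebuild cost against MySQL.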
34. Handle cache data restore when node becomes available again. Also: address memcached slab issues
35. Alternative solution, Gear6: built-in redundancy function (pair of boxes); mirroring (writes *everything* to both nodes); 200GB Memcached space per box (RAM+SSD); uses standard VIP mechanism; graceful failover (no impact); automatic cache sync on node recovery; auto fail-back and rebalance VIP; no code changes necessary
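The mirroring and resync behaviour listed above can be sketched as a toy model. This is not Gear6's implementation: the class, its method names, and the choice to empty a failed node's store are assumptions made only to show why a mirrored pair survives a single failure.

```python
class MirroredCache:
    """Toy model of a mirrored cache pair behind one virtual address."""

    def __init__(self) -> None:
        self.nodes = {"a": {}, "b": {}}
        self.up = {"a": True, "b": True}

    def set(self, key: str, value: str) -> None:
        # Mirroring: every write goes to every node that is up.
        for name, store in self.nodes.items():
            if self.up[name]:
                store[key] = value

    def get(self, key: str):
        # Reads are served by whichever live node holds the key.
        for name, store in self.nodes.items():
            if self.up[name] and key in store:
                return store[key]
        return None

    def fail(self, name: str) -> None:
        # Assumption: the failed node comes back empty.
        self.up[name] = False
        self.nodes[name].clear()

    def recover(self, name: str) -> None:
        # Cache sync on recovery: copy the peer's contents before serving.
        peer = "b" if name == "a" else "a"
        self.nodes[name] = dict(self.nodes[peer])
        self.up[name] = True

pair = MirroredCache()
pair.set("page:1", "cached page one")
pair.fail("a")
after_failure = pair.get("page:1")     # still served by node b
pair.set("page:2", "cached page two")  # written while a is down
pair.recover("a")
pair.fail("b")
after_failback = (pair.get("page:1"), pair.get("page:2"))
```

Because the client only ever talks to the pair through one address, this kind of failover needs no application changes, which matches the "no code changes necessary" point on the slide.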
38. Value-add with Gear6 memcached: improve app reliability; ensure no SPOF at memcached layer; scale up our database infrastructure safely; significantly reduce TCO by decommissioning 20 servers; save 6 man-months of custom memcached wrapper development
39. Gear6 value-add, cont'd: insight into memcached performance; GUI helped us troubleshoot app issues; first-class support (even at 4:00am!); new in v3.0: ability to dynamically resize memcached instances with no cache loss!!
41. Controlling DB slave server clusters: maintenance required sending a command to the load balancer to disable a slave node in the cluster; on DB node failure, thousands of queries were lost before monitoring noticed and issued the LB command. Solution: install lighty (lighttpd) on every DB server
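The slide does not say what the lighttpd instance serves. One common pattern, assumed here, is a small HTTP health check that the load balancer polls, so that a node drops out of rotation when its own check fails rather than waiting for external monitoring. The function below sketches only the decision logic; its name, the maintenance flag, and the 30-second lag threshold are hypothetical.

```python
from typing import Optional

def health_status(in_maintenance: bool,
                  replication_running: bool,
                  lag_seconds: Optional[int],
                  max_lag_seconds: int = 30) -> int:
    """Return the HTTP status a health-check URL could answer with.

    200 tells the load balancer to keep sending queries to this slave;
    503 tells it to take the slave out of rotation.
    """
    if in_maintenance:
        return 503          # operator pulled the node for maintenance
    if not replication_running:
        return 503          # slave is no longer following the master
    if lag_seconds is None or lag_seconds > max_lag_seconds:
        return 503          # data too stale to serve
    return 200

healthy = health_status(False, True, 2)
draining = health_status(True, True, 0)
broken = health_status(False, False, None)
lagging = health_status(False, True, 120)
```

Under this assumption, maintenance becomes a local action on the DB server (flip the flag) instead of a command sent to the load balancer.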
62. Web 2.0 Needs: Data Acceleration. Hierarchy of Needs: Survive. 1. Accelerated data, i.e. CACHE = Survival! [Diagram: web stack showing Clients, Internet, CDN, Proxy, LoadBalancer, Web, App, Data Cache, RDBMS and Storage, with network and storage interfaces]
63. Web 2.0 Needs: Memcached. In the LAMP/Java/Ruby world, Data Cache = Memcached. [Diagram: the same web stack with Memcached in the data cache position]
64. Scaling Needs: More than Memcached Memcached = Survival Is there more to life?
65. Scaling Needs: More than Memcached Sharing? Ops? Query? Persistence? Acceleration
66. Acceleration: Basic Services. Foundation acceleration services for Gear6 Memcached: memory on-boarding; hybrid storage (DRAM and flash); clustering; replication; elastic pool sizing; data management; enhanced slab management; enhanced key space management. These basic features have been built into the Gear6 Memcached Server.
67. Gear6: Beyond Acceleration. Persistence: local or in-network snapshots; rapid restore and cache warm-up; offline analytics; primary store for unstructured datasets. Query: native Memcached; regex matching (key-based, value-based, K-V based); deletes (K, V, and K-V matching delete operations). http://www.gear6.com/memcached-resources/query [Diagram labels: Operations, Persistence, Query; Available Today]
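To make the key-based, value-based, and K-V matching deletes concrete, here is a sketch over a plain Python dict. It does not use or reproduce the Gear6 query syntax; the function name and parameters are hypothetical and only illustrate the three matching modes.

```python
import re
from typing import Dict, Optional

def delete_matching(store: Dict[str, str],
                    key_pattern: Optional[str] = None,
                    value_pattern: Optional[str] = None) -> int:
    """Delete entries whose key and/or value match the given regexes.

    Key only -> K match, value only -> V match, both -> K-V match.
    Returns the number of entries removed.
    """
    if key_pattern is None and value_pattern is None:
        raise ValueError("give at least one pattern")
    key_re = re.compile(key_pattern) if key_pattern else None
    value_re = re.compile(value_pattern) if value_pattern else None
    doomed = [
        k for k, v in store.items()
        if (key_re is None or key_re.search(k))
        and (value_re is None or value_re.search(v))
    ]
    for k in doomed:
        del store[k]
    return len(doomed)

store = {
    "page:1": "draft about cats",
    "page:2": "published about dogs",
    "user:1": "draft profile",
}
# K-V match: only page entries whose value mentions "draft".
removed = delete_matching(store, key_pattern=r"^page:", value_pattern=r"draft")
remaining = sorted(store)
```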
68. Gear6: The near term. Operations: advanced atomic operations based on integration with Redis; Memcached interface; elements: strings, sets and lists; atomic support: push, pop, add, remove; unions, intersections, diffs. Sharing: cloud-to-origin synchronization services; multi-site replication; "Dynamic Data Bus". [Diagram labels: Ops, Sharing]
69. Gear6: Beyond Memcached. [Diagram: the same web stack with Gear6 in the data cache position]
71. KV operation store for unstructured data sets. [Diagram: the same web stack with Gear6 alongside the load balancer]
72. More Information & Resources Gear6 products available as: Cloud images: free and paid on Amazon and GoGRID DEB and RPM packages: free and paid Dedicated data center server images: paid HP solutions More resources: http://www.gear6.com/memcached-resources/query http://www.gear6.com/memcached-resources/export http://www.gear6.com/memcached-resources/tools http://www.gear6.com/memcached-product/memcached
73. QUESTIONS? Thank You. Slides: http://tinyurl.com/mysqlconf2010-scaling-tips Presenters: Dan Marriott, Director - Production Operations, Answers.com; Joaquín Ruiz, EVP Products, Gear6
Editor's notes
These three are interrelated. Content is dynamic and personal. Take a step back… Many more people, and not just in the USA. Traffic (Cisco petabyte data): measured in petabytes per month. Applications: dynamic and personalized.
A lot of this population is coming into US-based social networking, gaming, media sites… distance means a latency issue. Proliferation of broadband drove users to the net. The USA is #33 in broadband (>2Mbps) and #16 in average speed. Miniwatts data shows the shift to Europe and Asia in particular. World internet usage is only now at 1.6b (24% according to the 50x15 site). Mobile is now adding rapidly to this population explosion. Akamai's State of the Internet report for Q1 2009 gives very insightful information on current data trends.
Consumer is driving the traffic growth (3x that of business). >40% CAGR growth (4x in 4 years).
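As a quick check on the "4x in 4 years" figure, compounding exactly 40% per year for four years gives a little under 4x, so the round number assumes a rate slightly above 40%:

```latex
(1 + 0.40)^{4} = 1.4^{4} = 3.8416 \approx 4
```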
How do sites make the same software stack go faster?
Why we are here again: a scalable stack framework for LAMP and Java had (and has) to emerge in order to accelerate the stack and provide distribution services.
100ms of latency is introduced by traversing the internet to the origin site. Typically +0.1 secs means a 2-4% DROP in revenue.
Maslow's Hierarchy Of Needs
Increases revenue via being able to economically scale and support more members, services and transactions. Scalable stack framework delivered as a network service addressable by all stack components. Onramp information into memory ASAP. Memcached has emerged as the de facto protocol to load information into memory, BUT a lot more is needed. Memcached ++: focused on caching functions; advanced network-based distributed services; non-relational functions.
Web 2.0 is driving user and traffic growth. Key apps: entertainment, communication, social networking. Key drivers: mobile, broadband, international, consumer. Mobile applications: 3G+ is 22% by 2010 (mobile goes high speed). Consumer Internet Traffic 2006–2012: this category encompasses any IP traffic that crosses the Internet and is not confined to a single service provider's network. Peer-to-peer (P2P) traffic, still the largest share of Internet traffic today, will decrease as a percentage of overall Internet traffic. Internet video streaming and downloads are beginning to take a larger share of bandwidth, and will grow to nearly 50 percent of all consumer Internet traffic in 2012.
We talked about "Universal Distro" and EC2 before. Gear6 Web Cache software requirements: certified on HP, Dell, Rackable…; requires a dedicated server; redundant GE ports or 10 GE ports; 64-bit (x86-64) server with 32GB or more of RAM; IDE/SATA HD with 80GB+; option: 2.5" SATA drive bays for flash SSDs (Intel X25-E or Samsung). Gear6 Web Cache delivers: high availability and consistent, scalable high performance; 50k-200k Memcached ops per sec; 32-300+GB Memcached space per rack unit; fault tolerance; complete cache control and visibility of resources; a cost-effective mechanism for scaling database and application tiers.