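16. Memcache
Why: to meet the requirements of fast upload and synchronization of real-time live video information, and of ad serving.
What: when a service request arrives, look the item up in memory.
If it hits, return the response.
Else look it up in memcache:
  If the cache hits, return the response, and also add the item to memory.
  Else if the cache entry is currently being built, return.
  Else the cache entry is missing: return, and start an asynchronous OLTP lookup.
    If the DB hits, add the item to both the cache and memory.
    Else (DB miss), add an item placeholder to the cache and give it an expiry time.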
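The lookup order above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original Adserver code: `ExpiringDict` is a hypothetical in-process stand-in for memcached, `start_async_db_lookup` is a hypothetical hook that starts the asynchronous OLTP query, and the TTL values are placeholders. The "cache entry is being built" branch is not modeled here.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical marker meaning "the DB has no such item".
PLACEHOLDER = "__no_such_item__"


class ExpiringDict:
    """Stand-in for memcached: each value expires after a TTL (seconds)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[object, float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[object]:
        item = self._data.get(key)
        if item is None or item[1] <= self._clock():
            return None  # absent or expired
        return item[0]

    def set(self, key: str, value: object, ttl: float) -> None:
        self._data[key] = (value, self._clock() + ttl)


class TieredLookup:
    def __init__(self, cache: ExpiringDict,
                 start_async_db_lookup: Callable[[str], None]) -> None:
        self.ram: Dict[str, object] = {}  # in-process memory (never evicted in this sketch)
        self.cache = cache
        self.start_async_db_lookup = start_async_db_lookup

    def lookup(self, key: str) -> Optional[object]:
        if key in self.ram:                  # memory hit
            return self.ram[key]
        value = self.cache.get(key)
        if value == PLACEHOLDER:             # known DB miss, placeholder not yet expired
            return None
        if value is not None:                # cache hit: also copy into memory
            self.ram[key] = value
            return value
        self.start_async_db_lookup(key)      # cache miss: return now, fetch in background
        return None

    def on_db_result(self, key: str, value: Optional[object],
                     ttl: float = 300.0, placeholder_ttl: float = 30.0) -> None:
        if value is not None:                # DB hit: populate cache and memory
            self.cache.set(key, value, ttl)
            self.ram[key] = value
        else:                                # DB miss: cache a short-lived placeholder
            self.cache.set(key, PLACEHOLDER, placeholder_ttl)
```

The expiring placeholder acts as a negative cache: a video the DB does not know about stops triggering DB queries until the placeholder expires.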
17. Ad request comes in.
Look up the video in RAM.
3'. Video is in memory: return the response.
3. Video is not in memory: redirect to the /ax/ interface.
Asynchronously look up the video in the cache.
5, 6. Query memcached for the video. The result may be: a) NULL; b) a hit, but the data is still being loaded by some adserver; c) a hit with the data available.
7'. Return the response.
7. If the cache misses, return.
8. Set the video with status = loading.
9. Invoke a thread to fetch the video from the OLTP DB.
10, 11. Query the video and get the result from the DB.
12, 13. Set the video with status = available, and store its data in both the cache and memory.
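Steps 8 and 9 depend on exactly one adserver claiming a missing video. A minimal sketch, assuming the claim is made with an add-if-absent operation in the spirit of memcached's `add` command; `SharedCache`, `AdServer` and `query_oltp` are hypothetical names, and the /ax/ redirect in step 3 is not modeled. Step numbers in the comments refer to the list above.

```python
import threading
from typing import Callable, Dict, List, Optional, Tuple

LOADING, AVAILABLE = "loading", "available"
Entry = Tuple[str, Optional[object]]  # (status, data)


class SharedCache:
    """In-process stand-in for memcached. add() stores only if the key is
    absent, so exactly one caller can claim a key."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Dict[str, Entry] = {}

    def get(self, key: str) -> Optional[Entry]:
        with self._lock:
            return self._data.get(key)

    def add(self, key: str, entry: Entry) -> bool:
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = entry
            return True

    def set(self, key: str, entry: Entry) -> None:
        with self._lock:
            self._data[key] = entry


class AdServer:
    def __init__(self, cache: SharedCache, query_oltp: Callable[[str], object]) -> None:
        self.mem: Dict[str, object] = {}
        self.cache = cache
        self.query_oltp = query_oltp               # hypothetical OLTP accessor
        self.loaders: List[threading.Thread] = []  # kept so callers can join()

    def handle(self, video_id: str) -> Optional[object]:
        if video_id in self.mem:                   # 3': video in memory
            return self.mem[video_id]
        entry = self.cache.get(video_id)           # 5, 6: query memcached
        if entry is not None:
            status, data = entry
            if status == AVAILABLE:                # c) hit, data available (7')
                self.mem[video_id] = data
                return data
            return None                            # b) another adserver is loading it
        # a) NULL: claim the key with status = loading (8); only one server succeeds
        if self.cache.add(video_id, (LOADING, None)):
            t = threading.Thread(target=self._load, args=(video_id,))  # 9
            self.loaders.append(t)
            t.start()
        return None                                # 7: return without waiting for the DB

    def _load(self, video_id: str) -> None:
        data = self.query_oltp(video_id)              # 10, 11: query the DB
        self.cache.set(video_id, (AVAILABLE, data))   # 12: cache, status = available
        self.mem[video_id] = data                     # 13: memory
```

As written, a failing DB query would leave the video stuck in `loading`; giving that marker an expiry, as slide 16 does for its placeholder, would let another server retry later.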
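18. Dot Server
Why: a high-availability protection scheme for the Adserver, to cope with sudden traffic peaks and crashes caused by bugs. A bug that triggers a crash often takes down every application server.
What: each server maintains a queue of requests waiting to be served. When the queued requests exceed a threshold, or when a particular user request triggers a specific bug that brings down the cluster machine after machine, subsequent requests are redirected to a logic-free Server Farm, which returns a cached, ad-free standard response. This avoids failing the external service as far as possible.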
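The Dot Server switch can be sketched as a router in front of the ad logic. A minimal sketch under stated assumptions: the fallback Server Farm is reduced to returning a constant `CACHED_NO_AD_RESPONSE` (hypothetical), and "a request triggers a crash bug" is modeled as the ad logic raising an exception. `DotServerRouter` and its methods are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, Optional

# Hypothetical cached, ad-free standard output served by the fallback path.
CACHED_NO_AD_RESPONSE = "no-ad"


class DotServerRouter:
    def __init__(self, ad_logic: Callable[[str], str], max_pending: int) -> None:
        self.ad_logic = ad_logic
        self.max_pending = max_pending
        self.pending: Deque[str] = deque()
        self.ad_logic_down = False  # tripped once the ad logic has crashed

    def enqueue(self, request: str) -> Optional[str]:
        """Queue a request for the ad logic, or answer at once from the fallback."""
        if self.ad_logic_down or len(self.pending) >= self.max_pending:
            return CACHED_NO_AD_RESPONSE
        self.pending.append(request)
        return None

    def process_next(self) -> Optional[str]:
        """Serve one queued request; a crash in the ad logic trips the fallback."""
        if not self.pending:
            return None
        request = self.pending.popleft()
        if self.ad_logic_down:
            return CACHED_NO_AD_RESPONSE
        try:
            return self.ad_logic(request)
        except Exception:
            self.ad_logic_down = True  # stay on the fallback path until reset
            return CACHED_NO_AD_RESPONSE
```

Once tripped, this router never switches back on its own; in the sketch, resetting `ad_logic_down` is left to an operator after the bug is fixed.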
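19. Stateless log processing
Main design considerations for log processing:
  Reduce log volume, scale easily, and avoid defining our own format and writing a parser
    Text log -> binary log, using Google Protocol Buffers
  Achieve high throughput with fewer machines: ideally close to disk I/O speed
    Partition log collection: a node only processes the logs of a fixed group of ad servers
    Minimize the log content that has to be exchanged during processing
    MapReduce moved from Hadoop/Java to C++
    Aggregation moved from Python to C++
  Reprocessing: after an interruption, simply start over from the beginning instead of resuming by hand
Keys to the stateless design:
  Each log entry itself contains all the transaction context and the information needed for callbacks
  Unchanging metadata is extracted from OLTP by a Pusher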
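The stateless design can be illustrated with a small partitioned aggregation. This sketch uses a Python dataclass in place of the Protocol Buffers binary format and Python in place of the C++ jobs described above; `AdLogEntry`, `node_for`, `aggregate`, the field names and the modulo partitioning are all hypothetical choices.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping, Tuple


@dataclass(frozen=True)
class AdLogEntry:
    # Self-contained: everything needed to account for the transaction is in
    # the entry itself, so processing needs no session state. Hypothetical fields.
    adserver_id: int
    video_id: str
    ad_id: str
    event: str  # e.g. "impression" or "click"


def node_for(adserver_id: int, num_nodes: int) -> int:
    """Fixed partitioning: a given adserver's logs always go to the same node."""
    return adserver_id % num_nodes


def aggregate(entries: Iterable[AdLogEntry], node: int, num_nodes: int,
              campaign_of: Mapping[str, str]) -> Dict[Tuple[str, str, str], int]:
    """Count events in this node's partition.

    `campaign_of` is a read-only snapshot of unchanging metadata (the role the
    Pusher plays). Nothing is carried between runs, so after an interruption
    the job can simply be rerun from the beginning.
    """
    counts: Counter = Counter()
    for e in entries:
        if node_for(e.adserver_id, num_nodes) == node:
            counts[(campaign_of[e.ad_id], e.video_id, e.event)] += 1
    return dict(counts)
```

Because each node sees only its own adservers and each entry carries its own context, nodes never need to exchange intermediate state, and a rerun produces the same totals.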
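26. Long tail roll up & Data Availability
A simple calculation: 100,000 videos × 100 sites × 100 countries × 1,000 ads = 1,000,000,000,000 rows/day. Impossible!
Internet video popularity follows a typical "long tail" distribution:
  The most popular 5% of videos account for 50% of traffic
  Computing every metric at every dimension and granularity for every video has too low an ROI
Long Tail Roll up
  Videos whose daily traffic is below a threshold are rolled up into a single item in the DB
  A separate Long tail table keeps coarse-grained statistics for long-tail videos
Data Availability
  For customers who don't need metrics at certain granularities, the related dimensions are not computed
  Design product features carefully: once released they are very hard to withdraw, and they become a maintenance burden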
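A minimal sketch of the roll-up, assuming rows keyed by (video, site, country, ad) with daily counts. Here tail videos collapse into one shared video item `LONG_TAIL` (hypothetical) while keeping the other dimensions, and the long-tail table keeps only each tail video's daily total; the exact layout of the original tables is not given in the slides.

```python
from collections import defaultdict
from typing import Dict, Tuple

Row = Tuple[str, str, str, str]   # (video, site, country, ad)
LONG_TAIL = "__long_tail__"       # hypothetical ID of the rolled-up video item


def roll_up(rows: Dict[Row, int], threshold: int) -> Tuple[Dict[Row, int], Dict[str, int]]:
    # Daily traffic per video decides head vs. tail.
    per_video: Dict[str, int] = defaultdict(int)
    for (video, _, _, _), n in rows.items():
        per_video[video] += n

    main: Dict[Row, int] = defaultdict(int)
    for (video, site, country, ad), n in rows.items():
        if per_video[video] >= threshold:
            main[(video, site, country, ad)] += n      # head video: full granularity
        else:
            main[(LONG_TAIL, site, country, ad)] += n  # tail video: one shared item

    # Coarse-grained long-tail table: daily total per tail video only.
    long_tail = {v: total for v, total in per_video.items() if total < threshold}
    return dict(main), long_tail
```

The video dimension of the main table then grows with the number of head videos plus one, rather than with all videos, while total traffic is preserved.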
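36. Operating principles: 50% ceiling & N+1 Data Center
When planning capacity expansion for any subsystem, cap the projected ceiling at 50% load (a rule of thumb).
  The Adserver reserves 50% of its capacity for peaks, e.g. breaking news or a World Cup final.
  Back-end log processing finishes within 50% of the time allowed before the customer's required data release, leaving room to redo the whole run once if something unexpected goes wrong.
  When business growth pushes average system load above 50%, that is the signal to expand capacity!
N+1 Data Center
  Data centers in different geographic locations
  Backup ISP, backup CDN
  If one DC suffers an unexpected outage, the other N can still carry the normal load.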
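The 50% rules and the N+1 requirement reduce to simple checks. A minimal sketch, assuming load and capacity are expressed in the same units (e.g. requests/s) and reading "the other N can carry the load" as "the remaining capacity covers peak demand"; the function names are hypothetical.

```python
from typing import Sequence

LOAD_CEILING = 0.5  # the 50% rule of thumb


def needs_expansion(avg_load: float) -> bool:
    """Average load above 50% of capacity is the signal to add capacity."""
    return avg_load > LOAD_CEILING


def batch_fits_window(processing_hours: float, deadline_hours: float) -> bool:
    """Log processing should take at most half the time before the data is due,
    so a full rerun can still meet the deadline."""
    return processing_hours <= LOAD_CEILING * deadline_hours


def survives_one_dc_loss(dc_capacities: Sequence[float], peak_demand: float) -> bool:
    """N+1 check: losing any single data center leaves enough capacity for peak demand."""
    total = sum(dc_capacities)
    return all(total - c >= peak_demand for c in dc_capacities)
```

For example, three data centers of 100 units each can lose any one and still serve a peak of 200, but a 200/100/100 split cannot serve a peak of 250 if the large site fails.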
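40. A few words on testing
Automated regression testing is the core
  TDD unit testing was not used
Stick to Local test -> Integration test -> Regression test -> Staging
Pre-release checklist for every upgrade:
  New Feature Function Test
  UI/Core/VI Integration Test
  Regression Case Suite
  Memory Leak Check
  Performance Test
  Compatibility Test
  Post Release Live Check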
45. Links on distributed systems and scaling web services
Blogs
  Nati Shalom's Blog: Discussions about middleware and distributed technologies
    http://natishalom.typepad.com/nati_shaloms_blog/
  All Things Distributed: Werner Vogels' weblog on building scalable and robust distributed systems
    http://www.allthingsdistributed.com/
  High Scalability: Building bigger, faster, more reliable websites
    http://highscalability.com/
  ProductionScale: Information Technology, Scalability, Technology Operations, and Cloud Computing
    http://www.productionscale.com/
  iamcal.com (the "talks" section is particularly interesting)
    http://www.iamcal.com/
  Kitchen Soap: Thoughts on capacity planning and web operations
    http://www.kitchensoap.com/
  MySQL Performance Blog: Everything about MySQL Performance
    http://www.mysqlperformanceblog.com/
46. Presentations
  Scalable Internet Architectures
    http://www.slideshare.net/shiflett/scalable-internet-architectures
  How to build the Web
    http://www.slideshare.net/simon/how-to-build-the-web
  Netlog: What we learned about scalability & high availability
    http://www.slideshare.net/folke/netlog-what-we-learned-about-scalability-high-availability-430211
  Database Sharding at Netlog
    http://www.slideshare.net/oemebamo/database-sharding-at-netlog-presentation
  MySQL 2007 Tech At Digg V3
    http://www.slideshare.net/epee/mysql-2007-tech-at-digg-v3
  Flickr and PHP
    http://www.slideshare.net/coolpics/flickr-44054
  Scalable Web Architectures: Common Patterns and Approaches
    http://www.slideshare.net/techdude/scalable-web-architectures-common-patterns-and-approaches
  How to scale your web app
    http://www.slideshare.net/Georgio_1999/how-to-scale-your-web-app
  Google Cluster Innards
    http://www.slideshare.net/ultradvorka/google-cluster-innards
  Sharding Architectures
    http://www.slideshare.net/guest0e6d5e/sharding-architectures