Presentation given at DreamIt Fall '11 Philly class. Build a smart app using new technologies that allow you to scale more easily down the road while keeping maintenance and costs low.
2. Who Am I? I Am You! Just fast forward 4 yrs, add luck & late nights Combination of Biz & Tech Worked on high velocity web apps 2 Startups eVariant - MyHealthConnect.com Yellow Hammer Media Geo Marketing Technology
4. Build It Smart Ingredients Low Cost Scalable Horizontal Vertical HA (High Availability) Maintainable GeoSmart or GSLB (Global Server Load Balancing) Multiple Layers working Together DNS, File Serving, CDN, etc Don’t Implement just Plan for it (Procrastination can be a good thing)
5. Shiny Balls Don’t be Distracted by Shiny Balls Don’t use it just because it’s cool Focus on what you know Leverage Experts If all else fails, build it fast
6. Web Servers IIS, Apache Rule w/ 80% But they are overhead pigs c10K Problem Event Driven / Newer Web Servers Node.js Nginx (lighttpd) Twisted (Tornado)
7. Web Servers - Node.js Created by Ryan Dahl in 2009 (Sponsored by Joyent) Non Blocking, Event Driven (Not just a web server) Single Threaded (use Cluster, Monitoring is a pain) JavaScript based Use Cases Push/Streaming Notification Stacks – Socket IO (SocketStream) Super Fast Simple Apps Ex: Pixel Tracking (Click Redirect, Conversion, UI in Rails/PHP) c10K Problem is not a problem Don’t Use It For Complex Apps Regular Web Server
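A rough sketch of the pixel-tracking use case on a single-threaded event loop. It is written with Python's asyncio rather than Node.js, so it only illustrates the non-blocking model the slide describes; the `/px?ad=...` path, the in-memory `hits` list, and the function names are assumptions made for illustration.

```python
import asyncio

# A transparent 1x1 GIF returned for every tracking request.
PIXEL_GIF = (
    b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
    b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00"
    b"\x00\x02\x02D\x01\x00;"
)

hits = []  # hypothetical in-memory log; a real tracker would write to a datastore


def build_response(body: bytes) -> bytes:
    """Build a minimal HTTP 200 response carrying the GIF."""
    headers = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: image/gif\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n\r\n"
    )
    return headers.encode("ascii") + body


async def handle(reader, writer):
    """Record the requested path, then answer with the pixel."""
    request_line = await reader.readline()
    # Skip the remaining request headers up to the blank line.
    while True:
        line = await reader.readline()
        if line in (b"\r\n", b"\n", b""):
            break
    parts = request_line.decode("latin-1").split()
    hits.append(parts[1] if len(parts) >= 2 else "/")
    writer.write(build_response(PIXEL_GIF))
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def fetch(port, path):
    """Tiny test client: request one pixel and return the raw response."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(f"GET {path} HTTP/1.1\r\nHost: localhost\r\n\r\n".encode())
    await writer.drain()
    data = await reader.read()  # server closes the connection, so read to EOF
    writer.close()
    await writer.wait_closed()
    return data


async def demo(n=50):
    """Serve n concurrent pixel requests from one thread."""
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await asyncio.gather(
            *(fetch(port, f"/px?ad={i}") for i in range(n))
        )


if __name__ == "__main__":
    responses = asyncio.run(demo())
    print(len(responses), "pixels served,", len(hits), "hits logged")
```

Every connection is a coroutine that yields while waiting on the socket, which is why one thread can keep many connections open at once.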
8. Web Servers - Other Nginx c10k is not a problem, Single Threaded like Node Easy to Use 7.65% of the Web Great for Content Serving (CDNs), Load Balancing Alt to Apache to reduce memory and overhead Twisted / Tornado Event Driven Python frameworks, comparable to Ruby’s EventMachine Great for High C10k, Performance Needs Long Polling HTTP Streaming
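Long polling can be sketched without any web framework: the client's request stays parked until a message arrives or a timeout expires. The `Channel` class and its method names below are hypothetical, and the example uses plain asyncio instead of Twisted or Tornado.

```python
import asyncio


class Channel:
    """Hypothetical message channel that a long-poll handler would wait on."""

    def __init__(self):
        self._queue = asyncio.Queue()

    async def publish(self, message):
        await self._queue.put(message)

    async def poll(self, timeout):
        """Return the next message, or None if nothing arrives in time."""
        try:
            return await asyncio.wait_for(self._queue.get(), timeout)
        except asyncio.TimeoutError:
            return None  # the client would simply issue a new poll


async def demo():
    channel = Channel()

    async def publish_later():
        await asyncio.sleep(0.05)
        await channel.publish("price update")

    task = asyncio.create_task(publish_later())
    first = await channel.poll(timeout=1.0)    # parked until the publish
    second = await channel.poll(timeout=0.05)  # nothing pending: times out
    await task
    return first, second


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

While a poll is parked it costs no thread, only a suspended coroutine, which is what makes event-driven servers a good fit for this pattern.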
10. Databases - MongoDB NoSQL version of MySQL Fast, Scalable, HA, MapReduce Tons of Community Support Use Cases Capped Collections – Great for Event logging Async Transactions – Great for Analytics Don’t Use it For Large data sets – 100-200GB+ (consider Vertica, Greenplum, Netezza, Teradata instead)
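To show why capped collections suit event logging, here is an in-memory analogy only, not the MongoDB API: a fixed-size buffer that keeps insertion order and discards the oldest entry once full. A real capped collection is bounded by bytes on disk (optionally also by document count); the `EventLog` name and the limit of 3 are assumptions for the example.

```python
from collections import deque


class EventLog:
    """In-memory stand-in for a capped collection, bounded by entry count."""

    def __init__(self, max_events):
        # deque(maxlen=...) drops the oldest item when a new one overflows it.
        self._events = deque(maxlen=max_events)

    def insert(self, event):
        self._events.append(event)

    def find(self):
        """Return events in insertion order, oldest first."""
        return list(self._events)


if __name__ == "__main__":
    log = EventLog(max_events=3)
    for i in range(5):
        log.insert({"seq": i, "type": "click"})
    print([e["seq"] for e in log.find()])
```

The log never grows past its limit and never needs a cleanup job, which is the property that makes the pattern attractive for high-volume event streams.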
11. Databases – Vertica Developed by Michael Stonebraker Big Data – Tera or Peta Use Cases Lots of data, need HA on cheap machines Correlation queries (Zynga) Swallow vast data sets quickly (5 TB in 1 hour) History It's Bad At OLTP
12. Databases – Cassandra Developed by Facebook, Now Apache Project NoSQL & MoSQL CQL Use Cases Counters Advanced Replication (GeoSmart) Super Fast Memory to Disk It's Bad At OLTP Relational Data
13. Databases – VoltDB Created by Michael Stonebraker NewSQL? – Fast Relational Data Relational in Memory lightning fast HA, Shared Nothing, Self Healing 1.6 million transactions /sec on commodity servers Use Cases Real Time Analytics OLTP Schemas that don’t change Don’t Use It for Schemas that change often Complicated replicated Large data sets
15. Cloud Serving / DNS AWS, Rackspace, Linode, Firehost I like AWS EC2 (Auto Scaling) Elastic MapReduce S3, CloudFront Databases – RDS, SimpleDB ElastiCache Elastic Beanstalk & CloudFormation Route 53 (not UltraDNS) Dyn $30 - $200 / month GSLB Round Robin / Fail Over
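The round robin and fail over behaviour of a GSLB service might be sketched as follows. This is a simplified model, not how Dyn or Route 53 is implemented: the `Resolver` class, its methods, and the documentation-range IP addresses are all assumptions for illustration.

```python
class Resolver:
    """Toy DNS answer selector: rotate primaries, fall back to backups."""

    def __init__(self, primary, backup):
        self.primary = list(primary)
        self.backup = list(backup)
        self.healthy = set(self.primary) | set(self.backup)
        self._counter = 0  # advances on every answer to rotate the pool

    def mark_down(self, ip):
        """Called by a (hypothetical) health checker when a host fails."""
        self.healthy.discard(ip)

    def mark_up(self, ip):
        self.healthy.add(ip)

    def resolve(self):
        """Return the next healthy primary, or a backup if none is left."""
        for pool in (self.primary, self.backup):
            alive = [ip for ip in pool if ip in self.healthy]
            if alive:
                ip = alive[self._counter % len(alive)]
                self._counter += 1
                return ip
        raise RuntimeError("no healthy endpoints")


if __name__ == "__main__":
    resolver = Resolver(
        primary=["192.0.2.10", "192.0.2.11"],  # e.g. first region
        backup=["198.51.100.10"],              # e.g. second region
    )
    print([resolver.resolve() for _ in range(3)])
    resolver.mark_down("192.0.2.10")
    resolver.mark_down("192.0.2.11")
    print(resolver.resolve())
```

In a real deployment the health checks and the rotation happen inside the DNS provider, so the application only needs to expose an endpoint the provider can probe.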
16. GeoMarketing App w/ Real Time Reports 1 Ad can produce 130 Million Inserts / day Running on EC2 Poor I/O & Network Tried MySQL (NDB), Cassandra, Redis, Riak Solution VoltDB stores 2 hours of data Dump stats to MySQL Dump rows out to Vertica (BigData) Reporting – Aggregate query of Volt & MySQL
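The shape of that solution can be sketched in a few lines. 130 million inserts per day averages out to roughly 1,500 writes per second, which is why only a short window is kept in the fast store. The in-memory structures below merely stand in for VoltDB, MySQL, and Vertica; the class, method names, and row layout are assumptions, and rows are assumed to arrive in timestamp order.

```python
from collections import Counter, deque

WINDOW_SECONDS = 2 * 60 * 60  # the slide's "2 hours of data"

# Average write rate implied by 130 million inserts per day.
INSERTS_PER_DAY = 130_000_000
AVG_INSERTS_PER_SECOND = INSERTS_PER_DAY / 86_400  # roughly 1,505 per second


class Pipeline:
    def __init__(self):
        self.hot = deque()      # stand-in for VoltDB: recent (timestamp, ad_id) rows
        self.stats = Counter()  # stand-in for MySQL: rolled-up counts per ad
        self.archive = []       # stand-in for Vertica: raw rows kept for history

    def insert(self, timestamp, ad_id):
        """Write path: every event lands in the hot store first."""
        self.hot.append((timestamp, ad_id))

    def roll(self, now):
        """Move rows older than the window into stats and the archive."""
        cutoff = now - WINDOW_SECONDS
        while self.hot and self.hot[0][0] < cutoff:
            row = self.hot.popleft()
            self.stats[row[1]] += 1
            self.archive.append(row)

    def report(self):
        """Reporting: combine rolled-up stats with rows still in the hot store."""
        totals = Counter(self.stats)
        for _, ad_id in self.hot:
            totals[ad_id] += 1
        return dict(totals)


if __name__ == "__main__":
    pipeline = Pipeline()
    pipeline.insert(0, "ad-a")
    pipeline.insert(100, "ad-a")
    pipeline.insert(8000, "ad-b")
    pipeline.roll(now=7300)
    print(pipeline.report(), len(pipeline.archive))
```

Keeping the report as a merge of two sources means the hot store can stay small while totals remain complete.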