Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.
Designing a scalable twitter<br />Nati Shalom,<br />CTO & Founder GigaSpaces<br />John D. Mitchell<br />Mad Scientist of F...
About GigaSpaces Technologies<br />Enabling applications to run a distributed cluster as if it was a single machine…<br />...
Introduction – Why twitter as an example<br />Everybody knows twitter <br />Twitter exposes really difficult scaling chall...
Why twitter is different then other web apps?<br />Twitter makes many to many relationship first class citizen<br />Twitte...
Scaling challenges 1 -  Fluctuating growth <br />Every user network (Following/Followers) grows constantly <br />Traffic t...
Scaling challenges 2 – The crowd effect<br />Re-tweet can lead to a perfect storm<br />Every Scaling has a physical limit<...
Designing a scalable twitter<br />
Approach 1 –  Use share nothing web scaling approach<br />Application<br />Application<br /><ul><li>Use load balancer and ...
Grow by adding more web containers</li></ul>Read Service<br />Read Service<br />Web<br />Data Base<br />Publish Service<br...
Reader/Publisher Service<br />namespace MicroBlog.Services<br />{<br />    public interface IReaderService<br />    {<br /...
Users<br />Approach 1 scalability challenge<br />Application<br />Application<br />Application<br />Application<br />Read ...
Reader – Database Implementation<br />public ICollection&lt;Post&gt; GetUserPosts(string userID, DateTime fromDate)<br /> ...
Eliminate database bottlenecks – a questions of alternatives<br />Solving the database bottleneck requires change<br />Dif...
The NOSQL movement<br />Distributed key/value store (#NOSQL) – <br />New hot trend, influenced by Google, Amazon..) <br />...
No correlation between failure rate and disk type - SCSI, SATA, or fiber channel
Lower temperatures are associated with higher failure rates</li></li></ul><li>NOSQL common principles<br />Assume that fai...
In-Memory vs Filesystem based approaches<br />In-memory <br />Pros:<br />Complementary – the database can be kept unchange...
Choosing the right solution for our twitter app<br />Performance analysis:<br />Twitter allows 150 req/hour per user<br />...
Scaling twitter usingSpace Based Architecture (SBA)<br />
What is Space Based Architecture (SBA)<br />Processing unit cloud: <br /><ul><li>Scale through Partitioning
Virtualized middleware</li></ul>What is a space:<br /><ul><li>Elegant – 4 API
Solves:
Data sharing
Messaging
Workflow
Nächste SlideShare
Wird geladen in …5
×

Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web Applications

8.418 Aufrufe

Veröffentlicht am

Twitter is a good example for next generation real-time web applications, but building such an application imposes challenges such as handling an every growing volume of tweets and responses, as well as a large number of concurrent users, who continually *listen* for tweets from users (or topics) they follow. During this session we will review some of the key design principles addressing these challenges, including alternatives *NoSQL* alternatives and blackboard patterns. We will be using Twitter as a use case, while learning how to apply these to any real-time we application

Veröffentlicht in: Technologie
  • Als Erste(r) kommentieren

Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web Applications

  1. 1. Designing a scalable twitter<br />Nati Shalom,<br />CTO & Founder GigaSpaces<br />John D. Mitchell<br />Mad Scientist of Friendster.<br />
  2. 2. About GigaSpaces Technologies<br />Enabling applications to run a distributed cluster as if it was a single machine…<br />75+ Cloud Customers<br />Among Top 50 Cloud Vendors<br />300+ Direct Customers<br />2<br />2<br />
  3. 3. Introduction – Why twitter as an example<br />Everybody knows twitter <br />Twitter exposes really difficult scaling challenges <br />Twitter can serve as a generic example for any real-time/social web application<br />If you can built a scalable twitter your already half way through.<br />
  4. 4. Why twitter is different then other web apps?<br />Twitter makes many to many relationship first class citizen<br />Twitter behaves like a living organism<br />Every post gets to number of followers <br />Every user follows number of users <br />Followers<br />Following<br />User<br />User<br />
  5. 5. Scaling challenges 1 - Fluctuating growth <br />Every user network (Following/Followers) grows constantly <br />Traffic tends to fluctuate randomly<br />Followers<br />Following<br />User<br />User<br />
  6. 6. Scaling challenges 2 – The crowd effect<br />Re-tweet can lead to a perfect storm<br />Every Scaling has a physical limit<br />Followers<br />User<br />
  7. 7. Designing a scalable twitter<br />
  8. 8. Approach 1 – Use share nothing web scaling approach<br />Application<br />Application<br /><ul><li>Use load balancer and stateless web server to load balance the web traffic
  9. 9. Grow by adding more web containers</li></ul>Read Service<br />Read Service<br />Web<br />Data Base<br />Publish Service<br />Publish Service<br />Users<br />Load Balancer<br />Web<br />
  10. 10. Reader/Publisher Service<br />namespace MicroBlog.Services<br />{<br /> public interface IReaderService<br /> {<br /> ICollection&lt;Post&gt; GetUserPosts(String userID);<br />ICollection&lt;Post&gt; GetUserPosts(String userID, DateTime fromDate);<br /> }<br />}<br />namespace MicroBlog.Services<br />{<br /> public interface IPublisherService<br /> {<br /> void PublishPost(Post post);<br /> }<br />}<br />
  11. 11. Users<br />Approach 1 scalability challenge<br />Application<br />Application<br />Application<br />Application<br />Read Service<br />Read Service<br />Read Service<br />Read Service<br />The database becomes the bottleneck<br />Web<br />Data Base<br />Publish Service<br />Publish Service<br />Web<br />Publish Service<br />Publish Service<br />Load Balancer<br />Web <br />Web<br />IIS<br />Web<br />
  12. 12. Reader – Database Implementation<br />public ICollection&lt;Post&gt; GetUserPosts(string userID, DateTime fromDate)<br /> {<br />// Create command:<br />IDbCommand dbCommand = _dbConnection.CreateCommand();<br /> dbCommand.CommandText = String.Format(<br />&quot;SELECT * FROM Post WHERE UserID=&apos;{0}&apos; AND PostedOn &gt; {1}&quot;,<br /> userID, fromDate);<br />// Execute command:<br />IDataReader dataReader = dbCommand.ExecuteReader();<br />// Translate results from db records to .NET objects:<br />List&lt;Post&gt; result = ReadPostsFromDataReader(dataReader);<br />// Return results:<br />return result;<br />}<br />
  13. 13. Eliminate database bottlenecks – a questions of alternatives<br />Solving the database bottleneck requires change<br />Difference approaches offers various degree of change<br />Database clusters- read-only databases<br />Requires significant data access design changes<br />Does not improve performance In Memory DataGrid – Change at the application data not the database<br />Memcache – Small change/Small value<br />fairly limited not transactional, doesn’t support SQL query, non clustered<br />Google App Engine Persistency (On top of Big Table)<br />Provides standard JPA access (Smaller change)<br />Many model limitations<br />Does not provide database integration into existing model<br />Application Partitioning (several different databases for the same application)<br />Requires significant data access design changes<br />Need to synchronize between the databases for transactional integrity<br />
  14. 14. The NOSQL movement<br />Distributed key/value store (#NOSQL) – <br />New hot trend, influenced by Google, Amazon..) <br />Use Distributed commodity HW then expensive HW<br />Designed for massive scale<br />Examples<br />FIlesystem based implementation<br />Amazon Dynamo/SimpleDB<br />Google Big Table<br />Casandra<br />Memory based implementation<br />GigaSpaces<br />Gemstone<br />IBM extremeScale<br />JBoss infinispan <br />Oracle Coherence<br />Did you know?<br /><ul><li>Disk failure/year - 3% vs. 0.5 - 0.9% estimated
  15. 15. No correlation between failure rate and disk type - SCSI, SATA, or fiber channel
  16. 16. Lower temperatures are associated with higher failure rates</li></li></ul><li>NOSQL common principles<br />Assume that failure is inevitable <br />Disk, machine, network will fail<br />Don’t avoid it (through costly HW) - cope with it<br />Reduce the impact of failure by:<br />Keeping multiple replicas <br />Distributing the data (partitioning)<br />CAP Theorem<br />Relax some of the consistency constrains – eventual consistency<br />Use Map/Reduce to handle aggregation<br />Parallel query<br />Execute the query close to the data<br />Available as Filesystem or In-Memory implementation<br />
  17. 17. In-Memory vs Filesystem based approaches<br />In-memory <br />Pros:<br />Complementary – the database can be kept unchanged <br />Designed for real time performance (Not limited to disk I/O)<br />Enable execution of the code with the data (stored procedure)<br /> Cons<br />Memory is more expensive then file-system (Cost of per GB storage)<br />File-System<br />Pros:<br />Can manage terra bytes of data<br />Low cost per GB storage<br />Cons<br />Fairly radical change -&gt; requires complete re-write<br />Best of both worlds<br />Memory based storage for real time access<br />Filesystem for long-term data<br />
  18. 18. Choosing the right solution for our twitter app<br />Performance analysis:<br />Twitter allows 150 req/hour per user<br />For 1M users that means 40,000 req/sec<br />Capacity <br />The actual data that needs to be accessed in real-time is only the window of time between interactions (Last 10/30 min should be more then sufficient)<br />The rest of the data can be stored in file-system storage (For users who just logged in)<br />Assuming a ratio of 20% writes and the size per post is 128 bytes:<br />Data produced = 40k*20% * 128 = 1Mb/Sec<br />We can keep a buffer of an hour in less then 5G<br />Solution: <br />Use In Memory Data Grid for the real-time access and files-ystem storage for long term, search and other analytics requirements<br />
  19. 19. Scaling twitter usingSpace Based Architecture (SBA)<br />
  20. 20. What is Space Based Architecture (SBA)<br />Processing unit cloud: <br /><ul><li>Scale through Partitioning
  21. 21. Virtualized middleware</li></ul>What is a space:<br /><ul><li>Elegant – 4 API
  22. 22. Solves:
  23. 23. Data sharing
  24. 24. Messaging
  25. 25. Workflow
  26. 26. Parallel proc.</li></ul>What is Processing unit :<br /><ul><li>Bundle of services, data, messaging
  27. 27. Collocation into single VM
  28. 28. Unified Messaging & Data
  29. 29. In-Memory </li></ul>Space-Based Architecture (SBA) is a software architecture pattern for achieving <br />linear scalability of stateful, high-performance applications, based on the Yale’s<br />Tuple-Space Model (Source Wikipedia)<br />
  30. 30. SBA (Step I) – Remove DB Bottlenecks DataGrid & NoSQL<br /><ul><li>Reduce I/O bottlenecks – less DB access
  31. 31. In-Memory caching
  32. 32. Reduced Network hops (if in-process)</li></ul>Application<br />Read Service<br />Publish Service<br />Users<br />Load Balancer<br />NoSQL<br />In-Memory<br />Data Grid<br />Web<br />
  33. 33. Route calls<br />Based on @userid<br />Reader<br />(Proxy)<br />Users<br />Load Balancer<br />SBA (Step II) – Linear Scalability: Partitioning and Collocation<br />Writer<br />(Proxy)<br />Data Base<br />Space<br />Space<br />Space<br />Space<br />Space<br />Reader<br />Reader<br />Reader<br />Reader<br />Reader<br />Writer<br />Writer<br />Writer<br />Writer<br />Writer<br />
  34. 34. Scaling read/write<br />Partition the data based on @user-id<br />Read uses Map/Reduce pattern to read the data from all relevant partitions<br />Space<br />Reader<br />Writer<br />Read all <br />tweets<br />Space<br />Space<br />Space<br />Space<br />Space<br />Reader<br />Reader<br />Reader<br />Reader<br />Reader<br />Post <br />a tweet<br />Web<br />Writer<br />Writer<br />Writer<br />Writer<br />Writer<br />Web<br />space.write(post);<br />Read- Map/Reduce<br />Write<br />
  35. 35. Monitor<br />Provision<br />Users<br />Load Balancer<br />Adding Dynamic Scalability<br />Data Base<br />Web<br />Space<br />Space<br />Space<br />Reader<br />Reader<br />Reader<br />Web<br />Writer<br />Writer<br />Writer<br /><ul><li>SLA Driven Policies
  36. 36. Dynamic Scalability
  37. 37. Self healing</li></li></ul><li>Summary - Space Based Architecture<br />Linear Scalability<br />Predictable cost model – pay per value<br />Predictable growth model<br />Dynamic<br />On demand – grow only when needed<br />Scale back when resources are not needed anymore<br />SLA Driven<br />Automatic <br />Self healing<br />Application aware<br />Simple<br />Non intrusive programming model<br />Single clustering Model<br />
  38. 38. Questions?<br />
  39. 39. GigaSpaces Home Page:http://www.gigaspaces.com/<br />GigaSpaces XAP Product Overview:<br />http://www.gigaspaces.com/wiki/display/XAP7/Concepts <br />GigaSpaces XAP for the Cloud: http://www.gigaspaces.com/cloud<br />

×