euclides-c mthesis

  1. Nara Institute of Science and Technology (NAIST)
     Master Thesis Presentation
     Internet Architecture and Systems Laboratory
     Reducing Tail Latency In Cassandra Cluster Using Regression Based Replica Selection Algorithm
     Chauque Euclides
  2. Outline
     1. Background
     2. Tail Latency
     3. Replica Selection
     4. Proposed Approach
        4.1. Linear Regression Based Replica Selection
        4.2. Predicting Query Execution Time
        4.3. Training Data Generation
        4.4. Model Training
        4.5. Experimental Results
        4.6. Comparison with Heron
     5. Summary
     6. Future Work
  3. 1. Background
     - For business-oriented applications, fast and predictable response times are critical for a good user experience.
     - A study by Amazon and Google [1], in which a controlled delay was added to every query before results were sent back to the user, found that:
       - An extra delay of 500 ms per query resulted in a 1.2% loss of revenue.
       - The probability of a user bouncing from a website increases the longer the website takes to load.
     [1] https://www.gigaspaces.com/blog/amazon-found-every-100ms-of-latency-cost-them-1-in-sales/
  4. 2. Tail Latency
     - It is challenging to consistently deliver fast response times, since applications are generally multi-tiered: serving a single end-user request may involve contacting multiple servers.
     - Latency can be attributed to server performance variability, caused by queuing, shared resources, and background daemons.
  5. 3. Replica Selection [1/3]
     - Looking into the causes of tail latency, it follows that it is infeasible to eliminate all latency variability.
     - However, some approaches have been developed to reduce its impact. These approaches rely on standard techniques, including:
       - giving preferential resource allocations or guarantees;
       - reissuing requests;
       - trading off completeness for latency.
  6. 3. Replica Selection [2/3]
     - A recurring pattern for reducing tail latency is to take advantage of the redundancy built into each tier of the application architecture.
     - Replica selection strategies can help reduce tail latency when the performance of the servers differs.
     - A request can be directed to the presumably best replica, i.e. the one that is expected to serve the request with the smallest latency.
     - Properties of an ideal replica selection strategy:
       - It needs to adapt quickly to changing system dynamics.
       - It must avoid entering oscillating instabilities.
       - It should not be computationally costly, nor require significant coordination overhead.
  7. 3. Replica Selection [3/3]
     - Jaiman et al., "Heron: Taming Tail Latencies in Key-Value Stores under Heterogeneous Workloads", 37th IEEE Symposium on Reliable Distributed Systems (SRDS), 2018.
       - Takes into consideration the size of the values associated with keys.
       - The algorithm uses Bloom filters to keep track of keys associated with large values (see the sketch below).
       - Whenever a replica is processing a request for a large value, it is marked as busy.
       - As the amount of data in the datastore increases, the Bloom filter cannot be expanded without losing the previous mappings.
     - Suresh et al., "C3: Cutting Tail Latency in Cloud Data Stores via Adaptive Replica Selection", NSDI '15, USENIX 2015.
       - The algorithm consists of a replica ranking algorithm and a rate control and backpressure algorithm.
       - It ranks the servers, taking into account the server-side queue size and the service time.
       - An incoming request is sent to the server with the minimum expected service time.
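To make the large-value tracking idea concrete, the following is a minimal Python sketch of the mechanism as summarized on this slide, not the Heron authors' implementation. The filter size, hash count, the 1 MiB "large value" threshold, and the helper names are illustrative assumptions; the fixed bit array also shows the limitation noted above, since it cannot grow without discarding the keys already recorded.

```python
import hashlib

class BloomFilter:
    """Fixed-size Bloom filter: membership is probabilistic, resizing is not supported."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        # Derive the bit positions from salted SHA-256 digests of the key.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


LARGE_VALUE_BYTES = 1 << 20          # assumed threshold for a "large" value
large_keys = BloomFilter()           # remembers keys whose values were observed to be large

def record_response(key, value_size):
    """Called after a read completes; remembers keys with large values."""
    if value_size >= LARGE_VALUE_BYTES:
        large_keys.add(key)

def pick_replica(replicas, key_in_flight):
    """Prefer replicas not currently serving a request for a (probably) large value.

    key_in_flight maps each replica to the key it is currently serving, or None.
    """
    free = [r for r in replicas
            if key_in_flight.get(r) is None
            or not large_keys.might_contain(key_in_flight[r])]
    return (free or replicas)[0]
```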
  8. 4. Proposed Approach: Linear Regression Based Replica Selection
  9. 4.1. Linear Regression Based Replica Selection
     - Previous approaches do not support aggregation queries.
     - Query duration is inferred from the size of the requested value rather than from real estimates.
     - In this research I explore a different approach: using a regression model to predict query duration (a sketch of the selection step follows below).
     - The focus is on reducing the tail latency above p999.
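A minimal sketch of the selection step, assuming one fitted linear model per query template and a single illustrative feature (the number of requests outstanding at each replica). The feature choice and the helper names are assumptions made for illustration, not the thesis implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# One fitted model per query template (training is shown on a later slide);
# here the models are assumed to be already trained.
models: dict[str, LinearRegression] = {}
# Hypothetical feature: number of requests currently outstanding at each replica.
outstanding: dict[str, int] = {}

def predict_duration(template_id: str, replica: str) -> float:
    """Predicted execution time of one query template on one replica."""
    features = np.array([[outstanding.get(replica, 0)]])
    return float(models[template_id].predict(features)[0])

def select_replica(template_id: str, replicas: list[str]) -> str:
    """Send the query to the replica with the smallest predicted duration."""
    return min(replicas, key=lambda r: predict_duration(template_id, r))
```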
  10. 4.2. Predicting Query Execution Time
  11. 4.3. Training Data Generation [1/2]
     - For data collection, 3 tables from the TPC-H benchmark were loaded into a Cassandra cluster.
     - 8 servers were used, with the replication factor set to 3.
     - Subsequently, Locust was used to issue the chosen subset of TPC-H queries, simulating user requests (a simplified collection sketch follows below).
     - The response time values for different percentiles were recorded for each query.
     - The same process was repeated with different numbers of simulated users to simulate an increased load.
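A simplified sketch of the data-collection step, using the Python cassandra-driver directly rather than the Locust scripts used in the thesis. The contact point, keyspace, query text, load levels, and output file are placeholders; concurrency is omitted, so load is only approximated by issuing more repetitions per level, whereas the thesis used concurrent simulated users.

```python
import csv
import time
import numpy as np
from cassandra.cluster import Cluster

session = Cluster(["cassandra-node-1"]).connect("tpch")   # placeholder contact point / keyspace
QUERY = "SELECT * FROM lineitem WHERE l_orderkey = 1"     # placeholder TPC-H-style query

def measure(query, repetitions):
    """Execute a query repeatedly and return per-request latencies in milliseconds."""
    latencies = []
    for _ in range(repetitions):
        start = time.perf_counter()
        session.execute(query)
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies

with open("training_data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["users", "p50", "p90", "p999"])
    for users in (10, 50, 100, 200):              # increasing simulated load levels
        lat = measure(QUERY, repetitions=users * 10)
        writer.writerow([users,
                         np.percentile(lat, 50),
                         np.percentile(lat, 90),
                         np.percentile(lat, 99.9)])
```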
  12. 4.3. Training Data Generation [2/2]
     - The queries show different response-time behavior.
     - The queries with longer response times show a greater variation in response time as the load is increased.
  13. 4.4. Model Training [1/2]
     - To keep the prediction overhead low, linear regression was chosen to fit the data, based on [1].
     - A regression model was fit for each query template's data (a training sketch follows below).
     - R squared was used as the evaluation metric for the regressors:
       - R squared is the percentage of the dependent variable's variation that a linear model explains.
     [1] https://scikit-learn.org/0.16/modules/computational_performance.html
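For reference, R squared is computed as R² = 1 - Σᵢ(yᵢ - ŷᵢ)² / Σᵢ(yᵢ - ȳ)², i.e. one minus the ratio of the residual sum of squares to the total sum of squares. The following is a minimal scikit-learn sketch of the per-template training step, assuming the training_data.csv layout from the previous sketch (load level as the feature, one latency percentile as the target); the file and column names are illustrative, not the thesis code.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

data = pd.read_csv("training_data.csv")
X = data[["users"]].to_numpy()       # simulated load level
y = data["p999"].to_numpy()          # observed latency target for one query template

model = LinearRegression().fit(X, y)
print("R^2:", r2_score(y, model.predict(X)))   # fraction of variance explained by the model
```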
  14. 4.4. Model Training [2/2]
  15. 4.5. Results: Homogeneous Servers [1/3]
     - The figures on this slide show the tail latency values (p999 and p99999) for each query.
     - Overall latency is improved.
  16. 4.5. Results: Homogeneous Servers [2/3]
     - Comparison of p50, p90, and p999.
     - The higher-percentile latency (99.9%) is improved; however, the 90th percentile is degraded.
  17. 4.5. Results: Homogeneous Servers [3/3]
     - Throughput comparison.
  18. 4.5. Results: Heterogeneous Servers [1/3]
     - A delay of up to 2 seconds was introduced into the responses of 4 servers, to simulate an environment with servers of different processing capabilities (one possible delay-injection mechanism is sketched below).
     - The figures on this slide show the tail latency values (p999 and p99999) for each query.
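The slide does not specify how the delay was injected; one common way to emulate slower servers is to add an egress delay with Linux netem on the nodes chosen to be "slow". The sketch below is an illustrative assumption, not the thesis setup: the interface name and delay values are placeholders, and it must run with root privileges on each target node.

```python
import subprocess

IFACE = "eth0"                         # placeholder network interface
DELAYS_MS = [500, 1000, 1500, 2000]    # up to 2 s, one value per slowed server

def add_delay(delay_ms: int, iface: str = IFACE) -> None:
    """Attach a netem qdisc that delays all egress traffic by delay_ms."""
    subprocess.run(
        ["tc", "qdisc", "add", "dev", iface, "root", "netem", "delay", f"{delay_ms}ms"],
        check=True,
    )

def remove_delay(iface: str = IFACE) -> None:
    """Remove the netem qdisc, restoring normal latency."""
    subprocess.run(["tc", "qdisc", "del", "dev", iface, "root"], check=True)
```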
  19. 4.5. Results: Heterogeneous Servers [2/3]
     - Comparison of p50, p90, and p999.
     - The higher-percentile latencies (p999 and p99999) are improved; however, the p50 percentile is degraded.
  20. 4.5. Results: Heterogeneous Servers [3/3]
     - Throughput comparison for a cluster with heterogeneous servers.
  21. 4.6. Comparison with Heron
     - Addressing the Colloquium B comment by Professor Keiichi Yasumoto on the relation of the proposed approach to previous work.
     - p999 response time for all queries, and aggregate p999 comparison between the proposed method and Heron.
  22. 5. Summary
     - In the present work, the tail latency problem was reviewed, and server selection was considered as a method to reduce tail latency.
     - Previous work was based on simpler queries and is therefore no longer suitable for the complex queries that have come to be supported in Cassandra; this served as motivation for exploring a new approach to server selection that uses a regression model to predict query duration.
     - The new approach proved successful in reducing tail latency while preserving throughput; however, it negatively affected the lower percentiles.
  23. 6. Future Work
     - A remaining point to explore is the use of more sophisticated machine learning models, to see whether the assumption that they incur excessive overhead holds true or not.
     - Experiment with an even greater number of servers.
  24. End
  25. Models' Computational Performance (a small timing sketch follows below)
     - Prediction latency: scikit-learn benchmark of prediction latency for different models.
     - Prediction throughput: scikit-learn benchmark of prediction throughput for different models.
     https://scikit-learn.org/0.16/modules/computational_performance.html
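In the spirit of the scikit-learn benchmarks cited above, the following small sketch checks that single-sample prediction with a fitted linear model is cheap enough for the replica-selection path. The model, data shapes, and iteration counts are illustrative assumptions, not measurements from the thesis.

```python
import time
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy model standing in for one of the per-template regressors.
model = LinearRegression().fit(np.arange(100).reshape(-1, 1), np.arange(100) * 2.0)

# Per-prediction latency (single sample, the replica-selection case).
sample = np.array([[42]])
n = 10_000
start = time.perf_counter()
for _ in range(n):
    model.predict(sample)
latency_us = (time.perf_counter() - start) / n * 1e6
print(f"mean single-prediction latency: {latency_us:.1f} us")

# Bulk prediction throughput (predictions per second on a large batch).
batch = np.arange(100_000).reshape(-1, 1)
start = time.perf_counter()
model.predict(batch)
throughput = len(batch) / (time.perf_counter() - start)
print(f"bulk prediction throughput: {throughput:.0f} predictions/s")
```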
