
Predictive Elastic Database Systems

Predictive Elastic Database Systems by Rebecca Taft

On-line transaction processing (OLTP) database management systems (DBMSs) are a critical part of the operation of many large enterprises. These systems often serve time-varying workloads due to daily, weekly or seasonal fluctuations in demand, or because of rapid growth in demand due to a company’s business success. In addition, many OLTP workloads are heavily skewed to “hot” tuples or ranges of tuples. For example, the majority of NYSE volume involves only 40 stocks. To deal with such fluctuations, an OLTP DBMS needs to be elastic; that is, it must be able to expand and contract resources in response to load fluctuations and dynamically balance load as hot tuples vary over time. In this talk, I will present a system we built at MIT called P-Store, which uses predictive modeling to reconfigure the system in advance of predicted load changes. I will show that when running a real database workload, P-Store achieves comparable performance to a traditional static OLTP DBMS while using 50% fewer computing resources.




Predictive Elastic Database Systems

  1. Predictive Elastic Database Systems. May 24th, 2018. Rebecca Taft – becca@cockroachlabs.com
  2. Transaction Performance Makes or Breaks a Business: Online Retail, Financial Applications, Internet of Things, Social Media, …
  3. Challenges to transaction performance: skew and workload variation. Database Elasticity: E-Store – manage skew and react to variation; P-Store – predictive modeling for time variation
  4. [Diagram: a Router in front of Data Partitions 1–3]
  5. [Diagram] Query: Get cart_id 758
  6. [Diagram] Hash(758) → Partition 1
  7. [Diagram] Query: Add “Harry Potter” to cart_id 534
  8. [Diagram] Hash(534) → Partition 3
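
Slides 4–8 illustrate hash-based routing of single-key transactions to partitions. A minimal sketch of the idea in Python; the modulo "hash", partition numbering, and the `route` helper are illustrative stand-ins, not H-Store's actual routing code, and won't reproduce the exact assignments shown on the slides:

```python
# Minimal sketch of hash-based partition routing (slides 4-8).
# The modulo "hash" and route() are illustrative only; H-Store's
# actual hash function and router are more involved.

NUM_PARTITIONS = 3

def route(cart_id: int) -> int:
    """Deterministically map a key to a partition (numbered 1..3).

    Every router computes the same mapping, so any router can forward
    a single-key query like "Get cart_id 758" without shared state.
    """
    return (hash(cart_id) % NUM_PARTITIONS) + 1

for cart_id in (758, 534):
    print(f"cart_id {cart_id} -> Partition {route(cart_id)}")
```

Because the mapping is a pure function of the key, routing needs no coordination; the trouble, as the next slides show, starts when the keys themselves are not uniformly popular.
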
  9. Uniform Workload [Chart: Transaction Arrival Rate roughly equal across Partitions 1–3, behind the Router]
  10. But many real database workloads are skewed, not uniform
  11. [Figure]
  12. [Figure]
  13. Skewed Workload [Chart: Transaction Arrival Rate heavily concentrated on one partition]
  14. And real database workloads are variable
  15. [Figure]
  16. [Figure]
  17. [Figure]
  18. Uniform Workload [Chart: Transaction Arrival Rate equal across Partitions 1–3]
  19. Uniform Workload, Increased Load [Chart: Transaction Arrival Rate still equal across partitions, but higher overall]
  20. E-Store: Elastic Scaling to Adapt to Workload. Uniform Workload, Increasing Load [Chart: rising Transaction Arrival Rate on Partitions 1–3]
  21. [Animation step: load approaches capacity]
  22. [Animation step: Partitions 4 and 5 added]
  23. [Animation step]
  24. [Animation step: load rebalanced evenly across Partitions 1–5]
  25. [Second chart added: Skewed Workload concentrated on three partitions]
  26. [Animation step]
  27. [Animation step: skewed workload spread evenly across Partitions 1–5]
  28. E-Store Elastic DBMS Architecture: an Elastic Controller (Load Monitor + Planner) drives Live Migration on a shared-nothing DBMS (e.g. H-Store)
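
The architecture on slide 28 implies a reactive control loop: the Load Monitor samples per-partition load, the Planner computes a new tuple placement when load goes out of bounds, and Live Migration applies it online. A hypothetical sketch of that loop; the component interfaces, method names, and thresholds are assumptions for illustration, not E-Store's real API:

```python
# Hypothetical reactive control loop in the spirit of the E-Store
# architecture (slide 28). All interfaces and thresholds below are
# illustrative assumptions, not E-Store's actual API.
import time

HIGH_UTIL, LOW_UTIL = 0.9, 0.5  # assumed per-partition utilization bounds

def elastic_controller(monitor, planner, migrator, interval_s=10):
    while True:
        loads = monitor.per_partition_load()   # e.g. {partition_id: txns/sec}
        peak = max(loads.values()) / monitor.capacity_per_partition()
        if peak > HIGH_UTIL or peak < LOW_UTIL:
            plan = planner.rebalance(loads)    # new tuple -> partition mapping
            migrator.apply(plan)               # rate-limited live migration
        time.sleep(interval_s)                 # purely reactive: acts after the fact
```

The last comment is the crux of slides 32–33 below: a purely reactive loop only begins migrating after load has already shifted, so latency spikes during the catch-up period, which is what motivates P-Store's predictive controller on slide 35.
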
  29. A Case Study: B2W Digital
  30. 3-Day Online Retail Database Workload [chart]
  31. [Same chart, annotated]
  32. Elastic Scaling Adapts to Workload [chart]
  33. Reactive Scaling Causes Latency Spikes [chart]
  34. P-Store: A Predictive Elastic DBMS
  35. P-Store Architecture: a Predictive Controller (Load Monitor + Predictor + Planner) drives Live Migration on a shared-nothing DBMS (e.g. H-Store)
  36. [Chart: Demand vs. Machine Capacity, contrasting Ideal Capacity with Actual Servers Allocated]
  37. [Chart: Demand vs. Machine Capacity; future demand marked “?”]
  38. Options: 1. Add machine(s) [Chart: Demand vs. Machine Capacity]
  39. [Animation step]
  40. [Animation step]
  41. Options: 1. Add machine(s) 2. Remove machine(s)
  42. Options: 1. Add machine(s) 2. Remove machine(s) 3. No change
  43. [Animation step]
  44. [Effective Capacity curve added to the chart]
  45. Options: 1. Add machine(s) 2. Remove machine(s) 3. No change. Challenges: 1. Time to Reconfigure 2. Effective Capacity 3. Cost
  46. Time to Reconfigure: • Rate control – move small chunks of data, spaced apart • ∃ some time D – the minimum time to move the entire database with no performance impact [Chart: % of Data per node, scaling 3 → 4 nodes]
  47. [Same chart, annotated] Moving from 3 to 4 nodes moves ¼ of the data, so Time = ¼ · D
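
Assuming data stays uniformly balanced across nodes, which matches the ¼ figure on slide 47, the fraction of data that moves when scaling between node counts (and hence the reconfiguration time) falls out directly. A small sketch; the function name and the example value of D are illustrative:

```python
# Reconfiguration time under rate-limited migration (slides 46-47).
# Assumption: data is kept uniformly balanced, so only the share of
# data owned by the added (or removed) nodes has to move.

def reconfig_time(n_old: int, n_new: int, D: float) -> float:
    """Time to rebalance when scaling n_old -> n_new nodes, where D is
    the minimum time to migrate the entire database with no
    performance impact (slide 46)."""
    moved_fraction = abs(n_new - n_old) / max(n_old, n_new)
    return moved_fraction * D

# Scaling 3 -> 4 nodes moves (4-3)/4 = 1/4 of the data, so Time = 1/4 * D,
# matching slide 47. (D = 8.0 is an arbitrary example value.)
print(reconfig_time(3, 4, D=8.0))  # -> 2.0
```
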
  48. Effective Capacity: • What transaction rate can the system support? • Max rate per machine: Q
  49. Cost (example: Cost = 7)
  50. Example: Scale from 1 to 6 nodes
  51. [Chart: Demand, Machine Capacity, Effective Capacity; 4 machines at t = 9] Goal: find the path with minimal cost such that effective capacity > demand. Recurrence: Cost of {path to A machines ending at time t} = min over B of (Cost of {path to B machines ending at time t − TimeB→A} + CostB→A)
  52. Move: 3→4 machines, ending at t = 9 [Chart: Demand, Machine Capacity, Effective Capacity]
  53. T3→4 = 3
  54. C3→4 = 12
  55. [Animation step]
  56. [Animation step]
  57. C3→4 = ∞ (this move is infeasible: effective capacity would dip below demand)
  58. Move: 4→4 machines, ending at t = 9 [Chart]
  59. T4→4 = 1
  60. C4→4 = 4
  61. ✔ (feasible)
  62. Move: 3→4 machines, ending at t = 8 [Chart]
  63. T3→4 = 3
  64. C3→4 = 12
  65. [Animation step]
  66. ✔ (feasible)
  67. [Chart: assembling a full path of moves]
  68. [Animation step]
  69. [Animation step]
  70. Feasible Solution. Total Cost: 30
  71. Move: 4→4 machines, ending at t = 8 [Chart]
  72. T4→4 = 1
  73. C4→4 = 4
  74. ✔ (feasible)
  75. And so on…
  76. Dynamic Programming Algorithm for Planning Reconfigurations: • Complexity ∝ number of time steps and max number of machines needed • In practice: runtime < 1 second • Greedy alternatives don’t work
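
A hypothetical sketch of that dynamic program, filling in the recurrence from slide 51 with moves priced at ∞ when effective capacity would fall below predicted demand (slide 57). The `T`/`C` move tables, the `feasible` check, and `plan_cost` itself are illustrative stand-ins for P-Store's real models:

```python
# Hypothetical sketch of the DP planner on slides 51-76.
# T[b][a] / C[b][a] (time steps and cost of a b->a-machine move) and
# feasible() are illustrative stand-ins for P-Store's real models.
# A "move" with b == a encodes the cost of simply keeping b machines
# running for T[b][b] time steps (e.g. T4->4 = 1, C4->4 = 4 above).
import math

def plan_cost(horizon, max_m, start_m, T, C, feasible):
    """Min cost of a move sequence keeping effective capacity > demand.

    feasible(b, a, end_t) must return True iff effective capacity
    stays above predicted demand throughout a b->a move ending at
    end_t. Complexity is O(horizon * max_m^2), proportional to the
    number of time steps and machines, as slide 76 states.
    """
    INF = math.inf
    # best[t][a]: cheapest cost of any path that starts with start_m
    # machines at t = 0 and has a machines at time t.
    best = [[INF] * (max_m + 1) for _ in range(horizon + 1)]
    best[0][start_m] = 0.0
    for t in range(1, horizon + 1):
        for a in range(1, max_m + 1):
            for b in range(1, max_m + 1):
                t0 = t - T[b][a]
                if t0 < 0 or best[t0][b] == INF or not feasible(b, a, t):
                    continue  # move doesn't fit, or capacity dips below demand
                best[t][a] = min(best[t][a], best[t0][b] + C[b][a])
    return min(best[horizon][1:])  # cheapest way to cover the whole horizon
```

Recording the arg-min predecessor alongside each `best[t][a]` entry recovers the actual sequence of moves. Slide 76's note that greedy alternatives don't work is exactly why the full table is needed: a locally cheap choice (such as staying at 3 machines) can make a later, necessary scale-up arrive too late to be feasible.
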
  77. Time Series Prediction: • Diurnal, periodic workload with some day-to-day variation • Prediction must complete in seconds to minutes • We use Sparse Periodic Auto-Regression (SPAR) – Chen et al., NSDI ’08
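
In SPAR, the prediction for time t is a linear combination of the load at the same slot in the previous few periods plus the most recent deviations from that periodic average, with coefficients fit by least squares. A rough sketch under that reading of Chen et al.; the window sizes and the plain least-squares fit are illustrative choices:

```python
# Rough sketch of Sparse Periodic Auto-Regression (SPAR), after
# Chen et al. (NSDI '08), as cited on slide 77. Window sizes and the
# plain least-squares fit are illustrative choices.
import numpy as np

def spar_features(load, t, period, n_periods, n_recent):
    """Feature row for time t: same-slot loads from the last n_periods
    periods, plus the last n_recent deviations from that periodic mean."""
    periodic = [load[t - i * period] for i in range(1, n_periods + 1)]
    def periodic_mean(s):
        return np.mean([load[s - i * period] for i in range(1, n_periods + 1)])
    recent = [load[t - j] - periodic_mean(t - j) for j in range(1, n_recent + 1)]
    return periodic + recent

def spar_fit(load, period, n_periods=3, n_recent=2):
    """Fit the SPAR coefficients (a_1..a_n, b_1..b_m) by least squares."""
    start = n_periods * period + n_recent
    X = np.array([spar_features(load, t, period, n_periods, n_recent)
                  for t in range(start, len(load))])
    y = np.array(load[start:])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

Prediction then applies the fitted coefficients to the same features computed for a future t, which easily fits the seconds-to-minutes budget on slide 77 since the fit is a single small linear regression.
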
  78. Evaluation: • Can we reduce resource usage? • Can we prevent latency spikes?
  79. Experiments: • 3 days of the B2W workload run on H-Store • Cluster of 10 servers, 6 partitions per server • About 1 GB of shopping cart and checkout data • Track machines allocated, throughput, latency, and reconfiguration time periods
  80. Results – Static, Peak Provisioning. Machines Used: 10
  81. Results – Static, Average Provisioning. Machines Used: 4
  82. Results – Reactive Scaling. Avg. Machines Used: 4.02
  83. Results – P-Store. Avg. Machines Used: 5.05
  84. Results Comparison: CDF of Top 1% of Latency
  85. Summary: P-Store Evaluation. • Can we reduce resource usage? Saves 50% of computing resources compared to static allocation • Can we prevent latency spikes? Superior performance compared to the reactive approach • On a real workload, P-Store reduces resource usage while keeping latency within application requirements
  86. Simulation: • 4.5 months of B2W data • Test many hypothetical allocation strategies and parameters
  87. Simulation: Example [chart]
  88. Summary: ❖ Real database workloads are skewed and vary over time ❖ Elasticity enables management of skew and adaptation to load changes ❖ Predictive scaling improves performance during load changes compared to reactive scaling. Rebecca Taft, becca@cockroachlabs.com
