iThome Cloud Summit: The next generation of data center: Machine Intelligent Cluster

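As technology has evolved, so has the way we manage machines: managing machines one at a time has given way to managing them as a cluster. Application deployment has likewise moved from bare-metal machines to virtual machines (VMs) and on to containers.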

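Kubernetes, DCOS, and Docker Swarm are management suites that combine container deployment with machine cluster management. Deploying on a cluster does not by itself save deployment and maintenance effort, however: as the cluster grows, it demands more management skill. For example: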

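How can cluster utilization be managed effectively?
How can idle machines in the cluster be shut down automatically?
How can the cluster be scaled automatically and effectively?
How can the cluster detect malicious attacks or heal itself?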
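The speaker introduces a new concept: a cluster management system with machine intelligence. The talk shows how combining SMACK (Spark, Mesos, Akka, Cassandra, Kafka), a currently popular architecture for machine learning, with TensorFlow, a representative deep learning framework, lets a cluster analyze its usage more effectively and thereby become intelligent: the next-generation Machine Intelligent Cluster.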
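
The auto-scaling question above can be illustrated with a small sketch. The abstract does not say how the talk's system predicts usage, so the following is only an assumed stand-in: a naive trend forecast (last sample plus the average recent change) that triggers scale-out when the predicted utilization, rather than the current one, crosses a threshold. The function names `forecast_next` and `should_scale_out`, the window size, and the 75% threshold are all hypothetical.

```python
from statistics import mean


def forecast_next(samples, window=3):
    """Naively forecast the next utilization sample (percent).

    Assumption for illustration: the next value equals the last value
    plus the average change over the last `window` steps. A real system
    might use a learned model instead.
    """
    if not samples:
        raise ValueError("need at least one utilization sample")
    if len(samples) < 2:
        return float(samples[-1])
    recent = samples[-(window + 1):]
    deltas = [b - a for a, b in zip(recent, recent[1:])]
    # Clamp to the valid percentage range.
    return min(100.0, max(0.0, recent[-1] + mean(deltas)))


def should_scale_out(samples, threshold=75.0, window=3):
    """Scale out when the *predicted* utilization reaches the threshold."""
    return forecast_next(samples, window) >= threshold


if __name__ == "__main__":
    history = [40, 50, 60, 70]  # made-up CPU utilization samples, percent
    print(forecast_next(history), should_scale_out(history))
```

With a rising series such as 40, 50, 60, 70, the sketch requests scale-out even though the latest sample is still below the threshold, which is the point of predicting rather than reacting.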

Published in: Internet

  1. Machine Intelligent Cluster: The next generation of data center. Evan Lin @Linker Networks
  2. About me: Cloud Architect @ Linker Networks; Golang User Group Co-Organizer; Top 5 Taiwan Golang open source contributor (github award); Developer, Curator, Blogger
  3. Recap Cloud Summit 2016
  4. Agenda: Problems on data center; How machine learning helps; Machine Intelligent Cluster; Applications; Q&A
  5. Data center
  6. Real data center. Efficiency: Power consumption; Low usage; Unpredictable peak; Noisy neighbors. Risk: Physical damage; Networking problem; Anomaly; Attack
  7. Power consumption
  8. Low usage and Unpredictable peak
  9. Noisy neighbor
  10. Use machine learning to improve DC power consumption
  11. None of your business?
  12. Modern Data center: Machine Cluster
  13. Before machine cluster: DB Master (IP: 192.168.1.222); DB Slave (IP: 192.168.1.223); Web Server 1 (IP: 192.168.1.101); Web Server 2 (IP: 192.168.1.102); Web Server 3 (IP: 192.168.1.103); Load Balancer (IP: 1.2.3.4)
  14. Container orchestration: Resource arrangement; Scalability; Portability; Automation migration
  15. Resource management: 3 Web App Servers; 2 DB Servers; 1 Load Balancer
  16. Scalability
  17. Automation migration
  18. Automation migration
  19. Automation migration
  20. Automation migration
  21. But... we need better...
  22. No prediction
  23. How to define the scale-out threshold? 50%? 75%? 25%?
  24. Machine Intelligent Cluster
  25. How MIC (Machine Intelligence Cluster) helps: Efficiency; Maximize Utilization; Operation Optimization; Accident; Risk Mitigation; Serviceability Management
  26. Operation Optimization: (1) Reinforcement learning; (2) Adjust thermostat; (3) Check the reward (CPU performance). [1]: Source: https://goo.gl/ly3zyX
  27. Maximize Utilization: Analyze utilization and reduce the number of working machines to save our customers' budget. Predict the utilization trend; provide auto-scaling threshold adjustment
  28. Prediction and dynamic threshold
  29. Optimized Scheduler (Maximize Utilization). Two diagrams show Node 1, Node 2, and Node 3 running the same workloads in two different placements: Nginx (CPU 30%); Apache (CPU 30%); Backend Process (CPU 35%); NodeJS (CPU 7%); Go backend (CPU 8%); DB-MySQL (IO 25%); DB-Mongo (IO 30%); DB-Oracle (IO 35%). P.S. We do not rearrange running processes; we change the scheduler to prevent this from happening.
  30. Serviceability Management (cont.): Model 1; Serial Number; Prediction; S.M.A.R.T.; RNN; Prediction; Model 2
  31. Mitigate risk: Dummy VM Detection; Outlier; Attack Detection
  32. Tagging system: Storage; SDN; Zombie
  33. Architecture
  34. Cloud Native Architecture
  35. MIC System Architecture: HPC (with GPU) Server; Storage; SDN; Storage; SDN; Data Collect; Probe & Sensor & Smart GW; Visualization; Data Process; Data Analysis & Machine Learning; DCOS/Kubernetes; Spark ML; Tensorflow; DCOS/Kubernetes; Cassandra (Storage); Kafka (Queueing); Go/Akka (Connector); Spark (ETL/Streaming); D3.js; Scikit Learn; R; Interactive Dashboard; Jupyter Notebook; Zeppelin; ML Job Scheduler; Chronos
  36. MIC Data Flow: Data Agent; Kafka; Spark Streaming; Cassandra; Spark ML (Classification, Clustering); TensorFlow (Deep Learning); Backend Server; API; Portal; TensorFlow Predict; SparkML Predict
  37. Applications on MIC (Machine Intelligent Cluster): IoT; Gaming; 5G; NFV; E-Commerce
  38. Machine Intelligent Cluster Summary: Machine cluster with intelligence; Features: Self-Optimization, Self-Learning, Self-Recovery; Green, Secure and Predictive machine cluster
  39. Subscribe to CodeTengu (碼天狗): http://weekly.codetengu.com/
  40. Thank You
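
The "Optimized Scheduler" slide (29) suggests placing workloads so that CPU-heavy and IO-heavy services share nodes instead of competing for the same resource. The slides do not describe the actual scheduling algorithm, so the following is only a hypothetical greedy heuristic: take workloads from largest to smallest and put each on the node that is currently least loaded in that workload's resource. The workload figures come from the slide; the function name `place` and the tie-breaking rule are assumptions.

```python
# Workloads listed on slide 29: (name, dominant resource, demand in percent).
WORKLOADS = [
    ("Nginx", "cpu", 30),
    ("DB-MySQL", "io", 25),
    ("DB-Mongo", "io", 30),
    ("Apache", "cpu", 30),
    ("Backend Process", "cpu", 35),
    ("DB-Oracle", "io", 35),
    ("NodeJS", "cpu", 7),
    ("Go backend", "cpu", 8),
]


def place(workloads, node_count):
    """Greedy placement sketch (hypothetical, not the talk's scheduler).

    Each workload goes to the node with the lowest load in the resource
    it needs; ties are broken by the lowest combined CPU + IO load.
    """
    nodes = [{"cpu": 0, "io": 0, "apps": []} for _ in range(node_count)]
    # Largest demands first, so big workloads are spread before small ones.
    for name, resource, pct in sorted(workloads, key=lambda w: -w[2]):
        target = min(nodes, key=lambda n: (n[resource], n["cpu"] + n["io"]))
        target[resource] += pct
        target["apps"].append(name)
    return nodes


if __name__ == "__main__":
    for index, node in enumerate(place(WORKLOADS, 3), start=1):
        print(f"Node {index}: CPU {node['cpu']}%, IO {node['io']}%, {node['apps']}")
```

Under this heuristic every node ends up with a mix of CPU-bound and IO-bound services, which matches the slide's note that the goal is to schedule well up front rather than rearrange processes later.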
