
Interpreting New Trends in Cloud Big Data (解讀雲端大數據新趨勢)

2018-05-16 @ iThome Cloud Summit 2018
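Cloud computing, big data, the Internet of Things, and artificial intelligence: these hot topics have appeared in the media one after another since 2008. Looking back at the past ten years of Apache Hadoop applications in Taiwan, this talk explains how the four topics relate to each other, examines the market demand driving Big Data Stack on the Cloud, and closes with the progress of Big Data Stack on Kubernetes.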



  1. 1. My Journey of “Innovation” (aka “From Zero to One”). Interpreting New Trends in Cloud Big Data: Big Data Stack on the Cloud. Jazz Yao-Tsung Wang (Initiator and Chair, TDEA; Data Architect, TenMax). Shared on 2018-05-16 at iThome Cloud Summit 2018.
  2. 2. Hello! I am Jazz Wang. Co-Founder of Hadoop.TW. Initiator and Chair of Taiwan Data Engineering Association (TDEA). Hadoop Evangelist since 2008. Open Source Promoter. System Admin (Ops).
     - 11 years (2002/08 ~ 2014/02): Associate Researcher in HPC field.
     - 2 years (2014/03 ~ 2016/04): Assistant Vice President (AVP), Product Management of ‘Big Data Platform Management’.
     - 2 years (2016/04 ~ Now): Data Architect of Real-Time Bidding.
     You can find me at @jazzwang_tw or https://fb.com/groups/dataengineering.tw or https://slideshare.net/jazzwang
  3. 3. Timeline (2008, 2011, 2014, 2015, 2016, 2017): Cloud Computing, Big Data, Internet of Things, Artificial Intelligence, …
  4. 4. Different parts of a single Data Pipeline
  5. 5. Life of Big Data: Big Data, Artificial Intelligence. 2013/05/01, http://www.nchc.org.tw/tw/e_paper/e_paper_content.php?SN=124&cat=news
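  9. 9. Mainframe (supercomputer); PC Cluster (computer cluster); CPU; Computing Intensive; Memory Intensive.
     - Five characteristics of the cloud: on-demand self-service, network access at any time, resources shared among many users, rapid redeployment, measured and controlled service.
     - Five characteristics of big data: Volume, Variety, Velocity, Veracity, Value.
     - Data Intensive (Ex. SSD); Netflow Intensive; CDN; GPU Intensive; GPU, TPU, FPGA; SDN.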
  11. 11. Hardware trends:
     ▷ CPU ➧ GPU ➧ TPU 🔜 ?
     ▷ DRAM 🔜 NVM (3D XPoint?)
     ▷ ➧ HDD ➧ SSD ➧ NVMe 🔜 ?
     ▷ Ethernet ➧ 🔜
  14. 14. Rich Hickey: The Composite Database @ Strata Conference + Hadoop World 2012
     - Traditional DB; Indexing as Component; Query as Component
     - Examples: Lucene / Solr, ElasticSearch; Impala / Presto; Hive
     - Source: Rich Hickey, Strata Conference + Hadoop World Keynote, https://www.youtube.com/watch?v=P-NZei5ANaQ
     - Reference: Rich Hickey, Deconstructing the Database, https://www.youtube.com/watch?v=Cym4TZwTCNU
  15. 15. ▷ AAA, BBB, CCC, DDD, EEE, FFF, BI
  16. 16. Ops, Dev, XD, XXX

  21. 21. Enterprise / SMB; On-premises / Cloud Service
  22. 22. Timeline by industry:
     - 2008
     - 2010: Telecom
     - 2012: eCommerce
     - 2015: Finance
     - 2018: Manufacturing
     - 202x ?: Healthcare
  23. 23. Source: “Hadoop Deployment Model” @ OSDC.TW 2014, by Jazz Wang, https://www.slideshare.net/jazzwang/hadoop-deployment-model-osdctw
  24. 24. Data Source, Data Collector, Cloud Storage, Cloud ETL, Cloud Query Engine, BI Report. Source: https://www.slideshare.net/jazzwang/14-0308-treasuredatacloud , 2014-03-08, by Jazz Wang
  25. 25. Apache Hadoop from 0.x to 1.x
     - Nodes: Master, Worker #1, Worker #2, Worker #3
     - Computation Layer (MapReduce): JobTracker; TaskTracker × 4
     - Storage Layer (HDFS): NameNode; DataNode × 4
     - Data Locality
  26. 26. Apache Hadoop from 2.x to 3.x
     - Nodes: Master, Worker #1, Worker #2, Worker #3
     - Computation Layer (YARN): ResourceManager; NodeManager × 4; Container
     - Storage Layer (HDFS): NameNode; DataNode × 4
     - Data Locality; GPU
  27. 27. https://www.facebook.com/groups/hadoop.tw/permalink/1061706333938741/?comment_id=1072414466201261&reply_comment_id=1073302882779086&comment_tracking={%22tn%22%3A%22R%22}
  28. 28. http://www.slideshare.net/HadoopSummit/hadoop-cloud-storage-object-store-integration-in-production ; https://www.youtube.com/watch?v=XehH3iJJy3Q
  29. 29. Apache Hadoop 2.7 HCFS
     - Nodes: Master, Worker #1, Worker #2, Worker #3
     - Computation Layer (YARN): ResourceManager; NodeManager × 4
     - Storage Layer (HCFS, Hadoop Compatible File System): Windows Azure Blob, AWS S3, Google Cloud Storage, CephFS
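
The HCFS idea on this slide is that the compute layer addresses storage by URI, and the URI scheme selects the file-system implementation. The following is a minimal Python sketch of that dispatch, not Hadoop code. The `FS_IMPL` table and `resolve_filesystem` helper are hypothetical names introduced for illustration; the class names are the commonly documented connector classes and should be checked against the Hadoop and connector versions actually deployed.

```python
from urllib.parse import urlparse

# Scheme -> FileSystem class, loosely mirroring Hadoop's fs.<scheme>.impl lookup.
# Illustrative only: not a complete or version-exact list of connectors.
FS_IMPL = {
    "hdfs": "org.apache.hadoop.hdfs.DistributedFileSystem",
    "s3a": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "wasb": "org.apache.hadoop.fs.azure.NativeAzureFileSystem",
    "gs": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
}


def resolve_filesystem(uri: str) -> str:
    """Return the FileSystem class name registered for the URI's scheme."""
    scheme = urlparse(uri).scheme
    if scheme not in FS_IMPL:
        raise ValueError(f"no file system registered for scheme {scheme!r}")
    return FS_IMPL[scheme]


if __name__ == "__main__":
    # Hypothetical bucket and host names; only the scheme matters here.
    for uri in ("hdfs://namenode:8020/logs", "s3a://example-bucket/logs"):
        print(uri, "->", resolve_filesystem(uri))
```

Because only the scheme changes, a job written against `hdfs://` paths can in principle be pointed at an object store by changing the path and adding the connector to the classpath, which is what lets the storage layer be swapped out under YARN.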
  30. 30. Source: “Robust and Scalable ETL over Cloud Storage with Apache Spark”, Spark Summit 2017, https://www.slideshare.net/databricks/robust-and-scalable-etl-over-cloud-storage-with-apache-spark
  32. 32. Apache Spark 2.3 on Kubernetes. Source: https://kubernetes.io/blog/2018/03/apache-spark-23-with-native-kubernetes/
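
With native Kubernetes support, a Spark 2.3 application is submitted by pointing `spark-submit` at the Kubernetes API server with a `k8s://` master URL and naming a container image. The sketch below only assembles such a command line in Python; it does not run anything. The helper name `spark_submit_k8s_args` and all example values (API server address, image name, jar path) are assumptions for illustration, while the flags and configuration keys are the ones described in the Spark 2.3 "Running on Kubernetes" documentation.

```python
from typing import List


def spark_submit_k8s_args(
    api_server: str,
    image: str,
    app_jar: str,
    main_class: str,
    executors: int = 2,
    name: str = "spark-pi",
) -> List[str]:
    """Build a spark-submit argument list for cluster mode on Kubernetes."""
    return [
        "spark-submit",
        "--master", f"k8s://{api_server}",  # Kubernetes API server URL
        "--deploy-mode", "cluster",         # driver runs in a pod
        "--name", name,
        "--class", main_class,
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.container.image={image}",
        app_jar,                            # local:// means a path inside the image
    ]


if __name__ == "__main__":
    # Hypothetical cluster address and image name.
    args = spark_submit_k8s_args(
        api_server="https://k8s.example.com:6443",
        image="example/spark:2.3.0",
        app_jar="local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar",
        main_class="org.apache.spark.examples.SparkPi",
    )
    print(" ".join(args))
```

Building the argument list separately from executing it keeps the example side-effect free; the list could be passed to `subprocess.run` on a machine that has a Spark distribution and cluster credentials.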
  33. 33. K8S Big Data SIG
     ▷ Big Data SIG covers deploying and operating big data applications (Spark, Kafka, Hadoop, Flink, Storm, etc.) on Kubernetes. We focus on integrations with big data applications and architecting the best ways to run them on Kubernetes.
     ▷ Big Data SIG:
        ○ Design and architect ways to run big data applications effectively on Kubernetes
        ○ Discuss ongoing implementation efforts
        ○ Discuss resource sharing and multi-tenancy (in the context of big data applications)
        ○ Suggest Kubernetes features where we see a need
  34. 34. Apache Big Data Ecosystem (by Apache Big Data project)
     - Apache Hadoop (HDFS): Data Locality Doc, https://goo.gl/zZNzwH ; https://github.com/apache-spark-on-k8s/kubernetes-HDFS ; https://youtu.be/DxCDxi08HWo @ Spark Summit 2017
     - Apache Spark (Spark Core): Design Proposal, https://goo.gl/ppY28R / https://goo.gl/nyJRWi ; Dynamic Allocation Proposal, https://goo.gl/QhsRaF ; SPARK-18278 / Kubernetes Issue #34377 ; https://github.com/apache-spark-on-k8s/spark ; https://youtu.be/0xRHONrWwvU @ Spark Summit 2017
     - Apache Zeppelin (Spark): https://github.com/kubernetes/kubernetes/tree/master/examples/spark
     - Apache Storm: https://github.com/kubernetes/kubernetes/tree/master/examples/storm
     - Apache Cassandra: https://kubernetes.io/docs/tutorials/stateful-application/cassandra/ ; https://github.com/kubernetes/examples/tree/master/cassandra
     - Apache Kafka: https://github.com/kubernetes/contrib/tree/master/statefulsets/kafka
     - Apache Airflow: Roadmap, https://goo.gl/BpM4jq
  37. 37. https://goo.gl/2z9BGK
  38. 38. Thanks! Any questions? You can find me at @jazzwang_tw or https://fb.com/groups/dataengineering.tw or https://slideshare.net/jazzwang
