Why I love Apache Spark

On the theme of "why I love <your favorite IBM product>", this is why I love Apache Spark. Why is it an IBM product? Find out!

Published in: Software

  1. CONFIDENTIAL © 2019 Why I love Spark
 Jean Georges “JG” Perrin, February 11th, 2019, v100
  2. Why I love Spark (and all it does for IBM products)
 Jean Georges “JG” Perrin, February 11th, 2019, v100
  3. JGP • Jean Georges Perrin • @jgperrin • Chapel Hill, NC • I ! SW since 1983 • #Knowledge = 𝑓 ( ∑ (#SmallData, #BigData), #DataScience) & #Software • #IBMChampion x11 • #KeepLearning • @ http://jgp.net
  4. [Image-only slide]
  5. Analytics operating system
  6. An analytics operating system? [Diagram: Hardware, OS, Apps]
  7. An analytics operating system? [Diagram: Hardware, OS, Apps; Hardware, Hardware, OS, OS]
  8. An analytics operating system? [Diagram: Hardware, OS, Apps; Hardware, Hardware, OS, OS, Apps]
  9. An analytics operating system? [Diagram: Apps, Analytics, Distrib.; Hardware, OS, Apps; Hardware, Hardware, OS, OS]
  10. An analytics operating system? [Diagram: Apps, Analytics, Distrib.; Hardware, OS, Apps; Hardware, Hardware, OS, OS; Hardware, Hardware, OS, OS]
  11. An analytics operating system? [Diagram: Apps, Analytics, Distrib.; Hardware, OS, Apps; Hardware, Hardware, OS, OS, Distributed OS, Analytics OS; Hardware, Hardware, OS, OS]
  12. An analytics operating system? [Diagram: Apps, Analytics, Distrib.; Hardware, OS, Apps; Hardware, Hardware, OS, OS, Distributed OS, Analytics OS, Apps; Hardware, Hardware, OS, OS]
  13. An analytics operating system? [Diagram: Hardware, Hardware, OS, OS, Distributed OS, Analytics OS, Apps]
  14. An analytics operating system? [Diagram: Hardware, Hardware, OS, OS, Distributed OS, Analytics OS, Apps]
  15. An analytics operating system? [Diagram: Hardware, Hardware, OS, OS, Distributed OS, Analytics OS, Apps]
  16. There are two kinds of data scientists: 1) Those who can extrapolate from incomplete data. - The Internet
  17. What kind of applications? Unified API • Data Science • Data Engineering • InfoSphere Information Analyzer • Db2 Event Store • Watson Knowledge Catalog • Watson Data Studio • DataStage Flow Designer • … • Watson Knowledge Catalog • Cloud Private for Data • … • SparkBench
  18. CONFIDENTIAL © 2018 Adapted from: https://www.datacamp.com/community/blog/data-scientist-vs-data-engineer
 DATA Engineer: Develop, build, test, and operationalize datastores and large-scale processing systems. DataOps is the new DevOps. Match architecture with business needs. Develop processes for data modeling, mining, and pipelines. Improve data reliability and quality.
 DATA Scientist: Clean, massage, and organize data. Perform statistics and analysis to develop insights, build models, and search for innovative correlations. Prepare data for predictive models. Explore data to find hidden gems and patterns. Tell stories to key stakeholders.
  19. [Diagram: DATA Engineer, DATA Scientist, SQL] Adapted from: https://www.datacamp.com/community/blog/data-scientist-vs-data-engineer
  20. Difference between machine learning and AI: If it is written in Python, it’s probably machine learning. If it is written in PowerPoint, it’s probably AI. - Curt Simon Harlinghausen
  21. IBM’s communities and CODAIT • IBM’s investment is not limited to products • CODAIT (formerly Spark Technology Center) • IBM Communities
  22. Key takeaways • IBM contributed to building a new kind of operating system. • IBM builds its new generation of data products on this operating system. • Share the love. • Use Java.
  23. Going even further • Spark in Action (MEAP) by Jean Georges Perrin (@jgperrin), published by Manning • http://jgp.net/sia • sprkact-8D74 • sprkact-2C72 • ctwthink19 • One two free books • 40% off
  24. Links
 • Apache Spark
   • http://spark.apache.org
 • Spark in Action, 2e
   • http://jgp.net/sia
 • IBM Products
   • https://dataplatform.cloud.ibm.com/docs/content/catalog/overview-wkc.html
   • https://www.ibm.com/support/knowledgecenter/en/SSZJPZ_11.7.0/com.ibm.swg.im.iis.ds.fd.doc/topics/t_config_spark.html
   • https://www.ibm.com/support/knowledgecenter/en/SSZJPZ_11.7.0/com.ibm.swg.im.iis.ia.administer.doc/topics/t_spark_job.html
   • https://dataplatform.cloud.ibm.com/docs/content/catalog/overview-wkc.html
   • https://www.ibm.com/products/db2-event-store
   • https://www.ibm.com/analytics/cloud-private-for-data
   • https://developer.ibm.com/open/projects/spark-bench/, https://research.spec.org/fileadmin/user_upload/documents/wg_bd/BD-20150401-spark_benchmark-v1.3-spec.pdf
 • IBM Center for Open-Source Data & AI Technologies (Spark Technology Center)
   • https://developer.ibm.com/code/open/centers/codait/about/