Idea behind Apache Hivemall

LT at ApacheCon North America, 2018.

Published in: Data & Analytics


  1. LT: Idea behind Apache Hivemall. Makoto YUI <myui@apache.org>, Principal Engineer. ApacheCon North America 2018.
  2. Suppose … you are running machine learning on massive data stored in a data warehouse. Make it!
  3. Concerns when running machine learning on massive data stored in a data warehouse: Scalability? Data movement? Tool?
  4. Approach #1: Typical Data Scientist's Solution. Diagram labels: data warehouse, data preprocessing, machine learning. Small data?
  5. Approach #2: Data Engineer's Solution. Diagram labels: data warehouse, data preprocessing, machine learning.
  6. Q: Is Dataframe a great idea for data (pre-)processing?
  7. Q: Do you like it? (for production-ready data preprocessing) Yes / No / Maybe. I like it for simple data processing.
  8. Q: Do you really like it? (for messy real-world data preprocessing) Yes / No / Maybe.
  9. Real-world ML pipelines (could be more complex). Diagram stages: Datasource #1, Datasource #2, Datasource #3; Join; Extract Feature; Extract Feature; Feature Scaling; Feature Hashing; Feature Engineering; Feature Selection; Train by Logistic Regression; Train by RandomForest; Train by Factorization Machines; Ensemble; Evaluate; Predict.
  10. Q: Have you ever seen or written hundreds to thousands of lines of preprocessing in Dataframe?
  11. Q: Is it fun to play with? (Scala/Python coding for trivial things) Do you write test code? In my personal opinion, notebook code is error-prone for production use.
  12. My Suggestion: push more work back to the DB where the data resides (including some ML logic). Benefits: + Scalability, + Durability/Stability, + Functionalities (UDFs, JSON, windowing functions). One size does not fit all, though ... Diagram labels: data warehouse, data preprocessing, machine learning.
  13. Machine Learning in SQL queries
  14. BigQuery ML at Google I/O 2018. https://ai.googleblog.com/2018/07/machine-learning-in-google-bigquery.html
  15. Could I use ML-in-SQL in my cluster?
  16. Open-source Machine Learning Solution for SQL-on-Hadoop: https://hivemall.apache.org (incubating)
  17. Hivemall is a multi/cross-platform ML library that provides a rich set of functions. Interfaces shown: HiveQL, SparkSQL/Dataframe API, Pig Latin.
  18. Thank you! Follow us @ApacheHivemall. Check out our talk tomorrow, 16:40~, at ballroom J. Mentors wanted.
  19. (no text on this slide)
  20. This query runs in parallel on a Hadoop/Spark cluster:

        CREATE TABLE model AS
        SELECT
          feature,
          -- reducers perform model averaging in parallel
          avg(weight) as weight
        FROM (
          SELECT train_classifier(features,label,..) as (feature,weight)
          FROM train
        ) t -- map-only task
        GROUP BY feature; -- shuffled to reducers
  21. Apache Hivemall with Digdag (or Airflow), digdag.io. Whatever you like.
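
The query on slide 20 trains a separate model in each map task and then averages the weights per feature in the reducers. The sketch below imitates that shape on a single machine and is only an illustration, not Hivemall's implementation: `train_partition` is a hypothetical stand-in for `train_classifier` (a plain logistic regression trained by SGD), an in-memory SQLite database stands in for Hive, and the table name `partial_model` and the toy data are assumptions made for this example.

```python
import math
import sqlite3


def train_partition(rows, epochs=20, lr=0.1):
    """Hypothetical stand-in for a map-side trainer (logistic regression, SGD).

    rows: list of (features, label); features is {name: value}, label is 0 or 1.
    Returns {feature_name: weight} for this partition only.
    """
    weights = {}
    for _ in range(epochs):
        for features, label in rows:
            z = sum(weights.get(f, 0.0) * v for f, v in features.items())
            pred = 1.0 / (1.0 + math.exp(-z))
            err = label - pred
            for f, v in features.items():
                weights[f] = weights.get(f, 0.0) + lr * err * v
    return weights


def average_models(partitions):
    """Train each partition independently, then average weights per feature in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE partial_model (feature TEXT, weight REAL)")
    for rows in partitions:  # each iteration plays the role of one map task
        partial = train_partition(rows)
        conn.executemany(
            "INSERT INTO partial_model VALUES (?, ?)", list(partial.items())
        )
    # Same shape as the outer query on the slide: GROUP BY feature, avg(weight).
    cur = conn.execute(
        "SELECT feature, avg(weight) AS weight FROM partial_model GROUP BY feature"
    )
    model = dict(cur.fetchall())
    conn.close()
    return model


def predict(model, features):
    """Score one example with the averaged weights; returns a probability."""
    z = sum(model.get(f, 0.0) * v for f, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Toy data (assumed): the label is 1 when "x" is positive; "bias" is constant.
partitions = [
    [({"x": 1.0, "bias": 1.0}, 1), ({"x": -1.0, "bias": 1.0}, 0)],
    [({"x": 2.0, "bias": 1.0}, 1), ({"x": -2.0, "bias": 1.0}, 0)],
]
model = average_models(partitions)
```

Calling `predict(model, {"x": 1.5, "bias": 1.0})` scores a new example with the averaged weights. The averaging step is an ordinary aggregate query, which is why it can run where the data already lives, in the spirit of slide 12.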
