
Introduction to Automatic Machine Learning


How can you bring machine learning to the masses? Machine learning projects struggle with finding talent, with the time it takes to build and deploy models, and with trusting the models that are built.

How can multiple teams in your organization build accurate ML models without being experts in data science or machine learning?

Wondering about the different flavors of AutoML?

H2O Driverless AI packages the techniques of expert data scientists in an easy-to-use application that helps scale your data science efforts. Driverless AI lets data scientists work on projects faster, using automation and state-of-the-art GPU computing power to accomplish in minutes tasks that used to take months.

With H2O Driverless AI, everyone, including expert and junior data scientists, domain scientists, and data engineers, can develop trusted machine learning models. This next-generation machine learning platform offers unique and advanced functionality for data visualization, feature engineering, model interpretability, and low-latency deployment.

H2O Driverless AI provides:

* Automatic data visualization
* Automatic feature engineering at Grandmaster level
* Automatic model selection
* Automatic model tuning and training
* Automatic parallelization using multiple CPUs or GPUs
* Automatic model ensembling
* Automatic machine learning interpretability (MLI)
* Automatic scoring code generation

Want to try it yourself? You can get a free trial here: H2O Driverless AI trial.

Come to this session and learn how to get started with automatic machine learning using H2O Driverless AI, and build powerful models with just a few clicks.

See you soon!
About H2O.ai
H2O.ai is a visionary Silicon Valley open source software company that has created and reimagined what is possible. We are a company of makers who brought new platforms and technologies to market to drive the AI movement. We are the creators of H2O, the leading open source data science and machine learning platform, used by nearly half of the Fortune 500 and trusted by more than 14,000 organizations and hundreds of thousands of data scientists around the world.

Published in: Software


  1. 1. Introduction to Automatic Machine Learning. Meetup “AI to do AI”. Rafael Coss (Rafael.Coss@h2o.ai), Director of Community, @racoss @h2oai; Chris Carpenter (Chris.Carpenter@h2o.ai); Leobardo Morales (lmorales@mx1.ibm.com)
  2. 2. H2O.ai Meetup Groups Contact Rafael Coss community@h2o.ai If you want to … - Give a talk about AI / machine learning use case (it is a great opportunity to promote your work) - Host a joint meetup with H2O.ai https://www.meetup.com/pro/h2oai
  3. 3. H2O.ai Community Slack Workspace •Join the H2O.ai Community Slack Workspace today! •https://www.h2o.ai/community/driverless-ai-community/#chat •Use emoji to tag messages •:question :use_case :mli :get_started :bugs … •Reply to message using threads •Check out Community Guide for more info: •https://tinyurl.com/hac-community-guide Online Chat to ask questions, discuss use cases, give feedback and more
  4. 4. H2O WORLD SAN FRANCISCO February 4-5, 2019 Hilton San Francisco Union Square world.h2o.ai
  5. 5. AI is Transforming the IT Industry. “AI is the fastest growing workload on the planet”: 300% increase in AI spend year over year. “Demand for AI Talent on the Rise”: 200% increase in jobs requiring AI skills. “Businesses are preparing for the widespread adoption of machine learning”: 9/10 CIOs planning to use machine learning.
  6. 6. H2O.ai HQ Mountain View
  7. 7. H2O.ai Company Overview. Company: Founded in Silicon Valley in 2012; Series C; Investors: Wells Fargo, NVIDIA, Nexus Ventures, Paxion Ventures. Products: • H2O Open Source Machine Learning (14,000 organizations) • H2O Driverless AI – Automatic Machine Learning. Leadership: Market Leader recognized by Gartner, Forrester, InfoWorld, Constellation Research. Team: 130+ AI experts (Kaggle Grandmasters, Distributed Computing and Visualization experts). Global: Mountain View, London, Prague, Chennai
  8. 8. World's largest data science community (over 2 million members). AI and ML education. Best known for AI competitions. Public datasets. Code and analysis sharing. http://www.kaggle.com Grandmasters (their highest ranking): 1st, 4th, 48th, 33rd, 25th, 13th
  9. 9. CONFIDENTIAL. Growing Worldwide Open Source Community: 14,000 companies using H2O; 155,000 data scientists; 130K meet-up members; H2O World NYC, London, SF
  10. 10. “Confidential and property of H2O.ai. All rights reserved” Partner Ecosystem: Strategic Partners, Cloud Providers, HW Vendors, System Integrators, Value Added Resellers, Data Stores
  11. 11. H2O.ai Product Suite GPU-accelerated machine learning package Automatic feature engineering, machine learning and interpretability • 100% open source – Apache V2 licensed • Built for data scientists – interface using R, Python on H2O Flow (interactive notebook interface) • Enterprise Support subscriptions • Built for domain users, analysts and data scientists – GUI based interface for end-to-end data science • Fully automated machine learning from ingest to deployment • Licensed on a per seat basis (annual subscription) Open Source In-memory, distributed machine learning algorithms with H2O Flow GUI -3 H2O AI open source engine integration with Spark
  12. 12. H2O.ai is a Recognized Leader in AI and ML 2018 Gartner Magic Quadrant for Data Science and Machine Learning Platforms Forrester Wave: Notebook-Based Predictive Analytics And Machine Learning Solutions, Q3 2018 Top 3 Artificial Intelligence (AI) and Machine Learning (ML) Software Solution “Technology leadership … with a distinguished vision” “the quasi-industry standard” “its vision of creating an AI and ML tool that ultimately aims to allow almost everyone within the business to create their own predictive models” “H2O.ai’s future is automated machine learning” “its bright future is in Driverless AI”
  13. 13. Highly Regarded by Customers Dr. Robert Coop AI and ML Manager Stanley Black & Decker “H2O Driverless AI feature engineering is better than anything I've seen out there right now. And the scoring pipeline generation is probably one of the bigger pluses for me. These features alone have provided us with a true competitive edge in agile manufacturing. It's a massive time saver.”
  14. 14. 1st Generation Automatic Machine Learning
  15. 15. What is Data Science? Problem Formulation: • Identify a data task or prediction problem • Collect relevant data. Data Processing: • Clean, transform, filter, aggregate, impute • Convert into X and Y. Machine Learning: • Train models • Evaluate models
  16. 16. The Data Science Venn Diagram Drew Conway (2010)
  17. 17. The Data Scientist “Unicorn”
  18. 18. What is Automatic Machine Learning? “the automated process of algorithm selection, feature generation, hyperparameter tuning, iterative modeling, and model assessment.” Enabled by advances in computing power at lower cost that make it possible for machines to try thousands of possible combinations to find the best one.
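As a rough illustration of the "try many combinations and keep the best one" idea on this slide, the sketch below evaluates a few candidate settings of a toy k-nearest-neighbours regressor on a hold-out split and ranks them on a leaderboard. It covers only the hyperparameter search and model assessment parts of the definition, and it is not H2O's implementation: the function names (`knn_predict`, `auto_select`) and the data are made up for the example.

```python
import random

def knn_predict(train, x, k):
    """Predict y for x as the mean target of the k nearest training rows (one feature)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, valid, k):
    """Mean squared error of the k-NN predictor on the validation rows."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in valid) / len(valid)

def auto_select(rows, candidate_ks, seed=0):
    """Score every candidate k on a hold-out split; return (error, k) pairs, best first."""
    rng = random.Random(seed)
    rows = rows[:]                      # copy so the caller's list is not shuffled
    rng.shuffle(rows)
    cut = int(0.8 * len(rows))          # 80% train, 20% validation
    train, valid = rows[:cut], rows[cut:]
    return sorted((mse(train, valid, k), k) for k in candidate_ks)

# Hypothetical toy data: y = 2x with a small deterministic wobble
data = [(x / 10, 2 * x / 10 + (0.05 if x % 2 else -0.05)) for x in range(50)]
leaderboard = auto_select(data, candidate_ks=[1, 3, 5, 15])
best_error, best_k = leaderboard[0]
print("best k:", best_k, "validation MSE:", round(best_error, 4))
```

A real AutoML system would also search over algorithms and generated features, and would normally use cross-validation rather than a single split.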
  19. 19. The Evolving Space of Automatic Machine Learning. First Gen (01, 2014-15): open source model showdown with feature encoding, automatic hyperparameter tuning, ensembles and model leaderboard. Second Gen (02, 2017-18): HPC-powered evolutionary model development with advanced feature engineering and extensive model explainability.
  20. 20. Challenges in AI Model Development. Stages: Basic Encoding, Feature Generation, Advanced Encoding, Feature Engineering, Algorithm Selection, Parameter Tuning, Model Building, Model Ensembles, Pipeline Generation, Model Explainability, Model Deployment, Model Documentation. • Time consuming • Requires advanced skillset • Creating new feature combinations requires advanced skill • Time consuming • Requires advanced knowledge of algorithms and parameters • Creating ensembles is an advanced skill • Time consuming • Requires different set of skills to deploy models • Explaining how models make decisions is critical to building trust with business stakeholders and regulators. The entire process is highly iterative and can take weeks or months to develop a single production-ready model.
  21. 21. H2O AutoML
  22. 22. Different Flavors of AutoML https://www.h2o.ai/blog/the-different-flavors-of-automl
  23. 23. The Challenges of Enterprise AI Adoption. Time to Insights: slow; weeks to months for a data scientist to build a model. Lack of AI Talent: ~100 Data Science “Grandmasters” in the world; “US alone faces a shortage of 190,000 people with analytical expertise.” Lack of Trust in AI: black box models.
  24. 24. 2nd Generation Automatic Machine Learning
  25. 25. Challenges in AI Model Development. Stages: Basic Encoding, Feature Generation, Advanced Encoding, Feature Engineering, Algorithm Selection, Parameter Tuning, Model Building, Model Ensembles, Pipeline Generation, Model Explainability, Model Deployment, Model Documentation. • Time consuming • Requires advanced skillset • Creating new feature combinations requires advanced skill • Time consuming • Requires advanced knowledge of algorithms and parameters • Creating ensembles is an advanced skill • Time consuming • Requires different set of skills to deploy models • Explaining how models make decisions is critical to building trust with business stakeholders and regulators. The entire process is highly iterative and can take weeks or months to develop a single production-ready model.
  26. 26. Why Next Generation Automatic Machine Learning for the Enterprise Time to Insight Months down to Hours 7 Kaggle Grandmasters Top 10 Data Science Experts Automated GPU-accelerated ML with IBM AC922 Explainability & Transparency Trust In AI
  27. 27. Supervised Learning
  28. 28. Problems Addressed by Driverless AI • Supervised Learning: Regression, Classification • Tabular Structured Data: Numeric, Categorical, Time / Date, Text, Missing Values • Independent and identically distributed (i.i.d.) rows • Time-series: single time-series; grouped time-series (e.g. Store - Department - Item); time-series with gaps between training and test set to account for time to deploy
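The last bullet, a gap between the training and test periods to account for deployment time, can be sketched as a simple date-based split. This is an illustrative helper only: `split_with_gap` and its parameters are hypothetical names, not part of Driverless AI.

```python
from datetime import date, timedelta

def split_with_gap(rows, train_end, gap_days, test_days):
    """Split (date, value) rows into train/test, leaving gap_days unused after train_end."""
    test_start = train_end + timedelta(days=gap_days + 1)
    test_end = test_start + timedelta(days=test_days - 1)
    train = [r for r in rows if r[0] <= train_end]
    test = [r for r in rows if test_start <= r[0] <= test_end]
    return train, test

# Hypothetical daily series starting on 2019-01-01
start = date(2019, 1, 1)
rows = [(start + timedelta(days=i), i) for i in range(60)]

# Train through January, skip one week (time to deploy), test on the next two weeks
train, test = split_with_gap(rows, train_end=date(2019, 1, 31), gap_days=7, test_days=14)
print(train[-1][0], "->", test[0][0])
```

Evaluating on a window that starts after the gap mimics how the model will be used once it is actually in production.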
  29. 29. H2O Driverless AI – Simple, Fast, Accurate, Interpretable Easy Deployment for Low Latency Models • Stand-alone scoring pipeline that is easy for IT to deploy and manage • Easy to update when a new model version is available • Streamlined scoring code to deploy on any device: on the edge, mobile, … • Very fast (milliseconds) to satisfy today’s real-time apps Fast and Accurate Results • “Data Scientist in a Box” • Simple interface • Automatic feature engineering to increase accuracy • Automatic recipes for solving wide variety of use-cases • Automatic tuning to find and tune the right ensemble of models Industry Leading Interpretability • Trusted results with explainability and transparency • Interpretability for debugging, not just for regulators • Get reason codes and model interpretability in plain English • K-Lime, LOCO, partial dependence and more Automatic Data Visualization • Automatic generation of visualizations and graphs to explore your data before the model-building process • Most relevant graphs shown for the given data set • Identify outliers and missing values
  30. 30. H2O Driverless AI Customer Use Cases
  31. 31. H2O Driverless AI Delivers Value in Every Industry. Financial Services: matched 10 years of machine learning expertise, +6% accuracy; “Driverless AI is giving amazing results in terms of feature and model performance” (Venkatesh Ramanathan, Sr. Data Scientist, PayPal). Healthcare: increased customer satisfaction, near perfect scores; “Driverless AI powers our data science team to operate at scale. We have the opportunity to impact care at large.” (Bharath Sudarshan, Dir. of Data Science, ArmadaHealth). Marketing: outperforms alternative digital marketing, 2.5x performance; “Driverless AI helped us gain an edge for our clients. AI to do AI, truly is improving our system on a daily basis.” (Martin Stein, Chief Product Officer, G5). Manufacturing: accurately predicting supplies & materials for future orders, 25% time savings; “H2O Driverless AI feature engineering is better than anything I've seen out there right now.” (Robert Coop, Sr. Data Scientist, SB&D).
  32. 32. www.h2o.ai/customer-stories/ www.h2o.ai/company/news/h2o-ai-ibm-vision-banco-machine-learning
  33. 33. Financial Fraud Detection “Driverless AI is giving amazing results in terms of feature and model performance” Venkatesh Ramanathan, Senior Data Scientist, PayPal • Driverless AI matched 10 years of expert feature engineering • Increased accuracy from 0.89 to 0.947 (6%) in detecting fraudulent activity • 6X speed up when running on an IBM Power GPU-based server
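The "6%" figure can be read in more than one way, and the slide does not say which metric or which reading is meant. The short calculation below works out the common interpretations from the two numbers given (0.89 and 0.947); the variable names are only for illustration.

```python
baseline = 0.89
improved = 0.947

absolute_gain = improved - baseline                 # difference in score points
relative_gain = absolute_gain / baseline            # gain relative to the baseline score
error_reduction = absolute_gain / (1 - baseline)    # share of the remaining error removed

print(f"absolute gain:   {absolute_gain:.3f}")
print(f"relative gain:   {relative_gain:.1%}")
print(f"error reduction: {error_reduction:.1%}")
```

The absolute gain is about 5.7 points and the relative gain about 6.4%, so either reading rounds to the 6% on the slide. Measured against the remaining error (0.11), the same change removes roughly half of it.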
  34. 34. Connecting Patients to Specialists for Better Healthcare • Companies have seen “skyrocketing” net promoter scores and “near perfect” customer satisfaction rates • Customer loyalty and premium retention rates have increased • Reduces costs, while patients receive care faster “Driverless AI powers our data science team to operate efficiently and experiment at scale. With this latest innovation, we have the opportunity to impact care at large.” Bharath Sudarshan Director of Data Science and Innovation Armada Health
  35. 35. Marketing Optimization for the Real Estate Market “Driverless AI helped us gain an edge with our Intelligent Marketing Cloud for our clients. AI to do AI, truly is improving our system on a daily basis.” Martin Stein Chief Product Officer • Outperforms other real estate digital marketing solutions by 2.5X • A G5 client saved $500K annual digital spend while increasing web traffic 3X • 10X faster model creation
  36. 36. Improve Manufacturing Sales and Forecasting “H2O Driverless AI feature engineering is better than anything I've seen out there right now. And the scoring pipeline generation is probably one of the bigger pluses for me. It's a massive time saver.” Robert Coop Sr. Data Scientist Stanley Black & Decker • Time savings of 25% with 1 data scientist • Saved 1 month of time in model tuning and training for industrial product line • Accurately predicted supplies and materials for a future client order increasing forecast accuracy
  37. 37. IBM & H2O Driverless AI Simplifying and Accelerating Enterprise AI Initiatives
  38. 38. H2O Driverless AI Benefits from the Power Systems Advantage. High Speed Data Transfer: 9.5x max I/O bandwidth. Big Data Scale: 2.6x more RAM. GPU Accelerated ML: 30x faster on GPUs. Integrated Systems Approach.
  39. 39. H2O Driverless AI on IBM Power Systems A Winning Combination High Speed Data Transfer 1.5x Big Data Scale 2x Data Ingest Faster Feature Engineering GPU Accelerated ML Time Series 5x Integrated Systems Approach
  40. 40. PowerAI Deep Learning Impact (DLI) Module Data & Model Management, ETL, Visualize, Advise IBM Spectrum Conductor with Spark Cluster Virtualization, Auto Hyper-Parameter Optimization PowerAI: Open Source ML Frameworks Large Model Support (LMS) Distributed Deep Learning (DDL) Auto ML PowerAI Enterprise PowerAI Vision Auto-DL for Images & Video Label Train Deploy Accelerated Infrastructure Accelerated Servers Storage AI for Data Scientists and non-Data Scientists H2O Driverless AI Auto-ML for Text & Numeric Data, NLP Import Experiment Deploy
  41. 41. H2O Driverless AI Complements IBM PowerAI Vision Sensors Log Transactional IBM PowerAI delivers Deep Learning for Images H2O Driverless AI is Automatic Machine Learning NLP
  42. 42. The H2O Driverless AI Experience
  43. 43. Driverless AI: Automates Data Science and Machine Learning Workflows Driverless AI
  44. 44. H2O Driverless AI: How it Works (powered by GPU acceleration). 1 Drag and drop data: ingest data from cloud, big data and desktop systems (Local, Amazon S3, HDFS, Google BigQuery, Azure Blob Storage, Snowflake) into a modelling dataset (X, Y). 2 Automatic Visualization: understand the data shape, outliers, missing values, etc. 3 Automatic Machine Learning: use best practice model recipes and the power of high performance computing to iterate across thousands of possible models, including advanced feature engineering and parameter tuning (Advanced Feature Engineering + Algorithm + Model Tuning, “Survival of the Fittest”). Model recipes: • IID data • Time-series • More on the way. 4 Automatic Scoring Pipelines: deploy ultra-low latency Python or Java Automatic Scoring Pipelines that include feature transformations and models. Also produced: Machine Learning Interpretability, Model Documentation, low-latency scoring deployed to production.
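The "Survival of the Fittest" step on this slide describes an evolutionary search. A generic, much simplified version of that idea is sketched below: candidate feature subsets are scored, the better half survives, and mutated copies replace the rest. This is a textbook-style toy, not the Driverless AI algorithm; `evolve`, `toy_fitness` and the `USEFUL` set are invented for the example, and a real fitness function would be a validation score of a trained model.

```python
import random

def evolve(n_features, fitness, generations=20, population=12, seed=0):
    """Evolve 0/1 feature masks: keep the fitter half, refill with mutated copies."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: population // 2]          # survival of the fittest
        children = []
        for parent in survivors:
            child = list(parent)
            flip = rng.randrange(n_features)        # mutate one randomly chosen feature flag
            child[flip] = 1 - child[flip]
            children.append(tuple(child))
        pop = survivors + children
    return max(pop, key=fitness)

# Hypothetical stand-in for a model score: reward useful features, penalise noise features
USEFUL = {0, 2, 5}

def toy_fitness(mask):
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in USEFUL)
    noise = sum(1 for i, bit in enumerate(mask) if bit and i not in USEFUL)
    return hits - 0.5 * noise

best = evolve(n_features=8, fitness=toy_fitness)
print("selected features:", [i for i, bit in enumerate(best) if bit])
```

Because the survivors are carried over unchanged, the best score in the population never gets worse from one generation to the next.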
  45. 45. The Driverless AI Experience 1. Import Data 2. Review Auto-Visualizations 3. Start Experiment 4. Review Winning Model 5. Review Model Interpretations 6. Deploy Model
  46. 46. The Driverless AI Experience 1. Import Data 2. Review Auto-Visualizations 3. Start Experiment 4. Review Winning Model 5. Review Model Interpretations 6. Deploy Model
  47. 47. The Driverless AI Experience 2. Review Auto-Visualizations
  48. 48. The Driverless AI Experience Quickly start an experiment and benefit from built-in automation: 1. Import Data 2. Review Auto-Visualizations 3. Start Experiment 4. Review Winning Model 5. Review Model Interpretations 6. Deploy Model • Feature Engineering • Model Tuning • Model Selection
  49. 49. The Driverless AI Experience 3. Start Experiment Feature Engineering Model Tuning Quickly Start Experiment Model Selection
  50. 50. It’s Easy to Start an Experiment. These are the only required settings – all others are optional depending on the scenario. Dataset being used to train the models. What column are we trying to predict? Should certain rows of data have a higher weight? Data used to calculate metrics for the final model; not used during training. Is this a time-series forecasting exercise? Columns to exclude from experiment. Data used for parameter tuning.
  51. 51. Experiment Settings. Time: • Relative time for completing the experiment • Higher settings mean: more iterations are performed to find the best set of features; longer “early stopping” threshold. Accuracy: • Relative accuracy: higher values should lead to higher confidence in model performance (accuracy) • Impacts things such as level of data sampling, how many models are used in the final ensemble, and parameter tuning level, among others. Interpretability: • Relative interpretability: higher values favor more interpretable models • The higher the interpretability setting, the lower the complexity of the engineered features and of the final model(s).
  52. 52. Auto Feature Generation: Kaggle Grandmaster Out of the Box. Feature Transformations: • Automatic Text Handling • Frequency Encoding • Cross Validation Target Encoding • Truncated Singular Value Decomposition • Clustering and more. Examples of Original Features. Examples of Generated Features.
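Two of the transformations named on this slide, frequency encoding and cross-validation target encoding, are easy to show on a tiny example. The sketch below is a plain-Python illustration of the general techniques, not the Driverless AI transformers; the function names and the `cities` / `defaulted` columns are made up.

```python
from collections import Counter, defaultdict

def frequency_encode(values):
    """Replace each category with the fraction of rows that share it."""
    counts = Counter(values)
    return [counts[v] / len(values) for v in values]

def cv_target_encode(values, targets, n_folds=2):
    """Out-of-fold target encoding: each row gets the category's mean target
    computed only from rows in the other folds, which limits target leakage."""
    n = len(values)
    prior = sum(targets) / n                 # fallback for categories unseen in other folds
    encoded = [0.0] * n
    for fold in range(n_folds):
        sums, counts = defaultdict(float), defaultdict(int)
        for i in range(n):
            if i % n_folds != fold:          # statistics from the other folds only
                sums[values[i]] += targets[i]
                counts[values[i]] += 1
        for i in range(n):
            if i % n_folds == fold:
                v = values[i]
                encoded[i] = sums[v] / counts[v] if counts[v] else prior
    return encoded

# Hypothetical categorical column and binary target
cities = ["SF", "NY", "SF", "SF", "NY", "LA"]
defaulted = [1, 0, 0, 1, 1, 0]

freq = frequency_encode(cities)
te = cv_target_encode(cities, defaulted)
print(freq)
print(te)
```

With only six rows the out-of-fold means are very noisy; real implementations typically use more folds and some smoothing toward the prior.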
  53. 53. The Driverless AI Experience 1. Import Data 2. Review Auto-Visualizations 3. Start Experiment 4. Review Winning Model 5. Review Model Interpretations 6. Deploy Model
  54. 54. The Driverless AI Experience 4. Review Winning Model
  55. 55. The Driverless AI Experience 1. Import Data 2. Review Auto-Visualizations 3. Start Experiment 4. Review Winning Model 5. Review Model Interpretations 6. Deploy Model
  56. 56. Live Demo 16th (of 2926)
  57. 57. http://h2o.ai 21-day free trial Easy installation: Native and Dockerized deployment options
  58. 58. H2O.ai in the Cloud: EMR, KubeFlow, Dataproc
  59. 59. H2O Driverless AI on the Cloud • Easy setup on any cloud or on premise. Support for Azure, AWS and Google Cloud with marketplace offerings. • Develop more models with H2O Driverless AI automatic machine learning, which uses high-performance computing and evolutionary algorithms to perform time-consuming data science tasks like feature engineering and model hyperparameter tuning. • Leverage your existing ML workbench to create and deploy streamlined production models based on insights from Driverless AI
  60. 60. H2O Driverless AI Delivers Automatic ML for the Enterprise 21 day free trial for Driverless AI • Performs the function of an expert data scientist • Create models quickly with GPUs and Machine Learning automation • Delivers insights and interpretability • Created and supported by world renowned AI experts from H2O.ai • Award-winning software
  61. 61. Getting Started • Get the 21 day free trial for Driverless AI • Don’t have the hardware? Try the Qwiklab cloud training environment • Go to your favorite cloud: AWS, Azure, Google • Try the video tutorial or follow the booklet • Learn how Driverless AI delivers Trust & Explainable AI • Learn more about NLP and Time-Series in Driverless AI • Watch replays from H2O World London 2018 • Watch “Democratizing Intelligence” by Sri Ambati, CEO & Founder • Learn how PayPal is solving fraud with Driverless AI • Docs • H2O Community Slack
  62. 62. Thank you. Rafael Coss (Rafael.Coss@h2o.ai), Director of Community, @racoss @h2oai; Chris Carpenter (Chris.Carpenter@h2o.ai); Leobardo Morales (lmorales@mx1.ibm.com)
  63. 63. Additional sources of information • Docs: http://docs.h2o.ai • Slack Community (Public): register at https://www.h2o.ai/community/driverless-ai-community/#chat; guide: http://tinyurl.com/hac-community-guide; ask questions, discuss use cases, feedback, … • Driverless AI Trial: • Walkthrough: http://ibm.biz/H2O-DAI-Power-Video • Tutorials: https://www.youtube.com/watch?v=5jSU3CUReXY • Booklet: http://docs.h2o.ai/driverless-ai/latest-stable/docs/booklets/DriverlessAIBooklet.pdf
  64. 64. BACKUP SLIDES
  65. 65. Video Tutorial on YouTube https://www.youtube.com/watch?v=5jSU3CUReXY
  66. 66. Template
  67. 67. Hands-on Experiment: Credit Card Example
  68. 68. Credit Card Example • Dataset: • information on default payments, demographic factors, credit data, history of payment, etc. • Source: www.kaggle.com/uciml/default-of-credit-card-clients-dataset • File System: • CreditCard-train.csv (for training models) • CreditCard-test.csv (for making new predictions) • Our Goal: • Predict whether someone will default on their credit card payment. • Tutorial: • http://docs.h2o.ai/driverless-ai/latest-stable/docs/booklets/DriverlessAIBooklet.pdf • http://docs.h2o.ai/driverless-ai/latest-stable/docs/booklets/MLIBooklet.pdf
  69. 69. Credit Card Example
  70. 70. Credit Card Example
  71. 71. Learning from Credit Card Data. Features: Education, Marriage, Age, Sex, Repayment Status, Limit Balance ... Learn the pattern. Target: Default Payment Next Month (Binary). Predictions: Probability (0...1)
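The setup on this last slide (features in, a binary target, and a predicted probability between 0 and 1 out) is the classic binary classification problem. As a minimal sketch of that pattern, the code below fits a small logistic regression by gradient descent on made-up rows. It does not use the Kaggle credit card file or Driverless AI, and the two features are hypothetical stand-ins for columns such as repayment status.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of default (between 0 and 1) for one feature row."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights with batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi      # prediction error for this row
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Made-up rows: [months of delayed repayment, share of credit limit used]
X = [[0, 0.1], [0, 0.3], [1, 0.2], [2, 0.8], [3, 0.9], [4, 0.7]]
y = [0, 0, 0, 1, 1, 1]                              # 1 = default next month

w, b = train_logistic(X, y)
low_risk = predict_proba(w, b, [0, 0.2])
high_risk = predict_proba(w, b, [3, 0.8])
print(round(low_risk, 3), round(high_risk, 3))
```

A tool like Driverless AI automates the steps this toy skips: feature engineering, model selection, tuning and validation.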
