In this virtual meetup, we give an introduction to H2O-3, the number 1 open source machine learning platform, and show you how you can use it to build models for different use cases.
Introduction to Machine Learning with H2O-3
1. 11
Machine Learning at Scale
Sept 29th, 2020
Franklin Velasquez
Technical Marketing Engineer and Academic Program Manager
https://www.linkedin.com/in/franklin-velasquez-alvarenga-260827183/
franklin.alvarenga@h2o.ai
Introduction to Machine Learning with H2O-3
2. 2
H2O.ai is the open source leader in AI
and Machine Learning
Democratize AI for Everyone
4. 4
We are Established
• Founded in Silicon Valley 2012
• Funding: $147M | Series D
• Investors: Goldman Sachs, Ping An, Wells Fargo, NVIDIA, Nexus Ventures
We Make World-class AI Technology
• H2O Open Source Machine Learning
• H2O Driverless AI: Automatic Machine Learning
• H2O Q: AI platform for business users
We are Global
• Mountain View, NYC, London, Paris, Ottawa, Prague, Chennai, Singapore
240 1K
20K 180K
Universities
Companies Using
H2O Open Source
Meetup Members
Best AI Team
H2O.ai Snapshot
We are Passionate about Customers
• 4X customers, 2 years, all industries, all continents
• Aetna/CVS, Allergan, AT&T, Capital One, CBA, Citi, Coca Cola, Bradesco, Disney, Franklin Templeton, Genentech, Kaiser Permanente, Lego, Merck, Pepsi, Reckitt Benckiser, Roche
5. 5
H2O.ai Spans Industries and Use Cases
Financial Services
Wholesale / Commercial Banking
• Know Your Customers (KYC)
• Anti-Money Laundering (AML)
Card / Payments Business
• Transaction frauds
• Collusion fraud
• Real-time targeting
• Credit risk scoring
• In-context promotion
Retail Banking
• Deposit fraud
• Customer churn prediction
• Auto-loan
Healthcare and Life Science
• Early cancer detection
• Product recommendations
• Personalized prescription matching
• Medical claim fraud detection
• Flu season prediction
• Drug discovery
• ER and hospital management
• Remote patient monitoring
• Medical test predictions
Telecom
• Predictive maintenance
• Avoidable truck-rolls
• Customer churn prediction
• Improved customer viewing experience
• Master data management
• In-context promotions
• Intelligent ad placements
• Personalized program recommendations
Marketing and Retail
• Funnel predictions
• Personalized ads
• Fraud detection
• Next best offer
• Next best action
• Customer segmentation
• Customer churn
• Customer recommendations
• Ad predictions and fraud
Save Time. Save Money. Gain a Competitive Edge.
6. 66
Our Team is Made up of the World’s Leading Data Scientists
Your projects are backed by 10% of the World’s Data Science Grandmasters and a Team of Experts who are relentless in solving your critical problems.
7. 7
Gartner 2020: H2O.ai is a Visionary in Two MQs
2020 Cloud AI for Developer Services MQ (new MQ for 2020)
Strengths:
1. Automation
2. Ease of Use & Explainability
3. Excellent Customer Support
2020 Data Science and Machine Learning MQ
Named a Visionary, with the strongest “Completeness of Vision” in the entire quadrant
Strengths:
1. Automation
2. Explainability
3. High-Performance ML Components
8. 8
The H2O.ai Platform
H2O Open Source
• In-memory, distributed machine learning algorithms with H2O Flow GUI
• H2O open source engine integration with Spark
• 100% open source – Apache V2 Licensed
• Enterprise support subscriptions
• Interface using R, Python for ML training on massive datasets
H2O Driverless AI
• Automatic feature engineering, ML training and interpretability, from ingest to deployment
• Open and extensible AutoML
• User licenses on a per seat basis annually
• GUI-based interface, along with R & Python API, for end-to-end data science
H2O Q
• A new and innovative platform to make your own AI apps
• Rapid and easy SDK to build interactive, low latency AI apps
• Easy and intuitive platform to have AI answer your questions
H2O ModelOps
• Highly flexible and scalable model deployment and monitoring platform
• AI deployment platform built for DevOps and MLOps
• Scalable to support high throughput and low latency model scoring environments
• Comprehensive model monitoring
App Marketplace
10. 10
H2O Open Source AI Platform
Distributed in-memory machine learning with linear scalability
Rapid Model Deployment
• Highly portable models deployed in Java (POJO) and Model Object Optimized (MOJO)
• Automated and streamlined scoring service deployment with REST API
Scalability and Performance
• Distributed in-memory computing platform
• Distributed algorithms
Cloud Integration
Acceleration
Big Data Ecosystem
Open Source
Flexible Interface
Smart and Fast Algorithms
H2O Flow
100% open source
11. 11
H2O Machine Learning Features
• Supervised & unsupervised machine learning algorithms
– GBM, RF, DNN, GLM, Stacked Ensembles, AutoML, etc.
• Imputation, normalization and auto one-hot-encoding
• Automatic early stopping
• Automatic ML at scale
• Cross-validation, grid search and random search
• Variable importance, model evaluation metrics, plots
Algorithms shown: DRF, XRT, GBM, XGBoost, GLM, DNN, Stacked Ensemble
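To make the difference between grid search and random search on this slide concrete, here is a minimal standard-library sketch. It only enumerates hyperparameter candidates and does not train anything. The search space, the helper names `grid_candidates` and `random_candidates`, and the `max_models` budget are illustrative assumptions, not part of the slides. In H2O-3 itself this role is played by `H2OGridSearch`, whose `search_criteria` strategy can be `"Cartesian"` or `"RandomDiscrete"`.

```python
import itertools
import random


def grid_candidates(space):
    """Return every combination in the search space (Cartesian grid search)."""
    names = sorted(space)
    return [
        dict(zip(names, values))
        for values in itertools.product(*(space[name] for name in names))
    ]


def random_candidates(space, max_models, seed=0):
    """Return up to max_models distinct combinations, sampled without replacement."""
    grid = grid_candidates(space)
    rng = random.Random(seed)
    return rng.sample(grid, min(max_models, len(grid)))


# Hypothetical search space; the names mirror common tree-model settings.
space = {
    "ntrees": [50, 100, 200],
    "max_depth": [3, 5],
    "sample_rate": [0.7, 1.0],
}

full_grid = grid_candidates(space)
subset = random_candidates(space, max_models=5, seed=42)
print(len(full_grid), "grid candidates;", len(subset), "random candidates")
```

Grid search trains one model per entry of `full_grid`, so its cost grows multiplicatively with every added parameter. Random search caps the budget at `max_models` regardless of how large the space becomes.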
13. 13
H2O Distributed Computing
• Multi-node cluster with shared memory model
• All computations in memory
• Each node sees only some rows of the data
• No limit on cluster size
• Distributed data frames (collection of vectors)
• Columns are distributed (across nodes) arrays
• Works just like R’s data.frame or Python Pandas DataFrame
H2O Cluster
H2O Frame
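The bullets above describe a frame whose columns are arrays spread across nodes, with each node seeing only some rows. The toy sketch below illustrates that idea in plain Python: each `Node` holds a slice of every column, and a column mean is computed by combining per-node partial sums. The class and function names are hypothetical, and this is not how H2O is implemented internally. It only shows why a statistic such as a mean can be computed without any node holding the full data.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One cluster member holding a contiguous slice of rows for every column."""
    columns: dict = field(default_factory=dict)  # column name -> list of values

    def partial_sum_count(self, name):
        values = self.columns[name]
        return sum(values), len(values)


def distribute(frame, n_nodes):
    """Split each column of `frame` (dict of equal-length lists) row-wise across nodes."""
    n_rows = len(next(iter(frame.values())))
    chunk = -(-n_rows // n_nodes)  # ceiling division
    nodes = []
    for i in range(n_nodes):
        lo, hi = i * chunk, min((i + 1) * chunk, n_rows)
        nodes.append(Node({name: col[lo:hi] for name, col in frame.items()}))
    return nodes


def distributed_mean(nodes, name):
    """Combine per-node (sum, count) pairs into a global mean."""
    partials = [node.partial_sum_count(name) for node in nodes]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Made-up example data.
frame = {
    "age": [23, 35, 41, 52, 29, 60, 18],
    "income": [30.0, 52.5, 61.0, 80.0, 45.0, 90.5, 12.0],
}
nodes = distribute(frame, n_nodes=3)
print([len(node.columns["age"]) for node in nodes])
print(distributed_mean(nodes, "age"))
```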
14. 14
Python Interface Overview
Action | Pandas or scikit-learn | H2O
Reading data | pandas.read_csv(data_path) | h2o.import_file(data_path)
Summarizing data | pandas_frame.describe() | h2o_frame.describe()
Summary statistics | pandas_frame.mean() | h2o_frame.mean()
Combining rows | pandas.concat([frame1, frame2]) | h2o_frame.rbind(h2o_frame2)
Combining columns | pandas.concat([frame1, frame2], axis=1) | h2o_frame.cbind(h2o_frame2)
Data selection | pandas_frame.iloc[:, :] | h2o_frame[:, :]
Transforming columns | np.log(pandas_frame[x]); np.sqrt(pandas_frame[x]) | h2o_frame[x].log(); h2o_frame[x].sqrt()
Building Random Forest | model = RandomForestClassifier(n_estimators=100); model = model.fit(x_frame, y_frame) | model = H2ORandomForestEstimator(ntrees=100); model.train(x=x, y=y, training_frame=train_frame)
Model Prediction | model.predict | model.predict
Model Metrics | metrics.auc | metrics = model.model_performance(frame); metrics.auc()
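The H2O column of the comparison above can be strung together into one short workflow. The sketch below is a hedged example, not the presenter's code. It assumes a CSV file with a binary target column, and the function names `predictor_columns` and `train_random_forest`, the 80/20 split and the seed are illustrative choices. The `h2o` import sits inside the function so the snippet loads even where the package is not installed. Running it requires `h2o` and a Java runtime.

```python
def predictor_columns(all_columns, target):
    """Every column except the target is used as a predictor."""
    return [c for c in all_columns if c != target]


def train_random_forest(data_path, target, ntrees=100, seed=42):
    """Import a file into H2O, train a random forest, return (model, AUC on held-out rows)."""
    # Imported lazily so this module loads even where h2o is not installed.
    import h2o
    from h2o.estimators import H2ORandomForestEstimator

    h2o.init()  # start or connect to a local H2O cluster
    frame = h2o.import_file(data_path)
    frame[target] = frame[target].asfactor()  # categorical target -> classification
    train, valid = frame.split_frame(ratios=[0.8], seed=seed)

    model = H2ORandomForestEstimator(ntrees=ntrees, seed=seed)
    model.train(
        x=predictor_columns(frame.columns, target),
        y=target,
        training_frame=train,
    )

    performance = model.model_performance(valid)
    return model, performance.auc()  # AUC is defined for binary targets only
```

A call would look like `model, auc = train_random_forest("data.csv", target="label")`, where both the file name and the column name are placeholders for your own data.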
16. 16
Reading Data into H2O with Python
STEP 2 (diagram steps 2.1 to 2.4)
• Python function call: h2o.import_file()
• HTTP call to H2O cluster
• Initiate distributed ingest
• data.csv is read from HDFS/S3/Local File/URL into the H2O Cluster
17. 17
Reading Data into H2O with Python
STEP 3 (diagram steps 3.1 to 3.4)
• data.csv from HDFS/S3/Local File/URL
• Distributed H2O Frame in the H2O Cluster
• Return pointer to dataframe
• h2o_df object created in Python: Cluster IP, Cluster Port, Pointer to Data
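The two diagrams above make one point that is easy to miss: after the import, the Python object is only a pointer (cluster IP, cluster port, frame key), while the rows live in the cluster. The toy simulation below illustrates that split. Everything in it is an assumption for illustration: `ToyCluster` is an in-process stand-in (the real client talks to the cluster over HTTP), the `storage` dict stands in for HDFS/S3/local files, and rows are dealt round-robin purely for simplicity.

```python
from dataclasses import dataclass
import itertools


@dataclass(frozen=True)
class FrameHandle:
    """What the client keeps: where the cluster is and which frame to ask for."""
    cluster_ip: str
    cluster_port: int
    frame_key: str


class ToyCluster:
    """In-process stand-in for a cluster; a real H2O cluster receives HTTP requests."""

    def __init__(self, ip, port, n_nodes, storage):
        self.ip, self.port = ip, port
        self.storage = storage  # path -> rows; stands in for HDFS/S3/local files
        self.node_rows = [[] for _ in range(n_nodes)]  # rows held by each node
        self.frames = {}  # frame key -> row count
        self._ids = itertools.count(1)

    def handle_import(self, path):
        """Distributed ingest: the cluster reads the file and spreads rows over nodes."""
        rows = self.storage[path]
        for i, row in enumerate(rows):
            self.node_rows[i % len(self.node_rows)].append(row)
        key = f"frame_{next(self._ids)}"
        self.frames[key] = len(rows)
        return key


def import_file(cluster, path):
    """Client-side call: send the path, get back a pointer instead of the data."""
    key = cluster.handle_import(path)  # stands in for the HTTP call to the cluster
    return FrameHandle(cluster.ip, cluster.port, key)


# Made-up file contents; 54321 is H2O's default port.
storage = {"data.csv": [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)]}
cluster = ToyCluster("127.0.0.1", 54321, n_nodes=3, storage=storage)
h2o_df = import_file(cluster, "data.csv")
print(h2o_df)
print([len(rows) for rows in cluster.node_rows])
```

Because the handle carries no rows, later operations on it have to be sent to the cluster as requests and run there, not inside the Python process.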