1. Edge to AI: Analytics from Edge to Cloud with
Efficient Movement of Machine Data
TIMOTHY SPANN, Field Engineer, Cloudera
JOHN KUCHMEK, Solutions Engineer, Cloudera
2. © Cloudera, Inc. All rights reserved.
DISCLAIMER
The information in this document is proprietary to Cloudera. No part of this document may be reproduced,
copied or transmitted in any form for any purpose without the express prior written permission of Cloudera.
This document is a preliminary version and not subject to your license agreement or any other agreement
with Cloudera. This document contains only intended strategies, developments and functionalities of
Cloudera products and is not intended to be binding upon Cloudera to any particular course of business,
product strategy and/or development. Please note that this document is subject to change and may be
changed by Cloudera at any time without notice.
Cloudera assumes no responsibility for errors or omissions in this document. Cloudera does not warrant
the accuracy or completeness of the information, text, graphics, links or other items contained within this
material. This document is provided without a warranty of any kind, either express or implied, including but
not limited to the implied warranties of merchantability, fitness for a particular purpose or non-infringement.
Cloudera shall have no liability for damages of any kind including without limitation direct, special, indirect
or consequential damages that may result from the use of these materials. The limitation shall not apply in
cases of gross negligence.
3. Introduction
Tim Spann has been running meetups in Princeton on Big Data technologies since 2015.
Tim has spoken at many international conferences on Apache NiFi, Deep Learning and
Streaming.
https://community.hortonworks.com/users/9304/tspann.html
https://dzone.com/users/297029/bunkertor.html
https://www.meetup.com/futureofdata-princeton/
https://dzone.com/articles/integrating-keras-tensorflow-yolov3-into-apache-ni
4. Introduction
John Kuchmek recently joined Cloudera. Previously, he was a data engineer and data scientist
at American Water, where he worked extensively with both NiFi and Hadoop.
https://dataworkssummit.com/san-jose-2018/session/bridging-the-gap-achieving-fast-data-synchronization-from-sap-hana-by-leveraging-hdp-hdf/
7. CLOUDERA FLOW MANAGEMENT
• Web-based user interface
• Highly configurable
• Out-of-the-box data provenance
• Designed for extensibility
• Secure
• NiFi Registry
• DevOps support
• FDLC
• Versioning
• Deployment
8. 300+ PROCESSORS FOR DEEPER ECOSYSTEM INTEGRATION
Hash
Extract
Merge
Duplicate
Scan
GeoEnrich
Replace
Convert
Split
Translate
Route Content
Route Context
Route Text
Control Rate
Distribute Load
Generate Table Fetch
Jolt Transform JSON
Prioritized Delivery
Encrypt
Tail
Evaluate
Execute
Fetch
HTTP
Syslog
Email
HTML
Image
HL7
FTP
UDP
XML
SFTP
AMQP
WebSocket
9. MINIFI EDGE AGENTS
• Edge data collection powered by MiNiFi
• MiNiFi: smaller footprint than NiFi
  • Guaranteed delivery
  • Data buffering
  • Prioritized queuing
  • Flow-specific QoS
  • Data provenance
  • Designed for extension
  • C++ / Java agents
  • TensorFlow support
• Designed for IoT
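The data buffering and prioritized queuing listed above can be illustrated with a small, self-contained sketch. This is not MiNiFi code or its API: `EdgeBuffer`, its capacity, and the priority values are hypothetical names chosen for illustration. It only shows the idea of holding readings while an uplink is unavailable and releasing the most important ones first.

```python
import heapq
import itertools


class EdgeBuffer:
    """Hypothetical bounded buffer that releases high-priority items first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []                # entries: (priority, sequence, payload)
        self._seq = itertools.count()  # tie-breaker keeps arrival order

    def offer(self, payload, priority):
        """Buffer a reading; a lower number means more important.

        Returns False when the buffer is full so the caller can apply
        back-pressure instead of silently losing data.
        """
        if len(self._heap) >= self.capacity:
            return False
        heapq.heappush(self._heap, (priority, next(self._seq), payload))
        return True

    def drain(self):
        """Yield buffered payloads, most important first."""
        while self._heap:
            _, _, payload = heapq.heappop(self._heap)
            yield payload


buffer = EdgeBuffer(capacity=3)
buffer.offer({"sensor": "temp", "value": 21.5}, priority=5)
buffer.offer({"sensor": "alarm", "value": "door open"}, priority=1)
buffer.offer({"sensor": "humidity", "value": 40}, priority=5)
accepted = buffer.offer({"sensor": "temp", "value": 21.6}, priority=5)

drained = list(buffer.drain())
```

Here the fourth reading is refused because the buffer is full, and the alarm is drained ahead of the routine readings even though it arrived second.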
11. MACHINE LEARNING AT CLOUDERA
Our philosophy
We empower our customers to run their business on data with an open platform:
• Your data
• Open algorithms
• Running anywhere
We accelerate enterprise data science
We help clients build their AI factory
12. OUR APPROACH
Modern enterprise platform, tools and expert guidance to help you unlock business value with ML/AI
• Agile platform to build, train, and deploy many scalable ML applications
• Enterprise data science tools to accelerate team productivity
• Expert guidance, services & training to fast-track value & scale
13. WE DELIVER AN ENTERPRISE DATA CLOUD
IoT, Ingest & Streaming | Data Engineering | Data Warehouse | Operational Database | Machine Learning
Catalog | Schema | Migration | Security | Governance
Hybrid | Cloud | Public | Multi-Cloud | Edge | Datacenter
14. MACHINE LEARNING IS BUILT ON DATA MANAGEMENT
We deliver an Enterprise Data Cloud for any data, anywhere, from the edge to AI
DataFlow & Streaming | Data Engineering | Data Warehouse | Operational Database | Machine Learning
Catalog | Schema | Migration | Security | Governance
Hybrid | Cloud | Public | Multi-Cloud | Edge | Datacenter
• Enterprise grade: Secure, performant and compliant
• Scalable: Elastic, cost-effective and lower TCO
• Runs anywhere: Public cloud, on-premises, multi, hybrid
15. PLATFORMS FOR INDUSTRIALIZED AI
End-to-end machine learning infrastructure for teams building at scale
DEVELOP: Make teams more productive
• Explore data
• Develop reports, pipelines, models
• Collaborate with peers
TRAIN: Scale resources efficiently
• Train models
• Tune parameters
• Track performance
DEPLOY: Manage pipelines + models
• Deploy models
• Automate pipelines
• Monitor performance
MANAGE: Run anywhere with a common architecture
• Manage access and resources
• Scale cost with usage
16. INDUSTRIALIZED AI REQUIRES LARGER DATA PLATFORM
Streaming Ingest | Batch Ingest | Machine Learning Tools | BI Tools and SQL Editors | Data Products
DATA, METADATA, SECURITY, GOVERNANCE, WORKLOAD MANAGEMENT
MACHINE LEARNING | DATA ENGINEERING | DATA WAREHOUSE | OPERATIONAL DATABASE
17. MACHINE LEARNING PHASES
Where to Connect to Apache NiFi
19. Training, Scoring and Monitoring
Speed of Data | Model Training | Model Scoring | Use Case
Batch | Batch | Batch | Batch Reporting, Analytics, Applications
 | | Online | DS Applications/Interactive Dashboards
Streaming | | In-stream | Streaming Applications
 | Incremental/Online | In-stream | Streaming Applications
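The incremental/online training with in-stream scoring case listed above can be sketched in a few lines. Everything here is an assumption for illustration: `OnlineLinearModel`, the learning rate, and the synthetic stream are hypothetical, and a real flow would use a proper ML library. The point is only the ordering: each record is scored as it arrives, then used to update the model.

```python
class OnlineLinearModel:
    """Tiny one-feature linear model updated one record at a time (SGD)."""

    def __init__(self, learning_rate=0.05):
        self.weight = 0.0
        self.bias = 0.0
        self.learning_rate = learning_rate

    def score(self, x):
        """In-stream scoring: predict for a single incoming record."""
        return self.weight * x + self.bias

    def learn(self, x, y):
        """Incremental training: one gradient step on squared error."""
        error = self.score(x) - y
        self.weight -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error


def synthetic_stream(passes=400):
    """Hypothetical record stream following y = 2x + 1."""
    for _ in range(passes):
        for x in (0.0, 0.5, 1.0, 1.5, 2.0):
            yield x, 2.0 * x + 1.0


model = OnlineLinearModel()
predictions = []
for x, y in synthetic_stream():
    predictions.append(model.score(x))  # score the record first
    model.learn(x, y)                   # then learn from it
```

By contrast, the batch rows of the table would train once on a stored dataset and keep the model fixed while scoring.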
21. CLOUDERA DATA SCIENCE WORKBENCH
Accelerate machine learning from research to production
For data scientists
• Experiment faster: Use R, Python, or Scala with on-demand compute and secure CDH/HDP data access
• Work together: Share reproducible research with your whole team
• Deploy with confidence: Get to production repeatedly and without recoding
For IT professionals
• Bring data science to the data: Give your data science team more freedom while reducing the risk and cost of silos
• Secure by default: Leverage common security and governance across workloads
• Run anywhere: On-premises or in the cloud
22. ACCELERATED DEEP LEARNING WITH GPUS
Multi-tenant GPU support on-premises or cloud
• Extend CDSW to deep learning
• Schedule & share GPU resources
• Train on GPUs, deploy on CPUs
• Works on-premises or cloud
[Diagram labels: CDSW (GPU, CPU); CDH (CPU); CDH (CPU); single-node training; distributed training, scoring]
"Our data scientists want GPUs, but we need multi-tenancy. If they go to the cloud on their own, it's expensive and we lose governance."
GPU on CDH coming in C6
24. INTRODUCING MODELS
Machine learning models as one-click microservices (REST APIs)
Model APIs made easy!
1. Choose Python/R file, e.g. score.py
2. Choose function, e.g. forecast
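import pickle

# Load the trained model once, when the file is first run
with open('model.pk', 'rb') as f:
    model = pickle.load(f)

def forecast(data):
    # Return the model's prediction for the input data
    return model.predict(data)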
3. Choose resources
25. CLOUDERA DATA SCIENCE WORKBENCH
Select a Project, Create a Session, Load Libraries and Data
26. CLOUDERA DATA SCIENCE WORKBENCH
Load a File and Run It
27. CLOUDERA DATA SCIENCE WORKBENCH
Install Python Libraries for Python 2 or Python 3
28. CLOUDERA DATA SCIENCE WORKBENCH
Test Your Function with an Argument
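A quick local check of a model function might look like the sketch below. The slide itself only shows a screenshot, so the details are assumptions: `StubModel` is a hypothetical stand-in for the unpickled model, and the `{"data": [...]}` argument shape is one plausible choice, based on passing a single JSON-style dictionary so the same call can later be made over REST.

```python
import json


class StubModel:
    """Hypothetical stand-in for the object loaded from model.pk."""

    def predict(self, rows):
        # Pretend forecast: last value of each row plus the row's average step
        out = []
        for row in rows:
            step = (row[-1] - row[0]) / (len(row) - 1)
            out.append(row[-1] + step)
        return out


model = StubModel()


def forecast(args):
    """Take one JSON-style dict and return a JSON-serializable dict."""
    return {"prediction": model.predict(args["data"])}


# Test the function with an argument, as you would in a session
test_argument = {"data": [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]}
result = forecast(test_argument)

# Round-trip through JSON to confirm the result could be returned by an API
round_tripped = json.loads(json.dumps(result))
```

Checking that the result survives a JSON round trip catches return types, such as custom objects, that work in a session but cannot be sent back to a caller.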
29. CLOUDERA DATA SCIENCE WORKBENCH
Create a Model from That File and Function
30. CLOUDERA DATA SCIENCE WORKBENCH
List All the Models
31. CLOUDERA DATA SCIENCE WORKBENCH
Deploy the Model
32. CLOUDERA DATA SCIENCE WORKBENCH
Check Out the Build
33. CLOUDERA DATA SCIENCE WORKBENCH
Test the Model
34. CLOUDERA DATA SCIENCE WORKBENCH
Validate the Model Results
35. CLOUDERA DATA SCIENCE WORKBENCH
Monitor the Running Models
36. CLOUDERA DATA SCIENCE WORKBENCH
Invoke the Model from Apache NiFi in Flow
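The slide shows the call only as a screenshot, so the sketch below rests on assumptions: that the deployed model is reached with an HTTP POST carrying a JSON body with an access key and a request object, and that the URL and key are the placeholder values shown (both hypothetical). In a NiFi flow the same request would typically be issued by an HTTP-calling processor such as InvokeHTTP; the Python version just makes the request shape explicit.

```python
import json
import urllib.request

# Hypothetical values: replace with those shown for your deployed model.
MODEL_URL = "https://modelservice.cdsw.example.com/model"
ACCESS_KEY = "your-model-access-key"


def build_model_request(url, access_key, arguments):
    """Build (but do not send) a JSON POST request for a deployed model."""
    body = json.dumps({"accessKey": access_key, "request": arguments})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_model_request(
    MODEL_URL, ACCESS_KEY, {"url": "https://example.com/cat.jpg"}
)

# Sending is left commented out because the endpoint above is a placeholder:
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
```

Check the request format against the sample call shown for your own deployed model before relying on it.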
37. CLOUDERA DATA SCIENCE WORKBENCH
Query Results of Classification in Flow
{ "class1": "cat", "cpu": 38.3, "end": "1549672761.1262221",
"host": "gluoncv-apache-mxnet-29-50-7fb5cfc5b9-sx6dg", "memory": 14.9,
"pct1": "98.15670800000001",
"shape": "(1, 3, 566, 512)", "systemtime": "02/09/2019 00:39:21",
"te": "3.380652666091919"
}
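Several fields in the result above arrive as strings, so a downstream step usually has to convert them before querying or filtering. The sketch below parses the record exactly as shown. Reading `pct1` as a percentage confidence and `te` as elapsed seconds is an assumption from the field values, and the 90.0 threshold is hypothetical.

```python
import ast
import json

raw = '''{ "class1": "cat", "cpu": 38.3, "end": "1549672761.1262221",
"host": "gluoncv-apache-mxnet-29-50-7fb5cfc5b9-sx6dg", "memory": 14.9,
"pct1": "98.15670800000001",
"shape": "(1, 3, 566, 512)", "systemtime": "02/09/2019 00:39:21",
"te": "3.380652666091919"
}'''

record = json.loads(raw)

label = record["class1"]                   # top predicted class
confidence = float(record["pct1"])         # assumed: percent confidence
elapsed = float(record["te"])              # assumed: elapsed seconds
shape = ast.literal_eval(record["shape"])  # "(1, 3, 566, 512)" -> tuple

# Hypothetical routing rule: keep only confident classifications
is_confident = confidence >= 90.0
```

`ast.literal_eval` is used for the shape because it safely parses the tuple literal without executing arbitrary code.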
CLOUDERA DATA-IN-MOTION (APACHE NIFI)
38. CLOUDERA DATA SCIENCE WORKBENCH
Integrating Calls to CDSW Jobs
39. CLOUDERA DATA SCIENCE WORKBENCH
PySpark Job for HDFS Storage
40. CLOUDERA DATA SCIENCE WORKBENCH
PySpark Job Receiving REST API
41. CLOUDERA DATA SCIENCE WORKBENCH
NiFi Job Integration
42. CLOUDERA DATA SCIENCE WORKBENCH
Display Data