1 © Hortonworks Inc. 2011–2018. All rights reserved
Apache NiFi + TensorFlow + Hadoop:
How to Make a Big Data AI Sandwich
Zhen Zeng
Solution Engineer
5th July, 2018
Agenda
• Self-introduction
• How to make a Big Data AI sandwich
• NiFi
• TensorFlow
• Combining NiFi, TensorFlow, and Hadoop
About Me
• 曾 臻(Zhen Zeng)
• Solution Engineer, Hortonworks Japan
• Java Engineer, Big Data Engineer
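Hortonworks Company Overview
Headquarters: Santa Clara, California, USA
A global leader among open-source software companies, providing the de facto global standard for next-generation data platforms
2017 revenue: $261.8M (+42% year over year)
Q4 2017 vs. Q4 2016 support subscription revenue: +63% YoY
Revenue is growing steadily as data lakes spread through the market and adoption as the standard foundation for big data and IoT accelerates
Founded: 2011, by 24 engineers from Yahoo!'s original Apache Hadoop team
Executives: CEO Rob Bearden, COO Scott Davidson
100% committed to open-source software
Top contributor to the Apache Hadoop project worldwide
2011: Founded; partnership with Microsoft (Azure HDInsight)
2014 Sep: Japanese subsidiary Hortonworks Japan K.K. established
Dec: Listed on NASDAQ (NASDAQ: HDP)
2015: Reached $100M in revenue in record time after founding
Acquired Onyara (the company behind Apache NiFi) and launched Hortonworks DataFlow (HDF)
2016: Billings exceeded $270M
Launched Hortonworks Data Cloud (HDC) for AWS
2016: Partnership with Dell EMC; the Pivotal Hadoop distribution moved to Hortonworks Data Platform (HDP)
2017 Jun: Partnership with IBM; the IBM BigInsights Hadoop distribution moved to HDP
Sep: Launched the cybersecurity platform HCP and DataPlane Service (DPS)
Sep: Signed a global agreement with NEC
2018 Jan: Launched HDF 3.1
Jun: Launched HDP 3.0
Jun: Expanded integration with Google Cloud
Jun: Strengthened partnership with Microsoft
Revenue since founding (YoY growth)
2011: Founded
2013: $24.085M
2014: $46.048M (+91.1%), IPO
2015: $121.944M (+164.8%)
2016: $184.461M (+51.3%)
2017: $261.810M (+41.9%)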
How to Make a Big Data AI Sandwich
What Goes into the AI Sandwich
How do we integrate these machine learning / deep learning workflows?
Computer Vision
• Object Recognition
• Image Classification
• Object Detection
• Motion Estimation
• Annotation
• Visual Question and Answer
• Autonomous Driving
Speech Recognition
• Speech to Text
• Speech Recognition
• Chat Bot
• Voice UI
Natural Language Processing
• Sentiment Analysis
• Text Classification
• Named Entity Recognition
Recommender Systems
• Content-based Recommendations
Source: https://github.com/zackchase/mxnet-the-straight-dope
Big Data AI Sandwich Recipe
• Ingredients
• Apache NiFi
• MiNiFi Agent
• TensorFlow
• Apache Hadoop
Big Data AI Sandwich Architecture (Basic Edition)
[Diagram: Ingestion, Simple Event Processing, and Destination stages. Build a predictive model from historical data (Historical Insights); deploy the predictive model for real-time insights (Perishable Insights).]
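To make the build/deploy split concrete, here is a minimal Python sketch. It swaps in a toy z-score anomaly detector for a real predictive model, purely as an illustrative assumption: `build_model` learns statistics from historical data in batch, and `deploy_model` scores events one at a time as they arrive. The function names and the 3-sigma threshold are hypothetical.

```python
import statistics
from typing import Iterable, Iterator, Tuple

# Toy "predictive model" (assumption): mean and standard deviation of history.
# It stands in for a real ML/DL model and only illustrates the build/deploy split.
Model = Tuple[float, float]

def build_model(history: Iterable[float]) -> Model:
    """Build step (batch, historical insights): learn mean and population std dev."""
    values = list(history)
    return statistics.mean(values), statistics.pstdev(values)

def deploy_model(model: Model, events: Iterable[float],
                 z_threshold: float = 3.0) -> Iterator[Tuple[float, bool]]:
    """Deploy step (streaming, perishable insights): flag each event as it arrives."""
    mean, std = model
    for value in events:
        z = abs(value - mean) / std if std > 0 else 0.0
        yield value, z > z_threshold  # True means "anomalous, act now"
```

For example, a model built from `[20.0, 21.0, 19.0, 20.0, 20.0]` flags `35.0` as anomalous but not `20.5`. In the architecture above, the build step would run against stored history, while the deploy step would sit in the real-time path.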
Big Data AI Sandwich Architecture (Professional Edition)
[Diagram: Ingestion, Simple Event Processing, Data Bus, Stream Processing Engine, and Destination stages. Build a predictive model from historical data (Historical Insights); deploy the predictive model for real-time insights (Perishable Insights).]
Deep Learning Components
[Diagram labels: Streaming Analytics Manager; Machine Learning; Distributed queue (buffering, process decoupling); Streaming and SQL; Orchestration, queueing, and simple event processing; REST API for secure Spark execution.]
Deep Learning Components
[Diagram labels: Streaming Analytics Manager; Detect and extract metadata and data (content analysis); Deep Learning Framework; Entity Resolution and Natural Language Processing; Deep Learning Framework that works with the MiNiFi agent.]
What do we want to do?
• MiNiFi ingests camera images and sensor data
• MiNiFi executes algorithms at the edge
• Run a trained Inception classifier to recognize objects in images
• Apache NiFi stores images, metadata, and enriched data in Hadoop
• Apache NiFi ingests social data and REST feeds
• Apache OpenNLP and Apache Tika process textual data
Recommendations
• Model Training
• Install the CPU version of TensorFlow on CPU YARN nodes
• Install the GPU version on NVIDIA (CUDA) GPU nodes
• Train on GPU YARN nodes where possible
• Model Inference
• Apply the model on all nodes and trigger it from Apache NiFi
• Whatever helps Hadoop and Spark will also help TensorFlow:
• More RAM, more and faster cores, more nodes
• Try containerized TensorFlow on YARN 3.1
Collect: Bring Together
Aggregate all data from sensors, drones, logs, geo-location devices, machines, and social feeds.
Conduct: Mediate the Data Flow
Mediate point-to-point and bidirectional data flows, delivering data reliably to Apache HBase, Apache Hive, HDFS, Slack, and email.
Curate: Gain Insights
Parse, filter, join, transform, fork, query, sort, and dissect data; enrich it with weather, location, sentiment analysis, image analysis, object detection, image recognition, and voice recognition using Apache Tika, Apache OpenNLP, TensorFlow, and Apache MXNet.
Introduction to Apache NiFi
Why Apache NiFi?
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow-specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Supports push and pull models
• Hundreds of processors
• Visual command and control
• Over fifty sources
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
• Version control
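Back pressure and prioritized queuing can be pictured with a small in-memory analogue. This is a conceptual Python sketch, not NiFi's implementation: the hypothetical `Connection` class refuses new items once an object-count threshold is reached (mirroring how NiFi stops scheduling the upstream processor) and hands items out by priority with FIFO tie-breaking. The default of 10,000 objects mirrors NiFi's default back pressure object threshold; NiFi also applies a data-size threshold, which is omitted here.

```python
import heapq
import itertools
from typing import Any, List, Optional, Tuple

class Connection:
    """Toy analogue of a NiFi connection queue with back pressure and priorities."""

    def __init__(self, object_threshold: int = 10_000) -> None:
        self.object_threshold = object_threshold
        self._heap: List[Tuple[int, int, Any]] = []
        self._arrival = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def is_back_pressured(self) -> bool:
        """True once the queue holds object_threshold items."""
        return len(self._heap) >= self.object_threshold

    def offer(self, item: Any, priority: int = 0) -> bool:
        """Enqueue unless back pressure is engaged; lower priority values are served first."""
        if self.is_back_pressured():
            return False  # the upstream "processor" must wait and retry later
        heapq.heappush(self._heap, (priority, next(self._arrival), item))
        return True

    def poll(self) -> Optional[Any]:
        """Remove and return the highest-priority item, or None when empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

The heap gives O(log n) enqueue and dequeue, and the arrival counter keeps items with equal priority in the order they were offered.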
Nothing Starts Without Data
Neither big data nor AI can get started without data. How should we collect it?
[Diagram: diverse inputs (Web App, Logs, RDBMS, NoSQL; TCP, HTTP, WebSocket, JMS, Syslog, Email, Image; JSON, CSV, XML, Avro, Parquet; … etc.) and an HDP cluster for data analysis.]
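Data Ingestion with Apache NiFi
[Diagram: diverse inputs (Web App, Logs, RDBMS, NoSQL; TCP, HTTP, WebSocket, JMS, Syslog, Email, Image; JSON, CSV, XML, Avro, Parquet; … etc.) collected by MiNiFi, secure data transfer between edge, on-premises, and cloud, a NiFi cluster, and a Hadoop cluster for data analysis.]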
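As one concrete ingestion path, a client can push data to NiFi over HTTP. The sketch below assumes a flow that starts with NiFi's ListenHTTP processor; `contentListener` is that processor's default base path, while the host name `nifi-host` and port `8081` are hypothetical placeholders. `build_request` only constructs the request, so it can be inspected without a running NiFi; `send` performs the POST.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical endpoint: host and port are placeholders; "contentListener" is
# ListenHTTP's default Base Path.
NIFI_LISTEN_URL = "http://nifi-host:8081/contentListener"

def build_request(record: Dict[str, Any], url: str = NIFI_LISTEN_URL) -> urllib.request.Request:
    """Serialize one record as JSON and wrap it in an HTTP POST request."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send(request: urllib.request.Request, timeout: float = 5.0) -> int:
    """POST the request to NiFi and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Each POST body becomes one FlowFile in NiFi, so an edge device can call `send(build_request(reading))` per sensor reading.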
220+ Processors for Ecosystem Integration
Processing: Hash, Extract, Merge, Duplicate, Scan, GeoEnrich, Replace, Convert, Split, Translate, Route Content, Route Context, Route Text, Control Rate, Distribute Load, Generate Table Fetch, Jolt Transform JSON, Prioritized Delivery, Encrypt, Tail, Evaluate, Execute, Fetch
Protocols and formats: HTTP, Syslog, Email, HTML, Image, HL7, FTP, UDP, XML, SFTP, AMQP, WebSocket
All Apache project logos are trademarks of the ASF and the respective projects.
A few possible scenarios with NiFi
• Ingestion: connectors to read/write data from/to many data sources
• Protocols: FTP, HTTP, Syslog, email, WS, etc.
• Databases: JDBC, MongoDB, HBase, Cassandra, etc.
• Brokers: Kafka, JMS, AMQP, MQTT, etc.
• Transformation:
• Format conversion (JSON to Avro, CSV to ORC, etc.)
• Compression/decompression, merge, split, encryption, etc.
• Data enrichment
• Attribute, content, rules, etc.
• Routing
• Priority, dynamic/static, based on content or metadata, etc.
• Parsing (XML, JSON, Regex, Grok, etc.)
• Etc.
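The transformation and routing scenarios can be illustrated in plain Python. This is an analogy, not NiFi code: in NiFi the same steps are configured as processors on the canvas rather than written by hand. The sketch parses CSV into JSON-ready records, splits them into two routes by a numeric field, and serializes a route as newline-delimited JSON. The `temperature` field, the threshold, and the route names are assumptions.

```python
import csv
import io
import json
from typing import Dict, List

Record = Dict[str, str]

def csv_to_records(csv_text: str) -> List[Record]:
    """Parse CSV with a header row into dicts (format conversion, step 1)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]

def route_by_threshold(records: List[Record], field: str,
                       threshold: float) -> Dict[str, List[Record]]:
    """Content-based routing: send each record to 'high' or 'normal'."""
    routes: Dict[str, List[Record]] = {"high": [], "normal": []}
    for record in records:
        key = "high" if float(record[field]) > threshold else "normal"
        routes[key].append(record)
    return routes

def to_json_lines(records: List[Record]) -> str:
    """Serialize records as newline-delimited JSON (format conversion, step 2)."""
    return "\n".join(json.dumps(r) for r in records)
```

For example, `route_by_threshold(csv_to_records(text), "temperature", 30.0)` would send hot readings down one path (say, to an alert) and the rest to storage.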
Build Data Flows with Drag and Drop
HDP + HDF Component Landscape
[Diagram labels: MiNiFi; multiple data sources/formats (Web App, Logs, RDBMS, NoSQL; TCP, HTTP, WebSocket, JMS, Syslog, Email, Image; JSON, CSV, XML, Avro, Parquet; … etc.); securely transfer data between edge, on-premises, and cloud; HDF (NiFi/Kafka/Storm) cluster; HDP cluster; SAM; Storm; streaming application development; cluster operation and management; data analytics; model; authorization policy management.]
What HDF Can Do
[Diagram labels: Sensor Sources (truck sensors); Flow Management Clusters (NiFi) with an Ingress Gateway and an Egress Gateway using the Site-to-Site Protocol; Event Broker Cluster; Stream Analytics Cluster (ingest streams, generate insights); Real-Time Apps; Real-time Apps & Exploration Platform.]
TensorFlow
What is TensorFlow?
• Developed by Google
• Multiple platform support
• Hadoop integration
• Spark integration
• Keras
• Large Community
• Python and Java APIs
• GPU Support
• Mobile Support
• Inception v3
• Clustering
• Fully functional demos
• Open Source
• Apache Licensed
• Large Model Library
• Buzz
• Extensive Documentation
• Raspberry Pi Support
TensorFlow with Hadoop 3.1
TensorFlow Serving on YARN 3.1
We use NVIDIA Docker containers (https://github.com/NVIDIA/nvidia-docker) on top of YARN.
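In Hadoop 3.1, a YARN service is described by a JSON spec and can be launched with `yarn app -launch <service-name> <spec-file>`. The Python sketch below builds such a spec for TensorFlow Serving in a Docker container. The service and component names, the `tensorflow/serving` image tags, the ports, the model path, and the resource sizes are illustrative assumptions. The `yarn.io/gpu` resource also assumes GPU scheduling is enabled on the cluster, and the model is assumed to be reachable at `model_base_path` inside the container (for example via a volume mount, not shown). Check the YARN Services documentation for your version before relying on any field.

```python
import json
from typing import Any, Dict

def tf_serving_service_spec(model_name: str, model_base_path: str,
                            containers: int = 1, cpus: int = 2,
                            memory_mb: int = 4096, gpus: int = 0) -> Dict[str, Any]:
    """Build a YARN service spec (JSON-compatible dict) running TensorFlow Serving."""
    resource: Dict[str, Any] = {"cpus": cpus, "memory": str(memory_mb)}
    image = "tensorflow/serving:latest"
    if gpus > 0:
        # Assumption: GPUs are exposed to YARN as the "yarn.io/gpu" resource type.
        resource["additional"] = {"yarn.io/gpu": {"value": gpus}}
        image = "tensorflow/serving:latest-gpu"
    return {
        "name": "tf-serving",
        "version": "1.0",
        "components": [{
            "name": "serving",
            "number_of_containers": containers,
            "artifact": {"id": image, "type": "DOCKER"},
            # gRPC on 8500, REST on 8501 (illustrative choices).
            "launch_command": (
                "tensorflow_model_server --port=8500 --rest_api_port=8501 "
                f"--model_name={model_name} --model_base_path={model_base_path}"
            ),
            "resource": resource,
        }],
    }

def write_spec(spec: Dict[str, Any], path: str) -> None:
    """Write the spec to disk so it can be passed to `yarn app -launch`."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(spec, f, indent=2)
```

For example, `write_spec(tf_serving_service_spec("inception", "/models/inception", gpus=1), "tf-serving.json")` followed by `yarn app -launch tf-serving tf-serving.json` would request one GPU-backed serving container.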
Run TensorFlow on YARN 3.1
https://community.hortonworks.com/articles/83872/data-lake-30-containerization-erasure-coding-gpu-p.html
Run TensorFlow on YARN 3.1
https://hortonworks.com/blog/trying-containerized-applications-apache-hadoop-yarn-3-1/
TensorFlow via Python or C++ Binary
python classify_image.py --image_file /opt/demo/dronedata/Bebop2_20160920083655-0400.jpg
solar dish, solar collector, solar furnace (score = 0.98316)
window screen (score = 0.00196)
manhole cover (score = 0.00070)
radiator (score = 0.00041)
doormat, welcome mat (score = 0.00041)
bazel-bin/tensorflow/examples/label_image/label_image --image=/opt/demo/dronedata/Bebop2_20160920083655-0400.jpg
tensorflow/examples/label_image/main.cc:204] solar dish (577): 0.983162
tensorflow/examples/label_image/main.cc:204] window screen (912): 0.00196204
tensorflow/examples/label_image/main.cc:204] manhole cover (763): 0.000704005
tensorflow/examples/label_image/main.cc:204] radiator (571): 0.000408321
tensorflow/examples/label_image/main.cc:204] doormat (972): 0.000406186
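When NiFi runs `classify_image.py` (for example from an ExecuteStreamCommand processor), its stdout still has to be turned into structured fields. A minimal sketch, assuming the output format shown above (`<labels> (score = <value>)`, one prediction per line): it returns (human-readable label, score) pairs in printed order and skips any other lines.

```python
import re
from typing import List, Tuple

# One prediction line from classify_image.py, e.g.
#   "solar dish, solar collector, solar furnace (score = 0.98316)"
PREDICTION_LINE = re.compile(r"^(?P<label>.+?) \(score = (?P<score>\d+(?:\.\d+)?)\)$")

def parse_classify_output(stdout: str) -> List[Tuple[str, float]]:
    """Return (human-readable label, score) pairs in the order they were printed."""
    predictions = []
    for line in stdout.splitlines():
        match = PREDICTION_LINE.match(line.strip())
        if match:  # ignore log lines and anything else that is not a prediction
            predictions.append((match.group("label"), float(match.group("score"))))
    return predictions
```

The full comma-separated label is kept because it corresponds to the "human string" that later lands in Hive.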
TensorFlow Java Processor in NiFi
https://community.hortonworks.com/content/kbentry/116803/building-a-custom-processor-in-apache-nifi-12-for.html
https://github.com/tspannhw/nifi-tensorflow-processor
https://community.hortonworks.com/articles/178498/integrating-tensorflow-16-image-labelling-with-hdf.html
TensorFlow Java Processor in NiFi
Installation on a single node of Apache NiFi 1.5+
Download the NAR here: https://github.com/tspannhw/nifi-tensorflow-processor/releases/tag/1.6
Copy the NAR file to /usr/hdf/current/nifi/lib/
Create a model directory and download the model files into it:
mkdir -p /opt/demo/models && cd /opt/demo/models
wget https://raw.githubusercontent.com/tspannhw/nifi-tensorflow-processor/master/nifi-tensorflow-processors/src/test/resources/models/imagenet_comp_graph_label_strings.txt
wget -O tensorflow_inception_graph.pb "https://github.com/tspannhw/nifi-tensorflow-processor/blob/master/nifi-tensorflow-processors/src/test/resources/models/tensorflow_inception_graph.pb?raw=true"
Restart Apache NiFi via Ambari
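The two downloaded files work together: the `.pb` graph produces one score per class, and the label file maps class indices to names. A minimal sketch of that lookup, assuming the label file holds one label per line in output-index order (the usual convention for this Inception graph, but verify it for your model); `load_labels` and `top_k` are hypothetical helper names, not the processor's API.

```python
import heapq
from typing import List, Sequence, Tuple

def load_labels(path: str) -> List[str]:
    """Read the label file; line i is assumed to name output index i."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

def top_k(scores: Sequence[float], labels: Sequence[str],
          k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs, best first."""
    best = heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])
    return [(labels[i], scores[i]) for i in best]
```

Usage: `labels = load_labels("/opt/demo/models/imagenet_comp_graph_label_strings.txt")`, then `top_k(scores, labels)` on the graph's output vector gives a top-5 list like the one printed by `label_image`.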
TensorFlow Java Processor in NiFi
TensorFlow Running on Edge Nodes (MiNiFi)
CREATE EXTERNAL TABLE IF NOT EXISTS tfimage (image
STRING, ts STRING, host STRING, score STRING,
human_string STRING, node_id FLOAT) STORED AS ORC
LOCATION '/tfimage'
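Before landing in `/tfimage`, each inference result needs fields that line up with the table's columns. A minimal Python sketch that builds one such record, assuming JSON records that the flow later converts to ORC (the conversion step is not shown). `tfimage_row` is a hypothetical helper, and since the slide does not define `node_id`, the caller supplies whatever value the flow uses for it.

```python
import socket
from datetime import datetime, timezone
from typing import Any, Dict, Optional

def tfimage_row(image_path: str, human_string: str, score: float,
                node_id: float, host: Optional[str] = None,
                ts: Optional[datetime] = None) -> Dict[str, Any]:
    """Build one record whose keys match the tfimage columns, in table order.

    score and ts are formatted as strings because the table declares them STRING.
    """
    ts = ts or datetime.now(timezone.utc)
    return {
        "image": image_path,
        "ts": ts.strftime("%Y-%m-%d %H:%M:%S"),
        "host": host or socket.gethostname(),
        "score": f"{score:.5f}",
        "human_string": human_string,
        "node_id": float(node_id),  # FLOAT column
    }
```

Serializing each row with `json.dumps` yields one JSON object per image, ready for a record-oriented conversion step.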
Use Case: Autonomous Driving Car
• Edge: NVIDIA Drive PX 2
• Core:
- Capture billions of images in the data lake
- Pool GPUs and CPUs (think of a giant supercomputer) for 100x faster processing
- Train deep learning models using GPUs and images in the data lake
- Deploy data-intensive containerized deep learning microservices in minutes
• Deploy
Watson: A Wide Range of APIs Is Available
• https://console.bluemix.net/docs/
Image Recognition API
• https://console.bluemix.net/docs/services/visual-recognition/getting-started.html#getting-started-tutorial
TEST
Summary
Big Data AI Sandwich Summary
• NiFi
• Data collection, data flow
• TensorFlow
• Deep learning
• Hadoop/Spark
• Data storage and processing
Big Data AI Sandwich Architecture (Professional Edition)
[Diagram repeated from the earlier Professional Edition architecture slide.]
Thank you

More Related Content

What's hot

Intelligently Collecting Data at the Edge – Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFiIntelligently Collecting Data at the Edge – Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFi
DataWorks Summit
 
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiIntelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
DataWorks Summit
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the Union
DataWorks Summit
 

What's hot (20)

The Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiThe Avant-garde of Apache NiFi
The Avant-garde of Apache NiFi
 
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFiData at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
 
Apache NiFi Crash Course - San Jose Hadoop Summit
Apache NiFi Crash Course - San Jose Hadoop SummitApache NiFi Crash Course - San Jose Hadoop Summit
Apache NiFi Crash Course - San Jose Hadoop Summit
 
Hadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash CourseHadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash Course
 
Future of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveFuture of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep Dive
 
Open Computer Vision with OpenCV, Apache NiFi, TensorFlow, Python
Open Computer Vision with OpenCV, Apache NiFi, TensorFlow, PythonOpen Computer Vision with OpenCV, Apache NiFi, TensorFlow, Python
Open Computer Vision with OpenCV, Apache NiFi, TensorFlow, Python
 
Apache NiFi: Ingesting Enterprise Data At Scale
Apache NiFi:   Ingesting Enterprise Data At Scale Apache NiFi:   Ingesting Enterprise Data At Scale
Apache NiFi: Ingesting Enterprise Data At Scale
 
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseDataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
 
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFiIntelligently Collecting Data at the Edge – Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge – Intro to Apache MiNiFi
 
Apache NiFi Crash Course Intro
Apache NiFi Crash Course IntroApache NiFi Crash Course Intro
Apache NiFi Crash Course Intro
 
Open Source Predictive Analytics Pipeline with Apache NiFi and MiniFi Princeton
Open Source Predictive Analytics Pipeline with Apache NiFi and MiniFi PrincetonOpen Source Predictive Analytics Pipeline with Apache NiFi and MiniFi Princeton
Open Source Predictive Analytics Pipeline with Apache NiFi and MiniFi Princeton
 
The First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFiThe First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile -- Edge and IoT Data Collection with Apache NiFi and MiNiFi
 
BigData Techcon - Beyond Messaging with Apache NiFi
BigData Techcon - Beyond Messaging with Apache NiFiBigData Techcon - Beyond Messaging with Apache NiFi
BigData Techcon - Beyond Messaging with Apache NiFi
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration Options
 
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiIntelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the Union
 
NiFi Best Practices for the Enterprise
NiFi Best Practices for the EnterpriseNiFi Best Practices for the Enterprise
NiFi Best Practices for the Enterprise
 
Deep learning on HDP 2018 Prague
Deep learning on HDP 2018 PragueDeep learning on HDP 2018 Prague
Deep learning on HDP 2018 Prague
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
 
REAL-TIME INGESTING AND TRANSFORMING SENSOR DATA & SOCIAL DATA w/ NIFI + TENS...
REAL-TIME INGESTING AND TRANSFORMING SENSOR DATA & SOCIAL DATA w/ NIFI + TENS...REAL-TIME INGESTING AND TRANSFORMING SENSOR DATA & SOCIAL DATA w/ NIFI + TENS...
REAL-TIME INGESTING AND TRANSFORMING SENSOR DATA & SOCIAL DATA w/ NIFI + TENS...
 

Similar to Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方

Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...
Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...
Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...
DataWorks Summit
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
skumpf
 
IoT with Apache MXNet and Apache NiFi and MiniFi
IoT with Apache MXNet and Apache NiFi and MiniFiIoT with Apache MXNet and Apache NiFi and MiniFi
IoT with Apache MXNet and Apache NiFi and MiniFi
DataWorks Summit
 
Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFi
DataWorks Summit
 

Similar to Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方 (20)

Apache Deep Learning 101 - DWS Berlin 2018
Apache Deep Learning 101 - DWS Berlin 2018Apache Deep Learning 101 - DWS Berlin 2018
Apache Deep Learning 101 - DWS Berlin 2018
 
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
 
Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...
Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...
Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...
 
HDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi IntroductionHDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi Introduction
 
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
 
IoT with Apache MXNet and Apache NiFi and MiniFi
IoT with Apache MXNet and Apache NiFi and MiniFiIoT with Apache MXNet and Apache NiFi and MiniFi
IoT with Apache MXNet and Apache NiFi and MiniFi
 
Apache MXNet for IoT with Apache NiFi
Apache MXNet for IoT with Apache NiFiApache MXNet for IoT with Apache NiFi
Apache MXNet for IoT with Apache NiFi
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
Apache deep learning 101
Apache deep learning 101Apache deep learning 101
Apache deep learning 101
 
Data Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat Alwell
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop Summit
 
Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFi
 
Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1
 
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFIHarnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
 
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
 
SoCal BigData Day
SoCal BigData DaySoCal BigData Day
SoCal BigData Day
 

Recently uploaded

Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
ptikerjasaptiker
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
vexqp
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
q6pzkpark
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ranjankumarbehera14
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Abortion pills in Riyadh +966572737505 get cytotec
 

Recently uploaded (20)

Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 

Apache NiFi + TensorFlow + Hadoop: How to Make a Big Data AI Sandwich

  • 1. Apache NiFi + TensorFlow + Hadoop: How to Make a Big Data AI Sandwich. Zhen Zeng, Solution Engineer, 5th July, 2018
  • 2. Agenda • Self-introduction • How to make a Big Data AI sandwich • NiFi • TensorFlow • Combining NiFi, TensorFlow and Hadoop
  • 3. About Me • 曾 臻 (Zhen Zeng) • Solution Engineer, Hortonworks Japan • Java Engineer, Big Data Engineer
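  • 4. Hortonworks Company Overview. Headquarters: Santa Clara, California, USA. A global leader among open-source software companies, providing the world-standard, de facto next-generation data platform. 2017 revenue: $261.8M (+42% year over year). Support subscription revenue, Q4 2017 vs. Q4 2016: +63% YoY. Revenue is growing steadily as data lakes spread through the market and adoption as the standard technology for Big Data and IoT platforms accelerates. Founded in 2011 by 24 engineers from Yahoo!'s original Apache Hadoop team. Executives: CEO Rob Bearden, COO Scott Davidson. 100% committed to open-source software; the world's No. 1 contributor to the Apache Hadoop project. Timeline: 2011: founded; partnership with Microsoft (Azure HDInsight). 2014: September, Japanese subsidiary Hortonworks Japan K.K. established; December, listed on NASDAQ (NASDAQ: HDP). 2015: reached $100M in revenue in record time from its founding; acquired Onyara, the company behind Apache NiFi, and launched Hortonworks DataFlow (HDF). 2016: billings exceeded $270M; launched Hortonworks Data Cloud (HDC) for AWS; partnership with Dell EMC to move the Pivotal Hadoop distro onto Hortonworks Data Platform (HDP). 2017: June, partnership with IBM to move the BigInsights Hadoop distro onto HDP; September, launched the cybersecurity platform HCP and the DataPlane Service (DPS); September, signed a global agreement with NEC. 2018: January, launched HDF 3.1; June, launched HDP 3.0; June, expanded integration with Google Cloud; June, strengthened partnership with Microsoft. Revenue since founding: 2011 founded; 2013 $24.085M; 2014 $46.048M (+91.1%, IPO); 2015 $121.944M (+164.8%); 2016 $184.461M (+51.3%); 2017 $261.810M (+41.9%).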
  • 5. How to Make a Big Data AI Sandwich
  • 6. What goes inside the AI sandwich: how do we integrate these machine learning / deep learning workflows? Computer Vision • Object Recognition • Image Classification • Object Detection • Motion Estimation • Annotation • Visual Question and Answer • Autonomous Driving; Speech Recognition • Speech to Text • Speech Recognition • Chat Bot • Voice UI; Natural Language Processing • Sentiment Analysis • Text Classification • Named Entity Recognition; Recommender Systems • Content-based Recommendations. https://github.com/zackchase/mxnet-the-straight-dope
  • 7. Big Data AI Sandwich Recipe. Ingredients: • Apache NiFi • MiNiFi Agent • TensorFlow • Apache Hadoop
  • 8. Big Data AI Sandwich architecture diagram (Basic edition). Diagram labels: Ingestion, Simple Event Processing, Destination; Build Predictive Model From Historical Data; Deploy Predictive Model For Real-time Insights; Perishable Insights; Historical Insights
  • 9. Big Data AI Sandwich architecture diagram (Professional edition). Diagram labels: Ingestion, Simple Event Processing, Engine, Stream Processing, Destination, Data Bus; Build Predictive Model From Historical Data; Deploy Predictive Model For Real-time Insights; Perishable Insights; Historical Insights
  • 10. Deep Learning Components. Diagram labels: Streaming Analytics Manager; Machine Learning; Distributed queue; Buffering; Process decoupling; Streaming and SQL; Orchestration; Queueing; Simple Event Processing; REST API; Secure Spark Execution
  • 11. Deep Learning Components Work with MiNiFi Agent. Diagram labels: Streaming Analytics Manager; Detect metadata and data; Extract metadata and data; Content Analysis; Deep Learning Framework; Entity Resolution; Natural Language Processing
  • 12. What do we want to do? • MiNiFi ingests camera images and sensor data • MiNiFi executes algorithms at the edge • Run a trained Inception classifier to recognize objects in images • Apache NiFi stores images, metadata and enriched data in Hadoop • Apache NiFi ingests social data and REST feeds • Apache OpenNLP and Apache Tika handle textual data
  • 13. Recommendations • Model training: • Install the CPU version of TensorFlow on CPU-only YARN nodes • Install the GPU version on NVIDIA (CUDA) nodes • Train on GPU YARN nodes where possible • Model application: • Apply the model on all nodes and trigger it with Apache NiFi • Whatever helps Hadoop and Spark will help TensorFlow: more RAM, more and faster cores, more nodes • Try containerized TensorFlow on YARN 3.1
  • 14. Collect: Bring Together. Aggregate all data from sensors, drones, logs, geo-location devices, machines and social feeds. • Conduct: Mediate the Data Flow. Mediate point-to-point and bi-directional data flows, delivering data reliably to Apache HBase, Apache Hive, HDFS, Slack and email. • Curate: Gain Insights. Parse, filter, join, transform, fork, query, sort and dissect; enrich with weather, location, sentiment analysis, image analysis, object detection, image recognition and voice recognition using Apache Tika, Apache OpenNLP, TensorFlow and Apache MXNet.
  • 15. Introducing Apache NiFi
  • 16. Why Apache NiFi? • Guaranteed delivery • Data buffering (backpressure, pressure release) • Prioritized queuing • Flow-specific QoS (latency vs. throughput, loss tolerance) • Data provenance • Supports push and pull models • Hundreds of processors • Visual command and control • Over fifty sources • Flow templates • Pluggable/multi-role security • Designed for extension • Clustering • Version control
  • 17. Nothing starts without data. Neither big data nor AI can start without data in the first place, so how should we collect it? Diagram labels: HDP cluster; data analytics; a wide variety of inputs (Web App, Logs, RDBMS, NoSQL; TCP, HTTP, WebSocket, JMS, Syslog, Email, Image; JSON, CSV, XML, Avro, Parquet, etc.)
  • 18. Data ingestion with Apache NiFi. Diagram labels: MiNiFi; a wide variety of inputs (Web App, Logs, RDBMS, NoSQL; TCP, HTTP, WebSocket, JMS, Syslog, Email, Image; JSON, CSV, XML, Avro, Parquet, etc.); secure data transfer between edge, on-premises and cloud; NiFi cluster; Hadoop cluster; data analytics
  • 19. Over 220 processors for ecosystem integration: Hash, Extract, Merge, Duplicate, Scan, GeoEnrich, Replace, Convert, Split, Translate, Route Content, Route Context, Route Text, Control Rate, Distribute Load, Generate Table Fetch, Jolt Transform JSON, Prioritized Delivery, Encrypt, Tail, Evaluate, Execute, Fetch; HTTP, Syslog, Email, HTML, Image, HL7, FTP, UDP, XML, SFTP, AMQP, WebSocket. All Apache project logos are trademarks of the ASF and the respective projects.
  • 20. A few possible scenarios with NiFi • Ingestion: connectors to read/write data from/to several data sources (protocols: FTP, HTTP, Syslog, email, WS, etc.; databases: JDBC, MongoDB, HBase, Cassandra, etc.; brokers: Kafka, JMS, AMQP, MQTT, etc.) • Transformation: format conversion (JSON to Avro, CSV to ORC, etc.), compression/decompression, merge, split, encryption, etc. • Data enrichment: attribute, content, rules, etc. • Routing: priority, dynamic/static, based on content or metadata, etc. • Parsing (XML, JSON, Regex, Grok, etc.) • Etc.
  • 21. Build data flows with drag-and-drop
  • 22. HDP + HDF Component Landscape. Diagram labels: SAM; Storm; MiNiFi; multiple data sources/formats (Web App, Logs, RDBMS, NoSQL; TCP, HTTP, WebSocket, JMS, Syslog, Email, Image; JSON, CSV, XML, Avro, Parquet, etc.); securely transfer data between edge, on-premises and cloud; HDP Cluster; HDF (NiFi/Kafka/Storm) Cluster; Streaming Application Development; Cluster operation and management; Data Analytics Model; Authorization policy management
  • 23. What HDF can do. Diagram labels: Sensor Sources: Truck Sensors (×4); Flow Management Clusters; Ingress Gateway; NiFi Site-to-Site Protocol; Egress Gateway; Event Broker Cluster; Stream Analytics Cluster; Ingest Streams; Generate Insights; Real-Time Apps; Real-time Apps & Exploration Platform
  • 24. TensorFlow
  • 25. What is TensorFlow? • Created by Google • Multiple platform support • Hadoop integration • Spark integration • Keras • Large community • Python and Java APIs • GPU support • Mobile support • Inception v3 • Clustering • Fully functional demos • Open source • Apache licensed • Large model library • Lots of buzz • Extensive documentation • Raspberry Pi support
  • 26. TensorFlow with Hadoop 3.1
  • 27. TensorFlow Serving on YARN 3.1: we use NVIDIA Docker containers on top of YARN (https://github.com/NVIDIA/nvidia-docker)
  • 28. Run TensorFlow on YARN 3.1 https://community.hortonworks.com/articles/83872/data-lake-30-containerization-erasure-coding-gpu-p.html
  • 29. Run TensorFlow on YARN 3.1 https://hortonworks.com/blog/trying-containerized-applications-apache-hadoop-yarn-3-1/
  • 30. TensorFlow via Python or C++ Binary
      python classify_image.py --image_file /opt/demo/dronedata/Bebop2_20160920083655-0400.jpg
      solar dish, solar collector, solar furnace (score = 0.98316)
      window screen (score = 0.00196)
      manhole cover (score = 0.00070)
      radiator (score = 0.00041)
      doormat, welcome mat (score = 0.00041)
      bazel-bin/tensorflow/examples/label_image/label_image --image=/opt/demo/dronedata/Bebop2_20160920083655-0400.jpg
      I tensorflow/examples/label_image/main.cc:204] solar dish (577): 0.983162
      I tensorflow/examples/label_image/main.cc:204] window screen (912): 0.00196204
      I tensorflow/examples/label_image/main.cc:204] manhole cover (763): 0.000704005
      I tensorflow/examples/label_image/main.cc:204] radiator (571): 0.000408321
      I tensorflow/examples/label_image/main.cc:204] doormat (972): 0.000406186
  • 31. TensorFlow Java Processor in NiFi: https://community.hortonworks.com/content/kbentry/116803/building-a-custom-processor-in-apache-nifi-12-for.html https://github.com/tspannhw/nifi-tensorflow-processor https://community.hortonworks.com/articles/178498/integrating-tensorflow-16-image-labelling-with-hdf.html
  • 32. TensorFlow Java Processor in NiFi: installation on a single node of Apache NiFi 1.5+
      Download the NAR: https://github.com/tspannhw/nifi-tensorflow-processor/releases/tag/1.6
      Copy the NAR file to /usr/hdf/current/nifi/lib/
      Create a model directory (/opt/demo/models) and download the model files into it:
        mkdir -p /opt/demo/models && cd /opt/demo/models
        wget https://raw.githubusercontent.com/tspannhw/nifi-tensorflow-processor/master/nifi-tensorflow-processors/src/test/resources/models/imagenet_comp_graph_label_strings.txt
        wget -O tensorflow_inception_graph.pb "https://github.com/tspannhw/nifi-tensorflow-processor/blob/master/nifi-tensorflow-processors/src/test/resources/models/tensorflow_inception_graph.pb?raw=true"
      Restart Apache NiFi via Ambari
  • 33. TensorFlow Java Processor in NiFi
  • 34. TensorFlow Running on Edge Nodes (MiNiFi)
      CREATE EXTERNAL TABLE IF NOT EXISTS tfimage (image STRING, ts STRING, host STRING, score STRING, human_string STRING, node_id FLOAT)
      STORED AS ORC
      LOCATION '/tfimage';
  • 35. Use Case: Autonomous Driving Car • Edge: NVIDIA Drive PX 2 • Capture billions of images in the data lake • Core: pool GPUs and CPUs (think of a giant supercomputer) for 100x faster processing • Train deep learning models using GPUs and images in the data lake • Deploy: deploy data-intensive containerized deep learning micro-services in minutes
  • 36.
  • 37. Watson: a variety of APIs are available • https://console.bluemix.net/docs/
  • 38. Image recognition API • https://console.bluemix.net/docs/services/visual-recognition/getting-started.html#getting-started-tutorial
  • 39. TEST
  • 40. Summary
  • 41. Big Data AI Sandwich Summary • NiFi: data collection, data flow • TensorFlow: deep learning • Hadoop/Spark: data storage and processing
  • 42. Big Data AI Sandwich architecture diagram (Professional edition). Diagram labels: Ingestion, Simple Event Processing, Engine, Stream Processing, Destination, Data Bus; Build Predictive Model From Historical Data; Deploy Predictive Model For Real-time Insights; Perishable Insights; Historical Insights
  • 43. Thank you
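Slides 30 and 34 show the two ends of the edge-classification path: the text that `classify_image.py` prints, and the Hive table `tfimage` the results land in. The following is a minimal Python sketch of the glue between them, assuming the output format shown on slide 30 (one `label (score = N)` line per prediction). The helper names, keeping only the top prediction, and emitting one JSON record per image for NiFi to write under `/tfimage` are illustrative assumptions, not the processor's actual code. `node_id` is a hypothetical caller-supplied value because the slide does not say what it holds.

```python
import json
import re
import socket
from datetime import datetime, timezone

# Matches lines such as "solar dish, solar collector, solar furnace (score = 0.98316)",
# the format printed by classify_image.py on slide 30.
SCORE_LINE = re.compile(r"^(?P<label>.+?) \(score = (?P<score>\d+(?:\.\d+)?)\)$")


def parse_classify_output(text):
    """Return [(label, score), ...] in the order the script printed them; other lines are ignored."""
    results = []
    for line in text.splitlines():
        match = SCORE_LINE.match(line.strip())
        if match:
            results.append((match.group("label"), float(match.group("score"))))
    return results


def to_tfimage_row(image_path, results, node_id, host=None, ts=None):
    """Build one record with the columns of the tfimage table (slide 34).

    Only the top prediction is kept. score is formatted as a string because the
    table declares it STRING; node_id is a hypothetical caller-supplied number.
    """
    label, score = results[0]
    return {
        "image": image_path,
        "ts": ts or datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S"),
        "host": host or socket.gethostname(),
        "score": f"{score:.5f}",
        "human_string": label,
        "node_id": float(node_id),
    }


if __name__ == "__main__":
    sample = (
        "solar dish, solar collector, solar furnace (score = 0.98316)\n"
        "window screen (score = 0.00196)\n"
        "manhole cover (score = 0.00070)\n"
    )
    preds = parse_classify_output(sample)
    row = to_tfimage_row("/opt/demo/dronedata/Bebop2_20160920083655-0400.jpg",
                         preds, node_id=1, host="edge-01", ts="2018-07-05 10:00:00")
    print(json.dumps(row))
```

In a flow, the JSON line could be produced by a script invoked from NiFi and then converted to ORC before landing in the table's location; which NiFi processors to use for that step is left open here.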

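Slide 16 lists data buffering with backpressure and prioritized queuing among NiFi's features. The toy Python model below sketches the idea only and is not NiFi's implementation. A connection queue orders items by a numeric `priority` attribute (in the spirit of NiFi's prioritizers, with lower values served first as an assumption of this sketch and arrival order as the tie-breaker). It refuses new items once an object-count threshold is reached, which is the signal an upstream component would use to stop producing until the queue drains. The class name and the threshold value are hypothetical.

```python
import heapq
import itertools


class ToyConnectionQueue:
    """Toy model of a flow connection with back pressure and prioritization.

    Items may carry a numeric "priority" attribute; in this sketch lower values
    are served first, items without one are served after all prioritized items,
    and ties fall back to arrival order.
    """

    def __init__(self, object_threshold=3):
        self.object_threshold = object_threshold
        self._heap = []
        self._arrival = itertools.count()

    def is_back_pressured(self):
        # Upstream should stop producing while this is True.
        return len(self._heap) >= self.object_threshold

    def offer(self, content, attributes=None):
        """Enqueue an item; return False without enqueuing if back pressure applies."""
        if self.is_back_pressured():
            return False
        priority = (attributes or {}).get("priority")
        # (no_priority, priority, arrival): prioritized items sort before unprioritized ones.
        key = (priority is None, priority if priority is not None else 0, next(self._arrival))
        heapq.heappush(self._heap, (key, content))
        return True

    def poll(self):
        """Dequeue the highest-priority item, or return None if the queue is empty."""
        if not self._heap:
            return None
        _, content = heapq.heappop(self._heap)
        return content


if __name__ == "__main__":
    q = ToyConnectionQueue(object_threshold=3)
    q.offer("a")                      # no priority attribute
    q.offer("b", {"priority": 2})
    q.offer("c", {"priority": 1})
    print(q.offer("d"))               # False: threshold reached
    print([q.poll(), q.poll(), q.poll()])  # ['c', 'b', 'a']
```

Draining the queue with `poll()` is the "pressure release" side: once the count falls below the threshold, `offer()` succeeds again.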
Editor's Notes

  1. TALK TRACK Hortonworks Powers the Future of Data: data-in-motion, data-at-rest, and Modern Data Applications. [NEXT SLIDE]
  2. Kafka reads events into memory and writes them to a distributed log.
  3. Kafka reads events into memory and writes them to a distributed log.
  4. https://www.tensorflow.org/tutorials/image_recognition https://github.com/tensorflow/models https://github.com/yahoo/TensorFlowOnSpark/wiki/Conversion https://community.hortonworks.com/articles/54954/setting-up-gpu-enabled-tensorflow-to-work-with-zep.html
  5. We install GPU-enabled TensorFlow on the nodes that have GPUs and the CPU version on the others. We label the nodes that have GPUs and send training jobs to them.
  6. https://community.hortonworks.com/articles/54954/setting-up-gpu-enabled-tensorflow-to-work-with-zep.html
  7. https://community.hortonworks.com/articles/54954/setting-up-gpu-enabled-tensorflow-to-work-with-zep.html
  8. Kafka reads events into memory and writes them to a distributed log.
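Slide 13 and speaker note 5 describe installing GPU-enabled TensorFlow on nodes with NVIDIA GPUs, the CPU build elsewhere, and labeling the GPU nodes so that training is sent there. What follows is a minimal Python sketch of that per-node decision, under stated assumptions. GPU presence is detected by whether `nvidia-smi` is on the PATH. The pip package names are the TensorFlow 1.x-era `tensorflow-gpu` and `tensorflow`. GPU nodes are tagged with a YARN node label named `gpu` (a hypothetical name) via `yarn rmadmin -replaceLabelsOnNode`. The script only builds the commands and does not run them; the label itself would first have to be registered with `yarn rmadmin -addToClusterNodeLabels`.

```python
import shutil
import socket


def has_nvidia_gpu(which=shutil.which):
    """Heuristic: treat the node as a GPU node if nvidia-smi is on the PATH."""
    return which("nvidia-smi") is not None


def tensorflow_package(gpu):
    """TensorFlow 1.x shipped separate pip packages for the GPU and CPU builds."""
    return "tensorflow-gpu" if gpu else "tensorflow"


def node_setup_commands(hostname, gpu, label="gpu"):
    """Return the commands (as argv lists) this sketch would run for one node.

    The pip install runs on the node itself; the rmadmin call would be run by a
    YARN administrator after the hypothetical label has been added to the cluster.
    """
    commands = [["pip", "install", tensorflow_package(gpu)]]
    if gpu:
        commands.append(["yarn", "rmadmin", "-replaceLabelsOnNode", f"{hostname}={label}"])
    return commands


if __name__ == "__main__":
    gpu = has_nvidia_gpu()
    for argv in node_setup_commands(socket.getfqdn(), gpu):
        print(" ".join(argv))
```

Once the GPU nodes carry the label, Spark-based training jobs can be steered to them with a node-label expression such as `spark.yarn.executor.nodeLabelExpression`, while inference keeps running on all nodes as slide 13 suggests.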