SlideShare ist ein Scribd-Unternehmen logo
1 von 28
Downloaden Sie, um offline zu lesen
Karthikeyan Nagalingam
Nilesh Bagad
ANALYZING IOT DATA IN APACHE
SPARK ACROSS DATA CENTERS
AND CLOUD
By
Agenda
• IoT Data Management Challenges
• NetApp Data Fabric Architecture for Big Data
• IoT Customer Use Cases on Data Fabric
• Q&A
IoT Data Management
Challenges
IoT Data Flow
Data is created1
Data is analyzed in realtime2
Data is aggregated and sent to Core3
Data is stored1
EdgeEdge
Data Center
Edge
Geo Distributed Data Lake
…. ….
Data is analyzed2
Data is protected3
EDGE
CORE
Cloud
IoT Data Management Challenges
AnalyzeStoreCollect Transport Protect
• Flexibility and Agility
• Cost
• Data Protection
NetApp Data Fabric
Architecture for Big Data
The NetApp Data Fabric
Helping customers unleash data to address their business imperatives
HARNESS
the power of the
hybrid cloud
BUILD
a next-generation
data center
MODERNIZE
storage through
data management
PUBLIC
CLOUD
MULTI-
CLOUD
NEXT-GEN
DATA
CENTER
ENTERPRISE
IT
NETAPP
DATA
FABRIC
Introducing Data Fabric Building Blocks for Analytics
NetApp Private Storage
AFF
A200
4 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 9
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
Applications
ONTAP
Express Route
SnapMirror®
Applications Applications Applications
S3
Direct Connect
Cold
Data
StorageGRID
FabricPool
In Place Analytics:
+ Enable big data analytics on NFSv3 data
AFF
A200
4 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 9
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
Apache Spark
Cluster
NetApp FAS NFS Connector
Apache Spark
Cluster
NetApp FAS NFS Connector
HDFS
NFS
AFF
A200
4 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 9
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
NFS
Key Benefits
• Avoid data move to HDFS.
Reduce replicas
• Scale compute and storage
independently
• Enterprise data protection
• Hybrid cloud deployment
• Hortonworks Certified
Confit 1 : NFS as a Storage Confit 2 : HDFS and NFS in Single Spark Cluster
Analytics with Data Fabric
NetApp Private Storage
AFF
A200
4 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 9
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
Apache Spark Cluster
NetApp FAS NFS Connector
ONTAP
NFS
On Premises
HDInsight Spark Cluster
NetApp FAS NFS Connector
Databrics or EMR
NetApp FAS NFS Connector
Express Route
SnapMirror®
Direct Connect
IoT Customer Use Cases
on Data Fabric
Customer Scenario
• IoT data received in AWS and analyzed using Apache Spark
cluster in AWS
• Data Management Challenge:
– How to Backup 10 TB data without load on cluster?
– How to protect the data to on-premises?
Broadcasting Provider
An Architecture for Processing IoT-Data Ingested in Cloud
Backup and DR to On Premises
AFF
A200
4 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 9
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
Apache Spark Cluster
NetApp FAS NFS Connector
ONTAP
NFS
On Premises
EBS
Spark Cluster
NetApp FAS NFS
Connector
Kafka
Events Via
Rest API
Customer Scenario
• IoT data is received in AWS and analyzed using Apache Spark
Cluster in AWS
• Data Management Challenge:
– How to reduce the solution cost?
– How to consume analytics services in data center and multiple
clouds?
IT Service Provider
An Architecture for Processing IoT-Data Ingested in Cloud
Multi Cloud Connectivity
NetApp Private Storage
AFF
A200
4 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 9
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
Apache Spark Cluster
NetApp FAS NFS Connector
ONTAP
NFS
On Premises
Analytics Service
NetApp FAS NFS Connector
Spark Cluster
NetApp FAS NFS Connector
Express Route
SnapMirror®
Direct Connect
Kafka
Events Via
Rest API
Customer Scenario
• IoT data received on-premises and analyzed using Cloudera Spark
Cluster across data center and cloud
• Data Management Challenge:
– How to leverage cloud computation for analytics ?
– How to consume legacy data (7PB) for analytics?
Insurance Company
An Architecture for Processing IoT-Data Ingested on premises
DR in Cloud; Analytic across data centers
NetApp Private Storage
AFF
A200
4 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 9
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
NetApp FAS NFS
Connector
ONTAP
NFS
On Premises
HDInsight
NetApp FAS NFS Connector
DataBricks
NetApp FAS NFS Connector
Express Route
SnapMirror®
Direct Connect
Spark ClusterKafka
Events Via
Rest API
Customer Scenario
• IoT data received on-premises and stored in Hadoop Data Lake.
Data needs to be backed up for compliance
• Data Management Challenge:
– How to reduce the backup window and optimize solution cost?
– How to minimize impact on analytics performance during
backup?
Large Bank
Use Case: Backup for IoT Data
HDFS
(SAN)
Tape
HDFS
(DAS)
Backup
Edge
OR + Archive
Current
distcp –update
HDFS (snap) NetApp (snap)
HDFS
(SAN)
FlexClone
NetBackup
(Files)
NFS Export
NetApp Solution ATypical
HDFS
(DAS)
HDFS
(DAS)
Backup
Edge
OR + Archive
Proposed
distcp -update –diff
HDFS (snap)
distcp -update –diff
(Files)
NFS
FlexClone
NFS Connector
NetApp Solution B
HDFS
(DAS)
SnapMirror
FlexClone
FlexClone FlexClone
Run lots
of parallel
mappers
NetApp Backup Solution A
• NetApp Snapshot Backup
• Backup Archival
• Cloud Compatible
NetApp Backup Solution B
• Hadoop Native Support
• Offload Backup Operation
• Enterprise Management
Customer Scenario
• Large Hadoop Data Lake implementation on premises with Multiple
Spark clusters
• Data Management Challenge:
– How to make data available for dev/test teams?
– How to build the new cluster in minutes from an existing cluster?
Online Music Distribution
Use Case: Dev/Test for IoT Data
AFF
A200
4 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 9
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
Production
Apache Spark Cluster
NetApp FAS NFS Connector
ONTAP
NFS
On Premises
Development
Apache Spark Cluster
NetApp FAS NFS Connector
NFS
QA/Test
Apache Spark Cluster
NetApp FAS NFS Connector
NFS
FlexClone FlexClone
Customer Scenario
• Run analytics on archival data in object store
• Data Management Challenge:
– How to run Hadoop jobs in object store
– Archive the Hadoop data on primary or a secondary site
Online Marketplace
Analytics with NetApp StorageGRID
In place analytics with StorageGRID
Secondary Site
AFF
A200
4 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 9
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
Apache Spark Cluster
NetApp FAS NFS Connector
ONTAP
NFS
On Premises
Direct Connect
AFF
A200
4 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 9
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
1600GB
StorageGRID
StorageGRID
S3
S3
S3A
• Input dataset size – 1.5TB
• ONTAP– 52% better than JBOD
Spark Performance
Type Hadoop
Worker
Nodes
Drives
per
Node
Number of
Storage
Arrays
JBOD 6 12 NA
ONTAP 6 6 1
0
1000
2000
3000
4000
5000
6000
Througput(MBytes/Sec)
MB/Sec
Spark Scala WordCount - Throughput MB/Sec
(Higher is better)
JBOD ONTAP
0
50
100
150
200
250
300
350
400
450
500
Duration(s)
Seconds
Spark Scala WordCount - Duration(s)
( Lower Is better )
JBOD ONTAP
HiBench – Spark Scala Word Count
Key Takeaways
Flexibility and Agility
• On Demand analytics with Hybrid Cloud/Multi Cloud deployments
• Rapid provisioning of clusters for test/dev environments
Lower Cost
• Add storage capacity without adding compute nodes
• One copy vs 3 copies of data for HDFS
• Data Tiering with FabricPool
Enterprise Data Protection
• Efficient backup, DR and Archival
Further Resources
§ Please visit us at: Booth #512
§ Visit our Big Data Website: www.netapp.com/bigdata
Q & A
Thank You.
Nilesh Bagad: nileshb@netapp.com
Karthikeyan Nagalingam: nkarthik@netapp.com

Weitere ähnliche Inhalte

Was ist angesagt?

State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)
Nicolas Poggi
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Databricks
 
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark Summit
 

Was ist angesagt? (20)

Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
 
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
 
Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...
Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...
Lessons Learned from Managing Thousands of Production Apache Spark Clusters w...
 
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
 
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
 
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik Sivashanmugam
 
Spark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod NarasimhaSpark Summit EU talk by Debasish Das and Pramod Narasimha
Spark Summit EU talk by Debasish Das and Pramod Narasimha
 
Spark Summit EU talk by Jorg Schad
Spark Summit EU talk by Jorg SchadSpark Summit EU talk by Jorg Schad
Spark Summit EU talk by Jorg Schad
 
Spark Summit EU talk by John Musser
Spark Summit EU talk by John MusserSpark Summit EU talk by John Musser
Spark Summit EU talk by John Musser
 
Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn:...
Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn:...Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn:...
Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn:...
 
State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)
 
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
 
Real time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.jsReal time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.js
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
 
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik Sivashanmugam
 
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
 
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
 
What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3What's New in Upcoming Apache Spark 2.3
What's New in Upcoming Apache Spark 2.3
 
Scale-Out Using Spark in Serverless Herd Mode!
Scale-Out Using Spark in Serverless Herd Mode!Scale-Out Using Spark in Serverless Herd Mode!
Scale-Out Using Spark in Serverless Herd Mode!
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
 

Ähnlich wie Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp Data Fabric and NetApp Private Storage with Nilesh Bagad and Karthikeyan Nagalingam

15.) cloud (opex, capex or hybrid)
15.) cloud (opex, capex or hybrid)15.) cloud (opex, capex or hybrid)
15.) cloud (opex, capex or hybrid)
Jeff Green
 
Lessons learned processing 70 billion data points a day using the hybrid cloud
Lessons learned processing 70 billion data points a day using the hybrid cloudLessons learned processing 70 billion data points a day using the hybrid cloud
Lessons learned processing 70 billion data points a day using the hybrid cloud
DataWorks Summit
 

Ähnlich wie Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp Data Fabric and NetApp Private Storage with Nilesh Bagad and Karthikeyan Nagalingam (20)

(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 
15.) cloud (opex, capex or hybrid)
15.) cloud (opex, capex or hybrid)15.) cloud (opex, capex or hybrid)
15.) cloud (opex, capex or hybrid)
 
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
 
Model driven telemetry
Model driven telemetryModel driven telemetry
Model driven telemetry
 
[Spark Summit EU 2017] Apache spark streaming + kafka 0.10 an integration story
[Spark Summit EU 2017] Apache spark streaming + kafka 0.10  an integration story[Spark Summit EU 2017] Apache spark streaming + kafka 0.10  an integration story
[Spark Summit EU 2017] Apache spark streaming + kafka 0.10 an integration story
 
Apache Spark Streaming + Kafka 0.10 with Joan Viladrosariera
Apache Spark Streaming + Kafka 0.10 with Joan ViladrosarieraApache Spark Streaming + Kafka 0.10 with Joan Viladrosariera
Apache Spark Streaming + Kafka 0.10 with Joan Viladrosariera
 
Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...
Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...
Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...
 
20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai
 
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
 
DIY Netflow Data Analytic with ELK Stack by CL Lee
DIY Netflow Data Analytic with ELK Stack by CL LeeDIY Netflow Data Analytic with ELK Stack by CL Lee
DIY Netflow Data Analytic with ELK Stack by CL Lee
 
Sundar Ranganathan, NetApp + Vinod Iyengar, H2O.ai - Driverless AI integratio...
Sundar Ranganathan, NetApp + Vinod Iyengar, H2O.ai - Driverless AI integratio...Sundar Ranganathan, NetApp + Vinod Iyengar, H2O.ai - Driverless AI integratio...
Sundar Ranganathan, NetApp + Vinod Iyengar, H2O.ai - Driverless AI integratio...
 
OVHcloud Tech Talks S01E09 - OVHcloud Data Processing : Le nouveau service po...
OVHcloud Tech Talks S01E09 - OVHcloud Data Processing : Le nouveau service po...OVHcloud Tech Talks S01E09 - OVHcloud Data Processing : Le nouveau service po...
OVHcloud Tech Talks S01E09 - OVHcloud Data Processing : Le nouveau service po...
 
Architecture at Scale
Architecture at ScaleArchitecture at Scale
Architecture at Scale
 
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, BlazegraphDatabase Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
 
Elastify Cloud-Native Spark Application with Persistent Memory
Elastify Cloud-Native Spark Application with Persistent MemoryElastify Cloud-Native Spark Application with Persistent Memory
Elastify Cloud-Native Spark Application with Persistent Memory
 
Lessons learned processing 70 billion data points a day using the hybrid cloud
Lessons learned processing 70 billion data points a day using the hybrid cloudLessons learned processing 70 billion data points a day using the hybrid cloud
Lessons learned processing 70 billion data points a day using the hybrid cloud
 
OpenStack at Scale Inside NetApp
OpenStack at Scale Inside NetAppOpenStack at Scale Inside NetApp
OpenStack at Scale Inside NetApp
 
Scaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/DayScaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/Day
 
Deep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech Talks
Deep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech TalksDeep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech Talks
Deep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech Talks
 
Detecting Spoofing at IXPs
Detecting Spoofing at IXPsDetecting Spoofing at IXPs
Detecting Spoofing at IXPs
 

Mehr von Databricks

Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 

Mehr von Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Kürzlich hochgeladen

Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 

Kürzlich hochgeladen (20)

Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 

Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp Data Fabric and NetApp Private Storage with Nilesh Bagad and Karthikeyan Nagalingam

  • 1. Karthikeyan Nagalingam Nilesh Bagad ANALYZING IOT DATA IN APACHE SPARK ACROSS DATA CENTERS AND CLOUD By
  • 2. Agenda • IoT Data Management Challenges • NetApp Data Fabric Architecture for Big Data • IoT Customer Use Cases on Data Fabric • Q&A
  • 4. IoT Data Flow Data is created1 Data is analyzed in realtime2 Data is aggregated and sent to Core3 Data is stored1 EdgeEdge Data Center Edge Geo Distributed Data Lake …. …. Data is analyzed2 Data is protected3 EDGE CORE Cloud
  • 5. IoT Data Management Challenges AnalyzeStoreCollect Transport Protect • Flexibility and Agility • Cost • Data Protection
  • 7. The NetApp Data Fabric Helping customers unleash data to address their business imperatives HARNESS the power of the hybrid cloud BUILD a next-generation data center MODERNIZE storage through data management PUBLIC CLOUD MULTI- CLOUD NEXT-GEN DATA CENTER ENTERPRISE IT NETAPP DATA FABRIC
  • 8. Introducing Data Fabric Building Blocks for Analytics NetApp Private Storage AFF A200 4 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 9 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB Applications ONTAP Express Route SnapMirror® Applications Applications Applications S3 Direct Connect Cold Data StorageGRID FabricPool
  • 9. In Place Analytics: + Enable big data analytics on NFSv3 data AFF A200 4 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 9 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB Apache Spark Cluster NetApp FAS NFS Connector Apache Spark Cluster NetApp FAS NFS Connector HDFS NFS AFF A200 4 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 9 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB NFS Key Benefits • Avoid data move to HDFS. Reduce replicas • Scale compute and storage independently • Enterprise data protection • Hybrid cloud deployment • Hortonworks Certified Confit 1 : NFS as a Storage Confit 2 : HDFS and NFS in Single Spark Cluster
  • 10. Analytics with Data Fabric NetApp Private Storage AFF A200 4 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 9 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB Apache Spark Cluster NetApp FAS NFS Connector ONTAP NFS On Premises HDInsight Spark Cluster NetApp FAS NFS Connector Databrics or EMR NetApp FAS NFS Connector Express Route SnapMirror® Direct Connect
  • 11. IoT Customer Use Cases on Data Fabric
  • 12. Customer Scenario • IoT data received in AWS and analyzed using Apache Spark cluster in AWS • Data Management Challenge: – How to Backup 10 TB data without load on cluster? – How to protect the data to on-premises? Broadcasting Provider
  • 13. An Architecture for Processing IoT-Data Ingested in Cloud Backup and DR to On Premises AFF A200 4 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 9 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB Apache Spark Cluster NetApp FAS NFS Connector ONTAP NFS On Premises EBS Spark Cluster NetApp FAS NFS Connector Kafka Events Via Rest API
  • 14. Customer Scenario • IoT data is received in AWS and analyzed using Apache Spark Cluster in AWS • Data Management Challenge: – How to reduce the solution cost? – How to consume analytics services in data center and multiple clouds? IT Service Provider
  • 15. An Architecture for Processing IoT-Data Ingested in Cloud Multi Cloud Connectivity NetApp Private Storage AFF A200 4 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 9 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB Apache Spark Cluster NetApp FAS NFS Connector ONTAP NFS On Premises Analytics Service NetApp FAS NFS Connector Spark Cluster NetApp FAS NFS Connector Express Route SnapMirror® Direct Connect Kafka Events Via Rest API
  • 16. Customer Scenario • IoT data received on-premises and analyzed using Cloudera Spark Cluster across data center and cloud • Data Management Challenge: – How to leverage cloud computation for analytics ? – How to consume legacy data (7PB) for analytics? Insurance Company
  • 17. An Architecture for Processing IoT-Data Ingested on premises DR in Cloud; Analytic across data centers NetApp Private Storage AFF A200 4 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 9 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB NetApp FAS NFS Connector ONTAP NFS On Premises HDInsight NetApp FAS NFS Connector DataBricks NetApp FAS NFS Connector Express Route SnapMirror® Direct Connect Spark ClusterKafka Events Via Rest API
  • 18. Customer Scenario • IoT data received on-premises and stored in Hadoop Data Lake. Data needs to be backed up for compliance • Data Management Challenge: – How to reduce the backup window and optimize solution cost? – How to minimize impact on analytics performance during backup? Large Bank
  • 19. Use Case: Backup for IoT Data HDFS (SAN) Tape HDFS (DAS) Backup Edge OR + Archive Current distcp –update HDFS (snap) NetApp (snap) HDFS (SAN) FlexClone NetBackup (Files) NFS Export NetApp Solution ATypical HDFS (DAS) HDFS (DAS) Backup Edge OR + Archive Proposed distcp -update –diff HDFS (snap) distcp -update –diff (Files) NFS FlexClone NFS Connector NetApp Solution B HDFS (DAS) SnapMirror FlexClone FlexClone FlexClone Run lots of parallel mappers NetApp Backup Solution A • NetApp Snapshot Backup • Backup Archival • Cloud Compatible NetApp Backup Solution B • Hadoop Native Support • Offload Backup Operation • Enterprise Management
  • 20. Customer Scenario • Large Hadoop Data Lake implementation on premises with Multiple Spark clusters • Data Management Challenge: – How to make data available for dev/test teams? – How to build the new cluster in minutes from an existing cluster? Online Music Distribution
  • 21. Use Case: Dev/Test for IoT Data AFF A200 4 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 9 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB Production Apache Spark Cluster NetApp FAS NFS Connector ONTAP NFS On Premises Development Apache Spark Cluster NetApp FAS NFS Connector NFS QA/Test Apache Spark Cluster NetApp FAS NFS Connector NFS FlexClone FlexClone
  • 22. Customer Scenario • Run analytics on archival data in object store • Data Management Challenge: – How to run Hadoop jobs in object store – Archive the Hadoop data on primary or a secondary site Online Marketplace
  • 23. Analytics with NetApp StorageGRID In place analytics with StorageGRID Secondary Site AFF A200 4 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 9 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB Apache Spark Cluster NetApp FAS NFS Connector ONTAP NFS On Premises Direct Connect AFF A200 4 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 94 5 6 70 1 2 3 1 2 1 3 1 4 1 58 9 1 0 1 1 2 0 2 1 2 2 2 31 6 1 7 1 8 1 9 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB 1600GB StorageGRID StorageGRID S3 S3 S3A
  • 24. • Input dataset size – 1.5TB • ONTAP– 52% better than JBOD Spark Performance Type Hadoop Worker Nodes Drives per Node Number of Storage Arrays JBOD 6 12 NA ONTAP 6 6 1 0 1000 2000 3000 4000 5000 6000 Througput(MBytes/Sec) MB/Sec Spark Scala WordCount - Throughput MB/Sec (Higher is better) JBOD ONTAP 0 50 100 150 200 250 300 350 400 450 500 Duration(s) Seconds Spark Scala WordCount - Duration(s) ( Lower Is better ) JBOD ONTAP HiBench – Spark Scala Word Count
  • 25. Key Takeaways Flexibility and Agility • On Demand analytics with Hybrid Cloud/Multi Cloud deployments • Rapid provisioning of clusters for test/dev environments Lower Cost • Add storage capacity without adding compute nodes • One copy vs 3 copies of data for HDFS • Data Tiering with FabricPool Enterprise Data Protection • Efficient backup, DR and Archival
  • 26. Further Resources § Please visit us at: Booth #512 § Visit our Big Data Website: www.netapp.com/bigdata
  • 27. Q & A
  • 28. Thank You. Nilesh Bagad: nileshb@netapp.com Karthikeyan Nagalingam: nkarthik@netapp.com