Azure IoT services Reference Architecture MICROSOFT CONFIDENTIAL
DataStax and Azure IoT Reference Architecture
Table of Contents
Reference Architecture Overview
Implementation
Device Registry Store
Device State Store
Real-Time Analytics
Batch Analytics
Field Gateway
Geographical Redundancy
Conclusion
Summary
Connected sensors, devices, and intelligent operations are transforming businesses and enabling new growth opportunities with Microsoft Azure Internet of Things (IoT) services. This document outlines how DataStax Enterprise, providing a scalable and resilient IoT infrastructure, can be used to implement specific components of the Azure IoT reference architecture.
© 2016 Microsoft Corporation. All rights reserved.
This document is provided "as-is." Information and views expressed in this document, including URL and other Internet
Web site references, may change without notice. You bear the risk of using it. Some examples are for illustration only and
are fictitious. No real association is intended or inferred. This document does not provide you with any legal rights to any
intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes.
1. Reference Architecture Overview
Connected sensors, devices, and intelligent operations are transforming businesses and enabling new growth
opportunities with Microsoft Azure Internet of Things (IoT) services.
This document outlines how to use DataStax Enterprise (DSE) in the Azure IoT reference architecture.
DataStax Enterprise is a geographically distributed and horizontally scalable transactional database based on Apache Cassandra. It includes integrated Spark analytics for stream processing and machine learning, and a graph database for relationship modeling. DataStax Enterprise is ideal for storing operational data with always-on availability requirements.
Figure 1: IoT solution architecture
Figure 1 shows the conceptual architecture for Azure IoT, which is detailed in the document Microsoft Azure IoT Reference Architecture [1]. DataStax Enterprise can be used to implement a number of components in this architecture, enhancing performance, functionality and reliability.
2. Implementation
Figure 2, below, shows how DataStax Enterprise can be used as part of the end-to-end Azure IoT reference architecture.
[1] https://azure.microsoft.com/en-us/updates/microsoft-azure-iot-reference-architecture-available/
(Figure 1 diagram labels. Zones: Device Connectivity; Data Processing, Analytics and Management; Presentation & Business Connectivity. Components: IP capable devices, personal mobile devices, existing IoT devices, low power devices, gateway, cloud gateway, provisioning API, identity and registry stores, device state store, stream processing, analytics and machine learning, storage, app backend, solution UX, business integration, connectors and gateway(s), business systems.)
Figure 2: DataStax usage in the Azure IoT Reference Architecture
DataStax Enterprise can be used to implement the device registry and the device state stores. Additionally, the analytics
components of DataStax Enterprise can be leveraged to implement the stream processing and analytics portions of the
Azure IoT reference architecture.
Using DataStax Enterprise for these components offers several key advantages:
• Linear Scalability – Able to scale to handle millions of transactions per second
• Resilience to Failure – DataStax Enterprise provides node fault tolerance, rack fault tolerance and data center level disaster tolerance
• Integrated Analytics – Support for graph databases, full-text search and machine learning
These three advantages map closely to the requirements of IoT stack components; later sections of this paper explain how each one applies.
DataStax is working with Mesosphere to deploy DataStax Enterprise in the Mesos Universe, allowing push-button, containerized deployment on both Mesos and Azure Container Service (ACS). This simplifies provisioning and orchestration when implementing an open source version of the Azure IoT reference architecture. More information is available in the announcement [2].
Alternatively, if the Azure IoT reference architecture is implemented primarily with Azure services rather than open source components, DataStax Enterprise can be deployed on Azure VMs using Azure Resource Manager (ARM). For field gateways, DataStax Enterprise can be deployed directly on the gateway hardware. As a result, a single database infrastructure can be used across hybrid cloud IoT deployments, which simplifies operations by avoiding the need to maintain multiple types of infrastructure.
[2] http://www.marketwired.com/press-release/mesosphere-brings-datastax-enterprise-to-the-dc-os-universe-app-store-2130849.htm
(Figure 2 diagram labels. Zones: Device Connectivity; Data Processing, Analytics and Management; Presentation & Business Connectivity. Components: IP capable devices, personal mobile devices, existing IoT devices, low power devices, gateway, IoT Hub, provisioning API, device registry stores, device state store, real-time analytics (Spark / SparkR / Spark ML scoring), batch analytics (Spark ML training), Hadoop, app backend, solution UX, business integration, connectors and gateway(s), business systems. Legend: IoT solution component; optional solution component; IoT solution component using DataStax; data path.)
The following sections describe the advantages of using DataStax Enterprise for the different components of the Azure IoT
reference architecture.
3. Device Registry Store
The device registry contains device-related metadata attributes and reference data for provisioned devices. It serves as an index for device discoverability and is used by the solution back-end components and UI. Typically, the device registry contains only slowly changing data. Examples of device registry store data include:
• The building and room number a smoke alarm is installed in
• The installation date for a mixing valve
• The upstream and downstream components connected to a generator
Uptime is extremely important for the device registry: if it is not available, operations that depend on device metadata will fail. DataStax Enterprise is very resilient to failure, making it an excellent choice for the device registry store. A DataStax Enterprise cluster is organized hierarchically into data centers, which contain racks, which in turn contain nodes, and DSE is resilient to failure at each of these levels.
Using DataStax Enterprise for the device registry store has advantages beyond the resilience inherent in the database. DSE is a multi-model database, with support for tabular data, full-text search, analytics and graph databases. The graph capability is particularly useful for the device registry: it can model the relations between devices and other domain-specific entities. For example, as shown in Figure 3, it can represent the relations between sensors, machines, factories and products in a manufacturing scenario.
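To make the graph idea concrete, here is a minimal plain-Python sketch of a property-graph fragment loosely inspired by Figure 3. It is not the DSE Graph API (DSE Graph is queried with Gremlin); the vertex IDs, edge directions, and the helpers `neighbors` and `sensors_in_factory` are hypothetical, chosen only to show how a registry question such as "which sensors monitor machines in a given factory" becomes a short traversal.

```python
from collections import defaultdict

# Hypothetical edges: (source vertex, edge label, target vertex).
# Edge directions are assumptions made for illustration only.
EDGES = [
    ("sensor:s1", "monitors", "machine:m1"),
    ("sensor:s2", "monitors", "machine:m2"),
    ("machine:m1", "part_of", "factory:f1"),
    ("machine:m2", "part_of", "factory:f2"),
    ("operator:o1", "works_for", "factory:f1"),
    ("machine:m1", "assembles", "product:p1"),
]

# Index outgoing and incoming edges by (vertex, label) for quick lookups.
out_edges = defaultdict(list)
in_edges = defaultdict(list)
for src, label, dst in EDGES:
    out_edges[(src, label)].append(dst)
    in_edges[(dst, label)].append(src)


def neighbors(vertex, label, direction="out"):
    """Return vertices connected to `vertex` by edges carrying `label`."""
    index = out_edges if direction == "out" else in_edges
    return index.get((vertex, label), [])


def sensors_in_factory(factory):
    """Traverse factory <-part_of- machine <-monitors- sensor."""
    sensors = []
    for machine in neighbors(factory, "part_of", direction="in"):
        sensors.extend(neighbors(machine, "monitors", direction="in"))
    return sorted(sensors)


print(sensors_in_factory("factory:f1"))  # ['sensor:s1']
```

The same question against a purely tabular registry would typically need several lookups or denormalized tables; in a graph model it is a two-hop traversal.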
Figure 3: Example of DataStax Enterprise graph model for Azure IoT
Many databases fall into one of two categories with respect to transactional behavior:
1. Strongly Consistent
2. Eventually Consistent
(Figure 3 graph labels. Vertices: Sensor, Machine, Failure, Operator, Factory, Product. Edges: Monitors, Reports, Affects, Part of, Assembles, Produces, Works for.)
The CAP theorem [3] details the tradeoffs between these categories. In essence, strong consistency comes at a cost in scalability and availability. Eventual consistency does better in that regard, but at the cost of potentially inconsistent reads.
DataStax Enterprise offers tunable consistency, which means that the consistency level can be balanced against performance and availability characteristics depending on the application. For the device registry store, it may be advisable to tune toward read performance and consistency: although registry data changes slowly, inconsistent reads could affect the application back end and analytics processes.
One example of how tunable consistency can benefit performance in Azure IoT is to exploit the write pattern of the device registry. In this scenario, writes are infrequent but should be reflected across the entire cluster, so we could set the write consistency level to ALL. This requires every replica to acknowledge a write before an update to a registered device is reported as successful. Reads are much more frequent, so we may want to bias the tuning to make them as fast as possible. For this reason, we could use ONE for reads in the device registry store, meaning that a read returns as soon as any single replica responds. Because every replica has acknowledged each successful write, a read from any one replica still returns the latest registry data.
Quorum-level consistency options could be used as well. More information on tuning consistency is available in the DataStax documentation [4].
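The reasoning behind these choices can be checked with the replica-overlap rule commonly used for Cassandra-style tunable consistency: if a read waits for R replicas and a write is acknowledged by W replicas out of a replication factor N, a read is guaranteed to overlap the latest acknowledged write when R + W > N. The sketch below assumes a replication factor of 3 and models only the ONE, QUORUM and ALL levels; it illustrates the arithmetic and is not a client-driver API.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas a request at `level` must hear from; QUORUM is a strict majority."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported level: {level}")


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read replica set overlaps every write replica set."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf


def replica_failures_tolerated(level: str, rf: int) -> int:
    """How many replicas can be down while requests at `level` still succeed."""
    return rf - replicas_required(level, rf)


RF = 3  # assumed replication factor
for name, read, write in [
    ("device registry", "ONE", "ALL"),
    ("device state store", "QUORUM", "QUORUM"),
    ("fastest, weakest", "ONE", "ONE"),
]:
    print(
        f"{name}: consistent={reads_see_latest_write(read, write, RF)}, "
        f"write failures tolerated={replica_failures_tolerated(write, RF)}"
    )
```

The output makes the tradeoff explicit: the registry setting (reads at ONE, writes at ALL) keeps reads consistent but cannot complete a write while any replica is down, whereas QUORUM for both keeps reads consistent and tolerates one unavailable replica at a replication factor of 3.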
4. Device State Store
The device state store contains operational data from the devices. Device operational data is high-volume, high-velocity data, typically orders of magnitude larger than what is stored in the device registry store, because a single device produces many readings. Given these data volumes, it is extremely important to use a highly scalable database for the device state store.
DataStax Enterprise scales near-linearly to handle the load of millions of devices. Figure 4, below, shows how performance grows nearly linearly as nodes are added to a cluster. DSE can scale from very small clusters to large clusters that handle millions of transactions per second, which makes it an ideal database for the device state store.
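A back-of-the-envelope sizing calculation shows why near-linear scaling matters for the device state store. The sketch below is illustrative only: the device count, reporting rate, per-node write throughput and headroom factor are assumed inputs, not measured DSE figures, and real sizing should come from benchmarking your own schema and hardware.

```python
import math


def nodes_needed(devices: int, readings_per_device_per_sec: float,
                 writes_per_node_per_sec: float, replication_factor: int = 3,
                 headroom: float = 0.5) -> int:
    """Estimate cluster size for a write-heavy telemetry workload.

    Each reading is written to `replication_factor` replicas, and only
    `headroom` of each node's assumed capacity is planned for, leaving room
    for compaction, repair and traffic spikes. All inputs are assumptions.
    """
    client_writes = devices * readings_per_device_per_sec
    replica_writes = client_writes * replication_factor
    usable_per_node = writes_per_node_per_sec * headroom
    # A cluster needs at least as many nodes as the replication factor.
    return max(replication_factor, math.ceil(replica_writes / usable_per_node))


# Hypothetical example: 1 million devices, one reading every 10 seconds,
# and an assumed 20,000 writes/sec per node.
print(nodes_needed(1_000_000, 0.1, 20_000))  # 30
```

Under these assumptions, doubling the device count doubles the estimate; that simple model is only useful because throughput grows roughly in proportion to node count.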
[3] https://en.wikipedia.org/wiki/CAP_theorem
[4] https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_config_consistency_c.html
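Figure 4: Near-linear scalability in DataStax Enterprise [5] (chart: operations/sec versus number of nodes, 1 to 32, for Cassandra, Couchbase, HBase and MongoDB)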
For the device state store, evaluate whether each category of data needs to be transferred or replicated. Raw telemetry data might not need to be available on a secondary site, whereas aggregated data represents a reduced data volume that is easier to replicate if needed.
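One way to act on this is to replicate per-device, per-minute aggregates instead of raw readings. The following plain-Python sketch shows how much smaller the aggregated stream is; the reading format, the one-minute window and the chosen statistics (count, min, max, average) are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw telemetry: (device_id, unix_timestamp_seconds, value).
raw = [
    ("dev-1", 0, 20.0), ("dev-1", 10, 21.0), ("dev-1", 50, 22.0),
    ("dev-1", 65, 30.0), ("dev-2", 5, 5.0), ("dev-2", 15, 7.0),
]


def rollup_per_minute(readings):
    """Group readings into (device, minute) buckets and summarize each bucket."""
    buckets = defaultdict(list)
    for device_id, ts, value in readings:
        buckets[(device_id, ts // 60)].append(value)
    return {
        key: {"count": len(vals), "min": min(vals),
              "max": max(vals), "avg": mean(vals)}
        for key, vals in sorted(buckets.items())
    }


summary = rollup_per_minute(raw)
for (device_id, minute), stats in summary.items():
    print(device_id, minute, stats)
print(f"{len(raw)} raw rows -> {len(summary)} aggregate rows")
```

With realistic reporting rates (many readings per device per minute) the reduction is far larger than in this toy input, which is what makes replicating only the aggregates attractive.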
The device state store holds much more dynamic data than the device registry store. Uptime and the ability to handle large data volumes remain important, so the deployment architecture that works well for the device registry remains viable here.
In the device state store, it may be advisable to tune consistency differently than in the device registry, because the aim is to optimize for writes rather than reads. Instead of requiring ALL for writes, both reads and writes can use quorum-level consistency, which still guarantees that reads see the latest acknowledged write while tolerating the loss of a minority of replicas.
The device state store and device registry store can be implemented as distinct databases or as a single database. While there may be minor latency advantages to implementing distinct databases, in most cases we recommend a single database, because it simplifies administration and reduces hardware cost.
5. Real-Time Analytics
After ingress through Azure IoT Hub, which acts as the cloud gateway, data flows through the system via data pumps and analytics tasks. Data pumps typically move or route data without any transformation, while analytics tasks perform complex event processing. Because IoT Hub provides brokered communication and supports multiple consumers, the same data can be consumed by different stream processors for different purposes, resulting in multiple data streams flowing concurrently. For example, one stream processor may listen only for special types of events, while another performs complex event processing in parallel. These processors can determine the path of the data and route it without any reshaping, or perform complex event processing tasks such as data aggregation and data enrichment through correlation with reference data, as well as analytics tasks such as detection of threshold limits or anomalies and generation of alerts.
[5] http://www.datastax.com/apache-cassandra-leads-nosql-benchmark
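The consumer pattern described above can be sketched in plain Python. The in-memory `fan_out` function is a hypothetical stand-in for IoT Hub delivering the same events to several independent consumers; it is not the IoT Hub or Event Hubs SDK, and the event fields and the 90.0 threshold are assumptions. One consumer acts as a data pump that routes events by type without reshaping them, while the other performs a simple threshold check and emits alerts.

```python
from typing import Callable, Dict, Iterable, List

Event = Dict[str, object]

events: List[Event] = [
    {"device": "boiler-7", "type": "temperature", "value": 71.5},
    {"device": "door-2", "type": "status", "value": "open"},
    {"device": "boiler-7", "type": "temperature", "value": 96.0},
]


def fan_out(stream: Iterable[Event], consumers: List[Callable[[Event], None]]) -> None:
    """Deliver every event to every consumer, like independent readers of one hub."""
    for event in stream:
        for consume in consumers:
            consume(event)


routes: Dict[str, List[Event]] = {}
alerts: List[str] = []


def data_pump(event: Event) -> None:
    # Route events unchanged into per-type outputs (no transformation).
    routes.setdefault(str(event["type"]), []).append(event)


def threshold_alerter(event: Event, limit: float = 90.0) -> None:
    # Analytics task: flag temperature readings above an assumed limit.
    if event["type"] == "temperature" and float(event["value"]) > limit:
        alerts.append(f"ALERT {event['device']}: {event['value']}")


fan_out(events, [data_pump, threshold_alerter])
print({k: len(v) for k, v in routes.items()})  # {'temperature': 2, 'status': 1}
print(alerts)  # ['ALERT boiler-7: 96.0']
```

Because each consumer sees the full stream independently, adding another processor does not change the behavior of the existing ones.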
As part of its multi-model capabilities, DataStax Enterprise embeds Apache Spark. Spark consists of a number of
components that are relevant to the Real-time Analytics aspect of the Azure IoT reference architecture, including Spark
Streaming and Spark MLlib.
For hot path analytics, data flows directly from Azure IoT Hub into Spark Streaming, which is integrated into the DataStax Enterprise runtime. Spark allows you to use pre-trained MLlib models directly in the Spark Streaming analytics pipeline, so incoming data can be scored against Spark MLlib models as it arrives. With this architecture, real-time predictions can be made against pre-trained machine learning models. Embedding machine learning infrastructure into real-time data feeds using Spark Streaming allows the system to react quickly to new input, predicting with greater responsiveness and accuracy than a traditional business intelligence approach.
This integration simplifies the use of machine learning models directly in the streaming pipeline. It also improves performance and reduces latency, because the data is processed close to the database.
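The hot-path scoring step can be illustrated without a Spark cluster. In the sketch below, a logistic-regression model is assumed to have been trained offline; its weights, feature names and the 0.8 alert threshold are hypothetical, and the loop over `micro_batch` stands in for a Spark Streaming micro-batch. In a DSE deployment, a loaded Spark MLlib model applied inside the streaming job would play this role.

```python
import math
from typing import Dict, Iterator, List, Tuple

# Hypothetical pre-trained logistic-regression parameters (learned offline).
BIAS = -6.0
WEIGHTS = {"temperature_c": 0.05, "vibration_mm_s": 0.4}


def failure_probability(features: Dict[str, float]) -> float:
    """Score one reading: sigmoid(bias + sum of weight * feature)."""
    z = BIAS + sum(w * features[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def score_stream(batch: List[Tuple[str, Dict[str, float]]],
                 threshold: float = 0.8) -> Iterator[Tuple[str, float]]:
    """Score each incoming reading and yield only those above the threshold."""
    for device_id, features in batch:
        p = failure_probability(features)
        if p >= threshold:
            yield device_id, round(p, 3)


micro_batch = [
    ("drill-1", {"temperature_c": 60.0, "vibration_mm_s": 2.0}),
    ("drill-2", {"temperature_c": 90.0, "vibration_mm_s": 10.0}),
]
print(list(score_stream(micro_batch)))  # [('drill-2', 0.924)]
```

Scoring is a cheap per-record computation once the model parameters are loaded, which is why it fits in the streaming path while training stays in batch.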
Some example applications of this infrastructure include:
• Real-time correlation analysis of multiple fire alarm sensors to determine whether particulate buildup is due to a fire or to normal wear and tear, allowing for maintenance optimization.
• Prediction of drill head failure in oil drilling through real-time modeling of heat and fatigue measurements.
• Real-time integration with workforce management data to automatically route maintenance personnel while optimizing for job urgency and trip distance.
Hot path analytics with machine learning is where IoT users can extract the greatest business value from their IoT investment. DataStax Enterprise provides a concrete implementation of this type of IoT analytics infrastructure, combining the resilience of Cassandra with multi-model analytics for a comprehensive operational analytics solution.
6. Batch Analytics
Batch analytics in Azure IoT can be provided by Azure HDInsight (HDI). HDI is ideal for cases where large amounts of data must be analyzed with batch or even ad hoc queries. Training machine learning models in batch (as opposed to incremental real-time training) and building monthly roll-up reports are both use cases that HDI is well suited for.
Figure 5: Machine Learning Lifecycle with Spark
In some cases, where the batch queries are well defined or the amount of stored data is smaller, it may make sense to use DataStax Enterprise for both hot path and batch analytics. This results in a lower TCO, because only one platform needs to be maintained instead of both DSE and HDI. The tradeoff is that DSE is less suited than HDI to large-scale batch queries.
Note that Spark MLlib models trained in HDI Spark as well as Spark in DataStax Enterprise can be exported and used in the
hot path for real-time analytics based on machine learning. This allows the buildout of a full machine learning lifecycle
within the IoT architecture as shown in Figure 5 above.
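The train, export and score loop of Figure 5 can be reduced to its essentials: a batch job fits model parameters over historical data and serializes them, and the real-time path loads those parameters and applies them to new readings. The sketch below substitutes a deliberately simple model (per-sensor mean and standard deviation used for z-score anomaly detection) for Spark MLlib, and uses JSON as a stand-in for MLlib's own model persistence; the sensor names and the 3-sigma cutoff are assumptions.

```python
import json
from statistics import mean, pstdev
from typing import Dict, List

# Hypothetical historical readings per sensor, as a batch job might read them.
history: Dict[str, List[float]] = {
    "pump-a": [10.0, 11.0, 9.0, 10.0, 10.0],
    "pump-b": [50.0, 52.0, 48.0, 50.0, 50.0],
}


def train(data: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Batch step: learn the mean and population std dev for each sensor."""
    return {s: {"mean": mean(v), "std": pstdev(v)} for s, v in data.items()}


def export_model(model: Dict[str, Dict[str, float]]) -> str:
    """Export step: serialize parameters for the real-time path."""
    return json.dumps(model)


def is_anomaly(model_json: str, sensor: str, value: float, k: float = 3.0) -> bool:
    """Real-time step: flag readings more than k std devs from the mean.

    Parsing on every call keeps the sketch short; a real scorer would load once.
    """
    params = json.loads(model_json)[sensor]
    return abs(value - params["mean"]) > k * params["std"]


exported = export_model(train(history))
print(is_anomaly(exported, "pump-a", 10.5))  # False
print(is_anomaly(exported, "pump-a", 14.0))  # True
```

Separating the exported parameters from the code that uses them is what lets training run on a different cluster (HDI or DSE) from the one doing real-time scoring.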
7. Field Gateway
A field gateway is a specialized device that acts as a communication enabler and as a local device control system and device data processing hub. A field gateway can perform local processing and control functions for the devices. It can also filter and aggregate device telemetry, reducing the amount of data transferred to the cloud back end. Gateways may assist in device provisioning, data filtering, batching and aggregation, buffering of data, protocol translation, and event processing.
(Figure 5 stages: Define Model Features (SparkSQL, BI, etc.) → Train Model (Batch Spark MLlib) → Export Model to Real-Time Analytics (Spark MLlib) → Real-Time Model Scoring (Spark Streaming and MLlib) → Evaluate Model Performance (SparkSQL, BI, etc.))
Figure 6: Data Flow in a stateful Field Gateway
Field Gateways that embed DataStax Enterprise are stateful. If connectivity is intermittent, these gateways can operate as store-and-forward databases, syncing to the cloud gateway when connectivity is available. Because DataStax Enterprise can run on devices with minimal hardware resources, this provides local data storage even on gateways with constrained hardware. It also opens the way for field gateways with sufficiently powerful hardware to embed advanced analytics for edge processing.
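Store-and-forward behavior can be sketched as a local buffer that accepts readings regardless of connectivity and drains them in order once the uplink is available. In this plain-Python sketch an in-memory deque stands in for the gateway's local database tables, and `send` is a hypothetical callable that returns False while the link is down; neither represents a DSE or IoT Hub API.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Reading = Dict[str, object]


class StoreAndForwardBuffer:
    """Keep readings locally and forward them when connectivity allows."""

    def __init__(self, send: Callable[[Reading], bool]) -> None:
        self._pending: Deque[Reading] = deque()
        self._send = send  # returns True if the cloud accepted the reading

    def record(self, reading: Reading) -> None:
        # Always store locally first, so nothing is lost while offline.
        self._pending.append(reading)

    def flush(self) -> int:
        """Forward readings oldest-first; stop at the first failure."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # link down: keep the reading and retry later
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)


# Simulated uplink that can be toggled on and off.
link_up = False
delivered: List[Reading] = []


def send(reading: Reading) -> bool:
    if link_up:
        delivered.append(reading)
    return link_up


buf = StoreAndForwardBuffer(send)
for i in range(3):
    buf.record({"seq": i, "temp": 20 + i})
print(buf.flush(), len(buf))  # 0 3  (offline: everything stays buffered)
link_up = True
print(buf.flush(), len(buf))  # 3 0  (online: buffer drained in order)
```

Removing a reading only after a successful send means an interrupted flush resumes where it stopped, at the cost of possibly resending one reading if the acknowledgement is lost, so the receiving side should tolerate duplicates.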
Field Gateways that embed DataStax Enterprise can persist state in a standalone instance of the database or in an edge database using DataStax Enterprise Advanced Replication. In the Advanced Replication case, with additional integration work, messages can be passed through IoT Hub to the central DataStax Enterprise data centers. This synchronizes the database automatically, sparing users the tedious work of implementing that logic themselves. An example topology is shown in Figure 7.
Figure 7: Advanced Replication for Field Gateways with a DataStax Enterprise cluster
(Figure 6 diagram labels: Low power IoT devices; IoT devices; Field Gateway; Protocol Adapter; Data Buffering; IoT Hub Client; IoT Hub.)
7.1. Edge Processing
In many scenarios, especially those where devices communicate with their cloud backend systems via metered networks, it
is not desirable to send raw sensor readings or status information across the communication link to the cloud because of
the associated cost and load.
Some IoT solutions specifically require evaluating signal data streams, such as video and audio with particular signal shapes and spectra, by applying digital signal processing, pattern matching or pattern discovery algorithms. These kinds of signals must therefore be treated as first-class data.
A sufficiently powerful field gateway can perform local processing, aggregation or encoding before data is transferred
over the network.
In some cases, the Field Gateway needs resilience and additional processing power for edge processing. In such cases, it may be desirable to deploy a cluster of three or more nodes at the edge, as shown in Figure 8.
Figure 8: DataStax Edge Cluster and Central Cluster
By embedding DataStax Enterprise in a field gateway or even an edge device, hot path analytics can be provided to
devices with lower latency. Additionally, in scenarios with intermittent connectivity, analytics will continue to be performed
even when the network is down.
DataStax Advanced Replication allows gateways (or even devices) to store a subset of the entire database locally and replicate particular information in one direction. There are two obvious use cases for unidirectional replication here:
1. Telemetry Data may be aggregated on the gateway and then encoded. The raw bit stream remains local to the device and is not replicated anywhere, while the encoded data is passed unidirectionally to the cloud-based Device State Store on the back end.
2. Registry Data should be stored in the cloud-based Device Registry Store. Some devices may want to maintain a local copy of registered devices in their immediate area, for instance in the same building.
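The two use cases amount to a per-table replication policy. The sketch below models that policy in plain Python; the table names and the three `Direction` values are hypothetical labels for the behavior described above, not DSE Advanced Replication configuration syntax.

```python
from enum import Enum
from typing import Dict, List


class Direction(Enum):
    LOCAL_ONLY = "local only"        # never leaves the gateway
    EDGE_TO_CLOUD = "edge to cloud"  # replicated upstream only
    CLOUD_TO_EDGE = "cloud to edge"  # local copy of centrally owned data


# Hypothetical policy matching the two use cases in the text.
POLICY: Dict[str, Direction] = {
    "raw_telemetry": Direction.LOCAL_ONLY,
    "encoded_telemetry": Direction.EDGE_TO_CLOUD,
    "device_registry": Direction.CLOUD_TO_EDGE,
}


def tables_to_push(policy: Dict[str, Direction]) -> List[str]:
    """Tables whose changes the gateway sends to the central cluster."""
    return sorted(t for t, d in policy.items() if d is Direction.EDGE_TO_CLOUD)


def tables_to_pull(policy: Dict[str, Direction]) -> List[str]:
    """Tables the gateway mirrors from the central cluster."""
    return sorted(t for t, d in policy.items() if d is Direction.CLOUD_TO_EDGE)


print("push:", tables_to_push(POLICY))  # push: ['encoded_telemetry']
print("pull:", tables_to_pull(POLICY))  # pull: ['device_registry']
```

Making the direction explicit per table keeps raw data from leaving the gateway by accident and makes clear which side owns each dataset.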
Storing data locally at the Field Gateway both reduces latency in the system and makes the system more resilient to
failure.
(Figure 8 diagram labels: Edge Cluster; Central Cluster; Telemetry Data; Operational Metadata; Command & Control.)
For edge devices with limited hardware footprints, simple analytics such as moving averages and other aggregations are
possible. For devices with more performant hardware, it is even possible to embed Spark MLlib scoring at the edge of the
IoT network and push trained models from the cloud back end in Azure down to the edge.
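A moving average is cheap enough for constrained gateways because it needs only a fixed-size window and a running sum. The sketch below is a plain-Python illustration with an assumed window of three readings; it shows the computation itself, not how it would be scheduled inside DSE or Spark.

```python
from collections import deque
from typing import Deque, Optional


class MovingAverage:
    """Simple moving average over the last `window` readings, O(1) per update."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self._window = window
        self._values: Deque[float] = deque()
        self._total = 0.0

    def update(self, value: float) -> Optional[float]:
        """Add a reading; return the average once the window is full."""
        self._values.append(value)
        self._total += value
        if len(self._values) > self._window:
            self._total -= self._values.popleft()
        if len(self._values) < self._window:
            return None
        return self._total / self._window


ma = MovingAverage(window=3)
print([ma.update(v) for v in [10.0, 20.0, 30.0, 40.0, 50.0]])
# [None, None, 20.0, 30.0, 40.0]
```

Memory use is bounded by the window size, independent of how long the gateway runs, which is what makes this kind of aggregation practical on small hardware.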
8. Geographical Replication
DataStax Enterprise, built on Apache Cassandra, is designed for geographically distributed deployment. A DSE cluster can contain any number of data centers, each with any number of nodes. Each DSE data center (deployed in an Azure region) automatically synchronizes information across the geo-distributed cluster. This provides two key benefits:
• Disaster Avoidance – All DSE data centers run in an active/active/active configuration. In the event that an Azure service or data center fails, clients automatically fail over to a live data center.
• Data Locality – Data is available locally wherever an application back end is deployed. This reduces access latency and ensures the application can scale horizontally to handle load from millions of devices.
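In Cassandra and DSE, a geographically distributed deployment is expressed at the keyspace level with `NetworkTopologyStrategy`, which sets a replication factor per data center. The helper below only builds the CQL statement as a string and computes how many local replicas a `LOCAL_QUORUM` request needs in each data center. The keyspace name, the data center names (which must match the names your cluster's snitch reports) and a replication factor of 3 per region are all assumptions.

```python
from typing import Dict


def create_keyspace_cql(keyspace: str, rf_per_dc: Dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement replicating to every listed data center."""
    dcs = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {dcs}}};"
    )


def local_quorum(rf_per_dc: Dict[str, int]) -> Dict[str, int]:
    """Replicas a LOCAL_QUORUM request must reach inside each data center."""
    return {dc: rf // 2 + 1 for dc, rf in rf_per_dc.items()}


# Hypothetical data centers, one per Azure region, each with RF = 3.
regions = {"eastus": 3, "westeurope": 3, "southeastasia": 3}
print(create_keyspace_cql("iot", regions))
print(local_quorum(regions))  # {'eastus': 2, 'westeurope': 2, 'southeastasia': 2}
```

With this layout, an application in each region can read and write at `LOCAL_QUORUM` against its nearest data center, so losing one region leaves the others able to serve requests without any promotion step. The generated statement would be executed through whichever client driver the application already uses.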
Figure 9: DataStax Geo-Replication Architecture
For an implementation of the Azure IoT reference architecture, geographical redundancy can be leveraged in a number of ways:
• Disaster Avoidance – IoT infrastructure can be deployed in an active/active state by leveraging DSE, allowing the application to continue operating even in the event of a regional failure. This capability is particularly powerful because DSE supports active/active deployments with any number of simultaneously active data centers. As a result, there is no downtime during failover; instead, the application can immediately connect to a DSE node in an available region. This is a good way to take advantage of the large number of Azure regions. Even where a fully active deployment is not required, geographical redundancy can protect against potential data loss from local failures.
• Improved Performance – Deploying IoT solutions in multiple locations and data centers closer to IoT devices reduces the latency to analyze and act on device data and sensor readings, improving the experience for users of the IoT system.
9. Conclusion
Azure IoT is leading the industry with a componentized IoT reference architecture and pluggable services and infrastructure that can be customized to meet any IoT need. DataStax Enterprise can be used to implement components of the reference architecture in a geographically scalable and resilient way. Beyond that, DataStax Enterprise provides analytics and graph database features that make its use as part of the Azure IoT reference architecture even more compelling.
For more information, please contact konstantin.dotchkoff@microsoft.com, claudioc@microsoft.com or
ben.lackey@datastax.com.

Azure SQL DB Managed Instances Built to easily modernize application data layerMicrosoft Tech Community
 
Using power bi in hybrid it
Using power bi in hybrid itUsing power bi in hybrid it
Using power bi in hybrid ithman10010
 
An Overview of All The Different Databases in Google Cloud
An Overview of All The Different Databases in Google CloudAn Overview of All The Different Databases in Google Cloud
An Overview of All The Different Databases in Google CloudFibonalabs
 
SQL Server Data Services
SQL Server Data ServicesSQL Server Data Services
SQL Server Data ServicesEduardo Castro
 
Azure SQL Database Managed Instance - technical overview
Azure SQL Database Managed Instance - technical overviewAzure SQL Database Managed Instance - technical overview
Azure SQL Database Managed Instance - technical overviewGeorge Walters
 
Integration of Things (Sam Vanhoutte @Iglooconf 2017)
Integration of Things (Sam Vanhoutte @Iglooconf 2017) Integration of Things (Sam Vanhoutte @Iglooconf 2017)
Integration of Things (Sam Vanhoutte @Iglooconf 2017) Codit
 

Ähnlich wie Azure-and-DataStax-IoT-Reference-Architecture-White-Paper_1 (20)

Azure 10 major services
Azure 10 major servicesAzure 10 major services
Azure 10 major services
 
Azure SQL Database & Azure SQL Data Warehouse
Azure SQL Database & Azure SQL Data WarehouseAzure SQL Database & Azure SQL Data Warehouse
Azure SQL Database & Azure SQL Data Warehouse
 
Azure IoT Summary
Azure IoT SummaryAzure IoT Summary
Azure IoT Summary
 
Comprehensive Guide for Microsoft Fabric to Master Data Analytics
Comprehensive Guide for Microsoft Fabric to Master Data AnalyticsComprehensive Guide for Microsoft Fabric to Master Data Analytics
Comprehensive Guide for Microsoft Fabric to Master Data Analytics
 
Tiarrah Computing: The Next Generation of Computing
Tiarrah Computing: The Next Generation of ComputingTiarrah Computing: The Next Generation of Computing
Tiarrah Computing: The Next Generation of Computing
 
Azure Data Engineering Online Training
Azure Data Engineering Online TrainingAzure Data Engineering Online Training
Azure Data Engineering Online Training
 
azure pdf.pdf
azure pdf.pdfazure pdf.pdf
azure pdf.pdf
 
8. 9590 1-pb
8. 9590 1-pb8. 9590 1-pb
8. 9590 1-pb
 
UNIT - II.docx
UNIT - II.docxUNIT - II.docx
UNIT - II.docx
 
Azure intelligent edge solutions overview
Azure intelligent edge solutions overviewAzure intelligent edge solutions overview
Azure intelligent edge solutions overview
 
“A Distributed Operational and Informational Technological Stack”
“A Distributed Operational and Informational Technological Stack” “A Distributed Operational and Informational Technological Stack”
“A Distributed Operational and Informational Technological Stack”
 
CLOUD ANALYTICS: AN INSIGHT ON DATA AND STORAGE SERVICES IN MICROSOFT AZURE
CLOUD ANALYTICS: AN INSIGHT ON DATA AND STORAGE SERVICES IN MICROSOFT AZURECLOUD ANALYTICS: AN INSIGHT ON DATA AND STORAGE SERVICES IN MICROSOFT AZURE
CLOUD ANALYTICS: AN INSIGHT ON DATA AND STORAGE SERVICES IN MICROSOFT AZURE
 
Azure Cloud Services
Azure Cloud ServicesAzure Cloud Services
Azure Cloud Services
 
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data AnalyticsHow to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
How to Architect a Serverless Cloud Data Lake for Enhanced Data Analytics
 
Azure SQL DB Managed Instances Built to easily modernize application data layer
Azure SQL DB Managed Instances Built to easily modernize application data layerAzure SQL DB Managed Instances Built to easily modernize application data layer
Azure SQL DB Managed Instances Built to easily modernize application data layer
 
Using power bi in hybrid it
Using power bi in hybrid itUsing power bi in hybrid it
Using power bi in hybrid it
 
An Overview of All The Different Databases in Google Cloud
An Overview of All The Different Databases in Google CloudAn Overview of All The Different Databases in Google Cloud
An Overview of All The Different Databases in Google Cloud
 
SQL Server Data Services
SQL Server Data ServicesSQL Server Data Services
SQL Server Data Services
 
Azure SQL Database Managed Instance - technical overview
Azure SQL Database Managed Instance - technical overviewAzure SQL Database Managed Instance - technical overview
Azure SQL Database Managed Instance - technical overview
 
Integration of Things (Sam Vanhoutte @Iglooconf 2017)
Integration of Things (Sam Vanhoutte @Iglooconf 2017) Integration of Things (Sam Vanhoutte @Iglooconf 2017)
Integration of Things (Sam Vanhoutte @Iglooconf 2017)
 

Azure-and-DataStax-IoT-Reference-Architecture-White-Paper_1

Azure IoT services Reference Architecture
MICROSOFT CONFIDENTIAL
DataStax and Azure IoT Reference Architecture
Table of Contents
Reference Architecture Overview
Implementation
Device Registry Store
Device State Store
Real-Time Analytics
Batch Analytics
Field Gateway
Geographical Redundancy
Conclusion

Summary
Connected sensors, devices, and intelligent operations are transforming businesses and enabling new growth opportunities with Microsoft Azure Internet of Things (IoT) services. This document outlines how DataStax Enterprise, providing a scalable and resilient IoT infrastructure, can be used to implement specific components of the Azure IoT reference architecture.

© 2016 Microsoft Corporation. All rights reserved.
This document is provided "as-is." Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it. Some examples are for illustration only and are fictitious. No real association is intended or inferred. This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes.
1. Reference Architecture Overview
Connected sensors, devices, and intelligent operations are transforming businesses and enabling new growth opportunities with Microsoft Azure Internet of Things (IoT) services.
This document outlines how to use DataStax Enterprise (DSE) in the Azure IoT reference architecture.
DataStax Enterprise is a geographically distributed and horizontally scalable transactional database based on Apache Cassandra. It includes integrated Spark analytics for stream processing and machine learning, and a graph database for relationship modeling. DataStax Enterprise is well suited to storing operational data with always-on availability requirements.

Figure 1: IoT solution architecture
[Figure 1 diagram: Device Connectivity; Data Processing, Analytics and Management; Presentation & Business Connectivity.]

Figure 1 shows the conceptual architecture for Azure IoT, which is detailed in the document Microsoft Azure IoT Reference Architecture[1]. DataStax Enterprise can be used to implement a number of components in the architecture, enhancing performance, functionality and reliability.

2. Implementation
Figure 2 shows how DataStax Enterprise can be used as part of the end-to-end Azure IoT reference architecture.

[1] https://azure.microsoft.com/en-us/updates/microsoft-azure-iot-reference-architecture-available/
Figure 2: DataStax usage in the Azure IoT Reference Architecture
[Figure 2 diagram: Device Registry Stores, Device State Store, Real-time Analytics (Spark / Spark R / Spark ML scoring), Batch Analytics (Spark ML training) and IoT Hub within the Figure 1 layout; legend: IoT solution component using DataStax, Hadoop.]

DataStax Enterprise can be used to implement the device registry and device state stores. Additionally, the analytics components of DataStax Enterprise can be used to implement the stream processing and analytics portions of the Azure IoT reference architecture. Using DataStax Enterprise for these components offers several key advantages:
• Linear Scalability – Able to scale to handle millions of transactions per second
• Resilience to Failure – DataStax Enterprise provides node fault tolerance, rack fault tolerance and data center level disaster tolerance
• Integrated Analytics – Support for graph databases, full text search and machine learning
These three advantages are very pertinent to the requirements of IoT stack components. Later sections of this paper describe in more detail how they relate to those components.

DataStax is working with Mesosphere to deploy DataStax Enterprise in the Mesos Universe, allowing for a push-button, containerized deployment in both Mesos and Azure Container Service (ACS). This simplifies provisioning and orchestration when implementing an open source version of the Azure IoT reference architecture. More information is available here[2].

Alternatively, if the Azure IoT reference architecture is implemented primarily with Azure services rather than open source components, DataStax Enterprise can be deployed on Azure VMs using Azure Resource Manager (ARM). For field gateways, DataStax Enterprise can be deployed directly on the hardware. This means that a single database infrastructure can be used across hybrid cloud IoT deployments, which simplifies operations by avoiding the need to maintain multiple types of infrastructure.

[2] http://www.marketwired.com/press-release/mesosphere-brings-datastax-enterprise-to-the-dc-os-universe-app-store-2130849.htm
The following sections describe the advantages of using DataStax Enterprise for the different components of the Azure IoT reference architecture.

3. Device Registry Store
The device registry contains device-related metadata attributes and reference data for provisioned devices. It serves as an index for device discoverability and is used by the solution back-end components and UI. Typically, the device registry contains only slowly changing data. Examples of device registry store data include:
• The building and room number a smoke alarm is installed in
• The installation date for a mixing valve
• The upstream and downstream components connected to a generator
Uptime is extremely important for the device registry. If it is not available, operations that depend on device metadata will fail. DataStax Enterprise is very resilient to failure, making it an excellent choice for the device registry store. DataStax Enterprise clusters are made up of, in descending order, data centers, racks and nodes, and DSE is resilient to failure at each of these levels.

Using DataStax Enterprise for the device registry store has advantages beyond the resilience inherent in the database. DSE is a multi-model database, including support for tabular data, full text search, analytics and graph databases. The graph model is particularly useful for the device registry: it can be used to model the relations between devices and other domain-specific entities. For example, as shown in Figure 3, it can represent the relations between sensors, machines, factories and products in a manufacturing scenario.

Figure 3: Example of DataStax Enterprise graph model for Azure IoT
[Figure 3 diagram: vertices Sensor, Machine, Failure, Operator, Factory and Product, connected by edges labeled Monitors, Reports, Affects, Part of, Assembles, Produces and Works for.]

Many databases fall into one of two categories with respect to consistency:
1. Strongly consistent
2. Eventually consistent
The CAP theorem[3] describes the trade-offs between these categories. In essence, strong consistency comes at a cost in scalability and availability. Eventual consistency does better in those respects, but at the cost of potentially inconsistent reads.

DataStax Enterprise, like the Apache Cassandra database it is built on, offers tunable consistency. This means that the consistency level can be balanced against performance and availability, depending on the application.

For the device registry store, it may be advisable to tune toward read performance and consistency. Although registry data changes slowly, inconsistent reads could affect the application back end and analytics processes.

One example of how tunable consistency can benefit performance in Azure IoT is to take advantage of the low write frequency of the device registry. In this scenario, writes are infrequent but should be reflected across the entire cluster, so we could set the write consistency level to ALL. This requires every replica to acknowledge a write before an update to the registered devices is reported as successful. Reads are much more frequent, so we may want to bias the consistency tuning to make them as fast as possible. For this reason, we could use consistency level ONE for reads in the device registry store, meaning that a read returns as soon as any one replica responds. Because every replica has acknowledged each successful write, that single replica still holds the latest registry data. Quorum-level consistency options could be used as well. More information on tuning consistency is available in this article[4].

4. Device State Store
The device state store contains operational data from the devices. Device operational data is high-volume, high-velocity data, typically orders of magnitude larger than what is stored in the device registry store, because a single device produces many readings.

Given these data volumes, it is extremely important to use a highly scalable database for the device state store. DataStax Enterprise scales linearly to handle the load of millions of devices. Figure 4 shows how DSE performance scales linearly as nodes are added to a cluster. DSE can scale from very small clusters to large clusters that handle millions of transactions per second, which makes it an ideal database for the device state store.

[3] https://en.wikipedia.org/wiki/CAP_theorem
[4] https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_config_consistency_c.html
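The consistency choices discussed for the device registry can be checked against a common rule of thumb for Cassandra-style replication: a read is guaranteed to see the latest successful write when the number of replicas that must acknowledge the write (W) plus the number consulted on the read (R) exceeds the replication factor (RF). The following minimal Python sketch is not a driver API; it assumes a single data center with a hypothetical replication factor of 3, and it also shows the cost of writing at ALL, namely that a write cannot succeed while any replica is down.

```python
from enum import Enum


class Level(Enum):
    """A subset of Cassandra consistency levels, evaluated within one data center."""
    ONE = "ONE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def replicas_required(level: Level, rf: int) -> int:
    """Number of replicas that must respond for `level` at replication factor `rf`."""
    if level is Level.ONE:
        return 1
    if level is Level.QUORUM:
        return rf // 2 + 1  # a majority of replicas
    return rf  # ALL: every replica


def reads_see_latest_write(write: Level, read: Level, rf: int) -> bool:
    """True when every read replica set overlaps every write replica set (R + W > RF)."""
    return replicas_required(write, rf) + replicas_required(read, rf) > rf


def writes_succeed_with_replicas_down(write: Level, rf: int, down: int) -> bool:
    """True when enough replicas remain up to satisfy the write consistency level."""
    return rf - down >= replicas_required(write, rf)


RF = 3  # assumed replication factor
for store, w, r in [
    ("device registry", Level.ALL, Level.ONE),
    ("device state store", Level.QUORUM, Level.QUORUM),
    ("ONE/ONE for comparison", Level.ONE, Level.ONE),
]:
    print(
        f"{store}: W={w.value} R={r.value} "
        f"consistent={reads_see_latest_write(w, r, RF)} "
        f"writes_ok_with_1_down={writes_succeed_with_replicas_down(w, RF, 1)}"
    )
```

With RF = 3, both ALL/ONE and QUORUM/QUORUM satisfy the overlap rule while ONE/ONE does not, and of the three write levels only ALL fails when a single replica is unavailable.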
Figure 4: Near linear scalability in DataStax Enterprise[5]
[Figure 4 chart: operations/sec versus number of nodes (1 to 32) for Cassandra, Couchbase, HBase and MongoDB.]
[5] http://www.datastax.com/apache-cassandra-leads-nosql-benchmark

In the case of the device state store, the need to transfer or replicate each category of data should be analyzed. Raw telemetry data might not need to be available on a secondary site, whereas aggregated data represents a reduced volume that may be easier to replicate if needed.

The device state store holds much more dynamic data than the device registry store. Uptime and the ability to handle large data volumes remain important, and the deployment architecture that works well for the device registry remains viable here.

In the device state store, it may be advisable to tune consistency differently than in the device registry. Here the aim is to optimize for writes rather than reads; for example, both reads and writes can use quorum-level consistency.

The device state store and device registry store can be implemented as distinct databases or as a single database. While distinct databases may offer minor latency advantages, for most cases we recommend a single database, which simplifies administration and reduces hardware cost.

5. Real-Time Analytics
After ingress through Azure IoT Hub as the cloud gateway, the flow of data through the system is handled by data pumps and analytics tasks. Data pumps typically move or route data without any transformation, while analytics tasks perform complex event processing. Since IoT Hub provides brokered communication and supports multiple consumers, the same data can be consumed by different stream processors for different purposes, resulting in multiple data streams flowing concurrently. For example, one stream processor may listen only for special types of events, while another performs complex event processing in parallel. These processors can determine the path of data and route it without any reshaping, or perform complex event processing tasks such as data aggregation, data enrichment
through correlation with reference data, as well as analytics tasks such as detection of threshold limits or anomalies and generation of alerts.

As part of its multi-model capabilities, DataStax Enterprise embeds Apache Spark. Spark consists of a number of components that are relevant to the real-time analytics aspect of the Azure IoT reference architecture, including Spark Streaming and Spark MLlib. For hot path analytics, data flows directly from Azure IoT Hub into Spark Streaming, which is integrated into the DataStax Enterprise runtime. Spark allows you to use pre-trained MLlib models directly in the Spark Streaming analytics pipeline, so incoming data can be scored against those models and real-time predictions made as data arrives.

Embedding machine learning into real-time data feeds using Spark Streaming allows the system to react quickly to new input, predicting with greater responsiveness and accuracy than a traditional business-intelligence-based approach. This integration simplifies the use of machine learning models directly in the streaming pipeline. It also improves performance and latency, as the data is processed close to the database. Some example applications of this infrastructure include:
• Real-time correlation analysis of multiple fire alarm sensors to determine whether particulate buildup is due to a fire or normal wear and tear, allowing for maintenance optimization.
• Prediction of drill head failure in oil drilling through real-time modeling of heat and fatigue measurements.
• Real-time integration with workforce management data to automatically route maintenance personnel while optimizing for job urgency and trip distance.
Hot path analytics with machine learning is where IoT users can extract the greatest business value from their IoT investment.
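To make the routing and enrichment tasks concrete, the following plain-Python sketch fans each reading out to two consumers: a data pump that archives it unchanged, and an analytics task that enriches it with registry reference data and raises an alert when a per-device limit is exceeded. It is not Spark Streaming or the IoT Hub API; the device IDs, the `REGISTRY` dictionary standing in for the device registry store, and the temperature limits are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Reading:
    device_id: str
    temperature_c: float


@dataclass(frozen=True)
class DeviceInfo:
    location: str
    limit_c: float


@dataclass(frozen=True)
class Alert:
    device_id: str
    location: str
    temperature_c: float
    limit_c: float


# Hypothetical reference data standing in for rows read from the device registry store.
REGISTRY: Dict[str, DeviceInfo] = {
    "alarm-17": DeviceInfo("Building 4, Room 210", 60.0),
    "valve-03": DeviceInfo("Plant 2, Line B", 90.0),
}


def enrich_and_check(reading: Reading) -> Optional[Alert]:
    """Enrich a reading with registry data; return an alert if it exceeds the device limit."""
    info = REGISTRY.get(reading.device_id)
    if info is None:
        return None  # unknown device; a real pipeline might route it elsewhere
    if reading.temperature_c > info.limit_c:
        return Alert(reading.device_id, info.location, reading.temperature_c, info.limit_c)
    return None


def fan_out(stream: Iterable[Reading], consumers: List[Callable[[Reading], None]]) -> None:
    """Deliver every reading to every consumer, like independent readers of one stream."""
    for reading in stream:
        for consume in consumers:
            consume(reading)


archived: List[Reading] = []  # output of the data pump (no transformation)
alerts: List[Alert] = []      # output of the analytics task


def alert_task(reading: Reading) -> None:
    alert = enrich_and_check(reading)
    if alert is not None:
        alerts.append(alert)


fan_out(
    [Reading("alarm-17", 42.0), Reading("alarm-17", 71.5), Reading("valve-03", 88.0)],
    [archived.append, alert_task],
)
print(len(archived), alerts)
```

Keeping the pump and the analytics task as independent consumers mirrors the point above: each processor sees the full stream and neither has to know about the other.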
DataStax Enterprise provides a concrete implementation of this kind of hot path analytics infrastructure, combining the resilience of Cassandra with multi-model analytics for a comprehensive operational analytics solution.

6. Batch Analytics
Batch analytics in Azure IoT can be provided by Azure HDInsight (HDI). HDI is ideal for cases where large amounts of data must be analyzed with batch or even ad-hoc queries. Training machine learning models in batch (as opposed to incremental real-time training) and building monthly roll-up reports are both use cases HDI is well suited for.
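The split between training in batch and scoring in the hot path can be sketched without Spark. In the following Python example, which is an illustration under stated assumptions rather than Spark MLlib code, a simple z-score anomaly model stands in for an MLlib model: the batch step fits its parameters on historical readings, the parameters are exported as JSON, and a hot path scorer loads them and scores each new reading. The model type, the 3.0 threshold and the sample values are all assumptions chosen for illustration.

```python
import json
import statistics
from typing import Dict, List


def train_batch(history: List[float], z_limit: float = 3.0) -> Dict[str, float]:
    """Batch step: fit a z-score anomaly model (mean and standard deviation) on history."""
    return {
        "mean": statistics.fmean(history),
        "stdev": statistics.stdev(history),  # needs at least two readings
        "z_limit": z_limit,
    }


def export_model(model: Dict[str, float]) -> str:
    """Serialize the trained parameters so the real-time tier can load them."""
    return json.dumps(model)


class HotPathScorer:
    """Real-time step: score each incoming reading against the exported parameters."""

    def __init__(self, exported: str) -> None:
        params = json.loads(exported)
        self.mean = params["mean"]
        self.stdev = params["stdev"]
        self.z_limit = params["z_limit"]

    def is_anomaly(self, value: float) -> bool:
        if self.stdev == 0:
            return value != self.mean  # constant history: any change is unusual
        return abs(value - self.mean) / self.stdev > self.z_limit


history = [20.0, 21.0, 19.5, 20.5, 20.0, 19.0, 21.5]  # hypothetical historical temperatures
scorer = HotPathScorer(export_model(train_batch(history)))
for reading in [20.8, 35.0]:
    print(reading, scorer.is_anomaly(reading))
```

In a DSE or HDI deployment, the export step would instead persist a trained MLlib model; JSON is used here only to keep the sketch self-contained.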
Figure 5: Machine Learning Lifecycle with Spark
[Figure 5 diagram, lifecycle stages: Define Model Features (SparkSQL, BI, etc.) → Train Model (Batch Spark MLlib) → Export Model to Real-Time Analytics (Spark MLlib) → Real-Time Model Scoring (Spark Streaming and MLlib) → Evaluate Model Performance (SparkSQL, BI, etc.)]

In some cases, where the batch queries are well defined or the amount of data stored is smaller, it may make sense to use DataStax Enterprise for both hot path and batch analytics. This results in a lower TCO, as only one system needs to be maintained rather than both DSE and HDI. However, this comes with some trade-offs in suitability for large-scale batch queries.

Note that Spark MLlib models trained in HDI Spark, as well as in Spark in DataStax Enterprise, can be exported and used in the hot path for real-time analytics based on machine learning. This allows a full machine learning lifecycle to be built within the IoT architecture, as shown in Figure 5.

7. Field Gateway
A field gateway is a specialized device that acts as a communication enabler and as a local device control system and device data processing hub. A field gateway can perform local processing and control functions for the devices. It can also filter and aggregate device telemetry, reducing the amount of data transferred to the cloud back end. Gateways may assist in device provisioning, data filtering, batching and aggregation, buffering of data, protocol translation, and event processing.
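The filtering and batching duties of a field gateway can be sketched as follows. This is a hypothetical Python example, not a DataStax or Azure API: a deadband filter drops readings that changed by less than a configured amount since the last forwarded value, and the surviving readings are grouped into fixed-size batches. The `batches` list stands in for messages a real gateway would send to IoT Hub, and the deadband and batch size are assumed values.

```python
from typing import Dict, List, Tuple

Message = List[Tuple[str, float]]


class GatewayFilter:
    """Deadband filtering plus fixed-size batching, as a field gateway might apply."""

    def __init__(self, deadband: float, batch_size: int) -> None:
        self.deadband = deadband
        self.batch_size = batch_size
        self.last_sent: Dict[str, float] = {}
        self.pending: Message = []
        self.batches: List[Message] = []  # stands in for messages sent upstream

    def ingest(self, device_id: str, value: float) -> None:
        previous = self.last_sent.get(device_id)
        if previous is not None and abs(value - previous) < self.deadband:
            return  # change too small to justify the uplink cost
        self.last_sent[device_id] = value
        self.pending.append((device_id, value))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Emit whatever is pending as one batch (a real gateway would also flush on a timer)."""
        if self.pending:
            self.batches.append(self.pending)
            self.pending = []


gw = GatewayFilter(deadband=0.5, batch_size=2)
for device_id, value in [("s1", 20.0), ("s1", 20.2), ("s1", 20.9), ("s2", 5.0)]:
    gw.ingest(device_id, value)
gw.flush()
print(gw.batches)
```

Here the 20.2 reading is dropped because it is within the deadband of the last forwarded value, so four readings become two upstream messages.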
Figure 6: Data Flow in a stateful Field Gateway
[Figure 6 diagram: Low power IoT devices, IoT devices, Field Gateway (Protocol Adapter, Data Buffering, IoT Hub Client), IoT Hub.]

Field gateways that embed DataStax Enterprise are stateful. If connectivity is intermittent, these gateways can operate as store-and-forward databases, syncing to the cloud gateway when connectivity is available. This provides a mechanism for storing data locally, even on gateways with constrained hardware, because DataStax Enterprise can run on devices with minimal hardware resources. It also paves the way for field gateways with sufficiently powerful hardware to embed advanced analytics for edge processing.

Field gateways that embed DataStax Enterprise can persist state in a standalone instance of the database or in an edge database using DataStax Enterprise Advanced Replication. In the Advanced Replication case, with additional integration work, messages can be passed through IoT Hub to the central DataStax Enterprise data centers. This synchronizes the database automatically, saving users the tedious exercise of implementing that logic themselves. An example topology is shown in Figure 7.

Figure 7: Advanced Replication for Field Gateways with a DataStax Enterprise cluster
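Store-and-forward behavior can be illustrated with a small buffer that holds messages while the uplink is down and forwards them in order once it returns. In this Python sketch the in-memory `deque` stands in for the embedded database a stateful gateway would use (a real gateway would persist to disk so data survives restarts), and `send` is a hypothetical callback that reports whether the cloud gateway accepted a message.

```python
from collections import deque
from typing import Callable, Deque, List


class StoreAndForward:
    """Queue messages locally while offline; forward them in order when the link is back."""

    def __init__(self, send: Callable[[str], bool], capacity: int = 10_000) -> None:
        self.send = send  # hypothetical callback: True if the cloud gateway accepted the message
        self.buffer: Deque[str] = deque(maxlen=capacity)  # oldest entries dropped when full

    def publish(self, message: str) -> None:
        self.buffer.append(message)
        self.drain()

    def drain(self) -> None:
        """Forward buffered messages oldest first, stopping at the first failed send."""
        while self.buffer:
            if not self.send(self.buffer[0]):
                return  # still offline: keep the message and retry later
            self.buffer.popleft()


# Simulated uplink for illustration.
online = False
delivered: List[str] = []


def fake_send(message: str) -> bool:
    if online:
        delivered.append(message)
    return online


gateway = StoreAndForward(fake_send)
gateway.publish("t=1 temp=20.1")
gateway.publish("t=2 temp=20.3")
print(len(gateway.buffer), delivered)  # both messages still buffered
online = True
gateway.drain()
print(len(gateway.buffer), delivered)
```

Bounding the buffer with `maxlen` means the oldest readings are discarded if an outage outlasts the local storage; whether that is acceptable is a per-deployment decision.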
7.1. Edge Processing
In many scenarios, especially those where devices communicate with their cloud back-end systems over metered networks, it is not desirable to send raw sensor readings or status information across the communication link to the cloud because of the associated cost and load. Some IoT solutions, such as those handling video and audio, specifically require evaluating signal data streams for particular signal shapes and spectra by applying digital signal processing algorithms, pattern matching or pattern discovery, so these kinds of signals must be treated in a first-class fashion. A sufficiently powerful field gateway can perform local processing, aggregation or encoding before data is transferred over the network.

In some cases, resilience and increased processing power for edge processing are desired in the field gateway. In such a case, it may be desirable to deploy a cluster of three or more nodes at the edge, as shown in Figure 8.

Figure 8: DataStax Edge Cluster and Central Cluster
[Figure 8 diagram: Edge Cluster and Central Cluster, with flows labeled Telemetry Data, Operational Metadata, and Command & Control.]

By embedding DataStax Enterprise in a field gateway or even an edge device, hot path analytics can be provided to devices with lower latency. Additionally, in scenarios with intermittent connectivity, analytics continue to be performed even when the network is down.

DataStax Advanced Replication allows gateways (or even devices) to store a subset of the entire database locally and replicate particular information in one direction. There are two obvious use cases for unidirectional replication here:
1. Telemetry data may be aggregated on the gateway and then encoded. The raw bit stream would remain local to the device and not be replicated anywhere, while the encoded data would be passed unidirectionally to the cloud-based device state store on the back end.
2. Registry data should be stored in the cloud-based device registry store. Some devices may want to maintain a local copy of the registered devices in their immediate area, for instance in the same building.
Storing data locally at the field gateway both reduces latency in the system and makes the system more resilient to failure.
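The telemetry use case for unidirectional replication, keeping raw data at the edge and passing only a reduced form upstream, can be sketched as follows. In this hypothetical Python example, raw samples are kept in a local list that is never replicated, and every fixed-size, non-overlapping window produces one summary record (count, mean, minimum, maximum) appended to an outbound list that stands in for the one-way feed to the cloud device state store. The window size and summary fields are assumptions, and this summary format is not something DataStax Advanced Replication prescribes.

```python
from statistics import fmean
from typing import Dict, List


class EdgeAggregator:
    """Keep raw samples at the edge and emit one summary per non-overlapping window."""

    def __init__(self, window_size: int) -> None:
        self.window_size = window_size
        self.current: List[float] = []
        self.local_raw: List[float] = []  # stays on the gateway; never replicated
        self.outbound: List[Dict[str, float]] = []  # one-way feed toward the cloud

    def add(self, sample: float) -> None:
        self.local_raw.append(sample)
        self.current.append(sample)
        if len(self.current) == self.window_size:
            self.outbound.append({
                "count": float(len(self.current)),
                "mean": fmean(self.current),
                "min": min(self.current),
                "max": max(self.current),
            })
            self.current = []  # start the next window


agg = EdgeAggregator(window_size=4)
for sample in [1.0, 2.0, 3.0, 4.0, 10.0, 10.0, 10.0, 10.0, 5.0]:
    agg.add(sample)
print(len(agg.local_raw), agg.outbound)
```

Non-overlapping windows are used so that upstream volume shrinks by roughly the window size; a sliding moving average would instead emit one value per sample.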
For edge devices with limited hardware footprints, simple analytics such as moving averages and other aggregations are possible. For devices with more capable hardware, it is even possible to embed Spark MLlib scoring at the edge of the IoT network and push trained models from the cloud back end in Azure down to the edge.

8. Geographical Replication
DataStax Enterprise, built on Apache Cassandra, is designed for geographically distributed deployments. DSE clusters are made up of an arbitrary number of data centers, each with any number of nodes. Each DSE data center (deployed in an Azure region) automatically synchronizes information across the geo-distributed cluster. This provides two key benefits:
• Disaster Avoidance – All DSE data centers run in an active/active/active configuration. In the event an Azure service or data center fails, clients automatically fail over to a live data center.
• Data Locality – Data is available locally, wherever an application back end is deployed. This reduces access latency and ensures the application can scale horizontally to handle load from millions of devices.

Figure 9: DataStax Geo-Replication Architecture

For an implementation of the Azure IoT reference architecture, geographical redundancy can be leveraged in a number of ways:
• Disaster Avoidance – IoT infrastructure can be deployed in an active/active state by leveraging DSE. This allows the application to continue operating even in the event of a regional failure. This capability is particularly powerful because DSE can run with any number of simultaneously active data centers. As a result, there is no downtime during failover; instead, the application can immediately connect to a DSE node in an available region. This is a great way to take advantage of the large number of Azure regions. Even where a fully active deployment is not required, geographical resilience can be leveraged to protect against potential data loss from local failures.
• Improved Performance – By deploying IoT solutions in multiple locations and data centers closer to IoT devices, the latency to analyze and act on device data and sensor readings can be reduced. This improves the experience for users of the IoT system.
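Failover of this kind depends on clients choosing a live data center. The DataStax drivers provide data-center-aware load-balancing policies for this purpose; the following Python sketch does not use them and only illustrates the selection logic: prefer the lowest-latency healthy data center and fall back to the next one when it becomes unavailable. The region names, latencies and health flags are hypothetical inputs, and how failover is actually configured depends on the driver.

```python
from typing import Dict, Optional


def choose_datacenter(latency_ms: Dict[str, float], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the lowest-latency healthy data center, or None if every one is down."""
    candidates = [dc for dc in latency_ms if healthy.get(dc, False)]
    if not candidates:
        return None
    return min(candidates, key=lambda dc: latency_ms[dc])


# Hypothetical measurements from an application back end in Europe.
latency = {"westeurope": 12.0, "northeurope": 25.0, "eastus": 95.0}
status = {"westeurope": True, "northeurope": True, "eastus": True}
print(choose_datacenter(latency, status))
status["westeurope"] = False  # simulate a regional outage
print(choose_datacenter(latency, status))
```

Because every data center is active, the fallback choice is immediately usable for reads and writes, which is the property the disaster avoidance bullet relies on.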
9. Conclusion
Azure IoT is leading the industry with a componentized IoT reference architecture and pluggable services and infrastructure that can be customized to meet any IoT need. DataStax Enterprise can be used to implement components of the reference architecture in a geographically scalable and resilient way. Beyond that, DataStax Enterprise provides analytics and graph database features that make its use as part of the Azure IoT reference architecture even more compelling.
For more information, please contact konstantin.dotchkoff@microsoft.com, claudioc@microsoft.com or ben.lackey@datastax.com.