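4.
[Figure: IoT system architecture built on Azure services. Components shown: legacy devices (custom protocols); IoT devices, including IP-capable devices (Windows/Linux) and low-power devices (RTOS); field gateways and cloud gateways; an event broker / device management layer; Azure Active Directory as the authentication infrastructure; stream processing (Stream Analytics); parallel data processing (Azure Data Lake Analytics); machine learning (Machine Learning API, and Machine Learning / Revolution R Enterprise); query and search (Azure Search); visualization and analytics (Power BI); dashboards (Azure App Services); notifications to devices (Notification Hubs); and storage for DWH, document database, and time-series data.]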
5.
[Figure: Partitions 1, 2, ..., “n” of a stream are read by consumer groups A, B, and C. Each consumer group runs its own workers (Worker 1 ... Worker “n”), and each worker hosts callbacks for individual partitions (e.g., callbacks for partitions 1, 2, and 6, up to callback “n”).]
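The diagram on this slide can be read as follows: every consumer group independently receives all partitions, and inside one group each partition is handled by the callback of a single worker. This reading is an interpretation of the figure, not something the slide states. The pure-Python model below illustrates it. The function `deliver`, the worker names, and the round-robin assignment of partitions to workers are hypothetical choices made for illustration.

```python
from collections import defaultdict


def deliver(events_by_partition, groups):
    """Hypothetical model of consumer groups reading a partitioned stream.

    events_by_partition: {partition_id: [event, ...]}
    groups: {group_name: [worker_name, ...]}
    Returns {group_name: {worker_name: [(partition_id, event), ...]}}.
    """
    partitions = sorted(events_by_partition)
    result = {}
    for group, workers in groups.items():
        # Assumption: partitions are spread round-robin over the group's workers,
        # so each partition has exactly one owning worker inside each group.
        owner = {p: workers[i % len(workers)] for i, p in enumerate(partitions)}
        received = defaultdict(list)
        for p in partitions:
            for event in events_by_partition[p]:
                # The "callback for partition p" runs on the worker that owns p.
                received[owner[p]].append((p, event))
        result[group] = dict(received)
    return result


events = {1: ["a", "b"], 2: ["c"], 6: ["d"]}
groups = {"A": ["worker-1", "worker-2"], "B": ["worker-1"]}
print(deliver(events, groups))
```

Both groups see every event. Group A spreads the partitions over two workers, while group B's single worker handles all of them.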
13. Transformation | Meaning
map(func) | Return a new DStream by passing each element of the source DStream through a function func.
flatMap(func) | Similar to map, but each input item can be mapped to 0 or more output items.
filter(func) | Return a new DStream by selecting only the records of the source DStream on which func returns true.
repartition(numPartitions) | Change the level of parallelism in this DStream by creating more or fewer partitions.
union(otherStream) | Return a new DStream that contains the union of the elements in the source DStream and otherStream.
count() | Return a new DStream of single-element RDDs by counting the number of elements in each RDD of the source DStream.
reduce(func) | Return a new DStream of single-element RDDs by aggregating the elements in each RDD of the source DStream using a function func (which takes two arguments and returns one). The function should be associative so that it can be computed in parallel.
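The semantics in the table above can be sketched with a small pure-Python model. It is not the Spark API. A DStream is assumed to be a list of micro-batches, and each micro-batch stands in for one RDD as a plain list. The `ds_*` function names are hypothetical. `repartition` only changes how a batch is split across partitions, so it has no visible effect in this single-process model and is omitted.

```python
from functools import reduce as fold_left

# Hypothetical model, not the Spark API: a DStream is a list of micro-batches,
# and each micro-batch (one RDD in Spark) is a plain Python list.


def ds_map(stream, func):
    return [[func(x) for x in batch] for batch in stream]


def ds_flat_map(stream, func):
    # Each input item may yield zero or more output items.
    return [[y for x in batch for y in func(x)] for batch in stream]


def ds_filter(stream, func):
    return [[x for x in batch if func(x)] for batch in stream]


def ds_union(stream, other):
    # Batches line up by interval: result batch i holds batch i of both inputs.
    return [a + b for a, b in zip(stream, other)]


def ds_count(stream):
    # One single-element batch per input batch (an empty batch counts as 0).
    return [[len(batch)] for batch in stream]


def ds_reduce(stream, func):
    # func should be associative so partial results can be combined in parallel;
    # an empty batch yields an empty result batch.
    return [[fold_left(func, batch)] if batch else [] for batch in stream]


nums = [[1, 2, 3], [4], []]
doubled = ds_map(nums, lambda x: x * 2)        # [[2, 4, 6], [8], []]
totals = ds_reduce(nums, lambda a, b: a + b)   # [[6], [4], []]
words = ds_flat_map([["to be", "or"], ["not"]], str.split)  # [["to", "be", "or"], ["not"]]
```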
14. Transformation | Meaning
countByValue() | When called on a DStream of elements of type K, return a new DStream of (K, Long) pairs where the value of each key is its frequency in each RDD of the source DStream.
reduceByKey(func, [numTasks]) | When called on a DStream of (K, V) pairs, return a new DStream of (K, V) pairs where the values for each key are aggregated using the given reduce function. Note: by default, this uses Spark's default number of parallel tasks (2 for local mode; in cluster mode the number is determined by the config property spark.default.parallelism) to do the grouping. You can pass an optional numTasks argument to set a different number of tasks.
join(otherStream, [numTasks]) | When called on two DStreams of (K, V) and (K, W) pairs, return a new DStream of (K, (V, W)) pairs with all pairs of elements for each key.
cogroup(otherStream, [numTasks]) | When called on two DStreams of (K, V) and (K, W) pairs, return a new DStream of (K, Seq[V], Seq[W]) tuples.
transform(func) | Return a new DStream by applying an RDD-to-RDD function to every RDD of the source DStream. This can be used to do arbitrary RDD operations on the DStream.
updateStateByKey(func) | Return a new "state" DStream where the state for each key is updated by applying the given function on the previous state of the key and the new values for the key. This can be used to maintain arbitrary state data for each key.
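The key-based transformations above can be sketched in the same pure-Python model. It is not the Spark API. A DStream is a list of batches, and a pair DStream holds (key, value) tuples. The `ds_*` names are hypothetical. Some results are sorted by key only to make the output deterministic, since Spark makes no ordering guarantee. For the update function, the sketch assumes the shape `update(new_values, previous_state_or_None)`, where returning `None` drops the key. This mirrors the Seq/Option shape of Spark's Scala API, but it is a simplification.

```python
from collections import Counter, defaultdict

# Hypothetical model, not the Spark API: a DStream is a list of batches and a
# pair DStream holds (key, value) tuples. Some outputs are sorted for determinism.


def ds_count_by_value(stream):
    return [sorted(Counter(batch).items()) for batch in stream]


def ds_reduce_by_key(stream, func):
    out = []
    for batch in stream:
        acc = {}
        for k, v in batch:
            acc[k] = func(acc[k], v) if k in acc else v
        out.append(sorted(acc.items()))
    return out


def ds_join(stream, other):
    out = []
    for left, right in zip(stream, other):
        right_by_key = defaultdict(list)
        for k, w in right:
            right_by_key[k].append(w)
        # Inner join: every (v, w) combination for each key present on both sides.
        out.append([(k, (v, w)) for k, v in left for w in right_by_key.get(k, [])])
    return out


def ds_cogroup(stream, other):
    out = []
    for left, right in zip(stream, other):
        groups = defaultdict(lambda: ([], []))
        for k, v in left:
            groups[k][0].append(v)
        for k, w in right:
            groups[k][1].append(w)
        out.append(sorted(groups.items()))  # (key, (values, other_values))
    return out


def ds_transform(stream, func):
    # func receives a whole batch (an RDD in Spark) and returns a new batch.
    return [func(batch) for batch in stream]


def ds_update_state_by_key(stream, update):
    # update(new_values, previous_state_or_None) -> new state, or None to drop the key.
    # It is called for every known key on every batch, even with no new values.
    state, out = {}, []
    for batch in stream:
        new_values = defaultdict(list)
        for k, v in batch:
            new_values[k].append(v)
        next_state = {}
        for k in set(state) | set(new_values):
            s = update(new_values.get(k, []), state.get(k))
            if s is not None:
                next_state[k] = s
        state = next_state
        out.append(sorted(state.items()))
    return out


pairs = [[("a", 1), ("b", 1), ("a", 1)], [("a", 1)]]
per_batch = ds_reduce_by_key(pairs, lambda x, y: x + y)   # [[("a", 2), ("b", 1)], [("a", 1)]]
running = ds_update_state_by_key(pairs, lambda vals, s: (s or 0) + sum(vals))
# running == [[("a", 2), ("b", 1)], [("a", 3), ("b", 1)]]
```

The contrast between `per_batch` and `running` shows the design choice. `reduceByKey` forgets everything between batches, while `updateStateByKey` carries a running total forward.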
16. Transformation | Description
Map | Takes one element and produces one element. Example: a map function that doubles the values of the input stream.
FlatMap | Takes one element and produces zero, one, or more elements. Example: a flatMap function that splits sentences into words.
Filter | Evaluates a boolean function for each element and retains those for which the function returns true. Example: a filter that filters out zero values.
KeyBy | Logically partitions a stream into disjoint partitions, each partition containing elements of the same key. Internally, this is implemented with hash partitioning.
Reduce | A "rolling" reduce on a keyed data stream. Combines the current element with the last reduced value and emits the new value.
Fold | A "rolling" fold on a keyed data stream with an initial value. Combines the current element with the last folded value and emits the new value.
Aggregations | Rolling aggregations on a keyed data stream. The difference between min and minBy is that min returns the minimum value, whereas minBy returns the element that has the minimum value in this field (the same holds for max and maxBy).
Window | Windows can be defined on already partitioned KeyedStreams. Windows group the data in each key according to some characteristic (e.g., the data that arrived within the last 5 seconds).
WindowAll | Windows can be defined on regular DataStreams. Windows group all the stream events according to some characteristic (e.g., the data that arrived within the last 5 seconds).
Window Apply | Applies a general function to the window as a whole, for example a function that manually sums the elements of a window.
Window Reduce | Applies a functional reduce function to the window and returns the reduced value.
Window Fold | Applies a functional fold function to the window and returns the folded value.
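These rows read like the operator list of Apache Flink's DataStream API. The examples the table mentions are sketched below as plain Python over finite lists: doubling map, sentence-splitting flatMap, zero-dropping filter, rolling reduce/fold, min vs. minBy, and a window that sums its elements. This is not the Flink API. All function names are hypothetical. The window is assumed to be a tumbling window keyed by (key, window start). The `rolling_min` sketch keeps the other fields from the first record for the key; treat that as an assumption about the unspecified fields.

```python
from collections import defaultdict

# Hypothetical pure-Python model, not the Flink API: a finite list stands in for
# an unbounded DataStream, and keyed operators keep one accumulator per key.


def double_values(stream):            # Map: one element in, one element out
    return [2 * x for x in stream]


def split_sentences(stream):          # FlatMap: one sentence in, zero or more words out
    return [word for sentence in stream for word in sentence.split()]


def drop_zeros(stream):               # Filter: keep elements where the predicate holds
    return [x for x in stream if x != 0]


def rolling_reduce(stream, key, func):
    """KeyBy + Reduce: emit the updated per-key result for every input element."""
    last, out = {}, []
    for rec in stream:
        k = key(rec)
        last[k] = func(last[k], rec) if k in last else rec
        out.append(last[k])
    return out


def rolling_fold(stream, key, initial, func):
    """KeyBy + Fold: like rolling_reduce, but every key starts from `initial`."""
    acc, out = {}, []
    for rec in stream:
        k = key(rec)
        acc[k] = func(acc.get(k, initial), rec)
        out.append(acc[k])
    return out


def rolling_min(stream, key, field):
    """min: only `field` is updated; other fields keep the first record's values (assumption)."""
    acc, out = {}, []
    for rec in stream:
        k = key(rec)
        if k not in acc:
            acc[k] = list(rec)
        elif rec[field] < acc[k][field]:
            acc[k][field] = rec[field]
        out.append(tuple(acc[k]))
    return out


def rolling_min_by(stream, key, field):
    """minBy: emit the whole element that holds the minimum value of `field`."""
    best, out = {}, []
    for rec in stream:
        k = key(rec)
        if k not in best or rec[field] < best[k][field]:
            best[k] = rec
        out.append(best[k])
    return out


def window_sum(events, size):
    """Window + Window Apply: sum values per key in tumbling windows of `size` seconds.

    events are (timestamp_seconds, key, value); returns {(key, window_start): sum}.
    """
    sums = defaultdict(int)
    for ts, k, v in events:
        sums[(k, ts - ts % size)] += v
    return dict(sums)


readings = [("k", 3, "x"), ("k", 1, "y"), ("k", 2, "z")]
print(rolling_min(readings, key=lambda r: r[0], field=1))     # [('k', 3, 'x'), ('k', 1, 'x'), ('k', 1, 'x')]
print(rolling_min_by(readings, key=lambda r: r[0], field=1))  # [('k', 3, 'x'), ('k', 1, 'y'), ('k', 1, 'y')]
```

The two printed lines show the min/minBy difference from the Aggregations row. min reports the smallest value (1) alongside another record's tag ("x"). minBy returns the actual record ("y") that holds that value.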
17. Transformation | Description
Aggregations on windows | Aggregates the contents of a window. The difference between min and minBy is that min returns the minimum value, whereas minBy returns the element that has the minimum value in this field (the same holds for max and maxBy).
Union | Union of two or more data streams, creating a new stream that contains all the elements from all the streams. Note: if you union a data stream with itself, you get each element twice in the resulting stream.
Window Join | Joins two data streams on a given key and a common window.
Window CoGroup | Cogroups two data streams on a given key and a common window.
Connect | "Connects" two data streams, retaining their types. Connect allows for shared state between the two streams.
CoMap, CoFlatMap | Similar to map and flatMap, but on a connected data stream.
Split | Splits the stream into two or more streams according to some criterion.
Select | Selects one or more streams from a split stream.
Iterate | Creates a "feedback" loop in the flow by redirecting the output of one operator to some previous operator. This is especially useful for defining algorithms that continuously update a model. For example, an iteration can start with a stream and apply the iteration body continuously: elements that are greater than 0 are sent back to the feedback channel, and the rest of the elements are forwarded downstream.
Extract Timestamps | Extracts timestamps from records in order to work with windows that use event-time semantics.
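The stream-combining and feedback operators in this table can also be modelled in plain Python over finite lists. This is a hypothetical sketch, not the Flink API. Real Flink interleaves inputs nondeterministically, whereas the sketch simply concatenates them. The Connect example assumes a made-up control stream of blocked words to show state shared by both inputs. The iteration body is assumed to subtract 1, with values still greater than 0 fed back as the Iterate row describes.

```python
from collections import deque

# Hypothetical pure-Python model, not the Flink API; finite lists stand in for streams.


def union(*streams):
    # Real Flink interleaves inputs; concatenation keeps the example deterministic.
    return [x for s in streams for x in s]


def connect_with_shared_state(events):
    """Connect + CoFlatMap: one operator sees both inputs and shares state.

    events is an interleaved sequence of ("control", word) and ("data", word);
    control words become blocked, and data words pass unless blocked.
    """
    blocked, out = set(), []
    for side, value in events:
        if side == "control":
            blocked.add(value)
        elif value not in blocked:
            out.append(value)
    return out


def split(stream, selector):
    # Split: tag each element with the names of the outputs it belongs to.
    return [(x, set(selector(x))) for x in stream]


def select(split_stream, *names):
    # Select: keep elements tagged with any of the requested output names.
    wanted = set(names)
    return [x for x, tags in split_stream if tags & wanted]


def iterate(stream, body, feedback):
    """Iterate: apply body; results for which feedback(x) is true loop back as
    input, and all other results are forwarded downstream."""
    pending, downstream = deque(stream), []
    while pending:
        x = body(pending.popleft())
        if feedback(x):
            pending.append(x)
        else:
            downstream.append(x)
    return downstream


# Assumed iteration body: subtract 1; values still greater than 0 are fed back.
print(iterate([3, 1, -2], lambda v: v - 1, lambda v: v > 0))  # [0, -3, 0]
```

`union([1, 2], [1, 2])` returns `[1, 2, 1, 2]`, which matches the note that unioning a stream with itself yields every element twice.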