Thoughts on the Direction of the Latest Database Technologies



 “Stream-driven” paradigm

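Figure: IoT reference architecture on Azure (labels recovered from the diagram)
Devices: legacy devices (custom protocols); IoT devices; IP-capable devices (Windows/Linux); low-power devices (RTOS)
Gateways: field gateways; cloud gateways
Event broker / device management
Stream processing (Stream Analytics)
Parallel data processing (Azure Data Lake Analytics)
Machine learning (Machine Learning API); machine learning (Machine Learning / Revolution R Enterprise)
Storage: DWH; document database; time-series data
Query and search (Azure Search)
Visualization and analysis (Power BI)
Dashboards (Azure App Services)
Device notifications (Notification Hubs)
Azure Active Directory authentication platform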




Figure: An event stream split into Partition 1, Partition 2, ..., Partition "n", read by Consumer Groups A, B, and C. Each consumer group runs Worker 1 through Worker "n", and each worker has its own callback (Callback "n"; for example, "Callback for prtn. 6" in Consumer Group C).
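
The partition and consumer-group layout in the figure can be sketched in plain Python. This is a minimal in-memory model, not the API of any real event broker: the class names `PartitionedLog` and `ConsumerGroup`, the `send` and `poll` methods, and the hash-by-key partitioning rule are all assumptions made for illustration.

```python
from collections import defaultdict
import zlib


class PartitionedLog:
    """Append-only log split into a fixed number of partitions (hypothetical model)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        # Assumption: events with the same key always land in the same partition.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append((key, value))
        return index


class ConsumerGroup:
    """Each group keeps its own read offset per partition."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.offsets = defaultdict(int)

    def poll(self, partition, callback):
        # Deliver every event this group has not yet seen on the partition.
        events = self.log.partitions[partition][self.offsets[partition]:]
        for key, value in events:
            callback(partition, key, value)
        self.offsets[partition] += len(events)
        return len(events)


log = PartitionedLog(num_partitions=4)
for i in range(10):
    log.send(f"device-{i % 3}", i)

seen_a, seen_b = [], []
group_a = ConsumerGroup("A", log)
group_b = ConsumerGroup("B", log)
for p in range(4):
    group_a.poll(p, lambda part, k, v: seen_a.append(v))
    group_b.poll(p, lambda part, k, v: seen_b.append(v))
```

Because each group tracks its own offsets, groups A and B both receive all ten events, and a second `poll` on the same partition returns nothing new.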










Figure: Intermediary broker with backpressure feedback.
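
As a rough illustration of backpressure through an intermediary, the sketch below uses a bounded buffer that refuses new events when full, so the producer has to hold the event and retry. The names `BoundedBroker` and `run`, the tick-based scheduling, and the capacity values are assumptions for illustration; real brokers and stream engines implement this in different ways.

```python
from collections import deque


class BoundedBroker:
    """Intermediary buffer that refuses new events once it is full (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()

    def offer(self, event):
        # Returning False is the backpressure feedback to the producer.
        if len(self.buffer) >= self.capacity:
            return False
        self.buffer.append(event)
        return True

    def take(self):
        return self.buffer.popleft() if self.buffer else None


def run(events, capacity, consume_every):
    """Producer offers one event per tick; consumer takes one event every `consume_every` ticks."""
    broker = BoundedBroker(capacity)
    pending = deque(events)
    delivered, stalls, tick = [], 0, 0
    while pending or broker.buffer:
        tick += 1
        if pending:
            if broker.offer(pending[0]):
                pending.popleft()
            else:
                stalls += 1  # producer keeps the event and retries on the next tick
        if tick % consume_every == 0:
            item = broker.take()
            if item is not None:
                delivered.append(item)
    return delivered, stalls


delivered, stalls = run(list(range(10)), capacity=2, consume_every=3)
```

With a slow consumer the producer stalls repeatedly, yet no event is dropped and order is preserved; the cost of backpressure is producer latency rather than data loss.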







Figure: Three brokers.

Transformation | Meaning
map(func) | Return a new DStream by passing each element of the source DStream through a function func.
flatMap(func) | Similar to map, but each input item can be mapped to 0 or more output items.
filter(func) | Return a new DStream by selecting only the records of the source DStream on which func returns true.
repartition(numPartitions) | Changes the level of parallelism in this DStream by creating more or fewer partitions.
union(otherStream) | Return a new DStream that contains the union of the elements in the source DStream and otherStream.
count() | Return a new DStream of single-element RDDs by counting the number of elements in each RDD of the source DStream.
reduce(func) | Return a new DStream of single-element RDDs by aggregating the elements in each RDD of the source DStream using a function func (which takes two arguments and returns one). The function should be associative so that it can be computed in parallel.
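
The per-batch semantics in the table above can be sketched without Spark. In this plain-Python illustration a DStream is modelled as a list of micro-batches, each list standing in for one RDD; the `ds_*` helper names are hypothetical and are not part of the Spark API.

```python
from functools import reduce as fold_left
from operator import add

# Assumption: a DStream is a list of micro-batches; each inner list stands in for one RDD.


def ds_map(stream, func):
    return [[func(x) for x in batch] for batch in stream]


def ds_flat_map(stream, func):
    # Each input item may produce zero or more output items.
    return [[y for x in batch for y in func(x)] for batch in stream]


def ds_filter(stream, func):
    return [[x for x in batch if func(x)] for batch in stream]


def ds_union(stream, other):
    # Batches from the same interval are concatenated.
    return [a + b for a, b in zip(stream, other)]


def ds_count(stream):
    # One single-element batch per input batch.
    return [[len(batch)] for batch in stream]


def ds_reduce(stream, func):
    # In this sketch an empty batch produces no element.
    return [[fold_left(func, batch)] if batch else [] for batch in stream]


lines = [["a b", "c"], ["d e f"]]
words = ds_flat_map(lines, str.split)
word_counts = ds_count(words)
total_chars = ds_reduce(ds_map(words, len), add)
after_b = ds_filter(words, lambda w: w > "b")
```

Every helper operates batch by batch, which is why `count` and `reduce` return one value per micro-batch rather than a single running total.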

Transformation | Meaning
countByValue() | When called on a DStream of elements of type K, return a new DStream of (K, Long) pairs where the value of each key is its frequency in each RDD of the source DStream.
reduceByKey(func, [numTasks]) | When called on a DStream of (K, V) pairs, return a new DStream of (K, V) pairs where the values for each key are aggregated using the given reduce function. Note: By default, this uses Spark's default number of parallel tasks (2 for local mode, and in cluster mode the number is determined by the config property spark.default.parallelism) to do the grouping. You can pass an optional numTasks argument to set a different number of tasks.
join(otherStream, [numTasks]) | When called on two DStreams of (K, V) and (K, W) pairs, return a new DStream of (K, (V, W)) pairs with all pairs of elements for each key.
cogroup(otherStream, [numTasks]) | When called on DStream of (K, V) and (K, W) pairs, return a new DStream of (K, Seq[V], Seq[W]) tuples.
transform(func) | Return a new DStream by applying a RDD-to-RDD function to every RDD of the source DStream. This can be used to do arbitrary RDD operations on the DStream.
updateStateByKey(func) | Return a new "state" DStream where the state for each key is updated by applying the given function on the previous state of the key and the new values for the key. This can be used to maintain arbitrary state data for each key.
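
The difference between a per-batch aggregation (reduceByKey) and state carried across batches (updateStateByKey) may be easier to see in a small sketch. This is plain Python over a list of micro-batches, not the Spark API; the function names `reduce_by_key`, `update_state_by_key`, and `running_count` are hypothetical, and removing a key from the state is not modelled.

```python
from collections import defaultdict


def reduce_by_key(stream, func):
    """Aggregate values per key inside each batch; nothing is remembered between batches."""
    out = []
    for batch in stream:
        acc = {}
        for key, value in batch:
            acc[key] = func(acc[key], value) if key in acc else value
        out.append(sorted(acc.items()))
    return out


def update_state_by_key(stream, update):
    """update(new_values, previous_state) -> new_state; state persists across batches."""
    state = {}
    out = []
    for batch in stream:
        grouped = defaultdict(list)
        for key, value in batch:
            grouped[key].append(value)
        # Keys already in the state are updated even when the batch has no new values for them.
        for key in set(state) | set(grouped):
            state[key] = update(grouped.get(key, []), state.get(key))
        out.append(sorted(state.items()))
    return out


def running_count(new_values, previous):
    # previous is None the first time a key is seen.
    return (previous or 0) + sum(new_values)


pairs = [[("a", 1), ("b", 1), ("a", 1)], [("b", 1)], [("a", 1), ("c", 1)]]
per_batch = reduce_by_key(pairs, lambda x, y: x + y)
totals = update_state_by_key(pairs, running_count)
```

Here `per_batch` restarts from zero in every batch, while `totals` keeps growing because the previous state for each key is fed back into the update function.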

Transformation | Description
Map | Takes one element and produces one element. Example: a map function that doubles the values of the input stream.
FlatMap | Takes one element and produces zero, one, or more elements. Example: a flatMap function that splits sentences into words.
Filter | Evaluates a boolean function for each element and retains those for which the function returns true. Example: a filter that removes zero values.
KeyBy | Logically partitions a stream into disjoint partitions, each partition containing elements of the same key. Internally, this is implemented with hash partitioning.
Reduce | A "rolling" reduce on a keyed data stream. Combines the current element with the last reduced value and emits the new value.
Fold | A "rolling" fold on a keyed data stream with an initial value. Combines the current element with the last folded value and emits the new value.
Aggregations | Rolling aggregations on a keyed data stream. The difference between min and minBy is that min returns the minimum value, whereas minBy returns the element that has the minimum value in this field (same for max and maxBy).
Window | Windows can be defined on already partitioned KeyedStreams. Windows group the data in each key according to some characteristic (e.g., the data that arrived within the last 5 seconds).
WindowAll | Windows can be defined on regular DataStreams. Windows group all the stream events according to some characteristic (e.g., the data that arrived within the last 5 seconds).
Window Apply | Applies a general function to the window as a whole. Example: a function that manually sums the elements of a window.
Window Reduce | Applies a functional reduce function to the window and returns the reduced value.
Window Fold | Applies a functional fold function to the window and returns the folded value.
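
The wording of this table matches the operator list of Apache Flink's DataStream API. The following plain-Python sketch illustrates two of the ideas, a rolling reduce on a keyed stream and a keyed window sum; it does not use Flink itself. The helper names, the `(key, timestamp, value)` event layout, and the choice of fixed 5-second tumbling windows are assumptions for illustration.

```python
from collections import defaultdict


def rolling_reduce(events, key_fn, func):
    """Emit, for every incoming element, the updated reduced value for that element's key."""
    last = {}
    out = []
    for event in events:
        key = key_fn(event)
        last[key] = func(last[key], event) if key in last else event
        out.append(last[key])
    return out


def tumbling_window_sum(events, key_fn, time_fn, value_fn, size):
    """Sum values per (key, window start); windows cover [start, start + size)."""
    sums = defaultdict(int)
    for event in events:
        timestamp = time_fn(event)
        start = timestamp - timestamp % size
        sums[(key_fn(event), start)] += value_fn(event)
    return dict(sums)


# Assumed event layout: (key, timestamp in seconds, value)
events = [("a", 1, 1), ("b", 2, 5), ("a", 4, 3), ("a", 6, 2), ("b", 7, 1)]

rolling = rolling_reduce(
    events,
    key_fn=lambda e: e[0],
    func=lambda acc, e: (acc[0], e[1], acc[2] + e[2]),  # keep a running sum per key
)
windows = tumbling_window_sum(
    events,
    key_fn=lambda e: e[0],
    time_fn=lambda e: e[1],
    value_fn=lambda e: e[2],
    size=5,
)
```

The rolling reduce emits one result per input element, whereas the window sum produces one result per key and window, which is the practical difference between the Reduce and Window Reduce rows.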

Transformation | Description
Aggregations on windows | Aggregates the contents of a window. The difference between min and minBy is that min returns the minimum value, whereas minBy returns the element that has the minimum value in this field (same for max and maxBy).
Union | Union of two or more data streams creating a new stream containing all the elements from all the streams. Note: If you union a data stream with itself you will get each element twice in the resulting stream.
Window Join | Join two data streams on a given key and a common window.
Window CoGroup | Cogroups two data streams on a given key and a common window.
Connect | "Connects" two data streams retaining their types. Connecting allows for shared state between the two streams.
CoMap, CoFlatMap | Similar to map and flatMap on a connected data stream.
Split | Split the stream into two or more streams according to some criterion.
Select | Select one or more streams from a split stream.
Iterate | Creates a "feedback" loop in the flow, by redirecting the output of one operator to some previous operator. This is especially useful for defining algorithms that continuously update a model. For example, an iteration body can be applied continuously to a stream, with elements greater than 0 sent back to the feedback channel and the rest forwarded downstream.
Extract Timestamps | Extracts timestamps from records in order to work with windows that use event time semantics.
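
The Iterate row describes a feedback loop: elements greater than 0 return to the loop and the rest go downstream. A minimal plain-Python sketch of that routing follows. It is not Flink code; the function name `iterate`, the subtract-one iteration body, and the finite input list are assumptions for illustration, since a real stream is unbounded.

```python
from collections import deque


def iterate(stream, body, feedback_if):
    """Apply `body` to each element; route results back into the loop or downstream."""
    queue = deque(stream)
    output = []
    while queue:
        value = body(queue.popleft())
        if feedback_if(value):
            queue.append(value)   # sent back through the feedback channel
        else:
            output.append(value)  # forwarded downstream
    return output


# Assumed iteration body: subtract 1 on each pass through the loop.
result = iterate([3, 1, 0, 2], body=lambda x: x - 1, feedback_if=lambda x: x > 0)
```

Every input eventually leaves the loop once it is no longer greater than 0, so four inputs produce four outputs. A body that never drives values to the exit condition would loop forever, which is worth keeping in mind when defining the feedback predicate.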



Source: S-Store: Streaming Meets Transaction Processing


http://yahoohadoop.tumblr.com/post/135370591481/benchmarking-streaming-computation-engines-at



















140 million TATP tps

Petabyte-scale data












Source: No compromises: distributed transactions with consistency, availability, and performance

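The information in this document represents the view of Microsoft on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted as a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented.
This document is for informational purposes only. Microsoft makes no warranties, express or implied, in this document.
Complying with all applicable copyright laws is the responsibility of the user. Without the express written permission of Microsoft, no part of this document may be reproduced, stored in or introduced into a retrieval system in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose. This does not limit the rights protected under copyright.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering the subject matter of this document. Except as expressly provided in a written license agreement from Microsoft, the furnishing of this document does not grant any license to these patents, trademarks, copyrights, or other intellectual property.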
© 2015 Microsoft Corporation. All rights reserved.
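Microsoft, Windows, and the other product names mentioned in this document are registered trademarks or trademarks of Microsoft Corporation in the United States and other countries.
Other company and product names mentioned herein are generally trademarks of their respective owners.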
