10. – 13.12.2018
Frankfurt am Main
#ittage
Dataservices
{Big,Fast,Smart} Data Processing with Microservices
Mario-Leander Reimer
Chief Technologist, QAware GmbH
Contact Details
Mail: mario-leander.reimer@qaware.de
Twitter: @LeanderReimer
Github: https://github.com/lreimer/
12.12.2018
Developer & Architect
20+ years of experience
#CloudNativeNerd
Open Source Enthusiast
Speaker & Author
Fork me on Github.
https://github.com/lreimer/data-services-data2day
https://github.com/lreimer/data-services-javaee7
[Figure: The System, with Device, Traffic Data, Historical Data, Map Data, Vehicle Data]
The system. The data center.
http://martinfowler.com/bliki/MonolithFirst.html
We are here.
We need to go here.
Decomposing the monolith was feasible.
def Dataservices :=
{Big, Fast, Smart} Data &
Cloud-native Technology &
Microservice Architecture;
FAST DATA
Low latency and high throughput:
Stream processing
Messaging
Event-driven
BIG DATA
All things distributed:
Distributed processing
Distributed databases
SMART DATA
Data to information:
Machine (deep) learning
Advanced statistics
Natural Language Processing
For large data volumes, Gustafson's law is a better fit than Amdahl's law.
Assumption: the parallel portion P depends linearly on the problem size (essentially the data volume), whereas the sequential portion does not. Example: more images mean more parallel conversion work.
Law: if the parallel portion P grows linearly (or faster) with the problem size, the speedup also grows linearly with the number of processors.
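The slide's formulas did not survive extraction. As a hedged sketch, the commonly cited textbook forms are, with P the parallel fraction and N the number of processors: Amdahl S(N) = 1 / ((1 - P) + P / N), which can never exceed 1 / (1 - P); Gustafson S(N) = (1 - P) + P * N, where P is measured on the scaled-up workload. The class and method names below are illustrative, and P = 0.9 is an assumed value.

```java
import java.util.Locale;

/** Illustrative comparison of Amdahl's and Gustafson's speedup formulas. */
class Speedup {

    /** Amdahl: fixed problem size; p = parallelizable fraction of the serial run time. */
    static double amdahl(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    /** Gustafson: problem size grows with n; p = parallel fraction of the run time on n workers. */
    static double gustafson(double p, int n) {
        return (1.0 - p) + p * n;
    }

    public static void main(String[] args) {
        double p = 0.9; // assumed: 90 % of the work is parallelizable
        for (int n : new int[] {1, 10, 100, 1000}) {
            System.out.printf(Locale.ROOT, "N=%4d  Amdahl=%6.2f  Gustafson=%7.1f%n",
                    n, amdahl(p, n), gustafson(p, n));
        }
    }
}
```

With P = 0.9, Amdahl's speedup levels off below 10 however many processors are added, while Gustafson's keeps growing linearly with N. This is why the scaled-workload view fits large, easily partitioned data sets better.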
QAware 11
The basic idea:
A cloud-native platform for micro- and dataservices.
[Figure: Cluster Operating System, Microservice Platform with Microservices, Dataservice Platform with Dataservices, Messaging, IMDG (in-memory data grid), Database]
loosely coupled
stateless
The basic idea: input, processing, output.
Data processing with a graph of microservices.
[Figure: Sources (I1) feed Processors (P1 … Pn), which feed Sinks (O1); each node is a microservice (aka dataservice), and the nodes are connected via message queues.]
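The source, processor and sink graph can be illustrated with a minimal in-process sketch. It assumes that one thread per stage stands in for a separately deployed dataservice and that a `LinkedBlockingQueue` stands in for the message broker. The class name `Pipeline` and the end-of-stream marker `EOS` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.UnaryOperator;

/** In-process sketch of pipes and filters: source -> processor -> sink. */
class Pipeline {

    /** Hypothetical end-of-stream marker ("poison pill") passed down the pipe. */
    static final String EOS = "<<EOS>>";

    /** Runs each stage on its own thread and returns what the sink received. */
    static List<String> run(List<String> input, UnaryOperator<String> processor) {
        BlockingQueue<String> toProcessor = new LinkedBlockingQueue<>(); // pipe 1
        BlockingQueue<String> toSink = new LinkedBlockingQueue<>();      // pipe 2
        List<String> received = new ArrayList<>();

        Thread source = new Thread(() -> {
            try {
                for (String item : input) toProcessor.put(item);
                toProcessor.put(EOS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread filter = new Thread(() -> {
            try {
                for (String item = toProcessor.take(); !EOS.equals(item); item = toProcessor.take()) {
                    toSink.put(processor.apply(item)); // the filter only sees its input pipe
                }
                toSink.put(EOS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread sink = new Thread(() -> {
            try {
                for (String item = toSink.take(); !EOS.equals(item); item = toSink.take()) {
                    received.add(item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        source.start();
        filter.start();
        sink.start();
        try {
            source.join();
            filter.join();
            sink.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return received; // safe to read: join() happens-before this line
    }

    public static void main(String[] args) {
        List<String> out = run(List.of("frankfurt", "munich"), s -> s.toUpperCase(Locale.ROOT));
        System.out.println(out); // [FRANKFURT, MUNICH]
    }
}
```

Each stage knows only its input and output queue. That loose coupling lets the real dataservices be scaled and replaced independently.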
Issues and considerations for "Pipes and Filters".
Complexity: the pattern offers a great deal of flexibility, but complexity increases because the filters are distributed across several servers.
Reliability: infrastructure is needed that ensures that data passed between the filters of a pipeline is not lost and that the filters themselves run resiliently.
Idempotency: especially after a failure, processing the same message again must produce the same result without any additional side effects.
Duplicate messages: the pipeline must detect and remove duplicate messages. Ideally, the messaging infrastructure detects and removes duplicates automatically.
Context and state: the filters of a pipeline run separately, so they should make no assumptions about how or in what order they are invoked. Each filter must be supplied with sufficient context and state information.
see https://docs.microsoft.com/de-de/azure/architecture/patterns/pipes-and-filters
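Idempotency and duplicate handling can be combined in a consumer that remembers which message IDs it has already processed. This is a hedged sketch. It assumes every message carries a unique ID, and `IdempotentFilter`, `Message` and the per-station counter are hypothetical names. In production, the set of seen IDs would need to be durable, shared across instances (e.g. in the database or the data grid) and bounded. It should also be updated atomically with the side effect, for example in one transaction.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch of an idempotent filter that ignores redelivered (duplicate) messages. */
class IdempotentFilter {

    /** Hypothetical message: a unique message ID plus the weather station it refers to. */
    static final class Message {
        final String id;
        final String station;

        Message(String id, String station) {
            this.id = id;
            this.station = station;
        }
    }

    // IDs of messages already processed (in memory only in this sketch).
    private final Set<String> processedIds = new HashSet<>();
    // A deliberately non-idempotent side effect: number of readings per station.
    private final Map<String, Integer> readingsPerStation = new HashMap<>();

    /** Returns true if the message was processed, false if it was a duplicate. */
    synchronized boolean handle(Message msg) {
        if (!processedIds.add(msg.id)) {
            return false; // already seen: a redelivery has no further effect
        }
        readingsPerStation.merge(msg.station, 1, Integer::sum);
        return true;
    }

    synchronized int countFor(String station) {
        return readingsPerStation.getOrDefault(station, 0);
    }

    public static void main(String[] args) {
        IdempotentFilter filter = new IdempotentFilter();
        Message m = new Message("msg-1", "FRA");
        filter.handle(m);
        filter.handle(m); // simulated redelivery after a consumer failure
        System.out.println(filter.countFor("FRA")); // 1
    }
}
```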
The showcase.
[Figure: showcase pipeline. Sources: REST Source, MQTT Source, CSV Source, JDBC Source. Processors: Location Processor, Weather Processor. Sinks: Weather File Sink, Weather DB Sink. Infrastructure: topics, a queue and an In-Memory Datagrid. Technologies: JAX-RS, JMS, JSON-P, JBatch, JCache, JPA, CSV.]
Building Block 1:
A cloud-native platform.
Building Block 1
Create
Conceptual View on Kubernetes Building Blocks.
Important Kubernetes concepts.
Services are an abstraction for a logical collection of pods.
Pods are the smallest unit of compute in Kubernetes.
Deployments are an abstraction used to declare and update pods, RCs, …
Replica Sets ensure that the desired number of pod replicas are running.
Labels are key/value pairs used to identify Kubernetes resources.
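Services find their pods through labels rather than names. The equality-based selection can be sketched in a few lines. This assumes `matchLabels` semantics, where every key/value pair of the selector must be present on the pod. `LabelSelector` and `Pod` are hypothetical stand-ins, not the Kubernetes client API.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch of equality-based label selection ("matchLabels" semantics). */
class LabelSelector {

    /** Hypothetical, minimal stand-in for a pod: a name plus its labels. */
    static final class Pod {
        final String name;
        final Map<String, String> labels;

        Pod(String name, Map<String, String> labels) {
            this.name = name;
            this.labels = labels;
        }
    }

    /** A pod matches if it carries every key/value pair of the selector (logical AND). */
    static boolean matches(Map<String, String> selector, Map<String, String> labels) {
        return selector.entrySet().stream()
                .allMatch(e -> e.getValue().equals(labels.get(e.getKey())));
    }

    /** Returns the names of all pods the selector picks, e.g. the endpoints of a Service. */
    static List<String> select(Map<String, String> selector, List<Pod> pods) {
        return pods.stream()
                .filter(p -> matches(selector, p.labels))
                .map(p -> p.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Pod> pods = List.of(
                new Pod("nginx-1", Map.of("app", "nginx", "tier", "frontend")),
                new Pod("nginx-2", Map.of("app", "nginx")),
                new Pod("redis-1", Map.of("app", "redis")));
        System.out.println(select(Map.of("app", "nginx"), pods)); // [nginx-1, nginx-2]
    }
}
```

Extra labels on a pod do not prevent a match, which is why `app: nginx` selects both nginx pods.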
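# apps/v1 replaces extensions/v1beta1 for Deployments (removed in Kubernetes 1.16)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  # apps/v1 requires an explicit selector that matches the pod template labels
  selector:
    matchLabels:
      app: nginx
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
        livenessProbe:
          ...
        readinessProbe:
          ...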
Kubernetes YAML Madness.
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
  namespace: default
spec:
  type: NodePort
  ports:
  - port: 80
  selector:
    app: nginx
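---
# networking.k8s.io/v1 replaces extensions/v1beta1 for Ingress (removed in Kubernetes 1.22)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx
  labels:
    app: nginx
  namespace: default
spec:
  rules:
  - host: nginx.cloudkoffer
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx
            port:
              number: 80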
Building Block 2:
A database.
Strongly consistent, distributed ACID transactions
Data Replication & Automatic Rebalancing
Fault tolerance & Recovery
Orchestration-aware
Flexible schemas using JSON types
Standard SQL using PostgreSQL syntax
Building Block 2
Deploy
Deploy
Building Block 3:
A messaging infrastructure.
Messaging patterns for flexible and reliable
message exchange between dataservices.
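Message Passing: producer P1 sends to queue Q1, which is consumed by C1.
Work Queue: producer P1 sends to a single queue Q1 shared by consumers C1 … Cn.
Publish/Subscribe: producer P1 publishes to queues Q1 … Qn, one per consumer C1 … Cn.
Remote Procedure Call: P1 sends a request to C1 via Q1 and receives the reply via Q2.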
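The practical difference between a work queue and publish/subscribe is whether consumers share one queue or each get their own. The following broker-independent sketch uses plain `java.util` queues. `MessagingPatterns` is a hypothetical name. Round-robin dispatch is assumed here only to make the output deterministic, whereas a real broker such as Artemis hands each message to whichever consumer is ready.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** In-memory sketch of two messaging patterns: work queue vs. publish/subscribe. */
class MessagingPatterns {

    /** Work queue: all consumers share ONE queue, so each message is consumed exactly once. */
    static List<List<String>> workQueue(List<String> messages, int consumers) {
        if (consumers < 1) throw new IllegalArgumentException("need at least one consumer");
        Queue<String> queue = new ArrayDeque<>(messages);
        List<List<String>> received = new ArrayList<>();
        for (int c = 0; c < consumers; c++) received.add(new ArrayList<>());
        // Consumers take turns polling the shared queue (round-robin, for determinism).
        for (int c = 0; !queue.isEmpty(); c = (c + 1) % consumers) {
            received.get(c).add(queue.poll());
        }
        return received;
    }

    /** Publish/subscribe: each subscriber has its OWN queue and receives every message. */
    static List<List<String>> publishSubscribe(List<String> messages, int subscribers) {
        List<Queue<String>> queues = new ArrayList<>();
        for (int s = 0; s < subscribers; s++) queues.add(new ArrayDeque<>());
        for (String m : messages) {
            for (Queue<String> q : queues) q.add(m); // the topic fans out a copy to every queue
        }
        List<List<String>> received = new ArrayList<>();
        for (Queue<String> q : queues) received.add(new ArrayList<>(q));
        return received;
    }

    public static void main(String[] args) {
        List<String> msgs = List.of("m1", "m2", "m3");
        System.out.println("work queue: " + workQueue(msgs, 2));        // [[m1, m3], [m2]]
        System.out.println("pub/sub:    " + publishSubscribe(msgs, 2)); // [[m1, m2, m3], [m1, m2, m3]]
    }
}
```

A work queue scales out processing of one stream across competing consumers, while publish/subscribe lets several independent dataservices react to the same events.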
AMQP protocol support
OpenWire support for ActiveMQ 5 clients
MQTT support
STOMP protocol support
HornetQ Core protocol support for HornetQ clients
JMS 2.0 and 1.1 support
High availability with shared store and non-shared store (replication)
Flexible Clustering
High performance journal for message persistence
Large Message Support
Source: https://softwaremill.com/mqperf/
Amazon SQS is an easy choice, with good performance and no setup required, if you are already on AWS
If you are already using Mongo, it is easy to build a replicated message queue on top of it, without the need to create and maintain a separate messaging cluster
ActiveMQ is a popular and widely used messaging broker with moderate performance and wide protocol support
Artemis offers the best performance (on par with Kafka) with the familiarity of JMS and a wide range of supported protocols
AMQP/STOMP/MQTT support
Kafka offers the best performance (on par with Artemis) and scalability, at the cost of feature set
EventStore can be your central storage for events with complex event processing capabilities and great performance
Building Block 3
Topic
Queue
Topic
Deploy
Deploy
Deploy
Deploy
Building Block 4:
An in-memory data grid.
Distributed Data Structures
Data Partitioning and Replication
Distributed Compute and Query
Cloud and Virtualization Support
JCache Provider, JEE Integrated Clustering
Multitude of clients (.NET, Java, Go, …)
Big Data (Jet, Spark, Mesos)
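Data partitioning and replication in a data grid are commonly based on a fixed number of partitions. Each key is hashed to a partition, and whole partitions, not individual keys, are assigned to cluster members, with a backup copy on another member. The sketch below assumes 271 partitions, 3 nodes and a simple ring assignment. All names and numbers are illustrative and not a specific product's API.

```java
/** Sketch of hash-based partitioning with one backup copy per partition. */
class Partitioning {

    /** Maps a key to one of partitionCount partitions (stable as long as the count is fixed). */
    static int partitionOf(Object key, int partitionCount) {
        // floorMod avoids negative results for negative hash codes
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    /** Hypothetical owner assignment: partitions are spread round-robin over the nodes. */
    static int ownerOf(int partition, int nodeCount) {
        return partition % nodeCount;
    }

    /** Backup lives on the next node in the ring, so it differs from the owner if nodeCount > 1. */
    static int backupOf(int partition, int nodeCount) {
        return (partition + 1) % nodeCount;
    }

    public static void main(String[] args) {
        int partitions = 271; // assumed partition count
        int nodes = 3;        // assumed cluster size
        for (String key : new String[] {"FRA", "MUC", "BER"}) {
            int p = partitionOf(key, partitions);
            System.out.printf("key=%s partition=%d owner=node%d backup=node%d%n",
                    key, p, ownerOf(p, nodes), backupOf(p, nodes));
        }
    }
}
```

Because the key-to-partition mapping never changes, adding or removing a node only moves whole partitions between members. This keeps automatic rebalancing cheap.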
Building Block 4
In-Memory
Datagrid
Topic
Queue
Topic
Deploy
Building Block 5:
A dataservice platform.
Some Open Source Dataservice Platforms.
Java EE / Jakarta EE
Standardized API with several open source implementations
Microservices: Java EE micro container
Messaging: JMS, MQTT, Kafka, SQS
Platforms: Docker, Kubernetes, OpenShift, DC/OS
Kafka Streams
Stream processing tightly integrated with Kafka
Microservices: main()
Messaging: Kafka, Kafka Streams
Platforms: any platform Kafka runs on
Lagom Framework
Open source by Lightbend
Microservices: Lagom, Play
Messaging: Akka
Platforms: ConductR, ???
Spring Cloud Data Flow
Open source project based on the Spring stack
Microservices: Spring Boot, Spring Cloud Stream & Task
Messaging: Kafka, RabbitMQ
Platforms: PCF, Kubernetes, YARN, Mesos
Overview of Java EE APIs suited for Dataservices.
CDI Extensions, Web Fragments, Bean Validation 2.0, CDI 2.0, Managed Beans 1.0, JCA 1.7, JPA 2.2, JMS 2.0, JSP 2.3, EL 3.0, EJB 3.2, Batch 1.0, JSF 2.3, Interceptors 1.2, Mail 1.6, Common Annotations 1.3, JTA 1.2, JAX-WS 1.4, JAX-RS 2.1, Concurrency 1.0, JSON-P 1.1, JSON-B 1.0, WebSocket 1.1, JASPIC 1.1, JACC 1.5, Security 1.0, Servlet 4.0, JCache 1.0
Cloud-ready runtimes suited for Dataservices.
… and many more.
Building Block 5
Mario-Leander Reimer
mario-leander.reimer@qaware.de
@LeanderReimer
xing.com/companies/qawaregmbh
linkedin.com/company/qaware-gmbh
slideshare.net/qaware
twitter.com/qaware
youtube.com/qawaregmbh
github.com/qaware

Dataservices - Data Processing with Microservices