In this talk, we tell the story of building the Connection Platform (CoPa), an endeavor undertaken at Generali Switzerland over the course of the last year in collaboration with Innovation Process Technology. The goal was to design a general-purpose, state-of-the-art integration platform that covers all integration needs of the enterprise. The central data distribution and integration layer is powered by Confluent Kafka. We will throw a spotlight on three different aspects of this platform that, each in their own right, are essential for agile data integration.
First of all, the platform is hosted on the container platform Red Hat OpenShift. Everything is set up in flexible Docker containers. Automated pipelines are used to build, provision and deploy everything on the platform, from infrastructure to data pipelines.
1. Agile Data Integration
How is it Possible?
Meetup, 27th of March 2018
Thomas Peter (Generali Switzerland)
Yves Brise (Innovation Process Technology)
2. Disclaimer
The following presentation is for general information, education and discussion purposes only. Views or opinions expressed, whether oral or in writing, do not necessarily reflect those of Generali or ipt, nor do they constitute legal or professional advice. -> But it rocks!
3. A new Connection Platform for Generali CH
• March 17: GCH starts conceiving and designing the new integration and application platform
• June 17: GCH starts building the new platform MVP in collaboration with IPT (in about 9 months)
• March 18: Platform MVP delivered to the program on 15 March 2018
Embedded in the GCH Enterprise Cloud-CRM program, business applications (e.g. integration paths for Cloud-CRM as well as other new applications) are delivered by third-party providers.
MVP, MVP, MVP: just do it and…
4. Innovation Process Technology
• IT Service Provider
• CH-based, ca. 115 employees
• Strategic integration partner for Generali
• Premier partner of Confluent
• A great place to work: www.ipt.ch
Data-Driven Business | Process Digitalization | Cyber Security | Agile Organizations
5. The Vision of Data-Driven Business
«[…] the means by which an organisation seeks to
maximise the efficiency with which it plans, collects,
organises, uses, controls, stores, disseminates, and
disposes of its data, and through which it ensures
that the value of that data is identified and exploited
to the maximum extent possible.»
Adapted from: Oracle, Information Management and Big Data – A Reference Architecture, September 2014
6. Key Elements to Become More Data-Driven
• „Data First“ Technology
• Governance Friendliness
• Project Enablement
• Agile Data Integration
• Design for Scalability
8. CoPa Physical Technology Stack
[Stack diagram, bottom to top: Physical Infrastructure, Operating System, Virtualization, OpenShift / Docker, platform services (API GW, CDC, Confluent, Data Store), Spring Boot applications, CI/CD (infrastructure, project initializer, customization)]
• Infrastructure provider takes care of the bottom layers
• OpenShift / Kubernetes / Docker as container layer
• Spring Boot as application framework (see the sketch below)
• Functionality provided as a service in the platform: e.g. API GW, Data Store, CDC, Kafka
• Top layers are DevOps-enabled
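As an illustration of the Spring Boot layer, here is a minimal sketch of a Kafka consumer built with spring-kafka; the topic name, group id and payload handling are assumptions for illustration, not CoPa internals.

// Minimal sketch of a Spring Boot application consuming from Kafka via spring-kafka.
// Topic name "im.party" and the String payload are illustrative assumptions.
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@SpringBootApplication
public class CopaStyleConsumerApplication {

    public static void main(String[] args) {
        SpringApplication.run(CopaStyleConsumerApplication.class, args);
    }

    @Component
    static class PartyListener {

        // spring-kafka wires the consumer from application properties
        // (spring.kafka.bootstrap-servers, security settings, etc.).
        @KafkaListener(topics = "im.party", groupId = "party-view-builder")
        public void onPartyEvent(String value) {
            // Business logic would deserialize and process the document here.
            System.out.println("Received party document: " + value);
        }
    }
}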
9. CoPa Logical Layers
[Layer diagram: Client → Access (Proxy GW) → Service (Resources) → Process & Persistence (Kafka Streams) → Ingestion & Delivery (CDC, Connect); sources and targets (SA, AT) attach to the ingestion & delivery layer]
• Sources and targets are served through the ingestion & delivery layer
• ‘External’ clients are served through service layer implementations (Resources; see the sketch below)
• Security is guaranteed on the access layer
• Identities are translated on the service layer
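To make the service-layer "Resources" concrete, a hedged sketch of a REST resource in Spring Boot follows; the path, payload and lookup are illustrative assumptions, and access control is assumed to happen on the access layer as described above.

// Hedged sketch of a service-layer "Resource": a Spring Boot REST controller through which
// external clients read integration data. Path and payload are illustrative assumptions.
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class PartyResource {

    // A real implementation would query a materialized view (e.g. a Kafka Streams state
    // store or the platform's data store service) rather than return a stub.
    @GetMapping("/parties/{id}")
    public String getParty(@PathVariable("id") String id) {
        return "{\"aggregateId\":\"" + id + "\",\"partyInformation\":{}}";
    }
}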
10. Getting In and Out of CoPa
• One OpenShift instance hosts all production stages (DEVL, TEST, …)
• Separation is guaranteed through the multitenant networking plugin
• Kafka is not exposed outside the OpenShift cluster (yet)
• Access from/to the outside goes solely through the Confluent REST Proxy (see the sketch below)
• Kafka clients are authenticated via client certificates
• If outside access to Kafka is needed, use a 3rd-party networking plugin (e.g. Calico, Contiv, …) that allows BGP
• Network performance is not a bottleneck (yet)
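Since outside access goes solely through the Confluent REST Proxy, a minimal sketch of producing a record through the proxy's v2 API is shown below; the proxy URL, topic and payload are assumptions, and the client-certificate (TLS) setup mentioned above is omitted for brevity.

// Hedged sketch: producing a record through the Confluent REST Proxy (v2 API) from outside
// the cluster. URL, topic name and payload are illustrative assumptions.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestProxyProduceExample {

    public static void main(String[] args) throws Exception {
        String body = "{\"records\":[{\"key\":\"party-42\",\"value\":{\"name\":\"Jane Doe\"}}]}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://rest-proxy.example.com/topics/im.party"))
                .header("Content-Type", "application/vnd.kafka.json.v2+json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The proxy answers with partition/offset information for each record.
        System.out.println(response.statusCode() + " " + response.body());
    }
}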
13. Master-Slave Data Flow in System Integration
[Diagram: the Master maintains View A and View B; Slave A and Slave B query their views (push or pull) and send change commands back to the Master]
Looks like CQRS. But how do we build the view?
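One way to answer that question with the platform's own tooling is to materialize the view with Kafka Streams: a KTable over the change-event topic keeps the latest document per key. This is a sketch under assumed topic, store and application names, not necessarily how CoPa builds its views.

// Hedged sketch of building a queryable view (the CQRS read model) from a change-event
// topic with Kafka Streams. Topic and store names are illustrative assumptions.
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class PartyViewBuilder {

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Each change event is keyed by the aggregate identifier; the latest value per key
        // forms the view, materialized in a local state store that can be queried.
        KTable<String, String> partyView = builder.table(
                "im.party",
                Consumed.with(Serdes.String(), Serdes.String()),
                Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("party-view-store"));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "party-view-builder");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");

        new KafkaStreams(builder.build(), props).start();
    }
}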
14. Journey Towards Event Streaming
Replication path («Process data as it has changed» / ChangedData Processing):
Data Source in Core → Data is changed → Listen & Copy → 1:1 Table Replication → Batch Processing → Domain-based Table → Batch Processing → Optimized Table → Consume Data
Streaming path («Process change data events» / ChangeEvent Processing):
Data Source in Core → Data is changed → Ingest Event → Technical Event Journal → Process Event → Domain-based Event Journal → Process Event → Optimized Event Journal → Consume Data or Event
16. The Integration Model is Document-Based
[Diagram: «Party» aggregates stored as key-value records on the compacted topic im.party; the key is the unique partner aggregate identifier, the value is a document comprising the party header, party information, physical address, contact address, phone contact point, e-mail contact point, and a record state for party, address and contact point]
Main drivers of this design decision:
• Coherent contexts
• Join once, shred often
• Dynamic and expressive
• No re-keying needed
«Party» as an example
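A minimal sketch of what publishing such a document could look like: the producer keys each «Party» document by the unique partner aggregate identifier so that the compacted topic retains the latest document per party. Topic name, broker address and the JSON structure are illustrative assumptions.

// Hedged sketch: publishing a «Party» document to a compacted topic, keyed by the unique
// partner aggregate identifier so that log compaction keeps the latest document per party.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class PartyDocumentProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        String aggregateId = "partner-42"; // unique partner aggregate identifier (assumed format)
        String document = "{\"partyHeader\":{},\"partyInformation\":{},\"physicalAddress\":{}}";

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // One self-contained document per key: consumers can shred it as needed
            // without re-keying or re-joining.
            producer.send(new ProducerRecord<>("im.party", aggregateId, document));
        }
    }
}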
17. Part III
Patterns for the Kafka Cluster
⏤
Cornerstone of Solution Design
How do integration and application patterns contribute?
18. Application & Integration Patterns for Solution Design
The four CoPa patterns: Event-driven Business Objects, Event Sourcing, Event Distribution, Event Streaming.
Each pattern is specified along these attributes: name, alias, description, application or integration, in scope of integration model, cleanup policy, new source of truth, naming convention, data format, key format, Schema Registry schemes, keying schemes, consumer/producer schemes, delivery semantics.
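As an example of how one of these attributes, the cleanup policy, can be turned into actual topic configuration, here is a hedged sketch using the Kafka AdminClient; topic names, partition counts and the mapping of policies to patterns are assumptions, not the CoPa conventions.

// Hedged sketch: translating a pattern's "cleanup policy" attribute into topic configuration
// with the Kafka AdminClient. Topic names and the policy-to-pattern mapping are assumptions.
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class PatternTopicSetup {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");

        // Event-driven Business Objects: keep the latest document per key -> compaction.
        NewTopic businessObjects = new NewTopic("im.party", 6, (short) 3)
                .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT));

        // Event Streaming: time-bounded analytics -> retention-based deletion.
        NewTopic streaming = new NewTopic("es.clickstream", 12, (short) 3)
                .configs(Map.of(
                        TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_DELETE,
                        TopicConfig.RETENTION_MS_CONFIG, String.valueOf(7L * 24 * 60 * 60 * 1000)));

        try (AdminClient admin = AdminClient.create(props)) {
            admin.createTopics(List.of(businessObjects, streaming)).all().get();
        }
    }
}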
19. Event-Driven BOs Provide Consumer-Driven Data Views
Event-driven Business Objects: data change events flow from a master to one or many slaves
in order to be queried by them in the form appropriate to them
20. Event-Distribution Ensures Transactional Integrity
Event-Distribution: data change commands flow from a slave to a master in order to be applied there
or requests flow from a client to a server in order to be executed
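On the Kafka side, one way to back transactional integrity for change commands is the transactional producer API; whether CoPa relies on it is not stated, so the following is only a sketch with assumed topic and transactional ids.

// Hedged sketch: sending a change command to a master's command topic with the Kafka
// transactional producer, so a batch of commands is either fully visible or not at all.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ChangeCommandSender {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "slave-a-command-sender");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                producer.send(new ProducerRecord<>("ed.party.commands", "partner-42",
                        "{\"command\":\"updateAddress\",\"street\":\"Bahnhofstrasse 1\"}"));
                producer.commitTransaction();
            } catch (Exception e) {
                producer.abortTransaction();
                throw e;
            }
        }
    }
}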
21. Event-Streaming Handles ‚Big Data‘ Loads
Event-Streaming: events flow from producers to consumers in order to be analyzed and
processed along a given time-window
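A small sketch of time-windowed processing with Kafka Streams, counting events per key over a tumbling window; topic name, window size and application id are assumptions.

// Hedged sketch: counting events per key over a 5-minute tumbling window with Kafka Streams,
// as an example of processing along a given time-window.
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.TimeWindows;

public class WindowedEventCounter {

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        builder.stream("es.clickstream", Consumed.with(Serdes.String(), Serdes.String()))
                .groupByKey()
                .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
                .count()
                .toStream()
                .foreach((windowedKey, count) ->
                        System.out.println(windowedKey + " -> " + count));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "windowed-event-counter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        new KafkaStreams(builder.build(), props).start();
    }
}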
22. Event-Sourcing Provides the Complete Change History
Event-Sourcing: data change events are stored in the sequence they occur in order to be able
to derive any state in time
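A hedged sketch of exploiting that property: a consumer replays the event journal from the earliest offset and folds the events into a state map; topic name and the folding logic are illustrative assumptions.

// Hedged sketch: replaying an event-sourced topic from the beginning to rebuild state.
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class EventSourcedReplay {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "party-replay");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Map<String, String> state = new HashMap<>();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("es.party.events"));
            // Because every change event is retained in order, folding the journal from the
            // start reconstructs the latest (or, if stopped earlier, any historical) state.
            for (int i = 0; i < 10; i++) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    state.put(record.key(), record.value());
                }
            }
        }
        System.out.println("Rebuilt " + state.size() + " aggregates");
    }
}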
23. In order to succeed,
technology,
governance and
solution design
need to scale well and
become teammates!
25. Kafka Meetup – 27th of March 2018
6:00pm Doors open
6:10pm – 6:45pm KSQL - An Open Source Streaming Engine for Apache Kafka, Kai Waehner (Confluent)
6:45pm – 7:20pm Data Integration with Kafka on Openshift, Thomas Peter (Generali), Yves Brise (IPT)
7:20pm – 8:30pm Networking, Apéro and Drinks