One of the most challenging aspects of cloud applications is selecting the right technologies for moving, storing, and protecting your data. What frameworks, tools, and managed services will you select? Your choices include open source solutions, enterprise versions, and cloud service providers. Things only get more confusing when you realize that you might be one acquisition or business decision away from needing to provide a hybrid cloud or multi-cloud solution.
In this talk we'll look specifically at:
* the use of technologies for persisting and moving data in cloud applications
* how the concept of "polyglot persistence" affects you
* the proper usage of different styles of databases, caches, and streaming solutions
* effective patterns for combining these technologies, using Apache Cassandra and Apache Kafka as examples
3. Agenda
1 Context - Monolith to Microservices, On-Prem to Cloud
2 Selecting Infrastructure, Then and Now
3 Persistence Patterns - Featuring Cassandra
4 Persistence + Streaming - Featuring Kafka
5 Resources
© DataStax, All Rights Reserved.
community.datastax.com | @jscarp
5. Old School Enterprise Architecture
Monolithic Application
RDBMS: All tables, ACID Transactions, Joins, Indexes
Other Apps: Integration by database
6. Transitional Architecture
Monolithic Application, RDBMS
Services: NoSQL, NewSQL, RDBMS?
Other Apps: Integration by API
7. Microservices in the Cloud
Clients, Applications, Services
Datacenters: On Prem DC, AWS DC A, AWS DC B, GCP DC
10. Tasks of the Architect
Defining Components and Interfaces
Identifying Patterns
Managing the "-ilities"
Making tradeoffs
13. Quality Attribute Bingo - Then
• Performance • Scalability • Availability • Reliability
• Extensibility • Modularity • Reusability • Monitorability
• Deployability • Maintainability • Usability • Cost
14. Data Infrastructure Criteria - Now
DX
Performance
Availability
Security
Flexibility
Cost
17. Minimizing Cost of Change - Abstraction
Service layers: API, Business Logic, Data Access, Messaging
External infrastructure: Database, Queue / Stream
19. Microservices and Polyglot Persistence
Service A: Tabular - Core application data
Service B: Key-value (cache) - Reference data
Service C: Document - Content
Service D: Graph - Highly networked data
Service E: Relational - Legacy, low volume data
20. Apache Cassandra Overview
• First developed by Facebook
• Top-level Apache project since 2010
• Partitioned row store
• Distributed, decentralized
• Elastic scalability / high performance
• High availability / fault tolerant
• Tuneable consistency
• Cassandra Query Language (CQL)
Apache Cassandra ® Apache Software Foundation
22. KillrVideo - A video sharing application
https://github.com/KillrVideo
https://killrvideo.github.io
23. KillrVideo High Level Architecture
Components: Your Browser, Web Application, KillrVideo Services
Technology Choices
• Node.js
• Falcor
• Java / C# / Node.js / Python
• gRPC
• etcd
• DataStax Drivers
• DataStax Enterprise including Apache Cassandra & Spark, Graph
Deployment
• Download and run locally via Docker
• Deployed in AWS using DataStax Managed Services: http://killrvideo.com/
24. Application Workflow in KillrVideo
• User logs into site
• Show basic information about user
• Show videos added by a user
• Show comments posted by a user
• Search for a video by tag
• Show latest videos added to the site
• Show comments for a video
• Show ratings for a video
• Show video and its details
25. Queries in KillrVideo to Support Workflows
Users
• User logs into site: Find user by email address
• Show basic information about user: Find user by id
Comments
• Show comments for a video: Find comments by video (latest first)
• Show comments posted by a user: Find comments by user (latest first)
Ratings
• Show ratings for a video: Find ratings by video
26. Designing Tables Based on Queries
Show video and its details: Find video by id

    CREATE TABLE videos (
        videoid uuid,
        userid uuid,
        name text,
        description text,
        location text,
        location_type int,
        preview_image_location text,
        tags set<text>,
        added_date timestamp,
        PRIMARY KEY (videoid)
    );

Show videos added by a user: Find videos by user (latest first)

    CREATE TABLE user_videos (
        userid uuid,
        added_date timestamp,
        videoid uuid,
        name text,
        preview_image_location text,
        PRIMARY KEY (userid, added_date, videoid)
    )
    WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);
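As a rough illustration of the query-first design on this slide, the following Python sketch models the two tables in memory instead of using a real Cassandra driver. The class and method names (`VideoCatalog`, `add_video`, `find_video_by_id`, `find_videos_by_user`) are hypothetical; only the column names come from the CQL above. It is meant to show the consequence of the design: each video is written twice, once per query, and the per-user partition is kept ordered by `added_date` descending, then `videoid` ascending.

```python
import uuid
from datetime import datetime, timedelta, timezone


class VideoCatalog:
    """In-memory stand-in for the videos and user_videos tables (not a real driver)."""

    def __init__(self):
        self.videos = {}       # videoid -> full row, like PRIMARY KEY (videoid)
        self.user_videos = {}  # userid -> list of summary rows (one partition per user)

    def add_video(self, userid, name, added_date, preview_image_location=None):
        videoid = uuid.uuid4()
        # Write 1: full row keyed by videoid, supports "find video by id".
        self.videos[videoid] = {
            "videoid": videoid,
            "userid": userid,
            "name": name,
            "added_date": added_date,
            "preview_image_location": preview_image_location,
        }
        # Write 2: denormalized summary row in the user's partition,
        # supports "find videos by user (latest first)".
        partition = self.user_videos.setdefault(userid, [])
        partition.append({
            "added_date": added_date,
            "videoid": videoid,
            "name": name,
            "preview_image_location": preview_image_location,
        })
        # Approximate CLUSTERING ORDER BY (added_date DESC, videoid ASC)
        # with two stable sorts: secondary key first, then primary key.
        partition.sort(key=lambda row: row["videoid"])
        partition.sort(key=lambda row: row["added_date"], reverse=True)
        return videoid

    def find_video_by_id(self, videoid):
        return self.videos.get(videoid)

    def find_videos_by_user(self, userid, limit=10):
        return self.user_videos.get(userid, [])[:limit]


catalog = VideoCatalog()
user = uuid.uuid4()
t0 = datetime(2019, 1, 1, tzinfo=timezone.utc)
first = catalog.add_video(user, "Intro to Cassandra", t0)
second = catalog.add_video(user, "Intro to Kafka", t0 + timedelta(days=1))
latest_names = [row["name"] for row in catalog.find_videos_by_user(user)]
```

The trade-off is the one the slide implies: reads become single-partition lookups, at the cost of the service keeping both copies in step on every write.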
27. Delivery Models for Cloud (Data) Infrastructure
Enterprise Versions
• Pro
  - Certification and Support
  - Additional features
  - Security
• Con
  - Licensing cost
  - Cost of change
Open Source
• Pro
  - Free for dev and prod
  - Visibility and modifiability
• Con
  - Cost to maintain expertise
  - Dependence on community
Managed Services
• Pro
  - Ease of adoption
  - Lowest time to prod
  - Pay as you go
• Con
  - Observability obscured
  - Cost management
29. Microservices and Polyglot Persistence (same diagram as slide 19)
30. Should a Service be Polyglot?
Hotel Service
• Cassandra: Primary store (tabular)
• Key-value (Redis, etc.): Name-to-ID mapping?
31. Emerging - Multi-model Databases
Services: Service A, Service B, Service C, Service D
Interaction styles: Key-value semantics, CQL, JSON, Gremlin
Platform: DSE database, DSE Graph
33. Apache Kafka Overview
• First developed by LinkedIn
• Top-level Apache Project since 2012
• Distributed streaming platform
• Used for real-time data pipelines and streaming applications
• Horizontal scalability / high performance
• High availability / Fault tolerance
• Stream persistence and querying (KSQL)
• Connect framework
Apache Kafka ® Apache Software Foundation
34. Kafka Concepts
• Topics
  - Collection of key/value pairs
  - Append-only
  - Can be partitioned
• Producers
• Consumers
  - Separate offsets
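The concepts on this slide can be sketched with a small in-memory model. This is not the Kafka client API: `Topic`, `Consumer`, `append`, and `poll` are hypothetical names, and `zlib.crc32` is only a deterministic stand-in for whatever hash a real partitioner uses. The sketch shows a topic as a set of append-only partitions of key/value pairs, with each consumer tracking its own offsets.

```python
import zlib
from collections import defaultdict


class Topic:
    """Append-only, partitioned collection of key/value pairs (in-memory model)."""

    def __init__(self, name, num_partitions=3):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Records with the same key always land in the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        # Producers only ever append; existing records are never modified.
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)


class Consumer:
    """Each consumer keeps separate offsets, so consumers do not affect each other."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)  # partition index -> next offset to read

    def poll(self):
        records = []
        for p, log in enumerate(self.topic.partitions):
            records.extend(log[self.offsets[p]:])
            self.offsets[p] = len(log)  # advance only this consumer's position
        return records


topic = Topic("video-events", num_partitions=2)
topic.append("video-1", "added")
topic.append("video-2", "added")
topic.append("video-1", "rated")

consumer_a = Consumer(topic)
consumer_b = Consumer(topic)
first_poll_a = consumer_a.poll()   # consumer_a reads everything so far
topic.append("video-2", "rated")
second_poll_a = consumer_a.poll()  # only the new record
first_poll_b = consumer_b.poll()   # consumer_b still sees all four records
```

Because reading does not remove records, adding a new consumer later does not disturb existing ones.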
35. Kafka Concepts
• Streams applications
  - Combined Producer/Consumer
• KSQL
  - Query language used by stream applications
37. Cassandra + Kafka - Similarities and Distinctives
• Concepts in common
  - Distributed Systems
  - Partitioning / Hashing
  - Replication
• Slight differences in implementation
  - Multi-DC
  - Log-structured
  - TTL / retention
• Cassandra excels at...
  - High volume, write intensive data storage workloads at scale
  - Suitable as a system of record
  - High performance searching via DSE
• Kafka excels at...
  - Streaming data to/from services and legacy data sources
  - Acting upon changes in data from multiple sources (aka pipelines)
39. Pattern 1: Cassandra + Kafka in Microservices
Participants: Some Producer, My microservice, DataStax Enterprise, Other consumers
My microservice:
• Consume topic(s)
• Read / write data
• Publish to topic(s)
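A minimal sketch of the consume, read/write, publish loop from Pattern 1, using in-memory stand-ins rather than real Kafka or Cassandra clients. Everything named here is an assumption for illustration: `InMemoryTopic`, `RatingsService`, the `VideoRatingUpdated` event type, and the shape of the event dictionaries are not taken from KillrVideo. Note that the stand-in topic removes records when they are consumed, which is a simplification; a real topic is append-only and consumers track offsets.

```python
from collections import deque


class InMemoryTopic:
    """Simplified stand-in for a topic; consuming removes the record."""

    def __init__(self):
        self._records = deque()

    def publish(self, record):
        self._records.append(record)

    def consume(self):
        # Returns the next record, or None when there is nothing to read.
        return self._records.popleft() if self._records else None


class RatingsService:
    """Hypothetical microservice following the consume / store / publish pattern."""

    def __init__(self, inbound, outbound, table):
        self.inbound = inbound    # topic this service consumes
        self.outbound = outbound  # topic this service publishes to
        self.table = table        # dict standing in for a database table

    def process_next(self):
        event = self.inbound.consume()
        if event is None:
            return False
        videoid = event["videoid"]
        # Read the current state, then write the updated state back.
        row = self.table.get(videoid, {"rating_count": 0, "rating_total": 0})
        updated = {
            "rating_count": row["rating_count"] + 1,
            "rating_total": row["rating_total"] + event["rating"],
        }
        self.table[videoid] = updated
        # Publish the change so other consumers can react to it.
        self.outbound.publish({"type": "VideoRatingUpdated", "videoid": videoid, **updated})
        return True


inbound, outbound, table = InMemoryTopic(), InMemoryTopic(), {}
service = RatingsService(inbound, outbound, table)
inbound.publish({"videoid": "v1", "rating": 4})
inbound.publish({"videoid": "v1", "rating": 5})
while service.process_next():
    pass
```

The database remains the system of record for the service's own data, while the outbound topic is how other services learn about changes without querying that database directly.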
40. Cassandra + Kafka - KillrVideo Example
KillrVideo Services (User Management, Video Catalog, Ratings)
• Read and write data (DataStax Enterprise)
Events
• UserCreated
• YouTubeVideoAdded
• UserRatedVideo
Suggested Videos Service (DSE Graph)
• Populate graph
• Graph recommender traversal
41. Pattern 2: Kafka into Cassandra
42. Takeaways
Flexibility in selection of databases per microservice
Select and deploy infrastructure based on scale
Use queues to coordinate data synchronization
Use abstraction to minimize the cost of change
44. DataStax Academy
• Free self-paced courses
• DS201: Apache Cassandra™
• DS210: Operations
• DS220: Data Modeling
• DS310: Search
• DS320: Analytics
• DS330: Graph
• Kafka Connector Getting Started
https://academy.datastax.com
45. Docker and DataStax
• Where
  - https://hub.docker.com/u/datastax/
  - https://github.com/datastax/docker-images/tree/master/datastax-docker-image-examples
• We provide
  - Docker images for DSE, Studio, OpsCenter
  - Docker-compose configuration files
  - Sample deployments
• We support
  - Installation on dev before 6.7
  - Installation on prod from 6.7 (December 2018)
46. Live Coding on Twitch
• Live coding sessions with advocates and guests
• Working through the challenges of building distributed systems
• Join the conversation and ask questions
• Twitch Rewind: Kafka Connector
  - https://www.youtube.com/watch?v=2_BidDK5zGE
https://www.twitch.tv/datastaxacademy
Architect - distributed systems, will share mistakes
Author
advocate
Until just a few years ago, most application development used a single primary data store based on a relational database, plus the occasional file-based storage for other data.
This seemed great because you could have all your data in one place, and even have transactions with ACID semantics spanning multiple tables. You could add any indexes you wanted and perform complex joins across tables.
These databases worked so well that sometimes we were even tempted to use them as the interface between systems. This "integration by database" came to be considered an anti-pattern as we realized how brittle these integrations were, usually when we updated an application database only to find that it broke other apps.
This architecture served us well for many years, and is still appropriate for some applications. The problem is that it's entirely inappropriate for cloud-scale applications.
Strangler pattern for getting rid of monoliths
Decomposed into microservices
More lightweight/flexible front end apps, webapps
Independently scalable
Deploy across multiple datacenters
Data movement becomes an important factor
Multi-cloud demo
Modified from Eben
Components and interfaces, including API definition and defining patterns
How I grew up
Based on trade studies
Based on enterprise license agreements
Major contracts, Multi-year agreements
Corporate governance, Guidance documents
Now, the wild west?
DevOps: you build it, you maintain it
Not everyone does this
Matrixed decision making
Now, a progression
Developers start the food chain
What do we encounter, when
The bogeyman
We all use it
Lock in is lazy
Need a better discourse
Cost to adopt, cost to operate, cost to change
Soapbox: regardless of infrastructure choices
Use a modular design within the service to isolate concerns
If the API style changes
If the database changes
If the queue or stream changes
Discoverable endpoints, well-known names: if the deployment changes
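One way to picture the modular design described in these notes is a service whose business logic depends only on small interfaces for data access and messaging. The sketch below is an assumption-laden illustration, not code from the talk: `UserStore`, `EventPublisher`, `UserService`, the in-memory implementations, and the `user-events` topic name are all hypothetical. Swapping the database or the queue would then mean writing a new implementation of one interface, leaving the business logic untouched.

```python
from typing import Dict, List, Optional, Protocol, Tuple


class UserStore(Protocol):
    """Data access layer: the only part that knows which database is used."""

    def save(self, user_id: str, email: str) -> None: ...
    def find_email(self, user_id: str) -> Optional[str]: ...


class EventPublisher(Protocol):
    """Messaging layer: the only part that knows which queue or stream is used."""

    def publish(self, topic: str, event: dict) -> None: ...


class InMemoryUserStore:
    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def save(self, user_id: str, email: str) -> None:
        self._rows[user_id] = email

    def find_email(self, user_id: str) -> Optional[str]:
        return self._rows.get(user_id)


class InMemoryPublisher:
    def __init__(self) -> None:
        self.sent: List[Tuple[str, dict]] = []

    def publish(self, topic: str, event: dict) -> None:
        self.sent.append((topic, event))


class UserService:
    """Business logic: depends on the interfaces, not on concrete infrastructure."""

    def __init__(self, store: UserStore, publisher: EventPublisher) -> None:
        self._store = store
        self._publisher = publisher

    def register(self, user_id: str, email: str) -> bool:
        if self._store.find_email(user_id) is not None:
            return False  # user already exists, nothing to do
        self._store.save(user_id, email)
        self._publisher.publish("user-events", {"type": "UserCreated", "user_id": user_id})
        return True


publisher = InMemoryPublisher()
service = UserService(InMemoryUserStore(), publisher)
created = service.register("u1", "user@example.com")
duplicate = service.register("u1", "user@example.com")
```

The in-memory implementations double as test fixtures, which is a side benefit of keeping the infrastructure behind interfaces.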
So we had to come up with new architectural approaches to deal with this new world of massively scalable, distributed systems.
The microservice architecture approach has become very popular for building cloud-based applications, and for good reason. The ability to develop, manage and scale services independently gives us a ton of flexibility.
One axis of this flexibility is known by the term "polyglot persistence", as popularized by Martin Fowler and others. In this approach, services are developed by separate teams, and each team is free to use whatever storage mechanism seems most appropriate to them.
So one team developing Service A might choose to use Cassandra because it is managing core application data that really fits that tabular format, while Service B's data supports very simple semantics of looking up reference values by well-known keys, which suits a key-value store.
Another service, Service C, might be primarily concerned with serving up content for a website and use a document store.
Service D might be all about navigating complex relationships in the data, such as customers and the relationships between them, which suits a graph database.
We might also have a legacy system or service that uses relational technology, or perhaps a service that manages low-volume data that doesn't change often, so a relational database might be a good fit for that.
Note that I'm not trying to constrain our trade space or specify a particular design. I'm just trying to highlight the strengths of each of these styles of database and why a multi-model approach to cloud architecture can be attractive.
We built DSE to address these cloud application characteristics
They help describe why we were major contributors behind Apache Cassandra, and other open source technologies
And it's why this technology has been applied successfully in many companies
Cassandra was first developed at Facebook (explain other details)
DataStax Enterprise is our distribution of Apache Cassandra.
We say this distribution is the best because of the additional security and performance features we put into the core database and the testing and hardening we do
What is KillrVideo?
A great way to learn concepts
So what is behind the system
We built a microservice application
Using this to demonstrate how we identify data models and services
Service identification: wrapping a service around video tables
Revisiting this: at what levels do we apply polyglot persistence concepts?
It's also possible that we could design a service that actually sits on top of multiple databases, although in that case I'd definitely want to make sure that we're not over-engineering or building a service whose scope of work is too large.
Probably not a good combination
Another way to think about this problem is to consider that our database itself could be a multi-model database, that is a database that supports different models of interaction.
For example, DataStax Enterprise is built on top of the most performant, hardened distribution of Apache Cassandra.
Services could interact with the core Cassandra database directly using CQL. Although DSE does not provide a key-value API, you can interact with it as a key-value store. DSE does provide document-style interaction in terms of JSON documents.
DSE Graph is a highly scalable graph database that is built directly on top of DSE core Cassandra and supports the popular Gremlin API.
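The key-value semantics mentioned above can be sketched as a thin facade over a table with a single-column primary key. This is an in-memory illustration only: `TabularStore` and `KeyValueFacade` are hypothetical names, not part of DSE or any driver, and the hotel name and ID are made-up sample data for a name-to-ID mapping.

```python
class TabularStore:
    """In-memory stand-in for a table whose primary key is a single column."""

    def __init__(self):
        self._rows = {}  # primary key -> dict of column values

    def upsert(self, key, columns):
        self._rows.setdefault(key, {}).update(columns)

    def select(self, key):
        return self._rows.get(key)

    def delete(self, key):
        self._rows.pop(key, None)


class KeyValueFacade:
    """Exposes get/put/delete on top of the tabular store, one row per key."""

    def __init__(self, table):
        self._table = table

    def put(self, key, value):
        # The whole value lives in one column of the row identified by the key.
        self._table.upsert(key, {"value": value})

    def get(self, key, default=None):
        row = self._table.select(key)
        return default if row is None else row["value"]

    def delete(self, key):
        self._table.delete(key)


name_to_id = KeyValueFacade(TabularStore())
name_to_id.put("Grand Hotel", "hotel-42")
found = name_to_id.get("Grand Hotel")
missing = name_to_id.get("Unknown Hotel", default="not found")
```

Used this way, a service gets simple lookups by key without introducing a second database platform to operate.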
Why would we want something like this? I can think of two primary reasons.
First, Cassandra has demonstrated that it can handle the massive scalability and performance required of cloud applications, which is not necessarily true of other databases. DSE provides that hardened, stable, reliable implementation of Cassandra as a solid foundation for your cloud application.
Second, using a multi-model database approach with DSE can help us with operational simplicity. Even if different development teams are using different APIs and modes of interaction with the backend database platform, we can gain efficiency by only having a single platform to manage.
Replication - but subtle
A third pattern is possible - Cassandra as a destination system