This document discusses different API options for databases: REST, gRPC, and GraphQL. It begins with an overview of Apache Cassandra and its key features as a distributed database. It then covers an API design methodology, including conceptual and logical data modeling, mapping queries to tables, and creating the physical schema. The document presents criteria for evaluating API choices and provides pros and cons of REST, gRPC, and GraphQL. It concludes that REST is best for CRUD operations, gRPC for high performance services, and GraphQL for discoverability and flexible payloads.
11. @clunven | @voxxed_lu | #voxxed_lu
Sweet spots
1. High Throughput (because we can keep up)
2. High Volume (because we scale linearly and remain OLTP)
3. High Availability (replication, masterless)
4. Data distribution (read/write around the globe)
Data Modelling
• The KEYSPACE is like a schema in Oracle: it isolates data.
(Diagram: Projet_X keyspace)
A table has a PRIMARY KEY made of 2 parts: the partition key and the rest (the clustering columns). Each partition key value is hashed into a token.
Several rows with the same partition key form a partition.
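The keyspace, partition key and token vocabulary can be sketched in CQL. The following are assumptions for illustration only:
- the keyspace name projet_x, taken from the diagram label;
- the hypothetical readings_by_sensor table;
- the single-replica SimpleStrategy setting, which only suits a local test cluster.

```cql
-- Keyspace: the outermost container; it also carries the replication settings.
CREATE KEYSPACE IF NOT EXISTS projet_x
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

USE projet_x;

-- Hypothetical table: sensorid is the partition key, readingtime the
-- clustering column; together they form the primary key.
CREATE TABLE IF NOT EXISTS readings_by_sensor (
  sensorid    text,
  readingtime timestamp,
  measure     double,
  PRIMARY KEY ((sensorid), readingtime)
);

-- token() shows the hash computed from the partition key (Murmur3 by default).
-- All rows sharing a sensorid share one token, i.e. they form one partition.
SELECT sensorid, token(sensorid), readingtime, measure FROM readings_by_sensor;
```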
Cassandra Query Language
• SQL-like syntax, familiar from relational databases
• Objects are created with DDL (CREATE). Data is written and read with INSERT, UPDATE, DELETE and SELECT ... WHERE. Access is managed with GRANT and REVOKE.
Example
CREATE TABLE market_prices (
symbol TEXT,
date TIMESTAMP,
price DECIMAL,
side INT,
PRIMARY KEY (symbol, date)
) WITH CLUSTERING ORDER BY (date DESC);
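Here is a hedged sketch of the other statements run against the market_prices table above. The symbol 'ACME', the prices, the side value and the timestamps are invented for illustration.

```cql
-- Insert one row; (symbol, date) is the primary key.
INSERT INTO market_prices (symbol, date, price, side)
VALUES ('ACME', '2019-05-23 09:30:00+0000', 101.25, 1);

-- CQL writes are upserts: this updates the same row (and would create it if absent).
UPDATE market_prices SET price = 101.30
WHERE symbol = 'ACME' AND date = '2019-05-23 09:30:00+0000';

-- Reading one partition: rows come back newest date first
-- because of CLUSTERING ORDER BY (date DESC).
SELECT date, price, side FROM market_prices
WHERE symbol = 'ACME'
LIMIT 10;

-- Delete a single row by its full primary key.
DELETE FROM market_prices
WHERE symbol = 'ACME' AND date = '2019-05-23 09:30:00+0000';
```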
Application Workflow
R1: Find comments related to target video using its identifier
• Get most recent first
• Implement Paging
R2: Find comments related to target user using its identifier
• Get most recent first
• Implement Paging
R3: Implement CRUD operations
Mapping
Q1: Find comments for a video with a known id (show most recent first) → comments_by_video
Q2: Find comments posted by a user with a known id (show most recent first) → comments_by_user
Q3: CRUD operations
Logical Data Model
(K = partition key, C = clustering column)
comments_by_user
• userid K
• creationdate C↑
• commentid C↑
• videoid
• comment
comments_by_video
• videoid K
• creationdate C↑
• commentid C↑
• userid
• comment
Physical Data Model
comments_by_user
• userid UUID K
• commentid TIMEUUID C↑
• videoid UUID
• comment TEXT
comments_by_video
• videoid UUID K
• commentid TIMEUUID C↑
• userid UUID
• comment TEXT
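In the physical model, the logical creationdate column no longer appears and commentid becomes a TIMEUUID. A plausible reading is that a version-1 TIMEUUID already embeds its creation time, so it can both order comments and make them unique. This is our interpretation; the slide does not say so.

The snippet below only queries the built-in system.local table, to show how such a time is generated and read back. It is a sketch, not part of the original demo.

```cql
-- now() generates a new version-1 TIMEUUID on the coordinator.
SELECT now() FROM system.local;

-- toTimestamp() extracts the time embedded in a TIMEUUID.
SELECT toTimestamp(now()) FROM system.local;
```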
Schema DDL
CREATE TABLE IF NOT EXISTS comments_by_user (
userid uuid,
commentid timeuuid,
videoid uuid,
comment text,
PRIMARY KEY ((userid), commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
CREATE TABLE IF NOT EXISTS comments_by_video (
videoid uuid,
commentid timeuuid,
userid uuid,
comment text,
PRIMARY KEY ((videoid), commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
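The workflow requirements (R1 and R2 reads with paging, R3 CRUD) can be sketched against the schema above in plain CQL. Assumptions:
- Both tables live in the current keyspace.
- Every UUID and TIMEUUID literal is a made-up placeholder.
- The "seek" paging query is only one option. Drivers usually page through the native protocol, using a fetch size plus a paging state.

```cql
-- Q1 / R1: comments for one video, newest first (clustering order is
-- commentid DESC, so no ORDER BY is needed).
SELECT commentid, userid, comment, toTimestamp(commentid) AS created
FROM comments_by_video
WHERE videoid = 5a5b5c5d-0000-4000-8000-000000000001
LIMIT 20;

-- Q2 / R2: same access pattern, keyed by user.
SELECT commentid, videoid, comment
FROM comments_by_user
WHERE userid = 6a6b6c6d-0000-4000-8000-000000000002
LIMIT 20;

-- Paging by "seek": the next page is the rows older than the last
-- commentid returned by the previous page.
SELECT commentid, userid, comment
FROM comments_by_video
WHERE videoid = 5a5b5c5d-0000-4000-8000-000000000001
  AND commentid < 50554d6e-29bb-11e5-b345-feff819cdc9f
LIMIT 20;

-- Q3 / R3 (create): the comment is denormalized into both tables, so the
-- same commentid is written to each. A logged batch ensures that if one
-- insert is applied, the other eventually is too.
BEGIN BATCH
  INSERT INTO comments_by_video (videoid, commentid, userid, comment)
  VALUES (5a5b5c5d-0000-4000-8000-000000000001, 50554d6e-29bb-11e5-b345-feff819cdc9f,
          6a6b6c6d-0000-4000-8000-000000000002, 'Great talk!');
  INSERT INTO comments_by_user (userid, commentid, videoid, comment)
  VALUES (6a6b6c6d-0000-4000-8000-000000000002, 50554d6e-29bb-11e5-b345-feff819cdc9f,
          5a5b5c5d-0000-4000-8000-000000000001, 'Great talk!');
APPLY BATCH;

-- Q3 / R3 (delete): remove the comment from both tables as a pair.
BEGIN BATCH
  DELETE FROM comments_by_video
  WHERE videoid = 5a5b5c5d-0000-4000-8000-000000000001
    AND commentid = 50554d6e-29bb-11e5-b345-feff819cdc9f;
  DELETE FROM comments_by_user
  WHERE userid = 6a6b6c6d-0000-4000-8000-000000000002
    AND commentid = 50554d6e-29bb-11e5-b345-feff819cdc9f;
APPLY BATCH;
```

Writing each comment twice is the cost of serving both queries from a single partition read.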
How?
• Conceptual Data Model (entities, relations)
• Application Workflow (queries)
• Database Family (technologies + tables)
REST
Pros:
• Decoupling client / server (schema on read)
• Flexibility: sync, async, reactive + multiple payload formats
• API lifecycle (versioning)
• Tooling (API management, serverless)
Cons:
• Verbose payloads (JSON, XML)
• No discoverability
• Not the best fit for command-like (function) APIs (RPC)
Best for:
• CRUD superstar
• OLTP mutations and statuses
• Public and web APIs
gRPC
Pros:
• High performance (HTTP/2, binary serialization)
• Multiple stubs: sync, async, streaming
• Multi-language (interoperability)
Cons:
• Strongly coupled (schema defined in .proto files)
• No discoverability
• Protobuf serialization format
Best for:
• Distributed networks of services (no waiting)
• High-throughput and streaming use cases
• Command-like APIs, RPC
GraphQL
Pros:
• Discoverability, documentation
• Custom payloads
• Matches standards (JSON | HTTP)
• Single endpoint (versioning, monitoring, security)
Cons:
• Complex implementation (tooling still young)
• Nice for API consumers, nasty for the DB (N+1 selects)
Best for:
• BFF: Backend for Frontend
• Service aggregation | composition (joins)
• When bandwidth matters (mobile phones)
One of Cassandra's fault-tolerance strategies is replication.
Replication is a matter of duplicating data across nodes.
The number of replicas is called the Replication Factor.
Let’s look at some examples…
<click> (RF=1 appears)
Let’s start with the simplest example:
a replication factor of 1 – only a single copy
It’s not something you would likely do in production,
but it's a good place to start the discussion.
<click> (data appears)
Here we’re showing a write request
for some data with a partition token of 59
<click> (data moves to node)
The top-right node will serve as the coordinator
<click> (data turns purple)
Notice 59 falls in the purple range
<click> (data moves to the node)
So the coordinator forwards the data to the purple node
<click> (data clears)
<click> (RF=2 appears)
Let's increase the replication factor to 2
<click> (ring colors double)
This doubles the range that each node is responsible for.
For example, node 75 becomes responsible for
the red range
and the purple range
<click> (data appears)
Again our request to write partition with token 59 arrives.
But this time the coordinator sends it to two nodes.
<click> (data moves)
<click> (data fades)
Let's increase the replication factor to 3
<click> (RF=3 appears)
This means that each node is responsible for 3 ranges
<click> (3 ranges appear)
Once again, the data arrives at the coordinator
Where will the coordinator send the data this time?
<click> (data moves)
We see the data replicated to three nodes, one per replica
<click> (data fades)
So, in a nutshell, that's how replication works
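In CQL, the replication factor from this walk-through is declared per keyspace. This is a sketch with hypothetical keyspace names. 'dc1' is an assumed datacenter name; it would have to match the name your cluster reports.

```cql
-- RF = 3 with SimpleStrategy: each partition is stored on 3 consecutive
-- nodes around the ring, as in the RF=3 animation.
CREATE KEYSPACE IF NOT EXISTS demo_rf3
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- Production clusters usually use NetworkTopologyStrategy, which sets
-- the RF per datacenter ('dc1' is an assumed datacenter name).
CREATE KEYSPACE IF NOT EXISTS demo_multi_dc
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 2};

-- Raising the RF later only changes the placement rule; existing data
-- must then be streamed to the new replicas with a repair (nodetool repair).
ALTER KEYSPACE demo_multi_dc
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};
```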
Consistency level is different from replication factor.
On a read, the consistency level is how many replicas must answer.
Each replica's data carries a write timestamp.
The most recent timestamp wins
In this example, imagine we have a replication factor of 3.
<click> (shows write arrows)
So when we write, we write 3 replicas of the data.
<click> (shows CL=ONE)
Now let's say we want to read from this cluster with a consistency level of ONE.
In this case, we only need to read from a single node to resolve the data.
<click> (shows read arrows)
Now, we can change the consistency level to quorum,
<click> (CL=QUORUM appears)
Which means we want to read a majority of the replicas.
Since the replication factor is 3, quorum implies reading 2 replicas
<click> (second read line appears)
Notice that if the replicas disagree, the coordinator returns the data with the most recent timestamp
<click> (CL clears)
We can even specify a consistency level of ALL,
<click> (CL=ALL appears)
Which means we will read all replicas
<click> (third read line appears)
(pause to let people absorb)
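The consistency levels from the walk-through map to the cqlsh CONSISTENCY command. This is a shell command, not a CQL statement; drivers set the level per request instead.

This sketch assumes a keyspace with RF = 3 that contains the comments_by_video table from the schema above. The UUID is a placeholder. In a single datacenter, QUORUM means floor(RF/2) + 1 replicas, which is 2 when RF = 3.

```cql
-- Read from 1 replica.
CONSISTENCY ONE
SELECT comment FROM comments_by_video
WHERE videoid = 5a5b5c5d-0000-4000-8000-000000000001;

-- Read from a majority: floor(3 / 2) + 1 = 2 replicas.
-- WRITETIME() exposes the write timestamp (microseconds since the epoch
-- by default) that decides which value is the most recent.
CONSISTENCY QUORUM
SELECT comment, WRITETIME(comment) FROM comments_by_video
WHERE videoid = 5a5b5c5d-0000-4000-8000-000000000001;

-- Read from every replica: all 3 must answer.
CONSISTENCY ALL
SELECT comment FROM comments_by_video
WHERE videoid = 5a5b5c5d-0000-4000-8000-000000000001;
```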
The default data model in DSE is tabular. It is similar to an RDBMS table but more flexible/dynamic.
A table is partitioned by one or more columns, enabling fast lookups by partition key.
A keyspace is the outermost container for an application's data, similar to a database in a relational system. Each row in a table (historically called a column family) is indexed by its key and contains ordered columns. This data model lets DSE handle sparse, semi-structured data: a row only stores the columns that are actually set.
Cassandra Query Language (CQL) is the primary language for communicating with the DSE data management platform. CQL is a SQL-like language that lets you create keyspaces and tables, insert into and query tables, and more, using a syntax you are already familiar with.
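To illustrate "partitioned by one or more columns", here is a sketch with a composite partition key. The table, the column names and the literal values are hypothetical. It assumes a keyspace has already been selected with USE.

```cql
-- Composite partition key (sensorid, day): both columns are hashed together
-- into the token, so a single-partition lookup must supply both.
CREATE TABLE IF NOT EXISTS readings_by_sensor_day (
  sensorid    text,
  day         date,
  readingtime timestamp,
  measure     double,
  PRIMARY KEY ((sensorid, day), readingtime)
) WITH CLUSTERING ORDER BY (readingtime DESC);

SELECT readingtime, measure FROM readings_by_sensor_day
WHERE sensorid = 's-42' AND day = '2019-05-23';
```

Adding the day to the partition key keeps each partition bounded in size, at the price of one lookup per day.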
ALEXANDRE
A few logos:
As we already said, we are using Java, so why not use the latest version, Java 12?
Services are implemented and connected with Spring
Everything is wrapped into a Spring Boot 2.1 application
Services are exposed as REST with Spring MVC
Did you notice our gray hair and beards? We do Java,
we are serious people and do not play with that teenage language, JavaScript
ALEXANDRE
Show repository, stress its simplicity and blocking calls
Show controller, same thing
Show controller unit test, run tests