This document discusses modern operational data architectures and the use of both relational and NoSQL databases. It provides an overview of relational databases and their ACID properties. While relational databases dominate the market, they have limitations around scalability, flexibility, and performance. NoSQL databases offer alternatives like horizontal scaling and flexible schemas. Key-value stores are best for caching, sessions, and serving data, while document stores are popular for hierarchical and search use cases. Graph databases excel at link analysis. The document advocates a polyglot persistence approach using multiple database types according to their strengths. It provides examples of search architectures using both database-centric and application-centric distribution approaches.
2. About Me
• Name: Arthur Gimpel
• Position: Technology Evangelist, Solutions
Architect, Trainer
• Tech Stack: MongoDB, SQL Server,
Couchbase, Elastic Stack, Redis, Kafka,
Python, .NET
3. Relational Databases
• The first RDBMSs were introduced in the late 1970s
• They come in many flavors but share one
thing: ACID
• They still dominate the database market
4. RDBMS In Theory
• Atomicity: All or nothing; a transaction is
applied in full or not at all
• Consistency: Every transaction moves the
database from one valid state to another
• Isolation: Concurrent transactions cannot
interfere with each other
• Durability: Every committed transaction is
persisted
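The atomicity and consistency bullets above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The `accounts` table, the `transfer` helper, and the balances are made up for illustration; the point is only that a failed transaction leaves no partial change behind.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts inside one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit
            # above is rolled back as well (all or nothing).
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 20)")
conn.commit()

ok = transfer(conn, "alice", "bob", 30)        # valid transfer
failed = transfer(conn, "bob", "alice", 500)   # would overdraw bob
```

After both calls, only the first transfer is visible: the second one credited alice inside the transaction, but that credit was undone when the debit failed.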
5. RDBMS Is Not Perfect
• Every write is persisted synchronously, so
throughput is limited by I/O performance
• All data is bound to a tabular schema, which is
hard to change in big databases
• ACID guarantees make horizontal scaling nearly
impossible
• Complex schemas drastically slow down
aggregations and queries
6. NoSQL
• Distributed / Horizontal Scalability
• Mostly Open Source
• Mostly schemaless:
• Key - Value
• Document
• Graph
• Serves specific purposes
7. NoSQL - Key Value Stores
• Key:
• Usually string, equivalent to primary key in a
relational database
• Value:
• Simple values: Int, Float, DateTime
• Complex values: Array, Binary, XML, JSON
8. Key Value - Characteristics
• A database is usually a set of unique keys
and their values
• KV data stores are usually easy to
distribute
• Access by key is usually VERY fast
• Indexing and querying by value is usually
challenging
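A rough sketch of why these characteristics hold, assuming a toy in-memory store: keys are hashed to pick a shard, so key access touches one shard, while a query on values has to scan all of them. The `ShardedKV` class and the `user:N` keys are hypothetical, not taken from any specific product.

```python
import zlib

class ShardedKV:
    """Toy key-value store that spreads keys over several in-memory shards."""

    def __init__(self, shard_count=4):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash of the key decides which shard owns it.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        # Key access touches exactly one shard and one hash lookup.
        return self._shard_for(key).get(key, default)

    def find_by_value(self, predicate):
        # Without an index, querying by value means scanning every shard.
        return sorted(
            key
            for shard in self.shards
            for key, value in shard.items()
            if predicate(value)
        )

store = ShardedKV()
store.put("user:1", {"name": "Ann", "segment": "sports"})
store.put("user:2", {"name": "Bob", "segment": "travel"})
store.put("user:3", {"name": "Cid", "segment": "sports"})

ann = store.get("user:1")
sports_fans = store.find_by_value(lambda v: v["segment"] == "sports")
```

In a real deployment each shard would live on a different node, which is what makes the hash-by-key layout easy to distribute.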
9. Key Value - Use Cases
• Distributed caching
• Session / temporary user data
• Ad tech: Impressions
• Ad tech: Serving data - profiles, segments
• Recommendation engines - main data store
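The session / temporary data use case usually relies on keys that expire on their own. Below is a minimal sketch of that idea; the `SessionCache` class, its `ttl_seconds` parameter, and the injected clock are assumptions made for illustration, not the API of any particular key-value store.

```python
import time

class SessionCache:
    """Toy session store: values expire `ttl_seconds` after they are written."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._data = {}             # session id -> (expires_at, value)

    def set(self, session_id, value):
        self._data[session_id] = (self.clock() + self.ttl_seconds, value)

    def get(self, session_id):
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[session_id]   # lazily evict expired sessions
            return None
        return value

# A fake clock stands in for real time passing.
now = [0.0]
cache = SessionCache(ttl_seconds=30, clock=lambda: now[0])
cache.set("sess-42", {"user": "ann", "cart": ["book"]})

fresh = cache.get("sess-42")     # still within the 30-second window
now[0] = 31.0
expired = cache.get("sess-42")   # past the window, so the session is gone
```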
10. NoSQL - Graph Stores
“In computing, a graph database is a database
that uses graph structures for semantic
queries with nodes, edges and properties to
represent and store data” (Wikipedia)
11. Graph - Characteristics
• Nodes are entities - for example a person
• Properties describe nodes - for example
age, name
• Edges are relationships between nodes, and
can carry properties of their own
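The node / property / edge model can be sketched in a few lines. This is an illustrative in-memory structure, not how any graph database stores data; the `Graph` class, the `knows` relation, and the sample people are all made up. The `reachable` method shows the kind of link analysis (here, friends of friends) that graph stores are built for.

```python
from collections import deque

class Graph:
    def __init__(self):
        self.nodes = {}   # node id -> properties dict
        self.edges = {}   # node id -> list of (relation, target id)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        self.edges.setdefault(node_id, [])

    def add_edge(self, source, relation, target):
        self.edges[source].append((relation, target))

    def reachable(self, start, relation, max_hops):
        """Ids reachable from `start` via `relation` within max_hops (BFS)."""
        seen = {start}
        queue = deque([(start, 0)])
        found = []
        while queue:
            node, hops = queue.popleft()
            if hops == max_hops:
                continue  # do not expand beyond the hop limit
            for rel, target in self.edges[node]:
                if rel == relation and target not in seen:
                    seen.add(target)
                    found.append(target)
                    queue.append((target, hops + 1))
        return found

g = Graph()
g.add_node("ann", name="Ann", age=31)
g.add_node("bob", name="Bob", age=45)
g.add_node("cid", name="Cid", age=28)
g.add_node("dee", name="Dee", age=52)
g.add_edge("ann", "knows", "bob")
g.add_edge("bob", "knows", "cid")
g.add_edge("cid", "knows", "dee")

friends_of_friends = g.reachable("ann", "knows", max_hops=2)
```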
12. Graph - Use Cases
• Fraud detection
• Recommendation engines - link analysis
• Intelligence systems
• Social Networks
• Medical Research
13. NoSQL - Document Stores
• Document databases usually store JSON
• Used to store object-oriented data
• Usually used to avoid the object-relational
impedance mismatch
• Document stores have the highest
adoption rate among NoSQL databases
14. Document Store - Characteristics
• Information is stored in JSON variations
• Some document stores support secondary
indexes for easier querying
• Documents are usually divided into logical
groups (collections, buckets, types) that
take the place of RDBMS tables
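As a rough illustration of these characteristics, the sketch below models a collection of JSON-like documents with an optional secondary index. The `Collection` class and the `orders` data are hypothetical; real document stores offer far richer indexing and query languages. Re-inserting an existing id is not handled here, so index entries could go stale in that case.

```python
from collections import defaultdict

class Collection:
    """Toy document collection with optional secondary indexes."""

    def __init__(self):
        self.docs = {}       # document id -> document (dict)
        self.indexes = {}    # field -> {value -> set of document ids}

    def create_index(self, field):
        index = defaultdict(set)
        for doc_id, doc in self.docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self.indexes[field] = index

    def insert(self, doc_id, doc):
        self.docs[doc_id] = doc
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].add(doc_id)

    def find(self, field, value):
        if field in self.indexes:   # indexed field: direct lookup
            ids = self.indexes[field].get(value, set())
        else:                       # no index: scan every document
            ids = {i for i, d in self.docs.items() if d.get(field) == value}
        return [self.docs[i] for i in sorted(ids)]

orders = Collection()
orders.insert("o1", {"customer": "ann", "items": [{"sku": "A1", "qty": 2}]})
orders.create_index("customer")
orders.insert("o2", {"customer": "bob", "items": [{"sku": "B7", "qty": 1}]})
# Documents in one collection do not need identical fields.
orders.insert("o3", {"customer": "ann", "items": [], "note": "gift"})

ann_orders = orders.find("customer", "ann")   # served by the index
gift_orders = orders.find("note", "gift")     # falls back to a scan
```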
15. Document Store - Use Cases
• “Relational” use cases where there is a
need for high scale (volume, velocity,
variety)
• Hierarchical data - aggregations
• Search use cases
16. NoSQL - Challenges
• Every data store has its purpose. There is
no single solution to all database needs
• NoSQL databases do not implement all of an
RDBMS's abilities (CDC, Jobs, Stored Procedures,
Triggers)
• Every data store has its own query language
and APIs. There is no equivalent of ANSI SQL
18. Polyglot Persistence
Sample Use Cases
• Add search capabilities to your database
• Move session / temporary data processing
to key-value stores
• Add Graph analysis capabilities to your
operational database
22. Architecture Comparison
|                             | Architecture #1                | Architecture #2                    |
|-----------------------------|--------------------------------|------------------------------------|
| Data distribution strategy  | Data store based               | Application based                  |
| Data distribution component | Data pipeline                  | Message queue                      |
| Implementation team         | Data engineers / DevOps        | DevOps / developers                |
| Implementation complexity   | Low: data pipeline development | High: data access layer refactor   |
| Scalability                 | Limited to RDBMS scale         | Fully scalable regardless of RDBMS |
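One possible shape of Architecture #2 (application-based distribution through a message queue) is sketched below. Everything here is a stand-in chosen for illustration: a dict plays the operational database, another dict plays the search engine, and Python's `queue.Queue` plays the message queue. The function names are hypothetical and the original slides may have shown a different design.

```python
import queue

primary_db = {}          # stands in for the operational RDBMS
search_index = {}        # stands in for a search engine: word -> set of ids
events = queue.Queue()   # stands in for a message queue

def save_product(product_id, title):
    """Application write path: store the record, then publish an event."""
    primary_db[product_id] = {"title": title}
    events.put({"type": "product_saved", "id": product_id, "title": title})

def run_indexer():
    """Consumer: drain pending events and update the search index."""
    processed = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return processed
        for word in event["title"].lower().split():
            search_index.setdefault(word, set()).add(event["id"])
        processed += 1

def search(word):
    return sorted(search_index.get(word.lower(), set()))

save_product(1, "Red running shoes")
save_product(2, "Blue running jacket")
handled = run_indexer()
hits = search("running")
```

Because the application publishes the event itself, the data access layer has to change, which matches the "high complexity" row in the comparison. In exchange, the search side scales independently of the RDBMS. In Architecture #1 the event would instead be derived from the database by a data pipeline, leaving the application code untouched.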
23. Summary
• Choose the right database engine for
the job - replacing databases is
not easy
• Do not hesitate to use more than one
database engine in your operational
application; a single source of truth can be
created in the analytical stack
• Sizing is no replacement for benchmarking.
Test your deployment carefully