This document discusses query-driven data modeling for NoSQL databases. It explains that NoSQL databases have different data models, capabilities, and transactional properties than SQL databases. Data modeling for NoSQL requires unlearning normalization rules and embedding related data together to serve queries from a single document. Important considerations for query-driven modeling include document size, relationship cardinality, indexing impacts, schema versioning strategies, choice of sharding keys, and facilitating communication.
Hackolade Tutorial - part 3 - Query-driven data modeling based on access patterns
1. Hackolade Tutorial
Part 3: Query-driven data modeling based on access patterns
Copyright © 2016-2023 Hackolade
2. The rules of data modeling
• Data modeling for RDBMSs uses the rules of normalization
3. The rules of data modeling
• NoSQL databases are completely different
  • Different data models
  • Different sizing parameters and capabilities
  • True “horizontal” scalability in massively distributed systems
  • Different transactional capabilities
    • immediate vs. eventual consistency
    • ACID vs. BASE
  • Different use cases
• NoSQL requires a mindshift in schema design, with different rules and parameters
4. The rules of data modeling
• NoSQL advocates UNLEARNING the rules of normalization
• NoSQL allows you to aggregate information that belongs together
  • join the data “on write”, instead of (time and time again) “on read”
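To make “join on write” concrete, here is a minimal Python sketch that uses plain dictionaries in place of a database (an assumption for illustration; the products, orders and field names are hypothetical). The normalized version re-runs the join every time an order is read; the aggregate version copies the product data into the order once, at write time, so every later read is a single lookup.

```python
# Normalized data, as in an RDBMS (hypothetical products, orders and order lines).
products = {"p1": {"name": "Keyboard", "price": 49.0},
            "p2": {"name": "Mouse", "price": 19.0}}
orders = {"o1": {"customer": "Alice"}}
order_lines = [{"order_id": "o1", "product_id": "p1", "qty": 1},
               {"order_id": "o1", "product_id": "p2", "qty": 2}]


def read_order_joined(order_id):
    """Join on read: the join runs again every time the order is read."""
    lines = [{**products[line["product_id"]], "qty": line["qty"]}
             for line in order_lines if line["order_id"] == order_id]
    return {"_id": order_id, **orders[order_id], "lines": lines}


# Aggregates stored as single documents, keyed by order id.
order_docs = {}


def write_order(order_id, customer, items):
    """Join on write: product data is copied into the order once, at write time."""
    order_docs[order_id] = {
        "_id": order_id,
        "customer": customer,
        "lines": [{**products[pid], "qty": qty} for pid, qty in items],
    }


def read_order(order_id):
    """A read is now a single key lookup, with no join."""
    return order_docs[order_id]


write_order("o1", "Alice", [("p1", 1), ("p2", 2)])
print(read_order("o1") == read_order_joined("o1"))  # True
```

The trade-off is that the copied product data is a snapshot: if a product is renamed later, existing orders keep the old name unless the application updates them.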
5. The NoSQL mindshift
• From APPLICATION-AGNOSTIC to APPLICATION-SPECIFIC data modeling
6. The “Embedding” approach
Query-driven data modeling
• first define the queries (aka “access patterns”) for the application,
• then store the data according to the query needs
• Ideally, a single database access should return all related, joined-up information: EMBEDDING the data into a single atomic document
  • as opposed to “referencing”: leveraging data stored elsewhere via foreign keys, pulled in with joins
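As a rough illustration of embedding versus referencing, the sketch below uses a toy in-memory store that counts accesses (CountingStore, the customer/address shapes, and the access pattern “list a customer's cities” are all hypothetical). Serving that access pattern from the embedded document takes one access; the referenced design needs one access for the customer plus one per address.

```python
class CountingStore:
    """A toy key-value store that counts how many times it is accessed."""

    def __init__(self, docs):
        self.docs = docs
        self.accesses = 0

    def get(self, key):
        self.accesses += 1
        return self.docs[key]


# Embedded: the customer document contains its addresses.
embedded = CountingStore({
    "cust:1": {"name": "Alice",
               "addresses": [{"city": "Brussels"}, {"city": "Ghent"}]},
})

# Referenced: the customer holds foreign keys to separate address documents.
referenced = CountingStore({
    "cust:1": {"name": "Alice", "address_ids": ["addr:1", "addr:2"]},
    "addr:1": {"city": "Brussels"},
    "addr:2": {"city": "Ghent"},
})


def cities_embedded(store, cust_id):
    """One access returns everything the access pattern needs."""
    return [a["city"] for a in store.get(cust_id)["addresses"]]


def cities_referenced(store, cust_id):
    """One access for the customer, then one per referenced address."""
    cust = store.get(cust_id)
    return [store.get(aid)["city"] for aid in cust["address_ids"]]


print(cities_embedded(embedded, "cust:1"), embedded.accesses)        # ['Brussels', 'Ghent'] 1
print(cities_referenced(referenced, "cust:1"), referenced.accesses)  # ['Brussels', 'Ghent'] 3
```

A real database can batch the referenced lookups or perform the join server-side, but the extra work still happens on every read.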
7. Important factors for Query-driven Data Modeling
• Aggregate / document size and transaction volume
• Cardinality of relationships
  • Beware of unbounded arrays: unlimited growth degrades performance and can eventually exceed the maximum document size!
  • When embedding one-to-many relationships, estimate the cardinality first
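One way to check whether an embedded one-to-many array can stay bounded is to estimate how many child elements fit in a parent document before it reaches the store's size limit. This sketch assumes MongoDB's 16 MB maximum BSON document size and approximates the encoded size with the length of compact JSON, which BSON does not match exactly; the function names and sample documents are hypothetical.

```python
import json

# Assumption: MongoDB's 16 MB maximum BSON document size; other stores differ.
MAX_DOC_BYTES = 16 * 1024 * 1024


def approx_size(doc):
    """Approximate the encoded size as the byte length of compact JSON."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


def max_embedded_items(parent, sample_child, limit=MAX_DOC_BYTES):
    """Estimate how many children like sample_child fit in the parent's 'items' array."""
    base = approx_size({**parent, "items": []})
    per_item = approx_size(sample_child) + 1  # +1 for the comma separating elements
    return (limit - base) // per_item


post = {"_id": 1, "title": "Hello"}
comment = {"user": "alice", "text": "x" * 200, "ts": "2023-01-01T00:00:00Z"}
print(max_embedded_items(post, comment))
```

Treat the result as a hard ceiling, not a target: large, ever-growing documents become expensive to read, update and replicate long before they hit the limit.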
8. Important factors for Query-driven Data Modeling
Referential integrity
• To ACID or not to ACID
• Embedding vs. ACID: documents are atomic units!
• Role of the application in enforcing integrity across documents!
Indexing impacts
• Polymorphic document designs can lead to a proliferation of indexes
Data duplication can be a good idea!
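The indexing point can be illustrated with a hypothetical polymorphic products collection in which each product type is queried on its own attributes. If every queried field gets its own index, the index list grows with each new type; one common remedy in MongoDB, the attribute pattern, moves type-specific fields into a key/value array. The collection, field names and helper functions below are assumptions for illustration.

```python
# Hypothetical polymorphic "products" collection: each type has its own attributes.
products = [
    {"type": "book", "title": "NoSQL Basics", "isbn": "978-0000000000", "pages": 300},
    {"type": "shirt", "title": "Logo tee", "size": "M", "color": "blue"},
    {"type": "phone", "title": "Phone X", "storage_gb": 128, "color": "black"},
]

# Hypothetical access patterns: the fields each product type is filtered on.
queried_fields = {
    "book": ["isbn", "pages"],
    "shirt": ["size", "color"],
    "phone": ["storage_gb", "color"],
}


def single_field_indexes(queried):
    """One index per distinct queried field path: the list grows with every new type."""
    return sorted({f for fields in queried.values() for f in fields})


def to_attribute_pattern(doc, common=("type", "title")):
    """Move type-specific fields into a key/value array named 'attrs'."""
    attrs = [{"k": k, "v": v} for k, v in doc.items() if k not in common]
    return {**{k: doc[k] for k in common}, "attrs": attrs}


print(single_field_indexes(queried_fields))
# ['color', 'isbn', 'pages', 'size', 'storage_gb']
print(to_attribute_pattern(products[1]))
# {'type': 'shirt', 'title': 'Logo tee', 'attrs': [{'k': 'size', 'v': 'M'}, {'k': 'color', 'v': 'blue'}]}
```

With the attribute-pattern shape, one compound index on attrs.k and attrs.v can serve the type-specific filters, at the cost of more verbose queries; whether that trade-off pays off depends on the access patterns.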
9. Schema versioning and migration
• Schemas can often be evolved without interrupting database operations.
• Handle with care!
  • Especially when multiple applications or reporting & analytical tools access the same DB!
• Transition periods & strategies matter!
10. Schema versioning and migration
Different strategies are used:
• Eager: first migrate the data, then the application
  • Does not leverage the schema flexibility of JSON
• Lazy: only update a document when it is used
  • Some documents will never be migrated!
• Incremental: migrate during periods of lower load!
• Predictive migration: based on heuristics/estimates
• Combinations of strategies are also possible: predictive migration first, followed by incremental
• Endless versioning: not desirable!
• See the entire chapter on this topic in the book MongoDB Data Modeling & Schema Design
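The lazy strategy can be sketched in a few lines of Python, with a dict standing in for the database. The schema change (splitting a v1 name field into first_name and last_name), the schema_version field and the function names are hypothetical; the point is that a document is upgraded and written back only when it is read, so documents that are never read stay on the old version.

```python
CURRENT_VERSION = 2


def upgrade_v1_to_v2(doc):
    """Hypothetical change: v1 stored one 'name' string, v2 splits it in two."""
    first, _, last = doc.pop("name").partition(" ")
    doc.update(first_name=first, last_name=last, schema_version=2)
    return doc


# Maps a schema version to the function that upgrades it by one step.
UPGRADES = {1: upgrade_v1_to_v2}


def read_customer(store, key):
    """Lazy migration: upgrade a document only when it is read, then write it back."""
    doc = dict(store[key])  # work on a copy, as if fetched from the database
    migrated = False
    # Documents without a schema_version are treated as v1.
    while doc.get("schema_version", 1) < CURRENT_VERSION:
        doc = UPGRADES[doc.get("schema_version", 1)](doc)
        migrated = True
    if migrated:
        store[key] = doc  # persist the new shape
    return doc


store = {"c1": {"name": "Ada Lovelace"}, "c2": {"name": "Alan Turing"}}
read_customer(store, "c1")
print(store["c1"])  # {'first_name': 'Ada', 'last_name': 'Lovelace', 'schema_version': 2}
print(store["c2"])  # {'name': 'Alan Turing'}  (never read, so never migrated)
```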
11. Backward- and forward-compatibility
• No database is an island: many systems interact with it
• Avoid introducing breaking changes: they have huge impacts on agility and costs.
• Think through each evolution: use the features of schema standards (JSON Schema, Avro, etc.) to keep schemas fully compatible.
  • Consumers and producers can then upgrade at their own pace!
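A tolerant reader is one way to get both kinds of compatibility. In this sketch (the dataclasses and the loyalty_tier field are hypothetical), version 2 adds an optional field with a default value: a v2 reader fills in the default when it reads old data (backward compatibility), and a v1 reader ignores the field it does not know when it reads new data (forward compatibility). Avro's schema resolution follows similar rules for added fields with defaults and for unknown fields.

```python
from dataclasses import dataclass, fields


@dataclass
class CustomerV1:
    id: str
    name: str


@dataclass
class CustomerV2:
    id: str
    name: str
    loyalty_tier: str = "none"  # field added in v2, optional with a default


def read_as(cls, record):
    """Tolerant reader: ignore unknown fields; missing optional fields take defaults."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in record.items() if k in known})


old_record = {"id": "c1", "name": "Ada"}  # written by a v1 producer
new_record = {"id": "c2", "name": "Alan", "loyalty_tier": "gold"}  # written by a v2 producer

print(read_as(CustomerV2, old_record))  # CustomerV2(id='c1', name='Ada', loyalty_tier='none')
print(read_as(CustomerV1, new_record))  # CustomerV1(id='c2', name='Alan')
```

Making every new field optional with a default is what lets producers and consumers upgrade independently; removing or renaming a required field would break one side.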
12. Choice of partition / sharding keys
• NoSQL databases offer horizontal scaling
  • through distribution of data across servers, data centers and geographies
• Requires careful design
  • find a scalable way to retrieve information efficiently when serving queries, from a minimal number of shards
  • ideally, a query should hit a single shard!
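The effect of the shard key on query routing can be simulated with a simplified hash-based router. Here the shards are plain Python lists and customer_id is an assumed shard key; the data and names are hypothetical, and a real router (such as MongoDB's mongos) is far more elaborate. A filter that includes the shard key is routed to one shard; any other filter must be broadcast to every shard (scatter-gather).

```python
import zlib

NUM_SHARDS = 4
SHARD_KEY = "customer_id"  # assumed shard key for this sketch


def shard_for(value):
    """Hash-based routing: a stable hash of the shard-key value picks the shard."""
    # zlib.crc32 is used because Python's built-in hash() of a str varies per process.
    return zlib.crc32(str(value).encode("utf-8")) % NUM_SHARDS


shards = [[] for _ in range(NUM_SHARDS)]  # each list stands in for one shard


def insert(doc):
    shards[shard_for(doc[SHARD_KEY])].append(doc)


def find(filter_):
    """Route to one shard if the filter has the shard key, else scatter-gather."""
    if SHARD_KEY in filter_:
        targets = [shard_for(filter_[SHARD_KEY])]
    else:
        targets = list(range(NUM_SHARDS))
    hits = [doc for i in targets for doc in shards[i]
            if all(doc.get(k) == v for k, v in filter_.items())]
    return hits, len(targets)


for n in range(100):
    insert({"customer_id": f"cust{n % 10}", "order_no": n,
            "status": "open" if n % 2 else "closed"})

hits, touched = find({"customer_id": "cust3"})
print(len(hits), touched)  # 10 1
hits, touched = find({"status": "open"})
print(len(hits), touched)  # 50 4
```

This is why a shard key is usually chosen from the fields that the most frequent access patterns filter on; a key with only a few distinct values also cannot spread data across many shards.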
13. Facilitating communication and collaboration
• The purpose of all Data Modeling!!!
• “Looking at the code” vs. sharing an ERD picture
• Data modeling and schema design for NoSQL databases and data formats provide guardrails in the face of the unlimited flexibility and power of NoSQL.
14. Reading material
• See the Hackolade online documentation
• The Hackolade Blog
• This excellent new book: MongoDB Data Modeling & Schema Design
  • Many of the principles in the book relate to query-driven modeling based on access patterns!
• Hackolade on social media: LinkedIn page, Twitter page
• Download Hackolade Studio for free