Semi Formal Model for Document Oriented Databases
Daniel Coupal
Universia.com
Agenda
1. Why Have a Model?
2. Modeling Steps
3. Capturing the Model
4. Tools
Why Have a Model?
• Documentation, common language
• Repeatable process
• Abstraction from database implementations
• Support for tools
• A document DB is supposed to be “schemaless”!
• No! Having a schema is a good thing.
Having to declare everything is the problem.
What if you have many apps?
• Info about the schema is in the code of Application A.
• Application B wants to read the data in the DB.
• Where is the description of what it can read, write, ...?
Why Do We Choose NoSQL?
• Rewards
• Huge amounts of data
• Cheap hardware
• Blazing fast
• Compromises
• No joins, no transactions, less integrity
• Less mature technology
• Fewer tools
Trade-off between Performance and Data Integrity
NoSQL's Little Secrets
• No long-term experience maintaining databases and apps over the years, and maintenance is the most expensive activity in software development.
• Not all of today's vendors will still be around in a few years.
• What if your DB is no longer maintained?
• What if a better DB becomes available?
NoSQL State of the Art
• Designing by Example
• Used in most tutorials
• Works well on small examples, like blogs
• A database with more tables needs a better way to capture the design
{
  "_id" : ObjectId("508d27069cc1ae293b36928d"),
  "title" : "This is the title",
  "body" : "This is the body text.",
  "tags" : [
    "chocolate",
    "spleen",
    "piano",
    "spatula"
  ],
  "created_date" : ISODate("2012-10-28T12:41:39.110Z"),
  "author_id" : ObjectId("508d280e9cc1ae293b36928e"),
  "category_id" : ObjectId("508d29709cc1ae293b369295"),
  "comments" : [
    {
      "subject" : "This is comment 1",
      "body" : "This is the body of comment 1.",
      "author_id" : ObjectId("508d345f9cc1ae293b369296"),
      "created_date" : ISODate("2012-10-28T13:34:23.929Z")
    },
    {
      "subject" : "This is comment 2",
      "body" : "This is the body of comment 2.",
      "author_id" : ObjectId("508d34739cc1ae293b369297"),
      "created_date" : ISODate("2012-10-28T13:34:43.192Z")
    }
  ]
}
NoSQL State of the Art
[Figure: a complex ER diagram]
Northwind ER Diagram
[Figure: the Northwind ER diagram]
Northwind Doc Diagram
[Diagram: the Northwind tables Products, Suppliers, Orders, Employees, Customers, Customer Demographics, Shippers, OrderDetails, Region, Categories and Territories grouped into 5 collections]
11 tables in those 5 collections
No need for:
- CustomerCustomerDemographics
- EmployeeTerritories
because they only implement N-to-N relationships and contain no data of their own.
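Dropping a join table such as EmployeeTerritories means the N-to-N link has to live inside the documents. A minimal Python sketch, assuming the link is embedded as an array of territory IDs in each employee document; the field name `TerritoryIDs`, the function `embed_territories` and the sample rows are illustrative, not taken from the slides.

```python
from collections import defaultdict

def embed_territories(employees: list[dict], pairs: list[tuple[int, str]]) -> list[dict]:
    """Fold (EmployeeID, TerritoryID) join-table rows into each employee document."""
    by_employee: dict[int, list[str]] = defaultdict(list)
    for employee_id, territory_id in pairs:
        by_employee[employee_id].append(territory_id)
    # Copy each employee and attach its territories; .get() avoids inserting
    # empty lists into the defaultdict for employees without territories.
    return [
        {**emp, "TerritoryIDs": by_employee.get(emp["EmployeeID"], [])}
        for emp in employees
    ]

employees = [
    {"EmployeeID": 1, "LastName": "Davolio"},
    {"EmployeeID": 2, "LastName": "Fuller"},
    {"EmployeeID": 3, "LastName": "Leverling"},
]
employee_territories = [(1, "06897"), (1, "19713"), (2, "01581")]

for doc in embed_territories(employees, employee_territories):
    print(doc)
```

Embedding the IDs on the employee side favors queries that start from an employee; queries that start from a territory would rely on an index over the `TerritoryIDs` array instead.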
That was a bad example...
• Why?
• With a document database, you don’t model data as your first step!
• Data is modeled based on the usage
• SQL’s model-first approach leads to suboptimal performance for every app.
NoSQL does the opposite.
Modeling Steps
             SQL                       NoSQL
Goal         general usage             current usage
Answer to    what answer do I have?    what questions do I have?
Step 1       model data                write queries
Step 2       write application         add indexes
Step 3       write queries             model data
Step 4       add indexes               write application
Step 1: Write Queries
• Basic fields to retrieve
• Frequency of the query and required speed
• Criticality of the query for the system
• Design notes
➡ Sort the queries by importance
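The Step 1 checklist can be captured as data and sorted mechanically. A minimal Python sketch, assuming a hypothetical `QuerySpec` record and an importance rule chosen for illustration (critical queries first, then the most frequent, then the tightest latency target); the slides do not prescribe a formula. REQ002's figures (20000/day, 2 ms) come from the query example later in the deck; REQ001 and REQ003 are made up.

```python
from dataclasses import dataclass, field

@dataclass
class QuerySpec:
    """One entry of the Step 1 query list (field names are illustrative)."""
    qid: str
    fields: list[str]       # basic fields to retrieve
    per_day: int            # frequency of the query
    max_ms: float           # requested speed
    critical: bool          # criticality of the query for the system
    notes: list[str] = field(default_factory=list)  # design notes

def by_importance(queries: list[QuerySpec]) -> list[QuerySpec]:
    """Assumed ranking: critical first, then most frequent, then tightest latency."""
    return sorted(queries, key=lambda q: (not q.critical, -q.per_day, q.max_ms))

queries = [
    QuerySpec("REQ001", ["OrderID", "CustomerID"], per_day=500, max_ms=50.0, critical=False),
    QuerySpec("REQ002", ["ProductName"], per_day=20000, max_ms=2.0, critical=True),
    QuerySpec("REQ003", ["CustomerID", "ContactName"], per_day=3000, max_ms=10.0, critical=False),
]

for q in by_importance(queries):
    print(q.qid, q.per_day, q.max_ms)
```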
Step 2: Add Indexes
• Which indexes do you need for the queries to run fast?
• Attributes of your indexes
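One way to start Step 2 is to derive candidate indexes straight from the query filters written in Step 1. A minimal Python sketch of a hypothetical helper, `candidate_indexes`, that proposes one key list per distinct equality filter and drops any key list that is a prefix of another, since a MongoDB compound index can also serve queries on its leading fields. It is a heuristic only: it ignores sort order, range predicates, selectivity and filters that list the same fields in a different order.

```python
def candidate_indexes(filters: dict[str, list[dict]]) -> dict[str, list[tuple[str, ...]]]:
    """Propose index key lists per collection from MongoDB-style equality filters."""
    proposals = {}
    for collection, query_filters in filters.items():
        keys = {tuple(f) for f in query_filters if f}   # field names, in filter order
        # A compound index also serves queries on its leading fields, so a key
        # list that is a strict prefix of another one is redundant.
        kept = [k for k in keys
                if not any(other != k and other[:len(k)] == k for other in keys)]
        proposals[collection] = sorted(kept)
    return proposals

# Hypothetical workload for two collections.
workload = {
    "Products": [
        {"ProductName": "abcde"},
        {"ProductName": "abcde", "SupplierID": 4},
        {"ProductID": 17},
    ],
    "Orders": [{"CustomerID": "ALFKI"}],
}
print(candidate_indexes(workload))
```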
Step 3: Model Data
• List the collections
• How many documents per collection?
➡ NoSQL is all about size and performance, no?
• Attributes of the collections (capped, ...)
• List the fields, their types and constraints
➡ Only for the important fields
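The “how many documents per collection?” question feeds a back-of-the-envelope size estimate. A minimal Python sketch; the document counts, average document sizes and RAM figure are hypothetical, and the estimate deliberately ignores indexes, storage overhead and compression.

```python
def estimate_bytes(doc_count: int, avg_doc_bytes: int) -> int:
    """Raw data size of a collection: number of documents x average document size."""
    return doc_count * avg_doc_bytes

# Hypothetical figures for two Northwind-style collections.
collections = {
    "Orders": (2_000_000, 1_500),   # orders embed their details, so documents are larger
    "Products": (80_000, 600),
}
ram_bytes = 4 * 10**9  # hypothetical 4 GB of RAM available for data

sizes = {name: estimate_bytes(*figures) for name, figures in collections.items()}
total = sum(sizes.values())
for name, size in sizes.items():
    print(f"{name}: {size / 1e9:.3f} GB")
print(f"total: {total / 1e9:.3f} GB, fits in RAM: {total <= ram_bytes}")
```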
Step 4: Write Application
• Integration of code, driver, queries and database
• Balance between using the product’s functionality and isolating the layer that deals with the database.
• Interesting new tools that normalize to a common query language: JSONiq, BigSQL, ...
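The isolation point in Step 4 can be sketched as a small interface between application logic and storage. A minimal Python sketch, assuming hypothetical names (`ProductStore`, `InMemoryProductStore`, `is_available`) and the Northwind field `UnitsInStock`; the in-memory class is only a stand-in backend, not a database driver.

```python
from typing import Optional, Protocol

class ProductStore(Protocol):
    """The only database-facing surface the application sees."""
    def find_by_name(self, name: str) -> Optional[dict]: ...

class InMemoryProductStore:
    """Stand-in backend for tests; a MongoDB-backed class would expose the same method."""
    def __init__(self, docs: list[dict]) -> None:
        self._by_name = {doc["ProductName"]: doc for doc in docs}

    def find_by_name(self, name: str) -> Optional[dict]:
        return self._by_name.get(name)

def is_available(store: ProductStore, name: str) -> bool:
    """Application logic for "get product by name": written against the interface only."""
    product = store.find_by_name(name)
    return product is not None and product.get("UnitsInStock", 0) > 0

store = InMemoryProductStore([
    {"ProductName": "abcde", "UnitsInStock": 12},
    {"ProductName": "fghij", "UnitsInStock": 0},
])
print(is_available(store, "abcde"), is_available(store, "fghij"), is_available(store, "zzz"))
```

A MongoDB-backed store could implement `find_by_name` with pymongo's `Collection.find_one({"ProductName": name})`; swapping backends then leaves `is_available` untouched, which is the balance the slide describes.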
Capturing the Model
• JSON is a cool format!
• Your document database is a cool storage facility!
• Language for the model: JSON Schema
• supports things like: types, cardinality, references, acceptable values, ...
JSON Schema
JSON document:
{
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York"
  },
  "phoneNumber": [
    {
      "type": "home",
      "number": "212 555-1234"
    }
  ]
}
Its JSON Schema:
{
  "type": "object",
  "properties": {
    "address": {
      "type": "object",
      "properties": {
        "city": {
          "type": "string"
        },
        "streetAddress": {
          "type": "string"
        }
      }
    },
    "phoneNumber": {
      "type": "array",
      "items": {
        "properties": {
          "number": {
            "type": "string"
          },
          "type": {
            "type": "string"
          }
        }
      }
    }
  }
}
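To show what a validator does with a schema like the one above, here is a minimal Python sketch that understands only `type`, `properties`, `items` and `required`. The function names are hypothetical and this is not a complete JSON Schema implementation; a real project would rather use an existing validator, such as the Python `jsonschema` package, which covers the whole specification.

```python
# Python types accepted for each JSON Schema type name (subset of the spec).
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "null": type(None),
}

def _matches(value, type_name: str) -> bool:
    # bool is a subclass of int in Python, so keep it out of number/integer.
    if type_name in ("number", "integer") and isinstance(value, bool):
        return False
    return isinstance(value, _TYPES.get(type_name, ()))

def validate(instance, schema: dict, path: str = "$") -> list[str]:
    """Return error messages; understands only type, properties, items and required."""
    expected = schema.get("type")
    if expected is not None:
        names = expected if isinstance(expected, list) else [expected]
        if not any(_matches(instance, n) for n in names):
            return [f"{path}: expected {expected}, got {type(instance).__name__}"]
    errors = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required field {key!r}")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    if isinstance(instance, list) and isinstance(schema.get("items"), dict):
        for i, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors

# The schema from the slide, written as a Python dict.
schema = {
    "type": "object",
    "properties": {
        "address": {
            "type": "object",
            "properties": {"city": {"type": "string"}, "streetAddress": {"type": "string"}},
        },
        "phoneNumber": {
            "type": "array",
            "items": {"properties": {"number": {"type": "string"}, "type": {"type": "string"}}},
        },
    },
}

good = {"address": {"streetAddress": "21 2nd Street", "city": "New York"},
        "phoneNumber": [{"type": "home", "number": "212 555-1234"}]}
bad = {"address": {"city": 10}, "phoneNumber": [{"number": 2125551234}]}
print(validate(good, schema))
print(validate(bad, schema))
```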
Model: Query
• Use:
• the native DB notation
• or use SQL (everyone can read SQL)
• Avoid joins!!!
• Example:
• Product by ProductID, ProductName, SupplierID
• Order by OrderID, CustomerID, ContactName
• Customer by CustomerID, ContactName, OrderID
Example
{
  "id" : "REQ002",
  "name" : "Get product by name",
  "n" : "20000/day",
  "t" : "2 ms",
  "notes" : [
    "User asking about a product availability by product name"
  ],
  "sqlquery" : "select * from product where product.ProductName = 'abcde'",
  "mongoquery" : {
    "ProductName" : "abcde"
  }
}
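Because the query description is plain JSON, a tool can turn its frequency (`n`) and target time (`t`) into numbers. A minimal Python sketch; the slide leaves these fields free-form, so the formats parsed here (`<count>/<period>` and `<value> ms`) and the helper names are assumptions.

```python
import json

# Requirement document from the slide, trimmed to the fields used here.
REQ002 = json.loads("""
{
  "id": "REQ002",
  "name": "Get product by name",
  "n": "20000/day",
  "t": "2 ms",
  "mongoquery": {"ProductName": "abcde"}
}
""")

# Assumed vocabulary for the free-form "n" field.
_PERIOD_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}

def rate_per_second(n: str) -> float:
    """Average rate for a frequency written as "<count>/<period>"."""
    count, period = n.split("/")
    if period not in _PERIOD_SECONDS:
        raise ValueError(f"unknown period {period!r}")
    return int(count) / _PERIOD_SECONDS[period]

def target_ms(t: str) -> float:
    """Target response time written as "<value> ms"."""
    value, unit = t.split()
    if unit != "ms":
        raise ValueError(f"unsupported unit {unit!r}")
    return float(value)

rate = rate_per_second(REQ002["n"])
print(f"{REQ002['id']}: {rate:.3f} queries/s on average, target {target_ms(REQ002['t'])} ms")
```

An average of about 0.23 queries per second looks harmless; whether 20000/day is spread evenly or arrives in peaks is not captured by the slide's format, so a peak rate may be worth recording as a separate field.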
Model: Index
• Again, use the native DB notation
• Example:
• Product.ProductID, .ProductName, .SupplierID
• Order.OrderID, .CustomerID, .ContactName
• Customer.CustomerID, .ContactName, .OrderID
• Why is this useful, when it looks so trivial?
• Once it is written down, a tool can validate it or create estimates
Example
{
  "id" : "REQ002",
  "name" : "Get product by name",
  "n" : "20000/day",
  "t" : "2 ms",
  "notes" : [
    "User asking about a product availability by product name"
  ],
  "sqlquery" : "select * from product where product.ProductName = 'abcde'",
  "mongoquery" : {
    "ProductName" : "abcde"
  },
  "index" : {
    "collection" : "Products",
    "field" : "ProductName"
  }
}
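With the index written next to the query, a tool can cross-check the two. A minimal Python sketch of a hypothetical `check_requirement` function: it verifies that the declared index exists in the index model from the “Model: Index” slide and that its field is actually used by the query. The dict layout of the index model is an assumption, and collection names follow the JSON examples (“Products”) rather than the singular names in the slide's list.

```python
# Index model from the "Model: Index" slide, as collection -> indexed fields.
INDEX_MODEL = {
    "Products": {"ProductID", "ProductName", "SupplierID"},
    "Orders": {"OrderID", "CustomerID", "ContactName"},
    "Customers": {"CustomerID", "ContactName", "OrderID"},
}

def check_requirement(req: dict, index_model: dict[str, set[str]]) -> list[str]:
    """Cross-check a requirement's declared index against its query and the index model."""
    index = req.get("index")
    if index is None:
        return [f"{req['id']}: no index declared"]
    collection, field = index["collection"], index["field"]
    problems = []
    if field not in index_model.get(collection, set()):
        problems.append(f"{req['id']}: {collection}.{field} is missing from the index model")
    if field not in req["mongoquery"]:
        problems.append(f"{req['id']}: index field {field!r} is not used by the query")
    return problems

req002 = {
    "id": "REQ002",
    "mongoquery": {"ProductName": "abcde"},
    "index": {"collection": "Products", "field": "ProductName"},
}
req_bad = {  # hypothetical requirement with a mismatched index
    "id": "REQ999",
    "mongoquery": {"ProductName": "abcde"},
    "index": {"collection": "Products", "field": "UnitPrice"},
}
for req in (req002, req_bad):
    print(req["id"], check_requirement(req, INDEX_MODEL) or "OK")
```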
Model: Data
• Collection
• One JSON-Schema document per collection
• Fields naming the collection and the database
• Optionally, add a version number
Example for ‘Orders’
{
  "database" : "northwind",
  "collection" : "Orders",
  "version" : 1,
  "type": "object",
  "$schema": "http://json-schema.org/draft-04/schema#",
  "id": "http://jsonschema.net",
  "properties": {
    "CustomerID": {
      "type": "string",
      "id": "http://jsonschema.net/CustomerID"
    },
    "Details": {
      "type": "array",
      "id": "http://jsonschema.net/Details",
      "items": {
        "type": "object",
        "id": "http://jsonschema.net/Details/0",
        "required": [ "ProductID", "Quantity" ],
        "properties": {
          "ProductID": {
            "type": "number",
            "id": "http://jsonschema.net/Details/0/ProductID"
          },
          "Quantity": {
            "type": "number"
          },
          ...
Simpler...
{
  "database" : "northwind",
  "collection" : "Orders",
  "version" : 1,
  "type": "object",
  "properties": {
    "CustomerID": {
      "type": "string"
    },
    "Details": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "ProductID": {
            "type": "number"
          },
          "Quantity": {
            "type": "number"
          },
          ...
Model: Versioning
• Each modified version of a collection’s schema is a new document
• db.<database>.find({"version": 2})
➡ shows the schemas of all collections for version 2 of the DB’s schema.
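The versioning query can be mirrored in memory to show what a schema tool would do with it. A minimal Python sketch with hypothetical schema documents and helper names. If only modified collections get a new document, as the first bullet suggests, a filter on version 2 returns just the collections that changed; the complete schema as of version 2 is then the newest document per collection with a version of at most 2, which `schema_as_of` computes.

```python
# Hypothetical schema documents: one per collection and version.
schema_docs = [
    {"database": "northwind", "collection": "Orders", "version": 1, "type": "object"},
    {"database": "northwind", "collection": "Orders", "version": 2, "type": "object"},
    {"database": "northwind", "collection": "Products", "version": 2, "type": "object"},
    {"database": "northwind", "collection": "Customers", "version": 1, "type": "object"},
]

def schemas_for_version(docs: list[dict], version: int) -> list[dict]:
    """In-memory equivalent of find({"version": version})."""
    return [d for d in docs if d.get("version") == version]

def schema_as_of(docs: list[dict], version: int) -> dict[str, dict]:
    """Newest schema document per collection whose version is <= `version`."""
    latest: dict[str, dict] = {}
    for d in docs:
        if d["version"] > version:
            continue
        current = latest.get(d["collection"])
        if current is None or d["version"] > current["version"]:
            latest[d["collection"]] = d
    return latest

print([d["collection"] for d in schemas_for_version(schema_docs, 2)])
print({name: d["version"] for name, d in schema_as_of(schema_docs, 2).items()})
```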
Partial Schema
• Example: you just want to validate the ‘version’ field, whose values are stored both as strings and as numbers
JSON Schema:
{
  "type": "object",
  "properties": {
    "version": {
      "type": "string"
    }
  }
}
JSON documents:
{
  "version": 1.0,
  ...
},
{
  "version": "1.0.1",
  ...
}
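A partial schema checks only the fields it lists and ignores everything else. A minimal Python sketch of that idea applied to the ‘version’ example; `violations`, `type_report` and the sample documents are hypothetical, and only the two JSON Schema types that appear on the slide are handled.

```python
from collections import Counter

# Only the "version" field is described; every other field is left unchecked.
partial_schema = {"type": "object", "properties": {"version": {"type": "string"}}}

# Minimal type checks for the two types that appear on the slide.
_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}

def violations(docs: list[dict], schema: dict) -> list[tuple]:
    """(_id, field, actual type) for each listed field whose value has the wrong type."""
    found = []
    for doc in docs:
        for name, rule in schema["properties"].items():
            if name in doc and not _CHECKS[rule["type"]](doc[name]):
                found.append((doc["_id"], name, type(doc[name]).__name__))
    return found

def type_report(docs: list[dict], field: str) -> Counter:
    """How often each Python type occurs for one field across the collection."""
    return Counter(type(d[field]).__name__ for d in docs if field in d)

docs = [
    {"_id": 1, "version": 1.0},
    {"_id": 2, "version": "1.0.1"},
    {"_id": 3, "title": "no version field"},  # absent fields are not checked
]
print(violations(docs, partial_schema))
print(type_report(docs, "version"))
```

To accept both representations instead of flagging one of them, JSON Schema allows `"type"` to be an array such as `["string", "number"]`.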
Tools
• Generate a JSON Schema from sample JSON:
• http://www.jsonschema.net/
• Validate your schema
• http://jsonschemalint.com/
• https://github.com/dcoupal/godbtools.git
• Validate/edit JSON
• http://jsonlint.com/ or RoboMongo
• Import SQL data into NoSQL
• Pentaho, Talend
Tools considerations
• NoSQL often relies on data being in RAM. Scanning all your data can push your working set out of memory, making it “cold” instead of “hot”
• Running incremental validations works better; make sure you have timestamps on insertions and updates
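Incremental validation only needs a timestamp and a checkpoint. A minimal Python sketch, assuming a hypothetical `updated_at` field on every document and a caller-supplied validity check; in MongoDB the selection would be a filter such as `{"updated_at": {"$gt": last_run}}` backed by an index on `updated_at`, so the pass reads only recent documents instead of scanning the whole collection.

```python
from datetime import datetime, timezone
from typing import Callable

def incremental_pass(docs: list[dict], last_run: datetime,
                     is_valid: Callable[[dict], bool]) -> tuple[list, datetime]:
    """Validate only documents inserted or updated after `last_run`.

    Returns (ids of invalid documents, new checkpoint).
    """
    changed = [d for d in docs if d["updated_at"] > last_run]
    invalid = [d["_id"] for d in changed if not is_valid(d)]
    # Strictly-greater comparison: a write that lands later with exactly the
    # checkpoint timestamp would be skipped, so real tools may keep a margin.
    checkpoint = max((d["updated_at"] for d in changed), default=last_run)
    return invalid, checkpoint

def ts(hour: int) -> datetime:
    return datetime(2020, 1, 1, hour, 0, tzinfo=timezone.utc)

docs = [
    {"_id": 1, "version": 1.0, "updated_at": ts(8)},      # old and invalid: not rescanned
    {"_id": 2, "version": "1.0.1", "updated_at": ts(10)},
    {"_id": 3, "version": 2, "updated_at": ts(11)},
]
invalid, checkpoint = incremental_pass(docs, ts(9), lambda d: isinstance(d.get("version"), str))
print(invalid, checkpoint.isoformat())
```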
Document Validator
[Diagram: the Validator takes a Schema (JSON Schema) and a Collection (JSON) as inputs]
“Eventual Integrity”
• Many NoSQL databases offer eventual consistency
• With tools that validate and fix the data according to a set of rules, we get “eventual integrity”
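The “validate and fix” half of eventual integrity can be sketched as a list of repair rules applied to each document. A minimal Python sketch with a hypothetical rule, `version_as_string`, that stores numeric versions as strings to match the string type declared in the partial schema example; which representation should win is a design decision, and writing the repaired documents back to the database is not shown.

```python
def version_as_string(doc: dict) -> dict:
    """Hypothetical repair rule: store numeric versions as strings (1.0 -> "1.0")."""
    value = doc.get("version")
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return {**doc, "version": str(value)}
    return doc

RULES = (version_as_string,)

def repair(docs: list[dict], rules=RULES) -> tuple[list[dict], int]:
    """Apply every rule to every document; return fixed documents and how many changed."""
    fixed, changed = [], 0
    for doc in docs:
        new = doc
        for rule in rules:
            new = rule(new)
        if new != doc:
            changed += 1
        fixed.append(new)
    return fixed, changed

docs = [{"_id": 1, "version": 1.0}, {"_id": 2, "version": "1.0.1"}]
fixed, changed = repair(docs)
print(fixed, changed)
```

Rules return new documents rather than mutating the originals, so running the repair a second time reports zero changes, which is a cheap check that the rules converge.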
Tools to be developed
• UI to manipulate a schema graphically
• More complete validators, covering:
• constraints
• relationships
• Per-language libraries to validate inserted/updated documents
Conclusion: Takeaways
• Design in this order: queries, indexes, data, application.
• Capture your model outside the application.
• Not having a schema is not a good thing! Use the attribute ‘schemaless’ wisely!
NoSQL
Goal         current usage
Answer to    what questions do I have?
Step 1       write queries
Step 2       add indexes
Step 3       model data
Step 4       write application
Questions?
• dcoupal@universia.com
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 

Semi Formal Model for Document Oriented Databases

  • 1. Semi Formal Model for Document Oriented Databases. Daniel Coupal, Universia.com
  • 2. Agenda: 1. Why Have a Model? 2. Modeling Steps 3. Capturing the Model 4. Tools
  • 3. Why Have a Model? • Documentation and a common language • A repeatable process • Abstraction from database implementations • Support for tools • A document DB is supposed to be “schemaless”! • No! Having a schema is a good thing. The problem is having to declare everything up front.
  • 4. What if you have many apps? Information about the schema lives in the code of Application A. Application B wants to read the data in the DB. Where is the description of what it can read, write, ...?
  • 5. Why do we choose NoSQL? • Rewards • Huge amounts of data • Cheap hardware • Blazing fast
  • 6. Why do we choose NoSQL? • Rewards • Huge amounts of data • Cheap hardware • Blazing fast • Compromises • No joins, no transactions, less integrity • Less mature technology • Fewer tools. The tradeoff is between performance and data integrity.
  • 7. NoSQL's Little Secrets • There is little experience maintaining these databases and their apps over the years, and maintenance is the most expensive activity in software development. • Not all of today's vendors will still be around in a few years. • What if your DB is no longer maintained? • What if a better DB becomes available?
  • 8. NoSQL State of the Art • Designing by example • Used in most tutorials • Works well on small examples, like blogs • A database with more tables needs a better way to capture the design
  • 9. NoSQL State of the Art (designing by example):
    {
      "_id" : ObjectId("508d27069cc1ae293b36928d"),
      "title" : "This is the title",
      "body" : "This is the body text.",
      "tags" : [ "chocolate", "spleen", "piano", "spatula" ],
      "created_date" : ISODate("2012-10-28T12:41:39.110Z"),
      "author_id" : ObjectId("508d280e9cc1ae293b36928e"),
      "category_id" : ObjectId("508d29709cc1ae293b369295"),
      "comments" : [
        {
          "subject" : "This is comment 1",
          "body" : "This is the body of comment 1.",
          "author_id" : ObjectId("508d345f9cc1ae293b369296"),
          "created_date" : ISODate("2012-10-28T13:34:23.929Z")
        },
        {
          "subject" : "This is comment 2",
          "body" : "This is the body of comment 2.",
          "author_id" : ObjectId("508d34739cc1ae293b369297"),
          "created_date" : ISODate("2012-10-28T13:34:43.192Z")
        }
      ]
    }
  • 12. Northwind Document Diagram: the 11 tables are grouped into 5 collections. Not needed: CustomerCustomerDemographics and EmployeeTerritories, because they only encode N-to-N relationships and contain no data of their own. [Diagram: Products, Suppliers, Orders, Employees, Customers, CustomerDemographics, Shippers, OrderDetails, Region, Categories, Territories]
  • 13. That was a bad example... • Why?
  • 14. That was a bad example... • Why? • With a document database, you don’t model the data as your first step! • Data is modeled based on its usage • SQL’s model-first approach gives every app less than optimal performance. NoSQL does the opposite.
  • 15. Modeling Steps
                 SQL                        NoSQL
    Goal         general usage              current usage
    Answer to    what answers do I have?    what questions do I have?
    Step 1       model data                 write queries
    Step 2       write application          add indexes
    Step 3       write queries              model data
    Step 4       add indexes                write application
  • 16. Step 1: Write Queries • Basic fields to retrieve • Frequency of the query and the required speed • Criticality of the query for the system • Design notes ➡ Sort the queries by importance
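A minimal sketch of capturing Step 1 as data and sorting it, assuming a hypothetical `QueryRequirement` record. The ordering rule (critical first, then more frequent, then tighter latency) is an assumption, not a rule from the slides; apart from the 20000/day and 2 ms figures for "Get product by name", the numbers are made up.

```python
from dataclasses import dataclass, field


@dataclass
class QueryRequirement:
    """Hypothetical record for one query requirement (field names are illustrative)."""
    name: str
    fields: list[str]   # basic fields to retrieve
    per_day: int        # expected frequency
    max_ms: float       # required speed (latency target)
    critical: bool      # is the query critical for the system?
    notes: list[str] = field(default_factory=list)


def sort_by_importance(queries: list[QueryRequirement]) -> list[QueryRequirement]:
    # Assumed ordering: critical queries first, then higher frequency,
    # then the tighter latency target.
    return sorted(queries, key=lambda q: (not q.critical, -q.per_day, q.max_ms))


queries = [
    QueryRequirement("Order by OrderID", ["OrderID", "CustomerID"], 5_000, 5, True),
    QueryRequirement("Get product by name", ["ProductName"], 20_000, 2, False),
    QueryRequirement("Customer by ContactName", ["ContactName"], 300, 50, False),
]

for q in sort_by_importance(queries):
    print(q.name, q.per_day, q.max_ms)
```

The sorted list is the input to Steps 2 and 3: the queries at the top are the ones whose indexes and document shape matter most.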
  • 17. Step 2: Add Indexes • Which indexes do the queries need to run fast? • Attributes of your indexes
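One way to start Step 2 is to derive index candidates mechanically from the queries written in Step 1. This sketch assumes each query is a collection name plus a MongoDB-style equality filter; the filters and values are hypothetical. Deciding which candidates to merge into compound indexes, and in what field order, is left to the designer.

```python
# Hypothetical Step 1 queries: (collection, MongoDB-style equality filter).
queries = [
    ("Products", {"ProductName": "abcde"}),
    ("Products", {"SupplierID": 7}),
    ("Orders", {"CustomerID": "ALFKI"}),
    ("Orders", {"CustomerID": "ALFKI", "OrderID": 10643}),
]


def index_candidates(queries):
    """Return, per collection, the distinct sets of fields used in query filters."""
    candidates: dict[str, set[tuple[str, ...]]] = {}
    for collection, filt in queries:
        # Sort the filter keys so {a, b} and {b, a} collapse into one candidate.
        key = tuple(sorted(filt))
        candidates.setdefault(collection, set()).add(key)
    return candidates


for collection, keys in index_candidates(queries).items():
    print(collection, sorted(keys))
```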
  • 18. Step 3: Model Data • List the collections • How many documents per collection? ➡ NoSQL is all about size and performance, isn't it? • Attributes of the collections (capped, ...) • List the fields, their types and constraints ➡ Only for the important fields
  • 19. Step 4: Write Application • Integration code/driver/queries/database • Strike a balance between using the product's functionality and isolating the layer that deals with the database. • Interesting new tools normalize to a common query language: JSONiq, BigSQL, ...
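Since the slide stresses document counts and size, a back-of-the-envelope estimate can sit next to the collection list. This sketch assumes a hypothetical description of each collection as (expected document count, average document size in bytes, attributes); all numbers are invented, and the estimate covers raw data only, not indexes or storage overhead.

```python
# Hypothetical collection descriptions for Step 3; every number here is made up.
collections = {
    # name: (expected document count, average document size in bytes, attributes)
    "Products": (80_000, 500, {}),
    "Orders": (2_000_000, 1_500, {}),
    "Logs": (1_000_000, 200, {"capped": True}),
}


def estimated_bytes(doc_count: int, avg_doc_bytes: int) -> int:
    # Raw data size only; indexes and storage overhead are not included.
    return doc_count * avg_doc_bytes


total = 0
for name, (count, avg, attrs) in collections.items():
    size = estimated_bytes(count, avg)
    total += size
    # Attributes are recorded alongside the estimate but do not change it.
    print(f"{name}: {size / 2**20:.1f} MiB {attrs}")
print(f"total: {total / 2**30:.2f} GiB")
```

Comparing the total with the RAM available gives an early hint of whether the working set can stay in memory.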
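A sketch of the "isolating the layer" side of that balance: the application codes against a small interface, and the database-specific class sits behind it. `ProductStore` and `InMemoryProductStore` are hypothetical names; a driver-backed class would implement the same two methods. The sample product uses Northwind field names.

```python
from typing import Protocol


class ProductStore(Protocol):
    """Hypothetical interface the application depends on instead of a driver."""

    def find_by_name(self, name: str) -> list[dict]: ...

    def insert(self, product: dict) -> None: ...


class InMemoryProductStore:
    """Stand-in implementation; a database-backed class would expose the same methods."""

    def __init__(self) -> None:
        self._docs: list[dict] = []

    def find_by_name(self, name: str) -> list[dict]:
        return [d for d in self._docs if d.get("ProductName") == name]

    def insert(self, product: dict) -> None:
        self._docs.append(dict(product))  # store a copy, like a DB would


def is_available(store: ProductStore, name: str) -> bool:
    # Application logic only depends on the ProductStore interface.
    return any(d.get("UnitsInStock", 0) > 0 for d in store.find_by_name(name))


store = InMemoryProductStore()
store.insert({"ProductName": "Chai", "UnitsInStock": 39})
print(is_available(store, "Chai"))
```

The cost of this isolation is that product-specific features (aggregation pipelines, special index types) have to be surfaced through the interface explicitly, which is exactly the balance the slide mentions.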
  • 20. Capturing the Model • JSON is a cool format! • Your document database is a cool storage facility! • Language for the model: JSON Schema • It supports things like types, cardinality, references, acceptable values, ...
  • 21. JSON Schema
    JSON:
    {
      "address": { "streetAddress": "21 2nd Street", "city": "New York" },
      "phoneNumber": [ { "type": "home", "number": "212 555-1234" } ]
    }
    JSON Schema:
    {
      "type": "object",
      "properties": {
        "address": {
          "type": "object",
          "properties": {
            "city": { "type": "string" },
            "streetAddress": { "type": "string" }
          }
        },
        "phoneNumber": {
          "type": "array",
          "items": {
            "properties": {
              "number": { "type": "string" },
              "type": { "type": "string" }
            }
          }
        }
      }
    }
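To show what a validator does with such a schema, here is a deliberately tiny checker for a subset of JSON Schema (`type`, `properties`, `items`, `required`). It is a hypothetical sketch using only the standard library, not a conforming implementation; a real project would use an existing JSON Schema library.

```python
def validate(instance, schema, path="$"):
    """Check a small subset of JSON Schema: 'type', 'properties', 'items', 'required'.
    Returns a list of error strings (empty if valid)."""
    types = {"object": dict, "array": list, "string": str, "boolean": bool}
    t = schema.get("type")
    if t == "number":
        # bool is a subclass of int in Python, but not a JSON number
        ok = isinstance(instance, (int, float)) and not isinstance(instance, bool)
    elif t in types:
        ok = isinstance(instance, types[t])
    else:
        ok = True  # no type constraint, or a type this sketch does not handle
    if not ok:
        return [f"{path}: expected {t}"]
    errors = []
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{path}: missing {name}")
        for name, sub in schema.get("properties", {}).items():
            if name in instance:
                errors += validate(instance[name], sub, f"{path}.{name}")
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors += validate(item, schema["items"], f"{path}[{i}]")
    return errors


schema = {
    "type": "object",
    "properties": {
        "address": {"type": "object", "properties": {
            "city": {"type": "string"}, "streetAddress": {"type": "string"}}},
        "phoneNumber": {"type": "array", "items": {"properties": {
            "number": {"type": "string"}, "type": {"type": "string"}}}},
    },
}
instance = {
    "address": {"streetAddress": "21 2nd Street", "city": "New York"},
    "phoneNumber": [{"type": "home", "number": "212 555-1234"}],
}
print(validate(instance, schema))
print(validate({"address": {"city": 10}}, schema))
```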
  • 22. Model: Query • Use: • the native DB notation • or SQL (everyone can read SQL) • Avoid joins!!! • Examples: • Product by ProductID, ProductName, SupplierID • Order by OrderID, CustomerID, ContactName • Customer by CustomerID, ContactName, OrderID
  • 23. Example
    {
      "id" : "REQ002",
      "name" : "Get product by name",
      "n" : "20000/day",
      "t" : "2 ms",
      "notes" : [
        "User asking about a product's availability by product name"
      ],
      "sqlquery" : "select * from product where product.ProductName = 'abcde'",
      "mongoquery" : {
        "ProductName" : "abcde"
      }
    }
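Because "n" and "t" are written as strings, a tool can turn them into numbers and estimate the load a query puts on the database. This sketch assumes "n" always has the form "count/unit" and "t" the form "value ms"; both formats are inferred from this one example.

```python
import json

# The REQ002 requirement, with straight quotes so it parses as JSON.
req = json.loads("""
{
  "id": "REQ002",
  "name": "Get product by name",
  "n": "20000/day",
  "t": "2 ms",
  "mongoquery": {"ProductName": "abcde"}
}
""")

SECONDS_PER = {"s": 1, "min": 60, "h": 3600, "day": 86_400}


def per_second(n: str) -> float:
    # "20000/day" -> 20000 / 86400 queries per second (assumed "count/unit" format)
    count, unit = n.split("/")
    return float(count) / SECONDS_PER[unit]


def milliseconds(t: str) -> float:
    # "2 ms" -> 2.0 (assumed "<value> ms" format)
    value, unit = t.split()
    if unit != "ms":
        raise ValueError("this sketch only handles milliseconds")
    return float(value)


rate = per_second(req["n"])                   # average queries per second
busy = rate * milliseconds(req["t"]) / 1000   # seconds of query time per second
print(f"{rate:.3f} q/s, {busy:.6f} s of DB time per second")
```

This is an average over the day; a real estimate would also need the peak rate, which the requirement format above does not capture.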
  • 24. Model: Index • Again, use the native DB notation • Examples: • Product.ProductID, .ProductName, .SupplierID • Order.OrderID, .CustomerID, .ContactName • Customer.CustomerID, .ContactName, .OrderID • Why is it useful, if it looks so trivial? • Once it is written down, a tool can validate it or produce estimates
  • 25. Example
    {
      "id" : "REQ002",
      "name" : "Get product by name",
      "n" : "20000/day",
      "t" : "2 ms",
      "notes" : [
        "User asking about a product's availability by product name"
      ],
      "sqlquery" : "select * from product where product.ProductName = 'abcde'",
      "mongoquery" : {
        "ProductName" : "abcde"
      },
      "index" : {
        "collection" : "Products",
        "field" : "ProductName"
      }
    }
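With the index recorded next to the query, the "a tool can validate it" idea becomes a few lines of code. This hypothetical check flags requirements whose declared index field does not appear in the query filter; REQ999 is an invented entry that fails on purpose.

```python
# Requirements in the slide's format; REQ999 is a hypothetical, deliberately wrong entry.
requirements = [
    {
        "id": "REQ002",
        "mongoquery": {"ProductName": "abcde"},
        "index": {"collection": "Products", "field": "ProductName"},
    },
    {
        "id": "REQ999",
        "mongoquery": {"SupplierID": 7},
        "index": {"collection": "Products", "field": "ProductName"},
    },
]


def check_indexes(reqs: list[dict]) -> list[str]:
    """Return the ids of requirements whose filter does not use the declared index field."""
    problems = []
    for r in reqs:
        index = r.get("index")
        if index is None or index["field"] not in r["mongoquery"]:
            problems.append(r["id"])
    return problems


print(check_indexes(requirements))
```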
  • 26. Model: Data • Collections • One JSON Schema document per collection • Fields for the collection and database names • Optionally, a version number
  • 27. Example for ‘Orders’
    {
      "database" : "northwind",
      "collection" : "Orders",
      "version" : 1,
      "type" : "object",
      "$schema" : "http://json-schema.org/draft-04/schema#",
      "id" : "http://jsonschema.net",
      "properties" : {
        "CustomerID" : {
          "type" : "string",
          "id" : "http://jsonschema.net/CustomerID"
        },
        "Details" : {
          "type" : "array",
          "id" : "http://jsonschema.net/Details",
          "items" : {
            "type" : "object",
            "id" : "http://jsonschema.net/Details/0",
            "required" : [ "ProductID", "Quantity" ],
            "properties" : {
              "ProductID" : {
                "type" : "number",
                "id" : "http://jsonschema.net/Details/0/ProductID"
              },
              "Quantity" : {
                "type" : "number"
              },
              ...
  • 28. Simpler...
    {
      "database" : "northwind",
      "collection" : "Orders",
      "version" : 1,
      "type" : "object",
      "properties" : {
        "CustomerID" : { "type" : "string" },
        "Details" : {
          "type" : "array",
          "items" : {
            "type" : "object",
            "properties" : {
              "ProductID" : { "type" : "number" },
              "Quantity" : { "type" : "number" },
              ...
  • 29. Model: Versioning • Each modified version of a collection's schema is a new document • db.<database>.find({"version": 2}) ➡ shows all collections for version 2 of the DB's schema.
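If only modified collections get a new schema document, an exact match on version 2 returns just the collections that changed in version 2. A full snapshot of version 2 can be reconstructed by taking, for each collection, the newest document whose version is at most 2. The snapshot reading is an assumption, not something the slides state; the schema documents below are hypothetical.

```python
# Hypothetical schema documents: one per collection per modified version.
schema_docs = [
    {"database": "northwind", "collection": "Orders", "version": 1},
    {"database": "northwind", "collection": "Products", "version": 1},
    {"database": "northwind", "collection": "Orders", "version": 2},
]


def find_version(docs, version):
    """In-memory equivalent of an exact-match find({"version": version})."""
    return [d["collection"] for d in docs if d["version"] == version]


def schema_as_of(docs, version):
    """Newest schema document of each collection with a version <= `version`."""
    latest = {}
    for d in docs:
        if d["version"] <= version:
            current = latest.get(d["collection"])
            if current is None or d["version"] > current["version"]:
                latest[d["collection"]] = d
    return latest


print(find_version(schema_docs, 2))
print({name: d["version"] for name, d in schema_as_of(schema_docs, 2).items()})
```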
  • 30. Partial Schema • Example: you only want to validate the ‘version’ field, whose values are sometimes strings and sometimes numbers.
    JSON Schema:
    {
      "type": "object",
      "properties": {
        "version": { "type": [ "string", "number" ] }
      }
    }
    JSON:
    { "version": 1.0, ... },
    { "version": "1.0.1", ... }
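A partial schema lets a check ignore every field except the one it cares about. This hypothetical sketch handles just the union type on `version` and treats a missing `version` as acceptable, since the partial schema does not mark it as required.

```python
# Partial schema: only 'version' is described; all other fields are ignored.
partial_schema = {
    "type": "object",
    "properties": {"version": {"type": ["string", "number"]}},
}

PY_TYPES = {"string": (str,), "number": (int, float)}


def version_ok(doc: dict, schema: dict = partial_schema) -> bool:
    allowed = schema["properties"]["version"]["type"]
    if "version" not in doc:
        return True  # not required by this partial schema
    value = doc["version"]
    if isinstance(value, bool):
        return False  # bool is a subclass of int in Python, but not a JSON number
    return any(isinstance(value, PY_TYPES[t]) for t in allowed)


docs = [{"version": 1.0, "name": "a"}, {"version": "1.0.1"}, {"version": [1, 0]}]
print([version_ok(d) for d in docs])
```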
  • 31. Tools • Generate a JSON Schema from JSON: • http://www.jsonschema.net/ • Validate your schema • http://jsonschemalint.com/ • https://github.com/dcoupal/godbtools.git • Validate/edit JSON • http://jsonlint.com/ or RoboMongo • Import SQL into NoSQL • Pentaho, Talend
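Generating a schema from an example document is also easy to sketch locally, in the spirit of the jsonschema.net tool (this is not how that tool is implemented). It assumes arrays are homogeneous, so the first element stands for all items; the output is a starting point to edit by hand, not a finished model.

```python
import json


def infer_schema(value):
    """Derive a minimal JSON Schema (types and properties only) from one example value."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, (int, float)):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        # Assumes a homogeneous array: the first element stands for all items.
        return {"type": "array", "items": infer_schema(value[0])} if value else {"type": "array"}
    return {"type": "object",
            "properties": {k: infer_schema(v) for k, v in value.items()}}


example = json.loads('{"address": {"city": "New York"}, "phoneNumber": [{"type": "home"}]}')
print(json.dumps(infer_schema(example), indent=2))
```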
  • 32. Tool Considerations • NoSQL databases often rely on the data being in RAM. Scanning all your data can turn your in-memory dataset “cold” instead of “hot”. • Incremental validations work better, so make sure you have timestamps on insertions and updates.
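An incremental validation pass only looks at documents changed since the last run and then advances a checkpoint. This sketch assumes a hypothetical `updated_at` field set on every insert and update, and uses a placeholder validity rule; against MongoDB the selection step would be a filter such as `{"updated_at": {"$gt": since}}`. A production job would also need to handle writes that land with the same timestamp as the checkpoint.

```python
from datetime import datetime, timezone

# Hypothetical documents carrying an 'updated_at' timestamp on every insert/update.
docs = [
    {"_id": 1, "version": 1, "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"_id": 2, "version": [1], "updated_at": datetime(2024, 3, 1, tzinfo=timezone.utc)},
    {"_id": 3, "version": "2", "updated_at": datetime(2024, 3, 2, tzinfo=timezone.utc)},
]


def is_valid(doc: dict) -> bool:
    # Placeholder rule: 'version' must be a string or a number.
    v = doc.get("version")
    return isinstance(v, (str, int, float)) and not isinstance(v, bool)


def incremental_check(docs, since: datetime):
    """Validate only documents changed after `since`; return (invalid ids, new checkpoint)."""
    changed = [d for d in docs if d["updated_at"] > since]
    invalid = [d["_id"] for d in changed if not is_valid(d)]
    checkpoint = max((d["updated_at"] for d in changed), default=since)
    return invalid, checkpoint


last_run = datetime(2024, 2, 1, tzinfo=timezone.utc)
invalid, last_run = incremental_check(docs, last_run)
print(invalid, last_run.isoformat())
```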
  • 34. “Eventual Integrity” • Many NoSQL databases offer eventual consistency • With tools that validate and fix the data according to a set of rules, we get “eventual integrity”
  • 35. Tools to Be Developed • A UI to manipulate a schema graphically • More complete validators: • constraints • relationships • A per-language library to validate inserted/updated documents
  • 36. Conclusion: Takeaways • Design in this order: queries, indexes, data, application. • Capture your model outside the application. • Not having a schema is not a good thing! Use the “schemaless” attribute wisely! [Recap, NoSQL: Goal: current usage; Answer to: what questions do I have?; Step 1: write queries; Step 2: add indexes; Step 3: model data; Step 4: write application]