Semi-Formal Model for Document-Oriented Databases
Daniel Coupal
Universia.com
1
Agenda
1. Why Have a Model?
2. Modeling Steps
3. Capturing the Model
4. Tools
2
Why Have a Model?
• Documentation, common language
• Repeatable process
• Abstraction from database implementations
• Support for tools
• A document DB is supposed to be “schemaless”!
• No! Having a schema is a good thing.
Having to declare everything is the problem.
3
What if you have many apps?
Info about the schema is in the code of Application A.
Application B wants to read the data in the DB.
Where is the description of what it can read, write, ...?
4
Why Do We Choose NoSQL?
• Rewards
  • Huge amount of data
  • Cheap hardware
  • Blazing fast
• Compromises
  • No joins, no transactions, less integrity
  • Less mature technology
  • Fewer tools
Tradeoff between Performance and Data Integrity
6
NoSQL’s Little Secrets
• Maintaining databases and apps over the years is the most expensive activity in software development, and there is no long-term experience yet doing it with NoSQL.
• Not all of today’s vendors will still be around in a few years.
• What if your DB is no longer maintained?
• What if a better DB becomes available?
7
NoSQL State of the Art
• Designing by Example
• Used in most tutorials
• Works well on small examples, like blogs
• Databases with more tables need a better way to capture the design
8
{
  "_id" : ObjectId("508d27069cc1ae293b36928d"),
  "title" : "This is the title",
  "body" : "This is the body text.",
  "tags" : [
    "chocolate",
    "spleen",
    "piano",
    "spatula"
  ],
  "created_date" : ISODate("2012-10-28T12:41:39.110Z"),
  "author_id" : ObjectId("508d280e9cc1ae293b36928e"),
  "category_id" : ObjectId("508d29709cc1ae293b369295"),
  "comments" : [
    {
      "subject" : "This is comment 1",
      "body" : "This is the body of comment 1.",
      "author_id" : ObjectId("508d345f9cc1ae293b369296"),
      "created_date" : ISODate("2012-10-28T13:34:23.929Z")
    },
    {
      "subject" : "This is comment 2",
      "body" : "This is the body of comment 2.",
      "author_id" : ObjectId("508d34739cc1ae293b369297"),
      "created_date" : ISODate("2012-10-28T13:34:43.192Z")
    }
  ]
}
9
NoSQL State of the Art
Complex ER Diagram
10
Northwind ER Diagram
11
Northwind Doc Diagram
11 tables in those 5 collections
[Diagram: the Northwind tables Products, Suppliers, Orders, OrderDetails, Employees, Customers, CustomerDemographics, Shippers, Region, Territories and Categories grouped into 5 collections]
No need for:
- CustomerCustomerDemographics
- EmployeeTerritories
because they are N-to-N relationships and contain no data beyond the two foreign keys.
12
That was a bad example...
• Why?
• With a document database, you don’t model data as your first step!
• Data is modeled based on the usage
• SQL’s model-first approach leads to suboptimal performance for every app.
NoSQL does the opposite.
14
Modeling Steps
            SQL                       NoSQL
Goal        general usage             current usage
Answer to   what answers do I have?   what questions do I have?
Step 1      model data                write queries
Step 2      write application         add indexes
Step 3      write queries             model data
Step 4      add indexes               write application
15
Step 1: Write Queries
• Basic fields to retrieve
• Frequency of the query, requested speed
• Criticality of the query for the system
• Design notes
➡ Sort the queries by importance
16
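The Step 1 checklist can be captured as data so that the queries are sorted mechanically rather than by hand. This is a minimal Python sketch under assumed conventions: the `Query` record and its field names are hypothetical, REQ002 reuses the numbers from the query example later in this deck, the other two queries are invented, and the ordering rule (criticality first, then frequency) is only one reasonable choice.

```python
from dataclasses import dataclass


@dataclass
class Query:
    """One entry of the Step 1 query list (hypothetical field names)."""
    query_id: str
    name: str
    per_day: int            # expected frequency of the query
    max_latency_ms: float   # requested speed
    criticality: int        # 1 = most critical for the system


def sort_by_importance(queries):
    """Most critical first; among equally critical queries, most frequent first."""
    return sorted(queries, key=lambda q: (q.criticality, -q.per_day))


queries = [
    Query("REQ001", "Get order by id", 5_000, 5.0, 2),         # invented
    Query("REQ002", "Get product by name", 20_000, 2.0, 1),    # from the example
    Query("REQ003", "Monthly sales report", 1, 60_000.0, 3),   # invented
]

for q in sort_by_importance(queries):
    print(q.query_id, q.name)
```

Keeping the list as data means the same records can later feed index and sizing checks.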
Step 2: Add Indexes
• Which indexes do you need for the queries to go fast?
• Attributes of your indexes
17
Step 3: Model Data
• List the collections
• How many documents per collection?
➡ NoSQL is all about size and performance, no?
• Attributes on the collections (capped, ...)
• List the fields, their types, constraints
➡ Only for the important fields
18
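To put numbers on “How many documents per collection?”, a rough sizing can multiply each collection’s document count by its average document size and compare the total with the available RAM. The counts and sizes below are hypothetical, and the sketch deliberately ignores indexes, padding and storage-engine overhead, so real figures will be higher.

```python
def estimate_collection_bytes(doc_count: int, avg_doc_bytes: int) -> int:
    """Raw data size only: ignores indexes and storage overhead."""
    return doc_count * avg_doc_bytes


def fits_in_ram(collections: dict, ram_bytes: int) -> bool:
    """True if the summed raw data sizes fit into `ram_bytes`."""
    total = sum(estimate_collection_bytes(n, size) for n, size in collections.values())
    return total <= ram_bytes


# Hypothetical sizing: collection -> (document count, average document size in bytes)
collections = {
    "Orders": (2_000_000, 1_500),
    "Products": (50_000, 800),
    "Customers": (300_000, 600),
}

total = sum(estimate_collection_bytes(n, s) for n, s in collections.values())
print(f"{total / 1e9:.2f} GB")                 # → 3.22 GB
print(fits_in_ram(collections, 4 * 1024**3))   # → True
```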
Step 4: Write Application
• Integration code/driver/queries/database
• Balance between using the product functionality and isolating the layer that deals with the database.
• Interesting new tools to normalize to a common query language: JSONiq, BigSQL, ...
19
Capturing the Model
• JSON is a cool format!
• Your document database is a cool storage facility!
• Language for the model: JSON Schema
• supports things like: types, cardinality, references, acceptable values, ...
20
JSON Schema
JSON:
{
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York"
  },
  "phoneNumber": [
    {
      "type": "home",
      "number": "212 555-1234"
    }
  ]
}
JSON Schema:
{
  "type": "object",
  "properties": {
    "address": {
      "type": "object",
      "properties": {
        "city": {
          "type": "string"
        },
        "streetAddress": {
          "type": "string"
        }
      }
    },
    "phoneNumber": {
      "type": "array",
      "items": {
        "properties": {
          "number": {
            "type": "string"
          },
          "type": {
            "type": "string"
          }
        }
      }
    }
  }
}
21
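A real project would normally rely on an existing JSON Schema validator. Purely to show what validation against the schema on this slide involves, here is a hand-rolled Python sketch that supports only an assumed small subset of keywords (`type`, `properties`, `required`, `items`) and returns a list of error messages.

```python
# Assumed mapping from JSON Schema type names to Python types (subset only).
JSON_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "null": type(None),
}


def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means the instance conforms."""
    errors = []
    expected = schema.get("type")
    if expected in JSON_TYPES:
        wrong_type = not isinstance(instance, JSON_TYPES[expected])
        # bool is a subclass of int in Python, so True must not pass as a number.
        if expected == "number" and isinstance(instance, bool):
            wrong_type = True
        if wrong_type:
            return [f"{path}: expected {expected}, got {type(instance).__name__}"]
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{path}: missing required field '{name}'")
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:  # absent optional fields are fine
                errors.extend(validate(instance[name], subschema, f"{path}.{name}"))
    elif isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


schema = {
    "type": "object",
    "properties": {
        "address": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "streetAddress": {"type": "string"},
            },
        },
        "phoneNumber": {
            "type": "array",
            "items": {
                "properties": {
                    "number": {"type": "string"},
                    "type": {"type": "string"},
                }
            },
        },
    },
}

doc = {
    "address": {"streetAddress": "21 2nd Street", "city": "New York"},
    "phoneNumber": [{"type": "home", "number": "212 555-1234"}],
}

print(validate(doc, schema))                        # → []
print(validate({"address": {"city": 10}}, schema))  # → ['$.address.city: expected string, got int']
```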
Model: Query
• Use:
• the native DB notation
• or SQL (everyone can read SQL)
• Avoid joins!!!
• Example:
• Product by ProductID, ProductName, SupplierID
• Order by OrderID, CustomerID, ContactName
• Customer by CustomerID, ContactName, OrderID
22
Example
23
{
  "id" : "REQ002",
  "name" : "Get product by name",
  "n" : "20000/day",
  "t" : "2 ms",
  "notes" : [
    "User asking about product availability by product name"
  ],
  "sqlquery" : "select * from Products where Products.ProductName = 'abcde'",
  "mongoquery" : {
    "ProductName" : "abcde"
  }
}
Model: Index
• Again, use the native DB notation
• Example:
• Product.ProductID, .ProductName, .SupplierID
• Order.OrderID, .CustomerID, .ContactName
• Customer.CustomerID, .ContactName, .OrderID
• Why is it useful? It looks so trivial!
• Once written down, a tool can validate it or create estimates
24
Example
25
{
  "id" : "REQ002",
  "name" : "Get product by name",
  "n" : "20000/day",
  "t" : "2 ms",
  "notes" : [
    "User asking about product availability by product name"
  ],
  "sqlquery" : "select * from Products where Products.ProductName = 'abcde'",
  "mongoquery" : {
    "ProductName" : "abcde"
  },
  "index" : {
    "collection" : "Products",
    "field" : "ProductName"
  }
}
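Once the indexes are written down next to the queries, a tool can cross-check them. The following Python sketch flags queries whose filter uses a field with no declared index. The `collection` key on each query and the REQ010 query are hypothetical additions, and the rule is simplified: it demands a single-field index for every filter field, whereas MongoDB can also serve a query from a compound index whose prefix covers the filter.

```python
def uncovered_queries(queries, indexes):
    """Return ids of queries that filter on at least one non-indexed field."""
    missing = []
    for q in queries:
        fields = q["mongoquery"].keys()
        if any((q["collection"], field) not in indexes for field in fields):
            missing.append(q["id"])
    return missing


# Declared indexes as (collection, field) pairs, following the index slide.
indexes = {
    ("Products", "ProductID"), ("Products", "ProductName"), ("Products", "SupplierID"),
    ("Orders", "OrderID"), ("Orders", "CustomerID"), ("Orders", "ContactName"),
    ("Customers", "CustomerID"), ("Customers", "ContactName"), ("Customers", "OrderID"),
}

queries = [
    {"id": "REQ002", "collection": "Products", "mongoquery": {"ProductName": "abcde"}},
    # Hypothetical query on a field that has no declared index.
    {"id": "REQ010", "collection": "Products", "mongoquery": {"UnitPrice": 18}},
]

print(uncovered_queries(queries, indexes))  # → ['REQ010']
```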
Model: Data
• Collection
• One JSON Schema document per collection
• Fields for collection and database
• Optionally, add a version number
26
Example for ‘Orders’
27
{
  "database" : "northwind",
  "collection" : "Orders",
  "version" : 1,
  "type": "object",
  "$schema": "http://json-schema.org/draft-04/schema#",
  "id": "http://jsonschema.net",
  "properties": {
    "CustomerID": {
      "type": "string",
      "id": "http://jsonschema.net/CustomerID"
    },
    "Details": {
      "type": "array",
      "id": "http://jsonschema.net/Details",
      "items": {
        "type": "object",
        "id": "http://jsonschema.net/Details/0",
        "required": [ "ProductID", "Quantity" ],
        "properties": {
          "ProductID": {
            "type": "number",
            "id": "http://jsonschema.net/Details/0/ProductID"
          },
          "Quantity": {
            "type": "number"
          },
          ...
Simpler...
28
{
  "database" : "northwind",
  "collection" : "Orders",
  "version" : 1,
  "type": "object",
  "properties": {
    "CustomerID": {
      "type": "string"
    },
    "Details": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "ProductID": {
            "type": "number"
          },
          "Quantity": {
            "type": "number"
          },
          ...
Model: Versioning
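• Each modified version of a collection’s schema is stored as a new document
• db.<schema_collection>.find({"version": 2}), where <schema_collection> is the collection holding the schema documents
➡ shows the schemas of all collections for version ‘2’ of the DB schema.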
29
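The versioning query can be tried without a server. The sketch below keeps hypothetical schema documents in a Python list and uses a small equality-only `find` helper as a stand-in for the MongoDB shell call. Note that, as described on the slide, a collection whose schema did not change in version 2 has no version-2 document and therefore does not appear in the result (Customers below).

```python
# Hypothetical schema documents; the schema bodies are trimmed to "type".
schema_docs = [
    {"database": "northwind", "collection": "Orders", "version": 1, "type": "object"},
    {"database": "northwind", "collection": "Orders", "version": 2, "type": "object"},
    {"database": "northwind", "collection": "Products", "version": 2, "type": "object"},
    {"database": "northwind", "collection": "Customers", "version": 1, "type": "object"},
]


def find(docs, query):
    """Equality-only stand-in for find(): every key in `query` must match."""
    return [d for d in docs if all(d.get(k) == v for k, v in query.items())]


v2 = find(schema_docs, {"version": 2})
print(sorted(d["collection"] for d in v2))  # → ['Orders', 'Products']
```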
Partial Schema
• Example: you just want to validate the ‘version’ field, which holds values both as ‘string’ and as ‘number’
30
JSON Schema:
{
  "type": "object",
  "properties": {
    "version": {
      "type": "string"
    }
  }
}
JSON:
{
  "version": 1.0,
  ...
},
{
  "version": "1.0.1",
  ...
}
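A partial schema constrains only the fields it names. The sketch below applies that partial schema to two sample documents with plain Python type checks (an assumed, hand-rolled mapping from JSON Schema type names to Python types). Only the document that stores `version` as a number is reported; every other field is ignored.

```python
PARTIAL_SCHEMA = {"type": "object", "properties": {"version": {"type": "string"}}}

# Assumed mapping from JSON Schema type names to Python types.
JSON_TYPES = {"string": str, "number": (int, float)}


def partial_errors(documents, schema):
    """Return (document index, field, value) for each named field with the wrong type."""
    errors = []
    for i, doc in enumerate(documents):
        for field, rule in schema["properties"].items():
            if field not in doc:
                continue  # no "required" list, so absent fields are allowed
            value = doc[field]
            # bool is an int subclass in Python, so never accept it as number/string.
            if isinstance(value, bool) or not isinstance(value, JSON_TYPES[rule["type"]]):
                errors.append((i, field, value))
    return errors


docs = [
    {"version": 1.0, "name": "first"},
    {"version": "1.0.1", "tags": ["a"]},
]
print(partial_errors(docs, PARTIAL_SCHEMA))  # → [(0, 'version', 1.0)]
```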
Tools
• Get some JSON Schema from JSON:
• http://www.jsonschema.net/
• Validate your schema
• http://jsonschemalint.com/
• https://github.com/dcoupal/godbtools.git
• Validate/edit JSON
• http://jsonlint.com/ or RoboMongo
• Import SQL into NoSQL
• Pentaho, Talend
31
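Tools such as jsonschema.net generate a starting schema from a sample document. The idea fits in a few lines of Python; this is not how that site works internally, only an assumed minimal approach that maps each JSON value to a `type` and recurses into objects and arrays, taking the first array element as the template for all items.

```python
import json


def infer_schema(value):
    """Return a minimal JSON Schema describing one sample JSON value."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return {"type": "boolean"}
    if isinstance(value, (int, float)):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        if not value:
            return {"type": "array"}
        # Assume a homogeneous array: the first element describes every item.
        return {"type": "array", "items": infer_schema(value[0])}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(val) for key, val in value.items()},
        }
    raise TypeError(f"not a JSON value: {value!r}")


sample = {
    "address": {"streetAddress": "21 2nd Street", "city": "New York"},
    "phoneNumber": [{"type": "home", "number": "212 555-1234"}],
}
print(json.dumps(infer_schema(sample), indent=2))
```

The generated schema is only a starting point; constraints such as `required` still have to be added by hand.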
Tool Considerations
• NoSQL databases often rely on the working set being in RAM. Scanning all your data pulls rarely used documents into memory and can push out the frequently used ones, turning the in-memory dataset “cold” instead of “hot”.
• Incremental validations work better: make sure you record timestamps on insertions and updates.
32
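Incremental validation can be sketched as: select the documents changed since the last run, validate only those, and move the checkpoint forward. The `updated_at` field name, the sample documents and the `is_valid` rule are hypothetical; in MongoDB the selection step would correspond to a filter such as `{"updated_at": {"$gt": last_run}}`.

```python
from datetime import datetime, timezone


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


def is_valid(doc):
    """Hypothetical rule: 'version' must be a string."""
    return isinstance(doc.get("version"), str)


def incremental_pass(documents, last_run, check):
    """Validate documents changed after `last_run`; return (invalid ids, new checkpoint)."""
    changed = [d for d in documents if d["updated_at"] > last_run]
    invalid = [d["_id"] for d in changed if not check(d)]
    # Next run starts from the newest timestamp seen (or stays put if nothing changed).
    checkpoint = max((d["updated_at"] for d in changed), default=last_run)
    return invalid, checkpoint


docs = [
    {"_id": 1, "version": "1.0", "updated_at": utc(2014, 1, 1)},
    {"_id": 2, "version": 2.0, "updated_at": utc(2014, 1, 5)},
    {"_id": 3, "version": "1.1", "updated_at": utc(2014, 1, 7)},
]

invalid, checkpoint = incremental_pass(docs, utc(2014, 1, 3), is_valid)
print(invalid, checkpoint.date())  # → [2] 2014-01-07
```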
Document Validator
33
[Diagram: the Validator checks a Collection (JSON) against its Schema (JSON Schema)]
“Eventual Integrity”
• Many NoSQL databases offer eventual consistency
• With tools that validate and fix the data according to a set of rules, we get “eventual integrity”
34
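“Eventual integrity” can be illustrated as repeated validate-and-fix passes that converge once no rule fires. In this hypothetical Python sketch each rule is a (check, fix) pair; the single rule converts a numeric `version` into a string, the mixed-type case from the partial-schema example. A second pass changes nothing, which is the converged state.

```python
def version_is_string(doc):
    """Check: a missing 'version' is accepted, a present one must be a string."""
    return isinstance(doc.get("version", ""), str)


def stringify_version(doc):
    """Fix: store the version as a string, e.g. 1.0 -> '1.0'."""
    doc["version"] = str(doc["version"])


RULES = [(version_is_string, stringify_version)]


def repair_pass(documents, rules):
    """Apply each failing rule's fix in place; return how many documents changed."""
    fixed = 0
    for doc in documents:
        touched = False
        for check, fix in rules:
            if not check(doc):
                fix(doc)
                touched = True
        if touched:
            fixed += 1
    return fixed


docs = [{"version": 1.0}, {"version": "1.0.1"}, {"name": "no version"}]
print(repair_pass(docs, RULES))  # → 1
print(docs[0])                   # → {'version': '1.0'}
print(repair_pass(docs, RULES))  # → 0
```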
Tools to be developed
• UI to manipulate a schema graphically
• More complete validators for:
• constraints
• relationships
• Per-language libraries to validate inserted/updated documents
35
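A per-language validation library would sit between the application and the database driver. The class below is a hypothetical in-memory stand-in for a collection, not a real driver API: it validates each document before an insert, and validates the merged result before an update, using a simple field-to-Python-type rule map loosely based on the Orders `Details` items (the slide uses “number”; `int` here is an assumption).

```python
class ValidatingCollection:
    """Hypothetical wrapper that rejects documents breaking simple type rules."""

    def __init__(self, rules):
        self.rules = rules  # required field name -> expected Python type
        self.docs = []

    def _check(self, doc):
        for field, py_type in self.rules.items():
            if field not in doc:
                raise ValueError(f"missing required field '{field}'")
            if not isinstance(doc[field], py_type):
                raise ValueError(f"field '{field}' must be {py_type.__name__}")

    def insert(self, doc):
        self._check(doc)  # reject before anything is stored
        self.docs.append(doc)

    def update(self, position, changes):
        candidate = {**self.docs[position], **changes}
        self._check(candidate)  # validate the document as it would look after the update
        self.docs[position] = candidate


details = ValidatingCollection({"ProductID": int, "Quantity": int})
details.insert({"ProductID": 11, "Quantity": 12})
try:
    details.insert({"ProductID": 42})
except ValueError as err:
    print(err)  # → missing required field 'Quantity'
```

Because validation happens before the write, an invalid update leaves the stored document untouched.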
Conclusion: Takeaways
• Design in this order: queries, indexes, data, application.
• Capture your model outside the application.
• Not having a schema is not a good thing! Use the ‘schemaless’ attribute wisely!
36
NoSQL
Goal        current usage
Answer to   what questions do I have?
Step 1      write queries
Step 2      add indexes
Step 3      model data
Step 4      write application
Questions?
• dcoupal@universia.com
37
Boost PC performance: How more available memory can improve productivity
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

Semi Formal Model for Document Oriented Databases

  • 1. Semi Formal Model for Document Oriented Databases
      Daniel Coupal, Universia.com
  • 2. Agenda
      1. Why Have a Model?
      2. Modeling Steps
      3. Capturing the Model
      4. Tools
  • 3. Why Have a Model?
      • Documentation and a common language
      • A repeatable process
      • Abstraction from database implementations
      • Support for tools
      • A document DB is supposed to be "schemaless"!
          • No! Having a schema is a good thing. The problem is having to declare everything.
  • 4. What if you have many apps?
      • Information about the schema lives in the code of Application A.
      • Application B wants to read the data in the DB.
      • Where is the description of what it can read, write, ...?
  • 6. Why do we choose NoSQL?
      • Rewards
          • Huge amounts of data
          • Cheap hardware
          • Blazing fast
      • Compromises
          • No joins, no transactions, less integrity
          • Less mature technology
          • Fewer tools
      Tradeoff between performance and data integrity
  • 7. NoSQL Little Secrets
      • No experience yet in maintaining these databases and apps over the years, which is the most expensive activity in software development.
      • Not all of today's vendors will still be around in a few years.
      • What if your DB is no longer maintained?
      • What if a better DB becomes available?
  • 8. NoSQL State of the Art
      • Designing by example
          • Used in most tutorials
          • Works well on small examples, like blogs
      • A database with more tables needs a better way to capture the design
  • 9. NoSQL State of the Art
      {
        "_id" : ObjectId("508d27069cc1ae293b36928d"),
        "title" : "This is the title",
        "body" : "This is the body text.",
        "tags" : [ "chocolate", "spleen", "piano", "spatula" ],
        "created_date" : ISODate("2012-10-28T12:41:39.110Z"),
        "author_id" : ObjectId("508d280e9cc1ae293b36928e"),
        "category_id" : ObjectId("508d29709cc1ae293b369295"),
        "comments" : [
          {
            "subject" : "This is comment 1",
            "body" : "This is the body of comment 1.",
            "author_id" : ObjectId("508d345f9cc1ae293b369296"),
            "created_date" : ISODate("2012-10-28T13:34:23.929Z")
          },
          {
            "subject" : "This is comment 2",
            "body" : "This is the body of comment 2.",
            "author_id" : ObjectId("508d34739cc1ae293b369297"),
            "created_date" : ISODate("2012-10-28T13:34:43.192Z")
          }
        ]
      }
  • 12. Northwind Doc Diagram
      • 11 tables in those 5 collections
      • No need for CustomerCustomerDemographics and EmployeeTerritories: they only implement N-to-N relationships and don't contain any data.
      [Diagram: Products, Suppliers, Orders, Employees, Customers, Customer Demographics, Shippers, OrderDetails, Region, Categories, Territories]
  • 14. That was a bad example...
      • Why?
      • With a document database, you don't model the data as your first step!
      • Data is modeled based on its usage.
      • SQL's model-first approach leads to bad performance for every app. NoSQL does the opposite.
  • 15. Modeling Steps
                  SQL                          NoSQL
      Goal        general usage                current usage
      Answer to   what answers do I have?      what questions do I have?
      Step 1      model data                   write queries
      Step 2      write application            add indexes
      Step 3      write queries                model data
      Step 4      add indexes                  write application
  • 16. Step 1: Write Queries
      • Basic fields to retrieve
      • Frequency of the query and requested speed
      • Criticality of the query for the system
      • Design notes
      ➡ Sort the queries by importance
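A minimal sketch of what a Step 1 query list could look like as data, so that sorting by importance becomes mechanical. The `QuerySpec` record, the extra queries REQ001/REQ003 and the ranking rule (critical queries first, then by daily frequency) are illustrative assumptions, not part of the deck.

```python
from dataclasses import dataclass, field


@dataclass
class QuerySpec:
    """One Step 1 entry (hypothetical record layout)."""
    query_id: str
    name: str
    fields: list[str]   # basic fields to retrieve
    per_day: int        # expected frequency
    max_ms: float       # requested speed
    critical: bool      # criticality for the system
    notes: list[str] = field(default_factory=list)  # design notes


def sort_by_importance(queries: list[QuerySpec]) -> list[QuerySpec]:
    # Assumed rule: critical queries first, then the most frequent ones.
    return sorted(queries, key=lambda q: (not q.critical, -q.per_day))


queries = [
    QuerySpec("REQ001", "Get orders by customer", ["OrderID", "CustomerID"], 5000, 10.0, False),
    QuerySpec("REQ002", "Get product by name", ["ProductName"], 20000, 2.0, True,
              ["User asking about a product's availability by product name"]),
    QuerySpec("REQ003", "Get customer by contact", ["ContactName"], 800, 20.0, True),
]

for q in sort_by_importance(queries):
    print(q.query_id, q.name)
```

Any other ranking (for example, weighting frequency by the requested speed) would only change the sort key.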
  • 17. Step 2: Add Indexes
      • Which indexes do you need for the queries to be fast?
      • Attributes of your indexes
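Once the queries are written down, a first list of candidate indexes can be derived from their filter fields. This is a simplified sketch, assuming each query carries a collection name and a MongoDB-style equality filter; REQ004 and REQ005 are hypothetical, and sort or range fields are deliberately ignored.

```python
def propose_indexes(queries):
    """Suggest one index per distinct (collection, filter fields) combination.

    Assumes each query is a dict with a "collection" name and a MongoDB-style
    equality filter under "mongoquery"; sort and range fields are ignored.
    """
    proposals = []
    for q in queries:
        key = (q["collection"], tuple(q["mongoquery"]))  # tuple of the filter's field names
        if key not in proposals:
            proposals.append(key)
    return proposals


# Hypothetical Step 1 output; only REQ002 comes from the slides.
queries = [
    {"id": "REQ002", "collection": "Products", "mongoquery": {"ProductName": "abcde"}},
    {"id": "REQ004", "collection": "Products", "mongoquery": {"ProductName": "Chai"}},
    {"id": "REQ005", "collection": "Orders", "mongoquery": {"CustomerID": "ALFKI"}},
]

for collection, fields in propose_indexes(queries):
    # Key pattern in the {field: 1} (ascending) form accepted by MongoDB's createIndex.
    print(collection, {f: 1 for f in fields})
```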
  • 18. Step 3: Model Data
      • List the collections
      • How many documents per collection?
          ➡ NoSQL is all about size and performance, isn't it?
      • Attributes of the collections (capped, ...)
      • List the fields, their types and constraints
          ➡ Only for the important fields
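The "how many documents per collection?" question becomes useful once it is turned into a size estimate. The sketch below multiplies an assumed document count by an assumed average document size, with optional compound growth; all numbers are hypothetical, and indexes, padding and compression are ignored.

```python
GIB = 1024 ** 3


def estimate_collection_bytes(doc_count, avg_doc_bytes, yearly_growth=0.0, years=0):
    """Rough data size of one collection, ignoring indexes and storage overhead.

    yearly_growth is an assumed compound growth rate of the document count
    (0.5 means +50% per year).
    """
    return round(doc_count * (1 + yearly_growth) ** years * avg_doc_bytes)


# Hypothetical estimates for two Northwind collections.
collections = {
    "Orders": estimate_collection_bytes(2_000_000, 1_200, yearly_growth=0.5, years=2),
    "Products": estimate_collection_bytes(80_000, 400),
}
total = sum(collections.values())
for name, size in collections.items():
    print(f"{name}: {size / GIB:.2f} GiB")
print(f"total: {total / GIB:.2f} GiB, fits in 8 GiB of RAM: {total <= 8 * GIB}")
```

Comparing the total with the available RAM is one way to tie the data model back to the performance goal.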
  • 19. Step 4: Write Application
      • Integrate the code, driver, queries and database
      • Balance between using the product's functionality and isolating the layer that deals with the database
      • Interesting new tools normalize to a common query language: JSONiq, BigSQL, ...
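One common way to isolate the database layer is a small repository interface that the application depends on. This is a sketch under that assumption: `ProductStore`, `InMemoryProductStore` and `is_available` are hypothetical names, and the in-memory class stands in for a driver-backed implementation.

```python
from typing import Protocol


class ProductStore(Protocol):
    """The only database-facing interface the application sees (hypothetical)."""

    def find_by_name(self, name: str) -> list[dict]: ...

    def insert(self, product: dict) -> None: ...


class InMemoryProductStore:
    """Stand-in implementation. A real one would wrap a driver call, for example
    a MongoDB find({"ProductName": name}), behind the same two methods."""

    def __init__(self) -> None:
        self._docs: list[dict] = []

    def find_by_name(self, name: str) -> list[dict]:
        return [d for d in self._docs if d.get("ProductName") == name]

    def insert(self, product: dict) -> None:
        self._docs.append(dict(product))


def is_available(store: ProductStore, name: str) -> bool:
    # Application logic depends only on ProductStore, not on a database product.
    return any(p.get("UnitsInStock", 0) > 0 for p in store.find_by_name(name))


store = InMemoryProductStore()
store.insert({"ProductID": 1, "ProductName": "Chai", "UnitsInStock": 39})
print(is_available(store, "Chai"), is_available(store, "Tofu"))
```

Product-specific features can still be used inside a store implementation. The boundary only keeps them out of the application logic.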
  • 20. Capturing the Model
      • JSON is a cool format!
      • Your document database is a cool storage facility!
      • Language for the model: JSON Schema
          • supports things like types, cardinality, references, acceptable values, ...
  • 21. JSON Schema
      Document:
      {
        "address" : { "streetAddress" : "21 2nd Street", "city" : "New York" },
        "phoneNumber" : [ { "type" : "home", "number" : "212 555-1234" } ]
      }
      Its JSON Schema:
      {
        "type" : "object",
        "properties" : {
          "address" : {
            "type" : "object",
            "properties" : {
              "city" : { "type" : "string" },
              "streetAddress" : { "type" : "string" }
            }
          },
          "phoneNumber" : {
            "type" : "array",
            "items" : {
              "properties" : {
                "number" : { "type" : "string" },
                "type" : { "type" : "string" }
              }
            }
          }
        }
      }
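To show how such a schema is checked, here is a deliberately small validator covering only the `type`, `properties`, `required` (array form) and `items` keywords. It is an illustrative sketch, not a complete JSON Schema implementation; a full validator library would be the practical choice.

```python
import json

# JSON Schema type names mapped to the Python types json.loads produces.
TYPES = {"object": dict, "array": list, "string": str, "number": (int, float),
         "integer": int, "boolean": bool, "null": type(None)}


def validate(instance, schema, path="$"):
    """Return error strings for a small JSON Schema subset:
    "type", "properties", "required" (array form) and "items"."""
    expected = schema.get("type")
    if expected is not None:
        wrong = not isinstance(instance, TYPES[expected])
        # bool is a subclass of int in Python, but it is not a JSON number.
        wrong = wrong or (isinstance(instance, bool) and expected in ("number", "integer"))
        if wrong:
            return [f"{path}: expected {expected}, got {type(instance).__name__}"]
    errors = []
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{path}: missing required field {name!r}")
        for name, sub in schema.get("properties", {}).items():
            if name in instance:
                errors += validate(instance[name], sub, f"{path}.{name}")
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors += validate(item, schema["items"], f"{path}[{i}]")
    return errors


# The document and schema from the slide.
document = json.loads('{"address": {"streetAddress": "21 2nd Street", "city": "New York"},'
                      ' "phoneNumber": [{"type": "home", "number": "212 555-1234"}]}')
STRING = {"type": "string"}
schema = {
    "type": "object",
    "properties": {
        "address": {"type": "object",
                    "properties": {"city": STRING, "streetAddress": STRING}},
        "phoneNumber": {"type": "array",
                        "items": {"properties": {"number": STRING, "type": STRING}}},
    },
}

print(validate(document, schema))  # []
bad = {"address": {"city": 10}, "phoneNumber": "212 555-1234"}
print(validate(bad, schema))
```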
  • 22. Model: Query
      • Use:
          • the native DB notation,
          • or SQL (everyone can read SQL)
      • Avoid joins!!!
      • Example:
          • Product by ProductID, ProductName, SupplierID
          • Order by OrderID, CustomerID, ContactName
          • Customer by CustomerID, ContactName, OrderID
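Because the modeled queries avoid joins, one equality query can be written mechanically in both notations. This sketch assumes a hypothetical helper `equality_query` and checks the SQL form against a throwaway SQLite table. Table and column names are assumed trusted, because SQL placeholders cannot stand in for identifiers.

```python
import sqlite3


def equality_query(table, criteria):
    """Write one join-free equality query in both notations.

    Returns a parameterized SQL statement with its parameters (sqlite3 "?"
    placeholders) and the equivalent MongoDB filter, where {"field": value}
    means field == value.
    """
    where = " AND ".join(f"{column} = ?" for column in criteria)
    sql = f"SELECT * FROM {table} WHERE {where}"
    return sql, list(criteria.values()), dict(criteria)


sql, params, mongo_filter = equality_query("Products", {"ProductName": "abcde"})
print(sql, params)   # SELECT * FROM Products WHERE ProductName = ? ['abcde']
print(mongo_filter)  # {'ProductName': 'abcde'}

# Run the SQL form against an in-memory table to check it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductID INTEGER, ProductName TEXT)")
conn.executemany("INSERT INTO Products VALUES (?, ?)", [(1, "abcde"), (2, "Chai")])
rows = conn.execute(sql, params).fetchall()
print(rows)          # [(1, 'abcde')]
```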
  • 23. Example
      {
        "id" : "REQ002",
        "name" : "Get product by name",
        "n" : "20000/day",
        "t" : "2 ms",
        "notes" : [
          "User asking about a product's availability by product name"
        ],
        "sqlquery" : "select * from product where product.ProductName = 'abcde'",
        "mongoquery" : {
          "ProductName" : "abcde"
        }
      }
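Recording the frequency (`n`) and the target time (`t`) as data lets a tool turn them into load figures. A sketch, assuming the `"<count>/<unit>"` and `"<value> <unit>"` formats seen in REQ002 and a hypothetical table of unit names. An average rate says nothing about peak load, so this is only a starting point.

```python
import json

# The REQ002 record from the slide, trimmed to the fields used here.
requirement = json.loads('{"id": "REQ002", "n": "20000/day", "t": "2 ms"}')

SECONDS_PER = {"day": 86_400, "hour": 3_600, "min": 60, "s": 1}  # assumed unit names


def parse_rate(rate):
    """Split a frequency such as "20000/day" into (count, seconds in the period)."""
    count, unit = rate.split("/")
    return int(count), SECONDS_PER[unit]


def parse_ms(duration):
    """Convert a target time such as "2 ms" or "0.5 s" to milliseconds."""
    value, unit = duration.split()
    return float(value) * {"ms": 1, "s": 1000}[unit]


count, period_s = parse_rate(requirement["n"])
avg_per_second = count / period_s
# Total time spent answering this query per day if every call meets its target.
busy_seconds_per_day = count * (86_400 / period_s) * parse_ms(requirement["t"]) / 1000
print(f"{avg_per_second:.2f} queries/s on average")          # 0.23 queries/s on average
print(f"{busy_seconds_per_day:.0f} s of query time per day")  # 40 s of query time per day
```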
  • 24. Model: Index
      • Again, use the native DB notation
      • Example:
          • Product.ProductID, .ProductName, .SupplierID
          • Order.OrderID, .CustomerID, .ContactName
          • Customer.CustomerID, .ContactName, .OrderID
      • Why is it useful if it looks so trivial?
          • Once it is written down, a tool can validate it or produce estimates.
  • 25. Example
      {
        "id" : "REQ002",
        "name" : "Get product by name",
        "n" : "20000/day",
        "t" : "2 ms",
        "notes" : [
          "User asking about a product's availability by product name"
        ],
        "sqlquery" : "select * from product where product.ProductName = 'abcde'",
        "mongoquery" : {
          "ProductName" : "abcde"
        },
        "index" : {
          "collection" : "Products",
          "field" : "ProductName"
        }
      }
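With the index stored next to the query, a tool can check that the two agree. This sketch handles only the single-field `"index"` shape used in REQ002 and equality filters; REQ007 is a hypothetical requirement whose declared index cannot serve its filter.

```python
# REQ002 as on the slide (trimmed), plus a hypothetical REQ007 whose declared
# index does not match its filter.
requirements = [
    {"id": "REQ002", "mongoquery": {"ProductName": "abcde"},
     "index": {"collection": "Products", "field": "ProductName"}},
    {"id": "REQ007", "mongoquery": {"SupplierID": 3},
     "index": {"collection": "Products", "field": "ProductName"}},
]


def index_is_usable(req):
    """A single-field index can serve an equality filter only if the filter uses that field."""
    index = req.get("index")
    return index is not None and index["field"] in req["mongoquery"]


for req in requirements:
    print(req["id"], "ok" if index_is_usable(req) else "declared index does not match the query")
```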
  • 26. Model: Data
      • Collection
          • One JSON Schema document per collection
          • Fields for the collection and the database
          • Optionally, add a version number
  • 27. Example for 'Orders'
      {
        "database" : "northwind",
        "collection" : "Orders",
        "version" : 1,
        "type" : "object",
        "$schema" : "http://json-schema.org/draft-04/schema#",
        "id" : "http://jsonschema.net",
        "properties" : {
          "CustomerID" : {
            "type" : "string",
            "id" : "http://jsonschema.net/CustomerID"
          },
          "Details" : {
            "type" : "array",
            "id" : "http://jsonschema.net/Details",
            "items" : {
              "type" : "object",
              "id" : "http://jsonschema.net/Details/0",
              "required" : [ "ProductID", "Quantity" ],
              "properties" : {
                "ProductID" : {
                  "type" : "number",
                  "id" : "http://jsonschema.net/Details/0/ProductID"
                },
                "Quantity" : {
                  "type" : "number"
                },
      ...
  • 28. Simpler...
      {
        "database" : "northwind",
        "collection" : "Orders",
        "version" : 1,
        "type" : "object",
        "properties" : {
          "CustomerID" : { "type" : "string" },
          "Details" : {
            "type" : "array",
            "items" : {
              "type" : "object",
              "properties" : {
                "ProductID" : { "type" : "number" },
                "Quantity" : { "type" : "number" },
      ...
  • 29. Model: Versioning
      • Each modified version of a collection's schema is stored as a new document.
      • db.<collection>.find({"version": 2}), run against the collection that holds the schema documents
          ➡ shows all collections for version 2 of the schema for the DB.
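The version lookup can be sketched in memory. This assumes that only collections whose schema changed get a new document in a new version, so the "current" schema of a collection is its highest stored version. The `find` and `latest` helpers are hypothetical stand-ins, not driver calls.

```python
# Hypothetical schema documents: Orders changed in version 2, Products did not.
schemas = [
    {"database": "northwind", "collection": "Orders", "version": 1, "type": "object"},
    {"database": "northwind", "collection": "Products", "version": 1, "type": "object"},
    {"database": "northwind", "collection": "Orders", "version": 2, "type": "object"},
]


def find(docs, query):
    """In-memory stand-in for an equality-only MongoDB find()."""
    return [d for d in docs if all(d.get(k) == v for k, v in query.items())]


def latest(docs, database, collection):
    """The highest schema version stored for one collection."""
    return max(find(docs, {"database": database, "collection": collection}),
               key=lambda d: d["version"])


print([d["collection"] for d in find(schemas, {"version": 2})])  # ['Orders']
print(latest(schemas, "northwind", "Products")["version"])      # 1
```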
  • 30. Partial Schema
      • Example: you only want to validate the 'version' field, which holds both 'string' and 'number' values.
      JSON Schema:
      {
        "type" : "object",
        "properties" : {
          "version" : { "type" : "string" }
        }
      }
      JSON:
      { "version" : 1.0, ... },
      { "version" : "1.0.1", ... }
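A partial schema checks only the fields it lists and leaves everything else alone. The sketch below applies the slide's schema to both documents: the numeric `version` is flagged and the string one passes. If both forms were legitimate, JSON Schema also allows a type array such as `["string", "number"]`, which this small sketch does not handle.

```python
# The partial schema from the slide: only "version" is described.
partial_schema = {"type": "object", "properties": {"version": {"type": "string"}}}

CHECKS = {  # only the two types this example needs
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def partial_errors(doc, schema=partial_schema):
    """Check the fields listed in the schema's "properties"; every other field is ignored."""
    return [f"{name}: expected {rule['type']}, got {doc[name]!r}"
            for name, rule in schema["properties"].items()
            if name in doc and not CHECKS[rule["type"]](doc[name])]


docs = [{"version": 1.0, "title": "a"}, {"version": "1.0.1", "title": "b"}]
for doc in docs:
    print(partial_errors(doc))
# ['version: expected string, got 1.0']
# []
```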
  • 31. Tools
      • Generate a JSON Schema from JSON:
          • http://www.jsonschema.net/
      • Validate your schema:
          • http://jsonschemalint.com/
          • https://github.com/dcoupal/godbtools.git
      • Validate/edit JSON:
          • http://jsonlint.com/ or RoboMongo
      • Import SQL into NoSQL:
          • Pentaho, Talend
  • 32. Tools considerations
      • NoSQL often relies on data being in RAM. Scanning all your data can turn your in-memory dataset "cold" instead of "hot".
      • Running incremental validations works better; make sure you have timestamps on insertions and updates.
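Incremental validation only needs the documents changed since the last run, plus a checkpoint for the next one. A sketch, assuming every document carries an `updated_at` timestamp set on insert and update. In MongoDB the selection would be a query such as `{"updated_at": {"$gt": last_run}}` on an indexed field, rather than the in-memory scan used here for illustration.

```python
from datetime import datetime, timezone


def changed_since(docs, last_run):
    """Documents inserted or updated after the previous validation run."""
    return [d for d in docs if d["updated_at"] > last_run]


def validate_incrementally(docs, last_run, is_valid):
    """Validate only recently changed documents; return (invalid ids, new checkpoint)."""
    batch = changed_since(docs, last_run)
    invalid = [d["_id"] for d in batch if not is_valid(d)]
    checkpoint = max((d["updated_at"] for d in batch), default=last_run)
    return invalid, checkpoint


utc = timezone.utc
docs = [
    {"_id": 1, "version": "1.0", "updated_at": datetime(2013, 5, 1, tzinfo=utc)},
    {"_id": 2, "version": 2.0, "updated_at": datetime(2013, 5, 3, tzinfo=utc)},
    {"_id": 3, "version": "2.1", "updated_at": datetime(2013, 5, 4, tzinfo=utc)},
]
invalid, checkpoint = validate_incrementally(
    docs, datetime(2013, 5, 2, tzinfo=utc), lambda d: isinstance(d["version"], str))
print(invalid, checkpoint.date())  # [2] 2013-05-04
```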
  • 34. "Eventual Integrity"
      • NoSQL databases have eventual consistency.
      • With tools that validate and fix the data according to a set of rules, we get "eventual integrity".
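A validate-and-fix pass can be expressed as a list of rules, each pairing a violation check with a repair. Running such a pass periodically is one way to reach the "eventual integrity" described above. The two rules here are hypothetical examples. Fixes should be chosen carefully: `str(1.0)` gives `"1.0"`, but a float cannot preserve a version written as `1.10`.

```python
# Hypothetical rules: (description, detects a violation, returns a corrected copy).
RULES = [
    ("version must be a string",
     lambda d: "version" in d and not isinstance(d["version"], str),
     lambda d: {**d, "version": str(d["version"])}),
    ("tags must be a list",
     lambda d: "tags" in d and not isinstance(d["tags"], list),
     lambda d: {**d, "tags": [d["tags"]]}),
]


def enforce(docs, rules=RULES):
    """One validate-and-fix pass: returns corrected copies and a log of what was fixed."""
    fixed, log = [], []
    for doc in docs:
        for description, violated, fix in rules:
            if violated(doc):
                doc = fix(doc)
                log.append((doc["_id"], description))
        fixed.append(doc)
    return fixed, log


docs = [{"_id": 1, "version": 1.0, "tags": "piano"},
        {"_id": 2, "version": "1.0.1", "tags": []}]
fixed, log = enforce(docs)
print(fixed)
print(log)
```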
  • 35. Tools to be developed
      • A UI to manipulate a schema graphically
      • More complete validators:
          • constraints
          • relationships
      • A per-language library to validate inserted/updated documents
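A per-language validation library could take the form of a thin wrapper that checks every insert and update before it reaches the database. In this sketch, `ValidatingCollection` is hypothetical and stores documents in a plain list addressed by position. A real wrapper would delegate to a driver and address documents by `_id`.

```python
class ValidationError(ValueError):
    """Raised when a document does not match the collection's rules."""


class ValidatingCollection:
    """Hypothetical wrapper: inserts and updates are checked against simple
    per-field type rules before they reach the store (a plain list here)."""

    def __init__(self, field_types):
        self.field_types = field_types  # field name -> expected Python type
        self.docs = []

    def _check(self, doc):
        for name, expected in self.field_types.items():
            if name not in doc:
                raise ValidationError(f"missing field {name!r}")
            if not isinstance(doc[name], expected):
                raise ValidationError(
                    f"{name!r} should be {expected.__name__}, got {type(doc[name]).__name__}")

    def insert(self, doc):
        self._check(doc)
        self.docs.append(dict(doc))

    def update(self, position, changes):
        candidate = {**self.docs[position], **changes}
        self._check(candidate)  # validate the document as it would look after the update
        self.docs[position] = candidate


orders = ValidatingCollection({"CustomerID": str, "Details": list})
orders.insert({"CustomerID": "ALFKI", "Details": [{"ProductID": 11, "Quantity": 12}]})
rejected = None
try:
    orders.update(0, {"Details": "none"})
except ValidationError as err:
    rejected = str(err)
print(rejected)  # 'Details' should be list, got str
```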
  • 36. Conclusion: Takeaways
      • Design in this order: queries, indexes, data, application.
      • Capture your model outside the application.
      • Not having a schema is not a good thing! Use the 'schemaless' attribute wisely!
                  NoSQL
      Goal        current usage
      Answer to   what questions do I have?
      Step 1      write queries
      Step 2      add indexes
      Step 3      model data
      Step 4      write application