Got Documents?
AN EXPLORATION OF DOCUMENT DATABASES IN SOFTWARE
ARCHITECTURE
About Me
www.maggiepint.com
maggiepint@gmail.com
@maggiepint
https://www.tempworks.com
Flavors
MONGODB, COUCHDB, RAVENDB, AND MORE
MongoDB
•Dominant player in document databases
•Runs on nearly all platforms
•Strongly Consistent in default configuration
•Indexes are similar to traditional SQL indexes in nature
•Stores data in customized Binary JSON (BSON) format that allows typing
•No support for cross-collection querying
•Client APIs available in tons of languages
•Must use a third-party provider like Solr for advanced search capabilities
CouchDB
•Stores documents in plain JSON format
•Eventually consistent
•Indexes are map-reduce and defined in JavaScript
•Clients in many languages
•Runs on Linux, OSX and Windows
•CouchDB-Lucene provides a Lucene integration for search
RavenDB
•Stores documents in plain JSON format
•Eventually consistent
•Indexes are built on Lucene. Lucene search is native to RavenDB.
•Server only runs on Windows
•.NET, Java, and HTTP Clients
•Limited support for cross-collection querying
Other Players
•Azure DocumentDB
• Very new product from Microsoft
•ReactDB
• Open source project that integrates push notifications into the database
•Cloudant
• IBM proprietary implementation of CouchDB
Architectural Considerations
How do document databases work?
•Stores related data in a single document
•Usually uses JSON format for documents
•Enables the storage of complex object graphs together, instead of normalizing data out into
tables
•Stores documents in collections of the same type
•Allows querying within collections
•Does not typically allow querying across collections
•Offers high availability at the cost of consistency
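
As a rough illustration of the points above, a survey and everything hanging off it can be stored as one document. The collection name and field names here are hypothetical and not tied to any particular product.

```python
import json

# Hypothetical survey document: the whole object graph is stored together,
# rather than being split across Survey, Question, and Answer tables.
survey_doc = {
    "_id": "surveys/1",
    "title": "New hire paperwork",
    "questions": [
        {"text": "Preferred shift?", "answers": ["Day", "Night"]},
        {"text": "Own transportation?", "answers": ["Yes", "No"]},
    ],
}

# A collection is modeled here as a group of documents of the same type.
surveys = {survey_doc["_id"]: survey_doc}

# One lookup returns the full graph; no joins are needed.
loaded = surveys["surveys/1"]
answer_count = sum(len(q["answers"]) for q in loaded["questions"])

# Documents usually travel and persist as JSON.
serialized = json.dumps(loaded)
```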
Consideration: Schema Free
PROS
Easy to add properties
Simple migrations
Tolerant of differing data
CONS
Have to account for properties being missing
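
A minimal sketch of what "accounting for missing properties" tends to look like in application code. The documents and the nickname property are made up for illustration.

```python
# Two documents from the same hypothetical collection; the second was written
# before the "nickname" property was introduced.
users = [
    {"id": 1, "name": "Ada", "nickname": "Countess"},
    {"id": 2, "name": "Grace"},
]


def display_name(user: dict) -> str:
    # dict.get returns None instead of raising KeyError when the property
    # is absent from an older document, so we can fall back to the name.
    return user.get("nickname") or user["name"]


names = [display_name(u) for u in users]
```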
ACID
Atomicity
◦ Each transaction is all or nothing
Consistency
◦ Any transaction brings the database from one valid state to another
Isolation
◦ The system ensures that transactions executed concurrently bring the database to the same state as if they
had been executed serially
Durability
◦ Once a transaction is committed, it remains so even in the event of power loss, etc.
ACID in Document Databases
•Traditional transaction support is not available in any document database (except Raven)
•Document databases do support something like transactions within the scope of a document
•This makes document databases generally inappropriate for a wide variety of applications
• Do a Google search for Flexcoin
•RavenDB is very close to ACID, but the community doesn’t agree on whether it is ACID
Consideration: Non-ACID
PROS
Performance Gain
CONS
No isolation means that concurrent operations
can affect each other
No way to guarantee that operations succeed
or fail together
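
The second con can be sketched with a toy in-memory store whose writes are atomic per document only. TinyStore and its fail_on switch are hypothetical helpers that exist just to simulate a failure between two writes; they are not any real database API.

```python
class StoreError(Exception):
    pass


class TinyStore:
    """Hypothetical store: each save is atomic, but only for one document."""

    def __init__(self):
        self.docs = {}
        self.fail_on = None  # id of a document whose writes should fail

    def save(self, doc_id, doc):
        if doc_id == self.fail_on:
            raise StoreError(f"write to {doc_id} failed")
        self.docs[doc_id] = dict(doc)


def transfer(store, src, dst, amount):
    a = dict(store.docs[src])
    b = dict(store.docs[dst])
    a["balance"] -= amount
    b["balance"] += amount
    store.save(src, a)  # first write commits on its own
    store.save(dst, b)  # if this fails, the first write is NOT rolled back


store = TinyStore()
store.save("acct/a", {"balance": 100})
store.save("acct/b", {"balance": 0})

store.fail_on = "acct/b"  # simulate a failure on the second document
try:
    transfer(store, "acct/a", "acct/b", 40)
except StoreError:
    pass

# Money left account a but never arrived in account b.
total = store.docs["acct/a"]["balance"] + store.docs["acct/b"]["balance"]
```

With a multi-document transaction the first write would have been rolled back and the total would still be 100.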
Case Study: Survey System
Requirements
•An administration area is used to define ‘Surveys’.
• Surveys have Questions
• Questions have answers
•Surveys can be administered in sets called workflows
•When a survey changes, this change can only apply to surveys moving forward
• Because of this, each user must receive a survey ‘instance’ to track the version of the survey he/she got
A Traditional SQL Schema
•With various other requirements not described here, this schema came out to 83 tables
•For one of our heaviest usage clients, the average user would have 119 answers in the ‘Saved
Answer’ table
•With over 200,000 users after two years of use, the ‘Saved Answer’ table had 24,014,330 rows
•This table was both read and write heavy, so it was extremely difficult to define effective SQL
indexes
•The hardware cost for these SQL servers was astronomical
•This sucked
Designing Documents
•An aggregate is a collection of objects that can be treated as one
•An aggregate root is the object that contains all other objects inside of it
•When designing document schema, find your aggregates and create documents around them
•If you have an entity, it should be persisted as its own document because you will likely have to
store references to it
Survey System Design
•A combination SQL and Document DB design was used
•Survey Templates (one type of entity) were put into the SQL Database
•When a survey was assigned to a user as part of a workflow (another entity, and also an
aggregate), its data at that time was put into the document database
•The user’s responses were saved as part of the workflow document
•Reading a user’s application data became as simple as making one request for her workflow
document
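
A sketch of how such a workflow document might be assembled, assuming the survey templates can be read as plain dictionaries. The function name, ids, and fields are hypothetical; the slides do not show the actual schema.

```python
import copy

# Hypothetical survey template, standing in for data held in the SQL database.
template = {"survey_id": 7, "version": 1, "questions": ["Legal name?", "Start date?"]}


def assign_workflow(user_id, templates):
    # Deep-copy each template so that later edits to the template do not
    # alter the survey instance this user was given.
    return {
        "_id": f"workflows/{user_id}",
        "user_id": user_id,
        "surveys": [
            {"snapshot": copy.deepcopy(t), "responses": {}} for t in templates
        ],
    }


workflow = assign_workflow(42, [template])

# The template changes after assignment...
template["version"] = 2
template["questions"].append("Emergency contact?")

# ...and the user's responses are saved inside the same workflow document.
workflow["surveys"][0]["responses"]["Legal name?"] = "Ada Lovelace"
```

Because the snapshot and the responses live together, reading a user's data is a single fetch of the workflow document.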
Consideration: Models Aggregates Well
PROS
Improves performance by reducing lookups
Allows for easy persistence of object oriented
designs
CONS
none
Sharding
•Sharding is the practice of distributing data across multiple servers
•All major document database providers support sharding natively
•Document databases are ideal for sharding because document data is self-contained (less need
to worry about a query having to run on two servers)
•Sharding is usually accomplished by selecting a shard key for a collection, and allowing the
collection to be distributed to different nodes based on that key
•Tenant Id and geographic regions are typical choices for shard keys
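
One simple way to picture shard-key routing is hashing the key onto a list of nodes. This is only a sketch: the node names are invented, and real products may use range-based or other placement strategies instead.

```python
import hashlib

SHARDS = ["node-0", "node-1", "node-2"]  # hypothetical node names


def shard_for(shard_key: str) -> str:
    # Use a stable hash (Python's built-in hash() is randomized per process
    # for strings), then map it onto the list of nodes.
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


docs = [
    {"_id": "workflows/1", "tenant_id": "acme"},
    {"_id": "workflows/2", "tenant_id": "acme"},
    {"_id": "workflows/3", "tenant_id": "globex"},
]

# Every document with the same tenant id lands on the same node, so a
# query scoped to one tenant only needs to touch one server.
placement = {d["_id"]: shard_for(d["tenant_id"]) for d in docs}
```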
Replication
•All major document database providers support replication
•In most replication setups, a primary node takes all write operations, and a secondary node
asynchronously replicates these write operations
•In the event of a failure of the primary, the secondary begins to take write operations
•MongoDB can be configured to allow reads from secondaries as a performance optimization,
resulting in eventual instead of strong consistency
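
The stale-read effect of asynchronous replication can be imitated in a few lines. Primary and Secondary below are toy classes written for this illustration, not a real driver or server API.

```python
from collections import deque


class Primary:
    def __init__(self):
        self.data = {}
        self.log = deque()  # writes waiting to be shipped to the secondary

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Secondary:
    def __init__(self):
        self.data = {}

    def apply_pending(self, primary):
        # In a real system this runs in the background, some time later.
        while primary.log:
            key, value = primary.log.popleft()
            self.data[key] = value


primary, secondary = Primary(), Secondary()
primary.write("customers/1", {"name": "Acme"})

# Reading from the secondary before replication catches up misses the write.
stale_read = secondary.data.get("customers/1")

secondary.apply_pending(primary)
fresh_read = secondary.data.get("customers/1")
```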
Consideration: Scaling Out
PROS
Allows hardware to be scaled horizontally
Ensures very high availability
CONS
Consistency is sacrificed
Survey System: End Result
•Each user is associated with about 20 documents
•Documents are distributed across multiple databases using sharding
•Master/Master replication is used to ensure extremely high availability
•There have been no database performance issues in the year and a half the app has been in
production
•Because there is no schema migration concern, deploying updates has been drastically
simplified
•Hardware cost is reasonable (but not cheap)
Indexes
•All document databases support some form of indexing to improve query performance
•Some document databases do not allow querying without an index
•In general, you shouldn’t query without an index anyway
Consideration: Indexes
PROS
Improve performance of queries
CONS
Queries cannot reasonably be issued without
an index so indexes must frequently be
defined and deployed
Consideration: Eventual Consistency
PROS
Optimizes performance by allowing data
transfer to be a background process
CONS
Requires entire team to be aware of eventual
consistency implications
Case Study 2: CRM
CRM Requirements
•Track customers and basic information about them
•Track contacts and basic information about them
•Track sales deals and where they are in the pipeline
•Track orders generated from sales deals
•Track user tasks
Customers and Their Deals
•Customers and Deals are both entities, which is to say that they have distinct identity
•For this reason, Deals and Customer should be two separate collections
•There is no native support for cross-collection querying in most Document Databases
• The cross-collection querying support in RavenDB can have performance issues
Consideration: One document per interaction
PROS
Improves performance
Encourages modeling aggregates well
CONS
Not actually achievable in most cases
Searching Deals by Customer Name
•The deal document must contain a denormalized customer object with the customer’s ID and
name
•We have a choice to make with this denormalization
• Allow the denormalization to just be wrong in the event the customer name is changed
• Maintain the denormalization when the customer name is changed
Denormalization Considerations
•Is stale data acceptable? This is the best option in all cases where it is possible.
•If stale data is unacceptable, how many documents are likely to need update when a change is
made? How many collections? How often are changes going to be made?
•Using an event bus to move denormalization updates to a background process can be very
beneficial if failure of an update isn’t critical for the user to know
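
A sketch of the event-bus approach, using Python's standard queue as a stand-in for a real bus. The document shapes, ids, and function names are assumptions made for the example.

```python
import queue

customers = {"customers/1": {"name": "Acme"}}
deals = {
    "deals/1": {"customer": {"id": "customers/1", "name": "Acme"}, "amount": 500},
    "deals/2": {"customer": {"id": "customers/2", "name": "Globex"}, "amount": 900},
}
events = queue.Queue()  # stand-in for an event bus


def rename_customer(customer_id, new_name):
    # The user-facing write only touches the customer document.
    customers[customer_id]["name"] = new_name
    events.put(("customer_renamed", customer_id, new_name))


def process_events():
    # Background worker: fix up the denormalized copies on deal documents.
    while not events.empty():
        _, customer_id, new_name = events.get()
        for deal in deals.values():
            if deal["customer"]["id"] == customer_id:
                deal["customer"]["name"] = new_name


rename_customer("customers/1", "Acme Corp")
stale_name = deals["deals/1"]["customer"]["name"]  # not yet updated
process_events()
```

Until the worker runs, searches by customer name see the old value, which is exactly the stale-data window the questions above ask you to weigh.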
Consideration: Models Relationships Poorly
PROS
None
CONS
Stale (out of date) data must be accepted in
the system
Large amounts of boilerplate code must be
written to maintain denormalizations
In certain circumstances a queuing/eventing
system is unavoidable
Consideration: No Foreign Key Constraints
PROS
Don’t have to define foreign key constraints
CONS
No built-in checks for data consistency
Consideration: Administration
PROS
Generally less involved than SQL
CONS
Server performance must be monitored
Hardware must be maintained
Index processes must be tuned
Settings must be tweaked
Consideration Recap
•Schema Free
•Non-ACID
•Models Aggregates Well
•Scales out well
•All queries must be indexed
•Eventual Consistency
•One document per interaction
•Models relationships poorly
•No foreign key constraints
•Requires administration
RavenDB Bonus Section
ACID
•RavenDB has a session that allows multiple documents to be written as a transaction
•Keep in mind, reads from indexes are still eventually consistent
•http://ayende.com/blog/164066/ravendb-acid-base
Eventual Consistency
•Issues with eventual consistency can be circumvented by using the wait for non-stale results
functionality
•Waiting for non-stale results can result in long wait times
•Waiting for non-stale results with a timeout can result in no results
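
The trade-off in the last two bullets can be sketched with a polling loop. This is not RavenDB's client API: FakeIndex and query_waiting are hypothetical, and returning None on timeout is just one way to model "no results".

```python
import time


class FakeIndex:
    """Hypothetical index that catches up after a fixed number of polls."""

    def __init__(self, polls_until_fresh):
        self.remaining = polls_until_fresh
        self.results = ["deals/1"]

    def is_stale(self):
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False


def query_waiting(index, timeout_s, poll_s=0.01):
    deadline = time.monotonic() + timeout_s
    while index.is_stale():
        if time.monotonic() >= deadline:
            return None  # timed out: the caller gets no results
        time.sleep(poll_s)
    return index.results


# The index catches up quickly, so the caller waits briefly and gets results.
quick = query_waiting(FakeIndex(polls_until_fresh=2), timeout_s=1.0)

# The index never catches up in time, so the wait ends with nothing.
timed_out = query_waiting(FakeIndex(polls_until_fresh=10**9), timeout_s=0.05)
```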
Load Document
•RavenDB has limited support for cross-collection querying in the form of using LoadDocument
•This eliminates some of the concerns with the deals by customer name search example
•On Raven’s website they warn that injudicious use of LoadDocument can result in some very
expensive computations
Patching
•Raven supports partial document updates on a collection of documents using the Patching API
•This can be extremely helpful for maintaining denormalizations
•Patching is not transactional
Lucene Search
•RavenDB’s indexes are built on Lucene
•This allows easy full text search with term weighting and proximity searching
…nerds like us are allowed to be unironically
enthusiastic about stuff… Nerds are allowed to
love stuff, like jump-up-and-down-in-the-chair-
can’t-control-yourself love it.
-John Green

🐬 The future of MySQL is Postgres 🐘
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Got Documents - The Raven Bonus Edition

  • 1. Got Documents? AN EXPLORATION OF DOCUMENT DATABASES IN SOFTWARE ARCHITECTURE
  • 5. MongoDB •Dominant player in document databases •Runs on nearly all platforms •Strongly consistent in default configuration •Indexes are similar to traditional SQL indexes in nature •Stores data in a customized Binary JSON (BSON) format that allows typing •No support for cross-collection querying •Client APIs available in many languages •Must use a third-party provider like Solr for advanced search capabilities
  • 6. CouchDB •Stores documents in plain JSON format •Eventually consistent •Indexes are map-reduce and defined in JavaScript •Clients in many languages •Runs on Linux, OS X and Windows •CouchDB-Lucene provides a Lucene integration for search
  • 7. RavenDB •Stores documents in plain JSON format •Eventually consistent •Indexes are built on Lucene. Lucene search is native to RavenDB. •Server only runs on Windows •.NET, Java, and HTTP Clients •Limited support for cross-collection querying
  • 8. Other Players •Azure DocumentDB • Very new product from Microsoft •ReactDB • Open source project that integrates push notifications into the database •Cloudant • IBM proprietary implementation of CouchDB
  • 10. How do document databases work? •Stores related data in a single document •Usually uses JSON format for documents •Enables the storage of complex object graphs together, instead of normalizing data out into tables •Stores documents in collections of the same type •Allows querying within collections •Does not typically allow querying across collections •Offers high availability at the cost of consistency
  • 11. Consideration: Schema Free. PROS: Easy to add properties; simple migrations; tolerant of differing data. CONS: Have to account for properties being missing.
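
The con on slide 11 tends to surface as defensive reads in application code. A minimal sketch, assuming hypothetical survey documents held as plain Python dicts (the `title` and `tags` fields are illustrative and not from the talk):

```python
# Hypothetical documents: the second was written before `tags` existed.
docs = [
    {"id": "surveys/1", "title": "Onboarding", "tags": ["hr"]},
    {"id": "surveys/2", "title": "Exit interview"},
]


def tags_of(doc):
    # Treat a missing property as an empty list instead of raising KeyError.
    return doc.get("tags", [])


# Flatten tags across documents without caring which ones carry the field.
all_tags = [tag for doc in docs for tag in tags_of(doc)]
```

Every reader of the collection has to make the same choice of default, which is the hidden cost of adding properties so easily.
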
  • 12. ACID •Atomicity: each transaction is all or nothing •Consistency: any transaction brings the database from one valid state to another •Isolation: the system ensures that transactions executed concurrently bring the database to the same state as if they had been executed serially •Durability: once a transaction is committed, it remains so even in the event of power loss, etc.
  • 13. ACID in Document Databases •Traditional transaction support is not available in any document database (except Raven) •Document databases do support something like transactions within the scope of a single document •This makes document databases generally inappropriate for a wide variety of applications • Do a Google search for FlexCoin •RavenDB is very close to ACID, but the community doesn’t agree on whether it is ACID
  • 14. Consideration: Non-ACID. PROS: Performance gain. CONS: No isolation means that concurrent operations can affect each other; no way to guarantee that operations succeed or fail together.
  • 16. Requirements •An administration area is used to define ‘Surveys’. • Surveys have Questions • Questions have Answers •Surveys can be administered in sets called workflows •When a survey changes, the change can only apply to surveys assigned from then on • Because of this, each user must receive a survey ‘instance’ to track the version of the survey he/she got
  • 17. A Traditional SQL Schema •With various other requirements not described here, this schema came out to 83 tables •For one of our heaviest usage clients, the average user would have 119 answers in the ‘Saved Answer’ table •With over 200,000 users after two years of use, the ‘Saved Answer’ table had 24,014,330 rows •This table was both read and write heavy, so it was extremely difficult to define effective SQL indexes •The hardware cost for these SQL servers was astronomical •This sucked
  • 18. Designing Documents •An aggregate is a collection of objects that can be treated as one •An aggregate root is the object that contains all other objects inside of it •When designing document schema, find your aggregates and create documents around them •If you have an entity, it should be persisted as its own document because you will likely have to store references to it
  • 19. Survey System Design •A combination SQL and document DB design was used •Survey Templates (one type of entity) were put into the SQL database •When a survey was assigned to a user as part of a workflow (another entity, and also an aggregate), its data at that time was put into the document database •The user’s responses were saved as part of the workflow document •Reading a user’s application data became as simple as making one request for her workflow document
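
The design on slide 19 can be sketched as follows. This is only an illustration under stated assumptions: the template dict stands in for data loaded from SQL, and every field name and helper (`assign_workflow`, `save_response`) is hypothetical rather than taken from the real system.

```python
import copy

# Hypothetical survey template, standing in for rows loaded from the SQL database.
template = {
    "survey_id": 7,
    "version": 3,
    "questions": [
        {"id": 1, "text": "Preferred shift?", "answers": ["Day", "Night"]},
    ],
}


def assign_workflow(user_id, templates):
    # Snapshot each template at assignment time so that later template
    # edits do not change the survey instance this user received.
    return {
        "id": f"workflows/{user_id}",
        "user_id": user_id,
        "surveys": [dict(copy.deepcopy(t), responses={}) for t in templates],
    }


def save_response(workflow, survey_id, question_id, answer):
    # Responses live inside the workflow document (the aggregate root).
    for survey in workflow["surveys"]:
        if survey["survey_id"] == survey_id:
            survey["responses"][question_id] = answer
            return
    raise KeyError(survey_id)
```

Because the snapshot and the responses sit in one document, reading a user's data is a single fetch by id instead of a join across many tables.
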
  • 20. Consideration: Models Aggregates Well. PROS: Improves performance by reducing lookups; allows for easy persistence of object-oriented designs. CONS: None.
  • 21. Sharding •Sharding is the practice of distributing data across multiple servers •All major document database providers support sharding natively •Document databases are ideal for sharding because document data is self-contained (less need to worry about a query having to run on two servers) •Sharding is usually accomplished by selecting a shard key for a collection, and allowing the collection to be distributed to different nodes based on that key •Tenant ID and geographic region are typical choices for shard keys
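
To make the shard key idea on slide 21 concrete, here is a simplified sketch of routing by tenant ID. It is not how any particular database implements sharding (real systems also rebalance data as nodes are added), and the node names and `shard_for` helper are assumptions for illustration.

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical node names


def shard_for(tenant_id, shards=SHARDS):
    # Use a stable hash rather than Python's per-process salted hash(),
    # so the same tenant always routes to the same shard.
    digest = hashlib.sha256(str(tenant_id).encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]
```

All documents for one tenant land on one node, so a query scoped to that tenant never has to touch a second server.
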
  • 22. Replication •All major document database providers support replication •In most replication setups, a primary node takes all write operations, and a secondary node asynchronously replicates these write operations •In the event of a failure of the primary, the secondary begins to take write operations •MongoDB can be configured to allow reads from secondaries as a performance optimization, resulting in eventual instead of strong consistency
  • 23. Consideration: Scaling Out. PROS: Allows hardware to be scaled horizontally; ensures very high availability. CONS: Consistency is sacrificed.
  • 24. Survey System: End Result •Each user is associated with about 20 documents •Documents are distributed across multiple databases using sharding •Master/Master replication is used to ensure extremely high availability •There have been no database performance issues in the year and a half the app has been in production •Because there is no schema migration concern, deploying updates has been drastically simplified •Hardware cost is reasonable (but not cheap)
  • 26. Indexes •All document databases support some form of indexing to improve query performance •Some document databases do not allow querying without an index •In general, you shouldn’t query without an index anyway
  • 27. Consideration: Indexes. PROS: Improve performance of queries. CONS: Queries cannot reasonably be issued without an index, so indexes must frequently be defined and deployed.
  • 29. Consideration: Eventual Consistency. PROS: Optimizes performance by allowing data transfer to be a background process. CONS: Requires the entire team to be aware of eventual consistency implications.
  • 31. CRM Requirements •Track customers and basic information about them •Track contacts and basic information about them •Track sales deals and where they are in the pipeline •Track orders generated from sales deals •Track user tasks
  • 32. Customers and Their Deals •Customers and Deals are both entities, which is to say that they have distinct identity •For this reason, Deals and Customer should be two separate collections •There is no native support for cross-collection querying in most Document Databases • The cross-collection querying support in RavenDB can have performance issues
  • 33. Consideration: One Document per Interaction. PROS: Improves performance; encourages modeling aggregates well. CONS: Not actually achievable in most cases.
  • 34. Searching Deals by Customer Name •The deal document must contain a denormalized customer object with the customer’s ID and name •We have a choice to make with this denormalization • Allow the denormalization to just be wrong in the event the customer name is changed • Maintain the denormalization when the customer name is changed
  • 35. Denormalization Considerations •Is stale data acceptable? This is the best option in all cases where it is possible. •If stale data is unacceptable, how many documents are likely to need update when a change is made? How many collections? How often are changes going to be made? •Using an event bus to move denormalization updates to a background process can be very beneficial if failure of an update isn’t critical for the user to know
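
The trade-off on slides 34 and 35 can be sketched with two in-memory collections. The dicts stand in for the Customers and Deals collections; the ids, field names, and helper functions are hypothetical.

```python
# In-memory stand-ins for the two collections.
customers = {"customers/1": {"id": "customers/1", "name": "Acme"}}
deals = {
    "deals/1": {"id": "deals/1", "stage": "proposal",
                "customer": {"id": "customers/1", "name": "Acme"}},
    "deals/2": {"id": "deals/2", "stage": "won",
                "customer": {"id": "customers/1", "name": "Acme"}},
}


def deals_by_customer_name(name):
    # The search needs only the Deals collection because each deal
    # carries a denormalized copy of the customer's id and name.
    return sorted(d["id"] for d in deals.values() if d["customer"]["name"] == name)


def rename_customer(customer_id, new_name):
    customers[customer_id]["name"] = new_name
    # Maintenance step, separate from the write above: until it runs,
    # the deals still carry the stale name.
    updated = 0
    for deal in deals.values():
        if deal["customer"]["id"] == customer_id:
            deal["customer"]["name"] = new_name
            updated += 1
    return updated
```

In a real system the loop in `rename_customer` is the part that could move to a background handler on an event bus, provided the user does not need to know whether it succeeded.
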
  • 36. Consideration: Models Relationships Poorly. PROS: None. CONS: Stale (out-of-date) data must be accepted in the system; large amounts of boilerplate code must be written to maintain denormalizations; in certain circumstances a queuing/eventing system is unavoidable.
  • 37. Consideration: No Foreign Key Constraints. PROS: Don’t have to define foreign key constraints. CONS: No built-in checks for data consistency.
  • 39. Consideration: Administration. PROS: Generally less involved than SQL. CONS: Server performance must be monitored; hardware must be maintained; index processes must be tuned; settings must be tweaked.
  • 40. Consideration Recap •Schema free •Non-ACID •Models aggregates well •Scales out well •All queries must be indexed •Eventual consistency •One document per interaction •Models relationships poorly •No foreign key constraints •Requires administration
  • 42. ACID •RavenDB has a session that allows multiple documents to be written as a transaction •Keep in mind, reads from indexes are still eventually consistent •http://ayende.com/blog/164066/ravendb-acid-base
  • 43. Eventual Consistency •Issues with eventual consistency can be circumvented by using the wait for non-stale results functionality •Waiting for non-stale results can result in long wait times •Waiting for non-stale results with a timeout can result in no results
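
The behaviour described on slide 43 can be illustrated with a generic polling loop. This is not RavenDB's client API: `run_query` is a hypothetical callable that returns a `(results, is_stale)` pair, and the loop only shows why a timeout can end with no results.

```python
import time


def wait_for_non_stale(run_query, timeout_seconds, poll_seconds=0.01):
    # run_query is a hypothetical callable returning (results, is_stale).
    deadline = time.monotonic() + timeout_seconds
    while True:
        results, is_stale = run_query()
        if not is_stale:
            return results
        if time.monotonic() >= deadline:
            # The index never caught up in time: the caller gets nothing.
            return None
        time.sleep(poll_seconds)
```

A long timeout trades latency for freshness, while a short one trades freshness for the risk of an empty answer, which is the choice the slide describes.
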
  • 44. Load Document •RavenDB has limited support for cross collection querying in the form of using LoadDocument •This eliminates some of the concerns with the deals by customer name search example •On Raven’s website they warn that injudicious use of LoadDocument can result in some very expensive computations
  • 45. Patching •Raven supports partial document updates on a collection of documents using the Patching API •This can be extremely helpful for maintaining denormalizations •Patching is not transactional
  • 46. Lucene Search •RavenDB’s indexes are built on Lucene •This allows easy full text search with term weighting and proximity searching
  • 47. “…nerds like us are allowed to be unironically enthusiastic about stuff… Nerds are allowed to love stuff, like jump-up-and-down-in-the-chair-can’t-control-yourself love it.” -John Green