SlideShare a Scribd company logo
1 of 29
Modeling with Document
Databases:
5 Key Patterns
Dan Sullivan, Principal
DS Applied Technologies
Enterprise Data World 2015
Washington, D.C.
April 1, 2015
My Background
 Data Architect / Engineer
 NoSQL and relational data
modeler
 Big data
 Analytics, machine learning
and text mining
 Cloud computing

 Computational Biologist
 Author
 No SQL for Mere Mortals
 Contributor to TechTarget
Overview
 Patterns and Data Modeling
 5 Key Patterns
 Anti-Patterns to Avoid
 Last suggestions
{
--------------
--------------
--------------
{ ------
------ }
}
[ ----------- ]
}
Document
{
--------------
--------------
--------------
{ ------
------ }
}
[ ----------- ]
}
Document
{
--------------
--------------
--------------
{ ------
------ }
}
[ ----------- ]
}
Document
{
--------------
--------------
--------------
{ ------
------ }
}
[ ----------- ]
}
Document
Patterns and Data Modeling
Schema-less <> Model-less
 Schema-less Document
Databases
 No fixed schema
 Polymorphic documents

 ...however, not a Design
Free for All
 Queries drives organization
 Performance Considerations
 Long-term Maintenance

 Middle Ground: Data
Model Patterns
 Reusable methods for
organizing data
 Model is implicit in
document structures
Patterns
 Commonly used structure and
organization
 Abstraction, not implementation
specific
 Popularized by “Gang of Four”
 Applied to Relational Databases
by David C. Hay
 Applies to NoSQL
Patterns
 Pattern 1: One-to-Many
 Pattern 2: Many-to-Many
 Pattern 3: Trees with References
 Pattern 4: Trees with Materialized Views
 Pattern 5: Entity Aggregation
Pattern 1: One-to-Many
 Embed Documents
 Multiple documents
embedded
 “Many” attributes stored
with “One” document
 Pros
 Single fetch returns
primary and related data
 Might improve
performance
 Simplifies application
code
 Cons
 Increases document size
 Might degrade
performance
{
OrderID: 1837373,
customer : {Name: 'Jane Lox'
Addr: '123 Main St'
City: 'Boston'
State: 'MA'},
orderItem:{ Sku: 38383838,
Descr: 'Black chair'},
orderItem:{ Sku: 2872636,
Descr: 'Glass desk'},
orderItem:{ Sku: 4747433,
Descr: 'USB Drive 32GB''}
}
One-to-Many Considerations
 Query attributes in
embedded documents?
 Support for indexing
embedded documents?
 Potential for arbitrary
growth after record
created?
 Need for atomic writes?
Patterns
 Pattern 1: One-to-Many
 Pattern 2: Many-to-Many
 Pattern 3: Trees with References
 Pattern 4: Trees with Materialized Views
 Pattern 5: Entity Aggregation
Pattern 2: Many-to-Many
Employees
({empID: 1783,
pname: “Michelle”,
lname:”Jones”
projects: [487,973, 287]}
{empID: 9872,
pname: “Bob”,
lname:”Williams”
projects: [487,973, 121]})
Projects
({projID:121,
projName:'NoSQL Pilot'',
team: [9872, 2431,
{projID:487,
projName:'Customer Churn
Analysis'',
team: [1873,9872]})
References
 Minimizes redundancy
 Preserves integrity
 Reduces document growth
 Requires multiple reads
Pattern 2: Many-to-Many
Employee
{empID: 1783,
pname: “Michelle”,
lname:”Jones”
projects: [
{projID:121,
projName:'NoSQL Pilot''},
{projID:487,
projName:'Customer Churn
Analysis''}
]}
Project
{projID:121,
projName:'NoSQL Pilot'',
team: [
{ empID: 1783,
fname: “Michelle”,
lname:”Jones”},
{ empID: 9872,
fname: “Bob”,
lname:”Williams”}
]}
Embedded Documents
 Captures point in time data
 One document read retrieves
data
 Increases document growth
Many-to-Many Considerations
 References
 Minimizes redundancy
 Preserves integrity
 Reduces document growth
 Requires multiple reads
 Embedded Documents
 Captures point in time data
 One document read retrieves
data
 Increases document growth
Patterns
 Pattern 1: One-to-Many
 Pattern 2: Many-to-Many
 Pattern 3: Trees with References
 Pattern 4: Trees with Materialized Views
 Pattern 5: Entity Aggregation
Pattern 3: Trees with Parent & Child
References
 Trees
 Single root
document
 At most one parent
 No cycles
 Multiple Types
 Is-A
 Part-of
Pattern 3: Trees with References
Children Refs.
({orgUnitID:178,
orgUnitType: “Primary”,
orgUnitName:”P1”
children: [179,180]},
{orgUnitID:179,
orgUnitType: “Branch”,
orgUnitName:”B1”
children: [181,182]},
{orgUnitID:180,
orgUnitType: “Branch”,
orgUnitName:”B2”
children: [183,184]})
Parent Refs.
({orgUnitID:178,
orgUnitType: “Primary”,
orgUnitName:”P1”
parent: 177},
{orgUnitID:179,
orgUnitType: “Branch”,
orgUnitName:”B1”
parent: 178},
{orgUnitID:180,
orgUnitType: “Branch”,
orgUnitName:”B2”
parent: 178})
Tree Considerations
 Children reference allow for
top-down navigation
 Parent references allow for-
bottom up navigation
 Combination allow for
bottom-up and top-down
navigation
 Avoid large arrays
 Consider need for point in
time data
Patterns
 Pattern 1: One-to-Many
 Pattern 2: Many-to-Many
 Pattern 3: Trees with References
 Pattern 4: Trees with Materialized Views
 Pattern 5: Entity Aggregation
Pattern 4: Trees with Materialized
Paths
 Full path from document
to root is represented in
document
 Implement with arrays or
string
 Especially useful in
hierarchical queries, i.e. a
type and all its subtypes
Materialized Paths
{orderItemId: 1873,
prodType: “Pens”
prodHierarchy:[ “Product_Categories”,
”Office Supplies”,
”Writing Instruments”,
”Pens”]
}
Materialized Paths Considerations
 Support for multi-key
indexing of arrays
 Use of regular expressions
for pattern matching when
string is used
 Ability to utilize indexes
when string representation
Patterns
 Pattern 1: One-to-Many
 Pattern 2: Many-to-Many
 Pattern 3: Trees with References
 Pattern 4: Trees with Materialized Views
 Pattern 5: Entity Aggregation
Pattern 5: Entity Aggregation
 Entities with sub-types
 Relational models use
multiple tables
 Document models use
varying embedded
documents
 Source of
polymorphism
Entity Aggregation Polymorphic
Documents
{concertID: 132,
locDescr:'Small Bar',
price:”$30.00”,
performerName:”Rolling Stones”}
{concertID: 133,
locDescr: “PDX Jazz Festival”
price:”$75.00”
festivalStart: 15-Feb
festivalEnd: 25-Feb}
Entity Aggregation Considerations
 Aggregation vs Separate
Collections
 High level branching
 Low level branching
Anti-Patterns
Anti-Patterns
 Large arrays
 Significant growth in
document size
 Fetching more data than
needed
 Fear of data duplication
 Thinking SQL, using
NoSQL
 Normalizing without need
Closing Remarks
 Consider, is it worth it to
loose:
 SQL
 Multi-statement transactions
 Triggers
 Let queries drive model
 Consider full life-cycle
 Exploit polymorphism
Questions?

More Related Content

What's hot

Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...Edureka!
 
Advanced Dimensional Modelling
Advanced Dimensional ModellingAdvanced Dimensional Modelling
Advanced Dimensional ModellingVincent Rainardi
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 
الوحدة السادسة - قاعدة البيانات وادارتها
الوحدة السادسة - قاعدة البيانات وادارتهاالوحدة السادسة - قاعدة البيانات وادارتها
الوحدة السادسة - قاعدة البيانات وادارتهاAmin Abu Hammad
 
A short introduction to database systems.ppt
A short introduction to  database systems.pptA short introduction to  database systems.ppt
A short introduction to database systems.pptMuruly Krishan
 
Introduction To Data Warehousing
Introduction To Data WarehousingIntroduction To Data Warehousing
Introduction To Data WarehousingAlex Meadows
 
Bases de données NoSQL
Bases de données NoSQLBases de données NoSQL
Bases de données NoSQLSamy Dindane
 
Dimensional Modeling Basic Concept with Example
Dimensional Modeling Basic Concept with ExampleDimensional Modeling Basic Concept with Example
Dimensional Modeling Basic Concept with ExampleSajjad Zaheer
 
Delivering Data Democratization in the Cloud with Snowflake
Delivering Data Democratization in the Cloud with SnowflakeDelivering Data Democratization in the Cloud with Snowflake
Delivering Data Democratization in the Cloud with SnowflakeKent Graziano
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - DatalakeLam Le
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation Brett VanderPlaats
 
Database Management System, Lecture-1
Database Management System, Lecture-1Database Management System, Lecture-1
Database Management System, Lecture-1Sonia Mim
 
Demystifying Data Warehouse as a Service
Demystifying Data Warehouse as a ServiceDemystifying Data Warehouse as a Service
Demystifying Data Warehouse as a ServiceSnowflake Computing
 
Database systems - Chapter 2
Database systems - Chapter 2Database systems - Chapter 2
Database systems - Chapter 2shahab3
 
Applied Data Science Part 3: Getting dirty; data preparation and feature crea...
Applied Data Science Part 3: Getting dirty; data preparation and feature crea...Applied Data Science Part 3: Getting dirty; data preparation and feature crea...
Applied Data Science Part 3: Getting dirty; data preparation and feature crea...Dataiku
 

What's hot (20)

Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Wareho...
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Advanced Dimensional Modelling
Advanced Dimensional ModellingAdvanced Dimensional Modelling
Advanced Dimensional Modelling
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
الوحدة السادسة - قاعدة البيانات وادارتها
الوحدة السادسة - قاعدة البيانات وادارتهاالوحدة السادسة - قاعدة البيانات وادارتها
الوحدة السادسة - قاعدة البيانات وادارتها
 
A short introduction to database systems.ppt
A short introduction to  database systems.pptA short introduction to  database systems.ppt
A short introduction to database systems.ppt
 
Introduction To Data Warehousing
Introduction To Data WarehousingIntroduction To Data Warehousing
Introduction To Data Warehousing
 
Oracle DB
Oracle DBOracle DB
Oracle DB
 
Data models
Data modelsData models
Data models
 
Bases de données NoSQL
Bases de données NoSQLBases de données NoSQL
Bases de données NoSQL
 
Dimensional Modeling Basic Concept with Example
Dimensional Modeling Basic Concept with ExampleDimensional Modeling Basic Concept with Example
Dimensional Modeling Basic Concept with Example
 
Snowflake Data Access.pptx
Snowflake Data Access.pptxSnowflake Data Access.pptx
Snowflake Data Access.pptx
 
Delivering Data Democratization in the Cloud with Snowflake
Delivering Data Democratization in the Cloud with SnowflakeDelivering Data Democratization in the Cloud with Snowflake
Delivering Data Democratization in the Cloud with Snowflake
 
DBMS Bascis
DBMS BascisDBMS Bascis
DBMS Bascis
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation
 
Database Management System, Lecture-1
Database Management System, Lecture-1Database Management System, Lecture-1
Database Management System, Lecture-1
 
Demystifying Data Warehouse as a Service
Demystifying Data Warehouse as a ServiceDemystifying Data Warehouse as a Service
Demystifying Data Warehouse as a Service
 
Database systems - Chapter 2
Database systems - Chapter 2Database systems - Chapter 2
Database systems - Chapter 2
 
Applied Data Science Part 3: Getting dirty; data preparation and feature crea...
Applied Data Science Part 3: Getting dirty; data preparation and feature crea...Applied Data Science Part 3: Getting dirty; data preparation and feature crea...
Applied Data Science Part 3: Getting dirty; data preparation and feature crea...
 

Similar to Modeling with Document Database: 5 Key Patterns

Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...NoSQLmatters
 
The Fine Art of Schema Design in MongoDB: Dos and Don'ts
The Fine Art of Schema Design in MongoDB: Dos and Don'tsThe Fine Art of Schema Design in MongoDB: Dos and Don'ts
The Fine Art of Schema Design in MongoDB: Dos and Don'tsMatias Cascallares
 
NOSQL and MongoDB Database
NOSQL and MongoDB DatabaseNOSQL and MongoDB Database
NOSQL and MongoDB DatabaseTariqul islam
 
Squirrel – Enabling Accessible Analytics for All
Squirrel – Enabling Accessible Analytics for AllSquirrel – Enabling Accessible Analytics for All
Squirrel – Enabling Accessible Analytics for AllSudipta Mukherjee
 
A Robust Keywords Based Document Retrieval by Utilizing Advanced Encryption S...
A Robust Keywords Based Document Retrieval by Utilizing Advanced Encryption S...A Robust Keywords Based Document Retrieval by Utilizing Advanced Encryption S...
A Robust Keywords Based Document Retrieval by Utilizing Advanced Encryption S...IRJET Journal
 
Elasticsearch an overview
Elasticsearch   an overviewElasticsearch   an overview
Elasticsearch an overviewAmit Juneja
 
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMINGEVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMINGijiert bestjournal
 
CS8091_BDA_Unit_V_NoSQL
CS8091_BDA_Unit_V_NoSQLCS8091_BDA_Unit_V_NoSQL
CS8091_BDA_Unit_V_NoSQLPalani Kumar
 
Multi-model Databases and Tightly Integrated Polystores
Multi-model Databases and Tightly Integrated PolystoresMulti-model Databases and Tightly Integrated Polystores
Multi-model Databases and Tightly Integrated PolystoresJiaheng Lu
 
Being RDBMS Free -- Alternate Approaches to Data Persistence
Being RDBMS Free -- Alternate Approaches to Data PersistenceBeing RDBMS Free -- Alternate Approaches to Data Persistence
Being RDBMS Free -- Alternate Approaches to Data PersistenceDavid Hoerster
 
Core data in Swfit
Core data in SwfitCore data in Swfit
Core data in Swfitallanh0526
 
Amazon Athena Hands-On Workshop
Amazon Athena Hands-On WorkshopAmazon Athena Hands-On Workshop
Amazon Athena Hands-On WorkshopDoiT International
 
A STUDY ON GRAPH STORAGE DATABASE OF NOSQL
A STUDY ON GRAPH STORAGE DATABASE OF NOSQLA STUDY ON GRAPH STORAGE DATABASE OF NOSQL
A STUDY ON GRAPH STORAGE DATABASE OF NOSQLijscai
 
A Study on Graph Storage Database of NOSQL
A Study on Graph Storage Database of NOSQLA Study on Graph Storage Database of NOSQL
A Study on Graph Storage Database of NOSQLIJSCAI Journal
 
A STUDY ON GRAPH STORAGE DATABASE OF NOSQL
A STUDY ON GRAPH STORAGE DATABASE OF NOSQLA STUDY ON GRAPH STORAGE DATABASE OF NOSQL
A STUDY ON GRAPH STORAGE DATABASE OF NOSQLijscai
 
A Study on Graph Storage Database of NOSQL
A Study on Graph Storage Database of NOSQLA Study on Graph Storage Database of NOSQL
A Study on Graph Storage Database of NOSQLIJSCAI Journal
 

Similar to Modeling with Document Database: 5 Key Patterns (20)

Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
 
The Fine Art of Schema Design in MongoDB: Dos and Don'ts
The Fine Art of Schema Design in MongoDB: Dos and Don'tsThe Fine Art of Schema Design in MongoDB: Dos and Don'ts
The Fine Art of Schema Design in MongoDB: Dos and Don'ts
 
NOSQL and MongoDB Database
NOSQL and MongoDB DatabaseNOSQL and MongoDB Database
NOSQL and MongoDB Database
 
Squirrel – Enabling Accessible Analytics for All
Squirrel – Enabling Accessible Analytics for AllSquirrel – Enabling Accessible Analytics for All
Squirrel – Enabling Accessible Analytics for All
 
A Robust Keywords Based Document Retrieval by Utilizing Advanced Encryption S...
A Robust Keywords Based Document Retrieval by Utilizing Advanced Encryption S...A Robust Keywords Based Document Retrieval by Utilizing Advanced Encryption S...
A Robust Keywords Based Document Retrieval by Utilizing Advanced Encryption S...
 
nosql.pptx
nosql.pptxnosql.pptx
nosql.pptx
 
Elasticsearch an overview
Elasticsearch   an overviewElasticsearch   an overview
Elasticsearch an overview
 
Cassandra data modelling best practices
Cassandra data modelling best practicesCassandra data modelling best practices
Cassandra data modelling best practices
 
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMINGEVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
 
MongoDB and Schema Design
MongoDB and Schema DesignMongoDB and Schema Design
MongoDB and Schema Design
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
CS8091_BDA_Unit_V_NoSQL
CS8091_BDA_Unit_V_NoSQLCS8091_BDA_Unit_V_NoSQL
CS8091_BDA_Unit_V_NoSQL
 
Multi-model Databases and Tightly Integrated Polystores
Multi-model Databases and Tightly Integrated PolystoresMulti-model Databases and Tightly Integrated Polystores
Multi-model Databases and Tightly Integrated Polystores
 
Being RDBMS Free -- Alternate Approaches to Data Persistence
Being RDBMS Free -- Alternate Approaches to Data PersistenceBeing RDBMS Free -- Alternate Approaches to Data Persistence
Being RDBMS Free -- Alternate Approaches to Data Persistence
 
Core data in Swfit
Core data in SwfitCore data in Swfit
Core data in Swfit
 
Amazon Athena Hands-On Workshop
Amazon Athena Hands-On WorkshopAmazon Athena Hands-On Workshop
Amazon Athena Hands-On Workshop
 
A STUDY ON GRAPH STORAGE DATABASE OF NOSQL
A STUDY ON GRAPH STORAGE DATABASE OF NOSQLA STUDY ON GRAPH STORAGE DATABASE OF NOSQL
A STUDY ON GRAPH STORAGE DATABASE OF NOSQL
 
A Study on Graph Storage Database of NOSQL
A Study on Graph Storage Database of NOSQLA Study on Graph Storage Database of NOSQL
A Study on Graph Storage Database of NOSQL
 
A STUDY ON GRAPH STORAGE DATABASE OF NOSQL
A STUDY ON GRAPH STORAGE DATABASE OF NOSQLA STUDY ON GRAPH STORAGE DATABASE OF NOSQL
A STUDY ON GRAPH STORAGE DATABASE OF NOSQL
 
A Study on Graph Storage Database of NOSQL
A Study on Graph Storage Database of NOSQLA Study on Graph Storage Database of NOSQL
A Study on Graph Storage Database of NOSQL
 

More from Dan Sullivan, Ph.D.

How to Design a Modern Data Warehouse in BigQuery
How to Design a Modern Data Warehouse in BigQueryHow to Design a Modern Data Warehouse in BigQuery
How to Design a Modern Data Warehouse in BigQueryDan Sullivan, Ph.D.
 
With Automated ML, is Everyone an ML Engineer?
With Automated ML, is Everyone an ML Engineer?With Automated ML, is Everyone an ML Engineer?
With Automated ML, is Everyone an ML Engineer?Dan Sullivan, Ph.D.
 
Getting Started with BigQuery ML
Getting Started with BigQuery MLGetting Started with BigQuery ML
Getting Started with BigQuery MLDan Sullivan, Ph.D.
 
Google Cloud Certifications & Machine Learning
Google Cloud Certifications & Machine LearningGoogle Cloud Certifications & Machine Learning
Google Cloud Certifications & Machine LearningDan Sullivan, Ph.D.
 
Unstructured text to structured data
Unstructured text to structured dataUnstructured text to structured data
Unstructured text to structured dataDan Sullivan, Ph.D.
 
A first look at tf idf-pdx data science meetup
A first look at tf idf-pdx data science meetupA first look at tf idf-pdx data science meetup
A first look at tf idf-pdx data science meetupDan Sullivan, Ph.D.
 
ACID vs BASE in NoSQL: Another False Dichotomy
ACID vs BASE in NoSQL: Another False DichotomyACID vs BASE in NoSQL: Another False Dichotomy
ACID vs BASE in NoSQL: Another False DichotomyDan Sullivan, Ph.D.
 
Big data, bioscience and the cloud biocatalyst june 2015 sullivan
Big data, bioscience and the cloud   biocatalyst june 2015 sullivanBig data, bioscience and the cloud   biocatalyst june 2015 sullivan
Big data, bioscience and the cloud biocatalyst june 2015 sullivanDan Sullivan, Ph.D.
 
Tools and Techniques for Analyzing Texts: Tweets to Intellectual Property
Tools and Techniques for Analyzing Texts: Tweets to Intellectual PropertyTools and Techniques for Analyzing Texts: Tweets to Intellectual Property
Tools and Techniques for Analyzing Texts: Tweets to Intellectual PropertyDan Sullivan, Ph.D.
 
Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2
Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2
Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2Dan Sullivan, Ph.D.
 
Text Mining for Biocuration of Bacterial Infectious Diseases
Text Mining for Biocuration of Bacterial Infectious DiseasesText Mining for Biocuration of Bacterial Infectious Diseases
Text Mining for Biocuration of Bacterial Infectious DiseasesDan Sullivan, Ph.D.
 
Limits of RDBMS and Need for NoSQL in Bioinformatics
Limits of RDBMS and Need for NoSQL in BioinformaticsLimits of RDBMS and Need for NoSQL in Bioinformatics
Limits of RDBMS and Need for NoSQL in BioinformaticsDan Sullivan, Ph.D.
 

More from Dan Sullivan, Ph.D. (13)

How to Design a Modern Data Warehouse in BigQuery
How to Design a Modern Data Warehouse in BigQueryHow to Design a Modern Data Warehouse in BigQuery
How to Design a Modern Data Warehouse in BigQuery
 
With Automated ML, is Everyone an ML Engineer?
With Automated ML, is Everyone an ML Engineer?With Automated ML, is Everyone an ML Engineer?
With Automated ML, is Everyone an ML Engineer?
 
Getting Started with BigQuery ML
Getting Started with BigQuery MLGetting Started with BigQuery ML
Getting Started with BigQuery ML
 
Google Cloud Certifications & Machine Learning
Google Cloud Certifications & Machine LearningGoogle Cloud Certifications & Machine Learning
Google Cloud Certifications & Machine Learning
 
Unstructured text to structured data
Unstructured text to structured dataUnstructured text to structured data
Unstructured text to structured data
 
A first look at tf idf-pdx data science meetup
A first look at tf idf-pdx data science meetupA first look at tf idf-pdx data science meetup
A first look at tf idf-pdx data science meetup
 
Text mining meets neural nets
Text mining meets neural netsText mining meets neural nets
Text mining meets neural nets
 
ACID vs BASE in NoSQL: Another False Dichotomy
ACID vs BASE in NoSQL: Another False DichotomyACID vs BASE in NoSQL: Another False Dichotomy
ACID vs BASE in NoSQL: Another False Dichotomy
 
Big data, bioscience and the cloud biocatalyst june 2015 sullivan
Big data, bioscience and the cloud   biocatalyst june 2015 sullivanBig data, bioscience and the cloud   biocatalyst june 2015 sullivan
Big data, bioscience and the cloud biocatalyst june 2015 sullivan
 
Tools and Techniques for Analyzing Texts: Tweets to Intellectual Property
Tools and Techniques for Analyzing Texts: Tweets to Intellectual PropertyTools and Techniques for Analyzing Texts: Tweets to Intellectual Property
Tools and Techniques for Analyzing Texts: Tweets to Intellectual Property
 
Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2
Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2
Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2
 
Text Mining for Biocuration of Bacterial Infectious Diseases
Text Mining for Biocuration of Bacterial Infectious DiseasesText Mining for Biocuration of Bacterial Infectious Diseases
Text Mining for Biocuration of Bacterial Infectious Diseases
 
Limits of RDBMS and Need for NoSQL in Bioinformatics
Limits of RDBMS and Need for NoSQL in BioinformaticsLimits of RDBMS and Need for NoSQL in Bioinformatics
Limits of RDBMS and Need for NoSQL in Bioinformatics
 

Recently uploaded

👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...karishmasinghjnh
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...gajnagarg
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...amitlee9823
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...gajnagarg
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...gajnagarg
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 

Recently uploaded (20)

👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 

Modeling with Document Database: 5 Key Patterns

  • 1. Modeling with Document Databases: 5 Key Patterns Dan Sullivan, Principal DS Applied Technologies Enterprise Data World 2015 Washington, D.C. April 1, 2015
  • 2. My Background  Data Architect / Engineer  NoSQL and relational data modeler  Big data  Analytics, machine learning and text mining  Cloud computing   Computational Biologist  Author  No SQL for Mere Mortals  Contributor to TechTarget
  • 3. Overview  Patterns and Data Modeling  5 Key Patterns  Anti-Patterns to Avoid  Last suggestions { -------------- -------------- -------------- { ------ ------ } } [ ----------- ] } Document { -------------- -------------- -------------- { ------ ------ } } [ ----------- ] } Document { -------------- -------------- -------------- { ------ ------ } } [ ----------- ] } Document { -------------- -------------- -------------- { ------ ------ } } [ ----------- ] } Document
  • 4. Patterns and Data Modeling
  • 5. Schema-less <> Model-less  Schema-less Document Databases  No fixed schema  Polymorphic documents   ...however, not a Design Free for All  Queries drives organization  Performance Considerations  Long-term Maintenance   Middle Ground: Data Model Patterns  Reusable methods for organizing data  Model is implicit in document structures
  • 6. Patterns  Commonly used structure and organization  Abstraction, not implementation specific  Popularized by “Gang of Four”  Applied to Relational Databases by David C. Hay  Applies to NoSQL
  • 7. Patterns  Pattern 1: One-to-Many  Pattern 2: Many-to-Many  Pattern 3: Trees with References  Pattern 4: Trees with Materialized Views  Pattern 5: Entity Aggregation
  • 8. Pattern 1: One-to-Many  Embed Documents  Multiple documents embedded  “Many” attributes stored with “One” document  Pros  Single fetch returns primary and related data  Might improve performance  Simplifies application code  Cons  Increases document size  Might degrade performance { OrderID: 1837373, customer : {Name: 'Jane Lox' Addr: '123 Main St' City: 'Boston' State: 'MA'}, orderItem:{ Sku: 38383838, Descr: 'Black chair'}, orderItem:{ Sku: 2872636, Descr: 'Glass desk'}, orderItem:{ Sku: 4747433, Descr: 'USB Drive 32GB''} }
  • 9. One-to-Many Considerations  Query attributes in embedded documents?  Support for indexing embedded documents?  Potential for arbitrary growth after record created?  Need for atomic writes?
  • 10. Patterns  Pattern 1: One-to-Many  Pattern 2: Many-to-Many  Pattern 3: Trees with References  Pattern 4: Trees with Materialized Views  Pattern 5: Entity Aggregation
  • 11. Pattern 2: Many-to-Many Employees ({empID: 1783, pname: “Michelle”, lname:”Jones” projects: [487,973, 287]} {empID: 9872, pname: “Bob”, lname:”Williams” projects: [487,973, 121]}) Projects ({projID:121, projName:'NoSQL Pilot'', team: [9872, 2431, {projID:487, projName:'Customer Churn Analysis'', team: [1873,9872]}) References  Minimizes redundancy  Preserves integrity  Reduces document growth  Requires multiple reads
  • 12. Pattern 2: Many-to-Many Employee {empID: 1783, pname: “Michelle”, lname:”Jones” projects: [ {projID:121, projName:'NoSQL Pilot''}, {projID:487, projName:'Customer Churn Analysis''} ]} Project {projID:121, projName:'NoSQL Pilot'', team: [ { empID: 1783, fname: “Michelle”, lname:”Jones”}, { empID: 9872, fname: “Bob”, lname:”Williams”} ]} Embedded Documents  Captures point in time data  One document read retrieves data  Increases document growth
  • 13. Many-to-Many Considerations  References  Minimizes redundancy  Preserves integrity  Reduces document growth  Requires multiple reads  Embedded Documents  Captures point in time data  One document read retrieves data  Increases document growth
  • 14. Patterns  Pattern 1: One-to-Many  Pattern 2: Many-to-Many  Pattern 3: Trees with References  Pattern 4: Trees with Materialized Views  Pattern 5: Entity Aggregation
  • 15. Pattern 3: Trees with Parent & Child References  Trees  Single root document  At most one parent  No cycles  Multiple Types  Is-A  Part-of
  • 16. Pattern 3: Trees with References Children Refs. ({orgUnitID:178, orgUnitType: “Primary”, orgUnitName:”P1” children: [179,180]}, {orgUnitID:179, orgUnitType: “Branch”, orgUnitName:”B1” children: [181,182]}, {orgUnitID:180, orgUnitType: “Branch”, orgUnitName:”B2” children: [183,184]}) Parent Refs. ({orgUnitID:178, orgUnitType: “Primary”, orgUnitName:”P1” parent: 177}, {orgUnitID:179, orgUnitType: “Branch”, orgUnitName:”B1” parent: 178}, {orgUnitID:180, orgUnitType: “Branch”, orgUnitName:”B2” parent: 178})
  • 17. Tree Considerations  Children reference allow for top-down navigation  Parent references allow for- bottom up navigation  Combination allow for bottom-up and top-down navigation  Avoid large arrays  Consider need for point in time data
  • 18. Patterns  Pattern 1: One-to-Many  Pattern 2: Many-to-Many  Pattern 3: Trees with References  Pattern 4: Trees with Materialized Views  Pattern 5: Entity Aggregation
  • 19. Pattern 4: Trees with Materialized Paths  Full path from document to root is represented in document  Implement with arrays or string  Especially useful in hierarchical queries, i.e. a type and all its subtypes
  • 20. Materialized Paths {orderItemId: 1873, prodType: “Pens” prodHierarchy:[ “Product_Categories”, ”Office Supplies”, ”Writing Instruments”, ”Pens”] }
  • 21. Materialized Paths Considerations  Support for multi-key indexing of arrays  Use of regular expressions for pattern matching when string is used  Ability to utilize indexes when string representation
  • 22. Patterns  Pattern 1: One-to-Many  Pattern 2: Many-to-Many  Pattern 3: Trees with References  Pattern 4: Trees with Materialized Views  Pattern 5: Entity Aggregation
  • 23. Pattern 5: Entity Aggregation  Entities with sub-types  Relational models use multiple tables  Document models use varying embedded documents  Source of polymorphism
  • 24. Entity Aggregation Polymorphic Documents {concertID: 132, locDescr:'Small Bar', price:”$30.00”, performerName:”Rolling Stones”} {concertID: 133, locDescr: “PDX Jazz Festival” price:”$75.00” festivalStart: 15-Feb festivalEnd: 25-Feb}
  • 25. Entity Aggregation Considerations  Aggregation vs Separate Collections  High level branching  Low level branching
  • 27. Anti-Patterns  Large arrays  Significant growth in document size  Fetching more data than needed  Fear of data duplication  Thinking SQL, using NoSQL  Normalizing without need
  • 28. Closing Remarks  Consider, is it worth it to loose:  SQL  Multi-statement transactions  Triggers  Let queries drive model  Consider full life-cycle  Exploit polymorphism