SlideShare ist ein Scribd-Unternehmen logo
1 von 18
Downloaden Sie, um offline zu lesen
MODELING
A SOLID DATABASE
FOR YOUR RUBY FRAMEWORK
Zoran Majstorović
github.com/zmajstor
twitter.com/z0maj
Intro
You want to build an awesome, full-stack web app or
maybe just a backend (API) server, so you:
1. pick your favourite ruby framework
2. pick an ORM which plays nicely with your ruby
framework
3. pick a database which plays nicely with your ORM
Finally, you can start coding and modeling your data ...
Easy Challenge
Person Telephone
Joe 555-123-4567
Joe 555-192-1234
Joe 123-4567-123
Jack 555-099-0987
Jack 555-098-7654 - Ext 45
Jill 555-333-4444
Person Telephone
Joe 555-123-4567,
555-192-1234,
123-4567-123
Jack 555-099-0987,
555-098-7654 - Ext 45
Jill 555-333-4444
Normalizing a Database Design
Person Telephone
Joe 555-123-4567
Joe 555-192-1234
Joe 123-4567-123
Jack 555-099-0987
Jack 555-098-7654 - Ext 45
Jill 555-333-4444
Person Telephone
Joe 555-123-4567,
555-192-1234,
123-4567-123
Jack 555-099-0987,
555-098-7654 - Ext 45
Jill 555-333-4444
split the data into an "atomic" (i.e. indivisible) entities: single phone numbers
More about 1st Normal Form: https://en.wikipedia.org/wiki/First_normal_form
The Dilemma
Denormalization a.k.a. favor columns over tables … because we have to avoid “expensive joins” (?)
Are joins generally expensive? And what is the price for avoiding them?
Person Apartment
Joe A
Jack B
Jill B
Apartment Floor
A 1
B 2
X 3
Person Apartment Floor
Joe A 1
Jack B 2
Jill B 2
?
Normalizing a Database Design
Person Apartment Floor
Joe A 1
Jack B 2
Jill B 2
The functional dependency {Person} → {Floor} applies;
that is, if we know the person, we know the floor.
Furthermore:
{Person} → {Apartment}
{Apartment} does not → {Person}
{Apartment} → {floor}
Therefore {Person} → {Floor} is a transitive dependency.
Transitive dependency occurred because a
non-key attribute (Apartment) was determining
another non-key attribute (Floor).
3rd Normal Form (3NF)
1. Requiring existence of "the key" ensures that the table is in 1NF
2. requiring that non-key attributes be dependent on "the whole key" ensures
2NF
3. further requiring that non-key attributes be dependent on "nothing but the key"
ensures 3NF
Source: https://en.wikipedia.org/wiki/Third_normal_form
3rd Normal Form (3NF)
● improve database processing while minimizing storage costs
● ideal for online transaction processing (OLTP)
● most 3NF tables are free of update, insertion, and deletion anomalies*
__________________________
* other, few cases, affected by such anomalies usually fall short of the higher normal forms: 4NF or 5NF
Deletion Anomaly
Person Apartment Floor
Joe A 1
Jack B 2
Jill B 2
Joe moves out of the apartment A, so by
deleting Joe, we are also losing information of
the apartment A (that is on the 1st floor).
Update Anomaly
Person Apartment Floor
Joe A 1
Jack B 2
Jill B 2
Jill moves from the apartment B to the
apartment A, but for some reason, only
Apartment column got updated.
After some time, we can’t tell on which floor is
the Apartment A.Person Apartment Floor
Joe A 1
Jack B 2
Jill A 2
???
Insertion Anomaly
Jim moves into apartment X,
but where to lookup for the floor?
On which floor is the apartment X?
Person Apartment Floor
Joe A 1
Jack B 2
Jill B 2
Jim X ?
A database-management system (DBMS) can work only with the information that we put explicitly into
its tables.
Normalized table to meet the 3rd Normal Form
Person Apartment
Joe A
Jack B
Jill B
Apartment Floor
A 1
B 2
X 3
Person Apartment Floor
Joe A 1
Jack B 2
Jill B 2
“Partial Denormalization”
Person Apartment Floor
Joe A 1
Jack B 2
Jill B 2
Apartment Floor
A 1
B 2
X 3
Keep the transitive dependent columns, in combination with another “lookup table”
WAT???
Denormalizing to avoid “expensive joins”?
“It sounds convincing, but it doesn't hold water.”
Peter Wone’s epic post: http://stackoverflow.com/a/174047/3452582
From Introduction to Database Systems, which is the definitive textbook on database theory and
design, in its 8th edition:
● Some of them hold for special cases
● All of them fail to pay off for general use
● All of them are significantly worse for other special cases
ALWAYS normalise OLTP. Denormalise OLAP if you think it will help.
Common Sense
“Rules of normalization aren’t esoteric or complicated.
They’re really just a commonsense technique to
reduce redundancy and improve consistency of data.”
SQL Antipatterns by Bill Karwin
Copyright © 2014, The Pragmatic Bookshelf.
Summary
The application layer can become riddled with bugs if the data layer is too
permissive, so here are 5 simple rules for a solid DB design:
1. Normalize tables up to the 3NF (and don’t be afraid of SQL Joins)
2. Define foreign keys on DB level (ORM can’t be fully trusted on that)
3. Define UNIQUE index where needed (ORM can't guarantee data uniqueness)
4. Define other validations on DB level (e.g. Not Null, Default Value)
5. Define index on every foreign key and any column that will appear in any
where clause (for better query performance)
Sources
Wikipedia
● https://en.wikipedia.org/wiki/Transitive_dependency
● https://en.wikipedia.org/wiki/Third_normal_form
● https://en.wikipedia.org/wiki/Database_normalization
StackOverflow
● http://stackoverflow.com/a/174047/3452582
● http://stackoverflow.com/a/59522
Quora
● https://www.quora.com/What-is-the-difference-between-OLTP-and-OLAP
THE END

Weitere ähnliche Inhalte

Ähnlich wie RubyZG - Modeling a Solid Database

Programming Programming logic devicespptx
Programming Programming logic devicespptxProgramming Programming logic devicespptx
Programming Programming logic devicespptx
GhassanJ
 
Fortran & Link with Library & Brief Explanation of MKL BLAS
Fortran & Link with Library & Brief Explanation of MKL BLASFortran & Link with Library & Brief Explanation of MKL BLAS
Fortran & Link with Library & Brief Explanation of MKL BLAS
Jongsu "Liam" Kim
 

Ähnlich wie RubyZG - Modeling a Solid Database (20)

Debugging ZFS: From Illumos to Linux
Debugging ZFS: From Illumos to LinuxDebugging ZFS: From Illumos to Linux
Debugging ZFS: From Illumos to Linux
 
Determinism in finance
Determinism in financeDeterminism in finance
Determinism in finance
 
Programming Programming logic devicespptx
Programming Programming logic devicespptxProgramming Programming logic devicespptx
Programming Programming logic devicespptx
 
The tortoise and the ORM
The tortoise and the ORMThe tortoise and the ORM
The tortoise and the ORM
 
When DevOps and Networking Intersect by Brent Salisbury of socketplane.io
When DevOps and Networking Intersect by Brent Salisbury of socketplane.ioWhen DevOps and Networking Intersect by Brent Salisbury of socketplane.io
When DevOps and Networking Intersect by Brent Salisbury of socketplane.io
 
Simple network troubleshooting
Simple network troubleshootingSimple network troubleshooting
Simple network troubleshooting
 
Data guard & nologging new features in 18c
Data guard & nologging   new features in 18cData guard & nologging   new features in 18c
Data guard & nologging new features in 18c
 
Fortran & Link with Library & Brief Explanation of MKL BLAS
Fortran & Link with Library & Brief Explanation of MKL BLASFortran & Link with Library & Brief Explanation of MKL BLAS
Fortran & Link with Library & Brief Explanation of MKL BLAS
 
Securefile LOBs
Securefile LOBsSecurefile LOBs
Securefile LOBs
 
Virtual Machine Introspection with Xen
Virtual Machine Introspection with XenVirtual Machine Introspection with Xen
Virtual Machine Introspection with Xen
 
ZERO WIRE LOAD MODEL.pptx
ZERO WIRE LOAD MODEL.pptxZERO WIRE LOAD MODEL.pptx
ZERO WIRE LOAD MODEL.pptx
 
libinjection and sqli obfuscation, presented at OWASP NYC
libinjection and sqli obfuscation, presented at OWASP NYClibinjection and sqli obfuscation, presented at OWASP NYC
libinjection and sqli obfuscation, presented at OWASP NYC
 
Adding a BOLT pass
Adding a BOLT passAdding a BOLT pass
Adding a BOLT pass
 
[db tech showcase Tokyo 2017] A11: SQLite - The most used yet least appreciat...
[db tech showcase Tokyo 2017] A11: SQLite - The most used yet least appreciat...[db tech showcase Tokyo 2017] A11: SQLite - The most used yet least appreciat...
[db tech showcase Tokyo 2017] A11: SQLite - The most used yet least appreciat...
 
PHP - Introduction to Advanced SQL
PHP - Introduction to Advanced SQLPHP - Introduction to Advanced SQL
PHP - Introduction to Advanced SQL
 
Sdn dell lab report v2
Sdn dell lab report v2Sdn dell lab report v2
Sdn dell lab report v2
 
MySQL Integral DB Design #JAB14
MySQL Integral DB Design #JAB14MySQL Integral DB Design #JAB14
MySQL Integral DB Design #JAB14
 
module 2 cn new.pptx
module 2 cn new.pptxmodule 2 cn new.pptx
module 2 cn new.pptx
 
Tainted LOB
Tainted LOBTainted LOB
Tainted LOB
 
PostgreSQL + ZFS best practices
PostgreSQL + ZFS best practicesPostgreSQL + ZFS best practices
PostgreSQL + ZFS best practices
 

Kürzlich hochgeladen

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Kürzlich hochgeladen (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

RubyZG - Modeling a Solid Database

  • 1. MODELING A SOLID DATABASE FOR YOUR RUBY FRAMEWORK Zoran Majstorović github.com/zmajstor twitter.com/z0maj
  • 2. Intro You want to build an awesome, full-stack web app or maybe just a backend (API) server, so you: 1. pick your favourite ruby framework 2. pick an ORM which plays nicely with your ruby framework 3. pick a database which plays nicely with your ORM Finally, you can start coding and modeling your data ...
  • 3. Easy Challenge Person Telephone Joe 555-123-4567 Joe 555-192-1234 Joe 123-4567-123 Jack 555-099-0987 Jack 555-098-7654 - Ext 45 Jill 555-333-4444 Person Telephone Joe 555-123-4567, 555-192-1234, 123-4567-123 Jack 555-099-0987, 555-098-7654 - Ext 45 Jill 555-333-4444
  • 4. Normalizing a Database Design Person Telephone Joe 555-123-4567 Joe 555-192-1234 Joe 123-4567-123 Jack 555-099-0987 Jack 555-098-7654 - Ext 45 Jill 555-333-4444 Person Telephone Joe 555-123-4567, 555-192-1234, 123-4567-123 Jack 555-099-0987, 555-098-7654 - Ext 45 Jill 555-333-4444 split the data into an "atomic" (i.e. indivisible) entities: single phone numbers More about 1st Normal Form: https://en.wikipedia.org/wiki/First_normal_form
  • 5. The Dilemma Denormalization a.k.a. favor columns over tables … because we have to avoid “expensive joins” (?) Are joins generally expensive? And what is the price for avoiding them? Person Apartment Joe A Jack B Jill B Apartment Floor A 1 B 2 X 3 Person Apartment Floor Joe A 1 Jack B 2 Jill B 2 ?
  • 6. Normalizing a Database Design Person Apartment Floor Joe A 1 Jack B 2 Jill B 2 The functional dependency {Person} → {Floor} applies; that is, if we know the person, we know the floor. Furthermore: {Person} → {Apartment} {Apartment} does not → {Person} {Apartment} → {floor} Therefore {Person} → {Floor} is a transitive dependency. Transitive dependency occurred because a non-key attribute (Apartment) was determining another non-key attribute (Floor).
  • 7. 3rd Normal Form (3NF) 1. Requiring existence of "the key" ensures that the table is in 1NF 2. requiring that non-key attributes be dependent on "the whole key" ensures 2NF 3. further requiring that non-key attributes be dependent on "nothing but the key" ensures 3NF Source: https://en.wikipedia.org/wiki/Third_normal_form
  • 8. 3rd Normal Form (3NF) ● improve database processing while minimizing storage costs ● ideal for online transaction processing (OLTP) ● most 3NF tables are free of update, insertion, and deletion anomalies* __________________________ * other, few cases, affected by such anomalies usually fall short of the higher normal forms: 4NF or 5NF
  • 9. Deletion Anomaly Person Apartment Floor Joe A 1 Jack B 2 Jill B 2 Joe moves out of the apartment A, so by deleting Joe, we are also losing information of the apartment A (that is on the 1st floor).
  • 10. Update Anomaly Person Apartment Floor Joe A 1 Jack B 2 Jill B 2 Jill moves from the apartment B to the apartment A, but for some reason, only Apartment column got updated. After some time, we can’t tell on which floor is the Apartment A.Person Apartment Floor Joe A 1 Jack B 2 Jill A 2 ???
  • 11. Insertion Anomaly Jim moves into apartment X, but where to lookup for the floor? On which floor is the apartment X? Person Apartment Floor Joe A 1 Jack B 2 Jill B 2 Jim X ? A database-management system (DBMS) can work only with the information that we put explicitly into its tables.
  • 12. Normalized table to meet the 3rd Normal Form Person Apartment Joe A Jack B Jill B Apartment Floor A 1 B 2 X 3 Person Apartment Floor Joe A 1 Jack B 2 Jill B 2
  • 13. “Partial Denormalization” Person Apartment Floor Joe A 1 Jack B 2 Jill B 2 Apartment Floor A 1 B 2 X 3 Keep the transitive dependent columns, in combination with another “lookup table” WAT???
  • 14. Denormalizing to avoid “expensive joins”? “It sounds convincing, but it doesn't hold water.” Peter Wone’s epic post: http://stackoverflow.com/a/174047/3452582 From Introduction to Database Systems, which is the definitive textbook on database theory and design, in its 8th edition: ● Some of them hold for special cases ● All of them fail to pay off for general use ● All of them are significantly worse for other special cases ALWAYS normalise OLTP. Denormalise OLAP if you think it will help.
  • 15. Common Sense “Rules of normalization aren’t esoteric or complicated. They’re really just a commonsense technique to reduce redundancy and improve consistency of data.” SQL Antipatterns by Bill Karwin Copyright © 2014, The Pragmatic Bookshelf.
  • 16. Summary The application layer can become riddled with bugs if the data layer is too permissive, so here are 5 simple rules for a solid DB design: 1. Normalize tables up to the 3NF (and don’t be afraid of SQL Joins) 2. Define foreign keys on DB level (ORM can’t be fully trusted on that) 3. Define UNIQUE index where needed (ORM can't guarantee data uniqueness) 4. Define other validations on DB level (e.g. Not Null, Default Value) 5. Define index on every foreign key and any column that will appear in any where clause (for better query performance)
  • 17. Sources Wikipedia ● https://en.wikipedia.org/wiki/Transitive_dependency ● https://en.wikipedia.org/wiki/Third_normal_form ● https://en.wikipedia.org/wiki/Database_normalization StackOverflow ● http://stackoverflow.com/a/174047/3452582 ● http://stackoverflow.com/a/59522 Quora ● https://www.quora.com/What-is-the-difference-between-OLTP-and-OLAP