Tonight I delivered a fun presentation on the topic of designing a relational database using entity relationship diagrams. The group was very diverse (QA, Product Managers, DBAs, Software Architects, startup folks, etc.) and we had a great time talking about the nuts and bolts of designing a database.
25-30 minutes of instruction, 15-20 minutes of demo, 30+ minutes of Q&A
I hope I’ll address most of your questions as we go, but write them down + let’s talk about them at the end of the slides
But if you’re lost and your neighbor has a blank stare, please interrupt
Who are you?
------
Software engineers?
Database engineers? DBAs?
Don’t work with anything Microsoft?
Why are you here?
------
Totally new to DB?
Work with a database that someone else designed?
Handed a project and now you’re in charge?
Design databases yourself?
Edgar F. Codd: British IBM researcher, regarded as the inventor of the relational model for database management
GOALS FOR SESSION: At the end of this session, you should be able to read, understand and create database Entity Relationship diagrams, and know where to look next.
There will be a couple of exercises for the audience as we go and an interactive demo
Q&A: very curious about what you’re facing out there in the world
Want you to know of any bias that may affect my talk:
Run a startup; we make software for wholesalers and manufacturers to scale sales + operations
On Microsoft stack (SQL Server, MVC, TDD, etc.)
Love teaching + learning
Check out upcoming meetings on Agile, PhilHaack! and APIs
Connect with me on LinkedIn
Be one of my 43 followers on Twitter!
Visual representation of the ENTITIES and the RELATIONSHIPS between them
Based on rules from the real world / your business
Entails a number of rules, standards and guidelines
Great way to communicate with co-workers, customers, developers
Great working model and critical reference document
You could consider this as a whole bunch of linked-together Excel worksheets: the fields define the columns
Working with a model is 50% science, 50% art, 50% experience
CustomerId in the SalesOrder table references a record/row in the Customer table
We can look up / refer to related or seemingly unrelated data elsewhere
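To make the CustomerId lookup concrete, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server, and the column names other than CustomerId (Name, OrderTotal) plus the sample data are made up for illustration.

```python
import sqlite3


def build_sales_db():
    """Create an in-memory database with a Customer and a SalesOrder table."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Customer (
            CustomerId INTEGER PRIMARY KEY,
            Name       TEXT NOT NULL
        );
        CREATE TABLE SalesOrder (
            SalesOrderId INTEGER PRIMARY KEY,
            -- Foreign key: each order row points at one Customer row
            CustomerId   INTEGER NOT NULL REFERENCES Customer (CustomerId),
            OrderTotal   NUMERIC NOT NULL
        );
        INSERT INTO Customer (CustomerId, Name) VALUES (1, 'Acme Wholesale');
        INSERT INTO SalesOrder (SalesOrderId, CustomerId, OrderTotal)
        VALUES (100, 1, 250);
    """)
    return conn


def customer_name_for_order(conn, sales_order_id):
    """Follow SalesOrder.CustomerId to the related Customer row."""
    row = conn.execute(
        """
        SELECT c.Name
        FROM SalesOrder AS o
        JOIN Customer AS c ON c.CustomerId = o.CustomerId
        WHERE o.SalesOrderId = ?
        """,
        (sales_order_id,),
    ).fetchone()
    return row[0] if row else None


def try_orphan_order(conn):
    """Try to insert an order for a customer that does not exist."""
    try:
        conn.execute(
            "INSERT INTO SalesOrder (SalesOrderId, CustomerId, OrderTotal) "
            "VALUES (101, 999, 10)"
        )
        return True
    except sqlite3.IntegrityError:
        return False  # the foreign key rejected the orphan row
```

The join is the "look up related data elsewhere" part; the rejected insert shows the reference is an enforced rule, not just a naming habit.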
C’mon, you have to put all that data *somewhere*!
Business Rules? Really? Yes, in *relationships* and other definitions
It’s a great mental exercise that forces you to think through a lot of hard problems
Separation of Concerns (1974, check Wikipedia)
But what about Agile, TDD and BDD? Yes, in the demo we’ll start simple, break stuff and refactor.
Entity = object that is important to the business.
Entity = the NOUN
Relationship = how are these objects/entities relating to one another?
Relationship = the VERB; the VERB PHRASE
What’s in a name?
Naming conventions: singular vs. plural; DESCRIBE THE BUSINESS
Attribute = property of Entity / column of db table
Data type choice (decimal vs. float vs. money; varchar vs. nvarchar vs. char vs. text)
NULL option: is this mandatory per business rules?
Primary Key: unique identifier for each row of data
Foreign Key: another entity’s PK that is here due to a relationship
IDENTITY column: auto-increments (autonumber); (1,1) specifies the SEED and INCREMENT. IDENTITY alone does not guarantee uniqueness; the PRIMARY KEY or a UNIQUE constraint does.
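A small sketch of these attribute choices, again assuming Python's sqlite3 as a stand-in engine. SQLite's INTEGER PRIMARY KEY AUTOINCREMENT plays the role of a SQL Server IDENTITY(1,1) column here; the Product table and its columns are hypothetical. The last two lines use Python's decimal module to hint at why decimal vs. float matters for money.

```python
import sqlite3
from decimal import Decimal


def build_product_table():
    """Create a hypothetical Product table with typed, NULL/NOT NULL columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE Product (
            ProductId INTEGER PRIMARY KEY AUTOINCREMENT, -- autonumbered key
            Name      TEXT    NOT NULL,  -- mandatory per (assumed) business rule
            UnitPrice NUMERIC NOT NULL,
            Notes     TEXT               -- optional, so NULL is allowed
        )
    """)
    return conn


def add_product(conn, name, unit_price, notes=None):
    """Insert a product and return the key the database generated for it."""
    cur = conn.execute(
        "INSERT INTO Product (Name, UnitPrice, Notes) VALUES (?, ?, ?)",
        (name, unit_price, notes),
    )
    return cur.lastrowid


def rejects_missing_name(conn):
    """Return True if the NOT NULL rule blocks a product without a name."""
    try:
        add_product(conn, None, 1.0)
        return False
    except sqlite3.IntegrityError:
        return True


# Binary floats cannot represent 0.1 exactly; decimal types can.
float_sum_is_exact = (0.1 + 0.2 == 0.3)
decimal_sum_is_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
```

The NULL option is where the business rule lives: a Product without a Name is refused by the database itself, no matter which application tries to write it.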
A relationship is the “verb” between entities; it defines how they interact per business rules
Naming / “writing a sentence” / be consistent
Read the relationship from parent->child (end with the dot)
Identifying: the child is dependent on its parent for its identity
Non-identifying: may be existence dependent but not identity dependent
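One way to see identifying vs. non-identifying in table form, sketched with Python's sqlite3 as an assumed stand-in engine. SalesOrderLine and its columns are hypothetical: in the identifying case the parent's key is part of the child's primary key, while in the non-identifying case it is an ordinary foreign key column.

```python
import sqlite3


def build_order_schema():
    """Create one non-identifying and one identifying relationship."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Customer (
            CustomerId INTEGER PRIMARY KEY,
            Name       TEXT NOT NULL
        );
        -- Non-identifying: CustomerId is a plain FK, not part of the PK.
        -- The order needs a customer to exist, but has its own identity.
        CREATE TABLE SalesOrder (
            SalesOrderId INTEGER PRIMARY KEY,
            CustomerId   INTEGER NOT NULL REFERENCES Customer (CustomerId)
        );
        -- Identifying: the parent's key is part of the child's PK, so a
        -- line cannot be identified without naming its order.
        CREATE TABLE SalesOrderLine (
            SalesOrderId INTEGER NOT NULL REFERENCES SalesOrder (SalesOrderId),
            LineNumber   INTEGER NOT NULL,
            Quantity     INTEGER NOT NULL,
            PRIMARY KEY (SalesOrderId, LineNumber)
        );
    """)
    return conn


def primary_key_columns(conn, table):
    """List a table's PK columns in key order (table name is trusted input)."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk); pk is the
    # 1-based position inside the primary key, or 0 if not part of it.
    keyed = sorted((row[5], row[1]) for row in rows if row[5] > 0)
    return [name for _, name in keyed]
```

Calling primary_key_columns on both child tables shows the difference: the order's key stands alone, the order line's key borrows from its parent.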
Sometimes we end up with complicated relationships and that’s OK
Recursive: can ParentId be NULL?
Associative / link / join / junction table: used for many-to-many relationships
Can make joins a little more difficult but is a common pattern
GREAT for when you want to use an entity in other areas, e.g. Customer<->Address and Person<->Address
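Both patterns can be sketched in a few lines, assuming Python's sqlite3 as a stand-in engine. The Category table (for the recursive ParentId) and the CustomerAddress junction table are hypothetical names, as is the sample data.

```python
import sqlite3


def build_address_schema():
    """Create a recursive table and a many-to-many junction table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        -- Recursive: ParentId points back at the same table.
        -- It is nullable so that a root row can have no parent.
        CREATE TABLE Category (
            CategoryId INTEGER PRIMARY KEY,
            ParentId   INTEGER REFERENCES Category (CategoryId),
            Name       TEXT NOT NULL
        );
        CREATE TABLE Customer (
            CustomerId INTEGER PRIMARY KEY,
            Name       TEXT NOT NULL
        );
        CREATE TABLE Address (
            AddressId INTEGER PRIMARY KEY,
            City      TEXT NOT NULL
        );
        -- Junction table: one row per (customer, address) pairing.
        CREATE TABLE CustomerAddress (
            CustomerId INTEGER NOT NULL REFERENCES Customer (CustomerId),
            AddressId  INTEGER NOT NULL REFERENCES Address (AddressId),
            PRIMARY KEY (CustomerId, AddressId)
        );
        INSERT INTO Category VALUES (1, NULL, 'Hardware'), (2, 1, 'Fasteners');
        INSERT INTO Customer VALUES (1, 'Acme Wholesale'), (2, 'Globex');
        INSERT INTO Address VALUES (10, 'Boston'), (11, 'Denver');
        INSERT INTO CustomerAddress VALUES (1, 10), (1, 11), (2, 10);
    """)
    return conn


def cities_for_customer(conn, customer_id):
    """Walk Customer -> CustomerAddress -> Address (the extra join)."""
    rows = conn.execute(
        """
        SELECT a.City
        FROM CustomerAddress AS ca
        JOIN Address AS a ON a.AddressId = ca.AddressId
        WHERE ca.CustomerId = ?
        ORDER BY a.City
        """,
        (customer_id,),
    ).fetchall()
    return [row[0] for row in rows]


def root_categories(conn):
    """Root rows are the ones whose ParentId is NULL."""
    rows = conn.execute(
        "SELECT Name FROM Category WHERE ParentId IS NULL ORDER BY Name"
    ).fetchall()
    return [row[0] for row in rows]
```

A PersonAddress junction table built the same way would let Person reuse the same Address rows, which is the reuse the notes call out.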
Cardinality: how many instances of each entity may be involved or must be involved?
w/ identifying (circle): there has to be one Customer for some # of Addresses
w/ non-identifying (diamond): no requirement to have a Customer to have an Address (could be an Employee Address, Order Address, etc.)
<nothing>: one to zero or more
<P>: one to one or more
<Z>: one to zero or one
<N>: one to exactly N
w/ non-identifying relationships, these can all be zero-or-one-to…
Enforced via triggers and constraints
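As one example of enforcing cardinality with a constraint, the sketch below covers only the <Z> (one to zero or one) case: a UNIQUE constraint on the foreign key caps the child at one row per parent. It assumes Python's sqlite3 as a stand-in engine, and CreditProfile is a hypothetical table. Rules such as "exactly N" generally need triggers or application logic and are not shown.

```python
import sqlite3


def build_profile_schema():
    """Create a parent table and a child limited to one row per parent."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Customer (
            CustomerId INTEGER PRIMARY KEY,
            Name       TEXT NOT NULL
        );
        -- <Z> one to zero or one: UNIQUE on the FK column allows at most
        -- one CreditProfile row for any given Customer.
        CREATE TABLE CreditProfile (
            CreditProfileId INTEGER PRIMARY KEY,
            CustomerId      INTEGER NOT NULL UNIQUE
                            REFERENCES Customer (CustomerId),
            CreditLimit     NUMERIC NOT NULL
        );
        INSERT INTO Customer VALUES (1, 'Acme Wholesale');
    """)
    return conn


def add_profile(conn, customer_id, credit_limit):
    """Return True if the profile was stored, False if a constraint refused it."""
    try:
        conn.execute(
            "INSERT INTO CreditProfile (CustomerId, CreditLimit) VALUES (?, ?)",
            (customer_id, credit_limit),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

The first profile for a customer goes in; a second one for the same customer is rejected by the UNIQUE constraint.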
Primary Key: unique identifying key (can be any data type as long as it’s unique)
Foreign Key: some other entity’s PK that appears here because of some relationship with it
Candidate Key: could this work as a PK? What are the candidate keys in this table?
Composite Key: put a few fields together to get a unique key
Strings? Timestamps? Integers? Decimals?
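A sketch of a composite key with a second candidate key, assuming Python's sqlite3 as a stand-in engine. The StateProvince table here is hypothetical: (CountryCode, StateCode) is chosen as the primary key, and (CountryCode, Name) is declared UNIQUE because it could also have identified a row.

```python
import sqlite3


def build_state_schema():
    """Create a table keyed by two string columns together."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE StateProvince (
            CountryCode TEXT NOT NULL,
            StateCode   TEXT NOT NULL,
            Name        TEXT NOT NULL,
            PRIMARY KEY (CountryCode, StateCode), -- composite key
            UNIQUE (CountryCode, Name)            -- another candidate key
        )
    """)
    return conn


def add_state(conn, country_code, state_code, name):
    """Return True if the row was stored, False if a key rule refused it."""
    try:
        conn.execute(
            "INSERT INTO StateProvince (CountryCode, StateCode, Name) "
            "VALUES (?, ?, ?)",
            (country_code, state_code, name),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Neither column is unique alone ("WA" is used in both the US and Australia), but the pair is, which is the whole idea of a composite key.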
DENORMALIZED:
Data can be duplicated and get out of sync
Ease of querying
Performance considerations
NORMALIZED:
One “fact” occurs in one place and only one place
Requires you to get the business rules “right”
No way for copies of a fact to get out of sync (a consistency benefit; concurrency still needs transactions)
Does require a lot of joins
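The trade-off can be sketched side by side, assuming Python's sqlite3 as a stand-in engine. SalesOrderFlat is a hypothetical denormalized table that copies the customer name onto every order; the Customer/SalesOrder pair stores it once.

```python
import sqlite3


def build_both_designs():
    """Load the same two orders into a flat and a normalized design."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Denormalized: the name is repeated on each order row.
        CREATE TABLE SalesOrderFlat (
            SalesOrderId INTEGER PRIMARY KEY,
            CustomerName TEXT NOT NULL
        );
        INSERT INTO SalesOrderFlat VALUES (100, 'Acme'), (101, 'Acme');

        -- Normalized: the name is one fact in one place.
        CREATE TABLE Customer (
            CustomerId INTEGER PRIMARY KEY,
            Name       TEXT NOT NULL
        );
        CREATE TABLE SalesOrder (
            SalesOrderId INTEGER PRIMARY KEY,
            CustomerId   INTEGER NOT NULL REFERENCES Customer (CustomerId)
        );
        INSERT INTO Customer VALUES (1, 'Acme');
        INSERT INTO SalesOrder VALUES (100, 1), (101, 1);
    """)
    return conn


def rename_customer(conn, new_name):
    """Rename the customer in both designs."""
    # Flat design: a careless update that only touches one of the copies.
    conn.execute(
        "UPDATE SalesOrderFlat SET CustomerName = ? WHERE SalesOrderId = 100",
        (new_name,),
    )
    # Normalized design: there is only one copy to update.
    conn.execute(
        "UPDATE Customer SET Name = ? WHERE CustomerId = 1", (new_name,)
    )


def names_on_flat_orders(conn):
    """No join needed, which is the ease-of-querying benefit."""
    rows = conn.execute(
        "SELECT DISTINCT CustomerName FROM SalesOrderFlat ORDER BY CustomerName"
    ).fetchall()
    return [row[0] for row in rows]


def names_on_normalized_orders(conn):
    """Needs a join, but every order sees the same name."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.Name
        FROM SalesOrder AS o
        JOIN Customer AS c ON c.CustomerId = o.CustomerId
        ORDER BY c.Name
        """
    ).fetchall()
    return [row[0] for row in rows]
```

After the rename, the flat table reports two different names for the same customer, while the normalized pair reports one.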
Great opportunities for self-directed learning
Generalization: Account -> Savings Account, Checking Account, Mortgage Account
Cascading deletes
In sum, there are a lot of database rules that can protect your data from you! Developer actions, etc.
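Cascading deletes are easy to try out. This is a minimal sketch assuming Python's sqlite3 as a stand-in engine; the SalesOrderLine table and the sample rows are hypothetical.

```python
import sqlite3


def build_cascade_schema():
    """Create a parent/child pair where deleting the parent removes children."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # required for cascades in SQLite
    conn.executescript("""
        CREATE TABLE SalesOrder (
            SalesOrderId INTEGER PRIMARY KEY
        );
        CREATE TABLE SalesOrderLine (
            SalesOrderLineId INTEGER PRIMARY KEY,
            SalesOrderId     INTEGER NOT NULL
                             REFERENCES SalesOrder (SalesOrderId)
                             ON DELETE CASCADE,
            Quantity         INTEGER NOT NULL
        );
        INSERT INTO SalesOrder VALUES (100), (101);
        INSERT INTO SalesOrderLine VALUES (1, 100, 5), (2, 100, 3), (3, 101, 1);
    """)
    return conn


def delete_order(conn, sales_order_id):
    """Delete one order; its lines go with it because of ON DELETE CASCADE."""
    conn.execute(
        "DELETE FROM SalesOrder WHERE SalesOrderId = ?", (sales_order_id,)
    )


def line_count(conn):
    """Count the order lines that remain."""
    return conn.execute("SELECT COUNT(*) FROM SalesOrderLine").fetchone()[0]
```

Without the cascade, the same delete would be refused while lines still reference the order, which is the other way a rule can protect your data from you.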
In some ways it’s easier to create than repair… let’s work together to see how we can improve this model
FirstName/LastName fields long enough?
StateProvince field?
Website (255) OK?
Are all of these fields really OK to be nulls?
WTF is OwnerId doing in Country? State?
Are StateCode and CountryCode OK for a PK? Why not a bigint?
Relationships in wrong direction
DenormalizedInfo? What is this? A flag?
Features + benefits of these as you go up/down the food chain