1. MODULE 4: DATA MODELING AND THE ERD
SQL and Scripting
Training
(C) 2020-2021 Highervista, LLC
2. 2
TOPICS
Questions from Prior Day
Getting Started with a Database Project (the Capstone)
Data Modeling Concepts: Part 2
Cardinality
Steps for Creating an ERD
Normalization
Draw.io
Preview of Afternoon Activities
4. 4
DATA MODELING OBJECTIVES
Understand concepts and purpose of data modeling
Learn how relationships between entities are defined and refined, and how
such relationships are incorporated into the database design process
Learn how ERD components affect database design and implementation
Learn how to interpret the modeling symbols
5. 5
PROCEDURE OF ERD
Relatively simple representations of complex real-world data
structures
Data modeling is an iterative process
A “complete” and “100% error-free” data model is not
possible!
Only an optimized data model is possible
6. 6
DATA MODEL: REVIEW
Model: an abstraction of a real-world
object or event
Useful in understanding complexities of the real-world environment
Data model
A diagram that displays a set of tables and the
relationships between them
A foundation!
Next slides: Draw.io entity relationship diagram (ERD) examples
Source: West and Fowler (1999)
7. 7
REVIEW: WHAT IS AN ENTITY
RELATIONSHIP DIAGRAM (ERD)?
ERD is a data modeling technique used in
software engineering to produce a
conceptual data model of an information
system.
ERDs illustrate the logical structure of
databases.
ERDs represent business or use cases.
Source: Data Model (McFarland, 2020)
9. 9
THE IMPORTANCE OF DATA MODEL
Blueprint: official documentation
Blueprint of house
Employees without database knowledge can understand
a data model diagram more easily than a list of tables
Used as an effective Communication Tool
Improve interaction among the managers,
the designers, and the end users
Independence from a particular DBMS
Network DB, Object-oriented DB, etc.
Source: Data Model (McFarland, 2020)
10. 10
DATA MODEL (CONT'D)
Data modeling revolves around discovering and analyzing organizational and user data requirements (use cases).
Requirements are based on policies, meetings, procedures, system specifications, etc.
• Identify what data is important
• Identify what data should be maintained
Source: Data Model (McFarland, 2020)
11. 11
ERD
The major activity of this phase is identifying entities, attributes, and their
relationships in order to construct a model using the Entity Relationship Diagram.
“Logical” (or design) names include:
Entity/Attribute/Relationship
“Physical” implementation names
include: Table, Column, Line
Entity → table
Attribute → column
Relationship → line
Source: ERD (McFarland, 2020)
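The logical-to-physical mapping above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the CUSTOMER and INVOICE entities and their attributes are hypothetical names chosen for the example, not part of the course material. In a relational database, the "line" for a relationship is typically implemented as a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Entity -> table, attribute -> column (hypothetical CUSTOMER entity)
conn.execute("""
    CREATE TABLE customer (
        cust_id   INTEGER PRIMARY KEY,
        cust_name TEXT NOT NULL
    )
""")

# Relationship -> the line on the ERD becomes a foreign key column
conn.execute("""
    CREATE TABLE invoice (
        inv_id   INTEGER PRIMARY KEY,
        inv_date TEXT NOT NULL,
        cust_id  INTEGER NOT NULL REFERENCES customer (cust_id)
    )
""")

# List the physical tables and the columns of INVOICE
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
columns = [row[1] for row in conn.execute("PRAGMA table_info(invoice)")]
print(tables)
print(columns)
```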
14. 14
CLARIFICATION: HOW TO FIND
CARDINALITIES?
Cardinality:
Cardinality is the number of occurrences in one
entity that are associated with the number of
occurrences in another.
There are three basic cardinalities (degrees of
relationship).
one-to-one (1:1)
one-to-many (1:M)
many-to-many (M:N)
Note: In Crow’s foot notation, the relationship lines
between entities are collapsed into one bidirectional line
Source: Example of Cardinality (McFarland, 2020)
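One way to get a feel for the three cardinalities is to look at sample pairs of related occurrences. The sketch below is a hypothetical helper (classify_cardinality is not from the course) that labels a list of (a, b) pairs. Sample data can only suggest a cardinality; the business rules decide it.

```python
from collections import defaultdict


def classify_cardinality(pairs):
    """Label observed (a, b) pairs as '1:1', '1:M', 'M:1' or 'M:N'."""
    a_to_b = defaultdict(set)
    b_to_a = defaultdict(set)
    for a, b in pairs:
        a_to_b[a].add(b)
        b_to_a[b].add(a)
    # Does any occurrence on one side relate to several on the other side?
    a_has_many = any(len(bs) > 1 for bs in a_to_b.values())
    b_has_many = any(len(as_) > 1 for as_ in b_to_a.values())
    if a_has_many and b_has_many:
        return "M:N"
    if a_has_many:
        return "1:M"
    if b_has_many:
        return "M:1"
    return "1:1"


# Hypothetical sample data
one_to_one = classify_cardinality([("E1", "P1"), ("E2", "P2")])
one_to_many = classify_cardinality([("D1", "E1"), ("D1", "E2"), ("D2", "E3")])
many_to_many = classify_cardinality([("S1", "C1"), ("S1", "C2"), ("S2", "C1")])
print(one_to_one, one_to_many, many_to_many)
```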
15. 15
CROW’S FOOT: OPTIONALITY
Optionality specifies whether participation in a relationship (or a value for an
attribute) is mandatory or optional.
To identify an optional relationship, look for an auxiliary verb such
as “can” or “may”.
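In a physical model, optionality usually shows up as whether a column may be left empty. The sketch below assumes two hypothetical business rules, "an employee must belong to a department" (mandatory) and "a car may be assigned to an employee" (optional), and uses SQLite through Python's sqlite3 module. All table and column names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY)")

# Mandatory: "an employee MUST belong to a department" -> NOT NULL foreign key
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        dept_id INTEGER NOT NULL REFERENCES department (dept_id)
    )
""")

# Optional: "a car MAY be assigned to an employee" -> nullable foreign key
conn.execute("""
    CREATE TABLE car (
        plate  TEXT PRIMARY KEY,
        emp_id INTEGER REFERENCES employee (emp_id)
    )
""")

conn.execute("INSERT INTO department VALUES (10)")
conn.execute("INSERT INTO employee VALUES (1, 10)")
conn.execute("INSERT INTO car VALUES ('ABC123', NULL)")  # optional side: accepted

try:
    conn.execute("INSERT INTO employee VALUES (2, NULL)")  # mandatory side: rejected
    mandatory_enforced = False
except sqlite3.IntegrityError:
    mandatory_enforced = True

unassigned_cars = conn.execute(
    "SELECT COUNT(*) FROM car WHERE emp_id IS NULL").fetchone()[0]
print(mandatory_enforced, unassigned_cars)
```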
16. 16
DEGREE OF RELATIONSHIP
The degree of a relationship is the number of entity types that participate in it.
Unary (Recursive) Relationship: One instance related to
another of the same entity type
Binary Relationship: Instances of two different entity types
related to each other
Ternary Relationship: Instances of three different entity types
related to each other
19. 19
UNARY (RECURSIVE) RELATIONSHIP
It is possible for an entity to have a relationship to itself—this is called a
recursive relationship.
supervises
Is supervised by
21. 21
CROW’S FOOT: WEAK ENTITY
RELATIONSHIP
A weak entity is an entity that cannot be uniquely identified by its own attributes and
cannot exist on its own.
A weak entity exists only if it is related to a set of
uniquely determined entities (the owners of the weak entity).
More examples are in the textbook.
Each employee might have zero or more dependents. However,
each dependent must belong to at least one employee.
[Diagram: EMP and DEP, with DEP drawn in weak entity notation]
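A minimal sketch of the EMP and DEP example in SQLite, assuming hypothetical column names (emp_num, dep_num, dep_name) and sample rows. The weak entity borrows the owner's key as part of its own primary key, and ON DELETE CASCADE is one possible way to express that a dependent cannot outlive its employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE emp (emp_num INTEGER PRIMARY KEY, emp_name TEXT NOT NULL)")

# DEP is identified only in combination with its owner's key (emp_num)
conn.execute("""
    CREATE TABLE dep (
        emp_num  INTEGER NOT NULL REFERENCES emp (emp_num) ON DELETE CASCADE,
        dep_num  INTEGER NOT NULL,
        dep_name TEXT NOT NULL,
        PRIMARY KEY (emp_num, dep_num)
    )
""")

conn.execute("INSERT INTO emp VALUES (1, 'Ana')")
conn.executemany("INSERT INTO dep VALUES (?, ?, ?)", [(1, 1, "Kim"), (1, 2, "Lee")])

# A dependent without an existing employee is rejected
try:
    conn.execute("INSERT INTO dep VALUES (99, 1, 'Orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the owner removes its dependents as well
conn.execute("DELETE FROM emp WHERE emp_num = 1")
remaining_dependents = conn.execute("SELECT COUNT(*) FROM dep").fetchone()[0]
print(orphan_rejected, remaining_dependents)
```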
22. 22
TRANSFORMATION OF M:N
A logical model will contain many M:N relationships
These will need to be transformed when moving to a physical model
If M:N relationships are carried directly into the relational model, many redundancies can be generated.
The relational operations become very complex and are likely to cause system
efficiency errors and output errors.
Break each M:N relationship down into 1:N and N:1 relationships using a bridge entity (weak
entity).
[Diagram: CLASS and STUDENT connected through the bridge entity ENROLL]
23. 23
CONVERTING ONE M:N RELATIONSHIP TO TWO 1:M
RELATIONSHIPS
Association Entity
(Join Table)
Converting M:N Relationships (FileMaker, 2020)
24. 24
BRIDGE (ASSOCIATIVE) ENTITY
ENROLL entity becomes a weak entity of both STUDENT
entity and CLASS entity
MUST have a composite (unique) identifier
STU_NUM (from STUDENT entity) and CLASS_CODE (from
CLASS entity)
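The ENROLL bridge entity can be sketched in SQLite as follows. The identifiers STU_NUM and CLASS_CODE come from the slide; the sample students and class codes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE student (stu_num INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE class (class_code TEXT PRIMARY KEY)")

# ENROLL turns STUDENT M:N CLASS into two 1:M relationships.
# Its composite identifier is built from the two owners' keys.
conn.execute("""
    CREATE TABLE enroll (
        stu_num    INTEGER NOT NULL REFERENCES student (stu_num),
        class_code TEXT NOT NULL REFERENCES class (class_code),
        PRIMARY KEY (stu_num, class_code)
    )
""")

conn.executemany("INSERT INTO student VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO class VALUES (?)", [("DB101",), ("PY201",)])
conn.executemany("INSERT INTO enroll VALUES (?, ?)",
                 [(1, "DB101"), (1, "PY201"), (2, "DB101")])

# The composite key blocks enrolling the same student in the same class twice
try:
    conn.execute("INSERT INTO enroll VALUES (1, 'DB101')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

classes_for_student_1 = [row[0] for row in conn.execute(
    "SELECT class_code FROM enroll WHERE stu_num = 1 ORDER BY class_code")]
print(duplicate_rejected, classes_for_student_1)
```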
25. 25
M:N WITH OPTIONALITY ON
BOTH SIDES
A person might or might not work for an employer, but could certainly
moonlight for multiple companies. An employer might have no
employees, but could have any number of them.
After the M:N relationship is broken down, the optional relationship notation appears on
both sides of the associative entity
Association
26. 26
RECURSIVE RELATIONSHIP
Each student is taught by an STA (student teaching assistant). Each STA can
teach several students.
A recursive relationship is one in which an
entity is associated with itself.
Student
teaches
is taught by
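A recursive relationship is commonly implemented as a foreign key that points back at the same table. The sketch below assumes hypothetical column names (stu_num, stu_name, sta_num) and sample rows. The sta_num column is left nullable here so that the first STA can be inserted; that is a simplification for the example, not a rule from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# sta_num refers to another row of the same STUDENT table
conn.execute("""
    CREATE TABLE student (
        stu_num  INTEGER PRIMARY KEY,
        stu_name TEXT NOT NULL,
        sta_num  INTEGER REFERENCES student (stu_num)
    )
""")

conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Dana", None), (2, "Eli", 1), (3, "Fay", 1)])

# Self-join: pair each student with the STA who teaches them
taught_by = conn.execute("""
    SELECT s.stu_name, t.stu_name
    FROM student AS s
    JOIN student AS t ON s.sta_num = t.stu_num
    ORDER BY s.stu_num
""").fetchall()
print(taught_by)
```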
30. 30
STEPS FOR CREATING AN
ERD 1
Identify entities: look for singular nouns (but avoid nouns without attributes, and
avoid proper nouns)
Identify attributes: look for descriptors whose values are associated with
individual entities of a specific entity type
Identify relationships: typically, a relationship is indicated by a verb connecting
two or more entities
Identify cardinality: look for the number of occurrences in one entity that are
associated with the number of occurrences in another
31. 31
WHAT ARE THE ENTITIES?
ATTRIBUTES?
ANG Laboratory has several chemists who work on one or more projects.
Chemists also may use certain kinds of equipment on each project. The
organization would like to store the chemist’s employee identification
number, his/her name, up to three phone numbers, his/her project
identification number and the date on which the project started. Every piece of
equipment the chemist uses has a serial number and a cost.
33. 33
ATTRIBUTES?
ANG Laboratory has several chemists who work on one or more projects.
Chemists also may use certain kinds of equipment on each project. The
organization would like to store the chemist’s employee identification number,
his/her name, up to three phone numbers, his/her project identification number
and the date on which the project started.
Every piece of equipment the chemist uses has a serial number and a cost.
34. 34
ENTITIES, ATTRIBUTES AND IDENTIFIERS (IN
CHEN NOTATION)
Project: Proj#, Start-Date
Chemist: Emp#, Phone# (multivalued; up to three)
Equipment: Serial#, cost
35. 35
RELATIONSHIPS?
ANG Laboratory has several chemists who work on one or more projects.
Chemists also may use certain kinds of equipment on each project.
The organization would like to store the chemist’s employee identification number,
his/her name, up to three phone numbers, his/her project identification number and
the date on which the project started.
Every piece of equipment the chemist uses has a serial number and a cost.
37. 37
CARDINALITY
The organization would like to store the date the chemist was
assigned to the project and the date an equipment item was assigned
to a particular chemist working on a particular project.
A chemist must be assigned to at least one project and at least one
piece of equipment.
Each project and each piece of equipment must be managed by only one chemist. A
given project need not be assigned any equipment.
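One possible reading of the ANG Laboratory rules, sketched as SQLite tables. The column names, the separate chemist_phone table, and the sample rows are all assumptions made for illustration; other designs are equally defensible. Note that a foreign key alone cannot enforce the "at least one project" rule for a chemist, and the "up to three" phone limit is not enforced here either.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE chemist (emp_num INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Multivalued Phone# moved to its own table, one row per phone number
conn.execute("""
    CREATE TABLE chemist_phone (
        emp_num INTEGER NOT NULL REFERENCES chemist (emp_num),
        phone   TEXT NOT NULL,
        PRIMARY KEY (emp_num, phone)
    )
""")

# Each project is managed by exactly one chemist (mandatory foreign key)
conn.execute("""
    CREATE TABLE project (
        proj_num      INTEGER PRIMARY KEY,
        start_date    TEXT NOT NULL,
        emp_num       INTEGER NOT NULL REFERENCES chemist (emp_num),
        assigned_date TEXT NOT NULL
    )
""")

# Each equipment item is assigned to one chemist working on one project
conn.execute("""
    CREATE TABLE equipment (
        serial_num    TEXT PRIMARY KEY,
        cost          REAL NOT NULL,
        emp_num       INTEGER NOT NULL REFERENCES chemist (emp_num),
        proj_num      INTEGER NOT NULL REFERENCES project (proj_num),
        assigned_date TEXT NOT NULL
    )
""")

# Hypothetical sample rows
conn.execute("INSERT INTO chemist VALUES (1, 'Chemist A')")
conn.execute("INSERT INTO project VALUES (100, '2020-01-15', 1, '2020-01-20')")
conn.execute("INSERT INTO project VALUES (101, '2020-03-01', 1, '2020-03-02')")
conn.execute("INSERT INTO equipment VALUES ('SN-1', 1200.0, 1, 100, '2020-01-21')")

# A project need not have equipment: project 101 reports a count of zero
equipment_per_project = conn.execute("""
    SELECT p.proj_num, COUNT(e.serial_num)
    FROM project AS p
    LEFT JOIN equipment AS e ON e.proj_num = p.proj_num
    GROUP BY p.proj_num
    ORDER BY p.proj_num
""").fetchall()
print(equipment_per_project)
```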
40. 40
NORMALIZATION DEFINED
Normalization is a process for evaluating and correcting
relational structures to minimize data redundancies,
reducing the likelihood of data inconsistencies or anomalies.
42. 42
NORMALIZATION
During the design process, we often create entities (tables) with
inconsistencies and anomalies.
Anomaly: An inconsistent, incomplete or contradictory issue with data in a database.
Anomalies can cause significant issues in running the database including
the incorrect deletion or inappropriate updating of data within a table.
Normalization is a process that we can step through to reduce anomalies
in the relational database.
43. 43
WELL-STRUCTURED RELATIONS
What constitutes a well-structured relation? Intuitively, a well-structured relation
contains minimal redundancy and allows users to insert, modify, and delete rows in
a table without errors or inconsistencies.
EmpID Name Dept Salary
230 Pillsbury Marketing 58,000
241 Marshall Finance 68,400
277 Marco Accounting 66,000
279 Gunston Marketing 42,400
290 Jaffe Planning 49,000
EMPLOYEE1 Table
44. 44
EMPLOYEE1 is a well-structured relation. Each row of the table contains data
describing one employee, and any modification of an employee’s data (such as a
change in salary) is confined to one row in the table.
Well-Structured Relations
45. 45
In contrast, EMPLOYEE2 is not a well-structured relation. Notice the redundancy. For
example, values for EmpID, Name, Dept, and Salary appear in two separate rows for
employees 241 and 290.
EmpID Name Dept Salary Course Date
230 Pillsbury Marketing 58,000 C++ 2/12/06
241 Marshall Finance 68,400 SPSS 5/30/07
241 Marshall Finance 68,400 Web Design 11/2/08
277 Marco Accounting 66,000 C# 12/8/07
279 Gunston Marketing 42,400 Java 9/10/06
290 Jaffe Planning 49,000 Tax Acct 4/22/06
290 Jaffe Planning 49,000 Bus Adm 6/6/08
EMPLOYEE2 Table
Well-Structured Relations
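The redundancy in EMPLOYEE2 can be removed by splitting it into two smaller relations. The sketch below uses plain Python structures with the rows from the slide. The name emp_course and the new salary value are hypothetical, chosen only to show that an update then touches a single place.

```python
# Rows of EMPLOYEE2: (EmpID, Name, Dept, Salary, Course, Date)
employee2 = [
    (230, "Pillsbury", "Marketing", 58000, "C++", "2/12/06"),
    (241, "Marshall", "Finance", 68400, "SPSS", "5/30/07"),
    (241, "Marshall", "Finance", 68400, "Web Design", "11/2/08"),
    (277, "Marco", "Accounting", 66000, "C#", "12/8/07"),
    (279, "Gunston", "Marketing", 42400, "Java", "9/10/06"),
    (290, "Jaffe", "Planning", 49000, "Tax Acct", "4/22/06"),
    (290, "Jaffe", "Planning", 49000, "Bus Adm", "6/6/08"),
]

# Employee facts keyed by EmpID: each employee is stored exactly once
employee1 = {emp_id: (name, dept, salary)
             for emp_id, name, dept, salary, _course, _date in employee2}

# Course completions keep only EmpID as a reference back to the employee
emp_course = [(emp_id, course, date)
              for emp_id, _name, _dept, _salary, course, date in employee2]

# A salary change now touches one entry instead of every course row
name, dept, _old_salary = employee1[241]
employee1[241] = (name, dept, 70000)

print(len(employee1), len(emp_course), employee1[241])
```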
46. 46
EXAMPLE TABLE: DENTIST-PATIENT (WITH
ANOMALIES)
Insert anomaly: No new dentist or patient record can be added unless an appointment has
been made for that patient or dentist.
Delete anomaly: If the appointment on 1/9/05 at 10 is canceled and deleted, the information
about patient P100 would be gone, as the patient has only one appointment.
Update anomaly: If patient P108 has a name change, it is possible only row 2 will get
updated, not row 4.
47. 47
NORMALIZING TABLES
On the previous four slides we presented an intuitive discussion of well-structured
relations. We need a more formal procedure for designing them.
Normalization is the process of successively reducing relations with
anomalies to produce smaller, well-structured relations. Some of the goals
are:
• Minimize data redundancy, thereby avoiding anomalies and conserving storage space.
• Simplify the enforcement of referential integrity constraints.
• Make it easier to maintain data (insert, delete, update).
• Provide a better design that is an improved representation of the real world and a stronger basis for future growth.
48. 48
NORMAL FORMS: 1NF, 2NF, AND 3NF
We work through the normal forms successively for each table
(entity) in our model.
While there are more ‘normal forms’ in addition to 1NF, 2NF, and 3NF,
these three are essential.
Work through 1NF first.
Progressive: 1NF then 2NF then 3NF
There are 11 normal forms (we’ll focus on 1NF, 2NF, 3NF only)
49. 49
1NF (FIRST NORMAL FORM)
For 1NF, ensure the following:
Every attribute (or field) holds a single value in each row of the table.
There are no repeating attributes.
Each attribute is ‘atomic’ (as small as it can get).
50. 50
FIRST NORMAL FORM
All fields describe the entity represented by the table.
All fields contain simplest possible values.
No multivalued attributes (also called repeating groups).
NOT 1st NORMAL
Home Town
Chicago, IL
1st NORMAL
City State
Chicago IL
51. 51
NOT 1NF
EmpID Dept CourseName DateCompleted
203 Finance Tax Accounting 6/22/07
421 Info Systems Java, Database Mgt 10/7/07, 6/4/06
666 Marketing
CourseName and DateCompleted are both multivalued attributes (see employee 421).
52. 52
1NF - ELIMINATING MULTIVALUED ATTRIBUTES
EmpID Dept CourseName DateCompleted
203 Finance Tax Accounting 6/22/07
421 Info Systems Java 10/7/07
421 Info Systems Database Mgt 6/4/06
666 Marketing
This new table does have only single-valued
attributes and so satisfies 1NF. However, as we
saw, the table still has some undesirable properties.
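The step from the multivalued table to the 1NF table can be sketched in Python. The function name to_1nf and the dictionary layout are assumptions for this example; the rows are the ones shown on the slides.

```python
# Each record holds a list of (CourseName, DateCompleted) pairs: not 1NF
not_1nf = [
    {"EmpID": 203, "Dept": "Finance",
     "Courses": [("Tax Accounting", "6/22/07")]},
    {"EmpID": 421, "Dept": "Info Systems",
     "Courses": [("Java", "10/7/07"), ("Database Mgt", "6/4/06")]},
    {"EmpID": 666, "Dept": "Marketing", "Courses": []},
]


def to_1nf(records):
    """Emit one row per course so that every field holds a single value."""
    rows = []
    for rec in records:
        # An employee with no courses still gets one row with empty course fields
        courses = rec["Courses"] or [(None, None)]
        for course_name, date_completed in courses:
            rows.append((rec["EmpID"], rec["Dept"], course_name, date_completed))
    return rows


rows_1nf = to_1nf(not_1nf)
for row in rows_1nf:
    print(row)
```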
53. 53
UN-NORMALIZED ORDERS TABLE (FROM 1NF TO 2NF)
Issues:
• Making a change to a part
description …
• A part that appears in many rows .
• The primary key in this table is
(OrderNum, PartNum). So if we
wanted to insert a new part into the
table…
• What if we deleted an order?
Un-normalized Orders Table (McFarland, 2020)
54. 54
When transitioning from
1NF to 2NF, what
happens to the number
of tables?
Transitioning from 1NF to 2NF (McFarland, 2020)
1NF to 2NF
55. 55
SECOND NORMAL FORM
Table must be in 1st normal form first.
2nd Normal form: No partial dependencies exist. (No
non-key fields are determined by only part of a
multiple-field primary key, i.e., non-keys are identified
by the whole primary key)
* Primary key
Student ID* Course#* Grade
12345 CIS 101 B
Student ID* and Course#* together determine Grade.
NOT 2nd NORMAL
Student ID* Course#* Name
12345 CIS 101 Higgins
Student ID* alone determines Name.
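A partial dependency can be spotted by checking whether part of the key is already enough to determine a non-key field. The helper below (determines, a hypothetical name) tests this on sample rows. Only the first row comes from the slide; the other two are made up. Sample data can disprove a dependency but never prove one, so treat the result as a hint.

```python
def determines(rows, lhs, rhs):
    """Return True if equal values in the lhs columns always go with the same rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        # setdefault stores the first rhs value seen for this key and returns it
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


rows = [
    {"StudentID": 12345, "Course": "CIS 101", "Name": "Higgins", "Grade": "B"},
    {"StudentID": 12345, "Course": "CIS 102", "Name": "Higgins", "Grade": "A"},
    {"StudentID": 67890, "Course": "CIS 101", "Name": "Lopez", "Grade": "C"},
]

# Name depends on only part of the key (StudentID): a partial dependency
name_on_part_of_key = determines(rows, ["StudentID"], "Name")
# Grade needs the whole key (StudentID, Course)
grade_on_part_of_key = determines(rows, ["StudentID"], "Grade")
grade_on_whole_key = determines(rows, ["StudentID", "Course"], "Grade")
print(name_on_part_of_key, grade_on_part_of_key, grade_on_whole_key)
```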
56. 56
THIRD NORMAL FORM
Table must be in 2nd Normal Form.
No transitive dependencies (no non-key fields are determined
by other non-key fields, i.e., non-keys are identified by only the
primary key).
* Primary key
Course#* Textbook Credits
CIS 101 Intro to CIS 3
Course#* determines both Textbook and Credits.
NOT 3rd NORMAL
Course#* Textbook Book Price
CIS 101 Intro to CIS $45.99
Course#* determines Textbook, and Textbook determines Book Price.
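A transitive dependency is usually removed by moving the dependent field into its own table. The sketch below splits the book price out of the course table using SQLite. The table and column names, the second course (CIS 102), and the new price are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Book price depends on the textbook, so it lives with the textbook
conn.execute("""
    CREATE TABLE textbook (
        title      TEXT PRIMARY KEY,
        book_price REAL NOT NULL
    )
""")

# Every remaining non-key field depends only on the course number
conn.execute("""
    CREATE TABLE course (
        course_num     TEXT PRIMARY KEY,
        credits        INTEGER NOT NULL,
        textbook_title TEXT NOT NULL REFERENCES textbook (title)
    )
""")

conn.execute("INSERT INTO textbook VALUES ('Intro to CIS', 45.99)")
conn.executemany("INSERT INTO course VALUES (?, ?, ?)",
                 [("CIS 101", 3, "Intro to CIS"), ("CIS 102", 3, "Intro to CIS")])

# A price change is made once and is seen by every course using the book
conn.execute("UPDATE textbook SET book_price = 49.99 WHERE title = 'Intro to CIS'")
prices = conn.execute("""
    SELECT c.course_num, t.book_price
    FROM course AS c
    JOIN textbook AS t ON t.title = c.textbook_title
    ORDER BY c.course_num
""").fetchall()
print(prices)
```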
57. 57
FOURTH AND FIFTH NORMAL FORM
• Fourth Normal Form – Table is in 3NF and has
at most one multivalued dependency. Can
produce records with many blank values.
• Fifth Normal Form – Table is in 4NF and
cannot be split into further
tables without losing information.
58. 58
HOW TO GET STARTED WITH A DATABASE PROJECT
1. Explore the project
a) Size, scope, depth and breadth
b) Executive sponsor and/or funding
2. Capstone Specific: Review data to be modeled (for Capstone, NIH
database)
3. Develop Statement of Work (define scope, depth, limits)
4. Develop ERD (entities, attributes, relationships) - Crow’s foot
59. 59
HOW TO GET STARTED WITH A DATABASE PROJECT
6. Normalize the Model (remove anomalies from the model)
7. Apply the Normalized Model
a. Create the Database (create database)
b. Create tables with fields using Data Definition Language (DDL)
c. Create Data Manipulation Language (DML) to query (question) the data
8. Implement Functionality
a. Use Python to extract data from a data source
b. Load extracted data into the database
c. Be able to report on the data loaded into the database
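Steps 7 and 8 can be sketched end to end with Python's built-in csv and sqlite3 modules. Everything specific here is an assumption for illustration: the in-memory database, the study table, its columns, and the small CSV text standing in for a real data source.

```python
import csv
import io
import sqlite3

# Hypothetical data source; in practice this might be a file or a download
SOURCE_CSV = """study_id,title,pub_year
1,Sleep and Memory,2019
2,Diet and Heart Health,2020
3,Exercise and Mood,2020
"""

# 7a. Create the database (in memory for this sketch)
conn = sqlite3.connect(":memory:")

# 7b. Create tables with fields using DDL
conn.execute("""
    CREATE TABLE study (
        study_id INTEGER PRIMARY KEY,
        title    TEXT NOT NULL,
        pub_year INTEGER NOT NULL
    )
""")

# 8a. Use Python to extract data from the data source
rows = [(int(r["study_id"]), r["title"], int(r["pub_year"]))
        for r in csv.DictReader(io.StringIO(SOURCE_CSV))]

# 8b. Load the extracted data into the database
conn.executemany("INSERT INTO study VALUES (?, ?, ?)", rows)
conn.commit()

# 7c and 8c. Use DML to query the data and report on it
report = conn.execute("""
    SELECT pub_year, COUNT(*)
    FROM study
    GROUP BY pub_year
    ORDER BY pub_year
""").fetchall()
for pub_year, count in report:
    print(pub_year, count)
```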
60. 60
REFERENCES
Draw.io. (2020). Diagrams.net - free flowchart maker and diagrams online. Retrieved
November 23, 2020, from https://app.diagrams.net/
FileMaker. (2020). FileMaker Pro 16: Many-to-many relationships. Retrieved December 10,
2020, from https://fmhelp.filemaker.com/help/16/fmp/en/index.html
SQLite Browser. (2020, November 09). DB Browser for SQLite. Retrieved November 23, 2020,
from https://sqlitebrowser.org/
SQLite. (2020). SQLite Main Website. Retrieved November 23, 2020, from
https://sqlite.org/index.html
McFarland, R. (2020). Published Articles: Ron McFarland. Retrieved December 03, 2020,
from https://medium.com/@highervista
Tutorialspoint. (2020). SQLite Tutorial. Retrieved November 23, 2020, from
https://www.tutorialspoint.com/sqlite/index.htm
West, M., & Fowler, J. (1999). Developing High Quality Data Models. The European Process
Industries STEP Technical Liaison Executive (EPISTLE).
62. 62
ABOUT THIS COURSE
This course is distributed free. I use several
sources. But importantly, I use the book noted
on the next slide.
If you are using these PowerPoints, please
attribute Highervista, LLC and me (Ron
McFarland). IN ADDITION, please attribute
the author noted on the next slide, as the
author’s textbook provides essential
information for this course.
Source: Microsoft Images
63. 63
INTRODUCTION
This course is offered to you free. HOWEVER, please
purchase the following book, as it is a primary resource for
this course. I do not make any $ from this course or this
book. So, since a handful of good content is derived from the
following text, please support this author!
Title: SQL Quickstart Guide
Author: Walter Shields
Available: Amazon, B&N, and through the ClydeBank Media website at:
https://www.clydebankmedia.com/books/programming-tech/sql-quickstart-guide