An introduction to database architecture, design and development, its relation to Object Oriented Analysis & Design in software, Illustration with examples to database normalization and finally, a basic SQL guide and best practices
2. Database Basics
What is a database, DBMS and its types?
What makes a database?
Object-Oriented Design vs. Database Design
Database Normalization
Structured Query Language (SQL)
4. What is a database?
“A database is an organized collection of data. The data is typically
organized to model aspects of reality in a way that supports processes
requiring information”. [Wikipedia.org Definition]
“A database provides for the storing and control of business information,
independent from (but not separate from the processing requirements of)
one or more applications”. [IBM Definition]
“A database is a collection of information that is organized so that it can
easily be accessed, managed, and updated”. [WhatIs.com Definition]
So, What do you think?
5. Database Models, Types and Formats
Flat-file Model (Mainframes)
Hierarchical Model (Tree-Structure, LDAP)
Relational Model (RDBMS)
Object-Oriented Model (Apache Derby DB, H2 and In-memory databases)
Not Only SQL (NoSQL): Semi structured Model, Structure within data
6. What is a RDBMS?
RDBMS: Relational Database Management System
A tool for organizing, storing, retrieving, and communicating groups of
information that have similar characteristics.
Examples are: ORACLE DB, MySQL, MS Access, MS SQL Server, IBM DB2 and
Sybase
8. What makes a database?
Schema
(Owner)
Tables Views Indices Sequences Triggers Constraints Subroutines
9. Table
The primary unit of physical storage for data in a database.
Stores data, so they’re essential building blocks of any database.
Any database consists of one or more tables (at least one)
Contains Rows called Records and Columns called Fields.
Cell - ENTRY
Rows – RECORDS
Columns - FIELDS
ID CustomerName Email City
1 Mahmoud mahmoud@test.com Cairo
2 Samy samy@test.com Alex
3 Aly aly@test.com Aswan
10. View
Views can represent a subset of the data contained in a table. Consequently,
a view can limit the degree of exposure of the underlying tables to the outer
world: a given user may have permission to query the view, while denied
access to the rest of the base table.
Views can join and simplify multiple tables into a single virtual table.
Views can act as aggregated tables, where the database engine aggregates
data (sum, average, etc.) and presents the calculated results as part of the
data.
Views can hide the complexity of data. For example, a view could appear as
Sales2000 or Sales2001, transparently partitioning the actual underlying
table.
Views take very little space to store; the database contains only the
definition of a view, not a copy of all the data that it presents.
It can be read-only or updatable
11. Index
A data structure that improves the speed of data retrieval operations on a
database table at the cost of additional writes and storage space to maintain
the index data structure.
Used to quickly locate data without having to search every row in a database
table every time a database table is accessed.
Can be created using one or more columns of a database table, providing the
basis for both rapid random lookups and efficient access of ordered records.
An index is a copy of select columns of data from a table that can be
searched very efficiently that also includes a low-level disk block address or
direct link to the complete row of data it was copied from.
The Primary Key is merely an index that enforces uniqueness
12. Sequence
The sequence is a feature by some database products which just creates
unique values. It just increments a value and returns it.
The special thing about it is: there is no transaction isolation, so several
transactions can not get the same value, the incremental is also not rolled
back.
Sequence will allow you to populate primary key with a unique, serialized
number.
Without a database sequence, it is very hard to generate unique incrementing
numbers.
Other database products support columns that are automatically initialized
with a incrementing number.
13. Trigger
A procedural code that is automatically executed in response to certain
events on a particular table or view in a database.
The trigger is mostly used for maintaining the integrity of the information on
the database.
For example, in HRMS database when a new record (representing a new
worker) is added to the employees table, new records should also be created
in the tables of the taxes, vacations and salaries.
Can run before or after a database change.
Can run at the cell level, row level, column level, or statement level.
In some implementations, it can call an external program
14. Constraints
It can be a single rule or a set of rules to restrict the data entered in the
table and guarantee its correctness.
If there is any violation between the constraint and the data action, the
action is aborted by the constraint.
Can apply to a single column or the entire table
Can be specified when the table is created or after that
In SQL, we have the following constraints:
NOT NULL - UNIQUE - PRIMARY KEY - FOREIGN KEY - CHECK – DEFAULT
Advantages:
1. Input Validation
2. Data Consistency
3. Intelligent Defaults
15. Subroutines
A sequence of program instructions that perform a specific task, packaged as
a unit and stored in the database.
Can consolidate and centralize logic that was originally implemented in
applications
Sometimes called a stored procedure (SP), function or User Defined Function
(UDF)
Advantages:
1. Simple, no overhead
2. Avoidance of network traffic
3. Encapsulation of business logic
4. Delegation of access-rights
5. Some protection from SQL injection attacks
17. Entity Structure
Customer
• id: int
• name: String
• birthDate: Date
• salary: double
• jobCode: String
CUSTOMER
• C_ID: INTEGER
• C_NAME: VARCHAR(25)
• C_BIRTH_DATE: DATE
• C_SAL: DOUBLE
• C_JOB_CODE: CHAR(3)
CUSTOMER TABLE structure can be represented by a Customer Class
Every COLUMN is mapped to its equivalent class attribute/variable
Data types may vary according to the database vendor / programming
language
18. Entity Data
ID Name Birth Date Salary Job Code
5011 Moneim Emad 03/07/1982 5,000.00 SAR
5012 Kareem Hewady 28/06/1993 3,000.00 OSD
A single, specific occurrence of an entity is an instance.
The selected record/row represents a single record in CUSTOMER table
This is mapped to an instance / object of the Customer class
The entire table data can be mapped to a list of Customer objects
19. Entity Relationship Diagram (ERD)
A relationship is a link that relates two entities that share one or more attributes
An ERD can be mapped to UML Class Diagram
Database relations between tables can be mapped to UML association between classes
CustomerPublisher Book
Author Inventory Order
20. Relationship: Primary Key
ID Name Birth Date Salary Job Code
5011 Moneim Emad 03/07/1982 5,000.00 SAR
5012 Kareem Hewady 28/06/1993 3,000.00 OSD
A primary key is a unique identifier of records in a table.
A primary key values may be generated manually or automatically
A primary key can consist of more than one column (composite key)
21. Relationship: Foreign Key
ID Name Birth Date Salary Job Code
5011 Moneim Emad 03/07/1982 5,000.00 SAR
5012 Kareem Hewady 28/06/1993 3,000.00 OSD
A foreign key is a column reference to the primary key in its original table
In contrast to primary key, foreign key may be repeated in the child table
ID Customer ID Location Delivery Date
1 5011 El Dokky 27/03/2015 11:45 AM
2 5011 Nasr City 28/03/2015 09:30 AM
3 5897 El Zamalek 28/03/2015 02:15 PM
ORDER (Child table)
CUSTOMER (Parent table)
22. Relationship Types
One-to-one
Customer Basic info Customer Additional Info
One-to-many
Customer info Customer Orders Info
Many-to-many
Customer Info Customer Books Interest Books Info
24. Normalization
A set of recommendations to be followed when designing a database that
helps for organizing data elements into tables and optimizing the database.
We will see an example for database normalization in three steps using the
three normal forms
1NF 2NF 3NF
25. Normalization: Case Study (Un-normalized)
SalesOrders
• SalesOrderNo
• Date
• CustomerNo
• CustomerName
• CutomerAddress
• ClerkNo
• ClerkName
• Item1Description
• Item1Quantity
• Item1UnitPrice
• Item2Description
• Item2Quantity
• Item2UnitPrice
• Item3Description
• Item3Quantity
• Item3UnitPrice
• Total
26. Normalization: Into 1NF
Separate repeating groups into new tables.
Start a new table for the repeating data.
The primary key for the repeating group is usually a composite key.
28. Normalization: Into 2NF
Remove partial dependencies.
Start a new table for the partially dependent data and the part of the key it
depends on.
Tables started at this step usually contain descriptions of resources.
29. Normalization: Into 2NF
Functional dependency:
The value of one attribute depends entirely on the value of another.
Partial dependency:
An attribute depends on only part of the primary key. (The primary key must
be a composite key.)
Transitive dependency:
An attribute depends on an attribute other than the primary key.
31. Normalization: Into 2NF – Problems Solved
Duplication of data:
ItemDescription would appear for every order.
Insert anomaly:
To insert an inventory item, you must insert a sales order.
Delete anomaly:
Information about the items stay with sales order records. Delete a sales
order record, delete the item description.
Update anomaly:
To change an item description, you must change all the sales order
records that have the item.
32. Normalization: Into 3NF
Remove transitive dependencies.
Start a new table for the transitively dependent attribute and the attribute it
depends on.
Keep a copy of the key attribute in the original table.
33. Normalization: Into 3NF
• SalesOrderNo
• Date
• CustomerNo
• CustomerName
• CutomerAddress
• ClerkNo
• ClerkName
• Total
SalesOrders
Clerk
• ClerkNo
• ClerkName
Customer
• CustomerNo
• CustomerName
• CustomerAddress
• SalesOrderNo
• Date
• CustomerNo
• ClerkNo
• Total
SalesOrders
34. Normalization: Into 3NF – Problems Solved
Duplication of data:
Customer and Clerk details would appear for every order.
Insert anomaly:
To insert a customer or clerk, you must insert a sales order.
Delete anomaly:
Information about the customers and clerks stay with sales order records.
Delete a sales order record, delete the customer or clerk.
Update anomaly:
To change the details of a customer or clerk, you must change all the sales
order that involve that customer or clerk.
37. SQL
SQL is a standard language for accessing and manipulating databases.
Stands for Structured Query Language.
SQL can execute administrative, structural and data manipulation statements
Administrative:
Set permissions on schemas, tables, procedures and views
Structural:
Create, Drop or Alter tables, procedures and views
Data Manipulation:
Retrieve data from a database
Insert, update or delete records from a database
38. SQL
Although SQL is an ANSI (American National Standards Institute) standard,
there are different versions of the SQL language.
Most of the big database vendors have their own proprietary extensions in
addition to the SQL standard.
However, to be compliant with the ANSI standard, they all support at least the
major commands (such as SELECT, UPDATE, DELETE, INSERT, WHERE) in a
similar manner.
SQL is NOT case sensitive: select is the same as SELECT.
As a coding convention, we should write all SQL keywords in upper-case.
Semicolon is the standard way to separate each SQL statement in database
systems that allow more than one SQL statement to be executed in the same
call to the server.
Some database vendors let you choose your statement terminator character.
39. SQL
SELECT - extracts data from a database
UPDATE - updates data in a database
DELETE - deletes data from a database
INSERT INTO - inserts new data into a database
CREATE DATABASE - creates a new database
ALTER DATABASE - modifies a database
CREATE TABLE - creates a new table
ALTER TABLE - modifies a table
DROP TABLE - deletes a table
CREATE INDEX - creates an index (search key)
DROP INDEX - deletes an index
40. SQL Syntax
SELECT col1_name,col2_name FROM table_name;
SELECT * FROM table_name;
The SELECT DISTINCT statement is used to return only distinct (different) values.
SELECT DISTINCT column1_name,column2_name FROM table_name;
The WHERE clause is used to extract only those records that fulfill a specified criterion.
SELECT col1_name,col2_name FROM table_name WHERE boolean_condition;
WHERE CITY=‘Cairo’
WHERE (CUSTOMER_ID=5011 OR CITY<>’Alex’)
WHERE CUSTOMER_ID IN(5011,5378,5834)
WHERE (PRICE>=5000 AND PRICE<10000)
WHERE CUSTOMER_NAME LIKE ’%Moneim%’
WHERE HIRE_DATE BETWEEN 2006-12-31 AND 2004-01-01
41. SQL Syntax
INSERT INTO table_name VALUES (value1,value2,value3,...);
INSERT INTO table_name (col1,col2,col3) VALUES (val1,val2,val3);
UPDATE table_name SET col1=val1,col2=val2 WHERE col5=val5;
DELETE FROM table_name WHERE col3=val5;
The WHERE clause specifies which record or records that should be updated or deleted.
If you omit the WHERE clause, all records will be updated or deleted!
Using Aliases for tables and columns
SELECT CUSTOMER_NAME AS C_NAME FROM CUSTOMER AS C;
SELECT CUSTOMER_NAME C_NAME FROM CUSTOMER C;
42. SQL Syntax
Using NOT & Wildcards
WHERE ID NOT IN (5011,5012); or WHERE PROVIDER NOT LIKE ‘%mobinil%’;
WHERE City LIKE '_erlin'; or WHERE City LIKE 'L_n_on';
City starting with "b", "s", or "p": WHERE City LIKE '[bsp]%'
starting with "a", "b", or "c": WHERE City LIKE '[a-c]%
City NOT starting with "b", "s", or "p":
WHERE City LIKE '[!bsp]%'; or WHERE City NOT LIKE '[bsp]%';
WHERE Price NOT BETWEEN 10 AND 20;
WHERE ProductName BETWEEN 'C' AND 'M'; -- beginning with C,D,…,M
Ordering Data
SELECT col1,col2 FROM table_name ORDER BY col1 ASC|DESC;-- Ascending
ORDER BY col1 ASC|DESC, col2 ASC|DESC;
43. SQL Syntax - JOINs
An SQL JOIN clause is used to combine rows from two or more tables, based on a common
field between them.
The most common type of join is: SQL INNER JOIN (simple join).
INNER JOIN: Returns all rows when there is at least one match in BOTH tables
LEFT JOIN: Return all rows from the left table, and the matched rows from the right table
RIGHT JOIN: Return all rows from the right table, and the matched rows from the left table
FULL JOIN: Return all rows when there is a match in ONE of the tables
44. SQL Syntax - JOINs
A B A B
A B A B
INNER JOIN LEFT OUTER JOIN
RIGHT OUTER JOIN FULL JOIN
45. SQL Syntax – INNER JOIN
OrderID CustomerID OrderDate
10308 2 1996-09-18
10309 2 1996-09-19
10310 1 1996-09-20
10311 1996-09-21
ID CustomerName Email City
1 Mahmoud mahmoud@test.com Cairo
2 Samy samy@test.com Alex
3 Aly aly@test.com Aswan
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers [INNER] JOIN Orders
ON Customers.ID=Orders.CustomerID
ORDER BY Customers.CustomerName; CustomerName OrderID
Mahmoud 10310
Samy 10308
Samy 10309
46. SQL Syntax – LEFT OUTER JOIN
OrderID CustomerID OrderDate
10308 2 1996-09-18
10309 2 1996-09-19
10310 1 1996-09-20
10311 1996-09-21
ID CustomerName Email City
1 Mahmoud mahmoud@test.com Cairo
2 Samy samy@test.com Alex
3 Aly aly@test.com Aswan
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers LEFT JOIN Orders
ON Customers.ID=Orders.CustomerID
ORDER BY Customers.CustomerName; CustomerName OrderID
Aly NULL
Mahmoud 10310
Samy 10308
Samy 10309
47. SQL Syntax – RIGHT OUTER JOIN
OrderID CustomerID OrderDate
10308 2 1996-09-18
10309 2 1996-09-19
10310 1 1996-09-20
10311 1996-09-21
ID CustomerName Email City
1 Mahmoud mahmoud@test.com Cairo
2 Samy samy@test.com Alex
3 Aly aly@test.com Aswan
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers RIGHT JOIN Orders
ON Customers.ID=Orders.CustomerID
ORDER BY Customers.CustomerName; CustomerName OrderID
Mahmoud 10310
Samy 10308
Samy 10309
NULL 10311
48. SQL Syntax – FULL OUTER JOIN
OrderID CustomerID OrderDate
10308 2 1996-09-18
10309 2 1996-09-19
10310 1 1996-09-20
10311 1996-09-21
ID CustomerName Email City
1 Mahmoud mahmoud@test.com Cairo
2 Samy samy@test.com Alex
3 Aly aly@test.com Aswan
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers FULL JOIN Orders
ON Customers.ID=Orders.CustomerID
ORDER BY Customers.CustomerName; CustomerName OrderID
Aly NULL
Mahmoud 10310
Samy 10308
Samy 10309
NULL 10311
49. SQL Syntax - Advanced
Using SELECT INTO
SELECT Customers.CustomerName, Orders.OrderID
INTO CustomersOrderBackup2013
FROM Customers LEFT JOIN Orders ON Customers.ID=Orders.CustomerID;
Using INSERT INTO SELECT
INSERT INTO Customers (CustomerName, Country)
SELECT SupplierName, Country FROM Suppliers;
Using SQL Create
CREATE DATABASE dbname;
CREATE TABLE Persons(PersonID int NOT NULL UNIQUE,LastName varchar(255),
FirstName varchar(255),Address varchar(255),City varchar(255));
50. SQL Aggregate Functions
AVG() Returns the average value
COUNT() Returns the number of rows
FIRST() Returns the first value
LAST() Returns the last value
MAX() Returns the largest value
MIN() Returns the smallest value
SUM() Returns the sumUsing SELECT INTO
51. SQL Scalar Functions
UCASE() Converts a field to upper case
LCASE() Converts a field to lower case
MID() Extract characters from a text field
LEN() Returns the length of a text field
ROUND() Rounds a numeric field to the number of decimals specified
NOW() Returns the current system date and time
FORMAT() Formats how a field is to be displayed
52. SQL GROUP BY Statement
The GROUP BY statement is used in conjunction with the aggregate functions
to group the result set by one or more columns.
It can be used for summary reports
SELECT Shippers.ShipperName,COUNT(Orders.OrderID) AS NumberOfOrders
FROM Orders LEFT JOIN Shippers ON Orders.ShipperID=Shippers.ShipperID
GROUP BY ShipperName;
ShipperName NumberOfOrders
Federal Shipping 68
Speedy Express 54
United Package 74
53. SQL GROUP BY Statement
Group by two columns
SELECT Shippers.ShipperName, Employees.LastName,
COUNT(Orders.OrderID) AS NumberOfOrders FROM ((Orders
INNER JOIN Shippers ON Orders.ShipperID=Shippers.ShipperID)
INNER JOIN Employees ON Orders.EmployeeID=Employees.EmployeeID)
GROUP BY ShipperName,LastName;
ShipperName Last Name NumberOfOrders
Federal Shipping Baher 38
Federal Shipping Mona 30
Speedy Express Aly 54
United Package Kareem 70
United Package Yasser 4
54. SQL HAVING Clause
The HAVING clause was added to SQL because the WHERE keyword could not be
used with aggregate functions.
Suppose that we want to find if any of the employees has registered more than 10 orders.
SELECT Employees.LastName, COUNT(Orders.OrderID) AS NumberOfOrders
FROM (Orders INNER JOIN Employees
ON Orders.EmployeeID=Employees.ID)
GROUP BY LastName
HAVING COUNT(Orders.OrderID) > 10;
Last Name NumberOfOrders
Aly 54
Baher 38
Kareem 70
Mona 30
55. SQL Injection
An SQL injection can destroy your database
SQL injection is a technique where malicious users can inject SQL commands into an SQL
statement, via web page input.
Injected SQL commands can alter SQL statement and compromise the security of a web
application.
txtUserId = getRequestString("UserId");
txtSQL = "SELECT * FROM Users WHERE UserId = " + txtUserId;
UserId: 105 or 1=1
SELECT * FROM Users WHERE UserId = 105 or 1=1
SELECT * FROM Users WHERE UserId = 105 or ""=""
UserId: 105; DROP TABLE SUPPLIERS
Use SQL Parameters / PreparedStatements