Introduction
• Data: Known facts that can be recorded and that have implicit meaning; raw, unprocessed data
• Information: Processed data
• Database: a highly organized, interrelated, and structured set of data about a particular enterprise
• Controlled by a database management system (DBMS)
• DBMS
• Set of programs to access the data
• An environment that is both convenient and efficient to use
• Database systems are used to manage collections of data that are:
• Highly valuable
• Relatively large
• Accessed by multiple users and applications, often at the same time.
• A modern database system is a complex software system whose task is to manage a large, complex
collection of data.
• Databases touch all aspects of our lives
Database Examples
• Enterprise Information
• Sales: customers, products, purchases
• Accounting: payments, receipts, assets
• Human Resources: Information about employees, salaries, payroll taxes.
• Manufacturing: management of production, inventory, orders, supply chain.
• Banking and finance
• customer information, accounts, loans, and banking transactions.
• Credit card transactions
• Finance: sales and purchases of financial instruments (e.g., stocks and bonds); storing real-time market data
• Universities: registration, grades
Databases
• Traditional applications:
• Numeric and textual databases
• More recent applications:
• Multimedia databases
• Geographic Information Systems (GIS)
• Biological and genome databases
• Data warehouses
• Mobile databases
• Real-time and active databases
• Social networks started capturing a lot of information about people
and about communications among people (posts, tweets, photos,
videos) in systems such as:
- Facebook
- Twitter
- LinkedIn
• All of the above constitutes data
• Search engines such as Google, Bing and Yahoo collect their own repositories of
web pages for searching purposes
DBMS Functions
• Define a particular database in terms of its data types, structures etc.
• Construct or load the initial database contents on a secondary storage medium
• Manipulating the database:
• Retrieval: Querying, generating reports
• Modification: Insertions, deletions and updates to its content
• Accessing the database through Web applications
• Processing and sharing by a set of concurrent users and application programs,
while keeping all data valid and consistent
• DBMS may additionally provide:
• Protection or security measures to prevent unauthorized access
• “Active” processing to take internal actions on data
• Presentation and visualization of data
• Maintenance of the database and associated programs over the lifetime of
the database application
Purpose of Database Systems
A file-processing system is supported by a conventional operating system. The system stores permanent records in
various files, and it needs different application programs to extract records from, and add records to, the
appropriate files. Before database management systems (DBMSs) were introduced, organizations usually stored
information in such systems.
Data redundancy and inconsistency:
• Since different programmers create the files and application programs over a long period, the various files are
likely to have different structures and the programs may be written in several programming languages.
Moreover, the same information may be duplicated in several places (files). This redundancy leads to higher
storage and access cost. In addition, it may lead to data inconsistency; that is, the various copies of the same
data may no longer agree.
Difficulty in accessing data
• Need to write a new program to carry out each new task
Data isolation
• Multiple files and formats. Because data are scattered in various files, and files may be in different formats,
writing new application programs to retrieve the appropriate data is difficult.
Integrity problems:
• The data values stored in the database must satisfy certain types of consistency constraints. For example,
suppose the university requires that the account balance of a department never fall below zero. Developers
enforce these constraints in the system by adding appropriate code to the various application programs.
However, when new constraints are added, it is difficult to change the programs to enforce them. The
problem is compounded when constraints involve several data items from different files.
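With a DBMS, the "balance never below zero" rule can be declared once in the schema instead of being coded into every application program. The sketch below uses Python's built-in `sqlite3` module as a stand-in DBMS; the `department` table and the `withdraw` helper are hypothetical illustrations, not part of the original material.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE department (
        dept_name TEXT PRIMARY KEY,
        balance   NUMERIC NOT NULL CHECK (balance >= 0)  -- the rule lives in the schema
    )
    """
)
conn.execute("INSERT INTO department VALUES ('Physics', 100)")
conn.commit()

def withdraw(dept, amount):
    """Try a withdrawal; return True if accepted, False if the DBMS rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE department SET balance = balance - ? WHERE dept_name = ?",
                (amount, dept),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised by the DBMS because the CHECK constraint would be violated.
        return False

ok_small = withdraw("Physics", 30)   # balance becomes 70
ok_large = withdraw("Physics", 500)  # would go negative, so it is rejected
balance = conn.execute(
    "SELECT balance FROM department WHERE dept_name = 'Physics'"
).fetchone()[0]
print(ok_small, ok_large, balance)  # True False 70
```

Every program that updates `department` gets the same check for free, and adding a new constraint means changing the schema, not every program.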
• Atomicity of updates
• Failures may leave database in an inconsistent state with partial updates carried out
• Example: Transfer of funds from one account to another should either complete or not happen at all
• Concurrent access by multiple users
• Concurrent access needed for performance
• Uncontrolled concurrent accesses can lead to inconsistencies
• Ex: Two people read a balance of 100 at the same time and each withdraws 50. If each writes back
its own result (50), the final balance is 50 instead of 0: one update is lost.
• Security problems
• Hard to provide user access to some, but not all, data
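The atomicity requirement above (a funds transfer either completes or does not happen at all) is what a DBMS transaction provides. A minimal sketch, assuming Python's built-in `sqlite3` as the DBMS; the `account` table and `transfer` function are hypothetical. The credit is applied first on purpose, so a failed debit would otherwise leave a partial update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()

def transfer(src, dst, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # one transaction: commit if both updates succeed, else roll back
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit violated CHECK (balance >= 0); the credit above is undone too.
        return False

def balances():
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

first = transfer("A", "B", 30)
after_first = balances()    # {'A': 70, 'B': 50}
second = transfer("A", "B", 500)
after_second = balances()   # unchanged: {'A': 70, 'B': 50}
print(first, after_first, second, after_second)
```

In a file-processing system the same failure would leave B credited with 500 while A was never debited.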
Types of Databases
1. Relational Database
• A relational database management system (RDBMS) is a system where data
is organized in two-dimensional tables using rows and columns.
• This is one of the most popular data models used in industry, and RDBMSs are
the most widely used type of DBMS. Most RDBMSs use SQL to define and query data.
• Each table typically has a primary key (one or more columns) that uniquely
identifies each record.
• Relational database management system software is available for personal
computers, workstations and large mainframe systems.
• For example: Oracle Database, MySQL, Microsoft SQL Server etc.
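To make the relational ideas concrete (a two-dimensional table, a key that uniquely identifies each record, and declarative SQL queries), here is a small sketch using Python's built-in `sqlite3` module as the RDBMS. The `student` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A two-dimensional table: each row is a record, each column an attribute.
conn.execute(
    """
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,  -- key field: uniquely identifies each row
        name       TEXT NOT NULL,
        dept_name  TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "Physics"), (2, "Ravi", "History"), (3, "Meena", "Physics")],
)

# Declarative SQL: state which rows you want, not how to find them.
physics = conn.execute(
    "SELECT name FROM student WHERE dept_name = ? ORDER BY name", ("Physics",)
).fetchall()
print(physics)  # [('Asha',), ('Meena',)]

# The primary key rejects a second record with the same key value.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Duplicate', 'Math')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```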
2. Object Oriented Database
• It is a system where information or data is represented in the form of
objects, as used in object-oriented programming.
• It is a combination of relational database concepts and object-oriented
principles.
• Relational database concepts are concurrency control, transactions, etc.
• OOP principles are data encapsulation, inheritance, and polymorphism.
• It can require less code, because program objects are stored directly rather than
being mapped to tables, and it is easy to maintain.
• For example: ObjectDB.
3. Hierarchical Database
• It is a system where the data elements have a one-to-many
relationship (1:N). Here data is organized like a tree, similar
to the folder structure in your computer system.
• The hierarchy starts from the root node, connecting all the child
nodes to the parent node.
• It is used in industry on mainframe platforms.
• For example: IMS (IBM), Windows Registry (Microsoft).
4. Network database
• A network database management system is
a system where the data elements can maintain
one-to-one (1:1) and many-to-many (N:N)
relationships.
• It resembles the hierarchical model, but the
data is organized like a graph, and a child
record is allowed to have more than one
parent.
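The structural difference between the two models comes down to how many parents a record may have. The sketch below models only record types and their parent links with plain Python dictionaries; the record names are hypothetical, and real hierarchical or network DBMSs (such as IMS) store and navigate these links very differently.

```python
# Hierarchical model: every record except the root has exactly ONE parent,
# so a child -> parent mapping fully describes the tree.
tree_parent = {
    "College": None,          # root
    "Department": "College",
    "Course": "Department",
    "Instructor": "Department",
}

# Network model: a child record may have SEVERAL parents (owners),
# so each record maps to a set of parents.
graph_parents = {
    "College": set(),
    "Student": {"College"},
    "Course": {"College"},
    "Enrollment": {"Student", "Course"},  # owned by both a student and a course
}

def path_to_root(record):
    """Walk the single parent links of the hierarchical model up to the root."""
    path = [record]
    while tree_parent[path[-1]] is not None:
        path.append(tree_parent[path[-1]])
    return path

max_parents = max(len(ps) for ps in graph_parents.values())

print(path_to_root("Course"))  # ['Course', 'Department', 'College']
print(max_parents)             # 2
```

An `Enrollment` record cannot be placed in the tree without duplicating it under both `Student` and `Course`; the network model avoids that duplication.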
5. NoSQL databases
• NoSQL is a broad category that includes any database that doesn’t use SQL as its
primary data access language.
• These types of databases are also sometimes referred to as non-relational
databases.
• Unlike in relational databases, data in a NoSQL database doesn’t have to conform
to a pre-defined schema, so these types of databases are great for organizations
seeking to store unstructured or semi-structured data.
• One advantage of NoSQL databases is that developers can change the structure of stored data
on the fly, often without affecting applications that are using the database.
• Examples: Apache Cassandra, MongoDB, CouchDB, and Couchbase
6. Cloud databases
• A cloud database refers to any database that’s designed to run in the cloud. Like other
cloud-based applications, cloud databases offer flexibility and scalability, along with high
availability. Cloud databases are also often low-maintenance, since many are offered via a
SaaS model.
• Examples: Microsoft Azure SQL Database, Amazon Relational Database Service, Oracle
Autonomous Database.
7. Columnar databases
• Columnar databases, also referred to as column data stores, store data in columns rather than rows. These
types of databases are often used in data warehouses because they are well suited to analytical
queries. When you query a columnar database, it reads only the columns the query needs and
skips all the other data.
• Examples: Google BigQuery, Cassandra, HBase, MariaDB ColumnStore, Azure SQL Data Warehouse
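The row-versus-column difference can be shown with plain Python data structures. This is only a sketch of the storage idea (the `orders` data is hypothetical); real columnar engines add compression, vectorized execution and on-disk column files.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"order_id": 1, "customer": "A", "region": "North", "amount": 120},
    {"order_id": 2, "customer": "B", "region": "South", "amount": 80},
    {"order_id": 3, "customer": "C", "region": "North", "amount": 200},
]

# Column-oriented layout: each column is stored as its own contiguous list.
columns = {name: [r[name] for r in rows] for name in rows[0]}

# Analytical query "total amount": the row layout touches every full record...
total_from_rows = sum(r["amount"] for r in rows)
# ...while the column layout reads only the single column the query needs.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)  # 400 400
print(columns["region"])  # ['North', 'South', 'North']
```

Both layouts give the same answer; the columnar one simply avoids reading `order_id`, `customer` and `region`, which matters when a table has many columns and millions of rows.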
8. Document databases
• Document databases, also known as document stores, use JSON-like documents to model data instead of
rows and columns. Sometimes referred to as document-oriented databases, document databases are
designed to store and manage document-oriented information, also referred to as semi-structured data.
Document databases are simple and scalable, making them useful for mobile apps that need fast iterations.
• Examples: MongoDB, Amazon DocumentDB, Apache CouchDB
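A minimal sketch of the document model using Python's `json` module: documents in one collection need not share fields. The `find` helper is a hypothetical toy query function, not the API of MongoDB or any other document store.

```python
import json

# Each document is a self-contained JSON object; documents in the same
# collection do not have to share the same fields (no fixed schema).
collection = [
    json.loads('{"_id": 1, "name": "Asha", "email": "asha@example.com"}'),
    json.loads('{"_id": 2, "name": "Ravi", "phones": ["555-0100", "555-0101"]}'),
    json.loads('{"_id": 3, "name": "Meena", "address": {"city": "Pune"}}'),
]

def find(coll, **criteria):
    """Toy query: return documents whose top-level fields equal all criteria."""
    return [d for d in coll if all(d.get(k) == v for k, v in criteria.items())]

# A document with a brand-new field is added without any schema change.
collection.append({"_id": 4, "name": "Ravi", "twitter": "@ravi"})

ravi_ids = [d["_id"] for d in find(collection, name="Ravi")]
print(ravi_ids)                          # [2, 4]
print(collection[2]["address"]["city"])  # Pune
```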
9. Graph databases
• Graph databases are a type of NoSQL database based on graph theory. Graph-oriented database
management system (DBMS) software is designed to identify and work with the connections between data
points. Therefore graph databases are often used to analyze the relationships between heterogeneous data
points, such as in fraud prevention or for mining data about customers from social media.
• Examples: DataStax Enterprise Graph, Neo4j
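The kind of relationship question a graph database answers ("how closely are these two customers connected?") can be sketched with an adjacency list and breadth-first search. The customer names and links below are hypothetical, and graph DBMSs such as Neo4j use their own query languages rather than this Python code.

```python
from collections import deque

# Adjacency list: each node maps to the nodes it is directly linked to
# (hypothetical customers sharing a phone number or device).
graph = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
    "erin": [],
}

def hops_between(start, goal):
    """Breadth-first search: edges on a shortest path, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

print(hops_between("alice", "dave"))  # 3
print(hops_between("alice", "erin"))  # None
```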
10. Time series databases
• A time series database is a database optimized for time-stamped, or time series, data. Examples of this type
of data include network data, sensor data, and application performance monitoring data. Internet of Things
(IoT) sensors attached to all kinds of devices produce a constant stream of time series data.
• Examples: Druid, eXtremeDB, InfluxDB
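A typical time-series operation is downsampling: grouping time-stamped readings into fixed time buckets and aggregating each bucket. The sketch below does this in plain Python for hypothetical sensor readings; time series databases such as InfluxDB provide this as a built-in query feature.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sensor readings: (ISO timestamp, temperature).
readings = [
    ("2024-01-01T10:00:05", 21.0),
    ("2024-01-01T10:00:40", 23.0),
    ("2024-01-01T10:01:10", 22.0),
    ("2024-01-01T10:01:50", 24.0),
]

def per_minute_average(points):
    """Group points into one-minute buckets and average each bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        minute = datetime.fromisoformat(ts).replace(second=0, microsecond=0)
        buckets[minute].append(value)
    return {m.strftime("%H:%M"): sum(v) / len(v) for m, v in sorted(buckets.items())}

averages = per_minute_average(readings)
print(averages)  # {'10:00': 22.0, '10:01': 23.0}
```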
Characteristics of Database Approach
1. Self-Describing Nature of a Database System: One of the most fundamental
characteristics of the database approach is that the database system contains
not only the database itself but also a complete definition or description of the
database structure and constraints, also known as the metadata of the database.
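The self-describing property can be observed directly: the DBMS stores the schema in a catalog that is itself queryable. The sketch uses Python's built-in `sqlite3`, where the catalog is the `sqlite_master` table; the `course` table and `short_course` view are hypothetical, and other DBMSs expose their catalogs differently (for example through `information_schema`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (course_id TEXT, title TEXT, credits INTEGER)")
conn.execute("CREATE VIEW short_course AS SELECT title FROM course WHERE credits < 3")

# The catalog (metadata) is stored in the database and can be queried like data.
catalog = conn.execute(
    "SELECT type, name, sql FROM sqlite_master ORDER BY name"
).fetchall()
for obj_type, name, sql in catalog:
    print(obj_type, name)
    print("   ", sql)
```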
2. Support for Multiple Views of the Data :
• A database typically has many users, each of whom may require a special
perspective or view of the database.
• A view may be a subset of the database, or it may contain virtual
data that is derived from the database files but is not explicitly stored.
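Both kinds of view described above can be declared in SQL. A sketch using Python's built-in `sqlite3`: `instructor_public` is a subset that hides salaries, and `dept_avg_salary` contains derived data computed on demand and never stored. All table, view and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, dept_name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [(1, "Srinivasan", "Comp. Sci.", 65000), (2, "Wu", "Finance", 90000),
     (3, "Katz", "Comp. Sci.", 75000)],
)

# View 1: a subset of the data that hides the salary column.
conn.execute("CREATE VIEW instructor_public AS SELECT name, dept_name FROM instructor")

# View 2: virtual, derived data (an average) that is not stored anywhere.
conn.execute(
    "CREATE VIEW dept_avg_salary AS "
    "SELECT dept_name, AVG(salary) AS avg_salary FROM instructor GROUP BY dept_name"
)

public = conn.execute("SELECT * FROM instructor_public ORDER BY name").fetchall()
averages = conn.execute("SELECT * FROM dept_avg_salary ORDER BY dept_name").fetchall()
print(public)
print(averages)  # [('Comp. Sci.', 70000.0), ('Finance', 90000.0)]
```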
3. Sharing of Data and Multi-user Transaction Processing:
• A multi-user DBMS, as its name implies, must allow multiple users to access the database
at the same time (concurrently).
• This is essential if data for multiple applications is to be integrated and maintained
in a single database, for example when WhatsApp data is integrated with
Facebook.
• The DBMS must implement concurrency control in the software to make sure that several
users trying to update the same data do so in a controlled manner, so that the
results of the updates are correct.
4. Manages Information
• A database takes care of its information, because that information supports whatever work
we do. It manages all the information that we require.
5. Easy Operation Implementation
• All the operations like insert, delete, update, search etc. are carried out in a flexible and
easy way. A database makes these operations very simple to implement, so even a user with little
knowledge can perform them. This characteristic makes the database approach more
powerful.
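Each of the four basic operations is a single SQL statement. A minimal sketch with Python's built-in `sqlite3`, using a hypothetical `student` table in the spirit of the student management example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")

# Insert
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [(1, "Asha", 78), (2, "Ravi", 64)])
# Update
conn.execute("UPDATE student SET marks = 70 WHERE roll_no = 2")
# Delete
conn.execute("DELETE FROM student WHERE roll_no = 1")
# Search
remaining = conn.execute("SELECT roll_no, name, marks FROM student").fetchall()
print(remaining)  # [(2, 'Ravi', 70)]
```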
6. Data For Specific Purpose
• A database is designed for data of specific purpose. For example, a database of
student management system is designed to maintain the record of student’s
marks, fees and attendance etc. This data has a specific purpose of maintaining
student record.
7. It has Users of Specific Interest
• A database always has some intended group of users and applications that
are interested in its data.
• For example, in a college library system, there are three groups of users: the official
administration of the college, the librarian, and the students.
Characteristics of Data in the Database
▰Shared
▰Persistence
▰Validity/ Correctness
▰Security
▰Consistency
▰Non-Redundancy
▰Independence
Advantages of Using the Database Approach
▰Controlling redundancy in data storage and in development and maintenance efforts.
▰Sharing of data among multiple users.
▰Restricting unauthorized access to data.
▰Providing persistent storage for program objects
▰Providing Storage Structures (e.g. indexes) for efficient Query Processing
▰Providing backup and recovery services.
▰Providing multiple interfaces to different classes of users.
▰Representing complex relationships among data.
▰Enforcing integrity constraints on the database.
▰Drawing inferences and actions from the stored data using deductive and active rules
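The "storage structures (e.g. indexes) for efficient query processing" advantage can be seen in a query plan. This sketch uses SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3`; the table, index name and data are hypothetical, and the exact wording of the plan text varies between SQLite versions, so only the presence of a scan versus the index is relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, dept_name TEXT)")
conn.executemany(
    "INSERT INTO student (name, dept_name) VALUES (?, ?)",
    [(f"student{i}", "Physics" if i % 2 else "History") for i in range(1000)],
)

QUERY = "SELECT roll_no FROM student WHERE name = ?"

def plan():
    """Return SQLite's plan descriptions (last column of each plan row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("student42",))]

before = plan()  # no index on name: the whole table is scanned
conn.execute("CREATE INDEX idx_student_name ON student(name)")
after = plan()   # the optimizer now searches the index instead

print(before)
print(after)
```

The query text is identical in both cases; only the storage structure changed, and the DBMS chose the faster access path on its own.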
When not to use a DBMS
▰Main inhibitors (costs) of using a DBMS:
▻High initial investment and possible need for additional hardware.
▻Overhead for providing generality, security, concurrency control, recovery, and integrity functions.
▰When a DBMS may be unnecessary:
▻If the database and applications are simple, well defined, and not expected to change.
▻If there are stringent real-time requirements that may not be met because of DBMS overhead.
▻If access to data by multiple users is not required.
▰When no DBMS may suffice:
▻If the database system is not able to handle the complexity of data because of modeling limitations
▻If the database users need special operations not supported by the DBMS.
Database Users
▰Users may be divided into
▻Those who actually use and control the database content, and those who
design, develop and maintain database applications (called “Actors on the
Scene”), and
▻Those who design and develop the DBMS software and related tools, and the
computer systems operators (called “Workers Behind the Scene”).
Actors on the scene
▻Database administrators:
▻Responsible for authorizing access to the database, for coordinating and
monitoring its use, acquiring software and hardware resources, controlling
its use and monitoring efficiency of operations.
▻Database Designers:
▻Responsible for defining the content, the structure, the constraints, and
functions or transactions against the database. They must communicate
with the end-users and understand their needs.
▻End-users: They use the data for queries, reports and some of them update the
database content. End-users can be categorized into:
Casual: access database occasionally when needed
Naïve or Parametric: they make up a large section of the end-user population.
They use previously well-defined functions in the form of “canned transactions” against the database.
Examples are bank-tellers or reservation clerks who do this activity for an entire shift of operations.
Sophisticated:
These include business analysts, scientists, engineers, others thoroughly familiar with the system capabilities.
Many use tools in the form of software packages that work closely with the stored database.
Stand-alone:
Mostly maintain personal databases using ready-to-use packaged applications.
An example is a tax program user that creates its own internal database.
Another example is a user that maintains an address book
Workers Behind the scene
▰DBMS system designers and implementers :
▻Design and implement the DBMS modules and interfaces including modules for
implementing the catalog, query language processing, interface processing, accessing and
buffering data, controlling concurrency, and handling data recovery and security.
▰Tool developers
▻Design and implement tools which are optional packages for database design, performance
monitoring, natural language or graphical interfaces, prototyping, simulation, and test data
generation
▰Operators and maintenance personnel (system administration personnel) are responsible for the
actual running and maintenance of the hardware and software environment for the database
system.
Schemas, Instances and Database State
Database Schema (meta-data): The design of a database is called the
schema. It includes descriptions of the database structure and the
constraints that should hold on the database. The database schema
changes very infrequently.
Database Instance: The actual data stored in a database at a particular
moment in time. Also called the database state (or occurrence, snapshot).
The database state changes every time the database is updated.
Schema is also called intension, whereas state is called extension.
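The difference between schema (intension) and state (extension) can be checked directly: an update changes the state but leaves the schema alone. A sketch with Python's built-in `sqlite3`, where `PRAGMA table_info` reports the columns and declared types; the `course` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT)")

def schema():
    """Intension: column names and declared types (changes rarely)."""
    return [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(course)")]

def state():
    """Extension: the rows stored right now (changes with every update)."""
    return conn.execute("SELECT * FROM course ORDER BY course_id").fetchall()

schema_before, state_before = schema(), state()
conn.execute("INSERT INTO course VALUES ('CS-101', 'Intro to Computer Science')")
schema_after, state_after = schema(), state()

print(schema_before == schema_after)  # True: same schema
print(state_before, state_after)      # [] [('CS-101', 'Intro to Computer Science')]
```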
DBMS Architecture
• Three-Schema Architecture
External schema at the external level to describe the various user views.
Usually uses the same data model as the conceptual level or high-
level data model.
Conceptual schema at the conceptual level to describe the structure and
constraints for the whole database. Uses a conceptual or an
implementation data model.
Internal schema at the internal level to describe data storage structures
and access paths. Typically uses a physical data model.
• External/ View level
• This is the highest level of database abstraction. It includes a number of
external schemas or user views. This level provides different views of the
same database for a specific user or a group of users. An external view
provides a powerful and flexible security mechanism by hiding the parts of
the database from a particular user.
• Conceptual or Logical level
• This level describes the structure of the whole database. It acts as a middle
layer between the physical storage and the user views. It describes what data is
stored in the database, what the data types are, and what relationships
exist among those data. There is only one conceptual schema per
database.
• Internal or Physical level
• This is the lowest level of database abstraction. It describes how the
data is stored in the database and provides the methods to access
data from the database. It allows viewing the physical representation
of the database on the computer system.
• The mapping between the conceptual and internal schemas specifies
how each element of the conceptual schema is stored and how it may
be accessed. The internal level is the one closest to physical storage.
Data Independence
The capacity to change the schema at one level without having to change the schema
at the next higher level
Types:
Logical Data Independence: The capacity to change the conceptual schema
without having to change the external schemas and their application programs.
Physical Data Independence: The capacity to change the internal schema without
having to change the conceptual schema.
Only the mappings between a schema and the higher-level schemas need to
change.
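Both kinds of data independence can be demonstrated with an application query written against an external view. The sketch assumes Python's built-in `sqlite3`: creating an index plays the role of an internal-schema (physical) change, and adding a column plays the role of a conceptual-schema (logical) change. SQLite does not expose three separate schema levels, so this is an approximation, and all names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual schema
conn.execute("CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, dept_name TEXT)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)", [(1, "Wu", "Finance"), (2, "Katz", "Comp. Sci.")]
)
# External schema: a user view
conn.execute("CREATE VIEW dept_roster AS SELECT name, dept_name FROM instructor")

def app_query():
    """Stands in for an application program written against the view."""
    return conn.execute("SELECT name FROM dept_roster ORDER BY name").fetchall()

baseline = app_query()

# Physical change (new access path): the application is untouched.
conn.execute("CREATE INDEX idx_instructor_dept ON instructor(dept_name)")
after_physical = app_query()

# Logical change (new column in the conceptual schema): view and application untouched.
conn.execute("ALTER TABLE instructor ADD COLUMN salary INTEGER")
after_logical = app_query()

print(baseline == after_physical == after_logical)  # True
```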
Three Schema Architecture – Advantages
• Database abstraction
• Easier to use for a user.
• Allows each user to access customized view of data.
• Enables a database admin to change the storage structure without
affecting the user’s view
3-Tier Architecture
• The 3-tier architecture consists of the following three layers:
• Presentation layer: This layer is also called the client layer. The front-
end layer consists of a user interface. Its main purpose is to
communicate with the application layer.
• Application layer: This layer is also called the business logic layer. It
acts as a middle layer between the client and the database server,
through which partially processed data is exchanged.
• Database layer: This layer stores the data or information and performs
operations such as insert, update and delete on the database.
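The separation of responsibilities between the three layers can be sketched in one Python file, with each layer only talking to the one below it. This is a hypothetical illustration: in a real deployment the layers run as separate programs (browser or app, application server, database server) communicating over a network, and `sqlite3` stands in for the database server.

```python
import sqlite3

class DatabaseLayer:
    """Database layer: stores the data and runs insert/update/select."""
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")

    def insert_account(self, acc_id, balance):
        with self.conn:
            self.conn.execute("INSERT INTO account VALUES (?, ?)", (acc_id, balance))

    def get_balance(self, acc_id):
        row = self.conn.execute("SELECT balance FROM account WHERE id = ?", (acc_id,)).fetchone()
        return None if row is None else row[0]

    def set_balance(self, acc_id, balance):
        with self.conn:
            self.conn.execute("UPDATE account SET balance = ? WHERE id = ?", (balance, acc_id))

class ApplicationLayer:
    """Application (business logic) layer: applies rules, calls the database layer."""
    def __init__(self, db):
        self.db = db

    def withdraw(self, acc_id, amount):
        balance = self.db.get_balance(acc_id)
        if balance is None:
            return "error: no such account"
        if amount > balance:
            return "error: insufficient funds"
        self.db.set_balance(acc_id, balance - amount)
        return f"ok: new balance {balance - amount}"

def presentation(app, acc_id, amount):
    """Presentation (client) layer: formats the result for the user."""
    return f"[{acc_id}] {app.withdraw(acc_id, amount)}"

db = DatabaseLayer()
db.insert_account("A", 100)
app = ApplicationLayer(db)
r1 = presentation(app, "A", 30)
r2 = presentation(app, "A", 500)
print(r1)  # [A] ok: new balance 70
print(r2)  # [A] error: insufficient funds
```

For brevity the check and the update in `withdraw` are separate steps; with many concurrent clients they would need to run inside one transaction to avoid the lost-update problem.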