CHAPTER 10: STORAGE AND FILE STRUCTURE
Storage Hierarchy
[Figure: storage hierarchy showing primary, secondary, and tertiary storage, grouped into volatile and non-volatile storage]
Magnetic Hard Disk Mechanism
[Figure: magnetic hard disk mechanism. The diagram is schematic and simplifies the structure of actual disk drives.]
Performance Measures of Disks
• Access time – the time it takes from when a read or
write request is issued to when data transfer begins.
Consists of:
• Seek time – time it takes to reposition the arm over the
correct track.
• 4 to 10 milliseconds on typical disks
• Rotational latency – time it takes for the sector to be
accessed to appear under the head.
• One full rotation takes about 11 milliseconds at 5400 r.p.m. and 4 milliseconds at 15000 r.p.m.; on average, the sector is half a rotation away.
• Data-transfer rate – the rate at which data can be
retrieved from or stored to the disk.
• 25 to 100 MB per second max rate, lower for inner tracks
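The timing figures above can be related with a little arithmetic. The sketch below assumes that the average rotational latency is half a rotation and that 1 MB means 10^6 bytes. The function names are illustrative only.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full platter rotation, in milliseconds."""
    return 60_000 / rpm


def average_rotational_latency_ms(rpm: float) -> float:
    """On average the wanted sector is half a rotation away from the head."""
    return rotation_time_ms(rpm) / 2


def access_time_ms(avg_seek_ms: float, rpm: float) -> float:
    """Access time = seek time + rotational latency (data transfer not included)."""
    return avg_seek_ms + average_rotational_latency_ms(rpm)


def transfer_time_ms(n_bytes: int, rate_mb_per_s: float) -> float:
    """Time to move n_bytes at the given data-transfer rate (1 MB = 10**6 bytes assumed)."""
    return n_bytes / (rate_mb_per_s * 1_000_000) * 1000
```

For example, with a 4 ms average seek on a 15000 r.p.m. disk, `access_time_ms(4, 15000)` is 6.0 ms. By contrast, `transfer_time_ms(4096, 100)` is only about 0.04 ms for a 4 KB block. This gap is why a database system tries to minimize the number of block transfers.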
FILE ORGANIZATION, RECORD ORGANIZATION AND STORAGE ACCESS
File Organization
• The database is stored as a collection of files.
Each file is a sequence of records. A record is a
sequence of fields.
• We first consider fixed-length records, then extend
to variable-length records.
Fixed-Length Records
• Simple approach:
• Store record i starting from byte $n \times (i - 1)$, where n is the size of
each record.
• Record access is simple but records may cross blocks
• Modification: do not allow records to cross block boundaries
• Deletion of record i:
alternatives:
• move records i + 1, . . ., n
to i, . . . , n – 1
• move record n to i
• do not move records, but
link all free records on a
free list
type instructor = record
  ID varchar(5);
  Name varchar(20);
  Deptname varchar(20);
  Salary numeric(8,2);
end
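Here is a minimal sketch of the fixed-length layout for the instructor record above. It assumes that each character takes one byte and that the varchar fields are stored at their maximum width, padded with NUL bytes. It also assumes numeric(8,2) is stored in 8 bytes as an integer number of cents. Under those assumptions, each record is 5 + 20 + 20 + 8 = 53 bytes. `pack_instructor`, `byte_offset` and `block_position` are hypothetical helpers, not part of any DBMS API.

```python
import struct

# Assumed encoding: ID, Name, Deptname as fixed-width NUL-padded byte strings,
# Salary as an 8-byte integer number of cents. "<" disables alignment padding,
# so the record size is exactly 5 + 20 + 20 + 8 = 53 bytes.
INSTRUCTOR = struct.Struct("<5s20s20sq")
RECORD_SIZE = INSTRUCTOR.size


def pack_instructor(id_: str, name: str, dept: str, salary_cents: int) -> bytes:
    """Encode one record as exactly RECORD_SIZE bytes (longer strings are truncated)."""
    return INSTRUCTOR.pack(id_.encode(), name.encode(), dept.encode(), salary_cents)


def unpack_instructor(raw: bytes):
    """Decode a record, stripping the NUL padding from the string fields."""
    id_, name, dept, cents = INSTRUCTOR.unpack(raw)
    return (id_.rstrip(b"\0").decode(), name.rstrip(b"\0").decode(),
            dept.rstrip(b"\0").decode(), cents)


def byte_offset(i: int, n: int = RECORD_SIZE) -> int:
    """Simple approach: record i (counting from 1) starts at byte n * (i - 1)."""
    return n * (i - 1)


def block_position(i: int, block_size: int, n: int = RECORD_SIZE):
    """Modified approach: records never cross a block boundary, so only
    block_size // n records fit per block and the tail of each block is unused.
    Returns (block number, byte offset within the block), both counting from 0."""
    per_block = block_size // n
    return (i - 1) // per_block, ((i - 1) % per_block) * n
```

With 4096-byte blocks, `byte_offset(78)` is 4081, so under the simple approach record 78 would run past the end of the first block. Instead, `block_position(78, 4096)` places it at the start of block 1, leaving 4096 - 4081 = 15 bytes unused at the end of block 0.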
Free Lists
• Store the address of the first deleted record in the file header.
• Use this first record to store the address of the second deleted record,
and so on
• Can think of these stored addresses as pointers since they “point” to the
location of a record.
• More space efficient representation: reuse space for normal attributes of
free records to store pointers. (No pointers stored in in-use records.)
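The free list can be modelled in memory as follows. This is a hypothetical sketch with these correspondences:
- Positions in `slots` stand for record positions in the file.
- `free_head` stands for the pointer kept in the file header.
- A deleted slot's space is reused to hold the next pointer, so live records carry no pointer at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FreeSlot:
    """A deleted record's space, reused to hold the pointer to the next free slot."""
    next_free: Optional[int]


class FixedLengthFile:
    """Hypothetical model of a file of fixed-length records with a free list.
    Each entry of slots holds either a live record or a FreeSlot."""

    def __init__(self) -> None:
        self.slots: list = []
        self.free_head: Optional[int] = None  # address of the first deleted record

    def insert(self, record) -> int:
        if self.free_head is None:            # no deleted slots: append at the end
            self.slots.append(record)
            return len(self.slots) - 1
        slot = self.free_head                 # reuse the first free slot
        self.free_head = self.slots[slot].next_free
        self.slots[slot] = record
        return slot

    def delete(self, slot: int) -> None:
        # The freed space stores the old head; the header now points at this slot.
        self.slots[slot] = FreeSlot(self.free_head)
        self.free_head = slot
```

Deleting slot 1 and then slot 3 leaves `free_head == 3`. The next two inserts reuse slots 3 and 1, in that order, before the file grows.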
Variable-Length Records
• Variable-length records arise in database systems in several ways:
• Storage of multiple record types in a file.
• Record types that allow variable lengths for one or more fields such as
strings (varchar)
• Record types that allow repeating fields (used in some older data
models).
• Attributes are stored in order.
• Variable-length attributes are represented by fixed-size (offset, length) pairs,
with the actual data stored after all fixed-length attributes.
• Null values are represented by a null-value bitmap.
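Here is a byte-level sketch of this representation for a record with the instructor attributes. The layout is an assumption for illustration:
- Each varchar attribute gets a 4-byte (offset, length) pair.
- The salary is an 8-byte fixed-length attribute, stored as integer cents.
- One byte holds the null-value bitmap.
- The variable-length data follows all of this.

```python
import struct

# Hypothetical layout:
#   bytes 0-11  : three (offset, length) pairs, 2 + 2 bytes each, for the varchar fields
#   bytes 12-19 : salary as an 8-byte integer number of cents (fixed-length attribute)
#   byte  20    : null-value bitmap; bit j set means attribute j is null
#   bytes 21-   : the variable-length data itself
HEADER_SIZE = 3 * 4 + 8 + 1


def encode_record(id_, name, dept_name, salary_cents):
    bitmap = 0
    for j, value in enumerate([id_, name, dept_name, salary_cents]):
        if value is None:
            bitmap |= 1 << j
    pairs, data = b"", b""
    for text in (id_, name, dept_name):
        raw = b"" if text is None else text.encode()
        pairs += struct.pack("<HH", HEADER_SIZE + len(data), len(raw))
        data += raw
    fixed = struct.pack("<qB", salary_cents or 0, bitmap)
    return pairs + fixed + data


def decode_record(buf: bytes):
    pairs = [struct.unpack_from("<HH", buf, 4 * j) for j in range(3)]
    salary_cents, bitmap = struct.unpack_from("<qB", buf, 12)
    values = [buf[off:off + length].decode() for off, length in pairs] + [salary_cents]
    # Attributes whose bitmap bit is set are reported as null.
    return [None if (bitmap >> j) & 1 else v for j, v in enumerate(values)]
```

For ("10101", "Srinivasan", "Comp. Sci.", 6500000), the fixed part is 21 bytes and the whole record is 46 bytes. The (offset, length) pair for the name is (26, 10).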
Variable-Length Records: Slotted Page Structure
• Slotted page header contains:
• number of record entries
• end of free space in the block
• location and size of each record
• Records can be moved around within a page to keep
them contiguous with no empty space between them;
entry in the header must be updated.
• Pointers should not point directly to a record; instead,
they should point to the record's entry in the header.
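To make the slotted-page idea concrete, here is a hypothetical in-memory sketch:
- Records are packed at the end of a `bytearray`.
- The header is modelled as Python attributes: the entry count is `len(slots)`, and `free_end` marks the end of free space.
- Deleting a record compacts the page and updates the affected header entries.
- Slot numbers never change, so a pointer of the form (page, slot) stays valid while records move.

The 4-byte entry size in the free-space check is an assumption.

```python
class SlottedPage:
    """Toy slotted page (hypothetical). slots[s] is (offset, size) of record s,
    or None once deleted; records occupy data[free_end:]."""

    ENTRY_BYTES = 4  # assumed on-disk size of one header entry

    def __init__(self, size: int = 4096) -> None:
        self.data = bytearray(size)
        self.slots = []
        self.free_end = size

    def insert(self, record: bytes) -> int:
        # Header space: one entry per slot plus the entry count and free_end.
        header_bytes = self.ENTRY_BYTES * (len(self.slots) + 2)
        if self.free_end - len(record) < header_bytes:
            raise ValueError("page full")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1

    def read(self, slot: int) -> bytes:
        offset, size = self.slots[slot]
        return bytes(self.data[offset:offset + size])

    def delete(self, slot: int) -> None:
        offset, size = self.slots[slot]
        self.slots[slot] = None
        # Close the hole: shift every record stored before it (lower offsets) up by size.
        self.data[self.free_end + size:offset + size] = self.data[self.free_end:offset]
        for s, entry in enumerate(self.slots):
            if entry is not None and entry[0] < offset:
                self.slots[s] = (entry[0] + size, entry[1])
        self.free_end += size
```

After deleting the middle record of three, the remaining records stay contiguous. Only the header entry of the record that moved changes.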
Organization of Records in Files
• Heap – a record can be placed anywhere in the
file where there is space
• Sequential – store records in sequential order,
based on the value of the search key of each
record
• Hashing – a hash function computed on some
attribute of each record; the result specifies in
which block of the file the record should be
placed
Data Dictionary Storage
The data dictionary (also called the system catalog) stores metadata, that
is, data about data, such as:
• Information about relations
• names of relations
• names, types and lengths of attributes of each relation
• names and definitions of views
• integrity constraints
• User and accounting information, including passwords
• Statistical and descriptive data
• number of tuples in each relation
• Physical file organization information
• how each relation is stored (sequential/hash/…)
• physical location of each relation
• Information about indices
Storage Access
• A database file is partitioned into fixed-length storage units called
blocks. Blocks are units of both storage allocation and data
transfer.
• Database system seeks to minimize the number of block transfers
between the disk and memory. We can reduce the number of disk
accesses by keeping as many blocks as possible in main memory.
• Buffer – portion of main memory available to store copies of disk
blocks.
• Buffer manager – subsystem responsible for allocating buffer space
in main memory.
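The slides do not name a replacement policy. The sketch below assumes least-recently-used (LRU) replacement, a common choice, and ignores writing modified blocks back to disk. `BufferManager` and `get_block` are hypothetical names, and a dictionary stands in for the disk. The sketch counts how many block transfers are avoided by keeping blocks in memory.

```python
from collections import OrderedDict


class BufferManager:
    """Toy buffer manager (hypothetical): keeps up to `capacity` blocks in main
    memory and evicts the least recently used block when space is needed."""

    def __init__(self, disk: dict, capacity: int) -> None:
        self.disk = disk            # block number -> block contents
        self.capacity = capacity
        self.pool = OrderedDict()   # in-memory copies, least recently used first
        self.disk_reads = 0

    def get_block(self, block_no: int):
        if block_no in self.pool:               # hit: no disk access needed
            self.pool.move_to_end(block_no)
            return self.pool[block_no]
        self.disk_reads += 1                    # miss: one block transfer
        if len(self.pool) >= self.capacity:
            self.pool.popitem(last=False)       # evict the LRU block
        self.pool[block_no] = self.disk[block_no]
        return self.pool[block_no]
```

With room for three blocks, the request sequence 0, 1, 2, 0, 1, 3, 0 causes only four disk reads. Block 2 is the one evicted when block 3 arrives.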
DATA INDEXING AND HASHING
Purposes of Data Indexing
• What is Data Indexing?
• A database index is a data structure that improves the speed of data
retrieval operations on a database table at the cost of additional writes
and storage space to maintain the index data structure
• Why is it important?
Concept of File Systems
• Stores and organizes data into computer files.
• Makes it easier to find and access data at any given time.
How Does a DBMS Access Data?
• The operations read, modify, update, and delete are used
to access data in the database.
• The DBMS must first transfer the data temporarily to a buffer
in main memory.
• Data is transferred between disk and main memory
in units called blocks.
Time Factors
• Transferring blocks between disk and main memory is very slow
compared with operations in main memory.
• The time needed to access data depends on the physical storage
device being used.
Physical Storage Devices
• Random access memory: fastest to access, but
most expensive.
• Direct access memory (e.g., magnetic disk): in between in access speed
and cost.
• Sequential access memory (e.g., magnetic tape): slowest to access,
and least expensive.
More Time Factors
• Querying data from a database can take considerable time,
because the DBMS must search among the blocks of the database file
to find matching tuples.
Purpose of Data Indexing
• It is a data structure that is added to a file to provide faster
access to the data.
• It reduces the number of blocks that the DBMS has to
check.
Properties of Data Index
• It contains a search key and a pointer.
• Search key - an attribute or set of attributes that
is used to look up the records in a file.
• Pointer - contains the address of where the data
is stored in memory.
• It can be compared to the card catalog system
used in public libraries of the past.
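The effect of an index on the number of blocks checked can be shown with a small simulation. This is a hypothetical sketch with these assumptions:
- The file is a list of blocks.
- Each index entry maps a search key to a pointer of the form (block number, slot).
- The index itself is already in memory. In a real system, reading the index costs I/O too.

```python
def make_blocks(records, per_block):
    """Split a list of records into fixed-size blocks (the file on disk)."""
    return [records[i:i + per_block] for i in range(0, len(records), per_block)]


def scan(blocks, key):
    """Without an index: read blocks in order until a matching record turns up."""
    blocks_checked = 0
    for block in blocks:
        blocks_checked += 1
        for record in block:
            if record[0] == key:
                return record, blocks_checked
    return None, blocks_checked


def build_index(blocks):
    """One entry per record: search key -> pointer, where the pointer is (block, slot)."""
    return {record[0]: (b, s)
            for b, block in enumerate(blocks)
            for s, record in enumerate(block)}


def indexed_lookup(blocks, index, key):
    """With the index: follow the pointer and read just one data block."""
    if key not in index:
        return None, 0
    b, s = index[key]
    return blocks[b][s], 1
```

Consider 100 records stored 10 per block. Finding key 95 by scanning checks 10 blocks, while the indexed lookup reads one.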
Two Types of Indices
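• Ordered index – based on a sorted ordering of the search-key values.
A primary (clustering) index is an ordered index whose search key also defines
the sequential order of the file; a secondary (non-clustering) index is one whose
search key does not.
• Hash index – based on distributing search-key values uniformly across a
range of buckets; a hash function determines the bucket for each value.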
[Figures: ordered index; hash index]
Choosing Indexing Technique
• Five factors are involved in choosing an indexing
technique:
• access type
• access time
• insertion time
• deletion time
• space overhead
Indexing Definitions
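• Access type - the kinds of access that are supported efficiently, such as
finding records with a specified attribute value or with values in a specified range.
• Access time - time required to locate the data.
• Insertion time - time required to insert new data, including finding the
correct place and updating the index structure.
• Deletion time - time required to delete data, including updating the index structure.
• Space overhead - the additional space occupied
by the index structure.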
Types of Ordered Indices
• Dense index - an index record appears for every search-
key value in the file.
• Sparse index - an index record appears for only some
of the search-key values in the file.
[Figures: dense index; sparse index]
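Here is a minimal sketch contrasting the two index types, assuming the file is sorted on the search key. To keep the focus on the index, each block is simply a list of keys. The sparse index keeps one entry per block, namely the block's smallest key. A lookup finds the last entry whose value is at most the search key and then scans that one block. The function names are hypothetical.

```python
from bisect import bisect_right


def dense_index(blocks):
    """Dense: an entry for every search-key value, pointing to its block."""
    return {key: b for b, block in enumerate(blocks) for key in block}


def sparse_index(blocks):
    """Sparse: one entry per block, holding the smallest key in that block.
    Only valid when the file is sorted on the search key."""
    return [block[0] for block in blocks]


def sparse_lookup(blocks, index, key):
    """Find the last entry whose value is <= key, then scan that one block.
    Returns the block number, or None if the key is not in the file."""
    pos = bisect_right(index, key) - 1
    if pos < 0 or key not in blocks[pos]:
        return None
    return pos
```

With 19 keys in 5 blocks, the dense index has 19 entries while the sparse index has 5. This is the space-versus-lookup-time trade-off described in the next slide.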
Index Choice
• A dense index requires more space overhead and
more memory.
• Records can generally be located faster with a dense index.
• A secondary index must be dense, because the file is not sorted on its
search key; a dense index is also preferable when the index file
is small compared with the size of main memory.
Choosing Multi-Level Index
• In some cases an index may be too large for efficient
processing.
• In that case use multi-level indexing.
• In multi-level indexing, the primary index is treated as a
sequential file and a sparse index is created on it.
• The outer index is a sparse index on the primary index,
whereas the inner index is the primary index itself.
[Figure: multi-level index]
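A two-level index can be sketched as follows:
- The inner (primary) index is a sorted list of (search key, data-block pointer) entries, split into index blocks.
- The outer index is a sparse index holding the first key of each inner index block.
- A lookup reads one outer entry, then one inner index block.

This is an illustrative sketch; the names are hypothetical.

```python
from bisect import bisect_right


def build_two_level(inner_entries, entries_per_block):
    """inner_entries: sorted list of (search key, data block pointer).
    Split the inner index into blocks and build a sparse outer index on it."""
    inner_blocks = [inner_entries[i:i + entries_per_block]
                    for i in range(0, len(inner_entries), entries_per_block)]
    outer = [blk[0][0] for blk in inner_blocks]  # first key of each inner block
    return outer, inner_blocks


def two_level_lookup(outer, inner_blocks, key):
    """Return the pointer of the last inner entry whose key is <= the search key."""
    i = bisect_right(outer, key) - 1
    if i < 0:
        return None
    blk = inner_blocks[i]                 # the only inner-index block read
    keys = [k for k, _ in blk]
    j = bisect_right(keys, key) - 1       # j >= 0 because blk[0][0] <= key
    return blk[j][1]
```

With 100 inner entries and 10 entries per index block, the outer index has only 10 entries. It can therefore stay in memory even when the inner index cannot.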
Hashing
• Bucket: a hash file stores data in buckets. A bucket is a
unit of storage that typically corresponds to one disk block,
which in turn can store one or more records.
• Hash function: a hash function h maps the set of search keys K
to the addresses where the actual records are placed; that is, it is a
function from search keys to bucket addresses.
• Desirable properties of the distribution produced by a hash function:
• Uniform: the hash function assigns each bucket the same
number of search-key values from the set of all possible
search-key values.
• Random: in the average case, each bucket has nearly
the same number of values assigned to it, regardless of
the actual distribution of search-key values.
Types of hashing
• Static hashing: when a search-key value is
provided, the hash function always computes the same
address.
• Dynamic hashing: static hashing does not expand or
shrink dynamically as the database grows or shrinks.
Dynamic hashing provides a mechanism in which buckets
are added and removed dynamically and on demand.
Extendable hashing is one well-known form of dynamic hashing.
Bucket Overflows (Collisions)
• If a bucket does not have enough space for a new record, a bucket
overflow is said to occur.
• Reasons:
• Insufficient buckets
• Skew
• Insufficient buckets: the number of buckets, denoted $n_B$,
must be chosen such that $n_B > n_r / f_r$,
where $n_r$ is the total number of records that will be stored
and $f_r$ is the number of records that fit in a bucket.
• Skew: some buckets are assigned more records than
others, so a bucket may overflow even when other
buckets still have space. Skew can occur for two reasons:
• 1. Multiple records may have the same search key.
• 2. The chosen hash function may distribute search keys nonuniformly.
Solution 1
• To reduce the probability of bucket overflow, the
number of buckets is chosen to be $(n_r / f_r) \times (1 + d)$,
where d is a fudge factor, typically around 0.2.
• Some space is wasted: about 20 percent of the space in
the buckets will be empty.
• The benefit is that the probability of overflow is
reduced.
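The bucket count can be computed with integer arithmetic. In this sketch, d is passed as a percentage, and the result is rounded up to a whole number of buckets. Rounding up is an assumption, since the slide does not say how fractions are handled. The names are hypothetical. If the records exactly fill $n_r / f_r$ buckets, the empty share of the space works out to d / (1 + d). That is about 17 percent for d = 0.2, in line with the slide's rough "about 20 percent".

```python
def buckets_with_fudge(n_r: int, f_r: int, d_percent: int = 20) -> int:
    """Number of buckets (n_r / f_r) * (1 + d), rounded up; d is given in percent."""
    return -(-n_r * (100 + d_percent) // (100 * f_r))  # ceiling division


def expected_empty_fraction(d_percent: int = 20) -> float:
    """Share of bucket space left empty when records exactly fill n_r / f_r buckets."""
    return d_percent / (100 + d_percent)
```

For 10,000 records at 50 records per bucket, this gives 240 buckets instead of the bare minimum of 200.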
Solution 2: Overflow Buckets
• A bucket overflow is also called a collision.
• Solutions:
• Overflow chaining: when a bucket is full, a new overflow
bucket is allocated for the same hash value and linked
after the previous one. This structure is called closed
hashing.
• Linear probing: when the hash function yields a bucket
that is already full, the record is placed in the next bucket
that has free space. This is a form of open hashing: the
set of buckets B is fixed, there are no overflow chains,
and a record that does not fit in its bucket is inserted
into some other bucket of B.
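Overflow chaining can be sketched with a static hash file whose buckets hold at most `capacity` records. When a bucket fills, a new overflow bucket is linked after it. The class and the hash function (the division method, key mod number of buckets) are assumptions for illustration. Each search reports how many buckets it had to read.

```python
class ChainedHashFile:
    """Static hash file with overflow chaining (hypothetical model).
    chains[b] is the primary bucket b followed by its overflow buckets."""

    def __init__(self, n_buckets: int, capacity: int) -> None:
        self.n_buckets = n_buckets
        self.capacity = capacity
        self.chains = [[[]] for _ in range(n_buckets)]

    def h(self, key: int) -> int:
        return key % self.n_buckets            # assumed hash function

    def insert(self, key: int, value) -> None:
        chain = self.chains[self.h(key)]
        if len(chain[-1]) == self.capacity:    # bucket overflow
            chain.append([])                   # link a new overflow bucket
        chain[-1].append((key, value))

    def search(self, key: int):
        """Return (value, buckets_read); every bucket in the chain may be read."""
        reads = 0
        for bucket in self.chains[self.h(key)]:
            reads += 1
            for k, v in bucket:
                if k == key:
                    return v, reads
        return None, reads
```

With 4 buckets of capacity 2, the keys 1, 5, 9 and 13 all hash to bucket 1. Finding 13 therefore reads two buckets, and an unsuccessful search in that chain also reads two.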
Dynamic Hashing
• Most databases grow larger over time.
• For such a database, we have three classes of options:
• 1. Choose a hash function based on the current file size.
This option will result in performance degradation as the
database grows.
• 2. Choose a hash function based on the anticipated size of
the file at some point in the future. Although performance
degradation is avoided, a significant amount of space may
be wasted initially.
• 3. Periodically reorganize the hash structure in
response to file growth. Such a reorganization
involves choosing a new hash function,
recomputing the hash function on every record
in the file, and generating new bucket
assignments.
• This reorganization is a massive, time-consuming
operation.
HASHING EXAMPLE
Example
• Suppose a company with 250 employees assigns a 5-digit
employee number to each employee, which is used as the
primary key in the company’s employee file.
• We could use the employee number as the address of the record in memory.
• The search would then require no comparisons at all.
• Unfortunately, this technique would require space for 100,000
memory locations, whereas only 250 would actually be used.
• So this trade of space for time is not worth the expense.
Hashing
• The general idea of using the key to determine the
address of a record is an excellent one, but it must be
modified so that a great deal of space is not wasted.
• This modification takes the form of a function H from the
set K of keys into the set L of memory addresses.
• Such a function, $H\colon K \to L$, is called a hash function.
• Unfortunately, such a function H may not yield distinct values: it is
possible that two different keys $k_1$ and $k_2$ will yield the same hash
address. This situation is called a collision, and some method must be
used to resolve it.
Hash Functions
• The two principal criteria used in selecting a hash function
$H\colon K \to L$ are as follows:
1. The function H should be very easy and quick
to compute.
2. The function H should, as far as possible,
distribute the hash addresses uniformly throughout
the set L, so that there is a minimum number
of collisions.
Hash Functions
1. Division method: choose a number m larger than the number n of
keys in K (m is usually either a prime number or a number without
small divisors). The hash function H is defined by
$H(k) = k \bmod m$ or $H(k) = (k \bmod m) + 1$,
where $k \bmod m$ denotes the remainder when k is divided by m. The
second formula is used when we want the hash addresses to range
from 1 to m rather than from 0 to $m - 1$.
2. Midsquare method: the key k is squared. Then the hash function H is
defined by $H(k) = l$, where l is obtained by deleting digits from
both ends of $k^2$. The same digit positions of $k^2$ must be used for all keys.
3. Folding method: the key k is partitioned into a number of parts $k_1, k_2, \ldots, k_r$,
and the parts are added together, ignoring the last carry:
$H(k) = k_1 + k_2 + \cdots + k_r$.
Sometimes, for extra “milling”, the even-numbered parts $k_2, k_4, \ldots$ are
each reversed before addition.
Example of Hash Functions
Consider a company with 68 employees that assigns a 4-digit employee
number to each employee. Suppose L consists of 100 two-digit
addresses: 00, 01, 02, …, 99. We apply the above hash functions to each
of the following employee numbers: 3205, 7148, 2345.
1. Division method:
Choose a prime number m close to 99: m = 97.
$H(k) = k \bmod m$: H(3205) = 4, H(7148) = 67, H(2345) = 17.
2. Midsquare method (the fourth and fifth digits of $k^2$, counting from the right, are used):
k:      3205       7148       2345
k^2:    10272025   51093904   5499025
H(k):   72         93         99
3. Folding method: chopping the key k into two parts and adding them yields
the following hash addresses (the carry in 71 + 48 = 119 is ignored):
H(3205) = 32 + 05 = 37, H(7148) = 71 + 48 = 19, H(2345) = 23 + 45 = 68
Or, reversing the second part before adding:
H(3205) = 32 + 50 = 82, H(7148) = 71 + 84 = 55, H(2345) = 23 + 54 = 77
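The three hash functions in this example can be written out and checked directly. The sketch fixes the parameters the example uses: m = 97 for division, the fourth and fifth digits from the right for midsquare, and two-digit parts for folding. It covers 4-digit keys only, and the function names are hypothetical.

```python
def division_hash(k: int, m: int = 97) -> int:
    """Division method: H(k) = k mod m, giving addresses 0 .. m-1."""
    return k % m


def midsquare_hash(k: int) -> int:
    """Midsquare method for two-digit addresses: square k and keep the 4th and
    5th digits counting from the right (the same positions for every key)."""
    return (k * k // 1000) % 100


def folding_hash(k: int, reverse_even: bool = False) -> int:
    """Folding method for 4-digit keys: split k into two 2-digit parts,
    optionally reverse the second part, add, and drop the final carry."""
    k1, k2 = divmod(k, 100)
    if reverse_even:
        k2 = int(f"{k2:02d}"[::-1])   # e.g. 05 -> 50, 48 -> 84
    return (k1 + k2) % 100
```

Applied to 3205, 7148 and 2345, these functions reproduce the addresses in the worked example above.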
Collision Resolution
• Suppose we want to add a new record R with key k to our file F, but
the memory location H(k) is already occupied. This
situation is called a collision.
• There are two general ways to resolve collisions:
• Open addressing (array method)
• Separate chaining (linked-list method)
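Both collision-resolution methods can be sketched with the division-method hash $H(k) = k \bmod m$ from the example. In open addressing with linear probing, each address holds one key, and a collision tries H(k) + 1, H(k) + 2, and so on, wrapping around. In separate chaining, each address holds a list of keys; a Python list stands in for the linked list. The classes are hypothetical and support no deletions. The data-structures literature often calls separate chaining "open hashing" and open addressing "closed hashing", which is the reverse of the bucket terminology used earlier in this chapter.

```python
class LinearProbingTable:
    """Open addressing (array method): one key per address; on a collision,
    probe H(k) + 1, H(k) + 2, ... (mod m) until an empty address is found."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.slots = [None] * m

    def insert(self, key: int) -> int:
        start = key % self.m
        for i in range(self.m):
            pos = (start + i) % self.m
            if self.slots[pos] is None or self.slots[pos] == key:
                self.slots[pos] = key
                return pos
        raise OverflowError("table is full")

    def search(self, key: int):
        start = key % self.m
        for i in range(self.m):
            pos = (start + i) % self.m
            if self.slots[pos] is None:
                return None              # an empty address ends the probe sequence
            if self.slots[pos] == key:
                return pos
        return None


class ChainedTable:
    """Separate chaining (linked-list method): each address holds a list of keys."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.chains = [[] for _ in range(m)]

    def insert(self, key: int) -> int:
        chain = self.chains[key % self.m]
        if key not in chain:
            chain.append(key)
        return key % self.m

    def search(self, key: int):
        return key % self.m if key in self.chains[key % self.m] else None
```

For instance, 3205 and 3302 both hash to address 4 when m = 97. Linear probing stores 3302 at address 5, whereas chaining keeps both keys in the list at address 4.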

Weitere ähnliche Inhalte

Ähnlich wie Data Indexing Presentation-My.pptppt.ppt

overview of storage and indexing BY-Pratik kadam
overview of storage and indexing BY-Pratik kadam overview of storage and indexing BY-Pratik kadam
overview of storage and indexing BY-Pratik kadam pratikkadam78
 
File Structure.pptx
File Structure.pptxFile Structure.pptx
File Structure.pptxzedd15
 
2.7 use of ict in data management
2.7 use of ict in data management2.7 use of ict in data management
2.7 use of ict in data managementHaa'Meem Mohiyuddin
 
UNIT7-FileMgmt.pptx
UNIT7-FileMgmt.pptxUNIT7-FileMgmt.pptx
UNIT7-FileMgmt.pptxNavyaKumar22
 
Csci12 report aug18
Csci12 report aug18Csci12 report aug18
Csci12 report aug18karenostil
 
File organization
File organizationFile organization
File organizationGokul017
 
files,indexing,hashing,linear and non linear hashing
files,indexing,hashing,linear and non linear hashingfiles,indexing,hashing,linear and non linear hashing
files,indexing,hashing,linear and non linear hashingRohit Kumar
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiridatastack
 
409793049-Storage-Virtualization-pptx.pptx
409793049-Storage-Virtualization-pptx.pptx409793049-Storage-Virtualization-pptx.pptx
409793049-Storage-Virtualization-pptx.pptxson2483
 
19IS305_U4_LP10_LM10-22-23.pdf
19IS305_U4_LP10_LM10-22-23.pdf19IS305_U4_LP10_LM10-22-23.pdf
19IS305_U4_LP10_LM10-22-23.pdfJESUNPK
 
File system and Deadlocks
File system and DeadlocksFile system and Deadlocks
File system and DeadlocksRohit Jain
 
Fundamental file structure concepts & managing files of records
Fundamental file structure concepts & managing files of recordsFundamental file structure concepts & managing files of records
Fundamental file structure concepts & managing files of recordsDevyani Vaidya
 
cache memory introduction, level, function
cache memory introduction, level, functioncache memory introduction, level, function
cache memory introduction, level, functionTeddyIswahyudi1
 
Elasticsearch tuning
Elasticsearch tuningElasticsearch tuning
Elasticsearch tuningNIKHIL DUBEY
 
FIle Organization.pptx
FIle Organization.pptxFIle Organization.pptx
FIle Organization.pptxSreenivas R
 

Ähnlich wie Data Indexing Presentation-My.pptppt.ppt (20)

overview of storage and indexing BY-Pratik kadam
overview of storage and indexing BY-Pratik kadam overview of storage and indexing BY-Pratik kadam
overview of storage and indexing BY-Pratik kadam
 
File Structure.pptx
File Structure.pptxFile Structure.pptx
File Structure.pptx
 
1650607.ppt
1650607.ppt1650607.ppt
1650607.ppt
 
2.7 use of ict in data management
2.7 use of ict in data management2.7 use of ict in data management
2.7 use of ict in data management
 
UNIT7-FileMgmt.pptx
UNIT7-FileMgmt.pptxUNIT7-FileMgmt.pptx
UNIT7-FileMgmt.pptx
 
Csci12 report aug18
Csci12 report aug18Csci12 report aug18
Csci12 report aug18
 
File organization
File organizationFile organization
File organization
 
files,indexing,hashing,linear and non linear hashing
files,indexing,hashing,linear and non linear hashingfiles,indexing,hashing,linear and non linear hashing
files,indexing,hashing,linear and non linear hashing
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
409793049-Storage-Virtualization-pptx.pptx
409793049-Storage-Virtualization-pptx.pptx409793049-Storage-Virtualization-pptx.pptx
409793049-Storage-Virtualization-pptx.pptx
 
19IS305_U4_LP10_LM10-22-23.pdf
19IS305_U4_LP10_LM10-22-23.pdf19IS305_U4_LP10_LM10-22-23.pdf
19IS305_U4_LP10_LM10-22-23.pdf
 
Rdbms
RdbmsRdbms
Rdbms
 
File organization
File organizationFile organization
File organization
 
File system and Deadlocks
File system and DeadlocksFile system and Deadlocks
File system and Deadlocks
 
Fundamental file structure concepts & managing files of records
Fundamental file structure concepts & managing files of recordsFundamental file structure concepts & managing files of records
Fundamental file structure concepts & managing files of records
 
cache memory introduction, level, function
cache memory introduction, level, functioncache memory introduction, level, function
cache memory introduction, level, function
 
Elasticsearch tuning
Elasticsearch tuningElasticsearch tuning
Elasticsearch tuning
 
virtual_memory (3).pptx
virtual_memory (3).pptxvirtual_memory (3).pptx
virtual_memory (3).pptx
 
OS Unit5.pptx
OS Unit5.pptxOS Unit5.pptx
OS Unit5.pptx
 
FIle Organization.pptx
FIle Organization.pptxFIle Organization.pptx
FIle Organization.pptx
 

Kürzlich hochgeladen

Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 

Kürzlich hochgeladen (20)

Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 

Data Indexing Presentation-My.pptppt.ppt

  • 1. CHAPTER 10: STORAGE AND FILE STRUCTURE
  • 3. Magnetic Hard Disk Mechanism NOTE: Diagram is schematic, and simplifies the structure of actual disk drives
  • 4. Performance Measures of Disks • Access time – the time it takes from when a read or write request is issued to when data transfer begins. Consists of: • Seek time – time it takes to reposition the arm over the correct track. • 4 to 10 milliseconds on typical disks • Rotational latency – time it takes for the sector to be accessed to appear under the head. • 4 to 11 milliseconds on typical disks (5400 to 15000 r.p.m.) • Data-transfer rate – the rate at which data can be retrieved from or stored to the disk. • 25 to 100 MB per second max rate, lower for inner tracks
  • 6. File Organization • The database is stored as a collection of files. Each file is a sequence of records. A record is a sequence of fields. • We first consider fixed length records, then extend to variable length records.
  • 7. Fixed-Length Records • Simple approach: • Store record i starting from byte n  (i – 1), where n is the size of each record. • Record access is simple but records may cross blocks • Modification: do not allow records to cross block boundaries • Deletion of record i: alternatives: • move records i + 1, . . ., n to i, . . . , n – 1 • move record n to i • do not move records, but link all free records on a free list
  • 9. Free Lists • Store the address of the first deleted record in the file header. • Use this first record to store the address of the second deleted record, and so on • Can think of these stored addresses as pointers since they “point” to the location of a record. • More space efficient representation: reuse space for normal attributes of free records to store pointers. (No pointers stored in in-use records.)
  • 10. Variable-Length Records • Variable-length records arise in database systems in several ways: • Storage of multiple record types in a file. • Record types that allow variable lengths for one or more fields such as strings (varchar) • Record types that allow repeating fields (used in some older data models). • Attributes are stored in order • Variable length attributes represented by fixed size (offset, length), with actual data stored after all fixed length attributes • Null values represented by null-value bitmap
  • 11. Variable-Length Records: Slotted Page Structure • Slotted page header contains: • number of record entries • end of free space in the block • location and size of each record • Records can be moved around within a page to keep them contiguous with no empty space between them; entry in the header must be updated. • Pointers should not point directly to record — instead they should point to the entry for the record in header.
  • 12. Organization of Records in Files • Heap – a record can be placed anywhere in the file where there is space • Sequential – store records in sequential order, based on the value of the search key of each record • Hashing – a hash function computed on some attribute of each record; the result specifies in which block of the file the record should be placed
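The three organizations differ only in how the target location is chosen. In this minimal sketch a file is a list of blocks (lists), keys are integers, and the hash function h(k) = k mod n is an assumption; the function names are hypothetical.

```python
import bisect


def heap_insert(blocks, record, capacity):
    """Heap: place the record in any block with space (here, the first found)."""
    for block in blocks:
        if len(block) < capacity:
            block.append(record)
            return
    blocks.append([record])  # no space anywhere: allocate a new block


def sequential_insert(records, record, key):
    """Sequential: keep the file ordered on the search key."""
    keys = [key(r) for r in records]
    records.insert(bisect.bisect_right(keys, key(record)), record)


def hash_block(search_key, n_blocks):
    """Hashing: a hash of the search key picks the block (assumed h = k mod n)."""
    return search_key % n_blocks
```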
  • 13. Data Dictionary Storage • The data dictionary (also called the system catalog) stores metadata, that is, data about data, such as: • Information about relations • names of relations • names, types and lengths of attributes of each relation • names and definitions of views • integrity constraints • User and accounting information, including passwords • Statistical and descriptive data • number of tuples in each relation • Physical file organization information • how the relation is stored (sequential/hash/…) • physical location of the relation • Information about indices
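SQLite is one concrete system whose catalog can be queried with ordinary SQL. This sketch uses Python's standard `sqlite3` module, SQLite's `sqlite_master` catalog table and `PRAGMA table_info`; the table, view and index names are illustrative, and other systems expose their catalogs differently.

```python
import sqlite3


def demo_catalog():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instructor (ID varchar(5) PRIMARY KEY, "
                 "name varchar(20), dept_name varchar(20), salary numeric(8,2))")
    conn.execute("CREATE VIEW physics AS "
                 "SELECT * FROM instructor WHERE dept_name = 'Physics'")
    conn.execute("CREATE INDEX instructor_dept ON instructor(dept_name)")
    return conn


def catalog_entries(conn):
    """Relations, views and indices recorded in SQLite's catalog table."""
    return conn.execute("SELECT type, name, tbl_name FROM sqlite_master").fetchall()


def attribute_info(conn, table):
    """(name, declared type) of each attribute; `table` must be a trusted name."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [(row[1], row[2]) for row in rows]
```

The catalog rows cover relation names, view definitions and index information from the list above; SQLite also stores the original `CREATE` statement in the `sql` column of `sqlite_master`.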
  • 14. Storage Access • A database file is partitioned into fixed-length storage units called blocks. Blocks are units of both storage allocation and data transfer. • The database system seeks to minimize the number of block transfers between the disk and memory. We can reduce the number of disk accesses by keeping as many blocks as possible in main memory. • Buffer – portion of main memory available to store copies of disk blocks. • Buffer manager – subsystem responsible for allocating buffer space in main memory.
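A buffer manager can be sketched with a fixed number of frames and a count of disk transfers. The least-recently-used (LRU) replacement policy is an assumption, since the slides do not name one; the "disk" is a plain dict and all names are hypothetical.

```python
from collections import OrderedDict


class BufferManager:
    """Sketch of a buffer manager using LRU replacement (an assumed policy)."""

    def __init__(self, disk, capacity):
        self.disk = disk                  # block id -> contents (stands in for disk)
        self.capacity = capacity          # number of buffer frames in main memory
        self.frames = OrderedDict()       # block id -> contents, least recent first
        self.dirty = set()                # blocks modified since they were read
        self.reads = self.writes = 0      # block transfers between disk and memory

    def get_block(self, block_id):
        if block_id in self.frames:
            self.frames.move_to_end(block_id)   # hit: no disk access needed
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:
            victim, contents = self.frames.popitem(last=False)  # evict LRU block
            if victim in self.dirty:            # write back only if modified
                self.disk[victim] = contents
                self.dirty.discard(victim)
                self.writes += 1
        self.frames[block_id] = self.disk[block_id]
        self.reads += 1
        return self.frames[block_id]

    def mark_dirty(self, block_id):
        self.dirty.add(block_id)
```

With two frames, the request sequence 0, 1, 0, 2, 0, 1 costs four block reads instead of six, because the two repeated requests for block 0 are served from the buffer.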
  • 16. Purposes of Data Indexing • What is Data Indexing? • A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure • Why is it important?
  • 17. Concept of File Systems • Stores and organizes data into computer files. • Makes it easier to find and access data at any given time.
  • 18. How Does a DBMS Access Data? • The operations read, modify, update, and delete are used to access data in the database. • The DBMS must first transfer the data temporarily to a buffer in main memory. • Data is transferred between disk and main memory in units called blocks.
  • 19. Time Factors • Transferring blocks of data between disk and main memory is a very slow operation. • The time needed to access data depends on the physical storage device being used.
  • 20. Physical Storage Devices • Random-access memory – fastest to access, but most expensive. • Direct-access storage (e.g., magnetic disk) – in between in both access speed and cost. • Sequential-access storage (e.g., magnetic tape) – slowest to access, and least expensive.
  • 21. More Time Factors • Querying data out of a database requires even more time. • The DBMS must search among the blocks of the database file to look for matching tuples.
  • 22. Purpose of Data Indexing • It is a data structure that is added to a file to provide faster access to the data. • It reduces the number of blocks that the DBMS has to check.
  • 23. Properties of Data Index • An index entry contains a search key and a pointer. • Search key - an attribute or set of attributes used to look up records in a file. • Pointer - the address where the record is stored, typically the identifier of a disk block plus an offset within that block. • An index can be compared to the card catalog system used in public libraries of the past.
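  • 24. Two Types of Indices • Ordered index – based on a sorted ordering of the search-key values. • Hash index – based on a uniform distribution of values across a range of buckets; a hash function determines the bucket to which a value is assigned. • Primary (clustering) versus secondary (non-clustering) is a separate distinction: a clustering index is one whose search key also defines the sequential order of the file, while a secondary index has a search key that specifies an order different from the sequential order of the file.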
  • 27. Choosing Indexing Technique • Five factors are involved in choosing an indexing technique: • access type • access time • insertion time • deletion time • space overhead
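  • 28. Indexing Definitions • Access type - the types of access that are supported efficiently, e.g., finding records with a specified attribute value, or records whose attribute values fall in a specified range. • Access time - time required to locate the data. • Insertion time - time required to insert new data, including the time to update the index structure. • Deletion time - time required to delete data, including the time to update the index structure. • Space overhead - the additional space occupied by the added index structure.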
  • 29. Types of Ordered Indices • Dense index - an index record appears for every search-key value in the file. • Sparse index - index records appear for only some of the search-key values in the file.
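The two kinds of ordered index can be compared on a toy sequential file. In this sketch records are reduced to their integer search keys, each block holds three records, and the sparse index keeps one entry per block (the smallest key in it); these choices are assumptions for illustration.

```python
import bisect

# Hypothetical sequential file sorted on the search key, 3 records per block.
blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]

# Dense index: one entry per search-key value -> (block number, slot in block).
dense = {key: (b, s) for b, block in enumerate(blocks) for s, key in enumerate(block)}

# Sparse index: one entry per block, holding the smallest key in that block.
sparse_keys = [block[0] for block in blocks]


def sparse_lookup(key):
    """Find the last index entry with key <= search key, then scan that block."""
    b = bisect.bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None            # smaller than every key in the file
    block = blocks[b]
    return (b, block.index(key)) if key in block else None
```

The dense index answers a lookup directly but needs nine entries; the sparse index needs only three but must scan the chosen block, which is the space/time trade-off described next.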
  • 32. Index Choice • A dense index requires more space overhead and more memory. • Data can generally be located faster with a dense index. • A dense index must be used when the index is a secondary index, and it is preferable when the index file is small compared to the size of memory.
  • 33. Choosing Multi-Level Index • In some cases an index may be too large for efficient processing (for example, too large to keep in main memory). • In that case, use multi-level indexing. • In multi-level indexing, the primary index is treated as a sequential file and a sparse index is created on it. • The outer index is a sparse index of the primary index, whereas the inner index is the primary index.
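A two-level index can be sketched by building a sparse index over the data blocks, storing that inner index in blocks of its own, and building a second sparse index (the outer index) over those. The sizes here (100 keys, 4 entries per block) and all names are hypothetical.

```python
import bisect


def chunk(entries, size):
    """Split a sorted list into blocks of at most `size` entries."""
    return [entries[i:i + size] for i in range(0, len(entries), size)]


def build_sparse(blocks):
    """One entry (smallest key, block number) per block."""
    return [(block[0], b) for b, block in enumerate(blocks)]


def floor_entry(entries, key):
    """Pointer of the last entry whose key is <= key, or None."""
    keys = [k for k, _ in entries]
    i = bisect.bisect_right(keys, key) - 1
    return entries[i][1] if i >= 0 else None


# Hypothetical data file: keys 0, 10, ..., 990 in blocks of 4 records.
data_blocks = chunk(list(range(0, 1000, 10)), 4)          # 25 data blocks
inner_blocks = chunk(build_sparse(data_blocks), 4)        # inner index, 7 blocks
outer = build_sparse([[k for k, _ in blk] for blk in inner_blocks])  # in memory


def lookup(key):
    ib = floor_entry(outer, key)              # 1. search the outer index
    if ib is None:
        return None
    db = floor_entry(inner_blocks[ib], key)   # 2. read one inner-index block
    block = data_blocks[db]                   # 3. read one data block
    return (db, block.index(key)) if key in block else None
```

Only the seven-entry outer index has to stay in memory; each lookup then reads one inner-index block and one data block.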
  • 35. Hashing • Bucket − A hash file stores data in bucket format. A bucket is considered a unit of storage; it typically stores one complete disk block, which in turn can store one or more records. • Hash Function − A hash function, h, is a mapping function that maps the set K of all search keys to the addresses where the actual records are placed. It is a function from search keys to bucket addresses. • Desirable hash function properties • Uniform • Random
  • 36. • Uniform: the hash function assigns each bucket the same number of search-key values from the set of all possible search-key values. • Random: in the average case, each bucket will have nearly the same number of values assigned to it, regardless of the actual distribution of search-key values.
  • 38. Types of Hashing • Static hashing - when a search-key value is provided, the hash function always computes the same address, and the number of buckets is fixed. • Dynamic hashing - the problem with static hashing is that it does not expand or shrink dynamically as the size of the database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets are added and removed dynamically and on demand. Extendable hashing is one form of dynamic hashing.
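A static hash file keeps a fixed set of buckets, so the same key always leads to the same bucket. This minimal sketch assumes integer search keys and h(k) = k mod nB, ignores bucket capacity, and uses a hypothetical class name.

```python
class StaticHashFile:
    """Static hashing sketch: the number of buckets never changes.
    Assumptions: integer search keys and h(k) = k mod n_buckets."""

    def __init__(self, n_buckets):
        self.buckets = [[] for _ in range(n_buckets)]

    def h(self, key):
        return key % len(self.buckets)

    def insert(self, key, record):
        self.buckets[self.h(key)].append((key, record))

    def lookup(self, key):
        # The same key always maps to the same bucket, so only it is searched.
        return [rec for k, rec in self.buckets[self.h(key)] if k == key]
```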
  • 39. Bucket Overflows (Collisions) • If a bucket does not have enough space for a new record, a bucket overflow is said to occur. • Reasons: insufficient buckets; skew
  • 40. • Insufficient buckets. The number of buckets, which we denote nB, must be chosen such that nB > nr / fr, where nr denotes the total number of records that will be stored and fr denotes the number of records that will fit in a bucket.
  • 41. • Skew. Some buckets are assigned more records than others, so a bucket may overflow even when other buckets still have space. Skew can occur for two reasons: • 1. Multiple records may have the same search key. • 2. The chosen hash function may result in a nonuniform distribution of search keys.
  • 42. Solution 1 • To reduce the probability of bucket overflow, the number of buckets is chosen to be (nr / fr) × (1 + d), where d is a fudge factor, typically around 0.2. • Some space is wasted: about 20 percent of the space in the buckets will be empty. • The benefit is that the probability of overflow is reduced.
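The bucket-count rules from the last few slides, nB > nr / fr and (nr / fr) × (1 + d), can be checked numerically. In this sketch the function names are hypothetical, and exact fractions are used so that float rounding cannot change the ceiling.

```python
import math
from fractions import Fraction


def min_buckets(n_r: int, f_r: int) -> int:
    """Smallest whole number n_B with n_B > n_r / f_r."""
    return n_r // f_r + 1


def buckets_with_fudge(n_r: int, f_r: int, d: Fraction = Fraction(1, 5)) -> int:
    """(n_r / f_r) * (1 + d), rounded up to a whole number of buckets."""
    return math.ceil(Fraction(n_r, f_r) * (1 + d))


def empty_fraction(n_r: int, f_r: int, n_b: int) -> Fraction:
    """Share of total bucket space left empty when all n_r records are stored."""
    return 1 - Fraction(n_r, f_r * n_b)
```

For example, 10,000 records at 20 per bucket need at least 501 buckets; with d = 0.2 the file gets 600 buckets, and one sixth (about 17 percent) of the bucket space stays empty, in line with the slide's "about 20 percent".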
  • 43. Overflow Buckets – Solution 2 • Bucket overflow is also referred to as a collision. • Solutions: • Overflow chaining − When a bucket is full, a new overflow bucket is allocated for the same hash result and linked after the previous one. This mechanism is called closed hashing. • Linear probing − When a hash function generates an address at which data is already stored, the next free bucket is allocated to it. This mechanism is called open hashing.
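Overflow chaining can be sketched with buckets of fixed capacity that link to overflow buckets when full. The assumptions are integer keys, h(k) = k mod nB and a capacity given in records; the class names are hypothetical.

```python
class Bucket:
    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.overflow = None        # link to the next overflow bucket in the chain


class ChainedHashFile:
    """Overflow chaining (closed hashing in the slides' terminology).
    Assumptions: integer keys, h(k) = k mod n_buckets, fixed capacity."""

    def __init__(self, n_buckets, capacity):
        self.capacity = capacity
        self.buckets = [Bucket(capacity) for _ in range(n_buckets)]

    def insert(self, key):
        b = self.buckets[key % len(self.buckets)]
        while len(b.records) >= b.capacity:       # walk the chain to free space
            if b.overflow is None:
                b.overflow = Bucket(self.capacity)  # allocate and link a new bucket
            b = b.overflow
        b.records.append(key)

    def chain_length(self, bucket_no):
        """Number of buckets (primary plus overflow) for one hash value."""
        n, b = 0, self.buckets[bucket_no]
        while b is not None:
            n, b = n + 1, b.overflow
        return n

    def contains(self, key):
        b = self.buckets[key % len(self.buckets)]
        while b is not None:                      # search the whole chain
            if key in b.records:
                return True
            b = b.overflow
        return False
```

With four buckets of capacity 2, the keys 1, 5, 9 and 13 all hash to bucket 1, so that bucket gains one overflow bucket while the other buckets are unaffected.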
  • 44. • The form of hash structure that we have just described is sometimes referred to as closed hashing.
  • 45. • Under an alternative approach, called open hashing, the set of buckets is fixed, and there are no overflow chains. Instead, if a bucket is full, the system inserts records in some other bucket in the initial set of buckets B.
  • 46. Dynamic Hashing • Most databases grow larger over time.
  • 47. • For such a database, there are three classes of options: • 1. Choose a hash function based on the current file size. This option will result in performance degradation as the database grows.
  • 48. • 2. Choose a hash function based on the anticipated size of the file at some point in the future. Although performance degradation is avoided, a significant amount of space may be wasted initially.
  • 49. • 3. Periodically reorganize the hash structure in response to file growth. Such a reorganization involves choosing a new hash function, recomputing the hash function on every record in the file, and generating new bucket assignments. • This reorganization is a massive, time-consuming operation.
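Option 3 can be sketched as a static hash file that is rebuilt whenever it gets too full. The trigger (average records per bucket above `max_load`), the doubling of the bucket count, and h(k) = k mod nB are all assumptions; the counter shows how many records each reorganization has to rehash.

```python
class ReorganizingHashFile:
    """Periodically reorganized hash file (assumed policy: double the number
    of buckets when the average load exceeds max_load)."""

    def __init__(self, n_buckets=4, max_load=2):
        self.buckets = [[] for _ in range(n_buckets)]
        self.max_load = max_load
        self.count = 0
        self.records_rehashed = 0     # total cost of all reorganizations so far

    def insert(self, key):
        self.buckets[key % len(self.buckets)].append(key)
        self.count += 1
        if self.count / len(self.buckets) > self.max_load:
            self._reorganize(2 * len(self.buckets))

    def _reorganize(self, new_n):
        old = self.buckets
        self.buckets = [[] for _ in range(new_n)]   # new hash function: k mod new_n
        for bucket in old:
            for key in bucket:                       # recompute h on EVERY record
                self.buckets[key % new_n].append(key)
                self.records_rehashed += 1
```

Inserting 17 keys into a 4-bucket file triggers two reorganizations (at the 9th and 17th insert) that together rehash 26 records; this whole-file rehashing is why the slide calls the operation massive and time-consuming.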
  • 51. Example • Suppose a company with 250 employees assigns a 5-digit employee number to each employee, which is used as the primary key in the company’s employee file. • We can use the employee number as the address of the record in memory. • The search will then require no comparisons at all. • Unfortunately, this technique will require space for 100,000 memory locations, whereas only 250 of them would actually be used. • So this trade-off of space for time is not worth the expense.
  • 52. Hashing • The general idea of using the key to determine the address of a record is an excellent idea, but it must be modified so that a great deal of space is not wasted. • This modification takes the form of a function H from the set K of keys into the set L of memory addresses. • Such a function, H: K → L, is called a hash function. • Unfortunately, such a function H may not yield distinct values: it is possible that two different keys k1 and k2 will yield the same hash address. This situation is called a collision, and some method must be used to resolve it.
  • 53. Hash Functions • The two principal criteria used in selecting a hash function H: K → L are as follows: 1. The function H should be very easy and quick to compute. 2. The function H should, as far as possible, distribute the hash addresses uniformly throughout the set L so that there is a minimum number of collisions.
  • 54. Hash Functions 1. Division method: choose a number m larger than the number n of keys in K (m is usually either a prime number or a number without small divisors). The hash function H is defined by H(k) = k (mod m) or H(k) = k (mod m) + 1, where k (mod m) denotes the remainder when k is divided by m. The second formula is used when we want hash addresses to range from 1 to m rather than from 0 to m − 1. 2. Midsquare method: the key k is squared. Then the hash function H is defined by H(k) = l, where l is obtained by deleting digits from both ends of k^2. The same digit positions of k^2 must be used for all keys. 3. Folding method: the key k is partitioned into a number of parts, k1, k2, …, kr, where each part, except possibly the last, has the same number of digits as the required address. The parts are added together, ignoring the last carry: H(k) = k1 + k2 + … + kr. Sometimes, for extra “milling”, the even-numbered parts, k2, k4, …, are each reversed before addition.
  • 55. Example of Hash Functions Consider a company with 68 employees that assigns a 4-digit employee number to each employee. Suppose L consists of 100 two-digit addresses: 00, 01, 02, …, 99. We apply the above hash functions to each of the following employee numbers: 3205, 7148, 2345. 1. Division method: choose a prime number m close to 99, m = 97. H(k) = k (mod m): H(3205) = 4, H(7148) = 67, H(2345) = 17. 2. Midsquare method (the 4th and 5th digits of k^2, counting from the right, are used): k = 3205, 7148, 2345; k^2 = 10272025, 51093904, 5499025; H(k) = 72, 93, 99. 3. Folding method: chopping the key k into two parts and adding yields the following hash addresses: H(3205) = 32 + 05 = 37, H(7148) = 71 + 48 = 19 (the leading carry of 119 is ignored), H(2345) = 23 + 45 = 68. Or, reversing the second part: H(3205) = 32 + 50 = 82, H(7148) = 71 + 84 = 55, H(2345) = 23 + 54 = 77.
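The three hash functions can be written directly from their definitions and checked against the worked example. This sketch assumes keys are exactly four digits with no leading zeros, two-digit addresses, and (for midsquare) the 4th and 5th digits from the right; the function names are hypothetical.

```python
def division(k, m=97):
    """Division method: H(k) = k mod m (add 1 for addresses 1..m)."""
    return k % m


def midsquare(k):
    """Midsquare method: keep the 4th and 5th digits of k*k, from the right."""
    return (k * k // 1000) % 100     # drop 3 rightmost digits, keep the next 2


def folding(k, part_digits=2, reverse_even=False):
    """Folding method: split k into parts, optionally reverse k2, k4, ..., add."""
    s = str(k)                       # assumes no leading zeros in the key
    parts = [s[i:i + part_digits] for i in range(0, len(s), part_digits)]
    if reverse_even:
        parts = [p[::-1] if i % 2 == 1 else p for i, p in enumerate(parts)]
    return sum(int(p) for p in parts) % 10 ** part_digits  # ignore the last carry
```

Applied to 3205, 7148 and 2345, these reproduce the addresses in the example: 4, 67, 17 (division); 72, 93, 99 (midsquare); 37, 19, 68 (folding); and 82, 55, 77 (folding with reversal).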
  • 56. Collision Resolution • Suppose we want to add a new record R with key k to our file F, but suppose the memory location address H(k) is already occupied. This situation is called a collision. • There are two general ways to resolve collisions: • Open addressing (array method) • Separate chaining (linked-list method)
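The two collision-resolution methods can be sketched side by side. This sketch assumes a table of 100 addresses (as in the example above) with the division method H(k) = k mod 97, and uses linear probing as the open-addressing strategy; all names are hypothetical.

```python
TABLE_SIZE = 100          # addresses 00..99, as in the example above


def h(k):
    return k % 97         # division method with m = 97


def linear_probe_insert(table, key):
    """Open addressing: start at H(k) and take the next free slot, wrapping."""
    for step in range(len(table)):
        slot = (h(key) + step) % len(table)
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("table full")


def linear_probe_search(table, key):
    for step in range(len(table)):
        slot = (h(key) + step) % len(table)
        if table[slot] is None:
            return None   # reached an empty slot: the key is absent
        if table[slot] == key:
            return slot
    return None


def chain_insert(chains, key):
    """Separate chaining: each address holds a list of all keys hashing there."""
    chains[h(key)].append(key)
```

Keys 3205 and 3302 both hash to address 4. Open addressing stores 3302 in slot 5, while chaining keeps both keys in the list at address 4. Deleting from a linearly probed table needs extra care (an emptied slot would cut off later probe sequences), which this sketch does not handle.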