SlideShare ist ein Scribd-Unternehmen logo
1 von 68
File Organization and
Indexing
PRESENTED BY :Raveena
FILE ORGANIZATION
is a method of arranging data on secondary
storage devices and addressing them such that it
facilitates storage and read/write operations of
data or information requested by the user.
A record is a collection of logically related fields
or data items. Each data item is formed of one or
more bytes. Record usually describes entities
and their attributes.
File is a collection of related sequence of
records.
2
3
File Organization
 The physical arrangement of data in a file into records and pages on
the disk
 File organization determines the set of access methods for
 Storing and retrieving records from a file
 Therefore, ‘file organization’ synonymous with ‘access method’
 We study three types of file organization
 Unordered or Heap files
 Ordered or sequential files
 Hash files
 We examine each of them in terms of the operations we perform on
the database
 Insert a new record
 Search for a record (or update a record)
 Delete a record
File Indexing
 Database - is an organized collection of logically
related data.
 The operations (read, modify, update, and
delete) access data from the database.
 DBMS must first transfer the data temporarily
from secondary memory to a buffer in main
memory.
5
Need for indexing:
 Data is transferred between disk and main
memory into units called blocks.
 The transferring of data from secondary memory
to main memory is a very slow operation.
6
Need for indexing:
 An index is a data structure, which is maintained
to determine the location of records in a file.
The Index Table :
 The index table contains a search key and a
pointer.
 Search key - an attribute or set of attributes that is
used to look up the records in a file.
 Pointer - contains the address of where the record
is stored in the memory.
7
Index :
 Each index entry corresponds to a record on
the secondary storage file.
To retrieve a record from the disk-
 First the index is searched for the address of
the record in the index table
 Then the record is read from that address.
 It is just like searching for a book in a library
catalogue.
8
 An index a data structure that is added to
a file to provide faster access to the data.
 It reduces the number of blocks that the
DBMS has to check.
9
 If there is a table similar to this:
CREATE TABLE test1
( id integer,
content varchar2(30) );
 The application requires a lot of queries of the form
SELECT content FROM test1 WHERE id =
constant;
 Ordinarily, the system would have to scan the entire
test1 table row by row to find all matching entries.
10
Example:
 SYNTAX:
CREATE INDEX <index-name>
ON <table-name>
(<column-name> [ASC|DESC],
<column-name> [ASC|DESC]…);
11
Creating Index :
ID CONTENT …..
1 ABC …..
2 DEF …..
3 PQR …..
4 XYZ …..
12
1
2
3
4
INDEX
TABLE
Create the index on the id column:
CREATE INDEX testid_index ON test1
(id ASC);
FILE ORGANIZATION
A file can be of 2 formats:
 Fixed length records
 Variable length records
13
FIXED LENGTH RECORDS
 All records on the page are of the same slot length i.e.
every record in the file has exactly the same size(in bytes).
 Record slots are uniform and records are arranged
consecutively within a page.
type deposit=record
account_no:char(10);
branch_name:char(22);
balance:numeral(12,2);
end
Assume char occupies 1 byte and numeral stores 8 bytes
.So first 40 bytes are stored for first record and next 40
bytes for the second record and so on.
14
FIXED LENGTH RECORDS
ACCOUNT_NO BRANCH_NAME BALANCE
Record 0-> A-101 AB 100
Record 1-> A-102 CD 400
Record 2-> A-567 BC 200
Record 3-> A-92 AT 900
Record 4-> A-45 TC 300
15
FIXED LENGTH RECORDS
 Insertion- insertion occurs at the first
available slot(or empty spaces).
 Deletion- after a record is deleted an empty
space is created. Now there can be various
ways to fill this space:
(a) shift all the records just below space by one
place upwards. and all the empty slots
appear together at the end ( requires a large
no. of shifting)
16
FIXED LENGTH RECORDS
(b) shift the last record in empty slot, instead of
disturbing large no. of records.
(c) here deletion of records is handled by using
an array of bits called file header at the
beginning of the file. When the first record is
deleted, the file header stores its address.
Now this empty slot of record is used to store
the address ( also called pointers)of next
empty slot.
17
FIXED LENGTH RECORDS
ACCOUNT_NO BRANCH_NAM
E
BALANCE
Record 0-> A-101 AB 100
Record 2-> A-567 BC 200
Record 3-> A-92 AT 900
Record 4-> A-45 TC 300
ACCOUNT_NO BRANCH_NAM
E
BALANCE
A-101 AB 100
A-567 BC 200
A-92 AT 900
A-45 TC 300
18
FIXED LENGTH RECORDS
(b)
ACCOUNT_NO BRANCH_NAME BALANCE
A-101 AB 100
A-45 TC 300
A-567 BC 200
A-92 AT 900
19
ACCOUNT_NO BRANCH_NAM
E
BALANCE
Record 0-> A-101 AB 100
Record 2-> A-567 BC 200
Record 3-> A-92 AT 900
Record 4-> A-45 TC 300
FIXED LENGTH RECORDS
ACCOUNT_NO BRANCH_NAM
E
BALANCE
Stores the
address
of first emplty
slot
A-101 AB 100
A-567 BC 200
A-45 TC 300
FILE
HEADER
20
FIXED LENGTH RECORDS
These stored empty slot addresses form a
link list called as FREE LIST. Now whenever
a new record is to be inserted, it is inserted in
the first available empty slot pointed by the
header. And value of file header is changed
to address of next available empty slot. In
case of unavailability of empty slot, the new
record is added at the end of the file.
21
ADVANTAGE OF FIXED LENGTH
RECORDS
Because the space made available by a
deleted record is exactly the space
needed to insert a new record, insertion
and deletion for files are simple to
implement.
22
VARIABLE LENGTH RECORDS
 All the records on the page are not of the same
length but here different records in the file have
different sizes.
type account_list=record
branch_name:char(22);
account_info:array[1..infinity]of record
account_no:char(10);
balance:numeral(12,2);
end
end
23
Variable length records
2 ways to implement variable-length records
 Byte string representation
 Fixed length representation
24
Variable length records:
Byte string representation
 a special symbol called end of record( _|_) is added
at the end of each record.
 each record is stored as a string of consecutive
bytes.
 Disadvantages:
a)it is very difficult to reuse empty slot (occupied
formally
by deleted record).
b)a large no. of small fragments of disk storage are
wasted.
 Therefore a modified form of byte-string
representation called the slotted-page structure, is
commonly used for organizing records within a single
block.
25
Byte string representation
BRANCH_
NAME
ACCOUNT
_NO
BALANCE ACCOUNT
_NO
BALANCE
AB A-101 500 A-147 200
CD A-92 100 _I_
EF A-47 900 _I_
GH A-52 500 A-66 1000
_I_
_I_
26
Consider an example where a person can have more than one accounts in a bank.
Slotted-page structure
 Used for organizing records within a single block .
 There is a header at the beginning of each block
containing the following information:
(i)The no. of record entries .
(ii)The end of free space in the block
(iii)An array whose entries contain the loc and size of
each record.
Free space
#Entries
27
Variable length records:
Fixed length representation
 One or more fixed-length records are used to represent
variable-length record.
 2 methods namely (a)reserved space (b) list
representation
 Reserved Space:
(i) a fixed length record of the size equal to that of maximum
record length in a file is used.
(ii) unused space of the records shorter than the maximum
size is filled with a special null or end-of –record symbol.
Method is useful when most records have length close to
the maximum record length otherwise a significant amount
of space is wasted.
28
Fixed length representation:
Reserved Space
BRANCH_N
AME
ACCOUNT_
NO
BALANCE ACCOUNT_
NO
BALANCE
AB A-101 500 A-147 200
CD A-92 100 _I_ _I_
EF A-47 900 A-65 400
GH A-52 500 A-66 1000
29
Variable length records : Fixed length
representation
 List representation(also called link list)
(i) here a list of fixed-length records is
chained together by pointers. Method is
similar to file header address concept except
in case of file header method pointers are
used to chain together only deleted records,
whereas here pointers of all records
pertaining to the same branch_name are
chained together.
30
Fixed length representation:
List representation
BRANCH_N
AME
ACCOUNT_
NO
BALANCE
AB A-101 500
CD A-92 100
EF A-47 900
GH A-52 500
A-147 200
A-65 400
A-66 1000
FILE
HEADER
31
Variable length records:Fixed length
representation
Problem with this representation is
when a new record is to be inserted, an
empty slot of just right size is needed
( neither a smaller size will do nor the big slot
would solve the purpose as it would lead to
wastage of space).So it becomes imp to
allocate a right size of slot for insertion of
record and move records to fill the space
created by deletion of records to ensure that
all the free space in the file is contiguous.
32
Variable length records:Fixed length
representation
To deal with this problem, we allow two kinds
of blocks in our file:
1. Anchor block: which contain the first record
of a chain
2. Overflow block: This contains records other
than those that are the first record of a chain.
Thus all the records within a block have the
same length, even though not all records in
the file have the same length.
33
Anchor block-Overflow block
 Anchor block:
BRANCH_NAME ACCOUNT_NO BALANCE
AB A-101 500
CD A-92 100
EF A-47 900
GH A-52 500
OVERFLOW BLOCK
A-147 200
A-65 400
A-66 1000
34
 SYNTAX:
CREATE INDEX <index-name>
ON <table-name>
(<column-name> [ASC|DESC],
<column-name> [ASC|DESC]…);
35
Creating Index :
ID CONTENT …..
1 ABC …..
2 DEF …..
3 PQR …..
4 XYZ …..
36
1
2
3
4
INDEX
TABLE
Create the index on the id column:
CREATE INDEX testid_index ON test1
(id ASC);
 Ordered indexing
 Hashed indexing
 Ordered index – which is used to access data
sorted by order of values.
Binary search can be used to access records.
 Hash index - used to access data that is
distributed uniformly across a range using a
hashed function.
37
Types of indexing techniques :
Dense indexing :
 An index record or entry appears for every search-key
value in the table.
 The index record contains the search key and a pointer to
the first data record with that search-key value.
38
Types of Ordered indexing
39
Example:
1
2
3
Taking year as the search key :
ROLL
NO
NAME YEAR MARKS PERCE-
NTAGE
6545 ABC 1 894 76.8
7645 BCD 2 900 78.4
7456 PQR 2 1055 82.9
8734 XYZ 3 867 75.6
Insertion :
 if the search key value does not appear in the index,
the new record is inserted to an appropriate position
 if the index record stores a pointer to only the first
record with the same search-key value, the record
being inserted is placed right after the other records
with the same search-key values
Deletion :
 if the deleted record was the only record with its
unique search-key value, then it is simply deleted
 If the index record stores a pointer to only the first
record with the same search-key value, and if the
deleted record was the first record, update the index
record to point to the next record.
40
41
Inserting :
ROLL
NO
NAME YEAR MARKS PERCE-
NTAGE
6545 ABC 1 894 76.8
7645 BCD 2 900 78.4
7456 PQR 2 1055 82.9
7477 RST 2 1088 88.8
8734 XYZ 3 867 75.6
8899 UVW 4 894 76.8
1
2
3
4
Inserting
-a record with year 2 (existing search-key value)
-a record with year 4 (new search-key value)
 An index record that appears for only some of
the values in the file.
 The records are divided into blocks.
 The index is assumed to be storing an entry
of each block of the file.
 To locate a record, we find the index record
with the largest search key value less than or
equal to the search key value we are looking
for.
43
Sparse Indexing :
44
Example :
Insertion :
 if no new block is created, no change is made
to the index.
 if a new block is created, the first search-key
value in the new block is added to the index.
Deletion :
 if the deleted record was the only record with its
search key, the corresponding index record is
replaced with an index record for the next
search-key value
 if the record being deleted is the first record of
the block, the next record with the same search-
key value is updated as the reference instead 45
Which one is better?
Dense or sparse?
 Records can be located faster using
dense index than sparse index.
 But, sparse indexes require less space
and they impose less maintenance
overhead for insertions and deletions.
 So choosing between dense and sparse
indexes , it’s a trade off between time
and space overhead.
Primary Index :
 It is an ordered index with the primary key field
as the search-key.
 It can be dense or sparse.
 There can only be one primary index for a file.
Secondary Index :
 It provides a secondary means of accessing a
file where a primary access already exists.
 The search-key is a candidate key & has a
unique value.
 More than one secondary indices can exist for
the same file
47
Clustered Index :
 If the records are physically ordered on a nonkey
field – which does not have a distinct value for
each record-that field is called the clustering
field and the index with the clustering field as
the search-key is known as the clustered index.
 The clustering index has an entry for each
distinct value of the clustering field.
 There can be only one clustered index per table.
 Blocks of fixed size are reserved for each value
of the clustering field to avoid physical
reordering during insertion & deletion.
48
Clustered Index(finally !)_
A clustered index is a special type of index
that reorders the way records in the table are
physically stored. Therefore a table can have
only one clustered index.
Clustering alters the data block into a certain
distinct order to match the index, hence it is
also an operation on the data storage blocks
as well as on the index.
Example of Clustered Index_
An address book ordered by first name
resembles a clustered index in its
structure and purpose.
A dictionary.
 Clustered indexes are good for range
searches.
Non clustered Index :
 The logical order of the index does not match
the physical stored order of the rows on disk.
 A file can have multiple non-clustered
indexes.
 Non-clustered indexes are good for random
searches.
51
Example of Non-Clustered Index_
An index on the backside of the book
where you see only the pages on which
the words of the index appear.
The order of the words in the book is not
according to the order of the words in the
index.
Non-Clustered Index_
An non-clustered index is structured in a way
that doesn't correspond to the order of the
actual data records.
For this reason, it will perform worse than
clustered indexes on ranged searches where
the result set is large, since each result could
cost an additional I/O-operation to get the
actual data record.
 If an index may be too large for efficient
processing we use multi-level indexing.
 The outer index is a sparse index of the primary
index whereas the inner index is the primary
index.
 In the same way any no. of levels can be made
depending on the size of the index.
 Generally all index levels are physically ordered
which creates problems in insertion & deletion.
 Therefore , dynamic multilevel indexes (B-
trees and B+
-trees) are used.
54
Multilevel Index :
55
Multilevel Index :
HASHING
INTRODUCTION
 Hashing is a method for determining physical
location of record.
 In it the record key is processed
mathematically and another no is computed
which represents location of record (hashing
function).
 Files which use hashing techniques are
called hash files.
Example…
empid ename salary
4555 abc 5000
Here empid is primary key. let there be 1000 employees. hash
Function is to divide the key by prime number close to 1000 and
Use remainder as storage location.
Prime number=997
Remainder =567
Hence the employee record is stored at memory location 567.
Types of hashing
 Internal Hashing-In case
of internal files hashing is
implemented as a hash
table .Hash table is a
data structure in which
location of data item is
determined by hash
function. If hashed value
of two keys result same
then it is called ‘collision’.
K1
k4
k2
k3
H(k1)
H(k4)
H(k2)
H(k3)
Set of keys
Hash table
External hashing
 Hashing for disk files is called external hashing.
Target address space is made of buckets.
Each bucket has multiple records (block address).
Hash functions map record key to bucket number.
A table maintained in file header converts bucket number to
Corresponding block address.
 Hash function is used to locate records for access, insertion as
well as deletion.
This method of buckets prevents collision.
This technique of hashing is called static hashing as fixed
number of buckets are allocated in beginning.
.
1.
2.
3.
Block 1
Block 2
Block 3
Buckets
key
Hash function
CREATE TABLE test ( id INTEGER PRIMARY KEY ,
name varchar(100) );
In table test, we have decided to store 4 rows...
insert into test values (1,'Gordon'); insert into test values (2,'Jim');
insert into test values (4,'Andrew'); insert into test values (3,'John');
The algorithm splits the places which the rows are to be stored
into areas. These areas are called buckets. If a row's primary key
matches the requirements to be stored in that bucket, then that is
where it will be stored. The algorithm to decide which bucket to
use is called the hash function. For our example we will have a
nice simple hash function, where the bucket number equals the
primary key.
Problem of overflow….
 Overflow occurs when bucket is filled to its
capacity and new record has to be inserted.
 We can have pointer in bucket which points to
linked list of overflowed records of buckets.
Deficiencies of Static Hashing
• In static hashing, function h maps search-key values to a
fixed set of B of bucket addresses.
– Databases grow with time. If initial number of buckets is too small,
performance will degrade due to too much overflows.
.
– If database shrinks, again space will be wasted.
.
• These problems can be avoided by using techniques that
allow the number of buckets to be modified dynamically.
Dynamic Hashing
• Good for database that grows and shrinks in size. Allows the
hash function to be modified dynamically
• Extendable hashing – one form of dynamic hashing
•A directory –an array of 2^d bucket address is maintained
•D is called global depth of directory.
•The result of hash function is represented in binary form and
the first ‘i’ bits of hash value are used to determine directory
entry and value at it determines bucket in which record is to be
stored.
•We can increase the number of ‘d’ to accommodate more
buckets.
1001
1010
1011
Buckets for records with
hash value starting with100
Buckets for record with hash
value starting with 101.
key
Hash function
Here global depth is 4 and let first three bits of hash value
represent array Index.
empid ename
6 abc
5 xyz
example…..
1101
1011
directory
Buckets for records having hash
value Starting with 110
Bucket for records having hash
value Starting with 101
Hash function=k*2
Hash value for first
record is 1100
Hash value for
second record is
1010

Weitere ähnliche Inhalte

Was ist angesagt?

Concurrency Control in Database Management System
Concurrency Control in Database Management SystemConcurrency Control in Database Management System
Concurrency Control in Database Management SystemJanki Shah
 
Transaction management DBMS
Transaction  management DBMSTransaction  management DBMS
Transaction management DBMSMegha Patel
 
Data Manipulation Language
Data Manipulation LanguageData Manipulation Language
Data Manipulation LanguageJas Singh Bhasin
 
My lectures circular queue
My lectures circular queueMy lectures circular queue
My lectures circular queueSenthil Kumar
 
13. Query Processing in DBMS
13. Query Processing in DBMS13. Query Processing in DBMS
13. Query Processing in DBMSkoolkampus
 
Presentation on Segmentation
Presentation on SegmentationPresentation on Segmentation
Presentation on SegmentationPriyanka bisht
 
Database , 8 Query Optimization
Database , 8 Query OptimizationDatabase , 8 Query Optimization
Database , 8 Query OptimizationAli Usman
 
File access methods.54
File access methods.54File access methods.54
File access methods.54myrajendra
 
Log based and Recovery with concurrent transaction
Log based and Recovery with concurrent transactionLog based and Recovery with concurrent transaction
Log based and Recovery with concurrent transactionnikunjandy
 
Indexing structure for files
Indexing structure for filesIndexing structure for files
Indexing structure for filesZainab Almugbel
 
Segmentation in Operating Systems.
Segmentation in Operating Systems.Segmentation in Operating Systems.
Segmentation in Operating Systems.Muhammad SiRaj Munir
 
Queue in Data Structure
Queue in Data Structure Queue in Data Structure
Queue in Data Structure Janki Shah
 
Free space managment46
Free space managment46Free space managment46
Free space managment46myrajendra
 

Was ist angesagt? (20)

Concurrency Control in Database Management System
Concurrency Control in Database Management SystemConcurrency Control in Database Management System
Concurrency Control in Database Management System
 
Acid properties
Acid propertiesAcid properties
Acid properties
 
System calls
System callsSystem calls
System calls
 
Transaction management DBMS
Transaction  management DBMSTransaction  management DBMS
Transaction management DBMS
 
Data Manipulation Language
Data Manipulation LanguageData Manipulation Language
Data Manipulation Language
 
Paging and segmentation
Paging and segmentationPaging and segmentation
Paging and segmentation
 
My lectures circular queue
My lectures circular queueMy lectures circular queue
My lectures circular queue
 
13. Query Processing in DBMS
13. Query Processing in DBMS13. Query Processing in DBMS
13. Query Processing in DBMS
 
Presentation on Segmentation
Presentation on SegmentationPresentation on Segmentation
Presentation on Segmentation
 
Database , 8 Query Optimization
Database , 8 Query OptimizationDatabase , 8 Query Optimization
Database , 8 Query Optimization
 
File allocation methods (1)
File allocation methods (1)File allocation methods (1)
File allocation methods (1)
 
File access methods.54
File access methods.54File access methods.54
File access methods.54
 
Log based and Recovery with concurrent transaction
Log based and Recovery with concurrent transactionLog based and Recovery with concurrent transaction
Log based and Recovery with concurrent transaction
 
Indexing structure for files
Indexing structure for filesIndexing structure for files
Indexing structure for files
 
Segmentation in Operating Systems.
Segmentation in Operating Systems.Segmentation in Operating Systems.
Segmentation in Operating Systems.
 
Triggers and active database
Triggers and active databaseTriggers and active database
Triggers and active database
 
Serializability
SerializabilitySerializability
Serializability
 
Queue in Data Structure
Queue in Data Structure Queue in Data Structure
Queue in Data Structure
 
Free space managment46
Free space managment46Free space managment46
Free space managment46
 
Direct linking loaders
Direct linking loadersDirect linking loaders
Direct linking loaders
 

Andere mochten auch

Chapter 4 record storage and primary file organization
Chapter 4 record storage and primary file organizationChapter 4 record storage and primary file organization
Chapter 4 record storage and primary file organizationJafar Nesargi
 
File Organization
File OrganizationFile Organization
File OrganizationManyi Man
 
Indexing or dividing_head
Indexing or dividing_headIndexing or dividing_head
Indexing or dividing_headJavaria Chiragh
 
Lupws session 2 land classes_GIZ LM_RED_eng
Lupws session 2 land classes_GIZ LM_RED_engLupws session 2 land classes_GIZ LM_RED_eng
Lupws session 2 land classes_GIZ LM_RED_engLUP_Lao
 
Introduction to Controlled Vocabulary
Introduction to Controlled VocabularyIntroduction to Controlled Vocabulary
Introduction to Controlled VocabularyRebecca Thompson
 
Locking unit 1 topic 3
Locking unit 1 topic 3Locking unit 1 topic 3
Locking unit 1 topic 3avniS
 
LIS 653, Session 10: Controlled Vocabulary
LIS 653, Session 10: Controlled VocabularyLIS 653, Session 10: Controlled Vocabulary
LIS 653, Session 10: Controlled VocabularyDr. Starr Hoffman
 
Indexing languages
Indexing languagesIndexing languages
Indexing languagesyhen06
 
Cad lecture-5
Cad lecture-5Cad lecture-5
Cad lecture-527273737
 
Controlled Vocabulary
Controlled VocabularyControlled Vocabulary
Controlled Vocabularyguest118a9a
 
Introduction To Controlled Vocabularies
Introduction To Controlled VocabulariesIntroduction To Controlled Vocabularies
Introduction To Controlled VocabulariesFred Leise
 
Computer Assisted Language Learning ( CALL)
Computer Assisted Language Learning ( CALL)Computer Assisted Language Learning ( CALL)
Computer Assisted Language Learning ( CALL)Bird Heaven
 
Latent Topic-semantic Indexing based Automatic Text Summarization
Latent Topic-semantic Indexing based Automatic Text SummarizationLatent Topic-semantic Indexing based Automatic Text Summarization
Latent Topic-semantic Indexing based Automatic Text SummarizationElaheh Barati
 

Andere mochten auch (20)

Chapter 4 record storage and primary file organization
Chapter 4 record storage and primary file organizationChapter 4 record storage and primary file organization
Chapter 4 record storage and primary file organization
 
b+ tree
b+ treeb+ tree
b+ tree
 
File Organization
File OrganizationFile Organization
File Organization
 
Indexing or dividing_head
Indexing or dividing_headIndexing or dividing_head
Indexing or dividing_head
 
Seminar 28october
Seminar 28octoberSeminar 28october
Seminar 28october
 
File handling in c++
File handling in c++File handling in c++
File handling in c++
 
Social Media in Education
Social Media in EducationSocial Media in Education
Social Media in Education
 
Lupws session 2 land classes_GIZ LM_RED_eng
Lupws session 2 land classes_GIZ LM_RED_engLupws session 2 land classes_GIZ LM_RED_eng
Lupws session 2 land classes_GIZ LM_RED_eng
 
Classification, Cataloguing And Marc Crash Course
Classification, Cataloguing And Marc Crash CourseClassification, Cataloguing And Marc Crash Course
Classification, Cataloguing And Marc Crash Course
 
Introduction to Controlled Vocabulary
Introduction to Controlled VocabularyIntroduction to Controlled Vocabulary
Introduction to Controlled Vocabulary
 
Locking unit 1 topic 3
Locking unit 1 topic 3Locking unit 1 topic 3
Locking unit 1 topic 3
 
LIS 653, Session 10: Controlled Vocabulary
LIS 653, Session 10: Controlled VocabularyLIS 653, Session 10: Controlled Vocabulary
LIS 653, Session 10: Controlled Vocabulary
 
Indexing languages
Indexing languagesIndexing languages
Indexing languages
 
Cad lecture-5
Cad lecture-5Cad lecture-5
Cad lecture-5
 
Controlled Vocabulary
Controlled VocabularyControlled Vocabulary
Controlled Vocabulary
 
Bsp
Bsp Bsp
Bsp
 
Introduction To Controlled Vocabularies
Introduction To Controlled VocabulariesIntroduction To Controlled Vocabularies
Introduction To Controlled Vocabularies
 
Computer Assisted Language Learning ( CALL)
Computer Assisted Language Learning ( CALL)Computer Assisted Language Learning ( CALL)
Computer Assisted Language Learning ( CALL)
 
Latent Topic-semantic Indexing based Automatic Text Summarization
Latent Topic-semantic Indexing based Automatic Text SummarizationLatent Topic-semantic Indexing based Automatic Text Summarization
Latent Topic-semantic Indexing based Automatic Text Summarization
 
Red black trees
Red black treesRed black trees
Red black trees
 

Ähnlich wie File organization and indexing

Ch 17 disk storage, basic files structure, and hashing
Ch 17 disk storage, basic files structure, and hashingCh 17 disk storage, basic files structure, and hashing
Ch 17 disk storage, basic files structure, and hashingZainab Almugbel
 
File System Implementation.pptx
File System Implementation.pptxFile System Implementation.pptx
File System Implementation.pptxRajapriya82
 
Chapter 4 record storage and primary file organization
Chapter 4 record storage and primary file organizationChapter 4 record storage and primary file organization
Chapter 4 record storage and primary file organizationJafar Nesargi
 
19 structured files
19 structured files19 structured files
19 structured filesashish61_scs
 
Report blocking ,management of files in secondry memory , static vs dynamic a...
Report blocking ,management of files in secondry memory , static vs dynamic a...Report blocking ,management of files in secondry memory , static vs dynamic a...
Report blocking ,management of files in secondry memory , static vs dynamic a...NoorMustafaSoomro
 
Chapter 11 - File System Implementation
Chapter 11 - File System ImplementationChapter 11 - File System Implementation
Chapter 11 - File System ImplementationWayne Jones Jnr
 
5 data storage_and_indexing
5 data storage_and_indexing5 data storage_and_indexing
5 data storage_and_indexingUtkarsh De
 
Internal representation of file chapter 4 Sowmya Jyothi
Internal representation of file chapter 4 Sowmya JyothiInternal representation of file chapter 4 Sowmya Jyothi
Internal representation of file chapter 4 Sowmya JyothiSowmya Jyothi
 

Ähnlich wie File organization and indexing (20)

Ch 17 disk storage, basic files structure, and hashing
Ch 17 disk storage, basic files structure, and hashingCh 17 disk storage, basic files structure, and hashing
Ch 17 disk storage, basic files structure, and hashing
 
Chapter13
Chapter13Chapter13
Chapter13
 
FS Mod2@AzDOCUMENTS.in.pdf
FS Mod2@AzDOCUMENTS.in.pdfFS Mod2@AzDOCUMENTS.in.pdf
FS Mod2@AzDOCUMENTS.in.pdf
 
File System Implementation.pptx
File System Implementation.pptxFile System Implementation.pptx
File System Implementation.pptx
 
Chapter13
Chapter13Chapter13
Chapter13
 
VSAM Tuning
VSAM TuningVSAM Tuning
VSAM Tuning
 
File System Implementation
File System ImplementationFile System Implementation
File System Implementation
 
NOTES ON "FOXPRO"
NOTES ON "FOXPRO" NOTES ON "FOXPRO"
NOTES ON "FOXPRO"
 
Ardbms
ArdbmsArdbms
Ardbms
 
Chapter 4 record storage and primary file organization
Chapter 4 record storage and primary file organizationChapter 4 record storage and primary file organization
Chapter 4 record storage and primary file organization
 
19 structured files
19 structured files19 structured files
19 structured files
 
Report blocking ,management of files in secondry memory , static vs dynamic a...
Report blocking ,management of files in secondry memory , static vs dynamic a...Report blocking ,management of files in secondry memory , static vs dynamic a...
Report blocking ,management of files in secondry memory , static vs dynamic a...
 
Chapter7.2-mikroprocessor
Chapter7.2-mikroprocessorChapter7.2-mikroprocessor
Chapter7.2-mikroprocessor
 
Chapter 11 - File System Implementation
Chapter 11 - File System ImplementationChapter 11 - File System Implementation
Chapter 11 - File System Implementation
 
Ch10
Ch10Ch10
Ch10
 
Database Sizing
Database SizingDatabase Sizing
Database Sizing
 
5 data storage_and_indexing
5 data storage_and_indexing5 data storage_and_indexing
5 data storage_and_indexing
 
Internal representation of file chapter 4 Sowmya Jyothi
Internal representation of file chapter 4 Sowmya JyothiInternal representation of file chapter 4 Sowmya Jyothi
Internal representation of file chapter 4 Sowmya Jyothi
 
ISO 2709
ISO 2709ISO 2709
ISO 2709
 
Indexing
IndexingIndexing
Indexing
 

Kürzlich hochgeladen

Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .Satyam Kumar
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
An introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxAn introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxPurva Nikam
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Comparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization TechniquesComparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization Techniquesugginaramesh
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 

Kürzlich hochgeladen (20)

Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
An introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxAn introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptx
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Comparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization TechniquesComparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization Techniques
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 

File organization and indexing

  • 2. FILE ORGANIZATION is a method of arranging data on secondary storage devices and addressing them such that it facilitates storage and read/write operations of data or information requested by the user. A record is a collection of logically related fields or data items. Each data item is formed of one or more bytes. Record usually describes entities and their attributes. File is a collection of related sequence of records. 2
  • 3. 3 File Organization  The physical arrangement of data in a file into records and pages on the disk  File organization determines the set of access methods for  Storing and retrieving records from a file  Therefore, ‘file organization’ synonymous with ‘access method’  We study three types of file organization  Unordered or Heap files  Ordered or sequential files  Hash files  We examine each of them in terms of the operations we perform on the database  Insert a new record  Search for a record (or update a record)  Delete a record
  • 5.  Database - is an organized collection of logically related data.  The operations (read, modify, update, and delete) access data from the database.  DBMS must first transfer the data temporarily from secondary memory to a buffer in main memory. 5 Need for indexing:
  • 6.  Data is transferred between disk and main memory into units called blocks.  The transferring of data from secondary memory to main memory is a very slow operation. 6 Need for indexing:
  • 7.  An index is a data structure, which is maintained to determine the location of records in a file. The Index Table :  The index table contains a search key and a pointer.  Search key - an attribute or set of attributes that is used to look up the records in a file.  Pointer - contains the address of where the record is stored in the memory. 7 Index :
  • 8.  Each index entry corresponds to a record on the secondary storage file. To retrieve a record from the disk-  First the index is searched for the address of the record in the index table  Then the record is read from that address.  It is just like searching for a book in a library catalogue. 8
  • 9.  An index a data structure that is added to a file to provide faster access to the data.  It reduces the number of blocks that the DBMS has to check. 9
  • 10.  If there is a table similar to this: CREATE TABLE test1 ( id integer, content varchar2(30) );  The application requires a lot of queries of the form SELECT content FROM test1 WHERE id = constant;  Ordinarily, the system would have to scan the entire test1 table row by row to find all matching entries. 10 Example:
  • 11.  SYNTAX: CREATE INDEX <index-name> ON <table-name> (<column-name> [ASC|DESC], <column-name> [ASC|DESC]…); 11 Creating Index :
  • 12. ID CONTENT ….. 1 ABC ….. 2 DEF ….. 3 PQR ….. 4 XYZ ….. 12 1 2 3 4 INDEX TABLE Create the index on the id column: CREATE INDEX testid_index ON test1 (id ASC);
  • 13. FILE ORGANIZATION A file can be of 2 formats:  Fixed length records  Variable length records 13
  • 14. FIXED LENGTH RECORDS  All records on the page are of the same slot length i.e. every record in the file has exactly the same size(in bytes).  Record slots are uniform and records are arranged consecutively within a page. type deposit=record account_no:char(10); branch_name:char(22); balance:numeral(12,2); end Assume char occupies 1 byte and numeral stores 8 bytes .So first 40 bytes are stored for first record and next 40 bytes for the second record and so on. 14
  • 15. FIXED LENGTH RECORDS ACCOUNT_NO BRANCH_NAME BALANCE Record 0-> A-101 AB 100 Record 1-> A-102 CD 400 Record 2-> A-567 BC 200 Record 3-> A-92 AT 900 Record 4-> A-45 TC 300 15
  • 16. FIXED LENGTH RECORDS  Insertion- insertion occurs at the first available slot(or empty spaces).  Deletion- after a record is deleted an empty space is created. Now there can be various ways to fill this space: (a) shift all the records just below space by one place upwards. and all the empty slots appear together at the end ( requires a large no. of shifting) 16
  • 17. FIXED LENGTH RECORDS (b) shift the last record in empty slot, instead of disturbing large no. of records. (c) here deletion of records is handled by using an array of bits called file header at the beginning of the file. When the first record is deleted, the file header stores its address. Now this empty slot of record is used to store the address ( also called pointers)of next empty slot. 17
  • 18. FIXED LENGTH RECORDS ACCOUNT_NO BRANCH_NAM E BALANCE Record 0-> A-101 AB 100 Record 2-> A-567 BC 200 Record 3-> A-92 AT 900 Record 4-> A-45 TC 300 ACCOUNT_NO BRANCH_NAM E BALANCE A-101 AB 100 A-567 BC 200 A-92 AT 900 A-45 TC 300 18
  • 19. FIXED LENGTH RECORDS (b) ACCOUNT_NO BRANCH_NAME BALANCE A-101 AB 100 A-45 TC 300 A-567 BC 200 A-92 AT 900 19 ACCOUNT_NO BRANCH_NAM E BALANCE Record 0-> A-101 AB 100 Record 2-> A-567 BC 200 Record 3-> A-92 AT 900 Record 4-> A-45 TC 300
  • 20. FIXED LENGTH RECORDS ACCOUNT_NO BRANCH_NAM E BALANCE Stores the address of first emplty slot A-101 AB 100 A-567 BC 200 A-45 TC 300 FILE HEADER 20
  • 21. FIXED LENGTH RECORDS These stored empty slot addresses form a link list called as FREE LIST. Now whenever a new record is to be inserted, it is inserted in the first available empty slot pointed by the header. And value of file header is changed to address of next available empty slot. In case of unavailability of empty slot, the new record is added at the end of the file. 21
  • 22. ADVANTAGE OF FIXED LENGTH RECORDS Because the space made available by a deleted record is exactly the space needed to insert a new record, insertion and deletion for files are simple to implement. 22
  • 23. VARIABLE LENGTH RECORDS  All the records on the page are not of the same length but here different records in the file have different sizes. type account_list=record branch_name:char(22); account_info:array[1..infinity]of record account_no:char(10); balance:numeral(12,2); end end 23
  • 24. Variable length records 2 ways to implement variable-length records  Byte string representation  Fixed length representation 24
  • 25. Variable length records: Byte string representation  a special symbol called end of record( _|_) is added at the end of each record.  each record is stored as a string of consecutive bytes.  Disadvantages: a)it is very difficult to reuse empty slot (occupied formally by deleted record). b)a large no. of small fragments of disk storage are wasted.  Therefore a modified form of byte-string representation called the slotted-page structure, is commonly used for organizing records within a single block. 25
  • 26. Byte string representation BRANCH_ NAME ACCOUNT _NO BALANCE ACCOUNT _NO BALANCE AB A-101 500 A-147 200 CD A-92 100 _I_ EF A-47 900 _I_ GH A-52 500 A-66 1000 _I_ _I_ 26 Consider an example where a person can have more than one accounts in a bank.
  • 27. Slotted-page structure  Used for organizing records within a single block .  There is a header at the beginning of each block containing the following information: (i)The no. of record entries . (ii)The end of free space in the block (iii)An array whose entries contain the loc and size of each record. Free space #Entries 27
  • 28. Variable length records: Fixed length representation  One or more fixed-length records are used to represent variable-length record.  2 methods namely (a)reserved space (b) list representation  Reserved Space: (i) a fixed length record of the size equal to that of maximum record length in a file is used. (ii) unused space of the records shorter than the maximum size is filled with a special null or end-of –record symbol. Method is useful when most records have length close to the maximum record length otherwise a significant amount of space is wasted. 28
  • 29. Fixed length representation: Reserved Space BRANCH_N AME ACCOUNT_ NO BALANCE ACCOUNT_ NO BALANCE AB A-101 500 A-147 200 CD A-92 100 _I_ _I_ EF A-47 900 A-65 400 GH A-52 500 A-66 1000 29
  • 30. Variable length records : Fixed length representation  List representation(also called link list) (i) here a list of fixed-length records is chained together by pointers. Method is similar to file header address concept except in case of file header method pointers are used to chain together only deleted records, whereas here pointers of all records pertaining to the same branch_name are chained together. 30
  • 31. Fixed length representation: List representation BRANCH_N AME ACCOUNT_ NO BALANCE AB A-101 500 CD A-92 100 EF A-47 900 GH A-52 500 A-147 200 A-65 400 A-66 1000 FILE HEADER 31
  • 32. Variable length records:Fixed length representation Problem with this representation is when a new record is to be inserted, an empty slot of just right size is needed ( neither a smaller size will do nor the big slot would solve the purpose as it would lead to wastage of space).So it becomes imp to allocate a right size of slot for insertion of record and move records to fill the space created by deletion of records to ensure that all the free space in the file is contiguous. 32
  • 33. Variable length records:Fixed length representation To deal with this problem, we allow two kinds of blocks in our file: 1. Anchor block: which contain the first record of a chain 2. Overflow block: This contains records other than those that are the first record of a chain. Thus all the records within a block have the same length, even though not all records in the file have the same length. 33
  • 34. Anchor block-Overflow block  Anchor block: BRANCH_NAME ACCOUNT_NO BALANCE AB A-101 500 CD A-92 100 EF A-47 900 GH A-52 500 OVERFLOW BLOCK A-147 200 A-65 400 A-66 1000 34
  • 35.  SYNTAX: CREATE INDEX <index-name> ON <table-name> (<column-name> [ASC|DESC], <column-name> [ASC|DESC]…); 35 Creating Index :
  • 36. ID CONTENT ….. 1 ABC ….. 2 DEF ….. 3 PQR ….. 4 XYZ ….. 36 1 2 3 4 INDEX TABLE Create the index on the id column: CREATE INDEX testid_index ON test1 (id ASC);
  • 37.  Ordered indexing  Hashed indexing  Ordered index – which is used to access data sorted by order of values. Binary search can be used to access records.  Hash index - used to access data that is distributed uniformly across a range using a hashed function. 37 Types of indexing techniques :
  • 38. Dense indexing :  An index record or entry appears for every search-key value in the table.  The index record contains the search key and a pointer to the first data record with that search-key value. 38 Types of Ordered indexing
  • 39. 39 Example: 1 2 3 Taking year as the search key : ROLL NO NAME YEAR MARKS PERCE- NTAGE 6545 ABC 1 894 76.8 7645 BCD 2 900 78.4 7456 PQR 2 1055 82.9 8734 XYZ 3 867 75.6
  • 40. Insertion :  if the search key value does not appear in the index, the new record is inserted to an appropriate position  if the index record stores a pointer to only the first record with the same search-key value, the record being inserted is placed right after the other records with the same search-key values Deletion :  if the deleted record was the only record with its unique search-key value, then it is simply deleted  If the index record stores a pointer to only the first record with the same search-key value, and if the deleted record was the first record, update the index record to point to the next record. 40
  • 41. 41 Inserting : ROLL NO NAME YEAR MARKS PERCE- NTAGE 6545 ABC 1 894 76.8 7645 BCD 2 900 78.4 7456 PQR 2 1055 82.9 7477 RST 2 1088 88.8 8734 XYZ 3 867 75.6 8899 UVW 4 894 76.8 1 2 3 4 Inserting -a record with year 2 (existing search-key value) -a record with year 4 (new search-key value)
  • 42.
  • 43.  An index record that appears for only some of the values in the file.  The records are divided into blocks.  The index is assumed to be storing an entry of each block of the file.  To locate a record, we find the index record with the largest search key value less than or equal to the search key value we are looking for. 43 Sparse Indexing :
  • 45. Insertion :  if no new block is created, no change is made to the index.  if a new block is created, the first search-key value in the new block is added to the index. Deletion :  if the deleted record was the only record with its search key, the corresponding index record is replaced with an index record for the next search-key value  if the record being deleted is the first record of the block, the next record with the same search- key value is updated as the reference instead 45
  • 46. Which one is better? Dense or sparse?  Records can be located faster using dense index than sparse index.  But, sparse indexes require less space and they impose less maintenance overhead for insertions and deletions.  So choosing between dense and sparse indexes , it’s a trade off between time and space overhead.
  • 47. Primary Index :  It is an ordered index with the primary key field as the search-key.  It can be dense or sparse.  There can only be one primary index for a file. Secondary Index :  It provides a secondary means of accessing a file where a primary access already exists.  The search-key is a candidate key & has a unique value.  More than one secondary indices can exist for the same file 47
  • 48. Clustered Index :  If the records are physically ordered on a nonkey field – which does not have a distinct value for each record-that field is called the clustering field and the index with the clustering field as the search-key is known as the clustered index.  The clustering index has an entry for each distinct value of the clustering field.  There can be only one clustered index per table.  Blocks of fixed size are reserved for each value of the clustering field to avoid physical reordering during insertion & deletion. 48
  • 49. Clustered Index(finally !)_ A clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore a table can have only one clustered index. Clustering alters the data block into a certain distinct order to match the index, hence it is also an operation on the data storage blocks as well as on the index.
  • 50. Example of Clustered Index_ An address book ordered by first name resembles a clustered index in its structure and purpose. A dictionary.
  • 51.  Clustered indexes are good for range searches. Non clustered Index :  The logical order of the index does not match the physical stored order of the rows on disk.  A file can have multiple non-clustered indexes.  Non-clustered indexes are good for random searches. 51
  • 52. Example of Non-Clustered Index_ An index on the backside of the book where you see only the pages on which the words of the index appear. The order of the words in the book is not according to the order of the words in the index.
  • 53. Non-Clustered Index_ An non-clustered index is structured in a way that doesn't correspond to the order of the actual data records. For this reason, it will perform worse than clustered indexes on ranged searches where the result set is large, since each result could cost an additional I/O-operation to get the actual data record.
  • 54.  If an index may be too large for efficient processing we use multi-level indexing.  The outer index is a sparse index of the primary index whereas the inner index is the primary index.  In the same way any no. of levels can be made depending on the size of the index.  Generally all index levels are physically ordered which creates problems in insertion & deletion.  Therefore , dynamic multilevel indexes (B- trees and B+ -trees) are used. 54 Multilevel Index :
  • 57. INTRODUCTION  Hashing is a method for determining physical location of record.  In it the record key is processed mathematically and another no is computed which represents location of record (hashing function).  Files which use hashing techniques are called hash files.
  • 58. Example… empid ename salary 4555 abc 5000 Here empid is primary key. let there be 1000 employees. hash Function is to divide the key by prime number close to 1000 and Use remainder as storage location. Prime number=997 Remainder =567 Hence the employee record is stored at memory location 567.
  • 59. Types of hashing  Internal Hashing-In case of internal files hashing is implemented as a hash table .Hash table is a data structure in which location of data item is determined by hash function. If hashed value of two keys result same then it is called ‘collision’. K1 k4 k2 k3 H(k1) H(k4) H(k2) H(k3) Set of keys Hash table
  • 60. External hashing  Hashing for disk files is called external hashing. Target address space is made of buckets. Each bucket has multiple records (block address). Hash functions map record key to bucket number. A table maintained in file header converts bucket number to Corresponding block address.  Hash function is used to locate records for access, insertion as well as deletion. This method of buckets prevents collision. This technique of hashing is called static hashing as fixed number of buckets are allocated in beginning.
  • 61. . 1. 2. 3. Block 1 Block 2 Block 3 Buckets key Hash function
  • 62. CREATE TABLE test ( id INTEGER PRIMARY KEY , name varchar(100) ); In table test, we have decided to store 4 rows... insert into test values (1,'Gordon'); insert into test values (2,'Jim'); insert into test values (4,'Andrew'); insert into test values (3,'John'); The algorithm splits the places which the rows are to be stored into areas. These areas are called buckets. If a row's primary key matches the requirements to be stored in that bucket, then that is where it will be stored. The algorithm to decide which bucket to use is called the hash function. For our example we will have a nice simple hash function, where the bucket number equals the primary key.
  • 63.
  • 64. Problem of overflow….  Overflow occurs when bucket is filled to its capacity and new record has to be inserted.  We can have pointer in bucket which points to linked list of overflowed records of buckets.
  • 65. Deficiencies of Static Hashing • In static hashing, function h maps search-key values to a fixed set of B of bucket addresses. – Databases grow with time. If initial number of buckets is too small, performance will degrade due to too much overflows. . – If database shrinks, again space will be wasted. . • These problems can be avoided by using techniques that allow the number of buckets to be modified dynamically.
  • 66. Dynamic Hashing • Good for database that grows and shrinks in size. Allows the hash function to be modified dynamically • Extendable hashing – one form of dynamic hashing •A directory –an array of 2^d bucket address is maintained •D is called global depth of directory. •The result of hash function is represented in binary form and the first ‘i’ bits of hash value are used to determine directory entry and value at it determines bucket in which record is to be stored. •We can increase the number of ‘d’ to accommodate more buckets.
  • 67. 1001 1010 1011 Buckets for records with hash value starting with100 Buckets for record with hash value starting with 101. key Hash function Here global depth is 4 and let first three bits of hash value represent array Index.
  • 68. empid ename 6 abc 5 xyz example….. 1101 1011 directory Buckets for records having hash value Starting with 110 Bucket for records having hash value Starting with 101 Hash function=k*2 Hash value for first record is 1100 Hash value for second record is 1010