RELATIONAL DATABASE MANAGEMENT SYSTEM
UNIT – III
DATA STORAGE AND INDEXING
Storage And File Structure
Why do we need to know about storage and file structure?
Many database technologies are designed to exploit the storage architecture/hierarchy.
Data in the database needs to be organized so that it can be stored and retrieved efficiently.
Storage Hierarchy
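[Figure: the storage hierarchy. From fastest and most expensive per unit to slowest and cheapest: cache, main memory, flash memory, magnetic disk, optical disk, magnetic tape.]
Primary storage (volatile): cache, main memory
Secondary storage (non-volatile): flash memory, magnetic disk
Tertiary storage (non-volatile): optical disk, magnetic tape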
Primary Storage (Volatile)
Cache
Speed: 7 to 20 ns (1 nanosecond = 10^-9 seconds)
Capacity:
A typical PC level 2 cache is 64 KB to 2 MB.
Within processors, level 1 cache usually ranges in size from 8 KB to 64 KB.
Main memory
Speed: tens to hundreds of nanoseconds
Capacity:
Up to a few gigabytes is widely used currently
(per-byte costs have decreased by roughly a factor of 2 every 2 to 3 years)
Secondary Storage (Non-volatile)
Flash memory
Speed: read speed is similar to main memory; writes are much slower
Capacity: 32 MB to 512 MB currently
Forms: SmartMedia, Memory Stick, Secure Digital, BIOS
Cost: roughly the same as main memory
Magnetic disk
Capacity: up to roughly 100 GB currently; growing constantly and rapidly with technology improvements.
1/14/2005
Yan Huang - CSCI5330 Database Implementation –
Storage and File Structure
Tertiary Storage (Non-volatile)
• Optical storage
• CD-ROM (640 MB) and DVD (4.7 to 17 GB) most popular forms
• CD-RW, DVD-RW, and DVD-RAM
• Reads and writes are slower than with magnetic disk
• Juke-box systems, with large numbers of removable disks, a few drives, and a mechanism for automatic loading/unloading of disks, available for storing large volumes of data
Indexing:
* Indexing in database systems is similar to the index we see in books.
* Indexing is a data structure technique for efficiently retrieving records from the database files based on the attributes on which the index has been built.
* An index is defined by its indexing attributes. Indexing can be of the following types:
* Primary Index - A primary index is defined on an ordered data file. The data file is ordered on a key field, which is generally the primary key of the relation.
Indexing Types:
Secondary Index - A secondary index may be built on a field that is a candidate key, with a unique value in every record, or on a non-key field with duplicate values.
Clustering Index - A clustering index is defined on an ordered data file. The data file is ordered on a non-key field.
Ordered Indexing:
Dense Index - In a dense index, there is an index record for every search-key value in the database. This makes searching faster but requires more space to store the index records themselves. Each index record contains a search-key value and a pointer to the actual record on the disk.
Sparse Index:
In a sparse index, index records are not created for every search key. An index record here contains a search key and a pointer to the data on the disk.
To search for a record, we first follow the index record with the largest search-key value that is less than or equal to the key we want, and reach that location in the data file.
If the data we are looking for is not at the location we reach by following the index, the system searches sequentially from there until the desired data is found.
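The sparse-index lookup can be sketched as follows. This is a minimal illustration, not a storage-engine implementation: it assumes the data file is a sorted Python list of (key, record) pairs with unique keys, split into fixed-size blocks. The names `build_sparse_index`, `lookup`, and `BLOCK_SIZE` are hypothetical and do not come from the slides.

```python
import bisect

BLOCK_SIZE = 3  # records per block (assumed for illustration)

def build_sparse_index(records):
    """One index entry (first key in block, block start position) per block."""
    return [(records[i][0], i) for i in range(0, len(records), BLOCK_SIZE)]

def lookup(records, index, key):
    """Follow the largest index key <= key, then scan that block sequentially."""
    keys = [k for k, _ in index]
    pos = bisect.bisect_right(keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every key in the file
    start = index[pos][1]
    for k, rec in records[start:start + BLOCK_SIZE]:
        if k == key:
            return rec
    return None

records = [(k, f"record-{k}") for k in (5, 10, 15, 20, 25, 30, 35, 40)]
index = build_sparse_index(records)
```

With these eight records the index holds only three entries, for keys 5, 20, and 35. Looking up key 25 follows the entry for 20 and finds the record on the second step of the sequential scan.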
Multilevel Indexing:
Index records comprise search-key values and data pointers. A multilevel index is stored on the disk along with the actual database files.
As the size of the database grows, so does the size of the indices.
If a single-level index is used, a large index cannot be kept in memory, which leads to multiple disk accesses.
A multilevel index breaks the index down into several smaller indices, so that the outermost level is small enough to be stored in a single disk block.
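To see why only a few levels are usually needed, the sketch below counts index levels. It assumes that each index block holds a fixed number of entries and that the innermost index has one entry per data block. The function name `index_levels` and the parameter `fanout` are hypothetical, chosen only for this illustration.

```python
import math

def index_levels(data_blocks, fanout):
    """Number of index levels needed until the outermost level fits in one block."""
    levels = 0
    entries = data_blocks                      # one index entry per data block
    while True:
        blocks = math.ceil(entries / fanout)   # index blocks needed at this level
        levels += 1
        if blocks <= 1:
            return levels
        entries = blocks                       # the next level indexes these blocks
```

For example, with 1,000,000 data blocks and 100 entries per index block, `index_levels(1_000_000, 100)` returns 3.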
Disk:
Hard disk drives are the most common secondary storage devices in present computer systems. They are called magnetic disks because they use magnetization to store information.
Hard disks are formatted in a well-defined layout to store data efficiently. A hard disk platter has many concentric circles on it, called tracks. Every track is further divided into sectors. A sector on a hard disk typically stores 512 bytes of data.
Disk Subsystem
Disk interface standards families
• ATA (AT Attachment) range of standards
• SCSI (Small Computer System Interface) range of standards
Disk Speed
• Seek time (milliseconds)
• Rotation time/latency (milliseconds)
• Data-transfer rate (4-8 MB/sec)
Typical numbers:
• 16,000 tracks per platter
• 200 to 400 sectors per track
• 512 bytes per sector
• 4-16 KB per block
• 5,400 to 15,000 rpm
Access time = seek time + latency
Discuss ways to improve disk
reading speed
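The access-time formula can be tried out with the typical numbers listed above. The sketch below is illustrative only. It assumes that the average rotational latency is half of one full rotation, and the 8 ms seek time in the usage note is an assumed value, not a figure from the slides. All function names are hypothetical.

```python
def avg_rotational_latency_ms(rpm):
    """Average latency is taken as half of one full rotation, in milliseconds."""
    return 0.5 * 60_000 / rpm

def access_time_ms(seek_ms, rpm):
    """Access time = seek time + rotational latency."""
    return seek_ms + avg_rotational_latency_ms(rpm)

def track_capacity_bytes(sectors_per_track, bytes_per_sector=512):
    """Bytes stored on one track."""
    return sectors_per_track * bytes_per_sector
```

At 15,000 rpm the average latency is 2 ms, so an assumed 8 ms seek gives an access time of 10 ms. At 5,400 rpm the latency alone rises to about 5.6 ms, which suggests that faster rotation is one way to improve disk reading speed.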
Redundant Array of Independent Disks
RAID, or Redundant Array of Independent Disks, is a technology for connecting multiple secondary storage devices and using them as a single storage medium.
RAID consists of an array of disks connected together to achieve different goals. RAID levels define how the disk array is used.
RAID 0
In this level, a striped array of disks is implemented. The data is broken down into blocks and the blocks are distributed among the disks. Each disk receives a block of data to write or read in parallel. This enhances the speed and performance of the storage device. There is no parity and no backup in level 0, so a single disk failure loses data.
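Block striping can be sketched as a round-robin placement. This is a simplified model that treats each disk as a Python list; real controllers work with stripe units and device offsets. The function names `stripe` and `locate` are hypothetical.

```python
def stripe(blocks, num_disks):
    """Round-robin placement: logical block i goes to disk i % num_disks."""
    disks = [[] for _ in range(num_disks)]
    for i, block in enumerate(blocks):
        disks[i % num_disks].append(block)
    return disks

def locate(i, num_disks):
    """Return (disk number, position on that disk) for logical block i."""
    return i % num_disks, i // num_disks

disks = stripe(["B0", "B1", "B2", "B3", "B4", "B5"], 3)
```

Consecutive blocks land on different disks, which is what allows them to be read or written in parallel.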
RAID 1
RAID 1 uses mirroring. When data is sent to a RAID controller, it sends a copy of the data to all the disks in the array. RAID level 1 is also called mirroring and provides 100% redundancy in case of a failure.
RAID 2
RAID 2 records an error-correcting code (ECC), a Hamming code, for its data, which is striped across different disks. Unlike level 0, which stripes whole blocks, each data bit in a word is recorded on a separate disk, and the ECC codes of the data words are stored on a different set of disks. Due to its complex structure and high cost, RAID 2 is not commercially available.
RAID 3
RAID 3 stripes the data onto multiple disks. The parity generated for each data word is stored on a separate, dedicated disk. This technique allows the array to survive a single disk failure.
RAID 4
In this level, an entire block of data is written onto the data disks, and then the parity is generated and stored on a different disk. Note that level 3 uses byte-level striping, whereas level 4 uses block-level striping. Both level 3 and level 4 require at least three disks to implement RAID.
RAID 5
RAID 5 writes whole data blocks onto different disks, but the parity generated for each stripe of data blocks is distributed among all the disks rather than stored on a separate dedicated disk.
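The parity used by the parity-based levels can be illustrated with a bytewise XOR. The sketch below shows only how a parity block is computed and how one lost block is rebuilt from it. It does not model how RAID 5 rotates parity across the disks. The helper names `parity` and `rebuild` are hypothetical.

```python
from functools import reduce
from operator import xor

def parity(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*blocks))

def rebuild(surviving_blocks, parity_block):
    """Recover the single missing block by XOR-ing the survivors with the parity."""
    return parity(surviving_blocks + [parity_block])

data = [b"ABCD", b"EFGH", b"IJKL"]
p = parity(data)
recovered = rebuild([data[0], data[2]], p)  # pretend the second block was lost
```

Because XOR is its own inverse, `recovered` equals the lost block `b"EFGH"`. This works for one missing block per stripe, which is why a single parity tolerates only a single disk failure.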
RAID 6
RAID 6 is an extension of level 5. In this level, two independent parities are generated and stored in a distributed fashion among multiple disks. The two parities provide additional fault tolerance. This level requires at least four disk drives.
File Organization
File organization defines how file records are mapped onto disk blocks. There are four types of file organization:
Heap File Organization
When a file is created using heap file organization, the operating system allocates a memory area to that file without any further accounting details.
File records can be placed anywhere in that memory area.
It is the responsibility of the software to manage the records.
A heap file does not support any ordering, sequencing, or indexing on its own.
Sequential File Organization
Every file record contains a data field (attribute) that uniquely identifies that record.
In sequential file organization, records are placed in the file in sequential order based on the unique key field or search key.
In practice, it is not possible to store all the records sequentially in physical form.
Hash File Organization
Hash file organization applies a hash function to some fields of the records.
The output of the hash function determines the disk block where each record is placed.
Clustered File Organization
Clustered file organization is not considered good for large databases. In this mechanism, related records from one or more relations are kept in the same disk block; that is, the ordering of records is not based on the primary key or search key.
Hashing:
Hashing uses hash functions with search keys as parameters to generate the address of a data record.
Bucket - A hash file stores data in buckets. A bucket is considered a unit of storage. A bucket typically stores one complete disk block, which in turn can store one or more records.
Hash function - A hash function, h, is a mapping function that maps the set of all search keys K to the addresses where the actual records are placed. It is a function from search keys to bucket addresses.
B+ Tree:
A B+ tree is a (key, value) storage method in a tree-like structure. A B+ tree has one root, any number of levels of intermediary nodes (usually one), and a level of leaf nodes. All the leaf nodes hold the actual records. Intermediary nodes hold only search-key values and pointers leading to the leaf nodes; they do not hold any data. In the simplest illustration each node has only two children, but in general a node may have more. This is the basis of any B+ tree.
STRUCTURE OF B+ TREE
• A B+ tree index is a multilevel index, but its structure differs from that of the multilevel index-sequential file.
• The bucket structure is used only if the search key does not form a primary key and the file is not sorted in search-key order.
QUERIES ON B+ TREE
• To process a query using a B+ tree, for example to find all the records with a search-key value of k, follow pointers from the root down to the leaf node that would contain k.
• Leaf nodes must have between 2 and 4 values (⌈(n-1)/2⌉ and n-1, with n = 5).
• Non-leaf nodes other than the root must have between 3 and 5 children (⌈n/2⌉ and n, with n = 5).
• The root must have at least 2 children.
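The occupancy limits above follow directly from n, the maximum number of pointers in a node. As a small check, the sketch below evaluates the two formulas for any n. The function name `bplus_bounds` and the dictionary keys are hypothetical, and the root rule applies to a root that is not itself a leaf.

```python
import math

def bplus_bounds(n):
    """Occupancy limits for a B+ tree whose nodes hold up to n pointers."""
    return {
        "leaf_values": (math.ceil((n - 1) / 2), n - 1),
        "nonleaf_children": (math.ceil(n / 2), n),
        "root_min_children": 2,
    }

bounds = bplus_bounds(5)
```

For n = 5 this gives 2 to 4 values per leaf and 3 to 5 children per non-leaf node, matching the figures in the list.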
UPDATES ON B+ TREES
INSERTION
Using the same technique as for lookup, we find the leaf node in which the search-key value would appear. If the search-key value already appears in the leaf node, we add the new record to the file and, if necessary, a pointer to the bucket.
DELETION
Using the same technique as for lookup, we find the record to be deleted and remove it from the file. The search-key value is removed from the leaf node if there is no bucket associated with that search-key value or if the bucket becomes empty as a result of the deletion.
B+ TREE FILE ORGANIZATION
In a B+ tree file organization, the leaf nodes of the tree store records instead of storing pointers to records. Since records are usually larger than pointers, the maximum number of records that can be stored in a leaf node is less than the number of pointers in a non-leaf node.
[Figure: an example of a B+ tree file organization.]
MAIN GOALS OF A B+ TREE:
• Sorted intermediary and leaf nodes
Since it is a balanced search tree, the keys in all nodes are kept sorted.
• Fast traversal and quick search
Any record should be fetched very quickly. This is achieved by keeping the tree balanced, with all leaf nodes at the same distance from the root.
• No overflow pages
A B+ tree allows all the intermediary and leaf nodes to be partially filled; the fill percentage is defined when the B+ tree is designed. In our example, the intermediary node with 108 underflows, and the leaf nodes are not partially filled, hence they overflow. An ideal B+ tree should have no overflow or underflow, except at the root node.
Definition of a B-tree
A B-tree of order m is an m-way tree (i.e., a tree where each node may have up to m children) in which:
1. The number of keys in each non-leaf node is one less than the number of its children, and these keys partition the keys in the children in the fashion of a search tree.
2. All leaves are on the same level.
3. All non-leaf nodes except the root have at least ⌈m/2⌉ children.
4. The root is either a leaf node, or it has from two to m children.
5. A leaf node contains no more than m – 1 keys.
The number m should always be odd.
An example B-Tree
B-Trees 41
[Figure: a B-tree of order 5 containing 26 items]
Root: 26
Level 1: [6 12] [42 51 62]
Leaves under [6 12]: [1 2 4] [7 8] [13 15 18 25]
Leaves under [42 51 62]: [27 29] [45 46 48] [53 55 60] [64 70 90]
Note that all the leaves are at the same level
Constructing a B-tree
• Suppose we start with an empty B-tree and keys arrive in the following order: 1 12 8 2 25 5 14 28 17 7 52 16 48 68 3 26 29 53 55 45
• We want to construct a B-tree of order 5
• The first four items go into the root: 1 2 8 12
• Putting the fifth item in the root would violate condition 5 (a leaf node contains no more than m – 1 keys)
• Therefore, when 25 arrives, pick the middle key to make a new root
Inserting into a B-Tree
• Attempt to insert the new key into a leaf
• If this would result in that leaf becoming too big, split the leaf into two, promoting the middle key to the leaf's parent
• If this would result in the parent becoming too big, split the parent into two, promoting the middle key
• This strategy might have to be repeated all the way to the top
• If necessary, the root is split in two and the middle key is promoted to a new root, making the tree one level higher
Removal from a B-tree
• During insertion, the key always goes into a leaf. For deletion we wish to remove from a leaf. There are three possible ways we can do this:
• 1 - If the key is already in a leaf node, and removing it doesn't cause that leaf node to have too few keys, then simply remove the key to be deleted.
• 2 - If the key is not in a leaf then it is guaranteed (by the nature of a B-tree) that its predecessor or successor will be in a leaf. In this case we can delete the key and promote the predecessor or successor key to the non-leaf deleted key's position.
Analysis of B-Trees
• The maximum number of items in a B-tree of order m and height h:
root: $m - 1$
level 1: $m(m - 1)$
level 2: $m^2(m - 1)$
. . .
level h: $m^h(m - 1)$
• So, the total number of items is
$(1 + m + m^2 + m^3 + \dots + m^h)(m - 1) = \frac{m^{h+1} - 1}{m - 1}(m - 1) = m^{h+1} - 1$
• When $m = 5$ and $h = 2$ this gives $5^3 - 1 = 124$
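The closed form can be checked numerically by adding up the levels one at a time. The function name `max_items` is hypothetical and used only for this check.

```python
def max_items(m, h):
    """Maximum keys in a B-tree of order m and height h, summed level by level."""
    return sum(m**level * (m - 1) for level in range(h + 1))

total = max_items(5, 2)  # 4 + 20 + 100
```

Here `total` is 124, which agrees with 5^3 - 1.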
Static Hashing:
In static hashing, when a search-key value is provided, the hash function always computes the same address.
For example, if a mod-4 hash function is used, it generates only 4 values (0 to 3).
The output address is always the same for a given key.
The number of buckets provided remains unchanged at all times.
Operation
Insertion − When a record is required to be entered using static hashing, the hash function h computes the bucket address for search key K, where the record will be stored.
Bucket address = h(K)
Search − When a record needs to be retrieved, the same hash function can be used to retrieve the address of the bucket where the data is stored.
Delete − This is simply a search followed by a deletion operation.
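The three operations can be sketched with a fixed list of buckets and the mod-4 hash function from the example. This is a simplified in-memory model that assumes integer search keys and lets a bucket grow without limit instead of modelling overflow. The class name `StaticHashFile` and its methods are hypothetical.

```python
class StaticHashFile:
    def __init__(self, num_buckets=4):
        # The number of buckets is fixed for the lifetime of the file.
        self.buckets = [[] for _ in range(num_buckets)]

    def h(self, key):
        """Bucket address = h(K); here a simple modulo hash."""
        return key % len(self.buckets)

    def insert(self, key, record):
        self.buckets[self.h(key)].append((key, record))

    def search(self, key):
        for k, record in self.buckets[self.h(key)]:
            if k == key:
                return record
        return None

    def delete(self, key):
        """A search followed by removal; returns True if a record was removed."""
        bucket = self.buckets[self.h(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False

f = StaticHashFile()
f.insert(10, "a")
f.insert(14, "b")   # 10 and 14 both hash to bucket 2
```

Keys 10 and 14 collide in bucket 2, so a search scans within the bucket after the hash function has picked it.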
Dynamic Hashing
The problem with static hashing is that it does not expand or shrink dynamically as the size of the database grows or shrinks.
Dynamic hashing provides a mechanism in which data buckets are added and removed dynamically and on demand. Dynamic hashing is also known as extendible hashing.
In dynamic hashing, the hash function is made to produce a large number of values, of which only a few are used initially.
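One common way to realise this idea is a directory that doubles on demand. The sketch below is a simplified, in-memory illustration under several assumptions: `zlib.crc32` stands in for the hash function, only the low-order bits of the hash value are used, buckets never merge or shrink, and the bucket capacity of 4 is arbitrary. It also does not handle more than `capacity` keys with identical hash values. The class names `Bucket` and `ExtendibleHash` are hypothetical.

```python
import zlib

class Bucket:
    def __init__(self, depth):
        self.depth = depth      # local depth: number of hash bits this bucket uses
        self.items = {}

class ExtendibleHash:
    def __init__(self, capacity=4):
        self.capacity = capacity            # records per bucket (assumed)
        self.global_depth = 0
        self.directory = [Bucket(0)]        # always 2**global_depth slots

    def _hash(self, key):
        return zlib.crc32(str(key).encode())     # produces a 32-bit value

    def _slot(self, key):
        # Only the low global_depth bits of the hash are used at any time.
        return self._hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._slot(key)].items.get(key)

    def put(self, key, value):
        bucket = self.directory[self._slot(key)]
        if key in bucket.items or len(bucket.items) < self.capacity:
            bucket.items[key] = value
            return
        self._split(bucket)
        self.put(key, value)                # retry after the split

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            self.directory = self.directory + self.directory   # double it
            self.global_depth += 1
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        bit = 1 << (bucket.depth - 1)
        # Slots whose newly used bit is 1 now point to the sibling bucket.
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = sibling
        # Move the records whose newly used hash bit is 1.
        for k in list(bucket.items):
            if self._hash(k) & bit:
                sibling.items[k] = bucket.items.pop(k)

table = ExtendibleHash(capacity=4)
for i in range(40):
    table.put(i, i * i)
```

Only the overflowing bucket is split, and the directory doubles only when that bucket already uses every bit the directory uses, so growth happens on demand rather than by rebuilding the whole file.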
Multiple-Key Access
Use multiple indices for certain types of queries.
Example:
select ID
from instructor
where dept_name = 'Finance' and salary = 80000
Possible strategies for processing the query using indices on single attributes:
1. Use the index on dept_name to find instructors with department name Finance; test salary = 80000.
2. Use the index on salary to find instructors with a salary of $80000; test dept_name = 'Finance'.
3. Use the dept_name index to find pointers to all records pertaining to the Finance department. Similarly, use the index on salary. Take the intersection of both sets of pointers obtained.
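The third strategy can be sketched with two single-attribute indices held as dictionaries of record pointers. The rows below are made-up sample data, list positions stand in for record pointers, and the names `build_index`, `dept_index`, and `salary_index` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows; the position in the list plays the role of a pointer.
instructor = [
    {"ID": "I1", "dept_name": "Finance", "salary": 80000},
    {"ID": "I2", "dept_name": "Finance", "salary": 90000},
    {"ID": "I3", "dept_name": "Music", "salary": 80000},
    {"ID": "I4", "dept_name": "Finance", "salary": 80000},
]

def build_index(rows, attr):
    """Map each attribute value to the set of pointers of rows holding it."""
    index = defaultdict(set)
    for pointer, row in enumerate(rows):
        index[row[attr]].add(pointer)
    return index

dept_index = build_index(instructor, "dept_name")
salary_index = build_index(instructor, "salary")

# Intersect the two pointer sets, then fetch only the matching records.
pointers = dept_index["Finance"] & salary_index[80000]
ids = sorted(instructor[p]["ID"] for p in pointers)
```

Only the rows in the intersection are fetched, which here gives the IDs I1 and I4. Strategies 1 and 2 would instead fetch every Finance row, or every row with a salary of 80000, and filter afterwards.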
Data storage and indexing

Weitere ähnliche Inhalte

Was ist angesagt?

Memory Management in OS
Memory Management in OSMemory Management in OS
Memory Management in OSKumar Pritam
 
File Organization
File OrganizationFile Organization
File OrganizationManyi Man
 
File Management in Operating System
File Management in Operating SystemFile Management in Operating System
File Management in Operating SystemJanki Shah
 
Database performance tuning and query optimization
Database performance tuning and query optimizationDatabase performance tuning and query optimization
Database performance tuning and query optimizationDhani Ahmad
 
Hashing in datastructure
Hashing in datastructureHashing in datastructure
Hashing in datastructurerajshreemuthiah
 
Structure of the page table
Structure of the page tableStructure of the page table
Structure of the page tableduvvuru madhuri
 
Presentation on Segmentation
Presentation on SegmentationPresentation on Segmentation
Presentation on SegmentationPriyanka bisht
 
Distributed database management system
Distributed database management  systemDistributed database management  system
Distributed database management systemPooja Dixit
 
Associative memory 14208
Associative memory 14208Associative memory 14208
Associative memory 14208Ameer Mehmood
 
Query processing and optimization (updated)
Query processing and optimization (updated)Query processing and optimization (updated)
Query processing and optimization (updated)Ravinder Kamboj
 
Indexing and Hashing
Indexing and HashingIndexing and Hashing
Indexing and Hashingsathish sak
 

Was ist angesagt? (20)

Hashing
HashingHashing
Hashing
 
File organization
File organizationFile organization
File organization
 
Distributed database
Distributed databaseDistributed database
Distributed database
 
Memory Management in OS
Memory Management in OSMemory Management in OS
Memory Management in OS
 
Indexing Data Structure
Indexing Data StructureIndexing Data Structure
Indexing Data Structure
 
File Organization
File OrganizationFile Organization
File Organization
 
File Management in Operating System
File Management in Operating SystemFile Management in Operating System
File Management in Operating System
 
Database performance tuning and query optimization
Database performance tuning and query optimizationDatabase performance tuning and query optimization
Database performance tuning and query optimization
 
Hashing in datastructure
Hashing in datastructureHashing in datastructure
Hashing in datastructure
 
File organization
File organizationFile organization
File organization
 
Structure of the page table
Structure of the page tableStructure of the page table
Structure of the page table
 
Presentation on Segmentation
Presentation on SegmentationPresentation on Segmentation
Presentation on Segmentation
 
Distributed database management system
Distributed database management  systemDistributed database management  system
Distributed database management system
 
Disk structure
Disk structureDisk structure
Disk structure
 
raid technology
raid technologyraid technology
raid technology
 
Associative memory 14208
Associative memory 14208Associative memory 14208
Associative memory 14208
 
Query processing and optimization (updated)
Query processing and optimization (updated)Query processing and optimization (updated)
Query processing and optimization (updated)
 
Lec 1 indexing and hashing
Lec 1 indexing and hashing Lec 1 indexing and hashing
Lec 1 indexing and hashing
 
Indexing and Hashing
Indexing and HashingIndexing and Hashing
Indexing and Hashing
 
File system
File systemFile system
File system
 

Ähnlich wie Data storage and indexing

Unit 4 data storage and querying
Unit 4   data storage and queryingUnit 4   data storage and querying
Unit 4 data storage and queryingRavindran Kannan
 
3620121datastructures.ppt
3620121datastructures.ppt3620121datastructures.ppt
3620121datastructures.pptSheejamolMathew
 
CS 2212- UNIT -4.pptx
CS 2212-  UNIT -4.pptxCS 2212-  UNIT -4.pptx
CS 2212- UNIT -4.pptxLilyMkayula
 
lecture 2 notes indexing in application of database systems.pptx
lecture 2 notes indexing in application of database systems.pptxlecture 2 notes indexing in application of database systems.pptx
lecture 2 notes indexing in application of database systems.pptxpeter1097
 
Csci12 report aug18
Csci12 report aug18Csci12 report aug18
Csci12 report aug18karenostil
 
overview of storage and indexing BY-Pratik kadam
overview of storage and indexing BY-Pratik kadam overview of storage and indexing BY-Pratik kadam
overview of storage and indexing BY-Pratik kadam pratikkadam78
 
File organization and introduction of DBMS
File organization and introduction of DBMSFile organization and introduction of DBMS
File organization and introduction of DBMSVrushaliSolanke
 
fileorganizationandintroductionofdbms-210313163900.pdf
fileorganizationandintroductionofdbms-210313163900.pdffileorganizationandintroductionofdbms-210313163900.pdf
fileorganizationandintroductionofdbms-210313163900.pdfFraolUmeta
 
What is Object storage ?
What is Object storage ?What is Object storage ?
What is Object storage ?Nabil Kassi
 
File Structure.pptx
File Structure.pptxFile Structure.pptx
File Structure.pptxzedd15
 

Ähnlich wie Data storage and indexing (20)

Unit 4 data storage and querying
Unit 4   data storage and queryingUnit 4   data storage and querying
Unit 4 data storage and querying
 
3620121datastructures.ppt
3620121datastructures.ppt3620121datastructures.ppt
3620121datastructures.ppt
 
UNIT III.pptx
UNIT III.pptxUNIT III.pptx
UNIT III.pptx
 
Storage struct
Storage structStorage struct
Storage struct
 
CS 2212- UNIT -4.pptx
CS 2212-  UNIT -4.pptxCS 2212-  UNIT -4.pptx
CS 2212- UNIT -4.pptx
 
lecture 2 notes indexing in application of database systems.pptx
lecture 2 notes indexing in application of database systems.pptxlecture 2 notes indexing in application of database systems.pptx
lecture 2 notes indexing in application of database systems.pptx
 
Chapter13
Chapter13Chapter13
Chapter13
 
Csci12 report aug18
Csci12 report aug18Csci12 report aug18
Csci12 report aug18
 
Indexing and hashing
Indexing and hashingIndexing and hashing
Indexing and hashing
 
Unit 08 dbms
Unit 08 dbmsUnit 08 dbms
Unit 08 dbms
 
overview of storage and indexing BY-Pratik kadam
overview of storage and indexing BY-Pratik kadam overview of storage and indexing BY-Pratik kadam
overview of storage and indexing BY-Pratik kadam
 
File organization and introduction of DBMS
File organization and introduction of DBMSFile organization and introduction of DBMS
File organization and introduction of DBMS
 
fileorganizationandintroductionofdbms-210313163900.pdf
fileorganizationandintroductionofdbms-210313163900.pdffileorganizationandintroductionofdbms-210313163900.pdf
fileorganizationandintroductionofdbms-210313163900.pdf
 
DBMS (UNIT 5)
DBMS (UNIT 5)DBMS (UNIT 5)
DBMS (UNIT 5)
 
Unit08 dbms
Unit08 dbmsUnit08 dbms
Unit08 dbms
 
What is Object storage ?
What is Object storage ?What is Object storage ?
What is Object storage ?
 
Database management system session 6
Database management system session 6Database management system session 6
Database management system session 6
 
Ardbms
ArdbmsArdbms
Ardbms
 
File Structure.pptx
File Structure.pptxFile Structure.pptx
File Structure.pptx
 
DMBS Indexes.pptx
DMBS Indexes.pptxDMBS Indexes.pptx
DMBS Indexes.pptx
 

Mehr von pradeepa velmurugan (10)

FIREWALL
FIREWALLFIREWALL
FIREWALL
 
Multimedia compression
Multimedia compressionMultimedia compression
Multimedia compression
 
software design
software designsoftware design
software design
 
DIVIDE AND CONQUER
DIVIDE AND CONQUERDIVIDE AND CONQUER
DIVIDE AND CONQUER
 
IMAGE COMPRESSION
IMAGE COMPRESSIONIMAGE COMPRESSION
IMAGE COMPRESSION
 
File handling in input and output
File handling in input and outputFile handling in input and output
File handling in input and output
 
Analysis Of Attribute Revelance
Analysis Of Attribute RevelanceAnalysis Of Attribute Revelance
Analysis Of Attribute Revelance
 
Scheduling
SchedulingScheduling
Scheduling
 
Instruction codes
Instruction codesInstruction codes
Instruction codes
 
Research Methodology
Research MethodologyResearch Methodology
Research Methodology
 

Kürzlich hochgeladen

Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 

Kürzlich hochgeladen (20)

Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 

Data storage and indexing

  • 1. RELATIONAL DATABASE MANAGEMENT SYSTEM UNIT – III DATA STORAGE AND INDEXING
  • 2. Storage and File Structure. Why do we need to know about storage and file structure? Many database technologies are designed to exploit the storage architecture/hierarchy, and data in the database needs to be organized so that it can be stored and retrieved efficiently.
  • 3. Storage Hierarchy (figure). From fastest and most expensive per unit down to slowest and cheapest: cache and main memory (primary storage, volatile); flash memory and magnetic disk (secondary storage, non-volatile); optical disk and magnetic tape (tertiary storage, non-volatile).
  • 4. Primary Storage (Volatile). Cache: speed 7 to 20 ns (1 nanosecond = 10^-9 seconds); capacity: a typical PC level 2 cache is 64 KB to 2 MB, and within processors the level 1 cache usually ranges from 8 KB to 64 KB. Main memory: speed 10s to 100s of nanoseconds; capacity: up to a few gigabytes widely used currently (per-byte costs have decreased by roughly a factor of 2 every 2 to 3 years).
  • 5. Secondary Storage (Non-volatile). Flash memory: read speed similar to main memory, writes much slower; capacity 32 MB to 512 MB currently; forms: SmartMedia, Memory Stick, Secure Digital, BIOS; cost roughly the same as main memory. Magnetic disk: capacities up to roughly 100 GB currently, growing constantly and rapidly with technology improvements.
  • 6. Tertiary Storage (Non-volatile). Optical storage: CD-ROM (640 MB) and DVD (4.7 to 17 GB) are the most popular forms; CD-RW, DVD-RW, and DVD-RAM are the rewritable variants; reads and writes are slower than with magnetic disk. Jukebox systems, with large numbers of removable disks, a few drives, and a mechanism for automatic loading/unloading of disks, are available for storing large volumes of data. (Slide footer: 1/14/2005, Yan Huang, CSCI5330 Database Implementation, Storage and File Structure.)
  • 7. Indexing. Indexing in database systems is similar to the index we see in books. Indexing is a data structure technique for efficiently retrieving records from database files based on the attributes on which the index has been built. An index is defined by its indexing attributes. Indexing can be of the following types. Primary index: defined on an ordered data file. The data file is ordered on a key field, which is generally the primary key of the relation.
  • 8. Indexing Types. Secondary index: may be generated from a field that is a candidate key and has a unique value in every record, or from a non-key field with duplicate values. Clustering index: defined on an ordered data file where the file is ordered on a non-key field. Ordered indexing comes in two forms, dense and sparse. Dense index: there is an index record for every search-key value in the database. This makes searching faster but requires more space to store the index records themselves. Each index record contains a search-key value and a pointer to the actual record on disk.
  • 10. Sparse Index. In a sparse index, index records are not created for every search key. Each index record contains a search key and a pointer to the data on disk. To find a record, we follow the index entry with the largest search key not exceeding the one we want and reach that location in the data file. If the record we are looking for is not at that location, the system searches sequentially from there until the desired record is found.
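The sparse-index lookup described in slide 10 can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not in the slides: the data file is modelled as a list of sorted in-memory blocks, the index keeps the first key of each block, and the names `blocks`, `sparse_index`, and `sparse_lookup` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical data file: records sorted by search key and grouped into blocks.
blocks = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]

# Sparse index: one entry per block (its first key), not one per record.
sparse_index = [block[0][0] for block in blocks]


def sparse_lookup(key):
    """Return the value stored under key, or None if the key is absent."""
    # Find the last index entry whose key is <= the search key.
    pos = bisect_right(sparse_index, key) - 1
    if pos < 0:
        return None  # key is smaller than every indexed key
    # Sequential search inside the block that the index entry points to.
    for k, value in blocks[pos]:
        if k == key:
            return value
        if k > key:
            break  # block is sorted, so the key cannot appear later
    return None
```

Calling `sparse_lookup(50)` follows the index entry for key 40 and then scans that block until it reaches 50. A dense index would instead hold one entry for each of the nine keys.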
  • 12. Multilevel Indexing. Index records comprise search-key values and data pointers. A multilevel index is stored on disk along with the actual database files. As the database grows, so do its indices. If a single-level index is used, a large index cannot be kept in memory, which leads to multiple disk accesses. A multilevel index breaks the index down into several smaller indices so that the outermost level is small enough to fit in a single disk block.
  • 14. Disk. Hard disk drives are the most common secondary storage devices in present computer systems. They are called magnetic disks because they store information by magnetization. Hard disks are formatted in a well-defined layout to store data efficiently. A hard disk platter has many concentric circles on it, called tracks. Every track is further divided into sectors. A sector on a hard disk typically stores 512 bytes of data.
  • 15. Disk
  • 16. Disk Subsystem. Disk interface standards families: the ATA (AT Attachment) range of standards and the SCSI (Small Computer System Interface) range of standards.
  • 17. Disk Speed. Seek time (milliseconds); rotation time/latency (milliseconds); data-transfer rate (4-8 MB/sec). Typical numbers: 16,000 tracks per platter; 200-400 sectors per track; 512 bytes per sector; 4-16 KB per block; 5,400-15,000 rpm. Access time = seek time + latency. Discuss ways to improve disk reading speed.
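As a rough worked example of the access-time formula on slide 17, the sketch below computes average rotational latency from the spindle speed and adds a seek time. The half-rotation average is the usual textbook approximation. The 8 ms seek time is an assumed value chosen only for illustration, and the function names are hypothetical.

```python
def average_rotational_latency_ms(rpm):
    """Average rotational latency in ms: half of one full rotation."""
    ms_per_rotation = 60_000 / rpm
    return ms_per_rotation / 2


def access_time_ms(seek_ms, rpm):
    """Access time = seek time + rotational latency, as on the slide."""
    return seek_ms + average_rotational_latency_ms(rpm)


# Assumed average seek time of 8 ms; the rpm values are the slide's range limits.
for rpm in (5_400, 15_000):
    print(rpm, "rpm:", round(access_time_ms(8.0, rpm), 2), "ms")
```

Under these assumptions the faster spindle saves a few milliseconds per access, which is why rotational speed matters for random reads.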
  • 18. Redundant Array of Independent Disks. RAID, or Redundant Array of Independent Disks, is a technology for connecting multiple secondary storage devices and using them as a single storage medium. RAID consists of an array of disks in which multiple disks are connected together to achieve different goals. RAID levels define the use of disk arrays.
  • 19. RAID 0. In this level, a striped array of disks is implemented. The data is broken down into blocks and the blocks are distributed among the disks. Each disk receives a block of data to write/read in parallel. This enhances the speed and performance of the storage device. There is no parity and no backup in level 0.
  • 20. RAID 1. RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a copy of the data to all the disks in the array. RAID level 1 is also called mirroring and provides 100% redundancy in case of a failure.
  • 21. RAID 2. RAID 2 records an error-correcting code (ECC), based on Hamming codes, for its data. Data is striped as in level 0, but at the bit level: each data bit in a word is recorded on a separate disk, and the ECC codes of the data words are stored on a different set of disks. Due to its complex structure and high cost, RAID 2 is not commercially available.
  • 22. RAID 3. RAID 3 stripes the data onto multiple disks. The parity bit generated for each data word is stored on a different disk. This technique allows the array to survive a single disk failure.
  • 23. RAID 4. In this level, an entire block of data is written onto data disks and then the parity is generated and stored on a different disk. Note that level 3 uses byte-level striping, whereas level 4 uses block-level striping. Both level 3 and level 4 require at least three disks.
  • 24. RAID 5. RAID 5 writes whole data blocks onto different disks, but the parity blocks generated for each stripe are distributed among all the disks rather than stored on a dedicated parity disk.
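The single-parity levels above (RAID 3, 4, and 5) rely on the fact that XOR parity lets any one missing block be rebuilt from the survivors. The following Python sketch shows that idea on one stripe. The block contents and the helper name `xor_blocks` are made up for illustration, and real controllers work on disk sectors rather than byte strings.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks that make up one stripe.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second block, then rebuild it from the rest plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

The same computation works whichever single block is lost, which is why it does not matter that RAID 5 rotates the parity block across the disks.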
  • 25. RAID 6. RAID 6 is an extension of level 5. In this level, two independent parities are generated and stored in distributed fashion among multiple disks. Two parities provide additional fault tolerance. This level requires at least four disk drives.
  • 26. File Organization. File organization defines how file records are mapped onto disk blocks. There are four types of file organization: heap, sequential, hash, and clustered.
  • 27. Heap File Organization. When a file is created using heap file organization, the operating system allocates a memory area to that file without any further accounting details. File records can be placed anywhere in that memory area. It is the responsibility of the software to manage the records. A heap file does not support any ordering, sequencing, or indexing on its own.
  • 28. Sequential File Organization. Every file record contains a data field (attribute) that uniquely identifies that record. In sequential file organization, records are placed in the file in sequential order based on the unique key field or search key. In practice, it is not possible to store all the records sequentially in physical form.
  • 29. Hash File Organization. Hash file organization applies a hash function to some fields of the records. The output of the hash function determines the disk block where each record is placed.
  • 30. Clustered File Organization. Clustered file organization is not considered good for large databases. In this mechanism, related records from one or more relations are kept in the same disk block; that is, the ordering of records is not based on the primary key or search key.
  • 31. Hashing. Hashing uses hash functions with search keys as parameters to generate the address of a data record. Bucket: a hash file stores data in buckets, and a bucket is considered a unit of storage. A bucket typically corresponds to one complete disk block, which in turn can store one or more records. Hash function: a hash function h is a mapping from the set of all search keys K to the addresses where the actual records are placed, that is, a function from search keys to bucket addresses.
  • 32. B+ Tree. A B+ tree is a (key, value) storage method with a tree-like structure. It has one root, any number of intermediary nodes (usually in one level), and leaf nodes. All leaf nodes hold the actual records. Intermediary nodes hold only keys and pointers toward the leaf nodes; they do not hold any data. In the simplest case each node has only two children. This is the basis of any B+ tree.
  • 33. STRUCTURE OF B+ TREE. A B+ tree index is a multilevel index, but its structure differs from that of the multilevel index-sequential file. The bucket structure is used only if the search key does not form a primary key and the file is not sorted in search-key order.
  • 34. QUERIES ON B+ TREE. Consider how queries are processed using a B+ tree, for example finding all the records with a search-key value of k.
  • 35. Leaf nodes must have between 2 and 4 values (⌈(n-1)/2⌉ and n-1, with n = 5). Non-leaf nodes other than the root must have between 3 and 5 children (⌈n/2⌉ and n, with n = 5). The root must have at least 2 children.
  • 36. UPDATES ON B+ TREES: INSERTION. If the search-key value already appears in the leaf node, we add the new record to the file and, if necessary, a pointer to the bucket.
  • 37. DELETION. Using the same technique as for lookup, we find the record to be deleted and remove it from the file. The search-key value is removed from the leaf node if there is no bucket associated with that search-key value or if the bucket becomes empty as a result of the deletion.
  • 38. B+ TREE FILE ORGANIZATION. In a B+ tree file organization, the leaf nodes of the tree store records instead of storing pointers to records. Since records are usually larger than pointers, the maximum number of records that can be stored in a leaf node is less than the number of pointers in a non-leaf node. (Figure: an example of a B+ tree file organization.)
  • 39. MAIN GOALS OF B+ TREE. Sorted intermediary and leaf nodes: since it is a balanced tree, all nodes should be sorted. Fast traversal and quick search: any record should be fetched very quickly; this is achieved by keeping the tree balanced, with all leaf nodes at the same distance from the root. No overflow pages: a B+ tree allows intermediary and leaf nodes to be partially filled, up to a fill percentage chosen when the tree is designed. In the example figure, the intermediary node with 108 is in underflow, and the leaf nodes are not partially filled, hence in overflow. An ideal B+ tree should have no overflow or underflow except at the root node.
  • 40. Definition of a B-tree. A B-tree of order m is an m-way tree (i.e., a tree where each node may have up to m children) in which: (1) the number of keys in each non-leaf node is one less than the number of its children, and these keys partition the keys in the children in the fashion of a search tree; (2) all leaves are on the same level; (3) all non-leaf nodes except the root have at least ⌈m/2⌉ children; (4) the root is either a leaf node or has from two to m children; (5) a leaf node contains no more than m – 1 keys. The number m should always be odd.
  • 41. An example B-tree (figure): a B-tree of order 5 containing 26 items. Note that all the leaves are at the same level.
  • 42. Constructing a B-tree. Suppose we start with an empty B-tree and keys arrive in the following order: 1 12 8 2 25 5 14 28 17 7 52 16 48 68 3 26 29 53 55 45. We want to construct a B-tree of order 5. The first four items go into the root: 1 2 8 12. Putting the fifth item in the root would violate condition 5. Therefore, when 25 arrives, pick the middle key to make a new root.
  • 43. Inserting into a B-tree. Attempt to insert the new key into a leaf. If this would result in that leaf becoming too big, split the leaf into two, promoting the middle key to the leaf's parent. If this would result in the parent becoming too big, split the parent into two, promoting the middle key. This strategy might have to be repeated all the way to the top. If necessary, the root is split in two and the middle key is promoted to a new root, making the tree one level higher.
  • 44. Removal from a B-tree. During insertion, the key always goes into a leaf. For deletion we wish to remove from a leaf. There are three possible ways we can do this. 1 - If the key is already in a leaf node, and removing it doesn't cause that leaf node to have too few keys, then simply remove the key to be deleted. 2 - If the key is not in a leaf then it is guaranteed (by the nature of a B-tree) that its predecessor or successor will be in a leaf; in this case we can delete the key and promote the predecessor or successor key to the non-leaf deleted key's position.
  • 45. Analysis of B-trees. The maximum number of items in a B-tree of order m and height h: root: m – 1; level 1: m(m – 1); level 2: m^2(m – 1); ...; level h: m^h(m – 1). So the total number of items is (1 + m + m^2 + m^3 + … + m^h)(m – 1) = [(m^(h+1) – 1)/(m – 1)](m – 1) = m^(h+1) – 1. When m = 5 and h = 2 this gives 5^3 – 1 = 124.
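The closed form on slide 45 can be checked numerically by summing the per-level capacities directly. The short Python check below is only a sanity test of the arithmetic; the function name `max_items` is hypothetical.

```python
def max_items(m, h):
    """Sum per-level capacities: level i has m**i nodes of m - 1 keys each."""
    return sum((m ** level) * (m - 1) for level in range(h + 1))


# Slide example: order 5, height 2 gives 4 + 20 + 100 items.
print(max_items(5, 2))

# The level-by-level sum matches the closed form m**(h + 1) - 1.
print(all(max_items(m, h) == m ** (h + 1) - 1
          for m in range(3, 8) for h in range(5)))
```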
  • 46. Static Hashing. In static hashing, when a search-key value is provided, the hash function always computes the same address. For example, if a mod-4 hash function is used, it can generate only 4 values (0 to 3). The output address is always the same for a given key, and the number of buckets remains unchanged at all times. Operation: Insertion: when a record is to be entered using static hashing, the hash function h computes the bucket address for search key K, where the record will be stored.
  • 47. Bucket address = h(K). Search: when a record needs to be retrieved, the same hash function is used to compute the address of the bucket where the data is stored. Delete: this is simply a search followed by a deletion operation.
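The insert, search, and delete operations of static hashing can be sketched as follows. This is a minimal in-memory illustration, assuming integer search keys and a mod-4 hash function. The buckets are plain Python lists rather than disk blocks, overflow handling is left out, and the names `buckets`, `insert`, `search`, and `delete` are hypothetical.

```python
NUM_BUCKETS = 4  # fixed for the lifetime of the file in static hashing

buckets = [[] for _ in range(NUM_BUCKETS)]


def h(key):
    """Bucket address = h(K); a mod-4 hash yields only addresses 0 to 3."""
    return key % NUM_BUCKETS


def insert(key, record):
    buckets[h(key)].append((key, record))


def search(key):
    """Scan only the bucket the hash function points to."""
    for k, record in buckets[h(key)]:
        if k == key:
            return record
    return None


def delete(key):
    """A search followed by removal; returns True if a record was removed."""
    bucket = buckets[h(key)]
    for i, (k, _) in enumerate(bucket):
        if k == key:
            del bucket[i]
            return True
    return False
```

Keys 5 and 9 both hash to bucket 1, so they share a bucket. Because the number of buckets never changes, such collisions become more frequent as the file grows.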
  • 49. Dynamic Hashing. The problem with static hashing is that it does not expand or shrink dynamically as the size of the database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets are added and removed dynamically and on demand. Dynamic hashing is also known as extended (extendible) hashing. The hash function in dynamic hashing is made to produce a large number of values, of which only a few are used initially.
  • 51. Multiple-Key Access. Use multiple indices for certain types of queries. Example: select ID from instructor where dept_name = 'Finance' and salary = 80000. Possible strategies for processing the query using indices on single attributes:
  • 52. Multiple-Key Access. 1. Use the index on dept_name to find instructors with department name Finance; test salary = 80000. 2. Use the index on salary to find instructors with a salary of $80000; test dept_name = 'Finance'. 3. Use the dept_name index to find pointers to all records pertaining to the Finance department. Similarly use the index on salary. Take the intersection of both sets of pointers obtained.
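Strategies 1 and 3 can be compared in a small Python sketch. Everything here is an illustrative assumption: the `instructor` rows are made-up sample data, record pointers are modelled as integer row ids, and each single-attribute index is a dictionary from attribute value to a set of row ids.

```python
from collections import defaultdict

# Hypothetical instructor table: row id -> (ID, dept_name, salary).
instructor = {
    0: ("10101", "Comp. Sci.", 65000),
    1: ("12121", "Finance", 90000),
    2: ("76543", "Finance", 80000),
    3: ("98345", "Elec. Eng.", 80000),
}

# Build one index per attribute, mapping a value to a set of record pointers.
dept_index = defaultdict(set)
salary_index = defaultdict(set)
for rid, (_, dept, salary) in instructor.items():
    dept_index[dept].add(rid)
    salary_index[salary].add(rid)

# Strategy 1: use the dept_name index, then test salary on each fetched record.
strategy1 = sorted(
    instructor[rid][0]
    for rid in dept_index["Finance"]
    if instructor[rid][2] == 80000
)

# Strategy 3: intersect the two pointer sets before fetching any record.
matches = dept_index["Finance"] & salary_index[80000]
strategy3 = sorted(instructor[rid][0] for rid in matches)

print(strategy1, strategy3)
```

Both strategies return the same IDs. The difference is cost: strategy 3 fetches only records that satisfy both conditions, which helps when each condition alone matches many records but few match both.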