2. Files and Data Storage
● Most computers are used for data
processing, a major growth area of the
“information age”.
● Data processing from a computer science
perspective:
– Storage of data
– Organization of data
– Access to data
– Processing of data
3. Data Structures vs File Structures
• Both involve:
– Representation of data
– Operations for accessing data
• Difference:
– Data structures: deal with data in
main memory
– File structures: deal with data in
secondary storage
6. Goals of File Structures
● Minimize the number of trips to
secondary storage (SS) needed to get the
desired information.
● Group related information so that we are
likely to get everything we need with fewer
trips to the SS.
● Select the right file structure so that
performance is improved.
7. File and File Operations
● A file is a collection of data stored on mass
storage such as a hard disk or CD.
● File data consists of records (e.g. student
information), and each record contains a
number of fields (ID, Name, etc.).
● We can perform the following operations on a
file:
– Search for a particular data item in the file.
– Add a data item.
– Remove or update a data item.
8. File and File Operations
– Order the data items according to a
certain criterion; merge files.
– Create new files from existing files.
– Finally, the create, open, and close
operations, which involve the
operating system.
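The record, field, add, and search ideas above can be sketched in C++ as follows. This is a minimal illustration, not a fixed design: the `Student` struct, the text format (one "id name" line per record), and the function names `addRecord` and `findRecord` are assumptions made for the example.

```cpp
#include <fstream>
#include <string>

// One record of the file; the fields follow the slide's example (ID, Name).
// Assumption: the name contains no spaces, so one record fits on one text line.
struct Student {
    int id;
    std::string name;
};

// Add: append one record as the line "id name" at the end of the file.
bool addRecord(const std::string& path, const Student& s) {
    std::ofstream out(path, std::ios::app);
    if (!out) return false;            // the open operation failed
    out << s.id << ' ' << s.name << '\n';
    return static_cast<bool>(out);
}

// Search: read the file record by record until the ID matches.
bool findRecord(const std::string& path, int id, Student& found) {
    std::ifstream in(path);
    Student s;
    while (in >> s.id >> s.name) {
        if (s.id == id) {
            found = s;
            return true;
        }
    }
    return false;                      // reached end of file without a match
}
```

Note that the search has to read every record in front of the one it wants; this is the cost that the later slides on indexed and hashed files try to avoid.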
10. Sequential File Organization
● Records are conceptually organized in a
sequential list and can only be accessed
sequentially.
● The actual storage might or might not be
sequential (on tape or on disk)
11. Sequential File (Write/Read in C++)
● Create an ofstream object (after including
<fstream> at the top).
● Open the file for output, or for appending at
the end of the file.
● Test whether the file open operation of step
2 was successful. If not, exit;
otherwise continue.
● Write data to the file (or read data from it
through an ifstream object).
● Close the file after writing / reading data.
12. Sequential File Implementation
#include <iostream>
#include <fstream>
#include <iomanip>   // setw, setprecision (used when reading back)
#include <string>
#include <cstdlib>   // exit
using namespace std;

const int N = 5;     // number of records

int main() {
    int i, Roll[N] = { 171, 717, 834, 394, 475 };
    float Percentage[N] = { 45.3f, 84.5f, 95.0f, 48.2f, 39.2f };
    const char* Name[N] = { "Wajid", "Aashir", "Luqman", "Tushar", "Waseem" };
    // Step 1: Create ofstream and ifstream objects
    ofstream outFile;
    ifstream inFile;
    // Step 2: Open file for output
    outFile.open("percent.dat", ios::out);
    // Step 3: Test whether the open operation was successful
    if (!outFile) {
        cout << "File could not be opened";
        exit(1);
    } else
        cout << "\nFile opened successfully\n";
    // Step 4: Write to file, one record per line
    for (i = 0; i < N; i++)
        outFile << Name[i] << ' ' << Roll[i] << ' ' << Percentage[i] << endl;
    cout << "\nFile written successfully.\n";
    // Step 5: Close file
    outFile.close();
13. Sequential File Implementation
    // Step 6: Open file for input (the same file written in step 2)
    inFile.open("percent.dat", ios::in);
    // Step 7: Test whether the file opened successfully
    if (!inFile) {
        cout << "File could not be opened" << endl;
        exit(1);
    }
    // Step 8: Read from input file (needs <string> and <iomanip>)
    string name; int roll; float percent;   // holds one record at a time
    cout << left << setw(14) << "Roll Number" << setw(16) << "Name"
         << "Percentage" << endl;
    while (inFile >> name >> roll >> percent)
        cout << left << setw(14) << roll << setw(16) << name
             << right << setw(9) << setprecision(4) << showpoint
             << percent << '%' << endl;
    // Step 9: Close file
    inFile.close();
}
14. Sequential File Implementation
● OUTPUT:
Roll Number   Name            Percentage
171           Wajid               45.30%
717           Aashir              84.50%
834           Luqman              95.00%
394           Tushar              48.20%
475           Waseem              39.20%
15. Indexed File Organization
● Sequential search is even slower on
disk/tape than in main memory, so we try to
improve performance using more
sophisticated data structures.
● An index for a file is a list of key field values
occurring in the file along with the address
of the corresponding record in the mass
storage.
● Typically the key field is much smaller than
the entire record, so the index will fit in
main memory.
16. Indexed File Organization ...
● The index can be organized as a list, a
search tree, a hash table, etc.
● To find a particular record:
– Search the index for the desired key.
– When the search returns the index entry,
extract the record’s address on mass
storage.
– Access the mass storage at the given
address to get the desired record.
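The lookup steps above can be sketched as follows. This is only an illustration under stated assumptions: the data file is text with one record per line, the key (roll number) is the first field on each line, the index is kept in a `std::map` (a search tree), and the names `Index`, `buildIndex`, and `fetchRecord` are hypothetical.

```cpp
#include <fstream>
#include <map>
#include <sstream>
#include <string>

// Index: key field value -> address (byte offset) of the record in the file.
using Index = std::map<int, std::streampos>;

// Build the index with one sequential pass over the data file.
Index buildIndex(const std::string& path) {
    Index index;
    std::ifstream in(path, std::ios::binary);
    std::string line;
    while (true) {
        std::streampos offset = in.tellg();   // address of the next record
        if (!std::getline(in, line)) break;   // end of file
        std::istringstream fields(line);
        int key;
        if (fields >> key) index[key] = offset;
    }
    return index;
}

// Search the index for the key, extract the address, then access the
// file at that address to read the desired record.
bool fetchRecord(const std::string& path, const Index& index,
                 int key, std::string& record) {
    Index::const_iterator it = index.find(key);
    if (it == index.end()) return false;      // key not in the index
    std::ifstream in(path, std::ios::binary);
    in.seekg(it->second);                     // jump straight to the record
    return static_cast<bool>(std::getline(in, record));
}
```

Once the index is in main memory, each lookup costs one tree search plus a single seek, instead of a scan over the whole file.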
18. Hashed File Organization
● A hashed file uses a hash function to map
the key to the address.
● This eliminates the need for an index (an
extra file) and all of the overhead
associated with it.
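As a rough sketch of how a key can be turned into an address, assume the file is divided into fixed-length record slots. The slot count `kSlots`, the record length `kRecordSize`, and the modulo hash below are all assumptions chosen for the example, not values from the slides.

```cpp
#include <cstddef>

const std::size_t kSlots = 101;       // hypothetical number of record slots in the file
const std::size_t kRecordSize = 64;   // hypothetical fixed record length in bytes

// Hash function: map a key to a slot number between 0 and kSlots - 1.
std::size_t hashKey(unsigned key) {
    return key % kSlots;
}

// Address of the record: the byte offset of its slot in the file.
std::size_t recordAddress(unsigned key) {
    return hashKey(key) * kRecordSize;
}
```

For example, key 171 hashes to slot 171 % 101 = 70, so its record would start at byte 70 * 64 = 4480. Two keys can hash to the same slot (171 and 272 both give 70), which is why a collision resolution scheme is needed.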
19. Collision Resolution (Separate Chaining)
● Use an array of M < N linked lists (N =
number of keys); a good choice is M ≈ N/10.
● Hash: map the key to an integer i between 0
and M-1.
● Insert: put the key at the front of the i-th
chain (if not already there).
● Search: only the i-th chain needs to be
searched.
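A minimal in-memory sketch of separate chaining is given below. It assumes string keys and uses `std::hash` as the hash function; the class name `ChainedSet` and its members are hypothetical, and M must be greater than zero.

```cpp
#include <cstddef>
#include <forward_list>
#include <functional>
#include <string>
#include <vector>

class ChainedSet {
public:
    // m = number of chains (M); must be greater than zero.
    explicit ChainedSet(std::size_t m) : chains_(m) {}

    // Insert: put the key at the front of its chain, if not already there.
    void insert(const std::string& key) {
        if (!contains(key)) chains_[slot(key)].push_front(key);
    }

    // Search: only the i-th chain has to be examined.
    bool contains(const std::string& key) const {
        for (const std::string& k : chains_[slot(key)])
            if (k == key) return true;
        return false;
    }

private:
    // Hash: map the key to an integer i between 0 and M-1.
    std::size_t slot(const std::string& key) const {
        return std::hash<std::string>{}(key) % chains_.size();
    }

    std::vector<std::forward_list<std::string>> chains_;
};
```

With M chains and N keys, each chain holds about N/M keys on average, so a search looks at roughly that many keys instead of all N.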
23. Collision Resolution (Open Addressing)
● Use an array of size M > N (N = number of
keys); a good choice is M ≈ 2N.
● Hash: map the key to an integer i between 0
and M-1.
● Insert: put the key in slot i if it is free; if
not, try i+1, i+2, etc.
● Search: search slot i; if it is occupied but
does not match, try i+1, i+2, etc.
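The probing scheme above (linear probing) can be sketched as follows. The sketch assumes integer keys, a simple modulo hash, wrap-around from the last slot to slot 0, and no deletions; the class name `ProbingSet` is hypothetical, and M must be greater than zero.

```cpp
#include <cstddef>
#include <vector>

class ProbingSet {
public:
    // m = table size (M); must be greater than zero.
    explicit ProbingSet(std::size_t m) : keys_(m), used_(m, false) {}

    // Insert: put the key in slot i if free; if not, try i+1, i+2, etc.
    // Returns false only when the table is full.
    bool insert(int key) {
        std::size_t i = hash(key);
        for (std::size_t n = 0; n < keys_.size(); ++n) {
            if (!used_[i]) {
                keys_[i] = key;
                used_[i] = true;
                return true;
            }
            if (keys_[i] == key) return true;   // already stored
            i = (i + 1) % keys_.size();         // wrap around at the end
        }
        return false;
    }

    // Search: look in slot i; if occupied but no match, try i+1, i+2, etc.
    bool contains(int key) const {
        std::size_t i = hash(key);
        for (std::size_t n = 0; n < keys_.size(); ++n) {
            if (!used_[i]) return false;        // a free slot ends the search
            if (keys_[i] == key) return true;
            i = (i + 1) % keys_.size();
        }
        return false;
    }

private:
    // Hash: map the key to an integer i between 0 and M-1.
    std::size_t hash(int key) const {
        return static_cast<std::size_t>(static_cast<unsigned>(key)) % keys_.size();
    }

    std::vector<int> keys_;
    std::vector<bool> used_;
};
```

Keeping the table about half full (M ≈ 2N) keeps the runs of occupied slots short, so most inserts and searches need only a few probes.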