Data Storage in DBMS
Data Storage and Querying in DBMS
At this point, it is worth while to note the difference between the terms file Organisation and the access method. A file organisation refers to the organisation of the data of a file into records, blocks, and access structures; this incbe and access structures: this includes the way records and blocks are placed on the storage medium and interlinked. An access method, on the other hand, is the way how the data can be retrieved based on the file Organisation. Data Storage in DBMS
Mostly the databases are stored. persistently on magnetic disks for the reasons given below:
- The databases being very large may not fit completely in the main memory.
- Storing the data permanently using the non-volatile storage and provide access to the users with the help of front end applications.
- Primary storage is considered to be very expensive and in order to cut short the cost of the storage per unit of data to substantially less. Data Storage in DBMS
Each hard drive is usually composed of a set of disk platters. Each disk platter has a layer of magnetic material deposited on its surface. The entire disk can contain a large amount of data, which is organised into smaller packages called BLOCKS (or pages). On most computers, one block is equivalent to 1 KB of data (= 1024 Bytes). Data Storage in DBMSDat a Storage in DBMS
A block is the smallest unit of data transfer between the hard disk and the processor of the computer. Each block therefore has a fixed, assigned, address. Typically, the computer processor will submit a read/write request, which includes the address of the block, and the address of RAM in the computer memory area called a buffer (or cache) where the data must be stored / taken from. The processor then reads and modifies the buffer data as required, and, if required, writes the block back to the disk. Let us see how the tables of the database are stored on the hard disk. Data Storage in DBMS
How are tables stored on Disk?
We realise that each record of a table can contain different amounts of data. This is because in some records, some attribute values may be ‘null’. Or, some attributes may be of type varchar(), and therefore each record may have a different length string as the value of this attribute. Therefore, the record is stored with each subsequent attribute separated by the next by a special ASCII character called a field separator. Of course, in each block, we may place many records. Each record is separated from the next, again by another special ASCII character called the record separator. Let us see:in the next section about the types of file Organisation briefly. Data Storage in DBMS
FILE ORGANISATION AND ITS TYPES
Just as arrays, lists, trees and other data structures are used to implement data Organisation in main memory, a number of strategies are used to support the support the Organisation of data in secondary memory. A file organisation is a technique to organise data in the secondary memory. In this section, we are concerned with obtaining data representation for files on external storage devices so that required functions (e.g. retrieval, update) may be carried out effictently. Data Storage in DBMS
File Organisation is a way of arranging the records in a file when the file is stored on the disk. Data files are organized so as to facilitate access to records and to ensure their efficient storage. A tradeoff between these two requirements generally exists: if örapid access is required, more storage is required to make it possible. Selection of File Organisations is dependant on two factors as shown below: Data Storage in DBMS
- Typical DBMS applications need a small subset of the DB at any given time. •
- When a portion of the data is needed it must be located on disk, copied to memory for processing and rewritten to disk if the data was modified. Data Storage in DBMS
A file of record is likely to be accessed and modified in a variety of ways, and different ways of arranging the records enable different operations over the file to be carried out efficiently. A DBMS supports several file Organisation techniques. The important task of the DBA is to choose a good Organisation for each file, based on its type of use. Data Storage in DBMS
The particular organisation most suitable for any application will depend upon such factors as the kind of external storage available, types of queries allowed, number of keys, mode of retrieval and mode of update. The Figurel illustrates different file organisations based on an access key. Data Storage in DBMS
Let us discuss some of these techniques in more detail:
Heap files tutordered file)
Basically these files are unordered files. It is the simplest and most basic type. These files consist of randomly ordered records. The records will have no particular order. The operations we can perform on the records are insert, retrieve and delete. The features of the heap file or the pile file Organisation are: Data Storage in DBMS
- New records can be inserted in any empty space that can accommodate them.
- When old records are deleted, the occupied space becomes empty and available for any new insertion.
- If updated records grow; they may need to be relocated (moved) to a new empty space. This needs to keep a list of empty space.
Advantages of heap files
- This is a simple file Organisation method.
- Insertion is somehow efficient.
- Good for bulk-loading data into a table. Data Storage in DBMS
- Best if file scans are common or insertions are frequent.
Disadvantages of heap files
- Retrieval requires a linear search and is inefficient. Data Storage in DBMS
- Deletion can result in unused space/need for reorganisation.
Sequential File Organisation
The most basic way to organise the collection of records in a file is to use sequential Organisation. Records of the file are stored in sequence by the primary key field values.
They are accessible only in the order stored, i.e., in the primary key order. This kind of file Organisation works well for tasks which need to access nearly every record in a file, e.g., payroll. Let us see the advantages and disadvantages of it. In a sequentially organised file records are written consecutively when the file is created and must be accessed consecutively when the file is later used for input (Figure 2).
A sequential file maintains the records in the logical sequence of its primary key values. Sequential files are inefficient for random access, however, are suitable for sequential access. A sequential file can be stored on devices like magnetic tape that allow sequential access.
On an average, to search a record in a sequential file would require to look into half of the records of the file. However, if a sequential file is stored on a disk (remember disks support direct access of its blocks) with keyword stored separately from the rest of record, then only those disk blocks need to be read that contains the desired record or records. This type of storage allows binary search on sequential file blocks, thus enhancing the speed of access.
Updating a sequential file usually creates a new file so that the record sequence on primary key is maintained. The update operation first copies the records till the record after which update is required into the new file and then the updated record is put followed by the remainder of records. Thus method of updating a sequential file automatically creates a backup copy.
Additions in the sequential files are also handled in a similar manner to update. Adding a record requires shifting of all records from the point of insertion to the end of file to create space for the new record. On the other hand deletion of a record requires a compression of the file space.
The basic advantages of sequential file is the sequential processing, as next record is easily accessible despite the absence of any data structure. However, simple queries are time consuming for large files. A single update is expensive as new file must be created, therefore, to reduce the cost per update, all updates requests are sorted in the order of the sequential file. This update file is then used to update the sequential file in a single go. The file containing the updates is sometimes referred to as a transaction file.
This process is called the batch mode of updating. In this mode each record of master sequential file is checked for one or more possible updates by comparing with the update information of transaction file. The records are written to new master file in the sequential manner. A record that require multiple update is written only when all the updates have been performed on the record. A record that is to be deleted is not written to new master file. Thus, a new updated master file will be created from the transaction file and old master file.
Thus, update, insertion and deletion of records in a sequential file require a new file creation. Can we reduce creation of this new file? Yes, it can easily be done if the original sequential file is created with holes which are empty records spaces as shown in the Figure 3. Thus, a reorganisation can be restricted to only a block that can be done very easily within the main memory. Thus, holes increase the performance of sequential file insertion and deletion.
This organisation also support a concept of overflow area, which can store the spilled over records if a block is full. This technique is also used in index sequential file organisation. A detailed discussion on it can be found in the further readings.
Advantages of Sequential File Organisation
- It is fast and efficient when dealing with large volumes of data that need to be processed periodically (batch system).
Disadvantages of sequential File Organisation
- Requires that all new transactions be sorted into the proper sequence for sequential access processing.
- Locating, storing, modifying, deleting, or adding records in the file require rearranging the file.
- This method is too slow to handle applications requiring immediate updating or responses.
Indexed (Indexed Sequential) File Organisation
It organises the file like a large dictionary, i.e., records are stored in order of the key but an index.is kept which also permits a type of direct access. The records are stored sequentially by primary key values and there is an index built over the primary key field
The retrieval of a record from a sequential file, on average, requires access to half the records in the file, making such inquiries not only inefficient but very time consuming for large files, To improve the query response time of a sequential file, a type of indexing technique can be added.
An index is a set of index value, address pairs. Indexing associates a set of objects to a set of orderable quantities, that are usually smaller in number or their properties. Thus, an index is a mechanism for faster search. Although the indices and the data blocks are kept together physically, they are logically distinct. Let us use the term an index file to describes the indexes and let us refer to data files as data records. An index can be small enough to be read into the main memory.
A sequential (or sorted on primary keys) file that is indexed on its primary key is called an index sequential file. The index allows for random access to records, while the sequential storage of the records of the file provides easy access to the sequential records. An additional feature of this file system is the over flow area. The overflow area provides additional space for record addition without the need to create.
Hashed File Organisation
Hashing is the most common form of purely random access to a file or database. It is also used to access columns that do not have an index as an optimisation technique. Hash functions calculate the address of the page in which the record is to be stored based on one or more fields in the record. The records in a hash file appear randomly distributed across the available space. It requires some hashing algorithm and the technique. Hashing Algorithm converts a primary key value into a record address. The most popular form of hashing is division hashing with chained overflow.
- Advantages of Hashed file Organisation
- Insertion or search on hash-key is fast.
- Best if equality search is needed on hash-key.
Disadvantages of Hashed file Organisation
- It is a complex file Organisation method.
- Search is slow.
- It suffers from disk space overhead.
- Unbalanced buckets degrade performance.
- Range search is slow.