A simple definition of a regular file would be that it is a one-dimensional array of bytes stored on a disk or other mass storage device.
A program that uses a file needs to know the structure of the file and needs to interpret the file contents itself. This is because no structure is imposed on the contents of a file by the operating system. This is a very powerful feature, as it means that you can work with any file that you need to work on, e.g. a DOS file.
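As a small illustration of the point that the interpretation belongs to the program and not to the operating system, the sketch below reads the same four bytes in three different ways. The byte string and the variable names are made up for the example.

```python
import struct

raw = b"ABCD"  # four arbitrary bytes; the kernel attaches no meaning to them

as_text = raw.decode("ascii")          # interpreted as ASCII text
as_int = struct.unpack("<I", raw)[0]   # interpreted as one little-endian 32-bit integer
as_pairs = struct.unpack("<HH", raw)   # interpreted as two little-endian 16-bit integers

print(as_text, as_int, as_pairs)
```

Which of the three readings is "correct" depends entirely on what the program that wrote the bytes intended.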
Files are presented to the application as a stream of bytes and then an EOF condition.
However the EOF condition is not a character stored in the file; the stream of bytes is simply as long as the file size and then it ends. In other words, EOF is the condition that arises when a program attempts to read at an offset that is at or beyond the file size.
There are many different types of regular files: text, binary, executable, etc. When you use the file command to establish a file type, the command consults the magic database. If you get a chance, have a look at the magic file and see how many different types of files Linux can identify.
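The end-of-file behaviour can be observed from a program. The following is a minimal sketch, assuming a scratch file created with Python's tempfile module; the function name read_past_end is invented for the example.

```python
import os
import tempfile


def read_past_end(payload):
    """Write payload to a scratch file, read it back, then keep reading."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        with open(path, "rb") as fh:
            data = fh.read()             # every byte up to the file size
            at_end = fh.read()           # nothing left, so the result is empty
            fh.seek(len(payload) + 100)  # an offset well beyond the file size
            beyond = fh.read()           # still empty; no marker byte comes back
        return data, at_end, beyond
    finally:
        os.remove(path)


data, at_end, beyond = read_past_end(b"hello")
print(data, at_end, beyond)
```

An empty result from a read is how the program learns that it has reached the end; nothing extra is stored in the file to mark it.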
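The idea behind the magic database can be sketched as a lookup of the first few bytes of a file against known signatures. The table below is a tiny hand-written stand-in with four well-known signatures, not the real magic file, and guess_type is a hypothetical helper; the file command itself applies far more rules than this.

```python
# A toy signature table: (leading bytes, description).
SIGNATURES = [
    (b"\x7fELF", "ELF executable or object"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\x1f\x8b", "gzip compressed data"),
    (b"#!", "script with an interpreter line"),
]


def guess_type(header):
    """Return a description for the first signature that matches header."""
    for magic, description in SIGNATURES:
        if header.startswith(magic):
            return description
    return "unknown"


print(guess_type(b"\x7fELF\x02\x01\x01"))
print(guess_type(b"#!/bin/sh\n"))
print(guess_type(b"just some text"))
```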
A regular file is referenced by an inode number (see the section called “Inodes”).
A simple definition of a directory is that it is a file that provides a mapping mechanism between the names of files and the files (datablocks) themselves.
A directory is itself a type of file. Its purpose is really to make sure that there is a good structure maintained on the system - sort of like a good filing system.
The directory only holds inode numbers and filenames. Yet this is vitally important, as it is the only place where a filename is associated with its inode number.
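The name-to-inode mapping held by a directory can be listed from a program. This is a minimal sketch using os.scandir on a scratch directory; the file names and the helper name_to_inode are invented for the example.

```python
import os
import tempfile


def name_to_inode(directory):
    """Return the directory's entries as a {filename: inode number} mapping."""
    with os.scandir(directory) as entries:
        return {entry.name: entry.inode() for entry in entries}


with tempfile.TemporaryDirectory() as scratch:
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(scratch, name), "w") as fh:
            fh.write(name)
    mapping = name_to_inode(scratch)
    # Cross-check against the inode number reported by stat for each file.
    expected = {
        name: os.stat(os.path.join(scratch, name)).st_ino
        for name in ("a.txt", "b.txt")
    }

print(mapping)
```

The same pairs are what ls -i prints for a directory.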
If you delete a file from a directory, the inode number in that directory entry is zeroed and the entry is then called a shadow inode. The inode itself is freed once no other directory entries (links) refer to it.
A device file refers to a device driver and these are important to the kernel. The device drivers are written in C, compiled to make an object file and then placed as part of the kernel. A device file is created using the mknod command.
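The relationship between directory entries and the inode can be watched through the link count. The sketch below, assuming a filesystem that supports hard links, gives one file two names, removes one of them and shows that the data is still reachable through the other. The names first and second are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    first = os.path.join(scratch, "first")
    second = os.path.join(scratch, "second")

    with open(first, "w") as fh:
        fh.write("data")

    os.link(first, second)                   # a second name for the same inode
    links_before = os.stat(first).st_nlink   # two directory entries now refer to it

    os.unlink(first)                         # remove one directory entry
    links_after = os.stat(second).st_nlink   # one entry left, inode still in use

    with open(second) as fh:
        survived = fh.read()

print(links_before, links_after, survived)
```

Only when the last remaining name is removed does the link count reach zero and the inode become available for reuse.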
The files in /dev are used to ensure that we can access hardware such as the printer, cdrom, network etc.
If you look at the way Linux uses a device driver, it handles many of the functions that we could compare to the way DOS uses the BIOS. However the differences are often the reason why a piece of hardware that would work with a DOS related system will not work with a Unix or Linux related system. Linux will either see or not see a non-standard piece of hardware.
Here we can read and write directly to the device. The user issues a system call to a device and the kernel performs an open on that device. If the device is busy the read/write routine cannot operate; if it is not busy, the read or write goes directly to that device.
There are different types of device files:
Character device files - data is written to and read from the device a character at a time. Indicated by a "c" in the first field of a long listing. Very little preliminary processing is required by the kernel, so the request is passed directly to the device.
Block device files - a block device only receives a request once block buffering has taken place in the kernel. Indicated by a "b" in the first field of a long listing. A filesystem is an example of a block buffering device. Block devices generally have an associated character device - for example, if you format a diskette you would use the character device file name, whereas if you back up to that diskette you would use the block device name, where the blocking is handled in the kernel buffer cache.
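The "c" and "b" indicators come from the file type bits stored with the file mode, and a program can test them directly. This is a small sketch assuming a Linux system where /dev/null exists; device_kind is a hypothetical helper name.

```python
import os
import stat


def device_kind(path):
    """Classify path as a character device, a block device, or neither."""
    mode = os.stat(path).st_mode
    if stat.S_ISCHR(mode):
        return "character"
    if stat.S_ISBLK(mode):
        return "block"
    return "not a device"


null_kind = device_kind("/dev/null")
root_kind = device_kind("/")
# stat.filemode builds the same string that a long listing shows in its first field.
first_field = stat.filemode(os.stat("/dev/null").st_mode)

print(null_kind, root_kind, first_field)
```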
It is possible in Linux to set up device files that allow communication between processes. Some of these are pipes, semaphores and shared memory devices, and we have already seen the effectiveness of using un-named pipes.
Theoretically these are special files and they are handled in a similar way to the device files; however, they are not true device files. They are created when needed and then removed.
The scope of these definitions moves into development, and therefore we are only going to discuss the theory here. It is nevertheless important for the system administrator to have an understanding of these features.
We have thus far used pipes as un-named pipes when doing shell scripting; however, there are also named pipes available for us to use.
In an un-named pipe situation the processes are being run by the same process group and by the same user. e.g.
ls -li | more
A named pipe would be created as a node using the mknod command, and you could use it when running processes within different process groups. The named pipe has to be specified to both processes separately, e.g. when the output from one process becomes the input to another process running in the background.
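A pipe that exists as a node in the filesystem can be sketched as follows. This is an illustration only: it uses os.mkfifo in place of the mknod command, and a thread stands in for the second process so that the example fits in one file. The path demo_fifo and the function send_through_fifo are invented names.

```python
import os
import tempfile
import threading


def send_through_fifo(message):
    """Create a FIFO node, write message into it and read it back out."""
    with tempfile.TemporaryDirectory() as scratch:
        fifo_path = os.path.join(scratch, "demo_fifo")
        os.mkfifo(fifo_path)  # the pipe now has a name in the filesystem

        def writer():
            # Opening for writing blocks until a reader opens the same node.
            with open(fifo_path, "wb") as out:
                out.write(message)

        producer = threading.Thread(target=writer)
        producer.start()

        # The reader refers to the pipe by the same path as the writer.
        with open(fifo_path, "rb") as src:
            received = src.read()  # returns once the writer has closed its end

        producer.join()
        return received


print(send_through_fifo(b"output of one process"))
```

Both ends are given the path of the pipe separately, which is what allows unrelated processes to use it.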
The shortest definition of a semaphore would be that it is a data structure used by several processes to control and synchronize the operations on one resource.
e.g. If more than one user is accessing a record in a database, it is likely that a semaphore would either grant exclusive access to each process in turn, or lock the record being updated.
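The "exclusive access in turn" idea can be sketched with a semaphore whose count is one. Note the assumptions: this uses Python's threading.Semaphore between threads of a single program, not the kernel's inter-process semaphores, and the record is just a dictionary standing in for a database record.

```python
import threading

record = {"balance": 0}                 # stand-in for a shared database record
record_guard = threading.Semaphore(1)   # count of one: a single holder at a time


def deposit(times):
    """Update the shared record, taking the semaphore around each change."""
    for _ in range(times):
        with record_guard:              # acquire; released when the block ends
            record["balance"] += 1


workers = [threading.Thread(target=deposit, args=(1000,)) for _ in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(record["balance"])
```

Because every update happens while the semaphore is held, the four workers cannot interleave in the middle of a change to the record.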
A shared memory file is indicated by an "m" in the first field. A piece of user memory is allocated as a work-space, where it is possible for one process to read the data at the same time as another writes it.
Each device is referenced by numbers, which are read when the kernel needs to use a device and its associated device driver.
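A shared work-space of this kind can be sketched with Python's multiprocessing.shared_memory module (Python 3.8 or later). This is an assumption-laden illustration: both handles are opened in the same process to keep the example short, whereas in practice the second handle would be opened by another process using the segment's name.

```python
from multiprocessing import shared_memory

# Create a 16-byte segment; the system gives it a name others can attach to.
segment = shared_memory.SharedMemory(create=True, size=16)
try:
    segment.buf[:5] = b"hello"          # the "writer" places data in the work-space

    # A second handle attached by name sees the same memory, not a copy.
    view = shared_memory.SharedMemory(name=segment.name)
    try:
        seen = bytes(view.buf[:5])
    finally:
        view.close()
finally:
    segment.close()
    segment.unlink()                    # remove the segment once it is no longer needed

print(seen)
```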
These numbers are called the major and minor device numbers.
The major device number refers to the device driver that should be used to access the device and it is a unique number, whereas the minor device number points to the device itself.
If you do a long listing in the /dev directory you will see the major and minor numbers, separated by a comma, in the position where the file size is normally shown in the long listing report.
As a file is created, a unique number is assigned to that file and this is called an inode number. The inode itself holds specific information about the file, such as:
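The same two numbers can be read from a program. The sketch below assumes a Linux system where /dev/null exists and uses os.major and os.minor to split the device identifier recorded for a device file; device_numbers is a hypothetical helper name.

```python
import os
import stat


def device_numbers(path):
    """Return (major, minor) for a device file, or raise if path is not one."""
    info = os.stat(path)
    if not (stat.S_ISCHR(info.st_mode) or stat.S_ISBLK(info.st_mode)):
        raise ValueError(path + " is not a device file")
    # st_rdev packs both numbers; os.major and os.minor unpack them.
    return os.major(info.st_rdev), os.minor(info.st_rdev)


major, minor = device_numbers("/dev/null")
print(major, minor)
```

The pair printed here should match the two numbers shown for /dev/null by ls -l.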
The permission mode assigned to that file (at creation time this would have been determined using the umask in effect)
The number of links in place for the file
The file owner's UID number
The group GID number
The file size represented in bytes
The address of the datablocks (or major and minor device numbers)
The time the file was last modified
The time the file was last accessed (useful for housekeeping: if a file has not been accessed for 3 years, maybe we could remove it)
The time any part of the inode was changed
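Most of the fields listed above can be read back with a stat call. This is a minimal sketch using os.stat on a scratch file; describe_inode and the file name sample.txt are invented for the example, and the datablock addresses are not exposed through this interface.

```python
import os
import stat
import tempfile
import time


def describe_inode(path):
    """Collect the inode fields that os.stat exposes for path."""
    info = os.stat(path)
    return {
        "inode": info.st_ino,
        "mode": stat.filemode(info.st_mode),      # e.g. -rw-r--r--
        "links": info.st_nlink,
        "uid": info.st_uid,
        "gid": info.st_gid,
        "size": info.st_size,                     # in bytes
        "modified": time.ctime(info.st_mtime),    # contents last modified
        "accessed": time.ctime(info.st_atime),    # contents last accessed
        "changed": time.ctime(info.st_ctime),     # inode last changed
    }


with tempfile.TemporaryDirectory() as scratch:
    sample = os.path.join(scratch, "sample.txt")
    with open(sample, "w") as fh:
        fh.write("hello inode")                   # 11 bytes
    details = describe_inode(sample)

for field, value in details.items():
    print(field, value)
```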
When an inode resides on the disk it is called a disk inode. When a file is opened, however, the kernel puts the inode onto a generic inode table and the inode is then called a generic inode. Inodes are stored on the generic table with a link to a hash queue, and this is stored in a table that is referenced by the kernel each time it opens a file, to make sure that the file is not already open.