Read system call processing in user space
The Linux system call interface (SCI) works as a many-to-one aggregation followed by a one-to-many dispatch. The aggregation point is the entry point of interrupt 0x80 (on the x86 architecture): every system call issued from user space converges on the 0x80 interrupt, and the number of the specific system call is saved along with it. When the 0x80 interrupt handler runs, it uses that number to dispatch each system call to its own kernel function. For more on system calls, see Resources.
The read system call is no exception. When it is invoked, the library function saves the read system call number and the call's parameters, then traps into the kernel through the 0x80 interrupt. At that point the library function's work is done, and the user-space part of the read system call is complete.
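To make the idea of "a system call number plus parameters" concrete, here is a minimal sketch that issues read by number instead of through the usual read() wrapper. It assumes a Linux system with glibc, whose syscall() function and SYS_read constant are used here; raw_read is a name chosen for this example. On current systems the library may enter the kernel with a faster instruction than interrupt 0x80, but the number-plus-arguments convention is the same.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Issue read by system call number, bypassing the read() wrapper.
 * raw_read is a hypothetical helper name used only in this sketch. */
static ssize_t raw_read(int fd, void *buf, size_t count)
{
    assert(buf != NULL);
    /* SYS_read is the system call number; the rest are its parameters. */
    return (ssize_t)syscall(SYS_read, fd, buf, count);
}
```

Calling raw_read(fd, buf, n) behaves like read(fd, buf, n); the only difference is that the system call number is visible in the source.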
Read system call processing in kernel space
Once the 0x80 interrupt handler takes over, it first checks the system call number, then uses that number as an index into the system call table to obtain sys_read, the kernel function that handles the read system call. Finally it passes the parameters and runs sys_read. Only now does the kernel really begin processing the read system call (sys_read is the kernel entry point for the read system call).
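The table lookup described above can be sketched as an array of function pointers indexed by system call number. This is a user-space illustration only: the numbers, the demo_ handlers, and the three-argument signature are assumptions made for the example, not the kernel's actual table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical system call numbers; the real table is far larger. */
enum { NR_DEMO_READ = 0, NR_DEMO_WRITE = 1, NR_DEMO_MAX = 2 };

typedef long (*syscall_fn)(long a, long b, long c);

/* Stand-in handlers: they only echo the count so the dispatch is observable. */
static long demo_sys_read(long fd, long buf, long count)
{
    (void)fd; (void)buf;
    return count;
}

static long demo_sys_write(long fd, long buf, long count)
{
    (void)fd; (void)buf;
    return -count;
}

static const syscall_fn demo_call_table[NR_DEMO_MAX] = {
    [NR_DEMO_READ]  = demo_sys_read,
    [NR_DEMO_WRITE] = demo_sys_write,
};

/* Single entry point: validate the number, look up the handler, call it. */
static long demo_dispatch(long nr, long a, long b, long c)
{
    assert(sizeof demo_call_table / sizeof demo_call_table[0] == NR_DEMO_MAX);
    if (nr < 0 || nr >= NR_DEMO_MAX || demo_call_table[nr] == NULL)
        return -1; /* stands in for an "unknown system call" error */
    return demo_call_table[nr](a, b, c);
}
```

All callers enter through demo_dispatch (the aggregation point), and the number alone decides which handler runs (the dispatch).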
This part first introduces the layered model the kernel uses to process disk requests, then describes, from top to bottom, how each layer handles a disk read request.
Layered model of read system call processing in kernel space
Figure 1 shows the layers that a read system call passes through in kernel space. As the figure shows, a read request for a disk first passes through the virtual file system layer (VFS layer), then a specific file system layer (for example, ext2), then the cache layer (page cache layer), the generic block layer, the I/O scheduling layer (I/O scheduler layer), the block device driver layer, and finally the block device layer.
Figure 1 Layers that process the read system call in kernel space
The virtual file system layer hides the differences between the operations of the underlying specific file systems and offers the upper layers a single, unified interface. Because this layer exists, a device can be abstracted as a file, so that operating a device is as simple as operating a file.
In the specific file system layer, each file system (such as ext2 or NTFS) performs its operations differently and defines its own set of operations. For more on file systems, see Resources.
The cache layer is there to improve the performance of disk access in Linux. It keeps part of the data on disk cached in memory. When a request arrives and the requested data is in the cache and up to date, the data is passed directly to the user program, which avoids touching the underlying disk and improves performance.
The main job of the generic block layer is to receive disk requests from the upper layers and eventually issue I/O requests. This layer hides the characteristics of the underlying hardware block devices and presents a generic, abstract view of a block device.
The I/O scheduling layer receives the I/O requests issued by the generic block layer, queues them, and tries to merge adjacent requests (two requests whose data is contiguous on disk). Following the configured scheduling algorithm, it calls back the request-handling function provided by the driver layer to process each I/O request.
Each driver in the driver layer corresponds to a specific physical block device. It takes I/O requests from the layer above and, based on the information in each request, transfers data by sending commands to the controller of that block device.
The device layer is the physical device itself, which defines the specification for operating it.
Related kernel data structures:
dentry : links a file name to the file's inode
inode : the file's index node; stores information such as the file's identifier, permissions, and content
file : stores information about an open file and a pointer to the set of functions that operate on the file
file_operations : the set of function interfaces that operate on a file
address_space : describes the file's page cache structure and related information, and contains a pointer to the set of functions that operate on the page cache
address_space_operations : the set of function interfaces that operate on the page cache
bio : describes an I/O request
Relationship between data structures:
Figure 2 shows how the data structures described above (except bio) relate to one another. The inode object can be reached from the dentry object, the address_space object can be obtained from the inode object, and the address_space_operations object can be reached from the address_space object.
The file object can be obtained from information in the current process descriptor, and from the file object the dentry object, the address_space object, and the file_operations object can be reached in turn.
Figure 2 Relationships between the data structures
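Since the figure itself is not reproduced here, the pointer chain it describes can be sketched with heavily reduced structures. The demo_ types below are assumptions for illustration: they keep only the members needed to follow the chain, and the member names are modeled on the kernel's rather than copied from its headers.

```c
#include <assert.h>
#include <stddef.h>

struct demo_address_space;

struct demo_address_space_operations {
    int (*readpage)(struct demo_address_space *mapping, unsigned long index);
};

struct demo_address_space {
    const struct demo_address_space_operations *a_ops;
};

struct demo_file_operations {
    long (*read)(void);
};

struct demo_inode {
    const struct demo_file_operations *i_fop;
    struct demo_address_space *i_mapping;
};

struct demo_dentry {
    const char *d_name;
    struct demo_inode *d_inode;
};

struct demo_file {
    struct demo_dentry *f_dentry;
    const struct demo_file_operations *f_op;
    struct demo_address_space *f_mapping;
    long long f_pos;
};

/* Follow file -> dentry -> inode -> address_space -> address_space_operations. */
static const struct demo_address_space_operations *
demo_file_to_aops(const struct demo_file *file)
{
    assert(file != NULL && file->f_dentry != NULL);
    assert(file->f_dentry->d_inode != NULL);
    assert(file->f_dentry->d_inode->i_mapping != NULL);
    return file->f_dentry->d_inode->i_mapping->a_ops;
}
```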
Prerequisites:
A read call can take many different paths through the kernel. This article follows one of them, under these assumptions:
The file to be read already exists.
The file is read through the page cache.
The file being read is a regular file.
The file system on the disk is ext2. For more on the ext2 file system, see Resources.
Preparation:
Note: All the code in the listings comes from the Linux 2.6.11 kernel source.
A file must be opened before its data can be read. The kernel function that handles the open system call is sys_open, so let's first look at what that function does. Listing 1 shows the code for sys_open (some parts are omitted, and later listings are treated the same way).
Listing 1 sys_open function code
Code explanation:
get_unused_fd() : Retrieves an unused file descriptor (the smallest unused file descriptor is selected each time).
filp_open() : Calls open_namei() to retrieve the dentry and inode associated with the file (because the prerequisites state that the file already exists, the dentry and inode can be found without creating them), then calls dentry_open() to create a new file object and initialize it with the information in the dentry and inode (the file's current read/write position is stored in the file object). Note this statement in dentry_open():
f->f_op = fops_get(inode->i_fop);
This assignment stores in the file object's f_op member a pointer to the set of functions associated with the specific file system (the pointer is kept in the inode object). The sys_read function, discussed next, calls the read member of file->f_op.
fd_install() : Uses the file descriptor as an index to associate the current process descriptor with the file object above, in preparation for later read and write operations.
Finally, the function returns the file descriptor.
Figure 3 shows the relationship between the file object and the current process descriptor after sys_open returns, and where the file object's set of file-operation function pointers comes from (the i_fop member of the inode object).
Figure 3 Relationship between the file object and the current process descriptor
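The descriptor handling in sys_open can be modeled in a few lines. This is a user-space sketch under simplifying assumptions: demo_task stands in for the process descriptor's file table, the table has a fixed size, and the function names only echo get_unused_fd() and fd_install().

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_FDS 8

struct demo_open_file {
    long long f_pos; /* current read/write position */
};

/* Stand-in for the per-process file table. */
struct demo_task {
    unsigned char in_use[DEMO_MAX_FDS];
    struct demo_open_file *fd[DEMO_MAX_FDS];
};

/* Reserve and return the smallest unused descriptor, or -1 if none is left. */
static int demo_get_unused_fd(struct demo_task *task)
{
    for (int fd = 0; fd < DEMO_MAX_FDS; fd++) {
        if (!task->in_use[fd]) {
            task->in_use[fd] = 1;
            return fd;
        }
    }
    return -1; /* stands in for a "too many open files" error */
}

/* Bind a reserved descriptor to a file object. */
static void demo_fd_install(struct demo_task *task, int fd,
                            struct demo_open_file *file)
{
    assert(fd >= 0 && fd < DEMO_MAX_FDS && task->in_use[fd]);
    task->fd[fd] = file;
}
```

With descriptors 0, 1, and 2 already taken, the next open receives 3, and a later read on descriptor 3 finds the file object by indexing the table.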
So far, all the preparations have been completed. The following describes the processing of the read system call in each level shown in Figure 1.
Processing of the virtual file system layer:
The kernel function sys_read() is the entry point for the read system call at this level, and Listing 2 shows the code for the function.
Listing 2 sys_read function code
Code explanation:
fget_light() : Uses fd as an index to retrieve the corresponding file object from the current process descriptor (see Figure 3).
If the specified file object is not found, an error is returned.
If the specified file object is found:
Call file_pos_read() to get the file's current read/write position.
Call vfs_read() to perform the read. This function ultimately calls the function that file->f_op->read points to. The code is as follows:
if (file->f_op->read)
    ret = file->f_op->read(file, buf, count, pos);
Call file_pos_write() to update the file's current read/write position.
Call fput_light() to update the file's reference count.
Finally, return the number of bytes read.
At this point, the virtual file system layer has finished its work and control passes to the ext2 file system layer.
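The sequence of steps in sys_read can be mirrored in a small user-space model. Everything prefixed rd_ is hypothetical: the file contents live in memory, reference counting is omitted, and -1 stands in for the kernel's error codes. What the sketch preserves is the order of operations: look up the file by descriptor, fetch the position, call through f_op->read, and store the new position.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct rd_file;

struct rd_file_operations {
    long (*read)(struct rd_file *f, char *buf, size_t count, long long *pos);
};

struct rd_file {
    const struct rd_file_operations *f_op;
    long long f_pos;
    const char *data; /* stand-in for the file contents */
    size_t size;
};

/* Stand-in for a file system's read method. */
static long rd_mem_read(struct rd_file *f, char *buf, size_t count,
                        long long *pos)
{
    if (*pos < 0 || (size_t)*pos >= f->size)
        return 0; /* at or past end of file */
    size_t left = f->size - (size_t)*pos;
    if (count > left)
        count = left;
    memcpy(buf, f->data + *pos, count);
    *pos += (long long)count;
    return (long)count;
}

/* Like vfs_read(): call whatever the file's operation table provides. */
static long rd_vfs_read(struct rd_file *f, char *buf, size_t count,
                        long long *pos)
{
    if (f->f_op == NULL || f->f_op->read == NULL)
        return -1;
    return f->f_op->read(f, buf, count, pos);
}

/* Like sys_read(): descriptor lookup, position handling, then the read. */
static long rd_sys_read(struct rd_file **fdtable, int nfds, int fd,
                        char *buf, size_t count)
{
    assert(fdtable != NULL);
    if (fd < 0 || fd >= nfds || fdtable[fd] == NULL)
        return -1;                 /* no such open file */
    struct rd_file *f = fdtable[fd];
    long long pos = f->f_pos;      /* corresponds to file_pos_read() */
    long ret = rd_vfs_read(f, buf, count, &pos);
    f->f_pos = pos;                /* corresponds to file_pos_write() */
    return ret;
}
```

Two consecutive calls on the same descriptor return successive parts of the data, because the position saved by the first call is the starting point of the second.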
Before we parse the operation of the ext2 file system layer, let's take a look at the source of the read pointer in the file object.
Source of the read function pointer in the file object:
The earlier analysis of the sys_open kernel function showed that file->f_op comes from inode->i_fop. So where does inode->i_fop come from? It is assigned when the inode object is initialized. See Listing 3.
Listing 3 Part of the ext2_read_inode() function
As the code shows, if the file associated with the inode is a regular file, the address of the variable ext2_file_operations is assigned to the i_fop member of the inode object. It follows that the function pointed to by inode->i_fop->read is the function pointed to by the read member of ext2_file_operations. Listing 4 shows how the ext2_file_operations variable is initialized.
Listing 4 Initialization of ext2_file_operations
The read member points to the function generic_file_read. So inode->i_fop->read points to generic_file_read, and therefore file->f_op->read points to generic_file_read as well. The conclusion: generic_file_read is the real entry point of the ext2 layer.
Processing of the Ext2 file system layer
Figure 4 Function call relationships for the read system call in the ext2 layer
As Figure 4 shows, the layer's entry function generic_file_read calls __generic_file_aio_read, which determines the access mode of the read request. If it is direct I/O (the O_DIRECT flag is set in filp->f_flags, meaning the cache is bypassed), generic_file_direct_IO is called. If the read goes through the page cache, do_generic_file_read is called. The function do_generic_file_read is only a wrapper that calls do_generic_mapping_read.
Before explaining what do_generic_mapping_read does, let's look at how a file's cache is organized in memory.
File cache structure
Figure 5 shows the page cache structure of a file. The file is divided into page-sized blocks of data, which are organized into a multi-way tree (called a radix tree). Every leaf node of the tree is a struct page, representing one page used to cache the file. The leftmost page in the leaf level holds the first 4096 bytes of the file (if the page size is 4096 bytes), the next page holds the second 4096 bytes, and so on. All intermediate nodes are organizing nodes that indicate which page holds the data at a given offset. The height of the tree ranges from 0 to 6, and the supported file sizes range from 0 bytes to 16 TB. The pointer to the root node of the tree can be obtained from the address_space object associated with the file (which is stored in the inode object associated with the file). See Resources for more details on the structure of the page cache.
Figure 5 Page cache structure of a file
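The arithmetic behind this structure can be worked through in a short sketch. It assumes 4096-byte pages as in the text, and additionally assumes 64 slots per tree node and a 32-bit page index; those two values are assumptions chosen because they are consistent with the stated limits (height up to 6, files up to 16 TB). The rt_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RT_PAGE_SHIFT 12u /* 4096-byte pages, as in the text */
#define RT_MAP_SHIFT  6u  /* assumed: 64 slots per tree node */
#define RT_MAP_MASK   ((1u << RT_MAP_SHIFT) - 1u)
#define RT_MAX_HEIGHT 6u

/* Which page of the file holds byte offset pos. */
static uint32_t rt_page_index(uint64_t pos)
{
    return (uint32_t)(pos >> RT_PAGE_SHIFT);
}

/* Offset of that byte inside its page. */
static uint32_t rt_page_offset(uint64_t pos)
{
    return (uint32_t)(pos & ((1u << RT_PAGE_SHIFT) - 1u));
}

/* Smallest tree height that can hold nr_pages pages (0 for an empty file). */
static unsigned rt_height_for_pages(uint64_t nr_pages)
{
    unsigned height = 0;
    uint64_t capacity = 0;
    while (nr_pages > capacity) {
        height++;
        capacity = UINT64_C(1) << (RT_MAP_SHIFT * height);
    }
    return height;
}

/* Slot followed at a given level; level 1 nodes point directly at pages. */
static unsigned rt_slot(uint32_t index, unsigned level)
{
    assert(level >= 1 && level <= RT_MAX_HEIGHT);
    return (index >> (RT_MAP_SHIFT * (level - 1))) & RT_MAP_MASK;
}
```

Under these assumptions, 2^32 pages of 4096 bytes give 2^44 bytes, that is 16 TB, and indexing 2^32 pages with 6 bits per level needs a tree of height 6.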
Now let's look at what do_generic_mapping_read does. The function's code is long, so this article only outlines its main flow:
Using the file's current read/write position, look up the page that caches the requested data in the page cache.
If the page is up to date, copy the requested data to user space.
Otherwise, lock the page.
Call the readpage function to issue a read request for the page to the disk (the page is unlocked when the lower layer completes the I/O operation), code:
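Independently of the kernel listing, the main flow outlined above can be modeled in user space. This sketch is not the kernel code: the pc_ names are hypothetical, pages are only 16 bytes, the radix tree is replaced by a flat array, and the simulated readpage completes immediately instead of asynchronously.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PC_PAGE_SIZE 16u /* tiny pages keep the example easy to follow */
#define PC_NR_PAGES  4u

struct pc_page {
    bool uptodate;
    bool locked;
    char data[PC_PAGE_SIZE];
};

struct pc_mapping {
    struct pc_page pages[PC_NR_PAGES]; /* flat array instead of a radix tree */
    const char *disk;      /* backing store: PC_NR_PAGES * PC_PAGE_SIZE bytes */
    unsigned disk_reads;   /* counts simulated disk accesses */
};

/* Stand-in for readpage: fill the page, mark it up to date, unlock it. */
static void pc_readpage(struct pc_mapping *m, size_t index)
{
    struct pc_page *p = &m->pages[index];
    memcpy(p->data, m->disk + index * PC_PAGE_SIZE, PC_PAGE_SIZE);
    p->uptodate = true;
    p->locked = false;
    m->disk_reads++;
}

/* Copy up to count bytes starting at *pos; returns the bytes copied. */
static size_t pc_mapping_read(struct pc_mapping *m, size_t *pos,
                              char *buf, size_t count)
{
    size_t done = 0;
    while (done < count && *pos < PC_NR_PAGES * PC_PAGE_SIZE) {
        size_t index = *pos / PC_PAGE_SIZE;
        size_t offset = *pos % PC_PAGE_SIZE;
        struct pc_page *p = &m->pages[index]; /* find the caching page */

        if (!p->uptodate) {        /* not up to date: lock, then read it in */
            p->locked = true;
            pc_readpage(m, index);
            assert(!p->locked && p->uptodate);
        }

        size_t n = PC_PAGE_SIZE - offset;
        if (n > count - done)
            n = count - done;
        memcpy(buf + done, p->data + offset, n); /* copy to the caller */
        done += n;
        *pos += n;
    }
    return done;
}
```

Reading the same range twice touches the simulated disk only the first time; the second read is served entirely from the cached pages.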