Page fault
A page fault (sometimes called #PF, PF or hard fault[a]) is a type of interrupt, called a trap, raised by computer hardware when a running program accesses a memory page that is mapped into the virtual address space, but not actually loaded into main memory. The hardware that detects a page fault is the processor's memory management unit (MMU), while the exception handling software that handles page faults is generally a part of the operating system kernel. When handling a page fault, the operating system generally tries to make the required page accessible at a location in physical memory, or terminates the program in case of an illegal memory access.
Contrary to what "fault" might suggest, valid page faults are not errors; they are common and necessary to increase the amount of memory available to programs in any operating system that uses virtual memory, including OpenVMS, Microsoft Windows, Unix-like systems (including macOS, Linux, *BSD, Solaris, AIX, and HP-UX), and z/OS.
Types
Minor
If the page is loaded in memory at the time the fault is generated, but is not marked in the memory management unit as being loaded in memory, then it is called a minor or soft page fault. The page fault handler in the operating system merely needs to make the entry for that page in the memory management unit point to the page in memory and indicate that the page is loaded in memory; it does not need to read the page into memory. This can happen if the memory is shared by different programs and the page has already been brought into memory for another program.
The page could also have been removed from the working set of a process, but not yet written to disk or erased, such as in operating systems that use Secondary Page Caching. For example, HP OpenVMS may remove a page that does not need to be written to disk (if it has remained unchanged since it was last read from disk, for example) and place it on a Free Page List if the working set is deemed too large. However, the page contents are not overwritten until the page is assigned elsewhere, meaning it is still available if it is referenced by the original process before being allocated. Since these faults do not involve disk latency, they are faster and less expensive than major page faults.
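On Unix-like systems, a process can observe its own minor and major fault counts. The following is a minimal sketch, assuming a platform where Python's `resource` module is available and the kernel populates the `ru_minflt` and `ru_majflt` fields; the helper names `fault_counts` and `touch_fresh_memory` are illustrative only. It maps fresh anonymous memory and writes one byte per page, which on many systems shows up as minor faults because no disk read is needed.

```python
import mmap
import resource


def fault_counts():
    """Return (minor, major) page fault counts for this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt


def touch_fresh_memory(n_pages=4096):
    """Map anonymous memory and write one byte into each page.

    Returns the change in (minor, major) fault counts around the writes.
    """
    page = mmap.PAGESIZE
    before = fault_counts()
    buf = mmap.mmap(-1, n_pages * page)  # anonymous mapping, no file behind it
    try:
        for i in range(n_pages):
            buf[i * page] = 1  # the first write to a page may fault it in
    finally:
        buf.close()
    after = fault_counts()
    return after[0] - before[0], after[1] - before[1]


if __name__ == "__main__":
    minor, major = touch_fresh_memory()
    print(f"minor faults: {minor}, major faults: {major}")
```

The exact numbers depend on the kernel, the page size, and features such as huge pages, so the sketch reports the counts rather than assuming particular values.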
Major
If the page is not loaded in memory at the time of the fault, then it is called a major or hard page fault. This is the mechanism used by an operating system to increase the amount of program memory available on demand: the operating system delays loading parts of the program from disk until the program attempts to use them and a page fault is generated. The page fault handler in the OS needs to find a location for the page: either a free page in memory, or a non-free page in memory. The latter might be in use by another process, in which case the OS needs to write out the data in that page (if it has not been written out since it was last modified) and mark that page as not being loaded in memory in that process's page table. Once the space has been made available, the OS can read the data for the new page into memory, add an entry for its location to the memory management unit, and indicate that the page is loaded. Thus major faults are more expensive than minor faults and add disk latency to the interrupted program's execution.
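The handling steps for a major fault can be sketched as a toy model. The class `ToyPager` and its counters are hypothetical names introduced for illustration, and the first-in, first-out choice of victim page is an assumption made for simplicity; real kernels use more elaborate policies.

```python
from collections import OrderedDict


class ToyPager:
    """Toy model of major-fault handling with FIFO victim selection."""

    def __init__(self, n_frames):
        self.n_frames = n_frames
        self.resident = OrderedDict()  # page -> dirty flag, oldest page first
        self.major_faults = 0
        self.disk_reads = 0
        self.disk_writes = 0

    def access(self, page, write=False):
        if page in self.resident:
            # Page is loaded: no fault, only remember whether it was modified.
            self.resident[page] = self.resident[page] or write
            return
        self.major_faults += 1
        if len(self.resident) >= self.n_frames:
            # No free frame: evict the oldest page, writing it out if dirty.
            _victim, dirty = self.resident.popitem(last=False)
            if dirty:
                self.disk_writes += 1
        self.disk_reads += 1          # read the requested page from disk
        self.resident[page] = write   # mark it loaded (and dirty if written)


if __name__ == "__main__":
    pager = ToyPager(n_frames=2)
    for page, write in [(0, True), (1, False), (2, False), (0, False)]:
        pager.access(page, write)
    print(pager.major_faults, pager.disk_reads, pager.disk_writes)  # prints: 4 4 1
```

In this run only the eviction of the modified page 0 costs a disk write; the clean page 1 is simply dropped, which mirrors the "if it has not been written out since it was last modified" condition above.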
Invalid
If a page fault occurs for a reference to an address that is not part of the virtual address space, meaning there cannot be a page in memory corresponding to it, then it is called an invalid page fault. The page fault handler in the operating system will then generally pass a segmentation fault to the offending process, indicating that the access was invalid; this usually results in abnormal termination of the code that made the invalid reference. A null pointer is usually represented as a pointer to address 0 in the address space; many operating systems set up the memory management unit to indicate that the page that contains that address is not in memory, and do not include that page in the virtual address space, so that attempts to read or write the memory referenced by a null pointer get an invalid page fault.
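The distinction between the three types can be summarized in a short sketch. The function `classify`, the `Fault` enumeration, and the 4096-byte page size are assumptions made for illustration; the sets stand in for the kernel's record of the address space, the pages resident in memory, and the pages the MMU currently has marked as loaded.

```python
from enum import Enum

PAGE_SIZE = 4096  # assumed page size in bytes


class Fault(Enum):
    NONE = "no fault"
    MINOR = "minor"
    MAJOR = "major"
    INVALID = "invalid"


def classify(address, address_space, in_memory, mapped):
    """Classify an access given sets of page numbers.

    address_space: pages that belong to the virtual address space
    in_memory:     pages whose contents are currently in main memory
    mapped:        pages the MMU has marked as loaded
    """
    page = address // PAGE_SIZE
    if page not in address_space:
        return Fault.INVALID  # e.g. page 0, reached through a null pointer
    if page in mapped:
        return Fault.NONE
    if page in in_memory:
        return Fault.MINOR    # only the MMU entry needs updating
    return Fault.MAJOR        # contents must be read from disk


if __name__ == "__main__":
    space, resident, mapped = {1, 2, 3}, {1, 2}, {1}
    for addr in (0, 1 * PAGE_SIZE, 2 * PAGE_SIZE + 5, 3 * PAGE_SIZE):
        print(hex(addr), classify(addr, space, resident, mapped).value)
```

Leaving page 0 out of `address_space` models the common convention described above, so that an access through a null pointer is classified as invalid.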
Invalid conditions
Illegal accesses and invalid page faults can result in a segmentation fault or bus error, leading to program termination (a crash) or a core dump, depending on the operating system environment. Often these problems are caused by software bugs, but hardware memory errors, such as those caused by overclocking, may corrupt pointers and make correct software fail.
Operating systems such as Windows and UNIX (and other UNIX-like systems) provide differing mechanisms for reporting errors caused by page faults. Windows uses structured exception handling to report page fault-based invalid accesses as access violation exceptions, and UNIX (and UNIX-like) systems typically use signals, such as SIGSEGV, to report these error conditions to programs.
If the program receiving the error does not handle it, the operating system performs a default action, typically involving the termination of the running process that caused the error condition, and notifying the user that the program has malfunctioned. Recent versions of Windows often report such problems by simply stating something like "this program must close" (an experienced user or programmer with access to a debugger can still retrieve detailed information). Recent Windows versions also write a minidump (similar in principle to a core dump) describing the state of the crashed process. UNIX and UNIX-like operating systems report these conditions to the user with error messages such as "segmentation violation", or "bus error", and may also produce a core dump.
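The signal-based reporting on Unix-like systems can be observed from a parent process. The following is a sketch under stated assumptions: it assumes a POSIX system, where `subprocess` reports a child killed by a signal as a negative return code, and it assumes that reading through a null pointer with `ctypes.string_at(0)` triggers an invalid page fault in the child. The function name `run_null_read` is hypothetical.

```python
import signal
import subprocess
import sys

# The child reads a C string at address 0, i.e. through a null pointer.
CHILD_CODE = "import ctypes; ctypes.string_at(0)"


def run_null_read():
    """Run the child and return the name of the signal that killed it, or None."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if proc.returncode < 0:  # POSIX: negative value means "killed by signal"
        return signal.Signals(-proc.returncode).name
    return None


if __name__ == "__main__":
    print("child terminated by:", run_null_read())
```

Because the child does not handle the signal, the default action applies and the process is terminated; on many Linux systems the reported signal is SIGSEGV, though the exact signal is platform-dependent.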
Performance impact
Page faults add overhead to the execution of a program or operating system, and in the degenerate case can cause thrashing. Optimizations of programs and the operating system that reduce the number of page faults improve the performance of the program or even the entire system. The two primary focuses of the optimization effort are reducing overall memory usage and improving memory locality. To reduce page faults across the system, system designers must choose a page replacement algorithm that suits the current requirements and maximizes page hits. Many such algorithms have been proposed, including heuristic ones intended to reduce the incidence of page faults. Generally, making more physical memory available also reduces page faults.
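The effect of memory locality can be illustrated with a small simulation. This is a sketch only: least recently used (LRU) replacement is chosen here as one well-known policy, and the function names, matrix dimensions, and frame count are assumptions picked so that one matrix row fills exactly one page.

```python
from collections import OrderedDict


def count_faults_lru(page_refs, n_frames):
    """Count page faults for a reference string under LRU replacement."""
    resident = OrderedDict()  # least recently used page first
    faults = 0
    for page in page_refs:
        if page in resident:
            resident.move_to_end(page)  # hit: mark as most recently used
            continue
        faults += 1
        if len(resident) >= n_frames:
            resident.popitem(last=False)  # evict the least recently used page
        resident[page] = True
    return faults


def matrix_refs(rows, cols, elems_per_page, row_major=True):
    """Pages touched when traversing a matrix that is stored row by row."""
    if row_major:
        order = ((r, c) for r in range(rows) for c in range(cols))
    else:
        order = ((r, c) for c in range(cols) for r in range(rows))
    return [(r * cols + c) // elems_per_page for r, c in order]


if __name__ == "__main__":
    good = matrix_refs(64, 64, elems_per_page=64, row_major=True)
    bad = matrix_refs(64, 64, elems_per_page=64, row_major=False)
    print(count_faults_lru(good, n_frames=8))  # prints: 64
    print(count_faults_lru(bad, n_frames=8))   # prints: 4096
```

With only 8 frames for 64 pages, the row-by-row traversal faults once per page, while the column-by-column traversal cycles through all 64 pages repeatedly and faults on every access, which is the thrashing pattern in miniature.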
Major page faults on conventional computers (which use hard disk drives for storage) can have a significant impact on performance. A typical hard disk drive has an average rotational latency of 3 ms, a seek time of 5 ms, and a transfer time of 0.05 ms/page. The total time for paging is therefore about 8 ms (= 8,000 μs). If the memory access time is 0.2 μs, then a page fault makes the operation about 40,000 times slower.
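The arithmetic can be reproduced with a short calculation using the figures above. The function names are illustrative, and the effective access time formula (a weighted average of the fault-free and faulting cases) is a standard textbook extension that the figures above do not state explicitly; the fault rate of one in a thousand is likewise an assumed example value.

```python
def fault_service_time_us(rotational_ms=3.0, seek_ms=5.0, transfer_ms=0.05):
    """Time to service one major page fault from disk, in microseconds."""
    return (rotational_ms + seek_ms + transfer_ms) * 1000.0


def effective_access_time_us(fault_rate, memory_us=0.2, fault_us=None):
    """Average access time when a fraction `fault_rate` of accesses fault."""
    if fault_us is None:
        fault_us = fault_service_time_us()
    return (1.0 - fault_rate) * memory_us + fault_rate * fault_us


if __name__ == "__main__":
    service = fault_service_time_us()
    print(f"service time: {service:.0f} us")             # 8050 us, about 8 ms
    print(f"slowdown: {service / 0.2:.0f}x")             # 40250x, about 40,000x
    eat = effective_access_time_us(fault_rate=0.001)
    print(f"effective access time: {eat:.4f} us")        # 8.2498 us
```

Under these assumptions, even one fault per thousand accesses raises the average access time from 0.2 μs to roughly 8.25 μs, about 41 times slower.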
Notes
- Microsoft uses the term "hard fault" in some versions of its Resource Monitor, e.g. in Windows Vista (as used in the Resource View Help in Microsoft operating systems).