Hard link
In computing, a hard link is a directory entry that associates a name with a file on a file system. All directory-based file systems must have (at least) one hard link giving the original name for each file. The term “hard link” is usually only used in file systems that allow more than one hard link for the same file.
Creating a hard link has the effect of giving one file multiple names (e.g. different names in different directories), all of which independently connect to the same data on the disk and none of which depends on any of the others. This causes an alias effect: e.g. if the file is opened by any one of its names, and changes are made to its content, then these changes will also be visible when the file is opened by an alternative name. By contrast, a soft link or “shortcut” to a file is not a direct link to the data itself, but rather is a short file that contains the text of a file name, or a location that gives direct access to yet another file name within some directory. The name contained in or referred to by the soft link may either be a hard link or another soft link. This also creates aliasing, but in a different way.
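The alias effect described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a file system that supports hard links; the file names and the helper `demo_alias` are hypothetical and not part of any standard interface.

```python
import os
import tempfile


def demo_alias(directory):
    """Create a file, add a second hard link, and modify it via the first name."""
    original = os.path.join(directory, "original.txt")  # hypothetical name
    alias = os.path.join(directory, "alias.txt")        # hypothetical name

    with open(original, "w") as f:
        f.write("first version\n")

    # Add a second directory entry that refers to the same file.
    os.link(original, alias)

    # Change the content in place through the first name.
    with open(original, "a") as f:
        f.write("appended via original\n")

    # Read the content back through the second name.
    with open(alias) as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_alias(tmp))
```

Both lines of text are returned when reading through the second name, because the two names refer to one set of data rather than to two copies.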
Every directory is itself a file, special only because it contains a list of file names maintained by the file system. Since directories themselves are files, multiple hard links to directories are possible, which could create loops within the directory structure rather than a branching structure like a tree. For that reason, the creation of hard links to directories is sometimes forbidden, even where it is technically possible.
Hard links – that is, multiple directory entries to the same file – are supported by POSIX-compliant and partially POSIX-compliant operating systems, such as Linux, Android, Mac OS X, Windows NT4[1] and later Windows NT operating systems.
Support also depends on the type of file system being used. For instance, the NTFS file system supports hard links, while FAT and ReFS do not.
Usage
On POSIX-compliant and partially POSIX-compliant operating systems, such as all Unix-like systems, additional hard links to existing files are created with the link() system call, or the ln and link command-line utilities. The stat command can reveal how many hard links point to a given file. The link count is also included in the output of ls -l.
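As a rough sketch of the same operations from a program, Python's `os.link` asks the operating system to create an additional hard link, and `os.stat` exposes the link count as `st_nlink`. The file names and the helper `link_and_count` below are illustrative assumptions.

```python
import os
import tempfile


def link_and_count(directory):
    """Return the link count before and after adding a second name."""
    target = os.path.join(directory, "data.txt")      # hypothetical name
    extra = os.path.join(directory, "data-link.txt")  # hypothetical name

    with open(target, "w") as f:
        f.write("payload\n")

    before = os.stat(target).st_nlink
    os.link(target, extra)  # roughly what `ln data.txt data-link.txt` does
    after = os.stat(target).st_nlink

    # samefile() compares device and inode numbers of the two paths.
    same_file = os.path.samefile(target, extra)
    return before, after, same_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(link_and_count(tmp))
```

On a typical Unix-like file system the count goes from 1 to 2, which matches the number shown in the second column of ls -l.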
On Microsoft Windows, hard links can be created using the mklink /H command on Windows NT 6.0 and later systems (such as Windows Vista), and in earlier systems (Windows XP, Windows Server 2003) using fsutil.exe hardlink create.
The Windows API from Windows 2000 onwards includes a CreateHardLink() call to create hard links; DeleteFile() is used to remove them, and GetFileInformationByHandle() can be used to determine the number of hard links associated with a file.[2] Hard links require an NTFS partition.[3] Starting with Windows Vista, hard links are used by the Windows Component Store (WinSxS) to keep track of different versions of DLLs stored on the hard disk drive. Unix-like emulation or compatibility software running on Windows, such as Cygwin and Subsystem for UNIX-based Applications, allows the use of POSIX interfaces under Windows.
The process of unlinking dissociates a name from the data on the volume without destroying the associated data. The data is still accessible, as long as at least one link that points to it still exists. When the last link is removed, the space is considered free.[4]
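The following sketch illustrates unlinking with one link remaining. It assumes a file system with hard-link support; the file names and the helper `unlink_one_name` are hypothetical.

```python
import os
import tempfile


def unlink_one_name(directory):
    """Remove one of two names and show that the data is still reachable."""
    first = os.path.join(directory, "report.txt")         # hypothetical name
    second = os.path.join(directory, "report-alias.txt")  # hypothetical name

    with open(first, "w") as f:
        f.write("still here\n")
    os.link(first, second)

    # Dissociate the first name from the data; the data itself is untouched.
    os.unlink(first)

    remaining_links = os.stat(second).st_nlink
    with open(second) as f:
        content = f.read()
    return os.path.exists(first), remaining_links, content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(unlink_one_name(tmp))
```

The first name no longer exists, but the content can still be read through the remaining link.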
A process ambiguously called undeleting allows the recreation of links to data that are no longer associated with a name. However, this process is not available on all systems and is often not reliable. When a file is deleted, its space is added to a free space map for re-use. If a portion of the deleted file space is claimed by new data, undeletion will be unsuccessful, because some or all of the previous data will have been overwritten; the attempt may also result in cross-linking with the new data, leading to filesystem corruption. Additionally, deleted files on solid state drives may be erased at any time by the storage device for reclamation as free space.
Link counter
Most file systems that support hard links use reference counting. An integer value is stored with each physical data section. This integer represents the total number of links that have been created to point to the data. When a new link is created, this value is increased by one. When a link is removed, the value is decreased by one. If the link count becomes zero, the operating system usually automatically deallocates the data space of the file if no process has the file opened for access. The maintenance of this value assists users in preventing data loss. This is a simple method for the file system to track the use of a given area of storage, as zero values indicate free space and nonzero values indicate used space.
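The bookkeeping can be illustrated with a toy in-memory model. This is not how any real file system is implemented; the class `ToyFileSystem` and all of its method and field names are invented for illustration. It only shows the rule stated above: data is released when the link count is zero and no process has the file open.

```python
class ToyFileSystem:
    """Toy model: names map to inode numbers, and inodes carry a link count."""

    def __init__(self):
        self.names = {}   # name -> inode number
        self.inodes = {}  # inode number -> {"data", "nlink", "open"}
        self._next_inode = 1

    def create(self, name, data):
        ino = self._next_inode
        self._next_inode += 1
        self.inodes[ino] = {"data": data, "nlink": 1, "open": 0}
        self.names[name] = ino
        return ino

    def link(self, existing, new):
        # A new name for the same inode: increase the count by one.
        ino = self.names[existing]
        self.names[new] = ino
        self.inodes[ino]["nlink"] += 1

    def unlink(self, name):
        # Remove a name: decrease the count by one.
        ino = self.names.pop(name)
        self.inodes[ino]["nlink"] -= 1
        self._maybe_free(ino)

    def open(self, name):
        ino = self.names[name]
        self.inodes[ino]["open"] += 1
        return ino

    def close(self, ino):
        self.inodes[ino]["open"] -= 1
        self._maybe_free(ino)

    def _maybe_free(self, ino):
        # Free the data only when no name and no open handle refers to it.
        node = self.inodes[ino]
        if node["nlink"] == 0 and node["open"] == 0:
            del self.inodes[ino]


if __name__ == "__main__":
    fs = ToyFileSystem()
    ino = fs.create("a.txt", b"hello")
    fs.link("a.txt", "b.txt")
    fs.unlink("a.txt")
    print(fs.inodes[ino]["nlink"])  # one link is left
```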
On POSIX-compliant operating systems, such as many Unix-variants, the reference count for a file or directory is returned by the stat() or fstat() system calls in the st_nlink field of struct stat.
Example
In the figure to the right, two hard links, named "LINK A.TXT" and "LINK B.TXT", point to the same physical data.
If the file "LINK A.TXT" is opened in an editor, modified and saved, then those changes will be visible if the file "LINK B.TXT" is then opened for viewing since both filenames point to the same data ("opened", because, on POSIX systems, an associated file descriptor remains valid after opening, even when the original file is moved). The same is true if the file were opened as "LINK B.TXT" — or any other name associated with the data.
Some editors, however, break the hard link concept, e.g. emacs. When a file "LINK B.TXT" is edited, emacs by default renames "LINK B.TXT" to "LINK B.TXT~" the first time the buffer is saved, and writes the modified contents to a newly created "LINK B.TXT". Using this approach, the two hard links are now "LINK A.TXT" and "LINK B.TXT~" (the backup file); "LINK B.TXT" would now have just one link and no longer shares the same data as "LINK A.TXT". (This behavior can be changed using the emacs variable backup-by-copying.)
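The effect of a rename-then-rewrite save can be reproduced outside any editor. The sketch below imitates that save strategy with plain file operations; it is not emacs code, and the helpers `save_by_rename` and `demo_broken_link` are hypothetical names chosen for this illustration.

```python
import os
import tempfile


def save_by_rename(path, new_text):
    """Save by renaming the old file to a backup and writing a new file."""
    os.replace(path, path + "~")  # the old data now lives under the backup name
    with open(path, "w") as f:    # this creates a brand-new, separate file
        f.write(new_text)


def demo_broken_link(directory):
    a = os.path.join(directory, "LINK A.TXT")
    b = os.path.join(directory, "LINK B.TXT")
    with open(a, "w") as f:
        f.write("old\n")
    os.link(a, b)

    save_by_rename(b, "new\n")

    with open(a) as f:
        a_text = f.read()
    return {
        "a_same_as_b": os.path.samefile(a, b),
        "a_same_as_backup": os.path.samefile(a, b + "~"),
        "a_text": a_text,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_broken_link(tmp))
```

After the save, "LINK A.TXT" shares its data with the backup file and still holds the old text. A save that instead truncates and rewrites the existing file in place would keep both names pointing at the same data.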
Any number of hard links to the physical data may be created. To access the data, a user only needs to specify the name of any existing link; the operating system will resolve the location of the actual data.
If one of the links is removed with the POSIX unlink function (for example, with the UNIX rm command), then the data are still accessible through any other link that remains. If all of the links are removed and no process has the file open, then the space occupied by the data is freed, allowing it to be reused in the future. This semantic allows for deleting open files without affecting the process that uses them. This technique is commonly used to ensure that temporary files are deleted automatically on program termination, including the case of abnormal termination.
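The temporary-file technique can be sketched as follows, assuming a POSIX system (on Windows, removing a file that is still open may fail). The file name and the helper `read_after_unlink` are hypothetical.

```python
import os
import tempfile


def read_after_unlink(directory):
    """Remove the only name of an open file and keep using the open handle."""
    path = os.path.join(directory, "scratch.tmp")  # hypothetical name
    with open(path, "w+") as f:
        f.write("temporary data\n")
        f.flush()

        os.unlink(path)  # the last name is gone, but the file is still open
        name_exists = os.path.exists(path)
        links = os.fstat(f.fileno()).st_nlink  # link count seen via the handle

        f.seek(0)
        content = f.read()
    # Leaving the with-block closes the file, after which the space can be freed.
    return name_exists, links, content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(read_after_unlink(tmp))
```

Because the name is removed immediately, nothing is left behind in the directory even if the program terminates abnormally before closing the file.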
Limitations of hard links
To prevent loops in the filesystem, and to keep the interpretation of .. (parent directory) consistent, many modern operating systems do not allow hard links to directories. UNIX System V allowed them, but only the superuser had permission to make such links.[5] Mac OS X v10.5 (Leopard) and newer use hard links on directories for the Time Machine backup mechanism only. Symbolic links and NTFS junction points are generally used instead for this purpose.
Hard links can be created to files only on the same volume. If a link to a file on a different volume is needed, it may be created with a symbolic link.
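One way a program might handle this restriction is to attempt a hard link and fall back to a symbolic link when the operating system reports a cross-device error. The sketch below assumes a POSIX system, where that error is reported as `EXDEV`; the helper `link_or_symlink` is a hypothetical name.

```python
import errno
import os
import tempfile


def link_or_symlink(source, link_name):
    """Try a hard link; fall back to a symbolic link across volumes."""
    try:
        os.link(source, link_name)
        return "hard"
    except OSError as exc:
        if exc.errno != errno.EXDEV:  # EXDEV means "cross-device link"
            raise
        # A symbolic link stores a path, so it can point to another volume.
        os.symlink(os.path.abspath(source), link_name)
        return "symbolic"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "source.txt")
        with open(src, "w") as f:
            f.write("data\n")
        print(link_or_symlink(src, os.path.join(tmp, "link.txt")))
```

Within a single volume the hard link succeeds; the fallback branch is only reached when the two paths are on different volumes.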
The maximum number of hard links to a single file is limited by the size of the reference counter. On Unix-like systems the counter is usually machine-word-sized (32- or 64-bit: 4,294,967,295 or 18,446,744,073,709,551,615 links, respectively), though in some filesystems, such as Btrfs, the number of hard links is limited more strictly by their on-disk format.[6] As of Linux 3.11, the ext4 filesystem limits the number of hard links on a file to 65,000.[7] Windows with NTFS filesystem has a limit of 1024 hard links on a file.[8]
Hard links were criticized as a "high-maintenance design" by Neil Brown in Linux Weekly News, since they complicate the design of programs that handle directory trees, including archivers and disk usage tools, such as du, which must take care to de-duplicate files that are linked multiple times in a hierarchy. Brown also calls attention to the fact that Plan 9 from Bell Labs, the intended successor to Unix, does not include the concept of a hard link.[9]
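The de-duplication burden on tools that walk directory trees can be sketched as follows. This is a simplified illustration and not the algorithm of du itself: it sums apparent file sizes rather than allocated blocks, and it identifies a file by its device and inode numbers. The helper `disk_usage` is a hypothetical name.

```python
import os
import tempfile


def disk_usage(root):
    """Sum file sizes under root, counting each multiply linked file once."""
    seen = set()  # (device, inode) pairs that were already counted
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another hard link to a file already counted
            seen.add(key)
            total += st.st_size
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "big.bin")
        with open(first, "wb") as f:
            f.write(b"\0" * 1000)
        os.mkdir(os.path.join(tmp, "sub"))
        os.link(first, os.path.join(tmp, "sub", "big-again.bin"))
        print(disk_usage(tmp))
```

Without the `seen` set, the 1000-byte file would be counted once per name, which overstates the space actually used.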
See also
- Fat link
- Symbolic link or soft link, which unlike a hard link, only provides the text of an “actual” file name, not the file data itself.
- NTFS junction point – the NTFS implementation
- alias (Mac OS) – a method for linking files introduced in Mac OS System 7 and still available in Mac OS X which is in some ways similar to a symbolic link. Note that true symbolic links are also available in OS X.
- shadow (OS/2) – the OS/2 implementation
- Firm link – a link between a hard link and a soft link, used in the GNU Hurd Operating System.
- ln (Unix) – The ln command, which is used to create new links on Unix-like systems.
- freedup – The freedup command frees-up disk space by replacing duplicate data stores with automatically generated hard links
Notes
- [1] "Link Shell Extension".
- [2] "NTFS Hard Links, Directory Junctions, and Windows Shortcuts". flexhex.com.
- [3] "How hard links work".
- [4] "Freeware to find and delete hard links".
- [5] Bach, Maurice J. (1986). The Design of the UNIX Operating System. Prentice Hall. p. 128.
- [6] "Hard Link Limitation". kerneltrap.org. 2010-08-08. Retrieved 2011-11-14.
- [7] "Linux kernel source tree, fs/ext4/ext4.h, line 229".
- [8] "MSDN - CreateHardLink function". Retrieved 14 January 2016.
- [9] Neil Brown (23 November 2010). "Ghosts of Unix past, part 4: High-maintenance designs". Linux Weekly News. Retrieved 20 April 2014.