Universally unique identifier
A universally unique identifier (UUID) is an identifier standard used in software construction. A UUID is simply a 128-bit value. The meaning of each bit is defined by any of several variants.
For human-readable display, many systems use a canonical format using hexadecimal text with inserted hyphen characters. For example:
123e4567-e89b-12d3-a456-426655440000
The intent of UUIDs is to enable distributed systems to uniquely identify information without significant central coordination. In this context the word unique should be taken to mean "practically unique" rather than "guaranteed unique". Since the identifiers have a finite size, it is possible for two differing items to share the same identifier. This is a form of hash collision. The identifier size and generation process need to be selected so as to make this sufficiently improbable in practice. Anyone can create a UUID and use it to identify something with reasonable confidence that the same identifier will never be unintentionally created by anyone to identify something else. Information labeled with UUIDs can therefore be later combined into a single database without needing to resolve identifier (ID) conflicts.
Adoption of UUIDs is widespread with many computing platforms providing support for generating UUIDs and for parsing/generating their textual representation.
Definition
A UUID is a 16-octet (128-bit) number.
In its canonical form, a UUID is represented by 32 lowercase hexadecimal digits, displayed in five groups separated by hyphens, in the form 8-4-4-4-12
for a total of 36 characters (32 alphanumeric characters and four hyphens). For example:
123e4567-e89b-12d3-a456-426655440000
The first 3 sequences are interpreted as complete hexadecimal numbers, while the final 2 as a plain sequence of bytes. The byte order is "most significant byte first (known as network byte order)"[1](sec. 4.1.2) (note that GUID's byte order is different). This form is defined in the RFC[1](sec. 3) and simply reflects UUID's division into fields,[1](sec. 4.1.2) which apparently originates from the structure of the initial time and MAC-based version.
The number of possible UUIDs is 1632, which is 2128 or about 3.4 × 1038.
Variants and Versions
A UUID is formatted according to a specified variant and a specific version of the variant. The variant indicates the layout of the UUID. The UUID specification covers one particular variant. Other variants are reserved or exist for backward compatibility reasons (e.g., for values assigned before the UUID specification was produced). An example of a UUID that is a different variant is the nil (or empty) UUID, which is a UUID that has all 128 bits set to zero.
In the canonical representation, xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx
, the most significant bits of N
indicates the variant (depending on the variant; one, two, or three bits are used). The variant covered by the UUID specification is indicated by the two most significant bits of N
being 1 0
(i.e., the hexadecimal N
will always be 8
, 9
, A
, or B
).
RFC 4122 Variant
The variant covered by the UUID specification has five versions. For this variant, the four bits of M
indicate the UUID version (i.e., the hexadecimal M
will be either 1, 2, 3, 4, or 5).
Each version uses different information to generate the UUID and thus some may be more appropriate than the others in specific use cases. According to the specification, Version 1 UUIDs are generated from date-time and MAC address, Version 2 UUIDs are generated from group or user id and date-time, Version 3 & 5 produces deterministic UUIDs generated from a user-specified namespace and user-supplied data, and Version 4 is generated from pseudo-random number.
Version 1 (date-time & MAC address)
Conceptually, the original (version 1) generation scheme for UUIDs was to concatenate the UUID version with the MAC address of the computer that is generating the UUID, and with the number of 100-nanosecond intervals since the adoption of the Gregorian calendar in the West. By representing a single point in space (the computer) and time (the number of intervals), the chance of a collision in values is effectively nil.
This scheme has been criticized in that it is not sufficiently "opaque"; it reveals both the identity of the computer that generated the UUID and the time at which it did so. Its uniqueness across computers is guaranteed as long as MAC addresses are not duplicated (which can happen, for instance, due to manual setting or “spoofing” of the MAC address); however, given the speed of modern processors, successive invocations on the same machine of a naive implementation of a generator of version 1 UUIDs may produce the same UUID, violating the uniqueness property. (Non-naïve implementations can avoid this problem by, for example, remembering the most recently generated UUID, "pocketing" unused UUIDs, and using pocketed UUIDs in case a duplicate is about to be generated.)
Version 2 (DCE Security)
Version 2 UUIDs are similar to Version 1 UUIDs, with the first 4 bytes of the timestamp replaced by the user's POSIX UID or GID (with the "local domain" identifier indicating which it is) and the upper byte of the clock sequence replaced by the identifier for a "local domain" (typically either the "POSIX UID domain" or the "POSIX GID domain").[2][3] However, Version 2 is not explicitly defined in the UUID specification and thus is not implemented by all UUID generators.[4]
Version 3 (MD5 hash & namespace)
Version 3 UUIDs use a scheme deriving a UUID via MD5 from a URL, a
fully qualified domain name, an object identifier, a distinguished name (DN as used in Lightweight Directory Access Protocol), or on names in
unspecified namespaces. Version 3 UUIDs have the form xxxxxxxx-xxxx-3xxx-yxxx-xxxxxxxxxxxx
where x
is any hexadecimal digit and y
is one of 8
, 9
, A
, or B
. According to the specification, Version 3 exists for backwards compatibility, "If backward compatibility is not an issue, SHA-1 [Version 5] is preferred."[1]
To determine the version 3 UUID of a given name, the UUID of the namespace (e.g., 6ba7b810-9dad-11d1-80b4-00c04fd430c8
for a domain) is transformed to a string of bytes corresponding to its hexadecimal digits, concatenated with the input name, hashed with MD5 yielding 128 bits. Six bits are replaced by fixed values, four of these bits indicate the version, 0011
for version 3. Finally, the fixed hash is transformed back into the hexadecimal form with hyphens separating the parts relevant in other UUID versions.
Namespaces
Namespaces are, themselves, UUIDs. While the UUID specification gives example UUIDs for namespaces corresponding to fully qualified domain names (DNS), URLs, ISO OIDs, and X.500 DNs, any UUID can be used as the namespace when generating Version 3 or 5 UUIDs. Thus, "nesting" of namespaces is possible.
Version 4 (random)
Version 4 UUIDs use a scheme relying only on random numbers. This algorithm sets the version number (4 bits) as well as two reserved bits. All other bits (the remaining 122 bits) are set using a random or pseudorandom data source. Version 4 UUIDs have the form xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
where x
is any hexadecimal digit and y
is one of 8
, 9
, A
, or B
(e.g., f47ac10b-58cc-4372-a567-0e02b2c3d479
).
Version 5 (SHA-1 hash & namespace)
Version 5 UUIDs use a scheme with SHA-1 hashing; otherwise it is the same idea as in version 3. RFC 4122 states that version 5 is preferred over version 3 name based UUIDs, as MD5's security has been compromised. Note that the 160 bit SHA-1 hash is truncated to 128 bits to make the length work out. An erratum addresses the example in appendix B of RFC 4122.
Random UUID probability of duplicates
Out of a total of 128 bits, two bits indicate an RFC 4122 ("Leach-Salz") UUID and four bits the version (0100 indicating "randomly generated"), so randomly generated UUIDs have 122 random bits. The chance of two such UUIDs having the same value can be calculated using probability theory (birthday problem). Using the approximation
these are the probabilities of an accidental clash after calculating n UUIDs, with x = 122:
n | probability |
---|---|
68,719,476,736 = 236 | 0.0000000000000004 (4 × 10−16) |
2,199,023,255,552 = 241 | 0.0000000000005 (5 × 10−13) |
70,368,744,177,664 = 246 | 0.0000000005 (5 × 10−10) |
When the term is close to zero, the probability of a clash can be accurately approximated by
- .
To put these numbers into perspective, the annual risk of a given person being hit by a meteorite is estimated to be one chance in 17 billion,[5] which means the probability is about 0.00000000006 (6 × 10−11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%.
However, these probabilities only hold when the UUIDs are generated using sufficient entropy. Otherwise, the probability of duplicates could be significantly higher, since the statistical dispersion might be lower. Where unique identifiers are required for distributed applications, so that UUIDs do not clash even when data from many devices is merged, the randomness of the seeds and generators used on every device must be reliable for the life of the application. Where this is not feasible, RFC4122 recommends using a namespace variant instead.
Standards
UUIDs are standardized by the Open Software Foundation (OSF) as part of the Distributed Computing Environment (DCE).
UUIDs are documented as part of ISO/IEC 11578:1996 "Information technology – Open Systems Interconnection – Remote Procedure Call (RPC)" and more recently in ITU-T Rec. X.667 | ISO/IEC 9834-8:2005.
The IETF has published the Standards-Track, RFC 4122, that is technically equivalent with ITU-T Rec. X.667 | ISO/IEC 9834-8.
History
UUIDs were originally used in the Apollo Network Computing System and later in the Open Software Foundation's (OSF) Distributed Computing Environment (DCE). The initial design of DCE UUIDs was based on UUIDs as defined in the Apollo Computer Network Computing System,[6] whose design was in turn inspired by the (64-bit) unique identifiers defined and used pervasively in Domain/OS, an operating system also designed by Apollo Computer.
Later, the Microsoft Windows platforms adopted that design as globally unique identifiers (GUIDs).
Other significant uses include ext2/ext3/ext4 filesystem userspace tools (e2fsprogs uses libuuid provided by util-linux), LUKS encrypted partitions, GNOME, KDE, and Mac OS X,[7] most of which either use the libuuid library now provided by the util-linux package or implementations derived from it or from the original implementation by Theodore Ts'o in the e2fsprogs[8] package (the latter has been moved to the util-linux[9] package in version 2.15.1[10] for consistency). One of the uses of UUIDs in Solaris (using Open Software Foundation implementation) is identification of running operating system instance for the purpose of pairing crash dump data with Fault Management Event in the case of kernel panic.[11]
See also
References
- 1 2 3 4 P. Leach; et al. (July 2005). "RFC 4122 - A Universally Unique IDentifier (UUID) URN Namespace". Internet Engineering Task Force.
- ↑ The Open Group (1997). "CDE 1.1: Remote Procedure Call".
- ↑ The Open Group (1997). "DCE 1.1: Authentication and Security Services".
- ↑ Kuchling, A.M. "What’s New in Python 2.5". Python.org. Retrieved 23 January 2016.
- ↑ Old Farmer's Almanac 1994, 220–222, Taking your Chances: An Explanation of Risk
- ↑ Zahn, Lisa (1990). Network Computing Architecture. Prentice Hall. p. 10. ISBN 0-13-611674-4.
- ↑ gen_uuid.c in Apple's Libc-391, corresponding to Mac OS X 10.4
- ↑ gen_uuid.c in e2fsprogs
- ↑ gen_uuid.c in util-linux
- ↑ according to util-linux's man 3 uuid manual page, section AVAILABILITY
- ↑ blog post explaining layout of crash dump files in Solaris 11.2 and later corresponding to OS image UUIDs
External links
- International Standard "X.667 : … Generation … of Universally Unique Identifiers (UUIDs)…" (ITU-T Rec. X.667 as of 2008–08, freely available)
- International Standard "Generation and registration of Universally Unique Identifiers (UUIDs) and their use as ASN.1 Object Identifier components" (ITU-T Rec. X.667, freely available)
- ISO/IEC 9834-8:2008 "... Generation and registration of Universally Unique Identifiers (UUIDs) and their use as ASN.1 Object Identifier components"
- A Universally Unique IDentifier (UUID) URN Namespace (IETF RFC 4122)
- Extract the time from a version 1 UUID / GUID
- Global UUID registration function at ITU-T
- Commons Id
- DmaId for InstanceId Values (DCE Universally Unique IDentifiers, UUIDs)
- Syntax and semantics of the DCE variant of Universal Unique Identifiers (UUIDs)
- Random UUID Probability of Duplicates
- ITU registration of UUIDs and representation of a UUID as an OID