Virtual machine
In computing, a virtual machine (VM) is an emulation of a particular computer system. Virtual machines operate based on the computer architecture and functions of a real or hypothetical computer, and their implementations may involve specialized hardware, software, or a combination of both.
There are different kinds of virtual machines, each with different functions. System virtual machines (also known as full virtualization VMs) provide a complete substitute for the targeted real machine and a level of functionality required for the execution of a complete operating system. A hypervisor uses native execution to share and manage hardware, allowing multiple different environments, isolated from each other, to be executed on the same physical machine. Modern hypervisors use hardware-assisted virtualization, which provides efficient and full virtualization by using virtualization-specific hardware capabilities, primarily from the host CPUs. Process virtual machines are designed to execute a single computer program by providing an abstracted and platform-independent program execution environment.
Some virtual machines, such as QEMU, are designed to also emulate different architectures and allow execution of software applications and operating systems written for another CPU or architecture. Operating-system-level virtualization allows the resources of a computer to be partitioned via the kernel's support for multiple isolated user space instances, which are usually called containers and may look and feel like real machines to the end users.
Definitions
A virtual machine (VM) is a software implementation of a machine (for example, a computer) that executes programs like a physical machine. Virtual machines are separated into two major classes, based on their use and degree of correspondence to any real machine:
- A system virtual machine provides a complete system platform which supports the execution of a complete operating system (OS).[1] These usually emulate an existing architecture, and are built with the purpose of either providing a platform to run programs where the real hardware is not available for use (for example, executing on otherwise obsolete platforms), or of having multiple instances of virtual machines leading to more efficient use of computing resources, both in terms of energy consumption and cost effectiveness (known as hardware virtualization, the key to a cloud computing environment), or both.
- A process virtual machine (also, language virtual machine) is designed to run a single program, which means that it supports a single process. Such virtual machines are usually closely suited to one or more programming languages and built with the purpose of providing program portability and flexibility (amongst other things). An essential characteristic of a virtual machine is that the software running inside is limited to the resources and abstractions provided by the virtual machine—it cannot break out of its virtual environment.
A VM was originally defined by Popek and Goldberg as "an efficient, isolated duplicate of a real machine". Current use includes virtual machines which have no direct correspondence to any real hardware.[2]
System virtual machines
System virtual machine advantages:
- Multiple OS environments can co-exist on the same primary hard drive, with a virtual partition that allows sharing of files generated in either the "host" operating system or "guest" virtual environment. Adjunct software installations, wireless connectivity, and remote replication, such as printing and faxing, can be generated in any of the guest or host operating systems. Regardless of the system, all files are stored on the hard drive of the host OS.
- Application provisioning, maintenance, high availability and disaster recovery are inherent in the virtual machine software selected.
- Can provide emulated hardware environments different from the host's instruction set architecture (ISA), through emulation or by using just-in-time compilation.
The main disadvantages of VMs are:
- A virtual machine is less efficient than an actual machine when it accesses the host hard drive indirectly.
- When multiple VMs run concurrently on the same physical host, each virtual machine may exhibit varying or unstable performance (speed of execution and malware protection). This depends on the load imposed on the system by the other VMs, unless the selected VM software provides temporal isolation among virtual machines.
- Malware protections for VMs are not necessarily compatible with the "host", and may require separate software.
The desire to run multiple operating systems was the initial motivation for virtual machines, so as to allow time-sharing among several single-tasking operating systems. In some respects, a system virtual machine can be considered a generalization of the concept of virtual memory that historically preceded it. IBM's CP/CMS, the first system to allow full virtualization, implemented time sharing by providing each user with a single-user operating system, CMS. Unlike virtual memory, a system virtual machine allowed the user to write privileged instructions in their code. This approach had certain advantages, such as adding input/output devices not allowed by the standard system.[3]
As virtual memory technology evolves for the purposes of virtualization, memory overcommitment techniques may be applied to manage memory sharing among multiple virtual machines on one physical host. Memory pages that have identical contents can be shared among multiple virtual machines running on the same physical machine by mapping them to the same physical page, a technique known as kernel same-page merging (KSM). This is particularly useful for read-only pages, such as those that contain code segments, as is the case when multiple virtual machines run the same or similar software, software libraries, web servers, middleware components, etc. The guest operating systems do not need to be compliant with the host hardware, which makes it possible to run different operating systems on the same computer (e.g., Microsoft Windows, Linux, or previous versions of an operating system) to support future software.[4]
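The page-sharing idea can be illustrated with a minimal sketch. It is a toy model, not how any particular hypervisor or kernel implements the technique: the `PhysicalMemory` class, its method names, and the 4096-byte page size are assumptions made for the example, and merging happens eagerly when a page is mapped, whereas a real implementation would typically scan memory in the background.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes


class PhysicalMemory:
    """Toy host memory that stores each distinct page content only once."""

    def __init__(self):
        self.frames = {}     # frame id -> page contents
        self.refcount = {}   # frame id -> number of guest pages mapped to it
        self.by_digest = {}  # content digest -> frame id
        self.next_frame = 0

    def map_page(self, contents):
        """Return a frame holding `contents`, reusing an identical frame if present."""
        digest = hashlib.sha256(contents).digest()
        frame = self.by_digest.get(digest)
        if frame is not None and self.frames[frame] == contents:
            self.refcount[frame] += 1  # share the existing physical page
            return frame
        frame = self.next_frame
        self.next_frame += 1
        self.frames[frame] = contents
        self.refcount[frame] = 1
        self.by_digest[digest] = frame
        return frame

    def unmap(self, frame):
        """Drop one mapping and free the frame when nobody uses it any more."""
        self.refcount[frame] -= 1
        if self.refcount[frame] == 0:
            del self.by_digest[hashlib.sha256(self.frames[frame]).digest()]
            del self.frames[frame]
            del self.refcount[frame]

    def write_page(self, frame, contents):
        """Copy-on-write: the writer is remapped and other sharers are untouched."""
        self.unmap(frame)
        return self.map_page(contents)


mem = PhysicalMemory()
code_page = b"\x90" * PAGE_SIZE  # identical read-only code in both guests
vm_a = [mem.map_page(code_page), mem.map_page(b"A" * PAGE_SIZE)]
vm_b = [mem.map_page(code_page), mem.map_page(b"B" * PAGE_SIZE)]
print(len(mem.frames))  # 3 physical frames back 4 guest pages
```

Writes go through `write_page`, so a guest that modifies a shared page receives a frame of its own while the other guest keeps the original contents.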
The use of virtual machines to support separate guest operating systems is popular in regard to embedded systems. A typical use would be to run a real-time operating system simultaneously with a preferred complex operating system, such as Linux or Windows. Another use would be for novel and unproven software still in the developmental stage, so it runs inside a sandbox. Virtual machines have other advantages for operating system development, and may include improved debugging access and faster reboots.[5]
Multiple VMs running their own guest operating system are frequently engaged for server consolidation.[6]
Process virtual machines
A process VM, sometimes called an application virtual machine, or Managed Runtime Environment (MRE), runs as a normal application inside a host OS and supports a single process. It is created when that process is started and destroyed when it exits. Its purpose is to provide a platform-independent programming environment that abstracts away details of the underlying hardware or operating system, and allows a program to execute in the same way on any platform.
A process VM provides a high-level abstraction – that of a high-level programming language (compared to the low-level ISA abstraction of the system VM). Process VMs are implemented using an interpreter; performance comparable to compiled programming languages can be achieved by the use of just-in-time compilation.
This type of VM has become popular with the Java programming language, which is implemented using the Java virtual machine. Other examples include the Parrot virtual machine, and the .NET Framework, which runs on a VM called the Common Language Runtime. All of them can serve as an abstraction layer for any computer language.
A special case of process VMs are systems that abstract over the communication mechanisms of a (potentially heterogeneous) computer cluster. Such a VM does not consist of a single process, but one process per physical machine in the cluster. They are designed to ease the task of programming concurrent applications by letting the programmer focus on algorithms rather than the communication mechanisms provided by the interconnect and the OS. They do not hide the fact that communication takes place, and as such do not attempt to present the cluster as a single machine.
Unlike other process VMs, these systems do not provide a specific programming language, but are embedded in an existing language; typically such a system provides bindings for several languages (e.g., C and FORTRAN). Examples are PVM (Parallel Virtual Machine) and MPI (Message Passing Interface). They are not strictly virtual machines, as the applications running on top still have access to all OS services, and are therefore not confined to the system model.
History
Both system virtual machines and process virtual machines date to the 1960s, and continue to be areas of active development.
System virtual machines grew out of time-sharing, as notably implemented in the Compatible Time-Sharing System (CTSS). Time-sharing allowed multiple users to use a computer concurrently: each program appeared to have full access to the machine, but only one program was executed at a time, with the system switching between programs in time slices, saving and restoring state each time. This evolved into virtual machines, notably via IBM's research systems: the M44/44X, which used partial virtualization, and the CP-40 and SIMMON, which used full virtualization and were early examples of hypervisors. The first widely available virtual machine architecture was the CP-67/CMS; see History of CP/CMS for details. An important distinction was between using multiple virtual machines on a single host system for time-sharing, as in M44/44X and CP-40, and using a single virtual machine on a host system for prototyping, as in SIMMON. Emulators, with hardware emulation of earlier systems for compatibility, date back to the IBM 360 in 1963,[7][8] while software emulation (then called "simulation") predates it.
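The time-slicing mechanism described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the names `program` and `time_share` are hypothetical, each `yield` stands in for the end of a time slice, and a Python generator plays the role of the saved program state.

```python
from collections import deque


def program(name, steps):
    """A toy user program; each `yield` marks the end of one time slice."""
    for i in range(steps):
        yield f"{name}:{i}"


def time_share(programs):
    """Round-robin scheduler: run one slice of each ready program in turn."""
    ready = deque(programs)
    trace = []
    while ready:
        current = ready.popleft()
        try:
            trace.append(next(current))  # restore state and run one slice
        except StopIteration:
            continue                     # program finished, drop it
        ready.append(current)            # state stays saved in the generator
    return trace


print(time_share([program("A", 2), program("B", 3)]))
# ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

Only one program runs at any moment, yet each one proceeds as if it had the machine to itself.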
While emulators have continued to be used for compatibility and to combat obsolescence, time-sharing and virtualization fell relatively out of fashion from the late 1970s through the early 1990s due to the personal computing revolution, which shifted attention to individual microcomputers. With the rise of the Internet from the mid-1990s and client-server computing, attention returned to virtualization in server farms and cloud computing. A significant player since 1998 has been VMware, which virtualized the x86 architecture, allowing programs targeting personal computers to run inside virtual machines.
Process virtual machines arose originally as abstract platforms for an intermediate language used as the intermediate representation of a program by a compiler. An early example, from around 1966, was the O-code machine, a virtual machine that executes O-code (object code) emitted by the front end of the BCPL compiler. This abstraction allowed the compiler to be easily ported to a new architecture by implementing a new back end that took the existing O-code and compiled it to machine code for the underlying physical machine. The Euler language used a similar design, with the intermediate language known as P (portable).[9] This was popularized around 1970 by Pascal, notably in the Pascal-P system (1973) and Pascal-S compiler (1975), in which it was known as p-code and the resulting machine as a p-code machine. This has been influential, and virtual machines in this sense have often been called p-code machines. In addition to being an intermediate language, Pascal p-code was also executed directly by an interpreter implementing the virtual machine, notably in UCSD Pascal (1978); this influenced later interpreters, notably the Java virtual machine (JVM). Another early example was SNOBOL4 (1967), which was written in the SNOBOL Implementation Language (SIL), an assembly language for a virtual machine, which was then targeted to physical machines by transpilation to their native assembler via a macro assembler.[10] Macros have since fallen out of favor, however, so this approach has been less influential.
Significant advances occurred in the implementation of Smalltalk, particularly Smalltalk-80, in the Squeak Virtual Machine, VisualWorks, and later implementations such as Strongtalk and the Self dialect. These included important performance techniques such as just-in-time (JIT) compilation and adaptive optimization, which in 1999 proved commercially successful in the HotSpot Java virtual machine. Other innovations include having a register-based virtual machine, to better match the underlying hardware, rather than a stack-based virtual machine, which is a closer match for the programming language; in 1995, this was pioneered by the Dis virtual machine for the Limbo language.
Modern uses of virtual machines, either as an intermediate target for compilation or as a platform to implement directly in an interpreter, continue as seen in the Android Runtime (ART) released in 2013, which compiles bytecode to native code, and the earlier Dalvik virtual machine, which interprets bytecode.
Hardware virtualization techniques
Virtualization of the underlying raw hardware (native execution)
This approach is described as full virtualization of the hardware, and can be implemented using a type 1 or type 2 hypervisor: a type 1 hypervisor runs directly on the hardware, and a type 2 hypervisor runs on another operating system, such as Linux or Windows. Each virtual machine can run any operating system supported by the underlying hardware. Users can thus run two or more different "guest" operating systems simultaneously, in separate "private" virtual computers.
The pioneer system using this concept was IBM's CP-40, the first (1967) version of IBM's CP/CMS (1967–1972) and the precursor to IBM's LPAR VM family (1972–present). With the VM architecture, most users run a relatively simple interactive single-user operating system, CMS, as a "guest" on top of the VM control program (VM-CP). This approach kept the CMS design simple, as if it were running alone; the control program quietly provides multitasking and resource management services "behind the scenes". In addition to CMS, communication and other system tasks are performed by multitasking VMs (RSCS, GCS, TCP/IP, UNIX), and users can run any of the other IBM operating systems, such as MVS (now z/OS), or even a new CP itself. Even the simple CMS could be run in a threaded environment (LISTSERV, TRICKLE). z/VM is the current version of VM, and is used to support hundreds or thousands of virtual machines on a given mainframe. Some installations use Linux on z Systems to run Web servers, where Linux runs as the operating system within many virtual machines.
Full virtualization is particularly helpful in operating system development, when experimental new code can be run at the same time as older, more stable, versions, each in a separate virtual machine. The process can even be recursive: IBM debugged new versions of its virtual machine operating system, VM, in a virtual machine running under an older version of VM, and even used this technique to simulate new hardware.[11]
The standard x86 processor architecture as used in the modern PCs does not actually meet the Popek and Goldberg virtualization requirements. Notably, there is no execution mode where all sensitive machine instructions always trap, which would allow per-instruction virtualization.
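The requirement can be illustrated with a toy trap-and-emulate model. It is a sketch under stated assumptions rather than a description of any real hypervisor: the class and function names are hypothetical, and the instruction names (`CLI`, `STI`, and a `POPF_CLEAR_IF` pseudo-instruction standing for a flags write that clears the interrupt flag) are simplified stand-ins for real instructions.

```python
class Trap(Exception):
    """Raised when deprivileged guest code executes a trapping instruction."""


class VirtualCPU:
    """Privileged state that the hypervisor keeps on behalf of one guest."""

    def __init__(self):
        self.interrupts_enabled = True


def execute_deprivileged(instr, traps_on):
    """Model the real CPU running guest kernel code without privilege.

    Instructions in `traps_on` raise Trap. Any other sensitive instruction
    is silently ignored, so the hypervisor never learns that it ran.
    """
    if instr in traps_on:
        raise Trap(instr)


def run_guest(program_text, traps_on):
    """Trap-and-emulate loop: apply trapped instructions to the virtual CPU."""
    vcpu = VirtualCPU()
    for instr in program_text:
        try:
            execute_deprivileged(instr, traps_on)
        except Trap:
            if instr in ("CLI", "POPF_CLEAR_IF"):
                vcpu.interrupts_enabled = False
            elif instr == "STI":
                vcpu.interrupts_enabled = True
    return vcpu


guest_code = ["POPF_CLEAR_IF"]  # the guest kernel wants interrupts off

# Every sensitive instruction traps: the virtual state follows the guest.
virtualizable = run_guest(guest_code, {"CLI", "STI", "POPF_CLEAR_IF"})
print(virtualizable.interrupts_enabled)  # False

# One sensitive instruction does not trap: the virtual state goes stale.
non_virtualizable = run_guest(guest_code, {"CLI", "STI"})
print(non_virtualizable.interrupts_enabled)  # True
```

In the second run the guest believes it has disabled interrupts, but the hypervisor was never given the chance to record that, which is the kind of gap that software techniques and later hardware extensions had to close.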
Despite these limitations, several software packages have managed to provide virtualization on the x86 architecture. Dynamic recompilation of privileged code, as first implemented by VMware, incurs some performance overhead compared to a VM running on a natively virtualizable architecture such as the IBM System/370 or Motorola MC68020. Other software packages, such as Virtual PC, VirtualBox, Parallels Workstation and Virtual Iron, also implement virtualization on x86 hardware.
Intel and AMD have introduced features to their x86 processors to enable virtualization in hardware.
As well as virtualization of the resources of a single machine, multiple independent nodes in a cluster can be combined and accessed as a single virtual NUMA machine.[12]
Emulation of a non-native system
Virtual machines can also perform the role of an emulator, allowing software applications and operating systems written for another computer processor architecture to be run. Emulation may be entirely in software, or may also include a hardware component that may also involve use of microcode.
Operating-system-level virtualization
Operating-system-level virtualization is a server virtualization technology which virtualizes servers on an operating system (kernel) layer. It can be thought of as partitioning: a single physical server is sliced into multiple small partitions (otherwise called virtual environments (VE), virtual private servers (VPS), guests, zones, etc.); each such partition looks and feels like a real server, from the point of view of its users.
For example, Solaris Zones supports multiple guest operating systems running under the same operating system, such as Solaris 10.[13] Guest operating systems can use the same kernel level with the same operating system version, or can be a separate copy of the operating system with a different kernel version using Solaris Kernel Zones.[14] Solaris native Zones also require that the host operating system be a version of Solaris; other operating systems from other manufacturers are not supported. However, Solaris Branded Zones would need to be used to run other operating systems as zones.
Another example is System Workload Partitions (WPARs), introduced in version 6.1 of the IBM AIX operating system. System WPARs are software partitions running under one instance of the global AIX OS environment.
The operating-system-level architecture has low overhead, which helps to maximize efficient use of server resources. The virtualization introduces only negligible overhead and allows running hundreds of virtual private servers on a single physical server. In contrast, approaches such as full virtualization (like VMware) and paravirtualization (like Xen or UML) cannot achieve such a level of density, due to the overhead of running multiple kernels. On the other hand, operating-system-level virtualization does not allow running different operating systems (i.e., different kernels), although different libraries, distributions, etc. are possible.
Abstract virtual machine techniques
Virtual machines that execute code for an abstract machine, which is given by a detailed specification rather than an existing physical device, are traditionally known as p-code machines.
These virtual machines can be used either as abstract platforms for an intermediate language used as the intermediate representation of a program by a compiler, or be executed directly by an interpreter implementing the virtual machine. The use of an intermediate language improves compiler portability by dividing the compiler into a front end, which compiles to the intermediate language, and a back end, which compiles the intermediate language to machine code for the underlying physical machine. Interpreting the source code or bytecode provides the same portability benefit, as only the virtual machine software itself must be written separately for each type of computer on which it runs. Interpretation also avoids the need to compile code before execution (ahead-of-time (AOT) compilation), and allows more complex behavior at runtime, such as supporting many features of dynamic programming languages (reflection, for example) and runtime optimization (adaptive optimization by dynamic recompilation).
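The split between a front end, an intermediate language, and interchangeable back ends can be sketched as follows. The instruction set (`PUSH`, `LOAD`, `ADD`, `SUB`, `MUL`) and all function names are invented for this example and do not correspond to any real p-code; the second back end emits an infix string that merely stands in for machine code.

```python
import ast
import operator

OPCODES = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
ARITH = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}
SYMBOL = {"ADD": "+", "SUB": "-", "MUL": "*"}


def front_end(source):
    """Compile an arithmetic expression to stack-machine intermediate code."""
    code = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.Name):
            code.append(("LOAD", node.id))
        elif isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
            emit(node.left)
            emit(node.right)
            code.append((OPCODES[type(node.op)], None))
        else:
            raise ValueError("unsupported syntax")

    emit(ast.parse(source, mode="eval").body)
    return code


def interpret(code, env):
    """Back end 1: execute the intermediate code directly."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(env[arg])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(ARITH[op](left, right))
    return stack.pop()


def translate(code):
    """Back end 2: translate the same code to target text."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(repr(arg))
        elif op == "LOAD":
            stack.append(arg)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left} {SYMBOL[op]} {right})")
    return stack.pop()


ir = front_end("(x + 2) * y - 1")
print(interpret(ir, {"x": 3, "y": 4}))  # 19
print(translate(ir))                    # (((x + 2) * y) - 1)
```

Supporting a new target means writing only a new back end or porting the interpreter; the front end and the intermediate code stay unchanged.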
Additionally, a single intermediate language can be targeted by many high-level languages: Java, Scala and Clojure all translate to JVM bytecode, and many languages (such as C#, F#, and Visual Basic .NET) translate to the Common Intermediate Language (CIL) of the .NET Framework.
Notable examples include the following:
- One of the first was the p-code machine specification, which allowed programmers to write Pascal programs that would run on any computer running virtual machine software that correctly implemented the specification.
- The specification of the Java virtual machine
- The Common Language Infrastructure virtual machine at the heart of Microsoft's .NET initiative.
- Open Firmware allows plug-in hardware to include boot-time diagnostics, configuration code, and device drivers that can run on any kind of a CPU.
Implementation techniques for virtual machines include just-in-time (JIT) compilation and dynamic recompilation of hot spots (trace-based JIT), notably in the HotSpot virtual machine.
Virtual machine design must trade off closeness to the source language (simplifying the front end) against closeness to the machine language (simplifying the back end), while also accounting for the common case of multiple source or target languages. An important distinction is between stack-based and register-based virtual machines: stack-based machines are closer to the source language, since the nested expressions and recursion of most programming languages map naturally to a stack, while register-based virtual machines are closer to the target machine language, since most processors use registers internally (register machines, not stack machines). Earlier virtual machines were typically stack-based, from Pascal p-code (1970) through the Java virtual machine (1995), but newer virtual machines are generally register-based, which simplifies memory-to-memory mapping in the interpreter, avoids excessive copying of values, and reduces the total number of instructions per function. The register-based design dates to the Dis virtual machine in 1995, and has since been widely used, in the Lua virtual machine (since Lua 5.0 in 2003), Perl 6's Parrot, and Android's Dalvik.
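The difference in instruction count can be seen in a small sketch that evaluates `d = (a + b) * c` on both kinds of machine. Both instruction sets and the names `run_stack` and `run_register` are made up for the illustration; real virtual machines use more compact encodings and far richer instruction sets.

```python
import operator

ARITH = {"ADD": operator.add, "MUL": operator.mul}


def run_stack(code, variables):
    """Stack machine: operands are pushed and popped implicitly."""
    stack = []
    for instr in code:
        op = instr[0]
        if op == "LOAD":
            stack.append(variables[instr[1]])
        elif op == "STORE":
            variables[instr[1]] = stack.pop()
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(ARITH[op](left, right))
    return len(code)  # straight-line code: instructions executed


def run_register(code, registers):
    """Register machine: every instruction names its destination and sources."""
    for op, dest, src1, src2 in code:
        registers[dest] = ARITH[op](registers[src1], registers[src2])
    return len(code)


stack_code = [
    ("LOAD", "a"), ("LOAD", "b"), ("ADD",),
    ("LOAD", "c"), ("MUL",), ("STORE", "d"),
]
register_code = [
    ("ADD", "t", "a", "b"),  # t = a + b
    ("MUL", "d", "t", "c"),  # d = t * c
]

stack_vars = {"a": 2, "b": 3, "c": 4}
reg_vars = {"a": 2, "b": 3, "c": 4}
print(run_stack(stack_code, stack_vars), stack_vars["d"])      # 6 20
print(run_register(register_code, reg_vars), reg_vars["d"])    # 2 20
```

The register form needs fewer instructions because values are not copied to and from an operand stack, at the cost of wider instructions that must encode their operands.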
Virtualization-enabled hardware
Examples of virtualization-enabled hardware include the following:
- Alcatel-Lucent 3B20D/3B21D emulated on commercial off-the-shelf computers with 3B20E or 3B21E system
- AMD-V (formerly code-named Pacifica)
- ARM TrustZone
- Boston Circuits gCore (grid-on-chip) with 16 ARC 750D cores and Time-machine hardware virtualization module.
- Freescale PowerPC MPC8572 and MPC8641D
- IBM System/370, System/390, and zSeries mainframes
- IBM Power Systems
- Intel VT-x (formerly code-named Vanderpool)
- HP vPAR and cell based nPAR
- GE Project MAC, then Honeywell Multics systems
- Honeywell 200/2000 systems Liberator replacing IBM 14xx systems, Level 62/64/66 GCOS
- IBM System/360 Model 145 Hardware emulator for Honeywell 200/2000 systems
- RCA Spectra/70 Series emulated IBM System/360
- NAS CPUs emulated IBM and Amdahl machines
- Honeywell Level 6 minicomputers emulated predecessor 316/516/716 minis
- Oracle Corporation (previously Sun Microsystems) SPARC sun4v (SPARC M6, T5, T4, T3, UltraSPARC T1 and T2) – utilized by Oracle VM Server for SPARC, also known as "Logical Domains"
- Xerox Sigma 6 CPUs were modified to emulate GE/Honeywell 600/6000 systems
References
- "Virtual Machines: Virtualization vs. Emulation". Retrieved 2011-03-11.
- Smith, James; Nair, Ravi (2005). "The Architecture of Virtual Machines". Computer (IEEE Computer Society) 38 (5): 32–38. doi:10.1109/MC.2005.173.
- Smith and Nair, pp. 395–396
- Oliphant, Patrick. "Virtual Machines". VirtualComputing. Retrieved 23 September 2015. Some people use that capability to set up a separate virtual machine running Windows on a Mac, giving them access to the full range of applications available for both platforms.
- Super Fast Server Reboots – Another reason Virtualization rocks. vmwarez.com (2006-05-09). Retrieved on 2013-06-14.
- "Server Consolidation and Containment With Virtual Infrastructure" (PDF). VMware. 2007. Retrieved 2015-09-29.
- Pugh, Emerson W. (1995). Building IBM: Shaping an Industry and Its Technology. MIT. p. 274. ISBN 0-262-16147-8.
- Pugh, Emerson W.; et al. (1991). IBM's 360 and Early 370 Systems. MIT. ISBN 0-262-16123-0. pages 160-161
- Wirth, N.; Weber, H. (1966). EULER: a generalization of ALGOL, and its formal definition: Part II, Communications of the Association for Computing Machinery, Vol.9, No.2, pp.89-99. New York: ACM.
- Griswold, Ralph E. The Macro Implementation of SNOBOL4. San Francisco, CA: W. H. Freeman and Company, 1972 (ISBN 0-7167-0447-1), Chapter 1.
- See History of CP/CMS for IBM's use of virtual machines for operating system development and simulation of new hardware
- Matthew Chapman and Gernot Heiser. "vNUMA: A virtual shared-memory multiprocessor". Proceedings of the 2009 USENIX Annual Technical Conference, San Diego, CA, USA, June, 2009
- "Oracle Solaris Zones Overview". docs.oracle.com. Retrieved 2015-06-26.
- "About Oracle Solaris Kernel Zones". docs.oracle.com. Retrieved 2015-06-26.
Further reading
- James E. Smith, Ravi Nair, Virtual Machines: Versatile Platforms For Systems And Processes, Morgan Kaufmann, May 2005, ISBN 1-55860-910-5, 656 pages (covers both process and system virtual machines)
- Craig, Iain D. Virtual Machines. Springer, 2006, ISBN 1-85233-969-1, 269 pages (covers only process virtual machines)
External links
- The Reincarnation of Virtual Machines, Article on ACM Queue by Mendel Rosenblum, Co-Founder, VMware
- Sandia National Laboratories Runs 1 Million Linux Kernels as Virtual Machines
- The design of the Inferno virtual machine by Phil Winterbottom and Rob Pike
- Software Portability by Virtual Machine Emulation by Stefan Vorkoetter
- Create new Virtual Machine in Windows Azure by Rahul Vijay Manekari