Self-hosting
Self-hosting is the use of a computer program as part of the toolchain or operating system that produces new versions of that same program—for example, a compiler that can compile its own source code. Self-hosting software is commonplace on personal computers and larger systems. Other programs that are typically self-hosting include kernels, assemblers, command-line interpreters and revision control software.
If a system is so new that no software has been written for it, then software is developed on another self-hosting system and placed on a storage device that the new system can read. Development continues this way until the new system can reliably host its own development. Writing new software development tools "from the metal" (that is, without using another host system) is rare and in many cases impractical.
History
The first self-hosting compiler (excluding assemblers) was written for Lisp by Hart and Levin at MIT in 1962. They wrote a Lisp compiler in Lisp, testing it inside an existing Lisp interpreter. Once they had improved the compiler to the point where it could compile its own source code, it was self-hosting.[1]
"The compiler as it exists on the standard compiler tape is a machine language program that was obtained by having the S-expression definition of the compiler work on itself through the interpreter." (AI Memo 39)[1]
This technique is only possible when an interpreter already exists for the very same language that is to be compiled. It borrows directly from the notion of running a program on itself as input, which is also used in various proofs in theoretical computer science, such as the proof that the halting problem is undecidable.
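The self-application described in the memo can be illustrated with a toy model in Python. This is only a sketch under stated assumptions, not a reconstruction of the Hart and Levin compiler: the names `COMPILER_SRC`, `compile_source` and `load_compiler` are hypothetical, Python's built-in `compile` stands in for real code generation, and a Python code object stands in for machine language.

```python
# Toy model of bootstrapping a compiler through an interpreter.
# All names here (COMPILER_SRC, compile_source, load_compiler) are hypothetical.

# The "compiler", kept as source text so that it can be given to itself as input.
# Its target "machine language" is simply a Python code object.
COMPILER_SRC = '''
def compile_source(source_text):
    """Translate source text into a code object (the toy machine code)."""
    return compile(source_text, "<toy>", "exec")
'''


def load_compiler(program):
    """Run a compiler (source text or compiled code) and return its entry point."""
    namespace = {}
    exec(program, namespace)
    return namespace["compile_source"]


# Stage 0: the compiler's source runs directly in the host interpreter.
stage0 = load_compiler(COMPILER_SRC)

# Stage 1: the interpreted compiler compiles its own source.
stage1_code = stage0(COMPILER_SRC)
stage1 = load_compiler(stage1_code)

# Stage 2: the compiled compiler compiles its own source again.
stage2_code = stage1(COMPILER_SRC)

# A common sanity check: both stages should emit identical code.
print(stage1_code.co_code == stage2_code.co_code)

# The compiled compiler also handles other programs.
result = {}
exec(stage1("answer = 6 * 7"), result)
print(result["answer"])
```

Once stage 1 exists, the interpreter is no longer needed to build the compiler: the compiled compiler can rebuild itself from its own source. Comparing the output of successive stages is a common way to check that such a bootstrap has reached a stable result.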
Examples
Ken Thompson started development on Unix in 1968 by writing and compiling programs on the GE-635 and carrying them over to the PDP-7 for testing. Once the initial Unix kernel, a command interpreter, an editor, an assembler, and a few utilities were completed, the Unix operating system was self-hosting: programs could be written and tested on the PDP-7 itself.[2]
Development of the Linux kernel was initially hosted on a Minix system. Once enough packages, such as GCC, GNU Bash and other utilities, had been ported over, developers could build new versions of the Linux kernel on systems running older versions of it (for example, building kernel 3.19 on a machine running kernel 3.18). The same procedure is followed when building a new Linux distribution from scratch.
Many programming languages have self-hosted implementations: compilers that are written in the same language they compile. Such languages include Ada, BASIC, C, C++,[3] C#,[4] CoffeeScript, Dylan, F#, FASM, Forth, Gambas, Go, Haskell, Java, Lisp, Modula-2, OCaml, Oberon, Pascal, Python, Rust, Scala, Smalltalk, Vala, and Visual Basic.[4]
In some of these cases, the initial implementation was not self-hosted, but rather, written in another language (or even in machine language); in other cases, the initial implementation was developed using bootstrapping.
See also
- Bootstrapping (compilers)
- Cross compiler
- Dogfooding
- Futamura projection
- Self-interpreter
- Self-reference
References
- [1] Tim Hart and Mike Levin. "AI Memo 39-The new compiler" (PDF). Retrieved 2008-05-23.
- [2] Dennis M. Ritchie. "The Development of the C Language". 1993.
- [3] GCC 4.8, LLVM/Clang
- [4] Mono gmcs and Microsoft Roslyn