Thunk
In computer programming, a thunk is a subroutine that is created, often automatically, to assist a call to another subroutine. Thunks are primarily used to represent an additional calculation that a subroutine needs to execute, or to call a routine that does not support the usual calling mechanism. They also have a variety of other applications in compiler code generation and modular programming.
The term originated as a jocular past tense of "think".[1]
Background
The early years of compiler research saw broad experimentation with different evaluation strategies. A key question was how to compile a subroutine call if the arguments can be arbitrary mathematical expressions rather than constants. One approach, known as "call by value," calculates all of the arguments before the call and then passes the resulting values to the subroutine. In the rival "call by name" approach, the subroutine receives the unevaluated argument expression and must evaluate it.
A simple implementation of "call by name" might substitute the code of an argument expression for each appearance of the corresponding parameter in the subroutine, but this can produce multiple versions of the subroutine and multiple copies of the expression code. As an improvement, the compiler can generate a helper subroutine, called a thunk, that calculates the value of the argument. The address of this helper subroutine is then passed to the original subroutine in place of the original argument, where it can be called as many times as needed. Prof. Peter Ingerman first described thunks in reference to the ALGOL 60 programming language, which supported call-by-name evaluation.[2]
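The helper-subroutine approach can be sketched by hand in C++, which is otherwise a call-by-value language. This is only an illustration of the idea, not the code an ALGOL 60 compiler would emit: `std::function` stands in for the address of the helper subroutine, and the names `Thunk`, `sum_by_name` and `demo_call_by_name` are hypothetical.

```cpp
#include <functional>

// A thunk: a parameterless helper that computes the value of an argument
// expression each time it is called.
using Thunk = std::function<int()>;

// Sums `term` for i = lo..hi. `term` is passed "by name": it is re-evaluated
// on every iteration, so it observes the current value of `i`.
int sum_by_name(int& i, int lo, int hi, const Thunk& term) {
    int total = 0;
    for (i = lo; i <= hi; ++i) {
        total += term();  // evaluate the argument expression now
    }
    return total;
}

// Hypothetical caller: computes 1*1 + 2*2 + 3*3 + 4*4.
int demo_call_by_name() {
    int i = 0;
    // Hand-written equivalent of a compiler-generated thunk for `i * i`.
    Thunk term = [&i] { return i * i; };
    return sum_by_name(i, 1, 4, term);
}
```

Only one copy of `sum_by_name` and one copy of the expression code exist, yet the expression is evaluated once per loop iteration, which is the saving over textual substitution.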
Applications
Functional programming
Although the software industry largely standardized on call-by-value and call-by-reference evaluation,[3] active study of call-by-name continued in the functional programming community. This research produced a series of lazy evaluation programming languages in which some variant of call-by-name is the standard evaluation strategy. Compilers for these languages, such as the Glasgow Haskell Compiler, have relied heavily on thunks, with the added feature that the thunks save their initial result so that they can avoid recalculating it;[4] this is known as memoization.
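A memoizing thunk of this kind can be approximated in C++ as follows. This is a minimal single-threaded sketch, assuming C++17; it does not reflect how any particular compiler represents thunks internally, and the names `Lazy`, `force` and `demo_evaluation_count` are hypothetical.

```cpp
#include <functional>
#include <optional>
#include <utility>

// A thunk that caches its result after the first evaluation.
template <typename T>
class Lazy {
public:
    explicit Lazy(std::function<T()> compute) : compute_(std::move(compute)) {}

    // Returns the value, running the suspended computation at most once.
    const T& force() {
        if (!cache_) {
            cache_ = compute_();
            compute_ = nullptr;  // release anything the closure captured
        }
        return *cache_;
    }

private:
    std::function<T()> compute_;
    std::optional<T> cache_;
};

// Hypothetical demo: counts how often the suspended expression actually runs.
int demo_evaluation_count() {
    int evaluations = 0;
    Lazy<int> value([&evaluations] { ++evaluations; return 6 * 7; });
    int sum = value.force() + value.force();  // second call hits the cache
    return sum == 84 ? evaluations : -1;
}
```

Here `demo_evaluation_count` reports a single evaluation even though the value is requested twice.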
Functional programming languages have also allowed programmers to explicitly generate thunks. This is done in source code by wrapping an argument expression in an anonymous function that has no parameters of its own. This prevents the expression from being evaluated until a receiving function calls the anonymous function, thereby achieving the same effect as call-by-name.[5] The adoption of anonymous functions into other programming languages has made this capability widely available.
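The same technique is available in C++ through lambda expressions. In the sketch below, the fallback argument is wrapped in a parameterless lambda so that it is evaluated only if the receiving function decides it is needed. The function `lookup_or` and the demo around it are hypothetical names chosen for illustration.

```cpp
#include <map>
#include <string>

// Returns the value stored under `key`, or the result of calling `fallback`
// if the key is absent. `fallback` is a thunk: it runs only when needed.
template <typename Fallback>
int lookup_or(const std::map<std::string, int>& table,
              const std::string& key, Fallback fallback) {
    auto it = table.find(key);
    return it != table.end() ? it->second : fallback();
}

// Hypothetical demo: reports how many times the fallback expression ran.
int demo_fallback_calls() {
    std::map<std::string, int> table{{"a", 1}};
    int calls = 0;
    auto expensive_default = [&calls] { ++calls; return -1; };
    lookup_or(table, "a", expensive_default);  // key present: thunk not called
    lookup_or(table, "b", expensive_default);  // key absent: thunk called once
    return calls;
}
```

Had the fallback been passed as a plain `int`, its expression would have been evaluated before both calls regardless of whether the key was present.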
Object-oriented programming
Thunks are useful in object-oriented programming platforms that allow a class to inherit multiple interfaces, leading to situations where the same method might be called via any of several interfaces. The following code illustrates such a situation in C++.
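class A {
public:
    int value = 0;
    virtual int access() { return this->value; }
};
class B {
public:
    int value = 0;
    virtual int access() { return this->value; }
};
class C : public A, public B {
public:
    int better_value = 0;
    // Overrides both A::access and B::access.
    int access() override { return this->better_value; }
};
int use(B *b) {
    return b->access();
}
// ...
int main() {
    B someB;
    use(&someB);  // dispatches to B::access
    C someC;
    use(&someC);  // C* converts to B*; dispatches to C::access
}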
In this example, the code generated for each of the classes A, B and C will include a dispatch table that can be used to call access on an object of that type, via a reference that has the same type. Class C will have an additional dispatch table, used to call access on an object of type C via a reference of type B. The expression b->access() will use B's own dispatch table or the additional C table, depending on the type of object b refers to. If it refers to an object of type C, the compiler must ensure that C's access implementation receives an instance address for the entire C object, rather than the inherited B part of that object.[6]
As a direct approach to this pointer adjustment problem, the compiler can include an integer offset in each dispatch table entry. This offset is the difference between the reference's address and the address required by the method implementation. The code generated for each call through these dispatch tables must then retrieve the offset and use it to adjust the instance address before calling the method.
The solution just described has problems similar to the naïve implementation of call-by-name described earlier: the compiler generates several copies of code to calculate an argument (the instance address), while also increasing the dispatch table sizes to hold the offsets. As an alternative, the compiler can generate an adjustor thunk along with C's implementation of access that adjusts the instance address by the required amount and then calls the method. The thunk can appear in C's dispatch table for B, thereby eliminating the need for callers to adjust the address themselves.[7]
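The adjustor-thunk arrangement can be imitated by hand, without virtual functions, to make the moving parts visible. The sketch below is an assumption-laden model rather than what any compiler actually emits: the types `PartA`, `PartB` and `Whole`, the one-entry table and all function names are hypothetical, and `static_cast` is used to perform the address adjustment that a real thunk would do with a fixed offset.

```cpp
struct PartA { int a_value = 1; };
struct PartB { int b_value = 2; };
struct Whole : PartA, PartB { int better_value = 3; };

// The method implementation expects the address of the entire object.
int whole_access(Whole* self) { return self->better_value; }

// Adjustor thunk: receives the address of the PartB subobject, converts it
// to the address of the enclosing Whole, then calls the real method.
int whole_access_via_b(PartB* self) {
    return whole_access(static_cast<Whole*>(self));
}

// One-entry stand-in for the dispatch table used through PartB pointers.
using BSlot = int (*)(PartB*);
const BSlot whole_table_for_b[] = { whole_access_via_b };

// Caller side: no offset arithmetic, just an indirect call through the table.
int use_b(PartB* b, const BSlot* table) { return table[0](b); }

// Hypothetical demo: calls the Whole method through a PartB pointer.
int demo_adjustor() {
    Whole w;
    PartB* as_b = &w;  // implicit conversion points at the PartB subobject
    return use_b(as_b, whole_table_for_b);
}
```

The caller in `use_b` is identical for every object reachable through a `PartB` pointer; the adjustment code exists once, inside the thunk, rather than at each call site.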
Interoperability
Thunks have been widely used to provide interoperability between software modules whose routines cannot call each other directly, as in the following cases.
- The routines have different calling conventions or use different representations for arguments.
- The routines run in different CPU modes, or different address spaces, or at least one runs in a virtual machine.
A compiler (or other tool) can solve this problem by generating a thunk that automates the additional steps needed to call the target routine, whether that is transforming arguments, copying them to another location, or switching the CPU mode. A successful thunk minimizes the extra work the caller must do compared to a normal call.
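As a small illustration of the argument-transformation case, the sketch below wraps a routine that expects its arguments in a packed block of 16-bit fields behind a thunk with an ordinary signature. Everything here is hypothetical (`LegacyParams`, `legacy_area`, `area_thunk`); real interoperability thunks of the kind discussed in this section also deal with stack layout, segment registers or CPU mode switches, none of which can be shown in portable C++.

```cpp
#include <cstdint>

// Hypothetical legacy routine: takes its arguments packed in a parameter
// block of 16-bit fields and reports its result through the same block.
struct LegacyParams {
    std::int16_t width;
    std::int16_t height;
    std::int32_t area;  // output field
};

void legacy_area(LegacyParams* p) {
    p->area = static_cast<std::int32_t>(p->width) * p->height;
}

// Thunk: presents an ordinary signature to the caller, converts the
// arguments to the legacy representation, calls the target, and converts
// the result back. Returns false if an argument does not fit in 16 bits.
bool area_thunk(int width, int height, int& area_out) {
    if (width < INT16_MIN || width > INT16_MAX ||
        height < INT16_MIN || height > INT16_MAX) {
        return false;
    }
    LegacyParams p{static_cast<std::int16_t>(width),
                   static_cast<std::int16_t>(height), 0};
    legacy_area(&p);
    area_out = p.area;
    return true;
}
```

The caller sees only `area_thunk` and never builds a parameter block itself, which is the sense in which the thunk minimizes the caller's extra work.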
Much of the literature on interoperability thunks relates to various Wintel platforms, including MS-DOS, OS/2,[8] Windows[9][10] and .NET, and to the transition from 16-bit to 32-bit memory addressing. As customers have migrated from one platform to another, thunks have been essential to support legacy software written for the older platforms.
Overlays and dynamic linking
On systems that lack automatic virtual memory hardware, thunks can implement a limited form of virtual memory known as overlays. With overlays, a developer divides a program's code into segments that can be loaded and unloaded independently, and identifies the entry points into each segment. A segment that calls into another segment must do so indirectly via a branch table. When a segment is in memory, its branch table entries jump into the segment. When a segment is unloaded, its entries are replaced with "reload thunks" that can reload it on demand.[11]
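The branch-table mechanism can be simulated with function pointers. In the following sketch the "segment" is an ordinary function and "loading" merely flips a flag and bumps a counter, so it models only the control flow of a reload thunk, not real overlay management; all names are hypothetical.

```cpp
// Simulated overlay with one entry point. In a real overlay manager the
// "load" step would read code from disk; here it only updates bookkeeping.
using EntryPoint = int (*)(int);

int segment_square(int x) { return x * x; }  // code living in the segment
int reload_thunk(int x);                      // defined below

bool segment_loaded = false;
int load_count = 0;
EntryPoint branch_table[1] = { reload_thunk };  // segment starts unloaded

// Reload thunk: loads the segment, patches the branch table so later calls
// jump straight into the segment, then completes the original call.
int reload_thunk(int x) {
    segment_loaded = true;
    ++load_count;
    branch_table[0] = segment_square;
    return branch_table[0](x);
}

// Unloading puts the reload thunk back into the table.
void unload_segment() {
    segment_loaded = false;
    branch_table[0] = reload_thunk;
}

// Callers in other segments always go through the branch table.
int call_square(int x) { return branch_table[0](x); }
```

The first call to `call_square` triggers a load; subsequent calls go directly to `segment_square` until `unload_segment` restores the thunk.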
Similarly, systems that can dynamically link several modules into a program at run-time can rely on thunks as bridges between the modules. Each module has a table of thunks that it uses to call the routines it needs from other modules. The linker can fill in these tables based on the locations of the modules in memory, without having to keep track of each external call in each module.[12]
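A rough model of such a table is shown below. Function-pointer slots stand in for the per-module thunks, and a `std::map` stands in for the exporting module's symbol table; the names `ImportTable`, `link_imports`, `client_compute` and the two exported routines are hypothetical and do not correspond to any real linker's data structures.

```cpp
#include <map>
#include <string>

using Routine = int (*)(int);

// Routines exported by a hypothetical "math" module.
int math_double(int x) { return 2 * x; }
int math_negate(int x) { return -x; }

// The client module's import table: one slot per external routine it uses.
struct ImportTable {
    Routine double_it = nullptr;
    Routine negate = nullptr;
};

// Client code never names a target address; every external call goes
// through a slot in the table.
int client_compute(const ImportTable& imports, int x) {
    return imports.negate(imports.double_it(x));
}

// Stand-in for the linker: resolves each slot once from the exporting
// module's symbol table. Returns false if a symbol is missing.
bool link_imports(const std::map<std::string, Routine>& exports,
                  ImportTable& imports) {
    auto d = exports.find("double");
    auto n = exports.find("negate");
    if (d == exports.end() || n == exports.end()) return false;
    imports.double_it = d->second;
    imports.negate = n->second;
    return true;
}
```

However many times `client_compute` calls an external routine, the linker stand-in touches only the two table slots, which is the bookkeeping saving the paragraph above describes.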
See also
Thunk technologies
- DOS Protected Mode Interface
- J/Direct
- Microsoft Layer for Unicode
- Platform Invocation Services
- Win32s
- Windows on Windows
- WoW64
Related concepts
- Anonymous function
- Futures and promises
- Remote procedure call
- Shim (computing)
- Trampoline (computing)
- Reducible expression
References
- ↑ Eric Raymond rejects "a couple of onomatopoeic myths circulating about the origin of this term" and cites the inventors of the thunk recalling that the term "was coined after they realized (in the wee hours after hours of discussion) that the type of an argument in Algol-60 could be figured out in advance with a little compile-time thought [...] In other words, it had 'already been thought of'; thus it was christened a thunk, which is 'the past tense of "think" at two in the morning'." See: Raymond, Eric S., ed. (1996). The New Hacker's Dictionary. MIT Press. p. 445. ISBN 9780262680929. Retrieved 2015-05-25.
- ↑ Ingerman, P. Z. (1961). "Thunks: A Way of Compiling Procedure Statements with Some Comments on Procedure Declarations". Communications of the ACM 4 (1). doi:10.1145/366062.366084.
- ↑ Scott, Michael (2009). Programming Language Pragmatics. p. 395.
- ↑ Marlow, Simon (2013). Parallel and Concurrent Programming in Haskell. p. 10.
- ↑ Queinnec, Christian (2003). Lisp in Small Pieces. p. 176.
- ↑ Stroustrup, Bjarne (Fall 1989). "Multiple Inheritance for C++" (PDF). Computing Systems (USENIX) 1 (4). Retrieved 4 August 2014.
- ↑ Driesen, Karel; Hölzle, Urs (1996). "The Direct Cost of Virtual Function Calls in C++" (PDF). OOPSLA. Retrieved 24 February 2011.
- ↑ Calcote, John (May 1995), "Thunking: Using 16-Bit Libraries in OS/2 2.0", OS/2 Developer Magazine 7 (3)
- ↑ King, Adrian (1994), Inside Windows 95
- ↑ Hazzah, Karen (1996), Writing Windows VxDs and Device Drivers
- ↑ Bright, Walter (1990-07-01). "Virtual Memory For 640K DOS". Dr. Dobb's Journal. Retrieved 2014-03-06.
- ↑ Levine, John R. (2000). Linkers and Loaders.