Shebang (Unix)
In computing, a shebang is the character sequence consisting of the characters number sign and exclamation mark (#!) at the beginning of a script. It is also called sha-bang,[1][2] hashbang,[3][4] pound-bang,[5] or hash-pling.[6]
Under Unix-like operating systems, when a script with a shebang is run as a program, the program loader parses the rest of the script's initial line as an interpreter directive; the specified interpreter program is run instead, passing to it as an argument the path that was initially used when attempting to run the script.[7] For example, if a script is named with the path path/to/script, and it starts with the following line:
#!/bin/sh
then the program loader is instructed to run the program /bin/sh instead, passing path/to/script as the first argument.
The shebang line is usually ignored by the interpreter because the "#" character is a comment marker in many scripting languages; some language interpreters that do not use the hash mark to begin comments (such as Scheme) may still ignore the shebang line in recognition of its purpose.[8] Other solutions rely on a preprocessor, which evaluates and removes the shebang line before the remainder of the script is forwarded to a compiler or interpreter. This is the case, for example, with InstantFPC, a command that allows programs written in Free Pascal to be run as scripts under several operating systems.[9]
Syntax
The form of a shebang interpreter directive is as follows:[7]
#!interpreter [optional-arg]
The interpreter must be an absolute path to an executable[note 1] program (if this interpreter program is a script, it must contain a shebang as well). The optional-arg should either be omitted or be a string that is meant to be a single argument (for reasons of portability, it should not contain any whitespace). A space after #! is optional.
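The rules above can be turned into a small checker. The following is a minimal sketch in Python, not part of any standard tool: the function name check_shebang and the wording of its warnings are illustrative assumptions. It only flags the portability points stated in this section (absolute interpreter path, at most one argument without whitespace).

```python
def check_shebang(first_line):
    """Return a list of portability warnings for a shebang line (empty if none)."""
    if not first_line.startswith("#!"):
        return ["line does not start with #!"]
    # Whitespace after "#!" is optional, so it is skipped here.
    rest = first_line[2:].strip()
    if not rest:
        return ["no interpreter given"]
    # Split once: the interpreter path, then everything else as one string.
    parts = rest.split(None, 1)
    interpreter = parts[0]
    optional_arg = parts[1] if len(parts) > 1 else None
    warnings = []
    if not interpreter.startswith("/"):
        warnings.append("interpreter is not an absolute path")
    if optional_arg is not None and any(ch.isspace() for ch in optional_arg):
        warnings.append("optional argument contains whitespace")
    return warnings
```

For instance, check_shebang("#! /bin/csh -f") returns an empty list, while a line carrying two whitespace-separated arguments produces the whitespace warning.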
Examples
Some typical shebang lines:
- #!/bin/sh – Execute the file using the Bourne shell, or a compatible shell, with path /bin/sh
- #!/bin/csh -f – Execute the file using csh, the C shell, or a compatible shell, and suppress the execution of the user's .cshrc file on startup
- #!/usr/bin/perl -T – Execute using Perl with the option for taint checks
- #!/usr/bin/env python – Execute using Python by looking up the path to the Python interpreter automatically via env
Shebang lines may include specific options that are passed to the interpreter. However, implementations vary in the parsing behavior of options; for portability, only one option should be specified without any embedded whitespace. Further portability guidelines are found below.
Purpose
Interpreter directives allow scripts and data files to be used as system commands, hiding the details of their implementation from users and other programs, by removing the need to prefix scripts with their interpreter on the command line.
Consider a Bourne shell script that is identified by the path some/path/to/foo and that has the following as its initial line:
#!/bin/sh -x
If the user attempts to run this script with the following command line (specifying "bar" and "baz" as arguments):
some/path/to/foo bar baz
then the result would be similar to having actually executed the following command line instead:
/bin/sh -x some/path/to/foo bar baz
If /bin/sh specifies the Bourne shell, then the end result is that all of the shell commands in the file some/path/to/foo are executed with the positional variables $1 and $2 having the values bar and baz, respectively. Also, because the initial number sign is the character used to introduce comments in the Bourne shell language (and in the languages understood by many other interpreters), the entire shebang line is ignored by the interpreter.
However, it is up to the interpreter to ignore the shebang line; thus, a script consisting of the following two lines simply echoes both lines to standard output when run:
#!/bin/cat
Hello world!
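The equivalence between running a script and running its interpreter can be sketched in Python. This is an illustration only: the function name effective_command and the file name hello are hypothetical, and the sketch assumes a well-formed shebang line with at most one optional argument.

```python
import shlex

def effective_command(shebang_line, script_path, script_args):
    """Return the command line that running the script is roughly equivalent to."""
    # Text after "#!": the interpreter, optionally followed by one argument.
    parts = shebang_line[2:].strip().split(None, 1)
    # The script path comes next, followed by the user's own arguments.
    argv = parts + [script_path] + list(script_args)
    return shlex.join(argv)

print(effective_command("#!/bin/sh -x", "some/path/to/foo", ["bar", "baz"]))
# prints: /bin/sh -x some/path/to/foo bar baz
print(effective_command("#!/bin/cat", "hello", []))
# prints: /bin/cat hello
```

The second call shows why the cat script echoes its own shebang line: cat is simply handed the script's path and prints the whole file.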
Strengths
When compared to the use of global association lists between file extensions and the interpreting applications, the interpreter directive method allows users to use interpreters not known at a global system level, and without administrator rights. It also allows specific selection of interpreter, without overloading the filename extension namespace (where one file extension refers to more than one file type), and allows the implementation language of a script to be changed without changing its invocation syntax by other programs.
Portability
Shebangs must specify absolute paths (or paths relative to the current working directory) to system executables; this can cause problems on systems that have a non-standard file system layout. Even when systems have fairly standard paths, it is quite possible for variants of the same operating system to have different locations for the desired interpreter. Python, for example, might be in /usr/bin/python, /usr/local/bin/python, or even something like /home/username/bin/python if installed by an ordinary user.
As a result, the shebang line often has to be edited after a script is copied from one computer to another, because the path coded into the script may not apply on the new machine; whether it does depends on how consistently the interpreter has been placed by convention. For this reason, and because POSIX does not standardize path names, POSIX does not standardize the feature.
Often, the program /usr/bin/env can be used to circumvent this limitation by introducing a level of indirection. #! is followed by /usr/bin/env, followed by the desired command without full path, as in this example:
#!/usr/bin/env sh
This mostly works because the path /usr/bin/env is commonly used for the env utility, and it invokes the first sh found in the user's $PATH, typically /bin/sh.
On a system with setuid script support, using env in this way reintroduces the race eliminated by the /dev/fd workaround described below. There are still some portability issues with OpenServer 5.0.6 and Unicos 9.0.2, which have only /bin/env and no /usr/bin/env.
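The lookup that env performs on the command name can be approximated as a walk over the directories in $PATH. The sketch below is a simplified model in Python, not the implementation of env; the function name find_in_path and its is_executable parameter are assumptions introduced so the search order can be shown without touching the real file system.

```python
import os

def find_in_path(command, path, is_executable=None):
    """Return the first match for command in a PATH-style string, or None."""
    if is_executable is None:
        # Default check: a regular file the current user may execute.
        def is_executable(candidate):
            return os.path.isfile(candidate) and os.access(candidate, os.X_OK)
    for directory in path.split(os.pathsep):
        if not directory:
            # An empty entry traditionally means the current directory;
            # this sketch simply skips it.
            continue
        candidate = os.path.join(directory, command)
        if is_executable(candidate):
            return candidate
    return None
```

Called as find_in_path("sh", os.environ.get("PATH", "")), it returns the first sh found in the user's search path, which on many systems is /bin/sh or /usr/bin/sh depending on the order of the entries.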
Another portability problem is the interpretation of the command arguments. Some systems, including Linux, do not split up the arguments;[10] for example, when running a script whose first line is
#!/usr/bin/env python -c
the string python -c is passed as one argument to /usr/bin/env, rather than as two arguments. Cygwin also behaves this way. Complex interpreter invocations are possible through the use of an additional wrapper.
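The difference between the two parsing behaviours can be made concrete with a short Python sketch. It models only the two cases discussed here (the remainder kept as one argument, or split on whitespace); the function name loader_argv and the split_args flag are illustrative and do not correspond to any real system interface.

```python
def loader_argv(shebang_line, script_path, split_args=False):
    """Sketch of the argument list handed to the interpreter."""
    parts = shebang_line[2:].strip().split(None, 1)
    interpreter = parts[0]
    arg_text = parts[1] if len(parts) > 1 else ""
    if split_args:
        # Each whitespace-separated word becomes its own argument.
        extra = arg_text.split()
    else:
        # The whole remainder of the line is passed as a single argument.
        extra = [arg_text] if arg_text else []
    return [interpreter] + extra + [script_path]

line = "#!/usr/bin/env python -c"
print(loader_argv(line, "script"))
# prints: ['/usr/bin/env', 'python -c', 'script']
print(loader_argv(line, "script", split_args=True))
# prints: ['/usr/bin/env', 'python', '-c', 'script']
```

In the unsplit case env is asked to find a program literally named "python -c", which is why such a shebang line fails on systems that behave this way.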
Another problem is scripts containing a carriage return character immediately after the shebang, perhaps as a result of being edited on a system that uses DOS line breaks, such as Microsoft Windows. Some systems interpret the carriage return character as part of the interpreter command, resulting in an error message.[11]
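A script can be checked for this problem before it is deployed. The following Python sketch is one possible approach, with hypothetical function names; it inspects only the first line and leaves the line endings of the rest of the file untouched.

```python
def shebang_has_carriage_return(script_bytes):
    """True if the first line is a shebang that ends with a DOS-style CR."""
    first_line = script_bytes.split(b"\n", 1)[0]
    return first_line.startswith(b"#!") and first_line.endswith(b"\r")

def strip_shebang_carriage_return(script_bytes):
    """Remove a trailing CR from the shebang line only."""
    if not shebang_has_carriage_return(script_bytes):
        return script_bytes
    first_line, newline, rest = script_bytes.partition(b"\n")
    return first_line.rstrip(b"\r") + newline + rest
```

With the input b"#!/bin/sh\r\necho hi\r\n", the first function returns True and the second returns b"#!/bin/sh\necho hi\r\n".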
POSIX requires sh to be a shell capable of a syntax similar to the Bourne shell, although it does not require it to be located at /bin/sh; for example, some systems such as Solaris have the POSIX-compatible shell at /usr/xpg4/bin/sh.[12] In many Linux systems and recent releases of Mac OS X, /bin/sh is a hard or symbolic link to /bin/bash, the Bourne Again shell.
Using syntax specific to Bash while maintaining a shebang pointing to the Bourne shell is not portable.[13]
Magic number
The shebang is actually a human-readable instance of a magic number in the executable file, the magic byte string being 0x23 0x21, the two-character encoding in ASCII of #!. This magic number is detected by the "exec" family of functions, which determine whether a file is a script or an executable binary. The presence of the shebang will result in the execution of the specified executable, usually an interpreter for the script's language. It has been claimed that some old versions of Unix expect the normal shebang to be followed by a space and a slash (#! /), but this appears to be untrue; rather, blanks after the shebang have traditionally been allowed, and sometimes documented with a space, as in the 1980 email in the History section below.
The shebang characters are represented by the same two bytes in extended ASCII encodings, including UTF-8, which is commonly used for scripts and other text files on current Unix-like systems. However, UTF-8 files may begin with the optional byte order mark (BOM); if the "exec" function specifically detects the bytes 0x23 and 0x21, then the presence of the BOM (0xEF 0xBB 0xBF) before the shebang will prevent the script interpreter from being executed. Some authorities recommend against using the byte order mark in POSIX (Unix-like) scripts,[14] for this reason and for wider interoperability and philosophical concerns. Additionally, a byte order mark is not necessary in UTF-8, as that encoding does not have endianness issues; it serves only to identify the encoding as UTF-8.
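The byte-level check described in this section can be sketched in Python as follows. The function name classify_start and its return labels are illustrative; the sketch only distinguishes a file that starts with the magic bytes from one where a UTF-8 byte order mark precedes them.

```python
import codecs

SHEBANG_MAGIC = b"\x23\x21"  # "#!" in ASCII

def classify_start(data):
    """Classify the first bytes of a file with respect to the shebang magic number."""
    if data.startswith(SHEBANG_MAGIC):
        return "shebang"
    # codecs.BOM_UTF8 is the byte sequence 0xEF 0xBB 0xBF.
    if data.startswith(codecs.BOM_UTF8 + SHEBANG_MAGIC):
        return "bom-before-shebang"
    return "other"
```

A file classified as "bom-before-shebang" looks like a script in a text editor, but a loader that tests strictly for 0x23 0x21 at offset zero would not treat it as one.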
Security issues
On some systems, scripts can be marked with the setuid (set-user-ID) attribute, a Unix feature which means that a program is executed with the access rights of the program file's owner instead of the rights of the user running it. Although this mechanism may be safe for compiled code, the extra step introduced by the interpreter directive provides an extra window of opportunity for attack.[15] This problem has been corrected on some modern systems, namely those supporting the /dev/fd filesystem: the system opens the script first, producing a file descriptor which is safe from attack, then invokes the interpreter with that safe file descriptor as input. However, the discovery of the problem led many system administrators and developers to the conclusion that scripts couldn't be made secure, a case made more compelling by issues with the shell's internal field separator (also since corrected on modern systems); as a result, setuid functionality is often made unavailable to scripts.
Etymology
An executable file starting with an interpreter directive is simply called a script, often prefaced with the name or general classification of the intended interpreter. The name shebang for the distinctive two characters comes from an inexact contraction of SHArp bang or haSH bang, referring to the two typical Unix names for them. Another theory on the sh in shebang is that it is from the default shell sh, usually invoked with shebang.[16] This usage was current by December 1989,[17] and probably earlier.
History
The shebang was introduced by Dennis Ritchie between Edition 7 and Edition 8 at Bell Laboratories. It was also added to the BSD releases from Berkeley's Computer Systems Research Group (present at 2.8BSD[18] and activated by default by 4.2BSD). As AT&T Bell Laboratories Edition 8 Unix, and later editions, were not released to the public, the first widely known appearance of this feature was on BSD.
The lack of an interpreter directive, but support for shell scripts, is apparent in the documentation from Version 7 Unix in 1979, [19] which describes instead a facility of the Bourne shell where files with execute permission would be handled specially by the shell, which would (sometimes depending on initial characters in the script, such as ":" or "#") spawn a subshell which would interpret and run the commands contained in the file. In this model, scripts would only behave as other commands if called from within a Bourne shell. An attempt to directly execute such a file via the operating system's own exec() system trap would fail, preventing scripts from behaving uniformly as normal system commands.
In later versions of Unix-like systems, this inconsistency was removed. Dennis Ritchie introduced kernel support for interpreter directives in January 1980, for Version 8 Unix, with the following description:[18]
From uucp Thu Jan 10 01:37:58 1980
>From dmr Thu Jan 10 04:25:49 1980 remote from research
The system has been changed so that if a file being executed begins with the magic characters #! , the rest of the line is understood to be the name of an interpreter for the executed file. Previously (and in fact still) the shell did much of this job; it automatically executed itself on a text file with executable mode when the text file's name was typed as a command. Putting the facility into the system gives the following benefits.
1) It makes shell scripts more like real executable files, because they can be the subject of 'exec.'
2) If you do a 'ps' while such a command is running, its real name appears instead of 'sh'. Likewise, accounting is done on the basis of the real name.
3) Shell scripts can be set-user-ID.
4) It is simpler to have alternate shells available; e.g. if you like the Berkeley csh there is no question about which shell is to interpret a file.
5) It will allow other interpreters to fit in more smoothly.
To take advantage of this wonderful opportunity, put
#! /bin/sh
at the left margin of the first line of your shell scripts. Blanks after ! are OK. Use a complete pathname (no search is done). At the moment the whole line is restricted to 16 characters but this limit will be raised.
Kernel support for interpreter directives spread to other versions of Unix, and one modern implementation can be seen in the Linux kernel source in fs/binfmt_script.c.[20]
This mechanism allows scripts to be used in virtually any context normal compiled programs can be, including as full system programs, and even as interpreters of other scripts. As a caveat, though, some early versions of kernel support limited the length of the interpreter directive to roughly 32 characters (just 16 in its first implementation), would fail to split the interpreter name from any parameters in the directive, or had other quirks. Additionally, some modern systems allow the entire mechanism to be constrained or disabled for security purposes (for example, set-user-id support has been disabled for scripts on many systems).
Note that, even in systems with full kernel support for the #! magic number, some scripts lacking interpreter directives (although usually still requiring execute permission) are still runnable by virtue of the legacy script handling of the Bourne shell, still present in many of its modern descendants. Such scripts are then interpreted by the user's default shell.
Notes
- [note 1] Depending on the Unix flavour, #!path arg1 arg2 may roughly translate to one of the following:
execve("path", ["path", script], env);
execve("path", ["path", "arg1 arg2", script], env);
execve("path", ["path", "arg1", "arg2", script], env);
execve("path", ["path", "arg1", script], env);
In Linux, the file specified by path can be executed if it has the execute right and contains code which the kernel can execute directly (ELF image), or if it has a wrapper defined for it via sysctl (e.g. for executing Microsoft EXE binaries using wine), or if it contains a shebang. On Linux and Minix, an interpreter can also be a script. A chain of shebangs and wrappers yields a directly executable file that gets the encountered scripts as parameters in reverse order. For example, if file /bin/A is a directly executable file (in ELF format), file /bin/B contains a #!/bin/A optparam shebang, and file /bin/C contains a #!/bin/B shebang, then executing file /bin/C resolves to /bin/B /bin/C, which finally resolves to /bin/A optparam /bin/B /bin/C.
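The chain resolution in the /bin/A, /bin/B, /bin/C example can be traced with a small Python model. This is a sketch under stated assumptions: the files dictionary stands in for the file system (a shebang line for a script, None for a directly executable file), the function name resolve_chain is hypothetical, and the loop check is an addition of the sketch rather than a description of any kernel's behaviour.

```python
def resolve_chain(path, files, args=()):
    """Follow shebang lines until a directly executable file is reached."""
    argv = [path, *args]
    seen = set()
    while files[argv[0]] is not None:
        current = argv[0]
        if current in seen:
            raise ValueError("shebang loop at " + current)
        seen.add(current)
        # Interpreter and at most one optional argument from the shebang line.
        parts = files[current][2:].strip().split(None, 1)
        # They are placed in front of the command line built so far.
        argv = parts + argv
    return argv

files = {
    "/bin/A": None,                  # directly executable (e.g. ELF)
    "/bin/B": "#!/bin/A optparam",
    "/bin/C": "#!/bin/B",
}
print(resolve_chain("/bin/C", files))
# prints: ['/bin/A', 'optparam', '/bin/B', '/bin/C']
```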
References
- [1] "Advanced Bash Scripting Guide". Retrieved 2012-01-19.
- [2] Cooper, Mendel (5 November 2010). Advanced Bash Scripting Guide 5.3 Volume 1. lulu.com. p. 5. ISBN 978-1-4357-5218-4.
- [3] MacDonald, Matthew (2011). HTML5: The Missing Manual. Sebastopol, California: O'Reilly Media. p. 373. ISBN 978-1-4493-0239-9.
- [4] Lutz, Mark (September 2009). Learning Python (4th ed.). O'Reilly Media. p. 48. ISBN 978-0-596-15806-4.
- [5] Lie Hetland, Magnus (4 October 2005). Beginning Python: From Novice to Professional. Apress. p. 21. ISBN 978-1-59059-519-0.
- [6] Schitka, John (24 December 2002). Linux+ Guide to Linux Certification. Course Technology. p. 353. ISBN 978-0-619-13004-6.
- [7] "execve(2) - Linux man page". Retrieved 2010-10-21.
- [8] SRFI 22
- [9] InstantFPC documentation
- [10] "/usr/bin/env behaviour". Mail-index.netbsd.org. 2008-11-09. Retrieved 2010-11-18.
- [11] "Carriage Return causes bash to fail". 2013-11-08.
- [12] "The Open Group Base Specifications Issue 7". 2008. Retrieved 2010-04-05.
- [13] pixelbeat.org: Common shell script mistakes. "It's much better to test scripts directly in a POSIX compliant shell if possible. The `bash --posix` option doesn't suffice as it still accepts some 'bashisms'"
- [14] "FAQ - UTF-8, UTF-16, UTF-32 & BOM: Can a UTF-8 data stream contain the BOM character (in UTF-8 form)? If yes, then can I still assume the remaining UTF-8 bytes are in big-endian order?". Retrieved 2009-01-04.
- [15] http://docstore.mik.ua/orelly/other/puis3rd/0596003234_puis3-chp-6-sect-5.html
- [16] "Jargon File entry for shebang". Catb.org. Retrieved 2010-06-16.
- [17] "Perl didn't grok setuid scripts that had a space on the first line between the shebang and the interpreter name", USENET posting by Larry Wall
- [18] CSRG Archive CD-ROMs. http://www.mckusick.com/csrg
- [19] UNIX TIME-SHARING SYSTEM: UNIX PROGRAMMER'S MANUAL, Seventh Edition, Volume 2A, January 1979. http://cm.bell-labs.com/7thEdMan/v7vol2a.pdf
- [20] Rubini, Alessandro (31 December 1997). "Playing with Binary Formats". Linux Journal. Retrieved 1 January 2015.
External links
- Details about the shebang mechanism on various Unix flavours
- #! - the Unix truth as far as I know it (a more generic approach)
- FOLDOC shebang article