printf format string

A printf format string (where "printf" stands for "print formatted") is a control parameter used by a class of functions in the string-processing libraries of various programming languages. The format string is written in a simple template language and specifies how to render an arbitrary number of parameters of varied data types into a string. By default this string is printed on the standard output stream, but variants exist that perform other tasks with the result, such as returning it as the value of the function. Characters in the format string are usually copied literally into the function's output, as is usual for templates, while the other parameters are rendered into the resulting text in place of placeholders: points marked by format specifiers, which are typically introduced by a % character, though syntax varies. The format string itself is very often a string literal, which allows static analysis of the function call. However, it can also be the value of a variable, which allows dynamic formatting but also opens the door to a security vulnerability known as an uncontrolled format string exploit.

The term "printf" is due to the C language, which popularized this type of function, but these functions predate C, and other names are used, notably "format". Printf format strings, which provide formatted output (templating), are complementary to scanf format strings, which provide formatted input (parsing). In both cases these provide simple functionality and fixed format compared to more sophisticated and flexible template engines or parsers, but are sufficient for many purposes.

Programming languages

Many programming languages implement a printf function to output a formatted string. The name comes from the C programming language, where the function has a prototype similar to the following:

int printf(const char *format, ...);

The format string provides a description of the output, with placeholders introduced by the "%" character that specify both the relative location and the type of output that the function should produce. The return value is the number of characters printed.

1950s: FORTRAN

Fortran's variadic PRINT statement referenced a non-executable FORMAT statement.

      PRINT 601, 123456, 1000.0, 3.1415, 250
  601 FORMAT (8H RED NUM,I7,4H EXP,E8.1,5H REAL,F5.2,4H VAL,I4)

This prints the following (on a new line, because the leading blank is treated as a carriage-control character):^[1]

RED NUM 123456 EXP 1.0E 03 REAL 3.14 VAL 250

1960s: COBOL

COBOL provides formatting via hierarchical data structure specification:

   data division.
   working-storage section.
   01 m-identity.
       02 m-surname    pic x(4) value "John".
       02 filler       pic x    value space.
       02 m-name       pic x(3) value "Doe".

    procedure division.
    display m-identity

    goback.

or through the STRING instruction:

    data division.
    working-storage section.
    01 m-surname       pic x(25) value "John".
    01 m-name          pic x(25) value "Doe".
    01 m-identity      pic x(50).

    procedure division.
    string
        m-surname     delimited by space
        space         delimited by size
        m-name        delimited by space
        into m-identity
    end-string

    display m-identity

    goback.

Both programs display "John Doe".

1960s: BCPL, ALGOL 68, Multics PL/I

C's variadic printf has its origins in BCPL's writef function (1966). For example, a statement to write the factorial equation 5! = 120 (assuming I is 5 and FACT computes the factorial) could be:^[2]

WRITEF("%N! = %I4*N", I, FACT(I))

The ALGOL 68 Draft and Final Reports had the functions inf and outf; these were subsequently revised out of the original language and replaced with the now more familiar readf/getf and printf/putf.

 printf(($"Color "g", number1 "6d,", number2 "4zd,", hex "16r2d,", float "-d.2d,", unsigned value"-3d"."l$,
             "red", 123456, 89, BIN 255, 3.14, 250));

Multics has a standard function called ioa_ with a wide variety of control codes. It was based on a machine-language facility from Multics's BOS (Bootstrap Operating System).

call ioa_ ("Hello, ^a", "World!");

1970s: C, Lisp

 printf("Color %s, number1 %d, number2 %05d, hex %#x, float %5.2f, unsigned value %u.\n",
        "red", 123456, 89, 255, 3.14159, 250);

This C call prints the following line (including the new-line character, \n):

Color red, number1 123456, number2 00089, hex 0xff, float  3.14, unsigned value 250.

The printf function returns the number of characters printed, or a negative value if an output error occurs.

Common Lisp has the versatile FORMAT function. Unlike the more well-known printf, FORMAT's syntax supports a variety of control flow constructs, including loops and conditionals. For example, a common requirement to print a list of values separated by commas (that is, a comma between values, rather than after each value) is achieved as follows:

 (format t "~{~a~^, ~}!" '(hello world))
 ;; ⇒ "HELLO, WORLD!"

If the first argument to FORMAT is NIL, it returns a formatted string to its caller. The first argument can also be any output stream, in which case the formatted output is written to the stream. A special stream value T means "standard output".

FORMAT was introduced into ZetaLisp at MIT in 1978, based on the Multics ioa_, and was later adopted into the Common Lisp standard. Scheme incorporated Common Lisp-style format in SRFI-28 and SRFI-54.^[3]

Unix printf first appeared in Version 4, as part of the porting to C.^[4]

1980s: Perl, Shell

Perl also has a printf function. The GLib library contains g_print, an implementation of printf.

POSIX systems have a printf program for use in shell scripts, available as a standalone command and built into some shells. It can be used instead of echo in situations where the latter is not portable. For example:

echo -n -e "$FOO\t$BAR"

may be rewritten portably and more reliably as:

printf "%s\t%s" "$FOO" "$BAR"

1990s: PHP, Python

1995: PHP also has the printf function, with the same specifications and usage as that in C/C++. MATLAB does not have printf, but does have its two variants sprintf and fprintf, which use the same format strings. sprintf returns a formatted string instead of displaying it.

1991: Python's % operator harks back to printf's syntax when interpolating the contents of a tuple.^[5] This operator can, for example, be used with the print function:

print("%s\t%s" % (foo,bar))

It's easy to create a C language-like printf() function^[6] in either Python 2.x or 3.x:

import sys
def printf(format, *args):
    sys.stdout.write(format % args)

printf("%d x %d is %d\n", 6, 7, 6*7)

2000s: Java, Python

2004: Java has supported printf since version 1.5 as a member of the PrintStream^[7] class, giving it the functionality of both the printf and fprintf functions. At the same time, sprintf-like functionality was added to the String class through the format(String, Object... args) method.^[8]

// Write "Hello, World!" to standard output (like printf)
System.out.printf("%s, %s", "Hello", "World!");
// create a String object with the value "Hello, World!" (like sprintf)
String myString = String.format("%s, %s", "Hello", "World!");

Unlike most other implementations, Java's implementation of printf throws an exception on encountering a malformed format string.

2008: Python 2.6^[9] added the more versatile str.format() method, which uses the Format String Syntax:^[10]

print("If you multiply five and six you get {0}.".format(5*6))

Format placeholder specification

Formatting takes place via placeholders within the format string. For example, if a program wanted to print out a person's age, it could present the output by prefixing it with "Your age is ". To denote that we want the integer for the age to be shown immediately after that message, we may use the format string:

"Your age is %d."

Syntax

The syntax for a format placeholder is

%[parameter][flags][width][.precision][length]type

Parameter field

This is a POSIX extension and not in C99. The Parameter field can be omitted or can be:

Character	Description
`n$`	n is the number of the parameter to display using this format specifier, allowing the parameters provided to be output multiple times, using varying format specifiers or in different orders. If any single placeholder specifies a parameter, all the rest of the placeholders MUST also specify a parameter. For example, `printf("%2$d %2$#x; %1$d %1$#x",16,17)` produces "`17 0x11; 16 0x10`".

Flags field

The Flags field can be zero or more (in any order) of:

Character	Description
`-` (minus)	Left-aligns the output of this placeholder. (The default is to right-align the output.)
`+` (plus)	Prepends a plus sign for positive signed-numeric types. positive = '`+`', negative = '`-`'. (The default doesn't prepend anything in front of positive numbers.)
(space)	Prepends a space for positive signed-numeric types. positive = ' ', negative = '`-`'. This flag is ignored if the '`+`' flag exists. (The default doesn't prepend anything in front of positive numbers.)
`0` (zero)	When the 'width' option is specified, prepends zeros for numeric types. (The default prepends spaces.) For example, `printf("%2X",3)` produces " `3`", while `printf("%02X",3)` produces "`03`".
`#` (hash)	Alternate form: For '`g`' and '`G`' types, trailing zeros are not removed. For '`f`', '`F`', '`e`', '`E`', '`g`', '`G`' types, the output always contains a decimal point. For '`o`', '`x`', '`X`' types, the text '`0`', '`0x`', '`0X`', respectively, is prepended to non-zero numbers.

Width field

The Width field specifies a minimum number of characters to output, and is typically used to pad fixed-width fields in tabulated output, where the fields would otherwise be smaller, although it does not cause truncation of oversized fields.

The width field may be omitted, or a numeric integer value, or a dynamic value when passed as another argument when indicated by an asterisk "*". For example, printf("%*d", 5, 10) will result in "   10" being printed (three spaces of padding followed by "10"), for a total width of 5 characters.

Though not part of the width field, a leading zero is interpreted as the zero-padding flag mentioned above, and a negative value is treated as the positive value in conjunction with the left-alignment "-" flag also mentioned above.

Precision field

The Precision field usually specifies a maximum limit on the output, depending on the particular formatting type. For floating point numeric types, it specifies the number of digits to the right of the decimal point to which the output should be rounded. For the string type, it limits the number of characters that should be output, after which the string is truncated.

The precision field may be omitted, or a numeric integer value, or a dynamic value when passed as another argument when indicated by an asterisk "*". For example, printf("%.*s", 3, "abcdef") will result in "abc" being printed.

Length field

The Length field can be omitted or be any of:

Character	Description
`hh`	For integer types, causes `printf` to expect an `int`-sized integer argument which was promoted from a `char`.
`h`	For integer types, causes `printf` to expect an `int`-sized integer argument which was promoted from a `short`.
`l`	For integer types, causes `printf` to expect a `long`-sized integer argument. For floating point types, causes `printf` to expect a `double` argument.
`ll`	For integer types, causes `printf` to expect a `long long`-sized integer argument.
`L`	For floating point types, causes `printf` to expect a `long double` argument.
`z`	For integer types, causes `printf` to expect a `size_t`-sized integer argument.
`j`	For integer types, causes `printf` to expect an `intmax_t`-sized integer argument.
`t`	For integer types, causes `printf` to expect a `ptrdiff_t`-sized integer argument.

Additionally, several platform-specific length options came to exist prior to widespread use of the ISO C99 extensions:

Characters	Description
`I`	For signed integer types, causes `printf` to expect a `ptrdiff_t`-sized integer argument; for unsigned integer types, causes `printf` to expect a `size_t`-sized integer argument. Commonly found in Win32/Win64 platforms.
`I32`	For integer types, causes `printf` to expect a 32-bit (double word) integer argument. Commonly found in Win32/Win64 platforms.
`I64`	For integer types, causes `printf` to expect a 64-bit (quad word) integer argument. Commonly found in Win32/Win64 platforms.
`q`	For integer types, causes `printf` to expect a 64-bit (quad word) integer argument. Commonly found in BSD platforms.

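ISO C99 includes the inttypes.h header file, which defines a number of macros for use in platform-independent printf coding. Each macro expands to a string literal, so it must be written outside the double quotes and is joined to the neighboring literals by string concatenation, e.g. printf("%" PRId64 "\n", t);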

Example macros include:

Macro	Description
`PRId32`	Typically equivalent to `I32d` (Win32/Win64) or `d`
`PRId64`	Typically equivalent to `I64d` (Win32/Win64), `lld` (32-bit platforms) or `ld` (64-bit platforms)
`PRIi32`	Typically equivalent to `I32i` (Win32/Win64) or `i`
`PRIi64`	Typically equivalent to `I64i` (Win32/Win64), `lli` (32-bit platforms) or `li` (64-bit platforms)
`PRIu32`	Typically equivalent to `I32u` (Win32/Win64) or `u`
`PRIu64`	Typically equivalent to `I64u` (Win32/Win64), `llu` (32-bit platforms) or `lu` (64-bit platforms)
`PRIx32`	Typically equivalent to `I32x` (Win32/Win64) or `x`
`PRIx64`	Typically equivalent to `I64x` (Win32/Win64), `llx` (32-bit platforms) or `lx` (64-bit platforms)

Type field

The Type field can be any of:

Character	Description
`%`	Prints a literal '`%`' character (this type doesn't accept any flags, width, precision, length fields).
`d`, `i`	`int` as a signed decimal number. '`%d`' and '`%i`' are synonymous for output, but are different when used with `scanf()` for input (where using `%i` will interpret a number as hexadecimal if it's preceded by `0x`, and octal if it's preceded by 0.)
`u`	Print decimal `unsigned int`.
`f`, `F`	`double` in normal (fixed-point) notation. '`f`' and '`F`' differ only in how the strings for an infinite number or NaN are printed ('`inf`', '`infinity`' and '`nan`' for '`f`', '`INF`', '`INFINITY`' and '`NAN`' for '`F`').
`e`, `E`	`double` value in standard form ([`-`]d.ddd `e`[`+`/`-`]ddd). An `E` conversion uses the letter `E` (rather than `e`) to introduce the exponent. The exponent always contains at least two digits; if the value is zero, the exponent is `00`. In Windows, the exponent contains three digits by default, e.g. `1.5e002`, but this can be altered by Microsoft-specific `_set_output_format` function.
`g`, `G`	`double` in either normal or exponential notation, whichever is more appropriate for its magnitude. '`g`' uses lower-case letters, '`G`' uses upper-case letters. This type differs slightly from fixed-point notation in that insignificant zeroes to the right of the decimal point are not included. Also, the decimal point is not included on whole numbers.
`x`, `X`	`unsigned int` as a hexadecimal number. '`x`' uses lower-case letters and '`X`' uses upper-case.
`o`	`unsigned int` in octal.
`s`	null-terminated string.
`c`	`char` (character).
`p`	`void *` (pointer to void) in an implementation-defined format.
`a`, `A`	`double` in hexadecimal notation, starting with "0x" or "0X". '`a`' uses lower-case letters, '`A`' uses upper-case letters.^[11]^[12] (C++11 iostreams have a `hexfloat` that works the same).
`n`	Prints nothing, but writes the number of characters successfully written so far into an integer pointer parameter. Note: This can be utilized in uncontrolled format string exploits.

Custom format placeholders

There are a few implementations of printf-like functions that allow extensions to the escape-character-based mini-language, thus allowing the programmer to have a specific formatting function for non-builtin types. One of the most well-known is glibc's (now deprecated) register_printf_function(). However, it is rarely used because it conflicts with static format string checking. Another is Vstr custom formatters, which allow adding multi-character format names and can work with static format checkers.

Some applications (like the Apache HTTP Server) include their own printf-like function, and embed extensions into it. However, these all tend to have the same problems that register_printf_function() has.

The Linux kernel printk function supports a number of ways to display kernel structures using the generic %p specification, by appending additional format characters.^[13] For example, %pI4 prints an IPv4 address in dotted-decimal form. This allows static format string checking (of the %p portion) at the expense of full compatibility with normal printf.

Most non-C languages that have a printf-like function work around the lack of this feature by just using the "%s" format and converting the object to a string representation. C++ offers a notable exception, in that it has a printf function inherited from its C history, but also has a completely different mechanism that is preferred.

Vulnerabilities

Invalid conversion specifications

If the syntax of a conversion specification is invalid, behavior is undefined, and can cause program termination. If there are too few function arguments provided to supply values for all the conversion specifications in the template string, or if the arguments are not of the correct types, the results are also undefined. Excess arguments are ignored. In a number of cases, the undefined behavior has led to "Format string attack" security vulnerabilities.

Some compilers, like the GNU Compiler Collection, will statically check the format strings of printf-like functions and warn about problems (when using the flags -Wall or -Wformat). GCC will also warn about user-defined printf-style functions if the non-standard "format" __attribute__ is applied to the function.

Field width versus explicit delimiters in tabular output

Using only field widths to provide for tabulation, as with a format like "%8d%8d%8d" for three integers in three 8-character columns, will not guarantee that field separation will be retained if large numbers occur in the data. Loss of field separation can easily lead to corrupt output. In systems which encourage the use of programs as building blocks in scripts, such corrupt data can often be forwarded into and corrupt further processing, regardless of whether the original programmer expected the output would only be read by human eyes. Such problems can be eliminated by including explicit delimiters, even spaces, in all tabular output formats. Simply changing the dangerous example from before to " %7d %7d %7d" addresses this, formatting identically until numbers become larger, but then explicitly preventing them from becoming merged on output due to the explicitly included spaces. Similar strategies apply to string data.

Programming languages with printf

AMPL
awk (via sprintf)
The printf utility command, sometimes built into the shell, as in some implementations of the Korn shell (ksh), Bourne again shell (bash), or Z shell (zsh)
C
- C++ (also provides overloaded shift operators and manipulators as an alternative for formatted output - see iostream and iomanip)
- Objective-C
Clojure
Common Lisp
D
Elixir
F#
GNU MathProg
GNU Octave
G (LabVIEW)
Go
Haskell
J
Java (since version 1.5)
- Clojure
- Scala
Lua (string.format)
Maple
MATLAB
Mythryl
OCaml
- (OCaml Batteries Included provides an additional user-extensible printf)
( Pascal )
- Free Pascal (using format function)
PARI/GP
PHP
Perl
Python (using the % operator)
R
Red/System
Ruby
Rust
Tcl (via format command)
Transact-SQL (via xp_sprintf)
Vala (via print() and FileStream.printf())

References

1. "ASA Print Control Characters". Retrieved 12 February 2010.
2. Martin Richards' BCPL distribution.
3. "Final SRFIs". Srfi.schemers.org. Retrieved 2014-03-17.
4. McIlroy, M. D. (1987). A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986 (PDF) (Technical report). CSTR. Bell Labs. 139.
5. "Built-in Types — Python 3.3.4 documentation". Python Programming Language – Official Website. Python Software Foundation. Retrieved 12 February 2014.
6. Martelli, Alex (March 2005). Python Cookbook, Second Edition. Sebastopol, CA: O'Reilly Media, Inc. p. 183. ISBN 0-596-00797-3.
7. "PrintStream (Java 2 Platform SE 5.0)". Sun Microsystems Inc. 1994. Retrieved 18 November 2008.
8. "String (Java 2 Platform SE 5.0)". Sun Microsystems Inc. 1994. Retrieved 18 November 2008.
9. https://www.python.org/download/releases/2.6/
10. https://docs.python.org/3/library/string.html?highlight=.format#format-string-syntax
11. "The GNU C Library Reference Manual", "12.12.3 Table of Output Conversions". Gnu.org. Retrieved 2014-03-17.
12. "printf" ("%a" added in C99)
13. "Linux kernel Documentation/printk-formats.txt". Git.kernel.org. Retrieved 2014-03-17.

External links

C++ reference for std::fprintf
gcc printf format specifications quick reference
printf: print formatted output – System Interfaces Reference, The Single UNIX® Specification, Issue 7 from The Open Group
The Formatter specification in Java 1.5
GNU Bash printf(1) builtin


This article is issued from Wikipedia - version of the Thursday, April 21, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.