Mogensen–Scott encoding

In computer science, Scott encoding is a way to represent (recursive) data types in the lambda calculus. Church encoding performs a similar function. The data and operators form a mathematical structure which is embedded in the lambda calculus.

Whereas Church encoding starts with representations of the basic data types, and builds up from it, Scott encoding starts from the simplest method to compose algebraic data types.

Mogensen–Scott encoding extends and slightly modifies Scott encoding by applying the encoding to Metaprogramming. This encoding allows the representation of lambda calculus terms, as data, to be operated on by a meta program.

History

Scott encoding appears first in a set of unpublished lecture notes by Dana Scott. Torben Mogensen later extended Scott encoding for the encoding of Lambda terms as data.[1]

Discussion

Lambda calculus allows data to be stored as parameters to a function that does not yet have all the parameters required for application. For example,

 ((\lambda x_1 \ldots x_n.\lambda c.c\ x_1 \ldots x_n)\ v_1 \ldots v_n)\ f

May be thought of as a record or struct where the fields  x_1 \ldots x_n have been initialized with the values  v_1 \ldots v_n . These values may then be accessed by applying the term to a function f. This reduces to,

 f\ v_1 \ldots v_n

c may represent a constructor for an algebraic data type in functional languages such as Haskell. Now suppose there are N constructors, each with A_i arguments;

\begin{array}{c|c|c}
\text{Constructor} & \text{Given arguments} & \text{Result} \\
\hline
((\lambda x_1 \ldots x_{A_1}.\lambda c_1 \ldots c_N.c_1\ x_1 \ldots x_{A_1})\ v_1 \ldots v_{A_1}) &
f_1 \ldots f_N &
f_1\ v_1 \ldots v_{A_1} \\
((\lambda x_1 \ldots x_{A_2}.\lambda c_1 \ldots c_N.c_2\ x_1 \ldots x_{A_2})\ v_1 \ldots v_{A_2}) &
f_1 \ldots f_N &
f_2\ v_1 \ldots v_{A_2} \\
\vdots & \vdots & \vdots \\

((\lambda x_1 \ldots x_{A_N}.\lambda c_1 \ldots c_N.c_N\ x_1 \ldots x_{A_N})\ v_1 \ldots v_{A_N}) &
f_1 \ldots f_N &
f_N\ v_1 \ldots v_{A_N}
\end{array}

Each constructor selects a different function from the function parameters  f_1 \ldots f_N . This provides branching in the process flow, based on the constructor. Each constructor may have a different arity (number of parameters). If the constructors have no parameters then the set of constructors acts like an enum; a type with a fixed number of values. If the constructors have parameters, recursive data structures may be constructed.

Definition

Let D be a datatype with N constructors, \{c_i\}_{i=1}^N, such that constructor c_i has arity A_i.

Scott encoding

The Scott encoding of constructor c_i of the data type D is

\lambda x_1 \ldots x_{A_i} . \lambda c_1 \ldots c_N . c_i\ x_1 \ldots x_{A_i}

Mogensen–Scott encoding

Mogensen extends Scott encoding to encode any untyped lambda term as data. This allows a lambda term to be represented as data, within a Lambda calculus meta program. The meta function mse converts a lambda term into the corresponding data representation of the lambda term;

\begin{array}{rcl}
\operatorname{mse}[x] & = & \lambda a, b, c.a\ x \\
\ \operatorname{mse}[M\ N] & = & \lambda a, b, c.b\ \operatorname{mse}[M]\ \operatorname{mse}[N] \\
\ \operatorname{mse}[\lambda x . M] & = & \lambda a, b, c.c\ (\lambda x.\operatorname{mse}[M]) \\
\end{array}

The "lambda term" is represented as a tagged union with three cases:

For example,

 \operatorname{mse}[\lambda x.f\ (x\ x)]
 \lambda a, b, c.c\ (\lambda x.\operatorname{mse}[f\ (x\ x)])
 \lambda a, b, c.c\ (\lambda x.\lambda a, b, c.b\ \operatorname{mse}[f]\ \operatorname{mse}[x\ x])
 \lambda a, b, c.c\ (\lambda x.\lambda a, b, c.b\ (\lambda a, b, c.a\ f)\ \operatorname{mse}[x\ x])
 \lambda a, b, c.c\ (\lambda x.\lambda a, b, c.b\ (\lambda a, b, c.a\ f)\ (\lambda a, b, c.b\ \operatorname{mse}[x]\ \operatorname{mse}[x]))
 \lambda a, b, c.c\ (\lambda x.\lambda a, b, c.b\ (\lambda a, b, c.a\ f)\ (\lambda a, b, c.b\ (\lambda a, b, c.a\ x)\ (\lambda a, b, c.a\ x)))

Comparison to the Church encoding

The Scott encoding coincides with the Church encoding for booleans. Church encoding of pairs may be generalized to arbitrary data types by encoding c_i of D above as

\lambda x_1 \ldots x_{A_i} . \lambda c_1 \ldots c_N . c_i (x_1 c_1 \ldots c_N) \ldots (x_{A_i} c_1 \ldots c_N)

compare this to the Mogensen Scott encoding,

\lambda x_1 \ldots x_{A_i} . \lambda c_1 \ldots c_N . c_i x_1 \ldots x_{A_i}

With this generalization, the Scott and Church encodings coincide on all enumerated datatypes (such as the boolean datatype) because each constructor is a constant (no parameters).

Type definitions

Church-encoded data and operations on them are typable in system F, but Scott-encoded data and operations are not obviously typable in system F. Universal as well as recursive types appear to be required,[2] and since strong normalization does not hold for recursively typed lambda calculus, termination of programs manipulating Scott-encoded data cannot be established by determining well-typedness of such programs.

See also

Notes

  1. Mogensen, Torben (1994). "Efficient Self-Interpretation in Lambda Calculus". Journal of Functional Programming 2: 345–364.
  2. See the note "Types for the Scott numerals" by Martín Abadi, Luca Cardelli and Gordon Plotkin (February 18, 1993).

References


This article is issued from Wikipedia - version of the Saturday, April 16, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.