Algorithmic information theory

Algorithmic information theory is a subfield of information theory and computer science that concerns itself with the relationship between computation and information. According to Gregory Chaitin, it is "the result of putting Shannon's information theory and Turing's computability theory into a cocktail shaker and shaking vigorously."^[1]

Overview

Algorithmic information theory principally studies complexity measures on strings (or other data structures). Because most mathematical objects can be described in terms of strings, or as the limit of a sequence of strings, it can be used to study a wide variety of mathematical objects, including integers.

Informally, from the point of view of algorithmic information theory, the information content of a string is equivalent to the length of the most-compressed possible self-contained representation of that string. A self-contained representation is essentially a program – in some fixed but otherwise irrelevant universal programming language – that, when run, outputs the original string.

From this point of view, a 3000-page encyclopedia actually contains less information than 3000 pages of completely random letters, despite the fact that the encyclopedia is much more useful. This is because to reconstruct the entire sequence of random letters, one must know, more or less, what every single letter is. On the other hand, if every vowel were removed from the encyclopedia, someone with reasonable knowledge of the English language could reconstruct it, just as one could likely reconstruct the sentence "Ths sntnc hs lw nfrmtn cntnt" from the context and consonants present.

Unlike classical information theory, algorithmic information theory gives formal, rigorous definitions of a random string and a random infinite sequence that do not depend on physical or philosophical intuitions about nondeterminism or likelihood. (The set of random strings depends on the choice of the universal Turing machine used to define Kolmogorov complexity, but any choice gives identical asymptotic results because the Kolmogorov complexity of a string is invariant up to an additive constant depending only on the choice of universal Turing machine. For this reason the set of random infinite sequences is independent of the choice of universal machine.)

Some of the results of algorithmic information theory, such as Chaitin's incompleteness theorem, appear to challenge common mathematical and philosophical intuitions. Most notable among these is the construction of Chaitin's constant Ω, a real number which expresses the probability that a self-delimiting universal Turing machine will halt when its input is supplied by flips of a fair coin (sometimes thought of as the probability that a random computer program will eventually halt). Although Ω is easily defined, in any consistent axiomatizable theory one can only compute finitely many digits of Ω, so it is in some sense unknowable, providing an absolute limit on knowledge that is reminiscent of Gödel's Incompleteness Theorem. Although the digits of Ω cannot be determined, many properties of Ω are known; for example, it is an algorithmically random sequence and thus its binary digits are evenly distributed (in fact it is normal).

History

Algorithmic information theory was founded by Ray Solomonoff,^[2] who published the basic ideas on which the field is based as part of his invention of algorithmic probability – a way to overcome serious problems associated with the application of Bayes' rules in statistics. He first described his results at a Conference at Caltech in 1960,^[3] and in a report, February 1960, "A Preliminary Report on a General Theory of Inductive Inference."^[4] Algorithmic information theory was later developed independently by Andrey Kolmogorov, in 1965 and Gregory Chaitin, around 1966.

There are several variants of Kolmogorov complexity or algorithmic information; the most widely used one is based on self-delimiting programs and is mainly due to Leonid Levin (1974). Per Martin-Löf also contributed significantly to the information theory of infinite sequences. An axiomatic approach to algorithmic information theory based on Blum axioms (Blum 1967) was introduced by Mark Burgin in a paper presented for publication by Andrey Kolmogorov (Burgin 1982). The axiomatic approach encompasses other approaches in the algorithmic information theory. It is possible to treat different measures of algorithmic information as particular cases of axiomatically defined measures of algorithmic information. Instead of proving similar theorems, such as the basic invariance theorem, for each particular measure, it is possible to easily deduce all such results from one corresponding theorem proved in the axiomatic setting. This is a general advantage of the axiomatic approach in mathematics. The axiomatic approach to algorithmic information theory was further developed in the book (Burgin 2005) and applied to software metrics (Burgin and Debnath, 2003; Debnath and Burgin, 2003).

Precise definitions

Main article: Kolmogorov complexity

A binary string is said to be random if the Kolmogorov complexity of the string is at least the length of the string. A simple counting argument shows that some strings of any given length are random, and almost all strings are very close to being random. Since Kolmogorov complexity depends on a fixed choice of universal Turing machine (informally, a fixed "description language" in which the "descriptions" are given), the collection of random strings does depend on the choice of fixed universal machine. Nevertheless, the collection of random strings, as a whole, has similar properties regardless of the fixed machine, so one can (and often does) talk about the properties of random strings as a group without having to first specify a universal machine.

Main article: Algorithmically random sequence

An infinite binary sequence is said to be random if, for some constant c, for all n, the Kolmogorov complexity of the initial segment of length n of the sequence is at least n − c. It can be shown that almost every sequence (from the point of view of the standard measure — "fair coin" or Lebesgue measure — on the space of infinite binary sequences) is random. Also, since it can be shown that the Kolmogorov complexity relative to two different universal machines differs by at most a constant, the collection of random infinite sequences does not depend on the choice of universal machine (in contrast to finite strings). This definition of randomness is usually called Martin-Löf randomness, after Per Martin-Löf, to distinguish it from other similar notions of randomness. It is also sometimes called 1-randomness to distinguish it from other stronger notions of randomness (2-randomness, 3-randomness, etc.). In addition to Martin-Löf randomness concepts, there are also recursive randomness, Schnorr randomness, and Kurtz randomness etc. Yongge Wang showed ^[5] that all of these randomness concepts are different.

(Related definitions can be made for alphabets other than the set $\{0,1\}$ .)

Specific sequence

Algorithmic information theory (AIT) is the information theory of individual objects, using computer science, and concerns itself with the relationship between computation, information, and randomness.

The information content or complexity of an object can be measured by the length of its shortest description. For instance the string

"0101010101010101010101010101010101010101010101010101010101010101"

has the short description "32 repetitions of '01'", while

"1100100001100001110111101110110011111010010000100101011110010110"

presumably has no simple description other than writing down the string itself.

More formally, the Algorithmic Complexity (AC) of a string x is defined as the length of the shortest program that computes or outputs x, where the program is run on some fixed reference universal computer.

A closely related notion is the probability that a universal computer outputs some string x when fed with a program chosen at random. This Algorithmic "Solomonoff" Probability (AP) is key in addressing the old philosophical problem of induction in a formal way.

The major drawback of AC and AP are their incomputability. Time-bounded "Levin" complexity penalizes a slow program by adding the logarithm of its running time to its length. This leads to computable variants of AC and AP, and Universal "Levin" Search (US) solves all inversion problems in optimal time (apart from some unrealistically large multiplicative constant).

AC and AP also allow a formal and rigorous definition of randomness of individual strings to not depend on physical or philosophical intuitions about non-determinism or likelihood. Roughly, a string is Algorithmic "Martin-Loef" Random (AR) if it is incompressible in the sense that its algorithmic complexity is equal to its length.

AC, AP, and AR are the core sub-disciplines of AIT, but AIT spawns into many other areas. It serves as the foundation of the Minimum Description Length (MDL) principle, can simplify proofs in computational complexity theory, has been used to define a universal similarity metric between objects, solves the Maxwell daemon problem, and many others.

References

↑ Algorithmic Information Theory
↑ Vitanyi, P. "Obituary: Ray Solomonoff, Founding Father of Algorithmic Information Theory"
↑ Paper from conference on "Cerebral Systems and Computers", California Institute of Technology, February 8–11, 1960, cited in "A Formal Theory of Inductive Inference, Part 1, 1964, p. 1
↑ Solomonoff, R., "A Preliminary Report on a General Theory of Inductive Inference", Report V-131, Zator Co., Cambridge, Ma., (November Revision of February 4, 1960 report.)
↑ Yongge Wang: Randomness and Complexity. PhD Thesis, 1996. http://webpages.uncc.edu/yonwang/papers/thesis.pdf

External links

Blum, M. (1967) On the Size of Machines, Information and Control, v. 11, pp. 257–265
Blum M. (1967a) A Machine-independent Theory of Complexity of Recursive Functions, Journal of the ACM, v. 14, No.2, pp. 322–336
Burgin, M. (1982) Generalized Kolmogorov complexity and duality in theory of computations, Soviet Math. Dokl., v.25, No. 3, pp. 19–23
Burgin, M. (1990) Generalized Kolmogorov Complexity and other Dual Complexity Measures, Cybernetics, No. 4, pp. 21–29
Burgin, M. Super-recursive algorithms, Monographs in computer science, Springer, 2005
Calude, C.S. (1996) Algorithmic information theory: Open problems, J. UCS, v. 2, pp. 439–441
Calude, C.S. Information and Randomness: An Algorithmic Perspective, (Texts in Theoretical Computer Science. An EATCS Series), Springer-Verlag, Berlin, 2002
Chaitin, G.J. (1966) On the Length of Programs for Computing Finite Binary Sequences, J. Association for Computing Machinery, v. 13, No. 4, pp. 547–569
Chaitin, G.J. (1969) On the Simplicity and Speed of Programs for Computing Definite Sets of Natural Numbers, J. Association for Computing Machinery, v. 16, pp. 407–412
Chaitin, G.J. (1975) A Theory of Program Size Formally Identical to Information Theory, J. Association for Computing Machinery, v. 22, No. 3, pp. 329–340
Chaitin, G.J. (1977) Algorithmic information theory, IBM Journal of Research and Development, v.21, No. 4, 350–359
Chaitin, G.J. Algorithmic Information Theory, Cambridge University Press, Cambridge, 1987
Kolmogorov, A.N. (1965) Three approaches to the definition of the quantity of information, Problems of Information Transmission, No. 1, pp. 3–11
Kolmogorov, A.N. (1968) Logical basis for information theory and probability theory, IEEE Trans. Inform. Theory, vol. IT-14, pp. 662–664
Levin, L. A. (1974) Laws of information (nongrowth) and aspects of the foundation of probability theory, Problems of Information Transmission, v. 10, No. 3, pp. 206–210
Levin, L.A. (1976) Various Measures of Complexity for Finite Objects (Axiomatic Description), Soviet Math. Dokl., v. 17, pp. 522–526
Li, M., and Vitanyi, P. An Introduction to Kolmogorov Complexity and its Applications, Springer-Verlag, New York, 1997
Solomonoff, R.J. (1960) A Preliminary Report on a General Theory of Inductive Inference, Technical Report ZTB-138, Zator Company, Cambridge, Mass.
Solomonoff, R.J. (1964) A Formal Theory of Inductive Inference, Information and Control, v. 7, No. 1, pp. 1–22; No.2, pp. 224–254
Solomonoff, R.J. (2009) Algorithmic Probability: Theory and Applications, Information Theory and Statistical Learning, Springer NY, Emmert-Streib, F. and Dehmer, M. (Eds), ISBN 978-0-387-84815-0.
Van Lambagen (1989). "Algorithmic Information Theory". Journal for Symbolic Logic 54: 1389–1400. doi:10.1017/S0022481200041153.
Zurek, W.H. (1991) Algorithmic Information Content, Church-Turing Thesis, physical entropy, and Maxwell’s demon, in Complexity, Entropy and the Physics of Information, (Zurek, W.H., Ed.) Addison-Wesley, pp. 73–89
Zvonkin, A.K. and Levin, L. A. (1970) The Complexity of Finite Objects and the Development of the Concepts of Information and Randomness by Means of the Theory of Algorithms, Russian Mathematics Surveys, v. 256, pp. 83–124

Statistics

Descriptive statistics

Continuous data

Location	Mean arithmetic geometric harmonic Median Mode

Dispersion	Range Standard deviation Coefficient of variation Percentile Interquartile range

Shape	Variance Skewness Kurtosis Moments L-moments

Count data

Index of dispersion

Summary tables

Dependence

Statistical graphics

Data collection

Study design	Effect size Standard error Statistical power Sample size determination

Survey methodology	Sampling stratified cluster Opinion poll Questionnaire

Controlled experiments	Design control optimal Controlled trial Randomized Random assignment Replication Blocking Factorial experiment

Uncontrolled studies	Observational study Natural experiment Quasi-experiment

Statistical inference

Statistical theory

Frequentist inference

Confidence interval Testing hypotheses Power

Unbiased estimators	Mean unbiased minimum-variance Median unbiased

Biased estimators	Maximum likelihood Method of moments Minimum distance Density estimation

Parametric tests	Likelihood-ratio Wald Score

Specific tests

Z (normal) Student's t-test F Shapiro–Wilk Kolmogorov–Smirnov

Goodness of fit	Chi-squared G Sample source (Anderson–Darling) Sample normality (Shapiro–Wilk) Skewness / kurtosis normality (Jarque–Bera) Model comparison (Likelihood-ratio) Model quality (Akaike criterion)

Signed-rank	1-sample (Wilcoxon) 2-sample (Mann–Whitney U) 1-way anova (Kruskal–Wallis)

Bayesian inference

Correlation	Pearson product–moment Partial correlation Confounding variable Coefficient of determination

Regression analysis	Errors and residuals Regression model validation Mixed effects models Simultaneous equations models Multivariate adaptive regression splines (MARS)

Linear regression	Simple linear regression Ordinary least squares General linear model Bayesian regression

Non-standard predictors	Nonlinear regression Nonparametric Semiparametric Isotonic Robust Heteroscedasticity Homoscedasticity

Generalized linear model	Exponential families Logistic (Bernoulli) / Binomial / Poisson regressions

Partition of variance	Analysis of variance (ANOVA, anova) Analysis of covariance Multivariate ANOVA Degrees of freedom

Categorical / Multivariate / Time-series / Survival analysis

Categorical

Multivariate

Time-series

General	Decomposition Trend Stationarity Seasonal adjustment Exponential smoothing Cointegration Structural break Granger causality

Specific tests	Dickey–Fuller Johansen Q-statistic (Ljung–Box) Durbin–Watson Breusch–Godfrey

Time domain	Autocorrelation (ACF) partial (PACF) Cross-correlation (XCF) ARMA model ARIMA model (Box–Jenkins) Autoregressive conditional heteroskedasticity (ARCH) Vector autoregression (VAR)

Frequency domain	Spectral density estimation Fourier analysis Wavelet

Survival

Survival function	Kaplan–Meier estimator (product limit) Proportional hazards models Accelerated failure time (AFT) model First hitting time

Hazard function	Nelson–Aalen estimator

Test	Log-rank test

Applications

Biostatistics	Bioinformatics Clinical trials / studies Epidemiology Medical statistics

Engineering statistics	Chemometrics Methods engineering Probabilistic design Process / quality control Reliability System identification

Social statistics	Actuarial science Census Crime statistics Demography Econometrics National accounts Official statistics Population statistics Psychometrics

Spatial statistics	Cartography Environmental statistics Geographic information system Geostatistics Kriging

Category
Portal
Commons
WikiProject

This article is issued from Wikipedia - version of the Thursday, May 05, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.