No free lunch theorem

This article is about mathematical folklore. For treatment of the mathematics, see No free lunch in search and optimization.

In mathematical folklore, the "no free lunch" theorem (sometimes pluralized) of David Wolpert and William Macready appears in their 1997 paper "No Free Lunch Theorems for Optimization".^[1] Wolpert had previously derived no free lunch theorems for machine learning (statistical inference).^[2]

In 2005, Wolpert and Macready themselves indicated that the first theorem in their paper "state[s] that any two optimization algorithms are equivalent when their performance is averaged across all possible problems".^[3] The 1997 theorems of Wolpert and Macready are mathematically technical and some find them unintuitive.^[4]

The folkloric "no free lunch" (NFL) theorem is an easily stated and easily understood consequence of theorems Wolpert and Macready actually prove. It is weaker than the proven theorems, and thus does not encapsulate them.

Various investigators have extended the work of Wolpert and Macready substantively. See No free lunch in search and optimization for treatment of the research area.

Original NFL theorems

Wolpert and Macready give two NFL theorems that are closely related to the folkloric theorem. In their paper, they state:

We have dubbed the associated results NFL theorems because they demonstrate that if an algorithm performs well on a certain class of problems then it necessarily pays for that with degraded performance on the set of all remaining problems.^[1]

The first theorem hypothesizes objective functions that do not change while optimization is in progress, and the second hypothesizes objective functions that may change.^[1]

Theorem 1: For any algorithms a₁ and a₂, at iteration step m

$$\sum_f P(d_m^y \mid f, m, a_1) = \sum_f P(d_m^y \mid f, m, a_2),$$

where $d_m^y$ denotes the ordered set of size $m$ of the cost values $y$ associated to input values $x \in X$, $f : X \rightarrow Y$ is the function being optimized, and $P(d_m^y \mid f, m, a)$ is the conditional probability of obtaining a given sequence of cost values from algorithm $a$ run $m$ times on function $f$.
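For a tiny search space, the equality in Theorem 1 can be checked by brute force. The following Python sketch is an illustration only, not code from the 1997 paper: the sets `X` and `Y`, the step count `M`, and the two algorithms `fixed_order` and `adaptive` are hypothetical choices made for the example. Both algorithms are deterministic and never revisit a point, so $P(d_m^y \mid f, m, a)$ is either 0 or 1, and the sum over $f$ reduces to counting the functions on which the algorithm produces a given sequence.

```python
from collections import Counter
from itertools import product

X = [0, 1, 2]   # search space, kept tiny so every f can be enumerated
Y = [0, 1]      # possible cost values
M = 2           # number of distinct points each algorithm evaluates


def fixed_order(history):
    """Non-adaptive: visit points 0, 1, 2, ... regardless of observed costs."""
    return len(history)


def adaptive(history):
    """Adaptive: start at point 2, then let the last cost choose the next point."""
    if not history:
        return 2
    visited = [x for x, _ in history]
    unvisited = [x for x in X if x not in visited]
    last_cost = history[-1][1]
    # A cost of 0 sends the search to the smallest unvisited point,
    # any other cost sends it to the largest one.
    return unvisited[0] if last_cost == 0 else unvisited[-1]


def run(algorithm, f, m):
    """Return d_m^y: the ordered cost values seen in the first m evaluations."""
    history = []
    for _ in range(m):
        x = algorithm(history)
        history.append((x, f[x]))
    return tuple(y for _, y in history)


def sequence_counts(algorithm, m=M):
    """Count, for each sequence d, the functions f on which the algorithm yields d.

    For a deterministic algorithm this count equals the sum over f in Theorem 1.
    """
    counts = Counter()
    for f in product(Y, repeat=len(X)):  # each f : X -> Y as a tuple of values
        counts[run(algorithm, f, m)] += 1
    return counts


counts_a1 = sequence_counts(fixed_order)
counts_a2 = sequence_counts(adaptive)
print(counts_a1 == counts_a2)  # True
```

Each of the four possible sequences of two cost values is produced on exactly two of the eight functions, whichever algorithm is used.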

The theorem can be equivalently formulated as follows:

Theorem 1: Given a finite set $V$ and a finite set $S$ of real numbers, assume that $f : V \to S$ is chosen at random according to the uniform distribution on the set $S^V$ of all possible functions from $V$ to $S$. For the problem of optimizing $f$ over the set $V$, no algorithm performs better than blind search.

Here, blind search means that at each step of the algorithm, the element $v \in V$ is chosen at random with uniform probability distribution from the elements of $V$ that have not been chosen previously.
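The comparison with blind search can also be checked exhaustively on a small instance. The sketch below is a hedged illustration, not a proof: the sets `V` and `S`, the budget `M`, the rule `adaptive`, and the choice of "lowest value found after `M` evaluations" as the performance measure are all assumptions made for the example. Blind search is modelled by averaging over every ordering of `V`, which weights each order of distinct points equally, and exact fractions avoid floating-point comparisons.

```python
from fractions import Fraction
from itertools import permutations, product

V = [0, 1, 2, 3]   # search space
S = [0, 1, 2]      # possible objective values (lower is better here)
M = 3              # number of evaluations allowed


def adaptive(history):
    """Hypothetical adaptive rule that never revisits a point.

    Start at 0. After observing the lowest possible value, move to the
    nearest unvisited point; otherwise jump to the farthest one.
    """
    if not history:
        return 0
    visited = {v for v, _ in history}
    unvisited = [v for v in V if v not in visited]
    last_v, last_y = history[-1]
    distance = lambda v: abs(v - last_v)
    if last_y == min(S):
        return min(unvisited, key=distance)
    return max(unvisited, key=distance)


def best_after(algorithm, f, m):
    """Lowest objective value the algorithm has seen after m evaluations of f."""
    history = []
    for _ in range(m):
        v = algorithm(history)
        history.append((v, f[v]))
    return min(y for _, y in history)


def mean_best_adaptive(m=M):
    """Mean of best_after over every function f : V -> S."""
    functions = list(product(S, repeat=len(V)))
    total = sum(best_after(adaptive, f, m) for f in functions)
    return Fraction(total, len(functions))


def mean_best_blind(m=M):
    """Mean over every f and every visiting order of the best of the first m points."""
    functions = list(product(S, repeat=len(V)))
    orders = list(permutations(V))
    total = sum(min(f[v] for v in order[:m]) for f in functions for order in orders)
    return Fraction(total, len(functions) * len(orders))


print(mean_best_blind(), mean_best_adaptive())  # 1/3 1/3
```

Under the uniform distribution on functions, any `M` distinct points yield independent, uniformly distributed values, so the adaptive rule gains nothing on average from the values it has already seen.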

In essence, this says that when all functions f are equally likely, the probability of observing an arbitrary sequence of m values in the course of optimization does not depend upon the algorithm. In the analytic framework of Wolpert and Macready, performance is a function of the sequence of observed values (and not, for example, of wall-clock time). It follows easily that all algorithms have identically distributed performance when objective functions are drawn uniformly at random, and hence that all algorithms have identical mean performance. The converse fails: identical mean performance of all algorithms does not imply Theorem 1, and thus the folkloric theorem is not equivalent to the original theorem.
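The step from Theorem 1 to identical mean performance can be written out as follows. This is a sketch under stated assumptions: $\Phi$ is introduced here to stand for an arbitrary real-valued performance measure on sequences of cost values, and $F$ for the finite set of all functions $f : X \rightarrow Y$. With $f$ drawn uniformly from $F$, the mean performance of algorithm $a$ after $m$ steps is

$$\mathbb{E}[\Phi \mid m, a] = \frac{1}{|F|} \sum_{f} \sum_{d_m^y} \Phi(d_m^y)\, P(d_m^y \mid f, m, a) = \frac{1}{|F|} \sum_{d_m^y} \Phi(d_m^y) \sum_{f} P(d_m^y \mid f, m, a).$$

By Theorem 1 the inner sum over $f$ takes the same value for $a_1$ and $a_2$ for every sequence $d_m^y$, so $\mathbb{E}[\Phi \mid m, a_1] = \mathbb{E}[\Phi \mid m, a_2]$. Going the other way, equality of the two means for one fixed $\Phi$ is a single linear constraint on the inner sums and does not force them to agree sequence by sequence.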

Theorem 2 establishes a similar, but "more subtle", NFL result for time-varying objective functions.^[1]

Intelligent design and the NFL theorem

Notes

1. Wolpert, D.H., Macready, W.G. (1997), "No Free Lunch Theorems for Optimization", IEEE Transactions on Evolutionary Computation 1, 67.
2. Wolpert, D. (1996), "The Lack of A Priori Distinctions between Learning Algorithms", Neural Computation, pp. 1341-1390.
3. Wolpert, D.H., Macready, W.G. (2005), "Coevolutionary free lunches", IEEE Transactions on Evolutionary Computation, 9(6): 721-735.
4. Forster, M.R. (2009), "Notice: No Free Lunches for Anyone, Bayesians Included" (PDF). Retrieved 2014-01-14. "The problem with the no-free-lunch theorems in machine learning is that they are rather complicated and difficult to follow because they are proven under the most general conditions possible."
5. Dembski, W.A. (2002), No Free Lunch, Rowman & Littlefield.
6. Wolpert, D. (2003), "William Dembski's treatment of the No Free Lunch theorems is written in jello".
7. Perakh, M. (2003), "The No Free Lunch Theorems and Their Application to Evolutionary Algorithms".

External links

No Free Lunch Theorems
No Free Lunches for Anyone, Bayesians Included (1999) - a simple example illustrating the idea behind these theorems
- graphics illustrating the theorem

This article is issued from Wikipedia - version of the Monday, October 05, 2015. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.