Change of variables

"Substitution (algebra)" redirects here. It is not to be confused with substitution (logic).

In mathematics, the operation of substitution consists of replacing all occurrences of a free variable appearing in an expression or formula by a number or another expression. In other words, an expression involving free variables may be considered as defining a function, and substituting values for the variables in the expression is equivalent to applying that function to these values.

A change of variables is commonly a particular type of substitution, where the substituted values are expressions that depend on other variables. This is a standard technique used to reduce a difficult problem to a simpler one. A change of coordinates is a common type of change of variables. However, if the expression in which the variables are changed involves derivatives or integrals, the change of variable does not reduce to a substitution.

A very simple example of a useful variable change can be seen in the problem of finding the roots of the sixth-degree polynomial equation

x^6 - 9 x^3 + 8 = 0.

Sixth-degree polynomial equations are generally impossible to solve in terms of radicals (see Abel–Ruffini theorem). This particular equation, however, may be written

(x^3)^2-9(x^3)+8=0

(this is a simple case of a polynomial decomposition). Thus the equation may be simplified by defining a new variable $u = x^3$. Replacing $x$ by $\sqrt[3]{u}$ in the polynomial gives

u^2 - 9 u + 8 = 0 ,

which is just a quadratic equation with solutions:

u = 1 \quad \text{and} \quad u = 8.

The solutions in terms of the original variable are obtained by substituting x³ back in for u:

x^3 = 1 \quad \text{and} \quad x^3 = 8.

Then, assuming that x is real,

x = (1)^{1/3} = 1 \quad \text{and} \quad x = (8)^{1/3} = 2.
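
The steps above can be sketched in a few lines of Python. This is only an illustration of the substitution $u = x^3$ for this particular equation; the function names are hypothetical and only the real roots are recovered.

```python
import math


def real_cube_root(value):
    """Return the real cube root of a real number (handles negative input)."""
    return math.copysign(abs(value) ** (1.0 / 3.0), value)


def polynomial(x):
    """The original sixth-degree polynomial x^6 - 9x^3 + 8."""
    return x**6 - 9 * x**3 + 8


def solve_by_substitution():
    """Solve x^6 - 9x^3 + 8 = 0 over the reals via u = x^3."""
    # After the substitution the equation is u^2 - 9u + 8 = 0.
    a, b, c = 1.0, -9.0, 8.0
    sqrt_disc = math.sqrt(b * b - 4.0 * a * c)
    u_roots = [(-b - sqrt_disc) / (2.0 * a), (-b + sqrt_disc) / (2.0 * a)]
    # Back-substitute: each real x is the real cube root of a root u.
    return [real_cube_root(u) for u in u_roots]


roots = solve_by_substitution()
residuals = [polynomial(x) for x in roots]
```

Evaluating `polynomial` at each returned value is a quick way to confirm that the back-substitution was carried out correctly.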

Simple example

Consider the system of equations

xy+x+y=71

x^2y+xy^2=880

where $x$ and $y$ are positive integers with $x>y$ . (Source: 1991 AIME)

Solving this system directly is possible but tedious. However, the second equation can be rewritten as $xy(x+y)=880$. Making the substitution $s=x+y$, $t=xy$ reduces the system to $s+t=71$, $st=880$, so $s$ and $t$ are the two roots of $z^2-71z+880=0$. Solving this gives $(s,t)=(16,55)$ or $(s,t)=(55,16)$. Back-substituting the first ordered pair gives $x+y=16$, $xy=55$, which yields $(x,y)=(11,5)$. Back-substituting the second ordered pair gives $x+y=55$, $xy=16$, which has no solutions in positive integers. Hence the only solution of the system is $(x,y)=(11,5)$.
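
As a minimal sketch, the same two-stage substitution can be written in Python. The helper names `integer_roots` and `solve_system` are hypothetical; the code relies on the fact that two numbers with a known sum and product are the roots of a quadratic.

```python
import math


def integer_roots(total, product):
    """Integer roots (larger, smaller) of z^2 - total*z + product = 0, or None."""
    disc = total * total - 4 * product
    if disc < 0:
        return None
    root = math.isqrt(disc)
    # Both roots are integers only if disc is a perfect square of the right parity.
    if root * root != disc or (total + root) % 2 != 0:
        return None
    return (total + root) // 2, (total - root) // 2


def solve_system(a=71, b=880):
    """Solve xy + x + y = a, xy(x + y) = b in positive integers with x > y."""
    pair = integer_roots(a, b)  # s + t = a and s * t = b
    if pair is None:
        return []
    solutions = []
    # s = x + y and t = xy may take the two roots in either order.
    for s, t in {pair, pair[::-1]}:
        xy = integer_roots(s, t)  # x + y = s and x * y = t
        if xy is not None and xy[0] > xy[1] > 0:
            solutions.append(xy)
    return sorted(solutions)


solutions = solve_system()
```

Each stage only requires solving a quadratic, which is the simplification the substitution is meant to provide.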

Formal introduction

Let $A$, $B$ be smooth manifolds and let $\Phi: A \rightarrow B$ be a $C^r$-diffeomorphism between them, that is: $\Phi$ is an $r$ times continuously differentiable, bijective map from $A$ to $B$ with $r$ times continuously differentiable inverse from $B$ to $A$. Here $r$ may be any natural number (or zero), $\infty$ (smooth) or $\omega$ (analytic).

The map $\Phi$ is called a regular coordinate transformation or regular variable substitution, where regular refers to the $C^r$ -ness of $\Phi$ . Usually one will write $x = \Phi(y)$ to indicate the replacement of the variable $x$ by the variable $y$ by substituting the value of $\Phi$ in $y$ for every occurrence of $x$ .

Other examples

Coordinate transformation

Some systems can be more easily solved when switching to cylindrical coordinates. Consider for example the equation

U(x, y, z) := (x^2 + y^2) \sqrt{ 1 - \frac{x^2}{x^2 + y^2} } = 0.

This may be a potential energy function for some physical problem. If one does not immediately see a solution, one might try the substitution

\displaystyle (x, y, z) = \Phi(r, \theta, z)

given by

\displaystyle \Phi(r, \theta, z) = (r \cos(\theta), r \sin(\theta), z)

Note that if $\theta$ is allowed to range over more than a half-open interval of length $2\pi$, such as $[0, 2\pi)$, the map $\Phi$ is no longer bijective. Therefore $\Phi$ should be restricted to, for example, $(0, \infty) \times [0, 2\pi) \times (-\infty, \infty)$. Notice that $r = 0$ is excluded, because $\Phi$ is not injective there: for any value of $\theta$, the point $(0, \theta, z)$ is mapped to $(0, 0, z)$. Then, replacing all occurrences of the original variables by the new expressions prescribed by $\Phi$ and using the identity $\sin^2 \theta + \cos^2 \theta = 1$, we get

V(r, \theta, z) = r^2 \sqrt{ 1 - \frac{r^2 \cos^2 \theta}{r^2} } = r^2 \sqrt{1 - \cos^2 \theta} = r^2\left|\sin\theta\right|

Now the solutions can be readily found: since $r > 0$, the equation $V = 0$ requires $\sin(\theta) = 0$, so $\theta = 0$ or $\theta = \pi$. Applying the inverse of $\Phi$ shows that this is equivalent to $y = 0$ while $x \neq 0$. Indeed, for $y = 0$ the function vanishes everywhere except at the origin, where the original expression is undefined.

Note that, had we allowed $r = 0$, the origin would also have appeared as a solution, though it is not a solution to the original problem. Here the bijectivity of $\Phi$ is crucial. Note also that the function is always non-negative (for $x, y, z \in \mathbb{R}$), hence the absolute value.
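
A quick numerical check of this substitution might look as follows. It is a sketch only: the sample points are arbitrary, and the names `U`, `V` and `phi` simply mirror the symbols used above.

```python
import math


def U(x, y, z):
    """Original function in Cartesian coordinates (undefined at x = y = 0)."""
    rho_sq = x * x + y * y
    # max(...) guards against a tiny negative value caused by rounding.
    return rho_sq * math.sqrt(max(0.0, 1.0 - x * x / rho_sq))


def phi(r, theta, z):
    """Cylindrical-to-Cartesian map, intended for r > 0 and 0 <= theta < 2*pi."""
    return (r * math.cos(theta), r * math.sin(theta), z)


def V(r, theta, z):
    """The same function after the substitution (x, y, z) = phi(r, theta, z)."""
    return r * r * abs(math.sin(theta))


samples = [(r, theta, z)
           for r in (0.5, 1.0, 2.0)
           for theta in (0.0, 0.3, 1.0, 2.5, math.pi, 4.0, 6.0)
           for z in (-1.0, 0.0, 3.0)]
max_error = max(abs(U(*phi(r, t, z)) - V(r, t, z)) for r, t, z in samples)
```

If the substitution has been carried out correctly, `max_error` is of the order of floating-point rounding error.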

Differentiation

Main article: Chain rule

The chain rule is used to simplify complicated differentiation. For example, to calculate the derivative

\frac{d}{d x}\left(\sin(x^2)\right)\,

the variable x may be changed by introducing x² = u. Then, by the chain rule:

\frac{d}{d x} = \frac{d}{d u} \frac{d u}{d x} = \frac{d}{d x}\left(u\right) \frac{d}{d u} = \frac{d}{d x}\left(x^2\right) \frac{d}{d u} = 2 x \frac{d}{d u}\,

so that

\frac{d}{d x}\left(\sin(x^2)\right) = 2 x \frac{d}{d u}\left(\sin(u)\right) = 2 x \cos(x^2)\,

where in the very last step u has been replaced with x².
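
The result can be checked numerically. The following sketch compares the chain-rule expression with a central finite difference; the step size `h` and the sample points are arbitrary choices made for illustration.

```python
import math


def derivative_via_substitution(x):
    """d/dx sin(x^2), computed with the change of variable u = x^2."""
    u = x * x                     # new variable
    du_dx = 2.0 * x               # derivative of u with respect to x
    dsin_du = math.cos(u)         # derivative of sin(u) with respect to u
    return du_dx * dsin_du        # chain rule


def central_difference(f, x, h=1e-6):
    """Second-order finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


points = [-2.0, 0.0, 0.5, 1.3]
errors = [abs(derivative_via_substitution(x)
              - central_difference(lambda s: math.sin(s * s), x))
          for x in points]
```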

Integration

Main article: Integration by substitution

Difficult integrals may often be evaluated by changing variables; this is enabled by the substitution rule and is analogous to the use of the chain rule above. For multiple integrals, the change of variables introduces the absolute value of the Jacobian determinant of the transformation as a factor in the integrand. This is the basis for integrating in coordinate systems such as polar, cylindrical, and spherical coordinates.
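
As an illustration of the one-dimensional substitution rule only, the sketch below evaluates $\int_0^2 2x\cos(x^2)\,dx$ directly and after the substitution $u = x^2$, which turns it into $\int_0^4 \cos(u)\,du = \sin 4$. The integrand and the use of Simpson's rule are choices made for this example, not part of the text above.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


# Original integral in x: the factor 2x is exactly du/dx for u = x^2.
original = simpson(lambda x: 2.0 * x * math.cos(x * x), 0.0, 2.0)

# After the substitution the limits become u(0) = 0 and u(2) = 4.
substituted = simpson(math.cos, 0.0, 4.0)

exact = math.sin(4.0)
```

Note that the limits of integration change together with the variable; forgetting to transform them is a common source of error.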

Differential equations

Variable changes for differentiation and integration are taught in elementary calculus and the steps are rarely carried out in full.

The very broad use of variable changes is apparent when considering differential equations, where the independent variables may be changed using the chain rule, or the dependent variables may be changed, which requires some differentiation to be carried out. Exotic changes, such as the mingling of dependent and independent variables in point and contact transformations, can be very complicated but allow much freedom.

Very often, a general form for a change is substituted into a problem and parameters picked along the way to best simplify the problem.

Scaling and shifting

Probably the simplest change is the scaling and shifting of variables, that is, replacing them with new variables that are "stretched" and "moved" by constant amounts. This is very common in practical applications to get physical parameters out of problems. For an $n$th-order derivative, the change simply results in

\frac{d^n y}{d x^n} = \frac{y_\text{scale}}{x_\text{scale}^n} \frac{d^n \hat y}{d \hat x^n}

where

x = \hat x x_\text{scale} + x_\text{shift}

y = \hat y y_\text{scale} + y_\text{shift}.

This may be shown readily through the chain rule and linearity of differentiation. For example, the boundary value problem

\mu \frac{d^2 u}{d y^2} = \frac{d p}{d x} \quad ; \quad u(0) = u(L) = 0

describes parallel fluid flow between flat solid walls separated by a distance $L$; $\mu$ is the viscosity and $d p/d x$ the pressure gradient, both constants. By scaling the variables the problem becomes

\frac{d^2 \hat u}{d \hat y^2} = 1 \quad ; \quad \hat u(0) = \hat u(1) = 0

where

y = \hat y L \qquad \text{and} \qquad u = \hat u \frac{L^2}{\mu} \frac{d p}{d x}.

Scaling is useful for many reasons. It simplifies analysis both by reducing the number of parameters and by simply making the problem neater. Proper scaling may normalize variables, that is, give them a sensible unitless range such as 0 to 1. Finally, if a problem requires a numerical solution, fewer parameters mean fewer computations.
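
The boundary value problem above gives a small worked check. The scaled problem has the solution $\hat u = (\hat y^2 - \hat y)/2$, and undoing the scaling should give a solution of the original problem. In the sketch below the parameter values for `mu`, `L` and `dpdx` are arbitrary assumptions chosen only to exercise the formulas.

```python
def u_hat(y_hat):
    """Solution of the scaled problem: u_hat'' = 1, u_hat(0) = u_hat(1) = 0."""
    return 0.5 * y_hat * (y_hat - 1.0)


def u(y, mu, L, dpdx):
    """Dimensional solution recovered from y = y_hat * L, u = u_hat * L^2/mu * dp/dx."""
    return u_hat(y / L) * (L * L / mu) * dpdx


def second_difference(f, y, h=1e-3):
    """Central finite-difference approximation of f''(y)."""
    return (f(y + h) - 2.0 * f(y) + f(y - h)) / (h * h)


# Hypothetical parameter values, for illustration only.
mu, L, dpdx = 0.5, 2.0, -3.0

# Residual of the original equation mu * u'' = dp/dx at a few interior points.
residuals = [mu * second_difference(lambda s: u(s, mu, L, dpdx), y) - dpdx
             for y in (0.25, 1.0, 1.75)]
boundary_values = (u(0.0, mu, L, dpdx), u(L, mu, L, dpdx))
```

The scaled problem contains no parameters at all, so it needs to be solved only once; every choice of `mu`, `L` and `dpdx` is then covered by rescaling.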

Momentum vs. velocity

Consider a system of equations

m \dot v = - \frac{ \partial H }{ \partial x }

m \dot x = \frac{ \partial H }{ \partial v }

for a given function $H(x, v)$. The mass can be eliminated by the (trivial) substitution $\Phi(p) = 1/m \cdot p$. Clearly this is a bijective map from $\mathbb{R}$ to $\mathbb{R}$. Under the substitution $v = \Phi(p)$ the system becomes

\dot p = - \frac{ \partial H }{ \partial x }

\dot x = \frac{ \partial H }{ \partial p }
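
To see the equivalence of the two systems concretely, one can pick a specific function and integrate both forms. The sketch below assumes $H(x, v) = \tfrac{1}{2} m v^2 + \tfrac{1}{2} k x^2$ (a harmonic oscillator, chosen here purely as an example) and uses a plain forward Euler step for brevity rather than accuracy; all parameter values are hypothetical.

```python
import math


def simulate_velocity_form(m, k, x0, v0, dt, steps):
    """Integrate m*dv/dt = -dH/dx, m*dx/dt = dH/dv for H = m*v^2/2 + k*x^2/2."""
    x, v = x0, v0
    for _ in range(steps):
        # Here dH/dx = k*x and dH/dv = m*v, so dx/dt = v and dv/dt = -k*x/m.
        x, v = x + dt * v, v - dt * k * x / m
    return x, v


def simulate_momentum_form(m, k, x0, p0, dt, steps):
    """Integrate dp/dt = -dH/dx, dx/dt = dH/dp after substituting v = p/m."""
    x, p = x0, p0
    for _ in range(steps):
        # With v = p/m, H = p^2/(2m) + k*x^2/2, so dH/dp = p/m.
        x, p = x + dt * p / m, p - dt * k * x
    return x, p


# Hypothetical parameters and initial conditions.
m, k, x0, v0 = 2.0, 3.0, 1.0, 0.5
x_v, v_end = simulate_velocity_form(m, k, x0, v0, dt=1e-3, steps=1000)
x_p, p_end = simulate_momentum_form(m, k, x0, m * v0, dt=1e-3, steps=1000)
```

Both runs describe the same trajectory: the positions agree and the final momentum equals `m` times the final velocity, up to rounding.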

Lagrangian mechanics

Main article: Lagrangian mechanics

Given a force field $\phi(t, x, v)$ , Newton's equations of motion are

m \ddot x = \phi(t, x, v)

Lagrange examined how these equations of motion change under an arbitrary substitution of variables $x = \Psi(t, y)$ , $v = \frac{\partial \Psi(t, y)}{\partial t} + \frac{\partial\Psi(t, y)}{\partial y} \cdot w$ .

He found that the equations

\frac{ \partial{L} }{ \partial y} = \frac{\mathrm{d}}{\mathrm{d}t} \frac{\partial{L}}{\partial{w}}

are equivalent to Newton's equations for the function $L = T - V$ , where T is the kinetic, and V the potential energy.

In fact, when the substitution is chosen well (exploiting for example symmetries and constraints of the system) these equations are much easier to solve than Newton's equations in Cartesian coordinates.