Sum-of-squares optimization

This article deals with sum-of-squares constraints. For problems with sum-of-squares cost functions, see Least squares.

A sum-of-squares optimization program is an optimization problem with a linear cost function and a particular type of constraint on the decision variables: when the decision variables are used as coefficients in certain polynomials, those polynomials must be sums of squares (SOS). When the maximum degree of the polynomials involved is fixed, sum-of-squares optimization is also known as the Lasserre hierarchy of relaxations in semidefinite programming.

Sum-of-squares optimization techniques have been successfully applied by researchers in the control engineering field.[1][2][3]

Optimization problem

The problem can be expressed as

 \max_{u\in\R^n} c^T u

subject to

 a_{k,0}(x) + a_{k,1}(x)u_1 + \cdots + a_{k,n}(x)u_n \in \text{SOS} \quad (k=1,\ldots, N_s).

Here "SOS" represents the class of sum-of-squares (SOS) polynomials. The vector c\in \R^n and the polynomials \{ a_{k,j} \} are given as part of the data for the optimization problem. The quantities u\in \R^n are the decision variables. SOS programs can be converted to semidefinite programs (SDPs) using the duality between the SOS polynomial program and a relaxation for constrained polynomial optimization that uses positive-semidefinite matrices; see the following section.
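As a small illustration of this conversion, consider a hypothetical one-variable SOS program that is not taken from the article: maximize u subject to x^2 + 2x + 3 - u being SOS. In the monomial basis (1, x) a quadratic has a unique symmetric Gram matrix, so the SOS constraint reduces to a 2-by-2 positive-semidefiniteness test. The sketch below assumes this toy instance and uses bisection in place of a real SDP solver; all function names are illustrative.

```python
def gram_matrix(u):
    """Gram matrix of x^2 + 2x + 3 - u in the basis z = (1, x)."""
    # p(x) = z^T Q z with Q = [[3 - u, 1], [1, 1]]
    return [[3.0 - u, 1.0], [1.0, 1.0]]


def is_psd_2x2(q, tol=1e-12):
    """A symmetric 2x2 matrix is PSD iff its diagonal and determinant are >= 0."""
    det = q[0][0] * q[1][1] - q[0][1] * q[1][0]
    return q[0][0] >= -tol and q[1][1] >= -tol and det >= -tol


def solve_toy_sos(lo=-10.0, hi=10.0, iters=60):
    """Bisection for the largest u whose Gram matrix is PSD.

    Assumes lo is feasible and hi is infeasible; feasibility is
    monotone in u for this toy problem.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if is_psd_2x2(gram_matrix(mid)):
            lo = mid  # feasible: try a larger u
        else:
            hi = mid  # infeasible: shrink from above
    return lo


u_star = solve_toy_sos()
print(round(u_star, 6))
```

The value found is u = 2 up to numerical tolerance, which agrees with the minimum of x^2 + 2x + 3 = (x + 1)^2 + 2. For larger problems the PSD test would be handed to an SDP solver rather than checked by hand.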

Dual problem: constrained polynomial optimization

Suppose we have an n-variate polynomial p(x): \mathbb{R}^n \to \mathbb{R}, and suppose that we would like to minimize this polynomial over a subset A \subseteq \mathbb{R}^n. Suppose furthermore that the constraints on the subset A can be encoded using m polynomial inequalities of degree at most 2d, each of the form a_i(x) \ge b_i, where a_i: \mathbb{R}^n \to \mathbb{R} is a polynomial of degree at most 2d and b_i \in \mathbb{R}. A natural, though generally non-convex, program for this optimization problem is the following:

 \min_{x \in \mathbb{R}^{[n] \cup \emptyset}} \langle C, x^{\otimes d} (x^{\otimes d})^\top \rangle

subject to:

 
 \langle A_i, x^{\otimes d}(x^{\otimes d})^\top \rangle \ge b_i \qquad \forall \ i \in [m], \qquad (1)

 x_{\emptyset} = 1,

where x^{\otimes d} is the d-wise Kronecker product of x with itself, C is a matrix of coefficients of the polynomial p(x) that we want to minimize, and A_i is a matrix of coefficients of the polynomial a_i(x) encoding the i-th constraint on the subset A \subset \mathbb{R}^n. The additional, fixed index in our search space, x_{\emptyset} = 1, is added for the convenience of writing the polynomials p(x) and a_i(x) in a matrix representation.

This program is generally non-convex, because the constraints (1) are not convex. One possible convex relaxation for this minimization problem uses semidefinite programming to replace the rank-one matrix x^{\otimes d}(x^{\otimes d})^\top with a positive-semidefinite matrix X: we index each monomial of size at most 2d by a multiset S of at most 2d indices, S \subset [n]^{\le 2d}. For each such monomial, we create a variable X_S in the program, and we arrange the variables X_S to form the matrix X \in \mathbb{R}^{[n]^{\le d} \times [n]^{\le d}}, where we identify the rows and columns of X with multi-subsets of [n] of size at most d. We then write the following semidefinite program in the variables X_S:

 \min_{X \in \mathbb{R}^{[n]^{\le d} \times [n]^{\le d} }}\langle C, X \rangle

subject to:

 
 \langle A_i, X \rangle \ge b_i \qquad \forall \ i \in [m],

 X_{\emptyset} = 1,

 X_{U \cup V} = X_{S \cup T} \qquad \forall \ U,V,S,T \subseteq [n]^{\le d} \text{ with } U \cup V = S \cup T,

 X \succeq 0,

where again C is the matrix of coefficients of the polynomial p(x) that we want to minimize, and A_i is the matrix of coefficients of the polynomial a_i(x) encoding the i-th constraint on the subset A \subset \mathbb{R}^n.

The third constraint ensures that the value of a monomial that appears several times within the matrix is equal throughout the matrix, and is added to make X behave more like x^{\otimes d}(x^{\otimes d})^\top.
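The consistency requirement on repeated monomials can be checked directly on a rank-one matrix. The sketch below is only an illustration under stated assumptions: it takes n = 2, d = 2 and an arbitrary sample point, stores x_\emptyset = 1 at index 0, builds x^{\otimes d}(x^{\otimes d})^\top entry by entry, and groups the entries by the multiset of variable indices they represent. The helper names are hypothetical.

```python
from itertools import product


def kron_power(x, d):
    """d-wise Kronecker product of x with itself, keyed by index tuples."""
    entries = {}
    for idx in product(range(len(x)), repeat=d):
        value = 1.0
        for i in idx:
            value *= x[i]
        entries[idx] = value
    return entries


def monomial_key(row_idx, col_idx):
    """Multiset of variable indices; index 0 plays the role of the empty index."""
    return tuple(sorted(i for i in row_idx + col_idx if i != 0))


def rank_one_moment_matrix(point, d):
    """Entries of the rank-one matrix for x = (1, point...), grouped by monomial."""
    x = [1.0] + list(point)  # x_emptyset = 1 is stored at index 0
    v = kron_power(x, d)
    groups = {}
    for r, vr in v.items():
        for c, vc in v.items():
            groups.setdefault(monomial_key(r, c), set()).add(round(vr * vc, 9))
    return groups


groups = rank_one_moment_matrix([2.0, -3.0], d=2)
# Every monomial should carry exactly one value across the whole matrix.
consistent = all(len(values) == 1 for values in groups.values())
print(consistent, len(groups))
```

In the relaxation, the same grouping defines which entries of X are tied together by equality constraints; a general PSD matrix X satisfying them need not be rank one.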

Duality

One can take the dual of the above semidefinite program and obtain the following program:

 
 \max_{y \in \mathbb{R}^{m'}} b^\top y,

subject to:

 C - \sum_{i \in [m']} y_i A_i \succeq 0.

The dimension m' is equal to the number of constraints in the semidefinite program. The constraint C - \sum_{i \in [m']} y_i A_i \succeq 0 ensures that the polynomial represented by the matrix C - \sum_{i \in [m']} y_i A_i is a sum of squares of polynomials: by a characterization of PSD matrices, any PSD matrix Q \in \mathbb{R}^{N \times N} can be written as Q = \sum_{i \in [N]} f_i f_i^\top for vectors f_i \in \mathbb{R}^N. Here N = (n+1)^d is the length of x^{\otimes d}. Thus for any x \in \mathbb{R}^{[n] \cup \emptyset} with x_\emptyset = 1,

 \begin{align}
(x^{\otimes d})^\top \left( C - \sum_{i\in [m']} y_i A_i \right)x^{\otimes d}
&= (x^{\otimes d})^\top \left( \sum_{i\in[(n+1)^d]} f_i f_i^\top \right)x^{\otimes d} \\
&= \sum_{i\in[(n+1)^d]} \langle x^{\otimes d}, f_i\rangle^2 \\
&= \sum_{i \in [(n+1)^d]} f_i(x)^2,
\end{align}

where we have identified the vectors f_i with the coefficients of a polynomial of degree at most d. This gives a sum-of-squares proof that the value p(x) = \langle C, x^{\otimes d} (x^{\otimes d})^\top \rangle over A \subset \mathbb{R}^n is at least b^\top y, since we have that

 
\begin{align}
(x^{\otimes d})^\top C x^{\otimes d}
&\ge \sum_{i \in [m']} y_i \cdot (x^{\otimes d})^\top A_i x^{\otimes d} \\
&\ge \sum_{i \in [m']} y_i \cdot b_i,
\end{align}

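where the first inequality holds because C - \sum_{i \in [m']} y_i A_i is positive semidefinite, and the final inequality comes from the constraints a_i(x) \ge b_i describing the feasible region A \subset \mathbb{R}^n (this step uses nonnegative multipliers y_i on the inequality constraints).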
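To make the certificate concrete, the sketch below works through a hypothetical toy instance that is not taken from the article: minimize p(x) = x^2 subject to x \ge 1, with d = 1 and basis (x_\emptyset, x_1) = (1, x). As an assumption of this sketch, the normalization x_\emptyset = 1 is treated as one more constraint with matrix A_0 and a free multiplier y_0, and the multiplier values are picked by hand rather than computed by a solver.

```python
def combine(c, mats, ys):
    """Return C - sum_i y_i A_i for 2x2 matrices stored as nested lists."""
    s = [row[:] for row in c]
    for a, y in zip(mats, ys):
        for r in range(2):
            for k in range(2):
                s[r][k] -= y * a[r][k]
    return s


def is_psd_2x2(q, tol=1e-12):
    """A symmetric 2x2 matrix is PSD iff its diagonal and determinant are >= 0."""
    det = q[0][0] * q[1][1] - q[0][1] * q[1][0]
    return q[0][0] >= -tol and q[1][1] >= -tol and det >= -tol


def quad_form(q, x):
    """Evaluate z^T Q z for z = (1, x)."""
    z = (1.0, x)
    return sum(z[r] * q[r][k] * z[k] for r in range(2) for k in range(2))


C = [[0.0, 0.0], [0.0, 1.0]]    # p(x) = x^2
A0 = [[1.0, 0.0], [0.0, 0.0]]   # normalization x_emptyset^2 = 1, so b_0 = 1
A1 = [[0.0, 0.5], [0.5, 0.0]]   # a_1(x) = x >= 1, so b_1 = 1
b = [1.0, 1.0]
y = [-1.0, 2.0]                 # hand-picked multipliers (y_0 free, y_1 >= 0)

S = combine(C, [A0, A1], y)     # Gram matrix of x^2 - 2x + 1 = (x - 1)^2
bound = sum(bi * yi for bi, yi in zip(b, y))
print(is_psd_2x2(S), bound)
```

Because S is positive semidefinite, x^2 - 2x + 1 is a sum of squares, and on the feasible region x \ge 1 this yields x^2 \ge 2x - 1 \ge 1 = b^\top y. In this instance the bound coincides with the true minimum.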

Sum-of-squares hierarchy

The sum-of-squares hierarchy (SOS hierarchy), also known as the Lasserre hierarchy, is a hierarchy of convex relaxations of increasing power and increasing computational cost. For each natural number {\textstyle d \in \mathbb{N}} the corresponding convex relaxation is known as the {\textstyle d}th level or {\textstyle d}th round of the SOS hierarchy. The {\textstyle 1}st round, when {\textstyle d=1}, corresponds to a basic semidefinite program, or to sum-of-squares optimization over polynomials of degree at most {\textstyle 2}. To augment the basic convex program at the {\textstyle 1}st level of the hierarchy to the {\textstyle d}th level, additional variables and constraints are added so that the program considers polynomials of degree at most {\textstyle 2d}.

The SOS hierarchy derives its name from the fact that the value of the objective function at the {\textstyle d}th level is bounded with a sum-of-squares proof using polynomials of degree at most {\textstyle 2d} via the dual (see "Duality" above). Consequently, any sum-of-squares proof that uses polynomials of degree at most {\textstyle 2d} can be used to bound the objective value, allowing one to prove guarantees on the tightness of the relaxation.

In conjunction with a theorem of Berg, this further implies that given sufficiently many rounds, the relaxation becomes arbitrarily tight on any fixed interval. Berg's result[4][5] states that every non-negative real polynomial within a bounded interval can be approximated within accuracy {\textstyle \epsilon} on that interval with a sum-of-squares of real polynomials of sufficiently high degree. Thus, if {\textstyle OBJ(x)} is the polynomial objective value as a function of the point {\textstyle x}, and the inequality {\textstyle OBJ(x) - c + \epsilon \ge 0} holds for all {\textstyle x} in the region of interest, then there must be a sum-of-squares proof of this fact. Choosing {\textstyle c} to be the minimum of the objective function over the feasible region, we have the result.

Computational cost

When optimizing a function of {\textstyle n} variables, the {\textstyle d}th level of the hierarchy can be written as a semidefinite program over {\textstyle n^{O(d)}} variables, and can be solved in time {\textstyle n^{O(d)}} using the ellipsoid method.
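The growth in problem size can be estimated by counting monomials. The sketch below assumes the matrix is indexed by distinct monomials of degree at most d (rather than by the redundant tuple indexing used above), so its side length is the binomial coefficient \binom{n+d}{d} and the number of distinct moment variables is \binom{n+2d}{2d}. The function names are illustrative.

```python
from math import comb


def num_monomials(n, degree):
    """Number of monomials in n variables of total degree at most `degree`."""
    return comb(n + degree, degree)


def relaxation_size(n, d):
    """Side length of the moment matrix and number of distinct moment variables."""
    side = num_monomials(n, d)           # rows/columns: monomials of degree <= d
    variables = num_monomials(n, 2 * d)  # distinct entries: monomials of degree <= 2d
    return side, variables


for d in (1, 2, 3):
    print(d, relaxation_size(10, d))
```

For fixed d both counts are polynomial in n, of order n^d and n^{2d} respectively, which is consistent with the n^{O(d)} bound.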

Sum-of-squares background

A polynomial  p is a sum of squares (SOS) if there exist polynomials  \{f_i\}_{i=1}^m such that  p = \sum_{i=1}^m f_i^2 . For example,

p=x^2 - 4xy + 7y^2

is a sum of squares since

 p = f_1^2 + f_2^2

where

f_1 = (x-2y)\text{ and  }f_2 = \sqrt{3}y.

Note that if  p is a sum of squares then p(x) \ge 0 for all  x \in \R^n. Detailed descriptions of polynomial SOS are available.[6][7][8]
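The decomposition in the example can be spot-checked numerically. The following sketch evaluates p and f_1^2 + f_2^2 at random sample points (the sampling range and seed are arbitrary choices) and confirms that they agree up to rounding and that p is non-negative at every sample.

```python
import math
import random


def p(x, y):
    return x**2 - 4*x*y + 7*y**2


def f1(x, y):
    return x - 2*y


def f2(x, y):
    return math.sqrt(3) * y


random.seed(0)
samples = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]

# Largest discrepancy between p and the sum of squares over the samples
max_gap = max(abs(p(x, y) - (f1(x, y)**2 + f2(x, y)**2)) for x, y in samples)
min_value = min(p(x, y) for x, y in samples)
print(max_gap < 1e-9, min_value >= 0)
```

A sampling check of this kind is only a sanity test; the algebraic identity itself is what proves that p is SOS.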

Quadratic forms can be expressed as  p(x)=x^T Q x where  Q is a symmetric matrix. Similarly, polynomials of degree  2d can be expressed as

 p(x)=z(x)^T Q z(x) ,

where the vector z contains all monomials of degree  \le d . This is known as the Gram matrix form. An important fact is that  p is SOS if and only if there exists a symmetric and positive-semidefinite matrix  Q such that p(x)=z(x)^T Q z(x) . This provides a connection between SOS polynomials and positive-semidefinite matrices.
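For the quadratic form p = x^2 - 4xy + 7y^2 used earlier, the Gram matrix in the basis z = (x, y) is Q = [[1, -2], [-2, 7]]. One way to recover an explicit SOS decomposition from a positive-definite Gram matrix is a Cholesky factorization Q = L L^\top, since then p = \|L^\top z\|^2. The sketch below hand-codes the 2-by-2 case for illustration; it assumes Q is strictly positive definite, and the function name is hypothetical.

```python
import math


def cholesky_2x2(q):
    """Lower-triangular L with Q = L L^T, for a positive-definite 2x2 matrix."""
    l11 = math.sqrt(q[0][0])
    l21 = q[1][0] / l11
    l22 = math.sqrt(q[1][1] - l21**2)
    return [[l11, 0.0], [l21, l22]]


# Gram matrix of p = x^2 - 4xy + 7y^2 in the basis z = (x, y)
Q = [[1.0, -2.0], [-2.0, 7.0]]
L = cholesky_2x2(Q)

# Column j of L holds the coefficients of f_j in the basis z, so
# p = (L[0][0]*x + L[1][0]*y)^2 + (L[0][1]*x + L[1][1]*y)^2
f1_coeffs = (L[0][0], L[1][0])
f2_coeffs = (L[0][1], L[1][1])
print(f1_coeffs, f2_coeffs)
```

The recovered coefficients correspond to f_1 = x - 2y and f_2 = \sqrt{3}y, matching the decomposition given above. Other factorizations of Q give different but equally valid SOS decompositions.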

Software tools

References

  1. Tan, W., Packard, A., 2004. "Searching for control Lyapunov functions using sums of squares programming". In: Allerton Conf. on Comm., Control and Computing. pp. 210–219.
  2. Tan, W., Topcu, U., Seiler, P., Balas, G., Packard, A., 2008. Simulation-aided reachability and local gain analysis for nonlinear dynamical systems. In: Proc. of the IEEE Conference on Decision and Control. pp. 4097–4102.
  3. A. Chakraborty, P. Seiler, and G. Balas, “Susceptibility of F/A-18 Flight Controllers to the Falling-Leaf Mode: Nonlinear Analysis,” AIAA Journal of Guidance, Control, and Dynamics, Vol.34 No.1, 2011, 73–85.
  4. Berg, Christian (1987). Landau, Henry J., ed. "The multidimensional moment problem and semigroups". Proceedings of Symposia in Applied Mathematics.
  5. Lasserre, J. (2007-01-01). "A Sum of Squares Approximation of Nonnegative Polynomials". SIAM Review 49 (4): 651–669. doi:10.1137/070693709. ISSN 0036-1445.
  6. Parrilo, P., (2000) Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization. Ph.D. thesis, California Institute of Technology.
  7. Parrilo, P. (2003) "Semidefinite programming relaxations for semialgebraic problems". Mathematical Programming Ser. B 96 (2), 293–320.
  8. Lasserre, J. (2001) "Global optimization with polynomials and the problem of moments". SIAM Journal on Optimization, 11 (3), 796–817.
This article is issued from Wikipedia - version of the Wednesday, April 13, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.