## Friday, November 1, 2013

### Underdetermined Systems in R

In this post, we begin a study of underdetermined systems.   We'll motivate this study by considering an application problem in competitive intelligence.  We note from the start, however, that this post is less about finding a particular solution to an underdetermined problem and more about ways to explore the possible solutions.

Problem:  We are trying to learn about the sales regions of a key competitor.  Suppose the competitor sells three versions of a product: basic, standard and luxury.  The product is sold in two regions: North and South.  Competitive intelligence mined from the competitor's annual reports gives us the following revenue schedule:

Revenue (k USD) by Region/Product:

|       | Basic | Standard | Luxury | TOTAL |
|-------|-------|----------|--------|-------|
| North | x1    | x2       | x3     | 400   |
| South | x4    | x5       | x6     | 600   |
| TOTAL | 450   | 250      | 300    | 1000  |

The values x1,x2,...,x6 are unknown revenue amounts for each product/region combination.  In this problem, we would like to investigate the possible values of x1,x2,...,x6 that are consistent with the known revenue totals appearing in the margins of the table.  Any set of values x1,x2,...,x6 consistent with all the known revenue totals we'll call a feasible solution.

We can read off the following 5 relations from the table:

\begin{align} x_1 + x_2 + x_3 &= 400 \\ x_4 + x_5 + x_6 &= 600 \\ x_1 + x_4 &= 450 \\ x_2 + x_5 &= 250 \\ x_3 + x_6 & = 300 \end{align}
Actually, there is a bit of redundancy in these equations: the first equation equals the sum of the last three minus the second (the column totals add up to 1000, and subtracting the South total leaves the North total).  It's important to check for and remove any such redundancy for the sake of the computations we'll do below.  We drop the second equation and rewrite the remaining system in matrix form as:

$$\left[ \begin{matrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{matrix} \right] \left[ \begin{matrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{matrix} \right] = \left[ \begin{matrix} 400 \\ 450 \\ 250 \\ 300 \end{matrix} \right].$$

 A <- matrix(c(
1,1,1,0,0,0,   # North row total
1,0,0,1,0,0,   # Basic column total
0,1,0,0,1,0,   # Standard column total
0,0,1,0,0,1),  # Luxury column total
byrow=TRUE, nrow=4, ncol=6)
f <- matrix(c(400,450,250,300), nrow=4, ncol=1)  # known totals
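
As a numerical check on the redundancy claim, we can ask R for the rank of the full five-equation system; a rank of 4 rather than 5 confirms that one equation carries no new information.  A self-contained sketch using base R's qr():

```r
# Coefficient matrix of all five relations read off the table
A5 <- matrix(c(
  1,1,1,0,0,0,   # North row total
  0,0,0,1,1,1,   # South row total
  1,0,0,1,0,0,   # Basic column total
  0,1,0,0,1,0,   # Standard column total
  0,0,1,0,0,1),  # Luxury column total
  byrow=TRUE, nrow=5, ncol=6)

qr(A5)$rank  # 4: only four of the five equations are independent
```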


This system, which we express symbolically as $A\vec{x} = \vec{f}$, is an underdetermined linear system since it consists of 6 unknowns but only 4 equations.  A key fact to recognize is that an underdetermined linear system that has a solution will always have infinitely many.

We will investigate three issues around underdetermined systems in this post:
1. When does a system have a solution?
2. If an underdetermined system has solutions, how can we generate them?
3. Once we have a method for generating solutions, can we develop any criteria for preferring one particular solution among the infinitely many possible solutions?

# Consistency and Stability

The first issue is easy to handle.  A system of linear equations is called consistent if it has one or more solutions.  The system $A\vec{x} = \vec{f}$ has a solution whenever $\vec{f}$ is in the column space of $A$, i.e. whenever $\vec{f}$ is a linear combination of the columns of $A$.  This leads us to the notion of matrix rank: the rank of $A$ is the number of linearly independent columns in $A$.  In these terms, we have the following consistency test: a system is consistent if and only if the rank of $A$ equals the rank of $[A|\vec{f}]$, the matrix obtained by appending $\vec{f}$ as an extra column of $A$.

Here's one way to implement the consistency test in R:
 > require(Matrix)
> rankMatrix(A)
[1] 4
attr(,"method")
[1] "tolNorm2"
attr(,"useGrad")
[1] FALSE
attr(,"tol")
[1] 2.76354e-15

> rankMatrix(cbind(A,f))
[1] 4
attr(,"method")
[1] "tolNorm2"
attr(,"useGrad")
[1] FALSE
attr(,"tol")
[1] 1.377126e-12


We can see that $A$ and $[A|\vec{f}]$ have the same rank, so $A\vec{x} = \vec{f}$ has solutions.  Note that accurate computation of the rank of $A$ (as well as the QR decomposition of $A$ that we perform below) requires that the matrix $A$ be numerically well-behaved.  One way to quantify numerical good behavior is the condition number of $A$.  For a rectangular matrix $A$, the condition number is

$$\kappa(A) = \frac{\sigma_{\max}(A)}{\sigma_\min (A)}$$
where $\sigma_\max,\sigma_\min$ are the maximum and minimum non-zero singular values of $A$.

 > kappa(A)
[1] 2.987313


The condition number of $A$ roughly describes how sensitive the output of the transformation $A$ is to small changes in the input.  We should be wary of doing any computations with $A$ if $\kappa(A)$ is very large.  (By default, kappa() returns a fast estimate rather than the exact ratio of singular values; pass exact=TRUE for the exact value.)  Fortunately, our condition number here is quite reasonable.  Note that a very large condition number may indicate that $A$ contains redundant or nearly redundant rows.
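
To make this concrete, here is a tiny sketch (unrelated to the revenue data) showing how a nearly duplicated row inflates the condition number:

```r
# Two nearly parallel rows: the matrix is almost rank-deficient
M <- rbind(c(1, 1),
           c(1, 1 + 1e-10))

kappa(M)  # enormous (on the order of 1e10), signalling numerical trouble
```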

# The Space of Solutions

For real-valued $A$, $\vec{f}$, every solution to the system
$$A \vec{x} = \vec{f}$$
can be written in the form
$$\vec{x} = \vec{x}_0 + \sum_{i=1}^{n_0} w_i \vec{q}_i$$
where

- $\vec{x}_0$ is any particular solution to $A\vec{x} = \vec{f}$
- $\vec{q}_1,\ldots,\vec{q}_{n_0}$ is a basis for the nullspace of $A$
- $w_1,\ldots,w_{n_0}$ are real-valued coefficients
- $n_0$ is the dimension of the nullspace of $A$.  If $A$ is $m \times n$ with rank $m$, the rank-nullity theorem tells us that $n_0 = n - m$.

We will write this representation in the shorthand
$$\vec{x} = \vec{x}_0 + Q \vec{w}$$
where $Q$ is an $n \times (n-m)$ matrix whose columns are a basis for the nullspace of $A$ and $\vec{w} \in \mathbb{R}^{n-m}$.  We note that in general a matrix has infinitely many nullspace bases, so this representation is not unique.

We now want to describe procedures in R that will allow us to numerically determine (1) a particular solution $\vec{x}_0$ to $A\vec{x} = \vec{f}$ and (2) a convenient choice of the matrix $Q$.

To obtain a particular solution of $A\vec{x} = \vec{f}$, we can make use of the pseudoinverse of $A$.  Assuming the $m \times n$ matrix $A$ has full rank with $m < n$, the matrix $AA^T$ is an $m \times m$ matrix of full rank, that is, $AA^T$ is invertible.  It is not hard to prove the identity
$$A[A^T (AA^T)^{-1}] = I_m$$
where $I_m$ is the $m \times m$ identity matrix.  Multiplying both sides of this equation on the right by $\vec{f}$, we obtain
$$A[A^T (AA^T)^{-1} \vec{f}] = \vec{f}.$$
Thus a particular solution of $A \vec{x} = \vec{f}$ is
$$\vec{x}_0 = A^T (AA^T)^{-1} \vec{f}.$$
Though there are infinitely many particular solutions we could use, $\vec{x}_0$ turns out to be the solution with the smallest Euclidean norm.  Here is one way to compute $\vec{x}_0$ in R:

 x0 <- t(A) %*% solve(A %*% t(A), f)  # solve (A A^T) y = f, then set x0 = A^T y


For the example problem:
 > x0
          [,1]
[1,] 191.66667
[2,]  91.66667
[3,] 116.66667
[4,] 258.33333
[5,] 158.33333
[6,] 183.33333
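
To see why $\vec{x}_0$ is the minimum-norm solution, note that any other solution has the form $\vec{x} = \vec{x}_0 + \vec{q}$ with $A\vec{q} = 0$.  Since $\vec{x}_0 = A^T \vec{y}$ (with $\vec{y} = (AA^T)^{-1}\vec{f}$) lies in the row space of $A$, we have $\vec{x}_0^T \vec{q} = \vec{y}^T A \vec{q} = 0$, and therefore
$$\|\vec{x}\|^2 = \|\vec{x}_0\|^2 + 2\,\vec{x}_0^T \vec{q} + \|\vec{q}\|^2 = \|\vec{x}_0\|^2 + \|\vec{q}\|^2 \geq \|\vec{x}_0\|^2.$$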


Next, we need to obtain a basis for the nullspace of $A$.  One approach for this is to apply a QR factorization to $A^T$.  This will decompose $A^T$ as
$$A^T = \left[ \begin{matrix} Q_1 & Q_2 \end{matrix} \right] \left[ \begin{matrix} R_1 \\ R_2 \end{matrix} \right]$$
where $Q_1$ is $n \times m$, $Q_2$ is $n \times (n-m)$, and $R_2$ is an $(n-m) \times m$ block of zeros.  Since $A^T = Q_1 R_1$, the columns of $Q_2$ are orthogonal to the columns of $A^T$, and so the columns of $Q_2$ form a basis for the nullspace of $A$.

Here is an R implementation:
 QR <- qr(t(A))
Q <- qr.Q(QR, complete=TRUE)[,-(1:QR$rank)]

> Q
           [,1]       [,2]
[1,]  0.2654822  0.5126915
[2,] -0.5767449 -0.0264314
[3,]  0.3112627 -0.4862601
[4,] -0.2654822 -0.5126915
[5,]  0.5767449  0.0264314
[6,] -0.3112627  0.4862601


We now have our solution representation $\vec{x} = \vec{x}_0 + Q \vec{w}$.  To see that this really is a solution for any choice of $\vec{w}$, note that
$$A Q = 0_{m \times (n-m)}$$
and compute
$$A \vec{x} = A \vec{x}_0 + A Q \vec{w} = \vec{f} + 0_{m \times (n-m)} \vec{w} = \vec{f}.$$

# Criteria for Selecting a Solution

Now we consider whether there are criteria for preferring one solution of $A\vec{x} = \vec{f}$ over another.  One obvious requirement is that a solution should have all non-negative components, $\vec{x} \geq 0$.  How can we ensure that $\vec{x} = \vec{x}_0 + Q \vec{w}$ is non-negative?  Satisfying this condition requires choosing $\vec{w}$ carefully.  In the context of our problem, the set of $\vec{w} = (w_1,w_2)$ satisfying $\vec{x}_0 + Q \vec{w} \geq 0$ is a convex polygon in the $(w_1,w_2)$ plane.  [More generally, the set of $\vec{w} \in \mathbb{R}^{n-m}$ such that $\vec{x}_0 + Q \vec{w} \geq 0$ is a convex polytope in $\mathbb{R}^{n-m}$.]  We could imagine a scheme that generates non-negative solutions by finding vectors lying within this convex polygon.  This is a rather interesting idea but is very difficult to implement for even moderately sized problems: just finding all the vertices of a polytope defined by a matrix inequality $A\vec{x} \leq \vec{b}$ is hard!  We'll leave this line of thinking for another post.
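
Putting the pieces together, we can verify numerically that $\vec{x}_0 + Q\vec{w}$ solves the system for an arbitrary $\vec{w}$.  This sketch restates $A$ and $\vec{f}$ so that it runs on its own:

```r
A <- matrix(c(1,1,1,0,0,0,
              1,0,0,1,0,0,
              0,1,0,0,1,0,
              0,0,1,0,0,1), byrow=TRUE, nrow=4, ncol=6)
f <- c(400, 450, 250, 300)

x0 <- t(A) %*% solve(A %*% t(A), f)            # minimum-norm particular solution
Q  <- qr.Q(qr(t(A)), complete=TRUE)[, -(1:4)]  # nullspace basis from the QR step
w  <- c(37, -12)                               # any coefficients will do
x  <- x0 + Q %*% w

max(abs(A %*% x - f))  # essentially zero: x is still a feasible solution
```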
Another approach is to formulate the problem as a linear program:
$$\left\{ \begin{array}{l} \mathrm{minimize:} \qquad \vec{c} \cdot \vec{x} \\ \mathrm{subject \; to:} \quad A\vec{x}= \vec{f}, \qquad \vec{x} \geq 0 \end{array} \right.$$
For this particular application, the vector $\vec{c}$ can be chosen to be anything we like, since the goal is simply to find a solution satisfying the constraints $A \vec{x} = \vec{f}$ and $\vec{x} \geq 0$.  By varying $\vec{c}$, we'll get different solutions to the system.  Here is an implementation using the R package linprog:

 > require(linprog)
> cvec <- rep(1,6)
> solveLP(cvec, f, A, const.dir=rep("=",4), lpSolve=TRUE)

Results of Linear Programming / Linear Optimization
(using lpSolve)

Objective function (Minimum): 1000

Solution
  opt
1   0
2 100
3 300
4 450
5 150
6   0

Constraints
  actual dir bvec free
1    400   =  400    0
2    450   =  450    0
3    250   =  250    0
4    300   =  300    0


A drawback of the linear programming approach is that (in the absence of further information) it tends to choose the sparsest possible solutions; that is, it "forces" zeros into the solution vector.  The reason is fairly clear from the mechanics of the simplex algorithm used to solve linear programs: it always returns a vertex of the feasible polytope.  We see that for the example problem, linear programming gives us the solution with x1=x6=0.  Varying $\vec{c}$ gives different, but still sparse, solutions to the system.  For an algorithm that does not give preference to the sparsest solutions of $A\vec{x}=\vec{f}$, we can turn to nonlinear optimization.  Here is one simple approach.
We observe that for the example problem $\sum_{i=1}^6 x_i = 1000$, and so
$$\mu = \frac{1}{6} \sum_{i=1}^6 x_i = \frac{1000}{6} \approx 166.667.$$
We then define the objective function
$$F( \vec{x} ) = \sum_{i=1}^6 (x_i - \mu)^2.$$
We seek the vector $\vec{x} \in \mathbb{R}^6$ that minimizes $F(\vec{x})$ and satisfies
$$\vec{x} \geq 0, \qquad A \vec{x} = \vec{f}.$$
In words, we minimize the squared deviation between the solution components and their mean, subject to the system constraints.  This is a simple example of a quadratic program.  There are many approaches available for this species of optimization problem, and R has several dedicated libraries.  We will use the quadprog library here.  This library solves quadratic programming problems of the form
$$\left\{ \begin{array}{l} \mathrm{minimize:} \quad -\vec{d}^T \vec{x} + \frac{1}{2} \vec{x}^T D \vec{x} \\ \mathrm{subject \; to:} \quad A_1^T \vec{x} = \vec{f}_1, \qquad A_2^T \vec{x} \geq \vec{f}_2 \end{array} \right.$$
where $D$ is an $n \times n$ symmetric, positive definite matrix.  To cast our problem in this form, we write
$$F(\vec{x}) = (\vec{x} - \vec{\mu} )^T (\vec{x} - \vec{\mu}) = \vec{x}^T \vec{x} - 2 \vec{\mu}^T \vec{x} + \vec{\mu}^T \vec{\mu},$$
where $\vec{\mu} = \mu (1,\ldots, 1)^T$.  Since the term $\vec{\mu}^T \vec{\mu}$ is constant, minimizing $F(\vec{x})$ is equivalent to minimizing
$$\tilde{F}(\vec{x}) = \frac{1}{2} \vec{x}^T D \vec{x} - \vec{d}^T \vec{x},$$
where $\vec{d} = 2 \vec{\mu}$ and $D = 2 I$.  Then we set $A_1 = A^T$, $A_2 = I$, $\vec{f}_1 = \vec{f}$, and $\vec{f}_2 = \vec{0}$.  Here is the R implementation:

 require(quadprog)
Dmat <- 2*diag(6)
dvec <- 2*(1000/6)*rep(1,6)
A1 <- A
A2 <- diag(6)
Amat <- t(rbind(A1,A2))
f1 <- f
f2 <- rep(0,6)
bvec <- c(f1,f2)


Now, we call the solver (the meq=4 argument tells quadprog that the first four constraints in our system are equality constraints and the rest are inequality constraints):

 > solve.QP(Dmat, dvec, Amat, bvec, meq=4)
$solution
[1] 191.66667  91.66667 116.66667 258.33333 158.33333 183.33333

$value
[1] -149166.7

$unconstrained.solution
[1] 166.6667 166.6667 166.6667 166.6667 166.6667 166.6667
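
Notice that the quadprog solution coincides with the minimum-norm solution $\vec{x}_0$ found earlier.  This is no accident: every nullspace vector of our $A$ has components that sum to zero, so the all-ones vector is orthogonal to the nullspace, and centering the objective at the mean does not move the constrained minimizer.  A quick base-R check of the orthogonality claim:

```r
A <- matrix(c(1,1,1,0,0,0,
              1,0,0,1,0,0,
              0,1,0,0,1,0,
              0,0,1,0,0,1), byrow=TRUE, nrow=4, ncol=6)
Q <- qr.Q(qr(t(A)), complete=TRUE)[, -(1:4)]  # nullspace basis

max(abs(t(Q) %*% rep(1, 6)))  # essentially zero: ones vector is orthogonal to null(A)
```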


Clearly, we can generate many different solutions to the underdetermined system by modifying the objective function $F(\vec{x})$.  For example, replacing $\vec{\mu}$ with $\vec{0}$, we obtain the smallest non-negative solution in the 2-norm that satisfies $A\vec{x} = \vec{f}$.  This agrees with $\vec{x}_0$ whenever it happens that $\vec{x}_0 \geq 0$.

The functional form of $F(\vec{x})$ lets us express a preference among feasible solutions.  If we have additional expectations or beliefs about the values of x1,...,x6, these can be built into the objective function.  In the simple example above, the extra expectation is that the individual solution components shouldn't differ much from their mean.  Much more complex ideas can be built into the objective function to find feasible solutions that are also consistent with our expectations about x1,x2,...,x6.
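
For instance, if prior intelligence suggested approximate values for the $x_i$, we could minimize the squared distance to that prior instead of to the mean.  When the non-negativity constraints turn out not to be binding, this version even has a closed form, $\vec{x} = \vec{p} + A^T(AA^T)^{-1}(\vec{f} - A\vec{p})$, which projects the prior $\vec{p}$ onto the solution set.  A base-R sketch with a made-up prior (the values of p below are purely illustrative):

```r
A <- matrix(c(1,1,1,0,0,0,
              1,0,0,1,0,0,
              0,1,0,0,1,0,
              0,0,1,0,0,1), byrow=TRUE, nrow=4, ncol=6)
f <- c(400, 450, 250, 300)
p <- c(100, 100, 100, 300, 200, 200)  # hypothetical prior beliefs about x1..x6

# Projection of the prior onto the affine solution set {x : Ax = f}
x_prior <- p + t(A) %*% solve(A %*% t(A), f - A %*% p)

max(abs(A %*% x_prior - f))  # essentially zero: the revenue totals still hold
```

Here the projection happens to land in the non-negative orthant, so it is also the answer the quadratic program would give.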