Chapter 1. Systems, vectors and matrices
Gauss-Jordan method. Rouché-Frobenius theorem
Consider a system of $m$ linear equations and $n$ unknowns,
$$\left\{ \begin{array}{ccccccccc} a_{11} x_1 &+& a_{12} x_2 &+& \cdots &+& a_{1n} x_n &=& b_1 \\ a_{21} x_1 &+& a_{22} x_2 &+& \cdots &+& a_{2n} x_n &=& b_2 \\ \vdots & & \vdots & & & & \vdots & & \vdots \\ a_{m1} x_1 &+& a_{m2} x_2 &+& \cdots &+& a_{mn} x_n &=& b_m \end{array} \right.$$
There are two matrices associated with a system like the previous one:
$$\begin{array}{cc} \left(\begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{array}\right) & \left(\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{array}\right) \\ \text{Coefficient matrix} & \text{Augmented matrix} \end{array}$$
- The coefficient matrix contains only the coefficients of the system.
- The augmented matrix contains the coefficients and the independent terms (constants on the other side of the equation).
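For instance, with NumPy (the array values below are made up just for illustration), the augmented matrix can be built by appending the column of independent terms to the coefficient matrix:
import numpy as np

# Coefficient matrix and independent terms of a small example system
coeff = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
b = np.array([5.0, 6.0])

# Augmented matrix: the coefficients plus the column of independent terms
augmented = np.column_stack([coeff, b])
print(augmented)
# [[1. 2. 5.]
#  [3. 4. 6.]]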
Consider again the augmented matrix of the previous system,
$$\left(\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{array}\right)$$
The Gauss-Jordan reduction method (GJ) consists of reducing the augmented matrix of the system to its reduced row echelon form using elementary row operations (operations among the system's equations). From this last matrix, we can directly obtain the solution to the system, if it exists.
The reduced row echelon matrix is a matrix with the following characteristics:
- Each step has height one (each step occupies a single row).
- All entries below the steps are zero.
- In the corner of each step there is a 1 (the leading entry of each nonzero row is 1).
- Every column containing a step corner has all its other entries equal to zero.
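For example, the following matrix is in reduced row echelon form: its steps sit in the first, second and fourth columns, every step corner is a 1, and each of those columns is zero everywhere else:
$$\left(\begin{array}{ccccc} 1 & 0 & 2 & 0 & 3 \\ 0 & 1 & -1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right)$$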
In general, systems can be consistent, if they admit at least one solution, or inconsistent, if they have no solution. Consistent systems are in turn called determined if they have a unique solution, or undetermined if they have infinitely many solutions.
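For example, the system formed by $x + y = 2$ and $x - y = 0$ is consistent and determined (its only solution is $x = y = 1$); the system formed by $x + y = 2$ and $2x + 2y = 4$ is consistent and undetermined (the second equation adds nothing new); and the system formed by $x + y = 2$ and $x + y = 3$ is inconsistent.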
In the study of the solutions of a system with $m$ equations and $n$ unknowns, the steps obtained in the coefficient and augmented matrices (when we reduce them) are crucial. Let $A$ be the coefficient matrix of the system and $\bar{A}$ the augmented matrix. When we reduce the system using GJ, both matrices end up in reduced row echelon form: let $p$ and $\bar{p}$ be the number of steps of the reduced coefficient and augmented matrices, respectively. Then, to determine the compatibility of the system, it is only necessary to compare the number of unknowns $n$ with the step counts $p$ and $\bar{p}$:
Rouché–Frobenius Theorem
Given a system of $m$ equations and $n$ unknowns, we have:
- If $n = p = \bar{p} \quad \Leftrightarrow \quad$ The system is consistent and determined.
- If $n > p = \bar{p} \quad \Leftrightarrow \quad$ The system is consistent and undetermined.
- If $p \neq \bar{p} \quad \Leftrightarrow \quad$ The system is inconsistent.
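The theorem translates directly into code. As a small illustration, here is a sketch of a classification helper (the function name `classify_system` is ours, not something defined elsewhere in the project):
def classify_system(n, p, p_bar):
    """Classify a linear system from its number of unknowns and the step counts p, p_bar."""
    if p != p_bar:
        return "inconsistent"
    if n == p:
        return "consistent and determined"
    return "consistent and undetermined"  # n > p == p_bar

print(classify_system(4, 3, 3))  # consistent and undetermined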
The set of elementary operations we perform on the matrix may vary, but the reduced row echelon matrix we arrive at is always the same (it is unique). This leads to multiple ways of reducing a system and, therefore, to multiple algorithms that reduce the matrix using different criteria, even though the same reduction method is followed.
The next example will discuss how to reduce a given matrix with the GJ method. We will go through the steps and explain the criteria chosen for the algorithm.
Working example
Let's start reducing the following matrix:
import numpy as np

matrix = np.array([[1, 2, 3, 4, 5],
                   [2, 4, 6, 8, 10],
                   [3, 8, 9, 12, 18],
                   [4, 11, 12, 17, 20]])
We have a $4 \times 5$ matrix from which we want to obtain its reduced row echelon form using Gauss–Jordan elimination (GJ). As previously mentioned, there are multiple ways to reduce a matrix, but the reduced row echelon matrix obtained must be the same in every case. Let's begin by looking at one of the possible ways to implement the GJ method.
First, take a copy of the original matrix on which we will perform elementary row operations:
# Copy of the matrix and shape
A = matrix.astype(float).copy()
m, n = matrix.shape
where we have cast the matrix to float so that the elementary row operations are not truncated to integer values.
The GJ method begins by taking the first row of the matrix: the first element of that row will be used to eliminate the elements below it. Let's call `k` the index of the row we are focusing on, and `l` the index of the column where the first element of row `k` is located. These two indices will define each step of our algorithm.
Step `k = l = 0`

Obviously, at this step we are taking the first row and column of the matrix, which can be schematically represented as follows:
l = 0
|
k = 0 ---> [[ 1 2 3 4 5]
[ 2 4 6 8 10]
[ 3 8 9 12 18]
[ 4 11 12 17 20]]
These two indices give us the row `row = A[k, l:]` and the column `col = A[k:, l]` of `A`, which will be useful for the operations we are going to perform:
# Prepare row and column of the matrix
k, l = 0, 0
col = A[k:, l]
row = A[k, l:]
Note. It is possible that the first row we take is not very convenient and we may want to pivot with another, more manageable one. In this case, the first row has a 1 in its first element, which is one of the most convenient situations: let’s leave the pivoting problem for a later step.
Now let's perform the elementary row operations. To do this, we need to use the first row to eliminate the coefficients along column `l`. We can do this as follows:
print("Full matrix:")
print(A)
# Indices of the rows to be eliminated
idx = np.arange(m)
idx = idx[idx != k] # All rows except the one we are focusing on
# Subarray and coefficients
block = A[idx, l:] # Subarray of the selected rows
coef = block[:, [0]] # First coefficients of those rows
print("Block to eliminate:")
print(block)
print("First coefficients:")
print(coef)
# Operation
A[idx, l:] = block - coef * row # Elementary row operations
print("After the operation:")
print(A)
which yields:
Full matrix:
[[ 1. 2. 3. 4. 5.]
[ 2. 4. 6. 8. 10.]
[ 3. 8. 9. 12. 18.]
[ 4. 11. 12. 17. 20.]]
Block to eliminate:
[[ 2. 4. 6. 8. 10.]
[ 3. 8. 9. 12. 18.]
[ 4. 11. 12. 17. 20.]]
First coefficients:
[[2.]
[3.]
[4.]]
After the operation:
[[1. 2. 3. 4. 5.]
[0. 0. 0. 0. 0.]
[0. 2. 0. 0. 3.]
[0. 3. 0. 1. 0.]]
The final print shows that we succeeded in eliminating the elements of the first column.
Step `k = l = 1`

We now move to the second row and column of our matrix (`k = l = 1`). Again, schematically we would have:
l = 1
|
[[1. 2. 3. 4. 5.]
k = 1 --> [0. 0. 0. 0. 0.]
[0. 2. 0. 0. 3.]
[0. 3. 0. 1. 0.]]
We take the corresponding row and column:
k, l = 1, 1
col = A[k:, l]
row = A[k, l:]
print("Row:", row)
print("Column:", col)
Row: [0. 0. 0. 0.]
Column: [0. 2. 3.]
The row we are currently at is not suitable for performing operations (it is all zeros), so it will be necessary to pivot rows. The criterion we will use is the following: among all the nonzero elements of the column `col = A[k:, l]`, we look for the row that contains the maximum in absolute value, and we pivot (swap) this row with row `k`.
In our case, the maximum value of the column `col` is 3, which is found in row 3 of the matrix, and that row should be pivoted with row `k = 1`. This can be done as follows:
# Indices of the nonzero elements
nonzero_idx = col.nonzero()[0]
# Index of the maximum absolute value in `col`
abs_col = np.abs(col)
max_nonzero_idx = nonzero_idx[np.argmax(abs_col[abs_col != 0])]
# Row of the matrix where that value is found
row_idx = max_nonzero_idx + k
print("Current row:")
print(row)
print("Row to pivot with:")
print(A[row_idx, l:])
# Pivot
A[[k, row_idx], l:] = A[[row_idx, k], l:]
print("Matrix after pivoting:")
print(A)
Current row:
[0. 0. 0. 0.]
Row to pivot with:
[3. 0. 1. 0.]
Matrix after pivoting:
[[1. 2. 3. 4. 5.]
[0. 3. 0. 1. 0.]
[0. 2. 0. 0. 3.]
[0. 0. 0. 0. 0.]]
Once we have pivoted the rows, it is convenient to turn the first element of row `k` into 1. This is commonly known as normalization of the first element:
# Normalization of the first element
# (`row` is a view of A, so it already holds the values of the pivoted row)
A[k, l:] = row / row[0]
Now we perform the elementary row operations between rows, exactly as in the previous step:
print("Full matrix:")
print(A)
# Indices of the rows to be eliminated
idx = np.arange(m)
idx = idx[idx != k] # All rows except the one we are focusing on
# Subarray and coefficients
block = A[idx, l:] # Subarray of the selected rows
coef = block[:, [0]] # First coefficients of those rows
print("Block to eliminate:")
print(block)
print("First coefficients:")
print(coef)
# Operation
A[idx, l:] = block - coef * row # Elementary row operations
print("After the operation:")
print(A)
Full matrix:
[[1. 2. 3. 4. 5. ]
[0. 1. 0. 0.33333333 0. ]
[0. 2. 0. 0. 3. ]
[0. 0. 0. 0. 0. ]]
Block to eliminate:
[[2. 3. 4. 5.]
[2. 0. 0. 3.]
[0. 0. 0. 0.]]
First coefficients:
[[2.]
[2.]
[0.]]
After the operation:
[[ 1. 0. 3. 3.33333333 5. ]
[ 0. 1. 0. 0.33333333 0. ]
[ 0. 0. 0. -0.66666667 3. ]
[ 0. 0. 0. 0. 0. ]]
Step `k = l = 2`

We move to row and column `k = l = 2`. Schematically, we have:
l = 2
|
[[ 1. 0. 3. 3.33333333 5. ]
[ 0. 1. 0. 0.33333333 0. ]
k = 2 [ 0. 0. 0. -0.66666667 3. ]
[ 0. 0. 0. 0. 0. ]]
In this case, the column `col = A[k:, l]` is already full of zeros, so no operation is needed. The next step is to move one column to the right, without moving from row `k`. Since this is supposed to be an iterative algorithm, at each step we should check whether our column is full of zeros:
k, l = 2, 2
col = A[k:, l]
row = A[k, l:]
# Nonzero elements
nonzero_idx = col.nonzero()[0]
# Check if the column is full of zeros
nonzero_idx.size == 0
True
In the loop we define, this will be checked at each step. If `nonzero_idx.size == 0` holds, we move one column to the right (`l += 1`) and return to the beginning of the loop; otherwise, we continue as before.
Step `k = 2, l = 3`

We are now at row `k = 2` and column `l = 3`:
l = 3
|
[[ 1. 0. 3. 3.33333333 5. ]
[ 0. 1. 0. 0.33333333 0. ]
k = 2 [ 0. 0. 0. -0.66666667 3. ]
[ 0. 0. 0. 0. 0. ]]
We take the corresponding row and column:
k, l = 2, 3
col = A[k:, l]
row = A[k, l:]
print("Column:", col)
print("Row:", row)
Column: [-0.66666667 0. ]
Row: [-0.66666667 3. ]
In this case, we do not need to pivot rows, since the maximum absolute value of the column is already in row `k`. Normalizing the row:
# Normalization of the first element
A[k, l:] = row / row[0]
Performing operations:
print("Full matrix:")
print(A)
# Indices of the rows to be eliminated
idx = np.arange(m)
idx = idx[idx != k] # All rows except the one we are focusing on
# Subarray and coefficients
block = A[idx, l:] # Subarray of the selected rows
coef = block[:, [0]] # First coefficients of those rows
print("Block to eliminate:")
print(block)
print("First coefficients:")
print(coef)
# Operation
A[idx, l:] = block - coef * row # Elementary row operations
print("After the operation:")
print(A)
Full matrix:
[[ 1. 0. 3. 3.33333333 5. ]
[ 0. 1. 0. 0.33333333 0. ]
[ 0. 0. 0. 1. -4.5 ]
[ 0. 0. 0. 0. 0. ]]
Block to eliminate:
[[3.33333333 5. ]
[0.33333333 0. ]
[0. 0. ]]
First coefficients:
[[3.33333333]
[0.33333333]
[0. ]]
After the operation:
[[ 1. 0. 3. 0. 20. ]
[ 0. 1. 0. 0. 1.5]
[ 0. 0. 0. 1. -4.5]
[ 0. 0. 0. 0. 0. ]]
The matrix is now in reduced row echelon form, so we are almost finished.
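As an optional sanity check (an aside, assuming SymPy is installed; it is not part of the method itself), we can compare our result with SymPy's `rref`, which works in exact rational arithmetic and also returns the pivot columns:
# Optional check against SymPy's reduced row echelon form
import sympy as sp

rref_matrix, pivot_cols = sp.Matrix(matrix.tolist()).rref()
print(rref_matrix)  # Matrix([[1, 0, 3, 0, 20], [0, 1, 0, 0, 3/2], [0, 0, 0, 1, -9/2], [0, 0, 0, 0, 0]])
print(pivot_cols)   # (0, 1, 3)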
Last step
Once we have the reduced matrix, we need to count the steps (pivots) of the coefficient and augmented matrices. We can do this by counting the rows of the reduced matrix that still contain nonzero elements:
# Number of steps (pivots) in the augmented matrix
aug_s = np.unique(A.nonzero()[0]).size
# Number of steps (pivots) in the coefficient matrix
coef_s = np.unique(A[:, :-1].nonzero()[0]).size
print(aug_s, coef_s)
3 3
Both step counts are equal ($p = \bar{p} = 3$), but smaller than the number of unknowns ($n = 4$): by the Rouché–Frobenius theorem, the system is consistent and undetermined.
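To close the example, here is one possible way to assemble all the previous pieces into a single reduction loop. This is only a sketch following the criteria described above (the helper name `gauss_jordan_rref` is ours, not necessarily how linalgkit implements it). Since the pivoting criterion is applied at every step, the intermediate matrices differ from the ones shown in the walkthrough, but the final reduced matrix is the same.
import numpy as np

def gauss_jordan_rref(matrix):
    """Reduce a matrix to its reduced row echelon form following the steps above."""
    A = matrix.astype(float).copy()
    m, n = A.shape
    k = l = 0
    while k < m and l < n:
        col = A[k:, l]
        nonzero_idx = col.nonzero()[0]
        if nonzero_idx.size == 0:
            # Column full of zeros: move one column to the right
            l += 1
            continue
        # Pivot: bring the row with the largest nonzero absolute value up to row k
        abs_col = np.abs(col)
        max_nonzero_idx = nonzero_idx[np.argmax(abs_col[abs_col != 0])]
        row_idx = max_nonzero_idx + k
        if row_idx != k:
            A[[k, row_idx], l:] = A[[row_idx, k], l:]
        # Normalization of the first element of row k
        A[k, l:] = A[k, l:] / A[k, l]
        # Eliminate the coefficients of column l in all the other rows
        idx = np.arange(m)
        idx = idx[idx != k]
        block = A[idx, l:]
        coef = block[:, [0]]
        A[idx, l:] = block - coef * A[k, l:]
        # Move to the next step
        k += 1
        l += 1
    return A

# Up to floating-point rounding, this reproduces the reduced matrix obtained above.
# A robust implementation would compare entries against a small tolerance instead of exact zeros.
print(gauss_jordan_rref(matrix))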