matrix_derivatives - rasigadelab/mathwiki GitHub Wiki
Matrix Derivatives of $C = U L U^T$ with Respect to $U$ and $l$
Let $U$ be a square matrix of size $n$, and $L$ be a diagonal matrix with diagonal elements in the vector $l$. Define:
$$ C = U L U^T $$
This document provides the expressions for $C$ in index notation, as well as its first and second derivatives with respect to $U$ and $l$.
1. Expression for $C = U L U^T$ in Index Notation
Let:
- $U$ be an $n \times n$ matrix with elements $U_{ij}$.
- $L$ be a diagonal $n \times n$ matrix with diagonal elements given by the vector $\vec{l}$, so $L_{ij} = l_i \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta.
- $C = U L U^T$.
In index notation, the element $C_{ij}$ of the matrix $C$ can be written as:
$$ C_{ij} = \sum_{k} U_{ik} \, l_k \, U_{jk} $$
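As a quick sanity check, the index formula can be compared against the explicit matrix product. The following is a minimal sketch in plain Python; the helper names `c_matrix` and `matmul` and the $2 \times 2$ test values are illustrative assumptions, not part of the derivation.

```python
def c_matrix(U, l):
    """Build C from the index formula C_ij = sum_k U_ik * l_k * U_jk."""
    n = len(l)
    return [[sum(U[i][k] * l[k] * U[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


# Illustrative values (assumed for this example only).
U = [[1, 2], [3, 4]]
l = [5, 6]

# L_ij = l_i * delta_ij, and the transpose of U.
L = [[l[i] if i == j else 0 for j in range(2)] for i in range(2)]
Ut = [list(col) for col in zip(*U)]

C = c_matrix(U, l)
C_ref = matmul(matmul(U, L), Ut)
print(C)           # [[29, 63], [63, 141]]
print(C == C_ref)  # True
```

The result is symmetric, as expected for any product of the form $U L U^T$ with diagonal $L$.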
2. First Derivative of $C$ with Respect to $U$ and $\vec{l}$
Derivative of $C$ with Respect to $U$
To find $\frac{\partial C_{ij}}{\partial U_{pq}}$, we differentiate each term in $C_{ij}$ with respect to $U_{pq}$.
$$ \frac{\partial C_{ij}}{\partial U_{pq}} = \sum_k \left( \frac{\partial U_{ik}}{\partial U_{pq}} \, l_k \, U_{jk} + U_{ik} \, l_k \, \frac{\partial U_{jk}}{\partial U_{pq}} \right) $$
Using the Kronecker delta to evaluate these partial derivatives, we have:
$$ \frac{\partial U_{ik}}{\partial U_{pq}} = \delta_{ip} \delta_{kq} \quad \text{and} \quad \frac{\partial U_{jk}}{\partial U_{pq}} = \delta_{jp} \delta_{kq} $$
Substituting these into the expression, we get:
$$ \frac{\partial C_{ij}}{\partial U_{pq}} = l_q \, U_{jq} \, \delta_{ip} + l_q \, U_{iq} \, \delta_{jp} $$
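The closed form for $\partial C_{ij} / \partial U_{pq}$ can be checked numerically. The sketch below compares it with a central finite difference, using exact rational arithmetic so that the comparison is free of rounding error (a central difference is exact for a function that is quadratic in the perturbed variable, as $C_{ij}$ is in $U$). The function names, step size, and test values are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product


def c_matrix(U, l):
    """C_ij = sum_k U_ik * l_k * U_jk."""
    n = len(l)
    return [[sum(U[i][k] * l[k] * U[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]


def delta(a, b):
    """Kronecker delta."""
    return 1 if a == b else 0


def dC_dU(U, l, i, j, p, q):
    """Closed form: l_q U_jq delta_ip + l_q U_iq delta_jp."""
    return l[q] * U[j][q] * delta(i, p) + l[q] * U[i][q] * delta(j, p)


def dC_dU_fd(U, l, i, j, p, q, h=Fraction(1, 2)):
    """Central difference in U_pq; exact here because C_ij is quadratic in U."""
    Up = [row[:] for row in U]
    Um = [row[:] for row in U]
    Up[p][q] += h
    Um[p][q] -= h
    return (c_matrix(Up, l)[i][j] - c_matrix(Um, l)[i][j]) / (2 * h)


# Illustrative values (assumed for this example only).
U = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
l = [Fraction(5), Fraction(6)]
n = len(l)

max_err = max(abs(dC_dU(U, l, i, j, p, q) - dC_dU_fd(U, l, i, j, p, q))
              for i, j, p, q in product(range(n), repeat=4))
print(max_err)  # 0
```

For instance, with $i = j = p = q$ the formula reduces to $2 \, l_q \, U_{qq}$, which matches differentiating the term $l_q U_{qq}^2$ in $C_{qq}$ directly.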
Derivative of $C$ with Respect to $\vec{l}$
Now, we compute the derivative of $C_{ij}$ with respect to an element $l_m$ of $\vec{l}$.
Since $C_{ij}$ is linear in $\vec{l}$ and $\partial l_k / \partial l_m = \delta_{km}$, differentiating with respect to $l_m$ selects only the term where $k = m$:
$$ \frac{\partial C_{ij}}{\partial l_m} = U_{im} \, U_{jm} $$
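Because $C_{ij}$ is linear in $\vec{l}$, a forward difference with unit step reproduces $\partial C_{ij} / \partial l_m$ exactly. The following sketch uses that to check the formula; the helper names and integer test values are illustrative assumptions.

```python
from itertools import product


def c_matrix(U, l):
    """C_ij = sum_k U_ik * l_k * U_jk."""
    n = len(l)
    return [[sum(U[i][k] * l[k] * U[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]


def dC_dl(U, i, j, m):
    """Closed form: U_im * U_jm (independent of l)."""
    return U[i][m] * U[j][m]


def dC_dl_fd(U, l, i, j, m):
    """Forward difference in l_m with unit step; exact since C_ij is linear in l."""
    lp = l[:]
    lp[m] += 1
    return c_matrix(U, lp)[i][j] - c_matrix(U, l)[i][j]


# Illustrative values (assumed for this example only).
U = [[1, 2], [3, 4]]
l = [5, 6]
n = len(l)

all_match = all(dC_dl(U, i, j, m) == dC_dl_fd(U, l, i, j, m)
                for i, j, m in product(range(n), repeat=3))
print(all_match)          # True
print(dC_dl(U, 0, 1, 1))  # 8, i.e. U_01 * U_11 with zero-based indices
```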
3. Second Derivatives of $C$
Second Derivative with Respect to $U$ (Mixed Terms)
For the second derivative of $C_{ij}$ with respect to two elements of $U$ (not necessarily distinct), say $U_{pq}$ and $U_{rs}$, we differentiate $\frac{\partial C_{ij}}{\partial U_{pq}}$ again with respect to $U_{rs}$.
From our earlier result:
$$ \frac{\partial C_{ij}}{\partial U_{pq}} = l_q \, U_{jq} \, \delta_{ip} + l_q \, U_{iq} \, \delta_{jp} $$
Differentiating this with respect to $U_{rs}$:
$$ \frac{\partial^2 C_{ij}}{\partial U_{pq} \, \partial U_{rs}} = \frac{\partial}{\partial U_{rs}} \left( l_q \, U_{jq} \, \delta_{ip} + l_q \, U_{iq} \, \delta_{jp} \right) $$
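Using $\frac{\partial U_{jq}}{\partial U_{rs}} = \delta_{jr} \delta_{qs}$ and $\frac{\partial U_{iq}}{\partial U_{rs}} = \delta_{ir} \delta_{qs}$, we find:
$$ \frac{\partial^2 C_{ij}}{\partial U_{pq} \, \partial U_{rs}} = l_q \, \delta_{ip} \, \delta_{jr} \, \delta_{qs} + l_q \, \delta_{jp} \, \delta_{ir} \, \delta_{qs} $$
The result is nonzero only when $q = s$, that is, when both elements lie in the same column of $U$, and it is symmetric under exchanging $(p, q)$ with $(r, s)$.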
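Differentiating the first derivative once more with respect to $U_{rs}$ gives the closed form $l_q \, \delta_{qs} \left( \delta_{ip} \delta_{jr} + \delta_{ir} \delta_{jp} \right)$. The sketch below checks this expression against a mixed central finite difference in exact rational arithmetic, which is exact for a function that is quadratic in $U$. The function names, step size, and test values are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def c_matrix(U, l):
    """C_ij = sum_k U_ik * l_k * U_jk."""
    n = len(l)
    return [[sum(U[i][k] * l[k] * U[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]


def delta(a, b):
    """Kronecker delta."""
    return 1 if a == b else 0


def d2C_dUdU(l, i, j, p, q, r, s):
    """Closed form: l_q delta_qs (delta_ip delta_jr + delta_ir delta_jp)."""
    return l[q] * delta(q, s) * (delta(i, p) * delta(j, r)
                                 + delta(i, r) * delta(j, p))


def d2C_dUdU_fd(U, l, i, j, p, q, r, s, h=Fraction(1, 2)):
    """Mixed central difference in U_pq and U_rs (exact for a quadratic)."""
    def c_shifted(a, b):
        V = [row[:] for row in U]
        V[p][q] += a  # shifts accumulate when (p, q) == (r, s)
        V[r][s] += b
        return c_matrix(V, l)[i][j]
    return (c_shifted(h, h) - c_shifted(h, -h)
            - c_shifted(-h, h) + c_shifted(-h, -h)) / (4 * h * h)


# Illustrative values (assumed for this example only).
U = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
l = [Fraction(5), Fraction(6)]
n = len(l)

max_err = max(abs(d2C_dUdU(l, *idx) - d2C_dUdU_fd(U, l, *idx))
              for idx in product(range(n), repeat=6))
print(max_err)  # 0
```

Note that the second derivative does not depend on $U$ at all, which is consistent with $C$ being quadratic in $U$.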
Second Derivative with Respect to $l$
The second derivative with respect to the components of $l$ vanishes identically. Since
$$ \frac{\partial C_{ij}}{\partial l_m} = U_{im} \, U_{jm} $$
does not depend on $l$, differentiating again with respect to any component $l_{m'}$ (whether or not $m' = m$) gives:
$$ \frac{\partial^2 C_{ij}}{\partial l_m \, \partial l_{m'}} = 0 \quad \text{for all} \quad m, m' $$
Mixed Second Derivative with Respect to $U$ and $l$
For the mixed second derivative with respect to $U_{pq}$ and $l_m$:
$$ \frac{\partial^2 C_{ij}}{\partial U_{pq} \, \partial l_m} = \frac{\partial}{\partial U_{pq}} \left( U_{im} \, U_{jm} \right) $$
Differentiating with respect to $U_{pq}$:
$$ \frac{\partial^2 C_{ij}}{\partial U_{pq} \, \partial l_m} = \delta_{ip} \, \delta_{mq} \, U_{jm} + U_{im} \, \delta_{jp} \, \delta_{mq} $$
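Applying the product rule to $U_{im} U_{jm}$ gives the closed form $\delta_{mq} \left( \delta_{ip} U_{jq} + \delta_{jp} U_{iq} \right)$ for the mixed derivative. The sketch below checks it against finite differences of $C$ itself: a unit forward difference in $l_m$ (exact, since $C$ is linear in $l$) nested inside a central difference in $U_{pq}$ (exact, since $C$ is quadratic in $U$). Function names, step size, and test values are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def c_matrix(U, l):
    """C_ij = sum_k U_ik * l_k * U_jk."""
    n = len(l)
    return [[sum(U[i][k] * l[k] * U[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]


def delta(a, b):
    """Kronecker delta."""
    return 1 if a == b else 0


def d2C_dUdl(U, i, j, p, q, m):
    """Closed form: delta_mq (delta_ip U_jq + delta_jp U_iq)."""
    return delta(m, q) * (delta(i, p) * U[j][q] + delta(j, p) * U[i][q])


def d2C_dUdl_fd(U, l, i, j, p, q, m, h=Fraction(1, 2)):
    """Forward difference in l_m nested inside a central difference in U_pq."""
    def dC_dl_m(V):
        lp = l[:]
        lp[m] += 1
        return c_matrix(V, lp)[i][j] - c_matrix(V, l)[i][j]
    Up = [row[:] for row in U]
    Um = [row[:] for row in U]
    Up[p][q] += h
    Um[p][q] -= h
    return (dC_dl_m(Up) - dC_dl_m(Um)) / (2 * h)


# Illustrative values (assumed for this example only).
U = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
l = [Fraction(5), Fraction(6)]
n = len(l)

max_err = max(abs(d2C_dUdl(U, *idx) - d2C_dUdl_fd(U, l, *idx))
              for idx in product(range(n), repeat=5))
print(max_err)  # 0
```

The mixed derivative vanishes unless $m = q$: changing $l_m$ only affects how column $m$ of $U$ enters $C$.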
This concludes the expressions for the first and second derivatives of $C = U L U^T$ with respect to $U$ and $\vec{l}$ in index notation.