# Matrix Differentiation Rules

## Derivative of a Scalar

Let $y$ be a scalar and $x^T = [x_1, \ldots, x_q]$ a $q$-dimensional row vector. Then:

$\frac{\partial y}{\partial x^T} = \left[\frac{\partial y}{\partial x_1}, \ldots, \frac{\partial y}{\partial x_q}\right]$

## Derivative of a Vector

Let $y^T=[y_1, \ldots, y_n]$ be an $n$-dimensional row vector and $x=[x_1, \ldots, x_p]^T$ a $p$-dimensional column vector. Then:

$$\begin{aligned} \frac{\partial y^{T}}{\partial x} &=\left[\frac{\partial y_{1}}{\partial x} \; \cdots \; \frac{\partial y_{n}}{\partial x}\right] \\ &=\left[\begin{array}{ccc} \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{n}}{\partial x_{1}} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_{1}}{\partial x_{p}} & \cdots & \frac{\partial y_{n}}{\partial x_{p}} \end{array}\right] \end{aligned}$$
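The layout above puts $\partial y_j / \partial x_i$ in row $i$, column $j$, so the result is a $p \times n$ matrix. A minimal numerical sketch of that layout follows; the map `y` and the helper `dyT_dx` are hypothetical names chosen for illustration, and the derivative is estimated with central differences rather than computed symbolically.

```python
import numpy as np

def y(x):
    # Hypothetical example map from R^2 (p = 2) to R^3 (n = 3).
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def dyT_dx(f, x, eps=1e-6):
    """Central-difference estimate of d y^T / d x, shape (p, n)."""
    p = x.size
    rows = []
    for i in range(p):
        step = np.zeros(p)
        step[i] = eps
        # Row i holds the derivatives of every y_j with respect to x_i.
        rows.append((f(x + step) - f(x - step)) / (2 * eps))
    return np.stack(rows)

x0 = np.array([0.5, -1.5])
numeric = dyT_dx(y, x0)

# Hand-derived entries in the same (p, n) layout.
analytic = np.array([
    [x0[1], np.cos(x0[0]), 0.0],   # derivatives with respect to x_1
    [x0[0], 0.0, 2 * x0[1]],       # derivatives with respect to x_2
])

print(numeric.shape)  # (2, 3)
print(np.allclose(numeric, analytic, atol=1e-6))
```

Note that this is the transpose of the usual $n \times p$ Jacobian, which corresponds to $\partial y / \partial x^T$.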

## Derivative of a Matrix

$Y=\left(\begin{array}{lll} y_{11} & \ldots & y_{1 n} \\ \ldots & \ldots & \ldots \\ y_{m 1} & \ldots & y_{m n} \end{array}\right)$

Here $Y$ is an $m\times n$ matrix and $x=[x_1, \ldots, x_p]^T$ is a $p$-dimensional column vector. Then:

$\frac{\partial Y}{\partial x} = \left[\frac{\partial Y}{\partial x_1}, \ldots, \frac{\partial Y}{\partial x_p}\right]$

# Matrix Calculus

## Common Differentiation Properties

### Gradient of a Real-Valued Function with Respect to a Real Vector

$f(x) = x = [x_1,...,x_n]^T$

$\frac{\partial f (x)}{\partial x^T} = \frac{\partial x}{\partial x^T} = I_{n\times n}$

$\frac{\partial (f (x))^T}{\partial x} = \frac{\partial x^T}{\partial x} = I_{n\times n}$

$\frac{\partial f (x)}{\partial x} = \frac{\partial x}{\partial x} = vec(I_{n\times n})$

$\frac{\partial (f (x))^T}{\partial x^T} = \frac{\partial x^T}{\partial x^T} = vec(I_{n\times n})^T$

## Common Properties

• If $f(x) = Ax$, then

$\frac{\partial f (x)}{\partial x^T} = \frac{\partial (Ax)}{\partial x^T} =A$

• If $f(x) = x^TAx$, then

$\frac{\partial f (x)}{\partial x} = \frac{\partial (x^TAx)}{\partial x} =Ax+A^Tx$

• If $f(x) = a^Tx$, then

$\frac{\partial a^Tx}{\partial x} = \frac{\partial x^Ta}{\partial x} =a$

• If $f(x) = x^TAy$, then

$\frac{\partial x^TAy}{\partial x} = Ay$

$\frac{\partial x^TAy}{\partial A} = xy^T$

• $df(X) = tr((\frac{\partial f(X)}{\partial X})^T d X)$

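• Matrix differentials also obey the linearity rule and the product rule: $d(\alpha X + \beta Y) = \alpha\, dX + \beta\, dY$ for constants $\alpha, \beta$, and $d(XY) = (dX)Y + X\,dY$.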

• Differential of the matrix inverse:

$d(X^{-1}) = -X^{-1}(dX)X^{-1}$
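Several identities in this list can be spot-checked numerically. The sketch below compares central-difference gradients against the closed forms for $x^TAx$ and $x^TAy$, then checks the inverse differential to first order for a small perturbation. The helper `num_grad`, the random test data, the seed, and the choice of a symmetric positive definite $X$ (to keep it safely invertible) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
y = rng.standard_normal(n)

def num_grad(f, v, eps=1e-6):
    """Central-difference gradient of a scalar function f; same shape as v."""
    g = np.zeros_like(v)
    for idx in np.ndindex(v.shape):
        step = np.zeros_like(v)
        step[idx] = eps
        g[idx] = (f(v + step) - f(v - step)) / (2 * eps)
    return g

# d(x^T A x)/dx = A x + A^T x
quad_ok = np.allclose(num_grad(lambda v: v @ A @ v, x), A @ x + A.T @ x, atol=1e-6)

# d(x^T A y)/dx = A y  and  d(x^T A y)/dA = x y^T
bil_x_ok = np.allclose(num_grad(lambda v: v @ A @ y, x), A @ y, atol=1e-6)
bil_A_ok = np.allclose(num_grad(lambda M: x @ M @ y, A), np.outer(x, y), atol=1e-6)

# d(X^{-1}) = -X^{-1} (dX) X^{-1}, checked to first order for a small dX.
X = A @ A.T + n * np.eye(n)   # symmetric positive definite, hence invertible
dX = 1e-6 * rng.standard_normal((n, n))
X_inv = np.linalg.inv(X)
lhs = np.linalg.inv(X + dX) - X_inv   # actual change in the inverse
rhs = -X_inv @ dX @ X_inv             # change predicted by the differential
inv_ok = np.allclose(lhs, rhs, atol=1e-10)

print(quad_ok, bil_x_ok, bil_A_ok, inv_ok)
```

Each flag reports whether the numerical estimate agrees with the formula within the stated tolerance. The inverse check only holds up to second-order terms in `dX`, which is why the perturbation is kept small.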

## Trace Functions

### Gradient of a Trace Function with Respect to a Matrix

$\frac{\partial (tr (ZZ^T))}{\partial Z} = \frac{\partial (tr (Z^TZ))}{\partial Z} = 2Z$
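One way to see this result, sketched here with the product rule and the identity $df(X) = tr((\frac{\partial f(X)}{\partial X})^T dX)$ from the properties listed earlier, is to take the differential directly:

$$\begin{aligned} d\,tr(Z^TZ) &= tr((dZ)^TZ) + tr(Z^T dZ) \\ &= 2\,tr(Z^T dZ) \end{aligned}$$

The second line uses $tr(M^T) = tr(M)$. Matching this against $tr((\frac{\partial f}{\partial Z})^T dZ)$ gives $\frac{\partial f}{\partial Z} = 2Z$, and $tr(ZZ^T) = tr(Z^TZ)$ follows from the cyclic property of the trace.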

### Commutativity of the Matrix Differential and Trace Operators

$d(tr(X)) = tr(d(X)) = \sum\limits_{i=1}^{n} dx_{ii}$

### Common Properties

• $\frac{\partial tr(A)}{\partial A} = I_{n\times n}$

• $\frac{\partial tr(AB)}{\partial A} = B^T$

• $d(tr(AXB)) = tr(A(dX)B) = tr(BA(dX))$

$\frac{\partial tr(AXB)}{\partial X} = (BA)^T = A^TB^T$

• $d(tr(AX^{-1}B)) = tr(A(dX^{-1})B) = -tr(AX^{-1}(dX)X^{-1}B) = -tr(X^{-1}BAX^{-1}dX)$

$\frac{\partial tr(AX^{-1}B)}{\partial X} = -(X^{-1}BAX^{-1})^T = -X^{-T}A^TB^TX^{-T}$

• $\frac{\partial tr(X^TX)}{\partial X} = 2X$
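The trace gradients above can be checked against finite differences. The following is a minimal sketch, assuming square random matrices `A` and `B`, a symmetric positive definite `X` (so that $X^{-1}$ exists), and a hypothetical helper `num_grad` that perturbs every entry of `X` independently.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)   # symmetric positive definite, hence invertible

def num_grad(f, V, eps=1e-6):
    """Central-difference gradient of a scalar function f; same shape as V."""
    g = np.zeros_like(V)
    for idx in np.ndindex(V.shape):
        step = np.zeros_like(V)
        step[idx] = eps
        g[idx] = (f(V + step) - f(V - step)) / (2 * eps)
    return g

# d tr(A X B) / dX = A^T B^T
grad_axb = num_grad(lambda V: np.trace(A @ V @ B), X)
axb_ok = np.allclose(grad_axb, A.T @ B.T, atol=1e-6)

# d tr(A X^{-1} B) / dX = -X^{-T} A^T B^T X^{-T}
X_inv = np.linalg.inv(X)
grad_inv = num_grad(lambda V: np.trace(A @ np.linalg.inv(V) @ B), X)
inv_ok = np.allclose(grad_inv, -X_inv.T @ A.T @ B.T @ X_inv.T, atol=1e-6)

# d tr(X^T X) / dX = 2 X
grad_sq = num_grad(lambda V: np.trace(V.T @ V), X)
sq_ok = np.allclose(grad_sq, 2 * X, atol=1e-6)

print(axb_ok, inv_ok, sq_ok)
```

Although `X` is symmetric here, the helper perturbs each entry on its own, so the check corresponds to the unconstrained gradients stated in the list.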

## Determinants

### Gradient of a Determinant with Respect to a Matrix

$\frac{\partial |Z|}{\partial Z} = |Z|(Z^{-1})^T$

### Differential Form

$d|X| = tr(|X| X^{-1} dX)$
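As a quick sanity check, the differential form can be compared with the actual change in the determinant under a small perturbation. This sketch assumes a random symmetric positive definite `X` and a perturbation `dX` of size about $10^{-6}$; both are illustrative choices, and agreement is expected only up to second-order terms in `dX`.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)   # symmetric positive definite, hence invertible
dX = 1e-6 * rng.standard_normal((n, n))

# Actual change in the determinant.
actual = np.linalg.det(X + dX) - np.linalg.det(X)

# First-order prediction: d|X| = tr(|X| X^{-1} dX).
predicted = np.trace(np.linalg.det(X) * np.linalg.inv(X) @ dX)

diff_ok = abs(actual - predicted) < 1e-8
print(diff_ok)
```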

### Common Properties

$$\begin{aligned} d|A X B| &=\operatorname{tr}\left(|A X B|(A X B)^{-1} d(A X B)\right) \\ &=\operatorname{tr}\left(|A X B|(A X B)^{-1} A(d X) B\right) \\ &=\operatorname{tr}\left(|A X B| B(A X B)^{-1} A(d X)\right) \end{aligned}$$

$\frac{\partial|A X B|}{\partial X}=|A X B| A^{T}\left(B^{T} X^{T} A^{T}\right)^{-1} B^{T}$

$\frac{\partial|X|}{\partial X}=|X| X^{-T}$

$\frac{\partial |XX^T|}{\partial X} = 2|XX^T| (XX^{T})^{-1}X$
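The gradients of $|AXB|$ and $|XX^T|$ can likewise be compared with finite differences. In this sketch the matrices `A`, `B`, and `X` are small perturbations of the identity, and `W` (standing in for the $X$ of the last formula) is a hypothetical $2 \times 4$ matrix built so that $WW^T$ is invertible. These choices and the helper `num_grad` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)

def num_grad(f, V, eps=1e-6):
    """Central-difference gradient of a scalar function f; same shape as V."""
    g = np.zeros_like(V)
    for idx in np.ndindex(V.shape):
        step = np.zeros_like(V)
        step[idx] = eps
        g[idx] = (f(V + step) - f(V - step)) / (2 * eps)
    return g

n = 3
# Near-identity matrices keep A X B comfortably invertible.
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = np.eye(n) + 0.1 * rng.standard_normal((n, n))
X = np.eye(n) + 0.1 * rng.standard_normal((n, n))

# d|A X B| / dX = |A X B| A^T (B^T X^T A^T)^{-1} B^T
grad_axb = num_grad(lambda V: np.linalg.det(A @ V @ B), X)
formula_axb = np.linalg.det(A @ X @ B) * A.T @ np.linalg.inv(B.T @ X.T @ A.T) @ B.T
axb_ok = np.allclose(grad_axb, formula_axb, atol=1e-6)

# d|W W^T| / dW = 2 |W W^T| (W W^T)^{-1} W for a wide W with full row rank.
W = np.hstack([np.eye(2), 0.5 * rng.standard_normal((2, 2))])
G = W @ W.T
grad_gram = num_grad(lambda V: np.linalg.det(V @ V.T), W)
formula_gram = 2 * np.linalg.det(G) * np.linalg.inv(G) @ W
gram_ok = np.allclose(grad_gram, formula_gram, atol=1e-6)

print(axb_ok, gram_ok)
```

The last formula needs $XX^T$ to be invertible, which is why `W` is taken with at least as many columns as rows.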