What is derivatives?

  • A derivative of a function at any point tells us how much a minute increment to the argument of the function will increment the value of the function

To be clear, what we want is not differentiable, but how the change effects the outputs.

  • Based on the fact that at a fine enough resolution, any smooth, continuous function is locally linear at any point. So we can express like this Δy=αΔx \Delta y=\alpha \Delta x

Multivariate scalar function

Δy=α1Δx1+α2Δx2++αDΔxD \Delta y=\alpha_{1} \Delta x_{1}+\alpha_{2} \Delta x_{2}+\cdots+\alpha_{D} \Delta x_{D}

  • The partial derivative αi\alpha_i gives us how yy increments when only xix_i is incremented

  • It can be expressed as: Δy=xyΔx \Delta y=\nabla_{x} y \Delta x

  • where

xy=[yx1yxD] \nabla_{\mathrm{x}} y=\left[\frac{\partial y}{\partial x_{1}} \quad \cdots \quad \frac{\partial y}{\partial x_{D}}\right]


Single variable

  • Three different critical point with zero derivative
  • The second derivative is
    • 0\ge 0 at minima
    • 0\le 0 at maxima
    • =0=0 at inflection points

multiple variables

df(X)=Xf(X)dX d f(X)=\nabla_{X} f(X) d X

  • The gradient is the transpose of the derivative Xf(X)T\nabla_{X} f(X)^{T}(give us the change in f(x)f(x) for tiny variations in XX)
  • This is a vector inner product
    • df(x)d f(x) is max if dXdX is aligned with Xf(X)T\nabla_{X} f(X)^{\mathrm{T}}
    • (Xf(X)T,dX)=0\angle\left(\nabla_{X} f(X)^{\mathrm{T}}, d X\right)=0
  • The gradient is the direction of fastest increase in f(x)f(x)

  • Hessian

Unconstrained Minimization of function

  1. Solve for ZZ where the derivative equals to zero:

Xf(X)=0 \nabla_{X} f(X)=0

  1. Compute the Hessian Matrix at the candidate solution and verify that
    • Hessian is positive definite (eigenvalues positive) -> to identify local minima
    • Hessian is negative definite (eigenvalues negative) -> to identify local maxima

Closed form are not available

  • To find a maximum move in the direction of the gradient

xk+1=xk+ηkxf(xk)T x^{k+1}=x^{k}+\eta^{k} \nabla_{x} f\left(x^{k}\right)^{T}

  • To find a minimum move exactly opposite the direction of the gradient

xk+1=xkηkxf(xk)T x^{k+1}=x^{k}-\eta^{k} \nabla_{x} f\left(x^{k}\right)^{T}

  • Choose steps
    • fixed step size
    • iteration-dependent step size: critical for fast optimization


  • For convex functions
    • gradient descent will always find the minimum.
  • For non-convex functions
    • it will find a local minimum or an inflection point

results matching ""

    No results matching ""