How do we compose the network that performs the requisite function?


  • The bias can also be viewed as the weight of another input component that is always set to 1

  • z = \sum_{i} w_{i} x_{i}

  • What we learn: the *parameters* of the network

  • Learning the network: Determining the values of these parameters such that the network computes the desired function

  • How to learn a network?

    • \widehat{\boldsymbol{W}}=\underset{W}{\operatorname{argmin}} \int_{X} \operatorname{div}(f(X ; W), g(X))\, dX
    • div() is a divergence function that goes to zero when f(X ; W) = g(X)
  • But in practice g(X) will not have such a specification

    • Sample g(X): just gather training data
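The bias-as-extra-weight view above can be checked with a tiny sketch (the weights, bias, and input below are hypothetical):

```python
import numpy as np

# Hypothetical 3-input neuron: weights w, bias b, input x.
w = np.array([0.5, -1.0, 2.0])
b = 0.25
x = np.array([1.0, 2.0, 3.0])

# Direct affine computation: z = sum_i w_i x_i + b
z_direct = w @ x + b

# Same result with the bias folded in as the weight of an
# extra input component that is always set to 1.
w_aug = np.append(w, b)      # weights become [w, b]
x_aug = np.append(x, 1.0)    # inputs become  [x, 1]
z_aug = w_aug @ x_aug

print(z_direct, z_aug)
```

Absorbing the bias this way lets z = \sum_i w_i x_i cover the affine case with no special handling.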


Simple perceptron

do
  for i = 1 .. N_train:
    O(X_i) = sign(W^T X_i)
    if O(X_i) ≠ y_i:
      W = W + y_i X_i
until no more classification errors
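The perceptron rule above can be sketched in Python; the toy AND-style dataset, ±1 labels, and epoch cap are hypothetical choices for illustration:

```python
import numpy as np

def train_perceptron(X, y, max_epochs=100):
    """Perceptron learning: repeat the update until no errors remain.

    X : (N, d) inputs, with a trailing constant-1 column for the bias.
    y : (N,) labels in {-1, +1}.
    """
    W = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for X_i, y_i in zip(X, y):
            # O(X_i) = sign(W^T X_i); treat an output of exactly 0 as an error
            if y_i * (W @ X_i) <= 0:
                W = W + y_i * X_i   # update only on misclassification
                errors += 1
        if errors == 0:             # "until no more classification errors"
            break
    return W

# Toy linearly separable data (AND-like labels), bias column appended
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
W = train_perceptron(X, y)
print(np.sign(X @ W))
```

Convergence is guaranteed only when the data are linearly separable, which is exactly the limitation the next section raises.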

A more complex problem

  • This can be perfectly represented using an MLP
  • But the perceptron algorithm requires linearly separable labels to be learned by the lower-level neurons
    • An exponential search over inputs
  • So we need a differentiable function to compute the change in the output for *small* changes in either the input or the weights
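The contrast can be made concrete with a one-neuron sketch (the scalar weight, input, and perturbation below are hypothetical), comparing sign() against a differentiable activation such as the sigmoid:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, x, eps = 1.0, 0.5, 1e-3   # hypothetical scalar neuron and perturbation

# sign() is flat almost everywhere: a small weight change leaves the
# output unchanged, so it carries no signal about how to adjust w.
delta_sign = np.sign((w + eps) * x) - np.sign(w * x)

# A differentiable activation responds proportionally to small changes,
# which is exactly what gradient-based learning needs.
delta_sigmoid = sigmoid((w + eps) * x) - sigmoid(w * x)

print(delta_sign, delta_sigmoid)
```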

Empirical Risk Minimization

Assuming X is a random variable:

\begin{aligned} \widehat{\boldsymbol{W}} &= \underset{W}{\operatorname{argmin}} \int_{X} \operatorname{div}(f(X ; W), g(X)) P(X)\, dX \\ &= \underset{W}{\operatorname{argmin}} E[\operatorname{div}(f(X ; W), g(X))] \end{aligned}

Sample g(X), where d_{i}=g\left(X_{i}\right)+ noise, and estimate the function from the samples.

The empirical estimate of the expected error is the average error over the samples: E[\operatorname{div}(f(X ; W), g(X))] \approx \frac{1}{N} \sum_{i=1}^{N} \operatorname{div}\left(f\left(X_{i} ; W\right), d_{i}\right)

Empirical average error (Empirical Risk) on all training data: \operatorname{Loss}(W)=\frac{1}{N} \sum_{i} \operatorname{div}\left(f\left(X_{i} ; W\right), d_{i}\right)

Estimate the parameters to minimize the empirical estimate of the expected error: \widehat{\boldsymbol{W}}=\underset{W}{\operatorname{argmin}} \operatorname{Loss}(W)
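The empirical-risk estimate can be sketched end to end; everything below is a hypothetical setup where f(X; W) is a linear model and div() is squared error, one common choice of divergence:

```python
import numpy as np

# Hypothetical setup: g(X) is unknown, but we have noisy samples
# d_i = g(X_i) + noise, here generated from a linear "true" function.
rng = np.random.default_rng(0)
N = 200
X = rng.normal(size=(N, 3))
W_true = np.array([1.0, -2.0, 0.5])
d = X @ W_true + 0.1 * rng.normal(size=N)   # samples of g plus noise

def div(y, t):
    # Squared error, one common divergence: zero iff y == t
    return (y - t) ** 2

def empirical_risk(W):
    # Loss(W) = (1/N) * sum_i div(f(X_i; W), d_i)
    return np.mean(div(X @ W, d))

print(empirical_risk(W_true))       # small: only the noise floor remains
print(empirical_risk(np.zeros(3)))  # much larger for a poor W
```

Because the samples are noisy, even the true parameters do not drive the empirical risk to zero; they only reach the noise floor.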

Problem statement

  • Given a training set of input-output pairs

\left(\boldsymbol{X}_{1}, \boldsymbol{d}_{1}\right),\left(\boldsymbol{X}_{2}, \boldsymbol{d}_{2}\right), \ldots,\left(\boldsymbol{X}_{N}, \boldsymbol{d}_{N}\right)

  • Minimize the following function

Loss(W)=1Nidiv(f(Xi;W),di) \operatorname{Loss}(W)=\frac{1}{N} \sum_{i} \operatorname{div}\left(f\left(X_{i} ; W\right), d_{i}\right)

  • This is a problem of function minimization
    • An instance of optimization
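One standard instance of such optimization is gradient descent on the empirical risk; the sketch below minimizes a squared-error Loss(W) for a linear f (data, learning rate, and iteration count are all hypothetical):

```python
import numpy as np

# Minimize Loss(W) = (1/N) sum_i (W^T X_i - d_i)^2 by following -dLoss/dW.
rng = np.random.default_rng(1)
N, dim = 100, 2
X = rng.normal(size=(N, dim))
W_true = np.array([2.0, -1.0])
d = X @ W_true                     # noiseless targets for simplicity

W = np.zeros(dim)
lr = 0.1                           # hypothetical step size
for _ in range(200):
    residual = X @ W - d
    grad = 2.0 / N * X.T @ residual   # gradient of the empirical risk
    W -= lr * grad

print(W)   # approaches W_true
```

Gradient descent is exactly why the earlier requirement of differentiability matters: the update needs dLoss/dW to exist everywhere.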
