## Hopfield Net

• So far, the neural networks we have used for computation have all been feedforward structures

### Loopy network

• Each neuron is a perceptron with a +1/-1 output
• Every neuron receives input from every other neuron
• Every neuron outputs signals to every other neuron

• At each time step, each neuron receives a “field” $\sum_{j \neq i} w_{j i} y_{j}+b_{i}$
• If the sign of the field matches its own sign, it does not respond
• If the sign of the field opposes its own sign, it “flips” to match the sign of the field

• If a neuron flips, the field at other neurons changes
• Which may cause them to flip in turn... and so on...
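A minimal sketch of this asynchronous update rule, assuming a NumPy state vector, weights `W` with `W[j, i]` $= w_{ji}$ and a zero diagonal (`update_neuron` is an illustrative name, not from the source):

```python
import numpy as np

def update_neuron(y, W, b, i):
    """Flip neuron i to match the sign of its field, if they disagree."""
    field = W[:, i] @ y + b[i]      # = sum_{j != i} w_ji y_j + b_i, since w_ii = 0
    y[i] = 1 if field >= 0 else -1  # ties at field == 0 broken toward +1 by convention
    return y
```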

### Flip behavior

• Let $y^{-}_{i}$ be the output of the $i$-th neuron just before it responds to the current field

• Let $y_{i}^{+}$ be the output of the $i$-th neuron just after it responds to the current field

• if $y_{i}^{-}=\operatorname{sign}\left(\sum_{j \neq i} w_{j i} y_{j}+b_{i}\right)$, then $y_{i}^{+} = y_{i}^{-}$

• If the sign of the field matches its own sign, it does not flip

• $y_{i}^{+}\left(\sum_{j \neq i} w_{j i} y_{j}+b_{i}\right)-y_{i}^{-}\left(\sum_{j \neq i} w_{j i} y_{j}+b_{i}\right)=0$

• if $y_{i}^{-}\neq\operatorname{sign}\left(\sum_{j \neq i} w_{j i} y_{j}+b_{i}\right)$, then $y_{i}^{+} = -y_{i}^{-}$

• $y_{i}^{+}\left(\sum_{j \neq i} w_{j i} y_{j}+b_{i}\right)-y_{i}^{-}\left(\sum_{j \neq i} w_{j i} y_{j}+b_{i}\right)=2 y_{i}^{+}\left(\sum_{j \neq i} w_{j i} y_{j}+b_{i}\right)$

• This term is always positive!

• Every flip of a neuron is guaranteed to locally increase $y_{i}\left(\sum_{j \neq i} w_{j i} y_{j}+b_{i}\right)$
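For example, if the field is $0.5$ and $y_i^{-} = -1$, the neuron flips to $y_i^{+} = +1$ and the local term rises from $-0.5$ to $+0.5$:

$2 y_{i}^{+}\left(\sum_{j \neq i} w_{j i} y_{j}+b_{i}\right)=2 \cdot(+1) \cdot 0.5=1>0$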

### Globally

• Consider the following sum across all nodes, counting each pair of neurons only once

$D\left(y_{1}, y_{2}, \ldots, y_{N}\right)=\sum_{i} y_{i}\left(\sum_{j<i} w_{j i} y_{j}+b_{i}\right)=\sum_{i} \sum_{j<i} w_{j i} y_{i} y_{j}+\sum_{i} b_{i} y_{i}$

• Assume $w_{ii} = 0$ and symmetric weights, $w_{ij} = w_{ji}$
• For any unit $k$ that “flips” because of the local field

$\Delta D\left(y_{k}\right)=D\left(y_{1}, \ldots, y_{k}^{+}, \ldots, y_{N}\right)-D\left(y_{1}, \ldots, y_{k}^{-}, \ldots, y_{N}\right)$

$\Delta D\left(y_{k}\right)=\left(y_{k}^{+}-y_{k}^{-}\right)\left(\sum_{j \neq k} w_{j k} y_{j}+b_{k}\right)$

• This is always positive!
• Every flip of a unit results in an increase in $D$
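A quick numerical sanity check of this claim on random symmetric weights (a sketch; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
W = rng.standard_normal((N, N))
W = (W + W.T) / 2              # symmetric weights, as assumed above
np.fill_diagonal(W, 0)         # w_ii = 0
b = rng.standard_normal(N)
y = rng.choice([-1, 1], size=N)

def D(y):
    # D = sum_{i, j<i} w_ij y_i y_j + sum_i b_i y_i; the 1/2 undoes the
    # double count of each pair in y^T W y
    return 0.5 * (y @ W @ y) + b @ y

for k in range(N):
    field = W[:, k] @ y + b[k]
    if np.sign(field) != y[k]:   # unit k flips to match its field
        D_before = D(y)
        y[k] = -y[k]
        assert D(y) > D_before   # every flip increases D
```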

### Overall

• Flipping a unit will result in an increase (non-decrease) of

$D=\sum_{i} \sum_{j<i} w_{j i} y_{i} y_{j}+\sum_{i} b_{i} y_{i}$

• $D$ is bounded

$D_{\max }=\sum_{i} \sum_{j<i}\left|w_{j i}\right|+\sum_{i}\left|b_{i}\right|$

• The minimum increment of $D$ in a flip is

$\Delta D_{\min }=\min _{i,\left\{y_{i},\, i=1 \ldots N\right\}} 2\left|\sum_{j \neq i} w_{j i} y_{j}+b_{i}\right|$

• Any sequence of flips must converge in a finite number of steps
• Think of this as an infinitely deep network in which the weights at every layer are identical
• Convergence means that beyond some layer, the output no longer changes
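Concretely: $D$ starts no lower than $-D_{\max}$, gains at least $\Delta D_{\min}$ per flip, and never exceeds $D_{\max}$, so (assuming $\Delta D_{\min}>0$) the number of flips is bounded:

$\#\text{flips} \leq \frac{2 D_{\max }}{\Delta D_{\min }}$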

### The Energy of a Hopfield Net

• Define the Energy of the network as

$E=-\sum_{i} \sum_{j<i} w_{j i} y_{i} y_{j}-\sum_{i} b_{i} y_{i}$

• Just the negative of $D$
• The evolution of a Hopfield network constantly decreases its energy
• This is analogous to the potential energy of a spin glass (magnetic dipoles)
• The system will evolve until the energy hits a local minimum
• In what follows we drop the bias term for simplicity
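A direct translation of this energy (a sketch; `energy` is an illustrative helper, `W` is assumed symmetric with zero diagonal, and the bias is optional per the note above):

```python
import numpy as np

def energy(W, y, b=None):
    """E = -sum_{i, j<i} w_ij y_i y_j - sum_i b_i y_i (each pair counted once)."""
    E = -0.5 * (y @ W @ y)   # the 1/2 undoes the double count in y^T W y
    if b is not None:
        E -= b @ y
    return E
```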

• The network will evolve until it arrives at a local minimum in the energy contour

• Each of the minima is a “stored” pattern
• If the network is initialized close to a stored pattern, it will evolve to that pattern
• This is a content addressable memory
• Recall memory content from partial or corrupt values
• Also called associative memory
• Evolve and recall pattern by content, not by location

### Evolution

• The network will evolve until it arrives at a local minimum in the energy contour
• We showed above that every flip in the network results in a decrease in energy
• So the path to the energy minimum is monotonic

#### For a 2-neuron net

• The energy is symmetric under sign inversion of the state
• $-\frac{1}{2} \mathbf{y}^{T} \mathbf{W} \mathbf{y}=-\frac{1}{2}(-\mathbf{y})^{T} \mathbf{W}(-\mathbf{y})$
• If $\hat{\mathbf{y}}$ is a local minimum, so is $-\hat{\mathbf{y}}$
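Concretely, for two neurons with a single weight $w_{12}=w_{21}=w>0$ and no bias, $E=-w\, y_{1} y_{2}$, so both sign-flipped states are minima:

$E(+1,+1)=E(-1,-1)=-w, \qquad E(+1,-1)=E(-1,+1)=+w$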

### Computational algorithm

• Very simple
• Updates can be done sequentially, or all at once
• Convergence: stop when the state no longer changes (see the sketch below)

$E=-\sum_{i} \sum_{j>i} w_{j i} y_{j} y_{i}$
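A sketch of the full procedure with sequential updates in random order, stopping when a full sweep produces no flips (`run_hopfield` and its defaults are illustrative, not from the source):

```python
import numpy as np

def run_hopfield(W, y0, b=None, max_sweeps=100, seed=None):
    """Evolve the state until no neuron flips during a full sweep."""
    rng = np.random.default_rng(seed)
    y = np.array(y0).copy()
    b = np.zeros(len(y)) if b is None else b
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(y)):            # sequential, random order
            new = 1 if W[:, i] @ y + b[i] >= 0 else -1
            if new != y[i]:
                y[i] = new
                changed = True
        if not changed:          # converged: state is a local energy minimum
            break
    return y
```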

## Issues

### Store a specific pattern

• A network can store multiple patterns
• Every stable point is a stored pattern
• So we could design the net to store multiple patterns
• Remember that every stored pattern $P$ is actually two stored patterns, $P$ and $-P$
• How can this quadratic function have multiple minima?
• Because the state is constrained to the corners of the hypercube: $y_i \in \{-1, +1\}$
• Hebbian learning: $w_{j i}=y_{j} y_{i}$
• To make a pattern stationary, we require
• $\operatorname{sign}\left(\sum_{j \neq i} w_{j i} y_{j}\right)=y_{i} \quad \forall i$
• So
• $\operatorname{sign}\left(\sum_{j \neq i} w_{j i} y_{j}\right)=\operatorname{sign}\left(\sum_{j \neq i} y_{j} y_{i} y_{j}\right)=\operatorname{sign}\left(\sum_{j \neq i} y_{j}^{2} y_{i}\right)=\operatorname{sign}\left(y_{i}\right)=y_{i}$
• Energy
• $E=-\sum_{i} \sum_{j<i} w_{j i} y_{j} y_{i}=-\sum_{i} \sum_{j<i} y_{j}^{2} y_{i}^{2}=-\frac{N(N-1)}{2}$
• This is the lowest possible energy value for the network

• The stored pattern has the lowest energy
• No matter where the network begins, it will evolve into the stored pattern, which has the lowest energy

### How many patterns can we store?

• To store more than one pattern

$w_{j i}=\sum_{\mathbf{y}_{p} \in\left\{\mathbf{y}_{p}\right\}} y_{i}^{p} y_{j}^{p}$

• $\{\mathbf{y}_{p}\}$ is the set of patterns to store
• Super/subscript $p$ represents the specific pattern
• Hopfield: a network of $N$ neurons can store up to $\sim 0.15N$ patterns through Hebbian learning
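Putting the pieces together: a sketch of Hebbian storage and content-addressable recall, reusing the illustrative `run_hopfield` above (the 6-bit patterns are made up for the example):

```python
import numpy as np

def hebbian_weights(patterns):
    """w_ji = sum_p y_j^p y_i^p, with the diagonal zeroed out."""
    P = np.array(patterns)            # shape (num_patterns, N)
    W = P.T @ P                       # (W)_{ji} = sum_p y_j^p y_i^p
    np.fill_diagonal(W, 0)            # enforce w_ii = 0
    return W

patterns = [np.array([ 1, 1, -1, -1,  1, -1]),
            np.array([-1, 1,  1, -1, -1,  1])]
W = hebbian_weights(patterns)

corrupted = patterns[0].copy()
corrupted[0] = -corrupted[0]          # corrupt one bit
recalled = run_hopfield(W, corrupted) # settles back onto patterns[0]
print(recalled)
```

Because every stored pattern $P$ also stores $-P$, a heavily corrupted input may settle on the negation instead.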

### Orthogonal / non-orthogonal patterns

• Orthogonal patterns
  • Patterns are local minima (stationary and stable)
  • No other local minima exist
  • But the patterns are perfectly confusable for recall
• Non-orthogonal patterns
  • Patterns are local minima (stationary and stable)
  • No other local minima exist
  • Actual energy wells form around the patterns
  • Patterns may be perfectly recalled! (Note: $K > 0.14N$)
• Example: two orthogonal 6-bit patterns
  • Perfectly stationary and stable
  • But several spurious “fake-memory” local minima appear

### Observations

• Many “parasitic” patterns
  • Undesired patterns that also become stable or attractors
• Non-orthogonal patterns are easier to remember
  • I.e. patterns that are closer are easier to remember than patterns that are farther apart!
• It seems possible to store $K > 0.14N$ patterns
  • I.e. to obtain a weight matrix $\mathbf{W}$ such that $K > 0.14N$ patterns are stationary
  • It is also possible to make more than $0.14N$ patterns at least 1-bit stable