
Outer product approximation [3 P]

Derive an expression for the outer product (Levenberg–Marquardt) approximation to the Hessian matrix for a network with $ K$ outputs and a softmax output unit activation function

$\displaystyle y_k({\bf x},{\bf w}) = \frac{\exp(a_k({\bf x},{\bf w}))}{\sum_{j=1}^K\exp(a_j({\bf x},{\bf w}))},$    

and output unit activations $ a_k$, where $ k = 1,\dots,K$, and a cross-entropy error function $ E_{CE}$. This is the analogue of the result

$\displaystyle {\bf H}({\bf w}) \approx \sum_{n=1}^{N}{\bf b}_n({\bf w}) {\bf b}_n({\bf w})^T$    

with $ {\bf b}_n({\bf w}) \equiv \nabla a_n$, i.e. $ b_{ni}({\bf w}) \equiv \frac{\partial a_n}{\partial w_i}$, obtained for the sum-of-squares error function

$\displaystyle E = \frac{1}{2}\sum_{n=1}^{N}(y_n - t_n)^2$    

and a linear output unit activation function, i.e. $ y_n = a_n$.
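A derived expression can be checked numerically. The sketch below is written in Python with NumPy and rests on two assumptions not stated in the exercise. First, it uses the multiclass cross-entropy $ E_{CE} = -\sum_n\sum_k t_{nk}\ln y_{nk}$ with 1-of-K targets $ t_{nk}$, where $ y_{nk} = y_k({\bf x}_n,{\bf w})$ and $ a_{nk} = a_k({\bf x}_n,{\bf w})$. Second, it uses a hypothetical network with no hidden units, $ a_k = \sum_i W_{ki} x_i$. Because these $ a_k$ are linear in the weights, the term dropped by the outer product approximation (the one containing $ \nabla\nabla a_{nk}$) vanishes. The approximation should then coincide with the exact Hessian and can be compared against central finite differences of the gradient. The form coded in `outer_product_hessian`, $ \sum_n\sum_k\sum_l {\bf b}_{nk}(\delta_{kl}y_{nk} - y_{nk}y_{nl}){\bf b}_{nl}^T$ with $ {\bf b}_{nk} = \nabla a_{nk}$, is the form the derivation is expected to yield. All function names are illustrative.

```python
import numpy as np

def softmax(A):
    # Row-wise softmax; subtracting the row max avoids overflow in exp.
    A = A - A.max(axis=1, keepdims=True)
    E = np.exp(A)
    return E / E.sum(axis=1, keepdims=True)

def cross_entropy(w, X, T, K):
    # E_CE = -sum_n sum_k t_nk ln y_nk for the hypothetical linear model
    # a_nk = sum_i W[k, i] * X[n, i], with the weights flattened as w = W.ravel().
    W = w.reshape(K, X.shape[1])
    Y = softmax(X @ W.T)
    return -np.sum(T * np.log(Y))

def gradient(w, X, T, K):
    # dE/da_nk = y_nk - t_nk (1-of-K targets), times da_nk/dW[k, i] = X[n, i].
    W = w.reshape(K, X.shape[1])
    Y = softmax(X @ W.T)
    return ((Y - T).T @ X).ravel()

def outer_product_hessian(w, X, K):
    # H ~= sum_n sum_k sum_l b_nk (delta_kl y_nk - y_nk y_nl) b_nl^T,
    # with b_nk = da_nk/dw; for this model b_nk is X[n] placed in block k.
    D = X.shape[1]
    W = w.reshape(K, D)
    Y = softmax(X @ W.T)
    H = np.zeros((K * D, K * D))
    for n in range(X.shape[0]):
        B = np.kron(np.eye(K), X[n])              # row k is b_nk^T, shape (K, K*D)
        S = np.diag(Y[n]) - np.outer(Y[n], Y[n])  # softmax Jacobian dy_nk/da_nl
        H += B.T @ S @ B
    return H

def numerical_hessian(w, X, T, K, eps=1e-6):
    # Central differences of the analytic gradient, one column per weight.
    P = w.size
    H = np.zeros((P, P))
    for i in range(P):
        e = np.zeros(P)
        e[i] = eps
        H[:, i] = (gradient(w + e, X, T, K) - gradient(w - e, X, T, K)) / (2 * eps)
    return H

rng = np.random.default_rng(0)
N, D, K = 20, 3, 4
X = rng.normal(size=(N, D))
T = np.eye(K)[rng.integers(0, K, size=N)]  # random 1-of-K targets
w = rng.normal(size=K * D)

H_op = outer_product_hessian(w, X, K)
H_fd = numerical_hessian(w, X, T, K)
# Only finite-difference error should remain for this linear model.
print(np.max(np.abs(H_op - H_fd)))
```

For a network with hidden units, $ \nabla\nabla a_{nk} \neq 0$. The same comparison would then measure the neglected term $ \sum_n\sum_k (y_{nk} - t_{nk})\nabla\nabla a_{nk}$, which is small only when the residuals $ y_{nk} - t_{nk}$ are small.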

Haeusler Stefan 2013-01-16