Deep Learning 7 - Reducing the value of a loss function with a gradient

In order to minimize the value of a loss function, you need to calculate the gradient with respect to the weights. Suppose the weight is a $2 \times 3$ matrix:

$$W = \begin{pmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{pmatrix}$$

Then the gradient is represented by:

$$\frac{\partial L}{\partial W} = \begin{pmatrix} \dfrac{\partial L}{\partial w_{11}} & \dfrac{\partial L}{\partial w_{12}} & \dfrac{\partial L}{\partial w_{13}} \\ \dfrac{\partial L}{\partial w_{21}} & \dfrac{\partial L}{\partial w_{22}} & \dfrac{\partial L}{\partial w_{23}} \end{pmatrix}$$

In numerical differentiation, the first element is calculated as follows, where $h$ is a small value such as $10^{-4}$:

$$\frac{\partial L}{\partial w_{11}} \approx \frac{L(w_{11} + h) - L(w_{11} - h)}{2h}$$

The source code is not so difficult.

07_gradient.py
import numpy as np


def numerical_gradient(f, x):
    h = 1e-4
    grad = np.zeros_like(x)

    # Iterate over every element of x, whatever its shape.
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp = x[idx]

        # f(x + h)
        x[idx] = float(tmp) + h
        fxh1 = f(x)

        # f(x - h)
        x[idx] = tmp - h
        fxh2 = f(x)

        # Central difference for this element.
        grad[idx] = (fxh1 - fxh2) / (2*h)

        x[idx] = tmp  # restore the original value
        it.iternext()

    return grad
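
As a quick sanity check, you can apply numerical_gradient to a function whose gradient is known analytically. For $f(x) = x_0^2 + x_1^2$ the gradient is $(2x_0, 2x_1)$, so at $(3.0, 4.0)$ it should return values close to $(6.0, 8.0)$:

def square_sum(x):
    return x[0]**2 + x[1]**2

print(numerical_gradient(square_sum, np.array([3.0, 4.0])))   # approximately [6. 8.]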

Finally, let’s calculate the gradient of a simple neural network by using the softmax function from Deep Learning 2 and cross_entropy_error from Deep Learning 6.
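
If the functions module from those earlier posts is not at hand, a minimal sketch of the two helpers (functions.py), assuming the standard single-sample definitions used in this series, looks like this:

import numpy as np

def softmax(a):
    c = np.max(a)               # subtract the max for numerical stability
    exp_a = np.exp(a - c)
    return exp_a / np.sum(exp_a)

def cross_entropy_error(y, t):
    delta = 1e-7                # a tiny delta avoids log(0)
    return -np.sum(t * np.log(y + delta))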

07_gradient.py
import numpy as np
from functions import softmax, cross_entropy_error
from gradient import numerical_gradient


class simpleNet:
    def __init__(self):
        self.W = np.random.randn(2,3)   # 2x3 weight matrix initialized with a standard normal distribution

    def predict(self, x):
        return np.dot(x, self.W)

    def loss(self, x, t):
        z = self.predict(x)
        y = softmax(z)
        loss = cross_entropy_error(y, t)

        return loss

x = np.array([0.6, 0.9])
t = np.array([0, 0, 1])

net = simpleNet()

# The dummy argument w is ignored: numerical_gradient modifies net.W in place,
# so net.loss(x, t) is always evaluated with the current weights.
f = lambda w: net.loss(x, t)
dW = numerical_gradient(f, net.W)

print(dW)

This is the result of the calculation:

[[ 0.02738297  0.22400595 -0.25138893]
 [ 0.04107446  0.33600893 -0.37708339]]

You can see that, for example, $\partial L / \partial w_{11}$ is positive while $\partial L / \partial w_{13}$ is negative, which means $w_{11}$ and $w_{13}$ should be updated in a negative and a positive direction respectively from the perspective of the loss function.
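
As a rough illustration of that point (assuming a learning rate of 0.1, which is not part of the script above), updating the weights against the gradient should reduce the loss:

lr = 0.1                      # assumed learning rate for this illustration
print(net.loss(x, t))         # loss before the update
net.W -= lr * dW              # move each weight against its gradient
print(net.loss(x, t))         # loss after the update; it should be smaller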

The sample code is here.
