Wednesday, April 23, 2014

Backpropagation in a neural network

Have a read here:
http://ufldl.stanford.edu/wiki/index.php/Backpropagation_Algorithm

If you are wondering how we get those equations:

% WHAT DO WE WANT?
% WE WANT dJ/dTheta2 and dJ/dTheta1 for gradient descent,
% i.e. how much the cost changes as the thetas (weights) change

% J = -y*log(h) - (1-y)*log(1-h)
%   = (y-1)*log(1-h) - y*log(h)
%   = (y-1)*log(1-g) - y*log(g)
% where h = g = g(zL3) and zL3 = Theta2*aL2
%
% dJ/dTheta2    = (dJ/dzL3) * dzL3/dTheta2
%
%     dJ/dzL3 = (dJ/dg) * dg/dzL3
%         dJ/dg   = ((y-1)/(1-g))*(-1) - y/g
%                 = (1-y)/(1-g) - y/g
%         dg/dzL3 = g*(1-g)
%     dJ/dzL3 = [(1-y)/(1-g) - y/g] * g*(1-g)
%             = (1-y)*g - y*(1-g)
%             = g - yg - y + yg
%             = g - y
%            
%     dzL3/dTheta2    = aL2
%
% dJ/dTheta2    = (dJ/dzL3) * dzL3/dTheta2
%             = (g - y) * aL2
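
In code, the dJ/dTheta2 result looks like this. It is only a minimal Octave sketch for a single training example, assuming sigmoid activations and no bias units; the variable names (a1, a2, a3, delta3, Theta2_grad) and the toy layer sizes are my own, not from the UFLDL page.

sigmoid = @(z) 1 ./ (1 + exp(-z));

% toy sizes: 3 inputs -> 4 hidden units -> 2 outputs (no bias units)
Theta1 = randn(4, 3);
Theta2 = randn(2, 4);
x = [1; 0.5; -1];                       % one training example
y = [1; 0];                             % its label

% forward pass
a1 = x;                                 % aL1
z2 = Theta1 * a1;  a2 = sigmoid(z2);    % aL2
z3 = Theta2 * a2;  a3 = sigmoid(z3);    % h = g(zL3)

% dJ/dzL3 = g - y   (the output-layer "error term")
delta3 = a3 - y;

% dJ/dTheta2 = (g - y) * aL2, an outer product in matrix form
Theta2_grad = delta3 * a2';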


%
% dJ/dTheta1 is a bit more tricky
% dJ/dTheta1 = dJ/dzL2 * dzL2/dTheta1
%
    % 1st term
    % dJ/dzL2    = dJ/dzL3 * dzL3/dzL2
        % zL3    = Theta2 * aL2
        %        = Theta2 * g(zL2)
        % dzL3/dzL2    = dzL3/dg(zL2) * dg(zL2)/dzL2
        %            = Theta2 * g*(1-g)    where g = g(zL2)
    % dJ/dzL2    = dJ/dzL3 * dzL3/dzL2
    %             = dJ/dzL3 * Theta2 * g*(1-g)
    %             = [dJ/dzL3 * Theta2] * g'(zL2)
    % note that in [dJ/dzL3 * Theta2], dJ/dzL3 is the "error term" from the next layer;
    % back-propagating it through Theta2 gives the weighted average of the errors at this layer
    % 2nd term
    % dzL2/dTheta1 = aL1        since zL2 = Theta1 * aL1
% dJ/dTheta1  = dJ/dzL2 * dzL2/dTheta1
%             = [dJ/dzL3 * Theta2] * g'(zL2) * aL1
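
Continuing the same sketch, here is the hidden-layer gradient plus a quick finite-difference check of dJ/dTheta1 against the cost J defined at the top. Note that in matrix form the multiplication by Theta2 becomes Theta2' * delta3; the check itself is my own addition, not part of the derivation.

% dJ/dzL2 = [dJ/dzL3 * Theta2] .* g'(zL2), with g'(z) = g(z).*(1 - g(z))
delta2 = (Theta2' * delta3) .* (a2 .* (1 - a2));

% dJ/dTheta1 = dJ/dzL2 * aL1
Theta1_grad = delta2 * a1';

% sanity check: numerical derivative of J = -y'*log(h) - (1-y)'*log(1-h)
% with respect to one entry of Theta1
J = @(T1) -y' * log(sigmoid(Theta2 * sigmoid(T1 * a1))) ...
          - (1 - y)' * log(1 - sigmoid(Theta2 * sigmoid(T1 * a1)));
step = 1e-4;
E = zeros(size(Theta1));  E(1, 1) = step;
numgrad = (J(Theta1 + E) - J(Theta1 - E)) / (2 * step);
fprintf('analytic %.6f  vs  numeric %.6f\n', Theta1_grad(1, 1), numgrad);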
