Tuesday, July 30, 2013

Part 2: Convolutional Neural Networks for Handwritten Digit Recognition

Part 2 (CoNN: Theory and Implementation)

Ishtiaq Rasool Khan
Professor,
Department of Computer Science and AI
University of Jeddah
https://ishtiaqrasool.github.io/


To implement a CoNN, it is important to understand convolution and forward and backward propagation. In this part, I explain convolution and the basic structure of a CoNN, and then describe the three basic classes of my implementation. Readers will note that the implementation becomes quite convenient with these classes. Forward and backward propagation will be explained in part 3.
 

Convolution


The convolution of an (NxN) image with a (KxK) kernel can be understood by sliding a (KxK) window over the input image. For each position of the window, one output pixel is generated by taking the dot product (the sum of the products of corresponding pixels) of the kernel and the input pixels lying under the window. The following figures show the calculation of the first two pixels of the output image Y, generated by convolving a (6x6) input image X with a (2x2) kernel W.


 Calculation of the 1st output pixel

Calculation of the 2nd output pixel

Fig: Pictorial description of 2D convolution

It can be noted that the size of the output image obtained by convolving an N1xN2 input image with a K1xK2 kernel is (N1-K1+1) x (N2-K2+1).
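To make the sliding-window description concrete, here is a minimal, self-contained sketch of valid 2D convolution in C++. The function name and the flat row-major storage of the images are my own choices for illustration; the routine actually used in this implementation (FeatureMap::Convolute) appears later.

#include <vector>

// Valid 2D convolution of an (inRows x inCols) image with a (k x k) kernel.
// Images are stored row-major in flat arrays. The output size is
// (inRows - k + 1) x (inCols - k + 1), as noted above.
std::vector<double> Convolve2D(const std::vector<double>& in, int inRows, int inCols,
                               const std::vector<double>& kernel, int k)
{
    int outRows = inRows - k + 1;
    int outCols = inCols - k + 1;
    std::vector<double> out(outRows * outCols, 0.0);

    for (int r = 0; r < outRows; ++r)          // slide the window vertically
        for (int c = 0; c < outCols; ++c)      // and horizontally
        {
            double sum = 0.0;
            for (int i = 0; i < k; ++i)        // dot product of the kernel with
                for (int j = 0; j < k; ++j)    // the pixels under the window
                    sum += kernel[i * k + j] * in[(r + i) * inCols + (c + j)];
            out[r * outCols + c] = sum;
        }
    return out;
}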

Convolutional Neural Networks (CoNN)

A CoNN consists of a number of layers, each containing one or more feature maps. The first layer has only one feature map, which is the input image itself. In the following layers, each feature map keeps a certain number of unique kernels (2D arrays of weights), equal to the number of feature maps in the previous layer. The size of all kernels in a feature map is the same and is a design parameter. The pixel values of a feature map are derived by convolving its kernels with the corresponding feature maps in the previous layer. The number of feature maps in the last layer is equal to the number of output classes. For example, in 0-9 digit recognition there will be 10 feature maps in the output layer, and the feature map with the highest pixel value will be the result.
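As a compact, hedged summary of this standard computation (the exact forward propagation used in this implementation, including the role of the sampling factor and the activation function, is described in part 3), the value at position (r, c) of feature map j is typically

y_j(r,c) = f\Big( b_j + \sum_i \sum_{u=0}^{K-1} \sum_{v=0}^{K-1} w_{ji}(u,v) \, x_i(r+u,\, c+v) \Big)

where the sum over i runs over the feature maps x_i of the previous layer, w_{ji} is the kernel that feature map j keeps for x_i, b_j is the bias, and f is the activation function.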


Fig: Structure of the CoNN designed in this blog


The CoNN implemented in this study is shown in the figure above. The number of feature maps (FM) in each layer and their sizes are shown under each layer. For example, layer 1 has 6 feature maps, each of size 13x13. The number of kernels each feature map in a layer contains is also shown in the figure. For example, W[6][25] written under layer 2 indicates that each feature map in this layer has 6 kernels (equal to the number of FMs in the previous layer), each of which has 25 weights (a 5x5 array). In addition to these weights, each FM has a bias weight (for its importance, please refer to any NN text). Each layer has another parameter, SF (sampling factor), which will be explained in the forward propagation section. Three other parameters, dBias, dErrorW and dErrorFM, will be explained in the backpropagation section. The last two lines below each layer in the figure give the total number of pixels (neurons) and weights in the whole layer.

Implementation

It can be understood from the above discussion that the topmost structure in our implementation would be the CoNN itself. We write a CCNN class as given below:

class CCNN
{
public:
    Layer *m_Layer;    // array of layers forming the network
    int m_nLayer;      // number of layers in the network

    CCNN(void);
    ~CCNN(void);

    void ConstructNN();
    void DeleteNN();

    // load weights from a text file
    void LoadWeights(char *FileName);

    // initialize weights to random values
    void LoadWeightsRandom();

    // save current weights to a text file
    void SaveWeights(char *FileName);

    // forward propagate
    int Calculate(double *input, double *output);

    // backward propagate
    void BackPropagate(double *desiredOutput, double eta);

    // diagonal Hessian to speed up learning
    void CalculateHessian();
};
CCNN contains an array of layers, *m_Layer, and a variable m_nLayer which records the number of layers in the network. The member function LoadWeights( ) loads the weights of each layer from a text file, and SaveWeights( ) saves the current weights. LoadWeightsRandom( ) initializes the weights to random floating-point values and is used before starting a new training session. Calculate( ) performs forward propagation using the current weights and produces the result. BackPropagate( ) is used for training the CoNN; it propagates the error at the output of the last layer back through the previous layers and adjusts the weights to minimize the error. CalculateHessian( ) calculates the diagonal Hessian to speed up the learning of the CoNN. In fact, the last three functions simply call the corresponding functions of the Layer class, which is explained below.
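As an aside, here is a minimal sketch of how CCNN might be used to recognize a single digit with a trained network. The file name, the flat input format, and the convention that Calculate( ) fills the 10 output values and returns the index of the winning output feature map are my assumptions, not taken from the class declaration:

// A usage sketch only; file name, input format, and the return convention
// of Calculate() are assumptions for illustration.
int RecognizeDigit(double *imagePixels)
{
    CCNN cnn;
    cnn.ConstructNN();                  // build the layers of the network

    char fname[] = "weights.txt";       // assumed file saved earlier with SaveWeights()
    cnn.LoadWeights(fname);             // load previously trained weights

    double output[10];                  // one value per digit class 0-9
    int digit = cnn.Calculate(imagePixels, output);

    cnn.DeleteNN();
    return digit;
}
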
class Layer
{
public:
    FeatureMap* m_FeatureMap;   // array of feature maps in this layer
    int m_nFeatureMap;          // number of feature maps
    int m_FeatureSize;          // width/height of each (square) feature map
    int m_KernelSize;           // width/height of each (square) kernel
    int m_SamplingFactor;       // step size of the sliding window
    Layer *pLayerPrev;          // pointer to the previous layer

    void ClearAll();
    void Calculate();
    void BackPropagate(int dOrder, double etaLearningRate);
    void Construct(int nFeatureMap, int FeatureSize, int KernelSize, int SamplingFactor);
    void Delete();
};
Layer has a pointer to an array of feature maps, *m_FeatureMap, and a variable m_nFeatureMap for the number of FMs in the layer. It also has variables for the size of the feature map (m_FeatureSize), the size of the kernel (m_KernelSize), and the sampling factor (m_SamplingFactor), which is the step size of the sliding window during convolution.

A layer derives the pixel values of its FMs from the FMs in the previous layer during forward propagation, and during backpropagation it sends the error back to the previous layer. Therefore a pointer to the previous layer, *pLayerPrev, is also provided in Layer. ClearAll( ) initializes all arrays to zero. Calculate( ) and BackPropagate( ) perform forward and backward propagation respectively. The dOrder parameter of the latter defines the order of the derivative of the error that is propagated: for training, the first-order derivative is backpropagated, and for calculating the diagonal Hessian, the second-order derivative is backpropagated. These will be explained later.
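
For completeness, here is a sketch of how CCNN::ConstructNN( ) might wire the layers together using Construct( ) and pLayerPrev. Only a few of the sizes below are stated in the text (6 FMs of 13x13 in layer 1, 5x5 kernels in layer 2, 10 output FMs); the remaining numbers are placeholders, and the real values are those shown in the figure.

// A sketch only: several sizes below are assumptions, not the exact design.
void CCNN::ConstructNN()
{
    m_nLayer = 4;                       // assumed number of layers
    m_Layer = new Layer[m_nLayer];

    // layer 0: the input image itself (1 FM, no kernels)
    m_Layer[0].Construct(1, 29, 0, 1);  // 29x29 input size is an assumption
    m_Layer[0].pLayerPrev = 0;

    // layer 1: 6 FMs of 13x13 (kernel size 5 and SF = 2 are assumptions)
    m_Layer[1].Construct(6, 13, 5, 2);
    m_Layer[1].pLayerPrev = &m_Layer[0];

    // layer 2: each FM has 6 kernels of 5x5 weights, as in the figure
    m_Layer[2].Construct(50, 5, 5, 2);  // 50 FMs of size 5x5 is an assumption
    m_Layer[2].pLayerPrev = &m_Layer[1];

    // output layer: 10 FMs, one per digit (kernel size is an assumption)
    m_Layer[3].Construct(10, 1, 5, 1);
    m_Layer[3].pLayerPrev = &m_Layer[2];
}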

Finally, we define a class FeatureMap as follows:
class FeatureMap
{
public:
    double bias, dErr_wrtb, diagHessianBias;     // bias, dE/db, diagonal Hessian for bias
    double *value, *dError;                      // pixel values and dE/dvalue
    double **kernel, **diagHessian, **dErr_wrtw; // kernels, their diagonal Hessian, dE/dw
    Layer *pLayer;                               // parent layer

    void Construct();
    void Delete();
    void Clear();
    void ClearDError();
    void ClearDiagHessian();
    void ClearDErrWRTW();

    double Convolute(double *input, int size, int r0, int c0, double *weight, int kernel_size);
    void Calculate(double *valueFeatureMapPrev, int idxFeatureMapPrev);
    void BackPropagate(double *valueFeatureMapPrev, int idxFeatureMapPrev, double *dErrorFeatureMapPrev, int dOrder);
};
FeatureMap defines the variables bias, the derivative of the error with respect to the bias (dErr_wrtb), and the diagonal Hessian for the bias (diagHessianBias). It defines arrays for holding the pixel values (*value) and the derivative of the error wrt the pixel values (*dError). Weights are stored in the 2D array **kernel, and the derivative of the error wrt the weights in **dErr_wrtw. The diagonal Hessian for each weight is stored in **diagHessian. FeatureMap has a pointer to its parent Layer, *pLayer, which is used to access the size of the FM, its kernel, and other information required for different calculations.
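
To connect this class back to the convolution section, here is a sketch of what Convolute( ) could look like, assuming that input is a flat row-major feature map of width size, (r0, c0) is the top-left corner of the sliding window, and weight is a flat kernel_size x kernel_size kernel. This is my reading of the declared signature, not the author's code; the actual implementation is discussed in part 3.

// A sketch, assuming row-major flat arrays; the real version is in part 3.
double FeatureMap::Convolute(double *input, int size, int r0, int c0,
                             double *weight, int kernel_size)
{
    double sum = 0.0;
    for (int i = 0; i < kernel_size; i++)        // rows of the window
        for (int j = 0; j < kernel_size; j++)    // columns of the window
            sum += weight[i * kernel_size + j] * input[(r0 + i) * size + (c0 + j)];
    return sum;                                  // dot product for one output pixel
}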

Some of the functions in these three classes will be explained in part 3.

Links

Part 1 (Introduction)
Part 2 (CoNN: Theory and Implementation)
Part 3 (Forward and Backward Propagation)
Source Code

Training Data Set
Training Data Set Labels
Test Data Set
Test Data Set Labels
