Tuesday, July 30, 2013

Part 1: Convolutional Neural Networks for Handwritten Digit Recognition

Part 1 (Introduction)

Ishtiaq Rasool Khan
Professor,
Department of Computer Science and AI
University of Jeddah
https://ishtiaqrasool.github.io/

Introduction

Dr. Yann LeCun proposed a 5-layer CoNN (convolutional neural network) for handwritten character recognition in his famous article:

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998.

Later, in 2003, Dr. Simard published his work:

Patrice Y. Simard, Dave Steinkraus, John Platt, "Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis," International Conference on Document Analysis and Recognition (ICDAR), IEEE Computer Society, Los Alamitos, pp. 958-962, 2003.

He gave some practical tips regarding the software implementation of Dr. LeCun's work and suggested a few techniques for getting better output. Dr. LeCun did not program his CoNN in C/C++, and Dr. Simard did not make his code public.

Mike O'Neill programmed this ANN in Visual C++, and his code and excellent explanation are publicly available at
http://www.codeproject.com/KB/library/NeuralNetRecognition.aspx

What is different in this implementation?

One problem with Mike's code is that he extensively used the Microsoft Foundation Classes (MFC) library, STL vectors, and (in my opinion unnecessary) multithreading, which makes it difficult to understand for people like me who feel more comfortable without these things (especially the Microsoft-specific strings of "random CAPITAL letters").

I have implemented the same CoNN in as simple and pure C/C++ as possible. MFC is used for the interface and for displaying results; however, its use has been kept to a minimum. There is only one dialog for all user interaction, and it is implemented using MFC. The implementation of the CoNN itself is in two separate files (CNN.h and CNN.cpp), which contain three classes (CCNN, CLayer, and CFeatureMap) written in standard C++. Anyone wishing to get rid of MFC completely can use these classes in his/her own code.
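For readers who only want the standard C++ part, the rough shape of the three classes is sketched below. The class names (CCNN, CLayer, CFeatureMap) are the real ones; the member names and signatures shown here are illustrative assumptions, not the exact declarations in CNN.h.

// Illustrative sketch only: the member names below are assumptions,
// not the exact declarations found in CNN.h / CNN.cpp.
class CFeatureMap
{
public:
    int     width, height;   // size of this feature map
    double* value;           // neuron outputs, width*height entries
    double* error;           // back-propagated errors, same size
};

class CLayer
{
public:
    int          numFeatureMaps;     // number of feature maps in this layer
    CFeatureMap* featureMaps;        // the maps themselves
    double*      weights;            // kernel weights connecting to the previous layer
    CLayer*      pPrevLayer;         // layer that feeds this one
    void Calculate();                // forward pass for this layer
    void BackPropagate(double eta);  // error propagation and weight update
};

class CCNN
{
public:
    int     numLayers;
    CLayer* layers;
    void Calculate(double* input, double* output);          // forward pass through all layers
    void BackPropagate(double* desiredOutput, double eta);  // one training step
};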

Mike's code has a bug in the implementation of the 2nd layer (as discussed in the Q&A section of his notes), and this is the reason why he could not achieve higher accuracy. I have corrected that bug in my implementation, and that is why it achieves better performance.

In general, Mike's code is excellent work; however, one deficiency is that it is written only for the specific NN proposed by Dr. LeCun and Dr. Simard and has not been generalized. It has five layers of the proposed fixed sizes. If a user wants to change the number of layers or the number of neurons in a certain layer, he has to rewrite a significant part of the code. I have tried to generalize the code: a user can choose the number of layers and the number of neurons from the main routine, without modifying the NN implementation code, as sketched below.
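As an illustration of what choosing the architecture from the main routine could look like, a configuration along the following lines might be used. This is a hypothetical sketch: the LayerSpec structure and the constructor call in the comment are assumptions for illustration, not the actual API of CNN.h; the layer sizes are those of the Simard/O'Neill network discussed in the later parts.

// Hypothetical configuration sketch -- not the actual API of CNN.h.
// Each entry describes one layer: number of feature maps, feature-map size,
// kernel size, and subsampling step.
struct LayerSpec { int numMaps; int mapW; int mapH; int kernel; int step; };

LayerSpec net[] = {
    {   1, 29, 29, 0, 0 },  // input "layer": a single 29x29 image
    {   6, 13, 13, 5, 2 },  // convolutional, 5x5 kernels, subsampling by 2
    {  50,  5,  5, 5, 2 },  // convolutional, 5x5 kernels, subsampling by 2
    { 100,  1,  1, 5, 1 },  // fully connected, expressed as 1x1 feature maps
    {  10,  1,  1, 1, 1 },  // output layer: one 1x1 map per digit class
};
const int numLayers = sizeof(net) / sizeof(net[0]);
// A generalized constructor could then build the whole network from this
// table, e.g. CCNN cnn(net, numLayers);, without touching CNN.cpp.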

Dr. LeCun defined subsampling layers in between the convolutional layers, which create reduced-size versions of the feature maps. Dr. Simard proposed a more efficient approach: he eliminated these subsampling layers and instead incorporated the subsampling into the operation of the convolutional layers themselves. Mike followed this approach, and I have maintained it as well. Going one step further, I have eliminated the difference between the convolutional and the fully connected layers: a fully connected layer with NxN neurons is simply defined as a convolutional layer having NxN feature maps, each of size 1x1. Details will follow in the parts where the implementation is explained.
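To make the idea concrete, the inner loop of such a layer steps the kernel by two pixels instead of one, so the output map comes out already subsampled. The routine below is a minimal sketch of that technique, assuming a 5x5 kernel, a step of 2, and a tanh squashing function; it is not code copied from CNN.cpp.

#include <cmath>

// Convolution with built-in subsampling (step 2): kernel weights w[0..24]
// plus a bias, input of size inW x inH, output of size outW x outH where
// outW = (inW - 5) / 2 + 1 (e.g. 29x29 -> 13x13, 13x13 -> 5x5).
void ConvolveSubsample(const double* in, int inW, int inH,
                       const double* w, double bias,
                       double* out, int outW, int outH)
{
    for (int oy = 0; oy < outH; ++oy)
    {
        for (int ox = 0; ox < outW; ++ox)
        {
            double sum = bias;
            for (int ky = 0; ky < 5; ++ky)
                for (int kx = 0; kx < 5; ++kx)
                    sum += w[ky * 5 + kx] * in[(oy * 2 + ky) * inW + (ox * 2 + kx)];
            out[oy * outW + ox] = std::tanh(sum);  // squashing activation
        }
    }
}
// When the previous layer's maps are 5x5 and the output map is 1x1, the same
// loop reduces to a fully connected neuron, which is why the two layer types
// no longer need to be distinguished.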

Training and Testing Data Sets

A database of handwritten patterns was prepared by the National Institute of Standards and Technology (NIST); the samples were written by different individuals, including high school students and US census workers. Dr. LeCun modified this database and split it into training and test sets containing 60,000 and 10,000 images of handwritten digits, respectively (the well-known MNIST dataset). Links to these datasets are provided at the end of this page.
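The files use the simple IDX format described on Dr. LeCun's MNIST page: a few big-endian 32-bit integers (magic number, image count, rows, columns) followed by one byte per pixel. Below is a minimal sketch of a reader for the image file; the function names are mine, and error handling is kept to a bare minimum.

#include <cstdio>
#include <vector>

// Reads one big-endian 32-bit integer from an MNIST IDX file.
static int ReadInt32BE(FILE* f)
{
    unsigned char b[4] = { 0, 0, 0, 0 };
    fread(b, 1, 4, f);
    return (int)(((unsigned)b[0] << 24) | ((unsigned)b[1] << 16) |
                 ((unsigned)b[2] << 8)  |  (unsigned)b[3]);
}

// Loads an MNIST image file: magic number (0x00000803), image count, rows,
// cols, then count*rows*cols pixel bytes (0 = background, 255 = foreground).
bool LoadImages(const char* path, std::vector<unsigned char>& pixels,
                int& count, int& rows, int& cols)
{
    FILE* f = fopen(path, "rb");
    if (!f) return false;
    int magic = ReadInt32BE(f);
    count = ReadInt32BE(f);
    rows  = ReadInt32BE(f);
    cols  = ReadInt32BE(f);
    if (magic != 0x00000803) { fclose(f); return false; }
    pixels.resize((size_t)count * rows * cols);
    fread(&pixels[0], 1, pixels.size(), f);
    fclose(f);
    return true;
}
// The label files have the same layout, with magic number 0x00000801 and one
// label byte (0..9) per image.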

In his implementation, Dr. Simard induced different kinds of distortions (scaling, rotation, Gaussian filtering, etc.) in the training images. This can be considered equivalent to increasing the number of training images several times over.
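As a rough illustration of the idea, each training image can be passed through a small random geometric transformation before being fed to the network. The sketch below applies only a random rotation and scaling with nearest-neighbour sampling; it is an assumption about the general approach, not the actual distortion routine, and it omits the elastic component of Dr. Simard's distortions.

#include <cmath>
#include <cstdlib>

// Applies a small random rotation and scaling to a square grayscale image.
// src and dst are size x size pixel buffers (0..255).
void RandomAffineDistort(const unsigned char* src, unsigned char* dst, int size)
{
    double angle = ((rand() / (double)RAND_MAX) - 0.5) * 0.3;        // about +/- 8.5 degrees
    double scale = 1.0 + ((rand() / (double)RAND_MAX) - 0.5) * 0.2;  // about +/- 10 percent
    double c  = std::cos(angle) / scale, s = std::sin(angle) / scale;
    double cx = (size - 1) / 2.0,       cy = (size - 1) / 2.0;

    for (int y = 0; y < size; ++y)
    {
        for (int x = 0; x < size; ++x)
        {
            // Map each destination pixel back into the source image.
            int sx = (int)( c * (x - cx) + s * (y - cy) + cx + 0.5);
            int sy = (int)(-s * (x - cx) + c * (y - cy) + cy + 0.5);
            dst[y * size + x] = (sx >= 0 && sx < size && sy >= 0 && sy < size)
                                    ? src[sy * size + sx]
                                    : 0;   // 0 = background in MNIST
        }
    }
}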

According to Mike, distortion helps in training the neural network, since it forces the network to extract the intrinsic shapes of the patterns rather than (incorrectly) focusing on the peculiarities of individual patterns. He mentions that he could not reach a certain level of accuracy without inducing distortion. In my implementation, I could achieve better results without distorting the training images than Mike could achieve with the distortion-extended data. The reason might be the bug in Mike's code mentioned above.

In the current phase, I have trained the NN without distortion. In a future implementation, I will try to use distortion to see if it really helps.
(Update: This has now been implemented. You can check the "Induce Distortion" box and the training will be done using distorted patterns.)

Description of GUI

A trained (or untrained) NN can be tested on individual samples, or on the whole set of 10,000 test samples, using the buttons on the left side of the interface dialog box. Individual samples can be tested by clicking the "Get", ">>", or "<<" buttons. The sample whose index is shown above these buttons is displayed in "Input". The network's result and the true result are shown in "Output" and "Label", respectively. If the "Induce Distortion" box at the bottom is checked, different kinds of distortion (scaling, rotation, Gaussian filtering, etc.) are induced in the input sample before testing, and the distorted sample is shown in the box labeled "Distorted". The radio buttons at the bottom provide the option to choose whether the NN is tested on the training or the test dataset.


Fig: User Interface of Convolutional Neural Network for Handwritten Digit Recognition

"Load NN Weights" button is provided to load the weights obtained during a previous training session. At the start, the NN is loaded with random weights, and therefore it will not be able to recognize the samples correctly until it undergoes some training, or the weights from a previous training session are loaded.
A training session starts with a click on "Training". Radio buttons provide the option to start a new training run from scratch or to resume from a previous session. The mismatches in each epoch of training are listed in the "Training History" edit box. After each epoch of training, the weights are stored in a text file; weights from a previous training session can be loaded by clicking the "Load NN Weights" button.
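The weight file itself need not be anything exotic; a plain-text dump of all weights, one value per line, is enough to resume training or testing later. The sketch below illustrates the general idea; the layout actually written by the program may differ.

#include <cstdio>

// Writes all network weights to a plain text file, one value per line.
bool SaveWeights(const char* path, const double* weights, int count)
{
    FILE* f = fopen(path, "w");
    if (!f) return false;
    for (int i = 0; i < count; ++i)
        fprintf(f, "%.17g\n", weights[i]);  // enough digits to round-trip a double
    fclose(f);
    return true;
}

// Reads the weights back in the same order they were written.
bool LoadWeights(const char* path, double* weights, int count)
{
    FILE* f = fopen(path, "r");
    if (!f) return false;
    for (int i = 0; i < count; ++i)
        if (fscanf(f, "%lf", &weights[i]) != 1) { fclose(f); return false; }
    fclose(f);
    return true;
}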

When "Training" is clicked, an open-file dialog appears asking for the training dataset and training-dataset labels (the correct results) files, if they are not already open. Generally, training should be carried out using the training dataset only, so that you can check the performance of the CoNN on the test dataset, which the CoNN has not previously seen. However, you can experiment by training with both datasets and see how much improvement in performance can be achieved. The radio buttons on the left that choose between the training and test datasets apply only to testing; if you want to train the CoNN on the test dataset, choose the appropriate file in the file-open dialog.

The "Induce Distortion" checkbox applies to both testing and training. The NN can be trained using distorted samples, and also its performance can be tested on distorted samples by checking this box.
Counters provided on the top right give the number of misrecognized patterns, and the total number of patterns processed so far, during training or testing. They are reset to zero at beginning of each testing or training epoch.

Links

Part 1 (Introduction)
Part 2 (CoNN: Theory and Implementation)
Part 3 (Forward and Backward Propagation)

Source Code

Training Data Set
Training Data Set Labels
Test Data Set
Test Data Set Labels


