During my first quarter as a computer science grad student I took an incredibly enlightening course in pattern recognition. Teams of students were assigned the task of implementing a handwritten digit recognizer. Like humans, computers need to determine the content of handwritten information before they can use it in a meaningful way. This is accomplished through a form of optical character recognition (OCR).
The postal service accepts packages and envelopes with handwritten addresses, which must be read and interpreted in order to sort mail and send each item to its intended destination. It's both impractical and expensive to have humans sort large volumes of mail, so automated computer systems are often used instead. These systems often consist of cameras that take pictures of the addresses and feed the images into a program for processing.
My team decided to implement a convolutional neural network similar to LeCun's LeNet-5. The network has five layers:
- The first layer is the input layer and consists of one neuron per pixel in a 29x29 padded version of the sample image.
- The second layer applies 6 feature maps to the input layer. Each feature map is a 5x5 convolutional kernel with randomly initialized weights.
- The third layer applies 50 feature maps to all 6 of the previous feature maps after sub-sampling. Again, each feature map is a 5x5 convolutional kernel with randomly initialized weights. Together, the second and third layers are referred to as a trainable feature extractor.
- The fourth and fifth layers are referred to as a trainable feature classifier. These 2 layers are fully connected and together form a universal classifier.
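To make the layer sizes concrete, here is a rough sketch of the shape arithmetic for the layers listed above. It rests on a few assumptions: that sub-sampling is folded into each convolution as a stride of 2 with no extra padding, that the hidden layer has 100 nodes, and that the output layer has 10 nodes, one per digit. The function names are made up for illustration.

```python
def conv_output_size(input_size, kernel_size, stride):
    """Side length of a convolution output with no extra padding."""
    return (input_size - kernel_size) // stride + 1


def layer_shapes(input_size=29, kernel_size=5, stride=2,
                 maps=(6, 50), hidden=100, classes=10):
    """List (name, shape) pairs for each layer of the network.

    stride=2 is an assumption that stands in for the sub-sampling step.
    """
    shapes = [("input", (1, input_size, input_size))]
    size = input_size
    for index, count in enumerate(maps, start=1):
        size = conv_output_size(size, kernel_size, stride)
        shapes.append((f"conv{index}", (count, size, size)))
    shapes.append(("hidden", (hidden,)))
    shapes.append(("output", (classes,)))
    return shapes


for name, shape in layer_shapes():
    print(name, shape)
```

Under those assumptions the 29x29 input shrinks to 13x13 after the first set of kernels and to 5x5 after the second, so the classifier sees 50 small 5x5 maps.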
A convolutional neural network exploits the spatial structure of digits and trains its weights to pick up on the spatial differences between them. We trained the system with standard backpropagation, which propagates the classification error back through the layers to adjust the weights.
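For a feel of what backpropagation does, here is a minimal sketch of a single training step on a tiny two-layer fully connected network. It is not our project code: the tanh hidden units, softmax output, cross-entropy loss, learning rate, layer sizes, and missing bias terms are all assumptions chosen to keep the example short.

```python
import math
import random


def forward(x, w1, w2):
    """Return hidden activations and softmax class probabilities."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return hidden, [e / total for e in exps]


def backprop_step(x, label, w1, w2, rate=0.1):
    """Update w1 and w2 in place for one sample; return the loss before the update."""
    hidden, probs = forward(x, w1, w2)
    # Gradient of cross-entropy w.r.t. the output scores: probs - one_hot(label).
    d_scores = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    # Push the error back through w2 and the tanh nonlinearity.
    d_hidden = [
        (1.0 - h * h) * sum(d_scores[k] * w2[k][j] for k in range(len(w2)))
        for j, h in enumerate(hidden)
    ]
    for k, row in enumerate(w2):
        for j in range(len(row)):
            row[j] -= rate * d_scores[k] * hidden[j]
    for j, row in enumerate(w1):
        for i in range(len(row)):
            row[i] -= rate * d_hidden[j] * x[i]
    return -math.log(probs[label])


random.seed(0)
w1 = [[random.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(3)]
w2 = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
sample, target = [1.0, 0.0, -1.0, 0.5], 1  # made-up input and label
losses = [backprop_step(sample, target, w1, w2) for _ in range(20)]
print(losses[0], losses[-1])
```

Repeating the step on the same sample should drive the loss down, which is the same mechanism a full network uses across every training sample and epoch.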
After training the network with 100 hidden nodes for 5 epochs on the 60,000 MNIST training samples, it misclassified 851 of the 10,000 test samples, a 91.49% success rate. Not bad.
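The success rate follows directly from the misclassification count. This small check assumes the 851 errors were counted over the 10,000 test samples only; the helper name is made up for illustration.

```python
def success_rate(misclassified, total):
    """Fraction of samples classified correctly."""
    return 1.0 - misclassified / total


# Assumes the 851 errors were counted over the 10,000 test samples.
rate = success_rate(851, 10_000)
print(f"{rate:.2%}")
```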
While this project focused on recognizing handwritten digits, the concepts and algorithms covered can be easily extended to apply to all alphanumeric characters.
Resources: