Wednesday, October 29, 2008

Handwritten Digit Recognizer

During my first quarter as a computer science grad I took an incredibly enlightening course in pattern recognition. Teams of students were assigned the task of implementing a handwritten digit recognizer. Like humans, computers need to determine the content of handwritten information before it can be used in a meaningful way. This is accomplished through a form of optical character recognition (OCR).

The postal service accepts packages and envelopes with handwritten addresses, which must be read and interpreted in order to sort mail and send each item to its intended destination. It's both impractical and expensive to have humans sort large volumes of mail, so automated computer systems are often used instead. These systems typically consist of cameras that photograph the addresses and feed the images into a program for processing.

My team decided to implement a convolutional neural network similar to LeCun's LeNet-5:
  • The first layer is the input layer and consists of one neuron per pixel in a 29x29 padded version of the sample image.
  • The second layer applies 6 feature maps to the input layer. Each feature map is produced by a 5x5 convolutional kernel with randomly initialized weights.
  • The third layer applies 50 feature maps to all 6 of the previous feature maps after sub-sampling. Again, each feature map is produced by a randomly initialized 5x5 convolutional kernel. Together, the second and third layers are referred to as a trainable feature extractor.
  • The fourth and fifth layers are referred to as a trainable feature classifier. These 2 layers are fully connected and compose a universal classifier.
A convolutional neural network exploits the spatial structure of digits by training its weights to pick up on the spatial differences between them. We trained the network with standard backpropagation, which propagates the classification error back through the layers to adjust the weights.
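To make the layer sizes concrete, here is a minimal Python sketch of how a 5x5 kernel shrinks a square input. It assumes the sub-sampling is folded into the convolution as a stride of 2, which is my assumption for illustration and not necessarily how our network did it. The function names are made up for this example.

```python
def conv_output_size(input_size, kernel_size, stride):
    """Side length of a 'valid' convolution output (no extra padding)."""
    return (input_size - kernel_size) // stride + 1


def conv2d_valid(image, kernel, stride=1):
    """Slide a square kernel over a square image (lists of lists)."""
    k = len(kernel)
    out = conv_output_size(len(image), k, stride)
    return [
        [
            sum(
                image[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            for c in range(out)
        ]
        for r in range(out)
    ]


# Assumed stride of 2: the sub-sampling happens inside the convolution.
layer2_side = conv_output_size(29, 5, 2)           # each of the 6 maps
layer3_side = conv_output_size(layer2_side, 5, 2)  # each of the 50 maps

# A toy 29x29 image of ones convolved with a 5x5 kernel of ones.
image = [[1] * 29 for _ in range(29)]
kernel = [[1] * 5 for _ in range(5)]
feature_map = conv2d_valid(image, kernel, stride=2)
print(layer2_side, layer3_side, len(feature_map))
```

Under that assumption the 29x29 input yields 13x13 feature maps in the second layer and 5x5 feature maps in the third.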

After training the network with 100 hidden nodes for 5 epochs on the 60,000 MNIST training samples, it misclassified 851 of the 10,000 test samples, a success rate of about 91.5%. Not bad.
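For anyone checking the arithmetic, here is a quick Python sketch. It assumes the 851 misclassifications were counted over the 10,000 test samples only.

```python
test_samples = 10000
misclassified = 851  # assumed to be counted on the test set only

correct = test_samples - misclassified
error_rate = misclassified / test_samples
success_rate = correct / test_samples

print(f"{correct} correct, error {error_rate:.2%}, success {success_rate:.2%}")
```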

While this project focused on recognizing handwritten digits, the concepts and algorithms covered can be easily extended to apply to all alphanumeric characters.


Monday, September 1, 2008

Work History Part 4: Xerox Corporation

Employer: Xerox Corporation
Location: Webster, NY
Position: Software / Firmware Engineer
Period: June 2008 - August 2008

Xerox is a global document management company specializing in the production of workstation printers, large scale printing presses, multifunction systems, and the supplies required to run such devices.

Nowadays Xerox seems to be shifting its focus to collaborative web technologies to make it easy to share, edit, and print digital media.

Xerox is named after the process of xerography, which was invented by Chester Carlson in 1938. The Center for Imaging Science (CIS) at RIT is named in honor of Mr. Carlson. A little over a year after I worked for Xerox I was hired to perform research on a flat plate xerographic printing press for the Print Research and Image Systems Modeling Laboratory in CIS.

While working for Xerox I was a member of both the Platform Development Unit / Control Systems Platform group and the Xerographics / Image Path Software team. I helped maintain and develop the Nuvera Digital Production System line of products; however, my primary task was to implement C++ code to control a down-shooting printhead fixture.

The down-shooting fixture was a digital coater device capable of printing clear gel ink on paper. Such a device allows users to coat specific areas of a document with gloss instead of performing a flood coat of the entire page. It can also print Braille on pharmaceutical bottles. The European Union has made it mandatory that the names of all medications be printed in Braille on their packaging.

Responsibilities and accomplishments:
  • Migrated digital coater LabVIEW code to production-worthy C++ application code.
  • Designed a software architecture for Controller Area Network communication.
  • Analyzed and implemented solutions for problems with the Nuvera Digital Production System printer.

Thursday, May 15, 2008

Computer Graphics II Projects

During my last year as a software engineering undergrad I took a graduate-level course in advanced computer graphics using OpenGL. Each student was required to implement a ray tracer from scratch in order to render a scene similar to Turner Whitted's classic example of transmissive and reflective spheres. A record of my progress is posted on the LiveJournal (tag: raytracer) I kept for the class. The image to the right was the final product.

My personal research project during the class was to implement a particle system from scratch to animate realistic looking fire. I applied a texture map to each particle to give it shape and detail. The texture map also provided the alpha channel used for alpha blending. Various system parameters can be modified in real time to give the particle system the appearance of a lava lamp or even multi-colored fireworks.
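For readers who haven't built one, here is a minimal Python sketch of the general idea behind a fire-like particle system. It is not the OpenFire source code. All names and parameter values (`spawn`, `step`, `buoyancy`, the lifetime of 1.5 seconds) are made up for illustration, and the rendering and texturing are left out entirely.

```python
import random
from dataclasses import dataclass


@dataclass
class Particle:
    x: float
    y: float
    vx: float
    vy: float
    life: float      # seconds remaining
    max_life: float  # lifetime at spawn

    @property
    def alpha(self):
        """Opacity fades linearly from 1 to 0 as the particle ages."""
        return max(0.0, self.life / self.max_life)


def spawn(rng, max_life=1.5):
    """Emit a particle near the origin with a mostly upward velocity."""
    return Particle(
        x=rng.uniform(-0.1, 0.1), y=0.0,
        vx=rng.uniform(-0.2, 0.2), vy=rng.uniform(0.8, 1.2),
        life=max_life, max_life=max_life,
    )


def step(particles, dt, rng, buoyancy=0.5):
    """Advance every particle by dt seconds and respawn the dead ones."""
    survivors = []
    for p in particles:
        p.vy += buoyancy * dt  # hot gas accelerates upward
        p.x += p.vx * dt
        p.y += p.vy * dt
        p.life -= dt
        survivors.append(p if p.life > 0 else spawn(rng))
    return survivors


def blend(src, dst, alpha):
    """Standard 'over' alpha blend of a single color channel."""
    return src * alpha + dst * (1.0 - alpha)


rng = random.Random(42)
flame = [spawn(rng) for _ in range(100)]
flame = step(flame, dt=1 / 60, rng=rng)
```

Changing values such as `buoyancy`, the spawn velocities, or the lifetime is the kind of real-time tweak that turns a flame into something closer to a lava lamp or fireworks.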



The project details are posted on my LiveJournal (tag: openfire). I named the project OpenFire because the source code is available free of charge to the open source community. If you modify it, all I ask is that I be credited in the source code comments.


I also dabbled in Pixar's RenderMan technology. There was a student competition to see who could create the most interesting image. I won second place for my abstract rendition of a woman's head and brain! Check it out on my LiveJournal (tag: renderman).

Wednesday, April 23, 2008

Bachelor's Senior Design Project: GridShell

My software engineering capstone project at the Rochester Institute of Technology involved developing a command-line shell to allow easy access to distributed computing environments, namely grid resources such as the TeraGrid. The project was part of a development effort put forth by the RIT Center for Advancing the Study of CyberInfrastructure (CASCI).

GridShell was one of many projects spearheaded by Cyberaide, a community of researchers and organizations interested in advancing cyberinfrastructure to promote collaboration between individuals and provide them with the computing power necessary to deploy and execute processor-intensive programs for scientific research.

The GridShell provides uniform syntax and semantics for creating, submitting, managing, and checking the status of tasks deployed to local and remote Unix resources. It simplifies workflow management by allowing users to execute tasks sequentially or in parallel and to specify dependencies between those tasks, which constrain their order of execution.
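To illustrate how dependencies constrain execution order, here is a minimal Python sketch that groups tasks into batches that could run in parallel. It is not the GridShell API. It uses the standard library's `graphlib` module (Python 3.9+), and the task names and the `execution_batches` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_batches(dependencies):
    """Group tasks into batches whose members can run in parallel.

    `dependencies` maps each task name to the set of tasks it waits on.
    A cycle in the dependencies raises graphlib.CycleError.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        batches.append(ready)
        sorter.done(*ready)                 # unblock the tasks waiting on them
    return batches


# Hypothetical workflow: two simulations wait on staged input,
# and the final task waits on both simulations.
tasks = {
    "stage_input": set(),
    "simulate_a": {"stage_input"},
    "simulate_b": {"stage_input"},
    "collect_results": {"simulate_a", "simulate_b"},
}
batches = execution_batches(tasks)
print(batches)
```

Each inner list is a set of tasks with no unmet dependencies, so they can be submitted together, while the order of the batches reflects the sequential constraints.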

Additionally, users can create scripts in Python to specify task execution order, which is useful for more complex jobs. This is possible because the GridShell front end is implemented in Python and can execute Python code natively in the command-line interpreter (CLI). Specifically, the CLI is based on the IPython environment. In turn, JPype is used to establish a functional language binding between the Python front end and the Java back end.

Java was chosen as the back end language in order to leverage existing web service technologies written in Java and utilize other Java utilities. Ultimately, our goal was to provide multiple interfaces to the underlying GridShell task management framework, including the traditional terminal interface as well as a web interface, MATLAB interface, Ruby interface, etc. For the web framework we decided to use a deployable Apache Axis2 web server with Rampart security.
