Pose Estimation

Inspiration

After recently learning about different types of neural networks, I wanted a task that would let me explore the intricacies of how they work. Since the pose of a bolt had to be determined from image data, my first instinct was to use a neural network and hope that either convolutional filters or a dense network would pick up patterns (particularly in the depth image) and identify both the translation from the origin and the rotation.

What it does

My model looks at the depth image of a bolt and attempts to predict its x, y, and z coordinates as well as its rotation in degrees.

How we built it

This was built from scratch using the TensorFlow library. Much of the preprocessing and the choices I made when compiling the model are explained in the section below. In short, 10,000 images were generated and resized to contain fewer pixels. The model was a convolutional neural network with a total of 128 filters and pooling, followed by a dense neural network.

Challenges we ran into

I faced a lot of challenges doing this project, and some of them had not been fully conquered by the time I submitted it. However, these challenges only made me that much more curious and interested in learning how neural networks can be applied most effectively in all types of situations.

The first challenge I faced was deciding which model to create. I was going to use the starter code and either improve it or create a model similar to the one that was given. However, I wanted to apply what I was passionate about. Given that I was dealing with image data, my gut take was to use a DNN (where each pixel would be an input node) or a CNN (pass filters over each image, create a feature map, reduce it, repeat).

I first tried building the DNN, since that model would look at the image globally and require certain parts of the bolt to be in certain places to generate a unique value, while a CNN would just look for patterns in the image. However, the input layer was ridiculously large: literally the product of the image dimensions (1080 x 1876). Even if I tried reducing it, it would still be too large. So I went to plan B, which was to develop a convolutional neural network.
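To put a number on "ridiculously large", here is a minimal sketch of the arithmetic. It assumes a single-channel depth image, and the hidden layer width of 128 units is a hypothetical value chosen only for illustration, not the size I actually used.

```python
# Image dimensions quoted in the write-up.
HEIGHT, WIDTH = 1080, 1876


def dense_input_nodes(height, width, channels=1):
    """Number of input nodes when every pixel feeds a dense layer.

    channels=1 is an assumption (a single-channel depth image).
    """
    return height * width * channels


def first_layer_parameters(n_inputs, hidden_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * hidden_units + hidden_units


n_inputs = dense_input_nodes(HEIGHT, WIDTH)
# hidden_units=128 is hypothetical, used only to show the scale.
n_params = first_layer_parameters(n_inputs, hidden_units=128)

print(f"input nodes: {n_inputs:,}")            # input nodes: 2,026,080
print(f"first-layer parameters: {n_params:,}")  # first-layer parameters: 259,338,368
```

Even a modest first layer lands in the hundreds of millions of parameters under these assumptions, which is why the dense approach was dropped.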

The biggest hurdle, which I did not notice until I was actually compiling the model, was efficiency.

First, the CNN that I built was better suited to classification than to predicting a numeric value such as a rotation in degrees. This meant my "conventional" loss function (sparse categorical cross-entropy) had to be switched to something else. I chose mean absolute error simply because it compares predicted numeric values with expected ones; other than that, the choice was rather arbitrary.
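For reference, mean absolute error is just the average of the absolute differences between expected and predicted values. The sketch below is plain Python rather than the TensorFlow loss itself, and the label layout `[x, y, z, rotation_degrees]` and the numbers in it are made up for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between expected and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical pose label: [x, y, z, rotation_degrees]
target = [0.10, -0.20, 0.50, 90.0]
prediction = [0.15, -0.10, 0.45, 80.0]

error = mean_absolute_error(target, prediction)
print(round(error, 4))  # 2.55
```

In this made-up example the 10-degree rotation error dominates the result, because the rotation label lives on a much larger scale than the coordinates. That is one reason a loss chosen arbitrarily can behave poorly when labels with different units are mixed.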

Secondly, and most importantly, there was the problem of creating a dataset. This led to a multitude of problems and hours of waiting. A CNN needs, at the very least, thousands of examples to develop a good model, and even that is for classification tasks. This project requires precise x, y, and z coordinates and a rotation in degrees, which together take on more than tens of thousands of unique values. This means the CNN would almost always be a little off, and generating enough features to cover all of those possible degrees and positions would be nearly impossible.

Attempting to generate and store 10 images in a Python list caused my program to crash, so I had to take some pretty extreme measures to reduce the size of these images. I ended up reducing the size of every randomly generated image by 1/8th. This was extremely worrying to me, since a slight tilt will not be represented clearly in a lower-resolution image, even if the model is working perfectly fine.
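To illustrate the trade-off, here is a rough sketch of one way to shrink an image and what it does to raw storage. It is not the preprocessing I actually ran: it assumes the reduction keeps every 8th pixel along each axis and that each pixel is stored as a 4-byte float, and both function names are hypothetical.

```python
def downsample(image, factor):
    """Keep every `factor`-th pixel along both axes (strided subsampling)."""
    return [row[::factor] for row in image[::factor]]


def raw_bytes(height, width, n_images=1, bytes_per_pixel=4):
    """Raw storage for n_images images; 4 bytes per pixel is an assumption."""
    return height * width * n_images * bytes_per_pixel


# Tiny 16 x 16 stand-in for a depth image; each pixel value encodes its position.
tiny = [[r * 16 + c for c in range(16)] for r in range(16)]
small = downsample(tiny, 8)
print(small)  # [[0, 8], [128, 136]]

# Storage estimate for 10,000 images at full and reduced resolution.
full_h, full_w = 1080, 1876
reduced_h = len(range(0, full_h, 8))  # rows kept by the stride
reduced_w = len(range(0, full_w, 8))  # columns kept by the stride

full_bytes = raw_bytes(full_h, full_w, n_images=10_000)
reduced_bytes = raw_bytes(reduced_h, reduced_w, n_images=10_000)
print(f"{full_bytes / 1e9:.1f} GB vs {reduced_bytes / 1e9:.2f} GB")
```

Under these assumptions the dataset shrinks by roughly a factor of 64, but 63 of every 64 pixels are thrown away, which is exactly why small rotations become hard to see.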

At that point in the process, I was hoping that the loss of resolution could be compensated for by generating ten thousand randomly posed bolt images. Generating these 10,000 images took between 20 and 30 minutes. Training and evaluating the model took another 20 minutes. Still, I wanted to see if it could be done.

The convolutional neural network seems to have picked up some general trends but is far from accurate. Still, I learned an incredible amount.

Accomplishments that we're proud of and What we Learned

The first thing I was very proud of was actually getting the code running properly on my local machine. That took a while and a bit of reinstalling on my end to resolve. But more seriously, I really enjoyed experimenting with neural networks outside of my comfort zone. As a sophomore undergraduate student, I have not had any robust education in machine learning (yet), but coming to Datathon and trying to apply what little I know was an incredible experience.

What I appreciated most is that when I ran into a challenge, I understood what the problem was. I was never just staring at my screen wondering why a line of code did what it did. I understood what I could do to improve. It is now a matter of whether I decide to look into the solutions to these problems, like using neural networks to make numerical predictions and not just for classification. Perhaps other models need to be looked at altogether. I cannot just slap a neural network on everything.

To list a few specific things to improve:

1] Find a more efficient way to preprocess the image data. Trimming the outer edges that always had a value of 0 or 1 would save a lot of space and time.
2] Look into other loss functions AND activation functions. This may seem naive, but since one of the labels was a degree value, it would be interesting to find or even modify a loss function to squish values into [0, 360). Also, certain loss functions are significantly more appropriate than others in certain situations.
3] Integrate more of the predefined functions written by General Motors.
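As a starting point for the second improvement, here is a minimal sketch of two common ways to handle angles that wrap around at 360 degrees. Neither was part of my submission, and the function names are hypothetical: one measures the error along the shorter way around the circle, and the other encodes the angle as a (sin, cos) pair so the network never has to predict across the 0/360 jump.

```python
import math


def angular_error_degrees(true_deg, pred_deg):
    """Smallest absolute difference between two angles, in [0, 180]."""
    diff = (pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)


def encode_angle(deg):
    """Represent an angle as (sin, cos), which is continuous across 0/360."""
    rad = math.radians(deg)
    return math.sin(rad), math.cos(rad)


def decode_angle(sin_value, cos_value):
    """Recover an angle in degrees from a (sin, cos) pair, wrapped by % 360."""
    return math.degrees(math.atan2(sin_value, cos_value)) % 360.0


# A plain absolute error would call these 358 degrees apart.
print(angular_error_degrees(359.0, 1.0))  # 2.0

# Round trip through the (sin, cos) encoding.
print(round(decode_angle(*encode_angle(270.0)), 6))  # 270.0
```

With the (sin, cos) encoding the model would predict two values for rotation instead of one, and an ordinary regression loss on those two values no longer punishes predictions that sit just across the wrap-around point.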

Thanks, and Gig Em.

Conrad

Built With

jupyter
python
tensorflow

Updates

Conrad Krueger started this project — Oct 17, 2021 04:38 AM EDT
