Inspiration
In recent times, obesity and food-related health problems have caused a growing number of preventable deaths. One way to address this is to make it easy for people to monitor their caloric intake. This project was inspired by the possibility of letting anyone accurately track their diet using just their phone. While many apps let you manually fill in a table of nutritional intake, I wanted to automate this process and take it a step further.
What it does
This project takes an image of an object, isolates it, then estimates its volume using a user-provided estimate of how far the camera is from the object. If the object is a food, a caloric value can be estimated as well.
How I built it
On an Android phone, I used the app IP Webcam Pro to access the phone's camera from a computer, capturing an image via the server the app hosts locally. The image was first fed into the Google Cloud Vision API for object detection. The API was not actually used for any classification task; rather, I repurposed the bounding boxes it draws around objects to simplify the job of segmenting the image. Using Skimage and SciPy, I wrote custom segmentation code that downscales the image (to simplify computation) and runs edge detection, followed by erosion and dilation to fill in holes in the detected object.
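The segmentation step described above could be sketched roughly as follows. The function name, scale factor, threshold choice, and structuring-element size are all illustrative assumptions, not the tuned values from the project:

```python
import numpy as np
from scipy import ndimage
from skimage import filters, morphology, transform

def segment_object(crop, scale=0.5):
    """Isolate an object inside a Vision API bounding-box crop.

    Downscale to cut computation, find edges with a Sobel filter,
    then close and fill the outline so the object becomes one
    solid region. All parameters here are illustrative guesses.
    """
    small = transform.rescale(crop, scale, anti_aliasing=True)
    edges = filters.sobel(small)                                # edge detection
    mask = edges > filters.threshold_otsu(edges)                # binarize edge map
    mask = morphology.binary_closing(mask, morphology.disk(3))  # dilation then erosion
    return ndimage.binary_fill_holes(mask)                      # fill interior holes
```

Binary closing (dilation followed by erosion) bridges small gaps in the edge ring so that the hole-filling step can flood the object's interior.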
After the object was isolated, I used an integration approach to estimate the volume. The idea was, in the relative image space (units of pixels), to split the object into one-pixel-thick disks, approximate the volume of each disk as a cylinder (V = pi * r^2 * dx), and sum this across the whole object, which should give a reasonable estimate of the volume. The program then takes a user-provided estimate of the distance from the camera to the object. This lets us map image space to real space by calculating how much actual volume a cubic pixel occupies, yielding an estimate of the object's real volume.
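The disk-integration idea can be written compactly. The `focal_px` parameter below is a placeholder pinhole-camera focal length used to turn the user-supplied distance into a pixel scale; it is an assumption for illustration, not a calibrated value from the project:

```python
import numpy as np

def estimate_volume(mask, distance_cm, focal_px=1000.0):
    """Estimate volume by stacking one-pixel-thick disks.

    Each row of the binary mask contributes a cylinder of volume
    pi * r^2 * dx, where r is half the row's width in pixels and
    dx = 1 pixel. A pinhole model with a placeholder focal length
    converts the user-supplied distance into a pixel scale.
    """
    cm_per_px = distance_cm / focal_px       # size of one pixel at the object
    volume_px = 0.0
    for row in mask:
        r = row.sum() / 2.0                  # slice radius in pixels
        volume_px += np.pi * r * r           # cylinder with dx = 1 pixel
    return volume_px * cm_per_px ** 3        # cubic pixels -> cubic cm
```

As a sanity check, running this on a filled circle of radius R (with a scale of 1 cm per pixel) lands close to the volume of a sphere, 4/3 * pi * R^3, which is what the disk approximation should recover.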
Challenges I ran into
The program only functions well on opaque, semi-cylindrical objects. The segmentation works by darkening the image to reduce light aberrations and emphasize edges, but it does not handle things like empty glass bottles and clear fluids well. Additionally, an irregularly shaped object will not be estimated with much accuracy as of now. Unfortunately, time constraints prevented me from developing the next step: photographing the object from multiple sides and averaging a radius for volume estimation, which would likely be much more accurate.
Additionally, this project depends on a database being available that gives the energy densities of various foods to calculate a caloric value.
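Given such a database, the final conversion is straightforward: volume to mass to calories. The figures below are rough, hypothetical stand-ins for real database values:

```python
# Hypothetical lookup table: (density in g/cm^3, energy density in kcal/g).
# These are approximate figures; a real app would query a nutrition
# database such as USDA FoodData Central.
FOOD_DATA = {
    "apple":  (0.85, 0.52),
    "banana": (0.94, 0.89),
}

def estimate_calories(food, volume_cm3):
    """Convert an estimated volume into calories: volume -> mass -> kcal."""
    density, kcal_per_g = FOOD_DATA[food]
    return volume_cm3 * density * kcal_per_g
```

For example, 100 cm^3 of apple at these assumed values works out to roughly 44 kcal.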
Accomplishments that I'm proud of
I am surprised that, in my admittedly limited testing, error stayed well below 50%. I am also proud of thinking to use the Cloud Vision API to reduce my workload by having it partially isolate the object for me. (Of course, this depends on Google's neural network being able to classify the object being photographed.) I think this shows the concept is worthwhile, and it is something I would like to develop further in the future.
What I learned
I had never used any sort of cloud-based API before, so it was a treat to learn how to interface my code with one. I also very much enjoyed trying different segmentation approaches to see which functioned best. I tried many algorithms, like Canny and Sobel, and read about even more during my research.
What's next for Camera-based Volume Estimation of Objects for Nutrition
I have three main ideas for furthering this project. The first was mentioned before, where I wanted to create an aggregate volume based on multiple frames from different angles of an object. I think this will be much more accurate especially for irregularly shaped objects (like many foods).
The second is to further automate the task and find a way to eliminate the need for a user-provided distance value. This is difficult; however, having the user move the camera in a prescribed way while capturing accelerometer data may allow a reasonable estimate of the map from image space to real space.
Lastly, I would like to port this entire project to mobile, in order to make it accessible. This will not only involve the computer vision aspects, but also the automated data entry to allow people to record their caloric intake easily.