Vid2Mesh

Inspiration

Users who visit our website are met with a text input box for a YouTube URL. When the URL is submitted, our site uses PAFY to download the video for processing.
Our backend down-samples this video of various objects to a specified frame rate, producing a dataset of still JPEG images. The down-sampling rate can be adjusted to balance speed (computation time) against performance (model clarity).
YOLO, a real-time object detection program, is used to identify objects within the input video. Once the objects are detected, they are cut out of the video frames and their bounding boxes are removed. Ultimately, folders are created containing images of each object from various angles.
The openMVG library is run on each object to create point clouds using photogrammetry.
The MVE library is used on each point cloud to add density and provide a surface mesh that can be interpreted as a colorized 3D model.
The website then uploads this 3D model to Sketchfab, which is used to embed the model in a webpage returned to the user.
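
The frame down-sampling and per-object folder steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not our actual backend code: `frames_to_keep`, `Detection`, `crop_box`, `crop_path`, and the `objects/<label>/frame_NNNNN.jpg` layout are all hypothetical names chosen for the example, and the video decoding and YOLO calls themselves are left out.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Tuple


def frames_to_keep(total_frames: int, source_fps: float, target_fps: float) -> List[int]:
    """Return the indices of the frames kept when down-sampling to target_fps."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Never up-sample: a target at or above the source rate keeps every frame.
    step = max(1.0, source_fps / target_fps)
    kept = []
    k = 0
    while int(k * step) < total_frames:
        kept.append(int(k * step))
        k += 1
    return kept


@dataclass
class Detection:
    """Hypothetical detector output: a label plus a pixel-space bounding box."""
    label: str
    frame_index: int
    x: float  # left edge of the box
    y: float  # top edge of the box
    w: float  # box width
    h: float  # box height


def crop_box(det: Detection, frame_w: int, frame_h: int) -> Tuple[int, int, int, int]:
    """Clamp the box to the frame and return (left, top, right, bottom)."""
    left = max(0, int(det.x))
    top = max(0, int(det.y))
    right = min(frame_w, int(det.x + det.w))
    bottom = min(frame_h, int(det.y + det.h))
    return left, top, right, bottom


def crop_path(det: Detection, root: str = "objects") -> PurePosixPath:
    """Assumed layout: one folder per detected label, one JPEG per kept frame."""
    return PurePosixPath(root) / det.label / f"frame_{det.frame_index:05d}.jpg"
```

With a 30 fps source and a 5 fps target, `frames_to_keep` keeps every sixth frame; raising the target rate yields more images per object at the cost of longer reconstruction time. Grouping crops by label alone, as `crop_path` does here, would merge two objects of the same type into one folder.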

Two of our team members are first-semester freshmen. Coordinating development environments on top of this inexperience was our first challenge. We spent most of the first night teaching and synchronizing.
We used multiple environment-specific and language-specific libraries that would not run natively on our computers. This led us to use Docker containers.
Two of the steps in our pipeline have large computation times, especially at a high performance level. Testing incrementally was challenging, and producing demo-ready results took a long time.
After we finished the majority of our pipeline, we were still struggling with the modeling software failing on our image sets. Working out the exact EXIF metadata and resolution requirements for producing a model was a large technical challenge.

It works

Train YOLO to identify very specific objects for very specific purposes. Example: Train YOLO to identify a wide variety of medications straight off the shelf, which would simplify how healthcare facilities order from manufacturers.
Improve multi-object modeling to prevent errors in videos with multiple objects of the same type.