• Object recognition software (YOLO, etc.)
  • 3D modeling solutions (Agisoft, Autodesk 123d Catch)
  • Tensortube (from last year’s hackathon)


  • Input a video and end up with 3D models of objects within the video
  • Practice good hacking skills and teach freshman

How it works

  • Users who visit our website are met with a text input box for a YouTube url. When this url is submitted, our site uses PAFY to download the video for processing. PAFY
  • Our backend down-samples this video of various objects to a specified framerate, providing a still image JPEG dataset. The down-sampling rate can be adjusted to balance speed (computation time) and performance (model clarity).
  • YOLO, a real-time object detection program, is used to identify objects within the input video. Following the detection of these objects, the objects are cut out from the video, while their bounding boxes are removed. Ultimately, folders are created containing images of each object from various angles. YOLO YOLO
  • The openMVG library is run on each object to create point clouds using photogrammetry. openMVG
  • The MVE library is used on each point cloud to add density and provide a surface mesh that can be interpreted as a colorized 3D model. MVE
  • The website then uploads this 3D model to SketchFab, which is used to embed the model in a returned webpage for the user. sketchfab

Challenges we ran into

  • Two of our team members are first semester freshmen. Coordinating development environments on top of this inexperience was our first challenge. We spent most of the first night teaching and synchronizing.
  • We used a multiple environment and language specific libraries which would not run natively on our computers. This led us to use Docker containers.
  • Two of the steps in our pipeline have large computation times, especially at a high performance level. Testing incrementally was challenging, and producing demo-ready results took a long time.
  • After we finished the majority of our pipeline, we were still struggling with the modeling software failing on our image sets. This was a large technical challenge to work out the exact requirements in terms of EXIF metadata and resolution to produce a model.

Accomplishments that we're proud of

It works

What's next for Vid2Mesh

  • Train YOLO to identify very specific objects for very specific purposes. Example: Train YOLO to identify a wide variety of medications straight off the shelf, serving to simplify the process of healthcare facilities ordering from manufacturers.
  • Improve multi-object modeling to prevent errors in videos with multiple objects of the same type.
Share this project: