Inspiration
This project started when one of our teammates' fathers purchased a house and found several defects. Understanding what was going wrong in the house and finding the best maintenance company to address it turned out to be a real time sink for him. Now, you might be wondering: if we can just use our eyes, why do we need Grounded? Grounded automates the entire process: finding the defects, finding the right maintenance company to help the homeowner, and connecting the two. Now, let's get into the technical details of how this works.
How we built it
The pipeline works like this:

1. The user logs in with Firebase Authentication (Sign in with Google) and uploads a video of a room.
2. The video is sent to our FastAPI back end. Since none of our own machines has enough compute for inference, the back end hands the heavy lifting to Modal (cloud compute), where our ML models are hosted; there's a sketch of this handoff after this list.
3. On Modal we run VGGT (Visual Geometry Grounded Transformer) to reconstruct the room, and simultaneously run a YOLOv9 computer-vision model to detect room defects.
4. Modal returns a .glb file to the front end, where we display the 3D reconstruction, highlight "defect zones" in red, and label each issue with its class, confidence, and severity.
5. We query the Google Places API, which draws on Google Maps data, for contractors near the user's current geolocation, pulling each business's name, star rating, phone number, and location.
6. We call the top contractors through Retell AI, using the phone numbers from the Google Places API, to ask about their availability and pricing. We then automate the invoices.
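Here is a minimal sketch of the upload handoff in step 2, assuming two Modal functions (the names `reconstruct_room` and `detect_defects`, and the app name `grounded`, are illustrative, not our exact production code) have already been deployed:

```python
import modal
from fastapi import FastAPI, UploadFile

web_app = FastAPI()

# Look up the two GPU jobs deployed on Modal (hypothetical app/function names).
reconstruct_room = modal.Function.from_name("grounded", "reconstruct_room")
detect_defects = modal.Function.from_name("grounded", "detect_defects")

@web_app.post("/upload")
async def upload(video: UploadFile):
    data = await video.read()
    # Spawn both jobs so VGGT and YOLO run on Modal at the same time.
    recon_call = reconstruct_room.spawn(data)   # VGGT -> binary .glb scene
    defect_call = detect_defects.spawn(data)    # YOLOv9 -> list of detections
    glb_bytes = recon_call.get()
    defects = defect_call.get()
    return {"glb_size_bytes": len(glb_bytes), "defects": defects}
```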
Challenges we ran into
The first challenge we ran into was compute: we initially tried running our models on our own devices, which were not powerful enough. Because of this, we transitioned to Modal, a platform that lets us run code on cloud GPUs.
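To make the transition concrete, here is a hedged sketch of what a GPU-backed Modal function for the defect detector can look like; the image contents, GPU type, and weights filename are assumptions for illustration:

```python
import modal

app = modal.App("grounded")
image = modal.Image.debian_slim().pip_install("ultralytics")

@app.function(gpu="A10G", image=image)
def detect_defects(video_bytes: bytes) -> list[dict]:
    import tempfile
    from ultralytics import YOLO

    # Persist the upload so the video decoder can read it from a path.
    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as f:
        f.write(video_bytes)
        path = f.name

    model = YOLO("defect-yolov9.pt")  # hypothetical fine-tuned weights
    detections = []
    for result in model(path, stream=True):  # iterate over video frames
        for box in result.boxes:
            detections.append({
                "label": model.names[int(box.cls)],
                "confidence": float(box.conf),
                "xyxy": box.xyxy[0].tolist(),
            })
    return detections
```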
Another issue was that we initially generated a .ply (Polygon File Format) file, which rendered as a scatter of white triangles at random locations instead of a coherent 3D map. This led us to switch to exporting and displaying a .glb file, which worked much better; a sketch of the export is below.
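For the export itself, a library like trimesh can produce the .glb directly. A minimal sketch, assuming the reconstruction gives us vertex, face, and color arrays (the exact outputs VGGT produces may differ):

```python
import numpy as np
import trimesh

def to_glb(vertices: np.ndarray, faces: np.ndarray, colors: np.ndarray) -> bytes:
    """Pack the reconstructed geometry into a single binary .glb blob."""
    mesh = trimesh.Trimesh(vertices=vertices, faces=faces, vertex_colors=colors)
    # Unlike our raw .ply output, .glb bundles geometry and materials into one
    # self-contained binary that browser viewers can load directly.
    return mesh.export(file_type="glb")
```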
Additionally, we originally tried to use the Overpass API instead of the Google Places API. However, it didn't work out for our use case (even their website wasn't working well), so we pivoted to the Google Places API.
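The Places lookup itself is a single HTTP request. Here is a hedged sketch using the classic Nearby Search endpoint; the keyword and radius are illustrative choices, and the phone number needs a follow-up Place Details request:

```python
import requests

def find_contractors(lat: float, lng: float, api_key: str) -> list[dict]:
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/place/nearbysearch/json",
        params={
            "location": f"{lat},{lng}",
            "radius": 10_000,  # metres; illustrative search radius
            "keyword": "home repair contractor",
            "key": api_key,
        },
    )
    results = resp.json().get("results", [])
    # Keep just the fields the app displays; the phone number comes from a
    # separate Place Details request keyed by place_id.
    return [
        {"name": p.get("name"), "rating": p.get("rating"), "place_id": p.get("place_id")}
        for p in results
    ]
```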
Accomplishments that we're proud of
We are proud of several accomplishments in this project. First, we successfully implemented VGGT using Modal, which was a challenging task since we had limited prior experience with the platform. Despite the learning curve, we were able to get the system running reliably. One of the coolest aspects of our project is the 3D reconstruction of the room, which transforms a simple video into an interactive model that can be analyzed for defects. Another feature we are especially proud of is integrating the Retell API, which allows the system to automatically make calls to maintenance companies. This functionality helps bridge the gap between detecting issues and actually getting them repaired, making our project much more practical and useful in real-world scenarios.
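For reference, here is a rough sketch of that outbound-call step, assuming Retell's create-phone-call endpoint with a pre-configured agent bound to our outbound number; treat the exact field names as an assumption rather than our exact code:

```python
import requests

def call_contractor(to_number: str, from_number: str, api_key: str) -> str:
    """Kick off an automated Retell AI call to a contractor's phone number."""
    resp = requests.post(
        "https://api.retellai.com/v2/create-phone-call",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"from_number": from_number, "to_number": to_number},
    )
    resp.raise_for_status()
    return resp.json()["call_id"]
```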
What we learned
Through this project, we learned how to build an end-to-end AI pipeline that converts a user-uploaded video into a usable 3D reconstruction. We gained experience integrating the various APIs involved, like the Retell AI API and the Google Places API. We also learned how to use a YOLO model to detect defects on a wall.
What's next for Grounded
In the future, we want to add text-message functionality to Grounded, something we currently cannot do since we are not a business.
Built With
- ai
- css
- firebase
- firestore
- google-auth
- google-places
- html
- huggingface
- javascript
- node.js
- opencv
- python
- react.js
- retellai
- ultralytics
- vggt
- vite.js
- workflow
- yolo