🪄 VisionForge: 3D-Magic-Pencil
Imagine pointing your Spectacles at a doodle, saying "Shazam", and instantly seeing it transform into a 3D object anchored in your real world.
VisionForge is an AI-powered AR experience that converts sketches and voice commands into immersive 3D assets — in seconds.
✨ Inspiration
I've always been fascinated by Harry Potter and other magic-filled worlds — where you draw in the air or wave a wand and it comes to life.
As an AR + AI enthusiast, I wanted to bring that same sense of wonder into reality — turning sketches into living 3D objects with just a word.
🪄 What It Does
- 🎨 Sketch-to-3D: Point your Spectacles at a doodle and say the magic word.
- 🗣 Voice-activated: Trigger the scan hands-free with a voice command (default: "Shazam").
- 🧊 Instant 3D Generation: AI creates a textured model from your drawing.
- ⚓ Anchored in Reality: See it placed right in your world using AR.
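Under the hood, these features chain into a single pipeline. Here is a minimal TypeScript sketch of that flow; the helper names (`describeSketch`, `generateModel`, `placeInWorld`) are hypothetical stand-ins for the project code, and each step is sketched individually in the next section.

```typescript
// Hypothetical helper signatures; each step is sketched in "How We Built It".
declare function describeSketch(frameBase64: string): Promise<string>; // doodle -> text prompt
declare function generateModel(prompt: string): Promise<string>;       // text prompt -> model URL
declare function placeInWorld(modelUrl: string): Promise<void>;        // model -> anchored in AR

// Fired when the wake word ("Shazam" by default) is heard.
async function onMagicWord(frameBase64: string): Promise<void> {
  const prompt = await describeSketch(frameBase64);
  const modelUrl = await generateModel(prompt);
  await placeInWorld(modelUrl);
}
```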
🛠 How We Built It
- Computer Vision + AI: Used the OpenAI Vision API to process camera frames and generate 3D-friendly prompts.
- Text-to-3D Conversion: Integrated Meshy + Snap3D to generate models in real time.
- AR Placement: Leveraged Lens Studio + Instant World Hit Test to anchor assets in space.
- Voice Control: Added speech recognition to trigger the entire process hands-free.
- UX Magic: Built a custom edge-fade masking effect for a smooth, immersive AR experience.

Hedged code sketches of each of these steps follow.
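For the vision step, the idea is to send a camera frame to a multimodal model and ask for a short, 3D-friendly description. Below is a plain-TypeScript sketch against OpenAI's chat completions API; inside Lens Studio the request would go through Snap's remote service modules instead, and `OPENAI_API_KEY` and the exact instruction text are placeholders.

```typescript
declare const OPENAI_API_KEY: string; // placeholder credential

async function describeSketch(frameBase64: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o",
      messages: [{
        role: "user",
        content: [
          { type: "text",
            text: "Describe only the hand-drawn doodle in this photo, ignoring the background, as a short prompt for a text-to-3D generator." },
          { type: "image_url",
            image_url: { url: `data:image/jpeg;base64,${frameBase64}` } },
        ],
      }],
    }),
  });
  const json = await res.json();
  return json.choices[0].message.content; // e.g. "a cartoon rocket with three fins"
}
```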
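For text-to-3D, Meshy exposes a create-then-poll REST API. A hedged sketch follows; the endpoint paths and field names follow Meshy's public docs at the time of writing and may differ, and `MESHY_API_KEY` is a placeholder.

```typescript
declare const MESHY_API_KEY: string; // placeholder credential

async function generateModel(prompt: string): Promise<string> {
  const headers = {
    "Content-Type": "application/json",
    Authorization: `Bearer ${MESHY_API_KEY}`,
  };
  // Kick off a preview-quality generation task (faster than a full refine pass).
  const create = await fetch("https://api.meshy.ai/openapi/v2/text-to-3d", {
    method: "POST",
    headers,
    body: JSON.stringify({ mode: "preview", prompt }),
  });
  const { result: taskId } = await create.json();

  // Poll until the task finishes, then return the GLB download URL.
  for (;;) {
    const poll = await fetch(`https://api.meshy.ai/openapi/v2/text-to-3d/${taskId}`, { headers });
    const task = await poll.json();
    if (task.status === "SUCCEEDED") return task.model_urls.glb;
    if (task.status === "FAILED") throw new Error("Meshy task failed");
    await new Promise((r) => setTimeout(r, 2000)); // wait 2 s between polls
  }
}
```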
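For placement, Lens Studio's World Query API provides hit-test sessions against real-world surfaces. This sketch is based on Snap's documented `WorldQueryModule`; exact names can shift between Lens Studio versions, so treat it as an outline rather than drop-in code.

```typescript
// Lens Studio script sketch; relies on Lens Studio globals, not plain TypeScript.
const worldQuery = require("LensStudio:WorldQueryModule");

const options = HitTestSessionOptions.create();
options.filter = true; // smooth the raw hit results
const session = worldQuery.createHitTestSession(options);

// Cast a ray (e.g. forward from the camera) and drop the model at the hit point.
function placeAt(model: SceneObject, rayStart: vec3, rayEnd: vec3): void {
  session.hitTest(rayStart, rayEnd, (result) => {
    if (result === null) return; // no surface found; retry next frame
    model.getTransform().setWorldPosition(result.position);
    // result.normal could be used here to align the model to the surface.
  });
}
```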
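For the voice trigger, Lens Studio's VoiceML module streams ASR transcripts that can be matched against a wake word. A sketch based on the documented VoiceML API; `onMagicWord` and `captureFrame` are the hypothetical hooks from the flow sketch above.

```typescript
// Lens Studio script sketch using the VoiceML module.
// @input Asset.VoiceMLModule voiceML
declare function onMagicWord(frameBase64: string): Promise<void>; // from the flow sketch
declare function captureFrame(): string; // hypothetical frame grabber (base64 JPEG)

const MAGIC_WORD = "shazam"; // default wake word

const options = VoiceML.ListeningOptions.create();
options.shouldReturnAsrTranscription = true; // we match on the raw transcript

script.voiceML.onListeningEnabled.add(() => script.voiceML.startListening(options));
script.voiceML.onListeningUpdate.add((eventData) => {
  if (!eventData.isFinalTranscription) return; // wait for a settled transcript
  if (eventData.transcription.toLowerCase().includes(MAGIC_WORD)) {
    onMagicWord(captureFrame()); // kick off sketch -> prompt -> 3D -> placement
  }
});
```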
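The edge-fade effect itself lives in a Lens Studio material, but the underlying idea is just an alpha ramp from the frame border inward. An illustrative sketch, where `fadeWidth` is an assumed tuning value:

```typescript
// Alpha is 0 at the nearest border and ramps to 1 over fadeWidth, in UV space.
function edgeFadeAlpha(u: number, v: number, fadeWidth: number = 0.15): number {
  const edgeDistance = Math.min(u, 1 - u, v, 1 - v); // distance to nearest border
  return Math.max(0, Math.min(1, edgeDistance / fadeWidth));
}
```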
⚔️ Challenges I Ran Into
- Understanding the codebase for Spectacles and Lens Studio.
- As a Python user, working with TypeScript and JavaScript was quite complex (even with help from AI tools).
- Balancing speed and quality of 3D generation.
- Crafting precise prompts so the AI focuses on the doodle and not the background (see the example prompt after this list).
- Ensuring anchoring accuracy so assets spawn exactly where expected.
- Making voice recognition work well in noisy hackathon venues.
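To make the prompt-focus challenge concrete, explicitly constraining the vision model to the doodle helped. This is an illustrative instruction, not the exact prompt that shipped:

```typescript
// Illustrative instruction for the vision step (not the exact production prompt).
const VISION_PROMPT =
  "You are given a photo containing a hand-drawn doodle. " +
  "Describe ONLY the doodle, as a single physical object. " +
  "Ignore the paper, table, hands, and background entirely. " +
  "Reply with one short sentence suitable for a text-to-3D generator.";
```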
🏆 Accomplishments I'm Proud Of
- Built a working end-to-end pipeline: sketch → AI prompt → 3D model → AR placement.
- Achieved real-time model streaming for a seamless experience.
- Designed a magical, playful UX that makes people smile.
- Successfully integrated multiple APIs and SDKs into one smooth workflow.
📚 What We Learned
- The power of prompt engineering in controlling AI outputs.
- Optimizing real-time AR performance without compromising immersion.
- How technically rich and engaging the AR and VR domain is.
- How to orchestrate complex pipelines between Vision, Meshy, and Lens Studio.
- That a touch of magic (voice triggers, edge-fade effects) makes tech feel human.
- How effective AI tools like Codex, Cursor, and Cline can be for agile software development.
🚀 What's Next for VisionForge: 3D-Magic-Pencil
- 🎨 Texture Generation: Full-color, photorealistic models.
- 🌍 Multiplayer Mode: Let multiple users share and interact with the same AR objects.
- ✍️ Gesture-based Drawing: Draw in mid-air — no paper needed.
- 📚 Model Library: Save and share your creations with the community.
- 📱 Mobile Support: Extend to smartphones and tablets.
📜 License
This project is licensed under the MIT License.
© 2025 VisionForge Team
Built With
- ai
- gemini
- javascript
- meshy3d
- openai
- python
- snap
- snap3d
- snapchat
- typescript
- vscode
