🪄 VisionForge: 3D-Magic-Pencil

Imagine pointing your Spectacles at a doodle, saying "Shazam", and instantly seeing it transform into a 3D object anchored in your real world.
VisionForge is an AI-powered AR experience that converts sketches and voice commands into immersive 3D assets — in seconds.

✨ Inspiration

I've always been fascinated by Harry Potter and other magic-filled worlds — where you draw in the air or wave a wand and it comes to life.
As an AR + AI enthusiast, I wanted to bring that same sense of wonder into reality — turning sketches into living 3D objects with just a word.

🪄 What It Does

  • 🎨 Sketch-to-3D: Point your Spectacles at a doodle and say the magic word.
  • 🗣 Voice-activated: Trigger the scan hands-free with a voice command (default: "Shazam").
  • 🧊 Instant 3D Generation: AI creates a textured model from your drawing.
  • 📍 Anchored in Reality: See it placed right in your world using AR.

🛠 How We Built It

  1. Computer Vision + AI
    Used OpenAI Vision API to process camera frames and generate 3D-friendly prompts.
  2. Text-to-3D Conversion
    Integrated Meshy + Snap3D to generate models in real time.
  3. AR Placement
    Leveraged Lens Studio + Instant World Hit Test to anchor assets in space.
  4. Voice Control
    Added speech recognition to trigger the entire process hands-free.
  5. UX Magic
    Built a custom edge-fade masking effect for a smooth, immersive AR experience.
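The steps above can be sketched as a single async pipeline. This is a hypothetical orchestration sketch only: `captureFrame`, `describeSketch`, `generateModel`, and `anchorInWorld` are placeholder stand-ins for the real OpenAI Vision, Meshy/Snap3D, and Lens Studio hit-test calls, not their actual APIs.

```typescript
// Hypothetical sketch of the VisionForge pipeline: voice trigger →
// camera frame → vision prompt → 3D model → AR anchor. Each step is a
// stub standing in for the real service call.

type Frame = { pixels: Uint8Array };
type Model = { meshUrl: string };

async function captureFrame(): Promise<Frame> {
  // Stand-in for grabbing a frame from the Spectacles camera.
  return { pixels: new Uint8Array(0) };
}

async function describeSketch(frame: Frame): Promise<string> {
  // Vision step: turn the doodle into a 3D-friendly text prompt.
  return "a cartoon rocket, simple shapes, single object, no background";
}

async function generateModel(prompt: string): Promise<Model> {
  // Text-to-3D step (Meshy / Snap3D in the real pipeline).
  return { meshUrl: "https://example.com/models/" + encodeURIComponent(prompt) };
}

function anchorInWorld(model: Model): string {
  // AR step: place the mesh at the world hit-test point.
  return "anchored " + model.meshUrl;
}

export async function onMagicWord(): Promise<string> {
  const frame = await captureFrame();
  const prompt = await describeSketch(frame);
  const model = await generateModel(prompt);
  return anchorInWorld(model);
}
```

Keeping each stage behind its own function made it easy for us to swap providers (e.g. Meshy vs. Snap3D) without touching the rest of the flow.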

⚔️ Challenges I Ran Into

  • Understanding the codebase for Spectacles and Lens Studio.
  • As a Python user, working with TypeScript and JavaScript was quite complex (even with the help of AI tools).
  • Balancing speed and quality of 3D generation.
  • Crafting precise prompts so the AI focuses on the doodle and not the background.
  • Ensuring anchoring accuracy so assets spawn exactly where expected.
  • Making voice recognition work well in noisy hackathon venues.
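To cope with noisy transcripts, one simple trick is to accept near-misses of the trigger word instead of requiring an exact match. The sketch below is a hypothetical illustration of that idea (not our production code): it tolerates mis-transcriptions like "shazzam" by allowing an edit distance of 1.

```typescript
// Hypothetical fuzzy trigger-word check for noisy speech transcripts.
// Accepts any word within Levenshtein distance 1 of the trigger.

function editDistance(a: string, b: string): number {
  // Classic dynamic-programming Levenshtein distance.
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,        // deletion
        dp[i][j - 1] + 1,        // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

export function heardTrigger(transcript: string, trigger = "shazam"): boolean {
  return transcript
    .toLowerCase()
    .split(/[^a-z]+/)  // strip punctuation, split into words
    .some(w => w.length > 0 && editDistance(w, trigger) <= 1);
}
```

For example, `heardTrigger("OK Shazzam go")` matches, while unrelated speech like `heardTrigger("hello world")` does not.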

🏆 Accomplishments I'm Proud Of

  • Built a working end-to-end pipeline: sketch → AI prompt → 3D model → AR placement.
  • Achieved real-time model streaming for a seamless experience.
  • Designed a magical, playful UX that makes people smile.
  • Successfully integrated multiple APIs and SDKs into one smooth workflow.

📚 What We Learned

  • The power of prompt engineering in controlling AI outputs.
  • Optimizing real-time AR performance without compromising immersion.
  • How technically rich and engaging the AR/VR domain is.
  • How to orchestrate complex pipelines between Vision, Meshy, and Lens Studio.
  • That a touch of magic (voice triggers, edge-fade effects) makes tech feel human.
  • How effective AI tools like Codex, Cursor, and Clint can be for agile software development.

🚀 What's Next for VisionForge: 3D-Magic-Pencil

  • 🎨 Texture Generation: Full-color, photorealistic models.
  • 🌍 Multiplayer Mode: Let multiple users share and interact with the same AR objects.
  • ✍️ Gesture-based Drawing: Draw in mid-air — no paper needed.
  • 📚 Model Library: Save and share your creations with the community.
  • 📱 Mobile Support: Extend to smartphones and tablets.

📜 License

This project is licensed under the MIT License.
© 2025 VisionForge Team