SozoFix

Inspiration

🛠️ The inspiration for SozoFix came from a desire to combat the "throwaway culture" that dominates our modern world. We saw a future where fixing, not replacing, is the default. Repairing saves precious resources and reduces landfill waste by extending a product's life. Reusing items cuts down on new material consumption and lowers our collective carbon footprint. Most creatively, Reinventing (or upcycling) transforms discarded objects into new, valuable items, fostering creativity and preventing waste.

🌱 We realized that the biggest barrier for most people isn't a lack of desire, but a lack of confidence and expert guidance. SozoFix was born from this vision: to create an AI-powered companion that makes sustainability accessible to everyone. We wanted to build a tool that could look at a broken toaster, a pile of old bottle caps, or a wobbly chair and say, "I can help you fix that," empowering anyone to become part of a more resource-efficient future.

What it does

🧠 SozoFix is an AI-powered platform designed to guide users through any DIY project from start to finish. A user can upload a photo of their problem, be it a broken appliance or a pile of recyclable materials, and our AI analyzes it to generate a complete, custom project plan. This plan includes a title, a description, a list of necessary tools, and a set of clear, multimodal step-by-step instructions. For Upcycling and Sustainable Crafts projects, if the user does not set a goal, the AI uses the image to offer a choice of three project ideas.

🎙️ The core feature is the AI Assistance Call. At any point, the user can initiate a real-time voice conversation with "Alfred," our expert AI assistant. Alfred is context-aware; it knows the user's name, their current project, and even remembers details from past conversations to provide a deeply personalized and intelligent support experience. It can answer specific questions about a step, offer general DIY advice, and gently guide the user toward completing their project, all with a natural, encouraging voice.

How we built it

🔧 SozoFix is built on a modern, robust tech stack, with frontend development primarily taking place inside the bolt.new integrated development environment. This allowed us to rapidly prototype and build out our product.

Frontend: The user interface is built with React and TypeScript, using Vite. We used Tailwind CSS for styling to create a clean, responsive design.

Backend: The server is a Python application built with the Flask web framework, serving as a secure API gateway.

Database & Authentication: We chose Firebase because we were already familiar with it and because of its comprehensive suite of tools, including Firebase Authentication, Realtime Database, and Storage.

Core AI Logic:

🧩 Project Generation: We use Google's Gemini API (gemini-2.0-flash) to analyze user images and generate custom project plans (text and step-editing image generation). For text-to-speech we started with the new Gemini-2.5-flash-tts-preview, but its latency was higher than we wanted. We then switched to Deepgram, which improved project guide generation times by 60%.
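As a minimal sketch of how a backend might turn the model's reply into the plan described under "What it does" (title, description, tools, steps), the Python below parses a JSON reply and checks the required fields. The JSON layout, the `ProjectPlan` class, and the `parse_plan` helper are assumptions made for illustration, not SozoFix's actual schema or code. No Gemini call is made here; the function only handles the text that comes back.

```python
import json
from dataclasses import dataclass, field

FENCE = "`" * 3  # Markdown code fence that models sometimes wrap around JSON


@dataclass
class ProjectPlan:
    """Hypothetical shape of a generated project plan."""
    title: str
    description: str
    tools: list[str] = field(default_factory=list)
    steps: list[str] = field(default_factory=list)


def parse_plan(raw: str) -> ProjectPlan:
    """Parse a model reply into a ProjectPlan, tolerating a wrapping code fence."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (with or without a language tag)
        # and everything from the last closing fence onward.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("plan must be a JSON object")
    missing = [key for key in ("title", "description", "steps") if not data.get(key)]
    if missing:
        raise ValueError(f"plan is missing fields: {missing}")
    return ProjectPlan(
        title=str(data["title"]),
        description=str(data["description"]),
        tools=[str(tool) for tool in data.get("tools", [])],
        steps=[str(step) for step in data["steps"]],
    )
```

Rejecting an incomplete plan at this boundary means the frontend never has to render a guide with no steps; the caller can retry the generation instead.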

💬 Conversational AI: The real-time voice conversation is powered by the ElevenLabs Conversational AI API, integrated via their official @elevenlabs/react SDK on the frontend.

🧠 AI Memory: Our backend uses Gemini (specifically gemini-2.0-flash-lite, whose speed made this feature work seamlessly) to create a "pre-call briefing" by summarizing a user's past conversation transcripts. This summary is then passed to the ElevenLabs agent as context, alongside other dynamic variables such as the user profile and the project the user is working on, allowing Alfred to have a continuous, evolving memory of each user.
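The pre-call briefing flow can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's real code: the variable names (`user_name`, `project_title`, `pre_call_briefing`), the character budget, and the prompt wording are all hypothetical, and the `summarize` callable stands in for the Gemini request so the sketch runs without network access.

```python
from typing import Callable

MAX_TRANSCRIPT_CHARS = 6000  # assumed budget to keep the summarization prompt small


def build_briefing_prompt(transcripts: list[str],
                          max_chars: int = MAX_TRANSCRIPT_CHARS) -> str:
    """Build a summarization prompt from the most recent transcripts that fit."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so recent conversations win the budget.
    for transcript in reversed(transcripts):
        transcript = transcript.strip()
        if not transcript:
            continue
        if used + len(transcript) > max_chars:
            break
        kept.append(transcript)
        used += len(transcript)
    kept.reverse()  # restore chronological order for the model
    return (
        "Summarize these past conversations for a DIY voice assistant. "
        "Keep names, projects, and unresolved questions.\n\n"
        + "\n---\n".join(kept)
    )


def build_dynamic_variables(user_name: str,
                            project_title: str,
                            transcripts: list[str],
                            summarize: Callable[[str], str]) -> dict[str, str]:
    """Assemble the context handed to the voice agent before a call starts."""
    if any(t.strip() for t in transcripts):
        briefing = summarize(build_briefing_prompt(transcripts))
    else:
        briefing = "No previous conversations."
    return {
        "user_name": user_name,
        "project_title": project_title,
        "pre_call_briefing": briefing,
    }
```

In a real backend, `summarize` would wrap the call to the summarization model, and the returned dictionary would be sent to the frontend to be passed along when the call session starts. Injecting the summarizer also makes this step easy to test in isolation.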

👁️ Augmented Reality (Experimental): A React component powers an AR viewer that overlays digital instructions onto a live camera feed. It initializes a TensorFlow.js COCO-SSD model for real-time object detection and sets up a Three.js scene on a transparent canvas. The component dynamically parses step text to understand actions and tools, then uses detected objects to render AR hints (e.g., pulsing circles, marching lines) via Three.js. Concurrently, 2D SVG animations illustrate the action, and audio narration plays, creating an integrated augmented reality guide.

Architecture

[Figure: SozoFix architecture diagram]

Challenges we ran into

📱 Initially we wanted to build a mobile app, but getting Expo running in bolt.new, especially from an iPad, was challenging, so we decided to target the web, where we have more experience.

🐞 Our journey was a masterclass in debugging a complex, multi-service AI application. The challenges were exacerbated by the unique nature of our development process, where mismatches between our intentions and how the AI tools (both bolt.new's assistant and the models we were integrating) interpreted our requests led to significant hurdles.

Our two biggest technical challenges were:

The Real-Time Voice Interface

🎧 Our initial attempts to build the voice call feature using native browser APIs failed because of inconsistent, platform-specific bugs. The breakthrough came from using the official @elevenlabs/react SDK, but even then, we had to solve a critical API mismatch. We learned that finding the right prompt and the right API call for our public agent was more important than writing more complex code.

Example challenges related to conversational AI:

⚠️ The "Instant Hang-up" Bug: On mobile browsers like Chrome on Android and Safari on iPadOS, the microphone would activate and immediately deactivate because the browser's speech recognition engine was too sensitive and would end prematurely.
🧟 The "Zombie Mic" Bug: On Safari, the microphone indicator would remain active even after the call was closed, indicating a resource leak that we had to meticulously track down and fix.

Implementing Augmented Reality (AR)

🪄 Beyond voice, our initial vision included an immersive AR feature where users could see instructions overlaid directly onto their project. We quickly discovered the immense complexity of creating a stable, cross-platform AR experience. Marker tracking, 3D model rendering, the limited object vocabulary of the COCO-SSD model in TensorFlow.js, and real-world surface detection all proved to be significant hurdles. This is where our development environment, bolt.new, played a fascinating role. It provided a baseline experimental feature for AR, which gave us a starting point but also highlighted the deep technical challenges involved. While we weren't able to fully realize our AR vision for V1, this struggle was a crucial learning experience in understanding the frontier of web-based AR.

Accomplishments that we're proud of

🏆 Overcoming those challenges is our biggest accomplishment. We are incredibly proud of building a complex, real-time, AI-powered voice application that works reliably. Specifically:

✅ Solving the Real-Time Puzzle (AI call): Successfully integrating five distinct technologies (React, Python/Flask, Firebase, Google Gemini, and ElevenLabs) into a single, cohesive application that functions in real time.

🧠 Creating a Truly "Smart" Agent: The "Pre-Call Briefing" system is the feature we're most proud of. The AI doesn't just talk; it remembers. Having Alfred greet a user by name and ask about a previous project is the "wow" moment that makes SozoFix special.

🧑‍💻 Embracing a New Development Paradigm: Building this entire project within bolt.new pushed us to refine how we communicate with AI assistants, learning to be more precise with our prompts and instructions to achieve our desired results. The accessibility of bolt.new and the web interfaces of both Hugging Face and GitHub allowed us to build this product primarily from an iPad.

What we learned

📚 This project was a deep dive into the future of software development. Our most important takeaways were:

🛡️ Trust the SDK: When a professional SDK is available, use it. It is designed to handle the complex and error-prone parts of web development (like cross-browser microphone access) for you.

🔍 Isolate and Test: The debug button became our most valuable tool. The ability to test a single part of the system independently was crucial in narrowing down and solving problems.

📖 Read the Docs, Then Read Them Again: A single sentence in the ElevenLabs documentation distinguishing between "Public" and "Private" agent connection methods was the key to solving our biggest connection issue.

What's next for SozoFix

🚀 Version 1 is just the beginning. We are incredibly excited about the future of SozoFix and have a clear roadmap ahead:

🧑‍🏭 Immersive AR Guidance: Our biggest ambition is to bring our AR concept to life. We can't wait to improve and build upon the experimental foundation provided by bolt.new to create a truly immersive experience. Imagine pointing your camera at a project and seeing animated arrows showing you exactly where to drill, or having 3D models of parts appear right in your workspace. This is the future of DIY instruction, and it's at the top of our priority list. We already have plans for training a custom object detection model for DIY in the next few weeks.

📸 Visual Feedback during Calls: We want to allow users to send images during a live call with Alfred. If a user says, "I'm not sure what this part is," they could share their camera feed, and Alfred could analyze it in real time to provide an answer.

🎞️ AI-Generated Video: We plan to enhance project guides with AI video models like Veo 3 to give users clearer visual steps.

🗣️ Proactive Assistance: We plan to use the project steps to make Alfred proactive. If a user is on "Step 3: Sanding the wood," Alfred could automatically start the conversation by saying, "I see you're on the sanding step. Do you have any questions about which grit of sandpaper to use?"

🌍 Community Project Sharing and Marketplace: We envision a platform where users can not only complete their projects but also share their creations, plans, and custom instructions with the SozoFix community, creating a collaborative ecosystem of DIY enthusiasts.

📚 Expanding Alfred's Knowledge: We plan to fine-tune a dedicated model on a vast library of DIY manuals, forums, and video tutorials to make Alfred an even more knowledgeable and capable expert.