Inspiration

While learning or following recipes, we noticed that many cooking instructions assume prior knowledge. Terms like “a pinch,” “medium flame,” and “sauté lightly,” or ingredient names like “cardamom (elaichi),” are obvious to experienced cooks but confusing to beginners and non-native speakers. Most recipe platforms rely heavily on text, which often fails to show what an instruction actually looks like in practice. This gap between written instructions and real-world understanding inspired us to build Vyanjan AI.

What it does

Vyanjan AI is a visual cooking assistant that converts a recipe into simple, beginner-friendly steps. For each step, it generates a structured JSON prompt and a corresponding visual explanation. The JSON describes the scene, ingredients, quantities, and actions, while the image shows exactly what the step should look like. This removes ambiguity and helps users understand cooking instructions without guessing.
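As an illustration, a per-step prompt of this kind could look like the sketch below. The class names and field names (`StepPrompt`, `Ingredient`, `scene`, `action`, and so on) and the sample values are assumptions chosen for the example, not the project's actual schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Ingredient:
    name: str
    quantity: str  # free-form amount, e.g. "2 pods" or "1 tbsp"


@dataclass
class StepPrompt:
    """One recipe step described for an image model (hypothetical schema)."""
    step_number: int
    instruction: str   # the beginner-friendly text shown to the user
    scene: str         # what the camera sees
    action: str        # what is happening in the frame
    ingredients: list[Ingredient] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() also converts the nested Ingredient objects to plain dicts
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)


step = StepPrompt(
    step_number=1,
    instruction="Crush the cardamom pods and add them to the hot oil.",
    scene="steel pan on a gas stove, medium flame, top-down view",
    action="crushed cardamom pods dropping into shimmering oil",
    ingredients=[
        Ingredient("cardamom (elaichi)", "2 pods"),
        Ingredient("oil", "1 tbsp"),
    ],
)
print(step.to_json())
```

Keeping quantities and actions as explicit fields, rather than burying them in one long sentence, is what lets the same object be shown to the user and passed to the image model.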

How we built it

We designed Vyanjan AI around a structured pipeline rather than free-form prompting. A recipe is first broken down into clear, simple steps. Each step is converted into a detailed JSON prompt that explicitly defines the visual intent. This JSON is then used to generate realistic cooking images using a local image generation pipeline compatible with FIBO. The frontend displays the JSON transparently alongside each generated image, making the AI process explainable and easy to understand.
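The pipeline described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions: `split_steps`, `build_prompt`, `run_pipeline`, and `fake_generator` are hypothetical names, sentence splitting stands in for real step extraction, and the image model is replaced by a stub that only returns a file name, since the actual local generation setup is not described here.

```python
import json
import re
from typing import Callable


def split_steps(recipe_text: str) -> list[str]:
    """Naive step extraction: treat each sentence as one step."""
    parts = re.split(r"(?<=[.!?])\s+", recipe_text.strip())
    return [p.strip() for p in parts if p.strip()]


def build_prompt(step_number: int, instruction: str) -> dict:
    """Wrap one step in a structured prompt (field names are assumptions)."""
    return {
        "step_number": step_number,
        "instruction": instruction,
        "style": "realistic home-kitchen photo",
    }


def run_pipeline(recipe_text: str, generate_image: Callable[[str], str]) -> list[dict]:
    """Recipe text -> steps -> JSON prompts -> images, keeping each prompt for display."""
    results = []
    for number, instruction in enumerate(split_steps(recipe_text), start=1):
        prompt = build_prompt(number, instruction)
        prompt_json = json.dumps(prompt, ensure_ascii=False)
        image_ref = generate_image(prompt_json)
        # Store the prompt next to the image so the frontend can show both
        results.append({"prompt": prompt, "image": image_ref})
    return results


def fake_generator(prompt_json: str) -> str:
    """Stand-in for the local image model; returns a placeholder file name."""
    step_number = json.loads(prompt_json)["step_number"]
    return f"step_{step_number}.png"


cards = run_pipeline("Heat the oil. Add cumin seeds. Stir for ten seconds.", fake_generator)
for card in cards:
    print(card["image"], "<-", card["prompt"]["instruction"])
```

Passing the generator in as a function keeps the pipeline independent of any particular image backend, so a local model or a remote API could be swapped in without touching the step logic.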

Challenges we ran into

One major challenge was ensuring reliable image generation, as external APIs were unstable during development. To solve this, we shifted to a local image generation setup, which let us maintain consistency and avoid downtime. Another challenge was balancing detail and simplicity: the JSON prompts had to be rich enough to produce accurate visuals, yet still readable enough to follow in a demonstration.

Accomplishments that we’re proud of

We successfully built a system that doesn’t hide AI behavior behind a black box. By exposing the JSON prompts and pairing them with visuals, Vyanjan AI demonstrates transparency, structure, and real-world usability. We’re especially proud of creating a beginner-focused experience that bridges the gap between abstract cooking instructions and practical understanding.

What we learned

We learned that clarity is more important than complexity. A well-structured system with clear intent can be more impactful than a feature-heavy application. We also learned the value of explainable AI: showing how an output is generated builds more trust than simply presenting the output itself.

What’s next for Vyanjan AI

Next, we plan to expand Vyanjan AI beyond cooking by adapting the same visual-grounding approach to other instruction-based domains such as education and DIY tasks. We also aim to improve automatic step extraction and integrate real-time image generation directly into the workflow, making the system more dynamic and scalable.
