Inspiration

Information today rarely arrives in a single, clean format. The same event often appears simultaneously as text, images, audio, and video—each telling a slightly different story. Most AI systems analyze these inputs in isolation or reduce them to summaries, which hides contradictions, missing context, and weak causal assumptions.

SignalWeaver was inspired by this gap. Instead of asking an AI to answer questions about media, we wanted to build a system that could reason over how events unfold, across modalities and over time, and expose that reasoning in a transparent way.


What SignalWeaver Does

SignalWeaver is a Gemini 3–powered multimodal reasoning system that constructs inspectable causal graphs from text, images, audio, and video. Rather than producing chat-style responses, the system builds structured reasoning artifacts that represent:

  • observed events (directly supported by input data)
  • inferred events (logical deductions)
  • temporal ordering
  • cause–effect relationships with confidence ranges

The result is a causal model that users can inspect, question, and understand—rather than a single opaque output.
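To make the structure above concrete, the causal model can be imagined as a typed artifact along these lines. This is an illustrative sketch only: the type and field names (`EventNode`, `CausalLink`, `timeIndex`, and so on) are assumptions, not SignalWeaver's actual schema.

```typescript
// Hypothetical shape of a SignalWeaver causal graph artifact.
// All names here are illustrative, not the project's real schema.

type Modality = "text" | "image" | "audio" | "video";

interface EventNode {
  id: string;
  description: string;
  kind: "observed" | "inferred"; // observed = directly supported by input data
  sources: Modality[];           // modalities that support this event
  timeIndex: number;             // position on the relative timeline
}

interface CausalLink {
  cause: string;                 // EventNode id
  effect: string;                // EventNode id
  confidence: [number, number];  // lower/upper bound, e.g. [0.6, 0.85]
}

interface CausalGraph {
  events: EventNode[];
  links: CausalLink[];
}

// A tiny example graph mixing an observed and an inferred event.
const example: CausalGraph = {
  events: [
    { id: "e1", description: "Siren audible in audio track", kind: "observed", sources: ["audio"], timeIndex: 0 },
    { id: "e2", description: "Vehicles pull over", kind: "inferred", sources: ["video"], timeIndex: 1 },
  ],
  links: [{ cause: "e1", effect: "e2", confidence: [0.6, 0.85] }],
};
```

Separating `kind` at the type level is what lets the UI draw observed and inferred events differently instead of blending them into one answer.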


How We Built It

SignalWeaver is implemented as a multi-agent orchestrated pipeline, with each agent powered by Gemini 3 and responsible for a specific reasoning task:

  1. Perception Agent
    Extracts structured claims from each modality using gemini-3-flash-preview.

  2. Temporal Reasoning Agent
    Orders events along a relative timeline and detects missing or inconsistent sequences using gemini-3-pro-preview with Thinking Config enabled.

  3. Cross-Modal Consistency Agent
    Compares claims across modalities to surface contradictions without assigning truth labels.

  4. Causality Agent
    Infers cause–effect relationships and distinguishes correlation from causation. If required preconditions are missing, the system automatically triggers a refinement loop to re-evaluate the timeline.

  5. Synthesis Agent
    Produces the primary output: a structured causal graph separating observed and inferred events, with confidence bands attached to individual causal links.

All intermediate agent outputs are preserved as structured JSON artifacts and exposed through an interactive Reasoning Trace panel.
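The control flow of the pipeline, including the refinement loop triggered by the Causality Agent, can be sketched as follows. The agent internals are stubbed out here; in the real system each step would call Gemini 3 and parse a structured JSON response, and every function and field name below is a hypothetical stand-in.

```typescript
// Sketch of the agent pipeline with the refinement loop described above.
// Agents are stubbed as pure functions for illustration.

interface Claim { id: string; text: string; modality: string }
interface Timeline { ordered: string[] } // claim ids in temporal order
interface CausalityResult { linksFound: number; missingPreconditions: boolean }

// Perception Agent stub: one claim per raw input.
const perceive = (raw: string[]): Claim[] =>
  raw.map((text, i) => ({ id: `c${i}`, text, modality: "text" }));

// Temporal Reasoning Agent stub: keep input order.
const orderEvents = (claims: Claim[]): Timeline =>
  ({ ordered: claims.map((c) => c.id) });

// Causality Agent stub: report missing preconditions on the first pass only.
const inferCausality = (t: Timeline, pass: number): CausalityResult =>
  ({ linksFound: t.ordered.length - 1, missingPreconditions: pass === 0 });

function runPipeline(raw: string[], maxRefinements = 2): CausalityResult {
  const claims = perceive(raw);
  let result: CausalityResult = { linksFound: 0, missingPreconditions: true };
  for (let pass = 0; pass <= maxRefinements; pass++) {
    const timeline = orderEvents(claims); // timeline re-evaluated each pass
    result = inferCausality(timeline, pass);
    if (!result.missingPreconditions) break; // refinement loop exit condition
  }
  return result;
}
```

The key design point is that the loop re-runs the temporal step, not just the causality step, so a refinement can actually change the evidence the Causality Agent sees.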


Challenges Faced

One of the main challenges was avoiding the “single-prompt” trap. Designing the system as a persistent, self-correcting pipeline required careful separation of responsibilities between agents and strict use of structured outputs instead of free-form text.
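One way to enforce that discipline at agent boundaries is to validate each agent's output against the shape the next agent expects, and fail loudly when a model drifts into free-form text. The guard below is a minimal illustrative sketch under that assumption, not SignalWeaver's actual validation code.

```typescript
// Minimal structural guard for agent hand-offs: reject anything that
// isn't parseable JSON with the fields the next agent expects.
interface PerceptionClaim { id: string; text: string; modality: string }

function parseClaims(agentOutput: string): PerceptionClaim[] {
  let data: unknown;
  try {
    data = JSON.parse(agentOutput);
  } catch {
    throw new Error("agent returned free-form text, not JSON");
  }
  if (!Array.isArray(data)) throw new Error("expected an array of claims");
  return data.map((item, i) => {
    const c = item as Partial<PerceptionClaim>;
    if (typeof c.id !== "string" || typeof c.text !== "string" || typeof c.modality !== "string") {
      throw new Error(`claim ${i} is missing required fields`);
    }
    return { id: c.id, text: c.text, modality: c.modality };
  });
}
```

Failing at the boundary keeps a malformed output from silently corrupting every downstream reasoning step.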

Another challenge was visualization. Presenting causal reasoning without turning it into a dashboard or a chat interface required custom SVG-based graph rendering and deliberate UI restraint.
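At its core, that kind of custom SVG rendering reduces to mapping graph nodes and edges to SVG markup. The toy function below shows the idea with a trivial row layout; the function name, layout, and styling are hypothetical simplifications of whatever the real renderer does.

```typescript
// Toy version of custom SVG graph rendering: lay nodes out on a row
// and draw edges as straight lines between them.
interface GraphNode { id: string; label: string }
interface GraphEdge { from: string; to: string }

function renderGraphSvg(nodes: GraphNode[], edges: GraphEdge[]): string {
  // Assign each node a fixed position along a horizontal row.
  const pos = new Map(nodes.map((n, i) => [n.id, { x: 60 + i * 120, y: 50 }]));
  const lines = edges.map((e) => {
    const a = pos.get(e.from)!, b = pos.get(e.to)!;
    return `<line x1="${a.x}" y1="${a.y}" x2="${b.x}" y2="${b.y}" stroke="gray"/>`;
  });
  const circles = nodes.map((n) => {
    const p = pos.get(n.id)!;
    return `<circle cx="${p.x}" cy="${p.y}" r="18"/>` +
      `<text x="${p.x}" y="${p.y + 35}" text-anchor="middle">${n.label}</text>`;
  });
  // Edges are emitted first so node circles render on top of them.
  return `<svg width="${nodes.length * 120}" height="100">${[...lines, ...circles].join("")}</svg>`;
}
```

Emitting plain SVG strings (or JSX equivalents) keeps the rendering under full control, which is what makes the "deliberate UI restraint" possible compared with an off-the-shelf charting library.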


What We Learned

This project reinforced that multimodal AI becomes far more powerful when it is treated as a reasoning system, not a conversational interface. Gemini 3’s ability to reason across modalities and maintain context over multiple steps made it possible to build an application that exposes how conclusions are formed, not just what they are.


Future Directions

Future work could include incremental streaming inputs, richer causal graph interactions, and support for long-running reasoning sessions that evolve as new evidence is introduced.

Built With

  • gemini-3-flash-preview
  • gemini-3-pro-preview
  • google-ai-studio
  • google/genai-sdk
  • react
  • svg (custom visualization)
  • tailwind-css
  • typescript