Inspiration

The project is inspired by Anil Seth's work on controlled hallucination and Karl Friston's active inference framework. These theories explain perception not as passive input processing but as active, top-down prediction error minimization. Current multimodal AI demos rarely show this loop visually and computationally — they stop at captioning or generation. This project bridges that gap by externalizing the inference cycle in a way that feels like watching a brain think.

What it does

Active Inference Engine simulates a closed-loop perceptual system using Gemini 3.

  • User uploads an image or short video
  • Bottom-up stream extracts structured features (luminance, edges, objects, motion)
  • Top-down prior generates expected feature vector and predicted next frame
  • Computes a real prediction error (cosine distance between predicted and observed feature vectors) and a surprise score
  • Updates a persistent belief state via an exponential moving average (EMA)
  • Visualizes a surprise meter, region activation, and belief-vector evolution
  • Outputs a predicted motor readiness action with confidence (e.g., "Approach subject", "Maintain gaze")
  • Delivers a first-person report of the full cycle

The result is a dynamic, visual demonstration of predictive processing, prediction error, belief updating, and action preparation — not just image analysis.
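
A minimal sketch of one cycle of the loop, assuming numpy-style feature vectors (the α value, the 0.2 action threshold, and all names here are illustrative, not the exact prompt-chain internals):

```python
import numpy as np

ALPHA = 0.3  # EMA learning rate (illustrative value)

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Prediction error as 1 - cosine similarity between two feature vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def run_cycle(belief: np.ndarray, observed: np.ndarray) -> tuple[np.ndarray, float]:
    """One perception-action cycle: predict, compare, update."""
    predicted = belief                                 # top-down prior acts as the prediction
    surprise = cosine_distance(predicted, observed)    # prediction error / surprise score
    belief = (1 - ALPHA) * belief + ALPHA * observed   # EMA belief update
    return belief, surprise

# Toy example with a 4-dim feature vector (luminance, edges, objects, motion)
belief = np.array([0.5, 0.5, 0.5, 0.5])
observed = np.array([0.9, 0.4, 0.6, 0.1])  # bottom-up features from Gemini vision
belief, surprise = run_cycle(belief, observed)
action = "Approach subject" if surprise > 0.2 else "Maintain gaze"  # motor readiness
print(f"surprise={surprise:.3f}, action={action}")
```

In the app itself, this math runs inside Gemini's code-execution tool rather than in local Python, but the computation is the same.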

How we built it

Built entirely in Google AI Studio using Gemini 3 Pro.

  • Vibe Code to generate the frontend layout, dark neuroscience theme, and interactive brain diagram
  • Gemini multimodal vision for bottom-up feature extraction
  • High-reasoning mode (network_intelligence) for top-down prediction and motor mapping
  • Code execution tool to compute the cosine distance and EMA belief update, and to render matplotlib surprise plots
  • Image generation & editing for predicted next frame and error heatmaps
  • Structured output for clean feature vectors, surprise score, and conscious report (an illustrative schema follows this list)
  • No external frameworks, cloud services, or databases — 100% Gemini-native
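
For a sense of what the structured-output step returns, here is an illustrative schema (the field names are assumptions for this sketch; the real schema lives in the prompt chain):

```python
from typing import TypedDict

class CycleReport(TypedDict):
    """Illustrative shape of one cycle's structured output (hypothetical field names)."""
    feature_vector: list[float]    # bottom-up features (luminance, edges, objects, motion)
    predicted_vector: list[float]  # top-down expectation
    surprise: float                # cosine-distance prediction error, in [0, 1]
    belief_state: list[float]      # EMA-updated belief vector
    motor_action: str              # e.g., "Approach subject"
    confidence: float              # confidence in the motor action
    conscious_report: str          # first-person narration of the cycle
```

Asking for this shape up front is what keeps the outputs numeric instead of narrated.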

Challenges we ran into

  • AI Studio's Vibe Code struggles with complex animated flows and persistent state — forcing region activation, arrow flows, and surprise computation required multiple strict refinement prompts.
  • Gemini sometimes defaults to descriptive captioning instead of numerical feature vectors — fixed with explicit code-execution instructions (paraphrased in the sketch after this list).
  • Keeping the layout clean (no grids or collages) while adding computational depth was difficult — solved by ruthlessly removing visual clutter across iterations.
  • Motor prediction felt generic at first — improved by tying it directly to the belief-state vector.
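
To give a flavor of the captioning fix, here is a paraphrase of the kind of instruction that pushed Gemini from prose toward numeric vectors (the wording is illustrative, not the exact prompt):

```python
# Paraphrased, illustrative system instruction; the real prompt-chain wording differs.
FEATURE_EXTRACTION_INSTRUCTION = """
Do not describe the image in prose. Use the code execution tool and return
ONLY a JSON object containing a numeric feature_vector of length 4
(luminance, edges, objects, motion), with each value normalized to [0, 1].
"""
```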

Accomplishments that we're proud of

  • Built a working active inference loop inside pure AI Studio (no external code) — surprise score, belief update, and motor output are computed, not faked.
  • Visualized the top-down/bottom-up distinction with flowing signals, region activation, and surprise pulses — creating a genuine "watching a brain think" moment.
  • Heavy, non-trivial Gemini usage: vision + reasoning + code execution + image gen/editing in a single coherent cycle.
  • Delivered neuroscience authenticity without overclaiming consciousness — focused strictly on predictive processing mechanics.

What we learned

  • Gemini 3 Pro is extremely powerful for computational neuroscience prototypes when forced to output vectors and execute math — not just text.
  • Vibe Code can build interactive dashboards, but it needs very strict, iterative prompting to avoid falling back on generic vision patterns.
  • Active inference concepts (precision, surprise, belief updating) become far more intuitive when externalized visually and numerically.
  • Judges value demos that compute something measurable (surprise score, belief shift) over pure aesthetics.

What's next for Active Inference Engine

  • Add user-controlled precision weighting (attention) to modulate surprise influence (one possible mechanism is sketched after this list)
  • Extend to real-time webcam input for continuous loop
  • Integrate voice output for the conscious report (Gemini audio_spark)
  • Open-source the prompt chain to let others build domain-specific inference engines (e.g., emotion, interoception)
  • Explore scaling to multi-modal long-context sessions for emergent self-modeling
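
As a sketch of the first item, user-set precision could scale how strongly a surprising observation moves the belief state (assumed mechanics, not a committed design):

```python
import numpy as np

def precision_weighted_update(belief: np.ndarray, observed: np.ndarray,
                              surprise: float, precision: float,
                              alpha: float = 0.3) -> np.ndarray:
    """EMA belief update with the learning rate scaled by precision and surprise.

    precision in [0, 1] acts like attention: high precision lets surprising
    input move the belief further; low precision damps its influence.
    """
    effective_alpha = alpha * precision * surprise  # precision-weighted step size
    return (1 - effective_alpha) * belief + effective_alpha * observed
```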

Built With

  • activeinference
  • aistudiohackathon
  • gemini3
  • geminivision
  • generativeai
  • googleaistudio
  • imagegeneration
  • multimodalai
  • neurosciencesimulation
  • predictiveprocessing
  • vibecode
  • vision

Updates


Perceptual Inference Model update – v5.3: evolved from captioning + overlays to a real computational loop. It now does:

  • Bottom-up feature extraction (vector)
  • Top-down prediction vector
  • Cosine surprise score (an actual number)
  • Persistent belief state update
  • Motor readiness output (action + confidence)

The brain diagram on the right shows live signal flow, region activation, and surprise pulses.
