Inspiration

Modern AI systems rely on single large models to perform prediction tasks, but intelligence in the real world emerges from cooperation. Human reasoning arises from many specialized neural circuits working together, and even the intelligence of a society results from the collective thinking and work of many different people.

TRACI was inspired by this idea: instead of scaling a single model, what if we scale cooperation? By distributing inference across transformer-reduced agents and aggregating them through attention, we aim to create more robust, scalable, and cooperative intelligence.

What it does

TRACI allows users to solve complex prediction tasks using distributed, customizable cooperative agents that learn and reason in parallel. Because of TRACI’s distributed architecture, multiple transformer-reduced agents process tasks simultaneously and combine their representations through attention, producing more robust and scalable predictions.

How I built it

TRACI is structured as a distributed multi-expert system with an attention-based aggregator.

Each worker hosts an embedding module (FNN) that transforms raw inputs into learned representations. These modules act as independent experts, operating in parallel across GPUs. Instead of directly sharing rapidly changing model weights, each expert maintains a Polyak-averaged target network. Only these smoothed representations are exchanged across workers, ensuring stability and reducing variance in distributed training.
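
As a rough sketch of the soft update, assuming PyTorch (the layer sizes and the tau value are illustrative placeholders, not TRACI's actual configuration):

```python
import torch
import torch.nn as nn

def polyak_update(online: nn.Module, target: nn.Module, tau: float = 0.005) -> None:
    """Soft update: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    with torch.no_grad():
        for p_online, p_target in zip(online.parameters(), target.parameters()):
            p_target.mul_(1.0 - tau).add_(tau * p_online)

# Illustrative expert: a small feed-forward embedding module.
expert = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
target = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
target.load_state_dict(expert.state_dict())  # start both networks in sync

# After each optimizer step on `expert`, smooth the target copy;
# only the target's embeddings are shared with other workers.
polyak_update(expert, target, tau=0.005)
```

Because the target changes slowly, the representations other workers consume stay stable even while the online weights fluctuate between steps.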

The embeddings from all experts are stacked and passed to a central Transformer reducer. The Transformer treats each expert’s embedding as a token in a sequence, applies multi-head attention to model inter-expert relationships, and produces a final prediction. This allows TRACI to dynamically weight and combine expert perspectives rather than averaging them statically.
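
A minimal sketch of this reducer, again assuming PyTorch (the dimensions, head count, and class name are hypothetical, chosen only to show the token-per-expert idea):

```python
import torch
import torch.nn as nn

class TransformerReducer(nn.Module):
    """Attends across expert embeddings, treating each expert as one token."""
    def __init__(self, embed_dim=16, num_heads=4, num_layers=2, out_dim=1):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(embed_dim, out_dim)

    def forward(self, expert_embeddings):
        # expert_embeddings: (batch, num_experts, embed_dim)
        attended = self.encoder(expert_embeddings)  # multi-head attention across experts
        pooled = attended.mean(dim=1)               # combine the attended expert tokens
        return self.head(pooled)

# Stacked embeddings from, say, 4 experts for a batch of 8 inputs:
reducer = TransformerReducer()
stacked = torch.randn(8, 4, 16)   # (batch, experts, embed_dim)
prediction = reducer(stacked)     # shape (8, 1)
```

The attention weights give each input its own mixture of expert perspectives, rather than a fixed average.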

Challenges I ran into

Communication between agents required careful synchronization and ordering of messages. In addition, training was slowed by moving messages and model weights through queues. One way to enforce ordering is sketched below.
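
A hypothetical illustration of that ordering (not TRACI's actual protocol): tag each message with a per-sender step counter and buffer early arrivals until their turn comes:

```python
import queue
from dataclasses import dataclass, field

@dataclass(order=True)
class Message:
    step: int                           # monotonically increasing per sender
    payload: object = field(compare=False, default=None)

def drain_in_order(inbox: queue.Queue, expected_step: int, pending: dict):
    """Return messages in step order, parking out-of-order arrivals in `pending`."""
    ready = []
    while True:
        try:
            msg = inbox.get_nowait()
        except queue.Empty:
            break
        pending[msg.step] = msg         # may arrive out of order
    while expected_step in pending:     # release the contiguous prefix
        ready.append(pending.pop(expected_step))
        expected_step += 1
    return ready, expected_step
```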

Accomplishments that I'm proud of

I am delighted that the separately trained experts actually learn and improve their predictive ability over time. This demonstrates that components can train independently, provided soft (Polyak) updates keep the shared representations stable.

What I learned

I learned how independent agents can learn together when their interactions are carefully managed.

What's next for TRACI: Transformer-Reducing Agents Create Intelligence

I would like to improve the abstraction ability of TRACI by including more robust reasoning mechanisms while keeping TRACI as efficient and as distributed as possible.
