BitStorm

Screenshots: Geobit; UI - Main; UI - Dashboard overview for feature evaluation; UI - Full feature classification reasoning; UI - Feature classification and reasoning

Inspiration

Compliance decisions often bottleneck product velocity. Teams ship specs with geo- and age-based logic, but legal review is slow, inconsistent, and rarely documented well enough for audits. We wanted a system that keeps pace with product teams, makes evidence-backed decisions, and stays human-in-the-loop (HITL) friendly, so humans can override decisions when needed and the paper trail stays pristine.

What it does

BitStorm (a.k.a. ComplianceGuardAI+) ingests feature descriptions (via CSV upload or quick-add), normalises jargon, plans retrieval, collects evidence from a knowledge base (KB) and optionally the web, synthesises findings, and outputs an audit-ready decision with confidence, conditions, and citations. The frontend (FE) streams decisions row by row, and reviewers can approve or reject each one with a rationale. Everything is persisted for audits and can be re-hydrated by feature_id.
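The following is a minimal sketch of row-by-row streaming. It assumes decisions are emitted as newline-delimited JSON (NDJSON) and that the CSV has `feature_id` and `description` columns. `analyse_feature` is a hypothetical stand-in for the agent pipeline, and its labels are illustrative only. In a FastAPI app, an async generator like `stream_decisions` could be passed to `StreamingResponse` (for example with `media_type="application/x-ndjson"`). Whether BitStorm uses that exact wiring is an assumption.

```python
import asyncio
import csv
import io
import json
import uuid
from typing import AsyncIterator


async def analyse_feature(feature: dict) -> dict:
    """Hypothetical stand-in for the multi-agent pipeline (toy keyword rule)."""
    await asyncio.sleep(0)  # yield to the event loop, as a real agent call would
    text = (feature.get("description") or "").lower()
    label = "needs_review" if "age" in text or "geo" in text else "no_compliance_logic"
    return {"label": label}


async def stream_decisions(csv_text: str) -> AsyncIterator[str]:
    """Yield one NDJSON line per CSV row as soon as that row has been analysed."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Reuse the supplied id, or mint a UUID so the row can be re-hydrated later.
        feature_id = row.get("feature_id") or str(uuid.uuid4())
        result = await analyse_feature(row)
        yield json.dumps({"feature_id": feature_id, **result}) + "\n"


async def main() -> None:
    sample = "feature_id,description\nf-1,Age gate for EU teens\nf-2,New emoji picker\n"
    async for line in stream_decisions(sample):
        print(line, end="")


if __name__ == "__main__":
    asyncio.run(main())
```

Because each row is yielded as soon as it is analysed, the table can fill in progressively instead of waiting for the whole CSV.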

Solution Overview

We built a multi-agent architecture, powered by the OpenAI Agents SDK, to automate complex regulatory analysis workflows.

Workflow architecture:

(Pre-screener Agent -> Jargon Agent -> Analysis Planner Agent -> Retriever Agent -> Synthesiser Agent -> Reviewer Agent -> Summariser Component)

Pre-screener Agent: First-layer validation that flags business-only or ambiguous specs before they enter the analysis pipeline.
Jargon Agent: Equipped with RAG and web search, this agent maps acronyms to plain terms so downstream prompts operate on standardised language.
Analysis Planner Agent: Plans retrieval intents (queries + soft tags) from the standardised spec. The Retriever Agent uses these intents to gather evidence.
Retriever Agent: Queries an internal mini knowledge base (KB), plus optional web search, and returns structured evidence with proper citations.
Synthesiser Agent: Turns structured evidence into findings. Each finding carries its supporting evidence and any open questions that need further investigation.
Reviewer Agent: Scores findings, penalises uncertainty and blocked questions, and returns a decision with its justification.
Summariser Component: Shapes the final event that is yielded to the frontend when the workflow completes.
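The hand-offs above can be pictured as a sequence of stages that each read and enrich one shared state object. This is a plain-Python illustration under stated assumptions, not the OpenAI Agents SDK implementation. Every stage is a toy deterministic function standing in for an LLM agent. `State`, `GLOSSARY`, `MINI_KB`, the labels, and the confidence formula are all hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical glossary and mini-KB; the real agents use RAG and web search.
GLOSSARY = {"ASL": "age-sensitive logic", "GH": "geo-handler"}
MINI_KB = {
    "age": {"text": "Minors need age-appropriate defaults.", "source": "KB-001"},
    "geo": {"text": "Rules differ by jurisdiction.", "source": "KB-002"},
}


@dataclass
class State:
    """Shared state that every stage reads and enriches."""
    spec: str
    normalised: str = ""
    jargon: dict[str, str] = field(default_factory=dict)
    intents: list[str] = field(default_factory=list)
    evidence: list[dict] = field(default_factory=list)
    findings: list[dict] = field(default_factory=list)
    decision: Optional[dict] = None


def pre_screen(s: State) -> State:
    # Toy rule: specs under three words are too thin to analyse.
    if len(s.spec.split()) < 3:
        s.decision = {"label": "ambiguous", "confidence": 0.0, "citations": []}
    return s


def normalise_jargon(s: State) -> State:
    # Expand known acronyms and keep the mapping as context for later stages.
    pattern = r"\b(" + "|".join(map(re.escape, GLOSSARY)) + r")\b"
    s.jargon = {term: GLOSSARY[term] for term in re.findall(pattern, s.spec)}
    s.normalised = re.sub(pattern, lambda m: GLOSSARY[m.group(1)], s.spec)
    return s


def plan(s: State) -> State:
    # Toy planner: one retrieval intent per KB topic found by substring match.
    text = s.normalised.lower()
    s.intents = [topic for topic in MINI_KB if topic in text]
    return s


def retrieve(s: State) -> State:
    s.evidence = [MINI_KB[intent] for intent in s.intents]
    return s


def synthesise(s: State) -> State:
    # Each finding keeps the source that supports it.
    s.findings = [{"claim": e["text"], "supports": [e["source"]]} for e in s.evidence]
    return s


def review(s: State) -> State:
    cited = [f for f in s.findings if f["supports"]]
    s.decision = {
        "label": "needs_compliance_review" if cited else "no_compliance_logic",
        "confidence": round(min(0.5 + 0.2 * len(cited), 0.95), 2),
        "citations": sorted({src for f in cited for src in f["supports"]}),
    }
    return s


def summarise(s: State) -> dict:
    # Shape the final event yielded to the frontend.
    return {"event": "complete", "jargon": s.jargon, **(s.decision or {})}


STAGES: list[Callable[[State], State]] = [
    pre_screen, normalise_jargon, plan, retrieve, synthesise, review,
]


def run_pipeline(spec: str) -> dict:
    state = State(spec=spec)
    for stage in STAGES:
        state = stage(state)
        if stage is pre_screen and state.decision is not None:
            break  # ambiguous specs never reach the analysis stages
    return summarise(state)


print(run_pipeline("Enable GH rollout with ASL for teen users"))
```

Each stage is an independent `State -> State` function. Replacing a toy stage with a real agent call therefore changes one entry in `STAGES`, not the pipeline itself.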

All workflow runs are traced with the OpenAI Agents SDK's built-in tracing. The resulting audit logs capture decision rationale, confidence scores, and evidence sources for regulatory compliance.

Challenges we ran into

Hallucination pressure: Mitigated with retrieval-first prompts, schema-gated outputs, and a Summariser that refuses uncited claims.
Evidence contradiction: Findings can disagree. The Reviewer aggregates them with penalties and may reject a decision when the information is insufficient.
Vocabulary drift: The Jargon Agent keeps inputs stable. The State object carries the normalised terms as context, so every downstream LLM call uses the same vocabulary.
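Here is a minimal sketch of how the Reviewer's penalties and the Summariser's citation gate might combine. The penalty weights, the acceptance threshold, the per-finding `support` score, and the `accept`/`reject` labels are hypothetical choices for illustration. The real Reviewer is an LLM agent, and its exact scoring rules are not specified here.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical weights; in practice these would be tuned against reviewed decisions.
OPEN_PENALTY = 0.05      # per unresolved open question
BLOCKED_PENALTY = 0.15   # per question the evidence cannot answer
ACCEPT_THRESHOLD = 0.6


@dataclass
class Finding:
    claim: str
    support: float                      # 0..1 strength from the synthesiser (assumed)
    citations: list[str] = field(default_factory=list)
    open_questions: int = 0
    blocked_questions: int = 0


def gate_uncited(findings: list[Finding]) -> list[Finding]:
    """Summariser-style gate: a finding with no citation is never used."""
    return [f for f in findings if f.citations]


def review(findings: list[Finding]) -> dict:
    cited = gate_uncited(findings)
    if not cited:
        return {"decision": "reject", "reason": "insufficient cited evidence",
                "confidence": 0.0, "dropped": len(findings)}
    # Penalise each cited finding for its uncertainty, never going below zero.
    scores = [
        max(f.support - OPEN_PENALTY * f.open_questions
            - BLOCKED_PENALTY * f.blocked_questions, 0.0)
        for f in cited
    ]
    confidence = round(sum(scores) / len(scores), 2)
    decision = "accept" if confidence >= ACCEPT_THRESHOLD else "reject"
    return {"decision": decision, "confidence": confidence,
            "dropped": len(findings) - len(cited)}


findings = [
    Finding("Feature gates content by user age in the EU", 0.9, ["KB-001"]),
    Finding("Rollout varies by region", 0.8, ["KB-002"], open_questions=1, blocked_questions=1),
    Finding("Probably needs parental consent", 0.95),  # uncited, so it is dropped
]
print(review(findings))  # {'decision': 'accept', 'confidence': 0.75, 'dropped': 1}
```

Dropping uncited findings before scoring means a confident but unsupported claim cannot raise the final confidence.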

What we learned

Retrieval-augmented prompts dramatically reduce hallucinations when the Summariser only accepts cited facts.
Sorted data makes the decision output and confidence scores more predictable and stable.
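One possible reading of "sorted data" is canonical ordering. If evidence is serialised in a canonical order before it enters a prompt, the same evidence set always produces byte-identical context, regardless of the order in which retrieval returned it. This sketch assumes evidence items are dicts with `source` and `text` keys (hypothetical field names). It removes input-ordering noise only; sampling in the model itself can still vary.

```python
from __future__ import annotations

import hashlib
import json


def build_context(evidence: list[dict]) -> str:
    """Serialise evidence in a canonical order: sorted items, sorted keys."""
    ordered = sorted(evidence, key=lambda e: (e["source"], e["text"]))
    return json.dumps(ordered, sort_keys=True, ensure_ascii=False)


def fingerprint(context: str) -> str:
    """Short hash, handy for logging that two runs saw identical context."""
    return hashlib.sha256(context.encode("utf-8")).hexdigest()[:12]


run_a = [{"source": "KB-002", "text": "Rules differ by jurisdiction."},
         {"source": "KB-001", "text": "Minors need age-appropriate defaults."}]
run_b = list(reversed(run_a))  # same evidence, different retrieval order

print(fingerprint(build_context(run_a)) == fingerprint(build_context(run_b)))  # True
```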

Accomplishments that we're proud of

Robust multi-agent architecture that is highly maintainable and scalable
- Each agent has a single responsibility and a clear interface, so agents can be updated independently without breaking the pipeline
- The modular design makes it easy to add new agents (e.g., specialised compliance checkers) without architectural changes
Asynchronous, modular, production-ready FastAPI backend
- An agent_service singleton is initialised once at application startup, so agents are not rebuilt on every request
- Dependency injection gives every API endpoint the same, consistently managed agent state
End-to-end streaming from spec upload → decision UI with explanations + citations.
- HITL loop that’s one click from the table, persists rationale, and updates the audit trail.
- Stable identifiers (feature_id UUID) that make re-hydration and cross-tool links trivial.
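The HITL loop and stable identifiers can be sketched as an append-only audit trail keyed by feature_id. This in-memory `AuditTrail` is a hypothetical stand-in for whatever persistent store BitStorm actually uses, and the event fields and action names are assumptions. The trail stays usable for audits because every override requires a rationale and past events are never mutated.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    feature_id: str
    actor: str        # "agent" or a reviewer id (hypothetical convention)
    action: str       # e.g. "decision", "approve", "reject"
    rationale: str
    at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


class AuditTrail:
    """Append-only, in-memory stand-in for a persisted audit store."""

    def __init__(self) -> None:
        self._events: list[AuditEvent] = []

    def record(self, event: AuditEvent) -> None:
        self._events.append(event)

    def review(self, feature_id: str, reviewer: str, approve: bool, rationale: str) -> AuditEvent:
        # Every human override must explain itself.
        if not rationale.strip():
            raise ValueError("a rationale is required for every human review")
        event = AuditEvent(feature_id, reviewer, "approve" if approve else "reject", rationale)
        self.record(event)
        return event

    def rehydrate(self, feature_id: str) -> list[AuditEvent]:
        """Return the full history for one feature, oldest first."""
        return [e for e in self._events if e.feature_id == feature_id]

    def latest(self, feature_id: str) -> AuditEvent | None:
        history = self.rehydrate(feature_id)
        return history[-1] if history else None


trail = AuditTrail()
fid = str(uuid.uuid4())  # stable id minted once at ingestion
trail.record(AuditEvent(fid, "agent", "decision", "Age gate scoped to EU; cites KB-001"))
trail.review(fid, "reviewer-1", approve=False, rationale="Business A/B test, not a legal requirement")
for event in trail.rehydrate(fid):
    print(event.action, event.actor, event.rationale)
```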

What's next for BitStorm

Improving the HITL pipeline: Let the model learn from human overrides so that similar features are classified correctly in future.
Auto-rechecks: Nightly re-analysis when laws/policies change, pushing deltas to Review Queue.
A2A framework: Standardised agent-to-agent communication.
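The planned auto-rechecks could reduce to a diff between stored decisions and a fresh run. In this sketch, `analyse` is any callable that maps a spec to a decision dict with `label` and `confidence` keys (an assumed shape). `review_queue` is a hypothetical stand-in for the Review Queue. Only changed or new features are returned, so reviewers receive deltas rather than the full backlog.

```python
from __future__ import annotations

from collections import deque
from typing import Callable

Decision = dict  # assumed shape: {"label": str, "confidence": float}


def recheck(previous: dict[str, Decision], specs: dict[str, str],
            analyse: Callable[[str], Decision]) -> list[dict]:
    """Re-analyse every spec and return only the features whose decision changed."""
    deltas = []
    for feature_id, spec in sorted(specs.items()):
        new = analyse(spec)
        old = previous.get(feature_id)
        # New features, or a changed label/confidence, count as a delta.
        if old is None or (old["label"], old["confidence"]) != (new["label"], new["confidence"]):
            deltas.append({"feature_id": feature_id, "before": old, "after": new})
    return deltas


def analyse_after_law_change(spec: str) -> Decision:
    # Hypothetical: after a new regulation, every spec now needs review.
    return {"label": "needs_compliance_review", "confidence": 0.9}


previous = {
    "f-1": {"label": "no_compliance_logic", "confidence": 0.5},
    "f-2": {"label": "needs_compliance_review", "confidence": 0.9},
}
specs = {"f-1": "Teen DM limits", "f-2": "EU age gate"}

review_queue = deque(recheck(previous, specs, analyse_after_law_change))
print([d["feature_id"] for d in review_queue])  # ['f-1']
```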