Cortex — AI Chief of Staff

Inspiration

As engineering teams scale from 20 to 200+ people, a dangerous pattern emerges: the leaders who need the most complete picture are the last to know when something is about to break. A security violation sits open for 17 days before the CTO hears about it. An engineer works 56 hours/week for two consecutive weeks — no one notices until they resign. A critical infrastructure decision is deadlocked for 14 days because no one owns it.

We asked ourselves: What if a CTO had an AI chief of staff that read every email, every Slack thread, every Jira ticket — and surfaced only what matters, before it becomes a crisis?

That question became Cortex.

What It Does

Cortex is an AI-powered organisational intelligence platform that acts as a virtual Chief of Staff for engineering leaders. It continuously monitors communication across Email, Slack, Jira, and Git — using a 6-agent AI pipeline powered by Amazon Nova Pro — to:

- Extract decisions from meetings and messages, with confidence scores and ownership tracking
- Detect cross-team conflicts (e.g., one team deploying Basic Auth while another mandates OAuth 2.0)
- Surface shadow topics: emerging risks being discussed in DMs with no formal owner or ticket
- Monitor organisational health with a real-time score computed from ML signals, not surveys:

$$ H = w_1 \cdot S_{jira} + w_2 \cdot S_{comm} + w_3 \cdot S_{git} + w_4 \cdot S_{capacity} $$

- Brief the CTO every morning with a voice-first AI assistant (ARIA) powered by ElevenLabs

A CTO can go from zero context to complete organisational visibility in 12 minutes, without opening Slack, email, or Jira.
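
As a rough illustration of the health score $H$ above, the weighted sum can be sketched in a few lines of Python. The weight values, the sample sub-scores, and the assumptions that each sub-score lies in 0 to 100 and that the weights sum to 1 are ours for the sake of the example; the write-up does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthSignals:
    """Sub-scores S_jira, S_comm, S_git, S_capacity (assumed range 0-100)."""
    jira: float
    comm: float
    git: float
    capacity: float


# Hypothetical weights w_1..w_4; assumed to sum to 1 so H stays in 0-100.
WEIGHTS = {"jira": 0.3, "comm": 0.2, "git": 0.2, "capacity": 0.3}


def health_score(signals: HealthSignals, weights: dict[str, float] = WEIGHTS) -> float:
    """Compute H = w_1*S_jira + w_2*S_comm + w_3*S_git + w_4*S_capacity."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    return sum(weights[name] * getattr(signals, name) for name in weights)


# Made-up sub-scores, purely for illustration.
example = HealthSignals(jira=80.0, comm=70.0, git=90.0, capacity=50.0)
score = health_score(example)
```

Keeping the weights in one mapping makes it easy to see how much a single weak signal (here, capacity) pulls the overall score down.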

How We Built It

The 6-Agent LangGraph Pipeline

At the core of Cortex is a LangGraph-orchestrated pipeline of 6 specialised AI agents, each powered by Amazon Nova Pro via Bedrock:

$$\text{Event} \xrightarrow{\text{Archivist}} \xrightarrow{\text{Analyst}} \xrightarrow{\text{Auditor}} \xrightarrow{\text{Inspector}} \xrightarrow{\text{Director}} \xrightarrow{\text{Briefer}} \text{Intelligence}$$

- Archivist: retrieves sender history, recent decisions, and trending topics from the knowledge graph
- Analyst: extracts decisions, tasks, and claims with confidence scores as structured JSON
- Auditor: validates extraction quality and flags low-confidence items
- Inspector: detects contradictions against existing knowledge (conflict detection)
- Director: determines routing, i.e. who to notify and what to escalate
- Briefer: generates executive summaries, headlines, and action items

We chose 6 specialised agents over a single monolithic prompt because each agent focuses on one cognitive task, which reduces hallucination and makes every extraction independently verifiable.
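
The sequential hand-off between the six agents can be sketched as follows. This is a plain-Python stand-in for the idea of a shared state passed from node to node, not LangGraph's API and not the project's actual code: the agent bodies are stubs with no model calls, and every field name and the 0.5 confidence threshold are hypothetical.

```python
from typing import Callable, TypedDict


class PipelineState(TypedDict, total=False):
    event: str               # raw message or transcript
    context: list[str]       # Archivist output
    extractions: list[dict]  # Analyst output
    validated: list[dict]    # Auditor output
    conflicts: list[str]     # Inspector output
    routing: list[str]       # Director output
    briefing: str            # Briefer output


Agent = Callable[[PipelineState], PipelineState]


def archivist(state: PipelineState) -> PipelineState:
    # Stub: a real agent would query the knowledge graph for history.
    return {**state, "context": []}


def analyst(state: PipelineState) -> PipelineState:
    # Stub: a real agent would ask the model for structured JSON.
    item = {"type": "decision", "text": state["event"], "confidence": 0.9}
    return {**state, "extractions": [item]}


def auditor(state: PipelineState) -> PipelineState:
    # Keep only items at or above a hypothetical confidence threshold.
    kept = [x for x in state["extractions"] if x["confidence"] >= 0.5]
    return {**state, "validated": kept}


def inspector(state: PipelineState) -> PipelineState:
    # Stub: no contradictions are detected in this sketch.
    return {**state, "conflicts": []}


def director(state: PipelineState) -> PipelineState:
    # Escalate to the CTO only when conflicts exist; otherwise notify the owner.
    target = "cto" if state["conflicts"] else "owner"
    return {**state, "routing": [target]}


def briefer(state: PipelineState) -> PipelineState:
    summary = f"{len(state['validated'])} item(s), {len(state['conflicts'])} conflict(s)"
    return {**state, "briefing": summary}


PIPELINE: list[Agent] = [archivist, analyst, auditor, inspector, director, briefer]


def run_pipeline(event: str) -> PipelineState:
    """Pass one event through all six agents in order."""
    state: PipelineState = {"event": event}
    for agent in PIPELINE:
        state = agent(state)
    return state


result = run_pipeline("We will standardise on OAuth 2.0.")
```

Because each agent only reads and adds named fields, any single step can be inspected or replaced without touching the others.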

Architecture

- Backend: FastAPI + PostgreSQL + Neo4j (knowledge graph with temporal versioning) + Redis + Celery
- Frontend: Next.js 14 (App Router) + Chakra UI + Cytoscape.js for interactive graph visualisation
- Data Integration: 4 MCP (Model Context Protocol) servers for Email, Slack, Jira, and Git, enabling plug-and-play data source swapping
- Voice: Amazon Nova Pro for reasoning + ElevenLabs eleven_turbo_v2_5 (Rachel voice) for natural speech
- Real-time: WebSocket for live notifications without polling
- Infrastructure: Docker Compose orchestrating 10 services

Why Amazon Nova Pro?

We selected Nova Pro for four critical capabilities:

| Requirement | Why Nova Pro Excels |
| --- | --- |
| Structured JSON Output | Decision extraction requires precise nested fields: owners, teams, confidence scores. Nova Pro consistently produces valid structured output |
| Instruction Following | Each of our 6 agents has a highly specific system prompt. Nova Pro follows complex multi-step instructions without drift |
| Low-Latency via Bedrock | Meeting transcripts need near-real-time processing. Bedrock's infrastructure provides consistent low-latency inference |
| Factual Precision | Extracting real names and ticket IDs (e.g., "SEC-STD-012") from noisy conversational text requires high accuracy |

Shadow Topic Detection

One of our most novel features uses semantic clustering across all channels. We compute topic similarity using:

$$\text{similarity}(t_i, t_j) = \frac{\vec{v_i} \cdot \vec{v_j}}{|\vec{v_i}| \cdot |\vec{v_j}|}$$

When a cluster of semantically related messages crosses a volume threshold with no associated ticket, decision, or owner, it surfaces as a shadow topic with an urgency score:

$$U = \alpha \cdot \text{frequency} + \beta \cdot \text{cross\_team\_spread} + \gamma \cdot \text{velocity} + \delta \cdot (1 - \text{formal\_coverage})$$

This caught risks in our demo that no existing tool would surface — like "Caching Reliability" (urgency 94) being discussed across 3 teams for weeks without anyone creating a ticket.
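
The two formulas above, together with the "volume threshold and no formal coverage" rule, can be sketched in plain Python. The coefficient values, the `min_messages` threshold, and the assumption that every input is normalised to the range 0 to 1 are hypothetical choices for illustration; the write-up does not give them, and the demo's urgency of 94 suggests the real score is reported on a 0 to 100 scale.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """similarity(t_i, t_j) = (v_i . v_j) / (|v_i| * |v_j|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero embedding as unrelated to everything
    return dot / (norm_u * norm_v)


def is_shadow_topic(message_count: int, has_ticket: bool, has_decision: bool,
                    has_owner: bool, min_messages: int = 10) -> bool:
    """A cluster qualifies when it is large enough and nothing formal covers it."""
    uncovered = not (has_ticket or has_decision or has_owner)
    return message_count >= min_messages and uncovered


def urgency(frequency: float, cross_team_spread: float, velocity: float,
            formal_coverage: float, alpha: float = 0.25, beta: float = 0.25,
            gamma: float = 0.25, delta: float = 0.25) -> float:
    """U = alpha*frequency + beta*cross_team_spread + gamma*velocity
           + delta*(1 - formal_coverage), with all inputs assumed in [0, 1]."""
    return (alpha * frequency + beta * cross_team_spread
            + gamma * velocity + delta * (1.0 - formal_coverage))


# Made-up inputs: a frequent, fairly widespread topic with no formal coverage.
u_example = urgency(frequency=1.0, cross_team_spread=0.8, velocity=0.6, formal_coverage=0.0)
```

Note that the last term rewards the absence of formal coverage, so a topic with a ticket and an owner scores lower than an identical one without them.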

Challenges We Faced

Multi-Agent Coordination

Getting 6 agents to work reliably in sequence, where each agent's output becomes the next agent's input, required careful state management. LangGraph's state graph abstraction was essential, but debugging pipeline failures where Agent 3 rejected Agent 2's output taught us the importance of structured intermediate schemas between each agent.
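
A structured intermediate schema of the kind described here might look like the sketch below. The three kinds (decision, task, claim) and the confidence score come from the Analyst's description; the field names, the `parse_extraction` helper, and the 0 to 1 confidence range are hypothetical.

```python
from dataclasses import dataclass

ALLOWED_KINDS = {"decision", "task", "claim"}
REQUIRED_FIELDS = {"kind", "text", "owner", "confidence"}


@dataclass(frozen=True)
class Extraction:
    """One item handed from an upstream agent to the next one."""
    kind: str
    text: str
    owner: str
    confidence: float


def parse_extraction(raw: dict) -> Extraction:
    """Validate a raw item; raise ValueError instead of passing bad data on."""
    missing = REQUIRED_FIELDS - raw.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if raw["kind"] not in ALLOWED_KINDS:
        raise ValueError(f"unknown kind: {raw['kind']!r}")
    confidence = float(raw["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return Extraction(kind=raw["kind"], text=str(raw["text"]),
                      owner=str(raw["owner"]), confidence=confidence)


# Made-up example item.
item = parse_extraction(
    {"kind": "decision", "text": "Adopt OAuth 2.0", "owner": "platform-team", "confidence": 0.9}
)
```

Failing at the boundary with a specific message makes it clearer which agent produced the malformed output.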

Knowledge Graph Versioning

Decisions evolve. A decision proposed on Monday might be confirmed Wednesday and deprecated Friday. We needed Neo4j nodes with valid_from and valid_to timestamps to enable time-travel queries, that is, viewing the organisational state at any past date. Getting this right with concurrent writes from the Celery worker was challenging.
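
The time-travel idea can be illustrated with an in-memory sketch. It mirrors the valid_from and valid_to fields mentioned above but does not use Neo4j; the `DecisionVersion` type, the `state_as_of` helper, the "DEC-1" identifier, and the half-open interval convention (valid_from inclusive, valid_to exclusive) are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class DecisionVersion:
    decision_id: str
    status: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means this version is still current


def state_as_of(versions: list[DecisionVersion], when: datetime) -> dict[str, DecisionVersion]:
    """Return the version of each decision that was valid at the given moment."""
    return {
        v.decision_id: v
        for v in versions
        if v.valid_from <= when and (v.valid_to is None or when < v.valid_to)
    }


# Proposed on a Monday, confirmed on Wednesday, deprecated on Friday.
history = [
    DecisionVersion("DEC-1", "proposed", datetime(2026, 3, 2), datetime(2026, 3, 4)),
    DecisionVersion("DEC-1", "confirmed", datetime(2026, 3, 4), datetime(2026, 3, 6)),
    DecisionVersion("DEC-1", "deprecated", datetime(2026, 3, 6)),
]

thursday_view = state_as_of(history, datetime(2026, 3, 5))
```

Closing one version and opening the next with the same timestamp keeps the intervals gap-free, which is the part that concurrent writers have to get right.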

Conflict Detection at Scale

Detecting contradictions requires comparing every new extraction against the entire existing knowledge graph. Naive pairwise comparison is $O(n^2)$. We solved this with semantic indexing via pgvector: new claims are compared only against semantically similar existing claims, reducing the search space dramatically.
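
The shortlist step can be sketched as below. This is an in-memory stand-in, not pgvector: it scans every stored vector, whereas a vector index would perform the lookup itself more cheaply. What it shows is that the expensive contradiction check only needs to run on the few most similar claims. The claim IDs, the toy 3-dimensional embeddings, `k`, and `min_similarity` are all hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_claims(new_vec: list[float], index: dict[str, list[float]],
                   k: int = 2, min_similarity: float = 0.5) -> list[str]:
    """Return up to k existing claim IDs most similar to the new claim."""
    scored = [(cosine(new_vec, vec), claim_id) for claim_id, vec in index.items()]
    scored = [pair for pair in scored if pair[0] >= min_similarity]
    scored.sort(reverse=True)  # highest similarity first
    return [claim_id for _, claim_id in scored[:k]]


# Toy embeddings: two authentication claims and one unrelated caching claim.
claim_index = {
    "auth-oauth": [0.9, 0.1, 0.0],
    "auth-basic": [0.8, 0.2, 0.1],
    "cache-ttl": [0.0, 0.1, 0.9],
}

candidates = nearest_claims([1.0, 0.0, 0.0], claim_index)
```

Only the returned candidates would then be passed to the contradiction check, so an unrelated claim such as the caching one is never compared.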

Voice-First UX

Building a voice assistant that provides genuinely useful organisational answers (with real names, ticket IDs, and specific actions) required giving ARIA access to the full organisational context in every prompt: conflicts, decisions, people, and recent communications. Balancing context richness with response latency was a constant tradeoff.

Docker Orchestration

Coordinating 10 services (PostgreSQL, Neo4j, Redis, FastAPI, Celery worker, 4 MCP servers, Next.js) with proper health checks, dependency ordering, and environment variable passing required extensive Docker Compose tuning. Services needed to wait for databases to be truly ready, not just running.

What We Learned

- Specialised agents beat general-purpose prompts. A single LLM call trying to extract decisions, detect conflicts, and generate summaries simultaneously produces unreliable results. Separating concerns across 6 agents dramatically improved accuracy.
- The most dangerous risks are the ones nobody has named. Shadow topic detection, which finds patterns in conversations that haven't been formalised, was the feature that surprised us most. It finds things no dashboard can show because they don't exist as structured data yet.
- Amazon Nova Pro's structured output quality is production-grade. For our use case of extracting precise JSON with nested fields from noisy conversational text, Nova Pro via Bedrock delivered consistent, low-latency results that made the 6-agent pipeline viable.
- Voice changes everything. When a CTO can ask "What's urgent today?" and hear a complete answer in 8 seconds, with real names and specific actions, it fundamentally changes how they interact with organisational data.

What's Next

- Predictive risk scoring using historical pattern analysis
- Multi-organisation support with role-based access control
- Mobile companion app for on-the-go briefings
- Fine-tuned Nova Pro model specifically for organisational decision extraction

Built With

Updates

Raj Singh started this project — Mar 16, 2026 07:10 PM EDT
