AlComm — Your AI-Powered Unified Communications Command Center

What if every message you receive — email, Slack, LinkedIn DM — landed in one inbox, and an AI agent decided what to do with it before you even looked?


Inspiration

We're drowning. The average knowledge worker checks 3-5 communication platforms dozens of times a day. Gmail, Slack, LinkedIn, Teams — each with its own inbox, its own notifications, its own mental overhead. We built AlComm because we were tired of context-switching between apps just to figure out what actually needs our attention.

The breakthrough insight: most messages don't need you. Newsletters, automated notifications, FYI threads — they're noise. What if an AI agent could triage your inbox like a chief of staff, surfacing only what matters and taking action on the rest?

That's AlComm.


What It Does

AlComm is a unified communications dashboard that aggregates Gmail, Slack, and any web-based chat app (LinkedIn, etc.) into a single Discord-like interface — powered by autonomous AI agents that don't just classify your messages, but act on them.

The Core Experience

One inbox, every platform. Gmail threads, Slack channels, LinkedIn DMs — all in a clean three-panel layout. No more tab-switching.

AI agents that think, not just label. Five autonomous agents run continuously:

| Agent      | What It Does                                             | Trigger                   |
|------------|----------------------------------------------------------|---------------------------|
| Triage     | Classifies, prioritizes, extracts todos, drafts replies  | Every inbound message     |
| Follow-up  | Tracks commitments, nudges on overdue items              | Every 30 minutes          |
| Scheduling | Detects meeting requests, creates RSVP todos             | Calendar content detected |
| Digest     | Generates actionable daily/weekly briefings              | On-demand or daily        |
| Proactive  | Surfaces volume spikes, response pattern anomalies       | Every 2 hours             |

Each agent has access to 31 tools — classify, prioritize, create todos, draft replies, snooze, archive, star, mute, create Asana tasks, search history, and more. The AI decides which tools to use based on the message. No hardcoded pipeline.

Connect any chat app — no API required. Our browser connector uses Playwright to let you log into any web app (LinkedIn, Instagram, Twitter) through an embedded browser panel. An AI vision agent then automatically discovers the chat UI's CSS selectors, and AlComm starts polling for messages headlessly. It's like giving AlComm eyes.

Key Features

  • Smart tab classification with cross-classification (threads can appear in multiple tabs)
  • Intent-based filtering — tab descriptions are natural language, not regex
  • Priority scoring (0.0-1.0) with urgency detection
  • AI draft suggestions with tone control
  • Reply / Reply All for Gmail with rich text editor
  • Scheduled send and undo send (5-second window)
  • Todo extraction from message content
  • Follow-up reminders from outbound commitments
  • Contact intelligence — response patterns, interaction history
  • Tone analysis on every message
  • Smart compose — describe what you want to say, AI writes it
  • Weekly digest — actionable briefing, not just a summary
  • Newsletter management — one-click unsubscribe + bulk archive
  • Calendar RSVP from email invites
  • Asana integration — agents auto-create tasks for actionable items

How We Built It

Architecture

Frontend (React/Vite)          Backend (FastAPI)              AI Layer
┌──────────────────┐    ┌───────────────────────┐    ┌─────────────────────┐
│  3-Panel Layout   │◄──►│  18 API Routers       │    │  5 Autonomous Agents│
│  32 Components    │    │  WebSocket Manager    │◄──►│  31 Registered Tools│
│  Zustand Stores   │    │  Gmail/Slack Pollers  │    │  OpenRouter (GPT)   │
│  RichTextEditor   │    │  Browser Connector    │    │  Fallback Chain     │
└──────────────────┘    └───────────────────────┘    └─────────────────────┘
                              │
                    ┌─────────┴─────────┐
                    │   PostgreSQL DB    │
                    │   13+ tables       │
                    │   JSONB metadata   │
                    └───────────────────┘

Stack: FastAPI (Python) + React (Vite/TypeScript) + PostgreSQL + Redis

AI Model: OpenAI GPT-5.4 Nano via OpenRouter BYOK — chosen for its strong tool-calling capabilities at minimal cost (~$15/month for full usage).

Codebase: ~20,000 lines across 4,000+ files, 15 test suites, 18 API routers.

The Agent System

This is where AlComm gets interesting. We didn't build an LLM wrapper — we built an agentic system.

The difference:

$$\text{LLM Wrapper: } \text{input} \xrightarrow{\text{fixed pipeline}} \text{output}$$

$$\text{Agent: } \text{input} \xrightarrow{\text{LLM decides}} \text{tool}_1 \xrightarrow{\text{observe}} \text{tool}_2 \xrightarrow{\text{observe}} \cdots \xrightarrow{\text{done}} \text{output}$$

Each agent receives a message along with a set of available tools (as OpenAI-format function definitions). The LLM then decides which tools to call, and in what order, based on the message content. A newsletter gets archived. A CEO email gets prioritized, a todo extracted, and a draft generated. A meeting invite gets an RSVP todo and a scheduling delegation.
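
To make the tool hand-off concrete, here is a minimal sketch of how a single tool might be registered and exposed in the OpenAI function-calling format. The registry, the `tool` decorator, and the `archive_thread` handler are hypothetical names chosen for illustration; they are not taken from AlComm's codebase.

```python
from typing import Any, Callable

# Hypothetical registry: tool name -> (JSON schema shown to the LLM, Python handler).
TOOLS: dict[str, tuple[dict[str, Any], Callable[..., Any]]] = {}


def tool(name: str, description: str, parameters: dict[str, Any]):
    """Register a function as a tool the LLM is allowed to call."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        schema = {
            "type": "function",
            "function": {
                "name": name,
                "description": description,
                "parameters": parameters,  # a JSON Schema object
            },
        }
        TOOLS[name] = (schema, fn)
        return fn
    return decorator


@tool(
    name="archive_thread",
    description="Archive a thread that needs no action from the user.",
    parameters={
        "type": "object",
        "properties": {"thread_id": {"type": "string"}},
        "required": ["thread_id"],
    },
)
def archive_thread(thread_id: str) -> dict[str, Any]:
    # Stub for illustration: a real handler would update the database.
    return {"thread_id": thread_id, "archived": True}


def tool_schemas() -> list[dict[str, Any]]:
    """The list that would be sent as `tools` in a chat-completions request."""
    return [schema for schema, _ in TOOLS.values()]


def call_tool(name: str, arguments: dict[str, Any]) -> Any:
    """Dispatch a tool call chosen by the model to its Python handler."""
    _, fn = TOOLS[name]
    return fn(**arguments)
```

With a registry like this, adding a capability means adding one decorated function; the model sees the new schema on the next request and can choose to call it.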

The agent loop:

observations = []
while True:
    action = await llm.decide(context + observations)
    if action.done:  # the model signals that no further tools are needed
        break
    result = await execute_tool(action)
    observations.append(result)
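
A slightly fuller, runnable version of the loop above might look like the following sketch. It assumes a hypothetical `Action` type, replaces the real LLM call with a scripted `scripted_decide` function, and adds a step cap that the writeup does not mention, so treat it as an illustration of the pattern rather than AlComm's implementation.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Optional


@dataclass
class Action:
    """One decision from the model: call a tool, or stop (tool is None)."""
    tool: Optional[str] = None
    args: dict[str, Any] = field(default_factory=dict)


# Hypothetical signature: given the message and observations so far, pick the next action.
Decide = Callable[[str, list[dict[str, Any]]], Awaitable[Action]]


async def run_agent(
    message: str,
    decide: Decide,
    tools: dict[str, Callable[..., Any]],
    max_steps: int = 8,
) -> list[dict[str, Any]]:
    observations: list[dict[str, Any]] = []
    for _ in range(max_steps):  # hard cap (an assumption) so a confused model cannot loop forever
        action = await decide(message, observations)
        if action.tool is None:  # the model decided it is done
            break
        result = tools[action.tool](**action.args)
        observations.append({"tool": action.tool, "result": result})
    return observations


async def scripted_decide(message: str, observations: list[dict[str, Any]]) -> Action:
    """Stand-in for the LLM call: archive newsletters once, then stop."""
    if "unsubscribe" in message.lower() and not observations:
        return Action(tool="archive", args={"reason": "newsletter"})
    return Action()


def demo() -> list[dict[str, Any]]:
    tools = {"archive": lambda reason: f"archived ({reason})"}
    return asyncio.run(
        run_agent("Weekly news. Click to unsubscribe.", scripted_decide, tools)
    )
```

Swapping `scripted_decide` for a function that calls a tool-calling model is the only change needed to move from this stub to a real agent; the loop itself stays the same.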

The Browser Connector

This was the hardest part. How do you integrate with apps that have no API?

Answer: You give the AI a browser.

  1. User clicks "Add Custom App" and enters a login URL
  2. AlComm opens a Playwright browser, streams it as a VNC-style canvas in the UI
  3. User logs in normally (we handle Google OAuth, 2FA, etc.)
  4. We save the session cookies
  5. An AI vision agent takes a screenshot, identifies the chat UI layout, and discovers CSS selectors
  6. AlComm starts polling headlessly — reading messages and ingesting them into the standard pipeline

The vision-based discovery uses GPT-5.4 Nano's multimodal capabilities:

Screenshot → Vision model identifies regions →
DOM inspection at coordinates → CSS selectors extracted →
Recipe saved → Headless polling begins
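
As a rough sketch of the last two stages, the saved recipe and the polling step could be modeled as below. The `Recipe` fields, the fingerprinting scheme, and the function names are assumptions made for illustration. The Playwright calls are standard async API methods, imported lazily so the pure helpers can be used without a browser installed.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    """Selectors discovered by the vision step (field names are illustrative)."""
    url: str
    message_selector: str  # matches one element per chat message
    sender_selector: str   # evaluated relative to a message element
    body_selector: str     # evaluated relative to a message element


def message_id(sender: str, body: str) -> str:
    """Fingerprint a message so repeated polls do not ingest it twice.

    Caveat: two identical messages from the same sender collapse into one.
    """
    return hashlib.sha256(f"{sender}\x00{body}".encode()).hexdigest()[:16]


def new_messages(scraped: list[tuple[str, str]], seen: set[str]) -> list[dict[str, str]]:
    """Return only messages not seen in earlier polls, and remember them."""
    fresh = []
    for sender, body in scraped:
        mid = message_id(sender, body)
        if mid not in seen:
            seen.add(mid)
            fresh.append({"id": mid, "sender": sender, "body": body})
    return fresh


async def scrape(recipe: Recipe, storage_state_path: str) -> list[tuple[str, str]]:
    """Read (sender, body) pairs from the page using headless Chromium."""
    from playwright.async_api import async_playwright  # lazy import

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        # Reuse the cookies saved after the user's interactive login.
        context = await browser.new_context(storage_state=storage_state_path)
        page = await context.new_page()
        await page.goto(recipe.url)
        pairs = []
        for el in await page.query_selector_all(recipe.message_selector):
            sender = await el.query_selector(recipe.sender_selector)
            body = await el.query_selector(recipe.body_selector)
            if sender and body:
                pairs.append((await sender.inner_text(), await body.inner_text()))
        await browser.close()
        return pairs
```

Each poll would call `scrape`, pass the result through `new_messages`, and hand only the fresh items to the standard ingestion pipeline.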

We tested this with LinkedIn DMs — reading 38 messages and sending replies through the browser connector, completely headlessly.

The Fallback Chain

AI reliability was critical. We built a 7-model fallback chain:

$$\text{GPT-5.4 Nano} \xrightarrow{429} \text{Hermes 3} \xrightarrow{429} \text{Step 3.5 Flash} \xrightarrow{429} \cdots \xrightarrow{\text{all fail}} \text{Structured Fallback}$$

If every model is rate-limited, the app still works — returning structured data instead of AI-generated content. Graceful degradation, not error screens.
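
The chain can be sketched as a simple loop over model clients. In this illustration `RateLimited`, `structured_fallback`, and the two stub clients are hypothetical; a real client would raise on an HTTP 429 from the provider, and the shape of the fallback payload is an assumption.

```python
from typing import Callable, Sequence


class RateLimited(Exception):
    """Raised by a model client when the provider answers HTTP 429."""


def structured_fallback(message: str) -> dict:
    """Non-AI result so the UI still has something to render."""
    return {"source": "fallback", "summary": message[:80], "priority": None}


def complete_with_fallback(
    message: str,
    models: Sequence[tuple[str, Callable[[str], dict]]],
) -> dict:
    """Try each model in order; skip rate-limited ones; degrade gracefully."""
    for name, call in models:
        try:
            result = call(message)
        except RateLimited:
            continue  # move on to the next model in the chain
        return {"source": name, **result}
    return structured_fallback(message)


def always_limited(message: str) -> dict:
    """Stub client that simulates a rate-limited model."""
    raise RateLimited()


def healthy(message: str) -> dict:
    """Stub client that simulates a working model."""
    return {"summary": f"summary of: {message}", "priority": 0.9}
```

Only rate-limit errors trigger the fallback here; other exceptions propagate, which keeps genuine bugs visible instead of hiding them behind degraded output.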


Challenges We Faced

1. The Classifier Problem

Our first classifier worked for our test data but failed in production. The prompt was tuned for specific email addresses instead of working generically. A GitHub notification from username "alhirani13" got classified into the "Al" tab (meant for a person named Al Hirani) because of fuzzy name matching.

Solution: We rewrote the classifier to be strictly intent-based with explicit rules about person matching vs. bot detection. Tab descriptions are user-defined natural language filters — the AI interprets intent, not keywords.

2. Rate Limits on Free Models

Building with free OpenRouter models meant constant 429s. Every API call needed retry logic, and the agent loop (which makes 3-6 calls per message) would exhaust rate limits quickly.

Solution: BYOK with GPT-5.4 Nano at ~$0.0003 per email. The entire agentic system costs less than a coffee per month.
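
A quick back-of-the-envelope check shows how the per-email figure scales. The 100 emails/day volume below is an assumed example, not a number from our usage data, and the $15 budget is the "full usage" estimate quoted elsewhere in this writeup.

```python
COST_PER_EMAIL_USD = 0.0003  # per-email figure quoted above


def monthly_cost(emails_per_day: int, days: int = 30) -> float:
    """Estimated monthly spend in USD for a given daily volume."""
    return round(emails_per_day * days * COST_PER_EMAIL_USD, 2)


def emails_covered(budget_usd: float) -> int:
    """How many emails a monthly budget pays for at the per-email rate."""
    return round(budget_usd / COST_PER_EMAIL_USD)


print(monthly_cost(100))     # assumed volume of 100 emails/day
print(emails_covered(15.0))  # emails covered by a $15 monthly budget
```

At the assumed volume the per-email rate works out to well under a dollar a month, while a $15 budget covers roughly fifty thousand emails; actual spend depends on how many agent calls each message triggers.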

3. LinkedIn's Anti-Bot Detection

LinkedIn aggressively detects automated browsers. Our first attempts failed — sessions got invalidated, Google OAuth blocked the embedded browser, and the DOM structure had changed from what our selectors expected.

Solution: Chrome stealth flags (navigator.webdriver removal, real user agent), session cookie persistence, and vision-based selector discovery instead of hardcoded CSS classes. The AI adapts to whatever DOM it finds.

4. The "Agentic vs Wrapper" Question

We honestly assessed our system mid-build and realized it was an LLM wrapper — a fixed pipeline running the same 6 AI calls on every message. The AI never decided anything.

Solution: We rebuilt the entire pipeline as a tool-calling agent loop. The Triage Agent now receives 31 tools and decides which to use. A newsletter needs 1 tool (archive). A CEO email needs 5 (classify, prioritize, extract todo, star, draft reply). The AI makes that call, not us.


What We Learned

  1. Production-first prompts. Never tune for test data. Every prompt must work for any user, any configuration, any content.
  2. Vision beats DOM parsing. Screenshots + multimodal LLMs are more reliable than hardcoded CSS selectors for cross-platform browser automation, because the selectors are rediscovered from whatever DOM the page currently has.
  3. Graceful degradation > error handling. When AI fails, show structured data, not error screens.
  4. The line between wrapper and agent is tool-calling. You stop telling the AI what to do and start telling it what it can do.
  5. Cost estimation matters. We calculated ~$15/month for full agentic usage with GPT-5.4 Nano before committing to paid models.

What's Next

  • More browser connectors — Instagram DMs, Twitter/X, WhatsApp Web
  • Smarter recipe discovery — multi-step vision agent that clicks through UIs autonomously
  • Agent memory — agents learn your preferences across sessions (infrastructure built, needs training data)
  • Mobile app — React Native with the same WebSocket real-time architecture
  • Team features — shared inboxes, delegation, @mentions across platforms

Built With

  • Backend: Python, FastAPI, asyncpg, PostgreSQL, Redis
  • Frontend: React, TypeScript, Vite, Zustand, TailwindCSS, Tiptap
  • AI: OpenAI GPT-5.4 Nano via OpenRouter, tool-calling agents
  • Browser Automation: Playwright (headless Chromium)
  • Integrations: Gmail API, Slack Socket Mode, Asana API, Google Calendar
  • Infrastructure: WebSocket real-time updates, background polling loops
  • Testing: 15 test suites, Playwright E2E tests

AlComm — because your inbox should work for you, not the other way around.
