Inspiration

After most meetings, people spend hours writing the emails, creating the Trello cards, and kicking off the repos that came out of the conversation. We wanted a tool that handled that work directly from the meeting itself, while respecting the fact that people change their minds halfway through a sentence.

What it does

Sidekik is a native iOS app paired with a Python backend. While a meeting is happening, the iOS app streams microphone audio to the server over a WebSocket. The server transcribes the audio with faster-whisper and runs a decision engine that tracks every action item through three states: proposed, confirmed, and rescinded. If a speaker takes something back ("actually, scratch that"), the action is rescinded and never executed. Confirmed actions are dispatched to agents that perform the real work: sending emails, creating Trello boards, and creating GitHub repos. Progress streams back to the iOS UI in real time.
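
The three-state lifecycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the names `ActionState`, `Action`, `transition`, and `ready_to_dispatch` are hypothetical, and the transition table (in particular, treating rescinded as terminal and allowing a confirmed action to be rescinded before dispatch) is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class ActionState(str, Enum):
    PROPOSED = "proposed"
    CONFIRMED = "confirmed"
    RESCINDED = "rescinded"


# Assumed transition table: rescinded is terminal, so a retracted
# action can never come back and be executed.
ALLOWED = {
    ActionState.PROPOSED: {ActionState.CONFIRMED, ActionState.RESCINDED},
    ActionState.CONFIRMED: {ActionState.RESCINDED},
    ActionState.RESCINDED: set(),
}


@dataclass
class Action:
    action_id: str
    summary: str
    state: ActionState = ActionState.PROPOSED
    dispatched: bool = False


def transition(action: Action, new_state: ActionState) -> bool:
    """Apply the transition if the table allows it; return True on change."""
    if new_state in ALLOWED[action.state]:
        action.state = new_state
        return True
    return False


def ready_to_dispatch(action: Action) -> bool:
    """Only confirmed, not-yet-dispatched actions go to an agent."""
    return action.state is ActionState.CONFIRMED and not action.dispatched
```

Under these assumptions, an action that is proposed and then rescinded ("actually, scratch that") never satisfies `ready_to_dispatch`, even if a later confirmation arrives for it.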

How we built it

  • iOS client: Swift and SwiftUI. AVAudioEngine captures 16 kHz mono PCM int16 audio in 200 ms chunks and sends them as binary frames on a URLSessionWebSocketTask. JSON events flow back the other way and drive views for the live transcript, action cards, and agent progress.
  • Server: FastAPI on Python 3.11+, managed with uv. One WebSocket route handles audio in and events out per meeting. REST endpoints cover meeting creation, snapshot reads, and approve/reject for actions.
  • Transcription: faster-whisper running locally with the small.en model by default. Partial transcripts are emitted while an audio window is still being processed; final transcripts are emitted on silence or after a short timeout.
  • Decision engine: whenever a few seconds of newly finalized transcript accumulate, the engine calls CLōD (OpenAI-compatible) with the running transcript and the current action list. CLōD returns strict JSON describing new actions, confirmations, and rescissions, and the engine reconciles that into the in-memory store.
  • Agents: each agent is a self-contained Python module that registers a run function in a shared registry. The shipping integrations are Trello, GitHub, and Email (Resend or SendGrid). Every integration is behind a feature flag so the server boots without any third-party keys.
  • Demo mode: with DEMO_MODE=true, the WebSocket loops through a scripted sequence of every event type, so the iOS client can be built against real frames before the live pipeline is wired and so the demo has a fallback if conference wifi misbehaves.
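
The agent registry and feature-flag pattern from the list above might look roughly like the following. This is a sketch under stated assumptions: the flag names (`ENABLE_TRELLO`, `ENABLE_GITHUB`), the `register` decorator, and the `dispatch` helper are hypothetical and are not taken from the Sidekik codebase. The agent bodies are stubs that make no network calls.

```python
import os
from typing import Callable, Dict, Mapping

AgentFn = Callable[[dict], dict]

# Shared registry: agent name -> run function.
REGISTRY: Dict[str, AgentFn] = {}


def flag_enabled(flag: str, env: Mapping[str, str] = os.environ) -> bool:
    """Treat a missing flag as off, so the server boots with no keys set."""
    return env.get(flag, "false").strip().lower() in {"1", "true", "yes"}


def register(name: str, flag: str, env: Mapping[str, str] = os.environ):
    """Decorator that adds a run function to REGISTRY only if its flag is on."""
    def decorator(fn: AgentFn) -> AgentFn:
        if flag_enabled(flag, env):
            REGISTRY[name] = fn
        return fn
    return decorator


def dispatch(name: str, action: dict) -> dict:
    """Run the named agent, or report that it is disabled."""
    run = REGISTRY.get(name)
    if run is None:
        return {"agent": name, "status": "skipped"}
    return run(action)


# Example wiring with an explicit env mapping (hypothetical flag names).
demo_env = {"ENABLE_TRELLO": "true"}


@register("trello", flag="ENABLE_TRELLO", env=demo_env)
def run_trello(action: dict) -> dict:
    # A real agent would call the Trello API here; this stub does not.
    return {"agent": "trello", "status": "done", "title": action["summary"]}


@register("github", flag="ENABLE_GITHUB", env=demo_env)
def run_github(action: dict) -> dict:
    # Never registered in this demo because ENABLE_GITHUB is unset.
    return {"agent": "github", "status": "done", "title": action["summary"]}
```

With this wiring, `dispatch("trello", {...})` runs the stub, while `dispatch("github", {...})` returns a skipped result instead of failing, which is one way to let the server start without any third-party keys.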

Challenges we ran into

  • Audio plumbing on iOS. Getting AVAudioEngine to produce the right PCM format, in the right chunk size, on the right thread, without dropping frames during a long session, took several iterations.
  • Time. The project was built during a single hackathon, so we kept the data layer in memory and feature-flagged every external integration.
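
For reference, the chunk size implied by the capture format (16 kHz, mono, int16, 200 ms) can be worked out directly. The arithmetic below follows from those figures; the `split_chunks` helper is a hypothetical illustration of how a receiver could re-frame buffered bytes, not the project's code.

```python
SAMPLE_RATE_HZ = 16_000   # 16 kHz
CHANNELS = 1              # mono
BYTES_PER_SAMPLE = 2      # int16
CHUNK_MS = 200

SAMPLES_PER_CHUNK = SAMPLE_RATE_HZ * CHUNK_MS // 1000              # 3200 samples
BYTES_PER_CHUNK = SAMPLES_PER_CHUNK * CHANNELS * BYTES_PER_SAMPLE  # 6400 bytes
CHUNKS_PER_SECOND = 1000 // CHUNK_MS                               # 5 frames/s
BYTES_PER_SECOND = BYTES_PER_CHUNK * CHUNKS_PER_SECOND             # 32000 bytes/s


def split_chunks(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Split buffered PCM bytes into full 200 ms chunks plus any leftover."""
    full = len(buffer) // BYTES_PER_CHUNK
    chunks = [
        buffer[i * BYTES_PER_CHUNK:(i + 1) * BYTES_PER_CHUNK]
        for i in range(full)
    ]
    return chunks, buffer[full * BYTES_PER_CHUNK:]
```

Each binary WebSocket frame therefore carries 6,400 bytes of audio, and a continuous stream runs at about 32 KB per second before any framing overhead.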

Accomplishments that we are proud of

  • End-to-end pipeline: phone microphone to transcript to decision to real artifact in a browser tab, working live.
  • A clean action state machine (proposed, confirmed, rescinded) that handles the "actually, scratch that" case.
  • Three working agent integrations behind feature flags.
  • A demo path that does not depend on conference wifi.

What we learned

  • Strict-JSON LLM output plus a small reconciler is enough to run a useful state machine over an open-ended conversation.
  • A WebSocket carrying both audio in and events out keeps the iOS networking layer small.
  • Building a backup demo path early pays off more than any feature added in the last hour.
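
To make the first point concrete, here is a minimal sketch of a reconciler that applies a strict-JSON model reply to an in-memory store. The reply schema (the keys `new_actions`, `confirmations`, `rescissions` and the `id`/`summary` fields) and the rule that a rescission wins over a confirmation in the same reply are assumptions for illustration; the actual schema used with CLōD is not given in this writeup.

```python
import json
from copy import deepcopy

# Hypothetical reply schema: exactly these three top-level keys.
EXPECTED_KEYS = {"new_actions", "confirmations", "rescissions"}


def reconcile(store: dict, raw_reply: str) -> dict:
    """Return an updated copy of store; malformed replies leave it unchanged."""
    try:
        payload = json.loads(raw_reply)
    except json.JSONDecodeError:
        return store
    if not isinstance(payload, dict) or set(payload) != EXPECTED_KEYS:
        return store
    if not all(isinstance(payload[key], list) for key in EXPECTED_KEYS):
        return store

    updated = deepcopy(store)

    # New actions start as proposed; an existing id is never overwritten.
    for item in payload["new_actions"]:
        if (
            isinstance(item, dict)
            and isinstance(item.get("id"), str)
            and isinstance(item.get("summary"), str)
        ):
            updated.setdefault(
                item["id"], {"summary": item["summary"], "state": "proposed"}
            )

    # Confirmations only apply to actions that are still proposed.
    for action_id in payload["confirmations"]:
        if (
            isinstance(action_id, str)
            and action_id in updated
            and updated[action_id]["state"] == "proposed"
        ):
            updated[action_id]["state"] = "confirmed"

    # Rescissions run last, so a retraction wins within the same reply.
    for action_id in payload["rescissions"]:
        if isinstance(action_id, str) and action_id in updated:
            updated[action_id]["state"] = "rescinded"

    return updated
```

Rejecting anything that does not match the expected shape keeps a bad model reply from corrupting the store; the next call simply gets another chance with a longer transcript.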

What is next for Sidekik

  • Per-user accounts and persistent storage so meetings survive a server restart.
  • More agents (Slack, Notion, Linear, Jira, Google Docs, Google Calendar).
  • On-device transcription on iOS for low-latency partials, with the server handling the decision engine and agents.
  • A team view that shows incoming meeting actions across an organization.

Built with

Swift, SwiftUI, AVAudioEngine, Python, FastAPI, uv, WebSockets, faster-whisper, CLōD, GPT-4o mini via CLōD, Resend, SendGrid, Trello API, GitHub API, Pydantic, Cursor.

Sponsors and tools used

  • Cursor: the codebase was built in Cursor.
  • CLōD: OpenAI-compatible provider used for decision-state extraction and agent reasoning.
  • Greptile: used during development for cross-repo code search and PR review. Not part of the running app.
