Inspiration

Spectator grew out of the frustration of repeatedly explaining the context of my life, and of the conversations I have in person, to LLMs. The goal was to create a memory layer that remembers in-person conversations for AI assistants. I wanted to build a system that could seamlessly capture voice conversations and make them accessible to AI agents like Claude.

What it does

Spectator is a production-ready voice memory system that captures phone calls through a Twilio integration, transcribes conversations in real time using Deepgram, and stores the accumulated transcripts in Google Cloud. It provides an MCP (Model Context Protocol) server for Claude Desktop integration, enabling AI assistants to access conversation history for personalized, context-aware responses.
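
As a rough illustration of what "accumulated transcripts" could look like in code, the sketch below buffers finalized transcript fragments per call and joins them when the call ends. The class name TranscriptAccumulator, its methods, the call_id key, and the is_final flag are assumptions made for this example; they are not taken from Spectator's code or from any specific transcription message schema.

```python
from collections import defaultdict


class TranscriptAccumulator:
    """Buffers finalized transcript fragments per call (hypothetical helper)."""

    def __init__(self):
        # call_id -> list of finalized text fragments, in arrival order
        self._fragments = defaultdict(list)

    def add(self, call_id, text, is_final):
        # Interim results may still be revised, so keep only finalized,
        # non-empty text.
        text = text.strip()
        if is_final and text:
            self._fragments[call_id].append(text)

    def finish(self, call_id):
        # Join everything kept for this call and clear its buffer.
        return " ".join(self._fragments.pop(call_id, []))
```

Calling finish once per call, when the audio stream closes, yields a single string that can be written to storage in one step.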

How we built it

I built Spectator using Python FastAPI with an async architecture, PostgreSQL on Google Cloud SQL for multi-tenant data storage, Twilio for call handling and WebSocket audio streaming, and the Deepgram API for real-time transcription. The system includes a custom MCP server for Claude Desktop integration and is deployed on Railway with automated CI/CD from GitHub. I implemented complete data isolation by phone number and robust error handling with connection recovery mechanisms.
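
Data isolation by phone number can be sketched as follows. This is a minimal example under stated assumptions: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL so that it runs anywhere, and the table name, column names, and function names are hypothetical rather than Spectator's actual schema. The point is that every read is filtered by the caller's phone number, so one tenant never sees another tenant's transcripts.

```python
import sqlite3


def init_db(conn):
    # Hypothetical schema: one row per stored transcript, tagged with its owner.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcripts ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "phone_number TEXT NOT NULL, "
        "body TEXT NOT NULL)"
    )


def save_transcript(conn, phone_number, body):
    # Parameterized query: values are never spliced into the SQL string.
    conn.execute(
        "INSERT INTO transcripts (phone_number, body) VALUES (?, ?)",
        (phone_number, body),
    )
    conn.commit()


def get_transcripts(conn, phone_number):
    # Every read is scoped to a single phone number (the tenant key).
    rows = conn.execute(
        "SELECT body FROM transcripts WHERE phone_number = ? ORDER BY id",
        (phone_number,),
    ).fetchall()
    return [row[0] for row in rows]
```

A retrieval tool exposed to an assistant could wrap something like get_transcripts, passing in the phone number tied to the current user.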

Challenges we ran into

My main challenges were database session management issues that initially prevented transcripts from persisting, race conditions in WebSocket connections during phone number extraction, Deepgram connection instability that required automatic reconnection logic, and getting Claude Desktop to properly recognize the MCP server. I also had to simplify my initial, complex vector embedding pipeline to reliable direct SQL text storage.
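
One common way to structure automatic reconnection is retry with exponential backoff. The sketch below shows that pattern in plain asyncio; it does not use the Deepgram SDK, and the function name connect_with_retry, its parameters, and the connect callable are assumptions for illustration rather than the logic Spectator actually ships.

```python
import asyncio


async def connect_with_retry(connect, max_attempts=5, base_delay=0.5, max_delay=8.0):
    """Await `connect()` until it succeeds, doubling the wait after each failure."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return await connect()
        except OSError:
            # ConnectionError is a subclass of OSError, so dropped or refused
            # connections land here.
            if attempt == max_attempts:
                raise  # out of attempts: let the caller decide what to do
            await asyncio.sleep(delay)
            delay = min(delay * 2, max_delay)
```

Here connect would be any coroutine function that opens the streaming connection and returns it. Capping the delay with max_delay keeps a long outage from pushing the wait between attempts arbitrarily high.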

Accomplishments that we're proud of

The system is now able to handle calls, automatically transcribe speech in real time, and write the transcripts to a database for storage. The MCP server can then query this database for context retrieval.

What we learned

With more time, I could add automatic detection of phone placement and adjust the microphone sensitivity to pick up conversations even when the phone is in a pocket. I was unable to implement this feature in the time available.

What's next for Spectator

Next, I plan to implement automatic mic detection and adjustment to compensate for faraway speakers and unclear audio. I also plan to store context in a vector embedding database for smarter context retrieval once the data grows large.
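
To give a feel for what similarity-based retrieval means, here is a toy sketch that ranks transcripts against a query by cosine similarity. It is only an illustration: plain word counts stand in for a real embedding model, an in-memory list stands in for a vector database, and the function names embed, cosine, and top_matches are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_matches(query, transcripts, k=3):
    # Score every transcript against the query and keep the best k
    # that share at least one word with it.
    q = embed(query)
    scored = [(cosine(q, embed(t)), t) for t in transcripts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for score, t in scored[:k] if score > 0]
```

With learned embeddings in place of word counts, the same ranking step would match on meaning rather than exact words, which is where this approach starts to pay off over direct SQL text search.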
