Inspiration

Developer workflows are often complex, entangled with layers of tooling, evolving codebases, and frequent interruptions (meetings, breaks, or weekends). As a result, maintaining context becomes central to reasoning through tasks and making sound software decisions.

Today, large language models are great at answering questions, but they're fundamentally stateless. In Turing City, we imagine a future where AI assistants operate automatically alongside your work, enabling prompts like "Pick up where I left off last week." To realize this vision, we need a way to capture high-quality context from real workflows, which is where Iconic Memory comes in.

What it does

Iconic Memory is a developer observability and recontextualization tool for your personal workflow. It can:

  • Track activity in real time: After starting a session, a vision language model continuously observes the screen and produces structured insights about your work
  • Compress outputs into structured memory: Instead of storing raw video or even the verbose outputs of the vision language model, we apply a custom compression algorithm to produce concise input for downstream LLM use
  • Store knowledge as a queryable timeline: Compressed insights are indexed for search, enabling re-entry into past work
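To make the timeline idea concrete, here is a minimal sketch in TypeScript. The class and field names are illustrative (not the actual implementation), and the naive substring search stands in for whatever index the real system uses:

```typescript
// Illustrative sketch of a queryable session timeline.
type TimelineEntry = { timestamp: number; summary: string };

class SessionTimeline {
  private entries: TimelineEntry[] = [];

  add(summary: string, timestamp: number = Date.now()): void {
    this.entries.push({ timestamp, summary });
  }

  // Naive substring search; a real index could be a vector store instead.
  search(query: string): TimelineEntry[] {
    const q = query.toLowerCase();
    return this.entries.filter((e) => e.summary.toLowerCase().includes(q));
  }

  // Entries within a time window, for "pick up where I left off" prompts.
  between(start: number, end: number): TimelineEntry[] {
    return this.entries.filter((e) => e.timestamp >= start && e.timestamp <= end);
  }
}
```

Either query path (keyword or time range) yields a compact slice of past work that can be handed to a downstream LLM as context.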

How we built it

We built an end-to-end system with three core pieces:

Frontend Web UI

  • Built with TypeScript, React, and Tailwind CSS
  • Allows developers to start and stop workflow sessions with one click
  • Renders a session timeline where past insights can be reviewed and searched

Overshoot Vision Model Integration

  • Built with JavaScript and the Overshoot SDK
  • Streams screen frames for continuous analysis
  • Produces structured natural-language observations about on-screen activity

Data Compression Algorithm

  • Built with TypeScript on the backend
  • Leverages the intuition that closely spaced vision-model inference calls often produce similar outputs
  • Embeds natural-language descriptions as vectors and computes the cosine similarity of adjacent outputs
  • Filters similar outputs to produce concise input for downstream LLM use
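The compression step described above can be sketched in TypeScript as follows. This is a simplified illustration, not our exact implementation: the threshold value is an assumption, and embeddings are taken as precomputed inputs (in the real pipeline they come from an embedding model):

```typescript
// Sketch of adjacency-based compression: drop an insight when it is too
// similar to the last insight we kept.
type Insight = { text: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Threshold is illustrative; tuning it trades compression for fidelity.
function compressInsights(insights: Insight[], threshold = 0.92): Insight[] {
  const kept: Insight[] = [];
  for (const insight of insights) {
    const last = kept[kept.length - 1];
    if (!last || cosineSimilarity(last.embedding, insight.embedding) < threshold) {
      kept.push(insight);
    }
  }
  return kept;
}
```

Comparing only against the most recently kept insight keeps the pass linear in the number of outputs, which matters for long-running sessions.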

Challenges we ran into

  • Streaming Reliability: Observing an entire workflow requires keeping a long-running stream stable, so we had to tune the Overshoot processing parameters to fit our use case.
  • Large Volumes of Data: For long workflows, the vision language model often produced more data than a downstream LLM could reliably handle, so we devised a compression algorithm that preserves meaning while keeping input sizes scalable.

Accomplishments that we're proud of

  • Integrating the web UI, vision model, and data compression algorithm into a cohesive application
  • Enabling cross-application functionality beyond just code (screen capture can observe any app)
  • Storing insights in a compressed, structured form for efficient downstream use

What we learned

  • Working with vision language model outputs, whose raw form is often very verbose
  • Applying LLM reasoning to long-term developer workflows, rather than simple chat interfaces

What's next for Iconic Memory

  • Adaptive Focus Modes: Modify Overshoot processing parameters based on the application currently in use (e.g., higher precision for VS Code, lower for Slack)
  • Applying Workflow Data to Personalized AI: Aggregate high-quality, privacy-preserving workflow patterns to be used for training/evaluation data for personalized AI systems
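The adaptive focus idea could look roughly like the sketch below. Everything here is hypothetical: the profile fields are illustrative placeholders, not actual Overshoot SDK parameters, and the app names are just examples:

```typescript
// Hypothetical per-app processing profiles. Field names are illustrative
// and do not correspond to real Overshoot SDK settings.
type FocusProfile = { framesPerMinute: number; detail: "high" | "low" };

const profiles: Record<string, FocusProfile> = {
  "VS Code": { framesPerMinute: 12, detail: "high" },
  "Slack": { framesPerMinute: 2, detail: "low" },
};

const defaultProfile: FocusProfile = { framesPerMinute: 6, detail: "low" };

// Pick a profile for whichever application currently has focus.
function profileFor(activeApp: string): FocusProfile {
  return profiles[activeApp] ?? defaultProfile;
}
```

A lookup like this would let the session loop re-tune capture frequency whenever the foreground application changes.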
