Inspiration

We've all been there: explaining our job, our coding preferences, or our personal interests to Gemini, then switching to Claude and having to start from scratch. AI conversations are powerful but lack memory; every chat is a blank slate.

We were inspired by the idea of a "unified memory layer" for LLMs, but with a critical constraint: it had to be 100% private and on-device. We wanted our AI assistants to know us, without us having to sell our privacy to do it.

What it does

Aether acts as a central, local memory for all your LLM conversations.

  1. Listens: As you chat on sites like Gemini or ChatGPT, Aether's content script securely captures the prompts you send.
  2. Understands: The prompt is passed to Chrome's built-in Summarizer API (a local, on-device AI model) with a custom directive that turns it into a fact extractor. The model pulls out key facts about you (preferences, hobbies, goals) as a structured JSON array.
  3. Remembers: Each extracted fact is vectorized and stored in a private, on-device vector database.
  4. Informs: A 'bubble' button is injected into the chat text area. When you type a new query (e.g., "Suggest some good laptops"), you can click the bubble. Aether performs a vector search on your query, finds the most relevant memories (e.g., "User prefers 16-inch screens," "User is a TypeScript developer"), and injects them as context to help the LLM give a truly personalized answer.
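Steps 3 and 4 boil down to a nearest-neighbor search over embedding vectors. A minimal TypeScript sketch of the idea (the tiny 2-D vectors and function names here are illustrative; the real extension uses 384-dimensional all-MiniLM-L6-v2 embeddings stored in entity-db):

```typescript
// Illustrative sketch of memory retrieval: rank stored memories by cosine
// similarity to the query embedding, then prepend the best matches as context.
interface Memory {
  text: string;
  vector: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k memories most similar to the query vector.
function topMemories(query: number[], memories: Memory[], k = 3): Memory[] {
  return [...memories]
    .sort(
      (m1, m2) =>
        cosineSimilarity(query, m2.vector) - cosineSimilarity(query, m1.vector)
    )
    .slice(0, k);
}

// Bundle the retrieved memories with the user's prompt before injection.
function buildContext(memories: Memory[], prompt: string): string {
  const facts = memories.map((m) => `- ${m.text}`).join("\n");
  return `Relevant facts about me:\n${facts}\n\n${prompt}`;
}
```

In practice entity-db handles the storage and indexing; the ranking idea is the same.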

How we built it

Aether is a Chrome extension built with Plasmo, React, and TypeScript.

  • Service Worker (background.ts): This is the brains of the operation. It initializes the on-device AI and the vector database. It receives prompts from content scripts and orchestrates the entire "learn-and-store" pipeline.
  • On-Device AI (summarizer.ts): We leverage the experimental Chrome Summarizer API (requiring Chrome 140+). We feed it a custom sharedContext prompt, turning it from a summarizer into a powerful, structured data extractor that runs entirely locally.
  • Vector Storage: We use @xenova/transformers.js to run the all-MiniLM-L6-v2 model inside the extension to generate embeddings for each memory. These vectors are then stored and indexed in @babycommando/entity-db, a lightweight, on-device vector database.
  • UI (bubbleButton.tsx, popup.tsx): The context-injection button is a Plasmo inline content script UI. The popup is a standard React component using @plasmohq/storage hooks to display and delete memories.
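To make the "summarizer as extractor" trick concrete, here is a hedged sketch. The directive wording, option values, and helper names are our own illustration, not the project's actual prompt; `Summarizer` is the Chrome 140+ built-in and exists only in the browser:

```typescript
// Browser-only global (Chrome 140+); declared here so the sketch type-checks.
declare const Summarizer: any;

// Illustrative directive; the real sharedContext prompt took hours of tuning.
const EXTRACTION_DIRECTIVE = `From the user's message, output ONLY a JSON array
of short strings describing the user's preferences, hobbies, and goals.
Output [] if there are none.`;

async function extractFacts(prompt: string): Promise<string[]> {
  const summarizer = await Summarizer.create({
    sharedContext: EXTRACTION_DIRECTIVE,
    type: "key-points",
    format: "plain-text",
  });
  return parseFactArray(await summarizer.summarize(prompt));
}

// Small on-device models often wrap JSON in prose or code fences;
// salvage the first [...] span and validate it defensively.
function parseFactArray(raw: string): string[] {
  const match = raw.match(/\[[\s\S]*\]/);
  if (!match) return [];
  try {
    const parsed = JSON.parse(match[0]);
    return Array.isArray(parsed)
      ? parsed.filter((x) => typeof x === "string")
      : [];
  } catch {
    return [];
  }
}
```

The defensive parsing matters as much as the prompt: even a well-tuned directive occasionally produces chatty output.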

Challenges we ran into

  • Prompting a Summarizer: The Chrome Summarizer API is not designed for structured data extraction. Our biggest challenge was 'prompt-hacking' it: we spent hours iterating on the sharedContext prompt before the 'nano' model would reliably emit a valid JSON array and nothing else.
  • Vector DB in a Service Worker: Running Transformers.js in a Manifest V3 service worker is tricky. We had to configure the environment (setup-env.ts) to disable web workers (which aren't supported in the SW) and force single-threaded WASM. This also required complex content_security_policy adjustments in the manifest.
  • The "Lexical Editor" Problem: Injecting text isn't as simple as textarea.value = .... Sites like Perplexity use complex editors (e.g., Lexical) that don't respond to simple value changes. We had to reverse-engineer their input handling and ended up simulating a native 'paste' event (new ClipboardEvent("paste", ...)) to reliably inject our context.
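The service-worker fix above amounts to a few environment flags set before the model loads. A sketch of what setup-env.ts might contain, assuming the documented transformers.js env settings (the project's actual file may differ):

```typescript
import { env } from "@xenova/transformers";

// MV3 service workers cannot spawn Web Workers, so keep inference on the
// service-worker thread and force single-threaded WASM execution.
env.backends.onnx.wasm.proxy = false;
env.backends.onnx.wasm.numThreads = 1;
```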

Accomplishments that we're proud of

We are incredibly proud of building a fully on-device, end-to-end AI pipeline. From prompt capture to local inference, vectorization, storage, and retrieval—no data ever leaves the user's machine.

Hacking the Summarizer API to act as a JSON-based preference extractor is a novel application of this new browser technology. We also built a robust injection system that works on complex React-based editors, not just simple text areas.
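The injection system hinges on one trick: rich editors listen for paste events even when they ignore direct value writes. A browser-only sketch of the idea (the function name and target lookup are ours):

```typescript
// Simulate a native paste so editors like Lexical run their own input
// pipeline instead of being bypassed. Browser-only: DataTransfer and
// ClipboardEvent do not exist outside the DOM.
function injectViaPaste(target: HTMLElement, text: string): void {
  target.focus();
  const data = new DataTransfer();
  data.setData("text/plain", text);
  target.dispatchEvent(
    new ClipboardEvent("paste", {
      clipboardData: data,
      bubbles: true,
      cancelable: true,
    })
  );
}
```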

What we learned

We learned that the era of powerful, on-device AI is already here. The new Summarizer API is a game-changer for privacy-centric extensions. We also learned that even for small, local models, prompt engineering is everything. The difference between a useless summary and perfect JSON was just a few carefully chosen words in our sharedContext prompt.

What's next for Aether

  • Expanding Support: Cover more LLM sites (e.g., Claude, Perplexity) and other AI services beyond Gemini and ChatGPT.
  • Memory Curation: Build a more advanced popup UI that allows users to manually edit, add, and curate their memories, not just delete them.
  • Smarter Retrieval: Implement more advanced retrieval logic, such as summarizing the top 5 memories into a single, cohesive paragraph before injection.
  • Multi-Modal Learning: Explore ways to learn from more than just text prompts, such as from the LLM's answers or even from general web browsing (with user permission).

Built With

  • @babycommando/entity-db (on-device vector DB)
  • chrome-summarizer-api (on-device AI)
  • manifest-v3
  • plasmo
  • react
  • typescript
  • @xenova/transformers.js (local embeddings)