LanOnasis Aether Memory – ARM Dev Copilot

The Lan Onasis

About the project

Inspiration

This project started from a feeling most developers know too well:

You finally figure something out at 2:13 a.m. – the perfect regex, the bug root cause, the right way to wire OAuth – and you tell yourself “I’ll remember this next time.”
Next time comes… and it’s gone.

Over the last few years I’ve bounced between IDEs, CLIs, browsers, terminals, documentation tabs, and AI tools. I was shipping features, but also losing hours every week hunting for “that snippet from last month” or “that explanation I wrote for myself on Slack.” My “memory system” was a messy mix of Notion pages, VS Code scratch files, screenshots, and half-finished TODOs.

At the same time, I’ve been building AI-driven tools and Memory-as-a-Service for teams. I kept asking myself:

“If I can build memory for my apps, why can’t I build a reliable, private memory companion for me as a developer?”

When I saw the Arm AI Developer Challenge, something clicked. Most of the AI tools I used depended on the cloud, but my real development life happens on laptops and phones that often have patchy or expensive internet – especially in transit, during power outages, or in countries where connectivity is not a given.

So Aether Memory was born from a simple but personal desire:
to stop losing hard-won insights, and to prove that on-device ARM AI can give developers a private, offline “second brain” that actually fits how we work.

What Aether Memory does

LanOnasis Aether Memory – ARM Dev Copilot is a cross-platform memory companion for developers that:

Lets you capture context anywhere – VS Code, web dashboard, mobile PWA, CLI.
Runs on-device semantic search using ARM-optimized embeddings.
Works completely offline, and syncs securely when you’re back online.
Understands meaning, not just keywords – so “that retry pattern from the payments service” is actually findable.

A typical moment:

You’re in the subway with no network and remember a production incident you never want to repeat.
You open the Aether Memory PWA on your ARM phone, jot a few lines, and it instantly generates an embedding on-device.
Later, at your ARM laptop, you type “retry pattern for flaky third-party API” in VS Code, and Aether surfaces the memory you captured on the train.

It’s not just “notes with tags.” It’s a personal, semantic memory bank that lives close to you – on your own ARM hardware.
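
As a rough illustration of the semantic search described above, ranking memories against a query comes down to comparing embedding vectors. The sketch below is a minimal version under stated assumptions: the `StoredMemory` shape and the `searchMemories` function are hypothetical names for illustration, not the SDK's actual API, and the toy three-dimensional vectors stand in for the 384-dimensional vectors that all-MiniLM-L6-v2 produces.

```typescript
// Hypothetical shapes for illustration; the real SDK types may differ.
interface StoredMemory {
  id: string;
  text: string;
  embedding: number[];
}

interface ScoredMemory {
  memory: StoredMemory;
  score: number;
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embeddings must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored memory against the query vector and keep the best matches.
function searchMemories(
  queryEmbedding: number[],
  stored: StoredMemory[],
  topK: number = 5
): ScoredMemory[] {
  return stored
    .map((memory) => ({ memory, score: cosineSimilarity(queryEmbedding, memory.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy vectors stand in for real model output.
const sampleMemories: StoredMemory[] = [
  { id: "m1", text: "Retry with exponential backoff for the payments API", embedding: [0.9, 0.1, 0.0] },
  { id: "m2", text: "Regex for ISO dates", embedding: [0.0, 0.2, 0.9] },
];
const topMatches = searchMemories([1, 0, 0], sampleMemories, 1);
console.log(topMatches[0].memory.id); // "m1"
```

A linear scan like this is usually adequate for a personal collection of a few thousand memories; larger collections would likely need an index, which is the role pgvector plays on the server side.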

How I built it

From day one, I designed Aether Memory as a monorepo with a shared core and multiple “faces”:

Shared package – a TypeScript SDK with:
- LanonasisClient for working with memories.
- A local AI module that generates embeddings via ONNX Runtime Web and transformers.js.
- Shared types for memories, topics, and search results.
Mobile PWA – an ARM-optimized React app built with:
- Vite + React + TypeScript + Tailwind.
- Service worker + Workbox for offline caching and installation.
- Local storage / IndexedDB for offline memory queueing.
VS Code extension – a sidebar that:
- Shows your memory list and semantic search results.
- Lets you capture snippets directly from your editor.
Web dashboard – a browser UI for exploring memories on bigger screens.
Server/API – a Node.js backend (Express/Hono) with:
- PostgreSQL + pgvector for cloud-side semantic search and analytics.
- Auth integration (Clerk/Auth.js) to keep things secure.

Most of the hackathon work went into on-device inference:

I integrated a quantized ONNX version of all-MiniLM-L6-v2.
I ran it through transformers.js and ONNX Runtime Web so the model could execute inside the browser on iPhones, Android phones, M-series Macs, and even Raspberry Pi.
I benchmarked load times, memory usage, and embedding latency across ARM devices, aiming for sub-100 ms embedding time on modern phones and laptops.
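
For anyone who wants to reproduce this kind of latency check, here is a minimal sketch of a timing helper. The names `timeRuns` and `summarizeLatencies` are hypothetical, and the numbers in the usage lines are made-up sample values, not measurements from this project. The clock is injectable so the helper can be exercised with a fake timer; in a browser, `performance.now()` would give finer resolution than the `Date.now()` default used here.

```typescript
interface LatencySummary {
  runs: number;
  medianMs: number;
  maxMs: number;
  meetsTarget: boolean;
}

// Summarize raw timings; the target defaults to the 100 ms goal mentioned above.
function summarizeLatencies(samplesMs: number[], targetMs: number = 100): LatencySummary {
  if (samplesMs.length === 0) throw new Error("Need at least one sample");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const medianMs =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return {
    runs: sorted.length,
    medianMs,
    maxMs: sorted[sorted.length - 1],
    meetsTarget: medianMs < targetMs,
  };
}

// Time an async operation. The first `warmupRuns` calls are discarded so that
// one-off costs such as model loading do not skew the samples.
async function timeRuns(
  operation: () => Promise<unknown>,
  runs: number,
  warmupRuns: number = 1,
  now: () => number = () => Date.now()
): Promise<number[]> {
  for (let i = 0; i < warmupRuns; i++) {
    await operation();
  }
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = now();
    await operation();
    samples.push(now() - start);
  }
  return samples;
}

// Made-up sample timings, only to show the shape of the output.
const exampleSummary = summarizeLatencies([42, 55, 48, 61, 50]);
console.log(exampleSummary.medianMs, exampleSummary.meetsTarget); // 50 true
```

Reporting the median rather than the mean keeps a single slow run (for example, a garbage-collection pause) from dominating the result.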

The architecture is intentionally simple in one sense:

Generate embeddings locally → store them locally → sync to the cloud when it’s safe and convenient.

On-Device AI Model Loading Flow

App Mount → useLocalAI() hook initializes
Vite Config → Pre-bundles @xenova/transformers with ESNext target
First Load → Downloads quantized ONNX model from Hugging Face (~22 MB)
Service Worker → Caches model with CacheFirst strategy
Subsequent Loads → Instant from cache, works offline
Memory Creation → embed() generates vectors on-device
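
The caching step in this flow follows the cache-first pattern: look in the cache, and only go to the network on a miss. The sketch below shows that logic in isolation rather than the actual Workbox configuration. `AssetCache`, `cacheFirst`, and the placeholder URL are assumptions for illustration, and an in-memory map stands in for the browser's Cache API.

```typescript
// Minimal cache abstraction; in a real service worker the Cache API plays this role.
interface AssetCache {
  get(url: string): Promise<Uint8Array | undefined>;
  put(url: string, body: Uint8Array): Promise<void>;
}

type Fetcher = (url: string) => Promise<Uint8Array>;

// Cache-first: return the cached copy if present; otherwise fetch, store, and return it.
async function cacheFirst(url: string, cache: AssetCache, fetcher: Fetcher): Promise<Uint8Array> {
  const cached = await cache.get(url);
  if (cached !== undefined) return cached;
  const fresh = await fetcher(url); // a network failure propagates to the caller
  await cache.put(url, fresh);
  return fresh;
}

// In-memory stand-in for the browser cache, for illustration only.
function createMemoryCache(): AssetCache {
  const store = new Map<string, Uint8Array>();
  return {
    get: async (url) => store.get(url),
    put: async (url, body) => {
      store.set(url, body);
    },
  };
}

// Fake fetcher that counts how often the "network" is hit.
let networkCalls = 0;
const fakeFetcher: Fetcher = async (_url) => {
  networkCalls += 1;
  return new Uint8Array([1, 2, 3]);
};

const modelCache = createMemoryCache();
const MODEL_URL = "https://example.invalid/model.onnx"; // placeholder, not the real model URL

async function demoCacheFirst(): Promise<number> {
  await cacheFirst(MODEL_URL, modelCache, fakeFetcher); // first load goes to the network
  await cacheFirst(MODEL_URL, modelCache, fakeFetcher); // second load is served from cache
  return networkCalls;
}
```

One consequence of this pattern is visible in the sketch: the very first load still needs a connection, which is why the app only becomes fully offline-capable after the model has been cached once.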

The payoff of this local-first design is powerful: the developer never has to “ask permission” from the network to remember something important.

Challenges I faced

This project was not a straight line. A few of the toughest parts:

1. Making on-device AI actually feel usable

Running a model locally is one thing; making it feel smooth is another.

Initial model load times on mobile felt long and fragile.
Memory usage had to stay low so it didn’t “fight” with the rest of the dev tools.
I had to carefully preload, cache, and lazy-initialize the model so that:
- The first experience wasn’t frustrating.
- Subsequent uses felt instant and invisible.

I iterated on caching, prefetching, and user feedback (loading indicators, “model ready” states) until it felt like a companion, not a burden.
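
One way to get the "load once, then feel instant" behavior is to memoize the loading promise and report status changes to the UI. This is a minimal sketch of that idea, not the project's `useLocalAI()` hook: `createLazyModel`, the status names, and the fake loader are all hypothetical.

```typescript
type ModelStatus = "loading" | "ready" | "error";

// Wraps an expensive async loader so it runs at most once, on first use.
// Concurrent callers share the same in-flight promise.
function createLazyModel<T>(
  loader: () => Promise<T>,
  onStatus: (status: ModelStatus) => void = () => {}
): { get(): Promise<T> } {
  let pending: Promise<T> | null = null;
  return {
    get(): Promise<T> {
      if (pending !== null) return pending;
      onStatus("loading");
      const attempt = loader().then(
        (model) => {
          onStatus("ready");
          return model;
        },
        (err) => {
          pending = null; // clear the failed attempt so a later call can retry
          onStatus("error");
          throw err;
        }
      );
      pending = attempt;
      return attempt;
    },
  };
}

// Usage with a fake loader standing in for the real model download.
let loadCount = 0;
const statusLog: ModelStatus[] = [];
const lazyModel = createLazyModel(
  async () => {
    loadCount += 1;
    return "fake-model";
  },
  (status) => {
    statusLog.push(status);
  }
);

// Two callers ask for the model at the same time; the loader still runs once.
const bothCalls = Promise.all([lazyModel.get(), lazyModel.get()]);
```

The status callback is where a loading indicator or a “model ready” badge could be driven from, and resetting the promise on failure lets a flaky first download be retried without reloading the app.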

2. Offline-first is a mindset shift

Most of my previous apps treated offline as an edge case. Here it had to be the default.

That meant:

Designing flows that never block on network.
Queuing writes and sync operations locally.
Handling conflict resolution gracefully.
Being transparent to the user when things are “stored locally and will sync later.”
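
A minimal sketch of the queueing idea, under stated assumptions: the `OfflineQueue` class, the `QueuedMemory` shape, and the `Uploader` signature are hypothetical, and an in-memory array stands in for IndexedDB. Writes are accepted immediately, and a separate flush step uploads whatever is still pending once a connection is available.

```typescript
type SyncState = "pending" | "synced";

interface QueuedMemory {
  id: string;
  text: string;
  createdAt: number;
  syncState: SyncState;
}

// Hypothetical uploader: resolves on success, rejects when offline or on a server error.
type Uploader = (item: QueuedMemory) => Promise<void>;

class OfflineQueue {
  private items: QueuedMemory[] = [];

  // Writes are accepted immediately and never wait for the network.
  enqueue(id: string, text: string, createdAt: number): QueuedMemory {
    const item: QueuedMemory = { id, text, createdAt, syncState: "pending" };
    this.items.push(item);
    return item;
  }

  // Lets the UI show "stored locally and will sync later" for the right count.
  pendingCount(): number {
    return this.items.filter((item) => item.syncState === "pending").length;
  }

  // Upload pending items in order; stop at the first failure and leave the rest queued.
  async flush(upload: Uploader): Promise<number> {
    let uploaded = 0;
    for (const item of this.items) {
      if (item.syncState !== "pending") continue;
      try {
        await upload(item);
      } catch {
        break; // still offline or server unavailable; try again later
      }
      item.syncState = "synced";
      uploaded += 1;
    }
    return uploaded;
  }
}

const outbox = new OfflineQueue();
outbox.enqueue("a", "Incident notes from the train", 1);
outbox.enqueue("b", "Retry pattern idea", 2);
console.log(outbox.pendingCount()); // 2
```

Stopping at the first failure keeps uploads in creation order, at the cost of one bad item blocking the ones behind it; a production queue would probably add per-item retry limits.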

It reminded me that real developers often work in less-than-ideal environments. A good tool has to respect that reality.
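
This write-up does not say which conflict-resolution strategy Aether Memory uses. As one simple possibility, the sketch below shows last-write-wins merging by timestamp. The `VersionedMemory` shape and `mergeLastWriteWins` are hypothetical, and the approach assumes device clocks are roughly in agreement.

```typescript
interface VersionedMemory {
  id: string;
  text: string;
  updatedAt: number; // milliseconds since the Unix epoch
}

// Last-write-wins: for each id, keep whichever copy was updated most recently.
// Ties go to the local copy so unsynced edits are not silently dropped.
function mergeLastWriteWins(
  local: VersionedMemory[],
  remote: VersionedMemory[]
): VersionedMemory[] {
  const merged: { [id: string]: VersionedMemory } = {};
  for (const item of remote) {
    merged[item.id] = item;
  }
  for (const item of local) {
    const existing = merged[item.id];
    if (existing === undefined || item.updatedAt >= existing.updatedAt) {
      merged[item.id] = item;
    }
  }
  // Sort by id so the output order is deterministic.
  return Object.keys(merged)
    .sort()
    .map((id) => merged[id]);
}

const localCopies: VersionedMemory[] = [
  { id: "a", text: "local edit of a", updatedAt: 200 },
  { id: "b", text: "stale local b", updatedAt: 100 },
];
const remoteCopies: VersionedMemory[] = [
  { id: "a", text: "older remote a", updatedAt: 100 },
  { id: "b", text: "newer remote b", updatedAt: 300 },
  { id: "c", text: "remote-only c", updatedAt: 50 },
];
const mergedCopies = mergeLastWriteWins(localCopies, remoteCopies);
console.log(mergedCopies.map((m) => m.text));
// [ 'local edit of a', 'newer remote b', 'remote-only c' ]
```

Last-write-wins is easy to reason about but discards the losing edit entirely, so it fits short personal notes better than long documents edited on several devices at once.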

3. Balancing privacy with usefulness

I wanted this to be a tool I could trust with my own work:

Embeddings generated on the device never leave it unless I choose to sync.
Text content is stored securely, not sprayed across third-party APIs.
The system is built to support future “team/shared memory” without sacrificing the personal nature of the tool.

Designing with privacy at the core forced me to be intentional about what runs where, and how.

4. Scope vs. hackathon timeline

The vision for Aether Memory is big: multi-platform, AI-first, team-ready, and deeply integrated. The hackathon forced me to ask:

“What is the smallest version of this idea that can prove the point and still be lovable?”

So I focused on:

An ARM-optimized on-device embedding engine.
The mobile PWA and VS Code extension as the primary faces.
A clean, opinionated UX instead of trying to ship everything at once.

Saying “not yet” to some roadmap items was hard, but necessary.

What I learned

This project taught me more than just technical lessons.

Technically, I learned:

How far browser-side and on-device AI have come for ARM hardware.
Practical constraints of running transformer models on phones and low-power devices.
How to design an offline-first, sync-later data model that doesn’t feel clunky.
The importance of treating “memory” as more than just storage – it’s about retrieval quality and trust.

Personally, I learned:

A lot of developer pain is emotional, not just technical.
Losing context doesn’t just waste time – it chips away at confidence.
Building tools “for myself” first creates a different kind of honesty.
If I wouldn’t use it on a rough day, I shouldn’t ship it.
ARM devices are not just “mobile variants” of the desktop world; they’re becoming the home base for many developers, especially in regions where mobile is primary.

Most of all, I was reminded that small moments matter – the quick note in transit, the saved insight at midnight, the reconnecting of ideas months later. Aether Memory is an attempt to honour those moments and give them a safe place to live.

What’s next

This hackathon version is just the beginning. The roadmap includes:

On-device voice capture with Whisper for “spoken memories.”
Team-level shared memory banks for engineering squads.
Integrations with tools like Linear/Jira for linking tickets to real-world learnings.
Local LLM chat for talking with your memory, not just searching it.

But even in its current state, Aether Memory already does the one thing I needed most:

It stops my best ideas from disappearing into the air.

If this project can help even a few other developers feel a bit more grounded, a bit more in control of their mental load – especially in environments where connectivity isn’t guaranteed – then the hours spent profiling models on ARM chips were absolutely worth it.

Built With

arm-on-device-ai
auth.js
clerk
express.js
framer-motion
hono
node.js
onnx-runtime-web
pgvector
postgresql
pwa-(offline-first)
react
service-workers
tailwind-css
tanstack-query
transformers.js
typescript
vite
zustand

Submitted to

Arm AI Developer Challenge

Created by

I conceived and led LanOnasis Aether Memory end-to-end — from the original idea of an on-device ARM “memory copilot” to the product design, architecture, and implementation. I designed the monorepo structure and shared TypeScript SDK, implemented the ARM-optimized on-device embedding pipeline (quantized all-MiniLM-L6-v2 via transformers.js and ONNX Runtime Web), and built the offline-first PWA with React/TypeScript/Tailwind, service workers, and local caching. I wired up the backend API with Node.js, Hono/Express, PostgreSQL + pgvector for semantic search, and integrated secure auth. I also handled the developer experience and storytelling: VS Code extension integration, benchmarks on multiple ARM devices, documentation, and this Devpost submission.

Lan Onasis

Updates

Lan Onasis posted an update — Nov 29, 2025 09:54 AM EST

Nov 29: On-Device AI Engine Complete

✅ Transformers.js loads successfully in browser
✅ WebGPU acceleration detected on ARM devices
✅ WASM fallback configured for mobile
✅ Model caching via Service Worker
✅ Remote API integration working

Note for judges: First load downloads a 22 MB model from Hugging Face. Once cached, subsequent loads are served from the cache and work offline. The on-device embedding engine is fully functional on ARM hardware.
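
For readers curious about the WebGPU and WASM items above, backend selection can be reduced to a feature check on `navigator.gpu`, the standard WebGPU entry point. The sketch below is an assumption about how such a check might look, not the project's code, and `pickBackend` and `NavigatorLike` are hypothetical names.

```typescript
type InferenceBackend = "webgpu" | "wasm";

// Structural type so the function can be exercised without a real browser.
interface NavigatorLike {
  gpu?: unknown;
}

// Prefer WebGPU when the browser exposes `navigator.gpu`; otherwise fall back to WebAssembly.
function pickBackend(nav: NavigatorLike | undefined): InferenceBackend {
  const hasWebGpu = nav !== undefined && nav.gpu !== undefined && nav.gpu !== null;
  return hasWebGpu ? "webgpu" : "wasm";
}

// In a browser, the global `navigator` object would be passed in.
console.log(pickBackend({ gpu: {} })); // "webgpu"
console.log(pickBackend({}));          // "wasm"
console.log(pickBackend(undefined));   // "wasm"
```

The presence of `navigator.gpu` does not guarantee a usable GPU: `navigator.gpu.requestAdapter()` can still resolve to `null`, so a more careful check would await that call before committing to WebGPU.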

Lan Onasis posted an update — Nov 29, 2025 09:24 AM EST

Nov 29: SDK & API Integration Fixes

Fixed API endpoint alignment (/memory vs /memories)
Added environment-based API key configuration
Improved transformers.js bundling for faster AI model loading
Memory creation and retrieval now working end-to-end with live backend

The PWA now connects to the deployed Memory-as-a-Service (MaaS) backend at api.lanonasis.com.
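
A minimal sketch of what environment-based configuration could look like. The variable names `LANONASIS_API_KEY` and `LANONASIS_API_URL` are hypothetical (the update does not list the real ones), the `https://` scheme on the default host is an assumption, and the path in the usage lines is only an example since the update mentions both `/memory` and `/memories`.

```typescript
interface ApiConfig {
  baseUrl: string;
  apiKey: string;
}

// Read settings from an env-like object. Variable names are hypothetical.
function resolveApiConfig(env: { [key: string]: string | undefined }): ApiConfig {
  const apiKey = env["LANONASIS_API_KEY"];
  if (apiKey === undefined || apiKey.trim() === "") {
    throw new Error("Missing LANONASIS_API_KEY");
  }
  // Default host taken from the update above; the https scheme is assumed.
  const rawUrl = env["LANONASIS_API_URL"] || "https://api.lanonasis.com";
  return { baseUrl: rawUrl.replace(/\/+$/, ""), apiKey: apiKey.trim() };
}

// Join base URL and path with exactly one slash between them.
function buildUrl(config: ApiConfig, path: string): string {
  return config.baseUrl + "/" + path.replace(/^\/+/, "");
}

const demoConfig = resolveApiConfig({ LANONASIS_API_KEY: " demo-key " });
// Example path only; use whichever route the backend actually exposes.
console.log(buildUrl(demoConfig, "/memories")); // "https://api.lanonasis.com/memories"
```

Keeping route paths in one place, next to the base URL, makes mismatches like the `/memory` vs `/memories` one above easier to spot and fix.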

Lan Onasis started this project — Nov 27, 2025 07:49 AM EST
