Problem Statement
AI systems are no longer just answering questions. They are quietly learning how we think, how we search, what we trust, what we ignore, how we verify, and how we decide.
Every prompt, correction, follow-up search, and browsing pattern becomes behavioral data. Over time, that data turns into a high-value memory layer: not just what you asked, but how you operate.
The problem is that this memory is usually captured inside closed AI platforms.
- Users do not truly see what is being inferred about them.
- Users do not control how that behavioral memory is stored.
- Users cannot easily move that memory to another provider.
- Switching providers means losing the compounding value of learned context.
So the real problem is not only model lock-in. It is behavioral lock-in.
Why This Matters
This is important because behavioral data is becoming the new moat.
In older software platforms, lock-in came from files, messages, or customer records. In AI systems, lock-in is moving one layer deeper:
the learned model of how a person works
That includes things like:
- which sources they trust
- whether they verify before trusting an answer
- whether they prefer technical docs or summaries
- how they compare products before buying
- how they learn new topics
- when an AI answer is "good enough" versus when they keep searching
That memory becomes more valuable every day it is accumulated. And if it stays trapped inside one provider, the user loses leverage.
This matters now because AI platforms are consolidating fast. If users do not get ownership over this behavioral layer early, the default future is one where:
- platforms accumulate lifelong behavioral memory
- users cannot audit it clearly
- users cannot port it cleanly
- the cost of leaving becomes too high
Core Thesis
Behavioral memory should belong to the user.
Not to a single model vendor.
Not to a hidden proprietary memory store.
Not to a platform that can infer endlessly without portability.
Users should be able to:
- capture behavioral signals locally
- structure them into portable memory
- inspect what is being learned
- carry that memory across providers
That is the problem we are addressing.
Inspiration
We were inspired by a simple shift in how AI products are evolving.
The first generation of AI products competed on model quality: better benchmarks, bigger context windows, faster answers. But the next layer of competition is not just the model. It is the memory around the model.
As people work with AI more often, the system starts learning patterns:
- what kinds of sources they trust
- what kind of answers they verify
- how they research
- how they shop
- how they learn
- what they ignore and what they care about
That memory compounds. It becomes the real value. And right now, most users have almost no visibility into it, no ownership over it, and no portability if they want to switch providers.
That felt like an important problem to prototype against right now. We wanted to build an early version of infrastructure that says: if behavioral memory is being created, it should be captured in a form the user can own, inspect, and move.
What it does
DataSelf captures behavioral interaction data locally and structures it into portable memory layers.
At a high level, the system:
- Collects browser telemetry from real user activity.
- Groups that activity into task-oriented human + agent sessions.
- Derives higher-level behavioral memory from those sessions.
- Exports that memory into open JSON files that another provider or agent can reuse.
Instead of treating memory as a black box inside one model vendor, DataSelf breaks it into a hierarchy:
- raw telemetry
- sessions
- patterns
- preferences
- trust profile
- persona profile
This makes the memory legible and portable.
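As a concrete illustration, the exported hierarchy could look like the sketch below. All field names here are our illustrative assumptions, not a fixed schema:

```javascript
// Illustrative shape of the exported memory layers (field names are
// assumptions, not the project's exact schema).
const memoryBundle = {
  version: "0.1",
  sessions: [{ id: "s1", intent: "research", events: 42 }],
  patterns: [{ kind: "verify_before_trust", support: 3 }],
  preferences: [{ topic: "docs", value: "prefers technical docs over summaries" }],
  trustProfile: { verifiesAnswers: true, trustedDomains: ["developer.mozilla.org"] },
  personaProfile: { summary: "Careful researcher who cross-checks sources." },
};

// Because every layer is plain JSON, each one can be written out on its own,
// e.g. as patterns.json:
const patternsJson = JSON.stringify(memoryBundle.patterns, null, 2);
```

Keeping the layers as plain JSON is what makes them inspectable and diffable outside any one vendor's runtime.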
For the demo, we focused on three user behaviors:
- research intent
- shopping preference
- learning style
The system can capture signals from those workflows, infer patterns over time, and visualize them in a dashboard.
How we built it
We built DataSelf as a local-first Node.js system with a lightweight but explicit memory pipeline.
1. Telemetry collection
We launch or connect to a real Chrome session using Playwright over CDP and observe browser activity:
- navigation events
- search queries
- scroll depth
- dwell time
- clicks
- network request activity
Those events are grouped into prompt-action pairs and written into a local SQLite database.
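A minimal sketch of that collection loop, assuming Chrome is running with `--remote-debugging-port=9222`. The function names here are illustrative, not the project's exact code:

```javascript
// Pure helper: normalize a raw URL into a telemetry record (easy to test).
function navigationEvent(url, timestamp) {
  const u = new URL(url);
  return { type: "navigation", host: u.hostname, path: u.pathname, timestamp };
}

// Hypothetical collection loop: attach to a Chrome session that exposes CDP
// and forward navigation and network events to a sink callback.
async function collectTelemetry(onEvent) {
  const { chromium } = await import("playwright"); // loaded lazily
  const browser = await chromium.connectOverCDP("http://localhost:9222");
  const context = browser.contexts()[0];
  for (const page of context.pages()) {
    page.on("framenavigated", (frame) => {
      // Only record top-level navigations, not iframe loads.
      if (frame === page.mainFrame()) {
        onEvent(navigationEvent(frame.url(), Date.now()));
      }
    });
    page.on("request", (req) => {
      onEvent({ type: "network", url: req.url(), timestamp: Date.now() });
    });
  }
}
```

Signals like scroll depth and dwell time need injected page scripts on top of this, but the connect-and-listen loop is the backbone.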
2. Human + agent session capture
We then built a session layer on top of raw telemetry.
Instead of only storing explicit thumbs-up or thumbs-down feedback, the system records:
- behavior before an agent answer
- the agent answer itself
- behavior after the answer
- implicit outcome signals
This was important because in the real world, users do not rate every answer. What they do is continue searching, verify, refine, compare, or stop. That behavior is the actual signal.
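That before/after split can be sketched as a small session builder. The field names and the outcome rule are illustrative assumptions:

```javascript
// Build one human + agent session record, splitting observed behavior
// around the answer (field names are illustrative, not the exact schema).
function buildSession(prompt, answer, events) {
  const before = events.filter((e) => e.timestamp < answer.timestamp);
  const after = events.filter((e) => e.timestamp >= answer.timestamp);
  // Implicit outcome: continued searching after the answer suggests it was
  // not "good enough"; going quiet suggests it was accepted.
  const keptSearching = after.some((e) => e.type === "search");
  return {
    prompt,
    answer: answer.text,
    preAnswerEvidence: before,
    postAnswerEvidence: after,
    inferredOutcome: keptSearching ? "continued_searching" : "accepted",
  };
}
```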
3. Memory hierarchy builder
We added a rule-based memory builder that reads the SQLite store and generates exportable memory layers:
- sessions.json
- patterns.json
- preferences.json
- trust_profile.json
- persona_profile.json
- memory_bundle.json
The memory builder translates repeated sessions into patterns, patterns into preferences, and preferences into a more compact persona-style summary.
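A simplified sketch of that rollup, with an assumed support threshold rather than the project's tuned values:

```javascript
// Rule-based rollup: repeated session outcomes become patterns with a
// support count, and well-supported patterns are promoted to preferences.
function buildPatterns(sessions) {
  const counts = new Map();
  for (const s of sessions) {
    const key = `${s.intent}:${s.inferredOutcome}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([key, support]) => ({ key, support }));
}

function buildPreferences(patterns, minSupport = 2) {
  // minSupport is an assumption: a pattern seen once is not yet a preference.
  return patterns
    .filter((p) => p.support >= minSupport)
    .map((p) => ({ derivedFrom: p.key, support: p.support }));
}
```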
4. Visualizer
Finally, we built a local visualizer to make the memory hierarchy visible over time. The dashboard shows:
- memory timeline
- extracted patterns
- inferred preferences
- trust profile
- persona profile
- session drill-down
This gives us a concrete way to show that portable behavioral memory is not just an abstract idea. It can be inspected and carried.
Challenges we ran into
The hardest challenge was deciding what kind of memory actually matters.
Raw browser telemetry is easy to collect, but raw events alone are not very meaningful. We had to figure out how to move from low-level signals like navigation and scroll events into something that actually represents user behavior.
Another challenge was avoiding a fake feedback loop. At first, we considered a system where the user explicitly rates every agent answer. That felt unrealistic: most people are not going to do that in normal usage. So we shifted toward implicit behavioral evidence, which made the system much more believable but also harder to model.
We also had to think carefully about portability. A lot of memory systems are useful only because they remain internal and proprietary. We wanted a representation that was simple enough to export, structured enough to be useful, and explicit enough that another agent could load it later.
On the implementation side, there were practical issues too:
- browser instrumentation across real pages
- structuring event data cleanly
- defining session boundaries
- distinguishing before-answer and after-answer behavior
- making the hierarchy readable instead of noisy
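As one example of those issues, session boundaries can be approximated with an idle-gap rule. The 10-minute gap below is an assumption, not a tuned value:

```javascript
// Assumed rule: a new session starts after 10 minutes of inactivity.
const SESSION_GAP_MS = 10 * 60 * 1000;

function splitIntoSessions(events) {
  const sessions = [];
  let current = [];
  let lastTs = -Infinity;
  for (const e of [...events].sort((a, b) => a.timestamp - b.timestamp)) {
    if (e.timestamp - lastTs > SESSION_GAP_MS && current.length > 0) {
      sessions.push(current); // idle gap exceeded: close the session
      current = [];
    }
    current.push(e);
    lastTs = e.timestamp;
  }
  if (current.length > 0) sessions.push(current);
  return sessions;
}
```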
And because this is a local-first prototype, we also had to build a workflow that is transparent enough for a demo while still feeling like it points toward a real product.
Accomplishments that we're proud of
The biggest thing we are proud of is that DataSelf does not stop at telemetry collection. It actually demonstrates the beginning of a portable behavioral memory system.
We are proud that we built:
- a working browser telemetry collector
- a human + agent session model
- an implicit-outcome based memory flow
- a multi-layer memory export pipeline
- a local dashboard to inspect what is being learned
We are also proud that the architecture reflects the thesis of the project. SQLite is the working memory store, but the real portability layer lives in exported JSON files. That separation matters because it reinforces the core product idea: user memory should not be trapped in a proprietary runtime.
And finally, we are proud that the system is opinionated. It is not trying to be a generic AI wrapper. It is trying to answer a specific question:
what would it look like if users actually owned the behavioral memory created by working with AI?
What we learned
We learned that behavioral memory is much more concrete than it first appears.
Once you start capturing sessions, you quickly see that meaningful patterns emerge from very ordinary activity:
- where a user goes first
- what they check after an answer
- which sources they return to
- when they stop
- when they refine
We also learned that the right abstraction is not just "chat history" or "saved prompts." The more useful abstraction is behavior around tasks.
That includes:
- user prompt
- agent answer
- pre-answer evidence
- post-answer evidence
- observed behavior
- inferred outcome
That structure feels much closer to how memory should work for portability.
We also learned that confidence matters. If the system only has one or two sessions, it should not pretend to know the user deeply. So even in the prototype, we started treating memory as something that should accumulate gradually and expose uncertainty.
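One simple way to model that gradual accumulation is a saturating confidence score; the decay factor and labels below are assumptions, not the prototype's exact numbers:

```javascript
// Saturating confidence: each new supporting session raises confidence by a
// shrinking amount, so the profile never claims certainty from little
// evidence (the 0.8 decay factor is an assumption).
function confidenceFromSupport(supportingSessions, decay = 0.8) {
  return 1 - Math.pow(decay, supportingSessions);
}

// A memory entry can then expose its own uncertainty instead of hiding it.
function labelConfidence(c) {
  if (c < 0.4) return "tentative";
  if (c < 0.75) return "emerging";
  return "established";
}
```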
What's next for DataSelf
The next step is to deepen the memory hierarchy and make it more useful across providers.
Short term, we want to:
- capture more sessions across research, shopping, and learning
- improve pattern extraction with better confidence scoring
- separate stable preferences from temporary behavior
- enrich trust profiling with stronger verification signals
- make the persona export easier for another agent to ingest directly
We also want to add a more explicit import story:
- export memory from DataSelf
- load it into a second agent environment
- continue the interaction without losing behavioral context
Longer term, we think DataSelf could evolve into user-sovereign infrastructure for AI memory:
- local behavioral audit trail
- provider-agnostic memory bundle
- portable schema for personal context
- policy and boundary layers for what should and should not be inferred
The big vision is simple:
AI systems are going to learn from us.
The question is whether that learning belongs to the platform or to the user.
We want to make the answer: the user.