Daki Life — A Living Knowledge Graph for Your Mind

Inspiration

Journaling works — but nobody does it consistently. It requires carving out dedicated time, staring at a blank page, and knowing what to say. Most people give up before the habit forms. And even when they don't, their thoughts pile up in a flat list they never return to.

Daki Life turns the time you're already spending into a journaling habit. Run a Pomodoro, work, and when the timer ends, write one short note about what was on your mind. Thirty seconds. No blank page, no pressure. Over a session, those notes stop being a list and start becoming a map. Your thoughts self-organize into clusters — Health, Creativity, Relationships — with sub-topics nested inside. Thoughts that don't fit float as outliers instead of getting forced into a box. The map rebuilds in real time, so by the time you're done working, you can see exactly how your mind was moving.

Your ideas don't disappear anymore. They find each other.


What It Does

Daki Life is a focus journal that builds a living semantic knowledge graph from your reflection notes.

  • Focus — Pomodoro-style timer with structured reflection prompts at the end of each work block.
  • Graph — Interactive semantic knowledge graph that automatically maps your notes into clustered life themes like Health, Creativity, and Relationships.
  • Home — At-a-glance stats: sessions logged, top clusters by volume, and where your mental energy is actually going.

You don't just see what you wrote. You see who you are becoming.


Social Impact & Value

Daki Life sits at the intersection of mental wellness, equitable access, and personal development.

  • Therapy can cost $150–$300/hour. Daki Life surfaces emotional and cognitive patterns for free, giving users a form of structured self-reflection previously available only to those who could afford support.
  • Self-awareness is a learned skill — and most people never get the tools to develop it. Seeing that 80% of your notes cluster around Work Stress and almost nothing lives in Relationships is a moment of clarity no to-do app can offer.
  • Burnout is invisible until it isn't. The graph shows neglected life areas before they become crises.

This is especially meaningful for students navigating identity formation, academic pressure, and emotional overload — often without support structures.


The AI Pipeline

This is where the data science lives. The graph isn't a feature — it is the product.

ClusterNode Object

The backend's core data structure is the ClusterNode object, which represents one node of the semantic journal graph.

| Field | Type | Description |
| --- | --- | --- |
| id | str | Unique UUID for the ClusterNode |
| parent_id | Optional[str] | ClusterNode parent (for traversing backwards) |
| depth | int | Depth level of the ClusterNode |
| note_ids | list[str] | All note IDs within the ClusterNode |
| children | list[ClusterNode] | Children ClusterNodes (for traversing forwards) |
| coordinates_2d | dict[str, dict] | 2D coordinates for each note within the ClusterNode |
| semantic_centroid | Optional[np.ndarray] | Each note's text content is given a 1536-dimensional vector; the semantic centroid is the average of all these vectors |
| label | Optional[str] | All note texts undergo TF-IDF keyword extraction, producing a raw keyword list, which is then passed to a GPT-4o-mini API call to generate a label |
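
As a rough illustration, the fields listed above could be expressed as a Python dataclass along these lines. This is a sketch, not the project's actual class: the `add_child` and `mean_vector` helpers are hypothetical, and the centroid is held as a plain list of floats instead of an `np.ndarray` to keep the example dependency-free.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ClusterNode:
    """One node of the cluster tree, mirroring the documented fields."""
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    parent_id: Optional[str] = None
    depth: int = 0
    note_ids: List[str] = field(default_factory=list)
    children: List["ClusterNode"] = field(default_factory=list)
    coordinates_2d: Dict[str, dict] = field(default_factory=dict)
    # Assumption: a list of floats stands in for the np.ndarray centroid.
    semantic_centroid: Optional[List[float]] = None
    label: Optional[str] = None

    def add_child(self, child: "ClusterNode") -> "ClusterNode":
        """Attach a child and keep parent_id and depth consistent (hypothetical helper)."""
        child.parent_id = self.id
        child.depth = self.depth + 1
        self.children.append(child)
        return child


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Component-wise average of note embeddings, i.e. the semantic centroid."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]
```

`parent_id` supports walking up the tree and `children` supports walking down, so either direction of traversal needs only one lookup per step.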

Embedding

Notes are embedded at write-time via text-embedding-3-small (1536 dimensions) and stored in Supabase + pgvector. No retrieval-time re-embedding — every note already lives in semantic space.

UMAP — Dimensionality Reduction

We run UMAP in two separate passes:

  • 1536D → 8D for density-aware clustering, preserving local neighbourhood structure
  • 1536D → 2D for graph layout (min_dist=0.1), so semantically similar notes appear physically close

UMAP is preferred over t-SNE because it tends to preserve more of the global structure: clusters that are semantically related stay near each other in 2D, not just internally tight.
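
The two passes could be configured roughly as follows with umap-learn. Only the target dimensions and `min_dist=0.1` for the layout pass come from the text; `n_neighbors`, the cosine metric, `min_dist=0.0` for the clustering pass, and the function names are assumptions made for this sketch.

```python
from typing import Any, Dict


def umap_params(purpose: str) -> Dict[str, Any]:
    """Return keyword arguments for one of the two UMAP passes."""
    if purpose == "cluster":
        # 8D pass feeding the clusterer.
        # Assumptions: min_dist=0.0, n_neighbors=15, cosine metric.
        return {"n_components": 8, "min_dist": 0.0, "n_neighbors": 15, "metric": "cosine"}
    if purpose == "layout":
        # 2D pass for drawing. min_dist=0.1 is from the text; the rest is assumed.
        return {"n_components": 2, "min_dist": 0.1, "n_neighbors": 15, "metric": "cosine"}
    raise ValueError(f"unknown purpose: {purpose!r}")


def reduce_embeddings(embeddings, purpose: str, seed: int = 42):
    """Project an (n_notes, 1536) array with umap-learn, imported lazily."""
    import umap  # provided by the umap-learn package

    reducer = umap.UMAP(random_state=seed, **umap_params(purpose))
    return reducer.fit_transform(embeddings)
```

Both passes start from the original 1536-dimensional embeddings, so the layout is not a projection of the 8D clustering space.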

HDBSCAN — Recursive Hierarchical Clustering

HDBSCAN runs recursively on the 8D output to produce a multi-depth cluster tree:

Root ("Life")
├── Health
│   ├── Running
│   └── Sleep
├── Creativity
└── Relationships
    └── Deep Connection

min_cluster_size adapts by depth:

  • depth 0 (root): max(10, floor(n / 16)) → 5–8 broad life themes
  • depth 1+: 3 → tight sub-topic groups

Outlier notes (label -1) are never force-assigned to the nearest cluster — they surface as standalone nodes. Every thought counts.
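
A small sketch of the depth rule and the outlier handling described above. The function names are hypothetical, and `labels` is assumed to be the per-note label list a clusterer such as HDBSCAN returns, with -1 marking noise.

```python
import math
from typing import Dict, List, Tuple


def min_cluster_size(depth: int, n_notes: int) -> int:
    """Depth-adaptive minimum cluster size: broad at the root, tight below it."""
    if depth == 0:
        return max(10, math.floor(n_notes / 16))
    return 3


def split_labels(
    note_ids: List[str], labels: List[int]
) -> Tuple[Dict[int, List[str]], List[str]]:
    """Group note IDs by cluster label and keep label -1 notes as outliers."""
    clusters: Dict[int, List[str]] = {}
    outliers: List[str] = []
    for note_id, label in zip(note_ids, labels):
        if label == -1:
            outliers.append(note_id)  # never reassigned to the nearest cluster
        else:
            clusters.setdefault(label, []).append(note_id)
    return clusters, outliers
```

For a corpus of 250 notes the root-level value is max(10, floor(250 / 16)) = 15.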

C-TF-IDF — Discriminative Label Extraction

Each cluster is treated as a single document. C-TF-IDF surfaces words that are specific to a cluster — high within it, rare across siblings:

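$$\text{score}(\text{term}, \text{cluster}) = \text{tf}(\text{term}, \text{cluster}) \times \text{idf}(\text{term}, \text{all clusters})$$

where

$$\text{tf} = 1 + \ln(\text{count}), \qquad \text{idf} = \ln\left(\frac{1 + m}{1 + df}\right) + 1$$

Here count is the number of times the term occurs in the cluster, m is the number of clusters being compared, and df is the number of those clusters that contain the term.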

Top-10 n-grams per cluster are passed to GPT-4o-mini for final 1–3 word human-readable labels like "Creative Blocks" or "Deep Connection" — no user input required.
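
The scoring step can be sketched in plain Python as below. It is a simplified illustration that works on pre-tokenized unigrams; the function names are hypothetical, and n-gram extraction and the GPT-4o-mini labelling call are left out.

```python
import math
from collections import Counter
from typing import Dict, List


def c_tf_idf(cluster_tokens: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Score terms per cluster with tf = 1 + ln(count) and a smoothed idf."""
    m = len(cluster_tokens)  # each cluster counts as one document
    counts = {name: Counter(tokens) for name, tokens in cluster_tokens.items()}
    # df: number of clusters in which a term appears at least once
    df = Counter(term for c in counts.values() for term in c)
    scores: Dict[str, Dict[str, float]] = {}
    for name, c in counts.items():
        scores[name] = {
            term: (1 + math.log(n)) * (math.log((1 + m) / (1 + df[term])) + 1)
            for term, n in c.items()
        }
    return scores


def top_terms(scores: Dict[str, float], k: int = 10) -> List[str]:
    """Highest-scoring terms first; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

A term that appears in every cluster gets the minimum idf of 1, so it only ranks highly if it is also very frequent inside the cluster.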

Identity Persistence — Jaccard Matching

On every rebuild, new clusters are matched to old DB records via Jaccard similarity on note membership:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \quad \text{match if } J > 0.5$$

Labels are reused when the cluster's semantic centroid hasn't drifted — keeping your graph stable as new notes arrive. The full rebuild runs in under 3 seconds for ~250 notes.
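
A minimal sketch of the matching rule, assuming each cluster is represented by the set of its note IDs. `match_cluster` is a hypothetical helper that returns the best old cluster whose similarity is strictly above 0.5; the centroid-drift check for label reuse is not shown.

```python
from typing import Dict, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_cluster(
    new_notes: Set[str],
    old_clusters: Dict[str, Set[str]],
    threshold: float = 0.5,
) -> Optional[str]:
    """Return the ID of the best-matching old cluster with J > threshold, else None."""
    best_id: Optional[str] = None
    best_score = threshold
    for old_id, old_notes in old_clusters.items():
        score = jaccard(new_notes, old_notes)
        if score > best_score:
            best_id, best_score = old_id, score
    return best_id
```

For example, a rebuilt cluster {a, b, c, d} matched against an old cluster {a, b, c} gives J = 3/4, so the old record and its identity are kept.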

Cluster Metrics

Four scores are computed per cluster and min-max normalized across the full tree so they're always comparable regardless of corpus size.

Taxonomic Complexity — how much a cluster has structurally fractured into specific sub-thoughts. Instead of counting raw depth, it measures how deeply nested and populated the subtree is:

$$\text{TC} = \sum_{i \,\in\, \text{sub-clusters}} \bigl(\text{depth}_i \times \log(\text{note\_count}_i)\bigr)$$
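
One way to read the formula is sketched below. Two points are assumptions: depth is taken relative to the cluster being scored (direct children have depth 1), and the logarithm is natural. The `Node` class is a stand-in for illustration only.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Minimal stand-in for a cluster: its note count and its sub-clusters."""
    note_count: int
    children: List["Node"] = field(default_factory=list)


def taxonomic_complexity(cluster: Node) -> float:
    """Sum depth_i * ln(note_count_i) over every descendant sub-cluster."""
    total = 0.0
    stack = [(child, 1) for child in cluster.children]  # (node, relative depth)
    while stack:
        node, depth = stack.pop()
        total += depth * math.log(node.note_count)
        stack.extend((child, depth + 1) for child in node.children)
    return total
```

A cluster with no sub-clusters scores 0, and each additional level of nesting weighs its notes more heavily than the level above.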

Information Density — vocabulary richness weighted by term specificity:

$$ID = \frac{U}{\sqrt{T}} \times \frac{1}{K} \sum_{k=1}^{K} w_k$$

where U = unique token types with non-zero TF-IDF weight, T = total raw word count, and w_1 through w_K are the top K TF-IDF scores (K = 8).
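
A short sketch of the calculation, assuming the per-term TF-IDF weights for the cluster are already available as a dictionary. When a cluster has fewer than K weighted terms, this version averages over the terms that exist, which is an assumption and not something the text specifies.

```python
import math
from typing import Dict


def information_density(tfidf: Dict[str, float], total_words: int, k: int = 8) -> float:
    """U / sqrt(T) multiplied by the mean of the top-k TF-IDF weights."""
    weights = sorted((w for w in tfidf.values() if w > 0), reverse=True)
    if not weights or total_words <= 0:
        return 0.0
    unique_types = len(weights)  # U: terms with non-zero weight
    top = weights[:k]            # w_1 ... w_K
    return unique_types / math.sqrt(total_words) * (sum(top) / len(top))
```

Dividing by the square root of T instead of T keeps long clusters from being penalized linearly for their length.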

Semantic Cohesion — the mean cosine similarity of all note embeddings to the cluster centroid. High scores mean notes are tightly focused around a single idea:

$$\text{Cohesion} = \frac{1}{N} \sum_{i=1}^{N} \frac{v_i \cdot c}{\|v_i\| \, \|c\|}$$

Semantic Divergence — how distinct a cluster's thinking is from your overall baseline. Cosine distance between the cluster centroid c and the global centroid c_global (the average of every note in your database):

$$\text{Divergence} = 1 - \frac{c \cdot c_{global}}{\|c\| \, \|c_{global}\|}$$

A high divergence score signals niche interests or exploratory ideas far from your average thought.
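
Cohesion and divergence both reduce to cosine similarity against a centroid. The sketch below uses plain Python lists to stay dependency-free; the function names are hypothetical, and a production version would more likely use NumPy arrays.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, returning 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cohesion(vectors: List[Vector]) -> float:
    """Mean cosine similarity of each note embedding to the cluster centroid."""
    c = centroid(vectors)
    return sum(cosine(v, c) for v in vectors) / len(vectors)


def divergence(cluster_vectors: List[Vector], all_vectors: List[Vector]) -> float:
    """Cosine distance between the cluster centroid and the global centroid."""
    return 1.0 - cosine(centroid(cluster_vectors), centroid(all_vectors))
```

Two orthogonal notes such as [1, 0] and [0, 1] give a cohesion of about 0.707, while identical notes give 1.0.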

All four scores are normalized to [0, 1]:

$$x_{norm} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
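
A minimal sketch of the normalization step, applied to one metric across all clusters. The formula is undefined when every cluster has the same score; returning 0.0 in that case is an assumption of this example.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one metric across all clusters to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: identical scores all map to 0.0 to avoid dividing by zero.
        return {cluster_id: 0.0 for cluster_id in scores}
    return {cluster_id: (v - lo) / (hi - lo) for cluster_id, v in scores.items()}
```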


Data Science Quality

| Pipeline Stage | Method | Why |
| --- | --- | --- |
| Embedding | text-embedding-3-small | Strong general-purpose embeddings at low cost; stored once at write-time |
| Dim reduction | UMAP (two-pass) | Tends to preserve more global manifold structure than t-SNE; typically faster |
| Clustering | HDBSCAN (recursive) | Noise-robust; no need to pre-specify k |
| Label extraction | C-TF-IDF + GPT-4o-mini | Discriminative, not just frequent |
| Edge weights | Cosine similarity via L2-normalized dot product | Efficient and numerically stable |

Data is never sold, never shared, and never used to train models. All embeddings and notes are scoped per-user via Supabase Row Level Security.


Clarity & Communication

The product is designed around one principle: insight should feel effortless.

  • Semantically similar notes appear physically close on screen — no legend needed
  • Cluster circle radius scales with note count: r = max(9, min(30, 7 + 1.8√|notes|)) — larger clusters look bigger
  • Outlier nodes float freely rather than being hidden
  • The home dashboard surfaces your top clusters and streaks without overwhelming detail
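
The radius rule in the list above is easy to check numerically. The app renders in React Native, so this Python version only illustrates the arithmetic; the function name is hypothetical.

```python
import math


def cluster_radius(note_count: int) -> float:
    """r = max(9, min(30, 7 + 1.8 * sqrt(note_count)))."""
    return max(9.0, min(30.0, 7.0 + 1.8 * math.sqrt(note_count)))
```

With these constants a single-note cluster sits at the floor of 9, a 25-note cluster gets a radius of about 16, and anything past roughly 164 notes is capped at 30.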

You don't need to understand UMAP to feel what it shows you.


Innovation

Most AI journaling tools use LLMs as a chat interface — you ask, it answers. Daki Life inverts this: the AI observes, and you discover.

What's novel:

  • Recursive semantic clustering applied to personal reflection — not documents or codebases, but lived experience
  • Jaccard-based cluster identity persistence so insights feel stable, not chaotic, as your corpus grows
  • Outlier preservation — treating noise as signal, not garbage

The graph isn't a visualization of your notes. It's a model of your cognition.


Tech Stack

| Layer | Tech |
| --- | --- |
| Mobile | React Native (Expo), TypeScript, Expo Router |
| Graph rendering | react-native-svg, D3 force simulation |
| Gestures | React Native Gesture Handler v2, Reanimated |
| API | Node.js, Express, TypeScript |
| ML sidecar | Python, FastAPI |
| ML libraries | umap-learn, hdbscan, scikit-learn |
| LLM | OpenAI GPT-4o-mini (labels), text-embedding-3-small (embeddings) |
| Database | Supabase (Postgres + pgvector + Auth + Realtime) |

What's Next

  • Mood and energy tagging to correlate emotional states with semantic clusters over time
  • Weekly reflection digests — an AI-generated summary of how your thinking has shifted in 7 days
  • Accessibility-first redesign for users with ADHD and anxiety who struggle with open-ended journaling

Built for the AI For Good Hackathon in partnership with ACM-W. Because understanding yourself is one of the most meaningful things AI can help with.
