Better Journal
Inspiration
Most journaling apps treat entries as static text. You write, you save, you forget. But emotions aren’t captured in words alone. A photo from a hike, a quick doodle during a stressful meeting, the way you press harder on the screen when you’re frustrated: these all carry emotional signal that traditional journals completely ignore.
We were inspired by affective computing research (Kang 2014, Kim 2018) and the Valdez-Mehrabian color-emotion model to ask a simple question: what if a journal could understand how you actually feel by fusing text, images, and drawings together, all without ever sending your data to a server?
We also noticed that existing mood-tracking apps force you to self-report with a single emoji tap, which is reductive and often inaccurate. We wanted mood detection that adapts to you over time, learning your personal emotional baseline and distinguishing between “you’re sad” and “you’re sadder than your usual self.”
What it does
Better Journal is a native iOS journaling app with a multimodal, on-device mood intelligence system. You journal naturally by writing, attaching photos, and sketching, and the app silently analyzes everything to build a rich emotional portrait of each entry.
The core pipeline works in six layers:
- Layer 1: Signal Extraction. Three independent analyzers extract emotion signals from different input types:
  - A text engine using a 500-word emotion lexicon with negation/intensifier handling and Apple’s NLTagger
  - An image engine using the Vision framework for facial expression geometry, color theory (HSB histograms), and scene classification
  - A drawing engine that analyzes stroke kinematics such as pressure variance, velocity, jaggedness, and warm/cool color ratios
- Layer 2: Calibration. Each modality signal is temperature-scaled to normalize confidence levels.
- Layer 3: Late Fusion. Signals are merged using confidence-weighted averaging with Bayesian precision-weighting on the valence/arousal dimensions.
- Layer 4: Kalman Smoothing. A temporal filter prevents emotional whiplash between entries by treating each mood observation as a noisy measurement of an underlying latent mood state.
- Layer 5: Baseline Normalization. Your mood is normalized against your personal baseline so the app can tell you “this is unusually positive for you” rather than just “this is positive” (a small sketch of this step follows the list).
- Layer 6: Personality Derivation. Long-term traits like emotional stability, optimism bias, reactivity, and recovery rate are derived from accumulated mood data.
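To make Layer 5 concrete, here is a minimal sketch of baseline normalization assuming an exponentially weighted running mean and variance; the type and member names (PersonalBaseline, deviation) are illustrative, not the app’s actual API.

```swift
import Foundation

// Illustrative sketch: track the user's usual valence, then report each new
// entry as a deviation from that personal baseline.
struct PersonalBaseline {
    private(set) var mean = 0.0          // running mean valence in -1...1
    private(set) var variance = 0.25     // running variance (starts wide)
    private let alpha = 0.1              // how quickly the baseline adapts

    /// Fold a new smoothed valence observation into the baseline.
    mutating func update(with valence: Double) {
        let delta = valence - mean
        mean += alpha * delta
        variance = (1 - alpha) * (variance + alpha * delta * delta)
    }

    /// How unusual this entry is *for this user*, in standard deviations:
    /// above +1 reads as "unusually positive for you", below -1 as
    /// "lower than your usual self".
    func deviation(of valence: Double) -> Double {
        let std = max(variance.squareRoot(), 0.05)   // floor avoids overreacting early on
        return (valence - mean) / std
    }
}
```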
On top of this, the app also provides:
- Personalized coaching insights and mood-aware quotes powered by Apple’s on-device Foundation Models
- Weekly insights, mood heatmaps, sparkline charts, and recap card stacks
- A warm, cream-and-lavender design system with haptic feedback throughout
How we built it
The entire app is built in Swift and SwiftUI, targeting iOS with SwiftData for persistence. The architecture is actor-based for concurrency safety. Every analysis engine (image, drawing, text, fusion, insight, personality) is an isolated Swift actor, which lets us run modality extraction concurrently using async let without data races.
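A condensed sketch of that layout, with illustrative type and method names rather than the app’s real API; because each engine is an actor and the three extractions are independent, async let runs them concurrently with no shared mutable state to race on.

```swift
import Foundation

struct EmotionSignal {
    var distribution: [String: Double] = [:]   // 12-category emotion probabilities
    var valence = 0.0, arousal = 0.0, confidence = 0.0
}

// One isolated actor per modality engine (analysis bodies elided).
actor TextEmotionEngine    { func analyze(_ text: String)  async -> EmotionSignal { EmotionSignal() } }
actor ImageEmotionEngine   { func analyze(_ image: Data)   async -> EmotionSignal { EmotionSignal() } }
actor DrawingEmotionEngine { func analyze(_ drawing: Data) async -> EmotionSignal { EmotionSignal() } }

func extractSignals(text: String, image: Data, drawing: Data,
                    textEngine: TextEmotionEngine,
                    imageEngine: ImageEmotionEngine,
                    drawingEngine: DrawingEmotionEngine) async -> [EmotionSignal] {
    // The modalities have no dependency on each other, so they are
    // extracted in parallel and awaited together.
    async let textSignal    = textEngine.analyze(text)
    async let imageSignal   = imageEngine.analyze(image)
    async let drawingSignal = drawingEngine.analyze(drawing)
    return await [textSignal, imageSignal, drawingSignal]
}
```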
Text Analysis
- Layered a custom 500-word emotion lexicon mapping to 12 emotion categories on top of Apple’s NLTagger for sentence-level valence
- Added contextual modifiers for negation (“not happy”) and intensifiers (“very anxious”)
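A toy version of that lexicon pass; the lexicon entries, modifier weights, and look-back window below are invented for illustration, while the NLTagger call is the standard sentence-level sentiment API.

```swift
import Foundation
import NaturalLanguage

// Tiny illustrative lexicon (the real one maps ~500 words to 12 categories).
let lexicon: [String: (category: String, valence: Double)] = [
    "happy":   (category: "joy",     valence: 0.8),
    "anxious": (category: "anxiety", valence: -0.6),
    "calm":    (category: "calm",    valence: 0.4),
]
let intensifiers: [String: Double] = ["very": 1.5, "slightly": 0.5]
let negations: Set<String> = ["not", "never", "no"]

func lexiconValence(for sentence: String) -> Double {
    let words = sentence.lowercased()
        .components(separatedBy: CharacterSet.whitespacesAndNewlines.union(.punctuationCharacters))
        .filter { !$0.isEmpty }
    var total = 0.0
    for (index, word) in words.enumerated() {
        guard let entry = lexicon[word] else { continue }
        var score = entry.valence
        // Look back up to two tokens to catch "not happy" and "very anxious".
        let window = words[max(0, index - 2)..<index]
        if window.contains(where: { negations.contains($0) }) { score = -score }
        for modifier in window { score *= intensifiers[modifier] ?? 1.0 }
        total += score
    }
    return total
}

// NLTagger supplies a complementary sentence-level sentiment score in -1...1.
func taggerValence(for sentence: String) -> Double {
    let tagger = NLTagger(tagSchemes: [.sentimentScore])
    tagger.string = sentence
    let (tag, _) = tagger.tag(at: sentence.startIndex, unit: .paragraph, scheme: .sentimentScore)
    return Double(tag?.rawValue ?? "0") ?? 0
}
```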
Image Analysis (three Vision framework sub-analyzers)
- Face landmark geometry for smile/frown detection
- HSB color histograms mapped through the Valdez-Mehrabian color-emotion model
- VNClassifyImageRequest for scene-to-emotion mapping
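The color branch reduced to its core: assuming the histogram step has already produced the image’s mean saturation and brightness in 0...1, the commonly cited Valdez-Mehrabian regression weights map them to pleasure (valence) and arousal; the helper names here are illustrative.

```swift
import UIKit

// Valdez-Mehrabian-style mapping from color statistics to affect:
// pleasure rises with brightness and saturation, arousal mostly with saturation.
func colorEmotion(meanSaturation s: CGFloat, meanBrightness b: CGFloat) -> (valence: Double, arousal: Double) {
    let pleasure = 0.69 * Double(b) + 0.22 * Double(s)
    let arousal  = -0.31 * Double(b) + 0.60 * Double(s)
    return (pleasure, arousal)
}

// HSB components of a single sampled color, as used when building the histogram.
func hsb(of color: UIColor) -> (hue: CGFloat, saturation: CGFloat, brightness: CGFloat) {
    var h: CGFloat = 0, s: CGFloat = 0, b: CGFloat = 0, a: CGFloat = 0
    _ = color.getHue(&h, saturation: &s, brightness: &b, alpha: &a)
    return (h, s, b)
}
```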
Drawing Analysis
- Extracts 10 kinematic features from PencilKit strokes (pressure, velocity, jaggedness, coverage, spatial entropy, color variance, warm color ratio, and more)
- Maps features through a research-based weight matrix to a 12-category emotion distribution
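Two of those features, pressure variance and mean velocity, sketched against the real PencilKit types; the helper name and return shape are illustrative, and the remaining features follow the same pattern of walking PKStrokePoint samples before the weight matrix turns them into a 12-category distribution.

```swift
import PencilKit

func strokeKinematics(for drawing: PKDrawing) -> (pressureVariance: Double, meanVelocity: Double) {
    var forces: [Double] = []
    var velocities: [Double] = []

    for stroke in drawing.strokes {
        let points = Array(stroke.path)
        forces.append(contentsOf: points.map { Double($0.force) })

        // Velocity between consecutive sampled points: distance over time gap.
        for (previous, current) in zip(points, points.dropFirst()) {
            let dx = Double(current.location.x - previous.location.x)
            let dy = Double(current.location.y - previous.location.y)
            let dt = current.timeOffset - previous.timeOffset
            if dt > 0 { velocities.append((dx * dx + dy * dy).squareRoot() / dt) }
        }
    }

    func mean(_ values: [Double]) -> Double {
        values.isEmpty ? 0 : values.reduce(0, +) / Double(values.count)
    }
    let meanForce = mean(forces)
    let pressureVariance = mean(forces.map { ($0 - meanForce) * ($0 - meanForce) })
    return (pressureVariance, mean(velocities))
}
```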
Fusion Layer
- Merges emotion probability distributions with modality-specific trust weights (text: 1.0, image: 0.7, drawing: 0.5)
- Performs Bayesian precision-weighted fusion in logit-space for the arousal dimension
- Conflict detection flags when modalities genuinely contradict each other
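Roughly how the weighting works; the type names and epsilon guards are illustrative, the trust weights are the ones listed above, and the precision-weighted step is shown in plain score space rather than logit space for brevity.

```swift
import Foundation

struct ModalitySignal {
    let modality: String                  // "text", "image", or "drawing"
    let distribution: [String: Double]    // 12-category emotion probabilities
    let valence: Double                   // -1...1
    let valenceVariance: Double           // per-modality uncertainty
    let confidence: Double                // 0...1 calibrated confidence
}

let trustWeights: [String: Double] = ["text": 1.0, "image": 0.7, "drawing": 0.5]

// Confidence-weighted averaging of the per-modality distributions.
func fuseDistributions(_ signals: [ModalitySignal]) -> [String: Double] {
    var fused: [String: Double] = [:]
    var totalWeight = 0.0
    for signal in signals {
        let weight = (trustWeights[signal.modality] ?? 0.5) * signal.confidence
        guard weight > 0 else { continue }
        totalWeight += weight
        for (emotion, probability) in signal.distribution {
            fused[emotion, default: 0] += weight * probability
        }
    }
    guard totalWeight > 0 else { return fused }
    return fused.mapValues { $0 / totalWeight }   // renormalize to sum to 1
}

// Precision-weighted (inverse-variance) fusion of the continuous scores:
// more certain modalities pull the fused estimate harder.
func fuseValence(_ signals: [ModalitySignal]) -> Double {
    let precisions = signals.map { 1.0 / max($0.valenceVariance, 1e-3) }
    let weightedSum = zip(signals, precisions).reduce(0.0) { $0 + $1.0.valence * $1.1 }
    return weightedSum / max(precisions.reduce(0, +), 1e-9)
}
```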
Kalman Filter
- Operates in two channels (valence and arousal) with time-gap-aware process noise
- Outlier rejection at 2.5 sigma and mean-reversion toward the user’s personal baseline
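A single-channel sketch of the filter (the app runs one channel for valence and one for arousal). The 2.5-sigma outlier rejection, time-gap-aware process noise, mean-reversion, gain cap, and variance floor mirror the behavior described in this writeup; the specific noise and reversion constants are illustrative.

```swift
import Foundation

struct MoodKalmanChannel {
    var estimate = 0.0              // latent mood state
    var variance = 1.0              // uncertainty about that estimate

    let processNoisePerDay = 0.05   // drift allowed per day between entries
    let measurementNoise   = 0.30   // how noisy a single entry's observation is
    let baselineReversion  = 0.02   // gentle pull toward the personal baseline
    let maxGain            = 0.6    // cap so one entry cannot swing the mood
    let varianceFloor      = 0.02   // never let the filter become overconfident

    mutating func update(observation: Double, daysSinceLastEntry: Double, personalBaseline: Double) {
        // Predict: the longer the gap, the more the latent mood may have drifted,
        // and the estimate reverts slightly toward the user's baseline.
        estimate += baselineReversion * min(daysSinceLastEntry, 7) * (personalBaseline - estimate)
        variance += processNoisePerDay * max(daysSinceLastEntry, 0)

        // Outlier rejection: ignore observations more than 2.5 sigma away.
        let innovation = observation - estimate
        let sigma = (variance + measurementNoise).squareRoot()
        guard abs(innovation) <= 2.5 * sigma else { return }

        // Update with a capped Kalman gain, keeping a variance floor.
        let gain = min(variance / (variance + measurementNoise), maxGain)
        estimate += gain * innovation
        variance = max((1 - gain) * variance, varianceFloor)
    }
}
```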
On-Device AI
- Integrated Apple’s FoundationModels framework (iOS 26) for journal coaching insights and mood-aware motivational quotes
- Graceful fallback to static content when the model is not available
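A condensed sketch of that fallback path, assuming the FoundationModels API (SystemLanguageModel.default.availability, LanguageModelSession, respond(to:)); the prompt text and the static quote are illustrative.

```swift
import FoundationModels

func moodAwareQuote(for moodSummary: String) async -> String {
    let staticFallback = "Small steps still count. Be kind to yourself today."

    // Only call the on-device model when it is actually available on this device.
    guard case .available = SystemLanguageModel.default.availability else {
        return staticFallback
    }

    let session = LanguageModelSession(
        instructions: "You are a gentle journaling coach. Reply with one short, supportive sentence."
    )
    do {
        let response = try await session.respond(
            to: "Write an encouraging quote for someone whose recent mood is: \(moodSummary)"
        )
        return response.content
    } catch {
        return staticFallback   // graceful fallback when generation fails
    }
}
```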
Challenges we ran into
- Multimodal fusion balancing. Early versions would let a single bright photo override clearly anxious text, producing jarring mood mismatches. We solved this by introducing modality trust weights, confidence-gated fusion, and a conflict detection system that penalizes the overall confidence score when modalities genuinely disagree (valence spread > 1.4); a small sketch of this check follows this list.
- Kalman filter tuning. Too aggressive smoothing made mood feel “stuck,” while too little smoothing caused emotional whiplash where one exclamation mark could swing your whole mood. We ended up capping the Kalman gain at 0.6 with additional outlier rejection at 2.5 sigma and adding a variance floor so the filter never becomes overconfident.
- Drawing-to-emotion mapping. Mapping stroke kinematics to emotions required careful weight matrix tuning. High pressure combined with high velocity does not always mean “frustrated” (it could mean “excited”). We referenced Kang (2014) and Kim (2018) for the feature-to-emotion weight matrix and added warm/cool color ratio and jaggedness as additional discriminative features.
- Concurrency and persistence. Getting SwiftData, actor isolation, and concurrent analysis to play nicely together required careful architectural decisions, particularly ensuring that the analysis pipeline’s actor state (Kalman, baseline, calibration) persists correctly across app launches.
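The conflict check from the first bullet, in miniature; the 1.4 valence-spread threshold is the one described above, while the FusionResult type and the confidence penalty factor are illustrative.

```swift
import Foundation

struct FusionResult {
    var valence: Double
    var confidence: Double
    var conflictDetected = false
}

func applyConflictPenalty(to result: inout FusionResult, modalityValences: [Double]) {
    guard let lowest = modalityValences.min(),
          let highest = modalityValences.max() else { return }

    // e.g. a bright photo at +0.8 against anxious text at -0.7 gives a spread
    // of 1.5, which exceeds the 1.4 threshold.
    if highest - lowest > 1.4 {
        result.conflictDetected = true
        result.confidence *= 0.6   // still report a mood, just with less certainty
    }
}
```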
Accomplishments that we’re proud of
- Built a six-layer mood analysis pipeline that runs entirely on-device with zero cloud dependency. Text, image, and drawing analysis all happen locally, and even the AI insights use Apple’s on-device Foundation Models. Your journal never leaves your phone.
- The emotion system uses full 12-category probability distributions instead of single-label classification. This means the app can express nuance like “mostly calm with a hint of nostalgia” rather than forcing everything into a single emoji.
- The human-in-the-loop design lets users correct the detected mood, and the system respects that override while learning from it.
- Kalman temporal smoothing gives the mood tracking a sense of continuity and memory, so your emotional journey feels coherent rather than jumping randomly between entries.
- Personal baseline normalization means the app genuinely adapts to each user over time rather than applying one-size-fits-all thresholds.
- The personality profiler derives traits like emotional stability, reactivity, and recovery rate purely from mood metadata, without ever analyzing journal text content for personality. This feels like a genuinely responsible approach to personal analytics.
What we learned
- Multimodal fusion is an order of magnitude harder than single-modality analysis. Each modality has different noise characteristics, confidence profiles, and failure modes. Text is great for specific emotions but misses sarcasm. Images capture energy but not nuance. Drawings are informative but incredibly noisy. The key insight was that fusion should be humble: the system should know when it doesn’t know and flag conflicts rather than papering over disagreements.
- Temporal context matters enormously for mood tracking. A single entry’s mood in isolation is far less meaningful than that mood relative to the user’s baseline and recent trajectory. The Kalman filter transformed the experience from “random mood snapshots” to “an evolving emotional narrative.”
- Swift’s actor model is perfect for multi-pipeline concurrent analysis. Once we committed to the actor architecture, an entire class of concurrency bugs simply disappeared.
What’s next for Better Journal
- Voice journaling as a fourth modality. Analyzing speech prosody (pitch, tempo, energy) alongside transcribed text would add a powerful emotional signal, especially for capturing emotions that people don’t write down.
- Correlation analysis in the insight engine. Connecting mood patterns to time-of-day, day-of-week, weather, and activity data to surface actionable patterns like “you tend to feel most anxious on Sunday evenings” or “your mood improves significantly on days you journal before 9 AM.”
- On-device mood forecasting. Using the Kalman state’s predictive capability to give users a gentle heads-up when the system detects they are trending toward a low period, along with personalized suggestions based on what has helped them before.
- Apple Watch integration. Adding passive physiological signals (heart rate variability, sleep quality) as a fifth modality input, creating a truly holistic picture of emotional wellbeing without compromising the app’s zero-cloud, privacy-first architecture.