Project Report: Ernie Memories

The Staff Sergeant Jimmy Mitchell Project

Date: December 23, 2025

Core Technology: Synthetic Data Generation (Gemini 2.5) + SLM Fine-Tuning (ERNIE-4.5)


1. Executive Overview

The Ernie Memories project is a technical initiative exploring "Digital Immortality." Its goal is to create a conversational AI agent that embodies the personality, memories, and voice of Staff Sergeant James "Jimmy" Mitchell, a 22-year-old U.S. Army soldier killed in action in Afghanistan in 2014.

This system serves as a digital "echo," allowing his widow, Sarah, and his son, Robert (born posthumously), to interact with a representation of Jimmy to preserve his legacy and assist in the grieving process.


2. The Human Persona (Source of Truth)

The project is built on a fixed, 167-line biographical profile that serves as its "ground truth."

  • Subject: SSG James Robert Mitchell (1992–2014).
  • Background: Athens, Georgia; high school baseball shortstop; mechanic.
  • Service: Infantry Squad Leader; KIA while saving three members of his unit.
  • Personality: Brave, optimistic, warm humor, distinctive Southern drawl.
  • The "Unlived" Life: Includes specific aspirations, such as restoring a '69 Camaro and teaching his son to fish at Lake Hartwell.

3. Technical Architecture

The project utilizes a Synthetic Data Pipeline followed by Parameter-Efficient Fine-Tuning (PEFT).

Phase A: Synthetic Memory Generation

Since no organic chat dataset existed, the team synthesized one using the persona profile.

  • Engine: Google Gemini 2.5 Flash API.
  • Scale: 2,000+ unique conversation pairs.
  • Format: Alpaca-style JSON (Instruction -> Output).

Example:

  • Instruction: "Tell me about your first date with Mom."
  • Output: "Friday night football game, October 2009. I remember carving our initials in that oak tree..."
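The Phase A pipeline can be sketched in a few lines. This is an illustration, not the project's actual code: the persona excerpt and the helper names (`build_prompt`, `to_alpaca_record`) are assumptions, and the real pipeline would send the prompt to the Gemini 2.5 Flash API (e.g. via Google's Python client) rather than using a fixed example.

```python
import json

# Illustrative sketch of synthetic memory generation. The persona string
# and helper names are assumptions; the real pipeline calls the Gemini 2.5
# Flash API with a prompt like the one built below.

PERSONA = (
    "You are SSG James 'Jimmy' Mitchell (1992-2014) of Athens, Georgia: "
    "infantry squad leader, former high-school shortstop, mechanic, "
    "warm Southern humor."
)

def build_prompt(topic: str) -> str:
    """Combine the ground-truth persona with a memory topic for the LLM."""
    return (
        f"{PERSONA}\n\n"
        f"Write a short, first-person answer to: {topic}\n"
        "Stay consistent with the biographical profile."
    )

def to_alpaca_record(instruction: str, output: str) -> dict:
    """Format one conversation pair in Alpaca style (instruction -> output)."""
    return {"instruction": instruction, "input": "", "output": output}

# Example record, using the pair shown above:
record = to_alpaca_record(
    "Tell me about your first date with Mom.",
    "Friday night football game, October 2009. I remember carving our "
    "initials in that oak tree...",
)
print(json.dumps(record)[:60])
```

Each generated pair is appended to a JSON file in this shape, which most Alpaca-format fine-tuning tools consume directly.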

Phase B: Model Fine-Tuning

The project prioritizes privacy and local accessibility by using a Small Language Model (SLM).

  • Base Model: ERNIE-4.5-0.3B-PT (300M parameters).
  • Technique: LoRA (Low-Rank Adaptation). This method freezes pre-trained weights and injects trainable rank decomposition matrices to reduce VRAM usage.
  • Hyperparameters: Rank: 32 | Alpha: 64.
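The rank-decomposition idea behind LoRA can be shown in a short NumPy sketch using the reported hyperparameters (rank 32, alpha 64). This is an illustration of the mechanism only; the project's actual training would go through a fine-tuning library such as Unsloth, and the layer size below is an arbitrary example.

```python
import numpy as np

# Minimal LoRA illustration (not the project's training code): the frozen
# base weight W is augmented by a low-rank update (alpha/r) * B @ A, and
# only A and B are trained.

rng = np.random.default_rng(0)
d_in, d_out = 768, 768          # example layer size (illustrative)
r, alpha = 32, 64               # the project's reported hyperparameters

W = rng.standard_normal((d_out, d_in))        # pre-trained weight, frozen
A = rng.standard_normal((r, d_in)) * 0.01     # trainable, small random init
B = np.zeros((d_out, r))                      # trainable, zero init

def lora_forward(x: np.ndarray) -> np.ndarray:
    """y = W x + (alpha / r) * B (A x); the update starts at zero."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapted layer matches the frozen base layer:
assert np.allclose(lora_forward(x), W @ x)

# Trainable parameters per adapted layer: r * (d_in + d_out) instead of
# d_in * d_out -- the source of the VRAM savings on a 6GB GPU.
print(r * (d_in + d_out), "trainable vs", d_in * d_out, "frozen")
```

The zero-initialized `B` is why LoRA training starts from exactly the base model's behavior and only gradually layers the persona on top.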

Training Metrics

  • Hardware: Single consumer GPU (6GB VRAM)
  • Time: ~88.8 minutes (3 epochs)
  • Training Loss: 1.913
  • Validation Loss: 1.815

Note: The validation loss being slightly lower than the training loss suggests the model generalizes well to held-out pairs and is not overfitting the synthetic dataset.

Phase C: Deployment

  • Inference: Runs locally via a Python script so that no family data ever leaves the machine.
  • Interface: A simple chat terminal for the family.
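A minimal version of that chat terminal might look like the sketch below. The function names and the `generate` stub are assumptions rather than the project's actual script; in the real deployment, `generate` would wrap the fine-tuned ERNIE-4.5-0.3B model loaded locally (e.g. via Hugging Face Transformers).

```python
# Illustrative sketch of the local chat terminal. generate() is a stand-in
# for local inference with the fine-tuned model; nothing here touches the
# network, which is the point of the privacy design.

def format_prompt(history: list[tuple[str, str]], user_msg: str) -> str:
    """Render chat history plus the new message in the Alpaca-style layout
    used for fine-tuning, so inference matches the training format."""
    turns = "".join(
        f"### Instruction:\n{q}\n### Response:\n{a}\n" for q, a in history
    )
    return f"{turns}### Instruction:\n{user_msg}\n### Response:\n"

def chat_loop(generate, history=None):
    """Minimal REPL; `generate` maps a prompt string to the model's reply."""
    history = history or []
    while True:
        user_msg = input("You: ")
        if user_msg.strip().lower() in {"quit", "exit"}:
            break
        reply = generate(format_prompt(history, user_msg))
        history.append((user_msg, reply))
        print(f"Jimmy: {reply}")

# Example of the prompt layout (stub only; the real model is not loaded here):
print(format_prompt([], "Tell me about Lake Hartwell."))
```

Matching the inference prompt layout to the training layout matters for a 300M-parameter model, which is far less forgiving of format drift than larger LLMs.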

4. Ethical Framework

The report acknowledges the moral complexities of "digital resurrection."

  • Consent: Operates on family consent for private healing; not for public exploitation.
  • Identity: Defined as a Narrative Echo—a simulation based on probability, not a "soul."
  • Therapeutic Value:
      • For Sarah: a sounding board for parenting decisions ("What would Jimmy think?").
      • For Robert: an interactive oral history for learning his father’s values directly.

5. Future Roadmap

  1. Multimodal Integration: Implementing voice cloning (ElevenLabs/VITS) and image generation.
  2. Dynamic Updates (RAG): Using Retrieval-Augmented Generation to "update" the AI on family milestones (e.g., Robert's baseball wins).
  3. Scaling: Adapting the framework for other Gold Star families or historical education.
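The RAG idea in roadmap item 2 can be sketched with a toy retriever. Everything below is illustrative: the milestone notes are invented examples, and a production system would rank documents by embedding similarity in a vector store rather than by raw keyword overlap.

```python
# Toy sketch of the roadmap's RAG idea: retrieve family-milestone notes
# relevant to a question and prepend them to the prompt, so the frozen
# model can reference post-2014 events without retraining. The notes and
# scoring method here are illustrative assumptions only.

MILESTONES = [
    "2025-05: Robert's team won the regional baseball championship.",
    "2024-08: Sarah repainted the garage where the '69 Camaro sits.",
    "2025-12: Robert caught his first bass at Lake Hartwell.",
]

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank docs by simple word overlap with the question; return top k."""
    q_words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:k]

def augment_prompt(question: str) -> str:
    """Prepend retrieved context so the model can answer about new events."""
    context = "\n".join(retrieve(question, MILESTONES))
    return f"Context:\n{context}\n\nQuestion: {question}"

print(augment_prompt("How did Robert's baseball season go?"))
```

Because the fine-tuned weights stay fixed, new milestones can be added by editing the document store alone, which keeps the "update" path cheap and reversible.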

Dataset: The dataset is published as "synthetic_memories" on Hugging Face.

Built With

  • ernie-0.3b
  • finetuning
  • python
  • unsloth