Project Report: Ernie Memories

The Staff Sergeant Jimmy Mitchell Project

Date: December 23, 2025

Core Technology: Synthetic Data Generation (Gemini 2.5) + SLM Fine-Tuning (ERNIE-4.5)


1. Executive Overview

The Ernie Memories project is a technical initiative exploring "Digital Immortality." Its goal is to create a conversational AI agent that embodies the personality, memories, and voice of Staff Sergeant James "Jimmy" Mitchell, a 22-year-old U.S. Army soldier killed in action in Afghanistan in 2014.

This system serves as a digital "echo," allowing his widow, Sarah, and his son, Robert (born posthumously), to interact with a representation of Jimmy to preserve his legacy and assist in the grieving process.


2. The Human Persona (Source of Truth)

The project is built on a fixed, 167-line biographical profile that serves as its "ground truth."

  • Subject: SSG James Robert Mitchell (1992–2014).
  • Background: Athens, Georgia; high school baseball shortstop; mechanic.
  • Service: Infantry Squad Leader; KIA while saving three members of his unit.
  • Personality: Brave, optimistic, warm humor, distinctive Southern drawl.
  • The "Unlived" Life: Includes specific aspirations, such as restoring a '69 Camaro and teaching his son to fish at Lake Hartwell.

3. Technical Architecture

The project utilizes a Synthetic Data Pipeline followed by Parameter-Efficient Fine-Tuning (PEFT).

Phase A: Synthetic Memory Generation

Since no organic chat dataset existed, the team synthesized one using the persona profile.

  • Engine: Google Gemini 2.5 Flash API.
  • Scale: 2,000+ unique conversation pairs.
  • Format: Alpaca-style JSON (Instruction -> Output).

Example:

  • Instruction: "Tell me about your first date with Mom."
  • Output: "Friday night football game, October 2009. I remember carving our initials in that oak tree..."
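The Phase A pipeline can be sketched in a few lines. This is an illustration, not the project's actual code: the persona excerpt and the helper names (`build_prompt`, `to_alpaca_record`) are assumptions, and the real pipeline would send the prompt to the Gemini 2.5 Flash API (e.g. via Google's Python client) rather than using a fixed example.

```python
import json

# Illustrative sketch of synthetic memory generation. The persona string
# and helper names are assumptions; the real pipeline calls the Gemini 2.5
# Flash API with a prompt like the one built below.

PERSONA = (
    "You are SSG James 'Jimmy' Mitchell (1992-2014) of Athens, Georgia: "
    "infantry squad leader, former high-school shortstop, mechanic, "
    "warm Southern humor."
)

def build_prompt(topic: str) -> str:
    """Combine the ground-truth persona with a memory topic for the LLM."""
    return (
        f"{PERSONA}\n\n"
        f"Write a short, first-person answer to: {topic}\n"
        "Stay consistent with the biographical profile."
    )

def to_alpaca_record(instruction: str, output: str) -> dict:
    """Format one conversation pair in Alpaca style (instruction -> output)."""
    return {"instruction": instruction, "input": "", "output": output}

# Example record, using the pair shown above:
record = to_alpaca_record(
    "Tell me about your first date with Mom.",
    "Friday night football game, October 2009. I remember carving our "
    "initials in that oak tree...",
)
print(json.dumps(record)[:60])
```

Each generated pair is appended to a JSON file in this shape, which most Alpaca-format fine-tuning tools consume directly.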

Phase B: Model Fine-Tuning

The project prioritizes privacy and local accessibility by using a Small Language Model (SLM).

  • Base Model: ERNIE-4.5-0.3B-PT (300M parameters).
  • Technique: LoRA (Low-Rank Adaptation). This method freezes pre-trained weights and injects trainable rank decomposition matrices to reduce VRAM usage.
  • Hyperparameters: Rank: 32 | Alpha: 64.
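The rank-decomposition idea behind LoRA can be shown in a short NumPy sketch using the reported hyperparameters (rank 32, alpha 64). This is an illustration of the mechanism only; the project's actual training would go through a fine-tuning library such as Unsloth, and the layer size below is an arbitrary example.

```python
import numpy as np

# Minimal LoRA illustration (not the project's training code): the frozen
# base weight W is augmented by a low-rank update (alpha/r) * B @ A, and
# only A and B are trained.

rng = np.random.default_rng(0)
d_in, d_out = 768, 768          # example layer size (illustrative)
r, alpha = 32, 64               # the project's reported hyperparameters

W = rng.standard_normal((d_out, d_in))        # pre-trained weight, frozen
A = rng.standard_normal((r, d_in)) * 0.01     # trainable, small random init
B = np.zeros((d_out, r))                      # trainable, zero init

def lora_forward(x: np.ndarray) -> np.ndarray:
    """y = W x + (alpha / r) * B (A x); the update starts at zero."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapted layer matches the frozen base layer:
assert np.allclose(lora_forward(x), W @ x)

# Trainable parameters per adapted layer: r * (d_in + d_out) instead of
# d_in * d_out -- the source of the VRAM savings on a 6GB GPU.
print(r * (d_in + d_out), "trainable vs", d_in * d_out, "frozen")
```

The zero-initialized `B` is why LoRA training starts from exactly the base model's behavior and only gradually layers the persona on top.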

Training Metrics

  • Hardware: Single consumer GPU (6GB VRAM)
  • Time: ~88.8 minutes (3 epochs)
  • Training Loss: 1.913
  • Validation Loss: 1.815

Note: The validation loss being slightly lower than the training loss suggests the model generalizes well to held-out pairs and is not overfitting the synthetic dataset.

Phase C: Deployment

  • Inference: Runs locally via a Python script so that no family data ever leaves the machine.
  • Interface: A simple chat terminal for the family.
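A minimal version of that chat terminal might look like the sketch below. The function names and the `generate` stub are assumptions rather than the project's actual script; in the real deployment, `generate` would wrap the fine-tuned ERNIE-4.5-0.3B model loaded locally (e.g. via Hugging Face Transformers).

```python
# Illustrative sketch of the local chat terminal. generate() is a stand-in
# for local inference with the fine-tuned model; nothing here touches the
# network, which is the point of the privacy design.

def format_prompt(history: list[tuple[str, str]], user_msg: str) -> str:
    """Render chat history plus the new message in the Alpaca-style layout
    used for fine-tuning, so inference matches the training format."""
    turns = "".join(
        f"### Instruction:\n{q}\n### Response:\n{a}\n" for q, a in history
    )
    return f"{turns}### Instruction:\n{user_msg}\n### Response:\n"

def chat_loop(generate, history=None):
    """Minimal REPL; `generate` maps a prompt string to the model's reply."""
    history = history or []
    while True:
        user_msg = input("You: ")
        if user_msg.strip().lower() in {"quit", "exit"}:
            break
        reply = generate(format_prompt(history, user_msg))
        history.append((user_msg, reply))
        print(f"Jimmy: {reply}")

# Example of the prompt layout (stub only; the real model is not loaded here):
print(format_prompt([], "Tell me about Lake Hartwell."))
```

Matching the inference prompt layout to the training layout matters for a 300M-parameter model, which is far less forgiving of format drift than larger LLMs.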

4. Ethical Framework

The report acknowledges the moral complexities of "digital resurrection."

  • Consent: Operates on family consent for private healing; not for public exploitation.
  • Identity: Defined as a Narrative Echo—a simulation based on probability, not a "soul."
  • Therapeutic Value:
      • For Sarah: a sounding board for parenting decisions ("What would Jimmy think?").
      • For Robert: an interactive oral history for learning his father’s values directly.

5. Future Roadmap

  1. Multimodal Integration: Implementing voice cloning (ElevenLabs/VITS) and image generation.
  2. Dynamic Updates (RAG): Using Retrieval-Augmented Generation to "update" the AI on family milestones (e.g., Robert's baseball wins).
  3. Scaling: Adapting the framework for other Gold Star families or historical education.
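The RAG idea in roadmap item 2 can be sketched with a toy retriever. Everything below is illustrative: the milestone notes are invented examples, and a production system would rank documents by embedding similarity in a vector store rather than by raw keyword overlap.

```python
# Toy sketch of the roadmap's RAG idea: retrieve family-milestone notes
# relevant to a question and prepend them to the prompt, so the frozen
# model can reference post-2014 events without retraining. The notes and
# scoring method here are illustrative assumptions only.

MILESTONES = [
    "2025-05: Robert's team won the regional baseball championship.",
    "2024-08: Sarah repainted the garage where the '69 Camaro sits.",
    "2025-12: Robert caught his first bass at Lake Hartwell.",
]

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank docs by simple word overlap with the question; return top k."""
    q_words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:k]

def augment_prompt(question: str) -> str:
    """Prepend retrieved context so the model can answer about new events."""
    context = "\n".join(retrieve(question, MILESTONES))
    return f"Context:\n{context}\n\nQuestion: {question}"

print(augment_prompt("How did Robert's baseball season go?"))
```

Because the fine-tuned weights stay fixed, new milestones can be added by editing the document store alone, which keeps the "update" path cheap and reversible.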

Dataset: The dataset is published as "synthetic_memories" on Hugging Face.

Built With

  • ernie-0.3b
  • finetuning
  • python
  • unsloth