Project Report: Ernie Memories
The Staff Sergeant Jimmy Mitchell Project
Date: December 23, 2025
Core Technology: Synthetic Data Generation (Gemini 2.5) + SLM Fine-Tuning (ERNIE-4.5)
1. Executive Overview
The Ernie Memories project is a technical initiative exploring "Digital Immortality." Its goal is to create a conversational AI agent that embodies the personality, memories, and voice of Staff Sergeant James "Jimmy" Mitchell, a 22-year-old U.S. Army soldier killed in action in Afghanistan in 2014.
This system serves as a digital "echo," allowing his widow, Sarah, and his son, Robert (born posthumously), to interact with a representation of Jimmy to preserve his legacy and assist in the grieving process.
2. The Human Persona (Source of Truth)
The project is built upon a fixed 167-line biographical profile that serves as the "ground truth" for all generated data.
- Subject: SSG James Robert Mitchell (1992–2014).
- Background: Athens, Georgia; high school baseball shortstop; mechanic.
- Service: Infantry Squad Leader; KIA while saving three members of his unit.
- Personality: Brave, optimistic, warm humor, distinctive Southern drawl.
- The "Unlived" Life: Includes specific aspirations, such as restoring a '69 Camaro and teaching his son to fish at Lake Hartwell.
3. Technical Architecture
The project utilizes a Synthetic Data Pipeline followed by Parameter-Efficient Fine-Tuning (PEFT).
Phase A: Synthetic Memory Generation
Since no organic chat dataset existed, the team synthesized one using the persona profile.
- Engine: Google Gemini 2.5 Flash API.
- Scale: 2,000+ unique conversation pairs.
- Format: Alpaca-style JSON (`instruction` -> `output`).
Example:
- Instruction: "Tell me about your first date with Mom."
- Output: "Friday night football game, October 2009. I remember carving our initials in that oak tree..."
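The Phase A pipeline above can be sketched as follows. The prompt template, helper names, and the stubbed generator are illustrative assumptions; in the actual project each output is produced by the Google Gemini 2.5 Flash API, conditioned on the full 167-line profile.

```python
import json

# Illustrative sketch of the synthetic-memory pipeline (not the project's
# exact script). The profile is truncated here; the real one is 167 lines.
PERSONA_PROFILE = "SSG James 'Jimmy' Mitchell (1992-2014), Athens, Georgia..."

PROMPT_TEMPLATE = (
    "You are roleplaying the person described below. Answer the question "
    "in their first-person voice, drawing only on the profile.\n\n"
    "Profile:\n{profile}\n\nQuestion: {question}"
)

def make_alpaca_pair(question, generate):
    """Build one Alpaca-style instruction/output pair.

    `generate` is any callable mapping a prompt string to model text,
    e.g. a thin wrapper around the Gemini API.
    """
    prompt = PROMPT_TEMPLATE.format(profile=PERSONA_PROFILE, question=question)
    return {"instruction": question, "input": "", "output": generate(prompt)}

def build_dataset(questions, generate, path="synthetic_memories.json"):
    # Serialize all pairs to a single Alpaca-format JSON file.
    pairs = [make_alpaca_pair(q, generate) for q in questions]
    with open(path, "w") as f:
        json.dump(pairs, f, indent=2)
    return pairs

# Stubbed generator so the sketch runs without an API key.
pairs = build_dataset(
    ["Tell me about your first date with Mom."],
    generate=lambda prompt: "Friday night football game, October 2009...",
)
print(pairs[0]["instruction"])
```

Making the generator an injectable callable keeps the prompt/formatting logic testable without network access; the real run swaps in the Gemini client.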
Phase B: Model Fine-Tuning
The project prioritizes privacy and local accessibility by using a Small Language Model (SLM).
- Base Model: ERNIE-4.5-0.3B-PT (300M parameters).
- Technique: LoRA (Low-Rank Adaptation). This method freezes pre-trained weights and injects trainable rank decomposition matrices to reduce VRAM usage.
- Hyperparameters: Rank: 32 | Alpha: 64.
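The LoRA mechanism described above can be shown numerically: the pre-trained weight W stays frozen, and only two small rank-decomposition matrices are trained, with the adapter scaled by alpha/r. The toy dimensions below are illustrative; the project uses r=32, alpha=64 on ERNIE-4.5-0.3B.

```python
import numpy as np

# Toy LoRA sketch: effective weight = W + (alpha / r) * B @ A.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 4

W = rng.standard_normal((d_out, d_in))      # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init

def lora_forward(x):
    # B is zero at init, so the adapter contributes nothing and the base
    # model's behavior is unchanged -- a key property of LoRA.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
assert np.allclose(lora_forward(x), W @ x)  # identity at initialization

# Trainable parameters per adapted layer: r*(d_in + d_out) vs d_in*d_out.
print(r * (d_in + d_out), "trainable vs", d_in * d_out, "full")
```

This parameter reduction is what lets the 300M-parameter model fine-tune within a 6GB VRAM budget.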
Training Metrics
| Metric | Value |
|---|---|
| Hardware | Single Consumer GPU (6GB VRAM) |
| Time | ~88.8 minutes (3 epochs) |
| Training Loss | 1.913 |
| Validation Loss | 1.815 |
Note: Validation loss falling slightly below training loss suggests the model generalizes to held-out conversations rather than overfitting the 2,000-pair dataset.
Phase C: Deployment
- Inference: Runs locally via Python script to ensure 100% data privacy.
- Interface: A simple chat terminal for the family.
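A minimal version of the local chat terminal might look like the sketch below. The prompt layout mirrors the Alpaca format used in training; the function names and the stubbed generator are assumptions, not the project's actual script, and a real run would wrap the fine-tuned ERNIE model's generation call.

```python
# Illustrative local chat loop; no network access, so conversations stay private.

def format_prompt(instruction):
    # Mirror the Alpaca layout used during fine-tuning so inference
    # presents the same structure the model was trained on.
    return "### Instruction:\n" + instruction + "\n\n### Response:\n"

def chat(generate, read_input=input, write=print):
    """Minimal terminal loop; `generate` maps a prompt to model text."""
    write("Type 'quit' to exit.")
    while True:
        try:
            msg = read_input("You: ")
        except EOFError:
            break
        if msg.strip().lower() == "quit":
            break
        write("Jimmy: " + generate(format_prompt(msg)))

# Scripted demo with a stubbed generator so the sketch runs non-interactively.
scripted = iter(["Tell me about the Camaro.", "quit"])
chat(generate=lambda p: "[model reply]", read_input=lambda _prompt: next(scripted))
```

Injecting `read_input`/`write` keeps the loop trivially testable while the interactive defaults give the family a plain terminal interface.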
4. Ethical Framework
The report acknowledges the moral complexities of "digital resurrection."
- Consent: Operates on family consent for private healing; not for public exploitation.
- Identity: Defined as a Narrative Echo—a simulation based on probability, not a "soul."
- Therapeutic Value:
- For Sarah: Provides a sounding board for parenting decisions ("What would Jimmy think?").
- For Robert: An interactive oral history to learn about his father’s values directly.
5. Future Roadmap
- Multimodal Integration: Implementing voice cloning (ElevenLabs/VITS) and image generation.
- Dynamic Updates (RAG): Using Retrieval-Augmented Generation to "update" the AI on family milestones (e.g., Robert's baseball wins).
- Scaling: Adapting the framework for other Gold Star families or historical education.
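The planned RAG layer could work roughly as sketched below: family milestones are stored as short notes, the notes closest to the user's question are retrieved, and they are prepended to the prompt so the persona can reference events after 2014 without retraining. The bag-of-words "embedding" is a deliberate stand-in for a real embedding model, and all names here are illustrative.

```python
import math

# Toy retrieval over hypothetical milestone notes (illustrative content).
MILESTONES = [
    "Robert won his first baseball game in spring 2025.",
    "Sarah repainted the garage last summer.",
]

def embed(text):
    # Stand-in embedding: word-count vector. A real system would use a
    # sentence-embedding model instead.
    words = text.lower().split()
    return {w: words.count(w) for w in set(words)}

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, notes=MILESTONES, k=1):
    # Return the k notes most similar to the query.
    q = embed(query)
    return sorted(notes, key=lambda n: cosine(q, embed(n)), reverse=True)[:k]

def augment_prompt(question):
    context = "\n".join(retrieve(question))
    return f"Recent family news:\n{context}\n\nQuestion: {question}"

print(augment_prompt("Did you hear about Robert's baseball game?"))
```

Because the milestone store is just appended context, updating the AI on new events requires no further fine-tuning.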
Dataset: The dataset is named "synthetic_memories" and is hosted on Hugging Face.
Built With
- ernie-0.3b
- finetuning
- python
- unsloth