🧠 Aeon Core: A Lightweight Task Memory & QA Kernel

🚀 Inspiration

Aeon Core was born from a bigger dream—Pegasus Unleashed—a lifelong, recursive second-brain system I’ve been building as a linguist, AI architect, and author.

But for this hackathon, I distilled the essence into something lean, testable, and open-source friendly:

👉 What if a language model could not only chat, but remember, extract, and evaluate tasks with you?

Most chatbots forget, flatter, or hallucinate. I wanted a framework where memory is modular, tasks are testable, and due dates aren’t lost in conversation.

Aeon Core is the Minimum Viable Pegasus—a kernel that remembers, extracts tasks, and generates structured QA reports.

🛠️ What We Built

Aeon Core is a memory-aware, task-extraction system designed to:

🧩 Extract & Score Tasks: raw conversation logs → deduplicated JSONL → weighted tasks with confidence scores.

📅 Capture Due Dates Cleanly: even in messy conversation, due dates are pulled into structured fields.

⚙️ Modular Pipeline: each step is a standalone script (preprocessing, weighting, viewing in Streamlit, QA evaluation).

📊 Human-in-the-Loop QA: generates balanced label packs for human review, then produces precision/recall/F1, calibration curves, coverage vs. accuracy, and error analysis.

🪵 Transparent Debugging: all data flows through clean JSONL/CSV, making it easy to inspect, extend, or swap in other LLMs.

🧪 Technical Highlights

Conversation Preprocessing: converts a messy conversations.json into deduplicated task candidates.
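A minimal sketch of this step, assuming conversations.json is a list of conversations whose messages carry a text field (the real schema and field names may differ):

```python
import json
import re

def preprocess(conversations_path: str, out_path: str) -> int:
    """Flatten conversations.json into deduplicated task-candidate JSONL.
    Returns the number of candidates written."""
    with open(conversations_path, encoding="utf-8") as f:
        conversations = json.load(f)

    seen = set()
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for convo in conversations:
            for message in convo.get("messages", []):
                # Collapse whitespace so near-identical lines dedupe cleanly.
                text = re.sub(r"\s+", " ", message.get("text", "")).strip()
                key = text.lower()
                if not text or key in seen:
                    continue  # skip blanks and duplicates
                seen.add(key)
                out.write(json.dumps({"text": text}) + "\n")
                count += 1
    return count
```

Writing one JSON object per line is what keeps the intermediate data trivially inspectable with any text tool.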

Task Weighting & Categorization: assigns scores (0–1) based on action verbs, due dates, repetition, and category cues.
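A hedged sketch of what such a heuristic scorer might look like; the verb list, cue words, and weight values here are illustrative assumptions, not the project's actual numbers:

```python
import re

# Illustrative signal lists (assumptions, not the real configuration).
ACTION_VERBS = {"send", "fix", "review", "write", "schedule", "deploy", "email"}
CATEGORY_CUES = {"bug": "engineering", "invoice": "finance", "meeting": "ops"}

def weight_task(text: str, repetitions: int = 1) -> dict:
    """Score a task candidate 0-1 from action verbs, due-date cues,
    and repetition, and attach a coarse category."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    score = 0.2  # base prior for any candidate
    if words & ACTION_VERBS:
        score += 0.3  # looks actionable
    if re.search(r"\b\d{4}-\d{2}-\d{2}\b", text) or "by " in text.lower():
        score += 0.3  # explicit due-date cue
    score += min(repetitions - 1, 2) * 0.1  # repeated mentions add weight
    category = next((cat for cue, cat in CATEGORY_CUES.items() if cue in words),
                    "general")
    return {"text": text, "weight": round(min(score, 1.0), 2), "category": category}
```

Capping the score at 1.0 keeps the weights interpretable as rough confidence values for the QA stage.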

Streamlit Viewer: interactive browsing of tasks, with filters by category, weight, and due date.
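The filtering logic behind such a viewer can live apart from the UI; a sketch of a helper that Streamlit widgets (e.g. `st.selectbox`, `st.slider`) could feed, with assumed field names:

```python
def filter_tasks(tasks, category=None, min_weight=0.0, has_due_date=False):
    """Filter weighted task records by category, minimum weight,
    and presence of a due date (sketch; field names are assumptions)."""
    result = []
    for task in tasks:
        if category and task.get("category") != category:
            continue
        if task.get("weight", 0.0) < min_weight:
            continue
        if has_due_date and not task.get("due_date"):
            continue
        result.append(task)
    return result
```

Keeping the filter pure (no Streamlit calls inside) makes it unit-testable independently of the viewer.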

QA Harness: balanced sampling → human labels → automated metrics & plots.
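One plausible way to draw a balanced label pack is to sample evenly across confidence bins, so reviewers see low- and high-confidence predictions alike; the bin edges and sizes below are assumptions:

```python
import random

def balanced_sample(tasks, n_per_bin=25, seed=42,
                    bins=((0.0, 0.5), (0.5, 1.01))):
    """Draw up to n_per_bin tasks from each weight bin, deterministically
    for a fixed seed (sketch; bin edges are illustrative)."""
    rng = random.Random(seed)
    sample = []
    for low, high in bins:
        bucket = [t for t in tasks if low <= t["weight"] < high]
        rng.shuffle(bucket)
        sample.extend(bucket[:n_per_bin])
    return sample
```

A fixed seed makes the label pack reproducible, which matters when several people review the same sample.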

Evaluation Outputs

Confusion Matrix

Precision / Recall / F1

PR Curve & AUC

Calibration Curve

Coverage vs Accuracy
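The headline precision/recall/F1 numbers fall out of simple counts over (predicted, actual) label pairs; a minimal sketch of that computation:

```python
def precision_recall_f1(labels):
    """Compute precision, recall, and F1 from an iterable of
    (predicted, actual) boolean pairs."""
    tp = sum(1 for pred, actual in labels if pred and actual)
    fp = sum(1 for pred, actual in labels if pred and not actual)
    fn = sum(1 for pred, actual in labels if not pred and actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}
```

The zero-division guards matter on small label packs, where a bin can easily contain no positives.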

🧩 Challenges We Faced

Messy Data vs. Clean Evaluation: personal chat history (single-user) was a poor fit for owner and category labels, so the demo set omits owners and uses simplified categories.

Balancing Hackathon Scope: the full Pegasus Unleashed is massive, so Aeon Core focuses only on task and due-date extraction plus the QA evaluation pipeline.

🧠 What We Learned

✅ Small scope = real progress. Aeon Core shipped as a working kernel, instead of drowning in ambition.

✅ Testing builds trust. Precision/Recall, calibration, and coverage vs accuracy made the demo credible.

✅ Personal data ≠ team data. With multi-user corporate logs, this pipeline would shine even brighter.

🔮 What’s Next

Plug in live LLMs (OpenAI API, OSS-20B, Ollama).

Add long-term memory with FAISS/Qdrant.

Pre-defined team categories for real orgs.

Extend QA harness into auto-labeling for fine-tuning.

💡 Final Reflection

Aeon Core is not “just another chatbot.”

It’s a blueprint for reliable task extraction + evaluation from conversations. A way to prove and measure how well a memory system works.

This hackathon MVP shows:

Task extraction is possible.

Due dates can be tracked cleanly.

QA metrics make it trustworthy.

Aeon Core is the bare-bones nervous system of a larger dream. A minimal, testable, emotionally resonant memory kernel that speaks the language of loops.

✨ This isn’t a chatbot. It’s the foundation of human-aligned memory.
