Inspiration
I wanted a “butler” for my daily tasks: something that can turn messy, natural-language plans into clean, actionable to‑dos with proper due times—without forcing me to manually split, name, and schedule everything. The hackathon theme around Flutter + Serverpod was the perfect chance to build an end‑to‑end product that feels agentic but stays practical and reliable.
What it does
PodButler is a Flutter app backed by a Serverpod API that helps you create tasks quickly using natural language.
Key capabilities:
- Natural-language task parsing (English + Indonesian): understands phrases like “tomorrow at 10”, “in 2 hours”, “next Monday”, etc.
- Smart Task Breakdown: paste a complex plan and PodButler decomposes it into multiple smaller tasks and creates them in one request.
- Due time intelligence: improves reliability for relative dates by injecting current time context (UTC + Asia/Jakarta) into the model prompt.
- Batch creation API: a dedicated backend endpoint (parseAndCreateMany) creates multiple tasks at once so the UI stays fast and consistent.
- Observability for AI: every parse request stores an LLM trace (llm_trace) in Postgres (input, raw model output, heuristic due time, final JSON, errors) for offline evaluation and debugging.
- Export tooling: traces can be exported to JSON/CSV and prepared for offline evaluation workflows (e.g., Opik).
How we built it
- Flutter for the mobile/web UI and user experience.
- Serverpod for the backend: endpoints, ORM persistence, and clean API boundaries.
- LLM integration (Gemini) on the server side with strict JSON outputs, robust coercion (array/object), and safe fallbacks.
- Trace-first workflow: instead of coupling the app to a live observability SDK, we log high-signal structured traces in the database and export them for later analysis.
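The "strict JSON outputs with robust coercion and safe fallbacks" step can be sketched as follows. Again a Python sketch of the idea rather than the actual Dart implementation; the `title` field and fence-stripping details are assumptions about the schema:

```python
import json


def coerce_tasks(raw: str) -> list[dict]:
    """Coerce an LLM response into a list of task dicts, with safe fallbacks.

    Accepts a JSON array, a single JSON object, or JSON wrapped in a
    markdown code fence; returns [] when nothing parseable remains, so the
    caller can log the trace and surface a friendly error instead of crashing.
    """
    text = raw.strip()
    if text.startswith("```"):          # model wrapped its output in a fence
        text = text.strip("`")
        if text.startswith("json"):
            text = text[4:]
        text = text.strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return []                       # safe fallback: no tasks created
    if isinstance(parsed, dict):
        parsed = [parsed]               # single object -> one-element batch
    if not isinstance(parsed, list):
        return []
    return [t for t in parsed if isinstance(t, dict) and "title" in t]
```

Normalizing object-vs-array responses here is what lets one code path feed both single-task creation and the batch endpoint.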
Challenges we ran into
- Relative time ambiguity & time zones: “tomorrow” or “next week” can shift depending on locale/timezone. We improved consistency by explicitly providing “current time” in both UTC and Asia/Jakarta inside the prompt.
- Multi-step intent parsing: users often paste paragraphs, not one task. Turning that into multiple tasks required careful prompting, JSON validation, and a batch API path.
- Debugging production mismatches: we hit cases where the hosted API lagged behind local changes—solved by aligning deployment and adding a simple smoke-test script.
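A smoke test like the one mentioned above might look something like this. This is a hypothetical Python sketch, assuming the endpoint path and a response schema of task objects with `title` and `dueAt` keys; none of these names come from the project's actual code:

```python
import json
from urllib import request

REQUIRED_KEYS = {"title", "dueAt"}  # assumed response schema


def check_parse_response(body: bytes) -> list[str]:
    """Return a list of problems with a batch-create response (empty = OK)."""
    problems: list[str] = []
    try:
        tasks = json.loads(body)
    except json.JSONDecodeError:
        return ["response is not valid JSON"]
    if not isinstance(tasks, list):
        return ["expected a JSON array of created tasks"]
    for i, task in enumerate(tasks):
        if not isinstance(task, dict):
            problems.append(f"task {i} is not an object")
            continue
        missing = REQUIRED_KEYS - set(task)
        if missing:
            problems.append(f"task {i} missing keys: {sorted(missing)}")
    return problems


def smoke_test(base_url: str) -> list[str]:
    """POST a known phrase to the hosted API and validate the response shape."""
    payload = json.dumps({"text": "plan sprint review tomorrow at 10"}).encode()
    req = request.Request(
        f"{base_url}/parseAndCreateMany",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req, timeout=10) as resp:
        return check_parse_response(resp.read())
```

Running this against the deployed host after each deploy is enough to catch the "hosted API lagging behind local changes" class of bug early.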
What we learned
- “Agentic” UX is not just the model—it’s the pipeline: strict schemas, validation, error handling, and observability matter as much as prompting.
- Storing structured traces early makes iteration dramatically faster and safer, especially under hackathon time pressure.
What’s next
- One-click import of traces into evaluation tools (Opik) and automated scoring.
- Better task templates (meeting prep, workouts, travel planning) and smarter recurrence handling.
- Optional real-time observability integration once requirements are stable.