Inspiration

I wanted a “butler” for my daily tasks: something that can turn messy, natural-language plans into clean, actionable to‑dos with proper due times—without forcing me to manually split, name, and schedule everything. The hackathon theme around Flutter + Serverpod was the perfect chance to build an end‑to‑end product that feels agentic but stays practical and reliable.

What it does

PodButler is a Flutter app backed by a Serverpod API that helps you create tasks quickly using natural language.

Key capabilities:

  • Natural-language task parsing (English + Indonesian): understands phrases like “tomorrow at 10”, “in 2 hours”, “next Monday”, etc.
  • Smart Task Breakdown: paste a complex plan and PodButler decomposes it into multiple smaller tasks and creates them in one request.
  • Due time intelligence: improves reliability for relative dates by injecting current time context (UTC + Asia/Jakarta) into the model prompt.
  • Batch creation API: a dedicated backend endpoint creates multiple tasks at once (parseAndCreateMany) so the UI stays fast and consistent.
  • Observability for AI: every parse request is logged as an LLM trace (llm_trace) in Postgres (input, raw model output, heuristic due time, final JSON, errors) for offline evaluation and debugging.
  • Export tooling: traces can be exported to JSON/CSV and prepared for offline evaluation workflows (e.g., Opik).

How we built it

  • Flutter for the mobile/web UI and user experience.
  • Serverpod for the backend: endpoints, ORM persistence, and clean API boundaries.
  • LLM integration (Gemini) on the server side with strict JSON outputs, robust coercion (array/object), and safe fallbacks.
  • Trace-first workflow: instead of coupling the app to a live observability SDK, we log high-signal structured traces in the database and export them for later analysis.
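The "robust coercion (array/object)" step could look something like this sketch (Python stand-in for the server-side Dart; `coerce_tasks` and the fallback shape are hypothetical):

```python
import json


def coerce_tasks(raw: str) -> list[dict]:
    """Coerce a model reply into a list of task dicts.

    Accepts either a single JSON object or a JSON array, strips the
    Markdown code fences models sometimes add, and falls back to one
    task wrapping the raw text when parsing fails entirely.
    """
    text = raw.strip()
    if text.startswith("```"):
        # Drop ```json ... ``` fencing around the payload
        text = text.strip("`")
        if text.startswith("json"):
            text = text[4:]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return [{"title": raw.strip(), "due": None}]  # safe fallback
    if isinstance(data, dict):
        data = [data]  # single object -> one-element list
    if not isinstance(data, list):
        return [{"title": raw.strip(), "due": None}]
    # Keep only well-formed task objects that carry a title
    return [t for t in data if isinstance(t, dict) and t.get("title")]
```

The point of normalizing to a list early is that the batch endpoint and the single-task path can then share one downstream code path.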

Challenges we ran into

  • Relative time ambiguity & time zones: “tomorrow” or “next week” can shift depending on locale/timezone. We improved consistency by explicitly providing “current time” in both UTC and Asia/Jakarta inside the prompt.
  • Multi-step intent parsing: users often paste paragraphs, not one task. Turning that into multiple tasks required careful prompting, JSON validation, and a batch API path.
  • Debugging production mismatches: at times the hosted API was running an older build than our local code. We fixed it by aligning deployments and adding a simple smoke-test script.
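A deployment smoke test along these lines is cheap to keep around (a sketch only; the route, payload shape, and response format are assumptions, not PodButler's actual API):

```python
import json
import sys
import urllib.request


def smoke_test(base_url: str, timeout: float = 10.0) -> bool:
    """POST a known phrase to the deployed parse endpoint and check the
    response is valid JSON containing at least one task with a title."""
    payload = json.dumps({"text": "tomorrow at 10 buy groceries"}).encode()
    req = urllib.request.Request(
        f"{base_url}/taskParser/parseAndCreateMany",  # hypothetical route
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = json.load(resp)
    except Exception as exc:  # network error, non-2xx status, bad JSON, ...
        print(f"smoke test failed: {exc}", file=sys.stderr)
        return False
    tasks = body if isinstance(body, list) else body.get("tasks", [])
    return bool(tasks) and all(t.get("title") for t in tasks)
```

Running it against the hosted URL right after each deploy is enough to catch the "prod is serving an older build" class of mismatch.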

What we learned

  • “Agentic” UX is not just the model—it’s the pipeline: strict schemas, validation, error handling, and observability matter as much as prompting.
  • Storing structured traces early makes iteration dramatically faster and safer, especially under hackathon time pressure.

What’s next

  • One-click import of traces into evaluation tools (Opik) and automated scoring.
  • Better task templates (meeting prep, workouts, travel planning) and smarter recurrence handling.
  • Optional real-time observability integration once requirements are stable.
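As context for that planned import, the current JSON/CSV trace export can be sketched like this (Python stand-in; `export_traces` and the exact column names are assumptions based on what each trace stores):

```python
import csv
import json

# Columns assumed from what each llm_trace row records
TRACE_FIELDS = ["id", "input", "raw_output", "heuristic_due_time", "final_json", "error"]


def export_traces(traces: list[dict], json_path: str, csv_path: str) -> None:
    """Dump trace rows to JSON (lossless) and CSV (spreadsheet-friendly)."""
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(traces, f, ensure_ascii=False, indent=2)
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=TRACE_FIELDS, extrasaction="ignore")
        writer.writeheader()
        for row in traces:
            writer.writerow({k: row.get(k, "") for k in TRACE_FIELDS})
```

Keeping the JSON export lossless means an evaluation-tool import step only has to map fields, not recover data.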

Built With

flutter, serverpod, gemini, postgresql