Project Story: Prompt-Prompter
i had ai write the technical bits where things get listed but tried to answer the subheadings myself. im tired, theres 5 hours left til deadline and i just got home from the fireworks. happy new year hackathon!
Inspiration
theres this feeling when you put real effort into a prompt and it just works... genuinely returning exactly what you wanted. but then theres the frustration of lazy prompting where you get below-average results and wonder why you even bothered.
heres the thing.... sometimes crafting the prompt becomes the whole project. you lose sight of what you were actually building because youve lost sleep iterating on a prompt for the hundredth time. it feels crazy to fix things manually, so you set up an automation.... a nice workflow getting consistent results. for a moment
then, inevitably, theres a new model update, or maybe just the universe messing with you when literally nothing has changed, and you come back to find catastrophic errors and a total mess. honestly, it would have been quicker to do it all by hand. but we push through because automation is the goal, right?
i realized Datadog is awesome for holding the LLM accountable. instead of just hoping for the best, i wanted to empower the user AND the LLM.. so i asked: "Why arent we treating prompts like infrastructure?" if a K8s node fails, it restarts. if a prompt's quality "decays" it should "heal" itself
What it does
Prompt-Prompter monitors prompt inputs to guide and nurture top-tier prompting as second nature, not a superpower. basically, it catches bad habits 'before' they happen and surfaces whats actually going on so you can get better without even thinking about it
its like having someone watching over your shoulder.... but helpful, not annoying.... and backed by actual data, not just "vibes."
[ai list]
- Executes your prompt via Google Vertex AI Gemini 3.0 Flash.
- Measures performance quantitatively (Accuracy, Hallucination Risk, Cost).
- Streams telemetry to Datadog in real-time.
- Auto-Heals: if the data shows a prompt is garbage (e.g., accuracy $< 0.8$), it triggers an optimization cycle to rewrite it instantly.
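a rough sketch of what that auto-heal loop could look like in python. this is not the actual project code: `auto_heal`, `HealResult`, the `evaluate` / `rewrite` callables and the `max_rewrites` cap are all made-up names for illustration. only the 0.8 accuracy threshold comes from the list above.

```python
from dataclasses import dataclass
from typing import Callable

# Threshold taken from the list above; every other name here is hypothetical.
ACCURACY_THRESHOLD = 0.8


@dataclass
class HealResult:
    prompt: str       # the prompt we ended up with
    accuracy: float   # accuracy measured for that prompt
    rewrites: int     # how many optimization cycles were triggered


def auto_heal(
    prompt: str,
    evaluate: Callable[[str], float],
    rewrite: Callable[[str], str],
    max_rewrites: int = 3,
) -> HealResult:
    """Rewrite and re-measure a prompt until accuracy clears the threshold.

    Stops after max_rewrites attempts and returns the last prompt tried,
    so a prompt that never recovers cannot loop forever.
    """
    accuracy = evaluate(prompt)
    rewrites = 0
    while accuracy < ACCURACY_THRESHOLD and rewrites < max_rewrites:
        prompt = rewrite(prompt)
        accuracy = evaluate(prompt)
        rewrites += 1
    return HealResult(prompt, accuracy, rewrites)
```

in a real setup `evaluate` would presumably call the model and score the output, and `rewrite` would ask the model for a better prompt. the cap on rewrites is there because a heal loop with no limit is just a more expensive way to fail.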
How it got built...
hooked into Datadog's LLM observability to watch prompts as theyre being crafted... not after the fact when its too late, but right there in the moment. the idea was to catch lazy prompting before it gets sent and gently nudge toward better rewrites. had it working for a moment but it got bogged down in the fray of fixing things and making sure i entered something before deadline
i asked ai to build a modern stack focused on speed:
[ai list]
- Backend: Python 3.11 with FastAPI for high-concurrency async processing.
- AI Engine: Google Vertex AI (Gemini 3.0 Flash) for low-latency inference.
- Frontend: SolidJS + TypeScript on Bun. SolidJS was crucial for handling the real-time metric streams without destroying the browser.
- Observability: Datadog is the heart of it. we built a feedback loop that actually tells you whats going on instead of just logging everything and hoping youll look at it later (spoiler: i probably wouldnt).
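for the Datadog side, heres a minimal sketch of pushing one metric to a local Datadog Agent over DogStatsD, written by hand with just the standard library so the wire format is visible. the project itself lists ddtrace and a statsd client under Built With, so treat this as an illustration and not what the repo does. the metric name and tags are invented; the address assumes the Agent is listening on its default DogStatsD port (8125) on localhost.

```python
import socket
from typing import Dict, Optional, Tuple

# Assumes a Datadog Agent with DogStatsD on its default local UDP port.
DOGSTATSD_ADDR: Tuple[str, int] = ("127.0.0.1", 8125)


def format_gauge(name: str, value: float, tags: Optional[Dict[str, str]] = None) -> str:
    """Build a DogStatsD gauge datagram: <name>:<value>|g|#<key>:<val>,..."""
    datagram = f"{name}:{value}|g"
    if tags:
        # Sort the tags so the output is deterministic.
        datagram += "|#" + ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return datagram


def send_gauge(
    name: str,
    value: float,
    tags: Optional[Dict[str, str]] = None,
    addr: Tuple[str, int] = DOGSTATSD_ADDR,
) -> None:
    """Send the gauge as a single UDP datagram (fire-and-forget, no reply expected)."""
    payload = format_gauge(name, value, tags).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, addr)
```

something like `send_gauge("prompt_prompter.accuracy", 0.75, {"model": "gemini"})` after each prompt run would be enough to chart accuracy over time and hang a monitor off it.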
the core logic uses a multi-objective optimization function: $$ \text{Score} = \max \left( \alpha \cdot \text{Accuracy} + \beta \cdot (1 - \text{Hallucination}) - \gamma \cdot \text{Cost} \right) $$
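the score formula above, sketched in python. the weights are placeholder values (the writeup doesnt say what alpha, beta and gamma were set to), and reading the max as "pick the best of several candidate prompt rewrites" is my assumption. `Metrics`, `score` and `best_candidate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Metrics:
    accuracy: float       # 0..1, higher is better
    hallucination: float  # 0..1 risk, lower is better
    cost: float           # assumed to be normalized to roughly 0..1


def score(m: Metrics, alpha: float = 1.0, beta: float = 1.0, gamma: float = 0.5) -> float:
    """alpha * Accuracy + beta * (1 - Hallucination) - gamma * Cost."""
    return alpha * m.accuracy + beta * (1.0 - m.hallucination) - gamma * m.cost


def best_candidate(candidates: Dict[str, Metrics], **weights: float) -> str:
    """Return the candidate prompt whose metrics give the highest score."""
    return max(candidates, key=lambda prompt: score(candidates[prompt], **weights))
```

cost only makes sense in this sum if its on a similar scale to the other two terms, which is why the sketch assumes it has been normalized first. raw dollar amounts or token counts would need scaling before they go in.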
Challenges
honestly, the biggest one was the meta problem of prompting to build something about prompting... spent way too long iterating prompts for the analysis layer lol.
figuring out what even makes a 'good' prompt was harder than expected because its kinda subjective and changes depending on the model or topic... had to engineer a framework of sorts to penalize ambiguity
then there was the 'realtime aspect'... it had to be fast enough that it doesnt slow you down but thorough enough to be useful. and of course, dealing with pesky model drift.... building detection for when your perfectly tuned prompts just stop working because of an update or sometimes for no reason at all
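one simple way drift detection like that could work: keep a rolling window of recent scores for a prompt and flag it when the window average falls too far below a known-good baseline. this is a sketch under that assumption, not the detector from the project. `DriftDetector`, the window size and the tolerance are all invented for the example.

```python
from collections import deque
from statistics import mean


class DriftDetector:
    """Flags drift when the rolling mean drops too far below a baseline."""

    def __init__(self, baseline: float, window: int = 20, tolerance: float = 0.1):
        self.baseline = baseline      # score the prompt got when it was "perfectly tuned"
        self.tolerance = tolerance    # how much of a drop we accept before flagging
        self.scores = deque(maxlen=window)  # oldest scores fall off automatically

    def observe(self, score: float) -> bool:
        """Record a score; return True if the full window shows drift."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough data yet, avoid flagging on a single bad run
        return self.baseline - mean(self.scores) > self.tolerance
```

waiting for a full window trades speed for fewer false alarms. a single bad response wont trip it, but a model update that drags every response down will show up within `window` runs.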
Accompl'ish'ments
taking prompting from this mystical superpower that only a few people have figured out and making it feel like "second nature for everyone" (hopefully.. still needs work on the seamless bit)
my favourite bit is i built actual accountability into LLM workflows without making it (too) annoying. probably the thing im most proud of is turning the "frustration loop".... where the prompt becomes the project.... into something that captures knowledge and provides unique, useful data. Datadog as empowerment, not just debugging, is a whole 'vibe' in the most ironic way
What was learned
the difference between hoping for the best and actually 'knowing' whats happening is massive. like, genuinely game-changing. turns out prompt quality IS measurable; you just need to be watching the right things.
learned that "automation without visibility is basically just chaos-casino", reserved for when you least expect it. users trying to get a result of a certain standard dont need more magic.... they want to understand how things work so they can fix them when they dont. it feels really rewarding to know that the seed has been planted here.
What's next for Prompt-Prompter
(writing this after finally getting the demo uploaded on New Year's Eve... better to submit something than miss the deadline!)
[ai list]
- Prompt Versioning: so you can actually rollback when an update breaks everything.
- Cross-Model Calibration: because what works on one LLM might be garbage on another.
- Team Level Insights: learning from everyones patterns, not just your own.
- Proactive Alerts: catching degradation before catastrophic failure instead of after.
- Integrations Everywhere: IDE plugins, CI/CD hooks, Slack notifs for prompt health.
honestly, just keep pushing on making this feel invisible... like second nature, not another tool to manage. its going to be useful in literally every project i can think of.
Built With
- bun
- datadog
- ddtrace
- docker
- docker-compose
- fastapi
- gcp
- gemini
- javascript
- pydantic
- pytest
- python
- react
- recharts
- ruff
- statsd
- typescript
- uv
- vertexai-sdk
- vite