Inspiration
Choosing which AI model to build on is harder than it should be. Vendor pages read like marketing copy, benchmark tables reduce a model to a single number on a test you'll never run, and the practical truth is scattered across forums and threads: where a model quietly fails, what it costs in latency, and who probably should not pick it.
We wanted a more honest signal, and we noticed something simple: the models most motivated to name a model's flaws are often the other models competing with it. They are also capable of recognizing its strengths. So instead of asking a model to describe itself, we decided to convene a panel of its rivals and ask them.
I wrote a longer build note with screenshots and a two-minute tour here: Don't take a model's word for it. Ask its rivals.
What it does
Pulse evaluates a subject AI model by putting a panel of evaluator models around it and asking every model the same three builder-focused questions through OpenRouter. It then assembles the answers into a clean, public, per-model page.
The public site shows a card for each published model snapshot. Each per-model page includes a synthesized Pulse summary for each question, plus every evaluator's full individual take one click away.
The admin console has three surfaces:
- Model Registry: a synced OpenRouter catalog of evaluator candidates.
- Snapshot Builder: pick a subject model, choose the six-model panel, ask three questions, run the evaluation live, preview, and publish.
- Snapshots Manager: review and delete published or draft snapshots.
The unit of content is a snapshot: what this panel said about this model, on this day. Publishing a snapshot writes a row to the database, which makes a card appear on the public home page. Deleting it removes the card. The database is the single source of truth.
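The home page shows only published snapshots, deduplicated to the latest one per subject. The sketch below shows that selection rule as a small pure function. The `Snapshot` shape and its field names (`subjectId`, `status`, `publishedAt`) are hypothetical. In Postgres the same rule is commonly written as `DISTINCT ON (subject_id) ... ORDER BY subject_id, published_at DESC`, so the database can stay the single source of truth.

```typescript
// Hypothetical snapshot row shape; the real Pulse schema may differ.
interface Snapshot {
  id: string;
  subjectId: string;
  status: "draft" | "published";
  publishedAt: Date | null;
}

// Keep only published snapshots, then only the most recent one per subject.
// SQL equivalent: SELECT DISTINCT ON (subject_id) * FROM snapshots
//   WHERE status = 'published' ORDER BY subject_id, published_at DESC
function latestPublishedPerSubject(snapshots: Snapshot[]): Snapshot[] {
  const latest = new Map<string, Snapshot>();
  for (const s of snapshots) {
    if (s.status !== "published" || s.publishedAt === null) continue;
    const current = latest.get(s.subjectId);
    // Only published rows with a non-null publishedAt are ever stored in the map.
    if (!current || s.publishedAt.getTime() > current.publishedAt!.getTime()) {
      latest.set(s.subjectId, s);
    }
  }
  // Newest cards first (an assumed display order).
  return [...latest.values()].sort(
    (a, b) => b.publishedAt!.getTime() - a.publishedAt!.getTime(),
  );
}
```

The TypeScript version exists only to make the rule easy to read and test. The app itself reads the already-filtered rows from Aurora.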
How we built it
Pulse is a Next.js 16 application using the App Router, React Server Components, and Server Actions. It is written in TypeScript, styled with Tailwind CSS v4 and shadcn/ui, and deployed on Vercel.
The back end is Amazon Aurora PostgreSQL. The design decision we cared most about was how the app reaches the database. Every query goes through the RDS Data API, an HTTPS endpoint authenticated with IAM and an AWS Secrets Manager secret. That means the app holds no persistent database connection and needs no connection pool, which is a clean fit for serverless functions.
A single typed sql<T>() helper in lib/aurora.ts is the only place the database is touched. It converts JavaScript values to SqlParameters and maps result fields back to plain rows. No ORM.
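The sketch below shows what a helper of this kind can look like. It assumes simplified local copies of the Data API's `SqlParameter` and `Field` shapes, which really live in `@aws-sdk/client-rds-data`, and it takes a hypothetical `execute` function as its first argument so it runs without AWS. The real `sql<T>()` signature in lib/aurora.ts is not shown in this write-up. The timestamp string follows the Data API's documented `YYYY-MM-DD HH:MM:SS[.FFF]` format for the `TIMESTAMP` type hint.

```typescript
// Simplified local mirrors of RDS Data API shapes (only the variants used here).
type Field =
  | { isNull: true }
  | { booleanValue: boolean }
  | { longValue: number }
  | { doubleValue: number }
  | { stringValue: string };

interface SqlParameter {
  name: string;
  value: Field;
  typeHint?: "TIMESTAMP" | "JSON";
}

interface StatementResult {
  columnMetadata: { name: string }[];
  records: Field[][];
}

// Hypothetical executor. In the app this would wrap
// client.send(new ExecuteStatementCommand({ resourceArn, secretArn, database,
//   sql, parameters, includeResultMetadata: true })).
type Executor = (sql: string, parameters: SqlParameter[]) => Promise<StatementResult>;

// Convert one JavaScript value into a Data API parameter.
function toSqlParameter(name: string, input: unknown): SqlParameter {
  if (input === null || input === undefined) return { name, value: { isNull: true } };
  if (typeof input === "boolean") return { name, value: { booleanValue: input } };
  if (typeof input === "number") {
    return Number.isInteger(input)
      ? { name, value: { longValue: input } }
      : { name, value: { doubleValue: input } };
  }
  if (input instanceof Date) {
    // "2025-01-02T03:04:05.000Z" becomes "2025-01-02 03:04:05.000" (UTC).
    const ts = input.toISOString().replace("T", " ").replace("Z", "");
    return { name, value: { stringValue: ts }, typeHint: "TIMESTAMP" };
  }
  if (typeof input === "object") {
    return { name, value: { stringValue: JSON.stringify(input) }, typeHint: "JSON" };
  }
  return { name, value: { stringValue: String(input) } };
}

// Unwrap a result Field back into a plain JavaScript value.
function fromField(field: Field): unknown {
  if ("isNull" in field) return null;
  if ("booleanValue" in field) return field.booleanValue;
  if ("longValue" in field) return field.longValue;
  if ("doubleValue" in field) return field.doubleValue;
  return field.stringValue;
}

// Named parameters in, plain row objects out. T is trusted, not validated.
async function sql<T>(
  execute: Executor,
  text: string,
  params: Record<string, unknown> = {},
): Promise<T[]> {
  const parameters = Object.entries(params).map(([k, v]) => toSqlParameter(k, v));
  const { columnMetadata, records } = await execute(text, parameters);
  return records.map((record) => {
    const row: Record<string, unknown> = {};
    record.forEach((field, i) => {
      row[columnMetadata[i].name] = fromField(field);
    });
    return row as T;
  });
}
```

A call site would then read like `sql<RunRow>(execute, "SELECT * FROM evaluation_runs WHERE id = :id", { id })`. Here `RunRow` is a hypothetical row type, and `:id` is the Data API's named-parameter syntax.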
Evaluations are orchestrated by POST /api/run-evaluation, which fans out to every evaluator in parallel with Promise.allSettled, stores each answer as a row under a new evaluation_run, and records latency and failures. Publishing and deletion are handled by Server Actions that write to Aurora and call revalidatePath, so the public site reflects changes immediately.
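The fan-out pattern can be sketched as follows. This is a simplified version under stated assumptions: it handles one question at a time, the `Ask` function is a hypothetical stand-in for the OpenRouter call, and the `AnswerRow` fields only approximate the answers table. Writing the rows under a new evaluation_run is left out. Because the calls go through `Promise.allSettled`, one slow or failing evaluator never rejects the whole run, and each call's latency is recorded whether it succeeds or fails.

```typescript
interface Evaluator {
  id: string; // e.g. an OpenRouter model slug
}

// Hypothetical answer row; the real answers table may have other columns.
interface AnswerRow {
  evaluatorId: string;
  status: "ok" | "error";
  text: string | null;
  error: string | null;
  latencyMs: number;
}

interface RunResult {
  status: "completed" | "partial";
  answers: AnswerRow[];
}

// Injected so the orchestration can be exercised without network calls.
type Ask = (evaluator: Evaluator, question: string) => Promise<string>;

async function runPanel(evaluators: Evaluator[], question: string, ask: Ask): Promise<RunResult> {
  const latencies = evaluators.map(() => 0);

  // Start every call at once; allSettled waits for all of them, success or not.
  const settled = await Promise.allSettled(
    evaluators.map(async (evaluator, i) => {
      const started = Date.now();
      try {
        return await ask(evaluator, question);
      } finally {
        // Runs on success and on failure, before this promise settles.
        latencies[i] = Date.now() - started;
      }
    }),
  );

  const answers = settled.map((result, i): AnswerRow =>
    result.status === "fulfilled"
      ? { evaluatorId: evaluators[i].id, status: "ok", text: result.value, error: null, latencyMs: latencies[i] }
      : { evaluatorId: evaluators[i].id, status: "error", text: null, error: String(result.reason), latencyMs: latencies[i] },
  );

  // "completed" only if every evaluator answered; otherwise "partial".
  const status: RunResult["status"] = answers.every((a) => a.status === "ok") ? "completed" : "partial";
  return { status, answers };
}
```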
The schema has six core tables: subjects, evaluator_models, questions, evaluation_runs, answers, and snapshots. They are created idempotently by a setup-db route.
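The sketch below shows one way the six tables could be created idempotently. Every column, type, and constraint here is a hypothetical layout, not the actual Pulse schema. The statements are ordered so each foreign key points at a table created earlier, and `CREATE TABLE IF NOT EXISTS` makes re-running the setup route a no-op. `gen_random_uuid()` is built into PostgreSQL 13 and later. The `execute` argument is a stand-in for whatever sends SQL to Aurora, such as a Data API `ExecuteStatement` call, and it sends one statement per call.

```typescript
const SCHEMA: readonly string[] = [
  `CREATE TABLE IF NOT EXISTS subjects (
     id TEXT PRIMARY KEY,            -- e.g. an OpenRouter model slug
     display_name TEXT NOT NULL
   )`,
  `CREATE TABLE IF NOT EXISTS evaluator_models (
     id TEXT PRIMARY KEY,
     provider TEXT NOT NULL,
     synced_at TIMESTAMPTZ NOT NULL DEFAULT now()
   )`,
  `CREATE TABLE IF NOT EXISTS questions (
     id SERIAL PRIMARY KEY,
     prompt TEXT NOT NULL
   )`,
  `CREATE TABLE IF NOT EXISTS evaluation_runs (
     id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
     subject_id TEXT NOT NULL REFERENCES subjects(id),
     status TEXT NOT NULL CHECK (status IN ('completed', 'partial')),
     created_at TIMESTAMPTZ NOT NULL DEFAULT now()
   )`,
  `CREATE TABLE IF NOT EXISTS answers (
     id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
     run_id UUID NOT NULL REFERENCES evaluation_runs(id) ON DELETE CASCADE,
     evaluator_id TEXT NOT NULL REFERENCES evaluator_models(id),
     question_id INTEGER NOT NULL REFERENCES questions(id),
     body TEXT,
     error TEXT,
     latency_ms INTEGER
   )`,
  `CREATE TABLE IF NOT EXISTS snapshots (
     id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
     subject_id TEXT NOT NULL REFERENCES subjects(id),
     run_id UUID NOT NULL REFERENCES evaluation_runs(id),
     status TEXT NOT NULL CHECK (status IN ('draft', 'published')),
     published_at TIMESTAMPTZ
   )`,
];

// Runs each statement in dependency order; safe to call repeatedly.
async function setupDb(execute: (sql: string) => Promise<unknown>): Promise<number> {
  for (const statement of SCHEMA) {
    await execute(statement);
  }
  return SCHEMA.length;
}
```

Separating the raw store (evaluation_runs, answers) from the publishing layer (snapshots) is what lets a draft exist without touching the public site.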
Challenges we ran into
- No connection pooling in serverless. Classic Postgres clients assume a long-lived connection. Routing everything through the RDS Data API removed that constraint, but meant building a thin typed query layer to translate values and result fields by hand.
- Heterogeneous models behave differently. The evaluator panel spans six providers. Reasoning-style models needed a larger token budget and no custom temperature, while other models worked better with simpler settings.
- Partial failures are normal. With six external calls per run, some calls will be slow or fail outright. We used Promise.allSettled, tracked per-model latency, fallback, and error states, and marked each run as completed or partial. When credits run low, we retry once with the token budget capped at what OpenRouter reports the account can afford.
- Keeping live content honest. We deliberately avoided client-only state for anything that is published. The home page reads only published snapshots straight from Aurora, deduplicated to the latest one per subject, so there is exactly one source of truth.
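Two of the challenges above, per-model request settings and the single budget-capped retry, can be sketched together. The request body uses OpenRouter's chat-completions fields (`model`, `messages`, `max_tokens`, `temperature`). Everything else is an assumption. That includes the token numbers, the `reasoning` flag on a model profile, the hypothetical `InsufficientCreditsError`, and the `Send` transport. How the affordable token count is extracted from OpenRouter's error response is not shown here.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  max_tokens: number;
  temperature?: number;
}

interface ModelProfile {
  reasoning: boolean; // assumed flag marking reasoning-style models
}

const REASONING_MAX_TOKENS = 8000; // assumed: leaves room for reasoning tokens
const DEFAULT_MAX_TOKENS = 1500;   // assumed
const DEFAULT_TEMPERATURE = 0.3;   // assumed

function buildRequest(model: string, profile: ModelProfile, question: string): ChatRequest {
  const messages: ChatMessage[] = [{ role: "user", content: question }];
  if (profile.reasoning) {
    // Larger budget, and temperature deliberately left unset.
    return { model, messages, max_tokens: REASONING_MAX_TOKENS };
  }
  return { model, messages, max_tokens: DEFAULT_MAX_TOKENS, temperature: DEFAULT_TEMPERATURE };
}

// Hypothetical error raised by the transport when only a smaller
// completion is affordable.
class InsufficientCreditsError extends Error {
  constructor(readonly affordableTokens: number) {
    super(`can only afford ${affordableTokens} tokens`);
    Object.setPrototypeOf(this, InsufficientCreditsError.prototype); // keep instanceof working on ES5 targets
  }
}

type Send = (req: ChatRequest) => Promise<string>;

// At most one retry, capped at the affordable budget; other errors propagate.
async function sendWithBudgetRetry(send: Send, req: ChatRequest): Promise<string> {
  try {
    return await send(req);
  } catch (err) {
    if (
      err instanceof InsufficientCreditsError &&
      err.affordableTokens > 0 &&
      err.affordableTokens < req.max_tokens
    ) {
      return send({ ...req, max_tokens: err.affordableTokens });
    }
    throw err;
  }
}
```

Capping the retry rather than looping keeps a run's worst-case cost and latency predictable. If the second attempt also fails, the evaluator is simply recorded as an error and the run ends up partial.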
Accomplishments that we're proud of
- A genuinely live evaluation where six models answer in parallel, with real latencies, and the result is persisted to Aurora.
- A clean separation between the raw evaluation store and the publishing layer, so work in progress in the admin and live content on the public site do not drift apart.
- A serverless-native data path using the RDS Data API, with no pool, no ORM, and one typed helper.
- Output that is actually useful: competing perspectives that disagree in informative ways, written for an engineer with one weekend to ship.
What we learned
- Matching the database to the read patterns mattered. Pulse's value lives on the read side: published cards, per-model pages, deduped summaries, and snapshot lookup. Aurora PostgreSQL was a natural fit.
- The RDS Data API is an underrated fit for serverless apps. Trading persistent connections for IAM-authenticated HTTPS calls removes a whole class of operational pain.
- A demo lands better when the system is genuinely doing the thing. Running the panel for real and showing it stored in Aurora is more convincing than a static page that only claims to be evaluated.
- The most interesting AI product surfaces often come from comparison, not single-answer generation. The disagreements between models are part of the signal.
What's next for Pulse
- Temporal tracking: re-run the same panel on a schedule and show how consensus shifts as models update over weeks.
- Visitor-initiated panels: let anyone run a panel on a model Pulse has not covered yet, turning a curated gallery into a living, community-fed reference.
- Richer synthesis: add structured agreement and disagreement views, plus confidence signals across the evaluator panel.
Built With
- amazon-aurora
- amazon-rds-data-api
- aws-iam
- aws-secrets-manager
- lucide-react
- nextjs
- node.js
- openrouter
- postgresql
- react
- shadcn-ui
- tailwindcss
- typescript
- vercel
- vercel-analytics
