Inspiration
I've sat in a lot of meetings where everything feels resolved by the end. Decisions made. Owners named. Someone says "I'll get that done by Friday." People nod.
Then Friday comes.
Nothing happened. Not because anyone was lazy or dishonest but because that spoken commitment lived entirely in people's heads. It never became a task. It never got tracked. The gap between saying something and doing something is where good intentions go to die.
I looked at the tools that exist. Otter.ai, Fireflies, Notion AI — they all solve transcription. They give you a clean summary of what was said. But a summary doesn't ship code. A summary doesn't create the Jira ticket. A summary is just a more readable version of the same problem: important commitments sitting in a document nobody will re-read.
The question I kept coming back to was: what if the meeting could execute itself?
Not summarize. Execute. Turn the sentence "I will finalize the PRD by March 14th" into an actual ticket with an owner and a deadline — before the meeting recording even finishes uploading.
That question became DecisionPilot.
What it does
DecisionPilot is an AI pipeline that converts a raw meeting recording into executed action items. You upload an audio file — and optionally, slides or whiteboard photos. In under 60 seconds, the system analyzes the meeting and routes every commitment to one of three outcomes:
$$ \text{Route}(item) = \begin{cases} \text{AUTO} & \text{if } \bar{c} \geq 0.80 \ \text{REVIEW} & \text{if } 0.65 \leq \bar{c} < 0.80 \ \text{CLARIFY} & \text{if } \bar{c} < 0.65 \end{cases} $$
AUTO items become real Jira tickets immediately. Not stubs. Not previews. Live tickets, clickable, assigned, with the verbatim quote as the description.
REVIEW items are surfaced as diff-style suggestions — the system shows you exactly what it would change and why, and you approve with one click.
CLARIFY items don't get ticketed at all. Instead, the system identifies the precise question that must be answered first — "Who is responsible for the rollback plan?" — rather than creating a bad ticket that will sit unworked.
The confidence score $\bar{c}$ is the average across four dimensions Nova Lite evaluates independently:
$$\bar{c} = \frac{c_{\text{action clarity}} + c_{\text{ownership certainty}} + c_{\text{evidence strength}} + c_{\text{deadline clarity}}}{4}$$
On top of execution, the system also detects missed commitments — vague sentences like "someone should probably handle the marketing copy" that carry real intent but fail the quality gate. These get flagged explicitly so they don't disappear into the noise.
Every meeting gets a Meeting Accountability Score from 0–100 with a grade A–F, broken down by six dimensions: ownership clarity, action clarity, evidence quality, deadline presence, ambiguity rate, and risk coverage. It answers the question every team should ask after every meeting: how likely is this meeting to actually produce results?
How we built it
The core architectural principle was: AI should reason. Deterministic systems should verify.
Every stage in the pipeline is deliberately typed — either AI judgment or deterministic rule — and they never mix.
| Stage | Technology | Type |
|---|---|---|
| Transcription | AWS Transcribe (speaker diarization) | Deterministic |
| Slide & whiteboard vision | Amazon Nova Lite (multimodal) | AI |
| Decision extraction | Amazon Nova Lite (temperature=0) |
AI |
| Evidence linking | Custom rule engine | Deterministic |
| Quality gate | 7 rule-based checks | Deterministic |
| Confidence scoring | Amazon Nova Lite (second-pass) | AI |
| Execution router | Threshold logic | Deterministic |
| Ticket creation | Jira REST API v3 | Deterministic |
Nova Lite is invoked three times with fundamentally different tasks each time:
Extraction — reads the transcript and outputs structured JSON: action items, decisions, risks, open questions, with speaker attribution and ISO deadline parsing. Runs at
temperature=0to get the same answer every time.Vision — receives JPEG frames of uploaded slides and whiteboard photos. Extracts commitments that were written but never spoken. These inject as
[SLIDE-N]evidence chunks that flow through the rest of the pipeline identically to transcript evidence — no special handling required.Confidence scoring — re-reads each action item alongside its verbatim quote and scores it on all four dimensions independently. This second-pass judgment is what enables precise routing rather than a binary pass/fail.
The backend is FastAPI with a multi-stage pipeline that runs as a background task. Results are persisted to disk so they survive server restarts. The frontend is Next.js 15 with a live pipeline stage tracker, and for completed meetings, data is server-side prefetched so the results appear instantly with no loading state.
Challenges we ran into
Preventing hallucination in citations
The hardest problem. Language models want to summarize and paraphrase. We needed verbatim quotes — because if a citation can't be verified against the raw transcript, the action item might not even be real. The solution was three-layered: temperature=0 to remove sampling variance, a strict prompt requiring exact quote reproduction, and a deterministic verification engine that rejects any item whose cited text cannot be found in the transcript. An action item that fails citation verification fails the quality gate entirely.
Calibrating the routing thresholds
The values $c_{\text{AUTO}} = 0.80$ and $c_{\text{REVIEW}} = 0.65$ aren't arbitrary. We ran the pipeline against multiple real meeting types — sprint planning, design reviews, sales calls — and measured how the confidence distribution shifted. Too low a threshold and garbage tickets get auto-created. Too high and everything lands in REVIEW, defeating the purpose. The final values produce roughly 60% AUTO, 25% REVIEW, 15% CLARIFY across a diverse meeting corpus.
Making slide-derived evidence invisible to downstream stages
We wanted slide content to flow through the quality gate, confidence scorer, and router identically to transcript content — with no special-casing anywhere. The solution was a simple identifier convention: slide items inject as [SLIDE-1], [SLIDE-2], etc. To every downstream stage, they look like any other evidence chunk. It took about an hour to implement and made the architecture dramatically simpler.
The missed commitments detector
Getting useful signal without generating noise. "I'll look into that" should be flagged. "I'll grab a coffee" should not. We ended up with 12 handcrafted regex patterns tuned to commitment-shaped language — vague ownership pronouns, indefinite timelines, passive-voice task assignments — combined with a confidence tier system so judges can see at a glance which flags are worth acting on. Zero LLM calls. Instant. Completely deterministic.
Accomplishments that we're proud of
The moment we saw the first real Jira ticket appear — from audio, automatically, with the right owner and deadline — everything clicked. That's the thing we set out to build. It works.
More specifically:
- Zero mocked AI calls. Every Nova Lite invocation hits real Bedrock endpoints. Every Jira ticket is a real ticket you can click and comment on.
- The CLARIFY path turned out to be the most valuable feature. We originally thought AUTO was the star. It isn't. The most valuable thing the system does is refuse to create a bad ticket and instead surface the exact question that needs to be answered. That's the insight: a missing Jira ticket is better than a wrong one.
- Nova Lite's vision capability surprised us. We expected it to handle clean slide decks. It also extracted structured commitments from messy whiteboard photos, handwritten bullets, and complex architecture diagrams. Far beyond what we anticipated.
- The evidence playback feature. You can click any action item, see the verbatim quote, and play the exact audio timestamp where it was spoken. Full auditability from ticket back to voice.
What we learned
The single biggest lesson: AI systems become trustworthy when they admit what they don't know.
The routing system doesn't try to handle everything. It routes uncertain items to humans and asks targeted questions. That restraint — knowing when not to act — is what makes the AUTO band trustworthy. If everything got auto-executed, you'd never trust it. Because some things don't get executed, you can trust the ones that do.
We also learned that separating AI stages from deterministic stages isn't just good engineering — it's what makes the system auditable. If a ticket gets created, you can trace exactly why: which confidence scores, which gate checks, which verbatim quote. Nothing is a black box.
And practically: building against AWS Bedrock with Nova Lite is fast. The latency on extraction is well under 10 seconds for a 14-minute meeting transcript. The multimodal vision endpoint accepted our whiteboard JPEGs without any preprocessing beyond resizing. The API is clean.
What's next for DecisionPilot
The immediate roadmap is about closing the remaining gaps in the execution loop:
REVIEW band execution — right now REVIEW items queue for human approval but the 1-click approval flow that creates the Jira ticket with edits pre-applied isn't wired up yet. That's the next thing to build.
Nova Act integration — replace the Jira REST API call with Nova Act browser automation. The ticket creation becomes visible, inspectable, and auditable in real time. More importantly, it opens the door to creating tickets in any project management tool — Linear, Asana, GitHub Issues — without building a dedicated integration for each.
Slack/Teams integration — CLARIFY questions should route directly to the meeting channel. The system knows who was in the room and what channel they use. "Hey Marcus — who owns the rollback plan?" posted automatically, with the context quote attached.
Retroactive analysis — run DecisionPilot against historical meeting recordings to surface forgotten commitments. Most teams have months of recordings sitting in Google Drive that represent hundreds of untracked action items.
Cloud deployment — currently running on local infrastructure via Cloudflare tunnel. The architecture is stateless enough (S3 for uploads, RDS for results) that moving to ECS would be a week of work.
The vision is simple: meetings should end with work already in motion. Not with a follow-up email. Not with a summary doc. With tickets in the board and owners already notified. DecisionPilot is the missing layer between conversation and execution — and we're just getting started.
Built With
- amazon-bedrock
- amazon-nova-lite
- aws-transcribe
- fastapi
- jira-api
- next-js
- pdf2image
- python
- tailwind-css
- typescript

Log in or sign up for Devpost to join the conversation.