OmniLoop: The Autonomous SWE Interviewer

OmniLoop is a real-time AI interviewer built around a simple belief: interviews are not chat.
They’re a live, high‑bandwidth evaluation loop — voice, interruptions, screens, code, diagrams, tradeoffs, and evidence.

So we built OmniLoop as a bi-directional streaming system, not a “prompt + response” app. The interviewer speaks first, listens in real time, and stays grounded in what the candidate is actually doing on screen. When the question demands it, OmniLoop opens the right surface — a coding IDE or a system design board — and it keeps the evaluation fair by default: probe and assess, don’t teach.


Main highlights (why this is different)

1) Audio BiDi streaming that feels like a real round

Most “voice agents” still behave like turn-based chat. OmniLoop doesn’t.

  • continuous microphone PCM streaming → backend → live interviewer
  • streaming audio back → browser playback
  • interruption handling, so the loop stays natural
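
One way to picture interruption handling is a playback queue that is flushed the moment the candidate starts talking. The sketch below is illustrative only: `PlaybackBuffer` and its "generation" counter are hypothetical names, not OmniLoop's actual code. It assumes each interviewer response is tagged with the generation that was current when it started.

```python
from __future__ import annotations

from collections import deque


class PlaybackBuffer:
    """Queue of interviewer audio chunks waiting to be played (hypothetical helper)."""

    def __init__(self) -> None:
        self._chunks: deque[bytes] = deque()
        self._generation = 0  # bumped on every interruption

    @property
    def generation(self) -> int:
        return self._generation

    def push(self, chunk: bytes, generation: int) -> bool:
        """Queue a chunk unless it belongs to a response that was interrupted."""
        if generation != self._generation:
            return False  # stale audio produced before the barge-in
        self._chunks.append(chunk)
        return True

    def pop(self) -> bytes | None:
        """Return the next chunk to play, or None when the queue is empty."""
        return self._chunks.popleft() if self._chunks else None

    def interrupt(self) -> int:
        """Candidate started speaking: drop queued audio and start a new generation."""
        dropped = len(self._chunks)
        self._chunks.clear()
        self._generation += 1
        return dropped
```

Tagging chunks with a generation means audio that is still in flight when the candidate cuts in is discarded on arrival instead of being played late.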

2) Vision streaming (screen share) that adds signal, not noise

OmniLoop sends periodic screenshots from the shared screen as vision input. The key is restraint: no narrating frames, no “we see a cursor moving.” Instead, the interviewer uses vision as background context and brings it up only at natural breaks:

  • “We can see you’re in the repo — open the entrypoint and the main service boundary.”
  • “Let’s look at how you deploy this — show the config and walk through failure modes.”

It’s closer to how a human interviewer actually uses a shared screen.
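
The "restraint" can start before the model ever sees a frame. The sketch below is a minimal frame gate that forwards a screenshot only when a minimum interval has passed and the pixels have changed. Both the `FrameGate` name and the skip-unchanged-frames rule are assumptions made for illustration; the writeup above only says that screenshots are sent periodically.

```python
import hashlib


class FrameGate:
    """Decide whether a captured screenshot is worth sending (hypothetical helper)."""

    def __init__(self, min_interval_s: float = 2.0) -> None:
        self.min_interval_s = min_interval_s
        self._last_sent_at = None   # timestamp of the last forwarded frame
        self._last_digest = None    # content hash of the last forwarded frame

    def should_send(self, frame: bytes, now: float) -> bool:
        digest = hashlib.sha256(frame).hexdigest()
        too_soon = (
            self._last_sent_at is not None
            and now - self._last_sent_at < self.min_interval_s
        )
        if too_soon or digest == self._last_digest:
            return False  # nothing new to show the interviewer
        self._last_sent_at = now
        self._last_digest = digest
        return True
```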

3) Resume + role-aware interviews via an auditable RAG pipeline

Generic questions aren’t fair to candidates and aren’t useful for evaluation. OmniLoop ingests:

  • resume (PDF → parse → chunk → embed)
  • job description (chunk → embed)

Then it exposes explicit tools:

  • resume_search(query)
  • role_search(query)

When these tools are used, the UI shows citations. That single design choice changes the trust profile:

  • claims become checkable
  • hallucinations become easier to spot
  • “in-context” becomes something the reviewer can see on screen, not something implied
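
To make the chunk → embed → search → cite path concrete, here is a toy sketch. It is not OmniLoop's pipeline: `embed` is a bag-of-words stand-in for a real embedding model, the `Chunk` type and the `source#index` citation format are hypothetical, and only the tool name `resume_search` comes from the list above (its extra parameters are assumptions).

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str  # citation label shown in the UI, e.g. "resume#1" (assumed format)
    text: str


def chunk_text(source: str, text: str, max_words: int = 40) -> list[Chunk]:
    """Split text into fixed-size word windows, each with a stable citation id."""
    words = text.split()
    return [
        Chunk(f"{source}#{i // max_words}", " ".join(words[i:i + max_words]))
        for i in range(0, len(words), max_words)
    ]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def resume_search(query: str, chunks: list[Chunk], top_k: int = 2) -> list[dict]:
    """Return the best-matching chunks together with the citation to display."""
    q = embed(query)
    scored = sorted(
        ((cosine(q, embed(c.text)), c) for c in chunks),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [
        {"citation": c.chunk_id, "text": c.text, "score": round(s, 3)}
        for s, c in scored[:top_k]
        if s > 0
    ]
```

Because every hit carries its `citation`, the UI can show exactly which part of the resume a question was grounded in.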

4) Tool-driven “interview surfaces” (the right workspace at the right time)

Coding
When a coding question is asked, OmniLoop opens an in‑browser Python IDE (Monaco + Pyodide). Candidates run code entirely in the browser, see structured pass/fail results, and send code + test results back as evidence.

System design
For architecture questions, OmniLoop opens a drawing board (Excalidraw). It sends:

  • a snapshot (so the model can see the diagram)
  • a structured scene summary (nodes / flows / boundaries) so reasoning isn’t dependent on pixels

This makes system design legible to the model and reviewable for humans.
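
A structured scene summary could be derived from the board's element list roughly as follows. This is a sketch under assumptions: the field names (`type`, `containerId`, `startBinding.elementId`, `endBinding.elementId`) follow our understanding of Excalidraw's element JSON and should be checked against the version in use, `summarize_scene` is a hypothetical function, and boundaries are left out for brevity.

```python
def summarize_scene(elements: list[dict]) -> dict:
    """Reduce a list of board elements to labeled nodes and directed flows."""
    shapes = {"rectangle", "ellipse", "diamond"}

    # Text elements bound to a container act as that container's label.
    labels = {
        e["containerId"]: e.get("text", "")
        for e in elements
        if e.get("type") == "text" and e.get("containerId")
    }

    # Shapes become nodes; an unlabeled shape falls back to its id.
    nodes = {
        e["id"]: labels.get(e["id"], e["id"])
        for e in elements
        if e.get("type") in shapes
    }

    # Arrows bound to two known shapes become flows; loose arrows are skipped.
    flows = []
    for e in elements:
        if e.get("type") != "arrow":
            continue
        src = (e.get("startBinding") or {}).get("elementId")
        dst = (e.get("endBinding") or {}).get("elementId")
        if src in nodes and dst in nodes:
            flows.append({"from": nodes[src], "to": nodes[dst]})

    return {"nodes": sorted(nodes.values()), "flows": flows}
```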

5) A strict evaluator stance (fairness-first)

One of the biggest problems with AI interviewers is accidental teaching. OmniLoop is strict by default:

  • ask one question at a time
  • probe tradeoffs, correctness, and reasoning
  • do not reveal solutions
  • explain only if the candidate explicitly asks or clearly says “I don’t know”

That preserves evaluation signal, which is the whole point.
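
In practice a rule like the last one would most likely live in the interviewer's instructions, but it can also be expressed as an explicit guard. The sketch below is purely illustrative: `may_explain` and its phrase lists are hypothetical and far cruder than what a production check would need.

```python
import re

# Hypothetical phrase lists; a real system would need something more robust.
EXPLICIT_ASK = re.compile(
    r"\b(can you explain|please explain|what is the answer|tell me the answer)\b",
    re.IGNORECASE,
)
GIVES_UP = re.compile(r"\b(i don'?t know|i do not know|no idea)\b", re.IGNORECASE)


def may_explain(candidate_utterance: str) -> bool:
    """Allow an explanation only on an explicit request or a clear "I don't know"."""
    text = candidate_utterance.replace("\u2019", "'")  # normalize curly apostrophes
    return bool(EXPLICIT_ASK.search(text) or GIVES_UP.search(text))
```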

6) Async scoring pipeline with Redis Streams

The interview loop must stay low-latency. Scoring can be async. We separate them:

  • live interviewer session (real-time)
  • scoring worker (async) via Redis Streams

The worker returns strict JSON (rubric dimensions + gates + decision band), and the frontend displays a rolling decision panel in real time.
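
"Strict JSON" is easiest to enforce with a validator between the worker and the UI. The sketch below shows the idea using only the standard library. The key names (`rubric`, `gates`, `decision_band`), the rubric dimensions, the 1 to 5 scale, and the band labels are all assumptions for illustration; the writeup does not specify the schema.

```python
import json

# Hypothetical schema values, chosen only for this example.
RUBRIC_DIMENSIONS = ("correctness", "tradeoffs", "communication")
DECISION_BANDS = ("no_hire", "lean_no_hire", "lean_hire", "hire")


def parse_score(raw: str) -> dict:
    """Parse a scoring message and reject anything that deviates from the schema."""
    data = json.loads(raw)  # invalid JSON raises a ValueError subclass
    if not isinstance(data, dict) or set(data) != {"rubric", "gates", "decision_band"}:
        raise ValueError("unexpected top-level keys")

    rubric = data["rubric"]
    if not isinstance(rubric, dict) or set(rubric) != set(RUBRIC_DIMENSIONS):
        raise ValueError("rubric must contain exactly the known dimensions")
    for name, value in rubric.items():
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"rubric score for {name} must be an integer from 1 to 5")

    gates = data["gates"]
    if not isinstance(gates, dict) or not all(isinstance(v, bool) for v in gates.values()):
        raise ValueError("gates must map names to booleans")

    if data["decision_band"] not in DECISION_BANDS:
        raise ValueError("unknown decision band")
    return data
```

Rejecting malformed payloads at this boundary keeps the rolling decision panel from ever rendering a partial or free-text score.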

7) “Internal dataset” trail: events + artifacts you can learn from

If you want enterprise-grade systems, you need more than a transcript. OmniLoop records:

  • transcript events
  • tool calls / grounding events
  • board updates (summary + optional scene)
  • code submissions + test results
  • screenshots/artifacts
  • scores + decisions

This becomes the foundation for continuous improvement: calibration, rubric tuning, evaluation replay, and building internal datasets without guessing what happened.
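
A trail like this is easiest to replay when every record shares one envelope. The sketch below uses an append-only list of typed events serialized as JSON Lines. `SessionEvent`, its fields, and the event type names are hypothetical; they mirror the list above but are not OmniLoop's actual storage schema.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical type names mirroring the kinds of records listed above.
EVENT_TYPES = {
    "transcript", "tool_call", "board_update",
    "code_submission", "artifact", "score",
}


@dataclass
class SessionEvent:
    session_id: str
    seq: int            # monotonically increasing position within the session
    type: str
    payload: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.type}")


def to_jsonl(events: list[SessionEvent]) -> str:
    """Serialize events in sequence order, one JSON object per line."""
    ordered = sorted(events, key=lambda e: e.seq)
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in ordered)


def replay(jsonl: str) -> list[SessionEvent]:
    """Rebuild the event list from its JSON Lines form."""
    return [SessionEvent(**json.loads(line)) for line in jsonl.splitlines() if line]
```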

8) Enterprise report output

At the end of an interview, OmniLoop generates a PDF report and can email it to an admin address. This is not a gimmick: it closes the loop, so every interview produces something reviewable.


End-to-end flow (what the user experiences)

  1. Landing → Setup
    Configure role/level/focus/company. Upload resume + job description/context.

  2. Interview
    Live audio BiDi stream + screen share vision stream. Interviewer initiates and runs a strict, question-driven loop.

  3. Auto-surfaces
    Coding question → IDE opens.
    System design question → architecture board opens.

  4. Scoring + decision panel
    Worker scores answers asynchronously; UI shows rolling decision state.

  5. Finalize
    Persist artifacts/events + generate PDF report + optional email.
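
The flow above can be read as a small state machine in which surfaces open only during the live interview. The sketch below is a simplification under assumptions: the `Stage` names, the question kinds, and the surface identifiers are hypothetical, and asynchronous scoring (step 4) is treated as running alongside the interview stage rather than as its own stage.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    SETUP = "setup"
    INTERVIEW = "interview"
    FINALIZE = "finalize"


# Allowed forward transitions; anything else is rejected.
NEXT_STAGE = {Stage.SETUP: Stage.INTERVIEW, Stage.INTERVIEW: Stage.FINALIZE}

# Hypothetical mapping from question kind to the surface the UI opens.
SURFACE_FOR_QUESTION = {"coding": "ide", "system_design": "architecture_board"}


def advance(stage: Stage) -> Stage:
    """Move to the next stage, refusing to go past the final one."""
    if stage not in NEXT_STAGE:
        raise ValueError(f"{stage.value} is the final stage")
    return NEXT_STAGE[stage]


def surface_for(stage: Stage, question_kind: str) -> Optional[str]:
    """Return the surface to open, or None to stay on the voice-only loop."""
    if stage is not Stage.INTERVIEW:
        return None
    return SURFACE_FOR_QUESTION.get(question_kind)
```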


What we learned building it

The model is powerful, but the product is in the orchestration: low-latency streaming, interruption handling, clean tool flows, auditable grounding, and a persistence trail that turns “a conversation” into “an evaluable system.”

OmniLoop is our bet that the future of interview prep (and eventually evaluation) is not a smarter chat box — it’s a live, multimodal loop with evidence.

Built With

  • canvas-api
  • cloud-sql-(postgres)
  • docker
  • docker-compose
  • excalidraw
  • fastapi
  • google-cloud-run
  • google-cloud-storage-(gcs)
  • google-cloud-vertex-ai-(gemini-live-api)
  • google-genai-sdk
  • javascript
  • monaco-editor
  • psycopg
  • pyodide
  • python
  • react
  • react-router
  • redis-streams
  • reportlab
  • smtp
  • sqlalchemy
  • uvicorn
  • vite
  • web-audio-api
  • webrtc
  • websockets