ClawBerries — Your Hiring Detective

Inspiration

HR managers, especially in the local tech scene, "live in Telegram." Yet when it comes to candidate screening, they are forced to log into clunky Applicant Tracking Systems (ATS) and manually cross-reference claims by digging through LinkedIn, GitHub, and scattered portfolios. On top of that, resumes are frequently exaggerated and wildly inconsistent in formatting. We wanted to build a "Hiring Detective" — an autonomous AI researcher that lives exactly where HR managers already communicate, instantly cross-checking a candidate's CV against their actual open-web footprint.

What it does

ClawBerries is an on-demand candidate research agent. An HR manager can submit a candidate's CV via a Google Form intake, interact with the system through Telegram via OpenClaw, or use the web dashboard for a more visual, trackable experience. Within minutes, the system:

  1. Parses the CV using Gemini Vision — extracting structured identity, work history, education, skills, publications, and awards from even the most complex PDF layouts.
  2. Dispatches parallel autonomous agents (TinyFish) to investigate the candidate's LinkedIn, GitHub, portfolio, employer registries, and the open web.
  3. Cross-references every claim against retrieved web evidence using LLM synthesis with structured output, detecting inconsistencies (e.g., "CV says Tech Lead, LinkedIn shows Senior Engineer").
  4. Delivers a comprehensive candidate brief — CV validity score, verified claims, gap analysis, and word-for-word interview questions — back to HR via Telegram or the web dashboard.
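The cross-referencing step (3) can be sketched as a pure comparison over extracted CV claims and retrieved web evidence. The type shapes and the `crossReference` helper below are illustrative stand-ins, not the actual ClawBerries types:

```typescript
// Illustrative shapes; the real pipeline compares richer structured fields.
interface CvClaim { field: string; value: string; }
interface WebEvidence { field: string; value: string; source: string; }

interface ClaimCheck {
  field: string;
  cvValue: string;
  webValue?: string;
  status: "verified" | "inconsistent" | "unverified";
}

function crossReference(claims: CvClaim[], evidence: WebEvidence[]): ClaimCheck[] {
  return claims.map((claim) => {
    const match = evidence.find((e) => e.field === claim.field);
    // No web evidence found for this claim at all.
    if (!match) return { field: claim.field, cvValue: claim.value, status: "unverified" };
    const agrees = match.value.toLowerCase() === claim.value.toLowerCase();
    return {
      field: claim.field,
      cvValue: claim.value,
      webValue: match.value,
      status: agrees ? "verified" : "inconsistent",
    };
  });
}

// e.g. CV says "Tech Lead" but LinkedIn shows "Senior Engineer"
const checks = crossReference(
  [{ field: "title", value: "Tech Lead" }],
  [{ field: "title", value: "Senior Engineer", source: "linkedin" }],
);
console.log(checks[0].status); // "inconsistent"
```

In the real system an LLM performs this comparison with fuzzy matching rather than string equality, but the contract (claim in, per-claim status out) is the same.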

How we built it

We architected a modular, asynchronous pipeline using a modern TypeScript stack:

  • Interface: Google Forms for applicant intake, OpenClaw + Telegram for HR interaction, and a React/Vite web application for visual candidate management. OpenClaw orchestrates the verification flow via a custom skill, running CLI commands asynchronously.
  • CV Parsing: Google Gemini's multimodal Vision API handles complex OCR and structured JSON extraction, converting unstructured PDFs into machine-readable PdfOcrResult objects with identity, work history, skills, publications, and awards.
  • Research Agents: TinyFish browser automation agents run in parallel via SSE streaming, with a concurrency limiter matching TinyFish's 2-slot limit. Agents with direct URLs (LinkedIn, GitHub, portfolio) navigate directly; agents without URLs start at Google and reason about what to search.
  • LLM Synthesis: Gemini (with full OpenAI support) produces structured candidate briefs via JSON Schema enforcement (responseJsonSchema for Gemini, response_format.json_schema with strict mode for OpenAI). The output includes CV validity scoring, per-claim verification, gap analysis, and prioritized interview must-confirm items.
  • Orchestration: Node.js with Redis (ioredis) manages task dispatch, progress tracking, and inter-step coordination. The pipeline runs asynchronously — the CLI returns immediately and HR can check status at any time.
  • Data Layer: PostgreSQL with Drizzle ORM for type-safe operations, storing candidate profiles, agent results, synthesized briefs, and audit logs.
  • OpenClaw + ByteRover: OpenClaw serves as the Telegram-based AI assistant for HR. The ByteRover plugin (openclaw plugins install @byterover/byterover) gives OpenClaw persistent memory via a knowledge management system, storing project patterns and decisions in .brv/context-tree/ as version-controllable Markdown files. Core commands: brv query to retrieve context, brv curate to store knowledge. No API key needed for basic operations.
  • Webhook Pipeline: Google Forms → Apps Script → Cloudflare tunnel → ClawBerries webhook server → Telegram notification with inline buttons → OpenClaw handles button callbacks via the ClawBerries skill.
  • Tooling: pnpm, Vitest for testing, Biome for linting/formatting, Docker Compose for local Postgres + Redis.
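The concurrency limiter that keeps agent dispatch within TinyFish's 2-slot limit can be sketched in a few lines; `createLimiter` is an illustrative helper, not the project's actual implementation:

```typescript
// Minimal promise-based concurrency limiter (sketch).
function createLimiter(maxConcurrent: number) {
  let active = 0;
  const queue: Array<() => void> = [];
  const release = () => {
    active--;
    queue.shift()?.(); // wake exactly one waiter, if any
  };
  return async function limit<T>(task: () => Promise<T>): Promise<T> {
    if (active >= maxConcurrent) {
      // Park until a running task finishes and releases a slot.
      await new Promise<void>((resolve) => queue.push(resolve));
    }
    active++;
    try {
      return await task();
    } finally {
      release();
    }
  };
}

const limit = createLimiter(2); // TinyFish allows 2 concurrent runs
// Usage sketch: const results = await Promise.all(agents.map((a) => limit(() => runAgent(a))));
```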

Challenges we ran into

  • The CV Parsing Nightmare: Resumes have zero standardization. Traditional rule-based parsers consistently broke on multi-column layouts, graphics, or image-based PDFs. We pivoted to using Gemini's Vision API to simultaneously "see" the layout and extract entities accurately — including Vietnamese names with diacritics.
  • TinyFish Concurrency & Timeouts: TinyFish browser agents have a 2-concurrent-run limit, and each run takes 40-170 seconds for complex goals. We initially dispatched all agents simultaneously, causing mass timeouts. We solved this with a concurrency limiter and realistic per-agent timeouts calibrated from observed execution times.
  • Search Engine Fallbacks: Early iterations constructed Google search URLs for agents without direct URLs. This produced poor results — overly specific queries returned empty pages, and the LLM fabricated placeholder URLs (example.com). We solved this by filtering out search engine URLs and instead sending agents to google.com with context-rich goals, letting TinyFish reason about what to search.
  • Structured LLM Output: Getting consistent, parseable JSON from LLMs required moving beyond prompt instructions to schema enforcement — Gemini's responseJsonSchema and OpenAI's strict mode json_schema. This eliminated malformed responses and guaranteed the CandidateBrief type contract.
  • Dual-Interface Consistency: The Telegram bot, OpenClaw skill, CLI, and web application all had to read the same database state without race conditions. The async pipeline design (immediate return + status polling) was key to making this work.
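As a sketch of the schema enforcement described above, the same JSON Schema can drive both providers. The field names follow the Gemini and OpenAI request formats, but the brief schema here is a simplified stand-in for the real CandidateBrief contract:

```typescript
// Simplified stand-in schema; the real CandidateBrief has richer fields.
const briefSchema = {
  type: "object",
  properties: {
    validityScore: { type: "integer", minimum: 0, maximum: 100 },
    verifiedClaims: { type: "array", items: { type: "string" } },
    gaps: { type: "array", items: { type: "string" } },
    interviewQuestions: { type: "array", items: { type: "string" } },
  },
  required: ["validityScore", "verifiedClaims", "gaps", "interviewQuestions"],
  additionalProperties: false, // required by OpenAI strict mode
};

// Gemini: schema goes in generationConfig.
const geminiConfig = {
  responseMimeType: "application/json",
  responseJsonSchema: briefSchema,
};

// OpenAI: same schema via response_format with strict mode.
const openaiFormat = {
  type: "json_schema",
  json_schema: { name: "candidate_brief", strict: true, schema: briefSchema },
};
```

With strict mode, the model output is guaranteed to parse against the schema, so the pipeline can cast `JSON.parse(response)` to the brief type without defensive repair logic.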

Accomplishments that we're proud of

  • Flexible Multi-Channel UX: Google Form intake → Telegram notification → OpenClaw orchestration → Web dashboard review. HR can interact through whichever channel fits their workflow.
  • Resilient Multimodal Extraction: Successfully processes complex, multi-column PDF resume layouts with Vietnamese diacritics without breaking.
  • Structured Verification Reports: The synthesis step produces actionable, structured briefs with CV validity scores (0-100), per-claim verification status, severity-rated inconsistencies, gap analysis, and word-for-word interview questions — not just a text summary.
  • Global CLI: clawberries run, status, report, cancel, serve, install-skill — works from anywhere on the machine after pnpm link --global.
  • OpenClaw Skill Integration: OpenClaw handles the full verification flow asynchronously via the ClawBerries skill — start the pipeline, check progress, deliver reports — all through natural Telegram conversation with HR.

What we learned

  • Multi-modal LLMs are vastly superior to traditional text-extraction libraries when dealing with human-designed, unstructured documents like resumes.
  • "Meeting users where they already are" (Telegram) is the most effective way to drive adoption for enterprise AI tools — but providing a web interface is crucial for scalability, collaboration, and visibility.
  • Browser automation agents (TinyFish) are powerful for web research but require careful concurrency management and generous timeouts — simple tasks take 30-60 seconds, complex multi-step goals take 2-4 minutes.
  • Schema-enforced structured output (JSON Schema) is essential for reliable LLM integration — prompt instructions alone are insufficient for production systems.
  • Reliably coordinating parallel AI agents requires rigorous data schemas, robust failure handling, and a clear async pattern (fire-and-forget + status polling) to prevent timeouts from bottlenecking the entire system.
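The fire-and-forget + status-polling pattern from the last point can be sketched as follows, using an in-memory Map in place of Redis; the function names and status shape are illustrative:

```typescript
import { randomUUID } from "node:crypto";

type Status = "queued" | "running" | "done" | "failed";
// Stand-in for the Redis progress store.
const store = new Map<string, { status: Status; progress: number }>();

function startPipeline(run: (update: (p: number) => void) => Promise<void>): string {
  const id = randomUUID();
  store.set(id, { status: "running", progress: 0 });
  // Fire and forget: the caller gets the id immediately and polls later.
  run((p) => store.set(id, { status: "running", progress: p }))
    .then(() => store.set(id, { status: "done", progress: 100 }))
    .catch(() => store.set(id, { status: "failed", progress: 0 }));
  return id;
}

function checkStatus(id: string) {
  return store.get(id) ?? { status: "failed" as Status, progress: 0 };
}
```

Because `startPipeline` never awaits the pipeline itself, slow browser agents can run for minutes without blocking the Telegram bot or the CLI, which simply poll `checkStatus`.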

What's next for ClawBerries

  • Deep Tech Screening: Upgrading the GitHub agent to analyze actual code quality, commit depth, and contribution patterns — not just repository names and star counts.
  • ATS Integrations: Automatically pushing the verified candidate brief and parsed JSON into platforms like Greenhouse or Workable post-delivery.
  • Automated Candidate Outreach: Evolving the system to automatically send calendar invites or conduct preliminary pre-screen interactions via Telegram before human intervention.
  • Collaborative Web Features: Expanding the web dashboard into a collaborative workspace where hiring teams can review, comment, and track candidate evaluations together.
  • Production Infrastructure: Named Cloudflare tunnels (or deployed server), background job scheduling for data lifecycle management, and SLA monitoring.

Implementation Status

| Step | Description | Status |
| ---- | ----------- | ------ |
| 3 | CV parsing via Gemini Vision OCR | ✅ Done |
| 4 | Agent planning & dispatch | ✅ Done |
| 5 | Parallel research execution (TinyFish) | ✅ Done |
| 6 | Live progress reporting (Telegram) | ✅ Done |
| 7 | LLM synthesis with structured output | ✅ Done |
| 8 | Brief assembly & formatting | ✅ Done |
| 9 | Delivery (webhook + CLI + OpenClaw skill) | ✅ Done |
| 10 | Post-delivery actions (Deep Dive, ATS) | ⬜ Future |
| 11 | Data lifecycle & background ops | ⬜ Future |

Tech Stack

| Layer | Technology |
| ----- | ---------- |
| Runtime | Node.js (ESM) + TypeScript |
| CV OCR | Google Gemini (multimodal Vision API) |
| Browser Automation | TinyFish SSE endpoint |
| LLM Synthesis | Gemini + OpenAI (structured output via JSON Schema) |
| Telegram Interface | OpenClaw + ByteRover plugin |
| Web Interface | React + Vite + Tailwind |
| Database | PostgreSQL + Drizzle ORM |
| Cache / Queue | Redis + ioredis |
| Webhook | Node HTTP server + Cloudflare tunnel |
| Validation | Zod v4 |
| Build | tsup, tsx |
| Lint | Biome |
| Tests | Vitest (unit + integration) |

Built With

  • agentic-workflow
  • ai
  • byterover
  • codex
  • hiring
  • large-language-models
  • openai
  • tinyfish