BeeBob - AI co-pilot for research operations

Inspiration

This project was inspired by my experience on a design team that collaborated closely with user researchers. One of the biggest bottlenecks was user recruitment: a repetitive, manual process spread across multiple tools such as Typeform, SurveyMonkey, Calendly, and email.

A recurring issue was that if participants didn’t meet the required criteria, research results were compromised, especially in time-intensive moderated interviews. As a result, researchers became more selective, and recruitment slowed down even more. This operational burden consumed up to half their time, preventing them from focusing on what truly matters: generating insights, identifying opportunities, and guiding product direction.

I saw a clear opportunity—automate the non-strategic parts of research ops so teams can focus on impact, not logistics.

What it does

BeeBob is a full-stack AI co-pilot for research operations. It covers:

AI-assisted screener survey generation
Participant criteria and scoring setup
Quality check logic powered by AI
Dashboard tracking of study status and progress

With BeeBob, teams can launch studies faster, recruit higher-quality participants, and spend more time delivering insights.
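
To make "participant criteria and scoring" concrete, here is a minimal sketch of how weighted criteria could qualify a participant. It is an illustration under assumptions, not BeeBob's actual schema: the Criterion fields, the score_participant function, and the example questions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One screener criterion (hypothetical structure, not BeeBob's schema)."""
    question: str            # key of the screener question
    accepted: frozenset      # answers that satisfy the criterion
    weight: int = 1          # points awarded on a match
    required: bool = False   # a miss on a required criterion disqualifies


def score_participant(answers: dict, criteria: list, min_score: int = 0) -> tuple:
    """Return (qualified, score) for one participant's screener answers."""
    score = 0
    for c in criteria:
        if answers.get(c.question) in c.accepted:
            score += c.weight
        elif c.required:
            # Hard disqualifier: stop early and report the score so far.
            return False, score
    return score >= min_score, score


# Example criteria (invented for illustration).
criteria = [
    Criterion("role", frozenset({"designer", "researcher"}), weight=2, required=True),
    Criterion("usage", frozenset({"daily", "weekly"}), weight=1),
]

print(score_participant({"role": "designer", "usage": "daily"}, criteria))  # (True, 3)
print(score_participant({"role": "engineer", "usage": "daily"}, criteria))  # (False, 0)
```

Separating required criteria from weighted ones keeps hard disqualifiers apart from "nice to have" signals, which matters when a poor match would compromise a moderated interview.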

How we built it

We started by mapping the typical user research recruitment flow—from planning to screening to scheduling—and identifying the friction points. Then we built:

An AI generation pipeline for screener surveys and quality check questions.
A simple frontend using Bolt for rapid prototyping.
Internal PRDs and structured prompts to guide UI generation.
An early backend to simulate database behavior and state transitions across steps.

We focused on creating a step-by-step workflow that mirrors how researchers think—while letting AI do the heavy lifting.
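
As a rough illustration of simulating database behavior and state transitions across steps, the sketch below uses an in-memory store and a fixed forward-only transition table. The step names follow the planning, screening, and scheduling flow described above; the StudyStore class, the added "done" state, and the method names are assumptions made for this example, not the project's backend.

```python
from enum import Enum


class Step(Enum):
    PLANNING = "planning"
    SCREENING = "screening"
    SCHEDULING = "scheduling"
    DONE = "done"  # terminal state added for illustration


# Allowed forward transitions between workflow steps.
NEXT_STEP = {
    Step.PLANNING: Step.SCREENING,
    Step.SCREENING: Step.SCHEDULING,
    Step.SCHEDULING: Step.DONE,
}


class StudyStore:
    """In-memory stand-in for a database table of studies."""

    def __init__(self):
        self._studies = {}

    def create(self, study_id: str) -> None:
        # Every new study starts at the planning step.
        self._studies[study_id] = Step.PLANNING

    def status(self, study_id: str) -> Step:
        return self._studies[study_id]

    def advance(self, study_id: str) -> Step:
        # Move the study to its next step, or fail if it is already finished.
        current = self._studies[study_id]
        if current not in NEXT_STEP:
            raise ValueError(f"study {study_id!r} is already finished")
        self._studies[study_id] = NEXT_STEP[current]
        return self._studies[study_id]


store = StudyStore()
store.create("study-1")
store.advance("study-1")
print(store.status("study-1").value)  # screening
```

Keeping the transitions in one table makes it easy for a dashboard to show where each study stands, and to swap the dictionary for a real database later.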

Challenges we ran into

AI limitations: While powerful, AI improvises. It often generates different outputs for the same prompt. We had to get very specific—down to page names and component locations—to get consistent results.
Prompt iteration fatigue: More complex layouts required tighter, more structured prompts, which took time to perfect.
Tooling constraints: While Bolt made prototyping fast, we hit limitations when trying to connect state across pages or simulate backend data flow.
Credit usage: AI cost management became a real issue as token usage ramped up quickly during testing.
Agentic system design: Designing an effective agentic architecture is both complex and critical, and we spent significant time iterating on the agent configuration. In parallel, we recognized the importance of a robust Retrieval-Augmented Generation (RAG) system. Ultimately, we adopted a hybrid RAG approach that combines graph-based and traditional RAG methods, laying the foundation for delivering causal, relation-driven insights to users in the future.
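
The write-up does not say how results from the graph-based and traditional retrievers are combined. One common option is reciprocal rank fusion (RRF), sketched below purely as an assumption: the function name, the document IDs, and the constant k=60 (a conventional default for RRF) are illustrative and not taken from BeeBob's implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores the sum of 1 / (k + rank) over the lists it appears in,
    with ranks starting at 1. Ties are broken by document ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector search and a graph traversal.
vector_hits = ["doc_a", "doc_b", "doc_c"]
graph_hits = ["doc_b", "doc_d"]

print(reciprocal_rank_fusion([vector_hits, graph_hits]))
# ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

RRF only needs ranks, not comparable similarity scores, which is convenient when one retriever returns embedding distances and the other returns graph paths.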

Accomplishments that we're proud of

Built a working end-to-end flow for screener + participant qualification in just a few weeks.
Created a scalable AI prompt system tied to real UX workflows.
Designed a modular setup that can expand into other parts of the research ops stack.
Brought clarity to a messy process and surfaced it in a clean, intuitive interface.
Real-world use case as a startup: We approached this hackathon not as a demo, but as a launchpad for a real-world product with business potential. Over five in-depth user interview sessions, we validated the problem space and iteratively refined our solution, treating the process as we would at a real startup.

What we learned

AI is best when scoped: You can’t “AI everything” at once. Start small, validate, then scale.
Structure is everything: Writing clear PRDs and interface requirements dramatically improves AI consistency.
Fast doesn’t mean easy: Tools like Bolt are great for speed, but real functionality still needs backend support.
Research ops is deeply underserved: There’s a real gap between what tools offer and what researchers need. Automating the right layers creates massive leverage.
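
The point about structure can be illustrated with a small prompt builder that always spells out the page name and each component's location, the level of detail described under the challenges above. This is a sketch under assumptions: PageSpec, ComponentSpec, build_prompt, and the example page are hypothetical and do not reproduce the prompts or PRDs used in the project.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentSpec:
    name: str       # e.g. "Participant table"
    location: str   # where the component sits on the page
    behavior: str   # what it should do


@dataclass
class PageSpec:
    page_name: str
    goal: str
    components: list = field(default_factory=list)


def build_prompt(spec: PageSpec) -> str:
    """Render a page spec as a fixed-layout prompt for a UI generator."""
    lines = [f"Page: {spec.page_name}", f"Goal: {spec.goal}", "Components:"]
    for i, c in enumerate(spec.components, start=1):
        lines.append(f"{i}. {c.name} | location: {c.location} | behavior: {c.behavior}")
    lines.append("Constraints: do not add, rename, or move components not listed above.")
    return "\n".join(lines)


# Example page (invented for illustration).
spec = PageSpec(
    page_name="Study dashboard",
    goal="Show the status and progress of each study",
    components=[
        ComponentSpec("Status filter", "top right", "filter studies by status"),
        ComponentSpec("Study table", "main content area", "list studies with progress"),
    ],
)
print(build_prompt(spec))
```

Generating the prompt from a spec keeps the wording identical between runs, so any remaining variation comes from the model rather than from the prompt.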

What's next for BeeBob – Full-stack AI co-pilot for research operations

Next, we plan to:

Add calendar and interview scheduling automation
Expand participant database integration
Build a more robust AI interview quality scoring system
Offer research insight summaries using AI

Ultimately, BeeBob aims to become the research operations autopilot—reducing friction, increasing quality, and letting researchers focus on strategy, not logistics.

Built With

autogen
elevenlabs (voice interface)
fastapi (with langchain, mcp, and websocket, for different kinds of agents)
google-adk (agent-to-agent communication, with redis as message pub/sub)
graphiti (by getzep.com, for building episodes for the graph)
langchain
langgraph (orchestration, with langchain)
mcp
neo4j (graph database)
redis
supabase
websocket

Updates

Tina Huang started this project — Jun 30, 2025 06:13 PM EDT
