Inspiration

I'll be honest — I don't come from a testing background. I'm a developer who's been working on RAG pipelines and AI projects, and I've been actively using AI-powered tools in my workflow. I was on VS Code for a long time before switching to an AI-powered IDE, and the experience of having AI understand my code and help me build faster completely changed how I work.

When I heard about this hackathon, I had no idea what to build. I was stuck for days.

Then one evening, I was talking to a friend who was manually testing an API. He was writing requests by hand, checking responses one by one, and he missed a few critical edge cases. Out of curiosity, I described those same API endpoints to an AI chat — and it instantly generated test cases that covered not just the happy paths, but edge cases my friend had completely overlooked. I ran them with curl and they worked. That was my first "aha" moment.

A few days later, I was talking to a colleague who does API testing for his team. He was also using AI — but differently. He'd script the AI to generate test code, run it from the terminal, and parse the results manually. No UI, no visual feedback, just raw scripts. When he needed to do quick one-off tests, he'd switch to Postman. But he told me something that stuck: "Postman is great, but it has zero AI. And my AI scripts have zero UI. I wish there was one tool that had both."

That was the second "aha" moment. I thought — what if I took the best of both worlds?

Then I tried something else. I took an OpenAPI spec from one of our projects and gave it to an LLM. It generated test scripts that understood the exact endpoints, the request schemas, the required fields — everything. My colleague tried it and was genuinely happy. He said: "I don't have to manually fill in the JSON body anymore. This actually understands the API."

That's when I knew what to build. Not just another Postman clone, but an API testing tool where AI is a first-class citizen — where you can talk to it in plain English, where it reads your OpenAPI spec and generates tests that actually make sense, where it tells you why a test failed instead of just showing you a red status code.

That's how TestOrbit was born.

What it does

TestOrbit is a full-featured API testing platform with deep AI integration powered by Google Gemini:

✨ Key Features

  • Natural Language Requests — Describe API calls in plain English. Gemini creates the request with the right method, URL, headers, and body.
  • AI Test Generation — Paste your OpenAPI spec, and Gemini generates comprehensive test suites (happy paths, edge cases, security checks).
  • Instant Mock Server — Send any request to /mock-server/* and Gemini invents realistic JSON responses on the fly. No configuration, no JSON files, no setup.
  • One-Click Failure Diagnosis — When a test fails, Gemini analyzes the root cause and suggests a fix.
  • Agent Integration (MCP) — Exposes testing tools via the Model Context Protocol so AI agents like Claude, Cursor, and Windsurf can run tests programmatically.
  • Visual Dashboard — View all active mocks and request logs, and manage endpoints from one place.
  • Interactive Tutorials — In-app feature cards that perform real UI actions to show users exactly how features work.
  • Multi-Language Code Gen — Convert any request into production code (Python, JS, Go, Java, cURL).

Professional Testing Features

  • Batch Collection Runner — Run entire test suites sequentially or in parallel with progress tracking, variable chaining via pre/post scripts, and per-request assertion evaluation.
  • Pre/Post Scripts — Write JavaScript that runs before or after each request, with a full Postman-compatible pm.* API for variables, environment access, and test assertions (see the example script after this list).
  • Response Assertions — Visual assertion builder for status codes, response times, JSON paths, body content, and headers. No code needed — just point, click, and validate.
  • WebSocket Testing — Connect to WebSocket servers, send/receive messages, filter by direction, and view JSON-formatted message logs with timestamps.
  • Multi-Format Import — Already have collections? Import from Postman, Insomnia, OpenAPI/Swagger, Thunder Client, or raw cURL commands. Format is auto-detected.
  • Environment Variables — Global and custom environments with {{variable}} substitution. Switch between dev, staging, and production with one click.
  • Dark Mode — Full dark theme across every component, persisted across sessions.
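
To make the scripting model concrete, here's the kind of post-response script the runner accepts. It's an illustrative example rather than code from the app, and it assumes the Postman-style pm.* semantics described above, including an environment write that later requests pick up through {{variable}} substitution:

```js
// Post-response script: assert on the response, then chain a value forward.
const data = pm.response.json();

pm.test("status is 201", function () {
  pm.expect(pm.response.code).to.eql(201);
});

pm.test("response contains an id", function () {
  pm.expect(data.id).to.not.eql(undefined);
});

// Later requests can reference this as {{createdUserId}} in a URL or body.
pm.environment.set("createdUserId", data.id);
```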

How I built it

Frontend: React 19 + Vite 7 + Tailwind CSS v4 + Zustand for state management. The UI is a three-panel layout (Sidebar, Editor, Intelligence Panel) inspired by professional IDEs. State is persisted to localStorage with versioned migrations so no data is lost between sessions.
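
As a sketch of how that persistence looks (the store shape and key names here are hypothetical), Zustand's persist middleware takes a version number plus a migrate function that upgrades old localStorage snapshots in place:

```js
import { create } from "zustand";
import { persist } from "zustand/middleware";

export const useStore = create(
  persist(
    (set) => ({
      collections: [],
      addCollection: (c) => set((s) => ({ collections: [...s.collections, c] })),
    }),
    {
      name: "testorbit-store", // localStorage key (illustrative)
      version: 6,
      migrate: (state, fromVersion) => {
        // Additive migrations: each step only fills in missing fields,
        // so nothing the user saved is dropped on upgrade.
        if (fromVersion < 6) {
          state.collections = (state.collections ?? []).map((col) => ({
            ...col,
            requests: (col.requests ?? []).map((r) => ({ assertions: [], ...r })),
          }));
        }
        return state;
      },
    }
  )
);
```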

Backend: Python FastAPI server that wraps the Google Gemini API (gemini-3-flash-preview model). Endpoints handle test generation, batch execution, natural language processing, code generation, failure diagnosis, mock response generation, and CORS proxying.

AI Mock Server: A dedicated route handler that intercepts any request to /mock-server/*, extracts the path, method, and body, sends them to Gemini with a prompt to "invent a realistic API response", and returns the generated JSON. No database, no configuration — pure AI inference.
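
The real handler is Python/FastAPI; purely to illustrate the flow, here is the same idea sketched in Node/Express against Gemini's REST generateContent endpoint (the endpoint shape is real, the prompt wording and server details are placeholders):

```js
import express from "express"; // requires Node 18+ for global fetch

const app = express();
const GEMINI_URL =
  "https://generativelanguage.googleapis.com/v1beta/models/" +
  "gemini-3-flash-preview:generateContent?key=" + process.env.GEMINI_API_KEY;

// Catch everything mounted under /mock-server and let the model invent a reply.
app.use("/mock-server", express.json(), async (req, res) => {
  const prompt =
    "Invent a realistic JSON response for this API call.\n" +
    `${req.method} ${req.path}\n` +
    `Request body: ${JSON.stringify(req.body ?? null)}\n` +
    "Return only raw JSON, with no markdown fences.";

  const r = await fetch(GEMINI_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
  });
  const data = await r.json();
  const text = (data.candidates?.[0]?.content?.parts?.[0]?.text ?? "{}")
    .replace(/^```(?:json)?\s*|\s*```$/g, ""); // strip fences the model may add
  res.type("application/json").send(text);
});

app.listen(8000);
```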

MCP Integration: Implemented the Model Context Protocol to expose testing tools as structured functions that external AI agents can call. This turns TestOrbit into a "testing API" that other agents can orchestrate.
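
To give a flavor of what that looks like, here's a minimal tool registration sketch assuming the @modelcontextprotocol/sdk TypeScript package; the tool name, schema, and behavior are made up for illustration, and the real TestOrbit tool set differs:

```js
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "testorbit", version: "0.1.0" });

// Hypothetical tool: let an agent fire a request and read back the result.
server.tool(
  "run_request",
  { method: z.enum(["GET", "POST", "PUT", "DELETE"]), url: z.string().url() },
  async ({ method, url }) => {
    const res = await fetch(url, { method });
    return {
      content: [{ type: "text", text: `${res.status} ${await res.text()}` }],
    };
  }
);

// stdio is the transport agent hosts like Claude Desktop typically spawn.
await server.connect(new StdioServerTransport());
```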

Key Technical Decisions:

  • Sandboxed script execution using new Function() with a controlled pm.* API surface — power without risk (sketched after this list)
  • Zustand persist middleware with additive version migrations (v1 through v6) so the app can evolve without losing user data
  • CORS proxy endpoint so the frontend can test any API without browser restrictions
  • Streaming text effect so AI responses feel like a natural conversation (a minimal hook version follows this list)
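
The sandboxing idea, heavily simplified (the real pm surface is much larger, and the names here are illustrative): new Function() compiles the user's code outside the surrounding lexical scope, so the script can only touch what the runner explicitly hands it.

```js
// Run a user script against an explicit pm surface.
function runUserScript(code, response, env) {
  const results = [];
  const pm = {
    response: {
      code: response.status,
      json: () => response.body,
    },
    environment: {
      get: (k) => env[k],
      set: (k, v) => { env[k] = v; },
    },
    test: (name, fn) => {
      try { fn(); results.push({ name, passed: true }); }
      catch (e) { results.push({ name, passed: false, error: e.message }); }
    },
  };

  const script = new Function("pm", `"use strict";\n${code}`);
  script(pm);
  return results;
}

// Usage:
// runUserScript(
//   'pm.test("ok", () => { if (pm.response.code !== 200) throw new Error("not 200"); })',
//   { status: 200, body: {} },
//   {}
// );
```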
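
And the streaming effect is essentially a typewriter reveal; a minimal React hook version might look like this (hypothetical, the app's actual implementation may differ):

```js
import { useEffect, useState } from "react";

// Reveal fullText a few characters per tick so AI answers feel streamed.
export function useTypewriter(fullText, charsPerTick = 3, tickMs = 16) {
  const [visible, setVisible] = useState("");

  useEffect(() => {
    setVisible("");
    let i = 0;
    const id = setInterval(() => {
      i = Math.min(i + charsPerTick, fullText.length);
      setVisible(fullText.slice(0, i));
      if (i >= fullText.length) clearInterval(id);
    }, tickMs);
    return () => clearInterval(id);
  }, [fullText, charsPerTick, tickMs]);

  return visible;
}
```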

Challenges I faced

  • I'm not a testing expert — I had to learn how tools like Postman actually work under the hood to build a competitive alternative. Understanding the pm.* scripting API, assertion patterns, and collection runner logic was a steep learning curve.
  • AI prompt engineering — Getting Gemini to return exactly the right number of tests (one per user question, no more, no less) required very specific prompt constraints. Generic prompts either gave me 20 tests when I asked for 3, or missed the point entirely. Small wording changes made huge differences (an illustrative prompt scaffold follows this list).
  • Mock Server believability — Making AI-generated mock responses feel realistic required careful prompt design. I had to teach Gemini to infer field names from URL paths, generate appropriate IDs, timestamps, and nested structures, and maintain consistency across requests.
  • Zustand persistence migrations — Adding new fields to deeply nested objects (requests inside folders inside collections) required recursive migration helpers. One missed field meant data loss on upgrade, and I hit this bug more than once.
  • MCP protocol implementation — Exposing tools via the Model Context Protocol required understanding the exact schema format that agents like Claude expect. Getting the function signatures and response formats right took careful debugging.
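
For a flavor of the constraints that eventually worked, here is an illustrative prompt scaffold (not the production prompt): pinning both the count and the output shape is what made responses parseable and exact.

```js
// Pin the test count and the JSON shape so the model can't drift.
function buildTestGenPrompt(openApiSpec, count) {
  return [
    "You are generating API test cases from this OpenAPI spec:",
    openApiSpec,
    `Return EXACTLY ${count} test cases, no more and no fewer.`,
    "Output a JSON array where each item is:",
    '{ "name": string, "method": string, "path": string, "body": object | null, "expectedStatus": number }',
    "Return only the JSON array. No prose, no markdown fences.",
  ].join("\n\n");
}
```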

What I learned

  • AI as a feature, not a gimmick — The AI features work because they have real context (OpenAPI specs, conversation history, response data). Just slapping "Ask AI" on a button doesn't make a product useful. The context you feed the model is everything.
  • Real users shape real products — Every major feature came from watching my friend and colleague actually test APIs. Their pain points became my feature list. My friend missing edge cases became AI test generation. My colleague's "no UI" scripts became the visual collection runner.
  • Prompt design is product design — Small wording changes in prompts dramatically affect output quality. I spent as much time tuning prompts as writing React components.
  • Building a Postman-compatible scripting API taught me how much complexity lives behind seemingly simple developer tools. There's a reason Postman is a billion-dollar company.

What's next

  • OAuth 2.0 flow support — Automated token refresh and authorization code flows
  • GraphQL support — Dedicated query editor with schema introspection
  • Team collaboration — Share collections and environments across team members
  • Mock persistence — Save generated mocks to a file so they persist across restarts
  • Load testing — Parallel execution with configurable concurrency for performance testing
  • Hosted version — Deploy as a web app so anyone can use it without local setup

Built With

React, Vite, Tailwind CSS, Zustand, JavaScript, Python, FastAPI, Google Gemini, Model Context Protocol, WebSockets