- MCP security analysis: quickly check whether an MCP server attempts tool poisoning or prompt injection via tools, resources, or prompts
- LLM Playground: test your MCP server with an LLM (you can set up your model and API key; for now, only Anthropic models are supported)
- LLM Playground (dark mode)
- LLM Playground: your LLM can use the tools of the MCP server you connected to the inspector
- Tools Evaluation: generate test cases to evaluate how prone your tools are to confusion (when tool descriptions are too similar)
- Tools Evaluation: Grizzly tells you why your tests failed to help you improve your MCP servers
- Tools Evaluation: check for model provider consumption problems and MCP specification completeness
Inspiration
While contributing to MCP and developing our own servers, we found Inspector useful but limited. Grizzly adds the features we wish we had.
Who it's for
- Anyone who needs to verify an MCP server's safety and capabilities before using it
- MCP server developers looking for a complete debugging toolset
- MCP client developers looking to benchmark a group of MCP servers for their use-case
Try it out: npx @alpic-ai/grizzly
What it does
Grizzly evaluates MCP servers across three critical dimensions:
- Stability: Tests protocol implementation accuracy
- Performance: Detects potential tool confusion and ensures tools are well understood by the LLM
- Security: Identifies prompt injection, tool poisoning, and other security-related issues
Grizzly also includes a chat playground, so you can test how your server behaves with your chosen LLM directly from the web app.
How we built it
We built Grizzly on top of the existing MCP Inspector (https://github.com/modelcontextprotocol/inspector). The original Inspector is clear about its intention to remain LLM-agnostic and focused on the protocol itself. The added features require an LLM provider to:
- evaluate tool, resource, and prompt descriptions and look for security issues
- generate test cases
Challenges we ran into
Our initial strategy for detecting tool confusion was to compute the distances between tool description embeddings and compare them against a threshold. However, Anthropic lacks an embeddings API, and we didn't want to ask users to provide two API keys to use our tool. We changed strategy and had Sonnet build tool test cases instead.
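For illustration, the embedding-distance idea we started with could look roughly like the sketch below. This is not code from Grizzly: the `EmbeddedTool` shape, the function names, and the threshold value are all assumptions, and the embeddings are expected to be precomputed by whatever provider you have access to.

```typescript
// Hypothetical shape: a tool name paired with a precomputed description embedding.
interface EmbeddedTool {
  name: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embeddings must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return every pair of tools whose similarity meets or exceeds the threshold.
function findConfusablePairs(
  tools: EmbeddedTool[],
  threshold: number
): [string, string][] {
  const pairs: [string, string][] = [];
  for (let i = 0; i < tools.length; i++) {
    for (let j = i + 1; j < tools.length; j++) {
      const similarity = cosineSimilarity(tools[i].embedding, tools[j].embedding);
      if (similarity >= threshold) {
        pairs.push([tools[i].name, tools[j].name]);
      }
    }
  }
  return pairs;
}
```

A pair flagged this way only signals that two descriptions are close in embedding space. It does not prove that an LLM would actually pick the wrong tool, which is one more reason the test-case approach ended up being a better fit.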
Accomplishments that we're proud of
The look and feel of this renewed inspector!
What we learned
- Anthropic has no embeddings API
- A one-day hackathon is short, but leaning heavily on vibe-coding is the go-to strategy!
What's next for Grizzly
- Support for additional LLM providers beyond Anthropic
- Add support for resources and prompts in the LLM playground in addition to tools
- Export generated tool confusion test cases so they can be versioned and used in CI
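As a rough idea of what exporting test cases could involve, here is a minimal sketch. This feature does not exist yet, so everything in it is hypothetical: the `ToolConfusionTestCase` and `TestCaseFile` shapes, the `version` field, and the function names are assumptions made for illustration only.

```typescript
// Hypothetical test case: a user prompt and the tool the LLM is expected to pick.
interface ToolConfusionTestCase {
  prompt: string;
  expectedTool: string;
}

// Hypothetical file layout: a format version plus the list of cases.
interface TestCaseFile {
  version: number;
  cases: ToolConfusionTestCase[];
}

// Serialize test cases to pretty-printed JSON so that diffs stay readable in version control.
function exportTestCases(cases: ToolConfusionTestCase[]): string {
  const file: TestCaseFile = { version: 1, cases };
  return JSON.stringify(file, null, 2);
}

// Parse a previously exported file and reject anything with an unexpected shape.
function importTestCases(json: string): ToolConfusionTestCase[] {
  const parsed = JSON.parse(json) as TestCaseFile;
  if (parsed.version !== 1 || !Array.isArray(parsed.cases)) {
    throw new Error("Unsupported test case file");
  }
  return parsed.cases;
}

// Given the tool the LLM actually selected for each case (same order as the cases),
// return the cases where the selection did not match the expectation.
function failedCases(
  cases: ToolConfusionTestCase[],
  selectedTools: string[]
): ToolConfusionTestCase[] {
  return cases.filter((testCase, index) => selectedTools[index] !== testCase.expectedTool);
}
```

In a CI job, a non-empty result from `failedCases` could be used to fail the build, which would surface tool confusion regressions whenever a tool description changes.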
Built With
- anthropic
- mcp
- react
- typescript
- vite

