Inspiration

Every day, the internet grows by thousands of new pages and endpoints, from APIs and dashboards to IoT devices and forgotten admin panels. But how many of these are properly secured? Inspired by the rise of Attack Surface Management (ASM) solutions and the latest AWS technologies in agentic AI, we wanted to build an intelligent system capable of exploring and reasoning about exposed digital assets, much like a human analyst would.

We know attackers are getting more sophisticated by using cutting-edge AI in their attacks, so why should the defence be left behind?

What it does

Surface Sentinel is an AI agent that autonomously explores a domain, identifies publicly exposed assets, and cross-references them against known vulnerabilities. It mimics a cybersecurity researcher performing reconnaissance:

  • Navigates websites using a browser-based agent.
  • Extracts metadata from exposed pages, endpoints, and forms.
  • Grounds findings in a live vulnerability database (CVE).
  • Generates structured reports highlighting potential risks.
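
As a rough illustration of the last step, a structured report could be assembled along these lines. This is a minimal sketch: the `Finding` fields, the severity labels, and `build_report` are hypothetical names chosen for the example, not the project's actual report schema.

```python
import json
from dataclasses import dataclass, field, asdict

SEVERITY_ORDER = ["critical", "high", "medium", "low", "info"]

# Hypothetical schema: field names are illustrative only.
@dataclass
class Finding:
    url: str
    asset_type: str                      # e.g. "admin panel", "API endpoint", "form"
    evidence: str                        # metadata the agent extracted from the page
    cve_ids: list[str] = field(default_factory=list)
    severity: str = "info"               # one of SEVERITY_ORDER

def build_report(domain: str, findings: list[Finding]) -> dict:
    """Order findings from most to least severe and wrap them in a report dict."""
    ranked = sorted(findings, key=lambda f: SEVERITY_ORDER.index(f.severity))
    return {
        "domain": domain,
        "total_findings": len(ranked),
        "findings": [asdict(f) for f in ranked],
    }

if __name__ == "__main__":
    demo = [
        Finding("https://example.com/contact", "form", "unauthenticated form", severity="low"),
        Finding("https://example.com/admin", "admin panel", "title: Admin Login", severity="high"),
    ]
    print(json.dumps(build_report("example.com", demo), indent=2))
```

Keeping the report as plain dictionaries makes it easy to serialise to JSON for a reviewer or a downstream system.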

The agent operates within a controlled, serverless environment, built entirely on AWS, ensuring scalability and isolation while performing web reconnaissance safely and responsibly.

How we built it

Surface Sentinel was built with the Strands Agents SDK running on Amazon Bedrock. We started with a simple prompt-driven prototype, then iteratively evolved it into a production-ready agent capable of autonomous reasoning and tool use.

Key steps:

  • Agent Design: we used the Strands Agent class, defined the system prompt, and powered it with Anthropic's Claude 4.5 model running on Bedrock.
  • Web Exploration: initially, we used the Playwright MCP server to simulate browsing; later, we replaced it with the managed Bedrock AgentCore Browser Tool, which simplified scaling, security and observability.
  • Knowledge Integration: the agent grounds its findings in the Common Vulnerabilities and Exposures (CVE) dataset through Bedrock Knowledge Bases, backed by OpenSearch & S3 Vectors.
  • State & Storage: early versions used the Filesystem MCP for local tracking; later iterations used Strands managed state to streamline runtime persistence.
  • Observability & Evaluation: integration with LangFuse and CloudWatch allowed full traceability, from tool calls and token usage to full traces of the agent's reasoning and acting.
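
To make the Knowledge Integration step more concrete, here is a deliberately simplified stand-in for the grounding lookup. The real system retrieves from Bedrock Knowledge Bases; this sketch swaps that for an exact-match lookup in an in-memory dictionary. The product name `exampled`, the CVE IDs, and the version thresholds are all invented placeholders and do not correspond to real CVE entries.

```python
import re
from typing import Optional

# Placeholder records: IDs and versions are invented for illustration only.
TOY_CVE_INDEX = {
    "exampled": [
        {"id": "CVE-0000-0001", "fixed_in": (2, 4, 0)},
        {"id": "CVE-0000-0002", "fixed_in": (1, 9, 5)},
    ],
}

def parse_server_header(value: str) -> Optional[tuple[str, tuple[int, ...]]]:
    """Split a banner such as 'exampled/2.3.1' into (product, version tuple)."""
    match = re.match(r"^([A-Za-z][\w.-]*)/(\d+(?:\.\d+)*)", value.strip())
    if not match:
        return None
    product = match.group(1).lower()
    version = tuple(int(part) for part in match.group(2).split("."))
    # Pad short versions ("2.4" becomes (2, 4, 0)) so tuple comparison behaves.
    version += (0,) * (3 - len(version))
    return product, version

def candidate_cves(server_header: str) -> list[str]:
    """Return placeholder CVE IDs whose fix version is newer than the banner's."""
    parsed = parse_server_header(server_header)
    if parsed is None:
        return []
    product, version = parsed
    records = TOY_CVE_INDEX.get(product, [])
    return [rec["id"] for rec in records if version < rec["fixed_in"]]
```

A banner match like this only yields candidates; the agent would still need to treat them as leads to verify, since banners can be missing or misleading.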

Challenges we ran into

  • Context window limits: Keeping the agent aware of past exploration while managing the token budget required careful balancing.
  • State persistence: while file and state-based persistence are fine for demo purposes, we'd like to investigate more scalable and flexible solutions in the future.
  • Performance trade-offs: headed browsing introduced extra latency and resource usage, but gave us more visual insight.
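
The context window trade-off can be sketched as a sliding window over the agent's history. This is an assumed approach for illustration, not necessarily what Surface Sentinel does: `Step`, `estimate_tokens`, and `trim_history` are hypothetical names, and the four-characters-per-token estimate is a rough heuristic rather than a real tokenizer.

```python
from dataclasses import dataclass

@dataclass
class Step:
    role: str   # "thought", "action", "observation", or "summary"
    text: str

def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token); a real tokenizer will differ.
    return max(1, len(text) // 4)

def trim_history(steps: list[Step], budget: int, summary: str = "") -> list[Step]:
    """Keep the most recent steps that fit the budget.

    If older steps are dropped and a summary is supplied, the summary is
    placed first so the agent retains a compressed view of earlier work.
    """
    kept: list[Step] = []
    used = estimate_tokens(summary) if summary else 0  # reserve room for the summary
    for step in reversed(steps):
        cost = estimate_tokens(step.text)
        if used + cost > budget:
            break
        kept.append(step)
        used += cost
    kept.reverse()
    if summary and len(kept) < len(steps):
        kept.insert(0, Step("summary", summary))
    return kept
```

The balancing act is in the summary: too short and the agent revisits pages it has already seen, too long and it crowds out recent observations.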

Accomplishments that we're proud of

  • Built a fully autonomous agent capable of reasoning and acting across a live environment.
  • Demonstrated end-to-end deployment on AWS Fargate and Bedrock AgentCore.
  • Implemented observability and evaluation with both open-source (LangFuse) and AWS-native (CloudWatch) options.
  • Delivered a reusable architecture adaptable to other security domains, such as penetration testing and asset discovery.

What we learned

Building intelligent systems that act — not just chat — requires more than clever prompting. It demands a deep understanding of:

  • Context engineering: managing reasoning and action history effectively.
  • Security-conscious design: treating every tool and API as a potential attack vector.
  • Observability discipline: treating agents like microservices, with metrics, logs, and traces.

We also learned how reasoning patterns evolve from CoT → ToT → GoT → ReAct, and how these theoretical ideas translate to real-world agent design.
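
For readers unfamiliar with ReAct, the loop of thought, action, and observation can be sketched in a few lines. Everything here is a stand-in: `scripted_policy` replaces the LLM with hard-coded decisions, and `fetch_title` replaces a real browser tool with canned data, so this illustrates the control flow only and not the Strands implementation.

```python
from typing import Callable

# Hypothetical tool: a stand-in for a browser tool, backed by canned data.
def fetch_title(url: str) -> str:
    canned = {"https://example.com/admin": "Admin Login"}
    return canned.get(url, "Not Found")

TOOLS: dict[str, Callable[[str], str]] = {"fetch_title": fetch_title}

History = list[tuple[str, str]]

def scripted_policy(history: History) -> tuple[str, str, str]:
    """Stand-in for the LLM: returns (thought, action, argument)."""
    observations = [text for kind, text in history if kind == "observation"]
    if not observations:
        return ("I should look at the admin path.", "fetch_title", "https://example.com/admin")
    return ("I have enough evidence.", "finish", f"Exposed page titled '{observations[-1]}'")

def react_loop(policy: Callable[[History], tuple[str, str, str]],
               max_steps: int = 5) -> tuple[str, History]:
    """Alternate reasoning and tool use until the policy finishes or steps run out."""
    history: History = []
    for _ in range(max_steps):
        thought, action, argument = policy(history)
        history.append(("thought", thought))
        if action == "finish":
            return argument, history
        observation = TOOLS[action](argument)
        history.append(("action", f"{action}({argument})"))
        history.append(("observation", observation))
    return "step limit reached", history
```

The `max_steps` cap is the simplest guard against an agent that never decides it is done.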

What's next for Surface Sentinel

Next iterations will focus on:

  • Smarter state management: storing exploration graphs and vulnerability evidence in a store such as DynamoDB.
  • Automated evaluation pipelines: benchmarking accuracy against known vulnerable assets.
  • Multi-agent collaboration: using swarm topologies for parallel reconnaissance.
  • Integration with ticketing systems: automatically filing vulnerability reports for review, keeping a human in the loop while staying ahead of the attackers.
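
One possible shape for the exploration graph mentioned in the first item is sketched below. It is an assumption-laden illustration: the `ExplorationGraph` class, its method names, and the item layout are hypothetical, and the sketch stops at producing plain dictionaries rather than calling any database API.

```python
from collections import defaultdict

class ExplorationGraph:
    """In-memory graph of discovered pages and the links between them."""

    def __init__(self) -> None:
        self.pages: dict[str, dict] = {}
        self.links: dict[str, set[str]] = defaultdict(set)

    def add_page(self, url: str, **metadata) -> None:
        # Merge metadata so repeated visits enrich the same record.
        self.pages.setdefault(url, {"visited": False}).update(metadata)

    def add_link(self, source: str, target: str) -> None:
        self.add_page(source)
        self.add_page(target)
        self.links[source].add(target)

    def unvisited(self) -> list[str]:
        """URLs discovered through links but not yet explored."""
        return sorted(url for url, meta in self.pages.items() if not meta["visited"])

    def to_items(self) -> list[dict]:
        """Flatten to one dict per page, a shape that suits a key-value store.

        The item layout (url as key, links as a sorted list) is hypothetical.
        """
        return [
            {"url": url, "links": sorted(self.links.get(url, ())), **meta}
            for url, meta in sorted(self.pages.items())
        ]
```

Storing one item per page keyed by URL would let parallel agents check what has already been explored before claiming new work.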

Ultimately, we want Surface Sentinel to become a trustworthy cybersecurity companion — fast, explainable, and cost-efficient.

Built With

  • agentcore
  • bedrock
  • cdk
  • cloudwatch
  • fargate
  • mcp
  • opensearch
  • python
  • strands