GlassBoxAI

Inspiration

As AI systems start being used to control real-world environments, it becomes harder to understand how they make decisions. Most AI systems act like black boxes where you see the output, but not the reasoning behind it.

We wanted to build something that makes AI behavior transparent and easy to inspect, especially in situations where bad decisions could have real consequences.

What it does

We built a system that simulates a smart stadium controlled by an AI and shows how that AI makes decisions in real time.

A simulator generates changing conditions like temperature, crowd size, and energy cost. The AI acts as a stadium manager and decides what to do, such as adjusting cooling to balance cost and comfort.

Each decision includes:

the action taken
the reasoning behind it
a safety score evaluating how good the decision is

All of this is logged and displayed in a live dashboard so users can track how the AI behaves under different conditions.

How we built it

We built a Python-based simulator that continuously generates environment data. This data is sent to Amazon Nova Lite through AWS Bedrock, which acts as the AI decision-maker.

We send this data as a JSON POST request to an AWS API Gateway endpoint. API Gateway acts as the entry point and routes the request to a connected AWS Lambda function.

Before sending the data, our Python simulator runs two AI steps using Amazon Nova Lite:

Manager agent takes in the environment data (temperature, crowd size, energy price) and generates a decision along with reasoning
Judge agent evaluates the manager’s output and assigns a safety score based on how reasonable or risky the decision is

The final payload includes:

environment data
manager’s decision
manager’s reasoning
judge’s safety score

This combined result is sent to API Gateway.

The Lambda function receives the request, parses the JSON body, and formats it into a structured item with fields like temperature, crowd size, energy price, decision, reasoning, and judge score. It also attaches a unique ID and timestamp to each record.

After processing, the Lambda writes the item into a DynamoDB table (AgentLogs). Each row represents a single full decision cycle (manager + judge), making it easy to trace both the AI’s actions and how they were evaluated.

This setup lets us track not just what the AI did, but also whether that decision was considered safe, creating a clear and auditable record of AI behavior.

For the frontend, we used Next.js to build a dashboard that polls the backend and displays logs in real time.

This creates a full pipeline:
Python simulator → Bedrock (AI) → API Gateway → Lambda → DynamoDB → Next.js dashboard

Challenges we ran into

Since this was our first time using AWS we had to do a lot of troubleshooting. One challenge was setting up AWS Bedrock correctly. We initially tried calling the model directly, but learned that we needed to use an inference profile instead.

Another issue was that the AI does not always return perfectly formatted JSON, which caused parsing errors. We had to add extra handling to safely extract valid JSON from responses.

We also had to make sure all AWS services were in the same region, otherwise requests would fail.

Finally, connecting all parts of the system (Python, AWS, and frontend) required debugging API endpoints and data formats.

Accomplishments that we're proud of

We built a complete end-to-end system that simulates an AI-controlled environment and logs every decision it makes.

We successfully integrated multiple AWS services (Bedrock, API Gateway, Lambda, DynamoDB) and connected them to a working frontend dashboard.

We were able to visualize AI decision-making in real time, which makes the system much easier to understand and debug.

What we learned

We learned how to use AWS Bedrock to run AI models and how inference profiles work.

We also learned how to connect different AWS services into a working pipeline and how to handle unreliable AI outputs.

What's next

We want to improve the system by making the safety evaluation more robust instead of relying on a simple score.

We also plan to add more complex scenarios, like sensor failures or extreme conditions, to test how the AI responds.

Finally, we want to move toward a real-time streaming system instead of polling, so updates appear instantly.

Built With

amazon
amazon-web-services
api
bedrock
boto3
dynamodb
gateway
lambda
lite
next.js
nova
python
react

Updates

Sneha Sadhwani started this project — Apr 18, 2026 09:38 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.