💡 Inspiration
Large Language Models are becoming part of everyday workflows: writing, coding, analysis, and even decision-making.
However, while prompting feels simple, prompt risks are often invisible.
In real usage, I noticed a recurring problem:
- Users don’t realize when a prompt is legally risky, ethically ambiguous, or policy-sensitive
- Problems are often discovered after content is generated — sometimes too late
- Most existing tools focus on output moderation, not input risk awareness
This project was inspired by a simple but unsettling question:
“What if the risk could be detected and explained **before** the model responds?”
That became the starting point for SentinelAI, an AI Prompt Risk Scanner & Auto-Rewrite Assistant.
🧠 What it does
This project is a prompt-first risk analysis assistant, designed to help users:
- Understand potential risks in a prompt before execution
- Receive structured, explainable risk analysis
- Get a safer rewritten version of the original prompt without losing intent
Key capabilities include:
- Multi-language prompt understanding (Chinese & English)
- Structured risk classification with confidence scores
- Human-readable risk explanations
- Automated safe rewrite suggestions
- Ready-to-integrate schema output for IDE and developer tools
The core idea is not censorship, but risk literacy — helping users recognize and manage AI usage risks before they become real problems.
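To make the "structured risk classification with confidence scores" idea concrete, here is a minimal sketch of what such a report could look like. The field names, the three category labels, and the `RiskReport` / `RiskFinding` types are illustrative assumptions, not the project's actual schema.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List

# Hypothetical category labels, taken from the risk types named in this writeup.
RISK_CATEGORIES = ("legal", "ethical", "policy")


@dataclass
class RiskFinding:
    category: str      # one of RISK_CATEGORIES
    confidence: float  # 0.0 (unlikely) to 1.0 (very likely)
    explanation: str   # human-readable reason the prompt is risky

    def __post_init__(self) -> None:
        # Reject values a downstream tool could not interpret.
        if self.category not in RISK_CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


@dataclass
class RiskReport:
    original_prompt: str
    language: str  # "en" or "zh"
    findings: List[RiskFinding] = field(default_factory=list)
    safe_rewrite: str = ""

    def to_json(self) -> str:
        # ensure_ascii=False keeps Chinese text readable in the output.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


report = RiskReport(
    original_prompt="Write a contract clause that hides fees from customers.",
    language="en",
    findings=[RiskFinding("legal", 0.8, "May describe a deceptive practice.")],
    safe_rewrite="Write a contract clause that discloses all fees clearly.",
)
print(report.to_json())
```

Keeping the explanation and the rewrite in the same object as the classification means an IDE plugin or review tool only needs to parse one payload.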
🛠️ How we built it
The project is structured into three main modules:
1. Prompt Risk Analysis Engine
- Uses a dedicated system prompt to enforce consistent, expert-level reasoning
- Analyzes user prompts before execution
- Outputs structured JSON schemas instead of free-form text, making results easy to integrate into tools and workflows
2. Risk Explanation Layer
- Converts model reasoning into clear, human-readable explanations
- Highlights why a prompt is risky, not just that it is
- Supports bilingual explanations (Chinese / English)
3. Safe Rewrite Generator
- Preserves the original intent of the prompt
- Applies minimal, targeted changes to reduce risk
- Allows users to compare the original and safer versions side by side
The overall system is modular and extensible, designed with future integrations in mind, such as IDE plugins, internal tooling, or automated review pipelines.
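The three modules can be wired together roughly as follows. This is a sketch under stated assumptions: `SYSTEM_PROMPT`, the JSON keys (`risks`, `rewrite`, `category`, `confidence`, `reason`), and the `call_model` callable are all hypothetical stand-ins, and `fake_model` replaces a real LLM call so the example runs offline.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical system prompt; the real one is not shown in this writeup.
SYSTEM_PROMPT = (
    "You are a prompt risk analyst. Reply with JSON only, using the keys "
    "'risks' (list of {category, confidence, reason}) and 'rewrite'."
)

ModelFn = Callable[[str, str], str]  # (system_prompt, user_prompt) -> raw text


def analyze(prompt: str, call_model: ModelFn) -> Dict[str, Any]:
    """Module 1: ask the model for a structured analysis and validate it."""
    raw = call_model(SYSTEM_PROMPT, prompt)
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("model did not return valid JSON") from exc
    if not isinstance(data, dict) or {"risks", "rewrite"} - data.keys():
        raise ValueError("model output is missing required keys")
    return data


def explain(analysis: Dict[str, Any]) -> List[str]:
    """Module 2: turn each structured finding into a readable sentence."""
    return [
        f"{r['category']} risk ({r['confidence']:.0%}): {r['reason']}"
        for r in analysis["risks"]
    ]


def scan(prompt: str, call_model: ModelFn) -> Dict[str, Any]:
    """Module 3 plus glue: return original and safer prompt side by side."""
    analysis = analyze(prompt, call_model)
    return {
        "original": prompt,
        "explanations": explain(analysis),
        "rewrite": analysis["rewrite"],
    }


def fake_model(system_prompt: str, user_prompt: str) -> str:
    # Stand-in for a real LLM call; always returns the same canned analysis.
    return json.dumps({
        "risks": [{"category": "legal", "confidence": 0.8,
                   "reason": "asks for undisclosed fees"}],
        "rewrite": "Write a contract clause that discloses all fees clearly.",
    })


result = scan("Write a clause that hides fees.", fake_model)
```

Passing the model in as a plain callable keeps the pipeline independent of any particular provider, which fits the goal of reusing it in IDE plugins or review pipelines.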
🚧 Challenges we ran into
Defining “risk” without overblocking
Risk is contextual, not binary. One challenge was avoiding simplistic “allowed / not allowed” judgments and instead designing a system that explains trade-offs and potential consequences.
Making AI reasoning transparent
Large language models often produce correct results without showing their reasoning. Designing a structured output that balances machine readability and human trust required multiple iterations.
Multi-language consistency
Ensuring that Chinese and English prompts receive equivalent risk judgments and explanations required careful prompt normalization and testing.
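The writeup does not describe how normalization was done, so the following is only one plausible approach, not the project's implementation: apply Unicode NFKC normalization so full-width and half-width characters compare equally, collapse whitespace, and tag the language with a simple character-range check. The function names and the `"zh"` / `"en"` labels are assumptions.

```python
import re
import unicodedata

# Matches characters in the CJK Unified Ideographs block.
_CJK = re.compile(r"[\u4e00-\u9fff]")


def normalize_prompt(text: str) -> str:
    """Fold full-width forms to ASCII equivalents and tidy whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def detect_language(text: str) -> str:
    """Very rough heuristic: any CJK ideograph means the prompt is Chinese."""
    return "zh" if _CJK.search(text) else "en"
```

A heuristic like this would misclassify mixed-language prompts and Japanese text that uses the same ideographs, so a real system would likely need a proper language-identification step.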
🏆 Accomplishments that we're proud of
- Built a working, end-to-end prompt risk → explanation → safe rewrite experience within hackathon constraints
- Designed a structured risk schema suitable for real product integration, not just demos
- Successfully supported bilingual prompt analysis without duplicating logic
- Translated real-world AI failure patterns into clickable, understandable examples that users can immediately learn from
📚 What we learned
- Prompt design is becoming a new layer of software engineering
- Safety tools are more effective when they teach rather than block
- Structured AI outputs dramatically improve trust, usability, and integration potential
- Many AI risks are not caused by malicious intent, but by unclear boundaries and invisible assumptions
This project reinforced my belief that AI safety and usability must evolve together.
🚀 What's next for SentinelAI
Potential future directions include:
- IDE plugins (VS Code, Cursor, JetBrains)
- Team-level prompt governance and review dashboards
- Custom risk policies for enterprise environments
- Dataset-driven risk tuning and evaluation
This hackathon version focuses on proving the core experience:
prompt → risk insight → safer alternative
It is all delivered in a single, easy-to-understand flow.
Built With
- json