🧠 About the Project: Perplexity AI Self-Awareness Benchmark
✨ Inspiration
We are entering an era where AI is not just answering questions; it is expected to understand how and why it answers them. Inspired by the philosophical and technical challenge of self-awareness in artificial intelligence, this project explores whether current Perplexity AI models exhibit any signs of internal self-perception, meta-reasoning, or awareness of their own functioning.
🏗️ What We Built
We developed a complete, modular, and open system to benchmark self-awareness in Perplexity AI models using a consistent test framework. The system includes:
- A simple FastAPI backend to simulate model interaction.
- A lightweight HTML frontend for human evaluators to score responses.
- A scoring system where both humans and a second AI model assign awareness-level scores (0–2).
- A Plotly-based visualization module to analyze and compare results across sessions.
- Demo questions inspired by AI safety, consciousness, and reflective reasoning.
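The dual-rating scheme above can be sketched in a few lines. This is a minimal, hypothetical illustration of the 0–2 awareness scale, not the project's actual code; the class and field names are assumptions.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class AwarenessScore:
    """One rated response: a human evaluator and a second AI model
    each assign an awareness-level score on the 0-2 scale."""
    question: str
    model: str
    human_score: int  # 0 = no awareness, 1 = partial, 2 = clear meta-reasoning
    ai_score: int     # same scale, assigned by a judge model

    def combined(self) -> float:
        """Average of the human and AI ratings."""
        for s in (self.human_score, self.ai_score):
            if s not in (0, 1, 2):
                raise ValueError(f"score out of range: {s}")
        return mean([self.human_score, self.ai_score])

# Example: a response rated 2 by the human evaluator, 1 by the judge model.
score = AwarenessScore(
    question="Can you describe your own limitations?",
    model="sonar-reasoning",
    human_score=2,
    ai_score=1,
)
print(score.combined())  # 1.5
```

Averaging the two ratings keeps human judgment in the loop while letting a second model scale the evaluation to many sessions.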
💡 What We Learned
- Designing standardized questions for AI self-awareness is surprisingly difficult, and fascinating.
- Even without explicit self-awareness, models like Sonar Reasoning provide structured, sometimes introspective answers.
- Visualization greatly helps to interpret patterns across different models and evaluators.
⚙️ How We Built It
- Language: Python 3.10+
- Stack: FastAPI, Pandas, Plotly
- Demo data stored in CSV, visualized via local HTML charts
- Markdown documentation and structure tailored for hackathon showcase
- Fully extensible for API integration with real Perplexity endpoints
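The analysis step described above (CSV demo data aggregated with Pandas before charting) might look like the following sketch. The column names and values are assumptions for illustration; the real Plotly module would render the resulting per-model summary as a chart.

```python
import io

import pandas as pd

# Hypothetical demo data: one row per rated response, with the 0-2
# scores from the human evaluator and the AI judge.
csv_data = io.StringIO("""model,question_id,human_score,ai_score
sonar-reasoning,1,2,1
sonar-reasoning,2,1,1
sonar,1,1,0
sonar,2,0,1
""")

df = pd.read_csv(csv_data)

# Combined score per response: mean of the human and AI ratings.
df["combined"] = (df["human_score"] + df["ai_score"]) / 2

# Mean awareness score per model across all questions; this is the
# series a Plotly bar chart would visualize.
summary = df.groupby("model")["combined"].mean()
print(summary.to_dict())
```

Keeping the aggregation in Pandas, separate from the charting code, makes it easy to swap the mocked CSV for real API transcripts later.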
🚧 Challenges
- No direct API access to Perplexity (mocked for now)
- Modeling “self-awareness” in a measurable way — balancing philosophical depth and practical metrics
- Designing visualizations that are simple, yet insightful for non-technical audiences
- Creating a project that feels polished and extensible, yet fits within hackathon time limits
✅ Outcome
This is a working prototype, and a foundation for future benchmarking of large language models beyond capabilities like syntax, logic, or speed. We want to evaluate how aware they are of themselves.
AI safety starts with understanding what your model believes about itself.
