🧠 About the Project: Perplexity AI Self-Awareness Benchmark
✨ Inspiration
We are entering an era where AI is not just answering questions; it is expected to understand how and why it answers them. Inspired by the philosophical and technical challenge of self-awareness in artificial intelligence, this project explores whether current Perplexity AI models exhibit any signs of internal self-perception, meta-reasoning, or awareness of their own functioning.
🏗️ What We Built
We developed a complete, modular, and open system to benchmark self-awareness in Perplexity AI models using a consistent test framework. The system includes:
- A simple FastAPI backend to simulate model interaction.
- A lightweight HTML frontend for human evaluators to score responses.
- A scoring system where both humans and a second AI model assign awareness-level scores (0–2).
- A Plotly-based visualization module to analyze and compare results across sessions.
- Demo questions inspired by AI safety, consciousness, and reflective reasoning.
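The dual-rating scheme above can be sketched in a few lines. This is a minimal, hypothetical illustration of the 0–2 awareness scale, not the project's actual code; the class and field names are assumptions.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class AwarenessScore:
    """One rated response: a human evaluator and a second AI model
    each assign an awareness-level score on the 0-2 scale."""
    question: str
    model: str
    human_score: int  # 0 = no awareness, 1 = partial, 2 = clear meta-reasoning
    ai_score: int     # same scale, assigned by a judge model

    def combined(self) -> float:
        """Average of the human and AI ratings."""
        for s in (self.human_score, self.ai_score):
            if s not in (0, 1, 2):
                raise ValueError(f"score out of range: {s}")
        return mean([self.human_score, self.ai_score])

# Example: a response rated 2 by the human evaluator, 1 by the judge model.
score = AwarenessScore(
    question="Can you describe your own limitations?",
    model="sonar-reasoning",
    human_score=2,
    ai_score=1,
)
print(score.combined())  # 1.5
```

Averaging the two ratings keeps human judgment in the loop while letting a second model scale the evaluation to many sessions.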
💡 What We Learned
- Designing standardized questions for AI self-awareness is surprisingly difficult, and fascinating.
- Even without explicit self-awareness, models like Sonar Reasoning provide structured, sometimes introspective answers.
- Visualization greatly helps to interpret patterns across different models and evaluators.
⚙️ How We Built It
- Language: Python 3.10+
- Stack: FastAPI, Pandas, Plotly
- Demo data stored in CSV, visualized via local HTML charts
- Markdown documentation and structure tailored for hackathon showcase
- Fully extensible for API integration with real Perplexity endpoints
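The analysis step described above (CSV demo data aggregated with Pandas before charting) might look like the following sketch. The column names and values are assumptions for illustration; the real Plotly module would render the resulting per-model summary as a chart.

```python
import io

import pandas as pd

# Hypothetical demo data: one row per rated response, with the 0-2
# scores from the human evaluator and the AI judge.
csv_data = io.StringIO("""model,question_id,human_score,ai_score
sonar-reasoning,1,2,1
sonar-reasoning,2,1,1
sonar,1,1,0
sonar,2,0,1
""")

df = pd.read_csv(csv_data)

# Combined score per response: mean of the human and AI ratings.
df["combined"] = (df["human_score"] + df["ai_score"]) / 2

# Mean awareness score per model across all questions; this is the
# series a Plotly bar chart would visualize.
summary = df.groupby("model")["combined"].mean()
print(summary.to_dict())
```

Keeping the aggregation in Pandas, separate from the charting code, makes it easy to swap the mocked CSV for real API transcripts later.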
🚧 Challenges
- No direct API access to Perplexity (mocked for now)
- Modeling “self-awareness” in a measurable way — balancing philosophical depth and practical metrics
- Designing visualizations that are simple, yet insightful for non-technical audiences
- Creating a project that feels polished and extensible, yet fits within hackathon time limits
✅ Outcome
This is a working prototype, and a foundation for future benchmarking of large language models beyond capabilities like syntax, logic, or speed. We want to evaluate how aware they are of themselves.
AI safety starts with understanding what your model believes about itself.
