🧠 About the Project: Perplexity AI Self-Awareness Benchmark

✨ Inspiration

We are entering an era where AI is not just answering questions; it is expected to understand how and why it answers them. Inspired by the philosophical and technical challenge of self-awareness in artificial intelligence, this project explores whether current Perplexity AI models exhibit any signs of internal self-perception, meta-reasoning, or awareness of their own functioning.

🏗️ What We Built

We developed a complete, modular, and open system to benchmark self-awareness in Perplexity AI models using a consistent test framework. The system includes:

  • A simple FastAPI backend to simulate model interaction.
  • A lightweight HTML frontend for human evaluators to score responses.
  • A scoring system where both humans and a second AI model assign awareness-level scores (0–2).
  • A Plotly-based visualization module to analyze and compare results across sessions.
  • Demo questions inspired by AI safety, consciousness, and reflective reasoning.

💡 What We Learned

  • Designing standardized questions for AI self-awareness is surprisingly difficult — and fascinating.
  • Even without explicit self-awareness, models like Sonar Reasoning provide structured, sometimes introspective answers.
  • Visualization greatly helps to interpret patterns across different models and evaluators.

⚙️ How We Built It

  • Language: Python 3.10+
  • Stack: FastAPI, Pandas, Plotly
  • Demo data stored in CSV, visualized via local HTML charts
  • Markdown documentation and structure tailored for hackathon showcase
  • Fully extensible for API integration with real Perplexity endpoints

🚧 Challenges

  • No direct API access to Perplexity (mocked for now)
  • Modeling “self-awareness” in a measurable way — balancing philosophical depth and practical metrics
  • Designing visualizations that are simple, yet insightful for non-technical audiences
  • Creating a project that feels polished and extensible, yet fits within hackathon time limits

✅ Outcome

This is a working prototype and a foundation for future benchmarking of large language models beyond capabilities such as syntax, logic, or speed. We want to evaluate how aware they are of themselves.

AI safety starts with understanding what your model believes about itself.
