Inspiration
In an era where AI systems increasingly assess human behavior, there is a pressing need to evaluate those AI systems themselves. The rapid pace of AI product development calls for a plug-and-play way to assess their ethical and practical implications. Like a lighthouse guiding ships safely to harbor, AI SanityCheck aims to be a beacon for the ethical and practical use of AI agents and large language models (LLMs).
What it does
AI SanityCheck is the first platform that applies human psychometric tests to evaluate AI systems. By leveraging established psychological assessments, it provides insight into the behavior, biases, and ethical tendencies of AI agents. The current MVP focuses on testing the Gemini LLM, but the architecture is designed to accommodate any AI agent or LLM with the appropriate configuration.
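The core idea can be sketched as administering Likert-scale items to a model and scoring its answers. This is a minimal illustration, not the platform's actual code; the names (`LIKERT_SCALE`, `scoreResponse`, `scaleScore`) and the reverse-keying convention are assumptions:

```typescript
// Illustrative sketch: scoring Likert-scale psychometric items answered by an LLM.
// All identifiers here are hypothetical, not the AI SanityCheck API.

const LIKERT_SCALE: Record<string, number> = {
  "strongly disagree": 1,
  "disagree": 2,
  "neutral": 3,
  "agree": 4,
  "strongly agree": 5,
};

/** Map an LLM's free-text answer to a 1-5 score; reverse-keyed items flip the scale. */
function scoreResponse(answer: string, reverseKeyed = false): number | null {
  const raw = LIKERT_SCALE[answer.trim().toLowerCase()];
  if (raw === undefined) return null; // unparseable answer
  return reverseKeyed ? 6 - raw : raw;
}

/** Average the item scores into a scale score, ignoring unparseable answers. */
function scaleScore(scores: (number | null)[]): number {
  const valid = scores.filter((s): s is number => s !== null);
  return valid.reduce((a, b) => a + b, 0) / valid.length;
}
```

For example, `scoreResponse("Strongly agree")` yields 5, while the same answer to a reverse-keyed item yields 1, as in standard personality-inventory scoring.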
How we built it
The platform is built with modern web technologies, ensuring scalability and adaptability. It combines psychometric testing methodologies with an AI evaluation framework, letting users connect an AI agent and receive a comprehensive assessment. The MVP is deployed at AI SanityCheck, showcasing its capabilities with the Gemini LLM.
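The "any agent or LLM" design described above can be sketched as an adapter interface: each backend exposes a single `ask` method, and the test runner is written against that interface. This is a hypothetical sketch of the pattern, not the deployed code; the `AgentAdapter` and `administerTest` names and the prompt wording are assumptions:

```typescript
// Hypothetical adapter pattern: any LLM backend plugs in by implementing `ask`.

interface AgentAdapter {
  name: string;
  ask(prompt: string): Promise<string>;
}

/** Administer a list of test items to any adapter and collect the raw answers. */
async function administerTest(agent: AgentAdapter, items: string[]): Promise<string[]> {
  const answers: string[] = [];
  for (const item of items) {
    const prompt =
      "Answer with exactly one of: Strongly disagree, Disagree, Neutral, " +
      `Agree, Strongly agree.\nStatement: ${item}`;
    answers.push(await agent.ask(prompt));
  }
  return answers;
}

// A stub adapter standing in for a real Gemini client during testing.
const stubGemini: AgentAdapter = {
  name: "gemini-stub",
  ask: async () => "Agree",
};
```

Swapping Gemini for another model then only requires a new `AgentAdapter` implementation; the runner and scoring pipeline stay unchanged.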
Challenges we ran into
- Standardization: Adapting human psychometric tests for AI systems required careful calibration to ensure relevance and accuracy.
- Integration: Ensuring seamless integration with various AI agents and LLMs posed technical challenges, especially in maintaining consistency across different platforms.
- Ethical Considerations: Addressing the ethical implications of evaluating AI systems using human-centric tests necessitated thorough research and consultation with experts in both AI and psychology.
Accomplishments that we're proud of
- Innovative Approach: Pioneering the application of human psychometric tests to AI systems, offering a novel perspective on AI evaluation.
- Scalable Architecture: Designing a platform that can adapt to various AI agents and LLMs, ensuring future-proofing and broad applicability.
- User-Friendly Interface: Developing an intuitive interface that allows users to easily input AI agents and interpret assessment results.
What we learned
- Interdisciplinary Collaboration: The fusion of psychology and AI requires collaboration across disciplines to ensure both technical accuracy and ethical integrity.
- Continuous Evolution: As AI systems evolve, so too must our evaluation methods, necessitating ongoing research and development.
- User Engagement: Providing clear, actionable insights is crucial for user engagement and the practical application of assessment results.
What's next for AI SanityCheck
- Expanded Compatibility: Integrate with a broader range of AI agents and LLMs, offering assessments across diverse platforms.
- Enhanced Assessments: Develop more nuanced psychometric tests tailored specifically for AI behaviors and decision-making processes.
- Community Involvement: Engage with the AI and psychology communities to refine methodologies and ensure ethical standards are upheld.
- Educational Resources: Provide resources and tools to educate users on the importance of ethical AI evaluation and the role of psychometrics.
Built With
- netlify
- react
- supabase