Inspiration
The idea for Aether came from one frustrating moment:
we asked an AI assistant a factual question, and it confidently gave a wrong answer — again.
That moment made us realize that as AI gets more powerful, trust becomes the real challenge.
We wanted to build something that could audit AI itself.
Not to punish it, but to make it more transparent, measurable, and accountable.
That’s how Aether was born — the AI that audits AI.
What it does
Aether is an AI Audit Engine that evaluates any AI-generated text for:
- 🧠 Bias — detects ethical or cultural skew using Gemini reasoning
- 🔍 Hallucination — measures factual accuracy using Elastic hybrid search
- 📚 Source Confidence — quantifies how well the text is grounded in real evidence
When a user submits a piece of text, Aether:
- Generates embeddings via Vertex AI
- Searches Elastic Cloud for relevant evidence (BM25 + vector)
- Calculates bias, hallucination, and confidence scores
- Uses Gemini 2.5 Flash to generate a natural-language explanation
The result is a transparent audit report — a JSON output and dashboard that shows how “trustworthy” an AI response really is.
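To make the report concrete, here is a minimal sketch of what such a JSON output could look like. The field names `hallucination_score` and `source_confidence` come from our scoring layer; everything else (`bias_score`, `EvidenceHit`, `buildReport`, and the rule that confidence is the mean evidence similarity) is an assumption for illustration, not the exact production schema or formula.

```typescript
// Hypothetical report shape; only hallucination_score and source_confidence
// are names taken from the write-up, the rest is illustrative.
interface EvidenceHit {
  source: string;     // identifier of the evidence document
  similarity: number; // relevance of the document to the audited text, in [0, 1]
}

interface AuditReport {
  bias_score: number;          // 0 = neutral, 1 = strongly skewed
  hallucination_score: number; // 0 = well supported, 1 = unsupported
  source_confidence: number;   // 0 = no grounding, 1 = fully grounded
  evidence: EvidenceHit[];
  explanation: string;
}

// Keep every score inside [0, 1].
const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

function buildReport(
  biasScore: number,
  evidence: EvidenceHit[],
  explanation: string,
): AuditReport {
  // Assumed rule: confidence is the mean similarity of the retrieved evidence,
  // and the hallucination score is its complement.
  const source_confidence =
    evidence.length === 0
      ? 0
      : evidence.reduce((sum, e) => sum + clamp01(e.similarity), 0) / evidence.length;
  return {
    bias_score: clamp01(biasScore),
    hallucination_score: 1 - source_confidence,
    source_confidence,
    evidence,
    explanation,
  };
}

const report = buildReport(
  0.1,
  [
    { source: "doc-1", similarity: 0.9 },
    { source: "doc-2", similarity: 0.7 },
  ],
  "The claim is supported by two evidence documents.",
);
console.log(JSON.stringify(report, null, 2));
```

With no evidence at all, this sketch reports zero confidence and a hallucination score of 1, which matches the intuition that ungrounded text should not be trusted by default.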
How we built it
We designed Aether as a modular, service-oriented system built with:
- Backend: NestJS (TypeScript) for the audit API pipeline
- Frontend: React for the visual dashboard
- Search Layer: Elastic Cloud Hybrid Search (BM25 + kNN)
- AI Models: Google Vertex AI (text-embedding-004) and Gemini 2.5 Flash
- Deployment: Cloud Run
Pipeline flow:
Text → Vertex AI (embedding) → Elastic (search evidence) → Gemini (reasoning) → JSON Report
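The flow above can be sketched as a chain of three awaited stages. This is a simplified illustration, not the actual NestJS service code: `AuditStages`, `runAudit`, and the stub implementations are hypothetical names, and the stubs stand in for the real Vertex AI, Elastic, and Gemini clients.

```typescript
// Each stage is injected so real service clients can be swapped in.
// All names here are hypothetical.
interface Evidence {
  id: string;
  text: string;
  score: number;
}

interface AuditStages {
  embed: (text: string) => Promise<number[]>;
  searchEvidence: (text: string, vector: number[]) => Promise<Evidence[]>;
  explain: (text: string, evidence: Evidence[]) => Promise<string>;
}

interface PipelineResult {
  input: string;
  evidence: Evidence[];
  explanation: string;
}

// Runs the stages strictly in order, because each one needs the previous output.
async function runAudit(text: string, stages: AuditStages): Promise<PipelineResult> {
  const vector = await stages.embed(text);
  const evidence = await stages.searchEvidence(text, vector);
  const explanation = await stages.explain(text, evidence);
  return { input: text, evidence, explanation };
}

// In-memory stubs standing in for the remote services.
const stubStages: AuditStages = {
  embed: async (text) => [text.length, 1],
  searchEvidence: async (_text, vector) => [
    { id: "doc-1", text: "stub evidence", score: vector.length > 0 ? 0.8 : 0 },
  ],
  explain: async (_text, evidence) =>
    `Found ${evidence.length} supporting document(s).`,
};

runAudit("The Eiffel Tower is in Paris.", stubStages).then((result) =>
  console.log(JSON.stringify(result)),
);
```

Injecting the stages keeps authentication and response parsing for each service behind one function boundary, and lets the pipeline be tested without network access.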
Challenges we ran into
- 🧩 Multi-service orchestration: combining Vertex AI, Elastic Cloud, and Gemini into one seamless audit pipeline was complex. Each service had different latency, authentication, and response formats that required a carefully synchronized architecture.
- ⚖️ Hybrid search calibration: fine-tuning Elastic’s BM25 lexical and vector similarity components to achieve reliable factual grounding involved multiple iterations of weight balancing and threshold tuning.
- 📈 Dynamic scoring consistency: defining formulas for hallucination_score and source_confidence that worked across diverse text inputs was challenging. We built a normalization layer based on similarity distribution and text length variance.
- 🎨 Human-centered UI: designing a clear and intuitive dashboard that visualized complex audit metrics (bias, hallucination, confidence) without overwhelming users took several design iterations.
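To illustrate the kind of weight balancing and threshold tuning involved in hybrid search calibration, here is a small client-side sketch. It is not Elastic's own scoring and not our tuned configuration: the `ScoredHit` type, the `blendHits` function, the saturation constant `k`, and the default weight and threshold values are all assumptions chosen for the example.

```typescript
// Hypothetical hit carrying both a raw BM25 score and a cosine similarity.
interface ScoredHit {
  id: string;
  bm25: number;   // unbounded lexical score, >= 0
  cosine: number; // vector similarity, expected in [0, 1]
}

const clampUnit = (x: number): number => Math.min(1, Math.max(0, x));

// Squash an unbounded BM25 score into [0, 1) so it is comparable with cosine.
// k is an assumed saturation constant: a score equal to k maps to 0.5.
const squash = (score: number, k: number): number =>
  score <= 0 ? 0 : score / (score + k);

function blendHits(
  hits: ScoredHit[],
  lexicalWeight = 0.4, // assumed weight for BM25; the rest goes to the vector score
  threshold = 0.5,     // assumed cutoff below which a hit is not counted as evidence
  k = 5,
): { id: string; score: number }[] {
  return hits
    .map((h) => ({
      id: h.id,
      score:
        lexicalWeight * squash(h.bm25, k) +
        (1 - lexicalWeight) * clampUnit(h.cosine),
    }))
    .filter((h) => h.score >= threshold)
    .sort((a, b) => b.score - a.score);
}

const blended = blendHits([
  { id: "a", bm25: 15, cosine: 0.9 }, // strong on both signals
  { id: "b", bm25: 5, cosine: 0.2 },  // lexical match only, weak semantically
  { id: "c", bm25: 0, cosine: 0.9 },  // semantic match with no shared keywords
]);
console.log(blended.map((h) => h.id)); // hit "b" falls below the threshold
```

Raising `lexicalWeight` favors exact keyword overlap, while lowering it favors paraphrased evidence. Changing either value shifts which hits clear the threshold, which is why the two have to be tuned together.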
Accomplishments that we're proud of
- ✅ Built a fully working AI audit pipeline integrated with Google Cloud + Elastic Cloud
- 🌍 Created a bilingual evidence dataset (English–Indonesian) for ethical AI testing
- 🔎 Designed a transparent JSON audit output + UI visualization
- 🧠 Demonstrated that one AI system can evaluate another's output in a measurable, responsible way
What we learned
- Grounding and transparency are the most critical factors for AI trustworthiness.
- Elastic Hybrid Search is extremely powerful when combined with Vertex AI embeddings for factual grounding.
- Gemini can serve as both a reasoning engine and bias auditor — when prompted carefully.
- Real-world responsible AI doesn’t need massive data; it needs clear structure and measurable trust metrics.
What's next for Aether
- 📈 Add a real-time audit dashboard to visualize hallucination trends over time
- 🌏 Support multilingual grounding beyond English–Indonesian
- 🤝 Release an Aether SDK so developers can audit their own AI models
- 🧩 Expand dataset via automated crawler pipelines
- 🧾 Publish Aether as an open-source framework for responsible AI auditing
Our goal is to make AI explainable, measurable, and — above all — trustworthy.
Built With
- cloudrun
- elasticsearch
- google/genai
- nestjs
- react
- restapi
- vertex