Inspiration

The idea for Aether came from one frustrating moment:
we asked an AI assistant a factual question, and it confidently gave a wrong answer — again.
That moment made us realize that as AI gets more powerful, trust becomes the real challenge.

We wanted to build something that could audit AI itself.
Not to punish it, but to make it more transparent, measurable, and accountable.
That’s how Aether was born — the AI that audits AI.

What it does

Aether is an AI Audit Engine that evaluates any AI-generated text for:

  • 🧠 Bias — detects ethical or cultural skew using Gemini reasoning
  • 🔍 Hallucination — measures factual accuracy using Elastic hybrid search
  • 📚 Source Confidence — quantifies how well the text is grounded in real evidence

When a user submits a piece of text, Aether:

  1. Generates embeddings via Vertex AI
  2. Retrieves relevant evidence from Elastic Cloud using hybrid search (BM25 + vector)
  3. Calculates bias, hallucination, and confidence scores
  4. Uses Gemini 2.5 Flash to generate a natural-language explanation

The result is a transparent audit report: a JSON output and a dashboard that show how “trustworthy” an AI response really is.
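To make the JSON output concrete, here is a minimal sketch of what a report type and a runtime check could look like. Only hallucination_score and source_confidence are names taken from this write-up; bias_score, explanation, and the 0 to 1 score range are assumptions for illustration, not Aether's actual schema.

```typescript
// Hypothetical shape of an Aether audit report.
// Only hallucination_score and source_confidence are named in the write-up;
// the other fields and the [0, 1] range are assumptions.
interface AuditReport {
  bias_score: number;          // assumed: 0 = no detected skew, 1 = strong skew
  hallucination_score: number; // assumed: 0 = fully supported, 1 = unsupported
  source_confidence: number;   // assumed: 0 = no grounding, 1 = strongly grounded
  explanation: string;         // natural-language summary from the reasoning model
}

// Type guard that checks parsed JSON against the assumed report shape.
function isAuditReport(value: unknown): value is AuditReport {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const inUnitRange = (x: unknown): boolean =>
    typeof x === "number" && Number.isFinite(x) && x >= 0 && x <= 1;
  return (
    inUnitRange(record["bias_score"]) &&
    inUnitRange(record["hallucination_score"]) &&
    inUnitRange(record["source_confidence"]) &&
    typeof record["explanation"] === "string"
  );
}
```

A dashboard or downstream client could call isAuditReport on the parsed response before rendering, so a malformed report fails early instead of showing misleading scores.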

How we built it

We designed Aether as a modular, service-oriented system built with:

  • Backend: NestJS (TypeScript) for the audit API pipeline
  • Frontend: React for the visual dashboard
  • Search Layer: Elastic Cloud Hybrid Search (BM25 + kNN)
  • AI Models: Google Vertex AI (text-embedding-004) and Gemini 2.5 Flash
  • Deployment: Cloud Run
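As an illustration of the search layer, the sketch below builds an Elasticsearch request body that combines a BM25 match query with a kNN clause. The field names (content, content_vector), the default weights, and the helper name buildHybridSearchBody are assumptions made for this example and are not taken from Aether's code.

```typescript
// Options for the hypothetical hybrid query builder; all defaults are illustrative.
interface HybridSearchOptions {
  k?: number;             // number of nearest neighbours and hits to return
  numCandidates?: number; // kNN candidates considered per shard
  lexicalBoost?: number;  // weight applied to the BM25 score
  vectorBoost?: number;   // weight applied to the vector similarity score
}

// Builds a search request body with a BM25 `match` query and a top-level `knn`
// clause. When both are present, Elasticsearch combines the boosted scores.
// "content" and "content_vector" are assumed index field names.
function buildHybridSearchBody(
  queryText: string,
  queryVector: number[],
  options: HybridSearchOptions = {}
) {
  const { k = 5, numCandidates = 50, lexicalBoost = 0.5, vectorBoost = 0.5 } = options;
  return {
    size: k,
    query: {
      match: { content: { query: queryText, boost: lexicalBoost } },
    },
    knn: {
      field: "content_vector",
      query_vector: queryVector,
      k,
      num_candidates: numCandidates,
      boost: vectorBoost,
    },
  };
}
```

The two boost values are the knobs that hybrid calibration would adjust: raising lexicalBoost favours exact term overlap, while raising vectorBoost favours semantic similarity.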

Pipeline flow:


Text → Vertex AI (embedding) → Elastic (search evidence) → Gemini (reasoning) → JSON Report
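The flow above can be sketched as a small orchestration function. This is a simplified illustration, not the actual NestJS service: the stage names (embed, searchEvidence, score, reason) and the result fields other than hallucination_score and source_confidence are hypothetical, and the real Vertex AI, Elastic, and Gemini clients are left out and passed in as dependencies.

```typescript
interface Evidence {
  text: string;
  score: number; // relevance score returned by the search layer
}

interface AuditResult {
  bias_score: number; // assumed field name
  hallucination_score: number;
  source_confidence: number;
  explanation: string; // assumed field name
}

// Each stage is injected so the real service clients (not shown here)
// can be replaced by stubs in tests. All stage names are hypothetical.
interface AuditStages {
  embed(text: string): Promise<number[]>;
  searchEvidence(text: string, vector: number[]): Promise<Evidence[]>;
  score(evidence: Evidence[]): { hallucination_score: number; source_confidence: number };
  reason(text: string, evidence: Evidence[]): Promise<{ bias_score: number; explanation: string }>;
}

// Runs the stages in the order of the pipeline flow:
// embedding, evidence search, scoring, reasoning, then report assembly.
async function runAudit(text: string, stages: AuditStages): Promise<AuditResult> {
  const vector = await stages.embed(text);
  const evidence = await stages.searchEvidence(text, vector);
  const scores = stages.score(evidence);
  const reasoning = await stages.reason(text, evidence);
  return { ...scores, ...reasoning };
}
```

Keeping the stages behind one interface means that differences in latency, authentication, and response format stay inside each adapter, while runAudit only sees plain values.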

Challenges we ran into

  • 🧩 Multi-service orchestration: combining Vertex AI, Elastic Cloud, and Gemini into a single audit pipeline was complex. Each service has its own latency profile, authentication scheme, and response format, so the calls had to be sequenced and coordinated carefully.
  • ⚖️ Hybrid search calibration: tuning the balance between Elastic’s BM25 lexical scores and vector similarity for reliable factual grounding took multiple iterations of weight balancing and threshold tuning.
  • 📈 Dynamic scoring consistency: defining formulas for hallucination_score and source_confidence that behaved consistently across diverse text inputs was challenging. We built a normalization layer based on similarity distribution and text length variance.
  • 🎨 Human-centered UI: designing a clear, intuitive dashboard that visualizes complex audit metrics (bias, hallucination, confidence) without overwhelming users took several design iterations.
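To illustrate the kind of weighting and normalization involved, here is a hedged sketch. The formulas below are assumptions chosen for clarity and are not Aether's actual scoring: BM25 scores are squashed into the range 0 to 1 with a saturation function, blended with cosine similarity by a tunable weight, and then reduced to the two scores. The pivot, the weight, and all function names are hypothetical, and the text length adjustment mentioned above is omitted.

```typescript
interface ScoredHit {
  bm25: number;   // raw, unbounded lexical score
  cosine: number; // vector similarity, assumed to lie in [0, 1]
}

// Saturation maps an unbounded score into [0, 1); a score equal to the pivot maps to 0.5.
function saturate(score: number, pivot: number): number {
  return score <= 0 ? 0 : score / (score + pivot);
}

// Weighted blend of normalized lexical score and vector similarity.
function fuseScore(hit: ScoredHit, lexicalWeight: number, pivot: number): number {
  return lexicalWeight * saturate(hit.bm25, pivot) + (1 - lexicalWeight) * hit.cosine;
}

// Assumed formulas: source_confidence is the mean fused score over all hits,
// and hallucination_score is one minus the best fused score.
function scoreGrounding(
  hits: ScoredHit[],
  lexicalWeight = 0.5, // illustrative default
  pivot = 10           // illustrative default
): { source_confidence: number; hallucination_score: number } {
  if (hits.length === 0) {
    return { source_confidence: 0, hallucination_score: 1 };
  }
  const fused = hits.map((hit) => fuseScore(hit, lexicalWeight, pivot));
  const mean = fused.reduce((sum, s) => sum + s, 0) / fused.length;
  const best = Math.max(...fused);
  return { source_confidence: mean, hallucination_score: 1 - best };
}
```

Saturation is used here instead of min-max scaling because min-max always gives the top hit a score of 1, even when every retrieved document is only weakly related to the claim.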

Accomplishments that we're proud of

  • ✅ Built a fully working AI audit pipeline integrated with Google Cloud + Elastic Cloud
  • 🌍 Created a bilingual evidence dataset (English–Indonesian) for ethical AI testing
  • 🔎 Designed a transparent JSON audit output + UI visualization
  • 🧠 Demonstrated that one AI system can audit another’s output in a transparent, measurable way

What we learned

  • Grounding and transparency are the most critical factors for AI trustworthiness.
  • Elastic hybrid search pairs well with Vertex AI embeddings for factual grounding: BM25 catches exact terms, while vector search catches paraphrases.
  • Gemini can serve as both a reasoning engine and bias auditor — when prompted carefully.
  • Real-world responsible AI depends less on massive data than on clear structure and measurable trust metrics.

What's next for Aether

  • 📈 Add a real-time audit dashboard to visualize hallucination trends over time
  • 🌏 Support multilingual grounding beyond English–Indonesian
  • 🤝 Release an Aether SDK so developers can audit their own AI models
  • 🧩 Expand dataset via automated crawler pipelines
  • 🧾 Publish Aether as an open-source framework for responsible AI auditing

Our goal is to make AI explainable, measurable, and — above all — trustworthy.
