Inspiration

Discord moderation is outdated. Most tools scan text for slurs, ignore voice entirely, and only act after harm has already happened. Yet the most damaging conflicts — shouting matches, harassment, coordinated pile-ons — happen in voice channels, where no moderation tool is listening.

We built Echo to change that: an autonomous community guardian that understands both text and voice, reasons about community dynamics over time, and intervenes only when it actually helps. With Gemini 3’s long-context reasoning and Gemini Live’s real-time audio understanding, Echo doesn’t react to keywords. It reads the room.

What it does

Echo is an autonomous moderation agent for Discord servers. It continuously observes text and voice activity, maintains a live understanding of community mood and context, and decides when — and how — to intervene.

From a member’s perspective, Echo feels like a calm, human moderator; from a moderator’s perspective, it provides structured, real-time community intelligence.

Echo does not punish by default. It facilitates, de-escalates, and escalates to humans only when safety is at risk.

How we built it

We built Echo as a full end-to-end system designed for real-time operation, restraint, and safety.

Core system

  • Discord integration: Node.js + Discord.js for text and voice
  • Audio pipeline: Opus → PCM → 48 kHz mono with VAD and backpressure handling (see the capture sketch after this list)
  • State persistence: MySQL for live and historical server state
  • Real-time CLI dashboard: powered by direct DB access
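
As a concrete illustration of the capture path, here is a minimal sketch assuming @discordjs/voice for per-user Opus streams and prism-media for decoding; the handlePcmChunk callback is a placeholder for the VAD, buffering, and downmix-to-mono stage.

```ts
import { EndBehaviorType, VoiceConnection } from '@discordjs/voice';
import prism from 'prism-media';

// Subscribe to one speaking user's Opus stream and decode it to 48 kHz PCM.
// handlePcmChunk stands in for the downstream VAD / buffering / downmix-to-mono stage.
function captureUserAudio(
  connection: VoiceConnection,
  userId: string,
  handlePcmChunk: (pcm: Buffer) => void,
) {
  const opusStream = connection.receiver.subscribe(userId, {
    end: { behavior: EndBehaviorType.AfterSilence, duration: 500 }, // close after 500 ms of silence
  });

  // Discord delivers 20 ms Opus frames; decode them to 16-bit PCM at 48 kHz.
  const decoder = new prism.opus.Decoder({ rate: 48000, channels: 2, frameSize: 960 });

  opusStream.pipe(decoder).on('data', (chunk: Buffer) => handlePcmChunk(chunk));
}
```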

AI layer

  • Gemini 3 Flash:
    • Batch text analysis
    • Cross-modal reasoning
    • Long-context server state modeling
  • Gemini 2.5 Flash Live:
    • Real-time voice semantic analysis via WebSocket (see the session sketch after this list)
    • Ephemeral processing with no audio storage
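
To show the shape of that real-time path, here is a sketch using the @google/genai Live API; the model id, response handling, and 16 kHz PCM input format are assumptions based on the public SDK examples, not our exact configuration.

```ts
import { GoogleGenAI, Modality } from '@google/genai';

// Sketch of the real-time voice path. Model id, sample rate, and message field
// access are assumptions drawn from the public Live API examples.
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function analyzeVoice(onText: (text: string) => void) {
  const session = await ai.live.connect({
    model: 'gemini-2.5-flash-live',                  // illustrative model id
    config: { responseModalities: [Modality.TEXT] }, // we only want text back
    callbacks: {
      onmessage: (msg) => { if (msg.text) onText(msg.text); },
      onerror: (e) => console.error('live session error', e),
      onclose: () => console.log('live session closed'),
    },
  });

  // PCM chunks from the Discord pipeline are forwarded as base64 and never written to disk.
  return (pcm: Buffer) =>
    session.sendRealtimeInput({
      audio: { data: pcm.toString('base64'), mimeType: 'audio/pcm;rate=16000' },
    });
}
```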

Echo fuses text sentiment and voice tension into a single server state, enabling interventions based on patterns, not isolated messages.
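
Conceptually, the fused state per channel looks something like the sketch below; the field names, weights, and thresholds are illustrative, not our exact schema.

```ts
// Illustrative shape of the fused per-channel state (not the exact production schema).
interface ChannelState {
  textSentiment: number; // -1 (hostile) .. +1 (positive), from batched Gemini text analysis
  voiceTension: number;  //  0 (calm)    ..  1 (heated),  from the Live audio session
  updatedAt: number;     // epoch ms of the last observation
}

// Combine both signals into one escalation score; the weights are assumptions.
function escalationScore(state: ChannelState): number {
  const textRisk = Math.max(0, -state.textSentiment); // only negative sentiment adds risk
  return 0.6 * state.voiceTension + 0.4 * textRisk;
}

// Echo acts on sustained patterns, e.g. a score staying above a threshold across
// several updates, rather than on any single message or utterance.
```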

Challenges we ran into

  • Real-time audio was difficult:
    • Discord audio arrives as Opus packets, while Gemini Live requires raw PCM
    • We built a custom decoding and buffering pipeline to avoid silent failures
  • Gemini Live disconnects after a few turns:
    • We implemented auto-reconnect with exponential backoff and rolling context summaries (see the reconnect sketch after this list)
  • Over-intervention harmed trust:
    • Early versions felt invasive
    • Confidence scoring, temporal decay, and cooldowns were required (see the decay and cooldown sketch after this list)
  • Safety cannot rely on AI alone:
    • We added multilingual regex-based safety detection that bypasses Gemini entirely (see the safety-check sketch after this list)
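
A minimal sketch of the reconnect loop: connectLive is a hypothetical helper that opens a fresh Live session seeded with the rolling context summary, and the backoff constants are illustrative.

```ts
// Reconnect with exponential backoff and jitter. connectLive() is a hypothetical
// helper that opens a new Gemini Live session seeded with a rolling context summary.
async function reconnectWithBackoff(connectLive: () => Promise<void>, maxAttempts = 6) {
  let delayMs = 500;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await connectLive();
      return; // connected
    } catch (err) {
      console.warn(`live reconnect attempt ${attempt} failed`, err);
      const jitter = Math.random() * 250;
      await new Promise((resolve) => setTimeout(resolve, delayMs + jitter));
      delayMs = Math.min(delayMs * 2, 15_000); // cap the backoff
    }
  }
  throw new Error('Gemini Live reconnect failed after maximum attempts');
}
```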
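
The restraint mechanics reduce to a decaying escalation score plus a per-channel cooldown; the half-life, cooldown window, and thresholds below are illustrative assumptions.

```ts
// Exponentially decay an escalation score toward zero between observations,
// and refuse to intervene again within a cooldown window. Constants are illustrative.
const HALF_LIFE_MS = 5 * 60_000; // tension halves after 5 quiet minutes
const COOLDOWN_MS = 10 * 60_000; // at most one intervention per channel per 10 minutes

function decayedScore(score: number, lastUpdateMs: number, nowMs: number): number {
  const halfLives = (nowMs - lastUpdateMs) / HALF_LIFE_MS;
  return score * Math.pow(0.5, halfLives);
}

function shouldIntervene(
  score: number,
  confidence: number,
  lastInterventionMs: number,
  nowMs: number,
): boolean {
  const onCooldown = nowMs - lastInterventionMs < COOLDOWN_MS;
  return !onCooldown && score > 0.7 && confidence > 0.8; // thresholds are assumptions
}
```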
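
The deterministic safety layer runs before any model call; the patterns below are harmless placeholders standing in for the real multilingual lists.

```ts
// Deterministic safety check that runs before (and independently of) any Gemini call.
// These patterns are harmless placeholders; the real lists are multilingual and curated.
const SAFETY_PATTERNS: RegExp[] = [
  /\bplaceholder-threat-term\b/iu,
  /\bplaceholder-self-harm-term\b/iu,
];

function isSafetyCritical(message: string): boolean {
  return SAFETY_PATTERNS.some((pattern) => pattern.test(message));
}

// If this returns true, Echo escalates straight to human moderators,
// regardless of what the probabilistic layer would have decided.
```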

Accomplishments that we’re proud of

  • Hearing Echo calmly de-escalate a live voice argument
  • Successfully linking text sentiment and voice tension into higher-quality decisions
  • Achieving truly ephemeral voice processing with zero storage
  • Building a dashboard that visualizes invisible community health in real time

What we learned

  • Facilitation is more effective than enforcement
  • Voice carries critical context that text alone misses
  • Longitudinal trends matter more than single messages
  • Safety systems must be deterministic, not probabilistic

What’s next

Near-term

  • Speaker diarization to identify who is escalating
  • User reputation and pattern tracking
  • Graduated automated actions (nudge → warning → mute)

Longer-term

  • Predictive intervention before conflicts erupt
  • Multi-turn mediation with follow-up
  • Community Health Graphs to detect cliques, bridges, and isolation
  • Expansion beyond Discord into Slack and live collaboration platforms

Built With

Node.js, Discord.js, MySQL, Gemini 3 Flash, Gemini 2.5 Flash Live