🙏 Seva Agent: Real-Time Autonomous Prayer Assistant

Inspiration

The Problem: 30M+ Sikhs attend prayer services recited in Punjabi and written in Gurmukhi script. Many in the younger generations understand spoken Punjabi but cannot read Gurmukhi or grasp the authentic spiritual meanings, so they sit through 2-3 hour services listening passively, without full engagement.

💡 Vision: What if AI could listen to live prayers and instantly display the original verses with English meanings, creating an immersive spiritual experience for everyone?

The Solution: A local AI agent that listens to live prayer recitation and keeps the projector display synchronized with the original Punjabi text and its English meaning. It turns passive listeners into active participants who can immerse themselves in the spiritual experience while improving their Punjabi reading skills.

🔧 How we built it

Architecture

Audio Input → ASR Engine → Ensemble Verse Matching → Desktop Control → Display Output

Core Components

1. Real-Time Speech Recognition

  • Fine-tuned ASR on curated Gurbani dataset: 60+ hours, 10+ epochs
  • Custom vocabulary/tokenizer constraining inference to domain-specific output
  • Preprocessed ground truth to match real-world recitation patterns
  • Alignment of ASR transcripts back to the original verses, so the display always presents the original content

2. Ensemble Verse Matching

  • Multi-algorithm real-time alignment: Fuzzy matching, LCS, SequenceMatcher, Levenshtein
  • Leading indicators for verse identification at recitation start
  • Consensus-based matching with confidence thresholds
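
As a rough illustration of consensus-based matching, the sketch below scores a noisy transcript against candidate verses with three independent similarity measures (Levenshtein, LCS, and difflib's SequenceMatcher) and accepts a verse only when enough scorers agree above a threshold. It uses only the Python standard library rather than RapidFuzz. The function names, the `min_score` and `min_votes` defaults, and the romanized sample verses are all assumptions made for the example, not the project's actual code or tuned values.

```python
from difflib import SequenceMatcher


def levenshtein_ratio(a: str, b: str) -> float:
    """Normalized Levenshtein similarity in [0, 1]; 1.0 means identical."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))


def lcs_ratio(a: str, b: str) -> float:
    """Longest-common-subsequence length divided by the longer string's length."""
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / max(len(a), len(b))


def sequence_ratio(a: str, b: str) -> float:
    """Similarity ratio from difflib's SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


SCORERS = (levenshtein_ratio, lcs_ratio, sequence_ratio)


def match_verse(transcript, verses, min_score=0.6, min_votes=2):
    """Return (verse_index, mean_score) or None when the scorers do not agree.

    min_score and min_votes are illustrative defaults, not tuned values.
    """
    if not verses:
        return None
    votes = {}
    for scorer in SCORERS:
        scores = [scorer(transcript, verse) for verse in verses]
        best = max(range(len(verses)), key=scores.__getitem__)
        if scores[best] >= min_score:  # low-confidence scorers abstain
            votes.setdefault(best, []).append(scores[best])
    if not votes:
        return None
    winner, winner_scores = max(votes.items(), key=lambda kv: len(kv[1]))
    if len(winner_scores) < min_votes:
        return None
    return winner, sum(winner_scores) / len(winner_scores)


# Hypothetical romanized verses and a noisy transcript, for illustration only.
VERSES = ["ik oankar sat naam", "karta purakh nirbhau", "akaal moorat ajooni"]
result = match_verse("karta purkh nirbho", VERSES)
```

Returning `None` instead of a best guess lets the caller keep the current display unchanged when confidence is low, which fits the zero-tolerance stance on showing the wrong sacred text.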

3. Autonomous Desktop Integration

  • AppleScript UI automation + OCR for SikhiToTheMax control
  • Socket.IO communication replacing manual typing/scrolling
  • Automated operator workflow: listen → search → display
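
To give a feel for the AppleScript side of the automation, here is a minimal sketch that builds a script to bring the display app to the front and type a search query, then runs it through macOS's `osascript` command. The helper names, the use of `"SikhiToTheMax"` as the application name, and the idea of typing directly into whichever field has focus are assumptions for illustration. The project's actual scripts, OCR checks, and Socket.IO messages are not shown here.

```python
import subprocess


def applescript_string(text: str) -> str:
    """Quote text as an AppleScript string literal (escape backslashes and quotes)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_search_script(query: str, app_name: str = "SikhiToTheMax") -> str:
    """Build AppleScript that activates the app and types the query.

    app_name is an assumed application name; the query goes to whatever
    UI element currently has keyboard focus.
    """
    return "\n".join([
        f"tell application {applescript_string(app_name)} to activate",
        'tell application "System Events"',
        f"    keystroke {applescript_string(query)}",
        "end tell",
    ])


def run_applescript(script: str) -> str:
    """Run the script with the osascript CLI (macOS only) and return its stdout."""
    result = subprocess.run(
        ["osascript", "-e", script],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Building the script has no side effects; only run_applescript touches the UI.
script = build_search_script('sat "naam"')
```

Keeping script construction separate from execution makes the quoting logic testable on any platform, while `run_applescript` would only be called on the Mac driving the projector.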

4. Smart Navigation

  • Anchor Mode: Initial positioning via consensus matching
  • Paath Mode: Sequential reading with drift detection
  • Leading Trigger: Predictive verse transitions
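
The three navigation behaviours above can be pictured as a small state machine. The sketch below is one possible reading of them, not the project's implementation: it starts in anchor mode, switches to paath (sequential) mode once a verse matches strongly, advances when the tail of the transcript resembles the opening words of the next verse, and falls back to anchor mode after repeated poor matches. The class name, thresholds, `lead_words`, and the romanized verses are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


class Navigator:
    """Tracks the current verse index; all thresholds are illustrative."""

    def __init__(self, verses, anchor_threshold=0.7, drift_threshold=0.4,
                 max_misses=3, lead_words=2):
        if not verses:
            raise ValueError("verses must not be empty")
        self.verses = verses
        self.anchor_threshold = anchor_threshold
        self.drift_threshold = drift_threshold
        self.max_misses = max_misses
        self.lead_words = lead_words
        self.mode = "anchor"
        self.position = None  # index of the verse on display, if any
        self.misses = 0

    def update(self, transcript: str):
        """Feed the latest transcript window; returns the current verse index."""
        if self.mode == "anchor":
            return self._anchor(transcript)
        return self._paath(transcript)

    def _anchor(self, transcript):
        # Anchor mode: search every verse and lock on only when the match is strong.
        scores = [similarity(transcript, verse) for verse in self.verses]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= self.anchor_threshold:
            self.position, self.mode, self.misses = best, "paath", 0
        return self.position

    def _paath(self, transcript):
        # Leading trigger: the tail of the transcript matches the next verse's opening.
        next_index = self.position + 1
        if next_index < len(self.verses):
            lead = " ".join(self.verses[next_index].split()[: self.lead_words])
            tail = " ".join(transcript.split()[-self.lead_words:])
            if similarity(tail, lead) >= self.anchor_threshold:
                self.position, self.misses = next_index, 0
                return self.position
        # Drift detection: repeated poor matches send us back to anchor mode.
        if similarity(transcript, self.verses[self.position]) < self.drift_threshold:
            self.misses += 1
            if self.misses >= self.max_misses:
                self.mode, self.misses = "anchor", 0
        else:
            self.misses = 0
        return self.position


# Hypothetical romanized verses, for illustration only.
nav = Navigator([
    "ik oankar sat naam",
    "karta purakh nirbhau nirvair",
    "akaal moorat ajooni saibhang",
])
first = nav.update("ik oankar sat naam")       # anchors on verse 0
second = nav.update("sat naam karta purakh")   # leading trigger moves to verse 1
```

In this sketch the last known position stays on screen while the navigator re-anchors, so a burst of noise does not blank or jump the display.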

Tech Stack

  • ASR: PyTorch, Transformers, HuggingFace, Wandb
  • Audio: SoundDevice, NumPy, soundfile, librosa
  • Matching: RapidFuzz, Levenshtein, difflib
  • Automation: PyTesseract, PIL, AppleScript, Socket.IO
  • Data: YouTube (yt-dlp), curated datasets, synthetic augmentation

🚧 Challenges we ran into

1. Low-Resource ASR: Limited Punjabi training data. Solution: Domain-specific fine-tuning + ensemble alignment techniques.

2. Real-Time Performance: Sub-second latency requirements. Solution: Sliding window processing + leading verse detection.

3. Verse Disambiguation: Similar phrases across verses. Solution: Contextual matching + drift monitoring.

4. API-less Integration: No SikhiToTheMax APIs. Solution: OCR + Socket.IO reverse engineering + AppleScript automation.

5. Sacred Text Accuracy: Zero tolerance for errors. Solution: Post-processing alignment ensuring original text preservation.
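
The sliding window processing mentioned under challenge 2 can be sketched as a generator that buffers incoming audio chunks and emits overlapping windows for the recognizer. This is a simplified, standard-library-only illustration: the function name and the tiny `window_size` and `hop_size` values are assumptions, and a real pipeline would more likely operate on NumPy arrays delivered by the audio callback.

```python
from collections import deque
from typing import Iterable, Iterator, List


def sliding_windows(chunks: Iterable[List[float]], window_size: int,
                    hop_size: int) -> Iterator[List[float]]:
    """Yield the latest `window_size` samples each time `hop_size` new samples arrive."""
    if window_size <= 0 or hop_size <= 0:
        raise ValueError("window_size and hop_size must be positive")
    buffer = deque(maxlen=window_size)  # oldest samples fall off automatically
    pending = 0  # samples received since the last emitted window
    for chunk in chunks:
        for sample in chunk:
            buffer.append(sample)
            pending += 1
            if len(buffer) == window_size and pending >= hop_size:
                pending = 0
                yield list(buffer)


# Toy stream: three chunks of "samples" with a 4-sample window and 2-sample hop.
stream = [[0, 1, 2], [3, 4, 5], [6, 7]]
windows = list(sliding_windows(stream, window_size=4, hop_size=2))
```

A hop smaller than the window means consecutive windows overlap, so a phrase cut off at the edge of one window appears whole in the next. That overlap is bought with extra recognizer calls, which is the trade-off against the latency budget.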

🧠 What we learned

Domain Adaptation: Base ASR models fail for low-resource languages. General Punjabi ASR ≠ Religious domain ASR. High-quality domain-specific data is essential.

Ensemble Methods: Noisy ASR requires multiple alignment techniques. Single algorithms fail; consensus delivers production reliability.

Real-World Performance: Lab performance ≠ production. Models trained on studio audio fail in halls with AV systems and ambient noise. Synthetic augmentation is crucial.

Cultural AI: Religious contexts demand strict accuracy standards, and the same standards carry over to medical and legal domains. AI for underrepresented minorities has significant community impact.

πŸ† Accomplishments that we're proud of

  • First autonomous Gurbani recognition system serving the global Sikh community
  • Inference alignment using an ensemble approach for real-time verse identification and synchronization
  • Zero operator dependency: fully autonomous projector display
  • Educational impact: Improved Punjabi literacy and spiritual engagement
  • Open source contribution: The agent is open source, and the model is hosted on Hugging Face.

🚀 What's next for Seva Agent

  • Fine-tune gpt-oss to augment the ensemble of techniques for leading verse identification
  • Federated learning across global deployments at Sikh temples (Gurdwaras)
  • Mobile integration for worldwide radio/internet listeners
  • Edge optimization: Quantization for limited compute with real-time latency
  • Multi-language translation (10+ languages) preserving original context
  • More modes, including keertan (recitation with singing and musical instruments)
  • Multi-modal learning: Audio + contextual signals (time, ceremony type)
  • Custom ChatGPTs leveraging gpt-oss for personalized religious chatbots

Impact: Demonstrates production AI for low-resource languages, cultural preservation, and autonomous religious technology applicable to education, healthcare, and legal domains requiring strict accuracy.
