Seva Agent: Real-Time Autonomous Prayer Assistant

Inspiration

The Problem: 30M+ Sikhs attend prayer services recited in Punjabi and written in the Gurmukhi script. Many in the younger generations understand spoken Punjabi but cannot read Gurmukhi or grasp the authentic spiritual meanings, so they listen passively through 2-3 hour services without full engagement.

💡 Vision: What if AI could listen to live prayers and instantly display the original verses with English meanings, creating an immersive spiritual experience for everyone?

The Solution: A local AI agent that listens to live prayer recitation and synchronizes the projector display with the original Punjabi text and its English meanings. It turns passive listeners into active participants who can immerse themselves in the spiritual experience while improving their Punjabi reading skills.

🔧 How we built it

Architecture

Audio Input → ASR Engine → Ensemble Verse Matching → Desktop Control → Display Output

Core Components

1. Real-Time Speech Recognition

Fine-tuned ASR on curated Gurbani dataset: 60+ hours, 10+ epochs
Custom vocabulary/tokenizer constraining inference to domain-specific output
Preprocessed ground truth to match real-world recitation patterns
Alignment of ASR transcripts back to the original verses, so the display always presents the original text rather than raw ASR output
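
The alignment step can be illustrated with a minimal sketch. It assumes the canonical verses are available as a list of strings and uses difflib's SequenceMatcher as the similarity measure; the function name snap_to_verse, the min_ratio threshold, and the placeholder strings standing in for verse text are all hypothetical, not the project's actual code.

```python
from difflib import SequenceMatcher


def snap_to_verse(transcript, verses, min_ratio=0.6):
    """Return (index, verse) for the closest canonical verse, or None.

    The noisy transcript is never displayed; only the canonical verse is.
    min_ratio is an assumed cut-off, not a value from the project.
    """
    best_index, best_ratio = None, 0.0
    for index, verse in enumerate(verses):
        ratio = SequenceMatcher(None, transcript, verse).ratio()
        if ratio > best_ratio:
            best_index, best_ratio = index, ratio
    if best_index is None or best_ratio < min_ratio:
        return None
    return best_index, verses[best_index]


# Placeholder strings stand in for canonical verse text.
canonical = ["alpha beta gamma delta", "epsilon zeta eta theta"]
match = snap_to_verse("alpha beta gama delta", canonical)
```

Because the function returns the canonical string instead of the transcript, an ASR spelling error (here "gama") never reaches the display.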

2. Ensemble Verse Matching

Multi-algorithm real-time alignment: Fuzzy matching, LCS, SequenceMatcher, Levenshtein
Leading indicators for verse identification at recitation start
Consensus-based matching with confidence thresholds
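
One way consensus matching with confidence thresholds could work is sketched below, using only the standard library. Three scorers (Levenshtein ratio, word-level LCS, and SequenceMatcher) each vote for their best verse, and a match is accepted only when enough scorers agree. The function names, the min_votes and min_score values, and the placeholder verse strings are assumptions for illustration; the project itself lists RapidFuzz and a Levenshtein library for this role.

```python
from collections import Counter
from difflib import SequenceMatcher


def levenshtein_ratio(a, b):
    """Similarity in [0, 1] derived from character-level edit distance."""
    if not a and not b:
        return 1.0
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return 1.0 - previous[-1] / max(len(a), len(b))


def lcs_ratio(a, b):
    """Word-level longest common subsequence, normalised by the longer input."""
    words_a, words_b = a.split(), b.split()
    if not words_a or not words_b:
        return 0.0
    previous = [0] * (len(words_b) + 1)
    for word_a in words_a:
        current = [0]
        for j, word_b in enumerate(words_b, start=1):
            if word_a == word_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1] / max(len(words_a), len(words_b))


def sequence_ratio(a, b):
    """difflib's matching-blocks similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


SCORERS = (levenshtein_ratio, lcs_ratio, sequence_ratio)


def consensus_match(transcript, verses, min_votes=2, min_score=0.6):
    """Return (verse_index, mean_score) if enough scorers agree, else None."""
    if not verses:
        return None
    votes = Counter()
    scores = {}
    for scorer in SCORERS:
        ranked = [(scorer(transcript, verse), index) for index, verse in enumerate(verses)]
        best_score, best_index = max(ranked, key=lambda pair: pair[0])
        if best_score >= min_score:  # a scorer below threshold abstains
            votes[best_index] += 1
            scores.setdefault(best_index, []).append(best_score)
    if not votes:
        return None
    winner, count = votes.most_common(1)[0]
    if count < min_votes:
        return None
    return winner, sum(scores[winner]) / len(scores[winner])


# Placeholder strings stand in for canonical verse text.
verses = ["alpha beta gamma delta", "alpha beta gamma omega", "sigma tau upsilon phi"]
result = consensus_match("alpha beta gama omega", verses)
```

Letting a low-scoring algorithm abstain instead of casting a weak vote means the system stays silent on ambiguous audio rather than displaying a wrong verse.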

3. Autonomous Desktop Integration

AppleScript UI automation + OCR for SikhiToTheMax control
Socket.IO communication replacing manual typing/scrolling
Automated operator workflow: listen → search → display
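
The search step of this workflow might look roughly like the sketch below, which builds an AppleScript snippet and runs it through the macOS osascript command. The application name string, the assumption that typing into the focused window performs a search, and the helper names are all hypothetical; the project's real automation also involves OCR and Socket.IO, which are not shown here.

```python
import subprocess

APP_NAME = "SikhiToTheMax"  # assumed application name as seen by macOS


def escape_applescript(text):
    """Escape backslashes and double quotes for an AppleScript string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_search_script(query, app_name=APP_NAME):
    """Build AppleScript that brings the app to the front and types a query."""
    return "\n".join([
        f'tell application "{escape_applescript(app_name)}" to activate',
        'tell application "System Events"',
        f'    keystroke "{escape_applescript(query)}"',
        'end tell',
    ])


def run_search(query):
    """Run the script via osascript (macOS only; needs Accessibility permission)."""
    return subprocess.run(
        ["osascript", "-e", build_search_script(query)],
        capture_output=True,
        text=True,
        check=False,
    )


script = build_search_script('a"b')
```

Keeping script construction separate from execution makes the generated AppleScript easy to inspect on any platform before it is run on the Mac that drives the projector.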

4. Smart Navigation

Anchor Mode: Initial positioning via consensus matching
Paath Mode: Sequential reading with drift detection
Leading Trigger: Predictive verse transitions
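
The interplay between Anchor Mode and Paath Mode can be sketched as a small state machine. This is an illustration under stated assumptions, not the project's implementation: the Navigator class, the max_misses tolerance, and the rule that only the current or next verse counts as "on track" are hypothetical, and the Leading Trigger is left out.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    ANCHOR = "anchor"  # searching the whole corpus for a starting position
    PAATH = "paath"    # following the recitation verse by verse


@dataclass
class Navigator:
    max_misses: int = 3   # hypothetical drift tolerance
    mode: Mode = Mode.ANCHOR
    position: int = -1    # index of the verse currently displayed
    misses: int = 0

    def update(self, matched_index):
        """Feed the latest match (verse index or None); return the verse to display."""
        if self.mode is Mode.ANCHOR:
            if matched_index is not None:
                self.position = matched_index
                self.mode = Mode.PAATH
                self.misses = 0
            return self.position
        # Paath mode: only the current verse or the next one is on track.
        if matched_index in (self.position, self.position + 1):
            self.position = matched_index
            self.misses = 0
        else:
            # A missing or out-of-sequence match counts toward drift.
            self.misses += 1
            if self.misses >= self.max_misses:
                self.mode = Mode.ANCHOR  # drift detected: re-anchor
                self.misses = 0
        return self.position


nav = Navigator()
shown = [nav.update(index) for index in (None, 5, 6, 40, 41, 42, 42)]
```

In this run the display holds verse 6 through three out-of-sequence matches, then falls back to Anchor Mode and repositions on the next match.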

Tech Stack

ASR: PyTorch, Transformers, HuggingFace, Wandb
Audio: SoundDevice, NumPy, soundfile, librosa
Matching: RapidFuzz, Levenshtein, difflib
Automation: PyTesseract, PIL, AppleScript, Socket.IO
Data: YouTube (yt-dlp), curated datasets, synthetic augmentation

🚧 Challenges we ran into

1. Low-Resource ASR: Limited Punjabi training data. Solution: Domain-specific fine-tuning + ensemble alignment techniques.

2. Real-Time Performance: Sub-second latency requirements. Solution: Sliding window processing + leading verse detection.

3. Verse Disambiguation: Similar phrases across verses. Solution: Contextual matching + drift monitoring.

4. API-less Integration: No SikhiToTheMax APIs. Solution: OCR + Socket.IO reverse engineering + AppleScript automation.

5. Sacred Text Accuracy: Zero tolerance for errors. Solution: Post-processing alignment ensuring original text preservation.
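
The sliding window processing mentioned in challenge 2 can be sketched as a generator over an audio buffer. The function name and the window and hop sizes are assumptions for illustration; the same slicing works on a plain list or a NumPy array.

```python
def sliding_windows(samples, window_size, hop_size):
    """Yield overlapping windows of window_size samples, advancing by hop_size.

    A trailing partial window is dropped rather than padded.
    """
    if window_size <= 0 or hop_size <= 0:
        raise ValueError("window_size and hop_size must be positive")
    for start in range(0, len(samples) - window_size + 1, hop_size):
        yield samples[start:start + window_size]


# Toy buffer: 10 samples, windows of 4 with a hop of 2.
windows = list(sliding_windows(list(range(10)), window_size=4, hop_size=2))
```

With 16 kHz audio, for example, a 3-second window and a 1-second hop would correspond to window_size=48000 and hop_size=16000. The overlap lets each ASR pass see enough context while new output still arrives once per hop.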

🧠 What we learned

Domain Adaptation: Base ASR models fail for low-resource languages. General Punjabi ASR ≠ Religious domain ASR. High-quality domain-specific data is essential.

Ensemble Methods: Noisy ASR requires multiple alignment techniques. Single algorithms fail; consensus delivers production reliability.

Real-World Performance: Lab performance ≠ production. Training on studio audio fails in halls with AV systems/ambient noise. Synthetic augmentation crucial.

Cultural AI: Religious contexts demand strict accuracy standards transferable to medical/legal domains. AI for underrepresented minorities has significant community impact.

🏆 Accomplishments that we're proud of

First autonomous Gurbani recognition system serving global Sikh community
Inference alignment: Ensemble approach for real-time verse identification and synchronization
Zero operator dependency - fully autonomous projector displays
Educational impact: Improved Punjabi literacy and spiritual engagement
Open source contribution: The agent is open source, and the model is hosted on Hugging Face

🚀 What's next for Seva Agent

Fine-tune gpt-oss: Augment the ensemble of techniques for leading verse identification
Federated learning across global deployments at Sikh temples (Gurdwaras)
Mobile integration for worldwide radio/internet listeners
Edge optimization: Quantization for limited compute with real-time latency
Multi-language translation (10+ languages) preserving original context
Additional modes, including keertan (recitation with singing and musical instruments)
Multi-modal learning: Audio + contextual signals (time, ceremony type)
Custom ChatGPTs leveraging gpt-oss for personalized religious chatbots

Impact: Demonstrates production AI for low-resource languages, cultural preservation, and autonomous religious technology applicable to education, healthcare, and legal domains requiring strict accuracy.

Built With

agent
asr
gpt-oss
openai
python

Updates

Jaspal Singh started this project — Sep 11, 2025 07:53 PM EDT
