EKTA

Inspiration

In Turkey, 3.6 million people rely on Turkish Sign Language (TID) as their primary means of communication. That left me with a simple question: why can't hearing people just speak to a TID translator instead of typing?

I analyzed 596 user reviews across four major TID mobile applications (Sesim Elim, TID3B Avatar, TID Sozluk, Isaret Dili Hareketli) and found that zero of them supported real-time speech input. Beyond that, 52% of all complaints were about technical instability — apps that simply wouldn't open or crashed on launch. The gap between what users needed and what existed was clear. EKTA was born from that gap.

What it does

EKTA (Erişilebilir Konuşma Tercüme Asistanı) is a real-time, emotion-aware speech-to-Turkish Sign Language translation system.

A hearing user speaks naturally in Turkish. EKTA transcribes the speech, detects the emotional tone, and displays the corresponding TID sign GIF sequences, all in under 3 seconds.

What makes EKTA unique is its emotion layer. The phrase "Gel buraya" (Come here) means something very different when spoken warmly versus angrily. Existing systems ignore this entirely. EKTA doesn't.

How we built it

EKTA uses a four-module architecture:

1. Speech Recognition
OpenAI Whisper Small (~460 MB, runs fully offline) transcribes Turkish audio captured at 16 kHz in 8-second windows, achieving 94.7% word accuracy.
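The windowing step can be sketched as follows. The 16 kHz rate and 8-second window come from the description above; the non-overlapping layout, the zero-padding of the last window, and the name `split_into_windows` are assumptions made for illustration, not necessarily how EKTA buffers audio. Each window would then be passed to a locally loaded Whisper Small model for transcription, which is not shown here.

```python
from typing import Iterator, List, Sequence

SAMPLE_RATE_HZ = 16_000  # capture rate stated in the write-up
WINDOW_SECONDS = 8       # window length stated in the write-up
WINDOW_SAMPLES = SAMPLE_RATE_HZ * WINDOW_SECONDS  # 128,000 samples per window


def split_into_windows(
    samples: Sequence[float],
    window: int = WINDOW_SAMPLES,
    pad_value: float = 0.0,
) -> Iterator[List[float]]:
    """Yield consecutive, non-overlapping windows of `window` samples.

    Assumption: the final, shorter window is padded with silence so that
    every chunk handed to the recognizer has the same length.
    """
    for start in range(0, len(samples), window):
        chunk = list(samples[start:start + window])
        if len(chunk) < window:
            chunk.extend([pad_value] * (window - len(chunk)))
        yield chunk
```

Fixed-length windows keep per-chunk latency predictable, at the cost of occasionally cutting a word at a window boundary; an overlapping layout would be one way to soften that.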

2. Three-Layer Multimodal Emotion Analysis
The core contribution — a weighted fusion of three modalities:

$$E_{final} = \alpha \cdot E_{audio} + \beta \cdot E_{text} + \gamma \cdot E_{rule}$$

$$\alpha = 0.20, \quad \beta = 0.50, \quad \gamma = 0.30$$

Layer 1 (Audio Prosody, α=0.20): librosa extracts RMS energy, zero-crossing rate, and mel-spectrograms to detect vocal intensity patterns
Layer 2 (BERT Turkish Sentiment, β=0.50): savasy/bert-base-turkish-sentiment-cased captures contextual semantic emotion from transcribed text
Layer 3 (Rule-Based Lexicon, γ=0.30): High-precision keyword matching across 40+ Turkish emotion terms (precision: 0.92)
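A minimal sketch of the fusion formula, assuming each layer has already been mapped to a score per emotion label over a shared label set. The weights are the ones given above; the label names, the example scores, and the final renormalization are illustrative assumptions rather than EKTA's actual code.

```python
from typing import Dict

Scores = Dict[str, float]

# Fusion weights from the write-up: audio, text (BERT), rule-based lexicon.
ALPHA, BETA, GAMMA = 0.20, 0.50, 0.30


def fuse_emotions(audio: Scores, text: Scores, rule: Scores) -> Scores:
    """Compute E_final = ALPHA*E_audio + BETA*E_text + GAMMA*E_rule per label."""
    labels = sorted(set(audio) | set(text) | set(rule))
    fused = {
        label: ALPHA * audio.get(label, 0.0)
        + BETA * text.get(label, 0.0)
        + GAMMA * rule.get(label, 0.0)
        for label in labels
    }
    # Assumption: renormalize so the fused scores can be shown as probabilities.
    total = sum(fused.values())
    if total > 0:
        fused = {label: score / total for label, score in fused.items()}
    return fused


def top_emotion(fused: Scores) -> str:
    """Return the label with the highest fused score."""
    return max(fused, key=fused.get)


# Hypothetical scores for "Gel buraya" spoken angrily.
example = fuse_emotions(
    audio={"angry": 0.6, "neutral": 0.4},
    text={"angry": 0.2, "neutral": 0.8},
    rule={"angry": 1.0},
)
```

With these made-up inputs, the angry score is 0.20·0.6 + 0.50·0.2 + 0.30·1.0 = 0.52 against 0.20·0.4 + 0.50·0.8 = 0.48 for neutral, so the prosody and lexicon layers together outvote a text model that leans neutral.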

3. TID Translation
A 2,000+ sign dictionary with Turkish character normalization, suffix stripping, and fuzzy matching (difflib, threshold=0.60).
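The lookup pipeline described above might look roughly like this. `difflib.get_close_matches` and the 0.60 cutoff are from the description; the three-entry dictionary, the suffix list, and the function names are hypothetical stand-ins for the real 2,000+ sign dictionary and EKTA's custom normalization rules.

```python
import difflib
from typing import Dict, Optional

# Map Turkish-specific letters to ASCII so dictionary keys compare reliably.
_TR_TO_ASCII = str.maketrans("çğıöşü", "cgiosu")

# Hypothetical, tiny suffix list (already normalized); a real one is far larger.
SUFFIXES = ("iyorum", "ecegim", "tim")

# Hypothetical sample of the sign dictionary: normalized word -> GIF file.
SIGN_DICTIONARY: Dict[str, str] = {
    "git": "git.gif",
    "gel": "gel.gif",
    "buraya": "buraya.gif",
}


def normalize(word: str) -> str:
    """Lowercase and fold Turkish characters to ASCII."""
    # Replace dotted capital I first; str.lower() would leave a combining dot.
    return word.replace("İ", "i").lower().translate(_TR_TO_ASCII)


def strip_suffix(word: str) -> str:
    """Remove the longest known suffix, keeping a stem of at least 2 letters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def lookup_sign(word: str, dictionary: Dict[str, str] = SIGN_DICTIONARY) -> Optional[str]:
    """Try an exact match, then the stripped stem, then a fuzzy match."""
    normalized = normalize(word)
    if normalized in dictionary:
        return dictionary[normalized]
    stem = strip_suffix(normalized)
    if stem in dictionary:
        return dictionary[stem]
    close = difflib.get_close_matches(stem, list(dictionary), n=1, cutoff=0.60)
    return dictionary[close[0]] if close else None
```

Here "gittim" reduces to the exact stem "git", while "gidiyorum" and "gideceğim" reduce to "gid" because of consonant softening, and only the fuzzy step recovers "git" from it. That is one reason a fuzzy fallback is useful on top of suffix stripping.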

4. Web Interface
Flask + Socket.IO for real-time bidirectional communication, with live emotion probability bar charts and synchronized GIF playback.

Challenges we ran into

Multimodal weight calibration: Finding optimal fusion weights (α, β, γ) required grid search over a manually labeled 100-sample validation set. Audio prosody alone performed poorly (45% accuracy) due to inter-speaker variability.
Turkish NLP specifics: Turkish is an agglutinative language — "gidiyorum," "gideceğim," and "gittim" all stem from "git." Suffix removal for dictionary matching required custom normalization beyond standard stemming tools.
Offline reliability vs. performance: Choosing Whisper Small over larger variants was a deliberate tradeoff — user feedback showed that existing apps failed due to network dependency. Local inference was non-negotiable.
Emotion ambiguity: Surprised and fearful emotions share similar prosodic profiles (high ZCR, variable energy), resulting in the lowest per-emotion F1-scores (0.70–0.72).
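The weight calibration mentioned in the first item can be sketched as a grid search over (α, β, γ) triples that sum to 1. The write-up only states that a grid search was run on a 100-sample labeled validation set; the step size, the use of plain accuracy as the criterion, the tie-breaking, and the two toy samples below are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]
Sample = Tuple[Scores, Scores, Scores, str]  # (audio, text, rule, true label)
Weights = Tuple[float, float, float]


def predict(weights: Weights, audio: Scores, text: Scores, rule: Scores) -> str:
    """Fuse the three layers with the given weights and return the top label."""
    alpha, beta, gamma = weights
    labels = sorted(set(audio) | set(text) | set(rule))
    return max(
        labels,
        key=lambda l: alpha * audio.get(l, 0.0)
        + beta * text.get(l, 0.0)
        + gamma * rule.get(l, 0.0),
    )


def accuracy(weights: Weights, samples: List[Sample]) -> float:
    """Fraction of samples whose fused prediction matches the label."""
    hits = sum(predict(weights, a, t, r) == label for a, t, r, label in samples)
    return hits / len(samples)


def grid_search_weights(samples: List[Sample], steps: int = 20) -> Tuple[Weights, float]:
    """Try every (alpha, beta, gamma) on a grid with alpha + beta + gamma = 1."""
    best_weights: Weights = (0.0, 0.0, 1.0)
    best_accuracy = -1.0
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j
            weights = (i / steps, j / steps, k / steps)
            acc = accuracy(weights, samples)
            if acc > best_accuracy:  # keep the first best triple on ties
                best_weights, best_accuracy = weights, acc
    return best_weights, best_accuracy


# Two toy validation samples in which the audio layer alone is misleading.
toy_samples: List[Sample] = [
    ({"happy": 0.9, "angry": 0.1}, {"happy": 0.1, "angry": 0.9}, {"angry": 1.0}, "angry"),
    ({"angry": 0.8, "happy": 0.2}, {"happy": 0.7, "angry": 0.3}, {"happy": 1.0}, "happy"),
]
best_weights, best_accuracy = grid_search_weights(toy_samples)
```

With a small validation set, many weight triples tend to tie on accuracy, so the tie-breaking rule and the grid step can matter as much as the search itself.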

Accomplishments that we're proud of

78% emotion recognition accuracy (F1: 0.76): 33 percentage points above the audio-only baseline (45%) and 8 percentage points above text-only
Sub-3-second end-to-end latency on consumer hardware
First real-time Turkish speech-to-TID system with integrated emotion analysis, according to our systematic literature review
Empirical user research foundation: 596 reviews, sentiment analysis, keyword frequency mapping — not just a technical demo

What we learned

User feedback is a goldmine for system design. The 25% "calismiyor" (not working) mention rate in TID3B Avatar reviews directly shaped our offline-first architecture decision.
Multimodal fusion genuinely outperforms any single modality — but only when weights reflect each modality's actual reliability, not equal distribution.
Sign language carries emotion through non-manual markers (facial expressions, body posture) that text-only systems completely discard. Bridging this gap is both a technical and a linguistic challenge.

What's next for EKTA

Expanded vocabulary: Partnership with TID linguists to grow beyond the current 2,000-sign dictionary (~10–15% of estimated TID vocabulary)
Emotion-driven sign parameters: Modifying signing velocity and intensity based on detected emotion, grounded in TID linguistic research
Bidirectional translation: Sign language recognition (camera input → spoken Turkish) to enable full two-way conversation
Real-world user testing: Evaluation with Turkish deaf community members — the most important validation step not yet completed
Submission to SIU 2026 (34th IEEE Signal Processing and Communications Applications Conference, Piri Reis University, Istanbul)

Built With

bert
css3
flask
flask-socketio
gtts
html
javascript
librosa
numpy
pandas
pyaudio
pygame
pymysql
python
pytorch
scikit-learn
sqlite
torchaudio
websocket
whisper

Updates

dogaulkua Ülkü started this project — Mar 29, 2026 08:50 AM EDT
