Nakama (仲間): The Language Sensei

Stop reading subtitles. Start seeing the language.


💡 Inspiration

Nakama means "friend" or "comrade" in Japanese. I learned this while watching One Piece, a show I have followed for nearly four years and over 1,100 episodes. My love for the series sparked a deep curiosity and excitement for learning the Japanese language.

My mother tongue is Marathi. Interestingly, I never formally studied Hindi in school; I acquired it naturally by watching dubbed cartoons like Shinchan, Ben 10, and Pokémon. This made me wonder: Why can’t I do the same with Japanese?

I realized the barrier was clarity. While watching with English subtitles and Japanese audio, I couldn't clearly map the spoken pronunciation to the translated meaning. I had already tried learning Japanese with Gemini (genuinely, not just because this is a Gemini hackathon), and I came to see that language acquisition essentially comes down to mastering grammar and building familiarity with vocabulary.

The fundamental hurdle is the structural difference:

  • English Grammar: Subject + Verb + Object (SVO)
  • Japanese Grammar: Subject + Object + Verb (SOV)

After 1,100 episodes, I already had the potential for a strong vocabulary. I realized that if I could see the English meaning and the Japanese phonetic pronunciation (Romaji) simultaneously—color-coded by grammatical role—I could learn Japanese passively while enjoying anime. That is why I built Nakama.


⚙️ What It Does

Nakama takes Japanese Romaji and English subtitles as input. It identifies the Subject, Verb, and Object in both sets of subtitles and synchronizes them using a shared color-coded system.

The output is a new, specialized subtitle file. When used, it displays both the English meaning and the Romaji pronunciation on-screen. Because both are color-coded, viewers can instantly identify how parts of a sentence translate and shift between the two languages.


🛠️ How We Built It

We utilized Gemini for the most critical task: Grammar Mapping.

Gemini receives both the Japanese and English sentences simultaneously. It determines the grammatical components (Subject, Object, and Verb), maps them across both languages, and assigns the appropriate colors.
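A sketch of what this mapping step might look like in code. The JSON shape below is purely illustrative (the project's actual prompt and response schema aren't shown in this writeup), and the sentence is a made-up example: the point is that the same grammatical role receives the same color in both subtitle tracks.

```python
import json

# Hypothetical example of the JSON we might ask Gemini to return for one
# subtitle pair. The schema and the sentence are illustrative assumptions.
SAMPLE_RESPONSE = json.dumps({
    "english": [
        {"text": "Luffy", "role": "subject"},
        {"text": "eats", "role": "verb"},
        {"text": "meat", "role": "object"},
    ],
    "romaji": [
        {"text": "Rufi wa", "role": "subject"},
        {"text": "niku wo", "role": "object"},
        {"text": "taberu", "role": "verb"},
    ],
})

# Shared color scheme: the same role gets the same color in both languages.
ROLE_COLORS = {"subject": "cyan", "object": "yellow", "verb": "green"}

def color_spans(response_json: str) -> dict:
    """Attach a color to every grammatical span in both subtitle tracks."""
    data = json.loads(response_json)
    return {
        lang: [(span["text"], ROLE_COLORS.get(span["role"], "white"))
               for span in spans]
        for lang, spans in data.items()
    }

spans = color_spans(SAMPLE_RESPONSE)
print(spans["romaji"])
# → [('Rufi wa', 'cyan'), ('niku wo', 'yellow'), ('taberu', 'green')]
```

Because the colors are keyed by role rather than by position, the SVO/SOV word-order difference doesn't matter: the cyan span in English always corresponds to the cyan span in Romaji.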

The Visual Grammar Scheme

  • Cyan: Subject (Who is performing the action)
  • Yellow: Object (The target of the action)
  • Green: Verb (The action itself)

The final output is generated in the Advanced SubStation Alpha (.ass) file format, which allows for precise positioning and distinct styling for dual-language learning.
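As a rough illustration of that last step, here is a minimal sketch of emitting one color-coded `.ass` Dialogue event. The helper and its arguments are hypothetical, not the project's actual code; it relies on standard `.ass` override tags: colors are BGR hex inside `{\c&HBBGGRR&}`, and `{\an8}` pins a line to the top of the screen so the Romaji track can sit above the English one.

```python
# .ass override-tag colors are BGR hex, so cyan (RGB 00FFFF) becomes FFFF00.
ASS_COLORS = {
    "subject": r"\c&HFFFF00&",  # cyan
    "object":  r"\c&H00FFFF&",  # yellow
    "verb":    r"\c&H00FF00&",  # green
}

def dialogue_line(start, end, spans, align_top=False):
    """Build one .ass Dialogue event from (text, role) spans (illustrative)."""
    pos = r"{\an8}" if align_top else ""  # \an8 = top-center alignment
    body = "".join(f"{{{ASS_COLORS[role]}}}{text} " for text, role in spans).strip()
    return f"Dialogue: 0,{start},{end},Default,,0,0,0,,{pos}{body}"

line = dialogue_line(
    "0:00:01.00", "0:00:04.00",
    [("Rufi wa", "subject"), ("niku wo", "object"), ("taberu", "verb")],
    align_top=True,
)
print(line)
# → Dialogue: 0,0:00:01.00,0:00:04.00,Default,,0,0,0,,{\an8}{\c&HFFFF00&}Rufi wa {\c&H00FFFF&}niku wo {\c&H00FF00&}taberu
```

In the real file, each subtitle pair would produce two such events with the same timestamps: a top-aligned Romaji line and a bottom-aligned English line.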


🚧 Challenges We Ran Into

  1. Token Limits & Context Window: Initially, providing entire subtitle files (around 20,000 tokens) caused the system to struggle. I solved this by implementing a pipeline that processes the files in 20-line chunks.
  2. Hallucinations vs. Input: In early tests, I asked the LLM to handle the chunking itself. However, I noticed missing sentences and inaccuracies. I discovered that the LLM was often relying on its internal knowledge of One Piece Episode 1 scenes rather than the specific input I provided. I refined the prompt to ensure the model strictly adheres to the provided subtitle data.
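The chunking step in (1) can be sketched in a few lines. This assumes the English and Romaji tracks are already line-aligned; the function name is my own, not necessarily the project's.

```python
CHUNK_SIZE = 20  # lines per request, per the pipeline described above

def chunk_subtitles(english_lines, romaji_lines, size=CHUNK_SIZE):
    """Yield aligned (english, romaji) chunks so each prompt stays small."""
    assert len(english_lines) == len(romaji_lines), "tracks must be line-aligned"
    for i in range(0, len(english_lines), size):
        yield english_lines[i:i + size], romaji_lines[i:i + size]

# Example: a 45-line episode becomes chunks of 20, 20, and 5 lines.
en = [f"en {i}" for i in range(45)]
ja = [f"ja {i}" for i in range(45)]
chunks = list(chunk_subtitles(en, ja))
print([len(c[0]) for c in chunks])  # → [20, 20, 5]
```

Doing the chunking in code rather than asking the model to split the file itself also sidesteps the hallucination problem in (2): every chunk sent to Gemini is guaranteed to come from the actual input.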

🏆 Accomplishments That We're Proud Of

I have successfully built a functional pipeline. By providing English and Japanese subtitles, the system generates a one_piece_episode_dual.ass file that serves as a real-time learning tool.

I shared this with my friends (my Nakama), and the response was incredible—they are excited to use it for their own language journeys. This logic can be applied to any language pair, potentially changing how we approach language acquisition.


📚 What We Learned

  • The Power of In-Context Learning: I learned that LLMs are incredibly capable of linguistic structural analysis, but they require strict "grounding" to prevent them from hallucinating from their training data.
  • Subtitle Engineering: I gained deep technical knowledge of the .ass (Advanced SubStation Alpha) format, specifically how to use override tags to control positioning, fade (\fad) effects, and multi-color layering within a single dialogue line.
  • Linguistic Mapping: I learned how to programmatically bridge the gap between SVO and SOV languages, identifying how particles like wa, ga, and wo act as anchors for meaning.
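As a toy illustration of that last point (not the project's actual Gemini-based mapper), here is a crude heuristic that uses those particles as anchors in a Romaji sentence:

```python
# Toy heuristic: in Romaji, the word before the particle "wa"/"ga" tends to
# be the subject, the word before "wo" (or "o") the object, and in SOV order
# the verb typically comes last. Real sentences need far more care than this.
def rough_roles(romaji_sentence: str) -> dict:
    words = romaji_sentence.split()
    roles = {}
    for i, w in enumerate(words):
        if w in ("wa", "ga") and i > 0:
            roles["subject"] = words[i - 1]
        elif w in ("wo", "o") and i > 0:
            roles["object"] = words[i - 1]
    if words:
        roles["verb"] = words[-1]
    return roles

print(rough_roles("Rufi wa niku wo taberu"))
# → {'subject': 'Rufi', 'object': 'niku', 'verb': 'taberu'}
```

A heuristic like this breaks on anything non-trivial (dropped particles, topic vs. subject, embedded clauses), which is exactly why the heavy lifting is delegated to Gemini.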

🚀 What's Next for Nakama: The Language Sensei

While the project started with anime, the goal is to expand to:

  • TV Series and Movies: Applying the mapping logic to live-action content in various global languages.
  • Automatic Synchronization: Developing an AI-driven tool that can automatically detect and correct timing offsets (like studio intros or ad breaks) without manual intervention.

The ultimate vision is for Nakama to be a tool used by anyone, anywhere, to learn any language they desire through the media they love.

