Inspiration
Language evolves in predictable ways over time, through geographic and cultural intermixing, changing speech patterns, and phonetic shifts. These patterns are well documented, yet there is no practical way to interact with past or future iterations of our common tongue: English.
What it does
Chrono_ is the first AI model you can speak to that spans the whole gamut of English history and its (predicted) future developments. We scraped word lists for ancestors and variants of the English language and embedded the words in a high-dimensional vector space (700+ dimensions) so that semantic meaning can be translated across languages. We built a local front end where you can speak or type English words and have them translated into your language of choice, with an audio model speaking the translation back to you. The pipeline combines a standardized phonetic representation (IPA), a semantic vector space of embeddings, and speech-to-text and text-to-speech models.
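To illustrate how a shared embedding space can be used for translation, here is a minimal sketch of a nearest-neighbour lookup by cosine similarity. The three-dimensional vectors, the word list, and the names `EMBEDDINGS`, `cosine`, and `translate` are made up for illustration; they are not our actual data or code, and real vectors would come from a multilingual encoder with hundreds of dimensions.

```python
import math

# Hypothetical toy embeddings keyed by (language, word).
# These values are invented for illustration only.
EMBEDDINGS = {
    ("US English", "water"): [0.90, 0.10, 0.00],
    ("US English", "fire"): [0.00, 0.90, 0.20],
    ("Old English", "wæter"): [0.88, 0.15, 0.05],
    ("Old English", "fȳr"): [0.05, 0.85, 0.25],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def translate(word, source, target, embeddings=EMBEDDINGS):
    """Return the target-language word whose vector is closest to the source word."""
    query = embeddings[(source, word)]
    candidates = [(w, vec) for (lang, w), vec in embeddings.items() if lang == target]
    best_word, _ = max(candidates, key=lambda item: cosine(query, item[1]))
    return best_word


print(translate("water", "US English", "Old English"))
```

With a real vocabulary of 200k+ words, a brute-force scan like this would likely be replaced by an approximate nearest-neighbour index, but the idea is the same.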
How we built it
Planning and Research - Gemini 1.5 with Deep Research
Data Collection - Scraped the Wiktionary API and parsed the data using the Claude Haiku API for Proto-Indo-European, Latin (incomplete), Old English, US English, British English and Toronto English
Embeddings - Using XLM-RoBERTa, we embedded all the words we collected from these languages (200k+ words)
Speech-to-text - WebKit Speech Recognition
Text-to-speech - Amazon Polly for IPA-to-audio translation
Front End - Flask server, HTML, CSS
Future Forecast - Tracked phonetic shifts over time using k-closest neighbours facilitated by ModernBERT
Development - Python, enabled by Cursor Composer and ChatGPT o1
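As a rough sketch of the IPA-to-audio step, an IPA transcription can be wrapped in an SSML `phoneme` tag, which Amazon Polly accepts when the request's text type is SSML. The helper name `ipa_to_ssml` and the sample transcription are illustrative assumptions, not our actual code; the Polly call itself is shown only as a comment because it needs AWS credentials.

```python
from xml.sax.saxutils import escape, quoteattr


def ipa_to_ssml(word, ipa):
    """Wrap a word and its IPA transcription in an SSML phoneme tag.

    Hypothetical helper: escapes the inputs so the result stays valid XML.
    """
    return (
        "<speak>"
        f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'
        "</speak>"
    )


ssml = ipa_to_ssml("water", "ˈwɔːtər")
print(ssml)

# Assumed usage with boto3 (not run here, requires AWS credentials):
#   import boto3
#   polly = boto3.client("polly")
#   response = polly.synthesize_speech(
#       Text=ssml, TextType="ssml", VoiceId="Joanna", OutputFormat="mp3"
#   )
#   audio_bytes = response["AudioStream"].read()
```

Building the SSML in one place keeps escaping consistent, which matters because IPA strings and historical spellings can contain characters that need care in XML attributes.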
Challenges we ran into
Data collection and parsing were often inaccurate, which led to the loss of useful Latin data
Too much fun
Difficulties converting IPA directly into a format useful for forecasting
Accomplishments that we're proud of
Bossed up
What we learned
The power of GenAI
What's next for Chrono_
Fine-tuning an LLM to work with our custom vocabulary, which contains many relatives of the English language
More accurate text-to-speech, trained on real audio data
Better, more accurate translation between languages