Chrono_

This is our UI
This is a dimensionality-reduced view (700+ dimensions -> 3 dimensions) of our vector embedding database.

Inspiration

Language evolves predictably over time, from geographic and cultural intermixing to shifts in speech patterns and phonetics. These patterns are well defined, and yet there is no reasonable way to interface with either past or future iterations of our common tongue: English.

What it does

Chrono_ is the first AI model you can speak to that runs the whole gamut of English history and (predicted) future developments. We scraped relatives and varieties of the English language and embedded their words in a high-dimensional vector space (700+ dimensions) as a means of carrying semantic meaning across languages. We built a local front end where you can speak or type English words and translate them into your language of choice, with an audio model speaking the translation back to you. We did this through a standardized phonetic representation (IPA), a semantic vector space of embeddings, and a combination of speech-to-text and text-to-speech models.
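One way to picture translation through a shared embedding space is a nearest-neighbour lookup by cosine similarity. The sketch below is a minimal illustration under stated assumptions, not the project's actual pipeline: the three-dimensional vectors, the `VOCAB` table and the `translate` helper are all hypothetical stand-ins for the 700+ dimensional embeddings and the real vocabulary.

```python
import math

# Toy 3-dimensional vectors standing in for the real 700+ dimensional
# embeddings. Every word and number here is illustrative, not project data.
VOCAB = {
    "US English": {"water": [0.9, 0.1, 0.0], "fire": [0.0, 0.9, 0.2]},
    "Old English": {"wæter": [0.88, 0.12, 0.05], "fȳr": [0.05, 0.85, 0.25]},
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def translate(word, source, target):
    """Return the target-language word whose vector is closest to `word`."""
    query = VOCAB[source][word]
    candidates = VOCAB[target]
    return max(candidates, key=lambda cand: cosine(query, candidates[cand]))


print(translate("water", "US English", "Old English"))
print(translate("fire", "US English", "Old English"))
```

With a real vocabulary of 200k+ words, a linear scan like this would likely be replaced by a vector index, but the idea stays the same: words with similar meanings sit close together regardless of which language they come from.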

How we built it

Planning and Research - Gemini 1.5 with Deep Research
Data Collection - Scraped the Wiktionary API and parsed the data using the Claude Haiku API for Proto-Indo-European, Latin (incomplete), Old English, US English, British English and Toronto English
Embeddings - Used XLM-RoBERTa to embed all the words we collected from these languages (200k+ words)
Speech-to-text - WebKit Speech Recognition
Text-to-speech - Amazon Polly for IPA-to-audio conversion
Front End - Flask server, HTML, CSS
Future Forecast - Tracked phonetic shifts over time using k-nearest neighbours, facilitated by ModernBERT
Development - Python, enabled by Cursor Composer and ChatGPT o1
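For the IPA-to-audio step, Amazon Polly accepts IPA through the SSML `<phoneme>` tag. The following is a minimal sketch of how that could look, assuming the `boto3` client and configured AWS credentials; the helper names `ipa_to_ssml` and `synthesize` and the choice of the `Joanna` voice are hypothetical and not taken from the project code.

```python
from xml.sax.saxutils import escape, quoteattr


def ipa_to_ssml(ipa, fallback_text):
    """Wrap an IPA transcription in an SSML <phoneme> tag.

    `fallback_text` is the plain spelling placed inside the tag.
    """
    return (
        "<speak>"
        f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>'
        f"{escape(fallback_text)}</phoneme>"
        "</speak>"
    )


def synthesize(ipa, fallback_text, voice_id="Joanna"):
    """Send the SSML to Amazon Polly and return MP3 bytes.

    Assumes boto3 is installed and AWS credentials are configured;
    the default voice is an arbitrary choice for illustration.
    """
    import boto3  # imported lazily so the SSML helper works without it

    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text=ipa_to_ssml(ipa, fallback_text),
        TextType="ssml",
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    return response["AudioStream"].read()


print(ipa_to_ssml("ˈwɔːtə", "water"))
```

Keeping the SSML construction separate from the network call makes the string easy to inspect before any audio is requested.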

Challenges we ran into

Data collection and parsing were often inaccurate, which led to the loss of useful Latin data
Too much fun
Difficulties converting IPA directly into a useful forecasting format

Accomplishments that we're proud of

Bossed up

What we learned

The power of GenAI

What's next for Chrono_

Fine-tuning an LLM to work with our custom vocabulary, which contains many relatives of the English language
More accurate text-to-speech, trained on real audio data
Better, more accurate translation between languages