Automated LLM translation benchmark

Inspiration

I have worked in the language industry all my life as a translator, terminologist, teacher and tech consultant. Getting an instant translation from a linguistic asset has become easy over the years, but a rationale and a benchmark for those translations are much harder to come by.

What it does

My application uses LLMs to translate text instantly across four languages (currently). It then ranks the three LLM translations and reports a confidence level for each. By giving a rationale for the ranking, it helps users make informed decisions about which output to trust.

How we built it

I connected text input boxes to three different LLMs, each of which produces its own translation. The three outputs are then fed to Claude, which compares them, ranks them and gives a rationale for the ranking.
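
The same flow could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the app itself was built in PartyRock, and every name here (`Translator`, `Judge`, `build_judge_prompt`, `run_benchmark`, the `model_a` to `model_c` labels) is hypothetical. Stub functions stand in for the real model calls so the sketch runs offline.

```python
from typing import Callable, Dict, Tuple

# Hypothetical signatures: a translator maps (text, target_language) to a
# translation, and a judge maps a prompt to a free-text verdict.
Translator = Callable[[str, str], str]
Judge = Callable[[str], str]


def build_judge_prompt(source: str, target_lang: str, candidates: Dict[str, str]) -> str:
    """Assemble one prompt asking the judge model to rank the candidates."""
    lines = [
        f"Source text: {source}",
        f"Target language: {target_lang}",
        "Candidate translations:",
    ]
    for name, translation in candidates.items():
        lines.append(f"- {name}: {translation}")
    lines.append(
        "Rank the candidates from best to worst, give a confidence level "
        "for each, and explain the ranking."
    )
    return "\n".join(lines)


def run_benchmark(
    source: str,
    target_lang: str,
    translators: Dict[str, Translator],
    judge: Judge,
) -> Tuple[Dict[str, str], str]:
    """Collect one translation per model, then ask the judge to compare them."""
    candidates = {
        name: translate(source, target_lang)
        for name, translate in translators.items()
    }
    verdict = judge(build_judge_prompt(source, target_lang, candidates))
    return candidates, verdict


# Stubs standing in for real model calls (assumptions, not real APIs).
def make_stub_translator(label: str) -> Translator:
    return lambda text, lang: f"[{label}:{lang}] {text}"


def stub_judge(prompt: str) -> str:
    # Counts the candidate lines in the prompt instead of calling a model.
    return f"(stub verdict based on {prompt.count('- model_')} candidates)"


translators = {f"model_{c}": make_stub_translator(c) for c in "abc"}
candidates, verdict = run_benchmark("Guten Morgen", "English", translators, stub_judge)
print(verdict)
```

Keeping the translators and the judge as plain callables means a new model can be added to the comparison by adding one entry to the `translators` dictionary, without touching the ranking step.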

Challenges we ran into

None

Accomplishments that we're proud of

It was easy to build. I showed it to an NLP developer and he was impressed.

What we learned

This was my first attempt at PartyRock, and I found it easy to use.

What's next for Automated LLM translation benchmark

I'd like to add models that are not yet available in PartyRock so that I can benchmark even more LLMs against each other. I am also thinking about building an app that compares source segments against each other (source vs. source), highlights the differences and shows translators which changes they need to address.

Built With

partyrock

Updates

Andreas Ljungström started this project — Feb 19, 2024 03:38 AM EST
