QuickGloss

Inspiration

Albert has worked as a documentary linguist before, and is particularly interested in severely under-resourced languages. He has always been frustrated with the lack of a user-friendly, semi-automated interface to help document and transcribe these oft-endangered languages. He has worked with tools like ELAN and FLEx in the past, but was always left unsatisfied with their poor user interface and the necessity for painstaking manual review.

What it does

QuickGloss aims to semi-automate the workflow behind the frustrations described above. We have three main features: speech to text, manual glossing, and, our centerpiece, automated glossing. Our speech to text takes an audio file of native speaker materials and automatically transcribes it. It then prompts the user to gloss that transcribed text. The user then uploads the required corpus of morphemes, including as many or as few properties as they would like, in a custom list format. Our program automatically detects which language is being used. The algorithm then matches the morphemes in the corpus against the ones in the provided sentences and marks them using standard Leipzig notation. Finally, for the morphemes not found in the corpus, it uses predictive ML techniques to extrapolate likely glosses based on morphological patterns in the language.

The manual glossing feature is built for under-resourced languages that have little to no readily available online data to train an ML model. The user simply uploads their desired text, morphemes, and meanings, and the program matches the words and morphemes, without machine learning. While we would like to add a trainable ML model for these languages, it is difficult given the scope of the hackathon.
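To make the non-ML matching step concrete, here is a minimal sketch of lexicon-based glossing. It is not the QuickGloss implementation: the greedy longest-match strategy, the function names `segment` and `gloss_sentence`, the `?` placeholder for unknown material, and the small Turkish lexicon are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical user-supplied lexicon: surface morpheme -> Leipzig-style gloss.
LEXICON: Dict[str, str] = {
    "ev": "house",
    "ler": "PL",
    "im": "1SG.POSS",
    "de": "LOC",
}


def segment(word: str, lexicon: Dict[str, str]) -> List[Tuple[str, str]]:
    """Split a word into (morpheme, gloss) pairs by greedy longest match.

    Characters that match no lexicon entry are collected and glossed "?".
    """
    pieces: List[Tuple[str, str]] = []
    unknown = ""
    i = 0
    while i < len(word):
        match = None
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in lexicon:
                match = word[i:j]
                break
        if match is None:
            unknown += word[i]
            i += 1
            continue
        if unknown:
            pieces.append((unknown, "?"))
            unknown = ""
        pieces.append((match, lexicon[match]))
        i += len(match)
    if unknown:
        pieces.append((unknown, "?"))
    return pieces


def gloss_sentence(sentence: str, lexicon: Dict[str, str]) -> Tuple[str, str]:
    """Return a hyphen-segmented morpheme line and its aligned gloss line."""
    morph_line, gloss_line = [], []
    for word in sentence.lower().split():
        pieces = segment(word, lexicon)
        morph_line.append("-".join(m for m, _ in pieces))
        gloss_line.append("-".join(g for _, g in pieces))
    return " ".join(morph_line), " ".join(gloss_line)


if __name__ == "__main__":
    morphs, glosses = gloss_sentence("Evlerimde", LEXICON)
    print(morphs)
    print(glosses)
```

Greedy matching is simple but can mis-segment when one morpheme is a prefix of another, so a real tool would likely need the user to confirm or correct each segmentation.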

How we built it

The backend is built using Python (Flask), which handles the majority of the algorithms and underlying logic. We used Whisper, an open-source transcription model, for speech to text. We used spaCy, a commonly used NLP tool, for the machine learning aspects.

The frontend is built using HTML/CSS and JavaScript.
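As a rough illustration of how a Flask route can connect the JavaScript frontend to the Python backend, the sketch below exposes a single JSON endpoint. The `/gloss` path, the `text` and `lexicon` payload fields, and the word-level lookup are hypothetical and stand in for whatever routes and logic QuickGloss actually uses.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/gloss", methods=["POST"])  # hypothetical route name
def gloss():
    # Parse the JSON body sent by the frontend; fall back to an empty dict.
    payload = request.get_json(silent=True) or {}
    text = payload.get("text", "")
    lexicon = payload.get("lexicon", {})
    if not text:
        return jsonify(error="missing 'text'"), 400

    # Placeholder logic: look up each whitespace-separated token,
    # marking anything absent from the lexicon with "?".
    tokens = text.split()
    glosses = [lexicon.get(token, "?") for token in tokens]
    return jsonify(tokens=tokens, glosses=glosses)


if __name__ == "__main__":
    app.run(debug=True)
```

On the frontend side, a page script would POST a JSON body to this route and render the returned `tokens` and `glosses` arrays as aligned lines.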

Challenges we ran into

None of us had worked with Flask before, but we saw it as an opportunity to learn a new framework. There was certainly a learning curve, which resulted in some friction between backend and frontend development. On numerous occasions, we had to adjust certain aspects of the site to resolve this friction. Once we learned how routes coordinate the two together, it became simpler. We also spent way too much time researching APIs or glossing dictionaries, a task which could have (and should have!) been done before the hackathon date.

Accomplishments that we're proud of

We're proud of the accuracy of our model. Going in, we didn't really have high expectations for what our necessarily rudimentary model was going to output, but we were pleasantly surprised with the result. While the actual segmentation has a few errors here and there (as is inevitable for predictive models), it is, for the most part, accurate.

What we learned

Our team consists of programmers, who don't know that much linguistics, and a linguist, who doesn't know that much programming. Each learned from the other: our programmers took home a number of linguistic terms and concepts, while our linguist learned a bit about how to apply his experience computationally.

What's next for QuickGloss

Looking ahead, we plan to add more accommodations for severely under-resourced languages. These include developing a user-friendly interface for uploading parallel corpora or pre-tagged texts to bootstrap language-specific models, and integrating a training interface for linguists to “teach” the system over time. We would also like to add crowd-sourcing, allowing native speakers to directly contribute glosses and correct errors.

We also plan to integrate front-end frameworks like React or Vue.js to make the interface more modular and scalable. This would enable more advanced features like real-time collaboration or integration with glossing libraries and dictionaries through API calls.

Built With

css
flask
html
javascript
langdetect
python
spacy

Submitted to

LingHacks VI
- Winner Best Use of AI/ML
- Winner Top 10 Teams

Created by

My main job was to organize, ideate, and present QuickGloss. This was an amazing opportunity to see my interests in other fields of linguistics reflected in computational linguistics!

Albert Ko
I was the principal back-end developer, and contributed to the front-end a little bit as well. While I am familiar with Python, I can't say the same for Flask, so I definitely learned a lot on that front. I also had to deal with some extremely complicated JavaScript logic, but I think I did a decent job simplifying it.

Jacob Aronow
My main job was to develop the front-end for QuickGloss. Using JavaScript, HTML, and CSS, I integrated our backend logic into a user-friendly interface, ensuring a cohesive experience. I also contributed a little to the backend flow.

Matthew Hawkes

Updates

Albert Ko started this project — Jun 15, 2025 01:00 PM EDT
