Inspiration
A famous man once said, "why use many words when few do trick" (Kevin, The Office, S08E02).
Learning is such an important part of life, whether through experiences or through tales. Dyslexic students are often at a much greater disadvantage in the daily struggle to comprehend a world made for people who have no difficulty reading. In a lot of these cases, presenting the text slightly differently (for example, changing the background colour or the font size) can make a huge difference to how easy it is for dyslexic people to understand it. If only there existed some software that could do this and make reading easier for dyslexic people...
Well, now there is.
What it does
The application obtains the HTML version of large text documents, processes them, splits them into batches and then stochastically sends them to a Unity server, which visualises these text batches in a VR environment. The VR application lets users select preferences that help them read and process information, e.g. font size, background colour and line spacing. Because different dyslexic people need different things to read text comfortably, making these features personalisable means that every person can be catered for.
On top of this, K.E.V.I.N.(S) integrates a large language model that provides a condensed summary of whichever portion of the document the user is currently reading, displayed off to the side. A user can read a paragraph of text and then look to the side to find a condensed summary that gives them confidence in their understanding. The large language model runs dynamically and in sync with the Unity server, so each time the user turns the virtual page or chooses a visualisation preference for it, the same is applied to the condensed summary.
How we built it
The data processing was done using the HTML Parser module in Python. The output of this processing is saved as JSON, which a C# script then loads into Unity to be visualised as content. From here, several C# server scripts work in tandem to provide an interface for the user to interact with, including forward and back buttons and a main menu page.
The large language model component was developed using the OpenAI API in Python. This API interacts with ChatGPT to produce a summary of the text fed to the model on the server, which is then sent back to the Unity client over UDP, using networking code written in Python.
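As a rough illustration of this parsing step, here is a minimal sketch that assumes the standard-library `html.parser` module and assumes the body text lives in `<p>` elements. The class name `ParagraphExtractor`, the function `html_to_json` and the `"paragraphs"` JSON key are hypothetical names chosen for this example, not the project's actual code.

```python
import json
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text of each <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # finished paragraphs, in document order
        self._in_p = False     # True while we are inside a <p> element
        self._buffer = []      # text fragments of the current paragraph

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


def html_to_json(html: str) -> str:
    """Return a JSON string holding the paragraphs found in `html`."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return json.dumps({"paragraphs": parser.paragraphs}, indent=2)
```

The resulting JSON string could then be written to a file for the Unity side to load. Text inside nested inline tags such as `<b>` is kept, while headings and other elements outside `<p>` are ignored in this sketch.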
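The Python-to-Unity hand-off described above could look roughly like the following sketch. It assumes the summary text has already been produced by the language model, and that each summary is sent as a single JSON datagram. The function names, the message fields (`page`, `summary`) and the size limit are assumptions made for illustration and are not taken from the project.

```python
import json
import socket

# Assumed conservative payload limit so a message fits in one datagram;
# this value is an illustration, not a figure from the original project.
MAX_DATAGRAM_BYTES = 1400


def encode_summary(page: int, summary: str) -> bytes:
    """Pack a page number and its summary into one UTF-8 JSON datagram."""
    payload = json.dumps({"page": page, "summary": summary}).encode("utf-8")
    if len(payload) > MAX_DATAGRAM_BYTES:
        raise ValueError("summary too long for a single datagram")
    return payload


def send_summary(sock: socket.socket, address, page: int, summary: str) -> int:
    """Send one summary to `address` (host, port); returns the bytes sent."""
    return sock.sendto(encode_summary(page, summary), address)
```

A sender would create its socket with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` and pass the Unity client's host and port as `address`. UDP gives no delivery guarantee, so a real system would probably want the client to re-request a summary that never arrives.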
Challenges we ran into
Initially, we intended to use QRT's large language model server, as it is better suited to computers with lower processing capabilities. Unfortunately, this server crashed at 1 AM and did not return until 9:37 AM, meaning that we had to find another large language model and rework the integration between our different components.
Additionally, splitting text into batches proved much more difficult than expected, as there was often a trade-off between a batch size that suits the display and semantic correctness. For example, batches often cut off a sentence in the middle, which meant that we had to be careful about where we sliced our text document.
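One way to avoid cutting sentences in half is to pack whole sentences greedily into batches up to a size limit. The sketch below is an assumption about how this could be done rather than the project's actual splitter: `split_into_batches` and `max_chars` are hypothetical names, and the sentence boundary rule (whitespace after `.`, `!` or `?`) is deliberately naive, so abbreviations such as "e.g." will be treated as sentence ends.

```python
import re

# Naive sentence boundary: whitespace that follows '.', '!' or '?'.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_into_batches(text: str, max_chars: int = 400) -> list[str]:
    """Group whole sentences into batches of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept intact in its own
    batch rather than being cut in the middle.
    """
    sentences = [s for s in SENTENCE_END.split(text.strip()) if s]
    batches = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current batch.
            batches.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches
```

The trade-off mentioned above shows up in `max_chars`: a small value gives pages that are easier to read but uneven in length, while a large value gives fuller pages at the cost of more text on screen at once.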
Accomplishments that we're proud of
This project combines several areas of computer science and shows that, when they are integrated with the user in mind, it is possible to make an incredible piece of software that could help millions.
We're proud of:
- channelling our knowledge of C# into something a lot larger
- learning about parameters in large language models (despite never having used them before)
- being able to integrate technologies running in Python and C# using UDP communication
- tackling the steep learning curve of Unity and C# to successfully integrate data processing files into a VR environment
- working together, managing our time and effectively decomposing such an (initially daunting) project into tasks according to each member's skill set
What we learned
As mentioned in the challenges we ran into, it can often be difficult to find a balance between what's nice and what's effective. We've learnt to make the right decisions and not to be afraid to explore various paths. We also learnt an awful lot about large language models and have enhanced our understanding of Unity and VR applications.
What's next for K.E.V.I.N.(S)
In the future, we hope to implement a text-to-speech feature accompanied by word highlighting, to make the process of retaining attention and reading complex documents even easier.
We also hope to make the data processing pipeline more generalisable, as at the moment the types of document we can cover are limited to HTML versions. We also hope to add more personalisable features, e.g. rather than having set font sizes, introducing a scale so that users have even more variation.
We also hope to make better use of storage facilities such as Google Cloud rather than storing several JSON files locally, as this is not feasible for multiple documents.
And lastly, as is the case with all large language models and machine learning, optimising the hyperparameters through further tuning and metaheuristic analysis to improve the performance of our model will be on our to-do list.