Team100_1J_EverLingo
Inspiration
We were inspired by how language is taught to us as we grow.
When we were babies, our parents would point at tangible objects so we could associate them with words.
When we were kids and teenagers, we read, heard and shared stories that engaged us and sparked our imaginations, spurring our vocabulary growth through imaginative, third-person lenses.
As we aged, relationships became increasingly important, and speaking to each other directly made us better orators as language changed over time.
Stephen Krashen's Input Hypothesis posits that input through storybooks is more effective than rote copying from a textbook, because you seek meaning from the work (an internal driver). Merrill Swain's Output Hypothesis posits that the drive to communicate complex information is another internal driver of language learning, and one that requires interaction.
This is why we feel that run-of-the-mill daily active-recall apps such as DuoLingo are insufficient for language learning: they lack interaction, and words lack context when they are not part of a larger body of text.
What it does
We present EverLingo. Inspired by how we learnt languages growing up, it has three key features:
An image-based dictionary: Learn complex words and ideas through AI-generated visual aids. For example, the feeling of anger might be represented by a red painting, growth by a sapling and passion by a burning heart. Associate new words with these images.
Choose-your-own-adventure story generator: Insert yourself into stories and entertain your inner child. You may provide details about yourself, and a retrieval-augmented generation (RAG) model will fetch relevant data quickly so you can read the next riveting arc. Visualise passages of the text with imagery.
Scenario practice: Talk to another person directly in the language you are trying to learn or improve. AI-generated text-to-speech adds gravity to words and improves intonation, so you can grasp what a native speaker actually sounds like. Adjust the difficulty and the setting, and change your speaking partner.
How we built it
Frontend: JavaScript (React.js) deployed on GitHub Pages; React was chosen for speed of development.
Backend: Python (FastAPI) deployed on Render; FastAPI's asynchronous calls are a good fit for our I/O-heavy operations.
Database: AWS DynamoDB and Azure Cognitive Search; DynamoDB for NoSQL data management, and Azure Cognitive Search indexes for vector search capabilities.
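A minimal sketch of why asynchronous handling suits these I/O-heavy operations — plain asyncio rather than our actual FastAPI routes, with a hypothetical `fetch_translation` standing in for an external API call:

```python
import asyncio
import time

async def fetch_translation(word: str) -> str:
    # Stand-in for an I/O-bound external call (e.g. an LLM request);
    # asyncio.sleep simulates network latency without blocking the loop.
    await asyncio.sleep(0.1)
    return f"translated:{word}"

async def handle_request(words: list[str]) -> list[str]:
    # Launch all external calls concurrently instead of sequentially,
    # so total latency is roughly one call, not the sum of all calls.
    return await asyncio.gather(*(fetch_translation(w) for w in words))

start = time.perf_counter()
results = asyncio.run(handle_request(["hello", "world", "friend"]))
elapsed = time.perf_counter() - start
print(results)
print(elapsed < 0.3)  # concurrent, so well under the 0.3s a sequential version would take
```

In an actual FastAPI handler the same pattern applies: declaring the route `async def` lets the server interleave many such waiting requests on one worker.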
Cloud Platforms Used: AWS, Azure and Render.
Other APIs Used:
- OpenAI: GPT, DALL·E, embeddings and text-to-speech capabilities
- HuggingFace: explored candidate models
Challenges we ran into
Images: Images take a long time to generate, which may hinder UX. We use asynchronous processing and inform the user when the image is ready.
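The pattern can be sketched roughly as follows — plain asyncio in place of our actual FastAPI background tasks, with an illustrative in-memory job store:

```python
import asyncio

# Hypothetical in-memory job store standing in for our real backend state.
jobs: dict[str, str] = {}

async def generate_image(job_id: str) -> None:
    # Stand-in for a slow image-generation call; it runs in the
    # background while the original request has already returned.
    jobs[job_id] = "pending"
    await asyncio.sleep(0.05)
    jobs[job_id] = "ready"

async def main() -> None:
    # Kick off generation without awaiting it, as a handler would via
    # background tasks, then let the client poll for the result.
    task = asyncio.create_task(generate_image("img-1"))
    assert jobs.get("img-1") != "ready"  # response can go out before the image exists
    await task
    print(jobs["img-1"])  # ready

asyncio.run(main())
```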
Stories: Stories may grow too large, exceeding the token limit of the models. To overcome this, we use RAG: a LangChain summariser agent condenses the key points of the story, the OpenAI embedder embeds them, and the result is stored in our vector database (Azure Cognitive Search). When a query is made to continue the story, the relevant documents are retrieved to provide context to the LLM quickly without exceeding its token limit. The IDs of the vector documents are recorded and garbage-collected as appropriate.
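A toy sketch of the retrieval step, with bag-of-words vectors in place of OpenAI embeddings and a Python list in place of Azure Cognitive Search (all names and example summaries are illustrative):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector standing in for a real OpenAI embedding.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Summaries of earlier story chapters, as a summariser agent might produce them.
summaries = [
    "the hero found a dragon in the cave",
    "the market sold bread and spices",
    "the hero forged a sword of silver",
]
vectors = [embed(s) for s in summaries]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Rank stored summaries by similarity to the query and return the
    # top k as compact context for the LLM, instead of the full story.
    q = embed(query)
    ranked = sorted(zip(summaries, vectors), key=lambda sv: cosine(q, sv[1]), reverse=True)
    return [s for s, _ in ranked[:k]]

context = retrieve("the hero returns to the dragon")
print(context[0])  # the dragon chapter is the closest match
```

The design point is the same as in the real pipeline: only the few most relevant summaries, not the whole story, are sent to the model, keeping each continuation request under the token limit.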
Scenarios: Providing enough customisation while keeping the experience focused on learning.
Accomplishments that we're proud of
Retrieval-augmented generation implemented with Azure Cognitive Search and LangChain for stories and user-provided details
Deployment on GitHub Pages and Render, as well as JWT token implementation
Tenacity in picking up new things and reaching agreements as a team
What we learned
- Retrieval Augmented Generation
- Effective JWT Tokens
- Effective asynchronous calls
- Websocket connections
- TTS from LLM models
- Translation from LLM models
- Image generation from LLM models
What's next for EverLingo
- Centralise cloud resources with one provider, for example choosing AWS and using Amazon Kendra instead of Azure Cognitive Search for retrieval in RAG.
- Upgrade server instances and cloud resources.
- Internalise some LLM functionality through Llama models.
- Fine-tuning AI for higher quality output
- Inviting more users
Complexities Covered
- Image generation
- Text Generation
- AI Agents and ChatBot
- Speech Generation (TTS)
Built With
- amazon-dynamodb
- amazon-web-services
- asynch-programming
- azure-cognitive-search
- fast-api
- github
- langchain
- pydantic
- python
- react
- render-servers
- retrieval-augmented-generation(rag)