Team100_1J_EverLingo
Inspiration
We were inspired by how language is taught to us as we grow.
When we were babies, our parents would point at tangible objects so we could associate them with words.
When we were kids and teenagers, we read, heard and shared stories that engaged us and sparked our imaginations, spurring our vocabulary growth through imaginative, third-person lenses.
As we aged, relationships became increasingly important, and speaking to each other directly made us better orators as language changed over time.
Stephen Krashen's Input Hypothesis posits that input through storybooks is more effective than rote copying from a textbook, because you seek meaning from the work (an internal driver). Merrill Swain's Output Hypothesis posits that the drive to communicate complex information is another internal driver of language learning, and one that requires interaction.
This is why we feel that run-of-the-mill daily active-recall apps such as DuoLingo are insufficient for language learning: they lack interaction, and words lack context when they are not part of a larger body of text.
What it does
We present EverLingo. Inspired by how we learnt languages growing up, it has three key features:
An image-based dictionary: Learn complex words and ideas through AI-generated visual aids. For example, the feeling of anger might be represented by a red painting, growth by a sapling and passion by a burning heart. Associate new words with these images.
Choose-your-own-adventure story generator: Insert yourself into stories and entertain your inner child. You may provide details about yourself, and a retrieval-augmented generation (RAG) model will fetch relevant data quickly so you can read the next riveting arc. Visualise passages of the text with imagery.
Scenario practice: Talk to another person directly in the language you are trying to learn or improve. AI-generated text-to-speech adds gravity to words and improves intonation, so you can grasp what a native speaker actually sounds like. Adjust the difficulty and the setting, and change your speaking partner.
How we built it
Frontend: JavaScript (React.js) deployed on GitHub Pages; React was chosen for speed of development.
Backend: Python (FastAPI) deployed on Render; FastAPI's asynchronous calls are a good fit for our I/O-heavy operations.
Database: AWS DynamoDB and Azure Cognitive Search; DynamoDB for NoSQL data management, and Azure Cognitive Search indexes for vector search capabilities.
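A minimal sketch of why asynchronous handling suits these I/O-heavy operations — plain asyncio rather than our actual FastAPI routes, with a hypothetical `fetch_translation` standing in for an external API call:

```python
import asyncio
import time

async def fetch_translation(word: str) -> str:
    # Stand-in for an I/O-bound external call (e.g. an LLM request);
    # asyncio.sleep simulates network latency without blocking the loop.
    await asyncio.sleep(0.1)
    return f"translated:{word}"

async def handle_request(words: list[str]) -> list[str]:
    # Launch all external calls concurrently instead of sequentially,
    # so total latency is roughly one call, not the sum of all calls.
    return await asyncio.gather(*(fetch_translation(w) for w in words))

start = time.perf_counter()
results = asyncio.run(handle_request(["hello", "world", "friend"]))
elapsed = time.perf_counter() - start
print(results)
print(elapsed < 0.3)  # concurrent, so well under the 0.3s a sequential version would take
```

In an actual FastAPI handler the same pattern applies: declaring the route `async def` lets the server interleave many such waiting requests on one worker.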
Cloud Platforms Used: AWS, Azure and Render.
Other APIs Used:
- OpenAI: GPT, DALL·E, embeddings and text-to-speech capabilities
- HuggingFace: explored candidate models
Challenges we ran into
Images: Images take a long time to generate, which may hinder UX. We use asynchronous processing and inform the user when the image is ready.
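The pattern can be sketched roughly as follows — plain asyncio in place of our actual FastAPI background tasks, with an illustrative in-memory job store:

```python
import asyncio

# Hypothetical in-memory job store standing in for our real backend state.
jobs: dict[str, str] = {}

async def generate_image(job_id: str) -> None:
    # Stand-in for a slow image-generation call; it runs in the
    # background while the original request has already returned.
    jobs[job_id] = "pending"
    await asyncio.sleep(0.05)
    jobs[job_id] = "ready"

async def main() -> None:
    # Kick off generation without awaiting it, as a handler would via
    # background tasks, then let the client poll for the result.
    task = asyncio.create_task(generate_image("img-1"))
    assert jobs.get("img-1") != "ready"  # response can go out before the image exists
    await task
    print(jobs["img-1"])  # ready

asyncio.run(main())
```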
Stories: Stories may grow too large, exceeding the token limit of the models. To overcome this, we use RAG: a LangChain summariser agent condenses the key points of the story, the OpenAI embedder embeds them, and the result is stored in our vector database (Azure Cognitive Search). When a query is made to continue the story, the relevant documents are retrieved to provide context to the LLM quickly without exceeding its token limit. The IDs of the vector documents are recorded and garbage-collected as appropriate.
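A toy sketch of the retrieval step, with bag-of-words vectors in place of OpenAI embeddings and a Python list in place of Azure Cognitive Search (all names and example summaries are illustrative):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector standing in for a real OpenAI embedding.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Summaries of earlier story chapters, as a summariser agent might produce them.
summaries = [
    "the hero found a dragon in the cave",
    "the market sold bread and spices",
    "the hero forged a sword of silver",
]
vectors = [embed(s) for s in summaries]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Rank stored summaries by similarity to the query and return the
    # top k as compact context for the LLM, instead of the full story.
    q = embed(query)
    ranked = sorted(zip(summaries, vectors), key=lambda sv: cosine(q, sv[1]), reverse=True)
    return [s for s, _ in ranked[:k]]

context = retrieve("the hero returns to the dragon")
print(context[0])  # the dragon chapter is the closest match
```

The design point is the same as in the real pipeline: only the few most relevant summaries, not the whole story, are sent to the model, keeping each continuation request under the token limit.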
Scenarios: Providing enough customisation while keeping the experience focused on learning.
Accomplishments that we're proud of
Retrieval-augmented generation implemented with Azure Cognitive Search and LangChain for stories and user-provided details
Deployment on GitHub Pages and Render, as well as JWT token implementation
Tenacity in picking up new things and reaching agreements as a team
What we learned
- Retrieval Augmented Generation
- Effective JWT Tokens
- Effective asynchronous calls
- Websocket connections
- TTS from LLM models
- Translation from LLM models
- Image generation from LLM models
What's next for EverLingo
- Centralise cloud resources with one provider, for example choosing AWS and using Amazon Kendra instead of Azure Cognitive Search for retrieval in RAG.
- Upgrade server instances and cloud resources.
- Internalise some LLM functionality through Llama models.
- Fine-tuning AI for higher quality output
- Inviting more users
Complexities Covered
- Image generation
- Text Generation
- AI Agents and ChatBot
- Speech Generation (TTS)
Built With
- amazon-dynamodb
- amazon-web-services
- asynch-programming
- azure-cognitive-search
- fast-api
- github
- langchain
- pydantic
- python
- react
- render-servers
- retrieval-augmented-generation(rag)