Inspiration
Large language models (LLMs) have gained widespread attention in recent years for their remarkable ability to generate human-like text. One significant challenge when working with LLMs is their limited context length.
Tasks such as summarizing textbooks or other long documents require a much longer context. One possible solution is to fine-tune the entire model on a new dataset.
This approach performs well, but full fine-tuning requires substantial GPU memory and time. In addition, the fine-tuned weights differ for each task, and distributing a separate full copy of the weights for every task is impractical because of their size.
What it does
To overcome the limited context length, we propose adding extra layers alongside the LLM and fine-tuning only these added weights while keeping the original weights frozen.
The added layers are similar in structure to the original model but have very few parameters compared to the original network.
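The writeup does not specify the exact form of the added layers, so the following is only a minimal PyTorch sketch of the general "frozen base plus small trainable side layers" idea. It assumes a low-rank bottleneck added in parallel to a frozen layer (similar in spirit to LoRA-style adapters); the names `SideAdapter`, `AdaptedLayer`, and `count_params`, the `rank` parameter, and all sizes are hypothetical. It illustrates freezing, parameter efficiency, and saving per-task weights, not how the project actually extends the context length.

```python
import torch
import torch.nn as nn


class SideAdapter(nn.Module):
    """Hypothetical small trainable module placed alongside a frozen layer.

    Assumption: a low-rank bottleneck (down-projection then up-projection);
    the project's actual added layers may be structured differently.
    """

    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        # Zero-init the up-projection so the adapter starts as a no-op and
        # the wrapped model initially behaves exactly like the frozen one.
        nn.init.zeros_(self.up.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))


class AdaptedLayer(nn.Module):
    """Wraps a frozen base layer and adds the adapter's output to it."""

    def __init__(self, base: nn.Module, dim: int, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the original weights frozen
        self.adapter = SideAdapter(dim, rank)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.adapter(x)


def count_params(module: nn.Module, trainable_only: bool = False) -> int:
    # Count all parameters, or only those that will receive gradients.
    return sum(p.numel() for p in module.parameters()
               if p.requires_grad or not trainable_only)


# Toy stand-in for one block of a pretrained LLM (hypothetical size).
dim = 512
base_block = nn.Linear(dim, dim)
layer = AdaptedLayer(base_block, dim, rank=8)

total = count_params(layer)
trainable = count_params(layer, trainable_only=True)
print(f"trainable {trainable} / total {total}")  # prints "trainable 8192 / total 270848"

# Only the adapter's parameters are passed to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in layer.parameters() if p.requires_grad), lr=1e-4)

# One dummy training step on random data.
x = torch.randn(4, dim)
loss = layer(x).pow(2).mean()
loss.backward()
optimizer.step()

# For per-task distribution, only the small adapter weights need saving.
adapter_state = layer.adapter.state_dict()
```

With these toy sizes, the trainable adapter holds about 3% of the layer's parameters, and only `adapter_state` would need to be shipped per task while the base weights are shared.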
How we built it
We use PyTorch to train the model and Gradio to deploy it.
Challenges we ran into
Finding the best learning rate and model size.