Control Pre-trained Transformers

Inspiration

Large language models (LLMs) have gained widespread attention in recent years for their remarkable ability to generate human-like text. One significant challenge when working with LLMs is their limited context length.

For tasks such as summarizing textbooks or long documents, a much larger context length is required. One possible solution is to fine-tune the entire model on the new dataset.

This approach performs well, but fine-tuning requires substantial GPU memory and time. In addition, the fine-tuned weights are different for each task, and distributing a separate set of weights per task can be difficult because of their size.

What it does

To overcome the limited context length, we propose adding layers alongside the LLM and fine-tuning only the added weights while keeping the original weights frozen.
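As a rough illustration of why freezing the original weights helps, the sketch below estimates the memory held by weights, gradients and optimizer state. It assumes fp32 values and an Adam-style optimizer that keeps two moment estimates per trainable parameter, and it ignores activations. The parameter counts are hypothetical and are not taken from this project.

```python
BYTES_PER_FP32 = 4


def training_state_bytes(n_trainable: int, n_frozen: int = 0) -> int:
    """Rough memory for weights, gradients and Adam moments (activations excluded)."""
    # Each trainable parameter stores a weight, a gradient and two Adam moments.
    trainable = n_trainable * BYTES_PER_FP32 * 4
    # Each frozen parameter stores only its weight.
    frozen = n_frozen * BYTES_PER_FP32
    return trainable + frozen


# Hypothetical sizes: a 1B-parameter model, and 50M added parameters.
full_finetune = training_state_bytes(n_trainable=1_000_000_000)
added_layers_only = training_state_bytes(n_trainable=50_000_000, n_frozen=1_000_000_000)

print(f"full fine-tuning:  {full_finetune / 1e9:.1f} GB")
print(f"added layers only: {added_layers_only / 1e9:.1f} GB")
```

Under these assumptions, most of the saving comes from not storing gradients and optimizer state for the frozen weights.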

The added layers will be similar in structure to the layers of the original model but will have far fewer parameters than the original network.
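A minimal PyTorch sketch of this idea follows. The write-up does not say how the added layers connect to the original model, so the wiring here is an assumption: each frozen layer is paired with a narrower transformer layer, joined through small linear projections. The class name FrozenBaseWithSideLayers, the layer sizes, the zero-initialised up-projection and the learning rate are all illustrative choices, not the project's actual implementation.

```python
import torch
import torch.nn as nn


class FrozenBaseWithSideLayers(nn.Module):
    """Frozen base layers plus a narrow, trainable stack running alongside them."""

    def __init__(self, base_layers: nn.ModuleList, d_model: int, d_side: int, side_heads: int = 2):
        super().__init__()
        self.d_side = d_side
        self.base_layers = base_layers
        for p in self.base_layers.parameters():
            p.requires_grad = False  # original weights stay frozen

        n = len(base_layers)
        # Project base hidden states down to the narrow side width.
        self.down = nn.ModuleList([nn.Linear(d_model, d_side) for _ in range(n)])
        # Side layers share the base layer type but are much smaller.
        self.side_layers = nn.ModuleList([
            nn.TransformerEncoderLayer(
                d_side, side_heads, dim_feedforward=2 * d_side,
                dropout=0.0, batch_first=True,
            )
            for _ in range(n)
        ])
        # Project side states back up; zero init (an assumption of this sketch)
        # makes the combined model start out identical to the frozen base.
        self.up = nn.ModuleList([nn.Linear(d_side, d_model) for _ in range(n)])
        for proj in self.up:
            nn.init.zeros_(proj.weight)
            nn.init.zeros_(proj.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        side = x.new_zeros(x.shape[0], x.shape[1], self.d_side)
        for base, down, side_layer, up in zip(
            self.base_layers, self.down, self.side_layers, self.up
        ):
            side = side_layer(side + down(x))  # trainable path
            x = base(x + up(side))             # frozen path, nudged by the side path
        return x


# Stand-in for a pre-trained model: two small encoder layers (hypothetical sizes).
base = nn.ModuleList([
    nn.TransformerEncoderLayer(64, 4, dim_feedforward=256, dropout=0.0, batch_first=True)
    for _ in range(2)
])
model = FrozenBaseWithSideLayers(base, d_model=64, d_side=16)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print(f"trainable: {trainable}, frozen: {frozen}")

# Only the added weights are handed to the optimizer; lr is a placeholder value.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)
```

Because only the added weights are trained, a task-specific checkpoint would need to contain just those parameters, while the frozen base could be shared across tasks.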

How we built it

We use PyTorch to train the model and Gradio to deploy it.

Challenges we ran into

Finding the best learning rate and model size.

What's next for Control Pre-trained Transformers

Built With

Updates

Basu Jindal started this project — Jul 16, 2023 03:37 PM EDT
