We improved LLM training by using a cosine cycle learning-rate scheduler, full fine-tuning, and a corrected loss function. Pretraining on SlimPajama for 3 hours followed by 1 hour of fine-tuning on MathQA gave strong results; attempts with LoRA and quantization were less effective given our time and performance constraints. Training runs on a single A100 GPU and requires the data to be tokenized beforehand. Final MathQA results: validation loss 1.705, perplexity 5.50.
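As a rough sketch of the cosine cycle schedule described above (the cycle length and learning-rate bounds here are illustrative placeholders, not the project's actual hyperparameters), and a sanity check that the reported perplexity is consistent with the validation loss via perplexity = exp(loss):

```python
import math

def cosine_cycle_lr(step, cycle_len, lr_max, lr_min=0.0):
    """Cosine learning-rate cycle: decays lr_max -> lr_min over each
    cycle of cycle_len steps, then restarts. Values are illustrative."""
    t = (step % cycle_len) / cycle_len  # position within the current cycle, in [0, 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

# The reported perplexity matches exp(val loss): exp(1.705) ~ 5.50
print(round(math.exp(1.705), 2))  # → 5.5
```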
Built With
- Python
- PyTorch