We improved LLM training with a cosine cycle learning-rate scheduler, full fine-tuning, and a corrected loss function. Pretraining on SlimPajama for 3 hours followed by 1 hour of fine-tuning on MathQA gave the strongest results; attempts with LoRA and quantization were less effective under our time and performance constraints. Training runs on a single A100 GPU and requires the data to be tokenized beforehand. Final MathQA results: validation loss 1.705, perplexity 5.50.
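
For reference, below is a minimal sketch of a warmup-plus-cosine schedule of the kind described above; the function name and every hyperparameter value are illustrative assumptions, not our actual configuration. Note also that the reported perplexity is consistent with exp(val_loss): exp(1.705) ≈ 5.50.

```python
import math

def cosine_lr(step: int, max_steps: int,
              max_lr: float = 3e-4, min_lr: float = 3e-5,
              warmup_steps: int = 200) -> float:
    """Linear warmup, then cosine decay from max_lr to min_lr.

    Hypothetical values -- the project's real settings may differ.
    """
    if step < warmup_steps:
        # Ramp linearly from 0 up to max_lr over the warmup phase.
        return max_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup run completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Sanity check on the reported metrics: perplexity = exp(validation loss).
assert abs(math.exp(1.705) - 5.50) < 0.01
```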
