We improved LLM training by using a cosine cycle learning-rate scheduler, full fine-tuning, and a corrected loss function. Pretraining on SlimPajama for 3 hours followed by 1 hour of fine-tuning on MathQA gave strong results; attempts with LoRA and quantization were less effective given our time and performance constraints. Training runs on a single A100 GPU and requires the data to be tokenized beforehand. Final MathQA results: validation loss 1.705, perplexity 5.50.
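As a rough sketch of the cosine cycle schedule described above (the cycle length and learning-rate bounds here are illustrative placeholders, not the project's actual hyperparameters), and a sanity check that the reported perplexity is consistent with the validation loss via perplexity = exp(loss):

```python
import math

def cosine_cycle_lr(step, cycle_len, lr_max, lr_min=0.0):
    """Cosine learning-rate cycle: decays lr_max -> lr_min over each
    cycle of cycle_len steps, then restarts. Values are illustrative."""
    t = (step % cycle_len) / cycle_len  # position within the current cycle, in [0, 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

# The reported perplexity matches exp(val loss): exp(1.705) ~ 5.50
print(round(math.exp(1.705), 2))  # → 5.5
```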
Built With
- Python
- PyTorch