Inspiration

Financial reports are dense, repetitive, and hard to digest, especially SEC 10-K filings. I wanted to make reading financial data faster, smarter, and more interactive by fine-tuning a language model specifically on real financial documents.

What It Does

MaskMind: Financial Mask-Filler MLM lets users enter partial financial sentences containing [MASK] tokens, and the model predicts the missing words using what it learned from fine-tuning on Apple's SEC 10-K filings.

It's like autocomplete, but for financial experts.
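Conceptually, a fill-mask model outputs a score (logit) for every vocabulary token at the [MASK] position, and the interface shows the most probable fillers. The plain-Python sketch below covers only that final ranking step, assuming a single [MASK] per prompt. The function name `fill_mask_top_k` and the toy scores are hypothetical, not taken from MaskMind's code; a real app would obtain the logits from the fine-tuned model (for example through the Hugging Face `fill-mask` pipeline).

```python
import math

MASK = "[MASK]"


def fill_mask_top_k(prompt: str, logits: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Rank candidate fillers for a single [MASK] slot and return filled sentences.

    `logits` is a hypothetical stand-in for the model's scores at the masked
    position; a real MLM scores every token in its vocabulary.
    """
    if prompt.count(MASK) != 1:
        raise ValueError("this sketch handles exactly one [MASK] token")
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    top = max(logits.values())
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    # Sort candidates by probability, highest first, and keep the best k.
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda item: item[1], reverse=True)[:k]
    return [(prompt.replace(MASK, tok), prob) for tok, prob in ranked]


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    scores = {"sales": 4.0, "income": 3.0, "cash": 1.0, "iphone": 0.5}
    prompt = "Total net [MASK] increased compared to the prior year."
    for sentence, prob in fill_mask_top_k(prompt, scores, k=2):
        print(f"{prob:.2f}  {sentence}")
```

Showing probabilities rather than raw logits makes the scores comparable across prompts, which is why fill-mask interfaces typically apply a softmax before ranking.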

How I Built It

  • Data Collection: Downloaded Apple's SEC 10-K annual reports
  • Preprocessing: Cleaned, tokenized, and chunked the raw text
  • Model Fine-Tuning: Started from a BERT checkpoint fine-tuned on SQuAD, then fine-tuned it further with masked language modeling (MLM) on Apple's financial text
  • Gradio App: Built a secure, private Gradio app that runs locally, so users can interact with the model directly without needing an internet connection
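The MLM fine-tuning step trains the model to recover tokens that have been hidden from it. A common way to prepare those training examples is the original BERT recipe, which is also what Hugging Face's `DataCollatorForLanguageModeling` does by default: select about 15% of the non-special tokens, replace 80% of them with [MASK], 10% with a random token, and leave 10% unchanged. The write-up does not say which masking settings MaskMind used, so treat this as a plain-Python sketch of the standard recipe. `mask_tokens` is a hypothetical helper, and the IDs in the example (101 = [CLS], 102 = [SEP], 103 = [MASK], vocabulary size 30522) assume the bert-base-uncased vocabulary.

```python
import random

IGNORE_INDEX = -100  # PyTorch's CrossEntropyLoss skips this label by default


def mask_tokens(input_ids, mask_token_id, vocab_size, special_ids=frozenset(),
                mlm_probability=0.15, rng=None):
    """Return (masked_inputs, labels) following the BERT 80/10/10 masking recipe."""
    rng = rng or random.Random()
    inputs = list(input_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(input_ids):
        # Skip special tokens ([CLS], [SEP], [PAD]) and positions not selected for masking.
        if tok in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = tok  # the loss asks the model to predict the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_token_id              # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # Remaining 10%: keep the original token unchanged.
    return inputs, labels


if __name__ == "__main__":
    ids = [101, 2193, 4341, 3445, 102]  # arbitrary token IDs wrapped in [CLS] ... [SEP]
    masked, labels = mask_tokens(ids, mask_token_id=103, vocab_size=30522,
                                 special_ids={101, 102}, rng=random.Random(0))
    print(masked, labels)
```

Keeping 10% of selected tokens unchanged means the model cannot assume that every unmasked token is correct, which helps at inference time when no [MASK] placeholders appear in ordinary text.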

Challenges I Faced

  • Dealing with the size and complexity of 10-K filings
  • Avoiding generic predictions by further training on domain-specific financial language
  • Hosting large model checkpoints outside GitHub without losing accessibility
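A 10-K filing is far longer than BERT's 512-token input limit, so the text has to be split into windows before training. One common approach is a sliding window with overlap, so tokens near a window boundary also appear with surrounding context in the neighbouring window. The sketch below works on token IDs a tokenizer has already produced; the 512-token limit matches BERT, while the stride of 128 and the name `chunk_token_ids` are illustrative assumptions rather than MaskMind's actual settings. Hugging Face fast tokenizers can also produce similar windows directly via `return_overflowing_tokens=True` together with `stride`.

```python
def chunk_token_ids(token_ids, max_len=512, stride=128, num_special=2):
    """Split a long token sequence into overlapping windows that fit the model.

    max_len:     model input limit including special tokens (512 for BERT).
    stride:      number of tokens shared by consecutive windows.
    num_special: slots reserved for [CLS] and [SEP] when each window is wrapped.
    """
    body = max_len - num_special  # room left for document tokens per window
    if not 0 <= stride < body:
        raise ValueError("stride must satisfy 0 <= stride < max_len - num_special")
    if not token_ids:
        return []
    step = body - stride
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + body])
        if start + body >= len(token_ids):
            break  # this window already reaches the end of the document
    return chunks


if __name__ == "__main__":
    fake_filing = list(range(1000))  # stand-in for a tokenized 10-K section
    windows = chunk_token_ids(fake_filing)
    print([len(w) for w in windows])  # [510, 510, 236]
```

A larger stride gives each token more context at the cost of more (partly duplicated) training windows, so it is a trade-off between coverage and training time.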

Accomplishments I'm Proud Of

  • Built a fully private, secure financial-domain MLM
  • Built the masked language model fine-tuning pipeline from scratch
  • Created a professional, clean Gradio web interface ready for real-world use

What's Next

  • Expand to other companies beyond Apple
  • Add multi-company or industry-specific models
  • Build lightweight financial Q&A systems alongside the mask-filler

Built With

  • 10-k
  • 3.12
  • apple
  • custom
  • docker-ready
  • filings
  • fine-tuning
  • gradio
  • huggingface
  • python
  • scripts
  • sec
  • transformers