Inspiration

Ever tried copying code from a YouTube tutorial, Twitter screenshot, or programming book? We got tired of manually retyping code snippets and wanted a solution that just works.

What it does

Seentax is an AI-powered OCR tool that extracts code from images (books, screenshots, video frames) while preserving syntax, indentation, and structure across many programming languages. It runs 100% locally, with no external API needed.
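To make "preserving indentation" concrete, here is a minimal sketch of one way to measure it: compare the leading whitespace of each line in a reference snippet against the OCR output. This is an illustrative metric only, not necessarily how Seentax is evaluated, and the function names `leading_whitespace` and `indentation_match_rate` are hypothetical.

```python
def leading_whitespace(line: str) -> str:
    """Return the run of spaces and tabs at the start of a line."""
    return line[: len(line) - len(line.lstrip(" \t"))]


def indentation_match_rate(reference: str, extracted: str) -> float:
    """Fraction of lines whose leading whitespace matches exactly.

    Missing or extra lines count as mismatches, because the
    denominator is the longer of the two line counts.
    """
    ref_lines = reference.splitlines()
    out_lines = extracted.splitlines()
    total = max(len(ref_lines), len(out_lines))
    if total == 0:
        return 1.0  # two empty snippets trivially agree
    matches = sum(
        1
        for r, o in zip(ref_lines, out_lines)
        if leading_whitespace(r) == leading_whitespace(o)
    )
    return matches / total


if __name__ == "__main__":
    ref = "def f(x):\n    if x:\n        return 1\n    return 0\n"
    flattened = "def f(x):\nif x:\nreturn 1\nreturn 0\n"  # typical naive OCR output
    print(indentation_match_rate(ref, ref))        # 1.0
    print(indentation_match_rate(ref, flattened))  # 0.25
```

An exact-match check is deliberately strict: for indentation-sensitive languages like Python, swapping four spaces for a tab can change or break the program, so it should count as an error.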

How we built it

We fine-tuned PaddleOCR's vision-language model on a Kaggle dataset of 10,000+ code images spanning 15+ languages. We trained on NVIDIA H100 GPUs provided by Novita AI and achieved a 35% accuracy improvement over the base PaddleOCR model.

Challenges we ran into

Getting the model to preserve code indentation and special characters was tough. We also had to balance accuracy against inference speed for local deployment.

Accomplishments that we're proud of

We reached 95%+ syntax accuracy and a 60% reduction in character errors, and the model runs entirely offline. It also handles complex nested code and multiple programming languages.
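A "reduction in character errors" is usually stated in terms of character error rate (CER): the Levenshtein edit distance between the OCR output and the ground truth, divided by the ground-truth length. The sketch below shows that standard formula and how a relative reduction is computed from two CER values. It assumes this common definition; the writeup does not say exactly how its figures were measured, and the numbers in the example are made up.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("CER is undefined for an empty reference")
    return levenshtein(reference, hypothesis) / len(reference)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional drop from a baseline error rate to an improved one."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # One misread character ("i" -> "l") in an 8-character line.
    print(character_error_rate("print(x)", "prlnt(x)"))  # 0.125
    # Hypothetical CERs: 10% before fine-tuning, 4% after -> about 0.6.
    print(relative_reduction(0.10, 0.04))
```

Note that a 60% *relative* reduction in CER is different from a 60-point drop: a baseline CER of 10% falling to 4% is a 60% reduction.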

What we learned

Fine-tuning vision models for domain-specific tasks requires carefully curated data. We also learned GPU optimization techniques that reduced our training time from days to hours.

What's next for Seentax

Detecting multiple programming languages within a single image, IDE integration plugins, a mobile app for on-the-go code capture, and expanded support for handwritten code recognition.
