Inspiration

We were inspired by how difficult Excel is to use, and by the fact that none of the current AI integrations do a good job of fixing that.
What it does

ExcelLang is a microlanguage we built to make Excel requests far more token-efficient when working with LLMs. Instead of typing out something like "I want to add up all the values in column B from row 2 to row 100, then make the header row bold with a blue background," you can just write =SUM(B2:B100) and 1:1|bold|bg:blue. We're seeing 50-80% token savings with this approach.

The whole thing works end to end. It understands what you're asking for in plain English, compresses it into our compact syntax, and shows you exactly how it all works through a visual demo. We process case documents and generate Excel commands that you can trace back to the original request.

We're using three main approaches:
- Pattern matching for common operations like formulas and formatting
- Machine learning to classify what you're trying to do (Logistic Regression over sentence embeddings)
- An LLM fallback when things get complicated or ambiguous
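To give a feel for the savings, here is a rough back-of-the-envelope comparison using the example request above. Word counts stand in for tokens here, which overstates the ratio a little; real measurements would use a model tokenizer such as tiktoken, and exact ratios vary by model.

```python
# Rough illustration of the kind of savings ExcelLang targets.
# Whitespace-split word counts stand in for real tokenizer tokens.

verbose = ("I want to add up all the values in column B from row 2 to "
           "row 100, then make the header row bold with a blue background")
compact = "=SUM(B2:B100) and 1:1|bold|bg:blue"

v_tokens = len(verbose.split())
c_tokens = len(compact.split())
savings = 1 - c_tokens / v_tokens
print(f"{v_tokens} -> {c_tokens} words ({savings:.0%} saved)")
```

The compact form packs several operations into punctuation-dense strings, so a real tokenizer will count more tokens per "word" than this sketch does; that gap is why the measured range is 50-80% rather than the naive word-count ratio.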
How we built it

Architecture

We went with a three-layer system:

Intent Classification. Our ML classifier uses Logistic Regression trained on 600+ examples. When it's not confident, we fall back to k-nearest neighbors to find similar examples. Then we pull out the important entities (columns, rows, colors, etc.) using regex patterns.

Conversion. We have a pattern-matching engine with rule sets for different operation types. Each intent maps to a template that converts entities into ExcelLang syntax. If our rules don't cover a request, we call OpenAI GPT-4 or Google Gemini.

Demo Interface. We built a Flask backend that processes Word documents, extracts the key info, and generates Excel commands. The frontend shows the whole pipeline visually - from document upload to command execution - with interactive cards and real-time stats.

Tech Stack

Python powers the core engine and ML components. We're using scikit-learn for classification, HuggingFace sentence-transformers for embeddings, and Flask for the API. The frontend is vanilla JavaScript/HTML/CSS with Chart.js for visualizations. python-docx handles document parsing.

Training the Model

We created 600+ training examples covering the different intent types, converted them to 384-dimensional sentence embeddings, and trained a Logistic Regression classifier, getting about 85% accuracy on our test set. The system auto-trains if the model is missing, so setup is dead simple.

Challenges we ran into

Python versions were a pain. The | union-type operator needs Python 3.10+, but some environments were running 3.9. We switched to typing.Union annotations to fix compatibility.

Library conflicts drove us crazy. numpy and scikit-learn version mismatches were throwing "dtype size changed" errors everywhere. We ended up creating isolated virtual environments with pinned versions.

Intent classification was tricky. Ambiguous requests like "combine all numerical" stumped the initial classifier. We expanded the training data, added the k-NN fallback, and implemented LLM handling for really unclear requests.
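The confidence-gated routing described above can be sketched in a few lines. This is a toy illustration, not our actual code: the threshold, labels, and 3-dimensional "embeddings" are made up (the real system uses 384-dimensional sentence embeddings and 600+ examples), but the control flow - trust the classifier when it's confident, otherwise fall back to nearest-neighbor lookup - is the same.

```python
import math

CONFIDENCE_THRESHOLD = 0.7  # illustrative cutoff, not the production value

# A few labelled examples standing in for the real training set.
EXAMPLES = [
    ([0.9, 0.1, 0.0], "formula"),
    ([0.1, 0.9, 0.0], "formatting"),
    ([0.0, 0.2, 0.9], "sort"),
]

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def knn_fallback(embedding):
    """Return the label of the most similar training example (k=1)."""
    best = max(EXAMPLES, key=lambda ex: cosine(embedding, ex[0]))
    return best[1]

def route_intent(embedding, classifier_label, classifier_confidence):
    if classifier_confidence >= CONFIDENCE_THRESHOLD:
        return classifier_label      # fast path: trust the classifier
    return knn_fallback(embedding)   # low confidence: nearest neighbour

print(route_intent([0.85, 0.15, 0.0], "formula", 0.95))  # confident path
print(route_intent([0.05, 0.10, 0.95], "formula", 0.40))  # k-NN override
```

In the real pipeline a third tier sits below this: when even the nearest neighbors are a poor match, the request is handed to the LLM fallback.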
Extracting column references is harder than it looks. People say "column B," "B," "col B," or don't mention a column at all. We built multi-pattern regex with priority ordering and context awareness.

Finding the right balance between rules and ML took time. Pure rules were fast but limited; pure ML was flexible but slower. We ended up combining both, with an LLM for edge cases.

Schema context was complicated. Converting column names to Excel letters (A, B, C) required tracking context from uploaded files, so we built an encoding system that maps names to letters.

Accomplishments that we're proud of

The compression ratio is solid. We're getting 50-80% token savings consistently: formulas compress by 65-75%, formatting by 60-70%, and overall we average around 65%.

The hybrid architecture actually works. Rules handle common cases fast, ML routes intents accurately (~85%), and the LLM catches complex requests - seamlessly.

The end-to-end pipeline is production-ready. It goes from raw case documents to executable Excel commands with full traceability, and the visual interface makes everything transparent.

Zero-config deployment. The system trains itself if the model is missing; no manual steps needed.

Coverage is comprehensive. We support 14+ operation categories: formulas, formatting, conditional formatting, sorting, filtering, charts, pivot tables, validation, cleaning, and more.

The UI looks professional. Dark theme with Excel-green accents, real-time stats, interactive cards, and Chart.js visualizations. It actually feels polished.

What we learned

Domain-specific beats general-purpose. Excel-optimized syntax hits 65% compression versus far less with generic compression techniques. Understanding the domain matters.

Hybrid approaches win. Rules give you speed, ML gives you flexibility, and the LLM handles the weird stuff. Together they're better than any single method.

Intent classification is everything. Getting the intent right upfront makes conversion far more reliable, and sentence embeddings capture meaning better than keywords.
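The priority-ordered column extraction and the name-to-letter encoding mentioned in the challenges above can be sketched like this. The patterns and the schema dictionary are illustrative stand-ins, not our production rule set.

```python
import re

# Patterns tried in priority order: the explicit "column B" / "col B" form
# first, then a bare capital letter, so the more specific match wins.
COLUMN_PATTERNS = [
    re.compile(r"\bcol(?:umn)?\s+([A-Z])\b", re.IGNORECASE),
    re.compile(r"\b([A-Z])\b"),
]

def extract_column(text):
    for pattern in COLUMN_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(1).upper()
    return None  # no column mentioned; caller must infer from context

def index_to_letter(index):
    """0 -> 'A', 25 -> 'Z', 26 -> 'AA' (standard Excel column encoding)."""
    letters = ""
    while index >= 0:
        letters = chr(index % 26 + ord("A")) + letters
        index = index // 26 - 1
    return letters

# Mapping uploaded-file headers to letters by position (hypothetical schema).
schema = ["case_id", "amount", "status"]
column_of = {name: index_to_letter(i) for i, name in enumerate(schema)}

print(extract_column("make column B bold"))  # explicit form wins
print(column_of["amount"])                   # header name -> Excel letter
```

The bare-letter pattern is deliberately last: it is the one that misfires on phrases like "from A to Z," which is exactly where the context-awareness layer has to step in.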
Token efficiency matters in practice. 65% compression means 2.85x more requests for the same cost - real savings.

Visualization helped us build it. The pipeline view let us debug and explain what's happening, which made iteration much faster.

Entity extraction needs context. You can't just pattern-match: "sort by column A" and "from A to Z" require understanding what the user actually means.

What's next for ExcelLang

- Better ML model. Expand to 2000+ examples, try BERT fine-tuning, and add active learning from user corrections.
- Direct Excel integration. Build a Python library that executes commands on actual Excel files, with a real-time preview before applying changes.
- Smarter natural language handling. Support multi-step operations, conversational context, and automatic schema detection from uploads.
- Cloud API. Deploy as a REST API with batch processing, rate limiting, and usage analytics.
- Excel plugin. A native add-in with syntax highlighting, autocomplete, and ribbon integration for both Excel Online and desktop.
- More syntax coverage. Excel Tables, Power Query, Power Pivot, dynamic arrays, new functions, and conditional formatting with formulas.
- Performance improvements. Cache embeddings and predictions, optimize LLM calls with prompt caching, and add async batch processing.
- Community ecosystem. Open-source the spec, build a community pattern library, and integrate with Zapier and Power Automate.
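The "2.85x more requests" arithmetic mentioned in the learnings above follows directly from the compression figure: at 65% compression each request costs 35% of the original tokens, so a fixed token budget covers about 1 / 0.35 ≈ 2.86x as many requests.

```python
# Sanity check of the token-economics claim: 65% compression means each
# request costs 35% of the original tokens, so the same budget covers
# roughly 1 / 0.35 as many requests.
compression = 0.65
cost_fraction = 1 - compression
multiplier = 1 / cost_fraction
print(f"{multiplier:.2f}x more requests for the same token budget")
```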
Built With
- anthropic
- autoprefixer
- chart.js
- chrome
- component-based
- css
- extension
- flask
- flask-cors
- html
- hybrid
- javascript
- ml
- numpy
- openai
- openpyxl
- pandas
- postcss
- pytest
- python
- python-docx
- pytorch
- react
- restful
- rich
- scikit-learn
- sentence-transformers
- tailwindcss
- tiktoken
- token
- typescript
- ui
- vite