How I built it

Gmail Triage Bot

An intelligent email-sorting assistant that uses Python automation and lightweight NLP to categorize, prioritize, and summarize incoming Gmail messages helping users reclaim time and focus on what matters.

Inspiration

Email has evolved into a cognitive overload problem.
Between university announcements, project updates, newsletters, and job alerts, the inbox often turns into digital noise.

One night, after manually filtering over 200 unread emails before a hackathon, I asked myself:

“What if my inbox could triage itself like a smart assistant decide what’s urgent, what’s FYI, and what can wait?”

That question became the seed for Gmail Triage Bot an intelligent, rule-driven, and NLP-enhanced automation agent that brings structure to chaos.
The aim wasn’t just automation, but context-aware prioritization so the bot doesn’t merely read emails, it understands intent.

How I Built It

The bot was built as a modular, scalable Python application with a clear architecture:

gmail-triage-py/
│
├── src/
│   ├── triage/
│   │   ├── rules.py        # Logic rules for classification
│   │   └── runner.py       # Orchestrates Gmail API + rules engine
│   └── __init__.py
│
├── main.py                 # Entry-point CLI
├── requirements.txt        # Dependencies
└── .env / creds.json       # OAuth2 credentials (gitignored)

Tech Stack

  • Python 3.11
  • Gmail API for secure message retrieval
  • Authlib + OAuth 2.0 for authentication
  • Regex + NLP heuristics for intent extraction
  • Rule Engine in rules.py for logic-based triage

Logic example in LaTeX form:

[ \text{if }(\text{subject contains "invoice"} \lor \text{sender in contacts}) \Rightarrow \text{priority = High} ]

Triage Pipeline

  1. Authenticate and fetch new emails
  2. Parse metadata (sender, subject, timestamp, snippet)
  3. Apply rules for classification: urgent / informational / promotional
  4. Output structured summaries in the terminal or logs

What I Learned

This project helped me appreciate how automation and interpretability coexist.
Key lessons:

  • Gmail API pagination and quota handling require smart batching
  • OAuth token refresh cycles can be fully automated
  • Regex-driven heuristics sometimes outperform complex ML models for domain-specific tasks
  • Clear, interpretable rules enhance both performance and maintainability

Lesson learned: Start with deterministic logic; add ML only when it adds measurable value.

Challenges Faced

  1. OAuth setup — configuring secure tokens without manual refreshes
  2. Rate limits — batching API calls and using exponential backoff
  3. Email parsing — decoding MIME/HTML content correctly
  4. Balancing ML and Rules — avoiding unnecessary model complexity

Outcome & Impact

The bot can triage 100+ emails in under 30 seconds, achieving:

[ \text{Accuracy} \approx 0.92, \quad \text{Precision}_{\text{urgent}} = 0.88 ]

This automation reduced manual email review time by 70%, freeing mental bandwidth for real work.
Beyond metrics, it made me trust small-scale, interpretable automation again.

Future Work

Planned improvements:

  • Integrate a summarization module using a lightweight LLM for context-aware summaries
  • Schedule periodic triage via CRON jobs
  • Deploy a minimal web dashboard to visualize triage metrics and daily trends

Built With

Share this project:

Updates