Cashback App

A comprehensive personal finance tracking application built with Python and Streamlit. Track your spending, categorize transactions, visualize trends, and gain insights into your financial habits.


About the Project

Inspiration

Money Thing 2 was born out of a simple need: understanding where my money actually goes. Like many people, I found traditional banking apps lacking in detail and flexibility. I wanted:

  • Granular categorization with parent-child relationships (e.g., Shopping → Clothes → Shoes)
  • Automated transaction imports from bank statements
  • Visual spending trends to identify patterns over time
  • Complete control over my financial data without relying on third-party services

The project evolved from a basic spreadsheet tracker into a full-featured web application that could handle everything from receipt scanning to multi-level category hierarchies.

What I Learned

Building this project taught me valuable lessons across multiple domains:

Backend Development:

  • Database design with SQLite, managing complex relationships between transactions, products, vendors, and categories
  • Thread safety in Python applications (solving the infamous "SQLite objects created in a thread can only be used in that same thread" error)
  • Data aggregation patterns for calculating weekly spending: $\text{Weekly Spend} = \sum_{t \in \text{week}} \sum_{i \in \text{items}} p_i \times q_i$ where $p_i$ is price and $q_i$ is quantity
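As a rough illustration of the weekly spend formula above, here is a minimal sketch in plain Python. The flat `(purchase_date, price, quantity)` tuple shape and the use of `Decimal` are assumptions made for the example, not the project's actual data model.

```python
from collections import defaultdict
from datetime import date, timedelta
from decimal import Decimal


def weekly_spend(items):
    """Sum price * quantity per Monday-to-Sunday week.

    `items` is an iterable of (purchase_date, price, quantity) tuples;
    this shape is hypothetical and chosen only for the example.
    """
    totals = defaultdict(Decimal)
    for purchase_date, price, quantity in items:
        # weekday() is 0 for Monday, so this steps back to the week's Monday.
        week_start = purchase_date - timedelta(days=purchase_date.weekday())
        totals[week_start] += price * quantity
    return dict(totals)


items = [
    (date(2025, 9, 15), Decimal("2.50"), 2),  # Monday
    (date(2025, 9, 21), Decimal("1.20"), 5),  # Sunday of the same week
    (date(2025, 9, 22), Decimal("4.00"), 1),  # Monday of the next week
]
totals = weekly_spend(items)
```

Each key in `totals` is the Monday that starts a week, so the first two items land in the same bucket and the third starts a new one.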

Frontend Development:

  • Streamlit framework for rapid prototyping and interactive dashboards
  • Plotly for dynamic, responsive data visualizations
  • State management across multiple pages and user sessions

Document Processing:

  • PDF parsing with pdfplumber to extract transaction data from bank statements
  • OCR with pytesseract to read digital receipts
  • Text pattern matching to identify dates, amounts, and merchant names
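For the date part of that pattern matching, one simple approach is to try a short list of known formats in turn. The sketch below is an assumption about how this could look, not the project's parser; `parse_statement_date` and `DATE_FORMATS` are hypothetical names, and `%b` relies on an English locale for month abbreviations such as "Sep".

```python
from datetime import date, datetime

# Formats are tried in order; the list is illustrative, not exhaustive.
DATE_FORMATS = ("%d %b %Y", "%d/%m/%Y", "%Y-%m-%d")


def parse_statement_date(text):
    """Return a date for the first format that matches, or None if none do."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # Not this format; try the next one.
    return None
```

Returning `None` instead of raising lets the caller decide whether an unparseable line is a continuation of a multi-line description or a genuine error.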

Software Architecture:

  • Separation of concerns with distinct modules for database management, UI components, and business logic
  • Recursive algorithms for hierarchical category traversal
  • Data transformation pipelines from raw inputs to aggregated insights

How It's Built

Tech Stack:

  • Python 3.13 - Core language
  • Streamlit - Web framework for the UI
  • SQLite - Lightweight database
  • Pandas - Data manipulation and analysis
  • Plotly Express - Interactive charting
  • pdfplumber - PDF text extraction
  • pytesseract - OCR for receipt scanning
  • bcrypt - Password hashing for authentication

Architecture:

moneything2/
├── main.py                 # Entry point, navigation
├── page/                   # UI pages
│   ├── spending_view_page.py    # Analytics dashboard
│   ├── transactions_page.py     # Transaction input/upload
│   ├── categories_page.py       # Category management
│   └── ...
├── src/                    # Business logic
│   ├── db_manager.py       # Database orchestration
│   ├── sql_database.py     # SQLite wrapper
│   ├── pdf_reader.py       # Bank statement parser
│   ├── receipt_reader.py   # OCR receipt processor
│   └── db_classes/         # ORM-style table classes
└── database.db             # SQLite database

Key Features:

  1. Hierarchical Categories: Recursive tree structure allowing categories to have parents and children

    • Shopping → Groceries → Fresh Produce
    • Spending automatically rolls up from children to parents
  2. Multi-Source Data Entry:

    • Manual transaction input with auto-completion
    • Bulk upload from HSBC bank statement PDFs
    • Digital receipt scanning (Lidl receipts via OCR)
  3. Smart Spending Analysis:

    • Weekly aggregation: $\text{Week Total} = \sum_{d \in \text{Mon-Sun}} \text{transactions}_d$
    • Category breakdown views showing subcategory contributions
    • Unassigned transaction tracking for incomplete data
  4. Flexible Data Model:

    • Transactions can have override amounts OR itemized spending items
    • Products linked to vendors and categories
    • Money stores track different accounts/payment methods

Challenges Faced

1. SQLite Threading Issues

The biggest technical hurdle was Streamlit's multi-threaded architecture conflicting with the same-thread check that Python's sqlite3 module enforces by default. Dialogs and fragments run in separate threads, so accessing a connection created elsewhere raised sqlite3.ProgrammingError.

Solution: Set check_same_thread=False when opening the SQLite connection. This only disables the same-thread check; the app relies on Streamlit's execution model to keep database access sequential.
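A minimal sketch of the idea, using an in-memory database so it runs on its own. The table layout, the `insert_amount` helper, and the explicit lock are assumptions for illustration; the Python documentation notes that once the check is disabled, write operations may need to be serialized by the user.

```python
import sqlite3
import threading

# check_same_thread=False lets threads other than the creator use the connection.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount REAL)")

# Hypothetical safeguard: serialize writes explicitly rather than relying
# on the caller's execution model.
db_lock = threading.Lock()


def insert_amount(amount):
    with db_lock:
        conn.execute("INSERT INTO transactions (amount) VALUES (?)", (amount,))
        conn.commit()


# Write from a different thread than the one that opened the connection.
worker = threading.Thread(target=insert_amount, args=(12.5,))
worker.start()
worker.join()

row_count = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
```

Without `check_same_thread=False`, the insert inside the worker thread would raise `sqlite3.ProgrammingError`.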

2. Hierarchical Category Aggregation

Calculating spending for a category while including all descendant categories required careful recursive logic. The challenge was avoiding double-counting and properly separating "direct" parent spending from child totals.

Solution: Implemented get_all_child_category_ids() to recursively collect all descendants, then filtered transactions by that set. For breakdowns, transactions assigned directly to the parent are kept separate from those assigned to its descendants.
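The recursion can be sketched roughly as follows. The function name comes from the text above, but its signature, the `children_by_parent` mapping, and the `(category_id, amount)` transaction tuples are assumptions; the real function presumably reads the hierarchy from the database.

```python
def get_all_child_category_ids(category_id, children_by_parent, _seen=None):
    """Collect the ids of every descendant of category_id (excluding itself)."""
    seen = set() if _seen is None else _seen
    for child_id in children_by_parent.get(category_id, ()):
        if child_id not in seen:  # Guards against cycles and repeat visits.
            seen.add(child_id)
            get_all_child_category_ids(child_id, children_by_parent, seen)
    return seen


# Hypothetical ids: 1 Shopping, 2 Groceries, 3 Clothes, 4 Fresh Produce.
children_by_parent = {1: [2, 3], 2: [4]}
transactions = [(1, 10.0), (2, 20.0), (4, 5.0), (3, 7.5), (9, 99.0)]

descendants = get_all_child_category_ids(1, children_by_parent)
# Keeping the two sums separate means no transaction is counted twice.
direct_total = sum(amount for cat, amount in transactions if cat == 1)
child_total = sum(amount for cat, amount in transactions if cat in descendants)
```

The category total is then `direct_total + child_total`, and the breakdown view can show the two parts side by side.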

3. Date/Time Parsing from PDFs

Bank statements have inconsistent formatting, making reliable extraction difficult:

  • "20 Sep 2025" vs "20/09/2025" vs "2025-09-20"
  • Mixed layouts with varying column positions
  • Multi-line transaction descriptions

Solution: Position-based parsing using x-coordinates to identify columns, with fuzzy text matching for start/end markers like "BALANCEBROUGHTFORWARD".
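A rough sketch of position-based column assignment. It works on word dictionaries carrying "text" and "x0" keys, which is the shape pdfplumber's `extract_words()` returns, so it runs without a PDF. The column names, the boundary values, and both helper functions are hypothetical, and the whitespace-insensitive marker check stands in for whatever fuzzy matching the project actually uses.

```python
# Hypothetical column boundaries in PDF points: (name, left edge, right edge).
COLUMNS = [
    ("date", 0, 90),
    ("description", 90, 350),
    ("paid_out", 350, 430),
    ("paid_in", 430, 510),
]


def assign_columns(words, columns=COLUMNS):
    """Group the words of one statement row into columns by left x-coordinate."""
    cells = {name: [] for name, _, _ in columns}
    for word in sorted(words, key=lambda w: w["x0"]):
        for name, left, right in columns:
            if left <= word["x0"] < right:
                cells[name].append(word["text"])
                break
    return {name: " ".join(parts) for name, parts in cells.items()}


def is_marker(line, marker="BALANCEBROUGHTFORWARD"):
    """Match a marker regardless of spacing or letter case."""
    return marker in "".join(line.split()).upper()


words = [
    {"text": "20", "x0": 40.0},
    {"text": "Sep", "x0": 55.0},
    {"text": "25", "x0": 72.0},
    {"text": "LIDL", "x0": 95.0},
    {"text": "GB", "x0": 125.0},
    {"text": "12.40", "x0": 390.0},
]
row = assign_columns(words)
```

In practice the boundaries would be measured from the statement layout, and words would first be grouped into rows by their vertical position before being passed to `assign_columns`.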

4. Empty Spending Items

Initially, the system assumed all transactions had itemized spending items, but bulk imports only had transaction-level totals.

Solution: Dual-path logic that checks for spending items first and falls back to the override_money field when none exist.
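The fallback could look something like this. The `override_money` field name comes from the text above; the dictionary shape, the "items", "price" and "quantity" keys, and the `transaction_total` helper are assumptions for the example.

```python
def transaction_total(transaction):
    """Prefer itemized spending; fall back to the transaction-level override."""
    items = transaction.get("items") or []
    if items:
        return sum(item["price"] * item["quantity"] for item in items)
    # Bulk imports carry no items, only a transaction-level amount.
    return transaction.get("override_money") or 0.0


itemized = {"items": [{"price": 2.0, "quantity": 3}], "override_money": None}
bulk_import = {"items": [], "override_money": 42.5}

itemized_total = transaction_total(itemized)
bulk_total = transaction_total(bulk_import)
```

A transaction with neither items nor an override returns 0.0 here; a real implementation might prefer to flag it as incomplete instead.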

5. Performance with Large Datasets

As transaction counts grew, recalculating all categories became slow.

Solution: Pandas vectorization for date grouping and aggregation, moving from row-by-row iteration to batch operations.
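A small sketch of what vectorized weekly grouping can look like in Pandas. The column names and sample data are made up for the example; it shows the general technique, not the project's actual query.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2025-09-15", "2025-09-21", "2025-09-22"]),
    "amount": [10.0, 5.0, 7.0],
})

# Shift every date back to its Monday in one vectorized step (weekday 0 = Monday).
df["week_start"] = df["date"] - pd.to_timedelta(df["date"].dt.weekday, unit="D")

# One grouped sum replaces a Python loop over individual rows.
weekly = df.groupby("week_start")["amount"].sum()
```

The first two rows fall in the week starting Monday 15 September 2025, and the third starts the following week.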


Built With

  • next.js
  • streamlit