Cashback App

A comprehensive personal finance tracking application built with Python and Streamlit. Track your spending, categorize transactions, visualize trends, and gain insights into your financial habits.

About the Project

Inspiration

Money Thing 2 was born out of a simple need: understanding where my money actually goes. Like many people, I found traditional banking apps lacking in detail and flexibility. I wanted:

Granular categorization with parent-child relationships (e.g., Shopping → Clothes → Shoes)
Automated transaction imports from bank statements
Visual spending trends to identify patterns over time
Complete control over my financial data without relying on third-party services

The project evolved from a basic spreadsheet tracker into a full-featured web application that could handle everything from receipt scanning to multi-level category hierarchies.

What I Learned

Building this project taught me valuable lessons across multiple domains:

Backend Development:

Database design with SQLite, managing complex relationships between transactions, products, vendors, and categories
Thread safety in Python applications (solving the infamous "SQLite objects created in a thread can only be used in that same thread" error)
Data aggregation patterns for calculating weekly spending: $\text{Weekly Spend} = \sum_{t \in \text{week}} \sum_{i \in \text{items}} p_i \times q_i$ where $p_i$ is price and $q_i$ is quantity

Frontend Development:

Streamlit framework for rapid prototyping and interactive dashboards
Plotly for dynamic, responsive data visualizations
State management across multiple pages and user sessions

Document Processing:

PDF parsing with pdfplumber to extract transaction data from bank statements
OCR with pytesseract to read digital receipts
Text pattern matching to identify dates, amounts, and merchant names

Software Architecture:

Separation of concerns with distinct modules for database management, UI components, and business logic
Recursive algorithms for hierarchical category traversal
Data transformation pipelines from raw inputs to aggregated insights

How It's Built

Tech Stack:

Python 3.13 - Core language
Streamlit - Web framework for the UI
SQLite - Lightweight database
Pandas - Data manipulation and analysis
Plotly Express - Interactive charting
pdfplumber - PDF text extraction
pytesseract - OCR for receipt scanning
bcrypt - Password hashing for authentication

Architecture:

moneything2/
├── main.py                 # Entry point, navigation
├── page/                   # UI pages
│   ├── spending_view_page.py    # Analytics dashboard
│   ├── transactions_page.py     # Transaction input/upload
│   ├── categories_page.py       # Category management
│   └── ...
├── src/                    # Business logic
│   ├── db_manager.py       # Database orchestration
│   ├── sql_database.py     # SQLite wrapper
│   ├── pdf_reader.py       # Bank statement parser
│   ├── receipt_reader.py   # OCR receipt processor
│   └── db_classes/         # ORM-style table classes
└── database.db             # SQLite database

Key Features:

Hierarchical Categories: Recursive tree structure allowing categories to have parents and children
- Shopping → Groceries → Fresh Produce
- Spending automatically rolls up from children to parents
Multi-Source Data Entry:
- Manual transaction input with auto-completion
- Bulk upload from HSBC bank statement PDFs
- Digital receipt scanning (Lidl receipts via OCR)
Smart Spending Analysis:
- Weekly aggregation: $\text{Week Total} = \sum_{d \in \text{Mon-Sun}} \text{transactions}_d$
- Category breakdown views showing subcategory contributions
- Unassigned transaction tracking for incomplete data
Flexible Data Model:
- Transactions can have override amounts OR itemized spending items
- Products linked to vendors and categories
- Money stores track different accounts/payment methods

Challenges Faced

1. SQLite Threading Issues: The biggest technical hurdle was Streamlit's multi-threaded architecture conflicting with the default same-thread check in Python's sqlite3 module. Dialogs and fragments run in separate threads, so accessing the database from them raised a ProgrammingError.

Solution: Pass check_same_thread=False when opening the SQLite connection. This disables the same-thread check; it does not add any locking, so the app relies on Streamlit's execution model to keep database access sequential.
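A minimal sketch of that fix, using only the standard library. The helper names (open_connection, run_query) and the lock are illustrative assumptions, not the project's actual code; the lock is an extra safeguard in case two threads ever overlap.

```python
import sqlite3
import threading

# Hypothetical module-level lock; an added safeguard, not part of the
# fix described above.
_db_lock = threading.Lock()


def open_connection(path="database.db"):
    # check_same_thread=False turns off sqlite3's same-thread check so the
    # connection can be used from threads other than the one that made it.
    return sqlite3.connect(path, check_same_thread=False)


def run_query(conn, sql, params=()):
    # Serialise access so only one thread touches the connection at a time.
    with _db_lock:
        cursor = conn.execute(sql, params)
        rows = cursor.fetchall()
        conn.commit()
        return rows
```

With this in place, a connection opened on the main thread can be queried from a worker thread without raising ProgrammingError.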

2. Hierarchical Category Aggregation: Calculating spending for a category while including all descendant categories required careful recursive logic. The challenge was avoiding double-counting and properly separating "direct" parent spending from child totals.

Solution: Implemented get_all_child_category_ids() to recursively collect all descendants, then filter transactions by this set. For breakdowns, separate parent-only transactions from child transactions.
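The recursive collection and the direct/child split could look roughly like this. The function name comes from the write-up, but its signature, the children_by_parent mapping, and the (category_id, amount) transaction tuples are assumptions made for illustration. The sketch also assumes the categories form a tree with no cycles.

```python
def get_all_child_category_ids(category_id, children_by_parent):
    """Return the ids of every descendant of category_id (not itself)."""
    descendants = set()
    for child_id in children_by_parent.get(category_id, []):
        descendants.add(child_id)
        # Recurse so grandchildren and deeper levels are included.
        descendants |= get_all_child_category_ids(child_id, children_by_parent)
    return descendants


def category_breakdown(category_id, children_by_parent, transactions):
    """Split spending into the parent's direct total and its descendants'."""
    child_ids = get_all_child_category_ids(category_id, children_by_parent)
    direct = sum(amount for cat, amount in transactions if cat == category_id)
    children = sum(amount for cat, amount in transactions if cat in child_ids)
    # Each transaction belongs to exactly one category, so nothing is
    # counted twice when the two parts are added together.
    return {"direct": direct, "children": children, "total": direct + children}


# Example hierarchy: Shopping (1) -> Groceries (2) -> Fresh Produce (3)
children_by_parent = {1: [2], 2: [3]}
transactions = [(1, 10.0), (2, 25.0), (3, 5.5), (4, 12.0)]
print(category_breakdown(1, children_by_parent, transactions))
```

Category 4 sits outside the Shopping subtree, so its spending does not appear in the breakdown.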

3. Date/Time Parsing from PDFs: Bank statements have inconsistent formatting, making reliable extraction difficult:

"20 Sep 2025" vs "20/09/2025" vs "2025-09-20"
Mixed layouts with varying column positions
Multi-line transaction descriptions

Solution: Position-based parsing using x-coordinates to identify columns, with fuzzy text matching for start/end markers like "BALANCEBROUGHTFORWARD".
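As a rough illustration of the idea, the sketch below buckets words into columns by their x-coordinate and then normalises the three date layouts listed above. The column boundaries are made-up values, and the word dictionaries only mimic the shape pdfplumber's extract_words() produces (just the "x0" and "text" keys are used), so no PDF is needed to run it. Parsing "Sep" with %b assumes an English locale.

```python
from datetime import datetime

# Date layouts seen in statements: "20 Sep 2025", "20/09/2025", "2025-09-20"
DATE_FORMATS = ("%d %b %Y", "%d/%m/%Y", "%Y-%m-%d")

# Hypothetical column boundaries (name, left edge, right edge) in PDF points.
COLUMNS = [("date", 0, 100), ("description", 100, 400), ("amount", 400, 600)]


def parse_date(text):
    """Try each known layout in turn; return None if none of them match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def column_for(x0):
    """Map a word's left x-coordinate to a column name, or None."""
    for name, left, right in COLUMNS:
        if left <= x0 < right:
            return name
    return None


def row_from_words(words):
    """Group the words of one statement line into columns, left to right."""
    cells = {}
    for word in sorted(words, key=lambda w: w["x0"]):
        name = column_for(word["x0"])
        if name is not None:
            cells.setdefault(name, []).append(word["text"])
    return {name: " ".join(parts) for name, parts in cells.items()}
```

Multi-line descriptions would need an extra step that merges rows lacking a date into the previous row; that is left out here.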

4. Empty Spending Items: Initially, the system assumed all transactions had itemized spending items, but bulk imports only had transaction-level totals.

Solution: Dual-path logic checking for spending items first, falling back to override_money field when items don't exist.
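The dual-path check might be sketched as follows. The override_money field name is from the write-up; representing a transaction as a dictionary with "spending_items", "price", and "quantity" keys is an assumption for the example.

```python
def transaction_total(transaction):
    """Sum itemised spending if present, else fall back to override_money."""
    items = transaction.get("spending_items") or []
    if items:
        # Path 1: itemised transaction, total is the sum of price * quantity.
        return sum(item["price"] * item["quantity"] for item in items)
    # Path 2: no items (e.g. a bulk-imported row), use the override amount.
    override = transaction.get("override_money")
    return override if override is not None else 0.0
```

Returning 0.0 when neither path has data is one possible choice; such transactions could instead be flagged as unassigned.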

5. Performance with Large Datasets: As transaction counts grew, recalculating all categories became slow.

Solution: Pandas vectorization for date grouping and aggregation, moving from row-by-row iteration to batch operations.
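One way to express the Monday-to-Sunday weekly grouping without a Python-level loop is shown below. The "date" and "amount" column names are assumptions; the project's real schema may differ.

```python
import pandas as pd


def weekly_totals(df):
    """Total spending per Monday-to-Sunday week, indexed by the week's Monday."""
    dates = pd.to_datetime(df["date"])
    # Vectorised: subtract each date's weekday offset (Monday = 0) to get
    # the Monday that starts its week.
    week_start = dates.dt.normalize() - pd.to_timedelta(dates.dt.weekday, unit="D")
    # One batch groupby instead of iterating over rows.
    return df["amount"].groupby(week_start).sum()


# Example: 15 Sep 2025 is a Monday, 21 Sep the following Sunday.
example = pd.DataFrame(
    {
        "date": ["2025-09-15", "2025-09-21", "2025-09-22"],
        "amount": [10.0, 5.0, 7.5],
    }
)
print(weekly_totals(example))
```

The first two rows fall in the week starting 15 Sep 2025 and the third in the week starting 22 Sep 2025.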

Built With

next.js
streamlit

Submitted to

GreatUniHack 2025

Created by

I worked on Money thing: the spending statistics, allowing users to see their spending patterns.

Jake Purton
Worked on CountryCollide. Worked on the merge feature and the AI features

Harry Foster
Fergus Young
Christian Harris

Updates

Jake Purton started this project — Nov 09, 2025 05:54 AM EST
