Cashback App
A comprehensive personal finance tracking application built with Python and Streamlit. Track your spending, categorize transactions, visualize trends, and gain insights into your financial habits.
About the Project
Inspiration
Money Thing 2 was born out of a simple need: understanding where my money actually goes. Like many people, I found traditional banking apps lacking in detail and flexibility. I wanted:
- Granular categorization with parent-child relationships (e.g., Shopping → Clothes → Shoes)
- Automated transaction imports from bank statements
- Visual spending trends to identify patterns over time
- Complete control over my financial data without relying on third-party services
The project evolved from a basic spreadsheet tracker into a full-featured web application that could handle everything from receipt scanning to multi-level category hierarchies.
What I Learned
Building this project taught me valuable lessons across multiple domains:
Backend Development:
- Database design with SQLite, managing complex relationships between transactions, products, vendors, and categories
- Thread safety in Python applications (solving the infamous "SQLite objects created in a thread can only be used in that same thread" error)
- Data aggregation patterns for calculating weekly spending: $\text{Weekly Spend} = \sum_{t \in \text{week}} \sum_{i \in \text{items}} p_i \times q_i$, where $p_i$ is the price and $q_i$ is the quantity of item $i$
Frontend Development:
- Streamlit framework for rapid prototyping and interactive dashboards
- Plotly for dynamic, responsive data visualizations
- State management across multiple pages and user sessions
Document Processing:
- PDF parsing with pdfplumber to extract transaction data from bank statements
- OCR with pytesseract to read digital receipts
- Text pattern matching to identify dates, amounts, and merchant names
Software Architecture:
- Separation of concerns with distinct modules for database management, UI components, and business logic
- Recursive algorithms for hierarchical category traversal
- Data transformation pipelines from raw inputs to aggregated insights
How It's Built
Tech Stack:
- Python 3.13 - Core language
- Streamlit - Web framework for the UI
- SQLite - Lightweight database
- Pandas - Data manipulation and analysis
- Plotly Express - Interactive charting
- pdfplumber - PDF text extraction
- pytesseract - OCR for receipt scanning
- bcrypt - Password hashing for authentication
Architecture:
moneything2/
├── main.py # Entry point, navigation
├── page/ # UI pages
│ ├── spending_view_page.py # Analytics dashboard
│ ├── transactions_page.py # Transaction input/upload
│ ├── categories_page.py # Category management
│ └── ...
├── src/ # Business logic
│ ├── db_manager.py # Database orchestration
│ ├── sql_database.py # SQLite wrapper
│ ├── pdf_reader.py # Bank statement parser
│ ├── receipt_reader.py # OCR receipt processor
│ └── db_classes/ # ORM-style table classes
└── database.db # SQLite database
Key Features:
Hierarchical Categories: Recursive tree structure allowing categories to have parents and children
- Shopping → Groceries → Fresh Produce
- Spending automatically rolls up from children to parents
Multi-Source Data Entry:
- Manual transaction input with auto-completion
- Bulk upload from HSBC bank statement PDFs
- Digital receipt scanning (Lidl receipts via OCR)
Smart Spending Analysis:
- Weekly aggregation: $\text{Week Total} = \sum_{d \in \text{Mon-Sun}} \text{transactions}_d$
- Category breakdown views showing subcategory contributions
- Unassigned transaction tracking for incomplete data
Flexible Data Model:
- Transactions can have override amounts OR itemized spending items
- Products linked to vendors and categories
- Money stores track different accounts/payment methods
Challenges Faced
1. SQLite Threading Issues
The biggest technical hurdle was Streamlit's multi-threaded architecture conflicting with SQLite's single-thread default. Dialogs and fragments run in separate threads, causing ProgrammingError when accessing the database.
Solution: Pass check_same_thread=False when opening the SQLite connection, and rely on Streamlit's execution model to keep database access sequential.
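A minimal sketch of that fix, using only the standard library. The table layout, the helper names, and the lock are assumptions made for illustration; the lock in particular is an optional extra safeguard that the original solution does not describe.

```python
import sqlite3
import threading

# Hypothetical module-level lock: an extra safeguard, not part of the
# original fix, which relies on Streamlit running callbacks sequentially.
_db_lock = threading.Lock()


def open_connection(path=":memory:"):
    # check_same_thread=False disables sqlite3's check that a connection
    # is only used from the thread that created it.
    return sqlite3.connect(path, check_same_thread=False)


conn = open_connection()
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount REAL)")


def insert_amount(amount):
    # Serialize writes so two threads never use the connection at once.
    with _db_lock:
        conn.execute("INSERT INTO transactions (amount) VALUES (?)", (amount,))
        conn.commit()


# Simulate a dialog or fragment running in a different thread.
worker = threading.Thread(target=insert_amount, args=(12.5,))
worker.start()
worker.join()

total = conn.execute("SELECT SUM(amount) FROM transactions").fetchone()[0]
```

Without check_same_thread=False, the insert from the worker thread would raise sqlite3.ProgrammingError.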
2. Hierarchical Category Aggregation
Calculating spending for a category while including all descendant categories required careful recursive logic. The challenge was avoiding double-counting and properly separating "direct" parent spending from child totals.
Solution: Implemented get_all_child_category_ids() to recursively collect all descendants, then filter transactions by this set. For breakdowns, separate parent-only transactions from child transactions.
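The recursion could look roughly like the sketch below. Only the function name get_all_child_category_ids comes from the project; its signature, the children_by_parent mapping, the sample data, and category_breakdown are assumptions for illustration.

```python
def get_all_child_category_ids(category_id, children_by_parent):
    """Collect the ids of every descendant of category_id (assumed signature)."""
    descendants = set()
    for child_id in children_by_parent.get(category_id, []):
        descendants.add(child_id)
        descendants |= get_all_child_category_ids(child_id, children_by_parent)
    return descendants


# Hypothetical tree: 1 = Shopping, 2 = Clothes, 3 = Groceries, 4 = Shoes
children_by_parent = {1: [2, 3], 2: [4]}

# Hypothetical (category_id, amount) pairs; category 9 is unrelated.
transactions = [(1, 5.0), (2, 20.0), (4, 60.0), (3, 15.0), (9, 99.0)]


def category_breakdown(category_id):
    child_ids = get_all_child_category_ids(category_id, children_by_parent)
    # Keep parent-only spending separate from descendant spending so that
    # no transaction is counted twice.
    direct = sum(amount for cat, amount in transactions if cat == category_id)
    from_children = sum(amount for cat, amount in transactions if cat in child_ids)
    return {"direct": direct, "children": from_children, "total": direct + from_children}


shopping = category_breakdown(1)
```

Because each transaction carries exactly one category id, it lands in either the direct bucket or the children bucket, never both.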
3. Date/Time Parsing from PDFs
Bank statements have inconsistent formatting, making reliable extraction difficult:
- "20 Sep 2025" vs "20/09/2025" vs "2025-09-20"
- Mixed layouts with varying column positions
- Multi-line transaction descriptions
Solution: Position-based parsing using x-coordinates to identify columns, with fuzzy text matching for start/end markers like "BALANCEBROUGHTFORWARD".
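A rough sketch of the three pieces involved: date normalization, column assignment by x-coordinate, and tolerant marker matching. The word dictionaries mimic the shape returned by pdfplumber's extract_words() (keys such as "text" and "x0"); the column boundaries, the 0.9 similarity threshold, and all function names are assumptions, not the project's actual values.

```python
from datetime import datetime
from difflib import SequenceMatcher

# The three date layouts named above, tried in order.
DATE_FORMATS = ("%d %b %Y", "%d/%m/%Y", "%Y-%m-%d")

# Hypothetical column boundaries in PDF points (left inclusive, right exclusive).
COLUMN_BOUNDS = {"date": (0, 90), "description": (90, 350), "amount": (350, 600)}


def parse_statement_date(text):
    """Return a date for any supported layout, or None if nothing matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def column_for(word):
    """Assign a word to a column based on its left edge (x0)."""
    for name, (left, right) in COLUMN_BOUNDS.items():
        if left <= word["x0"] < right:
            return name
    return None


def is_marker(text, marker="BALANCEBROUGHTFORWARD", threshold=0.9):
    """Fuzzy match that ignores whitespace and tolerates small extraction errors."""
    normalized = "".join(text.split()).upper()
    return SequenceMatcher(None, normalized, marker).ratio() >= threshold


word = {"text": "TESCO", "x0": 100.0}
column = column_for(word)
parsed = [parse_statement_date(s) for s in ("20 Sep 2025", "20/09/2025", "2025-09-20")]
```

Note that %b depends on the active locale, so month abbreviations such as "Sep" are assumed to be English here.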
4. Empty Spending Items
Initially, the system assumed all transactions had itemized spending items, but bulk imports only had transaction-level totals.
Solution: Dual-path logic checking for spending items first, falling back to override_money field when items don't exist.
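The fallback might be sketched as follows. The field name override_money comes from the project; the dataclasses, the items attribute, and transaction_total are hypothetical stand-ins for the real table classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SpendingItem:
    price: float
    quantity: float = 1


@dataclass
class Transaction:
    override_money: Optional[float] = None
    items: List[SpendingItem] = field(default_factory=list)


def transaction_total(tx):
    # Path 1: itemized spending takes priority when it exists.
    if tx.items:
        return sum(item.price * item.quantity for item in tx.items)
    # Path 2: bulk imports only carry a transaction-level total.
    if tx.override_money is not None:
        return tx.override_money
    # Neither source of data is present.
    return 0.0


itemized = Transaction(items=[SpendingItem(2.5, 2), SpendingItem(10.0)])
imported = Transaction(override_money=42.0)
```

Here transaction_total(itemized) sums the line items, while transaction_total(imported) returns the override amount.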
5. Performance with Large Datasets
As transaction counts grew, recalculating all categories became slow.
Solution: Pandas vectorization for date grouping and aggregation, moving from row-by-row iteration to batch operations.
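As an illustration of the batch approach, the sketch below computes Monday-to-Sunday weekly totals without looping over rows. The column names and sample data are assumptions; the real schema may differ.

```python
import pandas as pd

# Hypothetical spending items, one row per purchased item.
items = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2025-09-15", "2025-09-17", "2025-09-21", "2025-09-22"]
        ),
        "price": [2.50, 10.00, 4.00, 3.00],
        "quantity": [2, 1, 3, 1],
    }
)


def weekly_spend(df):
    # Vectorized price * quantity for every row at once.
    line_total = df["price"] * df["quantity"]
    # Map each date to the Monday that starts its week (weekday: Monday = 0).
    week_start = df["date"].dt.normalize() - pd.to_timedelta(
        df["date"].dt.weekday, unit="D"
    )
    # One grouped sum replaces a Python loop over transactions.
    return line_total.groupby(week_start).sum()


totals = weekly_spend(items)
```

The result is a Series indexed by week start date, so the first three rows (Monday 15 Sep through Sunday 21 Sep 2025) fall into one week and the last row starts the next.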
Built With
- next.js
- streamlit

