Rosehack 2026 - AI-Powered Journaling CLI


Overview

My goal is to build a robust, interpretable, and resilient text emotion analysis tool. Users can enter free-form text and quickly obtain:

  • Overall Mood: Very Positive to Very Negative
  • Sentiment Score: Positive vs. Negative probability
  • Contextual Emotions: Top detected emotions from the text
  • Persistence: Save entries locally in SQLite for historical tracking

This project focuses on performance and robustness while keeping the logic clean, the code readable, and the results reproducible, and while handling edge cases gracefully.


Background

In real-world applications, raw textual input is rarely well-structured. The tool is designed to handle linguistic ambiguity and context by addressing common challenges like mixed slang, idioms, emojis, and inconsistent grammar.

Text Input                          Expected Sentiment        Contextual Challenge
"I'm crushing it at work!"          Positive / Excited        "Crushing" can also be negative
"The workload is crushing me..."    Negative / Disappointed   Same keyword, different context
"lmao that was wild 😂"             Positive / Amusement      Emoji conveys strong sentiment

Installation & Setup

Since this project relies on powerful pre-trained transformer models and platform-specific Python environments, the setup steps must be clearly defined.

1. Dependencies and Environment

The application requires Python 3.10+ and several external libraries.

  • torch: The core framework for running the model inference. I chose PyTorch for its widespread support in the academic and industrial ML community.
  • transformers: Hugging Face's library, used to easily download and manage the pre-trained RoBERTa (sentiment) and BERT (emotion) models. This abstracts away complex model loading logic.
  • nltk: Used here primarily for basic text processing necessities, such as tokenization.
  • sqlite3: This is used for persistence and is part of Python's standard library (sqlite3 module). I chose this for its zero-dependency, self-contained nature, making the project highly portable and resilient across different operating systems without requiring external database services.

2. Setup Steps

Follow these steps to set up the project locally. The commands work on macOS, Linux, and Windows (via an Anaconda/WSL terminal); Windows-native equivalents are noted where they differ.

# 1. Clone the repository
git clone https://github.com/kumaraaryan511/Rosehack_sentiment_analysis_tool.git
cd AI-Powered-Journaling-Logic-App

# 2. Create and activate a virtual environment
python3 -m venv venv        # Windows: python -m venv venv
source venv/bin/activate    # Windows: venv\Scripts\activate

# 3. Install required packages
pip install torch transformers nltk hf-xet

# 4. Download NLTK data
python -c "import nltk; nltk.download('punkt')"

# 5. Run the app
python app.py

# NOTE: on the first run, app.py downloads the model weights (model.safetensors);
# this can take a while. Once the download finishes, press Enter to continue
# (this prompt only appears on the first run).


Technical Design & Methodology

This project is built on a dual-model architecture to ensure both high-level mood assessment and granular emotional insight, explicitly balancing accuracy with context sensitivity.

Sentiment Analysis

Model: cardiffnlp/twitter-roberta-base-sentiment-latest

Use: Strong at general sentiment classification, even on ambiguous or informal text

Process: Tokenize → Compute softmax → Calculate Score: (Positive - Negative) → Map score to mood thresholds.

Justification for using RoBERTa:

I initially prototyped with VADER (Valence Aware Dictionary and sEntiment Reasoner), but quickly found it failed in real-world scenarios, particularly with modern slang, emojis, and contextual ambiguities (e.g., failing to distinguish "This job is sick" from "I am sick of this job"). The RoBERTa model is excellent at understanding general sentiment even with messy, abbreviated language, providing a robust foundation.
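The tokenize → softmax → (Positive − Negative) pipeline above can be sketched as follows. This is a minimal illustration, not the app's actual code: the function name is invented, and it assumes the model's config maps labels to "negative"/"neutral"/"positive" (as this checkpoint's model card documents).

```python
# Sketch of the sentiment step; sentiment_score() is an illustrative name.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "cardiffnlp/twitter-roberta-base-sentiment-latest"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
# Build a case-insensitive label -> index map from the checkpoint's config.
LABELS = {name.lower(): idx for idx, name in model.config.id2label.items()}

def sentiment_score(text: str) -> float:
    """Return P(positive) - P(negative), a score in [-1.0, 1.0]."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    return (probs[LABELS["positive"]] - probs[LABELS["negative"]]).item()
```

The score stays in [-1, 1] by construction, since both probabilities come from the same softmax.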

Contextual Emotion Extraction

Model: monologg/bert-base-cased-goemotions-original

Use: Good at picking up specific emotions (anger, joy, excitement), but struggles with ambiguity

Process: Tokenize → Apply sigmoid to logits → Zero-out neutral label → Filter for top 3 strongest.

Justification for GoEmotions:

While I tested several other multi-label BERT models, this model, trained on the GoEmotions dataset (known for its fine-grained classification across 27 distinct emotion labels), consistently outperformed them at detecting specific psychological states like admiration, grief, or surprise. Its strength in picking up specific emotions makes it well suited to a journaling tool.
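The tokenize → sigmoid → drop-neutral → top-3 pipeline can be sketched like this. It is an illustration under two assumptions: the helper name is invented, and the checkpoint's config is assumed to carry human-readable emotion names in id2label (if it only has generic LABEL_N names, you would supply the GoEmotions label list yourself).

```python
# Sketch of the emotion step; top_emotions() is an illustrative name.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "monologg/bert-base-cased-goemotions-original"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

def top_emotions(text: str, k: int = 3, threshold: float = 0.3):
    """Return up to k (label, probability) pairs above the cutoff."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.sigmoid(logits)[0]  # multi-label: sigmoid, not softmax
    pairs = []
    for i, p in enumerate(probs.tolist()):
        label = model.config.id2label[i]
        if label.lower() == "neutral":
            continue  # zero-out / skip the neutral label
        pairs.append((label, p))
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return [(label, p) for label, p in pairs[:k] if p >= threshold]
```

Sigmoid (not softmax) matters here: GoEmotions is multi-label, so each emotion gets an independent probability and several can be strong at once.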


Tests

The repository includes a file of test inputs and their resulting output: test_outputs.txt.


Features

1. Overall Mood Analysis

Sentiment is calculated as Positive probability − Negative probability using the cardiffnlp/twitter-roberta-base-sentiment-latest model.

Sentiment Score          Mood Label
score > 0.6              VERY POSITIVE
0.2 < score ≤ 0.6        POSITIVE
-0.2 ≤ score ≤ 0.2       NEUTRAL
-0.6 ≤ score < -0.2      NEGATIVE
score < -0.6             VERY NEGATIVE
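The thresholds above translate directly into a small helper. This is a sketch: the function name is illustrative, and the behavior exactly at the cut-points is an assumption.

```python
def mood_label(score: float) -> str:
    """Map a sentiment score in [-1.0, 1.0] to a mood label."""
    if score > 0.6:
        return "VERY POSITIVE"
    if score > 0.2:
        return "POSITIVE"
    if score >= -0.2:
        return "NEUTRAL"
    if score >= -0.6:
        return "NEGATIVE"
    return "VERY NEGATIVE"
```

Checking the conditions from most positive to most negative keeps each branch a single comparison and guarantees every score gets exactly one label.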

2. Contextual Emotions

Uses monologg/bert-base-cased-goemotions-original to detect emotions.

  • Processes the top 25 predicted emotions but displays only the 3 strongest to avoid clutter.
  • Neutral or weak emotions are ignored. If no strong emotions are detected, the output is: No strong emotions detected.
  • Positive Set: Admiration, Amusement, Approval, Caring, Confidence, Curiosity, Desire, Excitement, Gratitude, Joy, Love, Optimism, Pride, Relief, Surprise
  • Negative Set: Anger, Annoyance, Disappointment, Disgust, Embarrassment, Fear, Grief, Nervousness, Remorse, Sadness

3. Persistence + Database schema

All entries are stored in a SQLite database (history.db).

Field     Type                  Description
id        INTEGER PRIMARY KEY   Unique identifier for the entry.
text      TEXT NOT NULL         Raw user input text.
score     REAL NOT NULL         Calculated sentiment score.
emotion   TEXT NOT NULL         Formatted string of top emotions.

CLI Options for History:

  • Show last 3 entries: Quick recap of recent input and predictions.
  • Show all entries: Full historical log.
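A minimal sketch of this persistence layer using only the standard-library sqlite3 module. The table name (entries) and helper names are assumptions for illustration; the column names follow the schema above.

```python
import sqlite3

def init_db(path: str = "history.db") -> sqlite3.Connection:
    """Open (or create) the database and ensure the entries table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS entries (
               id      INTEGER PRIMARY KEY,
               text    TEXT NOT NULL,
               score   REAL NOT NULL,
               emotion TEXT NOT NULL
           )"""
    )
    return conn

def save_entry(conn, text: str, score: float, emotion: str) -> None:
    """Persist one analyzed journal entry."""
    conn.execute(
        "INSERT INTO entries (text, score, emotion) VALUES (?, ?, ?)",
        (text, score, emotion),
    )
    conn.commit()

def last_entries(conn, n: int = 3):
    """Fetch the n most recent entries (backs the 'Show last 3' option)."""
    return conn.execute(
        "SELECT text, score, emotion FROM entries ORDER BY id DESC LIMIT ?",
        (n,),
    ).fetchall()
```

Passing ":memory:" to init_db gives a throwaway in-memory database, which is handy for testing without touching history.db.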

4. User Experience (UX) Design

The CLI features clear formatting, fixed-width separators, and aligned columns for readability.

Edge Case Handling

  • Empty input is ignored.

  • Long input is truncated at 5000 characters.

  • AI model exceptions are handled with try/except blocks.

  • Invalid scores are programmatically coerced to the range [-1.0, 1.0].
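The input and score rules above can be sketched as two small helpers (the names and the strip-whitespace behavior are illustrative assumptions, not the app's own code):

```python
MAX_LEN = 5000  # truncation limit for long input, per the rules above

def sanitize(text: str):
    """Return cleaned text, or None for empty input (which is ignored)."""
    text = text.strip()
    if not text:
        return None
    return text[:MAX_LEN]

def clamp_score(score: float) -> float:
    """Coerce an out-of-range model score into [-1.0, 1.0]."""
    return max(-1.0, min(1.0, score))
```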
