Project Overview

Parkinson's disease (PD) is a progressive neurological disorder that affects movement and speech. Early detection is crucial for slowing progression and improving patient quality of life. In this project, we develop a machine learning-based system to predict whether a person has Parkinson’s disease using voice measurements.

🎯 Objective

To build a classification model that analyzes biomedical voice measurements and predicts the presence or absence of Parkinson’s disease.

📂 Dataset

Source: UCI Machine Learning Repository – Parkinson’s Disease Dataset

Instances: 195 voice recordings

Features: 22 biomedical voice measurements including:

MDVP:Fo(Hz) – Average vocal fundamental frequency

MDVP:Jitter(%), MDVP:Shimmer – Measures of variation in frequency and amplitude

NHR, HNR – Noise-to-harmonics and harmonics-to-noise ratios

DFA, PPE – Signal fractal scaling exponent (DFA) and a nonlinear measure of fundamental frequency variation (PPE)

Target variable: status (1 = Parkinson’s, 0 = Healthy)
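
A minimal sketch of separating the voice features from the target, assuming the data is loaded into a pandas DataFrame with a status column and a non-numeric identifier column. The helper name split_features_target, the name column, and the demo values are illustrative assumptions, not taken from the project code.

```python
import pandas as pd


def split_features_target(df: pd.DataFrame, target: str = "status"):
    """Return (X, y): numeric voice features and the binary target."""
    # Drop the target, then keep numeric columns only so that an
    # identifier column such as "name" (assumed) is excluded.
    X = df.drop(columns=[target]).select_dtypes(include="number")
    y = df[target].astype(int)
    return X, y


# Tiny stand-in frame with made-up values. With the real file one would
# load it first, e.g. pd.read_csv("parkinsons.data") (assumed filename).
demo = pd.DataFrame({
    "name": ["rec_01", "rec_02", "rec_03", "rec_04"],
    "MDVP:Fo(Hz)": [119.9, 122.4, 197.1, 202.3],
    "MDVP:Jitter(%)": [0.0078, 0.0097, 0.0029, 0.0025],
    "HNR": [21.0, 19.1, 26.8, 25.7],
    "status": [1, 1, 0, 0],
})

X, y = split_features_target(demo)
print(X.shape, y.value_counts().to_dict())
```

Filtering on numeric dtypes keeps the helper independent of the exact identifier column name.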

🔧 Tools & Technologies

Language: Python

Libraries: pandas, numpy, matplotlib, seaborn, scikit-learn

Models Used:

Logistic Regression

Support Vector Machine (SVM)

Decision Tree

Random Forest

🔍 Workflow

Data Preprocessing:

Handle missing/null values (if any)

Feature scaling using StandardScaler

80:20 Train-Test split
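
The preprocessing steps above could look roughly like the sketch below. It uses synthetic data shaped like the dataset (195 rows, 22 features), and the stratified split and random_state values are assumptions. Fitting the scaler on the training rows only is one common way to keep test-set statistics out of training.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the voice features: 195 rows, 22 columns,
# with roughly three positive labels for every negative one.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(195, 22))
y = (rng.random(195) < 0.75).astype(int)

# Check for missing values before doing anything else.
n_missing = int(np.isnan(X).sum())

# 80:20 split; stratify keeps the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training rows only, then reuse it on the test rows.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(n_missing, X_train_scaled.shape, X_test_scaled.shape)
```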

Exploratory Data Analysis (EDA):

Visualize class distribution

Correlation heatmaps

Outlier detection and treatment (if needed)
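
As a rough illustration of the EDA steps, the sketch below computes the numbers behind each plot on synthetic data: class counts, a correlation matrix, and outlier flags. The 1.5 × IQR rule is an assumed choice here, since the project does not state which outlier criterion it uses.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in with two feature columns and a binary target.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "MDVP:Fo(Hz)": rng.normal(150, 40, 100),
    "HNR": rng.normal(22, 4, 100),
    "status": rng.integers(0, 2, 100),
})
df.loc[0, "HNR"] = 100.0  # plant one obvious outlier for the demo

# Class distribution: how many recordings fall in each class.
class_counts = df["status"].value_counts()

# Pairwise Pearson correlations between features (the heatmap input).
features = df.drop(columns="status")
corr = features.corr()

# Flag values more than 1.5 * IQR outside the quartiles (assumed rule).
q1, q3 = features.quantile(0.25), features.quantile(0.75)
iqr = q3 - q1
outlier_mask = (
    features.lt(q1 - 1.5 * iqr, axis="columns")
    | features.gt(q3 + 1.5 * iqr, axis="columns")
)

print(class_counts.to_dict())
print(outlier_mask.sum().to_dict())
```

To draw the plots themselves, corr can be passed to seaborn.heatmap and the status column to seaborn.countplot.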

Model Training:

Train multiple models (Logistic Regression, SVM, Decision Tree, Random Forest)

Use cross-validation for performance comparison
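
One way the cross-validated comparison might be set up is sketched below on synthetic data. The fold count, the F1 scoring choice, and all hyperparameters are assumptions rather than the project's actual settings. Scaling is placed inside a pipeline for the two scale-sensitive models so that it is refit within each fold.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with the same shape and class imbalance as the dataset.
X, y = make_classification(
    n_samples=195, n_features=22, n_informative=8,
    weights=[0.25, 0.75], random_state=0,
)

models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Stratified folds keep the class ratio stable across splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
    cv_scores[name] = scores.mean()
    print(f"{name}: mean F1 = {scores.mean():.3f} (+/- {scores.std():.3f})")
```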

Evaluation Metrics:

Accuracy

Precision, Recall, F1-score

Confusion Matrix

ROC curve and AUC
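
The metrics listed above can all be computed with scikit-learn. The sketch below does so for a single classifier on synthetic data; the choice of logistic regression and the label names are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=195, n_features=22, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)               # hard labels for accuracy, F1
y_score = clf.predict_proba(X_test)[:, 1]  # class-1 probability for ROC-AUC

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted
auc = roc_auc_score(y_test, y_score)

# Precision, recall and F1 per class in one table.
print(classification_report(y_test, y_pred,
                            target_names=["Healthy", "Parkinson's"]))
print(f"accuracy={acc:.3f}  ROC-AUC={auc:.3f}")
print(cm)
```

ROC-AUC is computed from predicted probabilities rather than hard labels, which is why y_score is kept separate from y_pred.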

Model Comparison:

Identify the best-performing model based on test accuracy and F1-score
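
A possible way to pick the winner is sketched below: each model is scored on the held-out split and the results are ranked by F1, with accuracy as the tie-breaker. That ordering is an assumption, since the project names both metrics without saying which takes priority. The data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=195, n_features=22, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Collect (name, accuracy, F1) for every model on the same test split.
results = []
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    results.append((name, accuracy_score(y_test, pred), f1_score(y_test, pred)))

# Rank by F1 first, then accuracy (assumed priority).
results.sort(key=lambda r: (r[2], r[1]), reverse=True)
best_name = results[0][0]

for name, acc, f1 in results:
    print(f"{name:20s} accuracy={acc:.3f}  F1={f1:.3f}")
print("Best model:", best_name)
```

With only 39 test recordings, small score differences between models may not be meaningful, so the cross-validated scores are worth checking alongside this ranking.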

Built With

  • matplotlib
  • numpy
  • seaborn
  • pandas
  • python