Project Overview
Parkinson’s disease (PD) is a progressive neurological disorder that affects movement and speech. Early detection can support earlier intervention and improve patient quality of life. In this project, we develop a machine learning-based system that predicts whether a person has Parkinson’s disease from voice measurements.
🎯 Objective
To build a classification model that analyzes biomedical voice measurements and predicts the presence or absence of Parkinson’s disease.
📂 Dataset
Source: UCI Machine Learning Repository – Parkinson’s Disease Dataset
Instances: 195 voice recordings
Features: 22 biomedical voice measurements including:
MDVP:Fo(Hz) – Average vocal fundamental frequency
MDVP:Jitter(%), MDVP:Shimmer – Measures of variation in frequency and amplitude
NHR, HNR – Noise-to-harmonics and harmonics-to-noise ratios
DFA, PPE – Signal fractal scaling exponent and a nonlinear measure of fundamental frequency variation
Target variable: status (1 = Parkinson’s, 0 = Healthy)
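As a rough illustration of how the features and the target might be separated, here is a minimal sketch. The file name `parkinsons.data`, the `name` identifier column, and the helper `split_features_target` are assumptions for illustration, and the small demo frame uses made-up placeholder values rather than real recordings.

```python
import pandas as pd


def split_features_target(df, target="status", id_col="name"):
    """Return (X, y): the voice-measurement columns and the binary target.

    `split_features_target` is a hypothetical helper, not part of the project.
    """
    drop_cols = [c for c in (target, id_col) if c in df.columns]
    X = df.drop(columns=drop_cols)
    y = df[target].astype(int)
    return X, y


# With the real file (the path is an assumption):
# df = pd.read_csv("parkinsons.data")

# Tiny placeholder frame so the sketch runs on its own; values are made up.
demo = pd.DataFrame({
    "name": ["rec_01", "rec_02", "rec_03"],
    "MDVP:Fo(Hz)": [120.0, 198.5, 150.2],
    "HNR": [21.0, 25.5, 19.8],
    "status": [1, 0, 1],
})

X, y = split_features_target(demo)
print(X.shape)
print(y.value_counts().to_dict())
```

Dropping the identifier column keeps a non-numeric field out of the scaler and the models.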
🔧 Tools & Technologies
Language: Python
Libraries: pandas, numpy, matplotlib, seaborn, scikit-learn
Models Used:
Logistic Regression
Support Vector Machine (SVM)
Decision Tree
Random Forest
🔍 Workflow
Data Preprocessing:
Handle missing/null values (if any)
Feature scaling using StandardScaler
80:20 Train-Test split
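The preprocessing steps above could be sketched roughly as follows. The data is a synthetic stand-in generated with `make_classification` (195 rows, 22 features), not the real recordings. Stratifying the split and fitting the scaler on the training portion only are choices made for this sketch, so the test set does not influence the scaling statistics.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the dataset's shape; NOT the real voice data.
X, y = make_classification(n_samples=195, n_features=22, n_informative=8,
                           weights=[0.25, 0.75], random_state=42)

# Check for missing values before any further processing.
n_missing = int(np.isnan(X).sum())
print("missing values:", n_missing)

# 80:20 split; stratify keeps the class ratio similar in both parts (assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Fit the scaler on the training data only, then apply it to both parts.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```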
Exploratory Data Analysis (EDA):
Visualize class distribution
Correlation heatmaps
Outlier detection and treatment (if needed)
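The EDA steps can be approximated numerically before plotting anything. The sketch below uses a synthetic stand-in frame with placeholder column names; the 1.5 × IQR rule for flagging outliers is an assumption, since the write-up does not say which method was used.

```python
import pandas as pd
from sklearn.datasets import make_classification

# Synthetic stand-in; column names are placeholders, not the real feature names.
X, y = make_classification(n_samples=195, n_features=22, n_informative=8,
                           weights=[0.25, 0.75], random_state=42)
cols = [f"feature_{i}" for i in range(X.shape[1])]
df = pd.DataFrame(X, columns=cols)
df["status"] = y

# Class distribution (could be plotted with sns.countplot(x="status", data=df)).
class_counts = df["status"].value_counts().sort_index()
print(class_counts)

# Correlation matrix (could be plotted with sns.heatmap(corr, cmap="coolwarm")).
corr = df.corr()
target_corr = corr["status"].drop("status").abs().sort_values(ascending=False)
print(target_corr.head())

# Flag outliers per feature with the 1.5 * IQR rule (an assumed method).
q1 = df[cols].quantile(0.25)
q3 = df[cols].quantile(0.75)
iqr = q3 - q1
outlier_mask = (df[cols] < q1 - 1.5 * iqr) | (df[cols] > q3 + 1.5 * iqr)
outliers_per_feature = outlier_mask.sum()
print(outliers_per_feature.sort_values(ascending=False).head())
```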
Model Training:
Train multiple models (Logistic Regression, SVM, Decision Tree, Random Forest)
Use cross-validation for performance comparison
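A cross-validated comparison of the four models might look like the sketch below. It runs on a synthetic stand-in dataset, and the 5-fold stratified scheme, the F1 scoring, and all hyperparameters are assumptions. The scale-sensitive models are wrapped in a pipeline so that scaling is refit inside each fold.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with the dataset's shape; NOT the real voice data.
X, y = make_classification(n_samples=195, n_features=22, n_informative=8,
                           weights=[0.25, 0.75], random_state=42)

# Hyperparameters are library defaults or arbitrary picks for the sketch.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# 5-fold stratified CV (an assumed setting), scored with F1.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="f1")
    for name, model in models.items()
}

for name, scores in cv_scores.items():
    print(f"{name}: mean F1 = {scores.mean():.3f} (std {scores.std():.3f})")
```

With only 195 recordings, fold-to-fold variation can be large, so the standard deviation is worth reading alongside the mean.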
Evaluation Metrics:
Accuracy
Precision, Recall, F1-score
Confusion Matrix
ROC curve and AUC
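The metrics listed above can all be computed with scikit-learn. The sketch below evaluates a single logistic regression pipeline on a synthetic stand-in dataset; the model choice and settings are assumptions made only to produce predictions to score.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score,
                             roc_curve)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the dataset's shape; NOT the real voice data.
X, y = make_classification(n_samples=195, n_features=22, n_informative=8,
                           weights=[0.25, 0.75], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)                # hard class labels
y_score = model.predict_proba(X_test)[:, 1]   # probability of class 1

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_score),
}
cm = confusion_matrix(y_test, y_pred)          # rows: true class, cols: predicted
fpr, tpr, thresholds = roc_curve(y_test, y_score)

for key, value in metrics.items():
    print(f"{key}: {value:.3f}")
print(cm)
```

Note that ROC-AUC is computed from predicted probabilities, while the other metrics use hard class labels.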
Model Comparison:
Identify the best-performing model based on test accuracy and F1-score
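One way to carry out the comparison is to collect test accuracy and F1 for every model in a table and sort it. The sketch below uses a synthetic stand-in dataset; ranking by F1 first and accuracy second is an assumption, since the write-up does not say how ties between the two metrics are resolved.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with the dataset's shape; NOT the real voice data.
X, y = make_classification(n_samples=195, n_features=22, n_informative=8,
                           weights=[0.25, 0.75], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model on the training split and score it on the held-out split.
rows = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    rows[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }

# Rank by F1 first, then accuracy (the ordering is an assumption).
results = pd.DataFrame.from_dict(rows, orient="index")
results = results.sort_values(["f1", "accuracy"], ascending=False)
best_model_name = results.index[0]

print(results.round(3))
print("best model:", best_model_name)
```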
Built With
- matplotlib
- numpy
- seaborn
- pandas
- python