Inspiration

EEG and brain-computer interfaces sound futuristic, but once we started working with real EEG data, we realized how messy and difficult it is. Motor imagery EEG is especially interesting because it could let people control devices without moving, which has real practical value.

We wanted to build something that works in a realistic way, not something that only looks good on our own data. So our main focus became generalizing across participants and avoiding data leakage.

What it does

NeuroWave takes EEG epochs shaped like (N, 64, 656) and predicts the motor imagery class for each epoch.

It automatically generates the required submission files:

  • For every X_eval_<id>.npy (or X_test_<id>.npy), it outputs y_pred_<id>.npy with shape (N, 1).

Then we zip the folder into predictions.zip for submission.
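The export step can be sketched roughly as follows. This is a minimal sketch, not our exact script: the helper names `write_predictions` and `zip_predictions` are hypothetical, and the flat archive layout (files at the top level of the zip) is an assumption.

```python
import zipfile
from pathlib import Path

import numpy as np


def write_predictions(pred_by_id, out_dir):
    """Save one y_pred_<id>.npy per id, each with shape (N, 1)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for pid, labels in sorted(pred_by_id.items()):
        y = np.asarray(labels).reshape(-1, 1)  # (N,) -> (N, 1)
        path = out_dir / f"y_pred_{pid}.npy"
        np.save(path, y)
        paths.append(path)
    return paths


def zip_predictions(paths, zip_path):
    """Bundle the prediction files into one archive (flat layout is assumed)."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            zf.write(path, arcname=path.name)
    return Path(zip_path)
```

Reshaping to (N, 1) in one place keeps the output shape consistent no matter how the model returns its labels.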

How we built it

We built a full pipeline in Python:

1) Loading + pairing files

Training data comes as:

  • X_train_<id>.npy
  • y_train_<id>.npy

Evaluation data comes as:

  • X_eval_<id>.npy (also supports X_test_<id>.npy)
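A minimal sketch of how the pairing could work, assuming all files sit in one folder. The function names `pair_train_files` and `find_eval_files` are hypothetical, and preferring X_eval over X_test when both exist for the same id is an assumption.

```python
from pathlib import Path


def pair_train_files(data_dir):
    """Map id -> (X_train path, y_train path); fail loudly on a missing label file."""
    data_dir = Path(data_dir)
    pairs = {}
    for x_path in sorted(data_dir.glob("X_train_*.npy")):
        pid = x_path.stem[len("X_train_"):]
        y_path = data_dir / f"y_train_{pid}.npy"
        if not y_path.exists():
            raise FileNotFoundError(f"missing labels for id {pid}: {y_path}")
        pairs[pid] = (x_path, y_path)
    return pairs


def find_eval_files(data_dir):
    """Map id -> evaluation file, accepting X_eval_ or X_test_ prefixes."""
    data_dir = Path(data_dir)
    found = {}
    for prefix in ("X_eval_", "X_test_"):
        for path in sorted(data_dir.glob(f"{prefix}*.npy")):
            # setdefault keeps the X_eval_ file if both prefixes exist for one id
            found.setdefault(path.stem[len(prefix):], path)
    return found
```

Raising on an unpaired file is a deliberate choice here: a silent skip would make it easy to train on fewer participants than intended.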

2) Features

We tried two types of features:

  • Bandpower features (basic EEG bands like delta/theta/mu/beta/gamma + simple stats)
  • CovLog features (covariance-based features that capture relationships between channels)
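As an illustration of the bandpower idea, here is a minimal sketch using Welch's method from SciPy. The band edges in `BANDS` and the sampling rate passed as `fs` are assumptions (neither is stated above), the function name is hypothetical, and the extra "simple stats" are left out.

```python
import numpy as np
from scipy.signal import welch

# Assumed band edges in Hz; only the band names are given in the write-up.
BANDS = {"delta": (1, 4), "theta": (4, 8), "mu": (8, 13), "beta": (13, 30), "gamma": (30, 45)}


def bandpower_features(X, fs, bands=BANDS):
    """X: (N, channels, samples) -> (N, n_bands * channels) log band powers."""
    freqs, psd = welch(X, fs=fs, nperseg=min(256, X.shape[-1]), axis=-1)
    feats = []
    for lo, hi in bands.values():
        mask = (freqs >= lo) & (freqs < hi)
        # Mean power spectral density inside the band, log-scaled
        feats.append(np.log(psd[..., mask].mean(axis=-1) + 1e-12))
    # Columns are grouped band by band: all channels for delta, then theta, ...
    return np.concatenate(feats, axis=1)
```

For epochs shaped (N, 64, 656) and five bands, this yields 320 features per epoch.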

CovLog helped the most because motor imagery doesn’t show up in a single channel; it shows up as patterns across channels.
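One common way to build log-covariance features is sketched below. It assumes "CovLog" means the matrix logarithm of each epoch's channel covariance, flattened to its upper triangle. The function name and the `shrink` regularization value are hypothetical.

```python
import numpy as np


def covlog_features(X, shrink=1e-3):
    """X: (N, channels, samples) -> (N, channels * (channels + 1) // 2)."""
    X = X - X.mean(axis=-1, keepdims=True)               # remove per-channel offset
    cov = X @ X.transpose(0, 2, 1) / (X.shape[-1] - 1)   # (N, C, C) sample covariance
    n_ch = cov.shape[-1]
    # A small ridge, scaled by the mean channel variance, keeps eigenvalues positive
    trace = np.trace(cov, axis1=1, axis2=2)[:, None, None]
    cov = cov + shrink * (trace / n_ch) * np.eye(n_ch)
    # Matrix log of a symmetric positive-definite matrix via eigendecomposition
    w, v = np.linalg.eigh(cov)
    log_cov = (v * np.log(w)[:, None, :]) @ v.transpose(0, 2, 1)
    rows, cols = np.triu_indices(n_ch)
    return log_cov[:, rows, cols]
```

With 64 channels this gives 2080 features per epoch, and every off-diagonal entry describes a pair of channels rather than a single one.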

3) Validation

We used GroupKFold by participant ID, so the same person’s data never appears in both training and validation. This matters a lot in EEG because otherwise the score can look “good” but it won’t generalize.
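A minimal sketch of the participant-wise split with scikit-learn's GroupKFold. The helper name `participant_folds` and the default of five folds are assumptions. The `groups` array holds one participant ID per epoch.

```python
import numpy as np
from sklearn.model_selection import GroupKFold


def participant_folds(groups, n_splits=5):
    """Return (train_idx, val_idx) pairs with no participant on both sides."""
    groups = np.asarray(groups)
    placeholder = np.zeros((len(groups), 1))  # the splitter only needs the row count
    folds = list(GroupKFold(n_splits=n_splits).split(placeholder, groups=groups))
    for train_idx, val_idx in folds:
        # Leakage guard: the two sides must not share any participant ID
        assert not set(groups[train_idx]) & set(groups[val_idx])
    return folds
```

The number of folds cannot exceed the number of distinct participants, so a small cohort needs a smaller `n_splits`.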

4) Model

We trained an XGBoost multiclass classifier and used a simple fold ensemble by averaging predicted probabilities.

Challenges we ran into

  • EEG data is noisy and unpredictable.
  • Dataset folders and file names were easy to mess up.
  • Setting up XGBoost on Mac had issues (OpenMP).
  • We had to be careful with validation to avoid leakage.

Accomplishments that we’re proud of

  • We built a pipeline that trains, validates correctly, and produces submission files automatically.
  • Moving from bandpower to CovLog features gave a noticeable improvement.
  • We generated helpful diagnostics like a confusion matrix and top feature importances for the presentation.

What we learned

  • EEG generalization is hard, especially across different people.
  • Validation design matters a lot more than people think: splitting by participant is what keeps the score honest.
  • Covariance-based features are powerful for motor imagery.

What’s next

If we had more time, we’d improve NeuroWave in a few ways:

  • CSP (Common Spatial Patterns): It’s a popular technique for motor imagery EEG. It learns better “spatial filters” across channels, so the classes can be separated more clearly.
  • Riemannian methods: Since we’re already using covariance-based features, Riemannian classifiers could be a stronger fit and might generalize better across different people.
  • Hyperparameter tuning: We used solid default settings, but tuning XGBoost properly (with GroupKFold) could boost accuracy.
  • Small demo/visuals: We’d add a simple page that shows an EEG sample, the predicted class, and confidence. That would make it easier for judges to understand.

We also built PaperPilot-Gemini, and we’d fully connect it to NeuroWave. The idea is: after training, it would automatically generate a clean report in Markdown (summary, what we tried, results like confusion matrix/feature importance, and what to do next). That would make the project easier to explain and easier for others to reproduce.
