NeuroWave

Inspiration

EEG and brain-computer interfaces sound futuristic, but once we started working with real EEG data, we realized how messy and difficult it is. Motor imagery EEG is especially interesting because it could help people control devices without moving, which is actually useful in real life.

We wanted to build something that works in a realistic way, not something that only looks good on our own data. So our main focus became generalizing across participants and avoiding data leakage.

What it does

NeuroWave takes EEG epochs shaped like (N, 64, 656) and predicts the motor imagery class for each epoch.

It automatically generates the required submission files:

For every X_eval_<id>.npy (or X_test_<id>.npy), it outputs y_pred_<id>.npy with shape (N, 1).
Then we zip the folder into predictions.zip for submission.
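
The export step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: write_predictions, zip_predictions, and predict_fn are hypothetical names, and the flat layout inside the zip is an assumption about what the submission expects.

```python
import zipfile
from pathlib import Path

import numpy as np


def write_predictions(data_dir, out_dir, predict_fn):
    """Write y_pred_<id>.npy for every X_eval_<id>.npy or X_test_<id>.npy found."""
    data_dir, out_dir = Path(data_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for prefix in ("X_eval_", "X_test_"):
        for x_path in sorted(data_dir.glob(f"{prefix}*.npy")):
            file_id = x_path.stem[len(prefix):]
            X = np.load(x_path)  # expected shape (N, 64, 656)
            # predict_fn (hypothetical) returns one label per epoch
            y_pred = np.asarray(predict_fn(X)).reshape(-1, 1)  # shape (N, 1)
            out_path = out_dir / f"y_pred_{file_id}.npy"
            np.save(out_path, y_pred)
            written.append(out_path)
    return written


def zip_predictions(out_dir, zip_path="predictions.zip"):
    """Zip every y_pred_*.npy in out_dir (assumption: files sit at the zip root)."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in sorted(Path(out_dir).glob("y_pred_*.npy")):
            zf.write(p, arcname=p.name)
    return Path(zip_path)
```

Passing the prediction step in as a function keeps the file handling independent of whichever model produces the labels.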

How we built it

We built a full pipeline in Python:

1) Loading + pairing files

Training data comes as:

X_train_<id>.npy
y_train_<id>.npy

Evaluation data comes as:

X_eval_<id>.npy (also supports X_test_<id>.npy)
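
A possible way to pair the training files by <id> is sketched below. The function names are hypothetical, and the sketch assumes that each <id> identifies one participant, so the id can double as the group label for validation later on.

```python
from pathlib import Path

import numpy as np


def pair_training_files(data_dir):
    """Return {id: (X_path, y_path)} for every X_train_<id>.npy with matching labels."""
    data_dir = Path(data_dir)
    pairs = {}
    for x_path in sorted(data_dir.glob("X_train_*.npy")):
        file_id = x_path.stem[len("X_train_"):]
        y_path = data_dir / f"y_train_{file_id}.npy"
        if not y_path.exists():
            raise FileNotFoundError(f"missing labels for {x_path.name}")
        pairs[file_id] = (x_path, y_path)
    return pairs


def load_training_set(data_dir):
    """Stack all pairs; also return one group label per epoch (assumed: id = participant)."""
    X_parts, y_parts, groups = [], [], []
    for file_id, (x_path, y_path) in pair_training_files(data_dir).items():
        X = np.load(x_path)
        y = np.load(y_path).ravel()
        if len(X) != len(y):
            raise ValueError(f"{x_path.name}: {len(X)} epochs but {len(y)} labels")
        X_parts.append(X)
        y_parts.append(y)
        groups.extend([file_id] * len(y))
    if not X_parts:
        raise FileNotFoundError(f"no X_train_*.npy files in {data_dir}")
    return np.concatenate(X_parts), np.concatenate(y_parts), np.array(groups)
```

Failing loudly on a missing or mismatched label file catches folder and file-name mistakes before training starts.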

2) Features

We tried two types of features:

Bandpower features (basic EEG bands like delta/theta/mu/beta/gamma + simple stats)
CovLog features (covariance-based features that capture relationships between channels)

CovLog helped the most because motor imagery shows up as patterns across channels rather than in any single channel.
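
One common reading of "covariance + log" features is sketched below: compute a channel covariance matrix per epoch, take its matrix logarithm, and keep the upper triangle as a flat vector. The project's exact CovLog recipe is not shown here, so the function name, the shrinkage step, and its default value are assumptions.

```python
import numpy as np


def covlog_features(X, shrinkage=1e-3):
    """Map epochs (N, C, T) to log-covariance vectors of length C * (C + 1) / 2."""
    X = X - X.mean(axis=-1, keepdims=True)  # remove each channel's offset
    n_epochs, n_channels, n_times = X.shape
    cov = X @ X.transpose(0, 2, 1) / (n_times - 1)  # (N, C, C)
    # Shrink toward a scaled identity so every matrix is positive definite
    # (an assumed regularizer; the value 1e-3 is illustrative).
    scale = np.trace(cov, axis1=1, axis2=2)[:, None, None] / n_channels
    cov = (1 - shrinkage) * cov + shrinkage * scale * np.eye(n_channels)
    # Matrix logarithm of a symmetric positive definite matrix:
    # log(C) = V diag(log w) V^T, with w and V from the eigendecomposition.
    w, V = np.linalg.eigh(cov)
    log_cov = (V * np.log(w)[:, None, :]) @ V.transpose(0, 2, 1)
    rows, cols = np.triu_indices(n_channels)
    return log_cov[:, rows, cols]
```

For 64 channels this gives 2080 features per epoch, and each off-diagonal entry describes a pair of channels, which is the cross-channel information the bandpower features lack.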

3) Validation

We used GroupKFold grouped by participant ID, so the same person’s data never appears in both training and validation. This matters a lot in EEG: otherwise the score can look “good” while the model fails to generalize to new people.
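
A participant-wise split of this kind can be sketched with scikit-learn's GroupKFold. The helper name participant_folds and the fold count are assumptions, and the explicit overlap check is only there to make the no-leakage property visible.

```python
import numpy as np
from sklearn.model_selection import GroupKFold


def participant_folds(groups, n_splits=5):
    """Yield (train_idx, val_idx) pairs in which no participant appears on both sides."""
    groups = np.asarray(groups)
    placeholder = np.zeros((len(groups), 1))  # split() only needs the sample count
    splitter = GroupKFold(n_splits=n_splits)
    for train_idx, val_idx in splitter.split(placeholder, groups=groups):
        # Leakage check: the two sides must share no participant ID.
        assert set(groups[train_idx]).isdisjoint(groups[val_idx])
        yield train_idx, val_idx
```

With a plain KFold, epochs from one person would land on both sides of the split, and the validation score would partly measure how well the model recognizes that person.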

4) Model

We trained an XGBoost multiclass classifier and used a simple fold ensemble: one model per fold, with the predicted probabilities averaged.
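
The fold ensemble can be sketched as below. The sketch works with any classifier that exposes fit and predict_proba, such as xgboost.XGBClassifier. The function names are hypothetical, and it assumes labels are encoded as 0..K-1 and that every training fold contains all classes, so the probability columns line up across models.

```python
import numpy as np


def fit_fold_ensemble(make_model, X, y, folds):
    """Train one fresh model per fold on that fold's training indices."""
    models = []
    for train_idx, _val_idx in folds:
        model = make_model()  # e.g. lambda: xgboost.XGBClassifier()
        model.fit(X[train_idx], y[train_idx])
        models.append(model)
    return models


def ensemble_predict(models, X):
    """Average class probabilities over the fold models, then take the argmax."""
    proba = np.mean([m.predict_proba(X) for m in models], axis=0)  # (N, K)
    return proba.argmax(axis=1), proba
```

Averaging probabilities instead of voting on hard labels lets a confident model outweigh several uncertain ones.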

Challenges we ran into

EEG data is noisy and unpredictable.
Dataset folders and file names were easy to mess up.
Setting up XGBoost on macOS was tricky because of its OpenMP dependency.
We had to be careful with validation to avoid leakage.

Accomplishments that we’re proud of

We built a pipeline that trains, validates correctly, and produces submission files automatically.
Moving from bandpower to CovLog features gave a noticeable improvement.
We generated helpful diagnostics like a confusion matrix and top feature importances for the presentation.

What we learned

EEG generalization is hard, especially across different people.
Validation design matters more than it seems: without participant-level splits, a score can look good and still not generalize.
Covariance-based features are powerful for motor imagery.

What’s next

If we had more time, we’d improve NeuroWave in a few ways:

CSP (Common Spatial Patterns): It’s a popular technique for motor imagery EEG. It learns better “spatial filters” across channels, so the classes can be separated more clearly.
Riemannian methods: Since we’re already using covariance-based features, Riemannian classifiers could be a stronger fit and might generalize better across different people.
Hyperparameter tuning: We used solid default settings, but tuning XGBoost properly (with GroupKFold) could boost accuracy.
Small demo/visuals: We’d add a simple page that shows an EEG sample, the predicted class, and confidence. That would make it easier for judges to understand.

We also built PaperPilot-Gemini, and we’d fully connect it to NeuroWave. The idea is: after training, it would automatically generate a clean report in Markdown (summary, what we tried, results like confusion matrix/feature importance, and what to do next). That would make the project easier to explain and easier for others to reproduce.

Built With

geminiapi
joblib
numpy
python
scipy
streamlit
xgboost

Submitted to

Rice Datathon 2026

Created by

I helped choose the topic, read and understood the requirements, reviewed the code, helped push it, created the slides, and recorded the technical part of the video.

Taki Boubekri
AbuBakr Akram
Elchibek Dastanov
Vanessa Torres

Updates

Elchibek Dastanov started this project — Jan 25, 2026 12:44 AM EST
