CereBro
Inspiration
EEG-based brain-computer interfaces have long promised to decode what people are thinking — but most research either fine-tunes a single model or tests on a single dataset, leaving huge gaps in generalizability. We were inspired by the recent surge in EEG foundation models (large pretrained transformers trained on thousands of hours of neural recordings) and asked: what if you didn't have to pick just one? That question led us to an ensemble that fuses multiple foundation-model perspectives on the same inner-speech signal, the way a panel of experts reaches a better consensus than any single expert alone.
What It Does
CereBro is an end-to-end pipeline for decoding imagined/inner speech from EEG signals by ensembling four state-of-the-art EEG foundation models: LaBraM, BENDR, REVE, and NeuroLM. Each backbone independently encodes a raw EEG trial into a feature vector; a lightweight fusion head then projects and combines all four representations to predict the imagined word or phoneme class. The system supports two inner-speech datasets (ds007591 and KaraOne), within-subject cross-validation, leave-one-subject-out generalization, and cross-dataset transfer — all from a single CLI.
How We Built It
We built a unified preprocessing pipeline (notch filter → 0.1–75 Hz bandpass → 200 Hz resample → µV normalization) applied identically to both datasets, outputting a standardized HDF5 format. On top of that, we wrapped all four pretrained backbones behind a single BackboneClassifier interface with shared training, evaluation, and feature-extraction scripts. The fusion architecture uses per-backbone projection heads (Linear → GELU → LayerNorm → Dropout) that map heterogeneous feature dimensions to a common 128-D space, concatenates them into a 512-D vector, and passes it through a shared MLP classifier. We also implemented Euclidean Alignment for domain shift, layer-wise LR decay for careful fine-tuning, and a suite of evaluation regimes (k-fold, LOSO, cross-dataset, probe vs. fine-tune).
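The fusion architecture above can be sketched in PyTorch. Only the per-backbone projection recipe (Linear → GELU → LayerNorm → Dropout), the 128-D common space, and the 512-D concatenation come from the description; the input feature dimensions, hidden width, dropout rate, and class names are illustrative assumptions, not CereBro's actual code.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Sketch of the described fusion head: one projection per backbone,
    concatenation, then a shared MLP classifier. Hidden sizes other than
    the 128-D/512-D figures are our own placeholder choices."""

    def __init__(self, backbone_dims, num_classes, proj_dim=128, dropout=0.3):
        super().__init__()
        # One projection head per backbone: Linear -> GELU -> LayerNorm -> Dropout,
        # mapping heterogeneous feature dims to a common proj_dim space
        self.projections = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d, proj_dim),
                nn.GELU(),
                nn.LayerNorm(proj_dim),
                nn.Dropout(dropout),
            )
            for d in backbone_dims
        )
        # Shared MLP over the concatenated vector (4 backbones x 128 = 512-D)
        fused_dim = proj_dim * len(backbone_dims)
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, 256),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(256, num_classes),
        )

    def forward(self, features):
        # features: list of (batch, dim_i) tensors, one per frozen backbone
        projected = [proj(f) for proj, f in zip(self.projections, features)]
        return self.classifier(torch.cat(projected, dim=1))
```

With four backbones of (hypothetical) output dims 200, 512, 768, and 1024, `FusionHead([200, 512, 768, 1024], num_classes=5)` maps a list of four feature batches to 5-class logits.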
Challenges We Ran Into
- Channel heterogeneity: Each dataset uses different electrode naming conventions that don't directly map to foundation model vocabularies. We built a greedy coordinate-based mapping to 10-10 standard positions, but channels that couldn't be mapped had to be dropped, leaving ~92 channels for ds007591 and ~50 for KaraOne.
- Backbone incompatibilities: BENDR was pretrained at 256 Hz but we feed it 200 Hz data, LaBraM requires a fixed 3000-sample input (so shorter trials must be zero-padded), and NeuroLM's VQ tokenizer is non-differentiable, so its codebook must stay frozen even during full fine-tuning.
- REVE gating: REVE's Hugging Face repositories are access-controlled, adding an extra setup step for anyone running the full four-model ensemble.
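The fixed-length constraint mentioned for LaBraM amounts to a small padding step. A minimal sketch, assuming trials arrive as (channels, time) NumPy arrays; the helper name and truncation behavior are ours, only the 3000-sample target comes from the description:

```python
import numpy as np

def pad_to_fixed_length(x: np.ndarray, target_len: int = 3000) -> np.ndarray:
    """Zero-pad (or truncate) an EEG trial to a fixed number of samples.

    x: array of shape (channels, time). The 3000-sample default matches
    the fixed input length LaBraM expects, as described above.
    """
    channels, t = x.shape
    if t >= target_len:
        return x[:, :target_len]          # truncate overly long trials
    padded = np.zeros((channels, target_len), dtype=x.dtype)
    padded[:, :t] = x                     # copy signal, leave the tail at zero
    return padded
```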
Accomplishments That We're Proud Of
- A single --model {labram,bendr,reve,neurolm} flag that drives the exact same training loop for four architecturally very different models.
- A working fusion pipeline that trains a shared head over all four frozen backbones end-to-end from raw EEG in one command.
- Pretrained LaBraM linear probe results already beating chance (0.293 ± 0.058 vs. 0.20 chance) on 5-class inner-speech with no fine-tuning at all — a clean sanity-check floor before GPU fine-tuning.
- Clean dataset-agnostic HDF5 I/O and k-fold/LOSO splitters that work identically across both very different EEG datasets.
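The single --model flag pattern can be sketched with argparse. Only the --model choices come from the project; the --dataset and --split flags (and their values) are illustrative assumptions about what such a CLI might expose, not CereBro's real interface:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Sketch of a unified training CLI where one flag selects the backbone."""
    p = argparse.ArgumentParser(description="Train one EEG backbone classifier")
    # The four choices mirror the ensemble's backbones; every choice drives
    # the same downstream training loop.
    p.add_argument("--model", required=True,
                   choices=["labram", "bendr", "reve", "neurolm"])
    # Hypothetical extra flags for dataset and evaluation regime.
    p.add_argument("--dataset", choices=["ds007591", "karaone"],
                   default="ds007591")
    p.add_argument("--split", choices=["kfold", "loso"], default="kfold")
    return p
```

Swapping backbones is then just `--model bendr` vs. `--model labram`, with everything downstream unchanged.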
What We Learned
- EEG foundation models are more robust to sample-rate mismatches than we initially feared — serving BENDR's 256 Hz pretrained weights at 200 Hz did not catastrophically break the representations.
- Layer-wise LR decay is essential for fine-tuning deep EEG transformers: without it, early layers quickly overwrite pretrained structure on small datasets.
- The fusion approach exposes just how different each backbone's learned geometry is — t-SNE plots of features from the four models before fusion look strikingly dissimilar even on the same trials, which validates the case for ensembling.
- Preprocessing choices matter more than model choice for cross-dataset transfer: standardizing both datasets to the same canonical chain made channel alignment tractable.
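The layer-wise LR decay lesson above has a simple implementation shape: build optimizer parameter groups where each earlier transformer layer gets a geometrically smaller learning rate than the one after it. A minimal sketch, assuming the backbone exposes an ordered list of layers; the function name, base LR, and decay factor are our own illustrative choices:

```python
import torch.nn as nn

def layerwise_lr_groups(layers, head, base_lr=1e-4, decay=0.65):
    """Build optimizer parameter groups with layer-wise LR decay.

    The classification head trains at the full base LR; each step toward
    the input multiplies the LR by `decay`, so early pretrained layers
    change slowly and keep their pretrained structure on small datasets.
    """
    groups = [{"params": head.parameters(), "lr": base_lr}]
    n = len(layers)
    for i, layer in enumerate(layers):
        # earliest layer (i=0) gets base_lr * decay**n, last gets base_lr * decay
        groups.append({"params": layer.parameters(),
                       "lr": base_lr * decay ** (n - i)})
    return groups
```

The groups plug directly into any PyTorch optimizer, e.g. `torch.optim.AdamW(layerwise_lr_groups(model.blocks, model.head))` (attribute names hypothetical).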
What's Next for CereBro
- GPU fine-tuning results: The full 16-run sweep (4 backbones × 2 datasets × 2 splits) is queued on a GPU partner's machine — those numbers will be the headline comparison table showing how much fine-tuning lifts each backbone above the pretrained-feature baseline.
- Cross-dataset transfer: Training on KaraOne and evaluating zero-shot on ds007591 (and vice versa) to test whether the fused representations generalize across recording setups.
- Attention to REVE: Once HF access is sorted, adding REVE as the fourth backbone in the fusion will let us test whether the four-way ensemble outperforms every three-way subset.
- Real-time inference: Stripping the pipeline down to a lightweight streaming version suitable for live BCI demos.
- More inner-speech datasets: Expanding beyond two datasets toward a larger multi-cohort benchmark to stress-test LOSO generalization.
Built With
- accessibility
- bci
- bendr
- brain-computer-interface
- deep-learning
- eeg
- ensemble-learning
- feature-extraction
- fine-tuning
- foundation-models
- huggingface
- imagined-speech
- inner-speech
- labram
- machine-learning
- neural-decoding
- neurolm
- neuroscience
- neurotech
- pytorch
- signal-processing
- time-series
- transfer-learning
- transformer