probe_layer coverage · Bruno Aristimunha

Layer-wise linear probing tested across the entire braindecode.models.util.models_dict registry: every model, a sample of its named submodules, one forward pass per (model, layer) pair. The dots below are the result — green for a passing hook, vermillion for a failure, hatched for skipped.
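The page does not show how probe_layer attaches to a submodule, so the following is only a minimal sketch of the general technique, assuming a standard PyTorch forward hook. The helper name `capture_layer_output` and the toy model are hypothetical and are not part of braindecode.

```python
import torch
from torch import nn


def capture_layer_output(model: nn.Module, layer_name: str, x: torch.Tensor) -> torch.Tensor:
    """Run one forward pass and return the output of the named submodule."""
    captured = {}

    def hook(module, inputs, output):
        # Some layers return tuples; keep the first element in that case.
        out = output[0] if isinstance(output, tuple) else output
        captured["features"] = out.detach()

    layer = model.get_submodule(layer_name)  # raises AttributeError for unknown names
    handle = layer.register_forward_hook(hook)
    try:
        with torch.no_grad():
            model(x)
    finally:
        handle.remove()  # always detach the hook, even if the forward pass fails
    if "features" not in captured:
        raise RuntimeError(f"hook on {layer_name!r} never fired")
    return captured["features"]


# Toy stand-in model (hypothetical, not a braindecode model).
toy = nn.Sequential(
    nn.Conv1d(in_channels=8, out_channels=4, kernel_size=5),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(4 * 96, 3),
)
x = torch.randn(2, 8, 100)  # (batch, channels, time)
features = capture_layer_output(toy, "0", x)
print(features.shape)  # torch.Size([2, 4, 96])
```

In a sweep like the one described here, a (model, layer) pair would plausibly count as passing when the hook fires and returns a tensor, and as failing when the lookup or the forward pass raises. That mapping is an assumption about the dashboard, not something the page states.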

REVE

probe_layer	AUROC ± SD	bal_acc ± SD	n
canonical (probe at REVE final output)	0.788 ± 0.014	0.500 ± 0.032	3
model.to_patch_embedding.0	0.633 ± 0.005	0.369 ± 0.014	3
model.mlp4d	0.500 ± 0.000	0.250 ± 0.000	3
model.transformer.layers.0.1	0.639 ± 0.024	0.372 ± 0.022	3
model.transformer.layers.3.1	0.669 ± 0.010	0.386 ± 0.004	3
model.transformer.layers.7.1	0.759 ± 0.002	0.483 ± 0.003	3
model.transformer.layers.11.1	0.802 ± 0.003	0.498 ± 0.008	3
model.transformer.layers.15.1	0.807 ± 0.002	0.536 ± 0.006	3
model.transformer.layers.21.1	0.693 ± 0.001	0.398 ± 0.003	3
model.ln	0.500 ± 0.000	0.250 ± 0.000	3
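One way to read the REVE table is to find the layer with the highest mean AUROC and to flag layers that sit exactly at chance. The sketch below copies the mean values from the rows above. Treating 0.250 as the balanced-accuracy chance level assumes four balanced classes, which the table suggests but does not state.

```python
# Rows copied from the REVE table: (probe_layer, mean AUROC, mean balanced accuracy).
reve_rows = [
    ("canonical", 0.788, 0.500),
    ("model.to_patch_embedding.0", 0.633, 0.369),
    ("model.mlp4d", 0.500, 0.250),
    ("model.transformer.layers.0.1", 0.639, 0.372),
    ("model.transformer.layers.3.1", 0.669, 0.386),
    ("model.transformer.layers.7.1", 0.759, 0.483),
    ("model.transformer.layers.11.1", 0.802, 0.498),
    ("model.transformer.layers.15.1", 0.807, 0.536),
    ("model.transformer.layers.21.1", 0.693, 0.398),
    ("model.ln", 0.500, 0.250),
]

AUROC_CHANCE = 0.5
BAL_ACC_CHANCE = 0.25  # assumption: four balanced classes

# Layer with the highest mean AUROC.
best_layer, best_auroc, best_bal_acc = max(reve_rows, key=lambda row: row[1])

# Layers whose probe is exactly at chance on both metrics.
at_chance = [
    name for name, auroc, bal_acc in reve_rows
    if auroc == AUROC_CHANCE and bal_acc == BAL_ACC_CHANCE
]

# Gain of the best intermediate layer over the canonical (final output) probe.
gain_over_canonical = round(best_auroc - reve_rows[0][1], 3)

print(best_layer, best_auroc)   # model.transformer.layers.15.1 0.807
print(at_chance)                # ['model.mlp4d', 'model.ln']
print(gain_over_canonical)      # 0.019
```

The two at-chance rows also report zero spread across seeds, which is worth checking before reading them as a property of the representation: a probe fed constant or degenerate features would produce the same numbers.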

BENDR

probe_layer	AUROC ± SD	bal_acc ± SD	n
canonical (probe at BENDR final output)	0.509 ± 0.017	0.259 ± 0.018	3
encoder.encoder.Encoder_0	0.651 ± 0.023	0.372 ± 0.018	3
encoder.encoder.Encoder_1	0.639 ± 0.006	0.368 ± 0.006	3
encoder.encoder.Encoder_2	0.625 ± 0.005	0.355 ± 0.007	3
encoder.encoder.Encoder_3	0.626 ± 0.006	0.348 ± 0.002	3
encoder.encoder.Encoder_4	0.581 ± 0.014	0.317 ± 0.001	3
encoder.encoder.Encoder_5	0.544 ± 0.013	0.296 ± 0.006	3
contextualizer.input_conditioning.0	0.519 ± 0.038	0.265 ± 0.026	3
contextualizer.input_conditioning.1	0.508 ± 0.015	0.260 ± 0.005	3
contextualizer.input_conditioning.3	0.544 ± 0.044	0.284 ± 0.039	3
contextualizer.relative_position.0	0.498 ± 0.034	0.270 ± 0.021	3
contextualizer.transformer_layers.0	0.504 ± 0.011	0.253 ± 0.008	3
contextualizer.transformer_layers.1	0.500 ± 0.009	0.253 ± 0.007	3
contextualizer.transformer_layers.2	0.506 ± 0.006	0.252 ± 0.015	3
contextualizer.transformer_layers.3	0.502 ± 0.004	0.245 ± 0.006	3
contextualizer.transformer_layers.4	0.504 ± 0.006	0.262 ± 0.013	3
contextualizer.transformer_layers.5	0.501 ± 0.015	0.255 ± 0.020	3
contextualizer.transformer_layers.6	0.508 ± 0.016	0.259 ± 0.013	3
contextualizer.transformer_layers.7	0.499 ± 0.017	0.249 ± 0.016	3

NeuralBench-EEG-Core — Cross-FM evaluation

Layer-wise linear probes on six EEG foundation models across the nine tasks with public-data YAMLs in facebookresearch/neuroai. The Core spec lists 36 EEG tasks, but only 9 ship dataset configs in the public release — the other 27 require manual access to gated corpora (TUH EEG, THINGS-images, etc.). So “NeuralBench-EEG-Core v1.0” here is the full public-data sweep: 6 FMs × 9 tasks × 10–11 probe layers × 3 seeds. Three (FM×task) cells are structurally impossible: LaBraM × {mental_arithmetic, mental_workload, motor_execution}, where the dataset has channels not in LABRAM_CHANNEL_ORDER (only fixable with InterpolatedLaBraM).
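The sweep size follows from the counts in the paragraph above. The sketch below enumerates the grid and removes the three impossible LaBraM cells. Only the model and task names that appear on this page are real; entries such as `fm_4` and `task_4` are placeholders, because the page does not list the remaining names.

```python
from itertools import product

# Three names from this page plus three placeholders (hypothetical) to reach six FMs.
fms = ["REVE", "BENDR", "LaBraM", "fm_4", "fm_5", "fm_6"]

# Three task names from this page plus six placeholders (hypothetical) to reach nine.
tasks = [
    "mental_arithmetic", "mental_workload", "motor_execution",
    "task_4", "task_5", "task_6", "task_7", "task_8", "task_9",
]

# Cells the page calls structurally impossible.
excluded = {
    ("LaBraM", "mental_arithmetic"),
    ("LaBraM", "mental_workload"),
    ("LaBraM", "motor_execution"),
}

cells = [(fm, task) for fm, task in product(fms, tasks) if (fm, task) not in excluded]

n_seeds = 3
runs = len(cells) * n_seeds   # one run per (FM, task, seed)
min_probe_fits = runs * 10    # lower bound: 10 probe layers per FM
max_probe_fits = runs * 11    # upper bound: 11 probe layers per FM

print(len(cells), runs, min_probe_fits, max_probe_fits)  # 51 153 1530 1683
```

Under these assumptions the sweep has 51 runnable (FM, task) cells out of 54 and between 1530 and 1683 individual probe fits.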

Per-FM probe depth — motor imagery / Tangermann 2012

Depth analysis — does the best probe layer generalise?

Each foundation model was probed at every layer of its own architecture across all nine NeuralBench-EEG-Core tasks. The four views below answer one question: if I have to pick a probe layer without knowing the downstream task, where should I tap?

Per-FM depth profiles

AUROC at each probe layer, averaged across the nine tasks. Shaded band: ±1 SD across tasks (narrow = robust choice, wide = task-specific). Stage brackets show the architecture's natural blocks.
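The profile described above can be sketched as a per-layer mean and standard deviation taken across tasks. The numbers below are synthetic and only illustrate the computation; the layer names are hypothetical, and the use of the sample standard deviation is an assumption, since the page does not say which estimator the band uses.

```python
from statistics import mean, stdev

# Synthetic AUROC values: one list per probe layer, one entry per task.
auroc_by_layer = {
    "layer_a": [0.60, 0.62, 0.61],
    "layer_b": [0.80, 0.50, 0.65],
    "layer_c": [0.70, 0.72, 0.71],
}

# (mean, SD) across tasks for each layer; the SD is the half-width of the shaded band.
profile = {
    layer: (mean(values), stdev(values))
    for layer, values in auroc_by_layer.items()
}

# Task-agnostic choice: the layer with the highest mean across tasks.
best_layer = max(profile, key=lambda layer: profile[layer][0])

for layer, (m, sd) in profile.items():
    print(f"{layer}: {m:.3f} ± {sd:.3f}")
print("best by mean:", best_layer)
```

In this toy example `layer_b` has a wide band because its AUROC swings between tasks, while `layer_c` is both higher on average and narrow, which is the pattern the text calls a robust choice.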

Layer × task drilldown

Each cell is AUROC for one (probe layer, task) pair. Tasks ordered by FM mean, so the strongest tasks lie on the left of every panel.
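The column ordering can be sketched as a sort on each task's mean AUROC. This assumes that "FM mean" means the average over that model's probe layers, which the page does not spell out. The grid values and the task and layer names are synthetic.

```python
from statistics import mean

# Synthetic (probe layer -> task -> AUROC) grid for one foundation model.
grid = {
    "layer_1": {"task_a": 0.55, "task_b": 0.70, "task_c": 0.60},
    "layer_2": {"task_a": 0.57, "task_b": 0.80, "task_c": 0.66},
}

task_names = list(next(iter(grid.values())))

# Mean AUROC per task, averaged over this model's probe layers.
task_mean = {
    task: mean(row[task] for row in grid.values())
    for task in task_names
}

# Strongest task first, so it ends up in the leftmost column.
ordered_tasks = sorted(task_names, key=lambda task: task_mean[task], reverse=True)

# Re-emit each row with its columns in the new order.
ordered_rows = {
    layer: [row[task] for task in ordered_tasks]
    for layer, row in grid.items()
}

print(ordered_tasks)  # ['task_b', 'task_c', 'task_a']
```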

How well does probe_layer cover braindecode?

The matrix

Why layers fail, when they do

By cause

By model (top offenders)
