Patient MEG scan

Predict visual stimuli from human brain activity

Kaggle is hosting a contest where the task is to predict visual stimuli from magnetoencephalography (MEG) recordings of human brain activity. A subject is presented with a stimulus (a human face or a distorted face) and the concurrent brain activity is recorded. The relation between the recorded signal and the stimulus may provide insights into the underlying mental process. We use Vowpal Wabbit to beat the benchmark.


Go to the Kaggle competition page to read the full description.

We have the data for 23 participants in the study. Each participant completed around 580 trials. Every trial is a time series of brain activity of 375 bins in total, starting 0.5 seconds before the stimulus is presented, recorded on 306 channels (the MEG sensors).

Labels are either 1 (a human face) or 0 (a distorted face). We have the labels for the trials of 16 participants (the train set). We have to predict the labels for the trials of 7 participants (the test set).

Scrambled face

We extract the MEG data for the first 500 ms after the stimulus starts. For the training set this means we have about 354,960,000 data points (125*306*580*16).
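The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the 580 trials per participant is the approximate figure from the text, and the 8 bytes per value is an assumption (64-bit floats), not something the competition data prescribes.

```python
# Back-of-the-envelope size of the pooled training set.
bins_per_trial = 125        # 0.5 s window after the stimulus
channels = 306              # MEG sensors
trials_per_subject = 580    # approximate; varies per participant
train_subjects = 16

features_per_trial = bins_per_trial * channels
data_points = features_per_trial * trials_per_subject * train_subjects

# Assumption: every value is held in memory as a 64-bit float (8 bytes).
bytes_as_float64 = data_points * 8

print(features_per_trial)                 # 38250
print(data_points)                        # 354960000
print(round(bytes_as_float64 / 1e9, 2))   # 2.84
```

Under that assumption a single dense copy of the training matrix already takes roughly 2.8 GB, before any intermediate copies a library might make.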

If we run the (very welcome) provided benchmark script, we need over 10GB of memory to train a model with logistic regression. It would be nice if we could fit a model without loading all the data in memory. Then we could create models on average computers too.

We use the exact same method as the provided benchmark code: simply pool all trials from all participants in one data set. Then, instead of scikit-learn logistic regression, we use Vowpal Wabbit with hinge loss.

The data is stored in the Matlab file format (.mat [pdf]).

Brain activity

Image is from the University of Utah page on their MEG system.

The data was collected during a study described in the paper A parametric empirical Bayesian framework for the EEG/MEG inverse problem: generative models for multi-subject and multi-modal integration. Even more information about these studies is available in the paper MEG Decoding Across Subjects.

MEG scan

Figure from the Bayesian framework paper

This competition is associated with the 19th International Conference on Biomagnetism (Biomag 2014).

Munging the data

We have the data conveniently split up in .zip files. For every participant there is a specific .mat file of about 260MB (for example train_subject03.mat). We can load .mat files in Python with


If you have problems downloading the data for this competition have a look at this thread with tips for download managers and how to download Kaggle files with wget.

Feature engineering

Using the simple benchmark code we generate the exact same features:

  • We reduce the time series data to the 0.5 seconds directly after the stimulus was presented (restricting the time window to 125 bins).
  • We concatenate the 306 time series of each trial into one long vector. This gives us 38,250 features per trial.
  • We also normalize each feature independently (z-scoring).
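The three steps above can be sketched in plain Python. This is a dependency-free illustration on made-up toy data, not the benchmark code itself. The helper names (window_bins, trial_to_vector, zscore_columns) are hypothetical, and the rate of 250 bins per second is inferred from "125 bins per 0.5 seconds". Whether to normalize per participant or over the pooled set is left to the caller here.

```python
from statistics import mean, pstdev

SFREQ = 250      # bins per second, implied by 125 bins per 0.5 s
T_START = -0.5   # each recording starts 0.5 s before the stimulus


def window_bins(tmin=0.0, tmax=0.5):
    """Return (first, last) bin indices for the window [tmin, tmax) in seconds."""
    first = round((tmin - T_START) * SFREQ)
    last = round((tmax - T_START) * SFREQ)
    return first, last


def trial_to_vector(trial, first, last):
    """Crop every channel to the window and concatenate the channels."""
    vector = []
    for channel in trial:
        vector.extend(channel[first:last])
    return vector


def zscore_columns(rows):
    """Scale each feature (column) to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Guard against constant features, whose standard deviation is 0.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Toy data: 4 trials, 3 channels (instead of 306), 375 bins each.
first, last = window_bins()
toy_trials = [
    [[float(t + c + k) for t in range(375)] for c in range(3)]
    for k in range(4)
]
vectors = [trial_to_vector(trial, first, last) for trial in toy_trials]
features = zscore_columns(vectors)
print(first, last, len(vectors[0]))
```

With the real 306 channels, the same cropping and concatenation gives 125 * 306 = 38,250 values per trial. For data of this size an array library would be the more practical choice.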

We end up with two data sets: face.train.vw (5.2GB) and face.test.vw (2.2GB). A single line from face.train.vw will look like:

-1 '0 |f 0:-1.51399 1:-1.37631 ... 38248:0.583798 38249:1.61402

We are going to classify in binary mode, so we change label 0 to label -1.
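A minimal sketch of how one trial could be written in this format, including the 0 to -1 label change. The function name to_vw_line is hypothetical, and so is the choice of what goes in the tag (the part after the quote); the sample line above only shows that a tag and the namespace f are present.

```python
def to_vw_line(label, features, tag="0", namespace="f"):
    """Format one trial as a Vowpal Wabbit input line.

    label: 1 (face) or 0 (distorted face); 0 is mapped to -1.
    features: sequence of floats, written as index:value pairs.
    """
    vw_label = 1 if label == 1 else -1
    pairs = " ".join(f"{i}:{value:g}" for i, value in enumerate(features))
    return f"{vw_label} '{tag} |{namespace} {pairs}"


print(to_vw_line(0, [-1.51399, -1.37631, 0.583798]))
# -1 '0 |f 0:-1.51399 1:-1.37631 2:0.583798
```

Writing the file line by line like this means only one trial needs to be formatted at a time.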

If you want to follow along you can use the Python script

Vowpal Wabbit

Now it is time for Vowpal Wabbit to do its magic. We did not expect to outperform scikit-learn’s in-memory logistic regression with online learning, yet we did on the first attempt. We use Vowpal Wabbit 7.6.1 inside Cygwin (other versions may give slightly different results).

Training a model

With the following command we create a model using the train data:

./vw face.train.vw -c -k --passes 60 --loss_function hinge --binary -f face.model.vw


  • ./vw is the Vowpal Wabbit executable
  • face.train.vw is the train data set
  • -c -k --passes 60 tells VW to use a cache, kill existing cache and run 60 passes
  • --loss_function hinge tells VW to use hinge loss
  • --binary puts VW in binary mode: predictions are -1 or 1 and the loss is reported as binary classification error (a good fit for binary classification tasks like this one)
  • -f face.model.vw tells VW to save the model

Newer versions of Vowpal Wabbit stop early when the holdout loss has not improved for 3 consecutive passes. So although we tell Vowpal Wabbit to run 60 passes, it actually does only 9. We get an average loss of 0.252922 h, where h stands for holdout [pdf], a feature added to Vowpal Wabbit 7.4 by Zhen Qin.
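The stopping rule described above can be illustrated with a toy function. This is a sketch of the general idea only, not Vowpal Wabbit's actual implementation, and the loss values below are invented for illustration (they are not the losses from our run).

```python
def passes_until_stop(holdout_losses, patience=3):
    """Return the number of passes run before early stopping kicks in.

    Training stops once the holdout loss has failed to improve
    for `patience` consecutive passes.
    """
    best = float("inf")
    stale = 0
    for n, loss in enumerate(holdout_losses, start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return n
    return len(holdout_losses)


# Hypothetical per-pass holdout losses: best at pass 6, then 3 worse passes.
losses = [0.40, 0.30, 0.27, 0.26, 0.255, 0.253, 0.254, 0.256, 0.257, 0.26]
print(passes_until_stop(losses))  # 9
```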

Holdout graph

A graph illustrating the benefits of the holdout functionality.


Now we make predictions on the test set:

./vw face.test.vw -t -i face.model.vw -p face.preds.txt


  • ./vw is the VW executable
  • face.test.vw is our test set
  • -t says to test only (no learning)
  • -i face.model.vw tells VW which model to use
  • -p face.preds.txt saves the predictions inside a file

Vowpal Wabbit creates 4058 predictions (-1 or 1) in a few seconds.


We can graph the time series with matplotlib. We also have a flattened layout file for all the MEG sensors. In the future we will use those 2-D coordinates to create an animation. For now we generate a plot with the brain activity as recorded by the different channels:

Brain activity plot

You can use the script to replicate the above image.


To get our leaderboard score we have to turn the predictions made by Vowpal Wabbit into the Kaggle submission format. You can use the script to do this.
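As a rough idea of what such a conversion involves, here is a minimal sketch. It rests on several assumptions: each prediction line holds a prediction followed by the example's tag, the tag is the trial Id, and the submission file is a CSV with the header Id,Prediction. The function names and sample lines are made up for illustration, so check the competition page for the exact format.

```python
import csv
import io


def vw_preds_to_rows(lines):
    """Turn prediction lines ('<prediction> <tag>') into (id, 0/1 label) rows."""
    rows = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        prediction = float(parts[0])
        # Assumption: the tag after the prediction is the trial Id.
        # Without a tag, fall back to a running index.
        trial_id = parts[1] if len(parts) > 1 else str(len(rows))
        # Map the -1/1 prediction back to the original 0/1 labels.
        rows.append((trial_id, 1 if prediction > 0 else 0))
    return rows


def write_submission(rows, out):
    """Write the rows as CSV with an assumed 'Id,Prediction' header."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Id", "Prediction"])
    writer.writerows(rows)


# Made-up sample lines standing in for face.preds.txt.
sample = ["1.000000 17000", "-1.000000 17001", "1 17002"]
buffer = io.StringIO()
write_submission(vw_preds_to_rows(sample), buffer)
print(buffer.getvalue())
```

For a real run, the lines would come from the open face.preds.txt file and the output would go to a file opened for writing instead of an in-memory buffer.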

We went from raw data to submission in about 20 minutes (16 minutes for the munging stage, 4 minutes for model building and predicting). We never used more than 280MB of memory!

The evaluation metric for this competition is prediction accuracy (the total number of correct predictions / the total number of test cases). We score about 2% higher than the benchmark, for a public leaderboard score of ~0.66100 (currently second place).
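The metric is simple enough to write out. A small sketch with made-up labels (the function name accuracy is ours):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy example: 3 of 4 predictions are correct.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```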

Note that the test set consists of completely different participants/subjects than the train set: there is no random sampling. This may drastically change the scores between the public leaderboard and the private leaderboard. We think it would pay to study methods for identifying participants with similar brain activity.


Code for this blog post is available at the MLWave Github. Most of the feature generation part was taken (unmodified or slightly modified) from the competition’s Python benchmark script, which is “Copyright Emanuele Olivetti 2014, BSD license, 3 clauses.”


This competition is organized by Emanuele Olivetti, Mostafa Kia and Paolo Avesani (NeuroInformatics Lab, Fondazione Bruno Kessler and Università di Trento, IT).

The competition is sponsored by Elekta Oy, MEG International Services Ltd (MISL), Fondazione Bruno Kessler, and Besa. Daniel Wakeman (Martinos Center, MGH, USA), Richard Henson (MRC/CBU, Cambridge, UK), Ole Jensen (Donders Institute, NL), Nathan Weisz (University of Trento, IT) and Alexandre Gramfort (Telecom ParisTech, CNRS, CEA / Neurospin) contributed to the competition.

MLWave would like to thank FastML for their useful articles on Vowpal Wabbit and Emanuele Olivetti for writing the helpful benchmark code in Python.

The intro image for this post came from Wikimedia Commons and is released into the public domain, credit to “National Institute of Mental Health, National Institutes of Health, Department of Health and Human Services”.


  • Animate the brain activity from all the sensors.
  • Try neural networks for prediction, because we need to go deeper.
  • More advanced feature generation.
  • Algorithm tweaking/grid search.
  • Bootstrapping multiple models trained on smaller chunks of data.
