
Kaggle Connectomics: Python Benchmark Code

For the Connectomics contest on Kaggle, the task is to write a brain connectivity estimator using neuron activation time series data. Benchmark code for Pearson correlation with discretization was available in C++ and Matlab. Now here in Python too!

This article is under construction for the duration of the contest. The competition admins have released their own Python correlation benchmark code. Check out their GitHub repo.

About the contest

The goal of the Connectomics contest hosted by Kaggle is:

Reconstruct the wiring between neurons from fluorescence imaging of neural activity.

The contest is brought to us by Challenges in Machine Learning (ChaLearn).

The contest admins have kindly provided benchmark sample code, a 5-minute tutorial, and references.

The gist

Basically, you get a data set of 1,000 neuron activity time series and your task is to predict the connections between these neurons. A model that does this can be called a brain connectivity estimator. The neurons can be grown in culture (in vitro) and their activity recorded with the help of fluorescence imaging.

Recently, researchers have been able to record, in vivo, the activity of 80% of the 100,000 neurons in the brain of a zebrafish embryo. This video shows the simulated activities:

Parsing the data

Parsing the data is done with the functions in brainparse.py. All available data (neuron activities, neuron positions, neuron connections) is stored in dictionaries.

Use import brainparse as bp at the top of your model scripts and make use of these functions:

Timeseries

Use series_dict = bp.parse_time_series(r"c:\data\fluoride.txt") to create a time series dictionary from a file location. Note the raw string prefix r: without it, Python reads the \f in this Windows path as a form feed character.

A time series dictionary looks like {neuron_id: list of floats}, so for example series_dict[5] = [0.14, 0.15, 0.14, 0.78, 0.88, ...]
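To make the dictionary shape concrete, here is a minimal sketch of such a parser. It is not the code from brainparse.py: the function name parse_time_series_sketch is hypothetical, and it assumes a comma-separated file in which each row is one time step and each column is one neuron, with neuron ids starting at 1.

```python
import csv
import io


def parse_time_series_sketch(lines):
    """Build {neuron_id: [float, ...]} from rows of comma-separated samples.

    Assumptions (not confirmed by the article): one row per time step,
    one column per neuron, neuron ids counted from 1.
    """
    series = {}
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        for neuron_id, value in enumerate(row, start=1):
            series.setdefault(neuron_id, []).append(float(value))
    return series


# Tiny in-memory stand-in for a fluorescence file: 3 time steps, 2 neurons.
sample = io.StringIO("0.14,0.02\n0.15,0.03\n0.78,0.02\n")
series_dict = parse_time_series_sketch(sample)
print(series_dict[1])  # [0.14, 0.15, 0.78]
```

Because the function accepts any iterable of lines, the same sketch works on an open file handle, for example with open(path, newline="") as f.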

Discretization

Use discrete_dict = bp.discretize_time_series(series_dict, threshold=0.12) to apply discretization to a time series dictionary with a threshold of 0.12. Discretization can be done by first applying numpy.diff to a series. The diff() of [0.1, 0.5, 0.4] is [0.4, -0.1]. Then apply the threshold: if the diff at n is larger than threshold=0.12, put 1, else 0. [0.4, -0.1] with the binary threshold applied is [1, 0].
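The diff-then-threshold step can be sketched in a few lines. This is an illustration under the description above, not the actual bp.discretize_time_series implementation; the name discretize_sketch is hypothetical.

```python
import numpy as np


def discretize_sketch(series_dict, threshold=0.12):
    """Return {neuron_id: [0 or 1, ...]} marking jumps larger than threshold."""
    discrete = {}
    for neuron_id, values in series_dict.items():
        # Difference between each sample and the one before it.
        jumps = np.diff(np.asarray(values, dtype=float))
        # 1 where the jump exceeds the threshold, 0 everywhere else.
        discrete[neuron_id] = (jumps > threshold).astype(int).tolist()
    return discrete


example = discretize_sketch({5: [0.1, 0.5, 0.4]})
print(example[5])  # [1, 0]
```

Note that a discretized series is one element shorter than its input, because numpy.diff returns the differences between consecutive samples.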

Neural Connections

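Use neuron_connections, blocked = bp.parse_neuron_connections(r"c:\neuralconnections.txt") to create a dictionary with the neuron connections and an integer blocked holding the total number of blocked (-1) connections. The raw string prefix r matters here: without it, Python reads the \n in this path as a newline.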

The neuron connections dictionary has the shape dict[(neuron_i, neuron_j)] = int connection.

The int connection takes one of three values: -1 means the connection is blocked, 1 means a connection is present, and 0 means no connection is present.
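As a rough sketch of how such a dictionary and the blocked count could be built: the function name parse_connections_sketch is hypothetical, and the comma-separated "neuron_i,neuron_j,connection" line format is an assumption, not something taken from brainparse.py.

```python
import csv
import io


def parse_connections_sketch(lines):
    """Return ({(neuron_i, neuron_j): connection}, number_of_blocked).

    Assumption: every line holds three comma-separated integers,
    "neuron_i,neuron_j,connection", with connection in {-1, 0, 1}.
    """
    connections = {}
    blocked = 0
    for row in csv.reader(lines):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        neuron_i, neuron_j, connection = (int(field) for field in row)
        connections[(neuron_i, neuron_j)] = connection
        if connection == -1:
            blocked += 1
    return connections, blocked


sample = io.StringIO("1,2,1\n2,1,0\n1,3,-1\n")
neuron_connections, blocked = parse_connections_sketch(sample)
print(neuron_connections[(1, 2)], blocked)  # 1 1
```

The key is an ordered pair, so (1, 2) and (2, 1) are separate entries: the connection from neuron 1 to neuron 2 can differ from the one in the opposite direction.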

Neuron positions

Given the location of a position file, the parser outputs a dictionary with dict[neuron_id] = (x, y), where x and y are scaled between 0 and 500 (from an original scale of 0 to 1000). This allows you to lay out the network visually, check the cluster density, or calculate the (Euclidean) distance between individual neurons.
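For illustration, the scaling and the distance calculation could look like the sketch below. The helper names scale_positions and euclidean_distance are hypothetical, and the factor of 0.5 simply follows the 1000-to-500 rescaling described above.

```python
import math


def scale_positions(raw_positions, factor=0.5):
    """Scale every (x, y) pair, e.g. from a 0..1000 range down to 0..500."""
    return {
        neuron_id: (x * factor, y * factor)
        for neuron_id, (x, y) in raw_positions.items()
    }


def euclidean_distance(positions, neuron_a, neuron_b):
    """Straight-line distance between two neurons in a position dictionary."""
    (xa, ya), (xb, yb) = positions[neuron_a], positions[neuron_b]
    return math.hypot(xa - xb, ya - yb)


raw = {1: (0.0, 0.0), 2: (600.0, 800.0)}
positions = scale_positions(raw)
print(positions[2])                          # (300.0, 400.0)
print(euclidean_distance(positions, 1, 2))   # 500.0
```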

The Correlation with Discretization Benchmark

You can find the Python Correlation Benchmark on GitHub (model.py). First it parses the data needed to make this model: a time series dictionary with discretization (threshold at 0.12), and a dictionary with the neuron positions.

Then the matrix of neurons (100×100 or 1000×1000) is traversed. The correlation between two time series is calculated with scipy.stats.pearsonr.

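Pearson correlation is symmetric: the value for valid_5_100, or “the connection between neuron 5 and neuron 100 in the validation set”, is the same as for valid_100_5. Each pair therefore only needs to be computed once, and the traversal benefits from caching the results in a cache dictionary.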
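The caching idea can be sketched as follows. This is not the code from model.py: the function names are hypothetical, numpy.corrcoef is swapped in for scipy.stats.pearsonr to keep the dependencies to NumPy (both compute the same Pearson coefficient), and scoring a constant series as 0 and skipping self-pairs are assumptions of this sketch.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.std() == 0 or y.std() == 0:
        # Undefined for a constant series; assumption: score it as 0.
        return 0.0
    return float(np.corrcoef(x, y)[0, 1])


def correlation_scores(discrete_dict):
    """Score every ordered pair (i, j), computing each unordered pair once."""
    cache = {}
    scores = {}
    for i in discrete_dict:
        for j in discrete_dict:
            if i == j:
                continue  # assumption: self-pairs are skipped in this sketch
            key = (i, j) if i < j else (j, i)  # same key for (i, j) and (j, i)
            if key not in cache:
                cache[key] = pearson(discrete_dict[i], discrete_dict[j])
            scores[(i, j)] = cache[key]
    return scores


discrete_dict = {1: [1, 0, 1, 0], 2: [1, 0, 1, 0], 3: [0, 1, 0, 1]}
scores = correlation_scores(discrete_dict)
print(scores[(1, 2)] == scores[(2, 1)])  # True
```

With the cache, a network of n neurons needs n(n-1)/2 correlation calls instead of n(n-1), which halves the dominant cost of the traversal.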

Python Code

All code for this benchmark is available on GitHub. Run-time for a data set of 1000 neurons (1 million connection estimations) is about 38 minutes on a laptop. The script uses less than 1GB of memory and gives a public leaderboard score of ~0.87337 AUC.

Visualization & Animation

The parse script can also be used to create visualizations and animations. brainsimulator.py uses PyGame to animate a neural network of 100 neurons.

It can produce animations like the one below:

[Animation: neural network animation]

Another script, neuronplot.py, uses matplotlib to graph neuron activities. It can be used to produce graphs like the one below:

[Figure: graphed neuron activities]

Acknowledgements

ML Wave would like to thank Kaggle, its competitors and ChaLearn for this wonderful competition, tips on the forum and data sets.
