Some experiments and research into applying Normalized Compression Distance (NCD) to chess games. The goal is to cluster the games of different grand masters. Using Bzip as the compressor we get an accuracy of 71% on 21 files from 7 different grand masters.
NCD? A short introduction
NCD, in layman terms, works on the principle that two files which share data patterns compress better when you add them together, than two files which don’t share data patterns, since compressors are more efficient when dealing with repeating data patterns.
For the Connectomics contest on Kaggle the task is to write a brain connectivity estimator using neuron activation time series data. Benchmark code for Discretization Pearson Correlation was available in C++ and Matlab. Now here in Python too!
This article is under construction for the duration of the contest. The competition admins have released their own Python with correlation benchmark code. Check out their Github repo.
About the contest
The goal of the Connectomics contest hosted by Kaggle is:
Reconstruct the wiring between neurons from fluorescence imaging of neural activity.
The contest is brought to us by Challenges in Machine Learning (ChaLearn)
The contest admins have kindly provided benchmark sample code, a 5 minute tutorial, and references.
Kaggle is hosting a contest where the task is to predict survival rates of people aboard the titanic. A train set is given with a label 1 or 0, denoting ‘survived’ or ‘died’. We are going to use Vowpal Wabbit to get a score of about 0.79426 AUC (top 10%).
In this Kaggle contest, they ask you to complete the analysis of what sorts of people were likely to survive. In particular, they ask you to apply the tools of machine learning to predict which passengers survived the tragedy.