The goal of this benchmark is to apply active and online learning methods to the well-known MNIST database of handwritten digits. This database has been used as a reference in many classification studies, and very good results (an error rate of 0.4%) have been reported. Ideally, therefore, participants would obtain results similar to those of the reference methods while using far fewer instances from the training set.
Given the whole training set (without labels), participants must predict the labels of the test set as accurately as possible while requesting as few training-set labels as possible. Participants may request at most 5,000 labels from the whole training set. A common initial set of 50 labeled samples will be provided to all participants. Starting from this initial set, participants will iteratively select the most discriminative samples from the training set in order to train the classifier. The expected result is a sorted list of 5,000 samples.
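The selection loop described above can be sketched as pool-based uncertainty sampling. This is a minimal sketch under stated assumptions, not a prescribed method: it uses a nearest-centroid classifier and queries the sample whose two closest class centroids are most nearly equidistant. The names `select_query_order`, `class_centroids`, `centroid_margin` and the `oracle` callable (standing in for a label request) are hypothetical, and it assumes the 50 initial labels do not count against the 5,000-label budget. The returned list is the order in which labels were requested, which is the kind of sorted list participants submit.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def class_centroids(xs: List[Vector], ys: List[int]) -> Dict[int, List[float]]:
    """Mean feature vector of each class among the labelled samples."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def centroid_margin(x: Vector, cents: Dict[int, List[float]]) -> float:
    """Gap between the distances to the two nearest centroids (small = uncertain)."""
    dists = sorted(math.dist(x, c) for c in cents.values())
    return dists[1] - dists[0] if len(dists) > 1 else math.inf


def select_query_order(
    pool: Sequence[Vector],
    init_idx: Sequence[int],
    init_labels: Sequence[int],
    oracle: Callable[[int], int],  # hypothetical: returns the true label of pool[i]
    budget: int = 5000,            # assumption: initial labels are not counted here
) -> List[int]:
    """Return pool indices in the order their labels were requested."""
    xs = [pool[i] for i in init_idx]
    ys = list(init_labels)
    remaining = set(range(len(pool))) - set(init_idx)
    order: List[int] = []
    while remaining and len(order) < budget:
        cents = class_centroids(xs, ys)
        # Query the most ambiguous sample; ties are broken by the smaller index.
        i = min(remaining, key=lambda j: (centroid_margin(pool[j], cents), j))
        remaining.remove(i)
        order.append(i)
        xs.append(pool[i])
        ys.append(oracle(i))  # one label request
    return order
```

Margin sampling around class centroids is only one of many uncertainty heuristics; on MNIST, a participant would more likely plug in a stronger classifier and recompute its uncertainty scores after each batch of requests rather than after every single label.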
Although participants will have access to a validation set for tuning the parameters of their methods, the final test set will be hidden from them. Results (see evaluation section) will be obtained by training common classifiers on increasingly large training sets, built by adding samples in the order given by each participant's sorted list, and then evaluating these classifiers on the blind test set. To limit the effect of the random choice of the initial labeled set, 10 different initial sets will be provided, and participants will be asked to return a sorted list of samples for each of them.
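The evaluation protocol can be pictured as a learning curve: test error as a function of how many samples from the sorted list have been added. The sketch below is illustrative only. It uses a 1-nearest-neighbour classifier as a stand-in for the organisers' "common classifiers", and the names `learning_curve`, `mean_curve` and the `checkpoints` list of prefix sizes are hypothetical; the actual classifiers and prefix sizes are defined in the evaluation section.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nn_predict(train_x: List[Vector], train_y: List[int], x: Vector) -> int:
    """1-nearest-neighbour prediction (stand-in for the organisers' classifiers)."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return train_y[best]


def error_rate(train_x, train_y, test_x, test_y) -> float:
    """Fraction of test samples that are misclassified."""
    wrong = sum(nn_predict(train_x, train_y, x) != y for x, y in zip(test_x, test_y))
    return wrong / len(test_y)


def learning_curve(pool, pool_labels, init_idx, order, checkpoints, test_x, test_y) -> Dict[int, float]:
    """Test error after training on the initial set plus the first k samples of `order`."""
    curve: Dict[int, float] = {}
    for k in checkpoints:
        idx = list(init_idx) + list(order[:k])
        curve[k] = error_rate(
            [pool[i] for i in idx], [pool_labels[i] for i in idx], test_x, test_y
        )
    return curve


def mean_curve(curves: List[Dict[int, float]]) -> Dict[int, float]:
    """Average the curves obtained from the different initial sets."""
    return {k: sum(c[k] for c in curves) / len(curves) for k in curves[0]}
```

Averaging one curve per initial set (10 in this benchmark) reduces the dependence of the final score on which 50 samples happened to be labeled first.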
For online learning, a common set of 5 labeled samples of every class (50 samples in total) is initially provided to the system. Then, 10,000 selected samples of the training set are shown to the system one by one, without their labels. The system has to predict the label of each sample. After each prediction, the true label is revealed so that the system can update its learning parameters according to whether the prediction was correct. The goal (see evaluation section) is to minimize the total number of prediction errors.
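The predict-then-reveal protocol can be sketched as a simple mistake-counting loop. Here the learner is a multiclass perceptron, chosen only for illustration since the benchmark does not prescribe a model; `OnlinePerceptron` and `run_online` are hypothetical names, and warming the model up with a single pass over the labeled seed samples is an assumption.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


class OnlinePerceptron:
    """Multiclass perceptron: one weight vector per class, updated only on mistakes."""

    def __init__(self, n_features: int, classes: Sequence[int]):
        # One extra weight per class acts as a bias term.
        self.w = {c: [0.0] * (n_features + 1) for c in classes}

    @staticmethod
    def _augment(x: Vector) -> List[float]:
        return list(x) + [1.0]

    def predict(self, x: Vector) -> int:
        xa = self._augment(x)
        # Highest score wins; ties go to the first class in `classes`.
        return max(self.w, key=lambda c: sum(wi * xi for wi, xi in zip(self.w[c], xa)))

    def update(self, x: Vector, y_true: int, y_pred: int) -> None:
        if y_true == y_pred:
            return
        xa = self._augment(x)
        self.w[y_true] = [wi + xi for wi, xi in zip(self.w[y_true], xa)]
        self.w[y_pred] = [wi - xi for wi, xi in zip(self.w[y_pred], xa)]


def run_online(model: OnlinePerceptron,
               seed: Sequence[Tuple[Vector, int]],
               stream: Sequence[Tuple[Vector, int]]) -> int:
    """Warm up on the labeled seed set, then count mistakes over the stream."""
    for x, y in seed:  # assumption: a single pass over the seed samples
        model.update(x, y, model.predict(x))
    mistakes = 0
    for x, y in stream:
        y_hat = model.predict(x)   # predict before the label is revealed
        if y_hat != y:
            mistakes += 1
        model.update(x, y, y_hat)  # true label revealed: learn from it
    return mistakes
```

The returned count corresponds to the quantity the online task asks participants to minimize; any learner exposing the same predict-then-update cycle could be substituted for the perceptron.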