Original MNIST dataset
Test data
For active learning, the test data consists of the whole training set of the original dataset (from which the subset of samples used for learning is selected), 10 different initial sets of labeled samples, and a test set. This test set is provided for validation purposes only. The test set used for the final evaluation of methods will be withheld from participants.
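As a rough illustration of how a pool of training samples and an initial labeled set might be used, the following is a minimal sketch of pool-based active learning. The source does not prescribe a classifier or a query strategy: the nearest-centroid classifier, the margin-based selection rule, the function names, and the random stand-in data are all assumptions made for this example.

```python
import numpy as np

def centroid_scores(X_lab, y_lab, X, n_classes):
    """Negative squared distance to each class centroid (higher is better)."""
    scores = np.full((X.shape[0], n_classes), -np.inf)
    for c in np.unique(y_lab):
        centroid = X_lab[y_lab == c].mean(axis=0)
        scores[:, c] = -np.sum((X - centroid) ** 2, axis=1)
    return scores

def active_learning(X_pool, y_oracle, initial_idx, n_queries, n_classes=10):
    """Grow the labeled set by querying the least certain pool sample each round."""
    labeled = [int(i) for i in initial_idx]
    initial_set = set(labeled)
    unlabeled = [i for i in range(len(X_pool)) if i not in initial_set]
    for _ in range(n_queries):
        scores = centroid_scores(
            X_pool[labeled], y_oracle[labeled], X_pool[unlabeled], n_classes
        )
        # Margin between the two best classes; a small margin means uncertain.
        top2 = np.sort(scores, axis=1)[:, -2:]
        margin = top2[:, 1] - top2[:, 0]
        pick = unlabeled.pop(int(np.argmin(margin)))
        labeled.append(pick)  # the label of this sample is now requested
    return labeled

# Hypothetical stand-in data: 300 samples with 50 features, 10 classes.
rng = np.random.default_rng(0)
X_pool = rng.normal(size=(300, 50))
y_oracle = rng.integers(0, 10, size=300)
initial_idx = rng.choice(300, size=20, replace=False)

labeled = active_learning(X_pool, y_oracle, initial_idx, n_queries=10)
print(len(labeled))
```

In an actual run, the stand-in arrays would be replaced by the provided training data and one of the 10 initial labeled sets, and the resulting classifier would be scored on the validation test set.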
For online learning, the test data consists of a set of 10,000 samples whose labels have to be predicted sequentially.
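Sequential prediction can be sketched as a predict-then-update loop. This is only an illustration under stated assumptions: the source does not say whether the true label is revealed after each prediction (assumed here), and the multiclass perceptron, the class and function names, and the random stand-in data are hypothetical choices, not part of the challenge.

```python
import numpy as np

class OnlinePerceptron:
    """Multiclass perceptron with one weight vector per class."""

    def __init__(self, n_features, n_classes):
        self.W = np.zeros((n_classes, n_features))

    def predict(self, x):
        return int(np.argmax(self.W @ x))

    def update(self, x, y):
        y_hat = self.predict(x)
        if y_hat != y:
            # Move the true class toward x and the wrongly predicted class away.
            self.W[y] += x
            self.W[y_hat] -= x

def run_online(model, X, y):
    """Predict each sample before its label is used for learning."""
    predictions = []
    for x_t, y_t in zip(X, y):
        predictions.append(model.predict(x_t))  # prediction is made first
        model.update(x_t, int(y_t))             # assumed: label revealed afterwards
    return np.array(predictions)

# Hypothetical stand-in data: 500 samples with 50 features, 10 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))
y = rng.integers(0, 10, size=500)

model = OnlinePerceptron(n_features=50, n_classes=10)
preds = run_online(model, X, y)
print("online error rate:", np.mean(preds != y))
```

Because each prediction is recorded before the corresponding update, the reported error rate reflects only what the model knew at the time of each prediction.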
Feature extraction
Images have been processed to obtain a single set of common features. The feature extraction method applies PCA to the original images, yielding feature vectors of 50 dimensions. Participants must use this common set of features to guarantee a fair comparison between methods focusing on active and online learning.
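For readers unfamiliar with this kind of reduction, the following is a minimal sketch of PCA from flattened 28x28 images (784 values) to 50 dimensions. The exact procedure used to build the provided features (centering, scaling, whitening) is not stated in the source, so the plain SVD-based PCA below, the function names, and the random stand-in images are assumptions for illustration only.

```python
import numpy as np

def fit_pca(X, n_components=50):
    """Return the mean and the top principal directions of X (rows are samples)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are principal directions,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def transform_pca(X, mean, components):
    """Project samples onto the principal directions."""
    return (X - mean) @ components.T

# Hypothetical stand-in for flattened 28x28 grayscale images.
rng = np.random.default_rng(0)
images = rng.random((200, 784))

mean, components = fit_pca(images, n_components=50)
features = transform_pca(images, mean, components)
print(features.shape)  # (200, 50)
```

The projection is fitted once and then applied unchanged to any new samples, which is what makes a shared feature set comparable across methods.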
Download
Training data for active learning: matlab, text file
Validation test set for active learning: matlab, text file