Datasets

ICDAR/GREC 2011 competition

In recent years, there has been growing interest in the analysis of handwritten music scores. In this context, the focus is two-fold: the recognition of handwritten music scores (Optical Music Recognition), and the identification (or verification) of the authorship of an anonymous music score. Our goal is to foster interest in the analysis of handwritten music scores by proposing two different competitions.

CVC-MUSCIMA

The CVC-MUSCIMA database contains handwritten music score images and has been specially designed for writer identification and staff removal tasks. The database contains 1,000 music sheets written by 50 different musicians. All of them are adult musicians, to ensure that each one has a characteristic handwriting style of their own. Each writer transcribed the same 20 music pages, using the same pen and the same kind of music paper (with printed staff lines). The 20 selected music sheets include scores for solo instruments as well as scores for choir and orchestra.
Furthermore, for the staff removal task, each music page has been distorted using different transformation techniques; together with the originals, these yield a total of 12,000 images.
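As a quick consistency check on these figures: 50 writers transcribing 20 pages each gives the 1,000 original sheets, and 12,000 images in total means 12 images per sheet. Assuming every sheet receives the same number of distorted versions, that is 11 distorted versions plus the undistorted original; the count of 11 is an inference from the totals, not a number stated on this page.

```python
# Figures stated in the CVC-MUSCIMA description.
WRITERS = 50
PAGES_PER_WRITER = 20
TOTAL_IMAGES = 12_000  # originals plus distorted versions

original_sheets = WRITERS * PAGES_PER_WRITER          # 1,000 handwritten sheets
versions_per_sheet = TOTAL_IMAGES // original_sheets  # 12 images per sheet
# Assumption: every sheet gets the same number of distorted versions.
distorted_per_sheet = versions_per_sheet - 1          # 11, excluding the original

assert TOTAL_IMAGES % original_sheets == 0  # the totals divide evenly
print(original_sheets, versions_per_sheet, distorted_per_sheet)  # prints: 1000 12 11
```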

The CVC-MUSCIMA Database

ICDAR 2011 Robust Reading

ICDAR 2011 Robust Reading Competition

The Marriage Licenses ground-truth

The Marriage Licenses ground-truth is compiled from the Marriage Licenses Books conserved at the Archives of the Cathedral of Barcelona. The collection consists of 291 books containing information on approximately 600,000 unions celebrated in 250 parishes between 1451 and 1905. In addition to the marriage licenses, each book includes an index listing the husband’s family name for every marriage and the page number where the marriage record appears. In some cases (see Figure 2), the wife’s family name is also included.

MIPRCV Documents

The MIPRCV demo page and management of datasets

Tools (consolider/ingenio)

MIPRCV Ground Truth

Ground truth datasets

gt (consolider/ingenio)

MIPRCV CD Covers

This dataset is composed of 6,000 CD/DVD cover images and some associated labels. The term “cover” refers to the front-facing panel of a CD/DVD package and, increasingly, to the primary image accompanying a digital download of an album or of its individual tracks.

cd covers (MIPRCV)

MIPRCV MNIST

The goal of this benchmark is to apply active and online learning methods to the well-known MNIST database of handwritten digits. This database has been used as a reference in many works on classification, and very good results (an error rate of 0.4%) have been reported. Thus, the ideal outcome of the benchmark would be to obtain results similar to those of the reference methods while using far fewer instances from the training set.
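The benchmark does not prescribe a particular method. Pool-based active learning with uncertainty sampling is one common way to pursue the goal: start from a few labelled examples, then repeatedly ask an oracle to label the pool example the current model is least sure about. The minimal sketch that follows illustrates that loop under stated assumptions: to stay self-contained it replaces MNIST with synthetic 2-D points, uses a nearest-centroid classifier, and scores uncertainty by the distance gap between the two closest centroids. All function names, the data, and the budget of 20 queries are hypothetical choices for illustration.

```python
import math
import random


def make_blobs(n_per_class, centers, spread, rng):
    """Synthetic stand-in for MNIST (assumption): labelled 2-D points around each class center."""
    data = []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, spread), rng.gauss(cy, spread)), label))
    rng.shuffle(data)
    return data


def centroids(labelled):
    """Mean point of each class in the labelled set (the nearest-centroid model)."""
    sums, counts = {}, {}
    for (x, y), label in labelled:
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {c: (sums[c][0] / counts[c], sums[c][1] / counts[c]) for c in sums}


def predict(cents, point):
    """Assign the class whose centroid is closest to the point."""
    return min(cents, key=lambda c: math.dist(cents[c], point))


def margin(cents, point):
    """Gap between the two closest centroid distances; a small gap means an uncertain prediction."""
    d = sorted(math.dist(c, point) for c in cents.values())
    return d[1] - d[0]


def active_learning(pool, seed, budget):
    """Uncertainty sampling: query the least certain pool point, `budget` times."""
    labelled, pool = list(seed), list(pool)  # copies, so the caller's lists are untouched
    for _ in range(min(budget, len(pool))):
        cents = centroids(labelled)
        i = min(range(len(pool)), key=lambda j: margin(cents, pool[j][0]))
        labelled.append(pool.pop(i))  # the stored true label plays the role of the oracle
    return labelled


def accuracy(cents, data):
    return sum(predict(cents, p) == y for p, y in data) / len(data)


rng = random.Random(0)
centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
train = make_blobs(200, centers, 1.0, rng)
test_set = make_blobs(100, centers, 1.0, rng)

# Start from one labelled example per class; the rest of the training set is the unlabelled pool.
seed = [next(s for s in train if s[1] == c) for c in range(len(centers))]
pool = [s for s in train if s not in seed]
labelled = active_learning(pool, seed, budget=20)
print(len(labelled), "labels used, test accuracy:", accuracy(centroids(labelled), test_set))
```

Applying this to MNIST would mean loading the 28×28 digit images, generalising `centroids` beyond two dimensions, and most likely using a stronger classifier; the query loop itself stays the same.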

 MNIST (MIPRCV)

CVC-FP: Floor plan database for structural analysis

The collection consists of 122 scanned floor plan documents divided into 4 different subsets according to their origin and style.

This dataset is fully ground-truthed for the structural symbols: rooms, walls, doors, windows, parking doors, and room separations. The ground truth specifies not only their locations in the images but also the structural relations between them.
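The ground-truth file format is not described on this page, so what follows is only a hypothetical illustration of why structural relations are useful alongside locations. Each symbol carries a class and a bounding box, and relations are stored as (subject, relation, object) triples; with a `connects` relation between doors or room separations and rooms, a room-connectivity graph falls out directly. The `Symbol` and `FloorPlanGT` names, the bounding-box representation, and the `connects` relation are assumptions, not the CVC-FP or SGT schema.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Symbol:
    """A ground-truthed structural symbol with its location in the image (hypothetical layout)."""
    sid: str
    kind: str  # e.g. "room", "wall", "door", "window", "parking_door", "separation"
    bbox: tuple[int, int, int, int]  # assumed (x0, y0, x1, y1) in pixels


@dataclass
class FloorPlanGT:
    """Hypothetical in-memory ground truth: symbols plus (subject, relation, object) triples."""
    symbols: dict[str, Symbol] = field(default_factory=dict)
    relations: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, symbol: Symbol) -> None:
        self.symbols[symbol.sid] = symbol

    def relate(self, subject: str, relation: str, obj: str) -> None:
        # Only allow relations between symbols that have already been annotated.
        for sid in (subject, obj):
            if sid not in self.symbols:
                raise KeyError(f"unknown symbol {sid!r}")
        self.relations.append((subject, relation, obj))

    def room_graph(self) -> dict[str, set[str]]:
        """Link every pair of rooms that some connector (e.g. a door) 'connects'."""
        rooms_by_connector = defaultdict(set)
        for subject, relation, obj in self.relations:
            if relation == "connects" and self.symbols[obj].kind == "room":
                rooms_by_connector[subject].add(obj)
        graph = defaultdict(set)
        for rooms in rooms_by_connector.values():
            for room in rooms:
                graph[room] |= rooms - {room}
        return dict(graph)


gt = FloorPlanGT()
gt.add(Symbol("r1", "room", (0, 0, 100, 80)))
gt.add(Symbol("r2", "room", (100, 0, 200, 80)))
gt.add(Symbol("r3", "room", (0, 80, 100, 160)))
gt.add(Symbol("d1", "door", (95, 30, 105, 50)))
gt.add(Symbol("s1", "separation", (0, 78, 100, 82)))
for connector, room in [("d1", "r1"), ("d1", "r2"), ("s1", "r1"), ("s1", "r3")]:
    gt.relate(connector, "connects", room)

print(gt.room_graph())
```

Keeping relations as plain triples means new relation types (for example, a wall bounding a room) can be added without changing the data structure.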

We also include the tool we developed for efficient structural labeling. This tool, named SGT, can easily be installed on any web server, and a simple user administration system supports collaborative ground-truthing.

CVC-FP & SGT tool