Format & Download

CD cover images are stored as .jpg files. Their labels are stored in .xml files.

Original images are identified by a unique name, composed of the name of the subset of images (training, validation, test) and a unique consecutive sequence number, for example: training_0001.jpg.
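The naming scheme can be sketched in a few lines of Python. This is only an illustration: the helper names `image_name` and `parse_image_name` are hypothetical, and the four-digit zero padding is an assumption inferred from the single example training_0001.jpg. Matching is case-insensitive because the XML example further down uses a capitalised subset name (Validation_0001.jpg).

```python
import re

# Subset names taken from the description above.
SUBSETS = ("training", "validation", "test")

# Assumed pattern: <subset>_<sequence number>.jpg
_NAME_RE = re.compile(r"^(training|validation|test)_(\d+)\.jpg$", re.IGNORECASE)


def image_name(subset, number):
    """Build an image file name such as 'training_0001.jpg'.

    The four-digit padding is an assumption based on the example name.
    """
    if subset.lower() not in SUBSETS:
        raise ValueError(f"unknown subset: {subset!r}")
    return f"{subset}_{number:04d}.jpg"


def parse_image_name(filename):
    """Split a file name into (subset in lowercase, sequence number)."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected image name: {filename!r}")
    return match.group(1).lower(), int(match.group(2))
```

For instance, `parse_image_name("training_0001.jpg")` yields the subset `"training"` and the sequence number `1`.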

For each image file (training_0001.jpg) there is an XML file with the same base name (training_0001.xml) containing the following information: the name of the image, the text regions (each defined by the vertices of its rectangular bounding box), the artist, the title of the CD and the original URL of the image.
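Since an image and its label share a base name, pairing them is a matter of swapping the extension. The following Python sketch shows one way to do this and to spot images that lack a label. The function names `label_name` and `unlabeled_images` are hypothetical, and nothing here assumes a particular directory layout.

```python
from pathlib import PurePosixPath


def label_name(image_name):
    """Map an image name to its label name: training_0001.jpg -> training_0001.xml."""
    return str(PurePosixPath(image_name).with_suffix(".xml"))


def unlabeled_images(file_names):
    """Return the .jpg names in file_names that have no matching .xml name."""
    names = set(file_names)
    return sorted(
        name
        for name in names
        if name.lower().endswith(".jpg") and label_name(name) not in names
    )
```

Running `unlabeled_images` over a directory listing is a quick sanity check after unpacking the database.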

Download instructions:

The database is provided as a single ZIP file containing the images of the Training and Validation sets, together with some Matlab scripts that simplify access to the database and the evaluation process. See the README.TXT included in the archive.
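For users who prefer not to use the Matlab scripts, the archive can also be inspected with Python's standard `zipfile` module. The sketch below only lists the image entries. The function name `list_images` is hypothetical, and the internal folder layout of the archive is not assumed.

```python
import zipfile


def list_images(zip_path):
    """Return the sorted names of all .jpg entries inside the ZIP archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name for name in archive.namelist() if name.lower().endswith(".jpg")
        )
```

The path passed to `list_images` is whatever name the downloaded file was saved under.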

Click here to download the database. The Test images, which must be used for evaluation purposes, can be found in the Results section.

XML description:

Original Image (Validation_0001.jpg)


<image>
<file>
Validation_0001.jpg
</file>
<path>
Images/Validation/
</path>
<imageInfo>
<artist>
TheBeatles
</artist>
<productname>
TheBeatles(TheWhiteAlbum)
</productname>
<asin>
B000002UAX
</asin>
<url>
http://ecx.images-amazon.com/images/I/21IVvn7zGAL.jpg
</url>
</imageInfo>
<textRegions>
<name>
Text
</name>
<region>
<point>
<x>
166.8818
</x>
<y>
180.557
</y>
</point>
<point>
<x>
294.5171
</x>
<y>
180.557
</y>
</point>
<point>
<x>
166.8818
</x>
<y>
155.0299
</y>
</point>
<point>
<x>
294.5171
</x>
<y>
155.0299
</y>
</point>
</region>
<feature>
<namef>
Posicio
</namef>
<value>

</value>
</feature>
</textRegions>
</image>
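A label file with the structure shown above could be read in Python roughly as follows. This is a minimal sketch, not the Matlab code shipped with the database: `parse_label` is a hypothetical name, and the output dictionary keys are chosen for illustration. It assumes that every `<region>` holds the four corner points of an axis-aligned rectangle, as in the example, and it collects all `<region>` elements wherever they appear, because the example does not show how several text regions are nested.

```python
import xml.etree.ElementTree as ET


def _field(parent, tag):
    """Return the stripped text of a child element, or "" if it is absent."""
    if parent is None:
        return ""
    return (parent.findtext(tag) or "").strip()


def parse_label(xml_text):
    """Parse one label file into a dictionary.

    Each text region is reduced to (x_min, y_min, x_max, y_max), computed
    from its corner points. float() tolerates the surrounding newlines
    that the files place around each value.
    """
    root = ET.fromstring(xml_text)
    info = root.find("imageInfo")

    boxes = []
    for region in root.iter("region"):
        points = region.findall("point")
        if not points:
            continue  # skip a region without vertices
        xs = [float(p.findtext("x")) for p in points]
        ys = [float(p.findtext("y")) for p in points]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))

    return {
        "file": _field(root, "file"),
        "path": _field(root, "path"),
        "artist": _field(info, "artist"),
        "title": _field(info, "productname"),
        "asin": _field(info, "asin"),
        "url": _field(info, "url"),
        "boxes": boxes,
    }
```

Applied to the example above, the single text region becomes the box (166.8818, 155.0299, 294.5171, 180.557), and the artist and title fields are read as TheBeatles and TheBeatles(TheWhiteAlbum).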