The ESPOSALLES Database

The Marriage Licenses ground truth is compiled from the Marriage License Books conserved at the Archives of the Cathedral of Barcelona.

The collection comprises 291 books with information on approximately 600,000 unions celebrated in 250 parishes between 1451 and 1905. One example can be seen in Figure 1. In addition to the marriage licenses, each book includes an index listing the husbands' family names and the page number where each marriage record appears. In some cases (see Figure 2), the wife's family name is also included.

Each marriage license (see Figure 3) records the husband's occupation, the former marital status of both husband and wife, and their socioeconomic position, as signaled by the fee imposed on them. In some cases it also gives the fathers' occupations and the place of residence or geographical origin.

Figure 1. Esposalla
Figure 2. Index
Figure 3. Detailed view

Database

The original documents have been digitized at 300 dpi in true color. The ground truth contains two different kinds of documents.

Old Marriage Records

This set consists of a single volume written by one writer. It contains 173 pages, 1,747 registers, and 5,447 lines. For each page, we provide one image per line and the corresponding transcription in a separate text file. We also provide one image per register, generated by concatenating the lines that belong to the same register into a single line.
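The single-line register images described above can be illustrated with a minimal sketch. It is not the authors' tool: it assumes grayscale images stored as plain lists of pixel rows, pads shorter lines with a white background (an assumption), and joins them left to right. The function name `concatenate_lines` and the toy pixel values are hypothetical.

```python
from typing import List

# Grayscale image as a list of rows, each row a list of pixel values (0-255).
Image = List[List[int]]


def concatenate_lines(lines: List[Image], background: int = 255) -> Image:
    """Join line images left to right into one single-line register image."""
    if not lines:
        return []
    height = max(len(img) for img in lines)
    register: Image = [[] for _ in range(height)]
    for img in lines:
        width = len(img[0]) if img else 0
        # Pad shorter lines at the bottom so all lines share the same height.
        padding = [[background] * width for _ in range(height - len(img))]
        for row_out, row_in in zip(register, img + padding):
            row_out.extend(row_in)
    return register


# Toy data (not taken from the corpus): two "line images" of different sizes.
line_a = [[0, 0], [0, 255]]   # 2 rows x 2 columns
line_b = [[10, 20, 30]]       # 1 row x 3 columns
register = concatenate_lines([line_a, line_b])
print(register)
```

With real scans, the same idea would normally be applied through an imaging library rather than nested lists.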

Indices

This set contains the indices of two volumes, 29 pages in total. Each page is divided vertically into two columns, and each column is divided into lines, yielding a total of 1,563 lines. For each column of a page, we provide one image per line (containing the surname and the page number) and the corresponding transcription in a text file.

Partitions

To facilitate the comparison of different approaches, we devised the following partitions:

Old Marriage Records

For the old marriage records, we have proposed 7 partitions.

Partition          P0     P1     P2     P3     P4     P5     P6
Pages              25     25     25     25     25     25     23
Registers         256    246    246    249    243    255    252
Lines             827    779    786    768    771    773    743
Running words    8893   8595   8802   8506   8572   7799   8610
OOV               426    374    368    340    329    373    317
Lexicon          1119   1096   1106   1036   1046   1078   1011
Characters      48464  46459  47902  45728  46135  47529  46012
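Statistics of this kind can be recomputed from the transcriptions. The sketch below makes several assumptions that the text above does not state: words are whitespace-separated tokens, the lexicon is the set of distinct tokens in a partition, and OOV counts the distinct tokens of a partition that occur in none of the other partitions. The function name `partition_stats` and the toy transcriptions are hypothetical and not taken from the corpus.

```python
from typing import Dict, List


def partition_stats(partitions: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    """Compute lines, running words, lexicon size and OOV count per partition."""
    tokens = {
        name: [word for line in lines for word in line.split()]
        for name, lines in partitions.items()
    }
    stats: Dict[str, Dict[str, int]] = {}
    for name, words in tokens.items():
        # Vocabulary of every other partition (assumed reference for OOV).
        others = {w for other, ws in tokens.items() if other != name for w in ws}
        lexicon = set(words)
        stats[name] = {
            "lines": len(partitions[name]),
            "running_words": len(words),
            "lexicon": len(lexicon),
            "oov": len(lexicon - others),
        }
    return stats


# Toy transcriptions, invented for illustration only.
toy = {
    "P0": ["dit dia rebere de Joan", "pages de Barcelona"],
    "P1": ["dit dia rebere de Pere", "sastre de Barcelona"],
}
stats = partition_stats(toy)
print(stats["P0"])
```

If the published figures were produced with a different tokenization or OOV definition, the numbers from this sketch would not match the table.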

Indices

For the indices, we have proposed 4 partitions.

Partition          P0     P1     P2     P3
Text lines        390    391    391    391
Words            1629   1640   1632   1633
Characters       7629   7817   7554   7809
OOV               326    346    298    350
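One common way to use fixed partitions like these, assumed here rather than prescribed by the text above, is a rotation in which each partition is held out for testing once while the others are used for training. The helper name `leave_one_out` is hypothetical.

```python
from typing import Iterator, List, Tuple


def leave_one_out(partitions: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training partitions, held-out partition) for every rotation."""
    for held_out in partitions:
        train = [p for p in partitions if p != held_out]
        yield train, held_out


index_partitions = ["P0", "P1", "P2", "P3"]
folds = list(leave_one_out(index_partitions))
for train, test in folds:
    print(f"train on {train}, test on {test}")
```

Reporting results averaged over all rotations makes figures from different approaches comparable, provided everyone uses the same partition files.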

Getting the ground truth datasets

The following ground truth datasets may only be used for non-commercial and research purposes. For other purposes, please contact us first. Additionally, if you use this ground truth in your scientific work or publications, please cite this work as follows:

  • V. Romero, A. Fornés, N. Serrano, J.A. Sánchez, A.H. Toselli, V. Frinken, E. Vidal, J. Lladós. “The ESPOSALLES Database: An Ancient Marriage License Corpus for Off-line Handwriting Recognition”, Pattern Recognition, Volume 46, Issue 6, Pages 1658–1669, 2013. (DOI:http://dx.doi.org/10.1016/j.patcog.2012.11.024).

Available datasets
Item                  Size    Description
Old Marriage Records  123 MB  A subset of the database, including the whole page, the segmented lines, the single-line registers, the transcriptions, and the partitions.
Indices               50 MB   A subset of the database, including the whole page, the segmented lines, the transcription, and the partitions.
Segmented Lines       55 MB   A subset of the database that includes the coordinates of the segmented blocks and text lines.

A newer version of this database was used in an international competition: http://www.cvc.uab.es/5cofm/competition/

If you have any questions or suggestions, please contact Alicia Fornés.