The ESPOSALLES Database

The Marriage Licenses ground truth is compiled from the Marriage Licenses Books preserved at the Archives of the Cathedral of Barcelona.

The Marriage Register Books comprise 291 books with information on approximately 600,000 unions celebrated in 250 parishes between 1451 and 1905. One example can be seen in Figure 1. In addition to the marriage licenses, each book includes an index listing the husbands’ family names and the page number where the marriage information appears. In some cases (see Figure 2), the wife’s family name is also included.

Each marriage license (see Figure 3) contains information about the husband’s occupation, the husband’s and wife’s former marital status, and their socioeconomic position as signaled by the fee imposed on them. In some cases it also records the fathers’ occupations, place of residence, or geographical origin.

Figure 1. Esposalla
Figure 2. Index
Figure 3. Detailed view

Database

The original documents have been digitized at 300 dpi in true color. The ground truth contains two different kinds of documents.

Old Marriage Records

This set consists of one volume written by a single writer. It contains 173 pages, 1,747 registers, and 5,447 lines. For each page, we provide one image for each line and the corresponding transcription in one separate text file. We also provide one image for each register, generated by concatenating the lines that belong to the same register into one single line.
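The register images are described as concatenations of the line images of one register. A minimal sketch of that idea follows. It assumes grayscale images held as lists of pixel rows and assumes that shorter lines are padded at the bottom with a white background. The name `concat_lines`, the padding rule, and the background value are illustrative choices, not details given by the database.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixels, 0 = black, 255 = white


def concat_lines(lines: List[Image], background: int = 255) -> Image:
    """Join line images left to right into one single-line image.

    Assumption (not from the database description): lines shorter than
    the tallest one are padded at the bottom with `background` pixels.
    """
    if not lines:
        return []
    height = max(len(img) for img in lines)
    register: Image = [[] for _ in range(height)]
    for img in lines:
        width = len(img[0]) if img else 0
        for y in range(height):
            # Use the real pixel row if it exists, otherwise a padding row.
            row = img[y] if y < len(img) else [background] * width
            register[y].extend(row)
    return register


# Two toy "line images": 2 rows x 3 columns and 1 row x 2 columns.
line_a = [[0, 0, 0], [0, 255, 0]]
line_b = [[10, 20]]
register = concat_lines([line_a, line_b])
```

With real scans the same idea applies to pixel arrays loaded through an image library; the padding strategy would need to match whatever the original tooling did.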

Indices

This set contains the indices of two volumes, 29 pages in total. Each page is divided vertically into two columns, and each column is divided into lines, yielding a total of 1,563 lines. For each column of a page, we provide one image for each line (containing the surname and page number) and the corresponding transcription in one text file.

Partitions

To facilitate the comparison of different approaches, we devised the following partitions:

Old Marriage Records

For the old marriage records, we have proposed 7 partitions.

                   P0     P1     P2     P3     P4     P5     P6
Pages              25     25     25     25     25     25     23
Registers         256    246    246    249    243    255    252
Lines             827    779    786    768    771    773    743
Running words    8893   8595   8802   8506   8572   7799   8610
OOV               426    374    368    340    329    373    317
Lexicon          1119   1096   1106   1036   1046   1078   1011
Characters      48464  46459  47902  45728  46135  47529  46012
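As a quick consistency check, the per-partition counts can be summed and compared with the totals stated for the Old Marriage Records volume (173 pages, 1,747 registers, 5,447 lines). The sketch below only re-adds the published figures; the variable names are illustrative.

```python
# Per-partition counts for P0..P6, copied from the partition table.
pages = [25, 25, 25, 25, 25, 25, 23]
registers = [256, 246, 246, 249, 243, 255, 252]
lines = [827, 779, 786, 768, 771, 773, 743]

totals = {
    "pages": sum(pages),
    "registers": sum(registers),
    "lines": sum(lines),
}
print(totals)  # {'pages': 173, 'registers': 1747, 'lines': 5447}
```

The sums agree with the totals given in the description of the volume.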

Indices

For the indices, we have proposed 4 partitions.

                P0    P1    P2    P3
Text lines     390   391   391   391
Words         1629  1640  1632  1633
Characters    7629  7817  7554  7809
OOV            326   346   298   350
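This page does not state an evaluation protocol. One common way to use fixed partitions like these is leave-one-partition-out cross-validation, which is assumed in the sketch below: each partition serves once as the test set while the remaining ones form the training set. The helper name `leave_one_out` is illustrative.

```python
# Text-line counts of the four index partitions, copied from the table.
text_lines = {"P0": 390, "P1": 391, "P2": 391, "P3": 391}


def leave_one_out(parts):
    """Yield (test, train) pairs in which each partition is held out once."""
    names = list(parts)
    for held_out in names:
        yield held_out, [n for n in names if n != held_out]


# Assumed protocol: 4-fold cross-validation over the given partitions.
folds = []
for test, train in leave_one_out(text_lines):
    n_train = sum(text_lines[n] for n in train)
    folds.append((test, train, n_train, text_lines[test]))

for test, train, n_train, n_test in folds:
    print(f"test={test} train={'+'.join(train)} "
          f"train_lines={n_train} test_lines={n_test}")
```

Each fold trains on roughly 1,170 lines and tests on about 390; the same loop applies to the seven Old Marriage Records partitions.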

Getting the ground truth datasets

The following ground truth datasets may only be used for non-commercial and research purposes. For other purposes, please contact us first. Additionally, if you use this ground truth in your scientific work or publications, please cite this work as follows:

  • V. Romero, A. Fornés, N. Serrano, J.A. Sánchez, A.H. Toselli, V. Frinken, E. Vidal, J. Lladós. “The ESPOSALLES Database: An Ancient Marriage License Corpus for Off-line Handwriting Recognition”, Pattern Recognition, Volume 46, Issue 6, Pages 1658–1669, 2013. (DOI:http://dx.doi.org/10.1016/j.patcog.2012.11.024).

You can find the new datasets at this link: Information Extraction in Historical Handwritten Records

If you have any questions or suggestions, please contact Alicia Fornes.