The Marriage Licenses ground truth is compiled from the Marriage Licenses Books preserved in the Archives of the Cathedral of Barcelona.
The Marriage Licenses Books comprise 291 books containing information on approximately 600,000 unions celebrated in 250 parishes between 1451 and 1905. An example is shown in Figure 1. In addition to the marriage licenses, each book includes an index listing the husbands' family names together with the page number where each marriage is recorded. In some cases (see Figure 2), the wife's family name is also included.
Each marriage license (see Figure 3) records the husband's occupation, the husband's and wife's previous marital status, and their socioeconomic position, as indicated by the fee imposed on them. In some cases it also records the fathers' occupations, place of residence, or geographical origin.
Database
The original documents were digitized at 300 dpi in true color. The ground truth contains two different kinds of documents.
Old Marriage Records
It consists of a single volume written by one writer, comprising 173 pages, 1,747 registers, and 5,447 lines. For each page, we provide one image per line along with the corresponding transcription in a separate text file. We also provide one image per register, generated by concatenating the lines that belong to that register into a single line.
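The description does not specify exactly how line images are joined into a register image. The following is a minimal sketch of one plausible approach, under assumptions that are ours and not the database's: images are grayscale lists of pixel rows with a white (255) background, shorter lines are centred vertically to the height of the tallest line, and an optional `gap` of background columns separates consecutive lines. The function name `concat_lines` is hypothetical; in practice an imaging library such as Pillow or NumPy would do the same job.

```python
from typing import List

# Hypothetical representation: a grayscale image as a list of pixel rows
# (0 = black ink, 255 = white background).
Image = List[List[int]]


def concat_lines(lines: List[Image], background: int = 255, gap: int = 0) -> Image:
    """Join line images side by side into one single-line image.

    Assumption: shorter lines are centred vertically and padded with
    background pixels; `gap` background columns separate consecutive lines.
    """
    if not lines:
        return []
    height = max(len(img) for img in lines)
    out: Image = [[] for _ in range(height)]
    for i, img in enumerate(lines):
        h = len(img)
        w = len(img[0]) if h else 0
        top = (height - h) // 2  # rows of padding above this line
        if i > 0 and gap:
            for row in out:
                row.extend([background] * gap)
        for r in range(height):
            src = r - top
            if 0 <= src < h:
                out[r].extend(img[src])  # copy the real pixel row
            else:
                out[r].extend([background] * w)  # vertical padding
    return out
```

For example, `concat_lines([line_a, line_b, line_c], gap=20)` would produce one image as tall as the tallest of the three lines, with 20 white columns between them.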
Indices
It contains the indices of two volumes, 29 pages in total. Each page is divided vertically into two columns, and each column is divided into lines, yielding 1,563 lines in total. For each column, we provide one image per line (containing the surname and the page number) and the corresponding transcription in a text file.
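Because each index line pairs a surname with a page number, the transcriptions can be turned into a simple surname-to-pages lookup. This sketch assumes a hypothetical transcription format of the form `<surname> <page>` (for example `Puig 12`); the actual text files may be laid out differently, and `build_index` is an illustrative name only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_index(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map each surname to the sorted list of pages on which it appears.

    Assumes (hypothetically) that each transcribed index line reads
    '<surname tokens> <page number>'; other lines are skipped.
    """
    index: Dict[str, Set[int]] = defaultdict(set)
    for line in lines:
        tokens = line.split()
        if len(tokens) < 2 or not tokens[-1].isdecimal():
            continue  # does not match the assumed '<surname> <page>' format
        surname = " ".join(tokens[:-1])
        index[surname].add(int(tokens[-1]))
    return {name: sorted(pages) for name, pages in index.items()}
```

Using a set per surname removes duplicate entries before the page list is sorted.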
Partitions
To facilitate comparison among different approaches, we devised the following partitions:
Old Marriage Records
For the old marriage records, we propose seven partitions, P0 to P6.
Partition | P0 | P1 | P2 | P3 | P4 | P5 | P6 |
---|---|---|---|---|---|---|---|
Pages | 25 | 25 | 25 | 25 | 25 | 25 | 23 |
Registers | 256 | 246 | 246 | 249 | 243 | 255 | 252 |
Lines | 827 | 779 | 786 | 768 | 771 | 773 | 743 |
Running words | 8893 | 8595 | 8802 | 8506 | 8572 | 7799 | 8610 |
OOV (out-of-vocabulary) | 426 | 374 | 368 | 340 | 329 | 373 | 317 |
Lexicon | 1119 | 1096 | 1106 | 1036 | 1046 | 1078 | 1011 |
Characters | 48464 | 46459 | 47902 | 45728 | 46135 | 47529 | 46012 |
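As a consistency check, the per-partition counts in the table add up to the totals stated for the volume (173 pages, 1,747 registers and 5,447 lines). A short sketch of that check, with the numbers copied from the table:

```python
# Per-partition counts copied from the old marriage records table (P0 to P6).
PARTITIONS = {
    "pages": [25, 25, 25, 25, 25, 25, 23],
    "registers": [256, 246, 246, 249, 243, 255, 252],
    "lines": [827, 779, 786, 768, 771, 773, 743],
}
# Totals stated in the description of the volume.
EXPECTED = {"pages": 173, "registers": 1747, "lines": 5447}


def check_totals(parts, expected):
    """Return {row: (sum over partitions, stated total)} for each row."""
    return {key: (sum(parts[key]), expected[key]) for key in expected}


for key, (got, want) in check_totals(PARTITIONS, EXPECTED).items():
    print(f"{key}: {got} (expected {want})", "OK" if got == want else "MISMATCH")
```

All three rows match, so the seven partitions cover the whole volume without overlap in these counts.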
Indices
For the indices, we propose four partitions, P0 to P3.
Partition | P0 | P1 | P2 | P3 |
---|---|---|---|---|
Text lines | 390 | 391 | 391 | 391 |
Words | 1629 | 1640 | 1632 | 1633 |
Characters | 7629 | 7817 | 7554 | 7809 |
OOV (out-of-vocabulary) | 326 | 346 | 298 | 350 |
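This page does not prescribe an evaluation protocol. One common way to use fixed partitions like these, assumed here, is leave-one-partition-out cross-validation: each partition serves once as the test set while the remaining ones are used for training, and results are averaged over the rotations. The sketch also counts the out-of-vocabulary running words of a test set with respect to the training vocabulary. Whether the OOV figures in the tables were computed exactly this way (running words versus distinct words, and against which training set) is an assumption, and the function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def rotation_splits(names: Sequence[str]) -> List[Tuple[List[str], str]]:
    """Leave-one-partition-out: each partition is the test set exactly once,
    and all remaining partitions form the training set."""
    return [([n for n in names if n != test], test) for test in names]


def count_oov(test_words: Sequence[str], train_words: Sequence[str]) -> int:
    """Count running words in the test set that never occur in the training set."""
    vocab: Set[str] = set(train_words)
    return sum(1 for w in test_words if w not in vocab)


# Usage: the four index partitions give four train/test rotations.
for train, test in rotation_splits(["P0", "P1", "P2", "P3"]):
    print(f"test on {test}, train on {', '.join(train)}")
```

The same `rotation_splits` call with `["P0", ..., "P6"]` would yield the seven rotations for the old marriage records.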
Getting the ground truth datasets
The following ground truth datasets may only be used for non-commercial and research purposes. For any other use, please contact us first. If you use this ground truth in your scientific work or publications, please cite it as follows:
- V. Romero, A. Fornés, N. Serrano, J.A. Sánchez, A.H. Toselli, V. Frinken, E. Vidal, J. Lladós. “The ESPOSALLES Database: An Ancient Marriage License Corpus for Off-line Handwriting Recognition”, Pattern Recognition, Volume 46, Issue 6, Pages 1658–1669, 2013. (DOI:http://dx.doi.org/10.1016/j.patcog.2012.11.024).
Item | Size | Description |
---|---|---|
Old Marriage Records | 123 MB | A subset of the database including the full-page images, the segmented lines, the single-line registers, the transcriptions, and the partitions. |
Indices | 50 MB | A subset of the database including the full-page images, the segmented lines, the transcriptions, and the partitions. |
Segmented Lines | 55 MB | A subset of the database that includes the coordinates of the segmented blocks and text lines. |
A newer version of this database was used in an international competition: http://www.cvc.uab.es/5cofm/competition/
If you have any questions or suggestions, please contact Alicia Fornés.