May | 2017 | XARXES


Teaching computers to understand handwritten documents can help bring our past to life, giving value to billions of documents currently stored in thousands of archives and libraries all over Europe. These collections are composed not only of text documents, but also of drawings, pictures, scientific cards, maps and music scores.

In every historical building in Europe there are old documents with nominal information, that is, records that name individuals, from which we can trace the history of our ancestors or the migratory waves in a particular context. As our CVC researchers see it, the goal is to bring a huge amount of information to the surface and thus understand the evolution of European society through its main actors: ourselves.

XARXES is funded by Recercaixa, the social program of La Caixa dedicated to research, and it continues an earlier joint project (EINES) between the CVC and the Centre for Demographic Studies (CED) of the Universitat Autònoma de Barcelona. “By using computer vision as an enabling technology, we are going to include computational methods in order to automatically incorporate contextual and semantic information into the original population sources”, states Dr. Alicia Fornés, fellow researcher at the CVC/UAB. This multidisciplinary project is coordinated by Dr. Fornés (CVC) and Dr. Joana Maria Pujadas (CED).

“Secondly, we will use record linkage techniques to connect different sources so that we can construct historical social networks”, adds Dr. Fornés, “and finally, we will actively involve citizens and archivists, both in the extraction of demographic information through gamification and in the design of new user experiences”. The objective is to facilitate the consumption and dissemination of historical knowledge in an illustrative and pedagogic way. At the same time, “social and historical researchers will benefit from having open databases, something that is still not really common in Spain in comparison with Sweden or the United States”, states Dr. Pujadas.

Information extraction from historical documents

“The concept of information extraction can be explained in the following way: it is reading a document image, understanding the meaning of what is written in it, and filling a database. The idea is that you can automatically extract the knowledge contained in the documents and store it in a database. Consequently, all this information is accessible, ‘searchable’, and available not only for academia, but also open to society”, Dr. Fornés explains.
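The last two steps of that description (understanding what a word means, then filling a database) can be illustrated with a minimal Python sketch. It assumes the recognition step has already produced transcribed words tagged with a semantic label; the labels, the sample words, the `persons` table and both helper functions are hypothetical and are not taken from the XARXES software.

```python
import sqlite3

# Hypothetical output of the recognition step: each transcribed word
# already carries a semantic label (assumption, invented sample data).
labeled_words = [
    ("Joan", "name"),
    ("Ferrer", "surname"),
    ("farmer", "occupation"),
    ("Barcelona", "place"),
]


def words_to_record(words):
    """Group labeled words into one record, joining words that share a label."""
    record = {}
    for text, label in words:
        record[label] = f"{record[label]} {text}" if label in record else text
    return record


def store_record(conn, record):
    """Insert one extracted record into the (hypothetical) persons table."""
    conn.execute(
        "INSERT INTO persons (name, surname, occupation, place) VALUES (?, ?, ?, ?)",
        (
            record.get("name"),
            record.get("surname"),
            record.get("occupation"),
            record.get("place"),
        ),
    )


# An in-memory database stands in for the real storage.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE persons (name TEXT, surname TEXT, occupation TEXT, place TEXT)"
)
store_record(conn, words_to_record(labeled_words))
conn.commit()

# Once stored, the content of the document becomes searchable.
rows = conn.execute(
    "SELECT surname, place FROM persons WHERE occupation = ?", ("farmer",)
).fetchall()
```

The point of the sketch is only that a labeled transcription, unlike a plain one, can be queried by meaning (here, by occupation).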

Europeana estimates that only 10% of the documents stored in archives or museums in Europe are digitized. “Of this 10%, which is already an incredibly huge number, the amount of documents that are transcribed and stored in datasets is really, really ridiculous”. Therefore, instead of using an incredible amount of human resources to read and extract this information, the idea is to use document image analysis techniques to process the information contained in these collections as automatically as possible. The aim is not a mere transcription, but to understand what is being transcribed and thus automatically extract and store the information contained.

Currently, the recognition of text in historical manuscripts is quite challenging, due not only to the physical degradation of paper, but also to the handwriting style (which differs considerably from person to person) and the use of old, regional vocabulary and dialects. “This is the reason we’ve also decided to use word spotting techniques, which are defined as the detection of key words in manuscripts in order to search and index the information contained”, as stated by Dr. Fornés.

How demographic networks are built

A typical application scenario for information extraction is demographic documents, since they contain people’s names, birthplaces, dates, occupations, etc. In this scenario, extracting the key contents and storing them in databases makes those contents accessible and opens the way to innovative services based on genealogical, social or demographic searches.

Once the information is extracted, demographic networks will be set up by means of record linkage and videogames. Record linkage refers to the task of combining data about the same people across different data sources (such as databases, books or websites). For example, one individual may appear in several documents, such as censuses and birth, marriage or death certificates.
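As a rough illustration of record linkage, the Python sketch below pairs records from two sources when the names are similar and the birth years are close. The matching rule, the 0.85 threshold, the field names and the sample records are all assumptions made for this example; the article does not describe the linkage method actually used in the project.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a similarity score between 0 and 1 for two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(source_a, source_b, threshold=0.85, max_year_gap=1):
    """Pair records from two sources that plausibly describe the same person.

    Two records are linked when their birth years differ by at most
    max_year_gap and their names are at least `threshold` similar.
    """
    links = []
    for rec_a in source_a:
        for rec_b in source_b:
            close_years = abs(rec_a["birth_year"] - rec_b["birth_year"]) <= max_year_gap
            score = name_similarity(rec_a["name"], rec_b["name"])
            if close_years and score >= threshold:
                links.append((rec_a["id"], rec_b["id"], round(score, 2)))
    return links


# Invented sample data: the same person spelled differently in two sources.
census = [
    {"id": "census-1", "name": "Joan Ferrer", "birth_year": 1850},
    {"id": "census-2", "name": "Maria Puig", "birth_year": 1862},
]
marriages = [
    {"id": "marriage-1", "name": "Juan Ferrer", "birth_year": 1850},
    {"id": "marriage-2", "name": "Pere Soler", "birth_year": 1849},
]

links = link_records(census, marriages)
```

Here only `census-1` and `marriage-1` are linked, despite the spelling variation in the first name. Comparing every pair of records is fine for a toy example but would not scale to whole archives.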

The second ingredient is videogames. “The idea is to use videogames to motivate people to help in the transcription and information linkage”. Transcription is a key aspect because it is used to teach computers how to read and also to learn that a certain word means something specific (e.g. a surname, a place, etc.). The more we transcribe, the more the computer learns, moving towards an almost automatic transcription. With videogame techniques, the more a person plays, the more the transcription improves. In this way, a mechanical and potentially monotonous process becomes a motivating activity.

Within this project, the municipalities to be linked are located around Barcelona, and they will serve as a first experiment. The same model could then be replicated in the rest of Spain or Europe. The final aim would be to interconnect all the information stored in the local archives spread all over the old continent.

In this last step, the knowledge and expertise of the Centre for Demographic Studies of the UAB are crucial. Demographers will put all this information into context and interpret the demographic networks in order to narrate the centuries of our history stored in our local libraries.


News item available at: http://www.cvc.uab.es/outreach/?p=291

In recent years the so-called “Digital Humanities”, that is, the application of Information Technologies to the humanities and social sciences, have become a key element of the research and knowledge transfer carried out at the UAB. Because the UAB is a generalist university, a large part of the disciplines that fall within the Digital Humanities (DH) are already present on campus in the activities of its groups, research centres and scientific and technical services. The convergence of the Humanities and Social Sciences with technology was a matter of time, and it was achieved years ago. In this sense, the recent creation of the UAB Digital Humanities Network (Xarxa d’Humanitats Digitals de la UAB) and its Sphere (Esfera) simply reflects the will of all these groups and centres to keep collaborating in order to increase the number of projects and groups involved, seeking synergies and establishing new forms of relationship with society and the territory.

Digital Humanities at the UAB span a long list of fields of knowledge, including Art, Philology, History, Philosophy, Geography, Archaeology, Musicology, Anthropology, Education, Translation, Sociology, Communication, Law, and Political and Economic Sciences. Together with departments such as Computer Science and Telecommunications, and with more technology-oriented research centres such as the Computer Vision Center (Centre de Visió per Computador) or the Artificial Intelligence Research Institute (Institut d’Investigació en Intel·ligència Artificial), these fields have become the catalyst for leading research and transfer projects at the national and international level.

This so-called “interdisciplinary discipline” applies new technologies, such as the digitization of texts, images, objects and processes, artificial intelligence, computer vision, data mining, big data, web publication of processes and results, and specialized software and hardware, to research and to the transfer of results in the Humanities and Social Sciences, reinforcing the cross-cutting nature of research and bringing together research groups from the humanities and from technology.

To give some examples, in recent calls of the RecerCaixa programme as many as seven UAB research projects oriented towards DH have been funded. They range from the application of technology to the construction of historical social networks, the use of virtual reality and artificial intelligence to understand life in the Neolithic, and the application of serious games to the transmission of cultural heritage, to new tools for protecting minors on the internet and a tool for building collective knowledge online about the Catalan network society. In all these funded projects, collaboration between research groups from different fields has been crucial.

The international Day of Digital Humanities is therefore an opportunity to share projects and seek synergies at the international level. You can check all the activities and take part in the day at http://dayofdh2017.linhd.es/?lang=es.

Information on the UAB projects funded by the Recercaixa programme and oriented towards the Digital Humanities:

XARXES: Technology and citizen innovation in the construction of historical social networks for understanding the demographic legacy. Alicia Fornés Bisquerra (Department of Computer Science, Computer Vision Center) and Joana Maria Pujadas-Mora (Centre for Demographic Studies).

EINES: Tools and procedures for the large-scale computerization of historical population sources. Josep Lladós Canet, Computer Vision Center (CVC), and Albert Esteve Palós, Centre for Demographic Studies (CED).

Digital reconstruction of the prehistoric past: virtual reality and artificial intelligence for understanding social life in the Neolithic. Raquel Piqué, Department of Prehistory, UAB, and Juan Antonio Rodríguez, CSIC.

ARREL: Application of serious games in collaborative environments for the transmission of the cultural heritage of Catalonia. Josep Maria Macias Solé, Catalan Institute of Classical Archaeology (Institut Català d’Arqueologia Clàssica), and Juan José Ramos González, Department of Telecommunications and Systems Engineering, UAB.

HumanismePlural.com: Research and development of an internet tool for participation in the construction of collective knowledge about the network society in Catalonia. Laboratory of Journalism and Communication for Plural Citizenship (LPCCP), Department of Journalism and Communication Sciences, UAB.

MediaKids: an application to increase the participation and shared responsibility of civil society in preventing the risks generated by ICT. Emma Teodoro and Jorge González, Institute of Law and Technology (IDT-UAB).

PADICAT: Digital Archaeological Heritage of Catalonia. Juan Antonio Barceló, Department of Prehistory, UAB.

The original news item is available at the following link: UABDivulga


XARXES: Connecting The Lives Of Our Ancestors

The application of IT, a key element in humanities and social research at the UAB