Wednesday 24 March 2021
12.30 - 13.45
Building of Demographic Databases II
Elisabeth Engberg :
The National SwedPop-initiative: Merging Five Large Databases into a National Resource for Demographic Research
The long-term Swedish demographic data are unique from an international perspective in terms of detail, quality and coverage, and are an extraordinary asset for studies of demographic, social and economic processes at different levels of society. Since the 1970s, large databases such as the DDB in Umeå and the SEDD in Lund have been constructed and used for numerous studies of the demographic, social and economic processes that transformed population, health and living conditions in the past. Yet the full research potential of the Swedish data has until now been hampered by the diversity of the existing databases. Because they were developed within various research projects, with different coding schemes and database structures, the data are difficult to use for comparative purposes, and datasets from the different databases cannot be linked directly. The SwedPop consortium was established in 2016 with the aim of harmonizing, standardizing and merging the existing databases, and of further increasing the quality of the data by filling strategic gaps in time and space through supplementary digitization of new records. In the SwedPop project, longitudinal microdata in the Umeå and Lund databases will be merged with full-count census data and with population registers from Stockholm and Gothenburg, the two largest cities in Sweden, forming a coherent data infrastructure with harmonized data, a standardized data format and easy access through a common web portal.
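Harmonizing databases "with different coding" usually means mapping each source's local codes onto one shared scheme while keeping track of where each record came from. The sketch below illustrates that idea for a single variable. It is not SwedPop's actual format: the source names (`source_a`, `source_b`), the field names (`id`, `birth_year`, `civst`) and all code values are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical crosswalks: each source codes marital status in its own way.
# These codes are invented for illustration, not taken from DDB or SEDD.
CROSSWALK = {
    "source_a": {"A": "never_married", "B": "married", "C": "widowed"},
    "source_b": {"1": "never_married", "2": "married", "3": "widowed"},
}


@dataclass(frozen=True)
class Person:
    """One record in the (hypothetical) common, standardized format."""
    source: str          # provenance: which database the record came from
    source_id: str       # the record's identifier in its original database
    birth_year: int
    marital_status: str  # value from the shared vocabulary


def harmonize(source, records):
    """Map source-specific records onto the common Person format."""
    mapping = CROSSWALK[source]
    return [
        Person(
            source=source,
            source_id=str(rec["id"]),
            birth_year=int(rec["birth_year"]),
            # Codes missing from the crosswalk are flagged, not guessed.
            marital_status=mapping.get(rec["civst"], "unknown"),
        )
        for rec in records
    ]
```

Merging then amounts to concatenating the harmonized lists, for example `harmonize("source_a", recs_a) + harmonize("source_b", recs_b)`. Keeping `source` and `source_id` on every record means each harmonized record can still be traced back to the original database.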
Daniela Marza, Ioan Bolovan :
Historical Population Database of Transylvania (HPDT) - a Valuable Tool for Family Reconstitution in Transylvania, 1850-1914
Family reconstitution is the process of reconstructing historical data on family membership, the relationships among family members, and family change over time from often incomplete registers of vital events and similar sources. The techniques of family reconstitution are an important part of the toolkit of historical demographers (Encyclopedia of Population, The Gale Group, 2003).
This study aims to show the usefulness of HPDT for family reconstitution. Among the localities included in the database, Ocna Mureș was selected because of its larger and more diverse population.
Ocna Mureș was a small industrial town in Transylvania. Its enterprises included a salt mine and a soda factory. It was also a popular spa during that period, as it had good transport links (a railway line was built in 1872). Ocna Mureș was a multi-denominational and multi-ethnic community: Greek Catholics formed the largest group of its inhabitants, followed closely by Calvinists and Roman Catholics.
The paper aims at a better understanding of the dynamics of relationships within the extended family (including grandparents, brothers, uncles, aunts and cousins) and of alliances between families; one clue in this regard was the recurrence of certain given names across generations. The paper also investigates part of the social networks in the community.
Methodologically, the reconstitution process starts from birth records: couples are reconstituted by combining the father's name with the mother's name given for each child. The information obtained in this way is continuously corroborated with data from the marriage and death records. Each reconstituted couple receives a unique ID, and the possible links between couples are then investigated. Godparents are also considered, as an additional element for identifying families and sometimes as part of them (when they were family members). The research involves some difficulties, the biggest being that the reconstitution must be done manually. In many countries in Europe and around the world, family reconstitution is done automatically by computer; to do this for HPDT, the information would first need to be standardized to allow record linkage. For various reasons, record linkage will not be possible in the near future, so the analysis has to be done manually. In this context, the HPDT data also raise the following issues: the data are often incomplete, and the mother's name is frequently missing from the birth registration altogether or given by her first name only; the same name can be spelled in many ways, which rules out automatic selection; and people with identical names are very numerous, so it is difficult to know to which family each one belongs.
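The first step described above is grouping births by the parents' names and giving each couple an ID. The minimal sketch below shows what an automated version of that step could look like, and why the abstract's caveats matter. It is not the authors' procedure, since they work manually. The record fields (`record_id`, `father`, `mother`) and the normalization rule are assumptions made for illustration.

```python
import unicodedata
from itertools import count


def normalize(name):
    """Crude name normalization (an assumption): strip diacritics, lowercase,
    collapse whitespace. Real spelling variants need far more than this."""
    if not name:
        return ""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_marks.lower().split())


def reconstitute_couples(births):
    """Assign a couple ID to each birth record, keyed on (father, mother).

    Returns (assignments, unresolved): a dict of record_id -> couple ID, and
    the record_ids lacking a parent's name, which are left for manual review.
    """
    couple_ids = {}
    next_id = count(1)
    assignments = {}
    unresolved = []
    for rec in births:
        father = normalize(rec.get("father"))
        mother = normalize(rec.get("mother"))
        if not father or not mother:
            unresolved.append(rec["record_id"])  # incomplete registration
            continue
        key = (father, mother)
        if key not in couple_ids:
            couple_ids[key] = next(next_id)  # new couple gets a fresh unique ID
        assignments[rec["record_id"]] = couple_ids[key]
    return assignments, unresolved
```

Exact matching on normalized names shows the problems the abstract lists. Two different couples with identical names would be merged into one ID. Spelling variants that accent-stripping does not catch would split one couple into two. A record with only the mother's first name would form a separate key. These are the cases where manual corroboration against marriage, death and godparent data remains necessary.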
Joana-Maria Pujadas-Mora, Alícia Fornés, Josep Lladós, Miquel Valls, Gabriel Brea :
Building Individual-level Historical Demographic Databases using Computer Vision Methods based on Deep Learning. The Barcelona Case.
One of the great challenges of historical demography today is integrating handwriting recognition techniques into the collection of data from primary sources, as a way of taking part in the Big Data revolution (Pujadas-Mora et al., 2016). This integration would reduce the time needed to collect data and to process large collections of documents, and would offer ever-increasing arrays of information. Moreover, this process fits with the gradual introduction of information technology into the humanities over recent decades, with the more recent massive digitization campaigns for historical sources, which have become customary, and with the important progress in document image analysis and recognition techniques. In particular, deep learning has been successfully adopted for handwritten text recognition (HTR) and key word spotting (KWS). These techniques are now moving towards ‘Document Understanding’ rather than pure transcription, in order to narrow the semantic gap in interpreting the contents. This is extremely useful for building databases automatically, and demographic databases in particular.
The aim of the paper is to describe the main document image analysis techniques that have been developed for extracting information from handwritten demographic sources. These techniques were used to create two databases. The first is the Barcelona Historical Marriage Database (BHMD), built within the Advanced Grant project ‘Five Centuries of Marriages’ (IP: Cabré, A.), funded by the European Research Council. The second is the Baix Llobregat Demographic Database (BALL), built within the projects ‘Tools and procedures for the large scale digitization of historical sources of population’ (IP: Lladós, J.; Esteve, A.; Pujadas-Mora, J.M.) and ‘Networks. Technology and citizen innovation for building historical social networks to understand the demographic past’ (IP: Fornés, A.; Pujadas-Mora, J.M.), both funded by the RecerCaixa programme – Obra Social “la Caixa”. The BHMD brings together the marriage licenses recorded in the Llibres d’Esposalles of the Diocese of Barcelona (which comprised 250 parishes in 1900) from 1451 to 1905, accounting for more than 600,000 marriages. The BALL database is an ongoing database of individual census data from the Catalan county of Baix Llobregat (Barcelona, Spain). So far, it covers 9 municipalities in the nineteenth and twentieth centuries and gathers more than 220,000 individual observations.
The document analysis techniques specifically applied in these projects are Key Word Spotting and Handwritten Text Recognition. Key Word Spotting turns out to be more suitable when a document does not have a clear internal structure or when the handwriting style is new to the system. Word spotting has been approached through structural, learning-free methods based on graphs, and through statistical, learning-based methods when training data are available. To train the system, we have included a human in the loop through a bimodal crowdsourcing platform. When a document is legible and there are enough training data for a particular handwriting style, a handwriting recognition system can then be properly trained. Once the words are recognized, the next step is to assign them a semantic category in order to create the database.
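The abstract does not detail the graph-based, learning-free word spotting methods used in these projects. As a plainly swapped-in stand-in, the sketch below shows a classic learning-free baseline for query-by-example word spotting. Each binarized word image is reduced to a column ink profile, and candidate words are ranked by their dynamic time warping (DTW) distance to the query. This is not the BHMD/BALL pipeline. The function names and the toy 0/1 images (1 = ink) are hypothetical, and real systems use richer features than a single profile.

```python
def column_profile(image):
    """Fraction of ink pixels in each column of a binary word image (1 = ink)."""
    height = len(image)
    width = len(image[0])
    return [sum(image[r][c] for r in range(height)) / height for c in range(width)]


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    DTW tolerates words written wider or narrower than the query by allowing
    one column to align with several columns of the other sequence.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Divide by n + m so that long and short words are comparable.
    return cost[n][m] / (n + m)


def spot(query, candidates):
    """Return candidate indices ranked from most to least similar to the query."""
    q = column_profile(query)
    scored = [(dtw_distance(q, column_profile(img)), idx) for idx, img in enumerate(candidates)]
    return [idx for _, idx in sorted(scored)]
```

For example, a query whose profile is `[1, 0, 1]` matches a horizontally stretched copy with profile `[1, 1, 0, 0, 1]` at DTW distance 0, so the stretched copy is ranked ahead of a blank image. No training data is needed for this ranking, which is the property that makes learning-free spotting useful for handwriting styles new to the system.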