This paper describes a project designed to explore automatic record linkage strategies for linking historical vital events records into individual and family histories for the whole of Scotland in the period 1855-2000, and to develop a prototype record linkage algorithm specific to the Scottish data. The data were derived from a one-name study which includes all Scottish birth and marriage certificates carrying the Ormiston surname in the period 1855-1940 and all death certificates carrying the same surname in the period 1855-2000. In addition, it includes the death certificates of women born as Ormistons who changed surname at marriage. Scottish civil registration records the full names of both parents, including the mother’s maiden surname, on marriage, birth and death certificates. The date and place of the parents’ marriage are also recorded at birth registration. Thus, there is an abundance of overlapping information that may be used to link deaths to births, marriage records of brides and grooms to their birth and death records, and births to the parental marriages and to the births of siblings.
We have adopted a family-based approach to linking individual events, where a combination of the person’s name, year of birth, names of parents, and name of spouse (where applicable) forms the basis of linkage. Families are reconstructed by linking birth records to the parental marriages using a set of identifiers including the parents’ names and their date of marriage. Taking the combined discriminating power of these identifiers into account gives greater confidence in each link than any single identifier would. The automatic matching was divided into several steps, starting with exact matching. Progressively more relaxed matches were then carried out, and elements of fuzzy matching were employed to handle name variations, errors in dates and ages, and transcription errors. Using the manual family reconstructions as the ‘gold standard’, the accuracy of the automated linkage algorithm was estimated. Sensitivity and positive predictive value (PPV) for birth-death links were 85% and 98% respectively. Sensitivity and PPV for birth-marriage links were 92% and 100% respectively. Sensitivity and PPV for marriage-death links were 75% and 100% respectively. Sensitivity and PPV for birth-parental marriage links were 98% and 100% respectively. Further research will focus on testing the linkage algorithm and refining the linkage strategy using a larger dataset including a name pool.
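The stepwise strategy and its evaluation can be illustrated with a minimal Python sketch. It is not the project's algorithm: the record fields (`forename`, `surname`, `birth_year`, `father`, `mother`), the similarity cut-off of 0.85, the two-year tolerance, and the use of `difflib.SequenceMatcher` as the fuzzy comparator are all assumptions made for illustration. The sketch links death records to birth records in an exact pass followed by a relaxed pass, then computes sensitivity and PPV against a set of gold-standard links.

```python
from difflib import SequenceMatcher

NAME_FIELDS = ("forename", "surname", "father", "mother")  # hypothetical field names


def similar(a, b, threshold=0.85):
    """Fuzzy string comparison; the 0.85 cut-off is an illustrative assumption."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def exact_pass(birth, death):
    """Step 1: every identifier must agree exactly."""
    return all(birth[k] == death[k] for k in NAME_FIELDS + ("birth_year",))


def relaxed_pass(birth, death, year_tolerance=2):
    """Step 2: tolerate spelling variants and small errors in the stated age."""
    names_ok = all(similar(birth[k], death[k]) for k in NAME_FIELDS)
    return names_ok and abs(birth["birth_year"] - death["birth_year"]) <= year_tolerance


def link(births, deaths):
    """Return {death_id: birth_id}, applying the strictest rule first."""
    links = {}
    unmatched = set(deaths)
    for rule in (exact_pass, relaxed_pass):
        for d_id in sorted(unmatched):
            candidates = [
                b_id for b_id in births
                if b_id not in links.values() and rule(births[b_id], deaths[d_id])
            ]
            # Accept a link only when exactly one candidate remains.
            if len(candidates) == 1:
                links[d_id] = candidates[0]
        unmatched -= set(links)
    return links


def sensitivity_ppv(predicted, gold):
    """Sensitivity = TP / gold links; PPV = TP / predicted links."""
    true_positives = len(predicted & gold)
    return true_positives / len(gold), true_positives / len(predicted)
```

Here `birth_year` on a death record is assumed to have been derived beforehand from the year of death and the stated age. Running the strict rule first means that unambiguous links are fixed before the looser rule is allowed to propose candidates, which favours PPV over sensitivity.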