Conditions of Non-Unique Identifiers in Record Linkage Using Japanese Cohort Dataset

Unique identifiers such as name, home address and social security number are commonly used to link different datasets, and their application is well documented. The theoretical basis of the probabilistic algorithm for record linkage is also well defined in the literature. However, few studies have reported applications of the probabilistic algorithm using non-unique identifiers. In this paper, we investigate several variables (weight, height, waist, age, sex, and smoking and alcohol habits) as non-unique identifiers, using a Japanese cohort dataset with a three-year baseline of 1989-1991, to observe how effectively these identifiers can be used and what influence they have on record linkage. We also modify the conditions on these identifiers and estimate the sensitivity, specificity and accuracy for comparison, and we repeat the investigation with an extended ten-year baseline of 1989-1999. We conclude that the combination of age, sex, weight and height gives better sensitivity, specificity and accuracy than other combinations in both men and women with the three-year baseline, whereas the combination of age, sex and height performs better in both men and women with the ten-year baseline.
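The three evaluation measures named above can be illustrated with a minimal sketch. It assumes the standard confusion-matrix definitions applied to candidate record pairs; the abstract does not say exactly how the authors computed them, and the function name `linkage_metrics`, the pair representation, and the toy data are hypothetical.

```python
import math
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # (record id in dataset A, record id in dataset B)


def linkage_metrics(
    candidate_pairs: Set[Pair],
    predicted_links: Set[Pair],
    true_matches: Set[Pair],
) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy over a set of candidate pairs.

    Assumption: every candidate pair is either a true match or a true
    non-match, and the linkage step labels each pair as link or non-link.
    """
    # Ignore any pairs that fall outside the candidate set.
    predicted = predicted_links & candidate_pairs
    actual = true_matches & candidate_pairs

    tp = len(predicted & actual)   # linked, and truly the same person
    fp = len(predicted - actual)   # linked, but different people
    fn = len(actual - predicted)   # same person, but not linked
    tn = len(candidate_pairs) - tp - fp - fn

    def ratio(num: int, den: int) -> float:
        # Undefined ratios are reported as NaN rather than raising.
        return num / den if den else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(candidate_pairs)),
    }


# Toy example: 3 records in each dataset, so 9 candidate pairs.
candidates = {(a, b) for a in range(3) for b in range(3)}
truth = {(0, 0), (1, 1), (2, 2)}
links = {(0, 0), (1, 1), (1, 2)}
metrics = linkage_metrics(candidates, links, truth)
```

In the toy data, two of the three true matches are linked and one non-matching pair is linked by mistake, so the sketch reports a sensitivity of 2/3, a specificity of 5/6 and an accuracy of 7/9.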

Conflicts of Interest

The authors declare no conflicts of interest.

Nakai, M., Nishimura, K. and Miyamoto, Y. (2015) Conditions of Non-Unique Identifiers in Record Linkage Using Japanese Cohort Dataset. Journal of Data Analysis and Information Processing, 3, 103-111. doi: 10.4236/jdaip.2015.34011.


