Overview of Duplicate Researcher Records and ORCID iDs in IS VaVaI

For member institutions, the National ORCID Centre team, part of the National Library of Technology, has prepared an analysis of 2023–2024 data focusing on potentially duplicate researcher records and ORCID iDs in IS VaVaI. The purpose of the analysis is to give institutions concrete supporting data for working with researcher identifiers and to support the review and gradual improvement of data quality.

A typical example is a situation where a researcher appears in the data with the same name and affiliation but two different ORCID iDs. Based on automated processing alone, it is not always possible to determine unambiguously whether this represents one person with multiple ORCID iDs or two different researchers. The prepared dataset therefore works with potential duplicates, which can – and should – be further verified at the institutional level.

The main objective of the analysis was to determine how many researchers are associated with more than one ORCID iD, and whether the same ORCID iD appears for multiple individuals. The analysis also includes other selected types of potential duplicates that may indicate ambiguities or inconsistencies in the data.
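
For institutions that want to run a similar check on their own exports, the two ORCID-related questions above can be sketched roughly as follows. This is only an illustrative sketch, not the method used in the analysis: the `person_key` field (for example a VedIDK, or a name plus institution pair), the function name `find_orcid_conflicts`, and all sample values are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical sample rows: (person_key, orcid). The keys and ORCID
# placeholders are invented for illustration and are not real data.
records = [
    ("p1", "orcid-A"),
    ("p1", "orcid-B"),   # same person, second ORCID iD
    ("p2", "orcid-C"),
    ("p3", "orcid-C"),   # same ORCID iD, different person
    ("p4", "orcid-D"),
    ("p5", None),        # researcher without an ORCID iD
]


def find_orcid_conflicts(rows):
    """Return (persons with several ORCID iDs, ORCID iDs shared by several persons)."""
    orcids_by_person = defaultdict(set)
    persons_by_orcid = defaultdict(set)
    for person_key, orcid in rows:
        if orcid is None:
            continue  # rows without an ORCID iD cannot produce a conflict
        orcids_by_person[person_key].add(orcid)
        persons_by_orcid[orcid].add(person_key)

    # Keep only the groups that contain more than one distinct value.
    multi_orcid = {p: sorted(o) for p, o in orcids_by_person.items() if len(o) > 1}
    shared_orcid = {o: sorted(p) for o, p in persons_by_orcid.items() if len(p) > 1}
    return multi_orcid, shared_orcid


multi_orcid, shared_orcid = find_orcid_conflicts(records)
print(multi_orcid)
print(shared_orcid)
```

Both results are only candidates: as noted above, automated grouping cannot decide whether a hit is a genuine duplicate, so each one still needs verification at the institutional level.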

Basic data overview

  • Total number of researchers in the dataset: 60,404
  • Researchers with an ORCID iD: 31,381 (51.95%)
  • Publishing researchers (at least one result of type J, B, C, or D): 50,260
  • Publishing researchers with an ORCID iD: 29,612 (58.92%)
  • Problematic ORCID iDs (multiple ORCID iDs for one person or the same ORCID iD assigned to multiple creators): 1,854
  • Potentially duplicate researcher records (number of researcher occurrences where some identifiers differ while the ORCID iD or VedIDK is the same, or where the same name and institution are shared): 3,057

The analysis primarily serves as a practical tool for working with data at the level of individual institutions. It allows institutions to filter specific types of potential duplicates, focus only on cases relevant to them, and use the data as a basis for reviewing and gradually improving researcher identifiers. In most institutions, the number of such cases is relatively low; nevertheless, the analysis can help identify recurring ambiguities or systemic data issues.

At the same time, we are preparing a separate overview of duplicates specifically for IS VaVaI, which will require further methodological refinement. In parallel, we plan to monitor the development of these indicators over time, enabling long-term assessment of the impact of working with ORCID iDs and overall data quality.