Organization of the Québec BCG Vaccination Registry
Vaccination certificates for the years 1926 to 1955 inclusive were available only on microfilm. Beginning in the 1960s, clerical staff manually entered the information from vaccination certificates dated 1956 onwards to create listings for consultation purposes. Eight series of alphabetically sorted paper listings were produced, each covering a few calendar years from 1956 to 1992. These amounted to 140 volumes totalling over 60,000 pages (~4.2 million lines of data), which, along with 123 microfilm rolls, constituted the registry’s searchable hard copy. In the paper listings, each individual event was summarized on one line containing surname and given name, father’s given name, birth date, sex, tuberculin reactivity test (type, date, and reaction in millimetres), current vaccination (date and administration mode), date of prior vaccinations, and medical institution code.
Vaccination certificates (microfilms and paper) and paper-based listings of the BCG Vaccination Registry were converted to an electronic format in 2010. Computerization, performed by Trigonix Inc. (Montreal, Canada), comprised graphical imaging of vaccination certificates (microfilms and paper) and transfer of paper listings into a searchable electronic database using optical character recognition. In a first step, each page of the registry’s paper listings was scanned to create PDF (Portable Document Format) files. In a second step, the optical character recognition parameters were optimized separately for each of the eight series of paper listings because their formats differed, thereby allowing their conversion into an electronic database.
Data quality
The quality and accuracy of the electronic version of the Québec BCG Vaccination Registry, hereafter referred to as the electronic BCG Registry, were assessed using two distinct verification samples. Sample size was chosen to yield a feasible number of records to retrieve and review while achieving reasonable statistical precision. First, we determined agreement between paper listings and the electronic BCG Registry to assess the accuracy of the computerization process. Individual records (n = 5,268) from paper listings (1956–1992) were sampled by systematically selecting pages within volumes, and lines within pages, to represent roughly 0.1% of each of the eight series. Second, we determined agreement between vaccination certificates and the electronic BCG Registry to document the accuracy and completeness of the electronic database compared with its archived raw data. Vaccination certificates, stored in archive filing cabinets, are organized by year and by the geographical region where vaccination took place. We systematically selected 4,972 vaccination certificates (~0.1% of the registry), approximately 250 certificates for each of the 20 years of interest (1956–1975), sampled from all available geographical regions.
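The systematic selection of pages within volumes and lines within pages can be illustrated with a minimal sketch. The sampling intervals, starting offsets, and volume sizes below are hypothetical, since the actual values used for each series are not reported here; the function names are ours.

```python
def systematic_indices(n_items, step, start=0):
    """Return every `step`-th index below n_items, beginning at `start`."""
    return list(range(start, n_items, step))


def sample_lines(pages_per_volume, lines_per_page, page_step, line_step):
    """Pick (volume, page, line) triples by systematic selection.

    pages_per_volume: list with the page count of each volume (hypothetical).
    lines_per_page:   assumed constant number of lines per page.
    page_step/line_step: assumed sampling intervals, not the study's values.
    """
    picks = []
    for volume, n_pages in enumerate(pages_per_volume, start=1):
        for page in systematic_indices(n_pages, page_step):
            for line in systematic_indices(lines_per_page, line_step):
                # Report 1-based page and line numbers, as on paper.
                picks.append((volume, page + 1, line + 1))
    return picks


# Two hypothetical volumes of 100 and 50 pages, 70 lines per page.
picks = sample_lines([100, 50], lines_per_page=70, page_step=25, line_step=35)
```

In such a scheme the sampling fraction is roughly 1 / (page_step × line_step), so the intervals would be tuned per series to approach the targeted 0.1%.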
For the two samples, the quality control and verification process was documented in terms of the completeness of information and the presence of discrepancies. Information was compared across several personal nominal identifiers and BCG-related variables. We computed, for each variable, percent agreement and 95% confidence intervals (CIs) among records containing valid information (non-empty fields). The proportion of records with complete agreement on all variables was also calculated. Verification was carried out with FileMaker Pro, version 11.0 (FileMaker Inc., Santa Clara, California), and analyses with SPSS, version 17.0 (SPSS Inc., Chicago, Illinois).
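As an illustration of the per-variable agreement computation, the sketch below calculates percent agreement and a 95% CI among record pairs with non-empty values. It rests on two assumptions: the interval method is not specified in the text, so the Wilson score interval is used here as one common choice, and a pair is treated as valid only when both sources hold a value. The function name and the example values are hypothetical.

```python
import math


def agreement_with_ci(pairs, z=1.96):
    """Percent agreement and Wilson score interval for one variable.

    pairs: iterable of (source_value, electronic_value).
    Pairs with an empty field on either side are excluded (assumption).
    Returns (proportion, lower, upper) on the 0-1 scale.
    """
    valid = [(a, b) for a, b in pairs
             if a not in (None, "") and b not in (None, "")]
    n = len(valid)
    if n == 0:
        raise ValueError("no records with valid information")
    p = sum(1 for a, b in valid if a == b) / n

    # Wilson score interval for a binomial proportion.
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, centre - half, centre + half


# Hypothetical surname comparisons: 9 matches, 1 mismatch, 2 empty fields.
pairs = [("ROY", "ROY")] * 9 + [("ROY", "RQY"), ("", "ROY"), ("ROY", None)]
p, low, high = agreement_with_ci(pairs)
```

The Wilson interval stays within 0 and 1 even when agreement is close to 100%, which is the usual situation for a quality-control sample.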
Linkage with administrative databases
To assess linkage feasibility, we determined the proportion of successful record linkages with provincial demographic and administrative medical databases, overall and per birth year. From the electronic BCG Registry, 3,500 subjects (~250 per birth year) were randomly sampled among 491,861 individuals born from 1961 to 1974 and vaccinated between 1970 and 1974. The nominal information extracted included the child’s surname, the given names of the child and father, sex, and date of birth. Data linkage was performed independently between the electronic BCG Registry and both the Birth Registry, administered by the Institut de la statistique du Québec (ISQ), and the Healthcare Registration File of the Régie de l’assurance maladie du Québec (RAMQ). The Birth Registry provides information on all births and stillbirths occurring in the province, as well as perinatal and parental sociodemographic data retrieved from birth certificates. The RAMQ is the government body responsible for the administration of health care services in Québec. Its Healthcare Registration File includes, for all beneficiaries of the universal public health system, information such as birth date, sex, postal code, and year of death (if applicable).
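Drawing about 250 subjects per birth year amounts to random sampling within birth-year strata. The following is a minimal sketch of that idea, assuming records are dictionaries with a hypothetical `birth_year` field; the seed and the synthetic records are illustrative only.

```python
import random
from collections import defaultdict


def sample_per_birth_year(records, per_year, seed=0):
    """Draw a simple random sample of up to `per_year` records per birth year."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_year = defaultdict(list)
    for rec in records:
        by_year[rec["birth_year"]].append(rec)

    sample = []
    for year in sorted(by_year):
        group = by_year[year]
        # Sample without replacement; take the whole stratum if it is small.
        sample.extend(rng.sample(group, min(per_year, len(group))))
    return sample


# Synthetic population: 300 hypothetical records for each year 1961-1974.
population = [{"id": f"{year}-{i}", "birth_year": year}
              for year in range(1961, 1975) for i in range(300)]
subjects = sample_per_birth_year(population, per_year=250)
```

With 14 birth years and 250 draws per year, this yields the 3,500 subjects described above.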
Probabilistic record linkages between the electronic BCG Registry and the Birth Registry were performed by the Environnement pour la promotion de la santé et du bien-être (EPSEBE) team at ISQ [12] using five basic identifiers (child’s surname and given name, date of birth, sex, initial of father’s given name). Deterministic linkages between the electronic BCG Registry and the Healthcare Registration File were carried out by RAMQ using the same identifiers, except that the father’s full given name was used instead of his initial. For matching purposes, nominal information was independently standardized by ISQ and RAMQ in several ways (e.g., capitalizing all letters, eliminating general identifiers such as “Ms.” or “unknown”, standardizing date formats, removing blank spaces, splitting hyphenated names into several data fields). The entire project, including the transfer of confidential data, was conducted in accordance with the province of Québec’s legal and ethical requirements. Procedures, data access, and ethical issues were assessed and approved by the Commission d’accès à l’information (reference number: 11 02 67 (10 08 48, 09 08 39)), INRS Research Ethics Committee (reference number: CER-09-203), and ISQ (reference number: KB2-Rousseau-pilote, 09–08).
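The standardization steps and the deterministic matching on identifiers can be sketched as follows. This is an illustration only, not the procedure used by ISQ or RAMQ: the field names, the list of generic tokens, the accepted date formats, and the rule of keeping only unique exact matches are all assumptions.

```python
from datetime import datetime

GENERIC_TOKENS = {"MS", "MS.", "UNKNOWN"}  # illustrative list only
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")  # assumed input formats


def standardize_name(raw):
    """Capitalize, drop generic tokens, remove blanks, split on hyphens."""
    tokens = [t for t in raw.upper().split() if t not in GENERIC_TOKENS]
    joined = "".join(tokens)  # blank spaces removed
    return tuple(part for part in joined.split("-") if part)


def standardize_date(raw):
    """Return the date as YYYY-MM-DD, or None if no assumed format fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def link_key(rec):
    """Five identifiers, using the father's full given name."""
    return (standardize_name(rec["surname"]),
            standardize_name(rec["given_name"]),
            standardize_date(rec["birth_date"]),
            rec["sex"].strip().upper(),
            standardize_name(rec["father_given_name"]))


def deterministic_link(registry, target):
    """Map registry ids to target ids where exactly one exact match exists."""
    index = {}
    for rec in target:
        index.setdefault(link_key(rec), []).append(rec["id"])
    links = {}
    for rec in registry:
        key = link_key(rec)
        if key[2] is None:  # skip records whose birth date could not be parsed
            continue
        matches = index.get(key, [])
        if len(matches) == 1:
            links[rec["id"]] = matches[0]
    return links


registry = [{"id": 1, "surname": "Tremblay", "given_name": "Jean-Pierre",
             "birth_date": "1965-03-07", "sex": "m",
             "father_given_name": "Paul"}]
target = [{"id": "A", "surname": " TREMBLAY", "given_name": "JEAN-PIERRE",
           "birth_date": "07/03/1965", "sex": "M",
           "father_given_name": "paul"}]
links = deterministic_link(registry, target)
```

A probabilistic linkage would instead score partial agreement across identifiers and accept pairs above a threshold, which tolerates spelling variants that an exact key comparison rejects.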
Because a typical limitation of using administrative databases for research purposes is the lack of information on potential confounders, we verified whether variables pertinent to investigating hypotheses on a link between BCG vaccination and chronic diseases were available. Useful variables documented in the Birth Registry over the years covered include gestational age, birth weight, number of older siblings, mother’s municipality of residence, as well as parents’ age and birthplace.