Use of name recognition software, census data and multiple imputation to predict missing data on ethnicity: application to cancer registry records

Table 6 Sensitivity, Specificity and Positive Predictive Value of Full Model

Ethnic group	Sensitivity	Specificity	Positive predictive value
White	99.7%	56.0%	98.2%
South Asian	94.7%	99.8%	90.4%
Black	20.4%	99.8%	63.6%
Chinese/Other	21.0%	99.9%	57.6%
Mixed	0%	100%

A multinomial logistic regression model was used to predict ethnic group. The model was developed on a randomly selected 50% sample of the 85,352 cases whose ethnicity was recorded in the HES dataset. The remaining 50% of cases were used to validate the model and to derive the estimates above. The predictors used in the model were: ethnicity derived from name recognition software; Census estimates of the ethnic distribution of the population; number of hospital admissions; year of diagnosis; patient seen outside the NHS (yes/no); screen-detected cancer (yes/no); death-certificate-only cancer registration (yes/no); cancer treatment type (surgery/radiotherapy/chemotherapy); deprivation score; gender; age at diagnosis; cancer site; death during the follow-up period (all-cause and due to primary cancer, separately); and time to death/censoring (Nelson-Aalen cumulative hazard).
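
The three measures reported in the table can be computed for each ethnic group by treating that group as the "positive" class and all other groups as "negative" (one-vs-rest) in the validation sample. The following is a minimal sketch of that calculation, not the authors' code: the function name `per_class_metrics`, the labels and the confusion-matrix counts are all hypothetical and chosen only for illustration. It also shows that the positive predictive value is undefined for a group that the model never predicts, which is one possible reason for an empty cell in a table of this kind.

```python
def per_class_metrics(confusion, labels):
    """One-vs-rest sensitivity, specificity and PPV for each class.

    confusion[i][j] is the number of cases whose true class is labels[i]
    and whose predicted class is labels[j].
    A measure is returned as None when its denominator is zero.
    """
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]                       # true class k, predicted k
        fn = sum(confusion[k]) - tp                # true class k, predicted other
        fp = sum(row[k] for row in confusion) - tp # other class, predicted k
        tn = total - tp - fn - fp                  # other class, predicted other
        metrics[label] = {
            "sensitivity": tp / (tp + fn) if (tp + fn) else None,
            "specificity": tn / (tn + fp) if (tn + fp) else None,
            "ppv": tp / (tp + fp) if (tp + fp) else None,
        }
    return metrics


# Hypothetical counts (rows = true group, columns = predicted group).
LABELS = ["A", "B", "C"]
CONFUSION = [
    [90, 10, 0],
    [5, 15, 0],
    [4, 1, 0],   # group C is never predicted by the model
]

METRICS = per_class_metrics(CONFUSION, LABELS)

if __name__ == "__main__":
    for label in LABELS:
        print(label, METRICS[label])
```

In this illustrative matrix, group C has no predicted cases, so its sensitivity is 0, its specificity is 1 and its positive predictive value cannot be calculated.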

ISSN: 1472-6947