A patient distance metric for neurology

Objective: Neurologists lack a metric for measuring the distance between neurological patients. When neurological signs and symptoms are represented as neurological concepts from a hierarchical ontology and neurological patients are represented as sets of concepts, distances between patients can be represented as inter-set distances. Methods: We converted the neurological signs and symptoms from 721 published neurology cases into sets of concepts with corresponding machine-readable codes. We calculated inter-concept distances based on a hierarchical ontology, and we calculated inter-patient distances by semantic weighted bipartite matching. We evaluated the accuracy of a k-nearest neighbor classifier in allocating patients into 40 diagnostic classes. Results:

The creation of a distance metric for neurological patients based on signs and symptoms is challenging. First, neurological symptoms and neurological signs are recorded in the electronic health record as unstructured free text. Second, examiners use a variety of equivalent terms to represent the same meaning: hyperreflexia is equivalent to increased reflexes; Babinski sign is equivalent to extensor plantar response; and so on. Third, the number of signs and symptoms may vary from patient to patient. Some patients may have as few as one or two signs, while other more complex patients may have as many as 10 or 20 different signs. Fourth, converting unstructured text from electronic health records into machine-readable codes is difficult [7][8].
The SNOMED CT ontology and the UMLS Metathesaurus allow the consolidation of multiple synonymous terms under the same concept [9][10]. Both terminologies assign unique machine-readable codes to a concept. We have identified 1167 core concepts from the UMLS Metathesaurus as a basis for capturing the signs and symptoms of the neurological examination [11].
When signs and symptoms are converted to concepts and represented as machine-readable codes, patients can be instantiated mathematically as a set or as a vector. If sets are used to represent a patient, each sign or symptom is added to the set as a unique element. The cardinality of the set (number of set elements) is equal to the number of signs and symptoms. When a patient is represented as a vector, each sign or symptom can be represented as a dimension of the vector so that the number of dimensions is equal to the number of signs and symptoms. The magnitude of the vector in each dimension indicates whether the finding is present or absent. A variety of distance metrics are available to calculate the distances between vectors and sets, including the Manhattan distance, cosine distance, Euclidean distance, and Mahalanobis distance. When patients are represented as sets of concepts (Figure 1), a commonly used metric is the Jaccard similarity [Equation 1].
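As a concrete sketch of the two representations (in Python, with invented finding names standing in for coded concepts; the patients and findings here are illustrative, not from the dataset), a patient can be held as a set of findings or as a binary vector over the vocabulary of observed findings:

```python
import math

# Two hypothetical patients, each a set of findings (placeholders for coded concepts)
patient_a = {"hyperreflexia", "babinski_sign", "spastic_gait"}
patient_b = {"hyperreflexia", "spastic_gait", "dysarthria"}

# Binary vectors over the union of all findings seen in either patient
vocab = sorted(patient_a | patient_b)
vec_a = [1 if c in patient_a else 0 for c in vocab]
vec_b = [1 if c in patient_b else 0 for c in vocab]

# Vector-based distances mentioned in the text
manhattan = sum(abs(x - y) for x, y in zip(vec_a, vec_b))
euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(vec_a, vec_b)))
dot = sum(x * y for x, y in zip(vec_a, vec_b))
cosine_dist = 1 - dot / (
    math.sqrt(sum(x * x for x in vec_a)) * math.sqrt(sum(y * y for y in vec_b))
)
```

Here the two patients differ in two of four findings, so the Manhattan distance is 2 and the Euclidean distance is the square root of 2; note that none of these vector metrics credits near-miss findings, which motivates the hierarchy-based distances below.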
Equation [1]: J(A, B) = |A ∩ B| / |A ∪ B|
In Equation [1], the Jaccard similarity [12] between Set A and Set B is the intersection of Set A with Set B divided by the union of Set A with Set B. Haase et al. [13] introduced a bipartite graph metric of set similarity.
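In code, the Jaccard similarity of Equation [1] is a short set operation over the patient representation described above (a minimal sketch; the empty-set convention is an assumption):

```python
def jaccard_similarity(a: set, b: set) -> float:
    """Jaccard similarity (Equation 1): |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0  # convention: two empty sets are treated as identical
    return len(a & b) / len(a | b)

def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance: 1 minus the Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)
```

For two patients sharing two of four distinct findings, the similarity is 2/4 = 0.5; two patients with no findings in common score 0.0, which is why exact-match metrics saturate near the maximum distance for sparse clinical sets.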
Equation [2]: sim(A, B) = (1 / |A|) Σ_{a ∈ A} max_{b ∈ B} sim(a, b)
In Equation [2], the sum of the maximum similarity between each element in Set A and an element in Set B is divided by |A|, the number of elements in Set A. Girardi et al. [14] and Mabotuwana [15] have suggested that the accuracy of inter-set distances can be enhanced when inter-concept distances from a concept hierarchy [16][17][18][19][20][21][22][23][24][25][26] are included in the calculation of inter-set distance. Inter-concept measures of distance can be combined with inter-set measures of distance to compute inter-patient distances [27].
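A sketch of the directed set similarity of Equation [2], parameterized by an arbitrary inter-concept similarity function `sim` (here exercised with a simple exact-match similarity for illustration; a hierarchy-based similarity would let partial matches contribute):

```python
def directed_set_similarity(A, B, sim):
    """Equation 2: average over each a in A of its best match max_{b in B} sim(a, b)."""
    if not A or not B:
        return 0.0
    return sum(max(sim(a, b) for b in B) for a in A) / len(A)

# Illustrative inter-concept similarity: 1.0 on exact match, else 0.0.
exact = lambda a, b: 1.0 if a == b else 0.0
```

Note that this measure is directed: averaging over A can give a different value than averaging over B when the two sets differ in size, which is one motivation for the symmetric inter-patient metric used later.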
Currently, no single concept-based inter-set measure of patient distance has proven superior to others.

Distance between signs and symptoms (inter-concept distances).
We calculated distances between signs and symptoms by the method of Wu and Palmer [16], utilizing the subsumption hierarchy in the neuro-ontology [11]. Distances between concepts were normalized with a minimum of 0.0 (closest) and a maximum of 1.0 (most distant).
Equation [3]: dist(a, b) = 1 − (2 × depth(LCS)) / (depth(a) + depth(b))
In Equation [3], dist(a, b) is the semantic distance between concept a and concept b, LCS is the lowest common subsumer in the hierarchical ontology for both a and b, depth(a) is the number of levels from the root concept to concept a, depth(b) is the number of levels from the root concept to concept b, and depth(LCS) is the number of levels from the root concept to the LCS.
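The Wu-Palmer calculation can be sketched over a toy parent-pointer hierarchy (the concept names and tree below are invented for illustration and are not taken from the neuro-ontology):

```python
# Toy subsumption hierarchy: each concept maps to its parent (root maps to None)
PARENT = {
    "neurological finding": None,           # root, depth 0
    "reflex finding": "neurological finding",
    "motor finding": "neurological finding",
    "hyperreflexia": "reflex finding",
    "hyporeflexia": "reflex finding",
    "weakness": "motor finding",
}

def path_to_root(c):
    """Return [c, parent(c), ..., root]."""
    path = []
    while c is not None:
        path.append(c)
        c = PARENT[c]
    return path

def depth(c):
    """Number of levels from the root to c (root has depth 0)."""
    return len(path_to_root(c)) - 1

def lcs(a, b):
    """Lowest common subsumer: first ancestor of b that is also an ancestor of a."""
    ancestors_a = set(path_to_root(a))
    for c in path_to_root(b):
        if c in ancestors_a:
            return c

def wu_palmer_distance(a, b):
    """Equation 3: dist(a, b) = 1 - 2*depth(LCS) / (depth(a) + depth(b))."""
    if a == b:
        return 0.0
    return 1.0 - 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))
```

Two sibling reflex findings share a subsumer one level below the root and so land at distance 0.5, while concepts whose only common subsumer is the root are maximally distant at 1.0.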

Distances between patients (inter-set distances).
We calculated distances between patients by semantic weighted bipartite matching [27].
Equation [4]: distance(A, B) = ( Σ_{a ∈ A} min_{b ∈ B} dist(a, b) + Σ_{b ∈ B} min_{a ∈ A} dist(a, b) ) / ( |A| + |B| )
In Equation [4], distance(A, B) is the distance between Set A and Set B, |A| is the number of elements in Set A, |B| is the number of elements in Set B, and dist(a, b) is the semantic distance between concept a and concept b based on a hierarchical ontology.
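A sketch of a symmetric best-match inter-patient distance, which is one reading of Equation [4]: each finding is paired with its semantically nearest finding in the other patient's set, and the matched distances are averaged over |A| + |B|. The exact matching scheme of the semantic weighted bipartite method in reference [27] may differ; the empty-set convention is an assumption.

```python
def set_distance(A, B, dist):
    """Symmetric best-match inter-set distance (one reading of Equation 4)."""
    if not A or not B:
        return 1.0  # assumption: maximal distance when a patient has no findings
    best_for_a = sum(min(dist(a, b) for b in B) for a in A)
    best_for_b = sum(min(dist(a, b) for a in A) for b in B)
    return (best_for_a + best_for_b) / (len(A) + len(B))

# Illustrative inter-concept distance: 0.0 on exact match, else 1.0.
# Substituting the Wu-Palmer distance would credit near-miss findings.
unit_dist = lambda a, b: 0.0 if a == b else 1.0
```

With the exact-match distance, two patients sharing one of their two findings each come out at distance 0.5; with a hierarchy-based `dist`, related but non-identical findings pull the patients closer, which is the behavior reported in the Results.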

Statistical methods.
We used Prism 8.4.0 for t-tests, ANOVA, scatter plots, and box plots. We used Orange 3.23.0 for k-nearest neighbor classification and multidimensional scaling (MDS).

Stratification of the dataset by diagnosis.
We stratified the dataset of 721 patients into four diagnostic tranches: movement disorders, nerve and muscle disorders, cranial nerve disorders, and brain and spinal cord disorders. Each of the four tranches consisted of 10 diagnoses. Each diagnosis occurred 3 or more times in the tranche (Table 1).

Results
Mean inter-patient distances are lower with semantic weighted bipartite matching than with the Jaccard distance (unpaired t-test, df = 1440, p < .0001). Because exact concept matches between patients are uncommon, the mean inter-patient distance by the Jaccard distance metric is 0.98 ± 0.01, very close to the maximum of 1.00. With semantic weighted bipartite matching, the mean inter-patient distance for the 721 patients drops to 0.59 ± 0.04. The greater dispersion of inter-patient distances with semantic weighted bipartite matching is illustrated by comparing the distances between a single patient with multiple sclerosis and the other 720 patients in the dataset (Figure 2). The histogram illustrates that the semantic weighted bipartite metric results in smaller inter-patient distances and greater dispersion of the distances than the Jaccard distance.
Based on patient diagnosis, we divided the dataset into four diagnostic tranches, each with ten diagnosis classes (Table 1). Each of the 40 diagnoses in the four diagnostic tranches occurred at least 3 times in the dataset. We calculated the mean intra-class patient distance for the 20 diagnostic classes in the brain and spinal cord tranche and the movement disorder tranche (Figure 3; means differ, one-way ANOVA, df = 19, p < 0.0001). We also calculated the mean intra-class patient distance for the 20 diagnoses in the cranial nerve and the nerve and muscle tranches (Figure 4; means differ, one-way ANOVA, df = 19, p < 0.0001).
Mean inter-class distances were calculated for all 40 of the diagnosis classes, shown in Figure 10 (class distances from the class of ALS patients; means differ, one-way ANOVA, df = 9, p < 0.0001) and Figure 11 (class distances from the class of Parkinson disease patients; means differ, one-way ANOVA, df = 9, p < 0.0001). Distance measures also allow the identification of the nearest neighbors to any specific patient in the dataset. Figure 12 shows the 20 nearest neighbors to a patient with amyotrophic lateral sclerosis, and Figure 13 shows the 20 nearest neighbors to a patient with Parkinson disease.
We used a k-nearest neighbor classifier to classify each patient in the four diagnostic tranches by diagnosis (Table 2). Based on the performance of the classifier across a variety of values of k, we selected k = 3 with uniform distance weighting (Figure 9) for all the tranches. F1, accuracy, recall, and precision for each of the four tranches are shown in Table 3 (means differ, one-way ANOVA, df = 3, p < 0.0001).
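The classification step in the study was run in Orange; as a generic sketch of the same idea, a uniform-weight k-nearest-neighbor vote can be computed directly from a row of precomputed inter-patient distances (the distances and diagnosis labels below are hypothetical):

```python
from collections import Counter

def knn_predict(dist_to_labeled, labels, k=3):
    """Uniform-weight k-nearest-neighbor vote (k = 3 as in the text).

    dist_to_labeled[i] is the inter-patient distance from the query patient
    to labeled patient i; the majority diagnosis among the k nearest wins.
    """
    nearest = sorted(range(len(labels)), key=lambda i: dist_to_labeled[i])[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Hypothetical distances from one query patient to six labeled patients
distances = [0.20, 0.30, 0.25, 0.80, 0.90, 0.70]
labels = ["ALS", "ALS", "ALS", "PD", "PD", "PD"]
```

Here the three nearest labeled patients all carry the ALS label, so the query patient is classified as ALS; in a full evaluation this vote would be repeated for every patient under cross-validation.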

Discussion
We have represented neurological patients as sets of signs and symptoms. We have demonstrated a reduction in inter-patient distances when concept similarity based on an ontology hierarchy is considered in calculating inter-patient distances. For the 721 patients in our dataset, the mean inter-patient distance went from 0.98 ± 0.01 to 0.59 ± 0.14 when concept similarity was considered in the distance metric. Furthermore, the dispersion of inter-patient distances is improved when concept similarity is considered (Figure 2). The large reduction in mean inter-patient distances with the semantic weighted bipartite matching metric (0.39) reflects the high frequency of zero similarity between patients in the dataset by Jaccard similarity. For the 519,841 inter-patient comparisons made, 86.5% showed zero Jaccard similarity (i.e., the two patients shared no concepts in common). Mabotuwana et al. [12], in a study comparing the similarity of radiology reports based on a semantic vector, found that when concept similarity was not considered, 77% to 86% of the report comparisons shared no exact concept matches (zero similarity). By considering semantic similarity between concepts based on the SNOMED CT hierarchy, they found a distance gain of approximately 0.10 when using concept similarity to enhance the calculation of inter-report distances.
The problem of stereotypy. Neurologists hold the view that within certain diagnosis classes, patients are more similar to each other, i.e., more stereotyped in presentation, than in other diagnosis classes. Within a diagnosis class of patients, the inter-patient distance can be calculated between each patient and the other patients in the same class (Figures 3 and 4). We suggest that a low intra-class mean inter-patient distance implies greater intra-class similarity and that the presentation is more stereotyped in that class. We also suggest that a large mean inter-patient distance in a diagnosis class corresponds to more variability in presentation between patients in that class. The twenty diagnoses in Figure 3 are from the movement disorder tranche and the brain and spinal cord tranche. The twenty diagnoses in Figure 4 are from the cranial nerve tranche and the muscle and nerve tranche. We suggest that diagnosis classes with smaller intra-class means (essential tremor, hemiballismus, and dystonia in Figure 3; sixth nerve palsy, meralgia paresthetica, and Bell's palsy in Figure 4) are more stereotyped in presentation. On the other hand, we suggest that conditions such as subdural hematoma, multiple sclerosis, lumbar root disease, myelopathy, and polyneuropathy, with large intra-class mean patient distances, are more variable in presentation.
The problem of proximity. Diagnostic proximity is the idea that some diagnoses are similar to certain diagnoses and dissimilar to others; this proximity can be quantified with inter-patient distances. For example, Figure 5 illustrates the proximity of median nerve neuropathy to ulnar neuropathy and the proximity of myasthenia gravis to myopathy. Figure 6 illustrates the proximity of Alzheimer disease to frontotemporal dementia as well as the proximity of amyotrophic lateral sclerosis to myelopathy. Figure 7 illustrates the proximity of Parkinson disease to its close neighbors, progressive supranuclear palsy and striatonigral degeneration.
The problem of misclassification.
Errors in diagnosis (misclassification) are a major challenge to medicine in general and neurology in particular [2][3][4][5][40]. The reasons for diagnostic error among physicians are complex [41]. Although some patients present to physicians with stereotyped patterns of signs and symptoms that lead to a rapid diagnosis through matching to stored patterns, other patients present with signs and symptoms that do not fit a stored pattern and require hypothesis generation and testing. These types of disease presentations pose a greater diagnostic challenge to the physician and a greater risk of diagnostic error [40,41]. We examined the performance of the k-nearest neighbor classifier for 40 different diagnostic conditions in 4 different diagnosis tranches (Table 2). Mean accuracy of diagnosis varied between 76% and 87% in the four tranches (Tables 2 and 3). These accuracies are comparable to the bedside diagnostic accuracy of human experts [3][4][5]. The diagnoses most often misclassified in each of the four tranches were polyneuropathy, B12 deficiency, Wilson disease, striatonigral degeneration, multiple sclerosis, herpes simplex encephalitis, Devic disease, and Ramsay Hunt syndrome.
Factors that may make these diseases more prone to misclassification include overlap with other diagnoses (for example, Ramsay Hunt syndrome overlaps with Bell's palsy, and striatonigral degeneration overlaps with Parkinson disease) and the lack of a highly stereotyped presentation (as with B12 deficiency and multiple sclerosis).

Implications for the neurologist and neurological plausibility.
Any proposed inter-patient distance metric for neurology needs to be neurologically plausible; it should make sense to neurologists who routinely see these types of patients. A distance metric between neurological patients based on signs and symptoms can allow neurologists to think quantitatively about how similar the presentation of one patient is to another, how similar the signs and symptoms of one diagnosis class are to another, and the risk of misclassification for certain diagnosis classes.

Limitations
One limitation of the proposed distance metric is that we did not consider the severity of deficits such as weakness or ataxia. When deficits were present, they were binarized as present or absent and not graded in severity. Another limitation is that some of the diagnosis classes were narrower than others. Although some diagnosis classes were specific (Huntington disease, Alzheimer disease, and Parkinson disease), others were more general, such as polyneuropathy, myopathy, and meningitis. The decision to use more general categories for some diagnosis classes reflects the reality that signs and symptoms alone are unlikely to distinguish specific causes of meningitis, polyneuropathy, or myopathy without additional ancillary testing. Studies are needed to compare the computed inter-patient distances to expert neurologic opinion. The validity of the results would be improved by a larger dataset of patients than the 721 in our sample. We accepted the signs and symptoms from these published studies as the ground truth for distance calculations; studies have shown that when neurologists examine patients, they may disagree as to the ground truth [42]. A further limitation of the study is that we utilized published cases from neurology textbooks rather than de-identified patient records from electronic health records. Furthermore, we abstracted concepts from the case histories manually rather than by natural language processing (NLP), although this process can be automated by NLP methods [43][44][45]. We chose manual abstraction because we wanted to carefully curate a database of test patients with minimal coding errors; our experience with MetaMap [44][45] is that as many as 25% of concepts need post-processing to improve accuracy. In the future, we plan to extend our methods to a larger dataset of patients and utilize signs and symptoms from de-identified patients in electronic health records.
Future advances in NLP could make the conversion of signs and symptoms in EHR text to machine-readable codes more accurate and efficient [46][47][48].
Inter-rater reliability for abstracting clinical cases into UMLS codes or SNOMED CT codes is another concern [6][7]. Finally, we have not compared the semantic weighted bipartite distance metric to other available distance metrics [12,27]. Additional work will need to be done to establish the reliability of the abstraction process and to identify the best metrics for calculating distances between neurological patients.

Conclusions
Neurological signs and symptoms that reside in the electronic health record as unstructured data can be represented as sets of UMLS concepts and stored as UMLS CUI machine-readable codes. Inter-concept distances for signs and symptoms can be calculated based on a concept hierarchy [11]. Using a semantic weighted bipartite matching metric, inter-patient distances can be calculated for neurological cases. When distances for groups of patients in the same diagnosis class are known, inter-patient distances can give important insights into the variability within a given diagnosis class, the proximity between different diagnosis classes, and the risk of misclassification by diagnosis.

Human Studies
The Institutional Review Board of the University of Illinois at Chicago approved this work.

Funding
None to report.

Conflicts of Interest
None to report.

Author contributions.
Research design by DBH. Data collection by SUB, DBH, and JK. Data analysis by all.
Manuscript writing and editing by all.

Figure 9. Performance of the k-nearest neighbor classifier related to k, stratified with 10-fold cross-validation and a uniform distance metric. Accuracy is shown separately by tranche (see Table 1 for abbreviations). Means differ, one-way ANOVA, df = 3, p < 0.0001.