Table 2. Feature ablation study on the Random Forest model. Each feature set is removed in turn, and the resulting change in performance relative to the full model is reported in parentheses

From: Deep learning with sentence embeddings pre-trained on biomedical corpora improves the performance of finding similar sentences in electronic medical records

Model               #features   Validation set    Test set
Full model          14          0.8832            0.8246
- Token-based       5           0.8689 (−1.5%)    0.8129 (−1.2%)
- Character-based   2           0.8655 (−1.8%)    0.8154 (−0.9%)
- Sequence-based    4           0.8697 (−1.4%)    0.8034 (−2.1%)
- Semantic-based    1           0.8704 (−1.3%)    0.8235 (−0.1%)
- Entity-based      2           0.8738 (−0.9%)    0.8150 (−0.9%)
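
The differences in parentheses can be recomputed from the scores in the table. The sketch below is illustrative only and assumes the parenthesised values are absolute changes expressed in percentage points (ablated score minus full-model score, times 100); the table itself does not state this. The names `FULL`, `ABLATED`, and `drop_in_points` are hypothetical. Because the inputs are already rounded to four decimals, a recomputed value can differ from the printed one by 0.1.

```python
# Scores copied from Table 2 as (validation, test) pairs.
FULL = (0.8832, 0.8246)
ABLATED = {
    "Token-based": (0.8689, 0.8129),
    "Character-based": (0.8655, 0.8154),
    "Sequence-based": (0.8697, 0.8034),
    "Semantic-based": (0.8704, 0.8235),
    "Entity-based": (0.8738, 0.8150),
}


def drop_in_points(ablated: float, full: float) -> float:
    """Absolute change in score, in percentage points (assumed convention)."""
    return round((ablated - full) * 100, 1)


# Change on each split after removing one feature set from the full model.
drops = {
    name: (drop_in_points(val, FULL[0]), drop_in_points(test, FULL[1]))
    for name, (val, test) in ABLATED.items()
}

# The feature set whose removal hurts the test score the most.
largest_test_drop = min(drops, key=lambda name: drops[name][1])

for name, (val, test) in drops.items():
    print(f"{name:16s} validation {val:+.1f}  test {test:+.1f}")
print("Largest test-set drop:", largest_test_drop)
```

Under this reading, removing the sequence-based features costs the most on the test set, while removing the single semantic-based feature changes the test score the least.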