| | MIMIC-III Disch | MIMIC-III Rad | Tayside Brain Img |
|---|---:|---:|---:|
| \(\lvert T\rvert\) | 59,652 | 522,279 | 156,618 |
| \(\lvert D\rvert\) | 127,150 | 109,096 | 7,761 |
| \(\lvert D_{weak^+}\rvert\) | 15,598 | 13,907 | 1,137 |
| \(\lvert D_{weak^-}\rvert\) | 74,217 | 65,171 | 2,898 |
| \(\lvert T_{RD}\rvert\) | 37,110 | 73,589 | 7,321 |
| \(\lvert T^{weak}_{RD}\rvert\) | 10,568 | 21,102 | 2,855 |
| \(\lvert T^{ann}\rvert\) | 500 | 1,000 | 5,000 |
| \(\lvert D^{ann}\rvert\) | 1,073 | 198 | 279+4 |
| \(\lvert T^{ann}_{RD}\rvert\) | 312 | 145 | 273 |
- \(|T|\), number of documents; \(|D|\), number of mention-UMLS pairs; \(|D_{weak^+}|\), \(|D_{weak^-}|\), numbers of weakly labelled positive and negative mention-UMLS pairs, respectively; \(|T_{RD}|\), \(|T^{weak}_{RD}|\), numbers of documents associated with one or more rare diseases as detected by SemEHR and by SemEHR+WS (i.e. SemEHR further refined with weak supervision), respectively; \(|T^{ann}|\), \(|D^{ann}|\), \(|T^{ann}_{RD}|\), number of documents sampled, number of mention-UMLS pairs sampled, and number of sampled documents with one or more rare diseases identified by SemEHR, respectively. For the Tayside data, 4 new positive mention-UMLS pairs in \(|D^{ann}|\) were identified from the reports during manual annotation (hence 279+4).
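To make the relationships between the rows easier to read, the following is a minimal illustrative sketch. It is not part of the original study and only recomputes a few ratios from the counts in the table. The names `STATS`, `derived_ratios`, `weak_coverage`, `weak_pos_share` and `rd_doc_ratio` are hypothetical. The coverage ratio assumes that the weakly labelled positive and negative pairs are disjoint subsets of \(D\).

```python
"""Illustrative ratios derived from the dataset statistics table (hypothetical sketch)."""

# Counts copied from the table; the dictionary layout and key names are hypothetical.
STATS = {
    "MIMIC-III Disch": {"D": 127_150, "D_weak_pos": 15_598, "D_weak_neg": 74_217,
                        "T_RD": 37_110, "T_RD_weak": 10_568},
    "MIMIC-III Rad": {"D": 109_096, "D_weak_pos": 13_907, "D_weak_neg": 65_171,
                      "T_RD": 73_589, "T_RD_weak": 21_102},
    "Tayside Brain Img": {"D": 7_761, "D_weak_pos": 1_137, "D_weak_neg": 2_898,
                          "T_RD": 7_321, "T_RD_weak": 2_855},
}


def derived_ratios(s: dict) -> dict:
    """Return simple ratios computed from one column of the table."""
    weak_total = s["D_weak_pos"] + s["D_weak_neg"]
    return {
        # Share of mention-UMLS pairs that received a weak label
        # (assumes the positive and negative sets are disjoint subsets of D).
        "weak_coverage": weak_total / s["D"],
        # Share of weakly labelled pairs that are labelled positive.
        "weak_pos_share": s["D_weak_pos"] / weak_total,
        # Documents flagged with a rare disease by SemEHR+WS,
        # relative to documents flagged by SemEHR alone.
        "rd_doc_ratio": s["T_RD_weak"] / s["T_RD"],
    }


if __name__ == "__main__":
    for name, s in STATS.items():
        r = derived_ratios(s)
        print(f"{name}: " + ", ".join(f"{k}={v:.3f}" for k, v in r.items()))
```

For example, for MIMIC-III discharge summaries about 71% of mention-UMLS pairs receive a weak label. The number of documents flagged by SemEHR+WS is roughly 28% of the number flagged by SemEHR.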