Natural language processing (NLP) tools in extracting biomedical concepts from research articles: a case study on autism spectrum disorder

Table 2 Precision, recall and F1 score of CLAMP, cTAKES, and MetaMap on 20,408 ASD-related PubMed abstracts

Tool	Number of true positives	Number of true entities	Number of predicted entities	Precision	Recall	F1 score
CLAMP unfiltered	96,235	106,284	370,654	0.26	0.91	0.40
CLAMP filtered	89,185	106,284	118,862	0.75	0.84	0.79
cTAKES unfiltered	101,219	106,284	489,520	0.21	0.95	0.34
cTAKES filtered	101,127	106,284	185,966	0.54	0.95	0.69
MetaMap unfiltered	97,992	106,286	1,839,606	0.05	0.92	0.10
MetaMap filtered	92,570	106,286	224,282	0.41	0.87	0.56

The number of true entities is the number of benchmark (BM) ASD terms found in the texts. MetaMap has a slightly different number of true entities (106,286 versus 106,284 for CLAMP and cTAKES) because of the pre-processing applied to the texts in order to run MetaMap. Details on how the statistics were computed can be found in “Methods”.
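As a rough illustration of how the last three columns relate to the first three, the sketch below recomputes precision, recall and F1 from the counts in the table. It assumes the standard definitions (precision = true positives / predicted entities, recall = true positives / true entities, F1 = harmonic mean of the two); the exact procedure is the one described in “Methods”, and the function name `precision_recall_f1` is hypothetical, not taken from the article.

```python
def precision_recall_f1(true_positives, true_entities, predicted_entities):
    """Return (precision, recall, F1) under the standard definitions.

    Assumption: these are the textbook formulas, not necessarily the
    exact computation used by the authors.
    """
    precision = true_positives / predicted_entities if predicted_entities else 0.0
    recall = true_positives / true_entities if true_entities else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts copied from Table 2:
# (true positives, true entities, predicted entities)
table_2_counts = {
    "CLAMP unfiltered": (96_235, 106_284, 370_654),
    "CLAMP filtered": (89_185, 106_284, 118_862),
    "cTAKES unfiltered": (101_219, 106_284, 489_520),
    "cTAKES filtered": (101_127, 106_284, 185_966),
    "MetaMap unfiltered": (97_992, 106_286, 1_839_606),
    "MetaMap filtered": (92_570, 106_286, 224_282),
}

for tool, counts in table_2_counts.items():
    p, r, f1 = precision_recall_f1(*counts)
    print(f"{tool}\t{p:.2f}\t{r:.2f}\t{f1:.2f}")
```

Under these assumed definitions, the recomputed values round to the figures reported in the table, which suggests that filtering raises precision mainly by shrinking the number of predicted entities while leaving the true positives largely intact.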

ISSN: 1472-6947