Natural language processing (NLP) tools in extracting biomedical concepts from research articles: a case study on autism spectrum disorder

Table 1 Precision, recall and F1 scores of CLAMP, cTAKES, and MetaMap on 544 ASD-related full-text PubMed articles

Tool and filtering	Number of true positives	Number of true entities	Number of predicted entities	Precision	Recall	F1 score
CLAMP unfiltered	43,330	48,706	256,525	0.17	0.89	0.28
CLAMP filtered	39,533	48,706	65,037	0.61	0.81	0.70
cTAKES unfiltered	45,579	48,706	337,125	0.14	0.94	0.24
cTAKES filtered	45,509	48,706	103,783	0.44	0.93	0.60
MetaMap unfiltered	47,544	48,804	1,726,985	0.03	0.97	0.05
MetaMap filtered	45,078	48,804	145,926	0.31	0.92	0.46

The number of true entities is the number of benchmark (BM) ASD terms found in the texts. It differs slightly for MetaMap (48,804, versus 48,706 for CLAMP and cTAKES) because the texts had to be pre-processed before MetaMap could be run on them. Details on how the statistics were computed can be found in “Methods”.
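The scores in Table 1 appear consistent with the standard definitions: precision is true positives divided by predicted entities, recall is true positives divided by true entities, and F1 is the harmonic mean of the two. The following is a minimal Python sketch that recomputes the scores from the table's counts under that assumption; the article's actual procedure is the one described in “Methods”. The names `Counts`, `precision_recall_f1`, and `TABLE_1` are illustrative and do not come from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Raw counts for one row of Table 1 (illustrative container)."""
    true_positives: int
    true_entities: int
    predicted_entities: int


def precision_recall_f1(c: Counts) -> tuple[float, float, float]:
    """Standard definitions; assumed, not confirmed, to match the article."""
    precision = c.true_positives / c.predicted_entities
    recall = c.true_positives / c.true_entities
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts copied from Table 1.
TABLE_1 = {
    "CLAMP unfiltered": Counts(43_330, 48_706, 256_525),
    "CLAMP filtered": Counts(39_533, 48_706, 65_037),
    "cTAKES unfiltered": Counts(45_579, 48_706, 337_125),
    "cTAKES filtered": Counts(45_509, 48_706, 103_783),
    "MetaMap unfiltered": Counts(47_544, 48_804, 1_726_985),
    "MetaMap filtered": Counts(45_078, 48_804, 145_926),
}

for name, counts in TABLE_1.items():
    p, r, f1 = precision_recall_f1(counts)
    print(f"{name}\t{p:.2f}\t{r:.2f}\t{f1:.2f}")
```

Rounded to two decimals, these values match the Precision, Recall, and F1 columns of the table. They also make the trade-off explicit: filtering sharply reduces the number of predicted entities, which raises precision at a small cost in recall.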

ISSN: 1472-6947