Preprocessing step for English eligibility criteria sentences | Description |
---|---|
Delete ordinal numbers | Ordinal numbers come in many formats (e.g., “1.”, “①”, “(1)”); we deleted them with regular expressions |
Replace non-ASCII characters | We replaced non-ASCII characters with ASCII equivalents that MetaMap can handle, based on rules |
Lemmatization | Lemmatization groups the different inflected forms of a word so that they can be analyzed as a single item, the word's canonical form (lemma). We performed it with the Python package NLTK |
Replace abbreviations | We replaced abbreviations with their full spellings based on a dictionary |
Delete numbers, operators and units | Numbers, operators and units appear in various formats that sometimes interfere with MetaMap's output, so we deleted them with regular expressions |
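The ordinal-deletion step in the table above can be sketched with a single regular expression. This is a minimal sketch, assuming one criterion per input string and that ordinals appear only at its start; `ORDINAL_PATTERN` and `delete_ordinal` are hypothetical names, not the expressions used in the study.

```python
import re

# Hypothetical pattern covering the three ordinal formats cited in the table:
#   "(1)"         parenthesized digits
#   "1." / "1)"   digits followed by a period or parenthesis (but not "1.5")
#   "①".."⑳"      circled digits, U+2460 to U+2473
ORDINAL_PATTERN = re.compile(
    r"^\s*(?:\(\d+\)|\d+[.)](?!\d)|[\u2460-\u2473])\s*"
)


def delete_ordinal(sentence: str) -> str:
    """Strip a leading list ordinal (and following spaces) from one criterion."""
    return ORDINAL_PATTERN.sub("", sentence, count=1)


if __name__ == "__main__":
    for s in ["1. Age over 18 years", "\u2460 Pregnant women", "(2) History of stroke"]:
        print(delete_ordinal(s))
```

The negative lookahead `(?!\d)` keeps decimals such as "1.5 mg/kg" intact, since a leading number followed by a period is only treated as an ordinal when no digit follows.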
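The rule-based character replacement in the table above might look like the following sketch. The rule table `NON_ASCII_RULES` and the function `to_ascii` are hypothetical illustrations, since the study's actual rules are not listed; characters without an explicit rule fall back to Unicode NFKD decomposition, which strips diacritics, and anything still non-ASCII is dropped.

```python
import unicodedata

# Hypothetical rule table; the actual replacement rules are not given in the text.
NON_ASCII_RULES = {
    "\u2265": ">=",  # greater-than or equal to
    "\u2264": "<=",  # less-than or equal to
    "\u00b5": "u",   # micro sign
    "\u03bc": "u",   # Greek small letter mu
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash (U+2014)
    "\u2019": "'",   # right single quotation mark
    "\u00d7": "x",   # multiplication sign
}


def to_ascii(sentence: str) -> str:
    """Return an ASCII-only version of `sentence` using explicit rules first."""
    out = []
    for ch in sentence:
        if ord(ch) < 128:
            out.append(ch)  # already ASCII, keep as-is
        elif ch in NON_ASCII_RULES:
            out.append(NON_ASCII_RULES[ch])  # explicit rule wins
        else:
            # Fallback: decompose (e.g. "é" -> "e" + combining accent) and
            # discard whatever is still outside ASCII.
            decomposed = unicodedata.normalize("NFKD", ch)
            out.append(decomposed.encode("ascii", "ignore").decode("ascii"))
    return "".join(out)
```

Checking the explicit rules before the NFKD fallback matters for symbols like "≥", which have no ASCII decomposition and would otherwise be silently deleted.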
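Dictionary-based abbreviation replacement, as described in the table above, can be sketched as one compiled alternation over the dictionary keys. The entries in `ABBREVIATIONS` and the function `expand_abbreviations` are hypothetical; the study's dictionary is not given, and this sketch matches case-sensitively.

```python
import re

# Hypothetical abbreviation dictionary for illustration only.
ABBREVIATIONS = {
    "HIV": "human immunodeficiency virus",
    "BMI": "body mass index",
    "MI": "myocardial infarction",
    "COPD": "chronic obstructive pulmonary disease",
}

# Longest keys first so a longer abbreviation is tried before a shorter one;
# \b word boundaries stop matches inside longer tokens (e.g. "MI" in "MIND").
_ABBR_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(ABBREVIATIONS, key=len, reverse=True))
    + r")\b"
)


def expand_abbreviations(sentence: str) -> str:
    """Replace every dictionary abbreviation in `sentence` with its full form."""
    return _ABBR_PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1)], sentence)
```

Usage: `expand_abbreviations("History of MI or COPD")` yields `"History of myocardial infarction or chronic obstructive pulmonary disease"`.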
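The last step in the table above, deleting numbers, operators and units, could be approximated with a few regular expressions followed by whitespace cleanup. This is a sketch under stated assumptions: the patterns, the small unit list, and `delete_number_operator_unit` are hypothetical, and units are only removed when they directly follow a number so that ordinary words are left alone.

```python
import re

# Hypothetical patterns; the study's actual regular expressions are not given.
# A number (integer, decimal, or range such as "18-30"), optionally followed by
# a unit from a small illustrative list. Longer units are listed before their
# prefixes (e.g. "mg/kg" before "mg").
NUMBER_WITH_UNIT = re.compile(
    r"\b\d+(?:\.\d+)?(?:\s*-\s*\d+(?:\.\d+)?)?"
    r"\s*(?:mg/dL|mmol/L|mg/kg|kg/m2|mg|kg|mL|%)?"
    r"(?=\s|$|[,;.)])"
)
# Comparison operators, two-character forms first.
OPERATOR = re.compile(r">=|<=|[<>=+]")
WHITESPACE = re.compile(r"\s+")


def delete_number_operator_unit(sentence: str) -> str:
    """Remove numeric values, their units, and comparison operators."""
    text = NUMBER_WITH_UNIT.sub(" ", sentence)
    text = OPERATOR.sub(" ", text)
    return WHITESPACE.sub(" ", text).strip()


if __name__ == "__main__":
    print(delete_number_operator_unit("Creatinine > 1.5 mg/dL"))
    print(delete_number_operator_unit("BMI 18-30 kg/m2"))
```

The leading `\b` keeps digits embedded in terms such as "HbA1c" from being treated as numbers, which matters when the remaining text is passed to MetaMap for concept mapping.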