Table 3 Performances of five classification algorithms with different feature sets in the test data

From: Using natural language processing methods to classify use status of dietary supplements in clinical notes

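| Type | Features | Decision tree P | Decision tree R | Decision tree F | Random forest P | Random forest R | Random forest F | Naïve Bayes P | Naïve Bayes R | Naïve Bayes F | SVM P | SVM R | SVM F | Maximum Entropy P | Maximum Entropy R | Maximum Entropy F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Type 1 | raw uniᵃ | 0.819 | 0.817 | 0.816 | 0.858 | 0.853 | 0.853 | 0.770 | 0.757 | 0.755 | 0.818 | 0.816 | 0.815 | 0.850 | 0.849 | 0.849 |
| Type 2 | uni | 0.846 | 0.845 | 0.844 | 0.878 | 0.876 | 0.876 | 0.793 | 0.784 | 0.783 | 0.837 | 0.835 | 0.834 | 0.874 | 0.873 | 0.873 |
| Type 3 | tf-idf | 0.862 | 0.857 | 0.857 | 0.862 | 0.857 | 0.857 | 0.763 | 0.704 | 0.701 | 0.844 | 0.839 | 0.839 | 0.840 | 0.831 | 0.831 |
| Type 4 | biᵃ | 0.760 | 0.720 | 0.716 | 0.760 | 0.720 | 0.716 | 0.715 | 0.707 | 0.702 | 0.735 | 0.719 | 0.720 | 0.749 | 0.739 | 0.739 |
| Type 5 | uni + bi | 0.872 | 0.864 | 0.863 | 0.872 | 0.864 | 0.863 | 0.815 | 0.808 | 0.807 | 0.881 | 0.877 | 0.876 | 0.890 | 0.888 | 0.887 |
| Type 6 | uni + bi + triᵃ | 0.863 | 0.852 | 0.850 | 0.863 | 0.852 | 0.850 | 0.815 | 0.808 | 0.808 | 0.880 | 0.876 | 0.875 | 0.887 | 0.883 | 0.882 |
| Type 7 | indiᵃ only | 0.848 | 0.847 | 0.846 | 0.861 | 0.860 | 0.860 | 0.860 | 0.849 | 0.848 | 0.851 | 0.849 | 0.849 | 0.862 | 0.859 | 0.859 |
| Type 8 | uni + bi + indi | 0.860 | 0.860 | 0.860 | 0.875 | 0.865 | 0.864 | 0.813 | 0.803 | 0.801 | 0.899 | 0.897 | 0.897 | 0.895 | 0.903 | 0.902 |
| Type 9 | uni + bi + tri + indi | 0.860 | 0.857 | 0.857 | 0.872 | 0.861 | 0.860 | 0.813 | 0.803 | 0.801 | 0.899 | 0.897 | 0.897 | 0.905 | 0.903 | 0.902 |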
  1. ᵃ uni: unigrams; bi: bigrams; tri: trigrams; indi: indicators
  2. Bolded data represent the largest value
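
To make the feature-set labels in the table concrete, the following is a minimal sketch of how unigram, bigram, and trigram features could be combined into sets such as "uni + bi" or "uni + bi + tri". It is an illustration only, not the authors' pipeline: the helper names `ngrams` and `feature_set`, the whitespace tokenization, and the example sentence are all assumptions, and the indicator ("indi") features are not modeled here.

```python
from typing import List


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Return the contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def feature_set(tokens: List[str], orders: List[int]) -> List[str]:
    """Concatenate n-grams for each requested order (1 = uni, 2 = bi, 3 = tri)."""
    features: List[str] = []
    for n in orders:
        features.extend(ngrams(tokens, n))
    return features


# Hypothetical example sentence, not taken from the study's clinical notes.
tokens = "patient stopped taking fish oil".split()

uni_bi = feature_set(tokens, [1, 2])         # analogous to "uni + bi"
uni_bi_tri = feature_set(tokens, [1, 2, 3])  # analogous to "uni + bi + tri"
```

In this sketch each added n-gram order appends features to the existing set, which matches how the labels for Types 5 and 6 are composed.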