Table 3 Performances of five classification algorithms with different feature sets in the test data

From: Using natural language processing methods to classify use status of dietary supplements in clinical notes

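| Type | Features | Decision tree P | Decision tree R | Decision tree F | Random forest P | Random forest R | Random forest F | Naïve Bayes P | Naïve Bayes R | Naïve Bayes F | SVM P | SVM R | SVM F | Maximum Entropy P | Maximum Entropy R | Maximum Entropy F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Type 1 | raw uniᵃ | 0.819 | 0.817 | 0.816 | 0.858 | 0.853 | 0.853 | 0.770 | 0.757 | 0.755 | 0.818 | 0.816 | 0.815 | 0.850 | 0.849 | 0.849 |
| Type 2 | uni | 0.846 | 0.845 | 0.844 | 0.878 | 0.876 | 0.876 | 0.793 | 0.784 | 0.783 | 0.837 | 0.835 | 0.834 | 0.874 | 0.873 | 0.873 |
| Type 3 | tf-idf | 0.862 | 0.857 | 0.857 | 0.862 | 0.857 | 0.857 | 0.763 | 0.704 | 0.701 | 0.844 | 0.839 | 0.839 | 0.840 | 0.831 | 0.831 |
| Type 4 | biᵃ | 0.760 | 0.720 | 0.716 | 0.760 | 0.720 | 0.716 | 0.715 | 0.707 | 0.702 | 0.735 | 0.719 | 0.720 | 0.749 | 0.739 | 0.739 |
| Type 5 | uni + bi | 0.872 | 0.864 | 0.863 | 0.872 | 0.864 | 0.863 | 0.815 | 0.808 | 0.807 | 0.881 | 0.877 | 0.876 | 0.890 | 0.888 | 0.887 |
| Type 6 | uni + bi + triᵃ | 0.863 | 0.852 | 0.850 | 0.863 | 0.852 | 0.850 | 0.815 | 0.808 | 0.808 | 0.880 | 0.876 | 0.875 | 0.887 | 0.883 | 0.882 |
| Type 7 | indiᵃ only | 0.848 | 0.847 | 0.846 | 0.861 | 0.860 | 0.860 | 0.860 | 0.849 | 0.848 | 0.851 | 0.849 | 0.849 | 0.862 | 0.859 | 0.859 |
| Type 8 | uni + bi + indi | 0.860 | 0.860 | 0.860 | 0.875 | 0.865 | 0.864 | 0.813 | 0.803 | 0.801 | 0.899 | 0.897 | 0.897 | 0.895 | 0.903 | 0.902 |
| Type 9 | uni + bi + tri + indi | 0.860 | 0.857 | 0.857 | 0.872 | 0.861 | 0.860 | 0.813 | 0.803 | 0.801 | 0.899 | 0.897 | 0.897 | 0.905 | 0.903 | 0.902 |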

  1. ᵃ uni: unigrams; bi: bigrams; tri: trigrams; indi: indicators
  2. Bolded data represent the largest value