
Table 1 The statistical characteristics of the dataset

From: A bibliometric analysis of natural language processing in medical research

| Characteristics | Statistics |
| --- | --- |
| Total #pub. | 1405 |
| #pub. with author address information | 1386 |
| #pub. with abstract | 1382 |
| #pub. with author keywords or PubMed MeSH | 1277 |
| #unique publication sources | 324 |
| #unique countries/first countries | 56/45 |
| #unique authors/first authors | 4391/1053 |
| #unique affiliations/first affiliations | 961/514 |
| Average #words; average #characters per word in title | 12.53; 6.50 |
| Average; standard deviation of #characters in title | 95.43; 29.72 |
| Average #words; average #characters per word in abstract | 215.24; 5.62 |
| Average; standard deviation of #characters in abstract | 1456.95; 536.2 |
| Top 10 most frequent words/phrases in author keywords or PubMed MeSH | Electronic health record (363; 25.84%); Data mining (278; 19.79%); Information storage and retrieval (239; 17.01%); Artificial intelligence (179; 12.74%); Female (163; 11.60%); Semantics (156; 11.10%); Male (153; 10.89%); Controlled vocabulary (140; 9.96%); Automatic pattern recognition (127; 9.04%); Medical record system (112; 7.97%) |
| Top 10 most frequent words/phrases extracted from title | Electronic health record (69; 4.91%); Medical record (55; 3.91%); Clinical text (45; 3.20%); Clinical note (41; 2.92%); Patient (37; 2.63%); Text mining (23; 1.64%); Classification (22; 1.57%); Clinical narrative (21; 1.49%); Radiology report (21; 1.49%); Natural language processing method (20; 1.42%) |
| Top 10 most frequent words/phrases extracted from abstract | Patient (322; 22.92%); Precision (217; 15.44%); F-measure (205; 14.59%); Recall (178; 12.67%); Accuracy (164; 11.67%); Electronic health record (161; 11.46%); Natural language processing method (155; 11.03%); Medical record (143; 10.18%); Disease (141; 10.04%); Concept (128; 9.11%) |
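
The table does not state which denominator the percentages use. The reported values are consistent with each count being divided by the total of 1405 publications, rather than by the 1277 publications with keywords or the 1382 with abstracts. The short sketch below checks this for three entries; the function name `share_of_total` and the choice of denominator are assumptions made for illustration, not something the source states.

```python
TOTAL_PUBLICATIONS = 1405  # "Total #pub." in Table 1 (assumed denominator)


def share_of_total(count: int, total: int = TOTAL_PUBLICATIONS) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# (count, reported percentage) pairs copied from Table 1.
examples = {
    "Electronic health record (keywords/MeSH)": (363, 25.84),
    "Electronic health record (title)": (69, 4.91),
    "Patient (abstract)": (322, 22.92),
}

for label, (count, reported) in examples.items():
    computed = share_of_total(count)
    print(f"{label}: computed {computed:.2f}%, reported {reported:.2f}%")
```

Using 1277 as the denominator for the keyword row would instead give about 28.43% for "Electronic health record", which does not match the reported 25.84%.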