Table 1 The statistical characteristics of the dataset

From: A bibliometric analysis of natural language processing in medical research

| Characteristics | Statistics |
| --- | --- |
| Total #pub. | 1405 |
| #pub. with author address information | 1386 |
| #pub. with abstract | 1382 |
| #pub. with author keywords or PubMed MeSH | 1277 |
| #unique publication sources | 324 |
| #unique countries/first countries | 56/45 |
| #unique authors/first authors | 4391/1053 |
| #unique affiliations/first affiliations | 961/514 |
| Average #words/word characters in title | 12.53; 6.50 |
| Average number/standard deviation of characters in title | 95.43; 29.72 |
| Average #words/word characters in abstract | 215.24; 5.62 |
| Average number/standard deviation of characters in abstract | 1456.95; 536.2 |
| Top 10 most frequent words/phrases in author keywords or PubMed MeSH | Electronic health record (363; 25.84%); Data mining (278; 19.79%); Information storage and retrieval (239; 17.01%); Artificial intelligence (179; 12.74%); Female (163; 11.60%); Semantics (156; 11.10%); Male (153; 10.89%); Controlled vocabulary (140; 9.96%); Automatic pattern recognition (127; 9.04%); Medical record system (112; 7.97%) |
| Top 10 most frequent words/phrases extracted from title | Electronic health record (69; 4.91%); Medical record (55; 3.91%); Clinical text (45; 3.20%); Clinical note (41; 2.92%); Patient (37; 2.63%); Text mining (23; 1.64%); Classification (22; 1.57%); Clinical narrative (21; 1.49%); Radiology report (21; 1.49%); Natural language processing method (20; 1.42%) |
| Top 10 most frequent words/phrases extracted from abstract | Patient (322; 22.92%); Precision (217; 15.44%); F-measure (205; 14.59%); Recall (178; 12.67%); Accuracy (164; 11.67%); Electronic health record (161; 11.46%); Natural language processing method (155; 11.03%); Medical record (143; 10.18%); Disease (141; 10.04%); Concept (128; 9.11%) |
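
The table does not state how the percentages in the top-10 rows were calculated. They appear to be consistent with each count divided by the total number of publications (1405), for example 363 / 1405 ≈ 25.84%. The sketch below reproduces a few of the reported values under that assumption. The function name `share_of_total` and the dictionary layout are illustrative only and do not come from the article.

```python
# Assumption: percentages are count / total publications, with the total
# taken from the "Total #pub." row of Table 1.
TOTAL_PUBLICATIONS = 1405


def share_of_total(count, total=TOTAL_PUBLICATIONS):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# A few counts copied from the author keywords / PubMed MeSH row.
keyword_counts = {
    "Electronic health record": 363,
    "Data mining": 278,
    "Information storage and retrieval": 239,
}

for term, count in keyword_counts.items():
    # Mirror the table's "(count; percent%)" notation.
    print(f"{term} ({count}; {share_of_total(count):.2f}%)")
```

The same check can be repeated for the title and abstract rows by swapping in their counts.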