From: A systematic review of speech recognition technology in health care
| Author, year, country, design | Aim | Setting, sample, speech technology (ST) | Outcome measures | Results |
|---|---|---|---|---|
| Al-Aynati and Chorneyko 2003 [18]<br>Canada<br>Experimental | To compare SR software with HT for generating pathology reports | Setting: Surgical pathology<br>Sample: 206 pathology reports<br>ST: IBM ViaVoice Pro version 8 with pathology vocabulary dictionary | 1. Accuracy rate<br>2. Recognition/transcription errors | Accuracy rate (mean %): SR 93.6; HT 99.6<br>Mean recognition errors: SR 6.7; HT 0.4 |
| Mohr et al. 2003 [22]<br>USA<br>Experimental | To compare SR software with HT for clinical notes | Setting: Endocrinology and psychiatry<br>Sample: 2,354 reports<br>ST: Linguistic Technology Systems LTI with clinical notes application | 1. Dictation/recording time + transcription time (minutes) = report turnaround time (RTT) | RTT (min)<br>Endocrinology: SR (recording + transcription) 23.7; HT (dictation + transcription) 25.4; SR 87.3% (CI 83.3, 92.3) as productive as HT<br>Psychiatry, transcriptionists: SR 65.2; HT 38.1; SR 63.3% (CI 54.0, 74.0) as productive as HT<br>Psychiatry, secretaries: SR 36.5; HT 30.5; SR 55.8% (CI 44.6, 68.0) as productive as HT<br>Author, secretary and type of notes were predictors of productivity (p < 0.05) |
| NSLHD 2012 [29]<br>Australia<br>Experimental | To compare accuracy and time between SR software and HT to produce emergency department reports | Setting: Emergency department<br>Sample: 12 reports<br>ST: Nuance Dragon Voice Recognition | 1. RTT<br>2. Errors | RTT, mean (range), in minutes: SR 1.07 (46 sec, 1.32); HT 3.32 (2.45, 4.35)<br>Errors: HT spelling and punctuation errors; SR occasional misplaced words |
| Alapetite 2008 [30]<br>Denmark<br>Non-experimental | To evaluate the impact of background noise (sounds of alarms, aspiration, metal, people talking, scratch, silence, ventilators) and other factors affecting SR accuracy when used in operating rooms | Setting: Simulation laboratory<br>Sample: 3,600 short anaesthesia commands<br>ST: Philips SpeechMagic 5.1.529 SP3 and SpeechMagic InterActive, Danish language, Danish medical dictation adapted by Max Manus | 1. Word recognition rate (WRR) | WRR<br>Microphone: 1 (headset) 83.2%; 2 (handset) 73.9%<br>Recognition mode: command 81.6%; free text 77.1%<br>Background noise: scratch 66.4%; silence 86.8%<br>Gender: male 76.8%; female 80.3% |
| Alapetite et al. 2009 [31]<br>Denmark<br>Non-experimental | To identify physicians' perceptions, attitudes and expectations of SR technology | Setting: Hospital (various clinical settings)<br>Sample: 186 physicians | 1. Users' expectations and experience (predominant response noted) | Overall: Q1 expectation positive 44%; Q1 experience negative 46%<br>Performance: Q8 expectation negative 64%; Q8 experience negative 77%<br>Time: Q14 expectation negative 85%; Q14 experience negative 95%<br>Social influence: Q6 expectation negative 54%; Q6 experience negative 59% |
| Callaway et al. 2002 [20]<br>USA<br>Non-experimental | To compare an off-the-shelf SR software with manual transcription services for radiology reports | Setting: 3 military medical facilities<br>Sample: Facility 1: 2,042 reports; Facility 2: 26,600 reports; Facility 3: 5,109 reports<br>ST: Dragon Medical Professional 4.0 | 1. RTT (referred to as TAT)<br>2. Costs | RTT<br>Facility 1: decreased from 15.7 hours (HT) to 4.7 hours (SR); completed in <8 h: SR 25%, HT 6.8%<br>Facility 2: decreased from 89 hours (HT) to 19 hours (SR)<br>Cost: Facility 2: $42,000 saved; Facility 3: $10,650 saved |
| Derman et al. 2010 [32]<br>Canada<br>Non-experimental | To compare SR with existing methods of data entry for the creation of electronic progress notes | Setting: Mental health hospital<br>Sample: 12 mental health physicians<br>ST: details not provided | 1. Perceived usability<br>2. Perceived time savings<br>3. Perceived impact | Usability: 50% preferred SR<br>Time savings: no significant difference (p = 0.19)<br>Impact: quality of care, no significant difference (p = 0.086); documentation, no significant difference (p = 0.375); workflow, no significant improvement (p = 0.59) |
| Devine et al. 2000 [33]<br>USA<br>Non-experimental | To compare 'out-of-box' performance of 3 continuous SR software packages for the generation of medical reports | Sample: 12 physicians from Veterans Affairs facilities, New England<br>ST: System 1 (S1) IBM ViaVoice98, General Medicine Vocabulary; System 2 (S2) Dragon NaturallySpeaking Medical Suite, v3.0; System 3 (S3) L&H Voice Xpress for Medicine, General Medicine Edition, v1.2 | 1. Recognition errors (mean error rate)<br>2. Dictation time<br>3. Completion time<br>4. Ranking<br>5. Preference | Recognition errors (mean %): S1 7.0–9.1; S3 13.4–15.1; S2 14.1–15.2; S1 best with general English and medical abbreviations<br>Dictation time: no significant difference (p < 0.336)<br>Completion time (mean): S2 12.2 min; S1 14.7 min; S3 16.1 min<br>Ranking: 1. S1; 2. S2; 3. S3 |
| Irwin et al. 2007 [34]<br>USA<br>Non-experimental | To compare SR features and functionality of 4 dental software application systems | Setting: Simulated dental setting<br>Sample: 4 participants (3 students, 1 faculty member)<br>ST: System 1 (S1) Microsoft SR with Dragon NaturallySpeaking; System 2 (S2) Microsoft SR; Systems 3 and 4 (S3, S4) default speech engine | 1. Training time<br>2. Charting time<br>3. Completion<br>4. Ranking | Training time: S1 11 min 8 sec; S2 9 min 1 sec (no data reported for S3 and S4)<br>Charting time: S1 5 min 20 sec; S2 9 min 13 sec (no data reported for S3 and S4)<br>Completion (%): S1 100; S2 93; S3 90; S4 82<br>Ranking: 1. S1 (104/189); 2. S2 (77/189) |
| Kanal et al. 2001 [35]<br>USA<br>Non-experimental | To determine the accuracy of continuous SR for transcribing radiology reports | Setting: Radiology department<br>Sample: 72 radiology reports; 6 participants<br>ST: IBM MedSpeak/Radiology software version 1.1 | 1. Error rates | Error rates (mean ± SD, %)<br>Overall: 10.3 ± 3.3<br>Significant errors: 7.8 ± 3.4<br>Subtle significant errors: 1.2 ± 1.6 |
| Koivikko et al. 2008 [36]<br>Finland<br>Non-experimental | To evaluate the effect of speech recognition on radiology workflow systems over a period of 2 years | Setting: Radiology department<br>Sample: >20,000 reports; 14 radiologists<br>ST: Finnish Radiology Speech Recognition System (Philips Electronics); HT: cassette-based reporting; SR1: SR in 2006; SR2: SR in 2007; Training: 10–15 minutes of training in SR | 1. RTT (referred to as TAT) at 3 collection points: HT 2005 (n = 6,037); SR1 2006 (n = 6,486); SR2 2007 (n = 9,072)<br>2. Reports completed ≤1 hour | RTT (mean ± SD) in minutes: HT 1,486 ± 4,591; SR1 323 ± 1,662; SR2 280 ± 763<br>Reports ≤1 hour (%): HT 26; SR1 58 |
| Langer 2002 [37]<br>USA<br>Non-experimental | To compare the impact of SR on radiologist productivity across 4 workflow systems | Setting: Radiology departments<br>Sample: over 40 radiology sites<br>Systems: System 1: film, report dictated, HT; System 2: film, report dictated, SR; System 3: picture archiving and communication system (PACS) + HT; System 4: PACS + SR | 1. RTT (referred to as TAT)<br>2. Report productivity (RP), number of reports per day | RTT (mean ± SD) in hours / RP<br>System 1: RTT 48.2 ± 50; RP 240<br>System 2: RTT 15.5 ± 93; RP 311<br>System 3: RTT 13.3 ± 119 (t value at 10%); RP 248<br>System 4: RTT 15.7 ± 98 (t value at 10%); RP 310 |
| Singh et al. 2011 [23]<br>USA<br>Non-experimental | To compare accuracy and turnaround times between SR software and a traditional transcription service (TS) when used for generating surgical pathology reports | Setting: Surgical pathology<br>Sample: 5,011 pathology reports<br>ST: VoiceOver (version 4.1); Dragon NaturallySpeaking software (version 10)<br>Phase 0: 3 years prior to SR; Phase 1: first 35 months of SR use, gross descriptions; Phases 2–4: during use of SR for gross descriptions and final diagnosis | 1. RTT (referred to as TAT)<br>2. Reports completed ≤1 day<br>3. Reports completed ≤2 days | RTT in days: Phase 0: 4; Phase 1: 4; Phases 2–4: 3<br>Reports ≤1 day (%): Phase 0: 22; Phase 1: 24; Phases 2–4: 36<br>Reports ≤2 days (%): Phase 0: 54; Phase 1: 60; Phases 2–4: 67 |
| Zick et al. 2001 [38]<br>USA<br>Non-experimental | To compare accuracy and RTT between SR software and a traditional transcription service (TS) when used for recording patients' charts in the ED | Setting: Emergency department<br>Sample: 2 physicians; 47 patients' charts<br>ST: Dragon NaturallySpeaking Medical Suite version 4 | 1. RTT (referred to as TAT)<br>2. Accuracy<br>3. Errors per chart<br>4. Dictation and editing time<br>5. Throughput | RTT (min): SR 3.55; TS 39.6<br>Accuracy, % (mean and range): SR 98.5 (98.2–98.9); TS 99.7 (99.6–99.8)<br>Average errors per chart: SR 2.5 (2–3); TS 1.2 (0.9–1.5)<br>Average dictation time in min (mean and range): SR 3.65 (3.35–3.95); TS 3.77 (3.43–4.10)<br>Throughput (words/minute): SR 54.5 (49.6–59.4); TS 14.1 (11.1–17.2) |