From: A systematic review of speech recognition technology in health care
| Author, year, country, design | Aim | Setting, sample, speech technology (ST) | Outcome measures | Results |
|---|---|---|---|---|
| Al-Aynati and Chorneyko 2003 [18]<br>Canada<br>Experimental | To compare SR software with HT for generating pathology reports | Setting: Surgical pathology<br>Sample: 206 pathology reports<br>ST: IBM ViaVoice Pro version 8 with pathology vocabulary dictionary | 1. Accuracy rate<br>2. Recognition/transcription errors | Accuracy rate (mean %): SR 93.6; HT 99.6<br>Mean recognition errors: SR 6.7; HT 0.4 |
| Mohr et al. 2003 [22]<br>USA<br>Experimental | To compare SR software with HT for clinical notes | Setting: Endocrinology and psychiatry<br>Sample: 2,354 reports<br>ST: Linguistic Technology Systems LTI with clinical notes application | 1. Dictation/recording time + transcription time (minutes) = report turnaround time (RTT) | RTT (min)<br>Endocrinology: SR (recording + transcription) 23.7; HT (dictation + transcription) 25.4; SR 87.3% (CI 83.3, 92.3) as productive as HT<br>Psychiatry, transcriptionists: SR 65.2; HT 38.1; SR 63.3% (CI 54.0, 74.0) as productive as HT<br>Psychiatry, secretaries: SR 36.5; HT 30.5; SR 55.8% (CI 44.6, 68.0) as productive as HT<br>Author, secretary and type of notes were predictors of productivity (p < 0.05) |
| NSLHD 2012 [29]<br>Australia<br>Experimental | To compare accuracy and time between SR software and HT to produce emergency department reports | Setting: Emergency department<br>Sample: 12 reports<br>ST: Nuance Dragon Voice Recognition | 1. RTT<br>2. Errors | RTT, mean (range), in minutes: SR 1.07 (46 sec, 1.32); HT 3.32 (2.45, 4.35)<br>Errors: HT spelling and punctuation errors; SR occasional misplaced words |
| Alapetite 2008 [30]<br>Denmark<br>Non-experimental | To evaluate the impact of background noise (sounds of alarms, aspiration, metal, people talking, scratch, silence, ventilators) and other factors affecting SR accuracy when used in operating rooms | Setting: Simulation laboratory<br>Sample: 3,600 short anaesthesia commands<br>ST: Philips SpeechMagic 5.1.529 SP3 and SpeechMagic InterActive, Danish language, Danish medical dictation adapted by Max Manus | 1. Word recognition rate (WRR) | WRR<br>Microphone: 1 (headset) 83.2%; 2 (handset) 73.9%<br>Recognition mode: command 81.6%; free text 77.1%<br>Background noise: scratch 66.4%; silence 86.8%<br>Gender: male 76.8%; female 80.3% |
| Alapetite et al. 2009 [31]<br>Denmark<br>Non-experimental | To identify physicians' perceptions, attitudes and expectations of SR technology | Setting: Hospital (various clinical settings)<br>Sample: 186 physicians | 1. Users' expectations and experience (predominant response noted) | Overall: Q1 expectation positive 44%; Q1 experience negative 46%<br>Performance: Q8 expectation negative 64%; Q8 experience negative 77%<br>Time: Q14 expectation negative 85%; Q14 experience negative 95%<br>Social influence: Q6 expectation negative 54%; Q6 experience negative 59% |
| Callaway et al. 2002 [20]<br>USA<br>Non-experimental | To compare an off-the-shelf SR software with manual transcription services for radiology reports | Setting: 3 military medical facilities<br>Sample: Facility 1: 2,042 reports; Facility 2: 26,600 reports; Facility 3: 5,109 reports<br>ST: Dragon Medical Professional 4.0 | 1. RTT (referred to as TAT)<br>2. Costs | RTT<br>Facility 1: decreased from 15.7 hours (HT) to 4.7 hours (SR); completed in <8 h: SR 25%, HT 6.8%<br>Facility 2: decreased from 89 hours (HT) to 19 hours (SR)<br>Cost: Facility 2: $42,000 saved; Facility 3: $10,650 saved |
| Derman et al. 2010 [32]<br>Canada<br>Non-experimental | To compare SR with existing methods of data entry for the creation of electronic progress notes | Setting: Mental health hospital<br>Sample: 12 mental health physicians<br>ST: details not provided | 1. Perceived usability<br>2. Perceived time savings<br>3. Perceived impact | Usability: 50% preferred SR<br>Time savings: no significant difference (p = 0.19)<br>Impact: quality of care, no significant difference (p = 0.086); documentation, no significant difference (p = 0.375); workflow, no significant improvement (p = 0.59) |
| Devine et al. 2000 [33]<br>USA<br>Non-experimental | To compare 'out-of-box' performance of 3 continuous SR software packages for the generation of medical reports | Sample: 12 physicians from Veterans Affairs facilities, New England<br>ST: System 1 (S1) IBM ViaVoice98, General Medicine Vocabulary; System 2 (S2) Dragon NaturallySpeaking Medical Suite, v3.0; System 3 (S3) L&H Voice Xpress for Medicine, General Medicine Edition, v1.2 | 1. Recognition errors (mean error rate)<br>2. Dictation time<br>3. Completion time<br>4. Ranking<br>5. Preference | Recognition errors (mean %): S1 7.0–9.1; S3 13.4–15.1; S2 14.1–15.2; S1 best with general English and medical abbreviations<br>Dictation time: no significant difference (p < 0.336)<br>Completion time (mean): S2 12.2 min; S1 14.7 min; S3 16.1 min<br>Ranking: 1. S1; 2. S2; 3. S3 |
| Irwin et al. 2007 [34]<br>USA<br>Non-experimental | To compare SR features and functionality of 4 dental software application systems | Setting: Simulated dental setting<br>Sample: 4 participants (3 students, 1 faculty member)<br>ST: System 1 (S1) Microsoft SR with Dragon NaturallySpeaking; System 2 (S2) Microsoft SR; Systems 3 and 4 (S3, S4) default speech engine | 1. Training time<br>2. Charting time<br>3. Completion<br>4. Ranking | Training time: S1 11 min 8 sec; S2 9 min 1 sec (no data reported for S3 and S4)<br>Charting time: S1 5 min 20 sec; S2 9 min 13 sec (no data reported for S3 and S4)<br>Completion (%): S1 100; S2 93; S3 90; S4 82<br>Ranking: 1. S1 (104/189); 2. S2 (77/189) |
| Kanal et al. 2001 [35]<br>USA<br>Non-experimental | To determine the accuracy of continuous SR for transcribing radiology reports | Setting: Radiology department<br>Sample: 72 radiology reports; 6 participants<br>ST: IBM MedSpeak/Radiology software version 1.1 | 1. Error rates | Error rates (mean ± SD, %)<br>Overall: 10.3 ± 3.3<br>Significant errors: 7.8 ± 3.4<br>Subtle significant errors: 1.2 ± 1.6 |
| Koivikko et al. 2008 [36]<br>Finland<br>Non-experimental | To evaluate the effect of speech recognition on radiology workflow systems over a period of 2 years | Setting: Radiology department<br>Sample: >20,000 reports; 14 radiologists<br>ST: Finnish Radiology Speech Recognition System (Philips Electronics); HT: cassette-based reporting; SR1: SR in 2006; SR2: SR in 2007; Training: 10–15 minutes of training in SR | 1. RTT (referred to as TAT) at 3 collection points: HT 2005 (n = 6,037); SR1 2006 (n = 6,486); SR2 2007 (n = 9,072)<br>2. Reports completed ≤1 hour | RTT (mean ± SD) in minutes: HT 1,486 ± 4,591; SR1 323 ± 1,662; SR2 280 ± 763<br>Reports ≤1 hour (%): HT 26; SR1 58 |
| Langer 2002 [37]<br>USA<br>Non-experimental | To compare the impact of SR on radiologist productivity across 4 workflow systems | Setting: Radiology departments<br>Sample: over 40 radiology sites<br>Systems: System 1: film, report dictated, HT; System 2: film, report dictated, SR; System 3: picture archiving and communication system (PACS) + HT; System 4: PACS + SR | 1. RTT (referred to as TAT)<br>2. Report productivity (RP), number of reports per day | RTT (mean ± SD) in hours / RP<br>System 1: RTT 48.2 ± 50; RP 240<br>System 2: RTT 15.5 ± 93; RP 311<br>System 3: RTT 13.3 ± 119 (t value at 10%); RP 248<br>System 4: RTT 15.7 ± 98 (t value at 10%); RP 310 |
| Singh et al. 2011 [23]<br>USA<br>Non-experimental | To compare accuracy and turnaround times between SR software and a traditional transcription service (TS) when used for generating surgical pathology reports | Setting: Surgical pathology<br>Sample: 5,011 pathology reports<br>ST: VoiceOver (version 4.1); Dragon NaturallySpeaking software (version 10)<br>Phase 0: 3 years prior to SR; Phase 1: first 35 months of SR use, gross descriptions; Phases 2–4: during use of SR for gross descriptions and final diagnosis | 1. RTT (referred to as TAT)<br>2. Reports completed ≤1 day<br>3. Reports completed ≤2 days | RTT in days: Phase 0: 4; Phase 1: 4; Phases 2–4: 3<br>Reports ≤1 day (%): Phase 0: 22; Phase 1: 24; Phases 2–4: 36<br>Reports ≤2 days (%): Phase 0: 54; Phase 1: 60; Phases 2–4: 67 |
| Zick et al. 2001 [38]<br>USA<br>Non-experimental | To compare accuracy and RTT between SR software and a traditional transcription service (TS) when used for recording patients' charts in the ED | Setting: Emergency department<br>Sample: 2 physicians; 47 patients' charts<br>ST: Dragon NaturallySpeaking Medical Suite version 4 | 1. RTT (referred to as TAT)<br>2. Accuracy<br>3. Errors per chart<br>4. Dictation and editing time<br>5. Throughput | RTT (min): SR 3.55; TS 39.6<br>Accuracy, % (mean and range): SR 98.5 (98.2–98.9); TS 99.7 (99.6–99.8)<br>Average errors per chart: SR 2.5 (2–3); TS 1.2 (0.9–1.5)<br>Average dictation time in min (mean and range): SR 3.65 (3.35–3.95); TS 3.77 (3.43–4.10)<br>Throughput (words/minute): SR 54.5 (49.6–59.4); TS 14.1 (11.1–17.2) |