Assessment of quality of routine health information system data and associated factors among departments in public health facilities of Harari region, Ethiopia

Background Despite the improvements in the knowledge and understanding of the role of health information in the global health system, the quality of data generated by routine health information systems is still very poor in low- and middle-income countries. There is a paucity of studies on what determines data quality in health facilities in the study area. Therefore, this study aimed to assess the quality of routine health information system data and associated factors in public health facilities of Harari region, Ethiopia.
Methods A cross-sectional study was conducted in all public health facilities in the Harari region of Ethiopia. The department-level data were collected from the respective department heads through document reviews, interviews, and observation checklists. Descriptive statistics were used to describe data quality, and multivariate logistic regression was run to identify factors influencing data quality. The level of significance was declared at P value < 0.05.
Result The study found good quality data in 51.35% (95% CI 44.6–58.1) of the departments in public health facilities in the Harari Region. Departments found in the health centers were 2.5 times more likely to have good quality data as compared to those found in the health posts. The presence of trained staff able to fill reporting formats (AOR = 2.474; 95% CI 1.124–5.445) and provision of feedback (AOR = 3.083; 95% CI 1.549–6.135) were also significantly associated with data quality.
Conclusion The level of good data quality in the public health facilities was less than the expected national level. Lack of trained personnel able to fill the reporting format and lack of feedback were the factors found to affect data quality. Therefore, training should be provided to increase the knowledge and skills of the health workers. Regular supportive supervision and feedback should also be maintained.
Supplementary Information The online version contains supplementary material available at 10.1186/s12911-021-01651-2.

budget, and one report" policy of Ethiopia, HIS is the core information system [3]. The information revolution is one of the four big agendas of Ethiopia's Health sector transformation plan II (HSTP-II) and it is the phenomenal advancement in the methods and practice of collecting, analyzing, presenting, and disseminating information. Data quality, defined as data's fitness to serve its purpose in a given context in terms of accuracy, completeness, and timeliness [4], is an essential element of this information revolution agenda [5].
Routine health care data have no importance unless they are accurate, processed, and used to inform decisions, and hence responsive to local situations [6]. Improved health system performance is directly linked with the quality and use of routine data in a country's HIS [5,7].
Despite the improvements in the knowledge and understanding of the role of health information in the global health system, the quality of data generated by routine HIS is still very poor in low- and middle-income countries [8]. The quality of data was found to be between 34 and 72% in many African countries [9]. The large volume and variety of data generated in public health facilities are overlooked due to their limited quality [10][11][12][13].
In Ethiopia, data quality problems in routine health information systems affect most indicators [14], and data quality is below the 80% national expectation [15]. The data completeness, accuracy, and timeliness were found to be between 33 and 78% in different parts of the country [4,5,16,17,18,19]. In Addis Ababa, the overall data quality was found to be between 57.9 and 76.22% [20,21] whereas it was 75.3% in Dire Dawa [15].
All functions of the health system and public health policy are seriously reliant on the presence and use of quality HIS data [3,22]. However, lack of quality data and poor usage are affecting the health system's performance and the health of the society. This is evident by frequent over and under stocks of supplies, poor detection and management of outbreaks, and scarcity of human resources at different times [23].
The Performance of Routine Information System Management (PRISM) framework categorizes the factors that influence data quality into three groups: behavioral, technical, and organizational factors [24]. Level of knowledge [25], negligence, data manipulation for competition's sake [26], motivation [27], and sense of responsibility [28] are among the behavioral factors associated with data quality. User-friendliness of the reporting format and standardized indicators [29] are the technical factors affecting data quality, while availability of training [30,31], feedback [15], supervision [32], and data use [33,34] are grouped under the organizational factors of data quality.
Although studies have been conducted on data quality, few have been conducted at the department level in this study area to explore the factors affecting data quality. Moreover, the few studies conducted did not quantify the magnitude of the associations. Therefore, this study aimed to assess the magnitude of the quality of routine health information system data and its determinants among public health facilities.

Study area and study period
The study was conducted in public health facilities of Harari regional State of Ethiopia from July 1 to 15, 2020. Located 518 km to the East of Addis Ababa, Harari Region is one of the ten regional States in Ethiopia with an estimated area of 311.25 km². Based on the 2007 national census conducted by the Central Statistical Agency of Ethiopia (CSA), Harari Region has a total population of 183,415, and has 9 Districts (6 urban and 3 rural) and 36 kebeles (the smallest administrative units in Ethiopia) [35]. There are seven hospitals in the Harari Region of which one is owned by the Harari Regional Health Bureau while the rest are either other governmental or private hospitals. Among these, 2 hospitals are governmental public health facilities. There are also 8 public health centers, 32 health posts, 10 not-for-profit private clinics, and 15 private clinics for profit in the Harari Region.

Study design
A facility-based cross-sectional study design was employed.

Study population
The study populations for this study were all departments that were implementing routine health management information systems (HMIS) in all public health facilities of Harari Regional State.

Inclusion and exclusion criteria
Inclusion criteria
The inclusion criterion was all health departments/units that were implementing the routine health information system in the public health facilities of Harari regional state.

Exclusion criteria
Those departments/units that were closed during the data collection period were excluded from the study.

Sample size determination and sampling procedure
The sample size of the study was determined by using a single population proportion formula:

$$n = \frac{(Z_{\alpha/2})^2 \, P(1 - P)}{d^2}$$

where n = sample size, Zα/2 = standard normal distribution value corresponding to a significance level of alpha (α) of 0.05 = 1.96, P = magnitude of the data quality of routine health information system among departments in public health facilities of Dire Dawa (75.3%) [15], and d = degree of precision = 0.05. Accordingly,

$$n = \frac{(1.96)^2 \times 0.753 \times 0.247}{(0.05)^2} + 10\%\ \text{non-response rate} = 314$$

Since the total number of departments (245) was less than 10,000, the correction formula was used and gave

$$n_f = \frac{314}{1 + (314/245)} = 138$$

However, since the existing departments implementing health information systems were found to be manageable, a census of all (245) departments found in all 42 public health facilities (8 health centers, 32 health posts, and 2 hospitals) was considered.

Data collection instrument
The questionnaire was adapted from the Performance of Routine Information System Management assessment tool version 3.1 (see Additional file 1) [36], and used with minor modifications to collect quantitative data. It comprised four sections. The first section was composed of questions related to socio-demographic characteristics of the department heads such as age, educational status, working experiences, professional category, salary, residence, and others. The second and third sections of the questionnaire included items assessing the technical, organizational, and behavioral factors associated with the quality of routine health information system data respectively. Observations, interviews, and document reviews guided by an observation checklist (fourth section of the questionnaire) were used to collect data on the departments' data quality from all the departments through their respective department heads/representative of each department.

Data collection procedures
Twelve health professionals who had basic data management training and prior experience of data collection and four health professionals who were members of the HIS monitoring team were assigned for the data collection and supervision respectively. Before the data collection, 2 days of training was provided on the purpose of the study, how to collect data, and ethical issues, emphasizing the importance of the safety of the participants and data quality. The data were collected by going to all the health facilities, explaining the aim of the study, ensuring the confidentiality of the data, obtaining written consent from each facility head and participants, observing and interviewing to fill the checklist, and distributing the questionnaire to the department heads to read and fill the rest.
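The sample size arithmetic reported under "Sample size determination and sampling procedure" can be reproduced with a short script. This is only an illustrative sketch: the function names are hypothetical, rounding to the nearest whole number is assumed because it matches the reported figures, and the 10% non-response allowance is applied as a multiplier of 1.10.

```python
def single_proportion_sample_size(p, d, z=1.96, non_response=0.10):
    """Sample size for estimating one proportion, inflated for expected non-response."""
    base = (z ** 2) * p * (1 - p) / (d ** 2)   # about 285.8 for the values in the text
    return base * (1 + non_response)           # add the 10% non-response allowance


def finite_population_correction(n, population):
    """Correction for a small source population: n / (1 + n / N)."""
    return n / (1 + n / population)


# Values taken from the text: P = 75.3%, d = 0.05, 245 departments in total.
n = round(single_proportion_sample_size(p=0.753, d=0.05))
n_f = round(finite_population_correction(n, population=245))
print(n, n_f)  # 314 138
```

Because the corrected sample (138) was smaller than the number of existing departments and a census was feasible, the calculation served as a lower bound rather than the final number of departments approached.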

Study variables
Dependent variable
Data quality was the dependent variable of the study.

Independent variables
The independent variables include:
Organizational factors: training, feedback, supervision, computer, internet, reward, engagement in HIS activities, performance review meeting, and data use.
Technical factors: presence of standard indicators, report formats, and trained person able to fill format.
Behavioral factors: motivation, attitude, data manipulation for competition, negligence, sense of responsibility, knowledge, and data quality checking skills.

Operational definitions
Data quality is an assessment of data's fitness to serve its purpose in a given context in terms of accuracy, completeness and timeliness. For departments reporting on a monthly basis, real-time data of 2 months (December 2019 and January 2020) were selected to assess data quality, while for departments that report on a quarterly basis, the first and third quarters of 2012 EFY (Ethiopian Fiscal Year) were selected.
Poor quality data: data that fail at least one of the three criteria (accuracy < 80%, or completeness < 85%, or timeliness < 85%).
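The classification rule can be written as a small function. The sketch below assumes that "good quality" is simply the complement of the poor-quality definition (all three thresholds met); the function and constant names are hypothetical and are not taken from the PRISM tools.

```python
# Minimum percentages stated in the operational definition.
THRESHOLDS = {"accuracy": 80.0, "completeness": 85.0, "timeliness": 85.0}


def classify_data_quality(accuracy, completeness, timeliness):
    """Return ("good" | "poor", list of failed dimensions) for one department."""
    scores = {
        "accuracy": accuracy,
        "completeness": completeness,
        "timeliness": timeliness,
    }
    # A department is "poor" as soon as any single dimension is below its threshold.
    failed = [name for name, minimum in THRESHOLDS.items() if scores[name] < minimum]
    return ("poor" if failed else "good"), failed


print(classify_data_quality(90.0, 88.0, 100.0))  # ('good', [])
print(classify_data_quality(79.9, 88.0, 100.0))  # ('poor', ['accuracy'])
```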
Completeness is the average of the source document or registration content completeness and report content completeness:

$$\text{Completeness} = \frac{\%\text{ of register content completeness} + \%\text{ of report content completeness}}{2}$$

The data is complete if the average is ≥ 85% [37].
Register content completeness was measured by dividing the number of completely recorded cases (taking the last 15 cases from the registration of the department for the selected month/quarter) by the total cases checked. If the total cases/entries are less than 15, the available cases are considered.

$$\text{Register content completeness (\%)} = \frac{\text{no. of completely recorded cases}}{\text{total cases}} \times 100$$

$$\text{Report content completeness (\%)} = \frac{\text{no. of data elements reported in the report format}}{\text{total number of expected data elements to be reported}} \times 100$$ [36]

For departments that do not keep the report copy with themselves, it was taken from the HMIS unit.
Data accuracy was measured by recounting already reported data elements/indicators from the source document/register and comparing them with those reported in the reporting format. The data elements/indicators for which the verification factor (recounted value from the source document divided by the value reported in the HMIS report) fell between 0.9 and 1.1 were regarded as accurate (have normal verification factor).

$$\text{Accuracy (\%)} = \frac{\text{sum of accurate data elements (recounted over reported between 0.9 and 1.1)}}{\text{total number of data elements checked}} \times 100$$

The department data is considered accurate if the average is ≥ 80% [32].
Timeliness was assessed as a report submission within the accepted time period through observing the reporting date on the reporting form of two randomly selected monthly reports. Departments at the health posts were expected to report from the 20th to the 22nd, while departments at the health centers and hospitals were expected to report to the next level from the 20th to the 24th. The data of the department is timely if the average is ≥ 85% [37].
Knowledge on HIS: It was the knowledge of rationale of routine HIS data that was measured by using the three knowledge-related open-ended questions which have a total raw score of 7 and for which the answers were coded according to the themes on the PRISM assessment user guide [36]. The 50% mean score was used to classify the knowledge as good or poor.

Data quality control
The pre-test of the questionnaire was done on 12 departments found in health facilities outside of the Harari Region to check the ambiguity, consistency, and acceptability of the questionnaire as well as the time needed to fill the questionnaires. The necessary modifications were made before the actual data collection. The quality of data was monitored frequently both in the field and during data entry. This was done in the field through close supervision of the data collectors. All completed questionnaires were examined for completeness and consistency during data collection. Incomplete or unclearly filled questionnaires were given back to the study participants immediately.
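The completeness and accuracy measures given in the operational definitions can be illustrated with a few helper functions. This is a minimal sketch with made-up example values, not the study's data or the PRISM software; the function names are hypothetical, and reported values are assumed to be non-zero so that the verification factor is defined.

```python
def register_completeness(entries):
    """Percent of sampled register entries that are completely recorded.

    `entries` is a list of booleans (True = completely recorded); only the
    last 15 entries are used, or all of them if fewer than 15 exist.
    """
    sample = entries[-15:]
    return 100.0 * sum(sample) / len(sample)


def report_completeness(reported_elements, expected_elements):
    """Percent of expected data elements actually filled in the report."""
    return 100.0 * reported_elements / expected_elements


def overall_completeness(register_pct, report_pct):
    """Average of register and report content completeness."""
    return (register_pct + report_pct) / 2


def accuracy(pairs, low=0.9, high=1.1):
    """Percent of data elements whose verification factor lies in [low, high].

    `pairs` holds (recounted, reported) values; reported is assumed non-zero.
    """
    accurate = sum(1 for recounted, reported in pairs
                   if low <= recounted / reported <= high)
    return 100.0 * accurate / len(pairs)


# Hypothetical department: 12 of the last 15 register entries are complete,
# 93 of 100 expected report elements are filled, 4 elements are recounted.
register_pct = register_completeness([True] * 12 + [False] * 3)   # 80.0
report_pct = report_completeness(93, 100)                         # 93.0
completeness_pct = overall_completeness(register_pct, report_pct) # 86.5
accuracy_pct = accuracy([(100, 100), (95, 100), (120, 100), (50, 60)])  # 50.0
print(completeness_pct >= 85, accuracy_pct >= 80)  # True False
```

In this made-up case the department would count as complete (86.5% ≥ 85%) but not accurate (50% < 80%).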

Data processing and analysis
Data were entered using Epi Data and exported to SPSS software version 25 for data recording, cleaning, and statistical analysis. Descriptive statistics using frequencies, percentages, tables, and figures were used to describe the departments in the public health facilities, and the overall data quality was categorized as poor and good data quality. The figures in this study were free from apparent manipulation. Bivariate logistic regression analysis was done to identify variables that were candidates for multivariate analysis. All variables that had an association on bivariate analysis at a liberal P value of < 0.25 were considered for inclusion in the multivariate analysis. Afterwards, multivariate analysis was done to control the confounding effect of other variables and to identify independent predictors of routine health data quality in the health facilities. The magnitude and direction of the relationship between the variables were expressed as odds ratios (OR) with 95% CI, and a P value < 0.05 was used to declare statistical significance. Model fitness was checked by using Hosmer-Lemeshow's test at P value of > 0.05. A multicollinearity check was also carried out, with a variance inflation factor (VIF) > 10 taken to indicate a multicollinearity problem. However, no multicollinearity problem was detected among the study independent variables.
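For a single binary factor, the bivariate screening step is equivalent to computing a crude odds ratio from a 2 × 2 table. The sketch below shows that calculation with a Wald 95% CI and two-sided P value, plus the VIF definition. The analysis in the study was run in SPSS; the counts used here are hypothetical and the function names are illustrative only.

```python
import math


def crude_odds_ratio(a, b, c, d, z=1.96):
    """Crude OR with Wald CI and two-sided P value from a 2x2 table.

    a: exposed with good quality,   b: exposed with poor quality,
    c: unexposed with good quality, d: unexposed with poor quality.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(log_or / se_log_or) / math.sqrt(2))
    return odds_ratio, lower, upper, p_value


def variance_inflation_factor(r_squared):
    """VIF for a predictor whose regression on the other predictors has this R^2."""
    return 1.0 / (1.0 - r_squared)


# Hypothetical counts, not taken from the study.
odds_ratio, lower, upper, p_value = crude_odds_ratio(a=30, b=10, c=20, d=20)
candidate_for_multivariable_model = p_value < 0.25   # screening rule from the text
print(round(odds_ratio, 2), candidate_for_multivariable_model)  # 3.0 True
```

A VIF above 10 corresponds to an R² above 0.9 when a predictor is regressed on the remaining predictors, which is why that cut-off flags near-redundant variables.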

Description of the departments
Of the 245 departments found in the 42 public health facilities of Harari region, 222 participated in the study, a 91% response rate. Among the 222 departments, 103 (46.39%), 82 (36.94%), and 37 (16.67%) were from the health posts, health centers, and hospitals respectively. Further, 42 (18.9%) maternal and child health, 17 (7.7%) tuberculosis, and 6 (2.7%) anti-retroviral therapy departments participated in the study (Table 1).

Socio-demographic characteristics of the department heads
The mean age of the respondents was 31.32 (± 6.226 SD) years with an average work experience of 8.65 (± 5.517 SD) years. About three quarters (74.3%) were female, more than half (51.8%) resided in urban areas, 64.4% were diploma holders, and 40.1% of the department heads were health extension workers (Table 2).

Organizational factors
In this study, more than three quarters 172 (77.5%) of participants reported to have received supervision, and 33 (14.9%) of them have received refreshment trainings on HIS in the last 6 months (Table 3).

Technical factors
Most of the departments, 183 (82.4%), have standardized indicators, 178 (80.2%) reported that their reporting formats are user-friendly, and more than three quarters, 174 (78.4%), have trained personnel able to fill the reporting formats.

Behavioral factors
This study also revealed that the majority of the department heads, 216 (97.3%), were motivated to do HIS tasks, and more than two thirds (69.4%) had a positive attitude towards HIS activities. On the other hand, 87 (39.2%) reported the presence of data manipulation in their departments, close to one third, 70 (31.5%), reported the presence of negligence, and only 48 (21.6%) had good knowledge of the rationale of routine HIS data (Table 4).

Level of the data quality
Data quality in terms of accuracy
Among the 222 departments for which the data accuracy was checked, 129 (58.1%) had accurate data, and the lowest proportion of accuracy (45.6%) was observed in the health posts (Fig. 1).

Data quality in terms of completeness
Of the 17,589 data elements checked for report content completeness, 16,415 (93%) were completely filled in the reporting format. Among the 5230 cases checked for registration content completeness with the relevant information, more than two thirds (69.6%) were completely recorded in the register. Overall, this study revealed that 89 (40%) of the departments had incomplete data whereas the remaining 133 (60%) had complete data (Fig. 2).

Data quality in terms of timeliness
Our study also revealed that from the studied departments, 94 (91.26%) from the health posts, 82 (100%) from the health centers, and 32 (86.48%) departments from the hospitals submitted their report to the next level according to the national schedule. In general, the majority (93.7%) of the study units submitted their report on time while only 14 (6.3%) did not (Fig. 3).

Overall data quality
Of the total departments assessed, 114 (51.35%; 95% CI 44.6-58.1%) had good quality data. Moreover, more than one third, 40 (38.83%), about two thirds, 54 (65.85%), and more than half (54.05%) of the departments at the health posts, health centers, and hospitals respectively were found to have good quality data (Fig. 4). Among the three data quality dimensions assessed in this study, timeliness of 93.7%, completeness of 60%, and accuracy of 58.1% were observed among departments in the studied facilities (Fig. 5).
[...] person able to fill reporting formats, internet access, refreshment training, supervision and feedback were associated with the data quality. However, the type of facility, presence of a trained person able to fill reporting formats, and feedback were significantly associated with the data quality in both bivariate and multivariate analysis. The departments that were found in the health centers were 2.5 times more likely to have good quality data than the departments found in the health posts (AOR = 2.499; 95% CI 1.059-5.897). The departments that have trained personnel able to fill the formats were 2.5 times more likely to have good quality data as compared to the departments that do not have such a trained person (AOR = 2.474; 95% CI 1.124-5.445). The departments that received feedback were 3 times more likely to have good quality data as compared to the departments that did not (AOR = 3.083; 95% CI 1.549-6.135) (Table 5).
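The reported adjusted odds ratios can be sanity-checked from their confidence limits, because a Wald interval from logistic regression is symmetric on the log scale. The sketch below assumes the intervals are Wald intervals with z = 1.96 (the text does not state the interval method) and recovers the implied regression coefficient and standard error; the dictionary labels are shorthand, not variable names from the study.

```python
import math

Z = 1.96  # assumed critical value for a 95% Wald interval

# (AOR, lower limit, upper limit) as reported in the text.
reported = {
    "health center vs health post": (2.499, 1.059, 5.897),
    "trained person able to fill formats": (2.474, 1.124, 5.445),
    "feedback received": (3.083, 1.549, 6.135),
}


def coefficient_and_se(lower, upper, z=Z):
    """Back-calculate the log-odds coefficient and its SE from a Wald CI."""
    beta = (math.log(lower) + math.log(upper)) / 2      # midpoint on the log scale
    se = (math.log(upper) - math.log(lower)) / (2 * z)  # half-width divided by z
    return beta, se


implied = {}
for name, (aor, lower, upper) in reported.items():
    beta, se = coefficient_and_se(lower, upper)
    implied[name] = (math.exp(beta), se)
    # exp(beta) should reproduce the reported AOR if the interval is a Wald CI.
    print(f"{name}: reported AOR {aor}, implied AOR {math.exp(beta):.3f}, SE {se:.3f}")
```

For all three factors the implied odds ratio agrees with the reported AOR to the third decimal place, which is consistent with the intervals having been computed on the log-odds scale.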

Discussion
This study provides an insight into the various technical, behavioral, and organizational factors that influence the quality of routine health data. In this study, 129 (58.1%) departments had accurate data, which was lower than the accuracy reported from Hadiya zone, Southern Ethiopia, where seventy six percent of the departments at the health centers had accurate data [17], and the 79% reported in Nigeria [38]. The difference might be because of the difference in the type of facilities and in the level of feedback provided to the departments: 95.8% of the departments in Hadiya zone [17] received feedback compared with 61.7% of the departments in this study. Also, the interval of the verification factor used to measure data accuracy in Nigeria was wider (0.85-1.15) [38] than the interval used in this study (0.9-1.1). Generally, data accuracy may be affected by errors that occur during data entry, intentional manipulation of the data for different reasons (possibly competition among staff and facilities), false reports to increase achievement, and reports not made on time. The study conducted in Tanzania supports some of these explanations; for example, data manipulation can affect the accuracy of data [39].
In this study, the 69.6% registration (source document) content completeness was lower than the 93% report content completeness. This is supported by the recently published study which was conducted in East Wollega where the 78.2% registration content completeness was less than the 86% report content completeness indicating that the health workers focus more on managing patients rather than recording data due to the work load and lack of commitment to the data [40].
The 93.7 percent timeliness of the data revealed in this study was close to the one reported in the data quality review conducted by the Ethiopian public health institute, which was 100% data timeliness in Harari Region [18], but higher than the timeliness reported from other parts of Ethiopia: 70% in East Wollega and 89% in West Wollega [41]. The easy accessibility of the health facilities in our study area is the possible explanation for the difference observed.
The result of the study revealed that more than half (51.35%) of the departments implementing the routine health information system have good levels of data quality. This is similar to the findings from many developing countries, where the data quality falls between 34 and 72% [9]. However, it is lower than the results from the studies conducted in Dire Dawa and Addis Ababa, which reported 75.3% [15] and 76.22% [21] levels of good quality data respectively. This might be because of the difference in the way the dimensions of the data quality were measured. The study conducted in Dire Dawa measured completeness in terms of the report completeness only, while in this study completeness was measured in terms of both the registration content completeness and report content completeness. The difference might also be attributed to the effect of Corona Virus Disease (COVID-19) on the health information system performance, including data quality, because this study was conducted while COVID-19 was seriously challenging the health system in general.
The departments that were found in the health centers were 2.5 times more likely to have good quality data compared to those found in the health posts. This is supported by the findings from the pioneering regions of Ethiopia, in which the data quality was better at the health centers and hospitals than at the health posts [42]. The low level of education among the staff at the health posts (all are diploma holders and below), the larger amount of data collected by a limited number of health extension workers, and the lack of HMIS personnel who closely monitor the data quality, as compared to the health centers and hospitals, are the possible reasons for the variation. It might also be due to the greater attention given by the government and other stakeholders such as the Capacity Building and Mentorship Program (CBMP), a program supporting the health centers and hospitals through HMIS trainings and onsite mentorships.
This study found that the presence of trained personnel able to fill the reporting formats and the provision of feedback were significantly associated with good quality data. This is supported by the study conducted in Dire Dawa, where the presence of trained staff and feedback were significantly associated with the data quality, with AOR = 2.25; 95% CI 1.082-4.692 and AOR = 2.48; 95% CI 1.262-4.846 respectively [15]. A recent scoping review also showed that the combination of feedback with other capacity building activities contributes to data quality improvement [43]. Training can clarify HIS related activities and tools and increases familiarity with HIS tools such as registers, reporting formats, and information communication technology software.
Although supportive supervision showed an association with the data quality on bivariate logistic regression, it was not significantly associated with the data quality on multivariate logistic regression in this study. This was different from the finding of the study conducted in Gurage Zone, in which supervision was associated with the community health information system performance (data quality and use) [32]. The difference might be attributed to the quality of supervision, as noted from the study in Tanzania [25]. The other possible justification is that in most practical cases, supervision is used just to find fault rather than being supportive. However, it is targeted supportive supervision that helps the departments to fill their gaps in data recording, processing, analyzing, reporting, and data quality checking.
The limitation of this study was its inability to show the consistency between the data in the routine health information system and that same data in the real-world since the study addressed only the three dimensions of data quality. Future studies should incorporate qualitative studies to have a deeper insight on the behavioral factors that influence data quality.

Conclusion
The level of good data quality among the departments in the public health facilities of Harari region was less than the 80% expected national level. The refreshment training given to the staff was found to be low. The type of facility, lack of trained personnel able to fill the reporting formats, and feedback were the factors that were significantly associated with the data quality. Continuous refreshment in-service HMIS related training should be arranged and provided by the Harari Regional health bureau and other stakeholders to increase the knowledge and skills of the health workers. Supervisors at different levels of the Harari region, particularly woreda health offices, should also provide supportive supervision focusing on the data quality and provide feedback to the departments regularly.
Abbreviations HIS: Health information system; HMIS: Health management information system; PRISM: Performance of Routine Information System Management; WHO: World Health Organization.