Handheld computers for self-administered sensitive data collection: A comparative study in Peru
BMC Medical Informatics and Decision Making volume 8, Article number: 11 (2008)
Low-cost handheld computers (PDA) potentially represent an efficient tool for collecting sensitive data in surveys. The goal of this study is to evaluate the quality of sexual behavior data collected with handheld computers in comparison with paper-based questionnaires.
A PDA-based program for data collection was developed using Open-Source tools. In two cross-sectional studies, we compared data concerning sexual behavior collected with paper forms to data collected with PDA-based forms in Ancon (Lima).
The first study enrolled 200 participants (18–29 years). General agreement between data collected with paper format and handheld computers was 86%. Categorical variables agreement was between 70.5% and 98.5% (Kappa: 0.43–0.86) while numeric variables agreement was between 57.1% and 79.8% (Spearman: 0.76–0.95). Agreement and correlation were higher in those who had completed at least high school than those with less education. The second study enrolled 198 participants. Rates of responses to sensitive questions were similar between both kinds of questionnaires. However, the number of inconsistencies (p = 0.0001) and missing values (p = 0.001) were significantly higher in paper questionnaires.
This study showed the value of the use of handheld computers for collecting sensitive data, since a high level of agreement between paper and PDA responses was reached. In addition, a lower number of inconsistencies and missing values were found with the PDA-based system. This study has demonstrated that it is feasible to develop a low-cost application for handheld computers, and that PDAs are feasible alternatives for collecting field data in a developing country.
In the last 20 years, different methodologies have appeared to improve the quality of data collected on sensitive topics. Sexual behavior is largely determined by social, cultural, religious, moral, and legal norms and constraints. In addition, a complete evaluation of sexual behavior includes knowledge, attitudes, risk behaviors and more, all of which are very difficult to evaluate because individuals tend to deny involvement in socially undesirable behaviors to avoid stigmatization. Social desirability or self-presentation to the interviewer can affect reports about sexual behaviors as well as other sensitive behaviors. This might change the analysis of non-response items.
Systematic reviews of research in sexual behavior have been published recently. Most publications note that the validity and reliability of data collected by computers depend on variables like the age group of participants and the types of sensitive questions. Many studies have been designed to develop methods to maximize the accuracy of reporting risky sexual behaviors for sexually transmitted diseases (STD) and HIV infection in the general population. Although most of these studies have included pen-and-paper self-completed interviews, about 20 years ago, computer-assisted interviewing (CAI) and computer-assisted self-interviewing (CASI) appeared as an alternative to paper questionnaires for the collection of reliable information on sensitive behaviors [7–9]. Some types of CASI include audio, video, or telephone enhancements. These have been used to assess general risk, patient history, and a variety of health-related data [13–15].
Particularly in developing countries, data collection methods are needed that are reliable, inexpensive, and do not require extensive technological expertise. The applicability of portable computers for surveys in the general population could be limited by the cost of computers and software, and by the risk of data loss due to mishandling, malfunction or theft. In spite of these difficulties, handheld CASI is emerging as a new tool for the collection of risk-behavior data due to its advantages, including portability and energy efficiency, reduction of interviewer bias, real-time authentication and validity, conditional branching, and minimization of data transcription and transfer errors.
The objective of this study is to present two experiences with the use of Personal Digital Assistants (PDA) in CAI and CASI for the collection of sex-related sensitive data from participants of a household based survey, and to compare these data to similar data collected in paper questionnaires.
Study design and setting
Two cross-sectional surveys were undertaken in Ancon, a district of Lima, Peru (August 2005 and August 2006). In both surveys, a sample of clusters was selected; then a census of each household in the selected clusters was conducted. Within each household, eligible individuals (male or female, 18–29 years, literate, and in the household at the moment of the interview) were selected. Participants provided verbal informed consent prior to participating, and completed a detailed questionnaire on sexual practices. Participation in both surveys was anonymous.
Low educational level was defined as having had no more than a secondary school education. A low income was defined as having a personal monthly income less than or equal to 140 dollars.
Questionnaire characteristics and interview
The questionnaire explored past and current STD symptoms and signs, as well as sexual practices. It comprised approximately 110 closed-ended questions and was filled in confidentially by the participant.
In the first cross-sectional survey, each participant completed the questionnaire in two formats: paper and PDA. Participants were first asked to complete the paper-based self-applied questionnaire, and then to fold it and put it into a locked voting bag. They then received a short training session (approximately 2–3 minutes) on the use of the PDA and completed a PDA-based questionnaire. In the second cross-sectional survey, field workers were assigned to teams of two alphabetically, based on their last names. Within each team, the first interviewer conducted the interview with the electronic format while the second interviewer conducted the interview with the paper format. As a result, half of the participants answered the PDA questionnaire and the other half responded to the paper-based questionnaire.
Program used in handheld computers (PDA-PREVEN)
The PDA software program was built using Open-Source tools and contained the same sequence of questions as the paper format. The GNU Compiler Collection (GCC), a General Public License Free Software application, was used for building Palm OS applications in C and C++ using the cross-compiler libraries and SDK that can be downloaded at the Palmsource website (ACCESS Linux Platform). The questionnaire structure was built from a comma-separated values (CSV) file, used by a small application (written in the C++ language) running under the Debian Linux Operating System, to produce a Palm executable application using the aforementioned cross-compiler. Low-cost Palm Zire-31® PDAs were used, and data and applications were transferred to them using Palm's HotSync program.
The questionnaire contained a set of data entry types (pop-up lists, multi-option answers, one-option answers, etc.). Participants entered data using those entry options, choosing answers from a previously established list. Participants did not have to enter text using the pen stylus. Some questions were only asked if the response to a previous question met a predefined rule. Participants were required to select a response prior to moving to the next question. The program also allowed participants to return to previous questions within the same section to modify their answers.
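The behavior described above (a questionnaire defined in a CSV file, questions asked only when a predefined rule is met, and a mandatory response before moving on) can be illustrated with a minimal sketch. It is written in Python rather than the C/C++ used for PDA-PREVEN, and the column names, rule format (`question=answer`) and example questions are assumptions made for illustration; they are not the layout of the actual PDA-PREVEN file.

```python
import csv
import io

# Hypothetical questionnaire definition. The columns and the "ask_if" rule
# syntax are assumptions for illustration, not the PDA-PREVEN CSV format.
QUESTIONNAIRE_CSV = """id,text,options,ask_if
q1,Have you ever had sexual intercourse?,yes|no,
q2,Age at first sexual intercourse?,under 15|15-17|18 or older,q1=yes
q3,Did you use a condom the last time?,yes|no,q1=yes
q4,Have you ever been tested for HIV?,yes|no,
"""


def load_questions(csv_text):
    """Parse the CSV text into a list of question dictionaries."""
    questions = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # An empty ask_if means the question is always asked.
        rule = tuple(row["ask_if"].split("=")) if row["ask_if"] else None
        questions.append({
            "id": row["id"],
            "text": row["text"],
            "options": row["options"].split("|"),
            "ask_if": rule,
        })
    return questions


def run_interview(questions, choose):
    """Ask each applicable question; `choose(question)` returns an option."""
    answers = {}
    for question in questions:
        rule = question["ask_if"]
        if rule is not None and answers.get(rule[0]) != rule[1]:
            continue  # conditional branching: the predefined rule is not met
        choice = choose(question)
        while choice not in question["options"]:
            choice = choose(question)  # a listed response is required
        answers[question["id"]] = choice
    return answers


if __name__ == "__main__":
    questions = load_questions(QUESTIONNAIRE_CSV)
    # Simulated participant who always picks the last option in each list.
    print(run_interview(questions, lambda q: q["options"][-1]))
```

Because follow-up questions are never shown when the filter answer rules them out, and because every displayed question needs a listed answer, a design of this kind leaves little room for contradictory or blank responses.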
During fieldwork, each handheld computer was inserted into a wooden and Styrofoam clipboard to shield it from possible damage and to conceal it (Figure 1).
Data management and statistical analysis
All paper questionnaires from both surveys were double entered into a Microsoft Access 2000 template (Microsoft Corporation, Washington, USA), while PDA data was transferred to a computer through a HotSync operation (synchronization), converted into a CSV format using a program based on C, and then reorganized into a single database within Microsoft Visual FoxPro 7.0 (Microsoft Corporation, Washington, USA). Statistical analysis was performed in STATA 8.0 for Windows (STATA Corporation, Texas, USA). A subset of questions from the questionnaire was selected based on their sensitivity for comparisons between the methodologies.
For the first survey, categorical variables were compared using Kappa coefficient analysis while numeric variables were compared using Spearman Rho correlation. Overall agreement (for both categorical and numeric measures) was defined as the number of equivalent responses in both questionnaires divided by the total number of responses. Also, the correlation of variables according to sex and education level of participants was calculated.
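As a rough illustration of the three agreement measures named above, the following sketch computes percent agreement, Cohen's kappa and Spearman's rho (with average ranks for ties) in plain Python. The study itself used STATA; the function names and the paired responses below are invented for illustration and are not study data.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of paired responses that are identical, as a percentage."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Undefined (division by zero) when expected agreement equals 1.
    """
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    p_exp = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy paired responses, invented for illustration only (not study data).
paper_cat = ["yes", "yes", "no", "no", "yes", "no", "no", "yes", "no", "no"]
pda_cat = ["yes", "yes", "no", "no", "no", "no", "no", "yes", "no", "yes"]
paper_num = [1, 2, 2, 3, 5, 8]
pda_num = [1, 2, 3, 3, 5, 7]

print(percent_agreement(paper_cat, pda_cat))
print(round(cohen_kappa(paper_cat, pda_cat), 2))
print(round(spearman_rho(paper_num, pda_num), 2))
```

Percent agreement counts only exact matches, which is why a numeric variable can show modest agreement and still have a high rank correlation: answers that differ by a small amount break the match but barely change the ranking.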
For the second survey, the same categorical variables were compared using χ2 test or Fisher's exact test, while numeric variables were compared using Student's t test. In this case, we also compared the number of missing values, the number of inconsistent responses and the duration of the interview. A missing value was defined as the lack of response, while an inconsistent response was defined as a discordant answer between two related questions. The duration of the interview was evaluated as the time measured between the beginning and the ending of the self-applied questionnaire.
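The two data-quality counts defined above can be sketched as follows. The question identifiers and the single filter rule (follow-up questions apply only to participants who report intercourse) are hypothetical; the study's actual list of related question pairs is not given in the text.

```python
# Hypothetical question identifiers, used only for illustration.
FILTER_QUESTION = "ever_had_sex"
FOLLOW_UPS = ("age_first_sex", "partners_last_year")
QUESTIONS = (FILTER_QUESTION,) + FOLLOW_UPS


def is_blank(value):
    """True when no response was recorded."""
    return value is None or value == ""


def count_missing(record):
    """Blank answers among the questions that apply to this participant."""
    missing = 0
    for qid in QUESTIONS:
        skipped_by_design = (
            qid in FOLLOW_UPS and record.get(FILTER_QUESTION) == "no"
        )
        if is_blank(record.get(qid)) and not skipped_by_design:
            missing += 1
    return missing


def count_inconsistencies(record):
    """Follow-up answers that contradict a 'no' to the filter question."""
    if record.get(FILTER_QUESTION) != "no":
        return 0
    return sum(not is_blank(record.get(qid)) for qid in FOLLOW_UPS)


if __name__ == "__main__":
    # Invented paper-style record: denies intercourse but reports an age.
    record = {"ever_had_sex": "no", "age_first_sex": "16",
              "partners_last_year": ""}
    print(count_missing(record), count_inconsistencies(record))
```

Per-participant counts of this kind are what the group comparison then summarizes as a mean and standard deviation for each format.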
The first survey enrolled 200 participants. Ten pairs of questionnaires (5%) could not be matched because of miscoding, and therefore, 190 self-applied paper and PDA questionnaires were analyzed. Ninety-four (49.5%) of the participants were male and the mean sample age was 22.9 (SD: 3.4).
The second survey enrolled 198 participants. Of these, 98 records were recovered from the PDAs, while 100 records were obtained in the paper format. Ninety-nine (50.0%) of the participants were male and the mean sample age was 22.7 (SD: 3.4). Population characteristics of both survey groups are shown in Table 1.
Evaluation of responses in the first survey
The comparison of the responses to the two formats is shown in Table 2. General agreement between paper and PDA self-applied questionnaires was 86%. Agreement for categorical variables ranged from 70.5% to 98.5%, with Kappa coefficients from 0.43 to 0.86. For numerical variables, agreement varied from 57.1% to 79.8%, with a Spearman's Rho coefficient between 0.76 and 0.95 depending on the question evaluated. Likewise, the comparison between paper and PDA self-applied questionnaires according to sex of participants only demonstrated slight differences between men and women. However, participants with higher education level consistently had better agreement in both categorical and numerical variables than those with less education (Table 3).
Evaluation of responses in the second survey
Table 4 shows the comparison of responses for the second survey using the same questions evaluated in the first one. It is important to note that two questions evaluated in this survey ("have you ever had sex with a female sex worker" and "age of first sexual intercourse") had p-values near 0.05. When the number of inconsistencies was evaluated, the mean in the paper format was 1.93 (SD: 1.98), while it was 0.08 (SD: 0.54) in the PDA format (p < 0.0001). Similarly, the mean number of missing values was 0.85 (SD: 1.35) in the paper questionnaire and 0.29 (SD: 1.02) in the PDA format (p = 0.001). Finally, the average time to answer was 9.68 (SD: 12.98) minutes in the paper format and 7.20 (SD: 9.38) minutes in the PDA format (p = 0.065). However, despite the faster completion, the electronic device had to be reset during 6.9% of interviews in the field.
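The comparisons of inconsistencies and missing values reported above can be approximated from the published summary statistics alone. The sketch below computes a pooled-variance Student's t statistic; the means and standard deviations are those given in the text, while the group sizes (100 paper and 98 PDA records, all contributing) are an assumption, so the result is indicative rather than a reproduction of the authors' analysis.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic (equal variances) from summary statistics.

    Returns the t statistic and its degrees of freedom.
    """
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, n1 + n2 - 2


# Means and SDs are those reported in the text; group sizes are ASSUMED to be
# the 100 paper and 98 PDA records, with every record contributing.
t_incons, df = pooled_t(1.93, 1.98, 100, 0.08, 0.54, 98)
t_missing, _ = pooled_t(0.85, 1.35, 100, 0.29, 1.02, 98)

print(f"inconsistencies: t = {t_incons:.2f} on {df} df")
print(f"missing values:  t = {t_missing:.2f} on {df} df")
```

Count data of this kind are strongly skewed (the standard deviations exceed the means), so a rank-based test would be a reasonable alternative; the t statistic is shown only because it is the test named in the methods.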
The results of the first survey show an overall kappa coefficient of 0.86, suggesting an almost perfect agreement between PDA and paper responses. This finding supports the utility of PDA-PREVEN for collecting survey data in the field. The correlation was greater for numerical than for nominal variables. In addition, observed agreement for numeric variables was lower when the overall number of responses was smaller. Other studies aimed at young populations have found similar results, perhaps due to the willingness of young people to use new technological devices such as computers, PDAs, and cell phones [2, 3]. Since young Peruvians are less familiar with handheld computers than with desktop computers and the Internet, we decided to conduct a short training session before collecting data. The training also familiarized participants with the types of questions and responses they would encounter and helped avoid damage to the PDA screen from excessive pressure. Likewise, the high agreement could be explained by the use of a set of questions with a pre-defined menu of alternatives as part of the program. Moreover, agreement was higher in those who had completed at least high school than in those who had not, which could reflect the skill level required to operate electronic devices and the ability to respond to both questionnaires in a consistent manner.
In the second survey, data collected by both techniques were very similar, as the statistical analysis found no significant difference between groups. Although the p-values for the two aforementioned questions were near the usual significance level, they were not considered significant after the alpha level was corrected by the Bonferroni procedure (cut-off for 15 comparisons: 0.003) [21, 22]. When data accuracy was evaluated through the number of missing values and inconsistent answers, both were significantly lower in the PDA group. Similar to previous studies, responding to the questionnaire in PDA format was about 25% faster than in paper format [18, 23, 24]. However, this difference was not statistically significant. Overall, the PDA reduces inconsistencies during data collection, helps preserve data integrity, and performs at least as well as the paper questionnaire.
In previous studies [25, 26], technical malfunction has been described as the main disadvantage of the PDA format. In this study, the electronic device had to be reset during 6.9% of interviews in the field. We designed our PDA application to have an option to return to the question where the interview was interrupted, which minimized data loss.
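The two figures quoted in this paragraph follow from simple arithmetic, sketched below. The overall alpha of 0.05 is assumed from the reference to the "usual significance level"; the mean durations are those reported in the results.

```python
def bonferroni_cutoff(alpha, n_comparisons):
    """Per-comparison significance threshold under the Bonferroni procedure."""
    return alpha / n_comparisons


def relative_reduction(reference, new):
    """Fractional reduction of `new` relative to `reference`."""
    return (reference - new) / reference


# ASSUMPTION: overall alpha of 0.05; 15 comparisons as stated in the text.
cutoff = bonferroni_cutoff(0.05, 15)
# Mean interview duration in minutes: paper 9.68, PDA 7.20 (from the results).
time_saving = relative_reduction(9.68, 7.20)

print(f"Bonferroni cut-off: {cutoff:.4f}")
print(f"PDA interviews were shorter by {time_saving:.1%}")
```

Any p-value near 0.05 is well above this corrected threshold, which is why the two borderline questions are not treated as significant differences.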
In general, our results agree with studies using PC-based CASI or audio-CASI for collecting data from the general population [2, 27], blood donors [28, 29], and for surveys on alcohol or drug consumption [11, 30]. In a previous study using PDAs conducted by Fletcher, the agreement attained between both kinds of questionnaires was higher (about 96%). However, the information was collected twice by trained staff members, whereas in our surveys both questionnaires were self-applied and answered by the participants after a short training period. For this study, all questions were closed-ended, which could help explain the high level of correlation. At the same time, our design reflects the actual setting and experience of conducting a field survey.
The major strength of this study is the application of a PDA software program built with Open Source tools for collecting data, and the use of two different methodologies to evaluate it. This allowed us to develop a low-cost system, tailored more closely to our needs and specifications, without the limitations of proprietary systems. To our knowledge, this is the first report that evaluates the usefulness of a software program built with Open Source tools in a PDA to collect data about sexual behavior in the field in Peru. The first methodology allows us to demonstrate an almost perfect correlation between the two sorts of questionnaires since the same questions were applied twice to the participants, reducing inter-observer variation. The second methodology allows us to compare the rate of responses, the rate of inconsistencies, the rate of missing values, and the duration between both sorts of questionnaires, which were not evaluated in the first survey.
Most of the studies with PDAs have used expensive commercial programs to create data entry forms [1, 11]. The use of programs based on Open Source tools has been previously described in rural areas to allow paramedical health workers to view large databases. Using these tools, other authors have developed databases and web applications for collecting, storing, and querying biological pathway data or managing information in biomedical studies [33, 34]. In our case, we needed an application for collecting information rather than simply viewing it. Notably, during fieldwork we did not lose any PDA, probably due to the ability to conceal them within the clipboard.
Our study has several limitations. One of the most important is that inconsistencies between both questionnaires may be due to non-selective misclassification because of recall problems. Difficulties in remembering information during the interview might have been present even if the participants had been asked to fill out paper-based surveys twice or handheld computer surveys twice. Unfortunately, this issue was not evaluated in the surveys. Later studies should assess whether fewer recall problems are present using handheld computers than paper-based questionnaires. Also, some bias could have been introduced in the first survey because all the participants were asked to complete the paper-based questionnaire before the PDA questionnaire. However, we believe that if half of the participants had responded to the PDA questionnaire first, they would have paid less attention to the paper questionnaire or left questions unanswered because of the boredom caused by answering the questions twice, which would have been more unfavorable to the paper questionnaire. Another limitation was the small sample size, which did not allow us to compare some questions between groups. Although we found some differences related to education level, agreement and correlation were high in both the low and high educational level groups.
Handheld computers were useful for collecting information about sexual behavior in young people in Peru. The two surveys administered have demonstrated that it is feasible to develop a low-cost application for handheld computers to collect sexual behavior data. Our study suggests that PDAs are feasible alternatives to paper forms for field data collection in a developing country.
Bobula JA, Anderson LS, Riesch SK, Canty-Mitchell J, Duncan A, Kaiser-Krueger HA, Brown RL, Angresano N: Enhancing survey data collection among youth and adults: use of handheld and laptop computers. Comput Inform Nurs. 2004, 22 (5): 255-265. 10.1097/00024665-200409000-00004.
Fenton KA, Johnson AM, McManus S, Erens B: Measuring sexual behaviour: methodological challenges in survey research. Sex Transm Infect. 2001, 77 (2): 84-92. 10.1136/sti.77.2.84.
Ghanem KG, Hutton HE, Zenilman JM, Zimba R, Erbelding EJ: Audio computer assisted self interview and face to face interview modes in assessing response bias among STD clinic patients. Sex Transm Infect. 2005, 81 (5): 421-425. 10.1136/sti.2004.013193.
Kurth AE, Martin DP, Golden MR, Weiss NS, Heagerty PJ, Spielberg F, Handsfield HH, Holmes KK: A comparison between audio computer-assisted self-interviews and clinician interviews for obtaining the sexual history. Sex Transm Dis. 2004, 31 (12): 719-726. 10.1097/01.olq.0000145855.36181.13.
Paperny DM, Aono JY, Lehman RM, Hammar SL, Risser J: Computer-assisted detection and intervention in adolescent high-risk health behaviors. J Pediatr. 1990, 116 (3): 456-462. 10.1016/S0022-3476(05)82844-6.
Turner CF, Ku L, Rogers SM, Lindberg LD, Pleck JH, Sonenstein FL: Adolescent sexual behavior, drug use, and violence: increased reporting with computer survey technology. Science. 1998, 280 (5365): 867-873. 10.1126/science.280.5365.867.
Lessler JT, O'Reilly JM: Mode of interview and reporting of sensitive issues: design and implementation of audio computer-assisted self-interviewing. NIDA Res Monogr. 1997, 167: 366-382.
Schneider DJ, Taylor EL, Prater LM, Wright MP: Risk assessment for HIV infection: validation study of a computer-assisted preliminary screen. AIDS Educ Prev. 1991, 3 (3): 215-229.
Simoes AM, Bastos FI: [Audio Computer-Assisted Interview: a new technology in the assessment of sexually transmitted diseases, HIV, and drug use]. Cad Saude Publica. 2004, 20 (5): 1169-1181.
Lau JT, Tsui HY, Wang QS: Effects of two telephone survey methods on the level of reported risk behaviours. Sex Transm Infect. 2003, 79 (4): 325-331. 10.1136/sti.79.4.325.
Fletcher LA, Erickson DJ, Toomey TL, Wagenaar AC: Handheld computers. A feasible alternative to paper forms for field data collection. Eval Rev. 2003, 27 (2): 165-178. 10.1177/0193841X02250527.
Gribble JN, Miller HG, Cooley PC, Catania JA, Pollack L, Turner CF: The impact of T-ACASI interviewing on reported drug use among men who have sex with men. Subst Use Misuse. 2000, 35 (6–8): 869-890.
Appel PW, Piculell R, Jansky HK, Griffy K: Assessing alcohol and other drug problems (AOD) among sexually transmitted disease (STD) clinic patients with a modified CAGE-A: implications for AOD intervention services and STD prevention. Am J Drug Alcohol Abuse. 2006, 32 (2): 225-236. 10.1080/00952990500479555.
Buck DS, Rochon D, Turley JP: Taking it to the streets: recording medical outreach data on personal digital assistants. Comput Inform Nurs. 2005, 23 (5): 250-255. 10.1097/00024665-200509000-00008.
Murphy DA, Durako S, Muenz LR, Wilson CM: Marijuana use among HIV-positive and high-risk adolescents: a comparison of self-report through audio computer-assisted self-administered interviewing and urinalysis. Am J Epidemiol. 2000, 152 (9): 805-813. 10.1093/aje/152.9.805.
Curioso WH: New Technologies and Public Health in Developing Countries: The Cell PREVEN Project. Internet and Health Care: Theory, Research. Edited by: Murero M, Rice R. 2006, New Jersey: Lawrence Erlbaum Associates, 375-393. [http://www.voxiva.net/resources/curioso.asp]
Weber BA, Roberts BL: Data collection using handheld computers. Nurs Res. 2000, 49 (3): 173-175. 10.1097/00006199-200005000-00010.
Lane SJ, Heddle NM, Arnold E, Walker I: A review of randomized controlled trials comparing the effectiveness of hand held computers with paper methods for data collection. BMC Med Inform Decis Mak. 2006, 6: 23. 10.1186/1472-6947-6-23.
ACCESS Linux Platform. [http://www.access-company.com/home.html]
Landis JR, Koch GG: The measurement of observer agreement for categorical data. Biometrics. 1977, 33: 159-174. 10.2307/2529310.
Perneger TV: What's wrong with Bonferroni adjustments. BMJ. 1998, 316 (7139): 1236-1238.
Sankoh AJ, Huque MF, Dubey SD: Some comments on frequently used multiple endpoint adjustment methods in clinical trials. Stat Med. 1997, 16 (22): 2529-2542. 10.1002/(SICI)1097-0258(19971130)16:22<2529::AID-SIM692>3.0.CO;2-J.
Lal SO, Smith FW, Davis JP, Castro HY, Smith DW, Chinkes DL, Barrow RE: Palm computer demonstrates a fast and accurate means of burn data collection. J Burn Care Rehabil. 2000, 21 (6): 559-561. discussion 558
Walker I, Sigouin C, Sek J, Almonte T, Carruthers J, Chan A, Pai M, Heddle N: Comparing hand-held computers and paper diaries for haemophilia home therapy: a randomized trial. Haemophilia. 2004, 10 (6): 698-704. 10.1111/j.1365-2516.2004.01046.x.
Dale O, Hagen KB: Despite technical problems personal digital assistants outperform pen and paper when collecting patient diary data. J Clin Epidemiol. 2007, 60 (1): 8-17. 10.1016/j.jclinepi.2006.04.005.
Shelby-James TM, Abernethy AP, McAlindon A, Currow DC: Handheld computers for data entry: high tech has its problems too. Trials. 2007, 8: 5. 10.1186/1745-6215-8-5.
Jones R: Survey data collection using Audio Computer Assisted Self-Interview. West J Nurs Res. 2003, 25 (3): 349-358. 10.1177/0193945902250423.
Sanchez AM, Schreiber GB, Glynn SA, Bethel J, Kessler D, Chang D, Zuck TF: Blood-donor perceptions of health history screening with a computer-assisted self-administered interview. Transfusion. 2003, 43 (2): 165-172. 10.1046/j.1537-2995.2003.00295.x.
Sellors JW, Hayward R, Swanson G, Ali A, Haynes RB, Bourque R, Moore KA, Lohfeld L, Dalby D, Howard M: Comparison of deferral rates using a computerized versus written blood donor questionnaire: a randomized, cross-over study [ISRCTN84429599]. BMC Public Health. 2002, 2: 14. 10.1186/1471-2458-2-14.
van Griensven F, Supawitkul S, Kilmarx PH, Limpakarnjanarat K, Young NL, Manopaiboon C, Mock PA, Korattana S, Mastro TD: Rapid assessment of sexual behavior, drug use, human immunodeficiency virus, and sexually transmitted diseases in northern Thai youth using audio-computer-assisted self-interviewing and noninvasive specimen collection. Pediatrics. 2001, 108 (1): E13. 10.1542/peds.108.1.e13.
Anantraman V, Mikkelsen T, Khilnani R, Kumar VS, Pentland A, Ohno-Machado L: Open source handheld-based EMR for paramedics working in rural areas. Proc AMIA Symp. 2002, 12-16.
Cerami EG, Bader GD, Gross BE, Sander C: cPath: open source software for collecting, storing, and querying biological pathways. BMC Bioinformatics. 2006, 7: 497. 10.1186/1471-2105-7-497.
Blaya J, Fraser HS: Development, Implementation and Preliminary Study of a PDA-based tuberculosis result collection system. AMIA Annu Symp Proc. 2006, 41-45.
Viksna J, Celms E, Opmanis M, Podnieks K, Rucevskis P, Zarins A, Barrett A, Neogi SG, Krestyaninova M, McCarthy MI: PASSIM–an open source software system for managing information in biomedical studies. BMC Bioinformatics. 2007, 8: 52. 10.1186/1471-2105-8-52.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1472-6947/8/11/prepub
Supported by the Joint International Infectious Disease Initiative of the Wellcome Trust and the Burroughs-Wellcome Foundation (059131/Z/99/A), by the University of Washington Center for AIDS Research Grant AI 27757, and STI-TM Cooperation Research Center AI by the NIH Fogarty International Center AIDS International Training and Research Program Grant D43-TW00007, by the Comprehensive International Program of Research on AIDS Grant 5U19AI053218, and by the Global Health Peru Program at UPCH, a Fogarty International Center/NIH funded grant (5R25TW007490).
The author(s) declare that they have no competing interests.
AB, WHC and MAG conceived the idea. AB and MAG drafted the paper. MAG analyzed the results. CPC contributed his expertise in epidemiological studies and participated in the design of the study. JMC and WE contributed their expertise in Open Source technology and PDA use. JPH contributed his expertise in statistical analysis. PJG, GPG, and KKH are senior authors who conceived the overall idea and guided the progress of this manuscript. All the authors read and approved the final manuscript.