BMC Medical Informatics and Decision Making

Background: Low-cost handheld computers (PDAs) are a potentially efficient tool for collecting sensitive data in surveys. The goal of this study is to evaluate the quality of sexual behavior data collected with handheld computers in comparison with paper-based questionnaires.


Background
Over the last 20 years, various methodologies have been developed to improve the quality of data collected on sensitive topics [1]. Sexual behavior is largely determined by social, cultural, religious, moral, and legal norms and constraints [2]. In addition, a complete evaluation of sexual behavior includes knowledge, attitudes, risk behaviors, and more, all of which are very difficult to evaluate because individuals tend to deny involvement in socially undesirable behaviors to avoid stigmatization [3]. Social desirability and self-presentation to the interviewer can affect reports of sexual behaviors, as well as of other sensitive behaviors, and may also affect the analysis of non-response items [4].
Systematic reviews of research on sexual behavior have been published recently. Most note that the validity and reliability of data collected by computer depend on variables such as the age group of participants and the type of sensitive question [5]. Many studies have aimed to develop methods that maximize the accuracy of reporting of risky sexual behaviors for sexually transmitted diseases (STD) and HIV infection in the general population [6]. Although most of these studies used pen-and-paper self-completed questionnaires, computer-assisted interviewing (CAI) and computer-assisted self-interviewing (CASI) appeared about 20 years ago as alternatives to paper questionnaires for collecting reliable information on sensitive behaviors [7][8][9]. Some types of CASI include audio, video, or telephone enhancements [10]. These have been used to assess general risk [11], patient history [12], and a variety of health-related data [13][14][15].
Particularly in developing countries, data collection methods are needed that are reliable, inexpensive, and do not require extensive technological expertise [16]. The applicability of portable computers to general-population surveys may be limited by the cost of hardware and software and by the risk of data loss due to mishandling, malfunction, or theft. Despite these difficulties, handheld CASI is emerging as a new tool for collecting risk-behavior data because of its advantages, which include portability and energy efficiency [17], reduced interviewer bias, real-time authentication and validation, conditional branching, and minimization of data transcription and transfer errors [18].
The objective of this study is to describe two experiences with the use of Personal Digital Assistants (PDAs) in CAI and CASI for collecting sensitive sex-related data from participants in household-based surveys, and to compare these data with similar data collected using paper questionnaires.

Study design and setting
Two cross-sectional surveys were undertaken in Ancon, a district of Lima, Peru, in August 2005 and August 2006. In both surveys, a sample of clusters was selected, and a census of all households in the selected clusters was then conducted. Within each household, eligible individuals (male or female, aged 18-29 years, literate, and present in the household at the time of the interview) were selected. Participants provided verbal informed consent before participating and completed a detailed questionnaire on sexual practices. Participation in both surveys was anonymous.

Definitions
Low educational level was defined as having no more than a secondary school education. Low income was defined as a personal monthly income of 140 dollars or less.

Questionnaire characteristics and interview
The questionnaire explored past and current STD symptoms and signs, as well as sexual practices. It consisted of approximately 110 closed-ended questions, which participants completed confidentially.
In the first cross-sectional survey, each participant completed the questionnaire in two formats: paper and PDA. Participants were first asked to complete the self-applied paper questionnaire, fold it, and place it in a locked ballot bag. They then received a short training session (approximately 2-3 minutes) on the use of the PDA and completed the PDA-based questionnaire [11]. In the second cross-sectional survey, field workers were paired alphabetically by last name. Within each pair, the first interviewer administered the electronic format and the second administered the paper format. As a result, approximately half of the participants answered the PDA questionnaire and the other half answered the paper-based questionnaire.

Program used in handheld computers (PDA-PREVEN)
The PDA software program was built using Open Source tools and contained the same sequence of questions as the paper format. The GNU Compiler Collection (GCC), free software released under the GNU General Public License, was used to build Palm OS applications in C and C++ with the cross-compiler libraries and SDK available from the PalmSource website (ACCESS Linux Platform) [19]. The questionnaire structure was defined in a comma-separated values (CSV) file, which a small C++ application running under the Debian Linux operating system used to produce a Palm executable with the aforementioned cross-compiler. Low-cost Palm Zire-31® PDAs were used, and data and applications were transferred to them using Palm's HotSync program.
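The article does not document the layout of the CSV file that drove the Palm application generator. As a minimal sketch, assuming a hypothetical layout with one row per question (columns `id`, `section`, `type`, `text`, `options` separated by `|`, and an optional `show_if` rule of the form `QID=value`), the Python code below parses such a file into an ordered list of question records. The original generator was a C++ program; every column name, entry-type name, and example question here is an illustrative assumption.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical CSV layout (not documented in the article).
EXAMPLE_CSV = """id,section,type,text,options,show_if
Q1,A,one_option,Have you ever had sexual intercourse?,yes|no,
Q2,A,popup_list,Age at first sexual intercourse,<15|15-17|18-20|>20,Q1=yes
Q3,B,multi_option,Symptoms in the last 12 months,discharge|ulcer|none,
"""

# Assumed names for the entry types mentioned in the text.
VALID_TYPES = {"one_option", "multi_option", "popup_list"}


@dataclass
class Question:
    qid: str
    section: str
    qtype: str
    text: str
    options: List[str] = field(default_factory=list)
    show_if: Optional[Tuple[str, str]] = None  # (question id, required answer) or None


def load_questionnaire(csv_text: str) -> List[Question]:
    """Parse a questionnaire definition, keeping the row order of the CSV file."""
    questions = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["type"] not in VALID_TYPES:
            raise ValueError(f"unknown entry type {row['type']!r} in {row['id']}")
        rule = None
        if row["show_if"]:
            dep_id, _, dep_value = row["show_if"].partition("=")
            rule = (dep_id, dep_value)
        questions.append(Question(
            qid=row["id"],
            section=row["section"],
            qtype=row["type"],
            text=row["text"],
            options=row["options"].split("|"),
            show_if=rule,
        ))
    return questions


if __name__ == "__main__":
    for q in load_questionnaire(EXAMPLE_CSV):
        print(q.qid, q.qtype, q.options, q.show_if)
```

In the authors' pipeline a C++ program consumed the CSV file to generate the Palm executable; this sketch stops at the parsed structure, which is the part a generator of either kind would walk to emit the screens in order.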
The questionnaire contained a set of data entry types (pop-up lists, multi-option answers, one-option answers, etc.). Participants entered data by selecting answers from predefined lists and did not have to enter text with the stylus. Some questions were asked only if the response to a previous question met a predefined rule. Participants were required to select a response before moving to the next question. The program also allowed participants to return to previous questions within the same section to modify their answers.
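The navigation rules described above (conditional questions, mandatory responses, predefined options only, and going back only within the current section) can be sketched as a small state machine. The Python class below is a hypothetical illustration, not the PDA-PREVEN code; the `Question` fields and the single-condition `show_if` rule are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Question:
    qid: str
    section: str
    options: List[str]
    show_if: Optional[Tuple[str, str]] = None  # (earlier question id, answer that enables this one)


class Interview:
    """Hypothetical sketch of the navigation rules; not the PDA-PREVEN source."""

    def __init__(self, questions: List[Question]):
        self.questions = questions
        self.answers: Dict[str, str] = {}
        self.pos = -1
        self._advance()  # move to the first question that applies

    def _is_active(self, q: Question) -> bool:
        # Conditional branching: a question is asked only if its rule is met.
        if q.show_if is None:
            return True
        dep_id, value = q.show_if
        return self.answers.get(dep_id) == value

    def _advance(self) -> None:
        self.pos += 1
        while self.pos < len(self.questions) and not self._is_active(self.questions[self.pos]):
            self.pos += 1

    def current(self) -> Optional[Question]:
        return self.questions[self.pos] if self.pos < len(self.questions) else None

    def answer(self, choice: str) -> None:
        """Record a choice; this is the only way forward, so no shown question is skipped."""
        q = self.current()
        if q is None:
            raise RuntimeError("questionnaire already finished")
        if choice not in q.options:  # answers come only from the predefined list
            raise ValueError(f"{choice!r} is not an option for {q.qid}")
        self.answers[q.qid] = choice
        # Drop answers that the new choice made inapplicable; a single forward
        # pass is enough if rules only refer to earlier questions (assumption).
        for other in self.questions:
            if other.qid in self.answers and not self._is_active(other):
                del self.answers[other.qid]
        self._advance()

    def back(self) -> bool:
        """Step back to the previous applicable question, only within the current section."""
        if self.pos >= len(self.questions):
            return False
        section = self.questions[self.pos].section
        i = self.pos - 1
        while i >= 0 and not self._is_active(self.questions[i]):
            i -= 1
        if i < 0 or self.questions[i].section != section:
            return False
        self.pos = i
        return True
```

Clearing answers that a later change makes inapplicable is one way to keep each record internally consistent; it relies on the assumption that every rule refers only to an earlier question.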
During fieldwork, each handheld computer was inserted into a wooden and Styrofoam clipboard to shield it from possible damage and to conceal it (Figure 1).

Data management and statistical analysis
All paper questionnaires from both surveys were double-entered into a Microsoft Access 2000 template (Microsoft Corporation, Washington, USA). PDA data were transferred to a computer through a HotSync (synchronization) operation, converted into CSV format by a C-based program, and then reorganized into a single database in Microsoft Visual FoxPro 7.0 (Microsoft Corporation, Washington, USA). Statistical analysis was performed in STATA 8.0 for Windows (STATA Corporation, Texas, USA). For comparisons between the two methods, a subset of questions was selected on the basis of their sensitivity.
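Double data entry catches keying errors by having the same paper form entered twice, independently, and comparing the two passes field by field. The article does not describe how the two Access entries were reconciled; the Python function below is a minimal sketch of the comparison step only, assuming each pass is a dictionary keyed by a hypothetical questionnaire identifier. Disagreements would then typically be resolved by checking the original paper form.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]  # question id -> value as keyed in by the data-entry clerk


def compare_double_entry(first: Dict[str, Record],
                         second: Dict[str, Record]) -> List[Tuple[str, str, str, str]]:
    """List every (questionnaire id, field, first value, second value) that disagrees.

    A questionnaire present in only one pass is reported once with field '<record>'
    and the presence flags of the two passes as values.
    """
    problems = []
    for qid in sorted(set(first) | set(second)):
        if qid not in first or qid not in second:
            problems.append((qid, "<record>", str(qid in first), str(qid in second)))
            continue
        a, b = first[qid], second[qid]
        for field in sorted(set(a) | set(b)):
            va, vb = a.get(field, ""), b.get(field, "")  # absent field == left blank
            if va != vb:
                problems.append((qid, field, va, vb))
    return problems
```

For example, a form keyed as age "17" in the first pass and "71" in the second is flagged for review, whereas a single-entry workflow would silently keep whichever value was typed.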
For the first survey, categorical variables were compared using the kappa coefficient, and numeric variables using Spearman's rho correlation. Overall agreement (for both categorical and numeric measures) was defined as the number of identical responses in both questionnaires divided by the total number of responses. Agreement and correlation were also calculated by participants' sex and educational level.
For the second survey, the same categorical variables were compared using the χ² test or Fisher's exact test, and numeric variables using Student's t test. We also compared the number of missing values, the number of inconsistent responses, and the duration of the interview. A missing value was defined as the absence of a response, and an inconsistent response as a discordant answer between two related questions. Interview duration was measured as the time between the beginning and the end of the self-applied questionnaire.
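The three data-quality measures defined above can be computed per participant and then compared between formats. In the Python sketch below, the question identifiers, the two consistency rules, and the timestamp format are hypothetical; the article does not list the related question pairs it checked. A pooled-variance Student's t statistic is included for the between-group comparison; a p-value would additionally require the t distribution (for example, SciPy's `scipy.stats.ttest_ind`).

```python
import math
import statistics
from datetime import datetime
from typing import Callable, Dict, List, Optional, Sequence

Response = Dict[str, Optional[str]]  # question id -> answer; None or "" means no answer


def answered(r: Response, q: str) -> bool:
    return r.get(q) not in (None, "")


# Hypothetical rules: each returns True when two related answers contradict each other.
RULES: List[Callable[[Response], bool]] = [
    # Says "never had sex" but still reports an age at first intercourse.
    lambda r: r.get("ever_sex") == "no" and answered(r, "age_first_sex"),
    # Reports zero partners in the last 12 months but answers condom use with the last partner.
    lambda r: r.get("partners_12m") == "0" and answered(r, "condom_last_partner"),
]


def count_missing(resp: Response, applicable: Sequence[str]) -> int:
    """Missing value = no response to a question that applied to this participant."""
    return sum(1 for q in applicable if not answered(resp, q))


def count_inconsistent(resp: Response,
                       rules: Sequence[Callable[[Response], bool]] = RULES) -> int:
    """Inconsistent response = discordant answers to two related questions."""
    return sum(1 for rule in rules if rule(resp))


def duration_minutes(start: str, end: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> float:
    """Time between the start and the end of the self-applied questionnaire."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


def student_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Student's t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
```

Passing only the questions that applied to each participant into `count_missing` matters for the PDA arm: questions skipped by a branching rule are not missing values.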

Study participants
The first survey enrolled 200 participants. Ten pairs of questionnaires (5%) could not be matched because of miscoding; therefore, 190 pairs of self-applied paper and PDA questionnaires were analyzed. Ninety-four (49.5%) participants were male, and the mean age was 22.9 years (SD: 3.4).
The second survey enrolled 198 participants: 98 records were obtained with the PDA and 100 with the paper format. Ninety-nine (50.0%) participants were male, and the mean age was 22.7 years (SD: 3.4). Population characteristics of both survey groups are shown in Table 1.
Figure 1. An example of a clipboard with a PDA. The photograph shows a PDA placed in a clipboard for interviewing.

Evaluation of responses in the first survey
The comparison of responses between the two formats is shown in Table 2. Overall agreement between the paper and PDA self-applied questionnaires was 86%. Agreement for categorical variables ranged from 70.5% to 98.5%, with kappa coefficients from 0.43 to 0.86. For numerical variables, agreement ranged from 57.1% to 79.8%, with Spearman's rho coefficients between 0.76 and 0.95, depending on the question. When stratified by sex, the comparison showed only slight differences between men and women. However, participants with a higher educational level consistently showed better agreement for both categorical and numerical variables than those with less education (Table 3). Table 4 shows the comparison of responses for the second survey using the same questions evaluated in the first one.

Evaluation of responses in the second survey
Two questions evaluated in this survey ("have you ever had sex with a female sex worker" and "age of first sexual intercourse") had p-values close to 0.05. The mean number of inconsistencies was 1.93 (SD: 1.98) in the paper format and 0.08 (SD: 0.54) in the PDA format (p < 0.0001). Similarly, the mean number of missing values was 0.85 (SD: 1.35) in the paper questionnaire and 0.29 (SD: 1.02) in the PDA format (p = 0.001). Finally, the mean time to complete the questionnaire was 9.68 (SD: 12.98) minutes in the paper format and 7.20 (SD: 9.38) minutes in the PDA format (p = 0.065). However, despite the shorter completion time, the electronic device had to be reset in 6.9% of interviews during fieldwork.

Discussion
The results of the first survey show an overall kappa coefficient of 0.86, suggesting almost perfect agreement between PDA and paper responses [20]. This finding supports the utility of PDA-PREVEN for collecting survey data in the field. The correlation was greater for numerical than for nominal variables. In addition, observed agreement for numeric variables was lower when the overall number of responses was smaller. Other studies of young populations have found similar results, perhaps because young people are willing to use new technological devices such as computers, PDAs, and cell phones [2,3]. Because young Peruvians are more familiar with desktop computers and the Internet than with handheld computers, we conducted a short training session before collecting data. The training also familiarized participants with the types of questions and response formats and showed them how to avoid damaging the PDA screen by pressing too hard. The high agreement may also be explained by the program's use of questions with predefined menus of alternatives. Finally, agreement was higher among participants who had completed at least high school than among those who had not, which may reflect the skill level required to operate electronic devices and the ability to answer both questionnaires consistently.
In the second survey, data collected with both techniques were very similar, and the statistical analysis found no significant differences between groups. Although the responses to the two questions mentioned above were close to the usual significance level, they were not considered significant after the alpha level was corrected using the Bonferroni procedure [21,22]. When data quality was assessed through the number of missing values and inconsistent answers, both were significantly lower in the PDA group. As in previous studies [18,23,24], completing the questionnaire in PDA format was about 25% faster than in paper format, although this difference was not statistically significant. Overall, the PDA reduced inconsistencies during data collection, helped preserve data integrity, and performed at least as well as the paper questionnaire.
In previous studies [25,26], technical malfunction has been described as the main disadvantage of the PDA format. In this study, the electronic device had to be reset in 6.9% of interviews during fieldwork. We designed our PDA application with an option to return to the question at which the interview was interrupted, which minimized data loss.
In general, our results agree with studies using PC-based CASI or audio-CASI to collect data from the general population [2,27], from blood donors [28,29], and in surveys on alcohol or drug consumption [11,30]. In a previous study using PDAs, Fletcher [11] reported higher agreement between the two kinds of questionnaires (about 96%). However, in that study the information was collected twice by trained staff members, whereas in our surveys both questionnaires were self-applied by the participants after a short training period. At the same time, our design reflects the actual setting and experience of conducting a field survey. In addition, all questions in this study were closed-ended, which may help explain the high level of correlation.
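The Bonferroni procedure controls the family-wise error rate by testing each of m comparisons at level α/m instead of α. The exact cut-off used in the study is not preserved in this text, so the Python sketch below uses made-up p-values only to show the mechanics: two p-values near 0.05 are no longer significant once the 0.05 threshold is divided among ten hypothetical comparisons.

```python
from typing import List, Sequence, Tuple


def bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> Tuple[float, List[bool]]:
    """Bonferroni correction: test each of m hypotheses at level alpha / m.

    Returns the per-test cut-off and, for each p-value, whether it remains
    significant (p <= cut-off) after correction.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("no p-values given")
    cutoff = alpha / m
    return cutoff, [p <= cutoff for p in p_values]


if __name__ == "__main__":
    # Made-up p-values for 10 comparisons; NOT the article's results.
    cutoff, significant = bonferroni([0.04, 0.06] + [0.5] * 8)
    print(f"cut-off = {cutoff:.4f}", significant[:2])  # cut-off = 0.0050 [False, False]
```

Bonferroni is conservative when the tests are correlated, as answers from a single questionnaire often are; the price of avoiding false positives is reduced power to detect real differences between formats.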
The major strength of this study is the application of a PDA software program built with Open Source tools for collecting data, evaluated with two different methodologies. Using Open Source tools allowed us to develop a low-cost system tailored closely to our needs and specifications, without the limitations of proprietary systems. To our knowledge, this is the first report evaluating a PDA program built with Open Source tools to collect sexual behavior data in the field in Peru. The first methodology allowed us to demonstrate almost perfect agreement between the two types of questionnaire, since the same questions were answered twice by the same participants, reducing inter-observer variation. The second methodology allowed us to compare the response rate, the consistency rate, the rate of missing values, and the duration of both types of questionnaire, which were not evaluated in the first survey.
Most studies with PDAs have used commercial, expensive programs to create data entry forms [1,11]. Programs based on Open Source tools have previously been described in rural areas [31], where they allowed paramedical health workers to view large databases. Other authors have used these tools to develop databases and web applications for collecting, storing, and querying biological pathway data [32] or for managing information in biomedical studies [33,34]. In our case, we needed an application for collecting information rather than simply viewing it.
Notably, no PDAs were lost during fieldwork, probably because they could be concealed within the clipboards.
Our study has several limitations. One of the most important is that inconsistencies between the two questionnaires may be due to non-differential misclassification caused by recall problems. Difficulties in remembering information during the interview might have been present even if the participants had been asked to fill out paper-based surveys twice or handheld computer surveys twice. Unfortunately, this issue was not evaluated in the surveys. Future studies should assess whether recall problems are less frequent with handheld computers than with paper-based questionnaires. Also, some bias could have been introduced in the first survey because all participants completed the paper-based questionnaire before the PDA questionnaire. However, we believe that if half of the participants had answered the PDA questionnaire first, they might have paid less attention to the paper questionnaire or left questions unanswered because of the boredom of answering the questions twice, which would have been even more unfavorable to the paper questionnaire. Another limitation was the small sample size, which did not allow us to compare some questions between groups. Although we found some differences related to educational level, agreement and correlation were high in both the low and high educational level groups.

Conclusion
Handheld computers were useful for collecting information about sexual behavior among young people in Peru. The two surveys demonstrated that it is feasible to develop a low-cost application for handheld computers to collect sexual behavior data. Our study suggests that PDAs are a feasible alternative to paper forms for field data collection in a developing country.