Physicians at our university hospital, searching for patient-centred problems in PubMed, do not differ much from the general public using search engines such as Google. They make very simple queries, containing two to three terms on average. In consequence, many queries yield a list of more than 161 articles, which are not further evaluated for relevance. The use of PubMed search tools was very limited and the performance of these tools was comparable to the use of more than three terms in a query.
Our participants used two to three terms on average. Previous research found three terms; this difference may arise because, unlike the authors of that study, we did not count Boolean operators as terms. As all searches are connected to patient-related problems we expected the queries to contain more terms to describe the question more adequately. Another reason for expecting more terms in a query is that answers to general questions are relatively easy to find in information sources containing aggregated data, such as evidence-based textbooks. Physicians are therefore advised to use reviews and studies as consecutive last steps in the search process when other sources cannot provide an answer. This makes it unlikely that the questions that were looked up in PubMed were general in nature. The more likely reason for lack of detail is that despite all recommendations for constructing proper queries in evidence-based medicine [5, 6], physicians do not take the time to construct such queries. A study by Ely et al. showed that physicians could not answer 41% of pursued questions. Analysis of unanswered questions showed that a proportion of them could be answered if the queries were reformulated to describe the question better [17, 18]. It has been shown that training courses in evidence-based practice improve search skills considerably [19, 20].
Our results show that term count and number of retrieved articles in the query result have independent effects. If using more terms only reduced the number of irrelevant articles, then term count should not have an independent effect. Using more terms related to a question must therefore also increase the number of relevant articles. This is most likely to be related to a more precise description of the question. Although the percentage of queries yielding no articles rises slowly with the use of more terms, this does not have a negative effect on abstract selection up to at least 6 terms. Physicians should therefore be urged to use enough terms, describing the question accurately, and should not fear that this will yield too few articles.
As our population is familiar with evidence-based searching, the question is why they do not use advanced search methods. One possible reason is that search tools are not on the main page of our portal and PubMed but require navigation to special search sections. As truly effective tools are likely to be used even when they are difficult to locate, this may not be a valid argument. Another reason might be that participants do not use the PubMed search tools effectively. Our participants selected fewer abstracts with search tools than with the use of four or five terms, and this might be related to improper use of the search tools. Tools that are effective in laboratory situations but are difficult to use properly during daily medical practice are inefficient for this type of search and should not be advocated for use initially. A final reason might be that other search engines do not require the use of advanced search methods and physicians try to search in the way most familiar to them. Examples of such search engines, delivering ranked results, are Google, Google Scholar and Relemed. Because these search engines perform relevance ranking they can be used effectively with natural language queries. The relative ease of Google searching has led to a publication advocating the use of Google to help solve patient-related diagnostic problems. The question is whether physicians should be taught to use these search engines or to use better search techniques in PubMed. One argument against Google is that there are several fundamental issues regarding the reliability of the information retrieved and the validity of the ranking method.
More importantly, formulating accurate clinical questions and translating them into well-formed queries, with or without the use of additional search tools, is likely to increase the accuracy of the search result regardless of the search engine used.
PICO as a method to improve a query
One method for translating clinical questions into accurate queries is the PICO method. This method can help to build adequate queries regarding patient-related problems [5, 6, 24, 25]. In the PICO method the physician is instructed to describe the patient-related problem in three to four concepts (Patient characteristics, Intervention, Comparison and Outcome). This technique was designed for questions regarding therapy but can be adapted to questions about diagnosis. Using the PICO formulation is likely to result in better queries, limiting the number of results. Although the majority of questions posed by clinicians relate to treatment and diagnosis and can be translated into PICO, many clinical questions cannot. For example, questions regarding prognosis, the etiology of a disease, economic consequences, biochemical compounds, physiological principles, pathology, genetics and complications are difficult to translate. This is one of the limitations of PICO. Herskovic et al. stated that educators and PubMed user interface researchers should not focus on specific topics, but on overall efficient use of the system. It is not feasible or practical to create versions of PICO adapted for all possible medical questions. As PICO is a method to break down a question into several concepts, it might be useful to break down any question in this way, regardless of the topic. We show that creating a PubMed query using four or five relevant terms is a good option to start with, regardless of the search topic. Using search tools may improve the search results further but we could not prove this because of the limited use of advanced search tools.
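The idea of breaking a question into concepts and combining them into one query can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name `build_pico_query`, its parameters and the example clinical terms are all hypothetical, and the sketch assumes only that the search engine accepts the Boolean operators AND and OR and double-quoted phrases, as PubMed does.

```python
def build_pico_query(patient, intervention, comparison=None, outcome=None):
    """Combine PICO concepts into one Boolean query string.

    Each concept is either a single term (str) or a list of synonyms.
    Synonyms within a concept are joined with OR; concepts are joined
    with AND. Empty concepts are skipped, so three-concept questions
    (no comparison) work as well. Hypothetical helper for illustration.
    """
    parts = []
    for concept in (patient, intervention, comparison, outcome):
        if not concept:
            continue
        synonyms = [concept] if isinstance(concept, str) else list(concept)
        # Quote multi-word terms so they are searched as a phrase.
        quoted = [f'"{s}"' if " " in s else s for s in synonyms]
        if len(quoted) == 1:
            parts.append(quoted[0])
        else:
            parts.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(parts)


# Illustrative question only; the terms are not taken from the study data.
query = build_pico_query(
    patient="atrial fibrillation",
    intervention=["warfarin", "anticoagulants"],
    comparison="aspirin",
    outcome="stroke",
)
print(query)
```

With four concepts filled in, the resulting query contains four or five search terms, which matches the term count that performed best in our data.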
Abstract selection in relation to query evaluation, retrieved articles and terms
The number of articles retrieved by a query showed a nearly logarithmic distribution, comparable to previous results. The fact that rarely more than the first ten results were evaluated is an important finding. Previous research has shown that searchers seldom view more than 20 results when using search engines with relevance ranking. Because such engines are likely to display the most relevant results on the first page, this can be a reasonable strategy. PubMed, however, does not perform relevance ranking, but by default displays the articles roughly by publication date in PubMed, beginning with the most recent. It is also possible to sort articles by author, actual publication date, journal and title but not according to relevance to the query. The chance of finding a relevant abstract within a list of several hundred article titles sorted by publication date, when only a fraction of the result is reviewed, is very low. Given the number of articles viewed on average by our population, the percentage of queries resulting in abstract selection started to decline rapidly with queries yielding more than 161 articles. The number of articles retrieved by a query is influenced by the number of terms used. Although using more relevant terms will usually result in a more accurate search result, using more terms increases the risk that the query will yield no results or no relevant results. The decline in the number of abstracts viewed when more than 5 terms are used can be explained by this phenomenon. The question is whether the optimum of 4 or 5 terms in a query can be wholly attributed to the number of articles retrieved by a query. As both term count and number of articles retrieved affect the viewing of abstracts, one factor cannot be attributed entirely to the other.
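The effect of reviewing only the top of a date-sorted list can be illustrated with a simple calculation. The sketch below is not part of our analysis and rests on a strong simplifying assumption: the relevant articles are scattered uniformly at random through the result list, that is, relevance is unrelated to publication date. Under that assumption the chance that at least one relevant article falls within the titles actually viewed follows the hypergeometric distribution. The function name and the example numbers are hypothetical.

```python
from math import comb


def chance_of_seeing_relevant(total, relevant, viewed):
    """Probability that at least one relevant article is among the first
    `viewed` titles of a list of `total` articles.

    Assumes (for illustration only) that the `relevant` articles are
    placed uniformly at random in the list, as when the sort order is
    unrelated to relevance.
    """
    viewed = min(viewed, total)
    if relevant <= 0 or viewed <= 0:
        return 0.0
    # P(none seen) = C(total - relevant, viewed) / C(total, viewed)
    return 1.0 - comb(total - relevant, viewed) / comb(total, viewed)


# One relevant article, ten titles viewed, result lists of growing size.
for total in (20, 161, 1000):
    p = chance_of_seeing_relevant(total, relevant=1, viewed=10)
    print(f"{total:5d} articles retrieved: {p:.3f}")
```

With a single relevant article the probability reduces to the fraction of the list that is viewed, which shows why long date-sorted result lists rarely lead to abstract selection.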
The query in relation to the search process
We investigated single queries, but the entire search process usually consists of sequential steps that should lead to an answer. After a PubMed query retrieves a set of articles the searcher may choose to evaluate a certain percentage of the abstracts and full-texts, but may also decide to refine the query. If the result is too large the query may be refined using hedges or more terms. If the result is too small the searcher may choose to remove terms that are too specific or expand terms with the OR operator. The effects of these different measures are difficult to predict, especially if several options are combined. It is not surprising that using more relevant terms in a query will lead to fewer articles in the result, increasing the chance of article evaluation. The important finding is that queries of four or five terms, and results of fewer than 161 articles, were optimal. A previous study, describing the implementation of a Medline search tool for handhelds in a clinical setting, reported optimal values for term count and retrieved articles comparable to our results. Knowing the optimal values can help in the design of search interfaces that promote the use of multiple terms in a query and the use of search tools, and that also aim for an optimal number of retrieved articles. Presenting the first ten unsorted results of several thousand articles is not useful for searching physicians. Analysis of queries that did not retrieve a sensible number of articles can help to guide the physician to increase the accuracy of the query, thus increasing the chance of retrieving a reasonable number of articles.
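A search interface could turn these observations into simple feedback after each query. The following sketch is only one possible reading of the refinement steps described above. The function name, the returned messages and the choice to treat an empty result as "too small" are assumptions made for illustration; the thresholds of 161 articles and four terms are the values observed in our population and may differ elsewhere.

```python
def suggest_refinement(term_count, articles_retrieved,
                       max_articles=161, min_terms=4):
    """Suggest a next step for a query, given its size and its result.

    Hypothetical helper: thresholds default to the values found in this
    study; "too small" is taken to mean an empty result (an assumption).
    """
    if articles_retrieved == 0:
        # Result too small: broaden the query.
        return "broaden: remove overly specific terms or expand terms with OR"
    if articles_retrieved > max_articles:
        # Result too large: narrow the query.
        if term_count < min_terms:
            return "narrow: add terms that describe the question"
        return "narrow: apply limits or hedges"
    return "evaluate: scan the titles and abstracts"


# Example: a two-term query that retrieved 2400 articles.
print(suggest_refinement(term_count=2, articles_retrieved=2400))
```

The sketch checks the result size before the term count because, in our data, both factors affected abstract selection independently.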
We observed Dutch physicians. As English is not their native language they may have used erroneous terms, which is likely to result in more queries with no articles in the result.
A possible source of error is that PubMed is our default database for searching. If a physician entered a query for UpToDate but forgot to select UpToDate as the search database, the query was sent to PubMed. Sending a query containing one term to other databases is usually sufficient, so the number of single term queries sent to PubMed might have been overestimated.
Our observation that the effect of using MeSH and limits is comparable to that of using adequate terms in a query is consistent with previous research [27, 28].
We treated all queries as single entities and did not focus on the process of refining them. There is no way that a previous query can influence the articles retrieved by the next, so it cannot influence the next query result. Article selection might depend on experience from previous queries. Articles that were scanned in the first query will not be scanned in the second regardless of relevance to the question, so selection of articles in previous queries is not likely to result in bias.
Because we observed the natural behaviour of physicians in a very specific setting, our results are likely to be influenced by many factors, and different results may be obtained in different settings, limiting their generalizability.