Performing a systematic review is a complex and time-consuming task because of the size of the body of literature to be searched and the number of databases that must be consulted, given that none of them is considered exhaustive. The use of GS is increasing, as is its coverage, and we wanted to assess whether this coverage is high enough for GS to be used alone in systematic reviews.
GS retrieved 100% of the studies included in the systematic reviews we studied, which covered many different fields of medicine.
Although GS does not cover all the medical literature, we observed that its coverage of the studies of sufficient quality or relevance to be included in a systematic review was complete. In other words, if the authors of these 29 systematic reviews had used only GS, they would have been able to retrieve the very same set of studies.
The validity of our gold standard database could nevertheless be questioned. To identify the studies worth including in a systematic review, we relied on the work of the experts who served as reviewers in the systematic reviews we included, all of which used at least 2 independent reviewers. Furthermore, we excluded from our gold standard database personal communications, because they cannot be retrieved from any database, and abstracts, because it has been clearly demonstrated that abstracts often report results that are not valid [21, 22]. Considering the methods used by the authors of the systematic reviews we selected (at least 2 independent reviewers to select relevant articles, a high number of databases searched, and no restriction to English-language studies), we can also assume that, for each topic covered, all the relevant studies were identified. Therefore, we can assume that our gold standard database included all the studies of sufficient quality and relevance to the topics covered by the systematic reviews, and only those studies.
We chose to study the systematic reviews published by JAMA and Cochrane because they usually do not restrict their search to the English-language literature and they use more than one database to perform the search, which is not the case for most of the systematic reviews published by the Annals of Internal Medicine, for example.
Although the recall of GS was 100%, the amount of information delivered by GS was heterogeneous. Some of the studies were identified only as "citations", which means that GS displayed only the authors, the title of the article, and the name, year and pages of the journal. This can be considered insufficient, but traditional biomedical databases (such as Medline or Embase) do the same for old articles or for articles published in a language other than English. Furthermore, the situation is exactly the same when authors of systematic reviews hand-search the reference lists of selected articles. Therefore, we considered it valid to count these hits as positive results.
This 100% coverage by GS may seem surprising, since no single database is supposed to be exhaustive, even for good-quality studies. For example, the recall ratios of Medline for randomized controlled trials (RCTs) range only between 35% and 56% [23, 24]. Since GS accesses only about 1 million of the roughly 15 million records in PubMed, how can our results be explained? In fact, through agreements with publishers, GS accesses the “invisible” or “deep” Web, that is, commercial Web sites that the automated “spiders” used by search engines such as Google cannot access. Furthermore, we observed in our study that most of the articles identified by GS were found directly on the web sites of the publishing journals, and not on the PubMed web site.
Nevertheless, while its advantages are substantial, GS is not without flaws. The shortcomings of the system and its search interface have been well documented in the literature and include a lack of reliable advanced search functions (e.g. no MeSH term subheading search function), a lack of controlled vocabulary, the lack of a “similar pages” feature, and issues regarding scope of coverage and currency [4, 5, 25]. Furthermore, whereas PubMed displays results in chronological order, GS gives more weight to the articles that are cited most often. Therefore, the citations located are reportedly biased toward older literature [26, 27]. This last point can also be viewed as an advantage, since it makes it possible to quickly identify landmark articles, i.e. articles of importance in a field. Moreover, when comparing searches in PubMed and Google Scholar by evaluating the relevance and quality of the first 20 articles retrieved for four clinical questions, Nourbakhsh et al. reported that GS provided more relevant results than PubMed, although the difference was not significant (p=0.116).
GS has been reported to be less precise than PubMed, since it retrieves hundreds or thousands of documents, most of them irrelevant [29, 30]. Nevertheless, we should not overestimate the precision of PubMed in real life, since the precision and recall of a database search are highly dependent on the skills of the user. Many users overestimate the quality of their searching performance, and experienced reference librarians typically retrieve about twice as many citations as less experienced users do [31, 32].
Although this was not the purpose of our study, we tried to assess the precision of GS for some of the clinical questions that were studied by the systematic reviews.
For example, searching for "(Erythropoietin or Darbepoetin) cancer" in GS gave a recall of 100% and a precision of 0.1% (36,630 articles found, of which 36 were included in the systematic review). Similarly, the search string "(depression treatment placebo antidepressant) ("general practice" OR "Primary care")" identified 16,100 articles in GS, leading to a recall of 100% and a precision of 0.09% (14 articles included in the corresponding systematic review).
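The arithmetic behind these two figures can be made explicit with a minimal sketch. It assumes that every study included in the corresponding review was among the GS hits (recall of 100%), and that precision is the number of included studies divided by the number of articles retrieved. The function names and labels are illustrative only and are not part of the study's methods.

```python
def recall(relevant_retrieved: int, relevant_total: int) -> float:
    """Share of the relevant (included) studies that the search retrieved."""
    return relevant_retrieved / relevant_total


def precision(relevant_retrieved: int, retrieved_total: int) -> float:
    """Share of the retrieved articles that were relevant (included)."""
    return relevant_retrieved / retrieved_total


# (included studies found by GS, included studies in the review, GS hits)
# Counts are taken from the two example searches reported in the text.
searches = {
    "erythropoietin/darbepoetin in cancer": (36, 36, 36_630),
    "antidepressants in primary care": (14, 14, 16_100),
}

for label, (found, included, retrieved) in searches.items():
    r = recall(found, included)
    p = precision(found, retrieved)
    print(f"{label}: recall = {r:.0%}, precision = {p:.2%}")
```

With these counts, the precision comes to roughly 0.10% for the first search and 0.09% for the second, that is, about one relevant article per thousand hits in both cases.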