Skip to main content

HealthRecSys: A semantic content-based recommender system to complement health videos



The Internet, and its popularity, continues to grow at an unprecedented pace. Watching videos online is very popular; it is estimated that 500 h of video are uploaded onto YouTube, a video-sharing service, every minute and that, by 2019, video formats will comprise more than 80% of Internet traffic. Health-related videos are very popular on YouTube, but their quality is always a matter of concern. One approach to enhancing the quality of online videos is to provide additional educational health content, such as websites, to support health consumers. This study investigates the feasibility of building a content-based recommender system that links health consumers to reputable health educational websites from MedlinePlus for a given health video from YouTube.


The dataset for this study includes a collection of health-related videos and their available metadata. Semantic technologies (such as SNOMED-CT and Bio-ontology) were used to recommend health websites from MedlinePlus. A total of 26 healths professionals participated in evaluating 253 recommended links for a total of 53 videos about general health, hypertension, or diabetes. The relevance of the recommended health websites from MedlinePlus to the videos was measured using information retrieval metrics such as the normalized discounted cumulative gain and precision at K.


The majority of websites recommended by our system for health videos were relevant, based on ratings by health professionals. The normalized discounted cumulative gain was between 46% and 90% for the different topics.


Our study demonstrates the feasibility of using a semantic content-based recommender system to enrich YouTube health videos. Evaluation with end-users, in addition to healthcare professionals, will be required to identify the acceptance of these recommendations in a nonsimulated information-seeking context.

Peer Review reports


Recent studies have shown an increasing trend in the use of the Internet as a search tool for health-related information [1,2,3]. Web 2.0 [4] allows contributions from any user in a network, which has given rise to a wealth of health-related information with a wide range of co-existing trustworthy sources [5, 6]. For this reason, screening tools can assist users in selecting relevant information.

Recommender systems are among the many solutions used to obtain valid information. When searching for an item, users obtain a list of recommended results that may match their preferences. Various filtering methods make it possible to refine and tailor these recommendations [7, 8]. Recommender systems can be divided into three basic groups: collaborative, context-based, and hybrid systems. Collaborative systems build on experience gathered from previous user experiences, i.e., items previously chosen by other users shape future results [9]. Context-based systems focus on the characteristics of an item, i.e., when searching for a camera, the recommendation output is based on its resolution, price, and color. Hybrid recommender systems combine features of context-based and collaborative systems [10]. Recommender systems can be also used to give additional item recommendations for a given item, such as the related videos that are shown by YouTube next to the user’s current video. These recommendations often relay user ratings, but can also be based on knowledge-based systems.

Recommender systems have been used in several applications for finding accurate information. They were introduced as a computer-based intelligent technique that assists people with the problem of information overload. These systems provide personalized solutions in various specific domains [11,12,13]. Recommender systems reflect the user’s interest and make proper personalized recommendation through several methods. Most current systems have adopted recently developed algorithms that use machine-learning [14,15,16], naive Bayes [16, 17], social-trust-based [18,19,20,21], constraint-based [22], case-based [23, 24], and matrix factorization [25, 26] approaches. Recommender systems are also found in clinical settings, mainly to assist health professionals, though some systems assist family members, patients, or caregivers [27,28,29].

Recent advancements in online recommender systems are enhanced by the “Semantic Web” [30], which allows for the extraction of vast amounts of information through metadata mining and artificial intelligence techniques [31]. Using these techniques, it is possible to rank and classify items based on terms that encompass several properties grouped into ontologies [32]. In the life sciences, ontologies play an important role in filtering relevant item and creating knowledge-based systems. Knowledge-based, cased-based, and social-trust-based approaches utilize user metadata, such as age and gender, to define recommendation rules. Machine-learning and naïve Bayes methods create models to learn users’ interests from their historical behavior. Matrix factorization learns a user’s latest interests by collaboratively factoring the rating matrix over historically recorded user-item preferences.

Health terms are also grouped into ontologies, creating an important potential resource for many applications, including recommender systems. Health ontologies usually have an application-programming interface (API) to precisely define their operation. One example of an APIFootnote 1 is Bio-ontology,Footnote 2 which contains more than 600 health-related ontologies. Using Bio-ontology, Rivero-Rodriguez et al. recommended relevant links for a subset of health-related YouTube videos [33] by extracting corresponding clinical terms from the Medline Plus API for the International Health Terminology Standards Development Organization, which maintains SNOMED-CT, a multilingual clinical healthcare ontology.Footnote 3

Our previous work

This study is based on our previous work. Fernandez-Luque et al. reused algorithms from [33], but added the Bio-ontology API to improve the results for obtaining links from Medline Plus. In this study, we also rely on diabetes videos [34] for which we have already explored the use of semantic technologies to provide additional content recommendations [35]. Based on [33, 34], the proposed method gathers recommendations for Medline Plus links (see Fig. 1) from video subtitles to increase the number of associated terms using health ontologies. An additional movie file shows this in more detail [see Additional file 1]. An important limitation, both in the current and previous recommender systems, stems from the difficulty of mapping suitable terms to the ontology, especially when extracting representative terms from video content. One interesting approach to this problem uses natural language processing (NLP) [36,37,38] techniques, which can combine syntactic, semantic, and contextual analyses. NLP has previously been used in healthcare [39, 40], especially for mining electronic health records [41].

Fig. 1
figure 1

HealthRecSys Extraction of Medical Terms for Videos. Structure and logic of the extraction of medical terms and Medline Plus links for diabetes videos

HealthRecSys Study Overview. Video describing the HealthRecSys algorithm and the results of the study.


In online browsing, it is common to search for content related to online material currently being viewed. For example, after watching a video on YouTube, the watcher might look for additional content as part of an information seeking strategy. This search strategy has led to the creation of recommender systems that provide recommendations for related content. In this study, we explore the feasibility of recommending links to health educational content as a supplement to online health videos, focusing on recommendation methods that use semantic-based technologies to enhance online health content recommender systems. Further, this study investigates website recommendations that will enhance health videos, because video formats have shown the fastest growth on the Internet and it is estimated that, by 2019, video will constitute more than 80% of Internet traffic. Footnote 4


In this study, we introduce HealthRecSys, a recommender system with Bio-ontology terms that generates Medline Plus links from text extracted from the metadata of selected YouTube videos (see Fig. 1).

Our recommender system involves several steps: A) collecting filtered words from the title of a video, B) collecting any one SNOMED-CT term from the title, C) collecting a group of SNOMED-CT terms from the title, and D) determining the union of the results of steps B and C. Step A uses a “stop word” filtering system (i.e., that avoids preposition, adverbs, and similar terms), and steps B and C are combined with SNOMED-CT to extract web links from MedlinePlus.

Algorithm design

We selected keywords (or terms) from video metadata (i.e., the video title, description, and subtitles). These keywords are used to identify semantic terms from Medline Plus. Fig. 1 shows the Term Extraction process for diabetes videos from YouTube. The algorithm contains two steps:

  • Source term collection: the video title, description, and subtitle are collected as possible terms.

  • NLP: In this case, NLP is applied to the title description and video subtitles using the cTAKES framework.Footnote 5 This is a health-specific NLP implementation that extracts SNOMED-CT health terms from text. See Fig. 2 for an example of extracted metada of a video.

    Fig. 2
    figure 2

    cTakes XML Example with Video Metadata. Example of XML source code from the cTakes result for a video related to blood cells

We conducted a text analysis using the Unified Medical Language System (UMLS)Footnote 6 with SNOMED-CT annotations to match the cTAKES framework. To achieve this, we inject the original video metadata files (with title, description, and subtitle) procedures from the UMLS library, resulting in an XML file that contains a morphological, syntactic, and semantic analysis.

From this file, we filtered the UmlConcept labels that contain collected terms from the SNOMED-CT ontology properties. For instance, Fig. 2 shows example XML for the terms Blood, Entire Cell, and Cells. The cTAKES configuration uses the standard pipeline AggregatePlaintextFastUMLSProcessor to extract the SNOMED-CT terms.

Fig. 3
figure 3

Web Form for Raters. Example screenshot of the video and rating system presented to raters. (Video source:

To work with the UMLS library, we used a profile license.Footnote 7 Appendix 1 shows the configuration used to run the cTAKES execution.

Once the SNOMED-CT terms are extracted, we cross-match them with the terms from the Bio-ontology API to find synonymous MedlinePlus terms. These outputs allow us to obtain a web link from the MP_HEALTH_TOPIC_URL MedlinePlus property, which is obtained via a Representational state transfer (REST) endpoint from the associated extracted term, which allows us to provide trusted recommendations to end users. For instance, the example terms Blood and Stem Cell both have corresponding Medline Plus links,Footnote 8 . Footnote 9

Given that the number of SNOMED-CT vocabulary terms is larger than those on MedlinePlus, we anticipated that many results would not have matching terms. Although Bio-ontology offers an Annotator Web service that annotates user-provided text (e.g., journal abstracts) with relevant ontology concepts, this feature was not used for this work.

For practical reasons, we ignored isolated terms from SNOMED-CT that did not have a Medline Plus match. Although it is possible to select other ontologies to find a corresponding Medline Plus term, in this paper, we focus on results obtained only with these two ontologies.

Datasets of videos and raters

We assigned 26 health professionals (raters) to the three set of videos divided by topic (general medicine, diabetes, or hypertension). We recruited these healthcare professionals directly either by email or other means, based on their familiarity with health topics and online health. After explaining to them the goals of the project and acquiring informed consent, the raters were asked to determine if the recommended links for a given video were relevant for the video topic. The exercise of rating the recommendations was not based on any personal information from the participants, but rather their expert opinion of a web tool (see Figs. 3 and 4). As such, this research does not involve human subjects (the study does not obtain information about living individuals).

Fig. 4
figure 4

Juvenile Diabetes Research Foundation Video. Example diabetes video from the Diabetes Research Foundation and links extracted from MedlinePlus. (Video source:

Our dataset contained 53 videos, some of which had been utilized in our previous research [33]: a) 10 general medical videos (i.e., general health-related videos extracted from hospital YouTube channels), b) 22 videos about diabetes, and c) 21 videos about hypertension.

To rate the relevance of the videos and recommended links, we used Cohen’s kappa to determine the level of agreement between two given raters. Kappa is defined as follows [42]:

$$ k=\frac{Pr(a)- Pr(e)}{1- Pr(e)}, $$

where Pr(a) is the relative observed agreement and Pr(e) is the hypothetical chance of agreement. Therefore, this formula calculates the ratio of observed agreement to hypothetical agreement by chance. If the raters are in complete agreement, then k = 1. A k coefficient greater than 0.80. indicates good agreement for a given recommendation.

Cohen's kappa was calculated using the irr package of the R application (version 3.3.1 on linux-gnu). The method in question is kappa2(ratings, “unweighted”). This function includes the vector of the rater values.

For each category of videos (general medical, diabetes, and hypertension), we selected a pair of reviewers with a high level of inter-rater agreement, based on Cohen’s kappa, to have consistent rater agreement. The pair of raters had a Cohen’s kappa inter-rater agreement of 0.626 for the general medical videos (z = 4.35, p-value = 1.33 × 10−05), 0.582 for diabetes (z = 6.47, p-value = 9.9 × 10−11), and 0.717 for hypertension (z = 7.7, p-value = 1.31 × 10−14).

In the next step, we selected videos and links with an acceptable level of inter-rater agreement based on the Cohen’s kappa values. Using the algorithm described in the previous section, we generated 510 recommended MedlinePlus links, but evaluated only the first five recommendations for each video, as our recommender system limits the number of recommendations. The final dataset contained 10 general medical videos with 48 recommended links, 22 diabetes videos with 102 recommended links, and 21 hypertension videos with 103 recommended links.

This approach allowed us to focus the evaluation on videos and links for which there was a homogenous agreement level among professionals. The rationale of this approach is relayed in our previous research, which highlighted the lack of consensus between professionals on certain types of health videos [43].

Evaluation metrics of the recommendations

We used two metrics to evaluate the relevance of the recommended links for a given video. These metrics, precision at k [44] and normalized discounted cumulative gain [45], are widely used to evaluate search algorithms in information retrieval and indicate the relevance of the “top” retrieved results. The importance of focusing on the top retrieved results is based on the web browsing behavior of users, as they tend to focus only on the top few item suggestions.

Precision at k

Precision (also called positive predictive value) is the fraction of retrieved instances that are relevant, in our case, this is the relevance of the links recommended for a given video. Precision is calculated as

$$ \mathrm{Precision}=\frac{\left|\mathrm{Trusted}\ \mathrm{Recommendations}\left|\ {\displaystyle \cap}\right|\mathrm{Recovered}\ \mathrm{Recommendations}\right|}{\left|\mathrm{Recovered}\ \mathrm{Recommendations}\right|}. $$

The precision at k (P@k) [46, 47] accounts for the order of the returned recommendations and is calculated as the fraction of the first k accepted links to all k links.

Normalized Discounted Cumulative Gain

The normalized discounted cumulative gain (nDCG) is another common information retrieval metric [45]. It is a measure of ranking quality, where DCGk are highly relevant documents appearing lower in a search result and the ideal discounted cumulative gain (iDCGk) is the DCG of the vector with all links with an accepted value:

$$ n D C{G}_k = \frac{DC{G}_k}{iDC{G}_k}\ . $$


To evaluate each recommendation, we considered two scenarios: a) robust and b) moderate. In the robust scenario, we consider as relevant only those link recommendations that are supported by both raters. In the moderate scenario, we consider a link to be relevant if at least one rater agreed with the recommendation. The moderate case is most appropriate when the risk of misinformation is low, while the robust scenario is the most appropriate when there is greater potential to spread misinformation.

In these scenarios, P@k and nDCGk were calculated as follows. The relevance of the k first link recommendations is calculated as follows for each recommended link j (1 ≤ j ≤ k):

(a) If both raters approve link j, it is accepted (its value is 1, or relevant).

(b) If both raters do not approve link j, it is rejected (its value is 0, or irrelevant).

(c) In the case in which one rater approves link j and the other rejects it, in the robust scenario, link j is considered irrelevant (value 0), whereas in the moderate scenario, it is considered relevant (value 1).

The P@k results are shown in Table 1 and nDCG results are showed in Table 2. Overall, the performance of the recommender system was higher when giving recommendations for the general medicine and diabetes videos.

Table 1 Mean precision @ K recommended links
Table 2 Mean nDCG for K recommended links


The results show that it is feasible to recommend relevant links for health videos using a semantic-based recommender system. However, there are several concerns that deserve special attention. Although positive overall, recommendation performance varied across the different topics used in this study, which could be due several factors. For example, there might be fewer links related to diabetes than other topics (e.g., hypertension), thus limiting the potential items that can be recommended. Further, our semantic-based approach might also suffer from the semantic-gap between the layperson’s language and a medical thesaurus. Although work has been done to develop a Consumer Health Vocabulary, this has not been implemented in our approach; additionally, the semantic gap may differ across health topics [48].

In contrast, our approach of using semantics to identify relevant links allows the algorithms to find links that are related to synonyms and disambiguation. Still, this poses some additional challenges. For example, in a video titled Juvenile Diabetes Research Foundation − Cure Video – Dalas, Footnote 10 our algorithm extracted the term “shots,” which resulted in a recommendation for a link regarding the importance of vaccination (a topic of relative importance in diabetes). One advantage of relying on medical terms is that our algorithm has an enhanced capability to reduce the number of links that have no relation to the video content, which is an important limitation of previous studies, where such terms could not be avoided [33, 34].

Recommender systems can play a major role, not only in education, but also in supporting behavioral changes for a wide range of health conditions [49,50,51], including smoking cessation [51]. In such cases, the recommendations are not only chosen with regard to content, but also with respect to timing, and consider different psychological health factors (aka user context) [52]. Our work does not address context-awareness regarding the time and place of the recommendations. However, by providing trustworthy recommendations for websites when a user is watching a video, we can support complex health information seeking [53, 54].

The applications of recommender systems in the health domain are still emerging. Therefore, we lack common evaluation methods that can allow us to compare work across separate studies in this topic [29, 55]. There are examples in the literature of recommender systems in the health domain that, for example, provide recommendations based on a personal health record [56]. In our case, we deal with a very different type of content-based recommendation, as we are not recommending content for a given user but rather for a given health educational item.

Our work is aligned with previous studies in which health information is enriched with additional content [56]. There is still quite a substantial knowledge gap on how people search for online health information, and, even more importantly, on how that affects the health behaviors of the information seeker [57]. Our recommender system approach does not aim to provide recommendations personalized for a user, but rather to provide further reliable information for users watching a health video. This content-based recommendation approach is crucial for supporting the current patterns of health consumers looking for multiple sources when searching for health information online [58].

Most previous studies of health recommender systems do not address their impact on health outcomes; in contrast, we do so using information retrieval accuracy metrics. This approach has the potential to create risks for health consumers, which is one of our motivations for using health professionals in this evaluation. Ekstrand et al. recently reviewed potential ways in which health recommender systems can do harm and the ways to minimize potential harm [59]. Giving wrong or potentially misleading health information can be a cause for serious concern; for example, recently, the FDA forced the company 23andMe to remove and edit personalized health information regarding genetic health risks [60]. Further, health information can be used for unhealthy purposes (e.g., the abuse of diuretics for weight loss is common in people with eating disorders).


Our study relies on the ratings of hundreds of recommended links for given videos. However, these ratings were given by healthcare professionals and not health consumers. As explained in our previous work, professionals and consumers often disagree on the relevance of health content [43]. Experiments with health consumers will be required to further evaluate recommendation quality.

Note that our study only investigates the feasibility of this approach. Consequently, extrapolating the results to larger studies is necessary. Ideally, further studies will consider more users (and not necessarily healthcare professionals). In addition, our rating approach was rather simplistic, considering the multiple quality dimensions of health videos [35]. Further, the ideal evaluation should take place in a real information seeking scenario and not a simulated one because many factors affect information seeking by health consumers, including stress or literacy levels [53]. The patient perspective was not explored in this study because we consider it to be more ethically appropriate to first study the feasibility of an approach with health experts. Patients’ perspectives and acceptance can also vary substantially across age, health literacy levels, and other factors. Future research will need to explore the application of our method in a patient portal with additional content and users.

Another limitation of our study is that our video dataset is not generalizable. We selected several topics of high importance (diabetes and hypertension), but we cannot extrapolate that our approach will work with other health topics. A major challenge to generalizing semantic-based approaches such as ours is the gap between medical and consumer health vocabularies [61]. Because we use content generated by health organizations (not individuals) and a medical ontology, we might expect more difficulties when recommending links to consumer-generated content.


This study demonstrated that a semantic-based recommender algorithm can provide relevant education health websites as further reading for a given health video. The relevance of websites recommended by our system decreased as we provided more recommendations, but HealthRecSys still performed well with up to five recommended links per video. Because user browsing behavior is often limited to a few items, this does not pose a serious limitation. Conversely, our approach can reduce the burden of health consumers when searching for reliable additional health educational content. Further, the speed of navigation to a reliable source, as identified by Strauss, is an important factor in information seeking [62].

Future improvements to recommender systems will incorporate more semantic analytics and perhaps be able to determine the patient’s context (i.e., mood) to make better recommendations. It will be possible to use this algorithm to recommend content and videos to counterbalance misinformation, find information on controversial topics, and filter out videos with little scientific acceptance. For instance, a video that promotes steroid consumption could recommend information alerting the individual to their potential negative effects.


  1. Bio-ontology API endpoint documentation

  2. Bio-ontology website

  3. SNOMED-CT website


  5. cTAKES website

  6. UMLS website

  7. UMLS web license profile https://uts.nlm.nih.giv//uts.htmlprofile

  8. Blood:

  9. Cell:

  10. Juvenile Diabetes Research Foundation



Application-programming interface


Natural language processing


Unified Medical Language System


  1. Griffis HM, Kilaru AS, Werner RM, Asch DA, Hershey JC, Hill S, Ha YP, Sellers A, Mahoney K, Merchant RM. Use of social media across US hospitals: descriptive analysis of adoption and utilization. J Med Internet Res. 2014;16(11):e264. doi:10.2196/jmir.3758.

    Article  PubMed  PubMed Central  Google Scholar 

  2. Ziebland S, Wyke S. Health and illness in a connected world: how might sharing experiences on the internet affect people’s health? Milbank Q. 2012;90:219–49.

    Article  PubMed  PubMed Central  Google Scholar 

  3. Jamal A, Khan SA, AlHumud A, Al-Duhyyim A, Alrashed M, Bin Shabr F, et al. Association of online health information-seeking behavior and self-care activities among type 2 diabetic patients in Saudi Arabia. J Med Internet Res. 2015;17:e196.

    Article  PubMed  PubMed Central  Google Scholar 

  4. Lewis D. What is Web 2.0? Crossroads. 2006;13:3.

    Article  Google Scholar 

  5. Zeng X, Parmanto B. Web content accessibility of consumer health information web sites for people with disabilities: a cross sectional evaluation. J Med Internet Res. 2004;6:e19.

    Article  PubMed  PubMed Central  Google Scholar 

  6. Cline RJW, Haynes KM. Consumer health information seeking on the Internet: the state of the art. Health Educ Res. 2001;16:671–92.

    Article  CAS  PubMed  Google Scholar 

  7. Fernandez-Luque L, Karlsen R, Vognild LK. Challenges and opportunities of using recommender systems for personalized health education. In Stud Health Technol Inform. 2009;150:903–7. doi:10.3233/978-1-60750-044-5-903.

    Google Scholar 

  8. Ponce V, Deschamps JP, Giroux LP, Salehi F, Abdulrazak B. QueFaire: Context-aware in-person social activity recommendation system for active aging. In: Geissbühler A, Demongeot J, Mokhtari M, Abdulrazak B, Aloulou H, editors. Inclusive Smart Cities and e-Health. Springer International Publishing; 2015. p. 64–75.

  9. Adomavicius G, Tuzhilin A. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE T Knowl Data Eng. 2005;17:734–49.

    Article  Google Scholar 

  10. Sanchez-Bocanegra CL, Sanchez-Laguna F, Sevillano JL. Introduction on health recommender systems. Methods Mol Biol (Clifton NJ). 2015;1246:131–46.

    Article  CAS  Google Scholar 

  11. Xavier A. Past, Present, and Future of Recommender System. In: Proceedings of the 21st International Conference on Intelligent User Interfaces, ACM 2016.

  12. Breese JS, Heckerman D, Kadie C. Empirical analysis of predictive algorithms for collaborative filtering. In: UAI'98 Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, San Francisco, CA, USA. 1998. p. 43–52.

  13. Elahi M, Ricci F, Rubens N. A survey of active learning in collaborative filtering recommender systems. Computer Science Review. 2016;20:29–50.

    Article  Google Scholar 

  14. Gu Y, Zhao B, Hardtke D, Sun Y. Learning Global Term Weights for Content-based Recommender Systems. In: International World Wide Web Conference Committee (IW3C2). Canada: ACM; 2016.

    Google Scholar 

  15. Zhang H-R, Min F. Three-way recommender systems based on random forests. Knowl-Based Syst. 2016;91:275–86.

    Article  Google Scholar 

  16. Li Y, Zheng Y, Kang J, Bao H. Designing a Learning Recommender System by Incorporating Resource Association Analysis and Social Interaction Computing. In: Li Y, Chang M, Kravcik M, Popescu E, Huang R, Kinshuk, Chen N-S, editors. Springer State-of-the-Art and Future Directions of Smart Learning. Singapore: Springer; 2015. p. 137–43.

    Google Scholar 

  17. Campos L, Fernández-Luna J, Huete J, Rueda-Morales M. Combining content-based and collaborative recommendations: A hybrid approach based on Bayesian networks. Int J Approx Reason. 2010;51:785–99.

    Article  Google Scholar 

  18. Pham T-N, Vuong T-H, Thai T-H, Tran M-V, Ha Q-T. Sentiment Analysis and User Similarity for Social Recommender System: An Experimental Study. Information Science and Applications (ICISA). 2016;376:1147–56.

    Google Scholar 

  19. Aggarwal C. Social and Trust-Centric Recommender Systems. Recommender Systems. 2016. p. 345–84.

    Google Scholar 

  20. Doerfel S, Jäschke R, Stumme G. The Role of Cores in Recommender Benchmarking for Social Bookmarking Systems. ACM Transactions on Intelligent Systems and Technology (TIST). 2016;7:1–33.

    Article  Google Scholar 

  21. Jiang W, Wang G, Bhuiyan MZ, Wu J. Understanding Graph-Based Trust Evaluation in Online Social Networks. ACM Computing Surveys (CSUR). 2016;49:1–35.

    Article  Google Scholar 

  22. Felferning A, Friedrich G, Jannach D, Zanker M. Constraint-based recommender systems. Recommender Systems Handbook 2016;161–90. doi:10.1007/978-1-4899-7637-6_5.

  23. Bridge D, Goker M, McGinty L, Smyth B. Case-based recommender systems. Knowl Eng Rev. 2005;20(3):315–20.

    Article  Google Scholar 

  24. Garrido A, Morales L, Serina I. On the use of case-based planning for e-learning personalization. Expert Systems with Applications. 2016;60:1–15.

    Article  Google Scholar 

  25. Hernando A, Bobadilla J, Ortega F. A non-negative matrix factorization for collaborative filtering recommender systems based on a Bayesian probabilistic model. Knowl-Based Syst. 2016;97:188–202.

    Article  Google Scholar 

  26. Ioannidis E, Weinsberg E, Taft NA, Joye M, Nikolaenko V, inventors; Thomson Licensing, assignee. A method and system for privacy-preserving recommendation to rating contributing users based on matrix factorization. United States patent US 20,160,012,238. 24 Jan 2016.

  27. Espín V, Hurtado MV, Noguera M. Nutrition for elder care: A nutritional semantic recommender system for the elderly. Expert Syst. 2016;33:201–10.

    Article  Google Scholar 

  28. Borbolla D, Del Fiol G, Taliercio V, Otero C, Campos F, Martinez M, Luna D, Quiros F. Integrating personalized health information from MedlinePlus in a patient portal. Stud Health Technol Inform. 2014;205:348–52.

    PubMed  Google Scholar 

  29. Valdez AC Ziefle M, Verbert K, Felfernig A, Holzinger A. Recommender systems for health informatics: stateof-the-art and future perspectives. Machine Learning for Health Informatics. Springer International Publishing, 2016, p. 391–414.

  30. Berendt B, Hotho A, Stumme G. Towards semantic Web mining. In: Horrocks I, Hendler J, editors. The Semantic Web—ISWC 2002. Heidelberg: Springer; 2002. p. 264–78.

    Chapter  Google Scholar 

  31. Berners-Lee T, Hendler J. Publishing on the semantic web. Nature. 2001;410(6832):1023–4.

    Article  CAS  PubMed  Google Scholar 

  32. Yu HQ, Zhao X, Deng Z, Dong F. Ontology driven personal health knowledge discovery. In: Uden L, Heričko M, Ting I-H, editors. Knowledge Management in Organizations. Springer International Publishing; 2015. p. 649–63.

  33. Rivero-Rodriguez A, Konstantinidis ST, Sanchez-Bocanegra CL, Fernandez-Luque L. A health information recommender system: enriching YouTube health videos with MedlinePlus information by the use of SNOMEDCT terms. In: Proceedings of the 26th IEEE International Symposium on Computer-Based Medical Systems, Porto, Portugal, June. IEEE. 2013. p. 257–61.

    Chapter  Google Scholar 

  34. Sánchez-Bocanegra CL, Rivero-Rodriguez A, Fernández-Luque L, Sevillano JL. Diavideos: A diabetes health video portal. Stud Health Technol Inform. 2013;2(1):e6.

  35. Gabarron E, Fernandez-Luque L, Armayones M, Lau AY. Identifying measures used for assessing quality of YouTube videos with patient health information: a review of current literature. Interact J Med Res. 2013;2(1):e6. doi:10.2196/ijmr.2465.

  36. Nadkarni PM, Ohno-Machado L, Chapman WW. Natural language processing: an introduction. J Am Med Inform Assoc. 2011;18(5):544–51. doi:10.1136/amiajnl-2011-000464.

    Article  PubMed  PubMed Central  Google Scholar 

  37. Alexander C, Chris F, Shalom L. The Handbook of Computational Linguistics and Natural Language Processing. USA: Wiley-Blackwell; 2010. ISBN 978-1-4051-5581-6.

    Google Scholar 

  38. Xu H, Jiang M, Oetjens M, Bowton EA, Ramirez AH, Jeff JM, et al. Facilitating pharmacogenetic studies using electronic health records and natural-language processing: A case study of warfarin. J Am Med Inform Assoc. 2011;18:387–91.

    Article  PubMed  PubMed Central  Google Scholar 

  39. Jensen PB, Jensen LJ, Brunak S. Mining electronic health records: towards better research applications and clinical care. Nat Rev Genet. 2012;13:395–405.

    Article  CAS  PubMed  Google Scholar 

  40. Uzuner O, Stubbs A. Practical applications for natural language processing in clinical research: The 2014 i2b2/UTHealth shared tasks. J Biomed Inform. 2015;58:S1–5.

    Article  PubMed  PubMed Central  Google Scholar 

  41. Wang Y, Luo J, Hao S, Xu H, Shin AY, Jin B, et al. NLP based congestive heart failure case finding: a prospective analysis on statewide electronic medical records. Int J Med Inf. 2015;84:1039–47.

    Article  Google Scholar 

  42. Sim J, Wright CC. The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys Ther. 2005;85(3):257–68.

    PubMed  Google Scholar 

  43. Fernandez-Luque L, Karlsen R, Melton GB. HealthTrust: a social network approach for retrieving online health videos. J Med Internet Res. 2012;14(1):e22. doi:10.2196/jmir.1985.

  44. Sujatha P, Dhavachelvan P. Precision at K in multilingual information retrieval. Int J Comput Appl. 2011;24:40–3.

    Google Scholar 

  45. Wang Y, Wang L, Li Y, He D, Chen W, Liu TY. A theoretical analysis of normalized discounted cumulative gain (NDCG) ranking measures. In: Proc 26th Annu Conf Learning Theory. 2013. p. 1–30.

    Google Scholar 

  46. Kobayashi M, Takeda K. Information retrieval on the Web. ACM Comput Surv. 2000;32(2):144–73.

    Article  Google Scholar 

  47. Chapelle O, Wu M. Gradient descent optimization of smoothed information retrieval metrics. Inf Retr. 2009;13:216–35.

    Article  Google Scholar 

  48. Zeng Q, Tse T, Divita G, Keselman A, Crowell J, Browne AC. Term identification methods for consumer health vocabulary development. J Med Internet Res. 2007;9(1):e4.

    Article  PubMed Central  Google Scholar 

  49. Wiesner M, Pfeifer D. Health recommender systems: Concepts, requirements, technical basics and challenges. Int J Environ Res Publ Health. 2014;11(3):2580–607.

    Article  Google Scholar 

  50. Sadasivam RS, Cutrona SL, Kinney RL, Marlin BM, Mazor KM, Lemon SC, et al. Collective-Intelligence recommender systems: Advancing computer tailoring for health behavior change into the 21st century. J Med Internet Res. 2016;18(3):e42.

    Article  PubMed  PubMed Central  Google Scholar 

  51. Hors-Fraile S, Civit A, Nuñez Benjumea FJ, Carrasco Hernández L, Ortega Ruiz F, Fernandez-Luque L. Coupling neuroscience smoking cessation interventions with social media and mobile devices. Front Hum Neurosci. Conference Abstract: SAN2016 Meeting. 2016. doi:10.3389/conf.fnhum.2016.220.00025.

  52. Lin Y, Jessurun AJ, de Vries B, Timmermans HJP. Motivate: towards context-aware recommendation mobile system for healthy living. In: Proc IEEE 5th Int Conf Pervasive Computing Technologies for Healthcare. 2011. p. 250–3.

    Google Scholar 

  53. Lau AYS, Coiera EW. Can cognitive biases during consumer health information searches be reduced to improve decision making? J Am Med Inform Assoc. 2009;16(1):54–65.

    Article  PubMed  PubMed Central  Google Scholar 

  54. Sharit J, Taha J, Berkowsky RW, Czaja SJ. Seeking and Resolving Complex Online Health Information Age Differences in the Role of Cognitive Abilities. Proc Human Factors and Ergonomics Soc Annu Meeting. 2016;60:1.

    Article  Google Scholar 

  55. Elsweiler D, Ludwig B, Said A, Schaefer H, Trattner C. Engendering Health with Recommender Systems. In: Proc 10th ACM Conf Recommender Systems. 2016. p. 409–10.

    Chapter  Google Scholar 

  56. Wiesner M, Pfeifer D. Adapting recommender systems to the requirements of personal health record systems. In: Veinot T, editor. Proc 1st ACM Int Health Inform Symp (IHI’10). USA: ACM; 2010. p. 410–4.

    Chapter  Google Scholar 

  57. Anker AE, Reinhart AM, Feeley TH. Health information seeking: a review of measures and methods. Patient Educ Couns. 2011;82(3):346–54.

    Article  PubMed  Google Scholar 

  58. Longo DR, Schubert SL, Wright BA, LeMaster J, Williams CD, Clore JN. Health information seeking, receipt, and use in diabetes self-management. Ann Family Med. 2010;8(4):334–40.

    Article  Google Scholar 

  59. Ekstrand JD, Ekstrand MD. First do no harm: Considering and minimizing harm in recommender systems designed for engendering health. Engendering Health Workshop RecSys 2016 Conf. 2016. Accessed 2 Apr 2017.

  60. Kill A. The direct-to-consumer genetics debate. Lancet Oncol. 2016;17(7):e265.

    Article  PubMed  Google Scholar 

  61. Zeng QT, Tony T, Guy D, Alla K, Jon C, Browne AC, Sergey G, Long N. Term Identification Methods for Consumer Health Vocabulary Development. J Med Internet Res. 2007;9(1):e4.

  62. Straus S, Haynes RB. Managing evidence-based knowledge: the need for reliable, relevant and readable resources. CMAJ Can Med Assoc J. 2009;180:942–5.

    Article  Google Scholar 

Download references


We would like to acknowledge the efforts of the video reviewers who completed the evaluations that enabled us to complete this study.


This research was co-funded by a HealthTrust research grant from the University of Tromso (Norway). This study was also co-funded by the SmokeFreeBrain Project of the European Union's Horizon 2020 Research and Innovation Programme under Grant No. 681120. The funding sources did not have any role in the execution of the study.

Availability of data and materials

The dataset supporting the conclusions of this article is included within the article.

Authors’ contributions

CLSB led the study and consequently was involved in all research and manuscript preparation. LFL, AC, and JSR reviewed the manuscript and advised CLSB on the algorithm used in the study. CR participated in the study design, discussions, and content analysis, and reviewed the manuscript. All the authors read and approved the final version of the manuscript.

Authors’ information

LFL performed part of this work at the Northern Research Institute (Norut) in Tromso, Norway.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

Not applicable. The experts that participated in the study did not provide any personal information, but rather evaluated a tool; consequently, their participation is not considered to be human subject research. The research protocol of this study was evaluated by the PhD Research Committee of the University of Seville, Spain, and all collaborators provided informed written consent.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Luis Fernandez-Luque.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Sanchez Bocanegra, C.L., Sevillano Ramos, J.L., Rizo, C. et al. HealthRecSys: A semantic content-based recommender system to complement health videos. BMC Med Inform Decis Mak 17, 63 (2017).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: