Leveraging artificial intelligence and data science techniques in harmonizing, sharing, accessing and analyzing SARS-COV-2/COVID-19 data in Rwanda (LAISDAR Project): study design and rationale

Abstract

Background

Since the outbreak of the COVID-19 pandemic in Rwanda, a vast amount of SARS-COV-2/COVID-19-related data has been collected, including COVID-19 testing and routine hospital care data. Unfortunately, these data are fragmented in silos with different data structures or formats and cannot be used to improve understanding of the disease, monitor its progress, or generate evidence to guide prevention measures. The objective of this project is to leverage artificial intelligence (AI) and data science techniques to harmonize these datasets and support the Rwandan government's needs in monitoring and predicting the COVID-19 burden, including hospital admissions and overall infection rates.

Methods

The project will gather existing data, including hospital electronic health records (EHRs) and COVID-19 testing data, and will link them with longitudinal data from community surveys. Open-source tools from Observational Health Data Sciences and Informatics (OHDSI) will be used to harmonize hospital EHRs through the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). The project will also leverage other OHDSI tools for data analytics and network integration, as well as R Studio and Python. The network will include up to 15 health facilities in Rwanda, whose EHR data will be harmonized to OMOP CDM.

Expected results

This study will yield a technical infrastructure in which the 15 participating hospitals and health centres will have EHR data in OMOP CDM format on a local Mac Mini (“data node”), together with a set of OHDSI open-source tools. A central server, or portal, will contain a data catalogue of participating sites, as well as the OHDSI tools that are used to define and manage distributed studies. The central server will also integrate information from the national COVID-19 registry, as well as the results of the community surveys. The ultimate project outcome is dynamic prediction modelling of the COVID-19 pandemic in Rwanda.

Discussion

The project is the first on the African continent to leverage AI and implement an OMOP CDM-based federated data network for data harmonization. Such infrastructure can be scaled to monitoring other pandemics, predicting outcomes, and planning tailored responses.

Introduction

In December 2019, a critical respiratory disease of unknown cause was identified in Wuhan, China [1]. The causative pathogen was subsequently identified as a novel coronavirus and named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1,2,3]. The virus then spread quickly across China and worldwide [1, 4,5,6]. Since the first reports of confirmed cases of coronavirus disease 2019 (COVID-19) in Wuhan, the world has witnessed unprecedented morbidity and mortality from this disease, resulting in serious public health emergencies and its designation as a global pandemic [7,8,9]. Although the initial outbreak in Africa was delayed by less rapid importation, the risk posed by a significant outbreak remained high because of socio-economic and health service capacity issues [10,11,12]. The first patient with COVID-19 in Rwanda was diagnosed in March 2020 [13]. After detection of the first case, the Rwandan government implemented a range of public health measures, including total lockdowns, inter-district lockdowns and localized lockdowns [14, 15]. Classic preventive measures were also emphasized, such as mandatory mask wearing, social distancing and regular handwashing [16, 17]. Rwanda also joined the rest of the world in securing vaccines against COVID-19 [18]. Despite all these efforts, Rwanda has recorded more than 80,000 cases of COVID-19 and more than 1000 deaths, and the numbers keep increasing [13].

In addition, because increases and decreases in COVID-19 cases appear unpredictable, both decision-makers and the general population live with uncertainty [19]. Accurate short-term forecasting of COVID-19 spread plays an essential role in managing hospital overcrowding and enables better allocation of the available resources [20]. Forecasting also helps to reduce the burden of COVID-19 by informing the planning and adjustment of public health measures. This study, to the best of our knowledge the first of its kind in Rwanda, proposes building data hubs that will later be used to design COVID-19 prediction models using artificial intelligence (AI) and machine learning (ML) techniques. Deep learning methods have recently gained particular attention in time-series modelling and analysis because of their strong generalization capability and nonlinear approximation [20,21,22,23].

In Rwanda, the use of AI and data science techniques is motivated by the wide implementation of electronic medical record (EMR) systems in health care facilities, which makes data accessible. However, the data are fragmented, as health facilities use different EMRs, and COVID-19 data were not systematically collected. To visualize and re-use the data effectively, there is a need to create a centralized common data model from already collected data on both COVID-19 and other medical conditions. In addition to understanding the spread of COVID-19, a prospective understanding of adherence to public health measures is also needed. For example, Mizrahi et al. emphasized that obtaining information on symptom dynamics is of great importance for controlling the complications of the disease in the population [24]. In the current study, we will longitudinally assess COVID-19-related symptoms and the preventive measures taken in each area. This project will leverage AI, ML and other data science (DS) techniques to create a scalable framework for inventorying, harmonizing and federating the accumulated data from COVID-19 patients and converting them to a standardized data format so that they can be used in wider studies of the disease. The harmonized data will consist of data on COVID-19 diagnosed/serotyped patients and on non-infected individuals, drawn from the electronic health records (EHRs) of different hospitals and the databases of testing centers (positive and negative results). In the second phase of the project, we will collect new longitudinal data, including patient-reported outcomes (PROs). These newly collected data will follow a similar standardized model by design and could potentially be linked to the federated data.

The intended project outcome is to leverage all federated data to generate evidence that meets Rwanda's needs and priorities in predicting and monitoring the burden of the COVID-19 pandemic in the Rwandan community, COVID-19-related hospital admissions and overall COVID-19 infection rates. The generated evidence will also be used to monitor the impact of different public health measures on the evolution of the COVID-19 pandemic in Rwanda.

Materials and methods

Study setting

The study sites will include 13 hospitals (the University Teaching Hospital of Kigali, Butare University Teaching Hospital, Ngoma regional hospital, Ruhengeri provincial hospital, Muhima district hospital, Kibagabaga district hospital, Nyamata district hospital, Nyagatare district hospital, Kinihira district hospital, Kigeme district hospital, Kirehe district hospital, Gisenyi district hospital and Gihundwe district hospital); two health centers (Remera health center and Nyamata health center); and one centralized dataset gathering data from 9 COVID-19 test centers (Kanyinya, Rwankeri, Gatenga, Kicukiro, ASPEK-Ngoma, Kigali Transit Centre, Rugerero, Kabgayi and Rusizi). These study sites were selected to represent all four provinces of the country and the City of Kigali, for prediction and generalizability purposes.

Inclusion and exclusion criteria

The health facilities were selected based on the availability of EMRs. For the community surveys, all participants aged 18 years and above will be eligible to enter the study. Participants will only be excluded if they have no eligible household phone contact. In accordance with national regulations, all participants will provide consent to participate in the study. Consent will be signed electronically through a form embedded in the mobile application used for the surveys.

Study design

The LAISDAR (Leveraging Artificial Intelligence and Data Science Techniques in Harmonizing, Sharing, Accessing and Analyzing SARS-COV-2/COVID-19 Data in Rwanda) project is a federated data network based on the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), on open-source Observational Health Data Sciences and Informatics (OHDSI) tools for data analytics and network integration, and on R Studio and Python. As shown in Fig. 1, the network will include several hospitals whose electronic health record (EHR) data will be harmonized to OMOP CDM and enriched with COVID-19 test results, COVID-19 survey results from a national database, and the results of the community surveys. An initial proof-of-concept (POC) implementation was set up and tested, which included the central LAISDAR instance and 2 data nodes: one on a Mac Mini and one on an AWS EC2 (Amazon Elastic Compute Cloud) instance.

Fig. 1. LAISDAR participating institutions

The participating hospitals use 2 different open-source EHR systems: OpenMRS [25] and OpenClinic GA [26]. Therefore, two ETL (Extract Transform Load) processes will be implemented, one per EHR system, so as to keep local adaptations per hospital to a minimum.

Enrichment of EHR data is part of the ETL process, whereby available COVID-19 test and survey results will be retrieved from a central repository over a secure interface. One critical challenge with this step is consolidating individual patients; different person identifiers (national ID, mobile number, name, address), or combinations thereof, are used across different systems. We envision generating unique identifiers based on the available keys to facilitate a reliable and reproducible matching of records from the different source systems.
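The identifier generation described above could take many forms. The following is a minimal Python sketch of one option, a keyed hash over the strongest identifier available in a record. The field names (`national_id`, `mobile`, `name`, `address`), the shared secret and the fallback order are assumptions made for illustration and are not taken from the LAISDAR implementation.

```python
import hashlib
import hmac
import unicodedata


def normalize(value):
    """Lowercase, strip accents and drop every non-alphanumeric character."""
    decomposed = unicodedata.normalize("NFKD", value or "")
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def linkage_key(record, secret="replace-with-shared-secret"):
    """Derive a pseudonymous matching key from a source record (hypothetical fields)."""
    if record.get("national_id"):
        basis = "nid:" + normalize(record["national_id"])
    elif record.get("mobile"):
        # Keep the last 9 digits so "+250 788 123 456" and "0788123456" agree.
        digits = "".join(ch for ch in record["mobile"] if ch.isdigit())
        basis = "tel:" + digits[-9:]
    else:
        basis = "name:" + normalize(record.get("name")) + "|" + normalize(record.get("address"))
    # A keyed hash (HMAC-SHA256) keeps the raw identifier out of the harmonized data.
    return hmac.new(secret.encode("utf-8"), basis.encode("utf-8"), hashlib.sha256).hexdigest()
```

Two records match when their keys are equal. Note that in this sketch a record keyed on a national ID will not match a record of the same person that only carries a mobile number; a production approach would need to store one key per available identifier or apply probabilistic matching.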

The integration of the sites with a central hub will be accomplished by using the open-source version of Arachne, which provides a platform for performing network studies, integrating OHDSI standards and tools.

Software deployments at the participating hospitals will rely on Docker-based containerization; this approach ensures consistent and reproducible installation across the different sites. For most participating hospitals, a pre-configured Mac Mini will be provided with the complete LAISDAR Dockerized software suite.

Conceptual framework for hospital EHRs harmonization

The conceptual framework is presented in Fig. 2. After inventorying existing datasets, the project will set up an infrastructure for data harmonization, for which novel techniques will be developed, together with a data access and data analysis interface that combines existing methods with innovative techniques. The framework is planned to include a data catalogue describing the different data sources, the Arachne central hub, a central OHDSI Atlas instance, a central database, as well as R Studio and Jupyter.

Fig. 2. Overall LAISDAR architecture for data harmonization

Data collection, analysis and management

The study will involve four main steps with regard to data collection, analysis and management:

Step 1: Data gathering/collection

This work will start by inventorying and gathering scattered data in Rwanda, including existing data from the first 24 months of the COVID-19 pandemic in Rwanda (the first case was identified in March 2020) and other hospital data, complemented by new community survey data.

The data sources are expected to come in different formats, ranging from COVID-19-related data recorded in Excel documents, through data sources containing Minimum Clinical Data (MCD) in DHIS2 [25] and other systems, to more granular electronic medical record (EMR) data in OpenClinic GA [26], OpenMRS [25] and other EMR systems. We will start by mapping full hospital patient records, focusing on 15 health facilities located in regions with a high number of COVID-19 patients, and will then add other isolated datasets.

The new data collection (community surveys) will use a standardized protocol and questionnaires and will follow a longitudinal approach. Participants will be randomly sampled using the sampling frame of the recent Rwanda Demographic and Health Survey (based on the fourth Rwanda Population and Housing Census, RPHC) provided by the National Institute of Statistics of Rwanda (NISR) [27]. This sampling frame is a complete list of districts covering the whole country. Data will be collected through bi-weekly phone calls over 6 planned phases (starting in December 2021), involving 30 well-trained data collectors, one per district, supervised by 10 investigators. The minimum required sample size was estimated at 107 people in each of the 30 administrative districts in Rwanda. To allow for consent refusals and drop-outs, we doubled this number to 214 participants per district. If a participant has a medical file at a participating hospital or a record in another COVID-19 testing dataset, the datasets will be linked, with the possibility of further data linkage requests in the future.
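The sample size figures above imply the following totals. This short Python sketch only restates the arithmetic; the assumption that each participant is interviewed once per phase is ours and is not stated in the protocol.

```python
DISTRICTS = 30          # administrative districts (from the text)
MIN_PER_DISTRICT = 107  # minimum sample size per district (from the text)
INFLATION_FACTOR = 2    # doubling to absorb refusals and drop-outs (from the text)
PHASES = 6              # planned survey phases (from the text)


def survey_plan(districts=DISTRICTS, min_per_district=MIN_PER_DISTRICT,
                inflation=INFLATION_FACTOR, phases=PHASES):
    """Return the headline numbers implied by the sampling design."""
    recruited_per_district = min_per_district * inflation
    return {
        "recruited_per_district": recruited_per_district,
        "minimum_total": districts * min_per_district,
        "recruited_total": districts * recruited_per_district,
        # Upper bound on interviews, assuming one call per participant per phase.
        "max_interviews": districts * recruited_per_district * phases,
        # Share of recruits that can be lost before falling below the minimum.
        "tolerated_attrition": 1 - 1 / inflation,
    }


plan = survey_plan()
print(plan)
```

Under these assumptions the design recruits 6420 participants nationwide against a minimum of 3210, so up to half of the recruits could refuse or drop out before the minimum sample is compromised.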

The sample will include males and females in proportion to the number of inhabitants. Each participant will receive a mobile connection fee and an internet bundle each week to allow data collection. To mitigate the expected effect of the gender digital divide, and to reach selected persons who no longer have a mobile phone, the consortium has established mitigation measures that include, but are not limited to, leveraging community health workers (CHWs). Each village in Rwanda has a CHW who participates in various Ministry of Health (MoH) programs, and all CHWs have received mobile phones from the MoH. If a selected respondent has no mobile phone, we will liaise with the nearest CHW to reach them.

The questionnaires (which will be translated into 3 languages: Kinyarwanda, English and French) include 10 modules (at least 8 of which have to be completed by the project): (1) Demographics; (2) Face mask use; (3) Hand hygiene; (4) Respect of social distancing measures and risk minimization measures; (5) Recent exposure to risk situations and COVID-19 measures. On the outcome side, the collected data will include: (6) Coronavirus-like signs and symptoms; (7) Mental health indicators (based on General anxiety disorder, GAD); (8) Socio-economic impact (based on loss of income, or categories); and (9) COVID-19 test results [28].

Gender considerations

Gender dimensions are found in all consequences of COVID-19, including morbidity in general, gender-based violence and mental health problems in particular, and socio-economic situations related to lockdowns. Social and cultural factors related to gender will also be addressed, such as specific considerations for some collected data elements (e.g. reproductive health data), the use of gender-sensitive research questions and gender-impartial language. Moreover, the sampling will place special emphasis on gender-proportional balance when collecting new COVID-19 data, and key gender outputs will be derived from the data analysis.

Step 2: Infrastructure for data harmonizing (developing novel techniques)

For data harmonization, custom-designed ETL scripts will be developed for each data source to extract, transform and load the source data into an OMOP CDM database instance. In the early stages, when the hospital EHRs are not yet harmonized, we will also use synthetic data to help automate the harmonization processes. The data owner-side infrastructure will include the OMOP CDM database instance, the Arachne client, the OHDSI Atlas [29] analytical tool, R Studio [29], and Jupyter [29]. The data harmonization process converts observational data from the format of the source data system to the OMOP CDM supported by the OHDSI community.
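To make the transform step concrete, here is a minimal Python sketch that maps one source patient record to a row of the OMOP CDM PERSON table. The source field names (`patient_id`, `sex`, `birthdate`) are hypothetical and do not reflect the actual OpenMRS or OpenClinic GA schemas. The gender concept IDs 8507 (male) and 8532 (female) are the standard OMOP concepts, and 0 is the OMOP convention for "no matching concept".

```python
from datetime import date

# Standard OMOP gender concepts; keys are lowercase source codes (assumed values).
GENDER_CONCEPTS = {"m": 8507, "male": 8507, "f": 8532, "female": 8532}


def to_omop_person(source, person_id):
    """Map a hypothetical source patient record to an OMOP CDM PERSON row."""
    birth = date.fromisoformat(source["birthdate"])
    gender_raw = (source.get("sex") or "").strip()
    return {
        "person_id": person_id,
        "gender_concept_id": GENDER_CONCEPTS.get(gender_raw.lower(), 0),
        "year_of_birth": birth.year,
        "month_of_birth": birth.month,
        "day_of_birth": birth.day,
        "race_concept_id": 0,        # not captured in this illustrative source
        "ethnicity_concept_id": 0,   # not captured in this illustrative source
        "person_source_value": str(source["patient_id"]),
        "gender_source_value": gender_raw,
    }


row = to_omop_person({"patient_id": "KGL-001", "sex": "F", "birthdate": "1985-03-14"}, 1)
print(row)
```

Keeping the original codes in the `*_source_value` columns preserves traceability back to the source system, which helps when a mapping has to be reviewed or corrected later.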

Step 3: Infrastructure for data access, query, and data analysis (mixing existing methods and innovative techniques)

The central platform for data access, query and data analysis will handle the participating data sources. The central site will use Arachne for the central portal with the data and study catalogues, together with a PostgreSQL database, an OHDSI Atlas analytical platform instance and an R Studio instance. Additional tools, such as a Jupyter server instance, can be added. As a standard, the database will include an OMOP CDM schema and additional schema(s) to support the central data catalogue. At the beginning of the harmonization process, when the data from hospital EHRs are not yet available, the project will use synthetic data to help automate the harmonization processes and train models; specifically, we will use mock-up data available in the OHDSI community (such as Synthea) to train different algorithms and models before using them on real data.

The Arachne central server setup will allow central management of network studies, with tight integration with the OHDSI tools such as Atlas. The Arachne data catalogue will incorporate the Achilles output from each participating site; the Achilles tool generates a profile of the participating sites’ data on an aggregate level, which will allow a central view of the descriptive statistics for each site. The R Studio and Jupyter instances will allow the development and testing of R scripts as part of a study design, or to analyze data collected from data source sites as part of studies.

Step 4: Data analysis and interpretation (mixing existing methods and innovative techniques)

Further data analysis is foreseen within this work. Federated datasets are known to be challenging to analyze with traditional statistical methods because, like other real-world data (RWD), they are (1) collected without any intention of being used in research; (2) incomplete and not cleaned; and (3) collected sporadically rather than in a purely longitudinal way, so that cohort-like data cannot readily be derived from them. The use of artificial intelligence and machine learning (AI/ML) has been advised for analyzing such data. However, no single AI/ML model has been selected upfront for this work, as the gathered data are unlikely to be similar to those found in the literature and this is pioneering work in Rwanda. We therefore plan first to evaluate the performance of different deep learning methods before deciding which model best fits the gathered data. The use of AI/ML for data analysis will be fully described in further work, as will the contribution of previous work on a Capsule Network-based framework [30].

In the prediction models, the sequential reproduction number R(t) will be estimated using a Bayesian approach on the Extended SEIR compartmental model. Bayes' rule is used to update the beliefs about the true R(t) based on model predictions and the new cases reported each day.
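The daily Bayesian update can be illustrated with a deliberately simplified Python sketch. It does not use the Extended SEIR model; instead it assumes a commonly used Poisson approximation in which the expected number of new cases is the previous day's count multiplied by exp(gamma * (R - 1)), with gamma the inverse of an assumed 7-day serial interval. It also treats R as constant over the window and starts from a uniform prior. All of these choices are illustrative assumptions, not the project's method.

```python
import math


def update_posterior(prior, r_grid, cases_prev, cases_today, gamma=1 / 7):
    """One Bayes step: posterior over R is proportional to prior times Poisson likelihood."""
    log_post = []
    for r, p in zip(r_grid, prior):
        # Expected cases today if the reproduction number were r.
        lam = max(cases_prev, 1e-9) * math.exp(gamma * (r - 1.0))
        log_lik = cases_today * math.log(lam) - lam - math.lgamma(cases_today + 1)
        log_post.append(math.log(p) + log_lik if p > 0 else float("-inf"))
    peak = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def estimate_r(daily_cases, r_max=6.0, steps=121, gamma=1 / 7):
    """Update a uniform prior on a grid of R values with each new daily count."""
    r_grid = [r_max * i / (steps - 1) for i in range(steps)]
    posterior = [1.0 / steps] * steps
    for prev, today in zip(daily_cases, daily_cases[1:]):
        posterior = update_posterior(posterior, r_grid, prev, today, gamma)
    mean_r = sum(r * p for r, p in zip(r_grid, posterior))
    return mean_r, r_grid, posterior


mean_r, _, _ = estimate_r([10, 12, 14, 17, 20, 24, 29])  # made-up counts
print(round(mean_r, 2))
```

A sequential R(t) estimate would additionally let the posterior drift between days, for example by widening it before each update, so that older observations carry less weight than recent ones.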

Model definition

We will use an extension of the SEIR model (Fig. 3) inspired by previous work [31]. This model splits the population into different categories, i.e. susceptible, exposed, infected and removed. The latter two categories are further broken down: the infected part of the population into super mild, mild, heavy and critical, and the removed population into the immune and dead fractions. A super mild infection refers to asymptomatic people who are infected but unaware of their infection. Recent figures from Chinese scientists put this share at 86% of all infections [31].

Fig. 3. Prediction model definition [32]

Transitions between the different fractions of the population are indicated by the arrows, and their rates are expressed by parameters in the model. The two most important parameters in such a model are (1) the incubation period and (2) the rate of virus spread. Other parameters include the odds of having a super mild, mild, heavy or critical infection and, for each type of infection, the infectious period. All parameters except one were gathered from the available literature on coronavirus. The parameter that remains to be calibrated is ‘beta’, which determines the rate at which individuals transition from susceptible to exposed. Beta can be interpreted as the degree of social interaction or the amount of exposure to the virus. It is this parameter that is targeted when governments impose restrictions on their citizens. We will, therefore, focus on this parameter. Finally, a documented mathematical model will be discussed at a later stage, at the beginning of the project implementation.
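The role of beta can be seen in a basic SEIR simulation. The Python sketch below is not the extended model of Fig. 3: it uses a single infected compartment and splits it by severity only afterwards. The 86% share of super mild infections comes from the text; the population size, incubation and infectious periods, and the split of the remaining 14% are placeholder assumptions chosen for illustration.

```python
# Share of infections by severity. 0.86 for "super_mild" is from the text;
# the split of the remaining 14% is a placeholder, not a literature value.
SEVERITY_SHARE = {"super_mild": 0.86, "mild": 0.10, "heavy": 0.03, "critical": 0.01}


def simulate_seir(beta, days, population=1_000_000, initial_exposed=10,
                  incubation_days=5.0, infectious_days=7.0, steps_per_day=10):
    """Euler integration of a basic SEIR model; returns one (S, E, I, R) tuple per day."""
    sigma = 1.0 / incubation_days   # exposed -> infected rate
    gamma = 1.0 / infectious_days   # infected -> removed rate
    dt = 1.0 / steps_per_day
    s, e, i, r = float(population - initial_exposed), float(initial_exposed), 0.0, 0.0
    history = []
    for _day in range(days):
        history.append((s, e, i, r))
        for _step in range(steps_per_day):
            new_exposed = beta * s * i / population * dt  # beta drives S -> E
            new_infected = sigma * e * dt
            new_removed = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infected
            i += new_infected - new_removed
            r += new_removed
    return history


def split_by_severity(infected):
    """Distribute an infected count over the severity classes."""
    return {level: infected * share for level, share in SEVERITY_SHARE.items()}


for beta in (0.2, 0.5):  # e.g. with and without contact restrictions (made-up values)
    peak = max(state[2] for state in simulate_seir(beta, days=200))
    print(beta, round(peak), split_by_severity(peak))
```

Lowering beta while keeping every other parameter fixed flattens and delays the infection peak, which is why calibrating this one parameter against reported cases is a natural way to quantify the effect of restrictions.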

Study limitations

We acknowledge the limitations imposed by the federated network approach used by the LAISDAR project, in which the data remain at each participating hospital's site rather than patient-level data being pooled in a central location. There are different ways to accommodate a federated approach for the ML methods described here; the main challenge is that any model trained across all the remote data must be trained in a federated manner.
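The general pattern behind a federated analysis can be sketched in a few lines of Python: each site reduces its patient-level values to aggregates, and only those aggregates are combined centrally. The function names below are hypothetical and this is not how Arachne exchanges results; federated training of an ML model follows the same idea but exchanges model updates rather than sums.

```python
import math


def local_summary(values):
    """Run at a data node: reduce patient-level values to three aggregates."""
    return {"n": len(values), "sum": sum(values), "sum_sq": sum(v * v for v in values)}


def pooled_statistics(summaries):
    """Run centrally: combine site aggregates without seeing any patient row."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["sum"] for s in summaries)
    total_sq = sum(s["sum_sq"] for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)  # sample variance
    return {"n": n, "mean": mean, "sd": math.sqrt(max(variance, 0.0))}


# Made-up lengths of stay (days) at two sites; only the summaries leave each site.
site_a = local_summary([2, 4])
site_b = local_summary([6, 8, 10])
print(pooled_statistics([site_a, site_b]))
```

The pooled mean and standard deviation are identical to what would be obtained from the concatenated patient-level data, which shows that some analyses lose nothing under federation, while others, such as model training, require dedicated algorithms.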

Also, as the project is pioneering work in this field in Rwanda, we expect some other limitations: data incompleteness, missing data and/or data inaccuracy. Additionally, some data are still captured at health facilities as free text and will not be exploitable by our work, as no natural language processing (NLP) technique will be applied. Finally, this work will only involve health facilities where electronic health records are used, leaving out health facilities that still use paper-based records. This might affect the generalizability of the study findings. However, the large expected sample size, the robust sampling methods and the federated approach are expected to mitigate this potential bias.

Sustainability and scalability of the project

This project will set up both non-material infrastructure (a federated network of hospital data, harmonized data and an OMOP data layer in each hospital) and material infrastructure (equipment: servers, workstations). This infrastructure will stay in place and can be re-used for future projects. Its maintenance is in the hands of the Ministry of Health (MoH) of Rwanda, through the key partner on this project, the Rwanda Biomedical Center, an implementation body of the MoH.

The infrastructure can also be scaled to focus on other infectious diseases (malaria, tuberculosis), non-communicable diseases (hypertension, metabolic diseases) or other emerging pandemics in the future. However, scalability might be challenged by a lack of local funding for staff and researcher training, the purchase of new equipment, etc.

Study expected results, outcomes and impact

Technical infrastructures

An initial proof-of-concept (POC) implementation was set up and tested early in the project, which included the central LAISDAR instance and 2 data nodes—one on a Mac Mini and one on an AWS EC2 instance. The data nodes were set up using Docker containers providing the following services: a (PostgreSQL) OMOP CDM database, Atlas/web API, Achilles and Arachne (connected to the Arachne Central instance). The central server was set up with Arachne Central, where the Data Catalogue was configured, and studies were created and executed for testing the integration at the data nodes. The objective of the POC was to test the integration layer (Arachne), as well as to demonstrate the overall process flow for network studies; these objectives were met.

The next phase of the development is well underway, which includes the completion of the ETL implementations, and the integration with the central COVID-19 test and survey results.

The first phase of the project will include 15 health facilities with plans to include additional hospitals in a later phase of the project.

Capacity building through training

This project will mainly contribute to research and capacity building by training staff before and during the project, both at UR and at participating hospitals. The planned training includes (1) data mapping infrastructure; (2) survey instruments; and (3) sensitive patient data handling, data harmonization, interoperability and medical terminology. A team from Ghent University (Belgium) has trained the Rwandan research team on OHDSI OMOP CDM mappings, including terminology and coding.

Clinical, epidemiological, mental and socio-economic outcomes

This project will yield prediction models for the burden of COVID-19 in the community, as well as for the potential impact on hospital admissions or overall infection rates, and for the impact of various public health measures on (1) the evolution of the pandemic in the country; (2) the socio-economic situation; and (3) mental health (stratified by gender and other vulnerable groups). As intermediate results, the community survey will be analyzed separately across all scopes, including descriptive statistics of socio-economic impact, epidemiology, and mental and clinical outcomes. For the socio-economic outcome, the variables to be analyzed relate to the effect of COVID-19 on livelihoods, with a focus on its effect on basic needs (food, medical care, school fees and transport), income, employment and savings. A logistic model will be formulated and used to analyze the socio-economic characteristics of people who have experienced economic difficulties due to the COVID-19 situation.

On the epidemiological side, we will investigate the prognostic factors associated with clinical outcomes of COVID-19 in Rwanda, and the drivers of the COVID-19 burden in Rwanda.

Regarding gender and mental health, multiple axes of research are planned, including (1) a longitudinal study on stigmatization and associated factors during the COVID-19 pandemic in Rwanda; (2) behavioural and gender-based violence outcomes of COVID-19 in Rwanda; and (3) a longitudinal study on mental wellbeing and associated factors during the COVID-19 pandemic, among others.

Finally, a cultural analysis is planned to investigate how Rwandans deal with the COVID-19 pandemic and the related control measures.

Conclusion

To the best of the authors’ knowledge, this project is the first on the African continent to implement data harmonization on COVID-19. The design and implementation of an OMOP CDM-based federated data network for COVID-19-related studies in Rwanda will provide researchers in Rwanda and elsewhere with the tools and data access needed to better track the disease, predict outcomes, and plan appropriate responses.

The chosen architecture lends itself to expansion to additional hospitals or other data sources, should there be a need. Building the LAISDAR infrastructure on the open-source OMOP CDM data model and utilizing OHDSI tools and other open-source tools facilitates easy involvement of new partners. In addition, these choices provide opportunities for participation in other OHDSI based network studies around the world.

Availability of data and materials

This article is a research proposal. The datasets generated for some of the phases of the study have not yet been analysed but will be made available by the corresponding author on reasonable request.

Abbreviations

AI:

Artificial intelligence

LAISDAR:

Leveraging artificial intelligence and data science techniques in harmonizing, sharing, accessing and analyzing SARS-COV-2/COVID-19 data in Rwanda

CHUK:

The University Teaching Hospital of Kigali

SARS-COV-2:

Severe acute respiratory syndrome coronavirus 2

COVID-19:

Coronavirus disease of 2019

OHDSI:

Observational Health Data Sciences and Informatics

OMOP:

Observational medical outcomes partnership

CDM:

Common data model

EHR:

Electronic Health Records

EMR:

Electronic Medical Records

ETL:

Extract transform load

DS:

Data science

ML:

Machine learning

PROs:

Patient-reported outcomes

CHWs:

Community health workers

MoH:

Ministry of Health

NISR:

National Institute of Statistics, Rwanda

R&I:

Research and Innovation

RWD:

Real World Data

UR:

University of Rwanda

SEIR:

Susceptible-Exposed-Infected-Removed

API:

Application Programming Interface

GAD:

General anxiety disorder

POC:

Proof of concept

MCD:

Minimum Clinical Data

WHO:

World Health Organization

References

  1. She J, Jiang J, Ye L, Hu L, Bai C, Song Y. 2019 novel coronavirus of pneumonia in Wuhan, China: emerging attack and management strategies. Clin Transl Med. 2020;9(1):1–7.

    Article  Google Scholar 

  2. Fu L, Wang B, Yuan T, Chen X, Ao Y, Fitzpatrick T, et al. Clinical characteristics of coronavirus disease 2019 (COVID-19) in China: a systematic review and meta-analysis. J Infect. 2020;80(6):656–65.

    Article  CAS  Google Scholar 

  3. Zhou P, Yang X-L, Wang X-G, Hu B, Zhang L, Zhang W, et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020;579(7798):270–3.

    Article  CAS  Google Scholar 

  4. Zhou G, Chen S, Chen Z, Back to the spring of,. facts and hope of COVID-19 outbreak. Vol. 14, Frontiers of Medicine. Springer. 2020;2020:113–6.

    Google Scholar 

  5. Zhou P, Yang X-L, Wang X-G, Hu B, Zhang L, Zhang W, et al. Addendum: a pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020;588(7836):E6–E6.

    Article  CAS  Google Scholar 

  6. Zhou P, Yang X-L, Wang X-G, Hu B, Zhang L, Zhang W, et al. Discovery of a novel coronavirus associated with the recent pneumonia outbreak in humans and its potential bat origin. BioRxiv. 2020. Available from: https://doi.org/10.1101/2020.01.22.914952v2.

  7. Açikgöz Ö, Günay A. The early impact of the Covid-19 pandemic on the global and Turkish economy. Turkish J Med Sci. 2020;50(SI-1):520–6.

    Article  Google Scholar 

  8. Hitt MA, Holmes RM Jr, Arregle J-L. The (COVID-19) pandemic and the new world (dis) order. J World Bus. 2021;56(4): 101210.

    Article  Google Scholar 

  9. Kumar A, Singh R, Kaur J, Pandey S, Sharma V, Thakur L, et al. Wuhan to world: the COVID-19 pandemic. Front Cell Infect Microbiol. 2021;11:242.

    Google Scholar 

  10. Lone SA, Ahmad A. COVID-19 pandemic–an African perspective. Emerg Microbes Infect. 2020;9(1):1300–8.

    Article  CAS  Google Scholar 

  11. Hager E, Odetokun IA, Bolarinwa O, Zainab A, Okechukwu O, Al-Mustapha AI. Knowledge, attitude, and perceptions towards the 2019 Coronavirus Pandemic: a bi-national survey in Africa. PLoS ONE. 2020;15(7): e0236918.

    Article  CAS  Google Scholar 

  12. Acter T, Uddin N, Das J, Akhter A, Choudhury TR, Kim S. Evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) as coronavirus disease 2019 (COVID-19) pandemic: a global health emergency. Sci Total Environ. 2020;730:138996.

    Article  CAS  Google Scholar 

  13. Rwanda_Biomed_Center. Rwanda COVID Response 2021 [Internet]. 2021. Available from: https://www.rbc.gov.rw/index.php?id=707

  14. Karim N, Jing L, Lee JA, Kharel R, Lubetkin D, Clancy CM, et al. Lessons learned from Rwanda: innovative strategies for prevention and containment of COVID-19. Ann Glob Heal. 2021;87(1):23.

    Article  Google Scholar 

  15. Ngamije J, Yadufashije C. COVID-19 pandemic in Rwanda: An overview of prevention strategies. Asian Pac J Trop Med. 2020;13(8):333.

    Article  CAS  Google Scholar 

  16. Condo J, Uwizihiwe JP, Nsanzimana S. Learn from Rwanda’s success in tackling COVID-19. Nature. 2020;581(7809):384–5.

  17. Nkeshimana M, Igiraneza D, Turatsinze D, Niyonsenga O, Abimana D, Iradukunda C, et al. Experience of Rwanda on COVID-19 case management: from uncertainties to the era of neutralizing monoclonal antibodies. Int J Environ Res Public Health. 2022;19(3):1023.

  18. Loembé MM, Nkengasong JN. COVID-19 vaccine access in Africa: global distribution, vaccine platforms, and challenges ahead. Immunity. 2021;54(7):1353–62.

  19. Musanabaganwa C, Cubaka V, Mpabuka E, Semakula M, Nahayo E, Hedt-Gauthier BL, et al. One hundred thirty-three observed COVID-19 deaths in 10 months: unpacking lower than predicted mortality in Rwanda. BMJ Glob Heal. 2021;6(2): e004547.

  20. Dairi A, Harrou F, Zeroual A, Hittawe MM, Sun Y. Comparative study of machine learning methods for COVID-19 transmission forecasting. J Biomed Inform. 2021;118: 103791.

  21. Sun C, Hong S, Song M, Li H, Wang Z. Predicting COVID-19 disease progression and patient outcomes based on temporal deep learning. BMC Med Inform Decis Mak. 2021;21(1):1–16.

  22. Li WT, Ma J, Shende N, Castaneda G, Chakladar J, Tsai JC, et al. Using machine learning of clinical data to diagnose COVID-19: a systematic review and meta-analysis. BMC Med Inform Decis Mak. 2020;20(1):1–13.

  23. Sudat SEK, Robinson SC, Mudiganti S, Mani A, Pressman AR. Mind the clinical-analytic gap: electronic health records and COVID-19 pandemic response. J Biomed Inform. 2021;116: 103715.

  24. Mizrahi B, Shilo S, Rossman H, Kalkstein N, Marcus K, Barer Y, et al. Longitudinal symptom dynamics of COVID-19 infection. Nat Commun. 2020;11(1):1–10.

  25. Collaborative O, Network. OpenMRS: open-source platform to build customized EMR system. 2016; Available from: https://openmrs.org/

  26. Verbeke F. OpenClinic GA: open-source integrated hospital information management system [Internet]. 2016. Available from: https://sourceforge.net/projects/open-clinic/

  27. National Institute of Statistics of Rwanda (NISR) [Rwanda], Ministry of Health (MOH) [Rwanda], ICF. Rwanda Demographic and Health Survey 2019–20 Key Indicators Report [Internet]. 2020. Available from: https://dhsprogram.com/pubs/pdf/PR124/PR124.pdf

  28. Laisdar Project Investigators. Laisdar website. 2020; Available from: https://laisdar.rbc.gov.rw/

  29. OHDSI: Observational Health Data Sciences and Informatics. ArachneNodeAPI. 2020; Available from: https://github.com/OHDSI/ArachneNodeAPI

  30. Gupta PK, Siddiqui MK, Huang X, Morales-Menendez R, Pawar H, Terashima-Marin H, et al. COVID-WideNet—A capsule network for COVID-19 detection. Appl Soft Comput. 2022;122:108780.

  31. Alleman TW, Vergeynst J, De Visscher L, Rollier M, Torfs E, Nopens I, et al. Assessing the effects of non-pharmaceutical interventions on SARS-CoV-2 transmission in Belgium by means of an extended SEIQRD model and public mobility data. Epidemics. 2021;37:100505.

  32. Alleman T, Torfs E, Nopens I. Covid-19: from model prediction to model predictive control. Unpublished pre-print. 2020. Available from: https://biomath.ugent.be/sites/default/files/2020-04/Alleman_etal_v2.pdf

Acknowledgements

The LAISDAR project is funded by Canada’s International Development Research Centre (IDRC), Grant 109587, and the Swedish International Development Cooperation Agency (Sida), under the Global South AI4COVID Program. The community survey component is funded locally by the National Council for Science and Technology (NCST) Rwanda through the research grant NCST-NRIF/COVID-19/002/2020 for the project titled “Longitudinal datasets hub for predicting and monitoring COVID-19 evolution in the community and mitigation measures outcomes in Rwanda (PREDICT project)”.

Author information

Contributions

All authors were involved in the conception and design of the study. AN and MT drafted the initial manuscript. All authors contributed to and reviewed the manuscript through to the final version. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Aurore Nishimwe.

Ethics declarations

Ethics approval and consent to participate

This study has been approved by the Rwanda National Ethics Committee (No.112/RNEC/2021), the University of Rwanda, College of Medicine and Health Sciences’ Institutional Review Board, and the ethics committee of the Kigali University Teaching Hospital (EC/CHUK/080/2021). Informed consent will be obtained from each study participant. Confidentiality of the participants will be maintained at all times. No identifying information will be stored by the research team, reducing the risk of breaches of confidentiality. Questionnaires will be number-coded to keep the identity of the participants anonymous. Because the project handles sensitive clinical data, the consortium will specifically address data governance, from the sources of data through their integration and use, to ensure suitable privacy protection and information governance. No patient-level data will be shared, even in anonymized form. As the IDRC embraces the principle of sharing research data and encourages researchers to make their data openly available, researchers will be able to access aggregated data only (no data download). Each individual will need to register and request access to all or part of the data available from the common analytical interface, and will sign a data access agreement limited to the duration of the research project. Research findings will be made accessible to research participants through login credentials and will also be disseminated to participants through various media. This study will be carried out in accordance with the relevant guidelines and regulations in the Ethical Declarations.

Consent for publication

Not applicable.

Competing interests

The authors report no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Nishimwe, A., Ruranga, C., Musanabaganwa, C. et al. Leveraging artificial intelligence and data science techniques in harmonizing, sharing, accessing and analyzing SARS-COV-2/COVID-19 data in Rwanda (LAISDAR Project): study design and rationale. BMC Med Inform Decis Mak 22, 214 (2022). https://doi.org/10.1186/s12911-022-01965-9

  • DOI: https://doi.org/10.1186/s12911-022-01965-9

Keywords