
Predictive algorithm for the regional spread of coronavirus infection across the Russian Federation



Outbreaks of infectious diseases are a complex phenomenon with many interacting factors. Regional health authorities need prognostic modeling of the epidemic process.


Various mathematical algorithms, which are useful tools for studying the dynamics of infection spread, can be used for this purpose. Epidemiological models serve as both evaluation and prognostic models. The authors describe their experience of developing a short-term predictive algorithm for the spread of COVID-19 in a region of the Russian Federation based on the SIR model: Susceptible, Infected, Recovered. The article describes in detail the methodology of the short-term predictive algorithm, including an assessment of whether a predictive model can be built and the mathematical aspects of creating such forecast algorithms.


The predicted numbers of infected and recovered individuals agreed with the real-life situation: the mean squares of the relative errors were σ(I) = 0.0129 and σ(R) = 0.0058, respectively.


The present study shows that, although many sophisticated modifications of the SIR model exist, each with its own scope of application, a simple SIR model is advisable for quickly predicting the spread of coronavirus infection. Its lower accuracy is fully compensated by adaptive calibration of the parameters based on monitoring of the current situation and real-time updating of the indicators.



Infectious disease outbreaks are a complex phenomenon involving numerous interacting factors. According to virology researchers, the SARS-CoV-2 coronavirus, which causes COVID-19, has spread rapidly because it combines genomes with different properties from other coronavirus strains [1]. Regardless of the state of the global health infrastructure, the novel coronavirus disease poses a threat to all countries. Globally, the number of COVID-19 patients is rising rapidly [2].

Over 5.7 million new cases were recorded from July 4 to July 10, 2022, an increase of 6% over the previous week. Over 9800 deaths were reported, about the same number of new deaths as in the previous week. As of July 10, 2022, over 6.3 million deaths and slightly under 553 million confirmed cases had been reported globally. The new coronavirus has proven to be highly contagious, though it is not the deadliest disease the world has seen in decades. COVID-19 has a relatively long incubation period of 2 days to 2 weeks (5–7 days on average), during which carriers are already infected even without obvious symptoms. The main signs and symptoms of COVID-19 are fever, fatigue, breathing problems, and a dry cough, with breathing problems usually being the most severe and intense. There is also a shortage of the medical devices needed to cope with the increased number of patients [3].

The world's healthcare systems were unprepared for the latest coronavirus outbreak. Most government efforts are currently focused on stopping the spread of the coronavirus and identifying potential hot spots. Because their work requires close proximity to probable coronavirus patients, healthcare workers and other essential personnel are the most susceptible to infection. Public healthcare services are unable to handle the rising patient load. Hospitals quickly became overcrowded, and there were not enough ventilators, PPE, oxygen supplies, etc. Some patients had to lie on mattresses on the floor because hospitals could not handle the surge. With the ongoing influx of newly infected patients, hospitals themselves have become significant sites of coronavirus transmission. Hospital administrative staff, surgeons, and other healthcare providers have the highest risk of contracting the infection [4].

During the World Health Summit on October 25–27th, 2020, the WHO Director-General and a number of senior executives and experts urged all governments to invest more resources in innovation, research and solutions to combat COVID-19 [5]. However, the health systems of the vast majority of countries worldwide have been struggling to contain the spread of coronavirus. For example, the State of Israel has repeatedly imposed strict restrictive measures, up to the complete closure of the country's airports [6]. After coming to power in January 2021, the new U.S. administration also rushed to apply stricter restraint policies [7].

On July 20, 2022, 221,955 cases of COVID-19 infection were recorded in China. The total number of deaths from coronavirus infection was 5,213. There are 5,661 people in the active phase of the disease, of whom 383 are in critical condition. The lethality rate is 2.35%. There are fears in the country that the infamous situation in Wuhan could recur after quarantine restrictions are eased. Other countries are closely monitoring the situation in China, since they may find themselves in an even worse situation within a few days. The measures taken in China may seem excessive, but the Chinese authorities are trying to avoid past miscalculations. In the early stages of the coronavirus outbreak in Hubei province, the magnitude of the problem was not adequately assessed, and countermeasures were therefore delayed. An equally acute problem is observed in South Korea, Singapore, Taiwan, and Hong Kong, which have so far managed to contain the spread of coronavirus [8].

In Russia, the rapid spread of infection in 2020 significantly changed the lifestyle of all social groups [9, 10] and forced the reorganization of the health care system in all regions of the country. Despite the relatively stable situation, 18,504,729 confirmed cases had been reported in the country as of July 20, 2022. In total, 381,997 people had died and 17,919,843 had recovered.

Judging by the epidemiological situation in Europe and the United States as of July 2022, an increase in incidence is also expected in the Russian Federation in the autumn–winter period. If transport links between countries had remained at the pre-pandemic level, the increase in incidence would have begun somewhat earlier [11]. After a large wave of the Omicron variant passed through the country at the beginning of 2022, collective immunity in the population rose considerably, but it is now gradually weakening. Russian virologists note that the Omicron variant is less dangerous for humans than the earlier Alpha and Delta variants: with Omicron, the mortality rate is 0.3–0.4%, whereas at the beginning of the pandemic, when the original strain from Wuhan (China) was spreading, it was 5–6%. Nevertheless, around 800,000 people worldwide continue to be infected with the coronavirus every day, and despite the milder course of the disease with Omicron, about 900 thousand people died worldwide in the first half of 2022.

The Centaurus subvariant (BA.2.75) of Omicron may be deadlier than other variants. It is also more contagious than its predecessors BA.1–BA.5 and can successfully evade existing immunity, so its spread in Russia is a serious concern. A new, more pathogenic lineage may appear in the coming months, and the further course of the pandemic will depend on which evolutionary path the virus follows, which is almost impossible to control. According to official statistics, as of July 19, 2022, 18,499,044 cases of COVID-19 had been laboratory-confirmed in Russia, of which 17,915,526 patients had fully recovered, and 381,960 deaths had been recorded. From July 18 to July 19, 2022, there were 4200 new COVID-19 cases, 4186 recoveries, and 44 deaths (Table 1).

Table 1 Selected current data on the regions of the Russian Federation as of July 19, 2022

Pandemics have been more frequent during the past few decades as a result of increased urbanization and international travel. It's critical to create models to predict the risk of an infectious disease spreading further since a disease that spreads in one area has the potential to become a pandemic with global humanitarian and economic consequences.

The mathematical study of population epidemics, including outbreak modeling and control, has a long history dating back to the work of Kermack and McKendrick in 1927. Spreading phenomena have also been studied extensively in the statistical physics of disordered systems. In this context, predictive modeling of a new coronavirus outbreak at the level of a given region is highly relevant. A variety of mathematical algorithms can serve as tools for studying the dynamics of infectious disease spread, since epidemiological models are also used to predict and evaluate the behavior of other pathogens [12]. Clearly, the specifics of the COVID-19 pandemic process should be studied and analyzed thoroughly using the methodology of mathematical modeling. Its essence is to replace the original object with an abstract image (a model), examine this model in detail with computational and logical algorithms, and thereby study the real outbreak process. This approach to design and prediction combines the benefits of theoretical constructions with those of experimental work. Working with a model rather than with the process itself makes it possible to study the process's behavior in different situations quickly and inexpensively. Drawing on modern mathematics, computational experiments with object models allow a phenomenon to be studied and predicted in detail in all its aspects. Mathematical modeling algorithms are steadily improving and being applied to ever more areas of knowledge. Most models are designed and implemented to predict short-term morbidity, driven by the need of anti-epidemic services to prepare and implement efficient preventive, anti-epidemic, and therapeutic measures in a timely manner [13,14,15,16].

Modern technologies such as deep learning, machine learning, and data science are helping in the fight against COVID-19, and the healthcare system benefits greatly from them. The most popular methods include linear regression, support vector machines, multi-layer perceptrons, and vector autoregression. According to Rahman et al. [17], machine learning can be a helpful method for accurately assessing, screening, tracking, forecasting, and predicting the characteristics and trends of COVID-19.

Deep learning has opened a new pathway in healthcare. With deep neural networks, considerable progress has been made toward automated disease identification, including tumor detection, cancer cell detection, chest disease detection, and genomic sequence analysis. Combined architectures, such as a 20-layer convolutional neural network combined with recurrent neural networks and long short-term memory (LSTM), have substantially improved COVID-19 identification based on automatic feature extraction from X-ray images [18,19,20].


When setting the research problem, the team of authors was guided by the following considerations. Since viral infections like COVID-19 are characterized by an exponential increase in the number of cases at the initial stage of the epidemic, it would be very interesting to know the following: is it possible to make some forecast of the development of the epidemic from the first data on the number of cases? Since the answer to such an important question is far from obvious, the authors first wanted to answer the following questions:

  1. Is it possible, based on a limited number of cases, to determine whether the epidemic spread of a particular infectious disease is consistent with its viral nature?

  2. Is it possible to make a reasonably accurate prediction based on the information already available about the behavior of the disease?

To answer these questions, the authors used a feature that is very characteristic of a viral disease, namely the exponential spread of the disease in the initial period of the epidemic. With exponential growth the growth rate must remain constant, so the authors needed to verify that this rate is stable. If its stability is not confirmed, the prediction results are unlikely to be relevant, because the dynamics of disease growth will then differ from the typical behavior of viral infections.

With an affirmative answer to these key questions, one can proceed to forecasting. To predict for a long period, it is necessary to have information about the nature of the disease. A simpler task is to build a short-term forecast for the very near future when there is no doubt that the behavior of the infectious agent will not change much. In this case, for forecasting, it would be optimal to use simple mathematical models for the development of infections, for example, the SIR model.

The study used data from the Nizhny Novgorod region, where 11,499 COVID-19 cases had been confirmed by early June 2020. According to monitoring data as of June 15, 2020, the number of infected people increased by 330, the number of deaths was 125, and the number of patients who had recovered and been discharged reached 4612 (Fig. 1).

Fig. 1

Dynamics of COVID-19 prevalence in Nizhny Novgorod region

The massive COVID-19 outbreak in the Nizhny Novgorod region posed many challenges to the regional administration and healthcare system. It required difficult management decisions, which can be based on information obtained through the timely use of prognostic algorithms for the spread of infection.

The prediction of values for virus disease propagation dynamics (VDD) parameters is carried out using a data analysis system built according to the following scheme (Fig. 2).

Fig. 2

Flow chart of VDD Data Analysis System

The purpose of the VDD online data analysis system is not to design a model that accounts for as many factors as possible, but to obtain satisfactory projections of disease dynamics with minimal time and resources. Prognostic algorithms for analyzing the course of viral infections range from simple to very complex, so the main problem in developing a prognostic model is choosing the best option. In the COVID-19 pandemic, which is not only threatening but also developing rapidly, simple prognostic algorithms describing possible scenarios are the most appealing, as they are particularly convenient for fast-evolving situations.

According to scientists from the Singapore University of Technology and Design, the COVID-19 outbreak in Russia was expected to end in early autumn 2020. As is evident today, that was a miscalculation. Unfortunately, the computational algorithm was not released to the public; all that is known is that it used the SIR model (Susceptible, Infected, Recovered), which was once considered the gold standard for describing the spread of infectious diseases. The model was introduced in the 1920s and has an extensive history of application [21]. According to their status with respect to the disease, the population is divided into three groups: susceptible (S), infected (I), and recovered (R).

The rate of increase in the number of cases, the growth factor \(X_{i}\), is determined by the following formula:

$$X_{i} = \frac{C_{i + 1} - C_{i}}{C_{i}}$$

where \(C_{i}\) is the number of recorded cases on day i.
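As a rough illustration, the growth factor can be computed from a daily series of case counts in a few lines of Python. This is a minimal sketch, assuming that \(C_{i}\), the number of recorded cases on day i, is given as a time-ordered list of positive counts; the function name and the example numbers are hypothetical and are not the Nizhny Novgorod data. Under purely exponential growth every \(X_{i}\) equals the same constant, which is the property the stability check relies on.

```python
from typing import Sequence


def growth_factors(cases: Sequence[float]) -> list[float]:
    """Day-over-day relative growth X_i = (C_{i+1} - C_i) / C_i.

    `cases` is a time-ordered series of case counts; every value used as a
    denominator must be non-zero.
    """
    return [(nxt - cur) / cur for cur, nxt in zip(cases, cases[1:])]


# Hypothetical series growing by exactly 20% per day:
example = [100.0, 120.0, 144.0, 172.8]
print([round(x, 3) for x in growth_factors(example)])  # [0.2, 0.2, 0.2]
```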

The resulting growth rates of the number of cases are shown in Fig. 3.

Fig. 3

Disease Growth Rate

The growth factor X varies within a limited range, which supports the hypothesis that the data in question can be treated as data on a viral disease. Thus, the first question can be answered in the affirmative.

To answer the second question, it is necessary to assess the stability of the disease growth process. For this purpose, the authors construct a Shewhart map of the process, i.e., an individuals and moving-range control chart (after Walter Andrew Shewhart). This first requires calculating the moving range R using the formula:

$$R_{i} = \left| X_{i} - X_{i - 1} \right|$$

where \(\left| \cdot \right|\) denotes the absolute value, and \(X_{i}\) and \(X_{i - 1}\) are the current and previous values of the growth factor.
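A minimal sketch of the moving-range step and of the two centre-line statistics used next (the mean growth factor MX and the mean moving range mR). The function names are illustrative, and the sample values are made up rather than taken from the study.

```python
from statistics import mean
from typing import Sequence


def moving_ranges(x: Sequence[float]) -> list[float]:
    """Moving ranges R_i = |X_i - X_{i-1}|; the result is one element shorter than x."""
    return [abs(cur - prev) for prev, cur in zip(x, x[1:])]


def centre_lines(x: Sequence[float]) -> tuple[float, float]:
    """Return (MX, mR): the mean growth factor and the mean moving range."""
    return mean(x), mean(moving_ranges(x))


# Hypothetical growth factors (not the study's data):
mx, mr = centre_lines([0.20, 0.10, 0.15, 0.18])
print(f"MX = {mx:.3f}, mR = {mr:.3f}")
```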

The moving range values R were calculated in tabular form.

To construct the boundaries of the Shewhart map, the mean growth factor (MX) and the mean moving range (mR) must be determined:

MX = 0.167; mR = 0.070.

The Shewhart map for individual values should contain upper and lower bounds within which the values of a stable process should lie. Values outside these bounds indicate that the process is affected by special causes of variability. In assessing the stability of the disease growth rate, going beyond the boundaries of the Shewhart map means that there are no grounds for building a disease prognosis, since extraneous factors act in addition to the viral impact factor.

The upper natural process limit (UNPL) is obtained by multiplying the average moving range by 2.66 and adding the product to the center line of the X map:

UNPL = MX +(2.66*mR) = 0.167 + (2.66*0.070) = 0.354.

The lower natural process limit (LNPL) is obtained by multiplying the average moving range by 2.66 and subtracting the product from the center line of the X map:

LNPL = MX − (2.66*mR) = 0.167 − (2.66*0.070) = −0.020.

Since the lower natural process limit turned out to be less than 0, it is not taken into account. The resulting Shewhart map for the disease growth rate is shown in Fig. 4.
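The limit calculation can be sketched as follows. The constant 2.66 is the usual scaling factor for individuals charts, 3/d2 with d2 = 1.128 for moving ranges of two consecutive points. The function names are hypothetical. With the rounded values MX = 0.167 and mR = 0.070 quoted above, the sketch gives −0.019 and 0.353; the small differences from the values in the text are most likely due to rounding of MX and mR.

```python
from typing import Sequence


def natural_process_limits(mx: float, mr: float, k: float = 2.66) -> tuple[float, float]:
    """Return (LNPL, UNPL) = (MX - k*mR, MX + k*mR) for an individuals (X) chart."""
    return mx - k * mr, mx + k * mr


def points_outside(x: Sequence[float], lnpl: float, unpl: float) -> list[int]:
    """1-based positions of values outside [LNPL, UNPL], i.e. signals of special causes."""
    return [pos for pos, v in enumerate(x, start=1) if v < lnpl or v > unpl]


lnpl, unpl = natural_process_limits(0.167, 0.070)
print(f"LNPL = {lnpl:.3f}, UNPL = {unpl:.3f}")  # LNPL = -0.019, UNPL = 0.353
```

Passing the average moving range for points 10 to 20 (mR = 0.042) gives the narrower upper limit of about 0.279. A negative LNPL can simply be ignored, as in the text.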

Fig. 4

Shewhart map for disease growth rate, X

Analysis of the Shewhart map allows one to note the following features of the process:

  • At the beginning of the process, the growth factor exceeds the upper natural process limit.

  • In the second part of the process timeline, process stability increases.

The fact that the second point exceeds the UNPL is explained by the small amount of information on disease growth at the beginning of its spread. To test the assumption that the variability of the disease growth rate decreases, the UNPL is recalculated for points 10 to 20:

UNPL = MX + (2.66*mR) = 0.167 + (2.66*0.042) = 0.279, where mR = 0.042 is the average moving range for these points.

The assumption is confirmed: the points of the process lie within narrower boundaries.

The analysis of the stability of the disease growth rate thus shows that an algorithm for predicting the disease can be created.

Over time, individuals can move from S to I and from I to R. The simplified version of the SIR model assumes that:

  1. The number of recoveries per unit of time is proportional to the total number of infected individuals, i.e., each infected person has a fixed probability of recovering per unit of time.

  2. The number of new infections is proportional to the product S × I. This hypothesis is based on the notion that infection occurs through so-called unsafe contacts, i.e., contacts between susceptible and infected individuals. If the total number of contacts among persons per unit of time is constant and the population is sufficiently mixed, the number of hazardous contacts should be proportional to the product S × I.

The mathematical SIR model, which describes the dynamics of the number of potential patients (susceptible to a given disease), those who have fallen ill, and those who have recovered (including those who died), is represented by a system of three equations:

$$\begin{aligned} \frac{dS}{{dt}} & = -\frac{\beta IS}{N} \\ \frac{dI}{{dt}} & = \frac{\beta IS}{N} - \gamma I \\ \frac{dR}{{dt}} & = \gamma I \\ \end{aligned} \tag{1}$$

where N = S + I + R is the population size; S(t) is the number of susceptible individuals at time t; I(t) is the number of infected individuals at time t; R(t) is the number of individuals who have recovered (or died) by time t; β is the intensity factor of contacts with subsequent infection; and γ is the intensity factor of recovery of infected individuals (the inverse of the average duration of the disease).

The first equation indicates that the number of healthy but susceptible individuals decreases over time at a rate proportional to the number of contacts with infected individuals: after such a contact, infection occurs, and the susceptible person becomes infected.

The second equation shows that the number of infected people increases in proportion to the number of contacts between healthy and infected individuals and decreases as they recover. The proportionality coefficient β is one of the principal parameters of the mathematical model.

The third equation shows that the number of individuals recovering per unit of time is proportional to the number of infected individuals. In other words, each person who has fallen ill recovers after a while. The proportionality coefficient γ characterizes how a patient's body adapts to the new virus [22].
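For readers who want to experiment with the model, the right-hand side of the SIR system can be written as one function. This is a minimal sketch, not the authors' implementation; the function name and argument order are illustrative. Because the infection term leaves S and enters I, and the recovery term leaves I and enters R, the three derivatives always sum to zero.

```python
def sir_rhs(s: float, i: float, r: float, beta: float, gamma: float) -> tuple[float, float, float]:
    """Right-hand side of the SIR system: returns (dS/dt, dI/dt, dR/dt)."""
    n = s + i + r                   # N = S + I + R
    infection = beta * i * s / n    # new infections per unit time
    recovery = gamma * i            # recoveries per unit time
    return -infection, infection - recovery, recovery


# Hypothetical state: 990 susceptible, 10 infected, nobody recovered yet.
print(sir_rhs(990.0, 10.0, 0.0, beta=0.3, gamma=0.1))
```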

As can be seen from the SIR model, the disease develops according to the ‘susceptible to become infected and then recover’ pattern. The following condition:

$$\frac{dS}{{dt}} + \frac{dI}{{dt}} + \frac{dR}{{dt}} = 0$$

describes the constancy of the population size; deaths from the disease are not distinguished from recoveries. The dynamics of the number of patients is governed by a parameter called the basic reproduction number:

$$R_{0} = \frac{\beta }{\gamma }$$

It is generally accepted that the most important parameters of the SIR model are the characteristic time, i.e., the typical recovery time 1/γ, and the reproduction number R0, the ratio between the infection and recovery rates. R0 can be interpreted as the average number of people to whom an infected person transmits the virus before recovering [23,24,25]. The main feature of the SIR model is the epidemic threshold: the course of the epidemic (VDD) depends radically on whether R0 is greater or less than one. At R0 < 1, the epidemic subsides; at R0 > 1, it develops, covering a large part of the population. The extent of coverage depends on the specific value of R0, which in turn depends on the characteristics of the virus, the proportion of the population that is vaccinated or has recovered, and the measures taken to control the outbreak, such as various forms of quarantine. For example, if \(R_{0} = 2\), the number of those infected could reach approximately 80% of the total population.
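The 80% figure can be checked with the classical final-size relation of the SIR model, \(z = 1 - e^{-R_{0} z}\), where z is the fraction of the population ever infected. The sketch below solves it by fixed-point iteration, assuming constant β and γ and an almost fully susceptible initial population; it also computes R0 = β/γ for the iThink example values Bet = 0.16 and Gam = 0.045 used later in the article. The function names are illustrative.

```python
import math


def basic_reproduction_number(beta: float, gamma: float) -> float:
    """R0 = beta / gamma for the SIR model."""
    return beta / gamma


def final_epidemic_size(r0: float, tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """Fraction z of the population ever infected, solving z = 1 - exp(-R0 * z).

    Assumes S(0) is close to N and constant parameters. For R0 <= 1 the only
    root is z = 0 (no major epidemic).
    """
    if r0 <= 1.0:
        return 0.0
    z = 1.0  # start at the upper bound; the iterates then decrease to the positive root
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r0 * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


print(round(final_epidemic_size(2.0), 3))               # 0.797
print(round(basic_reproduction_number(0.16, 0.045), 2))  # 3.56
```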

Initial data for the first month of the pandemic (from 2020-04-08 to 2020-05-08) are shown in Table 2.

Table 2 Baseline data on the development of the epidemic

Solving the prediction problem begins with determining the model parameters from the actual disease data. From the equations of model (1), the following expressions (2) for the basic model parameters γ and β are obtained:

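$$\gamma = \frac{dR/dt}{I}, \qquad \beta = \frac{\left( dI/dt \right) \times N}{I \times S} + \gamma \frac{N}{S} \approx \frac{\left( dI/dt \right) \times N}{I \times S} + \gamma \tag{2}$$

where the approximate form of β, used in the calculations, holds while S ≈ N.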
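The step from the SIR equations to these parameter expressions can be written out explicitly. The third equation gives γ directly; solving the second equation for β gives an exact expression, which reduces to the simpler form with "+ γ" when the susceptible population is still close to the total population (S ≈ N), as in the first month of the outbreak.

$$\begin{aligned} \frac{dR}{dt} = \gamma I \quad & \Rightarrow \quad \gamma = \frac{dR/dt}{I} \\ \frac{dI}{dt} = \frac{\beta IS}{N} - \gamma I \quad & \Rightarrow \quad \beta = \frac{N}{S}\left( \frac{dI/dt}{I} + \gamma \right) = \frac{\left( dI/dt \right) N}{IS} + \gamma \frac{N}{S} \approx \frac{\left( dI/dt \right) N}{IS} + \gamma \\ \end{aligned}$$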

The resulting SIR relationships were used to calculate the development of the epidemic in the averaged system-dynamics model of the epidemic, represented as stocks and flows [17, 18] (Fig. 5).

Fig. 5

The average model of epidemic system dynamics in the form of flows and stocks

Modeling and calculations were performed in the iThink v9.0.2 software. Fig. 6 shows the computational window in iThink.

Fig. 6

The computational window in iThink

Stock S holds the population data, stock I the number of cases, and stock R the number of recovered individuals. The Rate_of_disease flow determines the number of new cases per unit of time, and the Rate_of_recovery flow determines the number of recoveries per unit of time. iThink makes it possible to describe the relationships between model variables and to derive the corresponding system of equations:

I(t) = I(t - dt) + (Rate_of_disease - Rate_of_recovery) * dt

INIT I = 624

Rate_of_disease = Bet*I*S/N

Rate_of_recovery = I*Gam

R(t) = R(t - dt) + Rate_of_recovery * dt

INIT R = 669

Rate_of_recovery = I*Gam

S(t) = S(t - dt) - Rate_of_disease * dt

INIT S = 199,769

Rate_of_disease = Bet*I*S/N

Bet = 0.16

Gam = 0.045

N = 200,000
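The stock-and-flow listing corresponds to a forward-Euler update of the three stocks. The following Python sketch mirrors it under the assumption of a fixed time step dt = 1 (the DT setting used in iThink is not stated in the text); the function and argument names are illustrative. Note that the listed initial stocks sum to 201,062, slightly more than N = 200,000; the sketch uses the values exactly as listed, and the Euler update keeps S + I + R at its initial total.

```python
def simulate_stock_flow(s_init, i_init, r_init, n, bet, gam, days, dt=1.0):
    """Forward-Euler integration of the S, I, R stocks from the iThink listing.

    Returns a list of (t, S, I, R) tuples, one per time step, starting at t = 0.
    """
    s, i, r = float(s_init), float(i_init), float(r_init)
    t = 0.0
    history = [(t, s, i, r)]
    for _ in range(int(round(days / dt))):
        rate_of_disease = bet * i * s / n   # Rate_of_disease = Bet*I*S/N
        rate_of_recovery = i * gam          # Rate_of_recovery = I*Gam
        s -= rate_of_disease * dt
        i += (rate_of_disease - rate_of_recovery) * dt
        r += rate_of_recovery * dt
        t += dt
        history.append((t, s, i, r))
    return history


# Values from the listing (INIT S, I, R; N; Bet; Gam), simulated for 30 days:
run = simulate_stock_flow(199_769, 624, 669, 200_000, bet=0.16, gam=0.045, days=30)
t_end, s_end, i_end, r_end = run[-1]
print(f"t = {t_end:.0f}: S = {s_end:.0f}, I = {i_end:.0f}, R = {r_end:.0f}")
```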

iThink can display the results of calculations both as graphs with curves and as tables (Figs. 7 and 8).

Fig. 7

The results of calculations both in the form of a graph with curves and in tables

Fig. 8

The results of calculations both in the form of a graph with curves and in tables

The calculation uses the following initial conditions:

N = 300,000;

I = 48;

R = 8;

S = 299,944;

\(\beta = 0.2455\,e^{-0.0414t}\);

\(\gamma = 0.0041 + 0.0005t\).


Actual data on the development of an epidemic generally consist of two datasets: the number of detected infected individuals and the number of recovered patients. The number of infected cases (I) is defined as the difference between the numbers of confirmed and recovered cases. The increments dR and dI were determined using a three-point scheme, which gives more accurate estimates of the derivatives. Parameters γ and β were determined by formulas (2). An example of the calculation of the model parameters γ and β (the full table comprises 30 rows) is given in Table 3:
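A minimal sketch of this parameter-estimation step in Python with NumPy, assuming daily cumulative series of confirmed and recovered cases. `numpy.gradient` is used here as a three-point scheme: it applies second-order central differences at interior points and one-sided differences at the ends; whether this matches the authors' exact scheme is an assumption. Since N = S + I + R and I = confirmed − recovered, the susceptible count is N − confirmed. The sketch uses the exact rearrangement of the second SIR equation for β, which reduces to the form with "+ γ" when S ≈ N. The function name is illustrative.

```python
import numpy as np


def estimate_sir_parameters(confirmed, recovered, n):
    """Return daily estimates (beta, gamma) from cumulative confirmed and recovered counts.

    The active-case count I = confirmed - recovered must be positive on every day.
    """
    confirmed = np.asarray(confirmed, dtype=float)
    recovered = np.asarray(recovered, dtype=float)
    infected = confirmed - recovered    # active cases I
    susceptible = n - confirmed         # S = N - I - R = N - confirmed
    d_r = np.gradient(recovered)        # dR/dt with a one-day step
    d_i = np.gradient(infected)         # dI/dt with a one-day step
    gamma = d_r / infected              # gamma = (dR/dt) / I
    # Exact form; approximately (dI/dt)*N/(I*S) + gamma when S is close to N.
    beta = (d_i / infected + gamma) * n / susceptible
    return beta, gamma
```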

Table 3 Calculation of model parameters

Changes in γ and β during the outbreak show that these parameters are not constant in a real epidemiological situation: they exhibit both random fluctuations and a trend (Figs. 9 and 10):

Fig. 9

Changes of parameter γ with time (G(t))

Fig. 10

Changes of the parameter β with time (B(t))

The calculated values scatter widely because only a small number of individuals recovered during the first month of the disease. As a result, only a linear function can reasonably be used to approximate the calculated recovery data:

$$\gamma = 0.0041 + 0.0005 \times t$$

The high variability of the data explains the low coefficient of determination, R² = 0.2.

The variation of the infection intensity β (B) over time suggests an exponential dependence as the approximating function:

$$\beta = 0.2455 \times e^{ - 0.0414 \times t}$$

The longer sequence of data on the infection intensity β (B) yields a higher coefficient of determination, R² = 0.6736.
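The two approximating functions can be fitted by ordinary least squares, for example with `numpy.polyfit`. This is a sketch under stated assumptions: the exponential is fitted as a straight line through ln β, which requires β > 0. The text does not say which fitting procedure was used, so coefficients and R² obtained this way need not match those reported. R² is computed here on the original scale; an R² computed on the log scale would in general differ. The function names are illustrative.

```python
import numpy as np


def fit_gamma_linear(t, gamma):
    """Least-squares line gamma(t) = a + b*t; returns (a, b)."""
    b, a = np.polyfit(t, gamma, 1)  # polyfit returns the highest-degree coefficient first
    return float(a), float(b)


def fit_beta_exponential(t, beta):
    """Fit beta(t) = A * exp(-k*t) via a straight line through ln(beta); returns (A, k)."""
    slope, intercept = np.polyfit(t, np.log(beta), 1)
    return float(np.exp(intercept)), float(-slope)


def r_squared(y, y_fit):
    """Coefficient of determination 1 - SS_res / SS_tot on the original scale."""
    y = np.asarray(y, dtype=float)
    y_fit = np.asarray(y_fit, dtype=float)
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```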

The results of the calculations are presented as charts (Figs. 11 and 12).

Fig. 11

Calculation of the number of infected I’s

Fig. 12

Results of calculating the number of R’s

A comparison of the baseline and calculated numbers of infected individuals I shows that the discrepancy becomes significant at the end of the time range. To eliminate it, the approximating function \(\beta = 0.2455\,e^{-0.0414t}\) can be adjusted: by changing its time coefficient, a value can be found that gives a better match between the calculated and baseline data (Fig. 13).

Fig. 13

Results of calculating the number of I’s infected using the modified formula

The modified formula is as follows:

$$\beta = 0.2455 \times e^{ - 0.038 \times t}$$

In this case, the coefficient in the exponent is reduced from 0.0414 to 0.038.

By changing one parameter of the approximating formula for the infection intensity, fairly satisfactory agreement between the calculated and original data can be achieved. Thus, data from the first month of the outbreak make it possible to determine the following parameters of the simulation model:

$$\begin{aligned} \beta & = 0.2455 \times e^{ - 0.038 \times t} \\ \gamma & = 0.0041 + 0.0005 \times t \\ \end{aligned}$$

where t is the discrete calendar time since the start of the epidemic. At any given time point, individuals in the susceptible compartment may become infected through contact with individuals in the infectious compartment.

The equation for involves time-shifted terms and can be solved explicitly. Here \(e^{x}\) is the exponential function whose base is the number e (Euler's number); it is also written \(\exp(x)\).

Here β is the intensity factor of contacts with subsequent infection, and γ is the intensity factor of recovery of infected individuals. Based on these dependencies, the expected development of the epidemic over the following month, from May 8 to June 8, was calculated (Fig. 14).
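The forecast step can be sketched as a forward-Euler integration of the SIR model with the fitted time-dependent parameters. Assumptions not stated in the text: a daily step dt = 1, and t = 0 corresponding to the first day of Table 2 (April 8, 2020), so that 61 steps reach June 8. The function names are illustrative; the initial conditions are those listed in the text.

```python
import math


def beta_t(t: float) -> float:
    """Adjusted infection intensity beta(t) = 0.2455 * exp(-0.038 t)."""
    return 0.2455 * math.exp(-0.038 * t)


def gamma_t(t: float) -> float:
    """Recovery intensity gamma(t) = 0.0041 + 0.0005 t."""
    return 0.0041 + 0.0005 * t


def forecast_sir(s_init, i_init, r_init, n, days, dt=1.0):
    """Forward-Euler integration of the SIR model with time-varying beta(t), gamma(t).

    Returns a list of (t, S, I, R) tuples, one per step, starting at t = 0.
    """
    s, i, r, t = float(s_init), float(i_init), float(r_init), 0.0
    path = [(t, s, i, r)]
    for _ in range(int(round(days / dt))):
        new_infections = beta_t(t) * i * s / n * dt
        new_recoveries = gamma_t(t) * i * dt
        s, i, r = s - new_infections, i + new_infections - new_recoveries, r + new_recoveries
        t += dt
        path.append((t, s, i, r))
    return path


# Initial conditions from the text: N = 300,000, I = 48, R = 8, S = 299,944.
path = forecast_sir(299_944, 48, 8, 300_000, days=61)
print(f"day {path[-1][0]:.0f}: I = {path[-1][2]:.0f}, R = {path[-1][3]:.0f}")
```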

Fig. 14

Results of predicting the number of I’s infected

A comparison of the predicted values with data on the evolution of the epidemic in May–June shows that the forecast corresponds to the actual situation.

The forecast was assessed using the following criterion:

$$\sigma \left( X \right) = M\left[ {\left( {\frac{{X_{ip} - X_{i} }}{{X_{i} }}} \right)^{2} } \right]$$

where σ is the mean square of the relative prediction error of the epidemic indicator; M denotes averaging over the forecast interval; \(X_{ip}\) is the forecast value of the epidemic parameter in the forecast interval; and \(X_{i}\) is the actual value of the epidemic parameter in the forecast interval.
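The accuracy criterion translates directly into code. This is a minimal sketch; the function name is illustrative, and the actual values \(X_{i}\) must be non-zero.

```python
from typing import Sequence


def mean_square_relative_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """sigma(X): mean over the forecast interval of ((X_ip - X_i) / X_i) ** 2."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum(((p - a) / a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical forecast that is 10% off in both directions:
print(mean_square_relative_error([110.0, 90.0], [100.0, 100.0]))
```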

The accuracy of the prediction was estimated for the number of infected individuals (I) and the number of people who have had the disease (R) (Table 4).

Table 4 Evaluation of the COVID-19 outbreak prediction accuracy

The calculations yielded the following results (Fig. 15):

  • Mean square of the relative error for the number of infected: σ(I) = 0.0129;

  • Mean square of the relative error for the number of people who have had the disease: σ(R) = 0.0058.

Fig. 15

Results of predicting the number of people who have had a disease


To date, a whole family of successor SIR models with more complex prognostic algorithms is known, because at different times and under different circumstances scientists have had to consider a variety of additional epidemic factors [26,27,28]:

  • SIRS—susceptible-infected-recovered-susceptible. Suitable to describe the propagation dynamics of infections following which only temporary immunity occurs.

  • SEIR—susceptible-exposed-infected-recovered. It considers the particularities of the spread of infections during a pronounced incubation period.

  • SIS—susceptible-infected-susceptible. Suitable for describing the spread of infections without developing immunity.

  • MSEIR—maternal derived immunity-susceptible-exposed-infected-recovered. The model considers children whose immunity is acquired in utero.
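For comparison with the SIR system, the SEIR model in its standard textbook form adds an exposed (infected but not yet infectious) compartment E. Here κ, a symbol chosen for this illustration to avoid a clash with σ, which this article uses for the forecast error, is the rate of leaving the latent stage, i.e., the inverse of the mean incubation period, and N = S + E + I + R.

$$\begin{aligned} \frac{dS}{dt} & = -\frac{\beta IS}{N} \\ \frac{dE}{dt} & = \frac{\beta IS}{N} - \kappa E \\ \frac{dI}{dt} & = \kappa E - \gamma I \\ \frac{dR}{dt} & = \gamma I \\ \end{aligned}$$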

Although these mathematical models are more accurate, they do not take into account the scope of anti-epidemic and quarantine measures, such as strict observance of social distancing and mask wearing; climatic conditions; the age of the population; the logistics of passenger flows between countries and regions; features of immunity (except in MSEIR); and other factors that can significantly affect forecast accuracy. Their major drawback, however, is their complexity, which makes them much less convenient for forecasting and assessing the situation in emergencies, when decisions must be made with insufficient time, resources, and reliable information [29,30,31]. An example is the SEIR model whose algorithms were used to make decisions on introducing restrictive measures in the city of Chicago; unfortunately, it could not make a significant contribution to stopping the spread of the epidemic in the United States [32].

Chinese experts also performed mathematical modeling of the COVID-19 pandemic with a SEIR variant, considering the peculiarities of population migration in China, the role of public health, and the variability of the incubation period. This modeling served as the basis for the system of restrictive measures introduced by the country's government, which helped to combat the pandemic. However, any human-caused suppression of the natural course of an epidemic interferes with the free development of events and changes the conditions for the formation of herd immunity, which may even make the population more vulnerable to repeated epidemic waves [33, 34].

In March–May 2020, a group of Russian scientists predicted the peak of the outbreak in Moscow (with an error of 2 days and 174 fewer detected cases than were actually recorded) using two modifications of the SEIR model, SEIR-D and SEIR-HCD. They proposed forecast scenarios for disease progression in Moscow and the Novosibirsk region for different amounts of test data. At the same time, they argued that coarser models with fewer homogeneous groups are justified only when larger statistical datasets are available and the forecast horizon is shorter. However, apart from the mortality parameter, no additional constraints on model parameters were imposed for modeling and prediction. They also emphasized that cluster calculations were not needed [35].

The risk of infection is by no means constant over time. As a pandemic progresses, control strategies change the frequency of human contact, which simpler models treat as constant: countermeasures such as masks, self-isolation, and social distancing reduce contact rates and slow the spread. When the contact rate varies over time, the dynamics can exhibit nonlinear parametric resonance. For a time-varying transmission rate β(t) with period T, recovery rate γ, and natural birth and death rate μ, the threshold condition is

$$\frac{1}{T}\int_{0}^{T}\frac{\beta(t)}{\mu+\gamma}\,dt < 1 \;\Rightarrow\; \lim_{t\to+\infty}\bigl(S(t),I(t)\bigr) = \mathrm{DFE} = (N,0),$$

that is, if the time-averaged reproduction number is below 1, the solution converges to the disease-free equilibrium (DFE), in which the whole population N is susceptible and no one is infected. If this average is greater than 1, the disease does not die out, and resonances may occur. For example, when a periodically varying contact rate is treated as the system input, the output can be a periodic function whose period is a multiple of the input period. The relationship between the period of contact-rate fluctuations and the pseudo-period of damped oscillations near the endemic equilibrium can therefore explain periodic outbreaks of infectious diseases. Infectious disease waves can occasionally even behave chaotically or quasi-periodically [33]. Valdez et al. recently developed a model for estimating the likelihood of global spread of a pandemic [36]. A crucial question for any dynamical system is how a stable equilibrium is eventually reached: do trajectories oscillate as they approach it, or do they approach it smoothly? In the SIR model with births and deaths, trajectories approach the endemic equilibrium through oscillations whose amplitude decreases over time.
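The threshold condition can be checked numerically for any given β(t). The sketch below assumes a hypothetical seasonally forced rate β(t) = β₀(1 + ε cos(2πt/T)) and constant μ and γ (illustrative values, not estimates from this study), and approximates the time average with the trapezoidal rule. For this particular choice the cosine term averages to zero over a full period, so the result should equal β₀/(μ + γ).

```python
import math


def averaged_r0(beta, mu, gamma, period, steps=10_000):
    """Approximate (1/T) * integral_0^T beta(t) / (mu + gamma) dt with the trapezoidal rule."""
    h = period / steps
    total = 0.5 * (beta(0.0) + beta(period))
    for k in range(1, steps):
        total += beta(k * h)
    return total * h / period / (mu + gamma)


if __name__ == "__main__":
    beta0, eps, T = 0.25, 0.4, 365.0      # hypothetical forcing parameters (days)
    mu, gamma = 1 / (70 * 365), 1 / 7     # hypothetical birth/death and recovery rates (per day)

    def beta(t):
        # Hypothetical seasonal transmission rate with period T
        return beta0 * (1 + eps * math.cos(2 * math.pi * t / T))

    r = averaged_r0(beta, mu, gamma, T)
    print(f"time-averaged R0 = {r:.3f}")
    print("converges to DFE" if r < 1 else "disease may persist (resonances possible)")
```

Only the sufficient condition stated above is tested here; whether resonances actually appear when the average exceeds 1 depends on the full dynamics.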

An attractive aspect of the SIR model is that it can easily be modified to simulate vaccination, usually by adding a separate compartment V for vaccinated individuals. For a periodically fluctuating epidemic situation, this approach yields an infection-free periodic solution, which is attractive even on a global scale if the vaccination rate is high enough. It also shows that if vaccination coverage is below a critical value, the disease persists [37].
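A minimal sketch of the vaccination extension follows. It assumes a perfect vaccine given to a fraction p of the population before the outbreak, which is a simplification of the periodic setting described above. Vaccinated people sit in a separate compartment V and are never infected, so the outbreak can grow only if R₀(1 − p) > 1. This gives the standard critical coverage p_c = 1 − 1/R₀. All parameter values are hypothetical.

```python
def critical_coverage(r0):
    """Vaccination coverage above which an outbreak cannot grow: p_c = 1 - 1/R0."""
    return max(0.0, 1.0 - 1.0 / r0)


def sirv_peak(r0, gamma, coverage, n=1_000_000, i0=100, days=365, dt=0.1):
    """Forward-Euler run of SIR plus a vaccinated compartment V.

    A fraction `coverage` of the population is vaccinated before the outbreak
    (hypothetical perfect vaccine); returns the peak number of infected people.
    """
    beta = r0 * gamma
    v = coverage * n                 # V: immune from the start, never infected
    i = float(i0)
    s = n - v - i                    # the remaining population is susceptible
    peak = i
    for _ in range(int(days / dt)):
        new_infections = beta * s * i / n
        recoveries = gamma * i       # R itself is not needed to track the peak
        s -= dt * new_infections
        i += dt * (new_infections - recoveries)
        peak = max(peak, i)
    return peak


if __name__ == "__main__":
    r0, gamma = 3.0, 0.1             # hypothetical
    print(f"critical coverage: {critical_coverage(r0):.2f}")
    for p in (0.5, 0.8):
        print(p, round(sirv_peak(r0, gamma, p)))
```

Below the critical coverage the infected count rises above its initial value; above it, the number of infected people only declines.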

Extensions of the SIR model can also describe the effects of lockdowns within a population. In particular, the decay of an epidemic over time can be simulated with a lattice network model of epidemic spread based on concepts from percolation theory. In statistical physics and mathematics, percolation theory describes how the connectivity of a network changes as nodes or links are added. To determine accurately how far a virus has spread across a region or the entire country, and to impose appropriate lockdowns in each specific location, it is crucial to track infected people and their movements [38].
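The percolation view can be illustrated with a short sketch. It assumes a random contact graph with Poisson-like degrees (mean degree c) rather than the lattice model of [38]. Each contact is assumed to transmit infection independently with probability T (the transmissibility). Keeping each edge with probability T and finding connected clusters gives the possible outbreak sizes. For this kind of graph, large outbreaks become possible roughly when cT > 1. The graph model, function name, and parameter values are illustrative assumptions.

```python
import random


def largest_outbreak_fraction(n, mean_degree, transmissibility, seed=0):
    """Bond percolation on a random graph: size of the largest infection cluster divided by n."""
    rng = random.Random(seed)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    n_edges = int(mean_degree * n / 2)          # about mean_degree contacts per person
    for _ in range(n_edges):
        u, v = rng.randrange(n), rng.randrange(n)
        if u == v:
            continue                            # skip self-loops
        if rng.random() < transmissibility:     # edge "occupied": infection can pass along it
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv

    sizes = {}
    for node in range(n):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n


if __name__ == "__main__":
    for t in (0.1, 0.2, 0.3, 0.5, 0.8):
        frac = largest_outbreak_fraction(20_000, 4.0, t)
        print(f"T = {t:.1f}, c*T = {4.0 * t:.1f}, largest outbreak = {frac:.3f}")
```

Lowering T (for example, through lockdowns that remove contacts) pushes cT below 1, and the largest possible outbreak collapses from a finite fraction of the population to a handful of people.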

The SIR model is the basic one. In its classical form, it does not allow for vaccination of the population; herd immunity is assumed to be acquired naturally. The standard SIR model often seems too simple and unrealistic, since it ignores the life cycle of the population and assumes that a person becomes infectious immediately after infection. In the SEIR model, by contrast, the infection has a latent period during which individuals are infected but not yet infectious. Although the SIR and SEIR models behave similarly at equilibrium (when the parameters are scaled appropriately), the SEIR model grows more slowly after a pathogen is introduced, because individuals must first pass through the exposed class before they can contribute to transmission. Another serious drawback of SIR is that people who died from the disease are counted among the recovered [39]. Nevertheless, the basic SIR model is the starting point for more complex models that include demographic groups with different health risks, the impact of public health interventions, natural birth and death rates, and stochasticity. New pathogens that could cause a pandemic, and that humanity has yet to defeat, are constantly emerging. Science therefore continues to offer new methods for predicting how infections behave, and mathematical modeling based on SIR plays a major role here. The topic is broad and highly relevant today, when humanity has been dealing with coronavirus infection for several years.
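The slower initial growth of SEIR can be checked with a short simulation. The sketch assumes hypothetical rates: `beta` and `gamma` shared by both models, plus an incubation rate `sigma` (mean latent period 1/σ days) for SEIR. It uses forward-Euler integration and returns the day on which the number of infectious people peaks.

```python
def peak_day(beta, gamma, sigma=None, n=1_000_000, i0=10, days=400, dt=0.05):
    """Day of the infectious peak; SIR if sigma is None, otherwise SEIR with latent rate sigma."""
    s, e, i = n - i0, 0.0, float(i0)
    best_i, best_t = i, 0.0
    for step in range(1, int(days / dt) + 1):
        infections = beta * s * i / n
        if sigma is None:
            onset = infections           # SIR: newly infected people are infectious at once
        else:
            onset = sigma * e            # SEIR: exposed people become infectious at rate sigma
            e += dt * (infections - onset)
        s -= dt * infections
        i += dt * (onset - gamma * i)
        if i > best_i:
            best_i, best_t = i, step * dt
    return best_t


if __name__ == "__main__":
    beta, gamma, sigma = 0.3, 0.1, 0.2   # hypothetical: R0 = 3, 5-day mean latent period
    print("SIR peak day: ", round(peak_day(beta, gamma)))
    print("SEIR peak day:", round(peak_day(beta, gamma, sigma)))
```

With these values the SEIR epidemic peaks noticeably later, even though both models share the same R₀ = β/γ. The extra exposed stage delays each new generation of infections.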

The findings of this study thus confirm the effectiveness of simpler SIR models for operational forecasting. Their lower precision can be compensated by adaptive parameter calibration based on monitoring the current situation and updating data in real time. The forecast results were entirely consistent with the observed evolution of the outbreak. The available data were split as follows: data from the first month of the epidemic were used to build the forecast model, and data from the second month were used to compare the forecasts with actual values. The equations describing how the model parameters γ and β change over time were derived from the first month's data.
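One common way to calibrate β and γ from daily counts is to invert a discrete-time SIR model. This is an assumption, not necessarily the authors' exact procedure. With S = N − I − R, the daily recovery rate is γₜ = ΔRₜ / Iₜ and the daily transmission rate is βₜ = N(ΔIₜ + ΔRₜ) / (Sₜ Iₜ). The helper `mean_sq_rel_error` computes the mean of squared relative errors between forecast and observed values. This is one reading of the "mean square of the relative error" reported for I and R; the exact definition used in the study is not given in this section. Function names and the synthetic data are hypothetical.

```python
def estimate_daily_rates(infected, recovered, n):
    """Invert a discrete-time SIR model; returns lists of daily beta_t and gamma_t."""
    betas, gammas = [], []
    for t in range(len(infected) - 1):
        i_t, r_t = infected[t], recovered[t]
        s_t = n - i_t - r_t                          # susceptible, assuming a closed population
        d_r = recovered[t + 1] - r_t                 # new recoveries on day t
        d_i = infected[t + 1] - i_t                  # net change in active infections
        gammas.append(d_r / i_t)
        betas.append(n * (d_i + d_r) / (s_t * i_t))  # new infections = dI + dR
    return betas, gammas


def mean_sq_rel_error(forecast, actual):
    """Mean of ((forecast - actual) / actual)^2 over paired observations."""
    pairs = list(zip(forecast, actual))
    return sum(((f - a) / a) ** 2 for f, a in pairs) / len(pairs)


def simulate_discrete_sir(beta, gamma, n, i0, days):
    """Discrete-time SIR with constant rates, used here only to produce synthetic data."""
    s, i, r = n - i0, float(i0), 0.0
    infected, recovered = [i], [r]
    for _ in range(days):
        new_inf, new_rec = beta * s * i / n, gamma * i
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        infected.append(i)
        recovered.append(r)
    return infected, recovered


if __name__ == "__main__":
    n = 1_000_000                                              # hypothetical population size
    inf, rec = simulate_discrete_sir(0.25, 0.08, n, 50, 60)    # synthetic 61-day series
    betas, gammas = estimate_daily_rates(inf[:31], rec[:31], n)  # "month 1": calibration
    b_hat, g_hat = sum(betas) / len(betas), sum(gammas) / len(gammas)
    pred_inf, _ = simulate_discrete_sir(b_hat, g_hat, n, 50, 60)
    print(mean_sq_rel_error(pred_inf[31:], inf[31:]))          # "month 2": evaluation
```

On noise-free synthetic data, the recovered rates match the ones used to generate the series, so the error is essentially zero. With real surveillance data, the daily estimates fluctuate, and it is their trend that is modeled.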

Because COVID-19 is caused by a new virus, its R0 is not known exactly. Estimates of R0 during the Wuhan outbreak ranged from 2.2 to 6.5 across studies. A particular difficulty with the coronavirus is its long incubation period of up to 14 days. In addition, a very large proportion of carriers probably remain asymptomatic while actively spreading the infection. R0 may even differ between segments of society within the same city, so the question of determining R0 for the coronavirus remains open. At R0 < 1 the epidemic subsides, while at R0 > 1 it spreads. In this study, R0 decreased exponentially over time, from R0 = 5.0 to values below 1.
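If R0 declines exponentially, R0(t) = R0(0)·e^(−kt), the day on which it falls below 1 follows directly: t* = ln R0(0) / k. In the sketch below, only the starting value R0(0) = 5.0 comes from the text; the decay rate `k` is a hypothetical placeholder. The SIR relation R0 = β/γ is included to show how fitted rates translate into R0.

```python
import math


def r0_sir(beta, gamma):
    """Basic reproduction number of the SIR model: R0 = beta / gamma."""
    return beta / gamma


def day_r0_drops_below_one(r0_start, decay_rate):
    """For R0(t) = R0(0) * exp(-k t), solve R0(t*) = 1, i.e. t* = ln(R0(0)) / k."""
    if r0_start <= 1:
        return 0.0                       # already subcritical
    return math.log(r0_start) / decay_rate


if __name__ == "__main__":
    k = 0.05                             # hypothetical decay rate per day
    t_star = day_r0_drops_below_one(5.0, k)
    print(f"R0 falls below 1 after about {t_star:.1f} days")
```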

The authors also wish to note recent advances in neural network architectures, which could be used to support the proposed model and make it more efficient. A trained and debugged neural network typically transfers poorly from one task to another, so it usually has to be retrained for each specific task. Google Brain has made significant progress on this problem of universality. Its network, designed to work on sets of tasks from different domains simultaneously, had a complex architecture with separate blocks for processing different kinds of input data and generating a result [40]. Expert users sometimes need information about data labeling (markup). These labels are stored in a database and are available to the orchestrator, together with datasets for training, validation, or testing. Which data should be labeled can be decided manually or programmed into the orchestrator. Typically, production inputs are selected where the network gave the correct answer but with low confidence, or where it was confident but wrong. This selection rule is one of the foundations of active machine learning.
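The selection rule described above can be written as a small filter. It sends two kinds of production inputs for labeling: those the network got right but with low confidence, and those it got wrong with high confidence. This is a hypothetical sketch. The `Prediction` record, its field names, and the confidence thresholds are assumptions. "Correct" presupposes that ground truth later becomes available.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    sample_id: str
    predicted: str
    actual: str          # ground truth, once it becomes known
    confidence: float    # model's probability for its predicted label, in [0, 1]


def select_for_labeling(preds, low=0.6, high=0.9):
    """Pick samples that were correct but uncertain, or confident but wrong."""
    chosen = []
    for p in preds:
        correct = p.predicted == p.actual
        if correct and p.confidence < low:
            chosen.append(p.sample_id)       # right answer, low confidence
        elif not correct and p.confidence >= high:
            chosen.append(p.sample_id)       # confidently wrong
    return chosen


if __name__ == "__main__":
    batch = [
        Prediction("a", "rise", "rise", 0.95),
        Prediction("b", "rise", "rise", 0.55),
        Prediction("c", "fall", "rise", 0.97),
        Prediction("d", "fall", "rise", 0.70),
    ]
    print(select_for_labeling(batch))
```

The thresholds would in practice be tuned by the orchestrator's operators; samples that are confidently right, or wrong but already flagged as uncertain, add little new information.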

In implementing a full-scale, inclusive health policy, the state provided the Nizhny Novgorod region with all the necessary organizational, material, and financial support. Acting in full compliance with the recommendations of the Government of the Russian Federation, the Government of the Nizhny Novgorod Region took measures to prevent both the spread of COVID-19 to neighboring regions and its introduction from them. These measures included:
- strict controls on population movement, with universal checks to identify infected persons and disinfection of vehicles;
- a switch of educational institutions to online learning;
- the closure of catering establishments, shopping centers, and similar venues;
- the transfer of as many workers as possible to remote work.

Public events were also canceled and the size of gatherings was limited. Public transport was reduced, and people (especially children and people of retirement age) were required to stay at home. Quarantine measures were introduced in health facilities, and travel was restricted both within the country and abroad. These measures were very effective in reversing the upward trend in COVID-19 cases and deaths.

A key feature of state health policy during the pandemic was the self-isolation regime introduced in the Nizhny Novgorod region, which almost completely stopped population movement between neighboring regions. For this reason, other regions were not included in the simulation model. Including them would have made the model unnecessarily cumbersome and could have reduced forecasting accuracy. Regarding the healthcare policy of this constituent entity of the Russian Federation, many medical institutions had, for objective reasons, to postpone some non-urgent planned surgical interventions and procedures during the pandemic, since many medical staff were redeployed to COVID-19 hospitals. However, this did not lead to negative consequences such as increased mortality or social tension. Such indirect effects of the COVID-19 epidemic were often more severe in other countries, where poor citizens and disadvantaged groups were more likely to miss out on basic health care. In the Nizhny Novgorod region, and in Russia as a whole, these unfavorable effects were avoided, thanks in part to the use of the model proposed here.

As the incidence of COVID-19 declined, the temporarily suspended medical services were promptly resumed. The leadership of the Ministry of Health of the Nizhny Novgorod Region based its decisions on changes in service provision on accurate and timely data on the real demand for the main set of essential medical services. The COVID-19 pandemic exposed dangerous gaps in health system preparedness, coverage, and access to health services, and simulation models should play an important role in preventing such risks.


During an epidemic, forecasting is needed to determine how many hospital beds and ventilators will be required and how many months' supply of personal protective equipment for medical staff should be stocked. Officials must decide whether the current restrictions and prohibitions are adequate or whether more are needed. To make the best use of resources, measures that prove ineffective should be withdrawn. This keeps the model consistent with reality, so it can be used to assess the effectiveness of each newly adopted measure. The SIR model and its modifications can explain and account for a great deal, but none of them can predict the distant future with complete accuracy. The task is to find the balance at which a model is simple enough to be accessible. Alexander Pope wrote about this: “There is a certain majesty in simplicity which is far above all the quaintness of wit.”

The algorithm described in this paper provides a quantitative basis for prediction, which is needed to understand the VDD and the impact of surveillance activities on it. At the same time, the key information for regional decision-makers should be real-time monitoring data on the epidemiological situation. Based on the SIR model, a forecasting algorithm using real-time COVID-19 data was proposed. The data analysis revealed an interesting feature: the infection parameter β is not constant but decreases exponentially. A thorough examination of this finding is beyond the scope of this paper. The proposed algorithm is fairly reliable, although over a long forecast period its results naturally diverge numerically from the actual data. Over a 10–15 day horizon, the forecasts typically agree with the actual data. The authors believe that a model-based approach to disease prediction is much more valuable than purely formal mathematical methods, because it takes into account the functional relationships between epidemiological variables.
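The exponential decline of β can be quantified with an ordinary least-squares fit of ln βₜ against t, giving β(t) ≈ β₀e^(−kt). This is a minimal sketch assuming daily β estimates are already available. The numbers below are synthetic, not the study's data, and the function names are hypothetical. The fitted curve can then be extrapolated over a short horizon, such as the 10–15 days over which the forecasts were found to be reliable.

```python
import math


def fit_exponential_decay(days, betas):
    """Least-squares fit of ln(beta) = ln(beta0) - k*t; returns (beta0, k)."""
    n = len(days)
    logs = [math.log(b) for b in betas]
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    var = sum((t - mean_t) ** 2 for t in days)
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope


def extrapolate(beta0, k, days_ahead):
    """Evaluate the fitted curve beta0 * exp(-k t) at future days."""
    return [beta0 * math.exp(-k * t) for t in days_ahead]


if __name__ == "__main__":
    days = list(range(30))
    betas = [0.5 * math.exp(-0.04 * t) for t in days]   # synthetic, noise-free
    beta0, k = fit_exponential_decay(days, betas)
    print(f"beta0 = {beta0:.3f}, k = {k:.3f}")
    print([round(b, 4) for b in extrapolate(beta0, k, range(30, 45))])
```

Fitting in log space turns the exponential into a straight line, so no iterative optimizer is needed. The trade-off is that relative, rather than absolute, errors in β are minimized.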

Thus, in the event of repeated waves of the new coronavirus infection, the proposed algorithm can serve as a convenient prognostic tool. It can help authorities adopt adequate measures in time and prevent the spread of COVID-19 both in a particular region and in the country as a whole. However, despite the great importance of predictive algorithms, they can never fully substitute for objective reality. Their role is to serve as a useful source of additional information, which will certainly help in controlling the spread of this dangerous infectious disease. To enhance the effectiveness of these measures, constant and close professional interaction should be established between epidemiological forecasters and public health authorities. Once this crisis is over, it will be of paramount importance to continue research on the problematic aspects of forecasting.

Availability of data and materials

The data are available from the corresponding author (Andrey Reshetnikov) upon reasonable request.


  1. Jaimes JA, André NM, Chappie JS, Millet JK, Whittaker GR. Phylogenetic analysis and structural modeling of SARS-CoV-2 spike protein reveals an evolutionary distinct and proteolytically sensitive activation loop. J Mol Biol. 2020;432(10):3309–25.

  2. Ullah SMA, Islam MM, Mahmud S, Nooruddin S, Raju MTU, Haque MR. Scalable telehealth services to combat novel coronavirus (COVID-19) pandemic. SN Comput Sci. 2021;2:18.

  3. Islam MM, Ullah SMA, Mahmud S, Raju SMTU. Breathing Aid devices to support novel coronavirus (COVID-19) infected patients. SN Comput Sci. 2020;1:274.

  4. Islam MM, Mahmud S, Muhammad LJ, Islam MR, Nooruddin S, Ayon SI. Wearable technology to assist the patients infected with novel coronavirus (COVID-19). SN Comput Sci. 2020;1:320.

  5. World Health Organization. Weekly Epidemiological Update. 2021. Accessed 12 Dec 2021.

  6. Ministry of Health. Government approves: lockdown extended. 2021. Accessed 12 Dec 2021.

  7. The White House. The Biden-Harris plan to beat COVID-19. 2021. Accessed 12 Dec 2021.

  8. Hui DS, Azhar EI, Madani TA, Ntoumi F, Kock R, Dar O, et al. The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health—the latest 2019 novel coronavirus outbreak in Wuhan, China. Int J Infect Dis. 2020;91:264–6.

  9. Abaeva OP. Attitude of students and teachers to distance learning during the epidemic of new-type coronavirus infection. Chief Physic. 2020;6:36–43.

  10. Reshetnikov AV, Prisyazhnaya NV, Pavlov SV, Vyatkina NY. Perception of the COVID-19 pandemic by Moscow residents. Sotsiologicheskie issledovaniia. 2020;7:138–43.

  11. Chinazzi M, Davis JT, Ajelli M, Gioannini C, Litvinova M, et al. The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science. 2020;368:395–400.

  12. Daley DJ, Gani J. Epidemic models: an introduction Cambridge studies in mathematical biology. Hum Biol. 2000;72(5):903–5.

  13. Albert R, Barabási AL. Statistical mechanics of complex networks. Rev Mod Phys. 2002;74(1):47–97.

  14. Bollt EM, Ben-Avraham D. What is special about diffusion on scale-free nets? New J Phys. 2005;7(1):26.

  15. Song C, Havlin S, Makse HA. Self-similarity of complex networks. Nature. 2005;433(7024):392–5.

  16. Orlova E, Smirnova L, Grabovskaya O, Kayumova L. Netherton syndrome in combination with iron-deficiency anemia. J Glob Pharma Technol. 2020;12(1):12–21.

  17. Rahman MM, Islam MM, Manik MMH, Islam MR, Al-Rakhami MS. Machine learning approaches for tackling novel coronavirus (COVID-19) pandemic. SN Comput Sci. 2021;2:384.

  18. Islam MZ, Islam MM, Asraf A. A combined deep CNN-LSTM network for the detection of novel coronavirus (COVID-19) using X-ray images. Inform Med Unlocked. 2020;20:100412.

  19. Saha P, Sadi MS, Islam MM. EMCNet: automated COVID-19 diagnosis from X-ray images using convolutional neural network and ensemble of machine learning classifiers. Inform Med Unlocked. 2021;22:100505.

  20. Islam MM, Islam MZ, Asraf A, Ding W. Diagnosis of COVID-19 from X-rays using combined CNN-RNN architecture with transfer learning. MedRxiv. 2021; doi: Accessed 12 Dec 2021.

  21. Dietz K. The first epidemic model: a historical note on PD En’ko. Aust N Z J Stat. 1988;30(1):56–65.

  22. Hsieh YH, Liu J, Tzeng YH, Wu J. Impact of visitors and hospital staff on nosocomial transmission and spread to community. J Theor Biol. 2014;356:20–9.

  23. Hethcote HW. The mathematics of infectious diseases. SIAM Rev. 2000;42(4):599–653.

  24. Berdutin VA, Zaporozhtsev AV. Mathematical modeling of the COVID-19 distribution dynamics in a large industrial region of the Russian Federation. Chief Physic. 2020;6:6–19.

  25. Sterman JD. Business dynamics systems thinking and modeling for a complex world. New York: McGraw-Hill; 2000.

  26. Kiss IZ, Miller JC, Simon PL. Mathematics of epidemics on networks. New York: Springer; 2017.

  27. Noll NB, Aksamentov I, Druelle V, Badenhorst A, Ronzani B, Jefferies G, Albert J, Neher RA. COVID-19 Scenarios: an interactive tool to explore the spread and associated morbidity and mortality of SARS-CoV-2. MedRxiv. 2020; doi: Accessed 12 Dec 2021.

  28. Volz E. SIR dynamics in random networks with heterogeneous connectivity. J Math Biol. 2008;56(3):293–310.

  29. Getz WM, Dougherty ER. Discrete stochastic analogs of Erlang epidemic models. J Biol Dyn. 2018;12(1):16–38.

  30. Timergalieva VR, Khusnutdinov RR, Musina RR, Elizarova ES, Alsynbaev RR, Nasibullin SF, Moustafine RI. Development of orodispersible ibuprofen tablets based on a polymer-drug complex. Drug Dev Registr. 2022;11(3):113–20.

  31. Semivelichenko ED, Ermolaeva AA, Ponomarenko VV, Novoselov AV, Plisko GA, Ivkin DY, Antonov VG, Karev VE, Titovich IA, Eremin AV. Study of the effectiveness of drugs based on molecular complexes of adenosine-polymer on the model of thermal burn. Drug Dev Registr. 2022;11(3):209–19.

  32. Read JM, Bridgen JR, Cummings DA, Ho A, Jewell CP. Novel coronavirus 2019-nCoV: early estimation of epidemiological parameters and epidemic predictions. MedRxiv. 2020; doi: Accessed 12 Dec 2021.

  33. Chen Y, Cheng J, Jiang Y, Liu K. A time delay dynamical model for outbreak of 2019-nCoV and the parameter identification. J Inverse Ill Posed Probl. 2020;28(2):243–50.

  34. Tang B, Wang X, Li Q, Luigi Bragazzi N, Tang S, Xiao Y, Wu J. Estimation of the transmission risk of the 2019-nCoV and its implication for public health interventions. J Clin Med. 2020;9(2):462.

  35. Krivorot’ko OI, Kabanikhin SI, Zyat’kov NY, Prikhod’ko AY, Prokhoshin NM, Shishlenin MA. Mathematical modeling and forecasting of COVID-19 in Moscow and Novosibirsk region. Numer Anal Appl. 2020;13(4):332–48.

  36. Valdez LD, Braunstein LA, Havlin S. Epidemic spreading on modular networks: the fear to declare a pandemic. Phys Rev E. 2020;101(3):032309.

  37. Yang Z, Zeng Z, Wang K, Wong SS, Liang W, Zanin M. Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions. J Thorac Dis. 2020;12(3):165.

  38. Croccolo F, Roman HE. Spreading of infections on random graphs: a percolation-type model for COVID-19. Chaos Solit Fractals. 2020;139:110077.

  39. Zhumartova BO, Ysmagul RS. Application of the SIR model in epidemic modeling. Int J Humanit Nat Sci. 2021;12–2(63):1–9.

  40. Kaiser L, Gomez AN, Shazeer N, Vaswani A, Parmar N, Jones L, Uszkoreit J. One model to learn them all. arXiv preprint arXiv:1706.05137. 2017.


Not applicable.


This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Authors and Affiliations



All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by AR, VB, AZ, and SR. The first draft of the manuscript was written by OA, NP, NV, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Andrey Reshetnikov.

Ethics declarations

Ethics approval and consent to participate

Not applicable. No administrative permissions were required to access the “COVID-19 data in the Nizhny Novgorod region” described in the study.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Reshetnikov, A., Berdutin, V., Zaporozhtsev, A. et al. Predictive algorithm for the regional spread of coronavirus infection across the Russian Federation. BMC Med Inform Decis Mak 23, 48 (2023).
