The coming Omicron waves and factors affecting their spread after China reopened its borders

The Chinese government relaxed the Zero-COVID policy on Dec 15, 2022, and reopened the border on Jan 8, 2023. COVID prevention in China therefore faces new challenges. Although there are plenty of prior studies on COVID, none predicts daily confirmed cases and medical resource needs after China reopens its borders. To fill this gap, this study combines the Erdos-Renyi network, a modified computational SEIRS model, and Python code, rather than relying only on mathematical formulas or computer simulations as previous studies did. The research setting is Shanghai, a representative city in China, so the results also indicate the situation in other regions of China. According to population distribution and migration characteristics, we divided Shanghai into six epidemic research areas and built a COVID spread model on Erdos-Renyi networks. We then used Python code to simulate COVID spread based on the modified SEIRS model. The results indicate that the second and third waves will occur in July–September and October–December, respectively. At the peak of the epidemic in 2023, daily confirmed cases will reach 340,000, and cumulative deaths will be about 31,500. Moreover, 74,000 hospital beds and 3,700 Intensive Care Unit (ICU) beds will be occupied in Shanghai. Shanghai therefore faces a shortage of medical resources.
In this simulation, the predictions of daily confirmed cases rely heavily on the transmission, migration, and waning immunity rates. The study builds a mixed-effect model to further verify the effect of these three parameters on new confirmed cases. The results demonstrate that the migration and waning immunity rates are the two significant parameters for COVID spread and daily confirmed cases. This study offers theoretical evidence to help the government prevent COVID after China reopened its borders.

resource needs prediction. This research provides a great deal of support when governments make decisions. COVID is characterized by high variability. Hundreds of mutant strains have appeared in the past three years, including five major strains: Alpha, Beta, Gamma, Delta, and Omicron. Up to now, the dominant strain is Omicron. Its high transmission rate poses new challenges for epidemic prevention and control.
Before Omicron spread, the original and Delta strains were dominant. Because these two strains had a lower transmission rate, the Chinese government was able, to some extent, to balance controlling the virus against economic development. However, the current dominant strain is Omicron, whose transmission rate is much higher than that of previous strains, so such a balance has become impossible to maintain. The Zero-COVID policy directly led to economic recession in China [1,2]. As an illustration, China's GDP growth rate was only 0.3% in the second quarter of 2022 [3]. In addition, the mortality rate of Omicron is much lower than that of previous strains: 0.0093%, compared with 0.079% and 0.054% for the original and Delta strains, respectively [4]. In general, Omicron now poses a comparatively small hazard to health, but the negative impact on the economy is enormous if the government maintains the Zero-COVID policy. Hence, relaxing the COVID policy has become necessary and urgent.
Fortunately, the Chinese government relaxed the Zero-COVID policy on 15 December 2022 and reopened the borders on 8 January 2023. Nevertheless, China has a high population density and lacks medical resources, and its residents have low antibody levels against COVID. Millions of people may die if the government does not prepare and forecast before opening the borders. Predicting medical resource needs and daily confirmed cases therefore becomes all the more necessary. Although there are many prior studies on COVID, most concentrate on the original or Delta strain, whereas the virus spreading in China is Omicron, which differs from the previous strains. Moreover, no previous research predicts COVID spread after China reopens the border. In this sense, the paper fills a gap. Since the demographics and characteristics of Shanghai are typical of China, it is an appropriate setting to illustrate the COVID situation in the whole of China.
This study merges Erdos-Renyi networks, a modified computational SEIRS model, and Python code to predict daily confirmed cases and medical resource needs in Shanghai after the borders reopen. Firstly, we build Erdos-Renyi networks based on Shanghai's population density and migration characteristics. Secondly, after determining the nine parameters, we use Python to simulate the epidemic spread according to the modified computational SEIRS model. Finally, we obtain the results, such as daily confirmed cases, hospital bed needs, and ICU bed needs. Additionally, the study investigates which of the transmission, migration, and waning immunity rates has the most significant impact on COVID spread. In this part, we calculate the overall coefficient of each parameter by building a mixed-effect model. The highest overall coefficient identifies the most significant parameter in COVID spread.
The rest of the paper is structured as follows. The Literature review section reviews the related literature. The Methodology section describes the progress of the simulation. The mixed-effect model is presented in the Impact of the parameters on daily confirmed cases section. Finally, the Discussion and Conclusion sections report the discussion and conclusion, respectively.

Literature review
SARS-CoV-2, the novel coronavirus that causes COVID-19, emerged in China in late 2019, and the outbreak was declared a pandemic by March 2020 [5,6]. After that, COVID-19 spread throughout the world. The virus mutated into various strains over the past few years, causing different effects on human health. Due to the high variability of COVID, there were hundreds of mutant strains in the past three years [7]. Five main mutant strains have emerged since COVID-19 transmission began. After Alpha, Beta, Gamma, and Delta, Omicron was designated the fifth "variant of concern" by the World Health Organization in November 2021 [8,9]. Currently, Omicron is the dominant strain in the world. Researchers point out that these five main mutant strains differ entirely [10-12]. According to these studies, Omicron has more mutations than Alpha, Beta, Gamma, Delta, and the wild type, making it easier for the virus to escape the immune system and speeding up the spread of the disease. However, Omicron's hospitalization and mortality rates are significantly lower than those of previous strains.
So far, scholars worldwide have done a lot of research on COVID-19. This research can be divided into several categories, such as daily confirmed case prediction, sequelae analysis, and causes of transmission.
Some researchers investigate the effect of temperature on virus transmission and conclude that higher temperatures may sharply decrease the transmission speed [13,14]. Another study points out that the role of garbage in the transmission chain is more indirect, in the sense that garbage has a complex relationship with public toilets [15]. Hence, pushing the ratio of public toilets to the local population in a city to its optimal level would help to reduce the total infection in a region. Within epidemic research, however, the most classic studies concentrate on predicting daily confirmed cases and deaths, and numerous studies address these areas.
Although various kinds of research are valuable, predicting confirmed cases is always a typical research topic in infectious disease, and researchers use mathematical models to investigate it. For example, [16,17] use ordinary differential equations (ODEs) to analyse the dynamics of local outbreaks of COVID. They predict daily confirmed cases and the peak of the outbreak, measuring the transmission rate by varying the level of social distancing. Other papers use other kinds of differential equations, such as partial differential equations (PDEs). One study uses a PDE to predict the trend of COVID in Arizona, USA [18]. However, ODEs and PDEs are purely mathematical models that lack empirical research evidence. Hence, their accuracy is usually much lower than that of other approaches. Other researchers use classic epidemic models. Some previous studies use the typical epidemic mathematical models SIR and SIRS to predict the COVID situation [19-21]. These studies investigate the numbers of susceptible, infected, and recovered people. However, some people may be hospitalized or die during the spread of COVID, and these prior studies do not cover such outcomes.
As the classic epidemic mathematical models have some disadvantages, some research explores new directions. For instance, one study uses a self-created mathematical model to simulate the virus outbreak and how to control the epidemic [22]. It builds the model from five parameters, such as the mortality rate for hospitalised people. In addition, a prior study modifies the well-known epidemic model SIR [23]. COVID exhibits delay due to incubation periods and related phenomena, so that study combines the basic SIR model with delay differential equations (DDEs) and a PDE. The above studies rely on differential equations and existing epidemic mathematical models. Using DDEs is advantageous for COVID prediction because they represent the incubation period, which can improve a model's accuracy. Additionally, DDE and PDE models need only a little historical data, which is convenient. Nevertheless, the prediction accuracy of these conventional models and equations suffers from various uncertainties. Therefore, computational techniques based on historical data may perform better.
Though some classifications or algorithms are not suitable for predicting daily confirmed cases, some prior studies attempt to use statistical models. They estimate the number of future confirmed cases with exponential, non-linear, and linear statistical models and Bayesian statistical models [24-26]. The reported accuracy indicates that the Bayesian statistical model performs worse than the statistical regression models, so these studies rely on regression. These papers predict the number of confirmed cases in Brazil, India, and Myanmar. Although they achieve acceptable Mean Absolute Error (MAE) and Mean Squared Error (MSE), improving or optimising the models is difficult, because the model always needs more independent variables for higher accuracy. Other studies combine mathematical and statistical models, namely the SEIR mathematical model and the logistic statistical model, but mainly rely on the logistic statistical model to predict the trend of the COVID spread [27,28]. They use part of the historical data to train and test the statistical model, which lets researchers assess the model's accuracy. The main weakness is that the model lacks variables, which directly causes inaccuracy. On the whole, the regression model does not perform well. Consequently, the regression or statistical method is still not the most appropriate.
Some studies describe how to use networks to simulate virus spread. Researchers show that complex networks, including star-shaped, power-law, and inhomogeneous W-graph networks, can be used in infectious disease prediction [29-31]. The purpose of a complex network is to show the relationships between vertices. Although these complex networks can show the relationships between vertices, which applies to virus prediction, the degree of each vertex is relatively fixed or impractical, which affects prediction accuracy. Another paper predicts virus spread based on social networks [32]. However, building a social network needs particular data, which is difficult to obtain for predictions over a large population. Hence, the social network is not suitable for this study. A previous study applies networks, a computational language, and programs to predict the trend of COVID [33]. It uses a 4-regular network to simulate the virus spread. In other words, the simulation supposes that each person has exactly four close contacts, which differs from reality. Fortunately, the Erdos-Renyi network allows the degree of each node to vary easily, which can broadly reproduce realistic scenarios. That is why this study applies the Erdos-Renyi network. Research based on a computational programming language can simulate the COVID spread accurately. One such study serves as a guideline for epidemic prediction [34]. It points out that the ability to simulate a forecast in the most realistic environment is one of the most critical factors in COVID prediction. Computational programming simulation may therefore be the best choice, and this study also uses this method. Unfortunately, there is little research in this area.
These previous studies describe various approaches to studying COVID. Their main results include daily case prediction, factors influencing the morbidity and mortality of COVID, and elements of the virus spread. Although the results include daily confirmed case projections, most use statistical or mathematical models, which do not simulate COVID in an actual environment. Therefore, these studies still have room for improvement in prediction accuracy.
Currently, the dominant strain is BA.7 [35]. However, many previous studies are based on the original strain, Delta, or other strains. Since Omicron's transmission, hospitalisation, recovery, and mortality rates are very different from those of the previous strains, the prior studies do not reflect the current reality.
Moreover, residents' COVID-19 antibody strength, population density, and government policy determine the virus's spread. High population density is a distinctive feature of China, and Chinese residents do not have strong antibodies against COVID-19. The Chinese government implemented a strict epidemic prevention policy, the Zero-COVID policy, until December 2022. Consequently, COVID-19 predictions for other countries, and studies of COVID in China based on data up to December 2022, are not indicative of the current situation. There are no previous studies on COVID predictions for China reopening its borders or relaxing policies, although the academic community urgently needs to understand the COVID situation. Thus, this study focuses on BA.7 and the period after the Chinese government relaxed its policies to predict COVID transmission, which is its novelty and uniqueness. Meanwhile, this study uses computational modelling to reproduce a scene that is as realistic as possible and to maximize accuracy, unlike previous studies based on derivatives.

Refined SEIRS model
In this study, we use a modified computational SEIRS model to represent the epidemic spread. A susceptible individual may become infected, and the infected person can then recover from the virus. Finally, the individual will become susceptible again or die. Dead people withdraw from the simulation. In order to recreate a scene as realistically as possible, the SEIRS model is implemented as a computational model rather than an ODE or PDE model. Eq. (1) expresses this rule.

$$S \rightarrow E \rightarrow I \rightarrow R \rightarrow S \quad (1)$$

In Eq. (1), the first S means susceptible individuals, E is the exposed person, I represents infected people, and R indicates the people who recovered from the virus and cannot be infected again while they remain recovered. The second S covers the recovered individuals who become susceptible again and the people who died from the epidemic. After each simulation cycle, the state of the second S becomes the starting point of the first S in the next simulation cycle. After the sudden relaxation of COVID policy, almost all residents were afraid of being infected. Because many people were infected rapidly in a short time, most residents worked online, all campuses were closed, all restaurants offered only takeaway service, and all shopping centers strictly limited pedestrian flow. Therefore, the small exposed population does not influence the COVID situation significantly, which is why this study does not consider it. However, future studies or situations may need to include the exposed population, so the model retains an exposed compartment for future research.
This study uses the Python module 'random' to generate a random value for each individual in each simulation step and compares that random value with the set parameter. As an illustration, if the random migration value is less than the set migration rate, the individual moves to another population group. If the random migration value is larger than or equal to the set migration rate, the individual does not move. Based on this rule and Eq. (1), we refine the model and obtain more detailed equations to describe the epidemic spread. Equation (2) means that one node (individual) moves to another population group based on the migration rate (δ). In practice, the system randomly chooses a node and decides whether it moves to another population group under the migration rate (δ). If the person is susceptible, Eq. (3) applies, which means the person may be infected. In this case, the transmission rate (β) is used.
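The comparison of a random draw with a set rate can be sketched as follows. This is a minimal illustration rather than the study's code: the helper name `event_occurs` and the value chosen for the migration rate are assumptions made for the example.

```python
import random


def event_occurs(set_rate, rng):
    """Return True when a uniform draw in [0, 1) falls below the set rate."""
    return rng.random() < set_rate


rng = random.Random(42)   # fixed seed so the sketch is reproducible
migration_rate = 0.1      # hypothetical value of delta, not taken from the paper

# Repeating the draw many times shows that the share of individuals who
# migrate approaches the set migration rate.
trials = 100_000
moves = sum(event_occurs(migration_rate, rng) for _ in range(trials))
print(moves / trials)
```

The same helper would serve for every other rate in the model, since each decision is a single draw compared against a set parameter.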
Furthermore, the infected person has four possible outcomes. The first is described by Eq. (4): the infected person may recover directly under the recovery rate (µ). The second is described by Eq. (5): according to the hospitalization rate (γ) and the hospitalization recovery rate (τ), the person is hospitalized after infection and then recovers from the virus. In the third, the infected person is hospitalized and admitted to the ICU for treatment, and afterwards also recovers. This case uses the hospitalization rate (γ), the ICU hospitalization rate (ρ), and the ICU hospitalization recovery rate (ϕ), and follows Eq. (6). The fourth outcome is considerably different from the previous three: although the infected patient undergoes hospitalization and ICU treatment, the patient dies. Equation (7) and the mortality rate (σ) apply in this case.
Eqs. (4)-(7) thus cover the outcomes of an infected individual, three of which end in recovery. According to the SEIRS rule, the recovered person may become susceptible again. Equation (8) states that the recovered person becomes susceptible again because of the waning immunity rate (ε).

Total population
Due to the specificity of this study described in the Refined SEIRS model section, only four parts of Eq. (1) are counted in the total population equation, namely S, I, R and S. As described in that section, S, I and R represent susceptible, infected, and recovered individuals, respectively. The second S covers people who have died and recovered people who may be infected again. However, people exist in only three states at the beginning of the simulation. Consequently, the total population comprises the numbers of susceptible, infected, and recovered individuals, represented by S(t), I(t) and R(t), respectively. Equation (9) gives the whole population of the simulation, where $N_{population}$ denotes the total number of samples.

$$N_{population} = S(t) + I(t) + R(t) \quad (9)$$

Erdos-Renyi network
At the beginning of the simulation, we use the Python library 'networkx' to build the Erdos-Renyi network that represents each individual and their close contacts. Each node indicates an individual, and each individual has a different number of close contacts. In this model, the Erdos-Renyi network is constructed from two parameters, namely the number of nodes N and the probability $p_{node}$ that each possible pair of nodes is connected. The degree distribution of an Erdos-Renyi network is a binomial distribution.
However, this study simulates the virus spreading in a large population community. Hence, the degree distribution is approximated by a Poisson distribution. Moreover, the Erdos-Renyi network in this study is in the subcritical regime, which means that the graph is almost always disconnected, with many components. Equation (13) shows this property. Furthermore, $p_{node}$ follows Eq. (14).
For instance, if the average degree is 4 and the population of the network is 1000, then $p_{node}$ is 4/1000 = 4e-3.
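The worked example can be checked with a small sketch. It samples a G(n, p) random graph using only the standard library so that it runs without extra packages; the function name `erdos_renyi_edges` is hypothetical, and networkx offers `erdos_renyi_graph(n, p)` for the same construction. The seed and the adjacency bookkeeping are assumptions made for the illustration.

```python
import itertools
import random


def erdos_renyi_edges(n, p, rng):
    """Sample G(n, p): keep each possible pair of nodes with probability p."""
    return [(u, v) for u, v in itertools.combinations(range(n), 2)
            if rng.random() < p]


n = 1000                      # network population from the worked example
average_degree = 4            # average number of close contacts
p_node = average_degree / n   # 4e-3, as in the text

rng = random.Random(0)        # fixed seed, chosen arbitrarily
edges = erdos_renyi_edges(n, p_node, rng)

# Count the close contacts (degree) of every node.
degree = [0] * n
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

print(p_node)                     # 0.004
print(sum(degree) / n)            # realised mean degree, close to 4
print(min(degree), max(degree))   # degrees differ from node to node
```

The spread between the smallest and largest degree is the property the study relies on: individuals have different numbers of close contacts, unlike in a regular network.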
In this study, the research setting is Shanghai, a city with a population of 25 million and 18 districts. As shown in Fig. 1, population density differs considerably between districts. For instance, the population density of every district in the city centre is more than 20,000 people per km². Of these, Hongkou district has a population density of 32,935 people per km². In comparison, the density of most rural districts is between 500 and 2,000 people per km², and that of Chongming district is only 539 people per km². Hence, the population density in the city centre is significantly higher than in rural areas, and Fig. 1 makes the differences between districts clear.
Figures 2 and 3 show the population density at 10 a.m. and 10 p.m., respectively. The most densely populated area at 10 a.m. is the city centre, whereas at 10 p.m. people are in the rural areas. That is why many red dots are gathered in the city centre in Fig. 2 and scattered across rural areas in Fig. 3. These two figures demonstrate that people work in the city centre and reside in the rural areas. In general, this describes the characteristic pattern of population migration in Shanghai.
Based on the population density of the districts, the characteristics of population migration, and virus transmission, we divide the 18 districts into six epidemic research regions to analyse and predict the COVID-19 situation in Shanghai. At the same time, we use the Python library 'networkx' to create six Erdos-Renyi networks that represent the six epidemic research regions based on population density and migration characteristics [36,37].
In this research, we create six epidemic research regions to predict and analyse the spread of COVID in Shanghai. To create each network, we need to determine two parameters, namely the number of nodes N and the connection probability $p_{node}$. N is the total population of the district represented by the network. To illustrate, the total population of Pudong district is 5.7 million, and Network 4 represents Pudong district. Thus, the number of nodes in Network 4 is 5.7 million. The parameter $p_{node}$ is calculated by Eq. (13). In this study, the average degree is equal to the number of close contacts. Since the total population N of each district has been determined, the next step is to estimate the average degree of each network. Because population density differs between districts, the number of close contacts also differs. Accordingly, the average degree is not the same across networks. For instance, Network 2 represents the district with the highest population density, so its average degree is also the highest. Detailed information on the networks is given in Table 1.
It is difficult to show a full-scale network in this paper because each network has millions of nodes. For readability, we scale down the population of the six epidemic research regions by a factor of 100,000 to draw six schematic Erdos-Renyi networks that illustrate the epidemic spread in Shanghai. For example, Network 1 has 5.4 million nodes, so we use only 54 nodes in the schematic of Network 1. Figure 4 shows these six networks. Our simulation of the epidemic spread in Shanghai, by contrast, is still based on the actual population sizes. The degree of each node varies in these six schematic Erdos-Renyi networks, meaning each person has a different number of close contacts. This is an advantage of the Erdos-Renyi network because it reflects reality. It is unlike networks with a constant node degree, such as a regular network, which is "regular" because each node has the same number of links. In general, the Erdos-Renyi network has considerable advantages over the regular network and others, because it can simulate COVID spread in a realistic environment.

Determining parameters
Although we built the simulation system in the previous part, we still need to determine and input all nine parameters, including the migration rate (δ), the transmission rate (β), and so on. In this research, the parameters are provided by the Shanghai government. The first case was detected on 21 Nov 2022, which marks the start of the new wave. From that date, the Shanghai government updated the number of new cases, hospitalisations, and these nine parameters daily. Consequently, these figures vary every day. However, due to regulation, the complete data for the nine parameters used in this study cannot be released to the public. Therefore, we only report the average of each parameter, as shown in Table 2.

Progress of simulation
The progress of the simulation is based on the Erdos-Renyi network, the modified computational SEIRS model, and Python code. The system uses these three techniques as follows. Firstly, the system determines the initial infected population. Although Shanghai experienced a serious COVID wave and a long lockdown in the first half of 2022, the Shanghai government maintained the Zero-COVID policy, which kept the growth in daily confirmed cases consistently low. From the end of September to 20 November 2022, the number of daily confirmed cases was consistently zero. However, one new case was detected in Pudong district on 21 November 2022, marking the beginning of a new wave of the epidemic. Therefore, the initial number of infected is one. As this initial case was detected in Pudong district and Network 4 represents Pudong district, the system randomly selects one node in Network 4 and sets its status to infected.
In this research, the system uses Python code to generate a random probability for the individual in each step of the simulation. The system then compares this random value with the set parameters, such as the transmission rate, recovery rate, waning immunity rate, and so on. As an illustration, if the random recovery probability is less than the set recovery rate, the individual recovers. If the random recovery probability is larger than or equal to the set recovery rate, the individual remains infected.
Throughout the simulation process, the first step is that the system randomly chooses a node to move to another population group, based on the migration rate (δ) and Eq. (2). For example, as shown in Fig. 4, if the system randomly selects node 42 in Network 1 to move to another network, the system also randomly selects a node from another network, say node 49 in Network 2. Suppose node 42 in Network 1 is infected and node 49 in Network 2 is susceptible. After the migration, the original position of node 42 in Network 1 is taken by susceptible node 49 from Network 2, while the original position of node 49 in Network 2 is taken by infected node 42 from Network 1. This example is briefly marked in Fig. 4. Although we only describe the migration between node 42 in Network 1 and node 49 in Network 2, each node can potentially migrate to other networks. Afterwards, the system checks each susceptible node and its adjacent nodes in the six networks. According to Fig. 4, if node 17 in Network 6 is susceptible, the system checks whether its adjacent nodes, nodes 3 and 22, are infected. If its adjacent nodes are infected and the random transmission value is less than the transmission rate (β), so that Eq. (3) is met, node 17 in Network 6 becomes infected. This example is also briefly marked in Fig. 4. Next, the system focuses on infected individuals. This process is based on the recovery rate (µ) and Eq. (4). If the node's random recovery value is less than the recovery rate (µ), the infected individual becomes recovered. The individual cannot be infected again while remaining in the recovered status.
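The migration swap and the transmission check can be sketched on two toy networks. This is an illustration under stated assumptions, not the study's implementation: the four-node adjacency lists, the function names `migrate` and `infect`, and the choice to evaluate all susceptible nodes against a snapshot of the previous statuses are all hypothetical.

```python
import random

S, I = "S", "I"


def migrate(status_a, node_a, status_b, node_b):
    """Swap the occupants of two positions that sit in different networks."""
    status_a[node_a], status_b[node_b] = status_b[node_b], status_a[node_a]


def infect(status, neighbours, beta, rng):
    """Return the statuses after one transmission pass over a network."""
    updated = list(status)
    for node, state in enumerate(status):
        if state != S:
            continue
        # One draw per susceptible node that has at least one infected contact.
        has_infected_contact = any(status[n] == I for n in neighbours[node])
        if has_infected_contact and rng.random() < beta:
            updated[node] = I
    return updated


rng = random.Random(1)

# Two toy networks (hypothetical, four nodes each) stored as adjacency lists.
neighbours_1 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
neighbours_2 = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
status_1 = [I, S, S, S]
status_2 = [S, S, S, S]

# The infected node 0 of network 1 trades places with susceptible node 0 of network 2.
migrate(status_1, 0, status_2, 0)
print(status_1, status_2)

# With beta = 1.0 every susceptible neighbour of an infected node becomes infected.
status_2 = infect(status_2, neighbours_2, beta=1.0, rng=rng)
print(status_2)
```

Node 3 of the second network stays susceptible in this pass because its only contact was still susceptible in the snapshot; it can be infected in the next cycle.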
Nevertheless, if the individual does not recover, the system tests whether the person becomes hospitalized or remains infected, based on the hospitalization rate (γ) and Eq. (5). Additionally, the system decides whether a hospitalized person remains hospitalized, moves to the ICU, or dies. In detail, if the individual is hospitalized, the simulation updates the individual's status to hospitalized or ICU hospitalized under the ICU hospitalization rate (ρ) and Eq. (6). In the same way, according to Eq. (7), if the individual's status is ICU hospitalized and the random mortality value is less than the set mortality rate (σ), the system labels that individual as dead. Dead people then withdraw from the simulation.
Although the infected person may become hospitalized, be admitted to the ICU, or die, everyone except the dead still has the opportunity to recover. As an illustration, if the hospitalized person's random ICU hospitalization value is not less than the set ICU hospitalization rate (ρ) and the random recovery value is less than the set hospitalization recovery rate (τ), the person recovers. If the random ICU hospitalization value is not less than the ICU hospitalization rate (ρ), but the random hospitalization recovery value is larger than or equal to the set hospitalization recovery rate (τ), the individual remains hospitalized. An individual in the ICU is handled in a similar way to a hospitalized individual, with the system using the ICU hospitalization rate (ρ) and the ICU hospitalization recovery rate (ϕ). Finally, the system focuses on recovered individuals. In this case, the waning immunity rate (ε) and Eq. (8) are the two important indicators. The person returns to the susceptible state if the random waning immunity value is less than the waning immunity rate (ε). Otherwise, the person remains recovered. The steps above constitute the process of the epidemic spread, which is also the simulation process of this system. Figure 5 shows the simulation process more intuitively.
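The status updates described above can be collected into one transition function. The sketch below is an assumption-laden illustration: the rate values are invented for the example and are not those of Table 2, the single-letter status codes and the function name `next_status` are hypothetical, and the order in which the draws are tested within each status is one plausible reading of the text.

```python
import random
from collections import Counter

# Hypothetical rates for illustration only; they are not the values in Table 2.
RATES = {
    "mu": 0.10,       # recovery rate of infected people
    "gamma": 0.05,    # hospitalization rate
    "tau": 0.20,      # hospitalization recovery rate
    "rho": 0.10,      # ICU hospitalization rate
    "phi": 0.10,      # ICU hospitalization recovery rate
    "sigma": 0.05,    # mortality rate
    "epsilon": 0.01,  # waning immunity rate
}


def next_status(status, rates, rng):
    """Advance one person by one simulation cycle.

    Statuses: S susceptible, I infected, H hospitalized, C in ICU,
    R recovered, D dead. Infection of S nodes is handled elsewhere.
    """
    def hit(name):
        return rng.random() < rates[name]

    if status == "I":
        if hit("mu"):
            return "R"
        return "H" if hit("gamma") else "I"
    if status == "H":
        if hit("rho"):
            return "C"
        return "R" if hit("tau") else "H"
    if status == "C":
        if hit("sigma"):
            return "D"
        return "R" if hit("phi") else "C"
    if status == "R":
        return "S" if hit("epsilon") else "R"
    return status  # S and D do not change in this step


rng = random.Random(7)
cohort = ["I"] * 1000          # follow 1000 infected people for 30 cycles
for _ in range(30):
    cohort = [next_status(s, RATES, rng) for s in cohort]
print(Counter(cohort))
```

In this sketch a dead person simply keeps the label D; a fuller simulation would remove the node from its network, as the text describes.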

Prediction performance and results
So far, we have completed building the computational simulation. Next, we input the nine parameters for each day and run the simulation. Finally, we compare the simulation results with actual data to calculate the accuracy and verify the simulation. The actual data include the numbers of hospitalized patients, ICU patients, and deaths from 21 Nov 2022 to 31 Jan 2023. Daily confirmed cases from 21 Nov 2022 to 15 Dec 2022 are also included. Since the Shanghai government no longer collected daily confirmed cases after 15 Dec 2022, the actual daily case data end on 15 Dec 2022.
After the simulation, we use Eq. (15) to calculate the accuracy. In Eq. (15), $simulation_i$ means the simulation result for day i, $actual_i$ represents the actual data for day i, and n indicates the number of days predicted. The accuracy is derived by calculating the cumulative deviation.
Figures 6, 7, 8 and 9 illustrate the deviation between actual and simulated data for daily cases and medical resource needs. The prediction accuracies of daily confirmed cases, hospital bed needs, ICU bed needs, and cumulative deaths are 0.954, 0.962, 0.951 and 0.968, respectively, and the overall accuracy of the simulation is 0.959. In general, the simulation performs quite well, and the result is reliable.
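Since Eq. (15) is not reproduced in this text, the sketch below uses an assumed form of a deviation-based accuracy: one minus the mean relative deviation between simulated and actual daily values. The function name `accuracy` and the sample series are hypothetical, and the true Eq. (15) may differ. The last lines only check that the reported overall accuracy of 0.959 is consistent with an unweighted mean of the four reported accuracies.

```python
def accuracy(simulation, actual):
    """Assumed form: 1 - (1/n) * sum(|simulation_i - actual_i| / actual_i).

    This is an illustrative stand-in for Eq. (15); every actual value must
    be non-zero.
    """
    n = len(actual)
    deviation = sum(abs(s - a) / a for s, a in zip(simulation, actual))
    return 1 - deviation / n


# Made-up daily values, used only to exercise the function.
simulated = [95, 210, 290]
observed = [100, 200, 300]
print(accuracy(simulated, observed))

# Reported accuracies: daily cases, hospital beds, ICU beds, cumulative deaths.
reported = [0.954, 0.962, 0.951, 0.968]
overall = sum(reported) / len(reported)
print(overall)   # about 0.95875, which rounds to the reported 0.959
```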
Since the simulation has proved accurate, we use it to forecast the daily confirmed cases, hospital bed needs, ICU bed needs and deaths in the following year. In addition, we use the historical data from the Shanghai government as parameters in the simulation. [38] point out that although Omicron is highly mutable, no strain in the last six months has been very different from the previous strains. In other words, there is no significant

Building and optimizing statistical model
The previous predictions demonstrate that Shanghai will experience the second and third waves in 2023. If COVID infects too many people, it hurts the economy deeply [39]. Hence, the government needs to keep the number of daily confirmed cases in each wave of the epidemic as low as possible. Identifying the factors that influence the number of daily confirmed cases is therefore crucial. It allows the government to provide more scientific guidance to citizens for self-prevention, which decreases the negative impact on the economy and protects people's health. In the simulation, three parameters can influence the daily cases: transmission rate, migration rate and waning immunity rate. Thus, we use historical data from the Shanghai government relating to these three parameters and the parameter 'recovery rate' to build the statistical model. We add the variable 'recovery rate' to the statistical model because it makes the simulation complete and more realistic. Since hospitalized and dead people account for only a small portion of the whole population, they do not affect the result of the statistical model. Thus, the statistical model does not include the variables for hospitalization and death, such as 'hospitalization rate', 'mortality rate' and so forth. In summary, the dependent variable is the number of daily confirmed cases. The independent variables are transmission rate, migration rate, waning immunity rate and recovery rate. After building and optimizing the statistical model, we calculate the overall coefficient of each parameter. The parameter with the highest overall coefficient has the greatest influence on daily confirmed cases.

Linear and multiple polynomial model
Our first step uses multiple linear regression with homogeneous variance. The classical linear model (LM)
$$Y_i = X_i^T \beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2),$$
implies that the outcomes $Y_i$ are independent and normally distributed:
$$Y_i \sim N(\mu_i, \sigma^2),$$
where $\mu_i = X_i^T \beta$. Note that all $Y_i$ have the same variance, namely $\sigma^2$. Hence, the simple linear regression model regresses the daily confirmed cases on the four independent variables. We then use the R function 'lm' to fit the LM. However, we found that the residuals are too large, and the residual plot of each variable shows a quadratic pattern. Therefore, we add quadratic, cubic and higher-power terms to improve the fit until the p-value of the hypothesis test is greater than 0.05, which means we do not reject the null hypothesis. Finally, the best-fitted model is shown below:

Mixed-effect model
The above machine learning algorithms (linear or multiple regression) assume constant variance, $var[Y_i] = \sigma^2$. However, heterogeneous variances are more appropriate in reality than homogeneous (constant) variance. We now relax the constant variance assumption and assume that $var[Y_i] = \sigma_i^2$. Therefore, a LM with heterogeneous variance can be formulated as:
$$Y_i = X_i^T \beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma_i^2),$$
and the $\varepsilon_i$ are independent. Or, in matrix notation,
$$Y = X\beta + \varepsilon, \qquad \varepsilon \sim N_n(0, R),$$
where $R$ is a diagonal matrix.
The simplest way to introduce heteroscedasticity and, at the same time, to reduce the number of variance parameters, is to assume that the variance of $\varepsilon_i$ is equal to a known proportion of one (unknown) parameter $\sigma^2$. More specifically, we may associate with every observation $i$ a known constant $w_i > 0$ and assume that
$$var[\varepsilon_i] = \frac{\sigma^2}{w_i}.$$
For instance, if $Y_i$ is the average of $n_i$ observations (all with the same covariates) and the original observations were homogeneous, then $w_i = n_i$. We can consider the transformed model:
$$w_i^{1/2} Y_i = w_i^{1/2} X_i^T \beta + w_i^{1/2} \varepsilon_i.$$
Then $var[w_i^{1/2} \varepsilon_i] = \sigma^2$, so we are back at a homogeneous LM. This motivates estimating $\beta$ via a weighted sum of squares:
$$\sum_{i=1}^{n} w_i (Y_i - X_i^T \beta)^2 = (Y - X\beta)^T W (Y - X\beta),$$
where $W = diag(w_1, w_2, \ldots, w_n)$ is a diagonal matrix, which leads to
$$\hat{\beta} = (X^T W X)^{-1} X^T W Y.$$
A more general and flexible way to introduce variance heterogeneity is by means of a variance function $g(\cdot)$. The variance of the residual errors, $var[\varepsilon_i]$, is expressed as follows:
$$var[\varepsilon_i] = \sigma^2 g^2(\mu_i, v_i; \delta),$$
where $\mu_i = X_i^T \beta$, $\sigma$ is a scale parameter, $v_i$ is a vector of (known) covariates defining the variance function for observation $i$, and the vector $\delta$ contains a small set of variance parameters common to all observations. Note that, because the function $g(\cdot)$ involves $\mu_i$, it in fact depends on $\beta$, too. It is worth underscoring here that the parameter $\sigma$ in general should be interpreted as a scale parameter. This is in contrast to the classical LM with homogeneous variance, in which $\sigma$ can be interpreted as the residual error standard deviation. Note that $g(\cdot)$ should, strictly speaking, be referred to as a function modelling the standard deviation, not the variance. However, the term variance function is commonly used when referring to $g(\cdot)$.
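To make the weighted sum of squares concrete, the following python sketch fits a straight line with known weights using the closed-form weighted estimates of intercept and slope. It is an illustration only: the function name and the data are hypothetical, and the models in this study involve several covariates rather than one.

```python
def weighted_line_fit(x, y, w):
    """Minimise sum_i w_i * (y_i - b0 - b1 * x_i)**2 and return (b0, b1)."""
    if not (len(x) == len(y) == len(w)) or len(x) < 2:
        raise ValueError("x, y and w need equal length of at least 2")
    if any(wi <= 0 for wi in w):
        raise ValueError("weights must be positive")
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: the third point is an average of few observations,
# so it receives a small weight and pulls the line less.
x = [0.0, 1.0, 2.0]
y = [0.0, 1.0, 5.0]
equal_weights = weighted_line_fit(x, y, [1.0, 1.0, 1.0])
unequal_weights = weighted_line_fit(x, y, [100.0, 100.0, 1.0])
print(equal_weights, unequal_weights)
```

With equal weights the result coincides with ordinary least squares, which matches the remark that the transformed model is again a homogeneous LM.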
In this study, we use two variance functions, namely different variances per stratum (varIdent) and power of a covariate (varPower).
For varIdent, this class represents a variance model with different variances for each level of a stratification variable $s$, taking values in the set $\{1, 2, \ldots, S\}$:
$$var[\varepsilon_i] = \sigma^2 \delta_{s_i}^2.$$
This variance model uses $S + 1$ parameters to represent $S$ variances and, therefore, is not identifiable. To achieve identifiability, some restriction needs to be imposed on the variance parameters $\delta$. The restriction $\delta_1 = 1$ is used, so that $\delta_l$, $l = 2, \ldots, S$, represents the ratio between the standard deviations of the $l$th stratum and the first stratum. By definition, $\delta_l > 0$, $l = 2, \ldots, S$.
For varPower, the variance model represented by this class is
$$var[\varepsilon_i] = \sigma^2 |v_i|^{2\delta}.$$
The main arguments to varPower are value and form, which specify, respectively, an initial value for $\delta$, when this is allowed to vary in the optimization, and a one-sided formula with the variance covariate. Note that, when $v_i = 0$ and $\delta > 0$, the variance function is 0 and the variance weight is undefined. Therefore, this class of variance functions should not be used with variance covariates that may assume the value 0.
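The two variance functions can be written out directly. The python sketch below is illustrative only and does not reproduce the R classes: the function names, the dictionary of stratum ratios and the numerical values are hypothetical. Each function returns the multiplier of the scale parameter on the standard-deviation scale, and the residual variance is the square of the scale parameter times that multiplier.

```python
def var_ident_multiplier(stratum, ratios):
    """Standard-deviation multiplier for a stratum (varIdent-style).

    ratios maps stratum labels 2..S to their ratio relative to stratum 1;
    stratum 1 is fixed at 1 for identifiability.
    """
    if stratum == 1:
        return 1.0
    ratio = ratios[stratum]
    if ratio <= 0:
        raise ValueError("ratios must be positive")
    return ratio


def var_power_multiplier(covariate, delta):
    """Standard-deviation multiplier |v|**delta (varPower-style)."""
    if covariate == 0:
        raise ValueError("variance covariate must not be 0")
    return abs(covariate) ** delta


def residual_variance(sigma, multiplier):
    """Residual variance sigma**2 * multiplier**2."""
    return (sigma * multiplier) ** 2


# Hypothetical values, for illustration only.
print(residual_variance(3.0, var_ident_multiplier(2, {2: 2.0, 3: 0.5})))
print(residual_variance(3.0, var_power_multiplier(4.0, 0.5)))
```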
Afterward, we use the R code 'glm' to build the mixed-effect model based on Eq. (16). We apply the varIdent and varPower variance functions and thus obtain two mixed-effect statistical models. The first model, with varIdent, is as follows.
The second model, with varPower, is as follows.

Multiple polynomial model vs mixed-effect model
So far, we have built three statistical models: one multiple polynomial model and two mixed-effect models. Our next step is to determine whether a mixed-effect model is better than the multiple polynomial model. First, we test whether the multiple polynomial model or the mixed-effect model with varIdent is better, using the R function 'anova'. This command tests the null hypothesis ($H_0$), the multiple polynomial model with homogeneous variance, against the alternative hypothesis ($H_1$), the mixed-effect model with varIdent. The test statistic (likelihood-ratio test) has asymptotically a $\chi^2_3$ distribution, i.e., a $\chi^2$ distribution with 3 degrees of freedom. The p-value is < 0.001. Hence, we prefer the alternative hypothesis ($H_1$); in other words, the mixed-effect model is more appropriate. Second, we again use the R function 'anova' to test whether the model with varIdent is better than the model with varPower. This command tests the model with varIdent ($H_0$) against the model with varPower ($H_1$). The test statistic (likelihood-ratio test) has asymptotically a $\chi^2_1$ distribution, i.e., a $\chi^2$ distribution with 1 degree of freedom. Again, the p-value is < 0.001. Hence, we prefer the alternative hypothesis ($H_1$); in other words, the mixed-effect model with varPower is the best model. We also use the Akaike information criterion (AIC) to test the performance of the statistical models.
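The model comparison described in this section rests on two quantities: the likelihood-ratio statistic with its chi-square p-value, and the AIC. The python sketch below shows how both are computed from log-likelihoods. It is a minimal illustration, not the R 'anova' routine: the function names are hypothetical, and the log-likelihood values and parameter counts are made-up numbers, not results from this study.

```python
import math


def chi2_survival(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df."""
    if x < 0 or df < 1:
        raise ValueError("x must be non-negative and df a positive integer")
    half = x / 2.0
    if df % 2 == 0:
        terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * terms
    terms = sum(half ** (j - 0.5) / math.gamma(j + 0.5)
                for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms


def likelihood_ratio_test(loglik_null, loglik_alt, df):
    """Return the likelihood-ratio statistic and its asymptotic p-value."""
    statistic = 2.0 * (loglik_alt - loglik_null)
    return statistic, chi2_survival(statistic, df)


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2.0 * n_params - 2.0 * loglik


# Hypothetical log-likelihoods, for illustration only.
statistic, p_value = likelihood_ratio_test(loglik_null=-520.0, loglik_alt=-505.0, df=3)
print(statistic, p_value < 0.001)
print(aic(-520.0, 6), aic(-505.0, 9))
```

A small p-value favours the alternative model, and among candidate models the one with the lowest AIC is preferred.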

Result of statistical model
After the analysis, the ANOVA test indicates that the mixed-effect model with varPower is the most appropriate statistical model. The coefficients of the final model are shown in Table 4.
Since the overall coefficient of a parameter represents its influence on COVID spread and daily confirmed cases, we input 0.01 to calculate the overall coefficient as an example. The following calculation illustrates how to calculate the overall coefficient of the migration rate:
$$0.01 \times 735.614 + 0.01^2 \times 616.65 + 0.01^3 \times (-157.74) = 7.418$$
After calculation, the overall coefficient of the migration rate is 7.418. However, 7.418 does not mean that about 7 more individuals will be infected if the migration rate increases by 0.01 (1%). Instead, the overall coefficient, 7.418, only represents a quantitative increase in daily confirmed cases. Afterwards, we use the same approach to calculate the overall coefficients of the other two parameters, the transmission rate and the waning immunity rate. The overall coefficients are shown in Table 5.
According to Table 5, both the waning immunity rate and the migration rate are more important than the transmission rate because of their higher overall coefficients.
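The overall coefficient is the sum of each polynomial coefficient multiplied by the matching power of the input value. The short python sketch below reproduces the migration rate calculation with the linear, quadratic and cubic coefficients 735.614, 616.65 and -157.74 and the input 0.01; the function name is hypothetical.

```python
def overall_coefficient(coefficients, value):
    """Sum of coefficient_p * value**p for p = 1, 2, 3, ...

    coefficients lists the linear term first, then the quadratic term, and so on.
    """
    return sum(c * value ** power
               for power, c in enumerate(coefficients, start=1))


# Polynomial coefficients of the migration rate and the example input 0.01.
migration = overall_coefficient([735.614, 616.65, -157.74], 0.01)
print(round(migration, 3))
```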

Discussion
Most previous studies concentrate on the original or Delta strain, and none investigates the Omicron strain after China reopens its borders. Hence, this research uses the Erdos-Renyi network, the optimized computational model SEIRS, and python code to predict the COVID trend. Figures 6, 7, 8 and 9 illustrate the forecast of daily confirmed cases, hospital bed needs, ICU bed needs and cumulative deaths. The overall accuracy of the simulation is 0.959, which shows that the simulation is appropriate and the result is credible. Afterwards, the research uses historical data from the Shanghai government to determine the nine parameters and predict the COVID trend. Figures 10, 11 and 12 demonstrate the detailed results. As shown in Fig. 10, Shanghai will experience two more waves of COVID after the first wave. Their peaks will occur in mid-August and mid-December, with maximum daily confirmed cases of around 88,000 and 340,000, respectively. Figure 11 demonstrates that during the second wave, hospital bed needs will drop from a peak of 24,000 in mid-August to 8,000 in September. Similarly, the peak of ICU bed demand is also in mid-August, with a maximum demand of 1,200 ICU beds, and the demand will decrease to a minimum of 400 in September. However, the third wave is more severe than the second wave. At its peak, which will occur at the end of December or in early January, 74,000 hospital beds and 3,700 ICU beds will be needed for COVID patients. Figure 12 illustrates cumulative deaths of about 31,500 over the three waves. The second and third waves cause fewer deaths, with 4,000 and 10,500 deaths, respectively.
In general, according to Figs. 7, 8 and 9, among the three waves of COVID in 2023, the first wave is the most severe. The remaining outbreaks are far less severe than the first wave. While the second wave of COVID is not expected to result in a shortage of medical resources, a more severe shortage is expected in the third wave. Currently, there are 141,000 hospital beds and 1,497 ICU beds in Shanghai. Therefore, ICU beds will be in short supply from mid-November due to the third wave of COVID. Fortunately, existing hospital beds are sufficient for the COVID outbreaks in the following year. According to Fig. 12, with China opening its borders, 31,500 people will die from COVID in Shanghai during the three waves.
The above results demonstrate that the second and third waves will occur in 2023. Finding factors that reduce daily COVID cases is crucial to minimise economic and health risks. This research uses the historical data from the Shanghai government to build a statistical model that investigates the impact of the transmission, migration and waning immunity rates on COVID spread and the increase in daily confirmed cases. Hypothesis tests and p-values determine which statistical model is the most appropriate. First, we determine whether homogeneous or heterogeneous variance is more suitable for these data. The p-value rejects the null hypothesis. Consequently, the mixed-effect model with heterogeneous variance can demonstrate valuable results. Second, the system applies another hypothesis test to decide whether the mixed-effect model should use different variances per stratum (varIdent) or a power of a covariate (varPower). Since we prefer a relatively lower degree of freedom (DF) and a lower AIC, the mixed-effect model with varPower is the best statistical model. After the calculation, the overall coefficients of the transmission, waning immunity, and migration rates are 3.096, 7.219, and 7.418, respectively. Therefore, the waning immunity and migration rates are two essential parameters in COVID spread. Decreasing these two parameters, especially the migration rate, can significantly reduce the number of daily confirmed cases. It is crucial for the government to compile a COVID self-prevention guide for residents after China opened its borders.

Conclusion
After describing the prior studies, this research uses a modified computational model SEIRS and python code to predict the COVID spreading trend and the medical resource needs after China reopens the border. Moreover, the research builds statistical models to investigate which parameter among the transmission, migration, and waning immunity rates most significantly impacts daily new COVID cases. These findings provide a strong basis for the government to prepare medical resources and develop guidelines for citizen self-prevention after China reopens the border.

Implication
The simulation results indicate that the second and third waves will happen in July-September 2023 and Oct-Dec 2023, respectively. This shows that the outbreak is far from over. Therefore, the government should remind the public not to ignore COVID. For example, people should keep a safe distance from others in public places as far as possible. In the meantime, wearing a mask in crowded places is also a practical approach to self-protection.
Moreover, the government should also prepare more medical resources, such as ICU beds. Although the number of hospital beds and ICU beds available is enough for the second wave, there is a shortfall of nearly 2,200 ICU beds in the third wave. Fortunately, the number of hospital beds is sufficient. There are 160,000 hospital beds in Shanghai. The peak of hospital demand is about 81,000. Consequently, preparing more ICU beds before the third wave is the priority for the government.
In this study, we choose Shanghai as the research background because it is representative of China. Hence, the result of this study can demonstrate the situation in China after reopening the border. According to the above result, Shanghai is facing a shortage of medical resources, including ICU beds. The number of ICU beds per 100,000 people in Shanghai is 5.99, much higher than the average level in China, 4.6 ICU beds per 100,000 [40]. Therefore, the shortage of ICU beds in the other regions of China will be even more serious, and the government should prepare more ICU beds before the peak of the third wave.
According to the mixed-effect model built in this study, the waning immunity and migration rates are two essential parameters in COVID spreading. Therefore, the government should attach importance to residents' immunity against COVID because decreasing immunity strength will cause a higher infection probability and more infections. Also, before the second wave of the epidemic comes, the government can encourage people to work at home and limit the time spent outside to decrease the migration rate, reducing the number of daily confirmed cases. Furthermore, the government should compile and update the COVID self-prevention guide for residents to illustrate which approach achieves the most effective self-prevention.

Limitation
Although we build six networks based on population density and migration characteristics, we do not include community scenarios like campuses, parks, and supermarkets. The population density of these places is usually high, which may lead to higher transmission rates and more infections. Hence, adding more scenarios to the simulation is an area for improvement in future studies.
Furthermore, the hospitalization, ICU, and mortality rates vary by age group. To illustrate, hospitalization rates for older people are always higher than for younger people. Hence, considering age structure will further improve the accuracy of the simulation, which is another direction for further study.

Introduction
SARS-CoV-2, a novel coronavirus that causes COVID-19, was discovered at the end of 2019. The local spread began in China at the beginning of 2020. Then the virus spread rapidly around the world after April 2020. Due to the high mortality rate and lockdown, COVID has negatively impacted people's health and the global economy. Consequently, scientists and researchers have done plenty of research regarding COVID, including nucleic acid reagents and vaccine developments. Also, some researchers concentrate on the daily confirmed case, and medical
Wang and Wang BMC Medical Informatics and Decision Making (2023) 23:186

Fig. 1
Fig. 1 Shanghai population distribution and density (Fig. 1 originally from the paper 'A Multi-Indicator Evaluation Method for Spatial Distribution of Urban Emergency Shelters'. Permission obtained)

Fig. 11
Fig. 11 Hospital and ICU bed needs prediction

Fig. 12
Fig. 12 Cumulative deaths prediction

Table 3
Details of AIC test result

Table 4
Coefficient of mixed-effect model

Table 5
The overall coefficients of parameters
Table 3 represents the details of the AIC test result and degree of freedom (DF). According to the above result, the mixed-effect model with varPower is the best statistical model because of its minimum AIC and relatively lower DF.