Estimating the Level of Asymptomatic COVID-19 Infections in Northern Ireland in 2020
1. Introduction
The mathematical modelling of infectious disease has a long history and has in recent years been a key element of the investigation of infections in human and animal populations [1] - [6]. Moreover, mathematical modelling is now firmly established as a key tool in public health and health care planning, and in guiding the responses to infections [7] [8]. Soon after the report in 2019 of a presumptive respiratory tract zoonosis caused by a novel coronavirus [9] [10] [11], which had the potential to cause multi-system illness and death and showed clear person-to-person transmission, it became clear that a worldwide pandemic of a novel virus in an immunologically naive population was possible [12] [13] [14] [15]. Considerable scientific and medical interest quickly led to rapid progress in understanding the pathogenesis of COVID-19, including the identification of its cellular receptor [16] and its basic characteristics [17]. Since then, despite the challenges posed by this unknown disease (see [18]), mathematical models of the epidemic have been reported [19] - [31] and used to guide and inform public health, economic and political decisions [32] [33] [34]. Epidemiological data soon indicated that the epidemic had very different characteristics in different populations, and the importance of local factors (social, cultural, demographic, economic, transport infrastructure, housing, etc.) in determining epidemic development became clear [33] [34]. Such factors argue strongly for modelling in the context of local environments, and we took advantage of a detailed data set from the Northern Ireland Department of Health as a platform for our modelling efforts. We recently described our basic SEIR model and demonstrated that it provides a robust description of the Northern Ireland COVID-19 epidemic in 2020 [35].
As with other respiratory infections, such as influenza, the spread is seen across spatial scales. The majority of transmission events occur in household and household-like settings such as nursing homes, prisons and communal housing for workers [36]. Early in the pandemic, it became clear that there was a variable but significant pre-symptomatic period for many (if not all) infected individuals. Studies from a range of locations have since supported this observation, and the duration of pre-symptomatic infection appears to range from 5 to 11 days [37]. Over and above this, there is clear evidence of asymptomatic infection, i.e., infection without any symptoms, reviewed in [38]. Most reports of this asymptomatic phenomenon were retrospective and cross-sectional, with limited longitudinal data. Since they were usually serendipitous, these studies have significant methodological failings, including poor symptom definition, inadequate follow-up and concerns about testing protocols [39]. Nevertheless, the available evidence points to asymptomatic infection being a highly prevalent phenomenon. While much of the biology underpinning this phenomenon remains unknown, viral loads in asymptomatic and symptomatic individuals seem similar [40], and asymptomatic transmission of infection is well documented [37] [38] [39] [40] [41] [42]. It follows that both pre-symptomatic and asymptomatic cases are potentially of great significance in driving infection [36] [43].
An additional source of uncertainty in defining the dynamics of the COVID-19 pandemic is uncertainty in testing. Broadly, COVID-19 testing incorporates three techniques: 1) PCR detection of viral nucleic acid; 2) antigen detection; and 3) antibody detection. For each of these techniques, a range of commercial kits is available. However, Axell-House et al. [44] argued that many studies evaluating test performance characteristics were not methodologically robust and used sub-optimal statistical methods to estimate those characteristics. Similar concerns have been voiced elsewhere [45] - [54]. There has also been concern about the use of differing threshold values in PCR tests [53] [55]. These methodological issues remain unresolved, so comparing reports from different jurisdictions and health care systems is problematic. This matters because error (or at least uncertainty) in tests directly affects prevalence data, i.e., how many positives there are in a population [54].
Here, we extend our recently described SEIR model of COVID-19 in the setting of Northern Ireland to investigate the impact of differing levels of asymptomatic transmission and the impact of test uncertainty on the model. More specifically, by using our model, introduced in Section 2.2, we are able to model the course of the COVID-19 epidemic in Northern Ireland in 2020, including the level of asymptomatic cases.
2. Materials and Methods
2.1. Data
In Northern Ireland (NI), the Department of Health (DoH) publishes daily updates of COVID-19 related data [56]. As in our earlier study [35], we restricted the analysis to the period from 1st March 2020 to 25th December 2020. This period is well documented; significant new restrictions were imposed on 26th December 2020 [57], and restricting the period avoids the as yet uncertain impact of new COVID-19 strains [58] [59].
This study is based on data sets from the aforementioned period, in particular the time series of the daily number of confirmed cases, i.e., positive tests or the daily incidence of infection, shown in Figure 1. However, the basic model, described below, provides the number of symptomatic infectious individuals at any given time $t$, i.e., the prevalence of infection. To overcome this mismatch, we generated the cumulative number of cases from our data set and extended the basic model so that it also provides an estimate of the cumulative cases at any given time $t$. Figure 2 depicts the cumulative number of confirmed cases in Northern Ireland from 1st March 2020 to 25th December 2020.
Figure 1. Daily number of positive tests in Northern Ireland from 1st March 2020 to 25th December 2020 [56].
Figure 2. Cumulative number of reported positive tests in Northern Ireland from 1st March 2020 to 25th December 2020.
Furthermore, we divide the period from 1st March 2020 to 25th December 2020 into 11 overlapping intervals $T_i$, $i = 0, \ldots, 10$, such that:
• The first 10 intervals, $T_i$ with $i = 0, \ldots, 9$, have an equal length of 29 days;
• The last interval, $T_{10}$, has a length of 21 days.
As such, the first interval mostly covers the period before the first nationwide lockdown introduced in the United Kingdom (UK) on 27th March 2020. Let $D_i$ denote the data from the $i$th interval $T_i$. Throughout this study, the population size of Northern Ireland, $N$, was kept at the estimated value of 1,893,700 [60], with no incoming or outgoing travel.
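The data preparation described in this section can be sketched in Python as follows. This is a minimal illustration rather than the code used in the study (which was written in MATLAB): the helper names `overlapping_intervals` and `interval_data` and the toy list `daily` are hypothetical, and the one-day overlap between consecutive intervals is an assumption, chosen because it reproduces the 169-day span of the first six intervals referred to in Section 3.

```python
from datetime import date, timedelta
from itertools import accumulate

START = date(2020, 3, 1)  # day 0 of the study period


def overlapping_intervals(lengths, overlap=1):
    """Hypothetical helper: (first_day, last_day) pairs for consecutive intervals.

    Assumption: each interval begins on the last day of the previous one,
    i.e. neighbouring intervals share `overlap` day(s).
    """
    intervals, start = [], 0
    for n in lengths:
        end = start + n - 1
        intervals.append((start, end))
        start = end + 1 - overlap
    return intervals


def interval_data(cumulative, interval):
    """Hypothetical helper: slice a cumulative series to one interval (ends included)."""
    first, last = interval
    return cumulative[first:last + 1]


# Ten intervals of 29 days followed by one interval of 21 days.
intervals = overlapping_intervals([29] * 10 + [21])

# Made-up daily counts, only to show the daily -> cumulative transformation.
daily = [1, 0, 2, 3, 5, 4]
cumulative = list(accumulate(daily))
print(cumulative)                                # [1, 1, 3, 6, 11, 15]
print(interval_data(cumulative, (0, 2)))         # [1, 1, 3]

first_wave_days = intervals[5][1] - intervals[0][0] + 1
print(first_wave_days)                           # 169
print(START + timedelta(days=intervals[0][1]))   # 2020-03-29
```

With this convention the first interval runs from 1st to 29th March 2020, matching the period stated for $i = 0$ in Section 3.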
2.2. The Transmission Model
Various deterministic compartmental models have been used to model the early phase of the outbreaks in various countries. The most widely used is the SIR model, which assumes that individuals become infectious immediately after exposure to the causative agent. However, COVID-19 has an incubation period of about 6 days on average, see e.g., [61] [62] [63]. Also, a large-scale meta-analysis [64], which includes 104 studies with 20,152 COVID-19 infections from 12 countries, estimates the level of asymptomatic cases to be around 13.34% of the infectious subpopulation. Similarly, another meta-analysis estimates the extent of asymptomatic infections to be 17% [65]. An SIR model cannot address these aspects of the infection. To incorporate these and other epidemiological characteristics in modelling the COVID-19 epidemic beyond the early growth in the number of positive tests, we use several copies of an SEIAR model, whose transmission diagram is depicted in Figure 3, to approximate the course of the epidemic in Northern Ireland in 2020. The variables represent the number of susceptible (S), exposed (E), symptomatic infectious (I), asymptomatic infectious (A) and recovered (R) individuals in a population of size $N$. We summarize the meaning of the state variables in Table 1.
The disease is transmitted via two routes: 1) at a rate $\beta$, from a symptomatically infectious individual to a susceptible one; 2) at a reduced rate $\kappa\beta$, from an asymptomatically infectious individual to a susceptible one, where $0 < \kappa < 1$.
Figure 3. Transmission diagram for the SEIAR model (1). The meaning of the parameters is collected in Table 2.
The infected individuals move to the exposed compartment. Individuals from compartment E move to compartment I at a rate $\sigma$, where $1/\sigma$ is the average incubation period, with probability $1 - p$. Similarly, after the same incubation time $1/\sigma$ but with probability $p$, people move from compartment E to compartment A; that is, the probability of being asymptomatically infectious is $p$. Subjects from compartments I and A progress to compartment R at rates $\gamma_I$ and $\gamma_A$, respectively, where $1/\gamma_I$ and $1/\gamma_A$ are the average durations of infection in compartments I and A. Furthermore, we assume that the population size N is independent of time, and that exposure to the pathogen offers immunity for the time period of the study. The model is without vital dynamics, that is, we do not consider the effect of birth and death processes, including population flux into and out of NI; hence $S + E + I + A + R = N$. Thus, the described flow of the disease transmission results in:
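$$
\begin{aligned}
S' &= -\frac{\beta S (I + \kappa A)}{N},\\
E' &= \frac{\beta S (I + \kappa A)}{N} - \sigma E,\\
I' &= (1 - p)\,\sigma E - \gamma_I I,\\
A' &= p\,\sigma E - \gamma_A A,\\
R' &= \gamma_I I + \gamma_A A.
\end{aligned}
\tag{1}
$$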
To approximate the dynamics of COVID-19 in Northern Ireland on the 11 consecutive intervals, defined in Section 2.1, we use the following modification of (1):
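$$
\begin{aligned}
S_i' &= -\frac{\beta_i S_i (I_i + \kappa_i A_i)}{N},\\
E_i' &= \frac{\beta_i S_i (I_i + \kappa_i A_i)}{N} - \sigma_i E_i,\\
I_i' &= (1 - p_i)\,\sigma_i E_i - \gamma_{I,i} I_i,\\
A_i' &= p_i\,\sigma_i E_i - \gamma_{A,i} A_i,\\
R_i' &= \gamma_{I,i} I_i + \gamma_{A,i} A_i,\\
C_i' &= (1 - p_i)\,\sigma_i E_i,
\end{aligned}
\qquad t \in T_i,\ i = 0, \ldots, 10,
\tag{2}
$$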
with initial values:
(3)
where $S_i$, $E_i$, $I_i$, $A_i$, $R_i$ and $C_i$ are non-negative real variables. The additional variable, $C_i$, does not affect the transmission dynamics; its role is to capture the cumulative number of COVID-19 cases. The meaning of the non-negative parameters $\beta_i$, $\kappa_i$, $\sigma_i$, $p_i$, $\gamma_{I,i}$ and $\gamma_{A,i}$ is summarized in Table 2, which also gives the ranges used in the fitting process described in Section 2.4.
We assume that $N$ is independent of $i$.
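To make the structure of the SEIAR system concrete, the following sketch integrates one interval in plain Python. It is a minimal sketch under stated assumptions: it uses a fixed-step classical Runge-Kutta scheme instead of MATLAB's ode23, it assumes that the cumulative variable accumulates new symptomatic cases, $C' = (1-p)\sigma E$, and every parameter value is illustrative rather than an estimate from this study. The initial state of 5 exposed and 6 asymptomatic individuals merely mirrors the magnitudes reported in Section 3; the names `Params`, `rhs` and `rk4` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Params:  # hypothetical container for one interval's parameters
    beta: float     # transmission rate from symptomatic infectious (I)
    kappa: float    # relative infectiousness of asymptomatic cases (A)
    sigma: float    # 1 / mean incubation period
    p: float        # probability that an infection is asymptomatic
    gamma_I: float  # 1 / mean infectious period in I
    gamma_A: float  # 1 / mean infectious period in A


def rhs(y, q, N):
    """Right-hand side of the SEIAR system with cumulative counter C."""
    S, E, I, A, R, C = y
    force = q.beta * S * (I + q.kappa * A) / N  # new infections per day
    return [
        -force,
        force - q.sigma * E,
        (1 - q.p) * q.sigma * E - q.gamma_I * I,
        q.p * q.sigma * E - q.gamma_A * A,
        q.gamma_I * I + q.gamma_A * A,
        (1 - q.p) * q.sigma * E,  # assumed: C counts new symptomatic cases
    ]


def rk4(y0, q, N, days, steps_per_day=20):
    """Classical RK4 with a fixed step; returns the state at each whole day."""
    h = 1.0 / steps_per_day
    y, out = list(y0), [list(y0)]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = rhs(y, q, N)
            k2 = rhs([a + h / 2 * b for a, b in zip(y, k1)], q, N)
            k3 = rhs([a + h / 2 * b for a, b in zip(y, k2)], q, N)
            k4 = rhs([a + h * b for a, b in zip(y, k3)], q, N)
            y = [a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                 for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]
        out.append(list(y))
    return out


N = 1_893_700
q = Params(beta=0.6, kappa=0.5, sigma=1 / 6, p=0.2, gamma_I=1 / 7, gamma_A=1 / 7)
y0 = [N - 11, 5, 0, 6, 0, 0]      # illustrative: 5 exposed, 6 asymptomatic
traj = rk4(y0, q, N, days=28)     # 29 daily states, as on one 29-day interval
print(round(traj[-1][5]))         # cumulative symptomatic cases after 28 days
```

Because S, E, I, A and R only exchange individuals, their sum stays at $N$ along the trajectory, which gives a quick numerical sanity check on any implementation.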
2.3. Basic Reproduction Number
In characterizing the early phase of a disease outbreak, the so-called “basic reproduction number”, $\mathcal{R}_0$, is the most commonly used metric. This measure of transmissibility is defined as the mean number of secondary infections caused by one infective individual in a completely susceptible population. In general, if $\mathcal{R}_0 > 1$, an epidemic occurs, and larger values of $\mathcal{R}_0$ can signify greater challenges for controlling the outbreak. In addition, the formula for $\mathcal{R}_0$ provides useful information for designing intervention measures to control an outbreak. For (2), we define:
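$$
\mathcal{R}_{0,i} = \beta_i \left( \frac{1 - p_i}{\gamma_{I,i}} + \frac{\kappa_i\, p_i}{\gamma_{A,i}} \right), \qquad i = 0, \ldots, 10,
\tag{4}
$$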
which is a threshold number for (2), see [66]; that is, no epidemic occurs and the disease dies out if $\mathcal{R}_{0,i} < 1$. However, as the epidemic progresses, the defining assumptions of $\mathcal{R}_0$ are no longer satisfied in the subsequent intervals. For instance, it might be that
for some
or, since we assume complete immunity after recovery, the population is no longer fully susceptible after some time. Therefore, in Section 4, we provide the so-called effective reproduction number, defined as:
$$
\mathcal{R}_{e,i}(t) = \mathcal{R}_{0,i}\, \frac{S_i(t)}{N}.
$$
Table 2. Ranges of parameters in (2), $i = 0, \ldots, 10$.
However, we note that, by using the data,
for
.
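One way to see where a formula of the form of (4) comes from is the next-generation matrix method: for the infected compartments (E, I, A), $\mathcal{R}_0$ is the spectral radius of $K = FV^{-1}$, where $F$ collects new infections and $V$ all other transitions. The sketch below assumes the SEIAR structure described in Section 2.2 (with $\kappa$ scaling asymptomatic infectiousness) and an effective reproduction number of the form $\mathcal{R}_0 S(t)/N$; the function names and parameter values are illustrative only.

```python
def r0_closed_form(beta, kappa, p, gamma_I, gamma_A):
    """Assumed form of (4): symptomatic plus down-weighted asymptomatic term."""
    return beta * ((1 - p) / gamma_I + kappa * p / gamma_A)


def r0_next_generation(beta, kappa, sigma, p, gamma_I, gamma_A):
    """Spectral radius of K = F V^{-1} for the infected compartments (E, I, A)."""
    # F: new infections, all of which enter E.
    F = [[0.0, beta, kappa * beta],
         [0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0]]
    # V is lower triangular, so its inverse can be written out by hand.
    V_inv = [[1 / sigma, 0.0, 0.0],
             [(1 - p) / gamma_I, 1 / gamma_I, 0.0],
             [p / gamma_A, 0.0, 1 / gamma_A]]
    K = [[sum(F[r][k] * V_inv[k][c] for k in range(3)) for c in range(3)]
         for r in range(3)]
    # Only the first row of K is non-zero, so K is upper triangular with
    # eigenvalues K[0][0], 0, 0; the spectral radius is therefore K[0][0].
    return K[0][0]


def r_effective(r0, S, N):
    """Assumed effective reproduction number R_e(t) = R_0 * S(t) / N."""
    return r0 * S / N


args = dict(beta=0.6, kappa=0.5, p=0.2, gamma_I=0.2, gamma_A=0.25)  # illustrative
print(round(r0_closed_form(**args), 4))                   # 2.64
print(round(r0_next_generation(sigma=1 / 6, **args), 4))  # 2.64
```

Since $K$ has only one non-zero row, its spectral radius is simply the entry $K_{11}$, and $\sigma$ drops out: in this structure the incubation rate affects the speed of spread but not $\mathcal{R}_0$.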
2.4. Parameter Fitting
We estimated the parameters in (2) by fitting its variable $C_i$ to $D_i$, the data from the $i$th interval $T_i$, as follows. For each time interval $T_i$, numerical solutions of the model (2) with initial condition (3) were generated, and the corresponding nonlinear curve-fitting problem was solved in the least-squares sense. The ranges of the positive parameters $\beta_i$, $\kappa_i$, $\sigma_i$, $p_i$, $\gamma_{I,i}$ and $\gamma_{A,i}$ used in the fitting process are given in Table 2. We set these ranges so that they contain the widely accepted intervals of the considered parameters [61] [62] [67] - [72].
The fitting process for each time interval
was initiated with
,
,
,
,
and
for
, in addition for
, we used
and
. All computations were carried out in MATLAB [73] using the functions ode23 and lsqcurvefit. We denote the resulting solution of (2) and its $C_i$ component by $\hat{x}_i$ and $\hat{C}_i$, $i = 0, \ldots, 10$, respectively.
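The least-squares step can be illustrated with a deliberately reduced sketch. It is not the study's procedure: the study fitted six parameters per interval with MATLAB's lsqcurvefit and ode23, whereas here only $\beta$ is fitted, by golden-section search on the sum of squared residuals between the model's cumulative variable and synthetic data, with all other parameters fixed at illustrative values. The search range [0.1, 1.5], the helper names and the assumption $C' = (1-p)\sigma E$ are hypothetical. In Python, `scipy.integrate.solve_ivp` with `method="RK23"` and `scipy.optimize.least_squares` with `bounds` would be closer analogues of ode23 and lsqcurvefit.

```python
import math


def simulate_cumulative(beta, N=1_893_700, days=28, steps_per_day=10,
                        kappa=0.5, sigma=1 / 6, p=0.2,
                        gamma_I=1 / 7, gamma_A=1 / 7):
    """Daily values of the cumulative variable C for one interval (RK4)."""
    def rhs(y):
        S, E, I, A, R, C = y
        force = beta * S * (I + kappa * A) / N
        return [-force,
                force - sigma * E,
                (1 - p) * sigma * E - gamma_I * I,
                p * sigma * E - gamma_A * A,
                gamma_I * I + gamma_A * A,
                (1 - p) * sigma * E]  # assumed: C counts symptomatic cases

    h = 1.0 / steps_per_day
    y = [N - 11.0, 5.0, 0.0, 6.0, 0.0, 0.0]  # illustrative initial state
    out = [y[5]]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = rhs(y)
            k2 = rhs([a + h / 2 * b for a, b in zip(y, k1)])
            k3 = rhs([a + h / 2 * b for a, b in zip(y, k2)])
            k4 = rhs([a + h * b for a, b in zip(y, k3)])
            y = [a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                 for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]
        out.append(y[5])
    return out


def sse(beta, data):
    """Sum of squared residuals between the model's C and the data."""
    return sum((m - d) ** 2 for m, d in zip(simulate_cumulative(beta), data))


def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimise a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:          # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


# Synthetic "data" generated with beta = 0.6, then beta is recovered.
data = simulate_cumulative(0.6)
beta_hat = golden_section_min(lambda b: sse(b, data), 0.1, 1.5)
print(round(beta_hat, 3))  # 0.6
```

Golden-section search is only reliable when the residual is unimodal in $\beta$, which is expected in this toy setting because the early-phase cumulative curve grows monotonically with $\beta$; a fit of several parameters at once needs a proper bounded least-squares solver.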
2.5. Confidence Intervals
The confidence intervals were created using a parametric bootstrap method described in [74], which, to keep the presentation self-contained, we describe briefly. From $\hat{C}_i$, we derived the fitted daily incidence of infection, $\hat{c}_i$, and generated a new random sequence of daily incidence of infection, $\tilde{c}_i$, from the Poisson distribution specified by the rate parameter $\hat{c}_i$. Finally, after obtaining the corresponding cumulative cases $\tilde{C}_i$ on $T_i$, we fitted (2) to $\tilde{C}_i$ by applying the steps detailed in Section 2.4. By repeating these steps 10 times, we obtained parameter estimates and initial values, which we used in constructing the corresponding confidence intervals. More specifically, because of the small sample size and the unknown standard deviation, we used the inverse cumulative distribution function of the Student's t distribution.
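The bootstrap loop can be sketched as follows. This is a minimal sketch under stated assumptions: the refit of (2) from Section 2.4 is replaced by a trivial stand-in estimator (the mean daily incidence) to keep the block short, the fitted curve `fitted_cum` is made up, and the interval is assumed to take the common form mean ± t·s/√n with the 97.5% Student's t quantile for 9 degrees of freedom (about 2.262), which corresponds to 10 bootstrap replicates. Python's standard library has no Poisson sampler, so the hypothetical helper `poisson` uses Knuth's multiplication method.

```python
import math
import random
import statistics


def poisson(lam, rng, chunk=500.0):
    """Hypothetical helper: draw a Poisson(lam) variate with Knuth's method.

    Large rates are split into chunks (a sum of independent Poisson
    variables is Poisson) so that exp(-chunk) does not underflow.
    """
    total = 0
    while lam > 0:
        part = min(lam, chunk)
        lam -= part
        limit, k, prod = math.exp(-part), 0, 1.0
        while True:
            prod *= rng.random()
            if prod <= limit:
                break
            k += 1
        total += k
    return total


def bootstrap_ci(fitted_daily, estimator, n_boot=10, t_quantile=2.262, seed=1):
    """Parametric bootstrap: resample daily counts from Poisson(fitted rate),
    re-estimate, and form a Student-t interval around the mean estimate.

    t_quantile=2.262 is the 97.5% quantile of Student's t with 9 degrees
    of freedom, so it matches n_boot=10 only.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resampled = [poisson(lam, rng) for lam in fitted_daily]
        estimates.append(estimator(resampled))
    mean = statistics.mean(estimates)
    half_width = t_quantile * statistics.stdev(estimates) / math.sqrt(n_boot)
    return mean, (mean - half_width, mean + half_width), estimates


# Made-up fitted cumulative curve and the daily incidence derived from it.
fitted_cum = [100 * 1.05 ** t for t in range(30)]
fitted_daily = [b - a for a, b in zip(fitted_cum, fitted_cum[1:])]

# Stand-in estimator: mean daily incidence (the study refits model (2) instead).
mean, (lo, hi), estimates = bootstrap_ci(fitted_daily, statistics.mean)
print(f"{mean:.2f} [{lo:.2f}, {hi:.2f}]")
```

Swapping the stand-in estimator for a function that refits the model to the cumulative sum of each resampled sequence gives the full procedure, at the cost of one optimisation per replicate.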
2.6. Modelling Testing Uncertainty
To investigate the effects of uncertainty in testing, as described in Section 1, we ran simulations with random parameter sets and kept a parameter set if the corresponding solution of (2) satisfied a certain condition. More specifically, using Latin Hypercube Sampling [75], we generated parameter sets of 5000 samples from
,
,
. Furthermore, using the same method, samples of the initial values $E_0(0)$ and $A_0(0)$ were generated for $i = 0$. For $i \ge 1$, we used (3) with the components of
. After obtaining the numerical solution of (2) with a set of parameters and initial conditions, we kept the solution, together with the corresponding parameters and initial values, when it satisfied:
(5)
where the tolerance in (5) is set to 0.2 in this study; that is, we assume 20% uncertainty in testing.
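The sampling-and-rejection procedure can be illustrated with a small Latin hypercube design over two parameters. This sketch is hypothetical in several respects: only $\beta$ and $p$ are sampled (the study sampled 5000 parameter sets), their ranges are made up, the model is integrated with a simple forward-Euler scheme, the synthetic curve `data` stands in for the observed cumulative cases, and the acceptance rule, a pointwise relative deviation of at most 20%, is an assumed reading of condition (5), not its exact form.

```python
import random


def latin_hypercube(n, bounds, rng):
    """Return n points; in every coordinate each of the n equal-width
    strata of [lo, hi) contains exactly one point."""
    columns = []
    for lo, hi in bounds:
        width = (hi - lo) / n
        strata = list(range(n))
        rng.shuffle(strata)
        columns.append([lo + (k + rng.random()) * width for k in strata])
    return [list(point) for point in zip(*columns)]


def cumulative_cases(beta, p, N=1_893_700, days=28, dt=0.05,
                     kappa=0.5, sigma=1 / 6, gamma_I=1 / 7, gamma_A=1 / 7):
    """Daily cumulative symptomatic cases from a forward-Euler SEIAR run.

    R is omitted because it does not feed back into the other equations.
    """
    S, E, I, A, C = N - 11.0, 5.0, 0.0, 6.0, 0.0
    out = [C]
    for _ in range(days):
        for _ in range(round(1 / dt)):
            force = beta * S * (I + kappa * A) / N
            dE = force - sigma * E
            dI = (1 - p) * sigma * E - gamma_I * I
            dA = p * sigma * E - gamma_A * A
            dC = (1 - p) * sigma * E  # assumed: C counts symptomatic cases
            S -= dt * force
            E += dt * dE
            I += dt * dI
            A += dt * dA
            C += dt * dC
        out.append(C)
    return out


def within_tolerance(model, data, tol=0.2):
    """Hypothetical acceptance rule: |model - data| <= tol * data at every day."""
    return all(abs(m - d) <= tol * d for m, d in zip(model, data))


rng = random.Random(42)
bounds = [(0.3, 1.0), (0.05, 0.3)]  # hypothetical ranges for beta and p
samples = latin_hypercube(200, bounds, rng)
data = cumulative_cases(0.6, 0.2)   # synthetic stand-in for observed data
kept = [s for s in samples if within_tolerance(cumulative_cases(*s), data)]
print(len(kept), "of", len(samples), "parameter sets kept")
```

Latin hypercube sampling is used here, as in the text, because it spreads a fixed budget of samples evenly across each parameter's range, so the kept sets give a better picture of the admissible region than the same number of purely random draws.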
2.7. Bounds $\mathcal{R}_{0,i}^{\min}$ and $\mathcal{R}_{0,i}^{\max}$
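To provide a range of possible values of $\mathcal{R}_{0,i}$, we used the parameters kept in Section 2.6 and (4) to find bounds $\mathcal{R}_{0,i}^{\min}$ and $\mathcal{R}_{0,i}^{\max}$ of $\mathcal{R}_{0,i}$. These are not necessarily sharp bounds, since the sampling may miss parameter sets that attain the true minimum and maximum of $\mathcal{R}_{0,i}$.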
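Once the accepted parameter sets are available, the bounds reduce to evaluating $\mathcal{R}_0$ for each set and taking the extremes. The sketch assumes that (4) takes the next-generation form $\beta\left((1-p)/\gamma_I + \kappa p/\gamma_A\right)$ for the SEIAR structure of Section 2.2; the helper names are hypothetical and the three `kept_sets` are invented for illustration.

```python
def r0(beta, kappa, p, gamma_I, gamma_A):
    """Basic reproduction number, assuming the next-generation form of (4)."""
    return beta * ((1 - p) / gamma_I + kappa * p / gamma_A)


def r0_bounds(kept_sets):
    """Smallest and largest R0 over the parameter sets kept after sampling.

    Sampling can miss the true extremes, so these bounds need not be sharp.
    """
    values = [r0(**params) for params in kept_sets]
    return min(values), max(values)


kept_sets = [  # invented accepted parameter sets, for illustration only
    dict(beta=0.60, kappa=0.5, p=0.20, gamma_I=0.20, gamma_A=0.25),
    dict(beta=0.50, kappa=0.6, p=0.10, gamma_I=0.20, gamma_A=0.20),
    dict(beta=0.65, kappa=0.4, p=0.25, gamma_I=0.25, gamma_A=0.25),
]
r0_min, r0_max = r0_bounds(kept_sets)
print(round(r0_min, 4), round(r0_max, 4))  # 2.21 2.64
```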
3. Results
We successfully fitted (2) to the number of individuals who tested positive for COVID-19. Our model, which incorporates a compartment for asymptomatic individuals, provides estimates of the size of this subpopulation. However, it is very difficult to obtain a realistic picture of the level of asymptomatic infections without mass testing. Nevertheless, our incidence-based model indicates that the probability of being infectious without symptoms ranged between 0.0754 and 0.25, following the pattern of Non-Pharmaceutical Interventions (NPIs) in NI in 2020. Our estimates, with confidence intervals, are presented in detail in Table 7, and they are in good agreement with the recent study [77].
In Tables 3-8, we present the parameters provided by the fitting algorithm described in Section 2.4. For those parameters, confidence intervals, computed by the method described in Section 2.5, are also provided in Tables 3-8. Based on these parameters, we also provide estimates of the basic and the effective reproduction numbers using (4) and the procedure described in Section 2.7.
In Figure 4 and Figure 5, we plotted the solution of (2) when $i = 0$, that is, for the period from 1st to 29th March 2020. Our estimate of $\mathcal{R}_{0,0}$ for this period is 3.3089; also
and
. Furthermore, in Table 9, we present the weekly estimates of the reproduction number for the period of study, provided by the Department of Health of Northern Ireland. Such estimates can be used to approximate the required size of the immunised population (via infection or vaccination) needed to prevent large subsequent waves of infection. Namely, the minimum level of vaccination, with a vaccine giving 100% immunity, required to achieve herd immunity is:
$$
h = 1 - \frac{1}{\mathcal{R}_0},
\tag{6}
$$
provided $\mathcal{R}_0 > 1$, see [78]. Using $\mathcal{R}_0$ from Table 10, we obtain the following estimate of the herd immunity.
Figure 4. The cumulative number of cases and the $C_0$ component of the numerical solution of (2) with $i = 0$ and parameters from Table 10. Horizontal axis: days from 01/03/2020.
Figure 5. The numerical solution of compartments E, I, A and R of (2) with $i = 0$ and parameters from Table 10. Horizontal axis: days from 01/03/2020.
Table 5. The estimates of
.
Table 8. The estimates of
.
Table 9. Weekly estimates of the reproduction number for the period of study, provided by the Department of Health of Northern Ireland [76].
Table 10. Parameter estimates from the fitting process.
$$
h = 1 - \frac{1}{3.3089} \approx 0.6978.
\tag{7}
$$
Furthermore, by using $\mathcal{R}_{0,0}^{\min}$ and $\mathcal{R}_{0,0}^{\max}$, we get a lower and an upper estimate for herd immunity:
Notice that:
(8)
which is significantly lower than the estimate (70% - 80%) recently reported in the news [79]. Furthermore, Table 10 and Table 11 collect the parameter estimates and the initial values of E and A, respectively. The latter indicate that approximately 5 exposed and 6 asymptomatic individuals on 1st March 2020 in NI could have triggered the epidemic in the country. Using the values from Table 10 and Table 11, we simulated the course of a hypothetical epidemic without any NPIs in NI in 2020 (Figure 6).
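The herd-immunity arithmetic is simple enough to check directly. The sketch assumes the classical threshold $1 - 1/\mathcal{R}_0$ for a vaccine giving 100% immunity, which is the standard form of such a bound, and uses the point estimate $\mathcal{R}_0 = 3.3089$ reported above; the function name is hypothetical.

```python
def herd_immunity_threshold(r0):
    """Minimum immune fraction 1 - 1/R0, meaningful only for R0 > 1."""
    if r0 <= 1:
        raise ValueError("the herd immunity threshold is defined for R0 > 1")
    return 1 - 1 / r0


h = herd_immunity_threshold(3.3089)  # R0 estimate for 1st-29th March 2020
print(f"{h:.1%}")                    # 69.8%
```

Because $1 - 1/\mathcal{R}_0$ increases with $\mathcal{R}_0$, applying the same function to the lower and upper bounds of $\mathcal{R}_0$ yields the lower and upper herd-immunity estimates directly.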
Similarly to Figure 4, Figure 7 depicts the fitted solution together with data points for i = 1, …, 10. Also, Figure 8 and Figure 9 show simulation results for i = 0, …, 5 and i = 0, …, 10, respectively.
Figure 6. The hypothetical course of the epidemic without lockdown as the result of a simulation with values from Table 10 and Table 11.
Figure 7. The cumulative number of cases and the $C_i$ component of the numerical solution of (2) with (3) and parameters from Tables 3-8. Horizontal axis: days from 01/03/2020.
Table 11. Estimates for the initial values of $E_0(0)$, the number of exposed individuals, and $A_0(0)$, the number of asymptomatic individuals, from the fitting process at the beginning of the study, 01/03/2020, together with 95% confidence intervals.
Figure 8. The numerical solution of compartments
,
and
of (2) for
with initial conditions
and (3) when
.
Figure 9. The numerical solution of compartments
,
and
of (2) for
with initial conditions
and (3) when
.
In Tables 3-8, we present the fitted parameter values for $i = 0, \ldots, 10$ together with the 95% confidence intervals. Using these parameter values, we computed numerical solutions of (2) using (3). The solutions are plotted in Figures 10-12. More specifically, Figure 10 shows the solution components for the first 29 days of the epidemic. The solution components for the first wave of the epidemic, the first 169 days, are plotted in Figure 11. Furthermore, in Figure 12, we plotted the solutions of (2) using (3) for $i = 0, \ldots, 10$. In addition to the solution profiles, Figures 10-12 also show the results of the procedure described in Section 2.6, to illustrate the effects of testing uncertainty. Finally, Table 12 gives the number of solutions satisfying (5).
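Table 13 summarizes the findings on $\mathcal{R}_{0,i}$, calculated using (4). In addition, the table contains the values of $\mathcal{R}_{0,i}^{\min}$ and $\mathcal{R}_{0,i}^{\max}$, obtained by the method described in Section 2.7, where $i = 0, \ldots, 10$. The findings on $\mathcal{R}_{0,i}$ are visualized in Figure 13.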
4. Conclusions
We fitted our model (2) to the daily incidence of COVID-19 in Northern Ireland in 2020; the data are provided by the Department of Health and are publicly available. The main finding of this study is that the proportion of asymptomatically infectious individuals ranged between 5% and 25% in Northern Ireland in 2020. Our estimate of the basic reproduction number, $\mathcal{R}_0$, is 3.3089. This implies that around 70% of the population of NI should acquire immunity via infection or vaccination. This estimate is within the range of estimates reported in other studies, see e.g., [80]. Also, by using $\mathcal{R}_0^{\min}$ and $\mathcal{R}_0^{\max}$ from Table 13, we obtained a lower and an upper estimate for herd immunity.
Table 12. The number of solutions kept after the procedure described in Section 2.6.
Figure 10. Time profiles of the variables of (2) with the parameter values from Tables 3-8, shown in two rows of three panels. The shaded regions are obtained by the method explained in Section 2.6 to visualize the effects of uncertainty in testing.
Figure 11. Time profiles of the variables of (2) with the parameter values from Tables 3-8, shown in three rows of two panels. The shaded regions are obtained by the method explained in Section 2.6 to visualize the effects of uncertainty in testing.
Figure 12. Time profiles of the variables of (2) with the parameter values from Tables 3-8, shown in three rows of two panels. The shaded regions are obtained by the method explained in Section 2.6 to visualize the effects of uncertainty in testing.
A justifiable criticism of our modelling is the lack of precision in the definition of asymptomatic COVID-19 infection. Indeed, this has been a source of concern and confusion since the beginning of the pandemic [37] [38]. The initial case definition focused entirely on fever and cough. Notably, through the Zoe self-reporting phone app, see [81] [82], anosmia rapidly became recognized as a very common feature of COVID-19 infection. Subsequently, a wide range of other symptoms has been reported in some (but by no means all) COVID-19 patients [77] [83]. These include fatigue, muscle and chest pain, nausea, headache, breathlessness, abdominal pain, diarrhoea, hoarse voice, skin lesions and swelling (particularly on lips and face), finger and toe lesions, eye pain and a host of other features [77] [83]. Indeed, the range of features is so broad, yet so variable in any given patient, that none of the clinical features has significant diagnostic specificity; they merely raise the possibility of COVID-19 infection and thus require laboratory test confirmation [83] [84]. Thus, from an initially restrictive case definition, there is now evidence of an extraordinary pleiotropism of effects of the virus. So what should be the definition of asymptomatic COVID-19 infection? Is it the lack of the cardinal features of the infection, or the lack of any of the possible clinical features? This complex issue remains unresolved. However, what is clear is that there is a substantial subset of patients who have either no symptoms or symptoms so mild and non-specific that only the most suspicious might consider COVID-19 infection [37] [38] [39] [40] [43]. It thus remains the case that an asymptomatic compartment is an important aspect of disease modelling. Moreover, it should be emphasized in studies, as the asymptomatic (or very mildly symptomatic) are likely to be an important conduit for the infection of others [36] [37] [43] [85].
Figure 13. $\mathcal{R}_{0,i}$ using the estimates in Table 13.
Table 13. Estimates of $\mathcal{R}_{0,i}$ using (4), the parameter values from Tables 3-8 and the process explained in Section 2.7.
Moreover, since COVID-19 has a long (and often variable) incubation period, which may include a pre-symptomatic phase during which the patient is possibly shedding virus, the potential role of this phase in onward transmission needs to be considered [36].
Methodological issues relating to false positive and false negative results in COVID-19 testing have been of concern throughout the pandemic [44] - [55]. Of particular note are the lack of precise information about false positive and false negative rates and the concern about the over-sensitivity of PCR when high numbers of cycles are used [53] [55]. To investigate the possible significance of this, we modelled the potential impact of test uncertainty and found that diagnostic variability (due to any cause, as we cannot model specific methodological issues) can dramatically affect the modelling predictions. This is particularly evident in Figure 11, where considerable variation in case rates emerges depending on test performance. In our retrospective analysis, this is probably not a major concern, but it does highlight the potential impact on modelling. Where mathematical models are used to prospectively forecast the dynamics of the epidemic and critical outcomes such as the number of hospitalizations and the need for ITU (Intensive Therapy Unit) beds, much greater attention to the operating characteristics of COVID-19 testing is essential. Indeed, from the perspective of mathematical epidemiology, this appears to be a much-neglected issue that warrants much greater attention.
There are many aspects of the ongoing COVID-19 epidemic that we did not consider in this study. For instance, as reported in [86] [87] [88], environmental differences and seasonal weather patterns might have significant effects on the spread and severity of the disease. Also, our model (2) captures only symptomatic and asymptomatic spread; however, as mentioned in the introduction, pre-symptomatic spread is another potentially important source of infection. Last but not least, the vaccination programme in NI started in December 2020, and understanding its effects on the epidemic, in particular on subsequent NPI relaxation strategies, requires careful assessment. A recent UK-wide modelling study of vaccination and control strategies [89] suggests that only a carefully designed combination of immunization and NPI relaxation has the potential to drive the reproduction number below 1.
Note Added at Proof
Since the completion and original submission of this study, a series of variants of the virus has emerged, and some have become highly prevalent. Indeed, variants such as Delta and, more recently, Omicron have novel properties in terms of infectivity and pathogenicity, which give them markedly different epidemiological characteristics. Delta and then Omicron have in turn become dominant in many populations. These variants are more likely to be transmitted by vaccinated persons than earlier variants, despite very good individual protection against severe disease courses after vaccination [90] [91] [92]. Strikingly, Omicron appears to be more infectious, with a shorter incubation period and milder disease. Preliminary reports suggest that it may have a higher frequency of asymptomatic illness. Such data suggest that our analysis of the early phases of the pandemic, when Alpha and Beta variants predominated, may not be directly analogous to the current situation, underscoring the rapid evolution of COVID-19. Notwithstanding these important new variants and their properties, our general strategy remains valid and the information reported is highly relevant. Further studies of novel variants in the Northern Irish and other contexts, building on the work reported here, are essential.
Acknowledgements
Moutari, S. would like to acknowledge the support from Innovate UK through the Knowledge Transfer Partnership scheme (Projects S2377MPH and S2417MPH).