Quantitative Security Evaluation for Software System from Vulnerability Database

This paper proposes a quantitative security evaluation for software systems based on vulnerability data consisting of the discovery date, solution date and exploit publish date, using a stochastic model. More precisely, our model considers a vulnerability life-cycle model and represents the vulnerability discovery process as a non-homogeneous Poisson process. In a numerical example, we show the quantitative measures for a content management system developed as an open source project.


Introduction
Since the latter half of the 1990s, many security incidents have been reported in enterprise systems and personal computers, such as denial-of-service attacks via computer viruses and data leaks caused by unauthorized access.
Generally, most security incidents are caused by software flaws and bugs called security holes or vulnerabilities. An effective countermeasure against security incidents is to validate that there is no flaw in the software during the design and testing phases. For this purpose, model verification techniques have been enhanced to validate the software design. For example, model checking mathematically ensures that the software behaves according to its specification [1], and several testing techniques have been developed to remove as many software faults as possible in the testing phase [2]. However, even if such techniques are applied, it is difficult to remove all the flaws before releasing the software to the market due to external circumstances of software development: development cost, delivery date and unexpected specification changes. For such software systems, security patching is one of the feasible solutions that prevent an attacker from exploiting vulnerabilities.
A security patch is a small program that fixes the software faults causing security holes and vulnerabilities, and is distributed to end-users through the Internet or other means after the software release. The user can remove a vulnerability by applying the corresponding security patch distributed by the vendor. Ideally, a security patch should be distributed whenever a vulnerability of the software product is discovered. However, the development and distribution of security patches incur expenses for the vendor, and a short development time might lead to a poorly designed patch that causes a new problem. Thus, many software vendors plan to distribute security patches at specified intervals, e.g., quarterly, and each patch fixes all the vulnerabilities discovered up to the distribution time. On the other hand, from the user's perspective, applying a patch involves not only a tedious task but also a risk that the patch causes an error such as misconfiguration. Therefore, in practice, users, especially enterprises and firms, also plan which patches are applied at specified times. These strategies for software patches are called patch management. In [3], Okamura et al. discussed the optimal patch release timing to support patch management for enterprises based on a stochastic model.
Essentially, to discuss patch management, it is important to quantify the degree of security of the software system. In general, there are two perspectives on the quantitative evaluation of security: the vendor's and the user's. From the vendor's perspective, the risk is that an exploit of a vulnerability is released before a patch is distributed. On the other hand, users should consider the risk caused by the delay in applying patches as well as the risk of the software system itself. In fact, Okamura et al. [4] tried to evaluate the degree of security from the user's perspective by considering the user profile of the system. In this paper, we focus on the security risk for vendors.
In the past literature, many studies have considered the security risk of software systems from the vendor's perspective. Wang et al. [5] presented a continuous-time Markov model to evaluate the security of an intrusion-tolerant database system. Jonsson et al. [6] discussed a security model based on the analysis of attacker behavior. These papers considered quantitative security for specific systems, and their methods cannot always be applied to other kinds of software systems. Kimura [7] also proposed a stochastic model, similar to the classical software reliability growth model, and presented a quantitative evaluation of the security of software systems. His method focused only on the vulnerability discovery process and can therefore be applied to many kinds of software systems. However, the model derived in [7] is essentially equivalent to the testing-domain dependent software reliability growth model [8]. Thus, it cannot represent a variety of patterns for the vulnerability discovery process.
In this paper, we refine the quantitative software security model based on the vulnerability discovery process by using general distributions. Although the model presented here does not exactly include the model in [7], we adopt a similar situation where vendors and attackers compete to make a patch and to find an exploit. In addition, we present an illustrative example of the quantitative security evaluation of a content management system from the vulnerability data.
The rest of this paper is organized as follows. In Section 2, we describe the vulnerability model with respect to its discovery process. Section 3 presents the formulation of a quantitative security measure based on the vulnerability discovery process, patch release distribution and exploitation time distribution. Section 4 is devoted to the experiment for our quantitative security evaluation based on the vulnerability data.

Vulnerability Life Cycle
A vulnerability is defined as a fault in the system requirements or in a program that allows an attacker to violate the system integrity. A vulnerability is often caused by flaws in software requirements as well as software bugs, and thus it is more difficult to find vulnerabilities by software testing than to detect usual software bugs.
Arbaugh et al. [9] presented a vulnerability life-cycle model which consists of the following seven states:

Birth: The birth of a vulnerability, strictly speaking a flaw, occurs at the software requirement or software design stage.
Discovery: Someone discovers a flaw in software security, and then the flaw becomes a vulnerability.
Disclosure: The vulnerability is disclosed when the discoverer reveals details of the problem.
Correction: The vulnerability becomes correctable when a security patch is developed and released.
Publicity: The vulnerability and its problem become known through disclosure to the public media.
Scripting: An exploit of the vulnerability is released. In this state, crackers with little or no skill can exploit the vulnerability to violate the integrity of the system.
Death: The vulnerability dies when a security patch has been applied to all the vulnerable systems.

Figure 1 illustrates the state transition of a typical vulnerability in the life-cycle model.

Vulnerability Discovery Process
In the vulnerability life cycle, we focus on the discovery and disclosure states. In general, the software vendor begins to take countermeasures against a vulnerability after it is discovered in the software operation phase. That is, the number of discovered vulnerabilities is a significant measure for determining the security strategy of the vendor.
To describe the vulnerability discovery process, we make the following assumptions:

(A-1) The software has a finite number of vulnerabilities to be discovered.
(A-2) The time to discover a vulnerability is stochastically distributed, and all the discovery times are mutually independent random variables.

Under the above assumptions, we model the number of discovered vulnerabilities at time $t$, $D(t)$, as follows:

$$P(D(t) = n) = \binom{m}{n} F_V(t)^n \left(1 - F_V(t)\right)^{m-n}, \tag{1}$$

where $m$ is the total number of undiscovered vulnerabilities at time $t = 0$ and $F_V(t)$ is the cumulative distribution function (c.d.f.) of the discovery time for a vulnerability. In addition, when the total number of undiscovered vulnerabilities follows a Poisson distribution with mean $\omega$, the probability mass function (p.m.f.) of $D(t)$ is given by

$$P(D(t) = n) = \frac{(\omega F_V(t))^n}{n!} e^{-\omega F_V(t)}. \tag{2}$$

Equation (2) equals the p.m.f. of a non-homogeneous Poisson process (NHPP) with the mean value function $\omega F_V(t)$. This framework is essentially the same as that of NHPP-based software reliability models (SRMs) [10,11]. Thus, by applying well-known statistical distributions to $F_V(t)$, we can obtain vulnerability discovery processes which correspond to several existing NHPP-based SRMs. For example, when $F_V(t)$ is a truncated logistic distribution, the corresponding NHPP-based vulnerability discovery model equals the inflection S-shaped model [12,13]. The inflection S-shaped model has almost the same representation ability as the vulnerability discovery model proposed in [14][15][16][17], since both models draw a logistic curve as the expected number of discovered vulnerabilities.
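The NHPP description of the discovery process can be illustrated with a minimal Python sketch. It assumes an exponential discovery-time distribution, $F_V(t) = 1 - e^{-\beta t}$, and writes the Poisson mean of the total number of vulnerabilities as `omega`; the function names and the parameter values are hypothetical and are not estimates from any data set.

```python
import math

def discovery_cdf(t, beta):
    """Assumed exponential c.d.f. of the discovery time: F_V(t) = 1 - exp(-beta * t)."""
    return 1.0 - math.exp(-beta * t) if t > 0 else 0.0

def mean_value(t, omega, beta):
    """Mean value function of the NHPP: omega * F_V(t)."""
    return omega * discovery_cdf(t, beta)

def discovery_pmf(n, t, omega, beta):
    """P(D(t) = n): Poisson p.m.f. with mean omega * F_V(t), computed in log space."""
    mu = mean_value(t, omega, beta)
    if mu == 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))

def expected_residual(t, omega, beta):
    """Expected number of vulnerabilities still undiscovered at time t."""
    return omega * (1.0 - discovery_cdf(t, beta))

# Hypothetical parameters: 15 vulnerabilities on average, discovery rate 0.01 per day.
omega, beta = 15.0, 0.01
print(mean_value(100.0, omega, beta))         # expected discoveries by day 100
print(discovery_pmf(10, 100.0, omega, beta))  # probability of exactly 10 discoveries
print(expected_residual(100.0, omega, beta))  # expected undiscovered vulnerabilities
```

Replacing `discovery_cdf` with another c.d.f. (gamma, truncated logistic, and so on) gives the other NHPP-based models without changing the rest of the sketch.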

Security Evaluation Model
From the vendor's perspective, the security patch that fixes a vulnerability should be distributed before its exploit is released. That is, in the vulnerability life cycle, the state Death should be reached before Scripting. However, as seen with zero-day viruses, the patch distribution is often delayed until after the exploit is released. In addition, if a large number of vulnerabilities are discovered just after the release of the software product, there is an increased risk that malicious users exploit the vulnerabilities. This is clearly a risk for the vendor.
To evaluate the vendor's risk, let $T_D$ and $T_S$ be the random times, measured from the discovery of a vulnerability, until its security patch is distributed and until its exploit is released, respectively. Also, we assume that $T_D$ and $T_S$ have respective c.d.f.'s $F_D(t) = P(T_D \le t)$ and $F_S(t) = P(T_S \le t)$, and $F_S(t)$ is allowed to be defective, i.e., $F_S(\infty) = 1$ does not always hold. This means that there exists a probability that the vulnerability cannot be exploited for malicious attacks.
Let $S(t)$ be the number of vulnerabilities whose exploit is released before the patch is distributed. Then the process $S(t)$ can be analyzed in a way similar to an $M_t/G/\infty$ queueing process with two different competitive services. Since the number of discovered vulnerabilities is described by an NHPP, we have

$$P(S(t) = k) = \sum_{n=k}^{\infty} P(S(t) = k \mid D(t) = n) \frac{(\omega F_V(t))^n}{n!} e^{-\omega F_V(t)}. \tag{3}$$

Next we focus on the probability that the exploit of a vulnerability is released before time $t$, provided that the vulnerability is discovered at $T_V = s$ ($\le t$). The probability can be derived by conditioning on whether the patch is distributed before time $t$ or not. The probability in the case where the patch is released before time $t$ is given by

$$P(T_S < T_D,\ T_D \le t - s \mid T_V = s) = \int_0^{t-s} \left\{ \bar{F}_D(u) - \bar{F}_D(t-s) \right\} dF_S(u),$$

where in general $\bar{F}(\cdot) = 1 - F(\cdot)$. Also, the probability in the case where the patch is not released before time $t$ is

$$P(T_S \le t - s,\ T_D > t - s \mid T_V = s) = \bar{F}_D(t-s)\, F_S(t-s).$$

Therefore, we have the conditional probability that the exploit of a vulnerability is released before time $t$, provided that the vulnerability is discovered at $s$, as follows.

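$$\begin{aligned} P(T_S \le t - s,\ T_S < T_D \mid T_V = s) &= P(T_S < T_D,\ T_D \le t - s \mid T_V = s) + P(T_S \le t - s,\ T_D > t - s \mid T_V = s) \\ &= \int_0^{t-s} \bar{F}_D(u)\, dF_S(u). \end{aligned}$$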
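As a check on this step, the two cases can be added explicitly. The sketch below assumes that $T_S$ and $T_D$ are independent given the discovery time and that $F_S(0) = 0$; it writes $x = t - s$ and $\bar{F}_D = 1 - F_D$, and drops the conditioning on $T_V = s$ for brevity.

```latex
\begin{align*}
P(T_S < T_D,\ T_D \le x)
  &= \int_0^{x} P(u < T_D \le x)\, dF_S(u)
   = \int_0^{x} \{\bar{F}_D(u) - \bar{F}_D(x)\}\, dF_S(u), \\
P(T_S \le x,\ T_D > x)
  &= F_S(x)\, \bar{F}_D(x)
   = \int_0^{x} \bar{F}_D(x)\, dF_S(u), \\
P(T_S \le x,\ T_S < T_D)
  &= \int_0^{x} \bar{F}_D(u)\, dF_S(u).
\end{align*}
```

The last line is the sum of the first two. Integration by parts gives the equivalent form $\int_0^{x} F_S(u)\, dF_D(u) + F_S(x)\, \bar{F}_D(x)$, which may be more convenient when $F_D$ has a density and $F_S$ is known only through its c.d.f.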
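According to the argument for the $M_t/G/\infty$ process [18], we obtain

$$P(S(t) = k \mid D(t) = n) = \binom{n}{k} p(t)^k \left(1 - p(t)\right)^{n-k}, \tag{7}$$

where

$$p(t) = \frac{1}{F_V(t)} \int_0^t \int_0^{t-s} \bar{F}_D(u)\, dF_S(u)\, dF_V(s). \tag{8}$$

Substituting Equation (7) into Equation (3) yields

$$P(S(t) = k) = \frac{(\omega G(t))^k}{k!} e^{-\omega G(t)}, \tag{9}$$

where

$$G(t) = \int_0^t \int_0^{t-s} \bar{F}_D(u)\, dF_S(u)\, dF_V(s). \tag{10}$$

That is, the number of vulnerabilities whose exploit is released before the patch distribution also becomes an NHPP with mean value function $\omega G(t)$, where $\omega$ denotes the mean of the total number of vulnerabilities.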

 
Based on the NHPP, we define the quantitative software security function from the vendor's perspective as the probability that no vulnerability has its exploit released before a patch during the time interval $[s, t + s)$:

$$SS(t \mid s) = \exp\left\{ -\omega \left( G(t+s) - G(s) \right) \right\}. \tag{11}$$
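The security function can be evaluated numerically once the three distributions are fixed. The following Python sketch is illustrative only: it assumes exponential discovery times (rate `beta`), an exponential patch distribution time (rate `mu`, a simplification chosen so that the inner integral has a closed form), and a defective exponential exploit time $F_S(u) = p_S(1 - e^{-\lambda u})$. It takes $G(t) = \int_0^t \int_0^{t-s} (1 - F_D(u))\, dF_S(u)\, dF_V(s)$ and $SS(t \mid s) = \exp\{-\omega (G(t+s) - G(s))\}$, where `omega` is the mean total number of vulnerabilities. All names and parameter values are hypothetical.

```python
import math

def exploit_before_patch(x, p_s, lam, mu):
    """Closed form of int_0^x (1 - F_D(u)) dF_S(u) for exponential F_D (rate mu,
    an assumption) and F_S(u) = p_s * (1 - exp(-lam * u))."""
    return p_s * lam / (lam + mu) * (1.0 - math.exp(-(lam + mu) * x))

def G(t, beta, p_s, lam, mu, steps=2000):
    """G(t) = int_0^t exploit_before_patch(t - s) dF_V(s) by the composite Simpson
    rule, with F_V exponential (rate beta). `steps` must be even."""
    if t <= 0.0:
        return 0.0
    h = t / steps
    total = 0.0
    for i in range(steps + 1):
        s = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        density = beta * math.exp(-beta * s)  # density of the discovery time at s
        total += weight * exploit_before_patch(t - s, p_s, lam, mu) * density
    return total * h / 3.0

def software_security(t, s, omega, beta, p_s, lam, mu):
    """SS(t | s) = exp(-omega * (G(t + s) - G(s)))."""
    increment = G(t + s, beta, p_s, lam, mu) - G(s, beta, p_s, lam, mu)
    return math.exp(-omega * increment)

# Hypothetical parameter values, chosen only to exercise the functions.
omega, beta, p_s, lam, mu = 15.0, 0.01, 0.5, 1.0 / 24.0, 1.0 / 30.0
for days in (30.0, 100.0, 200.0):
    print(days, software_security(days, 0.0, omega, beta, p_s, lam, mu))
```

With a phase-type patch distribution, `exploit_before_patch` would be replaced by a matrix-exponential expression; the outer integration and the definition of `software_security` would stay the same.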

Numerical Example
In this section, we present a quantitative security evaluation for a content management system (CMS), which manages Web sites through a graphical user interface. Since a vulnerability of a CMS can be exploited to alter a Web site from the outside, the security evaluation of a CMS is a significant issue. In particular, we focus on two different versions of Joomla, which is a CMS developed as an open source project.
From the Open Source Vulnerability Database (OSVDB), we collect the vulnerabilities for Joomla 1.5.x and 2.5.x. Tables 1 and 2 present the vulnerability data for Joomla 1.5.x and 2.5.x recorded in OSVDB. The columns Informed, Solution and Exploit Publish indicate the dates when the vendor is informed of the vulnerability, the patch is distributed, and the exploit of the vulnerability is released, respectively. If the informed or solution date is missing, we fill it in with the disclosure date in the database.
Based on the vulnerability data, we first determine the vulnerability discovery process from the vendor informed date. That is, the vendor informed date is regarded as the discovery date of the vulnerability. In the experiment, since the vulnerability discovery process is essentially the same as the software reliability growth model, we apply the candidates presented in Table 3 as representative models.

Table 3 (model, distribution of the discovery time, references):
EXP | exponential | [19,20]
GAMMA | gamma | [19,21]
PARETO | Pareto | [22,23]
TNORM | truncated normal | [24]
LNORM | log-normal | [24,25]
TLOGIS | truncated logistic | [12,13]
LLOGIS | log-logistic | [13,26]
TXVMAX | truncated extreme-value at maximum | [27]
LXVMAX | logarithmic extreme-value at maximum | [27]
TXVMIN | truncated extreme-value at minimum | [27]
LXVMIN | logarithmic extreme-value at minimum | [27,28]

In addition, efficient ML estimation algorithms based on the EM algorithm have been developed for all the models [13,19,27,29]. Furthermore, the model selection is performed by AIC (Akaike information criterion) [30], which is defined by

$$\mathrm{AIC} = -2\,(\text{maximum log-likelihood}) + 2\,(\text{the number of model parameters}).$$

The model with the smaller AIC fits the observed data better. Table 4 shows the maximum log-likelihood (MLL) and AIC for all the candidates for the vulnerabilities of Joomla 1.5.x. Similarly, Table 5 indicates the results for Joomla 2.5.x.
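The AIC-based selection can be sketched in a few lines of Python. The maximum log-likelihood values and parameter counts below are hypothetical placeholders, not the figures reported in Tables 4 and 5, and `select_model` is an illustrative helper name.

```python
def aic(max_log_likelihood, num_params):
    """AIC = -2 * (maximum log-likelihood) + 2 * (number of model parameters)."""
    return -2.0 * max_log_likelihood + 2.0 * num_params

def select_model(candidates):
    """Return the name of the candidate with the smallest AIC.

    `candidates` maps a model name to (maximum log-likelihood, number of parameters).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))

# Hypothetical fits: a 2-parameter and a 3-parameter model with similar likelihoods.
fits = {"EXP": (-50.0, 2), "GAMMA": (-49.8, 3)}
for name, (mll, k) in fits.items():
    print(name, aic(mll, k))
print("selected:", select_model(fits))
```

In this made-up case the extra parameter of the second model improves the log-likelihood by only 0.2, which does not offset the penalty of 2 per parameter, so the simpler model is selected.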
From these tables, it can be seen that EXP is the best model to represent the vulnerability discovery processes in both Joomla 1.5.x and 2.5.x. Figures 2 and 3 depict the cumulative number of vulnerabilities of Joomla 1.5.x and 2.5.x from their release dates. The figures include the mean value functions of the EXP models fitted to the observed data. The current date is 2013/1/11. From the figures, we find that the vulnerability discovery of Joomla 1.5.x has not converged yet. In contrast, the vulnerability discovery of Joomla 2.5.x has almost converged. In fact, the expected numbers of residual vulnerabilities are 2.41 in Joomla 1.5.x and 0.12 in Joomla 2.5.x. We estimate the distribution of patch release timing from the data. The means (variances) of the patch distribution time are 13.1 days (424.0 days$^2$) in Joomla 1.5.x and 34.7 days (2474.7 days$^2$) in Joomla 2.5.x. Since the variances are large, we cannot use several well-known distributions such as the normal distribution. To simplify the argument of distribution selection, this paper applies phase-type (PH) distributions to represent the patch distribution time.
The PH distribution is defined as the distribution of the time to absorption in a continuous-time Markov chain consisting of several transient states and one absorbing state. It can approximate any distribution on the nonnegative real line to arbitrary precision. That is, by using the PH distribution, we can reduce the problem of distribution selection to the parameter estimation of the PH distribution. In addition, an efficient algorithm for sample-based estimation of the PH distribution has been proposed in [31]. Figure 4 illustrates the estimated density functions of the patch distributions for Joomla 1.5.x and 2.5.x. The numbers of phases are 13 and 12 in Joomla 1.5.x and 2.5.x, respectively, which are determined by the phase orders [32]. Both distributions have two modes around 1 and 5 days. However, since the tails of the distributions are long, the means (variances) of the estimated PH distributions are 13.1 days (642.9 days$^2$) in Joomla 1.5.x and 34.7 days (3380.0 days$^2$) in Joomla 2.5.x.
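The mean and variance of a PH distribution follow from its representation $(\alpha, T)$, where $\alpha$ is the initial probability vector over the transient states and $T$ is the sub-generator matrix: the $k$-th moment is $k!\, \alpha (-T)^{-k} \mathbf{1}$. The Python sketch below applies this standard formula to a small hypothetical two-phase example; it is not the fitted 13-phase or 12-phase model, and the helper names are illustrative.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ph_mean_variance(alpha, t_matrix):
    """Mean and variance of a PH distribution with representation (alpha, T)."""
    n = len(alpha)
    neg_t = [[-v for v in row] for row in t_matrix]
    x1 = solve_linear(neg_t, [1.0] * n)  # (-T)^{-1} 1
    x2 = solve_linear(neg_t, x1)         # (-T)^{-2} 1
    mean = sum(a * v for a, v in zip(alpha, x1))
    second_moment = 2.0 * sum(a * v for a, v in zip(alpha, x2))
    return mean, second_moment - mean ** 2

# Hypothetical example: two sequential phases, each left at rate 2 (an Erlang distribution).
alpha = [1.0, 0.0]
T = [[-2.0, 2.0],
     [0.0, -2.0]]
mean, variance = ph_mean_variance(alpha, T)
print(mean, variance)
```

For this Erlang example the known closed forms (mean $2/2$, variance $2/2^2$) give a quick sanity check on the linear solves.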
Next we determine the distribution of exploitation based on the exploit publish date. However, in the tables, vulnerabilities are not always exploited for a malicious attack, and exploits of several vulnerabilities have not been discovered. Also, the number of vulnerabilities whose exploit is released is too small to determine the form of the distribution. Thus, in this paper, we assume that the distribution of exploitation is given by the following exponential-type distribution:

$$F_S(t) = p_S \left( 1 - e^{-\lambda t} \right), \tag{12}$$

where $p_S$ is the probability that an exploit of the vulnerability exists and $\lambda$ is the exploitation rate provided that an exploit of the vulnerability exists. The probability $p_S$ can be estimated as the ratio of the number of vulnerabilities whose exploit exists to the total number of vulnerabilities. Then we have $p_S = 1/14$ in Joomla 1.5.x and $p_S = 7/15$ in Joomla 2.5.x. Also, the exploitation rates are given by the reciprocal of the mean time to exploit, namely, $1/\lambda = 1$ (day) in Joomla 1.5.x and $1/\lambda = 24.1$ (days) in Joomla 2.5.x. Since $F_V(t)$ and $F_S(t)$ are exponential distributions and $F_D(t)$ is a PH distribution, Equation (10) can be expressed in a matrix exponential form.

Based on $G(t)$ in Joomla 1.5.x and 2.5.x, we can evaluate quantitative measures for security. Figure 5 illustrates the quantitative software security functions of Joomla 1.5.x and 2.5.x given by Equation (11) from their release dates, i.e., $SS(t \mid 0)$. Also, Figure 6 indicates the software security functions of Joomla 1.5.x and 2.5.x from the current date. As seen in Figure 5, the quantitative software security of Joomla 1.5.x is higher than that of Joomla 2.5.x after their releases. This is caused by two factors: first, a greater number of vulnerabilities of Joomla 2.5.x are found in the early phase just after the release, and second, a greater number of vulnerabilities have their exploits released in Joomla 2.5.x.

On the other hand, the quantitative software security functions from the current date in Figure 6 show a different tendency from those from the release date in Figure 5. The quantitative software security of Joomla 2.5.x converges to a certain level. For operation periods longer than 200 days, Joomla 2.5.x is more secure than Joomla 1.5.x. However, in the early phase, Joomla 1.5.x is still more secure than Joomla 2.5.x. This is because the number of vulnerabilities of Joomla 2.5.x has almost converged at the current date as shown in Figure 3, whereas vulnerabilities of Joomla 1.5.x are expected to remain even at the current date. This result suggests that Joomla 1.5.x is more secure at the current date, but that it should be replaced with Joomla 2.5.x after around 200 days from the viewpoint of security.
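The estimation of the exploitation distribution described above, $F_S(t) = p_S(1 - e^{-\lambda t})$ with $p_S$ taken as the fraction of vulnerabilities that have an exploit and $1/\lambda$ as the mean time to exploit, can be sketched as follows. The delay values are hypothetical and are not records from OSVDB; the function names are illustrative.

```python
import math

def estimate_exploit_params(exploit_delays, total_vulnerabilities):
    """Estimate (p_S, lambda) from the delays between discovery and exploit release.

    `exploit_delays` holds one positive delay (in days) per vulnerability that has
    an exploit; `total_vulnerabilities` counts all vulnerabilities, exploited or not.
    """
    p_s = len(exploit_delays) / total_vulnerabilities
    lam = len(exploit_delays) / sum(exploit_delays)  # reciprocal of the mean delay
    return p_s, lam

def exploit_cdf(t, p_s, lam):
    """Defective exponential c.d.f.: F_S(t) = p_S * (1 - exp(-lambda * t))."""
    return p_s * (1.0 - math.exp(-lam * t)) if t > 0 else 0.0

# Hypothetical data: 3 of 12 vulnerabilities have an exploit, released after these delays.
delays = [3.0, 10.0, 47.0]
p_s, lam = estimate_exploit_params(delays, total_vulnerabilities=12)
print(p_s, lam)
print(exploit_cdf(30.0, p_s, lam))  # probability that an exploit appears within 30 days
```

Because the distribution is defective, `exploit_cdf` approaches `p_s` rather than 1 as `t` grows, which reflects the vulnerabilities that are never exploited.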

Conclusions
This paper has presented a quantitative security evaluation for software systems from the vendor's perspective. Concretely, we have proposed a general method to quantify the degree of security from a vulnerability database. The concept of our approach is similar to that of software reliability growth models, and the advantage of our method is its applicability: it can be applied to any kind of software system whose vulnerability data can be obtained. In the numerical example, we have illustrated how to evaluate the software by using the vulnerability data for a CMS.
In the future, we will perform experiments for other types of software systems and comprehensively compare their quantitative software security functions. In addition, we will derive the security measure from the user's perspective based on the proposed model.

Figure 1. A typical state transition in a vulnerability life-cycle model.

Figure 5. Quantitative software security functions from release date.

Figure 6. Quantitative software security functions from current date.