
Any computer system with known vulnerabilities can be represented using attack graphs. An attacker generally has a mission to reach a goal state that he expects to achieve. Expected Path Length (EPL) [1] in the context of an attack graph describes the length, or number of steps, that the attacker has to take to achieve the goal state. However, the EPL varies and depends on the “state of vulnerabilities” [2] [3] in a given computer system. Throughout its life cycle, any vulnerability passes through several stages that we identify as “states of the vulnerability life cycle” [2] [3]. In our previous studies we developed mathematical models using Markovian theory to estimate the probability of a given vulnerability being in a particular state of its life cycle. Here, we consider a typical model of a computer network system with two computers subject to three vulnerabilities, and develop an algorithm-driven method to estimate the EPL of this network system as a function of time. This approach is important because it allows us to monitor a computer system while it is being exploited. The non-homogeneous model proposed in this study estimates the behavior of the EPL as a function of time and therefore acts as an index of the risk of the network system being exploited.

In 2016, the U.S. Government Cybersecurity report commences with the following paragraph. [

Many research efforts have been made to address this scenario. However, due to the peculiar, voluminous and dynamic nature of the field, defending methods are still playing catch-up with their targets. Therefore, it is extremely important to integrate scientific efforts and develop a strong theoretical basis for the rapid development of applications and system solutions.

In this study, we continue our research efforts in integrating mathematical and statistical theories to better understand the complex behavior of computer network systems from the perspective of cybersecurity. We propose a new method to estimate the EPL as a function of time “t”. The EPL is a major factor in determining the risk level of a given computer system: the smaller the EPL, the more vulnerable the network system is and the more likely it is to be exploited.

In our recent studies, [

In the present study, we introduce a Non-Homogeneous Stochastic Model that allows computer system administrators to predict, in terms of the EPL, the time at which the system is most vulnerable to an attack. This estimate is based on the assumption that a system is more susceptible to exploitation when the EPL is at a minimum at a particular time “t”. In developing this model we used a network system of two IPs with three vulnerabilities as a base model.

With the introduction of this new approach, we redefine the capability to estimate the probability of being exploited as a function of time for a computer network system with a given set of vulnerabilities. Although we have already developed a successful statistical model to find the EPL of a possible attack, it is more important to estimate the EPL as a function of time, and the current study addresses this need. Thus, for a system with a given set of vulnerabilities, the most probable exploit times can be modelled on the logical assumption that a system is more susceptible to exploitation at a time when the Expected Path Length (the number of steps that an attacker needs to take before achieving the goal state) is at its minimum.

The core component of this method is the attack graph [

The absorbing state, or goal state, is the security node that the attacker expects to reach and exploit. When the attacker reaches this goal state, the attack path is complete. The entire attack graph consists of such attack paths, as illustrated in this study.

Given the CVSS score [

We define,

p_{ij} = probability that an attacker is currently in state i and exploits a vulnerability in state j.

n = number of outgoing edges from state i in the attack model.

v_{j} = CVSS score of the vulnerability in state j.

Formally, we define the transition probability as

$$p_{ij} = \frac{v_j}{\sum_{k=1}^{n} v_k}$$

Using these transition probabilities, we can construct the absorbing transition probability matrix P, which has the properties of a Markov chain transition matrix.
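As a minimal sketch of the transition probability p_{ij} defined above, the Python function below normalizes the CVSS scores on the outgoing edges of a single state i so that they sum to 1. The function name `transition_row` and the example scores (9.0 and 4.3, the CVSS scores of V_{1} and V_{2} in the application later in this study) are illustrative choices, not part of the original method.

```python
def transition_row(outgoing_scores: list[float]) -> list[float]:
    """Return p_ij = v_j / sum_k v_k for the n outgoing edges of one state i."""
    total = sum(outgoing_scores)
    if total <= 0:
        raise ValueError("at least one outgoing score must be positive")
    return [v / total for v in outgoing_scores]


# Illustrative state with two outgoing edges whose vulnerabilities
# have CVSS scores 9.0 and 4.3.
row = transition_row([9.0, 4.3])
print([round(p, 4) for p in row])  # [0.6767, 0.3233]
```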

The transition probabilities p_{ij} for each state in an attack graph represent the risk of a particular state (for a given vulnerability) being exploited. It is therefore logical to treat them as risk variables. In our previous studies we introduced a more convenient tool named “Risk Factor” [

It is important to note that, for a given vulnerability, the exploitability factor should vary with time. However, the exploitability factor calculated under the CVSS is a constant and is not suitable for inclusion in a non-homogeneous model. Our “Risk Factor” model, by contrast, is based on the Vulnerability Life Cycle [

The probability of an exploitation for a given vulnerability can be obtained using the three stochastic models given in

In each of the equations, t is the age of the vulnerability in days, calculated as the difference between the date the vulnerability was first discovered and the date the attack attempt started.
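The age t can be computed directly from calendar dates. The sketch below assumes, as the application later in this study does, that the published date stands in for the discovery date; the helper name `vulnerability_age` is hypothetical. Note that the model equations contain ln(ln t), so they are only defined for t > 1.

```python
from datetime import date


def vulnerability_age(published: date, attack: date) -> int:
    """Age t (in days) of a vulnerability on the attack date."""
    return (attack - published).days


attack_date = date(2016, 6, 24)  # assumed first attack attempt in the application
published = {
    "V1 (CVE 2016-3230)": date(2016, 6, 15),
    "V2 (CVE 2016-2832)": date(2016, 6, 13),
    "V3 (CVE 2016-0911)": date(2016, 6, 19),
}
for name, day in published.items():
    print(name, vulnerability_age(day, attack_date))  # 9, 11 and 5 days
```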

Thus, for a given vulnerability at a given time, we can obtain the probability of being exploited, and we can now define the transition probability as follows.

$$p_{ij} = \frac{R(v_j(t))}{\sum_{k=1}^{n} R(v_k(t))}$$

where

R(v_{j}(t)) = Risk Factor of a given vulnerability in state j at time t,

e(v_{j}) = exploitability subscore of the CVSS score for the given vulnerability in state j,

and

$$R(v_j(t)) = Y(t) \times e(v_j)$$

Category (CVSS score) | Model equation | R^{2} | R^{2}_{adj} |
---|---|---|---|
Low (0 - 4) | Y(t) = 0.135441 − 0.308532 (1/t) − 0.002030 ln(ln t) | 0.9576 | 0.9566 |
Medium (4 - 7) | Y(t) = 0.169518 − 0.356821 (1/t) − 0.007011 ln(ln t) | 0.962 | 0.961 |
High (7 - 10) | Y(t) = 0.191701 − 0.383521 (1/t) − 0.00358 ln(ln t) | 0.9588 | 0.9577 |

Here, R(v_{j}(t)) = Y(t) × e(v_{j}) is the analytic form of the risk factor, where Y(t) is the exploitability probability factor as a function of time, given by the model equation for the vulnerability's CVSS category, and e(v_{j}) is the exploitability score taken from the CVSS.
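A minimal Python sketch of the risk factor follows, assuming the category is chosen from the CVSS score as in the table of model equations and that t is measured in days. The names `Y_MODELS`, `exploitability_probability` and `risk_factor` are hypothetical; the coefficients are copied from the table.

```python
import math

# Coefficients (a, b, c) of Y(t) = a - b*(1/t) - c*ln(ln t), by CVSS category
Y_MODELS = {
    "Low": (0.135441, 0.308532, 0.002030),
    "Medium": (0.169518, 0.356821, 0.007011),
    "High": (0.191701, 0.383521, 0.00358),
}


def exploitability_probability(category: str, t: float) -> float:
    """Y(t) for a vulnerability of the given category and age t (days)."""
    if t <= 1:
        raise ValueError("ln(ln t) requires t > 1")
    a, b, c = Y_MODELS[category]
    return a - b / t - c * math.log(math.log(t))


def risk_factor(category: str, t: float, exploitability_score: float) -> float:
    """R(v_j(t)) = Y(t) * e(v_j)."""
    return exploitability_probability(category, t) * exploitability_score


print(round(risk_factor("High", 9, 8.0), 4))    # 1.1702
print(round(risk_factor("Medium", 11, 2.8), 4))  # 0.3667
print(round(risk_factor("Low", 5, 3.4), 4))     # 0.2474
```

With the inputs of the application (V_{1}: High, t = 9, e = 8; V_{2}: Medium, t = 11, e = 2.8; V_{3}: Low, t = 5, e = 3.4), the three calls give approximately 1.1702, 0.3667 and 0.2474.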

For attack prediction, we consider two methods to predict the attacker’s behavior.

The absorbing transition probability matrix [

For a square (n × n) adjacency matrix P and a positive integer k, P^{k} is the matrix P raised to the power k. Since P is an absorbing transition probability matrix, P^{k} converges as k grows to a stationary matrix Π whose rows are identical. That is,

$$\lim_{k \to \infty} P^{k} = \Pi$$

Once stationarity is achieved, the goal-state column of Π consists of ones, so we can find the minimum number of steps (time) after which the attacker reaches the goal state with probability 1. Once the attacker is in the goal state, we can identify the probability of the system being exploited.
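The convergence of P^{k} and the minimum number of steps to reach the goal state with (numerically) probability 1 can be sketched in plain Python as below. We assume that "probability 1" is approximated by 1 − tol with a hypothetical tolerance `tol`, and we use, for illustration, the June 24th transition matrix from the application later in this study; the helper names are our own.

```python
def mat_mul(A: list[list[float]], B: list[list[float]]) -> list[list[float]]:
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def steps_until_absorbed(P, start=0, goal=-1, tol=1e-6, max_steps=10_000):
    """Smallest k with P^k[start][goal] >= 1 - tol, returned together with P^k.

    "Probability 1" is taken numerically as 1 - tol (a hypothetical tolerance).
    """
    Pk = P
    for k in range(1, max_steps + 1):
        if Pk[start][goal] >= 1 - tol:
            return k, Pk
        Pk = mat_mul(Pk, P)
    raise RuntimeError("P^k did not reach the goal column within max_steps")


# Transition matrix of the application on June 24th (states s1..s4, goal s4)
A = [
    [0.0, 0.7614, 0.2386, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.8255, 0.0, 0.1745],
    [0.0, 0.0, 0.0, 1.0],
]
k, Pi = steps_until_absorbed(A)
print(k, [round(row[-1], 6) for row in Pi])
```

The last column of the returned power approaches a column of ones, which is the stationary behavior described above.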

The Expected Path Length (EPL) measures the expected number of steps the attacker will need, starting from the initial state, to reach the goal state (the attacker’s objective). As discussed earlier, P has the following canonical form,

$$P = \begin{pmatrix} Q & R \\ 0 & I \end{pmatrix}$$

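Here, P is the transition matrix, Q is the matrix of transition probabilities among the transient states, R is the matrix of transition probabilities from the transient states to the absorbing states, 0 is a zero matrix, and I is the identity matrix.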

The matrix P represents the transition probability matrix of the absorbing Markov chain. In an absorbing Markov chain the probability that the chain will be absorbed is always 1. Thus, we have

$$Q^{n} \to 0 \quad \text{as } n \to \infty$$

This property implies that all the eigenvalues of Q have absolute values strictly less than 1. Thus, I − Q is an invertible matrix and there is no problem in defining the matrix

$$M = (I - Q)^{-1} = I + Q + Q^{2} + Q^{3} + \cdots$$
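To illustrate that the series converges to the fundamental matrix, the sketch below sums I + Q + Q^{2} + ... for a fixed, hypothetical number of terms, using the transient block of the June 24th matrix from the application later in this study. Because the spectral radius of this Q is about 0.91, 500 terms are far more than enough; in practice one would solve with I − Q directly rather than sum the series.

```python
def identity(n: int) -> list[list[float]]:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def fundamental_matrix_series(Q, terms=500):
    """Partial sum I + Q + Q^2 + ... + Q^(terms-1) approximating (I - Q)^{-1}."""
    n = len(Q)
    M, power = identity(n), identity(n)
    for _ in range(terms - 1):
        power = mat_mul(power, Q)
        M = [[M[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return M


# Transient block Q (states s1, s2, s3) of the June 24th matrix in the application
Q = [
    [0.0, 0.7614, 0.2386],
    [0.0, 0.0, 1.0],
    [0.0, 0.8255, 0.0],
]
M = fundamental_matrix_series(Q)
print(round(sum(M[0]), 4))  # 12.2227
```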

Using this fundamental matrix M of the absorbing Markov chain, we can compute the expected total number of steps until absorption, that is, until the goal state is reached.

Taking the summation of the first row elements of matrix M gives us the expected total number of steps to reach the goal state which is defined as the Expected Path Length.
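In practice the inverse need not be formed explicitly: the row sums of M are exactly the solution x of (I − Q)x = 1, where 1 is a vector of ones. The sketch below uses this identity with Gauss-Jordan elimination in plain Python; the function name `expected_path_length`, and the convention that states with P_{ii} = 1 are absorbing, are our own assumptions.

```python
def expected_path_length(P, start=0):
    """EPL = sum of row `start` of M = (I - Q)^{-1}, via solving (I - Q) x = 1.

    States with P[i][i] == 1 are treated as absorbing; all others are transient.
    """
    n = len(P)
    transient = [i for i in range(n) if P[i][i] != 1.0]
    m = len(transient)
    # Augmented system [I - Q | 1] restricted to the transient states
    aug = [
        [(1.0 if r == c else 0.0) - P[transient[r]][transient[c]] for c in range(m)] + [1.0]
        for r in range(m)
    ]
    for col in range(m):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("I - Q is singular: some transient state cannot reach the goal")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(m):
            if r != col:
                f = aug[r][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return aug[transient.index(start)][m]


# June 24th transition matrix of the application (states s1..s4, goal s4)
A = [
    [0.0, 0.7614, 0.2386, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.8255, 0.0, 0.1745],
    [0.0, 0.0, 0.0, 1.0],
]
print(round(expected_path_length(A), 4))  # 12.2227
```

Solving the linear system avoids forming M explicitly, which is cheaper and numerically more stable than inverting I − Q when only the row sums are needed.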

Below is an application of our proposed analytic process to a computer network system, illustrating how to estimate the EPL of a hacker.

In this section we present an example illustrating the usefulness of our method. We apply the methodology to an attack graph of a typical network with three different recorded vulnerabilities.

To illustrate the proposed analytical approach model that we have developed as discussed above, we considered the Network Topology [

The computer network consists of two service hosts, IP 1 and IP 2, and an attacker’s workstation, Attacker, which connects to each of the servers via a central router.

On server IP 1, the vulnerability is labeled CVE 2016-3230 and is denoted V_{1}.

On server IP 2, there are two recognized vulnerabilities, labeled CVE 2016-2832 and CVE 2016-0911. We denote them V_{2} and V_{3}, respectively.

We proceed to use the CVSS score of the above vulnerabilities in our analysis. The exploitability score (e (v) in

The published date is generally considered to be the date on which a vulnerability is made known to the public. The CVSS score is the score assigned to a vulnerability based on its exploitability and impact characteristics, following the Common Vulnerability Scoring System maintained by the Forum of Incident Response and Security Teams (FIRST). The calculation of this score is established and updated from time to time.

Vulnerability | Published date | CVSS score | Exploitability score | Age on 6/24/2016, t_{j} (days) | Risk factor R(v_{j}(t_{j})) |
---|---|---|---|---|---|
V_{1} (CVE 2016-3230) | 6/15/2016 | 9 (High) | 8 | 9 | 1.170 |
V_{2} (CVE 2016-2832) | 6/13/2016 | 4.3 (Medium) | 2.8 | 11 | 0.3667 |
V_{3} (CVE 2016-0911) | 6/19/2016 | 1.9 (Low) | 3.4 | 5 | 0.2474 |

The relevant details are available on the CVE Details website and other relevant official websites.

June 24th is taken as the date on which the attacker made the first attack attempt. The risk factor is therefore the risk of being exploited on June 24th, calculated using the equation presented in Section 2.2. That is,

$$R(v_j(t)) = Y(t) \times e(v_j)$$

For example, consider the vulnerability “V1 (CVE 2016-3230)”. The CVSS assigns this vulnerability an exploitability score of 8. Taking the difference between the published date (June 15th) and the attack date (June 24th), the age of this vulnerability is 9 days. Since this is a vulnerability of the “High” category, we can now use our model given in the

$$R(v_1(t)) = \left[0.191701 - 0.383521\left(\frac{1}{t}\right) - 0.00358\,\ln(\ln t)\right] \times 8$$

$$R(v_1(9)) = 1.170$$
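Written out term by term (values rounded to six decimals), the evaluation of the High-category model at t = 9 is shown below; this expansion is our own check of the arithmetic. The result, about 1.170, is the value that reproduces the transition probabilities 0.7614 = R_{1}/(R_{1} + R_{2}) and 0.8255 = R_{1}/(R_{1} + R_{3}) in the June 24th adjacency matrix of this application.

$$
\begin{aligned}
Y(9) &= 0.191701 - \frac{0.383521}{9} - 0.00358\,\ln(\ln 9) \\
     &\approx 0.191701 - 0.042613 - 0.00358 \times 0.787195 \\
     &\approx 0.191701 - 0.042613 - 0.002818 \approx 0.146270, \\
R(v_1(9)) &= Y(9) \times 8 \approx 1.170.
\end{aligned}
$$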

Similarly, Risk factors for two other vulnerabilities are also calculated and presented in the

The host centric attack graph is shown by _{3} vulnerability. The graph shows all the possible paths that the attacker can follow to reach the goal state.

Note that state IP1.1 represents vulnerability V_{1}, and states IP2.1 and IP2.2 represent vulnerabilities V_{2} and V_{3}, respectively. The attacker can reach each state by exploiting the relevant vulnerability.

In this section we illustrate the process of developing the adjacency matrix for the attack graph. The adjacency matrix is a key analytical tool in our methodology.

Let s_{1}, s_{2}, s_{3} and s_{4} represent the attack states for Attacker, (IP1.1), (IP2.1) and (IP2.2), respectively.

To find the weighted value of exploiting each vulnerability from one state to another, we divide the vulnerability’s risk factor by the sum of the risk factors of all outgoing vulnerabilities from that state.

For our attack graph, the weighted value of exploiting each vulnerability is given below.

1st row probabilities:

Weighted value of exploiting V_{1} from s_{1} to s_{2} is R_{1}/(R_{1} + R_{2}).

Weighted value of exploiting V_{2} from s_{1} to s_{3} is R_{2}/(R_{1} + R_{2}).

2nd row probabilities:

Weighted value of exploiting V_{2} from s_{2} to s_{3} is R_{2}/R_{2} = 1.

3rd row probabilities:

Weighted value of exploiting V_{1} from s_{3} to s_{2} is R_{1}/(R_{1} + R_{3}).

Weighted value of exploiting V_{3} from s_{3} to s_{4} is R_{3}/(R_{1} + R_{3}).

4th row probabilities:

Since s_{4} is the absorbing goal state, the weighted value from s_{4} to s_{4} is 1.
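The row-by-row construction above can be sketched as a small Python function that divides each outgoing risk factor by the total for its source state and gives the absorbing goal state a self-loop of 1. The edge list encodes the host-centric attack graph as described in the text (s_{1} = Attacker, s_{2} = IP1.1, s_{3} = IP2.1, s_{4} = IP2.2, with 0-based indices); the function name and data layout are hypothetical. The risk factors are the June 24th values, with R_{1} ≈ 1.170 as given by the High-category model at t = 9.

```python
def adjacency_matrix(edges: dict[int, list[tuple[int, float]]], n_states: int, goal: int):
    """Row-normalize risk-factor-weighted edges; the goal state gets a self-loop of 1.

    edges maps a source state index to a list of (target index, risk factor).
    """
    A = [[0.0] * n_states for _ in range(n_states)]
    for src, out in edges.items():
        total = sum(weight for _, weight in out)
        for dst, weight in out:
            A[src][dst] += weight / total
    A[goal][goal] = 1.0
    return A


R1, R2, R3 = 1.170, 0.3667, 0.2474  # risk factors on June 24th
edges = {
    0: [(1, R1), (2, R2)],  # s1 (Attacker) -> s2 via V1, -> s3 via V2
    1: [(2, R2)],           # s2 (IP1.1)    -> s3 via V2
    2: [(1, R1), (3, R3)],  # s3 (IP2.1)    -> s2 via V1, -> s4 via V3
}
A = adjacency_matrix(edges, n_states=4, goal=3)
for row in A:
    print([round(p, 4) for p in row])
```

Rounded to four decimals, the rows reproduce the adjacency matrix A given below.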

For the host-centric attack graph, the adjacency matrix is as follows.

Applying the information given in

$$
A = \begin{array}{c|cccc}
 & s_1 & s_2 & s_3 & s_4 \\ \hline
s_1 & 0 & 0.7614 & 0.2386 & 0 \\
s_2 & 0 & 0 & 1 & 0 \\
s_3 & 0 & 0.8255 & 0 & 0.1745 \\
s_4 & 0 & 0 & 0 & 1
\end{array}
$$

Here, 0.7614 is the probability that the attacker exploits vulnerability V_{1} in the first step, from s_{1} to s_{2}. Similarly, 0.1745 is the probability that the attacker exploits vulnerability V_{3} in the step from s_{3} to s_{4}. In general, each entry represents the likelihood of exploiting the relevant vulnerability to move from one state to another in the first attempt.
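For this particular attack graph, the EPL can also be checked by first-step analysis. Writing a = 0.7614 (s_{1} to s_{2}) and b = 0.8255 (s_{3} to s_{2}), and letting x_{i} be the expected number of steps to reach s_{4} from s_{i} (so x_{4} = 0), the equations below give a closed form that equals the sum of the first row of M; this derivation is our own illustration rather than part of the original method.

$$
\begin{aligned}
x_2 &= 1 + x_3, \qquad x_3 = 1 + b\,x_2 \;\Rightarrow\; x_3 = \frac{1+b}{1-b}, \\
x_1 &= 1 + a\,x_2 + (1-a)\,x_3 = 1 + a + \frac{1+b}{1-b}.
\end{aligned}
$$

With the entries above, x_{1} = 1 + 0.7614 + 1.8255/0.1745 ≈ 12.22, which agrees, up to rounding of the matrix entries, with the EPL of 12.2205 listed for the first day in the table of EPL values.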

We can use this matrix to answer several important questions in cybersecurity analysis. First, using the adjacency matrix, we can find the Expected Path Length. Then, we can analyze the behavior of the Expected Path Length over time.

To calculate the EPL over the time we follow the steps given below.

Step 1: Calculate the “Risk Factor” of each vulnerability on the assumed date of the first attack (June 24th in our application). That is, calculate the “age” of each vulnerability by taking the difference between its published date and June 24th, and substitute this value of “t” into the relevant model equation given in the

Step 2: Using those “Risk Factors”, develop the transition matrix “A” and calculate the EPL.

Step 3: Repeat the same process for all subsequent dates for which we need to calculate the Expected Path Length.
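Steps 1 to 3 can be combined into a single loop over dates. The sketch below assumes that `days_after` counts days after June 24th, so that `days_after = 0` uses the ages 9, 11 and 5 days from the risk factor table; the dictionaries `Y_MODELS` and `VULNS` and all function names are hypothetical encodings of the application, and the EPL is obtained by solving (I − Q)x = 1 rather than by inverting I − Q.

```python
import math

# Y(t) = a - b/t - c*ln(ln t); coefficients (a, b, c) by CVSS category
Y_MODELS = {
    "Low": (0.135441, 0.308532, 0.002030),
    "Medium": (0.169518, 0.356821, 0.007011),
    "High": (0.191701, 0.383521, 0.00358),
}

# Hypothetical encoding of the application: (category, exploitability score,
# age in days on June 24th) for V1, V2 and V3.
VULNS = {
    "V1": ("High", 8.0, 9),
    "V2": ("Medium", 2.8, 11),
    "V3": ("Low", 3.4, 5),
}


def risk(vuln: str, days_after: int) -> float:
    """Step 1: risk factor R = Y(t) * e, with t = age on June 24th + days_after."""
    category, e, age0 = VULNS[vuln]
    a, b, c = Y_MODELS[category]
    t = age0 + days_after
    return (a - b / t - c * math.log(math.log(t))) * e


def solve(M: list[list[float]], y: list[float]) -> list[float]:
    """Solve M x = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [rhs] for row, rhs in zip(M, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def epl_on_day(days_after: int) -> float:
    """Steps 1-3 for one date: risk factors -> transition matrix -> EPL."""
    r1, r2, r3 = (risk(v, days_after) for v in ("V1", "V2", "V3"))
    a = r1 / (r1 + r2)  # s1 -> s2 (V1); s1 -> s3 (V2) gets 1 - a
    b = r1 / (r1 + r3)  # s3 -> s2 (V1); s3 -> s4 (V3) gets 1 - b
    # Transient block Q over (s1, s2, s3); s4 is the absorbing goal state.
    Q = [[0.0, a, 1.0 - a], [0.0, 0.0, 1.0], [0.0, b, 0.0]]
    I_minus_Q = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(3)] for i in range(3)]
    # Row sums of M = (I - Q)^{-1} equal the solution x of (I - Q) x = 1.
    x = solve(I_minus_Q, [1.0, 1.0, 1.0])
    return x[0]


for day in range(0, 100, 10):
    print(day, round(epl_on_day(day), 4))
```

Under this convention, `epl_on_day(0)` gives about 12.2205 and the values decrease steadily with `days_after`, in line with the table of EPL values.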

From

For example, consider the 20th day. Under Step 1, we calculate the risk factors for V_{1}, V_{2} and V_{3}. On the 20th day, the ages of the three vulnerabilities V_{1}, V_{2} and V_{3} are t_{1} = 9 + 20, t_{2} = 11 + 20 and t_{3} = 5 + 20, respectively. Then, by substituting these ages into the respective model equation from the

V_{1} is a vulnerability of the “High” category. Therefore, we use the 3rd model equation from

Substituting t = 29 into the model

$$R(v_1(t)) = Y(t) \times e(v_1)$$

we obtain,

$$R_1 = \left[0.191701 - 0.383521 \times (1/29) - 0.00358\,\ln(\ln 29)\right] \times 8 = 1.393$$

Similarly for V_{2} and V_{3} we obtain the following Risk factors calculated using the relevant model equations.

For t = 31,

$$R(v_2(t)) = Y(t) \times e(v_2)$$

$$R_2 = \left[0.169518 - 0.356821 \times (1/31) - 0.007011\,\ln(\ln 31)\right] \times 2.8 = 0.4182$$

For t = 25,

$$R(v_3(t)) = Y(t) \times e(v_3)$$

$$R_3 = \left[0.135441 - 0.308532 \times (1/25) - 0.002030\,\ln(\ln 25)\right] \times 3.4 = 0.4105$$

Once we have calculated the “Risk Factors” for all the vulnerabilities in the network system, the second step is to develop the Transition Matrix “A” as given in the

The transition probability matrix for this system on the 20th day after the assumed first attack attempt is given below.

$$
A = \begin{array}{c|cccc}
 & s_1 & s_2 & s_3 & s_4 \\ \hline
s_1 & 0 & 0.7691 & 0.2309 & 0 \\
s_2 & 0 & 0 & 1 & 0 \\
s_3 & 0 & 0.7724 & 0 & 0.2276 \\
s_4 & 0 & 0 & 0 & 1
\end{array}
$$

The final step is to calculate the EPL. Applying the methodology explained in Section 2.3.2, we can calculate the EPL from the transition matrix “A” by obtaining the matrix “M”.

The sum of the first row of matrix “M” is the EPL of this computer network system on the 20th day after the first assumed attack attempt (June 24th). We obtain EPL = 9.567 for the 20th day after the first attack, as given in the

Expected Path length

The

Examining the distribution of the attacker's Expected Path Length over 100 days shows that an attacker needs fewer steps to compromise the security goal as the age of the vulnerabilities increases. Security practitioners in a typical organization can establish a threshold score for the system, and security teams can plan in advance, identify the critical points, establish a strategy to defend the computer system, and introduce relevant patches before the system reaches such critical stages.

For the present system, the threshold EPL is approximately 9.5 steps, and the defending professionals can conclude that the system in their network is relatively safe from exploits only for the next 21 days, while the EPL remains above the threshold value.
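A practitioner's threshold check can be sketched as a scan over the EPL values for the first age at which the EPL drops below the chosen threshold. The function name, the threshold value and the subset of table rows used below are illustrative assumptions; with only a few rows, the answer is only as precise as the ages included.

```python
from typing import Optional


def first_day_below(epl_by_age: dict[int, float], threshold: float) -> Optional[int]:
    """Earliest age (day) whose EPL falls below the threshold, or None."""
    for age in sorted(epl_by_age):
        if epl_by_age[age] < threshold:
            return age
    return None


# A few (age in days, EPL) pairs copied from the table of EPL values
epl_table = {1: 12.2205398, 10: 9.8031388, 20: 9.5671025, 21: 9.5566222, 30: 9.49665}
threshold = 9.56  # hypothetical practitioner-chosen value
print(first_day_below(epl_table, threshold))
```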

It is also clear that any existing vulnerability creates a threat to the computer system, and the risk of exploitation increases the longer it remains unpatched. In other words, for a particular network system, a higher Expected Path Length for an attacker to reach a goal state represents more difficulty for the hacker, and it is reasonable to assume that the attacker has to face more defensive measures with a higher Expected Path Length than with a smaller one. Now, using the probabilistic models that we have developed in our previous studies based on the Vulnerability Life Cycle approach [

Age (Days) | Expected Path Length | Age (Days) | Expected Path Length | Age (Days) | Expected Path Length | Age (Days) | Expected Path Length |
---|---|---|---|---|---|---|---|
1 | 12.2205398 | 26 | 9.517537 | 51 | 9.4453151 | 76 | 9.4239414 |
2 | 11.3052188 | 27 | 9.511655 | 52 | 9.4440137 | 77 | 9.4234008 |
3 | 10.7998722 | 28 | 9.506248 | 53 | 9.4427673 | 78 | 9.4228753 |
4 | 10.4850373 | 29 | 9.501261 | 54 | 9.4415727 | 79 | 9.4223643 |
5 | 10.2729754 | 30 | 9.49665 | 55 | 9.4404267 | 80 | 9.4218672 |
6 | 10.1220591 | 31 | 9.492376 | 56 | 9.4393265 | 81 | 9.4213834 |
7 | 10.0101501 | 32 | 9.488404 | 57 | 9.4382695 | 82 | 9.4209123 |
8 | 9.9244658 | 33 | 9.484705 | 58 | 9.4372532 | 83 | 9.4204536 |
9 | 9.8571518 | 34 | 9.481253 | 59 | 9.4362752 | 84 | 9.4200067 |
10 | 9.8031388 | 35 | 9.478024 | 60 | 9.4353336 | 85 | 9.4195711 |
11 | 9.7590231 | 36 | 9.474999 | 61 | 9.4344263 | 86 | 9.4191465 |
12 | 9.7224429 | 37 | 9.472158 | 62 | 9.4335516 | 87 | 9.4187324 |
13 | 9.6917134 | 38 | 9.469488 | 63 | 9.4327076 | 88 | 9.4183284 |
14 | 9.6656039 | 39 | 9.466972 | 64 | 9.4318929 | 89 | 9.4179342 |
15 | 9.643197 | 40 | 9.464599 | 65 | 9.431106 | 90 | 9.4175493 |
16 | 9.6237964 | 41 | 9.462358 | 66 | 9.4303454 | 91 | 9.4171736 |
17 | 9.606865 | 42 | 9.460237 | 67 | 9.4296099 | 92 | 9.4168066 |
18 | 9.5919829 | 43 | 9.458227 | 68 | 9.4288983 | 93 | 9.416448 |
19 | 9.5788176 | 44 | 9.456321 | 69 | 9.4282094 | 94 | 9.4160975 |
20 | 9.5671025 | 45 | 9.454511 | 70 | 9.4275421 | 95 | 9.415755 |
21 | 9.5566222 | 46 | 9.452789 | 71 | 9.4268956 | 96 | 9.41542 |
22 | 9.5472004 | 47 | 9.45115 | 72 | 9.4262687 | 97 | 9.4150924 |
23 | 9.5386921 | 48 | 9.449588 | 73 | 9.4256606 | 98 | 9.4147719 |
24 | 9.5309766 | 49 | 9.448098 | 74 | 9.4250706 | 99 | 9.4144583 |
25 | 9.5239531 | 50 | 9.446675 | 75 | 9.4244978 | 100 | 9.4141513 |

In the present study, we have developed a nonhomogeneous stochastic model for predicting the Expected Path Length (EPL) of a computer network system with a given set of vulnerabilities at time “t”.

Knowing the EPL as a function of time is extremely important in developing defensive strategies against exploitation. Such strategies will reduce the likelihood of the computer network system being hacked.

By observing the behavior of the EPL over time, it is possible to identify the time ranges in which the EPL reaches a minimum. A small EPL implies a higher chance for a hacker to be successful. In other words, a computer network system is more vulnerable to exploitation on the days when the EPL is smallest; at such times “t”, the vulnerabilities and the system are more susceptible to being hacked. The same scenario can be explained from an attacker’s point of view: on the days when the EPL is at its smallest, the likelihood of a successful attack attempt is higher. Therefore, an attacker (hacker) who identifies the set of vulnerabilities in a given computer system would concentrate more attempts on exploiting the system on the dates when the EPL is at its smallest. This means that we can use this method to predict the time of an attack (hack).

By knowing this time for any computer network system, security engineers or IT architects can take the necessary actions in advance to protect their computer system.

Finally, we have developed our methodology based on a typical computer network system that exists in a real-world situation with given vulnerabilities, identifying the EPL and the time at which the subject computer system could be exploited. Thus, industry can apply the developed methodology to their own computer network systems with given (known) vulnerabilities to predict the EPL and the most probable time of being exploited.

Kaluarachchi, P.K., Tsokos, C.P. and Rajasooriya, S.M. (2018) Non-Homogeneous Stochastic Model for Cyber Security Predictions. Journal of Information Security, 9, 12-24. https://doi.org/10.4236/jis.2018.91002