
Spatial autocorrelation is a measure of the correlation of an observation with other observations through space. Most statistical analyses assume that the values of observations are independent of one another. Spatial autocorrelation violates this assumption because observations at nearby locations are related to each other. The consideration of spatial autocorrelation has therefore been gaining attention in crash data modeling in recent years, and research has shown that ignoring it may lead to biased estimates of the model parameters. This paper examines two spatial autocorrelation indices, Moran’s Index and the Getis-Ord G_{i}^{*} statistic, to measure the spatial autocorrelation of vehicle crashes that occurred on Boone County roads in the state of Missouri, USA, for the years 2013-2015. Since each index can identify different clustering patterns of crashes, this paper introduces a new hybrid method that identifies crash clustering patterns by combining Moran’s Index and the G_{i}^{*} statistic. Results show that the new method can effectively improve the identification of the number, extent, and type of crash clusters along roadways.

In many vehicle crash datasets, geographic relationships among crashes can exist. This phenomenon is termed spatial autocorrelation, which is a measure of the correlation of a crash with other crashes through space. Most statistical analyses assume that the values of observations in each sample are independent of one another. Spatial autocorrelation violates this assumption because samples taken from nearby locations are related to each other and hence are not statistically independent of one another [

Spatial autocorrelation can be positive or negative among observations. Positive spatial autocorrelation occurs when observations having similar values are closer (i.e. clustered) to one another, and negative spatial autocorrelation occurs when observations having dissimilar values occur near one another [

The differences between the network autocorrelation and spatial autocorrelation were examined [

There are many indices or statistics that attempt to measure spatial autocorrelation for count data, such as Moran’s index (also called Moran’s I), Geary’s C, and the Getis-Ord G statistic. These indices can be computed as Global or Local measures depending on the scope of the analysis. Global spatial autocorrelation measures the overall spatial autocorrelation of the entire study area, providing a single measurement for the entire dataset. Local spatial autocorrelation measures the spatial autocorrelation of individual features and identifies the spatial patterns across the study area by considering the relationships between individual features. Indices of spatial autocorrelation are based on the general index of matrix association (i.e. the Gamma Γ index). The Global Gamma index consists of the sum of the cross products of the elements a_{ij} and b_{ij} in two matrices of similarity, using spatial similarity in one matrix and value similarity in the other matrix, such that:

$$\Gamma = \sum_{i} \sum_{j} a_{ij} b_{ij}$$

Using different value similarity would result in different indices. For example, setting

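Moran’s I statistic is one of the oldest indices of spatial autocorrelation and can be used to test for global and local spatial autocorrelation among continuous data. For any continuous variable, x_{i}, a mean, \bar{x}, can be calculated, and the global Moran’s I is computed from the cross products of the deviations from that mean:

$$I = \frac{n}{S_{0}} \cdot \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_{i} - \bar{x})(x_{j} - \bar{x})}{\sum_{i=1}^{n} (x_{i} - \bar{x})^{2}}$$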

where,

x_{i}: the value of variable x at location i;

x_{j}: the value of variable x at location j;

w_{ij}: the elements of the weight matrix;

n: number of observations;

S_{0}: the sum of the elements of the weight matrix:

$$S_{0} = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}$$
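As a minimal sketch of how these quantities fit together, the Python snippet below computes the global Moran’s I from a vector of values and a weight matrix, assuming the standard form I = (n / S_{0}) Σ_{i} Σ_{j} w_{ij} (x_{i} − x̄)(x_{j} − x̄) / Σ_{i} (x_{i} − x̄)^2. The function names, the four-segment contiguity matrix, and the crash counts are hypothetical and serve only as an illustration; they are not taken from the Boone County data.

```python
import numpy as np

def global_morans_i(x, w):
    """Global Moran's I for values x (length n) and an n-by-n weight matrix w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x.size
    dev = x - x.mean()                       # deviations from the mean
    s0 = w.sum()                             # S_0: sum of all weights
    cross = (w * np.outer(dev, dev)).sum()   # sum_i sum_j w_ij * dev_i * dev_j
    return (n / s0) * cross / (dev ** 2).sum()

def expected_morans_i(n):
    """Expected value of I under no spatial autocorrelation: -1 / (n - 1)."""
    return -1.0 / (n - 1)

# Hypothetical data: four road segments in a row; adjacent segments are neighbours.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
clustered = [10, 9, 2, 1]      # similar counts sit next to each other
alternating = [10, 1, 10, 1]   # dissimilar counts sit next to each other

print(global_morans_i(clustered, w))     # above E(I): positive autocorrelation
print(global_morans_i(alternating, w))   # below E(I): negative autocorrelation
print(expected_morans_i(len(clustered)))
```

With a contiguity matrix like this one, only adjacent pairs contribute to the numerator, so the sign of I reflects whether neighbouring segments deviate from the mean in the same direction.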

The local Moran’s I for location i can be calculated as follows:
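One widely used form, assumed here to be the one applied by the ArcGIS Anselin Local Moran’s I tool, is I_{i} = ((x_{i} − x̄) / S_{i}^2) Σ_{j≠i} w_{ij} (x_{j} − x̄), with S_{i}^2 = Σ_{j≠i} (x_{j} − x̄)^2 / (n − 1). The Python sketch below implements that assumed form; the function name and the example data are hypothetical.

```python
import numpy as np

def local_morans_i(x, w):
    """Local Moran's I for each location, excluding each location's own value."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float).copy()
    np.fill_diagonal(w, 0.0)                 # enforce j != i
    n = x.size
    dev = x - x.mean()
    # S_i^2: sum of squared deviations of all other locations, divided by (n - 1)
    s2 = ((dev ** 2).sum() - dev ** 2) / (n - 1)
    lag = w @ dev                            # sum_{j != i} w_ij * (x_j - mean)
    return dev / s2 * lag

# Hypothetical data: four road segments in a row; adjacent segments are neighbours.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
counts = [10, 9, 2, 1]
local_i = local_morans_i(counts, w)
print(local_i)   # positive where a segment and its neighbours deviate the same way
```

A positive I_{i} means the location and its neighbours lie on the same side of the mean (a cluster), while a negative I_{i} means they lie on opposite sides (an outlier).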

Values for this index typically range from −1.0 to +1.0, where a value of −1.0 indicates negative spatial autocorrelation, and a value of +1.0 indicates positive spatial autocorrelation. When nearby points have similar values, their cross product is high. Conversely, when nearby points have dissimilar values, their cross product is low. The expectation of Moran’s I statistic is:

$$E(I) = -\frac{1}{n - 1}$$

When a Moran’s I value is larger than E(I), this indicates positive spatial autocorrelation, and when a Moran’s I value is less than E(I), this indicates negative spatial autocorrelation. In Moran’s initial formulation, the weight variable, w_{ij}, was a contiguity matrix. Therefore, if zone j is adjacent to zone i, the product receives a weight of 1.0; otherwise, the product receives a weight of 0.0. A study [_{ij}, is a distance-based weight which is the inverse distance between locations i and j (1/d_{ij}). The z-score of Moran’s I can be computed as follows:

$$Z(I) = \frac{I - E(I)}{\sqrt{V(I)}}$$

where E(I) is the expected value of I, and V(I) is the variance of I, as shown in Equation (7):

$$V(I) = E(I^{2}) - [E(I)]^{2} \tag{7}$$

The Getis-Ord G statistic is calculated with respect to a specified threshold distance (defined by the user) rather than to an inverse distance, as with the Moran’s I [_{i} statistic is an indicator for local spatial autocorrelation for each data point. The Global G statistic can be calculated as follows:

$$G(d) = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\, x_{i} x_{j}}{\sum_{i=1}^{n} \sum_{j=1}^{n} x_{i} x_{j}}, \quad j \neq i$$

where,

x_{i}: the value of variable x at location i;

x_{j}: the value of variable x at location j;

w_{ij}: the elements of the weight matrix.
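The following Python sketch illustrates the Global G computation under the assumption that it takes the standard ratio form, G = Σ_{i} Σ_{j} w_{ij} x_{i} x_{j} / Σ_{i} Σ_{j} x_{i} x_{j} with j ≠ i. The function name, the binary neighbour matrix, and the crash counts are hypothetical.

```python
import numpy as np

def general_g(x, w):
    """Getis-Ord General G: weighted cross products over all cross products (j != i)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float).copy()
    cross = np.outer(x, x)          # x_i * x_j for every pair
    np.fill_diagonal(w, 0.0)        # exclude pairs with j == i
    np.fill_diagonal(cross, 0.0)
    return (w * cross).sum() / cross.sum()

# Hypothetical binary weights: 1.0 if two segments are within the threshold distance.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
clustered = [10, 9, 2, 1]    # the two highest counts are neighbours
scattered = [10, 1, 9, 2]    # the two highest counts are not neighbours

print(general_g(clustered, w))
print(general_g(scattered, w))
```

Because the weights are 0.0 or 1.0 and the values are non-negative counts, the numerator is a subset of the denominator, which is why the statistic stays between 0.0 and 1.0. It grows when high values fall within the threshold distance of one another.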

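There are two types of local G statistics, although the two types produce almost identical results. The first, G_{i}, does not include the autocorrelation of a zone with itself, whereas the second, G_{i}^{*}, does (i.e. the G_{i} statistic does not include the value of x_{i} itself, but only the neighborhood values, whereas G_{i}^{*} includes x_{i} as well as the neighborhood values). Both can be computed by the formulae:

$$G_{i}(d) = \frac{\sum_{j \neq i} w_{ij}(d)\, x_{j}}{\sum_{j \neq i} x_{j}}, \qquad G_{i}^{*}(d) = \frac{\sum_{j} w_{ij}(d)\, x_{j}}{\sum_{j} x_{j}}$$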

where d is the neighborhood (threshold) distance, and w_{ij} is the weight matrix that has only 1.0 or 0.0 values: 1.0 if j is within distance d of i, and 0.0 if it is beyond that distance. These formulae indicate that the cross-product of the value of x at location i and at another location j is weighted by a distance weight, w_{ij}, which is 1.0 if the two locations are no farther apart than the threshold distance, d, and 0.0 otherwise. The G statistic can vary between 0.0 and 1.0. The statistical significance of the local autocorrelation between each point and its neighbors is assessed by the z-score test and the p-value. ArcGIS uses the following formulae to calculate the local Getis-Ord G_{i}^{*} statistic:

$$G_{i}^{*} = \frac{\sum_{j=1}^{n} w_{ij} x_{j} - \bar{x} \sum_{j=1}^{n} w_{ij}}{S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{ij}^{2} - \left(\sum_{j=1}^{n} w_{ij}\right)^{2}}{n - 1}}}$$

$$\bar{x} = \frac{\sum_{j=1}^{n} x_{j}}{n}, \qquad S = \sqrt{\frac{\sum_{j=1}^{n} x_{j}^{2}}{n} - \bar{x}^{2}}$$

where,

x_{i}: the value of variable x at location i;

x_{j}: the value of variable x at location j;

w_{ij}: the elements of the weight matrix;

n: number of observations.
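As an illustration, the Python sketch below computes a local G_{i}^{*} value for every location, assuming the standardized form documented for the ArcGIS Hot Spot Analysis tool: the weighted local sum minus its expectation, divided by its standard deviation. The function name and the example data are hypothetical, and each location is treated as its own neighbour (w_{ii} = 1.0), as the G_{i}^{*} variant requires.

```python
import numpy as np

def getis_ord_gi_star(x, w):
    """Standardized Getis-Ord Gi* for each location (w must include self-weights)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x.size
    xbar = x.mean()
    s = np.sqrt((x ** 2).mean() - xbar ** 2)     # S in the formula
    wsum = w.sum(axis=1)                         # sum_j w_ij
    w2sum = (w ** 2).sum(axis=1)                 # sum_j w_ij^2
    numerator = w @ x - xbar * wsum
    denominator = s * np.sqrt((n * w2sum - wsum ** 2) / (n - 1))
    return numerator / denominator

# Hypothetical data: four segments in a row, binary neighbours plus self-weights.
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
w_star = adjacency + np.eye(4)
counts = [10, 9, 2, 1]
gi_star = getis_ord_gi_star(counts, w_star)
print(gi_star)   # positive near the high counts, negative near the low counts
```

Note that the denominator becomes zero for a location whose neighbourhood covers every observation, so a practical implementation would need to guard against that case.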

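The expected G value for a threshold distance, d, is defined as:

$$E[G(d)] = \frac{W}{n(n - 1)}$$

where W is the sum of weights for all pairs of locations (W = Σ_{i} Σ_{j} w_{ij}, j ≠ i) and n is the number of observations.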

The standard error of G(d) is the square root of the variance of G. Therefore, a z-test can be computed by:

$$Z[G(d)] = \frac{G(d) - E[G(d)]}{\mathrm{SE}[G(d)]}$$

The crash clustering patterns (i.e. type of concentration of crashes) and their statistical significance are evaluated based on the output z-scores, the corresponding p-values, and the confidence level. These determine whether a crash is classified as having a significant high spatial autocorrelation (denoted by High-High, HH), a significant low spatial autocorrelation (denoted by Low-Low, LL), a significant dispersed outlier (either a high value surrounded by low values, denoted by HL, or a low value surrounded by high values, denoted by LH), or as an insignificant random crash. A high positive z-score for a crash point indicates a significant spatial autocorrelation (either with high values, HH, or with low values, LL). A low negative z-score for a crash point indicates a statistically significant spatial outlier (either high-low, HL, or low-high, LH). A z-score close to zero indicates that the crash is randomly and independently distributed in space. To determine whether a z-score is statistically significant, it should be compared to a range of values for a particular confidence level. For example, at a confidence level of 95%, a z-score would have to be less than −1.96 or greater than +1.96 to be statistically significant. Typical confidence levels are 90%, 95%, or 99%.
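The comparison of a z-score against these critical values can be sketched in a few lines of Python. The snippet assumes a two-sided test under the standard normal distribution with the usual critical values of 1.65, 1.96, and 2.58; the function names are hypothetical.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def confidence_level(z):
    """Highest typical confidence level at which the z-score is significant."""
    magnitude = abs(z)
    if magnitude > 2.58:
        return "99%"
    if magnitude > 1.96:
        return "95%"
    if magnitude > 1.65:
        return "90%"
    return "not significant"

for z in (0.5, 1.7, -2.0, 3.1):
    print(z, round(two_sided_p_value(z), 4), confidence_level(z))
```

The sign of the z-score is ignored here because significance depends only on its magnitude; the sign is what separates clusters from outliers in the interpretation above.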

To illustrate the analysis framework presented in this paper, crash data from Boone County, Missouri, USA, for the years 2013-2015 are used. Missouri crash data are reported by the Missouri State Highway Patrol (MSHP) and recorded in the Missouri Statewide Traffic Accident Records System (STARS). A total of 6886 crashes were observed along roads in Boone County during the three years 2013-2015.

In this paper, ArcGIS 10.3.1 is used to compute the Moran’s I and Getis-Ord G_{i}^{*} statistics through the following steps:

・ Spatially join the attributes of crash incidents to road segments based on their location (i.e. latitude/longitude), using GIS functionality that divides roads into consistent analysis units and matches the two feature sets according to their relative spatial locations;

・ Build a network of roads from the crash attributed road segments;

・ Generate spatial weights matrix for the network arcs;

・ Compute the Global Moran’s I available in the ArcMap 10.3.1 Spatial Statistics toolkit;

・ Compute the Global (General) G_{i} statistic available in the ArcMap 10.3.1 Spatial Statistics toolkit;

・ Compute Anselin local Moran’s I available in the ArcMap 10.3.1 Spatial Statistics toolkit;

・ Compute the local Getis-Ord G_{i}^{*} statistic available in the ArcMap 10.3.1 Spatial Statistics toolkit.
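The spatial weights step in the list above can be illustrated outside ArcGIS with a short Python sketch. It builds the two kinds of weight matrix described earlier: a binary threshold-distance matrix (1.0 within distance d, 0.0 beyond it) and an inverse-distance matrix (1/d_{ij}). As a simplifying assumption, it uses straight-line distances between hypothetical point coordinates rather than distances along the road network, and the function names are illustrative only.

```python
import numpy as np

def pairwise_distances(coords):
    """Euclidean distance between every pair of points (rows of coords)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def distance_band_weights(coords, d):
    """Binary weights: 1.0 if two distinct points are within distance d, else 0.0."""
    w = (pairwise_distances(coords) <= d).astype(float)
    np.fill_diagonal(w, 0.0)        # a point is not its own neighbour here
    return w

def inverse_distance_weights(coords):
    """Inverse-distance weights 1/d_ij; zero-distance pairs (incl. diagonal) get 0.0."""
    dist = pairwise_distances(coords)
    w = np.zeros_like(dist)
    positive = dist > 0
    w[positive] = 1.0 / dist[positive]
    return w

# Hypothetical crash locations along a straight road (arbitrary units).
points = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
print(distance_band_weights(points, d=1.5))
print(inverse_distance_weights(points))
```

For the G_{i}^{*} variant, the diagonal of the binary matrix would be set to 1.0 instead, so that each location counts as part of its own neighbourhood.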

z-score | p-value | Confidence level |
---|---|---|
z-score < −1.65 or z-score > +1.65 | <0.10 | 90% |
z-score < −1.96 or z-score > +1.96 | <0.05 | 95% |
z-score < −2.58 or z-score > +2.58 | <0.01 | 99% |

Statistically significant high spatial autocorrelation locations (HH) will have a high z-value and be surrounded by other crashes with high z-values as well. Statistically significant low spatial autocorrelation locations (LL) will be found where a crash point has a low z-value and is surrounded by other crashes with low z-values as well. If the z-value of a particular crash location is higher than the mean z-value of all crashes, then it is considered high. If the z-value of a particular crash point is lower than the mean z-value of all crashes, then it is considered low. The resultant z-scores and p-values indicate whether crashes with either high or low z-values are clustered. A high z-score and small p-value for a crash point indicate a spatial clustering of high values (i.e. HH). A low z-score and small p-value indicate a spatial clustering of low values (i.e. LL). The higher (or lower) the z-score, the more intense the clustering. A negative z-score for a crash point indicates an outlier (i.e. a dispersed crash). A z-score near zero indicates no apparent spatial clustering (i.e. a random crash). Both the Anselin local Moran’s I and the local

Since the Anselin Moran’s I and the

Crashes that occurred along the roads in Boone County, Missouri (2013-2015) are analyzed to assess whether they are spatially clustered, dispersed, or random. The Global Moran’s I and the Global (General)

The results of the analysis are interpreted within the context of the null hypothesis, which states that the crashes that occurred on Boone County roads (2013-2015) are randomly distributed in the study area (i.e. no global spatial autocorrelation exists for the entire area). Since the p-values in

Index type | Index value | z-score | p-value |
---|---|---|---|
Global Moran’s I | 0.472 | 4.856 | 0.0000 |
Global (General) G | 0.255 | 2.817 | 0.0000 |

Index | High-High HH | Low-Low LL | Outliers HL | Outliers LH | Random | Total crashes |
---|---|---|---|---|---|---|
Anselin Moran’s I | 2411 | 3544 | 73 | 38 | 820 | 6886 |
Getis-Ord G_{i}^{*} | 2916 | 3265 | 0 | 0 | 705 | 6886 |

Using the new hybrid method by combining 30% Moran’s I, and 70%

Cluster #2 now presents insignificant random crashes compared to mostly LLs in

From

Index | High-High HH | Low-Low LL | Outliers HL | Outliers LH | Random | Total crashes |
---|---|---|---|---|---|---|
Anselin Moran’s I | 2411 | 3544 | 73 | 38 | 820 | 6886 |
Getis-Ord G_{i}^{*} | 2916 | 3265 | 0 | 0 | 705 | 6886 |
Hybrid | 1841 | 2417 | 0 | 0 | 2628 | 6886 |

In many vehicle crash datasets, locational relationships among crashes can exist, given that movement is confined to roadways which are traversed by many users. This phenomenon is termed spatial autocorrelation and, if not appropriately accounted for, can lead to incorrect parameter estimates in the modeling process. This paper examined two spatial autocorrelation indices: Moran’s I; and Getis-Ord

Abdulhafedh, A. (2017) A Novel Hybrid Method for Measuring the Spatial Autocorrelation of Vehicular Crashes: Combining Moran’s Index and Getis-Ord Statistic. Open Journal of Civil Engineering, 7, 208-221. https://doi.org/10.4236/ojce.2017.72013