Vol.3, No.12, 712-731 (2011)
doi:10.4236/health.2011.312120
Copyright © 2011 SciRes. Openly accessible at http://www.scirp.org/journal/HEALTH/
Health
Spatial autocorrelation analysis of 13 leading malignant
neoplasms in Taiwan: a comparison between the
1995-1998 and 2005-2008 periods
Pui-Jen Tsai1*, Cheng-Hwang Perng2
1Center for General Education, Aletheia University, New Taipei, Taiwan; *Corresponding Author: puijentsai@gmail.com
2Department of Statistics and Actuarial Science, Aletheia University, New Taipei, Taiwan.
Received 23 September 2011; revised 10 November 2011; accepted 21 November 2011.
ABSTRACT
Spatial autocorrelation methodologies, includ-
ing Global Moran’s I and Local Indicators of
Spatial Association statistic (LISA), were used
to describe and map spatial clusters of 13
leading malignant neoplasms in Taiwan. A lo-
gistic regression fit model was also used to
identify similar characteristics over time. Two
time periods (1995-1998 and 2005-2008) were
compared in an attempt to formulate common
spatio-temporal risks. Spatial cluster patterns
were identified using local spatial autocorrela-
tion analysis. We found a significant spatio-
temporal variation between the leading malig-
nant neoplasms and well-documented spatial
risk factors. For instance, in Taiwan, cancer of
the oral cavity in males was found to be clus-
tered in locations in central Taiwan, with distinct
differences between the two time periods. Stomach
cancer morbidity clustered in aboriginal
townships, where the prevalence of Helicobacter
pylori is high, and quite marked differences
between the two time periods were also found. A
method which combines LISA statistics and
logistic regression is an effective tool for the
detection of space-time patterns with discontinuous
data. Spatio-temporal mapping comparison
helps to clarify issues such as the spatial
aspects of the two time periods for leading
malignant neoplasms. This helps planners to
assess spatio-temporal risk factors, and to as-
certain what would be the most advantageous
types of health care policies for the planning
and implementation of health care services.
These issues can greatly affect the performance
and effectiveness of health care services and
also provide a clear outline for helping us to
better understand the results in depth.
Keywords: Spatial Autocorrelation Analysis; Global
Moran’s I Statistic; Local Indicators of Spatial
Association Statistic; Logistic Regression;
Malignant Neoplasm; Taiwan
1. INTRODUCTION
Spatial analytical techniques and models can identify
spatial anomalies in the epidemiology of diseases, iden-
tify “hot spots” and locate spatio-temporal patterns.
Cluster mapping clarifies issues of internal and external
correlations, while logistic regression is a useful ap-
proach for the differentiation of spatial distribution pat-
terns over time. Common spatial techniques for health
research include: disease mapping, clustering techniques,
diffusion studies, identification of risk factors through
comparisons, and regression analyses [1]. All of these
methods are useful when assessing risk factors. They
also facilitate the planning of health care policies and
support the implementation of effective health care ser-
vices.
Cuzick and Edwards (1990) [2] proposed three general
methodologies for the detection of clustering. Spatial
autocorrelation statistics, such as Moran’s I [3-6] and
Geary’s C [3-5] are global methods used to estimate the
overall degree of spatial autocorrelation in a dataset.
However, the possibility of spatial heterogeneity sug-
gests that the estimated degree of autocorrelation may
vary significantly. Local spatial autocorrelation statistics
provide estimates disaggregated to the unit level, allow-
ing the assessment of dependency relationships in dif-
ferent areas. LISA detect local spatial autocorrelation in
aggregated data by dividing Moran’s I statistic into con-
tributions for each area within a study region. These in-
dicators can detect clusters of similar or dissimilar dis-
ease frequency values around a given observation [7].
Unlike Moran’s I statistic, which measures the correla-
tion between attribute values in adjacent areas, the Gi(d)
local statistic is an indicator of local clustering that
measures the “concentration” of a spatially distributed
attribute variable [8,9].
The analysis of spatio-temporal change is a major
concern in geographical research. Analytical approaches
include: the Knox test [10], Mantel’s Z statistic [11], the
Jacquez k nearest neighbor test [12], Kulldorff’s spatial
scan statistic [13-15] and Bayesian spatial scan statistic
[16]. Herein, we are primarily interested in detecting
clusters that emerge over time, and our goal is to detect
emerging clusters as early as possible. For example, in
the public health domain, our goal is to detect emerging
clusters of disease indicative of naturally occurring dis-
ease outbreaks (such as influenza), bioterrorist attacks
(such as anthrax release), or environmental hazards
(such as a radiation leak). Clearly, the early detection of
such clusters would contribute to a more rapid response,
leading to lives being saved.
Cancer is a chronic disease with a multi-stage progression.
Many studies examine cancer incidence at different
times, under different environmental exposures
and in different ethnic groups. Cancer incidence changes
over time for people of different ages, which may be due
to variations in lifestyle, changing environmental expo-
sure, etc. Cancer incidence also varies in different geo-
graphic locations [17-20]. Again, this may have various
explanations with environmental impact being a strong
possibility.
The detection of spatio-temporal clustering generally
requires continuous data. Discontinuous data, with dif-
ferent durations of disease surveillance at the same loca-
tion, present a challenge. This study focuses on the use
of a set of discontinuous data to detect changes in
spatio-temporal clustering. We propose herein a method for
ascertaining spatial clustering associated with the 13
leading malignant neoplasms, based on medical-care
data collected by the Taiwan National Health Insurance
and Taiwan Cancer Registry agencies. To test this ap-
proach, we have compared local clusters between two
periods (1995-1998 and 2005-2008) looking for simi-
larities. We have also investigated potential spatial risks
that could contribute to these health care events, rede-
fining epidemiologic and spatially referenced data.
2. MATERIALS AND METHODS
2.1. Study Area
The study area included the main island of Taiwan
(excluding all surrounding islets) which, in the year
2000, comprised more than 22 million inhabitants living
in an area of 36,000 km². A total of 350 local administrative
government areas, including five main urban areas,
two secondary urban areas, 162 rural townships, and 54
aboriginal townships on the plain and in mountainous
regions, were assessed (Figure 1). According to a 2002
Ministry of Interior report, urban areas are classified as
regions having at least one metropolitan centre, and they
can include neighboring cities and townships that share
socio-economic activities. Main urban areas are defined
as those with a population larger than one million, specifically,
Taipei-Keelung, Kaohsiung, Taichung-Changhua,
Jhongli-Taoyuan and Tainan. Secondary urban areas are
defined as those with a residential population ranging
from 0.3 to 1 million (e.g. Hsinchu and Chiayi).
2.2. Data Collection and Management
The Taiwan National Health Insurance (NHI) program
was initiated in 1995. The coverage rate of the program
increased from 92.4% in 1995 to more than 96.2% in
2000, increasing to 98% after the inclusion of those ac-
tive in the military forces in 2001. Once the NHI medi-
cal care data were properly collected and analyzed, a
complete picture of population behaviors according to
disease could be used for reference in the calculation of
prevalence and incidence of various diseases.
At the beginning of 2004, NHI data that was available
relative to medical care, such as the leading causes of
death, were reclassified and reprocessed in relation to
smaller units or areas (for example, precincts or town-
ships rather than the country as a whole). In addition,
regional data from the statistical analysis system (SAS)
program are now announced publicly by the NHI in
regular annual reports (for example, NHI, 2005-2008
[21-24]). These reports provide an accurate and reliable
data source for the investigation of health care issues in
Taiwan.
Figure 1. Map of urban areas and aboriginal townships in the
study area. Map of the study area divided into 350 administrative
districts including seven urban areas and an integrated area
of 54 plains and mountain aboriginal townships.
Data were collected from contractual medical-care institutions,
where the NHI covers the costs of prescription
medicines and treatment at outpatient clinics. Such fa-
cilities accumulate detailed databases on medical costs
for inpatient care. Outpatient cases were
classified in relation to disease codes, as defined in the
1975 edition of “The International Classification of Dis-
eases, 9th Revision, Clinical Modification” (ICD 9 CM).
Patients suffering from diseases that were difficult to
classify into a given code or had mismatched ID num-
bers were not included in the final statistical data set.
Disease codes were classified according to gender and
age. Cases with the same ID numbers, but which exhib-
ited different diseases, were counted as different in-
stances.
Medical care data obtained from the 2005-2008 NHI
reports were examined, and the morbidity rates of the 13
leading causes of death were calculated. Disease classi-
fications (according to the ICD 9 CM) included the fol-
lowing (indicated within parentheses): trachea, bronchus,
and lung cancer (ICD 162); liver and intrahepatic bile
ducts cancer (ICD 155); colon and rectum cancer (ICD
153, 154); stomach cancer (ICD 151); oral cavity cancer
(ICD 140, 141, 143-146, 148, 149); oesophagus cancer
(ICD 150); pancreas cancer (ICD 157); non-Hodgkin’s
lymphoma (ICD 200, 202, 203); gallbladder and extra-
hepatic bile ducts cancer (ICD 156); leukaemia (ICD
204-208); female breast cancer (ICD 174); cervix uteri
cancer (ICD 179, 180); and prostate cancer (ICD 185).
Demographic information was provided by the Minis-
try of Interior [25]. The smallest administrative units
coded for examination of the various diseases cases or
health care events were precincts and townships. Age-adjusted
standard morbidity rates, adjusted using the
Segi (“world”) population in 1976 as the standard [26],
were then calculated for the leading
causes of death for males and females in each township.
During the period from 1995 to 1998, data on age-
adjusted malignancies by precinct and township were
obtained from the Atlas of Cancer Mortality and Inci-
dence in Taiwan, officially published by the Bureau of
Health Promotion, Department of Health [27].
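The age adjustment described above can be illustrated with a minimal sketch of direct standardization. The function name, the three age bands, the case counts and the standard weights below are all hypothetical; they are not the Segi (1976) values or the NHI data, and the paper does not state its exact computation.

```python
def age_standardized_rate(cases, population, standard_weights, per=100_000):
    """Directly age-standardized rate per `per` persons.

    cases, population and standard_weights each hold one entry per age band.
    """
    if not (len(cases) == len(population) == len(standard_weights)):
        raise ValueError("all inputs need one entry per age band")
    total_weight = sum(standard_weights)
    rate = 0.0
    for d, n, w in zip(cases, population, standard_weights):
        if n <= 0:
            raise ValueError("population in every age band must be positive")
        # Age-specific rate, weighted by the share of the standard population.
        rate += (d / n) * (w / total_weight)
    return rate * per


# Hypothetical township with three broad age bands (illustrative numbers only).
cases = [2, 10, 30]
population = [10_000, 20_000, 10_000]
weights = [50, 35, 15]  # hypothetical standard weights, NOT the Segi values

asr = age_standardized_rate(cases, population, weights)
crude = sum(cases) / sum(population) * 100_000
print(f"crude rate: {crude:.1f}, age-standardized rate: {asr:.1f} per 100,000")
```

Because the same standard weights are applied to every township, differences in the resulting rates are not driven by differences in local age structure.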
2.3. Statistics
The global Moran’s I spatial autocorrelation was used
to assess the correlation among neighbouring observa-
tions and to identify patterns and levels of spatial clus-
tering in neighbouring districts [28]. The Moran’s I sta-
tistic, similar to the Pearson correlation coefficient [29],
was calculated by the following formula:

$$I = \frac{N}{S_0} \cdot \frac{\sum_i \sum_j w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2} \qquad (1)$$

where N is the number of districts, $w_{ij}$ is the element in the
spatial weight matrix corresponding to the observation
pair i, j, and $x_i$ and $x_j$ are the observations for the areas i and j
with the mean $\bar{x}$, and:

$$S_0 = \sum_i \sum_j w_{ij} \qquad (2)$$

Since the weights were row-standardized ($\sum_j w_{ij} = 1$),
the first step in the spatial autocorrelation analysis was
to construct a spatial weight matrix that contained in-
formation about the neighbourhood structure for each
location. Adjacency was defined as immediately neigh-
boring administrative districts, including the district it-
self. Non-neighbouring administrative districts were
assigned the weight of zero.
Spatial contiguity for polygons is defined as the property
of sharing a common boundary or vertex. Contiguity
analysis is an important method for assessing unusual
features in connectivity distribution [4,30]. The Queen’s
measure of contiguity can be utilized to make up for
spatial contiguity by incorporating both the Rook and
Bishop relationships into a single measure [30]. The
administrative districts considered in this study were
highly irregular in both shape and size. Tsai et al. (2009)
demonstrated that the most appropriate method is the
first order queen polygon contiguity method for quanti-
fying the spatial weights matrix for the analysis of con-
nectivity. Based on this approach, the spatial weight/
connectivity matrices were determined and utilized in
conjunction with the global Moran’s I statistic and fol-
lowing LISA calculations [6].
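The difference between Rook and Queen contiguity, and the row-standardization of the resulting weights, can be sketched on a regular lattice. This is only an illustration: the study used irregular township polygons in GeoDa, the helper names are hypothetical, and the sketch assumes that a unit is not its own neighbor.

```python
def lattice_neighbors(rows, cols, queen=True):
    """Return {cell_id: sorted neighbor ids} for a rows x cols grid of cells."""
    rook = [(-1, 0), (1, 0), (0, -1), (0, 1)]        # shared edges
    bishop = [(-1, -1), (-1, 1), (1, -1), (1, 1)]    # shared corners only
    offsets = rook + bishop if queen else rook
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            ids = []
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    ids.append(rr * cols + cc)
            neighbors[r * cols + c] = sorted(ids)
    return neighbors


def row_standardize(neighbors):
    """Build a dense weight matrix in which every non-empty row sums to 1."""
    n = len(neighbors)
    w = [[0.0] * n for _ in range(n)]
    for i, ids in neighbors.items():
        for j in ids:
            w[i][j] = 1.0 / len(ids)
    return w


queen = lattice_neighbors(3, 3, queen=True)
rook = lattice_neighbors(3, 3, queen=False)
w = row_standardize(queen)
print("centre cell:", len(queen[4]), "queen neighbors,", len(rook[4]), "rook neighbors")
```

On this 3 x 3 grid the centre cell has eight Queen neighbors but only four Rook neighbors, which is why the Queen criterion tends to suit irregular polygons that may touch only at a vertex.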
Moran’s I values may range from –1 (dispersed) to +1
(clustered). A Moran’s I value of 0 suggests complete
spatial randomness. A random permutation procedure
recalculates a statistic many times by reshuffling the data
values among the map units to generate a reference distribution.
The statistic calculated from the
observed spatial pattern is then compared to this reference
distribution and a pseudo significance level (pseudo
p-value) computed. To verify that the value of Moran’s I
was significantly different from the expected value, we
applied a Monte Carlo randomisation test with 999 per-
mutations to achieve highly significant values. Data
values were reassigned among the N locations, providing
a randomised distribution against which one may judge
the observed value. If the observed value of I was within
the tails of this distribution, there was significant spatial
autocorrelation in the data, a pseudo p-value smaller
than 0.05, and the assumption of independence among
the observations could be rejected [31].
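The global statistic and its permutation test can be sketched in a few lines of standard-library Python. This is a simplified illustration, not the GeoDa implementation: the six-unit chain layout, the helper names and the one-sided pseudo p-value convention (count of permuted values at least as large as the observed one, plus one, divided by the number of permutations plus one) are assumptions made for the example.

```python
import random


def chain_weights(n):
    """Row-standardized weights for n units arranged in a line (hypothetical layout)."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        for j in nbrs:
            w[i][j] = 1.0 / len(nbrs)
    return w


def morans_i(values, w):
    """Global Moran's I: (N / S0) * cross-product term / sum of squared deviations."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den


def permutation_test(values, w, permutations=999, seed=0):
    """Return (observed I, one-sided pseudo p-value) from random reshuffling."""
    rng = random.Random(seed)
    observed = morans_i(values, w)
    shuffled = list(values)
    at_least = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # reassign the data values among the locations
        if morans_i(shuffled, w) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (permutations + 1)


w = chain_weights(6)
i_rising = morans_i([1, 2, 3, 4, 5, 6], w)        # similar values side by side
i_alternating = morans_i([1, 6, 1, 6, 1, 6], w)   # dissimilar values side by side
i_obs, p_value = permutation_test([1, 2, 3, 4, 5, 6], w)
print(i_rising, i_alternating, p_value)
```

The rising sequence gives a positive I and the alternating sequence gives I = –1, matching the clustered and dispersed ends of the range described above.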
LISA statistic provides information related to the lo-
cation of spatial clusters and outliers and the types of
spatial correlation. Local statistics are important because
the magnitude of spatial autocorrelation is not necessar-
ily uniform over the study area [7,32]. LISA allowed us
to divide the study area into small locations, thus ena-
bling the assessment of significant local spatial cluster-
ing around an individual location. In addition to the de-
gree of spatial clustering, detailed variations of cluster-
ing in the locally defined geo-space were identified as
well as the locations of the spatial clusters. The local
version of Moran’s I at location i is given by:

$$I_i = \frac{x_i - \bar{x}}{\sum_i (x_i - \bar{x})^2 / (n - 1)} \sum_j w_{ij}\,(x_j - \bar{x}) \qquad (3)$$
where n indicates the total number of locations (350
townships in 1995-1998 and 349 townships
in 2005-2008); $x_i$ denotes the value of the variable
of interest, X, at location i; $x_j$ denotes the observation at
neighboring location j; and $\bar{x}$ is the sample average of
X. $w_{ij}$ is the element of the spatial weight matrix, which defines spatial
interaction across study regions. In general, $w_{ij}$ = 1 if
location i and location j are neighboring (share a common
boundary); otherwise, $w_{ij}$ = 0. In this study, spatial
contiguity was assessed as the first order queen’s contiguity,
which defines spatial neighbors as those areas with
shared borders and vertexes.
Significance was tested by comparison to a reference
distribution obtained by random permutations [7]. This
analysis used 999 permutations to determine differences
between spatial units. A positive value for the local Moran’s
I index ($I_i$) indicates that a feature has neighboring
features with similarly high or low attribute values and is
therefore part of a cluster. A negative value for $I_i$ indicates
that a feature has neighboring features with dissimilar
values; this feature is an outlier. In either instance,
the p-value for the feature must be small enough for the
cluster or outlier to be considered statistically significant.
LISA distinguishes between a statistically
significant (0.05 level) cluster of high values (HH), a
cluster of low values (LL), an outlier in which a high
value is surrounded primarily by low values (HL), and
an outlier in which a low value is surrounded primarily
by high values (LH). Features with a z-score
larger than +1.96 are defined as clusters
(HH and LL). Features with a z-score
less than –1.96 are defined as outliers
(HL and LH). We consider that outliers may not
give a stable and precise picture in a comparison of
spatio-temporal patterns, because it is difficult to
judge how strongly an outlier is associated with
disease risk. Therefore, only hot and cold spots are
mapped on local Moran’s maps.
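The classification rule above can be sketched as follows. This is an illustrative approximation rather than the GeoDa procedure: it assumes row-standardized weights, a scaling of the local statistic by the sum of squared deviations divided by n – 1, and z-scores obtained from a conditional permutation in which the value at location i is held fixed while its neighbors' values are redrawn from the remaining locations. The chain layout, the data and the function name are hypothetical, and 199 permutations are used only to keep the example fast.

```python
import random
import statistics


def lisa(values, neighbors, permutations=199, seed=0, z_crit=1.96):
    """Return one (local_I, z_score, label) tuple per location.

    neighbors maps each index to a non-empty list of neighbor indices;
    values must not all be equal.
    """
    rng = random.Random(seed)
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / (n - 1)
    results = []
    for i in range(n):
        k = len(neighbors[i])
        local_i = dev[i] * sum(dev[j] for j in neighbors[i]) / k / m2
        # Conditional permutation: keep x_i, redraw the k neighbor values.
        others = dev[:i] + dev[i + 1:]
        sims = [dev[i] * sum(rng.sample(others, k)) / k / m2
                for _ in range(permutations)]
        sd = statistics.pstdev(sims)
        z = (local_i - statistics.fmean(sims)) / sd if sd > 0 else 0.0
        if z > z_crit:
            label = "HH" if dev[i] > 0 else "LL"   # cluster (hot or cold spot)
        elif z < -z_crit:
            label = "HL" if dev[i] > 0 else "LH"   # spatial outlier
        else:
            label = "ns"                           # not significant
        results.append((local_i, z, label))
    return results


# Hypothetical chain of ten units: high values on the left, low on the right.
values = [9, 10, 9, 10, 9, 1, 2, 1, 2, 1]
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(values)]
             for i in range(len(values))}
results = lisa(values, neighbors)
for i, (local_i, z, label) in enumerate(results):
    print(i, round(local_i, 3), round(z, 2), label)
```

Keeping only the "HH" and "LL" labels reproduces the mapping choice described above, in which outliers are left off the cluster maps.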
In addition to mapping, similarities between spatial
distribution patterns for the two periods (1995-1998 and
2005-2008) were determined using logistic regression
analysis. The binary response indicates whether there is
significant autocorrelation between administrative dis-
tricts or areas. The correlation is better (higher) if the
value of the z-score of the local Moran’s I statistic is
larger than +1.96 (clusters with hot spots and cold spots),
otherwise it is deemed to be low. The model is expressed
as:

$$\log\left[\frac{\Pr(\text{Higher correlation})}{\Pr(\text{Lower correlation})}\right] = \beta_0 + \beta_1\,\text{Period} \qquad (4)$$
where the Period is considered an explanatory variable
in the logistic regression model and the two β values are the
logistic regression coefficients of the model. Pr (Higher
correlation) and Pr(Lower correlation) denote the
“Higher” and “Lower” correlation probabilities, respec-
tively. In this study, two distinct precincts, the central
and west precincts in Tainan city, merged into one single
unified administrative unit in 2004. These unpaired data
were omitted and the total data from 348 townships were
tested using logistic regression.
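With a single binary explanatory variable, the logistic model in Equation (4) has a closed-form maximum likelihood solution, which makes a small worked sketch possible without a statistics package. The counts below are hypothetical, not the study's data, and the function name is illustrative; the analysis itself was run in SPSS. The sketch uses a two-sided Wald test on β1.

```python
import math


def logistic_period_test(clusters_a, total_a, clusters_b, total_b):
    """Fit log-odds(cluster) = beta0 + beta1 * Period for Period in {0, 1}.

    Returns (beta0, beta1, two-sided Wald p-value for beta1).
    """
    a, b = clusters_a, total_a - clusters_a   # period 0: clusters, non-clusters
    c, d = clusters_b, total_b - clusters_b   # period 1: clusters, non-clusters
    if min(a, b, c, d) <= 0:
        raise ValueError("every cell of the 2x2 table must be positive")
    beta0 = math.log(a / b)                   # log-odds in the first period
    beta1 = math.log(c / d) - beta0           # log odds ratio between periods
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = beta1 / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return beta0, beta1, p_value


# Hypothetical counts: 348 paired townships per period, 40 vs 62 in a cluster.
beta0, beta1, p_value = logistic_period_test(40, 348, 62, 348)
label = "dissimilarity" if p_value < 0.05 else "similarity"
print(round(beta1, 3), round(p_value, 4), label)

# Identical cluster counts in both periods give beta1 = 0 and p = 1.
_, beta1_same, p_same = logistic_period_test(40, 348, 40, 348)
```

A p-value below 0.05 for β1 corresponds to the "dissimilarity" description used in Table 2, and a larger p-value to "similarity".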
Modeling of the logistic regression was performed
using SPSS 12. Global Moran’s I and local Moran’s I
statistics were calculated using GeoDa (http://www.geoda.uiuc.edu/),
an open source spatial analysis system,
and visualized on LISA cluster maps using ArcMap 9.3.
3. RESULTS
Figure 2 displays the spatial clusters (hot spots and
cold spots) as obtained using LISA statistic for the top
13 leading malignant neoplasms for both males and fe-
males in Taiwan during two time periods (1995-1998
and 2005-2008).
Table 1 summarizes the results from global autocorrelation
statistics for the top 13 leading malignant neoplasms
according to gender and in the two time periods
(1995-1998 and 2005-2008) in Taiwan. The results of
the global Moran’s I tests for most cases related to the
leading malignant neoplasms are statistically significant,
having a pseudo p-value smaller than 0.05, and indicate
spatial heterogeneity. However, non-significant results (a
pseudo p-value larger than 0.05) emerged in nine cases:
pancreas cancer for males (1995-1998); non-Hodgkin’s
lymphoma for males (1995-1998) and females
(1995-1998 and 2005-2008); gallbladder and extrahepatic
bile ducts cancer for males (1995-1998) and
females (1995-1998 and 2005-2008); and leukemia for
males (2005-2008) and females (1995-1998).
Table 2 summarizes the typology patterns, as calculated
using LISA statistic, categorized as clusters or non-clusters
at a z-score larger than +1.96. It also compares
the top 13 leading malignant neoplasms during the two
time periods (1995-1998 and 2005-2008).
Figure 2. Spatial clusters of the 13 leading malignant neoplasms in Taiwan. Maps showing the spatial clusters of the 13 leading
malignant neoplasms in Taiwan: A indicates trachea, bronchus, and lung cancer; B, liver and intrahepatic bile ducts cancer; C, colon and
rectum cancer; D, stomach cancer; E, oral cavity cancer; F, oesophagus cancer; G, pancreas cancer; H, non-Hodgkin’s lymphoma; I,
gallbladder and extrahepatic bile ducts cancer; J, leukaemia; K, female breast cancer; L, cervix uteri cancer; M, prostate cancer. 1
indicates males within the period from 1995 to 1998; 2, males within the period from 2005 to 2008; 3, females within the
period from 1995 to 1998; 4, females within the period from 2005 to 2008.
Dissimilarities between the spatial distribution patterns
during the two periods (1995-1998 and 2005-2008)
are not statistically significant (p-value > 0.05) in males
for six out of eleven spatial clusters, and in females for
ten of twelve spatial clusters. In males, there are
dissimilarities for stomach cancer, oral cavity cancer,
pancreas cancer, non-Hodgkin’s lymphoma, and prostate
cancer. In females, colon and rectum cancer, and
pancreas cancer are dissimilar. Table 2 presents these
findings.
4. DISCUSSION
Locations in close proximity tend to share similar
attributes. According to Tobler (1979), “everything is
related to everything else, and nearby things are more
closely related to nearby things than to distant things”
[33]. In epidemiology, a cluster becomes apparent when
a number of health events occur which are situated close
together in space and/or time. The evaluation of spatial
distributions as a measure of disease risk may provide
etiological insights [34]. Spatial autocorrelation is the
relation between the values of a single variable attribut-
able to the geographic arrangement of areal units on a
map and can be used to determine the degree of spatial
clustering [35,36]. In this study, local Moran’s I statistic
was used to measure the degree of spatial clustering and
map the geographic patterns of the areal units. Spatial
clustering of the leading cause of death (also called hot
spots and cold spots) was identified by a z-score value
larger than +1.96. In epidemiology, “hot spots” are
Table 1. Global autocorrelation analysis of data for the 13 leading malignant neoplasms in Taiwan, according to gender, during 1995-1998 and 2005-2008.

| Leading malignant neoplasms (ICD code) | Moran’s I, male, 1995-1998 | Moran’s I, male, 2005-2008 | Moran’s I, female, 1995-1998 | Moran’s I, female, 2005-2008 |
|---|---|---|---|---|
| Trachea, bronchus, and lung cancer (ICD 162) | 0.38* | 0.46* | 0.17* | 0.17* |
| Liver and intrahepatic bile ducts cancer (ICD 155) | 0.45* | 0.59* | 0.34* | 0.42* |
| Colon and rectum cancer (ICD 153, 154) | 0.40* | 0.52* | 0.40* | 0.49* |
| Stomach cancer (ICD 151) | 0.34* | 0.37* | 0.22* | 0.35* |
| Oral cavity cancer (ICD 140, 141, 143-146, 148, 149) | 0.43* | 0.68* | 0.09* | 0.68* |
| Oesophagus cancer (ICD 150) | 0.24* | 0.22* | 0.07* | 0.25* |
| Pancreas cancer (ICD 157) | 0.05 | 0.18* | 0.07* | 0.22* |
| Non-Hodgkin’s lymphoma (ICD 200, 202, 203) | 0.02 | 0.07* | 0.05 | 0.05 |
| Gallbladder and extrahepatic bile ducts cancer (ICD 156) | 0.06 | 0.14* | 0.05 | 0.04 |
| Leukaemia (ICD 204-208) | 0.08* | 0.04 | 0.01 | 0.08* |
| Female breast cancer (ICD 174) | n.d. | n.d. | 0.52* | 0.53* |
| Cervix uteri cancer (ICD 179, 180) | n.d. | n.d. | 0.24* | 0.26* |
| Prostate cancer (ICD 185) | 0.12* | 0.60* | n.d. | n.d. |

n.d.: no detection. *: A pseudo p-value smaller than 0.05.
Table 2. Logistic regression model comparisons of the 13 leading malignant neoplasms in Taiwan, during 1995-1998 and 2005-2008.

| Leading malignant neoplasms (ICD code) | Male p-value | Male description | Female p-value | Female description |
|---|---|---|---|---|
| Trachea, bronchus, and lung cancer (ICD 162) | 0.245 | similarity^a | 0.21 | similarity^a |
| Liver and intrahepatic bile ducts cancer (ICD 155) | 0.505 | similarity^a | 0.412 | similarity^a |
| Colon and rectum cancer (ICD 153, 154) | 0.492 | similarity^a | 0.019 | dissimilarity^a |
| Stomach cancer (ICD 151) | 0.034 | dissimilarity^a | 0.053 | similarity^a |
| Oral cavity cancer (ICD 140, 141, 143-146, 148, 149) | 0.007 | dissimilarity^a | 0.229 | similarity^a |
| Oesophagus cancer (ICD 150) | 0.844 | similarity^a | 0.266 | similarity^a |
| Pancreas cancer (ICD 157) | 0.029 | dissimilarity | 0.047 | dissimilarity^a |
| Non-Hodgkin’s lymphoma (ICD 200, 202, 203) | 0.006 | dissimilarity | 0.179 | similarity |
| Gallbladder and extrahepatic bile ducts cancer (ICD 156) | 0.409 | similarity | 0.197 | similarity |
| Leukaemia (ICD 204-208) | 0.137 | similarity | 0.781 | similarity |
| Female breast cancer (ICD 174) | n.d. | | 0.182 | similarity^a |
| Cervix uteri cancer (ICD 179, 180) | n.d. | | 0.84 | similarity^a |
| Prostate cancer (ICD 185) | 0.007 | dissimilarity^a | n.d. | |

n.d.: no detection. a: A comparison of the two periods during which all of Moran’s test results are clusters (results based on Table 1).
considered interesting because of their correlation to
aetiology. This study, therefore, focuses on the spatial
locations of 13 leading malignant neoplasms. Information
about spatial location is useful for detecting risk from a
spatial point of view. A more detailed survey of these
identified “hot spots” may provide important clues on
risk factors for these diseases.
The modifiable areal unit problem (MAUP) is a phenomenon
whereby analysis of the same data provides
different results when the data are grouped into different sets of areal units.
The MAUP can be subdivided into two separate effects
that usually occur simultaneously during the analysis of
aggregated data. The scale effect causes variation in
statistical results according to different levels of aggregation.
An association between variables, therefore, depends
on the sizes of the areal units of the reported data.
Generally, correlation increases as the size of the areal
unit increases. The zone effect describes variations in
correlation statistics caused by the regrouping of data
into different configurations, but with the same scale.
The MAUP occurs because spatial processes generating
the observed data may exist within certain scales, and for
particular areal units. These may be reflected more or
less accurately by the boundaries in use [37]. Manley et
al. (2006) concluded that MAUP is not really a problem,
but rather, a resource. Data at different scale levels can
enable the identification of processes operating within
different scales. It is clear that it is not possible to define
an ideal single census geography that captures all of the
processes for all variables [37]. Furthermore, the internal
composition of given areal units may not be homoge-
neous, particularly for disease distribution. Matisziw et
al. (2008) have suggested that down-scaling the spatial
structure of polygonal units could provide valuable in-
formation pertaining to the spatial distribution of disease
[38]. In this study, the administrative government regions
are similar but not completely consistent between the
two periods (1995-1998 and 2005-2008), because the
central and west districts in Tainan city were merged
into one unit in 2004.
Using only one scale to estimate spatial distribution
patterns for both periods would be more convenient
for a cluster comparison; however, bias could be caused by
using a non-realistic spatial boundary. An ideal process
would be to calculate the spatial autocorrelation coefficients
(such as the z-scores) based on realistic boundaries
(two sets of shape files that represent 350 townships
in 1995-1998 and 349 townships in 2005-2008, respectively)
and then omit the values of the autocorrelation
coefficients for non-paired units from the comparison
of the two periods.
The local spatial autocorrelation coefficients can be
tested for statistical significance under two rather different
model assumptions. The first is the classical statistical
assumption of normality, whereby it is assumed that
the observed value of the coefficient is the result of the
set of z-score values being independent and identically
distributed drawings from a normal distribution, implying
that variances are constant across the region. The second
model is one of randomization, whereby the observed
pattern of the set of z-score values is assumed to be just
one realization from all possible random permutations of
the observed values across all the zones. Both models
have important weaknesses. For example, there is an
underlying population size variation and a lack of homogeneity
of probabilities; however these models are
widely implemented in software packages to provide
estimates of the significance of observed results. In the
case of the randomization model, many software pack-
ages generate a set of N random permutations of the
input data, where N is user specified. For each simulation
run, index values are computed and the set of such
values are used to provide a pseudo-probability distribu-
tion for the given problem, against which the observed
value can be compared. A z-transform of the coefficients
under normality or randomization assumptions is distri-
buted approximately as N(0, 1); hence, this may be com-
pared to percentage points of the normal distribution to
identify particularly high or low values [39]. In this
study, the data for the two periods
(1995-1998 and 2005-2008) came from the Taiwan
Cancer Registry and the Taiwan National Health
Insurance agencies, respectively. Both databases are
reference sources with high validity and
reliability, and in both the cases were defined with the same diagnostic
criteria (ICD 9 CM) and the morbidity rates were standardized
to the 1976 world standard population. Nevertheless, the
estimated morbidity rates derived from the two databases
cannot be directly compared with one another. Our
suggested resolution is to transform the morbidity rates into
z-scores through a spatial autocorrelation calculation
with a randomization of 999 permutations, which
makes a comparison of the two sets of z-scores feasible.
Binomial logistic regression models were then used to
compare the spatial distribution patterns of
the two periods (1995-1998 and 2005-2008).
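As a worked illustration of the randomization approach described above, the z-transform and the pseudo p-value can be written as follows. The notation is introduced here for illustration and is an assumption about the usual convention in permutation-based software, not a formula given in the source: $I_{\text{obs}}$ is the observed coefficient, $\bar{I}_{\text{perm}}$ and $s_{\text{perm}}$ are the mean and standard deviation of the coefficients from the M permuted data sets, and R is the number of permuted coefficients at least as extreme as the observed one.

$$z = \frac{I_{\text{obs}} - \bar{I}_{\text{perm}}}{s_{\text{perm}}}, \qquad p_{\text{pseudo}} = \frac{R + 1}{M + 1}$$

Under this convention, M = 999 permutations give a smallest attainable pseudo p-value of 1/1000 = 0.001. Because z is expressed in units of the permutation distribution rather than in units of the morbidity rate, z-scores from two differently sourced data sets are placed on a common scale.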
Z-scores from the LISA method were analyzed using the
logistic regression model, and the results for the
leading malignant neoplasms in the two periods (1995-1998
and 2005-2008) were compared. However, the constraint
for spatial clustering comparison (namely, global Moran’s
I indicating significant clustering in both periods) must
be satisfied before the logistic regression is calculated
for purposes of comparison. Under this constraint, the
results demonstrate statistically significant differences
for stomach cancer (in males), oral cavity cancer (in
males), prostate cancer (in males), colon and rectum
cancer (in females), and pancreas cancer (in females).
The other eleven comparisons were not significantly
different, so the null hypothesis could not be rejected
for them. These non-significant results indicate that
common spatial factor(s) may act in both periods.
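This section does not spell out how the regression was specified. One plausible reading, offered here only as an assumption, treats the period (0 for 1995-1998, 1 for 2005-2008) as the binary response and each township's LISA z-score as the predictor, so that a slope not significantly different from zero corresponds to the two periods being indistinguishable. The sketch below fits that model by Newton-Raphson and applies a Wald test; the z-scores are simulated, and `fit_logistic` and `wald_p_value` are illustrative names.

```python
import math

import numpy as np


def fit_logistic(x, y, n_iter=25):
    """Fit P(y=1) = logistic(b0 + b1*x) by Newton-Raphson.

    Returns the coefficients (b0, b1) and their standard errors.
    """
    X = np.column_stack([np.ones(len(x)), np.asarray(x, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    # Standard errors from the information matrix at the final estimate.
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se


def wald_p_value(coef, se):
    """Two-sided p-value for H0: coef = 0 under a normal approximation."""
    return math.erfc(abs(coef / se) / math.sqrt(2.0))


# Simulated LISA z-scores for the same townships in two periods (assumption).
rng = np.random.default_rng(1)
z_period_1 = rng.normal(0.0, 1.0, size=100)
z_period_2 = rng.normal(0.0, 1.0, size=100)
x = np.concatenate([z_period_1, z_period_2])
period = np.concatenate([np.zeros(100), np.ones(100)])

beta, se = fit_logistic(x, period)
print(f"slope = {beta[1]:.3f}, p = {wald_p_value(beta[1], se[1]):.3f}")
```

Under this reading, a p-value at or above the chosen significance level would correspond to the "not significantly different" outcome reported for eleven of the comparisons.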
Few previous ecological studies relate to malignant
neoplasms and their correlation to risk factors in Taiwan,
although oral cancer and stomach cancer have been
documented and are discussed briefly below. It is hoped
that this assessment of the spatial clustering of Taiwan’s
leading malignant neoplasms can contribute to the study
of spatial epidemiology.
Two separate groups identified clusters of areas
showing elevated mortality from oral cavity cancer in
females in the aboriginal townships in eastern Taiwan.
The habits of cigarette smoking, alcohol drinking and
betel nut chewing had higher prevalence in aboriginal
women in eastern Taiwan than in women in other regions
[40,41]. Chiang et al. suggested that high-risk areas of
oral cancer incidence in males closely coincided with
spatial distribution of heavy-metal pollution in soils
(such as chromium and nickel) in central Taiwan [42]. In
this study, oral cavity cancer clusters for each gender
were calculated using the LISA statistic. The results
identify clear spatial clustering in central Taiwan for
males and, for females, in the aboriginal townships of
eastern Taiwan. These observations therefore support the
results described in previous studies. However, according
to our results, the two periods (1995-1998 and 2005-2008)
show dissimilarity in the spatial distribution of oral
cavity cancer in males; the spatial risks affecting oral
cancer morbidity in males changed over space and time.
One interpretation is that the disease clusters changed
over time because of changes in exposure to metal
pollutants, resulting in a variation of virulence.
Further investigation is therefore warranted.
Several meta-analyses identified a strong and consistent
association between H. pylori infection and non-cardia
gastric cancer [43-46]. An ecological study in Taiwan
suggests an association between this infection and
gastric cancer. H. pylori infection in early childhood
may be a key issue, and a long induction time appears to
be required for gastric carcinogenesis. Areas of high
gastric cancer mortality are clustered in the aboriginal
townships, where the prevalence of H. pylori is high
[40,47]. Our results are similar to those of these
previous studies. Stomach cancer clusters for males and
females are located in the Taiwanese aboriginal
townships, and a new cluster was identified in the
northern coastal region of Taiwan, which is worthy of
further investigation. However, the two periods
(1995-1998 and 2005-2008) show dissimilarity in the
spatial distribution of gastric cancer in males; the
spatial risks affecting gastric cancer morbidity in males
changed over space and time. A possible reason for the
change in disease clusters over time is a change in the
range of H. pylori prevalence or increased interference
from other risks in the study area. Further investigation
is therefore warranted.
5. CONCLUSIONS
A method that combines LISA statistics and logistic
regression is an effective tool for the detection of
space-time patterns with discontinuous data. Similarity
between the two periods indicates unchanged conditions in
disease risks. Conversely, dissimilarity indicates a
significant change in morbidity risks over the studied
periods. This enables planners to assess spatial risk
factors and to determine the most advantageous types of
health care policies for the planning and implementation
of health care services. Addressing these issues can
improve the performance and effectiveness of health care
services and provide a clear outline for understanding
the results in greater depth.
6. ACKNOWLEDGEMENTS
The authors would like to thank Taiwan’s Department of Health for
providing the National Health Insurance and Bureau of Health Promotion
databases.
REFERENCES
[1] Gesler, W. (1986) The uses of spatial analysis in medical
geography: A review. Social Science & Medicine, 23,
963-973. doi:10.1016/0277-9536(86)90253-4
[2] Cuzick, J. and Edwards, R. (1990) Spatial clustering for
inhomogeneous populations. Journal of the Royal Statistical
Society, 52, 73-104.
[3] Cressie, N.A.C. (1993) Statistics for spatial data. Wiley,
New York.
[4] Legendre, P. and Legendre, L. (1998) Numerical ecology.
2nd English Edition, Elsevier, Amsterdam.
[5] Fortin, M.J. (1999) Spatial statistics in landscape ecology.
In: Klopatek, J.M. and Gardner, R.H., Eds., Landscape
Ecological Analysis: Issues and Applications, Springer-
Verlag, New York, 253-279.
doi:10.1007/978-1-4612-0529-6_12
[6] Tsai, P.J., Lin, M.L., Chu, C.M. and Perng, C.H. (2009)
Spatial autocorrelation analysis of health care hotspots in
Taiwan in 2006. BMC Public Health, 9, 464.
doi:10.1186/1471-2458-9-464
[7] Anselin, L. (1995) The local indicators of spatial
association-LISA. Geographical Analysis, 27, 93-115.
doi:10.1111/j.1538-4632.1995.tb00338.x
[8] Getis, A. and Ord, J.K. (1992) The analysis of spatial
association by use of distance statistics. Geographical
Analysis, 24, 189-206.
doi:10.1111/j.1538-4632.1992.tb00261.x
[9] Getis, A. and Ord, J.K. (1996) Local spatial statistics: An
overview. In: Longley, P. and Batty, M., Eds., Spatial
Analysis: Modeling in A GIS Environment, John Wiley &
Sons, New York, 261-277.
[10] Knox, E.G. (1964) The detection of space-time
interaction. Applied Statistics, 13, 25-29.
doi:10.2307/2985220
[11] Mantel, N. (1967) The detection of cancer clustering and
the generalized regression approach. Cancer Research,
27, 209-220.
[12] Jacquez, G.M. (1996) A k nearest neighbor test for
space-time interaction. Statistics in Medicine, 15, 1935-
1949.
doi:10.1002/(SICI)1097-0258(19960930)15:18<1935::AID-SIM406>3.0.CO;2-I
[13] Kulldorff, M. and Nagarwalla, N. (1995) Spatial disease
clusters: Detection and inference. Statistics in Medicine,
14, 799-810. doi:10.1002/sim.4780140809
[14] Kulldorff, M. (1997) A spatial scan statistic.
Communications in Statistics: Theory and Methods, 26, 1481-1496.
doi:10.1080/03610929708831995
[15] Kulldorff, M. (1999) Spatial scan statistics: Models, cal-
culations, and applications. In: Glaz, J. and Balakrishnan,
N., Eds., Scan Statistics and Applications, Birkhäuser,
Boston, 303-322. doi:10.1007/978-1-4612-1578-3_14
[16] Neill, D.B., Moore, A.W. and Cooper, G.F. (2006) A
Bayesian spatial scan statistic. Advances in Neural In-
formation Processing Systems, 18, 1003-1010.
[17] Greenlee, R.T., Murray, T., Bolden, S. and Wingo, P.A.
(2000) Cancer statistics. CA: A Cancer Journal for Clinicians,
50, 7-33. doi:10.3322/canjclin.50.1.7
[18] Adami, H.O., Hunter, D. and Trichopoulos, D. (2002)
Textbook of cancer epidemiology. Oxford University
Press, New York.
[19] Parkin, D.M., Whelan, S.L., Ferlay, J., Teppo, L. and
Thomas, D.B. (2002) Cancer incidence in five continents.
IARC Scientific Publications, Lyon.
[20] Frank, S.A. (2007) Dynamics of cancer: Incidence, in-
heritance, and evolution. Princeton University Press,
Princeton.
[21] National Health Insurance (2007) Statistical annual re-
port of medical care 2005. National Health Insurance
(Taiwan), Taipei.
[22] National Health Insurance (2008) Statistical annual re-
port of medical care 2006. National Health Insurance
(Taiwan), Taipei.
[23] National Health Insurance (2009) Statistical annual re-
port of medical care 2007. National Health Insurance
(Taiwan), Taipei.
[24] National Health Insurance (2010) Statistical annual re-
port of medical care 2008. National Health Insurance
(Taiwan), Taipei.
[25] Ministry of the Interior (2009) The demographic data-
base. http://www.moi.gov.tw/stat/index.aspx
[26] Ahmad, O.E., Boschi-Pinto, C., Lopez, A.D., Murray,
C.J.L., Lozano, R. and Inoue, M. (2000) Age standardi-
zation of rates: A new WHO standard (GPE discussion
paper series, No. 31). World Health Organization Press,
Geneva.
[27] Liaw, Y.P., Chen, C.J., Lee, W.C. and Hsu, S.Y. (2003)
The construction and use of the electric atlas of cancer
mortality and incidence in Taiwan. Taiwan Journal of
Public Health, 22, 227-236.
[28] Boots, B.N. and Getis, A. (1998) Point pattern analysis.
Sage Publications, Newbury Park.
[29] Cliff, A.D. and Ord, J.K. (1973) Spatial autocorrelation.
Pion Limited, London.
[30] Grubesic, T.H. (2008) Zip codes and spatial analysis:
Problems and prospects. Socio-Economic Planning Sci-
ences, 42, 129-149. doi:10.1016/j.seps.2006.09.001
[31] Cliff, A.D. and Ord, J.K. (1981) Spatial processes: Mod-
els and applications. Pion Limited, London.
[32] Ord, J.K. and Getis, A. (1995) Local spatial autocorrela-
tion statistics: Distributional issues and an application.
Geographical Analysis, 27, 286-306.
doi:10.1111/j.1538-4632.1995.tb00912.x
[33] Tobler, W. (1979) Cellular geography. In: Gale, S. and
Olsson, G., Eds., Philosophy in Geography, Reidel,
Dordrecht, 379-386.
[34] Moore, D.A. and Carpenter, T.E. (1999) Spatial analyti-
cal methods and geographic information systems: Use in
health research and epidemiology. Epidemiologic Re-
views, 21, 143-161.
[35] Griffith, D.A. and Amrhein, C.G. (1991) Statistical
analysis for geographers. Prentice Hall, Englewood Cliffs.
[36] Kitron, U. and Kazmierczak, J.J. (1997) Spatial analysis
of the distribution of Lyme disease in Wisconsin. Ameri-
can Journal of Epidemiology, 145, 558-566.
[37] Manley, D., Flowerdew, R. and Steel, D. (2006) Scales,
levels and processes: Studying spatial patterns of British
census variables. Computers, Environment and Urban
Systems, 30, 143-160.
doi:10.1016/j.compenvurbsys.2005.08.005
[38] Matisziw, T.C., Grubesic, T.H. and Wei, H. (2008)
Downscaling spatial structure for the analysis of epide-
miological data. Computers, Environment and Urban
Systems, 32, 81-93.
[39] De Smith, M.J., Goodchild, M.F. and Longley, P.A.
(2007) Geospatial Analysis: A comprehensive guide to
principles, techniques and software tools. Matador,
Leicester.
[40] Lin, J.T., Wang, L.Y., Wang, J.T., Wang, T.H. and Chen,
C.J. (1995) Ecological study of association between
Helicobacter pylori infection and gastric cancer in Tai-
wan. Digestive Diseases and Sciences, 40, 385-388.
doi:10.1007/BF02065425
[41] Yang, Y.H., Lee, H.Y., Tung, S. and Shieh, T.Y. (2001)
Epidemiological survey of oral submucous fibrosis and
leukoplakia in aborigines of Taiwan. Journal of Oral
Pathology & Medicine, 30, 213-219.
doi:10.1034/j.1600-0714.2001.300404.x
[42] Chiang, C.T., Hwang, Y.H., Su, C.C., Tsai, K.Y., Lian,
I.B., Yuan, T.H. and Chang, T.K. (2010) Elucidating the
underlying causes of oral cancer through spatial
clustering in high-risk areas of Taiwan with a distinct gender
ratio of incidence. Geospatial Health, 4, 231-242.
[43] Huang, J.Q., Sridhar, S., Chen, Y. and Hunt, R.H. (1998)
Meta-analysis of the relationship between Helicobacter
pylori seropositivity and gastric cancer. Gastroenterology,
114, 1169-1179. doi:10.1016/S0016-5085(98)70422-6
[44] Eslick, G.D., Lim, L.L. and Byles, J. (1999) Association
of Helicobacter pylori infection with gastric carcinoma:
A meta-analysis. The American Journal of Gastroen-
terology, 94, 2373-2379.
doi:10.1111/j.1572-0241.1999.01360.x
[45] Xue, F.B., Xu, Y.Y. and Wan, Y. (2001) Association of
Helicobacter pylori infection with gastric carcinoma: A
meta-analysis. World Journal of Gastroenterology, 7,
801-804.
[46] Wang, C., Yuan, Y. and Hunt, R.H. (2007) The
association between Helicobacter pylori infection and early
gastric cancer: A meta-analysis. The American Journal of
Gastroenterology, 102, 1789-1798.
doi:10.1111/j.1572-0241.2007.01335.x
[47] Teh, B.H., Lin, J.T., Pan, W.H., Lin, S.H., Wang, L.Y.,
Lee, T.K. and Chen, C.J. (1994) Seroprevalence and as-
sociated risk factors of Helicobacter pylori infection in
Taiwan. Anticancer Research, 14, 1389-1392.