
For a long time, Geography lacked a specific mathematical approach to the interpretation of space, which was the key reason why Geography degrees covered a wide variety of subjects such as demography, geology or topography to fill their curricula. Since the 1990s, however, Geography has created its own research agenda to answer four vital questions for any geographer: “Where is …?”, “Is there a general spatial pattern?”, “What are the anomalies?” and “Why do these phenomena follow a certain spatial distribution?” The present review article addresses ten different spatial (point, regression and event) problems for learning and teaching purposes, in which statistics play a major background role in the outcomes of the myGeoffice^{©} free Web GIS platform. These include cluster analysis, geographically weighted regression (GWR), ordinary least squares (OLS) regression, path analysis, minimum spanning tree, linear regression, space-time clustering and point patterns. Although the technical details of the algorithms are not fully explained, this review paper places strong emphasis on the interpretation of the results, their respective meaning and when these techniques should be applied in a learning/teaching context.

If Geography did not exist, it would have to be invented. In fact, according to [

Without any doubt, all these topics are essential for the description of space as we know it. Yet, in the current Internet age, Geographical Information Systems (GIS) are seldom included in Geography syllabi to foster inferential analysis of spatial data. Undoubtedly, the state of contemporary education in this field leaves much to be desired [

With the development of new digital technologies, the field of Geography is undergoing a major digital transformation [

The aim of this review paper is to raise awareness about the importance and use of Web GIS. In other words, the purpose is to stimulate interest in young students, teachers and other fans of Geography for myGeoffice^{©}, a free Web 2.0 tool. Besides this introduction, conclusion and references, the core of this writing is subdivided into three main sections:

➢ Point analysis (cluster analysis of the ovarian cancer in Portugal, Dijkstra optimal route and Kruskal minimum spanning tree problem among eight cities of Brazil).

➢ Spatial regression (GWR and OLS multiple regression of the lung cancer dataset in Ohio, USA).

➢ Event analysis (binomial probabilities of powerful earthquakes in Japan, Poisson likelihood of strong typhoons in the Philippines, Knox index with the motorbike crash dataset in Hangzhou, China, moving spatial statistic of burglary events in Toronto, Canada, and point pattern correlation between Madrid and Seville, Spain, regarding the response time of ambulances).

The ten problems presented here follow this framework. Based on real spatial data from Portugal, Brazil, USA (Ohio), Japan, Philippines, China (Hangzhou), Canada (Toronto) and Spain (Madrid and Seville), myGeoffice^{©} techniques were applied as a kind of democratization of mapping and analysis of the world around us. After all, if Geography is key to understanding daily spatial changes, the Internet has become the platform. Yet, it is GIS analysis that allows us to evaluate suitability and capability, to estimate and predict, and to interpret and understand those changes [

Again, it is hoped that readers will take the ten spatial problems below into their classrooms for discussion and construct new knowledge through a spatial way of thinking. The reader must understand that Web GIS in general, and myGeoffice^{©} in particular, is more than just mapping software running online. It is also, in part, statistical software for discovering, consuming and sharing geographic data to fulfill particular objectives. In the end, the goal of this review article is to open the reader’s eyes to what is now possible with Web GIS and, since the internal geocomputation of the algorithms is deliberately left out here, to show that the basics of Web GIS are easy, engaging and, above all, now accessible to everyone, not just to experts. Quoting [

Cluster analysis classifies a set of observations into two or more mutually exclusive unknown groups with the aim of data reduction, that is, to organize a system of data observations into groups where members of those groups share common properties [

Globally, this technique tries to find groups with a high intra-class similarity and a low inter-class one. For instance, the practitioner may be able to form clusters of customers who have similar buying habits in specific supermarkets for marketing geo-segmentation purposes. Identifying groups of houses according to their type, value and geographic location based on their credit history is another example [

Portugal is a country located mostly on the Iberian Peninsula in southwestern Europe. It is the westernmost country of mainland Europe, bordered to the west and south by the Atlantic Ocean and to the north and east by Spain. Besides its continental territory, with ten main districts (Minho, Montes, Douro, Beira Alta, Baixa & Litoral, Extremadura, Ribatejo, Alto and Baixo Alentejo, Algarve), the country also includes the Atlantic archipelagos of Azores and Madeira.

The similarity threshold between groups is another parameter that may lead to a different cluster classification of these twelve district state hospitals. For example, it is possible to consider that cluster one is composed of hospitals 11 (Madeira), 12 (Azores) and 5 (Beira Baixa), while cluster two comprises the remaining nine hospitals. Yet, it is also possible to consider three clusters instead: 1 (Minho), 6 (Beira Litoral), 8 (Ribatejo), 3 (Douro) and 9 (Alto Alentejo) versus 7 (Extremadura), 10 (Baixo Alentejo), 2 (Montes) and 4 (Beira Alta) versus 11 (Madeira), 12 (Azores) and 5 (Beira Baixa). Again, it is up to the investigator to explain these arrangements; cluster analysis does not reveal why the costs of these hospitals behave in this particular way. Basically, this technique only supplies clues for the scientist to investigate possible explanations or causes of the cost performance of those twelve hospitals. Logically, if those clusters discriminate certain patterns, then cluster analysis can be useful. However, that may not be the case in all situations.
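The effect of the similarity threshold on the number of groups can be illustrated with a minimal Python sketch. It assumes single-linkage agglomerative clustering on a single, hypothetical cost variable; the linkage rule, the cost values and the name `threshold_clusters` are illustrative assumptions and are not taken from myGeoffice^{©} or from the Portuguese dataset.

```python
def threshold_clusters(values, threshold):
    """Single-linkage grouping: keep merging the two closest clusters
    while their distance does not exceed `threshold`."""
    clusters = [[label] for label in values]  # start with one cluster per item

    def gap(a, b):
        # single linkage: smallest distance between any two members
        return min(abs(values[i] - values[j]) for i in a for j in b)

    while len(clusters) > 1:
        pairs = [(gap(a, b), ia, ib)
                 for ia, a in enumerate(clusters)
                 for ib, b in enumerate(clusters) if ia < ib]
        dist, ia, ib = min(pairs)
        if dist > threshold:
            break  # the closest clusters are too dissimilar to merge
        clusters[ia] = clusters[ia] + clusters[ib]
        del clusters[ib]
    return [sorted(c) for c in clusters]


# Hypothetical cost index per hospital id (NOT the data used in the paper)
costs = {1: 10.0, 2: 11.0, 3: 12.5, 4: 20.0, 5: 21.0, 6: 40.0}
loose = threshold_clusters(costs, threshold=10.0)   # permissive threshold
strict = threshold_clusters(costs, threshold=2.0)   # demanding threshold
print(len(loose), "clusters:", loose)
print(len(strict), "clusters:", strict)
```

With the same data, the permissive threshold yields fewer and larger groups than the demanding one, which is the behaviour described above for the twelve hospitals.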

Dijkstra path analysis, an algorithm published by the Dutch computer scientist Edsger Dijkstra in 1959, is a graph search procedure that solves the single-source shortest path problem for a graph with non-negative path costs. If the vertices of the graph represent cities and the link costs represent, for example, driving distances between pairs of cities connected by a direct road, Dijkstra’s algorithm can be used to find the shortest route between one city and the remaining ones [
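As a minimal sketch of the procedure, the Python code below computes the shortest distance from one source vertex to all others using a priority queue. The four-vertex road network and its costs are hypothetical and do not correspond to the Brazilian case study.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distance from `source` to every reachable vertex.
    `graph` maps vertex -> {neighbour: non-negative cost}."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(queue, (nd, v))
    return dist


# Hypothetical road network (costs in km); not the paper's data
roads = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 4, "C": 2, "D": 5},
    "C": {"A": 1, "B": 2, "D": 8},
    "D": {"B": 5, "C": 8},
}
shortest = dijkstra(roads, "A")
print(shortest)
```

In this toy network, the direct road A-B (4 km) is longer than the detour through C (1 + 2 km), so the algorithm reports the detour as the shortest route.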

Let us consider seven Brazilian cities that are supplied with coffee every week by a logistics company from Ribeirao Preto (Sao Paulo state). The seven branches are spread around the country as depicted in

Kruskal is a graph algorithm that finds the minimum spanning tree of a connected weighted graph. Specifically, it finds a subset of edges (roads between towns) that connects all vertices (cities) without cycles and for which the total connection weight (liters of gas, time or distance, for example) is minimized. If the graph is not fully connected, it finds a minimum spanning tree for each connected component [
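A minimal Python sketch of Kruskal's procedure follows: edges are visited in increasing order of weight and an edge is kept only if it joins two components that are not yet connected. The vertices, edges and weights are hypothetical.

```python
def kruskal(vertices, edges):
    """Return the edges of a minimum spanning forest.
    `edges` is a list of (weight, u, v) tuples."""
    parent = {v: v for v in vertices}

    def find(v):
        # follow parent links up to the component representative
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    chosen = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two different components
            parent[root_u] = root_v
            chosen.append((weight, u, v))
    return chosen


# Hypothetical cities and road costs; not the paper's data
cities = ["A", "B", "C", "D"]
links = [(4, "A", "B"), (1, "A", "C"), (2, "B", "C"), (5, "B", "D"), (8, "C", "D")]
tree = kruskal(cities, links)
total = sum(weight for weight, _, _ in tree)
print(tree, total)
```

A spanning tree over four vertices always has three edges; here the edge A-B is skipped because A and B are already linked through C at a lower cost.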

Kruskal’s algorithm has been used in many real-world problems. [

Geographically Weighted Regression (GWR) is one of several spatial regression techniques that provide a local model of the variable the researcher is trying to understand or predict (dependent variable) by fitting a regression equation (based on one or several independent variables) to every feature in the spatial dataset [
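The idea of fitting one regression per location can be sketched in a few lines of Python. The sketch below is a simplification under stated assumptions: a single explanatory variable, a Gaussian distance kernel and a fixed bandwidth. The function name `gwr_local_fit`, the kernel choice and the data are hypothetical and do not describe the internals of myGeoffice^{©}.

```python
import math


def gwr_local_fit(points, target, bandwidth):
    """Weighted least-squares line y = b0 + b1*x fitted at `target`.
    `points` is a list of (lon, lat, x, y); nearer points weigh more."""
    tx, ty = target
    # Gaussian kernel: weight decays with distance from the target location
    weights = [math.exp(-0.5 * (math.hypot(px - tx, py - ty) / bandwidth) ** 2)
               for px, py, _, _ in points]
    sw = sum(weights)
    mean_x = sum(w * p[2] for w, p in zip(weights, points)) / sw
    mean_y = sum(w * p[3] for w, p in zip(weights, points)) / sw
    sxy = sum(w * (p[2] - mean_x) * (p[3] - mean_y) for w, p in zip(weights, points))
    sxx = sum(w * (p[2] - mean_x) ** 2 for w, p in zip(weights, points))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data: the slope of y on x grows from west (lon 0) to east (lon 5)
sample = [(float(lon), 0.0, x, (1 + lon) * x)
          for lon in range(6) for x in (1.0, 2.0, 3.0)]
west_b0, west_b1 = gwr_local_fit(sample, target=(0.0, 0.0), bandwidth=1.0)
east_b0, east_b1 = gwr_local_fit(sample, target=(5.0, 0.0), bandwidth=1.0)
print("west slope:", round(west_b1, 3), "east slope:", round(east_b1, 3))
```

Because the weights favour nearby observations, the slope estimated in the west differs from the slope estimated in the east, whereas a single global regression would report one compromise value for the whole area.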

[_{2} emissions). Their results indicate an obvious spatial effect on carbon emissions in the studied provinces, where the impact of urbanization showed an increasing trend from the south-eastern coast to the north-west between 2000 and 2015. Energy intensity also had a remarkably positive effect on HCE over the same period, although it had a negative effect in all provinces in 2005 and in some provinces in 2010. Likewise, income was a powerful explanatory factor for growth in household CO_{2} emissions in all years: its effect on HCE was positive and increased year by year.

Ohio is composed of 88 counties whose male lung cancer counts have shown atypical values when compared with the remaining 49 states of the USA (unsurprisingly, a particularly sensitive issue for health insurers and local hospitals). Is it possible to infer any particular spatial trend, pattern or outlier from the present Ohio situation? For illustration purposes, the present spatial dataset holds 88 records and five variables: latitude and longitude of each county, number of male lung cancer cases in 1988 for each county (dependent variable), and white and black population at risk (independent variables).

Within GWR, each β parameter of each local estimation (y_{i} = β_{0} + β_{1}x_{1i} + β_{2}x_{2i} + ε_{i}) has a sign and a magnitude that depend on the location (the essence of spatial heterogeneity, that is, the structure of the model changes from place to place across the study area as the parameter estimates vary). If the sign is positive, an increase in the variable to which the parameter refers will induce an increase in the dependent variable. If the sign is negative, a decrease will be induced. As stated by [

From

One of the basic statistical assumptions of regression is that the residuals (positive and negative) should be randomly distributed in space. For the present spatial dataset, these residuals are displayed on the left bottom image of

By examining the coefficient raster surface produced by GWR (to better understand the regional variation of the independent variables), it is, hence, possible to examine the spatial consistency (stationarity) relationship between the dependent (LungCancers) and both explanatory variables (WhiteRisk and BlackRisk, respectively) across Ohio state (see

The conventional Ordinary Least Squares (OLS) multiple regression focuses on the relationship between a dependent variable (Y = LungMaleCancers) and one or more independent ones (X1 = WhiteRisk and X2 = BlackRisk, in this particular case), such that Y = β_{0} + β_{1} × X1 + β_{2} × X2 (according to

Once a regression model has been constructed, it is vital to confirm the goodness of fit via the R-squared index (R^{2} = 98.7982%), while the global significance of the model can be checked with the F-test (3493.791557, in this particular case), whose statistic follows an F distribution. Since the degrees of freedom (df) equal (2, 85), the threshold acceptance value of the null hypothesis (H0) is approximately 5.18, leading, consequently, to the rejection of H0. As expected and confirming the strong linear relationship of the right graphs of
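The link between the R-squared index and the global F statistic can be checked with a short Python sketch based on the standard relation F = (R^{2}/k) / ((1 − R^{2})/(n − k − 1)), where n is the number of observations and k the number of explanatory variables. The function name `f_statistic` is illustrative; the critical value itself still has to be read from an F table or computed with a statistical package.

```python
def f_statistic(r_squared, n_obs, n_predictors):
    """Overall-significance F statistic of an OLS model and its degrees of freedom."""
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    f_value = (r_squared / df_model) / ((1.0 - r_squared) / df_resid)
    return f_value, (df_model, df_resid)


# Values reported in the text: R-squared of 98.7982%, 88 counties, 2 predictors
f_value, df = f_statistic(r_squared=0.987982, n_obs=88, n_predictors=2)
print(round(f_value, 2), df)
```

With 88 counties and two explanatory variables, the degrees of freedom are 2 and 85, and the computed statistic is close to the reported F value; small differences are expected because R^{2} is rounded here.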

The binomial distribution is used for discrete random variables and is appropriate when an event has only two possible outcomes: success (with probability p) or failure (with probability q = 1 − p). Since the probabilities of all outcomes must sum to one, p + q = 1 (p may represent the probability of a car being stolen in a particular city, for instance, while q is the complementary probability of it not being stolen).
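As a minimal sketch, the probability of observing exactly k successes in n independent trials, C(n, k) × p^{k} × q^{n − k}, can be computed in Python as follows. The value p = 0.1 and the five trials are hypothetical and are used only to show that the probabilities over all possible outcomes sum to one.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    q = 1.0 - p  # probability of failure
    return comb(n, k) * p ** k * q ** (n - k)


# Hypothetical example: 5 trials, each with a 10% chance of success
probs = [binomial_pmf(k, n=5, p=0.1) for k in range(6)]
for k, prob in enumerate(probs):
    print(k, round(prob, 5))
print("sum:", round(sum(probs), 10))
```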

As stated by [

The Richter magnitude scale is the most common standard of measurement for an earthquake. It was invented in 1935 by Charles Richter of the California Institute of Technology as a mathematical device to compare the size of earthquakes. Essentially, the Richter scale is logarithmic: each whole-number step corresponds to a tenfold increase in measured wave amplitude and to roughly 32 times more energy released [

Japan is the country with the most earthquakes worldwide, where more than 2000 quakes are felt every year. According to [

The Poisson distribution is applicable for count variables and it played a key function in experiments that had a historic role in the development of molecular biology [

This discrete probability expresses the chances of a given number of events occurring in a fixed interval of time and/or space (these events occur with a known average rate and are independent of the time since the last event). It is different from the Binomial distribution since it relies on an average estimate which may be derived from the long term average of occurrences such as the number of landslides along a mountain slope over a period of 50 years [
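A minimal Python sketch of the Poisson probabilities follows, using P(k) = λ^{k} × e^{−λ} / k!. The long-term mean of eight events per interval is used here purely as an illustrative assumption, and the function names are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k events when the long-term mean is `lam`."""
    return lam ** k * exp(-lam) / factorial(k)


def poisson_cdf(k, lam):
    """Probability of at most k events."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


lam = 8.0  # assumed long-term average number of events per interval
p_exactly_8 = poisson_pmf(8, lam)
p_more_than_10 = 1.0 - poisson_cdf(10, lam)
print(round(p_exactly_8, 4), round(p_more_than_10, 4))
```

Only the average rate is needed as input, which is what distinguishes this calculation from the binomial one, where both the number of trials and the success probability must be known.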

These appealing features of this distribution may be used for spatial problems. For instance, on average, the Philippines is hit by eight tropical cyclones (typhoons) every year. If so and according to the Poisson calculations (see

Globally, the Knox Index is a comparison of the relationship between incidents in terms of space and time, where each individual pair is compared in terms of the Euclidean distance and in terms of the time interval. Since each pair of points is being compared, there are N × (N − 1)/2 pairs, where N equals the total number of events (12 in this particular case study). Internally, the distance between events is divided into two groups—“Close in distance” and “Not close in distance” whereas the time interval between events is likewise divided into two similar groups—“Close in time” and “Not close in time” [
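The pairwise cross-tabulation described above can be sketched in Python as follows. The distance and time thresholds that separate "close" from "not close" (here `max_dist` and `max_time`), as well as the twelve event coordinates, are hypothetical assumptions; the source does not state how myGeoffice^{©} chooses these cut-offs.

```python
from itertools import combinations
from math import hypot


def knox_table(events, max_dist, max_time):
    """Cross-tabulate all event pairs by closeness in space and in time.
    `events` is a list of (x, y, t); keys are (space, time) labels."""
    table = {("close", "close"): 0, ("close", "far"): 0,
             ("far", "close"): 0, ("far", "far"): 0}
    for (x1, y1, t1), (x2, y2, t2) in combinations(events, 2):
        space = "close" if hypot(x2 - x1, y2 - y1) <= max_dist else "far"
        time = "close" if abs(t2 - t1) <= max_time else "far"
        table[(space, time)] += 1
    return table


# Twelve hypothetical events (x in km, y in km, t in hours); not the Hangzhou data
events = [(float(i % 4), float(i // 4), float(i)) for i in range(12)]
table = knox_table(events, max_dist=1.5, max_time=2.0)
n_pairs = sum(table.values())
print(table, n_pairs)
```

For 12 events, the four cells always add up to 12 × 11 / 2 = 66 pairs. A large count in the ("close", "close") cell, relative to what independence between space and time would predict, is what suggests space-time clustering.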

Hangzhou is the capital of Zhejiang Province, China, and the local political, economic and cultural center. This metropolis is located on the lower reaches of the Qiantang River in southeast China, in a privileged position in the Yangtze Delta and only 180 km from Shanghai. On March 19, 2018, twelve motorbikes crashed with other motorized vehicles in this beautiful city. Can we statistically hypothesize about space and time clustering of these events on that particular day?

Since the Knox Index is a one-tailed test that follows a Chi-square distribution (a high value is indicative of spatial interaction) and based on the results of

The spatial-temporal moving average statistic comprises the moving mean center of M observations across time, where M is a subset of the total sample of N events. Basically, this moving spatial concept implies that all N observations are sequenced in order of occurrence and, hence, there is an implicit time dimension associated with the sequence itself [. In myGeoffice^{©}, the default span is 5 observations. The span is centered on each observation so that there is an equal number of observations on both sides. Because there are no data points prior to the first event or after the last event, the first and last few mean centers are based on fewer observations than the rest of the sequence.
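A minimal Python sketch of a centered moving mean center with a span of 5 is given below. The way the window is truncated at both ends of the sequence is an assumption about edge handling, and the event coordinates are hypothetical.

```python
def moving_mean_centers(events, span=5):
    """Centered moving mean center of time-ordered (x, y) events.
    Windows are truncated at both ends of the sequence."""
    half = span // 2
    centers = []
    for i in range(len(events)):
        window = events[max(0, i - half): i + half + 1]
        cx = sum(x for x, _ in window) / len(window)
        cy = sum(y for _, y in window) / len(window)
        centers.append((cx, cy))
    return centers


# Hypothetical events, already sorted by time of occurrence
events = [(float(i), 0.0) for i in range(7)]
centers = moving_mean_centers(events, span=5)
print(centers)
```

Plotting the sequence of mean centers shows how the center of activity drifts over time, which is the basis for guessing where the next event may occur.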

Quite often, incidents do not occur uniformly throughout the year but are instead clustered together during short time periods [

Toronto is the capital city of the province of Ontario and the largest city in Canada by population, with 2,731,571 residents in 2016. As a global city, Toronto is a center of business, finance, arts and culture and is recognized as one of the most multicultural and cosmopolitan cities in the world. However, as in any other city, burglaries are common. Statistically, is it possible to predict a likely location for the next flat burglary? Can GIS help local police authorities in this spatial matter? (see

Point pattern analysis (for geographical phenomena that may be reasonably modelled as point data, such as clustering versus dispersion issues) also includes the strength of the relationship of a certain event between two study areas over a time period. In this particular case, the exact spatial locations of each event are not important. Only the number of events in each region at each occurrence time matters. Using an illustration of [

· M1(region1) = AVG(countRegion1)/Area1

· M1(region2) = AVG(countRegion2)/Area2

· M1(Product) = M1(region1) × M1(region2)

· M2 = AVG(countRegion1 × countRegion2)/(Area1 × Area2)
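The four quantities listed above can be computed with a short Python sketch. The counts per time period and the areas are hypothetical. The final comparison between M2 and M1(Product) is an assumption about how the two moments are combined: their difference equals the covariance of the two count series divided by Area1 × Area2, so its sign indicates whether the two regions tend to vary together or in opposite directions.

```python
def cross_moments(counts1, counts2, area1, area2):
    """First- and second-order moments of event counts in two regions.
    `counts1` and `counts2` hold the counts for the same time periods."""
    n = len(counts1)
    m1_region1 = (sum(counts1) / n) / area1
    m1_region2 = (sum(counts2) / n) / area2
    m1_product = m1_region1 * m1_region2
    m2 = (sum(a * b for a, b in zip(counts1, counts2)) / n) / (area1 * area2)
    return m1_region1, m1_region2, m1_product, m2


# Hypothetical counts per time period and areas; not the Madrid/Seville data
counts_a = [2, 4, 6]
counts_b = [1, 3, 5]
m1_a, m1_b, m1_product, m2 = cross_moments(counts_a, counts_b, area1=100.0, area2=50.0)
difference = m2 - m1_product  # assumed comparison: scaled covariance of the counts
print(m1_a, m1_b, m1_product, m2, difference)
```

In this toy example both series rise together, so M2 exceeds M1(Product) and the difference is positive.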

Historically, ambulance care quality has been measured by response times to ensure that the highest quality and most appropriate response are provided for each patient on time [

Unquestionably, setting up this kind of national standard is welcome for the continuous improvement of first aid for all citizens. Yet, the geographical diversity among villages, towns and metropolises also leads to different response times that must be analyzed more carefully than through the simple national average. Moreover, certain cities are better planned (quite often, for historical reasons) in terms of roads, bridges, tunnels and highways than others, leading to an unfair absolute comparison. Nevertheless, it is possible to assess the relationship of ambulance response times among cities. If the performance correlation between two municipalities is negative, this simply means that the ambulance performance in one of the municipalities is deteriorating relative to the other. In that case, some kind of new training syllabus must be adopted to reverse this outcome in the future. Otherwise, if a positive association between both cities is found, the relative efforts of the ambulance professionals in both cities are linked to each other and there is no particular need to change their present procedures (besides continuously decreasing those emergency response times, if possible).

Covering an area of 604 km^{2}, Madrid is the capital of Spain and the largest municipality of this country with almost 6.5 million inhabitants. It is the third-largest city in the European Union, smaller than only London and Berlin. Seville, with an area of 140 km^{2}, is the capital and largest city of the autonomous community of Andalusia and it holds about 1.5 million people, making it the fourth-largest city in Spain. Is there any relation between the ambulance response times of Madrid and Seville? (see

Several classical statements concerning the definition of GIS can be found in specialized literature expressing the idea that spatial analysis can somehow be useful. GIS can be seen as a spatial analysis engine whose main purpose lies in the ability to predict outcomes and, above all, to understand them [

By using myGeoffice^{©}, ten different spatial matters are addressed in this review paper and under the quantitative approach. This happens because viewing and analyzing data geographically impacts our understanding of the world we live in. Nonetheless, another classic quantitative approach concerns shoreline limits, for example, which are particularly useful for the study of global warming effects on coastal cities. Which shoreline should be adopted? Should the geographical database be dynamic and capable of tracking fluctuations? One possibility is to consider the shore slope classification and the time sea level above the height of the lowest tide. Since tides follow a deterministic formula, it is possible to calculate the exact shoreline location for a given time t [

Somehow, it is implicit that spatial analysis is a GIS component to support decision-making (one of the five GIS Ms: mapping, measurement, monitoring, modelling and management) for solving problems with a spatial component. This happens because geographic problems require true spatial thinking [

The authors declare no conflicts of interest regarding the publication of this paper.

Negreiros, J. and Diakite, A. (2019) Ten Spatial Problems with myGeoffice^{©} for Teaching Purposes. Open Journal of Social Sciences, 7, 297-317. https://doi.org/10.4236/jss.2019.77026