Ten Spatial Problems with myGeoffice © for Teaching Purposes

For a long time, Geography lacked a specific mathematical approach to the interpretation of space, which is the key reason why Geography degrees covered a wide variety of subjects such as demography, geology or topography to fill their curricula. Yet since the 1990s, Geography has created its own research agenda to meet four vital questions of any true geographer: “Where is …?”, “Is there a general spatial pattern?”, “What are the anomalies?” and “Why do these phenomena follow a certain spatial distribution?” The present review article addresses ten different spatial (point, regression and event) problems for learning and teaching purposes, where statistics play a major background role in the outcomes of the free myGeoffice © Web GIS platform. The techniques include cluster analysis, geographically weighted regression (GWR), ordinary least squares (OLS) regression, path analysis, minimum spanning tree, linear regression, space-time clustering and point patterns. Although the technical side of the algorithms is not explained in full, this review paper places strong emphasis on the interpretation of the results, their respective meaning and when these techniques should be applied in a learning/teaching context.


Statement of the Problem
If Geography did not exist, it would have to be invented. In fact, according to [1], it has been estimated that 80% of the informational needs of local government policymakers are related to geographic location. As a result, all F1-F6 levels of of burglaries events in Toronto, Canada, and point pattern correlation between Madrid and Seville, Spain, regarding response time of ambulances). The ten issues presented here follow this framework. Based on real spatial data from Portugal, Brazil, USA (Ohio), Japan, the Philippines, China (Hangzhou), Canada (Toronto) and Spain (Madrid and Seville), myGeoffice © techniques were applied as a kind of democratization of mapping and analysis of the world around us. After all, if Geography is key to understanding daily spatial changes, the Internet has become the platform. Yet it is GIS analysis that allows us to evaluate suitability and capability, to estimate and predict, and to interpret and understand those changes [6].
Again, it is expected that readers will take the ten spatial problems below into their classrooms for discussion and build new knowledge through a spatial way of thinking. The reader must understand that Web GIS in general, and myGeoffice © in particular, is more than just mapping software that runs online. It is actually partial statistical software for discovering, consuming and sharing geographic data to fulfill particular objectives. In the end, the goal of this review article is to open the reader's eyes to what is now possible with Web GIS and to show that, since the internal geocomputation of the algorithms is set aside here, the basics of Web GIS are easy, engaging and, above all, now accessible to everyone, not just to experts. Quoting [6], if we think of Geography as the ultimate organizing principle for the planet, then Web GIS is the operating system.

Is There Any Pattern among Hospital Costs Regarding Ovarian Female Cancer Expenses in the Twelve Districts State Hospitals of Portugal?
Cluster analysis classifies a set of observations into two or more mutually exclusive groups that are unknown in advance, with the aim of data reduction, that is, organizing a system of observations into groups whose members share common properties [7]. Globally, this technique tries to find groups with high intra-class similarity and low inter-class similarity. For instance, the practitioner may form clusters of customers who have similar buying habits in specific supermarkets for marketing geo-segmentation purposes. Identifying groups of houses according to their type, value and geographic location, based on their credit history, is another example [8]. Yet, from the GIS perspective, this clustering procedure is considered aspatial since the geographical component is not used by the algorithm [9].
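The grouping idea can be illustrated with a short sketch. The text does not say which clustering algorithm myGeoffice © applies, so the example below assumes plain k-means (Lloyd's algorithm) with a deterministic start; the function name `kmeans` and the cost figures are hypothetical and are not taken from the Portuguese hospital dataset.

```python
from math import dist

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; the first k points serve as the initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

# Hypothetical (drug cost, staff cost) pairs for six hospitals, arbitrary units.
costs = [(1.0, 1.2), (8.0, 8.5), (1.1, 0.9), (7.9, 8.1), (0.8, 1.0), (8.2, 7.8)]
labels, centroids = kmeans(costs, k=2)
```

Note that only the attribute values enter the distance computation, which is exactly why the procedure is called aspatial: the hospital coordinates play no part in it.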
Portugal is a country located mostly on the Iberian Peninsula in southwestern

What Is the Optimal Route Path between Ribeirao Preto and Other Seven Brazilian Cities?
Dijkstra path analysis, based on an algorithm published by the Dutch scientist Edsger Dijkstra in 1959, is a graph search procedure that solves the single-source shortest path problem for a graph with non-negative path costs. If the vertices of the graph represent cities and the link costs represent, for example, driving distances between pairs of cities connected by a direct road, Dijkstra's algorithm can be used to find the shortest route between one city and the remaining ones [9]. As a result, this shortest-path-first procedure is widely used in network routing protocols, most
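A minimal sketch of the single-source shortest path search may help in the classroom. It uses a binary heap as the priority queue; the road network below is hypothetical (the cities "A", "B" and "C" and all distances are invented for illustration and are not the Brazilian dataset).

```python
import heapq

def dijkstra(graph, source):
    """Return the shortest distance from source to every reachable vertex.

    graph maps each vertex to a dict {neighbour: non-negative edge cost}.
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best.get(u, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for v, cost in graph.get(u, {}).items():
            candidate = d + cost
            if candidate < best.get(v, float("inf")):
                best[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return best

# Hypothetical road distances (arbitrary units) between four cities.
roads = {
    "Ribeirao Preto": {"A": 4, "B": 1},
    "A": {"Ribeirao Preto": 4, "B": 2, "C": 5},
    "B": {"Ribeirao Preto": 1, "A": 2, "C": 8},
    "C": {"A": 5, "B": 8},
}
shortest = dijkstra(roads, "Ribeirao Preto")
```

In this toy network the direct road to "A" costs 4, but the detour through "B" costs only 3, which is the kind of result students can verify by hand.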

How Can I Connect the Previous Eight Brazilian Towns (With Computer Private Cables, for Instance) Based on a Minimal Cost Strategy?
Kruskal's algorithm is a graph algorithm that finds the minimum spanning tree of a connected weighted graph. Specifically, it finds a subset of edges (roads between towns) that joins all vertices (cities) into a tree in which the total connection weight (liters of gas, time or distance, for example) is minimized. If the graph is not fully connected, it finds a minimum spanning tree for each connected component [9].
Kruskal's algorithm has been used in many real problems. [10], for instance, report that altering the power network topology is often required to meet important objectives, such as restoring connectivity, minimizing power losses, maintaining stability and maximizing power transfer capability. This may be achieved by switching circuit breaker devices in the power network, including the reconfiguration of the electrical distribution network. For that, these academics used Kruskal's maximal spanning tree algorithm to search for the optimal network topology and to optimally convert an interconnected meshed network into a radial system to achieve the best operational characteristics, cost and control [10].
Considering the same settings as in the previous sub-section, the present objective is to connect the seven branches and the headquarters with a private computer data cable for the internal data use of this fictitious firm (see Figure 3).
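The procedure can be sketched in a few lines: sort the edges by weight and accept each edge that does not close a cycle, using a union-find structure to track which sites are already connected. The site names and cable costs below are hypothetical and much smaller than the eight-town case of Figure 3.

```python
def kruskal(nodes, edges):
    """Return the minimum spanning tree (or forest) as a list of (u, v, weight).

    edges is a list of (weight, u, v) tuples for an undirected graph.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the component root, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:       # the edge joins two separate components
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree

# Hypothetical cable costs between a headquarters and three branches.
sites = ["HQ", "B1", "B2", "B3"]
cables = [(4, "HQ", "B1"), (1, "HQ", "B2"), (2, "B1", "B2"),
          (5, "B1", "B3"), (8, "B2", "B3")]
mst = kruskal(sites, cables)
total_cost = sum(weight for _, _, weight in mst)
```

A tree over four sites always has three edges; here the HQ-B1 cable is rejected because B1 is already reachable through B2.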

What Can GIS Tell Me about the Spatial Distribution of Male Lung Cancers in Ohio, USA?
Geographically Weighted Regression (GWR) is one of several spatial regression techniques that provide a local model of the variable the researcher is trying to understand or predict (dependent variable) by fitting a regression equation (based on one or several independent variables) to every feature in the spatial dataset [11]. The novelty of this GIS approach is that the estimation for each location (see Figure 4) depends only on its closest spatial neighbors, as defined by the kernel bandwidth (respecting Tobler's First Law of Geography, that is, everything is related to everything else but near things are more related than distant things).
[12], for instance, use GWR to examine the spatial effect of urbanization, energy intensity, energy structure and income on HCE (household CO2 emissions). Their results indicate an obvious spatial effect on carbon emissions in the studied provinces whose urbanization impact presented an increasing trend
The state of Ohio is composed of 88 counties whose male lung cancer counts have shown atypical values when compared with the remaining 49 states of the USA (unsurprisingly, a particularly sensitive issue for health insurers and local hospitals). Is it possible to infer any particular spatial trend, pattern or outlier from the present Ohio situation? For illustration purposes, the present spatial dataset holds 88 records and five variables: latitude and longitude of each county, number of male lung cancer cases in 1988 for each county (dependent variable), and white and black population at risk (independent variables).
Within GWR, each Beta parameter of each local estimation ($y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$) has a sign and a magnitude according to each location (the essence of spatial heterogeneity, that is, the structure of the model changes from place to place across the study area as the parameter estimates change relative to one another). If the sign is positive, an increase in the value of the variable to which the parameter refers will induce an increase in the dependent variable. If the sign is negative, a decrease will be induced. As stated by [13], parameter estimates for a variable that are close to zero often tend to be spatially clustered, indicating that in these sub-regions of the study area, changes in this variable do not influence changes in the dependent variable. Likewise, large Betas imply a major influence of the independent variable on the dependent one. Consequently, this is potentially interesting and encourages further curiosity about the process, the data, the model and the outcome. Finally, if a high value of the intercept Beta0 is found, this can be interpreted as the absence of other significant variables from the final model, such as number of real smokers, age, smoking habits (cigarette types, number of years and daily frequency of smoking), health status or air pollution.
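To make the local estimation concrete, the sketch below fits one weighted least squares regression per location. It assumes a Gaussian kernel and a fixed bandwidth, which may differ from the kernel myGeoffice © actually uses; the helper names `solve` and `gwr_fit` and the synthetic data are hypothetical and unrelated to the Ohio dataset.

```python
import random
from math import dist, exp

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def gwr_fit(coords, X, y, bandwidth):
    """Return one [beta0, beta1, ...] vector per location.

    Each local fit solves (X'WX) beta = X'Wy, where W holds Gaussian kernel
    weights that decay with distance from the location being estimated.
    """
    rows = [[1.0] + list(x) for x in X]      # prepend the intercept column
    k = len(rows[0])
    betas = []
    for here in coords:
        w = [exp(-0.5 * (dist(here, c) / bandwidth) ** 2) for c in coords]
        XtWX = [[sum(wi * r[a] * r[b] for wi, r in zip(w, rows))
                 for b in range(k)] for a in range(k)]
        XtWy = [sum(wi * r[a] * yi for wi, r, yi in zip(w, rows, y))
                for a in range(k)]
        betas.append(solve(XtWX, XtWy))
    return betas

# Synthetic data generated from one global model, y = 2 + 3*x1 - 1*x2.
rng = random.Random(0)
coords = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(30)]
X = [(rng.uniform(0, 5), rng.uniform(0, 5)) for _ in range(30)]
y = [2.0 + 3.0 * x1 - 1.0 * x2 for x1, x2 in X]
betas = gwr_fit(coords, X, y, bandwidth=4.0)
```

Because the synthetic data follow a single global relationship without noise, every local fit recovers the same coefficients. With real data the Betas would vary from place to place, and that variation is precisely what the coefficient maps are meant to reveal.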
From Figure 5, and as expected, it can be inferred that each GWR parameter has a sign (positive or negative) and a value according to each location. For instance, county one (Adams), with 13 lung cancer cases (dependent variable) and 12,443 and 6316 men at risk (white and black, respectively), had a GWR estimate of 11.94 male lung cancers. The local spatial regression can, thus, be written as LungCancersAdams = 3.826991 + 0.000636 × WhiteRisk + 0.001631 × BlackRisk (it seems that, in this particular county, BlackRisk contributes to the dependent variable more than twice as much as the WhiteRisk factor).
One of the basic statistical assumptions of regression is that the residuals (positive and negative) should be located randomly in space. For the present spatial dataset, these residuals are displayed in the bottom-left image of Figure 6. Still, some clustering of over- and/or under-predictions becomes evident, which means that the investigator is missing at least one key explanatory variable from the initial model.
By examining the coefficient raster surface produced by GWR (to better understand the regional variation of the independent variables), it is, hence, possible to examine the spatial consistency (stationarity) relationship between the dependent (LungCancers) and both explanatory variables (WhiteRisk and

May I Generate a Linear Prediction Model for Male Lung Cancers in Ohio, USA, with myGeoffice © ?
The conventional Ordinary Least Squares (OLS) multiple regression focuses on the relationship between a dependent variable (Y = LungMaleCancers) and one or more independent ones (X1 = WhiteRisk and X2 = BlackRisk, in this particular case) such as Y = Beta0 + Beta1 × X1 + Beta2 × X2 (according to the results of Figure 7, the prediction model becomes Y = 2.340396 + 0.000694 × X1 + 0.001549 × X2). Under the OLS framework and according to [14], leading, consequently, to the inference of H0 rejection. As expected and confirming the strong linear relationship of the right graphs of Figure
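As a worked illustration of how the fitted model is used, the sketch below evaluates the prediction equation reported above. The coefficients are those quoted in the text; the county values and the observed count are hypothetical and are not taken from the Ohio dataset.

```python
# Coefficients of the OLS model quoted in the text (Figure 7 results).
BETA0, BETA1, BETA2 = 2.340396, 0.000694, 0.001549

def predict(white_risk, black_risk):
    """Predicted male lung cancer cases for one county under the global model."""
    return BETA0 + BETA1 * white_risk + BETA2 * black_risk

# Hypothetical county: 10,000 white and 1,000 black men at risk.
predicted = predict(white_risk=10_000, black_risk=1_000)

# Hypothetical observed count, used only to show how a residual is formed.
observed = 12
residual = observed - predicted
```

Unlike GWR, the same three coefficients apply to every county, so any spatial structure left in the residuals points to a relationship that the global model cannot capture.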

What Is the Chance of a Level 6 or Higher Earthquake to Happen in One of the Nine Most Prone Japanese Cities in the Next 3 Decades?
The binomial distribution is used for discrete random variables and it is appropriate when an event has only two possible outcomes: success (with probability p of the first outcome) or failure (q = 1 − p, that is, the likelihood of the second outcome). Recalling that the probabilities of all outcomes should sum to one, it is, thus, mandatory that p + q = 1 (p represents the probability of a car being stolen in a particular city, for instance, while q equals the complementary likelihood of it not being stolen).
As stated by [15], a random variable is considered binomial if the following
Yokohama, Nagoya, Shizuoka, Obihiro, Kushiro and Nemuro are highly prone to a level 6 quake in the next 30 years. Chiba, for example, holds an 85% probability of having a level 6 or higher quake, Kushiro has a 69% probability while Nagoya presents a 46% likelihood of this kind of natural disaster. Since Obihiro presents the minimum probability (22%) of being devastated by a level 6 or higher quake in the next 3 decades, this likelihood was the one adopted for this problem (see Figure 9).
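The binomial computation for this problem can be sketched directly from the probability mass function P(X = k) = C(n, k) p^k q^(n−k), with n = 9 cities and the conservative p = 0.22 adopted above. The function name `binomial_pmf` is hypothetical; treating the nine cities as independent trials with a common p is the simplifying assumption of the problem itself.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

N_CITIES = 9      # the nine most prone Japanese cities
P_QUAKE = 0.22    # minimum city probability (Obihiro), adopted for all cities

# Exactly three of the nine cities hit by a level 6 or higher quake.
p_exactly_three = binomial_pmf(3, N_CITIES, P_QUAKE)

# At least one city hit: the complement of "no city hit".
p_at_least_one = 1 - binomial_pmf(0, N_CITIES, P_QUAKE)

# The probabilities over all possible outcomes must sum to one.
p_total = sum(binomial_pmf(k, N_CITIES, P_QUAKE) for k in range(N_CITIES + 1))
```

Under these assumptions, `p_exactly_three` comes out at roughly 0.2014, in line with the figure of about 20.14% reported in the text for three of the nine cities.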

What Is the Probability of Having between 5 and 7 Typhoons Every Year in the Philippines?
The Poisson distribution is applicable to count variables and it played a key role in experiments of historic importance to the development of molecular biology [18].
This computation can warn us, for instance, that the likelihood of three of those 9 Japanese cities having a quake of level 6 or higher in the next 30 years is close to 20.14%.
Figure 10. As expected, the sum of the first one hundred probabilities supplied by myGeoffice © is equal or quite close to one.
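For the typhoon question, the Poisson probability mass function P(X = k) = λ^k e^(−λ) / k! gives the chance of observing k events per year. The mean annual rate used by the original study is not available here, so the value of `LAM` below is a hypothetical placeholder, as is the function name `poisson_pmf`.

```python
from math import exp, lgamma, log

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean count is lam.

    Computed in log space so that large k does not overflow.
    """
    return exp(k * log(lam) - lam - lgamma(k + 1))

LAM = 6.0  # hypothetical mean number of typhoons per year (not from the source)

# Probability of having between 5 and 7 typhoons (inclusive) in one year.
p_5_to_7 = sum(poisson_pmf(k, LAM) for k in range(5, 8))

# Sum of the first one hundred probabilities (k = 0 .. 99).
total_first_100 = sum(poisson_pmf(k, LAM) for k in range(100))
```

The last quantity mirrors the check mentioned in the caption of Figure 10: the first one hundred probabilities should add up to a value equal or very close to one.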
Globally, the Knox Index is a comparison of the relationship between incidents in terms of space and time, where each individual pair is compared in terms of the Euclidean distance and in terms of the time interval. Since each pair of points is being compared, there are N × (N − 1)/2 pairs, where N equals the total number of events (12 in this particular case study). Internally, the distance between events is divided into two groups, "Close in distance" and "Not close in distance", whereas the time interval between events is likewise divided into two similar groups, "Close in time" and "Not close in time" [19].
Hangzhou is the capital of Zhejiang Province, China, and the local political, economic and cultural center. This metropolis is located on the lower reaches of the Qiantang River in southeast China, in a prime position in the Yangtze Delta and only 180 km from Shanghai. On March 19, 2018, twelve motorbikes crashed with other motorized vehicles in this beautiful city. Can we statistically hypothesize about space and time clustering of these events on that particular day?
Since the Knox Index is a one-tailed test that follows a Chi-square distribution (a high value is indicative of space-time interaction) and based on the results of Figure 11, it is possible to statistically affirm that there is space-time clustering of these crash events for this date in Hangzhou and, thus, the null hypothesis of a random distribution of these crashes in space and time should be rejected.
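The pair-wise classification behind the Knox Index can be sketched as a 2 × 2 contingency table followed by a Chi-square statistic. The cut-off values that separate "close" from "not close" are user choices; the ones below, like the twelve events and the function names, are hypothetical and do not reproduce the Hangzhou data.

```python
from itertools import combinations
from math import dist

def knox_table(events, max_dist, max_time):
    """Count event pairs in a 2x2 table.

    events are (x, y, t) tuples. Rows: close / not close in distance.
    Columns: close / not close in time.
    """
    table = [[0, 0], [0, 0]]
    for (x1, y1, t1), (x2, y2, t2) in combinations(events, 2):
        row = 0 if dist((x1, y1), (x2, y2)) <= max_dist else 1
        col = 0 if abs(t1 - t2) <= max_time else 1
        table[row][col] += 1
    return table

def chi_square(table):
    """Chi-square statistic comparing observed and expected pair counts."""
    total = sum(map(sum, table))
    row_sums = [sum(r) for r in table]
    col_sums = [sum(c) for c in zip(*table)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat

# Hypothetical crashes: two groups of six, each tight in both space and time.
group_a = [(0, 0, 0), (0, 1, 1), (1, 0, 2), (1, 1, 3), (0.5, 0.5, 4), (0.2, 0.8, 5)]
group_b = [(x + 10, y + 10, t + 100) for x, y, t in group_a]
crashes = group_a + group_b

table = knox_table(crashes, max_dist=2.0, max_time=10.0)
n_pairs = sum(map(sum, table))
knox_chi2 = chi_square(table)
```

With twelve events there are 12 × 11 / 2 = 66 pairs. In this deliberately extreme example every pair that is close in space is also close in time, so the statistic is large and the hypothesis of no space-time interaction would be rejected.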

Where Will Be the Next Flat Burglary?
The spatial-temporal moving average statistic comprises the moving mean center of M observations across time, where M is a subset of the total sample of N events. Basically, this moving spatial concept implies that all N observations are sequenced in order of occurrence and, hence, it is implicit that there is a time dimension associated with the sequence itself [19]. The M observations are called the span and, under myGeoffice © , the default span is 5 observations. The span is centered on each observation so that there is an equal number of observations on both sides. Because there are no data points prior to the first event and after the last event, the first and last few mean centers will be based on fewer observations than the rest of the sequence.
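The moving mean center described above can be sketched as a centered window that is simply truncated at both ends of the sequence. The function name and the seven burglary locations are hypothetical; only the default span of 5 comes from the text.

```python
def moving_mean_centers(points, span=5):
    """Mean center of a window of `span` events centered on each event.

    points must already be sorted in order of occurrence. Windows near the
    start and end of the sequence contain fewer than `span` events.
    """
    half = span // 2
    centers = []
    for i in range(len(points)):
        window = points[max(0, i - half): i + half + 1]
        xs, ys = zip(*window)
        centers.append((sum(xs) / len(window), sum(ys) / len(window)))
    return centers

# Hypothetical burglaries drifting north-east over time, in order of occurrence.
burglaries = [(float(i), float(i)) for i in range(7)]
centers = moving_mean_centers(burglaries)
```

Plotting the sequence of centers shows the direction in which the incidents are drifting, which is the basis for a cautious guess about the area of the next event.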
Quite often, incidents do not occur uniformly throughout the year but are instead clustered together during short time periods [19]. On certain occasions, a rash of incidents will occur in certain neighborhoods and the police often have to respond quickly to those events. In other words, there is clustering in time as well as clustering in space. For instance, [20] reports that the New York City Department of Health and Mental Hygiene has operated an emergency department syndromic surveillance system since 2001, using temporal and spatial scan statistics run on a daily basis for cluster detection. Though simple, this technique is very useful for detecting changes in behavior by serial offenders [19].  on this spatial matter? (see Figure 12).

Is There Any Correlation in the Average Ambulance Response Time between Madrid and Seville, Spain?
Point pattern (geographical phenomena that may be reasonably modelled as
Historically, ambulance care quality has been measured by response times to ensure that the highest quality and most appropriate response is provided for each patient on time [21]. Quite often, ambulance services are measured on the time elapsed from receiving an emergency call to the arrival of the patient at the Emergency Department. For instance, in July 2017 and under the network of hospitals in England, Category 1 ambulance calls were defined as those classified as life-threatening and needing immediate intervention and/or resuscitation. The national standard sets out that all ambulance trusts must respond to Category 2 calls (the next emergency level) in 18 minutes on average and respond to 90% of Category 2 calls in 40 minutes [22].
Unquestionably, setting up this kind of national standard is welcome for the continuous improvement of first aid for all citizens. Yet the geographical diversity among villages, towns and metropolises will also lead to different response times that must be analyzed more carefully than through the simple average of the country. Moreover, certain cities are better planned than others (quite often for historical reasons) in terms of roads, bridges, tunnels and highways, leading to an unfair absolute comparison. Nevertheless, it is possible to assess the relationship of the ambulance response time among cities. If the performance correlation between two municipalities is negative, this simply means that the ambulance effort in one municipality is deteriorating relative to the other. In that case, some kind of new plan must be put in place to reverse this outcome in the future. Otherwise, if a positive association between both cities is found, the relative efforts of the ambulance professionals in both cities are linked to each other and there is no particular need to change their present procedures (besides continuously decreasing those emergency response times, if possible).
Covering an area of 604 km², Madrid is the capital of Spain and the largest municipality of this country with almost 6.5 million inhabitants. It is the third-largest city in the European Union, smaller than only London and Berlin. Seville, with an area of 140 km², is the capital and largest city of the autonomous community of Andalusia and it holds about 1.5 million people, making it the fourth-largest city in Spain. Is there any relation between the ambulance response time performance of Madrid and Seville? (see Figure 13).
Figure 13. Based on the dataset below concerning ambulance response time in 2014, it can be deduced that there is a positive correlation between both Spanish cities, even though the average ambulance response time is higher in Madrid than in Seville.
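The sign of the association discussed above can be illustrated with the Pearson correlation coefficient. This is an assumption: the text does not state which correlation measure myGeoffice © computes for point patterns. The response times below are hypothetical and are not the 2014 dataset.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical mean response times (minutes) for five matching periods.
madrid = [12.0, 14.0, 13.0, 16.0, 15.0]
seville = [9.0, 10.0, 10.0, 12.0, 11.0]

r = pearson_r(madrid, seville)
mean_gap = sum(madrid) / len(madrid) - sum(seville) / len(seville)
```

In this toy case the coefficient is strongly positive although Madrid is slower on average, which shows that the correlation describes how the two series move together and not which city responds faster.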

Conclusions
Several classical statements concerning the definition of GIS can be found in the specialized literature, expressing the idea that spatial analysis can somehow be useful. GIS can be seen as a spatial analysis engine whose main purpose lies in the ability to predict outcomes and, above all, to understand them [23]. GIS is simultaneously the telescope, the microscope, the computer and the Xerox machine of regional analysis and the synthesis of spatial data [24]. GIS is a specific class of information systems designed to capture, store, manipulate, retrieve, analyse and display all forms of geographically referenced data and information [25]. Certainly, GIS is not a mapping database.
Using myGeoffice © , ten different spatial matters are addressed in this review paper under a quantitative approach. This happens because viewing and analyzing data geographically impacts our understanding of the world we live in. Nonetheless, another classic quantitative problem concerns shoreline limits, for example, which are particularly useful for the study of global warming effects on coastal cities. Which shoreline should be adopted? Should the geographical database be dynamic and capable of tracking fluctuations? One possibility is to consider the shore slope classification and the time the sea level stays above the height of the lowest tide. Since tides follow a deterministic formula, it is possible to calculate the exact shoreline location for a given time t [26].
Somehow, it is implicit that spatial analysis is a GIS component to support decision-making (one of the five GIS Ms: mapping, measurement, monitoring, modelling and management) for solving problems with a spatial component. This happens because geographic problems require true spatial thinking [27].