Spatial Multidimensional Association Rules Mining in Forest Fire Data

Hotspots (active fires) indicate the spatial distribution of fires. Determining the factors that influence hotspot occurrence is essential so that fire events can be predicted from the characteristics of a given area. This study discovers possible influence factors on the occurrence of fire events using the Apriori association rule algorithm in the study area of Rokan Hilir, Riau Province, Indonesia. The Apriori algorithm was applied to a forest fire dataset containing data on the physical environment (land cover, river, road and city center), socio-economic conditions (income source, population, and number of schools), weather (precipitation, wind speed, and screen temperature), and peatlands. The experiment revealed 324 multidimensional association rules indicating relationships between hotspot occurrence and other factors. The associations between hotspot occurrence and other geographical objects were discovered for a minimum support of 10% and a minimum confidence of 80%. The results show that strong relations between hotspot occurrence and influence factors are found for a support of about 12.42%, a confidence of 1, and a lift of 2.26. These factors are precipitation greater than or equal to 3 mm/day, wind speed in [1 m/s, 2 m/s), non-peatland area, screen temperature in [297 K, 298 K), number of schools per km² less than or equal to 0.1, and distance from each hotspot to the nearest road less than or equal to 2.5 km.


Introduction
Forest fires are considered to be a potential hazard that causes enormous physical, biological and environmental losses. Hotspots are image pixels that may represent fires. A study on the spatial relationships between the locations of hotspot occurrence and specific geographical objects near the hotspots is essential. With such a study, the possible influence factors for fires can be determined in order to predict future hotspot occurrence.
Spatial data mining is a growing research area in analyzing large spatial data. It is a process to extract knowledge, spatial relationships, or other interesting patterns not explicitly stored in spatial databases [1]. In a spatial data mining system, attributes of the neighbors of an object may have a significant influence on the object itself. Therefore, the discovery process for spatial data is more complex than that for non-spatial data, because spatial data mining algorithms have to consider the neighbors of objects in order to extract useful knowledge [2].
In this study, a data mining technique, namely association rule mining, is applied to a case study. The purpose of the case study is to discover relations between hotspot occurrence and the characteristics of objects neighboring the hotspots. Pre-processing steps for spatial data were performed to prepare a dataset as the input of a well-known association rule algorithm, i.e. Apriori. The results are spatial association rules describing frequent co-occurrences between variables in the spatial database.
Some works related to mining spatial association rules are discussed in [3][4][5][6]. Moreover, Berardi, et al. [7] discovered spatial association rules from a particular kind of images, namely document images. This work studied six papers published in the IEEE Transactions on Pattern Analysis and Machine Intelligence, in the January and February 1996 issues. The rule discovery is based on the processes of layout structure extraction (layout analysis) and logical structure extraction (document image understanding). This work uses SPADA (Spatial Pattern Discovery Algorithm) [8] to generate association rules, for example [7]:
is_a(A, running_head) ⇒ on_top(A, B), is_a(B, content), type_text(A) (support: 90.9%; confidence: 90.9%)
This rule means that if a logical component (A) is a running head, then it is textual and it is on top of another layout component (B) which is a component of type content. This rule has a high support and a high confidence, both 90.9%.
A case study by [9] determines the existing spatial relationships between the location of incidents and specific geographical objects near the center of Helsinki. This work transformed the spatial data to the transaction format so that the classic association rule algorithms can be applied to the data. Each object in the transactional file is identified only by its unique ID, geographical coordinates and the specification of object type (point, line, or polygon). An algorithm based on Apriori was utilized to extract association rules from the transaction file. One of the rules is as follows: bars and restaurants ⇒ incidents (1.7%; 40.0%).
This rule states that an incident has occurred in a neighbourhood of 40% of all bars and restaurants within the Helsinki city center during the studied time period.
Spatial association rules of land use were extracted in the study by [10] from the land use data of Yi city in Hubei Province in China. This work used the fuzzy concept lattice method to obtain land use spatial association rules, which can offer decision support in land suitability evaluation, classification and grading, and land use planning [10].

Study Area and Forest Fires Data
The study area is Rokan Hilir district in Riau Province in Indonesia. Rokan Hilir spans an area of 8881.59 km² [11] or approximately 10% of Riau's total land area. Rokan Hilir is located in the western part of the north Sumatera, the southern part of Bengkalis district and Rokan Hulu district, the eastern part of Dumai and the northern part of the north Sumatera and Melaka strait. The district is divided into 13 subdistricts with a total population of 552,400 based on the Population Census 2010 of the Riau Province [12].
The data used in this study are as follows:
1) Spread and coordinates of MODIS hotspots in 2008. The data are provided by the Fire Information for Resource Management System (FIRMS), University of Maryland, NASA, Conservation International.
2) Digital maps for roads, rivers, city centers, land cover, and the administrative border from the National Coordinating Agency for Survey and Mapping (BAKOSURTANAL), Indonesia.
3) Socio-economic data from BPS-Statistics Indonesia including inhabitants' income source, population density, and number of schools per km².
4) Weather data for 2008 (in the NetCDF format) including screen temperature, precipitation, 10 m wind speed, and surface height. The data were collected from the Meteorological Climatological and Geophysical Agency (BMKG), Indonesia.
5) Digital maps for peatland depth and peatland types provided by Wetland International.
A MODIS hotspot/active fire is a vegetation fire, but sometimes it is a volcanic eruption or the flare from a gas well. It is detected using the MODIS (Moderate Resolution Imaging Spectroradiometer) instrument on board NASA's Aqua and Terra satellites [13]. A MODIS hotspot represents the center of an approximately 1 km pixel flagged as containing one or more actively burning hotspots/fires (Figure 1).
In the study area, as many as 517 MODIS hotspots were found in 2008. We created a buffer around each hotspot, and then 513 points were randomly generated outside the buffers as non-hotspot points. The radius of the buffer is 0.90737 km, obtained from Landsat TM image processing.
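The buffering and random sampling step can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the rectangular extent `bounds`, the toy coordinates, and the function names are assumptions made for illustration, and coordinates are taken to be in a projected system measured in km. Only the buffer radius comes from the text.

```python
import math
import random

BUFFER_RADIUS_KM = 0.90737  # buffer radius reported in the text


def outside_all_buffers(point, hotspots, radius_km=BUFFER_RADIUS_KM):
    """Return True if `point` lies outside the circular buffer of every hotspot."""
    px, py = point
    return all(math.hypot(px - hx, py - hy) > radius_km for hx, hy in hotspots)


def sample_non_hotspots(hotspots, n_points, bounds, seed=0):
    """Rejection-sample `n_points` random points outside all hotspot buffers.

    `bounds` is a hypothetical rectangular extent (xmin, ymin, xmax, ymax) in km;
    a real workflow would sample inside the district boundary instead.
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bounds
    points = []
    while len(points) < n_points:
        candidate = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if outside_all_buffers(candidate, hotspots):
            points.append(candidate)
    return points


# Toy hotspot coordinates in km (illustrative only, not study data).
hotspots = [(10.0, 10.0), (12.5, 11.0), (30.0, 25.0)]
non_hotspots = sample_non_hotspots(hotspots, n_points=5, bounds=(0.0, 0.0, 50.0, 50.0))
```

Rejection sampling is chosen here only because it is short; it stays efficient as long as the buffers cover a small share of the extent.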
The spatial data considered as influencing factors for hotspot occurrence are stored in layers in the spatial database. There are three types of spatial features in the layers, i.e. point, line, and polygon. The spatial reference system UTM 47N and datum WGS84 were assigned to all layers in the spatial database.

Data Transformation
Association rule mining requires a dataset in the transaction format, which contains transaction IDs and itemsets. Several steps were performed to create the transaction dataset from the set of layers of influencing factors for hotspot occurrence. The tools utilized in data transformation are PostgreSQL 9.1 (http://www.postgresql.org) to manage the spatial database, PostGIS 1.5 (http://www.postgis.org) to perform spatial operations, and Quantum GIS 1.7.2 (http://www.qgis.org) to analyze and to visualize spatial data.
This work applied topological and distance relationships to relate spatial objects in two different layers. The topological operation ST_Within, which is available in PostGIS, determines whether a point feature is located inside a polygon feature. For example, for each hotspot and non-hotspot point, we determine which land cover type the point falls in, where land cover objects are represented as polygons (Figure 2). The operation ST_Within was also used to relate the hotspot occurrence layer to other layers, i.e. income source, precipitation, screen temperature, 10 m wind speed, peatland type and peatland depth.
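To make the point-in-polygon relation concrete, the following Python sketch implements a ray-casting test and uses it to assign a land cover class to a point. It is a stand-in for illustration, not the PostGIS implementation of ST_Within; the polygons, the class name "Dryland_forest", and the function names are hypothetical, and points lying exactly on a polygon boundary are not treated specially.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if `point` is strictly inside `polygon`.

    `polygon` is a list of (x, y) vertices; the ring is closed implicitly.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def land_cover_of(point, land_cover_polygons):
    """Return the name of the first land cover polygon containing `point`, or None."""
    for name, polygon in land_cover_polygons.items():
        if point_in_polygon(point, polygon):
            return name
    return None


# Hypothetical land cover polygons (toy squares, not study data).
land_cover_polygons = {
    "Plantation": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "Dryland_forest": [(4, 0), (8, 0), (8, 4), (4, 4)],
}
```

For instance, `land_cover_of((1, 1), land_cover_polygons)` falls in the first square, while a point outside both squares yields `None`.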
Moreover, the distance function is used to calculate the distance from a point (or line) to another point (or line). This work computed the distance from hotspots and non-hotspots (point features) to the nearest river (line features), to the nearest road (line features), and to the nearest city center (point features). For example, Figure 3 shows hotspot locations overlaid with roads and city centers. Figure 4 shows how the distance from a hotspot to every river segment is calculated; the minimum value is taken as the distance from the hotspot to the nearest river. To perform this task, the spatial operation ST_Distance in PostGIS 1.5 was applied to calculate the distance of objects to the nearest river, road, and city center. Because the Apriori algorithm requires categorical values in a dataset, the minimum distances were converted to categorical values based on the classes provided in Table 1.
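The nearest-distance computation and the conversion to categories can be sketched in Python as follows. This is an illustrative approximation in planar coordinates, not the PostGIS ST_Distance implementation. The class boundaries passed to `categorize` are assumptions: the value 1.5 km appears in the reported rules, but the second boundary and the label format are hypothetical, since the classes of Table 1 are not reproduced here.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point `p` to the line segment from `a` to `b`."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a single point
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_distance(p, segments):
    """Minimum distance from `p` to any segment, as described for Figure 4."""
    return min(point_segment_distance(p, a, b) for a, b in segments)


def categorize(distance_km, upper_bounds_km):
    """Map a distance to an interval label given ascending class upper bounds."""
    lower = 0.0
    for upper in upper_bounds_km:
        if distance_km <= upper:
            return f"<= {upper} km" if lower == 0.0 else f"({lower} km, {upper} km]"
        lower = upper
    return f"> {lower} km"


# Toy river made of two segments, coordinates in km (not study data).
river_segments = [((0.0, 0.0), (10.0, 0.0)), ((10.0, 0.0), (10.0, 10.0))]
dist_river = nearest_distance((3.0, 1.0), river_segments)
dist_river_class = categorize(dist_river, [1.5, 3.0])  # hypothetical class bounds
```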
Table 2 provides the number of spatial features in all layers in the forest fire database. The layers contain spatial objects that may influence hotspot occurrence. In order to discover associations between spatial objects and hotspot occurrence using the Apriori algorithm, each layer is related to the hotspot layer.
Relating the hotspot layer to other layers using the spatial operations ST_Within and ST_Distance results in several new layers (see, for example, Figure 5). All new layers were integrated into a single layer by matching identifiers of objects in the hotspot layer with those in the other layers. This step produced a relation that serves as the transactional dataset for the Apriori algorithm.
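The integration step amounts to joining the derived layers on the point identifier and turning each joined row into a set of attribute=value items. A minimal Python sketch, assuming each derived layer is available as a mapping from point ID to a categorical value (the layer contents and the item format are hypothetical):

```python
def build_transactions(layers):
    """Join layers on point ID and encode each row as a set of 'attribute=value' items.

    `layers` maps an attribute name to a dict {point_id: categorical value}.
    Only IDs present in every layer are kept (an inner join).
    """
    ids = set.intersection(*(set(values) for values in layers.values()))
    transactions = {}
    for point_id in sorted(ids):
        transactions[point_id] = frozenset(
            f"{attribute}={values[point_id]}" for attribute, values in layers.items()
        )
    return transactions


# Toy layers (illustrative values, not study data).
layers = {
    "hotspot_occurrence": {1: "Yes", 2: "No", 3: "Yes"},
    "land_cover": {1: "Plantation", 2: "Plantation"},
    "dist_road": {1: "<= 2.5 km", 2: "> 2.5 km"},
}
transactions = build_transactions(layers)
```

Point 3 is dropped in this toy example because it has no value in two of the layers; whether incomplete rows should be dropped or kept is a design choice the text does not specify.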

Spatial Association Rules
The basic idea of mining association rules from spatial databases is similar to that for non-spatial databases (transactional or relational databases). A spatial association rule has the form A ⇒ B (s%, c%), where A and B are sets of spatial or non-spatial predicates, s% is the support of the rule, and c% is the confidence of the rule [1]. Spatial association rules differ from non-spatial association rules because they may include spatial predicates such as distance information (for instance, close_to and far_away), topological relations (for example, touch, overlap, and intersect), and spatial orientation (such as right_of and east_of). An example of a spatial association rule is as follows: x is a shopping centre ∧ x close to a bus station ⇒ x close to a settlement area (0.5%, 75%).
The rule says that 75% of shopping centers that are close to bus stations are also close to settlement areas, and 0.5% of the data belong to such a rule.
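How support and confidence are read off a dataset can be shown with a small Python sketch. The transactions and item names below are invented so that the rule reaches a confidence of 75%, matching the example; the support differs from the 0.5% quoted above because the toy dataset is tiny.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(A and B together) / support(A); assumes the antecedent occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical transactions: one set of predicates per spatial object.
transactions = [
    {"shopping_centre", "close_to_bus_station", "close_to_settlement"},
    {"shopping_centre", "close_to_bus_station", "close_to_settlement"},
    {"shopping_centre", "close_to_bus_station", "close_to_settlement"},
    {"shopping_centre", "close_to_bus_station"},
    {"school", "close_to_settlement"},
    {"school"},
    {"bank", "close_to_bus_station"},
    {"bank"},
]
antecedent = {"shopping_centre", "close_to_bus_station"}
consequent = {"close_to_settlement"}
rule_support = support(antecedent | consequent, transactions)      # 3 of 8 objects
rule_confidence = confidence(antecedent, consequent, transactions)  # 3 of the 4 matching A
```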

Apriori Algorithm
The Apriori algorithm was introduced by [14] to discover frequent itemsets and association rules in a transactional dataset that have support and confidence greater than the user-specified minimum support (minsup) and minimum confidence (minconf), respectively. An association rule has the form $X \Rightarrow Y$, where $X \subset I$, $Y \subset I$, $I$ is a set of items, and $X \cap Y = \emptyset$.
The Apriori algorithm [14] uses the following notation. $L_k$ is the set of large $k$-itemsets, i.e. the $k$-itemsets that have minimum support. $C_k$ is the set of candidate $k$-itemsets, which are potentially large itemsets. The apriori-gen function takes $L_{k-1}$, the set of all large $(k-1)$-itemsets, as its argument and returns a superset of the set of all large $k$-itemsets [14]. In pass $k$, the algorithm generates $C_k$ with apriori-gen, counts the support of each candidate in one scan of the dataset, and keeps the candidates with minimum support as $L_k$; it stops when no large itemsets are found.
The three most widely used measures for selecting interesting rules are support, confidence and lift. Support and confidence of the rule $A \Rightarrow B$ are defined as follows [1]:
$$\mathrm{support}(A \Rightarrow B) = P(A \cup B) \tag{1}$$
$$\mathrm{confidence}(A \Rightarrow B) = P(B \mid A) \tag{2}$$
Here $\mathrm{support}(A \Rightarrow B)$ is the percentage of transactions in a transactional dataset $D$ that contain both $A$ and $B$, whereas $\mathrm{confidence}(A \Rightarrow B)$ is the percentage of transactions in $D$ containing $A$ that also contain $B$ [1]. Equation (2) can also be stated as follows:
$$\mathrm{confidence}(A \Rightarrow B) = \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)} \tag{3}$$
In order to measure the correlation between $A$ and $B$ in the rule $A \Rightarrow B$, the correlation measure lift may be used, which is computed as follows [1]:
$$\mathrm{lift}(A, B) = \frac{P(A \cup B)}{P(A)\,P(B)} \tag{4}$$
Based on the value of $\mathrm{lift}(A, B)$ in Equation (4), the occurrences of $A$ and $B$ are classified as positively correlated, negatively correlated, or independent [1].
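The level-wise search can be sketched in a few lines of Python. This is a simplified illustration rather than the implementation of [14] or of the R tool used in the study: the candidate generation joins any two large (k-1)-itemsets instead of using the ordered prefix join, which yields the same candidates after pruning but less efficiently, and itemsets are kept when their support is at least `minsup`. The toy transactions and item names are hypothetical.

```python
from itertools import combinations


def apriori_gen(large_prev, k):
    """Candidate k-itemsets whose (k-1)-subsets are all large (join + prune)."""
    candidates = set()
    for a in large_prev:
        for b in large_prev:
            union = a | b
            if len(union) == k and all(
                frozenset(subset) in large_prev
                for subset in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def apriori(transactions, minsup):
    """Return {itemset: support} for all itemsets with support >= minsup."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def count(itemset):
        return sum(itemset <= t for t in transactions)

    items = {item for t in transactions for item in t}
    large = {frozenset([i]) for i in items if count(frozenset([i])) / n >= minsup}
    result = {}
    k = 2
    while large:  # stop when no large (k-1)-itemsets remain
        for itemset in large:
            result[itemset] = count(itemset) / n
        candidates = apriori_gen(large, k)
        large = {c for c in candidates if count(c) / n >= minsup}
        k += 1
    return result


# Toy transactions with hypothetical item encodings (not study data).
toy = [
    {"hotspot_occurrence=Yes", "precipitation>=3", "peatland=No"},
    {"hotspot_occurrence=Yes", "precipitation>=3"},
    {"hotspot_occurrence=Yes", "precipitation>=3", "peatland=No"},
    {"hotspot_occurrence=No", "peatland=No"},
    {"hotspot_occurrence=No"},
]
frequent = apriori(toy, minsup=0.5)
```

With `minsup=0.5`, three single items and the pair of hotspot occurrence with precipitation pass the threshold; rules are then derived from each frequent itemset by checking the confidence of its possible antecedent and consequent splits.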

Multiple Dimensional Association Rule Mining
In multiple dimensional association rule mining, association rules are discovered from a dataset which contains more than one attribute (called a dimension). For example, in single dimension mining, we can generate a rule: buys(X, "pc tablet") ⇒ buys(X, "earphones"), whereas in multidimensional mining, we can generate a rule: Occupation(X, "IT staff") ∧ Salary(X, "10-20K") ⇒ buys(X, "smartphone").
In this rule, occupation, salary and buys are dimensions that may have different types such as boolean, categorical and numerical.
Srikant and Agrawal [15] introduced an approach to map the quantitative association rules problem to the boolean association rules problem. The quantitative values are partitioned into intervals and then each pair <attribute, interval> is mapped to a boolean attribute [15]. For example, the attribute Age can be partitioned into two intervals: 20 - 29 and 30 - 39. A categorical attribute corresponds to pairs <attribute, value>. For example, the attribute Married, which has two values, yes and no, is replaced by the pairs <Married: Yes> and <Married: No>. Figure 6 shows an example of a dataset before and after mapping to the boolean association rules problem. We can apply the algorithms for mining single dimension association rules to the new dataset (Figure 6(b)).
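The mapping can be sketched in Python as follows. The record, the item label format, and the function name are assumptions for illustration; only the Age intervals and the Married values follow the example in the text, and the actual contents of Figure 6 are not reproduced.

```python
def to_boolean_items(record, intervals):
    """Map a record to boolean items of the form 'attribute:interval' or 'attribute:value'.

    `intervals` maps each quantitative attribute to a list of inclusive (low, high) ranges.
    Quantitative values outside every interval produce no item.
    """
    items = set()
    for attribute, value in record.items():
        if attribute in intervals:
            for low, high in intervals[attribute]:
                if low <= value <= high:
                    items.add(f"{attribute}:{low}..{high}")
                    break
        else:
            # Categorical attribute: one boolean item per <attribute, value> pair.
            items.add(f"{attribute}:{value}")
    return items


age_intervals = {"Age": [(20, 29), (30, 39)]}
record = {"Age": 23, "Married": "No"}  # hypothetical record
items = to_boolean_items(record, age_intervals)
```

Each resulting item can be treated as present or absent in a transaction, so a single-dimension algorithm such as Apriori applies unchanged.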

Result and Discussion
Pre-processing steps on the spatial forest fire data resulted in a dataset consisting of 490 records. Variables in the dataset are hotspot occurrence, distance to nearest river (dist_river), distance to nearest road (dist_road), distance to nearest city center (dist_city), land cover (land_cover), income source (income_source), population density (population), number of schools per km² (school), precipitation in mm/day (precipitation), screen temperature in K (screen_temp), 10 m wind speed in m/s (wind_speed), peatland type (peatland_type), and peatland depth (peatland_depth). The Apriori algorithm, which is available in the statistical computing tool R (http://www.r-project.org/), was executed on the dataset and generated 2981 association rules. The purpose of this study is to find possible factors that strongly influence hotspot occurrence. Therefore, for further analysis, we only study association rules that include hotspot occurrence. There are 324 rules, or about 10.87%, containing hotspot occurrence generated from the dataset with a minimum support of 10% and a minimum confidence of 80%.
For support values greater than or equal to 25%, weather variables and socio-economic variables occur with hotspots in the study area. A support of 25% means that at least 123 of the 490 transactions support the association rule (25% of 490 is 122.5). The weather variables included in the rules are precipitation ≥ 3 mm/day and screen temperature in [297 K, 298 K), whereas the socio-economic variables that appear in the rules are population density ≤ 50 and number of schools per km² ≤ 0.1.
The physical environmental factors, including dist_city = (7 km, 14 km], dist_river ≤ 1.5 km, dist_road ≤ 2.5 km, and land_cover = Plantation, occur together with hotspots in the rules that have support less than 25%. Moreover, hotspots appear in non-peatland areas and in areas where the inhabitants' income source is plantation.
According to rule 5, hotspots occur in locations where the precipitation is greater than or equal to 3 mm/day and the distance to the nearest city center is greater than 7 km and at most 14 km. As many as 99 of the 490 transactions (20.20%) support this association. Rule 6 means that hotspots were found in non-peatlands where the 10 m wind speed is in the range [1 m/s, 2 m/s).
Figure 7 shows the scatter plot for the 324 association rules containing the item hotspot_occurrence = Yes. Each point in the plot represents a rule. Support and lift are used for the x-axis and y-axis respectively, while the color of the points indicates the confidence level of the rules.
The rule in the bottom right of Figure 7 has the highest support, i.e. 44.4898%. There are 24 rules in the top left corner with the highest lift of 2.258065 and the highest confidence of 1. On average, these rules are supported by 12.4149667% of the records in the dataset. In addition to hotspot occurrence, the rules include other factors that are considered as influencing factors for fire events. These factors are precipitation greater than or equal to 3 mm/day, wind speed in [1 m/s, 2 m/s), non-peatland area, screen temperature in [297 K, 298 K), number of schools per km² less than or equal to 0.1, and distance from each hotspot to the nearest road less than or equal to 2.5 km.
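The reported lift and confidence values can be related through a short derivation. It assumes that lift is defined as $P(A \cup B) / (P(A)\,P(B))$ for a rule $A \Rightarrow B$, as in [1], and that $B$ denotes the consequent shared by these 24 rules; the resulting record count is an inference from the reported figures, not a number stated in the text.

```latex
\begin{align*}
\mathrm{lift}(A, B)
  &= \frac{P(A \cup B)}{P(A)\,P(B)}
   = \frac{\mathrm{confidence}(A \Rightarrow B)}{P(B)} \\
P(B)
  &= \frac{\mathrm{confidence}(A \Rightarrow B)}{\mathrm{lift}(A, B)}
   = \frac{1}{2.258065} \approx 0.4429 \\
0.4429 \times 490 &\approx 217 \text{ records}
\end{align*}
```

Under these assumptions, a confidence of 1 with a lift of 2.258065 corresponds to a consequent that holds in about 217 of the 490 records, since 490/217 ≈ 2.258065.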

Summary and Future Work
This paper discusses the application of the association rule algorithm to discover strong relationships between hotspot occurrence and other geographical objects for forest fires. Pre-processing steps were conducted on the spatial forest fire dataset in order to prepare a task-relevant dataset for the Apriori algorithm. Two types of spatial relationships, namely topological and metric, were applied to relate a spatial feature to other spatial features.
Our analysis with a minimum support of 25% and a minimum confidence of 80% shows strong relations among hotspot occurrence, weather variables, and socio-economic variables. Hotspots mostly occur in less-populated areas with population density less than or equal to 50 and number of schools per km² less than or equal to 0.1. The precipitation when the hotspots occur is greater than or equal to 3 mm/day and the interval for screen temperature is [297 K, 298 K).
The association between hotspot occurrence and physical environmental factors was discovered for support greater than 10% and less than 25%, and a minimum confidence of 80%. Hotspots were found not far from rivers and roads, where the distance from the hotspots to the nearest river and road was less than or equal to 1.5 km and 2.5 km, respectively. The areas where hotspots were found are covered by plantations, and thus the inhabitants' income source is plantation.
In future work, we intend to investigate how negative association rule algorithms may be applied to the forest fire dataset to discover strong relations between geographical objects and locations where hotspots are unlikely to occur.

Figure 2. Hotspot locations overlaid with the land cover layer.

Figure 3. Hotspot locations overlaid with road and city centers.

Figure 4. Hotspot locations overlaid with the river layer.

Figure 5. A new layer (c) as the result of relating the hotspot layer (a) and the land cover layer (b).
Figure 5 shows the relations that represent the layers. Each relation has the attribute the_geom, which stores the geometry of the spatial features. The new layer (c) is obtained by applying the spatial operation ST_Within to determine whether points in the hotspot layer (a) are inside polygons in the land cover layer (b) or not.


If $\mathrm{lift}(A, B)$ is greater than 1, then A and B are positively correlated, meaning that the occurrence of A implies the occurrence of B. If $\mathrm{lift}(A, B)$ is less than 1, then A and B are negatively correlated. A and B are independent if $\mathrm{lift}(A, B)$ is equal to 1, which means that there is no correlation between A and B [1].

Figure 6. A dataset before (a) and after (b) mapping to boolean association rules problem (Srikant and Agrawal, 1996).

Figure 7. Scatter plot for 324 association rules containing the item hotspot_occurrence = Yes.