A Novel Approach for Sugarcane Yield Prediction Using Landsat Time Series Imagery : A Case Study on Bundaberg Region

Quantifying sugarcane production is critical for a wide range of applications, including crop management and decision making processes such as harvesting, storage, and forward selling. This study explored a novel model for predicting sugarcane yield in Bundaberg region from time series Landsat data. From the freely available Landsat archive, 98 cloud free (<40%) Landsat Thematic Mapper (TM) and Enhanced Thematic Mapper (ETM+) images, acquired between November 15th to July 31st (2001-2015) were sourced for this study. The images were masked using the field boundary layer vector files of each year and the GNDVI was calculated. An analysis of average green normalized difference vegetation index (GNDVI) values from all sugarcane crops grown within the Bundaberg region over the 15 year period identified the beginning of April as the peak growth stage and, therefore, the optimum time for satellite image based yield forecasting. As the GNDVI is an indicator of crop vigor, the model derived maximum GNDVI was regressed against historical sugarcane yield data, which showed a significant correlation with R2 = 0.69 and RMSE = 4.2 t/ha. Results showed that the model derived maximum GNDVI from Landsat imagery would be a feasible and a modest technique to predict sugarcane yield in Bundaberg region.


Introduction
Sugarcane (Saccharum spp.L.) is a widely grown semi-perennial crop that plays a major role in global agricul-ture for the production of sugar and other by-products/co-products (bioethanol, rum, bagasse, fertilizer, filter muds etc.) [1].More than twenty eight million hectares of agricultural land is devoted globally to growing sugarcane, producing approximately 1.7 trillion tonnes of raw sugar each year [2].Accurate and timely prediction of yield offers the global sugar industry improved efficiency and profitability by supporting decision making processes such as crop harvesting scheduling, marketing, milling and forward selling strategies.Currently, the in-season estimation of yield is undertaken using visual or destructive sampling techniques by either growers or mill funded productivity officers.However, this method is labour intensive with accuracies influenced by varied seasonal climatic conditions, crop age due to an extended harvest period and human error.Commercially available harvester mounted yield monitoring devices operated in different sugarcane growing regions [3] also offer varied degrees of yield mapping accuracy, but data is only available post-harvest and therefore not available to support in-season decision making.As an alternative, remote sensing technologies have been evaluated in recent years as a more accurate and cost effective method of sugarcane yield prediction [4].
Time series of satellite-based remote sensing images offer a unique opportunity to assess land cover dynamics and monitoring productivity of agricultural crops at varying spatial and temporal resolutions [5].Numerous studies have reported the potential of time series observations of satellite images at different resolution such as AVHRR (Advanced Very High Resolution Radiometer) [6]- [8], MODIS (Moderate Resolution Imaging Spectroradiometer) [9]- [11], MERIS (Medium Resolution Imaging Spectrometer) [12]- [14], SPOT (Satellite Pour l'Observation de la Terre) [15]- [17] and Landsat TM/ETM+ [18]- [20], for monitoring vegetation, detecting land cover change, identifying crops, mapping seasonal patterns, crop rotations and crop yield prediction.In Australia, Lee-Lovick and Kirchner [21] first examined the potential benefits of remote sensing for sugarcane and subsequently only a few studies have reported the use of time series observation of satellite imagery for sugarcane yield prediction [1] [22].Although, in recent times, regional yield forecasts from single within -season remote sensing observations (SPOT 5) has produced high accuracies [4], the timing of image capture and the influence of weather condition and other stresses (e.g., disease or nutrient deficiencies) has resulted in some incidences of high inaccuracy.
Sugarcane production depends on several environmental factors that vary seasonally and annually and the complex interactions among them.The advantage of time series satellite observations is that it provides a more accurate indication of crop response over time, whether it be the result of environmental changes occurring from rainfall distribution, drought, nutrient deficiency and other related factors or different human management practices such as fertilizer application, pests, weeds and diseases control etc. [9] [16] [23].Moreover, time series observation can clearly define the time of year when the regional crop reaches its maximum growth vigour, and as such indicate when an image should capture for future single image based yield forecasting.Therefore, time series observation of satellite imagery has been suggested as a more appropriate method for achieving accurate yield forecasting in sugarcane industry [1] [5] [9] [22] [23].
In addition to a pure remote sensing approach, several researchers have integrated crop models when predicting sugarcane yield.Duveiller et al., [1] estimated sugarcane yield at the regional scale in Brazil using fraction of absorbed photosynthetically active radiation (fAPAR) derived from the VEGETATION sensor of SPOT 4 satellite images.Mulianga et al., [9] reported a time integral and spatial aggregation of normalized difference vegetation index (NDVI) data to forecast sugarcane yield in western Kenya.On Reunion and Guadeloupe islands, Bégué et al., [23] compared SPOT time series acquired maximum NDVI values and integrated NDVI values to estimate yield and found comparable RMSE of 13.2 and 15 t/ha respectively.Morel et al., [5] compared integrated NDVI method, Kumar-Monteith efficiency model and forced coupling MOSICAS model to estimate sugarcane yield and revealed that the remote sensing based integrated NDVI method can provide the most accurate estimation of sugarcane yield on Reunion Island.In another study, Morel et al., [22] successfully applied the force coupling technique using the fraction of intercepted photosynthetically active radiation (fIPAR) derived from time series SPOT satellite to MOSICAS model to improve the sugarcane yield forecasting technique.Although remote sensing data coupled with crop models produced reasonably satisfactory results under experimental conditions in different studies, the main limitation of these models under on-firm conditions is the ability to accurately incorporate the influences of abiotic and biotic influences on crop production [22].
The normalized difference vegetation index (NDVI) from satellite imagery is one of the most widely used vegetation indices to determine crop phenology, biomass, and productivity in spatial resolution.Although it has been shown to be effective in reducing the influences of atmospheric attenuation and shading [11], it is susceptible to saturation at higher leaf area index (LAI), an occurrence that regularly occurs in large biomass crops such as sugarcane.To overcome these issues green normalized difference vegetation index (GNDVI) has been employed in several studies to ascertain the temporal differences and to predict sugarcane yield [4] [24].Sequential observations of GNDVI can provide seasonal crop profiles that show the progression of crop canopy from emergence to senescence.These profiles reflect the crop performance based on environmental factors and are related to the final crop yields.
The objective of this study was to develop a novel model from time series Landsat TM/ETM+ data to predict sugarcane yield in Bundaberg region.The seasonal growth profiles over a period of 15 growing seasons were developed using time series GNDVI values derived from imagery captured between mid-November to July each year.The annual timing of peak growth was identified, therefore indicating the optimal timing for future yield prediction from a single image capture.The model derived highest GNDVI values were regressed against historical sugarcane yield data to estimate sugarcane yield at the regional level.In a recently cited article of Robson, et al. [25] average GNDVI from February to April produced a significant correlation (R 2 = 0.91) to predict sugarcane yield in Bundaberg region for the six consecutive years from 2010 to 2015.Therefore, this study also compared the average GNDVI technique and model derived maximum GNDVI technique to predict sugarcane yield using the data from 2001 to 2015.

Study Area
The study was carried out at Bundaberg cane growing region (Figure 1) located in the South East of Queensland in Australia.The area is located between longitudes 151.91˚E and 152.49˚E, and latitudes 24.51˚S and 25.13˚S, covering an area of 20,700 ha.The climate of Bundaberg is subtropical with long hot summers and mild winters.The soil type in this area varied enormously due to climate, substance of parent material and topography.The mean annual rainfall of the area was recorded to 964 mm in 2015 [26].

Remote Sensing Data
In this study, 98 Landsat TM/ETM+ L1T images from 2001 to 2015 with cloud cover less than 40% over Bun- Images acquired between mid-November and July each year were selected as these were identified to encompass the sugarcane growth period in that region.Satellite images were downloaded from the US Geological Survey National Center for Earth Resources Observation and Science via the GLOVIS data portal (http://glovis.usgs.gov/).
Image pre-processing is one of the important steps to make all the images similar or nearly similar so that they can be considered to be captured in the same environmental conditions and by the same sensors [27].Digital image processing software ENVI 5.1 and ArcGIS 10.2 were used for the image processing, analysis, and spatial data integration.In this study, all the Landsat images were geometrically referenced to UTM projection system "WGS 1984 UTM zone 56N" to match with the supplied land cover boundary images.Radiometric normalization was used for the acquired images to reduce reflectance variations between image dates due to atmospheric conditions and surface anisotropy.For the bulk corrections of atmospheric effects simple dark-object subtraction (DOS) method was applied [28].Notably minor cloud, haze, shadow or bad data pixels were not considered while processing the images Sugarcane boundary vector layers for each year were sourced from Bundaberg Sugar Limited.To ensure that the extracted spectral information was specific to sugarcane only, a 10 m internal buffer was applied to each field boundary.All the Landsat images were masked using these boundary layers and the green normalized difference vegetation index (GNDVI) derived, using the following equation [29] NIR GREEN where, R NIR and R GREEN are the reflectance measured in the near infrared and green spectral bands.

Model
All available average GNDVI data from the 15 year period were plotted against the date of image acquisition and a quadratic model was fitted to visualize the progression of sugarcane crop growth.This model identified when maximum vigour of the annual crop was achieved, via the peak of the quadratic curve and, therefore, indicates when images should be captured to get maximum vigour of crop and ultimately predict yield in the season.The vertex form of the quadratic model as shown in Equation ( 2) was used to shift the vertical axis of the curve according to the acquired GNDVI value in maximum vigour period of a specific year.
( ) Here, "a" is a value in the curve that indicates the curvature of the line, the sign (±) on "a" tells whether the quadratic opens up or opens down, the sign (+) indicates that the quadratic opens up and the sign (−) indicates the quadratic opens down; h is the horizontal shift of the curve from x = 0, for any standard quadratic equation

Statistical Analysis
Linear regression analysis was performed to evaluate the relationship between the model derived maximum GNDVI value and sugarcane yield.The fifteen yield forecasts from 2001 to 2015 and yield observations at harvest were evaluated and compared using root means square error (RMSE).
( ) ( ) Here, actual means actual yield data (t/ha), predicted means predicted yield data (t/ha) from measured GNDVI values and n is the number of observation.All analysis were performed using Microsoft Excel (version 10).

Time Profile of Sugarcane GNDVI
The average GNDVI time profile of the Bundaberg sugarcane growing region as a function of image acquisition date of year from 2001 to 2015 is illustrated in Figure 2.Although less number of cloud free images were obtained from November till March in most of the growing season, the evolution of GNDVI profile shows a distinct growth phases such as tillering, vegetative development and the senescence of sugarcane crop.Some infrequent variations in GNDVI values are related to the phenology of the crop and climatic conditions which was reported by Bégué et al. [23] as well.

Quadratic Model
All available GNDVI data over the 15 year period (2001 to 2015) were plotted against image acquisition date of year (Figure 3).A quadratic model was identified to best represent the growth profile of sugarcane crop with R 2  = 0.72.The vertex form of the model shown in Figure 3 indicates that the GNDVI value reaches to its peak after 145 days of starting date (15 th November) and about three months before harvest.This result is consistent with the previous findings [30] [31], whom reported that the best acquisition period of satellite images is about two months prior to the beginning of harvest for the prediction of sugarcane yield.The highest average GNDVI value from the model is 0.58.

Determination of Highest GNDVI from the Model
The model derived GNDVI values were plotted over the calculated GNDVI values from Landsat images (Figure 4).The model was shifted vertically in each year to pass through the calculated GNDVI value acquired near or at the maximum vigour period of sugarcane.The highest GNDVI value from the model was regressed against the final average crop yield measured in that year.In Figure 3

Prediction of Sugarcane Yield
A scatter plot of model derived maximum GNDVI against the annual harvested yield in terms of tonnes of cane per hectare (t/ha) from 2001 to 2014 is shown in Figure 5.The data point of 2013 is excluded, due to an extensive flood during 2013 that prevented the harvest of around 40% of final yield.A linear relation with good agreement (R 2 = 0.69 and RMSE = 4.2 t/ha) was established between the model derived maximum GNDVI and the annual harvested yield (t/ha).The annual harvested yield of 2015 (87.3 t/ha) is used as a model validation data, which showed an overestimation of predicted yield by only 3.54 t/ha in that year.
Previous research by Robson et al. [25] identified that the time series method can produce a more accurate prediction than a single capture SPOT 5, when GNDVI values derived between February and April were averaged.Therefore, the average GNDVI data from the imagery acquired from February to April for all years were also regressed against the annual harvested yield according to Robson et al. [25] (Figure 6).Here also the data point of 2013 was not considered due to extensive flooding.The linear correlation was found with R 2 = 0.48 and RMSE = 5.46 t/ha.The annual harvested yield of 2015 is over estimated by 5.30 t/ha using the average February to April GNDVI data.The maximum GNDVI extracted from all Landsat images acquired around the critical April period for each year was also regressed against the annual harvested yield data (Figure 7), producing a correlation of R 2 = 0.58  and RMSE = 4.89 t/ha.It is likely that the regression coefficient was slightly less than that achieved from the quadratic model as the timing of image capture for each year ranged between March 11 to May 9 and therefore not within the peak, early April growth period.Using the algorithm from this linear relationship, the predicted annual harvested yield for 2015 is over estimated by 5.09 t/ha.Note that in 2015, the maximum GNDVI was extracted from the image of 24 th of April, which is very near to the optimum period of image capture.

Conclusions
In terms of yield forecasting, the maximum crop vigouror GNDVI value was historically achieved at 145 days from planting i.e. early April.This period is suggested as the optimal growth stage for the acquisition of satellite imagery to be used for regional yield forecasting.The development of a quadratic model that accurately depicts the growth profile of sugarcane has provided the opportunity for the maximum GNDVI to be calculated from imagery captured between November and March.Not only does this create the opportunity for predictions to be made earlier in a current growing season, but in the event that consistent cloud cover prevents the capture of a new image during the critical April period, the maximum GNDVI value for that season can still be calculated.
Although results from this study are highly encouraging, additional research is required to model temporal sugarcane growth across other domestic and global growing regions, as the influence of environmental conditions and cropping practices will likely vary the quadratic relationship between GNDVI and yield.

Figure 1 .
Figure 1.Location of sugarcane grown area on Bundaberg in Queensland.

2 y
k is the vertical shift of the curve from x = 0, for any standard quadratic equ-

Figure 2 .
Figure 2. Time series of Landsat GNDVI data over 15 year's period (2001 to 2015) illustrating the sugarcane crop cycles in Bundaberg region.The images were captured only during the growing period from mid-November to July each year for sugarcane crop.

Figure 3 .
Figure 3.The average GNDVI data from 2001 to 2015 in the growing period of sugarcane (mid-November to July) against image acquisition date.Image acquired in the same date of different years were averaged to apt the trend line in the data.
, the GNDVI graph for Year 2001, 2005, 2009 and 2014 are shown as examples.From the model, the maximum GNDVI values for those years were 0.56, 0.58, 0.55 and 0.58 respectively.Although the GNDVI values at the senescent stage in 2009 and 2014 show higher values, the calculated GNDVI values near maximum vigour were considered to match with the model derived GNDVI values.

Figure 4 .
Figure 4.The model derived GNDVI values (shown as lines) plotted over the calculated GNDVI values from Landsat images (shown as dots) for year 2001, 2005, 2009 and 2014 as examples.The calculated GNDVI values near maximum vigour of sugarcane were matched with the model derived GNDVI to get the maximum GNDVI.

Figure 5 .
Figure 5.The scatter plot of model derived GNDVI Vs annual harvested yield (t/ha) from 2001 to 2014.Data from 2013 was excluded, as the crops were not totally harvested in that year due to extensive flood.

Figure 6 .
Figure 6.The scatter plot of average GNDVI (February to April) Vs annual harvested yield (t/ha) from 2001 to 2014.Data from 2013 was excluded, as the crops were not totally harvested in that year due to extensive flood.

Figure 7 .
Figure 7.The scatter plot of maximum GNDVI from all available captured images Vs annual harvested yield (t/ha) from 2001 to 2014.Data from 2013 was excluded, as the crops were not totally harvested in that year due to extensive flood.The image capture dates for each of the different years are shown as shadow text.