Employing Canopy Hyperspectral Narrowband Data and Random Forest Algorithm to Differentiate Palmer Amaranth from Colored Cotton

Palmer amaranth (Amaranthus palmeri S. Wats.) invasion negatively impacts cotton (Gossypium hirsutum L.) production systems throughout the United States. The objective of this study was to evaluate canopy hyperspectral narrowband data as input into the random forest machine learning algorithm to distinguish Palmer amaranth from cotton. The study focused on differentiating the Palmer amaranth from cotton near-isogenic lines with bronze, green, and yellow leaves. A spectroradiometer was used to acquire hyperspectral reflectance measurements of Palmer amaranth and cotton canopies for two separate dates, December 12, 2016, and May 14, 2017. Data were collected from plants that were grown in a greenhouse. The spectral data were aggregated to twenty-four hyperspectral narrowbands proposed for study of vegetation and agriculture crops. Those bands were tested by the conditional inference version of random forest (cforest) to differentiate the Palmer amaranth from cotton. Classifications were binary: Palmer amaranth and cotton bronze, Palmer amaranth and cotton green, and Palmer amaranth and cotton yellow. Classification accuracies were verified with overall, user’s, and producer’s accuracy. For the two dates combined, overall accuracy ranged from 77.8% to 88.9%. The highest overall accuracies were observed for the Palmer amaranth versus the cotton yellow classification (88.9%, December 12, 2016


Introduction
Palmer amaranth, an aggressive and invasive weed, negatively impacts cotton growth and productivity throughout the United States. It grows rapidly (approximately 25-50 mm per day), produces several thousand seeds per plant, competes with cotton plants for sunlight and soil nutrients, and reduces cotton yield. Palmer amaranth populations are controlled by chemical and mechanical means. Its presence in fields is observed via site visits, which are time consuming when several hundred hectares need to be surveyed. To better implement control strategies for Palmer amaranth invasions in cotton production systems, agriculturalists need tools that can help them differentiate it from cotton. Researchers, consultants, and producers have shown interest in employing remote sensing technologies as weed detection and survey tools in crop production systems.
Remote sensing applications for weed survey have focused on comparing leaf and canopy light reflectance properties of weeds to those of crops, with the aim of identifying spectral bands that show statistically significant differences between them. Spectral profiles and statistical analyses were used to distinguish soybean (Glycine max L.) from pitted morning glory (Ipomoea lacunosa L.) [1], sunflower (Helianthus annuus L.) from corn caraway (Ridolfia segetum Moris.) [2], and Palmer amaranth and redroot pigweed from cotton [3]. Over the years, there has been a shift towards using multispectral and hyperspectral data as input into machine learners to differentiate crops from weeds. Machine learning involves training computer algorithms to discover patterns in datasets.
As indicated earlier, machine learners coupled with spectral data have shown promise for weed-crop discrimination. The user has many learners to choose from. Random forest, a nonparametric machine learning method, uses a group of decision trees to estimate a value or assign an object to a class [13]. It is competitive with other machine learners such as support vector machines and neural networks and was ranked as one of the best classifiers [14] [15]. Minimal data preparation is needed by the algorithm. Random forests work well with large datasets and are robust to outliers. The algorithm does not require a separate testing set because it uses bootstrap sampling for each tree, meaning each tree is built with approximately 63% of the data, leaving approximately 37% of the data for testing (i.e., the "out-of-bag" data).
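The roughly 63%/37% split mentioned above follows from sampling with replacement: the chance that a given observation is never drawn in n draws is (1 - 1/n)^n, which approaches 1/e (about 0.368) as n grows. The following is a minimal sketch of that arithmetic, not part of the study's workflow; the function names are hypothetical, and the sample size of 36 is an assumption (18 Palmer amaranth plus 18 cotton canopies per binary classification and date).

```python
import math
import random


def expected_oob_fraction(n: int) -> float:
    """Probability that one observation is never drawn in n draws with replacement."""
    return (1.0 - 1.0 / n) ** n


def simulated_oob_fraction(n: int, n_trees: int, seed: int = 0) -> float:
    """Average share of observations left out of a bootstrap sample, over n_trees samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        # Indices drawn at least once form the "in-bag" set for this tree.
        in_bag = {rng.randrange(n) for _ in range(n)}
        total += (n - len(in_bag)) / n
    return total / n_trees


if __name__ == "__main__":
    n = 36  # assumed sample size, see the lead-in
    print("analytic out-of-bag fraction:", expected_oob_fraction(n))
    print("large-sample limit 1/e:", math.exp(-1))
    print("simulated out-of-bag fraction:", simulated_oob_fraction(n, 500))
```

For small datasets the out-of-bag share sits slightly below the 1/e limit, which is why the percentages are best read as approximations.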
Crop plants with specific traits are being developed by plant breeders to meet the desired characteristics needed for crop production systems. Presently at experiment stations, researchers are evaluating cotton near-isogenic lines that have bronze, green, and yellow colored leaves with the intention of using them in cotton production systems. Reference [3] statistically compared the leaf spectral profile of the cotton lines to redroot pigweed and Palmer amaranth. Currently, no research is available that has explored the application of canopy narrowband hyperspectral data and the random forest machine learning algorithm for differentiating Palmer amaranth from cotton with different colored leaves. The objective of this study was to evaluate canopy hyperspectral narrowband data as input into the random forest machine learning algorithm to distinguish Palmer amaranth from cotton. The study focused on using optimal hyperspectral narrowbands proposed for study of vegetation and agricultural crops [12] as input into the random forest algorithm. Additionally, the research concentrated on cotton near-isogenic lines that have bronze, green, or yellow leaf colors.

Experimental Setup
The study was conducted in a greenhouse located at the United States Depart- bronze, cotton green, and cotton yellow seed) were obtained from established seed banks maintained at USDA-ARS, Stoneville, MS. A detailed description of the cotton near-isogenic lines is provided in [3].
The experimental design was a randomized complete block design with 18 replications and 4 treatments per replication (i.e., 1 cotton bronze plant, 1 cotton green plant, 1 cotton yellow plant, and 1 Palmer amaranth plant). Data were collected from two separate experiments. Planting dates were November 7, 2016 and April 18, 2017 for experiments one and two, respectively. For each experiment, several seeds of the respective plants were planted into 11-cm pots containing potting mix (Pro-Mix BX general professional growth medium, Premier Tech Horticulture, Quakertown, PA). Approximately 10 days after emergence, plants were thinned to 1 plant per pot. The plants were subjected to a 14-hr day length; the greenhouse temperature was maintained between 21.1˚C and 26.7˚C. Sodium vapor lamps were used as a supplemental light source at the beginning and ending of the day. The plants were fertilized weekly with a liquid fertilizer (approximately 4.9 ml of fertilizer to 3.8 L of water, Dyna-Gro All-Pro, Richmond, CA); water was added as needed.
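As an illustration of the randomized complete block layout described above, the sketch below assigns each of the four treatments once per block and shuffles their order independently in each of the 18 blocks. It is a hypothetical reconstruction for clarity only; the function name and the seed are assumptions, and the actual randomization used in the greenhouse is not reported here.

```python
import random

TREATMENTS = ["cotton bronze", "cotton green", "cotton yellow", "Palmer amaranth"]


def randomized_complete_block(n_blocks: int, treatments: list[str], seed: int = 0) -> list[list[str]]:
    """Return one independently shuffled treatment order per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = treatments[:]  # every block contains every treatment exactly once
        rng.shuffle(order)
        layout.append(order)
    return layout


if __name__ == "__main__":
    layout = randomized_complete_block(18, TREATMENTS)
    for block_number, order in enumerate(layout[:3], start=1):
        print(block_number, order)
```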

Data Collection
Canopy reflectance measurements of Palmer amaranth and cotton plants were obtained with a hyperspectral spectroradiometer (Fieldspec 3 Full Range, ASD Inc., Boulder, CO) sensitive to a spectral range of 350 to 2500 nanometers (nm). Reflectance measurements were acquired in the vegetative growth stage (i.e., Palmer amaranth 10 leaf stage and cotton 4 leaf stage). Herbicide management programs are more effective when weeds are treated in the vegetative phase, and most users want to kill or treat weeds prior to seeding.
Canopy reflectance measurements were acquired on December 12, 2016 and May 14, 2017. They were obtained outside of the greenhouse under sunlit conditions, within 2 hours of solar noon. Prior to measurements, a black felt cloth was placed across the top of the pot to cover the potting mix, thus providing a uniform background. The black felt cloth's spectral reflectance was less than 2 percent in all regions of the spectrum measured by the spectroradiometer.
The spectroradiometer was calibrated at approximately 15-minute intervals with a white calibration panel. For canopy measurements, the spectroradiometer's fiber optic cable was held 15 cm from the top of the plant canopy, resulting in a ground field of view of 35 cm². For each plant canopy, the spectral reading was an average of 15 scans collected by the instrument.
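The footprint area and the scan averaging described above can be sketched as follows. The 25 degree full-angle field of view is an assumption (the text gives only the 15 cm height and the resulting area), and both function names are hypothetical. A nadir-pointing cone of full angle θ at height h sees a circle of radius h·tan(θ/2).

```python
import math


def footprint_area_cm2(height_cm: float, full_angle_deg: float) -> float:
    """Area of the circular footprint seen by a conical field of view pointed straight down."""
    radius = height_cm * math.tan(math.radians(full_angle_deg) / 2.0)
    return math.pi * radius ** 2


def average_scans(scans: list[list[float]]) -> list[float]:
    """Average several scans wavelength by wavelength."""
    n = len(scans)
    return [sum(values) / n for values in zip(*scans)]


if __name__ == "__main__":
    # Assumed 25 degree field of view at the reported 15 cm height.
    print(footprint_area_cm2(15.0, 25.0))
    # Two toy scans with two wavelengths each (made-up reflectance values).
    print(average_scans([[0.10, 0.50], [0.30, 0.70]]))
```

Under the assumed 25 degree field of view, the computed footprint is close to the 35 cm² stated in the text.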

Classifications
The spectral data were highly correlated according to Pearson correlation analyses. Therefore, classifications were completed with the conditional inference version (cforest) of random forest. It produces more stable variable importance measures than random forest, especially when variables are highly correlated [20]. Cforest also differs from random forest in that it employs conditional inference trees as base learners and the conditional permutation scheme described by [20] [21] to rank variables. To complete the classifications, the decision trees (i.e., the forest) were developed using subsampling without replacement [22]. That procedure reduces the bias in variable importance rankings and is appropriate when potential predictor variables vary in their scale of measurement or their number of categories [22].
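The Pearson correlation check mentioned above can be illustrated with a short sketch. The reflectance values are made up for two neighboring red-edge bands (705 nm and 720 nm) and are not data from this study; the function name is hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up canopy reflectance for five plants at two adjacent bands.
    band_705 = [0.12, 0.15, 0.11, 0.18, 0.14]
    band_720 = [0.30, 0.36, 0.28, 0.41, 0.33]
    print(pearson_r(band_705, band_720))
```

Adjacent narrowbands often move together in this way, which is the situation in which conditional variable importance is preferred over the standard permutation measure.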
Cforest required adjustment of two parameters to develop the classification models: ntree and mtry. Ntree specified the number of decision trees to use in each forest. Mtry specified the number of variables randomly chosen as candidates for deriving each tree split point.
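To make the roles of the two parameters concrete, the sketch below draws the mtry candidate bands considered at the root split of each of ntree trees. It illustrates only the random variable selection, not tree growing or cforest itself; the band labels and function names are hypothetical placeholders for the 24 narrowbands.

```python
import random

N_BANDS = 24  # number of narrowbands evaluated in the study
BAND_NAMES = [f"band_{i:02d}" for i in range(1, N_BANDS + 1)]  # hypothetical labels


def candidate_bands(mtry: int, rng: random.Random) -> list[str]:
    """Draw, without replacement, the mtry bands considered at one node split."""
    return rng.sample(BAND_NAMES, mtry)


def root_candidates_for_forest(ntree: int, mtry: int, seed: int = 0) -> list[list[str]]:
    """Candidate bands for the root split of each of ntree trees."""
    rng = random.Random(seed)
    return [candidate_bands(mtry, rng) for _ in range(ntree)]


if __name__ == "__main__":
    roots = root_candidates_for_forest(ntree=500, mtry=5)
    print(len(roots), roots[0])
    # Setting mtry to the number of bands makes every band a candidate at each split.
    print(len(root_candidates_for_forest(ntree=1, mtry=N_BANDS)[0]))
```

A smaller mtry decorrelates the trees, while mtry equal to the number of variables lets every split see all bands.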
Cforest models were developed and tested based on the following procedures.

Accuracy Assessment of Cforest Models
Accuracy of cforest models was determined with user's, producer's, and overall accuracies [24]. User's accuracy was tabulated by dividing the sum of the cor-

Results
Cforest model parameters are summarized in Table 2 for each classification. The default ntree (500) and mtry (5) values were adequate to use for five out of six classifications. The only exception was the May 14, 2017, Palmer amaranth versus cotton bronze classification. For that classification, the best accuracies were obtained when all variables were considered for splitting a tree node.
Accuracy assessment results of the classifications are shown in Table 3. Overall accuracies were greater than 77% with the lowest overall accuracy achieved (Table 3). Overall, Palmer amaranth user's accuracy was less than its producer's accuracy, whereas the opposite was observed for the cotton classes (Table 3). The highest user's and producer's accuracies for the Palmer amaranth class varied for the classifications (Table 3). For both dates, the user's accuracy of cotton yellow was greater than the user's accuracies of cotton bronze and cotton green. Producer's accuracy varied from one classification to the next for the cotton classes (Table 3). The user's accuracy of the Palmer amaranth class was less than the user's accuracy of the cotton classes; in contrast, the producer's accuracy of the Palmer amaranth class was greater than the producer's accuracy of the cotton classes (Table 3).
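For readers less familiar with these measures, the sketch below computes overall, user's, and producer's accuracy from a confusion matrix, using the conventional definitions: rows hold the classified labels, columns hold the reference labels, user's accuracy divides the diagonal by the row total, and producer's accuracy divides it by the column total. The counts are invented for illustration and are not taken from Table 3; the function name is hypothetical.

```python
def accuracy_report(matrix: list[list[int]], labels: list[str]) -> dict:
    """Overall, user's, and producer's accuracy from a square confusion matrix.

    matrix[i][j] counts samples classified as labels[i] whose reference class is labels[j].
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    report = {"overall": correct / total, "users": {}, "producers": {}}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])                  # everything classified as this class
        col_total = sum(row[i] for row in matrix)   # everything truly in this class
        report["users"][label] = matrix[i][i] / row_total
        report["producers"][label] = matrix[i][i] / col_total
    return report


if __name__ == "__main__":
    labels = ["Palmer amaranth", "cotton"]
    # Invented counts: 18 reference samples per class.
    matrix = [
        [16, 4],   # classified as Palmer amaranth
        [2, 14],   # classified as cotton
    ]
    print(accuracy_report(matrix, labels))
```

In this invented example the Palmer amaranth class has a lower user's accuracy (16/20) than producer's accuracy (16/18), the same direction of difference described above.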
The variable importance rankings of the classifications for both dates are shown in Figure 1 and Figure 2. The results of the December 12, 2016 dataset indicated that 14 to 15 bands were important for differentiating Palmer amaranth from cotton. Those bands encompassed the blue, green, red, red-edge, near-infrared, and shortwave-infrared regions of the spectrum (Figure 1, Table 1). The top-ranked bands were 705 nm (red-edge), 720 nm (red-edge), and 570   Figure 2). Those bands encompassed the blue, green, red, red-edge, near-infrared, and shortwave-infrared regions of the spectrum (Figure 2, Table 1

Discussion
Hyperspectral narrowbands proposed for vegetation and agricultural surveys were evaluated as input into the cforest classification algorithm to differentiate Palmer amaranth from cotton. The spectral data and machine learning algorithm were tested for three different scenarios and for two separate dates (Table 3).
American Journal of Plant Sciences
Overall, the best classification accuracies were achieved for the Palmer amaranth versus cotton yellow classifications with accuracies ranging from 77.3% to 94.4% (Table 3). For December 12, 2016, the second-best classification accuracies were observed for the Palmer amaranth versus cotton bronze classification (Table 3).
The second-best overall accuracy for the May 14, 2017 classifications occurred for Palmer amaranth versus cotton green (Table 3). The user's and producer's accuracy rankings were not consistent for the Palmer amaranth and cotton bronze versus Palmer amaranth and cotton green classifications ( and showing potential of random forest for crop-weed discrimination [8] [9]. Variable importance rankings suggested that 15 or fewer spectral bands were needed for Palmer amaranth-cotton discrimination (Figure 1, Figure 2). Other researchers have also shown that 15 to 30 spectral bands are needed for vegetation and crop surveys [12] [26]. Additionally, one or more of the red-edge bands (i.e., 705 nm, 720 nm) ranked in the top five in variable importance for all classifications. The red-edge region (680-740 nm) of the optical spectrum is the transition zone between red and near-infrared reflectance of vegetation, and represents the change between chlorophyll absorption and light scattering caused by leaf internal structure [28]. Changes in both chlorophyll content and leaf structure are often reflected in the red-edge region of the spectrum. Shifts in the red-edge position are caused by changes in chlorophyll content [29] [30], leaf area index [31], and plant biomass [32].
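One common way to quantify the red-edge position discussed above is to take the wavelength at which the first derivative of reflectance peaks within the 680-740 nm window. The sketch below applies that idea to a synthetic sigmoid spectrum; it is an illustration under stated assumptions, not a calculation performed in this study, and the function names, the sigmoid shape, and its parameters are hypothetical.

```python
import math


def synthetic_reflectance(wavelength_nm: float, inflection_nm: float) -> float:
    """Toy sigmoid rising from low red reflectance to high near-infrared reflectance."""
    return 0.05 + 0.45 / (1.0 + math.exp(-(wavelength_nm - inflection_nm) / 8.0))


def red_edge_position(wavelengths, reflectance, lo=680.0, hi=740.0):
    """Midpoint of the wavelength step with the steepest reflectance rise inside [lo, hi]."""
    best_slope, best_wavelength = float("-inf"), None
    for i in range(len(wavelengths) - 1):
        w0, w1 = wavelengths[i], wavelengths[i + 1]
        if w0 < lo or w1 > hi:
            continue  # only consider steps fully inside the red-edge window
        slope = (reflectance[i + 1] - reflectance[i]) / (w1 - w0)
        if slope > best_slope:
            best_slope, best_wavelength = slope, (w0 + w1) / 2.0
    return best_wavelength


if __name__ == "__main__":
    wavelengths = list(range(650, 781))  # 1-nm sampling, 650 to 780 nm
    for inflection in (705.5, 720.5):
        spectrum = [synthetic_reflectance(w, inflection) for w in wavelengths]
        print(inflection, red_edge_position(wavelengths, spectrum))
```

A canopy with more chlorophyll would be expected to show the steepest rise at a longer wavelength, which is one reason bands such as 705 nm and 720 nm can separate species.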
For this study, Palmer amaranth differed in leaf chlorophyll content and canopy architecture compared to cotton, hence affecting its reflectance. Furthermore, the rankings indicated that visible, near-infrared, and shortwave-infrared bands contributed to the models derived for the classifications (Figure 1, Figure 2; Table 1), further substantiating that differences in leaf pigment, canopy architecture, and leaf water content, respectively, affected the classification results.

Conclusion
This study demonstrated that canopy hyperspectral narrowband data could be used in tandem with the random forest machine learning algorithm to differentiate cotton from Palmer amaranth, an invasive weed of cotton production systems.
This study not only focused on cotton with green leaves, but also evaluated the hyperspectral narrowbands and classification algorithm on cotton with bronze and yellow leaves. Out of the 24 bands evaluated in this study, 15 or fewer were important to Palmer amaranth-cotton discrimination. The next step will be to determine whether random forest (i.e., cforest and other versions of the algorithm) could be used with airborne hyperspectral imaging data to differentiate Palmer amaranth from cotton with different colored leaves. When using airborne imaging systems, the user must consider that in-canopy shadowing, soil background, bi-directional reflectance, spatial resolution of the imagery, and radiometric resolution of the imaging system will influence the spectral response of the feature of interest. Overall, this research further supports using hyperspectral narrowband data and cforest as decision support tools for weed discrimination in cotton production systems.