Comparison of Deep Learning Architectures for Late Blight and Early Blight Disease Detection on Potatoes

Potato late blight and early blight are persistent threats to the long-term production of potatoes, impacting many farmers around the world, particularly in Africa. Early detection and treatment of potato blight diseases are critical for promoting healthy potato plant growth and ensuring adequate supply and food security for the fast-growing population. Machine-driven disease detection systems may therefore overcome the constraints of traditional leaf disease diagnosis procedures, which are generally time-consuming, inaccurate, and costly. Convolutional Neural Networks (CNNs) have proven effective in a variety of agricultural applications, including plant disease detection, because of their capacity to analyze vast volumes of data quickly and reliably. However, the method has not been widely applied to the detection of potato late blight and early blight diseases, which reduce yields significantly. The goal of this study was to compare six cutting-edge CNN architectural models, using transfer learning for training and tuning four hyperparameters. The CNN architectures evaluated were AlexNet, GoogleNet, SqueezeNet, DenseNet121, EfficientNet b7, and VGG19; the hyperparameters analyzed were the number of epochs, the batch size, the optimizer, and the learning rate. An open-source dataset containing 4082 images was used. The DenseNet121 architecture with a batch size of 32 and a Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.01 produced the best performance, with an accuracy of 98.34% and an f1-score of 97.37%. The DenseNet121 model was thus shown to be useful for developing computer vision systems that aid farmers in improving their disease management for potato cultivation.


Introduction
Contemporary society is concerned about food security issues due to the continual increase in population, rural to urban migration, climate change, and the reduction of cultivable land caused by increasing industrialization and urbanization. The agricultural sector remains important to the socio-economic development of Africa, contributing 32% of GDP. About 80% of agricultural output comes from smallholder farmers, and the sector employs nearly 65% of the population. Low productivity, which characterizes agricultural production, remains a major concern in many African countries [1].
Potato is the third most important food crop in terms of global consumption [2]. The improvement of the potato production system in sub-Saharan Africa can be a pathway out of poverty. Potato has a short cropping cycle and produces a large amount per unit area in a short period (International Potato Centre, Sub-Saharan Africa 2020). However, diseases such as early blight, late blight (LB), bacterial wilt (BW), and viruses reduce the production of smallholder potato farmers in sub-Saharan Africa. Potato late blight and early blight affect both the quality and quantity of the potatoes, causing direct crop loss. They are leaf spot diseases caused by the oomycete Phytophthora infestans and the fungus Alternaria solani respectively, and cause average yield losses of between 30% and 75% [3]. Typically, small-scale farmers continuously use fungicides to combat these diseases, but this practice creates a dependency on pesticides and compromises human health and the environment [4]. Furthermore, regulators such as the European Union (EU) are enacting increasingly stringent chemical usage requirements for agricultural products entering their markets [5].
Early diagnosis of plant diseases plays an important role in improving agricultural yield. Disease-infected plants typically have visible markings or lesions on their leaves, stems, flowers, and/or fruits. In general, each disease state has a distinct visual pattern that can be utilized to diagnose it. Small round or irregular dark-brown to black spots on the older (lower) leaves are the first signs of early blight (Figure 1). These spots can grow up to 3/8 inch in diameter and become angular-shaped over time [6]. Small, light to dark green, round to irregularly shaped water-soaked spots are the earliest signs of late blight in the field (Figure 1). The lowest leaves are frequently the first to show these symptoms [7]. Plant diseases have traditionally been diagnosed by human experts. This is, however, costly, time-consuming, and, in some situations, unworkable; therefore, farmers are not able to respond quickly and accurately [8]. This has prompted studies that utilize deep learning, particularly in image processing, for the early detection and management of diseases in agriculture [9]. The main contribution of this research was to determine the Convolutional Neural Network (CNN) architecture and hyperparameters that may be suitable for deployment in conventional as well as mobile/embedded computing environments for disease detection on potatoes in the field. To achieve this, transfer learning and fine-tuning were applied to six (6) state-of-the-art Convolutional Neural Networks (AlexNet, EfficientNet b7, GoogleNet, SqueezeNet, DenseNet121, and VGG19) to identify the hyperparameters that best influenced the training of architectures for late blight and early blight disease identification on potato leaves. The hyperparameters analyzed were the number of epochs, batch size, optimizer, and learning rate.
The rest of the paper is structured as follows: Section 2 provides an overview of related studies. The tests and six state-of-the-art CNN architectures are introduced in Section 3. The results are presented in Section 4. Section 5 is where the discussion takes place, and Section 6 is where the conclusions are reported.

Related Works
Convolutional Neural Networks (CNNs) are a type of artificial neural network used to interpret visual imagery in deep learning. CNNs are used in a variety of agricultural applications, including crop type classification and the detection of diseases on plant leaves [10] [11]. The use of CNNs for disease detection has been tested in several studies. KC et al. [12] used different CNN architectures to detect 58 diseases of 25 types of plants with a success rate of 99.5%. Fuentes et al. [13] compared the ability of several CNN architectures to recognize nine types of diseases and pests in tomato plants. Mohanty et al. [14] likewise applied deep CNNs to classify plant diseases from leaf images. In the case of potatoes, studies have employed CNNs for the detection and categorization of disease. Islam et al. [15] created a classifier that could distinguish healthy leaves from those affected by late and early blight diseases using 300 potato images drawn from the PlantVillage dataset [16]; a multiclass SVM was used to classify leaf images into three groups based on ten color and textural attributes. Benchmarking these models with evaluation metrics suited to an imbalanced dataset would have been beneficial.
Generally, many researchers use accuracy as a criterion for selecting the best CNNs. However, accuracy has various flaws, including reduced uniqueness, discriminability, and informativeness, and a bias towards data from the majority class [20]. Tiwari et al. 2020 [21] fine-tuned (transfer learning) pre-trained models like VGG19 for late blight and early blight disease detection on potato leaves.
The model was fine-tuned to extract relevant characteristics from the dataset. The results were then analyzed using multiple classifiers, with logistic regression outperforming the others, achieving 97.8% classification accuracy on the test dataset. Afzaal et al. [24] trained and compared 3 CNN models, namely GoogleNet, VGGNet, and EfficientNet, for accurate identification of early blight disease on potatoes at different growth stages using the PyTorch framework. The results showed that EfficientNet was the most effective model. Few studies present models for potato late blight and early blight disease identification. This study addresses that gap by comparing architectures for late blight and early blight identification on a large dataset.

Dataset Description
The dataset was downloaded from [25] and accessed on 12 November 2021. The dataset contained 4082 images of potato leaves, distributed across 3 class labels: Healthy, Late Blight, and Early Blight. Figure 2 shows a sample of the three classes of potato images obtained from Kaggle. Low-quality JPG images were used because of their capacity to represent real-world scenarios such as the presence of noise, contrast, and blur [17].

Convolutional Neural Networks
CNNs are built by superimposing convolutional layers that apply a set of local filters across the dimensions of the input data. These networks detect patterns by calculating local correlations using kernels whose parameters are determined by the learning process. Kernels are small filters that slide over the input to extract meaningful information with fewer dimensions. The output of a layer, called the feature map, is often passed to a pooling layer that reduces the dimensions of this map by sub-sampling it, giving the result the property of translation invariance. The reduction of dimension is generally achieved by taking the maximum or the average of values over a set of pixels. Stacking convolution and pooling layers at various scales can detect larger and more complex patterns [26]. These mechanisms are summarized in Figure 3.
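The convolution-then-pooling stacking described above can be sketched in a few lines of PyTorch (an illustrative fragment, not part of the models evaluated in this study; the channel counts are arbitrary):

```python
import torch
import torch.nn as nn

# One convolution + pooling stage: the convolution applies local filters to
# produce a feature map, and max pooling sub-samples that map, halving its
# spatial dimensions and providing a degree of translation invariance.
block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
)

x = torch.randn(1, 3, 224, 224)  # one RGB image
y = block(x)
print(y.shape)  # torch.Size([1, 16, 112, 112])
```

Deeper networks repeat this pattern, so later stages see progressively larger and more abstract patterns.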
It is time-consuming to collect images belonging to a specific area of interest and train a classifier from scratch. Transfer learning makes it possible to overcome this challenge by using a pre-trained model and changing its last few layers. This helps to achieve good results even with a small dataset, since the basic image features have already been learned by the pre-trained model from a much larger dataset. In this study, AlexNet, GoogleNet, SqueezeNet, EfficientNet-b7, VGG19, and DenseNet121 were used for transfer learning since they have shown high accuracy on earlier benchmark datasets.
Figure 3. Overall mechanism of the CNN architecture.

AlexNet Architectures
The AlexNet architecture [27] consists of 5 convolutional layers and 3 fully connected layers. The first convolutional layer consists of 96 filters of size 11 × 11 and uses a stride of 4 pixels. The second layer comprises 256 filters of size 5 × 5. Layers three and four use 384 filters each of size 3 × 3, and the last convolutional layer has 256 filters of size 3 × 3. Finally, the 3 fully connected layers have 4096, 4096, and 1000 neurons respectively. The number of neurons in the final layer is often modified to suit the problem.

GoogleNet Architectures
GoogleNet is a 22-layer deep convolutional neural network. GoogleNet was built by Google to improve the performance of deep neural networks in terms of both speed and precision. Its evolution and iterative improvement led to several versions of the network: Inception-v1 [28], Inception-v2 [29], Inception-v3 [29], and Inception-v4 [30] are the most popular. The Inception architecture consists of 4 branches concatenated in parallel. The first branch consists of a 1 × 1 kernel convolution, followed by two 3 × 3 convolutions. The second branch comprises a 1 × 1 convolution, followed by a 3 × 3 convolution. The third branch has a pooling layer, followed by a 1 × 1 convolution, and the fourth branch is a single 1 × 1 convolution.
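The four-branch structure described above can be sketched as an Inception-style module in PyTorch (channel counts here are illustrative, not GoogleNet's exact configuration):

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Four parallel branches whose outputs are concatenated channel-wise."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Sequential(                 # 1x1, then two 3x3 convolutions
            nn.Conv2d(in_ch, 16, 1),
            nn.Conv2d(16, 24, 3, padding=1),
            nn.Conv2d(24, 24, 3, padding=1),
        )
        self.b2 = nn.Sequential(                 # 1x1, then one 3x3 convolution
            nn.Conv2d(in_ch, 16, 1),
            nn.Conv2d(16, 32, 3, padding=1),
        )
        self.b3 = nn.Sequential(                 # pooling, then 1x1 convolution
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, 16, 1),
        )
        self.b4 = nn.Conv2d(in_ch, 16, 1)        # single 1x1 convolution

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

out = InceptionBlock(64)(torch.randn(1, 64, 28, 28))
print(out.shape)  # torch.Size([1, 88, 28, 28]) -- 24 + 32 + 16 + 16 channels
```

Because every branch preserves the spatial size, the outputs can be concatenated along the channel dimension.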

SqueezeNet
The SqueezeNet architecture is a smaller network that was created as a more compact alternative to AlexNet [31]. It consists of 26 convolutional layers, has over 50 times fewer parameters than AlexNet, and performs three times faster. SqueezeNet takes advantage of a squeeze convolution layer (which has only 1 × 1 filters) feeding into an expand layer that has a mix of 1 × 1 and 3 × 3 convolution filters, and chains a number of these modules together to arrive at a smaller model.
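The squeeze/expand module described above (SqueezeNet's "fire" module) can be sketched as follows; the channel sizes are illustrative rather than SqueezeNet's exact configuration:

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """A squeeze layer of 1x1 filters feeding an expand layer of 1x1 and 3x3 filters."""
    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.relu(self.squeeze(x))           # few channels: this is where parameters are saved
        return torch.cat([self.relu(self.expand1x1(s)),
                          self.relu(self.expand3x3(s))], dim=1)

out = Fire(96, squeeze_ch=16, expand_ch=64)(torch.randn(1, 96, 55, 55))
print(out.shape)  # torch.Size([1, 128, 55, 55])
```

The parameter saving comes from the squeeze layer: the expensive 3 × 3 filters operate on 16 channels instead of 96.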

EfficientNet-b7
EfficientNet is a convolutional neural network architecture and scaling method that uniformly scales a network's width, depth, and resolution with a set of fixed scaling coefficients to improve its performance [32]. This is unlike conventional practice, which scales these factors arbitrarily.
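The compound scaling rule can be written as follows, where φ is a user-chosen coefficient and the constants α, β, γ are found by a small grid search (the values shown are those reported for the baseline network in the EfficientNet paper [32]):

```latex
d = \alpha^{\varphi}, \quad w = \beta^{\varphi}, \quad r = \gamma^{\varphi}
\quad \text{s.t.} \quad \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2,
\quad \alpha \ge 1,\; \beta \ge 1,\; \gamma \ge 1
\qquad (\alpha = 1.2,\; \beta = 1.1,\; \gamma = 1.15)
```

Here d, w, and r scale depth, width, and input resolution; the constraint keeps the FLOPs of each scaling step roughly doubled. EfficientNet-b7, used in this study, corresponds to a large value of φ.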

VGG19
The VGG19 architecture consists of 19 weight layers: 16 convolutional layers using small 3 × 3 filters, followed by 3 fully connected layers.

Hyperparameters
Optimization of a model is the process of changing the hyperparameters of the model to improve its performance on a given task. The hyperparameters tuned in this study were the number of epochs, the batch size, the optimizer, and the learning rate.
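As a sketch of how these four hyperparameters define a search space (a hedged illustration, not the authors' exact experimental script; the model here is a placeholder for the fine-tuned CNN), a grid over them in PyTorch looks like:

```python
import itertools
import torch
import torch.nn as nn

# Illustrative grid over the four factors studied: epochs, batch size,
# optimizer, and learning rate. Values mirror those discussed in the paper.
epochs_grid = [10, 25]
batch_sizes = [16, 32, 64]
optimizer_names = ["sgd", "adam"]
learning_rates = [0.1, 0.01, 0.001]

def make_optimizer(name, params, lr):
    if name == "sgd":
        return torch.optim.SGD(params, lr=lr)
    return torch.optim.Adam(params, lr=lr)

for epochs, bs, opt_name, lr in itertools.product(
        epochs_grid, batch_sizes, optimizer_names, learning_rates):
    model = nn.Linear(10, 3)  # placeholder for the fine-tuned CNN
    optimizer = make_optimizer(opt_name, model.parameters(), lr)
    # ... build a DataLoader with batch_size=bs, train for `epochs` epochs,
    # then record validation accuracy and f1-score for this configuration.
```

Each configuration is trained and evaluated independently, and the combination with the best validation f1-score is retained.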

Experimental Framework
It takes a lot of computing power to train modern CNN systems, and a thorough assessment of the proposed system requires more than a single performance indicator. Confusion matrices were therefore utilized to investigate the disease classification system's performance. The confusion matrix was used to describe the performance of the CNNs because of its ability to accurately measure the performance of a model with two or more classes: it checks how often the model's predictions match reality in classification problems.
Each row in the matrix corresponds to a predicted class and each column corresponds to an actual class. The model makes a prediction for each sample of a test dataset, and based on these predictions and the expected results, the matrix indicates the number of correct and incorrect predictions for each class. This allows the evaluation metrics of sensitivity, accuracy, specificity, precision, and f1-score to be calculated.
The sensitivity, also known as the recall, is the proportion of potato leaves that were accurately labeled as positive to those that were truly positive. Equation (1) was used to compute the sensitivity, where TP stands for true positives, the number of positive cases that are correctly identified, and FN stands for false negatives, the number of positive cases that are incorrectly classified as negative.

Sensitivity = TP / (TP + FN) (1)
Specificity is defined as the conditional probability of true negatives given a secondary class, which approximates the probability of the negative label being true; it is represented by Equation (3), where TN stands for true negatives and FP for false positives.

Specificity = TN / (TN + FP) (3)
Precision, defined as the number of images that were correctly classified as positive out of all predicted positives, is given by Equation (4). It assesses the algorithm's prediction ability: precision refers to how "accurate" the model is in terms of how many of the predicted positives are actually positive.

Precision = TP / (TP + FP) (4)
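The per-class metrics above can be computed directly from a confusion matrix whose rows are predicted classes and columns are actual classes, as in this paper. The following is a minimal sketch; the 3 × 3 matrix is made-up example data, not the study's results:

```python
import numpy as np

# Rows: predicted (early blight, late blight, healthy); columns: actual.
cm = np.array([[50,  3,  1],
               [ 2, 45,  4],
               [ 1,  2, 40]])

for k in range(cm.shape[0]):          # one-vs-rest metrics for each class k
    tp = cm[k, k]
    fp = cm[k, :].sum() - tp          # predicted k, actually another class
    fn = cm[:, k].sum() - tp          # actually k, predicted another class
    tn = cm.sum() - tp - fp - fn
    sensitivity = tp / (tp + fn)      # Equation (1)
    specificity = tn / (tn + fp)      # Equation (3)
    precision = tp / (tp + fp)        # Equation (4)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    print(f"class {k}: sens={sensitivity:.3f} spec={specificity:.3f} "
          f"prec={precision:.3f} f1={f1:.3f}")
```

Overall accuracy is the trace of the matrix divided by its sum, and the reported per-class scores can be averaged to summarize a model.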

Results
This section compares six state-of-the-art CNN architectures in order to determine the best model for detecting late blight and early blight diseases in potatoes. The goal was to compare CNN models based on their accuracy, precision, sensitivity, specificity, and F1-score. Accuracy is defined as the ratio of correctly labeled images to the total number of samples, and the F1-score, given by Equation (5), is a useful performance statistic, especially when there is an unequal distribution of classes.

F1-score = 2 × (Precision × Sensitivity) / (Precision + Sensitivity) (5)

The dataset utilized in this work contained 1628 images of early blight, 1434 images of late blight, and 1020 images of healthy leaves. As a result, the model/optimizer combination with the highest f1-score was deemed the most suitable architecture for the detection of potato late blight and early blight diseases in the field.

Validation accuracy ranged from 81.36% to 98.34% when using SGD as the optimizer, while f1-score values ranged from 56.67% to 97.37% across all CNNs. Table 2 shows that DenseNet and VGG19 had the best accuracy for the evaluation with 25 epochs, both reaching 98.34%, with learning rates of 0.001 and 0.01 respectively. Looking at the f1-score, DenseNet (97.38%) had the best performance with 25 epochs. Furthermore, for 10 epochs DenseNet once again performed comparatively better, with the highest accuracy (97.84%) and f1-score (96.63%) for a learning rate of 0.01. On the other hand, the lowest performance recorded for 10 and 25 epochs was SqueezeNet, with accuracies of 81.23% and 81.36% and f1-scores of 57.02% and 51.46% respectively, for a learning rate of 0.1. According to the specificity metrics, the majority of the models performed well.

Discussion
The optimizer plays a key role in minimizing the error function, allowing the model to conform to the instances in the training set. Across both the Adam and SGD optimizers, the validation accuracy and F1-score of the CNNs in detecting early blight and late blight disease ranged from 81.23% to 98.51% and from 56.67% to 97.37% respectively. It was observed that training the networks with Adam as the optimizer increased the accuracy of all models to between 89.46% and 98.51%, as shown in Table 3. However, the GoogleNet architecture may have contributed to poor validation accuracy under the Adam optimizer, since it starts with a large receptive field to decrease computing requirements [12]. These findings are consistent with those of Russakovsky et al. After tuning the hyperparameters, it was also observed that the highest accuracy and f1-score for detection of potato late blight and early blight disease were obtained from DenseNet (97.37%).

The findings reveal that a larger batch size does not always result in high accuracy, and that the learning rate and optimizer used can have a major impact. Lowering the learning rate and batch size helps the network train more efficiently, especially when fine-tuning. Our findings are consistent with those of Masters et al. [35], who suggested that smaller batch sizes be employed. Radiuk et al. [36] claim that when a high learning rate is utilized, the larger the batch size, the better the CNN's performance. While we do not encourage using large batch sizes in our research, Radiuk's findings are consistent with ours regarding the relationship between batch size and learning rate: we specifically observed that higher learning rates necessitate bigger batch sizes. Finally, Bengio et al. [37] suggested that a batch size of 32 is a good place to start. While our trials back this up (a batch size of 32 produced decent results), the best results were obtained with a batch size of 16.
The lowest performance recorded for 10 and 25 epochs was SqueezeNet, with an accuracy of 73.21% and an f1-score of 50.86% with the SGD optimizer, for a learning rate of 0.1 and a batch size of 64. The best performance (98.34% validation accuracy and 97.37% f1-score) was obtained using a combination of batch size 32, learning rate 0.01, and 25 epochs. Table 4 compares this study to similar studies in which the best-performing CNN designs have been reported. Yen Lee et al. [19] focused solely on constructing a CNN architecture for disease detection on potatoes, achieving 92% accuracy. Similarly, Tiwari et al. [21] used fine-tuning (transfer learning) to extract significant characteristics from the dataset using pre-trained models like VGG19; the results were then analyzed using multiple classifiers, with logistic regression outperforming the others by a significant margin, achieving 97.8% classification accuracy over the test dataset. A multiclass SVM was used by Islam et al. [15] to divide leaf images into three classes based on ten color and textural variables; the cross-validated accuracy of the system was 93.7%. Finally, this study evaluated the performance of six CNNs using transfer learning and tuning four hyperparameters, with the DenseNet121 model achieving the best performance.

Conclusion
The agriculture industry can benefit considerably from the use of CNNs in image classification to boost yield production.