Estimating Mass of Harvested Asian Seabass Lates calcarifer from Images

A total of 1072 Asian seabass or barramundi (Lates calcarifer) were harvested at two different locations in Queensland, Australia. Each fish was digitally photographed and weighed. A subsample of 200 images (100 from each location) was manually segmented to extract the fish-body area ($S$ in cm$^2$), excluding all fins. After scaling the segmented images to 1 mm per pixel, the fish mass values ($M$ in grams) were fitted by a single-factor model ($M = aS^{1.5}$, $a = 0.1695$), achieving a coefficient of determination ($R^2$) and a Mean Absolute Relative Error (MARE) of $R^2 = 0.9819$ and $\mathrm{MARE} = 5.1\%$, respectively. A segmentation Convolutional Neural Network (CNN) was trained on the 200 hand-segmented images and then applied to the rest of the available images. The CNN-predicted fish-body areas were used to fit the mass-area estimation models: the single-factor model, $M = aS^{1.5}$, $a = 0.170$, $R^2 = 0.9819$, $\mathrm{MARE} = 5.1\%$; and the two-factor model, $M = aS^{b}$, $a = 0.124$, $b = 1.55$, $R^2 = 0.9834$, $\mathrm{MARE} = 4.5\%$.



Introduction
In aquaculture, the economic value of a particular fish species is primarily determined by its mass ($M$). However, weight measurement usually involves manual handling, whilst length can easily be estimated from digital images by identifying the nose and tail of the fish. Therefore, mathematical models were developed to estimate fish mass from its length ($L$). For example, the length-mass power model,

$$M = aL^{b}, \tag{1}$$

was commonly used, where $a$ and $b$ were empirically fitted species-dependent parameters [1] [2]. With the advances in image processing and the widespread availability of low-cost high-definition digital cameras, not only the length but also other fish shape features could be collected automatically and used to estimate the mass. In particular, it was found that the fish image area ($S$) could be used to estimate the fish mass ($M$) via the linear model,

$$M = a + bS, \tag{2}$$

for grey mullet (Mugil cephalus), St. Peter's fish (Sarotherodon galilaeus) and common carp (Cyprinus carpio) [3]. The same area-mass linear model (Equation (2)) was confirmed to be more accurate than the length-mass power model (Equation (1)) for Jade perch (Scortum barcoo) [4], obtaining a coefficient of determination ($R^2$) and a mean absolute relative error (MARE) of $R^2 = 0.99$ and $\mathrm{MARE} = 6\%$, respectively. Even though the linear model (Equation (2)) appeared to perform better than Equation (1) [3] [4], Equation (2) is limited to the range of sufficiently large fish for any non-zero fitted parameter $a$. On the other hand, the area-mass power model,

$$M = aS^{b}, \tag{3}$$

does not exhibit the applicability limitations of Equation (2) and achieved a fit of $R^2 = 0.99$ for Alaskan Pollock (Theragra chalcogramma) [5]. Furthermore, the fitted models had $b \approx 1.5$ [5], which was consistent with the proportional relationships between the fish length ($L \propto S^{1/2}$), width ($W \propto S^{1/2}$) and height ($H \propto S^{1/2}$), and between the fish volume ($V \propto LWH$) and fish mass ($M$), obtaining

$$M = aS^{1.5}. \tag{4}$$

[...] ($M \propto S^{1.5}$) with $R^2 = 0.998$ by [7].
Based on the preceding discussion, the first goal of this work was to establish the area-mass power model for the industrial-scale harvesting of Asian seabass or barramundi (Lates calcarifer) in Queensland, Australia. The goal was successfully accomplished by fitting Equations (3) and (4), as displayed in Figure 3. The second goal of this study was to design a practical image-processing method to extract the fish-body area while excluding the fins, both for enhanced accuracy and for possible applications in modern industrial-scale selective breeding programs [8] [9]. That goal was achieved by training a segmentation neural network in Section 2.2.
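The dimensional argument behind the exponent of 1.5 in the area-mass power model can be written out step by step. The sketch below assumes geometrically similar fish of roughly constant density $\rho$; both are illustrative assumptions introduced here, not fitted quantities.

```latex
\begin{align*}
L \propto S^{1/2},\quad W \propto S^{1/2},\quad H \propto S^{1/2}
  &\;\Rightarrow\; V \propto LWH \propto S^{3/2}, \\
M = \rho V
  &\;\Rightarrow\; M \propto S^{1.5}.
\end{align*}
```

Under these assumptions, doubling the body area multiplies the estimated mass by $2^{1.5} \approx 2.83$.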

Datasets
Two datasets were used in this study. The first was the Barra-Ruler-445 (BR445) dataset used in [10] [11], which is publicly available via [12] and originated from the [9] study. The second was the Barra-Area-600 (BA600) dataset, released to the public domain on publication of this work via [13]. In both datasets, each harvested barramundi (Asian seabass, Lates calcarifer) was digitally photographed, and its weight was measured and recorded against the image file name.
World Journal of Engineering and Technology
All images had a millimeter-graded ruler placed next to the fish; see Figure 1 for examples. The weights ranged from 0.2 kg to 1 kg in BR445, and from 1 kg to 2.5 kg in BA600.
The image scales (in millimeters per pixel) were determined manually by measuring the number of pixels between the end points of the 300 mm ruler present in each image. The BR445 image scales were checked by the automatic ruler-scaling (RS2) algorithm [11]. The BA600 images were taken from the same distance; hence, they had the same scale.
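The manual scale measurement described above reduces to a short calculation. The following is a minimal sketch, assuming the two ruler end points have already been located in pixel coordinates; the function names `mm_per_pixel` and `target_size_at_1mm` and the example coordinates are hypothetical and do not come from the original software.

```python
import numpy as np


def mm_per_pixel(p0, p1, ruler_length_mm=300.0):
    """Image scale from the (x, y) pixel coordinates of the two ruler end points."""
    pixel_distance = float(np.hypot(p1[0] - p0[0], p1[1] - p0[1]))
    if pixel_distance == 0.0:
        raise ValueError("ruler end points must differ")
    return ruler_length_mm / pixel_distance


def target_size_at_1mm(shape_hw, scale_mm_per_px):
    """Height and width in pixels after resampling the image to 1 mm per pixel."""
    height, width = shape_hw
    return int(round(height * scale_mm_per_px)), int(round(width * scale_mm_per_px))


# Hypothetical example: the 300 mm ruler spans 600 pixels horizontally.
scale = mm_per_pixel((100, 50), (700, 50))
new_hw = target_size_at_1mm((1200, 1600), scale)
```

The resulting size could then be passed to any image-resizing routine to bring every image and mask to the common 1 mm per pixel scale.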

Automatic Fish-Body Segmentation
The fins of the fish can contribute significantly to the total fish image area; see typical examples in Figure 1. At the same time, the fins' contribution to the fish mass is negligible. Therefore, ideally, only the fish-body area should be used to estimate the fish mass. For example, using the fish area without the tail fin was found to be more accurate when predicting the mass of Jade perch (Scortum barcoo) [4]. Furthermore, the fins are highly flexible and are more likely to change shape during harvesting, or to be damaged and/or eroded during the production growth cycle.
[...] resulting fish-body binary masks were individually scaled to have the same scale of 1 mm per pixel. In this study, all custom computer programs were written in the Python programming language, which was also used to calculate the fish-body pixel areas. The obtained fish areas and the corresponding measured mass values were fitted via Equation (4), and the results are displayed in Figure 2. The fit achieved $R^2 = 0.9819$ and $\mathrm{MARE} = 5.1\%$, which were comparable to the corresponding results obtained on other fish species [4] [5] [6] [7]. Figure 2 illustrates how the weight of the harvested Asian seabass (Lates calcarifer) could be estimated from the fish area with high accuracy. However, before such an estimation method could be deployed in the aquaculture production environment, a robust automatic body-area extraction algorithm would be required, which is the focus of the rest of this section.
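The fitting step can be illustrated with a minimal sketch on synthetic data. The fitting procedure used in the study is not stated, so the sketch assumes ordinary least squares for the single-factor model and a log-log linear regression for the two-factor model, together with the usual definitions $R^2 = 1 - SS_{res}/SS_{tot}$ and MARE as the mean of $|M_{pred} - M|/M$. All function names and the synthetic areas and masses are hypothetical.

```python
import numpy as np


def fit_single_factor(area_cm2, mass_g, exponent=1.5):
    """Least-squares estimate of a in M = a * S**exponent (closed form)."""
    x = np.asarray(area_cm2, dtype=float) ** exponent
    m = np.asarray(mass_g, dtype=float)
    return float(np.sum(x * m) / np.sum(x * x))


def fit_two_factor(area_cm2, mass_g):
    """Estimate (a, b) in M = a * S**b by linear regression in log-log space."""
    slope, intercept = np.polyfit(np.log(area_cm2), np.log(mass_g), 1)
    return float(np.exp(intercept)), float(slope)


def r_squared(mass_g, predicted_g):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    m = np.asarray(mass_g, dtype=float)
    p = np.asarray(predicted_g, dtype=float)
    return float(1.0 - np.sum((m - p) ** 2) / np.sum((m - m.mean()) ** 2))


def mare(mass_g, predicted_g):
    """Mean absolute relative error, returned as a fraction (0.05 means 5%)."""
    m = np.asarray(mass_g, dtype=float)
    p = np.asarray(predicted_g, dtype=float)
    return float(np.mean(np.abs(p - m) / m))


# Synthetic stand-in data: areas in cm^2 and masses in g with 5% relative noise.
rng = np.random.default_rng(0)
area = rng.uniform(100.0, 500.0, size=200)
mass = 0.17 * area ** 1.5 * (1.0 + rng.normal(0.0, 0.05, size=200))

a1 = fit_single_factor(area, mass)
a2, b2 = fit_two_factor(area, mass)
r2 = r_squared(mass, a1 * area ** 1.5)
err = mare(mass, a1 * area ** 1.5)
```

The log-log regression minimizes relative rather than absolute residuals, so its parameters may differ slightly from those of a direct non-linear least-squares fit.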
The recently developed semantic-segmentation Convolutional Neural Networks (CNN) [14] were highly successful in solving challenges where the segmentation of an image into per-pixel classes was required [11] [14] [15]. As discussed in the introduction, the second primary goal of this study was to design a practical Computer Vision algorithm to extract the fish-body area from images. The Deep Learning neural networks [16] have revolutionized modern Machine Learning, including the field of Computer Vision, and a large number of segmentation Deep Learning CNN models have been proposed. Comparing even the most popular segmentation CNN models was outside the scope of this work.
Instead, the most accurate Fully Convolutional Network from [14], FCN-8s, was used. FCN-8s could be viewed as the modern baseline segmentation CNN model due to its highest citation rate out of all available segmentation CNNs (more than 4000 Google Scholar citations at the time of writing).
The FCN-8s model was implemented [17] in Python utilizing the high-level neural networks Application Programming Interface (API) Keras [18] together with the machine-learning Python package TensorFlow [19]. The FCN-8s model is a general features-to-segmentation decoder CNN, which requires an image-to-features CNN encoder. The original FCN-8s [14] was built with the VGG16 [20] convolutional layers as the encoder. The VGG16 model within Keras was trained to recognize 1000 different ImageNet [21] object classes and is commonly referred to as ImageNet-trained. ImageNet-trained CNN models were often more accurate than randomly initialized CNN models when they were further trained to recognize new object classes [22]. Therefore, the convolutional layers of the ImageNet-trained VGG16 model were used to build our version of the FCN-8s model, referred to as the Fish Area Segmentation (FAS) model hereafter.
Figure 2. Relation between the measured fish weight ($M$ in g) and the segmented-by-hand fish-body image area ($S$ in cm$^2$) fitted by: Equation (4) as [...]
The FAS model was loaded with the relevant VGG16 weights, facilitating the knowledge transfer [22], while the remaining convolutional and de-convolutional FCN-8s layers were initialized from the uniform distribution as per [23]. Furthermore, the first two FCN-8s decoder layers had their number of neurons reduced to 512, compared to the 4096 neurons of the original FCN-8s in [14]. Such a drastic reduction was justified by the requirement to recognize and segment only a single class of objects, i.e. the fish body. The sigmoid activation function was used in the last layer.
The described 200 images together with the corresponding hand-segmented body masks were used to train the FAS. The 200 image-mask pairs were randomly split 80%-20%, where 80% of the pairs were used as the actual training set and the remaining 20% were used as the validation set to assess the training process. Since the training set had such a small number of images, the encoding VGG16 layers in FAS were fixed and excluded from training. The remaining trainable weights (excluding biases) were regularized by a weight decay set to [...]. The training and validation images as well as the masks were rescaled to 1 mm per pixel. Then each image-mask pair was extensively augmented for each epoch of training, i.e. one pass through all available training and validation images. Specifically, the python-opencv package was used to perform the augmentations, where each image and, if applicable, the corresponding binary mask were:
• randomly rotated in the range of [−180, +180] degrees;
• randomly scaled vertically in the range of [0.8, 1] and, independently, horizontally within the same range;
• randomly cropped to retain 480 × 480 pixels;
• randomly shifted in each color channel within the ±12.5 range;
• randomly flipped horizontally and vertically;
• reduced by the ImageNet color mean values, as required when working with the VGG16 model.
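A subset of the listed augmentations can be sketched with NumPy alone. This is an illustrative sketch, not the python-opencv code used in the study: it covers only the random crop, flips, color shift and mean subtraction, and leaves out the random rotation and anisotropic scaling, which would need an affine warp. The function name `augment_pair`, the 0.5 flip probability, the RGB channel order and the per-channel mean values (the ones commonly quoted for ImageNet-trained VGG16) are assumptions.

```python
import numpy as np

# Commonly quoted ImageNet per-channel means in RGB order (an assumption here).
IMAGENET_MEAN_RGB = np.array([123.68, 116.779, 103.939], dtype=np.float32)


def augment_pair(image, mask, rng, crop=480, shift=12.5):
    """Apply the same random crop and flips to an image and its binary mask,
    then shift the image colors and subtract the ImageNet means.

    image: H x W x 3 uint8 array (RGB assumed); mask: H x W binary array.
    """
    height, width = mask.shape
    if height < crop or width < crop:
        raise ValueError("image is smaller than the crop size")
    top = int(rng.integers(0, height - crop + 1))
    left = int(rng.integers(0, width - crop + 1))
    img = image[top:top + crop, left:left + crop].astype(np.float32)
    msk = mask[top:top + crop, left:left + crop]
    if rng.random() < 0.5:          # horizontal flip
        img, msk = img[:, ::-1], msk[:, ::-1]
    if rng.random() < 0.5:          # vertical flip
        img, msk = img[::-1], msk[::-1]
    # Independent random shift of each color channel (image only).
    img = img + rng.uniform(-shift, shift, size=3).astype(np.float32)
    img = img - IMAGENET_MEAN_RGB
    return np.ascontiguousarray(img), np.ascontiguousarray(msk)


# Hypothetical usage on a random image with a rectangular "body" mask.
rng = np.random.default_rng(0)
example_image = rng.integers(0, 256, size=(600, 640, 3), dtype=np.uint8)
example_mask = np.zeros((600, 640), dtype=np.uint8)
example_mask[200:400, 100:500] = 1
aug_image, aug_mask = augment_pair(example_image, example_mask, rng)
```

Geometric operations are applied identically to the image and the mask, while the color operations touch only the image, so the mask stays binary.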
To assist better segmentation, the following loss function was adopted, [...] (5)

Results and Discussion
Multiple training sessions with different random train/validation splits produced very similar results. The FAS model and its training procedure exhibited negligible over-fitting, as demonstrated by the comparable final training and validation loss values (mean of Equation (5)) of 0.063 ± 0.001 and 0.072 ± 0.003, respectively. The training and validation per-pixel accuracies were 0.9945 ± 0.0005 and 0.9935 ± 0.0005, respectively. The trained FAS model was applied to all available images (scaled to 1 mm per pixel), including the 200 images used for training. By design, FAS could be applied to images of any size. However, in practice, it was significantly faster to zero-pad the available images to a fixed 640 × 640 shape and then feed them into FAS for prediction, where the 640 × 640 square was large enough to fit all available scaled images. For each image, the prediction heat-map of pixel values in the [0, 1] range was further processed by setting values above 0.51 to one (i.e. predicted as body pixels) and the rest to zero (i.e. background pixels). The largest connected non-zero region in each image was accepted as the final fish-body segmentation, and its area in pixel$^2$ (i.e. mm$^2$) was calculated. Overlapping fish and/or multiple fish per image were outside the scope of this work.
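The post-processing steps above (zero padding, thresholding, keeping the largest connected region and measuring its area) can be sketched as follows. This is a minimal pure-NumPy illustration under stated assumptions: padding is placed at the bottom and right, regions are 4-connected, and the mass estimate uses the single-factor coefficient $a = 0.170$ quoted in the abstract. The function names and the synthetic heat-map are hypothetical.

```python
from collections import deque

import numpy as np


def pad_to_square(image, size=640):
    """Zero-pad an image (H x W or H x W x C) at the bottom/right to size x size."""
    height, width = image.shape[:2]
    if height > size or width > size:
        raise ValueError("image does not fit into the padded square")
    padded = np.zeros((size, size) + image.shape[2:], dtype=image.dtype)
    padded[:height, :width] = image
    return padded


def largest_region(binary):
    """Mask of the largest 4-connected non-zero region (breadth-first search)."""
    height, width = binary.shape
    seen = np.zeros((height, width), dtype=bool)
    best = []
    for r0, c0 in zip(*np.nonzero(binary)):
        if seen[r0, c0]:
            continue
        seen[r0, c0] = True
        queue, region = deque([(int(r0), int(c0))]), []
        while queue:
            r, c = queue.popleft()
            region.append((r, c))
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < height and 0 <= nc < width
                        and binary[nr, nc] and not seen[nr, nc]):
                    seen[nr, nc] = True
                    queue.append((nr, nc))
        if len(region) > len(best):
            best = region
    out = np.zeros((height, width), dtype=np.uint8)
    if best:
        rows, cols = zip(*best)
        out[list(rows), list(cols)] = 1
    return out


def body_area_mm2(heatmap, threshold=0.51):
    """Body area in pixels, i.e. mm^2 when the image scale is 1 mm per pixel."""
    return int(largest_region(heatmap > threshold).sum())


# Synthetic heat-map: one large blob, one small blob, one below-threshold blob.
heatmap = np.zeros((640, 640), dtype=np.float32)
heatmap[100:200, 100:400] = 0.9
heatmap[300:310, 300:310] = 0.8
heatmap[500:520, 500:520] = 0.5

area_mm2 = body_area_mm2(heatmap)
area_cm2 = area_mm2 / 100.0            # 1 cm^2 = 100 mm^2
mass_g = 0.170 * area_cm2 ** 1.5       # single-factor model, a from the abstract
```

In a production setting, a library routine for connected-component labelling would likely be faster than the explicit search shown here.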
It took 2-3 hours to train FAS on an Nvidia GTX 1080Ti GPU. However, once trained, the FAS model was fast enough to process 640 × 640 images at a rate of 30 images per second on the same GPU, and therefore it could even be deployed in aquaculture production, processing a video feed in real time. All predicted [...]
Figure 1. Examples of images from the BR445 (left column) and BA600 (right column) datasets.
[...] MARE = [...]. Higher density of data points is denoted by lighter color.
[...] that both the training and validation images were also augmented by the preceding augmentation pre-processing steps in order to prevent the indirect fitting of the validation images.