Multi-Sensor Image Fusion: A Survey of the State of the Art

Image fusion has developed into an important area of research. In remote sensing, the use of the same image sensor in different working modes, or of different image sensors, can provide reinforcing or complementary information. It is therefore highly valuable to fuse the outputs of multiple sensors (or of the same sensor in different working modes) to improve the overall quality of remote sensing images, which is very useful for human visual perception and for image processing tasks. Accordingly, in this paper, we first provide a comprehensive survey of the state of the art of multi-sensor image fusion methods in terms of three aspects: pixel-level fusion, feature-level fusion and decision-level fusion. An overview of existing fusion strategies is then introduced, after which the existing fusion quality measures are summarized. Finally, this review analyzes the development trends in fusion algorithms that may encourage researchers to further explore this field.


Introduction
In the late 1970s, with the emergence and development of image sensors, multi-sensor information fusion gave rise to the branch of image fusion, an emerging research field that combines sensors, signal processing, image processing, and artificial intelligence, taking images as the research object within information fusion. This approach combines the image information about the same scene obtained either by multiple image sensors or by the same image sensor in different working modes to obtain a new and more accurate description of the texture characteristics and structural information of features. Some imaging modalities, such as infrared, also work well in all-weather and all-day/night conditions, but their visual effect is relatively poor when compared to visible images [3]. In addition, multispectral (MS) images have low spatial resolution and high spectral density, while panchromatic (PAN) images have high spatial resolution and low spectral density [4]. Due to the redundancy and complementarity between the image information obtained by different image sensors (or the same image sensor in different working modes), a more comprehensive and accurate image description of a given scene can be obtained by fusing multiple source images than from any individual input image (as illustrated in Figure 1) [5]; this approach overcomes the limitations of and differences between the geometric, spectral and spatial resolution of single-sensor images, improves image clarity and comprehensibility, and provides more effective information for subsequent image processing tasks (e.g. image segmentation [6] [7], classification [8], saliency detection [9] [10], target detection and recognition [11] [12], localization [13], medical diagnosis [14], surveillance [15], energy monitoring [3] [16], agricultural applications [17] and military applications [18]). Driven by these applications, more and more satellites are acquiring remote sensing images of observation scenes with differing spatial, spectral and temporal resolutions. A similar situation has arisen in other fields such as medical imaging [20].
After investigating the early image fusion methods, Burt and Adelson introduced a novel image fusion algorithm based on hierarchical image decomposition [21]. In order to further improve stability and noise resistance, as well as to resolve the pathological condition arising from patterns with opposite contrast, [22] improved the pyramid fusion method. Pohl and van Genderen subsequently described and explained the mainly pixel-based image fusion of Earth observation satellite data as a contribution to multi-sensor integration-oriented data processing [5]. A categorization of the multi-scale decomposition-based image fusion schemes was later presented in [23].

Pixel-Level Fusion
Of the three levels of image fusion, pixel-level image fusion is the lowest level.
Compared with feature-level and decision-level fusion, pixel-level image fusion involves the direct fusion of the source image pixels, under strict registration conditions, according to a fusion rule. This approach retains as much of the scene information of the source images as possible and offers high precision; accordingly, it can be used to improve the sensitivity and signal-to-noise ratio of the signal, thereby facilitating subsequent image analysis, processing and understanding (e.g. feature extraction, image segmentation, scene analysis/monitoring, target recognition, image recovery, etc.). For example, in remote sensing applications, fused images with a high spatial resolution and the spectral content of the multi-spectral (MS) images can be obtained by fusing low-resolution MS images with high-resolution panchromatic (PAN) images, which is useful for land-cover classification.
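To make the mechanics concrete, the following minimal Python sketch fuses two strictly registered grayscale images with a simple weighted average, the most basic pixel-level rule; the random stand-in images and the weight value are illustrative assumptions, not part of any surveyed method.

```python
# Minimal pixel-level fusion sketch: a weighted average of two registered
# grayscale images. Inputs and the weight w are illustrative assumptions.
import numpy as np

def weighted_average_fusion(img_a: np.ndarray, img_b: np.ndarray,
                            w: float = 0.5) -> np.ndarray:
    """Fuse two strictly registered images of identical shape."""
    assert img_a.shape == img_b.shape, "pixel-level fusion requires registration"
    fused = w * img_a.astype(np.float64) + (1.0 - w) * img_b.astype(np.float64)
    return np.clip(fused, 0, 255).astype(np.uint8)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # stand-in source A
    b = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # stand-in source B
    print(weighted_average_fusion(a, b).shape)
```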
However, pixel-level fusion has several key disadvantages. Firstly, it requires high registration accuracy between the source images; in general, pixel-level registration should be achieved. In addition, pixel-level image fusion involves a large amount of data and is encumbered by slow processing speed and poor real-time performance.
Due to the diversity of source images, along with the diversity of practical fusion applications, a wide range of pixel-level fusion methods have been developed [20]. Table 1 presents a summary of the major pixel-level image fusion methods, along with the transforms and fusion strategies they adopt; more details are presented below.

Multi-Scale Decomposition-Based Fusion Methods
Multi-scale decomposition-based fusion methods use a multi-scale decomposition method to decompose multisource images into different scales and resolutions, and consequently to obtain low-frequency sub-bands containing the image energy information and high-frequency sub-bands containing detail information at different scales; in the next step, fusion is performed according to different fusion rules on the low-frequency and high-frequency sub-bands respectively; finally, multi-scale reconstruction is performed on the fused sub-band images to obtain the final fused image. A schematic diagram of the image fusion scheme based on general multi-scale decomposition is illustrated in Figure 3. Another key factor that affects the fusion results is the fusion strategy: the process that determines how the fused image is formed from the coefficients or pixels of the source images.
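As a concrete illustration of this decompose-fuse-reconstruct scheme, the sketch below uses a discrete wavelet transform (via the PyWavelets library) with two common rules: averaging for the low-frequency sub-band and absolute-maximum selection for the high-frequency sub-bands. The wavelet and decomposition level are illustrative choices, not prescriptions from the surveyed methods.

```python
# Sketch of multi-scale decomposition-based fusion with a wavelet transform:
# decompose both sources, fuse low frequencies by averaging and high
# frequencies by absolute-maximum selection, then reconstruct.
import numpy as np
import pywt

def dwt_fusion(img_a, img_b, wavelet="db2", level=3):
    ca = pywt.wavedec2(img_a.astype(np.float64), wavelet, level=level)
    cb = pywt.wavedec2(img_b.astype(np.float64), wavelet, level=level)
    fused = [(ca[0] + cb[0]) / 2.0]  # low-frequency sub-band: average rule
    for subs_a, subs_b in zip(ca[1:], cb[1:]):  # (horizontal, vertical, diagonal)
        fused.append(tuple(np.where(np.abs(x) >= np.abs(y), x, y)
                           for x, y in zip(subs_a, subs_b)))
    return pywt.waverec2(fused, wavelet)  # multi-scale reconstruction
```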
The image fusion method based on the pyramid transform is simple to implement; however, when the gray-value difference between images is large, square traces will appear in the fusion result, owing to the correlation of features between adjacent scales after the pyramid transform. The wavelet transform has a good ability to analyze the local time and frequency domains of a signal, and it provides an optimal representation of point singularities for one-dimensional piecewise smooth signals. For two-dimensional image signals, however, the commonly used two-dimensional separable wavelet is a "non-sparse" image representation: it is a tensor product of one-dimensional wavelets, with only a finite number of directions, and is unable to optimally represent a two-dimensional image containing line singularities. In short, a common limitation of the methods in the wavelet family is that they struggle to accurately represent the curves and edges of images.
In order to solve these problems and better represent images, many multi-scale geometric analysis tools have been proposed [50]. However, these multi-scale decomposition methods cannot sparsely represent arbitrary anisotropic structures; the ripplet transform was therefore proposed [52]. In [53], moreover, a remote sensing image fusion method based on one of these transforms was developed. However, since spatial consistency is not adequately considered during the fusion process, these methods may introduce distortion of the brightness and color values [54].
Recently, edge-preserving filters [55] have been actively researched within the field of image processing, and have also been successfully applied in multi-sensor image fusion. Edge-preserving filtering can better represent the intrinsic geometrical structure of images and achieves better performance than traditional fusion methods [56]. In addition, the combination of Gaussian and bilateral filters has also been successfully applied in the fusion of infrared and visible images [57]. Furthermore, some computer vision tools, such as the support value transform [58], the log-Gabor transform [59], and anisotropic heat diffusion [60], have also been applied for multi-scale decomposition-based fusion.
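A minimal sketch of this idea is given below, assuming a two-scale base/detail decomposition with OpenCV's bilateral filter; the filter parameters and the simple fusion rules are illustrative and do not reproduce any of the cited edge-preserving methods.

```python
# Two-scale fusion with an edge-preserving (bilateral) filter: each source is
# split into a smooth base layer and a detail layer; bases are averaged and
# details fused by absolute-maximum selection.
import cv2
import numpy as np

def bilateral_two_scale_fusion(img_a, img_b, d=9, sigma_color=75, sigma_space=75):
    a = img_a.astype(np.float32)
    b = img_b.astype(np.float32)
    base_a = cv2.bilateralFilter(a, d, sigma_color, sigma_space)  # base layer A
    base_b = cv2.bilateralFilter(b, d, sigma_color, sigma_space)  # base layer B
    detail_a, detail_b = a - base_a, b - base_b                   # detail layers
    fused_base = 0.5 * (base_a + base_b)
    fused_detail = np.where(np.abs(detail_a) >= np.abs(detail_b),
                            detail_a, detail_b)
    return np.clip(fused_base + fused_detail, 0, 255).astype(np.uint8)
```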
As the authors of [20] point out, the spatial quality of the fused images may be less satisfactory if fewer decomposition levels are applied, while both the performance and computational efficiency of the method may also be reduced if too many decomposition levels are applied. Thus, some researchers have attempted to determine the optimal number of decomposition levels that will yield optimal fusion quality. For example, the authors in [61] compared various multi-resolution decomposition algorithms, focusing on the influence of the decomposition level and filters on fusion performance. They subsequently concluded that a short filter usually provides better fusion results than a long filter, while the most appropriate number of decomposition levels is four. The work of [62] estimated the optimal number of decomposition levels for multi-spectral and panchromatic image fusion with a specific resolution ratio.

Sparse Representation Methods
The research hotspots in the field of sparse representation include the approximate representation of models, the uniqueness and stability of model solutions, and applications in image processing (such as denoising and super-resolution), audio processing (such as blind source separation) and pattern recognition (such as face and gesture recognition). From a practical perspective, targeted flexible models, computational speed, and adaptive, high-performance representations are the key ways in which sparse representation methods can achieve an advantage in the application domain.
The purpose of sparse signal representation is to represent a signal with as few atoms as possible from a given over-complete dictionary. This yields a more concise representation of the signal, making the information it contains easier to extract.
Utilizing the characteristics of sparse coefficients, Yang and Li were the first to apply sparse representation theory to image fusion. Firstly, the multi-source input images are divided into many overlapping patches in order to capture local salient features and maintain shift invariance. In the next step, in order to obtain the corresponding sparse coefficients, the overlapping patches from the multi-source images are decomposed over an overcomplete dictionary. Subsequently, the sparse coefficients from the multiple source images are fused. Finally, the image is reconstructed using the fused coefficients and the dictionary [63]. In [64], a novel pan-sharpening method (fusing an HR panchromatic image with the corresponding LR spectral channels), named Sparse Fusion of Images (SparseFI), is proposed. Based on the theory of compressed sensing, it utilizes the sparse representation of the HR/LR multispectral image patches in a dictionary built from the panchromatic image and its down-sampled LR version. Yu et al. used the first model (JSM-1) of the joint sparse representation method proposed in [65] to achieve image fusion [66]. In [67], Liu et al. proposed an improved, compressed sensing-based image fusion scheme. In this method, the low- and high-frequency sparse coefficients of the source images, obtained by the discrete wavelet transform, are fused by means of an improved entropy-weighted fusion rule and a max-abs-based fusion rule respectively. After using the local linear projection of a random Gaussian matrix, the fused image is reconstructed using a compressive sampling matching pursuit algorithm. Compared with traditional transform-based image fusion methods, this approach retains more details, such as edges, lines and contours. Ma et al. proposed an image fusion algorithm based on sparse representation and guided filtering. Firstly, a sparse representation (SR)-based method was utilized to construct the initial fused image from the source images. Then, a spatial frequency (SF)-based fused image was obtained in order to make full use of the spatial information of the source images. Finally, the final fused image was obtained by using guided filtering to reconcile the SR-based and SF-based fusion images [68]. In [69], a noisy remote sensing image fusion method based on joint sparse representation (JSR) was proposed to fuse SAR images with other source images. Firstly, the redundant and complementary sub-images were obtained by the JSR method, and the complementary sparse coefficients were then fused using an improved fusion rule based on a pulse coupled neural network (PCNN). At the same time, because the types of noise in the SAR image and the other source images differ, these noises were treated as complementary information in the source images and suppressed in this step. Finally, the fused image was reconstructed by adding the fused complementary sub-images to the redundant information.
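The sketch below illustrates the patch-based pipeline of Yang and Li under simplifying assumptions: a fixed overcomplete DCT dictionary stands in for a learned one, orthogonal matching pursuit (OMP) computes the sparse codes, and the coefficient vector with the larger l1-norm is kept per patch as the activity measure.

```python
# Patch-based sparse-representation fusion: sparse-code overlapping patches
# over a fixed overcomplete DCT dictionary, keep the per-patch coefficient
# vector with the larger l1-norm, then rebuild the image from fused patches.
import numpy as np
from sklearn.feature_extraction.image import (extract_patches_2d,
                                              reconstruct_from_patches_2d)
from sklearn.linear_model import orthogonal_mp

def dct_dictionary(p=8, k=11):
    """Overcomplete 2-D DCT dictionary: k * k unit-norm atoms for p * p patches."""
    d = np.zeros((p, k))
    for j in range(k):
        v = np.cos(np.arange(p) * j * np.pi / k)
        if j > 0:
            v -= v.mean()
        d[:, j] = v / np.linalg.norm(v)
    return np.kron(d, d)                                 # shape (p*p, k*k)

def sr_fusion(img_a, img_b, p=8, sparsity=5):
    D = dct_dictionary(p)
    pa = extract_patches_2d(img_a.astype(float), (p, p)).reshape(-1, p * p).T
    pb = extract_patches_2d(img_b.astype(float), (p, p)).reshape(-1, p * p).T
    ca = orthogonal_mp(D, pa, n_nonzero_coefs=sparsity)  # sparse codes of A
    cb = orthogonal_mp(D, pb, n_nonzero_coefs=sparsity)  # sparse codes of B
    keep_a = np.abs(ca).sum(axis=0) >= np.abs(cb).sum(axis=0)  # l1 activity rule
    fused = np.where(keep_a, ca, cb)                     # choose per patch
    patches = (D @ fused).T.reshape(-1, p, p)
    return reconstruct_from_patches_2d(patches, img_a.shape)
```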

Methods in Other Domains
In addition to those based on multi-scale decomposition and sparse representation, there are also many fusion methods based on other theoretical knowledge.
For instance, [70] proposed a novel fusion framework based on the total variation (TV) method and image enhancement. For the multi-scale decomposition generated by the total variation framework, these authors verified that the decomposition components could be effectively modeled by the tailed Rayleigh distribution (TRD) rather than the commonly used Gaussian distribution. Therefore, TRD-based saliency and matching measures were proposed to fuse each sub-band decomposition, while spatial intensity information was also employed to fuse the remaining image decomposition components. Zhang et al. reconstructed the infrared background using quad-tree decomposition and Bessel interpolation. By subtracting the reconstructed background from the infrared image, the infrared brightness feature was extracted and then refined by reducing the redundant background information.
In order to resolve the overexposure problem, the fine infrared features are first adaptively suppressed and then added to the visual image, enabling the final fused image to be obtained [71]. Using fuzzy logic and population-based optimization, Kumar and Dass devised a new fusion method. In a departure from the weighted-average-of-pixels method, the authors proposed a method based on total variation to fuse multi-sensor images; under this approach, the imaging process was modeled as a local affine model, while the fusion problem was framed as an inverse problem [72]. With the aim of further improving the fusion results, Shen et al. used maximum a posteriori (MAP) estimation in a hierarchical multivariate Gaussian conditional random field model to derive the optimal fusion weights [73]. In [74], the source images were first decomposed into a principal component matrix and a sparse matrix by means of robust principal component analysis; the weights were then estimated by taking the local sparse features as the input of a pulse coupled neural network. [75] further proposed using a linear regression model to generate synthetic components, which were only partially replaced based on the correlation between the intensity component and the panchromatic image. In [76], the salient structures of the input images were first fused in the gradient domain; the fused image was then reconstructed by solving the Poisson equation, which ensures that the salient gradients are preserved in the fused result. In [77], intuitionistic fuzzy set theory was applied to image fusion: the input images were transferred into the fuzzy domain, after which maximum and minimum fusion operations were used to perform the fusion. In [78], Liu et al. proposed a novel focus evaluation operator based on the max-min filter. In the proposed focus metric, the max-min filter was combined with the average filter and the median filter (MMAM) to evaluate the degree of focus of the source images.
Based on the structure-driven fusion regions and the depth information of the blurred images, MMAM is then utilized to achieve a high-quality multi-focus fused image.
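A rough sketch of such a focus-driven multi-focus fusion appears below; the local max-min range, its smoothing by average and median filters, and the per-pixel decision rule are assumptions inspired by the MMAM description rather than the published formulation.

```python
# Focus-measure-driven multi-focus fusion: the local max-min range responds
# to in-focus detail; average and median filtering smooth the raw response,
# and the per-pixel decision map selects the sharper source.
import numpy as np
from scipy.ndimage import (maximum_filter, minimum_filter,
                           uniform_filter, median_filter)

def focus_map(img, w=5):
    img = img.astype(np.float64)
    rng = maximum_filter(img, w) - minimum_filter(img, w)  # max-min range
    return median_filter(uniform_filter(rng, w), w)        # smoothed response

def multifocus_fusion(img_a, img_b, w=5):
    decision = focus_map(img_a, w) >= focus_map(img_b, w)  # per-pixel decision
    return np.where(decision, img_a, img_b)
```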

Combination of Different Transforms
In the method of [45], LSF was employed to enhance the regional features of DCT coefficients that could be useful for image feature extraction. The work of [84] further proposed a new method for fusing visible and infrared images, referred to as DTCWT-ACCD, based on the DTCWT and an adaptive combined clustered dictionary. Yang et al. put forward a novel remote sensing image fusion method based on adaptively weighted joint detail injection. In the proposed method, the spatial details were first extracted from the MS and PAN images through the à trous wavelet transform and a multi-scale guided filter; after that, the extracted details were sparsely represented to generate the main joint details via dictionary learning from the sub-images themselves; subsequently, in order to obtain refined joint detail information, an adaptive weight factor was designed that considers the correlation and difference between the joint details and the PAN image details; finally, the fused image was obtained by injecting the refined joint details into the MS image using modulation coefficients [85]. In [86], the source images were decomposed into a base layer and a detail layer, effectively overcoming the above problems and enabling high-quality multi-focus and multi-modal image fusion [88].
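To convey the general detail-injection idea behind such pan-sharpening methods, the sketch below extracts spatial details from the PAN image with a low-pass filter and injects them into the upsampled MS bands; the Gaussian smoothing and unit injection gain are simplifying stand-ins for the à trous wavelet, guided filter and adaptive weights of the cited work.

```python
# Minimal detail-injection pan-sharpening sketch: PAN high frequencies are
# added to each bicubically upsampled MS band with a fixed gain.
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def detail_injection(ms, pan, ratio=4, sigma=2.0, gain=1.0):
    """ms: (bands, h, w) low-resolution; pan: (h * ratio, w * ratio)."""
    details = pan - gaussian_filter(pan, sigma)           # high-frequency details
    upsampled = np.stack([zoom(band, ratio, order=3) for band in ms])
    return upsampled + gain * details                     # inject into each band
```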
In [89]-[91], the SRCNN framework was used to model the pan-sharpening process as an end-to-end mapping, in which the input of the network was the superposition of the MS and PAN images. In another work, to reduce the amount of input data for the CNN, a simple linear iterative clustering method was extended to segment MS images and generate superpixels, replacing pixels with superpixels as the basic analysis unit. In order to make full use of the spatial-spectral and environmental information of the superpixels, a multi-local-region joint representation method based on superpixels was proposed. An SML-CNN model was then established to extract effective joint feature representations, and a softmax layer was used to divide the features learned by the multiple local CNNs into different categories. Finally, a multi-information modification strategy combining detailed information and semantic information was employed to eliminate adverse effects on the classification results within and between superpixels, thereby improving classification performance. Another work that applies CNNs to remote sensing image fusion was introduced by Liu et al. [94], in which an end-to-end learning framework based on deep multi-instance learning (DMIL) was utilized to classify MS and PAN images using joint spectral and spatial features. The framework consists of two instances, one capturing PAN spatial information and one describing MS spectral information; the features obtained from the two instances, directly concatenated, can be regarded as simple fusion features. In order to fully integrate the spatial-spectral information for further classification, the simple fusion features are fed into a three-layer fully connected fusion network to learn high-level fusion features. In [95], Ma et al. proposed a novel generative adversarial network called FusionGAN to fuse infrared and visible images. In this network, the generator was utilized to generate a fused image carrying the main infrared intensities together with the visible gradient information, while the discriminator aimed to force the fused image to contain more of the details present in the visible image, so that the fused image retains both the radiation information of the infrared image and the texture information of the visible image. The authors of [96] utilized a residual network (ResNet) to exploit the high nonlinearity of deep learning models for the image fusion task, and the proposed algorithm achieved the highest spatial-spectral unified accuracy. In order to obtain fused images with clearer and richer texture features, Liu et al. proposed a novel multi-focus image fusion algorithm that combines NSST and ResNet. In the proposed algorithm, NSST was utilized to fully account for both the high-frequency details and the low-frequency global features of the image. For the high-frequency details, an improved gradient sum of Laplace energy was employed to handle the high-frequency sub-band coefficients of different levels and directions using different directional gradients. For the low-frequency component, a ResNet with a deep network structure was used to obtain the spatial information characteristics of the low-frequency coefficient image [97]. For most fusion methods, the detection of the focus area is a key step. In view of this, Liu et al. proposed a multi-focus image fusion algorithm based on a dual convolutional neural network (DualCNN).
Firstly, the source images were input into the dual CNN to recover the details and structure from their super-resolved versions and to improve the contrast of the source images. Secondly, bilateral filtering was applied to reduce the noise of the fused image, and a guided filter was used to detect the focus area of the image and refine the decision map. Finally, the fused image was obtained by weighting the source images according to the decision map. Experimental results show that the algorithm preserves image details well and maintains spatial consistency [98].
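As a toy illustration of such end-to-end CNN fusion mappings, the PyTorch sketch below stacks two source images as input channels and regresses a fused image with a small convolutional network; the architecture, the pixel-maximum training target and the single gradient step are illustrative assumptions and correspond to no cited design.

```python
# Toy end-to-end CNN fusion mapping: two sources stacked as channels, a small
# conv net regresses the fused image; trained here for one illustrative step
# against a pixel-maximum target.
import torch
import torch.nn as nn

class TinyFusionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, a, b):
        return self.net(torch.cat([a, b], dim=1))  # channel-stacked sources

if __name__ == "__main__":
    model = TinyFusionNet()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    a = torch.rand(1, 1, 64, 64)                   # stand-in source A
    b = torch.rand(1, 1, 64, 64)                   # stand-in source B
    loss = ((model(a, b) - torch.max(a, b)) ** 2).mean()  # toy training target
    loss.backward()
    opt.step()                                     # one illustrative update
    print(model(a, b).shape)
```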
Another work that applies CNNs to image fusion was proposed by Zagoruyko and Komodakis, in which a CNN-based model was employed to learn a general similarity function for comparing image patches directly from image data. By exploring a variety of neural network structures, they demonstrated that such structures are particularly suitable for this task, which provides a new idea for studying CNN-designed fusion metrics [99].

Feature-Level Fusion
Feature-level fusion can eliminate the redundancy between different feature sets, thereby facilitating real-time subsequent decision-making. In other words, feature fusion enables feature vector sets of maximum efficiency and minimum dimension to be obtained, which facilitates the final decision [100].

Feature Selection-Based and Feature Extraction-Based Techniques
Generally speaking, existing feature fusion techniques can be subdivided into two key categories: namely, feature selection-based and feature extraction-based.
In feature selection-based fusion methods, all sets of feature vectors are first grouped together, after which an appropriate feature selection method is utilized to pick out the most informative features. For example, Battiti [101] proposed a method using supervised neural networks, [102] presented a fusion method based on dynamic programming, and Shi and Zhang provided a method based on support vector machines (SVM) [103].
In the feature extraction-based methods, by contrast, the multiple feature vector sets are combined into a single set of feature vectors, which is input into a feature extractor for fusion [100] [104]. The classical feature combination algorithm is feature extraction-based, meaning that it groups multiple sets of feature vectors into one union-vector (or super-vector) [104].
In [100], the union-vector-based feature fusion method is defined as serial feature fusion, while the feature fusion method based on complex vectors is called parallel feature fusion. Qin and Yung devised a method that used localized maximum-margin learning to fuse different types of features during BOCVW modeling for eventual scene classification [105]. A new feature extraction method based on feature fusion is proposed in [106], building on the idea of canonical correlation analysis (CCA). In order to classify very high resolution (VHR) satellite imagery, Huang et al. [107] presented a multi-scale feature fusion methodology based on the wavelet transform. Furthermore, in order to detect nighttime vehicles effectively, a novel bio-inspired image enhancement method with a weighted feature fusion technique was devised by Kuang and Zhang [108]. Yang et al. [109] presented a novel feature combination strategy, the idea behind which is to combine two sets of feature vectors using a complex vector instead of a real union-vector. Fernandez-Beltran et al. proposed a novel pLSA-based image fusion method designed to reveal multimodal patterns in SAR and MSI data, thus effectively merging and classifying Sentinel-1 and Sentinel-2 remote sensing data [110]. The work of [111] proposed a novel label fusion method, referred to as Feature Sensitive Label Prior (FSLP), which takes both the variety and the consistency of the different features into account and was utilized to gather atlas priors.
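The sketch below contrasts serial (union-vector) and parallel (complex-vector) feature fusion as defined in [100] and [109], together with a CCA-based projection in the spirit of [106]; the feature dimensions and synthetic data are placeholders.

```python
# Serial fusion concatenates two feature sets; parallel fusion combines two
# equal-dimension sets into one complex vector; CCA projects both sets onto
# maximally correlated components before combining them.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 8))     # feature set 1 (100 samples, 8 dims)
y = rng.normal(size=(100, 8))     # feature set 2 (same dimension for parallel)

serial = np.hstack([x, y])        # serial fusion: union-vector, dim 16
parallel = x + 1j * y             # parallel fusion: complex vector, dim 8

cca = CCA(n_components=4)
xc, yc = cca.fit_transform(x, y)  # correlated low-dimensional projections
cca_fused = np.hstack([xc, yc])   # CCA-based fused features
print(serial.shape, parallel.shape, cca_fused.shape)
```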

Feature-Level Fusion Methods in Other Domains
Classical feature fusion methods represent feature data as real numbers. These include inference-based [109] [112] and estimation-based methods [113] [114], as well as methods that employ particular types of feature data [115] [116] [117] [118].
Peng et al. [119] presented a quantum-inspired feature fusion method that uses the maximum mutual von Neumann entropy to define the relationship between quantized feature samples. Peng and Deng developed a quantum-inspired feature fusion method with collision and reaction mechanisms [120]. A new quantum-inspired feature fusion method, based on maximum fidelity and designed to improve the completeness and conciseness of existing feature data, was developed in [121]. In addition, a multi-modal feature fusion-based framework was proposed to improve geographic image annotation: this algorithm leverages a low-to-high learning flow for both the deep and shallow modality features, with the overall goal of achieving effective geographic image representation [122].

Decision-Level Fusion
The goal of decision-level image fusion is to obtain a decision from each source image, and then to combine these decisions into a globally optimal decision with reference to certain criteria and the credibility of each decision. Decision-level image fusion is the highest level of information fusion, and its results provide a basis for command and control decisions. At this level of the fusion process, initial judgments and conclusions about the same target are first established for each sensor image, after which the decision from each source image is processed; finally, the decision-level fusion process is executed in order to obtain the final joint decision [4].
A variety of logical reasoning methods, statistical methods, information theory methods, and so forth can be used for decision-level fusion; these include Bayesian inference, Dempster-Shafer (D-S) evidence theory, consensus-based hybrid methods, joint measures methods, voting, fuzzy decision rules (such as the fuzzy integral [123] and fuzzy logic [124]), rank-based methods, cluster analysis, Composite Decision Fusion (CDF), and neural networks. While decision-level fusion offers good real-time performance and fault tolerance, its preprocessing cost is high and its information loss is the greatest of the three levels. Table 2 lists several decision-level fusion methods, which are presented in more detail in the following.

Table 2. Decision-level fusion methods.
Bayesian inference [125] [126]
Dempster-Shafer (D-S) evidence theory [127] [128]
Hybrid methods based on consensus [129]
Joint Measures Method (JMM) [130]
Voting [131]
Fuzzy decision rules [123] [124] [132]
Rank-based methods [133]
Cluster analysis [134]
Composite Decision Fusion (CDF) [135]
Adaptive decision fusion based on the local scale of structures [136]
Neural networks [137] [138]

Bayesian inference is an abstract concept that provides only a probabilistic framework for recursive state estimation; grid-based filters, particle filters (PFs), the Kalman filter (KF) and the extended Kalman filter (EKF) are all Bayesian-type methods [125]. Although Bayesian inference can effectively solve most fusion problems, the Bayesian method does not explicitly model uncertainty; errors and complexity may therefore be introduced into the posterior probability measurement [139]. In [126], Bayesian rules were utilized to fuse the results of a fast sparse representation classifier and a support vector machine classifier for SAR image target recognition purposes.
The D-S evidence method, an extension of Bayesian inference, can be used without prior probability distributions, and is thus able to deal with uncertainty and overcome certain drawbacks of the Bayesian approach. D-S reasoning shows that, despite a lack of information about propositional probabilities, it can solve some problems that cannot be solved by probability theory.
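For concreteness, the following sketch implements Dempster's rule of combination for two basic probability assignments over a small frame of discernment; the mass values are invented purely for illustration.

```python
# Dempster's rule of combination for two basic probability assignments (BPAs)
# given as {frozenset: mass} dictionaries over a frame of discernment.
from itertools import product

def dempster_combine(m1, m2):
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb                 # mass falling on the empty set
    k = 1.0 - conflict                          # normalization constant
    return {s: v / k for s, v in combined.items()}

# Invented masses from two "sensors" over the frame {target, clutter}.
m1 = {frozenset({"target"}): 0.6, frozenset({"target", "clutter"}): 0.4}
m2 = {frozenset({"target"}): 0.5, frozenset({"clutter"}): 0.3,
      frozenset({"target", "clutter"}): 0.2}
print(dempster_combine(m1, m2))
```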
A new joint measures-based approach to multi-system/sensor decision fusion was described in [130]. The work of [132] introduced a decision fusion method for the classification of urban remote sensing images, consisting of two key steps. In the first step, the data were processed by each classifier separately and, for each pixel, membership degrees for the considered classes were provided. In the second step, a fuzzy decision rule was used to aggregate the results provided by the algorithms according to the performance of each classifier. A general framework based on the definition of two accuracy measures was then proposed to combine the information of several individual classifiers via multi-class classification. The first measure was a point-wise measure that estimates the reliability of the information provided by each classifier for each pixel, while the second measure estimates the global accuracy of each classifier. Finally, the results were aggregated with an adaptive fuzzy operator governed by these two measures.
The ranking-based decision fusion algorithm is a typical example of a decision fusion algorithm. Huan and Pan proposed three different decision fusion strategies: namely, a multi-view, a multi-feature and a multi-classifier decision fusion strategy, and proved that the performance of SAR image target recognition could be improved through the use of these strategies [133].
In [134], a strategy for the joint classification of multiple segmentation levels from multi-sensor imagery, using SAR and optical data, was introduced. Firstly, the two data-sets were segmented separately to create independent aggregation levels at different scales; next, each individual level from the two data-sets was pre-classified using a support vector machine (SVM). Subsequently, the original outputs of each SVM (i.e., images showing the distances of the pixels to the hyperplane fitted by the SVM) were used in a decision fusion to determine the final classes. The decision fusion strategy was based on the application of an additional classifier to the pre-classification results.
The work of [135] proposed a Composite Decision Fusion (CDF) strategy.
This approach combined a state-of-the-art kernel-based decision fusion technique with the popular composite kernel classification approach, enabling it to deal with the combined classification of a color image with high spatial resolution and a lower-spatial-resolution hyperspectral image of the same scene.

Fusion Strategy
Fusion strategies are important in the context of data fusion tasks from different sensors, and play a significant role in improving the quality of fused images.
Therefore, the design of more advanced fusion strategies is anticipated to be another research direction in the image fusion field.
In [23], the authors reviewed some classical fusion strategies: namely, coefficient-, window- and region-based activity level measurement (CAM, WAM, RAM), window- and region-based consistency verification (WRCV), and choose-max and weighted-average based coefficient combining methods (CM-WACC). These strategies are widely employed in multi-scale decomposition-based image fusion algorithms. Moreover, in order to achieve better fusion performance, scholars have improved the traditional fusion rules and designed novel ones. In [140], image fusion was expressed as an optimization problem, and an information-theoretic method was applied in a multi-scale framework to obtain the fusion results. For their part, Zheng et al. used principal component analysis to fuse the basic components [141]; in this method, a choose-max (CM) scheme and a neighborhood morphological processing step were used to increase the consistency of coefficient selection, which reduced distortion in the fused image. Through the use of a local optimization method, a novel guided-filtering approach based on the weighted average method was presented to fuse the multi-scale decompositions of the input images [54].
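The sketch below illustrates two of the classical coefficient-combining rules mentioned above: choose-max (CM) with an absolute-value activity measure, and a weighted average whose weights are driven by a local-energy match measure, in the style reviewed in [23]; the window size and match threshold are illustrative.

```python
# Choose-max (CM) and match-measure weighted-average coefficient combining
# for two sub-band coefficient arrays ca and cb of equal shape.
import numpy as np
from scipy.ndimage import uniform_filter

def choose_max(ca, cb):
    return np.where(np.abs(ca) >= np.abs(cb), ca, cb)

def weighted_average(ca, cb, w=3, threshold=0.75):
    ea = uniform_filter(ca ** 2, w)                    # local energy (activity)
    eb = uniform_filter(cb ** 2, w)
    match = 2 * uniform_filter(ca * cb, w) / (ea + eb + 1e-12)
    w_min = 0.5 - 0.5 * (1 - match) / (1 - threshold)  # weight when sources agree
    w_a = np.where(ea >= eb, 1 - w_min, w_min)
    return np.where(match > threshold,
                    w_a * ca + (1 - w_a) * cb,         # similar: weighted average
                    choose_max(ca, cb))                # dissimilar: choose-max
```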
Generally speaking, most sparse representation-based image fusion methods are designed around traditional fusion strategies, such as CM [88] [142], weighted average-based coefficient combination (WACC) [143] [144] [145], substitution of sparse coefficients (SSC) [146], and WAM [147]. To improve the fusion performance of sparse representation-based methods, a spatial context-based weighted average (SCWA) was employed in a sparse representation-based image fusion method; this approach considers not only the detailed information of each image patch, but also that of its spatial neighbors [64]. As with the multi-scale decomposition-based fusion methods, weighted averages based on machine learning (MLWA) [148], block- and region-based activity level measurement (BRAM) [149] [150] [151], model-based methods (MM) [152], SCWA [153], and component substitution (CS) [154] [155] [156] have been applied to fusion methods in other domains and in combination with different transforms, achieving improved fusion performance.
Many fusion strategies operate at the pixel level. However, in most practical applications, users focus on image objects at the region level. Therefore, region-level information should also be considered during fusion processing. Region-based rules build on the spatial, inter-scale and intra-scale dependencies of images and can take their low-level and mid-level structures into account; region-based strategies have therefore been widely used in image fusion applications.

Fusion Performance Evaluation
Generally speaking, a good fusion method should have the following characteristics: 1) the fused image should be able to preserve most of the complementary and useful information contained in the input images; 2) the fusion method should not produce visual artifacts that may distract the human observer or disrupt further processing tasks; 3) the fusion method should be robust to certain imperfect conditions, such as mis-registration and noise [20].
As illustrated in Figure 4, the quality and performance evaluation of image fusion can be divided into subjective and objective evaluation. The former can be divided into two main classes, namely interpretability subjective evaluation and forced-choice subjective evaluation. Moreover, objective evaluation can be categorized into information theory-based metrics, image feature-based metrics, image structural similarity-based metrics and human perception-inspired metrics.
The subjective evaluation method, that is, subjective visual judgment, evaluates image quality according to the subjective impressions of human observers. However, due to the high cost and the difficulty of controlling various human factors (e.g., individual differences, personal perception, biases, etc.), extensive subjective evaluation is not always feasible. The objective evaluation method uses mathematical models to simulate the way the human eye perceives the fused image and to measure the amount of image features, content, or information transferred from the input images to the fused image, enabling the quality of the fused image to be evaluated quantitatively. Objective computational models, also known as fusion metrics, can reveal certain inherent properties of the fusion process or the fused image. The challenge associated with these fusion metrics is as follows: while we can judge the quality of the fused image against the source images by comparing metric values, it is difficult to clarify the significance of the difference between two index values, such as 0.78 and 0.79 [29]. A summary of the available fusion metrics is presented in Table 3, and these metrics are further summarized and compared in [155]. Table 4 gives the expression of each metric shown in Table 3; the specific meaning of each fusion evaluation metric and indicator expression is detailed in [155]-[170].
In Table 4, t, p, h, q and Z are real scalar parameters that determine the shape of the nonlinearity of the masking function; these equations are also valid for the joint distribution between image B and fused image F. W denotes the number of pixels in the local region w, and r refers to the input images A and B.
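As a simple example of the information-theory-based metrics in Table 3, the sketch below computes the mutual information between each source image and the fused image and sums the two terms (a quantity often denoted MI in the fusion literature); the histogram bin count is an illustrative choice.

```python
# Mutual-information fusion metric: MI(A, F) + MI(B, F), estimated from joint
# gray-level histograms of the source images A, B and the fused image F.
import numpy as np

def mutual_information(x, y, bins=64):
    hist, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = hist / hist.sum()                      # joint distribution
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)    # marginals
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / np.outer(px, py)[nz])).sum())

def fusion_mi(img_a, img_b, fused):
    return mutual_information(img_a, fused) + mutual_information(img_b, fused)
```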

Future and Conclusions
Although scholars have proposed a variety of image fusion and objective performance evaluation methods, these approaches still have some problems at present. Hence, it remains necessary to improve and innovate image fusion algorithms in order to adapt to various applications. Potential future research directions include the following: 1) Research on the application of image fusion. One of the key points is that, for each area of image fusion application, the imaging mechanism of the corresponding imaging system and the physical characteristics of the imaging sensor should be analyzed so that a better fusion effect can be obtained. The multi-source image fusion algorithm must be effectively combined with the application in order to better analyze the fusion process and obtain improved results.
2) It is necessary to research multi-scale decomposition and reconstruction methods suitable for image fusion. For multi-scale decomposition-based fusion methods, the fusion effect largely depends on which multi-scale decomposition method is chosen. Hence, it is very important to improve the available multi-scale decomposition methods. In addition, it is necessary to study the influence of certain internal factors on the quality of the fused image obtained via multi-scale decomposition, as this will help to find or improve multi-scale decomposition methods and thus improve the quality of image fusion.
3) It is necessary to consider improvements to the fusion rules, which are key to this type of fusion method. At present, the fusion criterion is not limited to simple methods such as coefficient selection and weighted averaging; researchers are also studying rules that integrate neural networks and other models that can simulate human visual perception or reflect image characteristics.
4) It is essential to overcome the effects of mis-registration and noise interference on the fusion results. Most image fusion algorithms assume that the source images have been accurately registered and are free of noise. In practical applications, however, multi-sensor images are not only likely to contain registration errors, but may also be affected by noise. It is therefore necessary to overcome the influence of mis-registration and noise on the fusion results.
5) New multi-scale fusion evaluation indices need to be researched. At present, multi-scale decomposition-based image fusion algorithms mostly use the "energy" of pixels (or windows, or regions) as a fusion measure index, which reflects the information contained in the coefficients at each resolution. However, such a fusion measure is not always appropriate. To this end, it is necessary to combine the imaging characteristics of the source images to find a fusion measure index that more accurately reflects the relative importance of the coefficients at each resolution.
6) The application of deep learning (DL) in image fusion needs to be further explored. In recent years, DL has produced many breakthroughs in various computer vision and image processing problems, such as classification, segmentation and object detection. Deep learning-based research has also become an active topic in the field of image fusion over the past three years. The key issues and challenges associated with DL-based fusion algorithms are as follows: firstly, the design of the network architecture, including the input, inner and output architecture; secondly, the generation of training datasets; thirdly, the application of conventional image fusion technology to specific fusion problems. Although DL-based image fusion research has achieved good results, it is still at an initial stage, and there remains huge potential for its future development. The application of DL in image fusion should be further explored along these three key problems and challenges.
Multi-sensor image fusion is an effective technology for fusing the complementary information of multi-sensor images into a single fused image. Nonetheless, the field still faces many challenges; these include differences in resolution between the source images, noise, imperfect environmental conditions, the diversity of applications, computational complexity and the limitations of existing technologies. It is therefore expected that new research and practical applications based on image fusion will continue to grow and develop over the next few years.