Perceptually Lossless Compression for Mastcam Multispectral Images: A Comparative Study

The two mast cameras (Mastcams) onboard the Mars rover Curiosity are multispectral imagers with nine bands each. Currently, the images are compressed losslessly using JPEG, which achieves a compression ratio of only two to three. We present a comparative study of four approaches to compressing multispectral Mastcam images. The first approach divides the nine bands into three groups of three bands each. Since the multispectral bands are strongly correlated, we treat the three groups of images as video frames; we call this the Video approach. The second approach compresses each group separately; we call it the split band (SB) approach. The third is a two-step approach in which the first step uses principal component analysis (PCA) to compress a nine-band image cube to six bands and the second step compresses the six PCA bands using conventional codecs. The fourth applies PCA only. In addition, we present subjective and objective assessment results for compressing RGB images, because RGB images have been used for stereo and disparity map generation. Five well-known compression codecs from the literature (JPEG, JPEG-2000 (J2K), X264, X265, and Daala) have been applied and compared in each approach. The performance of the different algorithms was assessed using four well-known performance metrics: two are conventional and the other two are known to correlate well with human perception. Extensive experiments using actual Mastcam images have been performed to demonstrate the various approaches. We observed that perceptually lossless compression can be achieved at a 10:1 compression ratio. In particular, the performance gain of the SB approach with Daala over JPEG is at least 5 dB in terms of peak signal-to-noise ratio (PSNR) at a 10:1 compression ratio. Subjective comparisons also corroborated the objective metrics in that perceptually lossless compression can be achieved even at 20:1 compression.


Introduction
Image compression is a well-developed field [1]. In the past, work has focused either on lossless compression for secure data storage and transmission or on aggressive lossy compression for mobile applications. Lossless compression can achieve a compression ratio of only 2 to 3. Aggressive lossy compression usually aims for a ratio of 20 or more. Recently, we have been focusing on something in the middle: perceptually lossless compression with a compression ratio of 10. Such a capability is needed in many commercial and military applications where users want a compromise between compression quality and bandwidth usage.
The Mars rover Curiosity has many instruments onboard for Mars data collection and in-situ surface characterization [2]. The Alpha Particle X-ray Spectrometer (APXS) [3], the Laser Induced Breakdown Spectrometer (LIBS) [4] [5], and Mastcam [6] [7] [8] [9] are well-known ones. Quite a few of these instruments are imagers that compete for the limited bandwidth available to transmit data back to Earth.
Currently, the Mastcam images are all compressed using JPEG, a technology from the 1990s [10]. Although JPEG is simple and efficient, its lossless compression ratio is at most two to three. New compression standards have been developed in the past two decades. Well-known codecs include J2K [11], X264 [12], and X265 [13]; the video codecs among them are also applicable to still image compression. J2K, X264, and X265 also provide lossless compression options. In some applications such as security monitoring, where video quality is of prime importance, people still use lossless image compression algorithms such as JPEG and J2K to compress videos frame by frame. JPEG, X264, and X265 are discrete cosine transform (DCT) based algorithms and J2K is wavelet based. About 15 years ago, there were some developments in DCT based algorithms in which overlapped blocks, known as lapped transforms (LT), were used to further improve the compression [14]. In the past few years, a group of researchers has incorporated LT [14] into an open source codec known as Daala [15]. Daala can be used for both still image and video compression. There is also a lossless option.
In our earlier papers [16] [17], we proposed and evaluated two approaches to Mastcam image compression. One of them is a two-step approach [16], in which we first applied principal component analysis (PCA) [18] [19] [20] [21] [22] to compress the nine-band Mastcam image cube to three or six bands and then applied a conventional codec (JPEG, J2K, X264, X265) to further compress the three or six PCA bands. It was observed that using six PCA bands yielded better performance than using three bands. The other approach [17] is the split band (SB) approach, which splits the nine bands into three groups and applies still image compression to each group separately.
C. Kwan, J. Larkin Journal of Signal and Information Processing
In this research, we present a thorough comparative study of four approaches to Mastcam image compression. In addition to the two earlier approaches [16] [17], we propose a new video approach that treats the three groups of three bands as video frames. Moreover, we include a study using PCA only for compression, because PCA itself is a good compression technique. In all four approaches, we have added Daala to our experiments. That is, five image codecs from the literature (JPEG, J2K, X264, X265, and Daala) were evaluated to see which one can achieve perceptually lossless compression at a compression ratio of 10:1. It is important to emphasize that perceptual performance assessment requires a suitable metric. Conventional metrics such as peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) may not match human subjective evaluations well. Recently, researchers have developed metrics known as human visual system (HVS) and HVS with masking (HVSm) [23] that correlate well with human perception. We have incorporated HVS and HVSm into our comparative studies.
Subjective evaluations for RGB Mastcam images have also been carried out. It was observed that perceptually lossless compression can be achieved even at 20 to 1 compression. Most importantly, at 10 to 1 compression, some objective metrics were 5 to 10 dB higher with Daala than with JPEG.
The first key contribution of our project is a video approach to compressing multispectral image data cubes. This video approach can be applied to any multispectral or hyperspectral images. The second contribution is a comparison of the video approach with three alternatives (the SB approach, the two-step approach, and the PCA only approach). The third contribution is a thorough evaluation of the four approaches on Mastcam images. To the best of our knowledge, no one except our group has thoroughly studied this Mastcam image compression problem before.
Our paper is organized as follows. Section 2 summarizes the technical approach and its components. Section 3 summarizes all the experiments using actual images that are of interest to our customer. Finally, concluding remarks will be given in Section 4.

Video approach
As mentioned earlier, we have four approaches to compressing Mastcam images. Two approaches were described in earlier papers. In this research, we propose a video approach to compression. Our overall technical approach can be summarized as follows. First, preprocessing steps ensure that the image height and width are even numbers and that the pixel values are normalized and stored in double precision. Second, the nine-band image cube is converted to three groups of three-band images. Third, since different codecs require different input formats, the images are saved in appropriate formats such as Y4M or YUV444. Fourth, we apply the various compression algorithms (JPEG, J2K, X264, X265, and Daala) to the three-frame video and generate the performance metrics. There are four performance metrics. Each metric is computed by comparing the reconstructed nine-band cube with the original nine-band cube. The compression ratio is obtained by comparing the size of the compressed file with the size of the original image cube. Figure 1 illustrates the workflow of the video approach. We include some details for some of the blocks.

SB approach
Here, we briefly summarize an alternative approach, the split band (SB) approach, which was presented in [17]. Details can be found in [17]. In [17], we did not include Daala. Here, we include new results with Daala as one of the codecs. The overall architecture is very similar to that of the video approach, except that each 3-band image is now compressed independently.
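The preprocessing, band grouping, and format conversion steps described above, which the Video and SB approaches share, might be sketched as follows. This is a minimal illustration and not the authors' implementation. Several choices are assumptions made only for the example: cropping (rather than padding) to reach even dimensions, min-max normalization over the whole cube, grouping consecutive bands, quantizing to 8 bits, and mapping the three bands of each group directly onto the three planes of a 4:4:4 Y4M frame. All function names are hypothetical.

```python
import numpy as np

def preprocess_cube(cube):
    """Crop to even height/width and scale to [0, 1] in double precision."""
    h, w, _ = cube.shape
    # Assumption: drop the last row/column when a dimension is odd.
    cube = cube[: h - h % 2, : w - w % 2, :].astype(np.float64)
    lo, hi = cube.min(), cube.max()
    if hi == lo:  # constant cube: avoid division by zero
        return np.zeros_like(cube)
    return (cube - lo) / (hi - lo)

def split_into_groups(cube):
    """Split an (H, W, 9) cube into three (H, W, 3) groups of bands."""
    if cube.shape[2] != 9:
        raise ValueError("expected a nine-band cube")
    # Assumption: consecutive bands are grouped together.
    return [cube[:, :, i:i + 3] for i in range(0, 9, 3)]

def groups_to_y4m(groups, fps=1):
    """Serialize the groups as frames of an 8-bit 4:4:4 Y4M stream."""
    h, w, _ = groups[0].shape
    header = f"YUV4MPEG2 W{w} H{h} F{fps}:1 Ip A1:1 C444\n"
    out = bytearray(header.encode("ascii"))
    for group in groups:
        frame8 = np.round(group * 255).astype(np.uint8)
        out += b"FRAME\n"
        for plane in range(3):
            # Assumption: band k of the group is stored as plane k.
            out += frame8[:, :, plane].tobytes()
    return bytes(out)
```

In the Video approach, the three frames would go to the codec as one short sequence. In the SB approach, each group would instead be written out and encoded on its own.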

Two-step approach
Some preliminary results of this two-step approach were presented in [16]. The detailed signal flow can be found in Figure 2. The first step applies PCA to compress the nine-band Mastcam image cube to three or six bands. The second step compresses the three or six bands using conventional codecs. In [16], we found that using six bands gives better performance. It should be noted that we did not include Daala in [16] because, at that time, we did not know that Daala had a newer version that outperforms an older version developed about 2 years ago.
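The first step of the two-step approach, reducing the nine bands to a smaller number of PCA bands and later inverting that reduction, can be illustrated with a short NumPy sketch. It is an assumption-laden example rather than the code used in [16]: the function names are hypothetical, the covariance is estimated over all pixels of a single cube, and the second step (the conventional codec applied to the PCA bands) is omitted.

```python
import numpy as np

def pca_compress(cube, k=6):
    """Project an (H, W, B) cube onto its top-k principal components."""
    h, w, b = cube.shape
    pixels = cube.reshape(-1, b).astype(np.float64)
    mean = pixels.mean(axis=0)
    centered = pixels - mean
    cov = centered.T @ centered / (centered.shape[0] - 1)
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, eigvecs = np.linalg.eigh(cov)
    basis = eigvecs[:, ::-1][:, :k]
    scores = (centered @ basis).reshape(h, w, k)
    return scores, basis, mean

def pca_reconstruct(scores, basis, mean):
    """Map k PCA bands back to the original B-band space."""
    h, w, k = scores.shape
    pixels = scores.reshape(-1, k) @ basis.T + mean
    return pixels.reshape(h, w, basis.shape[0])
```

In a full pipeline, the `scores` bands would be rescaled and passed to the chosen codec, and the decoder would also need `basis` and `mean` as side information.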

One-step PCA only approach
It is well known that PCA can be used for data compression. Unlike DCT and wavelet transforms, PCA is a data-dependent approach; among linear transforms, it minimizes the mean squared reconstruction error for a given number of retained components. Figure 3 illustrates the PCA approach.
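The sense in which PCA is optimal can be stated compactly. The notation below is introduced only for this illustration and does not come from the paper: $\mathbf{x}_n$ is the nine-band spectrum of pixel $n$, $\boldsymbol{\mu}$ is the mean spectrum, $\mathbf{U}_k$ holds the $k$ leading eigenvectors of the sample covariance (normalized by $N$), and $\lambda_1 \ge \dots \ge \lambda_9$ are its eigenvalues.

```latex
\hat{\mathbf{x}}_n = \boldsymbol{\mu} + \mathbf{U}_k \mathbf{U}_k^{\top} (\mathbf{x}_n - \boldsymbol{\mu}),
\qquad
\frac{1}{N} \sum_{n=1}^{N} \lVert \mathbf{x}_n - \hat{\mathbf{x}}_n \rVert^2 = \sum_{i=k+1}^{9} \lambda_i
```

Under these assumptions, keeping six bands instead of three removes $\lambda_4 + \lambda_5 + \lambda_6$ from the reconstruction error, at the cost of three more bands to store.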

Compressing RGB only
Bands 5, 4, and 2 of the Mastcam image cube are the RGB bands. We have performed a separate study specifically for the RGB bands. It is similar to the SB approach.

Brief Overview of Relevant Compression Algorithms
In this paper, we compare image codecs available on the market, evaluate them objectively, and eventually recommend the best codec to our customer. With the above in mind, we briefly summarize the existing high-performance codecs.
• VP8 and VP9: These video compression algorithms are owned by Google. The performance is somewhat close to that of X264. We did not include VP8 and VP9 in our study because they are not as popular as X264 and X265.
• X264 [12]: It has good still image compression.
• X265 [13]: This is the next-generation video codec and has excellent still image compression and video compression. However, its computational complexity is much higher than that of X264. In general, X265 has the same basic structure as previous standards and contains many incremental improvements over X264. It should be noted that X264 and X265 are optimized versions of H264 and H265, respectively.
• Daala [15]: Recently, there has been a parallel activity at the Xiph.Org Foundation, which implements a compression codec called Daala [15]. It is based on DCT. There are pre- and post-filters to increase energy compaction and remove block artifacts. Daala borrows ideas from [14]. The block-coding framework in Daala is illustrated in Figure 4.

Performance Metrics
In almost all compression systems, researchers have used peak signal-to-noise ratio (PSNR) or structural similarity (SSIM) to evaluate the compression algorithms. Given a fixed compression ratio, algorithms that yield higher PSNR or SSIM are regarded as better. However, PSNR and SSIM do not correlate well with human perception. Recently, a group of researchers investigated a number of different performance metrics [23]. Extensive experiments were performed to investigate the correlation between human perception and various performance metrics. According to the results in [23], two performance metrics, HVS and HVSm, correlate well with human perception. One image example shown in Figure 5 demonstrates that HVS and HVSm have high correlation with human subjective evaluation results. In the past, we have used HVS and HVSm in several applications [30] [31].
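For reference, the conventional PSNR metric can be computed in a few lines. The sketch below uses the standard definition, $10 \log_{10}(\text{peak}^2 / \text{MSE})$. The function name is hypothetical, and two choices are assumptions rather than details from the paper: the default peak value of 255, and pooling the error over all bands of the cube instead of averaging per-band values. HVS and HVSm add frequency-domain weighting and are not reproduced here.

```python
import numpy as np

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two arrays of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstructed, dtype=np.float64)
    if ref.shape != rec.shape:
        raise ValueError("inputs must have the same shape")
    # Assumption: the error is pooled over every pixel of every band.
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")  # identical inputs
    return 10.0 * np.log10(peak ** 2 / mse)
```

Because the scale is logarithmic, a gain of 5 dB at a fixed compression ratio corresponds to roughly a threefold reduction in mean squared error.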

Mastcam Imager and Data
Mastcam imager information is shown in Figure 6 and Table 1. There are 6 overlapping bands and 3 non-overlapping bands (L3, L4 and L5 from the left camera and R3, R4, and R5 from the right camera). More details about Mastcam can be found in [2].

Mastcam Image Compression Results: Video Approach
When using the Video approach for Daala, the nine-band image is saved as a three-frame video in the Y4M format. While Daala can also be used to encode still images, it was created with video compression as its primary use (Figure 7). The results are reported in Table 2 and Table 3 and in Figure 10, Figure 11, Table 4, and Table 5. In general, the performance metrics of the right images are higher than those of the left images. The reason is that the modern codecs have better mechanisms to compress high resolution images.

Compression Using PCA Only Approach
PCA has been used as a compression tool in the past. It is also known as the KLT (Karhunen-Loève Transform). Here, we briefly summarize the application of PCA alone to compress left and right Mastcam images. Figure 12 and Figure 13 show that the compression performance of PCA3 and PCA6 is not enough to reach 10 to 1 compression. For instance, PCA3 achieved a PSNR of 41.25 dB at a compression ratio of 0.18. We will see in Section 3.5 that it is beneficial to combine PCA with other codecs to further improve the compression performance to 10 to 1.

Compression Using a Two-Step Approach
The first step performs the PCA compression. In the second step, we used the [...] Table 6, Figure 15, and Table 7 show the performance metrics of [...] right images, the differences between Daala and JPEG are even bigger.

Comparison between Video and SB Approaches
Now, we would like to compare the Video and SB approaches. Figure 16 and Figure 17 show the detailed comparison between the Video and the SB approaches for the left and right images, respectively. Let us first focus on the left images near the compression rate of 0.1. We have the following observations:
• JPEG has the highest PSNR among the codecs.
• Daala has a very high HVSm (>45 dB) in both the Video and SB approaches.
• Daala SB is slightly stronger than Daala with the Video approach.
• Daala SB produces a PSNR that is slightly weaker than that of J2K.
• Daala SB produces a stronger SSIM than Daala Video, but both are below J2K.
For right images, we see a similar trend as above except that the metrics are all higher in right images.
Hence, the Video approach is slightly worse than the SB approach because video has more overhead and there are only three frames. If there are more frames, the Video approach may have an edge over the SB approach. Since our interest is to achieve perceptually lossless compression, Daala is a better choice than the others. In addition, Daala is amenable to parallel processing whereas J2K requires the whole image for processing and is not suitable for parallel implementation.

Comparison between the Video and Two-Step Approaches
Here, we would like to compare the Video and the two-step approaches. Figure 18 and Figure 19 show the detailed comparison between the Video and the two-step approaches for the left and right images, respectively. Let us first focus on the left images near the compression rate of 0.1. In terms of HVS and HVSm, the Video approach with Daala is better than the others most of the time. Daala's performance is 7 dB and 10 dB better than that of JPEG in HVS and HVSm, respectively. X265 is the second best in HVS and HVSm. Finally, all the metrics for the right images are higher than the corresponding ones for the left images.

Compression of RGB Images Only
Figure 20, Figure 21, Table 8, and Table 9 show the metrics of different codecs for left and right RGB images, respectively. It can be seen that, at a compression ratio of 0.1, J2K performed the best in terms of PSNR and SSIM. However, in terms of HVS and HVSm, Daala is the best performing one. It is also somewhat surprising to notice that JPEG is consistently the third best one in all metrics.
To subjectively evaluate the different codecs, we include three case studies. Case 1 is for a compression ratio near 0.1. Figure 22 shows the original and 5 reconstructed images from JPEG, J2K, X264, X265, and Daala. We observe no perceptual loss of quality as compared to the original image. Case 2 is for a compression ratio near 0.05. Again, it is still difficult to spot any artifacts in the reconstructed images when compared with the original images shown in Figure 23. Case 3 is for a compression ratio near 0.0251, which corresponds to 40 to 1 compression. In this case, we start to see some artifacts (Figure 24) in JPEG, X264, and X265. However, Daala and J2K still do not have visible artifacts.
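The paper quotes compression both as a fractional ratio (compressed size divided by original size, for example 0.1) and in N:1 form (for example 10 to 1). The small sketch below shows how the two relate. The helper names are hypothetical, and measuring sizes in bytes is an assumption.

```python
def rate_from_sizes(compressed_bytes, original_bytes):
    """Fractional compression ratio: compressed size over original size."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return compressed_bytes / original_bytes

def n_to_one(rate):
    """Convert a fractional ratio such as 0.1 to the N in 'N to 1'."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return 1.0 / rate

# The three subjective case studies use these fractional ratios.
for rate in (0.1, 0.05, 0.0251):
    print(f"ratio {rate} is about {n_to_one(rate):.1f} to 1")
```

The third case works out to about 39.8 to 1, which the text rounds to 40 to 1.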

Discussions
• Comparison of different approaches: In this paper, we propose a new video approach to compressing Mastcam multispectral images and compare it with earlier approaches (PCA only, SB, and two-step). Through extensive experiments using actual Mastcam images, we observed that the SB approach yielded slightly better performance than the others.
• Comparison of different codecs: In each approach, we have compared five codecs. Among them, we observed that Daala has the best performance in terms of HVS and HVSm. The improvement of Daala over JPEG is more than 5 dB in those metrics at 10 to 1 compression. Since HVS and HVSm correlate well with human perception, we believe Daala is a good candidate to replace JPEG. For the two conventional metrics (PSNR and SSIM), we occasionally observed that JPEG and J2K have a slight edge over the other codecs. It is somewhat surprising that the two popular codecs, X264 and X265, did not yield good results at 10 to 1 compression. However, they did sometimes have reasonable performance at lower compression ratios such as 5 to 1 or 3 to 1.
• Computational complexity: Daala is DCT based and is hence amenable to parallel processing. J2K, on the other hand, is a wavelet based approach that requires the whole image. Hence, Daala is more efficient for fast processing.
• Subjective comparisons: Through visual experiments using RGB images, we noticed that at 10:1 compression, all codecs have almost no visible loss. Even at 20:1 compression, it is still hard to notice any artifacts. However, at 40:1 compression, JPEG, X264, and X265 start to show some color distortions and block artifacts. Daala and J2K still performed reasonably well at 40:1.

Conclusions
One key objective of our research is to achieve perceptually lossless compression. Our experiments indicate that this can be achieved even at 20:1 compression. However, we recommend that 10:1 compression be deployed because we believe NASA wants to preserve the fidelity of the images as much as possible. At 10:1 compression, the bandwidth saving is 3 or 4 times better than that of lossless compression. One future direction is to investigate how we can create a customized package for NASA. The package would essentially replace JPEG.