
In a context marked by the proliferation of smartphones and multimedia applications, the processing and transmission of images have become a real problem. Image compression is the first approach to address this problem; however, it cannot adapt to the dynamics of constrained environments, which consist mainly of mobile equipment and wireless networks. In this work, we propose a stochastic model that gradually estimates an image from information on its pixels transmitted progressively. We consider this transmission as a dynamical process in which the sender pushes the data in decreasing order of significance. To adapt to network conditions and performance, instead of truncating the pixels, we propose a new method called Fast Reconstruction Method by Kalman Filtering (FRM-KF), which recursively infers the not-yet-received layers of a sequence of bitplanes. After an empirical analysis, we estimate the parameters of our model, which is a linear discrete Kalman filter. We take the initial law of the information to be the uniform distribution on [0, 255], the range of gray levels. The performance of the FRM-KF method has been evaluated in terms of image quality relative to the amount of data sent and relative to the processing time. High quality is reached quickly with relatively little data (less than 10% of the image data is needed to reach the sixth quality layer). The processing time per layer also decreases quickly as more layers are received. However, we found that the processing time can become large from an image resolution of 1024 * 1024 upward. Hence, we recommend the FRM-KF method for resolutions less than or equal to 512 * 512. A comparative statistical analysis shows that FRM-KF is competitive and suitable for implementation in resource-limited environments.

Transmission of digital images has been widely studied since the early years of the Internet [

The primary objective of PIT is to first transmit a significant and interpretable core of the image, and then transmit complementary layers that gradually improve its quality. This requires preparing the image before transmission. PIT techniques can be grouped into three main areas: the spatial domain [

In this work, we are interested in the progressive transmission and refinement of still images as a process that adapts to a low-quality network service. We focus especially on the JPEG2000 format, since it is the most used standard nowadays [

The rest of the paper is organized as follows. In Section 2, we review the literature on progressive image transmission. Section 3 deals with the theoretical foundations of discrete Kalman filtering, while Section 4 presents the proposed method, which models image transmission as a filtering procedure. In Section 5, we apply FRM-KF to a standard image gallery and discuss its performance.

PIT techniques can be grouped into three main areas, namely spatial-domain methods, transform-domain methods and pyramid-structured methods.

Spatial domain methods are based on the bit-plane decomposition (BPDM) [

The bit-plane decomposition method is the most intuitive approach to progressive transmission. The gray level of each pixel in an image is coded over 8 bits of different significance. The collection of the i^{th} most significant bits of all pixels constitutes the i^{th} bit plane, transmitted at the i^{th} step. On the receiver side, the binary image is rebuilt once a certain number of bit planes have been received, and it is gradually refined as the other planes arrive. BPDM does not introduce any distortion, but it lacks flexibility and adapts poorly to variations in network conditions. Improvements of BPDM that reduce storage space, and therefore transmission time, are available in the literature: quantization of pixels and selection of areas of interest with higher priorities [
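To make BPDM concrete, here is a minimal Python sketch that splits 8-bit gray levels into bit planes, most significant first, and rebuilds the pixels from the planes received so far. Missing planes are treated as zeros (plain truncation, with no prediction). For brevity it works on a flat list of pixels instead of a 2-D image; the function names `bitplanes` and `reconstruct` are our own.

```python
def bitplanes(pixels):
    """Split 8-bit gray levels into 8 binary planes, most significant plane first."""
    return [[(p >> bit) & 1 for p in pixels] for bit in range(7, -1, -1)]


def reconstruct(received):
    """Rebuild gray levels from the leading planes received so far.

    received[j] holds bit (7 - j) of every pixel; planes not yet received
    are simply treated as zeros, i.e. the pixel values are truncated."""
    out = [0] * len(received[0])
    for j, plane in enumerate(received):
        weight = 1 << (7 - j)
        for i, bit in enumerate(plane):
            out[i] += bit * weight
    return out


pixels = [200, 13, 255, 0, 128, 77]
planes = bitplanes(pixels)
for k in range(1, 9):
    # after k planes, each pixel is off by at most 2**(8 - k) - 1 gray levels
    print(k, reconstruct(planes[:k]))
```

After all 8 planes the original values are recovered exactly, which matches the statement that BPDM introduces no distortion.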

In vector quantization (VQM), the pixels are grouped into blocks (code-blocks), each of which is transformed into a vector. The resulting vectors are represented by a lighter structure called a codebook, whose entries are codewords. Codewords are progressively transmitted and used to produce an approximate image on the receiver side. The main improvement of VQM is the Tree-Structured Vector Quantization method (TSVQM), which first transmits the vector quantizations that contribute most quickly to a better image quality [
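The encoding side of vector quantization can be sketched as follows, assuming a codebook has already been trained (for instance with a k-means or LBG-style procedure, not shown) and that the image sides are multiples of the block size. Each block is replaced by the index of its nearest codeword, and the receiver rebuilds an approximate image by pasting the codewords back. The function names and the toy codebook are illustrative only.

```python
def split_blocks(img, bs):
    """Cut a 2-D image (list of rows) into bs x bs blocks, each flattened row by row."""
    return [[img[r + i][c + j] for i in range(bs) for j in range(bs)]
            for r in range(0, len(img), bs)
            for c in range(0, len(img[0]), bs)]


def nearest_codeword(vec, codebook):
    """Index of the codeword closest to vec in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((v - q) ** 2 for v, q in zip(vec, codebook[k])))


def vq_encode(img, codebook, bs):
    """Sender side: one codeword index per block."""
    return [nearest_codeword(v, codebook) for v in split_blocks(img, bs)]


def vq_decode(indices, codebook, height, width, bs):
    """Receiver side: paste the codewords back into an approximate image."""
    img = [[0] * width for _ in range(height)]
    per_row = width // bs
    for b, k in enumerate(indices):
        r0, c0 = (b // per_row) * bs, (b % per_row) * bs
        for i in range(bs):
            for j in range(bs):
                img[r0 + i][c0 + j] = codebook[k][i * bs + j]
    return img


image = [[10, 12, 200, 210],
         [11, 9, 205, 199]]
codebook = [[10, 10, 10, 10], [200, 200, 200, 200]]  # toy, untrained codebook
indices = vq_encode(image, codebook, bs=2)
print(indices, vq_decode(indices, codebook, 2, 4, bs=2))
```

Only the indices travel over the channel, which is where the storage and bandwidth savings come from; a tree-structured codebook would change how `nearest_codeword` searches, not this overall flow.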

The main goal of transform-based methods (e.g. the Discrete Cosine Transform (DCT)) is to concentrate the signal energy in the low-frequency components, which are captured by a small number of coefficients. The low-frequency coefficients have a strong and decisive impact on the final overall quality of the image. Before being transmitted in decreasing order of importance, these coefficients are ordered according to a scanning pattern (e.g. the zigzag scan used in JPEG [
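The zigzag ordering mentioned above can be generated by walking the anti-diagonals of an N × N coefficient block and alternating direction. The sketch below only covers the scanning step (no DCT or quantization); it assumes a square block given as a list of rows, and the helper names are our own.

```python
def zigzag_order(n):
    """(row, col) positions of an n x n block in JPEG-style zigzag order."""
    order = []
    for s in range(2 * n - 1):                  # s = row + col indexes the anti-diagonal
        lo, hi = max(0, s - n + 1), min(s, n - 1)
        # even diagonals run bottom-left -> top-right, odd ones the other way
        rows = range(hi, lo - 1, -1) if s % 2 == 0 else range(lo, hi + 1)
        order.extend((r, s - r) for r in rows)
    return order


def zigzag_scan(block):
    """Flatten a square block of coefficients along the zigzag path."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


block = [[1, 2, 6, 7],
         [3, 5, 8, 13],
         [4, 9, 12, 14],
         [10, 11, 15, 16]]
print(zigzag_scan(block))  # [1, 2, ..., 16]
```

The block is filled so that the scan reads 1 to 16, which makes it easy to see that low-frequency (top-left) coefficients come first.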

The pyramidal shape is ideal for progressive transmission. To form a pyramid, the resolution of an image is successively reduced according to a predefined method such as the Discrete Wavelet Transform (DWT) [
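A resolution pyramid can be illustrated with the simplest possible reduction, averaging each 2 × 2 block. This is a stand-in for the DWT cited above (it behaves like the approximation band of a Haar transform, up to scaling), not the method itself. The sketch assumes a square image whose side is a power of two; the coarsest level would be sent first, and the receiver can upsample it for a preview. Function names are hypothetical.

```python
def halve(img):
    """Halve the resolution by averaging each 2 x 2 block (integer mean)."""
    return [[(img[2 * i][2 * j] + img[2 * i][2 * j + 1]
              + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) // 4
             for j in range(len(img[0]) // 2)]
            for i in range(len(img) // 2)]


def build_pyramid(img):
    """Return [full resolution, ..., 1 x 1]; the last level is sent first."""
    levels = [img]
    while len(levels[-1]) > 1:
        levels.append(halve(levels[-1]))
    return levels


def upsample(img):
    """Nearest-neighbour doubling, e.g. for a receiver-side preview."""
    return [[v for v in row for _col in range(2)] for row in img for _row in range(2)]


image = [[0, 4, 8, 12],
         [4, 8, 12, 16],
         [8, 12, 16, 20],
         [12, 16, 20, 24]]
pyramid = build_pyramid(image)
for level in reversed(pyramid):  # transmission order: coarsest first
    print(level)
```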

None of the above techniques integrates a prediction of the data not yet received. Such an inference allows faster access to a transmitted image. One method pursuing that goal is pixel interpolation, which estimates not-yet-received data using a model built from the available data. The SIDE-MATCH algorithm is an implementation of the interpolation method [

Filtering is a procedure that aims at estimating the state of a given dynamic system from noisy observations. Usually, the outputs are given as a sequence $\{Y_n\}_{n \in T}$, where $T \subseteq \mathbb{R}$ denotes a set of time values, which can be discrete or continuous depending on data availability and observation rate. Each output $Y_n$ is related to an unknown or partially known state $X_n$ through a stochastic model of the form

$$Y_n = H_n(X_n) + V_n, \tag{1}$$

where $V_n$ is the noise occurring in the measurement procedure. $H_n$ represents an averaged relationship between $Y_n$ and $X_n$; in other terms, it is the trend of $Y$ as a function of $X$. The observation noise is usually assumed to be a normal (Gaussian) random variable [

A large class of these applications is covered by discrete filtering, which can be described by the general linear problem

$$\begin{cases}
X_{n+1} = A_n X_n + B_n + \varepsilon_n \\
Y_{n+1} = C_{n+1} X_{n+1} + D_{n+1} + \omega_{n+1} \\
\varepsilon_n \sim \mathcal{N}(0, K_n), \quad \omega_n \sim \mathcal{N}(0, W_n)
\end{cases} \tag{2}$$

where $A_n$, $B_n$, $C_n$ and $D_n$ are matrices expressing the dynamics of the signal and of the observation. The filtering problem (2) has an explicit solution in the Gaussian linear case, known as the “discrete Kalman filter”, which is presented as follows. Let

$$X_n^p = E[X_n \mid Y_0, \dots, Y_{n-1}], \tag{3}$$

$$X_n^e = E[X_n \mid Y_0, \dots, Y_n] \tag{4}$$

and $Q_n^p = \mathrm{Var}(X_n - X_n^p)$, with $Q_0^p$ given by the law of $X_0$, and $X_0^p = X_0^e$. The filtering equations are given as

$$\begin{cases}
x_{n+1}^p = A_n (I - N_n C_n)\, x_n^p + A_n N_n (y_n - D_n) + B_n \\
x_{n+1}^e = (I - N_{n+1} C_{n+1})\, x_{n+1}^p + N_{n+1} (y_{n+1} - D_{n+1}) \\
N_n = Q_n^p C_n^T \left(C_n Q_n^p C_n^T + W_n\right)^{-1} \\
Q_{n+1}^p = A_n (I - N_n C_n)\, Q_n^p A_n^T + K_n
\end{cases} \tag{5}$$

Since $Q_n^e = \mathrm{Var}(x_n - x_n^e)$, one has $Q_n^e = (I - N_n C_n) Q_n^p (I - N_n C_n)^T + N_n W_n N_n^T$, which reduces to $Q_n^e = (I - N_n C_n) Q_n^p$ for the gain $N_n$ of (5). On the other hand, $Q_0^p = \mathrm{Var}(x_0) + \mathrm{Var}(x_0^p)$ when $x_0^p$ is independent of $x_0$, and if $x_0^p$ is chosen as a constant (0 for example), then its variance is null and $Q_0^p = \mathrm{Var}(x_0)$. The techniques developed for linear filtering can sometimes be extended to the nonlinear case by means of linearization methods [
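For a scalar state and observation (so that $I = 1$ and transposes disappear), the recursion (5) fits in a few lines. The sketch below is a generic scalar Kalman filter written in the predict/update form equivalent to (5): the update computes $x_n^e$ and $Q_n^e = (1 - N_n C_n) Q_n^p$, and the prediction uses $x_{n+1}^p = A_n x_n^e + B_n$ and $Q_{n+1}^p = A_n Q_n^e A_n + K_n$. Passing the coefficients as functions of $n$ is our own design choice, and the function name is hypothetical. The usage line checks it on a constant signal observed with unit noise, where the filtered value should approach the sample mean.

```python
def kalman_scalar(ys, A, B, C, D, K, W, x0p, q0p):
    """Scalar discrete Kalman filter, equivalent to the recursion (5).

    ys[n] is the observation y_n; A, B, C, D, K, W are functions of n.
    x0p and q0p are the prior mean x_0^p and prior variance Q_0^p.
    Returns the filtered means x_n^e and variances Q_n^e."""
    xp, qp = x0p, q0p
    xe_all, qe_all = [], []
    for n, y in enumerate(ys):
        gain = qp * C(n) / (C(n) * qp * C(n) + W(n))   # N_n
        xe = (1 - gain * C(n)) * xp + gain * (y - D(n))  # x_n^e
        qe = (1 - gain * C(n)) * qp                      # Q_n^e for this gain
        xe_all.append(xe)
        qe_all.append(qe)
        xp = A(n) * xe + B(n)                            # x_{n+1}^p
        qp = A(n) * qe * A(n) + K(n)                     # Q_{n+1}^p
    return xe_all, qe_all


# Usage: a constant state (A = 1, B = 0, K = 0) observed directly with unit noise.
xe, qe = kalman_scalar(
    [1.0, 3.0, 2.0],
    A=lambda n: 1.0, B=lambda n: 0.0, C=lambda n: 1.0,
    D=lambda n: 0.0, K=lambda n: 0.0, W=lambda n: 1.0,
    x0p=0.0, q0p=1e6,  # nearly uninformative prior
)
print(xe[-1], qe[-1])  # close to the sample mean 2.0 and to 1/3
```

Only the previous mean and variance are carried from one step to the next, which is the memoryless property exploited later by FRM-KF.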

We consider the progressive transmission of a JPEG2000 image encoded in bitplanes. We assume that the transmission is done bitplane by bitplane over a narrow network channel. Because of the poor network quality, the receiver cannot wait until all the data are transmitted before decoding and displaying the image. Moreover, the transmission can stop unpredictably at any time. Thus, the receiver has to use the data received so far to estimate the whole image as well as possible. A first approach consists in simply refreshing the estimated image with newly received layers and displaying the result when its quality reaches a given threshold. Instead, we learn from successive bitplanes or layers, considered as partial observations of the image, to infer the missing parts. Hence, the bitplane transmission can be viewed as a dynamic system with partial observations. Since image structures vary, we can use a representative sample of coefficients for statistical inference.

Let $S$ be such an image, and $\{L_n\}_{1 \le n \le M}$ the sequence of bitplanes extracted from $S$. Transmitting $S$ consists in transmitting the layers $L_M, L_{M-1}, \dots, L_1$. We call $X_n$ the part of $S$ yet to be transmitted after the sequence $L_M, \dots, L_{M-n+1}$ has been transmitted, that is, the “residual” $S - \{L_M, \dots, L_{M-n+1}\}$. For convenience, we also set $X_0 = S$, $Y_0 = 0$ and $Y_n = L_{M-n+1}$.

Deterministically, we can then write

$$\begin{cases}
X_{n+1} = X_n - Y_{n+1} \\
X_0 = S, \quad Y_0 = 0
\end{cases} \tag{6}$$
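For a single pixel, the deterministic recursion (6) and the bookkeeping identity $S = X_n + \sum_{i=0}^{n} Y_i$ can be checked directly. The sketch assumes, for illustration, $M = 8$ and that layer $L_k$ carries the $k$-th bit of the gray level weighted by its significance $2^{k-1}$; the function name is hypothetical.

```python
def residual_sequence(s, m=8):
    """Return ([Y_0..Y_m], [X_0..X_m]) for one gray level s, following (6).

    Assumption for illustration: layer L_k (1 <= k <= m) is the k-th bit of s
    weighted by its significance, ((s >> (k - 1)) & 1) << (k - 1)."""
    layer = {k: ((s >> (k - 1)) & 1) << (k - 1) for k in range(1, m + 1)}
    ys, xs = [0], [s]                  # Y_0 = 0, X_0 = S
    for n in range(1, m + 1):
        ys.append(layer[m - n + 1])    # Y_n = L_{M-n+1}: most significant first
        xs.append(xs[-1] - ys[-1])     # X_n = X_{n-1} - Y_n
    return ys, xs


ys, xs = residual_sequence(200)
print(ys)
print(xs)
# S = X_n + Y_0 + ... + Y_n holds at every step, and X_M = 0
assert all(xs[n] + sum(ys[: n + 1]) == 200 for n in range(len(xs)))
```

The residual $X_n$ is exactly what the stochastic model introduced next tries to predict before the corresponding layers arrive.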

However, for the purpose of inference, a stochastic description is needed. Hence, from the receiver's viewpoint, we consider the following model, which recalls problem (2):

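$$X_{n+1} = \alpha X_n - \beta - \varepsilon_n \tag{7}$$

$$Y_{n+1} = \frac{1-\alpha}{\alpha} X_{n+1} + \frac{\beta}{\alpha} + \frac{\varepsilon_n}{\alpha} + \omega_{n+1} \tag{8}$$

with $\alpha, \beta \in \mathbb{R}$, $\varepsilon_n \sim \mathcal{N}(0, \gamma_n^2)$ and $\omega_n \sim \mathcal{N}(0, \sigma_n^2)$, where $\gamma_0 > 0$, $\gamma_{n+1} = a^{n+1}\gamma_0$, $\sigma_0 = 0$ and $\sigma_{n+1} = b^n \sigma_1 + c\,\frac{1-b^n}{1-b}$, so that $\gamma_{n+1} = a\gamma_n$ and $\sigma_{n+1} = b\sigma_n + c$.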

Equation (7) describes the dynamics of the information remaining to be received, while Equation (8) gives the next layer to be received. Indeed, we hypothesize an arithmetico-geometric progression of the part of the image that remains to be sent ($X_n$). In the same manner, we assume an affine relationship between the current layer to be sent ($Y_n$) and the part of the image that remains to be sent ($X_n$). An affine model is a simple and natural first choice, and it proves reasonable in practice. Notice that $Y_0 = 0$ and that, by formulation of the problem, $X_0$ follows the uniform law $\mathcal{U}([0; 255])$. Indeed, apart from the fact that they belong to $[0; 255] \cap \mathbb{N}$, we do not have any prior information on the coefficients, and the complete information is given by

$$S = X_n + \sum_{i=0}^{n} Y_i, \quad \forall n = 0, \dots, M. \tag{9}$$

Equation (7) implies an exponential variation of the estimation errors and of their variances. The filtering procedure consists in determining the mathematical expectation of $X_n$ conditionally upon $Y_{0:n}$ at each step $n = 1, \dots, 7$. The upper bound $n = 7$ is motivated by the fact that we process images channel by channel: a real color image has red, green and blue channels, each coded on 8 bits (numbered from 0 to 7). The estimate $S_n$ of $S$ is given by

$$S_n = E[X_n \mid Y_0, \dots, Y_n] + \sum_{i=0}^{n} Y_i. \tag{10}$$

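Proposition 1. If $0 < |\alpha| < 1$, then

$$\lim_{n\to\infty} E[X_n] + \frac{\beta}{1-\alpha} = \lim_{n\to\infty} E[Y_n] = 0.$$

Moreover, if $0 \le a < 1$ and $0 < b < 1$, then

$$\lim_{n\to\infty}\mathrm{Var}[X_n] = 0 \quad\text{and}\quad \lim_{n\to\infty}\mathrm{Var}[Y_n] = \left(\frac{c}{1-b}\right)^2.$$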

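Proof. Let $U_n = E[X_n]$. One has

$$U_{n+1} = \alpha U_n - \beta$$

and

$$U_{n+1} = \alpha U_n - \beta = \alpha^{n+1} U_0 - \frac{1-\alpha^{n+1}}{1-\alpha}\,\beta.$$

Hence, $E[X_n] = \alpha^n E[X_0] - \frac{1-\alpha^n}{1-\alpha}\,\beta$, and if $|\alpha| < 1$, then $\lim_{n\to\infty}\alpha^n = 0$ and $\lim_{n\to\infty} E[X_n] = -\frac{\beta}{1-\alpha}$. On the other hand, $E[Y_n] = \frac{1-\alpha}{\alpha} E[X_n] + \frac{\beta}{\alpha}$ and

$$\lim_{n\to\infty} E[Y_n] = \lim_{n\to\infty}\left(\frac{1-\alpha}{\alpha} E[X_n] + \frac{\beta}{\alpha}\right) = -\frac{\beta}{\alpha} + \frac{\beta}{\alpha} = 0. \tag{11}$$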

For the second part of Proposition 1,

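$$\mathrm{Var}[X_{n+1}] = \alpha^2\,\mathrm{Var}[X_n] + \mathrm{Var}[\varepsilon_n] \tag{12}$$

$$\mathrm{Var}[X_{n+1}] = \alpha^{2(n+1)}\,\mathrm{Var}[X_0] + \frac{1-\left(a^2\alpha^{-2}\right)^{n+1}}{1-a^2\alpha^{-2}}\,\alpha^{2n}\gamma_0^2 \tag{13}$$

(for $a \neq |\alpha|$) and, since substituting (7) into (8) gives $Y_{n+1} = (1-\alpha)X_n + \beta + \varepsilon_n + \omega_{n+1}$,

$$\mathrm{Var}[Y_{n+1}] = (1-\alpha)^2\,\mathrm{Var}[X_n] + \mathrm{Var}[\varepsilon_n] + \mathrm{Var}[\omega_{n+1}] \tag{14}$$

$$\mathrm{Var}[Y_{n+1}] = (1-\alpha)^2\,\mathrm{Var}[X_n] + a^{2n}\gamma_0^2 + \sigma_{n+1}^2 \tag{15}$$

Hence, if additionally $0 \le a < 1$ and $0 < b < 1$, both terms of (13) vanish, so that $\lim_{n\to\infty}\mathrm{Var}[X_n] = \lim_{n\to\infty}\mathrm{Var}[\varepsilon_n] = 0$, while $\sigma_n \to \frac{c}{1-b}$ and therefore $\lim_{n\to\infty}\mathrm{Var}[Y_n] = \left(\frac{c}{1-b}\right)^2$.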
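The recursions behind Proposition 1 are easy to check numerically. The sketch below iterates $U_{n+1} = \alpha U_n - \beta$ and the variance recursion (12), with $\mathrm{Var}[\varepsilon_n] = \gamma_n^2 = a^{2n}\gamma_0^2$, and compares them with their closed forms. The values of $\beta$ and $\gamma_0$ are arbitrary choices for the check (they are not estimated in this work), $\alpha$ and $a$ are the fitted values reported later in this work, and $\mathrm{Var}[X_0] = 255^2/12$ comes from the uniform prior. The function names are hypothetical.

```python
def mean_path(u0, alpha, beta, steps):
    """Iterate U_{n+1} = alpha * U_n - beta; returns [U_0, ..., U_steps]."""
    us = [u0]
    for _ in range(steps):
        us.append(alpha * us[-1] - beta)
    return us


def mean_closed_form(u0, alpha, beta, n):
    """U_n = alpha^n U_0 - (1 - alpha^n) / (1 - alpha) * beta."""
    return alpha ** n * u0 - (1 - alpha ** n) / (1 - alpha) * beta


def var_path(v0, alpha, a, gamma0, steps):
    """Iterate (12): Var[X_{n+1}] = alpha^2 Var[X_n] + gamma_n^2, gamma_n = a^n gamma0."""
    vs = [v0]
    for n in range(steps):
        vs.append(alpha ** 2 * vs[-1] + (a ** n * gamma0) ** 2)
    return vs


def var_closed_form(v0, alpha, a, gamma0, n):
    """Right-hand side of (13), i.e. Var[X_{n+1}] (requires a != |alpha|)."""
    r = a ** 2 / alpha ** 2
    return (alpha ** (2 * (n + 1)) * v0
            + (1 - r ** (n + 1)) / (1 - r) * alpha ** (2 * n) * gamma0 ** 2)


alpha, a = 0.4942136, 0.2499698      # fitted values reported in this work
beta, gamma0 = 2.0, 1.0              # arbitrary values, for the check only
us = mean_path(127.5, alpha, beta, 200)
vs = var_path(255 ** 2 / 12, alpha, a, gamma0, 200)
print(us[-1], -beta / (1 - alpha))   # both are the limit of E[X_n]
print(vs[-1])                        # Var[X_n] has essentially vanished
```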

Proposition 1 shows that in the long run ($n \to \infty$), the remaining information about the image is predictable and its expectation tends to $-\frac{\beta}{1-\alpha}$. On the other hand, the layers to be received tend to zero in the long run. This is realistic, since a finite number of layers suffices. Following the same principle, the remaining information should vanish in the long run, so we should have $\beta = 0$, which we adopt in the rest of this work.

The Kalman filter also offers the benefit of being memoryless: it only retains the previous state to infer the current one, so it is not necessary to keep all the previously computed states in memory.

The dynamics of the conditional distribution (characterized by its mean vector and its variance-covariance matrix) is driven by the filtering equations. To determine the coefficients $\alpha$, $\beta$, $a$, $b$ and $c$, we proceed by statistical regressions on the sample $[0; 255] \cap \mathbb{N}$, corresponding to all possible values of a block of pixels. Regression aims at identifying the set of parameters that minimizes the sum of squared errors (SSE) of the fitted model. Precisely, we determine $\alpha$ and $\beta$ minimizing the quantity

$$\mathrm{SSE}_1 = \sum_{n=0}^{7}\left(\frac{1}{256}\sum_{i=0}^{255}\left(X_{n+1}^i - \alpha X_n^i + \beta\right)\right)^2 = \sum_{n=0}^{7}\left(\bar{X}_{n+1} - \alpha \bar{X}_n + \beta\right)^2 \tag{18}$$

Since we adopted $\beta = 0$, it remains to find the $\alpha$ that minimizes (18). Once $\alpha$ and $\beta$ have been identified, one can successively obtain $a$, $b$ and $c$ by minimizing the following SSEs:

$$\mathrm{SSE}_2 = \sum_{n=0}^{7}\left(\gamma_{n+1} - a\gamma_n\right)^2 = \sum_{n=0}^{7}\left(S_{X_{n+1}} - a S_{X_n}\right)^2 \tag{19}$$

and

$$\mathrm{SSE}_3 = \sum_{n=0}^{7}\left(\sigma_{n+1} - b\sigma_n - c\right)^2 = \sum_{n=0}^{7}\left(S_{Y_{n+1}} - b S_{Y_n} - c\right)^2. \tag{20}$$

In (18), (19) and (20), $\gamma_n$ and $\sigma_n$ are estimated by the empirical standard deviations $S_{X_n}$ and $S_{Y_n}$, where for $Z = X, Y$:

$$\bar{Z}_n = \frac{1}{256}\sum_{i=0}^{255} Z_n^i \quad\text{and}\quad S_{Z_n}^2 = \frac{1}{255}\sum_{i=0}^{255}\left(Z_n^i - \bar{Z}_n\right)^2.$$
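The three least-squares problems (18), (19) and (20) have closed-form solutions: with $\beta = 0$, (18) and (19) are regressions through the origin, and (20) is an ordinary affine regression. The sketch below builds the sample from the 256 gray levels under an assumption of ours, namely that $X_n^i$ is gray level $i$ with its $n$ most significant bit planes removed and $Y_n^i = X_{n-1}^i - X_n^i$. Because the exact construction behind the reported parameters is not detailed, the values it prints should not be expected to reproduce that table; the function names are hypothetical.

```python
def fit_through_origin(x, y):
    """Least-squares slope k of y ~ k * x without intercept (alpha with beta = 0, or a)."""
    return sum(u * v for u, v in zip(x, y)) / sum(u * u for u in x)


def fit_affine(x, y):
    """Ordinary least squares for y ~ b * x + c; returns (b, c)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    b = sxy / sxx
    return b, my - b * mx


def mean_and_sd(values):
    """Mean with weight 1/256 and standard deviation with 1/255, as defined above."""
    m = sum(values) / len(values)
    return m, (sum((v - m) ** 2 for v in values) / (len(values) - 1)) ** 0.5


M = 8
# Assumed construction: X_n^i is gray level i with its n most significant
# bit planes removed, and Y_n^i = X_{n-1}^i - X_n^i (with Y_0^i = 0).
X = [[i % (1 << (M - n)) for i in range(256)] for n in range(M + 1)]
Y = [[0] * 256] + [[X[n - 1][i] - X[n][i] for i in range(256)] for n in range(1, M + 1)]

x_bar = [mean_and_sd(row)[0] for row in X]
s_x = [mean_and_sd(row)[1] for row in X]
s_y = [mean_and_sd(row)[1] for row in Y]

alpha = fit_through_origin(x_bar[:-1], x_bar[1:])  # minimises (18) with beta = 0
a = fit_through_origin(s_x[:-1], s_x[1:])          # minimises (19)
b, c = fit_affine(s_y[:-1], s_y[1:])               # minimises (20)
print(alpha, a, b, c)
```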

Following the aforementioned regressions, we obtained the following parameter values:

Parameters | Values |
---|---|
α | 0.4942136 |
β | 0 |
a | 0.2499698 |
b | 0.7850239 |
c | 5764.0507 |

Note that all the parameters satisfy the hypotheses of Proposition 1 ($0 < |\alpha| < 1$, $0 \le a < 1$ and $0 < b < 1$) and therefore guarantee the exponential convergence established there.

This section applies the filtering procedure described above to a sample of 210 images from the University of Southern California Signal and Image Processing Institute (USC-SIPI)^{1} database. We used a workstation with 4 GB of RAM and a 4× Intel® Core™ i3-3227U CPU @ 1.90 GHz (32-bit), running Ubuntu 18.04 (Linux kernel 4.15.0-74-generic).

According to Section 3, we have $A_n = \alpha$, $B_n = -\beta$, $C_n = \frac{1-\alpha}{\alpha}$, $D_n = \frac{\beta}{\alpha}$, $K_n = \gamma_n^2$, $W_n = \frac{\gamma_n^2}{\alpha^2} + \sigma_n^2$ and $Q_0^p \in \left\{\frac{255^2}{12}; \frac{255^2}{6}\right\}$. Recall that for a uniform law $\mathcal{U}([u, v])$ the variance is $\frac{(v-u)^2}{12}$.
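Putting the pieces together, a per-pixel version of FRM-KF can be sketched with the coefficient mapping above and equation (10), taking $\beta = 0$. Several choices here are our assumptions rather than details given in the text: $\gamma_0$ is not reported, so it is a free argument (the value 1.0 in the usage line is arbitrary); $\sigma_n$ is generated with the fitted recursion $\sigma_{n+1} = b\sigma_n + c$ from $\sigma_0 = 0$; the prior mean is 127.5, the mean of the uniform law on [0, 255]; following $X_0^p = X_0^e$, no update is performed at $n = 0$; and layers are weighted bit planes. The names `layers_of` and `frm_kf_pixel` are hypothetical.

```python
def layers_of(pixel, m=8):
    """Y_0 = 0 followed by the weighted bit planes of pixel, most significant first."""
    return [0] + [((pixel >> (m - n)) & 1) << (m - n) for n in range(1, m + 1)]


def frm_kf_pixel(ys, alpha, a, b, c, gamma0, q0p=255 ** 2 / 12, x0p=127.5):
    """Sketch of S_n = E[X_n | Y_0..Y_n] + sum(Y_0..Y_n), eq. (10), for one pixel."""
    A, B = alpha, 0.0                  # B_n = 0 since beta = 0
    C, D = (1 - alpha) / alpha, 0.0    # D_n = 0 since beta = 0
    xe, qe = x0p, q0p                  # X_0^e = X_0^p: no update at n = 0
    sigma = 0.0                        # sigma_0 = 0
    received = ys[0]
    estimates = [xe + received]
    for n in range(1, len(ys)):
        # prediction with K_{n-1} = gamma_{n-1}^2
        xp = A * xe + B
        qp = A * qe * A + (a ** (n - 1) * gamma0) ** 2
        # observation noise W_n = gamma_n^2 / alpha^2 + sigma_n^2
        sigma = b * sigma + c
        w = (a ** n * gamma0) ** 2 / alpha ** 2 + sigma ** 2
        gain = qp * C / (C * qp * C + w)           # N_n
        xe = (1 - gain * C) * xp + gain * (ys[n] - D)
        qe = (1 - gain * C) * qp
        received += ys[n]
        estimates.append(xe + received)
    return estimates


est = frm_kf_pixel(layers_of(200), alpha=0.4942136, a=0.2499698,
                   b=0.7850239, c=5764.0507, gamma0=1.0)
print([round(s, 1) for s in est])
```

A full image would apply this independently to every pixel of every channel; vectorizing over pixels changes nothing in the recursion itself.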

Figure: Evolution of the visual rendering of images following the reception of quality layers and the filtering procedure.

Compared to the results in [^{rd} step (

For all considered methods, a regression analysis showed an affine relation between the number of received layers and the measured PSNR (adjusted R-squared of at least 93%), with highly significant^{2} slope and intercept.
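The PSNR used here is the standard $10\log_{10}(255^2/\mathrm{MSE})$ for 8-bit images, and the affine relation is an ordinary least-squares fit of PSNR against the number of received layers. The sketch below computes both, together with the adjusted $R^2$ for a single regressor. As a self-contained input it uses plain bit-plane truncation of the 256 gray levels (no filtering), so its numbers only illustrate the procedure and are not the results reported in this work. Function names are hypothetical.

```python
import math


def psnr(original, estimate, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((o - e) ** 2 for o, e in zip(original, estimate)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def affine_fit(x, y):
    """OLS fit y = slope * x + intercept; also returns the adjusted R^2 (one regressor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((v - (slope * u + intercept)) ** 2 for u, v in zip(x, y))
    ss_tot = sum((v - my) ** 2 for v in y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return slope, intercept, adj_r2


# Illustration only: PSNR of bit-plane truncation after k = 1..7 planes.
pixels = list(range(256))
layers = list(range(1, 8))
scores = [psnr(pixels, [(p >> (8 - k)) << (8 - k) for p in pixels]) for k in layers]
print(affine_fit(layers, scores))
```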

The USC-SIPI database contains 73 images with a 256 * 256 resolution, 83 images with a 512 * 512 resolution, 53 images with a 1024 * 1024 resolution and only 1 image with a 2050 * 2050 resolution. For our statistical analyses, we therefore focused on the 256 * 256, 512 * 512 and 1024 * 1024 resolutions. Again, we found an affine relation between the PSNR and the number of received layers. The p-value was less than 2 × 10^{−16} and the adjusted R^{2} (model fitting factor) was between 88.77% and 96.96%. In order to give a general behavior of the FRM-KF method, the computed values of the slope and the intercept are given in

Method | Slope | Intercept |
---|---|---|
256 * 256 resolution | | |
SPIHT | 3.95*** | 8.72** |
Tung | 3.85*** | 9.97** |
Tzu-Chuen | 2.19*** | 15.95*** |
FRM-KF | 6.44*** | 5.44* |
512 * 512 resolution | | |
SPIHT | 3.43*** | 10.66*** |
Tung | 3.32*** | 11.90*** |
Tzu-Chuen | 2.47*** | 14.44*** |
FRM-KF | 6.45*** | 5.42* |

Method | Slope | Intercept |
---|---|---|
256 * 256 resolution | | |
SPIHT | −2.12^{•} | 2.26 |
Tung | −2.09* | 3.20 |
Tzu-Chuen | −4.49** | 11.14** |
512 * 512 resolution | | |
SPIHT | −2.64* | 4.23 |
Tung | −2.64* | 5.20 |
Tzu-Chuen | −4.24** | 9.73* |

Resolution | Slope | Intercept |
---|---|---|
256 * 256 | 6.6557*** | 3.6186*** |
512 * 512 | 6.66706*** | 3.51488*** |
1024 * 1024 | 6.3676*** | −1.9659** |

We evaluated the time needed to decode the images. The first phase, which consists of generating white noise, decoding the first quality layer of the original image and combining the two, took about 2.24 × 10^{−1} ± 2.868 × 10^{−2}, 8.87 × 10^{−1} ± 5.612 × 10^{−2} and 3.505 ± 1.587 × 10^{−1} seconds (mean ± standard deviation) for 256 * 256, 512 * 512 and 1024 * 1024 images, respectively. Decoding each subsequent quality layer and combining it with the previous result took 3.149 × 10^{−2} ± 4.012 × 10^{−3}, 1.223 × 10^{−1} ± 1.024 × 10^{−2} and 4.95 × 10^{−1} ± 4.009 × 10^{−2} seconds for 256 * 256, 512 * 512 and 1024 * 1024 images, respectively. The images used on current mobile devices have a resolution of at least 512 * 512. Given the processing time for 1024 * 1024 images, we recommend the FRM-KF method for resolutions less than or equal to 512 * 512.
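Assuming the per-layer times simply add up (our simplification, using only the reported means and ignoring the standard deviations), these figures give a rough estimate of the time needed to display an image after k quality layers. The number of layers is a free parameter, and the function and constant names are hypothetical.

```python
# Mean times in seconds reported above, keyed by image side length.
FIRST_PHASE = {256: 0.224, 512: 0.887, 1024: 3.505}
PER_LAYER = {256: 0.03149, 512: 0.1223, 1024: 0.495}


def time_after_layers(side, k):
    """Rough mean time (s) to display a side x side image after k >= 1 quality layers."""
    if k < 1:
        raise ValueError("at least the first quality layer is needed")
    return FIRST_PHASE[side] + (k - 1) * PER_LAYER[side]


for side in (256, 512, 1024):
    print(side, round(time_after_layers(side, 8), 3))
```

With a fixed layer count, the 1024 * 1024 case is dominated by the first phase and grows roughly fourfold relative to 512 * 512, in line with the number of pixels.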

Regarding the amount of data transmitted per quality layer when streaming images, less than 10% of the image data is needed to reach the sixth quality layer. The process is therefore suitable, in terms of processing and memory resources, for small devices with low computing capabilities.

This work addressed the problem of image transmission in resource-limited environments. We were interested in the progressive transmission and refinement of still images as a process that adapts to a low-quality network service. To achieve our objectives, we proposed a stochastic model that represents the missing parts of the image as noise effects. In a stochastic context, the problem of dynamically estimating a signal conditionally upon the available observations is known as filtering. We calibrated the model using statistical regression and some general considerations; the resulting model is a discrete Kalman filter.

Applying the filtering procedure to a dataset of 209 images gave satisfactory results. We evaluated the evolution of the Peak Signal to Noise Ratio (PSNR) with respect to the number of received layers and found an affine relation regardless of the PIT method considered (Set Partitioning In Hierarchical Trees, Tzu-Chuen, Tung and FRM-KF methods). The proposed FRM-KF approach improved the PSNR the fastest.

The performance of the FRM-KF method has been further evaluated in terms of image quality relative to the amount of data sent and relative to the processing time. High quality is reached quickly with relatively little data (less than 10% of the image data is needed to reach the sixth quality layer). The processing time per layer also decreases quickly as more layers are received. However, we found that the processing time can become large from an image resolution of 1024 * 1024 upward. Hence, we recommend the FRM-KF method for resolutions less than or equal to 512 * 512.

In future work, we expect to extend our method to multimedia communication environments subject to disturbances, in order to ensure robustness to breakdowns and interference. We should also consider adapting our approach to video streaming in order to ensure greater continuity of video streaming services.

The authors declare no conflicts of interest regarding the publication of this paper.

Saoungoumi-Sourpele, R., Nlong, J.M., Fotsa-Mbogne, D.J., Kamdjoug, J.-R.K. and Bitjoka, L. (2021) Full Image Inference Conditionally upon Available Pieces Transmitted into Limited Resources Context. Journal of Signal and Information Processing, 12, 57-69. https://doi.org/10.4236/jsip.2021.123003