<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.4 20241031//EN" "JATS-journalpublishing1-4.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.4" xml:lang="en">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">ojapps</journal-id>
      <journal-title-group>
        <journal-title>Open Journal of Applied Sciences</journal-title>
      </journal-title-group>
      <issn pub-type="epub">2165-3925</issn>
      <issn pub-type="ppub">2165-3917</issn>
      <publisher>
        <publisher-name>Scientific Research Publishing</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.4236/ojapps.2026.164065</article-id>
      <article-id pub-id-type="publisher-id">ojapps-150697</article-id>
      <article-categories>
        <subj-group>
          <subject>Article</subject>
        </subj-group>
        <subj-group>
          <subject>Biomedical</subject>
          <subject>Life Sciences</subject>
          <subject>Chemistry</subject>
          <subject>Materials Science</subject>
          <subject>Computer Science</subject>
          <subject>Communications</subject>
          <subject>Engineering</subject>
          <subject>Physics</subject>
          <subject>Mathematics</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>A Link Quality Prediction Method Based on External Attention and Prior Probability</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name name-style="western">
            <surname>Xie</surname>
            <given-names>Meng</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <name name-style="western">
            <surname>Shi</surname>
            <given-names>Weibin</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <name name-style="western">
            <surname>Lie</surname>
            <given-names>Yulai</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <name name-style="western">
            <surname>Xu</surname>
            <given-names>Wenfeng</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
      </contrib-group>
      <aff id="aff1"><label>1</label> School of Optical-Electrical and Computer Engineering, University of Shanghai for Science &amp; Technology, Shanghai, China </aff>
      <author-notes>
        <fn fn-type="conflict" id="fn-conflict">
          <p>The authors declare no conflicts of interest regarding the publication of this paper.</p>
        </fn>
      </author-notes>
      <pub-date pub-type="epub">
        <day>02</day>
        <month>04</month>
        <year>2026</year>
      </pub-date>
      <pub-date pub-type="collection">
        <month>04</month>
        <year>2026</year>
      </pub-date>
      <volume>16</volume>
      <issue>04</issue>
      <fpage>1103</fpage>
      <lpage>1116</lpage>
      <history>
        <date date-type="received">
          <day>11</day>
          <month>03</month>
          <year>2026</year>
        </date>
        <date date-type="accepted">
          <day>07</day>
          <month>04</month>
          <year>2026</year>
        </date>
        <date date-type="published">
          <day>10</day>
          <month>04</month>
          <year>2026</year>
        </date>
      </history>
      <permissions>
        <copyright-statement>© 2026 by the authors and Scientific Research Publishing Inc.</copyright-statement>
        <copyright-year>2026</copyright-year>
        <license license-type="open-access">
          <license-p> This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license ( <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link> ). </license-p>
        </license>
      </permissions>
      <self-uri content-type="doi" xlink:href="https://doi.org/10.4236/ojapps.2026.164065">https://doi.org/10.4236/ojapps.2026.164065</self-uri>
      <abstract>
        <p>Link quality estimation is a critical foundation for path selection in routing protocols of wireless sensor networks. Affected by multipath fading, noise, and interference in wireless channels, wireless links typically exhibit nonlinear and non-stationary characteristics, which pose challenges to efficient and accurate link quality prediction. To address the issues of error accumulation and lack of parallel computing in existing autoregressive methods, a Transformer-based link quality prediction method named LEAPP is proposed. A multi-head external attention mechanism is introduced in the encoder to reduce computational complexity and enhance global modeling capability. The decoder uses packet success rate (PSR) as input and constructs a non-autoregressive prediction model, achieving effective integration of prior probability and deep learning models. Experimental results show that compared with baseline models, the MAE and RMSE of LEAPP are reduced by 22.1% and 16.3%, respectively, and the MAE drops to 0.0092 on the public dataset. Meanwhile, the non-autoregressive inference mode reduces the inference delay by approximately 82% compared with traditional methods, significantly improving the real-time performance and practicality of online link quality prediction.</p>
      </abstract>
      <kwd-group kwd-group-type="author-generated" xml:lang="en">
        <kwd>Transformer</kwd>
        <kwd>Link Quality Prediction</kwd>
        <kwd>External Attention</kwd>
        <kwd>Prior Probability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec1">
      <title>1. Introduction</title>
      <p>Wireless Sensor Networks (WSNs) are an important component of the Internet of Things (IoT) [<xref ref-type="bibr" rid="B1">1</xref>][<xref ref-type="bibr" rid="B2">2</xref>]. Because WSNs adopt low-power communication technologies, they are susceptible to factors such as multipath fading, environmental interference, and noise, which leave link states unstable. Link quality exhibits non-stationary, asymmetric, and irregularly fluctuating characteristics in both the time and spatial domains [<xref ref-type="bibr" rid="B3">3</xref>]. Accurately predicting link quality is therefore a key problem to be solved.</p>
      <p>Early studies mostly adopted theoretical or empirical model-driven link quality estimation methods, which evaluate link quality by computing predefined variables. Such methods are often applicable only to specific scenarios and are difficult to generalize to complex, variable real-world environments [<xref ref-type="bibr" rid="B4">4</xref>]. In recent years, data-driven methods have gradually become mainstream; based on the technologies adopted, they can be subdivided into statistical models, machine learning models, and deep learning models. Statistical models estimate link quality by establishing a mapping between link metrics and the Packet Reception Ratio (PRR); they have low computational overhead and are easy to deploy, but they depend heavily on measurement data collected in specific scenarios, and once the node deployment environment changes, the prediction performance of the original mapping model may degrade significantly. Machine learning methods improve model robustness and accuracy to a certain extent by fusing multi-dimensional link indicators and introducing fuzzy logic, regression, or classification models; nevertheless, their performance is still limited by manual feature engineering, resulting in poor adaptability to dynamic channel environments and a bottleneck in prediction accuracy. Deep learning methods can automatically learn complex link features from raw data, significantly improving PRR prediction accuracy. However, their application in WSNs faces multiple challenges: first, model training and inference are computationally intensive and energy-consuming, making it difficult to meet the resource constraints of low-power devices; second, complete, high-quality training samples are hard to obtain in dynamic channel environments; third, the model decision-making process lacks interpretability, which hinders system debugging and trusted deployment. These factors collectively restrict the practical deployment of deep learning in highly dynamic, resource-constrained WSNs.</p>
      <p>To address the limitations of the aforementioned methods, this paper proposes a link quality prediction method based on the Transformer [<xref ref-type="bibr" rid="B5">5</xref>], named LEAPP (Link Estimation based on External Attention and Prior Probability). In terms of model structure, LEAPP introduces the Multi-Head External Attention (MHEA) [<xref ref-type="bibr" rid="B6">6</xref>] mechanism in the encoder and adopts the physical-layer-derived packet success rate (PSR) as the decoder input to construct a non-autoregressive prediction mechanism. By effectively fusing the prior knowledge of theoretical models with the data-driven capabilities of deep learning, LEAPP maintains high prediction accuracy while improving generalization across deployment scenarios, providing an efficient and scalable link quality estimation solution for resource-constrained, highly dynamic wireless sensor networks.</p>
      <p>The main contributions of this paper are as follows:</p>
      <p>1) A link quality prediction model based on the Transformer framework is proposed. Different from structures such as LSTM that rely on local temporal modeling, LEAPP enhances the global dependency modeling capability by introducing the multi-head external attention mechanism and utilizing externally learnable parameters in the encoder, while reducing computational complexity.</p>
      <p>2) A non-autoregressive decoder with PSR as input is designed. Combined with the attention mechanism, it effectively solves the problem of gradual error accumulation during the inference phase, achieving efficient parallel inference while improving prediction accuracy. The model can still converge quickly in small-sample scenarios, featuring good data efficiency and interpretability.</p>
      <p>3) Comprehensive comparative experiments, ablation studies, and sensitivity analyses are conducted based on multiple self-collected and public WSN datasets. The results show that LEAPP significantly outperforms existing methods in complex and dynamic environments, demonstrating outstanding advantages in prediction accuracy, stability, and generalization ability. This verifies its application potential in low-power WSN systems.</p>
    </sec>
    <sec id="sec2">
      <title>2. Design of the LEAPP Model</title>
      <sec id="sec2dot1">
        <title>2.1. Overall Model Framework</title>
        <p>In this paper, a Transformer based sequence-to-sequence model for link quality prediction is designed. The overall architecture is shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>. The model takes the channel measurement metrics at the receiver as input and outputs the estimated value of the current PRR.</p>
        <p>The encoder is responsible for extracting contextual semantic features from the input SINR (Signal to Interference and Noise Ratio) sequence. Specifically, the raw SINR sequence is first projected through a linear mapping to match the embedding dimension of the model. It is then processed by multiple stacked MHEA (Multi-Head External Attention) modules and feed-forward networks, with residual connections and layer normalization between sub-layers to stabilize training and accelerate convergence. The decoder takes the PSR (packet success rate) sequence as input. The input at each time step undergoes a linear transformation, captures temporal dependencies through the MHEA module, and is further enhanced by the feed-forward network for nonlinear expression. The encoder output serves as the contextual input to the intermediate layers of the decoder, participating in cross-sequence attention computation to realize information interaction between the input and target sequences. In the output stage, the hidden representation of the last decoder layer is compressed into a single-channel output through a linear projection layer, establishing the mapping from the SINR sequence to the predicted PRR values. This design retains the powerful sequence modeling capability of the Transformer while achieving a balance among real-time performance, interpretability, and deployment feasibility in resource-constrained WSN scenarios.</p>
        <fig id="fig1">
          <label>Figure 1</label>
          <graphic xlink:href="https://html.scirp.org/file/2313736-rId15.jpeg?20260410032754" />
        </fig>
        <p><bold>Figure 1.</bold> Model structure diagram.</p>
      </sec>
      <sec id="sec2dot2">
        <title>2.2. Data Preprocessing</title>
        <p>According to communication theory, PSR is a function of the Signal to Interference plus Noise Ratio (SINR) [<xref ref-type="bibr" rid="B7">7</xref>]. Therefore, this paper selects SINR as the input feature of the encoder. For the decoder, the PSR calculated based on a modified theoretical model is adopted as the input sequence, with the calculation formulas shown in Equations (1) and (2).</p>
        <disp-formula id="FD1">
          <label>(1)</label>
          <mml:math display="inline">
            <mml:mrow>
              <mml:mi>B</mml:mi>
              <mml:mi>E</mml:mi>
              <mml:mi>R</mml:mi>
              <mml:mo>=</mml:mo>
              <mml:mfrac>
                <mml:mn>8</mml:mn>
                <mml:mrow>
                  <mml:mn>15</mml:mn>
                </mml:mrow>
              </mml:mfrac>
              <mml:mo>×</mml:mo>
              <mml:mfrac>
                <mml:mn>1</mml:mn>
                <mml:mrow>
                  <mml:mn>16</mml:mn>
                </mml:mrow>
              </mml:mfrac>
              <mml:mo>×</mml:mo>
              <mml:mstyle displaystyle="true">
                <mml:munderover>
                  <mml:mo>∑</mml:mo>
                  <mml:mrow>
                    <mml:mi>k</mml:mi>
                    <mml:mo>=</mml:mo>
                    <mml:mn>2</mml:mn>
                  </mml:mrow>
                  <mml:mrow>
                    <mml:mn>16</mml:mn>
                  </mml:mrow>
                </mml:munderover>
                <mml:mrow>
                  <mml:msup>
                    <mml:mrow>
                      <mml:mrow>
                        <mml:mo>(</mml:mo>
                        <mml:mrow>
                          <mml:mo>−</mml:mo>
                          <mml:mn>1</mml:mn>
                        </mml:mrow>
                        <mml:mo>)</mml:mo>
                      </mml:mrow>
                    </mml:mrow>
                    <mml:mi>k</mml:mi>
                  </mml:msup>
                </mml:mrow>
              </mml:mstyle>
              <mml:mrow>
                <mml:mo>(</mml:mo>
                <mml:mfrac linethickness="0">
                  <mml:mn>16</mml:mn>
                  <mml:mi>k</mml:mi>
                </mml:mfrac>
                <mml:mo>)</mml:mo>
              </mml:mrow>
              <mml:msup>
                <mml:mi>e</mml:mi>
                <mml:mrow>
                  <mml:mrow>
                    <mml:mo>(</mml:mo>
                    <mml:mrow>
                      <mml:mn>20</mml:mn>
                      <mml:mo>×</mml:mo>
                      <mml:mrow>
                        <mml:mo>(</mml:mo>
                        <mml:mrow>
                          <mml:mi>S</mml:mi>
                          <mml:mi>I</mml:mi>
                          <mml:mi>N</mml:mi>
                          <mml:mi>R</mml:mi>
                          <mml:mo>−</mml:mo>
                          <mml:mi>τ</mml:mi>
                        </mml:mrow>
                        <mml:mo>)</mml:mo>
                      </mml:mrow>
                      <mml:mo>×</mml:mo>
                      <mml:mrow>
                        <mml:mo>(</mml:mo>
                        <mml:mrow>
                          <mml:mfrac>
                            <mml:mn>1</mml:mn>
                            <mml:mi>k</mml:mi>
                          </mml:mfrac>
                          <mml:mo>−</mml:mo>
                          <mml:mn>1</mml:mn>
                        </mml:mrow>
                        <mml:mo>)</mml:mo>
                      </mml:mrow>
                    </mml:mrow>
                    <mml:mo>)</mml:mo>
                  </mml:mrow>
                </mml:mrow>
              </mml:msup>
            </mml:mrow>
          </mml:math>
        </disp-formula>
        <disp-formula id="FD2">
          <label>(2)</label>
          <mml:math display="inline">
            <mml:mrow>
              <mml:mi>P</mml:mi>
              <mml:mi>S</mml:mi>
              <mml:mi>R</mml:mi>
              <mml:mo>=</mml:mo>
              <mml:msup>
                <mml:mrow>
                  <mml:mrow>
                    <mml:mo>(</mml:mo>
                    <mml:mrow>
                      <mml:mn>1</mml:mn>
                      <mml:mo>−</mml:mo>
                      <mml:mi>B</mml:mi>
                      <mml:mi>E</mml:mi>
                      <mml:mi>R</mml:mi>
                    </mml:mrow>
                    <mml:mo>)</mml:mo>
                  </mml:mrow>
                </mml:mrow>
                <mml:mi>L</mml:mi>
              </mml:msup>
            </mml:mrow>
          </mml:math>
        </disp-formula>
        <p>Among them, <italic>τ</italic> is the SINR offset, which is measured experimentally; in the experiments of this paper, <italic>τ</italic> is set to 5.5 dB [<xref ref-type="bibr" rid="B8">8</xref>]. <inline-formula><mml:math><mml:mi> L </mml:mi></mml:math></inline-formula> denotes the packet length and is set to 37 in this work. The calculation formula of SINR is shown in Equation (3):</p>
        <disp-formula id="FD3">
          <label>(3)</label>
          <mml:math display="inline">
            <mml:mrow>
              <mml:mi>S</mml:mi>
              <mml:mi>I</mml:mi>
              <mml:mi>N</mml:mi>
              <mml:mi>R</mml:mi>
              <mml:mrow>
                <mml:mo>(</mml:mo>
                <mml:mrow>
                  <mml:mi>d</mml:mi>
                  <mml:mi>B</mml:mi>
                </mml:mrow>
                <mml:mo>)</mml:mo>
              </mml:mrow>
              <mml:mo>=</mml:mo>
              <mml:mn>10</mml:mn>
              <mml:mi>lg</mml:mi>
              <mml:mrow>
                <mml:mo>(</mml:mo>
                <mml:mrow>
                  <mml:mfrac>
                    <mml:mrow>
                      <mml:msub>
                        <mml:mi>P</mml:mi>
                        <mml:mi>s</mml:mi>
                      </mml:msub>
                    </mml:mrow>
                    <mml:mrow>
                      <mml:msub>
                        <mml:mi>P</mml:mi>
                        <mml:mi>i</mml:mi>
                      </mml:msub>
                      <mml:mo>+</mml:mo>
                      <mml:msub>
                        <mml:mi>P</mml:mi>
                        <mml:mi>n</mml:mi>
                      </mml:msub>
                    </mml:mrow>
                  </mml:mfrac>
                </mml:mrow>
                <mml:mo>)</mml:mo>
              </mml:mrow>
            </mml:mrow>
          </mml:math>
        </disp-formula>
        <p>Among them, <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> P </mml:mi><mml:mi> s </mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> denotes the received useful signal power, while <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> P </mml:mi><mml:mi> i </mml:mi></mml:msub><mml:mtext></mml:mtext></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> P </mml:mi><mml:mi> n </mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> represent the interference signal power and noise power, respectively.</p>
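        <p>As a concrete illustration of Equations (1)-(3), the following minimal Python sketch computes BER and PSR from an SINR value in dB using the paper's settings <italic>τ</italic> = 5.5 dB and <italic>L</italic> = 37; the function names and the power values in the SINR helper are illustrative choices, not from the paper.</p>

```python
import math

def ber(sinr_db, tau=5.5):
    """Bit error rate per Equation (1); math.comb(16, k) is the binomial coefficient."""
    total = sum((-1) ** k * math.comb(16, k)
                * math.exp(20.0 * (sinr_db - tau) * (1.0 / k - 1.0))
                for k in range(2, 17))
    return (8.0 / 15.0) * (1.0 / 16.0) * total

def psr(sinr_db, tau=5.5, L=37):
    """Packet success rate per Equation (2)."""
    return (1.0 - ber(sinr_db, tau)) ** L

def sinr_db(p_s, p_i, p_n):
    """Equation (3): SINR in dB from signal, interference, and noise powers (linear units)."""
    return 10.0 * math.log10(p_s / (p_i + p_n))
```

A useful sanity check: at SINR = <italic>τ</italic> the alternating sum evaluates to 15, so BER = 0.5 and the packet success rate collapses toward zero, while a few dB above <italic>τ</italic> the PSR approaches 1.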
        <p>PSR denotes the probability of successfully receiving a packet under specific SINR conditions. PRR refers to the frequency of successful reception events among <inline-formula><mml:math><mml:mi> N </mml:mi></mml:math></inline-formula> transmissions. The probability that PRR takes a specific value <italic>R</italic> is determined by the PSR sequence within the statistical window, as shown in Equation (4):</p>
        <disp-formula id="FD4">
          <label>(4)</label>
          <mml:math display="inline">
            <mml:mrow>
              <mml:mi>γ</mml:mi>
              <mml:mo>=</mml:mo>
              <mml:mi>P</mml:mi>
              <mml:mrow>
                <mml:mo>{</mml:mo>
                <mml:mrow>
                  <mml:mi>P</mml:mi>
                  <mml:mi>R</mml:mi>
                  <mml:mi>R</mml:mi>
                  <mml:mo>=</mml:mo>
                  <mml:mi>R</mml:mi>
                  <mml:mrow>
                    <mml:mo>|</mml:mo>
                    <mml:mrow>
                      <mml:mrow>
                        <mml:mo>(</mml:mo>
                        <mml:mrow>
                          <mml:mi>p</mml:mi>
                          <mml:mn>1</mml:mn>
                          <mml:mo>,</mml:mo>
                          <mml:mi>p</mml:mi>
                          <mml:mn>2</mml:mn>
                          <mml:mo>,</mml:mo>
                          <mml:mo>⋯</mml:mo>
                          <mml:mo>,</mml:mo>
                          <mml:mi>p</mml:mi>
                          <mml:mi>n</mml:mi>
                        </mml:mrow>
                        <mml:mo>)</mml:mo>
                      </mml:mrow>
                    </mml:mrow>
                  </mml:mrow>
                </mml:mrow>
                <mml:mo>}</mml:mo>
              </mml:mrow>
            </mml:mrow>
          </mml:math>
        </disp-formula>
        <p>Among them, <italic>γ</italic> can be regarded as the prior probability of a successful reception event under specific SINR conditions, <italic>i.e.</italic>, under the condition of a specific PSR sequence. The expected value of PRR is shown in Equation (5):</p>
        <disp-formula id="FD5">
          <label>(5)</label>
          <mml:math display="inline">
            <mml:mrow>
              <mml:mi>P</mml:mi>
              <mml:mi>R</mml:mi>
              <mml:mi>R</mml:mi>
              <mml:mo>=</mml:mo>
              <mml:mstyle displaystyle="true">
                <mml:msubsup>
                  <mml:mo>∑</mml:mo>
                  <mml:mrow>
                    <mml:mi>i</mml:mi>
                    <mml:mo>=</mml:mo>
                    <mml:mn>0</mml:mn>
                  </mml:mrow>
                  <mml:mi>N</mml:mi>
                </mml:msubsup>
                <mml:mrow>
                  <mml:msub>
                    <mml:mi>γ</mml:mi>
                    <mml:mi>i</mml:mi>
                  </mml:msub>
                  <mml:msub>
                    <mml:mi>R</mml:mi>
                    <mml:mi>i</mml:mi>
                  </mml:msub>
                </mml:mrow>
              </mml:mstyle>
            </mml:mrow>
          </mml:math>
        </disp-formula>
        <p>Among them, <italic>N</italic> denotes the window size. Deriving an analytical expression for <italic>γ</italic> through theoretical analysis is quite difficult; machine learning provides an alternative solution by learning the mapping between the PSR sequence and PRR via training.</p>
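        <p>To make the relationship between Equations (4) and (5) concrete, the sketch below assumes independent reception events, under which the success count among <italic>N</italic> packets follows a Poisson-binomial distribution determined by the PSR sequence; this independence assumption and the function names are illustrative, not part of the paper's model.</p>

```python
def prr_distribution(psr_seq):
    """gamma[i] = P{exactly i of N packets are received}, via dynamic programming."""
    dist = [1.0]                      # distribution over success counts, start with 0 packets
    for p in psr_seq:
        new = [0.0] * (len(dist) + 1)
        for i, g in enumerate(dist):
            new[i] += g * (1.0 - p)   # this packet lost
            new[i + 1] += g * p       # this packet received
        dist = new
    return dist

def expected_prr(psr_seq):
    """Equation (5): E[PRR] = sum_i gamma_i * R_i, with R_i = i / N."""
    n = len(psr_seq)
    gamma = prr_distribution(psr_seq)
    return sum(g * (i / n) for i, g in enumerate(gamma))
```

Under the independence assumption, the expectation in Equation (5) reduces to the arithmetic mean of the PSR sequence, which is one way to see why learning the PSR-to-PRR mapping is a natural formulation.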
        <p>When calculating PRR using a sliding window, LEAPP introduces the EWMA algorithm to optimize PRR estimation, where PRR labels are computed per packet. First, the local PRR mean is calculated based on a small time window, and then EWMA is applied to this sequence. This strategy retains the high sensitivity of small windows to sudden link changes while effectively alleviating the lag problem of the arithmetic mean of large windows. By virtue of exponentially decaying weights, EWMA can dynamically weight and smooth historical observations, thereby achieving a good balance between real-time performance and stability. The calculation methods of EWMA are shown in Equations (6) and (7).</p>
        <disp-formula id="FD6">
          <label>(6)</label>
          <mml:math display="inline">
            <mml:mrow>
              <mml:msub>
                <mml:mi>y</mml:mi>
                <mml:mn>0</mml:mn>
              </mml:msub>
              <mml:mo>=</mml:mo>
              <mml:msub>
                <mml:mi>x</mml:mi>
                <mml:mn>0</mml:mn>
              </mml:msub>
            </mml:mrow>
          </mml:math>
        </disp-formula>
        <disp-formula id="FD7">
          <label>(7)</label>
          <mml:math display="inline">
            <mml:mrow>
              <mml:msub>
                <mml:mi>y</mml:mi>
                <mml:mi>k</mml:mi>
              </mml:msub>
              <mml:mo>=</mml:mo>
              <mml:mrow>
                <mml:mo>(</mml:mo>
                <mml:mrow>
                  <mml:mn>1</mml:mn>
                  <mml:mo>−</mml:mo>
                  <mml:mi>α</mml:mi>
                </mml:mrow>
                <mml:mo>)</mml:mo>
              </mml:mrow>
              <mml:msub>
                <mml:mi>y</mml:mi>
                <mml:mrow>
                  <mml:mi>k</mml:mi>
                  <mml:mo>−</mml:mo>
                  <mml:mn>1</mml:mn>
                </mml:mrow>
              </mml:msub>
              <mml:mo>+</mml:mo>
              <mml:mi>α</mml:mi>
              <mml:mo>⋅</mml:mo>
              <mml:msub>
                <mml:mi>x</mml:mi>
                <mml:mi>k</mml:mi>
              </mml:msub>
              <mml:mo>,</mml:mo>
              <mml:mrow>
                <mml:mo>(</mml:mo>
                <mml:mrow>
                  <mml:mi>k</mml:mi>
                  <mml:mo>&gt;</mml:mo>
                  <mml:mn>1</mml:mn>
                </mml:mrow>
                <mml:mo>)</mml:mo>
              </mml:mrow>
            </mml:mrow>
          </mml:math>
        </disp-formula>
        <p>Among them, <italic>α</italic> is the smoothing coefficient. In the experiments of this paper, the time window is set to 5 and <italic>α</italic> = 0.075. The resulting EWMA of the window means is approximately equal to the arithmetic mean computed over a statistical window of size <italic>N</italic> = 100.</p>
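        <p>Equations (6) and (7) can be sketched directly; the function below applies EWMA with the paper's <italic>α</italic> = 0.075 to a series of window-mean PRR values (the input series itself is illustrative).</p>

```python
def ewma(xs, alpha=0.075):
    """Exponentially weighted moving average per Equations (6) and (7)."""
    ys = [xs[0]]                                       # y_0 = x_0 (Equation (6))
    for x in xs[1:]:
        ys.append((1.0 - alpha) * ys[-1] + alpha * x)  # Equation (7)
    return ys
```

With a small <italic>α</italic>, each new observation moves the estimate only slightly, smoothing noise while still reacting to sustained link changes.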
        <p>For data slicing, a sliding window mechanism slices the data in chronological order to construct fixed-length sequence samples. The SINR feature is mapped to the [0, 1] interval with Min-Max normalization before being fed into the model, improving training stability and convergence speed. To avoid inconsistent distributions between the training and test sets, the constructed samples are shuffled and then divided into training, validation, and test sets at a ratio of 7:1:2.</p>
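        <p>The slicing and splitting procedure described above can be sketched as follows; the window length, toy series, and random seed are illustrative choices, not values from the paper.</p>

```python
import random

def make_samples(series, win):
    """Min-max normalize to [0, 1], then slice with a sliding window."""
    lo, hi = min(series), max(series)
    norm = [(v - lo) / (hi - lo) for v in series]
    return [norm[i:i + win] for i in range(len(norm) - win + 1)]

def split(samples, seed=0):
    """Shuffle, then divide into train/validation/test at a 7:1:2 ratio."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)                   # shuffle before splitting
    n_tr = int(0.7 * len(idx))
    n_va = int(0.1 * len(idx))
    train = [samples[i] for i in idx[:n_tr]]
    val = [samples[i] for i in idx[n_tr:n_tr + n_va]]
    test = [samples[i] for i in idx[n_tr + n_va:]]
    return train, val, test
```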
      </sec>
      <sec id="sec2dot3">
        <title>2.3. External Attention Mechanism</title>
        <p>To reduce model complexity, LEAPP introduces the external attention mechanism. External attention constructs attention weights from two lightweight, learnable parameter matrices and can be implemented using only two cascaded linear layers and two normalization layers (as illustrated in <xref ref-type="fig" rid="fig2">Figure 2(a)</xref>). Compared with the inherent <inline-formula><mml:math><mml:mrow><mml:mi> O </mml:mi><mml:mrow><mml:mo> ( </mml:mo><mml:mrow><mml:msup><mml:mi> n </mml:mi><mml:mn> 2 </mml:mn></mml:msup></mml:mrow><mml:mo> ) </mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> computational complexity of traditional attention mechanisms (where <italic>n</italic> denotes the sequence length), external attention not only achieves linear computational complexity but also efficiently captures global feature correlations in the input data. The calculation method of external attention is shown in Equation (8):</p>
        <disp-formula id="FD8">
          <label>(8)</label>
          <mml:math display="inline">
            <mml:mrow>
              <mml:mi>E</mml:mi>
              <mml:mi>x</mml:mi>
              <mml:mi>t</mml:mi>
              <mml:mi>e</mml:mi>
              <mml:mi>r</mml:mi>
              <mml:mi>n</mml:mi>
              <mml:mi>a</mml:mi>
              <mml:mi>l</mml:mi>
              <mml:mtext>
                 
              </mml:mtext>
              <mml:mi>A</mml:mi>
              <mml:mi>t</mml:mi>
              <mml:mi>t</mml:mi>
              <mml:mi>e</mml:mi>
              <mml:mi>n</mml:mi>
              <mml:mi>t</mml:mi>
              <mml:mi>i</mml:mi>
              <mml:mi>o</mml:mi>
              <mml:mi>n</mml:mi>
              <mml:mrow>
                <mml:mo>(</mml:mo>
                <mml:mrow>
                  <mml:msub>
                    <mml:mi>D</mml:mi>
                    <mml:mi>q</mml:mi>
                  </mml:msub>
                  <mml:mo>,</mml:mo>
                  <mml:msub>
                    <mml:mi>D</mml:mi>
                    <mml:mi>k</mml:mi>
                  </mml:msub>
                  <mml:mo>,</mml:mo>
                  <mml:msub>
                    <mml:mi>D</mml:mi>
                    <mml:mi>v</mml:mi>
                  </mml:msub>
                </mml:mrow>
                <mml:mo>)</mml:mo>
              </mml:mrow>
              <mml:mo>=</mml:mo>
              <mml:mi>N</mml:mi>
              <mml:mi>o</mml:mi>
              <mml:mi>r</mml:mi>
              <mml:mi>m</mml:mi>
              <mml:mrow>
                <mml:mo>(</mml:mo>
                <mml:mrow>
                  <mml:msub>
                    <mml:mi>D</mml:mi>
                    <mml:mi>q</mml:mi>
                  </mml:msub>
                  <mml:msubsup>
                    <mml:mi>D</mml:mi>
                    <mml:mi>k</mml:mi>
                    <mml:mi>T</mml:mi>
                  </mml:msubsup>
                </mml:mrow>
                <mml:mo>)</mml:mo>
              </mml:mrow>
              <mml:msub>
                <mml:mi>D</mml:mi>
                <mml:mi>v</mml:mi>
              </mml:msub>
            </mml:mrow>
          </mml:math>
        </disp-formula>
        <p>Among them, <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> D </mml:mi><mml:mi> q </mml:mi></mml:msub><mml:mo> ∈ </mml:mo><mml:msup><mml:mi> R </mml:mi><mml:mrow><mml:mi> n </mml:mi><mml:mo> × </mml:mo><mml:mi> d </mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> is obtained by mapping the input features to a matrix of fixed dimension, <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> D </mml:mi><mml:mi> k </mml:mi></mml:msub><mml:mo> , </mml:mo><mml:msub><mml:mi> D </mml:mi><mml:mi> v </mml:mi></mml:msub><mml:mo> ∈ </mml:mo><mml:msup><mml:mi> R </mml:mi><mml:mrow><mml:mi> S </mml:mi><mml:mo> × </mml:mo><mml:mi> d </mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> are externally learnable parameter matrices, and the computational complexity of external attention is <inline-formula><mml:math><mml:mrow><mml:mi> O </mml:mi><mml:mrow><mml:mo> ( </mml:mo><mml:mrow><mml:mi> d </mml:mi><mml:mi> S </mml:mi><mml:mi> n </mml:mi></mml:mrow><mml:mo> ) </mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula>.</p>
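        <p>A minimal NumPy sketch of Equation (8) follows. The input features play the role of <italic>D<sub>q</sub></italic>, while the memories <monospace>M_k</monospace> and <monospace>M_v</monospace> stand in for the externally learnable <italic>D<sub>k</sub></italic> and <italic>D<sub>v</sub></italic>; Norm is realized here as the double normalization (softmax over the token axis, then l1 normalization over the memory axis) commonly used in the external-attention literature, and the matrix sizes in the usage are illustrative.</p>

```python
import numpy as np

def external_attention(F, M_k, M_v):
    """Equation (8): Norm(D_q D_k^T) D_v with small external memories M_k, M_v (S x d)."""
    A = F @ M_k.T                                    # (n, S) raw attention map
    A = np.exp(A - A.max(axis=0, keepdims=True))     # softmax over the token axis
    A = A / A.sum(axis=0, keepdims=True)
    A = A / (A.sum(axis=1, keepdims=True) + 1e-9)    # l1-normalize over the memory axis
    return A @ M_v                                   # (n, d) output
```

Because <monospace>M_k</monospace> and <monospace>M_v</monospace> have a fixed number of rows <italic>S</italic> independent of the sequence, the cost is <italic>O</italic>(<italic>dSn</italic>), linear in the sequence length <italic>n</italic>.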
        <p>The structure of multi-head external attention is illustrated in <xref ref-type="fig" rid="fig2">Figure 2(b)</xref>. It concatenates the outputs of h groups of external attention heads along the feature dimension and restores them to the model dimension through linear mapping to obtain the final output. The calculation method of multi-head external attention is shown in Equation (9):</p>
        <disp-formula id="FD9">
          <label>(9)</label>
          <mml:math display="inline">
            <mml:mrow>
              <mml:mtable columnalign="left">
                <mml:mtr columnalign="left">
                  <mml:mtd columnalign="left">
                    <mml:mrow>
                      <mml:mi>M</mml:mi>
                      <mml:mi>u</mml:mi>
                      <mml:mi>l</mml:mi>
                      <mml:mi>t</mml:mi>
                      <mml:mi>i</mml:mi>
                      <mml:mi>E</mml:mi>
                      <mml:mi>H</mml:mi>
                      <mml:mi>e</mml:mi>
                      <mml:mi>a</mml:mi>
                      <mml:mi>d</mml:mi>
                      <mml:mrow>
                        <mml:mo>(</mml:mo>
                        <mml:mrow>
                          <mml:msub>
                            <mml:mi>D</mml:mi>
                            <mml:mi>q</mml:mi>
                          </mml:msub>
                          <mml:mo>,</mml:mo>
                          <mml:msub>
                            <mml:mi>D</mml:mi>
                            <mml:mi>k</mml:mi>
                          </mml:msub>
                          <mml:mo>,</mml:mo>
                          <mml:msub>
                            <mml:mi>D</mml:mi>
                            <mml:mi>v</mml:mi>
                          </mml:msub>
                        </mml:mrow>
                        <mml:mo>)</mml:mo>
                      </mml:mrow>
                      <mml:mo>=</mml:mo>
                      <mml:mi>C</mml:mi>
                      <mml:mi>o</mml:mi>
                      <mml:mi>n</mml:mi>
                      <mml:mi>c</mml:mi>
                      <mml:mi>a</mml:mi>
                      <mml:mi>t</mml:mi>
                      <mml:mrow>
                        <mml:mo>(</mml:mo>
                        <mml:mrow>
                          <mml:mi>h</mml:mi>
                          <mml:mi>e</mml:mi>
                          <mml:mi>a</mml:mi>
                          <mml:msub>
                            <mml:mi>d</mml:mi>
                            <mml:mn>1</mml:mn>
                          </mml:msub>
                          <mml:mo>,</mml:mo>
                          <mml:mi>h</mml:mi>
                          <mml:mi>e</mml:mi>
                          <mml:mi>a</mml:mi>
                          <mml:msub>
                            <mml:mi>d</mml:mi>
                            <mml:mn>2</mml:mn>
                          </mml:msub>
                          <mml:mo>,</mml:mo>
                          <mml:mo>⋯</mml:mo>
                          <mml:mo>,</mml:mo>
                          <mml:mi>h</mml:mi>
                          <mml:mi>e</mml:mi>
                          <mml:mi>a</mml:mi>
                          <mml:msub>
                            <mml:mi>d</mml:mi>
                            <mml:mi>h</mml:mi>
                          </mml:msub>
                        </mml:mrow>
                        <mml:mo>)</mml:mo>
                      </mml:mrow>
                      <mml:msub>
                        <mml:mi>W</mml:mi>
                        <mml:mi>O</mml:mi>
                      </mml:msub>
                    </mml:mrow>
                  </mml:mtd>
                </mml:mtr>
                <mml:mtr columnalign="left">
                  <mml:mtd columnalign="left">
                    <mml:mrow>
                      <mml:mtext>
                         
                      </mml:mtext>
                      <mml:mi>w</mml:mi>
                      <mml:mi>h</mml:mi>
                      <mml:mi>e</mml:mi>
                      <mml:mi>r</mml:mi>
                      <mml:mi>e</mml:mi>
                      <mml:mtext>
                         
                      </mml:mtext>
                      <mml:mi>h</mml:mi>
                      <mml:mi>e</mml:mi>
                      <mml:mi>a</mml:mi>
                      <mml:msub>
                        <mml:mi>d</mml:mi>
                        <mml:mi>i</mml:mi>
                      </mml:msub>
                      <mml:mo>=</mml:mo>
                      <mml:mi>E</mml:mi>
                      <mml:mi>x</mml:mi>
                      <mml:mi>t</mml:mi>
                      <mml:mi>e</mml:mi>
                      <mml:mi>r</mml:mi>
                      <mml:mi>n</mml:mi>
                      <mml:mi>a</mml:mi>
                      <mml:mi>l</mml:mi>
                      <mml:mi>A</mml:mi>
                      <mml:mi>t</mml:mi>
                      <mml:mi>t</mml:mi>
                      <mml:mi>e</mml:mi>
                      <mml:mi>n</mml:mi>
                      <mml:mi>t</mml:mi>
                      <mml:mi>i</mml:mi>
                      <mml:mi>o</mml:mi>
                      <mml:mi>n</mml:mi>
                      <mml:mrow>
                        <mml:mo>(</mml:mo>
                        <mml:mrow>
                          <mml:msub>
                            <mml:mi>D</mml:mi>
                            <mml:mi>q</mml:mi>
                          </mml:msub>
                          <mml:mo>,</mml:mo>
                          <mml:msub>
                            <mml:mi>D</mml:mi>
                            <mml:mi>k</mml:mi>
                          </mml:msub>
                          <mml:mo>,</mml:mo>
                          <mml:msub>
                            <mml:mi>D</mml:mi>
                            <mml:mi>v</mml:mi>
                          </mml:msub>
                        </mml:mrow>
                        <mml:mo>)</mml:mo>
                      </mml:mrow>
                    </mml:mrow>
                  </mml:mtd>
                </mml:mtr>
              </mml:mtable>
            </mml:mrow>
          </mml:math>
        </disp-formula>
        <p>By introducing external parameter matrices, the model retains the global parallel modeling capability of multi-head attention while significantly reducing computational complexity and storage overhead, thereby improving efficiency and stability in long-sequence modeling. This mechanism is particularly suitable for temporal tasks such as communication link quality prediction, which not only require the model to capture long-range temporal dependencies but also demand efficient and scalable inference on resource-constrained edge devices.</p>
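        <p>Equation (9) can be sketched in the same spirit. In this illustrative version each head consumes one slice of the feature dimension and owns its own external memories; real implementations typically apply learned per-head projections rather than plain slicing, so this is a simplification.</p>

```python
import numpy as np

def _external_attention(d_q, m_k, m_v):
    # Single head, as in Eq. (8): Norm(D_q D_k^T) D_v with double normalization.
    attn = d_q @ m_k.T
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn = attn / attn.sum(axis=1, keepdims=True)
    attn = attn / (attn.sum(axis=0, keepdims=True) + 1e-9)
    return attn @ m_v

def multi_head_external_attention(x, mem_k, mem_v, w_o, num_heads=8):
    """Sketch of Eq. (9): run one external-attention head per feature slice,
    concatenate along the feature axis, then project back with W_O."""
    n, d_model = x.shape
    d_head = d_model // num_heads
    heads = [_external_attention(x[:, i * d_head:(i + 1) * d_head],
                                 mem_k[i], mem_v[i])
             for i in range(num_heads)]
    return np.concatenate(heads, axis=1) @ w_o               # (n, d_model)

rng = np.random.default_rng(1)
n, d_model, S, h = 16, 128, 32, 8
mem_k = [rng.normal(size=(S, d_model // h)) for _ in range(h)]
mem_v = [rng.normal(size=(S, d_model // h)) for _ in range(h)]
y = multi_head_external_attention(rng.normal(size=(n, d_model)),
                                  mem_k, mem_v,
                                  rng.normal(size=(d_model, d_model)), h)
```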
        <fig id="fig2">
          <label>Figure 2</label>
          <graphic xlink:href="https://html.scirp.org/file/2313736-rId52.jpeg?20260410032755" />
        </fig>
        <p><bold>Figure 2</bold><bold>.</bold> (a) External attention mechanism; (b) Multi-head external attention mechanism.</p>
      </sec>
    </sec>
    <sec id="sec3">
      <title>3. Experiments</title>
      <sec id="sec3dot1">
        <title>3.1. Experimental Platform and Data Collection</title>
        <p>To test the performance of the link quality prediction model, both self-collected datasets and public datasets are used in the experiments. The self-collected dataset is built on a testbed constructed with CC2530 wireless sensor nodes, and link quality-related data are collected in various environments<sup>1</sup>. During data collection, two network topologies, n-to-1 and 1-to-1, are adopted to cover link dynamics under different communication scenarios. The experimental sites include the 9th floor of the Optoelectronics Building (indoor office environment), residential buildings (typical home environment), and parking lots (complex multipath and occlusion environment). The experimental conditions are shown in <bold>Table 1</bold>. In the names of the experimental datasets, suffixes “1” and “n” denote the 1-to-1 and n-to-1 networks, respectively. For the data collected in parking lots, additional suffixes “1” and “2” identify data with transmission powers of −22 dBm and −8 dBm, respectively.</p>
        <p><bold>Table 1</bold><bold>.</bold> Experimental conditions.</p>
        <table-wrap id="tbl1">
          <label>Table 1</label>
          <table>
            <tbody>
              <tr>
                <td>Scenario</td>
                <td>Dataset</td>
                <td>Transmission Power/dBm</td>
                <td>Transmission Period/ms</td>
                <td>Communication Distance/m</td>
              </tr>
              <tr>
                <td>Office Building</td>
                <td>Oecb9-1/n</td>
                <td>−22 to 4.5</td>
                <td>100 - 2000</td>
                <td>5 - 30</td>
              </tr>
              <tr>
                <td>Parking</td>
                <td>Parking-n-1/2</td>
                <td>−22, −8</td>
                <td>500</td>
                <td>5 - 10</td>
              </tr>
              <tr>
                <td>Residential</td>
                <td>Resi-1/n</td>
                <td>−22 to 4.5</td>
                <td>50 - 2000</td>
                <td>6</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The public dataset used in the test experiments is the due dataset [<xref ref-type="bibr" rid="B9">9</xref>]. It covers link quality data collected under various experimental conditions, including different communication distances and transmission powers. Even for a single distance condition, a single experiment yields more than 2.4 million data records, providing sufficient and reliable support for evaluating the model’s performance across diverse scenarios.</p>
        <p>The experiments in this study were conducted on a computing platform equipped with an NVIDIA GeForce RTX 4090 GPU running Ubuntu 20.04 LTS. The TensorFlow deep learning framework was adopted for model training and testing, and all code was run in a Python 3.9 environment.</p>
      </sec>
      <sec id="sec3dot2">
        <title>3.2. Ablation Experiments and Sensitivity Experiments</title>
        <p>To verify the role of the external attention mechanism in the link quality prediction task, ablation experiments targeting the attention modules of the encoder and decoder were conducted on the Oecb9-n dataset. Only the form of attention was replaced; all other hyperparameters, training strategies, and data partitioning were kept identical to ensure comparability. Specifically, four model variants were constructed: both the encoder and decoder adopt Multi-Head Self-Attention (MHSA); the encoder adopts MHSA while the decoder adopts Multi-Head External Attention (MHEA); the encoder adopts MHEA while the decoder adopts MHSA; and both the encoder and decoder adopt MHEA. Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) were used as performance metrics, and the FLOPs of a single forward pass were counted to measure computational complexity.</p>
        <p><bold>Table 2</bold><bold>.</bold> Performance comparison of MHEA and MHSA.</p>
        <table-wrap id="tbl2">
          <label>Table 2</label>
          <table>
            <tbody>
              <tr>
                <td>enc_attention</td>
                <td>dec_attention</td>
                <td>FLOPs</td>
                <td>MAE</td>
                <td>RMSE</td>
              </tr>
              <tr>
                <td>MHSA</td>
                <td>MHSA</td>
                <td>31843617</td>
                <td>0.0315</td>
                <td>0.0477</td>
              </tr>
              <tr>
                <td>MHSA</td>
                <td>MHEA</td>
                <td>26605863</td>
                <td>0.0299</td>
                <td>0.0453</td>
              </tr>
              <tr>
                <td>MHEA</td>
                <td>MHSA</td>
                <td>26625319</td>
                <td>0.0278</td>
                <td>0.0413</td>
              </tr>
              <tr>
                <td>
                  <bold>MHEA</bold>
                </td>
                <td>
                  <bold>MHEA</bold>
                </td>
                <td>
                  <bold>21387565</bold>
                </td>
                <td>
                  <bold>0.0251</bold>
                </td>
                <td>
                  <bold>0.0408</bold>
                </td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>As can be seen from <bold>Table 2</bold>, the MAE and RMSE of the baseline model (MHSA-MHSA) are 0.0315 and 0.0477, respectively, with FLOPs of 31,843,617. When MHEA is introduced only in the decoder, the error decreases slightly, while replacing MHSA with MHEA in the encoder leads to a more pronounced improvement, indicating that external attention plays a greater role in capturing the global dependencies of the input link features. When both the encoder and decoder adopt MHEA, the model achieves the best results, with reductions of approximately 20.3% in MAE and 14.5% in RMSE relative to the baseline, and a computational saving of 10,456,052 FLOPs. These results demonstrate that multi-head external attention improves prediction accuracy while reducing computational overhead, with a particularly prominent effect at the encoder side, confirming its superior global modeling capability in the link quality prediction task.</p>
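        <p>The relative improvements of the MHEA-MHEA model over the MHSA-MHSA baseline can be recomputed directly from the entries of <bold>Table 2</bold> as a quick sanity check:</p>

```python
# Values copied from Table 2: MHSA-MHSA baseline vs. MHEA-MHEA model.
mae_base, mae_best = 0.0315, 0.0251
rmse_base, rmse_best = 0.0477, 0.0408
flops_base, flops_best = 31_843_617, 21_387_565

mae_drop = 100 * (mae_base - mae_best) / mae_base       # relative MAE reduction, %
rmse_drop = 100 * (rmse_base - rmse_best) / rmse_base   # relative RMSE reduction, %
flops_saved = flops_base - flops_best                   # absolute FLOPs saved
print(f"MAE -{mae_drop:.1f}%  RMSE -{rmse_drop:.1f}%  FLOPs -{flops_saved}")
```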
        <p>To investigate the prediction accuracy of the proposed model under different hyperparameter settings, controlled-variable experiments were conducted on four aspects: input sequence length (len_seq), number of rows of the external parameter matrix (S), dimension of the mapping layer (d_model), and number of attention heads (num_heads). All other hyperparameters remain unchanged: the loss function is MAE, the optimizer is Adam, the learning rate is set to 1e-3, the batch size is fixed, and the number of training epochs is 300.</p>
        <p>During the experiments, only one parameter was adjusted at a time, while the remaining parameters were kept at their default configurations. All experiments were performed under the same training and inference strategies, with MAE and RMSE as evaluation metrics to measure prediction accuracy and overall stability. As shown in <bold>Table 3</bold>, the model achieves good performance when S = 32, while performance degrades when S increases to 64, indicating that an excessively large parameter matrix may introduce redundant information. The experiments show that d_model = 128 is the optimal configuration: an overly low dimension limits the model’s expressive capacity, while an overly high dimension causes the error to rebound, indicating a tendency toward overfitting. The model performs best with 8 attention heads, which balances global dependency modeling and computational overhead. In summary, the optimal configuration of the proposed model is S = 32, d_model = 128, and num_heads = 8, demonstrating that the structure designed in this paper can stably achieve high-precision prediction under reasonable parameter settings.</p>
        <p><bold>Table 3</bold><bold>.</bold> Model sensitivity experiments under different hyperparameters.</p>
        <table-wrap id="tbl3">
          <label>Table 3</label>
          <table>
            <tbody>
              <tr>
                <td>S</td>
                <td>d_model</td>
                <td>num_heads</td>
                <td>MAE</td>
                <td>RMSE</td>
              </tr>
              <tr>
                <td>10</td>
                <td>128</td>
                <td>8</td>
                <td>0.0267</td>
                <td>0.0436</td>
              </tr>
              <tr>
                <td>20</td>
                <td>128</td>
                <td>8</td>
                <td>0.0288</td>
                <td>0.0464</td>
              </tr>
              <tr>
                <td>
                  <bold>32</bold>
                </td>
                <td>
                  <bold>128</bold>
                </td>
                <td>
                  <bold>8</bold>
                </td>
                <td>
                  <bold>0.0251</bold>
                </td>
                <td>
                  <bold>0.0408</bold>
                </td>
              </tr>
              <tr>
                <td>64</td>
                <td>128</td>
                <td>8</td>
                <td>0.0281</td>
                <td>0.0440</td>
              </tr>
              <tr>
                <td>32</td>
                <td>32</td>
                <td>8</td>
                <td>0.0330</td>
                <td>0.0489</td>
              </tr>
              <tr>
                <td>32</td>
                <td>64</td>
                <td>8</td>
                <td>0.0299</td>
                <td>0.0451</td>
              </tr>
              <tr>
                <td>
                  <bold>32</bold>
                </td>
                <td>
                  <bold>128</bold>
                </td>
                <td>
                  <bold>8</bold>
                </td>
                <td>
                  <bold>0.0251</bold>
                </td>
                <td>
                  <bold>0.0408</bold>
                </td>
              </tr>
              <tr>
                <td>32</td>
                <td>256</td>
                <td>8</td>
                <td>0.0509</td>
                <td>0.0724</td>
              </tr>
              <tr>
                <td>32</td>
                <td>128</td>
                <td>2</td>
                <td>0.0489</td>
                <td>0.0691</td>
              </tr>
              <tr>
                <td>32</td>
                <td>128</td>
                <td>4</td>
                <td>0.0321</td>
                <td>0.0490</td>
              </tr>
              <tr>
                <td>
                  <bold>32</bold>
                </td>
                <td>
                  <bold>128</bold>
                </td>
                <td>
                  <bold>8</bold>
                </td>
                <td>
                  <bold>0.0251</bold>
                </td>
                <td>
                  <bold>0.0408</bold>
                </td>
              </tr>
              <tr>
                <td>32</td>
                <td>128</td>
                <td>16</td>
                <td>0.0272</td>
                <td>0.0440</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>As can be seen from the relationship between MAE and input sequence length shown in <xref ref-type="fig" rid="fig3">Figure 3</xref>, the MAE decreases significantly when the sequence length increases from 10 to 32; as the sequence length grows further, the magnitude of the MAE reduction gradually diminishes. This indicates that longer sequences help the model capture richer temporal characteristics of link quality, thereby improving prediction accuracy. However, excessively long sequences not only significantly increase computational overhead but also prolong the waiting time for data collection and inference, impairing the real-time performance of the system. Considering prediction performance, computational efficiency, and response speed, this paper selects len_seq = 32 as the default input sequence length in subsequent experiments.</p>
        <fig id="fig3">
          <label>Figure 3</label>
          <graphic xlink:href="https://html.scirp.org/file/2313736-rId53.jpeg?20260410032756" />
        </fig>
        <p><bold>Figure 3</bold><bold>.</bold> MAE for different input sequence lengths.</p>
      </sec>
      <sec id="sec3dot3">
        <title>3.3. Evaluation of the Improved Decoder Input Sub-Layer</title>
        <p>Traditional autoregressive decoders suffer from a distribution mismatch between training and inference, which easily causes error accumulation and low inference efficiency. Scheduled sampling [<xref ref-type="bibr" rid="B10">10</xref>] dynamically adjusts the teacher-forcing probability to gradually align the training distribution with the inference distribution. Two-stage scheduled sampling further adopts dual-channel training to achieve a smoother transition between training and inference. To verify the effectiveness of the improved decoder input sub-layer proposed in this paper, comparative tests are conducted among LEAPP, the model with the original decoder input sub-layer (teacher forcing in training and autoregressive mode in inference), and two models improved with scheduled sampling. Inference time is also evaluated, with the timing including data preprocessing.</p>
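        <p>As a reference point for the scheduled-sampling baselines, the sampling step can be sketched as follows. The inverse-sigmoid decay and the constant k are illustrative choices, one common schedule in the literature, and are not necessarily those used in [<xref ref-type="bibr" rid="B10">10</xref>].</p>

```python
import math
import random

def teacher_forcing_prob(epoch, k=10.0):
    """Inverse-sigmoid decay: close to 1 early in training, approaching 0
    later, so the training inputs gradually match the inference inputs.
    The schedule shape and the constant k are illustrative assumptions."""
    return k / (k + math.exp(epoch / k))

def next_decoder_input(prev_truth, prev_pred, epoch, rng=random):
    """Scheduled sampling: with probability p feed the ground truth (teacher
    forcing), otherwise feed the model's own previous prediction."""
    return prev_truth if rng.random() < teacher_forcing_prob(epoch) else prev_pred
```

        <p>Early in training the decoder almost always sees ground truth; as epochs advance it increasingly consumes its own predictions, which mitigates, but does not eliminate, the train/inference mismatch that LEAPP avoids entirely by using PSR as the decoder input.</p>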
        <p>The experimental results are shown in <bold>Table 4</bold>. The dataset used is Oecb9-n, which contains more than 10,000 data samples. Due to error accumulation in autoregressive inference, the model with the original decoder input sub-layer has a large prediction error. Models improved with scheduled sampling and two-stage scheduled sampling reduce the error by approximately 50% compared with the full teacher-forcing model. By using PSR as the decoder input, LEAPP eliminates the input discrepancy between training and inference and the resulting error accumulation. The prediction error is reduced by 82% compared with the teacher-forcing model, and by 62% and 61% compared with the scheduled sampling and two-stage scheduled sampling models, respectively. Since LEAPP supports parallel computation during inference, the inference time is reduced by 89% compared with the teacher-forcing model, and by 83% and 84% compared with the scheduled sampling and two-stage scheduled sampling models, respectively. This enables LEAPP to better meet the real-time requirements of application systems for link estimation tasks.</p>
        <p><bold>Table 4</bold><bold>.</bold> Performance comparison of different input sub-layers for the decoder.</p>
        <table-wrap id="tbl4">
          <label>Table 4</label>
          <table>
            <tbody>
              <tr>
                <td>Decoder Input</td>
                <td>Train method</td>
                <td>Inference Time</td>
                <td>MAE</td>
              </tr>
              <tr>
                <td rowspan="3">PRR</td>
                <td>teacher forcing</td>
                <td>1.75 s</td>
                <td>0.1370</td>
              </tr>
              <tr>
                <td>Scheduled Sampling</td>
                <td>1.17 s</td>
                <td>0.0641</td>
              </tr>
              <tr>
                <td>Two-Stage Scheduled Sampling</td>
                <td>1.22 s</td>
                <td>0.0625</td>
              </tr>
              <tr>
                <td>PSR</td>
                <td>non-autoregressive training</td>
                <td>
                  <bold>0.19</bold>
                  <bold>s</bold>
                </td>
                <td>
                  <bold>0.0245</bold>
                </td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec3dot4">
        <title>3.4. Comparative Experiments</title>
        <p>To comprehensively evaluate the performance of LEAPP, comparative experiments were conducted on multiple datasets between LEAPP and existing mainstream link quality prediction models. The compared models include the traditional Recurrent Neural Network (RNN) [<xref ref-type="bibr" rid="B11">11</xref>], Long Short-Term Memory (LSTM) [<xref ref-type="bibr" rid="B12">12</xref>], Gated Recurrent Unit (GRU) [<xref ref-type="bibr" rid="B13">13</xref>], Transformer [<xref ref-type="bibr" rid="B5">5</xref>], and the hybrid convolutional-recurrent model CNN + LSTM (CL) [<xref ref-type="bibr" rid="B14">14</xref>]. To verify the effectiveness of different attention mechanisms in link quality prediction, two extended models are added for comparison: CNN + LSTM + SA (CLSA) with self-attention and CNN + LSTM + EA (CLEA) with external attention. All experiments are carried out under unified hardware and training configurations, <italic>i.e.</italic>, len_seq = 32, S = 32, d_model = 128, num_heads = 8, to ensure the comparability of results.</p>
        <p>Experimental results (see <xref ref-type="fig" rid="fig4">Figure 4</xref>) show that traditional recurrent neural network models achieve relatively low accuracy, indicating the limitations of using recurrent structures alone to capture complex link quality characteristics. The CL model outperforms single recurrent neural network models by combining convolutional and recurrent structures, demonstrating that joint modeling of local features and temporal dependencies can improve prediction accuracy. The two attention-based extended models, CLSA and CLEA, do not bring significant performance gains, suggesting that simply stacking model components cannot effectively enhance the ability to model complex dependencies. As a typical model built on the self-attention mechanism, the Transformer achieves significantly better prediction performance than traditional recurrent neural networks and CL-based models, reducing the average MAE by approximately 40% relative to RNN; this validates the effectiveness of self-attention in capturing long-range temporal dependencies. Further comparison reveals that models using the external attention mechanism achieve significantly lower average error than those using self-attention. LEAPP achieves the best performance on all datasets. On the public due dataset, the MAE of LEAPP is reduced to 0.0092, indicating that improving the decoder input sub-layer and adopting external attention instead of self-attention can significantly boost prediction accuracy and endow LEAPP with strong generalization ability.</p>
        <fig id="fig4">
          <label>Figure 4</label>
          <graphic xlink:href="https://html.scirp.org/file/2313736-rId54.jpeg?20260410032756" />
        </fig>
        <p><bold>Figure 4</bold><bold>.</bold> MAE of different models on different datasets.</p>
        <p><xref ref-type="fig" rid="fig5">Figure 5</xref> depicts the temporal fluctuations of the predicted and true PRR values for all models. The experiment is conducted on the Oecb9-n dataset, and consistent patterns are observed across the other datasets as well. It can be seen from <xref ref-type="fig" rid="fig5">Figure 5</xref> that the predicted values of all models generally follow the changing trend of the true values but differ in fitting accuracy. The Transformer achieves better fitting performance than traditional recurrent networks and convolutional combination models by virtue of the self-attention mechanism: its prediction curve is closer to the true values, and its error is significantly lower than that of the RNN, GRU, LSTM, and CL-based models, demonstrating the superiority of self-attention in modeling temporal dependencies. Compared with the Transformer, LEAPP achieves a notably smaller prediction error, indicating the effectiveness of the external attention mechanism and the improved decoder sub-layer in boosting model performance. Due to insufficient capability in modeling local dependencies, the individual RNN, LSTM, and GRU models suffer from large prediction errors. Combining LSTM with CNN reduces the prediction error to some extent, yet the error remains considerably larger than that of LEAPP.</p>
        <fig id="fig5">
          <label>Figure 5</label>
          <graphic xlink:href="https://html.scirp.org/file/2313736-rId55.jpeg?20260410032756" />
        </fig>
        <p><bold>Figure 5</bold><bold>.</bold> Comparison of errors between predicted results and actual values using different methods.</p>
      </sec>
    </sec>
    <sec id="sec4">
      <title>4. Conclusion</title>
      <p>To address the shortcomings of existing methods in prediction accuracy and computational efficiency, a Transformer-based link quality estimation method named LEAPP is proposed. In terms of model design, a multi-head external attention mechanism is introduced to strengthen the global feature modeling capability and reduce model complexity. PSR is adopted as the decoder input to balance prediction accuracy and inference efficiency. Experimental results on multiple datasets demonstrate that compared with existing methods, LEAPP exhibits superior performance in terms of accuracy, stability, and generalization ability in complex environments.</p>
    </sec>
    <sec id="sec5">
      <title>Funding</title>
      <p>Supported by the National Natural Science Foundation of China (Grant No. 61374040), and the Science and Technology Development Project of University of Shanghai for Science and Technology (Grant No. 2020KJFZ082).</p>
    </sec>
    <sec id="sec6">
      <title>NOTES</title>
      <p><sup>1</sup>http://gitee.com/WING_USST/wndataset/tree/master. </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="B1">
        <label>1.</label>
        <citation-alternatives>
          <mixed-citation publication-type="confproc">Ali, A.A., Gharghan, S.K. and Ali, A.H. (2024) A Survey on the Integration of Machine Learning Algorithms with Wireless Sensor Networks for Predicting Diabetic Foot Complications. <italic>AIP</italic><italic>Conference</italic><italic>Proceedings</italic>, 3232, Article ID: 040022. https://doi.org/10.1063/5.0236289 <pub-id pub-id-type="doi">10.1063/5.0236289</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1063/5.0236289">https://doi.org/10.1063/5.0236289</ext-link></mixed-citation>
          <element-citation publication-type="confproc">
            <person-group person-group-type="author">
              <string-name>Ali, A.A.</string-name>
              <string-name>Gharghan, S.K.</string-name>
              <string-name>Ali, A.H.</string-name>
            </person-group>
            <year>2024</year>
            <article-title>A Survey on the Integration of Machine Learning Algorithms with Wireless Sensor Networks for Predicting Diabetic Foot Complications</article-title>
            <source>AIP Conference Proceedings</source>
            <volume>3232</volume>
            <fpage>040022</fpage>
            <elocation-id>040022</elocation-id>
            <pub-id pub-id-type="doi">10.1063/5.0236289</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B2">
        <label>2.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Yick, J., Mukherjee, B. and Ghosal, D. (2008) Wireless Sensor Network Survey. <italic>Computer</italic><italic>Networks</italic>, 52, 2292-2330. https://doi.org/10.1016/j.comnet.2008.04.002 <pub-id pub-id-type="doi">10.1016/j.comnet.2008.04.002</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.comnet.2008.04.002">https://doi.org/10.1016/j.comnet.2008.04.002</ext-link></mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Yick, J.</string-name>
              <string-name>Mukherjee, B.</string-name>
              <string-name>Ghosal, D.</string-name>
            </person-group>
            <year>2008</year>
            <article-title>Wireless Sensor Network Survey</article-title>
            <source>Computer Networks</source>
            <volume>52</volume>
            <fpage>2292</fpage>
            <lpage>2330</lpage>
            <pub-id pub-id-type="doi">10.1016/j.comnet.2008.04.002</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B3">
        <label>3.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Baccour, N., Koubâa, A., Mottola, L., Zúñiga, M.A., Youssef, H., Boano, C.A., <italic>et al</italic>. (2012) Radio Link Quality Estimation in Wireless Sensor Networks. <italic>ACM Transactions on Sensor Networks</italic>, 8, 1-33. https://doi.org/10.1145/2240116.2240123 <pub-id pub-id-type="doi">10.1145/2240116.2240123</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1145/2240116.2240123">https://doi.org/10.1145/2240116.2240123</ext-link></mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Baccour, N.</string-name>
              <string-name>Koubâa, A.</string-name>
              <string-name>Mottola, L.</string-name>
              <string-name>Zúñiga, M.A.</string-name>
              <string-name>Youssef, H.</string-name>
              <string-name>Boano, C.A.</string-name>
            </person-group>
            <year>2012</year>
            <article-title>Radio Link Quality Estimation in Wireless Sensor Networks</article-title>
            <source>ACM Transactions on Sensor Networks</source>
            <volume>8</volume>
            <fpage>1</fpage>
            <lpage>33</lpage>
            <pub-id pub-id-type="doi">10.1145/2240116.2240123</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B4">
        <label>4.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Cerar, G., Yetgin, H., Mohorcic, M. and Fortuna, C. (2021) Machine Learning for Wireless Link Quality Estimation: A Survey. <italic>IEEE Communications Surveys &amp; Tutorials</italic>, 23, 696-728. https://doi.org/10.1109/comst.2021.3053615 <pub-id pub-id-type="doi">10.1109/comst.2021.3053615</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1109/comst.2021.3053615">https://doi.org/10.1109/comst.2021.3053615</ext-link></mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Cerar, G.</string-name>
              <string-name>Yetgin, H.</string-name>
              <string-name>Mohorcic, M.</string-name>
              <string-name>Fortuna, C.</string-name>
            </person-group>
            <year>2021</year>
            <article-title>Machine Learning for Wireless Link Quality Estimation: A Survey</article-title>
            <source>IEEE Communications Surveys &amp; Tutorials</source>
            <volume>23</volume>
            <fpage>696</fpage>
            <lpage>728</lpage>
            <pub-id pub-id-type="doi">10.1109/comst.2021.3053615</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B5">
        <label>5.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Vaswani, A., Shazeer, N., Parmar, N., <italic>et al</italic>. (2017) Attention Is All You Need. arXiv: 1706.03762.</mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Vaswani, A.</string-name>
              <string-name>Shazeer, N.</string-name>
              <string-name>Parmar, N.</string-name>
            </person-group>
            <year>2017</year>
            <article-title>Attention Is All You Need</article-title>
            <source>arXiv</source>
            <elocation-id>1706.03762</elocation-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B6">
        <label>6.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Guo, M., Liu, Z., Mu, T. and Hu, S. (2022) Beyond Self-Attention: External Attention Using Two Linear Layers for Visual Tasks. <italic>IEEE Transactions on Pattern Analysis and Machine Intelligence</italic>, 45, 5436-5447. https://doi.org/10.1109/tpami.2022.3211006 <pub-id pub-id-type="doi">10.1109/tpami.2022.3211006</pub-id><pub-id pub-id-type="pmid">36197869</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1109/tpami.2022.3211006">https://doi.org/10.1109/tpami.2022.3211006</ext-link></mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Guo, M.</string-name>
              <string-name>Liu, Z.</string-name>
              <string-name>Mu, T.</string-name>
              <string-name>Hu, S.</string-name>
            </person-group>
            <year>2022</year>
            <article-title>Beyond Self-Attention: External Attention Using Two Linear Layers for Visual Tasks</article-title>
            <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
            <volume>45</volume>
            <fpage>5436</fpage>
            <lpage>5447</lpage>
            <pub-id pub-id-type="doi">10.1109/tpami.2022.3211006</pub-id>
            <pub-id pub-id-type="pmid">36197869</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B7">
        <label>7.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">IEEE (2006) IEEE Standard for Information Technology Part 15.4: Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for Low-Rate Wireless Personal Area Networks (WPANs).</mixed-citation>
          <element-citation publication-type="other">
            <year>2006</year>
            <article-title>IEEE Standard for Information Technology Part 15.4: Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for Low-Rate Wireless Personal Area Networks (WPANs)</article-title>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B8">
        <label>8.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Shi, J.J., Qiu, Y.H., Long, H.B., <italic>et al</italic>. (2024) Research on Wireless Network Link Quality Estimation Method Based on IEEE 802.15.4 Physical Layer. <italic>Modeling and Simulation</italic>, 13, 4019-4034. https://doi.org/10.12677/mos.2024.133365 <pub-id pub-id-type="doi">10.12677/mos.2024.133365</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.12677/mos.2024.133365">https://doi.org/10.12677/mos.2024.133365</ext-link></mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Shi, J.J.</string-name>
              <string-name>Qiu, Y.H.</string-name>
              <string-name>Long, H.B.</string-name>
            </person-group>
            <year>2024</year>
            <article-title>Research on Wireless Network Link Quality Estimation Method Based on IEEE 802.15.4 Physical Layer</article-title>
            <source>Modeling and Simulation</source>
            <volume>13</volume>
            <fpage>4019</fpage>
            <lpage>4034</lpage>
            <pub-id pub-id-type="doi">10.12677/mos.2024.133365</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B9">
        <label>9.</label>
        <citation-alternatives>
          <mixed-citation publication-type="confproc">Fu, S., Zhang, Y., Jiang, Y., Hu, C., Shih, C. and Marron, P.J. (2015) Experimental Study for Multi-Layer Parameter Configuration of WSN Links. <italic>2015 IEEE 35th International Conference on Distributed Computing Systems</italic>, Columbus, 29 June-2 July 2015, 369-378. https://doi.org/10.1109/icdcs.2015.45 <pub-id pub-id-type="doi">10.1109/icdcs.2015.45</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1109/icdcs.2015.45">https://doi.org/10.1109/icdcs.2015.45</ext-link></mixed-citation>
          <element-citation publication-type="confproc">
            <person-group person-group-type="author">
              <string-name>Fu, S.</string-name>
              <string-name>Zhang, Y.</string-name>
              <string-name>Jiang, Y.</string-name>
              <string-name>Hu, C.</string-name>
              <string-name>Shih, C.</string-name>
              <string-name>Marron, P.J.</string-name>
            </person-group>
            <year>2015</year>
            <article-title>Experimental Study for Multi-Layer Parameter Configuration of WSN Links</article-title>
            <source>2015 IEEE 35th International Conference on Distributed Computing Systems</source>
            <fpage>369</fpage>
            <lpage>378</lpage>
            <pub-id pub-id-type="doi">10.1109/icdcs.2015.45</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B10">
        <label>10.</label>
        <citation-alternatives>
          <mixed-citation publication-type="confproc">Mihaylova, T. and Martins, A.F.T. (2019) Scheduled Sampling for Transformers. <italic>Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop</italic>, Florence, 5-10 July 2019, 351-356. https://doi.org/10.18653/v1/p19-2049 <pub-id pub-id-type="doi">10.18653/v1/p19-2049</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.18653/v1/p19-2049">https://doi.org/10.18653/v1/p19-2049</ext-link></mixed-citation>
          <element-citation publication-type="confproc">
            <person-group person-group-type="author">
              <string-name>Mihaylova, T.</string-name>
              <string-name>Martins, A.F.T.</string-name>
            </person-group>
            <year>2019</year>
            <article-title>Scheduled Sampling for Transformers</article-title>
            <source>Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop</source>
            <fpage>351</fpage>
            <lpage>356</lpage>
            <pub-id pub-id-type="doi">10.18653/v1/p19-2049</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B11">
        <label>11.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Xu, M., Liu, W., Xu, J., Xia, Y., Mao, J., Xu, C., <italic>et al</italic>. (2022) Recurrent Neural Network Based Link Quality Prediction for Fluctuating Low Power Wireless Links. <italic>Sensors</italic>, 22, Article 1212. https://doi.org/10.3390/s22031212 <pub-id pub-id-type="doi">10.3390/s22031212</pub-id><pub-id pub-id-type="pmid">35161954</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3390/s22031212">https://doi.org/10.3390/s22031212</ext-link></mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Xu, M.</string-name>
              <string-name>Liu, W.</string-name>
              <string-name>Xu, J.</string-name>
              <string-name>Xia, Y.</string-name>
              <string-name>Mao, J.</string-name>
              <string-name>Xu, C.</string-name>
            </person-group>
            <year>2022</year>
            <article-title>Recurrent Neural Network Based Link Quality Prediction for Fluctuating Low Power Wireless Links</article-title>
            <source>Sensors</source>
            <volume>22</volume>
            <elocation-id>1212</elocation-id>
            <pub-id pub-id-type="doi">10.3390/s22031212</pub-id>
            <pub-id pub-id-type="pmid">35161954</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B12">
        <label>12.</label>
        <citation-alternatives>
          <mixed-citation publication-type="confproc">Kanto, Y. and Watabe, K. (2024) Wireless Link Quality Estimation Using LSTM Model. <italic>NOMS 2024-2024 IEEE Network Operations and Management Symposium</italic>, Seoul, 6-10 May 2024, 1-5. https://doi.org/10.1109/noms59830.2024.10575638 <pub-id pub-id-type="doi">10.1109/noms59830.2024.10575638</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1109/noms59830.2024.10575638">https://doi.org/10.1109/noms59830.2024.10575638</ext-link></mixed-citation>
          <element-citation publication-type="confproc">
            <person-group person-group-type="author">
              <string-name>Kanto, Y.</string-name>
              <string-name>Watabe, K.</string-name>
            </person-group>
            <year>2024</year>
            <article-title>Wireless Link Quality Estimation Using LSTM Model</article-title>
            <source>NOMS 2024-2024 IEEE Network Operations and Management Symposium</source>
            <fpage>1</fpage>
            <lpage>5</lpage>
            <pub-id pub-id-type="doi">10.1109/noms59830.2024.10575638</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B13">
        <label>13.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Liu, L.L., Xiao, T.Z., Shu, J., <italic>et al</italic>. (2022) Link Quality Prediction Based on Gated Recurrent Unit. <italic>Advanced Engineering Sciences</italic>, 54, 51-58.</mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Liu, L.L.</string-name>
              <string-name>Xiao, T.Z.</string-name>
              <string-name>Shu, J.</string-name>
            </person-group>
            <year>2022</year>
            <article-title>Link Quality Prediction Based on Gated Recurrent Unit</article-title>
            <source>Advanced Engineering Sciences</source>
            <volume>54</volume>
            <fpage>51</fpage>
            <lpage>58</lpage>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B14">
        <label>14.</label>
        <citation-alternatives>
          <mixed-citation publication-type="confproc">Fan, J.B. and Liu, L.L. (2023) A Hybrid Model with CNN-LSTM for Link Quality Prediction. <italic>2023 6th International Conference on Electronics Technology (ICET)</italic>, Chengdu, 12-15 May 2023, 603-607. https://doi.org/10.1109/icet58434.2023.10211502 <pub-id pub-id-type="doi">10.1109/icet58434.2023.10211502</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1109/icet58434.2023.10211502">https://doi.org/10.1109/icet58434.2023.10211502</ext-link></mixed-citation>
          <element-citation publication-type="confproc">
            <person-group person-group-type="author">
              <string-name>Fan, J.B.</string-name>
              <string-name>Liu, L.L.</string-name>
            </person-group>
            <year>2023</year>
            <article-title>A Hybrid Model with CNN-LSTM for Link Quality Prediction</article-title>
            <source>2023 6th International Conference on Electronics Technology (ICET)</source>
            <fpage>603</fpage>
            <lpage>607</lpage>
            <pub-id pub-id-type="doi">10.1109/icet58434.2023.10211502</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
    </ref-list>
  </back>
</article>