Optimizing the Matrix Element Method with Deep Neural Networks for High-Energy Physics Analyses
1. Introduction
Developments in experimental and theoretical high-energy physics (HEP) analyses increasingly employ modern machine learning methods, including deep learning, which are rapidly gaining applicability. Machine learning, a widely used automated inference procedure in HEP, is often associated with multivariate techniques such as Boosted Decision Trees (BDTs) [1]. In recent years, the set of methods and tools commonly employed in HEP has expanded significantly due to the deep learning revolution. To fully leverage the information contained in a given event, such as those produced at the Large Hadron Collider (LHC), statistical analysis calls for improved machine learning techniques [2]. Neural networks and BDTs are used in HEP to learn from simulations or from data. Nonetheless, difficulties include acquiring and generalizing from physical data, the need for control samples or regularization strategies, and the limited possibility of reverse-engineering what the model has learned because the training data are only indirectly related to the quantity of interest [3] [4].
In contrast, the Matrix Element Method (MEM) leverages our understanding of the Standard Model (SM), through its Lagrangian, to assess the compatibility of experimental events with a hypothesized process [2]. The built-in knowledge of the physics process and the detector response makes the internal dynamics of the MEM easier to interpret than those of machine learning techniques. Moreover, because no training step is involved, the MEM can be used effectively even when the dataset contains very few events, a scenario in which other methods struggle. The MEM, which originated at the Tevatron experiments, is widely used in particle physics analyses. At the LHC it is used for searches and for measurements of processes such as top quark and anti-quark pair production, Higgs boson decays, and spin correlations. However, its complexity and computational demands make it more CPU-intensive than other methods [5]. The result of the matrix element integration can be approximated with a Deep Neural Network (DNN), which makes it possible to use the MEM for parameter scans, searches for new physics, and other applications. In the absence of a closed or computationally feasible form, the probabilities given by MoMEMta can be treated as unknown functions of the particle 4-momenta. Given appropriate assumptions and a sufficiently large width distributed across several layers, a neural network can approximate any such function. While there is no guarantee that the MEM output satisfies all of these assumptions, the argument sufficiently motivated this study. Compared to matrix element evaluation based on direct integration, this approach can produce results much more quickly: DNNs often require far less computation time than classical MEM integrations, needing only a simulated sample and several hours of training. Advancements in the MEM field, such as GPU acceleration, parallel computing, BDTs, neural networks, and normalizing flows, can reduce computation times and avoid the need to optimize integration variables [2].
In this study, I would like to understand the details of the computations and probability distributions required for the MEM, how they can be fitted with a DNN, the topology used for the Drell-Yan weight model, and the application to real-world analyses of selected physics processes based on experimental data [2].
2. Methodology
1) Regression is a machine learning approach used to predict continuous outcomes of interactions and decays based on input variables. DNNs are simply neural networks with many layers between the input and output layers, which permits learning of complex patterns in the data [5] [6]. To train the neural network, input features derived from experimental data, such as particle momenta, energies, and other pertinent variables, are fed into the system. Using theoretical calculations or Monte Carlo simulations to obtain the true distribution, the network is trained to minimize the difference between its predicted probability distribution and the true one. To improve its predictions, the network iteratively adjusts its internal parameters (weights and biases) through techniques such as gradient descent and backpropagation [7]. First-principles computations for high-energy processes are carried out in a power-series expansion, order by order, which can be written schematically as:
$F = \sum_{\ell} c^{\ell} f_{\ell}$ (1)

where $c$ is a small expansion parameter. The functions $f_{\ell}$ are related to the real parts of products of complex matrix elements (a generic term is of the form $f_{\ell} \sim \operatorname{Re}\!\left(\mathcal{M}_i^{*}\mathcal{M}_j\right)$, where $\mathcal{M}$ is called the matrix element), and the index $\ell$ gives the power of the expansion coefficient. The subscript also corresponds to the number of closed loops contributing to the Feynman graphs entering the matrix element of a given scattering process. Here the second-order term $f_2$ (the order being given by the power of $c$) is the term of interest; $f_2$ is far more expensive in computing time and memory than $f_1$. This difference stems from the Feynman graphs: those contributing to $f_2$ contain two loops, while those contributing to $f_1$ contain one loop. To speed up the computation of these functions, a regressor can be built from a representative sample. Speeding up Monte Carlo simulations is highly relevant not only in particle physics but in a very large set of problems, addressed by all branches of physics, that rely on a perturbative expansion like the one in Equation (1) [8]. However, the regressors must deliver highly accurate predictions over the whole function domain in order to reach the precision required for matching with experimental data. The requirements for the regressors, in particular the definition of high precision, are as follows:
1) High precision: prediction error < 1% over more than 90% of the domain of the function.
2) Speed: prediction time per data point of $<10^{-4}$ seconds.
3) Lightweight: the disk size of the regressors should be a few megabytes at the most for portability.
Keeping the prediction error, whose exact definition is given in Equation (2), small compared to, for instance, the Monte Carlo statistical error guarantees that propagating the approximation error on $f_2$ to the entire function $F$ remains a sub-leading source of error:

$\delta(x) = \dfrac{\hat{f}(x) - f(x)}{f(x)}$ (2)

where $\delta$ gives the difference between the predicted value of the function, $\hat{f}$, and its true value, $f$, normalized by $f$ [8].
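As an illustration of how the precision requirement can be checked in practice, the sketch below computes the relative prediction error of Equation (2) and the fraction of validation points with error below 1%; the true and predicted values are synthetic stand-ins for the loop functions and a regressor's output, not actual data.

```python
import numpy as np

def prediction_error(f_pred, f_true):
    """Relative prediction error as in Equation (2):
    delta = (f_pred - f_true) / f_true."""
    f_pred = np.asarray(f_pred, dtype=float)
    f_true = np.asarray(f_true, dtype=float)
    return (f_pred - f_true) / f_true

def precision_coverage(f_pred, f_true, tolerance=0.01):
    """Fraction of validation points whose relative error is below
    the tolerance (1% for the 'high precision' target above)."""
    delta = prediction_error(f_pred, f_true)
    return np.mean(np.abs(delta) < tolerance)

# Toy usage with synthetic values standing in for the true and
# regressed loop functions (illustrative numbers only).
rng = np.random.default_rng(0)
f_true = rng.uniform(0.5, 2.0, size=10_000)
f_pred = f_true * (1.0 + rng.normal(0.0, 0.004, size=10_000))
print(f"coverage below 1%: {precision_coverage(f_pred, f_true):.3f}")
```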
The functions $f_{\ell}$ in Equation (1) represent the terms that arise in the production of four charged leptons in proton-proton collisions, $pp \to 4\ell$, where each lepton pair is mediated by an electrically neutral electroweak gauge boson, a $Z$ boson or a photon ($\gamma^{*}$). The leading-order function of this process, $f_0$, equals the square of the tree-level scattering amplitude. Higher-order terms contain graphs with internal legs forming closed loops, whereas graphs without such legs contribute to the tree-level amplitude. For instance, adding the second-order term improves the accuracy of the predicted four-electron production rate at the cost of a computing-time penalty of roughly a factor of 1500 [8]-[10].
2) Feature space and symmetries. The functions in question are maps from a $d$-dimensional feature space to the real numbers, with $d = 2$, 4, or 8. The feature space, i.e. the domain of the functions, is sampled from a uniform distribution and linearly mapped to the unit hypercube [10]. For instance, the physical phase-space coordinates in the 4-dimensional case are the total energy of the process, the scattering angle of the di-boson system, and the masses of the two bosons. Since the bosons are allowed to be off their mass shell, these masses are not fixed in general [8] [9]. The 8-dimensional case adds the angles of the leptons in the final state, while the 2-dimensional case has fixed masses. As a result, different physics analyses will call for different regressors. The 4D regressors, on the other hand, are more versatile because they have no fixed physical parameters [8]. The smaller number of necessary functions in the third column of Table 2 is obtained by leveraging the symmetry properties discussed below, which derive from physics domain knowledge; for the data used in Ref. [8], they stem from the symmetries manifest in the simulated scattering process. The last column of Table 2 indicates whether summing the functions has a physical meaning; in the cases where it does, i.e. 2D and 8D, only a single regressor for the sum of the functions is required, while the total number of functions for each dimensionality is given in the second column. Certain exchanges of the external particles specified by the method allow pairs of functions to be mapped into each other. In feature space, this corresponds to a linear transformation of the second coordinate, applied either on its own or in conjunction with the permutation of the third and fourth coordinates (the two boson masses). For instance, in 4D, the number of independent functions is reduced from 162 to 25 by two such permutations, as seen in Table 3, where $p_i$ represents a particle with label $i$; an illustrative sketch of how such a permutation can be exploited follows Table 3.
Table 1. The dimensionality determines how many of the resulting functions, technically known as helicity amplitudes, there are [8].

| Variable | 2D | 4D and 8D |
| --- | --- | --- |
|  |  |  |
|  |  |  |
|  |  |  |
Table 2. Symmetry characteristics reduce the number of necessary functions [8].

| Dimensionality | Total functions | Independent functions | Sum is physical? |
| --- | --- | --- | --- |
| 2D | 18 | 5 | Yes |
| 4D | 162 | 25 | No |
| 8D | 8 | 4 | Yes |
Table 3. Symmetry representation [8].

| Permutation | Particle Symmetry | Coordinate Symmetry |
| --- | --- | --- |
|  |  |  |
|  |  |  |
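To make the symmetry argument concrete, here is a minimal sketch in which the coordinate ordering, the choice of swapped indices, and the toy regressor are illustrative assumptions rather than the exact transformations of Ref. [8]. It shows how a permutation of two feature-space coordinates lets one trained regressor serve for a pair of functions that are mapped into each other.

```python
import numpy as np

def swap_masses(x):
    """Map a feature-space point onto its symmetry partner by
    permuting the third and fourth coordinates (assumed here to be
    the two boson masses of the 4D parameterization)."""
    x = np.array(x, dtype=float, copy=True)
    x[..., [2, 3]] = x[..., [3, 2]]
    return x

def evaluate_partner(regressor, x):
    """Evaluate the partner function of a trained regressor by
    transforming the input instead of training a second model."""
    return regressor(swap_masses(x))

# Toy usage: a stand-in 'regressor' that is symmetric by construction.
toy_regressor = lambda x: np.sin(x[..., 0]) + x[..., 2] * x[..., 3]
point = np.array([0.3, 0.7, 0.1, 0.9])
assert np.isclose(toy_regressor(point), evaluate_partner(toy_regressor, point))
```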
3) Describing the Matrix Element Method. The methodology can, in principle, be applied to any measurement; however, processes involving intermediate resonances and leading to many-particle final states are expected to yield the largest improvement compared with cut-based analysis techniques. Generally speaking, the MEM allows the simultaneous determination of multiple unknown parameters (both experimental and theoretical parameters characterizing the detector response and the physics processes being measured) in a single measurement, thereby reducing systematic uncertainties [11]. The foundation of the Matrix Element technique is the likelihood $\mathcal{L}$ to observe a sample of selected events in the detector. The likelihood is computed as a function of the assumed values of each of the parameters to be measured and is derived directly from the theoretical prediction for the detector resolution and the differential cross-sections of the relevant processes. The measurement of the parameters is obtained by minimizing $-\ln\mathcal{L}$, where the likelihood $\mathcal{L}$ for the total event sample is calculated by multiplying the likelihoods of observing each individual event [11] [12]. Most analysis techniques in experimental particle physics, in contrast, compare distributions from observed events in the detector with corresponding distributions from simulated events generated theoretically, passed through a detector simulation, and then reconstructed using the same event reconstruction software [13]. The sample likelihood $\mathcal{L}$ for $N$ measured events with measured properties $x_1, \ldots, x_N$ can be written as:

$\mathcal{L}\left(x_1, \ldots, x_N \mid \boldsymbol{\Omega}, \boldsymbol{\delta}\right) = \prod_{i=1}^{N} \mathcal{L}_{\text{evt}}\left(x_i \mid \boldsymbol{\Omega}, \boldsymbol{\delta}\right)$ (3)

where the symbol $\boldsymbol{\Omega}$ denotes the assumed values of the physics parameters to be measured, $\boldsymbol{\delta}$ stands for the parameters describing the detector response that are to be determined, and $\mathcal{L}_{\text{evt}}$ is defined below. The likelihood $\mathcal{L}_{\text{evt}}$ to observe event $x_i$ under the assumption of parameter values $\boldsymbol{\Omega}$ and $\boldsymbol{\delta}$ is given as the linear combination:

$\mathcal{L}_{\text{evt}}\left(x_i \mid \boldsymbol{\Omega}, \boldsymbol{\delta}\right) = \sum_{p} f_{p}\, \mathcal{L}_{p}\left(x_i \mid \boldsymbol{\Omega}, \boldsymbol{\delta}\right)$ (4)

where the sum runs over all individual processes $p$ that could have led to the observed event $x_i$, $\mathcal{L}_{p}$ is the likelihood to observe this event under the assumption that it was produced via process $p$, and $f_{p}$ denotes the fraction of events from process $p$ in the entire event sample, with $\sum_{p} f_{p} = 1$. In total, the physics parameters $\boldsymbol{\Omega}$, the detector response described by $\boldsymbol{\delta}$, and the event fractions $f_{p}$ are determined simultaneously from the minimization of $-\ln\mathcal{L}$ [12].
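As a minimal numerical sketch of Equations (3) and (4), assuming the per-event, per-process likelihood values $\mathcal{L}_p(x_i)$ are already available (here they are random stand-ins) and fitting only the process fractions $f_p$ (the parameters $\boldsymbol{\Omega}$ and $\boldsymbol{\delta}$ are left out for brevity), the sample negative log-likelihood can be built and minimized as follows.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(fractions, per_process_likelihoods):
    """-ln L of Eq. (3): product over events of the mixture of
    Eq. (4), sum_p f_p * L_p(event), with sum_p f_p = 1."""
    f = np.asarray(fractions)
    event_likelihoods = per_process_likelihoods @ f   # shape: (n_events,)
    return -np.sum(np.log(event_likelihoods))

# Toy per-event likelihoods for two processes
# (rows: events, columns: processes).
rng = np.random.default_rng(1)
L_p = rng.uniform(1e-3, 1.0, size=(500, 2))

# Minimize over the single free fraction f_1 (f_2 = 1 - f_1 enforces the sum rule).
result = minimize(
    lambda f1: neg_log_likelihood([f1[0], 1.0 - f1[0]], L_p),
    x0=[0.5], bounds=[(0.0, 1.0)],
)
print("fitted fraction f_1 =", result.x[0])
```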
The MEM is used to calculate $P(x \mid \alpha)$, the likelihood of observing an experimental event $x$ given a theoretical hypothesis $\alpha$. Here, $x$ stands for the 4-momenta of the particles detected in the final state. One must distinguish the parton-level particles $y$, produced at the interaction point before hadronization and detection, from the experimentally observed particles $x$. The hypothesis $\alpha$ can denote different models or any set of parameters (such as the mass of a resonance) [2]. For hadron colliders, the probability that a hard scattering results in a partonic final state $y$ is proportional to the differential cross section, which is calculated as

$\mathrm{d}\sigma_{\alpha}(y) \propto \dfrac{\left|\mathcal{M}_{\alpha}(y)\right|^{2}}{x_1 x_2 s}\, \mathrm{d}\Phi_n(y)$ (5)

where $x_1$ and $x_2$ are the initial-state parton momentum fractions and $\sqrt{s}$ is the center-of-mass energy. $\mathrm{d}\Phi_n(y)$ is the $n$-body phase space of the final state $y$, while $\mathcal{M}_{\alpha}$ denotes the matrix element for the given process $\alpha$.
The parton distribution functions (PDFs) $f_{i}(x_i)$ (one for each initial-state parton $i$ of a given flavor), the efficiency $\epsilon(y)$ to reconstruct and select the hadronic state $y$, and the transfer function $T(x \mid y)$, normalized with respect to $x$, all enter the relation between the parton-level 4-momenta $y$ and the experimentally observed ones $x$. The transfer function parameterizes the hadronization, the parton shower, and the detector response (which blurs the momenta of the detected particles because of the finite detector resolution) [2] [13]. A convolution of the differential cross section with the transfer function and a sum over the initial states yields the probability $P(x \mid \alpha)$:

$P(x \mid \alpha) = \dfrac{1}{\sigma_{\alpha}^{\text{vis}}} \sum_{i_1, i_2} \int \mathrm{d}x_1\, \mathrm{d}x_2\; f_{i_1}(x_1)\, f_{i_2}(x_2)\; \mathrm{d}\sigma_{\alpha}(y)\; T(x \mid y)\, \epsilon(y)$ (6)

where $\sigma_{\alpha}^{\text{vis}}$ stands for the visible (observed) cross-section and ensures that the probability is normalized. It is often computed a posteriori as $\sigma_{\alpha}^{\text{vis}} = \sigma_{\alpha}\,\langle\epsilon\rangle$, where $\sigma_{\alpha}$ is the total cross-section and $\langle\epsilon\rangle$ is the expectation value of the selection efficiency. The MEM weights are defined as $W_{\alpha}(x) \equiv P(x \mid \alpha)$. In addition, the weights can span several orders of magnitude, which is why most of the time we will use the event information, defined as $-\log W_{\alpha}(x)$, or a variant that differs from it only by a constant term. Several complex processes are involved in the transfer function, and in order to make its integration easier, a number of assumptions are typically made. To first order, each particle in the final state is detected and measured independently, so the transfer function can be factorized over the particles. This argument can be taken further by also factorizing the different components of the measured 4-momentum [2]. Thus, the transfer function is written as

$T(x \mid y) = \prod_{i} T_{i}\left(x_{i} \mid y_{i}\right)$ (7)

where the index $i$ refers to the final-state particles. The resolutions in $\eta$ and $\phi$ are typically very good, so the corresponding factors are taken as (very narrow) delta functions. Conversely, the energy resolution depends on the particle characteristics and must reproduce the behavior of the simulated detector effects, which are usually Gaussian in the case of fast simulations [2] [11] [13].
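A small sketch of the factorized transfer function of Equation (7), under the assumptions stated above: the angular resolutions are treated as exact (the delta functions simply set the parton-level angles to the reconstructed ones), and the energy resolution is a Gaussian whose width, here $0.15\sqrt{E}$, is a purely illustrative choice rather than a tuned detector model.

```python
import numpy as np

def energy_transfer(E_reco, E_parton, resolution=0.15):
    """Gaussian energy transfer function for one particle with an
    illustrative resolution sigma = resolution * sqrt(E_parton)."""
    sigma = resolution * np.sqrt(E_parton)
    return np.exp(-0.5 * ((E_reco - E_parton) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def transfer_function(reco_energies, parton_energies):
    """Factorized transfer function of Eq. (7): product over the
    final-state particles of the per-particle energy terms; the
    delta functions in the angles are handled by evaluating the
    parton-level angles at the reconstructed values."""
    terms = [energy_transfer(Er, Ep) for Er, Ep in zip(reco_energies, parton_energies)]
    return float(np.prod(terms))

# Two final-state particles with reconstructed vs. parton-level energies (GeV).
print(transfer_function([48.0, 95.0], [50.0, 100.0]))
```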
4) Drell-Yan process and weights. In this process, a virtual photon or Z boson is created when a quark from one hadron and an antiquark from another hadron annihilate. This virtual particle then decays into two oppositely charged leptons (e.g., electrons or muons). The process is important because it yields valuable information on the parton distribution functions (PDFs), which characterize how momentum is distributed among the partons (quarks and gluons) of an incoming high-energy nucleon. The Drell-Yan mechanism is closely related to deep inelastic scattering (DIS): the Feynman diagram for the Drell-Yan process can be obtained by rotating the DIS Feynman diagram by 90 degrees [14].
5) DNN decisions and fitting with the MEM. The DNN architectures used in Ref. [8] consist of fully connected layers, with linear activations for the output layer and Leaky ReLU activations for the hidden layers, see Figure 1. In a comparison of activation functions for deep neural networks with skip connections, the Leaky ReLU performed better than alternatives such as ReLU, softplus, and ELU [9] [15] [16], while remaining computationally affordable. To evaluate the progress of the training stage, the coefficient of determination, $R^2$, can be computed as

$R^{2} = 1 - \dfrac{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{f}(x_i) - f(x_i)\right)^{2}}{\sigma_{f}^{2}}$ (8)

where $\sigma_{f}^{2}$ is the true values' variance over the $N$ validation points identified by the index $i$. Keep in mind that the mean-squared error, i.e. the chosen objective function, is proportional to $1 - R^{2}$. While the mean-squared error alone does not allow us to compare different models self-consistently, the constant of proportionality, $1/\sigma_{f}^{2}$, standardizes it in a way that makes $R^2$ a meaningful figure of merit during the training phase [8] [10] [13].
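The figure of merit of Equation (8) can be computed in a few lines; the validation values below are synthetic stand-ins for the true function and a DNN prediction.

```python
import numpy as np

def r_squared(f_pred, f_true):
    """Coefficient of determination of Eq. (8): one minus the
    mean-squared error normalized by the variance of the truth."""
    f_pred = np.asarray(f_pred, dtype=float)
    f_true = np.asarray(f_true, dtype=float)
    mse = np.mean((f_pred - f_true) ** 2)
    return 1.0 - mse / np.var(f_true)

# Toy validation set standing in for the true function and a DNN prediction.
rng = np.random.default_rng(2)
f_true = rng.normal(0.0, 1.0, size=5_000)
f_pred = f_true + rng.normal(0.0, 0.1, size=5_000)
print(f"R^2 = {r_squared(f_pred, f_true):.4f}")
```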
Figure 1. The building block with skip connections in a DNN. The first two layers are fully connected with Leaky ReLU activations. The last layer has a linear activation; after being added to the input via the skip connection, the result is transformed by a non-linear Leaky ReLU function. If the input and output dimensions of the block differ, the skip connection (transformation gate) applies a trainable weight matrix. The neural network is built by stacking these blocks one after the other [8].
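A possible PyTorch rendering of the building block of Figure 1, assuming two fully connected Leaky ReLU layers followed by a linear layer, a skip connection with a trainable projection when the dimensions differ, and a final Leaky ReLU; the layer widths and stacking depth are illustrative, not those of Ref. [8].

```python
import torch
import torch.nn as nn

class SkipBlock(nn.Module):
    """Building block of Figure 1: two Leaky ReLU layers, a linear
    layer, a (possibly projected) skip connection, then Leaky ReLU."""

    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.LeakyReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.LeakyReLU(),
            nn.Linear(hidden_dim, out_dim),          # linear activation
        )
        # Trainable weight matrix on the skip path if dimensions differ.
        self.skip = nn.Identity() if in_dim == out_dim else nn.Linear(in_dim, out_dim, bias=False)
        self.activation = nn.LeakyReLU()

    def forward(self, x):
        return self.activation(self.body(x) + self.skip(x))

# The regressor is built by stacking blocks one after the other.
model = nn.Sequential(SkipBlock(4, 64, 64), SkipBlock(64, 64, 64), nn.Linear(64, 1))
print(model(torch.rand(8, 4)).shape)   # torch.Size([8, 1])
```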
Depending on how sophisticated the process is, calculating an event's weight with MoMEMta can take from a few seconds to several minutes. It quickly becomes prohibitive to repeat this for every hypothesis $\alpha$ and every event. In real-world analyses with several hypotheses, and possibly extra model parameters, this frequently complicates the use of the method. In brief, the approach of Ref. [2] processes simulated samples with MoMEMta and generates the event information for various hypotheses. The outcome is then used to train a DNN on the MoMEMta inputs, namely the 4-momenta of every observable particle. The 4-momenta of all the detected particles and the missing transverse energy (its magnitude and azimuthal angle) are the inputs of MoMEMta; as such, these are the inputs we wish to supply to the DNN [2]. Depending on the longitudinal momentum difference between the initial partons that collide in the detector, the produced particles may be boosted along the beam direction. The network's capacity to describe the interesting part of the matrix element would be hampered by having to learn this longitudinal boost in addition to the function in Equation (6). The situation is greatly improved by using the $p_T$, $\eta$, and $\phi$ of each particle as inputs, since the angular separation between two particles is, to a good approximation, independent of the longitudinal boost [2].
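A short sketch of this input preprocessing, converting Cartesian 4-momenta into ($p_T$, $\eta$, $\phi$, $E$) features; the exact feature set used in Ref. [2] may differ, so this only illustrates the longitudinal-boost argument.

```python
import numpy as np

def to_detector_coordinates(px, py, pz, energy):
    """Convert a Cartesian 4-momentum into (pT, eta, phi, E).
    pT and phi are invariant under longitudinal boosts, and
    differences in eta are approximately invariant as well."""
    pt = np.hypot(px, py)
    p = np.sqrt(px**2 + py**2 + pz**2)
    eta = 0.5 * np.log((p + pz) / (p - pz))
    phi = np.arctan2(py, px)
    return pt, eta, phi, energy

# Example: one lepton 4-momentum in GeV (illustrative values).
print(to_detector_coordinates(px=30.0, py=10.0, pz=120.0, energy=124.1))
```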
3. Proof of Concept and Application to $H \to ZA$ Production
As a proof of concept, let us apply the MEM to events with two opposite-sign leptons and two jets initiated by b quarks (b jets) as detected particles. What makes this topology intriguing is that its primary contributions, Drell-Yan production with additional jets and top quark pair production ($t\bar{t}$) with leptonic decays of the $W$ bosons from the top quarks, are treated very differently in the integration. While there are no missing particles in the former case, the latter involves undetected neutrinos whose degrees of freedom must be taken into account. We also consider the resonant $H \to ZA$ process that arises in the context of Two Higgs Doublet Models (2HDM) and has been studied by the ATLAS and CMS collaborations [17] [18]. The unknown masses of the $H$ and $A$ bosons will demonstrate the effectiveness of the method when the parameter space is multi-dimensional, which is precisely the situation where classical integration is impracticable. The multiple resonances in this process also pose an interesting challenge from the integration point of view [2] [16]. A summary of the events for which the weights have been calculated indicates that, in order to prevent the network from becoming unbalanced, only 500 K events have been employed in the training for this process [2]. For both $m_H$ and $m_A$, the signal samples and weights are divided into 23 mass configurations up to 1 TeV. For each event of the $H \to ZA$, Drell-Yan, and $t\bar{t}$ samples, the event information is evaluated 23 times using the mass parameters specific to the $H \to ZA$ case. The data given in Ref. [2] also include the number of faulty weights and the number that were recomputed with additional iterations. Figure 2 displays the event-information distributions for each of the three sample types. The weights from the MEM computed with MoMEMta and those from the DNN agree quite well. The model has six layers of 200 neurons, with ReLU and SELU [9] [15] [16] activation functions for the hidden and output layers, respectively. A change of variables that exploits the Breit-Wigner resonances keeps the computation time reasonable (approximately 3.2 times slower than for the Drell-Yan weights) [1] [8]. Figure 3 displays the corresponding event-information distributions. Although the weight distributions of the Drell-Yan and $t\bar{t}$ samples differ less, the Drell-Yan events have a longer tail. The distinct mass configurations $(m_H, m_A)$ that make up the signal sample are the source of the double peak observed in the $H \to ZA$ case: low weights are associated with high (pseudo)scalar masses, while lower masses yield weights closer to the bulk of the distribution. Overall, there is good agreement between the weights from the DNN and those obtained classically [2]. Turning to the application of the MEM at next-to-leading order (NLO) in single top-quark production, we can examine, as an example, the impact of missing higher-order effects in the likelihood computation. A single top-quark event sample is produced with POWHEG [19] [20] and showered with PYTHIA8 [21] [12]. By analyzing these events with the MEM based on fixed-order calculations, one can identify the specific ways in which the parton shower impacts the analysis. The extraction of the top-quark mass with the MEM from showered events produced with POWHEG and PYTHIA8 is depicted in the left plot of Figure 4. In the study displayed in Figure 4, the likelihood function is computed either in the Born approximation (blue) or with NLO corrections (red). A parabola is fitted to the negative logarithm of the likelihood function; the estimator for the top-quark mass is taken as the position of the minimum and the related statistical uncertainty as the width of the parabola. The theoretical uncertainty resulting from missing higher orders is assessed by repeating the likelihood calculation with two different choices of the factorization and renormalization scales. The missing higher-order corrections in the corresponding likelihood calculations are the reason for the biases of the extracted estimators, at LO and at NLO, with respect to the input value of the top-quark mass. The MEM is known to provide biased estimators whenever the probability density used in the likelihood differs from the actual distribution of the events. Hence, the method needs to be calibrated, which introduces associated uncertainties.
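The mass extraction described above can be sketched as a parabola fit to a scan of the negative log-likelihood; the scan values below are invented for illustration and are not taken from Ref. [12].

```python
import numpy as np

def parabola_minimum(mass_points, neg_log_likelihood):
    """Fit -ln L(m) with a parabola a*m^2 + b*m + c and return the
    position of the minimum and the 1-sigma statistical uncertainty
    (where -ln L rises by 0.5 above its minimum)."""
    a, b, _ = np.polyfit(mass_points, neg_log_likelihood, deg=2)
    m_hat = -b / (2.0 * a)
    sigma = np.sqrt(0.5 / a)      # Delta(-ln L) = 0.5 defines the 1-sigma interval
    return m_hat, sigma

# Invented likelihood scan around an assumed top-quark mass.
masses = np.linspace(168.0, 178.0, 11)
nll = 0.5 * ((masses - 172.6) / 0.8) ** 2 + 1234.0
print(parabola_minimum(masses, nll))   # -> (172.6, 0.8)
```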
Figure 2. Distributions of the event information from both MoMEMta and the DNN for the three samples: Drell-Yan (left), $t\bar{t}$ (middle), and $H \to ZA$ (right) events [2].
Figure 3. Distributions of the event information from both MoMEMta and the DNN for the three samples: Drell-Yan (left), $t\bar{t}$ (middle), and $H \to ZA$ (right) events [2].
Figure 4. Left: top-quark mass extraction via the MEM at leading order (LO, blue) and NLO (red) from single top-quark events generated with POWHEG + PYTHIA8. Right: the same, but with extended likelihood functions [12].
4. Conclusion
This study proposes the use of a Deep Neural Network (DNN) to approximate the integral of the Matrix Element Method (MEM) with the objective of speeding up MEM computations. From the few examined and documented example processes, it can be inferred that, with the assistance of specialized tools like MoMEMta, a DNN can be effectively trained to closely replicate the outcomes of the direct numerical integration of the matrix element. Although the weights obtained from this DNN-based regression inherently contain errors, these are unlikely to significantly impact performance across various applications. The adoption of accelerated weight computations opens up many opportunities, such as studies of systematic uncertainties, likelihood scans, parameter scans, and, more broadly, the application of the MEM to a wider array of physics analyses, including searches for novel physics phenomena.
Acknowledgements
I wish to express my sincere gratitude to Professor Elena Long for her invaluable introduction to “Special Topics in Particle Physics”. The research by Simone Alioli et al. served as a significant impetus for this study.