
In metabolomics data, like other -omics data, normalization is an important part of the data processing. The goal of normalization is to reduce the variation from non-biological sources (such as instrument batch effects), while maintaining the biological variation. Many normalization techniques make adjustments to each sample. One common method is to adjust each sample by its Total Ion Current (TIC), i.e. for each feature in the sample, divide its intensity value by the total for the sample. Because many of the assumptions of these methods are dubious in metabolomics data sets, we compare these methods to two methods that make adjustments separately for each metabolite, rather than for each sample. These two methods are the following: 1) for each metabolite, divide its value by the median level in bridge samples (BRDG); 2) for each metabolite divide its value by the median across the experimental samples (MED). These methods were assessed by comparing the correlation of the normalized values to the values from targeted assays for a subset of metabolites in a large human plasma data set. The BRDG and MED normalization techniques greatly outperformed the other methods, which often performed worse than performing no normalization at all.

A major obstacle in global liquid chromatography mass spectrometry (LC-MS) based metabolomics is drawing comparisons between samples processed on different runs of the same instrument or on different runs from different instruments. There are a number of reasons for wanting to compare samples from different instrument runs. Single runs using a mass spectrometer are limited to a certain number of samples. When run through a mass spectrometer, samples are prepped and placed on a plate containing a defined number of wells with each well housing an individual sample. The number of wells available depends on the type and size of plate used, but is generally some multiple of 24 [

Exact concentrations can be derived through calibration curves, i.e. standard curves, in which known concentrations of the target metabolite are included as a way to orient the ion counts and estimate the levels in samples of interest according to their position on the curve. For a thorough review of standard curves see the five-part series by Dolan [

Lacking full quantitation, one must find some way to adjust the ion counts in different batches to each other. Batch effects are typically removed through normalization. The goal of normalization is to reduce the systematic variation but preserve the biological variation. Many normalization techniques used in the field adjust each sample. However, there are other normalization techniques that make adjustments for each metabolite, rather than the sample. Often normalization techniques are deemed successful if the variance has decreased. However, some of the important biological variation may have been removed also. Since the ideal measurement for a metabolite would be from a targeted assay or clinical measurement, we compare the normalized values to the values from a panel of targeted assays, where the concentrations have been measured.

For this discussion, it will be assumed that the data sets are organized so that the rows correspond to the samples and the columns correspond to the features (metabolites). The most common normalization is total ion count (or total ion current) normalization (TIC), in which the intensity of each metabolite in a sample is divided by the total number of ions observed in the sample [

Various adjustments to this basic premise include median normalization, MS-total useful signal (MSTUS) [

Normalizers of the first class are defined as the ratio of the sample’s raw intensity values to a function of the sample vector. Let $X_i = \{x_{i1}, \ldots, x_{im}\}$ be the vector of observed ion counts for metabolites $1, 2, \ldots, m$ for sample $i$, and let $X_i^N$ represent the resulting vector of normalized metabolites. Normalizers of the first class are defined as

$$X_i^N = \frac{X_i}{f_i(X_i)}$$

where $f_i(\cdot)$ is some function. For example, for TIC $f_i$ is the sum of all the raw peak areas in sample $i$, and thus $X_i^N$ is a vector where the original values have been scaled by this sum.

Method | $f_i(\cdot)$ |
---|---|
TIC | $f_i = \sum_{j=1}^{m} x_{ij}$ |
MSTUS | $f_i = \sum_{j \in A} x_{ij}$, where $A = \{k : x_{ik} \text{ observed for all } i \in \{1, \ldots, n\}\}$ |
VECT | $f_i = \left(\sum_{j=1}^{m} x_{ij}^2\right)^{1/2}$ |
Mean | $f_i = \frac{1}{m}\sum_{j=1}^{m} x_{ij}$ |
Median | $f_i = \operatorname{median}(X_i)$ |
MAD | $f_i = \operatorname{median}(\lvert X_i - \operatorname{median}(X_i) \rvert)$ |
LB^{a} | $f_i = \operatorname{median}(X_i) / \operatorname{median}(X_{Baseline})$ |
PQN^{b} | $q_{ij} = x_{ij}^{TIC} / x_{control,j}^{TIC}$ |

^{a,b} Baseline/control spectrum may be taken from a designated sample or calculated from the available data, such as the sample with the median TIC.

The TIC of the resulting normalized sample is equal to that of the “baseline”. LB assumes a constant linear relationship between the sample and the baseline; non-linear extensions are available. Although the name includes “scaling”, the intent is consistent with normalization, which seeks to adjust the spectrum of each sample to the same level in some sense, and the computation is consistent with the Class I definition. PQN, which involves a four-step process, is the most computationally intensive of the Class I normalizers listed here. First, TIC normalization is performed. Second, a control spectrum is calculated; this may be based upon a designated sample, or the median spectrum from all samples may be used. Third, for each feature the ratio, i.e. quotient, of the TIC-normalized intensity of the sample to that of the control spectrum is found. Fourth, the final normalizer is taken as the median of all quotients. Most of the other Class I normalizers are straightforward to calculate and computationally inexpensive. Hence, these are popular and common choices for normalizing.
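The Class I definition and the four PQN steps can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the function names (`normalize_class1`, `tic`, `vect`, `mad`, `pqn`) and the toy data are hypothetical, the control spectrum is assumed to be the feature-wise median of the TIC-normalized samples, the final division is assumed to be applied to the TIC-normalized values, and missing or zero intensities are assumed absent.

```python
import math
import statistics


def normalize_class1(sample, f):
    """Class I normalization: divide every intensity by the scalar f(sample)."""
    scale = f(sample)
    return [x / scale for x in sample]


def tic(sample):
    """Total ion current: the sum of all intensities in the sample."""
    return sum(sample)


def vect(sample):
    """Vector (Euclidean) norm of the sample."""
    return math.sqrt(sum(x * x for x in sample))


def mad(sample):
    """Median absolute deviation from the sample median."""
    med = statistics.median(sample)
    return statistics.median([abs(x - med) for x in sample])


def pqn(samples):
    """Probabilistic quotient normalization (assumed variant, see lead-in)."""
    # Step 1: TIC-normalize every sample.
    tic_norm = [normalize_class1(s, tic) for s in samples]
    # Step 2: control spectrum = feature-wise median across samples (assumption).
    control = [statistics.median(col) for col in zip(*tic_norm)]
    result = []
    for s in tic_norm:
        # Step 3: quotient of each feature against the control spectrum.
        quotients = [x / c for x, c in zip(s, control)]
        # Step 4: the normalizer is the median quotient.
        f = statistics.median(quotients)
        result.append([x / f for x in s])
    return result


# Hypothetical toy data: rows are samples, columns are features.
samples = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 2, 3, 10]]
tic_scaled = [normalize_class1(s, tic) for s in samples]
pqn_scaled = pqn(samples)
```

In this toy data the third sample matches the first except for one inflated feature. TIC scaling lowers all of its other features relative to the first sample, whereas the median quotient used by PQN is not affected by the single inflated feature.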

The second class of normalizers involves MA plots, which are derived from Bland-Altman plots on the log scale [

$$\mathrm{minus}_{ij} = \log_2(x_{ij}) - \log_2(x_{ij'})$$

$$\mathrm{avg}_{ij} = \frac{\log_2(x_{ij}) + \log_2(x_{ij'})}{2}.$$

The “M” can be viewed as the log of the ratio, while “A” is the log of the product divided by 2. Orienting the two spectra in this way is intended to magnify trends, both linear and non-linear, related to the systematic variation, such as batch effects. Then an equation is fitted to this curve, so that one can remove the difference between the two samples due to the systematic variation. Under Cyclic LOWESS, a non-linear local regression curve (LOWESS) is fitted to the MA plot for a given pair of samples. The process is then repeated for all possible pairwise combinations of samples in the data set. Following a complete iteration over all samples, the cycle is repeated until some tolerance is achieved between the latest cycle and the preceding one.
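The MA orientation itself is easy to compute. The sketch below only performs the transformation for one pair of samples; it does not fit the LOWESS curve or iterate over pairs. The function name `ma_transform` and the example intensities are hypothetical, and all intensities are assumed to be strictly positive.

```python
import math


def ma_transform(sample_a, sample_b):
    """Return the (minus, avg) coordinates for two samples on the same features."""
    minus, avg = [], []
    for xa, xb in zip(sample_a, sample_b):
        la, lb = math.log2(xa), math.log2(xb)
        minus.append(la - lb)        # "M": log of the ratio
        avg.append((la + lb) / 2)    # "A": log of the product, divided by 2
    return minus, avg


# Hypothetical pair of samples in which the first is uniformly twice the second.
minus, avg = ma_transform([8, 16, 64], [4, 8, 32])
```

A constant two-fold difference between the samples appears as a horizontal line at M = 1 across all values of A; an intensity-dependent difference would instead appear as a trend or curve, which is what the fitted regression is meant to capture.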

Another variation on this is contrast normalization [

$$X^O = \log(X) \cdot M.$$

The first row of $M$ repeats the constant $1/k$. The other rows of $M$ are not uniquely defined, except in the case of $k = 2$, which gives

$$M_2 = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.$$

For $k > 2$, $M$ is not unique, which requires some consideration for the next step, in which $X_1^O$, the first row of $X^O$, is used to predict the remaining rows $X_i^O$ for $i = 2, \ldots, n$. Referring to these predictions as $\hat{X}_i^O$, using LOWESS regression with weighted least squares produces $\hat{X}_i^O$'s that are invariant to the choice of $M$. Estimation of the $\hat{X}_i^O$'s is iterated until some tolerance between the previous and newest estimate is achieved. The final normalized matrix is then given by

$$X^N = \left[ X_1^O \;\; (X_2^O - \hat{X}_2^O) \;\; \cdots \;\; (X_m^O - \hat{X}_m^O) \right]'.$$

From this point the data set may be analyzed or mapped back to the original space via the reverse transformation

$$\exp(X^N \cdot M).$$

The similarity to cyclic LOWESS may not be immediately obvious; however, notice that when k = 2 the contrast matrix M coupled with the log transformation is analogous to the orientation of the MA plot. Contrast normalization essentially generalizes the MA concept to higher dimensions.

Normalizations that do not fit the criteria of Class I or Class II are classified here. One example of this is Quantile Normalization (Quant). Let $X_i$ be the ordered set of intensities for sample $i$:

$$X_i = \{x_{i[1]}, \ldots, x_{i[m]}\},$$

and consider the vector of average order statistics across all $X_i$:

$$\bar{X} = \{\bar{x}_{[1]}, \ldots, \bar{x}_{[m]}\} = \left\{ \frac{\sum_{i=1}^{n} x_{i[1]}}{n}, \ldots, \frac{\sum_{i=1}^{n} x_{i[m]}}{n} \right\}.$$

This essentially orders each row of the data set and then takes the average of each column. The normalized vector for a sample is then replaced with these values in the order corresponding to the ranks of un-normalized vector:

$$X_i^N = \{\bar{x}_{[\mathrm{rank}(x_{i1})]}, \ldots, \bar{x}_{[\mathrm{rank}(x_{im})]}\}.$$

An advantage of Quant is that it directly puts the intensities of each sample on the same scale, making sample-to-sample comparisons easier. One drawback is that features with missing values must be removed or imputed. A second drawback is that metabolites that are significantly more abundant than the rest may be normalized to a near-static state. In fact, in the data set used in 2.4, oleic acid had the highest peak area in every sample, so all of its values would normalize to the same value. The same issue could apply to metabolites that are significantly lower in abundance than all other metabolites, because metabolites near the limit of detection often drop out, i.e., no peak is detected.
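The quantile procedure and the "static" behaviour of a consistently top-ranked feature can be illustrated with a short Python sketch. The function name `quantile_normalize` and the toy intensities are hypothetical; the sketch assumes no missing values and breaks ties by position rather than averaging tied ranks.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of samples (rows = samples, columns = features)."""
    n = len(samples)
    # Sort each row, then average column-wise to get the mean order statistics.
    sorted_rows = [sorted(row) for row in samples]
    mean_order_stats = [sum(col) / n for col in zip(*sorted_rows)]

    normalized = []
    for row in samples:
        # Feature indices of this sample, ordered from lowest to highest intensity.
        order = sorted(range(len(row)), key=lambda j: row[j])
        new_row = [0.0] * len(row)
        for rank, j in enumerate(order):
            # Replace each value with the mean order statistic of its rank.
            new_row[j] = mean_order_stats[rank]
        normalized.append(new_row)
    return normalized


# Hypothetical data: the third feature is the largest in every sample.
samples = [[5, 2, 9], [1, 4, 30], [6, 3, 12]]
normalized = quantile_normalize(samples)
# normalized == [[5.0, 2.0, 17.0], [2.0, 5.0, 17.0], [5.0, 2.0, 17.0]]
```

The third feature ranges from 9 to 30 before normalization but takes the single value 17.0 in every sample afterwards, which is the loss of variation described above for a metabolite that ranks highest in every sample.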

Mass spectrometry returns an ion count that is proportional to the true concentration but also dependent on the instrumentation. Rocke and Lorenzato [

$$x_{ijb} = \beta_{ib} y_i e^{\eta_{ijb}} + \varepsilon_{ijb}.$$

The sample concentration, $y_i$, is subscripted only by the biochemical since the $k$ samples are technical replicates. $\beta_{ib}$ relates to the ionization efficiency of the instrument and will vary by metabolite and batch. $\eta_{ijb} \sim N(0, \sigma^2_{\eta_{ib}})$ and $\varepsilon_{ijb} \sim N(0, \sigma^2_{\varepsilon_{ib}})$ are both normal random errors, with the former dominating at higher concentrations and the latter dominating at lower concentrations. Note that the intercept term, which is related to the background level of the instrument, has been removed, as it is generally regarded as a nuisance parameter and is in fact ignored in single point calibration curves [

$$\mu_{ib} = E[x_{ijb}] = \beta_{ib} y_i e^{\sigma^2_{\eta_{ib}}/2}.$$

Shuffling the order of these terms gives

$$\mu_{ib} = E[x_{ijb}] = \left(\beta_{ib} e^{\sigma^2_{\eta_{ib}}/2}\right) y_i.$$

As both $\beta_{ib}$ and $e^{\sigma^2_{\eta_{ib}}/2}$ are fixed, but unknown, parameters depending only on the metabolite and batch, these terms may be combined into a single unknown variable. Letting $\beta^*_{ib} = \beta_{ib} e^{\sigma^2_{\eta_{ib}}/2}$, it is easy to see that the mean ion count for the batch is proportional to the true concentration level:

$$\mu_{ib} = \beta^*_{ib} y_i.$$

Hence, the mean ion counts for the two batches are proportional:

$$\frac{\mu_{i1}}{\beta^*_{i1}} = \frac{\mu_{i2}}{\beta^*_{i2}}.$$

By the law of large numbers, there exists a $k$ such that the average of the replicates within a batch

$$\bar{x}_{ib} = \frac{1}{k}\sum_{j=1}^{k}\left(\beta_{ib} y_i e^{\eta_{ijb}} + \varepsilon_{ijb}\right)$$

is reasonably close to $\beta^*_{ib} y_i$. Scaling each batch against the mean of these replicates would thus eliminate the batch differences.
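As a rough illustration of this scaling, the following Python sketch divides one metabolite's ion counts by the center of the technical replicates run in the same batch. The function name `bridge_normalize`, its arguments, and the numbers are hypothetical. The default center is the mean, matching the derivation above; passing `center=statistics.median` gives the median-based variant.

```python
import statistics


def bridge_normalize(values, batches, is_bridge, center=statistics.mean):
    """Scale one metabolite by the per-batch center of its replicate samples.

    values, batches and is_bridge are parallel lists with one entry per sample.
    """
    centers = {}
    for b in set(batches):
        bridge_vals = [v for v, bb, flag in zip(values, batches, is_bridge)
                       if bb == b and flag]
        # Raises statistics.StatisticsError if a batch has no replicates.
        centers[b] = center(bridge_vals)
    return [v / centers[b] for v, b in zip(values, batches)]


# Hypothetical metabolite: batch 1 responds about 2x and batch 2 about 5x
# per unit concentration. The first three samples of each batch are replicates.
values = [19, 20, 21, 40, 10, 48, 50, 52, 100, 25]
batches = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
is_bridge = [True, True, True, False, False] * 2

scaled = bridge_normalize(values, batches, is_bridge)
experimental = [s for s, flag in zip(scaled, is_bridge) if not flag]
# experimental == [2.0, 0.5, 2.0, 0.5]
```

The experimental samples that differ by the batch response factor in raw ion counts (40 vs. 100, 10 vs. 25) land on the same relative scale after division by the replicate mean of their own batch.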

Data processing often includes QC samples as part of the metabolomic workflow in order to monitor instrument performance [

An important part of the experimental protocol should be randomization of the samples across the instrument runs. Under such randomization, for a given metabolite, the expected value of the relative concentration is the same for each instrument run if there were no batch effects. Hence, randomly assigning the samples and dividing the values for each metabolite on each instrument run day by the observed median should put each batch on the same scale. This is similar to the bridge normalization only with the samples themselves serving as the bridging.

Theoretically, the sample mean is generally a more consistent estimator than the sample median, but in skewed distributions and low sample sizes the efficiency of the mean can be impaired. Due to the propensity for extreme outliers in metabolomic data, which could adversely affect the sample mean, the median is used instead. This normalization procedure will be referred to as “MED”, henceforth.
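A minimal Python sketch of MED for a single metabolite is given below. The function name `med_normalize` and the example values are hypothetical, and the sketch assumes every sample passed in is an experimental sample with an observed value.

```python
import statistics


def med_normalize(values, batches):
    """Divide one metabolite's values by the median of the samples in the same batch."""
    by_batch = {}
    for v, b in zip(values, batches):
        by_batch.setdefault(b, []).append(v)
    medians = {b: statistics.median(vs) for b, vs in by_batch.items()}
    return [v / medians[b] for v, b in zip(values, batches)]


# Hypothetical metabolite measured on two run days with different responses.
values = [2, 4, 6, 10, 20, 30]
batches = ["day1", "day1", "day1", "day2", "day2", "day2"]
scaled = med_normalize(values, batches)
# scaled == [0.5, 1.0, 1.5, 0.5, 1.0, 1.5]

# Why the median rather than the mean: a single extreme value moves the mean far more.
with_outlier = [10, 12, 14, 500]
center_median = statistics.median(with_outlier)  # 13.0
center_mean = statistics.mean(with_outlier)      # 134
```

With the outlier present, dividing by the mean would push the three typical values well below 1, while dividing by the median keeps them near 1.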

The goal is to compare bridge set (BRDG) and median scaling of experimental samples (MED) to standard -omic normalizations that might be considered for a metabolomic data set. Total ion current (TIC), median absolute deviation (MAD), probabilistic quotient normalization (PQN) and cyclic LOWESS (CLOW) were chosen from the available options. This list includes a good mix of popular normalization methods and representatives of Class I and II normalizers.

Plasma samples were obtained from participants in the Insulin Resistance Atherosclerosis Family Study (IRASFS), which was sponsored by the National Heart, Lung and Blood Institute with the goal of examining the genetic epidemiology of insulin resistance and visceral adiposity [

Plasma samples from these participants were also run on a separate targeted assay of seven metabolites, which were shown to be markers for impaired glucose tolerance (IGT) [

Resulting normalized levels of these nine metabolites in the global panel are compared to the targeted results using Pearson’s correlation, r. All analysis was performed in R version 3.4.3 [
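The comparison criterion can be sketched as follows. The analysis in the paper was performed in R; the Python below is only an illustration of Pearson's r applied to one metabolite, with a hypothetical function name `pearson_r` and made-up values in which the second batch responds ten times more strongly than the first.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical targeted concentrations for six samples (three per batch).
targeted = [1, 2, 3, 1, 2, 3]
# Hypothetical global ion counts for the same samples, with a batch effect.
raw = [1, 2, 3, 10, 20, 30]
# The same ion counts after dividing each batch by its own median.
median_scaled = [0.5, 1.0, 1.5, 0.5, 1.0, 1.5]

r_raw = pearson_r(targeted, raw)
r_med = pearson_r(targeted, median_scaled)
```

In this constructed example the batch effect alone pulls the correlation of the raw values with the targeted values well below 1, and removing it restores a perfect linear relationship.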

For a preliminary analysis, a variance components analysis was performed for the bridge samples in order to assess how much of the variation can be attributed to the instrument batch. Those metabolites present in at least 80% of the bridge samples were used for this analysis (1049 metabolites). The variance components were fitted with JMP v13 [

The correlations of the normalized data to the targeted assays are shown in

METABOLITE | % Var from BATCH |
---|---|
2-hydroxybutyrate | 96 |
3-hydroxybutyrate | 85 |
4-methyl-2-oxopentanoate | 95 |
alpha-tocopherol | 89 |
cholesterol | 84 |
linoleoyl-GPC | 74 |
oleic acid | 85 |
pantothenic acid | 72 |
serine | 70 |

METABOLITE | NONE | TIC | MAD | PQN | CLOW | BRDG | MED |
---|---|---|---|---|---|---|---|
2-hydroxybutyrate | 0.69 | 0.60 | 0.58 | 0.69 | 0.63 | 0.96 | 0.95 |
3-hydroxybutyrate | 0.96 | 0.94 | 0.92 | 0.96 | 0.86 | 0.99 | 0.97 |
4-methyl-2-oxopentanoate | 0.77 | 0.72 | 0.68 | 0.83 | 0.78 | 0.95 | 0.95 |
alpha-tocopherol | 0.40 | 0.23 | 0.23 | 0.40 | 0.22 | 0.17 | 0.81 |
cholesterol | 0.50 | 0.24 | 0.29 | 0.50 | 0.23 | 0.69 | 0.73 |
linoleoyl-GPC | 0.26 | 0.29 | 0.30 | 0.26 | 0.31 | 0.49 | 0.56 |
oleic acid | 0.88 | 0.73 | 0.72 | 0.85 | 0.29 | 0.95 | 0.95 |
pantothenate | 0.92 | 0.74 | 0.75 | 0.82 | 0.86 | 0.93 | 0.94 |
serine | 0.89 | 0.83 | 0.65 | 0.89 | 0.81 | 0.93 | 0.92 |

even greater than 0.9. From

When normalizing metabolomics data, it is important that the method appropriately corrects for the systematic variation but preserves the biological variation. Various methods were assessed by comparing their values to targeted data, where the actual concentrations of certain metabolites in the samples are known. Many common normalization techniques that make corrections across each sample, such as TIC normalization, often performed worse than performing no normalization at all. The two methods that relied on metabolite-specific corrections (BRDG, MED) performed much better than the sample-based normalizations, and many of the resulting correlations were over 0.9. Correcting by the median batch value from the experimental samples (MED) can work well in a variety of applications. However, if one wants to run a very small set and merge it into previous data sets, or compare the values in two different data sets, it is probably better to normalize by bridge samples (BRDG). The main drawback of BRDG is that metabolites that are not present in the bridge samples cannot be normalized. Additionally, if the bridge samples are obtained from a different source, some metabolites may have different batch effects in the bridge samples than in the experimental samples. To avoid this issue, it is recommended that the bridge samples be as similar as possible to the experimental samples.

The IRASFS was supported by the National Institutes of Health (HL060944, HL061019, HL060919, and DK085175).

The authors declare no conflicts of interest regarding the publication of this paper.

Wulff, J.E. and Mitchell, M.W. (2018) A Comparison of Various Normalization Methods for LC/MS Metabolomics Data. Advances in Bioscience and Biotechnology, 9, 339-351. https://doi.org/10.4236/abb.2018.98022