Evaluation Strategies for Coupled GC-IMS Measurement including the Systematic Use of Parametrized ANN Training Data

Data evaluation strategies for the novel coupled MCC-IMS sensory system are developed. Major attention was paid to the plausibility of the applied procedures and the feasibility of automation. Three extraction levels with increasing data reduction are presented for several fields of application. Using suitable extraction levels, real data were tested on various structures of artificial neural networks (ANN), with the result that the computational levels must still be chosen by expertise, but subsequent processing and training can be fully automated. For the training of larger networks, a method for the automated generation of secondary training data is presented which by far exceeds the quality of previous noise models. It is concluded that the combination of MCC-IMS as measuring instrument and ANNs as evaluation technique has high potential for industrial use in process monitoring.


Introduction
Ion Mobility Spectrometry (IMS [1]) and Gas Chromatography (GC [2]) have been well-established measuring technologies for several decades. However, their coupling into a combined measuring technology (GC-IMS) is relatively new [3,4]. Though this method is very promising in terms of sensitivity and accuracy, all evaluation tools have to be developed from the very beginning. Several attempts have been made to find an analytical approach for the description of GC-IMS spectra in order to automate the evaluation of these measurements, but only little progress has been made [5,6]. Due to their two-dimensional nature, GC-IMS measurements contain great quantities of data: depending on the measurement setup, up to millions of data points.

Measuring Principle and Resulting Properties of the Measurements
The GC-IMS is a system that measures two different properties independently of each other [7-9]. While the GC column separates analytes depending on their ability to adsorb on and desorb from the inner column surface (see Figure 1), the IMS separates charged particles under the influence of an electric field depending on their drift behavior in a carrier gas atmosphere (see Figure 2). Beginning at the moment of the analyte injection into the GC column, the output of the GC column is continuously analyzed by the IMS. At a given rate per second, the IMS records one-dimensional spectra until all fractions of the analyte have been eluted from the column. These one-dimensional spectra are combined into one two-dimensional spectrum with certain characteristics, as can be seen in Figure 3.
The carrier gas is always present in the measurement process and is therefore seen in all spectra. The corresponding peak in the spectrum is called the RIP (Reactant Ion Peak); it is a constant feature of IMS spectra. The ions analyzed by the IMS are ionized by a radiation source in the reaction region of the IMS and then pulled toward the detector by an electric field. The time from the opening of the gate, when the electric field starts to pull the ions, until they hit the detector and cause an electric current is called the drift time of this particular ion species. The ionized carrier gas (reactant ions) reacts with the analyte sample when the latter is eluted from the GC into the IMS. Due to this reaction mechanism, the formation of analyte ions competes with the amount of reactant ions present in the reaction region. Hence the detection of a peak at a given retention time $t_R$ reduces the intensity of the RIP at the same $t_R$ (see Figure 3). Since the radiation in the ionization region is constant, the amount of produced ions, and hence the amount of charged particles detected at the Faraday detector, is assumed to be nearly constant.
Thus a decreasing or even vanishing RIP intensity is always the result of detected analytes forming a peak at $t_D > t_{D,\mathrm{RIP}}$, which preserves the overall ion amount detected at a given retention time.
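The charge balance between RIP and analyte peaks can be illustrated with a small toy model. This is a minimal sketch under stated assumptions, not a physical simulation: the Gaussian peak shapes, the peak widths and the function names (`synthetic_spectrum`, `charge_per_retention_time`) are hypothetical choices. Each row is one one-dimensional IMS spectrum at a given retention time; analyte ions are formed at the expense of reactant ions, so the charge summed over all drift times stays constant while the RIP dips.

```python
import math


def gaussian(x, mu, sigma):
    """Unnormalized Gaussian bump."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def unit_peak(n_drift, center, width):
    """Peak shape over the drift-time bins, normalized to a total charge of 1."""
    raw = [gaussian(d, center, width) for d in range(n_drift)]
    total = sum(raw)
    return [v / total for v in raw]


def synthetic_spectrum(n_ret=50, n_drift=100, rip_drift=20, total_charge=1.0,
                       analytes=((25, 60, 0.4),)):
    """Toy GC-IMS spectrum S[r][d] (r: retention-time bin, d: drift-time bin).

    Each analyte is (retention bin, drift bin, maximum share of the ion budget).
    Analyte ions are formed at the expense of reactant ions, so every row
    (one 1-D IMS spectrum) carries the same total charge.
    """
    if sum(share for _, _, share in analytes) > 1.0:
        raise ValueError("analytes cannot consume more than the whole ion budget")
    rip_shape = unit_peak(n_drift, rip_drift, 2.0)
    analyte_shapes = [unit_peak(n_drift, d_a, 2.0) for _, d_a, _ in analytes]
    spectrum = []
    for r in range(n_ret):
        # Share of the ion budget taken by each analyte while it elutes from the column.
        shares = [share * gaussian(r, r_a, 3.0) for r_a, _, share in analytes]
        rip_share = 1.0 - sum(shares)
        row = [total_charge * rip_share * v for v in rip_shape]
        for share, shape in zip(shares, analyte_shapes):
            row = [a + total_charge * share * b for a, b in zip(row, shape)]
        spectrum.append(row)
    return spectrum


def charge_per_retention_time(spectrum):
    """Total charge of each 1-D spectrum, summed over all drift times."""
    return [sum(row) for row in spectrum]


spec = synthetic_spectrum()
charges = charge_per_retention_time(spec)
print(max(abs(c - 1.0) for c in charges) < 1e-9)  # True: charge is conserved
print(spec[25][20] < spec[0][20])                # True: the RIP dips at the analyte
```

Because every row sums to the same value, any intensity missing from the RIP column must reappear at other drift times, which is exactly the property the RIP-based evaluation relies on.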

Available Measurement Data
Depending on the processing approach, the amount of measurement data required to initialize a certain processing algorithm can differ by orders of magnitude.
Here, initialization means a first, one-time execution of the algorithm code which puts the processing program into a state where it can read an unknown measurement sample and produce an appropriate output that classifies this sample. Initialization can be the storage of reference samples, the evaluation of reference samples in order to adapt certain threshold parameters which determine the classification process, or the training of an ANN. Furthermore, any processing algorithm, once initialized, needs to be evaluated with new data sets that were not used during the initialization.
The simplest case is the attempt to distinguish between two possible categories, for example the breath of people with and without a specific disease, or the detection of a specific contaminant in a food or beverage product.
Since every processing approach has to be evaluated with real measured data, it is important to have a sufficiently large dataset of measurements. Though the motto is "the bigger the dataset, the better the evaluation", the analysis in this work had to make do with the dataset given in Table 1. The measurements were carried out using the FlavourSpec and the GC-IMS by G.A.S. Both systems combine an IMS with a chromatographic column for a better pre-separation of volatile organic compounds (VOCs) in complex mixtures. The essential technical and physical data of the measuring devices are assembled in Listing 1.
The interaction potentials involved between drift gas and analyte particles are of enormous complexity. This is especially the case with complex analyte particles such as MVOCs. There are no analytical models that predict the appearance of a molecule or atom in the spectra purely from first-principles physics. Another way to distinguish between different spectra is therefore of interest. In the following, alternative approaches for the evaluation and processing of GC-IMS spectra are presented. The benefits of using them with neural networks are discussed, and results of classification tests produced with them are presented. One problem is to find a suitable and general evaluation strategy which helps to determine to which category a given measurement belongs. A successful strategy not only has to yield stable classification rates on unseen spectra/measurements; it furthermore needs to be general enough to be applied to new classification problems without cumbersome modifications.

Simple Metrics Evaluation
After inspecting the spectra of the breath measurements (two measurements of two different flavors can be seen in Figure 4), a quite simple and straightforward approach becomes obvious: define a metric on the two-dimensional spectra in order to determine a distance between two spectra. One possible definition of this distance between two arbitrary spectra is … This method needs only one reference measurement $R_i$ for every substance $i$ that needs to be classified. Any uncategorized measurement $M$ would then be evaluated against all references, and the minimum $d_{\min}$ of all distances $d_i = d(M, R_i)$ would be given as the most likely classification result for the unknown measurement. That is, if $d_{\min} = d_k$, then $k$ is the category to which $M$ most likely belongs.
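The nearest-reference classification described above can be sketched as follows. Because the original distance definition is not reproduced in this section, the sketch assumes a simple L1 metric (sum of absolute intensity differences over all grid points of two equally sized spectra); `spectrum_distance` and `classify_by_reference` are hypothetical names.

```python
def spectrum_distance(a, b):
    """Assumed L1 distance: sum of absolute differences over all grid points."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("spectra must have the same shape")
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def classify_by_reference(measurement, references):
    """Return (category, distance) of the reference spectrum closest to the measurement.

    references maps each category i to its single reference spectrum R_i.
    """
    distances = {cat: spectrum_distance(measurement, ref) for cat, ref in references.items()}
    best = min(distances, key=distances.get)
    return best, distances[best]


refs = {"flavor 1": [[0, 1], [2, 3]], "flavor 2": [[5, 5], [5, 5]]}
print(classify_by_reference([[0, 1], [2, 4]], refs))  # ('flavor 1', 1)
```

Any other metric can be swapped into `spectrum_distance`; the classification rule (smallest distance wins) stays the same, which is also why the method fails once the spectra of different categories differ less than repeated measurements of the same category.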
We find that only measurement categories with visible, obvious differences, like the breath samples with different candy flavors in Figure 4, can be separated by this algorithm.
Since this simple, straightforward approach does not work in general, and several attempts to align measurements according to the RIP position failed to yield any improvement, it is necessary to gain more insight into the measuring principle.

Evaluation of RIP Profiles
Given the measuring characteristics explained in Section 2.1, one could use the RIP shape for further analysis and neglect the information given by the points with $t_D > t_{D,\mathrm{RIP}}$. By doing so, the number of data points is reduced from millions to several hundred. Figure 5 shows an exemplary extraction of a RIP profile along the retention time axis: at a fixed drift time (the drift time of the RIP), the intensity values along the retention time axis are extracted and combined into one profile. This process can be automated because the RIP can easily be found in every GC-IMS measurement. Though simplifying, this approach allows the use of artificial neural networks (ANNs) for the evaluation and classification of measured spectra. The use of ANNs even seems imperative: as can be seen from the overlaid spectra in Figure 6, it is not possible to define discrimination levels for the signals that distinguish between different measurements without multi-feature analysis, for which ANNs are well suited if sufficient training data are available.
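A minimal sketch of the automated RIP profile extraction, assuming the spectrum is stored as a list of rows (one row per retention time, one column per drift time) and assuming that the RIP column can be located as the drift time bin with the largest mean intensity over the whole measurement, since the RIP is the dominant constant feature. The function names are hypothetical.

```python
def find_rip_index(spectrum):
    """Drift-time bin of the RIP, assumed to be the column with the largest mean intensity."""
    n_ret, n_drift = len(spectrum), len(spectrum[0])
    column_means = [sum(row[d] for row in spectrum) / n_ret for d in range(n_drift)]
    return max(range(n_drift), key=column_means.__getitem__)


def rip_profile(spectrum):
    """Intensities along the retention-time axis at the fixed RIP drift time."""
    d_rip = find_rip_index(spectrum)
    return [row[d_rip] for row in spectrum]


# Rows: retention time; columns: drift time. The analyte in the middle row
# (column 2) draws intensity from the RIP (column 1).
spectrum = [[1, 10, 0],
            [1, 6, 4],
            [1, 10, 0]]
print(rip_profile(spectrum))  # [10, 6, 10]
```

The analyte shows up in the profile only as the dip from 10 to 6; its own drift time is discarded, which is the data reduction (and the information loss) of this extraction level.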

Evaluation of Drift Profiles
The extraction of the RIP profiles loses much of the information contained in the drift time axis. A similar approach would be to extract a profile along the drift time axis and lose the information contained in the retention time axis. Since there is no exceptional point along the retention time axis comparable to the RIP on the drift time axis, one has to find an extraction formalism that differs from the RIP profile extraction. Two possible ways to obtain a drift profile are given by Equations (3) and (4). Equation (3) is nothing else than a simple IMS measurement, since all information gained during the separation process in the GC column is summed up again as if it had never been separated:

$$P_D(t_D) = \sum_{t_R} S(t_R, t_D) \qquad (3)$$

where $S(t_R, t_D)$ denotes the measured two-dimensional spectrum. Since Equation (4) is a projection and does not integrate all information along the retention time axis for every point in the drift spectrum, this evaluation formalism was used.
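The two drift profile variants can be sketched for a spectrum stored as a list of rows (retention time) by columns (drift time). The summation follows the description of Equation (3) as a plain IMS measurement; the exact form of the projection in Equation (4) is not given here, so the sketch assumes a maximum projection along the retention time axis. Both function names are hypothetical.

```python
def drift_profile_sum(spectrum):
    """Eq. (3)-style profile: sum over all retention times (a plain IMS spectrum)."""
    return [sum(column) for column in zip(*spectrum)]


def drift_profile_max(spectrum):
    """Assumed projection: maximum over all retention times for each drift time."""
    return [max(column) for column in zip(*spectrum)]


spectrum = [[1, 10, 0],
            [1, 6, 4],
            [1, 10, 0]]
print(drift_profile_sum(spectrum))  # [3, 26, 4]
print(drift_profile_max(spectrum))  # [1, 10, 4]
```

In the summed profile the constant RIP accumulates over every retention time and dominates, whereas the projection keeps each drift bin on the scale of a single one-dimensional spectrum, so short analyte peaks are not swamped.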

Virtual Measurements
Depending on the setup, the RIP profiles contain several hundred data points. Combined with the drift profile, which contains up to 3000 data points due to the higher sampling rate, a total data volume of $10^3$ to $10^4$ points per single measurement is obtained. This is still a powerful reduction of the original spectra by several orders of magnitude. Nonetheless, the result is a vector of high dimensionality. Let the dimension of this vector be $n$ and the total number of weights in the network be $N$. It is obvious that $N > n$ must hold, since every input is connected to at least one weight. Furthermore, it is well known that the number of training examples $T$ must be greater than the number of network weights $N$ (a rule of thumb is $T \geq 2N$ [10,11]). This implies that one needs sets of … measured examples to train a network. With a measuring duration of 3 to 10 minutes per sample, this can hardly be accomplished. Therefore, one must consider a way to produce new "virtual" measurements in a parametrized manner that simulates the naturally measured scattering.
Generating new datasets by just superimposing white noise over the existing data sets yields only poor results in the training of networks. In contrast, we find that virtual generation by functional approximation and subsequent parameter variation is more promising. As deduced earlier, the information about detected substances is enclosed in the RIP profile in the form of dips, and every dip represents a different detected substance. Position, amplitude and general form of every dip are considered to be the most decisive and relevant properties of the RIP profile. Hence one can use exactly these features and distort them slightly to generate new "virtual" measurements for the training set of the ANN. Slight variations in the detection time (drift or retention time, respectively) do not cause Gaussian noise on the peaks; instead, they shift the peaks and change their shape. In order to produce such variations, the RIP profile was cut into separate dips, each surrounded by two peaks. These partial profiles were superposed with the distortion profiles; the $i$-th partial profile covers the $i$-th dip between the retention times $t_{\min,i}$ and $t_{\max,i}$ of the surrounding peaks. It is important to understand that the connecting points do not contain important information, since at these points the RIP signal has relaxed back to its zero level. This ensures that only relevant data are varied and that the method is kept free from artefacts. Bézier curves [12,13] were used as distortion functions. A general Bézier curve of order $n$, defined by $n+1$ control points $P_0, \dots, P_n$, is

$$B_n(t) = \sum_{k=0}^{n} \binom{n}{k} (1-t)^{n-k}\, t^{k}\, P_k, \qquad t \in [0, 1].$$

These partial distortion profiles for the individual dips are joined into one distortion profile, which is weighted by a randomly generated parameter.
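The following sketch illustrates the generation of a virtual measurement from a RIP profile under several assumptions that are not taken from the paper: the dip boundaries (indices of the enclosing peaks) are given; each dip gets a one-dimensional Bézier distortion of order 4 whose end control values are zero, so the distortion vanishes at the connecting points; the inner control values are drawn uniformly and scaled by the dip depth and a `strength` factor; and the joined distortion profile is weighted by a uniform random number in [0, 1]. The Bézier curve is evaluated in its Bernstein form, with the curve parameter mapped linearly onto each dip interval. All names are hypothetical.

```python
import random
from math import comb


def bezier_1d(control, s):
    """Bernstein form of a 1-D Bézier curve of order n = len(control) - 1, s in [0, 1]."""
    n = len(control) - 1
    return sum(comb(n, k) * (1 - s) ** (n - k) * s ** k * p for k, p in enumerate(control))


def distortion_profile(profile, boundaries, order=4, strength=0.1, rng=random):
    """Join one Bézier distortion per dip into a single distortion profile.

    boundaries: strictly increasing indices of the peaks enclosing the dips
    (hypothetical output of a peak detector). The end control values are zero,
    so the distortion vanishes at the connecting points.
    """
    if any(b <= a for a, b in zip(boundaries, boundaries[1:])):
        raise ValueError("boundaries must be strictly increasing")
    distortion = [0.0] * len(profile)
    for t_min, t_max in zip(boundaries, boundaries[1:]):
        segment = profile[t_min:t_max + 1]
        depth = max(segment) - min(segment)
        # Random inner control values, scaled by the dip depth (an assumption).
        inner = [rng.uniform(-1.0, 1.0) * strength * depth for _ in range(order - 1)]
        control = [0.0] + inner + [0.0]
        for t in range(t_min, t_max + 1):
            # Map the retention-time index linearly onto the curve parameter s.
            s = (t - t_min) / (t_max - t_min)
            distortion[t] = bezier_1d(control, s)
    return distortion


def virtual_measurement(profile, boundaries, rng=random, **kwargs):
    """Original profile plus the distortion profile, weighted by a random factor."""
    weight = rng.uniform(0.0, 1.0)
    distortion = distortion_profile(profile, boundaries, rng=rng, **kwargs)
    return [p + weight * d for p, d in zip(profile, distortion)]


profile = [10.0, 9.0, 5.0, 9.0, 10.0, 8.0, 4.0, 8.0, 10.0]  # two dips
rng = random.Random(0)
training_set = [virtual_measurement(profile, [0, 4, 8], rng=rng) for _ in range(100)]
```

Unlike superimposed white noise, each distortion is smooth and confined to one dip, so it changes the shape and apparent position of a dip while leaving the baseline at the connecting points untouched.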

The Classifying ANN
The first tests of the classification performance of this data approach showed poor results (as shown in Figure 8). Further analysis of the problem revealed that the datasets used had many categories which the ANN had to discriminate. Modifications of the structure of the ANN brought a major improvement in classification performance. The modified network structure is shown in Figure 9. Instead of presenting the example measurement to one single network and training this network to distinguish all categories from each other, the problem set with $N$ categories was divided: $N$ networks were trained, each to distinguish one category from all other categories. This approach does not reduce the overall ratio of data sets selected for training per network weight. But for every binary network, i.e. the ANN which has to decide whether a given measurement is from category $i$ or not, the ratio of training examples per weight improved, since the network needs fewer weights. An unknown measurement is subsequently presented to all networks. Ideally, only one of the networks produces the output "yes". This network architecture combined with the RIP profile based method yielded classification rates of up to 100% on nearly all presented problem sets. This redesigned architecture (binary decision tree) has another advantage. All datasets were pure measurements of one substance at a time, so every measurement is definitely assignable to one category. Hence only one of the $n$ ANNs can have the output "yes" after the profile vector has been evaluated with all ANNs. The result is a more redundant classification.
To substantiate these conclusions, another test run was made. Figure 10 shows the comparison between
• the standard training approach with an ANN to distinguish all categories from each other (yellow plot);
• the training of a binary decision tree with virtual training data generated as described in Section 4.3 (blue plot);
• the training of a binary decision tree with virtual data generated just by superposing white noise over the real measurements (green plot).
One must keep in mind that in the binary training mode the evaluation files of $n-1$ categories are combined into one category, so that the ANN can classify these $n-1$ categories against one other category $k$. So the share of profiles in category "not $k$" relative to all profiles is about 90%. This means that only classification rates greater than this "mixRate" actually classify something. As can be seen in Figure 10, only the binary decision tree method reaches classification rates above the mix rate and therefore shows the best performance. Since the training sets for the binary decision tree were generated according to Section 4.3, this indicates that the principle of generating virtual measurements is correct.
Yet worth mentioning is the fact that the values shown in this plot are averaged values. Actually, there were training runs in which the ANN in the binary decision tree architecture yielded classification rates of 100%. Though the usage of the RIP profile seems a promising approach and furthermore reduces the data by several orders of magnitude, it skips many useful data, which is only justified as long as the classification performance is high enough. This is the case for many of the measurements but not for all of them. During the evaluation tests with olive oil measurements this method failed to classify, as can be seen in Figure 11. The broad bands of standard deviation indicate that the training process does not converge to a high classification rate in every training attempt. This is another indication of insufficiently discernible training data: the extracted RIP profiles do not carry enough information to train an ANN successfully in the case of olive oil. Since the RIP extraction loses all information along the drift time axis, the next logical step is to add information from projections onto the drift time axis of the spectra (drift profiles). These drift profiles can resolve peaks which are distinct on the drift time axis but have the same retention time and therefore cannot be resolved in a RIP profile. Strictly speaking, substances which are not separated by the GC column cannot be seen as disjoint features on the RIP profile, but they have a chance to be separated during the drift process.
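The decision logic of the split architecture, together with the mixRate baseline discussed above and the chance level of uniform guessing, can be written down in a few lines. This is a sketch that abstracts each binary ANN into its output value; the 0.5 threshold, the rule of reporting an ambiguous result instead of forcing a category, and all function names are assumptions for illustration.

```python
def one_vs_rest_decision(binary_outputs, threshold=0.5):
    """Combine the outputs of the binary networks (one per category).

    binary_outputs maps category i to the output in [0, 1] of the network that
    answers "does this measurement belong to category i?". Exactly one "yes" is
    a valid result; no "yes" or several "yes" answers are reported as None
    (ambiguous) instead of being forced into a category.
    """
    yes = [cat for cat, out in binary_outputs.items() if out >= threshold]
    return yes[0] if len(yes) == 1 else None


def mix_rate(counts, k):
    """Accuracy of always answering "not k" in the binary task for category k."""
    total = sum(counts.values())
    return (total - counts[k]) / total


def chance_rate(n_categories):
    """Expected accuracy of uniform guessing among n categories."""
    return 1.0 / n_categories


print(one_vs_rest_decision({"juice A": 0.9, "juice B": 0.2, "juice C": 0.1}))  # juice A
print(one_vs_rest_decision({"juice A": 0.9, "juice B": 0.7}))                 # None
counts = {f"category {i}": 20 for i in range(10)}
print(mix_rate(counts, "category 0"))  # 0.9
print(round(chance_rate(18), 3))       # 0.056
```

With ten equally sized categories, a binary network that always answers "no" already scores 90%, so only rates above that level show real discrimination; for a single 18-category network, guessing yields about 5.6%.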
Testing different extractions from the two-dimensional spectra reveals that in some cases the RIP extraction is not the ideal method to obtain training vectors, and another extraction technique should be used to obtain less data-intensive training sets.

Conclusions
Different approaches of data reduction/extraction have been developed and evaluated, as well as different structures of ANNs for the classification of data samples. To reach automated evaluation, a peak detection algorithm was developed on the basis of the existing "growing islands" algorithm. This algorithm, though developed to generate virtual measurements from RIP profiles, is generic enough to detect peaks on any continuous profile in which the detected information is enclosed in the formation of peaks and dips. The achieved classification results indicated that data compression via RIP profile extraction is a powerful method. Out of 6 data sets with about 10 categories, the network classified 5 data sets with a rate of over 97%. A variety of measured substances, such as different sorts of juices to be distinguished from each other, several soft drinks, various sorts of oil and meat at several aging stages, have been used to evaluate the aforementioned methods. The discussed drawbacks of the extraction methods with RIP profiles and with drift profiles were observed during the classification tests. The two presented extraction methods showed different results on the available data sets. In the end, for every available classification problem at least one strategy was found that reached high classification rates. The combined classification results of RIP profiles and drift profiles are encouraging, since they lay the groundwork for automated evaluation of measurements and possible monitoring applications.
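The details of the "growing islands" peak detection are not given in this section. As a rough illustration only, the sketch below implements a generic island-growing (watershed-like) dip detector on a one-dimensional profile: depths are measured below an assumed known baseline, samples are visited from deepest to shallowest, and each sample deeper than a threshold either extends an adjacent island or seeds a new one. The baseline, the threshold and the function name are hypothetical, and this is not claimed to be the authors' algorithm.

```python
def grow_islands(profile, baseline, threshold):
    """Detect dips as islands grown outward from their deepest samples.

    depth = baseline - profile. Samples are visited from deepest to shallowest;
    a sample deeper than `threshold` either joins an adjacent island or seeds a
    new one. Returns (start, end, apex) index triples sorted by start.
    """
    depth = [baseline - v for v in profile]
    label = [None] * len(profile)
    islands = []  # each island: [start, end, apex]
    for i in sorted(range(len(profile)), key=lambda j: depth[j], reverse=True):
        if depth[i] <= threshold:
            break  # all remaining samples are even shallower
        neighbors = [label[j] for j in (i - 1, i + 1)
                     if 0 <= j < len(profile) and label[j] is not None]
        if neighbors:
            # Join the neighboring island with the deepest apex; where two islands
            # meet they stay separate (watershed-like split between adjacent dips).
            best = max(neighbors, key=lambda k: depth[islands[k][2]])
            label[i] = best
            islands[best][0] = min(islands[best][0], i)
            islands[best][1] = max(islands[best][1], i)
        else:
            label[i] = len(islands)
            islands.append([i, i, i])
    return sorted(tuple(island) for island in islands)


profile = [10, 10, 4, 10, 10, 7, 6, 7, 10]
print(grow_islands(profile, baseline=10, threshold=1))  # [(2, 2, 2), (5, 7, 6)]
```

Because the detector only assumes that information sits in peaks or dips relative to a baseline, the same routine applies to RIP profiles (dips) and, with the sign of the depth flipped, to drift profiles (peaks).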
An overview of the classification results reached with the different evaluation strategies is given in Table 2. One disadvantage was already mentioned in the introduction. Though the RIP profile contains information about peaks measured in the GC-IMS spectrum, it does not show double peaks appearing at the same retention time, which arise from monomer and dimer formation. Another drawback is the loss of information on the drift time of the peaks. This can lead to RIP profiles without any discriminating information, which is why in some cases the drift profiles achieve better classification results. Since the principle is the same (information appears in the form of peaks), the profiles are interchangeable.
Further Improvements: Although this paper shows great potential for the evaluation of GC-IMS measurements with ANNs, there are still possible improvements to be investigated. Other data reduction methods should be considered. Furthermore, tests with bigger data sets should be carried out to investigate the convergence and classification behavior of the ANNs and the corresponding data extraction strategies. Only statistically more significant tests can determine the stability and usability of this method.

Figure 1. Schematic draft of the measurement technique and the gas flow inside a GC-IMS detector.

Figure 2. Schematic picture of the IMS and its parts.

Figure 3. GC-IMS spectrum shown in perspective view with the duality of RIP and peak. This is an artificial illustration intended to visualize basic principles of GC-IMS measurements. It is therefore exaggerated and not completely consistent, since the smaller peaks should also diminish the RIP intensity. The properties of this spectrum are discussed in the text.

Figure 4. Two different candy flavors taken from the breath measurements. The spectra show obvious differences. Left: candy flavor 1; right: candy flavor 2 (see Table 2 for the complete list of measurements).

Figure 5. Extraction of a RIP-Profile out of a measured GC-IMS spectrum.

Figure 6. Averaged RIP spectra of 5 different substances in the same measurement class with the corresponding standard deviation.

In Figure 7 a RIP profile and the distortion functions in different distortion strengths can be seen. The final "virtual" measurement is simply the sum of the measured original profile and the weighted distortion profile superposed on it.

Figure 7. Extracted RIP profile and overlay distortions for the generation of "virtual" measurements.

Figure 8. Classification performance of the standard ANN approach on the basis of RIP profiles. Results are poor: a rate of about 5%-10% is just as good as guessing when trying to classify between 18 possible categories.

Figure 9. Split ANN architecture to find a redundant decision.

Figure 12 shows the comparison of the average of 10 training runs with
• RIP profiles
• drift profiles
• RIP + drift profiles
as training sets. As seen before, training with RIP profiles only shows insufficient classification performance for the olive oil measurements. The training with combined RIP and drift profiles shows better classification on average but is very unstable. This is very probably due to the higher dimensionality of the training vector, which reduces the probability of reaching a global minimum of the error function of the ANN. In the case of olive oil, we find that the training with drift profiles shows the best classification results. Besides the good classification performance, the very narrow standard deviation band indicates that this training approach yields very stable results and reaches good classification results in every training run.

Figure 10. Comparison between different data generation modes and different ANN structures. Green, red and blue curves are plotted on the left axis; the yellow curve is plotted on the right axis.

Figure 11. Classification performance of different training and data generation approaches on the basis of the RIP profiles.

Figure 12. Classification performance of different training sets.

Table 2. Classification rates for all measured data with different training profiles used to train the ANN. The classification rates given in this table are maximum rates that were reached during several training attempts.
This work has been supported by the German BMBF, contract 01IS09046A.
In some cases the discrete data of an MCC-IMS measurement is regarded as continuous, which is justified by the high resolution due to the high sampling rate in the available measurements.
Complete charge accumulated at one certain retention time: obtained by summing up the charge at all drift times.
$B_n(t)$: Bézier curve of order $n$, defined by $n + 1$ control points.
Copyright © 2012 SciRes. OJAppS