Performance Comparison among Trained Judges and Panels for the Evaluation of “Cuajada” Type Fresh Cheese in Two Regions from Oaxaca in México

Four “cuajada” type fresh cheeses were evaluated by two trained panels located in different regions of Oaxaca, Mexico: Instituto Tecnológico de Comitancillo (ITC) and Universidad del Mar (UMAR). Each panel comprised six judges. Analysis of variance (ANOVA), in conjunction with principal component analysis (PCA) and the Rv coefficient, was used to identify similarities and consensus among trained judges and panels. The ANOVA results revealed that the judges from ITC performed significantly better (P < 0.05) in discrimination and repeatability, while both panels showed a similar effect in discrimination. The PCA revealed some similarities in the position of the cheeses in the sensory space, while the Rv coefficient showed consensus among judges and between panels. In conclusion, the statistical analyses showed that both panels were discriminating and that the cheeses were positioned similarly in the sensory space.


Introduction
In the Mexican state of Oaxaca, a type of fresh cheese typically known as "cuajada" is produced in several municipalities of the Isthmus of Tehuantepec region and is commonly sold in local markets, so it has been part of the diet of consumers from this region for decades. This type of cheese is manufactured by a handcraft process [1] without quality control, which can result in differences in manufacturing practices among regional producers. In addition, this product lacks a sensory description. The Quantitative Descriptive Analysis (QDA ®) [2,3] is the method commonly used for the sensory characterization of foods [4]. Sensory attributes play an important role in the selection of a product by consumers [5], and they are also used to construct sensory profiles, in which the attributes are selected and quantified by trained judges [6]. The performance of these judges can be assessed from different perspectives: 1) the ability to discriminate the sensory characteristics of the products; 2) the congruence and repeatability of the responses; 3) the use of the lexicon; and 4) the application of the procedure [7]. Another issue for producers of "cuajada" type fresh cheese involves the territorial expansion of their product, because the possible sensory changes of the product from its place of origin to other places are unknown. In recent years, interlaboratory or trained inter-panels studies have revealed differences in the sensory characterization of the same product in different locations, as well as variations in the sensory profile from one panel to another within or outside the same culture. This has led to the standardization of sensory profiles and the measurement of the level of performance of judges and panels from different sensory laboratories [4,8,9]; these kinds of studies have also shown that data obtained by different panels are reliable and repeatable [10]. Nowadays, products like cheeses [11][12][13], beer [14], walnuts [4], chocolate [8], dry sausage [15] and jellies [9,16] have been successfully evaluated at a trained inter-panels level. In these types of studies, the validation of the performance of judges and the comparison of sensory profiles provided by different trained panels can be carried out with univariate statistical methods such as Analysis of Variance (ANOVA) [17,18] and with multivariate statistical methods such as Principal Component Analysis (PCA) [19][20][21], Generalized Procrustes Analysis (GPA) [22], the Structuration des Tableaux à Trois Indices de la Statistique (STATIS) [23,24] and Multiple Factorial Analysis (MFA) [8,25,26]; with this last method, the position of the products and attributes used by each panel can be displayed in the sensory map [27]. The objective of this study is to evaluate the performance of trained judges and panels, and to correlate and compare the sensory profiles of "cuajada" type fresh cheese obtained in two different regions of the state of Oaxaca.

Geographical Distribution of "Cuajada" Type Fresh Cheese-Producing Areas
The four "cuajada" type fresh cheeses evaluated in this study were manufactured by cheese producers from four municipalities of the Isthmus of Tehuantepec in Oaxaca, Mexico. The first cheese was manufactured in the area of Santo Domingo Ingenio (94˚46' west longitude, 16˚35' north latitude, 40 meters above sea level). The second cheese was manufactured in the area of Juchitán de Zaragoza (95˚01' west longitude, 16˚26' north latitude, 30 meters above sea level). The third cheese was manufactured in the area of Asunción Ixtaltepec (95˚03' west longitude, 16˚30' north latitude, 30 meters above sea level). The last cheese was made in the area of San Pedro Comitancillo (95˚09' west longitude, 16˚29' north latitude, 70 meters above sea level).

Product Experimental Conditions
The sampled cheeses were vacuum packed in portions of 1 kg using a Multivac ® machine, model C350 (Multivac Company of México). The products were then transported in refrigerated containers at 4˚C ± 1˚C for sensory analysis by two panels located in different geographical regions of Oaxaca, Mexico (Comitancillo and Puerto Ángel). The cheeses were labeled as follows: cheese A: Santo Domingo Ingenio; cheese B: Juchitán de Zaragoza; cheese C: Asunción Ixtaltepec; cheese D: San Pedro Comitancillo. Prior to the sensory study, the cheeses were kept at 25˚C for 1 hour; they were then cut into cubes of 3.5 × 3.5 cm and served for evaluation by the judges of each panel [11,28-30]. The sensory characteristics evaluated by the panels [2,3] were white color, granular texture to the touch, soft to the touch, smell of rennet, salty, lumpy in the mouth, soft in the mouth and aroma to serum, each scored on a continuous scale ranging from 0 (null intensity) to 9 (strong intensity) [25].

Trained Panels
Each training session lasted about 45 to 60 minutes. Samples of cheese were served to the judges of both panels in a simultaneous multiple way [31].

Performance Evaluation among Judges
The performance of the judges in each panel was evaluated by a one-way (product) analysis of variance (ANOVA), using the Fisher (F) test as an index of discriminant ability (F product) and the mean square error (Mse) as an index of repeatability among sessions. High values of F product and Mse values that are low or near zero mean that a judge is discriminant and repeatable, respectively [17,18,20,32].
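The two indices described above can be illustrated with a minimal sketch. It assumes a balanced design for a single judge and a single attribute; the function name `judge_performance`, the number of sessions and the scores are hypothetical and are not taken from the study.

```python
import numpy as np


def judge_performance(scores):
    """One-way (product) ANOVA for one judge and one attribute.

    scores: 2-D array, one row per product and one column per session
            (a balanced design is assumed).
    Returns (F_product, Mse).
    """
    scores = np.asarray(scores, dtype=float)
    n_products, n_sessions = scores.shape

    grand_mean = scores.mean()
    product_means = scores.mean(axis=1)

    # Between-product variation: distance of each product mean from the grand mean.
    ss_product = n_sessions * np.sum((product_means - grand_mean) ** 2)
    # Within-product variation: disagreement between sessions for the same product.
    ss_error = np.sum((scores - product_means[:, None]) ** 2)

    ms_product = ss_product / (n_products - 1)
    mse = ss_error / (n_products * (n_sessions - 1))
    return ms_product / mse, mse


# Hypothetical scores on the 0-9 scale: 4 cheeses (rows) x 3 sessions (columns).
example_scores = [
    [7, 8, 7],
    [3, 2, 3],
    [5, 5, 6],
    [8, 8, 9],
]
f_product, mse = judge_performance(example_scores)
print(f"F product = {f_product:.2f}, Mse = {mse:.2f}")
```

In this illustration, a judge who separates the cheeses clearly and scores them consistently across sessions obtains a large F product and a small Mse.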

Performance Evaluation by Trained Panel
In order to check the performance of each panel, the following mixed three-way ANOVA model with interaction (Product × Judge) was used: Attribute = Product + Judge + Session + Interaction (Product × Judge) + Error, where judge, session and interaction were considered random effects and product a fixed effect [12,15,17,33]. The Fisher (F) test was used, with the product factor (F product) as an index of discriminatory power. Consensus in the use of the scale among judges was assessed by the judge factor (F judges); the session factor (F session) was used to measure differences in the results among sessions, and the interaction factor (F interaction) was used to determine similarities or differences in the classification of the cheeses over the scale, with a level of significance of α = 0.05 [34].

Comparison of Sensory Characterization of Cheeses at Trained Inter-Panels Level
In order to investigate whether the two independent panels characterized the products in the same way, the data of both groups were analyzed using the following mixed ANOVA model developed by [6,8,35]: Attribute = Product + Panel + Judge + Session + Interaction (Product × Panel) + Error, where product and panel were considered fixed effects, and judge, session and interaction random effects. The product factor (F product) was considered an index of discriminant power; the panel factor (F panel) an index of the differences between trained panels; the judge factor (F judge) an index of the differences among judges within the same panel; the session factor (F session) an index of the differences in the scores among sessions; and the interaction factor (F interaction) an index of the differences in the classification of the cheeses over the scale for each panel, with a level of significance of α = 0.05.
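For reference, the two verbal models can be written in conventional notation. This is only a sketch: the symbols, the subscripts and the nesting of judges within panels in the second model are assumptions made here for illustration and are not stated in the original text.

```latex
% Per-panel model: product P_i fixed; judge J_j, session S_k and
% the Product x Judge interaction (PJ)_{ij} random.
Y_{ijk} = \mu + P_i + J_j + S_k + (PJ)_{ij} + \varepsilon_{ijk}

% Inter-panel model: product P_i and panel L_p fixed; judge J_{j(p)}
% (assumed nested within panel), session S_k and the Product x Panel
% interaction (PL)_{ip} random.
Y_{ipjk} = \mu + P_i + L_p + J_{j(p)} + S_k + (PL)_{ip} + \varepsilon_{ipjk}
```

Here Y denotes the score given to one attribute, μ the overall mean and ε the residual error.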

Bidimensional Aspects
The sensory space for each panel was built using Principal Component Analysis (PCA) [15,21]. Multiple Factorial Analysis (MFA) [25][26][27] and the Rv coefficient [10,23,36] were applied to visualize the consensus among trained judges and between panels, with Rv values above 0.67 considered acceptable and consensual [37,38]. Bidimensional data processing was carried out with the XLSTAT ® software, version 2009 (Addinsoft, New York, NY, USA). Unidimensional data analysis was performed using the Statgraphics Plus ® 5.2 software (Statistical Graphics Corp, USA).
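As an illustration of the two multivariate quantities mentioned above, the following sketch computes the share of variance carried by each principal component and the Rv coefficient between two product × attribute tables. It is not the XLSTAT procedure used in the study: the function names, the simulated scores and the choice to center but not rescale the attributes are all assumptions.

```python
import numpy as np


def explained_variance(table):
    """Share of variance carried by each principal component of a
    products x attributes table (columns centered, not rescaled)."""
    centered = np.asarray(table, dtype=float)
    centered = centered - centered.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return singular_values ** 2 / np.sum(singular_values ** 2)


def rv_coefficient(table_a, table_b):
    """Rv coefficient between two tables describing the same products (rows)."""
    a = np.asarray(table_a, dtype=float)
    b = np.asarray(table_b, dtype=float)
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    # Product-by-product cross-product matrices (one configuration per table).
    s_a = a @ a.T
    s_b = b @ b.T
    return np.trace(s_a @ s_b) / np.sqrt(np.trace(s_a @ s_a) * np.trace(s_b @ s_b))


# Simulated mean scores: 4 cheeses (rows) x 8 attributes (columns) per panel.
rng = np.random.default_rng(0)
panel_a = rng.uniform(0, 9, size=(4, 8))
panel_b = panel_a + rng.normal(0, 0.5, size=(4, 8))  # a similar, noisier panel

ratios = explained_variance(panel_a)
print("Variance on the first two axes:", ratios[:2].sum())
print("Rv between panels:", rv_coefficient(panel_a, panel_b))
```

An Rv value close to 1 indicates that the two tables place the products in nearly the same configuration; the 0.67 threshold cited in the text would be applied to this value.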

Evaluation of the Performance of Each Judge
One-way ANOVA results for the evaluation of the performance of the judges from the ITC panel (Figure 1) revealed that judges 1, 2, 3 and 4 showed high values of discrimination (F) at a significant level (P < 0.01) for all attributes, while judge 5 was highly discriminative (P < 0.01) for white color, smell of rennet, granular texture to the touch, soft to the touch, aroma to serum, soft in the mouth and lumpy in the mouth. In contrast, judge 6 showed a lower index of discrimination (P < 0.05) for smell of rennet, granular texture to the touch and soft in the mouth. Regarding repeatability (Mse), judge 3 was less repeatable for the soft in the mouth attribute; judge 4 for granular texture to the touch; judge 5 for salty; and judge 6 for salty, granular texture to the touch and soft to the touch.
The ANOVA results for the evaluation of the performance of the judges from the UMAR panel (Figure 2) revealed that judge 1 showed high values of discrimination (F) at a significant level (P < 0.01) for 7 of the 8 sensory attributes, while judge 2 was highly discriminative (P < 0.01) for all attributes except granular texture to the touch and soft to the touch; judge 3 was highly discriminative (P < 0.01) for white color, smell of rennet, salty, granular texture to the touch and aroma to serum; judge 4 was highly discriminative (P < 0.01) for white color, smell of rennet, salty, granular texture to the touch, soft to the touch, aroma to serum and soft in the mouth. Judge 5 was less discriminative for the granular texture to the touch, soft to the touch and lumpy in the mouth attributes, while judge 6 was highly discriminative (P < 0.01) for white color, smell of rennet, granular texture to the touch, aroma to serum and soft in the mouth. Regarding the repeatability of the UMAR panel, the Mse showed that judge 1 was less repeatable for the smell of rennet, granular texture to the touch, soft to the touch and lumpy in the mouth attributes, while judge 2 was less consistent among sessions for the smell of rennet, granular texture to the touch, soft to the touch and soft in the mouth attributes. Judge 3 was less repeatable for the soft to the touch, soft in the mouth and lumpy in the mouth attributes. Judge 4 was less repeatable for the smell of rennet, granular texture to the touch, soft to the touch, soft in the mouth and lumpy in the mouth attributes. Judge 5 was repeatable for the white color and soft in the mouth attributes, while judge 6 was repeatable only for the aroma to serum attribute. According to [39], subjects who are better able to remember smells are more discriminating, and judges who have a greater ability to focus are more consistent. On the other hand, most of the judges in both panels showed good repeatability, given the low values of Mse obtained [18] for most attributes. However, for the mechanical type attributes (granular texture to the touch, soft to the touch and soft in the mouth), the Mse values of the judges of both panels were high (low repeatability). This behavior may be explained by confusion among the descriptors and by differences in the sensory perception of the cheeses evaluated in two different places [32].

Evaluation of the Performance of Each Trained Panel
Table 1 shows the results of the three-way ANOVA with interaction (Product × Judge) for the performance evaluation of the ITC panel. The product factor showed that the panel was highly discriminative (P < 0.01) for all attributes; the judge factor revealed disagreement in the use of the scale only for the white color attribute, which indicates that the panel was consensual for the other seven sensory attributes. The session factor showed that the panel was repeatable (P > 0.05), which means that the scale was used in the same way in all sessions for the evaluation of the attributes, except soft in the mouth (P < 0.05); the Product × Judge interaction showed significant differences (P < 0.05) in the classification of the cheeses on the scale for all attributes except salty. The results of the three-way ANOVA with interaction (Product × Judge) for the performance evaluation of the UMAR panel are shown in Table 2; this group was highly discriminating (P < 0.01) for most of the attributes, except aroma to serum. The judge factor revealed differences in the use of the scale for the white color, soft to the touch, salty, lumpy in the mouth and soft in the mouth attributes; this effect might have been due to inter-individual variation arising from differences in the use of the intensity scale by the subjects [12] and from differences in the training time of the panelists [4,28]. The session factor revealed that the panel was repeatable for all attributes except salty, while the Product × Judge interaction showed significant differences (P < 0.05) in the classification of the cheeses for the smell of rennet and aroma to serum attributes. According to [15], this effect may arise because the judges classified the products on the scale in different ways.
In general, the discrimination values (F products) obtained in both panels were in most cases higher than those reported by [32], who used different panels for the evaluation of smell attributes, serving the samples in a sequential monadic way (product by product) and obtaining values of F products in the range of 0.34 to 21.10. In the present research, the samples were served in a simultaneous multiple way (attribute by attribute), which allows better comparative discrimination among the products and yields high levels of discrimination and repeatability in the panel [31]. The results of the judge factor revealed some discrepancies in the use of the intensity scale, which are explained by inter-individual differences [12]. In the case of the session factor, the results showed that the data from both panels were repeatable for the majority of attributes; this disagrees with the results presented by [4], who evaluated 16 sensory characteristics of products made from walnut with panels located in Italy, Spain and France and found a significant effect (P < 0.05) on 6, 4 and 3 attributes, respectively. Regarding the interaction factor (Product × Judge), [15] reported the same effect as the present research when they evaluated ten sensory attributes of French dry sausages with two panels (one trained through the internet and the other trained in a laboratory); their results showed that the internet-trained panel presented a significant effect (P < 0.05) of the interaction factor on 6 attributes, while the laboratory-trained panel presented a significant effect (P < 0.05) on 3 sensory attributes.
The results of the present research showed that both panels were highly discriminative [35] despite the differences in training time.

Comparison of the Sensory Characterization of Cheeses at Trained Inter-Panels Level
The four-factor ANOVA results with interaction (Product × Panel) for the evaluation of the sensory characterization of the cheeses are shown in Table 3, where the product factor showed that both panels were highly discriminative (P < 0.01) for all sensory attributes except soft to the touch. [11] found highly significant differences (P < 0.01) for the hardness, cream smell, salty and creamy aroma attributes of cheddar cheese evaluated by trained panels in Scotland and Norway; [13] reported that the smell, aroma and texture attributes had a highly significant effect (P < 0.01) in the evaluation of Roncal type cheese. For the panel factor, the results revealed that the two panels used different parts of the scale for the evaluation of the cheeses, except for the soft to the touch and lumpy in the mouth attributes, for which there were no significant differences (P > 0.05). These results were similar to those obtained by [14] and [4], who found significant differences (P < 0.05) in the use of the scale on 36 of 39 sensory attributes and on 14 of 16 sensory attributes for the evaluation of beer and nut products, respectively, between two sensory analysis laboratories. For the judge factor, the results showed no significant differences (P > 0.05) for any attribute, which means that there were no differences among the judges (within the same panel) in the classification of the cheeses [6,13]. The results of the session factor revealed no significant differences (P > 0.05) for any attribute, which means that the panels gave similar scores among sessions; this effect was also observed by [8], who concluded that the session factor is rarely significant (P > 0.05). The results of the interaction (Product × Panel) showed no significant differences (P > 0.05) between the two panels in the classification of the cheeses for the salty (Figure 3(e)) and lumpy in the mouth (Figure 3(f)) attributes. However, significant differences (P < 0.05) in the classification of the cheeses were found for attributes such as white color (Figure 3(a)), granular texture to the touch (Figure 3 [...] by [35], in 9 of 16 sensory attributes for the evaluation of biscuits by two trained panels of different nationalities. The differences in the classification of the cheeses may be due to factors such as training time [15,35], although [8] mention that the significant effect of the interaction (Product × Panel) may be due to changes in the tasting conditions (serving type, differences in the preparation of samples, etc.), where the temperature of the samples has an impact on sensory perception [29]. Another reason might be differences in the use of reference products for the training of the two panels [12], in conjunction with differences in the performance of the panels, where one group was more discriminative than the other [6], as in the case of the ITC panel, which performed significantly (P < 0.05) better than the UMAR panel in discrimination and in the use of the intensity scale. This result may be due to familiarity with the analyzed product, since the judges of the ITC panel belong to a farming area where the studied cheeses are manufactured and form part of the diet of people from the Isthmus of Tehuantepec region [1], while the judges of the UMAR panel live in a fishing area and were probably more influenced by seafood. Familiarity may also influence the retention of sensory attributes in memory, which in turn may be associated with cultural issues; consequently, some people may express a sensation with one word, while others from other places may associate it with several words, which represents a problem for the translation and evaluation of sensory attributes from one place to another [40].

Space and Evaluation Trained Inter-Panels
The differences and similarities shown by the three-way ANOVA with interaction (Product × Judge) are reflected in Figures 4(a)-(b), which show the PCA for the ITC and UMAR panels and reveal similarities in the construction of the principal axes, which explained 83.99% and 95.51% of the variance, respectively [11]. These results are similar to those reported by [29], where cheddar cheese was evaluated by two trained panels with variance values of 86% and 93%, and are higher than the 37.11% reported by [41] for the evaluation of dry-cured ham by two panels from different countries (France and Spain). The ITC panel (Figure 4(a)) grouped cheeses A and D and opposed them to cheeses B and C; this classification was similar to that made by the UMAR panel. However, in the sensory characterization made by the ITC panel, cheeses A and D were perceived with greater intensity for attributes such as soft in the mouth, lumpy in the mouth and smell of rennet, while cheeses B and C were perceived with greater intensity for attributes such as granular texture to the touch, soft to the touch, white color, aroma to serum and salty. In the case of the UMAR panel (Figure 4(b)), cheeses A and D were characterized by white color, granular texture to the touch, lumpy in the mouth and smell of rennet, while cheeses B and C were perceived as soft to the touch, soft in the mouth, salty and with aroma to serum. The value obtained for the trained panels, Rv = 0.85, indicated similarities in the positioning of the cheeses in the sensory space built by both panels. This value was higher than those reported by [10], who obtained Rv values from 0.39 to 0.57 for the evaluation of trained panels, but it was similar to that obtained by [42] (Rv = 0.87), who characterized yogurts with trained panels in France and Vietnam.
The variation explained by the first two principal components of the MFA was 88.40% (Figure 5). This result is similar to that obtained by [8], who characterized different chocolate products with several trained panels and obtained a value of 84.54% for the first two components of the MFA. Figure 5 also shows that the distances between the trained panels are similar for cheeses B, C and D, in contrast to cheese A, for which greater differences between the panels were found. These differences are also visible in Figure 3 for the interaction (Product × Panel), where there are major discrepancies in the classification of cheese A for the majority of sensory attributes.

Consensus among Judges within Each Panel
The Rv coefficient applied to the analysis of the judges (Table 4) gave values from 0.80 to 0.98 for the ITC panel and from 0.86 to 0.98 for the UMAR panel; these values reflect a consensus among the subjects of both panels [37,38]. These Rv values are higher than those reported by [10], who evaluated the performance of judges from different panels and reported Rv values from 0.39 to 0.57. The first two principal axes of the MFA accounted for 81.34% of the variation (Figure 6); this value was higher than that reported by [20] (66% on the two principal axes for the evaluation of the consensus among wine judges). Figure 6 shows that the judges from both panels are very close to each other (indicating a consensus among them), with the exception of judge 4 from ITC and judge 6 from UMAR, who were located slightly away from the group and obtained the lowest Rv values, 0.80 and 0.86 respectively. This effect might contribute to some of the discrepancies between the panels observed in the results of the four-factor ANOVA with interaction (Product × Panel).
Copyright © 2011 SciRes. FNS

Conclusions and Recommendations
The univariate and multivariate statistical methods used in this research, in conjunction with the selection and training level of each sensory panel, revealed the differences in the discriminatory capacity of judges and panels. The differences in the sensory characterization of "cuajada" type fresh cheese and the impact on the performance of each panel were explained by the vocabulary applied, differences in training and familiarity with the product. Some sensory attributes (white color, smell of rennet, salty, lumpy in the mouth and soft in the mouth) were common to both panels, resulting in high values of discrimination, while other attributes (granular texture to the touch, soft to the touch and aroma to serum) caused some differences in interpretation, giving low values of discrimination and repeatability between the panels from one place to another and contributing to the heterogeneity of the results. Differences among the judges of the same panel were found, which led to discrepancies between the panels. For this reason, it is important to evaluate the performance of each judge and not only the performance of the panel. Despite all this, the PCA showed some similarities in the positioning of the cheeses and attributes, while the Rv coefficient revealed strong agreement among panels and judges. Finally, the authors recommend applying simple, economical and fast free-vocabulary profiling methodologies, such as the flash profile, to search for and understand sensory concepts with respect to the origin of the product prior to the judges' training, in order to achieve a realistic quantification of the typical attributes.

Figure 6. Sensory space of consensus among the judges of panels from Instituto Tecnológico de Comitancillo (ITC) and Universidad del Mar (UMAR).