“AniFair”: A GUI Based Software Tool for Multi-Criteria Decision Analysis—An Example of Assessing Animal Welfare

Multi-criteria decision analysis deals with decision problems in which multiple criteria need to be considered. The criteria may be measured on different scales, which makes them difficult to compare. One approach that helps the user to structure the problem and to reflect on his or her assessment is the Measuring Attractiveness by a Categorical Based Evaluation TecHnique (MACBETH). Here the user provides qualitative judgments about differences of attractiveness between pairs of options. MACBETH was implemented in the M-MACBETH software, which uses an additive aggregation model. The present article introduces the software tool “AniFair”, which combines the MACBETH approach with the Choquet integral as aggregation function, because the Choquet integral can model interaction between criteria. With the Choquet integral, the user can define constraints on the relative importance of criteria (Shapley value) and on the interaction between criteria. In contrast to M-MACBETH, every instance of “AniFair” offers the user at least two aggregation levels. “AniFair” provides graphical user interfaces for entering information. The software tool is introduced via an example from the Welfare Quality Assessment protocol for pigs: “AniFair” is applied to real data that were collected on thirteen farms in Northern Germany by an animal welfare expert. The “AniFair” results divided the farms into five groups of comparable performance with respect to the welfare principle “Good feeding”. The results differed in how much the interaction between criteria contributed to the Choquet integral values; these shares varied from 5% to 55%. The analysis also highlighted how sensitive aggregation results are to the relative importance of and the interaction between criteria, since defining constraints was shown to change the ranking.
All results were exported to human-readable txt or csv files for further analyses, and advice could be given to the farmers on how to improve the welfare situation on their farms.


Introduction
Multi-criteria Decision Analysis (MCDA) is a general term for concepts that aim at supporting the user in dealing with decision problems involving multiple criteria [1]. Often these criteria include qualitative instead of quantitative state descriptions, and comparability among criteria is commonly difficult, as measurement scales do not necessarily coincide. MCDA methods help to structure and solve such decision problems, i.e. to find so-called nondominated solutions.
A solution is nondominated if it cannot be improved with respect to some criterion without becoming worse with respect to at least one other criterion. As the set of nondominated solutions can be very large, MCDA methods are needed to help the user reflect on the kind of decision he or she is about to make and to find a solution that mirrors his or her preferences. MCDA has been an active research area since the 1960s, and its methods have been used in various scientific and applied fields such as operational research [2], transportation systems [3], neuropsychology [4], and renewable and sustainable energies [5]. Surveys of MCDA methods can be found, for example, in Zavadskas and Turskis [6] and in Liou and Tzeng [7]. This article deals with the MACBETH (Measuring Attractiveness by a Categorical Based Evaluation TecHnique) approach [8], which was developed in 1992 by Carlos Bana e Costa [9]. The MACBETH approach requires only qualitative judgments about the difference of attractiveness (DoA) between pairs of options. Therefore, when multiple criteria are involved that are measured on incomparable scales or evaluated qualitatively, the decision problem is reduced to a straightforward question-and-answer protocol [10]. This interactive method was implemented in the M-MACBETH software [11], which generates comparable numerical scales for the criteria based on these user preferences. To reach an overall decision that includes all criteria, the M-MACBETH software uses the additive value aggregation model.
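To make the notion of nondominance concrete, here is a minimal Python sketch. It assumes that every option is already described by numeric criterion values where a higher value is better, which is exactly the comparability that MACBETH is meant to establish. The helper names `dominates` and `nondominated` are illustrative and not part of "AniFair" or M-MACBETH.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if option a is at least as good as b on every criterion and
    strictly better on at least one (assumption: higher value = better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def nondominated(options: List[Sequence[float]]) -> List[int]:
    """Indices of the options that no other option dominates."""
    return [
        i
        for i, a in enumerate(options)
        if not any(dominates(b, a) for j, b in enumerate(options) if j != i)
    ]


if __name__ == "__main__":
    # Hypothetical options scored on two criteria.
    print(nondominated([(3, 1), (1, 3), (1, 1)]))  # [0, 1]
```

Both (3, 1) and (1, 3) are nondominated, so the filter alone cannot choose between them; that remaining choice is where the user's preferences come in.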
In this article the software tool "AniFair" is presented. Like the M-MACBETH software, "AniFair" is a software for multi-criteria decision analysis that was implemented on the mathematical foundations of the MACBETH method. However, "AniFair" combines the MACBETH approach for calculating comparable scales with the Choquet integral as aggregation function instead of an additive model. Additive aggregation (the weighted arithmetic mean) implicitly assumes that the criteria are mutually preferentially independent. In practice this condition often does not hold, because interaction between criteria is to be expected. The Choquet integral was introduced by Murofushi and Sugeno [12] and constitutes a natural extension of the weighted arithmetic mean.

Material and Methods
The software tool "AniFair" is described in detail in Section 2.5 and was applied to a real-life example concerning the evaluation of animal welfare. "AniFair" was used with on-farm data for the category 'Sows and piglets' from the 'Welfare Quality Assessment protocol for pigs' (Section 2.1).

Welfare Quality Assessment Protocol for Pigs
Studies [31] [32] showed that the welfare of farm animals is a concern of growing importance for consumers of animal-related products, especially food.
This raised the question of how the welfare status of an animal could be described scientifically and assessed reliably. The Welfare Quality® project started in 2004 and combined the analysis of consumers' points of view with the knowledge of experts from animal welfare science. Twelve criteria were identified that a system for assessing animal welfare should account for. These criteria were partitioned into the four welfare principles 'Good feeding', 'Good housing', 'Good health', and 'Appropriate behavior'. Separate Welfare Quality® Assessment protocols (WQAP) for different species were published [27] [28] [29], in which the assessment of welfare status based on these welfare principles is described in detail. Animal welfare is a multidimensional concept that relies on multiple indicators to assess the aforementioned welfare criteria. All collected information must therefore be aggregated stepwise: criteria scores are calculated from the indicators and afterwards combined into principle scores. From the principle scores, an overall evaluation is obtained that distinguishes between the welfare standards of farms.

Data from 'Sows and Piglets on Farm Level'
The 'Welfare Quality® Assessment protocol for sows and piglets' describes measures concerning the sows, the piglets, or both. To explain the handling and functionality of "AniFair", the principle 'Good feeding' was used.
The remaining principles 'Good housing', 'Good health', and 'Appropriate behavior' were added to present the 'Multiple instances'-version of "AniFair" (Section 2.5.4), but are not discussed individually in detail.

'Good Feeding' in 'Sows and Piglets'
The animal welfare principle 'Good feeding' in 'Sows and piglets' consists of the criteria 'Absence of prolonged hunger' and 'Absence of prolonged thirst'. These criteria were evaluated using the measures 'Body condition score' (BCS), 'Age of weaning', and 'Water supply'.

Body condition score, as a measure of 'Absence of prolonged hunger'. The BCS measures the energy reserves of an animal. According to WQAP, it was scored for the sows on a three-point scale, and a score was given to every sow. Sows were scored '0' when their BCS was within a healthy range, i.e. firm pressure was needed to feel the hip bones and the backbone. Sows were scored '1' when they appeared obese or when the hip bones and backbone could easily be felt. The score '2' was given when the sows had prominent hip bones or backbone and a very thin visual appearance. For every farm, the percentages of sows with BCS '0', '1', and '2' were calculated.

Age of weaning, as a measure of 'Absence of prolonged hunger'. The age of weaning is a measure concerning the piglets. Legal specifications state that piglets need to be suckled by the sow for at least 28 days. The farm score was the average number of days from birth to weaning.

Water supply, as a measure of 'Absence of prolonged thirst'. The drinking places for sows and piglets were scored on a two-point scale. One score was given for the whole farm, taking into account the cleanliness and functionality of all drinkers. The score '0' was given when all drinkers were clean and fully functional; otherwise the score '2' was given.
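The per-farm BCS percentages described above can be computed as in the following Python sketch. It assumes that the individual sow scores of one visit are available as a list of the integers 0, 1, and 2; the function name is hypothetical. Because the three percentages always sum to 100, the share of sows with BCS '0' is determined by the other two.

```python
from collections import Counter
from typing import Dict, Iterable


def bcs_percentages(scores: Iterable[int]) -> Dict[int, float]:
    """Percentage of sows with BCS 0, 1 and 2 for one farm visit."""
    scores = list(scores)
    if not scores:
        raise ValueError("no sows scored")
    counts = Counter(scores)
    unknown = set(counts) - {0, 1, 2}
    if unknown:
        raise ValueError(f"invalid BCS values: {sorted(unknown)}")
    n = len(scores)
    return {level: 100.0 * counts[level] / n for level in (0, 1, 2)}


if __name__ == "__main__":
    # Hypothetical visit with 30 scored sows.
    print(bcs_percentages([0] * 24 + [1] * 5 + [2] * 1))
```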

Data Collection
Data was collected on thirteen farms in Schleswig-Holstein in Northern Germany. The farms held 40 to 5000 sows (mean 663.1 ± 1331.9). An observer trained with regard to WQAP visited the farms repeatedly and scored 30 sows per visit according to WQAP. For this example the data from the first visit on every farm was used. These first visits took place from September to December 2016 and from April to July 2017.

Ordinal and Precardinal Scales
The MACBETH approach presents the user with decisions about DoA that involve only qualitative judgments regarding two options at a time. In the "AniFair" implementation this was used to calculate comparable scales for all criteria (Section 2.5.2). In the following, different types of scales are defined.
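Definition 1 (Ordinal scale) Let $X$ be a finite set of elements. A scale $S: X \to \mathbb{R}$ is called ordinal if for all $x_i, x_j \in X$:

$$S(x_i) > S(x_j) \iff x_i \text{ is more attractive than } x_j, \qquad (1)$$
$$S(x_i) = S(x_j) \iff x_i \text{ is equally attractive as } x_j. \qquad (2)$$

An ordinal scale can easily be obtained by ranking the elements of $X$ according to their attractiveness and assigning real numbers that satisfy conditions (1) and (2). However, the differences between the scores on an ordinal scale can be arbitrary, whereas MCDA needs scales that reflect not only the order of attractiveness of the elements but also the differences in their attractiveness.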
To create a scale with meaningful differences between its scores, the M-MACBETH software required the user to judge the DoA for pairs of elements of X with one of the attributes 'extreme', 'very strong', 'strong', 'moderate', 'weak', and 'very weak'. Based on these judgments, it could be checked whether a scale S is precardinal.
Definition 2 (Precardinal scale (reflecting given user judgment)) An ordinal scale $S: X \to \mathbb{R}$ is called precardinal if for all $x_i, x_j, x_k, x_l \in X$ such that $x_i$ is more attractive than $x_j$ and $x_l$ is more attractive than $x_k$, the following implication holds: if the difference of attractiveness between $x_i$ and $x_j$ was judged to be larger than the difference of attractiveness between $x_l$ and $x_k$, then

$$S(x_i) - S(x_j) > S(x_l) - S(x_k).$$

A positive affine transformation of a precardinal scale results in a precardinal scale that reflects the same user judgment. Large (small) distances on a precardinal scale correspond to large (small) DoA between the respective elements. Precardinal scales, however, do not necessarily ensure that the relative distances between scores exactly represent the relative DoA as experienced by the user; this is the characteristic of cardinal scales.
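The following Python sketch checks Definition 2 for a given scale against a set of pairwise judgments and illustrates that a positive affine transformation preserves precardinality. It is a simplification under stated assumptions: only the judged pairs are compared, the category 'no' is left out, and all names are hypothetical rather than taken from "AniFair" or M-MACBETH.

```python
from itertools import combinations
from typing import Dict, Tuple

# Qualitative DoA categories, from smallest to largest.
# Assumption: 'no' is omitted, i.e. only strictly positive DoA judgments are given.
CATEGORIES = ["very weak", "weak", "moderate", "strong", "very strong", "extreme"]
RANK = {c: r for r, c in enumerate(CATEGORIES)}

# (more attractive element, less attractive element) -> judged DoA category
Judgments = Dict[Tuple[str, str], str]


def is_precardinal(scale: Dict[str, float], judgments: Judgments) -> bool:
    """Check Definition 2 for the judged pairs only."""
    # Ordinal part: the more attractive element must score strictly higher.
    if any(scale[better] <= scale[worse] for better, worse in judgments):
        return False
    # A strictly larger judged DoA requires a strictly larger score difference.
    for (p, cp), (q, cq) in combinations(judgments.items(), 2):
        dp = scale[p[0]] - scale[p[1]]
        dq = scale[q[0]] - scale[q[1]]
        if (RANK[cp] > RANK[cq] and dp <= dq) or (RANK[cq] > RANK[cp] and dq <= dp):
            return False
    return True


def affine(scale: Dict[str, float], a: float, b: float) -> Dict[str, float]:
    """Positive affine transformation a * S + b with a > 0."""
    if a <= 0:
        raise ValueError("a must be positive")
    return {x: a * s + b for x, s in scale.items()}


if __name__ == "__main__":
    judg = {("A", "B"): "strong", ("B", "C"): "weak", ("A", "C"): "very strong"}
    S = {"A": 100.0, "B": 40.0, "C": 0.0}
    print(is_precardinal(S, judg))                                  # True
    print(is_precardinal(affine(S, 2.0, 5.0), judg))                # True
    print(is_precardinal({"A": 100.0, "B": 80.0, "C": 0.0}, judg))  # False
```

The last scale is still ordinal, but it gives the 'weak' pair (B, C) a larger distance than the 'strong' pair (A, B), so it does not reflect the judgments.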
In both the M-MACBETH software and "AniFair", cardinal scales were achieved by letting the user modify the precardinal scale proposed by the software (supplementary material, Appendix: Background of 'Making criteria comparable', Visualization and adaption of scales).

Choquet Integral
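The Choquet integral can be seen as a natural extension of the weighted arithmetic mean to the case in which mutual preferential independence between criteria cannot be assumed. In practice, interaction phenomena among criteria occur. In this case the aggregation function cannot be considered additive, and not only the importance of each criterion but also the importance of subsets of criteria needs to be taken into account. Instead of a vector of weights, a monotone set function, called a capacity, is introduced. For the remainder of this section let $N = \{1, \dots, n\}$ denote the set of criteria.

Definition 3 (Capacity) A set function $\mu: 2^N \to [0, 1]$ is called a capacity if the following conditions hold:

$$\mu(\emptyset) = 0, \qquad \mu(N) = 1, \qquad (3)$$
$$A \subseteq B \subseteq N \implies \mu(A) \le \mu(B). \qquad (4)$$

Based on the concept of a capacity, the Choquet integral can be defined.

Definition 4 (Choquet integral) Let $f: N \to \mathbb{R}_+$ be a function represented by the vector $(f_1, \dots, f_n)$, and let $(\cdot)$ be a permutation of $N$ such that $f_{(1)} \le \dots \le f_{(n)}$, with $f_{(0)} := 0$ and $A_{(i)} := \{(i), \dots, (n)\}$. The Choquet integral of $f$ with respect to a capacity $\mu$ is

$$C_\mu(f) = \sum_{i=1}^{n} \big(f_{(i)} - f_{(i-1)}\big)\, \mu\big(A_{(i)}\big). \qquad (5)$$

If the capacity $\mu$ is additive, the Choquet integral coincides with a weighted arithmetic mean. Because a capacity is in general given by a set of $2^n$ coefficients, the resulting exponential complexity was a limiting factor, which is why Grabisch [14] proposed the concept of k-additivity as a trade-off between complexity and the ability to model interaction.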
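A direct Python transcription of Eq. (5) is sketched below. It assumes that a capacity is stored as a dictionary from frozensets of criterion indices (0-based here, unlike $N = \{1, \dots, n\}$ above) to values in [0, 1]. It also shows two limiting cases: an additive capacity gives back the weighted arithmetic mean, while a capacity that is 1 only on the pair {0, 1} yields the minimum of the two criteria, i.e. a purely complementary aggregation. This is an illustration, not the Kappalab function Choquet.integral used by "AniFair".

```python
from itertools import combinations
from typing import Dict, FrozenSet, Sequence

Capacity = Dict[FrozenSet[int], float]


def choquet(f: Sequence[float], mu: Capacity) -> float:
    """Discrete Choquet integral (Eq. 5) of f = (f_1, ..., f_n) with respect to mu."""
    order = sorted(range(len(f)), key=lambda i: f[i])  # f_(1) <= ... <= f_(n)
    total, previous = 0.0, 0.0
    for pos, i in enumerate(order):
        a_i = frozenset(order[pos:])  # A_(i) = {(i), ..., (n)}
        total += (f[i] - previous) * mu[a_i]
        previous = f[i]
    return total


def additive_capacity(weights: Sequence[float]) -> Capacity:
    """mu(A) = sum of the weights of the criteria in A."""
    n = len(weights)
    return {
        frozenset(s): sum(weights[i] for i in s)
        for k in range(n + 1)
        for s in combinations(range(n), k)
    }


if __name__ == "__main__":
    # Additive case: equals the weighted mean 0.5*10 + 0.3*20 + 0.2*30 = 17.
    print(choquet([10.0, 20.0, 30.0], additive_capacity([0.5, 0.3, 0.2])))
    # Capacity that is 1 only on {0, 1}: the result is the minimum of both criteria.
    mu_min = {frozenset(): 0.0, frozenset({0}): 0.0, frozenset({1}): 0.0,
              frozenset({0, 1}): 1.0}
    print(choquet([0.4, 0.9], mu_min))  # 0.4
```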
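Definition 6 (k-additive capacity) Let $k \in \{1, \dots, n\}$. A capacity $\mu$ is called k-additive if its Möbius transform $m(A) = \sum_{B \subseteq A} (-1)^{|A \setminus B|} \mu(B)$ satisfies $m(A) = 0$ for all $A \subseteq N$ with $|A| > k$, and there exists at least one $A \subseteq N$ with $|A| = k$ and $m(A) \neq 0$.

Every k-additive capacity can thus be represented by at most $\sum_{l=1}^{k} \binom{n}{l}$ coefficients, which is a significant reduction of complexity [33]. For the software tool "AniFair" the case $k = 2$ was implemented.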
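To make Definition 6 concrete, this Python sketch computes the Möbius transform of a capacity (stored, as assumed before, as a dict over frozensets that contains every subset including the empty set), tests k-additivity, and counts coefficients. For n = 15 criteria, the maximum "AniFair" allows per aggregation step, a 2-additive capacity needs at most 120 coefficients instead of the 2^15 = 32768 values of a general capacity. The function names are hypothetical; math.comb requires Python 3.8 or later.

```python
from itertools import combinations
from math import comb
from typing import Dict, FrozenSet, Iterator

Capacity = Dict[FrozenSet[int], float]


def subsets(s: FrozenSet[int]) -> Iterator[FrozenSet[int]]:
    """All subsets of s, including the empty set and s itself."""
    items = sorted(s)
    for k in range(len(items) + 1):
        for c in combinations(items, k):
            yield frozenset(c)


def mobius(mu: Capacity) -> Capacity:
    """Moebius transform m(A) = sum over B subset of A of (-1)^(|A|-|B|) * mu(B)."""
    return {a: sum((-1) ** len(a - b) * mu[b] for b in subsets(a)) for a in mu}


def is_k_additive(mu: Capacity, k: int, tol: float = 1e-12) -> bool:
    m = mobius(mu)
    no_higher = all(abs(v) <= tol for a, v in m.items() if len(a) > k)
    some_k = any(abs(v) > tol for a, v in m.items() if len(a) == k)
    return no_higher and some_k


def n_coefficients(n: int, k: int) -> int:
    """Upper bound on the number of coefficients of a k-additive capacity."""
    return sum(comb(n, size) for size in range(1, k + 1))


if __name__ == "__main__":
    print(n_coefficients(15, 2), 2 ** 15)  # 120 32768
```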

Shapley Value and Interaction among Criteria
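As capacities assign weights to all subsets that contain a criterion instead of weighting only the single criteria, the importance of each individual criterion was not the only quantity that mattered for the decision process. Therefore, the Shapley value, introduced by Shapley, was used to express the relative importance of each criterion with respect to the decision problem. With $n$ being the number of criteria, the Shapley value was the vector $\phi(\mu) = (\phi_1, \dots, \phi_n)$ with

$$\phi_i = \sum_{A \subseteq N \setminus \{i\}} \frac{(n - |A| - 1)!\,|A|!}{n!}\,\big[\mu(A \cup \{i\}) - \mu(A)\big], \qquad \sum_{i=1}^{n} \phi_i = 1.$$

Interaction between two criteria $i$ and $j$ was described by interaction indices $I_{ij}$. Firstly, two criteria were called complementary, or said to interact positively ($I_{ij} > 0$), when their union contributed more to the decision problem than the individual criteria added up. Secondly, two criteria were called redundant, or said to interact negatively ($I_{ij} < 0$), when their union did not contribute more to the decision problem than each criterion individually. Thirdly, two criteria were said to be independent ($I_{ij} = 0$) when they did not interact, i.e. the importances of the single criteria more or less summed up to the importance of their combination. The formula for and the development of the interaction index can be found in Murofushi and Soneda [35]. For a 2-additive capacity $\mu$, the Choquet integral of a function $f: N \to \mathbb{R}_+$ can be written as

$$C_\mu(f) = \sum_{i=1}^{n} f_i \Big(\phi_i - \frac{1}{2}\sum_{j \neq i} |I_{ij}|\Big) + \Big[\sum_{\{i,j\}:\, I_{ij} > 0} (f_i \wedge f_j)\, I_{ij} + \sum_{\{i,j\}:\, I_{ij} < 0} (f_i \vee f_j)\, |I_{ij}|\Big]$$

with the property $\phi_i - \frac{1}{2}\sum_{j \neq i} |I_{ij}| \ge 0$ for all $i \in N$, where $\wedge$ and $\vee$ denote the minimum and the maximum. The second, bracketed term can be seen as the part of the Choquet integral value that results from interaction of criteria [22].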
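The 2-additive formula can be checked numerically with the Python sketch below. It assumes that the capacity is given by its Möbius coefficients, for which the standard 2-additive identities $I_{ij} = m(\{i, j\})$ and $\phi_i = m(\{i\}) + \frac{1}{2}\sum_{j \neq i} I_{ij}$ hold; the numbers are hypothetical, not farm data, and the function names are ours. The ratio of the bracketed interaction term to the total value is one way to express how much interaction contributes to a Choquet integral value.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]


def shapley_interaction(m1: Sequence[float], m2: Dict[Pair, float]) -> Tuple[List[float], Dict[Pair, float]]:
    """Shapley values and interaction indices of a 2-additive capacity, given its
    Moebius coefficients m({i}) = m1[i] and m({i, j}) = m2[(i, j)] with i < j."""
    n = len(m1)
    inter = {p: m2.get(p, 0.0) for p in combinations(range(n), 2)}
    phi = [m1[i] + 0.5 * sum(v for p, v in inter.items() if i in p) for i in range(n)]
    return phi, inter


def choquet_parts(f: Sequence[float], phi: Sequence[float], inter: Dict[Pair, float]) -> Tuple[float, float]:
    """(linear part, bracketed interaction part) of the 2-additive Choquet integral."""
    linear = sum(
        f[i] * (phi[i] - 0.5 * sum(abs(v) for p, v in inter.items() if i in p))
        for i in range(len(f))
    )
    interaction = sum(
        min(f[a], f[b]) * v if v > 0 else max(f[a], f[b]) * abs(v)
        for (a, b), v in inter.items()
    )
    return linear, interaction


if __name__ == "__main__":
    # Hypothetical 2-additive capacity on three criteria (Moebius coefficients sum to 1).
    phi, inter = shapley_interaction([0.2, 0.3, 0.2], {(0, 1): 0.2, (0, 2): 0.2, (1, 2): -0.1})
    lin, itr = choquet_parts([0.5, 0.8, 0.2], phi, inter)
    print(f"C = {lin + itr:.2f}, interaction share = {itr / (lin + itr):.0%}")
```

The result agrees with the direct Möbius form $\sum_i m(\{i\}) f_i + \sum_{i<j} m(\{i,j\})\,(f_i \wedge f_j)$, which the formula above rearranges.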

Software Tool "AniFair" and Application
"AniFair" was implemented using R [36]. An installer for "AniFair" can be downloaded at https://www.anifair.uni-kiel.de/de/willkommen-bei-anifair. It comes with a portable version of R 3.4.1 and the required R packages, so that "AniFair" runs in its development environment and instabilities due to version conflicts are avoided. In addition, example status files are provided for trial runs (supplementary material, Saving and reloading 'AniFair' status).
"AniFair" was designed to assist the user in deciding between objects of interest (OoI) when multiple criteria that are not comparable are involved. It is possible to run more than one instance of "AniFair" simultaneously. As all instances in the 'Multiple instances'-version work in the same way as the single-instance version, "AniFair" is explained here with respect to a single instance.
The procedure associated with the software tool "AniFair" could be divided into the three parts 'Creation of criteria tree', 'Making criteria comparable', and 'Choquet integral aggregation', as illustrated in Figure 1. First, the user needed to enter decision criteria and OoI (Section 2.5.1). The software tool then generated comparable scales (Section 2.5.2) for all criteria and calculated a capacity so that the OoI could be compared via the values of a Choquet integral (Section 2.5.3). Furthermore, the 'Multiple instances'-version of "AniFair" offered the possibility to aggregate the results of all instances (Section 2.5.4).

Creation of Criteria Tree
In the GUI window opened by "AniFair", the topic of the decision problem could be entered as the root of the criteria tree (Figure 2). The user could alter the entered topic via the 'Alter' button. The "AniFair" start window was split into one framed box container for entering the OoI and another framed box container for building the criteria tree. Both objects and criteria could be entered manually or uploaded from file. "AniFair" prevented the entry of object or criterion names that had already been used for other items. In the start window, each entered object and criterion was shown together with the buttons 'Alter', '.Delete', and '.Restore'. The '.Delete' button greyed out the object or criterion, which was then not used in further processing unless it was restored. While the entered object names were all listed in one framed box container, each criterion had its own framed box container, because second level criteria (subcriteria) could be defined. All subcriteria of one criterion were displayed in the same framed box container as the criterion, and second level criteria were entered within the framed box container of the corresponding first level criterion.

Figure 1. Graphical representation of the structure of "AniFair". In 'Creation of criteria tree' the user can enter decision criteria and a list of objects of interest. In 'Making criteria comparable' the user needs to define his or her preferences regarding the different states the criteria could show. These are used to calculate scales that are comparable between criteria. Afterwards, the objects need to be assigned scores on those scales. In 'Choquet integral aggregation' this scoring is used to provide a capacity on the set of criteria and calculate Choquet integral values for all objects. Additionally, the user can define constraints regarding the relative importance of and interaction among the criteria.

Figure 2. Creating a criteria tree in an "AniFair" instance. The user has started to enter object names ('1', ..., '4'). The first level criterion 'BCS_S' has already been entered with subcriteria 'BCS_S_1' and 'BCS_S_2', which have been chosen as criteria with 'data available' ('DA'), meaning that the data gathered for the subcriteria will be used in the aggregation process. The criteria names and the 'DA' buttons are marked in red and bold font. As subcriteria of 'BCS_S' have been chosen as 'data available', 'BCS_S' itself cannot be chosen simultaneously; the corresponding 'DA' button is greyed out. By clicking the 'more...' buttons the user could enter additional object names and criteria. In the upper part of the window the 'Add AniFair instance' button is placed, which starts an additional instance when results for the current instance are present.
With each criterion or subcriterion, an additional 'DA' button was displayed. The user had to mark for which first or second level criteria data had been collected, i.e. which data, and thus which (sub)criteria, should be used in the aggregation process. If a first level criterion was marked as 'Data Available' ('DA'), none of its subcriteria could be marked, and if a subcriterion was marked as 'DA', the corresponding first level criterion could not be marked at the same time.
This gave the user the possibility to design his or her criteria tree as a visualization of the decision problem and then independently decide which criteria are involved in the decision process.
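The 'DA' rule can be expressed as a simple check on a two-level tree. The Python sketch below uses a hypothetical data model (the class and function names are ours, not "AniFair" internals); the example tree mirrors the 'Good feeding' configuration used in the application, where the subcriteria of 'BCS_S' carry the data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Criterion:
    """Node of a two-level criteria tree; `da` mirrors the 'DA' button."""
    name: str
    da: bool = False
    sub: List["Criterion"] = field(default_factory=list)


def da_conflicts(tree: List[Criterion]) -> List[str]:
    """First level criteria marked 'DA' while one of their subcriteria is marked too."""
    return [c.name for c in tree if c.da and any(s.da for s in c.sub)]


def da_criteria(tree: List[Criterion]) -> List[str]:
    """Names of all (sub)criteria whose data enter the aggregation."""
    names: List[str] = []
    for c in tree:
        if c.da:
            names.append(c.name)
        names.extend(s.name for s in c.sub if s.da)
    return names


if __name__ == "__main__":
    good_feeding = [
        Criterion("BCS_S", sub=[Criterion("BCS_S_1", da=True), Criterion("BCS_S_2", da=True)]),
        Criterion("Age_of_weaning", da=True),
        Criterion("Water_supply", da=True),
    ]
    print(da_conflicts(good_feeding))  # []
    print(da_criteria(good_feeding))
```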
Instead of entering OoI and criteria tree manually or uploading them from individual files, a complete "AniFair" status from a former "AniFair" application could be reloaded. The 'LOAD' button opened a drop-down menu from which an "AniFair" status file (Section 2.5.2, paragraph Saving and reloading "AniFair" status) could be chosen.

Independent and dependent subcriteria. "AniFair" distinguished between two types of subcriteria. The subcriteria of a first level criterion were considered dependent if their states affected each other. For example, the criterion 'BCS_S' was split into the dependent subcriteria 'BCS_S_1' and 'BCS_S_2', measured as the percentages of sows with BCS '1' and '2', respectively (Sections 2.2.1 and 2.5.5). As an animal scored '1' could not be scored '2' simultaneously, these percentages (i.e. the states of the subcriteria) are not independent of each other. Therefore, a pre-aggregation of the subcriteria had to take place within 'BCS_S', and 'BCS_S' was afterwards used in the main aggregation. With independent subcriteria, the state of one subcriterion did not influence the states of the remaining subcriteria. Independent subcriteria were used in the aggregation together with the first level criteria; no pre-aggregation took place.
Before the processing could continue, the user had to tell "AniFair" which subcriteria should be considered independent and which dependent (supplementary material, Figure S.2).
Limiting the number of criteria per aggregation step. As the computational time for the capacity calculation grew disproportionately with the number of criteria, "AniFair" allowed at most fifteen criteria per aggregation step. With sixteen or more criteria, very large objects burdened the working memory, or the calculation could not be carried out at all because the native code of lin.prog.capa.ident does not support long vectors (64-bit indexes).

Making Criteria Comparable
The main user interaction occurred in the part of "AniFair" in which comparability between the criteria was established. This was approached in a very similar way as in the M-MACBETH software [11]; the mathematical foundations can be found in Bana e Costa, De Corte, and Vansnick [38] and in the supplementary material (Appendix: Background of 'Making criteria comparable'). Therefore, the process is only briefly outlined here, and the differences between the two software tools are highlighted.
First, the user had to define the different states (performance levels) the 'DA' criteria could take (Figure 3(a)). Afterwards, "AniFair" needed information on the DoA between the performance levels in terms of the qualitative attributes 'extreme', 'very strong', 'strong', 'moderate', 'weak', 'very weak', and 'no' [8], which were entered in a matrix of judgment (Figure 4) for every 'DA' criterion. If these judgments were inconsistent with precardinality (Definition 2), "AniFair" offered suggestions to resolve the inconsistency. This user information formed the basis for calculating the "AniFair" scales $S_1^{\text{AniFair}}, \dots, S_n^{\text{AniFair}}$, which the user could then adapt until they best matched his or her experience of the decision problem. "AniFair" ensured that the entered user preferences were not violated during this modification. The scales were adapted via interactive graphical representations of $S_1^{\text{AniFair}}, \dots, S_n^{\text{AniFair}}$.
In the case of dependent subcriteria, pre-aggregations took place within the respective first level criterion, and the first level criterion was then used in the main aggregation (Section 2.5.1, Independent and dependent subcriteria). Thus, scales were needed not only for the dependent subcriteria ('BCS_S_1', 'BCS_S_2') but also for the respective first level criterion ('BCS_S'). For this, the user had to fill in additional matrices of judgment concerning the DoA between the dependent subcriteria (Figure 3(b)). As the final step before the aggregation with the Choquet integral could be carried out, the OoI needed to be assigned scores on the final criteria scales $S_1^{\text{final}}, \dots, S_n^{\text{final}}$ (paragraph Scoring of objects of interest).
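One necessary condition behind such a consistency check can be sketched as follows. If the performance levels are listed in decreasing attractiveness, a precardinal scale makes the score difference between levels i and k (i < j < k) larger than both the difference between i and j and that between j and k, so the DoA judged for (i, k) must not be smaller than either of the other two. This Python sketch only flags such contradictions; it is a simplification that covers neither the category 'no' nor the linear programming MACBETH uses to detect all inconsistencies, and its names are hypothetical.

```python
from typing import List, Optional, Tuple

CATEGORIES = ["very weak", "weak", "moderate", "strong", "very strong", "extreme"]
RANK = {c: r for r, c in enumerate(CATEGORIES)}

Cell = Tuple[int, int]


def ordering_violations(matrix: List[List[Optional[str]]]) -> List[Tuple[Cell, Cell]]:
    """matrix[i][j] (i < j) holds the DoA judged between performance level i and the
    less attractive level j; None means 'not judged'. Each violation is returned as
    (wider cell, narrower cell) where the wider pair was judged a smaller DoA."""
    n = len(matrix)
    bad: List[Tuple[Cell, Cell]] = []
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                wide = matrix[i][k]
                if wide is None:
                    continue
                for narrow in ((i, j), (j, k)):
                    judged = matrix[narrow[0]][narrow[1]]
                    if judged is not None and RANK[wide] < RANK[judged]:
                        bad.append(((i, k), narrow))
    return bad


if __name__ == "__main__":
    consistent = [[None, "strong", "very strong"], [None, None, "weak"], [None, None, None]]
    clash = [[None, "strong", "moderate"], [None, None, "weak"], [None, None, None]]
    print(ordering_violations(consistent))  # []
    print(ordering_violations(clash))       # [((0, 2), (0, 1))]
```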
Export of user entered information. "AniFair" offered to export all user-entered information to human-readable txt files. This included the criteria tree, the defined performance levels, the filled-in matrices of judgment, and the scales. In contrast to the uncommon mcb file format exported by the M-MACBETH software, these files can easily be viewed and can serve as a basis for discussion between groups of decision makers. Examples of these files are given in the supplementary material (Appendix: Data exported from "AniFair"). In addition, the status of "AniFair" could be saved to less human-readable files and reloaded, in case the modeling needed to be interrupted (supplementary material, Saving and reloading "AniFair" status).
Scoring of objects of interest. Every OoI was associated with one performance level per 'DA' criterion according to the available data (for an example, see Figure 5). At this stage, "AniFair" did not yet know which performance levels applied to each OoI. This information could be entered manually (Figure 5), as in the M-MACBETH software, but it could also be uploaded from file.
For the latter, the user needed to prepare a file organized as follows: the OoI denoted the rows and the 'DA' criteria denoted the columns. The object and criteria names in the file had to match the names entered in "AniFair" by the user. For criteria with qualitative performance levels, the fields of the table could only hold performance levels as defined by the user. For criteria with quantitative performance levels, the fields of the table contained the originally collected data, which "AniFair" internally compared with the defined quantitative performance levels. "AniFair" could handle differences in the number or order of OoI or 'DA' criteria between the user-entered information and the file, and it deleted duplicates. For the 'Good feeding' example, the beginning of the corresponding file is depicted in Listing 1.
If the upload was still not successful, a window opened that showed an example of how to prepare the file, with one row per OoI and one column per criterion, and "AniFair" allowed the user to choose a file again.
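The file handling described above can be sketched in Python with the standard csv module. The layout (comma-separated, a header row, object names in the first column) and the function name are assumptions for illustration. The sketch re-orders rows and columns to the user-entered order, keeps the first row of a duplicated object (our choice), and reports names that do not match instead of guessing.

```python
import csv
import io
from typing import Dict, List


def read_scores(text: str, objects: List[str], criteria: List[str]) -> Dict[str, Dict[str, str]]:
    """Parse a table with one row per OoI and one column per 'DA' criterion.
    Assumed layout: comma-separated, header row, object names in the first column."""
    reader = csv.reader(io.StringIO(text))
    header = [h.strip() for h in next(reader)]
    col = {name: pos for pos, name in enumerate(header)}
    missing = [c for c in criteria if c not in col]
    if missing:
        raise ValueError(f"criteria not found in file: {missing}")
    rows: Dict[str, List[str]] = {}
    for row in reader:
        # Skip empty lines; keep only the first row of a duplicated object.
        if row and row[0].strip() not in rows:
            rows[row[0].strip()] = [cell.strip() for cell in row]
    absent = [o for o in objects if o not in rows]
    if absent:
        raise ValueError(f"objects not found in file: {absent}")
    # Re-order to the user's object and criteria order.
    return {o: {c: rows[o][col[c]] for c in criteria} for o in objects}


if __name__ == "__main__":
    # Hypothetical file content, not the data of Listing 1.
    text = "object,Water_supply,Age_of_weaning\n2,0,26.1\n1,2,28.4\n1,0,27\n"
    print(read_scores(text, ["1", "2"], ["Age_of_weaning", "Water_supply"]))
```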

Choquet Integral Aggregation
There are several mathematical approaches to identifying a k-additive capacity (Section 2.4, Definition 6) that reflects specific user-given information. An overview of the methods provided by the R package 'Kappalab' [39] is given in Grabisch, Kojadinovic, and Meyer [40]. For the "AniFair" implementation, the 'maximum split' approach via the R function lin.prog.capa.ident was chosen.
With the 'maximum split' approach, the minimal difference between the overall utilities of the OoI was maximized subject to a system of linear inequalities (SLI). For this, the OoI needed to be ranked by a partial weak order, because lin.prog.capa.ident needs the differences in overall utility of the OoI as input. As the final criteria scales were comparable and reflected the user preferences, it was meaningful to calculate mean scores for the OoI and to sort the rows of the matrix Scores(OoI) (Section 2.5.2, paragraph Scoring of objects of interest, Listing 7) according to these means. As the rows of Scores(OoI) corresponded to the overall utility of the OoI, one inequality was derived from each pair of successive rows, comparing one OoI with the next best preferred OoI (Listing 2). Additionally, a preference threshold $\delta_C$ was set.
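The construction of the pre-order can be sketched in Python as below: the rows of a score matrix are sorted by their mean, and each pair of successive objects becomes one constraint $C_\mu(x_{\text{better}}) - C_\mu(x_{\text{next}}) \ge \delta_C$. This only illustrates the idea; the exact format of the Acp argument of lin.prog.capa.ident is not reproduced here, the indices are 0-based, and leaving objects with equal means unconstrained is our assumption.

```python
from typing import List, Sequence, Tuple


def preorder_constraints(scores: Sequence[Sequence[float]]) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Rank objects (rows of a score matrix) by mean score, best first, and return the
    pairs (better, next) of successive objects as 0-based indices. Each pair stands for
    one inequality C(x_better) - C(x_next) >= delta_C; ties are left unconstrained."""
    means = [sum(row) / len(row) for row in scores]
    order = sorted(range(len(scores)), key=lambda i: means[i], reverse=True)
    pairs = [(a, b) for a, b in zip(order, order[1:]) if means[a] > means[b]]
    return order, pairs


if __name__ == "__main__":
    scores = [[0.2, 0.4], [0.9, 0.7], [0.5, 0.5]]
    print(preorder_constraints(scores))  # ([1, 2, 0], [(1, 2), (2, 0)])
```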
The R-function lin.prog.capa.ident was based on linear programming and used the R-package 'lpSolve'. Given the input Acp created in Listing 2, an object of class Mobius.capacity was created which held the capacity µ as list of coefficients. µ could afterwards be used to calculate the corresponding Choquet integral (Definition 4) using the function Choquet.integral (Listing 3).
Visualization of the Choquet results of pre-aggregation steps. In the case of dependent subcriteria, "AniFair" opened a notebook with one tab for each criterion that had dependent subcriteria (Figure 6(a)). Each tab presented a table. Let $n_{dep}$ be the number of dependent subcriteria of the respective criterion; then the table had $n_{dep} + 2$ or $n_{dep} + 3$ columns. The first column listed the objects. The following $n_{dep}$ columns held the scores of the objects for the dependent subcriteria, i.e. the respective columns of the matrix Scores(OoI) (Section 2.5.2, paragraph Scoring of objects of interest). These were followed …
The pre-order of the OoI given to lin.prog.capa.ident via the argument Acp (Listing 2) was based on the assumption that all criteria were equally important to the decision problem. For the re-calculation, the pre-order needed to be reconsidered when constraints on the Shapley value were defined that suggested otherwise.
The weight vector representing the defined Shapley value constraints was generated with the function lp from the R package 'lpSolve'. A weighted version of the matrix Scores(OoI) was calculated by multiplying each row element-wise by the weight vector. From this weighted version of Scores(OoI), a weighted version of Acp was created according to Listing 2 and passed to lin.prog.capa.ident for the re-calculation (Listing 6), together with the constraint-defining matrices Asp, Asi, Aip, and Aii.
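How a weight vector can change the pre-order is shown in the small Python sketch below. The weights [0.2, 0.8] are hypothetical stand-ins for a vector that the lp call might return for Shapley constraints favoring the second criterion; the sketch does not reproduce that linear program, and the function names are ours.

```python
from typing import List, Sequence


def weighted_scores(scores: Sequence[Sequence[float]], weights: Sequence[float]) -> List[List[float]]:
    """Multiply every row of the score matrix element-wise by the weight vector."""
    if any(len(row) != len(weights) for row in scores):
        raise ValueError("each row needs one weight per criterion")
    return [[s * w for s, w in zip(row, weights)] for row in scores]


def rank_by_mean(scores: Sequence[Sequence[float]]) -> List[int]:
    """Object indices sorted by mean row score, best first."""
    means = [sum(row) / len(row) for row in scores]
    return sorted(range(len(scores)), key=lambda i: means[i], reverse=True)


if __name__ == "__main__":
    scores = [[0.9, 0.1], [0.4, 0.5]]
    print(rank_by_mean(scores))                                # [0, 1]
    print(rank_by_mean(weighted_scores(scores, [0.2, 0.8])))  # [1, 0]
```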
As long as a solution existed that satisfied the given constraints, the results were displayed in a two-sided window (supplementary material, Figure S.6): the solution from the preceding calculation was presented on the left, and the re-calculated solution on the right. If no solution existed, the user was asked to define less strict constraints.
Export of results. The windows displaying the results of aggregations were equipped with an 'Export' button to export the results to txt files (supplementary material, Listings Exported.3 and Exported.5) and csv files (supplementary material, Listings Exported.4 and Exported.6).

Aggregation of Instances
In the 'Multiple instances'-version of 'AniFair' the instances appeared as tabs in the main window (Figure 8(a)). Every instance was handled as described in Sections 2.5.1, 2.5.2, and 2.5.3. As soon as results for the OoI existed for all instances, a Choquet integral aggregation of the instances could be initiated by clicking the 'AGGREGATION' button. As could be seen in Figure 8(b), the user was presented with the type of results available for each instance: Choquet integral values or (weighted) mean. The results of the aggregation of instances were displayed in the same manner as the results for aggregation within instances ( Figure 8(c)). Also, constraints on Shapley value and interaction indices could be defined, if a Choquet integral solution existed, and a weighted mean could be calculated, if no solution for the Choquet integral was given.

Application of 'AniFair' to 'Good Feeding' in 'Sows and Piglets'
Creation of criteria tree. In the "AniFair" instance in the first tab (Figure 2), the topic 'Good feeding' was entered as root of the criteria tree. Furthermore, the thirteen farms were entered as objects '1', ..., '13', as well as the first level criteria 'BCS_S' (body condition score of sows), 'Age_of_weaning', and 'Water_supply'. 'BCS_S' was split into the second level criteria 'BCS_S_1' and 'BCS_S_2'. The second level criteria 'BCS_S_1' and 'BCS_S_2' and the first level criteria 'Age_of_weaning' and 'Water_supply' were marked as 'DA' criteria. As BCS was scored on a three-point scale and 'BCS_S_0/1/2' were measured as percentages of affected animals, all information concerning 'BCS_S_0' was contained in the information on 'BCS_S_1' and 'BCS_S_2'. A visualization of the complete criteria tree can be found in the supplementary material (Figure S.1). The second level criteria 'BCS_S_1' and 'BCS_S_2' were marked as dependent after 'Proceed calculation' was clicked (supplementary material, Figure S.2, Listing Exported.1).
Making criteria comparable. The definition of performance levels, the filling in of the matrices of judgment, and the adaption of the scales were carried out by the same person who collected the data, which ensured that the decision process was backed by proper scientific expertise in animal welfare. The bases of comparison were set to 'Quantitative performance level' for the 'DA' criteria 'BCS_S_1', 'BCS_S_2', and 'Age_of_weaning'. Being measured as percentages of sows and numbers of days, these criteria were scored on numerical scales (Section 2.2). Performance levels were entered in order of decreasing attractiveness. The BCS value '1' was given to sows outside the healthy range; therefore, low percentages of sows with BCS '1' were desirable. The three performance levels '<8.7', '8.7-18.6', and '>18.6' were defined. '2' was the least desirable BCS value, as it described malnourished sows; the performance levels defined here were '0', '0-0.3', and '>0.3'. For 'Age_of_weaning', the average numbers of days from birth to weaning were grouped into the three performance levels '>28', '28-24.5', and '<24.5'. As 'Water_supply' was assessed qualitatively by judging the cleanliness and functionality of the drinkers, 'Qualitative performance level' was set. The descriptions '0: adequate' and '2: cleanliness and/or functionality not adequate' were entered with the abbreviations '0' and '2'. Pictures of the entered performance levels and the graphical visualizations of the adapted scales can be found in the supplementary material (Figures S.3 and S.4, respectively). All user-defined information was exported to txt files and can also be found in the supplementary material (Listings Exported.1 and Exported.2). The OoI were afterwards scored by uploading the information from the file depicted in Listing 1, which resulted in a 13 × 4 matrix Scores(OoI).
Choquet integral aggregation. Figure 6 shows the results of the Choquet integral calculation.
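The quantitative performance levels defined for 'Good feeding' can be turned into a lookup as in the following Python sketch. The level labels are taken from the text; how boundary values are assigned (e.g. whether 8.7 belongs to '<8.7' or to '8.7-18.6') is our assumption, since the labels alone do not settle it, and the function name is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Performance levels per 'DA' criterion, in order of decreasing attractiveness.
# Assumption: '<' and '>' are strict, ranges such as '8.7-18.6' include both ends.
LEVELS: Dict[str, List[Tuple[str, Callable[[float], bool]]]] = {
    "BCS_S_1": [("<8.7", lambda v: v < 8.7),
                ("8.7-18.6", lambda v: 8.7 <= v <= 18.6),
                (">18.6", lambda v: v > 18.6)],
    "BCS_S_2": [("0", lambda v: v == 0),
                ("0-0.3", lambda v: 0 < v <= 0.3),
                (">0.3", lambda v: v > 0.3)],
    "Age_of_weaning": [(">28", lambda v: v > 28),
                       ("28-24.5", lambda v: 24.5 <= v <= 28),
                       ("<24.5", lambda v: v < 24.5)],
}


def performance_level(criterion: str, value: float) -> str:
    """Return the first (most attractive) level whose condition the value meets."""
    for label, matches in LEVELS[criterion]:
        if matches(value):
            return label
    raise ValueError(f"{value} fits no performance level of {criterion}")


if __name__ == "__main__":
    print(performance_level("BCS_S_1", 12.0))       # 8.7-18.6
    print(performance_level("Age_of_weaning", 21))  # <24.5
```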
Additional constraints were defined afterwards. The criterion 'BCS_S' was a so-called animal-based measure [28]. The WQAP recommended that, in the evaluation of animal welfare, a strong focus should lie on animal-based rather than resource-based or management-based measures. Therefore, the user decided to define the following constraints regarding the Shapley value:

Aggregation of instances. As can be seen in Figure 8(c), "AniFair" instances for the animal welfare principles 'Good feeding', 'Good housing', 'Good health', and 'Appropriate behaviour' (automated abbreviations 'Good0', 'Good1', 'Good2', 'Appro') were run, and the four principles were aggregated, leading to a ranking of the thirteen farms according to an overall welfare score. As no Choquet integral values existed for 'Good health', the unweighted mean was chosen for all welfare principles in this example for the sake of homogeneity. The following constraints were additionally defined: all interaction indices were limited to the interval [0, 1] to enforce independence or complementary interaction between the welfare principles, and all Shapley indices were set equal via the pre-order of the Shapley indices. The results as well as the associated Shapley value and interaction indices were exported to a txt file (supplementary material, Listing Exported.5) and a csv file (supplementary material, Listing Exported.6). Table 2 shows the results of the final Choquet integral calculations.
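The constraints used for the aggregation of instances (equal Shapley indices, interaction indices in [0, 1]) can be illustrated with a 2-additive capacity in its Möbius representation. For such a capacity, the standard results are that the interaction index of a pair equals its Möbius coefficient m_ij and that the Shapley index is φ_i = m_i + ½ Σ_j m_ij. This is a minimal sketch assuming a 2-additive model, not the solver used in "AniFair"; the principle scores are invented. It shows that equal Shapley indices with vanishing interaction reduce the Choquet integral to the unweighted mean.

```python
from itertools import combinations


def choquet_2additive(x, m_single, m_pair):
    """Choquet integral of scores x for a 2-additive capacity in Moebius form.

    x: criterion -> score; m_single: criterion -> m_i; m_pair: (i, j) -> m_ij.
    """
    value = sum(m_single[i] * x[i] for i in x)
    value += sum(m * min(x[i], x[j]) for (i, j), m in m_pair.items())
    return value


def shapley_2additive(m_single, m_pair):
    """Shapley value phi_i = m_i + 1/2 * sum_j m_ij (interaction index I_ij = m_ij)."""
    phi = dict(m_single)
    for (i, j), m in m_pair.items():
        phi[i] += m / 2
        phi[j] += m / 2
    return phi


def is_valid_2additive(m_single, m_pair, tol=1e-9):
    """Check normalization and monotonicity (m_i + sum_{j in T} m_ij >= 0 for all T)."""
    if abs(sum(m_single.values()) + sum(m_pair.values()) - 1) > tol:
        return False
    for i in m_single:
        partners = {j: m for (a, j), m in m_pair.items() if a == i}
        partners.update({a: m for (a, j), m in m_pair.items() if j == i})
        others = list(partners)
        for size in range(len(others) + 1):
            for subset in combinations(others, size):
                if m_single[i] + sum(partners[j] for j in subset) < -tol:
                    return False
    return True


principles = ["Good0", "Good1", "Good2", "Appro"]
x = {"Good0": 40.0, "Good1": 60.0, "Good2": 50.0, "Appro": 70.0}  # invented scores

# Equal Shapley indices and no interaction: the unweighted mean.
mean_single = {p: 0.25 for p in principles}
print(choquet_2additive(x, mean_single, {}))  # 55.0

# Equal Shapley indices (0.25) with equal complementary interaction 0.1 per pair.
pairs = {pair: 0.1 for pair in combinations(principles, 2)}
single = {p: 0.1 for p in principles}
print(shapley_2additive(single, pairs))
print(is_valid_2additive(single, pairs), choquet_2additive(x, single, pairs))
```

With positive interaction, the unbalanced score vector is rated below its mean, which is the intended effect of complementary criteria.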

Results
According to user information, the following final scales were received for the
The scores of the 13 farms under analysis with respect to these scales can be found in Table 1.
scores due to vanishing interaction indices (Table 2). An image of the full screen visualizing the solution proposed by "AniFair" and the recalculation can be seen in the supplementary material (Figure S.8).

Discussion
In the present article, the software tool "AniFair" for multi-criteria decision analysis was introduced and presented via an example of assessing animal welfare with regard to the principles and criteria of the Welfare Quality® Assessment protocol for pigs. In contrast to 'Growing and finishing pigs', no proposal for an aggregation system regarding 'Sows and piglets' has been released yet [28]. The welfare principle 'Good feeding' in 'Sows and piglets' was chosen as the main example to present the functionality of "AniFair", because a direct comparison with a currently used aggregation system was then less likely to cloud the judgment of the possibilities offered by "AniFair". As an interim result, the thirteen farms that participated in data collection were assigned a 'Good feeding' score (Table 1). Additional "AniFair" instances were applied to the principles 'Good housing', 'Good health', and 'Appropriate behaviour' to illustrate the 'Multiple instances' version of "AniFair" and to aggregate the principle scores to overall welfare assessments for the farms.
Establishing a ranking of the farms that considers all criteria associated with 'Good feeding' was a difficult task. The decision maker was forced to compare and weight multilayered information, as both quantitative criteria ('BCS_S_1', 'BCS_S_2', 'Age_of_weaning'), albeit on incomparable scales, and the qualitative criterion 'Water_supply' needed to be taken into account. The MACBETH approach [10] was utilized to generate comparable scales from 0 to 100 based on user preferences in the form of qualitative evaluations of differences of attractiveness. Instead of having to give quantitative information regarding his or her preferences, the user was asked to judge qualitatively the difference between only two performance levels of a criterion at a time. The functionality described so far was implemented similarly in the M-MACBETH software [11].
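The consistency requirement that the MACBETH approach places on a scale can be illustrated with a short check: a strictly higher category of difference of attractiveness must correspond to a strictly larger score difference. This is a minimal sketch, not the linear program that MACBETH or "AniFair" uses to compute scales; the category names follow the usual MACBETH semantic categories, and the judgments and scores for 'Age_of_weaning' are invented.

```python
# MACBETH semantic categories, ordered from "no difference" to "extreme".
CATEGORIES = ["no", "very weak", "weak", "moderate", "strong", "very strong", "extreme"]


def is_consistent(scale, judgments):
    """Check a scale against pairwise judgments.

    scale: level -> score; judgments: (a, b) -> category, where a is judged
    at least as attractive as b.
    """
    pairs = []
    for (a, b), category in judgments.items():
        k = CATEGORIES.index(category)
        diff = scale[a] - scale[b]
        if (k == 0 and diff != 0) or (k > 0 and diff <= 0):
            return False  # ordinal information violated
        pairs.append((k, diff))
    # A strictly higher category needs a strictly larger score difference.
    return all(d1 > d2 for k1, d1 in pairs for k2, d2 in pairs if k1 > k2)


def anchor_0_100(scale):
    """Positive affine rescaling: least attractive level -> 0, most attractive -> 100."""
    lo, hi = min(scale.values()), max(scale.values())
    return {level: 100 * (v - lo) / (hi - lo) for level, v in scale.items()}


judgments = {(">28", "28-24.5"): "weak",
             ("28-24.5", "<24.5"): "strong",
             (">28", "<24.5"): "very strong"}
basic = {">28": 7, "28-24.5": 5, "<24.5": 0}
scale = anchor_0_100(basic)
print(is_consistent(scale, judgments))                       # True
print(is_consistent({**scale, "28-24.5": 90.0}, judgments))  # True
print(is_consistent({**scale, "28-24.5": 40.0}, judgments))  # False
```

Since a positive affine rescaling multiplies all differences by the same positive factor, anchoring to 0 and 100 never changes the outcome of the check; moving a single level, as in the last line, can.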
Looking at other methods in multi-criteria decision making, the UTA (UTilités Additives) method proposed by Jacquet-Lagreze and Siskos [23] enables the estimation of a nonlinear additive function from the decision maker's global preferences [41]. "AniFair" was implemented as part of a research project related to animal welfare. The meaningful addressing of animal welfare is currently an actively studied field; thus, global a priori preferences for a reference set would lack objectivity and compromise the transferability of the measurement principle. Moreover, in UTA, during the assessment of a utility function, the interval between the extreme values of each criterion is partitioned into equally long subintervals [23].
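The equal partition used by UTA can be sketched as follows: breakpoints are placed at equal distances between the extreme values of a criterion, and the marginal utility is linear between them. The utilities at the breakpoints, which UTA estimates by linear programming from the decision maker's global preferences, are simply given here as invented numbers, and the range of 'Age_of_weaning' is hypothetical. The sketch makes the limitation explicit: computing breakpoints needs numerical values, so there is no counterpart for a qualitative criterion such as 'Water_supply'.

```python
def uta_breakpoints(g_min, g_max, alpha):
    """Split [g_min, g_max] into alpha equally long subintervals (UTA-style)."""
    if alpha < 1 or g_max <= g_min:
        raise ValueError("need alpha >= 1 and g_max > g_min")
    step = (g_max - g_min) / alpha
    points = [g_min + k * step for k in range(alpha + 1)]
    points[-1] = float(g_max)  # avoid floating-point drift at the upper end
    return points


def marginal_utility(g, breakpoints, utilities):
    """Linear interpolation of the marginal utility between breakpoints."""
    if not breakpoints[0] <= g <= breakpoints[-1]:
        raise ValueError("value outside the criterion range")
    for k in range(len(breakpoints) - 1):
        lo, hi = breakpoints[k], breakpoints[k + 1]
        if g <= hi:
            t = (g - lo) / (hi - lo)
            return utilities[k] + t * (utilities[k + 1] - utilities[k])


# Hypothetical range of 'Age_of_weaning' (days) and invented breakpoint utilities.
points = uta_breakpoints(20, 35, 3)
print(points)  # [20.0, 25.0, 30.0, 35.0]
print(marginal_utility(27.5, points, [0.0, 0.1, 0.3, 0.4]))
```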
This can only be done with quantitatively measured criteria, but in "AniFair" it was also necessary to consider qualitatively measured criteria. Furthermore, as the modeling of criteria interaction was desired, the Choquet integral was set as the aggregation function in "AniFair", whereas in the UTA method the aggregation is integrated within the utility function. Comparing the MACBETH approach with the AHP (Analytical Hierarchy Process) method [42], both provide means to structure the decision problem via a criteria tree, and both use a questioning-answering protocol to help the user specify his or her preferences [43]. Although it would have been possible to combine the generation of criteria scales from AHP with the Choquet integral in the same way as the MACBETH approach and the Choquet integral were combined in "AniFair", MACBETH was chosen over AHP. The reasons were the use of the 9-point numerical scale in AHP compared to the semantic scale of MACBETH, and the differences in dealing with inconsistencies in the user judgment. Due to the 9-point numerical scale used in the AHP method, the decision maker has to quantify the differences of importance between pairs of options. Especially for qualitative criteria, these differences might not be addressed properly by the given numerical options. In the MACBETH approach, the qualitative attributes are represented by six variables in the SLI [38], i.e. their numerical values depend on how they were used by the decision maker. In the AHP method, eigenvalue methods are applied to the matrix of user judgments to calculate, among other things, a consistency index; the derived consistency ratio is thresholded at 0.1 to rate the user judgment as consistent, whereas in the MACBETH approach inconsistency is given when the SLI has no solution. As comparability between criteria was one of the main tasks in addressing animal welfare, the authors considered it important to base the scales on consistent judgment without tolerances.
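The AHP consistency check mentioned above can be sketched as follows. In Saaty's usual formulation, the consistency index CI = (λ_max - n)/(n - 1) is divided by a random index RI, and the resulting consistency ratio is compared with 0.1. The sketch obtains λ_max by plain power iteration (adequate for positive reciprocal matrices); the RI values are the commonly tabulated ones, and both example matrices are invented.

```python
# Commonly tabulated random consistency indices (Saaty) for matrix order n.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def principal_eigen(a, iterations=200):
    """Power iteration for the principal eigenvalue/eigenvector of a positive matrix."""
    n = len(a)
    w = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        aw = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam = sum(aw)  # w sums to 1, so sum(A w) estimates lambda_max
        w = [v / lam for v in aw]
    return lam, w


def consistency_ratio(a):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    n = len(a)
    if n <= 2:
        return 0.0  # reciprocal matrices of order <= 2 are always consistent
    lam, _ = principal_eigen(a)
    return ((lam - n) / (n - 1)) / RANDOM_INDEX[n]


consistent = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
inconsistent = [[1, 9, 1 / 9], [1 / 9, 1, 9], [9, 1 / 9, 1]]
print(consistency_ratio(consistent) < 0.1)    # True
print(consistency_ratio(inconsistent) < 0.1)  # False
```

Unlike this tolerance-based rating, the MACBETH approach treats any judgment set whose system of inequalities has no solution as inconsistent.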
Overall, the MACBETH approach seemed to be the most suitable method to achieve comparability.
However, in the M-MACBETH software, a questioning-answering protocol was analogously used to determine criteria weights for the additive aggregation function. "AniFair" used the MACBETH approach solely to generate scales on which the criteria can be addressed comparably, but not for the weighting of criteria. As another difference, on every aggregation level within "AniFair" the Choquet integral [13] [40] was used as the aggregation function, replacing an additive aggregation, i.e. a weighted mean. However, the weighted mean was offered as an alternative in case no solution for the Choquet integral existed. Furthermore, "AniFair" was built for applications where data on the OoI had been gathered in advance for all criteria considered in decision making ('DA' criteria). For the calculation of a capacity, a direct ranking of the OoI by the user was avoided in favor of a ranking based strictly on the collected data, to maintain objectivity (Section 2.5.2, Scoring of objects of interest). In applications where subjective influence on the ranking of objects must be prevented, such as animal welfare, the approach proposed in "AniFair" might be more adequate than methods that rely on a user given pre-order, e.g. additive or non-additive robust ordinal regression [44] [45]. In Clivillé, Berrah, and Mauris [22], the Shapley value and the interaction indices were determined from additional user preferences prior to capacity calculation. In contrast, like the method proposed in Angilella, Greco, and Matarazzo [45], the "AniFair" approach worked with partial information, i.e. the user was not forced to specify his or her preferences concerning the importance of criteria or the interaction between criteria to achieve a solution.
For the capacity calculation, the 'maximum split' method was chosen, which led to dispersed utilities and reached the maximal split that a Choquet integral solution can attain for the given pre-order of objects [40]. This was considered the most suitable solution for users unfamiliar with capacity calculation or the theory behind multi-criteria decision making. Nevertheless, the user was afterwards invited to refine the proposed solution based on his or her knowledge about the relative importance of the criteria (i.e. the Shapley value) and about how interaction between the criteria contributed to the decision problem. The Choquet integral was based on a capacity on the set of all criteria (Definition 3; Grabisch [14]). Thus, in assigning the relevance of a single criterion CRIT, not only the capacity of the one-element set {CRIT} but the capacity values of all subsets containing CRIT needed to be considered (Section 2.4.1). The user had to be aware that setting constraints on the Shapley index of CRIT weighted all criteria combinations that included CRIT. In the 'Good feeding' example, the Shapley value (Figure 7) was set according to the user preference that the criteria linked to animal-based measures should have a higher relative importance [28]. As a result, the Shapley index of 'BCS_S' was ≈4.2 times higher than the Shapley indices of 'Age_of_weaning' and 'Water_supply' in the re-calculated solution (Section 2.5.5, Choquet integral aggregation). This changed the ranking of farms. Farm '4' scored differently in the criteria compared to farms '3' and '6', but was also positioned second in the ranking due to the high relative importance of 'BCS_S'. For the same reason, farms '1' and '9' improved in ranking, whilst farm '13' dropped five places. Although 'BCS_S' was weighted highest by the user, the largest differences (between farms '8' and '4' as well as farms '6' and '1') were associated with changes in the score for 'Water_supply'.
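The point that the Shapley index of a criterion involves the capacity of every subset containing it can be made concrete with the standard formula φ_i = Σ over S ⊆ N∖{i} of (n-|S|-1)!·|S|!/n! · [μ(S ∪ {i}) - μ(S)]. The sketch below computes it, together with the discrete Choquet integral, for a general capacity stored as a dictionary over frozensets. The capacity values and farm scores are invented for illustration; they are not the "AniFair" solution for 'Good feeding'.

```python
from itertools import combinations
from math import factorial


def shapley_value(criteria, mu):
    """Shapley index of each criterion for a capacity mu (frozenset -> value)."""
    n = len(criteria)
    phi = {}
    for i in criteria:
        others = [c for c in criteria if c != i]
        total = 0.0
        for size in range(n):
            weight = factorial(n - size - 1) * factorial(size) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (mu[s | {i}] - mu[s])  # marginal contribution of i
        phi[i] = total
    return phi


def choquet_integral(x, mu):
    """Discrete Choquet integral of non-negative scores x (criterion -> score)."""
    order = sorted(x, key=x.get)  # criteria by increasing score
    value, previous = 0.0, 0.0
    for k, c in enumerate(order):
        upper = frozenset(order[k:])  # criteria scoring at least x[c]
        value += (x[c] - previous) * mu[upper]
        previous = x[c]
    return value


B, A, W = "BCS_S", "Age_of_weaning", "Water_supply"
mu = {frozenset(): 0.0,
      frozenset({B}): 0.4, frozenset({A}): 0.1, frozenset({W}): 0.1,
      frozenset({B, A}): 0.6, frozenset({B, W}): 0.6, frozenset({A, W}): 0.3,
      frozenset({B, A, W}): 1.0}  # invented, monotone capacity
print(shapley_value([B, A, W], mu))
print(choquet_integral({B: 60.0, A: 90.0, W: 20.0}, mu))
```

In this invented capacity, the Shapley index of 'BCS_S' (8/15) exceeds its singleton capacity (0.4) because it also collects part of the gains from every coalition it joins.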
This was because 'Water_supply' had only two performance levels, which were scored very differently (90.0 and 19.9), and it raised the question of whether this criterion is resolved finely enough for the assessment of animal welfare. This illustrated the vulnerability of aggregation results towards the relative importance of criteria. Scientific work [46] [47] has shown that not only personal knowledge and topicality but various other factors influence how users evaluate the relevance of criteria.
Since independence of criteria is usually not given in real-life decision problems [48], and since criteria interaction might be experienced differently among users [49], the possibility to model interaction between criteria was an important feature of "AniFair". An example of how the aggregation of interacting criteria cannot be modeled by a weighted mean can be found in Čaklović [50]. Only the interaction between pairs of criteria (associated with 2-additive capacities) was implemented in "AniFair", for simplicity and because a mathematical solution is more likely to exist. Additionally, the 2-additive case is the most interesting one for practical applications [48]. Furthermore, it is less complicated for the decision maker to evaluate than interactions of higher order. As the welfare of pigs with regard to 'Good feeding' was sensitive towards prolonged hunger as well as prolonged thirst (Section 2.1), the importance of the union of criteria was considered larger than the importance of single criteria, and thus all criteria interacted positively (complementary criteria). Among other things, Table 1 gives the parts of the Choquet integral values that were associated with interaction. Farm '8' was ranked highest and showed high scores for all criteria. As a consequence, only 5% of its overall 'Good feeding' performance resulted from the interaction. By contrast, for farms '7' and '13' the interaction represented approximately 55% of the farms' performance, due to the large differences in criteria scores. It has to be noted that without considering interaction these farms would have scored 30.93 and would thus have been ranked higher than farm '12' (29.93 without considering interaction).
In addition to the successful combination of MACBETH and the Choquet integral ("AniFair"; Martín, Traulsen, Buxadé, and Krieter [30]; Clivillé, Berrah, and Mauris [22]), the importance of modeling interaction was underlined by Gomes, Machado, Costa, and Rangel [51], who presented significant advances that arose from extending the TODIM method (an acronym in Portuguese for Interactive and Multicriteria Decision Making [52]) by using the Choquet integral as an aggregation function.
A further consequence of the recalculation concerns the distribution of Choquet integral values. The 'maximum split' solution was associated with a fairly balanced weighting of the criteria. After recalculation, the range of Choquet integral values was narrower, the mean difference between consecutive farms was smaller, and the differences showed stronger variation. Instead of the nearly equidistantly ranked farms of the 'maximum split' solution, the distribution after recalculation showed a majority of negligible differences. In this way, the more specific adaptation of the model to user preferences in terms of constraints led to a clear separation into groups of farms with comparable overall 'Good feeding' scores, while the top-ranked farm, farm '8', was clearly superior to the following farms. Thus, a more pronounced statement about the animal welfare status was made.
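The three statistics used above to describe the change in distribution can be computed as follows. Note that the mean difference between consecutive farms always equals the range divided by n - 1, so a narrower range implies a smaller mean gap; the standard deviation of the gaps captures the "stronger variation". The two score vectors are invented and only mimic the qualitative pattern described above.

```python
from statistics import mean, pstdev


def spread_summary(scores):
    """Range, mean gap and standard deviation of gaps between consecutive ranks."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    ordered = sorted(scores, reverse=True)
    gaps = [a - b for a, b in zip(ordered, ordered[1:])]
    return {"range": ordered[0] - ordered[-1],
            "mean_gap": mean(gaps),  # always equals range / (n - 1)
            "gap_sd": pstdev(gaps)}


# Invented scores: nearly equidistant ranking vs. one clear leader and a tight group.
print(spread_summary([80, 70, 60, 50, 40]))
print(spread_summary([70, 45, 44, 43, 42]))
```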
Similar to the aforementioned AHP method, at least two aggregation levels were possible with "AniFair". This supports the natural human tendency to break down selection processes and to split decision making into several stages when the number of objects increases [53]. In the 'Good feeding' example, a pre-aggregation from the second level criteria 'BCS_S_1' and 'BCS_S_2' to the first level criterion 'BCS_S' had to take place, because 'BCS_S_1' and 'BCS_S_2' were considered dependent (Section 2.5.1, Independent and dependent subcriteria). "AniFair" automatically initiated this pre-aggregation step and provided the aggregated 'BCS_S' scores for the main aggregation. In this regard, "AniFair" could be seen as a hybrid between AHP and MACBETH, as subcriteria can be aggregated to first level criteria as in AHP, but can also serve on equal terms with the first level criteria. In the latter case, the first level criteria play the role of 'Non-criteria nodes' as in the M-MACBETH software and contribute only to the structuring of the decision problem but not to the decision process. Using the 'Multiple instances' version of "AniFair", the user was given an additional aggregation level by running multiple "AniFair" instances and applying the aggregation of instances.
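The two-stage procedure described above, in which the dependent subcriteria are first pre-aggregated into 'BCS_S' and the result then enters the main aggregation, can be sketched as a recursive function over the criteria tree. To keep the sketch short, unweighted means stand in for the Choquet integrals that "AniFair" computes on each level; the tuple-based tree, the function name and the farm scores are hypothetical.

```python
from statistics import mean


def aggregate(node, scores, aggregators):
    """Bottom-up aggregation of a criteria tree.

    node: (name, children); leaf scores come from `scores`, inner nodes use
    aggregators[name] applied to a dict child name -> aggregated value.
    """
    name, children = node
    if not children:
        return scores[name]
    values = {child[0]: aggregate(child, scores, aggregators) for child in children}
    return aggregators[name](values)


tree = ("Good feeding", [
    ("BCS_S", [("BCS_S_1", []), ("BCS_S_2", [])]),  # pre-aggregation level
    ("Age_of_weaning", []),
    ("Water_supply", []),
])
# Stand-in aggregators (unweighted means); AniFair would use a Choquet integral per level.
aggregators = {"BCS_S": lambda v: mean(list(v.values())),
               "Good feeding": lambda v: mean(list(v.values()))}
farm = {"BCS_S_1": 60.0, "BCS_S_2": 100.0, "Age_of_weaning": 50.0, "Water_supply": 90.0}
print(aggregate(tree, farm, aggregators))
```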
All information was entered in "AniFair" via Graphical User Interfaces. The user was guided through the decision process, and his or her content-related expertise was specifically queried when needed for the next step in decision making.
Regarding animal welfare, the main concern is its measurability, as a clear definition as well as how to address animal welfare with overall scores are heavily discussed topics and subject to current scientific research [26] [28] [30]. Animal welfare is on the one hand bound to moral questions and prone to emotional discussion, but on the other hand connected to economic questions as well as social and political aspects. Compared to decision problems concerning personal life or structural decisions and improvements in companies, subjectivity needs to be handled with great care. The impact of considering multiple decision criteria in assessing animal welfare was underlined once more by the fact that the ranking of farms changed when the 'Good feeding' scores were compared with the overall welfare scores including all four principles. A special example was farm '13', which was ranked highest in the overall welfare score but was found only on the eleventh rank when the analysis was reduced to 'Good feeding'. On the contrary, farm '4' was ranked second with regard to 'Good feeding' but took position eleven in the ranking resulting from the aggregation over all welfare principles. By comparison, farms '9', '10', and '11' held ranks six to eight in both aggregations (Table 1, Table 2). As a consequence, all four principles should be evaluated to arrive at a holistic welfare assessment in an overall welfare score.
The derivation of the overall scores was transparent, as "AniFair" provided the Shapley value, the interaction indices, and all interim results, such as pre-aggregation results and information on how the individual farms scored in the criteria. With this, farmers could be shown their current animal welfare status compared to other farms. As the criteria scales were made homogeneous and the relative importance of the criteria is known, "AniFair" results were a solid basis for targeted advice to the farms with low rankings on the criterion in which it was most pressing to improve the welfare status. In a discourse between farmers, animal welfare scientists and politicians, the theoretical aspects of measuring animal welfare, combined with the overall welfare scores of real farms, could confirm or revise animal welfare assessments. A reliable overall welfare score could serve as a basis for the certification of animal welfare labels, which allow consumers to choose their animal-related products with regard to good animal welfare.
Creation of criteria tree. The welfare criteria 'Absence of prolonged hunger' and 'Absence of prolonged thirst' (main article Section 2.2.1) were assessed via the measures body condition score of sows ('BCS_S'), age of weaning ('Age_of_weaning'), and water supply ('Water_supply'). 'BCS_S' was divided into the second level criteria (subcriteria) 'BCS_S_1' and 'BCS_S_2'. At data collection, the percentages of sows scored '1' and '2' were calculated for every farm. The criterion 'Age_of_weaning' was assessed as the averaged number of days from birth to weaning as stated by the farmer. The criterion 'Water_supply' was given by a binary decision on whether the drinking places for sows and piglets were adequate regarding the cleanliness and functionality of all drinkers (score '0') or not (score '2'). From these criteria, a criteria tree was built in the "AniFair" main window, which is fully displayed in Figure S.1.
All criteria that were selected by the 'DA' buttons were marked in bold red font. Hitting the 'Proceed calculation' button opened a GUI window where the User had to decide upon the in/dependence of subcriteria (main article Section 3.1, Independent and dependent subcriteria) and confirm the choices before further processing could be carried out (Figure S.2).

Making criteria comparable. Figure S.3 illustrated the definitions of the performance levels for the decision criteria. As a next step, the matrices of judgment could be filled in (Appendix: Background of 'Making criteria comparable'). The evaluations of the differences of attractiveness for all criteria can be viewed in Listing Exported.1, together with the export of the criteria tree and the definition of the performance levels. All Listings are presented in the Appendix: Data exported from "AniFair" of this Supplementary material. Based on these user preferences, "AniFair" scales were calculated. These were precardinal scales, i.e. the distances between the entries on the scale mirror the qualitative attributes with which the User evaluated the pairwise differences between performance levels. However, the relative differences of attractiveness as experienced by the User might not be represented by the distances between entries on the "AniFair" scales. That was why the User was asked to modify the scales after inspecting the graphical visualization. Figure S.4 illustrated the "AniFair" scales on the left and the final criteria scales after user modification on the right, using 'Age_of_weaning' and 'Water_supply' as examples. For instance, the User felt that the performance level '28-24.5' of criterion 'Age_of_weaning' needed to be scored closer to the maximum score 100, associated with the performance level '>28', than "AniFair" had suggested. Thus, the User modified the scale for 'Age_of_weaning' by raising the score for '28-24.5' via the spin buttons of the thermometer. "AniFair" internally calculated boundaries (Appendix: Background of 'Making criteria comparable', Dependent intervals) for the modification of the scale to prevent the user preferences entered earlier from being violated.

Figure S.4. In the left column of (a)/(b) and (c)/(d), the scales are displayed in the graphic and the thermometer as calculated by "AniFair". In the right column, the scales have been modified by the User so that his or her concept of relative attractiveness between criteria is fulfilled (Appendix: Background of 'Making criteria comparable', Matrices of judgment and scale calculation, Dependent intervals). Boundaries within which the previously made evaluations of differences of attractiveness remain fulfilled are calculated internally to make sure that the displayed scales are always consistent with the user defined preferences. (a), (b) For the criterion 'Age_of_weaning', the score for the performance level '24.5' has been raised to the boundary, which is why the corresponding marker has changed appearance and color from circle to crossed square and from green to red. (c), (d) For 'Water_supply', the score for '0' has exemplarily been lowered from 100.0 to 90.0 and the score for '2' has been raised from 0.0 to 19.9.

of animal welfare (Figure S.5(a)). Figure S.5(b) showed that all interaction indices had been set greater than zero. As the welfare of pigs was sensitive towards prolonged hunger as well as prolonged thirst, the importance of the union of criteria was considered larger than the importance of single criteria, and thus all criteria interact positively (complementary criteria). Furthermore, the User considered it necessary that the interaction indices of the pairs of criteria coincide. These constraints were defined via the pre-order of interaction indices (Figure S.5(c)). Figure S.6 shows that the ranking of farms with regard to the welfare principle 'Good feeding' had changed from rank 2 onwards. Furthermore, the Shapley value and the interaction indices had been adapted according to the defined constraints. The final results as well as the constraints were then exported to a txt file and a csv file (Listings Exported.3 and Exported.4 in Appendix: Data exported from "AniFair").
'Multiple instances' version and aggregation of instances. 'Good feeding' and the remaining welfare principles 'Good housing', 'Good health', and 'Appropriate behaviour' were run in the 'Multiple instances' version of "AniFair" (main article Section 3.4). As no capacity solution existed for 'Good health' and a weighted mean was calculated instead, these results were displayed in Figure S; all other results and user entered information regarding the welfare principles 'Good housing', 'Good health', and 'Appropriate behaviour' were not illustrated, as this article aims at presenting the "AniFair" software tool rather than at a detailed discussion of the welfare of pigs. For the aggregation of instances, the User was presented with the type of results available for each instance (main article Section 3.4, Figure 8(b)). As no Choquet integral values existed for 'Good health', the unweighted mean was chosen for all welfare principles in this example for the sake of homogeneity. The results of the aggregation prior to the definition of constraints were displayed in Section 3.4, Figure 8(c) of the main article. The following constraints had been defined additionally: all interaction indices were limited to the interval [0, 1] to enforce complementary interaction between the welfare criteria, and all Shapley indices were set equal via the pre-order of the Shapley indices. The results as well as the associated Shapley value and interaction indices were exported to a txt file (Listing Exported.5 in Appendix: Data exported from "AniFair") and can be viewed in Figure S.8.
As a result, the thirteen farms were assigned scores for all individual welfare principles and an overall evaluation of the welfare standard. A ranking was formed that reflects the relative importance of the criteria and principles, respectively. As the scores were made comparable and displayed together with the final scores, targeted advice could be given to the farms with low rankings on the criterion or principle in which it was most pressing to improve the welfare status of the animals. All decisions could be looked up in the exported files and served as a basis for discussion for animal welfare experts.
Saving and reloading "AniFair" status. Up to three '{SAVE}' buttons could be found in "AniFair". With these buttons, the current "AniFair" status could be saved. This included the OoI, criteria, subcriteria, information on which criteria are 'DA', information on the (in)dependence of subcriteria, bases of comparison, performance levels, matrices of judgment, "AniFair" scales and dependent intervals. In contrast to the export of user entered information, scales or results to txt files, these "AniFair" status files were not designed for analysis or to be human readable, but to reload information into "AniFair". Every "AniFair" instance was equipped with a 'LOAD' button to restore all information of a respective "AniFair" status. Afterwards, criteria could be added without compromising any loaded information. However, the deletion of a criterion could compromise the mapping between the criteria and the information on performance levels, matrices of judgment, scales and dependent intervals, in which case "AniFair" might be obliged to ignore this information. Alteration of criteria also caused "AniFair" to neglect the information on the respective criteria.