An Experimental Comparison between Self- and Third-Party Evaluations

How to identify the true meaning of messages exchanged in the laboratory is an important issue for experimental research. The present study investigates, by experimentally comparing self- and third-party evaluations, to what extent self-evaluations by message receivers can be relied on. After a standard public-good game, subjects receive a free-form written message from their counterparts evaluating their decision, and they self-evaluate its content. Third-party evaluators also evaluate the content independently. A comparison between the two sets of evaluations shows that a significant proportion of them agree. Firm evidence of a self-serving bias cannot be found.


Introduction
Non-restricted communication plays an important role in economic decision-making. The experimental literature continues to accumulate evidence showing that the behavior of subjects differs significantly depending on whether they are allowed to send messages to each other or not (e.g., Cooper and Kagel [1]; Charness and Dufwenberg [2]; Schotter and Sopher [3]; Kimbrough et al. [4]; Sutter and Strassmair [5]). That is, cheap talk is not so cheap after all. One important issue here is how to identify the true meaning of messages exchanged in the laboratory. Message receivers adjust their behavior according to the content of a message, and message senders anticipate this when they write the message. Therefore, to understand why such a message has a significant impact, researchers have to reproduce the message receiver's true interpretation of it.
Message receivers' self-reports look sound, but they may produce a severe self-serving bias (e.g., Babcock et al. [6]; Babcock and Loewenstein [7]; Offerman [8]); i.e., receivers may, for some reason, interpret a message in their own favor when they are asked to evaluate it. For example, they may want to justify their behavior to preserve their self-image. Alternatively, they may want to appear nice to the experimenter. By contrast, third-party evaluations may be somewhat more objective but may differ from the interpretation that the message receiver made during the experiment.
However, very little is known about the extent to which subjects' self-evaluations and third-party evaluations agree or disagree. This study provides evidence on this question. Using the message exchange experiment described below, the subjects' self-evaluation of a message they received is compared with third-party evaluations of it. A significant proportion of both evaluations agree, suggesting that the subjects' self-evaluations are, at least to some extent, reliable. Firm evidence of a self-serving bias cannot be found.

Message Exchange Experiment
The message exchange experiment, which comprises two stages, is described as follows. In the first stage, paired subjects play a standard public-good game; they simultaneously decide how much of a 20-unit endowment to invest in the public good. A zero investment is the dominant strategy, while investing the whole endowment leads to a Pareto-efficient allocation (see the appendix for a detailed description of the voluntary contribution mechanism used in the experiment). In the second stage, after the decision of their partners has been revealed, subjects write a free-form message evaluating their partners' contribution and send it to them. The message is typed on a keyboard rather than handwritten. After the subjects have sent their message, it is displayed on their partners' computer screens. After confirming the content of the message they received, the subjects classify it into one of the following three evaluation indexes: positive, neutral, or negative.
The experiment was conducted at Osaka University. Twenty subjects participated in each of two sessions.¹ The experiment required approximately one hour, and the average payoff per subject was $23.61.
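The incentive structure of the first stage can be illustrated with a minimal sketch of a two-player linear voluntary contribution mechanism. The actual payoff function is given in the appendix and is not reproduced here; the function name `payoff` and the marginal per-capita return of 0.75 are assumptions chosen only so that a zero investment is dominant and full investment is Pareto-efficient, as the text states.

```python
ENDOWMENT = 20   # units per subject, from the text
MPCR = 0.75      # hypothetical marginal per-capita return (not from the paper)


def payoff(own: int, other: int, mpcr: float = MPCR) -> float:
    """Payoff of a subject who invests `own` while the partner invests `other`.

    Assumed linear form: units kept privately plus a share of the public good.
    """
    if not (0 <= own <= ENDOWMENT and 0 <= other <= ENDOWMENT):
        raise ValueError("investments must lie between 0 and the endowment")
    return (ENDOWMENT - own) + mpcr * (own + other)


# Dominance: whatever the partner does, investing nothing pays at least as much.
zero_is_dominant = all(
    payoff(0, other) >= payoff(own, other)
    for other in range(ENDOWMENT + 1)
    for own in range(ENDOWMENT + 1)
)

# Efficiency: both subjects earn more under full investment than under none.
full_beats_none = payoff(ENDOWMENT, ENDOWMENT) > payoff(0, 0)

print(zero_is_dominant, full_beats_none)
```

With two players, any return strictly between 0.5 and 1 produces the same tension between individual and joint interest; the value 0.75 is only a convenient example.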

Third-Party Evaluations of Messages
After the message exchange experiment, an additional 12 students were employed as third-party evaluators. After receiving a detailed description of the message exchange experiment, they simultaneously and independently classified the messages actually written in the experiment, according to their content, into the same three evaluation indexes. Among the 12 evaluators' decisions on each message, the most popular one was adopted as the third-party evaluation.²
Comparing self- and third-party evaluations of a message, we define a self-serving bias as follows.
Definition 1. We say that a self-serving bias exists if (i) the self-evaluation of a message is neutral or positive but the third-party evaluation of it is negative or (ii) the self-evaluation of a message is positive but the third-party evaluation of it is neutral.
In other words, if the receiver of a message interprets it more positively than third-party evaluators do, a self-serving bias is indicated. Of course, we can also consider the opposite bias, a sort of self-discipline bias.³
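The aggregation of the 12 evaluators' decisions into a single third-party evaluation can be sketched as a plurality vote. The function name `third_party_evaluation` and the example votes are hypothetical, and the handling of ties (returning `None`) is an assumption, since this section does not describe a tie-breaking rule.

```python
from collections import Counter

LABELS = ("positive", "neutral", "negative")


def third_party_evaluation(votes):
    """Return (most popular label, number of votes for it) for one message.

    A tie for first place yields (None, votes); this tie rule is an
    assumption of the sketch, not something stated in the paper.
    """
    counts = Counter(votes)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    ranked = counts.most_common()  # sorted by vote count, highest first
    top_label, top_votes = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_votes:
        return None, top_votes
    return top_label, top_votes


# Hypothetical ballots from 12 evaluators for a single message.
example = ["positive"] * 7 + ["neutral"] * 3 + ["negative"] * 2
print(third_party_evaluation(example))
```

Returning the vote count alongside the label keeps the information needed later when asking whether the size of the majority matters.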
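Definition 1 and its mirror image amount to comparing the two evaluations on an ordered scale. The following minimal sketch makes that explicit; the numeric ranks and the function name `classify` are illustrative choices, not notation from the paper.

```python
# Ordered scale for the three evaluation indexes (ranks are illustrative).
RANK = {"negative": -1, "neutral": 0, "positive": 1}


def classify(self_eval: str, third_party_eval: str) -> str:
    """Label a (self-evaluation, third-party evaluation) pair.

    'self-serving'    : the receiver reads the message more positively
                        (cases (i) and (ii) of Definition 1)
    'self-discipline' : the receiver reads the message more negatively
    'agreed'          : both evaluations coincide
    """
    diff = RANK[self_eval] - RANK[third_party_eval]
    if diff > 0:
        return "self-serving"
    if diff < 0:
        return "self-discipline"
    return "agreed"


print(classify("positive", "neutral"))   # case (ii) of Definition 1
print(classify("neutral", "negative"))   # case (i) of Definition 1
print(classify("negative", "positive"))  # the opposite bias
```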

Comparison between Self- and Third-Party Evaluations
Figure 1 shows the linkage between subjects' relative contributions compared to their partners' and the types of messages they received and rated. The horizontal axis denotes the relative difference between the contributions of paired subjects, and the vertical axis denotes the frequency of each type of message. For example, three subjects contributed 15 units more than their counterparts did and received a positive message. As seen in Figure 1, subjects who contributed more than their counterparts did tended to receive a positive message, and those who contributed less, a negative message. However, the relationship is not simply linear.⁴
This nonlinear relationship seemingly implies that the messages are distorted by their receivers, but this is not true. Table 1 compares the evaluations of messages by the subjects themselves and those by third-party evaluators. The data in parentheses, those with underlines, and those in brackets represent self-serving evaluations, agreed evaluations, and self-discipline evaluations, respectively. Self-serving and self-discipline evaluations account for 17.5% and 7.5%, respectively, and the remaining 75% are agreed evaluations. At least the experimental data show that a significant proportion of self-evaluations are reliable in the sense that they match third-party evaluations. Firm evidence of a self-serving bias (and that of the opposite bias) could not be found.
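As a quick arithmetic check, the reported shares can be converted back into message counts. The sketch assumes 40 rated messages, that is, one message per subject across the two sessions of 20 subjects; this total is inferred from the design and is not stated explicitly in the text.

```python
from fractions import Fraction

N_MESSAGES = 40  # assumption: 2 sessions x 20 subjects, one rated message each

# Shares reported in the text, kept as exact fractions to avoid rounding noise.
shares = {
    "self-serving": Fraction("0.175"),
    "agreed": Fraction("0.75"),
    "self-discipline": Fraction("0.075"),
}

counts = {label: share * N_MESSAGES for label, share in shares.items()}

for label, count in counts.items():
    print(label, count)
```

Under this assumption, each share corresponds to a whole number of messages and the three counts add up to the total, so the percentages are mutually consistent.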

Does Majority Opinion Matter?
In addition to the direct comparison of self- and third-party evaluations presented above, a further investigation was conducted on the agreement between both evaluations. In the analysis so far, the most popular evaluation among third-party evaluators was adopted as the average opinion and the number of votes in its favor was neglected. Here, we investigate the relationship between the number of votes for the most popular evaluation and the probability that both evaluations agree. Specifically, the following probit model was estimated:
\[ \Pr(\mathit{Agree}_{ij} = 1) = \Phi\big(\beta_0 + \beta_1 \mathit{Votes}_{ij} + \beta_2 \max\{0,\, c_i - c_j\} + \beta_3 \max\{0,\, c_j - c_i\}\big), \]
where $\Phi$ is the standard normal distribution function and the dependent variable $\mathit{Agree}_{ij}$ is a dummy, which equals 1 if subject i's self-evaluation of a message received from subject j agrees with third-party evaluations and 0 otherwise. The independent variables are the number of votes for the most popular evaluation of the message by third-party evaluators, $\mathit{Votes}_{ij}$, and the absolute positive and negative differences between the contributions $c_i$ and $c_j$ of subjects i and j.
The results are summarized in Table 2. First, as is apparent from Table 2, the number of votes for the most popular evaluation does not have a significant effect on the agreement between self- and third-party evaluations (p = 0.780). Intuitively, the more difficult a message is for third-party evaluators to judge, the more likely it is that the receiver's evaluation would be influenced by a self-serving bias. However, this possibility was ruled out. Considering Result 1, self-evaluations accord well with third-party evaluations, even for a message on which opinions are divided among third-party evaluators.
Second, although the absolute positive and negative differences between contributions were not significant at the 10% level (p = 0.196 and 0.206, respectively), the estimated coefficients for both were small positive values. Some subjects and third-party evaluators might use this information, in addition to the content of a message, to evaluate it.
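To make the specification concrete, the sketch below builds the regressors for one observation and evaluates a probit probability and log-likelihood using only the standard library. The inclusion of an intercept, the function names, the coefficient vector, and the example data are all assumptions for illustration; they are not the estimates reported in Table 2.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def regressors(votes: int, c_i: int, c_j: int):
    """Intercept, vote count, and positive/negative contribution differences."""
    return (
        1.0,                        # intercept (assumed)
        float(votes),               # votes for the most popular evaluation
        float(max(0, c_i - c_j)),   # i contributed more than j
        float(max(0, c_j - c_i)),   # i contributed less than j
    )


def agree_probability(beta, x) -> float:
    """Probit probability that self- and third-party evaluations agree."""
    return std_normal_cdf(sum(b * v for b, v in zip(beta, x)))


def log_likelihood(beta, rows, outcomes) -> float:
    """Probit log-likelihood; probabilities are clipped to keep log() finite."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(rows, outcomes):
        p = min(max(agree_probability(beta, x), eps), 1.0 - eps)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


# Hypothetical observations: (votes, own contribution, partner's contribution).
rows = [regressors(9, 15, 5), regressors(6, 5, 15), regressors(12, 10, 10)]
outcomes = [1, 0, 1]               # 1 = evaluations agree (made-up data)
beta = (0.0, 0.0, 0.0, 0.0)        # placeholder coefficients, not estimates

print(rows[0])
print(log_likelihood(beta, rows, outcomes))
```

Maximizing `log_likelihood` over `beta` with a numerical optimizer would yield probit estimates; a statistics package would normally be used for that step and for the p-values.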

Conclusions
The present study provides evidence on the extent to which subjects' self-evaluations of a message they received can be relied on. The experimental data confirm that (i) their self-serving bias is not large and (ii) their self-evaluations accord well with third-party evaluations, even when a message is relatively difficult for third-party evaluators to judge. A positive interpretation of these results implies that experimental researchers can consider subjects' self-evaluations during an experiment as reliable data for analytical purposes, at least to some extent. However, even though the number of observations is small, some subjects interpret a message more positively, but not more negatively, than third-party evaluators do. When and how often cheating occurs is still an open question, left for future research.
Finally, a limitation of this study should be mentioned. The discussion thus far implicitly assumes that third-party evaluators evaluate messages objectively and neutrally, at least to some extent. However, once their psychological factors as human beings are taken into account, this may not always be true. For example, many studies point out that people sometimes behave spitefully, that is, their behavior is intended to make others suffer (monetary or nonmonetary) losses (e.g., Jensen [9]; Leibbrandt and López-Pérez [10]). By doing so, they can derive pleasure from others' misfortune. This behavior is believed to be due to aversion to inequity and/or competitive spirit. If a third-party evaluator has such a spiteful preference, he/she may feel jealous of a player who has received praise for his/her good behavior, and consciously give him/her a low evaluation.⁵ To verify the neutrality of third-party evaluations and its robustness, a systematic investigation is needed.

Figure 1. Relative contributions and the types of messages.

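Table 1. Self- and third-party evaluations of messages.
Negative [1] [0] 8
Note: The data in parentheses, those with underlines, and those in brackets represent self-serving evaluations, agreed evaluations, and self-discipline evaluations, respectively.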

Table 2. Results of the probit regression.
Notes: The dependent variable equals 1 if a subject's self-evaluation agrees with the third-party evaluation, and 0 otherwise. The numbers in parentheses represent p-values.