An Experimental Analysis of the Assessment and Perception of Behavior Change: How Summary Measures Influence Sensitivity to Change Processes

doi:10.4236/psych.2013.41001

Psychology

2013. Vol.4, No.1, 1-10

Published Online January 2013 in SciRes (http://www.scirp.org/journal/psych) http://dx.doi.org/10.4236/psych.2013.41001

Anselma G. Hartley1, Jack C. Wright1, Audrey L. Zakriski2, Anne N. Banducci3

1Department of Cognitive, Linguistic, & Psychological Sciences, Brown University, Providence, USA

2Department of Psychology, Connecticut College, New London, USA

3Department of Psychology, University of Maryland, College Park, USA

Email: Anselma_Hartley@brown.edu

Received October 6th, 2012; revised November 6th, 2012; accepted December 4th, 2012

A series of experiments examined how summary assessment measures influence people’s ability to detect

change in behavior over time and across situations. Two measures that are often used to assess child be-

havior (Teacher Report Form) and adult personality (Five Factor Inventory) were examined. Each instru-

ment led perceivers to focus on the overall frequency of targets’ behavior, even when targets differed both

in how they reacted to social events and in how often they experienced those events in their interactions

with others. Although people adopted an overall frequency perspective when using summary measures,

they detected changes in events and targets’ if … then … reactions to events when using alternative con-

text-specific measures. The findings demonstrate how summary trait methods can shift perceivers’ atten-

tion away from situational factors and thereby yield trait scores that are insensitive to context-specific but

potentially important changes in targets’ social behavior.

Keywords: Personality; Social Perception; Assessment; Behavior Change; Social Context

Introduction

A potential conflict exists between the way people think

about personality and how researchers assess it. On the one

hand, researchers often emphasize the breadth and stability of

traits and therefore use personality measures that aggregate

over variability that may occur over time and situations

(Mischel, 2009; Watson, 2004). On the other hand, social cog-

nition research suggests that people incorporate situational

information into their personality impressions (Kammrath,

Mendoza-Denton, & Mischel, 2005; Smith & Collins, 2009).

Despite the widespread use of “summary” trait measures in

both child and adult assessment, little research has explored

how social perceivers use them under laboratory conditions in

which the relevant inputs can be isolated and manipulated. The

present research illustrates how such methods can deepen our

understanding of how summary trait measures influence per-

ceivers’ sensitivity to personality change. In this paradigm, we

create targets who show different patterns of change over time

in their social environments and in how they respond to them.

We examine the possibility that summary trait measures lead

perceivers to focus on overall behavior rates and to de-empha-

size contextual information they might otherwise use. We test

the further implication that this emphasis on overall frequencies

leads raters to report that target behavior is stable over time

even when targets show clear changes in how they respond to

specific social situations.

Summary approaches have a long tradition in child and adult

assessment. On widely used child measures (e.g., Teacher Re-

port Form or TRF, Achenbach & Rescorla, 2001), an adult

typically rates how well brief statements describe the child.

Many of these statements focus on the frequency of behaviors

(“teases a lot,” “threatens people”), some include trait adjec-

tives (“stubborn”), and less often they refer to the context in

which the behaviors occur (“disobedient at school”, “defiant,

talks back”). Popular “Big Five” measures used to assess adult

personality (e.g., NEO-PI-R and the NEO-Five Factor Inven-

tory or FFI, Costa & McCrae, 1992) also include behavior fre-

quency statements (e.g., “seldom sad or depressed”), trait ad-

jectives (“is a cheerful, high-spirited person”), and statements

that explicitly refer to behavior in context (“if he doesn’t like

people, he lets them know it”). Although these child and adult

measures vary in how their items were generated and how often

they refer to contexts, they share an essential feature: Both

aggregate into summary scales that do not reveal what these

contexts are, how often they occur, or how responses to them

may vary. Such measures thus focus on mean-level behavior

tendencies, and do not reveal individual differences in how

people respond to specific contexts (Cervone, 2005; Cervone,

Shadel, & Jencius, 2001).

Alternative models incorporate context into personality as-

sessment by examining if … then … links between events that

occur in a person’s social environment (e.g., if provoked) and

their reactions to them (e.g., then hostile) (Vansteelandt & Van

Mechelen, 1998; Wright & Mischel, 1987). Studies adopting

such approaches have demonstrated that personality is revealed

not simply through overall trait or behavior levels, but through

an individual’s contextualized patterning of trait-relevant be-

havior (Fournier, Moskowitz, & Zuroff, 2008; Hartley, Zakriski,

& Wright, 2011; Hoffenaar & Hoeksma, 2002; Smith, Shoda,

Cumming, & Smoll, 2009). A complementary line of “socially

situated” cognition research proposes that context plays an

A. G. HARTLEY ET AL.

important role in social perception and judgment (Reeder,

Monroe, & Pryor, 2008; Smith & Collins, 2009). Although

early studies on the “fundamental attribution error” (Ross, 1977)

argued that situational influences are often ignored, subsequent

research found that people do incorporate contextual informa-

tion into their personality judgments, but when and how they do

so depends on several factors (Gilbert & Malone, 1995). For

example, people have difficulty integrating situational influ-

ences into their dispositional judgments when the salience of

the stimuli is low and cognitive load is high (Chun, Speigel, &

Kruglanski, 2002). People’s ability to process behavioral and

situational information also depends on their statistical knowl-

edge and investment in the target (Schaller, 1992), and on their

affective state (Hunsinger, Isbel, & Clore, 2011).

Despite considerable field research using summary measures

(Gresham et al., 2010; Terracciano, McCrae, & Costa, 2009),

little work has examined how perceivers use them under con-

trolled laboratory conditions. Social cognition research has used

experimental methods to study people’s use of situational in-

formation (Chun et al., 2002; Kammrath et al., 2005; Trope &

Gaunt, 2000), yet this work has not examined how summary

trait measures influence what people encode in their ratings.

Some researchers have claimed that summary measures are

implicitly contextualized by the respondent even when items

lack explicit contextual cues (Tellegen, 1991; Wood & Roberts,

2006), and are therefore sensitive to reaction patterns (Denissen

& Penke, 2008). For example, items that contain trait adjectives

(e.g., “thoughtful and considerate”, “is a cheerful, high spirited

person”) might lead the rater to infer the situations that are most

relevant and to judge how the target reacts when those situa-

tions are encountered. However, we are unaware of an experi-

mental test of this idea. Other researchers have speculated that

summary methods lead people to rely on global representations

lacking in specific time or setting cues (Schwarz & Oyserman,

2011). Support for this argument is found in studies showing

that summary measures lead people to ignore conditional if …

then … links between events and reactions and focus instead on

overall act frequencies (Wright et al., 2001). In the present

study, we test the idea that summary measures—including

popular child behavior measures and adult five-factor meas-

ures—are designed to assess overall behaviors, do this well, but

in doing so miss changes in how people respond to specific

social situations.

We extended past work in several ways. First, rather than

focusing on a single time point, we created targets that changed

over time, both in how often they encountered events (“event

rates”) and in the conditional probability of their responses to

them (“reaction rates”). In Studies 1-2ab, peer provocation and

adult discipline were the focal events and aggression was the

focal reaction, as these are relevant to child assessment (Dirks,

Treat, & Weersing, 2007). This yielded two targets who

showed “converging” changes in event rates and reaction rates

(i.e., both decreased or both increased), and thus their overall

rates of aggression increased or decreased. The two other tar-

gets showed “diverging” changes: One experienced an increase

in aversive events, but became less likely to respond aggres-

sively to them; the other experienced a decrease in aversive

events, but became more likely to respond aggressively. These

targets are especially interesting because they show opposite

changes in event and reaction rates, yet show no change in

overall aggression rates. If summary measures track only over-

all rates, as we have proposed, they should distinguish between

targets whose overall rates differ, but fail to distinguish be-

tween targets who show opposite reaction change but constant

overall behavior rates. If, on the other hand, these measures are

implicitly contextualized as others have suggested, they should

distinguish between targets whose reactions to events changed

over time, even if their overall behavior rates did not.

Second, we used both child and adult targets, and we exam-

ined both popular measures for studying child behavior (TRF;

Achenbach & Rescorla, 2001) and adult personality (NEO-FFI;

Costa & McCrae, 1992). In each of our experiments, partici-

pants used the instrument to rate the target at the end of one

period of observation, and then again at the end of a second

period. Studies 1-2ab focused on aggressive behaviors of chil-

dren that are relevant to the TRF, and Study 3 focused on (dis)

agreeable behaviors of adults that are relevant to the agreeable-

ness domain on the FFI. Guided by past theorizing and evi-

dence (Schwarz & Oyserman, 2011; Wright et al., 2001), we

hypothesized that relevant scales on the TRF (aggression) and

FFI (agreeableness) would be sensitive to changes in targets’

overall behavior rates, but insensitive to differences between

the diverging targets whose reactions changed in opposite di-

rections.

Third, we examined whether participants can detect changes

in rates of eliciting events and changes in targets’ conditional

reactions to them, even if this is not evident when they use

summary trait measures. Based on people’s sensitivity to con-

text at a single time point (Chun et al., 2002; Wright et al.,

2001), we predicted that participants’ open-ended descriptions

of targets would refer not only to their overall behavior tenden-

cies, but also to events targets encountered and their event-

specific reactions. We further expected that participants would

differentiate between the diverging targets when explicitly

asked to estimate how often targets encountered events and the

conditional probability of their reactions to those events. Be-

cause people can have difficulty judging conditional probabili-

ties (see Fox & Levav, 2004), we examined how two response

formats—a typical rating format (e.g., Vansteelandt & Van

Mechelen, 1998) versus a frequency-count estimation format

(Gigerenzer, 2008)—influenced their performance. Support for

these hypotheses would indicate that widely used summary

assessment methods divert people’s attention away from situa-

tion-specific changes in behavior they otherwise notice and

thereby yield ratings that reflect only targets’ overall behavior

frequencies.

Study 1

We first examined change over time. Using a 2 (event rate) ×

2 (reaction rate) × 2 (phase) design, we manipulated whether a

target child experienced an increase or decrease in the probabil-

ity of aversive events (“event rates”), and an increase or de-

crease in the conditional probability of aggressive behavior

when those events occurred (“reaction rates”). We hypothesized

that the TRF is primarily sensitive to base-rates, and thus

should be influenced by all factors that contribute to overall

behavior (i.e., events and reactions), and not just by targets’

reaction rates. Thus, the TRF should be unable to distinguish

between the functionally diverging targets even though one

showed an increase in aggressive reactions to aversive events

and one showed a decrease.

Method

Participants

Forty-three undergraduates from the pool in an introductory

psychology class participated at Brown University. Three were

removed: two who completed materials out of order, and one

who did not understand the instructions. This yielded a sample

of 40 (20 M, 20 W, Mage = 19.2 years, SD = 1.17). All studies

reported were approved by Brown University’s Institutional

Review Board.

Materials

The experimental stimuli were based on Wright et al. (2001),

but described the target at two points. The target was identified

as a fictitious 11-year-old boy (“Dan”) in a residential summer

program. Participants viewed 32 vignettes that described the

target at the beginning of the summer (Phase 1) and 32 that

described him 9 weeks later (Phase 2). Four targets were cre-

ated. One encountered an increase in aversive events and

showed an increase in aggressive reactions to those events

(E+/R+) (“+” = increase). The second showed a decrease in

both event rates and reaction rates (E−/R−) (“−” = decrease).

The third encountered an increase in aversive events, but

showed a decrease in aggressive reactions (E+/R−). The fourth

had the reverse arrangement (E−/R+).

Each vignette, presented for 9 seconds on an otherwise blank

computer screen, described the setting and an interaction be-

tween Dan and another person. The setting, agent, agent action,

target name, and response appeared in the same order. Events

consisted of aversive peer events (tease, threaten), aversive

adult events (warn, discipline), nonaversive peer events (proso-

cial talk, ask), and non-aversive adult events (prosocial talk,

ask/instruct). Reactions were aggressive or nonaggressive. An

example of a peer aversive event with an aggressive reaction is:

“In the dining hall a boy says, ‘Shut up and give me your des-

sert.’ Dan replies, ‘No, you shut up. I want it.’” An example of

an adult aversive event with a non-aggressive reaction is: “In

swimming, a counselor says, ‘You better not go past that green

rope.’ Dan says, ‘Okay, I won’t.’”

Table 1 shows the probabilities of aversive events, p(E), the

conditional probabilities of aggressive reactions to those events,

p(R|E), and the corresponding frequencies. The probabilities of

aversive events are obtained by dividing the number of aversive

events per phase by the total number of vignettes per phase (32).

Conditional probabilities of aggressive reactions are obtained

by dividing the number of aggressive behaviors to aversive

events by the number of aversive events encountered. The

overall probability or “base rate” of aggressive behaviors, p(R)

is obtained by p(E)*p(R|E); this is equivalent to the number of

aggressive behaviors per phase divided by the total number of

vignettes per phase. The converging E+/R+ and E−/R− targets

showed increases (or decreases) both in aversive events and in

aggressive reactions to them, and therefore their base rates of

aggression increased (or decreased) over phases. The diverging

E−/R+ and E+/R− targets (rows 2 - 3) differed in the condi-

tional probability of their aggressive reactions to aversive

events, but had equal base rates of aggression at each phase.
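The arithmetic behind these probabilities can be sketched in a few lines of Python. This is only an illustration built from the frequencies listed in Table 1; the names `TARGETS`, `phase_probabilities`, and `base_rates` are ours, and the sketch assumes, as the product p(E)*p(R|E) implies, that aggressive behavior in the stimuli occurs only in response to aversive events.

```python
from fractions import Fraction

VIGNETTES_PER_PHASE = 32

# (aversive events, aggressive reactions to those events) for Phase 1 and
# Phase 2, taken from the frequencies in Table 1.
TARGETS = {
    "E-/R-": [(24, 18), (8, 2)],
    "E-/R+": [(24, 6), (8, 6)],
    "E+/R-": [(8, 6), (24, 6)],
    "E+/R+": [(8, 2), (24, 18)],
}


def phase_probabilities(n_events, n_reactions, n_total=VIGNETTES_PER_PHASE):
    """Return p(E), p(R|E), and the base rate p(R) for one phase."""
    p_e = Fraction(n_events, n_total)            # aversive events / vignettes
    p_r_given_e = Fraction(n_reactions, n_events)  # aggressive reactions / aversive events
    p_r = p_e * p_r_given_e                      # base rate of aggression
    return p_e, p_r_given_e, p_r


def base_rates(target):
    """Base rates p(R) at Phase 1 and Phase 2 for one target."""
    return [phase_probabilities(e, r)[2] for e, r in TARGETS[target]]


if __name__ == "__main__":
    for name, phases in TARGETS.items():
        cells = []
        for n_events, n_reactions in phases:
            p_e, p_re, p_r = phase_probabilities(n_events, n_reactions)
            cells.append(f"p(E)={float(p_e):.2f} p(R|E)={float(p_re):.2f} p(R)={float(p_r):.2f}")
        print(name, " | ".join(cells))
```

Exact fractions make the key property of the design easy to see: the two diverging targets have the same base rate (6/32) at both phases even though their event rates and reaction rates move in opposite directions.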

Dependent Measures

Table 1.
Properties of the four experimental targets for all studies.

             Phase 1                                   Phase 2
Condition    p(E)         p(R|E)       p(R)            p(E)         p(R|E)       p(R)
E−/R−        .75 (24/32)  .75 (18/24)  .56 (18/32)     .25 (8/32)   .25 (2/8)    .06 (2/32)
E−/R+        .75 (24/32)  .25 (6/24)   .19 (6/32)      .25 (8/32)   .75 (6/8)    .19 (6/32)
E+/R−        .25 (8/32)   .75 (6/8)    .19 (6/32)      .75 (24/32)  .25 (6/24)   .19 (6/32)
E+/R+        .25 (8/32)   .25 (2/8)    .06 (2/32)      .75 (24/32)  .75 (18/24)  .56 (18/32)

Note: p(E) = probability of aversive event; p(R|E) = probability of aggressive reaction to aversive event; p(R) = base-rate probability of aggressive behavior. Note that p(R) = p(E) × p(R|E). “+” indicates increase; “−” indicates decrease in event or reaction rate. E = event; R = reaction. Values in parentheses indicate frequencies on which probabilities and conditional probabilities were based; for p(E) and p(R), the denominator is always the total number of vignettes per phase (32), and for p(R|E), the denominator is the number of aversive events per phase.

Open-Ended Descriptions. Participants read, “You’ve just read about Dan during the first week of June (second week of August) in the residential summer program. Please describe in a few sentences what was most important about Dan and the summer program during that time.”

Teacher Report Form. As in Wright et al. (2001), we used a

subset of the 118 items from the 1993 version of the TRF

(Achenbach, 1993) to avoid fatigue. Specifically, we used the

scale that was most relevant to this study (aggression, 25 items)

and a contrast scale (withdrawal, 9 items), with “school”

changed to “camp” for our stimuli. An example of an aggres-

sion item is “argues a lot”; an example of a withdrawal item is

“unhappy, sad, or depressed.” Items were rated using the TRF’s

0 - 2 scale. Test-retest reliability of the TRF aggression and

withdrawal scales in field studies is reported to be .89 and .85

respectively when the interval is 2 - 3 weeks (Achenbach,

Howell, McConaughy, & Stanger, 1995). The TRF aggression

scale correlates modestly but significantly with classroom ob-

servations of verbal aggression and disruptive behavior (Henry,

2006).

Perceived Overall Change. Participants rated changes in

Dan’s “overall behavior”, “behavior toward peers”, and “be-

havior toward counselors”. These were averaged into an “over-

all target change” scale (α = .96). Next, they rated how peers’

and adults’ overall “behaviors towards Dan changed.” These

were averaged into an “overall social environment change”

scale (α = .96). All items used a 7-point scale (1 = much worse,

7 = much improved).

Behavior, Event, and Reaction Measures. To clarify whether participants detected overall behavior rates, event rates, and reaction rates at each phase, we used items that corresponded as closely as possible to the stimuli. Participants first rated the

overall frequency of the target’s aggressive and prosocial be-

haviors shown during Phase 1 using 4 items (e.g., “Dan argued

or quarreled”, “talked politely/made friendly requests”). They

then rated how often Dan encountered aversive and non-aversive events at Phase 1, using 4 items (e.g., “peers teased, threatened, or bossed Dan”, “adults complimented/made friendly

requests”). Next, they rated the target’s reactions given that

some event occurred, using 16 items (4 events × 4 reactions).

Participants read, “Indicate how often Dan showed each reac-

tion to the event described.” After each of 4 event prompts (“If

a peer teased, threatened, or bossed Dan …”), the participant

rated how often the target showed a reaction to it (e.g., “he

argued or quarreled”); the wording of the reaction was the same

as the wording of the behaviors noted above. Participants then

rated the behaviors, events, and reactions that were shown dur-

ing Phase 2. All items were rated on a 6-point scale (0 = never,

5 = almost always).

Procedure

Participants were run in groups of 1-4 on separate computers

and were randomly assigned to condition, to which the experi-

menter was blind. Using the dependent measures just described,

participants completed these steps, in order: 1) read 32 vi-

gnettes for Phase 1, each for 9 s; 2) open-ended description and

TRF; 3) 32 vignettes for Phase 2; 4) repeat step 2; 5) overall

perceived change; 6) additional ratings of behavior, events, and

reactions seen at Phase 1 and at Phase 2. To avoid contaminat-

ing the TRF, it was administered before measures that men-

tioned events or reactions.

Preliminary Analyses

Participants’ open-ended responses were coded as follows. 1)

“Overall behavior”: An uncontextualized statement about a

prosocial, neutral, or aggressive behavior or disposition without

a specified eliciting event (e.g., “Dan was friendly”). 2)

“Event”: A statement about a positive, neutral, or aversive

event without a specified response (e.g., “People were nice to

Dan”). 3) “Reaction”: A prosocial, neutral, or aggressive be-

havior in response to a positive, neutral, or aversive event (e.g.,

“Dan was friendly when others were nice to him”). Agreement

between the first author and a coder who was blind to condition

was acceptable (average κ = .80).

Additional analyses examined how perceived overall change

measures (see previous) compared with other measures. The

perceived overall change scale correlated highly with the cal-

culated TRF aggression change (r = .88, p < .001), and the

perceived overall social environment change scale correlated

highly with the calculated event change score (r = .93, p < .001).

To avoid redundancy, perceived overall change analyses are not

presented.

Results and Discussion

Open-Ended Descriptions

Although the open-ended descriptions were not our main fo-

cus, we examined the Phase 1 descriptions to clarify partici-

pants’ perceptions before they were affected by the TRF. Based

on past research (Kammrath et al., 2005), we predicted that

participants would not only describe overall behavior tenden-

cies, but also describe events and conditional reactions to them.

We calculated percentages by dividing the number of state-

ments in each category for each participant by the total number

of codeable statements for that participant. As predicted, par-

ticipants used all statement types, with nonsignificant differ-

ences in their mean relative frequency: uncontextualized be-

havior statements (40%), event statements (32%), and reaction

statements (28%), F(2, 72) = 2.15, p > .1. We also found a

statement type × reaction condition interaction, F(2, 72) = 6.18,

p < .005, η2 = .15. In conditions with low reaction rates at

Phase 1, uncontextualized behavior statements were more fre-

quent (52%) than event statements (26%) or reaction statements

(22%), whereas in conditions with high reaction rates at Phase

1, statement types differed less (28%, 38%, and 34%, respec-

tively). We found a similar pattern when analyses were re-

stricted to statements about aggressive behaviors; details can be

obtained from the first author.

Summary Trait Assessment

We expected that the TRF would detect changes in overall

behavior rates, but not distinguish between the functionally

diverging targets whose overall rates were equal. Specifically,

we predicted that TRF aggression ratings would decrease over

phase for the E−/R− condition, increase for the E+/R+ condi-

tion, and remain unchanged for the diverging conditions

(E−/R+, E+/R−).

As shown in Figure 1, the results supported this prediction.

A 2 (event) × 2 (reaction) × 2 (phase) ANOVA, with phase as a

repeated measure, revealed the expected reaction condition × phase interaction, F(1, 36) = 56.99, p < .001, η2 = .61. Also as expected, we found an interaction between event condition and phase, F(1, 36) = 70.24, p < .001, η2 = .66. (In all repeated-measures

analyses, significance tests were based on Greenhouse-Geisser

adjustments.) We also found a small unexpected effect for

phase, F(1, 36) = 5.52, p < .05, η2 = .13; TRF aggression rat-

ings were slightly higher overall at Phase 1 than Phase 2. No

other effects were expected or found.
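As a quick consistency check, the reported effect sizes can be recovered from the F ratios and their degrees of freedom. The sketch below assumes that η2 here denotes partial eta-squared (the text does not say so explicitly) and applies the standard identity η2 = (F × df_effect) / (F × df_effect + df_error). The function name and the `REPORTED` dictionary are ours.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared implied by an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F ratio, reported eta-squared) for Study 1, all with (1, 36) degrees of freedom.
# The event value is the F listed for Study 1 in Table 2.
REPORTED = {
    "reaction x phase": (56.99, 0.61),
    "event x phase": (70.24, 0.66),
    "phase": (5.52, 0.13),
}

if __name__ == "__main__":
    for effect, (f_value, reported) in REPORTED.items():
        implied = partial_eta_squared(f_value, 1, 36)
        print(f"{effect}: implied eta2 = {implied:.2f}, reported = {reported:.2f}")
```

Under that assumption, each implied value rounds to the effect size reported for the same test.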

To simplify subsequent analyses, we computed change

scores (Phase 2 - Phase 1), which were then submitted to a 2

(event condition) × 2 (reaction condition) ANOVA. Figure 2(A)

presents mean TRF change in standardized form (z-scores); this

was solely to permit graphical comparisons with other measures

with different natural metrics, and otherwise had no effect on

any findings we report. Our predictions and findings necessarily parallel those just explained, though they are now expressed as

change scores. We found the expected main effects for event

and reaction condition (Table 2) and the expected Tukey’s

HSD comparisons (Figure 2(A)). As predicted, the TRF was

sensitive to changes in overall behavior, but not to the event or

reaction changes that contributed to those rates. As shown in

Figure 2(A), the diverging conditions (E−/R+, E+/R−; see

middle bars) with identical overall behavior rates in the stimuli

did not differ for TRF aggression despite the fact that one in-

creased in aggressive reactions and the other decreased.

The preceding analyses used categorical predictors (condi-

tion), and do not fully reveal how participants’ ratings were

predicted by the base-rates of aggressive acts in the stimuli.

Recall that values for p(R) can be derived by multiplying p(E)

and p(R|E) as shown in Table 1. Because this (equal) weighting

yields the base rates, we expected it to best predict the TRF

aggression ratings. It is also possible that participants were

more influenced by the probability of encountering events, or

by the conditional probability of reactions to them. To test this,

we attached weights between .01 - .99 (in increments of .01) to

each component and computed predicted values. With w as the

event weight, and 1 − w for the reaction weight, the predicted

values were [w_i p(E) + (1 − w_i) p(R|E)]/2.

Table 2.
F-tests and effect sizes for ANOVAs of Teacher Report Form (TRF) ratings, event judgments, and reaction judgments, for Studies 1-2ab.

                    TRF              Event            Reaction
Study   Source      F       η2       F        η2      F        η2
1       Reaction    56.99   .61      10.77    .23     126.54   .78
        Event       70.24   .66      137.38   .79     42.42    .54
        R × E       .32     .01      1.56     .04     1.85     .05
2a      Reaction    40.90   .53      12.46    .26     92.89    .72
        Event       47.02   .57      154.74   .81     25.85    .42
        R × E       2.39    .06      8.17     .19     1.19     .03
2b      Reaction    90.75   .72      8.87     .20     50.78    .59
        Event       94.78   .73      45.25    .56     .95      .02
        R × E       .03     .00      .02      .00     .08      .00

Note: R × E = Reaction × Event interaction. Degrees of freedom were (1, 36) for all studies. All F’s > 7.40 (12.83) were significant at p < .01 (.001); all other F’s shown were p > .05.

[Figure 1: line plot of Mean TRF (ordinate) by Phase (abscissa), with one line per condition: E−/R− (*), E−/R+, E+/R−, E+/R+ (*).]

Figure 1.
Mean Teacher Report Form (TRF) aggression ratings by phase, for Study 1. Experimental conditions are shown next to each line. Error bars indicate +/− 1 SEM. Asterisks indicate significant differences across phase (ps < .001).

[Figure 2: six panels. Top row, ordinate Mean Difference (z), with the four experimental conditions on the abscissa for Study 1, Study 2a, and Study 2b: (A) TRF Means, (B) Event Means, (C) reaction measure. Bottom row, abscissa Weighted Cues (0 to 1.0), ordinate R-Squared (0 to 1.0): (D) TRF R-Square, (E) Event R-Square, (F) Reaction R-Square.]

Figure 2.
Results for Teacher Report Form (TRF), event, and reaction measures for Studies 1 (S1), 2a (S2a), and 2b (S2b). Top row (panels (A)-(C)) shows mean change scores for each measure (standardized within study). Experimental conditions are on the abscissa. Bars within a panel that do not share a subscript (a)-(d) are significantly different based on Tukey’s HSD. Error bars indicate +/− 1 SEM. Bottom row (panels (D)-(F)) shows cue weight analysis results for TRF, event, and reaction judgments, respectively. A “weighted cue” value of 0 on the abscissa represents a full weighting of events; 1 represents a full weighting of reactions. The ordinate shows the R2 values for predictions of participants’ ratings for phases 1 and 2 combined. Dotted lines indicate hypothetical perfect sensitivity to act-frequencies (AF), events (EV), and reactions (RE).

For each weighted set, we computed scores from these values, used them to predict

participants’ deviation from their mean TRF aggression rating

over the two phases, and computed R2. If participants showed

perfect sensitivity to the base rate of aggression, a peak R2 of

1.0 would occur at equal weighting of events and reactions (.50

on the abscissa; see line “AF” in Figure 2(D)). Perfect sensitiv-

ity to events is shown by line “EV” in Figure 2(E); perfect

sensitivity to reactions is shown by line “RE” in Figure 2(F).

As expected, results for the TRF resembled the theoretically

perfect AF curve in Figure 2(D) (see “S1” for Study 1), and

were best modeled (R2 = .81) when event rates (.55) and reac-

tion rates (.45) were nearly equally weighted.
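One possible reading of this cue-weight analysis can be sketched as follows. The sketch is not the authors’ code and uses no participant data. It sweeps the event weight w from .01 to .99, builds predicted values [w p(E) + (1 − w) p(R|E)]/2 from the Table 1 probabilities, expresses them as deviations from each target’s two-phase mean, and computes R2 against the judgments of a hypothetical idealized rater. All function names, and the choice to define R2 as a squared Pearson correlation, are our assumptions.

```python
# Stimulus probabilities from Table 1: (p(E), p(R|E)) at Phase 1 and Phase 2.
STIMULI = {
    "E-/R-": [(0.75, 0.75), (0.25, 0.25)],
    "E-/R+": [(0.75, 0.25), (0.25, 0.75)],
    "E+/R-": [(0.25, 0.75), (0.75, 0.25)],
    "E+/R+": [(0.25, 0.25), (0.75, 0.75)],
}


def deviations(values):
    """Deviation of each phase value from the target's mean over phases."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def predicted_deviations(w):
    """Weighted-cue predictions, with w as the event weight."""
    out = []
    for phases in STIMULI.values():
        cues = [(w * p_e + (1 - w) * p_re) / 2 for p_e, p_re in phases]
        out.extend(deviations(cues))
    return out


def ideal_deviations(kind):
    """Judgments of a hypothetical rater who tracks one quantity perfectly."""
    pick = {
        "base_rate": lambda p_e, p_re: p_e * p_re,  # overall act frequency
        "events": lambda p_e, p_re: p_e,            # event rate only
    }[kind]
    out = []
    for phases in STIMULI.values():
        out.extend(deviations([pick(p_e, p_re) for p_e, p_re in phases]))
    return out


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy ** 2 / (sxx * syy)


def best_weight(observed):
    """Event weight on the .01-.99 grid that maximizes R2."""
    grid = [i / 100 for i in range(1, 100)]
    return max(grid, key=lambda w: r_squared(predicted_deviations(w), observed))


if __name__ == "__main__":
    for kind in ("base_rate", "events"):
        print(kind, best_weight(ideal_deviations(kind)))
```

For the idealized base-rate rater, R2 peaks at equal weighting of events and reactions, which is the pattern the text describes for perfect sensitivity to act frequencies. For the idealized event rater, the peak moves to the largest event weight on the grid.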

Event Judgments

We examined participants’ judgments of events using the

same method as for the TRF. We predicted that event judg-

ments would show increases in the E+ conditions and decreases

in the E− conditions. As expected, the largest effect was the

main effect for event condition (Table 2), with judged event

change higher on average for the E+ conditions and pairwise

comparisons showing discrimination between the functionally

diverging conditions (Figure 2(B)). We also found a smaller,

unexpected main effect for reaction condition, with judged

event change higher on average for R+ conditions. As shown in

Figure 2(B), the mean change for the E+/R− condition, though

in the expected direction, was lower than one would expect if

participants’ event ratings were influenced only by events. As

shown in Figure 2(E), results for participants’ event judgments

resembled the theoretical results (see line “EV”) and were best

modeled (R2 = .80) when the weight was high for event rates (w

= .78) and low for reaction rates (.22).

Reaction Judgments

Parallel analyses were performed for judgments of aggres-

sive reactions to aversive events. We expected participants to

be sensitive to changes in targets’ reaction rates and for their

ratings to increase in the R+ conditions and decrease in the R−

conditions. As expected, the largest effect was the main effect

for reaction condition (see Table 2), with pairwise comparisons

showing discrimination between the diverging conditions (Fig-

ure 2(C)). However, we also found a main effect for event

condition; the marginal mean was higher for E+ conditions. As

shown in Figure 2(C), the mean changes for the diverging

conditions (E−/R+, E+/R−), were not as large as one would

expect if reaction ratings were influenced only by reaction rates.

As shown in Figure 2(F), reaction ratings were best modeled

(R2 = .82) when the weights were less extreme (w = .63 for

reactions, .37 for events) than was found for event judgments.

Compared to the results for event judgments, these results do

not correspond as closely to the theoretically perfect results (see

line “RE”).

Summary

As expected, the TRF aggression scale was sensitive to changes in the overall rate of targets’ aggression. It did not detect differences between targets whose base rates were unchanged, even though one of them increased in aggressive reactions and the other decreased. Although participants focused on act frequencies when using the TRF, they detected how often targets encountered events and their conditional reactions to those events when context-sensitive measures were used. This occurred even though they provided these judgments at the end of the experiment, when memory demands were high. Participants’ reaction judgments were influenced more than anticipated by how often the targets encountered relevant events.
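Why a pure act-frequency measure cannot separate the diverging targets can be seen with a little arithmetic. The sketch below assumes a simplified target whose aggressive acts occur only as reactions to aversive events, so the expected count of acts is n × p(E) × p(R|E). The rates of .25 and .75 and the 32 interactions per phase are illustrative values, and the function name is hypothetical.

```python
def expected_acts(p_event, p_reaction_given_event, n_interactions=32):
    """Expected number of aggressive acts, under the simplifying assumption
    that acts occur only as reactions to aversive events."""
    return n_interactions * p_event * p_reaction_given_event


# Diverging targets: event rate and reaction rate move in opposite directions.
high_events_low_reactions = expected_acts(p_event=0.75, p_reaction_given_event=0.25)
low_events_high_reactions = expected_acts(p_event=0.25, p_reaction_given_event=0.75)

# Converging targets: event rate and reaction rate move in the same direction.
both_high = expected_acts(p_event=0.75, p_reaction_given_event=0.75)
both_low = expected_acts(p_event=0.25, p_reaction_given_event=0.25)
```

Under these assumptions both diverging targets yield the same expected count, so a measure that tracks only overall frequency has nothing to distinguish them, whereas the two converging targets differ sharply.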

Studies 2a-b

One interpretation of participants’ relative difficulty judging reaction rates is that the changes they observed violated their expectations about the stability of behavior over time. For example, some studies suggest that temporal stability is high relative to the cross-situational consistency of behavior (Fleeson, 2001), and that people over-rely on the former when making judgments about personality (Mischel & Peake, 1982). Study 2a therefore examined whether participants’ judgments would be more sensitive to reaction changes when targets’ behavior varied across settings (i.e., classrooms) rather than over time as in Study 1. A second interpretation is that judging reactions to events is more complex than judging overall behavior rates or event rates. Past research demonstrates that people have difficulty interpreting conditional probabilities (Fox & Levav, 2004) and that formally equivalent tasks may be easier when they are presented in a frequency-count format (Gigerenzer, 2008). To address these questions, Study 2b reformatted the event and reaction dependent measures into a frequency-count format and asked participants to provide separate estimates of how often events and relevant reactions to those events occurred.

Method

Participants. For Study 2a, 40 students (23 W, 17 M, Mage = […].22 years, SD = 3.50) participated, and for Study 2b, 40 (21 W, 19 M, Mage = 22.92 years, SD = 3.82) participated. Participants in both studies were recruited from the Brown University community through flyers advertising a “psychology study” and were paid $8 for volunteering.

Materials and procedure. For Studies 2a-b, stimuli were nearly identical to those in Study 1, but minor revisions were made to describe cross-situational change rather than temporal change. Whereas Study 1 described the target’s behavior at two distinct points in time (June and August), Studies 2a-b described the target’s behavior in two classroom settings (art and music). Otherwise, the specific events and reactions described were the same as those used in Study 1.

Study 1 used items from the 1993 TRF to determine if the findings from Wright et al.’s (2001) study of behavior at a single time point extended to behavior change. Studies 2a-b used items from the 2001 TRF to determine if our results generalize to the more recent version of the instrument. The aggression scales in the two versions are similar, with 19 of the 20 items in the 2001 version also appearing in the 1993 version (see Achenbach & Rescorla, 2001). The remaining dependent measures in Study 2a were identical to those used in Study 1, with minor word changes to ask about cross-situational change. For example, when participants were asked about the target’s behavior at Phase 1, the word “June” was changed to “art class”; likewise for Phase 2, “August” was changed to “music class.”

Study 2b was identical to Study 2a, except that the behavior, event, and reaction measures were changed from a rating format (see Study 1, Method) into a frequency-count format. Participants were first asked to report the overall frequency of the target’s behaviors, or n(R), at Phase 1 and Phase 2. The program required that participants’ answers be between 0 and 32. The same format was used for event judgments, n(E). Using the n(E) estimate provided, the reaction prompt read, “You reported that peers teased Dan [n(E)] times. Out of those [n(E)] times, how many times did Dan respond by arguing or quarreling?”; we refer to this as n(R ∩ E), where ∩ denotes the intersection of reactions and events. Answers were required to be between 0 and the n(E) previously estimated. We computed the conditional probability of a reaction given an event (“computed reaction”) as

p(R|E) = n(R ∩ E)/n(E).
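The computed reaction measure is simple to express in code. The following sketch mirrors the constraints described for the frequency-count format (counts between 0 and 32, and n(R ∩ E) no larger than n(E)). The function name and the choice to return `None` when no events were reported are assumptions, since the text does not say how that case was handled.

```python
def computed_reaction(n_events, n_reactions_to_events, max_count=32):
    """Return p(R|E) = n(R ∩ E) / n(E), or None when n(E) is 0."""
    if not 0 <= n_events <= max_count:
        raise ValueError("n(E) must be between 0 and max_count")
    if not 0 <= n_reactions_to_events <= n_events:
        raise ValueError("n(R ∩ E) must be between 0 and n(E)")
    if n_events == 0:
        return None  # assumption: the conditional probability is undefined
    return n_reactions_to_events / n_events


# Hypothetical estimates: 24 teasing events at Phase 1, 6 followed by arguing.
phase1 = computed_reaction(24, 6)
# Hypothetical estimates: 8 teasing events at Phase 2, 6 followed by arguing.
phase2 = computed_reaction(8, 6)
change = phase2 - phase1
```

In this made-up case the number of aggressive reactions is the same in both phases, yet the computed reaction rises because the reactions are spread over fewer events.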

Results and Discussion

As predicted, the results for TRF ratings for Studies 2a and 2b were similar to those in Study 1 and again supported the hypothesis that the TRF would be sensitive to overall behavior rates, and not detect changes in diverging targets. The main effects (Table 2), pairwise comparisons (Figure 2(A)), and cue weighting analyses (Figure 2(D)) were similar to those for Study 1. As expected, TRF ratings for Studies 2a-b were best predicted (R² = .77 and .82, respectively) when weights for events (w = .50) and reactions (.50) were equal, as would occur for ideal act frequency sensitivity.

The results for Study 2a again supported the hypothesis that participants would be sensitive to changes in events. The expected main effect for event condition was obtained, as was a smaller effect for reaction condition (Table 2). As expected, participants detected the difference between the event rates for the E+/R− and E−/R+ targets, but again they were also somewhat affected by reaction rates. Participants’ event ratings (Figure 2(E)) were best predicted (R² = .83) when the weight was high for event rates (w = .75) and low for reaction rates (.25), as expected.

For reaction judgments, the expected main effect for reaction condition was found, as was the now familiar, smaller main effect for event condition (Table 2). Change for the diverging conditions (E−/R+, E+/R−) was differentiated (Figure 2(C)), but less clearly than one would expect if reaction ratings were solely influenced by reaction rates. Reaction judgments (Figure 2(F)) were best predicted (r = .78) when the weight was higher for reaction rates (w = .64) than for event rates (.36). Thus, the results essentially replicated those in Study 1; the cross-setting format of Study 2a did not measurably affect participants’ sensitivity to reaction change.

Although the cross-setting format did not seem to increase participants’ sensitivity to reaction change, we expected the frequency-count format used in Study 2b to increase participants’ sensitivity to event rates and reaction rates by decoupling the conditional probability format of the reaction rating task. For event judgments, we found the expected main effect for event condition (Table 2), and change scores for the diverging conditions (E−/R+, E+/R−) were in the expected direction (Figure 2(B)). However, mean change was less extreme than expected for both diverging conditions (E+/R−; E−/R+), and participants demonstrated slightly less sensitivity to events using this response format. Compared to Studies 1-2a, event judgments were predicted (R² = .60) by a weighted combination of events (w = .65) and reactions (.35) (see Figure 2(E)).

In contrast, the frequency-count format did increase participants’ sensitivity to reaction change. The computed conditional probabilities were uniquely influenced by the actual conditional probabilities of targets’ reactions (Table 2). As shown in Figure 2(C), the means for the diverging conditions (E−/R+, E+/R−) were different and now comparable to the converging conditions with corresponding reaction change (E+/R+, E−/R−). The cue weight analysis (Figure 2(F)) showed that the reaction measure was best predicted when the reaction weight was relatively high (w = .88) and the event weight was low (.12). However, Figure 2(F) also reveals that the means in the converging conditions were less extreme and the reaction measures more variable (i.e., standard errors larger) than in previous studies, resulting in a lower peak R² value (.59).

Summary. As in Study 1, in Studies 2a-b, TRF ratings were predicted by the actual base-rates of aggressive acts, and did not distinguish between targets who showed equal overall change, but opposite changes in aggressive reactions. As in Study 1, participants’ event judgments were sensitive to actual event rates, though they were somewhat influenced by reaction rates. For Study 2b, event judgments were influenced by actual event rates, but were noisier when the frequency-count format was used. In contrast, the frequency-count format in Study 2b improved participants’ sensitivity to reaction change: Conditional probabilities derived from participants’ frequency estimates were influenced solely by changes in the conditional probabilities of targets’ reactions. These results indicate that people can assess change in reactions but have some difficulty under the conditions we created, and improve when the frequency-count format is used.

Study 3

One might argue that our findings for the child assessment method (TRF) do not apply to widely-used adult personality measures (e.g., NEO-FFI; Costa & McCrae, 1992). As we have noted, some researchers have argued that five-factor measures may emphasize behavior frequencies less and allow observers to give greater weight to targets’ conditional reactions (see Wood & Roberts, 2006) and therefore detect reaction patterns (Denissen & Penke, 2008). If so, the FFI could distinguish between our functionally diverging, but act-frequency equivalent targets. We suggest, however, that the majority of the FFI’s items are act frequency in nature, and we therefore predicted that the FFI, like the TRF, would be primarily affected by changes in the frequency of targets’ trait-relevant behaviors. Study 3 therefore focused on the FFI domain of agreeableness and created stimuli that were structurally identical to those used in Studies 1-2ab, but described a college student showing (dis)agreeable reactions to (non)aversive events. Although agreeableness (A) was the main interest, all domains were analyzed. We expected other domains that were relevant to our stimuli, extraversion (E) and neuroticism (N), to behave similarly to agreeableness, and not distinguish between functionally diverging targets. We made no predictions for openness (O) and conscientiousness (C), as these behaviors were not the focus of the study.

Method

Thirty-nine undergraduates (23 W, 16 M, Mage = 19.21 years, SD = 1.10) from an introductory psychology pool participated.

Stimuli had the same event and reaction rates as in Study 1, but described a 19-year-old sophomore, and focused on agreeableness. Because the target was an adult, interactions involved only peers (rather than peers and adults). An example of an aversive event paired with a disagreeable reaction is: “Dan’s lab partner says, ‘I don’t want to do the analyses in the way we agreed.’ Dan replies, ‘Tough. We’re doing it my way and I’m not changing my mind.’” The dependent measure was the 60-item NEO-FFI (Costa & McCrae, 1992).

Results and Discussion

FFI scale scores were primarily sensitive to changes in act frequencies. As shown in Figure 3(A), the three traits most relevant to the experiment (A, E, N) showed results that were similar to those for the TRF in Studies 1 and 2. There were main effects for reaction condition, F’s(1, 35) > 2.56, ps < .001, η²’s = .37 (N), .54 (E), and .74 (A), main effects for event condition, F’s(1, 35) > 39.36, ps < .001, η²’s = .53 (N), .61 (E), .63 (A), and no significant interactions nor discrimination between functionally diverging targets. As predicted, participants’ A, E, and N ratings were best predicted by a weighted combination of events (.45, .54, .59, respectively) and reactions (.55, .46, .41) (Figure 3(B)), which were all similar to the ideal act frequency result. For O, there was a main effect for reaction condition, F(1, 35) = 19.86, p < .001, η² = .36, and for C a main effect for event condition, F(1, 35) = 15.01, p < .001, η² = .3. Although the R² values for O and C were lower than for the other traits, O ratings were better predicted by reactions (.61) than by events (.39), whereas the C ratings were better predicted by events (.75) than by reactions (.25).

General Discussion

This research used an experimental approach to examine the perception and assessment of behavior change. Three main findings emerged. First, two instruments that are widely used in child and adult assessment enabled raters to detect changes in overall behavioral tendencies, but did not enable them to distinguish between targets who showed opposite changes in their trait-relevant reactions to events. Second, in both temporal (Study 1) and cross-situational paradigms (Study 2a), participants were sensitive to changes in the social events the target encountered. Third, participants were sensitive, but somewhat less so, to the conditional probability of targets’ reactions to those events when explicitly asked to assess them. These results support the view that popular child and adult summary measures assess overall behaviors rather than reactions. They also demonstrate that such measures can show stability even when changes occur in people’s reactions to events, and illustrate how people’s perceptions of change may diverge from conclusions based on their own summary trait ratings.

We have noted that people might “implicitly contextualize” items on child behavior checklists and adult personality inventories, even though most items in such measures do not explicitly identify the context in which a behavior may occur (see Denissen & Penke, 2008; Tellegen, 1991; Wood & Roberts, 2006). In this view, the rater infers the situations that are most relevant and focuses on the target’s conditional responses to those situations. We predicted, however, that these measures would primarily assess overall behaviors and show little sensitivity to people’s reaction patterns. Our results supported this prediction and provided little evidence of implicit contextualization for either of the measures we studied. The aggression scale on the child measure (TRF) distinguished between the targets based on their overall behavior frequencies. However, it did not distinguish between targets who showed opposite patterns of change in their social environments and how they reacted to them. Likewise, domain scores on the adult measure (FFI) also appeared to be primarily sensitive to overall behavior and did not distinguish between changes that originated in the environment versus those that originated in the target’s reactions.

The summary instruments we examined were built on the assumption that personality is stable and enduring, and therefore focus on mean-level behaviors rather than situational influences (see Cervone et al., 2001). In this regard, our results show that the TRF and FFI capture precisely what they were designed to capture: overall behavior. However, our results also highlight the tradeoffs associated with this emphasis on overall […] from changes in the social situations they encounter.

Figure 3. Results for NEO-FFI for Study 3. Panel A (FFI Difference Scores) shows mean change scores for agreeableness (A), extraversion (E), neuroticism (N), openness (O), and conscientiousness (C). Experimental conditions (E−/R−, E+/R−, E−/R+, E+/R+) are on the abscissa. Bars within a panel that do not share a subscript (a)-(c) are significantly different based on Tukey’s HSD. Error bars = +/− 1 SEM. Panel B (FFI R-Squared) shows cue weight analysis for FFI judgments for A, E, N, O, and C. AF = hypothetical perfect sensitivity to act-frequencies.

Our studies also illustrate how summary measures could show that behavior is stable over time or across settings even when an individual shows clear changes in how they respond to social stimuli. These findings suggest that research on change over time and across settings (see Helson, Jones, & Kwan, 2002; Terracciano et al., 2009) should not over-rely on summary trait or behavior measures, but should also incorporate measures that explicitly examine people’s reaction patterns and the make-up of their social environments.

Overall, our findings from the event and reaction rating tasks indicate that, given the right assessment format, participants can report on events and reactions when asked. However, they also indicate that judgments about reactions, p(R|E), may be inherently more difficult than overall frequency judgments because they require the perceiver to encode how often an event occurred as well as how often a behavior co-occurred with it. We attempted to improve participants’ performance in Study 2b by decomposing the task into its two frequency components: participants first estimated the frequency of aversive events, n(E), and then estimated the frequency of aggressive acts to those events, n(R ∩ E). We then computed conditional probabilities from these two estimates in the usual fashion, p(R|E) = n(R ∩ E)/n(E). These derived estimates were affected uniquely by the actual conditional probabilities of targets’ reactions in the stimuli, and were not influenced by how often targets encountered events, as was found in Studies 1 and 2a. A key challenge for future research is to determine the task formats that best enable people to disentangle event rates and reaction rates, but that are as simple and efficient as possible.

Interpreting participants’ difficulty in judging reactions requires careful attention to our procedure. The reaction measure in Studies 1-2ab was administered for both Phase 1 and 2 after participants had filled out the TRFs. Completing the act frequency task first may have framed all subsequent measures in the experiment and may have influenced participants to think more as “act frequentists” rather than “contextualists” (see Schwarz & Oyserman, 2011; Wright et al., 2001). Findings from the open-ended assessments provide some support for this interpretation. Participants’ initial descriptions of the targets, which were provided before they were influenced by other measures at Phase 1, not only used uncontextualized behavior statements, but also used simple event statements and conditional if … then … statements about event-reaction links.

Limitations of our studies should be noted. First, although our experimental approach answers questions about how summary assessments measure change, our manipulations for the event and reaction change parameters were larger (.25/.75) than might typically be observed in natural settings. Additional laboratory studies will be needed to examine how the TRF, FFI, and other summary measures (e.g., BFI; John, Donahue, & Kentle, 1991) perform under a wider range of stimulus manipulations. It will also be important to examine measures that appear to give greater emphasis to children’s reactions to events (e.g., SSRS; Gresham & Elliott, 1990) and those that also focus on features of the social environment (e.g., Fournier et al., 2008).

Second, because our focus was on the TRF and FFI, other measures were either brief (e.g., open-ended descriptions) or were collected after all stimuli were shown. In contrast to other research on people’s use of contextual information (Chun et al., 2002; Wright et al., 2001), our studies required subjects to encode multiple interactions over two phases, and only then estimate events and conditional reactions at Phases 1 and 2. This put the retrospective event and reaction ratings at a disadvantage. However, field studies often involve even more challenging conditions, in which raters are asked to summarize more complex social interactions over much longer time periods. Clearly, additional research will be needed to answer questions about how people use information about situations and reactions under a wide range of stimulus complexity and memory load conditions (see Chun et al., 2002).

Overall, our findings suggest that instruments widely used to study personality change are efficient at assessing overall behavior change, but ill-equipped to capture nuanced, context-specific dispositional and environmental change processes. As a result, these measures may have difficulty revealing whether behavior change stems from changes in the person, the environment, or both. Given our findings that people are sensitive to changes in the environment and in people’s reactions (given the proper assessment format), it should be possible to develop measures that are more consistent with how people naturally encode behavior in context and that are better suited to assess the context-specific aspects of personality change. A major goal of future research in this area should be to deepen our understanding of the judgment processes that are engaged (or disengaged) when informants complete an assessment instrument, and use that knowledge to help improve the quality of assessment practices in research and applied settings.

Acknowledgements

This research was supported in part by award numbers R15MH076787 and 3R15MH076787-01S1 from the National Institute of Mental Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Mental Health or the National Institutes of Health. We are especially grateful to David Freestone, whose programming assistance made it possible to collect the data reported in Study 2b. We also thank Russell Church and Elena Festa Martino for their comments on earlier versions of this work.

REFERENCES

Achenbach, T. M. (1993). Empirically based taxonomy: How to use syndromes and profile types derived from the CBCL/4-18, TRF, & YSR. Burlington: University of Vermont.

Achenbach, T. M., Howell, C. T., McConaughy, S. H., & Stanger, C. (1995). Six-year predictors of problems in a national sample of children and youth: I. Cross-informant syndromes. Journal of the American Academy of Child & Adolescent Psychiatry, 34, 336-347. doi:10.1097/00004583-199503000-00020

Achenbach, T. M., & Rescorla, L. A. (2001). Manual for the ASEBA school-age forms & profiles. Burlington, VT: University of Vermont.

Cervone, D., Shadel, W. G., & Jencius, S. (2001). Social-cognitive theory of personality assessment. Personality and Social Psychology Review, 5, 33-50. doi:10.1207/S15327957PSPR0501_3

Cervone, D. (2005). Personality architecture: Within-person structures and processes. Annual Review of Psychology, 56, 423-452. doi:10.1146/annurev.psych.56.091103.070133

Chun, W. Y., Spiegel, S., & Kruglanski, A. W. (2002). Assimilative behavior identification can also be resource dependent: The unimodel perspective on personal-attribution phases. Journal of Personality and Social Psychology, 83, 542-555. doi:10.1037/0022-3514.83.3.542

Costa Jr., P., & McCrae, R. R. (1992). NEO PI-R Professional Manual. Odessa, FL: Psychological Assessment Resources, Inc.

Denissen, J. J. A., & Penke, L. (2008). Motivational individual reaction norms underlying the Five-Factor model of personality: First steps towards a theory-based conceptual framework. Journal of Research in Personality, 42, 1285-1302. doi:10.1016/j.jrp.2008.04.002

Dirks, M. A., Treat, T. A., & Weersing, V. R. (2007). The situation specificity of youth responses to peer provocation. Journal of Clinical Child & Adolescent Psychology, 36, 621-628. doi:10.1080/15374410701662758

Fleeson, W. (2001). Toward a structure- and process-integrated view of personality: Traits as density distributions of states. Journal of Personality and Social Psychology, 80, 1011-1027. doi:10.1037/0022-3514.80.6.1011

Fournier, M. A., Moskowitz, D. S., & Zuroff, D. C. (2008). Integrating dispositions, signatures, and the interpersonal domain. Journal of Personality and Social Psychology, 94, 531-545. doi:10.1037/0022-3514.94.3.531

Fox, C. R., & Levav, J. (2004). Partition-edit-count: Naive extensional reasoning in judgment of conditional probability. Journal of Experimental Psychology: General, 133, 626-642. doi:10.1037/0096-3445.133.4.626

Gigerenzer, G. (2008). Rationality for mortals: How people cope with uncertainty. Oxford: Oxford University Press.

Gilbert, D. T., & Malone, P. S. (1995). The correspondence bias. Psychological Bulletin, 117, 21-38. doi:10.1037/0033-2909.117.1.21

Gresham, F. M., Cook, C. R., Collins, T., Rasethwane, K., Dart, E., Truelson, E. et al. (2010). Developing a change-sensitive brief behavior rating scale as a progress monitoring tool for social behavior: An example using the social skills rating system-teacher form. School Psychology Review, 39, 364-379.

Gresham, F. M., & Elliott, S. N. (1990). Social skills rating system manual. Circle Pines: American Guidance Service.

Hartley, A. G., Zakriski, A. L., & Wright, J. C. (2011). Probing the depths of informant discrepancies: Contextual influences on divergence and convergence. Journal of Clinical Child & Adolescent Psychology, 40, 1-13. doi:10.1080/15374416.2011.533404

Helson, R., Jones, C., & Kwan, V. S. Y. (2002). Personality change over 40 years of adulthood: Hierarchical linear modeling analyses of two longitudinal samples. Journal of Personality and Social Psychology, 83, 752-766. doi:10.1037/0022-3514.83.3.752

Henry, D. B. (2006). Associations between peer nominations, teacher ratings, self-reports, and observations of malicious and disruptive behavior. Assessment, 13, 241-252. doi:10.1177/1073191106287668

Hoffenaar, P. J., & Hoeksma, J. B. (2002). The structure of oppositionality: Response dispositions and situational aspects. Journal of Psychology and Psychiatry and Allied Health Disciplines, 43, 375-385.

Hunsinger, M., Isbell, L. M., & Clore, G. L. (2011). Sometimes happy people focus on the trees and sad people focus on the forest: Context-dependent effects of mood in impression formation. Personality and Social Psychology Bulletin.

John, O. P., Donahue, E. M., & Kentle, R. L. (1991). The big five inventory--Versions 4a and 54. Berkeley, CA: University of California.

Kammrath, L. K., Mendoza-Denton, R., & Mischel, W. (2005). Incorporating if … then … personality signatures in person perception: Beyond the person-situation dichotomy. Journal of Personality and Social Psychology, 88, 605-618. doi:10.1037/0022-3514.88.4.605

Mischel, W. (2009). From personality and assessment (1968) to personality science. Journal of Research in Personality, 43, 282-290. doi:10.1016/j.jrp.2008.12.037

Mischel, W., & Peake, P. K. (1982). Beyond déjà vu in the search for cross-situational consistency. Psychological Review, 89, 730-755. doi:10.1037/0033-295X.89.6.730

Reeder, G. D., Monroe, A. E., & Pryor, J. B. (2008). Impressions of Milgram’s obedient teachers: Situational cues inform inference about motives and traits. Journal of Personality and Social Psychology, 95, 1-17. doi:10.1037/0022-3514.95.1.1

Ross, L. (1977). The intuitive psychologist and his shortcomings: Distortions in the attribution process. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 10). New York: Academic Press.

Schaller, M. (1992). In-group favoritism and statistical reasoning in social inference: Implications for formation and maintenance of group stereotypes. Journal of Personality and Social Psychology, 63, 61-74. doi:10.1037/0022-3514.63.1.61

Schwarz, N., & Oyserman, D. (2011). Asking questions about behavior: Self reports in evaluation research. In Melvin, M., Donaldson, S., & Campbell, B. (Eds.), Social Psychology and Evaluation. New York: Guilford Press.

Smith, E. R., & Collins, E. C. (2009). Contextualizing person perception: Distributed social cognition. Psychological Review, 116, 343-364. doi:10.1037/a0015072

Smith, R. E., Shoda, Y., Cumming, S. P., & Smoll, F. L. (2009). Behavioral signatures at the ballpark: Intraindividual consistency of adults’ situation-behavior patterns and their interpersonal consequences. Journal of Research in Personality, 43, 187-195. doi:10.1016/j.jrp.2008.12.006

Tellegen, A. (1991). Personality traits: Issues of definition, evidence and assessment. In W. Grove, & D. Cicchetti (Eds.), Thinking clearly about psychology: Essays in honor of Paul Everett Meehl (pp. 10-35). Minneapolis: University of Minnesota Press.

Terracciano, A., McCrae, R. R., & Costa Jr., P. (2009). Intra-individual change in personality stability and age. Journal of Research in Personality, 44, 31-37. doi:10.1016/j.jrp.2009.09.006

Trope, Y., & Gaunt, R. (2000). Processing alternative explanations of behavior: Correction or integration? Journal of Personality and Social Psychology, 79, 344-354. doi:10.1037/0022-3514.79.3.344

Vansteelandt, K., & Van Mechelen, I. (1998). Individual differences in situation-behavior profiles: A triple-typology model. Journal of Personality and Social Psychology, 75, 751-765. doi:10.1037/0022-3514.75.3.751

Watson, D. (2004). Stability versus change, dependability versus error: Issues in the assessment of personality over time. Journal of Research in Personality, 38, 319-350. doi:10.1016/j.jrp.2004.03.001

Wood, D., & Roberts, B. W. (2006). Cross-sectional and longitudinal tests of the personality and role identity structural model (PRISM). Journal of Personality, 74, 779-810. doi:10.1111/j.1467-6494.2006.00392.x

Wright, J. C., Lindgren, K. P., & Zakriski, A. L. (2001). Syndromal versus contextualized personality assessment: Differentiating environmental and dispositional determinants of boys’ aggression. Journal of Personality and Social Psychology, 81, 1176-1189. doi:10.1037/0022-3514.81.6.1176

Wright, J. C., & Mischel, W. (1987). A conditional approach to dispositional constructs: The local predictability of social behavior. Journal of Personality and Social Psychology, 53, 1159-1177. doi:10.1037/0022-3514.53.6.1159