Development and Validation of a Short Version of the Primary Scales of the Inventory of Personality Organization: A Study among Japanese University Students
1. Introduction
The concepts of borderline personality and borderline personality disorder have been much debated. The widely accepted Diagnostic and Statistical Manual of Mental Disorders, from its third to its fifth edition (DSM-III to DSM-5), bases both the definition and the operationalized diagnostic criteria on observable symptoms and behavioural characteristics. However, researchers from a psychodynamic perspective view the pathology of borderline personality in terms of intrapsychic dynamics such as defence patterns and object relationships. Kernberg (1975, 1984) introduced the structural concept of “borderline personality organization (BPO)”. This facilitates understanding of its organization and its association with assumed aetiology and treatments. In Kernberg’s (1975, 1984) model, the diagnosis of BPO is based on three structural dimensions: primitive defences (PD), identity diffusion (ID), and reality testing (RT) disturbance. People with PD are characterized by repetitive and inflexible maladaptive defence mechanisms including splitting, projective identification, and denial. ID is the lack of integration of the concept of the self or significant others; the narratives that people with ID give of themselves and others are chaotic. RT is the capacity to differentiate self from non-self and intrapsychic from external origins of perceptions and stimuli. People with borderline personality are likely to fail to make these distinctions.
In order to apply this concept in clinical and research settings, Kernberg and Clarkin (1995) developed a self-report measure of borderline personality organization: the Inventory of Personality Organization (IPO). The IPO is one of the most widely used scales in the world that can measure the borderline personality structure. The IPO was translated into Japanese and successful exploratory factor analyses (EFAs) were conducted (Igarashi et al., 2009). This was used for some clinical studies (Igarashi et al., 2010a, 2010b; Uji et al., 2013).
However, because this 57-item scale is lengthy, it imposes a burden on research participants. Short versions of the scale have been reported, including the French 20-item version (Verreault et al., 2013) and the German 16-item version (Zimmermann et al., 2013, 2015). The French 20-item version was developed using a sample of Quebec university students and adults (Normandin et al., 2002; Verreault et al., 2013). It consists of five, six, and nine items for measuring PD, ID, and RT, respectively. However, the process of selecting the 20 items was not specified. The goodness-of-fit of the French short version (CFI = 0.932) was not sufficient (Verreault et al., 2013) in that it was below the 0.95 cutoff recommended by Hu and Bentler (1999). The German version, on the other hand, was developed with a German clinical sample. It consists of five, six, and five items for PD, ID, and RT, respectively. The goodness-of-fit of the German short version (CFI = 0.913) was also not sufficient (Zimmermann et al., 2013).
Abridgement of the long version of a psychological measure has pros and cons. Critics have noted that long versions are often shortened without careful consideration (Coste et al., 1997; Kleka & Soroko, 2018; Koğaer, 2020; Schipolowski et al., 2014). Our rationale followed Stöber and Joormann’s (2001) study, which selected items for each domain that showed (a) high correlations with the total score of the measure, (b) high correlations with the score of the respective domain, and (c) high intercorrelations. This means that (a) the total score of the short version shows a high correlation with that of the long (original) form, (b) the items are representative of the respective domain subscale, and (c) the domain subscales show satisfactory Cronbach’s alpha. In addition, we considered that (a) the selected items of a short version should yield a factor structure corresponding to the theory of the measure, (b) the factor structure resulting from abridgement should be stable across groups with different demographic features (measurement invariance), as recommended by the COSMIN Study Design checklist (Mokkink et al., 2018), and (c) the subscale scores of the short version should show associations with the external validity measures similar to those of the long (original) version. Construct validity is the degree to which a test measures what it claims to be measuring (Cronbach & Meehl, 1955). In the case of borderline personality traits, we anticipated that they would be associated with state depression, malfunctioning coping styles, lower resilience and self-esteem, narcissistic traits, feelings of shame and guilt, a perceived parenting style of low care and overprotection during childhood, and social desirability.
The present study aimed to develop a short version of the IPO using data from Japanese university students that were used in our previous studies (Hiramura et al., 2008; Igarashi et al., 2010a, 2010b; Kitamura & Nagata, 2014; Liu et al., 2009; Sakata et al., 2013; Shikai et al., 2008, 2014; Takagishi et al., 2014; Uji et al., 2008, 2009a, 2009b, 2011, 2012, 2013). We also examined its configural, measurement, and structural invariance between men and women. The correlations of the IPO subscales with other variables were examined to assess construct validity.
2. Methods
2.1. Study Procedures and Participants
We used data that were reported previously (Igarashi et al., 2009). They came from a longitudinal study with a nine-wave, four-month, weekly follow-up on various psychological issues, conducted among a convenience sample of students from two universities in Kumamoto, Japan. Anonymity was assured and participation was voluntary. The data set consisted of 642 eligible students. The main focus was the 504 students who attended and responded to the survey at the 7th wave (when the IPO was included in the questionnaire). The male to female ratio was 120:383 (gender was missing for one student). Because some of the 642 students missed their classes, the number of students responding to the survey varied over time (Wave 1, n = 425; Wave 2, n = 433; Wave 3, n = 437; Wave 4, n = 433; Wave 5, n = 438; Wave 6, n = 436; Wave 7, n = 504; Wave 8, n = 436; and Wave 9, n = 441). Successive waves were separated by one to two weeks, except for the four weeks between Wave 7 and Wave 8.
2.2. Measurements
Borderline personality traits were measured using the IPO (Clarkin et al., 2001; Kernberg & Clarkin, 1995). This is a self-report instrument derived from the central concepts of Kernberg’s (1970, 1975) personality organization model. The IPO has 83 items rated on a 5-point scale (never true = 0 to always true = 4). The main domains are PD (16 items), ID (21 items), and RT (20 items). Two additional scales, Aggression (18 items) and Moral Values (8 items, along with two additional items in PD and one additional item in ID), were included by Clarkin et al. (2001). With permission from Dr. Clarkin, the original author (personal communication, 13 December 2002), the IPO was translated into Japanese by one of us (TK) and back-translated into English by an individual who was unaware of the original instrument, so that the accuracy of the Japanese translation was verified (Igarashi et al., 2009). The score of each scale was calculated using person mean substitution, in which the sum of the scores of the items answered was divided by the number of items available (Elliot & Hawthorne, 2005; Hawthorne & Elliot, 2005). Participants responded to the IPO at Wave 7. Since our main focus was on the development of a short version of the three primary scales (PD, ID, and RT), we excluded the Aggression and Moral Values items from the subsequent analyses.
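The scoring rule described above can be illustrated with a minimal sketch. It assumes that missing responses are coded as None and that a respondent who answered no item of a scale receives no score; the function name and the example responses are hypothetical and are not taken from the study data.

```python
from typing import Optional, Sequence


def subscale_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Person mean substitution: mean of the items the respondent answered.

    Returns None when no item of the scale was answered (an assumption;
    the source does not state how such cases were scored).
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        return None
    return sum(answered) / len(answered)


# Hypothetical respondent: four items on the 0-4 scale, one left blank.
example = subscale_score([3, None, 2, 4])
```

Dividing by the number of answered items, rather than by the nominal scale length, keeps the scores of respondents with a few blanks on the same 0 to 4 metric as complete responders.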
State depression was rated by the Self-rating Depression Scale (SDS; Zung, 1965). SDS is a widely-used self-report measure of depressive symptoms. The SDS consists of ten items on a 4-point scale (never = 0 to almost always = 3). Using a Japanese university student population, Kitamura et al. (2004) reported a three-factor structure for the scale: affective, cognitive, and somatic. In the present study we used seven SDS items from the affective category. The SDS was given at Wave 7.
Coping style was rated by the Coping Inventory for Stressful Situations (CISS; Endler & Parker, 1990). The CISS is a self-report measure of perceived coping patterns. It consists of 48 items on a 5-point scale (not at all = 0 to very much = 4). There are three subcategories: Task-oriented Coping, Emotion-oriented Coping, and Avoidance-oriented Coping. Task-oriented Coping is believed to be adaptive. It involves outlining priorities, determining a course of action, and following through with that action. Emotion-oriented Coping is less adaptive. It is characterised by blaming oneself for the situation or events and becoming preoccupied with worrying about them. Avoidance-oriented Coping is also less adaptive. It involves engaging in non-problem-solving behaviours as a way of ignoring the problem. Furukawa et al. (1993) demonstrated the reliability and validity of the Japanese version of the CISS. The CISS was distributed to the participants at Wave 1.
Resilience was rated by the Resilience Scale (RS; Wagnild & Young, 1993). The RS is a self-report scale consisting of 25 items. Although the original RS was rated on a 7-point scale (disagree = 0 to agree = 6), we reduced the number of choices to five so that it matched most of the other questionnaires in this study. The RS was distributed to the participants at Wave 4.
Narcissistic traits were rated by the Narcissistic Personality Inventory (NPI; Raskin & Hall, 1979). The NPI is a self-report measure that initially had 233 items divided into two forms. Emmons (1984) shortened it into a 54-item scale. For the Japanese adaptation, Oshio (2004) developed an 18-item measure (NPI-S) with three subcategories: Feeling Superior (6 items), Desire for Admiration (6 items), and Assertiveness (6 items). The NPI-S was rated on a 5-point scale (disagree = 0 to agree = 4). The NPI-S was distributed to the participants at Wave 5.
Self-conscious emotions were measured with the Test of Self-Conscious Affect-3 (TOSCA-3; Tangney et al., 2000). This is a self-report of six self-conscious emotions: shame, guilt, externalization, detachment, alpha pride, and beta pride. The TOSCA-3 is a unique questionnaire which shows eleven negative and five positive scenarios with four or five responses reflecting one of the six affects. Each response is rated on a 5-point scale (not likely = 0 to very likely = 4). This was translated into Japanese with permission of the original author with verification via retranslation into English (Hasui et al., 2009). The TOSCA-3 was distributed to the participants at Wave 6.
Perceived rearing during childhood was measured using the Japanese version (Kitamura & Suzuki 1993a, 1993b) of the Parental Bonding Instrument (PBI; Parker et al., 1979). The PBI is a self-report measure to retrospectively assess a parental attitude toward the participant as a child. Each item is scored on a 4-point scale (very unlikely = 0 to very likely = 3). The PBI has two subcategories for each parent: Care (12 items) and Overprotection (13 items). The PBI was distributed to the participants at Wave 3.
Social desirability was rated using the Japanese version (Kitamura & Suzuki, 1986) of the Social Desirability Scale (Crowne & Marlowe, 1960). This is a self-report measure of the tendency to respond in a socially desirable fashion. It consists of 10 items scored on a 5-point scale. Higher scores indicate a stronger tendency toward socially desirable responding. The scale was distributed to the participants at Wave 5.
2.3. Data Analysis
Of the 504 participants, 439 (87%) completed all 57 IPO items. Two did not fill in any of the IPO items. Missing values were tested using Little’s Missing Completely at Random (MCAR) test. If the MCAR test did not reject the null hypothesis, we presumed MCAR. To test cross validity (Cliff, 1983; Cudeck & Browne, 1983; Romera et al., 2008), the sample was randomly split in half, with one half used for the exploratory factor analyses (EFAs) and the other for the confirmatory factor analyses (CFAs).
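A random split of this kind can be sketched as follows. This is only an illustration: the function name, the seed, and the ten participant IDs are hypothetical, and the source does not report how the random allocation was generated.

```python
import random


def split_half(ids, seed=0):
    """Randomly split participant IDs into two halves of (nearly) equal size.

    The first half would be used for the EFAs and the second for the CFAs.
    A fixed seed makes the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    mid = (len(shuffled) + 1) // 2  # first half gets the extra case if odd
    return shuffled[:mid], shuffled[mid:]


# Hypothetical IDs 1..10, not the study sample.
efa_ids, cfa_ids = split_half(range(1, 11), seed=42)
```

Fixing the seed means that the exploratory and confirmatory halves can be reconstructed exactly when the analysis is rerun.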
Using the first halved sample, we calculated the mean, SD, skewness, and kurtosis of each IPO item. We then correlated each item with respective domain subscale score and with the total score of the IPO. We selected three items for each domain. To this end, we ranked each item in respective domain in terms of item-domain score correlations and item-total score correlations. We selected items with highest correlations. The selected three items were correlated with each other.
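The item selection step can be sketched as below. The text does not specify how the two rankings (item-domain and item-total) were combined, so this sketch assumes, purely for illustration, that items are ranked by the mean of the two correlations; the item-domain correlation is also left uncorrected for item overlap. All names and the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def top_items(item_scores, total_score, k=3):
    """Pick the k items of one domain with the highest correlations.

    item_scores: dict mapping item id -> list of responses (one domain).
    total_score: list of full-scale total scores, one per respondent.
    Ranking criterion (an assumption): mean of the item-domain and
    item-total correlations.
    """
    n = len(total_score)
    domain = [sum(s[i] for s in item_scores.values()) / len(item_scores)
              for i in range(n)]

    def criterion(item):
        scores = item_scores[item]
        return (pearson(scores, domain) + pearson(scores, total_score)) / 2

    return sorted(item_scores, key=criterion, reverse=True)[:k]


# Toy data: four hypothetical items, five hypothetical respondents.
toy_items = {
    "a": [0, 1, 2, 3, 4],
    "b": [0, 1, 2, 4, 3],
    "c": [4, 0, 3, 1, 2],
    "d": [1, 1, 2, 3, 4],
}
toy_total = [10, 20, 30, 40, 50]
selected = top_items(toy_items, toy_total, k=2)
```

In the toy data, item "c" tracks neither the domain score nor the total score, so it is not selected.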
We used the first halved sample to subject the selected nine items to a series of EFAs. The Kaiser-Meyer-Olkin (KMO) index and Bartlett’s sphericity test were used to check the sample’s adequacy for EFAs. We started the EFA with a single-factor model and proceeded consecutively to 2- and 3-factor models. Factors were extracted using the maximum likelihood method and rotated with promax rotation.
Then, we used the second halved sample to compare the EFA-derived factor models using CFAs in terms of goodness-of-fit indices including chi-squared (χ2), comparative fit index (CFI), and root mean square error of approximation (RMSEA). A good fit was defined as chi-squared divided by degrees of freedom < 2, CFI > 0.97, and RMSEA < 0.05. An acceptable fit was defined as chi-squared divided by degrees of freedom < 3, CFI > 0.95, and RMSEA < 0.08 (Bentler, 1990; Schermelleh-Engel et al., 2003). To start with, we chose the single-factor model because of parsimony. If the 2-factor model showed a significant decrease in chi-squared when compared to the single-factor model, we would choose the 2-factor model. Similarly, we compared the 2-factor with the 3-factor model. Internal consistency (reliability) was calculated for each factor using omega (Dunn et al., 2014; Peters, 2014). It is of note that we did not use an alpha coefficient because alpha is an index of reliability that is appropriate only for a single-factor structure.
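The fit criteria above amount to a simple decision rule, sketched here for illustration. The function name and the label "not acceptable" are our own; the thresholds are those stated in the text, and the two calls use the χ2/df, CFI, and RMSEA values reported in the Results for the single-factor and 3-factor models.

```python
def classify_fit(chi2_df_ratio: float, cfi: float, rmsea: float) -> str:
    """Classify model fit using the cut-offs stated in the text.

    good:       chi2/df < 2, CFI > 0.97, RMSEA < 0.05
    acceptable: chi2/df < 3, CFI > 0.95, RMSEA < 0.08
    """
    if chi2_df_ratio < 2 and cfi > 0.97 and rmsea < 0.05:
        return "good"
    if chi2_df_ratio < 3 and cfi > 0.95 and rmsea < 0.08:
        return "acceptable"
    return "not acceptable"


single_factor = classify_fit(4.585, 0.840, 0.120)
three_factor = classify_fit(1.768, 0.970, 0.056)
```

Because all three indices must pass, the 3-factor model is classified as acceptable rather than good: its RMSEA of 0.056 exceeds 0.05 and its CFI does not exceed 0.97.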
After determining the best fit model, we used the whole sample to examine configural and measurement invariance of this model across men and women. Invariance from one step to the next was interpreted as ‘accepted’ if we identified either (a) a non-significant increase in chi-squared for the difference in degrees of freedom, (b) a decrease in CFI of less than 0.01, or (c) an increase in RMSEA of less than 0.015 (Cheng, 2007; Desa, 2014). CFI and RMSEA may be better indicators for judging measurement invariance than chi-squared. This is because chi-squared is sensitive to the sample size and may therefore produce excessive ‘rejection’ rates.
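The acceptance rule for each invariance step can be written out as a short sketch. The function and parameter names are hypothetical, the 0.05 significance level for the chi-squared difference is an assumption (the text does not state it), and the example values are invented for illustration rather than taken from Table 3.

```python
def invariance_accepted(p_delta_chi2, cfi_prev, cfi_next,
                        rmsea_prev, rmsea_next, alpha=0.05):
    """Return True if the more constrained model is 'accepted'.

    Any one of the three criteria in the text is sufficient:
    (a) non-significant chi-squared increase (alpha is an assumption),
    (b) CFI decrease of less than 0.01,
    (c) RMSEA increase of less than 0.015.
    """
    chi2_ok = p_delta_chi2 >= alpha
    cfi_ok = (cfi_prev - cfi_next) < 0.01
    rmsea_ok = (rmsea_next - rmsea_prev) < 0.015
    return chi2_ok or cfi_ok or rmsea_ok


# Hypothetical step: significant chi-squared increase, but CFI barely moves.
step_ok = invariance_accepted(0.001, 0.970, 0.968, 0.050, 0.052)
# Hypothetical step failing all three criteria.
step_bad = invariance_accepted(0.001, 0.970, 0.950, 0.050, 0.070)
```

Using "or" rather than "and" reflects the wording "either (a), (b), or (c)", so a step can be accepted on the CFI or RMSEA criterion even when the chi-squared difference is significant.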
The compatibility of the 9-item short version of the IPO was checked by comparing the scores of the scales of the short version with the scores of their corresponding scales of the full version.
Finally, we correlated the three subscale scores of the short version IPO with the other correlates: SDS, CISS, RS, NPI-S, TOSCA-3, PBI, and social desirability. Because this involved multiple comparisons, we set the significance level at p < 0.001. All statistical analyses were conducted using the Statistical Package for the Social Sciences (SPSS) version 27.0 and Amos 27.0.
3. Results
The MCAR test of these data (χ2 = 77.267, df = 64, p = 0.123) indicated that the data were missing completely at random. Then using the first halved sample, we checked the mean and SD of each of the IPO items. Neither skewness nor kurtosis was substantially high for any IPO item (Table 1). The three items with the highest item-domain and item-total score correlations were selected for each domain. For PD they included items 9 (“behave in contradictory ways”), 15 (“people either overwhelm me with love or abandon me”), and 16 (“feel things with either joy or despair”). For ID included were items 21 (“fluctuate between being warm and cold”), 29 (“important people suddenly change their attitudes towards me”), and 32 (“see myself in different ways at different times”). For RT were items 39 (“not sure whether a voice I have heard is my imagination”), 48 (“can’t tell whether certain physical sensations are real”), and 55 (“can’t tell whether I simply want something to be true”). Hence, these nine items were subjected to EFAs. The items of each domain subscale showed moderate correlations with each other (0.41 to 0.53, 0.38 to 0.53, and 0.56 to 0.69 for PD, ID, and RT, respectively).
Both the KMO index (0.838) and Bartlett’s sphericity test, χ2 (36) = 770.475, p < 0.001, showed adequacy of the first halved data for EFA. We started with a single-factor model in which all nine items showed a factor loading > 0.3 (Table 2). In a 2-factor model, items suggesting PD and ID (i.e., items 9, 15, 16, 21, 29, and 32) all loaded on the second factor whereas items suggesting RT (i.e., items 39, 48, and 55) loaded on the first factor (table not shown). In a 3-factor model (Table 2), each factor was loaded by the three items expected from theory. Factors 1, 2, and 3 represented RT, ID, and PD, respectively.
For CFAs among the second halved sample, the single-factor model did not show an acceptable fit (χ2/df = 4.585, CFI = 0.840, and RMSEA = 0.120). When compared with this model, the 2-factor model demonstrated a significant decrease in chi-squared (from 123.795 to 90.622 for a df difference of 1). This was also the case for the difference between the 2- and 3-factor models. We concluded that the 3-factor model was the best. The fit of this model with the data was acceptable (χ2/df = 1.768, CFI = 0.970, and RMSEA = 0.056; Figure 1). Internal consistency was good for PD (ω = 0.694), ID (ω = 0.727), and RT (ω = 0.792). There were moderate correlations between the three factors: PD vs. ID = 0.73, PD vs. RT = 0.74, and ID vs. RT = 0.58.
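The chi-squared difference between the single-factor and 2-factor models can be checked by hand. The sketch below uses the two chi-squared values and the df difference reported above; the helper function is our own and computes the upper-tail chi-squared probability for integer degrees of freedom using only the standard library.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-squared variate, integer df >= 1.

    Starts from the closed forms for df = 1 (erfc) and df = 2 (exp) and
    applies the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    with a = df / 2 and z = x / 2.
    """
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 1:
        q, k = erfc(sqrt(z)), 1
    else:
        q, k = exp(-z), 2
    while k < df:
        q += z ** (k / 2.0) * exp(-z) / gamma(k / 2.0 + 1.0)
        k += 2
    return q


delta_chi2 = 123.795 - 90.622   # values reported in the text
p_value = chi2_sf(delta_chi2, df=1)
```

The difference of about 33.17 on 1 degree of freedom is far beyond the 3.84 critical value at the 0.05 level, consistent with the significant improvement reported for the 2-factor model.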
When men and women were compared, the present 9-item 3-factor model showed configural, metric, scalar, factor variance, and factor covariance invariance (Table 3). Factor means did not differ between men and women in terms of PD and RT. However, the factor mean of ID was significantly higher among women than men.
Table 1. Mean, standard deviation, skewness, kurtosis, and IT correlation of the IPO items (n = 256).
Note. IT correlation indicates the correlation between the item and the score of the domain it belongs to. IPO = Inventory of Personality Organization.
Table 2. Exploratory factor analyses of the nine IPO items (n = 256).
Note. IPO = Inventory of Personality Organization.
Figure 1. Confirmatory factor analysis of the short version of the Inventory of Personality Organization.
Note. The number in the rectangle refers to the item number of the Inventory of Personality Organization. CFI = comparative fit index; RMSEA = root mean square error of approximation.
Table 3. Multiple group structural equation modelling of the 3-factor structure in men and women (n = 504).
Note. NS = not significant; Δ, difference; Ref. = reference; CFI = comparative fit index; RMSEA = root mean square error of approximation; *p < 0.05; ***p < 0.001.
The correlations between the scores of short and long (original) versions of the IPO were excellent (PD = 0.83, ID = 0.85, and RT = 0.85, p < 0.001, respectively). Thus, the scores of the short version reflected the scores derived from the long (original) version.
When the three subscales of the short IPO were examined against the other correlates, they were, in most cases, significantly correlated with depression, Emotion-oriented Coping, Desire for Admiration, Shame, and perceived affectionless control of the two parents (low care and overprotection). This was generally also the case when we used the scores of the long (original) version. However, the three subscales differed in a few cases. The narcissistic trait of Desire for Admiration was slightly less significant for RT using the short version. Guilt was correlated only with ID, which was, however, not correlated with Externalisation. ID was also not correlated with low care and overprotection of the two parents (Table 4).
4. Discussion
This study found that the 9-item short version of the IPO with a 3-factor structure had an acceptable fit with the data. The model indicated good reliability (internal consistency). The three factors were in line with Kernberg’s (1975, 1984) theory. In addition, the factor structure was consistent between men and women, including stability of the factor covariance. The abridged subscales (PD, ID, and RT) also showed fairly high correlations with the full version subscales. The subscales of the short and long (original) versions were correlated in a similar pattern with the other correlates. This means that the use of the short version is comparable to the use of the full version. It is interesting that the items we selected for the short version overlapped, to some extent, with those of the German 16-item IPO (Zimmermann et al., 2013). They were items 9, 29, 39, 48, and 55. Hence they may represent core symptoms of borderline personality organization.
Construct validity of the short version of the IPO may be supported. BPO is frequently associated with depression. For example, in a follow-up study of 290 patients with DSM-III-R borderline personality disorder, Zanarini et al. (2004) reported more than 80% of them had comorbidity of Major Depression. As expected, all three subscales of the short IPO showed significant correlations with depression scores.
Table 4. Correlations of the three IPO subscales (short version) with the other correlates (n = 504).
Note. *p < 0.05; **p < 0.01; ***p < 0.001. The figures on the left and right represent those using the short and long (original) versions, respectively.
Individuals with BPO find difficulties in interpersonal relationships and this may be based on their maladaptive coping styles. Thus, Wingenfeld et al. (2009) reported that as compared with healthy controls, individuals with BPO were likely to use emotion-oriented coping styles and less likely to use task-oriented coping styles particularly in situations where they were unable to manage sufficiently. Our data were in line with Wingenfeld et al.’s (2009) findings.
In a review of the theory of resilience (Richardson, 2002), resilience is viewed as personal and interpersonal quality to survive in the face of adversity that involves coping with stressors in a manner resulting in the identification, fortification, and enrichment of protective factors. Some authors noted that people with BPO lacked resilience but without empirical evidence (Fonagy et al., 2017; Paris et al., 2014). The present study provides evidence that all three subscales of the short version of the IPO were correlated with low resilience.
Kernberg and colleagues emphasized closeness of BPO and narcissism (Clarkin et al., 2006). However, Miller et al.’s (2010) study found no correlation between the scores of the NPI (that was also used in the present study) and any of the BPO measures. This was in contrast with our results. It is, nevertheless, of note that in our results only the Desire for Admiration subscale was significantly correlated with the IPO scores. In addition, the Assertiveness subscale was, though not significantly, associated negatively with ID subscale scores. Hence, summation of all the NPI item scores may have diluted significance.
It has been noted that people with BPO are characterized by excessive feelings of shame. Although shame is one type of self-conscious emotion, the links between BPO and other types of self-conscious emotions have been studied less. In a study of undergraduate students, Peters & Geiger (2016) reported that BPO traits were positively correlated with shame and externalization and negatively correlated with guilt. That study did not use scales to assess the two types of pride. We used the TOSCA as in Peters & Geiger’s (2016) study. We found that while Shame was correlated with all three IPO subscales, Guilt was correlated with ID, whereas Externalization was correlated with PD and RT. Pride scores showed no correlations with the IPO scores. This suggests that when a person faces an unpleasant event, PD inhibits the manifestation of guilt feelings and promotes the shifting of responsibility to others (Externalization).
BPO has been frequently linked to adverse experiences in childhood. For example, our past report using this sample demonstrated that IPO scores were correlated with the scores of child abuse experiences (Igarashi et al., 2010a). Nickell et al. (2002) reported in a study of undergraduate students that BPO was correlated with low care and overprotection of fathers and mothers. Our results were in line with this study. In addition, we found that such correlations held for PD and RT but not for ID. Such discriminant associations may need further study regarding the developmental aetiology of different facets of BPO.
This study is not without limitations. First, we used a university student population consisting predominantly of women. We might obtain a different result if we used a non-student population with a wider age range. However, borderline personality is more prevalent among younger people; therefore, we did not consider this a major drawback. A clinical population should also be studied in order to examine whether the factor structure varies according to clinical status (Hessel et al., 2021). Second, this study measured the IPO at only a single time point and therefore we were unable to check measurement invariance across multiple assessment times. Further studies should adopt a follow-up research design. Third, most of the other correlates of the short IPO scales were measured a few weeks apart from the time when the IPO was distributed. This was done to avoid placing an excessive burden on the participants. Nevertheless, it might have biased the results. In addition, we did not use a clinical diagnostic interview against which the results from the self-report could be validated. The data should also be compared with other relevant clinical indices so that construct validity can be assessed further. Finally, even though cultural differences are of great research and clinical importance, that kind of investigation was beyond the scope of our study. International collaborative studies are a major agenda for future research.
Abridgement of a psychological measure should be based on sound psychometric properties of the long (original) measure. The factor structure of the IPO has been reported by several authors. For example, Lenzenweger et al. (2001) studied university students and found that a 3-factor structure fit better than a 2-factor structure, although they preferred the 2-factor structure (which combined PD and ID) because the improvement with the 3-factor structure was slight. Ellison and Levy (2012) reported that the primary scales of the IPO had a 4- rather than 3-factor structure among a non-clinical population (N = 1260). Berghuist et al. (2009) reported that the Dutch version of the IPO (with the original 5 subscales) had a 4- rather than 5-factor structure. They determined the number of factors by a scree plot. This is, however, rather arbitrary: different models should be compared in CFAs. These findings suggest that the factor structure of the original primary IPO scales is still debatable. Our approach was, therefore, to elicit items that reflect each of the three primary scales of the IPO.
5. Conclusion
The 9-item short version of the Japanese IPO has a robust factor structure, good internal consistency and measurement invariance. This instrument is easy to use and worth applying in clinical as well as research settings in Japan.
Acknowledgements
The authors thank the students who participated in this study. We would also like to thank Drs. Edward Barroga and Sarah E. Porter for editing this article. The present study and its analysis plan were not preregistered in an independent registry.
Author Contributions
Conceptualization (TK); Methodology (TK); Data collection (TK); Statistical analysis (FY, TK); Writing (FY, YK, TK); Project administration (TK). All the authors have read the manuscript, agreed to its content, and are accountable for all aspects of its accuracy and integrity.
Data Availability Statement
All the data used in this study are available upon reasonable request to the senior author.
Ethics Approval
This study has been approved by the Ethical Committee of Kumamoto University (Institutional Review Board; Epidemiology No. 19) where the main researcher was working and the Ethical Committee of Kitamura Institute of Mental Health Tokyo (No. 2020080101). The present manuscript is an original work and not being considered or reviewed by any other publication. Nor has it been published elsewhere in the same or a similar form.