Development and Validation of the Japanese-Translated Version of the Multiple-Choice Questionnaire of Depression Literacy (MCQ-DL)

Although depression literacy plays a key role in encouraging people with depression to seek professional treatment, there exist no measures of depression literacy in Japan that are comparable to those validated in English-speaking countries. The present study therefore developed and validated a Japanese-translated version of the Multiple-Choice Questionnaire of Depression Literacy (MCQ-DL), which is rated as being of high quality by recent systematic reviews. We conducted an online two-wave survey (N T1 = 325, N T2 = 180) and examined the psychometric properties of the full-item (27 items) and short (10 items) versions of the Japanese-translated MCQ-DL. Results provide several points of validity evidence for both versions as measures that capture individuals’ depression literacy profiles: 1) one-factor structures of these versions were supported by the data; and 2) the items used in both versions had a variety of difficulty and discrimination indices. Results also indicate several limitations of the Japanese-translated MCQ-DL for use in correlation-based and multivariate analyses: 1) internal consistencies seem insufficient (α = .68) and poor (α = .28) for the full-item and short versions, respectively; 2) the test-retest reliability was insufficient for the short version (r = .51, p < .001, 95% CI [.40, .60]); 3) both the full-item


Introduction
Depression affects 350 million people worldwide (World Health Organization, 2012a) and is estimated to become the largest factor in disease burden by 2030 (World Health Organization, 2008). Although early interventions are necessary for reducing the burden of depression, less than 30% of people with depression in most countries receive any professional treatment (World Health Organization, 2012b). To address this service gap for depression and other mental disorders, previous research has frequently focused on mental health literacy, which refers to the comprehensive knowledge and skills necessary for recognizing and managing one's own or others' mental health problems (Jorm, 2012). To improve mental health literacy, many educational programs have been conducted thus far, including the Defeat Depression Campaign (Paykel, Hart, & Priest, 1998; Paykel et al., 1997; Rix et al., 1999), the Time to Change Campaign (Evans-Lacko et al., 2013; Henderson & Thornicroft, 2009), and the Mental Health First Aid (MHFA) Programs (Jorm, Blewitt, Griffiths, Kitchener, & Parslow, 2005; Kitchener & Jorm, 2002a, 2002b). Previous evaluation studies of these programs have typically used measures that focused on only one specific component of mental health literacy, including measures of: 1) the ability to recognize mental disorders (Kitchener & Jorm, 2002b), 2) the recognition of effective treatments for depression (Paykel et al., 1998; Paykel et al., 1997), 3) the etiology of depression (Paykel et al., 1998; Paykel et al., 1997), and 4) the knowledge needed to manage one's own depression (Rix et al., 1999). More recent studies, on the other hand, have developed and used measures that contain multiple components of mental health literacy.
For example, Gabriel and Violato (2009) [...] According to the systematic reviews conducted by Kutcher (2015, 2016), there exist three measures written in English that capture multiple components of depression literacy, including those developed by Gabriel and Violato (2009), Hart et al. (2014), and Kiropoulos, Griffiths, and Blashki (2011). Among these three measures, the MCQ-DL (Gabriel & Violato, 2009) is rated as being of the highest quality (Wei et al., 2016). In particular, the internal consistency and structural evidence for validity of the MCQ-DL are rated as better than those of the other two measures of depression literacy (Wei et al., 2016).
Moreover, this questionnaire has been shown to be usable in multivariate analyses. Foster, Elischberger, and Hill (2018), for example, picked 10 items from the questionnaire¹ and showed that depression literacy mediates the relationship between socioeconomic status and stigmatizing attitudes toward depression.
In Japan, however, there exist no measures of depression literacy comparable to those validated in English-speaking countries. Although Yamakawa et al. (2012) developed the first measure of depression literacy in Japanese, which was based on the information about depression provided to the public, this measure has two weak points in its items and format. First, they originally developed nine items without referring to the measures validated in English-speaking countries. Therefore, the cross-cultural comparability of the measure is limited. Second, they used a true-false response format. In such formats, participants are likely to provide a number of correct responses by chance, even when they are not confident at all in their choices. More robust formats, such as the four-choice format used by Gabriel and Violato (2009), are needed to accurately capture detailed profiles of participants' literacy.
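The guessing argument above can be made concrete with a short illustrative calculation. The symbols $n$ (number of items) and $k$ (number of response options per item) are introduced here for illustration and do not appear in the original questionnaires; the calculation assumes a respondent who guesses every item uniformly at random.

```latex
\[
E[\text{correct by chance}] = \frac{n}{k}
\]
\[
\text{True-false format } (k = 2,\ n = 9): \quad \frac{9}{2} = 4.5 \ \text{items} \ (50\%)
\]
\[
\text{Four-choice format } (k = 4,\ n = 27): \quad \frac{27}{4} = 6.75 \ \text{items} \ (25\%)
\]
```

Under this assumption, a purely guessing respondent is expected to answer half of the true-false items correctly but only a quarter of the four-choice items, which leaves more of the score range available to reflect actual knowledge.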
The present study, therefore, developed a Japanese-translated version of the MCQ-DL (Gabriel & Violato, 2009) and conducted initial validation of the measure. We also examined whether the 10-item version adopted by Foster et al. (2018) has similar psychometric properties to the full 27-item measure. We expected the short version would be easily incorporated into the evaluation studies of educational programs regarding depression in Japan, including those focusing on stigma reduction (e.g., Kashihara, 2015; Kashihara & Sakamoto, 2018) and the promotion of helping behaviors toward people with mental health problems (Hashimoto et al., 2016; Kato et al., 2010; Suzuki et al., 2014). We conducted a two-wave survey to examine the psychometric properties of the full-item and short versions of the translated questionnaire, including internal consistencies, factor structures, item difficulty and discrimination indices, correlations with other related measures, and test-retest reliabilities.

Participants and Procedure
The protocol of the present study was approved by the Committee of Research Ethics at the College of Humanities and Sciences, Nihon University. A total of 461 Japanese people who registered with the online survey system of Cross Marketing Inc. (https://www.cross-m.co.jp/) answered the baseline questionnaire (T1), which included all measures described in the Measures subsection.
¹ S. D. Foster and colleagues (personal communication, November 28, 2017) picked 10 items (Items 1, 2, 4, 5, 8, 10, 11, 22, 23, and 27; see Appendix) from the original version with the intention of capturing the variety of components included in the full-item version developed by Gabriel and Violato (2009).
We asked them to answer the follow-up questionnaire three weeks later (T2), and 355 of them participated in the T2 survey, which only included the Japanese-translated version of the MCQ-DL.
As pointed out by Krosnick (1991) and Krosnick, Narayan, and Smith (1996), some survey participants answer carelessly, without deliberating on the items, to save cognitive effort. This behavior, called satisficing, influences data quality and decreases statistical power (Maniaci & Rogge, 2014). Considering that satisficing is commonly observed in online surveys as well as other survey forms (e.g., telephone, paper-and-pencil) (Fricker, Galesic, Tourangeau, & Yan, 2005; Gosling, Vazire, Srivastava, & John, 2004), we embedded two trap items to detect satisficing in both the T1 and T2 questionnaires. Sample trap items included "In this item, please ignore the following question and simply tick the choice on the top of the screen. What is the risk of death by suicide among clinically depressed patients?" We removed from the following analyses 113 participants at T1 and 71 participants at T2 who gave at least one incorrect answer to these trap items.
We then removed 23 participants (13 of them participated in the T2 survey) who answered "I have depression" on the Level of Contact Report (mentioned later in the Measures subsection) at T1 from the analyses, considering that we aimed to examine the stigma held by people without depression. As a consequence, we used data from the remaining 325 participants (177 female, 148 male; M age = 46.44, SD = 13.40) at T1 and 180 participants (98 female, 82 male; M age = 48.31, SD = 12.79) at T2 for the further analyses.

Depression Literacy
The items of the original English version of the MCQ-DL (Gabriel & Violato, 2009) are shown in the Appendix section. We developed a Japanese-translated version of the questionnaire that was semantically equivalent to the original one by following a translation/back-translation procedure, as detailed in Brislin (1970, 1980) and Brislin, Lonner, and Thorndike (1973). First, two professional translators translated the original version into Japanese. Second, the authors of the present research modified some words and phrases used in the draft of the Japanese-translated version (explained in detail in the next paragraph). Third, two professional translators and a professional proofreader, who did not participate in the Japanese translation, back-translated the modified Japanese version into English. Fourth and finally, the authors of the original questionnaire compared the original and back-translated versions and confirmed that the two versions were semantically equivalent.
In the process of developing the Japanese-translated version, we modified two items of the original questionnaire. First, we changed the query of Item 3 from "What are the lifetime chances of becoming clinically depressed?" to "Which of the following is closest to the lifetime chances of becoming clinically depressed (i.e., the proportion of people who get depression at least once in their lives)?" We added an instruction for the participants to choose the closest choice, considering that the lifetime prevalence of clinical depression is estimated to be around 16% in the Diagnostic and Statistical Manual of Mental Disorders, 5th ed. (American Psychiatric Association, 2013) and is lower than the rate of "one in three" that was regarded as the correct choice in the original questionnaire. We also explained the phrase "lifetime chances" in parentheses to ensure the participants would understand the meaning of this phrase. Second, we changed one of the choices in Item 18 ("Which is NOT a recognized treatment for clinical depression?") from "Kiekie therapy" to "Herbal medicine," considering that the plant Kiekie is unfamiliar to Japanese people.
The participants ticked one out of four choices for each item listed in the questionnaire. The scoring procedure and α coefficients are detailed later in the Results section. The authors are glad to share the Japanese-translated version of the questionnaire upon request.

Stigmatizing Attitudes toward Depression
Previous studies frequently showed that stigmatizing attitudes toward depression can be reduced by educational programs and contents aimed at improving depression literacy (Finkelstein & Lapshin, 2007; Griffiths, Christensen, Jorm, Evans, & Groves, 2004; Griffiths et al., 2006; Kashihara, 2015; Kashihara & Sakamoto, 2018; Rusch, Kanter, & Brondino, 2009). In addition, Foster et al. (2018) showed that the short version of the MCQ-DL (Gabriel & Violato, 2009) was negatively correlated with stigma of depression. We therefore anticipated that both the full-item and short versions of the MCQ-DL should exhibit significant negative correlations with stigmatizing attitudes toward depression.
To assess stigmatizing attitudes toward depression, we used the Japanese-translated version  of the Personal Stigma Subscale of the Depression Stigma Scale (Griffiths et al., 2004). This measure consists of two nine-item subscales: the personal stigma subscale assessing participants' negative attitudes toward people with depression, including sample items such as "Depression is a sign of personal weakness."; and the perceived stigma subscale assessing participants' beliefs about the attitudes of the public, including sample items such as "Most people believe that depression is a sign of personal weakness." We used only the personal stigma subscale to examine the correlation between depression literacy and participants' stigmatizing attitudes. This subscale used a 5-point Likert scale with the anchors 1: disagree to 3: neither agree nor disagree to 5: agree; it was found to have sufficient internal consistency (α = .80).

Familiarity with People with Depression
As shown in a previous systematic review (Corrigan & Shapiro, 2010) and meta-analysis (Corrigan, Morris, Michaels, Rafacz, & Ruesch, 2012) on the effectiveness of educational programs for mental disorders, plenty of educational programs have utilized contact-based approaches. Moreover, contact-based education has produced positive outcomes, including the reduction of stigma. It is also reasonable to suppose that people obtain more opportunities to learn about depression as they get familiar with those who have depression. We therefore anticipated that both the full-item and short versions of the Japanese-translated MCQ-DL should exhibit significant positive correlations with familiarity with people with depression.
We assessed participants' levels of familiarity with people with depression using the Japanese-translated version (Kashihara, 2015) of the Level-of-Contact Report (Holmes, Corrigan, Williams, Canar, & Kubiak, 1999). This checklist includes 12 statements reflecting different levels of contact with people with depression, ranging from the least ("I have never observed a person that I was aware had major depression"; rank order score = 1) to the most ("I have depression"; rank order score = 12) intimate contact. Participants ticked every statement that corresponded to their experience. Each participant's level of familiarity was scored by taking the highest rank order score of the statements she or he ticked.
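The scoring rule described above (take the highest rank order score among the ticked statements) can be sketched as follows. This is a minimal illustration, not the authors' scoring script: the function name, the representation of a response as a list of rank order scores, and the handling of an empty response are all assumptions made here.

```python
def level_of_contact_score(ticked_ranks):
    """Return the familiarity score for one participant.

    ticked_ranks: rank order scores (1-12) of the statements the
    participant ticked (hypothetical input format).
    """
    ranks = list(ticked_ranks)
    if not ranks:
        # Assumption: a response with no ticked statement cannot be scored.
        raise ValueError("at least one statement must be ticked")
    if any(r < 1 or r > 12 for r in ranks):
        raise ValueError("rank order scores must be between 1 and 12")
    # The score is the most intimate level of contact reported.
    return max(ranks)


# A participant who ticked statements ranked 1, 5, and 3 is scored 5.
print(level_of_contact_score([1, 5, 3]))
```

Under this rule, a participant who ticked "I have depression" always receives the maximum score of 12, whatever else was ticked.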

Empathy
As indicated by Jorm (2012), mental health literacy includes skill in recognizing the early signs of mental health problems and others' depressive moods and in expressing their empathic concerns. Foster et al. (2018) also showed that the short version of the MCQ-DL (Gabriel & Violato, 2009) was positively correlated with participants' empathy levels. We therefore anticipated that both the full-item and short versions of the Japanese-translated MCQ-DL should exhibit significant positive correlations with components of empathy.
Participants' levels of empathy were assessed using the Japanese-translated version (Sakurai, 1988) of the 28-item Interpersonal Reactivity Index (Davis, 1980, 1983). This measure consists of four seven-item subscales that reflect different dimensions of empathy. The perspective-taking subscale (α = .72) reflects a tendency to understand situations from others' perspectives, and sample items include "I sometimes try to understand my friends better by imagining how things look from their perspective." The empathic concern subscale (α = .53) reflects a tendency to feel sympathy for others, and sample items include "I often have tender, concerned feelings for people less fortunate than me." The fantasy subscale (α = .72) reflects a tendency to become involved in a fictitious world depicted in books, movies, etc.; sample items include "I really get involved with the feelings of the characters in a novel." The personal distress subscale (α = .76) reflects a negative aspect of empathy, in terms of being sensitive and reactive to situational stressors; sample items include "Being in a tense emotional situation scares me." This 28-item measure used a 4-point Likert scale with the anchors 1: disagree to 4: agree; it has a sufficient internal consistency overall (α = .77).

Data Analysis
To validate the Japanese-translated MCQ-DL, we examined the psychometric properties of both the full-item and short versions of the questionnaire based on various kinds of analyses. Using the data obtained at T1, we conducted the following analyses: 1) calculation of coefficients α to examine internal consistencies, 2) confirmatory factor analyses to examine factor structures, 3) item response theory analyses to examine item difficulty and discrimination indices in detail, and 4) correlational analyses to obtain convergent evidence of validity.
Moreover, we examined the test-retest reliabilities of the two versions using both the T1 and T2 datasets. In conducting these analyses, we followed the recommendation of Kane (2006) regarding the process of scale development and validation: rather than making binary judgments about whether these versions were valid or invalid, we sought to clarify the purposes for which, and the extent to which, these versions could be used. We used Mplus version 7 (Muthén & Muthén, 1998-2012) to conduct confirmatory factor analyses and R version 3.3.3 (R Development Core Team, 2017) to perform item response theory analyses. All the other analyses were performed using Stata version 14 (StataCorp, 2015).
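The correlational and test-retest analyses report Pearson correlations together with 95% confidence intervals. The text does not state how those intervals were computed; one common approach is the Fisher z-transformation, sketched below in Python as an assumption rather than as the authors' Stata procedure. The function name and its parameters are illustrative.

```python
import math


def pearson_ci(r, n, z_crit=1.959964):
    """Approximate confidence interval for a Pearson correlation.

    Uses the Fisher z-transformation; z_crit = 1.959964 gives a
    two-sided 95% interval under a normal approximation.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)                # Fisher z of the correlation
    se = 1.0 / math.sqrt(n - 3)      # approximate standard error of z
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


# Example with a correlation of .12 in a sample of 325 participants.
low, high = pearson_ci(0.12, 325)
print(round(low, 3), round(high, 3))
```

Intervals obtained this way can differ slightly from published ones, because the published r is rounded to two decimals and the software used may apply a different method.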

Internal Consistencies
On average, the participants provided 17.27 correct answers (SD = 5.73) for the full-item version (27 items) of the MCQ-DL and 3.81 correct answers (SD = 1.64) for the short version (10 items). Compared to the cutoff criterion for reliability coefficients (>.70) proposed by Nunnally and Bernstein (1994), Cronbach's alpha fell slightly below the cutoff for the full-item version (.68) and was extremely small for the short version (.28).
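For readers unfamiliar with the coefficient, Cronbach's alpha can be computed from item-level scores as sketched below. This is a generic illustration with made-up toy data, not the study's data or the authors' Stata code; the function name and the list-of-rows input format are assumptions.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores: list of rows, one per respondent, each row holding that
    respondent's item scores (e.g., 1 = correct, 0 = incorrect).
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the total (sum) scores.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data: four respondents, three dichotomously scored items.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(cronbach_alpha(toy))
```

Alpha rises when the items covary strongly relative to their individual variances, so a heterogeneous item set or a small number of items tends to lower it.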

Factor Structures
Next, we examined factor structures for both the full-item and short versions.
We then conducted similar analyses to test the one-factor model of the short version. Ten items from the short version were randomly assigned to the fol-

Item Difficulty and Discrimination
We then conducted item response theory analyses based on the two-parameter logistic model (Birnbaum, 1968) to examine item difficulty and the discrimination indices for both the full-item and short versions. As shown in Table 1, except for Item 3 ("What are the lifetime chances of becoming clinically depressed?"), which had extremely high difficulty (35.34), the item difficulty indices for the full-item version were distributed around the zero-point.³ Although most of the item discrimination indices had positive values, those for Items 5 ("Which of the following, about sex differences in clinical depression is TRUE?") and 24 ("Which is NOT a common occurrence during treatment with antidepressants?") were negative. These negative values indicate that the participants with higher depression literacy were more likely to provide incorrect answers to these items. Items 5 and 24, therefore, seemingly functioned as trick questions.
Similar trends for item difficulty and discrimination indices were found in the short version. As shown in Table 2, except for Item 27 ("Psychotherapy can help many people with clinical depression. Which of the following statements about psychotherapy is FALSE?"), which had extremely high difficulty (42.78), the item difficulty indices for the short version were distributed around the zero-point. Although most of the item discrimination indices had positive values, those for Items 2 ("What is the risk of death by suicide among clinically depressed patients?") and 5 had negative values.
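For reference, the two-parameter logistic model can be written in its standard form as follows. The notation ($\theta_j$ for the latent depression literacy of participant $j$, $a_i$ and $b_i$ for the discrimination and difficulty of item $i$) is assumed here; the software used for estimation may parameterize the model differently.

```latex
\[
P(X_{ij} = 1 \mid \theta_j) = \frac{1}{1 + \exp\{-a_i(\theta_j - b_i)\}}
\]
```

When $\theta_j = b_i$, the probability of a correct answer is exactly $1/2$, so the difficulty index marks the ability level at which a participant has an even chance of answering the item correctly. When $a_i > 0$, this probability increases with $\theta_j$; when $a_i < 0$, it decreases, which is why negative discrimination indices imply that participants with higher literacy were more likely to answer incorrectly.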

Correlations with Related Variables
Next, we examined correlations between the variables measured at T1 to show convergent evidence for the validity of the full-item and short versions. As [...] the full-item nor the short version had significant correlations with the fantasy or personal distress subscales (rs < .08, ps > .182). There was, however, a significant positive correlation between the empathic concern subscale and the full-item version alone (r = .12, p = .037, 95% CI [.01, .22]). Although there were some exceptions regarding these subscales, most of the correlations were in line with our predictions.
Table 3. Descriptive statistics and correlations for the variables assessed by the baseline questionnaire (T1).
³ The zero-point in item difficulty indices indicates that exactly half of the participants have a latent ability to answer the corresponding items correctly.

Test-Retest Reliability
Finally, using both the T1 and T2 datasets, we examined the test-retest reliabili-

Discussion
The present study developed a Japanese-translated version of the MCQ-DL and [...]
It should also be noted that the Japanese-translated MCQ-DL (the short version, in particular) seems to have limitations for use in correlation-based and multivariate analyses. As Cronbach's alpha coefficients indicate, the internal consistencies seem insufficient (α = .68) and poor (α = .28) for the full-item and short versions, respectively. Moreover, the test-retest reliability was insufficient for the short version (r = .51, p < .001, 95% CI [.40, .60]). Although it is inevitable that these reliability coefficients become smaller as the variety and numbers of items increase and decrease, respectively (Sijtsma, 2009) [...] Table 1 and Table 2 for details), it is possible that some items of the MCQ-DL function differently in North American countries and Japan. As indicated by the lack of well-validated measures of depression literacy (see the Introduction section for details) and by the cross-culturally high level of stigma concerning depression (Griffiths et al., 2006), Japanese people possibly do not have as much detailed knowledge about depression as do North Americans. It is therefore possible that some items concerning "basic knowledge about depression" in North American countries (e.g., Item 19: "Which is NOT a common side effect antidepressant drugs?") are regarded by the Japanese as covering information that is much too detailed.

Limitations and Future Implications
Although the present study provides several points of validity evidence for the [...]
a. Both individual and group talk therapy provides an opportunity to express and discuss thoughts and feelings with the therapist.
b. Therapy may help to resolve life issues that may contribute to depression.