The MDS-7: A Brief Mental Disorder Screener for Improved Diagnostic Accuracy
1. Introduction
It has long been known [1]-[3] that initial diagnoses of mental disorders, no matter which type of health professional makes them, are unreliable and cannot be trusted for treatment purposes. Yet this unreliability is routinely ignored by general practice physicians, who make the great majority of mental disorder diagnoses [4], and by most psychiatrists. In this article, I am first going to assume that inter-clinician agreement is a good indicator of diagnostic accuracy and examine the factors that affect it, but only because this is the assumption that everyone makes at present. Then I am going to dismiss inter-clinician agreement and instead argue for the use of a brief new mental disorder diagnostic measure, the MDS-7, that does address accuracy.
It is essential at the outset to point out that agreement between clinicians on the diagnosis does not mean that the diagnosis is accurate. There are five possibilities here. The clinicians could be agreeing on the correct diagnosis, or they could be agreeing on the wrong diagnosis. Or the first clinician could be wrong and the second correct, or the other way round. Or the diagnoses could be different and both of them wrong. My initial purpose, however, is to look at the current situation in which agreement is assumed to signify correct diagnosis, and then to examine how agreement varies by interview method. I do this only because this is the situation that patients or prospective patients currently are faced with. They either, as happens in most cases, go along with the first diagnosis because they trust the clinician, or they seek a second opinion if they do not trust the clinician or simply want confirmation of the first one.
2. Inter-Clinician Agreement Based on Interview Type
There are two forms of diagnostic agreement to be considered. The first is between-clinician, or interrater, agreement. This is typically referred to as interrater reliability, but validity is the real concern because you do not know how accurately each rater is measuring the disorder. The second form of agreement is better described as consistency rather than agreement because it applies to diagnoses of one patient made by two different clinicians at two different times. Consistency varies not only with rater accuracy but also with whether the symptoms or their severity levels have changed in the interval. And again, the accuracy (validity) of each rater’s diagnosis is simply assumed in calculating consistency.
Observed agreement, not kappa
Ever since the 1974 article on inter-clinician agreement by Spitzer and Fleiss [5], researchers have been told to report agreement, and consistency, by using the kappa statistic of agreement, K. The kappa statistic adjusts the observed proportion of agreement by subtracting from it the so-called proportion of agreement expected by chance, which in turn is based on the prevalence of the disorder in the population from which the sample of patients is drawn. Thus, where P stands for proportion: K = [P(observed agreement) - P(chance agreement)] / [1 - P(chance agreement)], the 1 in the denominator representing perfect agreement. In theory, kappa can range from 1.00, when observed agreement is perfect, down to .00, when chance agreement is as high as observed agreement. Kappa agreement, however, is incorrect and misleading. Firstly, the chance agreement proportion assumes that the mental health professional raters are guessing, which is not only highly unlikely but professionally insulting. Secondly, the prevalence of the disorder, and hence the chance agreement proportion, would differ drastically depending on whether the population, and hence the sample being studied, consists of members of the public, general medical practitioners’ typical visitors, mental clinic outpatients, or hospitalized inpatients. The following review is based only on those studies that report the straightforward percentage of agreement on the presence of the disorder in prospective or current patients.
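The kappa calculation, and the prevalence problem just described, can be sketched in a few lines of code. The function name and the numerical values below are illustrative only (they are not taken from [5]); the point is that the same observed agreement yields very different kappa values once the prevalence-based chance agreement changes:

```python
def kappa(p_observed, p_chance):
    """Kappa statistic: K = (P_observed - P_chance) / (1 - P_chance),
    where 1 represents perfect agreement and P_chance is derived from
    the disorder's prevalence in the sampled population."""
    return (p_observed - p_chance) / (1.0 - p_chance)

# Identical observed agreement of 90%, but different chance agreement
# because the two hypothetical samples differ in disorder prevalence:
print(round(kappa(0.90, 0.50), 2))  # -> 0.8  (balanced sample)
print(round(kappa(0.90, 0.82), 2))  # -> 0.44 (high-prevalence sample)
```

This is why the review that follows relies on the straightforward observed percentage of agreement, which does not shift with the composition of the sample.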
Most commonly, the diagnostic method is one of three: an unstructured interview, an unstructured interview with presumed memory of the DSM disorder criteria, or a structured interview based on the DSM disorder symptoms.
2.1. Unstructured Interview Agreement
The one study available in the published literature using an unstructured interview for diagnosis without the influence of the DSM diagnostic system is the 1949 study by Ash [1], a study conducted prior to the publication of the first DSM manual in 1952.
Ash’s 1949 study
Ash’s is an extremely valuable study because it involves three practicing psychiatrists working at an outpatient clinic who, in pairs, simultaneously interviewed the patient and then made separate diagnoses. Thus, the symptoms asked about and the information provided by the patient are controlled, leaving rater accuracy as the only variable affecting agreement. The inter-clinician agreement findings from Ash’s study are summarized in Table 1; all numbers in the table are percentages.
Table 1. Unstructured interview diagnostic agreement (%) in the 1949 study by P. Ash [1]. Three psychiatrists A, B, and C were involved in the study and random pairs of them jointly interviewed a patient (N = 139 patients) and separately made a diagnosis.
Level of diagnosis | Pair A, B (%) | Pair A, C (%) | Pair B, C (%) | Average agreement (%)
Major (mental deficiency, psychosis, neurosis, personality disorder, no disorder) | 66 | 67 | 58 | 64
Specific (approximately 60 disorders to choose from) | 34 | 44 | 34 | 37
The first row of the table shows the percentage of agreement when the two psychiatrists had only to choose between major categories, in this case four major categories of mental disorder plus a no disorder category. As shown, the average degree of agreement on the diagnosis when both psychiatrists are presented with the same “data,” is just 64%. This implies that psychiatrists will disagree on the diagnosis for about one in three presenting patients. If the two psychiatrists agree, then this is the diagnosis that the patient is likely to be treated under (never mind whether the agreed-upon diagnosis is correct). But if they disagree, then there is no way of deciding which is the more accurate for guiding treatment. Interestingly, as shown in Ash’s paper but not included in the table, seeking a third opinion does not help. Ash found that the likelihood of a third psychiatrist being able to break a disagreement by agreeing on the diagnosis with one of the first two psychiatrists is just 51%, or about the same as a coin toss.
Diagnostic agreement, as would be expected, is much lower for the diagnosis of specific disorders, of which there were about 60 for the clinicians to choose from in Ash’s study. As shown in the lower row of data in the table, the likelihood of two psychiatrists agreeing on the specific disorder diagnosis based on the same interview is extremely low at just 37%. However, it can reasonably be argued that main-category diagnoses are more important than specific ones. This is because broad diagnoses largely determine the initial form of treatment—a neurological brain-based disorder indicates referral to a neurologist; a biochemically caused “psychotic” disorder indicates medication will be required; a reactive or “neurotic” disorder indicates that counseling or psychotherapy should be tried first; and diagnosis of no serious disorder means that the doctor should wait and take no action unless serious symptoms emerge on further visits.
If even the broadest unstructured interview diagnoses are not reliably made, as Ash’s results suggest, then psychiatry is in big trouble. What to do about it is discussed later in this article. Next, however, we will look at whether knowledge of the DSM diagnostic system improves diagnostic agreement.
2.2. Unstructured Interview Agreement after Briefing on the DSM
Two studies were available in the literature that tested straightforward inter-clinician agreement from an unstructured interview following a detailed briefing on DSM diagnoses.
Beck et al.’s 1962 DSM-I study
Beck et al.’s 1962 study [3] yields a second valuable estimate of psychiatrists’ agreement on broad mental disorder diagnoses, this time under the assumption that the clinicians can memorize and incorporate DSM disorder criteria, in this case the criteria identified in the first version of the DSM that appeared in 1952 and is now referred to as the DSM-I (see [6]). In this study, all patients had been newly referred to the University of Pennsylvania Hospital’s psychiatric outpatient clinic and remained as outpatients during the study. Random pairs of four university hospital-affiliated and experienced psychiatrists first engaged in several group meetings to discuss the DSM disorder criteria and to try to resolve semantic differences in the disorder labels and symptom wording, a step that, if anything, should have improved the level of agreement. Then each of the two psychiatrists interviewed the same patient a few minutes apart in a “DSM informed” but unstructured interview to make the diagnosis. They had six disorder categories to choose from, of which the three main ones were schizophrenic behavior, depression reaction, and anxiety reaction. The agreement findings are shown in Table 2, where the four psychiatrists are labeled A, B, C, and D, and the numbers shown in the table are percentages of pairwise inter-clinician agreement.
Table 2. Unstructured interview diagnostic agreement (%), after discussion of the DSM-I disorder categories, in the 1962 study by Beck et al. [3]. Four psychiatrists A, B, C, and D were randomly allocated into pairs, and then both psychiatrists separately interviewed the patient a few minutes apart (N = 193 patients) and made a diagnosis. They had six disorder categories to choose from.
Psychiatrist | Paired with A | Paired with B | Paired with C | Paired with D
A | – | 61 | 60 | 57
B | 61 | – | 58 | 45
C | 60 | 58 | – | 33
D | 57 | 45 | 33 | –
Average agreement per psychiatrist (%) | 60 | 55 | 50 | 45
Overall average agreement (%): 54
The overall average agreement in the pairs of diagnoses, as shown, was just 54%. Psychiatrist D’s agreement with the other three psychiatrists was somewhat lower at 45% but, as Beck et al. note, this is probably because psychiatrists A, B, and C had gained most of their diagnostic experience with outpatients while psychiatrist D’s experience was predominantly with inpatients, who would tend to have more serious disorders.
What is surprising is that the overall average agreement of 54% (or, not shown in the table, 57% based on just the three main categories) is lower than Ash’s finding of 64%. This suggests that using the DSM – or rather trying to remember and apply the DSM criteria after recently having discussed them – does not help. This does not bode well for university training of psychiatrists and clinical psychologists with the DSM, especially now that the current version, the DSM-5, has become much expanded and the symptoms more numerous than in the earlier versions.
American Psychiatric Association’s 2013 DSM-5 field trials
This second study of what may be called DSM memory warrants discussion mainly because it is a telling example of what not to do. This study is based on the field trials for the current version of the DSM diagnostic system, the DSM-5, which was introduced in 2013 [7]. There were supposed to be two trials, but the first, a planned trial of full-time practicing clinicians’ ability to diagnose after training with (a prepublication version of) the DSM-5, was never started because the American Psychiatric Association experts running the trials could not recruit enough practitioners willing to take the time out to participate [8]. This left only the second trial, which had major shortcomings. The raters were not full-time practicing psychiatrists as planned for the first trial but a mix of academic psychiatrists (about a third of the raters) and PhD-level clinical psychologists (almost half), with nurses, counselors, and social workers making up the rest. The patients were psychiatric outpatients who had previously been diagnosed with a DSM-IV mental disorder but were unlikely to have been suffering from a serious disorder [9]. After four hours of training on the DSM-5, the raters were supposed to use unstructured interviews to simulate real-world diagnostic practice conditions, but instead they were told to refer to ratings made by the patient beforehand on a set of 23 so-called cross-cutting disorder symptoms, and then make their own ratings on the same 23 symptoms after interviewing the patient [10]. These symptoms were not based on the DSM-5 symptom descriptions but came from variously worded self-report research questionnaires such as the PHQ and the PROMIS measures (see the footnote to Table 1 in [10]). This of course meant that the trial was not a test of DSM memory at all.
2.3. DSM-Based SCID-5 Structured Clinical Interview Agreement
This third scenario assumes that the DSM-5 is content-valid—which is not an unreasonable assumption aside from the difficulties imposed by the questionable expansion of disorders (see [11] for a summary of the main changes and disorder additions from the DSM-IV). If the DSM itself is content-valid, then any measure based on it must adhere closely to the DSM symptom wording and be sure to apply the hierarchical decision rules pertaining to the required as opposed to optional alternative symptoms. The only measures to do this are the American Psychiatric Association’s structured clinical interviews, the SCID-IV based on the DSM-IV, and the SCID-5 based on the DSM-5 [12]. We can dismiss the World Health Organization’s CIDI structured clinical interview because its symptoms are poorly worded, there is no acknowledgement that informants may be required and, unlike with the SCID, there is no requirement that the symptoms directly cause dysfunction.
Osório et al.’s SCID-5 study
The only SCID-based study that the present author could find on inter-clinician diagnostic agreement that reported percentage overall agreement rather than only kappa agreement was the 2019 study by Osório et al. [13]. This study was conducted in Brazil using a Brazilian translation of the SCID-5-CV (clinician version) questionnaire and involved 12 experienced clinicians working in pairs, comprising seven psychiatrists and five clinical psychologists, who interviewed 124 psychiatric outpatients, 29 psychiatric inpatients, and 20 individuals with no history of psychiatric or psychological treatment, thus covering a wide range of disorder severity. The clinicians were given approximately 20 hours of training on the SCID-5-CV, which included discussion of diagnostic problems mentioned in the SCID-5 users’ guide and practice ratings of five video-recorded SCID-5 interviews conducted by the highly experienced first author of the study. An important feature of Osório et al.’s study was that, as in Ash’s excellent early study [1], both clinicians had the same interview data, this time because the first clinician conducted the interview while making SCID ratings, with the second clinician silently observing and independently making SCID ratings. Osório et al.’s study thus controlled for not only the measure (the SCID) but also the data (joint observation of the patient), leaving the raters’ answer interpretation differences as the only variable affecting the ratings. Osório et al.’s main results are shown in Table 3.
Table 3. SCID-5-CV structured interview diagnostic agreement (%) and test-retest consistency (%) in the 2019 study by Osório et al. [13].
Disorder category | Agreement (%): one interview rated by 2 clinicians (N = 180) | Consistency (%): re-interview 10 to 30 days after (n = 53)a
Biochemical | |
Schizophrenia | 100 | 89
Bipolar I disorder | 100 | 100
Major depressive disorderb | |
- lifetime | 100 | 100
- current episode | 78 | 50
Psychological | |
PTSD | 100 | 100
Panic disorder | 100 | 67
Social anxiety | 100 | 75
Generalized anxiety | 90 | 80
aThe remaining n = 86 patients were re-interviewed by telephone, not in person, which is not a sufficiently valid procedure. bMajor depression of the serious melancholic type, which is also present in bipolar I disorder.
The first data column in the table shows the percentage inter-clinician agreement on the SCID-5 diagnoses. For most disorders agreement was a perfect 100%, with two important exceptions. Agreement was unacceptably low, at 78%, for diagnosing whether the patient was currently suffering from a major depressive episode, and quite possibly these disagreements occurred because of failure to distinguish the two types of depression, biochemical and reactive, with their different core symptoms [14].
The other diagnosis not always agreed on was generalized anxiety disorder, GAD. Agreement on GAD in Osório et al.’s study was 90%, but disagreement was 10%, which is surprising when the SCID ([12], p. 71) is so explicit about the symptoms. The SCID-5, it should be noted, seems too conservative in measuring anxiety disorders because it requires ([12], p. 71) that the anxiety be present “more days than not for the past six months.” The problem here is that severe anxiety may be experienced for shorter durations as reactions to transiently stressful events [15].
The second data column in Table 3 shows the degree of consistency for a further SCID-5 diagnosis made 10 to 30 days after the initial SCID-5 diagnosis. Here, the only disorders to show 100% consistent diagnoses were bipolar I disorder (the serious form) and post-traumatic stress disorder. For bipolar I disorder, a lifetime manic episode is required and this would be highly memorable for a close friend or family member and likely to be reported in both interviews. And for post-traumatic stress disorder, PTSD, frequent flashbacks would be all too vividly remembered by the patient on both interviews. Schizophrenia, on the other hand, was just 89% consistent, suggesting that informant reports of patients’ detachment from reality are not always reliable. The neurotic-type disorders were inconsistently diagnosed. Current-episode depression showed consistency of only 50%, although some of this could be because on one of the interviews the episode had passed. Anxiety disorder diagnostic consistency also was well under 100%, which is hard to explain given the interval of just several weeks between interviews.
Overall, then, even the “gold standard” diagnostic method, the SCID structured clinical interview, is not trustworthy enough to be relied on for the initial diagnosis, particularly for so-called mood disorders. Furthermore, the current version, the SCID-5, depending on how many disorders the individual is judged to have symptoms for, takes an hour to 90 minutes for the clinician to administer and is far too time-consuming to be used in practice.
3. A Suggested Solution: The MDS-7 Mental Disorder Screener
The reality is that most people who suspect they have a mental disorder will seek and get only one diagnosis, most likely from a general medical practitioner and made during a normal office visit with unstructured and therefore unguided questioning. A brief, more structured, and more accurate mental disorder screener is clearly needed. The present author has developed a screener called the MDS-7 that appears to fit these requirements. It is based on the author’s core symptoms principle [14] [15]. This principle is as follows: If the presenting patient does not have the core symptoms of a particular disorder, then the patient cannot possibly have that disorder. The core symptoms of the disorder are necessary and sufficient to make the diagnosis [16], and the presence of any other symptoms does not matter.
The core symptoms of the main mental disorders covered by the MDS-7 were selected by the present author after careful reading of the DSM-5 diagnostic manual [7] and by referring to the excellent paperback book on mental disorder diagnosis, and differential diagnosis, prepared by U.S. expert psychiatrist Allen Frances, who headed the American Psychiatric Association’s DSM-IV Task Force [11]. I should add that the development of the MDS-7 was also greatly influenced by Australian psychiatrist Gordon Parker’s writings on bipolar disorder and forms of depressive disorder (e.g., [17]) and by his advice over the years on the diagnosis and treatment of the present author’s younger brother, for whom I have guardianship, and who has been suffering for most of his adult life from bipolar I disorder.
The MDS-7 questionnaire is given in Table 4, and the scoring method and suggested disorder hierarchy are given in Table 5. Both can be freely reproduced from this article and can easily be translated into other languages with an online program such as Google Translate. The MDS-7 is suitable for use with teenagers and adults, though not with still-developing children [18].
Table 4. The MDS-7 mental disorder screener questionnaire. The symptom name is underlined and a suggested starter question is given for each. Follow-up clarification questions are left to the discretion of the clinician. Dysfunction is defined as inability to satisfactorily perform one’s usual daily activities.
PATIENT _______________________________________________   DATE _______________________
*INFORMANT (if needed) _________________________________
CLINICIAN _____________________________________________

Symptom (and suggested starter question) | Present to a dysfunctional level?
1. Severely depressed mood* “Have you recently had days when you feel so down and depressed that you cannot function at your best?” | Yes / No
2. Severe anhedonia* “Have you recently been feeling really flat and getting no enjoyment from the things that you normally enjoy?” | Yes / No
3. Mental agitation with motor retardation* “Have you recently had days when you felt all stirred up inside but unable to get going and get things done?” | Yes / No
4. Manic episode* “Do you sometimes go into a large upward mood swing – feeling great but realizing or being told you’re a bit over the top and hyperactive?” | Yes / No
5. Semi-paralyzing anxiety “Do you often find yourself becoming very, very anxious and can’t easily calm yourself down? Or have you ever had what is called a panic attack, where your heart races, you get all dizzy, and feel like you will black out?” | Yes / No
6. Trauma reaction “Do you often have recurring vivid flashbacks or nightmares about something horrible or upsetting you’ve seen, or have been through?” | Yes / No
7. Psychotic episode* “Do you sometimes hear a weird voice or voices in your head telling you to do things you wouldn’t normally do, or see strange and often haunting visions in your mind?” | Yes / No
*If an informant is being interviewed rather than the patient for symptoms 1, 2, 3, 4, and 7, the starter questions are reworded as “Has X…” or “Does X…” The anxiety and trauma questions 5 and 6 may also not be properly answerable if the patient is very unwell and you may have to wait for a later interview.
Table 5. MDS-7 scoring rules and disorder hierarchy.
Scoring rules:
- Reactive depression = (1) severely depressed mood
- Melancholic depression = (1) severely depressed mood and (2) severe anhedonia and (3) mental agitation with motor retardation
- Bipolar I disorder (manic-depression) = (4) at least one fully manic episode confirmed by an informant
- Anxiety disorder = (5) semi-paralyzing anxiety
- PTSD = (6) trauma reaction
- Dissociative psychosis (e.g., schizophrenia) = (7) recurring delusions or hallucinations

Disorder hierarchy:
If a patient qualifies for more than one of the above disorders, the hierarchy to be applied for the primary disorder diagnosis and treatment is: Dissociative psychosis > Bipolar I disorder > Melancholic depression > PTSD > Reactive depression > Anxiety disorder.
The MDS-7 questionnaire, shown in the first of the two tables, takes the form of a semi-structured interview, beginning with the core symptoms of the most prevalent disorders and progressing to those of the least prevalent but more serious disorders. Each question briefly labels the symptom and then provides a suggested “starter question” for the clinician to use, which can be followed by the clinician’s own questions for clarification when needed, or for confirmation of dysfunction. The sole purpose is to get a definitive yes or no answer for each of the core symptoms and the clinician should not pursue any other symptoms that happen to be mentioned during the interview. Importantly, if the individual is not well enough to be interviewed, it is essential to engage the participation of a reliable family member or close friend, or of an attending nurse if the patient is already in hospital, and run through the MDS-7 questions with them.
The accompanying second table explains the core symptom-based scoring method and the recommended hierarchy for deciding on the most important diagnosis requiring treatment. Note that this hierarchy provides a differential diagnosis. By following the scoring rules in the table, six disorders – ranging from psychosis to anxiety – can be distinguished. The diagnosing clinician should do the scoring and arrange a follow-up appointment with the patient or, if necessary, with his or her guardian, to discuss the results and what should be done given this initial diagnosis.
Cautionary note about prescribing
The initial diagnosis should always be regarded as tentative and provisional. When a disorder specification using a mandated ICD-10 code is needed for health insurance reimbursement purposes, as it is in most countries, the clinician should report the most likely serious disorder but use the “unspecified” ICD-10 code, such as F31.9 Bipolar I Disorder, Most Recent Episode Unspecified, or F41.9 Unspecified Anxiety Disorder (see Frances [11]). The diagnosis can always be made more specific after further observation of the patient.
Frances [11] argues that no medication should be prescribed before several follow-up visits, which is wise advice given the fact that we know appallingly little about how psychiatric medications work and that their use is largely a matter of trial and error [19]. But it does not seem prudent to allow the patient to leave the visit with nothing. The medical literature suggests initially prescribing St John’s Wort in what is known as the “informed placebo” procedure [14] [15]. In this procedure the clinician tells the patient, truthfully, that St John’s Wort has been shown to work for many people and that it “may work for you,” but to be careful to take a tablet only on those days when feeling very depressed or anxious. It is also more effective if the clinician authoritatively writes a prescription for St John’s Wort, even though it can be purchased over-the-counter. Although probably mainly placebic in effect, St John’s Wort is thought to be effective partly because it is a mild herbal MAOI (monoamine oxidase inhibitor) antidepressant [20]. Of course, if the patient is seen in a hospital emergency room and is acutely distressed or uncontrollably anxious, a strong sedative may be required.
4. Conclusions
In-office consultations for suspected mental disorders are not reliable and accurate enough to make treatment recommendations. This conclusion applies to general medical practitioners, psychiatrists, and psychologists or counselors who may take it upon themselves to make a diagnosis. Nor does it help to get a second diagnostic opinion because this is just as likely to be inaccurate as the earlier one. The only solution is for clinicians to use a more accurate diagnostic screening procedure in the first place.
A brief and sufficiently accurate 7-question screener, the MDS-7, is offered in this article. The questions are confined to the essential core symptoms of the main disorder types. Whereas the symptom questions are fixed, the clinician can ask for further information and confirmation of dysfunction if the symptom question does not produce a clear yes or no rating. The screener should take only about 10 minutes to administer plus another 10 minutes afterwards to score the symptoms and make the initial diagnosis. The patient (or guardian) is asked to return the following week to discuss the most probable diagnosis and the treatment options.