Auditory and Visual Versions of the WMS III Logical Memory Subtest: The Effect of Relative Importance of Information Units on Forgetting Rate

Background: The Logical Memory (LM) subtest of the WMS III is a frequently used clinical measure of memory. The goal of the present study is to evaluate three ways of improving the diagnostic utilization of the LM: first, taking into account the importance of the units of information in scoring the test; second, introducing a visual version in addition to the auditory version of the test; and third, testing the feasibility of group administration of the test. Methods: We compared the effect of importance of information on the forgetting rate in visual and auditory versions of the test. Sixty-nine participants were randomly allocated to Auditory and Visual groups. Recall was tested immediately, 40 minutes later and after a one-week delay. Results: We found that the forgetting rate was steeper for the less important than for the more important units of information. The pattern of findings was similar but not identical in the auditory and visual versions of the test. Conclusions: The present results indicate that utilization of the LM could be improved by taking into account the relative importance of the information units, adding a visual modality and applying group administration. These advantages need to be validated in clinical populations with memory impairment.


Introduction
How to cite this paper: Lambez, B., & Vakil, E. (2020). Auditory and Visual Versions of the WMS III Logical Memory Subtest: The Effect of Relative Importance of Information Units on Forgetting Rate. Psychology.
The Logical Memory (LM) subtest of the Wechsler Memory Scale (WMS) (Wechsler, 1997a) is the most frequently used clinical memory assessment measure (Morris et al., 1997; Vakil, 2012). Although it is frequently used both in clinical practice (for diagnosis and treatment) and in experimental settings (clinical trials and diagnostic studies), standard administration is individual and oral (Morris et al., 1997). This limits ecological validity by neglecting the potential of LM to assess both visual and auditory memory (Buchweitz et al., 2009), and forgoes the savings in assessment resources that group administration would enable. Additionally, it has been shown that the importance level of information affects its retrieval (Vakil et al., 1992). Therefore, the present LM scoring system (Anand et al., 2011) may neglect information of diagnostic and practical importance.
We believe LM could be further improved to increase its practical importance both in memory assessment, through higher ecological validity, and in memory remediation, through processes of abstracting meaning. Therefore, in this article, we discuss three issues regarding a new scoring system and new administration methods for LM.
The LM WMS III subtest is a reliable measure of verbal episodic memory (Sullivan, 2005). The task is an index of auditory-verbal memory, requiring verbal recall of two orally presented story passages. It consists of three parts: LM I (immediate recall), LM II (delayed recall), and LM Recognition (delayed recognition). It addresses three processes involved in memory: encoding, storage and retrieval (Li et al., 2006). LM is sensitive in detecting cognitive decline and subtle memory change in early dementia, in individuals with mild cognitive impairment (MCI) (An & Chey, 2004; Chapman et al., 1995; Li et al., 2006; Lim et al., 2015; Robinson-Whelen & Storandt, 1992) and in traumatic brain injury (TBI) (Chapmann et al., 2016; Hamilton et al., 2004; Hayden et al., 2005; Rabin et al., 2009).
In this paper, we address three diagnostic issues of neuropsychological importance regarding the use of the LM subtest: 1) the need for a scoring system to assess memory through information units of different importance levels; 2) validation of visual modality administration; 3) validation of group administration.
Therefore, in the current study, we compared visual and auditory versions of the LM test, asking whether forgetting rate over time is affected similarly in both versions as a function of information units' importance.

Scoring System for Information Unit Importance Level
Commonly used auditory memory tests, such as the Rey Auditory-Verbal Learning Test (Rey, 1964) and the paired associates WMS subtest (Wechsler, 1997b), which rely on single words or word lists, are of lower ecological validity (Chaytor & Schmitter-Edgecombe, 2003). LM is a verbal memory test of higher ecological validity (Wechsler, 1997c), because it tests memory for a meaningful narrative story with a beginning, middle and ending. Hence, it allows us to assess memory for complex narrative information, which closely resembles the learning abilities we need in order to cope successfully with everyday demands (Lezak et al., 2004).

B. Lambez, E. Vakil Psychology
However, our everyday learning is composed of memory units of varying importance. Therefore, the current LM scoring system, which gives the same weight to each information unit, presents a certain anomaly (Lezak et al., 2004): by not differentiating the more important from the less important information units, it fails to capture a dominant factor of everyday learning challenges. Reading an article online or hearing an office conversation are two ways in which complex information is perceived before being encoded and stored in immediate and long-term memory. Such remembering is facilitated by gist reasoning, a form of developmentally advanced reasoning that is pivotal to new learning. It is defined as the ability to synthesize complex information, whether written, auditory, or visual, into abstract meanings that are not explicitly stated (Chapman et al., 1995); namely, a complex integrative function that is ubiquitous in everyday life.
The ability to abstract the gist (the most important points) from the information presented results in new forms of learning such as memory at gist level, which involves assimilating and interpreting incoming information on a generalized level of meaning (Reyna, 2008). Healthy adults typically are able to engage in abstracting meaning and gist memory with relative ease.
In both children and adults, it has been found that recall of written paragraphs is affected by the rated importance of "hierarchies": a greater number of important vs. less important units are recalled (Brown & Smiley, 1977, 1978; Denhière, 1980; Denhière & Legros, 1987; Moore & O'Driscoll, 1983). Furthermore, Brown and Smiley found that with additional study time, college students recalled more units of the two most important categories, but the same number of unimportant units as students with less study time.
In order to understand better the relationship between level of importance and memory, the Fuzzy-trace framework has been introduced (Brainerd et al., 2002; Reyna & Brainerd, 1995). According to the Fuzzy-trace theory, performance in an episodic memory task is driven at the same time by a verbatim trace (or detail information) and by gist information (the more essential elements of the information). Verbatim memory preserves information about the identity, details, and characteristics of the material presented, and can be associated with less important information. Gist memory preserves information about the general meaning or idea conveyed by an assortment of items, reflecting the more important information. Evidence suggests that patients with TBI have persistent gist reasoning deficits (Gamino et al., 2009; Vas et al., 2010). Additional experimental findings show that the two types of information are dissociated from each other, both at storage and at retrieval (Brainerd et al., 2003a); in dual-retrieval accounts of free recall (Brainerd et al., 2003b; Payne et al., 1996), early retrieved information has been found to be dominated by direct access of verbatim traces, but later retrieval is dominated by reconstruction from gist.
The LM test has a scoring system for both immediate and delayed recall, in which each item has the same score regardless of importance level, and all correct items are summed, giving a maximum score of 25 for each story. However, it has been found that when using the LM test, normal participants showed differential forgetting rates as a function of the item's importance, i.e., more important units of information were better retained over time (Vakil et al., 1992). Thus, different information units have different importance levels with regard to the story narrative, which affects their retrieval over time. This is consistent with the findings reported above that early retrieval is dominated by verbatim information, while late retrieval is dominated by gist information (Brainerd et al., 2002; Payne et al., 1996). Accordingly, in the current (WMS IV) and previous (WMS III) versions of the WMS, the logical memory stories have gained an additional form of scoring: a second scoring criterion was established, regarding the recall of general topics (gist) (Anand et al., 2011). For example, the original scoring system credits points for the recall of the character's exact profession and place of work (one point each); the additional scoring system now also grants one point if a person recalls that the character was working, even though the location or profession is not remembered. Although this scoring system reflects a higher level of cognition known to affect memory processes, it is not yet used as a normative scoring system for this test. In addition, in the second scoring system each general topic has the same score, which does not reflect the different importance levels of each information unit to the story narrative. An additional finding is that individuals following TBI not only recalled fewer items from the story overall, but also did not show the differential forgetting rate as a function of the item's importance (Vakil et al., 1992). Therefore, a scoring system based on differential importance levels may capture diagnostic information that would otherwise go undetected.

Validation of LM Subtest Visual Administration
An additional issue of great diagnostic importance is the tested modality. In everyday life, much of the information we learn and acquire is through reading. To date, most standardized neuropsychological tests for memory are either visual, using mostly figures and objects as material, or verbal, mostly administered orally. This approach somewhat neglects memory for verbal information presented visually. Therefore, examining memory for visual-verbal information has significant ecological importance.
Recent studies focusing on reading comprehension have shown different brain activation associated with reading as opposed to listening to text (Buchweitz et al., 2009). Furthermore, findings have reinforced theories of brain organization postulating the dedication of unimodal brain regions to the processing of low-level information, while modality-independent regions would tap more abstract levels (Jobard et al., 2007; Mesulam, 1998). On the other hand, some behavioral results show no difference across modalities, demonstrating that these higher-order processes can also be intertwined and are not only separate; successful reading relies on an interaction between decoding linguistic visual input and accessing phonological information (Booth et al., 1999, 2000). However, Sullins et al. (2010) point out a visual modality effect, wherein learning abilities are enhanced when text is presented visually rather than orally. From a remediation perspective, phonics interventions have greater initial effect sizes on explicit reading measures, but interventions with a comprehension component result in greater effect sizes later. Improvement in reading is beneficial for memory, as it involves construction of coherent memory representations (Cain et al., 2004).
Interestingly, a recent study using science comprehension tasks found that fluid intelligence and domain-specific knowledge fully accounted for the ability to comprehend texts and videos. The findings suggest that fluid intelligence can predict comprehension ability, regardless of modality (Schroeders et al., 2013).
Some cognitive processes, such as inference-making and other higher-level cognitive processes, are not modality specific (Booth et al., 2002). However, some processes use a specific modality through recruitment of distinct cortical areas (Cohen et al., 2002). These findings point to an advantage of visual over oral presentation in reading comprehension ability, supported by a visual modality-specific neural mechanism. Therefore, it is important to understand whether the enhanced learning abilities in the visual modality on reading comprehension would be expressed in memory differences. A meta-analysis of 91 studies in the field of multimedia learning found that learning from a visual text led to better scores on a retention test than learning from a spoken text narrative (Wang et al., 2016). Additional research showing the same advantage for the visual text (Crooks et al., 2012; Tabbers et al., 2004) supports the dual-coding theory (Paivio, 1969) and the dual route theory of reading (Jobard et al., 2003). Both imply that while reading information, we access orthographic and phonologic information, whereas when hearing it we mostly access the latter. This explains why deeper processing while reading results in better storage and retention. Therefore, assessing memory using auditory and visual versions of the same LM test is significant both theoretically and diagnostically, since they do not necessarily tap the same memory processes.

Validation of LM Subtest Group Administration
The last issue of diagnostic importance is individual vs. group administration. In the current study, we aimed to validate further and gather more information from the LM subtest in three ways. First, by using a differential scoring system as a function of importance of information units: based on the aforementioned findings, we predict that the importance level factor will have a greater effect on more delayed recall. Thus, the forgetting rate of the less important units of information would be steeper than that of the more important ones. Second, by adding a visual modality administration: in line with the reviewed literature, we predict superior recall for the visual over the auditory modality. Third, by examining the validity of group administration.
Therefore, in the current study we compared visual and auditory versions of the LM test, and asked whether the forgetting rate of information units over time is affected similarly in both versions as a function of the information units' importance.

Participants
Participants were allocated randomly to one of two groups. Sociodemographic details were gathered, based on anonymous self-report, including the Psychometric Entrance Test (PET) score for higher education in Israel. The PET measures mainly verbal and rhythmic acquisition of cognitive and scholastic abilities, in an effort to predict future success in academic studies. The test consists of three subtests: verbal reasoning, quantitative reasoning, and English as a foreign language (Beller, 1994).
The groups were formed based on the modality of the material presented at the study phase: the Auditory modality group (n = 32, mean age = 22.5, age range 20-25) and the Visual modality group (n = 37, mean age = 22.32, age range 19-25). All participants were first-year undergraduate psychology students at Bar-Ilan University, who took part in the experiment to fulfil academic requirements (see Table 1). The study was approved, as required, by the Institutional Review Board of Bar-Ilan University. Informed consent was obtained from all participants.

Procedure
A Hebrew version of LM story A of the WMS III (Wechsler, 1997a) was given to all participants. The Auditory group had the story read aloud by a neuropsychologist, as described in the WMS III manual. The administration differed from the standard one in two respects: it was carried out in a group, and participants were asked to write what they remembered of the story instead of repeating it aloud. The Visual group had the story presented as a single written paragraph on a projector screen for 25 seconds. The presentation duration was determined in a preliminary study, in which 10 participants were asked to read the paragraph and instructed to remember it later; the average reading time was 25 seconds. Both versions were administered in groups, and participants were asked to remember the story as accurately as possible, including as many details as possible. Recall of the story was requested three times: immediately, after a 40-minute delay, and once again after one week.
After the one-week recall, participants were presented with the single-story paragraph on a separate sheet of paper. They were asked to rate the importance of each story unit (as determined by the WMS III manual) on a three-point scale: one point for the least important, two points for the important, and three points for the most important units of information (25 units of information in total). The ratings that all participants gave to each unit were summed, and the units were ranked according to their summed scores. The list of units was then divided into three groups: the nine units with the highest summed scores were considered the most important, the next eight units were considered important, and the eight lowest-ranked units were considered the least important.
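The ranking procedure described above can be sketched in code. This is only an illustrative sketch, not the authors' actual procedure: the function name `tier_units`, the unit labels, and the tie-breaking rule (ties keep their input order) are assumptions, since the text does not say how tied sums were resolved.

```python
from typing import Dict, List, Sequence


def tier_units(ratings: Dict[str, List[int]],
               sizes: Sequence[int] = (9, 8, 8)) -> Dict[str, int]:
    """Assign each story unit an importance level from summed ratings.

    ratings maps a unit label to the 1-3 ratings given by all participants.
    Returns 3 for the most important units, 2 for important, 1 for least.
    """
    if sum(sizes) != len(ratings):
        raise ValueError("tier sizes must add up to the number of units")
    totals = {unit: sum(scores) for unit, scores in ratings.items()}
    # Highest summed rating first. sorted() is stable, so tied units keep
    # their input order (an assumption; the paper does not specify this).
    ordered = sorted(totals, key=totals.get, reverse=True)
    levels: Dict[str, int] = {}
    start = 0
    for level, size in zip((3, 2, 1), sizes):
        for unit in ordered[start:start + size]:
            levels[unit] = level
        start += size
    return levels


# Hypothetical toy data: four units, three raters, tiers of size 2, 1 and 1.
toy_ratings = {"a": [3, 3, 2], "b": [1, 1, 2], "c": [2, 3, 3], "d": [2, 2, 1]}
print(tier_units(toy_ratings, sizes=(2, 1, 1)))
```

With the 25 units of story A, the default sizes of 9, 8 and 8 reproduce the three groups described in the text.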
The stories were scored according to the WMS III manual by two skilled independent judges. In case of disagreement, a neuropsychologist gave the final score. The total score for each story was then broken down into three scores, expressing the number of units recalled at each level of importance, as determined previously.
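As a minimal sketch of this breakdown, assuming that each recalled unit has already been credited by the judges and that an importance level (1 to 3) is available for every unit, the three sub-scores could be computed as follows. The function name `score_by_importance` and the unit labels are hypothetical.

```python
from typing import Dict, Iterable


def score_by_importance(recalled_units: Iterable[str],
                        unit_levels: Dict[str, int]) -> Dict[int, int]:
    """Count credited units at each importance level (1, 2, 3)."""
    counts = {1: 0, 2: 0, 3: 0}
    # set() makes sure a unit that was written down twice is credited once.
    for unit in set(recalled_units):
        counts[unit_levels[unit]] += 1
    return counts


# Hypothetical example: four story units and one participant's recall.
levels = {"u1": 3, "u2": 3, "u3": 2, "u4": 1}
scores = score_by_importance(["u1", "u3", "u3"], levels)
total = sum(scores.values())  # equals the standard total score
print(scores, total)
```

The sum of the three sub-scores equals the standard total, so the breakdown adds information without changing the conventional score.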

Results
A mixed ANOVA with repeated measures was conducted in order to test the effects of Importance (Most important, Important, and Least important), Time (Immediate, 40-minute delay, and one-week delay), and Modality at the study phase (Auditory vs. Visual); the former two are within-subjects factors and the latter is a between-subjects factor. The results showed that both main effects, Importance and Time, reached significance, F(2, 66) = 82.91, p < 0.001, η² = 0.55 and F(2, 66) = 40.40, p < 0.001, η² = 0.38, respectively. However, the main effect of Modality did not reach significance, F(1, 67) = 0.384, p = 0.54, η² = 0.01. These main effects should be interpreted cautiously because of the significant Importance by Modality and Importance by Time interactions, F(2, 66) = 3.51, p < 0.05, η² = 0.05 and F(4, 64) = 5.29, p < 0.001, η² = 0.07, respectively. Interestingly, the triple interaction of Importance by Time by Modality reached significance as well, F(4, 64) = 2.86, p < 0.05, η² = 0.04. As can be seen in Figure 1, ... were significantly steeper (p < 0.001) than for the most important units of information (level 3). In the comparison of importance level 1 vs. 2, we found a difference between the modality groups: the auditory group showed a main effect of importance (p < 0.01), where level 2 information units were remembered significantly better than level 1 units. However, in this group the forgetting rate did not differ significantly between the two importance levels. In the visual group, unlike in the auditory group, the forgetting rate of level 1 was significantly (p < 0.01) steeper than that of level 2 (see Figure 1(a) and Figure 1(b)).
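To make the notion of a steeper forgetting rate concrete, the following sketch computes a simple descriptive index for each importance level: the number and the proportion of units lost between immediate and one-week recall. This index is an assumption introduced for illustration only; the paper tests forgetting through the Importance by Time interaction in the ANOVA and does not define such an index. The means in the example are hypothetical, not the study's data.

```python
from typing import Dict, Tuple


def forgetting_summary(
    mean_recall: Dict[int, Tuple[float, float, float]]
) -> Dict[int, Dict[str, float]]:
    """Summarise loss from immediate to one-week recall per importance level.

    mean_recall maps an importance level to the mean number of units
    recalled at (immediate, 40-minute delay, one-week delay).
    """
    summary = {}
    for level, (immediate, _delayed, week) in mean_recall.items():
        loss = immediate - week
        summary[level] = {
            "units_lost": loss,
            # Proportion of the immediately recalled units that were lost.
            "proportion_lost": loss / immediate if immediate else 0.0,
        }
    return summary


# Hypothetical means: level 1 loses a larger share of what was first recalled.
example = {1: (4.0, 3.0, 2.0), 3: (8.0, 7.0, 6.0)}
print(forgetting_summary(example))
```

A descriptive summary of this kind does not replace the inferential test; it only shows what a difference in slopes between importance levels would look like.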

Correlation between overall LM performance and External University Admission Testing (Psychometric Higher-Education Entrance Test-PET)
A Pearson product-moment correlation was computed between overall performance on the LM test (collapsed over time) and performance on the PET. We examined the correlation in each modality group separately. As can be seen in Figure 2, LM performance was significantly associated with PET score for the visual group, r(37) = 0.38, p < 0.01, but not for the auditory group, r(32) = 0.07, p = 0.36.
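For readers who wish to reproduce this kind of analysis, a minimal sketch of the Pearson product-moment coefficient is given here, applied separately to each modality group. The function name `pearson_r` and the scores in the example are hypothetical; the study's own data are not reproduced, and the significance test is not included.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for one modality group (LM total vs. PET score).
lm_scores = [40.0, 45.0, 52.0, 38.0, 47.0]
pet_scores = [610.0, 650.0, 700.0, 640.0, 620.0]
print(round(pearson_r(lm_scores, pet_scores), 2))
```

Running the function once per group mirrors the separate analyses reported for the visual and auditory groups.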
In order to detect the source of this significant correlation in the visual group, we broke down the correlation by the three levels of importance (Figure 3). We found significant correlations for the important units (level 2), r(37) = 0.52, p < 0.01 and the most important units (level 3), r(37) = 0.40, p < 0.05, but not for the least important units (level 1), r(37) = 0.01, p = 0.98. These results indicated that the source of the significant correlation between the PET score and recall of ...

Discussion
The LM subtest of the WMS is the most widely known and used neuropsychological test assessing complex verbal memory (Trifilio et al., 2020). Story memory tests were designed to assess everyday learning and remembering of new declarative information, and have been validated in this respect (Squire, 1987).
The aim of the present study was to evaluate three ways of improving LM diagnostic utilization: first, taking into account the importance of the units of information in scoring the test; second, introducing a visual version in addition to the auditory version of the test; and third, testing the feasibility of group administration of the test. LM was scored using three levels of importance determined by the participants. The results indicate that, regardless of modality, the more important the information units were, the better they were remembered overall and retained, showing a lower forgetting rate over time. The manual's scoring system (Anand et al., 2011) does not take into consideration the information units' level of importance. Rather, it considers recall of general topics (gist), with each topic carrying the same scoring value. Our findings reinforce previous research (Vakil et al., 1992), showing that memory is sensitive to different levels of importance and that this sensitivity becomes more pronounced over time. This emphasizes the need for an elaborated scoring system that takes into consideration the level of importance of each information unit. As shown in past research, some clinical populations, such as individuals with TBI, do not show a differential forgetting rate as a function of importance (Vakil et al., 1992). This is attributed to impaired executive function processing, resulting in difficulties when applying an elaborated strategy, either at encoding, at retrieval or both. Therefore, a scoring system that takes into consideration the importance level of each information unit is potentially more sensitive to memory deficits than the standard scoring system.
The second aim of the present study was to compare a visual version of the test to the standard auditory version. We found that the overall scores did not differ between oral and visual administration, indicating that the visual version of the test yielded a pattern of results similar to that of the standardized auditory version. This finding is somewhat surprising, considering the literature (Sullins et al., 2010) showing a visual modality advantage as a result of higher-order intertwined processes that rely on each other. However, we can presume that since recall was in writing, an interaction of processes such as decoding phonological input presented visually and accessing phonological information presented orally took place in both modality conditions, resulting in similar behavioral manifestations (Booth et al., 1999, 2000). Upon closer examination, however, we found a correlation between visual recall and PET score. This was not true for the auditory version, possibly implying that different underlying cognitive processes are involved in the different modalities. These potential differences lie in the nature of the mental representations that are constructed through each modality, as explained by the dual-coding hypothesis (Paivio, 1991; Sadoski & Paivio, 2004). It postulates that visual and auditory information are each processed differently and along distinct channels. The reason why one channel correlated more strongly with PET scores (reflecting verbal and quantitative reasoning) than the other might be explained by Baddeley's (1992) working memory model, which consists of a visuospatial sketchpad and a phonological loop. According to Baddeley's model, learning is more effective if both channels are utilized (using both the visuospatial sketchpad and the phonological loop), resulting in less cognitive load for either channel.
Accordingly, we might propose that during visual administration, both channels are triggered inherently and more robustly, resulting in deeper processing and more effective use of working memory (Oberauer & Lin, 2017). As working memory is known to be highly correlated with general intelligence scores (Oberauer et al., 2005), it can be understood why visual administration requires more effective use of working memory, thus explaining the findings of greater correlation with intelligence through PET scores (Kline, 2013). However, this assumption is based on PET scores of university entrance exams which were available through self-report.
In order to expand and examine precisely the underlying processes, future research must include a more comprehensive, standardized neuropsychological assessment, while correlating the experimental results with a wide range of cognitive functions.
The third aim was to examine the validity of group administration. Group administration of neuropsychological tests is not used frequently, mainly because of the diagnostic setting in clinical situations. However, group administration could offer significant economic benefits for experimental and clinical purposes. The present raw scores (LM immediate recall, M = 47.82; delayed recall, M = 29.59) are at the upper end of the average range of raw scores for this age group (WMS-III manual: immediate recall, M = 32-50; delayed recall, M = 18-32) (Wechsler, 1997b). Hence, group administration yields a slightly higher score than the average results for individual administration, which could be viewed as validation of this form of administration. The fact that performance was a little above average could be attributed to the fact that the sample here consists of students with higher intelligence than the general population.
In order to establish a more significant validation, future research is required with a larger and more heterogeneous sample of participants, to enable a direct comparison of group versus individual administration.
The present results indicate that utilization of the LM could be improved by taking into account the relative importance of the information units, adding a visual modality and applying group administration. These have several practical implications. First, a scoring system based on different importance levels provides additional information on the person's ability to extract abstract knowledge. This could later be translated into memory remediation strategies through practice and improvement of abstracting and extracting abilities. Second, visual administration can further clarify the diagnosis of verbal memory deficit. As verbal-visual and verbal-auditory memory are distinct processes (Lezak et al., 2004), it is beneficial to implement more precise and focused diagnostic and rehabilitation strategies. Third, group administration increases the efficiency of large-scale screening through faster, simultaneous evaluation.
These advantages need to be validated in clinical populations with memory impairment.

EV collaborated in writing the first version of the paper. All authors participated in rewriting and revising the manuscript, and read and approved the final version.

Funding
This work was supported by the Israeli Ministry of Defense, Rehabilitation Department [203003-846].

Availability of Data and Materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Ethics Approval and Consent to Participate
All participants were informed about the aims and characteristics of the study and signed a written consent form. The procedure was approved by the Ethics Committee of the Faculty of Psychology, Bar Ilan University.

Consent for Publication
Participants were informed about the evaluation process and how their data would be handled. All participants authorized their data to be used for scientific publications. All authors also approved the publication of this article in its present form.