Clinical Validation of Cambridge Neuropsychological Test Automated Battery in a Norwegian Epilepsy Population

Introduction: Semi-automated neuropsychological testing has gained a position both in clinical use and in research. Comparison studies with traditional neuropsychological tests are sparse, and the role of such semi-automated testing is debated. To integrate semi-automated neuropsychological testing into the established clinical setting, the tests must be validated in the patient groups addressed. The aim of this study was to validate the Cambridge Neuropsychological Test Automated Battery (CANTAB) in patients with epilepsy. Material and Methods: Patients scheduled for traditional neuropsychological testing with the Category Test (CT), Trail Making Test part B (TMT-B), WAIS-III and WMS-R were also asked to complete the CANTAB battery. Our hypothesis was that memory tests from CANTAB (Delayed Matching to Sample, DMS, and Paired Associate Learning, PAL) would correlate with visual memory tests from WMS-R and that a test of executive functions from CANTAB (Stockings of Cambridge, SOC) would correlate with functions tested with TMT-B, CT and WAIS-III. Results: Scores from DMS correlated strongly with Visual Paired Associates 1 from WMS-R. Results from SOC correlated with Visual Paired Associates 1 & 2, General Memory Index and Full Scale IQ. Results from PAL correlated with several results from the traditional battery: Verbal, Visual and General Memory Index, Paired Associates, Visual Memory Span Backwards, TMT-B and Visual IQ. Conclusion: Our results indicate that DMS primarily tests visual matching to sample. SOC tests executive functions and also depends on non-verbal IQ and memory. The numerous correlations between PAL and traditional tests illustrate that PAL is a complex task depending on several cognitive domains, but mainly memory.


Introduction
Automated neuropsychological testing has over the last two decades gained a position both in clinical evaluation and follow-up of patients and in neurocognitive research. The full prospects of such automated testing are still debated among neuropsychologists. Unfortunately, validity studies and comparisons between traditional tests and automated tests are sparse [1]. Still, the possible advantages of such testing in certain clinical settings are evident, and its use must be further explored. The Cambridge Neuropsychological Test Automated Battery (CANTAB) [2] is a semi-automated neuropsychological test battery run on a laptop PC, developed at the University of Cambridge. CANTAB contains 22 neuropsychological tests in five cognitive domains: visual memory, semantic/verbal memory, decision making and response control, executive function and attention. With CANTAB it is possible to create cognitive test batteries adapted to the clinical setting by choosing tests that address the relevant cognitive function or area of the brain. CANTAB offers several advantages compared to traditional neuropsychological testing. The test procedure is highly standardized, and the tests can be administered by personnel without neuropsychological training after short instructions. This makes CANTAB more feasible in an everyday clinical setting. The range in difficulty level within each test is wide, reducing the possibilities for floor and ceiling effects.
CANTAB has already been used and shown its applicability in a wide range of known cerebral diseases such as Parkinson's disease [3], Alzheimer's disease [4], Huntington's disease [5] and stroke [6], in addition to neurosurgical disorders such as head injuries [7] and normal pressure hydrocephalus [8]. Neurological diseases with known focal effects on the brain have contributed to establishing the construct validity of CANTAB [9]. Comparisons with traditional neuropsychological tests such as the Wisconsin Card Sorting Test [10], Digit Forward Span [11] and Wechsler Verbal Paired Associates have been conducted for some of the CANTAB tests. More comprehensive comparisons have also been done [12,13]. We could find only one article describing CANTAB used in patients with epilepsy [14]. This article underlines the positive prospects of CANTAB regarding this patient group. CANTAB has earlier been validated in a Norwegian cohort of patients operated for arachnoidal cysts [15] and used to assess cognitive function level in a Norwegian cardiac arrest cohort [16].
Validation of new methodology is mandatory to investigate whether the applied method actually addresses the issue in focus. Criterion validity approaches this question by comparing the test under investigation with an already established and validated test, i.e. the "gold standard". We have previously found good construct validity in a group of Norwegian hospitalised patients [15].
The aim of this study was to investigate criterion validity of the CANTAB battery in a Norwegian epilepsy population. We chose CANTAB subtests used frequently in earlier studies [17], which assess functions known to be affected in epilepsy patients, such as memory and executive functions [18]. However, the main focus of this study is to investigate criterion validity of the tests rather than to attempt a representative assessment of cognitive deficit in patients with epilepsy. The tests chosen could a priori be matched to a traditional neuropsychological test battery. The subtests are non-verbal, which makes them useful in a non-English-speaking community.
As criterion variables we selected tests from a standardized test battery routinely used at the Department of Neurology, including the composite intelligence test WAIS-III [19], the composite memory test WMS-R [20] and the composite neuropsychological Halstead-Reitan Battery (HRB) [21]. As a general hypothesis, we expected that measures from DMS and PAL would show high correlations with measures of non-verbal memory from the WMS-R, and that measures from the SOC would show higher correlations with intelligence and with measures from the HRB thought to measure executive functions (Category Test and Trail Making Test, part B). In addition to analyzing correlations with summary measures of intelligence and memory, we selected tests of visual reproduction, visual paired associate learning and visual memory span from the WMS-R for a closer analysis. In this way, more distinct patterns of correlations with non-verbal memory may be detected for each of the CANTAB subtests.

Patients and Test Setting
Patients with epilepsy over 16 years of age scheduled for clinically indicated neuropsychological assessment were eligible for inclusion. Patients with dementia, a severe psychiatric history or use/abuse of central stimulating or inhibiting drugs, except anti-epileptic drugs, were excluded. Potential study subjects were given information about the project and asked to participate at the scheduled appointment with the neuropsychologist. The patients were all admitted to the Department of Neurology at Haukeland University Hospital for 2-3 days of cognitive examination. If a patient was included, the CANTAB test was performed during this period to ensure that the patient was in the same condition when tested with the two test batteries. All CANTAB tests were administered by the same investigator (JT), and all traditional tests by the same investigator (AG). Because our focus is the comparison of methods, we did not aim to include a representative selection of Norwegian epilepsy patients.

CANTAB Battery
With CANTAB the patient is presented with the test on a touch-sensitive screen and responds by touching the screen. The integrated software records and processes the responses, generating results as raw scores on each test. The CANTAB software also contains results from an English normal reference population. By comparing the raw scores from the CANTAB test to the mean raw scores in the reference population, z-scores are generated. Thus each result from CANTAB testing is presented both as a raw score and as a z-score, which indicates the patient's level of cognitive performance within the tested domain compared to a normal population. From CANTAB it is possible to report a wide range of results from each test, describing different aspects of each function tested. We chose to report the results (Table 1) most frequently described in earlier studies and which have shown the strongest test-retest reliability [22].
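The standardization described above can be sketched as follows. This is a minimal illustration, not the CANTAB software's actual procedure: the function name, the reference mean and SD, and the sign flip for error-type measures are all assumptions made for the example.

```python
def z_score(raw, ref_mean, ref_sd, higher_is_better=True):
    """Standardize a raw score against a reference population.

    ref_mean and ref_sd are hypothetical reference values, not CANTAB norms.
    """
    if ref_sd <= 0:
        raise ValueError("reference SD must be positive")
    z = (raw - ref_mean) / ref_sd
    # Assumption: for error or latency measures the sign is flipped,
    # so that a negative z always means worse than the reference mean.
    return z if higher_is_better else -z


# Hypothetical patient scoring 6 where the reference mean is 8 (SD 1).
print(z_score(raw=6.0, ref_mean=8.0, ref_sd=1.0))  # -2.0
```

A z-score of -2.0 here means the hypothetical patient scored two reference standard deviations below the reference mean.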
The tests from CANTAB are described below:
1) Motor Screening Test (MOT) was conducted first to screen for the ability to cooperate with the apparatus. Patients were instructed to point at a flashing cross as soon as it appeared.
2) With Delayed Matching to Sample (DMS), memory and forced decision-making were tested. DMS is reported to be a test of immediate matching to sample, delayed matching to sample and forced-choice recognition memory. This test may be sensitive to damage mainly in the medial temporal lobe, with some input from the frontal lobe. Patients were asked to remember 30 non-figurative objects, recall them and distinguish them from other similar patterns after a delay of 0, 4 or 12 seconds [23].
3) Paired Associate Learning (PAL) is a test of episodic and visual memory but also depends on the ability of spatial planning. Performance on PAL depends on input mainly from the temporal lobe, but also from the frontal lobe. The patients had to remember the location of different patterns appearing on the screen and then point out where on the screen each pattern was initially shown. The difficulty level increases from two to eight patterns to be remembered [23].
4) Stockings of Cambridge (SOC) is described as a test of executive function. It requires spatial abilities and strategic planning and is claimed to give a measure of frontal lobe function. Patients had to move three coloured circles in the lower half of the screen to match a given pattern in the upper half of the screen. The difficulty level increases as the minimum number of moves needed to complete the task rises from two to five [24].
These tests and reported results are described in detail elsewhere [15].

Traditional Neuropsychological Battery
The Category Test, Trail Making Test part B, WAIS-III and WMS-R were administered according to the standard instructions given in the manuals. In the Category Test, the patients were presented with figures on a screen and asked to respond by pushing a button indicating a match between the figure and one of the numbers 1-4. They were given auditory feedback as to whether the responses were correct or incorrect. Using this method, the ability to detect and follow distinct principles in seven series of pictures is tested, and the score is the number of incorrect responses. In the Trail Making Test part B, 25 digits and letters are printed on a sheet of paper, and the task is to draw a line in alternating sequence (number-letter-number-letter…) as fast as possible. The score is the total time (in seconds) used to perform the test. In addition to raw scores, T-scores based on age-corrected norms (Matthews & Kløve, 1964) were calculated. The WAIS-III consists of 14 subtests and gives a full-scale IQ score and separate Verbal and Performance IQ scores, as well as four index scores. We analyzed only the IQ values, as indicators of general abilities. The WMS-R consists of two verbal and three visual memory tests, tests of digit span, visual span and mental control, and delayed memory testing for two verbal and two visual tests. A general memory index and separate indexes of verbal memory, visual memory, attention/concentration and delayed memory are given. In the Visual Reproduction (VRI) subtest, four geometric figures are presented for 10 s each, and each figure is drawn immediately after presentation. In the delayed condition (VRII), the subject is asked to reproduce the same figures from memory after about 30 minutes. In the Visual Paired Associates I (VPAI), the task is to remember figure-colour associations. Six meaningless figures are shown together with six particular colours for 3 s for each figure-colour pair, and the subject should match each figure with the associated colour immediately after presentation of all six pairs. This procedure is repeated three times, and the score is the sum of correct associations. In the delayed condition (VPAII), a similar match is required after about 30 minutes. In the Visual Memory Span, subjects are asked to repeat sequences of pointing at squares on a piece of paper. In the backwards condition, the sequences are to be repeated backwards. Reported results from traditional tests are also shown in Table 1.

Classification of Impairment
Classification of the patients as cognitively impaired or non-impaired was done based both on CANTAB results and on results from traditional testing. For CANTAB this was performed by applying the criteria suggested by Jackson for the categorization of cognitive failure in intensive care patients [16,25]: the patients were classified as cognitively impaired if they had a z-score ≤ -2.0 on two, or ≤ -1.5 on three, of ten tests. No strict criteria for classification of impairment on WAIS-III and WMS-R are defined. We chose to classify patients as impaired if they had a Full Scale Intelligence Quotient (FSIQ) or General Memory Index (GMI) equal to or below 80. This criterion corresponds roughly to the Jackson et al. (2004) criteria for impairment on CANTAB, and is also in accordance with the separation between normal and below-normal performance on the WAIS-III [19].
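The two classification rules can be written out as a short sketch. The function names and the example scores are hypothetical, and reading "on two" and "on three" as "at least two" and "at least three" is an assumption about how the criteria were applied.

```python
def impaired_cantab(z_scores):
    """CANTAB rule: z <= -2.0 on two tests, or z <= -1.5 on three tests.

    Assumption: 'two' and 'three' are read as 'at least two/three'.
    """
    n_below_2 = sum(z <= -2.0 for z in z_scores)
    n_below_1_5 = sum(z <= -1.5 for z in z_scores)  # also counts z <= -2.0
    return n_below_2 >= 2 or n_below_1_5 >= 3


def impaired_traditional(fsiq, gmi, cutoff=80):
    """Traditional rule: FSIQ or GMI equal to or below the cutoff."""
    return fsiq <= cutoff or gmi <= cutoff


# Hypothetical patient with ten CANTAB z-scores and two index scores.
z = [-2.3, -2.0, -0.4, 0.1, -1.1, 0.6, -0.9, 0.0, -0.2, 0.3]
print(impaired_cantab(z))             # True (two scores at or below -2.0)
print(impaired_traditional(92, 78))   # True (GMI at or below 80)
```

Because any score at or below -2.0 is also at or below -1.5, the second count includes the first; this keeps the rule monotone, so a patient with one score of -2.5 and two of -1.6 is still classified as impaired.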

Statistics
Comparison of CANTAB results between the epilepsy group and the integrated reference population was done using a one-sample t-test. We used Pearson's correlation coefficient to assess and express the correlation between traditional tests and CANTAB tests. All results from CANTAB were tested against all results from traditional testing (Table 1). All analyses were performed with SPSS 17.0 for Windows (SPSS Inc., Chicago, IL, USA).
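For readers who want to see the two statistics spelled out, the following is a minimal sketch using only the Python standard library. It is not the SPSS procedure used in the study, the data are made up for illustration, and p-values are omitted because they require the t distribution, which the standard library does not provide.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two lists of equal length with at least 3 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def one_sample_t(values, test_value=0.0):
    """t statistic for H0: population mean == test_value (df = n - 1)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    return (mean - test_value) / math.sqrt(var / n)


# Made-up z-scores for one CANTAB measure, tested against 0
# (the mean of the reference population on the z-scale).
z_scores = [-1.0, -2.0, -3.0]
print(round(one_sample_t(z_scores), 3))  # -3.464
```

Testing z-scores against 0 works because the reference population has mean 0 on the z-scale by construction.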

Ethics
The project was approved by the regional ethics committee and the National Data Inspectorate.

Level of Cognitive Performance
Based on results from CANTAB and according to the criteria suggested by Jackson, 47% (95% CI: 18%-75%) were classified as cognitively impaired. According to the traditional tests, 53% (95% CI: 25%-82%) were cognitively impaired. The classification of each patient is shown in Table 3. Only three of the 15 patients were classified differently by the two batteries. The level of cognitive function on each test in the epilepsy group is shown in Table 4 for both CANTAB and the traditional battery. The epilepsy group scored worse than the integrated reference population on all tests, but significantly so only on SOC and PAL, indicating executive dysfunction and reduced visuospatial memory, respectively, when measured with CANTAB.
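The text does not state how the confidence intervals were computed. As a hedged sketch: the reported percentages correspond to 7 and 8 of 15 patients (inferred from 47% and 53%), and a t-based interval on a 0/1 impairment indicator, using the sample standard deviation, reproduces the reported bounds. Both the counts and the choice of interval are assumptions; the critical value 2.145 is the two-sided 95% value of Student's t with 14 degrees of freedom.

```python
import math

T_CRIT_DF14 = 2.145  # two-sided 95% critical value of Student's t, 14 df


def proportion_ci_t(k, n, t_crit):
    """t-based CI for a proportion, treating impairment as a 0/1 variable."""
    p = k / n
    # Sample variance (n - 1 denominator) of a 0/1 indicator with k ones.
    var = n * p * (1 - p) / (n - 1)
    half_width = t_crit * math.sqrt(var / n)
    return p, p - half_width, p + half_width


for k in (7, 8):  # assumed counts for CANTAB and the traditional battery
    p, lo, hi = proportion_ci_t(k, 15, T_CRIT_DF14)
    print(f"{100 * p:.0f}% ({100 * lo:.0f}%-{100 * hi:.0f}%)")
# 47% (18%-75%)
# 53% (25%-82%)
```

The width of these intervals, close to 60 percentage points, is a direct consequence of the sample size of 15.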

Correlations
Because of the small sample, we decided to analyze only the variables for which statistically significant correlations were found with both raw and standardized scores. This was meant as a conservative measure, to avoid undue attention to spurious correlations.
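To give a sense of how demanding this rule is in a sample of this size, the sketch below computes the smallest absolute Pearson correlation that reaches two-sided p < 0.05 and applies the "significant in both" filter. It assumes that all 15 patients contributed to each correlation, which the text does not state; the function names and example correlations are hypothetical. The value 2.160 is the two-sided 95% critical value of Student's t with 13 degrees of freedom.

```python
import math

T_CRIT_DF13 = 2.160  # two-sided 95% critical value of Student's t, 13 df


def critical_r(n, t_crit):
    """Smallest |r| significant at the level implied by t_crit (df = n - 2)."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


def keep_pair(r_raw, r_std, n=15, t_crit=T_CRIT_DF13):
    """Keep a test pair only if both correlations exceed the critical value.

    t_crit must match n - 2 degrees of freedom; the default is for n = 15.
    """
    r_min = critical_r(n, t_crit)
    return abs(r_raw) > r_min and abs(r_std) > r_min


print(round(critical_r(15, T_CRIT_DF13), 3))  # 0.514
print(keep_pair(0.70, 0.65))                  # True
print(keep_pair(0.70, 0.40))                  # False
```

Under these assumptions, only correlations with an absolute value above roughly 0.51 can be significant, so the retained correlations are necessarily moderate to strong.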

Delayed Matching to Sample
For DMS, we found that results on the 12-second delay subtask from CANTAB correlated significantly with results from Visual Paired Associates I from the WMS-R (Table 5).

Discussion
The most important results from our correlation of CANTAB results with traditional neuropsychological tests are the following: Total correct scores on the 12-second delay in the CANTAB subtest Delayed Matching to Sample (DMS) correlated strongly with the Wechsler Memory Scale-R subtest Visual Paired Associates, condition 1. No other neuropsychological tests showed significant correlations with measures from the DMS, strongly indicating that this condition of the test is a unique test of visual matching to sample, which may be regarded as a function mainly depending upon the temporal lobes.
In the CANTAB subtest Stockings of Cambridge (SOC), subsequent thinking time on the most difficult task (5 moves) correlated with general measures of memory and intelligence. Moreover, correlations with non-verbal measures of memory and intelligence were significant, whereas correlations with verbal measures were non-significant. This indicates that this condition of SOC depends on both memory and reasoning, and mainly on non-verbal aspects of these functions. Correlations between results from the CANTAB subtest Paired Associate Learning (PAL) and traditional tests revealed a wide range of correlations with both memory and intelligence tests, but the most consistent correlations were with general measures of memory. This indicates that PAL is a complex test, demanding the use of global memory, visual memory, verbal memory and spatial skills. However, the strong correlations with memory tests may reflect a particular association between this test and temporal lobe functioning.
The two batteries showed a high degree of agreement in the classification of cognitive impairment. Twelve of 15 patients (80%) were similarly classified as impaired or not impaired by the two test batteries. We also found similar estimates of the incidence of cognitive dysfunction with the two test approaches, 47% vs. 53%. The criteria on which the classifications are based are considered quite strict for both batteries. For both batteries the classification is mainly based on memory tests (DMS and PAL from CANTAB and WMS-R from the traditional battery) and on tests of executive function and intelligence (SOC from CANTAB and WAIS-III from the traditional battery). Hence we claim that comparing classification and estimates of incidence from the two batteries is relevant.
When looking for correlations between PAL and traditional tests, several interesting and clinically logical features were revealed. All three results from PAL (Total errors, First trial memory score and Total trials) correlated with the Global memory index. In addition, both the Verbal and Visual memory indexes correlated with the PAL results. Somewhat surprisingly, there was a tendency towards stronger correlations with verbal than with non-verbal memory. This may indicate a need to verbalize to perform effectively on the PAL test. Patients frequently report that they connect names to the figures shown in this subtest. The First trial memory score indicates the immediate ability to store visual information. This condition correlated strongly with the Visual Paired Associates and more moderately, but still significantly, with the subtest Visual Span Backwards (VSB) from the WMS-R. As expected, this may indicate that this condition demands immediate recollection of visual patterns and associations. The more complex conditions of Total errors and Total trials showed a stronger association with general memory. Especially since correlations with IQ were generally weak, this underlines the dependency of these test conditions on temporal lobe function, also claimed by other authors [26,27]. Thus, the First trial memory score may be viewed as depending on passive storage, whereas the results of Total errors and Total trials depend more on active recollection. Hence it is not surprising that these results correlate with the more cognitively demanding Trail Making Test, part B (TMT-B) and also with visual IQ. In addition, both TMT-B and VSB are heavily dependent on spatial abilities. The correlation between subtests of PAL and these tests may thus indicate that spatial abilities are also important for performance on the PAL subtest. This is not unexpected, given the demand to remember spatial locations.
Only results from the most difficult task on DMS (total correct, 12-second delay) correlated with results from the traditional testing (Visual Paired Associates, condition I). Results on VPAI are based on the ability to remember a match between patterns and colours. This very strong correlation confirms that DMS is mainly a test of visual matching to sample. This aspect of memory depends heavily on forced-choice decision, and may be dependent on frontal as well as temporal lobe function. Even though both DMS and PAL assess memory function, our results indicate that DMS assesses mainly visual memory while PAL assesses memory in general, including visual and verbal memory. PAL may reflect free recall aspects of memory known to be more dependent on temporal lobe function, whereas the forced-choice format of the DMS may lead to larger involvement of the frontal lobes. However, the problem of localization of memory in the brain is not yet completely resolved, and there is a degree of uncertainty in these interpretations. In addition, our results support the view that performance on PAL depends on inputs from several cognitive domains, whereas DMS is a much more specific test of visual matching to sample.
SOC is regarded as a test of executive function. We compared SOC with tests from our standard clinical test battery that are thought to be tests of executive functions, i.e. the Category Test and TMT-B. Somewhat surprisingly, we found no significant correlations between SOC and these tests. However, the definitions of executive functions are rather vague when it comes down to specific tests, and the general concept of executive functions probably contains a wide variety of cognitive functions [28]. Thus, a high degree of correlation between different measures of this construct may not be expected. SOC may be a specific test of strategic planning and execution of such plans, whereas the Category Test and TMT-B test other aspects of executive function. Results reported from mean subsequent thinking time, 5 moves, from SOC correlated with several memory scores (Visual reproduction 1 and 2, Visual memory index and Global memory index). Subsequent thinking time is the time used after the initial move has been made. If the patient makes a wrong move, he or she can reset the target stimuli to start over again. To do this resetting quickly and effectively, the patient needs to remember the last position or the initial position of the target. This may explain why memory seems an important asset for SOC even though SOC is a test of executive function. Not surprisingly, results from SOC also correlated with IQ (Full scale IQ and Performance IQ). IQ is claimed to be an important factor in executive functioning. This result supports the view that SOC tests executive domains in our population.
One problem with our study is the small sample of patients tested. This makes the statistical power low, and there may be correlations of clinical importance that are lost because of this. However, the highest correlations found in particular may be relatively robust, and probably reflect important common variance between the tests. Our study population was not explicitly selected to be representative of Norwegian epilepsy patients. Hence the level of cognitive performance we found cannot be generalised to describe epilepsy cohorts in general.

Conclusion
Correlations found between the Cambridge Neuropsychological Test Automated Battery and traditional neuropsychological test batteries commonly used in Norwegian epilepsy patients support the criterion validity of CANTAB. Cognitive domains previously documented to be assessed by CANTAB in other populations have been shown to be targeted also in our Norwegian sample of patients with epilepsy. This strengthens the view that CANTAB is applicable for assessment of Norwegian patients in the same manner as for English-speaking patients.

Table 1. Systematic overview of results reported from both CANTAB and the traditional battery, with abbreviations (abb).
Fifteen patients were included and tested with both traditional neuropsychological tests and CANTAB. Mean age was 34.1 years (range 16-62). Eleven females and four males were tested. Table 2 describes the epileptological data in the patient group. No particular effort was made to achieve a representative selection of Norwegian epilepsy patients. However, patients were included consecutively, without any selection other than availability and willingness to participate.

Table 4. (a) Mean values and SDs in the epilepsy group on each CANTAB test. P-values are from a one-sample t-test with test value 0, indicating a different level of performance in the epilepsy group compared to the integrated normal reference population in CANTAB; (b) Mean values and SDs in the epilepsy group on reported measures from the traditional battery.
* p < 0.05.