Conservatism in Materiality Judgements: The Effect of Auditor’s Sex and Culture¹
1. Introduction
The concept of materiality permeates the audit process. International Standard on Auditing (ISA) 320 (International Auditing and Assurance Standards Board (IAASB), 2009) states that the auditor should apply the concept of materiality in planning and performing the audit, in evaluating the effect of individual misstatements on the audit and of uncorrected misstatements, if any, on the financial statements, and in forming the opinion in the auditor’s report.
Good materiality judgements are therefore crucial for conducting an efficient, high-quality audit. The lower the materiality, the greater the scope of the audit. If materiality is too high, the auditor might not collect enough evidence and certain abnormalities might go undetected, increasing the risk of expressing an unqualified opinion where a modification is reasonably justified (false positive). If, however, every error, however small, had to be found, the auditor would engage in extensive audit procedures that are no longer justified from a cost-benefit perspective (reasonable assurance), increasing the risk of qualifying financial statements that still give a true and fair view (false negative).
Despite its importance, materiality is still regarded as a vague concept (Azzopardi & Baldacchino, 2009), and auditors face uncertainty about material misstatements (Knechel, Krishnan, Peyzner, & Velury, 2013). Regulatory authorities and professional bodies have been quite cautious in publishing guidelines or rules of thumb regarding materiality (Chewning & Higgs, 2002), as such guidance might lead auditors to rely simply on quantitative measures, without careful consideration of qualitative factors. Materiality is therefore not a purely objective concept; as Knechel et al. (2013) argue, “materiality assessments require complex, subjective judgments and estimates, opening the door to errors and biases.”
While it is generally accepted that the auditor’s determination of the materiality threshold is a matter of professional judgment and thus inherently subjective, the literature on materiality decisions overlooks the effect of auditors’ personal characteristics on these decisions. The extant research on materiality judgements mainly focuses on the methods and measures used to calculate materiality and reveals that auditors do not treat materiality uniformly, with large discrepancies between thresholds applied in practice (Holstrum & Messier, 1982; Iselin & Iskandar, 1999; Chewning & Higgs, 2002; Messier, Martinov-Bennie, & Eilifsen, 2005; Azzopardi & Baldacchino, 2009; Vance, 2011; Azzali, Mazza, Fornaciari, & Trinchera, 2018).
Variations in materiality judgments may be due to the absence of clear materiality guidelines or to a number of other factors, including the auditor’s personality and contextual differences. The relevance of the auditor’s personality to materiality judgements is in line with recent research arguing that differences across individual auditors can influence audit quality (DeFond & Francis, 2005; Church, Davis, & McCracken, 2008; Francis, 2011) and with studies providing evidence that audit outcomes can be influenced by individual auditor characteristics (Chi, Huang, Liao, & Xie, 2009; Gul, Wu, & Yang, 2013). Knechel, Vanstraelen and Zerni (2015) report that aggressive and conservative auditor reporting is a systematic audit partner attribute and is not randomly distributed across engagements. Messier, Owhoso and Rakovski (2008) also suggest that personal attributes such as risk tolerance and overconfidence might lead audit partners to adopt different reporting styles.
In this paper, we focus on the sex and the socio-cultural background of the individual auditor as differences in both sex and culture are likely to be individual auditor attributes that affect materiality judgements. As noted in Birnberg’s (2011) framework on Behavioral Accounting Research (BAR) “gender-related issues such as risk taking” could be important in BAR and “the potential role of national cultures is becoming more important as BAR internationalizes”.
Analyzing the final written ability exam of 160 future Belgian auditors, we find evidence that the sex and socio-cultural background of auditors affect whether their materiality judgements are conservative or aggressive.
Our study contributes to the growing literature examining the effect of individual auditor attributes on audit outcomes. Further, if personal characteristics that affect materiality judgements can be identified, audit firms can take action (e.g., mixed audit teams) to lessen or compensate for the differences in judgement among auditors.
The rest of this paper proceeds as follows. Section 2 provides background by reviewing the relevant research literature and presents our research hypotheses. Section 3 describes our data and research method, and Section 4 reports and discusses our results. Section 5 concludes and presents limitations and implications of the study.
2. Literature Review and Hypothesis Development
2.1. Sex and Culture Matter
A substantial body of risk research (original articles and meta-analyses) from outside accounting indicates that women and men differ in their perceptions of risk (Gustafson, 1998; Byrnes, Miller, & Schafer, 1999; Charness & Gneezy, 2012). The consistent finding in these studies is that women perceive risks as higher than men do and are also less willing to take on risk (more risk-averse) than men. Women also seem to perceive different risks (Gustafson, 1998).
Likewise, management research provides longstanding evidence that women executives are more conservative when it comes to risk (Muldrow & Bayton, 1979; Huang & Kisgen, 2013; Baixauli-Soler, Belda-Ruiz, & Sanchez-Marin, 2015). Firm risk is lower when the CEO is female (Khan & Vieto, 2013), and firms run by female CEOs tend to have lower leverage and less volatile earnings (Faccio, Marchica, & Mura, 2016).
Recent research also suggests that men and women react differently in an auditing context. The impact of sex differences on audit judgments is confirmed in laboratory settings (Chung & Monroe, 2001; Gold, Hunton, & Gomaa, 2009) and in archival studies: Chin and Chi (2008), for the Taiwanese market, and Hardies, Breesch and Branson (2016), for the Belgian market, found evidence that female auditors were more likely to issue a going-concern opinion (GCO). Hardies et al. (2016) also found that the effect of client risk on the likelihood that an auditor issues a GCO is larger for female than for male auditors.
To our knowledge, only the dated study of Estes and Reames (1988) tested the effect of gender on materiality decisions. Using a survey-based case study, they concluded that male auditors tend to increase their materiality threshold, although the results were not significant.
As the above discussion suggests, female auditors might be more conservative than their male counterparts in setting materiality thresholds, leading to our first hypothesis:
HYPOTHESIS 1: Female auditors set, ceteris paribus, lower materiality thresholds than male auditors.
In their literature review of cross-cultural differences in auditors’ judgment and decision making (including risk, confidence and probability judgements) Nolder and Riley (2014) recommend extra research “to respond to both the gap in the extant literature and the changing multicultural environment of audit firms”.
Most studies investigating whether cultural differences affect judgement have used the framework of Hofstede (1980) (Birnberg, 2011). In his latest version, Hofstede (2011) defines culture as “the collective programming of the mind that distinguishes the members of one group or category of people from others”. One of the six dimensions of national culture described in the Hofstede model is Uncertainty Avoidance, which deals with the degree to which the members of a society feel uncomfortable with uncertainty and ambiguity.
Uncertainty Avoidance Index scores tend to be higher in East and Central European countries, in Latin countries (e.g. Belgium 94 and France 86), in Japan (92) and in German-speaking countries, and lower in English-speaking (e.g. USA 46 and UK 35), Nordic (e.g. The Netherlands 23 and Denmark 23) and Chinese-culture countries (Hofstede, Hofstede, & Minkov, 2010).
Although Uncertainty Avoidance is not identical to risk avoidance, a number of recent studies indicate that the national level of Uncertainty Avoidance is negatively associated with corporate risk-taking behavior, for example with regard to innovative projects (Li, Griffin, Yue, & Zhao, 2013; Li & Zahra, 2012).
Analogously, we could assume that auditors low in uncertainty avoidance are comfortable with the uncertainty inherent in materiality judgements, for which there are no guidelines to fall back on. By contrast, auditors high in uncertainty avoidance might feel anxious in the presence of uncertainty and ambiguity and might lower their materiality threshold.
The results of Arnold, Bernardi and Neidermeyer (2001) lead, however, to the opposite hypothesis, as they conclude that higher Uncertainty Avoidance societies exhibit higher materiality levels. Hofstede’s cultural dimension of Uncertainty Avoidance should possibly be interpreted differently. Instead of assuming that auditors would decrease their materiality threshold in order to detect more abnormalities and so avoid uncertainty, one could also argue “that auditors would expand materiality estimates so that any remaining errors are not material (and therefore not errors)”, thereby creating a level of pseudo-certainty (Arnold et al., 2001).
2.2. Belgium, the Country of Many Differences
“On a map, Belgium looks like one country but is at least two, and arguably three, divided by language, wealth and politics.” (Financial Times, Nov. 3rd 2015).
“Expats may face some culture shock in Belgium, especially when they first arrive. Most notably, there are three main languages and many different cultures all wrapped up in one fairly small country”… “The cultural and linguistic differences can be striking if one travels north into the Flemish areas or south into [the French-speaking region of] Wallonia. The buildings are different, the people are different, and the two communities generally have different traits, so it can sometimes feel like a country divided in half.”
(http://www.expatarrivals.com/europe/belgium/moving-belgium)
While Belgium officially only came into existence in 1830, these differences can be traced back to Roman times, when a battle for influence raged between the Franks (Germans) and the Romans over Gaul (the area Belgium was originally part of). When the Franks took over most of Gaul, the area now known as Wallonia was already steeped in the Roman language that would later evolve into French. In Belgium’s history, the year 1815 was key. Following Napoleon’s defeat at Waterloo, the United Kingdom of the Netherlands was created, and it was decided that territories that were once part of France should now be attached to it. In the years that followed, the linguistic division between the Walloons, whose language is French, and the Flemish, whose mother tongue is Dutch, was one of the main reasons for unrest in the southern provinces of the United Kingdom of the Netherlands, which finally led to the Belgian Revolution in 1830 and resulted in the independence of Belgium (https://theculturetrip.com/europe/belgium/articles/belgium-a-brief-history-of-how-it-all-began/).
The differences between Flemings and Walloons persist today and are apparent in almost all fundamental social, economic and political functions (Dewachter, 2008). Among other things, Flemings tend to have a greater entrepreneurial nature, reflected in higher rates of self-employment and new firm creation than among Walloons (Sels et al., 2009, 2010). Given that entrepreneurs are perceived as more risk-prone than other people (Macko & Tyszka, 2009), this could suggest that Flemings are more risk-taking than Walloons.
The above assumption is in line with Hofstede’s model, which notes that Germanic countries (such as The Netherlands, to which Flanders is closest) are lower Uncertainty Avoidance societies than Latin countries (such as France, to which Wallonia is closest) (see above). Recent research suggests that the cultural and linguistic differences in Belgium might also matter in an auditing context. Hardies et al. (2016) found that Belgian auditors with a French-speaking affiliation were more likely to issue a GCO.
This suggests that auditors with a French-speaking affiliation might also be more conservative than auditors with a Dutch-speaking affiliation in setting their materiality thresholds, leading to our second hypothesis:
HYPOTHESIS 2: Auditors with a French-speaking affiliation set, ceteris paribus, lower materiality thresholds than auditors with a Dutch-speaking affiliation.
To test our hypotheses, we analyze the final written ability exams of future Belgian auditors, which include a materiality evaluation task. In this laboratory setting we study differences in the level of materiality judgements between female and male auditor trainees. The socio-cultural background of Belgian trainees is captured by their language. The next section describes the laboratory setting and regression model.
3. Data and Research Method
3.1. Participants and Sample Selection
The hypotheses are tested empirically in a semi-laboratory context by analyzing the final written ability exam of future Belgian auditors. To be approved as an auditor in Belgium, auditor trainees must first pass a test of theoretical knowledge and then complete a minimum of three years of practical training. At the end of the training period, the auditor trainee’s practical and theoretical knowledge is tested in an ability exam. This exam is organized by the Professional Body of Belgian Auditors (IBR) once each semester (May/June and November/December). It consists of a written and an oral part.
The written exam is traditionally prepared by an experienced auditor. It includes a practical case (identical within an exam session but different between sessions). The trainees receive some background about a company and a set of specific situations and/or problems. They also receive the financial statements of the company. The exam is open book and is taken on a computer, allowing the trainees to consult all accounting or legal texts. The exam time is approximately 7 hours. Using all the available data, the trainees must prepare an audit memorandum summarizing the main audit findings. They are also explicitly asked to set a materiality threshold. Finally, they must draft an audit report including an audit opinion.
The correction of the written exam and the evaluation of the oral exam are done by a number of exam juries, each consisting of five members: a professor (chairman), three auditors and a person representing the public interest. To guarantee a consistent approach by the different exam juries, a model solution of the written exam, including the “optimal” materiality level, is prepared by the author of the exam.
To pass, trainees must achieve an acceptable overall score; that is, candidates with a poor score on the written exam can still pass with an outstanding performance on the oral exam, and vice versa. Trainees are allowed a maximum of five attempts to pass the final ability exam.
We gained confidential access to the written exam papers of the first and second sessions of 2007, 2009, 2010 and 2012 (i.e., 8 exam sessions). The oral exams are not public and could therefore not be part of our research design. In total, 392 trainees took part in the 8 exam sessions. Only trainees who passed the overall exam (written and oral) were retained in our experiment. In total, 171 trainees (44%) passed the overall exam (Table 1).
Given that the concept of materiality permeates the audit process and to a large extent defines the scope of the auditor’s responsibility, it is remarkable that 11 of the 171 cases (6%)² are missing values, meaning that 11 trainees did not include a materiality level in their written exam. Our analysis therefore uses the remaining 160 valid cases.
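The sample derivation above can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the variable names are illustrative.

```python
# Counts taken from the text: 392 candidates, 171 passed, 11 gave no threshold.
candidates = 392
passed = 171
missing_materiality = 11

pass_rate = passed / candidates                # share of candidates who passed
missing_rate = missing_materiality / passed    # share of passers without a threshold
valid_cases = passed - missing_materiality     # cases used in the analysis

print(f"pass rate: {pass_rate:.1%}")
print(f"missing thresholds: {missing_rate:.1%}")
print(f"valid cases: {valid_cases}")
```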
3.2. Bias
A laboratory setting based on a case study is an imperfect mirror of a real-world situation. An exam setting in particular may include more problems than would normally occur in an actual case, as it aims to test the trainees’ knowledge and understanding as thoroughly as possible. Trainees will be more focused on looking for and spotting crucial problems. In an exam setting, trainees are not subject to external pressure, but the time available for processing information is limited. In the particular case of an audit exam, no independence or budget issues arise. The trainees are also not able to ask for further explanations. Research also suggests that incremental levels of accountability for materiality judgements (e.g., justification, review, feedback) increase judgement conservatism (DeZoort, Harrison, & Taylor, 2006). Considering all these observations, it is to be expected that the trainees will be more risk-averse and will use a lower level of materiality in their audit report than in real audit engagements.
While trainees are typically not the ones making materiality decisions in practice, the trainees included in this study nevertheless have a reasonable degree of experience. As they passed their ability exam, they became certified auditors within only a couple of months of the exam. As such, they are allowed to undertake audit engagements completely independently and to sign the audit report. Moreover, Knechel et al. (2015) provide evidence not only that aggressive or conservative reporting varies systematically across individual auditors but also that it persists over time.
Table 1. Derivation of sample and materiality thresholds per exam session.
3.3. Research Method
We test our two hypotheses by estimating regression model (1).
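RMD_{i,x} = \beta_0 + \beta_1 SEX_i + \beta_2 LANG_i + \beta_3 ATTEMPT_i + \beta_4 BIG4_i + \varepsilon_i    (1)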
To identify the level of conservatism in materiality judgements we calculate the Relative Median Deviation (RMD) which tells us how much the materiality level calculated by each trainee (i) within exam session (x) differs from the median materiality level observed in exam session (x).
This relative measure allows us to compare levels of variance not only within but also between exam sessions. The lower the Relative Median Deviation, the lower the calculated materiality threshold.
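As an illustration of the measure described above, the following minimal sketch computes the Relative Median Deviation for one exam session as (individual materiality minus session median) divided by the session median. The function name and the threshold values are hypothetical and are not data from the study.

```python
from statistics import median


def relative_median_deviation(thresholds):
    """Return (m_i - session median) / session median for each threshold."""
    session_median = median(thresholds)
    return [(m - session_median) / session_median for m in thresholds]


# Hypothetical materiality thresholds (in euros) for one exam session.
session_thresholds = [50_000, 80_000, 100_000, 150_000, 400_000]
rmd = relative_median_deviation(session_thresholds)

for threshold, value in zip(session_thresholds, rmd):
    print(f"{threshold:>8} -> RMD {value:+.2f}")
```

A negative value marks a trainee who set a threshold below the session median (more conservative), and a positive value one who set it above the median.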
SEX is our first test variable and takes a value of 1 for a female trainee and 0 for a male trainee. As discussed in the literature review, we capture the socio-cultural background of Belgian trainees by their language (LANG). LANG is our second test variable and takes a value of 1 for a Dutch-speaking trainee and 0 for a French-speaking trainee.
ATTEMPT signifies the number of times trainees have taken the final written ability exam and varies between 1 and 5. Studies have found that students (and female students in particular) tend to become more prudent if they must retake an exam repeatedly (e.g., Cipriani, 2018), so we expect a negative coefficient for this variable.
To counter concerns that the results for the test variables SEX and LANG might be affected by firm-specific characteristics we also include a control variable BIG4, which is a dummy variable that takes a value of 1 for trainees working for a Big4 audit firm and 0 for trainees working for a non-Big4 audit firm. In line with the traditional theory of DeAngelo (1981) Big4 auditors tend to be more conservative in their opinions (e.g., Cano-Rodriguez, 2010; Krishnan and Krishnan, 1996) and tend to assess materiality at a lower level than non-Big4 auditors (Blokdijk, Drieenhuizen, Simunic, & Stein, 2003). We expect a negative coefficient for this variable.
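To make the structure of the model concrete, the sketch below fits a linear regression of RMD on SEX, LANG, ATTEMPT and BIG4 by ordinary least squares. Both the estimator and the data are assumptions made for illustration: the text does not state the estimation method (the pseudo R2 reported later suggests it may not be plain OLS), and the trainee records and coefficients are invented so that the fit can be checked against known values.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(xi[p] * xi[q] for xi in x) for q in range(k)] for p in range(k)]
    xty = [sum(xi[p] * yi for xi, yi in zip(x, y)) for p in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical trainees as (SEX, LANG, ATTEMPT, BIG4); not data from the study.
trainees = [
    (1, 0, 1, 1), (0, 1, 2, 0), (1, 1, 1, 0), (0, 0, 3, 1),
    (1, 0, 2, 0), (0, 1, 1, 1), (1, 1, 4, 1), (0, 0, 1, 0),
]
# Invented coefficients (intercept first) used to generate noise-free RMD values.
true_beta = [0.40, -0.20, 0.15, -0.05, -0.10]
rmd = [true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], t)) for t in trainees]

beta_hat = ols(trainees, rmd)
for name, value in zip(["const", "SEX", "LANG", "ATTEMPT", "BIG4"], beta_hat):
    print(f"{name:>8}: {value:+.3f}")
```

Under the coding in the text, a negative SEX coefficient corresponds to lower thresholds for female trainees and a positive LANG coefficient to lower thresholds for French-speaking trainees.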
4. Results
4.1. Overall Statistics on Materiality Thresholds
Table 1 summarizes the overall statistics about the materiality thresholds formulated by the different trainees for each exam session.
In line with the extant research on materiality judgements, which suggests that auditors do not treat materiality uniformly and that thresholds applied in practice differ widely, materiality levels within the same exam session differ substantially. Table 1 shows that the largest materiality threshold (max) was between 4 and 90 times larger than the smallest materiality threshold (min), with an average max/min ratio of 28.
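The dispersion statistics discussed above (minimum, maximum and their ratio per exam session) can be computed as in the following sketch. The session labels and threshold values are hypothetical and were chosen only to show the calculation; they are not the figures from Table 1.

```python
from statistics import median

# Hypothetical thresholds per exam session (in euros); not data from Table 1.
sessions = {
    "session A": [25_000, 60_000, 100_000],
    "session B": [10_000, 120_000, 900_000],
}


def spread(thresholds):
    """Return min, median, max and the max/min ratio for one session."""
    lowest, highest = min(thresholds), max(thresholds)
    return {
        "min": lowest,
        "median": median(thresholds),
        "max": highest,
        "ratio": highest / lowest,
    }


summary = {name: spread(values) for name, values in sessions.items()}
for name, stats in summary.items():
    print(name, stats)
```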
Table 2 shows that profit, shareholders’ equity, turnover and total assets are (in descending order) the most important financial variables used in calculating the quantitative rules of thumb. Some trainees considered more than one financial variable and either averaged the resulting amounts or, after reasoning, retained only one. In 11 cases the written exam did not describe the variable used.
In line with the literature (among others, Chewning & Higgs, 2002; Eilifsen & Messier, 2014), the most commonly used financial variable was “profit” (103 of 160 trainees, or 64%). Most trainees used Earnings Before Taxes (EBT)³, and the most commonly used percentages of profit were 5% to 10%, which is in line with the results of Blokdijk, Drieenhuizen, Simunic and Stein (2003).
The rules of thumb used by the trainees (median) still seem consistent with the original work of Leslie (1985):
· 5.0% of the profit before taxes;
· 0.5% of turnover;
· 1.0% of the shareholders’ equity (at book value);
· 0.5% of the total assets.
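As a worked illustration of the percentages listed above, the sketch below applies each rule of thumb to a set of company figures. The percentages come from the text; the company figures and all names are hypothetical and are not taken from any exam case.

```python
# Median rules of thumb reported in the text, as a share of each base.
RULES_OF_THUMB = {
    "profit_before_taxes": 0.05,
    "turnover": 0.005,
    "shareholders_equity": 0.01,
    "total_assets": 0.005,
}


def candidate_thresholds(financials):
    """Apply each rule of thumb to the matching financial variable."""
    return {base: rate * financials[base] for base, rate in RULES_OF_THUMB.items()}


# Hypothetical company figures in euros.
company = {
    "profit_before_taxes": 2_000_000,
    "turnover": 30_000_000,
    "shareholders_equity": 12_000_000,
    "total_assets": 25_000_000,
}

thresholds = candidate_thresholds(company)
for base, amount in thresholds.items():
    print(f"{base:>22}: {amount:,.0f}")
```

Even with identical percentages, the choice of base yields different starting amounts, which the auditor then adjusts for qualitative factors.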
The lack of consistency in calculating materiality thresholds thus seems not to be driven by the diversity in quantitative rules but is rather due to qualitative aspects and specific circumstances as shown in Table 3.
In line with Paape and Van Buuren (2011), the general economic situation of the company (mainly going-concern issues) in particular had a significant negative effect on materiality. The size of the adjustment made by trainees varied enormously, however, ranging from no adjustment at all to halving the originally calculated materiality threshold, with one trainee even making a downward adjustment of 80%.
Table 2. Summary of quantitative factors used as rules of thumb for materiality thresholds.
Table 3. Summary of qualitative factors cited by trainees in determining materiality thresholds.
Note: The same qualitative factors were not incorporated at the same level in all exam sessions, and some trainees cited more than one qualitative factor. The results should therefore be interpreted with care.
The same range of adjustments was noted for an increase in the complexity of the audited entity. A low degree of complexity did not, however, necessarily increase the materiality level. A similar effect was noted regarding the strength of the company’s internal control. While weak internal control encouraged trainees to adjust materiality downward, strong internal control was no reason for increasing the materiality level.
While all companies to be analyzed were for-profit, the activity of the company was still noted to be an important element, especially in choosing the type of financial variable used as a starting point for calculating materiality, i.e. profit.
In six of the eight exam sessions fraud was mentioned, albeit to different degrees, ranging from a closed incident to serious fraudulent practices that were still ongoing. The qualitative factor fraud was, however, cited by only 11 trainees. Materiality was then adjusted downward to extend control procedures and increase the chance of discovering any other fraud. The limited number of references to fraud is somewhat remarkable, as fraud is listed as the most important qualitative factor determining materiality (Eilifsen & Messier, 2014; Securities and Exchange Commission (SEC), 1999). This might confirm the study by Knapp and Knapp (2001), which states that less experienced auditors (in this case trainees) do not yet have enough experience and the right knowledge to correctly assess the risk of fraud.
4.2. Sex and Socio-Cultural Background
As the calculated materiality levels seem to differ substantially across auditors looking at the same information, and this variation does not seem to be driven merely by the diversity in quantitative rules, we can assume that the individual characteristics of the trainees also affect materiality judgements. Trainees might interpret the qualitative factors and specific circumstances noted in the respective case studies differently, according to their own risk perception and risk aversion.
Variable definitions: Relative Median Deviation (RMD) is defined as the deviation between the individual calculated materiality and the median materiality divided by the median materiality. SEX is a dummy variable with a value of one in case of a female trainee. LANG is a dummy variable with a value of one in case of a Dutch-speaking trainee. BIG4 is a dummy variable with a value of one in case the trainee works for a Big4 audit firm. ATTEMPT is the number of times trainees have taken the final ability exam measured on an ordinal scale (range 1 - 5).
Statistical significance based on two-tailed tests at the 1%, 5%, and 10% levels is denoted by ***, **, and * respectively. T-tests were used for comparing means of the dependent variable RMD between the dichotomous variables SEX, LANG and BIG4. One-way analysis of variance (ANOVA) was used to test for differences in the means of the dependent variable RMD broken down by the number of ATTEMPTs.
The univariate results in Table 4 show rather large differences both between female and male trainees and between French- and Dutch-speaking trainees. The Relative Median Deviation is significantly lower for female (p = 0.08) and French-speaking (p = 0.03) trainees. For trainees working for a Big4 versus a non-Big4 audit firm, and for the number of attempts, we observe no significant difference.
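A comparison of group means of the kind reported above can be sketched as follows. The text does not say which variant of the t-test was used, so the choice of Welch's unequal-variance version here is an assumption, and the RMD values are hypothetical rather than data from the study. The sketch stops at the test statistic and degrees of freedom; obtaining a p-value would additionally require the t distribution.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of the two group means.
    va, vb = variance(sample_a) / na, variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical RMD values for two groups of trainees.
rmd_female = [-0.30, -0.10, 0.00, 0.20, -0.20]
rmd_male = [0.10, 0.40, -0.10, 0.60, 0.30]

t_stat, dof = welch_t(rmd_female, rmd_male)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```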
Table 4. Univariate Results (dependent variable = RMD) (n = 160).
Variable definitions: Relative Median Deviation (RMD) is defined as the deviation between the individual calculated materiality and the median materiality divided by the median materiality. SEX is a dummy variable with a value of one in case of a female trainee. LANG is a dummy variable with a value of one in case of a Dutch-speaking trainee. BIG4 is a dummy variable with a value of one in case the trainee works for a Big4 audit firm. ATTEMPT is the number of times trainees have taken the final ability exam measured on an ordinal scale (range 1 - 5).
Statistical significance based on two-tailed tests at the 1%, 5%, and 10% levels is denoted by ***, **, and * respectively. The pseudo R2 (0.013) is very low, but models that attempt to predict human behavior typically have low R2 values.
The univariate results are confirmed by the results of the multivariate regression analyses (Table 5). Both test variables, SEX and LANG, are statistically significant and have the predicted sign. The control variable BIG4 is not significant, indicating that, contrary to previous research, Big4 auditors are not more conservative in assessing materiality. No significant difference emerged from the number of times trainees have taken the exam (ATTEMPT).
The results indicate that female trainees set lower materiality thresholds than male trainees. In addition, French-speaking trainees set lower materiality thresholds than Dutch-speaking trainees. Hypotheses 1 and 2 are thus supported. Female auditors on the one hand and auditors with a French-speaking affiliation on the other hand tend to be more conservative in setting materiality thresholds than their male and Dutch-speaking counterparts.
5. Conclusion, Limitations and Implications
5.1. Conclusion
In this paper, we examined whether the individual auditor characteristics of sex and socio-cultural background make materiality judgements more conservative or more aggressive. Analyzing the final written ability exam of 160 Belgian trainees, our results demonstrate that considerable variance exists in materiality judgments across auditors, mainly caused by the subjective interpretation of qualitative factors or specific circumstances. Our results indicate that female auditors on the one hand and French-speaking auditors on the other hand set lower materiality thresholds than their male and Dutch-speaking counterparts.
Table 5. Multivariate regression results (dependent variable = RMD) (n = 160).
5.2. Limitations
The results of this paper should be interpreted with some caution due to possible limitations. First, we need to consider the bias resulting from a laboratory setting. Differences can occur in materiality levels between auditors in an exam setting and auditors in a real-world setting in which actual audit judgments are made. Also, our research used auditor trainees with on average 5 years of audit experience, while in real-world settings the auditor or engagement partner will on average be more experienced. Second, as only a limited variation is explained by our model, we should be aware that our results could be driven by omitted variables.
5.3. Implications
Notwithstanding these limitations, the reported findings should be of interest to practitioners and regulators. In line with theory suggesting that management diversity can benefit organizations by allowing the use of more diverse human skill sets (Opstrum & Villadsen, 2014), diversity in audit (management) teams could also be considered in order to lessen or compensate for the differences in judgement among auditors. The Big4 audit firms and The Professional Body of Auditors in The Netherlands (NBA) recently stated in their discussion paper on the Dutch audit profession that a diverse composition of the audit team in all stages of the audit process contributes to quality (Dinkgreve et al., 2017).
NOTES
¹ The author would like to thank the workshop participants at the 42nd Annual Congress of the European Accounting Association (2019) and the 10th European Auditing Research Network (EARNet) Symposium (2019). I gratefully acknowledge the assistance of the Professional Body of Belgian Auditors (IBR), Wout Verhaeren and Marie-Laure Vandenhaute in collecting data.
² Eight of the 11 missing values relate to exam session 4, in which the materiality threshold was not explicitly asked for.
³ Other profit definitions used were Earnings Before Interest, Taxes, Depreciation and Amortization (EBITDA), Earnings Before Interest and Taxes (EBIT), Earnings After Taxes (EAT) or Net Income (NI).