Reasoning with the THOG Problem: A Forty-Year Retrospective
M. D. Valiña, M. Martín

The ability to derive new information from existing premises is the essence of human reasoning. This paper focuses on one of the most important experimental tasks used to study how people make inferences: the THOG problem (Wason, 1977, 1978; Wason & Brooks, 1979). It is a hypothetico-deductive reasoning problem in which subjects must formulate and test hypotheses based on their comprehension of an exclusive disjunctive statement. Research on this task has shown that it is difficult to solve and that few people reach the logically correct answer. This paper presents some of the main theoretical explanations of people’s inferences with this task. The general approaches include the Dual Process and Hypothetical Thinking Theories and the Mental Models Theory. More specific proposals, such as the Confusion Theory or Non-Consequential Thinking, have focused on analysing the mechanisms underlying the cognitive biases. Moreover, a review of the empirical investigations on this meta-inference task is presented. Finally, some research on the THOG problem that provides important new clues on broader topics in the study of human reasoning is analysed.


Introduction
Reasoning is one of the cognitive processes in which psychologists studying thought have shown interest. Knowing what type of mechanisms subjects use to elaborate a conclusion or studying which factors modulate inferences are some of the fundamental axes of empirical investigation on human inference.
Peter Wason, deemed one of the pioneers of modern Psychology of Reasoning, set forth a series of experimental tasks to study how problems that require planning, hypothesis proposal and consequence inference are solved (see, for example, Newstead, 2003, or Manktelow, 2021, for an excellent biography of P.C. Wason).
This work centres on one of these tasks: the THOG problem (Wason, 1977, 1978; Wason & Brooks, 1979). Firstly, it will be explained what this task consists of and which responses people most commonly give. Secondly, some of the main lines of investigation on the THOG problem will be analysed. Then, the main theoretical explanations of reasoning with this task will be set out. This work will also analyse how the study of reasoning has advanced through the empirical investigation of the THOG. The present retrospective covers research from the 1970s to the current decade.
The THOG is a hypothetico-deductive experimental task in which participants must reason according to an exclusive disjunctive rule, creating and testing hypotheses to reach the correct answer. The original version of the problem (Wason, 1977) is as follows: In front of you are 4 designs: black diamond, white diamond, black circle and white circle (see Figure 1). You are to assume that I have written down one of the colours (black or white) and one of the shapes (diamond or circle). Now read the following rule carefully: "If, and only if, any of the designs includes either the colour I have written down, or the shape I have written down, but not both, then it is called a THOG". I will tell you that the black diamond is a THOG. Each of the designs can now be classified into one of the following categories: a) definitely is a THOG, b) insufficient information to decide, c) definitely is not a THOG. As had happened a decade before with another meta-inference task created by Peter Wason, the Selection Task (Wason, 1966), fewer than a third of the participants who tried to solve the THOG got the right answer, which was "the white circle is a THOG and the other two designs are not". To reach this response, participants have to start with the THOG example in the task statement (the black diamond is a THOG) and, from that, create hypotheses on what the experimenter has written, taking the stated disjunction into account. The hypotheses are: 1) the experimenter has written white diamond and 2) the experimenter has written black circle. Each hypothesis is then compared with the task designs. If the experimenter has written "white diamond", the only other design that can be a THOG is the white circle, since it is the only one that shares exactly one characteristic (the colour) with the written combination. If the experimenter has written "black circle", the white circle is again a THOG, because it shares only the shape with the written combination.
Therefore, both hypotheses allow us to conclude that apart from the black diamond, "the white circle is also a THOG and the other two designs are definitely not".
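The hypothesis-testing procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch, not a model taken from the literature: the names `is_thog` and `classify` and the tuple encoding of the designs are assumptions introduced here. It enumerates every colour-shape combination the experimenter could have written, keeps those under which the black diamond is a THOG, and classifies each design according to whether all surviving hypotheses agree.

```python
from itertools import product

COLOURS = ("black", "white")
SHAPES = ("diamond", "circle")
DESIGNS = list(product(COLOURS, SHAPES))  # the four designs, as (colour, shape)


def is_thog(design, written):
    """Exclusive disjunction: exactly one of colour or shape matches what was written."""
    colour_match = design[0] == written[0]
    shape_match = design[1] == written[1]
    return colour_match != shape_match


def classify(known_thog):
    """Classify every design, given one design known to be a THOG (hypothetical helper)."""
    # Keep only the written-down combinations compatible with the known example.
    hypotheses = [w for w in DESIGNS if is_thog(known_thog, w)]
    verdicts = {}
    for design in DESIGNS:
        outcomes = {is_thog(design, w) for w in hypotheses}
        if outcomes == {True}:
            verdicts[design] = "definitely is a THOG"
        elif outcomes == {False}:
            verdicts[design] = "definitely is not a THOG"
        else:
            verdicts[design] = "insufficient information to decide"
    return hypotheses, verdicts


hypotheses, verdicts = classify(("black", "diamond"))
for design, verdict in verdicts.items():
    print(" ".join(design), "->", verdict)
```

Under this encoding the two surviving hypotheses are "black circle" and "white diamond", and both classify the white circle as a THOG, which is why no design ends up in the "insufficient information" category.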
The most common errors are the "intuitive errors" of "Type A" and "Type B" (Griggs & Newstead, 1983). The "Type A" intuitive error mirrors the correct response and leads to the answer "the white circle is not a THOG and the other two designs are THOGs". The "Type B" intuitive error occurs when subjects say: "the white circle is not a THOG and there is insufficient information to decide about the other two designs". Why are these answers repeatedly given? According to Wason & Brooks (1979), these errors stem from the participants focusing on the task designs rather than on the hypotheses that must be elaborated based on what the experimenter has written. Therefore, people assume that the two properties of the designated THOG are the two properties included in the disjunctive rule.
Several theoretical explanations have been put forward about these errors, such as the "common element fallacy" and "perceptual matching bias". The "common element fallacy" consists in assuming that the two properties which define the positive example THOG ("diamond" and "black") are the properties of THOG, which leads to considering "white diamond" and "black circle" THOGs while "white circle cannot be THOG". The "perceptual matching bias" surmises that the participants respond to each design according to whether their properties match the positive example in the explanation or not. This would justify the frequent answer of "the white circle cannot be THOG" as it is considered completely unmatched.
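To make the contrast with the normative solution concrete, the following Python sketch renders the "common element fallacy" as a simple matching rule. It is a hypothetical illustration, not a published model: the function names and the threshold of one shared feature are assumptions made here. A design is called a THOG whenever it shares at least one property with the positive example, which reproduces the "Type A" response pattern.

```python
EXAMPLE = ("black", "diamond")  # the design identified as a THOG
OTHER_DESIGNS = [("white", "diamond"), ("black", "circle"), ("white", "circle")]


def shared_features(design, example=EXAMPLE):
    """Number of properties (colour, shape) a design shares with the example."""
    return sum(a == b for a, b in zip(design, example))


def common_element_response(design):
    # Assumed rendering of the fallacy: the example's own properties are taken
    # to be the written-down ones, so any shared property signals a THOG.
    return "THOG" if shared_features(design) >= 1 else "not a THOG"


responses = {d: common_element_response(d) for d in OTHER_DESIGNS}
for design, response in responses.items():
    print(" ".join(design), "->", response)
```

The white circle shares no property with the black diamond, so this rule rejects it and accepts the other two designs, the exact mirror image of the logically correct answer.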
Regardless of the explanation for the errors participants make, it is widely accepted by researchers that the THOG problem is a multitask one (Marek et al., 2000). In consequence, the main difficulty is that its correct solution calls for the use of several cognitive tasks, such as understanding an exclusive disjunction, generating hypotheses about possible combinations written by the experimenter and testing said hypotheses. Moreover, to reach the right solution, extra mental work is necessary as participants have to "think about what is true to find out what is false", or simultaneously consider alternative hypotheses which are mutually exclusive. In this vein, the THOG problem has been defined as a case of illusory inference related to exclusive disjunction and the problems that arise upon thinking about what is false (Manktelow & Galbraith, 2012).
How can the complexity of this task be reduced? Empirical research in THOG has endeavoured to analyse what kind of variables might make it easier to reach the correct solution: Its content? Its structure? The scenario in which it is included? The type of instructions participants receive? And so on. In the following section some of the main empirical studies that have analysed these factors are presented. The objective is not to make an exhaustive review of them, but to put forward the main modulating variables of reasoning highlighted in the research on Wason's THOG problem (see Table 1).

Looking for the Origin of Simplification: Beyond the Effect of Content
Analysing the factors which explain reasoning with THOG and the keys to its difficulty has been and still is the driving force behind over forty years of empirical research (see Martín & Valiña, 2002, 2003, for reviews on disjunction and the THOG problem, respectively) (see Evans et al., 1993: Chapter 4; Manktelow, 2021; Newstead & Evans, 1995, or Valiña & Martín, 2016, for a review. See also Griggs, 1983, for a publication concerned with the role of problem content in the selection task and in the THOG problem). Specifically, the results of the first thematic versions of the selection task have been interpreted in terms of facilitation by content (Wason & Shapiro, 1971; Johnson-Laird et al., 1972). Nevertheless, this effect is neither simply thematic nor is it necessarily a facilitator in the sense that it improves logical reasoning. In fact, "the thematic facilitation effect shows us the extreme sensitivity of reasoning or decision making to pragmatic factors" (Evans, 2000: p. 45). In this regard, the THOG problem and the Wason selection task "challenged the notion that human cognitive development culminates in a formal-logical stage" (Bellini-Leite & Frankish, 2021: p. 207). In hindsight, today it is considered that the effect of thematic facilitation in the four-card problem may be oversimplified and imprecise (Evans, 2017). In general terms, the content looks like a critical factor for execution, but not in the way it has been interpreted on occasion.
Specifically, what has happened in the studies on the influence of content in reasoning with the THOG problem? The first results seem to indicate that this variable was not the key to facilitating reasoning, unless another kind of significant change were made to the task. For example, one of the first thematic versions that gave rise to consistent facilitation (over 70% of participants got the right answer) was the "drug problem", proposed by Griggs & Cox (1982), in which participants had to discover a combination of medication according to a given medical treatment. The better performance with respect to the original abstract THOG was explained in terms of problem representation. The underlying structure of the problem seemed to indicate the need to consider two possibilities, which corresponded to the two hypotheses needed to reach the solution. Despite criticisms concerning structural differences between the THOG problem and the drug problem, the authors maintained that the tasks were isomorphic and required the same logical operations to get the same answer.

Table 1. Some empirical research concerning the THOG, in chronological order.
Authors | Version | Results-Explanation
 | DRUG Problem | Facilitation when the problem is phrased in such a way as to make the structure of the task very clear and explicit. Problem representation is key in problem solving.
Newstead, Griggs & Warner (1982) | GASTRONOMIC Problem | The realism of the task improves performance on the problem when the realistic material cues in the correct answer from memory. Memory cuing explanation.
Smyth & Clark (1986) | HALF-SISTER Problem | Realistic content is not sufficient to induce the correct answer. Expressing the exclusive relation through a familiar and known concept (half-sister) improves performance.
Girotto & Legrenzi (1989) |  | 
Seoane, Valiña, Rodríguez, Martín & Ferraces (2007) |  | Better performance with thematic content than with abstract content. Better performance with one-other instructions than with standard instructions. Individual differences.

One of the first works in which the influence of empirical knowledge on reasoning with the THOG became evident was that of Newstead et al. (1982). In particular, the authors devised "the gastronomic problem", in which different kinds of solid food (meat or ice cream) and sauces (gravy or chocolate sauce) were related. The task consisted of the following: the experimenter tells the participants that they will eat a meal if it includes one of the foods written down by a friend of theirs: a solid (either meat or ice-cream), and a liquid (either gravy or chocolate sauce), but will not eat meat with chocolate sauce or ice-cream with gravy. Knowing that the experimenter eats meat with gravy, the participant must decide which other combinations are acceptable for consumption. The results indicate that a high number of participants gave the right answer: "the experimenter can eat ice-cream with chocolate sauce and not meat with chocolate sauce or ice-cream with gravy".
However, the key variable seemed to be the activation of the participants' knowledge cued by the content, since the right answer coincided with the empirically more plausible one. This was confirmed in a later experiment in which the same version was solved successfully (75% of cases) by children aged 8-9 who, according to Piaget, do not yet seem to have the formal skills needed to solve these kinds of problems. In line with this and in the same year, Griggs & Cox (1982) proposed the "memory cuing hypothesis" to explain performance on a thematic version of the four-card task, which included a "drinking age rule" ("if a person is drinking beer then the person must be over 19"). The authors pointed out that the key factor in the observed facilitation was the activation of empirical knowledge and past experience to generate the solution, since the correct answer coincided with the most coherent one on an empirical level. A similar strategy could be used by participants in the case of the gastronomic version of the THOG. Are experience and knowledge, then, the keys to performance?
The realism of the relation included in the task could be an important variable, but not always enough to facilitate making the correct choice. Along these lines, Smyth & Clark (1986) came up with a new thematic version ("the half-sister problem") in which an everyday disjunctive expression was used: the concept of half-sister (a woman who shares either the mother or the father, but not both). The authors showed participants the names of four women and their parents. They were told that one of them, Robin, was the experimenter's half-sister and were asked which of the other three were also half-siblings. 95% of participants got the correct answer. However, this problem has been criticised for being considered more of a test of classification than an isomorphic reasoning to THOG. In fact, when presented with the same version as in the THOG, the execution worsened. Therefore, it seemed that neither the task content nor the realism of relation included in the rule was enough to lead participants to develop the right strategy.
In the late eighties, research using abstract versions of the THOG that also registered facilitation started to be designed. In this vein, Girotto & Legrenzi (1989) devised new thematic and abstract versions of the task so as to confirm what for them was fundamental to come to the right answer: the need to separate two levels, data and hypothesis. This is the essence of "Confusion Theory".
In particular, the origin of facilitation is in the differentiation between the level of data (that is, the THOG example which is mentioned in the explanation) and the level of hypothesis that the experimenter has elaborated from the THOG pointed out. According to the authors, this is crucial in reaching the correct answer, both in thematic versions ("the Soviet Spies problem" by Girotto & Legrenzi, 1989, "the Blackboard Problem" by O'Brien et al., 1990, "the Pythagoras Problem" by Needham & Amado, 1995), as well as abstract ones ("the MIB-THOG Problem", "the Pub Problem", the TRUMP-FAFNER Problem, "the One-Other THOG Problem", "the SARS Problem", "the SARK Problem") (see Table 1). In consequence, according to "confusion theory", intuitive errors occur because participants have difficulties in separating the data given in the problem (i.e., the values of the identified THOG) from the hypotheses to be generated (i.e., the possible written-down combinations).
From a more general perspective, some authors propose studying the THOG problem while trying to eliminate the unnecessary barrier between reasoning research and the more general study of thought (Evans, 2010). In this sense, the way of reasoning with the THOG problem has been related to other types of thinking, such as decision making, and to participants' own cognitive limitations. Thus, the difficulties in reasoning with disjunctions can also be linked to difficulties in the decision-making process. Along these lines, it has been proposed that when participants decide upon everyday issues, they frequently fail to consider all the possible results and consequences of uncertain events. This occurs, for example, in the "disjunction effect", according to which people prefer to rely on one decision in which a mental simulation is involved, rather than two, even if both lead to the same conclusion. This is proposed in the "non-consequential reasoning hypothesis" by Girotto & Legrenzi (1993). In this sense, incorrect performance of the task could be modulated by the difficulties of thinking through uncertainty (Newstead & Griggs, 1992). In summary, problems may arise when having to combine possible hypotheses and test them using a disjunctive rule. According to this hypothesis, participants' errors are linked to limitations in working memory and attention. Following Stanovich (2011), only those subjects of higher ability can decouple their beliefs and previous knowledge from the problem content in order to think hypothetically about complex problems. How can participants' cognitive capacity influence the solving of experimental tasks such as the THOG? One of the lines of research that have analysed this question concerns the role of individual differences in task solving, which is developed in the next section.

The Study of Individual Differences and the THOG Problem
The main object of pioneering studies on individual differences in the THOG (Mimikos, unpublished, cited in Newstead & Evans, 1995) (1997, 1998, 2000); Stanovich et al. (2011, 2016, 2017); Thompson; Valiña et al. (1995, 2000); West et al. (2008), started studying the link between the participants' cognitive skill and its implementation in different experimental reasoning tasks. In effect, people differ in their cognitive capacity, which is reflected in the marks they obtain in psychometric intelligence tests.
Is there a relation between these marks and the execution of experimental reasoning tasks like the THOG? Martín et al. (1998) analysed this question. In this work, participants had to solve two experimental tasks: the original abstract THOG and a thematic version of it, "the Drug Problem". The subjects also performed several psychometric tests that measured comprehension and reasoning capacity. Results indicated that both reasoning and comprehension modulated the execution in both versions of the THOG. Additionally, the thematic version registered better results than the abstract original. In later works (Seoane et al., 2007) the authors increased the number of parameters that could explain the execution in the THOG. Not only cognitive capacity and skills, but also variables related to personality and "dispositional trends" seemed to influence reasoning. In particular, Seoane et al. (2007) analysed, on the one hand, the influence of characteristics related to the task itself (content and instructions) and, on the other hand, the differential characteristics of the participants. The experimental tasks used were two thematic versions of the THOG, "the drug problem" and "the pub problem" (Girotto & Legrenzi, 1989), as well as two abstract versions that differed in the type of experimental instructions: participants received the original version with either "standard" instructions or "one-other" instructions. Results confirmed main effects of both variables, content and instructions, but no interactive effects. "The pub problem" in particular showed better execution than "the drug problem", and the "one-other" instructions obtained better results than the standard ones in the original version. Furthermore, a differential execution by participants was registered when solving the THOG versions. The empirical results indicated that differences in processing capacity were not enough to explain the execution.
In line with Stanovich (1999), the performance was explained not only from differences at the computational (algorithmic) level, but also from differences at the intentional level. In particular, verbal reasoning, the ability to understand and solve logical problems, and cognitive flexibility turned out to be good predictors of individual differences in reasoning with the THOG. Consequently, the execution could not be explained by differences in participants' cognitive capacity and abilities alone; other variables linked to styles or "thinking dispositions" were also important parameters. The results of these empirical works confirm the proposal by Stanovich (1999, 2009, 2011) and Stanovich & West (2000), who defended the need to explain differential execution in reasoning tasks from different levels of analysis (algorithmic and intentional).
Thus far, the different variables that seem to influence reasoning with the THOG task have been presented. Regardless of what these factors are, when participants guess the designs which are THOG or not, in general terms, it is said that there is facilitation. Nevertheless, "we only know they answered correctly, not what led to that answer" (Koenig & Griggs, 2011: p. 66). In the next section, a line of research on the THOG is presented. This work analyses which could be the key components of the task that lead to facilitation, using the analogical transfer method.

Searching for the Origin of Facilitation: The Research on Analogical Transfer
Griggs et al. (2001); Koenig & Griggs (2004a, 2004b, 2011); Koenig et al. (2007) started a line of empirical investigation into whether participants were able to isolate the start of a solution to a version of the THOG and use this knowledge to solve the original problem. One of the tasks used is the "Pythagoras THOG problem" (Needham & Amado, 1995). This task included the same dimensions (colour and shape) of the figures as the THOG problem in a narrative context which separated data and hypothesis, and asked participants to make hypotheses about colour and shape combinations a teacher may have written down. The authors propose that the narrative structure is responsible for facilitation (62% correct answers), as it allows for data-hypothesis separation.
However, the explicit requirement for hypothesis generation did not improve reasoning (as had previously been proposed by Girotto & Legrenzi, 1993; Newstead & Griggs, 1992; Smyth & Clark, 1986, or Wason & Brooks, 1979). From critical positions it was argued that the key to the Needham & Amado task was the inclusion of a new characteristic in the problem which had not been contemplated and that could affect results. This was analysed in later work. In particular, in "the Pythagoras problem", the designs were numbered and "Figure 1 is a THOG" was used instead of "the black diamond is a THOG", as in the original version. The line of research put forward by Koenig & Griggs (2004a, 2004b) proposes that, for participants to understand the problem and transfer this knowledge to finding the correct solution for the original THOG, the version used must fulfil two requirements: 1) allow for the separation between the THOG example and the hypotheses on the possible combinations written by the experimenter, and 2) explicitly ask participants to make hypotheses. Koenig & Griggs (2004a, 2004b, 2011) and Koenig et al. (2007) designed a series of studies with different versions of the THOG (such as the "standard dot-cross THOG problem" or the "standard letter/number THOG task"; Cordell, 1978) to analyse the factors responsible for transfer. Specifically, they analysed the following question: To get the answer, do the participants use heuristic strategies based on superficial similarities of tasks, or is the similarity in problem structure the key factor? Results seem to indicate that the critical variable was the structural similarity among the tasks used.
In general terms, the empirical results within this line of analogical transfer research support dual process theory (Evans & Over, 1996). Following Koenig et al. (2007), analogical transfer and dual-process theory explain facilitation on the THOG task. This theoretical approach, developed in the following section, proposes the existence of two systems or forms of reasoning: system 1, which includes quick, implicit and automatic processes (type 1 processes), and system 2, consisting of slow, explicit processes that require effort (type 2). The latter is responsible for hypothetical thinking and allows for the isolation of the start of the solution of the THOG, which in turn facilitates success in transfer. The following section sets out the main theoretical explanations of THOG reasoning and analyses this theoretical proposal.

Thinking about the THOG: Main Theoretical Explanations
On a theoretical level, several different explanations of THOG reasoning have been proposed. Some of the more specific ones have focused on analysing the origin of the errors, such as the already explained "common element fallacy" or "perceptual matching bias". In fact, much can be learned about human reasoning by studying them (Evans & Over, 1996). From a more general perspective, in the context of theories about human inference, the theoretical proposals that have dedicated most attention to THOG reasoning include the Dual Process and Hypothetical Thinking theories (Evans & Over, 1996; Evans, 2007). The Mental Models Theory has explained the difficulty of reasoning with disjunctions in studies on illusory inferences (Johnson-Laird & Savary, 1996, 1999; Khemlani & Johnson-Laird, 2017). Specifically, in this historical context, a former Princeton University student, Mark Johns, has proposed an explanation for the execution with the THOG (see Johnson-Laird, 2000). This approach is set out below.

The Mental Models Theory
The Mental Models Theory (Johnson-Laird, 1983; Johnson-Laird & Byrne, 1991; Johnson-Laird, Byrne, & Schaeken, 1992) proposes that participants reason by elaborating semantic representations from the meaning of the premises and using their general knowledge of the world. In short, they construct mental models. This is explained in three points as follows. Firstly, the general phases of a deduction are presented, focusing on reasoning with an exclusive disjunction. The starting point for a deduction is the participants' understanding of the meaning of the premises shown in the task, supported by their understanding of the world, to create a mental model that represents a state where the statements are true. Next, a provisional conclusion is formulated, which is true in the constructed model and must be validated. That is, possible counterexamples which make it false have to be searched for. If none exist, the inference will be definitively validated and accepted, and if a counterexample exists, said conclusion will be rejected and another created.
According to Mental Models Theory, errors made by participants are not momentary lapses, but are linked to cognitive limitations at the level of working memory. In this vein, one of the main predictions of the theory is that the higher the number of models the participant has to reason with, the higher the working memory load, increasing the possibility of making mistakes. Hence, participants tend to initially reason about the least amount of information possible. Only in the case of not being able to reason from the initial explicit model do they develop or use implicit models, fleshing out all the alternative possibilities. A consequence of the previous prediction is the principle of truth, according to which participants only initially contemplate models that express situations in which the premises are true, which in turn minimises working memory load. However, this may be an added difficulty in disjunctive reasoning, where the participant needs to "think what's false". In effect, disjunctions are difficult, and their difficulty lies in the fact that having reasonable intuitions about them is complicated (Johnson-Laird et al., 2012). To reason correctly it is necessary to keep in mind models of more than one possibility, and to represent what is true as well as untrue, thinking about these possibilities consciously. In other words, participants should not rely on just one mental model, but on models spread out over a series of possibilities. In this sense, reasoning depends on intuitions or on deliberations and fully explicit models or both (Johnson-Laird, 2021; Johnson-Laird et al., 2021).
It has been shown that inferences from an inclusive disjunction are more difficult than those from an exclusive disjunction. In the following section we see how this theory explains reasoning with exclusive disjunction, which is the type of rule participants reason in the THOG problem.
From the statement "p or q but not both", the participant represents two initial explicit models, which are the following: where "¬" represents the negation of a clause.
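The link between the number of models and difficulty can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the theory's own notation: the helper names `fully_explicit_models` and `show` are invented here, and a "fully explicit model" is taken simply to be a row of the truth table in which the disjunction holds, with negated clauses written out.

```python
from itertools import product


def fully_explicit_models(connective):
    """Return the (p, q) truth assignments for which the connective holds."""
    return [(p, q) for p, q in product((True, False), repeat=2) if connective(p, q)]


def show(model):
    """Render a model with explicit negations, e.g. 'p ¬q'."""
    p, q = model
    return ("p" if p else "¬p") + " " + ("q" if q else "¬q")


exclusive = fully_explicit_models(lambda p, q: p != q)  # "p or q but not both"
inclusive = fully_explicit_models(lambda p, q: p or q)  # "p or q or both"

print("exclusive:", [show(m) for m in exclusive])
print("inclusive:", [show(m) for m in inclusive])
```

Under this reading the exclusive disjunction calls for two fully explicit models and the inclusive one for three, which is consistent with the prediction that more models mean a higher working memory load.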
Focusing on the THOG problem and the Mental Models Theory framework, a former Princeton University student, Mark Johns, developed a computer program to explain the reasoning behind this task (Johnson-Laird, 2000). The explanation is as follows: From the initial information in the formulation, "black diamond is a THOG", participants envisage the two characteristics the experimenter has written and, following the "principle of truth", construct the mental models that represent them:
black
diamond
It is then incorrectly inferred that "the white diamond may be a THOG", as it shares one of those characteristics, but they cannot be certain since the other characteristic (black) might be critical. For the same reason, they could infer that "the black circle may be a THOG" as it shares one characteristic, but they also answer that the white circle cannot be a THOG because it shares neither characteristic.
The correct answer depends on fleshing out the initial models to make explicit what is false, in both cases:
black ¬diamond
¬black diamond
Since there are only two possible shapes and two possible colours, the false cases in both models could be replaced by their corresponding positive characteristics:
black circle
white diamond
A design is a THOG if it has one of the characteristics in each one of those models, and is indeterminate (could be or not) if it has one characteristic in only one of the models. It follows that "the white circle is a THOG" because it has one characteristic from the first possibility and one from the second. Furthermore, neither "the black circle" nor "the white diamond" is a THOG, because the former has both characteristics from the first possibility and the latter has both from the second.
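The decision criterion stated above (one characteristic from each fleshed-out model) can be written as a brief Python sketch. This is not Mark Johns's program, whose details are not given here; the names `MODELS`, `features_from` and `verdict` are hypothetical and the encoding is an assumption made for illustration.

```python
# Fleshed-out possibilities for what the experimenter wrote, as (colour, shape).
MODELS = [("black", "circle"), ("white", "diamond")]


def features_from(design, model):
    """Count the characteristics a design shares with one model."""
    return sum(a == b for a, b in zip(design, model))


def verdict(design):
    counts = [features_from(design, m) for m in MODELS]
    if all(c == 1 for c in counts):
        return "THOG"  # one characteristic from each model
    if any(c == 1 for c in counts):
        # One characteristic in only one model; with two binary dimensions
        # and these two models this branch is never reached.
        return "indeterminate"
    return "not a THOG"  # both characteristics come from a single model


for design in [("white", "circle"), ("black", "circle"), ("white", "diamond")]:
    print(" ".join(design), "->", verdict(design))
```

Only the white circle takes one characteristic from each possibility, so it is the only remaining design classified as a THOG, in agreement with the explanation given in the text.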
Research into the THOG has revealed that when participants reason with different thematic versions of the task, answers may vary. Content type, scenario, empirical knowledge and so on are key variables in the execution. For years, one of the criticisms of the mental models theory focussed on the lack of specificity as regards the sort of mechanisms that could determine the role of those factors in reasoning. Thus, Johnson-Laird & Byrne (2002) have proposed the strategies of semantic and pragmatic modulation, which can have two different effects on reasoning: 1) add links between models or 2) block the activation of counterexamples. Initial research into the effect of modulation has been performed in the area of the conditional "if…then" (see, for example, Quelhas et al., 2010), even though later the importance of knowledge in reasoning has also been studied in other connectives such as conjunction and disjunction (see, for example, López Astorga, 2018, 2019; Quelhas & Johnson-Laird, 2017; Quelhas et al., 2019; Johnson-Laird et al., 2021), or the negation of said connectives (Khemlani et al., 2014; Macbeth et al., 2014; Yin et al., 2000, among others).
In this sense, Quelhas and Johnson-Laird (2017) have empirically confirmed the existence of differences in the interpretation of the disjunction "or", depending on the content in the formulations and participants' empirical knowledge. In effect, "knowledge modulates the meanings of logical terms" (Johnson-Laird, 2021: p. 221). According to the authors, the disjunctive connective has a key meaning, which leads to an inclusive interpretation. Nevertheless, modulation can affect this interpretation and give way to three different interpretations: exclusive, forward (the first premise implies the second) and backward (the second premise implies the first). As a consequence, the inferences participants judge valid may depend on the possibilities the premises refer to.
Since modulation might change these possibilities from one type of interpretation to another, the type of disjunctive inference drawn could also change.
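The dependence of inferences on the possibilities admitted by each interpretation can be made concrete with a small sketch. The Python code below is an illustrative assumption, not the procedure of Quelhas and Johnson-Laird (2017): it treats each interpretation as the inclusive set of possibilities minus those that modulation blocks, reading "forward" and "backward" literally as glossed above. The names `possibilities` and `follows` are hypothetical.

```python
from itertools import product


def possibilities(interpretation):
    """Possibilities (a, b) that a disjunction 'A or B' refers to under a
    given interpretation, starting from the inclusive reading."""
    inclusive = [(a, b) for a, b in product((True, False), repeat=2) if a or b]
    # Assumption: each interpretation blocks one kind of possibility.
    blocked = {
        "inclusive": lambda a, b: False,
        "exclusive": lambda a, b: a and b,      # both together are ruled out
        "forward": lambda a, b: a and not b,    # A implies B
        "backward": lambda a, b: b and not a,   # B implies A
    }[interpretation]
    return [(a, b) for a, b in inclusive if not blocked(a, b)]


def follows(interpretation, premise, conclusion):
    """True if the conclusion holds in every possibility of the disjunction
    that is compatible with the additional premise."""
    cases = [c for c in possibilities(interpretation) if premise(*c)]
    return bool(cases) and all(conclusion(*c) for c in cases)


if __name__ == "__main__":
    for name in ("inclusive", "exclusive", "forward", "backward"):
        # Inference tested: "A or B; A; therefore not B".
        valid = follows(name, lambda a, b: a, lambda a, b: not b)
        print(name, possibilities(name), "A, so not-B:", valid)
```

In this sketch the inference "A or B; A; therefore not B" goes through only under the exclusive reading, which illustrates how a change of interpretation alters the set of conclusions that can be drawn.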
The basic proposals of the mental models theory predict and explain inferences from disjunctions that are systematically false. In that sense, and to reduce the load on working memory, participants occasionally develop procedures that lead them to erroneous conclusions. These errors are "illusory inferences", which may look like convincing illusions, as they are made when reasoning with connectives such as disjunction (see Pohl, 2017, for a publication on cognitive illusions in thinking). More specific studies on the role of illusions in reasoning are, for example, Johnson-Laird (2006); Johnson-Laird & Savary (1996, 1999); Khemlani et al. (2009, 2017); López Astorga (2014); Sablé-Meyer & Mascarenhas (2021); Santamaría & Johnson-Laird (1998, 2000). Thus, one of the lines of research of the last decades in the mental models theory revolves around studies on illusory inferences and how they can be reduced or eliminated. For example, Khemlani et al. (2009: experiment 3) asked participants to make an added inference from the premises, in order to help them think about what is true as well as what is false. For their part, Santamaría and Johnson-Laird (2000) have proposed an "antidote" for illusory inferences. This entails reasoning from disjunctions of physical objects (such as newspaper advertisements), rather than disjunctions based on the truth values of assertions (see also Carriedo et al., 1998; Johnson-Laird & Savary, 1996, 1999; Khemlani & Johnson-Laird, 2017).
Illusory inferences "appear to be a decisive test for the use of mental models, because no other current theory predicts their occurrence" (Khemlani & Johnson-Laird, 2009: p. 618). From a wider perspective, the study of these types of inferences is highly illustrative, as it helps to advance more general questions in reasoning, such as the nature of human rationality. In effect, the THOG problem, defined as "a case of illusory inference" (Manktelow & Galbraith, 2012: p. 117), has been and still is a useful tool to analyse broader questions in reasoning, such as the debate about human rationality (see, for example, Viale, 2021) or the study of individual differences (see, for example, Oberauer et al., 2007, in which the authors proposed that individual differences in working memory capacity are a good predictor of reasoning ability).
The Theory of Mental Models has put forward the principle of truth as the key to explaining both the difficulties in reasoning with disjunctions and the illusions in reasoning themselves. However, other authors question this principle, since participants can easily and accurately represent false possibilities if instructed to do so. In this respect, other principles have been proposed from different theoretical perspectives to explain hypothetical thinking.
In this context, the following section presents the heuristic-analytic theory, the dual process theory and the hypothetical thinking theory.

The Heuristic-Analytic Theory, Dual Process Theory and the Hypothetical Thinking Theory
Dual Process Theories are well known in many fields of psychology (see Ball & De Neys, 2018; Evans, 2012, 2018; Frankish & Evans, 2009, for reviews).
Focusing on the psychology of reasoning and following Evans (2004), the two roots of the modern dual process theories (Evans, 1989; Sloman, 1996; Stanovich & West, 2000) are the journal article written by Wason & Evans (1975) and the "Two-Factor Theory" (Evans, 1982). Wason & Evans (1975) tried to explain the reasoning behind another meta-inference task also devised by Peter Wason: the selection task (Wason, 1966, 1968). The authors observed discrepancies between participants' execution and the explanations they gave about how they had solved the task. According to these authors, the differences were due to the activation of two cognitive processes: type 1 and type 2. Type 1 processes preconsciously select the aspects relevant to the task, according to linguistic, semantic and/or pragmatic cues. Type 2 processes are explicit, conscious reasoning processes that operate on the previously selected information. The two-factor theory is based on the results obtained by Evans (1972a, 1972b, 1977) with another conditional paradigm: the negations paradigm. These papers revealed the influence of logical and non-logical factors on reasoning. But "this theory is descriptive and provides no real explanation of the cognitive processes that underlie our observations" (Evans, 2021: p. 124).
Later, Evans (1984, 1989) broadened the previous explanation in the Heuristic-Analytic Theory (see Evans, 2004). This theory is a bridge between the two roots described above and the dual process theory of Evans & Over (1996). Evans (1984, 1989) explained the origin of reasoning biases via heuristic processes. The analytic processes were initially "mysterious" and related to deductive competence. In this theory, heuristics are preconscious and their function is to selectively represent relevant information and to retrieve and add knowledge from memory. Subjects then reason with these personalized representations. In this context, participants might make mistakes if they select logically irrelevant information or fail to take relevant information into account when reasoning about it in a second, analytic phase. In a revision of this early heuristic-analytic theory (Evans, 2006, 2007), it is stated that such errors occur not only in the heuristic phase, as defended in the initial version, but also during the later explicit reasoning process. Therefore, both types of processing, heuristic and analytic, could be influenced by the participants' beliefs, empirical knowledge or experience. Specifically, Evans (2006) presents the "fundamental heuristic bias" and the "fundamental analytic bias" in order to explain the role of type 1 and type 2 processing in the causes of cognitive biases. In this sense, following Stanovich (2011), the activation of Type 2 thinking is not a guarantee of normative success. At the same time, the "search for counterexamples" and "fleshing-out implicit models" proposed by the mental models theory might account for analytic reasoning (Evans, 2004).
Let us return to the THOG problem to see how the Heuristic-Analytic Theory explains reasoning with this experimental task. The type 1 or implicit processes are responsible for the activation of heuristic strategies, such as attentional heuristics, that may lead to, for example, "intuitive error" in the THOG.
Specifically, heuristic processes focus on black and diamond as the relevant characteristics of the example THOG, and both are encoded into a single mental model of the hypothesis. This model must be rejected by explicit analytic reasoning in order to find the correct answer. Likewise, participants can automatically activate pragmatic cues which contextualise the problem on the basis of beliefs, empirical knowledge, etc. "Decontextualisation" is precisely a function of system 2, which focuses on reasoning from the structure of the task, overcoming the pragmatic inferences derived from system 1. Type 2 processes are explicit, conscious, specifically human, limited by processing capacity and linked to general intelligence. They are responsible for the abstract, analytic and hypothetico-deductive reasoning required by the formal solution to the THOG. In particular, this type of reasoning leads participants to make hypotheses about what the experimenter has written, compare them to the existing designs and explicitly evaluate the answer. Hypothetical thought involves a process of mental simulation through the imagination of possibilities and the exploration of their consequences (Evans, 2007, 2019). A theoretical framework which includes the approaches of dual process theories and explains execution on the THOG problem is the Theory of Hypothetical Thinking (Evans, 2007, 2019). The basic idea is that participants create mental simulations to analyse possibilities, both in reasoning and in decision-making processes. Both are modulated by three principles: 1) the principle of singularity: participants construct one mental model at a time, which expresses a hypothetical situation; 2) the principle of relevance: the model represented will be the most relevant one on a pragmatic level; and 3) the principle of satisfaction: this representation is subject to an explicit (analytic) evaluation and is accepted if satisfactory.
According to the principle of singularity, participants consider only one possibility at a time, but that does not mean they cannot think of other models, for example, when contemplating the consequences of alternative choices in decision-making. It is nevertheless difficult to consider other hypotheses, especially if this has to be done simultaneously. Specifically, the THOG problem requires two hypotheses to be considered. One of the theoretical explanations of the difficulty of the THOG focussed precisely on this point: the non-consequential reasoning hypothesis, discussed in a previous section (Newstead & Griggs, 1992). Regarding the principle of relevance, participants choose to represent the most prominent model on a pragmatic level. This contextualisation of the task relies on empirical knowledge and experience, and is an adaptive process which can lead either to thematic facilitation or to errors on a formal level. In this vein, Stanovich (1999) defined "the fundamental computational bias" as a powerful tendency to contextualise problems and decisions on the basis of previous knowledge, occasionally giving formally wrong answers. According to Stanovich & West (2000), system 2 aims to suppress the automatic contextualisation of the problem. It is therefore responsible for the process of "cognitive dissociation" (Stanovich, 2009), necessary to reach the logically correct solution in tasks such as the THOG, where the normative answer requires the suppression of pragmatic influences or cognitive illusions.
According to Evans (2006), one of the most fascinating aspects of human cognition is the ability to make suppositions, that is, temporary beliefs which are the basis of a mental simulation of a possible scenario. However, "… it is essential that such suppositions be represented within epistemic mental models that encode their hypothetical nature" (p. 386). Therefore, to make sure the formally correct answer is given, it is necessary to engage in abstract, analytic reasoning, which allows for the explicit evaluation of the answer, accepting it if satisfactory (principle of satisfaction). As Stanovich et al. (2011) suggested, to reason hypothetically, participants must have a critical cognitive capacity that allows them to differentiate between representations of the real world and those of imaginary situations. They must discriminate between the representation of an action and that of alternative potential actions when developing cognitive simulations.
The Dual Process Theory (Evans & Over, 1996) explains the interaction between type 1 and type 2 processes in human thinking and accounts for both correct execution and the errors made. In general terms, the existence of dual processes or systems of thought is "one of the most widespread and influential theoretical ideas in contemporary cognitive psychology" (Rhodes et al., 2020: p. 185).

Conclusion
This work has aimed to analyse why the THOG is such a difficult experimental task to solve and present the main theoretical approaches and lines of empirical research that have tackled this question.
One of the conclusions most frequently accepted by researchers studying the THOG is that there is no unanimous agreement on the origin of the task's difficulty or, at the very least, on the key parameters that modulate execution. On the one hand, reasoning with this task has revealed biased strategies in individuals. On the other hand, reasoning seems to differ from participant to participant.
Moreover, the content of the problem (not only the meaning of the words but also their relations to one another), the scenario, the experimental instructions participants receive etc. matter just as much as the formal structure of the problem.
Specifically, several explanations on THOG reasoning have been put forward.
Some of the more specific ones, such as the non-consequential reasoning hypothesis or the Confusion Theory, focused on analysing why it is such a complex task.
We have also discussed two theories of human inference that have explained the reasoning behind the task: the Mental Models Theory and the Dual Process Theory. In general terms, both theoretical proposals hold that the behaviour of participants when they are reasoning could reflect a "competition" between "processes 1 and 2" (Evans, 2010, 2021) or between "intuition/deliberation or both" (Johnson-Laird, 2021) in the determination of the final answer.
Today we know the differential characteristics of both type 1 and type 2 processes, and it is also known that the majority of them are aspects related to intuition and/or reflection, respectively (Evans & Frankish, 2009; Evans & Stanovich, 2013). However, how they behave is still unknown. It seems that occasionally "type 1" processes, linked to intuition, are not as effective as traditionally thought (Johnson et al., 2016). In this sense, the "model of the three stages of dual process" has been put forward to analyse "what makes participants think and what triggers type 2 processing" (see Pennycook et al., 2015). Moreover, some recent theories on thinking and reasoning are studying the role of analytic reasoning in human morality, creativity or religious beliefs (see, for example, Pennycook, 2018). Within the Dual Process Theory, a debated issue in THOG reasoning is the following: when participants face the task, do they decide which designs are THOGs without thinking, or do they think before deciding? (Martín & Valiña, 2019). In general terms, is execution explained by the activation of attentional heuristics associated with system 1, or are the culprits the hypothetical thinking processes linked to system 2? A key line of research related to this question is the study of individual differences, "essential components in dual process models" (Bonnefon & Billaut, 2016: p. 222). The analysis of how reasoning is influenced by differences between participants in cognitive capacity and ability, together with thinking dispositions and styles of thinking, has also contributed to explaining THOG reasoning. In fact, research over the last decades on individual differences has shown that the analytical reasoning required by experimental reasoning tasks such as the THOG problem is related to cognitive ability, thinking dispositions and a range of situational variables.
The practical implications of these results in different fields, for example education or politics, have yet to be studied in depth (Stanovich, 2021b, 2021c).
In the nineties, it was predicted that the THOG would be a task that would fascinate researchers for a long time. But why is it such a special problem? In other words, "what is this thing called Thog?" (Manktelow, 2021). It is difficult to say. In 2003, we published a review of the THOG in which we studied this question. In this new work we have aimed to study how theoretical explanations and empirical research have advanced regarding this task. Nevertheless, researchers' interest in studying the THOG is not limited to understanding the problem per se. Empirical studies on this hypothetico-deductive reasoning task have also contributed to "illuminate the nature of human rationality" (Khemlani & Johnson-Laird, 2017; see also Chater et al., 2018; Stanovich, 2021a; Stanovich et al., 2016; Viale, 2021) and to go deeper into "the nature of thought" (Newstead, 2003). In this sense, having more clues about the underlying mechanisms of reasoning, hypothetical thinking or decision making can contribute to a better understanding of people's behavior and of why they sometimes make mistakes in different contexts of their daily life.