Predictive Processing Theory in Mind Studies: Cross Points with 4E Cognition and Cognitive Linguistics
1. Introduction
One of the most challenging problems of philosophy and science is the nature of mind and the mechanisms by which it operates. The hard problem of consciousness, that is, how physical processes in the brain give rise to subjective experience (Chalmers, 1995: p. 63), like the mind/body problem, remains as difficult and intractable as ever, and there is no solid bridge across the explanatory gap. This study focuses on the contemporary philosophical project of naturalization and its associated strong dependence on the natural sciences. It discusses a useful tool for studying consciousness and the mind in general, based on Bayesian statistics: Predictive Processing Theory (PPT). I start with a brief review of the foundational postulates and the contributions of key theorists such as Karl Friston, Jakob Hohwy, Andy Clark, and Anil Seth, who have refined the theory’s explanatory power in a neurocognitive context as well as in the creation of generative computer models.
Building on this foundation, the article seeks connections between PPT and a broad research agenda, 4E cognition (embodied, embedded, extended, enactive mind): the so-called dynamic metaphor that considers consciousness as movement and takes into account both the environment and the body. Intersections are also found with the postulates of Cognitive Linguistics, especially the role of Conceptual Metaphor in shaping not only thought and language but also the overall picture of the world. Special attention is paid to the interaction between the unconscious processes that underlie the predictive models of PPT and the linguistic structures that often reveal deeper cognitive constructs.
By integrating these perspectives, the study also aims to shed light on how Predictive Processing Theory (PPT) can serve as a unifying framework for studying mind, behavior and language as a dynamic system. The findings suggest the possibility of deepening interdisciplinary research combining Philosophy of mind, Philosophy of science, Cognitive science, Linguistics and other related disciplines.
2. Predictive Processing Theory
Predictive Processing Theory (PPT), or the theory of the anticipatory brain, is a contemporary, viable and promising concept in cognitive science and neuroscience for studying mind and for developing generative computer models. The Predictive Processing Framework (PPF) seeks to address fundamental philosophical questions, such as how the brain gives rise to consciousness, how bodily processes influence cognition, and how cognitive mechanisms shape perception. By integrating concepts from disciplines like physics, computer science, mathematics, artificial intelligence, economics, psychology, and neuroscience, PPT has become a unifying perspective.
Despite its widespread appeal, Predictive Processing Theory is not without its critics. One common objection is overgeneralization: critics argue that PPT is sometimes applied too broadly, making it difficult to test empirically or to falsify. Its proponents answer that PPT is actually a tool or instrument to be applied, not a theory to be verified or falsified. Anil Seth and Jakob Hohwy propose that Predictive Processing should not be seen as a theory of consciousness, but rather as a theory for consciousness science, that is, a theoretical and methodological tool for mapping relationships between neural mechanisms, cognitive functions, and phenomenological properties (Seth & Hohwy, 2020). The authors advocate a mapping-based approach, which focuses on connecting biological mechanisms with functional capacities and phenomenological qualities, asking what kind of experience arises from what kind of processing, under which functional constraints.
Moreover, some researchers question whether the brain can feasibly perform the vast number of Bayesian computations required by the theory in real time, and there is still debate over how exactly predictive coding is implemented in neural circuits. Nonetheless, the framework is becoming increasingly influential among researchers.
2.1. Main Concepts
PPT is a leading framework that explains perception, action, and cognition as a process of prediction. This theory posits that the brain continuously generates predictions about sensory input and updates these predictions based on incoming information, which also turns body and environment into key factors. Rather than passively receiving stimuli from the external world, the brain actively constructs its perception by minimizing the discrepancy between its predictions and actual sensory input. This discrepancy is referred to as Prediction Error. Moreover, the brain makes an Active Inference—it actively engages with the environment by adjusting its sensory input through actions to confirm or refine its predictions. The brain determines how much weight to give to Prediction Errors based on their reliability, which can vary in different contexts and conditions. This procedure is called Precision Weighting.
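To make these notions concrete, the following minimal sketch in Python (an illustration added here, not drawn from any particular PPT implementation; the function name and the numbers are hypothetical) shows how a single Prediction Error might be weighted by the relative precision of the senses before it updates a belief:

```python
# Minimal sketch: one step of precision-weighted prediction-error minimization
# for a single scalar feature (all values are illustrative).

def update_belief(prior_mean, prior_precision, observation, sensory_precision):
    """Combine a prior prediction with a noisy observation.

    The Prediction Error is weighted by the reliability (precision) of the
    senses relative to the prior, as in Precision Weighting.
    """
    prediction_error = observation - prior_mean
    weight = sensory_precision / (sensory_precision + prior_precision)
    posterior_mean = prior_mean + weight * prediction_error
    posterior_precision = prior_precision + sensory_precision
    return posterior_mean, posterior_precision

# Reliable senses (high precision): the error strongly revises the belief.
print(update_belief(prior_mean=0.0, prior_precision=1.0,
                    observation=2.0, sensory_precision=4.0))   # (1.6, 5.0)

# Unreliable senses (low precision): the same error is largely discounted.
print(update_belief(prior_mean=0.0, prior_precision=1.0,
                    observation=2.0, sensory_precision=0.25))  # (0.4, 1.25)
```

When the senses are judged reliable, the error dominates and the belief shifts; when they are not, the prior prediction holds, which is the essence of Precision Weighting.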
The theory provides crucial insights into how conscious experience arises from the brain’s hierarchical prediction models, with some arguing that self-awareness is an emergent property of predictive processes. By emphasizing Active Inference, the theory highlights the role of bodily action in shaping perception and cognition.
2.2. A Historical Glance at the Idea of the Anticipatory Brain
The philosophical and scientific intuitions underlying the concept can be traced back centuries, perhaps millennia, but the most direct connection is to the work of Hermann von Helmholtz, who in 1867 coined the term “unconscious inference” (German: unbewusster Schluss) to describe an involuntary, pre-rational, and reflex-like mechanism as part of the formation of visual impressions. Through this mechanism, the brain interprets incoming sensory signals automatically and unconsciously based on prior knowledge and experience. Helmholtz believed that perception is not a passive reception of sensory stimuli, but a process of active inference in which the brain makes predictions to fill in the gaps in sensory data (Von Helmholtz, 1925).
These inferences are unconscious, allowing for rapid and efficient perception of the world. Some authors, such as Link Swanson, find parallels between Predictive Processing Theory and Kant’s concepts (Swanson, 2016): for example, higher levels of the cognitive system influence perception, with the mind actively constructing experience based on prior knowledge and hypotheses; both seek to explain how the mind recognizes causal structures in the world using only sensory data; PPT’s reliance on generative models to predict data from sensory receptors can be traced back to Kant’s “schemas” that connect the categories of reason with sensory experience; the idea that perception involves the synthesis of information from different sources is present in both approaches; and both note the role of imagination. Swanson also traces the historical connection between Kant and PPT through the work of Helmholtz, who, according to him, sought to provide a scientific justification for Kant’s ideas.
2.3. Brain as a Bayesian Inference System
At the heart of Predictive Processing Theory lies the concept of hierarchical predictive coding. The brain is thought to operate as a Bayesian inference system, meaning that it constantly updates its beliefs about the world based on prior knowledge and new evidence. This process occurs at multiple levels of the neural hierarchy, with higher levels generating abstract predictions about what is expected to be perceived and lower levels refining these predictions based on sensory input.
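In its simplest textbook form (stated here only for illustration, not as the article’s own formalism), this Bayesian updating can be written as

\[ P(h \mid s) = \frac{P(s \mid h)\, P(h)}{P(s)} \]

where \(h\) is a hypothesis about the hidden causes of the sensory input \(s\), \(P(h)\) is the prior, \(P(s \mid h)\) is the likelihood supplied by the brain’s generative model, and \(P(h \mid s)\) is the updated (posterior) belief. Within the hierarchy, the posterior reached at one moment serves as the prior for the next round of prediction.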
PPT postulates that the brain, based on statistical analysis (Bayesian inference), continuously generates and updates a “mental model” of the environment to predict future sensory inputs, which are then compared for error with the actual sensory inputs. There are two research approaches to PPT: reductionist and non-reductionist. Reductionism views all the different processes as the result of a single fundamental principle, the Free Energy Principle (FEP), according to which self-organizing systems that remain in equilibrium with their environment must minimize the amount of free energy, that is, reduce the uncertainty about the states they may find themselves in (Friston, 2010). From an information-theoretic perspective, minimizing free energy is equivalent to avoiding surprise, i.e. avoiding improbable, high-entropy states (Applebaum, 2008). In the non-reductionist form of PPT, the position is taken that there may be processes that are not governed by predictive processing (Clark, 2013, 2016). For example, it remains open to discussion whether emotional states or the reward system in the brain are the result of predictive processing.
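For background, the Free Energy Principle can be stated through the standard variational free energy (a textbook formulation added here for illustration):

\[ F = D_{\mathrm{KL}}\big[\, q(h) \,\|\, p(h \mid s) \,\big] - \ln p(s) \;\geq\; -\ln p(s) \]

where \(q(h)\) is the organism’s internal (approximate) model of the hidden causes \(h\) of its sensory states \(s\), and \(-\ln p(s)\) is the surprise associated with those states. Since the divergence term is never negative, minimizing \(F\) both improves the internal model and keeps an upper bound on surprise, which is why minimizing free energy and avoiding improbable, high-entropy states amount to the same thing.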
2.4. PPT Concept of Top-Down Information Processing
The conceptual framework of PPT overturns the dominant traditional view that organisms process information coming from the environment bottom-up: on that view, sensory input is first registered uncritically, without being incorporated into a cognitive model of the world, and only in subsequent steps are these perceptual representations processed by the cognitive system to build a model of the environment. Some authors in the English-language literature consider PPT a revolutionary approach, or even a Copernican turn (Seth, 2021).
According to PPT, the brain constantly makes predictions or hypotheses about the causes of sensory input data, testing them from the top down, layer by layer, against the incoming stream of signals, and on this basis shapes perceptual content and guides actions and learning. The brain’s processing architecture is hierarchical, with higher-order areas predicting activity in lower-order sensory areas. The exact mechanism is some approximation to Bayesian inference or prediction error minimization (Friston, 2010; Hohwy, 2013; Hohwy, 2020). That is, the brain employs Bayesian probability to integrate prior knowledge (priors) with new sensory evidence, updating its models accordingly.
In other words, the fundamental idea of PPT is that the brain is not a passive recipient of sensory information, but rather an active predictor of what is likely to happen in the environment in which the organism is immersed. The brain, based on previous experience and the created internal models of the world, “predicts” what the sensory input data will be and adjusts its expectations based on the newly incoming information. That is, the brain is constantly trying to “predict” what will happen, without necessarily taking into account all the sensory information, but only the error, the difference from the expected. This prediction is made in a cascade at multiple levels in the neural hierarchy of the brain – from the basic sensory areas of the cortex to the areas where higher-order cognitive functions are performed.
2.5. Probabilistic and Unconscious Nature of Predictions
These predictions are not only about the immediate sensory data, they also include predictions about the actions, intentions and interactions of the body with the environment. In this way, the brain builds an internal model of the world that helps it understand and anticipate sensory events. This prediction process is not deterministic, but probabilistic in nature, so that the organism can cope with uncertainty in its environment. The predictions are based on probabilities derived from previous experiences (previous sensory inputs, actions or learned patterns) and are adjusted in response to sensory evidence – the actual inputs from the environment.
A key aspect of PPT is that most of the prediction making happens unconsciously—the brain’s continuous process of error correction goes unnoticed. The use of prior beliefs (prior information) to generate predictions is based on implicit knowledge that we acquire from experience through learning and that is embedded in generative models. Many cognitive processes, including perception and motor activity, are highly automatic and unconscious. For example, when we walk across a room, our brains constantly anticipate where obstacles might be and adjust our movements to avoid them without us realizing it.
2.6. The Importance of Balance in Predictive Processing
Predictive processing must sustain a delicate balance between the familiar and the new. In an environment filled with sensory stimuli and ambiguity, the organism/system must sift the important from the unimportant and decide when to give more weight to the expectation, the internal model, and when to the new and surprising, and, most importantly, what the balance between them should be (Clark, 2016). If a problem arises in the mechanisms for Precision Weighting and the balance between top-down expectations and bottom-up perception is disrupted, then judgment no longer corresponds to reality.
Thus, either we cling too strongly to an already outdated and irrelevant unconscious model (the problem of overfitting, well known in machine learning) through which we shape our perceptions, which in turn confirm the distorted model, or we give too much importance to irrelevant sensory data and can no longer discern the model of the world behind them. If the balance between predictions and sensory data is disrupted, hallucinations or illusions can occur. Some visual illusions show how the brain can interpret ambiguous or misleading sensory signals based on its expectations, leading to inaccurate perceptions. For example, in Edward Adelson’s famous checker-shadow illusion (Adelson, 1995), the brain perceives two squares (A and B) as differently illuminated, even though they are the same shade of gray. This shows how our internal model predicts colors and luminance based on context.
Such a loss of balance in the predictive brain can help explain the manifestations of certain mental disorders, such as schizophrenia and psychosis, as well as mechanisms underlying autism spectrum conditions.
Overfitting is especially relevant in social cognition, where dynamic and unpredictable interactions demand cognitive flexibility. However, PPT includes several intrinsic mechanisms that mitigate this risk effectively.
One key mechanism is Precision Weighting: in complex social environments, where sensory cues such as tone, gesture, or expression may be subtle or ambiguous, the brain dynamically reduces the weight of low-confidence signals. This allows the system to resist overfitting to misleading or context-inappropriate data (Clark, 2013).
Furthermore, PPT operates hierarchically, with predictions generated at multiple levels of abstraction, from raw sensory input to complex, conceptual representations (Friston, 2010; Clark, 2015). High-level models constrain and interpret lower-level data, allowing for top-down corrections that prevent the system from becoming overly sensitive to surface-level mismatches. In social cognition, this means that a brief facial expression or isolated gesture doesn’t immediately override one’s broader understanding of another person’s intentions or identity (Koster-Hale & Saxe, 2013).
Importantly, model updating in PPT is continuous and context-sensitive, not reactive to isolated errors. The brain requires persistent, precision-weighted discrepancies before significantly revising its beliefs (Hohwy, 2016). This ensures a balance between model stability and adaptability, helping avoid overfitting to momentary irregularities while still maintaining responsiveness to environmental change.
Finally, neuromodulatory systems, notably those involving dopamine, are proposed to regulate precision estimation and belief updating (Friston et al., 2012). In social contexts, the default mode network and mentalizing regions such as the medial prefrontal cortex and temporo-parietal junction help represent other minds and adjust internal models of others flexibly (Frith & Frith, 2006; Friston & Frith, 2015).
2.7. Principal Differences between PPT and Traditional Cognitive Theories
To summarize, PPT offers a new and fruitful approach that differs radically from traditional cognitive theories. These differences can be outlined in several directions.
While traditional cognitive theories view the brain as a reactive system, PPT offers a fundamentally different perspective, viewing the brain as a prediction machine that continuously anticipates sensory input rather than passively receiving and processing data. These top-down predictions meet bottom-up sensory data, which makes minimizing Prediction Error across hierarchies central to how the brain actually works. By contrast, traditional theories propose a model in which information flow is primarily bottom-up: from stimulus to response, transforming inputs into representations and outputs.
In terms of the roles of perception and action, according to PPT perception is inference based on prediction corrected by the actual sensory evidence. Action, in turn, becomes active inference, a means of minimizing Prediction Error. Thus, in PPT action is part of perception: the brain does not passively wait to be stimulated by the environment, it actively predicts and shapes the environmental circumstances through movement. On this point PPT differs substantially from traditional cognitive theories, for which perception is a passive decoding of sensory data and action is a separate output process produced in response to perception.
Additionally, perceptual modules in PPT are not fully encapsulated, in contrast to traditional cognitive theories such as Fodor’s approach, which holds that perception is not influenced by beliefs, desires, etc. (i.e. it is informationally encapsulated). For PPT, by contrast, cognitive states can penetrate perception, and there is strong interplay between emotion, memory, perception, and action, whereas in traditional theories cognitive domains are often treated as separate modules.
2.8. Use of PPT in AI
PPT is successfully used in the development of Artificial Intelligence. It inspires hierarchical, generative, unsupervised models in deep learning, such as variational autoencoders (Marino, 2020) and transformers (Mentzelopoulos et al., 2024), in contrast to the symbolic logic of Classic Symbolic AI and the pattern matching of connectionist (neural network) approaches.
Deep generative models, e.g. Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), learn to predict input distributions, much like the brain generating sensory predictions. Contrastive Predictive Coding, for example, learns representations by predicting future inputs. Self-supervised learning models such as the Generative Pre-Trained Transformer (GPT) and Bidirectional Encoder Representations from Transformers (BERT), trained to predict masked or next-word tokens, are highly aligned with predictive frameworks as well (Vaswani et al., 2017). In language models like GPT, the system predicts the next word based on previous ones, as will be explained in Section 3.1. In audio and vision models, it predicts missing frames, sounds, or pixels. These architectures aim to anticipate structure in data and minimize error, much like brains under PPT aim to minimize Prediction Error (Brown et al., 2020).
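As a rough illustration of this shared logic (a toy example, not an actual architecture; the data, learning rate, and function name are invented), the following Python sketch trains a one-parameter predictor purely by shrinking its prediction error:

```python
# Toy sketch: "predict the input, measure the error, adjust the model" --
# the common pattern behind predictive models in deep learning and PPT.
import random

def train_predictor(data, steps=1000, lr=0.05):
    """Learn a weight w so that w * x predicts the next value y."""
    w = 0.0
    for _ in range(steps):
        x, y = random.choice(data)
        prediction = w * x
        prediction_error = y - prediction   # mismatch between model and input
        w += lr * prediction_error * x      # update the model to shrink the error
    return w

# A toy "sensory stream" in which the next value is roughly twice the current one.
stream = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
print(train_predictor(stream))  # settles near 2.0
```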
Active Inference in robotics, inspired by Karl Friston’s work, builds “Active Inference AI Systems” that act to fulfill predictions rather than merely reacting to stimuli (Da Costa et al., 2022; Lanillos et al., 2021). This approach has been applied in situations where robots need to move in order to reduce surprise, e.g. uncertainty about their environment.
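The difference from a purely reactive controller can be caricatured in a few lines (a conceptual sketch, not a real robotics API; the gain and the goal prior are arbitrary): the agent acts until its sensed state matches what its model predicts.

```python
# Conceptual sketch of an active-inference-style loop: action fulfils a
# prediction instead of being computed as a response to a stimulus.

def active_inference_step(predicted_position, sensed_position, gain=0.5):
    """Return a movement that reduces the proprioceptive prediction error."""
    prediction_error = predicted_position - sensed_position
    return gain * prediction_error   # move toward the state the model expects

position = 0.0
goal_prior = 10.0   # the agent "predicts" itself to be at position 10
for _ in range(20):
    position += active_inference_step(goal_prior, position)

print(round(position, 2))   # ~10.0: the prediction has been fulfilled by acting
```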
3. Predictive Processing and Language
Predictive Processing Theory views language as a process of actively predicting and minimizing errors in the course of communication and comprehension. Language fits into the basic framework of PPT, being seen as a hierarchically organized system that continuously predicts incoming information and adapts to it, largely unconsciously. According to PPT, language comprehension involves continuously predicting the upcoming words or sounds in an utterance, the speaker’s intention, contextual meanings, and goals. The brain makes hypotheses about future incoming linguistic signals and compares these hypotheses with actual sensory data to minimize errors. When you learn a new language, initial errors (misunderstanding words or structure) gradually decrease as the brain updates its generative models.
3.1. Hierarchy and Active Inference of the Language Processes
Language processes are organized hierarchically: low levels process sensory details, such as phonemes or graphemes; middle levels generate syntactic structures and word connections; high levels process semantic and pragmatic aspects, such as intention and context. Information flows in both directions: top-down, when the brain predicts what you will hear or read, and bottom-up, when the actual sensory signals correct these predictions. For example, when you hear, “The cat is chasing…”, your brain automatically predicts words like “mouse” or “ball,” basing its predictions on context and past experience. However, if “pink dinosaur” follows, the prediction error triggers a surprise signal that requires rethinking. When speaking, the brain uses predictions to plan speech based on goals and social context; when listening, it continuously predicts the next words, correcting its hypotheses in real time; reading involves predictions about graphic symbols, grammar, and semantic content. Rapid, fluent reading is the result of efficient generative models that minimize the need to process each word in full.
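The “surprise signal” in this example can be quantified as surprisal, the negative log-probability of the continuation under the predictive distribution; the probabilities below are invented purely for illustration:

```python
# Toy surprisal calculation for the continuation of "The cat is chasing ...".
import math

# Invented predictive distribution; the remaining probability mass is spread
# over all other words.
next_word_probs = {"mouse": 0.55, "ball": 0.30, "bird": 0.10, "pink dinosaur": 0.0001}

def surprisal(word):
    return -math.log2(next_word_probs.get(word, 1e-6))

print(f"mouse: {surprisal('mouse'):.2f} bits")                  # small error
print(f"pink dinosaur: {surprisal('pink dinosaur'):.2f} bits")  # large error, model revision
```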
When speaking and listening, we use Active Inference to predict both our own behavior and that of others. It requires the suppression of Prediction Errors by updating an internal model that generates predictions, both at short time intervals (in Perceptual Inference) and at longer time intervals (in Perceptual Learning). If two agents share the same model, they can predict each other’s linguistic behavior and simultaneously minimize their prediction errors. In other words, communication induces perceptual learning and allows others to (literally) change our minds and vice versa (Friston & Frith, 2015).
In general, within the framework of PPT, language is viewed as a dynamic process of active prediction and adaptation. It is hierarchically organized, context-dependent, and based on generative models that minimize Prediction Errors. This perspective emphasizes the integration of linguistic processes into the broader cognitive system and their adaptive nature.
Recent empirical studies have begun to explore how these principles apply to language processing, proposing that the brain actively anticipates linguistic input at multiple levels of abstraction and across varying temporal scales.
In a study of brain activity across the left fronto-temporal hierarchy during language comprehension, Wang et al. used electroencephalography (EEG) and magnetoencephalography (MEG) datasets as well as structural MRI data. Their findings indicate that the computational principles of predictive coding may account for the time-course dynamics of evoked activity within the fronto-temporal network, which sustains higher-level language comprehension (Wang et al., 2023).
Gagl et al. recorded EEG data during visual word recognition. Their results suggest that pre-existing visual-orthographic knowledge plays a crucial role in refining the representation of visually presented words, thereby facilitating highly efficient reading (Gagl et al., 2020).
Grisoni et al. also evaluated predictive processing models using EEG data. Their observations revealed similar neurophysiological markers of prediction in both language production and comprehension, yet distinct cortical source distributions when predicting words with different meanings (in their study, animal vs. tool nouns). These results support the idea that the same distributed neural circuits are engaged during word prediction in both production and comprehension, leading to comparable activation patterns across the two modalities (Grisoni et al., 2024).
Another demonstration of how these principles apply to language processing is offered by Caucheteux et al., who compared human brain activity during natural language comprehension with the internal representations of deep language models such as GPT-2. Using fMRI data, they found that brain regions associated with language, particularly in frontal and temporal cortices, reflect predictive dynamics similar to those in hierarchical AI systems. Crucially, they show that the brain does not merely predict the next word in a sentence, but constructs long-range and semantically rich predictions, involving both syntactic and conceptual structures (Caucheteux et al., 2021).
This alignment between neural and computational prediction highlights not only the neurobiological plausibility of transformer-based AI models, but also suggests directions for improving artificial systems. By drawing on the brain’s ability to generate hierarchical and temporally extended representations, future AI models could be enhanced to better handle long-range dependencies, abstract meaning, and contextual nuance (Caucheteux et al., 2021).
This evidence supports the view that linguistic understanding is deeply inferential in nature, consistent with PPT’s claim that perception and cognition emerge from the minimization of Prediction Errors across hierarchical generative models and that artificial systems may benefit from mirroring this structure.
3.2. Concepts According to Predictive Processing Theory
Predictive Processing Theory offers a powerful framework for understanding concepts by viewing them as dynamic internal models that the brain uses to predict and interpret the world (Newen, De Bruin, & Gallagher, 2018: pp. 241-261). Within this framework, concepts are viewed as generative models which play a key role in minimizing prediction errors and making sense of sensory data in a given context. These models are built on previous experience and adapt in real time by minimizing the differences between expected and received signals (prediction errors). If you see an object with a round shape, your brain may activate the concept of “ball,” which includes predictions about other properties of the object, for example, that it can roll or is made of a certain material. The hierarchical structure of concepts spans several levels: lower levels process specific sensory data such as shape, color, and texture, focusing on the details of perception; higher levels support more abstract and generalized concepts, such as prototypes and categories, and integrate information from lower levels to form more complex interpretations. The levels interact dynamically: higher levels send top-down predictions that guide perception, and lower levels send feedback to correct those predictions.
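A toy numerical sketch (with invented priors and likelihoods) may make this clearer: if each concept is a generative model that assigns a likelihood to the observed feature, then recognizing the round object amounts to computing a posterior over concepts.

```python
# Toy Bayesian recognition: which concept best explains a round silhouette?
priors = {"ball": 0.3, "orange": 0.2, "box": 0.5}             # prior expectations
likelihood_round = {"ball": 0.9, "orange": 0.95, "box": 0.05}  # P(round | concept)

def posterior(likelihood, priors):
    unnormalized = {c: likelihood[c] * priors[c] for c in priors}
    total = sum(unnormalized.values())
    return {c: round(v / total, 2) for c, v in unnormalized.items()}

print(posterior(likelihood_round, priors))
# {'ball': 0.56, 'orange': 0.39, 'box': 0.05} -- the round shape raises "ball"
# and "orange"; each active concept then predicts further properties (it can
# roll, it has a certain texture), which new sensory data will confirm or not.
```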
Concepts aid in making predictions, facilitating perception and understanding, and make the world more predictable. They also reduce cognitive load by allowing the brain to “fill in” missing information based on previous experience. Concepts serve as a means of reducing surprise, allowing for the rapid recognition of objects, events and situations. When you walk into a kitchen, your concepts of “chair”, “table” and “refrigerator” help the brain interpret the sensory data it receives quickly and efficiently.
Concepts are formed and improved through learning and experience. They are the result of Bayesian inference processes in which the brain adjusts its models based on new sensory data. If a concept fails to explain the sensory data received, the model is updated to reflect the new information. A new idea or concept is built when a new sensory pattern produces a high-weight Prediction Error. High-weight Prediction Errors, if the system is unable to explain them by any of the existing internal models, increase plasticity and drive the acquisition of new knowledge about the shape and nature of the causes of the surprising sensory data (Clark, 2016: p. 341). Several types of concepts can be distinguished. Visual concepts support the recognition of objects, faces, or situations through predictions of shape, color, and motion; for example, the brain recognizes “cat” by the way it moves and meows, even in low light. When predicting social signals such as emotions or intentions, concepts help us interpret the situation: if someone is smiling, for example, the concept of a “friendly gesture” makes it easier to understand the other person’s behavior. Abstract concepts such as “fairness” or “freedom” are based on a combination of cultural context, personal experience, and abstract thinking.
4. 4E Cognition
Although modern science, based on physicalism, rejects the ideas of dualism, which Descartes established in the 17th century as a framework for thinking for centuries to come, it inherits the idea that the mind or soul is located in a person’s head (Descartes, 1978), and so for a long time it has searched for it there and only there. However, according to the observations of George Lakoff and Mark Johnson (Lakoff & Johnson, 1999: pp. 363-364), since the time of the ancient Greeks, and later again through Descartes and his contemporaries, we have implicitly accepted that thought is a mathematical calculation.
“We can see that the (mathematical) metaphor (for thinking) was common in Europe at the time of Descartes from its well-known use in the writings of his contemporary Thomas Hobbes.” They cite Hobbes, who explicitly equates thinking with calculating, saying that when a man reasons, he does nothing but add different parts to a sum, or form a remainder by subtracting one sum from another; these operations are applied not only to numbers, but to all things that can be added together or subtracted from each other (Hobbes, 1651; Lakoff & Johnson, 1999: p. 364). The same notion is attested in 16th-century literary works, for example:
Bassanio:
“Confess and love”
Had been the very sum of my confession:
O happy torment, when my torturer
Doth teach me answers for deliverance!
But let me to my fortune and the caskets.
(The Merchant of Venice, III:2) (Shakespeare, 1596)
We can see this legacy even today in everyday language usages: “She summed up the situation quickly”, “He put two and two together and concluded that…”, etc.
About four decades ago, a new idea arose that went beyond this way of thinking.
4.1. The Holistic Approach of 4E Cognition
In the 20th century, classical cognitive science treated the brain as the main organ of cognition, where information is processed in a way similar to a computer, by calculating data in symbolic form. Cognitive processes were thus treated as separate from the environment and the body. On the other hand, the dominant view in Philosophy was that concepts are a direct reflection of the world itself. These are the models of cognition that are embedded in the Computational Representational Theory of Mind (CRTM) and in objectivist theories that see concepts as a correspondence with reality. This traditional view was challenged at the end of the last century on the basis of several basic criticisms: it ignores the role of the body and the environment in cognition, it is limited to amodal representations, and it does not explain well certain important phenomena such as intuition, emotions, and social interaction.
Thus, the dynamic metaphor of mind, or 4E Cognition, emerges. It rejects these limitations, seeks a more holistic approach, and views cognition as embodied, embedded, extended, and enactive. The theory is not the product of a single act of creation by a single author, but is the collective result of many heterogeneous ideas developed in different disciplines such as Philosophy of Mind, Cognitive Science, Psychology, and Neuroscience. Among the researchers with significant contributions are George Lakoff and Mark Johnson, Francisco Varela, Evan Thompson and Eleanor Rosch, Andy Clark and David Chalmers, Shaun Gallagher, and others. It postulates that cognition is located not only in the brain, but is deeply connected to the body, is formed in the context of the environment, can involve external objects and technologies, and is a process of active and agent-initiated interaction with the world.
According to this concept, the role of the body is crucial: sensorimotor systems play a central role in thinking, and emotions and bodily states influence cognitive processes. Cognitive processes are optimized for a certain physical and social environment, which often provides cues and structure for cognitive tasks. The whole chain includes not only the brain, but also external tools and technologies (books, notebooks, calculators, smartphones), which are considered part of the cognitive system. Cognition is an active interaction with the world: it occurs through actions, not through passive reception of information, with subjective experience and temporal dynamics being important.
4.2. Language According to the 4E Cognition Approach
4E cognition also provides an unconventional perspective on language, viewing it not simply as a symbolic system for transmitting information, but as a dynamic process dependent on the body, the environment, and social interaction. This framework revises the classical linguistic theories that treat language as an abstract, amodal system. Key ideas include the view that language and concepts are closely related to bodily experiences and sensory modalities (Barsalou, 2008: p. 628) and that linguistic expressions and metaphors are often tied to physical experience (Lakoff & Johnson, 1980).
Language develops, adapts and functions in a specific social and physical context. It is a process of active interaction with the world and the social environment; it is not simply a means of describing the world, but a tool for creating meaning through interaction. Communication is a dynamic process that includes gestures, intonations and contextual signals. Real-time interaction involves dynamic adjustments of linguistic expressions according to the reactions of the other participant(s) in the conversation. Linguistic expressions adapt to the social situation and environment (Varela, Thompson, & Rosch, 1991). The use of certain verbal expressions may depend on the presence of specific objects in the environment, and gestures and facial expressions play an important role in supporting verbal communication (Hutchins, 1995). Language can be extended by external tools and technologies. Writing, reading and the use of smartphones and the Internet are examples of extended cognitive processes, with external systems facilitating the storage and transfer of linguistic knowledge. The use of dictionaries, notes, and online translators expands cognitive abilities related to language (Clark & Chalmers, 1998).
4.3. Cognitive Linguistics as Part of the 4E Cognition Framework
Within the conceptual framework of 4E Cognition, an important direction for the study of language ability and mind is Cognitive Linguistics. Cognitive Linguistics (CL) is an approach that can be considered part of cognitive science, focusing on language as a universal cognitive mechanism. According to this approach, the structure and functions of language are based on generally applicable cognitive processes occurring in the brain, while at the same time being closely related to the body, the environment and the cultural framework. Language is a reflection and consequence of these processes, and language abilities are an integral part of general cognitive abilities. Thinking is figurative, and metaphor, metonymy and figurative structures in general (mental imagery) are a mechanism for forming concepts that are not conditioned by our immediate experience. This trend in science and linguistics emerged at the end of the 20th century as a counterpoint to the traditional objectivist view of language (Skrebtsova, 2000), as well as to Noam Chomsky’s generative grammar.
Conceptual Metaphor
One of the main theoretical constructs within CL is the Conceptual Metaphor of George Lakoff and Mark Johnson (Lakoff & Johnson, 1980). According to them, metaphors are not just rhetorical figures, but a fundamental cognitive mechanism by which people understand and structure abstract concepts through their experience with the physical world. The main role in human reasoning is not played by formal procedures for inference based on symbolic processing, but by analogy as a transfer of knowledge from one content area to another. Thought is related to affects and is mostly unconscious, and abstract concepts are largely metaphorical (Lakoff & Johnson, 1999: pp. 3-4). Lakoff and Johnson use the concept of “cognitive unconscious”, which includes all unconscious mental operations and structures that contribute to our abilities to conceptualize and reason (Lakoff & Johnson, 1999: pp. 9-11).
Among the examples of Conceptual Metaphors that Lakoff and Johnson give are “argument is war”; “time is money”; “theories are buildings” (Lakoff & Johnson, 1980: pp. 4-50). In other words, from a source domain, something concrete about which we have direct sensory experience, we transfer certain characteristics to a target domain—something as yet unknown and more abstract. We argue as we fight, time is a resource just like money, theories have a structure like architectural buildings. That is, the metaphor is not about words, but about thoughts that are closely related to emotions, and the associative process is unconscious.
The embodiment of language is supported by the so-called neural theory of language and functional magnetic resonance imaging (fMRI) neuroscience studies that show how thought is carried out in the brain by the same neural structures that process vision, hearing, sensation, action, and emotion. For example, action words referring to facial, hand, or foot movements such as lick, pick, or kick, when presented to subjects in a passive reading task, differentially activate areas in the motor cortex of the brain that are adjacent to or overlap with areas activated by actual tongue, finger, or foot movement. These results indicate that the referential meaning of action words correlates with somatotopic activation of the motor and premotor cortex. This rules out a single “meaning center” in the human brain and supports a dynamic view that words are processed by distributed neural assemblies with cortical topographies that reflect the semantics of the word (Hauk, Johnsrude, & Pulvermüller, 2004). Studies with similar results can be found in (Pulvermüller, 2005: p. 578; Feldman & Narayanan, 2003; Bergen, 2012).
5. Points of Reference
From the theoretical frameworks thus presented, a number of points of contact between them are clearly noticeable. 4E Cognition and PPT complement each other, providing an explanation of cognitive processes through different but compatible conceptual apparatuses. While 4E Cognition emphasizes embodied, situated, extended and enactive cognition, PP provides a mechanism for how the brain predicts and adapts its behavior and perception, taking into account the environment and the body.
For 4E cognition, cognition is embodied, and PPT postulates that motor and sensory systems play a central role in the brain’s predictions and explains how the brain uses bodily models to make those predictions. Motor activity is the product of generative models that predict the outcomes of actions. On this point, 4E also emphasizes that the body is not just a tool but an active participant, inextricably linked to cognition as a whole. For example, when walking, the brain predicts the position of the feet based on bodily experience and sensory feedback.
According to the Predictive Processing Theory, the brain anticipates upcoming sensory input based on the environment and minimizes errors by adapting to external conditions. 4E considers how these signals shape cognitive processes. For example, interaction with an environment such as an office involves predictions about objects (computers, desks), which 4E views as structuring cognition, and PP as sources of predictions.
The points of reference between 4E and PPT are listed in the table below:

According to 4E, cognition is: | In the concepts of PPT:
Embodied | Generative models include bodily predictions and motor dynamics.
Embedded | The environment provides context, which minimizes prediction errors.
Extended | External instruments are involved in the prediction processes.
Enactive | Actions result from adaptive predictions, which minimize uncertainty; through Active Inference the system engages with the environment by adjusting its sensory input via actions to confirm or refine its predictions.
5.1. Language Processes in 4E Cognition and PPT
4E Cognition and Predictive Processing Theory overlap in viewing language as a dynamic, body-bound, context-dependent, and interactive process. For 4E, the body and sensorimotor activity are integral to linguistic processes. Language understanding (from a Cognitive Linguistics perspective) is often based on bodily metaphors based on physical experience, e.g., “raising a question,” “hard argument.” Language is not only a cognitive activity, but also a sensorimotor activity involving articulation and perception. For PPT, sensorimotor systems play an important role in predicting phonemes, words, and sentences. The brain generates predictions about incoming sensory data, including auditory and visual aspects of language.
According to 4E, linguistic processes are deeply rooted in context and environment, and language serves to adapt and navigate social and physical conditions. In PPT, predictions related to language include information from the environment that helps faster and more efficient comprehension. For example, it is possible that visual stimuli (such as objects in a room) influence the interpretation of speech. Concepts serve as internal models that help recognize and navigate reality. Both theories assume that language processes can be extended by external tools that become functionally integrated into the cognitive system.
Regarding the hierarchy of language processes, there is again overlap between the two theories. In 4E, language processes involve different levels—from sensorimotor activity to the integration of social and cultural semantics. According to PPT, language is processed hierarchically—the low levels predict phonemes and graphemes, the middle ones—syntactic structures, and the high ones—semantic and pragmatic meanings.
5.2. PPT and Cognitive Linguistics with the Conceptual Metaphor Theory
Both theories emphasize that language is not a stand-alone system, but part of broader cognitive processes based on experiences and modeling, and that bodily experience is the basis for linguistic understanding and structuring of abstract concepts. For Cognitive Linguistics, language is embodied because abstract concepts are often based on bodily experience. Embodiment is a core element of PPT as the brain uses bodily experience to predict the meaning of language signals, including metaphorical expressions. Both theories emphasize that linguistic structures and meanings are constructed hierarchically, associatively, and in context.
Conceptual Metaphors in Cognitive Linguistics and predictive models in PPT are mechanisms by which the brain makes abstract concepts accessible and understandable. Through the lens of PPT, metaphors can be viewed as generative models that the brain uses to predict meanings in abstract domains. For example, when you hear the phrase “crossroads in life,” the brain envisions a scenario based on the specific experience of crossing paths and the choice that needs to be made.
6. Conclusion
Over the past 15 years, Predictive Processing Theory has been one of the most influential frameworks in neuroscience and philosophy for exploring the nature of the mind. It has transformed our understanding of how living organisms interpret and adjust to their surroundings. However, its roots can be traced back through centuries, if not millennia, of philosophical thought and intuition. By focusing on unconscious and automatic processes, PPT offers a powerful framework for bridging the gap between subjective experience and its biological foundation in the brain and body. It has also become an essential tool in the development of artificial intelligence, with its applications, both explicit and implicit, expanding across a wide range of projects. Furthermore, PPT shares deep conceptual ties with other significant research programs in the Philosophy of Mind, such as 4E cognition and Cognitive Linguistics. A closer analysis reveals that its theoretical structure provides insights into the mechanisms underlying our linguistic abilities, which serve as the gateway to the human mind.